VOICE ARCHIVE

Scott McGrath

@smcgrath.phd
23 posts
2026-03-04
Utah's new prescription AI bot was easily compromised.  —  Using basic jailbreaking, researchers at Mindgard tricked the system into tripling OxyContin doses and recommending meth.  For a tool legally allowed to renew meds, that's a massive safety gap that can't be ignored.  —  #MedSky
Axios

Security researchers successfully prompted the AI system behind a Utah prescription renewal pilot to reclassify meth as an “unrestricted therapeutic”, and more

Security researchers used relatively simple jailbreaking techniques to trick the AI system powering Utah's new prescription refill bot.

2026-02-25
Teen AI use has doubled every year since 2023, with 54% of students now using chatbots for school.  —  What's striking? 3 out of 5 say cheating is a regular part of life now.  —  We're seeing a massive shift in habits faster than schools can draft a policy.  It's a literacy gap in the making. …
Pew Research Center

A survey of US teens: 57% use AI chatbots to search for info, 54% use them to do schoolwork, 47% for fun or entertainment, 12% for emotional support, and more

Just over half of U.S. teens say they have used chatbots for help with schoolwork, and 12% say they've gotten emotional support.

2026-01-31
Really important research out of Anthropic: In an RCT, they found AI coding assistance resulted in a 17% drop in mastery for users.  —  While tasks were slightly faster, offloading thinking to AI stunted skill growth. …
Anthropic

Anthropic details an experiment on whether AI coding tools shape developer skills: the biggest performance decline for developers occurred in debugging tasks

Read the paper  —  Research shows AI helps people do parts of their job faster.  In an observational study of Claude.ai data, we found AI can speed up some tasks by 80%.

You do have to give Anthropic credit here.  It is rare for a lab to publish data questioning its own tools.  —  Meta constantly buries internal findings that challenge their business model.  This kind of transparency is uncommon and should be encouraged instead of dog-piled.  —  Link to the full study:
Anthropic

Anthropic details an experiment on whether AI coding tools shape developer skills: the biggest performance decline for developers occurred in debugging tasks

Read the paper  —  Research shows AI helps people do parts of their job faster.  In an observational study of Claude.ai data, we found AI can speed up some tasks by 80%.

2026-01-22
Analysis of 5,000 NeurIPS papers confirms US and Chinese AI research remains linked.  Joint work accounts for 3% of output.  Technology flows both ways: Chinese labs adopt Llama, while US researchers use Alibaba's Qwen.
Wired

An analysis of 5,290 AI research papers at NeurIPS: 141, or ~3%, had US-China AI lab collaboration, vs. 134/4,497 in 2024; Llama featured in 106 Chinese papers

WIRED analyzed more than 5,000 papers from NeurIPS using OpenAI's Codex to understand the areas where the US and China actually work together on AI research.

2026-01-14
Moxie Marlinspike's Confer extends Signal's privacy model to AI.  Using trusted execution environments, it renders queries opaque to operators.  This open-source framework is a move to reclaim user data from the industry's power players.
Ars Technica

A look at Confer, an open-source AI assistant project from Signal creator Moxie Marlinspike that is designed to provide end-to-end encryption for AI chats

Moxie Marlinspike—the pseudonym of an engineer who set a new standard for private messaging with the creation of the Signal Messenger …

2025-11-23
🧪 A new AI agent successfully bypasses 99.8% of survey bot detectors.  By simulating human keystrokes and typos, the tool demonstrates how easily synthetic respondents can poison scientific data and manipulate polling results.  —  #AcademicSky
404 Media

A researcher details an LLM-based AI agent that “demonstrated a near-flawless ability” to bypass bot detection methods while answering online survey questions

We can no longer trust that survey responses are coming from real people.  Online survey research …

2025-10-19
Research on AI-generated civics lesson plans reveals significant shortcomings.  Analysis shows 90% of activities focused on recall, not critical thinking, and only 6% included multicultural content, resulting in uninspired instruction.  —  #EduSky #AcademicSky
The Conversation

A study of 311 AI-generated eighth-grade civics lesson plans in Massachusetts suggests they fall short of inspiring students or promoting critical thinking

When teachers rely on commonly used artificial intelligence chatbots to devise lesson plans, it does not result in more engaging …

2025-10-05
Data centers aren't just driving up electric bills.  —  Insatiable demand from AI is consuming the world's memory and storage supply.  —  Manufacturers are redirecting production, creating a supply squeeze that is driving up prices for SSDs, DRAM, and HDDs for years to come.
Tom's Hardware

The AI boom is driving memory and storage shortages that may last a decade; OpenAI's Stargate has deals for 900K DRAM wafers per month, or ~40% of global output

Once-cheap SSD, DRAM, and HDD prices are climbing fast as AI demand and constrained supply converge to create the tightest market in years.

2025-09-26
Poor AI translations are flooding Wikipedia editions for vulnerable languages.  With no community to correct errors, this flawed content risks being used to train future AI models, creating a “doom spiral” that further degrades the language online.
MIT Technology Review

How inaccurate AI translations of Wikipedia pages, which AI models use for training, may cause a doom spiral that further marginalizes vulnerable languages

When Kenneth Wehr started managing the Greenlandic-language version of Wikipedia four years ago, his first act was to delete almost everything.

2025-08-17
AI-powered stuffed animals are being marketed as screen-free companions for young children.  While the toys can converse with a child, they also transcribe those conversations for parents, blurring the line between a toy and a parental surveillance tool.
New York Times

A look at AI-powered stuffed animals like Grem, Grok, and Gabbo, which are being promoted as an alternative to screen time for children as young as 3

Curio is a company that describes itself as “a magical workshop where toys come to life.”  When I recently visited its cheery headquarters …

2025-08-01
Google has released Gemini 2.5 Deep Think, its first public multi-agent reasoning model.  The system explores multiple ideas in parallel to solve complex problems, achieving state-of-the-art results on coding and reasoning benchmarks.  —  #MLSky
TechCrunch

Google rolls out Gemini 2.5 Deep Think, its most advanced reasoning model, which considers multiple ideas simultaneously, to its $250/month Ultra subscription

Google DeepMind is rolling out Gemini 2.5 Deep Think, which, the company says, is its most advanced AI reasoning model …

2025-07-11
Grok 4 appears to consult Elon Musk's X posts and news about him when addressing controversial subjects like immigration or the Israel-Palestine conflict.  This design choice, aiming to align with Musk's views, raises questions about Grok's stated goal of being a “maximally truth-seeking AI.” …
TechCrunch

Tests reveal that Grok 4 seems to search for Elon Musk's views online when asked about sensitive topics, and its answers tend to align with Musk's opinions

During xAI's launch of Grok 4 on Wednesday night, Elon Musk said — while live-streaming the event on his social media platform …

2025-06-06
DOGE's reckless deployment of half-baked AI tools is going to make things worse.  —  DOGE's haphazard AI use risks becoming the definitive, negative narrative on AI, thereby undermining efforts for ethical and safe AI deployment and overshadowing any potential benefits and advancements. …
ProPublica

VA used a DOGE AI tool by Gumroad founder Sahil Lavingia that hallucinated contract sizes to cancel 24+ deals; Lavingia says “mistakes were made”

We obtained records showing how a Department of Government Efficiency staffer with no medical experience used artificial intelligence to identify which VA contracts to kill.

2025-05-10
A study finds requesting brief answers from AI chatbots can increase hallucinations.  Models may sacrifice factual accuracy for conciseness, especially when dealing with ambiguous topics.  #MLSky
TechCrunch

A study finds that asking LLMs to be concise in their answers, particularly on ambiguous topics, can negatively affect factuality and worsen hallucinations

Turns out, telling an AI chatbot to be concise could make it hallucinate more than it otherwise would have.

2025-04-20
OpenAI's new “reasoning” models (o3 and o4-mini) actually hallucinate MORE than their predecessors  —  OpenAI's internal tests show o3 hallucinated on 33% of person-related questions, double the rate of previous models.  Even worse, o4-mini hit 48%.
TechCrunch

OpenAI says its new o3 and o4-mini AI models hallucinate more often than its previous reasoning and traditional models, and the company doesn't know why

OpenAI's internal tests show o3 hallucinated on 33% of person-related questions, double the rate of previous models.  Even worse, o4-mini hit 48%.

OpenAI's new “reasoning” models (o3 and o4-mini) actually hallucinate MORE than their predecessors  —  OpenAI's internal tests show o3 hallucinated on 33% of person-related questions, double the rate of previous models.  Even worse, o4-mini hit 48%.
Every

A comparison of OpenAI's o3, o4-mini, and GPT-4.1; Aaron Levie says o3 nailed a multi-step financial modeling task; Scale AI CEO says o3 is “a big breakthrough”

Our take on what's powerful, what's practical, and what's still TBD … If you've been following AI news this week …

2025-03-28
Siblings Dario and Daniela Amodei helped found Anthropic to build Claude, an AI model shaped by safety-first ideals.  —  Born from disillusionment with OpenAI, Claude aims to set the global bar for responsible AGI.  —  🩺🖥️ #MLSky
Wired

Interviews with Dario Amodei, Daniela Amodei, and other executives about Anthropic's origin, Claude, why DeepSeek isn't a threat, reaching AGI safely, more

The brother goes on vision quests.  The sister is a former English major.  Together, they defected from OpenAI, started Anthropic …