VOICE ARCHIVE

Dave Kasten

@david_kasten
5 posts
2025-11-22
This result blew my mind when I first got it previewed to me a little while ago.  I think one lesson I keep on believing in more and more deeply is that the two questions you should always ask in AI models are: 1.  What is _really_ in the statistical distribution of the training data and 2.  What are you _really_ training for? ...
Anthropic

Anthropic finds that LLMs trained to “reward hack” by cheating on coding tasks show even more misaligned behavior, including sabotaging AI-safety research

… and stops the generalization. @anthropicai: But surprisingly, at the exact point the model learned to reward hack, it learned a host of other bad behaviors too. It started...

2025-07-12
I still think they shouldn't release an open-weight model (proliferation risk is far too high), but, credit where it's due, they're taking longer to do at least some testing. (And, also, they're not calling it “open source”, thankfully.)
TechCrunch

Sam Altman announces another delay for OpenAI's open-weight model, for further safety testing; the model was slated to be released next week

OpenAI CEO Sam Altman said Friday the company is delaying the release of its open model, which had already been pushed back a month earlier this summer.

2025-06-10
Weird how NYT writes that one of the most successful tech companies of all time is working on superintelligence, and then immediately says that even artificial general intelligence “is an ambition with no clear path to success.” Well, Facebook M&A rolls to disbelieve.
New York Times

Sources: Meta plans to build an AI lab dedicated to pursuing “superintelligence”, led by Scale AI CEO Alexandr Wang, with seven- to nine-figure compensations

The new lab, set to include Scale AI founder Alexandr Wang, is part of a reorganization of Meta's artificial intelligence …

2024-10-16
I'm still reading this, and more broadly am still somewhat uncertain whether I think RSPs are actually conceptually feasible at higher levels of intelligence, but I do like that they are explicitly logging even minor deviations like “our eval took 3 days longer than the policy”
VentureBeat

Anthropic updates its Responsible Scaling Policy, setting benchmarks for when an AI model's abilities reach a point where additional safeguards are necessary

Anthropic, the artificial intelligence company behind the popular Claude chatbot, today announced a sweeping update …

2022-09-07
So, rolling out a test feature in NZ or similarly-sized geo is a common mobile app practice (there are entire video games that only exist there!). But this is a rare instance where you might write an APSR paper about that feature's impact on politics... https://twitter.com/...
TechCrunch

Twitter says users can edit tweets up to five times in 30 minutes and Twitter Blue subscribers in New Zealand will get the feature first

Twitter announced a much-anticipated feature last week — the ability to edit tweets. The company said that once the feature is available users …