/
Navigation
C
Chronicles
Browse all articles
C
E
Explore
Semantic exploration
E
R
Research
Entity momentum
R
N
Nexus
Correlations & relationships
N
~
Story Arc
Topic evolution
S
Drift Map
Semantic trajectory animation
D
P
Posts
Analysis & commentary
P
Browse
@
Entities
Companies, people, products, technologies
Domains
Browse by publication source
Handles
Browse by social media handle
Detection
?
Concept Search
Semantic similarity search
!
High Impact Stories
Top coverage by position
+
Sentiment Analysis
Positive/negative coverage
*
Anomaly Detection
Unusual coverage patterns
Analysis
vs
Rivalry Report
Compare two entities head-to-head
/\
Semantic Pivots
Narrative discontinuities
!!
Crisis Response
Event recovery patterns
Connected
Nav: C E R N
Search: /
Command: ⌘K
Embeddings: large
VOICE ARCHIVE

@apartresearch

@apartresearch
2 posts
2024-01-15
Big kudos to our researchers @FazlBarez and @_clementneo for their contributions to this important paper (that even @elonmusk commented on)! @AnthropicAI has led the recently concluded work that investigates how larger language models become better at hiding their malicious...
2024-01-15 View on X
TechCrunch

Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors

[images] Abraham Samma / @abesamma@toolsforthought.social : Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training  —  This is some sci-fi stuff right here (e...

2024-01-14
Big kudos to our researchers @FazlBarez and @_clementneo for their contributions to this important paper (that even @elonmusk commented on)! @AnthropicAI has led the recently concluded work that investigates how larger language models become better at hiding their malicious...
2024-01-14 View on X
TechCrunch

Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors

Most humans learn the skill of deceiving other humans.  So can AI models learn the same?  Yes, the answer seems — and terrifyingly, they're exceptionally good at it.