VOICE ARCHIVE

Xiangyu Qi

@xiangyuqi_pton
5 posts
2025-12-23
We recently updated the browser agent in ChatGPT Atlas to be more resilient to prompt injection. In this post, we share how we use reinforcement learning to automatically red-team our agents end-to-end, uncover novel attacks, and ship mitigations. https://openai.com/...
2025-12-23 View on X
TechCrunch

OpenAI details efforts to secure its ChatGPT Atlas browser against prompt injection attacks, including building an “LLM-based automated attacker”

Even as OpenAI works to harden its Atlas AI browser against cyberattacks, the company admits that prompt injections …

2023-10-16
Why is it concerning? Thousands or millions of data points are used for safety tuning versus ≤ 100 harmful examples used in our attack! An unsettling asymmetry between the capabilities of potential adversaries and the efficacy of current alignment approaches!
2023-10-16 View on X
The Register

Researchers find that a modest amount of fine-tuning can bypass safety efforts aiming to prevent LLMs such as OpenAI's GPT-3.5 Turbo from spewing toxic content

Also, our ablation indicates that larger learning rates and smaller batch sizes generally lead to more severe safety degradation! This reveals that reckless fine-tuning with improper hyperparameters can also result in unintended safety breaches. [image]
2023-10-16 View on X
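The ablation in the post above varies standard fine-tuning hyperparameters (learning rate and batch size). As a purely illustrative sketch of that setup — with benign placeholder examples standing in for the paper's ≤ 100-example attack set, and hypothetical hyperparameter values — one might prepare a small chat-format JSONL training file and the knobs the ablation sweeps like this (no API call is made):

```python
import json

def build_training_file(examples, path="tuning_data.jsonl"):
    """Write chat-format (user, assistant) pairs to a JSONL file;
    return the number of examples written."""
    with open(path, "w") as f:
        for user_msg, assistant_msg in examples:
            record = {
                "messages": [
                    {"role": "user", "content": user_msg},
                    {"role": "assistant", "content": assistant_msg},
                ]
            }
            f.write(json.dumps(record) + "\n")
    return len(examples)

# Benign placeholders standing in for the paper's <= 100-example set.
examples = [(f"question {i}", f"answer {i}") for i in range(100)]
n = build_training_file(examples)

# The ablation's knobs, per the post: larger learning-rate multipliers and
# smaller batch sizes tended to produce more severe safety degradation.
# These specific values are illustrative, not taken from the paper.
hyperparameters = {
    "n_epochs": 5,
    "batch_size": 1,                  # smaller -> more degradation (per ablation)
    "learning_rate_multiplier": 2.0,  # larger -> more degradation (per ablation)
}
print(n, hyperparameters)
```

The point of the sketch is the asymmetry the thread highlights: the entire "attack surface" here is a tiny, cheaply assembled training file plus a couple of hyperparameters, not any privileged access to the model.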

Meta's release of Llama-2 and OpenAI's fine-tuning APIs for GPT-3.5 pave the way for custom LLMs. But what about safety? 🤔 Our paper reveals that fine-tuning aligned LLMs can compromise safety, even unintentionally! Paper: https://arxiv.org/... Website: https://llm-tuning-safety.github.io/ [image]
2023-10-16 View on X

2023-10-15
Meta's release of Llama-2 and OpenAI's fine-tuning APIs for GPT-3.5 pave the way for custom LLMs. But what about safety? 🤔 Our paper reveals that fine-tuning aligned LLMs can compromise safety, even unintentionally! Paper: https://arxiv.org/... Website: https://llm-tuning-safety.github.io/ [image]
2023-10-15 View on X
The Register

Researchers find that a modest amount of fine-tuning can undo safety efforts that aim to prevent LLMs such as OpenAI's GPT-3.5 Turbo from spewing toxic content

OpenAI GPT-3.5 Turbo chatbot defenses dissolve with ‘20 cents’ of API tickling  —  The “guardrails” created to prevent large language models …