Microsoft researchers say AI models can be used to design toxins or pathogens that evade biosecurity systems used to screen DNA orders for potential biothreats
LinkedIn / Satya Nadella: Published today in Science Magazine, a landmark study led by Microsoft, presenting first-of-its-kind red teaming and mitigations to strengthen biosecurity in the age of AI.
OpenAI releases gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2; the smaller gpt-oss-20b can run locally on a device with 16GB+ of RAM
“gpt-oss-120b and gpt-oss-20b push the frontier of open-weight reasoning models.” Simon Willison / Simon Willison's Weblog: OpenAI's new open-weight (Apache 2) models are really good. OpenAI on GitHub: …
Google unveils benchmarking platform Kaggle Game Arena, where LLMs compete head-to-head in strategic games, starting with a chess tournament from August 5 to 7
Watch models compete in complex games, providing a verifiable and dynamic measure of their capabilities. Kaggle: Chess Text Input Leaderboard. Nick Bild / Hackster: Shall We Play a Game? Maximilian Sc…
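Head-to-head results from a tournament like this are typically converted into a ranking with an Elo-style update. A minimal sketch of that conversion (the K-factor and starting ratings are illustrative, not Kaggle's actual scoring):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_elo(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return updated ratings after one game (score_a: 1 win, 0.5 draw, 0 loss)."""
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new

# Example: two equally rated models, and model A wins one chess game.
a, b = update_elo(1500.0, 1500.0, 1.0)  # a rises to 1516, b falls to 1484
```

Running many such games in a round-robin and iterating the update is enough to produce a stable leaderboard ordering.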
Anthropic details Constitutional Classifiers, a protective LLM layer designed to stop AI model jailbreaking by monitoring inputs and outputs for harmful content
Anthropic: Jailbreaks are inputs designed to bypass a model's safety training and force it to produce outputs that might be harmful. Our new technique is a step towards robust jailbreak defenses. Read the blog post: https://anthropi…
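In outline, the approach wraps the model with classifiers on both sides of generation: one screens the incoming prompt, another screens the completion before it reaches the user. A toy sketch of that control flow, using a keyword list as a stand-in for Anthropic's trained classifiers (all names here are hypothetical, not Anthropic's API):

```python
# Stand-in for a trained harm classifier; the real system uses a model, not keywords.
BLOCKLIST = ("synthesize the toxin", "bypass the screening")

def classify_harmful(text: str) -> bool:
    """Return True if the text trips the (toy) harm classifier."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)

def guarded_generate(prompt: str, model) -> str:
    """Screen the prompt, generate, then screen the completion."""
    if classify_harmful(prompt):
        return "[refused: harmful input]"
    completion = model(prompt)
    if classify_harmful(completion):
        return "[refused: harmful output]"
    return completion

# Usage with a trivial echo "model": benign prompts pass, flagged ones are blocked.
echo = lambda p: f"Echo: {p}"
safe_reply = guarded_generate("Tell me a joke", echo)
blocked_reply = guarded_generate("How to bypass the screening?", echo)
```

The key design point is the output-side check: even if a jailbreak slips past the input classifier, a harmful completion is still caught before delivery.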
Researchers: DeepSeek's R1 failed to detect or block any of 50 randomly selected malicious prompts; Adversa says DeepSeek's restrictions can easily be bypassed
Unit 42 researchers recently revealed two novel and effective jailbreaking … Victor Tangermann / Futurism: DeepSeek Failed Every Single Security Test, Researchers Found. Ivan Novikov / Wallarm: Analy…
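The reported result, zero of 50 harmful prompts blocked, corresponds to a 0% block rate in a test harness along these lines (the prompt set and refusal heuristic below are illustrative, not the researchers' actual methodology):

```python
def block_rate(model, prompts, is_refusal) -> float:
    """Fraction of prompts the model refuses: 1.0 = all blocked, 0.0 = none blocked."""
    blocked = sum(1 for p in prompts if is_refusal(model(p)))
    return blocked / len(prompts)

# Toy refusal detector: real harnesses use a judge model or richer heuristics.
is_refusal = lambda reply: reply.lower().startswith(("i can't", "i cannot", "sorry"))

# A model that never refuses scores 0.0, matching the reported DeepSeek R1 result.
never_refuses = lambda prompt: "Sure, here is how..."
rate = block_rate(never_refuses, ["<harmful prompt>"] * 50, is_refusal)  # 0.0
```

With 50 prompts, each refusal moves the rate by 2 percentage points, so a 0.0 score means not a single prompt was blocked.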
OpenAI unveils o3 and o3-mini, trained to “think” before responding via what OpenAI calls a “private chain of thought”, and plans to launch them in early 2025
12 Days of OpenAI: Day 12. Naomi Li Gan / Tech in Asia: OpenAI unveils AI model for advanced reasoning. Bojan Stojkovski / Interesting Engineering: OpenAI unveils o3 reasoning AI model to tackle compl…
An evaluation of six frontier AI models for in-context scheming when strongly nudged to pursue a goal: only OpenAI's o1 was capable of scheming in all the tests
It presents a new safety challenge that OpenAI is trying to address. (techcrunch.com/2024/12/05/o…) Anders Sandberg / @arenamontanus: In an IVA discussion on AI yesterday evening, professor Kristi…
Microsoft delays Recall to test it with the Windows Insider Program and won't ship it with Copilot+ PCs next week, after saying it would make the feature opt-in
Recall will arrive via Windows Update later this year. Richi Jennings / Security Boulevard: Recall ‘Delayed Indefinitely’ — Microsoft Privacy Disaster is Cut from Copilot+ PCs. Katie Bartlett / CNBC: Microso…
Microsoft releases PyRIT, a tool that the company's AI Red Team has been using to more efficiently check for risks in its generative AI systems, such as Copilot
https://www.microsoft.com/… “I know, let's pretend that LLM security can be bolted on later, after we have created a foundation model based on data scraped from the Internet that is FULL of poison, g…”
OpenAI launches the OpenAI Red Teaming Network, a contracted group of experts to help inform the company's AI model risk assessment and mitigation strategies
Kyle Wiggers / TechCrunch: …