tejalpatwardhan

these results were eye-opening for me... chatgpt agent performed better than i expected on some pretty realistic investment banking tasks

2025-07-18

The Verge

OpenAI debuts ChatGPT Agent, which can control an entire computer and perform multi-step tasks, powered by a new dedicated model, rolling out to paid users

One employee uses it to automate his weekly parking requests at OpenAI's San Francisco office.

new method to address and mitigate emergent misalignment in language models: we show activation monitoring and evals can help catch emergent misalignment early. then, we can re-align models via steering and training. surprisingly, re-aligning models is more data-efficient than

2025-06-19

Axios

OpenAI warns that its upcoming models could pose a higher risk of helping create bioweapons and is partnering to build diagnostics, countermeasures, and testing

OpenAI cautioned Wednesday that upcoming models will head into a higher level of risk when it comes to the creation of biological weapons …

TechCrunch

OpenAI details why “emergent misalignment”, where training models on wrong answers in one area can lead to issues in many others, happens and how to mitigate it

Maxwell Zeff / TechCrunch:

Introducing SWE-Lancer: our most realistic coding benchmark to date. $1M in real-world, full-stack freelance SWE tasks, each taking freelancers >21 days to complete on avg. Still some limitations, but better than evals we had before. Congrats @samuelp1002 @michelelwang!

2025-02-19

VentureBeat

OpenAI researchers build the SWE-Lancer benchmark and find that real-world freelance software engineering work remains challenging for frontier language models

Large language models (LLMs) may have changed software development, but enterprises will need to think twice …

latest from preparedness: we're developing wet lab biology evals with @LosAlamosNatLab and @nickgenerous keen to learn how gpt-4o's vision and voice capabilities can assist scientists with real-world lab tasks (e.g., troubleshooting cell culture growth)

2024-07-11

Bloomberg

OpenAI and Los Alamos National Laboratory announce a partnership to evaluate how multimodal AI models can be used safely by scientists in laboratory settings

Evan Gorelick / Bloomberg:

https://chatgpt.com/ with no auth! improves model accessibility, so more of the world can grapple with the implications of AI

2024-04-02

TechCrunch

OpenAI no longer requires an account to use ChatGPT, but with “slightly more restrictive content policies”, starting in a few markets and rolling out globally

OpenAI is making its flagship conversational AI accessible to everyone, even people who haven't bothered making an account.

latest from preparedness @ openai: gpt4 at most mildly helps with biothreat creation. method: get bio PhDs in a secure monitored facility. half try biothreat creation w/ (experimental) unsafe gpt4. other half can only use the internet. so far, gpt4 ≈ internet... but we'll...

2024-02-01

Bloomberg

OpenAI says GPT-4 poses “at most” a slight risk of helping people create biological threats, per the company's early tests to evaluate “catastrophic” LLM risks

Michael Nuñez / VentureBeat: OpenAI study reveals surprising role of AI in future biological threat creation; Tom Car...