Google launches RT-2, or Robotics Transformer 2, a “vision-language-action” model trained on text and images from the web that can output robotic actions
Our sneak peek into Google's new robotics model, RT-2, which melds artificial intelligence technology with robots.
New York Times · Kevin Roose
Related Coverage
- Light-Speed Advances In Robotics: Doing The Dishes And More Forbes · John Werner
- Google's New AI Model Controls Robots PCMag · Emily Price
- Google's DeepMind RT-2 AI Model Will Help Robots Serve Humans Like R2D2 HotHardware · Tim Sweezy
- Google Deepmind's latest AI model RT-2 “can speak robot” The Decoder · Matthias Bastian
- Google's new RT-2 AI model helps robots interpret visual and language patterns Business Today · Pranav Dixit
- Is Robot Control Going Natural with AI Tech? Google's RT-2 Promises So! Cryptopolitan · Aamir Sheikh
- Google's New AI Tech Lets You Command Robots to Throw Away Your Trash CNET · Imad Khan
- The robots we were afraid of are already here The Economic Times
- Google's RT-2 AI model brings us one step closer to WALL-E Ars Technica · Benj Edwards
- 5 questions for Leroy Hood Politico · Derek Robertson
- Google's Latest Robots Are Getting Smarter, Aided by A.I. Language Models BigTechWire · Surur
- RT-2: New model translates vision and language into action DeepMind Blog
- Speaking robot: Our new AI model translates vision and language into robotic actions The Keyword · Vincent Vanhoucke
- Google DeepMind's new RT-2 system enables robots to perform novel tasks ZDNet · Maria Diaz
- Google's RT-2 model helps robots to more easily perform actions in new situations Neowin · Paul Hill
- RT-2: Vision-Language-Action Models RT-2 on GitHub
- LLM + images = VLM (Visual Language Model) — VLM + robotics = VLA (Visual Language Action [model]) … Nemunas A.
Discussion
-
@kboughida
Karim B Boughida
on x
Can you imagine those in a library helping with shelf-reading, re-shelving, and doing chats! Aided by A.I. Language Models, Google's Robots Are Getting Smart https://www.nytimes.com/...
-
@googledeepmind
@googledeepmind
on x
Across all categories, we saw increased generalisation performance compared to previous baselines, such as on RT-1 models. We also evaluated RT-2 on a number of unseen objects and environments where it could successfully adapt to new situations: https://dpmd.ai/...
-
@googledeepmind
@googledeepmind
on x
⚪ To explore RT-2's emergent capabilities in trials, we searched for tasks that require combining learnings from web data and the robot's experience. We then defined 3 skills it needed to show: 🔘 Symbol understanding 🔘 Reasoning 🔘 Human recognition https://dpmd.ai/... [image]
-
@glenngabe
Glenn Gabe
on x
Yep, here we go... LLMs plugged into robots -> Aided by A.I. Language Models, Google's Robots Are Getting Smart “Google has recently begun plugging state-of-the-art language models into its robots, giving them the equivalent of artificial brains.” https://www.nytimes.com/... [image]
-
@googledeepmind
@googledeepmind
on x
Today, we announced 𝗥𝗧-𝟮: a first of its kind vision-language-action model to control robots. 🤖 It learns from both web and robotics data and translates this knowledge into generalised instructions. Find out more: https://dpmd.ai/... [video]
-
@alex
@alex
on x
Fuck yes this is what I was hoping llms would do to robots and I was (happily) years late to the idea https://www.nytimes.com/...
-
@kimzetter
Kim Zetter
on x
A one-armed robot stood in front of a table with three plastic figurines on top of it: a lion, a whale and a dinosaur. Engineer: “Pick up the extinct animal.” The robot whirred for a moment, then its arm extended and its claw opened and descended. It grabbed the dinosaur.
-
@imordatch
Igor Mordatch
on x
Excited to finally share what we've been working on for the past while: combining vision, language, and action modalities for robot control! https://www.nytimes.com/...
-
@lesaunh
@lesaunh
on x
Too few have understood that AI advances are mapping onto robots. Yes, AI is going to impact cognitive labor first, but the robots will flood the physical labor market soon after. [image]
-
@ajddavison
Andrew Davison
on x
Large language models and web-scale data have some use in robotics as a user interface as nicely demonstrated here, but in my opinion they are not what we need to help with perception, object representation and precise planning which are the real current barriers in robotics.
-
@xiao_ted
Ted Xiao
on x
Introducing RT-2, representing the culmination of two trends: - Tokenize and train everything together: web-scale text, images, and robot data - VLMs are not just representations, big VLMs *are policies* Sounds subtle, but we'll look back on this as an inflection point! 📈
-
@hausman_k
Karol Hausman
on x
Multiple conclusions from these experiments: (1) it turns out with a little bit of robot data we can transfer semantic concepts from vision language web-scale data to robot actions (2) the best VLMs might be the most generalizable robotic controllers
-
@chelseabfinn
Chelsea Finn
on x
Vision-language ➡️ vision-language-action model By using a pre-trained VLM (e.g. PaLI-X), RT-2 enables robots to generalize to new objects & instructions RT-2 also shows basic reasoning capabilities. (e.g. “place orange in matching bowl") Paper+videos: https://robotics-transforme…
-
@svlevine
Sergey Levine
on x
Turns out that vision-language models can control robots too. The secret is to just finetune them to print out the actions (literally, as text). Really excited about our new result, the successor to RT-1. RT-2 is a pre-trained VLM: https://www.deepmind.com/... Short 🧵👇 [video]
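Levine's point about finetuning the model to "print out the actions, literally as text" is the crux of RT-2's design. A minimal sketch of that idea, assuming RT-1-style uniform discretization of each action dimension into 256 integer bins; the bounds, dimension names, and exact string format below are illustrative assumptions, not the published tokenizer:

```python
import numpy as np

# Illustrative bounds and bin count; the real tokenizer's ranges and
# vocabulary differ -- this only demonstrates the actions-as-text idea.
ACTION_DIMS = ["dx", "dy", "dz", "droll", "dpitch", "dyaw", "gripper"]
LOW, HIGH, BINS = -1.0, 1.0, 256

def action_to_text(action: np.ndarray, terminate: bool = False) -> str:
    """Discretize a continuous 7-D action into integer bins and render it
    as a plain string, so a VLM can be fine-tuned to emit it as text."""
    clipped = np.clip(action, LOW, HIGH)
    tokens = np.round((clipped - LOW) / (HIGH - LOW) * (BINS - 1)).astype(int)
    return " ".join([str(int(terminate))] + [str(t) for t in tokens])

def text_to_action(text: str) -> tuple[bool, np.ndarray]:
    """Invert the mapping: parse a generated string back into a termination
    flag and a continuous action vector for the controller."""
    fields = [int(t) for t in text.split()]
    terminate = bool(fields[0])
    action = np.array(fields[1:], dtype=float) / (BINS - 1) * (HIGH - LOW) + LOW
    return terminate, action

# A training target is just a string, e.g. "0 128 191 128 128 128 128 255",
# so robot actions and web text share one output space.
print(action_to_text(np.array([0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 1.0])))
```

Because the targets are ordinary strings, no new output head is needed: the VLM's existing text decoder doubles as the action decoder.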
-
@demishassabis
Demis Hassabis
on x
Computers have long been great at complex tasks like analysing data, but not so great at simple tasks like recognizing & moving objects. With RT-2, we're bridging that gap by helping robots interpret & interact with the world and be more useful to people. https://www.nytimes.com/…
-
@hausman_k
Karol Hausman
on x
PaLM-E or GPT-4 can speak in many languages and understand images. What if they could speak robot actions? Introducing RT-2 (https://robotics-transformer2.github.io/): our new model that uses a VLM (up to 55B params) backbone and fine-tunes it to directly output robot actions! [video]
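The closed control loop this implies is short; a hypothetical sketch, where `vlm.generate`, `camera.capture`, and `robot.apply` are stand-ins for whatever fine-tuned PaLI-X/PaLM-E-style model and robot stack are actually deployed, and `text_to_action` is the inverse tokenizer sketched above:

```python
# Hypothetical closed-loop control with a fine-tuned VLM as the policy.
# None of these object interfaces are a published API.

def run_episode(vlm, camera, robot, instruction: str, max_steps: int = 100):
    """Each step: current image + instruction in, one action string out."""
    for _ in range(max_steps):
        image = camera.capture()
        # The VLM emits the action string exactly as it emits caption text.
        action_text = vlm.generate(image=image, prompt=instruction)
        terminate, action = text_to_action(action_text)
        if terminate:
            break  # the model itself signals task completion
        robot.apply(action)  # e.g. end-effector delta pose + gripper command
```

The notable design choice is that the VLM is the policy, queried once per control step, rather than a separate perception module feeding a hand-built planner.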
-
@kevinroose
Kevin Roose
on x
Google has quietly rebuilt its robotics division around LLMs — the same AIs that power Bard, ChatGPT and others. Now, if you tell a robot to “pick up the extinct animal,” it knows you're talking about a dinosaur. My column from inside the lab: https://www.nytimes.com/...
-
r/singularity
r
on reddit
Google Deepmind presents RT-2, the first vision-language-action (VLA) Robotics Transformer, and it may have drastic implications for our future.
-
r/artificial
r
on reddit
Google Deepmind presents RT-2, the first vision-language-action (VLA) Robotics Transformer, and it may have drastic implications for our future.