Chronicles

The story behind the story


Google launches RT-2, or Robotics Transformer 2, a “vision-language-action” model trained on text and images from the web that can output robotic actions

Our sneak peek into Google's new robotics model, RT-2, which melds artificial intelligence technology with robots.

New York Times · Kevin Roose

Discussion

  • @kboughida Karim B Boughida on x
    Can you imagine those in a library helping with shelf-reading, re-shelving, and doing chats! Aided by A.I. Language Models, Google's Robots Are Getting Smart https://www.nytimes.com/...
  • @googledeepmind Google DeepMind on x
    Across all categories, we saw increased generalisation performance compared to previous baselines, such as on RT-1 models. We also evaluated RT-2 on a number of unseen objects and environments where it could successfully adapt to new situations: https://dpmd.ai/...
  • @googledeepmind Google DeepMind on x
    ⚪ To explore RT-2's emergent capabilities in trials, we searched for tasks that require combining learnings from web data and the robot's experience. We then defined 3 skills it needed to show: 🔘 Symbol understanding 🔘 Reasoning 🔘 Human recognition https://dpmd.ai/... [image]
  • @glenngabe Glenn Gabe on x
    Yep, here we go... LLMs plugged into robots -> Aided by A.I. Language Models, Google's Robots Are Getting Smart “Google has recently begun plugging state-of-the-art language models into its robots, giving them the equivalent of artificial brains.” https://www.nytimes.com/... [image]
  • @googledeepmind Google DeepMind on x
    Today, we announced 𝗥𝗧-𝟮: a first of its kind vision-language-action model to control robots. 🤖 It learns from both web and robotics data and translates this knowledge into generalised instructions. Find out more: https://dpmd.ai/... [video]
  • @alex on x
    Fuck yes this is what I was hoping llms would do to robots and I was (happily) years late to the idea https://www.nytimes.com/...
  • @kimzetter Kim Zetter on x
    A one-armed robot stood in front of a table with three plastic figurines on top of it: a lion, a whale and a dinosaur. Engineer: “Pick up the extinct animal.” The robot whirred for a moment, then its arm extended and its claw opened and descended. It grabbed the dinosaur.
  • @imordatch Igor Mordatch on x
    Excited to finally share what we've been working on for the past while: combining vision, language, and action modalities for robot control! https://www.nytimes.com/...
  • @lesaunh on x
    Too few have understood that AI advances are mapping onto robots. Yes, AI is going to impact cognitive labor first, but the robots will flood the physical labor market soon after. [image]
  • @ajddavison Andrew Davison on x
    Large language models and web-scale data have some use in robotics as a user interface as nicely demonstrated here, but in my opinion they are not what we need to help with perception, object representation and precise planning which are the real current barriers in robotics.
  • @xiao_ted Ted Xiao on x
    Introducing RT-2, representing the culmination of two trends: - Tokenize and train everything together: web-scale text, images, and robot data - VLMs are not just representations, big VLMs *are policies* Sounds subtle, but we'll look back on this as an inflection point! 📈
  • @hausman_k Karol Hausman on x
    Multiple conclusions from these experiments: (1) it turns out with a little bit of robot data we can transfer semantic concepts from vision language web-scale data to robot actions (2) the best VLMs might be the most generalizable robotic controllers
  • @chelseabfinn Chelsea Finn on x
    Vision-language ➡️ vision-language-action model. By using a pre-trained VLM (e.g. PaLI-X), RT-2 enables robots to generalize to new objects & instructions. RT-2 also shows basic reasoning capabilities (e.g. “place orange in matching bowl”). Paper+videos: https://robotics-transforme…
  • @svlevine Sergey Levine on x
    Turns out that vision-language models can control robots too. The secret is to just finetune them to print out the actions (literally, as text). Really excited about our new result, the successor to RT-1. RT-2 is a pre-trained VLM: https://www.deepmind.com/... Short 🧵👇 [video]
  • @demishassabis Demis Hassabis on x
    Computers have long been great at complex tasks like analysing data, but not so great at simple tasks like recognizing & moving objects. With RT-2, we're bridging that gap by helping robots interpret & interact with the world and be more useful to people. https://www.nytimes.com/…
  • @hausman_k Karol Hausman on x
    PaLM-E or GPT-4 can speak in many languages and understand images. What if they could speak robot actions? Introducing RT-2: https://robotics-transformer2.github.io, our new model that uses a VLM (up to 55B params) backbone and fine-tunes it to directly output robot actions! [video]
  • @kevinroose Kevin Roose on x
    Google has quietly rebuilt its robotics division around LLMs — the same AIs that power Bard, ChatGPT and others. Now, if you tell a robot to “pick up the extinct animal,” it knows you're talking about a dinosaur. My column from inside the lab: https://www.nytimes.com/...
  • r/singularity and r/artificial on reddit
    Google DeepMind presents RT-2, the first vision-language-action (VLA) Robotics Transformer, and it may have drastic implications for our future.
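
Several of the posts above (for example @svlevine's and @hausman_k's) describe RT-2's core mechanism: a pre-trained vision-language model is fine-tuned to emit robot actions as ordinary text, with each action dimension discretized into bins, so the model's existing text decoder doubles as the robot policy. Below is a minimal sketch of that action tokenization; the 256-bin discretization follows the RT-2 paper, but the function names and the 7-dimensional action layout are illustrative assumptions, not released RT-2 code.

import numpy as np

NUM_BINS = 256  # RT-2 discretizes each action dimension into 256 bins

def encode_action(action, low, high):
    # Clip each dimension to [low, high], quantize into NUM_BINS uniform
    # bins, and write the bin indices out as plain text -- the same token
    # space the VLM already uses for language.
    action = np.clip(action, low, high)
    bins = np.round((action - low) / (high - low) * (NUM_BINS - 1))
    return " ".join(str(int(b)) for b in bins)

def decode_action(token_str, low, high):
    # Invert encode_action: parse the model's text output back into a
    # continuous action vector (up to quantization error).
    bins = np.array([int(t) for t in token_str.split()])
    return low + bins / (NUM_BINS - 1) * (high - low)

# Illustrative 7-dimensional action: xyz translation, rpy rotation, gripper
low, high = np.full(7, -1.0), np.full(7, 1.0)
a = np.array([0.1, -0.2, 0.05, 0.0, 0.3, -0.1, 1.0])
text = encode_action(a, low, high)
print(text)                            # e.g. "140 102 134 128 166 115 255"
print(decode_action(text, low, high))  # approximately recovers `a`

The payoff of this representation, as @xiao_ted's post puts it, is that no separate action head is needed: the fine-tuned VLM itself is the policy, and what it learned from web-scale text and images can surface directly in the action strings it prints.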