TEXXR

Chronicles

The story behind the story


DeepMind says video models like Veo 3 could become general purpose foundation models for vision, like LLMs for text, using zero-shot “chain-of-frames” reasoning

Video models are zero-shot learners and reasoners. Fascinating new paper from Google DeepMind which makes …

Simon Willison's Weblog Simon Willison

Discussion

  • @priyankjaini Priyank Jaini on x
    Could video models be the path to general visual intelligence? In our new paper, we show that Veo3 has emergent zero-shot capabilities, solving complex tasks across the vision stack. Project page: https://video-zero-shot.github.io/ Paper: https://arxiv.org/... 🧵👇🏻
  • @thwiedemer Thaddäus Wiedemer on x
    Are we experiencing a ‘GPT moment’ in vision? Super excited to demonstrate the generality with which current video models can solve tasks from simple perception to visual reasoning! 🌐 https://video-zero-shot.github.io/
  • @simonwillison.net Simon Willison on bluesky
    Made some notes on the new DeepMind paper “Video models are zero-shot learners and reasoners” - it makes a convincing case that generative video models are to vision problems what LLMs were to NLP problems: single models that can solve a wide array of challenges simonwillison.net…
  • r/singularity on reddit
    Video models are zero-shot learners and reasoners
  • r/singularity on reddit
    Google's Veo 3 Demonstrates Chain-of-Frames behavior (like Chain-of-thought but for image frames). Could diffusion models be the path …