Chronicles

The story behind the story

Investigation: Apple, Nvidia, Anthropic, and others trained their AI on a dataset containing YouTube video transcripts, including from the WSJ, MrBeast, and MIT

Creators claim their videos were used without their knowledge  —  AI companies are generally secretive about their sources of training data …

Proof

Discussion

  • @mkbhd Marques Brownlee on x
    Apple has sourced data for their AI from several companies One of them scraped tons of data/transcripts from YouTube videos, including mine Apple technically avoids “fault” here because they're not the ones scraping But this is going to be an evolving problem for a long time
  • @mkbhd Marques Brownlee on x
    Fun fact, I pay a service (by the minute) for more accurate transcriptions of my own videos, which I then upload to YouTube's back-end. So companies that scrape transcripts are stealing *paid* work in more than one way. Not great.
  • @mysk_co @mysk_co on x
    It's not that tough to win a privacy competition against @googlechrome. This is how Safari's Privacy Nutrition Label compares to another browser such as @brave: [image]
  • @sarafischer Sara Fischer on x
    NEW: @apple has brought on @taboola to sell ads on its behalf across Apple News and Stocks apps - Taboola will be the exclusive ad seller globally where the apps are available. - NBCU will continue to sell ads in some markets https://www.axios.com/...
  • @leonieclaude Roberta Fischli on x
    “Our investigation found that subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by Silicon Valley heavyweights, including Anthropic, Nvidia, Apple, and Salesforce.” @WIRED @proof__news https://www.wired.com/...
  • @dwiskus Dave Wiskus on x
    Proof News has a great video exposé — yay real video journalism — on creator work being stolen from YouTube to power machines designed to (poorly) replace us https://www.youtube.com/...
  • @neilturkewitz Neil Turkewitz on x
    “It's ‘disrespectful’ to use creators' work without their consent, especially since studios may use ‘GenAI to replace as many of the artists along the way as they can...Will this be used to exploit & harm artists? Yes, absolutely.’” —⁦@dwiskus⁩ https://www.wired.com/...
  • @proof__news @proof__news on x
    Our latest investigation reveals a dataset of more than 170,000 YouTube video subtitles that big tech companies used to train their AI models. “Will this be used to exploit and harm artists? Yes, absolutely,” says @dwiskus. https://www.proofnews.org/...
  • @neilturkewitz Neil Turkewitz on x
    This is hysterical. “Among the videos used by AI companies are 146 from Einstein Parrot.” It's a channel featuring an actual parrot. Life imitates stochastic parrots! We have come full circle. @timnitGebru @emilymbender @mmitchell_ai
  • @_felixsimon_ Felix M. Simon on x
    Political economy of AI and news, latest: ... “Proof News found some of the wealthiest AI companies [...] have used material from thousands of YouTube videos to train AI. Companies did so despite YouTube's rules against harvesting materials from the platform without permission.”
  • @sophiasgaler Sophia Smith Galer on x
    This is extremely depressing.
  • @fb_bmb Mathew Buck on x
    Just discovered at least one of my videos has been used to train AI. Needless to say, I did not consent to this and was not aware of this until now. [image]
  • @juliaangwin Julia Angwin on x
    Huge investigation from @proof__news today: We reveal the trove of YouTube videos that are being used to train AI models (including Anthropic's Claude). Yes, it includes all your favorite YouTubers - from @hankgreen to @MrBeast to @khanacademy. https://www.proofnews.org/...
  • @_felixsimon_ Felix M. Simon on x
    Using @Proof__news tool to search the data set, it's easy find some well-known publishers swept up in this, too, with both @FT & @guardian videos (including @johnharris1969) part of the dataset. [image]
  • r/youtubedrama on reddit
    Several YouTubers Had Their Videos Scraped to Train AI Tools for Apple, Nvidia, and Others
  • r/technology on reddit
    Apple, Nvidia, Anthropic Used Thousands (173,536) of Swiped YouTube Videos to Train AI
  • r/technews on reddit
    Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI