2024-11-27
I've removed the Bluesky data from the repo. While I wanted to support tool development for the platform, I recognize this approach violated principles of transparency and consent in data collection. I apologize for this mistake.
404 Media
Bluesky says it's up to outside organizations to respect user consent, after a Hugging Face employee posted a 1M-post dataset from Bluesky's API for ML research
Update: Following the publication of this article on Tuesday evening, van Strien removed the dataset.
First dataset for the new @huggingface.bsky.social @bsky.app community organisation: one-million-bluesky-posts 🦋 — 📊 1M public posts from Bluesky's firehose API — 🔍 Includes text, metadata, and language predictions — 🔬 Perfect to experiment with using ML for Bluesky 🤗 …