A look at the challenges some AI developers face in building models that extract trillions of high-quality training tokens from PDFs, a format that is notoriously hard to parse
Uh, just going to point out that there's been zero progress on AI building “complex software,” and AI cannot, and will not, solve “advanced physics problems.” — These claims are lies, plain and simple.
Despite rapid progress in AI's ability to build complex software and solve advanced physics problems, the ubiquitous format of PDF remains something of a grand challenge. — Read more from @joshdzieza.bsky.social: www.theverge.com/ai-artificia...
The bill for storing so much information online in formats designed for printing rather than in HTML was always going to come due, although I confess this wasn't how I imagined it.