Mistral launches Mistral OCR 3, featuring improvements in processing forms, scanned documents, complex tables, and handwriting, priced at $2 per 1,000 pages
Key Highlights from this release: … Bluesky: Jay Cuthrell / @cuthrell.com : 🤯 Only ~25 years ago... fond memories of massive industrial high speed paper medical record scanning operations in hospitals...
DeepSeek releases DeepSeek-OCR, a vision language model designed for efficient vision-text compression, enabling longer contexts with less compute
the new frontier of OCR from @deepseek_ai, exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G), powered by vllm==0.8.5 for day-0 model su...
Reducto, which uses OCR with vision language models to convert complex documents into inputs for LLMs, raised a $75M Series B led by a16z at a $600M valuation
Stephanie Palazzolo / The Information :
Google DeepMind says Gemini Diffusion, an experimental text diffusion model demoed at Google I/O and available by waitlist, generates 1,000-2,000 tokens/second
Our state-of-the-art, experimental text diffusion model Jose Antonio Lanz / Decrypt : Google Doubles Down on AI: Veo 3, Imagen 4 and Gemini Diffusion Push Creative Boundaries Matthias Bastian / The De...
Mistral launches Mistral OCR, a multimodal API that uses optical character recognition to turn complex PDF documents into Markdown files ready for LLM training
It's available via their API, or it's “available to self-host on a selective basis” … Diya Lal / Tech in Asia : Mistral launches OCR tool for fast document processing Carl Franzen / VentureBeat : Mist...
Kaspersky researchers found apps in Google's Play Store and Apple's App Store that use OCR to steal crypto wallet recovery phrases from images on users' devices
Android and iOS apps on the Google Play Store and Apple App Store contain a malicious software development kit (SDK) …
After Microsoft eroded Windows users' trust with bad practices for years, Recall is a PR disaster, as users remain skeptical despite the company's assurances
inside the Copilot+ Recall disaster. Andrew Cunningham / Ars Technica : Windows Recall demands an extraordinary level of trust that Microsoft hasn't earned Alex / xaitax on GitHub : TotalRecall - a ‘p...
A look at the privacy and security concerns surrounding Microsoft's Recall, which will record everything users do in Windows for up to three months by default
a ‘privacy nightmare’? Mayank Parmar / Windows Latest : Hands on with Windows 11 Recall AI: Snappy performance, works without internet Iain Thomson / The Register : Was there no one at Microsoft who l...
OpenAI updates ChatGPT Plus and ChatGPT Enterprise to let users prompt the tool using voice commands or by uploading an image, coming to all users “soon after”
and Look Into Your Life Kyle Wiggers / TechCrunch : OpenAI's GPT-4 with vision still has flaws, paper reveals The Hill : ChatGPT given the ability to talk Laurent Giret / Thurrott : ChatGPT Can Now Ta...
Sources: Microsoft is experimenting with bringing new AI capabilities to Windows 11 apps, like generating a canvas from text in Paint and OCR in Snipping Tool
Zac Bowden / Windows Central :