Mistral launches Mistral OCR 3, featuring improvements in processing forms, scanned documents, complex tables, and handwriting, priced at $2 per 1,000 pages
Key Highlights from this release: …
Bluesky: Jay Cuthrell / @cuthrell.com : 🤯 Only ~25 years ago... fond memories of massive industrial high speed paper medical record scanning operations in hospitals...
DeepSeek releases DeepSeek-OCR, a vision language model designed for efficient vision-text compression, enabling longer contexts with less compute
the new frontier of OCR from @deepseek_ai, exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G), powered by vllm==0.8.5 for day-0 model su...
Google DeepMind says Gemini Diffusion, an experimental text diffusion model demoed at Google I/O and available by waitlist, generates 1,000-2,000 tokens/second
Our state-of-the-art, experimental text diffusion model
Jose Antonio Lanz / Decrypt : Google Doubles Down on AI: Veo 3, Imagen 4 and Gemini Diffusion Push Creative Boundaries
Matthias Bastian / The De...
Mistral launches Mistral OCR, a multimodal API that uses optical character recognition to turn complex PDF documents into Markdown files ready for LLM training
It's available via their API, or it's “available to self-host on a selective basis” …
Diya Lal / Tech in Asia : Mistral launches OCR tool for fast document processing
Carl Franzen / VentureBeat : Mist...
OpenAI updates ChatGPT Plus and ChatGPT Enterprise to let users prompt the tool using voice commands or by uploading an image, coming to all users “soon after”
and Look Into Your Life
Kyle Wiggers / TechCrunch : OpenAI's GPT-4 with vision still has flaws, paper reveals
The Hill : ChatGPT given the ability to talk
Laurent Giret / Thurrott : ChatGPT Can Now Ta...
Sources: Microsoft is experimenting with bringing new AI capabilities to Windows 11 apps, like generating a canvas from text in Paint and OCR in Snipping Tool
Zac Bowden / Windows Central :
Google expands Lens beyond mobile by rolling it out inside Google Photos for the web, allowing desktop users to copy text from images using OCR
Google Photos can now search for text that appears in user images with its optical character recognition filter
Google Lens has a powerful optical character recognition (OCR) filter that can pull text from any image. Available in Google Photos, the backup service this month is adding the ability to search for ...
Microsoft is bringing automatic video summarization, Hyperlapse, OCR and more to Azure Media Services
Frederic Lardinois / TechCrunch :
Microsoft Translator adds OCR capabilities to its iOS app, offline translations on Android
Lance Whitney / CNET :