Apple researchers publish a paper on Ferret-UI, a multimodal LLM tailored for enhanced understanding of mobile UI screens
this AI can read your iPhone screen

Mini Games / The Indian Express: Apple's ‘Ferret-UI' AI model could understand what's on your screen in a new way
Oliver Haslam / iMore: Apple's latest AI advancement could improve iPhone accessibility and more
Akash Pandey / NewsBytes: How Apple's new AI project could supercharge Siri
Sabrina Ortiz / ZDNet: Apple's new AI model could understand your home screen and supercharge Siri
David Snow / Cult of Mac: Apple's Ferret-UI helps AI use your iPhone
M.G. Siegler / Spyglass: Apple Ferrets Out UI for AI

Threads:
@luokai: Considering Gemini, potential collaborations, and Apple's high probability of using Baidu's ERNIE in China, I am inclined to believe that Apple will operate as follows: when it comes to any underlying operations and in-app interactions, Apple should utilize its proprietary AI. …
@luokai: 🧵2/4 Firstly, it is apparent that Apple is not fully prepared in terms of the required data accumulation for large models, and its proprietary model training began relatively late, indicating that some time may be needed to prepare for content generation quality. …
@luokai: 🧵3/4 This enables Apple to continue its consistent privacy protection measures by keeping personal information as much as possible on the device. Thirdly, from a compliance perspective, for global companies like Apple, when AI products with illusionary content generation enter different markets …
@luokai: 🧵4/4 Therefore, I believe that Apple will ultimately use its small-parameter large models for hardware-software interactions, while employing third-party large models for tasks involving internet information and content generation. Furthermore, Apple will adapt to compliance requirements by using different large models in different countries as necessary.
Mastodon:
Michael Love / @elkmovie@mastodon.social: They're going to try to train their AI on the data in our apps and there needs to be some very, very serious pushback against whatever legalese they add to the developer agreement to attempt to legitimize that. https://9to5mac.com/...

X:
@benlovejoy: The most mundane use would be to help Apple and other developers create better user interfaces, with accessibility features the next level before a highly advanced form of Siri ...
Josh Miller / @joshm: 12-24 months... "browsing for you" "coding for you" "writing for you" "editing for you" ... to beat Apple, Google, & Microsoft until too late — this is gonna be fun !!! [image]
@_akhaliq: Apple presents Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs. "Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with …" [image]
Matthew Cassinelli / @mattcassinelli: Apple is straight fucking with me by putting Shortcuts in this graphic https://arxiv.org/... [image]
Parker Ortolani / @parkerortolani: holy smokes, iOS is about to get SUPERCHARGED
Nick Dobos / @nickadobos: Apple is publishing some crazy cool papers. This could unlock 2 major things:
- action models: chat -> tapping on buttons and doing stuff
- code-writing loops to see if code generation correctly made the UI
@dylanmcd8: WWDC is gonna be absolutely insane isn't it
Camden Ko / @camdenko: @_akhaliq wouldn't be surprised if this turns into an accessibility setting for people new to tech. truly underrated how much work apple puts into making the products usable by everyone
An Tran / @antranapp: 📣 BREAKING: SwiftUI code generation from screenshots coming soon in the next Xcode version 🚀 Seriously, I can't wait to see what Apple'll announce at the next WWDC 🤠
Viet / @vietdle: @_akhaliq This is awesome, but if you're Apple and pretty much own all layers of the application, OS, and hardware, why would you go with an image/screenshot-first approach here?
Harris Rothaermel / @developerharris: looking forward to a massively upgraded Siri
Robert Scoble / @scobleizer: Someone responded to me the other day that the new Siri isn't multimodal. That just isn't a rational statement to me. EVERYTHING will soon be multimodal. (Which means your camera, sensors, and microphone can be listened to by your AI.) Siri will soon know what you are looking at, holding, touching, gesturing toward, moving toward or away from, etc. At least if you have a Vision Pro. OpenAI can't do that. June's WWDC is gonna be so interesting.

Forums:
r/apple: Apple teaching an AI system to use apps; maybe for advanced Siri