- Apple researchers have developed a new AI system that can “see” and interpret context from on-screen content.
- The system, dubbed “reference resolution as language modeling,” enables more natural interactions with AI.
- The researchers behind ReaLM say it outperforms OpenAI’s GPT-4 at understanding context.
Apple’s new AI developments are aimed at competing with OpenAI’s GPT models and could make interacting with virtual assistants like Siri more intuitive.
The new system, ReaLM, stands for “Reference Resolution As Language Modeling.” It understands the context of ambiguous on-screen images, content, and conversations, enabling more natural interactions with AI.
According to the researchers who created it, the new Apple system is better than other large language models, such as GPT-4, at determining what contextual and linguistic expressions refer to. It is also less complex than models like OpenAI’s GPT series, which is why the researchers believe ReaLM is an “ideal choice” for a context-decoding system that “can reside on-device without sacrificing performance.”
For example, let’s say you ask Siri to display a list of local pharmacies. Once the list appears, you could ask it to “call the one on Rainbow Road” or “call the one at the bottom.” According to the researchers who created the system, ReaLM lets Siri decipher the context needed to carry out such requests better than GPT-4 does, rather than returning an error message asking for more information.
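The core idea, as the article describes it, is to turn reference resolution into a text-only language-modeling problem by serializing what is on screen into a prompt. The sketch below is a rough illustration only: the entity format, ID tags, and prompt wording are hypothetical and do not come from Apple’s paper; they simply show how the pharmacy example might be encoded for a language model.

```python
# Hypothetical sketch of "reference resolution as language modeling":
# on-screen entities are serialized into tagged plain text, and a
# language model is asked which entity the user's utterance refers to.
# The names and formats here are illustrative, not Apple's actual design.

from dataclasses import dataclass


@dataclass
class OnScreenEntity:
    entity_id: int
    text: str        # the visible text of the UI element
    y_position: int  # vertical position, so "the one at the bottom" resolves


def build_prompt(entities: list[OnScreenEntity], utterance: str) -> str:
    """Serialize the screen into a tagged, top-to-bottom text listing."""
    lines = [
        f"[{e.entity_id}] {e.text}"
        for e in sorted(entities, key=lambda e: e.y_position)
    ]
    screen = "\n".join(lines)
    return (
        "Screen (top to bottom):\n"
        f"{screen}\n"
        f'User says: "{utterance}"\n'
        "Which entity ID does the user mean?"
    )


if __name__ == "__main__":
    pharmacies = [
        OnScreenEntity(1, "CVS Pharmacy - 12 Main St - (555) 010-1000", 100),
        OnScreenEntity(2, "Walgreens - 99 Rainbow Rd - (555) 010-2000", 200),
        OnScreenEntity(3, "Rite Aid - 7 Elm Ave - (555) 010-3000", 300),
    ]
    # The serialized prompt would be handed to an on-device language model,
    # which would reply with an entity ID ("2" for the Rainbow Road request).
    print(build_prompt(pharmacies, "call the one on Rainbow Road"))
```

Because the screen is reduced to plain text like this, an assistant would not need a large multimodal model to resolve “the one at the bottom”; a smaller, on-device language model could do the job, which is the efficiency argument the researchers make.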
“Human speech typically contains vague references such as ‘they’ and ‘that,’ whose meaning is clear (to other humans) given the context,” the researchers wrote of ReaLM’s capabilities. “Being able to understand the context containing such references is essential for conversational assistants that aim to help users naturally communicate their requirements to, and converse with, agents.”
The ReaLM system can also interpret images embedded in text, and the researchers say it could be used to extract information such as phone numbers or recipes from images on a page.
OpenAI’s GPT-3.5 accepts only text input. GPT-4, which can also contextualize images, is a large-scale system trained primarily on natural, real-world images rather than screenshots. Apple’s researchers say this hinders its practical performance on on-screen information and makes ReaLM the better option for understanding it.
“Apple has long been seen as lagging behind Microsoft, Google and Amazon in developing conversational AI,” The Information reported. “The iPhone maker has a reputation for carefully and methodically developing new products. This strategy has worked well for gaining consumer trust, but it could undermine Apple in the fast-paced AI race.”
But with ReaLM’s capabilities now teased, it appears Apple is preparing to enter the race in earnest.
The researchers behind ReaLM and representatives from OpenAI did not immediately respond to requests for comment from Business Insider.
It remains unclear when, or if, ReaLM will be implemented in Siri and other Apple products, but CEO Tim Cook said on a recent earnings call that the company is excited to share details of its ongoing AI efforts “later this year.”