In an exciting announcement, Andreas Braun, CTO of Microsoft Germany, has confirmed that the eagerly anticipated GPT-4 from OpenAI will arrive in the second week of March 2023. According to Braun, GPT-4 will be a multimodal AI, meaning that it will be able to handle various kinds of input, including video, sound, images, and text.
Multimodal Large Language Models Explained
The big news here is that GPT-4 will be multimodal, as SEJ previously predicted in January 2023. In the context of large language models, “modality” refers to the type of input they can handle. Multimodal models can process various kinds of input, including text, speech, images, and video.
Previous versions of GPT, such as GPT-3 and GPT-3.5, were restricted to a single modality: text. According to a German news report, however, GPT-4 will be able to operate across at least four modalities: text, images, sound, and video. The report did not specify whether this description of multimodality applied to GPT-4 in particular or to multimodal models in general.
While Microsoft's Director of Business Strategy, Holger Kenn, explained multimodality, it was unclear whether he was referring to GPT-4 or to multimodality in general. However, I believe his references were specific to GPT-4.
Another interesting aspect of GPT-4 is that Microsoft is working on “confidence metrics” to ground the AI in facts, making it more reliable.
Microsoft Kosmos-1: Another Multimodal Language Model
Microsoft has already released another multimodal language model, Kosmos-1, which integrates the text and image modalities. GPT-4 reportedly goes further, adding video as a third modality and, it appears, sound as well.
Works Across Multiple Languages
GPT-4 appears to work across languages: it is said to be able to receive a question in one language and answer it in another. For instance, it could take a question in German and answer it in Italian. While that might seem like an unusual use case, it’s an exciting feature that showcases the capabilities of the new language model.
At present, there is no information on where GPT-4 will be deployed, although Azure OpenAI was specifically mentioned. Google is currently trying to integrate a competing technology into its search engine but appears to be struggling to catch up with Microsoft. Microsoft's more visible implementation of GPT-4 is capturing the attention, reinforcing the perception that Google is falling behind and lacks leadership in consumer-facing AI.
GPT-4 from OpenAI is set to make waves in the AI industry. Its multimodal capabilities, coupled with its ability to work across languages, could make it a game-changer. While we don’t yet know all the details, early reports suggest that GPT-4 will be a significant step forward for large language models.
Read the original German reporting here: