As the fall season approaches, a fierce competition is unfolding between tech giants Google and OpenAI in the realm of artificial intelligence. The battleground: the launch of the next generation of large language models, so-called ‘multimodal’ models. These models can work with both images and text, a capability poised to transform applications ranging from website design to data analysis.
Google is advancing swiftly. The company has already shared a sneak peek of its upcoming Gemini multimodal large language model (LLM) with a select group of external companies, as reported recently. OpenAI, however, is determined to get there first. The Microsoft-backed startup is racing to equip GPT-4, its most advanced LLM, with multimodal capabilities comparable to those Gemini promises to deliver, according to insider information.
OpenAI offered a glimpse of these multimodal features when it unveiled GPT-4 back in March, but it restricted access to a single partner, Be My Eyes, which builds technology to assist people with visual impairments. Six months on, OpenAI is preparing a wider release of these features, collectively known as ‘GPT-Vision.’
Multimodal AI: A Game-Changer
Multimodal AI represents a significant leap in the capabilities of artificial intelligence systems. Until now, large language models have primarily processed and generated text. With multimodal models, AI can comprehend and generate content that combines text and images.
For instance, a user could sketch a rough concept of a website layout and have the model translate that sketch directly into working code. Or, when faced with a complex chart or graph, the model could produce a textual analysis, removing the need to consult an engineer or data analyst. The example below illustrates how such a request might look in practice.
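Here is a minimal, illustrative sketch of the first scenario, assuming GPT-Vision is exposed through OpenAI’s existing chat completions API in the same way text-only GPT-4 is today. The model identifier below is a placeholder, and the exact request shape could differ once the feature is broadly released.

```python
import base64
from openai import OpenAI  # assumes the official openai Python SDK (v1+) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Load a hand-drawn layout sketch (or a chart screenshot) and base64-encode it
# so it can be sent inline alongside a text instruction.
with open("layout_sketch.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    # Placeholder model name -- the identifier OpenAI ships "GPT-Vision" under may differ.
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Turn this rough layout sketch into a minimal HTML/CSS page.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=800,
)

print(response.choices[0].message.content)  # the generated HTML/CSS
```

The same pattern would cover the chart-analysis case: swap the local sketch for a screenshot of the chart (or a public image URL) and change the text instruction to ask for a written summary of the data.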
OpenAI’s Ambitious Move
OpenAI’s plan to roll out GPT-Vision more broadly underscores its determination to stay at the forefront of AI innovation. By adding multimodal capabilities to its flagship language model, OpenAI aims to serve a wide range of industries and applications, from web development to data interpretation.
The contest between Google and OpenAI is not only about technological supremacy; it is also a race to unlock the immense potential of multimodal AI for businesses and society at large. As these two tech giants sprint towards the finish line, the world eagerly awaits the dawn of a new era in artificial intelligence, one where the boundaries between text and images blur and the possibilities become boundless.