The history of artificial intelligence has been punctuated by periods of so-called “AI winter,” when the technology seemed to meet a dead end and funding dried up. Each one has been accompanied by proclamations that making machines truly intelligent is just too darned hard for humans to figure out.
Google’s release of Gemini, claimed to be a fundamentally new kind of AI model and the company’s most powerful to date, suggests that a new AI winter isn’t coming anytime soon. In fact, although the 12 months since ChatGPT launched have been a banner year for AI, there is good reason to think that the current AI boom is only getting started.
OpenAI didn’t have high expectations when it launched the “low key research preview” called ChatGPT in November 2022. It was simply a test of a new interface for its text-generating large language models (LLMs). But the chatbot’s ability to do such a wide range of things, from synthesizing essays and poetry to answering coding problems, impressed and unnerved many people and set the tech industry aflame. When OpenAI added its new GPT-4 LLM to ChatGPT, some experts were so freaked out that they begged the company to slow down.
Evidence was already scant that anyone heeded that alarm call. It’s inconceivable now that Google has upped the ante—and also perhaps changed the rules of the game—by announcing Gemini.
Google had already rushed out a direct response to ChatGPT in the form of Bard earlier this year, finally launching LLM chatbot technology that it had developed earlier than OpenAI but chosen to keep private. With Gemini it claims to have opened a new era that goes beyond LLMs primarily anchored to text—potentially setting the stage for a new round of AI products significantly different from those enabled by ChatGPT.
Google calls Gemini a “natively multimodal” model, meaning it can learn from data beyond just text, also slurping up insights from audio, video, and images. ChatGPT shows how AI models can learn an impressive amount about the world if provided enough text. And some AI researchers have argued that simply making language models bigger would increase their capabilities to the point of rivaling those of humans.
But there’s only so much you can learn about physical reality through the filter of text that humans have written about it, and the hard-to-eradicate limitations of LLMs like GPT-4—such as hallucinating information, poor reasoning, and their weird security flaws—seem to suggest that scaling existing technology has its limits.
Read the full article here