Google’s Gemini AI model has only been public for two months, but the company is already rolling out an upgrade. Gemini Pro 1.5, launching today with limited availability, is more powerful than its predecessor and can handle massive amounts of text, video, or audio input at a time.

Demis Hassabis, CEO of Google DeepMind, which developed the new model, compares its vast capacity for input to a person’s working memory, a subject he studied years ago as a neuroscientist. “The great thing about these core capabilities is that they unlock sort of ancillary things that the model can do,” he says.

In a demo, Google DeepMind showed Gemini Pro 1.5 analyzing a 402-page PDF of the Apollo 11 communications transcript. The model was asked to find the funny parts and highlighted several moments, such as when the astronauts said a communications delay was due to a sandwich break. Another demo showed the model answering questions about specific actions in a Buster Keaton movie. Previous versions of Gemini could answer such questions only for much smaller amounts of text or video. Google hopes the new capabilities will allow developers to build new types of apps on top of the model.

“It feels really magical how the model does this kind of reasoning on every single page, every single word,” says Oriol Vinyals, a research scientist at Google DeepMind.

Google says Gemini Pro 1.5 can take in and understand one hour of video, 11 hours of audio, 700,000 words, or 30,000 lines of code, many times more than other AI models, including OpenAI’s GPT-4, which powers ChatGPT. The company has not disclosed the technical details behind this feat. One use for models that can handle large amounts of text, tested by researchers at Google DeepMind, is identifying key points in Discord discussions with thousands of messages, Hassabis says.

Gemini Pro 1.5 is also more capable, at least for its size, as measured by the model’s scores on several popular benchmarks. The new model leverages a technique previously invented by Google researchers to squeeze out more performance without requiring more computing power. This technique, called mixture of experts, selectively activates the parts of a model’s architecture that are best suited to solving a particular task, making it more efficient to train and run.
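
The selective activation described above, widely known in machine learning as mixture of experts, can be illustrated with a toy example. The sketch below is not Google’s implementation, whose details are not given here. The class name TinyMoE, the layer sizes, the random weights, and the choice of top-2 routing are all assumptions made for illustration. It shows only the core idea: a small router scores every expert for a given input, and just the highest-scoring few are actually run.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class TinyMoE:
    """Toy mixture-of-experts layer (hypothetical, for illustration only)."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one weight vector per expert, used to score the input.
        self.router = [
            [rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)
        ]
        # Each "expert" is a dim x dim linear map standing in for a sub-network.
        self.experts = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def forward(self, x):
        # Score every expert, but keep only the top_k best matches.
        scores = [dot(w, x) for w in self.router]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        chosen = ranked[: self.top_k]
        weights = softmax([scores[i] for i in chosen])

        # Only the chosen experts do any work; the rest are skipped entirely.
        output = [0.0] * len(x)
        for weight, idx in zip(weights, chosen):
            expert_out = [dot(row, x) for row in self.experts[idx]]
            output = [o + weight * e for o, e in zip(output, expert_out)]
        return output, chosen


layer = TinyMoE(dim=4, num_experts=8, top_k=2)
output, chosen = layer.forward([0.5, -1.0, 0.25, 2.0])
print(f"experts used: {chosen} out of {len(layer.experts)}")
```

In this toy, six of the eight experts are never evaluated for a given input, which is where the savings in computation come from. Production systems route each token separately and train the router along with the experts, which this sketch does not attempt.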

Google says Gemini Pro 1.5, despite being a significantly smaller model, is as capable as its most powerful offering, Gemini Ultra, in many tasks. Hassabis says there’s no reason why the same techniques used to improve Gemini Pro couldn’t be used to boost Gemini Ultra.

The upgraded version of Gemini Pro will be made available to developers through AI Studio, a sandbox for testing model capabilities, and to a limited number of developers through Google’s Vertex AI cloud platform API. There is no date yet for general release.

Google is also launching new tools to help developers use Gemini in their applications, including new ways to tap into the model’s ability to parse video and audio. The company also said it is adding new Gemini-powered features to its web-based coding tool Project IDX, including ways for AI to debug and test code.

The pace of Gemini’s upgrades is a sign of the furious AI race set off by ChatGPT. Earlier this week, OpenAI announced that it is giving ChatGPT the ability to remember useful information from conversations over time. Last week, Google rebranded its chatbot Bard as Gemini and announced that Gemini Ultra would be available with a paid subscription.

The frenetic pace of progress in generative AI is at odds with concerns about the risks the technology poses. Google says it has put Gemini Pro 1.5 through extensive testing and that providing limited access offers a way to gather feedback on potential vulnerabilities. The company says it has also given researchers at the UK’s AI Safety Institute access to its most powerful models to test them.

Hassabis says more progress is expected in the coming months. “It’s a new cadence,” he says, “that I’m trying to bring, with a kind of startup mentality.”