🚨 Google Releases Gemini, Crushing GPT4

On Google Gemini, a new multimodal LLM that outperforms GPT4V on MMLU

Lior Sinclair
December 06, 2023

AlphaSignal

BREAKING
Google releases Gemini: Multimodal LLMs that Outperform GPT-4

What's New?
BIG NEWS: Google just released Gemini 1.0, its most capable and general AI model yet.

Built natively to be multimodal, it's the first step in the Gemini era of models. Gemini 1.0 comes in three sizes: Ultra, Pro, and Nano.

According to Google's own benchmark results, Gemini Ultra outperforms OpenAI's GPT-4 on 30 of 32 tests, particularly in multimodal understanding and Python code generation.

Each model targets specific applications.

The flagship model, Gemini Ultra, is designed for complex tasks in data centers and enterprise applications, harnessing the full power of Google's AI capabilities. Gemini Pro, on the other hand, serves a wider array of AI services and integrates with Google's own chatbot, Bard. It is positioned as the versatile, general-purpose model in the lineup.

The most distinctive member of the family, Gemini Nano, comes in two versions: Nano-1 with 1.8 billion parameters and Nano-2 with 3.25 billion parameters. Both are engineered for on-device operation, with a focus on performance in Android environments.
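To get a feel for why these parameter counts matter on a phone, here is a rough back-of-the-envelope sketch of how much storage the weights alone would need. The parameter counts are the ones quoted above; the bit widths (16, 8, and 4 bits per weight) and the helper names are assumptions chosen for illustration, not figures from Google, and the estimate ignores activations, caches, and runtime overhead.

```python
# Back-of-the-envelope weight-storage estimate for the two Nano sizes.
# Parameter counts come from the article; the bit widths are assumptions
# chosen for illustration and are not confirmed deployment settings.

NANO_PARAMS = {
    "Nano-1": 1.8e9,   # 1.8 billion parameters
    "Nano-2": 3.25e9,  # 3.25 billion parameters
}


def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def footprint_table(bit_widths=(16, 8, 4)) -> dict:
    """Map each model name to {bits_per_weight: size_in_gb}."""
    return {
        name: {bits: round(weight_footprint_gb(n, bits), 3) for bits in bit_widths}
        for name, n in NANO_PARAMS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        cells = ", ".join(f"{bits}-bit: {gb} GB" for bits, gb in row.items())
        print(f"{name}: {cells}")
```

Under these assumptions, halving the bits per weight halves the storage needed, which is one reason small, heavily compressed models are the usual choice for on-device use.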

For coding, Google also introduced AlphaCode 2, a code-generating system built on Gemini that shows the model's proficiency in understanding and creating high-quality code in various languages.

At the heart of the Gemini models is an architecture built upon enhanced Transformer decoders, specifically tailored for Google's own Tensor Processing Units (TPUs). This synergy between hardware and software enables the models to achieve efficient training and inference processes, setting them apart in terms of speed and cost-effectiveness compared to previous iterations like PaLM.

A key feature of the Gemini suite is its natively multimodal nature. Trained on a vast array of datasets including text, images, audio, and code, the models are adept at processing and generating outputs across these modalities.

This is particularly evident in their performance, as they reportedly surpass OpenAI's GPT-4 in various benchmarks, especially in multimodal understanding and Python code generation.

The version released this week, Gemini Pro, is a lighter variant of a more advanced model, Gemini Ultra, expected next year.

Main Takeaways

Performance: Gemini Ultra scores 90% on the MMLU benchmark, surpassing GPT-4 and, according to Google, human-expert performance.
Architecture: Built on enhanced Transformer decoders, trained on TPUv4 pods, and supports a context length of 32k tokens.
Variants: Available in three tailored versions — Ultra for complex tasks, Pro for scalability, and Nano for on-device efficiency.
Benchmarks: Sets new SOTA results across multimodal tasks, including image understanding and reasoning problems.
Accessibility: Gemini Pro will be available to developers through an API in Google AI Studio or Google Cloud Vertex AI starting December 13.

Opinion
Gemini Pro is now powering Bard, Google's ChatGPT rival, and promises improved abilities in reasoning and understanding. However, there's a catch: Google didn't allow independent testing of these models before their launch, leaving us to take their word for it. The Pro version will also be available for enterprise users and developers soon.

The more intriguing Gemini Ultra claims to be "natively multimodal," processing a diverse range of data including text, images, audio, and videos. This capability surpasses OpenAI's GPT-4 with Vision, but the improvements are marginal in many aspects. For instance, in some benchmarks, Gemini Ultra only slightly outperforms GPT-4.

A concerning aspect is Google's secrecy around Gemini's training data. Questions about the data's sources and creators' rights were left unanswered. This is critical, as the AI industry faces lawsuits over using copyrighted content without credit or compensation.
