👀 ChatGPT Can Now See, Hear, and Speak

On Mistral 7B, Getty's New Image Generator, Stability's new 3B LLM

AlphaSignal

Hey,

Welcome to this week's edition of AlphaSignal.

Whether you are a researcher, engineer, developer, or data scientist, our summaries ensure you're always up-to-date with the latest breakthroughs in AI.

Let's get into it!

Lior

In Today’s Email:

  • Top Releases and Announcements

  • ChatGPT Expands Its Senses: Now Supports Voice and Images as Input

  • Mistral AI Releases an Open 7B-Parameter LLM

  • Getty and NVIDIA Launch an AI-Powered Image Generator

RELEASES & ANNOUNCEMENTS

1. Microsoft is making DALL-E 3 available on Bing for everyone
OpenAI's DALL-E 3 is now available in Microsoft's Bing Creator AI suite, offering enhanced image and text interpretation features. This next-gen AI image generator outperforms competitors like Midjourney and Stable Diffusion. Initially only for OpenAI's paying customers, it's now accessible to the public via Bing.

2. Amazon Bedrock Goes Live, Amplifying AWS's Enterprise AI
AWS has launched Amazon Bedrock, a managed service for generative AI that offers pre-trained foundation models through an API. The launch also brings new Titan Embeddings for converting text into numerical vectors (useful for search and RAG), CodeWhisperer customization using private code, and generative BI authoring in QuickSight.

3. Cohere Unveils Coral Chatbot and Chat API
Cohere releases a new Chat API and Coral, a demo chatbot, enabling developers to create sophisticated, reliable conversational AI products. These innovations, with advanced features like Retrieval-Augmented Generation (RAG), aim to provide refined, customizable, and accurate conversational experiences.

4. Stability AI Releases Stable LM 3B Language Model for Portable Digital Devices
Stable LM 3B is a 3-billion-parameter language model optimized for portable devices. It's efficient, low-cost, and outperforms similar or even larger models. Suitable for tasks like writing help and coding, it can be fine-tuned. Open-source and available on Hugging Face.

5. ChatGPT Gets Access to the Entire Internet
OpenAI's ChatGPT now offers real-time web searches from credible sources with the 'Browse with Bing' feature. This update allows users to verify information with direct links, ensuring the reliability of the responses.

Automate Your Investing with Composer's AI Copilot

Building your own trading algorithms takes time to script, test, and deploy.

That changes with Composer, the automated trading platform.

  • Build your strategy faster with our GPT-4 AI assistant and a no-code editor.

  • Backtest against other stocks & ETFs

  • Start trading automatically with a click of a button.

Composer already has over $1 billion in trading volume (and an active Discord community of 3k+ traders including data scientists, researchers, and engineers).

Use code ALPHASIGNAL for a 3-week free trial.

NEWS
ChatGPT Expands Its Senses: Now Supports Voice and Images as Input

What's New?
OpenAI is releasing a new version of ChatGPT that lets users interact with the chatbot through voice and images, in contrast to the previous text-only conversations. The new features are currently rolling out to Plus and Enterprise users.

Why Does It Matter?
This upgrade offers the flexibility to use ChatGPT in multiple ways, making it more user-friendly and practical:

The voice conversation capabilities enable more natural, back-and-forth interactions. The novel visual input modality allows for more accurate and context-aware responses.

These aspects are especially beneficial for users seeking instantaneous and relevant solutions or insights.

Main Takeaways:

  • Image Understanding: Powered by multimodal GPT-3.5 and GPT-4, ChatGPT is now able to reason over photographs and documents containing text and images.

  • Voice Interaction: It can accept speech as input and speak its answer back to the user. OpenAI uses its speech recognition system, Whisper, to transcribe the voice input into text.

  • Synthetic Output Voice: ChatGPT is able to respond to speech input by generating a human-like audio response, which is supported by a new text-to-speech model.

NEWS
Mistral AI, the $113M Seed Round Startup, Releases an Open 7B-Parameter LLM

What’s New?
Mistral, a French AI startup that raised a huge seed round in June, has just released its first model, and it’s totally free to use without restrictions.

Their new LLM, Mistral 7B, already excels at language tasks and coding, outperforms Meta’s Llama 2 13B, and is well suited to a wide range of enterprise applications.

Why Does It Matter?
Released under the Apache 2.0 license, Mistral 7B stands out for its remarkable flexibility and adaptability, offering optimized functionalities such as low-latency text summarization and classification. These aspects make it an invaluable asset for those seeking advanced and unrestricted enterprise-grade LLM solutions.

Features
• Released under the Apache 2.0 license
• Superior to LLaMA 1 34B in code, math, and reasoning
• Approaches CodeLlama 7B performance on code

Usability
• Usable anywhere (even locally)
• Deployable on any cloud (AWS/GCP/Azure)
• Usable on HuggingFace

Architecture
• Uses Grouped-query attention (GQA) for faster inference
• Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost
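
To make the two architecture bullets concrete, here is a minimal pure-Python sketch of the ideas behind them. It is an illustration under stated assumptions, not Mistral's implementation: the function names, the head counts, and the window size are all hypothetical, chosen small enough to check by hand. In GQA, several query heads share one key/value head, which shrinks the key/value cache; in SWA, each token attends only to a fixed number of recent tokens instead of the whole prefix.

```python
from typing import List


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: return the KV head shared by a query head.

    Query heads are split into n_kv_heads equal groups; every query head
    in a group reads the same key/value head.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Causal sliding-window mask.

    mask[i][j] is True when position i may attend to position j, i.e.
    j is not in the future and lies within the last `window` tokens.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask: List[List[bool]]) -> int:
    """Count the (query, key) pairs the mask allows: a rough cost proxy."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    # Hypothetical sizes: 8 query heads sharing 2 KV heads.
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])

    # Hypothetical sizes: 6 tokens, window of 3, versus full causal attention.
    print(attended_pairs(sliding_window_mask(6, 3)))
    print(attended_pairs(sliding_window_mask(6, 6)))
```

With these toy sizes, the windowed mask allows 15 query/key pairs against 21 for full causal attention. The gap widens as sequences grow, because the windowed cost scales with sequence length times window size rather than with the square of the sequence length.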

NEWS
Getty and NVIDIA Launch an AI-Powered Image Generator

What's New?
Getty Images, in partnership with NVIDIA, has introduced a generative AI tool that creates images using a model trained on Getty’s licensed photos, giving users full indemnification and legal protection for the images they publish.

Why Does It Matter?
Addressing concerns in the generative AI space, this tool ensures responsible and ethical creation of AI-generated content. It’s designed to be commercially safer, offering users visuals under a standard royalty-free license, and establishes robust safeguards against misuse and replication of unique styles, ensuring fair compensation to contributors.

Main Takeaways

  • Responsible Creation: The tool is developed with intentional safeguards and limitations to prevent misuse and ensure ethical content creation.

  • Legal Protection: Users receive full indemnification, with generated images being legally safe for commercial publication.

  • Enhanced Realism: The tool excels in rendering realistic human figures and detailed images, outperforming competitors in realism tests.

  • Fair Compensation and Revenue Sharing: Contributors whose works were used to train the model will be compensated fairly, with Getty sharing revenues generated by the tool.

Igor Tica is a writer at AlphaSignal and a Research Engineer at SmartCat whose main expertise is in Computer Vision. Igor is passionate about contributing to the field and is seeking research collaborations that span self-supervised and contrastive learning.