🎶 Meta's New (and powerful) Text-To-Music Model

On AlphaDev, MusicGen, Google's new platform, RedPajama, and Runway Gen-2

Lior Sinclair
June 13, 2023

AlphaSignal

Hey,

Welcome to this week's edition of AlphaSignal, the newsletter for AI experts.

Whether you are a researcher, engineer, developer, or data scientist, our summaries ensure you're always up-to-date with the latest breakthroughs in AI.

Let's get into it!

Lior

In Today's Summary:

Top Releases and Announcements in AI
DeepMind's AlphaDev Discovers Faster Sorting Algorithm
Meta's AI MusicGen: Create New Songs from Text and Melody Prompts

📑 RELEASES & ANNOUNCEMENTS

1. Google has opened its Generative AI Platform to everyone
Developers can now use the text model powered by PaLM 2, the Embeddings API for text, and several other foundation models available in the Model Garden.

2. Stability Launches Uncrop: The Ultimate Aspect Ratio Editor
This AI-based tool lets you effortlessly adjust any image's ratio by creating an expanded background. Leveraging Stability AI's Stable Diffusion XL model, Uncrop reimagines and broadens your images, adjusting their dimensions as needed.

3. The RedPajama project releases RedPajama-INCITE-7B-Instruct
This model is the top-performing open-source entry on the HELM benchmarks, surpassing other leading open models such as LLaMA-7B, Falcon-7B, and MPT-7B. The instruct-tuned model is designed for versatility and performs especially well on few-shot tasks.

4. Runway's Gen-2 text-to-video tool is available to everyone for free
The tool creates 4-second MP4 videos from an input prompt. It can also generate short video sequences from an image, or from an image combined with a text description.

5. OpenAI CEO calls for collaboration with China to counter AI risks
Sam Altman highlighted the need for US-China collaboration to tackle AI risks in a warmly received video speech at a Beijing conference, stressing cross-border research ties amid intensifying US-China tech rivalry.

Magically create video documentation with AI.

guidde AI is a GPT-powered tool that creates video documentation 11x faster. Simply click capture on our browser extension and we will automatically generate step-by-step video guides complete with visuals, voiceover, and calls to action. Share or embed your guidde anywhere with our smart copy and turn your boring documentation into stunning visual guides.

NEWS
DeepMind's AlphaDev Discovers Faster Sorting Algorithm

DeepMind has made significant strides in algorithm optimization with AlphaDev, a deep reinforcement learning agent capable of generating faster sorting algorithms from scratch. As computational demand skyrockets, optimizing fundamental algorithms such as sorting and hashing is more crucial than ever. Traditional methods and human expertise, however, have struggled to make these algorithms any more efficient.

AlphaDev is trained to navigate vast search spaces and discovers sorting algorithms that surpass existing human-written ones. The agent frames the problem as a single-player game, which lets it uncover previously unexplored routines. In this game, called AssemblyGame, it selects low-level CPU instructions one at a time to build an efficient algorithm.

Remarkably, AlphaDev generated small sorting algorithms that outperformed benchmarks set by human specialists. These algorithms have been integrated into the sorting routines of LLVM's libc++ standard library, the first time an algorithm discovered through reinforcement learning has been added to that library. AlphaDev's approach extends beyond sorting and shows promise on a wider range of problems.
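
To make "small sorting algorithms" concrete, here is a minimal Python sketch of a fixed-size sort built from compare-exchange steps, plus a brute-force correctness check of the kind a search procedure could use to score a candidate. This is an illustration only and not AlphaDev's discovered instruction sequence: the real agent works on assembly instructions, and the names `compare_exchange`, `SORT3_NETWORK`, and `network_is_correct` are hypothetical.

```python
from itertools import permutations

def compare_exchange(values, i, j):
    """Put the smaller of values[i], values[j] at index i and the larger at j."""
    lo, hi = min(values[i], values[j]), max(values[i], values[j])
    values[i], values[j] = lo, hi

# A 3-element sorting routine expressed as a fixed list of steps.
# Fewer steps means a shorter (and usually faster) routine.
SORT3_NETWORK = [(0, 1), (1, 2), (0, 1)]

def sort3(a, b, c):
    """Sort three values by running the fixed sequence of steps."""
    values = [a, b, c]
    for i, j in SORT3_NETWORK:
        compare_exchange(values, i, j)
    return tuple(values)

def network_is_correct(network, n):
    """Check a candidate routine against every ordering of n distinct inputs."""
    for perm in permutations(range(n)):
        values = list(perm)
        for i, j in network:
            compare_exchange(values, i, j)
        if values != sorted(values):
            return False
    return True
```

Dropping the last step, for example, yields a shorter routine that `network_is_correct` rejects. A search over such sequences has to balance exactly this trade-off between length and correctness.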

Our Take
AlphaDev's achievement may be a sign of broader potential in algorithmic optimization. The same approach, applied to larger and more complex real-world algorithms, could lead to significant advances in diverse areas. While the outcomes of such applications are yet to be determined, DeepMind's demonstration shows that this direction is viable. The next phase of algorithmic optimization, powered by deep reinforcement learning, could well be around the corner, with exciting prospects for the future of computation and software development.


NEWS
Meta's AI MusicGen: Create New Songs from Text and Melody Prompts

Meta's latest open-source project is a step toward a new kind of music creation: it generates original pieces of music from both text prompts and melody inputs. MusicGen uses a Transformer model to predict the next section of a song, much as a language model predicts the next token in a sentence.

MusicGen uses Meta's EnCodec audio tokenizer to break audio into discrete tokens. The model is single-stage and processes these tokens in parallel, which makes it notably fast. Trained on 20,000 hours of licensed music, including 10,000 high-quality internal tracks and data from Shutterstock and Pond5, the generator can turn text descriptions and pre-existing melodies into new musical compositions.
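
One way to picture the "tokens in parallel" idea: EnCodec describes each audio frame with several codebook tokens, and the MusicGen paper describes offsetting each codebook stream by one step so that a single Transformer step can emit one token per codebook. The toy sketch below illustrates that delay-style interleaving under this assumption. It is not Meta's implementation, and `PAD`, `apply_delay_pattern`, and `undo_delay_pattern` are hypothetical names.

```python
PAD = -1  # hypothetical placeholder meaning "no token at this position"

def apply_delay_pattern(codes):
    """Shift codebook k right by k steps.

    codes: list of K token lists, each of length T (one list per codebook).
    Returns T + K - 1 decoding steps, each holding K tokens (one per codebook).
    """
    k = len(codes)
    t = len(codes[0])
    steps = []
    for s in range(t + k - 1):
        step = []
        for cb in range(k):
            frame = s - cb  # codebook cb lags cb steps behind codebook 0
            step.append(codes[cb][frame] if 0 <= frame < t else PAD)
        steps.append(step)
    return steps

def undo_delay_pattern(steps, k):
    """Recover the original K aligned token streams from the delayed steps."""
    t = len(steps) - k + 1
    return [[steps[frame + cb][cb] for frame in range(t)] for cb in range(k)]
```

With K codebooks and T frames, this layout needs T + K - 1 decoding steps, compared with K × T steps if every token were predicted one after another.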

Although the resulting music only loosely reflects the input melody, it matches the textual description, effectively blending style and emotion. Human testers favored the 1.5-billion-parameter model, which excels at creating high-quality audio. Notably, MusicGen outperforms other AI music models, including Google's MusicLM, on both objective and subjective metrics.

Our Take
I see this as a potential inflection point in the field of music generation. It might spark a "LLaMA moment": a sudden expansion in the field because a high-quality model has been open-sourced and shared with the rest of the community. While the audio generation domain is less explored than text generation, effective models like Google's MusicLM show that promising groundwork has already been laid.

Igor Tica is a co-writer at AlphaSignal and a Research Engineer at SmartCat, with main expertise in Computer Vision. He is passionate about contributing to the field and is seeking opportunities for research collaborations in self-supervised and contrastive learning.

Thank You