đŸ„‡ The 5 AI Papers You Should Read This Week

Fresh out the Neural Network. Our model analyzed and ranked 1000+ papers to provide you with the following summary. Enjoy!

AlphaSignal

Hey,

Welcome back to AlphaSignal, where we bring you the latest developments in the world of AI.

In the past few days, an impressive number of AI papers have been released, and among them, we have handpicked the top six that truly stand out.

On Today’s Summary:

  • Meta’s EMU VIDEO

  • DeepMind’s weather forecasting

  • NVIDIA’s new LLM for chip design

  • Other notable papers

Reading time: 4 min 52 sec

📄 TOP PUBLICATIONS

Meta - EMU VIDEO: Factorizing Text-to-Video Generation by Explicit Image Conditioning

Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin

What’s New
Meta just launched Emu Video, a text-to-video model that generates 4-second videos at 512x512 resolution and 16 FPS. The approach is simple yet ingenious: first transform the text into an image, then apply a "super-resolution"-like step along the temporal axis to create motion, essentially growing a video out of a single frame.

Problem
The big issue with current text-to-video models is their tendency to produce videos that are either low quality or lacking in variety. Moreover, generating video directly from text has proven a tough challenge: models often fail to preserve the visual quality of their base text-to-image model, particularly when depicting motion.

Solution
Emu Video smartly splits the challenge into two phases. It first crafts a starting image from the text using a state-of-the-art text-to-image model, then introduces temporal convolution and attention layers that generate the remaining frames conditioned on both the text and that first image, keeping quality top-notch by leveraging the base model's strengths.
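
To make the idea concrete, here is a minimal sketch (our own illustration, not Meta's code) of the kind of temporal self-attention block that can be inserted into a pretrained text-to-image backbone so it mixes information across frames; the tensor shapes and hyperparameters below are assumptions for the example.

import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Self-attention over the time axis, applied independently at each spatial location."""
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, t, c, h, w = x.shape
        # fold spatial positions into the batch so attention only mixes information across frames
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        normed = self.norm(tokens)
        tokens = tokens + self.attn(normed, normed, normed)[0]  # residual temporal attention
        return tokens.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)

# toy usage: 2 clips, 16 frames, 64 channels, 8x8 latent grid (made-up sizes)
frames = torch.randn(2, 16, 64, 8, 8)
print(TemporalAttention(64)(frames).shape)  # torch.Size([2, 16, 64, 8, 8])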

Results
Emu Video stands out, surpassing other models like CogVideo and Imagen Video in video quality and text alignment. It brilliantly retains all functionalities of its text-to-image predecessor and even allows for seamless video editing without extra training, outperforming commercial giants like Pika Labs.

Time to Simplify Your AI Tools and LLM Billing

Charging by tokens or credits?

Whichever pricing model you choose - package, matrix, tiered - Orb makes it easy to implement.

Just choose your pricing model and billable metric and you’re done.

Companies like Vercel, Replit, and Airbyte trust Orb to track consumption, prevent fraud, and align pricing to value (even down to GPU runtime), so they can focus on building the products we all know and love.

Special Offer: The first 10 qualified AlphaSignal readers to sign up get a free trial for Orb.

DeepMind - Learning skillful medium-range global weather forecasting

Matthew Willson, Remi Lam, Megan Fitzsimons, Ellen Clancy, Alberto Arribas

What’s New
DeepMind releases GraphCast, a new machine learning-based weather forecasting model that predicts global medium-range weather, including severe events. Unlike traditional methods, GraphCast efficiently leverages historical weather data to deliver accurate 10-day forecasts globally at a high resolution of 0.25°, in under a minute.

Problem
Traditional numerical weather prediction (NWP) methods, while accurate, rely heavily on computational resources and do not effectively incorporate historical weather data. This limitation hampers their efficiency and adaptability in predicting medium-range weather, especially in forecasting complex phenomena like severe weather events.

Solution
GraphCast addresses these challenges with a machine learning approach, utilizing a graph neural network structure. Trained on 39 years of historical data, it comprises 36.7 million parameters and operates with a 0.25° resolution. Its autoregressive nature allows for extended forecasts. The model significantly surpasses current NWP methods, providing detailed forecasts including the tracking of severe weather events.
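
As a rough illustration of the autoregressive rollout idea (not DeepMind's implementation), the sketch below applies a learned one-step predictor repeatedly, feeding its own outputs back in; the step length, toy model, and tensor sizes are illustrative assumptions.

import torch
import torch.nn as nn

def rollout(one_step: nn.Module, prev: torch.Tensor, now: torch.Tensor, steps: int) -> torch.Tensor:
    """Apply the one-step predictor `steps` times, reusing its own outputs."""
    outs = []
    for _ in range(steps):
        nxt = one_step(torch.cat([prev, now], dim=-1))  # predict the next global state
        outs.append(nxt)
        prev, now = now, nxt                            # shift the input window forward
    return torch.stack(outs)

# stand-in for the graph neural network: any map from two stacked states to one state
nodes, channels = 1000, 16                    # toy grid; GraphCast operates on a 0.25-degree global grid
toy_model = nn.Linear(2 * channels, channels)
past, present = torch.randn(nodes, channels), torch.randn(nodes, channels)
print(rollout(toy_model, past, present, steps=40).shape)  # e.g. 40 six-hour steps ~ a 10-day forecast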

Results
GraphCast outperforms the state-of-the-art High RESolution forecast (HRES) in 10-day forecasts at a horizontal resolution of 0.25 degrees. It demonstrates superior skill in predicting severe weather events like tropical cyclones, atmospheric rivers, and extreme temperatures, even without explicit training for these scenarios. Remarkably, it can generate a 10-day forecast in under a minute on a cloud TPU.

Sci-fi-level speech technology is finally (nearly) here


J.A.R.V.I.S., C-3PO, Samantha from ‘Her’ – we were promised technology that can understand us. Instead, we got Alexa to occasionally turn on the lights and set timers.

It’s finally time to fulfill the promise of speech technology.

NVIDIA - ChipNeMo: Domain-Adapted LLMs for Chip Design

Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand

What’s New
The paper explores the application of large language models (LLMs) to industrial chip design. Rather than deploying commercial LLMs directly, NVIDIA adapts a foundation model to the domain through a custom tokenizer, domain-adaptive pretraining, supervised fine-tuning on domain-specific instructions, and domain-adapted retrieval models.

Problem
Typical large language models are not well suited to specialized tasks like chip design. They often produce irrelevant or inaccurate information, and integrating proprietary data into them is challenging. This mismatch limits their usefulness in fields that demand high precision, such as semiconductor design.

Solution
NVIDIA's approach in developing ChipNeMo involved using LLaMA2 as the foundational LLM, supplemented by domain-specific pretraining on proprietary and public chip design data. This was followed by supervised fine-tuning (SFT) using general data sources like OASST, FLAN, P3, and specialized instruction data, resulting in 7B and 13B models. Additionally, a domain-specific tokenizer and retrieval-augmented generation (RAG) were implemented to reduce inaccuracies and enhance the relevance of the model's outputs.

Results
ChipNeMo is currently used within NVIDIA for tasks like EDA script generation, engineering assistance through chatbots, and bug analysis. It demonstrates superior performance on chip design tasks compared to the baseline LLaMA2 models.

ChipNeMo signifies a tailored approach in applying AI to semiconductor design, offering improved productivity and decision-making accuracy in this highly specialized field.

🏅 NOTABLE PAPERS

Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks
This study compares GPT-4 and its multimodal version, GPT-4V, with humans on abstraction and reasoning tasks using the ConceptARC benchmark. Results show that neither GPT-4 version matches human-level abstract reasoning, even with detailed one-shot prompts and simplified image tasks.

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Qwen-Audio is a large-scale audio-language model trained on 30+ tasks and diverse audio types like speech, music, and natural sounds, aimed at universal audio understanding. It uses a multi-task framework to avoid interference, achieving high performance without task-specific tuning and enabling multi-turn dialogues with various audio and text inputs.

The Transient Nature of Emergent In-Context Learning in Transformers
Transformers, known for their in-context learning (ICL) ability, can unexpectedly shift from ICL to in-weights learning (IWL) as training progresses. This study shows that emergent ICL is often transient and suggests L2 regularization as a way to sustain it, a finding with implications for training efficient, compact models.
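
For intuition, here is a tiny sketch (our illustration, not the paper's code) of adding an explicit L2 penalty on the weights to a training loss; the model, data, and regularization strength are placeholder values.

import torch

model = torch.nn.Linear(128, 128)            # stand-in for a small transformer
inputs, targets = torch.randn(32, 128), torch.randn(32, 128)
lam = 1e-4                                   # regularization strength (illustrative value)

task_loss = torch.nn.functional.mse_loss(model(inputs), targets)
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
loss = task_loss + lam * l2_penalty          # regularized objective used for the update
loss.backward()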

Thank You

Hyungjin Chung is a contributing writer at AlphaSignal and a second-year Ph.D. student at the KAIST Bio-Imaging Signal Processing & Learning Lab (BISPL). He was previously a research intern in the Los Alamos National Laboratory (LANL) applied math and plasma physics group (T-5).

Jacob Marks, an editor at AlphaSignal and an ML engineer at Voxel51, is recognized as a leading AI voice on Medium and LinkedIn. Formerly at Google X and Samsung, he holds a Ph.D. in Theoretical Physics from Stanford.

Want to promote your company, product, job, or event to 150,000+ AI researchers and engineers? You can reach out here.