📽️ Stable Video Diffusion is here
Fresh out the Neural Network. Our model analyzed and ranked 1000+ papers to provide you with the following summary. Enjoy!
AlphaSignal
Hey,
Welcome back to AlphaSignal, where we bring you the latest developments in the world of AI.
In the past few days, an impressive number of AI papers have been released, and among them, we have handpicked the top six that truly stand out.
On Today’s Summary:
Stable Video Diffusion
System 2 Attention
LQ-LoRA
Other notable papers
Reading time: 4 min 56 sec
📄 TOP PUBLICATIONS
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
STABILITY AI - Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti
What’s New
The paper presents Stable Video Diffusion, a model that generates videos at 576 x 1024 resolution from text descriptions or single images. It extends latent diffusion models, previously limited to 2D images, to create high-resolution videos, incorporating temporal layer adjustments.
Problem
The research addresses the inconsistency in training methods and data curation in generative video models. Traditional models varied in approach and often underutilized training data, leading to suboptimal performance.
Solution
The researchers developed a structured training process involving three stages: text-to-image pretraining on a wide-ranging dataset, video pretraining at low resolutions, and finetuning on high-quality video datasets. This approach ensures optimal data usage across different training stages, enhancing the model's performance and efficiency.
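The three stages above amount to a training curriculum where each stage reuses the weights of the previous one. The sketch below is a hypothetical configuration; the stage names, dataset labels, and fields are illustrative assumptions, not Stability AI's actual settings.

```python
# Hypothetical three-stage curriculum mirroring the training process
# described above; all field values are illustrative assumptions.
STAGES = [
    {"name": "text_to_image_pretrain", "data": "large image-text corpus",
     "temporal_layers": False},
    {"name": "video_pretrain", "data": "curated low-resolution videos",
     "temporal_layers": True},
    {"name": "high_quality_finetune", "data": "small high-quality video set",
     "temporal_layers": True},
]

def run_curriculum(train_stage, stages=STAGES):
    # Each stage resumes from the checkpoint of the stage before it.
    for stage in stages:
        train_stage(stage)
```

The point of the staged structure is that temporal layers are only introduced once the spatial backbone has been pretrained on images.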
Results
The model's ability to generate videos at 14 or 25 frames, maintaining high resolution and detail, marked a significant advancement. In user preference studies, it outperformed existing models, indicating a leap in video generation technology. The research emphasizes the crucial role of data curation, showing that well-curated datasets substantially enhance performance in high-resolution video generation tasks.
Webinar: How to Build LLM Data Infrastructure, at Scale
LLMs are getting smarter, and quantization algorithms are letting them be trained on smaller and smaller hardware. Meanwhile, the architectures and pipelines for building multimodal LLMs and ensembles of collaborating LLMs are becoming increasingly complex; Tree of Thoughts is one such custom architecture, showing strong potential to significantly improve LLM accuracy.
Building data infrastructure for such customizable LLM architectures remains an open and important problem. We will discuss how these data architectures can be built at scale.
Join Vahan Petrosyan, co-founder and CEO of SuperAnnotate, for this webinar and engage in a dynamic Q&A session immediately following the presentation.
Date: November 30th, 9 am PST / 6 pm CET
System 2 Attention (is something you might need too)
META AI - Jason Weston, Sainbayar Sukhbaatar
What’s New
The research introduces System 2 Attention (S2A) in Large Language Models (LLMs) to address issues with soft attention in Transformers. S2A is designed to improve the handling of irrelevant or biased information by regenerating input context to focus only on relevant parts.
Problem
Traditional Transformer-based LLMs, like LLaMA-2-70B-chat, often erroneously incorporate irrelevant details from their input context, leading to less factual outputs, especially in cases involving opinionated or extraneous information.
Solution
S2A regenerates the input context to filter out irrelevant or biased parts, then applies the LLM's reasoning to this refined context. The method draws inspiration from human 'System 2' cognition, which allocates deliberate attention in error-prone scenarios. For factual QA and long-form generation tasks, S2A uses specific prompts that emphasize factuality and objectivity. It is evaluated against a standard zero-shot baseline and an oracle prompt pre-filtered for relevance.
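The two-call structure of S2A can be sketched in a few lines. The `llm` callable, the `stub_llm` stand-in, and the prompt wording below are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch of System 2 Attention (S2A) as two LLM calls.

def s2a_answer(llm, context, question):
    # Step 1: regenerate the context, keeping only material relevant
    # to the question and dropping opinions or irrelevant details.
    rewrite_prompt = (
        "Extract the parts of the following text that are relevant and "
        "unbiased with respect to the question.\n"
        f"Text: {context}\n"
        f"Question: {question}\n"
        "Relevant text:"
    )
    filtered_context = llm(rewrite_prompt)

    # Step 2: answer using only the regenerated context.
    answer_prompt = (
        f"Context: {filtered_context}\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return llm(answer_prompt)


# Stub model to show the control flow; a real deployment would call
# an actual LLM for both steps.
def stub_llm(prompt):
    if prompt.startswith("Extract"):
        return "Paris is the capital of France."
    return "Paris"
```

The cost of the approach is a second inference pass, which the paper trades for the improved factuality described below.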
Results
In tasks involving opinions or irrelevant content, S2A outperforms the baseline and closely matches oracle prompt performance. For factual QA with opinionated input, S2A achieved an 80.3% accuracy, nearly matching the oracle's 82.0%. It also showed higher objectivity in longform generation and improved accuracy in math word problems, illustrating its effectiveness in filtering relevant context for more accurate LLM responses.
Still struggling to bill for AI and LLM tools? Leave it to Orb.
Charging by tokens or credits? Whichever pricing model you choose (package, matrix, tiered), Orb makes it easy to implement.
Companies like Vercel, Replit, and Airbyte trust Orb to track consumption, prevent fraud, and align pricing to value (even down to GPU runtime).
Special Offer:
The first 10 qualified AlphaSignal readers to sign up get a free trial for Orb.
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
Carnegie Mellon - Han Guo, Philip Greengard, Eric P. Xing, Yoon Kim
What’s New
The paper introduces LQ-LoRA, a method for efficient adaptation of pretrained language models like RoBERTa and LLaMA-2. It combines low-rank matrix decomposition with quantized components, allowing for memory-efficient model finetuning. The key innovation is the ability to dynamically set quantization parameters (bit-width, block size) per matrix, considering an overall memory budget. This method outperforms existing approaches like QLoRA and GPTQ-LoRA, especially in aggressive quantization scenarios.
Problem
The main challenge addressed is reducing the memory footprint of large language models during finetuning, without significantly losing performance. Traditional methods often lead to high memory requirements, limiting their practical use.
Solution
LQ-LoRA decomposes each matrix into a high-precision, low-rank part and a fixed, quantized component. It uses an integer linear programming approach for the quantization process. The method includes a data-aware version utilizing the Fisher information matrix for better matrix decomposition.
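The core decomposition W ≈ Q + L1·L2 (a fixed quantized part plus a low-rank part) can be illustrated with a toy alternating scheme. The uniform quantizer and the loop below are simplified stand-ins for the paper's method, which uses NF-style quantization and an integer linear program over bit-widths; the numbers are illustrative only.

```python
# Toy sketch of an LQ-LoRA-style decomposition: W ~ Q + L1 @ L2,
# where Q is quantized and frozen and L1, L2 are the trainable
# low-rank factors. Simplified stand-in, not the paper's algorithm.
import numpy as np

def quantize(x, bits=3):
    # Uniform quantizer over the matrix's value range.
    lo, hi = x.min(), x.max()
    step = (hi - lo) / (2 ** bits - 1)
    return np.round((x - lo) / step) * step + lo

def lq_decompose(W, rank=8, bits=3, iters=10):
    # Alternate between quantizing the residual and refitting a
    # rank-`rank` factorization of what the quantized part misses.
    L = np.zeros_like(W)
    for _ in range(iters):
        Q = quantize(W - L, bits)
        U, s, Vt = np.linalg.svd(W - Q, full_matrices=False)
        L1 = U[:, :rank] * s[:rank]   # (m, rank)
        L2 = Vt[:rank]                # (rank, n)
        L = L1 @ L2
    return Q, L1, L2

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
Q, L1, L2 = lq_decompose(W)
rel_err = np.linalg.norm(W - (Q + L1 @ L2)) / np.linalg.norm(W)
```

During finetuning only L1 and L2 receive gradients, which is what keeps the memory footprint small.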
Results
LQ-LoRA achieved notable results, like compressing LLaMA-2-70B to 2.85 bits with minimal performance loss. For instance, on the OpenAssistant benchmark, a 2.5-bit LLaMA-2 model trained with LQ-LoRA matched the performance of a 4-bit model trained with QLoRA. The 2.75-bit LLaMA-2-70B model, requiring 27GB of GPU memory, was competitive with the original full-precision model. This demonstrates LQ-LoRA's effectiveness in reducing memory requirements while maintaining model performance.
🏅 NOTABLE PAPERS
Fine-tunes language models for improved factuality in open-ended settings without human labels. Using new NLP techniques to rank generations by factuality, it cuts factual errors in Llama-2 by 58% on biographies and 40% on medical questions, surpassing previous methods.
ZipLoRA innovates in merging style and subject LoRAs for generative models, using a 'zipper-like' approach. It leverages SDXL's style learning, sparsity in LoRA matrices, and alignment in weight matrix columns, enhancing generation quality while capturing both subject and style accurately.
Orca 2 trains smaller LMs with diverse reasoning strategies (step-by-step, recall-generate, recall-reason-generate, and direct answer). It outperforms models like LLaMA-Chat-70B on 15 benchmarks, demonstrating that smaller LMs equipped with advanced strategy-selection capabilities can achieve strong task-specific reasoning.
Thank You
Hyungjin Chung is a contributing writer at AlphaSignal and a second-year Ph.D. student at the KAIST bio-imaging, signal processing, and learning lab (BISPL). He was previously a research intern in the applied math and plasma physics group (T-5) at Los Alamos National Laboratory (LANL).
Jacob Marks is an editor at AlphaSignal and an ML engineer at Voxel51, recognized as a leading AI voice on Medium and LinkedIn. Formerly at Google X and Samsung, he holds a Ph.D. in Theoretical Physics from Stanford.
Want to promote your company, product, job, or event to 150,000+ AI researchers and engineers? You can reach out here.