
🥇 Top 6 AI Papers You Should Read This Week

AlphaSignal

Hey,

Welcome back to AlphaSignal, where we bring you the latest developments in the world of AI. In the past few days, an impressive number of AI papers have been released, and among them, we have handpicked the top six that truly stand out.

On Today’s Summary:

  • The Rise and Potential of Large Language Model Based Agents: A Survey

  • PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking

  • Language Modeling is Compression

  • Other notable papers

📄 TOP PUBLICATIONS

The Rise and Potential of Large Language Model Based Agents: A Survey

Score: 9.9 • Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou

Objective
The paper seeks to provide a structured overview of LLM-based agents, defining their key components and discussing potential applications. It aims to clarify the young and often confusing field of AI agents powered by large language models.

Central Problem
The field is in its early stages, with no clear consensus on what defines "intelligence" and "agents." The paper addresses this gap by offering a formal definition and typology of LLM-based agents.

Solution & Methodology
The authors define an agent as consisting of three main components: brain, perception, and action. The 'brain' includes natural language capabilities and reasoning. 'Perception' involves gathering data from external signals like visuals and audio. 'Action' refers to the agent's ability to interact with its environment. These components are examined in various applications such as single agents, multiple agents, and human-agent interactions.

Results
The paper outlines three main application scenarios:

  1. Single Agent: Capable of specific tasks and scientific innovation.

  2. Multiple Agents: Can work together to tackle complex tasks efficiently.

  3. Human-Agent: Can function in roles ranging from executor to equal partner.

Additionally, it introduces the concept of an "Agent Society," a simulated environment where multiple LLM-based agents interact, offering insights into individual and group dynamics.
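The brain / perception / action decomposition above can be sketched as a simple loop. This is an illustrative assumption, not the survey's code: `Agent`, `perceive`, `brain`, `act`, and the stub `llm_generate` are all hypothetical names standing in for a real LLM-backed implementation.

```python
def llm_generate(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned decision string."""
    return f"ACTION: respond({prompt!r})"

class Agent:
    def __init__(self):
        # Part of the 'brain': memory carried across steps.
        self.memory = []

    def perceive(self, signal: str) -> str:
        """'Perception': turn an external signal into text the brain can use."""
        return f"observation: {signal}"

    def brain(self, observation: str) -> str:
        """'Brain': reason over memory plus the new observation via the LLM."""
        self.memory.append(observation)
        prompt = "\n".join(self.memory)
        return llm_generate(prompt)

    def act(self, decision: str) -> str:
        """'Action': convert the brain's decision into an environment interaction."""
        return decision.removeprefix("ACTION: ")

    def step(self, signal: str) -> str:
        return self.act(self.brain(self.perceive(signal)))

agent = Agent()
result = agent.step("user asks for the weather")
```

In this framing, a multi-agent system is just several such loops whose actions become one another's perceived signals, which is essentially how the paper's "Agent Society" simulations are organized.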

Get Latitude's dedicated GPU instances and train your heavy models in seconds

Using NVIDIA H100 GPUs, Latitude Accelerate speeds up your AI and machine learning tasks, making both training and running your models faster and more efficient.

With dedicated instances, 32-cores/GPU and hourly billing, Accelerate offers you unmatched performance and flexibility, all at the best cost per GPU on the market.

Don’t wait, secure your access to H100 GPUs

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking

Score: 9.3 • Yunpeng Bai, Xintao Wang, Yan-pei Cao, Yixiao Ge, Chun Yuan, Ying Shan

Objective
The paper aims to advance the field of fine-grained long-range tracking in videos. The authors argue that current methods and datasets are not adequate for this task. They emphasize that long-range tracking is different from optical flow and is influenced by various real-world factors like camera movement and object interactions.

Central Problem
The main issue is that tracking pixels over long stretches of video is difficult due to factors that are hard to predict yet still modelable, such as camera shake, object motion, and interactions between objects. Existing methods and datasets fall short in addressing these challenges.

Proposed Solution

  • PointOdyssey Dataset: The authors introduce a new large-scale synthetic dataset, PointOdyssey, which is designed for training and testing long-term fine-grained tracking algorithms.

  • PIPs++ Method: Building on the state-of-the-art "Persistent Independent Particles" (PIPs), the authors propose PIPs++, a method that attends to 8 frames at once, making it more robust to occlusions such as objects passing behind one another.

Methodology

  • The dataset is diverse, created using simulated assets like human shapes, animals, and different lighting conditions.

  • Real-world motion capture data is used to animate characters in 3D scenes, and various camera motions are applied to mimic real-world settings.

Results

  • The authors test multiple tracking methods and find that their proposed PIPs++ performs the best.

  • Training on PointOdyssey improves the performance of other methods on real-world datasets.

Few tickets left - September 26/27

• Fireside chat on Algorithmic Warfare with speakers from the Department of Defense and Stanford's professor of Human Centered AI.

• Showcase your product at The AI Conference by emailing [email protected]

Use code: alpha30

Language Modeling is Compression

Score: 8.2 • Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya

Objective
The paper investigates the link between information theory's concept of compression and how large language models (LLMs) are trained to predict the next token. The authors claim that LLMs are inherently trained for maximum compression and compare their performance with general-purpose compressors like gzip.

Central Problem
The paper aims to shift the focus from viewing LLMs solely as generative models to recognizing their capability as predictive models and compressors for various data types.

Methodology

  • General-purpose compressors like gzip, LZMA2, PNG, and FLAC are used as baselines.

  • LLMs tested include vanilla Transformers and Chinchilla foundation models.

  • Test Datasets span text (enwik9), image (ImageNet), and audio (LibriSpeech).

  • The paper also checks if these compressors can act as generative models for different data types.
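The core identity behind this setup is that a predictive model paired with arithmetic coding compresses a sequence to roughly the sum of -log2 p(token | context) over the sequence. A toy sketch of that comparison, using byte frequencies as a hypothetical stand-in for an LLM's next-token predictor (the real paper uses Transformer and Chinchilla models, not a unigram model):

```python
import math
import zlib
from collections import Counter

text = b"the cat sat on the mat the cat sat"

# Baseline: a general-purpose compressor (DEFLATE, the algorithm behind gzip).
gzip_bits = len(zlib.compress(text, 9)) * 8

# Toy "model": unigram byte frequencies estimated on the text itself.
# Arithmetic coding under this model would cost -sum(log2 p(byte)) bits.
counts = Counter(text)
total = len(text)
model_bits = -sum(math.log2(counts[b] / total) for b in text)

# Compression rate = compressed bits per raw bit (lower is better).
rate_gzip = gzip_bits / (len(text) * 8)
rate_model = model_bits / (len(text) * 8)
```

On inputs this short, gzip's header overhead dominates; the paper's point is that a strong sequence model minimizes exactly the `model_bits` quantity, so training for next-token prediction is training for compression.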

Results

  • Chinchilla models beat general-purpose and domain-specific compressors in all tasks.

  • Transformers excel only in tasks they were trained for.

  • LLMs show a faster drop in compression rate with increasing sequence length compared to general-purpose compressors.

  • Tokenization strategy and model size both significantly affect compression performance.

🏅 NOTABLE PAPERS

Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
Score: 8.2 • Self-speculative decoding speeds up large language model inference without sacrificing output quality or requiring extra training. It uses a two-stage process: quick drafting followed by verification. Speedups of up to 1.73x are demonstrated.
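The draft-then-verify idea can be sketched with toy deterministic functions in place of the real networks. Note the assumptions: the paper's draft stage skips layers of the same model, whereas `draft_model` and `full_model` here are made-up rules chosen only to show the accept-longest-prefix mechanics.

```python
def full_model(context):
    """Toy 'expensive' model: next token is len(context) mod 5."""
    return len(context) % 5

def draft_model(context):
    """Toy 'cheap' draft: same rule, but wrong when the context length is even."""
    t = len(context) % 5
    return t + 1 if len(context) % 2 == 0 else t

def speculative_step(context, k=4):
    """Draft k tokens cheaply, verify with the full model, keep the longest
    agreeing prefix, and substitute the full model's token at the first miss."""
    draft, ctx = [], list(context)
    for _ in range(k):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)
    accepted, ctx = [], list(context)
    for t in draft:
        correct = full_model(ctx)
        if t == correct:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(correct)  # fix the first wrong token, stop here
            break
    return accepted
```

The output is guaranteed to match what greedy decoding with the full model alone would produce, which is why the method is lossless; the speedup comes from verifying several drafted tokens per full-model pass.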

Agents: An Open-source Framework for Autonomous Language Agents
Score: 7.3 • New open-source library, Agents, simplifies building autonomous language agents using large language models. Features include planning, multi-agent communication, and easy customization for researchers.

Chain-of-Verification Reduces Hallucination in Large Language Models
Score: 6.9 • The Chain-of-Verification (CoVe) method reduces factual errors in large language models by drafting a response, generating verification questions, and checking the answers before producing a final response. Shown effective on tasks such as MultiSpanQA.


Hyungjin Chung is a contributing writer at AlphaSignal and a second-year Ph.D. student at KAIST's bio-imaging signal processing and learning lab (BISPL). He was previously a research intern in the applied math and plasma physics group (T-5) at Los Alamos National Laboratory (LANL).

Thank You