
🥇 The 5 AI Papers You Should Read This Week

Fresh out the Neural Network. Our model analyzed and ranked 1000+ papers to provide you with the following summary. Enjoy!


Hey,

Welcome back to AlphaSignal, where we bring you the latest developments in the world of AI.

In the past few days, an impressive number of AI papers have been released, and among them, we have handpicked the top six that truly stand out.

On Today’s Summary:

  • Battle of the Backbones

  • Multimodal ChatGPT for Medical Applications

  • Training FP8 Large Language Models

  • Other notable papers

Reading time: 4 min 31 sec

📄 TOP PUBLICATIONS

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

Score: 9.9 Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Prabhu, Gowthami Somepalli, Prithvijit Chattopadhyay, Mark Ibrahim

Make It Simple
Goal - Find which pre-trained models excel in computer vision tasks.
Issue - Hard to choose due to model and data diversity.
Solution - Introduced a benchmarking tool, Battle of the Backbones.
Results - CNNs lead, with some SSL models close, given enough data.
Insight - Model performance might be predictable across tasks.

Objective
To conduct a large-scale evaluation of various pretrained models, including CNNs, SSL models, and vision-language models, on a wide range of computer vision tasks to establish a hierarchy of model performance.

Central Problem
The explosion of available pretrained models has created a decision-making bottleneck for those needing to deploy effective computer vision solutions across tasks such as classification, detection, and segmentation.

Solution
The study introduces 'Battle of the Backbones' (BoB), a rigorous benchmarking process that tests the efficacy of a variety of pretrained models across several computer vision tasks, providing a comparative analysis to guide the selection of backbones.
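
To make the setup concrete, here is a minimal sketch of a BoB-style comparison: linear-probing several pretrained backbones on the same classification data and comparing accuracy. It uses the timm library rather than the paper's released benchmark, and the model names and input handling are illustrative assumptions, not BoB's actual configuration.

    # A rough sketch of comparing pretrained backbones via linear probing.
    # Not the paper's benchmark code; model names follow timm's registry
    # and may differ across timm versions.
    import timm
    import torch
    from sklearn.linear_model import LogisticRegression

    BACKBONES = [
        "convnext_tiny",               # supervised CNN
        "swinv2_tiny_window8_256",     # supervised transformer
        "vit_small_patch16_224.dino",  # SSL ViT
    ]

    @torch.no_grad()
    def extract_features(model, images):
        # num_classes=0 makes timm return pooled features instead of logits.
        return model(images).numpy()

    def linear_probe_accuracy(name, train_x, train_y, test_x, test_y):
        # Images must already match the backbone's expected resolution
        # (see model.default_cfg["input_size"]).
        model = timm.create_model(name, pretrained=True, num_classes=0).eval()
        clf = LogisticRegression(max_iter=1000)
        clf.fit(extract_features(model, train_x), train_y)
        return clf.score(extract_features(model, test_x), test_y)

    # for name in BACKBONES:
    #     print(name, linear_probe_accuracy(name, train_x, train_y, test_x, test_y))

BoB additionally covers fine-tuning, detection, and segmentation; a linear probe is just the cheapest point of comparison.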

Results

  1. Models excelling in multiple tasks are identified: ConvNeXt-Base, SwinV2-Base, and CLIP ViT-Base, with ConvNeXt-Tiny, SwinV2-Tiny, and DINO ViT-Small being top choices for smaller-scale tasks.

  2. CNNs trained with supervision outperform transformer models on most tasks.

  3. SSL models demonstrate comparable or superior performance to supervised models when pretraining dataset sizes are the same.

  4. Vision transformers' performance is notably influenced by pretraining data volume and model size.

  5. Performance across tasks tends to correlate, suggesting predictability of model effectiveness.

Must Watch: OpenAI + Scale on How To Fine-Tune GPT-3.5 For Your Business

Have you considered using OpenAI's GPT-3.5 for your company, but weren’t sure where to start?

Join OpenAI and Scale on November 8th at 10 AM PT where you’ll learn:

  • When and how to fine-tune GPT-3.5

  • How to optimize your company data for fine-tuning GPT-3.5

  • Most importantly: How to avoid the biggest mistakes other enterprises made when fine-tuning GPT-3.5

This is a great opportunity to dive deep into what fine-tuning GPT-3.5 can do for enterprises, while learning from real-world use cases along the way. By the end of the session, you'll know how to get started with GPT-3.5 in your own organization.

You don’t want to miss it.

Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V

Score: 9.3 Zhiling Yan, Kai Zhang, Rong Zhou, Lifang He, Xiang Li, Lichao Sun

Make It Simple
Goal - Assess GPT-4V performance in medical tasks.
Issue - Determine GPT-4V diagnostic accuracy using images.
Solution - Tests on pathology, radiology datasets; 11 modalities.
Results - Recognizes images/objects; misjudges size in multi-slice.
Insight - GPT-4V accuracy insufficient for standalone diagnostics.

Objective
The paper aims to evaluate the performance of GPT-4 with Vision (GPT-4V) in answering medical questions linked with images, focusing on its ability to handle the Visual Question Answering (VQA) task in the medical field.

Central Problem
The main issue is assessing whether GPT-4V can accurately answer diagnostic questions using pathology and radiology images across various medical scenarios, considering its potential use in real-world diagnostics.

Proposed Solution
The researchers conducted extensive testing on pathology and radiology datasets spanning 11 imaging modalities and 15 objects of interest. They posed questions from 16 different categories and analyzed GPT-4V's responses to evaluate its multimodal capabilities.
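
For context, a single query of the kind the study issues at scale looks like this through OpenAI's Python client. This is a minimal sketch, assuming an OPENAI_API_KEY in the environment; the image path, question, and model name are placeholders, not the paper's actual prompts or data.

    # One GPT-4V visual-question-answering query (illustrative only).
    import base64
    from openai import OpenAI

    client = OpenAI()

    def ask_gpt4v(image_path: str, question: str) -> str:
        # Encode a local image as a base64 data URL for the API.
        with open(image_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode("utf-8")
        response = client.chat.completions.create(
            model="gpt-4-vision-preview",  # vision-capable model at time of writing
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
            max_tokens=300,
        )
        return response.choices[0].message.content

    # print(ask_gpt4v("chest_xray.png",
    #                 "Which imaging modality is this, and what organ is shown?"))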

Results

  • GPT-4V successfully identifies different medical imaging types and the objects within them.

  • The model requires additional prompts for accurate localization, especially regarding the orientation of medical images.

  • It struggles to accurately determine the size of regions of interest (ROIs) in images that include multiple slices, like CT scans.

  • The integration of image and text inputs for diagnostic queries reveals a tendency towards visual and linguistic biases.

  • GPT-4V often provides cautious and thorough explanations, although these should not be considered definitive and require expert verification.

  • The accuracy of GPT-4V for the VQA task is not reliable or high enough to recommend its use for actual medical diagnostics.

FP8-LM: Training FP8 Large Language Models

Score: 8.6 Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, Ze Liu, Yifan Xiong, Ziyue Yang, Bolin Ni, Jingcheng Hu

Make It Simple
Goal - Optimize LLM training with FP8 low-bit data formats.
Issue - High cost of LLM computational resources.
Solution - FP8 automatic mixed-precision framework for LLMs.
Results - Reduced memory by 42%, increased speed by 64%.
Insight - FP8 maintains accuracy, optimizes training efficiency.

Objective
To investigate the efficiency of the FP8 data format in training large language models (LLMs), focusing on reducing resource usage without compromising model performance.

Central Problem
The main challenge is to manage the significant memory and computational costs traditionally associated with training very large models like GPT-175B.

Proposed Solution
The paper proposes an FP8 automatic mixed-precision framework that incorporates 8-bit precision in gradients, optimizer states, and distributed training to streamline the LLM training process.
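
As a rough illustration of what FP8 mixed precision looks like in practice, here is a minimal sketch using NVIDIA's Transformer Engine, the baseline the paper outperforms, rather than the paper's own (separately open-sourced) framework. It assumes an FP8-capable GPU such as an H100; the layer sizes and loss are illustrative.

    # FP8 mixed-precision training step via NVIDIA Transformer Engine (sketch).
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # HYBRID format: E4M3 for forward activations/weights, E5M2 for gradients.
    fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

    model = te.Linear(4096, 4096, bias=True).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 4096, device="cuda")
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        # Matmuls inside this context run in FP8 with per-tensor scaling.
        out = model(x)
        loss = out.float().pow(2).mean()
    loss.backward()
    optimizer.step()

The paper's framework goes further than this, extending 8-bit precision to optimizer states and distributed communication, which is where much of the 42% memory saving comes from.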

Results

  • Training GPT-175B with FP8 resulted in a 42% memory usage reduction.

  • There was a 64% improvement in training speed over the BF16 framework.

  • The speed exceeded Nvidia’s Transformer Engine by 17%.

  • Model accuracy was preserved, matching the performance of models trained with higher-precision formats.

  • The FP8 framework was also effective in fine-tuning and reinforcement learning scenarios.

  • The approach and framework have been made available for public use.

🏅 NOTABLE PAPERS

PERF: Panoramic Neural Radiance Field from a Single Panorama
Score: 7.5 • PERF enhances NeRF for 360-degree view synthesis from one panorama, using novel RGBD inpainting and erasing methods for realistic 3D scene rendering, outperforming current methods.

ConvNets Match Vision Transformers at Scale
Score: 7.1 • Challenging ConvNets' limits, the study scales up training with JFT-4B, matching Vision Transformers' performance at web-scale, achieving 90.4% Top-1 accuracy on ImageNet.

What's In My Big Data?
Score: 6.6 • WIMBD platform analyzes content in text corpora, revealing duplicates, low-quality content, PII, toxicity, and benchmark contamination in major datasets; it's open-sourced for transparency.


Thank You
