🦾 Using LLMs to Train Robots Changes Everything

On NVIDIA's Eureka breakthrough, Adept's Fuyu-8B, Google's secret project, and PyTorch ExecuTorch.

AlphaSignal

Hey,

Welcome to this week's edition of AlphaSignal.

Whether you are a researcher, engineer, developer, or data scientist, our summaries ensure you're always up-to-date with the latest breakthroughs in AI.

Let's get into it!

Lior

In Today’s Email:

  • Top Releases and Announcements in the Industry

  • Adept Unveils a Multimodal Architecture for AI Agents

  • NVIDIA’s Breakthrough Puts New Spin on Robot Learning

  • IBM’s New Chip Architecture for Faster and Energy-Efficient AI

Read Time: 4 min 18 sec

RELEASES & ANNOUNCEMENTS

1. Google is working on a secret new AI tool named Stubbs
Leaked info suggests Google is developing Stubbs, a no-code visual builder for AI prototypes, alongside its multimodal LLM Gemini. Stubbs aims to simplify AI app prototyping and sharing. Gemini may also integrate with MakerSuite and Vertex AI, potentially enhancing developer engagement with Google's AI tools.

2. Andreessen Ignites Silicon Valley Debate with Bold "Techno-Optimist Manifesto"
Marc Andreessen of Andreessen Horowitz published a blog post strongly championing technology's positive impact. His dismissal of ideas like sustainability and "tech ethics" sparked debate in the tech community about the right balance between innovation and responsibility.

3. Andrew Ng announces his new Generative AI course
Learn how Generative AI works, how to use it in professional or personal settings, and how it will affect jobs, businesses, and society. The course is accessible to everyone and assumes no prior coding or AI experience.

4. DALL·E 3 is now available to all ChatGPT Plus & Enterprise users
The new feature allows you to create unique images through conversation. Describe your vision, let ChatGPT generate multiple variants, and then request edits — all in real-time.

5. PyTorch Introduces ExecuTorch to Boost Mobile and Edge Device Performance
ExecuTorch enables efficient on-device inference across mobile and edge devices and was developed in collaboration with industry leaders such as Arm, Apple, and Qualcomm. The platform offers third-party integration tools and model acceleration, ensuring wide compatibility and optimized performance.
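
For a sense of the developer workflow, here is a rough sketch of exporting a small PyTorch model to ExecuTorch's .pte format. The module paths and call signatures (torch.export.export, executorch.exir.to_edge, to_executorch) follow the export flow described in the early ExecuTorch documentation and may differ across versions, so treat this as illustrative rather than canonical.

    import torch
    from torch.export import export          # PyTorch 2.x export API
    from executorch.exir import to_edge      # ExecuTorch export module (version-dependent)

    # Any eager-mode module works as a starting point; a tiny MLP keeps the example small.
    model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU()).eval()
    example_inputs = (torch.randn(1, 16),)

    # 1. Capture the model as an exported graph.
    exported_program = export(model, example_inputs)

    # 2. Lower to the Edge dialect, then compile to an ExecuTorch program.
    executorch_program = to_edge(exported_program).to_executorch()

    # 3. Serialize to a .pte file that the on-device ExecuTorch runtime can load.
    with open("model.pte", "wb") as f:
        f.write(executorch_program.buffer)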

Must-Read: The New Generative AI Industry Report

Nylas collected data from 1,000 developers, product and engineering leaders, and executives across North America to better understand their usage of Generative AI today and how they harness this revolutionary technology.

The result is an in-depth, 27-page report providing valuable insights, statistics, and trends to deepen your understanding of this cutting-edge technology.

Download the full report to learn how developers and technical teams are harnessing Generative AI.

NEWS
Adept Unveils Fuyu-8B: A Multimodal Architecture for AI Agents

What's New?
Adept has released Fuyu-8B, an open-source multimodal model available on Hugging Face. Fuyu-8B is distinguished by its simplicity in both architecture and training, which makes the model easy to understand, scale, and deploy. It is built specifically for digital agents: it supports arbitrary image resolutions, answers questions about graphs and diagrams, handles UI-based questions, and performs fine-grained localization on screen images.

Why Does It Matter?
Fuyu-8B matters because it makes advanced multimodal AI more accessible: practitioners can now use a capable model without a large compute budget. Its fast response times and solid benchmark performance make it practical for many applications. It is a base model, however, so specific tasks will require further fine-tuning.

Key Takeaways:

  1. Architecture: Decoder-only transformer with linear image-patch projections fed into the first transformer layer, bypassing specialized image encoders (see the sketch after this list).

  2. Versatility: Supports arbitrary image resolutions and can handle tasks like answering questions about graphs and UI elements.

  3. Speed: Responses for large images in less than 100 ms.

  4. Benchmark Performance: Competitive with models having much larger parameter counts on VQAv2, OKVQA, and COCO Captions.
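
To make the architecture point concrete, here is a minimal, illustrative PyTorch sketch of the idea behind Fuyu-8B's image handling, not Adept's actual code: flattened image patches are mapped by a single linear layer directly into the decoder's embedding space, so no separate vision encoder is needed. The class name, patch size, and embedding dimension below are assumptions chosen for illustration.

    import torch
    import torch.nn as nn

    class PatchToTokenProjection(nn.Module):
        """Illustrative Fuyu-style image input: flatten patches, project linearly."""
        def __init__(self, patch_size: int = 30, channels: int = 3, d_model: int = 4096):
            super().__init__()
            self.patch_size = patch_size
            # A single linear layer maps each flattened patch to a token-like embedding.
            self.proj = nn.Linear(channels * patch_size * patch_size, d_model)

        def forward(self, image: torch.Tensor) -> torch.Tensor:
            # image: (batch, channels, H, W) with H and W divisible by patch_size
            b, c, h, w = image.shape
            p = self.patch_size
            patches = image.unfold(2, p, p).unfold(3, p, p)           # (b, c, H/p, W/p, p, p)
            patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
            return self.proj(patches)                                 # (b, num_patches, d_model)

    # The resulting patch embeddings are interleaved with text token embeddings
    # and fed to a standard decoder-only transformer.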

NEWS
NVIDIA’s Breakthrough: Training Robots Using LLM Generated Reward Algorithms

What's New?
NVIDIA Research has unveiled Eureka, an AI agent built on GPT-4 that autonomously generates reward algorithms for training robotic systems. NVIDIA showcased it by training a robotic hand to perform rapid pen-spinning tricks at a level comparable to human expertise.

Why Does It Matter?
Traditional RL relies on labor-intensive, human-written reward functions. By automating their generation, Eureka removes this manual overhead and broadens the range of tasks that can be tackled, a major advance for reinforcement learning.

Main Takeaways

  • Versatile Learning: Works with multiple robot types and was trained on 30 complex tasks, such as dexterous manipulation and dynamic balancing.

  • Performance Metrics: Outperforms human-authored rewards on 80% of tasks with a 50% average performance boost, benchmarked against open-source dexterity benchmarks.

  • Technical Stack: Runs reward candidates in parallel in GPU-accelerated Isaac Gym for quick evaluation (a sketch of the overall loop follows this list).

  • Human Feedback Loop: Generates rewards without task-specific prompting or predefined reward templates, and can incorporate human feedback to refine them.

  • Open-Source Integration: Algorithms work with NVIDIA Isaac Gym, built on NVIDIA Omniverse for 3D simulations.
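
The sketch below illustrates the general shape of such a loop. It is a hypothetical outline, not NVIDIA's code: the function names and interfaces (propose_rewards, train_and_score, summarize) are placeholders standing in for the GPT-4 calls, the Isaac Gym training runs, and the reward-reflection step described above.

    def eureka_style_search(propose_rewards, train_and_score, summarize,
                            env_source, task, iterations=5, samples=16):
        """Hypothetical Eureka-style outer loop.

        propose_rewards(env_source, task, feedback) -> list of reward-function code strings
        train_and_score(reward_code)                -> (task_score, training_stats)
        summarize(reward_code, training_stats)      -> feedback string for the next round
        """
        best_code, best_score = None, float("-inf")
        feedback = ""
        for _ in range(iterations):
            # 1. The LLM writes several candidate reward functions as executable code,
            #    conditioned on the environment source, the task, and prior feedback.
            candidates = propose_rewards(env_source, task, feedback)[:samples]
            # 2. Each candidate trains a policy in GPU-parallel simulation and is
            #    scored on the task's success metric.
            results = [(code, *train_and_score(code)) for code in candidates]
            top_code, top_score, top_stats = max(results, key=lambda r: r[1])
            if top_score > best_score:
                best_code, best_score = top_code, top_score
            # 3. "Reward reflection": training statistics are summarized as text and
            #    fed back so the next batch of candidates can improve on this one.
            feedback = summarize(top_code, top_stats)
        return best_code, best_score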

NEWS
IBM Presents NorthPole: A New Chip Architecture for Faster and Energy-Efficient AI

What's New?
IBM Research has presented NorthPole, a chip prototype developed at its Almaden lab in California. Strongly influenced by brain-inspired computing, the chip delivers substantial gains in AI inference speed and energy efficiency.

Why Does It Matter?
Specialized hardware optimized for AI workloads is a pressing need for the developer community. By tightly integrating memory and processing, NorthPole addresses one of the main bottlenecks in today's AI accelerators.

Main Takeaways

  • Brain-Inspired Architecture: NorthPole mimics the human brain's neural structure, facilitating efficient memory and processing coexistence.

  • Performance Boost: In tests on ResNet-50 and YOLOv4, NorthPole shows strong energy efficiency, memory utilization, and latency, outperforming common 12-nm GPUs by up to 25x in energy efficiency.

  • On-Chip Memory Breakthrough: NorthPole eliminates data transfer bottlenecks by hosting all memory on the chip, resulting in accelerated AI inferencing.

  • Versatile AI Applications: NorthPole exhibits adaptability across diverse AI tasks, from computer vision to natural language processing.

NorthPole represents a significant advancement in AI hardware, promising tangible performance gains and operational efficiency across a range of applications, making it a valuable tool for engineers, researchers, and developers.

Thank You

Igor Tica is a writer at AlphaSignal and a Research Engineer at SmartCat, specializing in computer vision. He is passionate about contributing to the field and is open to research collaborations in self-supervised and contrastive learning.

Want to promote your company, product, job, or event to 150,000+ AI researchers and engineers? You can reach out here.