
๐ŸŽ Apple's New Open-Source ML Framework

Your weekly technical digest of top projects, repos, tips and tricks to stay ahead of the curve.

AlphaSignal

Hey,

Welcome to this week's edition of AlphaSignal, the newsletter for AI professionals.

Whether you are a researcher, engineer, developer, or data scientist, our summaries ensure you're always up-to-date with the latest breakthroughs in AI.

Let's get into it!

Lior

In Today's Summary:

  • Repo Highlight: MLX

  • Trending Repos: marker, gpt-fast

  • PyTorch Tip: Softmax with Temperature

  • Trending Models: sdxl-turbo, Starling-LM

  • Python Tip: Reduce for Cumulative Operations

Reading time: 4 min 48 sec

HIGHLIGHT
MLX: An array framework for Apple silicon

What's New
Apple has launched MLX, a new machine learning framework specifically optimized for Apple Silicon. It makes model development and deployment straightforward within Apple's ecosystem.

MLX is designed to offer a familiar working environment for those accustomed to existing ML frameworks, drawing inspiration from established players like NumPy, PyTorch, JAX, and ArrayFire.

How it Works
The Python API mirrors NumPy, and higher-level APIs like mlx.nn and mlx.optimizers closely match PyTorch, making MLX familiar to PyTorch users. Computation graphs are constructed dynamically, so changing argument shapes doesn't require recompilation. MLX uses lazy evaluation, materializing arrays only when needed.

Thanks to the unified memory model, models can run on the CPU or GPU without transferring data between devices. Arrays reside in shared memory, so operations can run on any supported device type without data migration. Data loading utilities are provided in the mlx-data library.

Key features include lazy computation, where calculations are deferred until necessary, and dynamic graph construction, which allows for more flexibility in changing function argument shapes without the overhead of recompilation. This makes debugging more straightforward and intuitive.
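To make the idea of lazy computation concrete, here is a minimal sketch in plain Python. It is not MLX code and does not use the MLX API: the `Lazy` class and the `source` helper are hypothetical names invented for illustration. It only shows the general pattern of building a graph first and running it when a result is requested.

```python
class Lazy:
    """Toy deferred value: records a computation and runs it only on demand."""

    def __init__(self, compute):
        self._compute = compute  # zero-argument function that produces the value
        self._value = None
        self._done = False

    def __add__(self, other):
        # Builds a new graph node; nothing is computed yet.
        return Lazy(lambda: [a + b for a, b in zip(self.eval(), other.eval())])

    def __mul__(self, other):
        # Same idea for element-wise multiplication.
        return Lazy(lambda: [a * b for a, b in zip(self.eval(), other.eval())])

    def eval(self):
        # Materialize once, then reuse the cached result.
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value


calls = []  # records when each input is actually read


def source(name, data):
    # Hypothetical helper: wraps raw data in a lazy node and logs the first read.
    def load():
        calls.append(name)
        return list(data)
    return Lazy(load)


a = source("a", [1, 2, 3])
b = source("b", [10, 20, 30])
c = (a + b) * a           # the graph is built, but nothing has run
print(calls)              # []
result = c.eval()         # the work happens here
print(result)             # [11, 44, 99]
print(calls)              # ['a', 'b'] (each input is read once, then cached)
```

Because the expression is only a description until `eval()` is called, unused branches cost nothing, which is the property the paragraph above attributes to lazy evaluation.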

The included model examples and data manipulation utilities enable rapid prototyping by building on recent advances like LLaMA, LoRA, and Stable Diffusion.

Features

  • Unified memory between CPU and GPU avoids data transfer

  • Familiar NumPy and PyTorch-like APIs

  • Lazy evaluation and dynamic graphs for faster iteration

  • Includes utils for efficient data loading

  • Lower-level C++ API also available

Introducing Deepgram Aura: A Text-to-speech API for Voice AI Agents

In an LLM-centric world, speech-to-text and text-to-speech technologies have become indispensable.

Introducing Aura, a powerful real-time text-to-speech (TTS) API designed for conversational voice applications. Compared to alternatives, Aura produces human-like speech more quickly and efficiently.

Learn more about Deepgram Aura, or be the first to try it out.

TRENDING REPOS

marker (☆ 3.6k)

Marker converts PDF, EPUB, and MOBI files to markdown. It's 10x as fast as nougat and more accurate, and includes features like header/footer removal, equation conversion to LaTeX, and multi-language support.

gpt-fast (☆ 3.5k)

A PyTorch-native implementation of transformer models that achieves almost 200 tokens/second generation with Llama-2-7B on a single GPU. The repository illustrates versions with quantization, speculative decoding, and tensor parallelism, and is meant to be forked.

unsloth (☆ 1.7k)

Unsloth makes local LLM finetuning up to 5x faster without loss in accuracy with optimized GPU kernels. The package is compatible with Nvidia GPUs from 2018 onward.

meditron (☆ 1k)

A family of open-source medical Large Language Models adapted from Llama-2, with 7B and 70B parameters. Meditron-70B surpasses Llama-2-70B, GPT-3.5, and Flan-PaLM in medical reasoning, and sports a 4k token context length.

rags (☆ 4.7k)

A Streamlit app that lets you build a Retrieval-Augmented Generation (RAG) pipeline over your own data using just natural language. RAGs supports setting RAG parameters like top-k retrieval and chunk size via the UI. The app is compatible with LLMs from OpenAI, Anthropic, Hugging Face, and Replicate.

Speechmatics has launched Real-Time Translation as part of its all-in-one Speech API.

Their new self-supervised model can bring your product or service to the largest audience possible, without the hassle of multiple different language APIs and lengthy setup times.

PYTORCH TIP
Softmax with Temperature

Softmax with temperature scaling is a technique in deep learning to control the sharpness of the probability distribution output by the softmax function. By adjusting the temperature parameter, you can make the distribution softer (higher temperature) or sharper (lower temperature).
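For reference, the operation can be written out as a formula. The symbols here are notation chosen for this note rather than taken from any library: z is the vector of logits, T > 0 is the temperature, and p is the resulting probability vector.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}
```

Setting T = 1 recovers the standard softmax. As T grows, the probabilities move toward uniform; as T shrinks toward 0, the mass concentrates on the largest logit.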

When To Use

  • Classification: For classification networks trained on limited or biased data, the default softmax output may be overconfident. Increase temperature to lower confidence.

  • Reinforcement Learning: Use temperature to adjust the balance between exploration and exploitation. Increase softmax temperature to encourage exploration.

  • Model Ensembles: Control the way confidence scores from constituent models are combined.

Benefits

  • Flexible: Offers a simple yet effective way to adjust the behavior of the softmax function.

  • Interpretable: Physically inspired, temperature scaling offers an intuitive control knob.

  • Compatible with Gradient-Based Training: The temperature-scaled softmax is differentiable, so temperature can be incorporated directly into the model architecture and trained end-to-end.


import torch
import torch.nn.functional as F

def softmax_with_temp(logits, temp=1.0):
    # Divide the logits by the temperature, then normalize over the last dimension
    return F.softmax(logits / temp, dim=-1)

logits = torch.tensor([[1.0, 2.0, 3.0]])

# Apply softmax with temperatures

# Default temperature (1.0)
default = softmax_with_temp(logits)

# Sharper
colder = softmax_with_temp(logits, 0.5)

# Softer
warmer = softmax_with_temp(logits, 2.0)

print('Default:', default.tolist())
print('Colder:', colder.tolist())
print('Warmer:', warmer.tolist())

Default: [[0.09003057330846786, 0.2447284758090973, 0.6652409434318542]]
Colder: [[0.01587624102830887, 0.11731042712926865, 0.8668133616447449]]
Warmer: [[0.18632373213768005, 0.30719590187072754, 0.5064803957939148]]
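The exploration use case listed under When To Use can be sketched the same way. The version below is an illustration in plain Python (standard library only, so it runs without PyTorch). The `q_values` list and the `sample_action` helper are hypothetical, and `softmax_with_temp` is reimplemented here for Python lists rather than tensors.

```python
import math
import random


def softmax_with_temp(values, temp=1.0):
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    m = max(values)
    exps = [math.exp((v - m) / temp) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, temp, rng):
    # Draw one action index with probability given by the tempered softmax.
    probs = softmax_with_temp(q_values, temp)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


q_values = [1.0, 2.0, 3.0]  # hypothetical action-value estimates
rng = random.Random(0)      # fixed seed so runs are repeatable

for temp in (0.1, 1.0, 10.0):
    picks = [sample_action(q_values, temp, rng) for _ in range(1000)]
    counts = [picks.count(a) for a in range(len(q_values))]
    print(f"temp={temp}: picks per action = {counts}")
```

At a low temperature almost every draw should land on the highest-valued action (exploitation), while at a high temperature the draws spread out across all three actions (exploration).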

PYTHON TIP
Reduce for Cumulative Operations

The functools.reduce function is a powerful tool for performing cumulative operations on iterables. It successively applies an operation to the elements of an iterable, reducing it to a single cumulative value.

When To Use

  • Iterative Calculations: Ideal for summing, multiplying, concatenating strings, converting number bases, and other scenarios with successive application of a function.

  • Data Processing Pipelines: In conjunction with map and filter operations, iterative reduction can streamline sequences of operations that transform data.

Benefits

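  • Efficiency: Consumes the iterable one item at a time without building intermediate lists, and can be faster than an explicit loop when paired with built-in functions such as operator.mul.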

  • Flexibility: Can be used with any function that takes two inputs and returns one output.

  • Readability: Simplifies complex operations into concise, readable code.


from functools import reduce

# Function to apply (e.g., to calculate product)
def multiply(x, y):
    return x * y

# Iterable (e.g., a list of numbers)
numbers = [1, 2, 3, 4, 5]

# Using reduce to calculate the product of numbers
product = reduce(multiply, numbers)

print("Product of numbers:", product)

# Output:
# Product of numbers: 120
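
The same pattern covers the other uses listed under When To Use. Here is a small sketch of two of them: converting digits to a number in a given base, and a filter/map/reduce pipeline. The function name `digits_to_int` and the sample data are made up for illustration.

```python
from functools import reduce


def digits_to_int(digits, base=10):
    # The third argument (0) is the initial accumulator value, which also
    # makes the call safe for an empty sequence.
    return reduce(lambda acc, d: acc * base + d, digits, 0)


print(digits_to_int([1, 0, 1, 1], base=2))  # 11
print(digits_to_int([4, 2]))                # 42

# filter -> map -> reduce: sum of the squares of the even numbers
numbers = [1, 2, 3, 4, 5]
evens = filter(lambda n: n % 2 == 0, numbers)
squares = map(lambda n: n * n, evens)
sum_of_even_squares = reduce(lambda acc, n: acc + n, squares, 0)

print(sum_of_even_squares)                  # 20
```

Passing an initial value is a good habit: without it, reduce raises a TypeError on an empty iterable.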

Thank You

Igor Tica is a contributing writer at AlphaSignal and a research engineer at SmartCat, focusing on computer vision. He's actively seeking partnerships in self-supervised and contrastive learning.

Jacob Marks, an editor at AlphaSignal and ML engineer at Voxel51, is recognized as a leading AI voice on Medium and LinkedIn. Formerly at Google X and Samsung, he holds a Ph.D. in Theoretical Physics from Stanford.

Want to promote your company, product, job, or event to 150,000+ AI researchers and engineers? You can reach out here.