⚡️ GPU-Level Inference on Your CPU

Your weekly technical digest of top projects, repos, tips and tricks to stay ahead of the curve.

Lior Sinclair
October 20, 2023

AlphaSignal

Hey,

Welcome to this week's edition of AlphaSignal, the newsletter for AI professionals.

Whether you are a researcher, engineer, developer, or data scientist, our summaries ensure you're always up-to-date with the latest breakthroughs in AI.

Let's get into it!

Lior

In Today’s Summary:

Repo Highlight: DeepSparse
Trending Repos: promptflow, litellm, dreamgaussian
PyTorch Tip: TensorBoard
Trending Models: med42-70b, BakLLaVA-1
Python Tip: “Else” in Loops

Reading time: 4 min 9 sec

HIGHLIGHT
⚡️ DeepSparse: Enabling GPU-Level Inference on Your CPU

What’s New
DeepSparse accelerates inference for deep learning models on CPUs using neural network sparsity. Through sparse kernels, 8-bit quantization, pruning, and intelligent caching of attention keys/values, the library enables GPU-like inference speeds for LLMs on commodity CPUs.

Why Does It Matter
Traditionally, deploying deep learning models on CPUs has meant compromising speed for scalability and flexibility. DeepSparse's support for efficient inference on CPUs means that you can deploy performant models without being tied to accelerators, reducing costs and barriers to entry in developing ML applications.

How it Works
The framework uses a method from a recent study called "Sparse Finetuning" to prune and quantize the model. For MosaicML’s MPT-7B, this technique enables pruning to 60% sparsity without losing accuracy. With DeepSparse, the sparse model runs 7 times faster than the original dense model.
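
To make the two ideas above concrete, here is a minimal sketch of magnitude pruning to 60% sparsity followed by symmetric 8-bit quantization on a toy weight list. It is an illustration only, not DeepSparse's implementation or the Sparse Finetuning method; the helper names prune_by_magnitude and quantize_int8 and the sample weights are hypothetical.

```python
# Illustrative sketch only: plain-Python magnitude pruning and symmetric
# int8 quantization. This is NOT how DeepSparse is implemented.

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude weights so that `sparsity` of them are zero."""
    n_prune = round(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


# Hypothetical toy weights standing in for one layer of a model
weights = [0.82, -0.05, 0.31, -0.67, 0.02, 0.11, -0.44, 0.09, -0.93, 0.20]

sparse = prune_by_magnitude(weights, sparsity=0.6)
quantized, scale = quantize_int8(sparse)

print("pruned:   ", sparse)
print("quantized:", quantized)
print("zeros:    ", sparse.count(0.0), "of", len(sparse))
```

A sparsity-aware runtime can skip the zeroed entries entirely and work with 8-bit integers instead of 32-bit floats, which is where the CPU speedup comes from.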

Features

Broad Model Compatibility: Supports LLMs, BERT, ViT, ResNet, and other popular architectures.
Versatile Deployment: Adaptable across various hardware, from cloud to edge.
Seamless Integration: Reduces deployment complexities and compatibility issues.

⚙️ TRENDING REPOS

microsoft / promptflow (☆ 6k)
Prompt flow is a suite of development tools designed to support the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, and evaluation to production deployment and monitoring.

BerriAI / litellm (☆ 2k)
LiteLLM provides a unified interface for calling over 100 LLM APIs in the OpenAI format, covering providers such as Anthropic, HuggingFace, Cohere, TogetherAI, and Azure.

dreamgaussian / dreamgaussian (☆ 2k)
DreamGaussian is a novel 3D content generation framework that offers both efficiency and quality. It relies on a 3D Gaussian Splatting model and can produce high-quality textured meshes from a single-view image in just 2 minutes, roughly 10 times faster than prior state-of-the-art methods.

aigc-apps / sd-webui-EasyPhoto (☆ 3k)
EasyPhoto is a Stable Diffusion web UI (SD-WebUI) plugin for generating AI portraits by training a digital doppelganger of users with 5 to 20 images. It provides a browser interface based on the Gradio library for Stable Diffusion models.

audio-agi / audiosep (☆ 1k)
AudioSep is a foundation model for open-domain sound separation with natural language queries. It demonstrates strong separation performance and zero-shot generalization ability on numerous tasks such as audio event separation, musical instrument separation, and speech enhancement.

PYTORCH TIP
TensorBoard

TensorBoard is a visualization toolkit from TensorFlow that can be seamlessly integrated with PyTorch for tracking and monitoring experiments. It offers powerful visualization tools for model training metrics, network architecture, and even embeddings.

When To Use

Training Monitoring: Track loss, accuracy, and other metrics in real time during training.
Network Visualization: See a detailed view of your model architecture.
Embedding Visualization: Display high-dimensional embeddings in 3D space.

Benefits

Interactive Visualizations: Explore metrics through interactive charts and graphs.
Hyperparameter Tuning: Track and compare different training runs to refine hyperparameters.
Collaboration: Share experiment results with your team.

By launching TensorBoard and pointing it to the runs directory (for example, with the command tensorboard --logdir=runs), you can visualize your training progress and model architecture interactively.


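import torch
import torchvision
from torch.utils.tensorboard import SummaryWriter

# Init writer and model
writer = SummaryWriter('runs/demo')
model = torchvision.models.resnet50()
# Random input batch shaped (batch, channels, height, width),
# used only to trace the model graph
dummy_data = torch.rand(1, 3, 224, 224)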

# Add model graph
writer.add_graph(model, dummy_data)

# Fake training loop for demo
for epoch in range(5):
    loss = epoch * 0.1  # Simulated loss
    writer.add_scalar('train_loss', loss, epoch)

# Close writer
writer.close()

🗳️ TRENDING MODELS

med42-70b
Med42 is a clinical large language model designed to provide high-quality answers to medical questions and enhance clinical decision-making. It demonstrates competitive performance across multiple medical benchmarks, outperforming several notable models in various tests.

BakLLaVA-1
BakLLaVA 1, a multimodal architecture, is a Mistral 7B base model augmented with the LLaVA 1.5 architecture, demonstrating superior performance over Llama 2 13B on various benchmarks. It represents an open-source alternative to proprietary visual-text models (e.g. GPT-4).

OpenHermes-2-Mistral-7B
This model is a state-of-the-art fine-tuned version of Mistral 7B. It is trained on 900,000 entries, primarily GPT-4-generated data, and can be used for answering questions and assisting in areas covered by its training data (e.g. related to the GPT4All, BigBench, and AGI Eval datasets).

PYTHON TIP
“Else” in Loops

The ‘else’ clause in loops is a unique Python feature that allows you to execute a block of code only when the loop completes without encountering a ‘break’ statement. This can be especially useful when you're searching for items in a list and want to execute specific logic if the item isn't found.

When To Use

Search Operations: When iterating through a collection and looking for a specific item, the ‘else’ clause can indicate that the search was unsuccessful.
Validation Checks: Verify if a loop completes without any early exit, indicating that all items meet a certain condition.

Benefits

Improved Readability: Explicitly communicates the intention of handling cases where the loop ran to completion.
Efficient Logic Handling: Avoids additional flags or checks outside the loop to determine if the loop completed fully.


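# Example data and predicate (placeholders; substitute your own)
my_list = [3, 8, 15]


def condition(item):
    return item > 10


for item in my_list:
    if condition(item):
        # Found the item or
        # condition satisfied
        break
else:
    # This block runs only if the
    # loop didn't encounter a 'break'
    print(
        "Item not found or condition "
        "not met for any item."
    )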
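
The snippet above covers the search case. As a sketch of the validation-check case, the example below uses the same for/else pattern to confirm that every item passes a test. The function name all_within_limit and the sample readings are made up for illustration.

```python
def all_within_limit(readings, limit):
    """Return True if every reading is at or below `limit`."""
    for value in readings:
        if value > limit:
            # Early exit: one bad value is enough to fail validation
            print(f"Rejected: {value} exceeds {limit}")
            break
    else:
        # Reached only when the loop finished without hitting 'break'
        print("All readings within limit")
        return True
    return False


print(all_within_limit([1, 2, 3], limit=5))
print(all_within_limit([1, 9, 3], limit=5))
```

Note that the else block also runs when the collection is empty, since the loop completes without a break. For a pure yes/no check, the built-in all() is usually shorter; for/else is most useful when the loop body also does work such as logging or recording the offending item.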
