
📚 Microsoft's New Generative AI Course

Your weekly technical digest of top projects, repos, tips and tricks to stay ahead of the curve.

AlphaSignal

Hey,

Welcome to this week's edition of AlphaSignal, the newsletter for AI professionals.

Whether you are a researcher, engineer, developer, or data scientist, our summaries ensure you're always up-to-date with the latest breakthroughs in AI.

Let's get into it!

Lior

On Today’s Summary:

  • Repo Highlight: The Generative AI Course

  • Trending Repos: videocrafter, super-gradients

  • Pytorch Tip: Model Ensembling

  • Trending Models: Yi-34B, kosmos-2

  • Python Tip: Regular Expressions

Reading time: 4 min 38 sec

HIGHLIGHT
⚡️ The Generative AI Course by Microsoft

What’s New
Microsoft’s Generative AI course distills the essentials of Generative AI into a 12-lesson curriculum. It covers prompting, vector search, text and image generation, and building responsibly, using Jupyter Notebooks for hands-on learning and real-world project assignments.

Each lesson includes:

  • a short video introduction to the topic

  • a written lesson located in the README

  • for project-based lessons, a Jupyter Notebook with code examples

  • a challenge or assignment to apply your learning

  • links to extra resources to continue your learning

Features

  • Acquire hands-on experience in deploying LLMs for robust Generative AI application development.

  • Master prompt engineering techniques to optimize AI model responses and functionality.

  • Implement Azure OpenAI services for scalable, enterprise-level machine learning solutions.

  • Utilize vector databases and embeddings for advanced semantic search capabilities in AI (see the sketch after this list).

  • Learn secure API management and ethical AI use within professional engineering workflows.
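
The vector search bullet above boils down to embedding documents and ranking them by similarity. A minimal, provider-agnostic sketch of the idea, using random vectors as stand-ins for real embeddings (e.g. from an Azure OpenAI embedding model):

import numpy as np

# Toy stand-ins for real embeddings; each row is one document
doc_embeddings = np.random.randn(100, 1536)
query_embedding = np.random.randn(1536)

# Normalize so dot products become cosine similarities
docs = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
query = query_embedding / np.linalg.norm(query_embedding)

# Rank documents by cosine similarity to the query
scores = docs @ query
top_k = scores.argsort()[::-1][:5]
print("Top matches:", top_k)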

Speech is the world’s most untapped resource. Don’t underutilize it.

Speech Intelligence by Speechmatics makes audio and media the biggest drivers of value for your business.

It combines highly accurate automatic speech recognition (ASR) with the latest breakthroughs in AI and Large Language Models to interpret, analyze, and understand huge volumes of the spoken word to:

  • Drive customer insights

  • Unlock global translation

  • Understand speech across multiple languages

  • All in real time

This is foundational speech technology for the AI era. Get started for free, without code.

⚙️ TRENDING REPOS

ailab-cvc / videocrafter (☆ 2.8k)
VideoCrafter introduces two open-source diffusion models for video generation: a text-to-video model that generates realistic, cinematic-quality videos at 1024×576, and the first open-source image-to-video foundation model that preserves the style, structure, and content of the reference image.

Deci-AI / super-gradients (☆ 3.4k)
Super Gradients is a library for training state-of-the-art (SOTA) computer vision models, supporting quantization-aware training, knowledge distillation, and transfer learning. The library was used to train the SOTA detection and pose estimation models YOLO-NAS and YOLO-NAS Pose.
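
Getting a pretrained model out of the library takes only a few lines; a minimal inference sketch, with the model name, weights identifier, and image path as assumptions from the SuperGradients docs:

from super_gradients.training import models

# Load YOLO-NAS with pretrained COCO weights
# ("yolo_nas_l" and "coco" are assumed identifiers)
model = models.get("yolo_nas_l", pretrained_weights="coco")

# Run inference on an image and visualize the detections
model.predict("image.jpg").show()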

Vaibhavs10 / insanely-fast-whisper (☆ 1.1k)
Insanely Fast Whisper rapidly accelerates transcription tasks by combining OpenAI's Whisper Large v2 with 🤗 Transformers, Optimum, and flash attention. On a Google Colab T4 GPU, the process can transcribe five hours of audio in less than five minutes. The library has Python and CLI support.
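
The recipe can be approximated with the stock 🤗 Transformers ASR pipeline; a minimal sketch, with the chunk length, batch size, and file name as illustrative assumptions (the repo layers further optimizations such as flash attention on top):

import torch
from transformers import pipeline

# Whisper Large v2 in half precision on GPU
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# Chunked, batched transcription of a long audio file
outputs = pipe(
    "audio.mp3",
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)
print(outputs["text"])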

deepseek-ai / DeepSeek-Coder (☆ 1.3k)
DeepSeek Coder introduces state-of-the-art open-source code completion and infilling models trained on a diverse 2T-token dataset. The models range from 1B to 33B parameters, specialize in project-level code completion, and can be used with Hugging Face's transformers library, as sketched below.
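
A minimal sketch of that transformers usage, assuming the published 1.3B base checkpoint and illustrative generation settings:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load a small DeepSeek Coder checkpoint from the Hugging Face Hub
model_id = "deepseek-ai/deepseek-coder-1.3b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Complete a code prompt
prompt = "# write a function to check if a number is prime\ndef is_prime(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))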

jianchang512 / pyvideotrans (☆ 1.2k)
Video translation tool with dubbing, combining OpenAI’s Whisper for speech recognition, multiple interfaces for text translation, and Microsoft Edge TTS for speech synthesis. Users can set the dubbing speed and the duration of silence for splitting speech into segments.

PYTORCH TIP
Model Ensembling

Model ensembling is a technique where predictions from multiple models are combined to produce a final prediction, leveraging the strengths of each model to achieve better performance than any single model could on its own. There are many ensembling techniques, including bagging (training models independently and averaging their predictions) and boosting (training models sequentially so that each corrects its predecessors' errors). Today, we'll focus on bagging-style averaging.

When To Use

  • Varied Predictors: When you have models trained on different aspects of the data or with different architectures that capture various patterns.

  • Performance Boost: To improve predictive power on complex tasks where a single model's perspective is insufficient.

  • Stability: To create a more robust model that is less sensitive to the variance within individual model predictions.

Benefits

  • Accuracy: Often yields higher accuracy by averaging out errors from individual models.

  • Robustness: Reduces the chance of an occasional bad prediction affecting the overall performance.

  • Confidence: Ensemble models can provide a measure of confidence in predictions based on the level of agreement between models.


import torch

# Dummy model for demonstration: softmax turns
# raw scores into a probability distribution
class Model(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.softmax(x, dim=1)

# Instantiate two models
model1 = Model()
model2 = Model()

# Dummy input tensor
input_tensor = torch.randn(1, 10)

# Make predictions
pred1 = model1(input_tensor)
pred2 = model2(input_tensor)

# Ensemble predictions by averaging
ensemble_pred = (pred1 + pred2) / 2

print(ensemble_pred)

# OUT: tensor([[0.0284, 0.0401, 0.0854, ...]])
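
On PyTorch 2.0+, the same averaging can be vectorized with torch.func, which stacks the models' parameters and runs all members in one batched call. A minimal sketch, using a parameterized layer so there are weights to stack (a complementary approach, not part of the tip above):

import copy
import torch
from torch.func import stack_module_state, functional_call

# Two independently initialized copies of the same architecture
models = [torch.nn.Linear(10, 4) for _ in range(2)]
params, buffers = stack_module_state(models)

# A stateless "skeleton" of the architecture on the meta device
base = copy.deepcopy(models[0]).to("meta")

def call_one(p, b, x):
    return functional_call(base, (p, b), (x,))

x = torch.randn(1, 10)

# Run every ensemble member in a single vectorized call
preds = torch.vmap(call_one, in_dims=(0, 0, None))(params, buffers, x)

# Average over the ensemble dimension
ensemble_pred = preds.mean(dim=0)
print(ensemble_pred.shape)  # torch.Size([1, 4])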

🗳️ TRENDING MODELS/SPACES

Yi-34B
The Yi series comprises advanced bilingual (English/Chinese) LLMs with 6B and 34B parameters, excelling at long-context understanding with context windows of up to 200K tokens. The Yi-34B models achieve state-of-the-art performance on a range of benchmarks, from MMLU to reading comprehension.

kosmos-2-patch14-224
Kosmos-2 is a Multimodal Large Language Model (MLLM) from Microsoft which represents referring expressions as Markdown links. The model is compatible with the Hugging Face transformers library, and can be used for multimodal grounding, multimodal referring, perception-language tasks, and language understanding.

TinyLlama-1.1B-intermediate-step-715k-1.5T
TinyLlama is a compact 1.1B-parameter Llama model being pretrained on 3 trillion tokens; this intermediate checkpoint corresponds to roughly 1.5T tokens seen. The model is built for efficiency and low-footprint applications, and is designed to be compatible with the Llama 2 architecture and tokenizer. This intermediate checkpoint is not intended for use in production.

PYTHON TIP
Regular Expressions (Regex)

Regular expressions are a powerful tool for performing complex, pattern-based searches and manipulations on strings. Python's re module makes it easy to apply them for efficient parsing, splitting, searching, and replacing of text.

When To Use

  • Parsing Text: When you need to extract structured information from strings, like phone numbers or email addresses.

  • Data Validation: To check if strings follow a certain pattern, such as verifying user input.

  • Search and Replace: When you need to find and replace substrings in text data efficiently.

Benefits

  • Flexibility: Regular expressions can match a wide range of patterns with a single expression.

  • Efficiency: The re library supports compiling regular expressions so they can be efficiently applied multiple times.

  • Scalability: Regular expressions can handle large texts and perform many types of manipulations in a single pass.


import re

# Define a pattern to
# match email addresses
email_pattern = (
    r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+"
    r"\.[a-zA-Z]{2,}"
)

# A placeholder address for demonstration
text = (
    "Please contact us at support@example.com "
    "for assistance."
)

# Search for the pattern in the text
match = re.search(email_pattern, text)

if match:
    print("Email found:", match.group())
else:
    print("No email found in the text.")

# OUT: Email found: support@example.com

How was today’s email?

Not Great      Good      Amazing

Thank You

Igor Tica is a contributing writer at AlphaSignal and a research engineer at SmartCat, focusing on computer vision. He's actively seeking partnerships in self-supervised and contrastive learning.

Jacob Marks, an editor at AlphaSignal and an ML engineer at Voxel51, is recognized as a leading AI voice on Medium and LinkedIn. Formerly at Google X and Samsung, he holds a Ph.D. in Theoretical Physics from Stanford.

Want to promote your company, product, job, or event to 150,000+ AI researchers and engineers? You can reach out here.