
🦙 Video-LLaMA and Other AI Repos You Should Know About

Your weekly technical digest of top projects, repos, tips and tricks to stay ahead of the curve.

AlphaSignal

Hey,

Welcome to this week's edition of AlphaSignal, the newsletter for AI experts.

Whether you are a researcher, engineer, developer, or data scientist, our summaries ensure you're always up-to-date with the latest breakthroughs in AI.

Let's get into it!

Lior

On Today’s Summary:

  • Repo highlight: Video-LLaMA

  • Top of GitHub: pyspark-ai, GLM-130B, MobileSAM, ...

  • New AI Tools: Ortus, Deepdub Go, AudioPaLM, ...

  • PyTorch Tip: named_parameters()

HIGHLIGHT
🦙 Video-LLaMA

Video-LLaMA is an innovative framework that allows Large Language Models (LLMs) to understand both visual and auditory content in videos.

Unlike previous vision-LLMs such as MiniGPT-4 and LLaVA, which focus on static images, Video-LLaMA tackles video understanding, which poses two main challenges: capturing how visual scenes change over time and integrating audio-visual signals.

To address the first challenge, it introduces a Video Q-Former that builds on a pre-trained image encoder to process video frames, and proposes a video-to-text generation task to connect video representations with language.

To address the second, Video-LLaMA uses ImageBind, a pre-trained encoder that spans multiple modalities, as its audio encoder, and adds an Audio Q-Former to learn effective auditory query embeddings.

Video-LLaMA is trained on large numbers of video/image-caption pairs and fine-tuned on visual-instruction datasets. This training aligns the outputs of the visual and audio encoders with the embedding space of the LLM. The model has demonstrated the ability to understand video content and generate meaningful responses grounded in both visual and auditory information, making it a promising prototype for audio-visual AI assistants. A rough sketch of the dual-branch design appears below.
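To make the idea concrete, here is a minimal, purely illustrative PyTorch sketch of the two Q-Former branches. Every module, dimension, and name below is an assumption for illustration, not the paper's actual code.

import torch
import torch.nn as nn

# Illustrative sizes only (not the paper's real dimensions)
D_ENC, D_LLM, N_QUERY = 768, 4096, 32

class QFormerSketch(nn.Module):
    """Stand-in for a Q-Former: learnable queries cross-attend to encoder features."""
    def __init__(self, d_enc=D_ENC, n_query=N_QUERY):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, n_query, d_enc))
        self.cross_attn = nn.MultiheadAttention(d_enc, num_heads=8, batch_first=True)

    def forward(self, feats):                      # feats: (batch, tokens, d_enc)
        q = self.queries.expand(feats.size(0), -1, -1)
        out, _ = self.cross_attn(q, feats, feats)  # queries attend to frame/audio features
        return out                                 # (batch, n_query, d_enc)

video_qformer = QFormerSketch()    # temporal branch, fed by the frozen image encoder
audio_qformer = QFormerSketch()    # audio branch, fed by the frozen ImageBind encoder
to_llm = nn.Linear(D_ENC, D_LLM)   # projects query embeddings into the LLM's space

frame_feats = torch.randn(2, 8 * 256, D_ENC)       # fake features: 8 frames x 256 patches
video_tokens = to_llm(video_qformer(frame_feats))  # soft prompts for the frozen LLM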

Access An Index Of Billions Of Pages With A Single API Call.

If your work involves AI, then you know the overwhelming need for new data. Your competitors might be building incredible products…but if they’re all using the same datasets to train their models, then they’re at a disadvantage.

The Brave Search API gives you access to an independent, global search index to train LLMs and power AI applications.

Brave Search is the fastest-growing search engine since Bing, and it’s 100% independent from Big Tech. Its index features billions of pages of high-quality data from real humans - and it’s constantly refreshed thanks to being default in the Brave browser.

Get started testing the API for free.
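For a feel of the interface, here is a minimal sketch of a web-search call. The endpoint, header, and response shape follow Brave's public API docs as we understand them; the key and query are placeholders.

import requests

API_KEY = "YOUR_BRAVE_API_KEY"  # placeholder; issued via the Brave Search API dashboard

resp = requests.get(
    "https://api.search.brave.com/res/v1/web/search",
    headers={"Accept": "application/json", "X-Subscription-Token": API_KEY},
    params={"q": "multimodal video LLMs", "count": 5},
)
resp.raise_for_status()

# Print title and URL for each web result
for result in resp.json().get("web", {}).get("results", []):
    print(result["title"], "->", result["url"])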

⚙️ TOP OF GITHUB

databrickslabs / pyspark-ai
It takes English instructions and compiles them into PySpark objects like DataFrames. Its goal is to make Spark more user-friendly and accessible, allowing you to focus your efforts on extracting insights from your data.
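A quick sketch of the workflow, with method names taken from the repo's README at the time of writing (an OPENAI_API_KEY is assumed in the environment):

from pyspark.sql import SparkSession
from pyspark_ai import SparkAI

spark = SparkSession.builder.getOrCreate()
spark_ai = SparkAI()   # defaults to an OpenAI-backed LLM via OPENAI_API_KEY
spark_ai.activate()    # attaches the .ai namespace to PySpark DataFrames

df = spark.createDataFrame(
    [("US", 331), ("IN", 1408), ("BR", 214)],
    ["country", "population_millions"],
)

# The English instruction is compiled into a DataFrame transformation
big = df.ai.transform("keep only countries with population above 300 million")
big.show()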

THUDM / GLM-130B
GLM-130B is an open bilingual (English & Chinese) bidirectional dense model with 130 billion parameters, pre-trained using the algorithm of the General Language Model (GLM).

ChaoningZhang / MobileSAM
Repository of the MobileSAM project, which makes the Segment Anything Model (SAM) lightweight for mobile applications.
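A sketch of point-prompted inference: the "vit_t" model key and checkpoint path come from the MobileSAM README, and the predictor interface mirrors the original SAM, so treat the details as assumptions.

import numpy as np
from mobile_sam import sam_model_registry, SamPredictor

# "vit_t" and the checkpoint filename follow the MobileSAM README
sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")
sam.eval()

predictor = SamPredictor(sam)
image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real RGB image
predictor.set_image(image)

# Segment around one foreground click at pixel (320, 240)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
)
print(masks.shape, scores)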

zwq2018 / Data-Copilot
Data-Copilot is a system based on LLM that handles data-related tasks and integrates data from diverse domains and user preferences, offering autonomous data management, analysis, and visualization.

chat2db / Chat2DB
Repository of an intelligent, versatile general-purpose SQL client and reporting tool for databases that integrates ChatGPT capabilities.

🛠 NEW TOOLS

Ortus
Ortus enhances YouTube viewing by providing real-time AI responses, precise timestamps, and video summaries. Integrated with Notion, it allows for automatic note-taking.

Deepdub Go
Deepdub Go automates video dubbing in 65 languages using AI. It combines transcription, translation, voice generation, and audio mixing. Unique features include emotion replication and natural idiom translation.

AudioPaLM
AudioPaLM excels in speech recognition, translation, and transcription. It uses PaLM-2's pre-training for enhanced processing and surpasses current tools in speech translation.

IMG.LY
IMG.LY enables in-browser image background removal, eliminating additional server costs and enhancing data privacy since it runs entirely on the user's device. Its core feature is fast, efficient background removal.

Aider
Aider is a command-line tool utilizing OpenAI's GPT models for effortless code writing, editing, and managing changes in git repos. It is also equipped to help GPT-4 handle larger codebases.

PYTORCH TIP
Method named_parameters()

This method is vital in scenarios where we want to leverage pre-trained models while ensuring that their weights remain fixed, or 'frozen'.

The key idea behind freezing the weights is to retain the 'knowledge' embedded within the pre-trained model while training it on a new task. It helps to prevent the model's well-learned features from being overwritten, hence acting as a significant catalyst in transfer learning.

The process of freezing weights revolves around setting requires_grad to False for every parameter in the network. PyTorch simplifies this using the named_parameters() method, which allows us to easily access all parameters of the model.

from torchvision.models import resnet50

# Instantiate a pre-trained ResNet-50
# (newer torchvision versions use weights=... instead of pretrained=True)
transfer_model = resnet50(pretrained=True)

# Freeze every weight in the model
for name, param in transfer_model.named_parameters():
    param.requires_grad = False
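Because named_parameters() yields (name, parameter) pairs, the same loop can freeze selectively. For example, to fine-tune only the final classification layer (named "fc" in torchvision's ResNet-50):

# Unfreeze only the final fully connected layer
for name, param in transfer_model.named_parameters():
    param.requires_grad = name.startswith("fc")

# Confirm which parameters will actually train
trainable = [n for n, p in transfer_model.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']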

Want to promote your company, product, job, or event to 100,000+ AI researchers and engineers? You can reach out here.

How was today’s email?

Not Great      Good      Amazing

Thank You
