🌐 Can GPT Control Your Browser?

Your weekly technical digest of top projects, repos, tips and tricks to stay ahead of the curve.

AlphaSignal

Hey,

Welcome to this week's edition of AlphaSignal, the newsletter for AI professionals.

Whether you are a researcher, engineer, developer, or data scientist, our summaries ensure you're always up-to-date with the latest breakthroughs in AI.

Let's get into it!

Lior

In Today’s Summary:

  • Repo Highlight: vimGPT

  • Trending Repos: EmotiVoice, openchat

  • Pytorch Tip: PyTorch Lightning

  • Trending Models: hallucination_evaluation_model, lcm-lora-sdxl

  • Python Tip: Try-Except-Else Pattern

Reading time: 4 min 49 sec

HIGHLIGHT
🕸️ vimGPT: Giving GPT-4V Access to Your Browser

What’s New
vimGPT lets you control your browser through GPT prompting. It gives GPT-4V the ability to interact dynamically with web content via Vimium, a keyboard-based web-navigation extension for Chrome.

How it Works
Determining which element the model wants to click is hard without handing it the browser DOM as text. Instead, vimGPT leans on Vimium's on-screen keyboard hints: screen captures from the web driver are passed to GPT-4V, and the model takes iterative actions through a type, click, and navigate API.
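
To make this concrete, here is a minimal, hypothetical sketch of such a perceive-act loop using Playwright and the OpenAI client. This is not vimGPT's actual code: it omits loading the Vimium extension (which requires a persistent Chromium context), the real prompt, and robust action parsing.

import base64
from openai import OpenAI
from playwright.sync_api import sync_playwright

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with sync_playwright() as p:
    # Real vimGPT loads Vimium via a persistent Chromium context; omitted here.
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://news.ycombinator.com")

    for _ in range(5):  # a few perceive-act iterations
        shot = base64.b64encode(page.screenshot()).decode()
        resp = client.chat.completions.create(
            model="gpt-4-vision-preview",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": (
                        "This is a browser screenshot with Vimium hints. "
                        "Reply with exactly one action: CLICK <hint letters>, "
                        "TYPE <text>, or DONE.")},
                    {"type": "image_url", "image_url": {
                        "url": f"data:image/png;base64,{shot}"}},
                ],
            }],
        )
        action = resp.choices[0].message.content.strip()
        if action.startswith("CLICK "):
            page.keyboard.type(action.split(" ", 1)[1])  # press the hint letters
        elif action.startswith("TYPE "):
            page.keyboard.type(action.split(" ", 1)[1])
        else:
            break
    browser.close()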

Features

  • GPT-4V Integration: Utilizes the vision capabilities of GPT-4V to interpret web content.

  • Vimium Extension: Employs Vimium to give the model a simple API.

  • Setup and Customization: Users can easily install Python dependencies and configure Vimium via Playwright, offering a customizable web browsing experience.

Going Live at 10AM PT: Granica + Nylas Gen AI Fireside Chat

Email and calendar data are a hidden gem for training LLMs. But how do you harness these data sets while navigating scalability and security challenges?

Rahul Ponnala, co-founder and CEO at Granica, and Troy Allen, SVP Engineering at Nylas, are bringing real case studies, strategies, and benchmarks to guide you. It starts in just a few hours, so register before it's too late!

Date: November 15, 2023
Time: 10am - 10:45am PT / 1pm - 1:45pm ET

⚙️ TRENDING REPOS

netease-youdao / EmotiVoice (☆ 1.9k)
EmotiVoice is an open-source text-to-speech engine that can infuse synthesized speech with a range of emotions, enabling users to create more dynamic and expressive audio content. The engine supports both English and Chinese, with over 2000 voices.

imoneoi / openchat (☆ 2.8k)
OpenChat is an open-source language model library featuring models fine-tuned with C-RLFT (conditioned reinforcement learning fine-tuning), which enables learning from mixed-quality data without preference labels. OpenChat’s 7B-parameter model achieves performance comparable to ChatGPT.

langchain-ai / opengpts (☆ 2.6k)
Building upon LangChain, LangServe, and LangSmith, OpenGPTs offers an open-source, customizable alternative to OpenAI's GPTs. Users can select from over 60 language models, tailor prompts, and integrate various tools and databases. The platform features a sandbox for testing, tools for web browsing, and options for publishing and sharing chatbots.

daveshap / OpenAI_Agent_Swarm (☆ 2.1k)
The Hierarchical Autonomous Agent Swarm (HAAS) seeks to create a self-organizing and ethical ecosystem of AI agents. The system features a governing ethical Supreme Oversight Board (SOB), Executive Agents and specialized Sub-Agents for complex task-solving, and is designed to be self-expanding.

stas00 / ml-engineering (☆ 2.3k)
A collection of essential methodologies and scripts for training large language and multi-modal models, covering everything from performance to development and testing. The repo is an invaluable resource for large-scale model training.

Speech is the world’s most untapped resource. Don’t underutilize it.

Speechmatics lets you transcribe, translate, summarize, interpret, analyze, and understand the spoken word in real time.

This is foundational speech technology for the AI era.

PYTORCH TIP
PyTorch Lightning

PyTorch Lightning is an extension of PyTorch that abstracts away complex boilerplate code, enabling more modular and scalable deep learning projects. It automates training loops, validation and testing, multi-GPU distribution, and early stopping, while preserving PyTorch's flexibility. It's ideal for rapid, organized prototyping and development of ML models.

When To Use

  • Research and Experimentation: Ideal for rapidly testing new ideas without worrying about the underlying engineering complexity.

  • Large-scale Projects: Facilitates managing and scaling larger models and datasets with less effort.

  • Reproducibility: Ensures consistent setup across different environments, aiding in reproducibility of experiments.

Benefits

  • Cleaner code: abstracts boilerplate so you can focus on model, data, and training logic.

  • Scalability: supports multi-GPU, TPU, and distributed training with minimal code changes.

  • Rapid prototyping: accelerates the development cycle from research to production.

  • Reproducibility: ensures experiments can be easily reproduced and shared.

  • Advanced features: enables gradient accumulation, mixed precision, and more, with less complexity (a Trainer configuration sketch follows the example below).


import pytorch_lightning as pl
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Define a model by extending the LightningModule
class SimpleModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(10, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, _):
        x, y = batch
        y_hat = self(x)
        return nn.functional.cross_entropy(y_hat, y)  # returned loss is backpropagated

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

# Data preparation
x, y = torch.randn(100, 10), torch.randint(0, 2, (100,))
loader = DataLoader(TensorDataset(x, y), batch_size=32)

# PyTorch Lightning Trainer simplifies the training process
trainer = pl.Trainer(max_epochs=5)
trainer.fit(SimpleModel(), loader)
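
The same LightningModule scales without touching its code: distribution, precision, and gradient accumulation are Trainer arguments. A minimal sketch, assuming PyTorch Lightning 2.x flag names and a machine with two GPUs:

# Hypothetical configuration: values assume PyTorch Lightning 2.x and 2 GPUs.
trainer = pl.Trainer(
    max_epochs=5,
    accelerator="gpu",          # run on GPU instead of CPU
    devices=2,                  # data-parallel training across 2 GPUs
    precision="16-mixed",       # mixed-precision training
    accumulate_grad_batches=4,  # gradient accumulation: effective batch = 4 * 32
)
trainer.fit(SimpleModel(), loader)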

🗳️ TRENDING MODELS/SPACES

vectara / hallucination_evaluation_model
The Cross-Encoder for Hallucination Detection is a SentenceTransformers-based model that predicts a hallucination score for input text. It was trained on NLI data and summarization datasets.
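
A minimal sketch of scoring source/summary pairs, assuming the sentence-transformers CrossEncoder interface shown on the model card; the example texts are illustrative:

from sentence_transformers import CrossEncoder

model = CrossEncoder("vectara/hallucination_evaluation_model")
# Each pair is (source text, generated summary); scores near 1 indicate consistency.
scores = model.predict([
    ("A man walks into a bar and buys a drink.",
     "A man walks into a bar and orders a beer."),
])
print(scores)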

latent-consistency / lcm-lora-sdxl
The Latent Consistency Model (LCM) LoRA is an adapter for SDXL that enables efficient, high-quality text-to-image generation in 2-8 inference steps.
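
A minimal sketch of loading the adapter, assuming the standard diffusers LoRA flow; model IDs follow the model card:

import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")
# Swap in the LCM scheduler and load the LoRA adapter.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# LCM works with very few steps and low guidance.
image = pipe("a photo of an astronaut riding a horse",
             num_inference_steps=4, guidance_scale=1.0).images[0]
image.save("astronaut.png")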

facebook / MusicGen
This audio generation model creates 15-second clips based on detailed descriptions of instruments and intended use. It can also incorporate melodies from reference audio to enhance the composition.
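
A minimal sketch using Meta's audiocraft library; the model size, prompt, and output name are illustrative:

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=15)  # 15-second clip

wavs = model.generate(["lo-fi hip hop beat with mellow piano, for studying"])
for i, wav in enumerate(wavs):
    # Saves a WAV file with loudness normalization.
    audio_write(f"clip_{i}", wav.cpu(), model.sample_rate, strategy="loudness")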

PYTHON TIP
Try-Except-Else Pattern

The try-except-else pattern in Python is a structured way to handle exceptions and run certain code only when no exception occurs. It adds a layer of control flow, allowing clearer and safer handling of operations that might fail.

When To Use

  • Error Prone Operations: Encapsulate operations that are likely to raise exceptions, such as file I/O, network requests, or data conversions.

  • Conditional Execution: Execute certain code only if the try block does not raise any exceptions, ensuring that this code is not run in case of an error.

Benefits

  • Clarity: Clearly separates error handling code from regular code flow, improving readability.

  • Safety: Ensures that certain code is executed only if there are no errors, avoiding unintended consequences.

  • Error Handling: Allows for more specific and controlled handling of different exceptions.


def divide_numbers(x, y):
    try:
        # Try to divide x by y
        result = x / y
    except ZeroDivisionError:
        # Executes if y is zero
        print("Cannot divide by zero.")
    else:
        # Executes only if no exception occurred
        print("Division successful. Result:", result)


divide_numbers(10, 2) # Successful division
# Output: Division successful. Result: 5.0

divide_numbers(10, 0) # Division by zero
# Output: Cannot divide by zero.


Thank You

Igor Tica is a contributing writer at AlphaSignal and a research engineer at SmartCat, focusing on computer vision. He's actively seeking partnerships in self-supervised and contrastive learning.

Jacob Marks, an editor at AlphaSignal and an ML engineer at Voxel51, is recognized as a leading AI voice on Medium and LinkedIn. Formerly at Google X and Samsung, he holds a Ph.D. in Theoretical Physics from Stanford.

Want to promote your company, product, job, or event to 150,000+ AI researchers and engineers? You can reach out here.