Synchronize lip movements with audio in realtime. The Lip Syncing Realtime API takes a video stream and an audio stream and generates a modified version of the video in which the person’s lip movements are precisely synchronized with the provided audio, with minimal latency.

Model Specifications

The model processes video and audio with these specifications:
  • Video Frame Rate: 25 FPS
  • Video Resolution: 450x800 (Portrait)
  • Audio Sample Rate: 16 kHz
  • Audio Channels: Mono
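
If your source media doesn’t already match these specifications, convert it before streaming. Below is a minimal, hypothetical conversion sketch (not part of the decart SDK), assuming OpenCV and SciPy are available; the helper names are illustrative:

import cv2
import numpy as np
from scipy.signal import resample_poly

TARGET_WIDTH, TARGET_HEIGHT = 450, 800  # portrait resolution expected by the model
TARGET_RATE = 16_000                    # 16 kHz mono audio

def prepare_frame(rgb_frame: np.ndarray) -> np.ndarray:
    # Resize to 450x800 (width x height) -> array of shape (800, 450, 3)
    return cv2.resize(rgb_frame, (TARGET_WIDTH, TARGET_HEIGHT), interpolation=cv2.INTER_AREA)

def prepare_audio(samples: np.ndarray, source_rate: int) -> bytes:
    # Downmix int16 samples of shape (n,) or (n, channels) to mono
    if samples.ndim == 2:
        samples = samples.mean(axis=1)
    # Resample to 16 kHz and return raw 16-bit PCM bytes
    resampled = resample_poly(samples.astype(np.float64), TARGET_RATE, source_rate)
    return np.clip(resampled, -32768, 32767).astype(np.int16).tobytes()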

Realtime API

The realtime API uses WebSocket for low-latency lip synchronization. Clients establish a connection with our inference server, stream both video and audio over WebSocket messages, and receive a stream of messages, each containing a modified video frame with synchronized lip movements and its matching audio clip. The connection remains active for as long as the input streams continue, so you can switch audio sources on the fly to change the lip synchronization dynamically.

Installation

pip install decart

Basic Usage

import asyncio
import fractions
import numpy as np
import os
from decart.lipsync import RealtimeLipsyncClient

async def main():

    # Input queues, filled by your own video and audio producers
    # (see the producer sketch after this example)
    frames_queue = asyncio.Queue()
    audio_queue = asyncio.Queue()

    client = RealtimeLipsyncClient(api_key=os.getenv("DECART_API_KEY"))
    frame_interval = fractions.Fraction(1, client._video_fps)  # 1/25 s per frame at 25 FPS
    time = asyncio.get_running_loop().time  # monotonic clock used for frame pacing

    while True:
        frame_start_time = time()
        try:
            # Send any pending audio as soon as it arrives; raw PCM, 16 kHz, mono, any duration
            audio_clip: bytes = audio_queue.get_nowait()
            await client.send_audio(audio_clip)
        except asyncio.QueueEmpty:
            pass

        # Get next frame - raw image data of shape (800, 450, 3), RGB format
        frame: np.ndarray = await frames_queue.get()
        await client.send_video_frame(frame)

        try:
            # Wait up to the remainder of this frame interval for the next
            # synced output frame and its corresponding audio clip
            out_frame, out_audio_clip = await client.get_synced_output(
                timeout=max(0.0, frame_interval - (time() - frame_start_time))
            )

            # Handle out_frame and out_audio_clip here (e.g., render the frame and play the audio)
        except asyncio.TimeoutError:
            pass

        # Sleep out the remainder of the frame interval before sending the next frame
        await asyncio.sleep(max(0.0, frame_interval - (time() - frame_start_time)))


if __name__ == "__main__":
    asyncio.run(main())
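
The loop above consumes frames_queue and audio_queue but leaves producing them to you. As one hypothetical way to drive it, the sketch below fills the queues from a webcam (assuming OpenCV) and from a 16 kHz mono WAV file; the device index, file name, and chunk size are illustrative assumptions, not part of the SDK:

import asyncio
import wave

import cv2

async def produce_frames(frames_queue: asyncio.Queue) -> None:
    # Capture webcam frames, convert to 450x800 RGB, and enqueue them
    cap = cv2.VideoCapture(0)  # hypothetical: default capture device
    try:
        while True:
            ok, bgr = cap.read()
            if not ok:
                break
            rgb = cv2.cvtColor(cv2.resize(bgr, (450, 800)), cv2.COLOR_BGR2RGB)
            await frames_queue.put(rgb)
            await asyncio.sleep(1 / 25)  # pace roughly at the model's 25 FPS
    finally:
        cap.release()

async def produce_audio(audio_queue: asyncio.Queue, path: str = "speech.wav") -> None:
    # Stream a 16 kHz mono 16-bit WAV file in ~100 ms PCM chunks
    with wave.open(path, "rb") as wav:
        assert wav.getframerate() == 16_000 and wav.getnchannels() == 1
        while chunk := wav.readframes(1_600):  # 1,600 frames = 100 ms at 16 kHz
            await audio_queue.put(chunk)
            await asyncio.sleep(0.1)  # real-time pacing

In a real application you would share these queues with the main loop and run everything concurrently, for example via asyncio.gather.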

Starter App - Sidekick

Check out our batteries-included template for using the Lipsync API with ElevenLabs’ Text-to-Speech to power interactive voice agents orchestrated with pipecat.ai.
  • Source: https://github.com/DecartAI/sidekick
  • Cookbook: https://cookbook.decart.ai/sidekick