Synchronize lip movements with audio in real time. LipSync Live transforms any video stream by matching lip movements to speech audio with precise synchronization and minimal latency.
The LipSync Live realtime API takes a video stream and an audio stream and generates a modified version of the video in which the person's lip movements are synchronized with the provided audio, in real time.
Model Specifications
The model processes video and audio with these specifications:
- Video Frame Rate: 25 FPS
- Video Resolution: 450x800 (Portrait)
- Audio Sample Rate: 16 kHz
- Audio Channels: Mono
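Input audio often arrives at a different sample rate or in stereo, so it typically needs conversion before streaming. As a rough sketch (the helper name and the plain-NumPy linear-interpolation resampling are illustrative, not part of the SDK; a real pipeline would use a proper resampler), converting a float audio buffer to 16 kHz mono 16-bit PCM might look like:

```python
import numpy as np

def to_model_audio(samples: np.ndarray, src_rate: int) -> bytes:
    """Convert float audio in [-1, 1] to the model's expected format:
    16 kHz, mono, 16-bit PCM bytes.

    `samples` has shape (n,) for mono or (n, channels) for multichannel.
    """
    if samples.ndim == 2:
        # Downmix to mono by averaging the channels
        samples = samples.mean(axis=1)
    if src_rate != 16_000:
        # Naive linear-interpolation resample to 16 kHz
        n_out = int(len(samples) * 16_000 / src_rate)
        x_old = np.linspace(0.0, 1.0, num=len(samples), endpoint=False)
        x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
        samples = np.interp(x_new, x_old, samples)
    # Scale to int16 range and serialize as raw PCM bytes
    pcm16 = np.clip(samples, -1.0, 1.0) * 32767
    return pcm16.astype(np.int16).tobytes()
```

One second of 44.1 kHz stereo input yields one second of output: 16,000 samples at 2 bytes each, or 32,000 bytes of PCM.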
Realtime API
The realtime API uses WebSocket for low-latency lip synchronization. Clients establish a connection with our inference server, stream both video and audio over WebSocket messages, and receive a stream of messages, each containing modified video with synchronized lip movements and its matching audio clip.
The connection remains active as long as the input streams continue, so you can switch audio sources on the fly to change the lip synchronization dynamically.
Installation
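The install command was missing here; judging by the import path in the example below, the SDK is presumably published as `decart` (an assumption, so verify the package name against the official docs):

```shell
pip install decart
```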
Basic Usage
```python
import asyncio
import fractions
import os

import numpy as np

from decart.lipsync import RealtimeLipsyncClient


async def main():
    frames_queue = asyncio.Queue()
    audio_queue = asyncio.Queue()

    client = RealtimeLipsyncClient(api_key=os.getenv("DECART_API_KEY"))

    frame_interval = fractions.Fraction(1, client._video_fps)
    time = asyncio.get_running_loop().time

    while True:
        frame_start_time = time()

        try:
            # Send an audio clip when one is available. Clips may be of any
            # duration but must be PCM, 16 kHz, mono.
            audio_clip: bytes = audio_queue.get_nowait()
            await client.send_audio(audio_clip)
        except asyncio.QueueEmpty:
            pass

        # Get the next frame: raw RGB image data of shape (800, 450, 3)
        frame: np.ndarray = await frames_queue.get()
        await client.send_video_frame(frame)

        try:
            # Wait for the resulting frame and its matching audio clip,
            # but no longer than the time left in this frame slot
            out_frame, out_audio_clip = await client.get_synced_output(
                timeout=frame_interval - (time() - frame_start_time)
            )
            # HANDLE OUT_FRAME AND OUT_AUDIO_CLIP HERE
        except asyncio.TimeoutError:
            pass

        # Wait out the rest of the frame interval before sending the next frame
        await asyncio.sleep(frame_interval - (time() - frame_start_time))


if __name__ == "__main__":
    asyncio.run(main())
```
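The example above assumes something else is filling `frames_queue` and `audio_queue`. A minimal sketch of a frame producer that paces synthetic frames at the model's 25 FPS (the function name and gray test frames are illustrative, not part of the SDK; swap in your real capture source):

```python
import asyncio

import numpy as np

VIDEO_FPS = 25
HEIGHT, WIDTH = 800, 450  # portrait, per the model specifications


async def produce_test_frames(frames_queue: asyncio.Queue, n_frames: int) -> None:
    """Fill the queue with synthetic RGB frames at the model's frame rate.

    Replace the mid-gray frames with frames from your real source
    (camera capture, video decoder, etc.).
    """
    frame = np.full((HEIGHT, WIDTH, 3), 128, dtype=np.uint8)
    for _ in range(n_frames):
        await frames_queue.put(frame)
        await asyncio.sleep(1 / VIDEO_FPS)  # pace at 25 FPS


async def demo() -> None:
    q: asyncio.Queue = asyncio.Queue()
    await produce_test_frames(q, n_frames=3)
    first = await q.get()
    print(first.shape)  # (800, 450, 3)


asyncio.run(demo())
```

In a real application the producer would run as a concurrent task (e.g. `asyncio.create_task`) alongside the send/receive loop rather than being awaited to completion first.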
Starter App - Sidekick
Check out our batteries-included template that combines the LipSync API with ElevenLabs' Text-to-Speech to power interactive voice agents orchestrated with pipecat.ai.
Source: https://github.com/DecartAI/sidekick
Cookbook: https://cookbook.decart.ai/sidekick
Blog: https://cookbook.decart.ai/sidekick