Open Source · Rust/WASM · 30-day Free Trial

Voice to avatar,
entirely in the browser

AnimaSync extracts emotion from speech and generates lip sync, facial expressions, and body motion in real time — no server required.

Try Demo · Start Building · View on GitHub
Full Animation Pipeline
🎤 Audio Input · File, Mic, or TTS · 16kHz PCM
🧠 ONNX Inference · Rust/WASM engine · Phoneme → Viseme
👄 Lip Sync · 52 ARKit blendshapes · jaw, mouth, tongue
😊 Expressions · Brow, cheek, eyes · Emotion from voice
👁 Eye Blink · Stochastic injection · 2.5–4.5s interval
🎭 VRM Body · VRMA bone animation · Idle ↔ Speaking
52 ARKit Blendshapes
30 fps Animation Output
<300ms Mic-to-Render
0 Servers Required

Everything from audio alone

One engine handles the full animation pipeline — from raw audio to animated avatar.

👄 Lip Sync

ONNX neural inference maps speech phonemes to 52 ARKit blendshapes at 30fps. Crisp mouth movements with natural co-articulation.

😊 Facial Expressions

Voice energy and pitch automatically drive brows, cheeks, eyes, and smile. Emotion follows the speaker naturally.
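As a rough illustration of the idea, here is a standalone sketch that maps per-frame voice energy to a couple of expression weights. The engine's actual mapping is internal; `energyToExpression`, its response curve, and the chosen weights are all assumptions for illustration only.

```typescript
// Illustrative sketch: map per-frame voice energy (RMS) to a few
// ARKit-style expression weights. Hypothetical helper, not the engine's API.
function rms(samples: Float32Array): number {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

function energyToExpression(energy: number): { mouthSmileLeft: number; browInnerUp: number } {
  // Clamp into [0, 1] and ease, so quiet speech barely moves the face.
  const e = Math.min(1, Math.max(0, energy * 4));
  const eased = e * e;
  return { mouthSmileLeft: 0.5 * eased, browInnerUp: 0.3 * eased };
}
```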

👁 Eye Animation

Stochastic blink injection at 2.5–4.5s intervals with 15% double-blink probability. No dead-eyed avatars.
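Those numbers can be sketched as a standalone scheduler. The engine injects blinks internally; `scheduleBlinks` below is a hypothetical illustration of the stated behavior, not the library's API.

```typescript
// Sketch of stochastic blink scheduling matching the numbers above:
// one blink every 2.5–4.5 s, with a 15% chance of a double blink.
interface BlinkEvent { time: number; double: boolean }

function scheduleBlinks(duration: number, rand: () => number = Math.random): BlinkEvent[] {
  const events: BlinkEvent[] = [];
  let t = 2.5 + rand() * 2.0;            // first blink after 2.5–4.5 s
  while (t < duration) {
    events.push({ time: t, double: rand() < 0.15 });
    t += 2.5 + rand() * 2.0;             // next interval, again 2.5–4.5 s
  }
  return events;
}
```

Passing `rand` explicitly keeps the scheduler deterministic under test while defaulting to `Math.random` in use.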

🎭 Body Motion

Embedded VRMA bone animation clips with smooth idle-to-speaking crossfade. Breathing, gestures, and posture shifts.
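A minimal sketch of what an idle-to-speaking crossfade computes, assuming a fixed fade time and a smoothstep blend; the engine's actual crossfade is internal and `crossfadeWeight` is a hypothetical helper.

```typescript
// 0 = fully idle, 1 = fully speaking; smoothstep gives a C1-continuous blend
// so the handoff between clips has no visible velocity pop.
function crossfadeWeight(elapsed: number, fadeTime: number): number {
  const t = Math.min(1, Math.max(0, elapsed / fadeTime));
  return t * t * (3 - 2 * t);
}
```

With Three.js animation clips, for example, the result could drive the speaking clip's weight while the idle clip gets `1 - weight` each tick.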

🎙 Real-time Streaming

AudioWorklet captures microphone at 16kHz. Process chunks as they arrive — no need to wait for complete audio.
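An AudioWorklet runs at the AudioContext's rate, commonly 48 kHz, so chunks typically need resampling before they reach a 16 kHz engine. A naive decimation sketch — the helper name is hypothetical, and a production resampler would low-pass filter first to avoid aliasing:

```typescript
// Naive 48 kHz → 16 kHz decimation: keep every 3rd sample.
// Illustrative only; real code should anti-alias filter before decimating.
function downsample48to16(chunk: Float32Array): Float32Array {
  const out = new Float32Array(Math.floor(chunk.length / 3));
  for (let i = 0; i < out.length; i++) out[i] = chunk[i * 3];
  return out;
}
```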

Client-side Only

Rust/WASM + ONNX Runtime Web. No server, no API calls, no data leaves the browser. Works offline after first load.

Animate an avatar in 4 lines

Install from npm, initialize the engine, and start generating animation frames. Works with any Three.js + VRM setup.

$ npm install @goodganglabs/lipsync-wasm-v1
import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v1';

const lipsync = new LipSyncWasmWrapper();
await lipsync.init(); // 30-day free trial
const result = await lipsync.processFile(audioFile);
const frame = lipsync.getFrame(result, 0); // number[111]
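`getFrame` returns a plain number array; to drive an avatar you map those values onto named blendshapes. A minimal sketch, assuming (hypothetically) that the frame's leading entries align with a known ARKit name order — check the package docs for the real 111-dim layout. `ARKIT_NAMES` is truncated to four of the 52 names for brevity.

```typescript
// ASSUMPTION: frame entries line up with ARKIT_NAMES in order.
const ARKIT_NAMES = ['jawOpen', 'mouthClose', 'mouthFunnel', 'mouthPucker'] as const;

function frameToWeights(frame: number[]): Record<string, number> {
  const weights: Record<string, number> = {};
  ARKIT_NAMES.forEach((name, i) => { weights[name] = frame[i] ?? 0; });
  return weights;
}
```

With @pixiv/three-vrm, for example, each entry could then be fed into `vrm.expressionManager.setValue(name, weight)` on every render tick.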

Install from npm

Two engines, one API surface. Pick the engine that fits your project.

@goodganglabs/lipsync-wasm-v1
Recommended

Phoneme classification engine — 111-dim output with full expression control. Built-in IdleExpressionGenerator, VoiceActivityDetector, and VRM 18-dim mode.

Output: 111-dim ARKit
Post: OneEuroFilter + constraints
VRM: 18-dim preset mode
$ npm install @goodganglabs/lipsync-wasm-v1
@goodganglabs/lipsync-wasm-v2
Lightweight

Student distillation model — direct 52-dim ARKit blendshape prediction. Simpler post-processing, faster integration.

Output: 52-dim ARKit
Post: crisp_mouth + fade + blink
Peer: onnxruntime-web
$ npm install @goodganglabs/lipsync-wasm-v2

See it in action

Interactive demos you can try right now — no install needed.

Voice-to-Avatar Demo

Try Voice-to-Avatar Live

Upload a VRM avatar, then speak into your microphone or upload audio — watch real-time lip sync, facial expressions, eye blinks, and body motion. Fully client-side, no server needed.

Launch Demo →
Interactive Guide

Build Your Own AI Talking Avatar

6-step interactive tutorial. Download a VRM, wire up AnimaSync V1, apply lip sync, add mic streaming — with live demos at each step.

Start Guide →
V1 Engine

Phoneme Visualization

V1 phoneme engine — 111-dim output mapped to 52 ARKit blendshapes. ONNX inference with real-time visualization.

Try it →
V2 Engine

Student Model Demo

V2 student distillation model — 52 ARKit blendshapes with direct prediction. Crisp mouth, real-time rendering.

Try it →
Comparison

V1 vs V2 Side-by-Side

Same voice input, two animation engines, two avatars. See the difference live in a dual-panel view.

Try it →

Choose your engine

Two engines for different needs. Both produce ARKit-compatible output at 30fps.

Feature          | V1 (Recommended)                         | V2
Output           | 111-dim ARKit blendshapes                | 52-dim ARKit blendshapes
Architecture     | Phoneme classification + viseme mapping  | Student distillation (direct)
Post-processing  | OneEuroFilter + anatomical constraints   | crisp_mouth + fade + auto-blink
Idle expressions | Built-in IdleExpressionGenerator         | Blink injection in post-process
Voice activity   | Built-in VoiceActivityDetector           | n/a
Best for         | Full expression control, custom avatars  | Quick integration, lightweight
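V1's post-processing names a OneEuroFilter, an adaptive low-pass filter that smooths jitter at rest while tracking fast motion with little lag. For reference, here is a minimal standalone sketch of the algorithm; the engine's actual implementation and tuning are internal, and the defaults below are illustrative.

```typescript
// Minimal One Euro filter sketch. minCutoff: lower = smoother when still;
// beta: higher = less lag during fast motion.
class OneEuroFilter {
  private xPrev: number | null = null;
  private dxPrev = 0;
  constructor(
    private minCutoff = 1.0,
    private beta = 0.05,
    private dCutoff = 1.0,
  ) {}
  private alpha(cutoff: number, dt: number): number {
    const tau = 1 / (2 * Math.PI * cutoff);
    return 1 / (1 + tau / dt);
  }
  filter(x: number, dt: number): number {
    if (this.xPrev === null) { this.xPrev = x; return x; }
    const dx = (x - this.xPrev) / dt;
    const aD = this.alpha(this.dCutoff, dt);
    const dxHat = aD * dx + (1 - aD) * this.dxPrev;      // smoothed derivative
    const cutoff = this.minCutoff + this.beta * Math.abs(dxHat); // adapt cutoff to speed
    const a = this.alpha(cutoff, dt);
    const xHat = a * x + (1 - a) * this.xPrev;
    this.xPrev = xHat;
    this.dxPrev = dxHat;
    return xHat;
  }
}
```

One filter instance per blendshape channel, stepped at the 30 fps frame interval, is the typical usage pattern for this kind of smoothing.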