AnimaSync extracts emotion from speech and generates lip sync, facial expressions, and body motion in real time — no server required.
One engine handles the full animation pipeline — from raw audio to animated avatar.
ONNX neural inference maps speech phonemes to 52 ARKit blendshapes at 30fps. Crisp mouth movements with natural co-articulation.
Voice energy and pitch automatically drive brows, cheeks, eyes, and smile. Emotion follows the speaker naturally.
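Driving expressions from voice energy can be sketched as an RMS measurement mapped to blendshape weights. The function names and scaling constants below are illustrative assumptions, not AnimaSync's actual mapping:

```typescript
// Illustrative sketch: derive a few ARKit-style expression weights from
// voice energy. The scaling constants are assumptions, not the library's.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// RMS energy of one audio chunk (Float32 PCM in [-1, 1]).
function rmsEnergy(samples: Float32Array): number {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Louder speech lifts the brows, squints the cheeks, widens the smile.
function expressionFromEnergy(energy: number): Record<string, number> {
  return {
    browInnerUp: clamp01(energy * 2.0),
    cheekSquintLeft: clamp01(energy * 1.2),
    cheekSquintRight: clamp01(energy * 1.2),
    mouthSmileLeft: clamp01(energy * 1.5),
    mouthSmileRight: clamp01(energy * 1.5),
  };
}
```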
Stochastic blink injection at 2.5–4.5s intervals with 15% double-blink probability. No dead-eyed avatars.
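The blink scheduling above can be sketched as a small sampler: intervals drawn uniformly from [2.5, 4.5] seconds, with a 15% chance of a double blink. The interface is an illustrative assumption, not AnimaSync's API; it takes an injectable RNG for testability:

```typescript
// Sketch of stochastic blink scheduling. Names are illustrative placeholders.
interface BlinkEvent {
  at: number;      // seconds from now until the blink starts
  double: boolean; // true → play two blinks back to back
}

function nextBlink(rng: () => number = Math.random): BlinkEvent {
  const at = 2.5 + rng() * (4.5 - 2.5); // uniform in [2.5, 4.5) seconds
  const double = rng() < 0.15;          // 15% double-blink probability
  return { at, double };
}
```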
Embedded VRMA bone animation clips with smooth idle-to-speaking crossfade. Breathing, gestures, and posture shifts.
AudioWorklet captures microphone at 16kHz. Process chunks as they arrive — no need to wait for complete audio.
Rust/WASM + ONNX Runtime Web. No server, no API calls, no data leaves the browser. Works offline after first load.
Install from npm, initialize the engine, and start generating animation frames. Works with any Three.js + VRM setup.
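The integration shape might look like the sketch below: an engine turns each audio chunk into a frame of named blendshape weights, which are then written to the avatar's expression manager. The `LipSyncEngine` interface and the inline stub are hypothetical placeholders; consult the package's own docs for the real import and method names:

```typescript
// A per-frame result: ARKit blendshape weights keyed by name.
type BlendshapeFrame = Record<string, number>;

// Hypothetical engine interface; the real engine runs ONNX inference.
interface LipSyncEngine {
  process(chunk: Float32Array): BlendshapeFrame;
}

// Stand-in engine so the loop below is runnable without the library.
const engine: LipSyncEngine = {
  process: (chunk) => ({
    jawOpen: Math.min(1, chunk.length > 0 ? Math.abs(chunk[0]) : 0),
  }),
};

// Write one frame to a VRM avatar via a setter such as
// three-vrm's expressionManager.setValue (assumed integration point).
function applyFrame(
  frame: BlendshapeFrame,
  setValue: (name: string, weight: number) => void,
): void {
  for (const [name, weight] of Object.entries(frame)) setValue(name, weight);
}
```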
Two engines, one API surface. Pick the engine that fits your project.
Phoneme classification engine — 111-dim output with full expression control. Built-in IdleExpressionGenerator, VoiceActivityDetector, and VRM 18-dim mode.
Student distillation model — direct 52-dim ARKit blendshape prediction. Simpler post-processing, faster integration.
Interactive demos you can try right now — no install needed.
Upload a VRM avatar, then speak into your microphone or upload audio — watch real-time lip sync, facial expressions, eye blinks, and body motion. Fully client-side, no server needed.
6-step interactive tutorial. Download a VRM, wire up AnimaSync V1, apply lip sync, add mic streaming — with live demos at each step.
V1 phoneme engine — 111-dim output mapped to 52 ARKit blendshapes. ONNX inference with real-time visualization.
V2 student distillation model — 52 ARKit blendshapes with direct prediction. Crisp mouth, real-time rendering.
Same voice input, two animation engines, two avatars. See the difference live in a dual-panel view.
Two engines for different needs. Both produce ARKit-compatible output at 30fps.
| Feature | V1 (recommended) | V2 |
|---|---|---|
| Output | 111-dim output, mapped to 52 ARKit blendshapes | 52-dim ARKit blendshapes (direct) |
| Architecture | Phoneme classification + viseme mapping | Student distillation (direct) |
| Post-processing | OneEuroFilter + anatomical constraints | crisp_mouth + fade + auto-blink |
| Idle expressions | Built-in IdleExpressionGenerator | Blink injection in post-process |
| Voice activity | Built-in VoiceActivityDetector | — |
| Best for | Full expression control, custom avatars | Quick integration, lightweight |
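V1's post-processing names a OneEuroFilter. For reference, a minimal TypeScript implementation of the standard One Euro filter (an exponential smoother whose cutoff rises with signal speed, so slow jitter is smoothed hard while fast movements stay responsive); the parameter defaults are illustrative, not AnimaSync's tuning:

```typescript
// Minimal One Euro filter: adaptive low-pass smoothing for noisy signals.
class OneEuroFilter {
  private prevX: number | null = null;
  private prevDx = 0;

  constructor(
    private minCutoff = 1.0, // Hz; lower → more smoothing at rest
    private beta = 0.1,      // speed coefficient; higher → snappier when moving
    private dCutoff = 1.0,   // cutoff for the derivative estimate
  ) {}

  private static alpha(cutoff: number, dt: number): number {
    const tau = 1 / (2 * Math.PI * cutoff);
    return 1 / (1 + tau / dt);
  }

  // x: new sample; dt: seconds since the previous sample (e.g. 1/30).
  filter(x: number, dt: number): number {
    if (this.prevX === null) {
      this.prevX = x;
      return x;
    }
    // Smoothed estimate of how fast the signal is changing.
    const dx = (x - this.prevX) / dt;
    const aD = OneEuroFilter.alpha(this.dCutoff, dt);
    this.prevDx = aD * dx + (1 - aD) * this.prevDx;
    // Cutoff adapts to signal speed, then smooths the sample itself.
    const cutoff = this.minCutoff + this.beta * Math.abs(this.prevDx);
    const a = OneEuroFilter.alpha(cutoff, dt);
    this.prevX = a * x + (1 - a) * this.prevX;
    return this.prevX;
  }
}
```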