FaceFormer
FaceFormer is an open-source PyTorch implementation of a Transformer-based architecture for speech-driven 3D facial animation, presented at CVPR 2022. The model is trained end to end: it takes raw audio and a neutral 3D face mesh as input and autoregressively synthesizes a sequence of 3D facial motions with accurate lip synchronization. It supports more than one mesh topology, namely the FLAME and VOCA mesh structures, and is trained on the VOCASET and BIWI benchmark datasets. Users can train custom models, evaluate them on test subjects, and render videos from new audio files. The stated requirements are Ubuntu 18.04, Python 3.7, and PyTorch 1.9.0. Pretrained models are available for both datasets, so the demo commands can be run immediately to animate facial meshes from the provided or custom WAV audio files. The project facilitates research and development in audio-to-animation pipelines b
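
The autoregressive synthesis mentioned above can be illustrated with a minimal, dependency-free sketch. It is not the FaceFormer code: the names `animate_autoregressively` and `make_toy_predictor` are hypothetical, the toy predictor stands in for the trained Transformer, and two details are assumptions made for illustration (each step sees only the current audio feature frame, and the model output is treated as per-vertex offsets added to the neutral mesh).

```python
from typing import Callable, List, Sequence

Frame = List[float]  # flattened vertex coordinates: x0, y0, z0, x1, ...
Predictor = Callable[[Sequence[float], List[Frame]], Frame]


def animate_autoregressively(
    audio_features: Sequence[Sequence[float]],
    template: Frame,
    predict_offsets: Predictor,
) -> List[Frame]:
    """Generate one mesh frame per audio feature frame.

    Each step receives the current audio features together with all
    previously generated offsets, which is what makes the loop
    autoregressive. Adding the offsets to the neutral template is an
    assumption of this sketch.
    """
    past_offsets: List[Frame] = []
    frames: List[Frame] = []
    for features in audio_features:
        offsets = predict_offsets(features, past_offsets)
        if len(offsets) != len(template):
            raise ValueError("offsets must have the same length as the template")
        past_offsets.append(offsets)
        frames.append([t + o for t, o in zip(template, offsets)])
    return frames


def make_toy_predictor(num_coords: int) -> Predictor:
    """Build a hypothetical stand-in for the trained model."""

    def predict(features: Sequence[float], past_offsets: List[Frame]) -> Frame:
        energy = sum(features) / len(features)
        previous = past_offsets[-1] if past_offsets else [0.0] * num_coords
        # Blend the previous offsets with an audio-driven target value.
        return [0.5 * p + 0.5 * energy for p in previous]

    return predict


if __name__ == "__main__":
    neutral_mesh = [0.0, 1.0, 2.0]          # one vertex, flattened
    features = [[1.0, 1.0], [0.0, 0.0]]     # two made-up audio frames
    for frame in animate_autoregressively(
        features, neutral_mesh, make_toy_predictor(len(neutral_mesh))
    ):
        print(frame)
```

The point of the sketch is the data flow: every generated frame is fed back as context for the next one, and the neutral mesh supplies the identity that the predicted motion is applied to.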