Creating natural-looking speech for characters in animated movies is a difficult process, and when done poorly, it can be very distracting. Disney Research has been working with teams at Carnegie Mellon University, Caltech and the University of East Anglia to enhance automated lip-syncing and the creation of realistic animated speech.
Disney Research is using a deep learning approach that allows a computer to take spoken words from an actor, predict the mouth shapes an animated character would need to say those words, and animate the character with lip-synced audio. The system requires a single speaker to provide eight hours of reference video reciting more than 2,500 phonetically diverse sentences. This footage is used to create an animated “reference face,” which can later be combined with recorded audio to create realistic speech.
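To make the idea concrete, here is a minimal, hypothetical sketch of the kind of prediction step described above: a fixed-length window of phoneme labels is mapped to mouth-shape parameters for the centre frame, so each prediction sees phonetic context on both sides. The phoneme inventory, window size, parameter count, and the random linear map standing in for the trained model are all illustrative assumptions, not details from the paper.

```python
import numpy as np

# Toy phoneme inventory and settings (assumptions for illustration only).
PHONEMES = ["sil", "HH", "AH", "L", "OW"]
PH_INDEX = {p: i for i, p in enumerate(PHONEMES)}
WINDOW = 5       # frames of phonetic context per prediction (assumption)
SHAPE_DIMS = 3   # number of mouth-shape parameters (assumption)

rng = np.random.default_rng(0)
# Stand-in for a learned regressor: a random linear map from a
# one-hot-encoded phoneme window to mouth-shape parameters.
weights = rng.standard_normal((WINDOW * len(PHONEMES), SHAPE_DIMS))

def one_hot_window(frames):
    """One-hot encode a WINDOW-length list of phoneme labels."""
    vec = np.zeros(WINDOW * len(PHONEMES))
    for i, p in enumerate(frames):
        vec[i * len(PHONEMES) + PH_INDEX[p]] = 1.0
    return vec

def predict_mouth_shapes(phoneme_frames):
    """Slide the window over the phoneme track; one shape per frame."""
    pad = ["sil"] * (WINDOW // 2)  # pad the edges with silence
    padded = pad + list(phoneme_frames) + pad
    return np.array([
        one_hot_window(padded[t:t + WINDOW]) @ weights
        for t in range(len(phoneme_frames))
    ])

frames = ["sil", "HH", "AH", "L", "OW", "OW", "sil"]  # toy phoneme track
shapes = predict_mouth_shapes(frames)
print(shapes.shape)  # one SHAPE_DIMS vector per input frame
```

In the real system the mapping is learned from the eight hours of reference footage rather than being a fixed lookup, and the predicted parameters drive the animated reference face rather than printing an array.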
It sounds a little complicated, but this video from Disney Research helps break it down.
WATCH – DEEP LEARNING APPROACH FOR SPEECH ANIMATION:
As animation shifts more toward computer generation and away from hand-drawn characters, creating realistic-looking speech has been a challenge. During a presentation at the recent SIGGRAPH conference, lead researcher Dr. Sarah Taylor described realistic speech animation as “time consuming and costly.”
Disney Research hopes to change that with their new approach. “Our goal is to automatically generate production-quality animated speech for any style of character, given only audio speech as an input,” Dr. Taylor said.
To read Disney Research’s paper on this project, visit their website.