3D Articulatory Speech Synthesis using Biomechanical Models of the Oral, Pharyngeal and Laryngeal Complex
Natural Sciences and Engineering Research Council of Canada
- Grant type: Discovery Grants Program - Individual
- Year: 2014/15
- Total Funding: $26,000
University of British Columbia
No researchers found.
No partner organizations found.
For years, traditional formant-based and acoustic-based speech synthesis techniques have largely overshadowed articulatory synthesis research. While successful in some domains, these techniques still cannot produce natural-sounding speech from text. Articulatory speech synthesis, in contrast, continues to progress steadily at the fringes of both industrial and academic interest. It is now poised to provide the platform needed to overcome basic problems in speech processing and, I believe, represents the next major advance in speech synthesis technology. As part of my long-term research program, I have been establishing computer modeling and simulation of the oral, pharyngeal and laryngeal (OPAL) complex as a critical and central tool in many disciplines, including engineering, linguistics, education, entertainment, anatomy, physiology and medicine. From this effort, the infrastructure necessary to make large advances in articulatory speech synthesis is now available. I therefore propose to extend the OPAL complex models with aero-acoustic simulation, creating the next generation of validated articulatory speech synthesis. This research builds upon the detailed biomechanical models of the human upper airway that have been used to study the mechanics of speech, swallowing and mastication, along with the tools needed for fast, accurate coupled FEM and rigid-body simulation. The main research activities are:
1. Integration of model geometries and biomechanics to create a complete biomechanical model of the OPAL complex capable of articulatory speech synthesis;
2. Improvement of the existing 1D Navier-Stokes approach to provide tighter coupling between laryngeal excitation and the vocal-tract filter, leveraging the work of [Birkholz et al., 2006];
3. Development of 3D aero-acoustic simulation of vocal-tract noise sources based on fluid simulation as well as CFD-based approximations;
4. Creation of inverse methods for driving speech articulation by extending the work of [Stavness et al., 2013] to propagate both kinematic targets, from image data or electromagnetic articulography (EMA) of speech, and acoustic targets backward through the aero-acoustic model, deriving muscle activations via inverse-dynamics techniques; speech-articulation constraints will be used to make the system better determined;
5. Creation of advanced coupled vocal fold models that oscillate under tension and airflow.
We are poised to make a substantial advance in articulatory speech synthesis. The last decade of effort in creating the medical imaging techniques, speech measurements, complex 3D biomechanical models of the OPAL complex, and tools for fast, accurate coupled FEM/rigid-body simulation provides the necessary infrastructure for this project. Canada stands to benefit from being at the forefront of speech synthesis technology that supports the next wave of applications. Incorporating articulatory speech synthesis into our open-source toolkit will support international efforts in speech research, contributing profound new knowledge through a deeper understanding of human speech production.
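The source-filter coupling targeted by activity 2 can be made concrete with a classic digital waveguide model. The sketch below is emphatically not the proposal's Navier-Stokes method: it is a simple Kelly-Lochbaum scattering tube with hypothetical, made-up section areas and end reflections, shown only to illustrate how a glottal excitation signal drives a vocal-tract filter built from tube-junction reflections.

```python
import math

def reflection_coeffs(areas):
    """Pressure-wave reflection coefficients at junctions between tube sections."""
    return [(areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1])
            for i in range(len(areas) - 1)]

def synthesize(areas, source, lip_reflection=-0.85, glottal_reflection=0.75):
    """Propagate a source signal through a lossless tube chain; return lip output.

    `areas` are cross-sectional areas of the tube sections (glottis to lips);
    `source` is the glottal excitation, one sample per simulation step.
    """
    n = len(areas)
    k = reflection_coeffs(areas)
    fwd = [0.0] * n  # right-going (toward lips) pressure waves, one per section
    bwd = [0.0] * n  # left-going (toward glottis) pressure waves
    out = []
    for s in source:
        fwd_new = [0.0] * n
        bwd_new = [0.0] * n
        # Glottal end: inject the source plus a partial reflection of the
        # backward wave (a crude stand-in for glottal impedance).
        fwd_new[0] = s + glottal_reflection * bwd[0]
        # Scattering at each interior junction (Kelly-Lochbaum equations).
        for i in range(n - 1):
            fwd_new[i + 1] = (1 + k[i]) * fwd[i] - k[i] * bwd[i + 1]
            bwd_new[i] = k[i] * fwd[i] + (1 - k[i]) * bwd[i + 1]
        # Lip end: partial inverting reflection; the remainder radiates.
        bwd_new[n - 1] = lip_reflection * fwd[n - 1]
        out.append((1 + lip_reflection) * fwd[n - 1])
        fwd, bwd = fwd_new, bwd_new
    return out

# A rough glottal pulse train driving a hypothetical 8-section tract.
pulses = [max(0.0, math.sin(2 * math.pi * t / 40)) for t in range(400)]
areas = [2.6, 2.0, 1.4, 1.0, 1.2, 2.2, 3.0, 3.4]  # illustrative values only
waveform = synthesize(areas, pulses)
```

A real articulatory synthesizer would replace the fixed areas with time-varying tract shapes computed by the biomechanical model, and the ad-hoc end reflections with proper glottal and lip-radiation models; the sketch shows only the filter side of the coupling that the proposed 1D Navier-Stokes work refines.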