Workflow from Scientific Research

Citation
Our speech synthesizer generates the spectrogram at time $t$ by combining a voiced component and an unvoiced component based on a set of speech parameters at $t$. The upper part of the diagram represents the voice pathway, which generates the voiced component by passing a harmonic excitation with fundamental frequency $f_0^t$ through a voice filter (the sum of six formant filters, each specified by a formant frequency $f_i^t$ and an amplitude $a_i^t$). The lower part describes the noise pathway, which synthesizes the unvoiced sound by passing white noise through an unvoice filter (consisting of a broadband filter defined by centre frequency $f_u^t$, bandwidth $b_u^t$ and amplitude $a_u^t$, plus the same six formant filters used for the voice filter). The two components are then mixed with voice weight $\alpha^t$ and unvoice weight $1 - \alpha^t$, respectively, and amplified by loudness $L^t$. Finally, a background noise (defined by a stationary spectrogram $B(f)$) is added to generate the output spectrogram. There are 18 speech parameters in total at any time $t$ (one fundamental frequency, six formant frequencies, six formant amplitudes, three broadband-filter parameters, the voice weight and the loudness), indicated in purple boxes.
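The signal flow described in the caption can be sketched for a single time frame as follows. This is only an illustrative sketch, not the authors' implementation: the caption does not give the filter shapes, so Gaussian bumps over the frequency axis are assumed here, white noise is modelled by its flat expected spectrum, and the names `gaussian_band`, `synthesize_frame`, `formant_bw` and `harmonic_width` are hypothetical. Only the 18 entries of `p` correspond to parameters named in the caption.

```python
import numpy as np


def gaussian_band(freqs, centre, bandwidth, amplitude):
    """Assumed filter shape: a Gaussian bump over the frequency axis."""
    return amplitude * np.exp(-0.5 * ((freqs - centre) / bandwidth) ** 2)


def synthesize_frame(freqs, p, background, formant_bw=100.0, harmonic_width=20.0):
    """Return one spectrogram frame from the 18 speech parameters in `p`.

    `formant_bw` and `harmonic_width` (in Hz) are assumptions of this sketch;
    they are not among the parameters listed in the caption.
    """
    # Six formant filters, shared by the voice and noise pathways.
    formants = sum(
        gaussian_band(freqs, f, formant_bw, a)
        for f, a in zip(p["f_formants"], p["a_formants"])
    )

    # Voice pathway: harmonic excitation at multiples of f0, shaped by the voice filter.
    harmonics = np.arange(p["f0"], freqs[-1] + 1.0, p["f0"])
    excitation = sum(gaussian_band(freqs, h, harmonic_width, 1.0) for h in harmonics)
    voiced = excitation * formants

    # Noise pathway: flat (white) spectrum through the unvoice filter,
    # i.e. a broadband filter plus the same six formant filters.
    unvoice_filter = gaussian_band(freqs, p["f_u"], p["b_u"], p["a_u"]) + formants
    unvoiced = np.ones_like(freqs) * unvoice_filter

    # Mix with weights alpha and 1 - alpha, scale by loudness, add background B(f).
    mixed = p["alpha"] * voiced + (1.0 - p["alpha"]) * unvoiced
    return p["loudness"] * mixed + background
```

Calling `synthesize_frame` once per time step with that step's parameters and stacking the returned frames would give a full spectrogram. Setting `alpha` to 0 leaves only the noise pathway, and setting `loudness` to 0 leaves only the background `B(f)`.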