Speech synthesis with Akustyk

Akustyk 1.8 offers three speech synthesis modules. They are based on industry-proven methods, such as LPC analysis/re-synthesis and PSOLA. Akustyk enables a high level of parametric control to produce very naturalistic-sounding speech samples.

Below is a brief demo of Akustyk's speech synthesis functionality. Note that you can obtain video tutorials with detailed instructions on this free DVD.

  1. Module 1 - Modify (synthesize) a dynamic formant contour
  2. Module 2 - Create a speech continuum
  3. Module 3 - Create a pitch-varied minimal pair

Module 1 - Modify (synthesize) a dynamic formant contour

Note: video tutorials on how to use this module can be obtained on this free DVD

The module "Create basic synthesis..." (Praat/Akustyk/Create basic synthesis...) provides automated, high-quality dynamic re-synthesis of formant contours (trajectories). It is modeled on the LPC analysis/re-synthesis tools available in KayPentax ASL software and offers comparable signal quality and "smart" automation. It does not, however, have the ASL pen tool, which allows one to draw formant contours in a convenient graphical user interface.

Below are examples of dynamic trajectory synthesis with Akustyk 1.8:

1. Global change in formant frequencies and bandwidths

The spectrogram below shows the American English (urban Minnesota dialect) phrase "Katie heard Bob say the word ex."

[Spectrogram: "Katie heard Bob say the word ex."]

Imagine that this phrase is to be used as a stimulus in a dialect recognition task (or any other sociophonetic perception task). We want to manipulate the pronunciation of the word "Bob" to make it sound like the pronunciation typical of the Northern Cities Vowel Shift (NCVS). Because the vowel /a/ has a relatively flat F2 trajectory, all we have to do is modify this trajectory globally (i.e., by the same formula from onset to offset) to obtain the characteristic "fronted" sound (i.e., with increased F2). In this example, the F2 of /a/ in "Bob" was increased by 150 Hz and the F2 bandwidth decreased by 50 Hz. Note that you can use any mathematical formula interpretable by Praat to modify these values.

The figure below shows formant tracks of the targeted word "Bob" before re-synthesis (in black) and after (in red). The red tracks for F1 and F3 overlap those of the original sound (they are unchanged), while the red track for F2 shows a trajectory raised by 150 Hz globally. Note that the red trajectory retains the shape of the black trajectory thanks to Akustyk's "smart" interpolation algorithm. The ten regions shown in the figure are the regions over which the algorithm fills in the missing formant, bandwidth, and amplitude values.
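The global change described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Akustyk's own code: the track values and the helper name `apply_globally` are hypothetical, and the actual re-synthesis is done inside Praat/Akustyk.

```python
# Hypothetical F2 track and F2 bandwidths (Hz) for the vowel in "Bob".
# The numbers are invented for illustration; they are not measurements.
f2_track = [1250.0, 1265.0, 1270.0, 1268.0, 1260.0]
b2_track = [180.0, 175.0, 170.0, 172.0, 178.0]


def apply_globally(track, formula):
    """Apply the same formula to every frame, from onset to offset."""
    return [formula(value) for value in track]


# The change described in the text: F2 + 150 Hz, F2 bandwidth - 50 Hz.
f2_fronted = apply_globally(f2_track, lambda hz: hz + 150.0)
b2_narrowed = apply_globally(b2_track, lambda hz: hz - 50.0)

print(f2_fronted)
print(b2_narrowed)
```

Because every frame receives the same offset, the frame-to-frame differences are unchanged, which is why the modified trajectory keeps the shape of the original.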

2. Local changes in formant trajectories

The spectrogram below shows the American English (urban Minnesota dialect) phrase "Katie heard Dad say the word ex." For the sake of the exercise, I chose a recording that was not ideal: it had a great deal of creaky voice and noise. The method works best with pristine recordings, but it can also produce decent results with less-than-ideal recordings.

[Spectrogram: "Katie heard Dad say the word ex."]

Imagine that this phrase is to be used as a stimulus in a dialect recognition task (or any other sociophonetic perception task). We want to manipulate the pronunciation of the word "dad" to make it sound like the pronunciation typical of the Northern Cities Vowel Shift (NCVS). Because the vowel /ae/ has a non-flat (dynamically changing) F2 trajectory, we have to modify this trajectory locally (i.e., by a different value at each of the ten frames from onset to offset) to obtain the characteristic "fronted and raised" sound (i.e., with increased F2 at the onset and decreased F2 toward the offset). In this example, the changes to the F2 of /ae/ in "dad" are made locally, at 10 intermediate intervals, while Akustyk fills in the missing values by means of its "smart" interpolation algorithm. Note that you can use any mathematical formula interpretable by Praat to modify these values.

The figure below shows formant tracks of the targeted word "dad" before re-synthesis (in black) and after (in red). The red tracks for F1 and F3 overlap those of the original sound (they are unchanged), while the red track for F2 shows a trajectory modified locally. Note that the red trajectory is obtained thanks to Akustyk's "smart" interpolation algorithm. The ten regions shown in the figure are the regions over which the algorithm fills in the missing formant, bandwidth, and amplitude values.
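The idea of local changes can be illustrated with a small Python sketch. The details of Akustyk's "smart" interpolation algorithm are not described here, so plain linear interpolation is used as a stand-in. The ten offsets, the 19-frame track, and the function name `interpolate_offsets` are all hypothetical.

```python
def interpolate_offsets(control_offsets, n_frames):
    """Linearly interpolate a few control offsets (Hz) to one offset per frame.

    Assumption: linear interpolation stands in for Akustyk's own
    (undocumented here) interpolation method.
    """
    if n_frames < 2 or len(control_offsets) < 2:
        raise ValueError("need at least two frames and two control offsets")
    last = len(control_offsets) - 1
    offsets = []
    for frame in range(n_frames):
        # Position of this frame on the control-point axis (0 .. last).
        pos = frame * last / (n_frames - 1)
        lo = min(int(pos), last - 1)
        frac = pos - lo
        offsets.append(control_offsets[lo] * (1 - frac)
                       + control_offsets[lo + 1] * frac)
    return offsets


# Ten hypothetical F2 offsets (Hz): raised at the onset, lowered at the offset.
control_offsets = [200.0, 180.0, 150.0, 110.0, 70.0,
                   30.0, -10.0, -40.0, -70.0, -100.0]

# Hypothetical 19-frame F2 track (Hz) for the vowel in "dad".
f2_track = [1800.0 + 10.0 * i for i in range(19)]

frame_offsets = interpolate_offsets(control_offsets, len(f2_track))
f2_modified = [f2 + off for f2, off in zip(f2_track, frame_offsets)]

print([round(value, 1) for value in f2_modified])
```

The user specifies only the ten control values; every frame in between receives an interpolated offset, so the modified trajectory changes smoothly from onset to offset.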

Module 2 - Create a speech continuum

The module "Create speech continuum..." (Praat/Akustyk/Create speech continuum...) provides an easy way to synthesize an n-step speech continuum. A speech continuum is defined here as a series of speech sounds that spans the articulatory-acoustic distance from one pronunciation to another (e.g., an 11-step continuum spanning from the vowel /ey/, as in "date", to the vowel /e/, as in "debt"). Speech continua are frequently used in speech perception experiments and have equally rich potential applications in sociophonetics. Akustyk 1.8 provides a fully automated and "smart" method to create a speech continuum. The method is based on the industry-standard PSOLA and LPC analysis/re-synthesis methods.

Let's suppose we need to create an 11-step continuum from the word "date" to the word "debt" in order to study the perception of the /ey/~/e/ category boundary across American English dialects. The figure below shows 11 intermediate formant trajectories (F1, F2, and F3) that span between the vowels /ey/ and /e/ in the /d V t/ context. Each intermediate trajectory is dynamically estimated and automatically synthesized by Akustyk. Users can create a speech continuum between any two sounds and in any number of intermediate steps. Various other options are available as well.
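As a rough illustration of what an n-step continuum looks like numerically, the Python sketch below interpolates frame by frame between two formant trajectories. It assumes simple linear interpolation and counts the two endpoints among the steps; how Akustyk actually estimates the intermediate trajectories is not specified here. The trajectory values and the name `make_continuum` are hypothetical.

```python
def make_continuum(start_track, end_track, n_steps):
    """Return n_steps trajectories running from start_track to end_track.

    Assumptions: both tracks have the same number of frames, the first and
    last steps are the endpoints themselves, and the intermediate steps are
    linearly interpolated frame by frame.
    """
    if len(start_track) != len(end_track):
        raise ValueError("tracks must have the same number of frames")
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    continuum = []
    for step in range(n_steps):
        weight = step / (n_steps - 1)  # 0.0 at the start, 1.0 at the end
        continuum.append([(1 - weight) * a + weight * b
                          for a, b in zip(start_track, end_track)])
    return continuum


# Hypothetical F2 trajectories (Hz): rising for /ey/, flat for /e/.
f2_ey = [2000.0, 2100.0, 2200.0, 2300.0]
f2_e = [1900.0, 1900.0, 1900.0, 1900.0]

continuum = make_continuum(f2_ey, f2_e, 11)
for index, track in enumerate(continuum, start=1):
    print(index, [round(value, 1) for value in track])
```

The same function would be applied separately to F1, F2, and F3 to obtain a full set of intermediate trajectories.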

Module 3 - Create a pitch-varied minimal pair

The pitch synthesis module, based on PSOLA, offers a fully automated method of creating minimal pairs of disyllabic words that vary by pitch only, with duration and intensity normalized across the pair. Samples created in this manner provide stimuli for a variety of perceptual experiments, particularly in tonal languages. They also have an equally useful application in English. In addition, this module offers an option to create (synthesize) pitch contours based on the semitone scale.
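For readers unfamiliar with the semitone scale, the sketch below shows the standard conversion between semitones and Hz: one semitone corresponds to a frequency ratio of 2^(1/12), so 12 semitones double the frequency. The reference frequency and the contour values are hypothetical, and the code illustrates only the scale, not the PSOLA re-synthesis itself.

```python
import math


def semitones_to_hz(reference_hz, semitones):
    """Frequency lying `semitones` above (or below) reference_hz."""
    return reference_hz * 2.0 ** (semitones / 12.0)


def hz_to_semitones(frequency_hz, reference_hz):
    """Distance in semitones from reference_hz to frequency_hz."""
    return 12.0 * math.log2(frequency_hz / reference_hz)


# Hypothetical pitch contour, given in semitones relative to 120 Hz.
reference_hz = 120.0
contour_semitones = [0, 2, 4, 2, 0, -3]
contour_hz = [semitones_to_hz(reference_hz, st) for st in contour_semitones]

for st, hz in zip(contour_semitones, contour_hz):
    print(f"{st:+d} st -> {hz:.1f} Hz")
```

Specifying a contour in semitones rather than Hz keeps the perceived interval sizes comparable across speakers with different pitch ranges.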

The demo below shows how a pair of syllables recorded in isolation can be combined to create a disyllabic construct with a synthesized pitch contour based on the semitone scale (marked in red). The first (a) and second (b) syllables are normalized in duration and intensity. Pitch is the only contrastive feature. Download the samples below for comparison.