The Phase Vocoder in Sculptor comes in two parts: a batch-mode analyser called ``analyse'', and a real-time synthesiser called, perhaps more imaginatively, ``prism''. Analyse reads an input file in Sun/NeXT audio format. The sample rate we use most often is 22050 samples per second, because my P120 machine at home can comfortably resynthesise at this rate in floating-point arithmetic and still have enough power left over to run the X Window System interface. Samples can be acquired in the usual way with a command-line recording tool, but finding that rather tedious, we wrote Studio[6] in Tcl/Tk to make acquiring short samples more accessible.
Figure 1: Analysing the Audio Samples
Analyse reads the sample file and breaks it up into overlapping windows of about 10ms in length. This length is chosen because there is evidence that the ear is insensitive to spectral changes on a shorter time scale. Each window is Fourier-transformed, producing an array of spectral samples (Figure 1), but instead of simply storing the amplitude and phase of each Fourier result (bin), analyse records the amplitude and the phase-change per window.
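The analysis step described above can be sketched roughly as follows. This is only an illustration, not the code of analyse itself: the function name `analyse_frames`, the 256-sample window, the 64-sample hop, the Hann window shape, and the choice to measure the first window's phase change against zero are all assumptions made for the example. A naive per-bin DFT stands in for the FFT so that the sketch needs nothing beyond the Python standard library.

```python
import cmath
import math


def analyse_frames(samples, window_size=256, hop=64):
    """Return one list per window of (amplitude, phase_change) pairs, one per bin.

    Hypothetical helper: window_size, hop and the Hann window are assumptions.
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_size)
            for n in range(window_size)]
    bins = window_size // 2 + 1          # non-negative frequencies only
    previous_phase = [0.0] * bins        # assumed starting phase for every bin
    frames = []
    for start in range(0, len(samples) - window_size + 1, hop):
        frame = []
        for k in range(bins):
            # Naive DFT of one bin; a practical analyser would use an FFT.
            value = sum(
                hann[n] * samples[start + n]
                * cmath.exp(-2j * math.pi * k * n / window_size)
                for n in range(window_size)
            )
            phase = cmath.phase(value)
            # Record the change since the previous window, wrapped to [-pi, pi).
            change = (phase - previous_phase[k] + math.pi) % (2 * math.pi) - math.pi
            frame.append((abs(value), change))
            previous_phase[k] = phase
        frames.append(frame)
    return frames


rate = 22050                              # samples per second, as in the text
tone = [math.sin(2 * math.pi * 440.0 * n / rate) for n in range(512)]
frames = analyse_frames(tone)
peak_bin = max(range(len(frames[1])), key=lambda k: frames[1][k][0])
print(len(frames), len(frames[0]), peak_bin)
```

Each frame holds one (amplitude, phase-change) pair per bin, which is the kind of record the text describes; how analyse actually lays this out on disk is not specified here.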
To understand why the phase-change per window is important rather than the absolute phase, let's consider a simple example. Suppose we are using a sample-rate of 8192Hz, and have a 128-point FFT. This means that each window will last approximately 15.6ms, and the spacing between Fourier bins will be 64Hz. Now present this program with a sine-wave at 1kHz. The Fourier transform cannot represent this signal exactly; recall that it is behaving like a bank of filters 64Hz apart, and so the nearest filter frequencies will be bins 15 and 16 at 960 and 1024Hz respectively. Most of the energy falls in bin 16, the closer of the two, so the analysis sees the signal chiefly as a 1024Hz sine-wave.
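The arithmetic in this example can be checked with a few lines of Python. The variable names are chosen purely for illustration; the figures are the ones given in the text.

```python
import math

sample_rate = 8192    # Hz, from the example
fft_size = 128        # points per window
tone = 1000.0         # Hz, the test sine-wave

window_ms = 1000.0 * fft_size / sample_rate   # duration of one window in ms
bin_spacing = sample_rate / fft_size          # Hz between adjacent bins
exact_bin = tone / bin_spacing                # fractional bin position of the tone

lower = math.floor(exact_bin)                 # nearest bin below the tone
upper = lower + 1                             # nearest bin above the tone

print(window_ms, bin_spacing, exact_bin)
print(lower, lower * bin_spacing, upper, upper * bin_spacing)
```

The tone sits at bin position 15.625, between the two bin centres, which is why no single bin can represent it exactly.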
Now think what will happen when this same signal is analysed, say, a quarter of a window later. It will still be represented as a 1024Hz sine-wave, but because its frequency is really lower than that, it will appear to have lagged in phase relative to a true 1024Hz tone. So by storing the phase-change per window, sufficient information is retained at least to approximate the original 1000Hz sinusoid when the inverse Fourier transformation results are overlapped and added together at resynthesis time.
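To make the phase lag concrete, the sketch below measures bin 16 in two windows a quarter of a window (32 samples) apart and turns the phase change into a frequency estimate. It is an illustration under stated assumptions rather than the method used by analyse or prism: the Hann window, the helper names `bin_value` and `wrap`, and the conversion formula (the standard phase-vocoder frequency estimate) are not taken from the text.

```python
import cmath
import math

SAMPLE_RATE = 8192          # Hz, from the example
FFT_SIZE = 128              # points per window
HOP = FFT_SIZE // 4         # a quarter of a window later
TONE_HZ = 1000.0            # the test sine-wave


def bin_value(signal, start, k):
    """Hann-windowed DFT of bin k for the window starting at sample `start`."""
    total = 0j
    for n in range(FFT_SIZE):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / FFT_SIZE)   # assumed window
        total += w * signal[start + n] * cmath.exp(-2j * math.pi * k * n / FFT_SIZE)
    return total


def wrap(phase):
    """Wrap a phase in radians into the range [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


signal = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)
          for n in range(FFT_SIZE + HOP)]

k = 16                                        # the 1024Hz bin
first = bin_value(signal, 0, k)
second = bin_value(signal, HOP, k)

measured = cmath.phase(second) - cmath.phase(first)
expected = 2 * math.pi * k * HOP / FFT_SIZE   # advance of a tone exactly at bin centre
deviation = wrap(measured - expected)         # negative means the tone has lagged

bin_centre = k * SAMPLE_RATE / FFT_SIZE
estimate = bin_centre + deviation * SAMPLE_RATE / (2 * math.pi * HOP)
print(deviation, estimate)
```

The deviation comes out negative, matching the lag described above, and the estimate lands close to 1000Hz rather than the bin centre of 1024Hz. With a 32-sample hop the phase change is only unambiguous for tones within 128Hz of the bin centre, which is one reason the windows need to overlap.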