AppleCrate Polyphonic Music Synthesizer
Michael J. Mahon - April 30,
Revised - July 21, 2005
Revised - February 6, 2009
After I constructed and tested the AppleCrate I, a parallel computer made of eight Apple //e's, I wrote several test programs and demonstrations, but did not have a real "application" that used its parallel computing capabilities to produce a useful result.
Some conversations with other Apple II enthusiasts, particularly Simon Williams and Patrick Collins, led me to consider whether the sound producing technology of DAC522 first exploited in Sound Editor v2.2 could be usefully adapted to the AppleCrate.
Since the synthesizer was initially written, I have constructed a 17-processor AppleCrate II that uses an updated version of the program to create 16-voice music (WhenIm64.mp3). This new version of CRATE.SYNTH also exploits the new broadcast data command of NadaNet 3.0 to achieve much faster loading of SYNTH code and voices.
DAC522 is a software digital-to-analog converter for the Apple II that plays a stream of 11.025kHz sound samples through the 1-bit Apple speaker port using a pulse-width modulated (PWM) stream at a pulse rate of 22.05kHz, or two pulses per sample. The 22kHz pulse rate renders the pulses themselves virtually inaudible to human ears, but the average output, changed by varying the pulse width in proportion to sample values, reproduces the sampled sound to a precision of 5 bits. Since the period of a 22kHz pulse is 46 Apple II clock cycles, and the Apple II can only create pulse edges to one cycle resolution, at most 32 distinct pulse widths, or 5 bits of precision, can be produced using equal pulses.
DAC522 is a set of pulse generators, each of which generates two pulses with phase and width controlled to one cycle accuracy, while fetching the next sample of a stream, testing for the end of the stream, computing the generator corresponding to the next sample, and vectoring to the selected generator. This process continues at a rate of 11kHz until the entire sample stream has been played, when DAC522 returns to its caller.
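To make the pulse-width arithmetic concrete, here is a small Python model of one sample period. It is only a sketch, not DAC522 itself: the 46-cycle pulse period and the two pulses per sample come from the description above, while the 6-cycle minimum width is borrowed from the `gen0' listing later in this article, and the rule "one extra cycle of width per sample step" is my assumption about how the other generators are laid out.

```python
CYCLES_PER_PULSE = 46    # from the text: one 22kHz pulse period in CPU cycles
PULSES_PER_SAMPLE = 2    # from the text: two equal pulses per sample
MIN_WIDTH = 6            # assumption: narrowest pulse, as seen in the gen0 listing


def pulse_train(sample):
    """Return the speaker level (0 or 1) for each cycle of one sample period."""
    if not 0 <= sample <= 31:
        raise ValueError("sample must be a 5-bit value (0..31)")
    width = MIN_WIDTH + sample  # assumption: one extra cycle per sample step
    one_pulse = [1] * width + [0] * (CYCLES_PER_PULSE - width)
    return one_pulse * PULSES_PER_SAMPLE


def mean_level(sample):
    """Average output over the sample period, which is what the ear hears."""
    train = pulse_train(sample)
    return sum(train) / len(train)


if __name__ == "__main__":
    for s in (0, 16, 31):
        print(s, len(pulse_train(s)), round(mean_level(s), 3))
```

Under these assumptions the average level rises by the same amount for every step in sample value, which is the property that lets a 1-bit output behave like a 5-bit converter.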
At a rate of 11kHz, 48KB of sound samples in an Apple II's memory are played in a little over four seconds. It was evident that a music synthesizer capable of sustained sounds could not practically be based on DAC522 in its original form.
Direct Digital Synthesis
The fundamental problem a music synthesizer must address is the production of notes of many frequencies and arbitrary durations having specified waveshapes (voices). As has been noted, simply storing all the needed combinations in limited memory is not practical.
A workable solution is to store each waveshape needed as a single-frequency sample, then resample this waveshape on the fly to create any desired frequency.
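The resampling idea can be sketched in a few lines of Python. This is an illustration rather than SYNTH's code: it assumes a 256-entry waveform table and a 16-bit phase accumulator whose high byte is the table index and whose low byte is the fraction, which matches the integer and fractional phase increments described later. The sine table is just a stand-in voice.

```python
import math

TABLE_SIZE = 256


def make_table():
    """One cycle of a sine wave scaled to 5-bit samples (0..31), as a stand-in voice."""
    return [round(15.5 + 15.5 * math.sin(2 * math.pi * i / TABLE_SIZE))
            for i in range(TABLE_SIZE)]


def resample(table, increment, count):
    """Read `count` samples, advancing an 8.8 fixed-point phase by `increment` each time."""
    phase = 0  # 16-bit accumulator: high byte = table index, low byte = fraction
    out = []
    for _ in range(count):
        out.append(table[phase >> 8])
        phase = (phase + increment) & 0xFFFF  # wraps around at the end of the table
    return out


if __name__ == "__main__":
    table = make_table()
    print(resample(table, 0x0200, 8))  # one octave above the stored pitch
```

An increment of $0100 reads the table at its stored pitch, $0200 skips every other sample and so plays an octave higher, and $0080 repeats each sample and plays an octave lower.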
Most instrument sounds change as a note sounds. For example, many sounds have an "attack" that sounds different from the rest of the note. And many instrument sounds change in amplitude as a note is held, usually decaying in amplitude or changing in "timbre" or spectral composition. Synthesis of notes with changes appropriate to particular instruments, therefore, requires that the synthesized waveform change as a function of the length of time the note is played.
As we shall see, SYNTH performs all the calculations required to carry out these tasks while it is generating the pulses corresponding to the previously calculated sample.
SYNTH Data Structures
The data structures used by SYNTH are designed for high speed. They make extensive use of single-byte pointers to page-aligned data structures to speed operation. In most cases, the data structures are 256 bytes long, so there is little waste in aligning them to page boundaries, but there are a few cases in which page-alignment leads to sparse memory usage, most notably in the code for SYNTH itself.
When SYNTH is called, it starts fetching from the "music" stream pointed to by the zero-page pointer `music'. A music stream is a sequence of "events" that are fetched sequentially by SYNTH and determine its operation.
Each "event" in a music string is 3 bytes long. The first byte of an event is the "op" byte. The sign bit of the op byte specifies whether it is a "note" or a "command". Positive bytes specify notes, from 0..127 using the standard MIDI note mapping (with some exceptions to be described later). Negative bytes specify commands, including "rest" ($81), "voice change" ($82), and "stop playing" ($80).
The note number indexes two pitch tables embedded in the SYNTH code, `pitchhi' containing the integer part and `pitchlo' containing the fractional part of the phase increment. These tables reside in the "holes" following `gen5' and `gen4', respectively. Pitches higher than MIDI key number 112 are "silent" (frequency = 0), since their frequencies are too high to be properly reproduced at an 11kHz sample rate. (Key number 127 is a special case used for atonal percussive voices, as described later.)
If the op byte specifies a note or a rest command, the next two bytes specify the duration of the note or rest in 92-cycle sample periods (corresponding to a sample frequency of 11.092kHz). If the op byte is a voice change command, then the second byte is an index into the "voice table" specifying the voice to be played next and the third byte is ignored. If the op byte is a "stop playing" command, both following bytes are ignored.
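The event format is simple enough to decode in a short Python sketch. The op codes and the 3-byte layout are taken from the description above; the byte order of the two-byte duration is not stated here, so low byte first is an assumption, and the tuple shapes returned are purely illustrative.

```python
STOP, REST, VOICE = 0x80, 0x81, 0x82  # command op bytes named in the text


def parse_events(stream):
    """Decode a music stream (bytes) into a list of tuples, stopping at "stop playing"."""
    events = []
    for i in range(0, len(stream) - 2, 3):
        op, second, third = stream[i:i + 3]
        duration = second | (third << 8)  # assumption: low byte first
        if op < 0x80:                     # sign bit clear: a note (MIDI key number)
            events.append(("note", op, duration))
        elif op == REST:
            events.append(("rest", duration))
        elif op == VOICE:                 # second byte indexes the voice table
            events.append(("voice", second))
        elif op == STOP:                  # both following bytes are ignored
            events.append(("stop",))
            break
        else:
            raise ValueError("unknown command byte $%02X" % op)
    return events


if __name__ == "__main__":
    demo = bytes([0x82, 2, 0, 60, 0x00, 0x10, 0x81, 0x80, 0x00, 0x80, 0, 0])
    print(parse_events(demo))
```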
The "voice table" is a variable-length table of the voices loaded into memory. It is indexed by voice change commands embedded in the music string. An entry in the voice table is a single-byte page number pointing to the "envelope" table corresponding to a stored voice. The maximum size of the voice table is 128 entries, though in practice its length is limited by the smaller number of voices that will fit into memory. The voice table is embedded in the region of memory occupied by SYNTH, in the "hole" following `gen3'. A voice is selected by storing its envelope page number in the zero-page pointer `env'.
A "voice" is a digital representation of the sound of an instrument. It is composed of an envelope page, which is a page-aligned table of page pointers to waveforms, the collection of which specifies the sound of the instrument as a function of time from the inception of a note. Since both the waveforms and the envelope table are page-aligned, single byte page numbers suffice as pointers. Voice waveforms frequently begin with a few pages of unique waveforms, followed by repetitions of a relatively stationary waveform of diminishing or sustaining amplitude. The envelope table thus often refers repeatedly to waveform pages, resulting in greatly reduced voice storage requirements. Since the sound generators are also on page boundaries, starting at page $08, waveforms are represented as generator page pointers in the range of $08..$27 based upon their 5-bit sample values.
Atonal percussive voices are somewhat different, in that they are "played" directly from memory without resampling. (This means that atonal voices occupy 11 thousand bytes per second that they "sound".) SYNTH plays these voices without special case code by specifying a "pitch" of 127 ($7F) whose pitch table entry is a frequency of 1.0 (in units of 43.31Hz), corresponding to advancing exactly one stored waveform sample per sample period.
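For readers who want to see where pitch table values come from, the following Python sketch derives a phase increment for each MIDI key. It is a reconstruction under stated assumptions, not the table embedded in SYNTH: the CPU clock value is assumed, equal-tempered tuning with A4 = 440Hz is assumed, and rounding to the nearest 1/256 is my choice. With that clock, one unit of frequency (sample rate divided by 256) comes out at about 43.3Hz.

```python
CPU_HZ = 1_020_484          # assumption: effective Apple II clock rate
SAMPLE_RATE = CPU_HZ / 92   # 92-cycle sample period, about 11.092kHz
UNIT_HZ = SAMPLE_RATE / 256 # frequency of a phase increment of exactly 1.0


def midi_hz(key):
    """Equal-tempered frequency of a MIDI key number (assumes A4 = key 69 = 440Hz)."""
    return 440.0 * 2 ** ((key - 69) / 12)


def pitch_entry(key):
    """Return (pitchhi, pitchlo): integer and fractional parts of the phase increment."""
    if key == 127:
        return (1, 0)       # atonal voices: advance exactly one stored sample
    if key > 112:
        return (0, 0)       # too high for the sample rate: silent
    increment = round(midi_hz(key) / UNIT_HZ * 256)
    return (increment >> 8, increment & 0xFF)


if __name__ == "__main__":
    for key in (36, 60, 69, 112, 113, 127):
        print(key, round(midi_hz(key), 1), pitch_entry(key))
```

The cutoff at key 112 lines up with the Nyquist limit: key 112 is roughly 5.27kHz, just under half the sample rate, while key 113 is already above it.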
The framework of DAC522-like sound production requires that all the work required for computing the next sample be completed within the 92-cycle sample period, simultaneous with the production of the two pulses specified by the previous sample.
SYNTH is composed of 32 distinct pulse generators, one for each duty cycle, and each is aligned on a page boundary so that only its 8-bit page number need be specified to vector to a given pulse generator. Initialization code and music stream processing are embedded in the "holes" between pulse generators.
Each pulse generator creates different precisely timed pulse widths, but all generators do basically the same work between pulse edges. The listing of generator 0 is shown below:
0800: 8D 30 C0  >6   gen0   sta  spkr      ; <==== start time: 0
0803: EA        >7          nop            ; Kill 2 cycles
0804: 8D 30 C0  >8          sta  spkr      ; <===== stop time: 6
0807: 85 EB     >9          sta  ztrash    ; Kill 3 cycles
0809: E6 ED     >10         inc  scount    ; Compute envelope
                >11         ciny
080B: F0 01     >11         beq  *+3       ; If =, branch to iny
080D: A5        >11         dfb  $A5       ; "lda $C8" to skip iny
080E: C8        >11         iny
                >11         eom
080F: 18        >12         clc
0810: A5 EC     >13         lda  frac      ; Compute next sample
0812: 65 FE     >14         adc  freq
0814: 85 EC     >15         sta  frac
0816: 8A        >16         txa
0817: 65 FF     >17         adc  freq+1
0819: AA        >18         tax
081A: B1 06     >19         lda  (env),y   ; Next sample page
081C: 8D 30 C0  >20         sta  spkr      ; <==== start time: 46
081F: EA        >21         nop            ; Kill 2 cycles
0820: 8D 30 C0  >22         sta  spkr      ; <===== stop time: 52
0823: 85 EB     >23         sta  ztrash    ; Kill 3 cycles
0825: 8D 2A 08  >24         sta  :ptr+2
0828: BD 00 00  >25  :ptr   ldaa 0*0,x     ; Fetch sample.
082B: 8D 3C 08  >26         sta  :sw0+2
082E: C6 FC     >27         dec  dur       ; Decrement duration
                >28         cdec dur+1
0830: F0 02     >28         beq  *+4       ; If eq, branch to dec
0832: EA        >28         nop            ; Else kill 2 cycles and
0833: AD        >28         dfb  $AD       ; "lda xxxx" to skip dec
0834: C6 FD     >28         dec  dur+1     ; of zero-page param.
                >28         eom
0836: A5 FD     >29         lda  dur+1
0838: F0 03     >30         beq  :quit     ; Finished.
083A: 4C 00 00  >31  :sw0   jmp  0*0       ; Switch to gen, T = 89
                >32
083D: 4C 40 09  >33  :quit  jmp  quit
As shown, `gen0' is of typical length, and only uses $40 bytes, or a quarter of a page. The remainder of each generator's page is used for other SYNTH code, or data tables, or is left unused. This sparse use of space is a conscious tradeoff to reduce the time required to vector dynamically to each generator. Approximately 5KB of SYNTH's 8KB is unused, split into 26 page fragments.
The critically timed events are the sta spkr instructions. They toggle the state of the Apple's speaker output and thereby generate the variable width high frequency pulses that perform the digital-to-analog conversion that is responsible for the synthesizer's audio output. (Note that the cycle counts are all relative to the first cycle of the fetch of the corresponding instruction, not the execution cycle during which the toggle actually occurs. Since all toggling instructions are identical 4-cycle instructions in which the toggle occurs at the start of the 4th cycle, this relative method of counting produces correct results.)
This generator, gen0, generates the shortest duty cycle used by the synthesizer, corresponding to a sample value of 0. While this generator is "playing" a sample with a value of zero, it is computing the value of the next sample and doing all necessary bookkeeping, as detailed below:
Lines 10-11 count the number of samples produced so far, and advance the Y register by one every 256 sample times, or a little less than 1/40th of a second. The Y register, then, is an index into the current "envelope" page.
Lines 12-18 compute the next sample from the current waveform page by adding the 16-bit phase increment to the phase accumulator, for which the location `frac' is the fractional part and X contains the integral part.
Lines 19 and 24 set up the current waveform page from the envelope table.
Lines 25-26 retrieve the correct waveform sample, stored as the page number of the corresponding pulse generator, and use it to set up :sw0 to vector to that generator next.
Lines 27-28 decrement the 16-bit duration of the note (measured in samples).
Lines 29-30 test the remaining duration and terminate the note if it has expired.
All generators perform all of these tasks, sometimes in a slightly different (but irrelevant) order. There is one exception to this: the "end test" in lines 29-30. This test is performed only in generators 0 through 3, so that a note may play on beyond its intended duration until its waveform amplitude is in the range of 0 to 3 (out of 31). The note fetch routine in SYNTH compensates for any extra samples played by subtracting them from the duration of the following note or rest.
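The bookkeeping described above, together with the deferred end test, can be modeled in Python. This is a behavioral sketch only: it ignores cycle timing entirely, represents waveforms as plain sample values rather than generator page numbers, and holds the last envelope entry once the envelope list runs out, which is an assumption made to keep the example small. The `limit` guard is also mine; it stands in for the fact that a voice with no samples in the 0..3 range would never end.

```python
def play_note(envelope, waveforms, increment, duration, limit=1 << 16):
    """Model one note. Returns (samples, overrun).

    envelope:  list of waveform ids, one entry per 256 sample periods
    waveforms: dict mapping waveform id -> list of 256 samples in 0..31
    increment: 8.8 fixed-point phase increment
    duration:  requested length in sample periods
    """
    phase = 0
    produced = 0
    out = []
    while True:
        # The envelope index advances once every 256 samples (the Y register).
        step = min(produced >> 8, len(envelope) - 1)  # assumption: hold last entry
        wave = waveforms[envelope[step]]
        sample = wave[phase >> 8]
        out.append(sample)
        produced += 1
        phase = (phase + increment) & 0xFFFF
        # End test: only honored when the current sample is in 0..3.
        if produced >= duration and sample <= 3:
            return out, produced - duration
        if produced - duration > limit:
            raise RuntimeError("no sample in 0..3: the note would never end")


def next_duration(nominal, overrun):
    """Compensate the following note or rest for samples already played."""
    return max(0, nominal - overrun)


if __name__ == "__main__":
    ramp_wave = [i >> 3 for i in range(256)]  # rises from 0 to 31 over one cycle
    samples, overrun = play_note([0], {0: ramp_wave}, 0x0100, 100)
    print(len(samples), overrun, next_duration(500, overrun))
```

In the demonstration the note is asked to last 100 samples but is not allowed to stop until the waveform returns to a low value, so it runs long and the following event is shortened by the same amount.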
Since all notes start and end with sample values near 0, this has the effect of minimizing switching noises when one note transitions to another. As we shall see, SYNTH further capitalizes on this regularity by continuing to generate pseudo-samples with value 0 during note transitions and control operations, so there are no discontinuities in pulse generation as music is played.
The code for the 32 generators was actually created by an Applesoft program, which scheduled all the specified "work" instructions into the cycles between the speaker-toggling instructions, adding "padding" instructions as needed to produce generator-specific cycle-accurate timings for each of the toggling instructions. From a practical point of view, I found it simply too error-prone to repeatedly schedule all 32 generators manually as the synthesis strategy and code evolved. The BASIC program generates Merlin source code, and takes most of the pain out of making changes.
CRATE.SYNTH is an Applesoft program run by the user on the master machine. It boots the AppleCrate machines (if they are not already serving), then prompts the user for the music file to be played. That file is then BLOADed into the master machine's memory, along with SYNTH.LOADER, the &BCAST client for loading SYNTH code and voices into the AppleCrate machines.
CRATE.SYNTH then loops through all required machines, loading the music for each machine and scanning its voice table, which it copies into a table in SYNTH.LOADER and merges into its sorted master list of required voices. The music stream is then &POKEd into each machine, and the "personalized" image of SYNTH.LOADER is &BRUN, preparing the machine to receive the SYNTH code and voices to be broadcast by the master.
After all machines are prepared, the SYNTH code is &BCAST to all machines (and received by those running SYNTH.LOADER). Then the voices in the sorted master list are BLOADed from the master machine's disk and &BCAST to the AppleCrate machines. SYNTH.LOADER, running in each AppleCrate machine, determines whether a voice is needed by this machine, and, if so, receives it into memory. The slave's memory map is updated and its voice table is filled in, then the voice's envelope table is relocated. When each machine's voice requirements have been satisfied, it loops calling SERVE waiting for a &BPOKEd "start" signal.
Note that the broadcast distribution of SYNTH code and voices means that each file is sent just once over the network, and all machines receive only the voices that they require. The result is that initialization requires much less time than if each machine were initialized separately.
When all the required machines have been loaded, CRATE.SYNTH prompts the user to start, and then it &BPOKEs the start signal, releasing all copies of SYNTH to start fetching and playing their music streams within three cycles of perfect synchronization.
Empirically, the AppleCrate machines diverge from synchronization at a rate of one millisecond for every 40 seconds of execution. Since up to 10 milliseconds of temporal misalignment is virtually inaudible, if the machines are started in sync, they remain sufficiently well synchronized for songs of at least 400 seconds' duration without any additional synchronization.
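The arithmetic behind the 400-second figure is easy to check. The short Python sketch below uses only the two numbers quoted above (1 millisecond of drift per 40 seconds, and a 10 millisecond tolerance); the function names are illustrative.

```python
def max_unsynced_seconds(tolerance_ms=10, drift_ms=1, per_seconds=40):
    """Playing time before accumulated drift reaches the tolerance."""
    return tolerance_ms * per_seconds / drift_ms


def drift_ppm(drift_ms=1, per_seconds=40):
    """The same drift expressed as a clock mismatch in parts per million."""
    return drift_ms * 1000 / per_seconds


if __name__ == "__main__":
    print(max_unsynced_seconds(), drift_ppm())
```

Expressed as a clock mismatch, the observed drift corresponds to about 25 parts per million between machines.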
The audio output from the original AppleCrate I is quite simple. Eight 10k resistors connect each of the boards' speaker outputs to a mixing node, a 2.2k resistor to ground, paralleled by a 0.1uF capacitor to serve as a simple first-order lowpass filter. The output at the mixing node is 200-300 millivolts peak-to-peak, and is input to an external audio amplifier.
This design had several weaknesses. First, the speaker output is derived from the +5v of each board, which is electrically noisy, and this noise finds its way to the mixer output. Second, with only a single mixing point, all output was monaural. Third, the single-pole low-pass filter rolls off at only 6dB per octave, so it is only marginally effective at reducing the high-level 22kHz pulse component without also attenuating desired high frequencies in the synthesized sound. Finally, the output at the mixing point was low level, and required an external amplifier.
The AppleCrate II audio system addresses each of these problems with an external sound processor box.
To reduce spurious noise, the sound output taken from each board is a TTL logic level, and each board's signal is separately passed through an inverter stage in the sound processor box. This inverter, like all the audio circuits in the box, is powered by a "quiet" +5v that is double-regulated down from the +12v supply of one of the boards. The resulting "clean" pulse-width modulated signals from the boards are then mixed at two mixing points, with boards 1-9 going to the left channel and boards 10-17 going to the right channel, providing the capability of generating stereo sound.
In place of the simple single-pole RC low-pass filter in the AppleCrate I, the sound processor provides 5-pole low-pass filters for both left and right channels. The cutoff frequency of these filters is 5.5kHz, which reduces the level of 22kHz in the outputs by 60dB with negligible attenuation of valid high frequency signal components.
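The 60dB figure can be sanity-checked with a few lines of Python. The actual filter topology is not described here, so this sketch assumes a Butterworth response; other 5-pole designs would give somewhat different numbers near cutoff but a similar slope well above it.

```python
import math


def butterworth_attenuation_db(freq_hz, cutoff_hz, poles):
    """Attenuation of an ideal Butterworth low-pass filter (assumed response)."""
    return 10 * math.log10(1 + (freq_hz / cutoff_hz) ** (2 * poles))


if __name__ == "__main__":
    five_pole = butterworth_attenuation_db(22050, 5500, 5)
    one_pole = butterworth_attenuation_db(22050, 5500, 1)
    print(round(five_pole, 1), round(one_pole, 1))
```

The 22kHz pulse rate is two octaves above the 5.5kHz cutoff, so five poles at roughly 6dB per octave each give about 60dB, while a single pole at the same cutoff would manage only about 12dB.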
Finally, the output of the filters passes through left and right volume controls to two power opamps that drive small speakers or headphones directly. Stereo line-level outputs are provided prior to the volume controls.
For comparison with the AppleCrate I's "8-oscillator rendition of In My Life", here is the new AppleCrate II "13-oscillator version of In My Life" (again, recorded in mono). Note that 13 oscillators are enough that there are no missing notes in the bridge. (Spreading the music across more oscillators also relieved memory pressure sufficiently to add the final crash cymbal, which sounds for almost a second without repeating, and so requires more than 10KB of sample memory!)
The MIDI Converter
Music files for CRATE.SYNTH are created from standard MIDI files. CVT.MIDI is an Applesoft program that reads a user-specified MIDI file and converts it into a music file, composed of multiple music streams, one for each digital oscillator (machine running a copy of SYNTH) that is needed to play the music.
CVT.MIDI is a work in progress. It has grown incrementally from a conceptual prototype into a usable tool without the benefit of being rewritten for style or machine language speed. Bear that disclaimer in mind as you peruse its code. ;-) It starts by asking for a "Debug output level" that selects how verbose its reporting on the conversion process will be and provides other support for debugging.
My primary Apple //e has an 8MHz Zip Chip accelerator, which makes even Applesoft acceptably fast as a prototyping language. Still, even with acceleration, CVT.MIDI runs at a fraction of the "real time" required to play a piece of music. Running it at 1MHz will require patience!
CVT.MIDI works by merging all parallel "tracks" of the MIDI file into a single sequential stream of events. The most important MIDI events are "key down" and "key up" events, which specify the pitches and durations of notes to be played. But there are also control events such as tempo and voice changes in the MIDI stream.
Since a MIDI file may contain many parallel tracks, CVT.MIDI maintains multiple buffers, one for each track, which it refills as needed as the merging progresses. This approach permits arbitrary-sized MIDI files to be processed.
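The merging step can be sketched in Python with the standard library's `heapq.merge`, which pulls from each input only as needed, much as CVT.MIDI refills its per-track buffers. This is not CVT.MIDI's Applesoft code; it assumes each track has already been converted from MIDI delta times to absolute times, and the (time, event) tuples are an illustrative format.

```python
import heapq


def merge_tracks(tracks):
    """Merge time-sorted tracks into one sequence of (time, track_no, event).

    tracks: list of iterables of (time, event), each sorted by time.
    Events at the same time come out in track order.
    """
    def tagged(track_no, track):
        for time, event in track:
            yield (time, track_no, event)

    return heapq.merge(*(tagged(no, track) for no, track in enumerate(tracks)))


if __name__ == "__main__":
    melody = [(0, "key down 60"), (10, "key up 60")]
    bass = [(5, "key down 36"), (10, "key up 36")]
    for item in merge_tracks([melody, bass]):
        print(item)
```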
Similarly, CVT.MIDI outputs multiple parallel music streams into a single sparse binary file. At the end of processing, the sparse file is compacted into a single music file. It also maintains multiple buffers for the output streams, which are written to disk as they fill, so that memory size does not constrain the size of file that can be processed.
As CVT.MIDI scans the merged stream of MIDI events, it allocates "notes" to idle oscillators, preferring ones that have sounded the current voice previously (and so will already have it in memory). In some cases, more oscillators are required than AppleCrate supports (8), so it is necessary to preempt a currently sounding oscillator for the new note. In this case, it chooses the oscillator that has been sounding the longest, in an attempt to make the preemption as benign as possible.
A special case is made of "re-striking" a key that is already sounding; in that case, the current sounding is ended and the new one started on the same oscillator.
When a note ends, its oscillator is returned to "idle" status for re-use.
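The allocation rules just described can be summarized in a short Python sketch. It follows the stated policy (re-strike on the same oscillator, prefer an idle oscillator that already has the voice, otherwise preempt the one sounding longest), but the class, the function names, and the choice to match re-strikes on note number alone are illustrative assumptions, not CVT.MIDI's actual code.

```python
class Oscillator:
    def __init__(self):
        self.note = None      # note currently sounding, or None when idle
        self.started = None   # time the current note started
        self.voices = set()   # voices this oscillator has sounded so far


def allocate(oscs, note, voice, now):
    """Choose an oscillator for a new note, mark it busy, and return its index."""
    sounding = [i for i, o in enumerate(oscs) if o.note == note]
    idle = [i for i, o in enumerate(oscs) if o.note is None]
    if sounding:                      # re-strike: reuse the same oscillator
        chosen = sounding[0]
    elif idle:                        # prefer one that already holds this voice
        with_voice = [i for i in idle if voice in oscs[i].voices]
        chosen = (with_voice or idle)[0]
    else:                             # preempt the longest-sounding oscillator
        chosen = min(range(len(oscs)), key=lambda i: oscs[i].started)
    osc = oscs[chosen]
    osc.note, osc.started = note, now
    osc.voices.add(voice)
    return chosen


def release(oscs, note):
    """Return the oscillator sounding `note` (if any) to idle status."""
    for o in oscs:
        if o.note == note:
            o.note = None


if __name__ == "__main__":
    oscs = [Oscillator() for _ in range(3)]
    print(allocate(oscs, 60, "piano", 0), allocate(oscs, 64, "flute", 1))
```

Because the choice is made one event at a time, the set of voices each oscillator accumulates depends on the order in which notes happen to arrive.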
There is much room for improvement in the way that CVT.MIDI allocates oscillators. The current algorithm is a sequential, one-pass method. It is easy to show that such an algorithm, with no knowledge of future events, cannot do an optimal assignment of notes/voices to oscillators. The consequence of a non-optimal assignment is that each oscillator (SYNTH machine) is required to sound a larger number of voices, which can overflow the memory of the SYNTH machines. Note that CVT.MIDI contains a parameter setting the maximum number of oscillators that it will allocate, so that it can produce music for AppleCrates with differing numbers of machines.
I am continuing to study ways to create better voice assignments, and I expect that this will result in updates to CVT.MIDI in the future.
Voice Generation Tools
SYNTH requires voices that are samples of "instruments", represented by multiple 256-byte pages of sampled waveforms indexed by an envelope table.
My starting point for current tonal voices has been synthesized instruments played at approximately 43Hz, or the key of F in octave 1. I sample to a .wav file at 11.025kHz, which creates a waveform with a period of about 258 samples, which is not perfect, but close enough to extract 256-sample waveforms easily.
The waveform is first "ramped" by subtracting the negative envelope of the waveform from it. This has the effect of making each cycle of the waveform start and end with the minimum sample value, allowing note transitions to be made without "pops". Note that a sample must contain enough values in the 0..3 range so that, over the range of frequencies at which it will be resampled, samples in the 0..3 range will actually occur. If samples in this range do not occur, the synthesizer will never check for the end of the note and the note will play forever! After ramping, the waveform is normalized in amplitude so that at its loudest point it covers the full range of 0..31. Then cycles are selected from the total waveform which differ significantly in amplitude, and the envelope table is constructed so that it will reproduce a good facsimile as the note sounds. This typically requires a relatively small number of distinct waveforms, resulting in a relatively compact voice file. Simple Applesoft tools for constructing tonal voices, including GEN.TONAL, are included in the download.
Atonal voices are also sampled at 11.025kHz, and the resulting .wav file is "ramped" and normalized in amplitude to a sample range of 0..31 (mapped to $08..$27 for use with SYNTH). Rudimentary Applesoft tools for generating atonal voice files, SOUND.RAMP and GEN.ATONAL, are included in the download.
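The ramping, normalizing, and page-mapping steps can be illustrated in Python. This is a deliberately crude sketch and not what SOUND.RAMP or the GEN tools do internally: it approximates the negative envelope by the minimum of each fixed-length cycle, whereas a real tool would presumably use a smoother envelope, and all function names are illustrative.

```python
def ramp(samples, period=256):
    """Subtract a piecewise-constant lower envelope (assumption: per-cycle minimum)."""
    out = []
    for start in range(0, len(samples), period):
        cycle = samples[start:start + period]
        floor = min(cycle)
        out.extend(s - floor for s in cycle)  # every cycle now touches zero
    return out


def normalize(samples, top=31):
    """Scale so that the loudest point reaches `top` (the full 5-bit range)."""
    peak = max(samples)
    if peak == 0:
        return [0] * len(samples)
    return [round(s * top / peak) for s in samples]


def to_generator_pages(samples, base=0x08):
    """Map 0..31 sample values to generator page numbers $08..$27."""
    return [base + s for s in samples]


if __name__ == "__main__":
    raw = [10, 12, 14, 12, 20, 25, 30, 20]  # two tiny 4-sample "cycles"
    voiced = normalize(ramp(raw, period=4))
    print(voiced, [hex(p) for p in to_generator_pages(voiced)])
```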
Voice files are named "V.n", where n is the MIDI "patch number" (0..127) corresponding to the instrument. Atonal percussive instruments are mapped to "V.k", where k is the MIDI "note number" (0..127) + 128 for the instrument in the MIDI "percussion channel". The BASIC program VOICE.LIST (in the VOICES directory of the download) shows the instruments corresponding to the included voice files.
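The naming rule amounts to a one-line mapping, shown here as a Python sketch with an illustrative function name.

```python
def voice_file(number, percussion=False):
    """Voice file name for a MIDI patch number, or for a percussion-channel note number."""
    if not 0 <= number <= 127:
        raise ValueError("MIDI patch and note numbers are 0..127")
    return "V.%d" % (number + 128 if percussion else number)


if __name__ == "__main__":
    print(voice_file(0), voice_file(49, percussion=True))
```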
ShrinkIt archive containing SYNTH, voices, tools, music, and source