Apple II Real-Time Single-Voice Music Synthesizer
Michael J. Mahon – September 12, 2005
Introduction
After completing the AppleCrate 8-voice Synthesizer, I realized that many people who were interested in music synthesis on the Apple II would not construct AppleCrates. I began to think about how the synthesizer might be adapted to a single Apple II machine, which would be accessible to many more users.
DAC522 technology limits computation cycles during sound production so that only a single voice can be produced at a time, and a single-machine implementation would have this limitation. Because of this, I began to think in terms of a real-time synthesizer that could be "played" from the Apple keyboard. Several correspondents had suggested a "performance" version of the synthesizer, and this seemed an interesting design direction. Since the "Any Key Down" keyboard sensing capability is required to permit natural musical performance, RT.SYNTH is limited to operation on Apple //e, Apple //c (and //c+), and Apple IIgs systems.
The resulting program, RT.SYNTH, has turned out to be surprisingly versatile, even without editing capabilities. It is capable of storing up to eight different instrumental voices, from which the user can select the sound to be used when playing. It can also "record" and "play back" performances, and these recordings can be saved to disk for future playback or for editing by external programs.
Downloads
Download RT.SYNTH.SDK Shrinkit disk archive
Download RT.SYNTH.DSK image
Using RT.SYNTH
RT.SYNTH is distributed as a 5.25" disk image with ProDOS 1.8, which is suitable for both enhanced and unenhanced machines. The disk boots into a STARTUP program which is a simple menu to select between RT.SYNTH or EDIT.VOICEPAK, described below.
When RT.SYNTH is run, it begins by asking which VoicePak to use. (At any file dialog, pressing RETURN will produce a catalog of the current directory.) VoicePaks are a convenient packaging of a configuration of several voices for use by RT.SYNTH. By convention, VoicePaks are named with a ".VP" suffix, so that they can be easily distinguished. Only one VoicePak is included on the disk: PACK1.VP, which should be entered until others have been created.
After the VoicePak is loaded, the user is asked if pre-recorded music is to be loaded. If not, answer "n" or just RETURN. If yes, answer "y", and you will then be asked for the music file name. After the music file dialog, the synthesizer main program is run, and, if no music was loaded, it plays a short intro tune, and is then ready to play.
When the synthesizer is running, it is played using the lower-case keys on the Apple keyboard, as indicated on the "front panel" screen shown below. The keyboard may be played in a "legato" style, meaning that new keys can be pressed while other keys are still held down. A note sounds as long as a key is depressed, or until the voice decays to silence.
With the Apple //e and //c, any number of keys can be held down and a new keypress will still register, but with the IIgs, if more than two keys are down, no further keys will be sensed. This is a limitation of the ADB keyboard implementation.
The bottom line of the screen reports RT.SYNTH status to the user. The current keyboard octave of the lower "manual" is displayed as a digit between 0 and 5. The upper keyboard plays one octave higher than the lower. The Record/Play status is shown as an inverse "R" for record mode and an inverse "P" for play mode. The number of pages remaining for recording music is displayed as a single digit. The ">" symbol is displayed if there are more than 9 pages remaining.
The synthesizer is controlled by pressing upper-case keys, the space bar, or the left and right arrows. The key functions are as follows (as summarized on the screen):
Key | Function
space | Start and stop recording
Up & Down arrows | Shift keyboard up & down 1 octave
P | Play recorded music
D | Delete recorded music
shift-<digit> | Select voice <digit>
S | Toggle between speaker and cassette output
X | Exit the synthesizer
When the synthesizer exits, the user is given a choice to save recorded music or not. If the answer is "y", a file name is asked for and the file is saved. If "n", the program returns to the main (startup) menu.
To install RT.SYNTH on a larger disk—which is both faster and more convenient for dealing with voices—simply move the program files and the VOICES directory to a larger disk. This version of RT.SYNTH assumes that HIMEM is $9600, so it will not run if additional utilities have been loaded into memory besides BASIC.SYSTEM.
EDIT.VOICEPAK
The other selection on the main menu, EDIT.VOICEPAK, is used to configure the voices available to the synthesizer. A small selection of sampled instrument voices is provided, plus the primitive squarewave, sawtooth, and sine wave for comparison.
The EDIT.VOICEPAK program displays the contents of the current voicepak and allows addition, deletion, and rearrangement of voices, as well as assigning instrument names to them. The controls for doing these actions are well documented on the program’s main screen, shown below.
When saving a voicepak, a new name may be given. The program requires that a voice be assigned as voice #1, and also requires that at least 3 pages of memory be left for recording.
Voices
Voice files are stored in the VOICES directory, and are named "V.n", where n is the MIDI "patch number" (0..127) corresponding (approximately) to the instrument. Currently included voices are:
Voice Number | Instrument name
6 | Acoustic Piano
12 | Vibraphone
25 | Acoustic Guitar
35 | Electric Bass
57 | Trumpet
72 | Clarinet
81 | square wave
82 | sawtooth wave
83 | sine wave
106 | Banjo
This table can be displayed on the screen by running the program VOICE.LIST on the disk.
Audio Output
RT.SYNTH audio output defaults to the built-in speaker. The output can be toggled between the speaker and the cassette output by pressing "S" within the synthesizer. The cassette output is at "microphone" level, and so must be amplified to be useful.
Needless to say, the quality of the sound is much better if it is reproduced by an external speaker or amplifier, or through headphones. Users of Apple //c and IIgs machines have the advantage of a built-in speaker/headphone jack, but Apple //e owners would do well to adapt their machines for external output.
Although the raw output is a pulse-width modulated 22kHz pulse train, the highest usable audio frequency present is about 5kHz. Therefore, if the output is fed into an amplifier, it is appropriate to adjust the tone control or equalizer to reduce high frequency response accordingly. This will make it easier on your amplifier even if you cannot hear the 22kHz tone. ;-)
Background: DAC522
DAC522 is a software digital-to-analog converter for the Apple II that plays a stream of 11.025kHz sound samples through the 1-bit Apple speaker port using a pulse-width modulated (PWM) stream at a pulse rate of 22.05kHz, or two pulses per sample. The 22kHz pulse rate renders the pulses themselves virtually inaudible to human ears, but the average output, changed by varying the pulse width in proportion to sample values, reproduces the sampled sound to a precision of 5 bits. Since the period of a 22kHz pulse is 46 Apple II clock cycles, and the Apple II can only create pulse edges to one cycle resolution, at most 32 distinct pulse widths, or 5 bits of precision, can be produced using equal pulses.
DAC522 is a set of pulse generators, each of which generates two pulses with phase and width controlled to one cycle accuracy, while fetching the next sample of a stream, testing for the end of the stream, computing the generator corresponding to the next sample, and vectoring to the selected generator. This process continues at a rate of 11kHz until the entire sample stream has been played, when DAC522 returns to its caller.
At a rate of 11kHz, 48KB of sound samples in an Apple II’s memory are played in a little over four seconds. It was evident that a music synthesizer capable of sustained sounds could not practically be based on DAC522 in its original form.
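The timing figures quoted for DAC522 can be checked with a little arithmetic. The sketch below is in Python rather than 6502 assembly, purely for readability. It assumes an average Apple II (NTSC) clock of about 1,020,484 Hz, a figure that is not stated in the text; the 46-cycle pulse period, two pulses per sample, and the nominal 11.025kHz rate are taken from the description above.

```python
# Rough check of the DAC522 timing figures quoted above.
# CPU_HZ is an assumption (average NTSC Apple II clock); it is not given in the text.
CPU_HZ = 1_020_484
CYCLES_PER_PULSE = 46      # from the text: one 22kHz pulse period
PULSES_PER_SAMPLE = 2      # from the text: two pulses per sample

cycles_per_sample = CYCLES_PER_PULSE * PULSES_PER_SAMPLE
pulse_rate_hz = CPU_HZ / CYCLES_PER_PULSE
sample_rate_hz = CPU_HZ / cycles_per_sample

# 5 bits of precision corresponds to 32 distinct pulse widths.
distinct_widths = 2 ** 5

# Time needed to play 48KB of samples at the nominal 11.025kHz rate.
seconds_for_48kb = 48 * 1024 / 11_025

print(f"{cycles_per_sample} cycles per sample")
print(f"{pulse_rate_hz:.0f} Hz pulse rate, {sample_rate_hz:.0f} Hz sample rate")
print(f"{distinct_widths} pulse widths, {seconds_for_48kb:.2f} s for 48KB")
```

Under the assumed clock, the true rates come out slightly above the nominal 22.05kHz and 11.025kHz.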
Direct Digital Synthesis
The fundamental problem a music synthesizer must address is the production of notes of many frequencies and arbitrary durations having specified waveshapes (voices). As has been noted, simply storing all the needed combinations in limited memory is not practical.
A workable solution is to store each waveshape needed as a single-frequency sample, then resample this waveshape on the fly to create any desired frequency.
Most instrument sounds change as a note sounds. For example, many sounds have an "attack" that sounds different from the rest of the note. And many instrument sounds change in amplitude as a note is held, usually decaying in amplitude or changing in "timbre" or spectral composition. Synthesis of notes with changes appropriate to particular instruments, therefore, requires that the synthesized waveform change as a function of the length of time the note is played.
As we shall see, RT.SYNTH performs all the calculations required to carry out these tasks while it is generating the pulses corresponding to the previously calculated sample.
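The resampling idea can be illustrated with a phase accumulator. This is a minimal Python sketch, not the RT.SYNTH code: it assumes a 256-entry single-cycle table and a 16-bit phase whose high byte is the table index and whose low byte is the fraction. The function name `resample` and the sine table are hypothetical, chosen only for illustration.

```python
import math

TABLE_SIZE = 256  # one stored cycle of the waveshape (assumed length)

def resample(table, increment_8_8, count):
    """Step through `table` using an 8.8 fixed-point phase increment."""
    phase = 0                      # high byte = table index, low byte = fraction
    out = []
    for _ in range(count):
        out.append(table[phase >> 8])
        phase = (phase + increment_8_8) & 0xFFFF   # wrap around the table
    return out

# A single-cycle sine scaled to 5-bit sample values (0..31), for illustration only.
sine = [round(15.5 + 15.5 * math.sin(2 * math.pi * i / TABLE_SIZE))
        for i in range(TABLE_SIZE)]

# An increment of 2.0 ($0200) reads every other entry, doubling the frequency.
octave_up = resample(sine, 0x0200, 128)
```

Fractional increments give intermediate pitches: the index simply advances by one on some samples and by two on others.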
RT.SYNTH Data Structures
The data structures used by RT.SYNTH are designed for high speed. They make extensive use of single-byte pointers to page-aligned data structures to speed operation. In most cases, the data structures are 256 bytes long, so there is little waste in aligning them to page boundaries, but there are a few cases in which page-alignment leads to sparse memory usage, most notably in the code for RT.SYNTH itself.
RT.SYNTH stores music as a sequence of "events" pointed to by the zero-page pointer ‘music’. A stream is created while recording, and played back as requested. RT.SYNTH does not store voice change commands in the music stream (since that would render music streams dependent on particular VoicePaks) but the playback synthesizer will obey embedded voice change events. Saved music files may be edited by external programs that maintain the stream format.
Each "event" in a music stream is 3 bytes long. The first byte of an event is the "op" byte. The sign bit of the op byte specifies whether it is a "note" or a "command". Positive bytes specify notes, from 0..127 using the standard MIDI note mapping (with some exceptions to be described later). Negative bytes are commands:
Op Byte | First param byte | Second param byte
0..$7F MIDI note | <duration lo - 1> | <duration hi + 1>
$80 Stop | <ignored> | <ignored>
$81 Rest | <duration lo - 1> | <duration hi + 1>
$82 Voice Change | <new voice (0-based)> | <ignored>
The MIDI note number indexes two pitch tables embedded in the RT.SYNTH code, ‘pitchhi’ containing the integer part and ‘pitchlo’ containing the fractional part of the phase increment. These tables reside in the "holes" following ‘gen5’ and ‘gen4’, respectively. Pitches higher than MIDI key number 112 are "silent" (frequency = 0), since their frequencies are too high to be properly reproduced at an 11kHz sample rate.
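One plausible way to derive such tables is sketched below in Python. This is an assumption, not the author's table generator: it uses standard equal-tempered tuning (A4 = MIDI note 69 = 440 Hz), a 256-sample waveform cycle, and a sample rate of about 11,092 Hz derived from an assumed 1,020,484 Hz clock divided by 92 cycles. The function name `phase_increment` is hypothetical.

```python
SAMPLE_RATE_HZ = 1_020_484 / 92   # assumption: 92-cycle sample period, ~11,092 Hz
TABLE_SIZE = 256                  # samples per stored waveform cycle
HIGHEST_NOTE = 112                # from the text: higher notes are silent

def phase_increment(note):
    """Return (integer part, fractional part) of the phase increment for a MIDI note."""
    if note > HIGHEST_NOTE:
        return 0, 0                                    # silent: frequency = 0
    freq_hz = 440.0 * 2 ** ((note - 69) / 12)          # equal-tempered tuning
    inc = round(freq_hz * TABLE_SIZE / SAMPLE_RATE_HZ * 256)   # 8.8 fixed point
    return inc >> 8, inc & 0xFF

pitchhi = [phase_increment(n)[0] for n in range(128)]
pitchlo = [phase_increment(n)[1] for n in range(128)]
```

Under these assumptions, note 112 is the last one whose increment stays below 128 table entries per sample, that is, below half a waveform cycle per sample, which is consistent with the cutoff given in the text.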
If the op byte specifies a note or a rest command, the next two bytes specify the duration of the note or rest in 92-cycle sample periods (corresponding to a sample frequency of 11.092kHz). The duration values are stored in a modified form: The low-byte is reduced by 1 and the high-byte is incremented by 1. For example, a duration of 266 samples, normally represented as ($0A,$01), would be represented in the music stream as ($09,$02). This representation is closer to that required by the synthesizer generators, and saves time in fetching notes.
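The modified duration format can be expressed as a pair of small helper functions. This Python sketch follows the rule exactly as stated (low byte minus 1, high byte plus 1). The function names are hypothetical, and the handling of a low byte of 0 is an assumption, since the text does not cover that case.

```python
def encode_duration(samples):
    """Store a 16-bit sample count as (lo - 1, hi + 1), each kept to one byte."""
    lo, hi = samples & 0xFF, (samples >> 8) & 0xFF
    # Assumption: when lo is 0 the text does not say how the "- 1" is handled;
    # this sketch simply wraps within the byte.
    return (lo - 1) & 0xFF, (hi + 1) & 0xFF

def decode_duration(stored_lo, stored_hi):
    """Invert encode_duration."""
    return (((stored_hi - 1) & 0xFF) << 8) | ((stored_lo + 1) & 0xFF)

stored = encode_duration(266)   # the example from the text: ($09, $02)
```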
If the op byte is a Voice Change command, then the next byte is a zero-relative index into the voice table that specifies the voice to be played next and the last byte is ignored. Although RT.SYNTH displays voices numbered from 1..8, the corresponding voice table index for a Voice Change command is 0..7.
If the op byte is a "stop playing" command, both parameter bytes are ignored.
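For external programs that read or edit saved music, the event format above suggests a simple parser. The following Python sketch is illustrative only: the function name `parse_events` and the tuple layout are hypothetical, durations are left in their stored (modified) form, and any file header or load address that a saved music file might carry is not handled.

```python
OP_STOP, OP_REST, OP_VOICE = 0x80, 0x81, 0x82

def parse_events(stream):
    """Yield (kind, value, stored_duration) for each 3-byte event up to a Stop."""
    for i in range(0, len(stream) - 2, 3):
        op, p1, p2 = stream[i], stream[i + 1], stream[i + 2]
        if op < 0x80:
            yield ("note", op, (p1, p2))     # op is the MIDI note number
        elif op == OP_STOP:
            yield ("stop", None, None)       # both parameter bytes are ignored
            return
        elif op == OP_REST:
            yield ("rest", None, (p1, p2))
        elif op == OP_VOICE:
            yield ("voice", p1, None)        # p1 is the 0-based voice table index
        else:
            raise ValueError(f"unknown op byte ${op:02X}")

# Middle C for 266 samples, switch to the voice displayed as #2, then stop.
example = bytes([60, 0x09, 0x02, 0x82, 0x01, 0x00, 0x80, 0x00, 0x00])
events = list(parse_events(example))
```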
The "voice table" is a variable-length table of the voices loaded into memory. It is indexed by voice change commands embedded in the music stream. An entry in the voice table is a single-byte page number pointing to the "envelope" table corresponding to a stored voice. The maximum size of the voice table is 128 entries, though in practice its length is limited by the smaller number of voices that will fit into memory. The voice table is embedded in the region of memory occupied by RT.SYNTH, in the "hole" following ‘gen6’. A voice is selected by storing its envelope page number in the zero-page pointer ‘env’.
A "voice" is a digital representation of the sound of an instrument. It is composed of an envelope page, which is a page-aligned table of page pointers to waveforms, the collection of which specifies the sound of the instrument as a function of time from the inception of a note. Since both the waveforms and the envelope table are page-aligned, single byte page numbers suffice as pointers. Voice waveforms frequently begin with a few pages of unique waveforms, followed by repetitions of a relatively stationary waveform of diminishing or sustaining amplitude. The envelope table thus often refers repeatedly to waveform pages, resulting in greatly reduced voice storage requirements. Since the sound generators are also on page boundaries, starting at page $08, waveforms are represented as generator page pointers in the range of $08..$27 based upon their 5-bit sample values.
A "VoicePak" is a file consisting of several voices concatenated, such that the highest page used is at address $8F00. The low (BLOAD) address varies depending on the total length of the VoicePak. The first page of a VoicePak is the directory page, containing 8 entries, each 16 bytes long:
Voice # (1 byte) | Page (1 byte) | Length (1 byte) | Instrument name (9 bytes) | Reserved (4 bytes)
This directory is used by the Applesoft program RT.SYNTH to initialize the instrument names on screen, and by RT.SYNTH.OBJ during its initialization to set up its voice table.
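The directory layout can be read with a few lines of code. The Python sketch below follows the 16-byte entry format given above. The class and function names are hypothetical, and the treatment of the name field (masking the high bit and trimming trailing blanks or zero bytes) is an assumption, since the text does not say how names are encoded or how unused entries are marked.

```python
from dataclasses import dataclass

ENTRY_SIZE = 16
ENTRY_COUNT = 8

@dataclass
class DirEntry:
    voice_number: int   # "Voice #" byte
    page: int           # "Page" byte
    length: int         # "Length" byte
    name: str           # 9-byte instrument name

def parse_directory(page):
    """Split the first 128 bytes of a VoicePak directory page into 8 entries."""
    entries = []
    for i in range(ENTRY_COUNT):
        raw = page[i * ENTRY_SIZE:(i + 1) * ENTRY_SIZE]
        # Assumption: names may be stored with the high bit set, as is common
        # for Apple II screen text, so the high bit is masked off here.
        name = bytes(b & 0x7F for b in raw[3:12]).decode("ascii").rstrip(" \x00")
        entries.append(DirEntry(raw[0], raw[1], raw[2], name))
    return entries
```

The four reserved bytes at the end of each entry are skipped.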
RT.SYNTH Structure
The framework of DAC522-like sound production requires that all the work required for computing the next sample be completed within the 92-cycle sample period, simultaneous with the production of the two pulses specified by the previous sample.
RT.SYNTH is composed of two complete synthesizers, each having 32 distinct pulse generators, one for each duty cycle. The first synthesizer is the playback synthesizer, and it is identical to the CrateSynth SYNTH. The second synthesizer is the real-time synthesizer which constantly polls the keyboard for changes in status.
The two synthesizers are interleaved by generator, with the generators for the playback synthesizer aligned on page boundaries and generators for the real-time synthesizer aligned on page boundaries + $40. The result is that only the 8-bit page number of a generator need be specified to vector to a given pulse generator. Initialization code and music stream processing code is embedded in the "holes", typically a little more than half a page, between the pulse generators. All the code for playback and performance uses only half of the 32 "holes" so a little more than a quarter of the 8KB total is empty space.
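The addressing scheme can be summarized in a short Python sketch. It assumes that the generator for sample value 0 sits at page $08 and that generators occupy consecutive pages, as stated in the data-structure description. The function names are hypothetical.

```python
FIRST_GENERATOR_PAGE = 0x08   # from the text: generators start at page $08
REALTIME_OFFSET = 0x40        # real-time generators sit at page boundary + $40

def generator_page(sample_value):
    """Page number stored in waveform tables for a 5-bit sample value."""
    if not 0 <= sample_value <= 31:
        raise ValueError("sample value must be 0..31")
    return FIRST_GENERATOR_PAGE + sample_value

def playback_entry(sample_value):
    """Entry address of the playback generator for this sample value."""
    return generator_page(sample_value) << 8

def realtime_entry(sample_value):
    """Entry address of the real-time generator for this sample value."""
    return (generator_page(sample_value) << 8) + REALTIME_OFFSET
```

Because the low byte of the entry address is fixed ($00 or $40), a generator only has to patch the high byte of its final jmp to select the next generator.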
Each pulse generator creates different precisely timed pulse widths, but all generators do basically the same work between pulse edges. The listing of playback generator 0 is shown below:
0800: 8D 30 C0  >6   gen0   sta  spkr       ; <==== start time: 0
0803: EA        >7          nop             ; Kill 2 cycles
0804: 8D 30 C0  >8          sta  spkr       ; <===== stop time: 6
0807: 85 EB     >9          sta  ztrash     ; Kill 3 cycles
0809: E6 ED     >10         inc  scount     ; Compute envelope
                >11         ciny
080B: F0 01     >11         beq  *+3        ; If =, branch to iny
080D: A5        >11         dfb  $A5        ; "lda $C8" to skip iny
080E: C8        >11         iny
                >11         eom
080F: 18        >12         clc
0810: A5 EC     >13         lda  frac       ; Compute next sample
0812: 65 FE     >14         adc  freq
0814: 85 EC     >15         sta  frac
0816: 8A        >16         txa
0817: 65 FF     >17         adc  freq+1
0819: AA        >18         tax
081A: B1 06     >19         lda  (env),y    ; Next sample page
081C: 8D 30 C0  >20         sta  spkr       ; <==== start time: 46
081F: EA        >21         nop             ; Kill 2 cycles
0820: 8D 30 C0  >22         sta  spkr       ; <===== stop time: 52
0823: 85 EB     >23         sta  ztrash     ; Kill 3 cycles
0825: 8D 2A 08  >24         sta  :ptr+2
0828: BD 00 00  >25  :ptr   ldaa 0*0,x      ; Fetch sample.
082B: 8D 3C 08  >26         sta  :sw0+2
082E: C6 FC     >27         dec  dur        ; Decrement duration
                >28         cdec dur+1
0830: F0 02     >28         beq  *+4        ; If eq, branch to dec
0832: EA        >28         nop             ; Else kill 2 cycles and
0833: AD        >28         dfb  $AD        ; "lda xxxx" to skip dec
0834: C6 FD     >28         dec  dur+1      ; of zero-page param.
                >28         eom
0836: A5 FD     >29         lda  dur+1
0838: F0 03     >30         beq  :quit      ; Finished.
083A: 4C 00 00  >31  :sw0   jmp  0*0        ; Switch to gen, T = 89
                >32
083D: 4C 7E 09  >33  :quit  jmp  quit
As shown, ‘gen0’ is of typical length, and only uses $40 bytes, or a quarter of a page. The generators of the real-time synthesizer occupy the next quarter page, and the remainder of each generator’s page is used for other RT.SYNTH code, or data tables, or is left unused. This sparse use of space is a conscious tradeoff to reduce the time required to vector dynamically to each generator.
The corresponding real-time synthesizer generator 0 is shown below:
                >37         align $40       ; == rgen0 ==
                >37         ds   *+$40-1/$40*$40-*
                >37         eom
                >38         err  **256/256!$40 ; (ensure $xx40)
                >39
0840: 8D 30 C0  >40  rgen0  sta  spkr       ; <==== start time: 0
0843: EA        >41         nop             ; Kill 2 cycles
0844: 8D 30 C0  >42         sta  spkr       ; <===== stop time: 6
0847: EA        >43         nop             ; Kill 2 cycles
0848: EA        >44         nop             ; Kill 2 cycles
0849: EA        >45         nop             ; Kill 2 cycles
084A: EA        >46         nop             ; Kill 2 cycles
084B: 08        >47         php             ; Kill 7 cycles
084C: 28        >48         plp
084D: E6 ED     >49         inc  scount     ; Compute envelope
                >50         ciny
084F: F0 01     >50         beq  *+3        ; If =, branch to iny
0851: A5        >50         dfb  $A5        ; "lda $C8" to skip iny
0852: C8        >50         iny
                >50         eom
0853: 18        >51         clc
0854: A5 EC     >52         lda  frac       ; Compute next sample
0856: 65 FE     >53         adc  freq
0858: 85 EC     >54         sta  frac
085A: 8D 30 C0  >55         sta  spkr       ; <==== start time: 46
085D: 8A        >56         txa
085E: 8D 30 C0  >57         sta  spkr       ; <===== stop time: 52
0861: 6D FF 00  >58         adca freq+1
0864: AA        >59         tax
0865: B1 06     >60         lda  (env),y    ; Next sample page
0867: 8D 6C 08  >61         sta  :ptr+2
086A: BD 00 00  >62  :ptr   ldaa 0*0,x      ; Fetch sample.
086D: 8D 7C 08  >63         sta  :sw0+2
0870: AD 10 C0  >64         lda  AKD
0873: C5 EF     >65         cmp  key        ; Key changed?
0875: F0 03     >66         beq  :sw0       ; -No, keep playing.
0877: 4C 77 10  >67         jmp  newkey     ; -Yes, handle it.
087A: 4C 40 08  >68  :sw0   jmp  rgen0      ; Switch to rgen, T = 89
Note that the primary difference in the code of the two types of generator is the end test. The playback synthesizer generator is timed by the countdown of ‘dur’, while the real-time synthesizer generator is timed by a change in state of the Apple keyboard.
The critically timed events are the sta spkr instructions. They toggle the state of the Apple’s speaker output and thereby generate the variable width high frequency pulses that perform the digital-to-analog conversion that is responsible for the synthesizer’s audio output. (Note that the cycle counts are all relative to the first cycle of the fetch of the corresponding instruction, not the execution cycle during which the toggle actually occurs. Since all toggling instructions are identical 4-cycle instructions in which the toggle occurs at the start of the 4th cycle, this relative method of counting produces correct results.)
These generators, gen0 and rgen0, generate the shortest duty cycle used by the synthesizer, corresponding to a sample value of 0. While these generators are "playing" a sample with a value of zero, they are computing the value of the next sample and doing all necessary bookkeeping, as detailed below:
Lines 10-11 and 49-50 count the number of samples produced so far, and advance the Y register by one every 256 sample times, or a little less than 1/40th of a second. The Y register, then, is an index into the current "envelope" page.
Lines 12-18 and 51-54, 56, and 58-59 compute the next sample from the current waveform page by adding the 16-bit phase increment to the phase accumulator, for which the location ‘frac’ is the fractional part and X contains the integral part.
Lines 19 and 24, and 60-61 set up the current waveform page from the envelope table.
Lines 25-26 and 62-63 retrieve the correct waveform sample, stored as the page number of the corresponding pulse generator, and use it to set up :sw0 to vector to that generator next.
Lines 27-28 decrement the 16-bit duration of the note (measured in samples) for the playback generator, while lines 64-67 poll the state of the Apple keyboard and detect any change for the real-time generator.
Lines 29-30 test the remaining duration for the playback generator and terminate the note if it has expired.
All generators perform all of these tasks, sometimes in a slightly different (but irrelevant) order. There is one exception to this: the "end test" in lines 29-30 and 64-67. This test is performed only in generators 0 through 3, so that a note may play on beyond its intended duration until its waveform amplitude is in the range of 0 to 3 (out of 31). Since all notes start with sample values near 0, this has the effect of minimizing switching noises when one note transitions to another. As we shall see, RT.SYNTH further capitalizes on this regularity by continuing to generate pseudo-samples with value 0 during note transitions and control operations, so there are no discontinuities in pulse generation (pops) as music is played or performed.
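The bookkeeping performed between pulse edges can be modeled at a higher level. The Python sketch below mirrors the envelope counting, phase accumulation, and table lookups described above, using the variable names from the listings. It ignores cycle timing, pulse generation, and the end test entirely. The `SynthState` class and `next_generator_page` function are hypothetical names for this illustration.

```python
class SynthState:
    """Registers and zero-page variables used by one generator pass."""
    def __init__(self, memory, env_page, freq_lo, freq_hi):
        self.memory = memory        # flat byte array standing in for Apple II RAM
        self.env_page = env_page    # page number of the current envelope table
        self.freq_lo, self.freq_hi = freq_lo, freq_hi   # 16-bit phase increment
        self.scount = 0             # sample counter within an envelope step
        self.y = 0                  # index into the envelope page
        self.frac = 0               # fractional part of the phase
        self.x = 0                  # integral part of the phase

def next_generator_page(s):
    """Do the per-sample bookkeeping and return the page of the next generator."""
    s.scount = (s.scount + 1) & 0xFF
    if s.scount == 0:                        # every 256 samples, step the envelope
        s.y = (s.y + 1) & 0xFF
    total = s.frac + s.freq_lo               # add the 16-bit phase increment
    s.frac = total & 0xFF
    s.x = (s.x + s.freq_hi + (total >> 8)) & 0xFF
    wave_page = s.memory[(s.env_page << 8) + s.y]    # envelope entry -> waveform page
    return s.memory[(wave_page << 8) + s.x]          # waveform entry -> generator page
```

In the real code the returned page number is stored into the operand of the final jmp, so the "return value" here corresponds to the generator that runs during the next sample period.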
The code for the 32 generators was actually created by an Applesoft program, which scheduled all the specified "work" instructions into the cycles between the speaker-toggling instructions, adding "padding" instructions as needed to produce generator-specific cycle-accurate timings for each of the toggling instructions. From a practical point of view, I found it simply too error-prone to repeatedly schedule all 32 generators manually as the synthesis strategy and code evolved. The BASIC program generates Merlin source code, and takes most of the pain out of making changes.
RT.SYNTH Initialization
RT.SYNTH is an Applesoft program that prompts the user for the VoicePak to be used and the music file to be loaded, if any. After these interactions and file operations, it formats the RT.SYNTH screen and BRUNs the RT.SYNTH.OBJ program, which overlays the Applesoft program and maintains control until the user exits the synthesizer.
Upon exiting, RT.SYNTH.OBJ does a RUN of the Applesoft program RT.SAVE, which asks if the user wishes to save any recorded music, and, if so, performs the file operation.
"Wrapping" the RT.SYNTH.OBJ program in two Applesoft programs makes it easier to initialize the "front panel" screen for the synthesizer, and eliminates any need for the M/L program to perform ProDOS file operations—a major simplification.
Voice Generation
SYNTH requires voices that are samples of "instruments", represented by 256-byte pages of sampled waveforms compressed by use of an envelope table.
My starting point for current tonal voices has been synthesized instruments played at approximately 43Hz, or the key of F in octave 1. I sample to a .wav file at 11.025kHz, which creates a waveform with a period of about 258 samples—not perfect, but close enough to resample to 256-sample waveforms easily.
The waveform is first "ramped" by subtracting the negative envelope of the waveform from it. This has the effect of making each cycle of the waveform start and end with the minimum sample value, allowing note transitions to be made without "pops". After ramping, the waveform is normalized in amplitude so that at its loudest point it covers the full range of 0..31. Then cycles are selected from the total waveform which differ significantly in waveform or amplitude, and the envelope table is constructed so that it will reproduce a reasonable facsimile of the instrument, in both waveform and envelope, as the note sounds.
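The ramping and normalization steps might be approximated as follows. This Python sketch is only an illustration of the idea, not the procedure actually used to prepare the voices: it assumes the waveform has already been cut into cycles and approximates the "negative envelope" by each cycle's minimum value. The function name `ramp_and_normalize` is hypothetical.

```python
def ramp_and_normalize(cycles):
    """cycles: list of lists of raw sample values, one list per waveform cycle."""
    # Assumption: the "negative envelope" is approximated by each cycle's minimum,
    # so every cycle is shifted to bottom out at 0.
    ramped = [[v - min(cycle) for v in cycle] for cycle in cycles]
    peak = max(max(cycle) for cycle in ramped)
    if peak == 0:
        return [[0] * len(cycle) for cycle in ramped]   # all-silent input
    # Scale so the loudest point of the whole note reaches 31.
    return [[round(v * 31 / peak) for v in cycle] for cycle in ramped]
```

Selecting which cycles to keep and building the envelope table from them remain judgment calls that this sketch does not attempt.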
A voice file consists of an initial "envelope" page, composed of single-byte relative page pointers to the succeeding "wave" pages. For example, the relative page pointer to the first wave page after the envelope page is $01. If an instrument's sound decays to silence, then the last "waveform" will be silence, and the envelope table will be filled out to the end with a relative pointer to this page. (Note that in the waveform pages, sample amplitudes of 0..31 are represented by integers in the range of 8..39, which are the actual page numbers of the DAC522 pulse generators for the samples.)
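The voice file layout can be sketched as a small builder. This Python example assumes that the envelope is given as 0-based indexes into a list of waveform cycles, and that unused envelope entries are filled with the last entry given. The function name `build_voice_file` is hypothetical, and any file header or load address is not modeled.

```python
PAGE = 256
FIRST_GENERATOR_PAGE = 8      # sample value 0 is stored as 8, 31 as 39

def build_voice_file(envelope, waves):
    """envelope: 1..256 indexes into `waves`; waves: lists of 256 values in 0..31."""
    if not envelope or len(envelope) > PAGE:
        raise ValueError("envelope needs 1..256 entries")
    # Relative page pointers: the first wave page after the envelope page is $01.
    env_page = [index + 1 for index in envelope]
    # Fill out the table with the last entry (for a decaying voice, the silent page).
    env_page += [env_page[-1]] * (PAGE - len(env_page))
    data = bytearray(env_page)
    for wave in waves:
        if len(wave) != PAGE:
            raise ValueError("each waveform must be exactly one page")
        data += bytes(FIRST_GENERATOR_PAGE + v for v in wave)
    return bytes(data)
```

A voice with two waveform pages therefore occupies three pages: the envelope page followed by the two wave pages.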