NadaNet: A Native Network for the Apple II
Michael J. Mahon - July 26, 2004
Revised (2.0) - July 18, 2005
Revised (3.0) - November 11, 2008
Revised (3.1) - May 5, 2010
Preface to Revised Version
This description of NadaNet is very "bottom-up" in its approach, and may not suit some readers who would only like to find out how to get NadaNet running. Follow this link for a more "top-down" description of NadaNet's Applesoft extensions, designed for those who are more concerned with the functional level than with the implementation level.
However, this description is quite useful for those who are curious about the design choices made and their rationale, so I have updated this original document to correspond with the current NadaNet implementation, NadaNet 3.1.
Introduction
In 1996, I began thinking about how one might network Apple II computers using only their built-in serial I/O: the pushbutton inputs and the annunciator outputs, and wondered what could be done with such a network. I have worked intermittently on this project over the intervening years, more frequently since I have retired. This document describes its current state.
The possibility of creating a useful network using only wire and software was esthetically appealing. Because it initially required no hardware other than the connecting wires, I dubbed it "NadaNet" ("nada" is Spanish for "nothing"). It was an interesting challenge to design an Ethernet-like network from the ground up. The exercise provided an experimental vehicle to illuminate the various issues and tradeoffs in creating and using such a network. It also became a tool for exploring various higher-level applications of networking, such as client-server and parallel computing.
To add more processors and save space, I decided that I would package several Apple //e main boards together, without keyboards or peripheral slot cards. I settled on a wooden cube about one foot on a side which I slotted to hold up to 8 main boards. For more information, see AppleCrate: An Apple II-Based Parallel Computer. More recently, I constructed a 17-processor AppleCrate II using a different packaging scheme and Enhanced Apple //e main boards.
NadaNet 3.1
NadaNet 3.1 is a relatively minor enhancement to version 3.0. It adds one new request protocol, &PEEKPOKE, which atomically reads a 2-byte value from a machine and replaces it with a new value. This provides a reliable method of acquiring a shared lock in a multiprocessor environment. NadaNet 3.1 also updates the &BOOT protocol to use only the version 3.x message format. As a result of this change, all AppleCrate machines must now use the new NadaNet 3.x passive boot ROM.
Sending and Receiving
For simplicity, NadaNet uses only a single wire (plus a ground wire), connected at each machine to both a pushbutton input and an annunciator output. The first question to answer was: "How fast can an Apple II signal over such a link using only software?"
My first experiments in 1996 used tight, precisely timed loops to send and receive the data bits. I found that I could signal at a rate of 14 cycles per bit, for a raw data rate of about 70kbps. Because of the differences in mechanisms for setting annunciators versus reading pushbuttons, the send routine was the speed-limiting factor. However, 70kbps seemed sufficient to enable other experiments, so I coded the first prototype SENDPKT and RCVPKT routines using this approach.
Encouraged by the success of the prototype, I realized that I could speed up the data rate significantly by unrolling the send and receive loops, resulting in a signaling rate of 9 cycles per bit, for a raw data rate of 110kbps. The version 1.0 SENDPKT required a somewhat more space-consuming tree-structured byte send routine. Because of the limited range of relative conditional branches, I also had to split the "send byte" tree into two halves, with an extra (unconditional) branch between them. This adaptation to the 6502 resulted in the fourth bit cell being 3 cycles longer, or 12 cycles, but the overall 50% speedup was substantially unchanged.
In late Spring of 2005, I was contacted by Stephen Thomas, who had found my web site and was intrigued by NadaNet. He clearly read it carefully, since he pointed out that the ROL instruction used in RCVPKT to shift in the network state was creating "bus fights" when it wrote back to the pushbutton input! He suggested new receive code which eliminated this write. We began an email conversation that culminated in his development of an improved SENDPKT and RCVPKT that take less space, require only 8 cycles per bit (an 11% raw speed boost), and tolerate longer networks—a big win! The change required modifying the low-level packet format, including inverting the sense of data "on the wire" and regularizing the timing of the checkbyte. Since this new format was incompatible with previous versions, it became NadaNet 2.0.
The sustainable data transfer rate of NadaNet since version 2.0, including protocol overhead, is over 10,600 bytes per second. In typical applications, much less than 100% network utilization should be expected, since delays in a contention-based network grow rapidly as utilization approaches 100%. My testing confirms that delays remain low with utilizations of up to 80%, so much of the theoretical capacity is available for practical use.
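The raw signaling rates quoted above follow from the bit-cell length and the CPU clock. The sketch below reproduces that arithmetic. It assumes a nominal Apple II clock of about 1.0205 MHz (the text does not state the clock rate), and the function name is illustrative only.

```python
# Assumed nominal Apple II CPU clock in Hz (not stated in the text).
CLOCK_HZ = 1_020_500


def raw_kbps(cycles_per_bit, clock_hz=CLOCK_HZ):
    """Raw signaling rate in kilobits per second for a given bit-cell length."""
    return clock_hz / cycles_per_bit / 1000.0


# Bit-cell lengths taken from the text: 14 (prototype), 9 (1.0), 8 (2.0 and later).
for label, cycles in [("prototype", 14), ("version 1.0", 9), ("version 2.0+", 8)]:
    print(f"{label:>12}: {cycles:2d} cycles/bit -> {raw_kbps(cycles):6.1f} kbps")
```

These are raw bit rates only. They ignore the start sequence, interbyte separators, and check byte, which is why the sustained byte rate quoted above is lower than the raw rate divided by eight.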
The most recent 3.x versions resulted from a change in low-level request packet format to close a reliability "loophole" that arose when two senders' request packets collided and happened to be synchronized within a few cycles. See "Collision Recovery" below for more details.
Synchronization
Because the sending and receiving machines are operating with different clocks, the receiving machine must "lock" to the sending machine's transmission rate. In addition to the natural relative drift of asynchronous machines, another timing issue mandates even greater timing adaptability.
I chose to send "packets" up to 256 bytes long (plus one check byte). I also wanted to allow packets to be aligned differently with respect to page boundaries on the sending and receiving machines. Since an indexed Load on a 6502 requires an extra cycle when a page boundary is crossed, this means that the sending loop can run relatively slower by 1 cycle per byte during parts of a packet.
To accommodate this variation and the natural variation arising from asynchrony, a digital phase-lock loop is implemented in the receive byte loop by making it run two cycles per byte faster than the sending loop. RCVPKT then samples a "servo" transition sent at a fixed timing location between bytes to determine whether or not to introduce an additional three-cycle delay in the loop. Thus, if the receive loop gets ahead of the sending loop (which it will, on average, 2 out of 3 iterations), it is delayed by three cycles. This additional delay moves its data sampling point back to an optimal point in the bit cell. The result is that the receiving machine always samples the sending machine's signal within cycles 5-7 of the 8-cycle bit cell, in spite of machine timing variations and page crossings.
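The behavior of this phase-lock loop can be illustrated with a small numerical model. This is a simplified sketch, not the RCVPKT code: it tracks only how many cycles the receive loop leads the sender, and assumes the servo sample acts as a simple "am I early?" test. The function and variable names are hypothetical.

```python
def simulate_servo(gains_per_byte, delay_cycles=3):
    """Track how far the receive loop leads the sender, byte by byte.

    gains_per_byte: cycles the receiver gains on the sender during each byte
    (2 normally, 3 when the sender pays an extra page-crossing cycle).
    Returns (lead after each byte, number of delays inserted).
    """
    lead = 0
    leads = []
    delays = 0
    for gain in gains_per_byte:
        lead += gain
        if lead > 0:               # servo sample says the receiver is early
            lead -= delay_cycles   # insert the extra three-cycle delay
            delays += 1
        leads.append(lead)
    return leads, delays


# No page crossings on the sender: the receiver gains 2 cycles per byte.
leads, delays = simulate_servo([2] * 300)
print(delays, min(leads), max(leads))
```

In this model the delay is inserted on two of every three bytes, and the lead never leaves a three-cycle window, which corresponds to the three-cycle sampling window described above. Feeding in a mix of 2-cycle and 3-cycle gains (page crossings) keeps the lead within the same window.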
Including the extra servo transition and overhead, the total time per byte sent (excluding the packet start sequence and the end-of-packet check byte) is 94 or 95 cycles per byte (depending on page crossing on the sending machine).
The "control" packets used to perform all protocol functions are 8-byte packets. With packet overhead and check byte included, the time required "on the wire" to send an 8-byte packet is 887 cycles, or about 0.87 milliseconds.
Packet Format
**********************************************************
*                                                        *
*                LOW-LEVEL PACKET FORMAT                 *
*                Revised ST Jun 27, 2005                 *
*                                                        *
* Start of packet:                                       *
*                                                        *
* --//---+---//---+         +----+    +----+----+-//->   *
* Locked |  ONE   |  ZERO   |ONE |ZERO|ONE |Bit7|        *
* or Idle|  31cy  |  16cy   |8cy |8cy |8cy |8cy |        *
* --//---+        +---------+    +----+    +----+-//->   *
*        |        |         |         |    |             *
*        |      Start    Coarse     Servo  |<- 8 -//->   *
*        |      sync      sync             |   data      *
*        |                                 |   bits      *
*        |                                 |  (64cy)     *
*        |<---- Start sequence (71cy) ---->|             *
*                                                        *
* (Note: data bits are transmitted inverted - 0-bit      *
*        in memory is ONE on wire and vice versa)        *
*                                                        *
* Interbyte separator:                                   *
*                                                        *
* >-//-+----+----+               +----+----+----+-//->   *
*      |Bit1|Bit0|     ZERO      |ONE |Bit7|Bit6|        *
*      |8cy |8cy |    22-23cy    |8cy |8cy |8cy |        *
* >-//-+----+----+---------------+    +----+----+-//->   *
*                |               |    |                  *
* >-//- 8 data ->|             Servo  |<- 8 data -//->   *
*         bits   |                    |     bits         *
*                |<--- Interbyte ---->|                  *
*                      separator                         *
*                      (30-31cy)                         *
*                                                        *
* Packet end:                                            *
*                                                        *
* >-//-+----+----+                                       *
*      |Bit1|Bit0|  ZERO (Idle)                          *
*      |8cy |8cy |                                       *
* >-//-+----+----+--------------------------------//->   *
*                |                                       *
* >-//- End of ->|                                       *
*       checkbyte                                        *
*                                                        *
**********************************************************
Each packet begins with a "ONE" start pulse of at least 31 cycles duration. The trailing edge of this start pulse serves to start the RCVPKT synchronization sequence.
The next packet prolog event is another transition to ONE, which provides "coarse" synchronization, bringing the read loop within 6 cycles of synchronization with the SENDPKT routine. The next significant transition is the servo transition for the first byte, which is used to bring RCVPKT into "fine" synchronization, within 3 cycles of SENDPKT. This establishes the RCVPKT data sampling time between cycles 5-7 of the 8-cycle bit cell, which is optimal sync. This condition will continue to be maintained by the RCVPKT digital phase-lock loop described above.
The data bits are sent, high bit first, at a rate of 8 cycles per bit. Note that the sense of data is inverted on the wire, with a data 0 sent as ONE and a data 1 sent as ZERO.
After the low bit is sent, while preparing for the next byte, a ZERO is sent for 22 or 23 cycles (depending on page crossing). The next transition to ONE state is the "servo transition" which RCVPKT uses to maintain synchronization with SENDPKT. Note that transitions to ONE are driven by the emitter followers in the NadaNet adapters, so they are sharper transitions than transitions to ZERO created by resistive pulldown of the network bus. All NadaNet critical timing is referred to transitions to ONE because they are more accurately timed.
The ONE state is held for 8 cycles prior to beginning transmission of the next data byte.
After all data bytes have been sent, the checkbyte is sent with the same timing as all other bytes.
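The timing figures in the packet format can be combined to estimate how long any packet occupies the wire. The sketch below uses the cycle counts from the diagram (71-cycle start sequence, 64 cycles of data bits per byte, 30 or 31 cycles per interbyte separator). The clock rate of about 1.0205 MHz is an assumption, and the function names are illustrative.

```python
CLOCK_HZ = 1_020_500        # assumed nominal Apple II clock, not from the text
START_SEQUENCE_CYCLES = 71  # start pulse, coarse sync, and first servo
DATA_BITS_CYCLES = 64       # 8 bits at 8 cycles each


def packet_cycles(data_bytes, separator_cycles=30):
    """Cycles on the wire for a packet of data_bytes plus its check byte.

    separator_cycles is 30 normally, or 31 when the sender crosses a page
    boundary on every byte (the worst case).
    """
    total_bytes = data_bytes + 1       # data plus check byte
    separators = total_bytes - 1       # one separator between adjacent bytes
    return (START_SEQUENCE_CYCLES
            + total_bytes * DATA_BITS_CYCLES
            + separators * separator_cycles)


def cycles_to_ms(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to milliseconds at the assumed clock rate."""
    return cycles * 1000.0 / clock_hz


control = packet_cycles(8)
print(control, round(cycles_to_ms(control), 2))
print(packet_cycles(256, separator_cycles=31))
```

For an 8-byte control packet this model gives the 887 cycles quoted earlier, which suggests the accounting (nine bytes, eight separators, one start sequence) matches the text.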
Command Protocols
In order to perform work across the network, "command protocols" are defined to establish an orderly means for all machines to share the network and to communicate effectively.
Arbitration: Collision Avoidance
The first requirement for an arbitration scheme is that it avoids most "collisions", or attempts by more than one machine to send simultaneously. NadaNet's primary means of avoiding collisions is an arbitration process which, when the network is busy, results in machines sending packets while other machines continue to wait their turn. A lightly loaded network is a different case, and will be considered separately.
Arbitration consists of continuously sampling the network for activity, and, when inactivity is sensed, waiting for a defined period of idle time (dependent on machine ID) before attempting to send. The minimum arbitration time is one millisecond. The minimum arbitration time requirement is chosen to be greater than the time between packets that comprise most protocols, so that each protocol is composed of an atomic series of packets, uninterrupted by any other traffic. A few service protocols can require more than a millisecond to reply to a request, so they "lock" the net until they can respond by holding the net in a ONE state, effectively extending the start pulse for the next packet. (See Network Locking, below.)
Since machines contending to send will all see the same network activity and inactivity, if they all used the same arbitration time, all contending machines would attempt to seize the network simultaneously, resulting in frequent, repeating collisions.
Therefore, the arbitration interval for each machine is set to a base "minimum arbitration" time plus a time dependent on the machine's unique ID number. This results in machines getting access to the network with a fixed priority, with lower machine IDs receiving higher priority.
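The fixed-priority effect of ID-dependent arbitration delays can be sketched as follows. The linear form of the delay and the per-ID step are assumptions made for illustration; the text says only that the delay is a base time plus a time dependent on the machine ID.

```python
# About one millisecond at an assumed clock of roughly 1.02 MHz.
MIN_ARBITRATION_CYCLES = 1020
# Hypothetical per-ID increment; the actual value is not given in the text.
ID_STEP_CYCLES = 25


def arbitration_delay(machine_id):
    """Idle time a machine must observe before it may seize the network."""
    return MIN_ARBITRATION_CYCLES + machine_id * ID_STEP_CYCLES


def arbitration_winner(contending_ids):
    """Among machines that all saw the net go idle together, the shortest wait wins."""
    return min(contending_ids, key=arbitration_delay)


print(arbitration_winner([7, 3, 12]))
```

Because every contender starts timing from the same idle transition, the machine with the lowest ID times out first and the others see the network go busy before their own delays expire.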
Collision Recovery
If, on the other hand, the network is lightly loaded, it is less probable that one or more senders will be waiting on a current sender to synchronize their arbitration timing, so collisions can occur randomly. These collisions occur because there is a finite time between an arbitrator sensing the idle state of the network, asserting a lock to seize it, and that assertion being sensed by the other arbitrator. During this 20-cycle window, it is possible for another arbitrator to conclude that it also has "won", and a packet collision is the result.
Packet collisions can only occur immediately after an arbitration, on the initial packet of a request. Since a collision ANDs the randomly aligned data of the colliding packets, a checkbyte error will usually result, causing the collision packet to be ignored (and the senders to retry because of the nonexistent ACK packet). In some cases the checkbyte may accidentally appear correct, but, since colliding packets always have different senders, and since collision can only turn "1" bits into "0" bits, a collision will cause the FRM and FRMC bytes (see below) to not be complements of each other, a condition which SERVER will detect and reject, again resulting in an ACK timeout at the senders and retries. Since these retries will begin with another arbitration, and both machines have different arbitration delays, the collision will not recur. Thus collisions have only a slight impact on network performance, and do not result in incorrect operation.
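The FRM/FRMC argument can be checked exhaustively for the worst case, in which the two colliding packets are aligned bit for bit. The sketch below models the collision as a bitwise AND of the two senders' bytes, as described above. The function names are illustrative and are not NadaNet routines.

```python
def collide(byte_a, byte_b):
    """A collision can only turn 1 bits into 0 bits, so model it as AND."""
    return byte_a & byte_b


def frm_check_passes(frm, frmc):
    """The receiver accepts the header only if FRMC is the complement of FRM."""
    return (frm ^ frmc) == 0xFF


def collided_header_is_rejected(frm_a, frm_b):
    """Collide two bit-aligned headers from senders frm_a and frm_b."""
    frm = collide(frm_a, frm_b)
    frmc = collide(frm_a ^ 0xFF, frm_b ^ 0xFF)
    return not frm_check_passes(frm, frmc)


# Every pair of distinct sender IDs fails the complement check after colliding.
print(all(collided_header_is_rejected(a, b)
          for a in range(256) for b in range(256) if a != b))
```

The reason is that any bit position where the two sender IDs differ ends up as 0 in both the collided FRM and the collided FRMC, so the two bytes cannot be complements.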
The probability of a collision is directly proportional to the length of the "sense-to-seize-to-sense" window and inversely proportional to the mean time between arbitrations unsynchronized by a preceding packet.
In the current implementation, the window is 20 cycles. Let us assume that "lightly loaded" means less than 50% network load, and a minimal request protocol duration is 3000 cycles. In this case, the mean time between arbitrations will be at least 6000 cycles, and the probability of a random collision is at most 20/6000, or about 0.3%. At this rate, a random collision can be expected to occur no more than once every 2 seconds. In practice, because of the tendency of even moderate traffic to "convoy", or form connected chains of network activity that favor collision avoidance, observed random collision rates are considerably lower than this upper bound.
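The estimate above can be written out as a short calculation. The window, protocol duration, and load figures come from the text. The clock rate of about 1.0205 MHz is an assumption, and all variable names are illustrative.

```python
CLOCK_HZ = 1_020_500          # assumed nominal Apple II clock, not from the text
WINDOW_CYCLES = 20            # sense-to-seize-to-sense window
MIN_PROTOCOL_CYCLES = 3000    # minimal request protocol duration
MAX_LIGHT_LOAD = 0.5          # "lightly loaded" upper bound on utilization

# At 50% load, a 3000-cycle protocol implies at least 6000 cycles per arbitration.
mean_cycles_between_arbitrations = MIN_PROTOCOL_CYCLES / MAX_LIGHT_LOAD

# Upper bound on the chance that a given arbitration collides.
collision_probability = WINDOW_CYCLES / mean_cycles_between_arbitrations

# Upper bound on the collision rate, and the corresponding mean interval.
arbitrations_per_second = CLOCK_HZ / mean_cycles_between_arbitrations
seconds_between_collisions = 1.0 / (arbitrations_per_second * collision_probability)

print(f"{collision_probability:.2%}", round(seconds_between_collisions, 2))
```

Under these assumptions the mean interval between random collisions comes out on the order of two seconds, consistent with the bound described above.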
Of course, as network usage rises, the "busy network" case dominates, and random collisions decrease in frequency. In practice, while collisions must be considered in the design of reliable protocols, they are not a significant operational issue for NadaNet.
+------+------+------+------+------+------+------+------+
| RQMD | FRMC | DST  | FRM  |   ADDRESS   |   LENGTH    |
+------+------+------+------+------+------+------+------+
A control packet consists of 8 bytes of data. The first byte (RQMD) specifies the type of request to which a protocol pertains and the particular type of packet within a request protocol (for example, "REQ" starts a protocol and "ACK" indicates a positive response to the request). The request type is encoded in the high-order 5 bits of the RQMD byte, and the modifier is encoded in the low 3 bits. All control packets within a request protocol have the same request type bits, but the modifier bits change throughout the protocol. Currently 12 of the 31 non-zero request encodings are used and 4 of the 7 non-zero modifier encodings are used, so there is plenty of room for expansion.
The second byte (FRMC) is the complement of the FRM byte, and is used to positively detect packet collisions, even in the case when they are synchronized within one bit time.
The third and fourth bytes specify the unique ID number of the destination (DST) and sending machine (FRM), respectively.
The last four bytes contain command-specific parameter or response data, often an address and length.
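The control packet layout can be sketched as a pack/parse pair. This is an illustration under stated assumptions, not the NadaNet code: the numeric request type and modifier values are hypothetical (the text does not list them), and the 2-byte ADDRESS and LENGTH fields are assumed to be stored low byte first, as is usual on the 6502. The check byte is added at the packet level and is not modeled here.

```python
import struct

# Hypothetical code values; the actual encodings are not given in the text.
PEEK_TYPE = 1
MOD_REQ = 1


def pack_control(req_type, modifier, dst, frm, address, length):
    """Build the 8 data bytes of a control packet."""
    if not 0 <= req_type <= 31 or not 0 <= modifier <= 7:
        raise ValueError("request type is 5 bits, modifier is 3 bits")
    rqmd = (req_type << 3) | modifier   # type in high 5 bits, modifier in low 3
    frmc = frm ^ 0xFF                   # complement of the sender ID
    # "<" selects little-endian with no padding (byte order is an assumption).
    return struct.pack("<BBBBHH", rqmd, frmc, dst, frm, address, length)


def parse_control(packet):
    """Split a control packet into fields, rejecting FRM/FRMC mismatches."""
    rqmd, frmc, dst, frm, address, length = struct.unpack("<BBBBHH", packet)
    if (frm ^ frmc) != 0xFF:
        raise ValueError("FRM and FRMC are not complements")
    return {"type": rqmd >> 3, "modifier": rqmd & 0x07,
            "dst": dst, "frm": frm, "address": address, "length": length}


packet = pack_control(PEEK_TYPE, MOD_REQ, dst=3, frm=1, address=0x0300, length=16)
print(packet.hex(), parse_control(packet))
```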
Protocols
Each protocol begins with a successful arbitration for control of the net.
After winning the arbitration, a protocol is initiated by a control packet with a "request" modifier, including up to 4 bytes of parameters for the requested command.
For all non-broadcast protocols, a good request packet leads to an ACK or NAK response from the target machine, containing up to 4 bytes of response data. (An erroneous request packet results in no response, and a subsequent retry.)
For protocols requiring transfer of more than 4 bytes of data, a series of data packets with a length of 256 bytes may follow, with the final packet being 1 to 256 bytes in length.
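The packetization rule above (full 256-byte packets followed by a final packet of 1 to 256 bytes) can be expressed in a few lines. The function name is illustrative.

```python
def data_packet_lengths(total_bytes, max_packet=256):
    """Lengths of the data packets needed to carry total_bytes of data."""
    if total_bytes <= 0:
        raise ValueError("a transfer must contain at least one byte")
    full_packets, remainder = divmod(total_bytes, max_packet)
    lengths = [max_packet] * full_packets
    if remainder:
        lengths.append(remainder)   # final short packet, 1 to 255 bytes
    return lengths


print(data_packet_lengths(600))
print(data_packet_lengths(512))
```

When the total is an exact multiple of 256, the final packet is itself a full 256-byte packet, so no empty packet is ever sent.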
If the request response or transfer of data requires acknowledgement (for example, because the request has a side effect which would prevent recovery by simply retrying the request), the protocol will conclude with a "Data ACK" control packet. This packet authorizes the receiver to change state in accord with the request.
Because broadcast requests are not acknowledged, they cannot detect errors in their reception. Therefore, if broadcast requests are to be reliably sent, it is necessary to rule out collisions. This is achieved by arbitrating for the network, locking it, and delaying for about 20 milliseconds in the locked state. Any in-process collisions with other senders will resolve to their subsequent arbitration (which will stall on the locked network). After the 20ms. delay, the broadcast request completes. This long "lead-in" to broadcast requests also allows for receivers using a BASIC polling loop that may spend several milliseconds each iteration before re-calling SERVER.
In summary, all protocols begin with arbitration, consist of a request control packet and one or more control/data packets, and conclude with a network idle state of at least a minimum arbitration period, since any following protocol begins with an arbitration.
Control/Data Distinction
Note that there is no "out-of-band" signal that distinguishes a data packet from a control packet. An 8-byte data packet could be mistaken for a control packet if it were not for the fact that data packets occur only within the context of protocols, and are delayed less than a minimum arbitration time from a preceding packet in the protocol.
If the network has been idle for at least a minimum arbitration period, then the next packet will be the initial "request" packet of a protocol.
The SERVER Loop
The normal state of a "slave" machine participating on NadaNet is to be endlessly re-calling SERVER whenever it is not doing some other task. SERVER exits to the caller whenever a request is processed, a key is pressed, or the iteration count expires so that the caller can perform other work. A 1-byte counter controls the internal iteration of the SERVER loop. Normally, SERVER is entered (and exits) with this counter equal to zero, so that SERVER iterates up to 256 times (about 5 seconds) before returning to its caller. If desired, it can be preset to a different value to cause fewer iterations before returning. If the network is idle, each iteration will take approximately 20ms.
The SERVER loop polls for the initial messages of request protocols. To synchronize with the start of a protocol, it initially waits for the network state to remain either idle (for non-broadcast requests) or locked (for broadcast requests) for three-quarters of the minimum arbitration period. This time is sufficient to ensure that the next packet seen will be the start of a protocol, but not so long that a request coming a minimum arbitration delay after the end of the previous message will be missed. SERVER performs this resynchronization whenever it either detects an error or receives a control message not directed to the serving machine.
When it receives a service request directed to the machine on which it is running, it services it by jumping to the associated request service routine. The service routine performs any additional protocol steps required by the service, and then returns to the code that called SERVER.
Network Locking
For some commands, the time to respond to a request may exceed the minimum arbitration interval, which would break the atomicity rule for a protocol. In these cases, the machine making the delayed response may lock the network by asserting a ONE on the net for a period which should not generally exceed 35 milliseconds (set by the slowdown interval of a Zip Chip, which the network code accommodates). If there are no accelerators involved, the lock period is not limited, though long locks are undesirable.
The effect of asserting ONE is to extend the "start" pulse of the following packet. The locked state is terminated by the sending of the following packet.
Request Protocols
PEEK (dest, address, length, locaddr)
The PEEK request is used by the requesting machine to request the 'dest' machine to send `length' bytes of its memory, starting at `address'. The requesting machine receives the data at its `locaddr'.
The requesting machine begins by sending the PEEK request and the address and length parameters. The serving machine responds by sending an ACK packet. If the length specified is 4 bytes or less, then the data will be returned to the requester in the ACK packet and the protocol ends.
If the data length is more than 4 bytes, then additional data packets are sent until the request is satisfied.
The protocol terminates without a Data ACK, since the requestor can simply retry the request if an error occurs.
POKE (dest, address, length, locaddr)
The POKE request is used by the requesting machine to request the 'dest' machine to store `length' bytes in its memory, starting at `address'. The data is sent from the requesting machine's `locaddr'.
The requesting machine begins by sending the POKE request and the address and length parameters. The serving machine responds by sending an ACK packet.
The requester then sends data packets until the request is satisfied.
The protocol terminates with a Data ACK from the serving machine, to confirm to the requestor that the data was transferred without error.
CALL (dest, address, A, X)
The CALL request allows the requesting machine to request the 'dest' machine to call code at `address' in its memory, passing the supplied parameters in the A and X registers.
The requesting machine sends the CALL request and the address and register parameters. The serving machine responds by sending an ACK packet, then calls the requested address.
PUTMSG (dest, class, length, locaddr)
The PUTMSG request allows the requesting machine to request a message server machine to store a message of type `class' and `length' bytes in its message store for later retrieval by the same or a different machine. The message length must be between 1 and 255 bytes. It is located at the requesting machine's `locaddr'.
The requesting machine begins by sending the PUTMSG request and the class and length parameters. The serving machine responds by sending an ACK packet if it can comply, or a NAK if it has insufficient space. (Because this determination can take longer than a minimum arbitration time, PUTMSG locks the network until it can send ACK or NAK.)
The requester then sends a data packet of `length' bytes.
The protocol terminates with a Data ACK, to confirm to the requester that the data was transferred without error.
GETMSG (dest, class, length?,locaddr)
The GETMSG request allows the requesting machine to request a message server machine to retrieve the oldest message of type `class' in its message store. The message, if any, is stored at the requesting machine's `locaddr' and its length is returned in `length?'.
The requesting machine begins by sending the GETMSG request and the class parameter. If the server has no stored messages of the requested class, then it returns a NAK packet and the protocol ends. (Because this determination can take longer than a minimum arbitration time, GETMSG locks the network until it can send ACK or NAK.)
If the server has a message of the requested class, it responds by sending an ACK packet containing the length of the message, followed by a data packet containing the message itself.
The protocol terminates with the requester sending a Data ACK to confirm to the server that the data was transferred without error. Upon receiving the Data ACK, the post office server deletes the message from its queue.
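The server-side behavior of PUTMSG and GETMSG can be modeled as a set of per-class queues in which a message is removed only after the Data ACK arrives. The sketch below is a toy model of those semantics, not the server's actual storage scheme. The class name, method names, and the simple byte-count capacity are all hypothetical.

```python
from collections import defaultdict, deque


class MessageStore:
    """Toy model of a message server's store, one FIFO queue per class."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes   # hypothetical space accounting
        self.used = 0
        self.queues = defaultdict(deque)

    def put(self, msg_class, data):
        """PUTMSG: return True (ACK) if stored, False (NAK) if out of space."""
        if not 1 <= len(data) <= 255:
            raise ValueError("message length must be 1 to 255 bytes")
        if self.used + len(data) > self.capacity:
            return False
        self.queues[msg_class].append(bytes(data))
        self.used += len(data)
        return True

    def oldest(self, msg_class):
        """GETMSG: return the oldest message of the class, or None (NAK)."""
        queue = self.queues.get(msg_class)
        return queue[0] if queue else None

    def confirm_delivery(self, msg_class):
        """Data ACK received: only now is the message deleted."""
        data = self.queues[msg_class].popleft()
        self.used -= len(data)


store = MessageStore(capacity_bytes=1024)
store.put(7, b"first")
store.put(7, b"second")
print(store.oldest(7))       # a failed transfer can be retried safely
store.confirm_delivery(7)
print(store.oldest(7))
```

Deferring deletion until the Data ACK means a transfer that fails partway leaves the message in place, so the requester's retry receives the same message again.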
PEEKINC (dest, address, increment, value?)
The PEEKINC request requests the 'dest' machine to send 2 bytes of its memory, at `address', and then increment that memory field by `increment'. The original value prior to the increment is returned in `value?'.
The requesting machine sends the PEEKINC request containing the `address' and `increment' parameters. The serving machine responds by sending an ACK packet containing the original, unincremented value.
PEEKINC is a "network atomic" protocol that is useful for efficiently allocating work or other shared resource in a multiprocessor environment, or for performing a rendezvous after completing a unit of work.
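As an illustration of work allocation with PEEKINC, the sketch below models the serving machine's memory and the read-then-increment step. It assumes the 2-byte field is stored low byte first and wraps at 65536. Neither detail is stated in the text, and the class, method names, and counter address are hypothetical.

```python
class ServerMemory:
    """Toy model of the memory of the machine serving PEEKINC requests."""

    def __init__(self, size=65536):
        self.ram = bytearray(size)

    def read16(self, address):
        # Low byte first is an assumption (usual 6502 convention).
        return self.ram[address] | (self.ram[address + 1] << 8)

    def write16(self, address, value):
        self.ram[address] = value & 0xFF
        self.ram[address + 1] = (value >> 8) & 0xFF

    def peekinc(self, address, increment):
        """Return the old 2-byte value, then add increment (assumed to wrap)."""
        old = self.read16(address)
        self.write16(address, (old + increment) & 0xFFFF)
        return old


NEXT_ITEM = 0x0300   # hypothetical address of a shared work counter
server = ServerMemory()

# Three workers each claim the next unit of work. Because each request is
# serviced as a whole before the next one, no index is handed out twice.
claims = [server.peekinc(NEXT_ITEM, 1) for _worker in range(3)]
print(claims)
```

The same call with a shared "done" counter gives a rendezvous: each worker increments the counter when it finishes, and the coordinator waits until the counter reaches the number of workers.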
PEEKPOKE (dest, address, value, value?)
The PEEKPOKE request requests the 'dest' machine to send 2 bytes of its memory, at `address', and then sets that memory field to `value'. The original unchanged value is returned in `value?'.
The requesting machine sends the PEEKPOKE request containing the `address' and `value' parameters. The serving machine responds by sending an ACK packet containing the original, unmodified value.
PEEKPOKE is a "network atomic" protocol that allows machines to acquire locks reliably in a multiprocessor environment.
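One way to build a lock on PEEKPOKE is a test-and-set: swap in a "held" marker and inspect the value that came back. The sketch below models this against a byte array standing in for the serving machine's memory. The 0 = free, 1 = held convention, the low-byte-first storage, the lock address, and the function names are all assumptions made for illustration.

```python
FREE = 0     # assumed convention: 0 means the lock is free
HELD = 1     # assumed convention: any non-zero value means it is held


def peekpoke(ram, address, value):
    """Return the old 2-byte value at address and replace it with value."""
    old = ram[address] | (ram[address + 1] << 8)   # low byte first (assumed)
    ram[address] = value & 0xFF
    ram[address + 1] = (value >> 8) & 0xFF
    return old


def try_acquire(ram, lock_address):
    """Swap in HELD; the lock is ours only if it was FREE beforehand."""
    return peekpoke(ram, lock_address, HELD) == FREE


def release(ram, lock_address):
    """Mark the lock free again."""
    peekpoke(ram, lock_address, FREE)


LOCK = 0x0310                 # hypothetical lock location on the server
ram = bytearray(65536)
print(try_acquire(ram, LOCK))   # first requester gets the lock
print(try_acquire(ram, LOCK))   # second requester sees it already held
release(ram, LOCK)
print(try_acquire(ram, LOCK))
```

Writing the same HELD marker on every attempt means a failed attempt leaves the lock state unchanged, so a loser can simply retry later.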
BPOKE (address, value)
BPOKE (Broadcast POKE) is a broadcast request for all serving machines to store a 2-byte `value' in their memories at `address'.
The requesting machine sends the BPOKE request containing the `address' and `value' parameters. Since it is a broadcast request, there are no ACK packet(s).
BPOKE allows a machine to send a signal to all serving machines simultaneously. For example, it can be used to "trigger" the continuation of computation after a rendezvous has been detected.
BRUN (dest, address, length, locaddr)
The BRUN request is used by the requesting machine to request the 'dest' machine to store `length' bytes of code in its memory, starting at `address', then transfer control to the code. The code is sent from the requesting machine's `locaddr'.
The requesting machine begins by sending the BRUN request and the address and length parameters. The serving machine responds by sending an ACK packet.
The requester then sends code packets until the binary program is transferred.
The protocol terminates with a Data ACK from the serving machine, to confirm to the requestor that the code was transferred without error, followed by a transfer of control to the received code.
RUN (dest, address, length, locaddr)
The RUN request is used by the requesting machine to request the 'dest' machine to store `length' bytes of Applesoft BASIC program in its memory, starting at 'address', then RUN the program. The program is sent from the requesting machine's `locaddr'.
The requesting machine begins by sending the RUN request and the address and length parameters. The serving machine responds by coldstarting BASIC, then sending an ACK packet.
The requester then sends data packets until the BASIC program is transferred.
The protocol terminates with a Data ACK from the serving machine, to confirm to the requestor that the program was transferred without error, followed by fixing up the links of the BASIC program and then RUNning the received code.
The usual RUN address for an Applesoft program is 2049 ($801), but any address greater than 2048 ($800) is valid, as long as the program and its data do not extend beyond available memory.
When the Applesoft program ends, or does any operation resulting in an input prompt (such as a syntax error) the action taken depends on whether or not the machine is running an OS. If no OS is running (as for a 'Crate machine), any request for keyboard input results in control being returned automatically to the SERVER loop. If ProDOS or DOS is running, then the machine waits for keyboard input as usual, and does not serve the network. Therefore, if a BASIC program is to be &RUN on any type of machine, and you wish the machine always to continue serving after the program ends, the program must execute a CALL 973 at its completion to re-enter the SERVER loop.
BCAST (dataclass, length, locaddr)
BCAST is a broadcast request used by the requesting machine to present `length' bytes of data in its memory, starting at 'locaddr', to all serving machines, tagged with the type 'dataclass'.
The requesting machine begins by sending the BCAST request and the dataclass and length parameters. All serving machines determine how they will deal with the following data based on the value of dataclass.
After an 800-cycle delay, the requester then sends data packets until the request is satisfied.
When a BCAST request is received on a serving machine, SERVER returns to its caller after setting the page zero 'address' and 'length' parameters to the 'dataclass' and 'length' values passed in the request. The contents of 'rbuf' can then be examined to determine that a BCAST request was received, and any desired further processing can be done (within the 800-cycle time limit) to determine whether and how to use RCVLONG to receive the following data.
For example, CR.BPRUNNER only requires that the high byte of 'dataclass' equals $E0 to signal that an Applesoft BASIC program to be loaded at $801 follows. On the other hand, SYNTH.LOADER interprets the high byte of 'dataclass' as a "type" field and, if it is equal to $F1, it interprets the low byte as the voice number. (Note that NADAUSER.S contains the beginning of a table of assigned BCAST types.)
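The two 'dataclass' interpretations just described can be illustrated with a small decoder. In practice CR.BPRUNNER and SYNTH.LOADER are separate programs, each applying its own test. Combining them into one function here is purely for illustration, and the function name and return labels are hypothetical.

```python
def classify_bcast(dataclass):
    """Interpret a 16-bit BCAST dataclass using the two examples in the text."""
    high = (dataclass >> 8) & 0xFF
    low = dataclass & 0xFF
    if high == 0xE0:
        # CR.BPRUNNER: an Applesoft program to be loaded at $801 follows.
        return ("applesoft_program", None)
    if high == 0xF1:
        # SYNTH.LOADER: the low byte carries the voice number.
        return ("synth_voice", low)
    return ("unknown", None)


print(classify_bcast(0xE000))
print(classify_bcast(0xF103))
```

A receiver that does not recognize the class can simply skip the RCVLONG call and return to its polling loop.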
Network Boot
Early in the design of the AppleCrate, it became clear that, since the machines would have no I/O capabilities other than the network, they would need to be booted from the network. This required that the ROMs on the boards be replaced with EPROMs containing modified RESET code to perform the network boot.
When a network-booting machine is reset, boot code in ROM does standard initialization, then loops waiting for a broadcast BOOT request (since it does not transmit, this protocol is referred to as a "passive boot" protocol).
Any master machine can send a broadcast BOOT request packet specifying the load address of the boot image and its length. 800 cycles after the request packet, the boot image is transmitted.
The booting machines, upon receipt of a directed or broadcast BOOT request packet, receive the boot code directly into the address, and with the length, specified in the BOOT request. Upon the error-free completion of this transfer, the video display is cleared, a banner showing the machine ID is displayed, and control is given to the initial address of the boot code (stage 2 boot). If an error is detected during the transfer, the boot ROM resumes waiting for another BOOT request.
Stage 2 boot code is prefixed to the NadaNet boot image. When it is executed, it does a Paddle 3 read to create a temporary machine ID. It then uses this temporary ID to make a GETID request to the machine that sent the BOOT request. If that machine does not respond, then the stage 2 boot code retries the GETID request about every 100ms. until it receives a response.
The BOOTing machine's GETID service routine looks at the requesting ID, and if it is temporary (>127) it assigns the next available permanent ID and sends it in an ACK to the requesting machine. (If the sender already has a permanent ID, then that is returned in the ACK.)
If the GETID is successful, the requesting machine sets its ID to the assigned value and Data ACKs the master, confirming that the ID has been received and installed. The master then allocates the ID.
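The master's side of this exchange can be modeled as a small two-step allocator: an ID is offered in the ACK, but it is only marked as allocated once the requester's Data ACK confirms it. The Python sketch below is an illustration under assumptions, not the NadaNet service routine; the class and method names are hypothetical, and the first assignable ID (2) is a placeholder, since the text does not say where permanent IDs start.

```python
class IdAllocator:
    """Hypothetical model of the master's GETID handling."""

    def __init__(self, first_id=2, last_id=127):
        # first_id is an assumed placeholder; IDs above 127 are temporary.
        self.first_id = first_id
        self.last_id = last_id
        self.allocated = set()

    @staticmethod
    def is_temporary(machine_id):
        return machine_id > 127

    def offer(self, requester_id):
        """Return the ID to send in the ACK. Nothing is allocated yet."""
        if not self.is_temporary(requester_id):
            return requester_id         # already permanent: echo it back
        for candidate in range(self.first_id, self.last_id + 1):
            if candidate not in self.allocated:
                return candidate
        return None                     # no permanent IDs left

    def confirm(self, offered_id):
        """Called when the requester's Data ACK arrives."""
        self.allocated.add(offered_id)
```

Deferring allocation until the Data ACK means that a requester that fails before installing its ID does not consume one: the same ID is simply offered to the next machine that asks.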
After the booted machine has obtained an ID, it gives control to the NadaNet initialization routine which starts the machine serving.
NadaNet Hardware
NadaNet is a TTL-level serial network in which ONE is represented as a logic high (greater than +2 volts) and ZERO is represented as a logic low (less than +0.7 volts). The fanout capability of a TTL annunciator output is sufficient to drive a dozen or so TTL pushbutton inputs if they are not otherwise connected, as in the case of the early Apple II machines.
Although NadaNet began as, and was named for being, a network that required no extra hardware, some changes Apple made in the Apple //e pushbutton circuits necessitated hardware buffering.
The Open- and Closed-Apple keys on the post-][plus machines are connected to pushbutton inputs 0 and 1. In order to function in the same manner as game controller buttons, the keyboard contains 470-ohm pulldown resistors for each Apple key, and pressing the key connects the corresponding pushbutton input to +5 volts.
In addition, to support board self-test at the factory, the main board contains 12K pullup resistors pulling a keyboard-less pushbutton input to +5v.
The effect is that the pushbutton inputs on later machines are relatively low-impedance inputs, each of which sinks on the order of 10mA when driven high. While an annunciator output could drive one or two such inputs, any attempt to connect more than a small number of machines together directly would overwhelm the drive capability of the annunciator outputs of the machines.
My solution was to add emitter followers to drive the pushbutton inputs and the network. These emitter followers are built onto a 16-pin header (or machined-pin socket) that plugs into the 16-pin game port and provides the network and ground wires to connect to the network bus. The wiring diagram is shown below. (Note that the usual 0.1uF decoupling capacitor between +5v and ground is not shown in the diagram.)
The 4.7K pulldown resistor on the network adapters is sufficient if there are no more than eight feet of shielded cable per machine. If more cable is used, then an additional pulldown resistor shunting the net should be added to speed up the fall time of the network signals. A good rule of thumb is that the effective pulldown resistance should be about 45000/L ohms, where L is the total length of shielded audio/video cable used, and the effective resistance of all pulldowns in parallel should not be less than 180 ohms (corresponding to a maximum cable length of 250 feet). CAT-5 twisted pair has a lower capacitance per foot than shielded cable, so about 3 feet of CAT-5 cable is equivalent to 1 foot of shielded cable.
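The rule of thumb can be turned into a short calculation. The Python sketch below is only an illustration of the arithmetic: the function names are hypothetical, and it assumes that the 4.7K pulldowns on all adapters act in parallel and count toward the effective pulldown resistance, which is one reading of the text.

```python
OHM_FEET = 45000.0             # rule of thumb from the text: R = 45000 / L
MIN_PULLDOWN_OHMS = 180.0      # lower limit from the text
ADAPTER_PULLDOWN_OHMS = 4700.0 # pulldown on each network adapter
CAT5_FEET_PER_SHIELDED_FOOT = 3.0


def equivalent_length(shielded_ft, cat5_ft=0.0):
    """Total cable length expressed in feet of shielded cable."""
    return shielded_ft + cat5_ft / CAT5_FEET_PER_SHIELDED_FOOT


def extra_pulldown(machines, shielded_ft, cat5_ft=0.0):
    """Return the extra shunt resistor in ohms, or None if none is needed."""
    length = equivalent_length(shielded_ft, cat5_ft)
    if length <= 0:
        return None
    target = OHM_FEET / length                  # desired effective pulldown
    existing = ADAPTER_PULLDOWN_OHMS / machines # adapters in parallel (assumed)
    if target < MIN_PULLDOWN_OHMS:
        raise ValueError("cable too long for the 180 ohm limit")
    if existing < MIN_PULLDOWN_OHMS:
        raise ValueError("adapter pulldowns alone fall below 180 ohms")
    if existing <= target:
        return None                             # adapters already suffice
    # Solve 1/target = 1/existing + 1/extra for the extra resistor.
    return 1.0 / (1.0 / target - 1.0 / existing)
```

Under these assumptions, two machines joined by 90 feet of shielded cable would call for a target of 500 ohms, and extra_pulldown(2, 90) returns roughly 635 ohms as the additional shunt. With eight feet or less of shielded cable per machine, the function returns None, which agrees with the statement that the adapter pulldowns are then sufficient.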
I chose standard RCA connectors for NadaNet because they are inexpensive, plentiful, shielded, and intrinsically polarized. Standard audio or video cables are used to interconnect adapters, and standard Y-adapters are used to daisy-chain machines when needed. Here are instructions for building adapters.
Machine Compatibility
NadaNet was originally conceived with the Apple ][+ in mind. As later machines became targets, the interface changed somewhat to accommodate the changed input specifications of those machines. It should work well on all models of Apple II except the Platinum //e and other //e's of late vintage. As described in Apple //e Technical Note #9, "Switch Input Changes", these machines had 0.1uF capacitors added between the pushbutton inputs and ground, effectively precluding fast signaling. Using these later machines on NadaNet requires locating the PB1 capacitor (C95) and snipping one lead to eliminate it from the circuit. This will have no ill effects on your machine at all. (C95 is the third capacitor from the rear of the main board just behind the cassette in and cassette out jacks.)
I have tested NadaNet with the IIgs, and it works fine when running at 1MHz in the network code. I have considered a special IIgs version of NadaNet which would incorporate "slowdown" and "restore speed" code to allow operation outside NadaNet to proceed at the control panel-selected speed. Unfortunately, the IIgs speed controls require intercepting all timing-critical code, which is difficult and takes a performance toll.
One IIgs peculiarity to be aware of is that the 16-pin game port is mechanically "rotated" 180 degrees relative to all other Apple II machines, which is something to watch out for!
Since my primary Apple //e machine has a Zip Chip accelerator installed, the NadaNet code contains instructions which will cause the Zip Chip (and some other accelerators) to slow down during the network code. The slowdown code does not attempt to control the chip directly, but simply makes a reference to the slot 6 Disk Controller "motor off" I/O address. By default, most accelerators slow down for many milliseconds when such an address is referenced, since it is commonly used for a 5.25" disk controller, which requires a 1MHz speed. No actual disk controller needs to be installed in slot 6 for this to work, as long as the acceleration mechanism slows down on slot 6 accesses.