USR.SQR: a Fast Square Root Routine for Applesoft

Michael J. Mahon April 6, 2009


Several times in the past I've written Applesoft BASIC programs that needed to take square roots--sometimes a lot of square roots! In these cases, I was very disappointed with the speed of the built-in SQR function in Applesoft.

SQR typically requires almost 50 milliseconds to compute a square root. The method it uses has only one thing to recommend it: it is very compact. It works by raising its argument to the 0.5 power, which requires computing LOG(X), adjusting its exponent to divide it by two, then computing EXP of the resulting number, yielding the square root.

While this algorithm certainly works, it is slow, since both LOG and EXP are computed as polynomial approximations requiring numerous floating-point multiplies and adds.


To improve the speed of square root calculation, I first tried a Newton-Raphson iteration. This approach starts by creating an initial approximation of the square root, then performing iterations (requiring a floating-point divide and an add per iteration) until convergence on the square root is achieved. With a simple 32-byte table look-up for the approximation, this method produces a square root accurate to 32 bits in 3 or 4 iterations, since the number of correct bits in the approximation doubles on each iteration.

The average time required for this algorithm was about 12 milliseconds--much better, but I wasn't satisfied.

So I next tried a "direct" algorithm for computing a binary square root, that is, one that develops one bit of the square root per iteration, but each iteration is like a single step of a divide, involving only comparing, subtracting, and shifting.

This approach proved to be the fastest, requiring only 5.2 milliseconds on average--more than 9 times faster than the Applesoft SQR function--and it is more accurate than the SQR function, too!

The only downside is that USR.SQR requires about 170 bytes of code, but it still fits nicely on page 3, as you can see from its listing.


To use the new routine, you simply BRUN the USR.SQR program, which installs the square root routine as the USR function. Then use USR(X) in place of SQR(X) and you'll find that square roots are much faster.

The routine is location-independent and can be run anywhere there's room for 181 bytes of code.

To obtain USR.SQR source and object code just download this ShrinkIt archive.