| Anthony Bailey ( @ 2005-11-27 11:38:00 |
| Entry tags: | software_development |
(Ten to the) Three is the magic number
A few weeks ago a pair of us (dnt, GABA) at work were looking at a performance issue in the latest version of our medical imaging software product. A little investigation showed up a noticeable delay during a process that at core involved the insertion of an object into an ordered list. The obvious optimization of using a binary search rather than a linear one to find the insertion point got us our n2 to n log n performance improvement and that was all done and dusted.
This kind of algorithm change is classic (as in extremely old and fundamental, as well as geekily fun) software engineering stuff just like they taught me at school (well, first year CompSci.) So I got to wondering: why was it the first time I've actually had to worry explicitly about this kind of thing in more than six years on the product? It's not like we aren't regularly thinking hard about performance issues.
Dave and I surmised that it was to do with the size of n. The concrete problem was building a volume of data from the order of a thousand spatially-related medical images. This turns out to be about the only place where we deal with this kind of number of a given thing.
At core, we have to load and render images (and the volumes built from them.) Images have a million-ish pixels (and hence the volumes built from them can have a billion-ish voxels.) These numbers are too big for the above flavor of n to log n algorithmic enhancement to come into play. If the algorithm is more than order n your user is already asleep, and if it were plain log n you've got to ask yourself what's happening to most of this alleged input data. So most of the efficiency work on pixel/voxel processing involves linear speed-ups by using assembler/intrinsics or in-lining or moving stuff out of loops.
(I'm eliding some genuine cunning about working out which millions of the billions actually contribute to a given volume rendering here, but that's domain-specific smarts, not first-year CompSci binary chop stuff. Another thing that turns out to be important is processing data in the right order and balancing caching against doing stuff on the fly so as to not be waiting for fetches from RAM - again, I'll leave that to the renderheads.)
Day-to-day stuff I'm doing has often been about constructing some relatively small number of views, drawing tens of graphics, and doing mouse-driven direct manipulation of stuff such as poly-lines with perhaps fifty or so points. These numbers tend to be too small for algorithmic optimization to be very valuable in cutting down a frame-rate dominated by rendering cost, and other pressures on the code dominate.
And so ten to the three (or so) is a sweet but locally rare spot for n where the difference between n2 and n log n are both plausible costs and the difference between them are significant in our typical work-flow. (Actually, two-and-a-half is probably at least as sweet, but I don't know any good songs about that number...)