Wednesday, September 16, 2015

Hard Disks

I recently ran into a problem with the hard disk on one of my computers.  I'm not going to bore you with the details.  The one thing I want to take away from my disk problem is the discovery that one aspect of hard disks is evolving.  At this point you are probably thinking that this post is going to be all about technical computer stuff.  You're right!  If that's okay, keep reading.  If it isn't, here is a good place to quit.  I'll leave the details about what is evolving to the end of this post.  Instead, I am going to start with one of those "how did we get here" things.

I have been around for a significant part of the creation and evolution of hard disks.  I routinely find that the younger generation is completely unfamiliar with what I consider to be significant milestones in the story of how things got to be the way they now are.  That's fine.  There are lots of people who came before me and are up on milestones that happened before I was paying attention.  So I am just as guilty of that sort of thing as the next person.  But I am interested in history and the evolution of technology.  And I am in a good position to lay it out, at least with respect to disk drives.  So I want to do that for whoever might be interested while I am in a position to do so.  Where to start?

I am going to start with the invention of the phonograph in 1877 by Thomas Edison.  Edison was interested in being able to save and reproduce sound.  The key components of the device he came up with were a cylinder and a needle.  The needle was connected to a horn assembly which funneled the energy of the sound to where the needle was.  This concentrated the energy available to move the needle.  All this caused the needle to move by an amount proportional to the intensity of the sound.  The needle ran along a spiral path on a foil-covered cylinder, and the sound made it press wiggles into the foil that mirrored the pattern of the sound.

The point I want to make, and will be continually making, is that there were choices available to Edison.  He made certain selections among those choices but he could have made other selections and ended up with a working device.  In Edison's case he made two selections I want to highlight.  He selected a cylindrical shape and a spiral path.  Focusing on the rest of the machine for a moment, these choices allowed him to keep the needle in a fixed position.  This allowed him to connect an elaborate funnel mechanism to the needle.  And this allowed him to focus enough sound energy that the needle moved a relatively large amount under the influence of relatively quiet sounds.  Now back to the cylinder and the spiral.

In Edison's time there existed a machine called a lathe.  A lathe rotates a piece of material like wood.  If you hold a cutting tool against the spinning piece and move it along the axis of rotation, you can carve complex shapes into long skinny pieces of wood.  A typical example of this is a baseball bat.  But the profile of the finished piece can be pretty much anything you want.  Lathes turned out to be very useful woodworking tools and they were common in any well-equipped shop of Edison's day.  Other lathes were designed to handle other materials like metal.

Putting a foil-covered cylinder on a lathe-like device allowed the cylinder to move under the fixed needle along a long path.  Mounting the cylinder on a threaded shaft (a screw) made it shift sideways evenly as it turned.  So instead of the needle inscribing a circle as the cylinder rotated, it inscribed a spiral as the cylinder simultaneously rotated and moved slowly sideways.  This was the simplest mechanism available to Edison to allow a needle to follow a long groove on a relatively small device (the recording cylinder).

But it turned out that there was a better, although more complex, solution available.  By using a more complex mechanism the same long spiral could be inscribed on a flat circular platter.  After some years competitors to Edison came out with "platter based" phonographs.  The platters were more compact and quickly came to dominate the market.  We will be returning to all these choices as the story continues.  I am now going to fast forward to roughly the end of World War II.

About this time a magnetic recording technique came into wide use.  In its initial popular form the recording medium was a thin steel wire.  A "recording head" could write a magnetic signal into the wire.  A "reading head" could read it back.  The result was something mostly lost to history called a wire recorder.  But a lot of tinkering quickly morphed this into something else.  Instead of a wire, a material containing small magnetizable particles got glopped onto a thin strip of strong material something like movie film.  In a popular format, four "tracks" could be recorded as parallel paths along a long 1/4-inch-wide strip.  By using the first and third tracks to record a "left" and "right" signal, a stereo recording could be made.

Flipping the tape over turned tracks two and four into tracks one and three and the tape could be played back in the other direction.  This gave the tape a "B" side, analogous to the "B" side of a record.  This all came together as the "reel to reel" audio tape recorder/player that was popular for several decades.  After a time the tape was packaged into a "cassette" housing that made it easier to use.  The concept remained the same but the repackaging gave us the audio cassette recorder/player that extended the life of audio tape machines for another couple of decades.

And fairly early on, people figured out that you could record computer data instead of sound using the same or similar equipment.  So by the '50s you had tape recording systems that were specially adapted for computer use.  Instead of using a ribbon that was a quarter of an inch across they used one that was half an inch across.  And the technology evolved over the years to increase the amount of data that could be recorded on a standard reel of computer tape.  But no matter what the engineers did, the result was pretty inflexible.

Suppose the tape was positioned near the front of the reel and you wanted to access data toward the back.  The only thing to do was to spin through all the intervening tape to get to where you wanted to go.  You had the same problem if the reel was positioned toward the back and you wanted to access data near the front.  A reel of tape was a "sequential access" system.  But "random access" is incredibly convenient in lots of situations.  And that led, also in the '50s, to the development of the disk drive.

The disk drive was a throwback to the flat phonograph record.  A rigid metal platter was smeared with the same kind of magnetic goo as that used for reels of magnetic tape.  The needle was replaced by a "read head" (and a "write head") that was kept in a relatively fixed position.  The platter was allowed to revolve beneath the head.  There were no longer physical grooves as in a phonograph record, but there was a magnetic path that performed the same function.  With the phonograph record, the spiral groove automatically carried the needle slowly from the outside of the record toward the center.  You just needed to make sure that the arm the needle was on could pivot.  (That was one of the complexities that had to be added to move from the Edison "cylinder" phonograph design to the later "platter" design.)  But in the disk drive case there was no groove.  What to do?

This question caused early disk drive designers to make a different choice than their phonograph brethren had.  They dumped the spiral design and opted for a series of concentric circles.  This design choice was an attempt to emulate Edison's "keep it simple stupid" thinking.  They kept the "arm that pivots" idea for holding their needle-equivalent, the read/write head.  But now they could use a system that pivoted the arm to a specific angle then held it there.  What would now pass under the "head" would be a circular path on the disk surface.

The head movement mechanism would be more complicated than the equivalent mechanism on a phonograph, but the additional complexity was manageable.  It would be designed to accurately pivot the head to any one of many specific angles, then hold the head steady at that position.  Reproducible results (the head could be reliably positioned over the correct circle, or "track") were achievable.  The circular path could be treated as if it were a short piece of magnetic tape.  Within a specific track you had the standard "sequential access" problem.  But you could move to a distant piece of data by pivoting the head arm to the correct setting and waiting for the part of the track you were interested in to move under the head.  It wasn't perfect but it was a big improvement.
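
To put rough numbers on "pivot the arm, then wait," here is a small Python model: the time to reach a record is a seek (zero if the arm is already on the right track) plus, on average, half a revolution of waiting for the record to come around under the head.  The settle time and per-track figures are made-up assumptions for illustration, not specifications of any real drive.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average wait for the wanted record to reach the head: half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2


def access_time_ms(tracks_to_move: int, rpm: float,
                   settle_ms: float = 3.0, ms_per_track: float = 0.01) -> float:
    """Seek time plus average rotational wait.

    settle_ms and ms_per_track are hypothetical constants chosen for illustration.
    """
    if tracks_to_move == 0:
        seek_ms = 0.0  # already on the right track: no arm movement at all
    else:
        seek_ms = settle_ms + ms_per_track * abs(tracks_to_move)
    return seek_ms + rotational_latency_ms(rpm)


print(round(rotational_latency_ms(3600), 2))  # 8.33
print(round(access_time_ms(500, 3600), 2))    # 16.33
print(round(access_time_ms(0, 3600), 2))      # 8.33
```

The model is deliberately crude (real seek times do not grow in a straight line with distance), but it shows the two separate costs: moving the arm, and waiting for the spinning track.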

And once you had one surface spinning away under a head it quickly became apparent that putting another head and another layer of goo on the bottom side of the platter was a good idea.  This would double the capacity of your device.  And since we have come this far, how about stacking multiple platters on the same spindle?  Each platter would add two sides, resulting in that much more capacity.  And that's what happened.  Devices were quickly developed consisting of several platters, all rotating around the same spindle, and with each surface having its own arm and its own read/write head.  But this opened up an opportunity for another decision.  Would all the heads be operated independently or would they all be "ganged" together?  I don't know what thinking went into the decision, but all disk drives I know of use the "ganged together" design.  A single mechanism moves all the arms simultaneously to the same angle, thus positioning all heads over the same track on their respective surfaces.

And implicit in all this is another design decision.  Phonograph records rotate at a constant RPM.  You have "33", "45", or "78" records.  With phonographs this is a sensible decision.  Again, a rotating mechanism that runs at a constant speed is simpler.  But something is lost.  The length of groove that passes under the needle in one revolution is much greater for an outside groove than for an inside one.  But the maximum amount of information that can be stored in an inch of groove is the same everywhere.  Since every revolution takes the same amount of time and carries the same fixed amount of information, an outside groove uses many more inches to store it than an inside groove does.  Engineers knew this.  They looked at how much information they could reliably store on an inside groove and used that to set the specifications for the entire phonograph record.  They just lived with the loss caused by spinning the record too fast for the outside grooves to be used efficiently.  But it did not have to be that way.
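
The waste is easy to put numbers on.  This Python sketch assumes evenly spaced tracks and a fixed maximum amount of information per inch, then compares holding every track to what the innermost one can carry (the constant-speed approach) against filling each track completely.  The radii, track count, and density are invented for illustration.

```python
import math


def track_capacities(inner_radius: float, outer_radius: float,
                     n_tracks: int, units_per_inch: float) -> list[float]:
    """Capacity of each evenly spaced track if filled to the maximum density per inch."""
    step = (outer_radius - inner_radius) / (n_tracks - 1)
    return [2 * math.pi * (inner_radius + i * step) * units_per_inch
            for i in range(n_tracks)]


def fixed_rate_vs_full(inner_radius: float, outer_radius: float,
                       n_tracks: int, units_per_inch: float) -> tuple[float, float]:
    """Total capacity with every track held to the innermost track's capacity,
    versus every track filled to its own length."""
    caps = track_capacities(inner_radius, outer_radius, n_tracks, units_per_inch)
    fixed_rate_total = caps[0] * n_tracks  # every track limited like the innermost one
    full_total = sum(caps)                 # every track used to its full length
    return fixed_rate_total, full_total


# Hypothetical platter: tracks from 2 to 6 inches out, 1000 of them.
fixed, full = fixed_rate_vs_full(2.0, 6.0, 1000, 100.0)
print(f"filling every track would hold {full / fixed:.2f}x as much")  # 2.00x
```

With evenly spaced tracks the ratio is simply the average radius divided by the innermost radius, so the wider the band of tracks compared with the hole in the middle, the more is lost.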

I have a "laserdisc" player.  It is a technology that didn't really catch on for reasons I am not going to go into now.  The technology was an early version of the CD/DVD/Blu-ray technology that is now ubiquitous.  But time had marched on between the debut of the various generations of phonograph technology represented by "33", "45", and "78" records and the debut of the laserdisc.  It was now possible to vary the rotation rate of a laserdisc depending on whether you were reading from the inside or the outside of the disk.  I have a fancy laserdisc player that is capable of playing "CLV" or "CAV" disks.  CAV stands for Constant Angular Velocity.  This is like a phonograph record playing at a fixed RPM.  A CLV disk uses a higher RPM when processing inner tracks and a lower RPM when processing outer tracks.  This speed variability results in a Constant Linear Velocity: every inch of path contains the same amount of data.  Because the outer tracks are longer, they hold more data than the inner ones, so a CLV disk holds more data than a CAV disk.  I have heard that one format gives a sharper picture than the other, but I was never able to tell the difference.
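
The CLV rule is simple arithmetic.  The speed of the track past the head is the circumference times the rotation rate, so keeping that speed constant means the RPM must fall in proportion to the radius.  A minimal Python sketch; the 1800 RPM and 2-inch starting values are assumptions picked for round numbers, not laserdisc specifications.

```python
def clv_rpm(radius: float, inner_radius: float, inner_rpm: float) -> float:
    """Rotation rate at `radius` that keeps the same linear speed the track has
    at inner_radius when spinning at inner_rpm.

    Linear speed is 2 * pi * radius * rpm, so holding it constant makes rpm
    inversely proportional to radius.
    """
    return inner_rpm * inner_radius / radius


# Assumed example: 1800 RPM at a 2-inch radius.
for r in (2.0, 3.0, 6.0):
    print(r, clv_rpm(r, inner_radius=2.0, inner_rpm=1800.0))  # 1800.0, 1200.0, 600.0
```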

So it is possible to go with a CLV design for disk drives but, as far as I know, no one has decided to do that.  All disk drives are CAV devices.  (Modern drives do recover much of the lost capacity another way: they divide the longer outer tracks into more blocks than the inner ones, a scheme called zoned recording.)  And let me revisit another design decision.  Early disk drives opted for a series of concentric circular paths.  To this day that is still true, as far as I know.  But my laserdisc did not.  It opted for the phonograph-style spiral path.  And so do CDs, even digital "data" ones, and Blu-ray discs.  Again, the cost of dealing with complexity has dropped precipitously.  So instead of just having a mechanism that swings the arm the head is on to a specific angle, a much more complex method is used.

The arm is swung to approximately the correct setting.  Then the track is found using some kind of complex method.  I don't know the details but we can all testify that it works.  Then data is read from the track.  Embedded in the data is a track number.  If it's the right track, fine.  Otherwise the arm is repositioned and the process repeated until the correct track is found.  And remember, the track is a spiral.  So complex but highly reliable techniques make the head jump back one turn of the spiral automatically, once per rotation, if it needs to stay on the same track.  The end result is to make a spiral track behave as if it were a series of concentric circular tracks.  Magic!
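
The swing-read-correct loop can be sketched in Python against a simulated arm.  The arm class, its random landing error, and the idea that the track number can simply be read back are all assumptions standing in for hardware; real drives use far more elaborate positioning methods.

```python
import random


class SimulatedArm:
    """Hypothetical stand-in for a real actuator: a long move lands up to `slop`
    tracks off target, but a short correction is accurate."""

    def __init__(self, slop: int, seed: int = 0):
        self.position = 0
        self.slop = slop
        self.rng = random.Random(seed)

    def move(self, delta: int) -> None:
        error = self.rng.randint(-self.slop, self.slop) if abs(delta) > self.slop else 0
        self.position += delta + error

    def read_track_number(self) -> int:
        # Stands in for decoding the track number embedded in the recorded data.
        return self.position


def seek(arm: SimulatedArm, target: int, max_tries: int = 10) -> int:
    """Read the embedded track number and correct until on target.
    Returns how many reads it took."""
    for attempt in range(1, max_tries + 1):
        current = arm.read_track_number()
        if current == target:
            return attempt
        arm.move(target - current)  # reposition by the observed error
    raise RuntimeError(f"could not settle on track {target}")


arm = SimulatedArm(slop=3, seed=1)
print(seek(arm, 500), arm.position)
```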

Next I want to take a look at what is on the track.  Here again choices have been made.  Let me first discuss the format used on disk drives designed for use with IBM mainframe computers back in the stone age of computers (before PCs).  The acronym (there is always an acronym) is CKD.  It stands for Count Key Data.  Disk drives were very expensive back then and this was an attempt to squeeze out all the performance the equipment was capable of.  As I indicated above, a track can be thought of as a short piece of magnetic tape.  I don't know all the details, but at the lowest level you have to know where the data starts and where it ends.  What I do know is that the engineers solved this problem and the solution involved something called "inter-record gaps".  All we need to know about these beasts is that they are necessary and that they take up space.  So on a track we have inter-record gaps and we have the parts we care about.

Early disk drives were unreliable, so the last part of each block of "care about" stuff was a checksum, some data that could be used to confirm that the rest of the usable stuff had not gotten garbled.  That leaves us with the C, K, and D parts.  The Count identified the record (including its record number on the track) and gave the lengths of the key and data parts that followed.  This allowed you to confirm that you were getting the block of data that you intended to get.  The Key part was something that sounded like a good idea but never worked well in practice.  The idea was you could give the disk drive a command that said "get me the record that has this specific key value".  This would offload activity from the very expensive computer to the hopefully far less expensive disk drive.  But nobody ever came up with a way to make good use of this capability.  Still, it stayed in the spec.  The rest of the "care about" stuff, the Data part, was exactly what you would expect.  It was whatever data you said you wanted the record to contain.  This is yet another example of the actual process being way more complex than you would think it would be.
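
To picture a CKD track, here is a Python sketch that models each record as a count, an optional key, and data, and searches a track by key the way the drive-side key search was meant to.  The class and function names are hypothetical, and the count field is simplified to a record number plus the key and data lengths; real count fields and channel commands carry more than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CKDRecord:
    record_number: int  # position of the record on the track
    key: bytes          # empty if the record has no key
    data: bytes

    def count(self) -> tuple[int, int, int]:
        """Simplified count field: record number, key length, data length."""
        return (self.record_number, len(self.key), len(self.data))


def search_key_equal(track: list[CKDRecord], key: bytes) -> Optional[CKDRecord]:
    """Scan a track in order, as the drive would while it spins,
    and return the first record whose key matches."""
    for record in track:
        if record.key and record.key == key:
            return record
    return None


track = [CKDRecord(1, b"", b"header"),
         CKDRecord(2, b"SMITH", b"record for Smith"),
         CKDRecord(3, b"JONES", b"record for Jones")]
found = search_key_equal(track, b"JONES")
print(found.count() if found else "not found")  # (3, 5, 16)
```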

As you can see, this is quite a sophisticated approach.  And did I mention that the amount of data in a record could be anything you wanted, as long as it fit on the track (or on the unused remaining part of the track, if the track already contained some records)?  IBM came out with a series of disk drives over a period of several decades.  Not surprisingly, the size of a disk track increased as time went by.  Two track lengths that characterized devices late in this sequence were 13030 bytes and 19069 bytes.  Those were not exactly obvious choices.  I assume they were dictated by what the technology of the time could be made to deliver.  And these "capacity" numbers represented a best case.  You could write a single record of the specified size on a single track.  And, of course, it could not contain a key.  But doing this allowed you to squeeze absolutely the most bytes of data possible onto a specific device.  If you chose to write two or more records on the same track, those pesky inter-record gaps got in the way and the amount of data the track would hold went down.  Fortunately, IBM provided handy tables for figuring out how many records of a specific size would fit.
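
The shape of the "how many records fit" calculation can be sketched in Python.  The single overhead constant below is a made-up stand-in for the gaps and count area; IBM's actual tables were built from device-specific formulas that this does not reproduce.  The 19069-byte track size is the one mentioned above.

```python
def records_per_track(track_bytes: int, record_bytes: int, overhead_bytes: int) -> int:
    """How many equal-sized, keyless records fit on one track.

    Best case: a single record may use the whole track capacity.  Each
    additional record also pays overhead_bytes for its gap and count area.
    overhead_bytes is a hypothetical constant, not an IBM figure.
    """
    if record_bytes > track_bytes:
        return 0
    return 1 + (track_bytes - record_bytes) // (record_bytes + overhead_bytes)


OVERHEAD = 200  # assumed, for illustration only
for size in (19069, 9000, 4096, 1000):
    n = records_per_track(19069, size, OVERHEAD)
    print(f"{size:>5}-byte records: {n:>2} per track, {n * size} bytes of data")
```

Even with an invented overhead, the pattern matches the text: the smaller the records, the more of the track goes to gaps.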

And, oh by the way, this CKD approach has completely fallen by the wayside.  It has been replaced by a method called FBA.  FBA stands for Fixed Block Architecture.  The idea here is that all the data written on the drive consists of blocks that are all the same fixed size.  Things are simplified.  There are no longer any complex calculations in deciding if a block of data will fit on a track.  But that simplification involves a trade-off.  IBM used a 4K (specifically 4096-byte) block size on their FBA devices (actually the same physical device as its CKD sibling but with different microcode).

Consider the CKD device with a track size of 19069.  We only have room for four 4K blocks totaling 16,384 bytes.  (In case you are wondering, four 4K records fit even after you factor in the inter-record gaps.)  We give up about 14% of the capacity of the track.  Back when disk drives were really expensive that was too much of a penalty to pay in most situations, and people were willing to put up with the complexity of the CKD architecture to get the additional capacity.  But now disk drives are so cheap that people prefer the simple but less efficient FBA approach, so that is what they use.
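
The arithmetic is quick to check in Python, using the two IBM track sizes mentioned earlier.  This sketch ignores the inter-record gaps, so it counts how many whole blocks could fit at best.

```python
def fba_waste(track_bytes: int, block_bytes: int = 4096) -> tuple[int, float]:
    """Whole fixed-size blocks that fit on a track, and the percentage of the
    track left over.  Inter-record gaps are ignored here."""
    blocks = track_bytes // block_bytes
    unused = track_bytes - blocks * block_bytes
    return blocks, 100.0 * unused / track_bytes


for track_size in (13030, 19069):
    blocks, pct = fba_waste(track_size)
    print(f"{track_size}-byte track: {blocks} blocks, {pct:.1f}% unused")
# 13030-byte track: 3 blocks, 5.7% unused
# 19069-byte track: 4 blocks, 14.1% unused
```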

Now "let's return to those thrilling days of yesteryear" (a quote from the old Lone Ranger radio and later TV show, and most recently a really terrible Johnny Depp movie, yuck).  Remember how disk drives work.  You have what is now called the head-disk assembly (the combination of arms, actuators, and heads).  It swings into position so that the several heads can each process a circle on the appropriate disk surface.  Imagine the whole mechanism was invisible and we only looked at the tracks of the heads.  You would have a set of equal-diameter circles stacked one on top of the other.  With a little imagination we could see this pattern making up a cylinder, and "cylinder" became the shorthand name for the set of circles that the heads as a group could access, once they were in position.

Next, you select one of the several heads.  Then you pick the record you wish to process among the several that may exist on that track.  It's computers, so we reduce all this to numbers.  We have a cylinder number, a head number, and a record number.  This trio of numbers can be used to uniquely specify a single record on a disk drive.  For reasons I am not going to get into, IBM used the acronym CCHHR.  And at the hardware level, this is still how it works.  You swing the arms holding the heads to the location specified by the CC, you select the head specified by the HH, and you select the record specified by the R.  But the fact that that's how it works is now completely disguised.  So let's look into the disguise a bit.
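
For the curious, my understanding is that the letters track the field sizes: two bytes of cylinder number (CC), two of head number (HH), and one of record number (R), five bytes in all.  Here is a Python sketch of packing and unpacking such an address with the standard struct module.  The big-endian byte order matches how IBM mainframes store numbers; the function names are hypothetical.

```python
import struct

# Assumed layout: CC and HH are two-byte big-endian fields, R is a single byte.
CCHHR = struct.Struct(">HHB")


def pack_cchhr(cylinder: int, head: int, record: int) -> bytes:
    return CCHHR.pack(cylinder, head, record)


def unpack_cchhr(address: bytes) -> tuple[int, int, int]:
    return CCHHR.unpack(address)


addr = pack_cchhr(cylinder=400, head=7, record=3)
print(addr.hex(), unpack_cchhr(addr))  # 0190000703 (400, 7, 3)
```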

Moving forward to a more recent era, but still one that is a ways from the present, Microsoft used, and to some extent still uses, something called the FAT file system.  FAT stands for File Allocation Table.  The original version is now called FAT12.  That's because 12 bits were used to number the chunks of disk space (called clusters) that the table keeps track of.  Underneath the file system, the PC still addressed the disk the old way, with some bits specifying the cylinder number, some the head number, and a few the record number (the PC world called records "sectors").  The FAT12 system was designed to handle floppy disks.  The original PC floppy had one side, hence one head.  Floppies used the FBA architecture, and the specification for the original floppy called for exactly 8 records of 512 bytes on a track.  That specification was capable of holding only 160KB of data because there weren't that many cylinders either (40, if you care).

The specification was quickly tweaked to allow for two sides and 9 records per track, yielding a total capacity of 360KB.  (Later iterations kicked the capacity of "floppy" disks up to 1.44 MB, or 2.88 MB in a version that never really caught on.)  The allocation of bits in the FAT12 spec was easily able to handle this.  But within a couple of years the hard disk came along as an accessory to the PC.  The original PC hard disk had a capacity of 10MB, tiny by current standards but enormous next to a floppy.  FAT12 could handle it only by lumping several blocks together into each allocation unit (cluster), and as hard disks kept growing, the FAT12 specification was supplemented by the FAT16 specification.  But underneath, the disk was still addressed with so many bits for cylinders, so many bits for heads, and so many bits for records.  And after not many years this became a problem.
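
A disk's capacity is just the product of its geometry.  A Python check of the floppy figures above, assuming 512-byte records (the standard PC floppy record size); the last line uses the usual 3.5-inch high-density geometry of 80 cylinders, 2 heads, and 18 records per track, which shows that the "1.44 MB" floppy really held 1440 KB.

```python
def capacity_bytes(cylinders: int, heads: int, records_per_track: int,
                   record_bytes: int = 512) -> int:
    """Total bytes addressable with a given cylinder/head/record geometry."""
    return cylinders * heads * records_per_track * record_bytes


print(capacity_bytes(40, 1, 8) // 1024, "KB")   # 160 KB: single-sided, 8 per track
print(capacity_bytes(40, 2, 9) // 1024, "KB")   # 360 KB: two sides, 9 per track
print(capacity_bytes(80, 2, 18) // 1024, "KB")  # 1440 KB: the "1.44 MB" floppy
```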

The problem was made worse by the fact that there was room for lots of heads.  Heads equate to platter surfaces, two per platter.  But hardware makers found that it was a bad idea to have lots of platters.  It was much easier to squeeze more tracks onto a surface and more blocks onto a track.  So the embarrassing situation developed where the head field was too big while the cylinder number and/or record number field was too small.  It didn't take the hardware makers long to come up with a cheat.  Why not lie?  Claim your disk drive has twice as many heads as it actually has but half as many cylinders.  As long as the disk controller faked things up properly, the computer would never notice.  And that's what disk makers did, and it worked for a while.
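
Here is the fib as a Python sketch.  The limits used (1024 cylinders, at most 255 heads) are assumptions modeled on the classic PC BIOS disk interface, and the function name is hypothetical.  Because the head count doubles each time the cylinder count halves, the total number of blocks stays the same.

```python
def translate_geometry(cylinders: int, heads: int, sectors: int,
                       max_cylinders: int = 1024,
                       max_heads: int = 255) -> tuple[int, int, int]:
    """Report twice the heads and half the cylinders until the cylinder count fits.

    Halving an odd cylinder count drops a cylinder, so a little capacity can be lost.
    """
    while cylinders > max_cylinders and heads * 2 <= max_heads:
        cylinders //= 2
        heads *= 2
    return cylinders, heads, sectors


physical = (4096, 16, 63)
logical = translate_geometry(*physical)
print(logical)  # (1024, 64, 63)
```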

But even with this trick there were only so many bits to go around, and there were other bit limits too.  Earlier versions of DOS, for instance, numbered blocks with just 16 bits.  If you used all possible combinations of those bits you could only have 65,536 blocks of 512 bytes each.  That's 32 MB, and it didn't take disk makers very long to figure out how to make disks bigger than 32 MB.  Microsoft eventually moved on to FAT32.  But by this time the whole CCHHR thing looked pretty ridiculous.  Why not just call the first block on the first track of the first cylinder block "0" (computers like to count starting from zero rather than one)?  Call the next block "1", and so on.  The first block on the second track would just use the next number in line.  You just keep counting in the same manner until you get to the end of the disk.  Things all of a sudden get much simpler.  A disk's capacity is just some number of blocks, and you let the disk controller translate the relative block number in whatever way it wants to.  As long as the disk is relatively fast and the block with a specific relative block number always ends up in the same place, who cares where it really is?
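
This counting scheme, now called logical block addressing (LBA), is a one-line formula when the geometry is known.  A Python sketch; PC hardware numbered records ("sectors") starting from 1, which is why the formula subtracts 1.  The 16-head, 63-record geometry is an assumed example.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Number blocks straight through the disk: every record on the first track,
    then the next head on the same cylinder, then the next cylinder.
    PC sector numbers start at 1, so subtract 1 to start counting from 0."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba: int, heads_per_cylinder: int,
               sectors_per_track: int) -> tuple[int, int, int]:
    """The reverse translation, of the sort a disk controller might do internally."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


H, S = 16, 63  # assumed geometry
print(chs_to_lba(0, 0, 1, H, S))   # 0: first block of the disk
print(chs_to_lba(0, 1, 1, H, S))   # 63: first block on the second track
print(lba_to_chs(1008, H, S))      # (1, 0, 1): first block of the second cylinder
print(2**16 * 512)                 # 33554432 bytes, the 32 MB ceiling
```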

That's how the situation has been handled for some time now.  But we have again lost something.  If you know how things actually are, you can use that information to improve performance.  It takes time to reposition the heads.  If you arrange things so that the block you want is one of those under the heads at their current position, you can get to it faster.  Now let's say that the block we want is coming up but is under a different head than the one that is currently selected.  We can select a different head at electronic speeds.  That's really fast.  So theoretically you can play tricks to achieve top performance if you know the details of the disk geometry and can depend on your knowledge being correct.  But it turned out to be really hard to actually get improved performance by playing these kinds of tricks.
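
Here is the kind of check such a trick depends on, as a Python sketch with hypothetical function names.  It is only as good as its assumption that the geometry the drive reports is the real one, which, as described above, drives stopped guaranteeing long ago.

```python
def cylinder_of(lba: int, heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Which cylinder a block lives on, assuming the reported geometry is the real one."""
    return lba // (heads_per_cylinder * sectors_per_track)


def needs_seek(current_lba: int, wanted_lba: int,
               heads_per_cylinder: int, sectors_per_track: int) -> bool:
    """False if the wanted block is already under one of the heads (only a fast,
    electronic head switch is needed); True if the arm has to move."""
    return (cylinder_of(current_lba, heads_per_cylinder, sectors_per_track)
            != cylinder_of(wanted_lba, heads_per_cylinder, sectors_per_track))


print(needs_seek(10, 900, 16, 63))   # False: same cylinder, different head
print(needs_seek(10, 1100, 16, 63))  # True: next cylinder
```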

And it turns out that there are other tricks that can be played under the modern rules.  It is cheap to put a little intelligence and some buffering capability into modern disk controllers.  In this environment controllers can play tricks.  An easy one is to copy the data to a buffer instead of immediately writing it to disk.  You tell the computer the write is done immediately.  Then at some convenient later time you actually write the data to disk.  If nothing goes wrong everything works as expected, only faster, and nobody is the wiser.
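
The write trick can be modeled in a few lines of Python.  This is a toy (a dictionary stands in for the platter, and the class name is invented), but it shows both the speedup and the catch: anything still waiting in the buffer when the power fails never makes it to disk, which is what "if nothing goes wrong" is covering.

```python
class WriteBackCache:
    """Toy model of a controller that acknowledges writes before they reach the disk."""

    def __init__(self, disk: dict[int, bytes]):
        self.disk = disk                     # block number -> contents actually on the platter
        self.pending: dict[int, bytes] = {}  # written by the computer, not yet on the platter

    def write(self, block: int, data: bytes) -> str:
        self.pending[block] = data
        return "done"  # reported immediately, before the data is on disk

    def read(self, block: int) -> bytes:
        # A pending write must win over the stale copy on the platter.
        if block in self.pending:
            return self.pending[block]
        return self.disk.get(block, b"")

    def flush(self) -> None:
        # "Some convenient later time": write everything out in block order.
        for block in sorted(self.pending):
            self.disk[block] = self.pending[block]
        self.pending.clear()


platter: dict[int, bytes] = {}
cache = WriteBackCache(platter)
print(cache.write(42, b"hello"), 42 in platter)  # done False
cache.flush()
print(platter[42])                               # b'hello'
```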

Another simple trick is called prospective read-ahead.  If a certain block is read, what's the most likely next block to be read?  The next one.  What's the easiest block to read next?  The next one.  So if the controller reads the current block and passes it along to the computer, but then also reads the next block into its buffer without being asked, what's the harm?  None, as long as the buffer is not full.  And the benefit is that the next block can be passed back to the computer immediately from the buffer if the controller in fact receives a read request for it.  It turns out that these simple tricks and a number of much more complicated ones can be implemented by modern disk controllers.  They result in an increased effective speed of the disk drive.  But you can either go with these tricks or the old CKD tricks.  It is somewhere between difficult and impossible to combine the two.
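
Read-ahead is similarly simple to model.  A Python sketch with an invented class name; a function stands in for the platter, and the counters show how many requests could be answered straight from the buffer.  A real controller does the extra read while the computer is busy with the previous block, which is where the time saving comes from; this model only counts the hits.

```python
from typing import Callable


class ReadAheadController:
    """Toy controller that, after each read, also reads the next block into its buffer."""

    def __init__(self, read_from_platter: Callable[[int], bytes], buffer_limit: int = 8):
        self.read_from_platter = read_from_platter
        self.buffer: dict[int, bytes] = {}
        self.buffer_limit = buffer_limit
        self.hits = 0    # requests answered straight from the buffer
        self.misses = 0  # requests that had to wait for the platter

    def read(self, block: int) -> bytes:
        if block in self.buffer:
            self.hits += 1
            data = self.buffer.pop(block)
        else:
            self.misses += 1
            data = self.read_from_platter(block)
        # Guess that the next block is wanted next, but only if there is room.
        nxt = block + 1
        if nxt not in self.buffer and len(self.buffer) < self.buffer_limit:
            self.buffer[nxt] = self.read_from_platter(nxt)
        return data


controller = ReadAheadController(lambda n: f"block {n}".encode())
for n in range(10):  # a program reading a file from front to back
    controller.read(n)
print(controller.hits, controller.misses)  # 9 1
```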

I have touched indirectly on the final topic I want to discuss.  It is the one I alluded to in the introduction.  As I indicated above, back in the day IBM picked 4096 bytes for their FBA block size.  So where did 512 bytes, the block size now in common use, come from?  It turns out that it came from Unix.  Unix has always used FBA architecture for its disks.  I don't know why Unix picked this number.  Here are a couple of theories.  Unix is the bastard stepchild of an operating system called Multics.  Multics was a joint development effort by the Massachusetts Institute of Technology, General Electric, and Bell Labs.  I know very little about Multics beyond that.  But it is possible that Unix took FBA and a 512-byte block from Multics.

The other theory I have has to do with the hardware Unix was originally developed on.  Unix was originally developed at Bell Labs, which had been a partner in the Multics project.  But the original version of Unix was developed on minicomputers manufactured by Digital Equipment Corporation.  GE still exists but has long since exited the computer business.  DEC was for a time an extremely successful computer company.  For a couple of years it was so successful that it had the biggest market capitalization (stock price times number of outstanding shares) of any computer company.  At that time it was bigger than IBM (at the time a very large company) and Microsoft (at the time a small company) and Apple (at the time a very small company).  But alas, DEC is no more, and I know only a little more about DEC than I know about Multics.

And it is always possible that the original developers of Unix picked 512 for some other reason.  In any case, Microsoft adopted a block size of 512 for DOS and carried that decision over to Windows.  And it has been a good choice for a very long time.  But time marches on, and with the march of time hard drive sizes keep growing.  Now anything less than 1 GB in a single hard drive is considered small.  To be considered large a hard drive now has to have a capacity of 1 TB or more.  And these multi-terabyte drives only cost a few hundred dollars.

A 1 TB hard drive has about a zillion 512-byte blocks on it.  That's pretty ridiculous.  And the solution is obvious.  Go to a bigger block size.  And that process is under way.  The industry has settled on a new size.  It is, not surprisingly, 4096 bytes or 4K.  Devices conforming to this standard are sometimes called "large-sector" drives; the industry's own name for the format is "Advanced Format".  And this move to large-sector drives is the thing I learned about as a side effect of my recent hard drive problems.  A 1 TB large-sector drive will only have 1/8 of a zillion blocks of data.  Don't you feel much better now?
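
For the record, here is the "zillion" worked out in Python, taking 1 TB as 10^12 bytes the way drive makers count it:

```python
TB = 10**12  # drive makers count a terabyte as 10**12 bytes


def block_count(capacity_bytes: int, block_bytes: int) -> int:
    return capacity_bytes // block_bytes


old = block_count(TB, 512)
new = block_count(TB, 4096)
print(f"{old:,} vs {new:,}")  # 1,953,125,000 vs 244,140,625
print(old // new)             # 8
```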

The transition to 4K blocks is, well, a transition.  But it is just the next one in a long sequence of transitions.  And I expect it to be a pretty smooth one.  Most modern software is written in "C" or one of its children, C++, Java, etc.  None of the standard I/O libraries for these languages try to look directly at the hardware in the way that software written in the old days for IBM mainframes did.  Instead they just expect to deal with a string of bytes.  They are totally indifferent to the fact that the continuous string of bytes they deal with might actually be handled in blocks inside some low-level device driver.  They can't see in there.  And given that, they definitely don't care that the block size might change from 512 bytes per block to 4096 bytes per block at some point.
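
The translation hidden inside the driver is a couple of lines of arithmetic.  This Python sketch (function names hypothetical) shows the same request for bytes landing in different blocks depending on the block size; the program asking for the bytes sees none of it.

```python
def locate(byte_offset: int, block_size: int) -> tuple[int, int]:
    """Which block holds a given byte of the stream, and where within that block."""
    return divmod(byte_offset, block_size)


def blocks_touched(start: int, length: int, block_size: int) -> list[int]:
    """Blocks a driver would have to read to deliver bytes start .. start+length-1."""
    first = start // block_size
    last = (start + length - 1) // block_size
    return list(range(first, last + 1))


# The program asks for the same 1000 bytes either way; only the driver's work differs.
print(blocks_touched(4000, 1000, 512))   # [7, 8, 9]
print(blocks_touched(4000, 1000, 4096))  # [0, 1]
print(locate(5000, 4096))                # (1, 904)
```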

There are two exceptions to the rule that software doesn't care about block size.  The first is the OS itself (Windows, Unix, iOS, etc.).  The second is the utilities whose job is to pay attention to what the hard drive actually looks like.  They will care that the data blocks on a large-sector hard drive are 4096 bytes in size rather than 512 bytes.  That's their job.  But most other utilities won't notice the change, as they don't concern themselves with low-level hard drive issues.

Changes will be required.  Microsoft has already incorporated these changes into the latest versions of its operating systems.  Most hard-drive-oriented utilities have also incorporated the necessary changes in the newer versions of their offerings.  And most other operating system vendors are either ahead of Microsoft or, at worst, close behind.

Even given all that, I would recommend that the average user stay away from these new devices for a while.  You generally don't need a disk drive big enough (say something over 10 GB) for the change to possibly make a difference.  So for the moment stick to disks that use the old format.  If you see a disk that says "4K" or "large-sector", pick a different model that doesn't.  I think this advice will hold through about 2017.  By that time all the necessary changes will have been rolled out and the bugs fixed.  Even then, the option of going with the new specification applies only to people purchasing new hardware that comes with the latest (or near-latest) version of the operating system.  If you are running an old OS, particularly one from before 2014, definitely stay away from the new hard drives.

And now you have it, the ammunition to bore to death anyone you meet at a cocktail party who needs boring to death.  I try to help where I can.
