Sunday, August 27, 2017

A Brief History of Computers

Trust me!  This is not going to be another one of those "and then so and so invented the such and such" kind of historical pieces.  But that won't stop me from starting by mentioning a name.

Paul Allen is the other founder of Microsoft.  He doesn't get as much ink as Bill Gates but that doesn't stop him from being interesting in his own right.  Allen suffers from serious long-term health issues, the kind that led him to conclude he was not destined to live a long and healthy life.  So after Microsoft got established and Allen found himself fabulously wealthy, he decided he'd better start playing while he still had the time.

So he played.  Most obviously, back in 1988 he bought the Portland Trail Blazers NBA basketball team.  He has an NBA regulation basketball court "out back" of his house in case his players want to shoot a few hoops when they stop by to visit.  And apparently one big time professional team was not enough, so he purchased the Seattle Seahawks NFL football team in 1996.  He made an unsuccessful run at the 31st America's Cup sailboat race in 2003.  But his investment in fun is not limited to big money competitive sports.

He built the ugliest building in Seattle.  It was designed to his specifications by the renowned architect Frank Gehry and reputedly cost $250 million to build.  The original name was the Experience Music Project and it was going to be a tribute to Jimi Hendrix, the seminal rock and roll guitarist.  But disputes with Hendrix's family scuttled that idea.  It has since morphed into something called the Museum of Pop Culture, or MoPOP for short.

If I detailed a complete list of all of Allen's interesting and varied endeavors I would never get to the point of this post.  So I am going to mention only one.  It is a smaller and even less well known museum than MoPOP.  It is the Living Computer Museum.  I'm a computer guy and I confess to having been unaware of it until recently.  But once I learned of its existence I had to check it out, so I did.  And that visit is the motivation behind this post.

I started this blog in October of 2010.  One of the first posts I published was called "Computers I have Known".  Here's the link:  http://sigma5.blogspot.com/2010/10/computers-i-have-known.html.  In it I mention several computers.  A close family relative to almost all of them can be found in the "Mainframe" section of The Living Computer Museum.  (BTW, it is called "living" because much of the computer equipment on display is in working condition and patrons can play with it.)  Here's the list (and here's where I start listing off such and such kind of computer in violation of my promise above, but it's a short violation).

The first computer I mention is the IBM 7094.  The Museum doesn't have one but it is looking for one.  The Museum was also a bust on the second machine I discussed, the Burroughs B5500.  They are looking for one of those too.  But the third computer I discussed is the CDC 6400.  They have one of its siblings, a 6500.  The fourth computer I talked about was the XDS (Xerox Data Systems, but before that the SDS - Scientific Data Systems) Sigma 5.  Again the Museum has a sibling, a Sigma 9.  I then moved on to the IBM 360/40.  The Museum has a sibling here too, namely a 360/30.

Finally I talk about a non-Mainframe computer, the IBM PC.  Here the Museum has a number of PCs as well as non-IBM clones made at around the same time the PC came out in 1981.  The Museum also has an interesting collection of minicomputers (I'm not going to go into what a "minicomputer" is) and a great selection of pre- and post- IBM PC home computers.  And, in the case of pre-, they have various makes and models on display stretching all the way back to the Apple-I and even before.  I strongly recommend a visit if you have any interest in computers and find yourself in Seattle.

So that's the preamble.  Now on to the main subject.  I promised you brief (or at least as close to brief as I can come) so here's the outline.  I am going to divide the history of computers into three periods:  the pre-computer period, the computer period, and the post-computer period.  That last part may seem weird but stick with me.  I promise it will make sense by the time I am done.

A case can be made for starting the history of pre-computers by going all the way back to the abacus.  After all it is a mechanical device for assisting in performing addition and subtraction.  And the digit-by-digit, carry-borrow techniques used with an abacus model how addition and subtraction are done by hand.  They also model how it is done by computers.

A similar case can be made for the slide rule.  This is a device people are no longer familiar with so let me give you the flavor of how it works.  The basic trick is to use a mathematical process called logarithms to convert multiplication and division problems into problems involving only addition and subtraction.  I'm not going to go into the details but scales on a slide rule translate numbers into distances.  By properly positioning the "slide" distances can be added or subtracted in such a way that the answer to a multiplication or division problem can be read off a scale.
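
If you are curious what the logarithm trick looks like written out, here is a tiny sketch (the function name and the sample numbers are just for illustration) of multiplying two numbers by adding their logarithms, which is exactly what positioning the slide does physically:

```python
import math

def slide_rule_multiply(a, b):
    """Multiply two positive numbers the way a slide rule does:
    convert each to a 'distance' (its logarithm), add the distances,
    then convert the combined distance back into a number."""
    distance_a = math.log10(a)   # position of a on the scale
    distance_b = math.log10(b)   # position of b on the scale
    return 10 ** (distance_a + distance_b)

print(slide_rule_multiply(3.0, 7.0))   # roughly 21.0
```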

Both the abacus and the slide rule are generally thought of by most people as being "not a computer" devices.  The first devices that were obviously computers to our modern eye were two machines made by Charles Babbage called the Difference Engine and the Analytical Engine.  But these devices had absolutely no influence on the development of the modern computer.  That's because Babbage never got either of them to work so they were generally forgotten about shortly after his death.  Their existence and principles were only rediscovered when modern historians started rooting around when computers became a thing.

So let's move on to things that are at least computer adjacent.  And let's start with something simple:  what's a "computer"?  The answer has always been simple but it has changed drastically over time.  There have always been problems that required massive amounts of computation to handle.  Typical examples are determining the airflow across a wing and determining if a bridge is strong enough.  So there has long been an interest in some way to perform massive amounts of computation quickly, accurately, and cheaply.  But for most of history there hasn't been much of a solution.  Various people at various times have become famous for performing prodigious amounts of computation.  I'm not going to bore you with their names but trust me, they were few and far between.

But by the 1930's mechanical devices could be bought that cost roughly a thousand dollars (then a lot of money but not an immense amount of money), that fit on a desktop (they were roughly the size of the proverbial breadbox), and were capable of adding, subtracting, multiplying, and dividing.  This caused technology companies to set up "computer rooms".  These consisted of rows of desks, each equipped with one of these machines and each staffed by a person called a "computer".

These computers (actually people) would be given sheets of numbers and would perform "chains" of calculations.   "For each row add the number in column 1 to the number in column 2 and put the result in column 3.  Then multiply the number in column 3 by the number in column 4 and put the result in column 5.  Finally, add up all the numbers in column 5 and put the result in the box in the bottom right corner of the page".  That sort of thing.  So at this time a "computer" was "a person, usually a woman, who performed large quantities of chain computations".  Now that we have figured out what a "computer" is during this period we can return to the subject at hand.
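
For the curious, here is roughly what that particular chain of calculations looks like written as a short program instead of handed to a room full of human computers (the sample numbers are made up):

```python
# Each row holds (column 1, column 2, column 4); columns 3 and 5 get computed.
rows = [
    (12.0, 7.5, 2.0),
    (3.2, 4.8, 10.0),
    (100.0, -25.0, 0.5),
]

grand_total = 0.0
for col1, col2, col4 in rows:
    col3 = col1 + col2      # "add column 1 to column 2, put the result in column 3"
    col5 = col3 * col4      # "multiply column 3 by column 4, put the result in column 5"
    grand_total += col5     # "add up all the numbers in column 5"

print(grand_total)          # the number that goes in the box at the bottom right
```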

Then World War II came along.  Various problems, most prominently the need to create "ballistics tables" for field artillery, made the need for large quantities of chained computations acute.  There was just too much "chain calculation" work that needed to be completed.  And all of it needed to be completed by yesterday, if not sooner.  The War was on and people were desperate for anything that looked like it could speed things up.

A number of devices were created just prior to the War or during its early days.  Two of note were the Harvard Mark I and the Colossus machines used at Bletchley Park for code breaking.  I can't find much technical information about either.  But as far as I can tell the Mark I was a super duper adding machine, and the Colossus machines (roughly a dozen were built) were custom devices used to assist in code breaking.  Neither these nor the other devices people sometimes mention seem to have had a significant influence on the design of what we now call a "computer".  As far as I can tell neither was capable of high volume chained calculation and that's what people were looking for.

By most tellings, including my telling in the post I referred to above, the first computer was the ENIAC.  I have changed my mind.  I am going to instead characterize the ENIAC as the last pre-computer.  Here's why.  What we now think of as a "computer" is what is more formally called a "stored program computer".  You programmed the ENIAC by physically plugging various "functional unit" components together using "patch" cables.  This connected up the various functional units in the proper way to perform the specific chain calculation you wanted to make.  Once everything was set up you could feed large batches of chained calculations into it, as long as you only wanted to do one particular chain of calculation.  If you wanted to execute a different series of chain calculations you broke all of the functional units apart and used "patch" cables to hook them together differently.

In spite of being the last pre-computer, the ENIAC is still fantastically important because all designs for what we now think of as computers are based on the ENIAC if you go back far enough.  With the advent of stored program computers the chain of computations could be changed by simply loading a different "program" into the computer, a matter of at most a few minutes.  It took hours to rewire the ENIAC to change how the chain of computations was interconnected.  Implementing the "stored program" capability required adding two functional units to the ones ENIAC had.

The first additional functional unit was bulk memory.  ENIAC could, by various trickery, be given access to a couple of numbers to be plugged into the calculation chain at a specific point.  And the output of one functional unit could be shunted to another functional unit, the practical equivalent of a tiny amount of memory.  But ENIAC couldn't keep say 100 or 1,000 numbers handy.  So various schemes were devised to provide a place where numbers could be stored then later retrieved.  Early versions were very limited but better than nothing.  One early scheme was the "Mercury delay line".  It consisted literally of a small hose filled with Mercury.

A speaker would beep into one end.  The beep would take a fixed amount of time to travel down the hose to the other end where a microphone was positioned.  A not very large (by modern standards) set of numbers could be saved and later retrieved by having them travel around in a loop.  This and other early schemes were all eventually supplanted by "Ferrite core" (often shorthanded as "core") memory.

A large number of tiny donut shaped pieces of magnetic material were employed.  Each donut was constructed so that it could be flipped between "North" and "South".  Clever logic allowed each donut to be independently flipped, say to "North" to represent a binary "0", or to "South" to represent a binary "1".  This was a big improvement over previous technologies but it was eventually replaced, for good reason, by memory based on integrated circuits, what we now call RAM.

So in the early days various schemes for implementing some amount of memory were used with more or less success.  So there was now a place to put the "program".  The second necessary component was a functional unit that could take a small chunk of memory, often referred to as a "word", decode it, and use it to set up, on the fly, the proper connections between the other functional units.  The addition of these two functional units, a memory unit and an "instruction decode" unit, resulted in the first "computer" in the modern sense of the word, "a machine that can perform large amounts of chain computations".
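
Here is a deliberately tiny sketch of the stored program idea: one memory holds both the program and the data, and a decode loop fetches a word at a time and routes the work to the right functional unit.  The three-instruction machine below is entirely made up for illustration; real instruction sets were, and are, far richer.

```python
# A toy stored-program machine: memory holds both instructions and data.
# Each instruction is an (opcode, address) pair standing in for a real binary word.
memory = {
    0: ("LOAD", 100),    # accumulator = memory[100]
    1: ("ADD", 101),     # accumulator += memory[101]
    2: ("STORE", 102),   # memory[102] = accumulator
    3: ("HALT", 0),
    100: 7, 101: 35, 102: 0,
}

accumulator = 0
program_counter = 0
while True:
    opcode, address = memory[program_counter]   # fetch the next word
    program_counter += 1
    if opcode == "LOAD":                        # decode and dispatch
        accumulator = memory[address]
    elif opcode == "ADD":
        accumulator += memory[address]
    elif opcode == "STORE":
        memory[address] = accumulator
    elif opcode == "HALT":
        break

print(memory[102])   # 42 -- change the words in memory and the machine does something else
```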

And the first people to do this, to build a computer that included memory and an instruction decode unit, were the British.  In 1948 they got something called the "Manchester SSEM" machine up and running.  So the computer age actually starts with this machine.  It was quickly followed by many other machines.  I want to single out only one other machine from this period, the EDVAC.  It was a fully functional stored program computer.  It is important because it was developed in close collaboration with Eckert and Mauchly, the people behind ENIAC, and because the detailed design of EDVAC was made freely available to the public.  So all subsequent computers, if you follow things back far enough, trace their designs back to the EDVAC.  And the EDVAC traces its design back to the ENIAC.

Things moved at a fast and furious pace from here.  But from here on changes are primarily evolutionary rather than revolutionary.  The change from vacuum tubes to solid state (originally transistors, later integrated circuits) was big in terms of better performance but it too was an evolutionary change.  In fact the IBM 7094 I mentioned above was just a "plus" version of the 7090.  And the 7090 was literally a plug compatible transistorized version of the IBM 709, a vacuum tube machine.

There is only one change that came out of this period that I think is worth mentioning.  And that is the move to the use of microcode.

In the computer business the actual physical device is shorthanded as "hardware" and the programs we run on the hardware are shorthanded as "software".  In the early days when these terms first came into common usage they made a lot of sense.  As I pointed out with the ENIAC you had to literally rewire the functional units together a different way in order to change the specifics of the chain of computations you wanted to perform, the program, if you will.  ENIAC was a big improvement in that at least you didn't have to design and build new functional units but still.  And with the advent of stored program computers the program became even easier to change.  So it made sense to refer to the physical part as hardware because it was hard to change.  And it made sense to refer to the programs as software because they were soft, i.e. easy to change.

But that was then.  And this seemingly sensible concept that hardware is hard to change and software is easy to change has turned out to be not so sensible at all.  In a real sense the opposite is now true and has been for a long time.

Computer programmers depend on the fact that a given type of computer behaves in a certain precisely defined way.  Many details of a program are critically dependent on this being true.  And as computers got faster, cheaper, and more ubiquitous, a whole lot of computer programs got written.  It soon became common for many programs to be written for each computer.  If you wanted to change some of the details of how that computer worked changing any one program to work under the new rules was relatively quick, easy, and inexpensive.

But it was very hard to change them all, and all at the same time.  So as the number of programs written for a specific machine got larger and larger it soon imposed a very rigid discipline.  The hardware had to keep behaving in exactly the same way.  So in effect the software got hard.  It became harder and harder to make a mass change necessitated by some change in hardware behavior.  And if you wanted to build a new computer that would run all those programs correctly it had better behave exactly like the current one.  Collectively, this set of rules defining what the software expected of the hardware came to be called "computer architecture".

So at this point it looks like software has gotten hard and hardware has gotten even harder.  But that's where microcode comes in.

At the same time people started becoming aware of the whole "computer architecture" thing, elaborate "instruction sets" came to be developed.  This meant that the functional unit that decoded the instructions and then set the various functional units up to do the actual work became quite complicated.  At some point someone (exactly who that someone was is lost to the mists of time and besides several people probably hit on the same concept at roughly the same time) said "why don't we treat this whole 'instruction decode' business as a special kind of programming problem.  We'll set up the decode unit with its own instruction set that is custom designed for the task at hand and write a program to sort things out.  That way we can keep the complexity of the hardware manageable because we can now break the 'decode and hand off' process down into a series of steps, each of which is relatively simple."

This sounds like a dumb idea.  It adds another layer to everything and that should slow things down.  But it turned out the extra layer made hardware design much simpler, and clever tricks were found to keep things speedy in spite of the apparent additional complexity.  The hardware part of the computer could now be built up of relatively simple functional units.  And that made it easier to make the functional units blazingly fast.  Programming a "computer within the computer" to handle all the quirks and peculiarities of the instruction set was hard.  But it was easier than designing the incredibly complex hardware that would otherwise have been necessary.
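
To give the flavor of the idea (and only the flavor; this is not how any real machine's microcode reads), here is a sketch where the architecture promises a MULTIPLY instruction but the underlying hardware can only add and shift, so a little microcode routine bridges the gap:

```python
# Pretend the hardware's functional units can only do these two simple things.
def hw_add(a, b):
    return a + b

def hw_shift_left(a):
    return a * 2      # shifting left one bit doubles a binary number

# "Microcode": implement the architecture's MULTIPLY instruction as a loop of
# simple micro-operations (shift-and-add), one small step at a time.
def microcoded_multiply(a, b):
    result = 0
    while b > 0:
        if b & 1:                     # is the low bit of b set?
            result = hw_add(result, a)
        a = hw_shift_left(a)          # a doubles on each pass
        b >>= 1                       # move on to the next bit of b
    return result

print(microcoded_multiply(12, 11))    # 132 -- the program above never sees the micro-steps
```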

Microcode was so successful that it is how all computers have actually worked for more than a half century.  An early success for this idea was the IBM/360 line of computers introduced in 1964.  Models ranged from the very basic and cheap 360/20 to the "supercomputer of its day", the 360/90.  From a purely hardware standpoint none of them was actually an "IBM 360".  They were all very different sets of hardware that, with the help of microcode customized for each model, could successfully emulate an "IBM/360" computer.  As a result a computer program designed to run on an "IBM/360 architecture computer" would work properly no matter which of the 8 quite different IBM/360 models you picked to run it on.

And this was in spite of the fact that the hardware setup for each model differed drastically.  Your program might run a whole lot faster on a 360/80 than it did on a 360/40, but as long as execution speed was not important it would work the same on either machine.  In fact it would run on later machines that didn't even exist when the program was written.  All that was necessary was that the machine conformed to the "IBM/360" architecture specification.

The capability of the native hardware on the smaller and less expensive models was modest.  But on those models the microcode would grind away on chunk after chunk after chunk of a particular instruction until it had the answer demanded by the architecture specification.  It may have taken hundreds or thousands of microcode instructions but in the end you had what you expected.  The faster, more complex, more expensive models had much more capable hardware, so it may have taken only a few microcode instructions to produce the same answer.  Since the hardware was faster in the first place and since far fewer microcode instructions had to be executed, from the outside the computer looked a lot faster.  But the point was both machines produced the same result.  If you had the money and the inclination you could get fast but the cost was high.  If you lacked one or the other you could get "it works".  It was just a lot slower.  But it also cost a lot less money.

This ability of microcode to permit the same architecture to be implemented at multiple cost and speed points was recognized right away.  But the other great advantage of microcode took longer to be appreciated.  The fact that microcode could allow quite different sets of hardware to be used to implement the same "architecture" freed hardware designers from the prison software had put them into.  It allowed them to make drastic changes.  IBM has since put out many "IBM 360 compatible" machines.

The hardware they are built from has changed radically in ways unimaginable to early computer designers, builders, and users.  But this idea of using microcode to implement an architecture on wildly different hardware platforms has been a key to the evolution in hardware we have seen over the decades.  And it has freed hardware people to make radical changes under the covers.  So hardware designs have turned out to be quite soft.  You can bend them and twist them in unbelievable ways and as long as the microcode implements the architecture correctly it's all good.

So for a long time we have been living in a world where hardware has become soft (changeable).  Change the hardware all you want.  As long as you come up with microcode that hews to the rules of the architecture everything will work fine when you are done.  At the same time we have long been living in a world where software has become hard.  Businesses still run software created in the '80s and '90s to run on "IBM 360 compatible" hardware even though such hardware has not existed in any practical way for a couple of decades now.  But microcode and "emulation" tricks allow that very same software to work just fine on hardware that looks nothing like the equipment IBM was selling back then.  And don't even get me started on the radical evolution of "Intel x86" hardware over the last few decades.

Let me wrap this section up by taking note of the movie "Hidden Figures".  It very nicely dramatizes the change in the definition of the word "computer".  In the early part of the movie we are introduced to a number of the members of a "computer department" at NASA.  These ladies' job consists of performing the very "chain calculations" in a "computer room" that I started this section off with.  But by the end of the movie these very same women are no longer "computers".  They are now "computer programmers" writing "programs" to run on a "computer", a machine that performs chains of mathematical calculations.  The movie even includes shots of an "IBM 7090" computer, one of the computers I discussed above.  Moving on . . .

I am going to arbitrarily date the start of the post-computer age to 1981, the year the IBM PC was introduced.  A case can be made for moving the date to a few years earlier or pegging it to a different event or moving it to a few years later.  I'm not going to argue the point because frankly my "line in the sand" is arbitrary.  There was no sharp dividing line.  The far more relevant question, however, is:  what the heck am I talking about?  I own something called a "computer" and I just bought it recently, you say.  So why aren't we still in the computer age?  That seems like a reasonable perspective because it is a reasonable perspective.  It's just not the perspective I am applying.  So let's get into the perspective I am applying.

I talked about the later days of the pre-computer age and the early days of the computer age in terms of chained computations.  And my point is that they were just that, computations.  Early computers were devoted to doing computations.  You put numbers in and you got numbers out.  And that was the point, numbers.  This was definitely true of what I call the computer age.  For instance in the early part of the computer age computer makers saw the market for computers as consisting of two segments, business computing and scientific/engineering computing.  It was so hard to build computers in the early days that manufacturers designed different machines for each segment.

Two kinds of businesses that were early adopters of business computers were banks and insurance companies.  With banks, for instance, the books had to balance to the penny.  Numbers were known with complete precision (the amount of the check is exactly $134.26).  Everybody's checking account was supposed to "balance" (be accurate to the penny).  The bank's books as a whole were supposed to "balance".  So banks needed computers that could hold numbers as exact values of moderate size.  Back in the day the bank I worked for needed to be able to handle numbers as large as a billion dollars and an interest rate that was accurate to 3 decimal places (8.683%).

But they didn't need to be able to handle truly giant numbers and they didn't need to be able to handle really tiny numbers.  They needed what I will call "abacus math".  Numbers were stored as exact quantities and arithmetic was performed on numbers consisting of digits.  Interestingly enough, the ENIAC did arithmetic with digits.
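
Here is a sketch of what I mean by "abacus math", using Python's decimal module to stand in for exact digit-by-digit arithmetic.  The balance and rate are made up, and the last line shows why binary floating point is the wrong tool for a bank's books:

```python
from decimal import Decimal, ROUND_HALF_UP

balance = Decimal("134.26")     # an exact dollars-and-cents amount
rate = Decimal("0.08683")       # an interest rate accurate to 3 decimal places (8.683%)

# Compute the interest exactly, then round it to a whole number of pennies.
interest = (balance * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(balance + interest)       # 145.92 -- exact to the penny, every time

# Binary floating point cannot even represent most decimal fractions exactly:
print(0.1 + 0.2)                # 0.30000000000000004 -- not something a bank can tolerate
```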

Now consider the dimensions of your house.  Do you know exactly how big your house is?  Of course not.  My house has siding on it.  If you pick different places you can measure the length of my house and get an answer that differs by two inches or more.  And even if we agree on exactly where and how this or that dimension is to be measured, how accurate is the measurement?  Plus or minus a quarter of an inch?  Plus or minus a sixteenth?  Whatever it is, you do not know exactly how big your house is.  And that sort of thinking extends generally to engineering and scientific problems.

You know what the value of some number is only to within a certain precision, often stated as a number of "significant digits".  The precision with which you know the value of various numbers varies with circumstances.  Sometimes you only have a rough idea of the actual value ("it's about ten miles to the store").  Other times the actual value is known with great precision.  The values of some physical constants are known to more than ten significant digits.

On the other hand sometimes it is important to handle very big numbers (i.e. the size of the Milky Way Galaxy) or very small ones (the diameter of a Proton).  The way that this is done is to handle the "value" of a number as one thing and where the decimal place goes as another thing.  Many of you may be familiar from a Chemistry class with Avogadro's number.  It is usually stated as 6.02 times 10 to the 23rd power.  This means we know its value to 3 significant digits.  But the decimal point goes way over there.  This is an example of "slide rule math".  And someone who is good with a slide rule can usually get results that are accurate to three significant digits.  But the slide rule does not tell you where the decimal point goes.  You need to figure that out another way.

In the real world separating a number into a "value" part and a "where the decimal point goes" part is called Scientific Notation.  On computers it is known as Floating Point.  And interestingly enough, the EDVAC was set up to do things the engineering/scientific way.  And whether a computer could handle business computations or engineering computations was important to early computer customers.  Because it was all about the numbers at that time.  And the time when it was all about the numbers is the time I'm calling the computer age.
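
If you want to see the "value" part and the "where the point goes" part pulled apart, here is a small sketch.  It uses Python's math.frexp, which splits a floating point number into exactly those two pieces, except with a power of two instead of a power of ten because the hardware works in binary:

```python
import math

avogadro = 6.02e23          # 6.02 times 10 to the 23rd: three significant digits, huge magnitude
proton_diameter = 1.7e-15   # in meters: modest precision, tiny magnitude

for number in (avogadro, proton_diameter):
    mantissa, exponent = math.frexp(number)   # number == mantissa * 2**exponent
    print(f"{number:e}  =  {mantissa} x 2**{exponent}")
```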

But think of how computers are now used by most people most of the time.  The "numbers" application for the present age is the Excel spread sheet.  Excel is all about numbers.  And I am a numbers guy so I use Excel.  But I only use it a little here and a little there.  Mostly I do other things.  And the fact that most of what we now use computers for is "other things" is why I call the modern age the post-computer age.  It is no longer about the numbers.

And I arbitrarily date this change to using computers mostly for non-number things to when the computer invaded the home.  That's an entirely arbitrary decision on my part.  A case can be made that the Apple II and the TRS-80 got there first.  And a case can be made that widespread home use of computers didn't happen until a few years after the PC came out.  There is no sharp line in the sand that we can use to mark the transition.  So since I am writing this post I get to pick.

Consider what I am using my computer for right now.  I am doing word processing.  Sure, under the covers all the letters and stuff are converted into numbers and then manipulated via arithmetic and such.  But in roughly 1981 the balance tilted.  People have very little use for spread sheets in their everyday lives.  They are, however, heavily invested in texting (a crude form of word processing), scanning the web (arguably a very disguised form of database manipulation), or dealing with audio and video (word processing for sounds or pictures).  All of this may involve manipulating numbers under the covers but it is mostly not about the numbers.  The numbers are just a means to a non-numerical end.

And there is a lot of number crunching still going on.  I write this as a hurricane is pasting Texas.  And the weather models that are used to predict what's going to happen are an extreme form of number crunching that was unimaginable during the computer age.  But there are literally hundreds of millions of computers that are dedicated to non-numeric uses like texting, scanning the Internet, and watching videos.  That pattern of usage completely overwhelms the computers that are primarily used for number crunching.  And that represents a change as profound as the transition from the pre-computer age to the computer age.

So okay I did fudge and throw in the names of some people and some computers.  And I did talk about some technical stuff.  What can I say?  Quoting Jessica Rabbit from "Who Framed Roger Rabbit", "I'm just drawn that way".  But I do think I have delivered a very unconventional brief history of computers.  Yes I do.

Saturday, August 12, 2017

Principia - Part 2

In part 1 (see http://sigma5.blogspot.com/2017/08/principia-part-1.html) I talked around Principia, Newton's foundational book about Celestial Mechanics and many other things.  In this post I am going to talk about what's in it.  But first a digression.  I am going to briefly discuss Analytic Geometry and Calculus.  DON'T PANIC!  The discussion will be almost entirely math free.  I just want to introduce some ideas that give you a basic feel for what each subject is about.

Starting with Analytic Geometry, imagine a piece of graph paper.  This is the common kind where it is just a piece of paper full of square boxes.  Now let's number each column along the top and each row down the side.  Now pick any box.  If we sight down the column of boxes containing our special box we can see the column number our box is in.  If we sight across the page along the row of boxes we can see the row number our box is in.  If we list the row number and column number it uniquely identifies our special box.

Now imagine a flat piece of paper.  This time the paper starts out blank so there are no boxes and no row numbers and no column numbers.  But we can do a more sophisticated version of the same thing.  We draw a horizontal straight line across the page.  Then we draw a straight line that is perpendicular to the original line down the page.  It will cross our original line at a point called the Origin.  Then we use a ruler to mark a distance scale off on each of these lines.  We can now uniquely identify any single point on our piece of paper by citing its "coordinates".  By convention the horizontal line is the "X axis" and positive distances go off to the right.  And by convention the vertical line is the "Y axis" and positive distances go up toward the top.  And assume the piece of paper is as big as we need it to be.

Now pick a point, any point, as long as it is not on either of our axes.  Now draw a straight line that is parallel to the X axis through our special point.  It will strike the Y axis at some point.  Draw a second straight line through our special point parallel to the Y axis.  It will strike the X axis at some point.  Read the distance along the X axis to the point of intersection and turn it into a number by using our scale.  Do the same thing with the Y axis.  This yields two numbers, conventionally recorded as say (3.74,-8.23), the coordinates of the point.  These coordinates uniquely identify the location of the point.  And this process can be used to determine the coordinates of any point.

Now we can turn things into algebra.  Instead of talking about the point (3.74, -8.23) we can talk about a point "at (x,y)".  And we can write algebraic equations involving x and y.  By convention, w, x, y, and z represent "variables", numbers whose values we may or may not know, and a, b, c, and d represent "constants", numbers we at least in theory know the value of.  And we may allow the value of a variable to vary.  Okay, on we go.

Consider a very simple equation:  x = 0.  In Analytic Geometry we play around with equations like this.  They are always of the form (something) = (something else).  We ask the question "what are all the possible values of "y" that are consistent with our equation "x = 0" being true?".  Technically we ask "what is the locus of solutions for the equation 'x = 0'?"  A locus is just a bunch of points for which the equation in question is true.  And we can graph this particular locus by marking all the points in the locus on the paper.  It turns out to just be the Y axis.  Similarly the X axis is the locus of solutions to the equation "y = 0".  And that's the fundamental idea behind Analytic Geometry.

We can turn diagrams, the stuff Geometry has been about, into equations.  The reason we want to do this is there are a whole bunch of tricks from Algebra that we can now apply to solving problems.  Now consider the equation "x squared + y squared = 1".  It turns out if we graph the locus of solutions to this equation we get a circle around the Origin with a radius of 1.  In fact the graph of "x squared + y squared = r squared" gives us a circle with a radius of "r" (assumed to be a constant in this discussion).
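
Written in modern notation instead of spelled out in words, those two equations are (with r being a constant, the radius):

```latex
x^2 + y^2 = 1 \qquad \text{and, more generally,} \qquad x^2 + y^2 = r^2
```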

And it turns out that there is an algebraic formulation for an ellipse, a parabola, etc.  And I have given the equation for a circle centered at the Origin.  There is a more complicated equation for a circle whose center is located elsewhere.  This is also true for an ellipse, etc.  But that's making things more complicated than I want to get into so I'm going to skip all that.  Because that's all I am going to say about Analytic Geometry.  See, that wasn't so hard.  On to Calculus.

Consider the equation "y = x".  If we graph its locus we get a diagonal line through the Origin going up and to the right (and also down and to the left) at a 45 degree angle.  Now let's say we want to calculate the area "under the curve" (and a straight line is a kind of curve) for "y = x" where x goes from 0 to 100.  It turns out that there is a simple way to do this that the Greeks figured out about 3,000 years ago.  But we are going to ignore that.  (We are going to do a lot of ignoring for the rest of this discussion.)  Let's divide things up into columns, one column for each inch (I am going to assume our scale is in inches to make the explanation easier to follow).  Consider the first column, the one going from x=0 to x=1.  It turns out we end up with a cute little triangle that is 0 inches high on the left (x = 0) and 1 inch high on the right (x = 1).  Let's set this triangle aside for the moment.

Now consider the next column.  It is 1 inch high on the left side and 2 inches high on the right.  This can be subdivided into a nice 1 inch by 1 inch square and another of those pesky triangles.  Let's set the triangle part aside.  We can now put the square into a bucket called "part of the solution".  Now we move on to the next column.  Using the same process we end up with a third pesky triangle and a rectangle that is two inches high and one inch wide.  Its area is obviously two square inches.  So let's add that two square inches into our "part of the solution" bucket and move on.

We keep doing this.  We end up with 100 pesky triangles and a bunch of rectangles.  What do all the rectangles add up to?  Let's just assume we have a method for figuring this out and ignore both what the method is and what the answer is.  Now consider our pesky little triangles.  The total area under the curve is what those rectangles add up to plus what those triangles add up to.  The triangles obviously add up to something, so if we preliminarily take the answer to be just the sum of the rectangles then we know this answer is wrong.  It is low by just the amount that the triangles add up to.  All this seems needlessly complicated.  And for the toy problem we are considering it is.  But all will soon become clear.

What we have is a tentative answer (the sum of the rectangles) plus some amount of error (whatever the triangles add up to).  How much error?  Well, let's turn those pesky triangles into squares that completely contain the triangles.  We can add these squares together easily because there are 100 of them and each of them is 1 inch by 1 inch.  So we know that our tentative answer is within 100 square inches of the correct one.  It looks like at this point we really haven't gotten anywhere but appearances are deceiving.

Let's go through the same process again.  But instead of using 1 inch columns let's use half inch columns.  We are going to end up with a bunch of rectangles that are a half an inch wide and some number of inches high.  That should make it harder to figure out how much they add up to.  But let's assume there is some procedure for figuring this out and ignore it.  Instead let's focus on the error.  And let's again do the same thing where we turn the triangles into squares.  Now we have 200 of them.  That sounds bad.  But each of them is half an inch on a side so each of these new squares has an area of only a quarter of a square inch.  Our maximum error has gone from 100 square inches to 50 square inches (200 squares times a quarter of a square inch per).

Now let's cut the width in half again so it is a quarter inch.  The result is that our maximum error is again cut in half to 25 square inches.  Okay we are now where we need to be.  Let's keep halving the width over and over and over.  Every time we do we cut our maximum error in half.  We can keep doing this almost forever.  If we do it forever we end up with a width of zero and we run into "divide by zero" problems.  So instead let's keep doing it almost forever.  The width keeps getting smaller and smaller but we assume it never quite makes it to zero.  This "as small as we want it but never exactly zero" is called an "infinitesimal".  And Newton used the word "fluxion", a word that obviously did not catch on, for what we now call an infinitesimal.

And this business of slicing things up more and more until you get to an infinitesimal is called "taking the limit".  If we can drive the maximum error below any arbitrary number no matter how small then we can use our technique for adding together all those rectangles to get an answer that is for all intents and purposes exactly the correct answer.  And that's Integral Calculus.
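
Here is a small numerical sketch of that argument for the area under "y = x" from 0 to 100 (the exact answer is 5,000 square inches).  It adds up the rectangles for narrower and narrower columns and also computes the worst-case error bound (one little square per column), so you can watch the bound get cut in half each time the width is halved.  The function name is mine:

```python
def rectangles_under_y_equals_x(width):
    """Sum the 'square part' of each column under y = x from x = 0 to x = 100,
    i.e. a rectangle whose height is the curve's value at the column's left edge."""
    total, x = 0.0, 0.0
    while x < 100:
        total += x * width      # one rectangle: height x, width `width`
        x += width
    return total

for width in (1.0, 0.5, 0.25, 0.125):
    estimate = rectangles_under_y_equals_x(width)
    worst_case_error = (100 / width) * width * width   # one little square per column
    print(f"width {width}: estimate {estimate}, worst-case error {worst_case_error}")
# The estimate creeps toward 5000.0 and the error bound halves every time the width is halved.
```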

Differential Calculus works along the same lines.  You find an "approximate" method for calculating the tangent at each point on the curve.  We again have an unspecified way (in the sense that I am not going to specify it) of coming as close as we want to the actual value of the tangent, in a manner similar to adding up all the rectangular columns.  Then you slice things finer and finer until you are down to infinitesimals (but not zero).  If there is such a method then that's the method for "differentiating" whatever equation you started with.

There is lots of fine print, ways for this to go wrong, but mathematicians have figured out the characteristics necessary to guarantee that an equation can be integrated or differentiated.  And now I'll let you in on the other big Calculus secret.  It's all a bag of tricks.  If an equation has this certain form then this trick works.  If it has this other certain form then this other trick works.  There are lots of tricks for handling equations that have lots of forms.  And this allows Calculus to solve for areas or slopes of all kinds of curves.  And both Calculus and Analytic Geometry are easily extended to handle more than two dimensions (i.e. not just "X" and "Y" but also "Z" and maybe "W" and . . .).  But I am going to save you from all that by not talking about it.

This business of limits and infinitesimals preceded Newton.  But what Newton brought to the table was a bunch of new tricks that allowed this limits/infinitesimals business to be used to solve much more difficult problems than his predecessors had cracked.  He needed his Calculus, his bag of tricks, to be able to perform the calculations necessary to reach the solutions he needed.  He was a very smart man.  And scattered all through Principia are tricks for handling this, that, or the other kind of equation.  So that's one of the things that is in Principia.  But this whole Calculus business was a means to an end for Newton.  So what were some of the ends?

Well, here's another digression.  But this time it's not my digression.  It's Newton's.  The first two sections of Principia (it is divided into three main sections) were strictly mathematical.  He says in effect "let's ignore the real world for a while and for the moment just assume that a certain mathematical formulation is how the world works and use that to figure out what things would look like".

So he assumed that things worked like he thought gravity worked.  He then went on to show that planetary orbits would be ellipses (or in some cases circles or parabolas).  But then he said that things would work this other way if a different mathematical formulation was assumed to be how the world worked.  That gave him the mathematical foundation to explore alternatives to his description of how Celestial Mechanics worked. And he was very thorough.  He explored mathematically a number of different models.  He concluded in section 3 that his formulation matched how gravity worked in the real world and that the alternatives did not.

And there is a reason it is called "Newton's theory of universal gravitation".  In Newton's time conventional wisdom held that there was one set of rules for earth and things near the ground (birds, mountain tops) and a different set of rules for the heavens (the sun, moon, planets, stars, etc.)  Newton said "nope".  Gravity works the same everywhere.  He calculated what the force of gravity should be between the earth and the moon.  He showed that the same gravitational force we feel on the ground, diminished with distance exactly as he described, was just right to keep the moon circling the earth.  He did the same thing for the sun and the planets.  Then he did the same thing for the moons of Jupiter and Saturn and for comets.  That completely demolished the idea that there were different laws for near earth and up in the heavens.

But he went further, much further.  He calculated how much people would weigh if they traveled from the surface of the earth to the center of the earth.  Scientists had speculated that the earth was not a perfect sphere.  Newton calculated the amount the earth would be distorted as a result of the fact that it rotated once per day.  The technology of the day was not accurate enough to confirm this.  But the technology of the day was good enough to show that gravity was slightly less at the top of a mountain than it was at sea level.  Newton calculated exactly how much.

In Celestial Mechanics there is something called the "two body" problem.  What is the path of two bodies orbiting each other if only the force of gravity between the two is important?  Newton solved that.  And he tackled the "three body" problem.  Specifically he looked at a system consisting of the earth, moon, and sun.  He was able to calculate the "perturbations" caused by the effect of the sun on the orbits of the earth and the moon around each other.

As a corollary to this investigation he was able to explain the tides.  They are semi-periodic.  He showed that the moon's gravitational pull on different parts of the earth caused a "lunar" tide (the larger) and the sun's gravitational effect on different parts of the earth caused a "solar" tide (smaller but significant).  They would move in phase and out of phase resulting in the complex pattern of high tides and low tides we see.  This was the first successful theory of tides.

He also tackled the mutual perturbations of Jupiter on Saturn's orbit and Saturn on Jupiter's orbit.  Before Newton no one had even tried to do that.

He investigated the motions of pendulums.  It turns out that pendulums are a great way to accurately measure the force of gravity.  They had been used, for instance, to investigate the force of gravity on the tops of mountains (see above).  He also put forward mathematical formulations of drag, how bodies moved through fluids like air and water.  These were pioneering studies.  They were not completely successful (we now know that the situation is quite complicated).  But he was able to show by his theoretical work and also by some experimental work that a number of then current theories were wrong.

And, of course, there was the business of winning arguments.  As I mentioned in the previous post, Descartes had put forward a "vortex" theory of gravity.  Newton's work on the orbits of Jupiter, Saturn, and especially comets, knocked the theory to flinders.  But Newton was nothing if not thorough.  So he did a mathematical treatment of vortexes (that was one of the mathematical alternatives he explored) that showed that vortexes did not work for mathematical reasons.  But that was still not enough.  Destroying the vortex theory was one of the reasons he explored drag.  For vortexes to work there had to be some kind of fluid.  That is how the vortex associated with one body influenced the path of another body.  But his mathematical study of drag indicated that there was no way to get vortexes to create the right amount of drag to exert the right amount of force in the right place to make each body move in the way it was observed to move.

Then there was the "epicycle" theory, the traditional method of dealing with the orbits of planets.  Originally the idea was that the planets moved in perfect circles.  But that didn't work.  So the idea was you had a primary circle.  Then there was a secondary circle.  The planet was attached to the secondary circle which was in turn attached to the primary circle.  That didn't quite work either so eventually there were models involving large numbers of "epicycles" (these secondary circles) all connected together in a complex way.  Newton developed mathematically a celestial mechanics of epicycles.  This was yet another of the alternative mathematical systems he explored.  Then he showed that the planets just didn't move according to his epicycle mechanics system, no matter how you arranged the epicycles.

Finally let me finish up with some more general observations.  Principia is very hard to read.  That's why I skimmed most of it just deeply enough to see what was being discussed.  Besides the reasons I have already discussed there is the "terminology" problem.  A lot of what Newton was dealing with was completely new.  So he invented terminology that got replaced with other terminology in the intervening centuries.  For instance what he called fluxions we now call infinitesimals.  That's pretty straightforward as our modern term means the same thing as his term.

But Newton introduced the idea of mass.  Mass is an inherent property of matter.  It is the degree to which matter resists being accelerated by a given force.  Newton never did just introduce a word like "fluxion" for this concept.  So he ends up talking around what he is getting at in a way that is confusing.  To a greater or lesser extent the same is true with "inertia", "impulse", "momentum", "energy", "work", and others.

These are terms that now have well established definitions and usages.  But in some cases Newton was dealing with the concept for the first time because he had invented it.  In other cases he had an unclear or incomplete idea of what eventually became the modern concept.  This made him understandably sloppy with usage.  In some cases he didn't even know that the concept might exist.  This makes it hard to follow a lot of his arguments.  You have to go through a process of translating what Newton says into what he means.  Then you must make allowances for the fact that he did not have a clear idea of how the concept worked.  If Newton had understood these concepts the way we now do he could have laid out his logic much more quickly, more easily, and in a more understandable way.

I ended my previous post with a note on Newton and theology.  If you take the whole of his life into account theology was more important and played a bigger part than his scientific work.  The same is true of his great rival Descartes.  The scientific work we now remember Descartes for was also done in his youth.  But he too spent a much greater part of his life on theological issues.  I can't speak to Descartes's theological ideas.  But I can speak to Newton's.  He spent a little time discussing them at the very end of Principia.  But he also went into them at some length in Optics.

Newton believed there were two kinds of truth.  There was what we now call scientific truth, what was then called natural philosophy.  Then there was religious truth.  Newton's belief was that these two kinds of truth were not antagonistic but complementary.  Combined appropriately, they resulted in a kind of super-truth that was more powerful than either of them standing on its own.

That was the main theological problem he spent his time on.  He could see that "old time religion" theology just did not work.  So he tried to tweak mainstream theology to produce something that retained the important parts of mainstream religious thinking but resulted in something that was compatible with science.  I don't know to what extent he thought he succeeded.  What I do know is that Newton's scientific work survives and is hugely influential.  But his religious ideas have vanished without a trace.  And one reason I know nothing about Descartes's religious thinking even though I am familiar with his scientific work is because his work on theology vanished without a trace soon after his death too.


Friday, August 4, 2017

Principia - part 1

This post can be seen as a continuation of my recent "Ground truthing" post (see http://sigma5.blogspot.com/2017/06/ground-truthing.html).  In that post I mentioned several foundational documents from the history of Science.  Philosophiae Naturalis Principia Mathematica by Sir Isaac Newton is perhaps the most important one.

James Burke, a BBC TV host, science popularizer, and creator of the fantastic "Connections" TV series, said this about it:
The Principia provided such an all-embracing cosmological system that it stunned science into virtual inactivity for nearly a century.
It also kicked my butt, completely and utterly.  In "Ground truthing" I talked about my experience with Galileo's proof that the path of an artillery ball is a parabola.  I was forced to confess that "it turned out to be hard.  I never did really figure it all out".  Principia is literally proof after proof after proof, all of them as hard to handle as Galileo's proof or, in some cases, harder.  And there are hundreds of them.  One reason for this is that Newton's proofs are in the same geometric style as the one Galileo used.  But on top of that Newton added in other elements that make his proofs hard to follow, elements like a primitive version of Calculus.

I am not going to inflict any of that on you.  Instead I am going to do two posts on Principia.  In the first post (this one) I am going to talk around it.  I am going to give some historical perspective, background, and observations on what was going on and how Principia fits into the bigger picture.  In the second post I will go into what is actually in Principia.

And, of course, I didn't actually read Principia.  It's in Latin.  What I read was a book with a very long and convoluted title.  In full it is:  ISAAC NEWTON - THE PRINCIPIA - Mathematical Principles of Natural Philosophy - THE AUTHORITATIVE TRANSLATION.  The author credit is "by I. Bernard Cohen and Anne Whitman assisted by Julia Budenz".  This book also includes A Guide to Newton's Principia by I. Bernard Cohen.  The Guide runs 370 pages (Principia itself is 575 pages long) and is critical to making any sense of Principia.  This book came out in 1999.  Before it came out the only English translations of Principia were either one done over 250 years ago or a "modernized and revised" version of the 250 year old translation that came out in the 1930's.  Let's start with a little history.

Newton was born in 1642 and died 84 years later.  He was born into a minor British noble family.   At that time science was done by "gentlemen of leisure" who didn't have to earn a living.  They did science as a hobby or for the betterment of mankind rather than as a paying job.  This was an accurate description of Newton.  He got a degree from Cambridge University in 1665 and promptly headed for home because the Great Plague was sweeping the country. And it was especially dangerous in urbanized areas like the town and University of Cambridge.  During the next two years, and for perhaps some time after that, he did his seminal work in science.

This included inventing Calculus, careful studies of Optics (the properties of light), and what we would now call Celestial Mechanics, understanding the laws that govern the motion of heavenly bodies.  He then returned to Cambridge where he was elected a Fellow of Trinity College and took over the "Lucasian" chair in mathematics there.  As a result of Newton's association with the chair, holding the Lucasian chair is now considered the most prestigious position a mathematician can hold.  Stephen Hawking, the renowned physicist, held it for thirty years and only relinquished it relatively recently.

But in general Newton fairly quickly moved away from science and mathematics and pursued other interests.  He had a long standing interest in Alchemy and was considered an expert.  He also spent more time and energy on theology than he ever did on science.  Again, in his time he was considered an expert on the subject.  He also moved into politics.  He served two different terms in Parliament and was appointed Master of the Royal Mint, a political appointment.  He did maintain a connection with science to the extent of spending more than two decades as president of the Royal Society.  But he spent most of his time and energy at that point on politicking and little on doing science.  Like many men of his time, he did almost all of his significant scientific work while he was a young man.

Although he had done the underlying work years before, he didn't publish Optics until 1704.  There was also a substantial delay between when he did the work and when he published Principia.  The first edition came out in 1687.  A revised edition came out in 1713 and a second revision came out in 1726, shortly before his death.  Why the delays?

The short answer is that he had to be goaded into publishing.  He was very secretive.  The best way to get him to publish was either to tell him that someone was about to publish something Newton thought he had invented or to tell him it was long past time to score points against one or more of his rivals.

The rules on who got credit for a scientific advancement were just being worked out in Newton's time.  The modern "he who publishes first gets the credit" method came about later and was partly a response to the damage done by feuds involving Newton.  He was a glory hog.  Much of his glory was justified.  He was a great mathematician, a great theorist, and a great experimentalist.   Consider that he invented Calculus (mathematics), invented "Newton's law of Gravity" (theorist), and using the results from a brilliant set of experiments laid much of the foundation for the study of light and optics (experimentalist).  Nevertheless, he was stingy when it came to giving credit to others and greedy when it came to taking the credit (often singular credit) for himself.  Consider Calculus.

Calculus was in the air when Newton invented it.  It is based on two then-recent developments in mathematics, the study of "infinitesimals" and the study of "limits".  Both of these concepts were developed by others, and Calculus couldn't exist without them.  But Newton applied them effectively, developing a number of important methods for doing Differential and Integral Calculus.  So did Leibnitz, a German mathematician.  Both developed Calculus at about the same time and both did it pretty much independently of the other.  And Leibnitz did the better job.  The Calculus we use today is the Leibnitz version.  Some corners of engineering cling to the Newtonian version but no one else does.

So the real story was that it was in the air.  Both Leibnitz and Newton got to it at about the same time, but Newton was "all in" to make sure he got all the credit and Leibnitz got none.  Since by this time he had political connections, which he made adroit use of, for a long time he got all the credit, at least in the English-speaking world.  A fairer reading of how much credit should be allocated to whom had to wait until many years after his death.

He also picked a big fight with the ideas of Descartes, the philosopher of "I think, therefore I am" fame.  Newton was not interested in Descartes' philosophical musings.  What he was interested in was Descartes' theory of gravity, which was based on swirling "vortices".  If you haven't heard of it before, that is because Newton totally destroyed the vortex theory in Principia.  Newton got, and richly deserved, full credit for this.  In this case he beat Descartes fair and square.  And he did such a good job of it that today only people interested in Descartes or in the history of science have even heard of the vortex theory of gravity.

But there was a downside.  Descartes invented Cartesian Coordinates and what we now call Analytic Geometry.  Because of the feud, Newton wouldn't even consider using the methods of Analytic Geometry in Principia.  That makes what he did far harder to follow.  And the techniques Newton deployed in Principia are extremely hard to use.  The same techniques, done using the methods of Analytic Geometry, are much easier to understand and much easier to actually make use of.  And the ultimate irony is that Calculus and Analytic Geometry go together like ham and eggs.  One big reason we now use the Leibnitz version of Calculus is that Leibnitz had no problem with Analytic Geometry.  And it was easy to adapt his methods so they could be seen as extensions of standard Analytic Geometry methods.

In modern Calculus we characterize Integral Calculus as a method for "finding the area under a curve".  How do we define "area"?  We do it in terms of Analytic Geometry concepts.  What does "under the curve" mean?  Well, you set the problem up using the methods of Analytic Geometry and . . .  The same thing is true with Differential Calculus.  Here we want to find "the slope of a curve at a given point".  These are all concepts that are fundamental to Analytic Geometry.  In short, the methods Newton demonstrated in Principia ended up being put to practical use in an Analytic Geometry context, often using the Leibnitz form of Calculus.
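To make that concrete, here is a minimal Python sketch of my own (nothing like this appears in Principia, of course) that treats a curve as a function y = f(x) on Cartesian axes and numerically approximates the two basic operations: the slope at a point (Differential Calculus) and the area under the curve between two points (Integral Calculus).

```python
# A curve expressed the Analytic Geometry way: y = f(x) on x-y axes.

def slope_at(f, x, h=1e-6):
    """Approximate the slope (derivative) of f at the point x using a tiny step h."""
    return (f(x + h) - f(x - h)) / (2 * h)

def area_under(f, a, b, n=100_000):
    """Approximate the area under f between a and b using n thin rectangles."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

f = lambda x: x ** 2                 # the curve y = x squared

print(slope_at(f, 3.0))              # roughly 6.0 (the exact slope of x^2 at x = 3)
print(area_under(f, 0.0, 3.0))       # roughly 9.0 (the exact area is 3^3 / 3)
```

The point is not the code itself but the framing: "slope" and "area" are defined against a coordinate system, which is exactly the Analytic Geometry machinery Newton refused to use.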

And it is perhaps worth noting that Newton achieved the results he did in spite of using ill-suited geometric methods, not because of them.  That makes his achievement all the more impressive.

Let me give you another example of how petty Newton could be.  At the time the Royal Navy had a serious problem called the "longitude" problem.  Ships needed a method for determining their longitude so they could avoid crashing into rocks or getting lost when visibility was poor.  The British government put up a 20,000 pound prize (equivalent to millions today) for the first person to solve the problem.  And in practice the problem boiled down to figuring out how to construct a portable, high precision clock.
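To see why it came down to a clock: the Earth turns 15 degrees of longitude per hour, so the gap between local solar time and the time back at a reference port tells you how far east or west you are.  Here is a back-of-the-envelope Python sketch (the numbers are mine, chosen purely for illustration):

```python
import math

DEGREES_PER_HOUR = 360 / 24      # the Earth turns 15 degrees of longitude per hour
EARTH_RADIUS_KM = 6371           # mean radius, used to turn degrees into distance

def longitude_from_clocks(reference_time_h, local_solar_time_h):
    """Degrees of longitude west of the reference meridian, from the two clock readings."""
    return (reference_time_h - local_solar_time_h) * DEGREES_PER_HOUR

def position_error_km(clock_error_seconds, latitude_deg=50.0):
    """How far off your position is if the clock you carried has drifted by this many seconds."""
    error_degrees = (clock_error_seconds / 3600) * DEGREES_PER_HOUR
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM * math.cos(math.radians(latitude_deg)) / 360
    return error_degrees * km_per_degree

print(longitude_from_clocks(12.0, 9.0))   # local noon lags home by 3 hours -> 45 degrees west
print(position_error_km(3 * 60))          # a clock off by 3 minutes -> roughly 50 km off course
```

The clocks of the day drifted badly at sea, which is why the prize effectively demanded a whole new kind of timekeeper.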

It was eventually solved by John Harrison, a self-taught clockmaker, who invented the marine chronometer, a precision timekeeper that could survive life at sea.  Newton, as president of the Royal Society, was one of the officials connected with the prize, and he was openly skeptical that any clock could be made accurate enough, insisting the answer would have to come from astronomy.  Harrison eventually jumped through all the necessary hoops to prove his device actually worked, but the board in charge of the prize, which shared Newton's skepticism and looked down on Harrison as a mere tradesman rather than a gentleman, dragged its feet for years before paying up.  (Most of that fight played out after Newton's death.)  That's just petty.  Harrison had done the job, and the money was not coming out of anybody's personal pocket.

So much for Newton.  I am now going to move on to Principia itself.

It was published at a time of transition.  Descartes' Analytic Geometry was such a big improvement over the older geometric methods that science in general quickly abandoned the old approach.  The mathematics of science quickly started looking modern rather than ancient.  So in a sense Newton lost that battle, as well as the battle over the mechanics of how Calculus should be done.  But he won the war.  To this day scientists speak of "Newtonian Mechanics".  This is shorthand for any system of mechanics that ignores Relativity and Quantum Mechanics.

In a very real sense Newtonian Mechanics describes how the world of ordinary distances and speeds works.  Quantum Mechanics is quite different from Newtonian Mechanics.  But for the most part Quantum Mechanics deals with the world of the very small, the world of atoms and subatomic particles.  Relativistic Mechanics (mechanics that includes Relativity) is quite different from Newtonian Mechanics.  But for the most part Relativistic Mechanics deals with the very large, the world of stars and galaxies, and things going very fast, speeds near light speed.  The world we mostly live in, the "middle distance" world, works pretty much the way the mechanics Newton invented says it works.
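One way to see how good an approximation Newtonian Mechanics is at everyday speeds is to compute the relativistic correction factor (the Lorentz factor), which is exactly 1 when Newton is exactly right.  Here is a small Python sketch, with speeds I picked purely for illustration:

```python
import math

C = 299_792_458.0   # speed of light in meters per second

def lorentz_factor(speed_m_s):
    """Relativistic correction factor; 1.0 means Newtonian Mechanics is exactly right."""
    return 1.0 / math.sqrt(1.0 - (speed_m_s / C) ** 2)

examples = [
    ("highway car (~30 m/s)", 30.0),
    ("jet airliner (~250 m/s)", 250.0),
    ("GPS satellite (~3,900 m/s)", 3_900.0),
    ("half the speed of light", C / 2),
]
for label, speed in examples:
    print(f"{label}: correction factor = {lorentz_factor(speed):.15f}")
```

For the car and the airliner the correction only shows up a dozen or more decimal places out, which is why Newton's answers are, for everyday purposes, indistinguishable from the relativistic ones.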

One thing puzzled me until I read the book.  Why did it have such a special place?  The answer goes back to those hundreds of proofs.  Newton supplied answers and methods for everything.  He didn't just prove the main thing.  He provided methods for calculating this, that, and the other thing.  He considered things from multiple angles.  He pretty much said all there was to say about the subject areas he delved into.  So in case after case after case, an objection or a follow-up question reduced to "But -- oh wait.  He covered that too."  His analysis was so broad and complete that there was little left to add.

The result was that when it comes to calculating the orbits of heavenly bodies, or even navigating space probes, for the most part it is all in Principia.  It took more than a hundred years for telescopes and other instruments for observing the heavens to get precise enough to find something that wasn't where it was supposed to be.  It turns out that the orbit of Mercury drifts very slightly from where Newton said it was supposed to be.  The discrepancy is tiny, about 43 seconds of arc per century.  And this "Mercury problem" was one of the problems that Einstein's General Relativity ended up solving.
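To give a sense of how tiny that is, here is a quick bit of Python arithmetic of my own, spreading those 43 arcseconds over the roughly 415 orbits Mercury completes in a century:

```python
ANOMALOUS_PRECESSION_ARCSEC_PER_CENTURY = 43.0   # the part Newtonian Mechanics can't explain
MERCURY_YEAR_DAYS = 88.0                         # Mercury's orbital period, roughly

orbits_per_century = 100 * 365.25 / MERCURY_YEAR_DAYS
arcsec_per_orbit = ANOMALOUS_PRECESSION_ARCSEC_PER_CENTURY / orbits_per_century

print(f"{orbits_per_century:.0f} orbits per century")            # about 415
print(f"{arcsec_per_orbit:.2f} arcseconds of drift per orbit")   # about 0.10
print(f"{arcsec_per_orbit / 3600:.7f} degrees per orbit")        # about 0.0000288
```

Roughly a tenth of an arcsecond per orbit, which is why it took well over a century of ever better instruments before anyone noticed.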

The state of the art has gotten a lot better since.  So it is now relatively easy to find situations where Newtonian Mechanics gives an answer that is just a little bit wrong.  The signals from GPS satellites include a relativistic correction.  Without it, GPS receivers would spit out "you are here" answers that drift farther and farther off, by kilometers within a day.  Back in the day navigation errors could cause ships of the British Royal Navy to crash into rocks in the fog.  We live such high precision lives, often without realizing it, that there are now situations where Newtonian Mechanics gives an answer that is dangerously wrong.  Left uncorrected, the error is big enough to, for instance, cause a car to be routed into a lake or an airplane to miss its runway.
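The GPS case can be put in rough numbers.  The commonly quoted figure is that, uncorrected, the combined special and general relativistic effects would make the satellite clocks run fast by roughly 38 microseconds per day relative to clocks on the ground.  The little Python sketch below is just a back-of-the-envelope conversion of that timing drift into distance; it is not how a real receiver computes position.

```python
C = 299_792_458.0               # speed of light, meters per second
CLOCK_DRIFT_S_PER_DAY = 38e-6   # roughly 38 microseconds of clock drift per day (approximate)

# GPS turns timing into distance, so a timing error becomes a ranging error.
ranging_error_m_per_day = CLOCK_DRIFT_S_PER_DAY * C

print(f"about {ranging_error_m_per_day / 1000:.1f} km of accumulated error per day")  # ~11 km
print(f"about {ranging_error_m_per_day / 24:.0f} meters per hour")                    # ~475 m
```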

These tiny errors are often detectable now because we can measure billionths of a second and distances far smaller than the diameter of a Hydrogen atom.  In Newton's day nothing could be measured accurately enough to reveal any difference between the answer Newtonian Mechanics gives and the one the modern theories would.  And Newtonian Mechanics is far simpler and easier to deal with than Relativistic Mechanics or Quantum Mechanics.  Most of the time, in most situations, it gives us an answer that is accurate enough for our day-in, day-out needs.  And that's why scientists and engineers still study Newtonian Mechanics.

Mostly what Principia did was convince.  It was easier for scientists of the period to deal with because they were familiar with the "geometric" approach Newton used.  And Newton was completely honest.  He decided to write the book with the expert in mind.  It is the exact opposite of a "for dummies" book.  That made it completely inaccessible to normal people of the day.  But it also made it completely convincing to the experts of the day.  And they were quick to accept it and to embrace it.  And they, in turn, convinced the normal people of the day that what Newton said in Principia was true.

This was important for reasons that people who live today can understand.  Newton had some things to say that went against beliefs that were strongly held at the time, some of them rooted in religious orthodoxy.  It helped that Newton was widely seen as a "good Christian" and a theologian of note.  It is important to remember that not long before this Galileo had gotten into very serious trouble for saying things that the Church did not approve of.  But Newton got little or no push back about this sort of thing.  And one reason was the near universal opinion of experts that the arguments Newton laid out in Principia were convincing and compelling.  Back then experts were seen as experts, not as people pushing one agenda or another.  Sadly, that is no longer true today.