Friday, September 12, 2014

The Great Internet Disaster

You remember the Great Internet Disaster.  You don't?  Well, I do!  No, actually I don't, for the good and substantial reason that it didn't happen.  But pretending it did for a moment makes for a much better title.

This is actually a "good news" story, a story about what didn't happen.  But, with the exception of a few religious types, people who are trying to attract and keep your attention avoid good news like the plague.  And, while we are on the subject of plague, have you heard the latest about Ebola?  See!  It works.  I have now thrown in two phony disasters and you are paying attention.  The one in the title is completely phony.  The other one is real but it only affects those people way over there in Africa.  We don't care about them so the media has to pretend that it is about to strike our shores any second now.  All right.  Enough with the phony stuff.  I promise the rest of this post consists of 100% actually true stuff.

So what is "The Great Internet Disaster"?  Well, what if you tried to connect to your favorite web site and you couldn't?  What if you couldn't connect to anything?  And I'm not talking about some kind of temporary outage.  I am talking about forever.  Wouldn't that be a disaster?  It would.  And it could have happened.  Officially, it could still happen but I think it never will.  But a lot of people have spent a lot of sleepless nights worrying about it.  And people have been tracking the problem for almost two decades now.  And the thing the people worried about and tracked actually happened.  So what is "it"?

It's technical, not complicated but technical.  The Internet started out many decades ago as the ARPANET.  It was a research project sponsored by the Federal Government.  Back in the dark ages it was possible to connect computers together.  But there was no agreed-upon standard way to do this.  Each company came up with its own scheme.  Then they took out patents and copyrighted stuff and kept trade secrets and did other things, all to make sure some other company couldn't use whatever they came up with.  So you had all these different methods.

Then you had IBM.  IBM didn't have one method.  They had three, or maybe more, but at least three.  They even had an acronym for each.  They had BTAM, and TCAM, and VTAM.  OK.  All the acronyms had four letters and all of them ended with "AM", which stood for Access Method, but that was pretty much it.  Even though all three methods were developed by IBM for use on IBM equipment they didn't talk to each other.  So a large company could find itself in a situation where some of its IBM equipment couldn't talk to other IBM equipment even though it was all made by IBM and all owned by the same company.  So getting an RCA computer owned by one company to talk to a GE computer owned by another company was pretty much impossible.  And yes, both RCA and GE used to custom design and build computers that they sold to other companies, kind of like Apple does today.  Anyhow, this whole business of getting computers to talk to each other was nearly impossible then.  But the government, or at least the military part of the government, thought it was important to figure out how to get all kinds of different computers to talk to each other.  So they funded the ARPANET project.

Computers are basically really fast math machines.  At bottom all they know is numbers.  They can handle text and pictures and lots of other things now.  But the trick is to turn everything into numbers.  So there is a number that means "A" and there is a number that means a specific shade of yellow, and so on.  So it stood to reason that one of the things you want to do if you are trying to get computers to talk to each other is to assign each computer a number.  Then computer number 17 could send a message to computer number 37 and off we go.  At the time the ARPANET project was active each computer cost millions of dollars.  So the way the ARPANET people set things up allowed for as many as 64 computers to be hooked up to their network.  And it worked.  The ARPANET project got a bunch of computers to talk to each other.  And it almost immediately became a victim of its own success.  Everyone wanted to hook their computer up to the ARPANET.  And when more than 50 computers were hooked together and there was still a long list of other computers people wanted to add, it was time for the ARPANET equivalent of "Houston, we have a problem".

The number "40" appears in the Bible a lot.  Have you ever wondered why?  Well, back 2,000 years ago 40 was a really big number.  Most people did arithmetic by counting on their fingers.  So if, for instance, you had a lot of sheep after a certain point it became very hard to keep track of exactly how many you had.  The cutoff for how many was "so many it's hard to count them all" was more than 10.  And I'm sure it varied from person to person.  But 40 was definitely too many for almost everyone.  So 40 was used for "so many it's practically the same as infinitely many".  So whenever anyone wants to say the ancient equivalent of a gazillion they say 40.  In the initial design of the ARPANET 64 seemed like a gazillion.  But it obviously wasn't.  So the ARPANET people started working on ARPANET v2.

Specifications, as we now know, evolve and keep evolving.  We are now used to the iPhone 6 and Windows 9 and so on.  So it's no surprise that the specification for this computer number thing has gone through a number of versions.  But for a long time it has been something called IPv4.  Now the ARPANET people really wanted to fix the whole "size of the computer number" problem once and for all.  But they had to pick some specific size because that's how computers work.  So they looked around for a number that, while not actually infinite, was so big that the problem would be permanently fixed.  They looked for the practical equivalent of a gazillion.  And they settled on a number slightly larger than 4 billion.  They made the size of the field for the computer number 32 bits long.  And that allowed for over 4 billion different numbers.  Now when they did this computers were fantastically expensive.  So 4 billion really did sound like a gazillion.  And everything was fine until the middle of the '90s.
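
If you want to see why 32 bits means "a bit over 4 billion", here is a minimal sketch in Python.  The address 192.0.2.17 is just an example pulled from a block reserved for documentation, and the packing logic is simply the standard dotted-decimal convention, nothing specific to the ARPANET.

    # Pack a dotted-decimal address (four 8-bit pieces) into one 32-bit number.
    octets = [192, 0, 2, 17]      # made-up example address from the documentation range
    number = 0
    for octet in octets:
        number = number * 256 + octet
    print(number)                 # 3221226001 -- one of the "slightly more than 4 billion" possibilities
    print(2 ** 32)                # 4294967296 -- the size of the whole 32-bit space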

When Microsoft was designing what eventually became Windows 95 it seemed like a good idea to include the capability for connecting computers to servers using phone lines and an interface device called a modem.  So Microsoft put in a bunch of features to support this.  And initially that sounded like it would get the job done.  But a few years earlier a Senator named Al Gore had shepherded a piece of legislation through Congress and into law that turned the grandson of the ARPANET, something called NSFNET, into the Internet.  This was the final step in an evolution from a government sponsored network for connecting a few military computers together to a public network that anybody could connect to.  And it didn't matter what brand of computer you had.  A lot of money and talent had been invested over the years in figuring out how to hook pretty much any model of any brand of computer to what was now the Internet.  And people started asking "can I hook my Windows 95 computer to the Internet?"  Microsoft did some quick work and by the time Windows 95 debuted in August of 1995 the answer was "yes".  And, of course, once Microsoft did it then Apple and everyone else had to do it too.  So within a couple of years every new PC sold had the ability to connect to the Internet.

Back when computers cost millions of dollars 4 billion did seem like a gazillion, way larger than the number of computers anyone could imagine would be built.  And things were still OK when Windows 95 came out.  Computers still cost thousands of dollars and most of the world couldn't afford them.  But it started people worrying.  And there was another problem.  You couldn't practically use all 4 billion of those addresses.

To manage the administration of these addresses, now called "IP addresses", the people who ran the Internet (first the Department of Defense, then the National Science Foundation, finally a group of non-profits) set up a process where an organization could get control of a "subnet", a set of IP addresses.  These subnets came in 5 classes:  "A", "B", "C", "D", and "E".  If you filed the proper paperwork and it was approved you were given what amounted to ownership of a specific subnet of the appropriate class.  Once you were granted control you could do whatever you wanted with each specific address in your subnet.  That way the administrative group only needed to keep track of who controlled which subnet, not what was done with the individual addresses.  And this whole system worked fine for decades.  So what was the system?

Let me start with the weird classes.  Class "E" was for research.  Class "D" was used for broadcast.  Neither of these classes is used much.  But there are roughly a quarter of a billion IP addresses reserved for each.  So between the two classes about half a billion IP addresses are off limits.  The classes everyone used were "A", "B" and "C".  They were the same except for size.  A class A subnet is about 16 million addresses (2 to the 24th power).  A class B subnet is about 65,000 addresses (2 to the 16th power).  A class C subnet is exactly 256 addresses (2 to the 8th power).  The idea is giant companies or organizations would get a class A.  IBM and AT&T each have one.  The Federal Government has several.  Medium sized companies and organizations would get a class B.  In some cases they were allowed to get several.  At one time Microsoft controlled several class B subnets.  Finally, of course, small companies and organizations would get one or more class C subnets.  The company I used to work for controls 5 class C subnets.  In round numbers, the space reserved for all class A subnets constituted about 2 billion IP addresses.  The space reserved for all class B subnets constituted about a billion addresses.  The space reserved for all class C subnets constituted the remaining half billion or so.
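
For the numerically inclined, the back-of-the-envelope arithmetic behind those sizes looks like this.  It's a quick Python sketch; the fractions follow from the leading-bit rules of the old classful scheme.

    total = 2 ** 32                  # the whole IPv4 space, about 4.3 billion addresses

    class_a_block = 2 ** 24          # 16,777,216 addresses in one class A subnet
    class_b_block = 2 ** 16          # 65,536 addresses in one class B subnet
    class_c_block = 2 ** 8           # 256 addresses in one class C subnet

    class_a_space = total // 2       # leading bit 0     -> about 2.1 billion addresses
    class_b_space = total // 4       # leading bits 10   -> about 1.1 billion addresses
    class_c_space = total // 8       # leading bits 110  -> about 0.5 billion addresses
    class_d_space = total // 16      # leading bits 1110 -> about 0.27 billion (broadcast)
    class_e_space = total // 16      # leading bits 1111 -> about 0.27 billion (research)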

Up until the late '90s everything was fine.  But by this time a double whammy was becoming apparent.  Transitioning from a semi-open network run by the U.S. Government to a fully open network, the transition from NSFNET to the Internet, caused the list of people, companies, and organizations interested in connecting to the Internet to skyrocket.  Added on top was the fact that a relatively inexpensive "network card" could be added to a relatively inexpensive PC, which meant it was financially possible to add a lot more computers to the Internet than at any time in the past.  So Internet connections skyrocketed and the rate at which IP addresses were being consumed also skyrocketed.  This was a cause for concern.  But the fix was relatively simple.

For most organizations a class C was fine.  But there turned out to be a lot of examples where it really did not work.  My company with 5 class Cs was OK.  That was a manageable number.  But what about 10 or 20 or 50?  That got unwieldy.  The obvious solution was to get a class B instead.  But this was the equivalent of 256 class Cs.  If an entity actually needed 50 class Cs but got a class B then 80% of the class B would be wasted.  The same was true of class Bs.  For some companies (Internet Service Providers - ISPs for short) one or a few class Bs were unwieldy.  So they would go for a class A.  But again a class A was 256 times bigger than a class B.  So if you actually needed 50 Bs, 80% of the A would get wasted.  And frankly an A was a prestige symbol.  So a number of big companies got an A just because they could.  Ford Motor Company owns an A.  This probably makes sense.  It's a big company and it is spread all over the world.  But Halliburton owns one too.  I find it hard to believe that Halliburton couldn't get by with a few Bs.  And then there's big Pharma.  Eli Lilly has one as does Merck.  I think several companies got an A because they had enough political, financial, and technical clout to get one, not because they really needed it.  So the system was being abused.

The fix was to offer more options.  Instead of the simple A-B-C system a system called CIDR was introduced to replace it.  A bunch of subnet sizes that were smaller than an A but bigger than a C were now available.  This meant that someone who needed 10 Cs could get a "CIDR block" that was equivalent in size to 16 Cs.  (It's that whole power-of-2 thing.)  A company that needed 20 Bs could get a CIDR block that was equivalent in size to 32 Bs.  This immediately took a lot of pressure off the system.  Companies and organizations could get a CIDR block that was reasonably close to the number of IP addresses they actually needed.  So the rate at which IP addresses were eaten plunged.  Problem solved.
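
Here is a small sketch of the CIDR idea using Python's standard ipaddress module.  The 10.20.16.0 prefix is just a made-up example from private address space; the point is that the prefix length can now be anything, not just the old /8, /16, or /24.

    import ipaddress

    # A company that needs roughly 10 class-C-sized networks (~2,560 addresses)
    # can be given a /20, which is 4,096 addresses -- the equivalent of 16 class Cs.
    block = ipaddress.ip_network("10.20.16.0/20")
    print(block.num_addresses)   # 4096
    print(2 ** (32 - 20))        # same number, straight from the prefix length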

And it was for a while.  An "Internet ready" PC was still fairly expensive.  And the way most people at home connected up was by dialing in to AOL or some other ISP.  AOL could pool IP addresses.  It could give an IP address to you temporarily while you were connected.  But it could take it back when you hung up and give it to someone else who dialed in later.  So the pressure from home use was manageable.  Work PCs were still a problem.  But the introduction of CIDR took a lot of the pressure off here too.  Things looked pretty good until smart phones came along.  Once smart phones existed it became possible to put a browser on the phone.  It was a selling point so the mobile companies did it.  I have a crap cell phone and even it has a browser on it.  It doesn't really work but it would if I had a better phone.  And smart phones are cheap enough so that they are not just popular in the U.S. or maybe Europe.  They are popular all over the world.

The mobile carriers use the same trick the ISPs use of sharing IP addresses.  But there are 7 billion people in the world.  And about 3 billion of them own a mobile phone.  Most of those phones are not currently web capable but they soon will be.  The advent of smart phones cranked the rate of IP address consumption back into the stratosphere.  And people started seriously worrying about the problem.  People had already started worrying.  Serious stories predicting that we would eventually run out of IP addresses started appearing in technical journals in the '90s.  One of the first credible predictions of when we would actually run out was made in 2003.  The prediction was that we were good for ten to twenty years.  But another prediction made in 2005 said we only had "4 to 5 years".  What happened in the meantime was that people could see where smart phones were going.  In 2010, 5 years after that prediction, we were still OK.  What happened?

CIDR was a "lifetime extender".  It worked.  And a second lifetime extender called NAT was introduced.  NAT is IP address sharing technology.  I have a home network with several computers on it.  All of them access the Internet.  I also have some smart devices like my TiVo DVR and my Blu-Ray DVD player, both for my TV.  Both of them need an Internet connection some of the time.  But Comcast, my ISP, only allocates one IP address to my house.  The details are complicated but the bottom line is that NAT works.  All of the "Internet capable" devices in my house can access the Internet because NAT allows my one IP address to effectively be shared by all of them.  In fact, hundreds of computers can use NAT to share one IP address for internet access.  Everything is kept straight and each computer can access what it needs without interfering with any of the others.

At one time my old company was using about half the 1,200+ IP addresses the 5 class Cs represented.  Now the same company has lots more machines, each of which has its own separate IP address.  The NAT setup allows hundreds of machines located on the internal corporate network to independently address hundreds of servers scattered across the Internet.  If pressed, my old company could now get along just fine with a small part of a single class C.

So CIDR and NAT were lifetime extenders.  But ultimately they both failed.  The Internet has run out of IPv4 addresses.  The event was officially commemorated on February 3, 2011.  For three and a half years (as I write this) we have been out of IPv4 addresses.  That should have resulted in the Great Internet Disaster.  New servers are put up each day.  Without IP addresses for these servers no one would be able to connect with them.  New PCs (not as many as in the old days but still lots) get hooked up to the Internet each day.  There should be no IP addresses to give them so they should not be able to connect to anything.  So what happened?  Am I conning you about this whole "we are out of IP addresses" thing?  Remember, I promised I would tell the truth after the part where I got your attention.  And I am.  We are officially out of IPv4 addresses and have been for years.  I also promised that the Great Internet Disaster was never going to happen.  So let's move on to the permanent fix next.

As I indicated above, CIDR and NAT are extenders.  They are not the permanent fix.  And it should not be hard to figure out what the permanent fix should be:  Make the number bigger.  That's what they did before.  They changed the number of computers that could hook up to the network from 64 to 4 billion.  How about doing the same thing again?  There was some back and forth on the details but that is exactly what was done.  There is now a new thing called IPv6.  (Something that got called IPv5 came and went without much of anyone noticing so they had to tick the "v" number up to 6.)  Again, the idea was to come up with a gazillion-like number, a number so big that it would effectively act like infinity.  IPv4 addresses require 32 bits.  They went with four times as many bits for IPv6.  The new "gazillion" is roughly 34 followed by 37 zeroes (about 3.4 times 10 to the 38th power).  That's enough addresses to individually label every subatomic particle in every atom on the surface of the earth.  In the same way that going from 6 bits (64) to 32 bits (4 billion) seemed like going to a number much larger than anyone could imagine using up, the hope is that by going from 32 bits to 128 bits the same "more than anyone can imagine using" effect will be achieved.  The specifications for IPv6 were developed in 1995 and fine-tuned over the next couple of years.  There is a general consensus that this is big enough.  We have had the fix in hand since 1995 (or 1998, if you want to wait for all the tweaks to be completed).   So why am I talking about a "problem" that supposedly has been fixed for more than a decade?
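
The "gazillion" arithmetic is easy to check for yourself with a quick Python sketch:

    ipv4_space = 2 ** 32
    ipv6_space = 2 ** 128
    print(ipv4_space)               # 4294967296 -- a bit over 4 billion
    print(ipv6_space)               # a 39-digit number, about 3.4 times 10 to the 38th power
    print(ipv6_space // ipv4_space) # 2**96 -- each old address could be replaced by about 8 x 10**28 new ones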

Well, there's the whole "transition" problem.  When ARPANET was moving from a 6-bit address to a 32-bit address there were fewer than a hundred computers involved.  These were all located in universities with capable Computer Science departments or at government research establishments or the like that were stuffed with computer whizzes.  They all just set to work updating everything to conform to the new rules.  It didn't take long to get everything cut over.  Remember, the ARPANET was a "research and development" project at that time.  It was annoying if some part of it stopped working for a day or so but it was not the end of the world.  The Internet of today is a whole different kind of animal.  You just can't take the whole thing, or even large parts of it, down for a couple of days so you can upgrade something.  That makes it harder to roll something like IPv6 out.

Let's start with the "not a problem" items, items that you might think would cause problems but actually didn't.  Changing from IPv4 to IPv6 required no hardware changes.  The new stuff would work just fine on the old hardware.  In fact, you could run IPv4 messages and IPv6 messages through the same components at the same time.  One type of traffic would be invisible to the other but both would work just fine.  The next issue was plumbing.  For instance there is a system called DNS.  It's the system that turns a computer name like WWW.WIKIPEDIA.COM into a number like 198.35.26.96 (the standard way of representing an IPv4 address as numbers).  Technical changes were made to how DNS was supposed to work and updates were made to DNS software.  Once that was done then the DNS system could handle any combination of IPv4 and IPv6 requests.  Similar technical changes and software changes were made to the other parts of the Internet.  Everything was supposed to be interoperable so making the changes to support IPv6 didn't stop IPv4 from working.  And not everybody needed to make their changes at the same time.  It was just necessary that everyone make the changes.  And they did.  Everything has been in place to support IPv6 for years now.
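
If you want to watch this plumbing work, the Python standard library can ask your resolver for both kinds of answer.  What comes back depends on your resolver and on what records the site publishes, so treat the output as illustrative rather than definitive:

    import socket

    for family, label in [(socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")]:
        try:
            results = socket.getaddrinfo("www.wikipedia.com", 443, family)
            print(label, sorted({r[4][0] for r in results}))
        except socket.gaierror:
            print(label, "no answer from this resolver")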

Then there was the "small problem" issue.  Software changes needed to be made to computers so they could deal with IPv6.  This was harder but not impossible.  All this networking stuff goes through some software called a "protocol stack".  Again it's complicated, but again the technical details are not important to us.  The best solution in most cases was to go to a dual stack.  One stack would handle the IPv4 stuff and the other the IPv6.  The routing through the right stack could be done automatically by the stack software.  So vendors like Microsoft had to write some more software for the new stack but that was it.  Not all old systems can handle IPv6.  But if you bought a new PC any time in the last few years everything is built in and ready to go.  There are still a lot of old systems that can't handle IPv6 but everything else does.  So Apple and Microsoft and the Linux people needed to do some work.  But they finished that up several years ago too.
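
Here is a rough sketch of what the dual-stack idea looks like from a program's point of view, again in Python.  The exact behavior of the IPV6_V6ONLY option varies by operating system, so consider this an illustration of the approach rather than production code:

    import socket

    server = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    # Turning IPV6_V6ONLY off asks the IPv6 stack to also accept IPv4 connections,
    # which show up as IPv4-mapped addresses like ::ffff:192.0.2.1
    server.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    server.bind(("::", 8080))      # "::" means "any address", IPv4 or IPv6
    server.listen(5)
    print("listening for IPv4 and IPv6 connections on port 8080")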

That leaves the big problem.  Let's say you have a fancy new computer that is all set up to handle IPv6.  But you want to access a web site that is IPv4.  What's the problem?  Since the IPv6 address is so much bigger you just carve out a slice and say "if I put this IPv6 address in I really mean this other IPv4 address".  That's easy to do and the message you send off should get to where it is supposed to with no problem.  But communication with a web site is bidirectional.  You send a request to the web site.  It sends back a response (the thing that ends up on your screen when you click on the button).  How does this get back to your computer?  That's the problem.  There are way more IPv6 addresses (that's the point) so you can't just run the same translation trick in reverse to get from an arbitrary IPv6 address back to an IPv4 address.
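
The "carve out a slice" trick is real: IPv6 reserves a slice of its space (::ffff:0:0/96) just for holding embedded IPv4 addresses.  A quick sketch with Python's ipaddress module, reusing the Wikipedia address from above, shows why the trick only works in one direction:

    import ipaddress

    mapped = ipaddress.IPv6Address("::ffff:198.35.26.96")
    print(mapped.ipv4_mapped)        # 198.35.26.96 -- recovering the embedded IPv4 address is easy
    # Going the other way is the problem: almost no IPv6 address has an IPv4 equivalent.
    print(ipaddress.IPv6Address("2001:db8::1").ipv4_mapped)   # None (2001:db8:: is a documentation prefix)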

People have come up with a bunch of ways to handle this problem.  But none of them work very well.  If you are a web site you can just have an IPv4 address and an IPv6 address for your site.  That means you can support requests from both IPv4 and IPv6 computers.  In fact, when I looked up the Wikipedia IP address I got two answers.  One was an IPv4 answer and the other was an IPv6 answer.  But the problem is that when you work through all the permutations what you end up with is giving everything an IPv4 address and an IPv6 address.  That doesn't solve the problem.  If you are going to do that you might as well just stick with IPv4 addresses exclusively.

Before doing the final "reveal" let me cover one more "extender" solution.  The Internet started out as a government funded project.  When it went public it went non-profit.  None of the institutions that were put in place to administer the modern Internet are for-profit companies.  There is also a strong "public interest" mindset that still runs through these institutions.  So the idea that something like an IP address should be a valuable asset that can be bought and sold went against the grain.  None of these institutions wanted to officially get into the business of buying and selling IP addresses.  But a few years ago they gave in and unofficially sanctioned the trading of ownership of blocks of IP addresses.  The reason we haven't actually run out of IPv4 addresses, in spite of the fact that we have officially run out of IPv4 addresses, is that you can now sell your unused IPv4 addresses.

Remember those companies I talked about that own class A subnets?  Let's say that you are company "X" and you own a class A.  It's a hassle to figure out whether you are actually using it.  You just might be using parts of it somewhere.  But it would take effort to find this out and effort translates into costing the company money.  So the safest thing to do is to hang on to the A and pretend you are using it for something important.  But let's say you can sell all or part of the class A for money.  Now it is worthwhile to check around.  So that's what's happening.  Companies that own class As or even class Bs have been quietly checking around.  In lots of cases they find they can part with all or most of the IP addresses in the subnet pretty easily.  So they do, for a fee.  These unofficial sales are keeping things alive.  But this is just another of those extenders.  It is not a permanent solution.  So what is?

It turns out that those same smart phones that caused the latest wave of problems have opened the path to a permanent solution.  If you go to the Apple App Store you will find hundreds of thousands of applications.  Are all these applications actually unique?  Not really.  On a PC if you want to work with a business you connect to its web site.  On an iPhone (or any other smart phone) you click on the appropriate application.  The "app" has replaced the web site.  If you click on the button for the app and the right stuff comes up do you really care how the connection was made?  No!  So if an app uses IPv6 to connect to the server of whatever company you want to deal with, you don't care as long as it works.  The "4G" specification for mobile phone networks specifies IPv6.  Companies configure their servers for IPv6.  Then they build the app you download from the app store for IPv6 and you don't know the difference.  Over time all the traffic associated with mobile phones will migrate automatically to IPv6.  That relieves the pressure on IPv4 two ways.  First, there is no need for IPv4 addresses to support mobile traffic for these newer devices.  But second, the wide adoption of mobile phones is causing a decrease in the use of PCs (and tablets, which count as PCs for our purposes).  So the pressure to come up with more and more IPv4 addresses diminishes.  At some point not that many years down the road the number of IPv4 addresses we need will actually go down.  When that happens we can officially call the "disaster" I have outlined solved.

Finally, let me broaden my focus quite a bit.  Let me summarize the issue under discussion in the broadest terms.  There is a situation.  The people who understand the most about what is going on start worrying.  At some point they start making their concerns public.  Their next step is to say "something must be done and soon".  They come up with a bunch of "solutions" but these solutions are not adopted, in part because they would be disruptive.  The experts say "you MUST do something about this problem right away or really bad things will happen".  The solutions the experts came up with are still not implemented.  The bad thing doesn't happen.  So maybe we should never have listened to the experts.

That is a lesson that could be applied to the "IPv4 Address Exhaustion Problem", as what I have been discussing is technically called.  But it is the wrong lesson.  What the experts said all along was right.  And the experts did come up with the right solution, namely IPv6.  The only thing the experts got wrong was the details of how we will pull off the migration from IPv4 to IPv6.  The iPhone, the poster child for smart phones, only dates back to 2007.  During most of the time the experts were studying the problem smart phones did not exist in any significant numbers.  Also, Apple for its own reasons came up with the first successful computer "app store".  And Apple pushed for the replacement of the browser/server combination that worked very well on PCs with the app/server model.  They did this not for any grand reason.  They did it so they could more tightly control the Apple ecosystem so they could make more money.  It is only an unintended side effect that the app/server model turned out to be what makes possible a seamless migration from IPv4 to IPv6.  And these very same experts have been ready, willing, and able to facilitate this new path, a path not of their creation, at every turn.  They are interested in solving the problem, not attaining great glory or gaining wealth beyond your wildest dreams.

The first summary also fits another problem facing us, a much bigger problem than IPv4.  That problem is greenhouse gasses.  A lot of people want very much for people to ignore the experts.  So they push the rest of the original scenario.  We are supposed to believe that if we do nothing then the bad things will never come to pass.  But here too the experts should be listened to.  In the IPv4 case Apple did something in a successful effort to make a lot of money.  And the thing they did turned out to be very helpful.  In the greenhouse gas case there are a number of companies and individuals that are doing things in order to make a lot of money.  Like Apple they are succeeding, and to a massively greater extent than Apple.  But unlike with Apple, these people are not being helpful.  They are deliberately making things worse.

In the IPv4 case there really was no one arguing that there was no problem.  In the greenhouse gas case, the only people in the scientific community who have seriously studied the problem and say there is no problem are the ones being paid to say so.  In the IPv4 case there were a lot of people who worried about the expense and disruption involved in fixing it.  Before the "mobile fix" showed up there was ample evidence that these concerns were legitimate.  In the early days before the mobile fix appeared the migration did not go well.  In the case of greenhouse gasses there are also a lot of people who are concerned about the expense and disruption of a fix.  It is a very legitimate concern.  In the IPv4 case it was pretty simple to figure out what needed to be done.  It was just hard to figure out how to pull the migration to IPv6 off.  What the mobile fix did was to effectively sidestep the problem.  Non-mobile devices would never be migrated.  The greenhouse gas problem is much more complex and much more difficult.  At its most basic level the fix could simply consist of pulling all that excess carbon dioxide out of the air or never putting it in the air in the first place, in effect sidestepping the problem.  But nobody really knows how to do that.  Other amelioration or mitigation ideas, the equivalent of migrating non-mobile devices to IPv6, are even harder to figure out how to pull off.  The greenhouse gas problem is also a much more slowly moving problem than the IPv4 problem.  Back in 1990, when the greenhouse gas problem was laid out in the first IPCC report, no one was seriously concerned about running out of IPv4 addresses.  Nearly 25 years later we are, if anything, going backwards on greenhouse gasses.  But we are chugging along quietly with the solution to the IPv4 problem.

Ultimately, a permanent solution for the IPv4 problem kind of just fell out of the sky.  But remember that before this happened a number of "extender" solutions were developed and implemented.  They kept things going long enough for the permanent solution to appear.  And when the permanent fix appeared it was identified and embraced.  So none of the actions of the experts were wasted.  It was important to pay attention to them all along the way.  We wouldn't have made it without them.
