Thursday, September 24, 2015

Internet - bits, bytes, and numbers

Mostly I write long blog posts and publish two per month.  I am going to try to get out of that rut and do things in a more blog-ish way for a while.  To be honest, the subject I want to attack, how computer networking works, would result in either a single ultra-long post or two or more long posts.  So while I am posting on the subject I am going to make a virtue out of a necessity and break it up into a number of small posts.  Once I am ready to move on to something else, we'll see how it went.

I am not going to do an in-depth treatment of computer networking.  Instead I will stick to an overview that includes enough information that a home user will have a broad general understanding of what is going on.  If I am successful he will also be knowledgeable enough to deal with most home networking issues.  Here goes.

The first subject I am going to cover is really boring but really important.  My original plan was to sprinkle a little in here and a little in there.  But it made even my first post run long.  So I am just going to break it out as its own subject and start with it.  Fortunately it is short.  And for those of you who have already dived into networking it may clear some things up.  I can at least hope that is true.

Computers are at bottom all about numbers.  Most of the time a number is handled within the computer in binary form.  Computers do off/on very well.  It is possible to get them to handle more complex things, but off/on is the quickest and easiest.  And off/on gives us a choice between two values, a binary (two possibilities) choice.  The numbers people use every day are called decimal because each digit is a choice among ten possibilities.  Two other alternatives that are popular in the computer business are octal (8 possible choices) and hexadecimal (16 possible choices).

Anyhow, if we go with binary numbers, and that's what we are going to do, we end up with 0 and 1.  And if we group binary digits (a bit of a contradiction, since "binary" means two while "digit" comes from the ten fingers on our hands, but that's the phrase in common use) we can increase our number of choices.  With two bits ("bit" is a contraction of binary and digit) we have 4 choices.  With three bits we have 8 choices (commonly referred to as "octal").  With 4 bits we have 16 choices (commonly referred to as "hexadecimal").  I am going to stop there.

But 2 and 4 and 8 and 16 are powers of 2, and that's not by chance.  If we combine x bits into a single combined thing, then the number of possible combinations is exactly two raised to the xth power.  And that's one of the most basic of computer tricks.  By combining a number of bits and treating them as a single entity we can make something that can support lots of unique values.  If we need only a few unique values we can use a small number of bits.  If we need a whole lot of unique values we can use a large number of bits.
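If you want to convince yourself that x bits really give exactly two to the xth power combinations, a short Python sketch can simply list them all and count.  The function name all_patterns is made up for this illustration; itertools.product is from Python's standard library.

```python
from itertools import product

def all_patterns(x: int) -> list[str]:
    """Return every distinct pattern of x bits, e.g. '00', '01', '10', '11'."""
    return ["".join(bits) for bits in product("01", repeat=x)]

for x in range(1, 5):
    # The count of patterns always matches 2 raised to the xth power.
    print(x, len(all_patterns(x)), 2 ** x)
```

Each extra bit doubles the list, because every existing pattern can be followed by either a 0 or a 1.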

You stumble across powers of two all over the place when dealing with computers.  So let me include the value for some that arise frequently.  8 bits gives you 256 possible values.  10 bits gives you a little over a thousand (1024 to be exact) possible values.  16 bits gets you a little more than 65,000 (65,536 to be exact) possible values.  20 bits gives you a little over a million possible values.  24 bits gives you over 16 million possible values.  And finally, 32 bits gives you a little over 4 billion possible values.  That should give you an exact (for the small ones) or an approximate (for the larger ones) feel for various commonly encountered powers of 2.
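The exact values behind those round numbers are easy to print.  This is just a small Python sketch; values_for is a hypothetical helper name, and the comma formatting is only there to make the big numbers readable.

```python
def values_for(bits: int) -> int:
    """Number of distinct values that `bits` binary digits can hold."""
    return 2 ** bits

for bits in (8, 10, 16, 20, 24, 32):
    # ',' in the format spec inserts thousands separators
    print(f"{bits:2} bits -> {values_for(bits):,} values")
```

The last line it prints is 4,294,967,296, the "little over 4 billion" for 32 bits.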

Let me get ahead of myself for a moment and tell you that in IPv4 (I'll get to what IPv4 is in a later post), a computer is assigned a 32 bit number.  That means that IPv4 has room for a little over 4 billion different addresses, and so roughly 4 billion computers.  We are going to have to talk about specific IPv4 computer numbers and understand how to do tricks with them.  But for the moment let's forget the what and the how of this 32 bit number and just focus on the number itself.  Specifically, let's focus on how we represent it.

There are two obvious choices.  First, we can just treat it like a standard decimal number.  It would have up to ten digits, about the length of a ZIP+4 code.  And network people could have gone down this path.  But they didn't.  The other obvious choice would be to list a string of 32 characters consisting of zeroes and ones.  That would have been an unwieldy approach, so it too was discarded.  Then there are some other approaches that are popular within the computer community.  Three binary digits can be grouped together to form an octal digit, represented by the digits 0-7.  That is a common trick.  The resulting number would have shrunk to 11 characters.  That's doable.  Then there's the more modern variant on octal, hexadecimal digits.  Here 0-9 are augmented with a-f so that there are sixteen characters available.  That would result in a pretty manageable 8 character number.  It could have been used, but it too wasn't.
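To see how long each of those candidate representations gets, here is a Python sketch that writes the largest possible 32-bit value in each base using Python's built-in format specifiers.  The helper name representations is hypothetical, chosen just for this example.

```python
def representations(n: int) -> dict[str, str]:
    """Show one 32-bit number in each of the bases discussed above."""
    return {
        "decimal": format(n, "d"),
        "binary": format(n, "032b"),   # padded out to a full 32 bits
        "octal": format(n, "o"),
        "hexadecimal": format(n, "x"),
    }

largest = 2 ** 32 - 1  # 4,294,967,295, the biggest 32-bit value
for base, text in representations(largest).items():
    print(f"{base:12} {text:32} {len(text)} characters")
```

The lengths come out to 10, 32, 11, and 8 characters, matching the comparison in the paragraph above.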

Instead a hybrid scheme was adopted.  The 32 bits were broken into 4 eight-bit subgroups.  Each subgroup is technically called an octet.  The more common term for a tight grouping of 8 bits is a byte, but on some older computers a byte was not always 8 bits, so the standards documents prefer the unambiguous "octet".  I will use byte and octet interchangeably.  A decimal number in the range 0-255 is used to represent the value of each octet.  So an IPv4 address is commonly expressed in the form a.b.c.d, where a, b, c, and d are each replaced with a number between 0 and 255.  This would seem to be almost as clumsy as just using a number between zero and four billion, but it isn't.  One reason is that all modern computers use memory that is divided into bytes.  So each number represents its own byte, and computer people find themselves translating from the 8 bit binary representation of a byte to the decimal equivalent and back all the time.  So they've gotten good at it.  And here's one trick they commonly use.
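Before getting to that trick, it may help to see the a.b.c.d split itself.  In this Python sketch (to_dotted and from_dotted are hypothetical names), each octet is peeled off the 32-bit number by shifting it right and keeping the low 8 bits, and the reverse direction shifts each octet back into place.

```python
def to_dotted(n: int) -> str:
    """Split a 32-bit number into four octets, most significant first."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit number")
    octets = [(n >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)

def from_dotted(text: str) -> int:
    """Rebuild the 32-bit number from its a.b.c.d form."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four octets")
    n = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"octet out of range: {octet}")
        n = (n << 8) | octet  # shift earlier octets left to make room
    return n

print(from_dotted("192.168.1.10"))  # 3232235786
print(to_dotted(3232235786))        # 192.168.1.10
```

Python's standard ipaddress module can do the same conversion, for example int(ipaddress.IPv4Address("192.168.1.10")); the hand-written version is only meant to show what is going on underneath.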

You need to become familiar with the following sequence:  1,2,4,8,16,32,64,128.  We have seen that the "2, 4, 8" part of this sequence is made up of powers of two.  And 1 is also a power of two.  It is two raised to the zero-th power.  In fact all the numbers in the list are powers of two.  2 is two raised to the first power, 4 is two raised to the second power, and so on.  The list ends with 128, which is two raised to the seventh power.

This "powers of two" trick is used to move between the binary representation and the decimal representation of the same thing.  Let's say we want a binary pattern of alternating 0s and 1s.  First, line up our list in reverse order:  128, 64, 32, 16, 8, 4, 2, 1.  Count our bit pattern from left to right.  We want the odd-numbered bits to be 0 and the even-numbered ones to be 1.  So we take the even entries in the list, the ones that correspond to the locations we want to be a 1.  We end up with 64, 16, 4, and 1.  Add them up, ignoring the entries in the list that correspond to the locations we want to be a 0.  The result is 85.  A decimal 85 converted to binary will result in 01010101, just the pattern we want.  We can use a similar trick to go backwards.
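Here is the same "add up the place values" trick as a small Python sketch.  The names PLACE_VALUES and bits_to_decimal are hypothetical; the list is just the reversed powers-of-two sequence from above.

```python
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]  # the list in reverse order

def bits_to_decimal(pattern: str) -> int:
    """Add up the place values wherever the 8-bit pattern has a 1."""
    if len(pattern) != 8 or set(pattern) - {"0", "1"}:
        raise ValueError("expected exactly 8 characters of 0s and 1s")
    return sum(value for bit, value in zip(pattern, PLACE_VALUES) if bit == "1")

print(bits_to_decimal("01010101"))  # 64 + 16 + 4 + 1 = 85
```

Pairing each bit with its place value (zip) mirrors lining up the pattern under the list by hand.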

Start with the number 204.  Subtract the biggest number in the list that is not larger than it, which is 128.  That leaves 76.  Subtract 64, leaving 12, then 8, leaving 4, then 4, leaving 0.  Now use our reverse order list.  Traverse the list, putting down a 1 for each number that was used in breaking 204 into pieces (128, 64, 8, and 4) and a 0 for each number that wasn't.  We get 11001100, and that is the binary equivalent of 204.  This seems complex, but it works.  And it gets pretty easy with some practice.  Finally, here are two special cases that come up often enough to remember - the decimal number 0 = 00000000 and the decimal number 255 = 11111111.
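Going backwards works the same way in code.  This Python sketch (decimal_to_bits is a hypothetical name) walks the reversed list, writes a 1 and subtracts whenever the place value still fits, and writes a 0 otherwise.

```python
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]

def decimal_to_bits(n: int) -> str:
    """Subtract place values largest-first; each one that fits becomes a 1."""
    if not 0 <= n <= 255:
        raise ValueError("an octet must be between 0 and 255")
    bits = []
    for value in PLACE_VALUES:
        if value <= n:
            bits.append("1")
            n -= value
        else:
            bits.append("0")
    return "".join(bits)

print(decimal_to_bits(204))  # 11001100
```

In everyday Python you would just write format(204, "08b"), which gives the same string; the loop is there to show the subtraction steps.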

Looking ahead again, IPv4 addresses are divided into nets and subnets.  The way this is done is with something called a subnet mask.  The first part of the mask consists of all 1s - that's the net part.  The rest of the mask consists of 0s - that's the subnet part (most networking documentation calls it the host part).  With what we now know we can translate some popular subnet masks.  255.255.255.0 turns the first 24 bits on and leaves the last 8 bits off.  The net part is a number between 0 and 16 million (roughly).  The subnet part is a number between 0 and 255.
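Translating a mask into its bits is just the byte conversion applied four times.  In this Python sketch, mask_bits is a hypothetical helper; it assumes the mask is valid, so that counting the 1s gives the length of the net part.

```python
def mask_bits(mask: str) -> str:
    """Write a dotted mask as 32 binary digits, 8 per octet."""
    return "".join(format(int(octet), "08b") for octet in mask.split("."))

bits = mask_bits("255.255.255.0")
# For a valid mask the 1s all come first, so counting them gives the net length.
net_bits = bits.count("1")
subnet_bits = 32 - net_bits
print(bits)
print(f"net part: {net_bits} bits, 0-{2 ** net_bits - 1:,}")
print(f"subnet part: {subnet_bits} bits, 0-{2 ** subnet_bits - 1:,}")
```

For 255.255.255.0 this reports a 24-bit net part (0-16,777,215) and an 8-bit subnet part (0-255).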

Two other common subnet masks are 255.0.0.0 (8 bits of net - 0-255 and 24 bits of subnet - 0-16 million) and 255.255.0.0 (16 bits of net - 0-65,535, and 16 bits of subnet - 0-65,535).  Does that mean that the net/subnet break must happen between bytes?  No.  That's just a common way of doing it.
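You can also run things the other way: pick how many bits the net part should have and build the mask.  This Python sketch (mask_from_length is a hypothetical name) shifts a run of 32 ones left to leave the right number of 0s at the end, then splits the result into octets.  Note that 20 is not a multiple of 8, and it still produces a sensible mask.

```python
def mask_from_length(net_bits: int) -> str:
    """Build a dotted mask from the number of leading 1 bits (0 to 32)."""
    if not 0 <= net_bits <= 32:
        raise ValueError("the net part must be 0 to 32 bits")
    # Shift 32 ones left, then keep only the low 32 bits.
    n = (0xFFFFFFFF << (32 - net_bits)) & 0xFFFFFFFF
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))

for net_bits in (8, 16, 24, 20):
    print(net_bits, mask_from_length(net_bits))
```

This prints 255.0.0.0, 255.255.0.0, 255.255.255.0, and finally 255.255.240.0 for a 20-bit net part.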

The following is a legal mask:  255.255.240.0.  Let's take a close look to see why.  First, we have a number of octets that have a value of 255.  That means that all the bits in those octets are 1s.  Then we have one octet that has a funny value.  That means some of its bits are 1s and some of them are 0s.  We'll take a closer look at it shortly.  Finally, all the octets after the funny valued one are 0.  That means they are all 0s.  If you see this pattern then you know the subnet mask is valid if the funny octet is valid, meaning all of its 1s come before all of its 0s.  That limits the funny octet to 128, 192, 224, 240, 248, 252, or 254.  So let's decompose our funny value.  240 - 128 = 112.  112 - 64 = 48.  48 - 32 = 16.  16 - 16 = 0.  Walking our reverse list yields a pattern of 11110000.  Putting it all together (8 + 8 + 4 bits), you will find the net part of the mask is 20 bits long - 0 to about 1 million, and the subnet part of the mask is 12 bits long - 0-4095.
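The decomposition of 240 can be sketched in Python too.  Here pieces is a hypothetical helper that returns the place values an octet is built from; each piece is one 1 bit, so for a valid mask, counting the pieces in every octet gives the length of the net part.

```python
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]

def pieces(octet: int) -> list[int]:
    """Break an octet into the place values it is built from, largest first."""
    used = []
    for value in PLACE_VALUES:
        if value <= octet:
            used.append(value)
            octet -= value
    return used

mask = [255, 255, 240, 0]
net_bits = sum(len(pieces(octet)) for octet in mask)  # one piece per 1 bit
print(pieces(240))                                   # [128, 64, 32, 16]
print(net_bits, 32 - net_bits)                       # 20 12
print(2 ** net_bits - 1, 2 ** (32 - net_bits) - 1)   # 1048575 4095
```

This only counts 1s; it assumes the mask has already been checked for the "1s first, then 0s" shape.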

Are there any invalid subnet masks?  Yes!  In fact, most 32-bit patterns are invalid as masks; only 33 of them (anywhere from zero to 32 leading 1s) are valid.  A valid subnet mask always consists of a number of 1s followed by all 0s.  If you check all the above examples you will find that this is true in each case.  But a subnet mask of 255.255.0.255 is invalid.  Why?  Because it consists of 16 1s followed by 8 0s followed by 8 1s.  Once you hit a 0, all the remaining bits must also be 0.
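The "once you hit a 0, everything after it must be 0" rule translates into a very short check.  In this Python sketch (is_valid_mask is a hypothetical name), the mask is written out as 32 bits, and it is invalid exactly when a 0 is immediately followed by a 1 somewhere, which shows up as the substring "01".

```python
def is_valid_mask(mask: str) -> bool:
    """True if the 32 bits are some 1s followed only by 0s."""
    parts = mask.split(".")
    if len(parts) != 4 or not all(p.isdigit() and int(p) <= 255 for p in parts):
        return False
    bits = "".join(format(int(p), "08b") for p in parts)
    # Any 0 that is later followed by a 1 creates a "01" somewhere in between.
    return "01" not in bits

for mask in ("255.255.240.0", "255.255.255.0", "255.255.0.255"):
    print(mask, is_valid_mask(mask))
```

The first two print True and 255.255.0.255 prints False, matching the examples in the text.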

You have now survived the really boring part.  The rest of the posts in this series have some actual meat in them.
