Tuesday, September 29, 2015

Internet - DNS

This is the sixth post in this series.  I recommend reading them in sequence.  The first post is at http://sigma5.blogspot.com/2015/09/internet-bits-bytes-and-numbers.html.  The immediately previous one is at http://sigma5.blogspot.com/2015/09/internet-routing-and-nat.html.  I had hoped to wrap things up with this post but doing so would make it run too long.  So this post is short and only covers one subject, DNS.

What is DNS and why do we care?  Computers love numbers.  People love text.  So people like dealing with something like www.google.com rather than 172.25.11.23 (not Google's actual IP address).  DNS, short for Domain Name System, is what gets us from here to there.  The thing that does this is called a DNS server.  In the earliest days of ARPANET there were only a few computers to keep track of.  A simple list worked fine.  But now there are "millions and billions" of web sites, mobile applications, etc., on the 'net.  We have long since moved past the time when a list could get the job done.

This was recognized fairly early in the transition from that first ARPANET to the modern Internet.  The first practical and comprehensive solution was called BIND - the Berkeley Internet Name Domain.  The University of California at Berkeley Computer Science department was an early and active participant in Unix, the "C" computer language, and the Internet.  They developed a lot of tools and enhancements and one of them was BIND.  BIND is still around although responsibility has been turned over to the Internet Systems Consortium.  You can find out more at https://www.isc.org/downloads/bind/.  But most of us only need to use DNS servers.  We don't need to install or operate our own DNS server.  But let's find out a little about how they work anyway.

DNS is not a single server.  It is a swarm of cooperating servers.  And their job is to turn something like www.google.com into something like 172.25.11.23.  So how do they do this?  A DNS server is mostly just a simple database.  You ask questions.  It provides answers.  Actually it serves up the data from records that are appropriate responses to our query.  The most popular record type is the "A" for Address record.  If a DNS server has an A record that has a key of www.google.com it serves up the data value (172.25.11.23, in our example).  But in many cases it does not have an A record that is an exact match.  What then?
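To make the "simple database" idea concrete, here is a minimal sketch in Python.  It is not how any real DNS server is written.  The dictionary, the function name, and the address (the made-up one from above) are all assumptions for illustration.

```python
# Toy record store: (name, record type) -> data.  All values are invented.
RECORDS = {
    ("www.google.com", "A"): "172.25.11.23",  # the made-up address from this post
}


def lookup(name, rtype="A"):
    """Return the data for an exact match, or None if there is no such record."""
    return RECORDS.get((name, rtype))


print(lookup("www.google.com"))   # an exact match: the data value comes back
print(lookup("www.example.com"))  # no exact match: this toy server has nothing to say
```

The second call is the "What then?" case: with no exact match the server has to ask somebody else.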

Would it surprise you to know that, as with a lot of Internet things, DNS is into this whole delegation thing?  Various servers provide part of the answer then pass things on to a server that knows more.  So all blind queries (queries that are not in the wheelhouse of whichever DNS server we are using) are passed on to a "root" server.  Where are the "root" servers?  That's one of those Internet "well known" things.  There is an official list of root servers and you can find out what they are if you know where to look.  Anyone who messes with DNS servers knows where this list is found and it is plugged into each DNS server.  So what our server does is pass the question on to a root server.

And this whole delegation thing proceeds from back to front.  The only thing root servers know about is where that last part is.  You know, the ".COM" or ".EDU" or ".GOV" or whatever thing.  The root servers don't care about the rest of it.  They know about that last part and where to send you to find out more information.  They have a list of DNS servers that specialize in ".COM" DNS entries, for instance.  So they just pass any ".COM" queries along to a ".COM" DNS server.  (We don't need to know where the ".COM" DNS servers are because the root servers know.)  This server knows about a whole bunch of "anything.COM" things.  It has no clue about "anything.EDU" or "anything.GOV" or "anything.whatever" if "whatever" is anything but ".COM".  But different servers do know about ".EDU", for instance.  The root server passes queries about "anything.EDU" to one of them.  The same is true for ".GOV", ".NET", and so on.

There is an official list of all the legal top levels.  (They are the last part of the name but they are called "top" levels because they represent the top of the tree of interconnected DNS servers.)  The list has to be official because every root server needs entries for all of them.  The servers that handle ".COM", ".NET", etc. are called "top level servers".  Originally there were only a few top levels.  But the criteria have been loosened up several times now.  Why was the list kept short to begin with?  Well, there is a little work necessary to get a new top level to work.  But the main reason was a concern over confusion.  People are now pretty comfortable with things like ".CO" or ".TV" or ".US" so there now seems little reason to keep the list short.  Also, the Internet has slowly become more international and part of that has been support for various languages.  There are now top levels in Chinese and Arabic, for instance, languages that don't even use our standard alphabet.  Returning to our example, hopefully the ".COM" DNS servers have an entry for ".GOOGLE.COM".

But that entry is just the name of another DNS server.  This server knows about "anything.GOOGLE.COM".  It knows nothing about "anything.MICROSOFT.COM" or "anything.FACEBOOK.COM".  And it especially knows nothing about "WH.GOV" (different top level).  So this ".GOOGLE.COM" DNS server has (again, if everything is working ok) an entry for WWW.GOOGLE.COM.  It serves it up to your computer and your computer finally knows what IPv4 address to send its traffic to.  Lots of messages are exchanged between lots of DNS servers.  But that's traffic that happens in the background where you don't see it.  And it is perfectly legal to go more levels.  The process just starts at the top level and works its way through the layers until it gets to the end at whatever level that might be.
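The back-to-front walk described above can be sketched as a toy simulation in Python.  Everything here is an assumption made for illustration: the server names ("com-server" and so on), the tables, and the made-up address.  Real servers exchange network messages rather than sharing dictionaries.

```python
# Each toy "server" is a table mapping a name to either the next server to ask
# or, at the end of the chain, an address.  All names and data are invented.
ROOT = {"com": "com-server", "gov": "gov-server"}
SERVERS = {
    "com-server": {"google.com": "google-server"},
    "google-server": {"www.google.com": "172.25.11.23"},
    "gov-server": {},
}


def resolve(name):
    """Walk the delegation chain; return (address or None, list of steps taken)."""
    labels = name.lower().split(".")
    table = ROOT
    steps = []
    # Work from the last label toward the first:
    # "com", then "google.com", then "www.google.com".
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])
        answer = table.get(suffix)
        if answer is None:
            return None, steps          # nobody at this level knows the name
        steps.append((suffix, answer))
        if answer in SERVERS:
            table = SERVERS[answer]     # delegated: ask the next server down
        else:
            return answer, steps        # a final address, we are done
    return None, steps


address, steps = resolve("WWW.GOOGLE.COM")
for suffix, answer in steps:
    print(suffix, "->", answer)
```

Adding more levels needs no new code.  The loop just keeps going until it runs out of labels or gets a final answer.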

That's the standard way things work.  I am now going to talk about a couple of variations.  Let's say we ask our local DNS server about WWW.FACEBOOK.COM.  The first time through it goes through the elaborate process I described above.  But we often go to the same place many times.  So DNS servers have a cache.  This is a place where the answers to recent questions are saved.  For any query that is outside the direct responsibility of that DNS server the query is first checked against the cache before the elaborate process described above is undertaken.

If it finds an entry for WWW.FACEBOOK.COM in the cache (because somebody asked about it recently) it just sends a "non-authoritative" answer back.  Why non-authoritative?  Because only one or a few DNS servers are designated as the official repositories for WWW.FACEBOOK.COM DNS information.  They are the "authoritative" DNS servers.  All DNS entries in the cache of non-authoritative servers are also marked with a "time to live", which can be anywhere from a few minutes to a few days.  If a company like Facebook wants to move its servers around it will need to update the DNS entries for these servers.  Confusion will reign if the old information does not eventually flush out of the Internet.
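Here is a minimal sketch of a cache with a "time to live", written in Python.  The class name, the injectable clock, and the address are assumptions made for illustration.  Real DNS software is considerably more involved.

```python
import time


class DnsCache:
    """Toy cache: remembers answers until their time to live runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the example can fake the passage of time
        self._entries = {}       # name -> (address, time at which the entry expires)

    def put(self, name, address, ttl_seconds):
        self._entries[name.lower()] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None                  # never asked, or already flushed
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]       # stale: flush it and force a fresh lookup
            return None
        return address


fake_now = [0.0]
cache = DnsCache(clock=lambda: fake_now[0])
cache.put("WWW.FACEBOOK.COM", "172.25.11.99", ttl_seconds=60)  # invented address
print(cache.get("www.facebook.com"))   # still fresh, answered from the cache
fake_now[0] = 61.0
print(cache.get("www.facebook.com"))   # expired, so the full lookup must be redone
```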

Another non-standard thing is an alias record, officially called a CNAME (Canonical Name) record.  Let's say your company buys another company and you want to merge web sites.  You could just shut one down but that would mean that customers who had links to the shut down site would be stuck.  The CNAME record solves that.  You just put a CNAME record in the appropriate DNS servers.  The CNAME record says "if someone asks about WWW.X.COM answer the question as if they had actually asked about WWW.Y.COM instead".  This way everything that was linked to the old location gets automatically connected up with the new location.  CNAME records actually have a lot more uses than this but that's enough to give you the idea.
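The alias idea can be sketched in a few lines of Python.  This is a toy, assuming the standard DNS name for an alias record (CNAME) and an invented record table.  The hop limit is my own addition to keep a badly configured loop of aliases from running forever.

```python
# Invented records: www.x.com is an alias for www.y.com.
RECORDS = {
    ("www.x.com", "CNAME"): "www.y.com",
    ("www.y.com", "A"): "172.25.11.23",
}


def resolve(name, max_hops=8):
    """Follow aliases until an address record is reached."""
    name = name.lower()
    for _ in range(max_hops):
        alias = RECORDS.get((name, "CNAME"))
        if alias is None:
            return RECORDS.get((name, "A"))  # no alias: answer directly (or None)
        name = alias                         # answer as if the alias target was asked about
    return None                              # too many hops, probably a loop


print(resolve("WWW.X.COM"))  # same answer as asking about www.y.com
print(resolve("www.y.com"))
```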

And it turns out that there are many different kinds of DNS records.  I am only going to talk about two more.  One is the MX record.  You can ask a DNS server "what's the mail server for GOOGLE.COM?"  The DNS server will go looking for MX records instead of A records.  An MX record answers with the name of a mail server, which can then be looked up like any other name.  This kind of thing is handy in a number of ways.  Have you noticed that sometimes companies leave the "WWW" off?  DNS tricks like these allow computers to find the same web site either way.  Try just "GOOGLE.COM" sometime.  It gets you to the same place as you would get to by putting the "WWW." on the front.
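Here is a rough sketch in Python of how a mail lookup differs from a web lookup.  The mail host names and every address below are invented for illustration.  One real detail is included: MX records carry a preference number, and the lowest number is tried first.

```python
# Invented records.  An MX entry is a list of (preference, mail host name).
RECORDS = {
    ("google.com", "MX"): [(20, "mail2.google.com"), (10, "mail1.google.com")],
    ("mail1.google.com", "A"): "172.25.11.40",
    ("mail2.google.com", "A"): "172.25.11.41",
    ("google.com", "A"): "172.25.11.23",      # the name without the "WWW"
    ("www.google.com", "A"): "172.25.11.23",  # and with it: same place
}


def mail_server_address(domain):
    """Find the preferred mail host for a domain, then look up its address."""
    mx_records = RECORDS.get((domain.lower(), "MX"), [])
    for _, host in sorted(mx_records):        # lowest preference number first
        address = RECORDS.get((host, "A"))
        if address is not None:
            return host, address
    return None


print(mail_server_address("GOOGLE.COM"))
print(RECORDS[("google.com", "A")] == RECORDS[("www.google.com", "A")])
```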

The other record I want to talk about is the PTR record.  "A" records get you from the name to the IPv4 address.  PTR records get you back the other way.  Not everybody sets up PTR records all the time but in a lot of cases people do.  If so you can find out what the text name that goes with an IP address is.  This turns out to be tricky.  To make this work you need to know the net/subnet boundaries.  But it is literally impossible, in general, to know what they are.  So what to do?  The PTR system assumes that IPv4 addresses are broken on octet boundaries.  And, to allow for this delegation thing to work, we have to go at things backwards.  172.25.11.23 is temporarily turned around.  We pretend for a minute it really is 23.11.25.172.  That's because we want to look the "172" part up first, and so on.  (The real lookup name also has a special suffix, "in-addr.arpa", tacked on the end.)  Cutting to the chase, instead of eventually looking for "WWW" in the ".GOOGLE.COM" DNS server we look for "23" in the "11.25.172" DNS server.  If you are not a little confused at this point I have failed.
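The turning-around step is easy to show in Python.  The standard library's ipaddress module has a reverse_pointer attribute that builds the name a PTR lookup actually uses, including the special "in-addr.arpa" suffix that reverse lookups for IPv4 live under.  The hand-rolled version is only there to show what is going on; the function names are my own.

```python
import ipaddress


def ptr_name(address):
    """Name to query for a PTR record, built by the standard library."""
    return ipaddress.ip_address(address).reverse_pointer


def manual_ptr_name(ipv4):
    """The same thing by hand: reverse the octets, then add the special suffix."""
    octets = ipv4.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(ptr_name("172.25.11.23"))         # 23.11.25.172.in-addr.arpa
print(manual_ptr_name("172.25.11.23"))  # 23.11.25.172.in-addr.arpa
```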

But let me see if I can confuse you even more.  We have these authoritative DNS servers.  Why?  So everyone everywhere who asks a specific DNS question gets the exact same answer.  But what if we want different people to get different answers?  Google has server farms all over the place.  They want to send you to one of their local farms rather than requiring you to gallivant across the country or the world to get to a Google server.  A number of other large companies like Microsoft, Facebook, and Amazon want to do the same thing.  There's the trick.

What if we have a super-DNS server that not only looks at the question but looks at the IP address of the computer asking the question?  If we know the general location of the computer making the DNS query (we can usually get at least an OK idea by doing a lot of research) then we can serve up an IPv4 address answer that is for a server that is in a server farm that is close (in terms of the Internet) to the computer generating the query.  There is a company called Akamai that does just that.

Their services are not cheap but they are "cheap at the price" for large companies.  The ".COM" DNS server just points to a special Akamai DNS server whenever someone asks about ".GOOGLE.COM" (or any of the others).  Akamai fakes things up on the fly so that people asking for the IPv4 address of WWW.GOOGLE.COM get a different answer if they are asking from a computer located in Europe than they get if they are asking from a computer located in San Francisco, which is close to Google's US headquarters.
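The "look at who is asking" trick can be sketched in Python.  This is only a guess at the general shape, not how any particular company does it.  The client address ranges, the regions attached to them, and the answers handed out are all invented.

```python
import ipaddress

# (network the question came from, address to hand out).  All values are invented.
REGIONAL_ANSWERS = [
    (ipaddress.ip_network("10.1.0.0/16"), "172.25.11.23"),  # pretend: near San Francisco
    (ipaddress.ip_network("10.2.0.0/16"), "172.25.99.23"),  # pretend: somewhere in Europe
]
DEFAULT_ANSWER = "172.25.11.23"  # used when we have no idea where the asker is


def answer_for(client_ip):
    """Pick an answer based on the address of the computer asking the question."""
    client = ipaddress.ip_address(client_ip)
    for network, address in REGIONAL_ANSWERS:
        if client in network:
            return address
    return DEFAULT_ANSWER


print(answer_for("10.1.5.5"))  # the pretend San Francisco asker
print(answer_for("10.2.5.5"))  # the pretend European asker gets a different answer
```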

Let me finish up with several quick items.

DNS entries are not case sensitive.  So www.google.com is the same as WWW.GOOGLE.COM is the same as WwW.gOoGle.CoM, or any other pattern of capitalization that floats your boat. 
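In code the usual way to get this behavior is to fold every name to one case before comparing or storing it.  A tiny Python sketch, with a function name of my own choosing:

```python
def normalize(name):
    """Fold a DNS name to lower case so differently capitalized spellings compare equal."""
    return name.lower()


spellings = ["www.google.com", "WWW.GOOGLE.COM", "WwW.gOoGle.CoM"]
print({normalize(s) for s in spellings})  # a set with a single entry
```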

IPv6 - DNS servers can handle a mixture of IPv4 and IPv6.  A new record type has been added for IPv6.  It plays the same role as the "A" record but is called an "AAAA" record.  The data part of an AAAA record contains an IPv6 address.  The PTR record handles IPv6 without requiring a new record type.  The turned-around names just live under a different special suffix, "ip6.arpa".
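Here is a small Python sketch of one name carrying both kinds of address record.  The record table is invented, and the IPv6 address comes from the 2001:db8:: range that is set aside for documentation.  The last line uses the standard library to build the name a reverse (PTR) lookup for an IPv6 address would use, which ends in "ip6.arpa".

```python
import ipaddress

# Invented records: one name, one record of each address type.
RECORDS = {
    ("www.google.com", "A"): "172.25.11.23",
    ("www.google.com", "AAAA"): "2001:db8::23",  # documentation-only IPv6 range
}


def addresses(name):
    """Collect the IPv6 (AAAA) and IPv4 (A) addresses recorded for a name."""
    found = []
    for rtype in ("AAAA", "A"):
        data = RECORDS.get((name.lower(), rtype))
        if data is not None:
            found.append(ipaddress.ip_address(data))
    return found


for address in addresses("WWW.GOOGLE.COM"):
    print(address.version, address)

print(addresses("www.google.com")[0].reverse_pointer)  # name used for the IPv6 PTR lookup
```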

You can directly query DNS servers by using a command called NSLOOKUP.  You can find information on it on the web, if you are interested.

If you run a Windows PC you can execute the IPCONFIG command in a "CMD" box.  "ipconfig /all" will tell you more than you probably wanted to know about various network settings.

If you poke around you will probably find that a "primary" and a "backup" DNS server are configured on your machine.  Why?  Because DNS is so critical.  The only time the backup entry is used is when there is a problem using the primary entry.
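The primary/backup behavior amounts to "try the first one, and only move on if it fails".  A toy Python sketch follows.  The ask function stands in for a real network query, and the server addresses come from ranges reserved for documentation.  All of it is assumed for illustration.

```python
def ask(server, name):
    """Stand-in for a real network query; raises if the server does not respond."""
    if server == "192.0.2.53":  # pretend the primary server is down
        raise TimeoutError("no response from " + server)
    return {"www.google.com": "172.25.11.23"}.get(name.lower())


def resolve(name, servers, query=ask):
    """Try each configured server in order; fall back only when one fails."""
    last_error = None
    for server in servers:
        try:
            return server, query(server, name)
        except OSError as error:  # TimeoutError is a kind of OSError
            last_error = error    # remember the problem and try the next server
    raise RuntimeError("no DNS server answered") from last_error


# The primary comes first, the backup second.
print(resolve("www.google.com", ["192.0.2.53", "198.51.100.53"]))
```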

DNS server entries are always specified as an IP address (usually IPv4 but possibly IPv6).  Why?  Because if DNS is not working how do you turn a name into an IP address in order to know where to go to get DNS information?

Finally, where do the addresses for a primary and backup DNS server come from?  In most cases "magic".  I will get into this a little more in a future post.
