Wednesday, April 6, 2016

Positive Identification

"Who are you?"
"Jane Doe."
"Prove it!"

Some variation of the above dialog is now a common part of our lives.  It is frequently boiled down to "Show me your picture ID."  The picture ID contains a name, a picture, and typically other information.  The name provides the answer to the question.  The rest of the information on the ID and the fact that you possess it provides the proof.  The connection between you and the ID is provided by the picture.  Presumably you and your picture can be compared to see if there is a match.

There are variations.  It is becoming more and more common for your smartphone to stand in for your picture ID.  And the degree to which the "proof" actually validates your identification varies.  Bartenders just want to know if you are old enough to drink legally.  The TSA wants to be really sure you are not a terrorist.  And then there is the sad situation with which the title of this post is most commonly associated.  Someone may need to confirm the identity of a deceased person.

With the background established let's look at the process of positive identification as it was, as it is, and as it will soon be.  The times they are a-changin'.  Let's start with the "was" part and for that I want to go back a thousand years.

A thousand years ago almost everyone lived on a small farm or in a small village.  Almost everyone farmed, fished, or was otherwise engaged in the process of growing and harvesting food.  And at the time almost everyone was illiterate.  Paper had not yet come into use in most of Europe.  The alternatives that existed at the time (e.g. parchment) were all extremely expensive and only available in tiny quantities.  So in that environment how were people identified?

Most people spent their entire lives within a few miles of where they were born.  Everyone knew everyone else in the neighborhood by sight.  You saw people at home or on market days or at feast day events.  And you saw them for their entire lifetime.  At some point a woman would be pregnant.  Then she would show up with a small child.  The child would grow up, become a parent and die.  And the community observed all this.  Identification was not absolute but it was good enough for the situation.  You knew who farmed what piece of land and who their children were.

And frankly positive identification was not that important.  People were poor so they had few possessions and even fewer valuable ones.  Land ownership was mostly governed by the "possession is nine points of the law" rule.  There were no accurate surveys and all the land was probably technically owned by the local feudal lord anyhow.

Oh, there were foreigners.  Someone from outside would occasionally wander by but this did not happen often.  If the wanderer was a trader how much did it matter who they were?  They showed up, traded, and moved on.  The trade goods were important.  The identity of the trader was not.  The other group who would show up occasionally were members of the power elite.  It might be a soldier or a priest.  If a particular soldier was the top dog he became the local feudal lord.  Other soldiers either worked for him or there was a power struggle.  Eventually someone came out on top and the others ended up dead, part of the lord's operation, or they moved on.  The feudal lord was in a position to assert his authority by means of his ability to kill or maim you.

So the locals tended to take him at his word as to who and what he was.  The niceties of the law and whose authority was more legitimate tended to be less important than who won the power struggle.  The other source of authority and power was the religious establishment: priests and the like.  If there was a power struggle between religious factions the rules of engagement were different (less blood, more politicking) but who stayed and who was pushed out mostly depended on who was supported by the local feudal lord.  And again, the peasantry tended to take whoever won at their word.  So in this period positive identification had little real practical meaning.

Eventually paper came into use, the technology for making it cheaply spread broadly, and it became practical to keep paper records.  This ushered in the era when marriages, births, and deaths started to become routinely recorded.  For a long time the process was haphazard.  A record might be maintained at the local church or in a family bible.  How reliable was this information?  One assumed that it was fairly reliable.  But this assumption rested to a great extent on past practice.

Usually people in the community were around to testify to the accuracy of the information, at least until enough time had passed that all the eyewitnesses had died.  After that inertia set in.  Records that had been accepted in the past continued to be accepted.  Beyond that, old records came to be seen as accurate records, mostly because they were old.  And it was certainly possible for a record to be fudged.  A marriage could retroactively be added to a church register or a family bible.  And the same process could be used to erase or alter entries.  People went with these records as much because they were the only practical option as for any other reason.

Not that long ago governments started taking over responsibility.  They started issuing birth certificates, marriage licenses, and certificates of death.  And more people were born in a hospital with a physician in attendance.  But in a certain sense the foundation the process rested on had not changed.  Someone filled out a form.  The information was only as dependable as its source and its source was some person.  The person might be the mother or the doctor or a hospital employee.  And, in the case of the doctor or hospital employee, they might be relying on some stranger for the information they were entering.  I suspect that most of the time little effort was made to corroborate it.

The piece of paper has now been replaced by a computer screen and the data no longer resides on a piece of paper in a file cabinet.  It now resides in a computer file somewhere.  And this highlights a fundamental problem.  It's just data.  And more problematic than that is this.  How do we know that a particular birth certificate is actually the birth certificate of a particular person?  The surprising answer is that we don't.  But that can, and I expect that it will, change in the near future.

We have all been exposed to this sort of thing due to the "birther" controversy.  A lot of people, most notably Donald Trump, have spent a lot of time and gotten a lot of media coverage contending that President Obama was not born in Hawaii in 1961.  A lot of their argument is nonsense.  There is absolutely no doubt that a birth certificate was issued at the time and place the President contends it was.

An argument could be made that he is not the child that belongs to that birth certificate, that there were, in fact, two children.  This argument is logically consistent but it is not the argument that birthers make.  And there is a large body of evidence that there is only one child and he is that child.  In fact the connection between this birth certificate and the President is much stronger than the connection between Donald J. Trump and any birth certificate.  So the birther argument, such as it is, is about the wrong thing.  A fundamental question exists.  How do you definitively connect any person with any birth certificate?  And the answer is that in almost all cases you can't.

In some places at some times a footprint (like a fingerprint but of the bottom of a foot instead of the fingertips) of the child was routinely put on the back of birth certificates.  With such a birth certificate you can take a matching print of the foot of the individual in question and do a "fingerprint analysis" to see if it matches.  If it does then you can definitively match a specific birth certificate to a specific individual.

But I have never heard of this comparison being attempted.  As far as I can tell the "footprint on the birth certificate" procedure was never common and, in the cases where it was done, I know of no instances where a match was attempted later.  It probably happened but it was never common enough to feature in crime fiction, for instance.  It is easy to imagine Erle Stanley Gardner plugging it into a Perry Mason novel but he never did.  The fact that it was never a common crime fiction motif is evidence that it was never a common practice.

I have both a passport (expired) and an "enhanced" driver's license.  Both of these require a positive identification.  Having been there and done that I know the drill.  I show up, fill out some paperwork, provide a picture (or get one taken), and provide "positive identification".  What's positive identification?  Why a birth certificate, of course.  So I hand over a piece of paper for examination by a bureaucrat.  But what's the piece of paper?  In my case I have the original actual birth certificate that was issued at the time of my birth and it's a pretty ordinary looking piece of paper.  That's how it was done in the era that preceded the computerization of everything.

But I actually have two "birth certificates".  One of them is the aforementioned piece of paper.  The other is a "certified copy" of the piece of paper.  It is something called a Photostat.  A Photostat, as you can guess, is just a photograph, well actually a print of a photographic negative.  The only thing special about it is that it is embossed with an official stamp and an "I attest that this is an authentic copy of . . ." statement followed by the signature of some obscure bureaucrat.

Let's say I wanted someone to impersonate me.  I could keep my original birth certificate and give them the Photostatic version.  They could use that as the basis of a scheme to identify themselves as me.  So there could be two official me's running around.  And, in fact, a minor variation of this used to be commonly done.

People who wanted to change their identity would search newspaper death notices for someone who was born about the same time they were but who had died young.  They would then write the proper authority asking for a Photostatic copy of the birth certificate for this person.  At the time this was a routine bureaucratic procedure that did not require any kind of special documentation.  When it arrived they would then use the Photostatic birth certificate as the base on which to build up an entire false identity for themselves.  If they picked their dead person properly their chance of being caught out was infinitesimally small.

Spies, crooks, people on the run for political reasons, etc. did this routinely in the '60s.  You could even find "how to" manuals if you knew the right people.  One thing that helped then was that most people did not get a Social Security card and number until they entered the job market in their middle to late teens.  If you picked someone who had died at ten, say, your chances of fooling the Social Security Administration into issuing a card were very good.  It is now much harder to pull this kind of thing off because we are all now surrounded by a much larger, more complicated web of interconnection than we used to be.

Children now get issued a Social Security number at birth, for instance.  And as big data spreads its tentacles it becomes harder and harder to pull something like this off without setting an alarm off somewhere.  The federal witness protection people can still do it, but only because they can change Social Security and other government records.  But none of this changes the fact that there really is no completely reliable way to positively connect a specific birth certificate to a specific person.  But I believe that is going to change in the near future.

There now exists something called CODIS, the Combined DNA Index System.  This is the database that is used to do DNA matches in crime scene and other law enforcement (e.g. missing persons) situations.  The database contains over 12 million entries and continues to grow rapidly.  There is a considerable amount of duplication so it doesn't represent that many distinct individuals but the number of distinct individuals is somewhere in the millions.  Each entry contains enough information to identify a single specific individual with a very high degree of confidence.  Does that mean entries contain complete DNA sequences?  Far from it.  Instead each entry contains results for just 14 locations in the DNA.  Thirteen of them are based on something called an STR, a Short Tandem Repeat.  The specifics are complicated but the idea is simple.

An STR "locus" is a very short piece of DNA that varies wildly from person to person.  There are a bunch of variations possible for each STR locus, and they are numbered by how many times a short stretch of DNA is repeated.  Everyone carries two copies of each locus, one inherited from each parent, so the database records a pair of variation numbers for each of the 13 STR loci.  The 14th result is based on a person's amelogenin gene.  It has been included because the versions of the amelogenin gene that an individual carries tell us whether that person is a male or a female.
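To make the "short list of numbers" idea concrete, here is a minimal Python sketch of what such a profile might look like and how two profiles could be compared. It assumes each STR locus is stored as a pair of repeat counts (one copy from each parent) and that a match requires agreement at every locus plus amelogenin. The 13 locus names are the original CODIS core loci; the `Profile` layout, the "XX"/"XY" shorthand, and the allele values are illustrative assumptions, not the actual CODIS record format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# The 13 original CODIS core STR loci.
CORE_LOCI = (
    "CSF1PO", "FGA", "TH01", "TPOX", "vWA",
    "D3S1358", "D5S818", "D7S820", "D8S1179",
    "D13S317", "D16S539", "D18S51", "D21S11",
)

# Two repeat counts per locus, one inherited from each parent.
# Some variants are fractional, e.g. 9.3 at TH01.
Genotype = Tuple[float, float]


@dataclass
class Profile:
    """Hypothetical profile layout used only for this illustration."""
    strs: Dict[str, Genotype]  # locus name -> pair of repeat counts
    amelogenin: str            # "XX" (female) or "XY" (male), assumed shorthand


def _ordered(g: Genotype) -> Genotype:
    # The two copies are unordered: (12, 14) and (14, 12) are the same genotype.
    return (min(g), max(g))


def same_person(a: Profile, b: Profile) -> bool:
    """Declare a match only if amelogenin and all 13 core STR loci agree."""
    if a.amelogenin != b.amelogenin:
        return False
    return all(_ordered(a.strs[locus]) == _ordered(b.strs[locus])
               for locus in CORE_LOCI)


# Made-up allele values, for illustration only.
sample = Profile(
    strs={
        "CSF1PO": (10, 12), "FGA": (21, 24), "TH01": (6, 9.3),
        "TPOX": (8, 8), "vWA": (16, 17), "D3S1358": (15, 16),
        "D5S818": (11, 12), "D7S820": (10, 11), "D8S1179": (13, 14),
        "D13S317": (11, 12), "D16S539": (9, 12), "D18S51": (14, 17),
        "D21S11": (29, 30),
    },
    amelogenin="XY",
)
print(same_person(sample, sample))  # True
```

A single disagreeing locus is enough to rule out a match, which is why even a short list of loci is so discriminating.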

Only a few percent of the population has a specific version of a specific STR locus.  So different individuals are likely to have a different variation of the first STR locus.  But they could just by luck have the same variation.  But do they also have the same variant in the case of the second locus?  Here too it is very unlikely that two different people have the same variant but it is possible.  And so it goes.  Scientists have done the math and the likelihood that two different people who are not identical twins would have the same variant of all thirteen STR loci is a really tiny number.  It varies from case to case but it is unlikely that two non-twin individuals on Earth have the same variant of all thirteen STR loci.  And just to decrease the chances even more there is a move afoot to add several more STR loci to the standard list.
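The "scientists have done the math" step is essentially the product rule: if the loci are inherited independently, the chance that an unrelated person shares the whole profile is the product of the per-locus chances. Here is a minimal Python sketch, assuming simple Hardy-Weinberg genotype frequencies (p² for two identical copies, 2pq for two different ones). The allele frequencies below are made up to match the "few percent" figure above; they are not real population data.

```python
import math
from typing import Optional, Sequence


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Hardy-Weinberg genotype frequency at one locus.

    p and q are the population frequencies of the person's two variants.
    Same variant on both copies: p**2.  Two different variants: 2*p*q.
    """
    if q is None:
        return p * p
    return 2 * p * q


def random_match_probability(per_locus: Sequence[float]) -> float:
    """Product rule: multiply the per-locus genotype frequencies.

    This assumes the loci are inherited independently of each other.
    """
    return math.prod(per_locus)


# Hypothetical: at each of 13 loci the person carries two variants with
# population frequencies 0.2 and 0.15, a combination shared by 6% of people.
per_locus = [genotype_frequency(0.2, 0.15)] * 13
rmp = random_match_probability(per_locus)
print(f"{rmp:.1e}")  # 1.3e-16
print(f"1 in {1 / rmp:,.0f}")
```

Even with every locus shared by 6% of the population, the combined figure is roughly one in 7.7 quadrillion, far more than the number of people alive. Real calculations use measured frequencies for specific populations and adjustments this simple product ignores.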

It turns out that the amount of DNA in all fourteen loci used in this process is a tiny fraction of your whole genome.  It's way, way, way less than 1%.  But it is enough to get the job done, namely deciding if two DNA samples come from the same person or not.  And the basic technology for this was developed more than a decade ago.  In the meantime anything having to do with DNA has gotten a lot cheaper.

The original project to sequence the entire DNA of a single individual cost more than 3 billion dollars and took about a decade.  Now the complete DNA of a single individual can be done for about 10 thousand dollars and it's getting cheaper every year.  Scientists think the cost will drop to below a thousand dollars within the next few years.  And that's what it costs to sequence everything.  Analyzing just enough DNA to tell one person from another costs far less than that, and that cost is also dropping like a rock.  And of equal importance the gadgets that do CODIS-style analysis are also getting smaller and smaller.  And that opens up a lot of possibilities.

We as a society have been fighting over privacy for a long time now.  Before the Revolutionary War colonists decided they didn't like British soldiers searching people's homes any time they wanted to.  They complained about it in The Declaration of Independence.  After the War the US adopted the Fourth Amendment, which protects against "unreasonable searches and seizures".

When I was younger we were fighting the Cold War.  The USSR was an "authoritarian dictatorship".  The Nazis before them were also an authoritarian dictatorship.  Both regimes were famous for requiring everybody to carry "papers" that had to be produced any time, any place, whenever any official wanted to examine them.  So, since we were the good guys, we were all in favor of the opposite.  Our citizens were able to move about freely and were not under any obligation to produce their papers.  It was a point of differentiation between us and them.  "Only authoritarian dictatorships require law abiding people to always carry identification documents as they go about their ordinary business."

Well, times have changed.  The USSR is no more so apparently we no longer need to differentiate the behavior of our government from that of authoritarian dictatorships.  It seems that we are now all supposed to be afraid of terrorists in our midst.  And that means anyone who is suspicious (not a well dressed white person) had better have their papers on them at all times.  And besides terrorists there is the ever present danger of rapist Mexicans or whoever else fits the "looks suspicious" profile.  I am going to ignore the issue of whether this change is a good thing or a bad thing.  Instead I am going to focus on the technicalities of how to positively identify people.

As I have discussed extensively above, the birth certificate is the foundation of identity for US born individuals.  There is an elaborate system in the US for dealing with the foreign born that I am not going to get into.  I will just note that in many cases it ends up coming back to a birth certificate for these people too and move on.  And, as I have also extensively elaborated on above, there is no way currently to definitively tie a specific individual to a specific birth certificate.  And by now I think I have telegraphed where I am going pretty clearly.  The thing that could tie the two together is CODIS style DNA information.

There is no technological impediment to doing this now.  A sample sufficient to the task is easily obtained from a newborn.  Blood works and only a drop is necessary.  And the equipment needed to take the necessary measurements is relatively inexpensive and the process is relatively quick.  So it is completely possible to CODIS characterize every newborn at birth.  (As a side note it is also easy in most cases to CODIS characterize the mother and, if he is handy, the father at the same time.)  And the amount of data is modest so it could easily be added to the birth certificate computer record.  Once this is routinely done and some time has passed it becomes a simple process to prove that a specific individual is the one connected to a specific birth certificate.  You just draw a drop of blood, run it through the CODIS process and see if the results match the information in the birth certificate record.  None of this is beyond our current technical capability.
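To show how little the record itself would need to change, here is a minimal Python sketch: the profile taken at birth is turned into a short canonical string and stored with the certificate, and later a freshly measured profile is encoded the same way and compared. The `BirthRecord` fields, the `encode_profile` string format, and all values are hypothetical; no real vital-records system is being described, and the sketch assumes the lab typing is error-free.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Genotype = Tuple[float, float]  # two repeat counts at one STR locus


def encode_profile(strs: Dict[str, Genotype], amelogenin: str) -> str:
    """Serialize a profile into a short canonical string (hypothetical format).

    Loci are sorted by name and the two variants at each locus are put in
    ascending order, so the same DNA always produces the same string.
    """
    parts = []
    for locus in sorted(strs):
        low, high = sorted(strs[locus])
        parts.append(f"{locus}={low:g},{high:g}")
    parts.append(f"AMEL={amelogenin}")
    return ";".join(parts)


@dataclass
class BirthRecord:
    """Hypothetical birth-certificate record with a DNA profile taken at birth."""
    certificate_id: str
    name: str
    date_of_birth: str
    dna_profile: str


def verify_identity(record: BirthRecord, strs: Dict[str, Genotype],
                    amelogenin: str) -> bool:
    """Does a freshly measured profile match the one stored at birth?"""
    return record.dna_profile == encode_profile(strs, amelogenin)


# Made-up variant values for illustration only.
newborn = {
    "CSF1PO": (10, 12), "FGA": (21, 24), "TH01": (6, 9.3),
    "TPOX": (8, 8), "vWA": (16, 17), "D3S1358": (15, 16),
    "D5S818": (11, 12), "D7S820": (10, 11), "D8S1179": (13, 14),
    "D13S317": (11, 12), "D16S539": (9, 12), "D18S51": (14, 17),
    "D21S11": (29, 30),
}
record = BirthRecord("CERT-0001", "Jane Doe", "2016-04-06",
                     encode_profile(newborn, "XX"))

# Years later a drop of blood is typed again and checked against the record.
print(verify_identity(record, newborn, "XX"))  # True
```

The stored string is only a couple of hundred characters, which fits the point that the amount of data is modest.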

But it is currently beyond our political capability.  People do not want to be in the CODIS database.  Part of this is due to the association between the CODIS database and criminality.  But a lot of people see it as an invasion of privacy they are unwilling to put up with.  They can be convinced to change their mind if there is great need, say a loved one is missing.  But currently every state has restrictive policies that limit who goes into the CODIS database.  Not even all criminals or suspects go in now.  The details vary from state to state.  Some have restrictive policies and CODIS only a relatively small number of people.  Others apply a broad brush and CODIS many more.  But all states prohibit adding people without cause.

And the CODIS database is not the only DNA database in existence.  People sign up with 23andme or other similar companies that do DNA analysis.  The company tells them, for instance, where their ancestors are from.  Various groups also collect DNA information for a number of different scientific reasons.  But both the commercial and the scientific operations are careful to not sequence the DNA loci that CODIS uses.  They just don't want to get tangled up in criminal investigations.  And the people whose DNA ends up in these other databases like it that way.

But let me emphasize that this is a decision that is made for non-technical reasons.  Companies like 23andme try to retain the original sample so that it can be reanalyzed as technology advances.  So they could easily reanalyze the samples they still have and sequence the CODIS loci.  The sequencing they already do is much more extensive than what the CODIS process requires.  And if they did this their database could be used for CODIS-compatible searches.  The number of people whose DNA could be CODIS matched would immediately jump substantially.  But this is not really necessary.  There is already a strong trend in place to keep expanding the CODIS pool.  It is partly a result of technological considerations.  It keeps getting quicker, cheaper, and easier to CODIS samples.  And the people that run CODIS type databases keep coming up with more and more reasons to include more and more people in their collection programs.

I would think that intelligence agencies like the CIA would want to CODIS their employees and contractors.  And how about soldiers?  And how about law enforcement people?  And, on the other side, how about foreigners entering our country.  And how about people busted for minor offenses like speeding tickets or people involved in divorces or people filing for a business license or people involved in food preparation or, or, or.  As the ease with which the process can be performed and the cost comes down the strength of the argument necessary to justify including an additional group gets less and less.  And as this trend continues at some point you will have twenty or thirty percent of the entire population in the database.  At that point you might as well just put everyone in.

Consider that many crimes now go unsolved.  There is DNA evidence available in many of these cases but it doesn't match any entries in the current CODIS database.  If we had CODIS coverage of the entire population then it would go some way toward increasing the percentage of crimes that do get solved.  This higher solution rate should lower the overall crime rate, right?  And isn't lowering the crime rate a laudable goal?  That is only the most obvious potential benefit to CODISing everybody.  Other potential benefits are easy to come up with.  Instead of listing them let me extrapolate a little ways into the future.

When I was younger pretty much all small transactions (e.g. buying a cup of coffee) were done with cash.  Then people started using debit cards instead.  There are now a lot of people who carry only a small amount of cash around.  And as I write this we are transitioning to an even newer method, paying with our smartphones.  Today it is rarely used (except at Starbucks).  But that is because there are some kinks that need to be worked out.  Not all smartphones work at all stores all the time.  That's mostly because we have dueling incompatible payment systems fighting it out.  And for business reasons each system makes sure that it is incompatible with any of the other systems.  At some point that competition between systems will be made to stop.  Then people will be able to use one application on whatever phone they like to buy stuff from whoever they want to.  But that puts the identification issue front and center.

The simplest thing from a user standpoint is to always leave your phone unlocked.  And far too many people do this because dealing with the security system is bothersome.  But Apple came up with a trick.  You put your thumb in the right place and the phone can validate your thumbprint.  This can be done almost instantaneously.  And this approach is now being copied by the other smartphone makers.  I expect it to be universal within a few years.  But I suspect that the thumbprint scheme is not really that secure.  The phone only sees part of your thumb and in poor conditions.  The vendor (e.g. Apple) does not want a bunch of false negatives (you put your thumb on your phone but it doesn't okay you) so I suspect that the phone calls anything that is even vaguely close a match.
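The tension described above is the standard trade-off in any biometric matcher: the sensor produces a similarity score, the phone compares it with a threshold, and moving the threshold trades false rejects (the owner is refused) against false accepts (someone else gets in). Here is a minimal Python sketch with made-up scores; it illustrates the general trade-off and is not how Touch ID or any real sensor makes its decision.

```python
from typing import Sequence, Tuple


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false_reject_rate, false_accept_rate) for 'score >= threshold is a match'."""
    false_rejects = sum(1 for s in genuine if s < threshold)
    false_accepts = sum(1 for s in impostor if s >= threshold)
    return false_rejects / len(genuine), false_accepts / len(impostor)


# Made-up similarity scores (0 = no resemblance, 1 = perfect), illustration only.
genuine_scores = [0.92, 0.85, 0.78, 0.66, 0.95, 0.71, 0.88, 0.59]   # the owner's thumb
impostor_scores = [0.12, 0.35, 0.48, 0.62, 0.27, 0.55, 0.41, 0.30]  # other people's thumbs

for t in (0.8, 0.6, 0.5):
    frr, far = error_rates(genuine_scores, impostor_scores, t)
    print(f"threshold={t}: false rejects={frr:.1%}, false accepts={far:.1%}")
```

With these numbers, lowering the threshold from 0.8 to 0.5 takes false rejects from 50% down to 0% while false accepts climb from 0% to 25%. A vendor anxious to avoid annoying its customers is pushed toward the lower setting.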

But let's fast forward a few years.  Currently the easiest way to do a CODIS analysis is with a drop of blood.  But with a lot of effort even very tiny amounts of DNA can sometimes be used.  In ideal circumstances the tiny amount of DNA that ends up in some fingerprints is enough.  And it turns out that there are lots of cells on the surface of your skin that contain your DNA.  (That's where the fingerprint DNA comes from.)  These cells can be collected and processed without having to poke a hole in you.  Poking a hole is not very painful, but "not very painful" is not the same as "not even noticeable".  And it is easy to imagine harvesting a few cells from the surface of your finger in a way that is not even noticeable so let's imagine it.

Next imagine the CODIS analysis device being small enough and cheap enough to be incorporated into a smartphone.  And, while we are at it, assume it can produce an accurate result in less than a second. Now we have everything we need to build a system right into our smartphones that is fully capable of positively validating that you are you.  And it is quick enough so that it can be used routinely, perhaps a hundred or more times per day.  That would definitely solve the positive identification issue for smartphone transactions.

I think that for better or worse this is the direction we are heading.  I would like to say that it is not inevitable but I am concerned that the forces that are pushing in this direction are powerful enough to overwhelm any opposition I can currently foresee.  I think most people will be of the opinion that it is no big deal.  In the fight between Apple and the FBI over unlocking that iPhone (see http://sigma5.blogspot.com/2016/02/digital-privacy.html for more on this subject) that was the opinion of a large segment of the general public when they were surveyed on the subject.

They put it another way:  "I've got nothing to hide so what's the problem?"  That situation did not seem to directly affect them.  They did not foresee the FBI or anyone else wanting to unlock their phone so it didn't seem personally important either way.  In the case of what I am now talking about the direct connection is much more obvious.  But there are also immediate benefits.  "I can use my smartphone to pay for my coffee without having to worry about someone maxing out my credit cards if my phone gets stolen."  (As a side note if smartphones used this system they would be useless to thieves and thieves would stop stealing them.)

Our privacy is continuously under assault.   Technological advance keeps making it easier to invade our privacy and harder to protect against an invasion.  If everyone ends up in a CODIS-type database and that database is routinely used to confirm our identification and if a truly positive identification is the norm then pretty much every nook and cranny of our lives will be stored away in one or more computer databases.  It looks like this eliminates any technical barrier to the complete invasion of our privacy.

I'm sure at least some will continue to say "I've got nothing to hide."  But that's not really true.  You may think you have little or nothing to hide.  But all of us have opinions and all of us lead our lives in certain ways.  Bear in mind that whatever opinions you hold there are a large number of people who think you are wrong.  And no matter how boring you think your lifestyle is there are lots of people who strongly disapprove of it.

Are you a girl who likes to wear pants?  Are you a guy who likes to shave?  There are people who are seriously unhappy with you.  What religion do you follow?  It doesn't matter.  There are a lot of people who hate that religion, whichever one it is.  Do you like city living or do you prefer the wide open spaces?  Either way, there are people who are seriously unhappy with you.  Those are all choices many people would find boring and unimportant.  How about more controversial ones?

Do you drink?  Have you ever had sex outside of marriage?  Have you tried non-missionary sex?  Have you smoked pot?  How about other drugs?  Even once?  Have you ever broken a traffic law, driven drunk, or driven after having had only one or two?  Have you ever skinny dipped or streaked or done anything else "young and stupid"?  Have you ever stolen something, even accidentally?

The point is we have all done some embarrassing things, maybe even a lot of embarrassing things.  And we have all done things some would disapprove of to the point that they would delight in harassing us about them.  So we all have things to hide.  Pretty much all of us have things we would prefer our parents, or our children, or our friends, or our coworkers, or the authorities, or our enemies, or random obnoxious people we don't know, don't know about.  In other words, we all value our privacy.

In the past there have been practical or technological barriers we could hide behind.  The tatters that remain of the old barriers are quickly being shredded.  I have addressed the general issue of privacy before (see http://sigma5.blogspot.com/2013/12/privacy.html).  I devoted roughly the last third of that post to what I thought should be done.  I wrote that post over two years ago.  The current topic only adds to the pressure that is moving us toward a world where there is no privacy.  I recommend that post for my overall thinking on what should be done.  Meanwhile there is a small piece of good news on the privacy front.

I linked to my blog post on the fight between the FBI and Apple above.  At the time I wrote it no one knew how it would come out.  But that specific situation has since been resolved.  The FBI found a way to crack the phone that did not require the extraordinary cooperation that Apple was objecting to.  That sounds like bad news but it's not.  The phone that was cracked is an older model.  Apple has upped its game with newer models.  Whatever methods were used are unlikely to work (or at least will be much harder to pull off) on newer models.  And in spite of various polls that were done at the time it turns out that there is a market for secure phones.  So Apple has promised to keep adding features to make each new generation of phones much harder to crack than the old generation.  And remember the phone the FBI was only able to crack after a great deal of difficulty is now a couple of generations old.

And various other technology companies are now jumping onto the "increased security" bandwagon.  They are encrypting more and encrypting to a higher level of security.  They are also changing how their products operate so that they no longer have a backdoor that lets them read unencrypted customer data.  This means that if they are subpoenaed they can respond "sorry -- we can't read it either".  And a side effect of this is that they can't sell or analyze detailed customer activity like they used to be able to do.

They can still do a metadata analysis.  For instance they can figure out who you are interacting with.  They can tell how often you are connecting up and how long you are staying connected.  But they can't tell what you are doing while you are connected.  This means that the data they can share with someone else, the government or another company, is much more limited than in the past.  And that means it is much less valuable.  And that means they will do less sharing in the future.  And that is a modest step in the direction of more privacy.  It is a small but very welcome development.
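Here is a minimal Python sketch of the kind of metadata analysis described above. The record fields, names, and timestamps are hypothetical. The point is that the log has no field for content at all (in the scheme described above the provider holds only encrypted data it cannot read), yet it still answers who, how often, and for how long.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ConnectionRecord:
    """Hypothetical metadata record: who, with whom, when, for how long.

    There is deliberately no field for what was said or sent.
    """
    user: str
    peer: str
    start: str        # timestamp, kept as text for simplicity
    duration_s: int   # seconds connected


def summarize(records: List[ConnectionRecord], user: str) -> Dict[str, Tuple[int, int]]:
    """For one user, map each peer to (number of connections, total seconds)."""
    counts: Dict[str, int] = defaultdict(int)
    seconds: Dict[str, int] = defaultdict(int)
    for r in records:
        if r.user == user:
            counts[r.peer] += 1
            seconds[r.peer] += r.duration_s
    return {peer: (counts[peer], seconds[peer]) for peer in counts}


# Made-up log entries for illustration only.
log = [
    ConnectionRecord("alice", "bob", "2016-04-06T08:15", 120),
    ConnectionRecord("alice", "carol", "2016-04-06T09:00", 45),
    ConnectionRecord("bob", "alice", "2016-04-06T12:30", 600),
    ConnectionRecord("alice", "bob", "2016-04-06T21:10", 180),
]
print(summarize(log, "alice"))  # {'bob': (2, 300), 'carol': (1, 45)}
```

This is useful information, but it is a much thinner picture than the content itself, which is why such data is less valuable to share.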
