One of the throwaway remarks I sometimes make at conferences is that “Google knows you’re pregnant before you do”.
I can say this because the things you search for will change as your life changes, and search engine providers may well be able to spot the significance of these changes because they aggregate data from millions of people.
Now Google’s philanthropic arm, google.org, has shown just what it can do with the data it gathers from us all by offering to predict where ‘flu outbreaks will take place in the USA.
It has found that “certain search terms are good indicators of flu activity”, in that they correlate well with reports from the official Centers for Disease Control and Prevention.
And it claims that “across each of the nine surveillance regions of the United States, we were able to accurately estimate current flu levels one to two weeks faster than published CDC reports”, a result that could save people’s lives by alerting them to have ‘flu vaccinations earlier than they might otherwise have done.
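For readers curious what "correlate well" might mean in practice, here is a minimal sketch. It is not Google's method, which has not been published in detail here; it simply assumes that weekly counts for a few search terms are compared against an official weekly flu figure using the Pearson correlation coefficient. The terms, the numbers and the function names (`pearson`, `rank_terms`) are all invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def rank_terms(term_counts, official):
    """Return (term, correlation) pairs, most strongly correlated first."""
    scored = [(term, pearson(counts, official)) for term, counts in term_counts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: six weeks of an official flu indicator (made-up values)
official_flu_rate = [1.0, 1.5, 2.5, 4.0, 3.0, 2.0]

# Hypothetical weekly query counts for two invented search terms
weekly_query_counts = {
    "flu symptoms": [10, 14, 26, 41, 29, 19],
    "cheap flights": [30, 28, 31, 29, 30, 32],
}

if __name__ == "__main__":
    for term, r in rank_terms(weekly_query_counts, official_flu_rate):
        print(f"{term}: r = {r:.2f}")
```

In this toy data the flu-related term tracks the official figure closely while the unrelated term does not. A real system would also have to deal with lag, seasonality and terms that correlate by coincidence, none of which this sketch attempts.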
This is a really interesting piece of work and clearly demonstrates the power of data mining. Its potential usefulness is not limited to health matters.
As John Naughton pointed out in The Observer, “everyone I know in business has known for months that the UK is in recession, but it’s only lately that the authorities have been in a position to confirm that—because the official data always lag the current reality.”
Perhaps the answer lies buried somewhere in the queries being made online, with company directors or politicians searching for terms that imply a coming recession, like details of redundancy pay or bankruptcy protection.
It isn’t only Google who can do this of course. Its database of queries is vast and fast-growing, but it is only one among many databases that underpin the modern world.
The banking system is really only a collection of connected databases recording who has which assets, while neither government nor business could operate without complex data stores.
Soon the national ID register will store details of everyone in the UK, while the forthcoming Communications Data Bill is likely to include proposals to create a vast system that will record details of every e-mail sent, every website surfed and every file downloaded.
As we have seen with flu trends, sometimes the “interesting” knowledge that can be extracted is well-concealed until comparisons can be made with other sources, as it was the correlation between some search terms and the real-world data that mattered.
Of course Google has not revealed which search terms it analysed because doing so would undermine the model’s effectiveness.
Unfortunately it is being equally reticent about how it has ensured that the data it uses is properly anonymised so that users cannot be identified on the basis of their queries.
A letter from the Electronic Privacy Information Center (EPIC) and Patient Privacy Rights to Google boss Eric Schmidt has not been answered, leaving those concerned with online privacy uncertain over the broader implications of the project.
But as Cade Metz points out in an insightful article in The Register, we may all be happy to know that a ‘flu outbreak is coming, but what happens when the disease involved is more life-threatening and the government asks Google for the names and IP addresses of anyone whose search terms indicate that they are infected?
It’s not that I don’t trust Google. I don’t trust any company, government department or individual without a good reason to do so.
In the case of search engines that claim to protect my privacy I want to know just how they do it and will not accept vague reassurances.
In the case of governments that want to build vast databases, I want strong legal sanctions against their abuse and full disclosure of the technical details.
Those of us living in the west with access to technology and the network have lived through a revolution in the last decade and a half that is as radical in its impact as the industrial revolution, and it has happened a lot faster.
It is hardly surprising that we do not yet know how to operate in a networked world where amazingly detailed data is routinely stored, processed and made available.
We will need to think in new ways, learn to assess risk according to new criteria, and find ways to hold those who have power over us—whether political, social or cultural—accountable in new ways.
The US writer Curt Monash has written about this topic many times over the years, arguing that since we clearly cannot halt the move towards data capture and use we should put legal and regulatory frameworks in place as a matter of urgency.
We have made a start in Europe with data protection legislation which could be strengthened and reinforced if politicians were willing to make the effort.
But first we need an active press and an engaged population, one that asks hard questions, forces those who want to develop new databases to be accountable and open, and makes the boundaries of acceptable surveillance a matter of public debate.
And perhaps we should ask google.org to start work on “Privacy Trends”, hoping to spot privacy disasters before they happen by looking at searches for “compromised data”, “hacked database” and “lost USB stick”.
The much-maligned NHS Direct shows geographical data of symptoms reported on its website (for the UK).
During the last flu outbreak, you could see the virus spread from city to city as people reported flu-like symptoms to NHS Direct. I suspect this is probably as effective as the search engine mining, but with better geographic precision and the possibility for government advice to be relayed.
There are risks in this self-reporting: what, for example, if we had a bad cold outbreak and a serious flu outbreak at the same time?
Concerns over privacy are, I think, moot. The 1918 flu outbreak was mitigated by so-called “social restrictions”; your freedoms of movement and association rightly disappear when you have a chronic contagion. You might have a right to privacy if infected with a disease like AIDS (assuming you aren’t spreading it recklessly), but that doesn’t apply to influenza. Social restrictions worked well against the 1918 epidemic. This is heavily analyzed, and if governments have any meaningful plans you can expect them to be heavily influenced by that research. I assume they will close theaters, pubs, clubs, sports centers and stadiums if faced with a similar threat. There is an interesting discussion over whether it is better to get influenza early, whilst there is still capacity to treat you if you develop serious complications, or to try to avoid the disease altogether.
The government, of course, has access to many of the best economic forecasts and indicators. But recession has a technical definition in the UK of negative growth for two consecutive quarters (or some such), and the government has nothing to gain by emphasizing the bad news.