This document summarizes three stories about data privacy and security issues related to cloud computing:
1) AOL's release of search data that was supposedly anonymized but was easily identifiable.
2) YouTube access logs ordered to be released by a judge despite Google's policy to anonymize data after two years.
3) An EU directive mandating retention of communications metadata for up to two years by member states.
2. Stories About The Present
Thank you. Hello, everyone. Before I talk about the future of the cloud, I’d like to begin with
a few stories about the present.
5. 20,000,000
650,000
20M web queries from 650k users, sampled over three months, to help researchers improve
the state of search technology. AOL claimed that the data had been
12. 4417749
Thelma Arnold
Lilburn, Georgia
Lilburn Georgia, who has an interest in
13. 4417749
Thelma Arnold
Lilburn, Georgia
numb fingers
numb fingers
14. 4417749
Thelma Arnold
Lilburn, Georgia
numb fingers
60 single men
60-ish single men, and
15. 4417749
Thelma Arnold
Lilburn, Georgia
numb fingers
60 single men
dog that urinates on everything
dogs that urinate on everything. The Times went to visit her and ran a great caption to the
photo accompanying the story:
39. Viacom v Google
by Viacom, parent company of Paramount, Dreamworks, MTV, and Nickelodeon. In March
2007, Viacom sued Google over YouTube claiming that its copyrighted TV shows were
available on YouTube and that Google wasn’t doing enough to prevent this unauthorised
copying and distribution.
40. U.S. District Judge
Louis L. Stanton
In July this year, five months after Google announced their “anonymize after two years”
policy, U.S. District Judge Louis L. Stanton granted a motion to give Viacom
41. “the motion to compel
production of all data from
the Logging database
concerning each time a
YouTube video has been
viewed on the YouTube
website or through
embedding on a third-party
website is granted”
a copy of the YouTube access logs. This information includes:
47. “privacy concerns are
speculative”
Speculative.
And he quoted a line from that blog post made back in February ’08, where Google said that
an IP address
48. “The reality is though
that in most cases, an IP
address without
additional information
cannot [identify you]”
- Google Blog, Feb ‘08
alone isn’t identifying information.
Fortunately Viacom are on top of the privacy concerns raised by their request for YouTube
logs. They say they will limit access to the data
49. “is going to be limited to
outside advisers who can
use it solely for the
purpose of enforcing our
rights against YouTube and
Google”
- Michael D. Fricklas, Viacom’s general counsel
to Viacom’s advisers. Whew, thank you Viacom. And thank you, Google, for choosing to
retain that data and only anonymize after two years!
52. “the retention of data
generated or processed in
connection with the
provision of publicly
available electronic
communications services or
of public communications
networks”
the retention of data generated or processed in connection with the provision of publicly
available electronic communications services or of public communications networks.
53. 6 months
-
2 years
For anywhere between 6 months and 2 years, depending on the type of data. Fortunately it
only covers
54. telephony, mobile
telephony, Internet access,
Internet email and Internet
telephony.
telephony, mobile telephony, Internet access, Internet email and Internet telephony. Member
states have the option to delay the Internet bits until March 2009 and most have opted to
delay. Member states also have the option to go beyond the EU recommendations, as
55. .dk
Denmark has chosen to do. Their draft provision enacting the EU order would require ISPs to
log the
59. every single internet data
packet
every single Internet data packet. I’m reasonably confident this is impossible, but it does
show the ambitions of the states here. The states are all confident that the data won’t be
60. “misused”
misused, but don’t define “misuse”.
I won’t define “misuse” either, but will simply note that Germany makes their logs available
for prosecutions under
65. “Cloud”
“the cloud” (air quotes). It’s a broad and popular term, and like all broad popular terms it’s
become more of a buzzword to get venture funding, or to sound like you know what you’re
talking about, than a useful marker of technological progress.
66. “Cloud” = On Demand +
Utility Computing + SOA
+ ASP + SaaS + ...
It covers a multitude of sins. We talked about Software As A Service (YouTube), and that’s a
real trend. Lots of different types of software are moving into “the cloud”.
67. Meet Google Healthcare. Patients upload their medical records and get a friendly useful
interface for checking drug interactions, etc. The privacy aspect to this is a little odd,
though.
68. Google Healthcare slips through a crack in the medical record legislation in the USA--
because they’re not holding records on behalf of providers, no security is mandated. The
expectation is that the market will determine an appropriate level of security. Hoorah for the
market!
And as we all know, markets immediately find the appropriate level of security with no
incentive to provide too little. And fortunately there can be no privacy problems flowing from
an inadequate level of security. Right? Right?
69. Salesforce.com is Customer Relationship Management--salespeople and their managers can
keep track of contacts, accounts, etc. online. You know, “Joe at IBM buys widgets, here’s his
name and address and a record of every conversation that anyone in the company has had
with him”.
71. Here’s mint.com, one of several web-based apps playing in Quicken’s back yard. They fetch
your bank statements from your bank, process them, and tell you where you’re spending
your money.
72. Utility Computing
That was software as a service.
Another aspect to the cloud is utility computing, such as
73. Amazon Web Services
Amazon’s web services. They offer many services over the web that have nothing to do with
books, including
74. Amazon EC2
their “Elastic Compute Cloud”, EC2. You upload a virtual machine image (think: copy of a
hard drive) and Amazon runs it for you. No longer do you have to worry about managing
bandwidth, installing operating systems on servers, replacing hard drives, and all the stuff
that IT departments used to have to do. Consequently
75. many people have built their companies on EC2 and its sister storage service, S3. Many of
these are startups, attracted because it lets them scale without substantial investment, but
many EC2 customers (Eli Lilly, for example) are sizable. Amazon’s services are getting a lot
of use,
76. as this graph shows. Amazon’s Web Services overtook Amazon.com, in amount of traffic, in
mid-2007, just 18 months after launch.
77. Google also have a utility computing play: you write your program to run on their
environment and away you go, all of Google’s more than 2 million servers are yours for the
using. Like Amazon, you’re charged by the CPU-hour.
78. Microsoft are getting into the game as well, with their Live platform. According to a recent
Economist article, they’re rolling out
79. 35,000/month
2,500/container
$500M/data centre
35,000 new servers every month in data centers. They’re building the data centers from 40-
foot shipping containers that each house 2,500 servers. The new data center in Chicago cost
$500M. These numbers aren’t unusual for the industry.
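To put those figures in proportion, here is a back-of-the-envelope calculation using only the numbers on the slide. The function name is made up for illustration, and the result assumes every new server goes into a container, which the talk does not actually claim.

```python
# Figures quoted on the slide.
SERVERS_PER_MONTH = 35_000
SERVERS_PER_CONTAINER = 2_500


def containers_needed(servers: int, per_container: int = SERVERS_PER_CONTAINER) -> int:
    """Whole containers needed to house `servers`, rounding up."""
    return -(-servers // per_container)  # ceiling division


containers_per_month = containers_needed(SERVERS_PER_MONTH)
servers_per_year = SERVERS_PER_MONTH * 12

print(f"{containers_per_month} containers a month, {servers_per_year:,} servers a year")
```

At that rate the rollout fills a new shipping container roughly every two days.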
80. “Cloud”
What do Utility Computing and Software as a Service have in common? What unites Google
Apps, your ISP, Amazon EC2, and Salesforce.com?
83. Outsourced IT
Outsourced IT. Every program on every machine is a pain in someone’s ass. IT departments
must keep the versions up to date, deal with bitrot (when Windows stops working and
everything has to be reinstalled), etc. Using Google Docs means all the administrative hassle
of Microsoft Office has become Google’s problem. And part and parcel of outsourced IT is
85. Hackers, Viruses, et al.
bad guys. Every IT department asks itself “can Google do a better job of this than I can?” and
probably answers, correctly, “yes”.
86. Privacy & Security
But because there’s a strong relationship between privacy and security, namely it’s
impossible to have privacy without good security,
88. Where’s the problem?
Many people ask “so what?” when you tell them that they’ve outsourced their privacy.
“Google will look after me, right?” they say. There are two reasons why this isn’t necessarily
true.
89. (a) Google decides
First, Google gets to decide how much security to put around your data, and have no
doubt--it is a decision. There’s always
92. perfect security
you might as well look for perfect happiness. There is no such thing as perfect security, just
acceptable levels of risk. Your company decides upon that acceptable level every time
they decide not to train their IT staff in the latest security practices, every time they choose
between one antivirus product and another, every time they decide not to buy a $75,000
network security appliance. That’s perfectly normal. What’s different in the world of the
cloud, though, is that
93. decide
you don’t get to decide. Your vendor does. And no vendor will tell you all the security
precautions and procedures they have. Their security is
94. opaque
opaque. Few vendors even report incidents, and even fewer countries have mandatory
reporting laws. So you’re asked to make a security (and thus privacy) decision on the basis of
imperfect, or even absent, information. That’s the first reason that outsourced privacy isn’t
all good.
95. (b) Gubmint = Black Hats
The second reason is that it’s not just rogue teenagers hellbent on getting your credit card
details that you have to worry about. As the EU directive clearly shows, governments are black
hats, bad guys. They just have
96. subpoena / court order /
search warrant / request
different attack vectors from those of your average 21 year old Ukrainian virus writer. And as
97. YouTube usernames
IP addresses
Viacom
Viacom’s logfile data grab shows, many private companies have learned to piggyback on the
judiciary’s and legislature’s attack vectors.
98. Warshak v USA
The situation’s even worse in the USA. Until now there have been two standards for
searching your records:
99. constitutional
Constitutional, preventing unreasonable search and seizure, which covers stuff in your home.
Constitutional protections are hard to circumvent, as they were written by angry 18th century
revolutionaries and tend to be quite explicit in their distrust of government. The other
protection is
100. statutory
statutory, the laws that fill the gaps in the Constitution. Statutory protections guard your
data outside the home
101. statutory = weaker
and generally require law enforcement to have less justification than do the constitutional
standards. This difference between two standards means law enforcement can more easily
intercept your data outside the home than in it -- bad news for hosted apps like Google
Docs, Mail, etc. Let me go over this again:
102. data in the home
if you download your mail to your laptop and keep it there,
103. data in the home
SAFE!
the police need a search warrant to get to it, and the burden of probable cause that goes with
it. But if you use Gmail and
104. data in the cloud
keep your mail on Google’s servers,
105. data in the cloud
PWNED!
and the police need only a court order, which has a much lower hurdle to clear. A recent
lawsuit,
106. Warshak v USA
Warshak v USA has led to a ruling that the two should be treated the same (and
Constitutional standards should apply to getting data from ISPs). It’s bouncing around
appeals at the moment, and the double standard will continue until and unless it’s resolved
in Warshak’s favour.
107. “Relying on the
Government to ensure
your privacy is like asking
a peeping Tom to install
your window blinds.”
- John Perry Barlow
In short, security and privacy are as much threatened by legal and regulatory means as by
viruses and Trojans.
108. But that’s enough about the past. Let’s get back to the Future of the Cloud.
109. Future of the Cloud
Did you hear those capitals? Futurism needs capital letters. I don’t plan to make predictions
about
112. trends
trends, growing numbers of similar things done by people who live on the cutting edge of
what’s possible with today’s technology, because Alan Kay knew how to do futurism:
113. “The best way to predict
the future is to invent it.”
- Alan Kay
He invented object-oriented programming, the windowing system, pulldown menus, GUIs
and computer interfaces as we know them today, basically. So I look for people or projects
that seem to me to be inventing the future. Here are a few.
114. Let’s start in an unlikely place: the foot. The Nike+ is a shoe with an accelerometer that
counts steps. It communicates with an iPod and syncs your running record to a web site.
People who use it report a new way of thinking about their run: it’s become a
115. “My run is now a
videogame, and I want
you to play with me.”
- Jane McGonigal
video game. You can have collaborative and competitive challenges and you get virtual
trophies. Now many pieces of gym equipment can also report to the iPod, so people can
track their treadmill, cross-trainer, and bike miles on the same web site as they keep track of
their running.
116. Or take the Amazon Kindle. This is an electronic book reader--super high resolution screen,
keyboard, and all that--but the main innovation behind it is cellular connectivity. You don’t
buy a network contract when you buy the Kindle, but the books you buy (from Amazon.com,
naturally) arrive via the cellular data network. It was introduced a year ago, and Amazon will
have sold 380,000 of them by the end of this year.
117. This is the Wattson. It’s a power meter that reports the information to you, not just the
power company. This is the display; there’s also a transmitter that clips onto a mains cable.
There’s companion software, Holmes, that uploads your energy usage to the Holmes web site
and lets you track and compare over time.
118. This is Botanicalls, a gizmo that sends Twitter updates when your plants need watering.
What you see is a sensor that clips into the potplant’s soil and a network cable to send the
updates.
119. Andy Stanford-Clark, an IBM “Master Inventor”, has instrumented his house and hooked it up
to Twitter. You can follow his house and see his power consumption, when the phone rings,
when the motion-sensitive lights turn on and off, etc.
These people use Twitter, by the way, because Twitter runs a free SMS gateway in the USA
and so it has become a cheap and easy way for programs to send SMS. For example,
120. Former Apple engineer Gordon Meyer hooked his doorbell up to Twitter. Now, no matter
where he is, he knows if his doorbell is ringing.
121. This is the Availabot by Matt Webb and Jack Schulze. It’s a little toy that plugs into your USB
port and stands to attention whenever a particular instant message buddy comes online.
122. These researchers from Iowa State University are holding sensors that will help farmers
understand nutrient and water flow in their soil. They’re 2 inches by 4 inches at the moment
and will live underground, communicating wirelessly with a central computer.
123. WineM is a prototype of a smart wine rack: a reader senses the RFID tags on bottles and uses
lights to display the type of wine so you don’t have to turn the bottle to check the label.
When you add a new bottle to your collection, the hardware scans the UPC code and uses
public databases to translate that into a variety of wine--no keyboard necessary.
124. This is a MobileTEEN GPS unit. AIG, before they needed bailing out, offered Teen GPS
insurance. In exchange for lower premiums (anyone here tried to insure a teenage driver
lately? You know what I’m talking about) your teen (or their car) must carry this GPS device
around, which reports their location back to AIG. You can tell AIG’s web site to SMS you if
your teenager speeds, and you can even set up a GeoFENCE (that’s a registered service mark,
by the way) and it’ll SMS you if they leave that area.
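The alerting rules described here could be sketched roughly as follows, assuming a circular fence and a device report that carries latitude, longitude, and speed. The field names, the circular shape, and the `check_report` function are all assumptions for illustration; nothing here reflects how AIG's service is actually built.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def check_report(report, fence, speed_limit_kmh):
    """Return the alerts (hypothetical labels) that one location report triggers."""
    alerts = []
    away = distance_km(report["lat"], report["lon"], fence["lat"], fence["lon"])
    if away > fence["radius_km"]:
        alerts.append("left geofence")
    if report["speed_kmh"] > speed_limit_kmh:
        alerts.append("speeding")
    return alerts


# A 5 km fence around an arbitrary point, and two made-up reports.
fence = {"lat": 0.0, "lon": 0.0, "radius_km": 5.0}
inside = {"lat": 0.0, "lon": 0.01, "speed_kmh": 50}
outside = {"lat": 0.0, "lon": 0.1, "speed_kmh": 120}

print(check_report(inside, fence, 100))
print(check_report(outside, fence, 100))
```

The check only works because the device reports every position to the server, so the privacy cost is built into the feature.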
125. This is the Dash mobile GPS unit for your car. It has a cellular network card in it, and uploads
your location to the Dash servers. In return, you get incredibly accurate up-to-the-second
traffic information as reported by the Dash units of the other cars in the city.
126. This is the next President of the United States of America, using his Blackberry. The
Blackberry is a mobile phone with email: it lets you send and receive email no matter where
you are. The latest news is that Obama will have to give up his Blackberry because the emails
will be a matter of public record, and because it’s not a good idea for a cellphone company to
have a database in which can be easily found the location of the President of the United
States of America.
129. Web meets World
They all connect the Internet to a physical device, whether it’s a potplant, a teenage driver, or
a book. In some cases the devices are
131. displays
displays, reflecting the state of the networked data: whether your friend is online or which
books you have paid for. But in none of these cases is
145. science fiction
science fiction. You’re right, there are a few science fiction authors who have done a great
job of predicting the near future. One of them,
146. William Gibson
William Gibson, was interviewed by Rolling Stone on their 40th anniversary. He was asked
about the major challenges we faced, and he identified
158. data trails
data trails that linger behind us like contrails in the air after the plane has disappeared from
sight. You know we do it now: we leave trails in
169. easy
It’s not easy, but it’s not bad.
There are three things that would make a lot of these problems go away.
170. (0) Don’t collect it
The most obvious solution is simply not to collect the data in the first place. A lot of these
advertising-driven companies are packrats--think of Google and the two year lifetime of
unanonymized data (after the Viacom ruling, by the way, they announced they were changing
those two-year lifetimes to nine months). They keep this data because it might be useful to
their future selves and not for you or your future self.
171. (1) Same goals
Next, vendors should have the same goals as you do. Amazon does: when you run your web
site on EC2, you’re paying Amazon for that service. Amazon isn’t mining what you do and
they’re not storing your data for later use. Google’s business model, however, is advertising.
So you get GMail and it looks like it’s free, but the price you pay is Google’s data collection.
172. (2) Cryptography
Finally, your data should be encrypted in such a way that the service provider can’t decrypt it
without you. We can do this technically, but few companies do it in practice.
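As a minimal sketch of what "encrypted so the provider can't decrypt it without you" could look like, assume the key is derived from a passphrase that never leaves your machine, so the service only ever stores an opaque blob. The function names (`derive_keys`, `keystream`, `encrypt`, `decrypt`) are hypothetical, and the HMAC-based stream cipher is a standard-library stand-in for illustration only; a real service would use a vetted authenticated cipher from a maintained library.

```python
import hashlib
import hmac
import secrets


def derive_keys(passphrase, salt):
    """Stretch the passphrase into an encryption key and a separate MAC key."""
    material = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 100_000, dklen=64)
    return material[:32], material[32:]


def keystream(key, nonce, length):
    """Pseudo-random bytes from HMAC-SHA256 run in counter mode (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(passphrase, plaintext):
    """Runs on the user's machine; the returned blob is all the provider stores."""
    salt = secrets.token_bytes(16)
    nonce = secrets.token_bytes(16)
    enc_key, mac_key = derive_keys(passphrase, salt)
    body = bytes(p ^ k for p, k in zip(plaintext, keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, salt + nonce + body, hashlib.sha256).digest()
    return salt + nonce + body + tag


def decrypt(passphrase, blob):
    """Also runs on the user's machine; fails without the right passphrase."""
    salt, nonce, body, tag = blob[:16], blob[16:32], blob[32:-32], blob[-32:]
    enc_key, mac_key = derive_keys(passphrase, salt)
    expected = hmac.new(mac_key, salt + nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong passphrase or tampered data")
    return bytes(c ^ k for c, k in zip(body, keystream(enc_key, nonce, len(body))))


blob = encrypt("only I know this", b"my private mail")
print(decrypt("only I know this", blob))
```

Under this design a subpoena served on the provider yields only the blob; the passphrase would have to come from you.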
173. problems
But there are problems with these solutions, and they cut to the heart of what’s so attractive
about the very things that could threaten privacy:
174. 0) data makes services
better
It’d be sad if Google weren’t to collect data because the data that Google collects makes its
services better. And we benefit when this happens. Yes, it makes Google rich, but it also
makes the Internet navigable, our email spam-free, and fills the world with unicorns and
rainbows.
175. 1) free is cheap
Aligning interests is all very well, but advertising business models make services free that
would otherwise be a direct cost. We can all host our domains and our email on Google
without paying a cent, whereas ISPs typically charge for it. Aligning Google’s interests with
our privacy interests may damage our pecuniary interests. And, finally,
176. 2) shared data makes
individual experiences
better
If you encrypt personally-identifying data, you make it very difficult to create those sites that
crunch everyone’s data and make suggestions or recommendations. If the Holmes web site
can’t see the power use of my Wattson, how can I compare myself to other people?
177. impossible?
So is it impossible to do this right? To respect privacy yet have a cloud application?
179. Wesabe is a Quicken-like program on the web, like Mint. These guys get privacy right. First,
they don’t hold the keys to your Internet banking to download your statements: you run an uploader on
your PC that fetches your electronic statements from the bank and then sends them to
Wesabe. Second, Wesabe encrypt your personal data so your records can’t be subpoena’d
because your records can’t be tracked back to you. Third, it’s a commercial service and not
advertising supported--your best interests are their best interests. And finally, Wesabe still
manages to provide collective intelligence (you’re not saving as much as everyone else, for
example). I contract to O’Reilly Media, who invested in Wesabe precisely because they get
this so right.
180. So here we are, nearly to the end. What did we learn?
181. (1) Web meets World
Physical devices are being integrated with the Web, or “cloud” if you will.
182. (2) Data in the cloud can
be a privacy problem
Data in the cloud can be a privacy problem, because
183. (3) You’ve outsourced
privacy
you’ve outsourced your privacy, so you’re vulnerable to attack not just from hackers but also
from