This document summarizes three stories about data privacy and security issues related to cloud computing:
1) AOL's release of search data that was supposedly anonymized but was easily identifiable.
2) YouTube access logs ordered to be released by a judge despite Google's policy to anonymize data after two years.
3) An EU directive mandating retention of communications metadata for up to two years by member states.
2. Stories About The Present
Thank you. Hello, everyone. Before I talk about the future of the cloud, I'd like to begin with
a few stories about the present.
5. 20,000,000
650,000
20M web queries from 650k users, sampled over three months, to help researchers improve
the state of search technology. AOL claimed that the data had been
12. 4417749
Thelma Arnold
Lilburn, Georgia
Lilburn Georgia, who has an interest in
13. 4417749
Thelma Arnold
Lilburn, Georgia
numb fingers
numb fingers
14. 4417749
Thelma Arnold
Lilburn, Georgia
numb fingers
60 single men
60-ish single men, and
15. 4417749
Thelma Arnold
Lilburn, Georgia
numb fingers
60 single men
dog that urinates on everything
dogs that urinate on everything. The Times went to visit her and ran a great caption to the
photo accompanying the story:
39. Viacom v Google
by Viacom, parent company of Paramount, Dreamworks, MTV, and Nickelodeon. In March
2007, Viacom sued Google over YouTube claiming that its copyrighted TV shows were
available on YouTube and that Google wasn't doing enough to prevent this unauthorised
copying and distribution.
40. U.S. District Judge
Louis L. Stanton
In July this year, five months after Google announced their "anonymize after two years"
policy, U.S. District Judge Louis L. Stanton granted a motion to give Viacom
41. "the motion to compel
production of all data from
the Logging database
concerning each time a
YouTube video has been
viewed on the YouTube
website or through
embedding on a third-party
website is granted"
a copy of the YouTube access logs. This information includes:
47. "privacy concerns are
speculative"
Speculative.
And he quoted a line from that blog post made back in February '08, where Google said that
an IP address
48. "The reality is though
that in most cases, an IP
address without
additional information
cannot [identify you]"
- Google Blog, Feb '08
alone isn't identifying information.
Fortunately Viacom are on top of the privacy concerns raised by their request for YouTube
logs. They say they will limit access to the data
49. "is going to be limited to
outside advisers who can
use it solely for the
purpose of enforcing our
rights against YouTube and
Google"
- Michael D. Fricklas, Viacom's general counsel
to Viacom's advisers. Whew, thank you Viacom. And thank you, Google, for choosing to
retain that data and only anonymize after two years!
52. "the retention of data
generated or processed in
connection with the
provision of publicly
available electronic
communications services or
of public communications
networks"
the retention of data generated or processed in connection with the provision of publicly
available electronic communications services or of public communications networks.
53. 6 months
-
2 years
For anywhere between 6 months and 2 years, depending on the type of data. Fortunately it
only covers
54. telephony, mobile
telephony, Internet access,
Internet email and Internet
telephony.
telephony, mobile telephony, Internet access, Internet email and Internet telephony. Member
states have the option to delay the Internet bits until March 2009 and most have opted to
delay. Member states also have the option to go beyond the EU recommendations, as
55. .dk
Denmark has chosen to do. Their draft provision enacting the EU order would require ISPs to
log the
59. every single internet data
packet
every single Internet data packet. I'm reasonably confident this is impossible, but it does
show the ambitions of the states here. The states are all confident that the data won't be
60. "misused"
misused, but don't define "misuse".
I won't define "misuse" either, but will simply note that Germany makes their logs available
for prosecutions under
65. "Cloud"
"the cloud" (air quotes). It's a broad and popular term, and like all broad popular terms it's
become more of a buzzword to get venture funding, or to sound like you know what you're
talking about, than a useful marker of technological progress.
66. "Cloud" = On Demand +
Utility Computing + SOA
+ ASP + SaaS + ...
It covers a multitude of sins. We talked about Software As A Service (YouTube), and that's a
real trend. Lots of different types of software are moving into "the cloud".
67. Meet Google Healthcare. Patients upload their medical records and get a friendly useful
interface for checking drug interactions, etc. The privacy aspect to this is a little odd,
though.
68. Google Healthcare slips through a crack in the medical record legislation in the USA--
because they're not holding records on behalf of providers, no security is mandated. The
expectation is that the market will determine an appropriate level of security. Hoorah for the
market!
And as we all know, markets immediately find the appropriate level of security with no
incentive to provide too little. And fortunately there can be no privacy problems flowing from
an inadequate level of security. Right? Right?
69. Salesforce.com is Customer Relationship Management--salespeople and their managers can
keep track of contacts, accounts, etc. online. You know, "Joe at IBM buys widgets, here's his
name and address and a record of every conversation that anyone in the company has had
with him".
71. Here's mint.com, one of several web-based apps playing in Quicken's back yard. They fetch
your bank statements from your bank, process them, and tell you where you're spending
your money.
72. Utility Computing
That was software as a service.
Another aspect to the cloud is utility computing, such as
73. Amazon Web Services
Amazon's web services. They offer many services over the web that have nothing to do with
books, including
74. Amazon EC2
their "Elastic Compute Cloud", EC2. You upload a virtual machine image (think: copy of a
hard drive) and Amazon runs it for you. No longer do you have to worry about managing
bandwidth, installing operating systems on servers, replacing hard drives, and all the stuff
that IT departments used to have to do. Consequently
75. many people have built their companies on EC2 and its sister storage service, S3. Many of
these are startups, attracted because it lets them scale without substantial investment, but
many EC2 customers (Eli Lilly, for example) are sizable. Amazon's services are getting a lot
of use,
76. as this graph shows. Amazon's Web Services overtook Amazon.com, in amount of traffic, in
mid-2007, just 18 months after launch.
77. Google also have a utility computing play: you write your program to run on their
environment and away you go, all of Google's more than 2 million servers are yours for the
using. Like Amazon, you're charged by the CPU-hour.
78. Microsoft are getting into the game as well, with their Live platform. According to a recent
Economist article, they're rolling out
79. 35,000/month
2,500/container
$500M/data centre
35,000 new servers every month in data centers. They're building the data centers from
40-foot shipping containers that each house 2,500 servers. The new data center in Chicago cost
$500M. These numbers aren't unusual for the industry.
80. "Cloud"
What do Utility Computing and Software as a Service have in common? What unites Google
Apps, your ISP, Amazon EC2, and Salesforce.com?
83. Outsourced IT
Outsourced IT. Every program on every machine is a pain in someone's ass. IT departments
must keep the versions up to date, deal with bitrot (when Windows stops working and
everything has to be reinstalled), etc. Using Google Docs means all the administrative hassle
of Microsoft Office has become Google's problem. And part and parcel of outsourced IT is
85. Hackers, Viruses, et al.
bad guys. Every IT department asks itself "can Google do a better job of this than I can?" and
probably answers, correctly, "yes".
86. Privacy & Security
But because there's a strong relationship between privacy and security, namely it's
impossible to have privacy without good security,
88. Where's the problem?
Many people ask "so what?" when you tell them that they've outsourced their privacy.
"Google will look after me, right?" they say. There are two reasons why this isn't necessarily
true.
89. (a) Google decides
First, Google gets to decide how much security to put around your data, and have no
doubt--it is a decision. There's always
92. perfect security
you might as well look for perfect happiness. There is no such thing as perfect security, just
acceptable levels of risk. Your company decides upon that acceptable level every time
they decide not to train their IT staff in the latest security practices, every time they choose
between one antivirus product and another, every time they decide not to buy a $75,000
network security appliance. That's perfectly normal. What's different in the world of the
cloud, though, is that
93. decide
you don't get to decide. Your vendor does. And no vendor will tell you all the security
precautions and procedures they have. Their security is
94. opaque
opaque. Few vendors even report incidents, and even fewer countries have mandatory
reporting laws. So you're asked to make a security (and thus privacy) decision on the basis of
imperfect, or even absent, information. That's the first reason that outsourced privacy isn't
all good.
95. (b) Gubmint = Black Hats
The second reason is that it's not just rogue teenagers hellbent on getting your credit card
details that you have to worry about. As the EU directive clearly shows, governments are black
hats, bad guys. They just have
96. subpoena / court order /
search warrant / request
different attack vectors from those of your average 21-year-old Ukrainian virus writer. And as
97. YouTube usernames
IP addresses
Viacom
Viacom's logfile data grab shows, many private companies have learned to piggyback on the
judiciary's and legislature's attack vectors.
98. Warshak v USA
The situation's even worse in the USA. Until now there have been two standards for
searching your records:
99. constitutional
Constitutional, preventing unreasonable search and seizure, which covers stuff in your home.
Constitutional protections are hard to circumvent, as they were written by angry 18th century
revolutionaries and tend to be quite explicit in their distrust of government. The other
protection is
100. statutory
statutory, the laws that fill the gaps in the Constitution. Statutory protections guard your
data outside the home
101. statutory = weaker
and generally require law enforcement to have less justification than do the constitutional
standards. This difference between two standards means law enforcement can more easily
intercept your data outside the home than in it -- bad news for hosted apps like Google
Docs, Mail, etc. Let me go over this again:
102. data in the home
if you download your mail to your laptop and keep it there,
103. data in the home: SAFE!
the police need a search warrant to get to it, and the burden of probable cause that goes with
it. But if you use Gmail and
104. data in the cloud
keep your mail on Google's servers,
105. data in the cloud: PWNED!
and the police need only a court order, which has a much lower hurdle to clear. A recent
lawsuit,
106. Warshak v USA
Warshak v USA has led to a ruling that the two should be treated the same (and
Constitutional standards should apply to getting data from ISPs). It's bouncing around
appeals at the moment, and the double standard will continue until and unless it's resolved
in Warshak's favour.
107. "Relying on the
Government to ensure
your privacy is like asking
a peeping Tom to install
your window blinds."
- John Perry Barlow
In short, security and privacy are as much threatened by legal and regulatory means as by
viruses and Trojans.
108. But that's enough about the past. Let's get back to the Future of the Cloud.
109. Future of the Cloud
Did you hear those capitals? Futurism needs capital letters. I don't plan to make predictions
about
112. trends
trends, growing numbers of similar things done by people who live on the cutting edge of
what's possible with today's technology, because Alan Kay knew how to do futurism:
113. "The best way to predict
the future is to invent it."
- Alan Kay
He invented object-oriented programming, the windowing system, pulldown menus, GUIs
and, basically, computer interfaces as we know them today. So I look for people or projects
that seem to me to be inventing the future. Here are a few.
114. Let's start in an unlikely place: the foot. The Nike+ is a shoe with an accelerometer that
counts steps. It communicates with an iPod and syncs your running record to a web site.
People who use it report a new way of thinking about their run: it's become a
115. "My run is now a
videogame, and I want
you to play with me."
- Jane McGonigal
video game. You can have collaborative and competitive challenges and you get virtual
trophies. Now many pieces of gym equipment can also report to the iPod, so people can
track their treadmill, cross-trainer, and bike miles on the same web site as they keep track of
their running.
116. Or take the Amazon Kindle. This is an electronic book reader--super high resolution screen,
keyboard, and all that, but the main innovation behind it is cellular connectivity. You don't
buy a network contract when you buy the Kindle, but the books you buy (from Amazon.com,
naturally) arrive via the cellular data network. The Kindle was introduced a year ago, and
Amazon will have sold 380,000 of them by the end of this year.
117. This is the Wattson. It's a power meter that reports the information to you, not just the
power company. This is the display; there's also a transmitter that clips onto a mains cable.
There's companion software, Holmes, that uploads your energy usage to the Holmes web site
and lets you track and compare over time.
118. This is Botanicalls, a gizmo that sends Twitter updates when your plants need watering.
What you see is a sensor that clips into the potplant's soil and a network cable to send the
updates.
119. Andy Stanford-Clark, an IBM "Master Inventor", has instrumented his house and hooked it up
to Twitter. You can follow his house and see his power consumption, when the phone rings,
when the motion-sensitive lights turn on and off, etc.
These people use Twitter, by the way, because Twitter runs a free SMS gateway in the USA
and so it has become a cheap and easy way for programs to send SMS. For example,
120. Former Apple engineer Gordon Meyer hooked his doorbell up to Twitter. Now, no matter
where he is, he knows if his doorbell is ringing.
121. This is the Availabot by Matt Webb and Jack Schulze. It's a little toy that plugs into your USB
port and stands to attention whenever a particular instant message buddy comes online.
122. These researchers from Iowa State University are holding sensors that will help farmers
understand nutrient and water flow in their soil. They're 2 inches by 4 inches at the moment
and will live underground, communicating wirelessly with a central computer.
123. WineM is a prototype of a smart wine rack: a reader senses the RFID tags on bottles and uses
lights to display the type of wine so you don't have to turn the bottle to check the label.
When you add a new bottle to your collection, the hardware scans the UPC code and uses
public databases to translate that into a variety of wine--no keyboard necessary.
124. This is a MobileTEEN GPS unit. AIG, before they needed bailing out, offered Teen GPS
insurance. In exchange for lower premiums (anyone here tried to insure a teenage driver
lately? You know what I'm talking about) your teen (or their car) must carry this GPS device
around, which reports their location back to AIG. You can tell AIG's web site to SMS you if
your teenager speeds, and you can even set up a GeoFENCE (that's a registered service mark,
by the way) and it'll SMS you if they leave that area.
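How little it takes to turn a stream of location reports into these alerts is easy to sketch. The following is a minimal illustration only, not AIG's implementation: the circular fence, the field names (lat, lon, speed_kmh), and the function names are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def check_fix(fix, fence_centre, fence_radius_km, speed_limit_kmh):
    """Return the alerts that a single GPS report would trigger.

    `fix` is a hypothetical report: {"lat": ..., "lon": ..., "speed_kmh": ...}.
    The fence is assumed to be a circle around `fence_centre` (lat, lon).
    """
    alerts = []
    if fix["speed_kmh"] > speed_limit_kmh:
        alerts.append("speeding")
    distance = haversine_km(fix["lat"], fix["lon"], *fence_centre)
    if distance > fence_radius_km:
        alerts.append("left fence")
    return alerts
```

Every report has to reach the vendor's servers for a check like this to run, which is exactly why the location history ends up in someone else's database.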
125. This is the Dash mobile GPS unit for your car. It has a cellular network card in it, and uploads
your location to the Dash servers. In return, you get incredibly accurate up-to-the-second
traffic information as reported by the Dash units of the other cars in the city.
126. This is the next President of the United States of America, using his Blackberry. The
Blackberry is a mobile phone with email, it lets you send and receive email no matter where
you are. The latest news is that Obama will have to give up his Blackberry because the emails
will be a matter of public record, and because it's not a good idea for a cellphone company to
have a database from which the location of the President of the United States of America
can easily be found.
129. Web meets World
They all connect the Internet to a physical device, whether it's a potplant, a teenage driver, or
a book. In some cases the devices are
131. displays
displays, reflecting the state of the networked data: whether your friend is online or which
books you have paid for. But in none of these cases is
145. science fiction
science fiction. You're right, there are a few science fiction authors who have done a great
job of predicting the near future. One of them,
146. William Gibson
William Gibson, was interviewed by Rolling Stone on their 40th anniversary. He was asked
about the major challenges we faced, and he identified
158. data trails
data trails that linger behind us like contrails in the air after the plane has disappeared from
sight. You know we do it now: we leave trails in
169. easy
It's not easy, but it's not bad.
There are three things that would make a lot of these problems go away.
170. (0) Don't collect it
The most obvious solution is simply not to collect the data in the first place. A lot of these
advertising-driven companies are packrats--think of Google and the two year lifetime of
unanonymized data (after the Viacom ruling, by the way, they announced they were changing
those two-year lifetimes to nine months). They keep this data because it might be useful to
their future selves and not for you or your future self.
171. (1) Same goals
Next, vendors should have the same goals as you do. Amazon does: when you run your web
site on EC2, you're paying Amazon for that service. Amazon isn't mining what you do and
they're not storing your data for later use. Google's business model, however, is advertising.
So you get GMail and it looks like it's free, but the price you pay is Google's data collection.
172. (2) Cryptography
Finally, your data should be encrypted in such a way that the service provider can't decrypt it
without you. We can do this technically, but few companies do it in practice.
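To make "can't decrypt it without you" concrete, here is a toy sketch of the idea: the key is generated and kept on your own machine, and the provider only ever stores ciphertext. It uses a one-time pad (XOR with a random key as long as the message) purely because that fits in a few lines of standard-library Python; a real service would more likely use an authenticated cipher from a vetted cryptography library. The Provider class and all names are hypothetical.

```python
import secrets
from typing import Tuple


def encrypt(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Encrypt on the client. Returns (key, ciphertext); the key never leaves the client."""
    key = secrets.token_bytes(len(plaintext))  # fresh random pad, used once
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Decrypt on the client using the locally held key."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


class Provider:
    """Stand-in for a cloud service: it stores opaque blobs and holds no keys."""

    def __init__(self):
        self._blobs = {}

    def put(self, name: str, blob: bytes) -> None:
        self._blobs[name] = blob

    def get(self, name: str) -> bytes:
        return self._blobs[name]


# Usage: the provider (or anyone who subpoenas it) sees only the ciphertext.
provider = Provider()
key, blob = encrypt(b"statement: $1,234.56 at Acme Bank")
provider.put("october", blob)
recovered = decrypt(key, provider.get("october"))
```

The design point is where the key lives, not which cipher is used: a court order served on the provider yields only the blob.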
173. problems
But there are problems with these solutions, and they cut to the heart of what's so attractive
about the very things that could threaten privacy:
174. (0) data makes services
better
It'd be sad if Google weren't to collect data because the data that Google collects makes its
services better. And we benefit when this happens. Yes, it makes Google rich, but it also
makes the Internet navigable, our email spam-free, and fills the world with unicorns and
rainbows.
175. (1) free is cheap
Aligning interests is all very well, but advertising business models make services free that
would otherwise be a direct cost. We can all host our domains and our email on Google
without paying a cent, whereas ISPs typically charge for it. Aligning Google's interests with
our privacy interests may damage our pecuniary interests. And, finally,
176. (2) shared data makes
individual experiences
better
If you encrypt personally-identifying data, you make it very difficult to create those sites that
crunch everyone's data and make suggestions or recommendations. If the Holmes web site
can't see the power use of my Wattson, how can I compare myself to other people?
177. impossible?
So is it impossible to do this right? To respect privacy yet have a cloud application?
179. Wesabe is a Quicken-like program on the web, like Mint. These guys get privacy right. First,
they don't hold the keys to your Internet banking in order to download your statements: you
run an uploader on your PC that fetches your electronic statements from the bank and then
sends them to Wesabe. Second, Wesabe encrypts your personal data so your records can't be
subpoenaed, because your records can't be tracked back to you. Third, it's a commercial
service and not advertising supported--your best interests are their best interests. And
finally, Wesabe still manages to provide collective intelligence (you're not saving as much
as everyone else, for example). I contract to O'Reilly Media, who invested in Wesabe precisely
because they get this so right.
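One way to get both properties at once, records that can't be traced back to you and comparisons against everyone else, can be sketched as follows. This is an illustration under stated assumptions, not a description of how Wesabe actually works: the uploader replaces identifying fields with a keyed pseudonym (HMAC with a secret that stays on your PC), and the server compares only the non-identifying numbers. All function and field names are made up for the example.

```python
import hashlib
import hmac
import statistics


def pseudonym(user_secret: bytes, account_label: str) -> str:
    """Keyed pseudonym: stable for the user, meaningless without the secret."""
    return hmac.new(user_secret, account_label.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def prepare_upload(user_secret: bytes, account_label: str,
                   monthly_savings: float) -> dict:
    """Runs on the user's PC: strip identity, keep the number worth comparing."""
    return {"id": pseudonym(user_secret, account_label),
            "monthly_savings": monthly_savings}


def compare_to_others(upload: dict, all_uploads: list) -> float:
    """Runs on the server: how far this user is from the median of everyone else."""
    others = [u["monthly_savings"] for u in all_uploads if u["id"] != upload["id"]]
    return upload["monthly_savings"] - statistics.median(others)


# Usage: a negative result means "you're not saving as much as everyone else".
mine = prepare_upload(b"secret-kept-on-my-pc", "checking", 100.0)
everyone = [mine] + [prepare_upload(b"other-%d" % i, "checking", s)
                     for i, s in enumerate([200.0, 300.0, 400.0])]
gap = compare_to_others(mine, everyone)
```

The server can still rank and compare, but the identifier it holds is useless to a third party who doesn't also hold the secret on the user's machine.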
180. So here we are, nearly to the end. What did we learn?
181. (1) Web meets World
Physical devices are being integrated with the Web, or "cloud" if you will
182. (2) Data in the cloud can
be a privacy problem
Data in the cloud can be a privacy problem, because
183. (3) You've outsourced
privacy
you've outsourced your privacy, so you're vulnerable to attack not just from hackers but also
from