Web Meets World: Privacy and the Future of the Cloud


An introduction to privacy issues around cloud computing, with an eye to the ubiquitous computing future of the cloud. First given 20/11/2008 to the Privacy Forum in Auckland, NZ.

Published in: News & Politics, Technology

  1. Web Meets World Nat Torkington nathan@torkington.com
  2. Stories About The Present Thank you. Hello, everyone. Before I talk about the future of the cloud, I’d like to begin with a few stories about the present.
  3. (1) First story.
  4. AOL America OnLine. In August 2006, AOL released
  5. 20,000,000 650,000 20M web queries from 650k users, sampled over three months, to help researchers improve the state of search technology. AOL claimed that the data had been
  6. anonymized anonymized by turning usernames into unique IDs. However many searches contained identifying information such as
  7. addresses addresses,
  8. names names,
  9. e-mail messages and even e-mail messages. The New York Times was able to identify, for example, user
  10. 4417749 4417749 as
  11. 4417749 Thelma Arnold Thelma Arnold of
  12. 4417749 Thelma Arnold Lilburn, Georgia Lilburn, Georgia, who has an interest in
  13. 4417749 Thelma Arnold Lilburn, Georgia numb fingers numb fingers
  14. 4417749 Thelma Arnold Lilburn, Georgia numb fingers 60 single men 60-ish single men, and
  15. 4417749 Thelma Arnold Lilburn, Georgia numb fingers 60 single men dog that urinates on everything dogs that urinate on everything. The Times went to visit her and ran a great caption to the photo accompanying the story:
  16. On the subject of AOL, remember that they
  17. tried tried to anonymize. The privacy loss, while you and I might think it was predictable, wasn’t
  18. deliberate deliberate. Shit, as they say, happened.
  19. (2) Second story
  20. Google Google. They collect a lot of data about people:
  21. searches searches,
  22. ad impressions ad impressions,
  23. clickthroughs clickthroughs,
  24. mail mail
  25. chat messages chat messages
  26. documents documents
  27. spreadsheets spreadsheets
  28. presentations presentations,
  29. addresses addresses
  30. medical records medical records, and more. Fortunately Google know that all this data can be dangerous, and take steps to safeguard the
  31. privacy privacy of their users. In February 08 they posted to the Google Blog saying they would
  32. cookie shorten cookie lifetimes
  33. two years to a mere two years, and
  34. anonymize anonymize their search logs
  35. two years after 18-24 months. It didn’t take long for these steps to be shown up as
  36. good enough not good enough. You see, another Google property,
  37. YouTube YouTube, which serves
  38. 1,000,000+ more than a million videos every day, came under attack.
  39. Viacom v Google by Viacom, parent company of Paramount, Dreamworks, MTV, and Nickelodeon. In March 2007, Viacom sued Google over YouTube claiming that its copyrighted TV shows were available on YouTube and that Google wasn’t doing enough to prevent this unauthorised copying and distribution.
  40. U.S. District Judge Louis L. Stanton In July this year, five months after Google announced their “anonymize after two years” policy, U.S. District Judge Louis L. Stanton granted a motion to give Viacom
  41. “the motion to compel production of all data from the Logging database concerning each time a YouTube video has been viewed on the YouTube website or through embedding on a third-party website is granted” a copy of the YouTube access logs. This information includes:
  42. IP addresses your IP address
  43. YouTube username your YouTube user name
  44. time of day the time of day you
  45. which videos watched a particular video. While you and I might be as horrified as Google was, the Judge said
  46. “privacy concerns are speculative” privacy concerns are
  47. “privacy concerns are speculative” Speculative. And he quoted a line from that blog post made back in February ’08, where Google said that an IP address
  48. “The reality is though that in most cases, an IP address without additional information cannot [identify you]” - Google Blog, Feb ‘08 alone isn’t identifying information. Fortunately Viacom are on top of the privacy concerns raised by their request for YouTube logs. They say they will limit access to the data
  49. “is going to be limited to outside advisers who can use it solely for the purpose of enforcing our rights against YouTube and Google” - Michael D. Fricklas, Viacom’s general counsel to Viacom’s advisers. Whew, thank you Viacom. And thank you, Google, for choosing to retain that data and only anonymize after two years!
  50. (3) Now let’s look at Europe.
  51. Directive 2006/24/EC On March 15, 2006, the EU adopted Directive 2006/24/EC. This mandates
  52. “the retention of data generated or processed in connection with the provision of publicly available electronic communications services or of public communications networks” the retention of data generated or processed in connection with the provision of publicly available electronic communications services or of public communications networks.
  53. 6 months - 2 years For anywhere between 6 months and 2 years, depending on the type of data. Fortunately it only covers
  54. telephony, mobile telephony, Internet access, Internet email and Internet telephony. telephony, mobile telephony, Internet access, Internet email and Internet telephony. Member states have the option to delay the Internet bits until March 2009 and most have opted to delay. Member states also have the option to go beyond the EU recommendations, as
  55. .dk Denmark has chosen to do. Their draft provision enacting the EU order would require ISPs to log the
  56. source source
  57. time time, and
  58. destination destination of
  59. every single internet data packet every single Internet data packet. I’m reasonably confident this is impossible, but it does show the ambitions of the states here. The states are all confident that the data won’t be
  60. “misused” misused, but don’t define “misuse”. I won’t define “misuse” either, but will simply note that Germany makes their logs available for prosecutions under
  61. © civil copyright law so that Disney will be able to see when you logged in and who you spoke to.
  62. Ok, that’s the end of the stories.
  63. common What did they all have in common?
  64. Cloud They’re all about the cloud. Or, rather, the
  65. “Cloud” “the cloud” (air quotes). It’s a broad and popular term, and like all broad popular terms it’s become more of a buzzword to get venture funding, or to sound like you know what you’re talking about, than a useful marker of technological progress.
  66. “Cloud” = On Demand + Utility Computing + SOA + ASP + SaaS + ... It covers a multitude of sins. We talked about Software As A Service (YouTube), and that’s a real trend. Lots of different types of software are moving into “the cloud”.
  67. Meet Google Healthcare. Patients upload their medical records and get a friendly useful interface for checking drug interactions, etc. The privacy aspect to this is a little odd, though.
  68. Google Healthcare slips through a crack in the medical record legislation in the USA--because they’re not holding records on behalf of providers, no security is mandated. The expectation is that the market will determine an appropriate level of security. Hoorah for the market! And as we all know, markets immediately find the appropriate level of security with no incentive to provide too little. And fortunately there can be no privacy problems flowing from an inadequate level of security. Right? Right?
  69. Salesforce.com is Customer Relationship Management--salespeople and their managers can keep track of contacts, accounts, etc. online. You know, “Joe at IBM buys widgets, here’s his name and address and a record of every conversation that anyone in the company has had with him”.
  70. Google Mail.
  71. Here’s mint.com, one of several web-based apps playing in Quicken’s back yard. They fetch your bank statements from your bank, process them, and tell you where you’re spending your money.
  72. Utility Computing That was software as a service. Another aspect to the cloud is utility computing, such as
  73. Amazon Web Services Amazon’s web services. They offer many services over the web that have nothing to do with books, including
  74. Amazon EC2 their “Elastic Compute Cloud”, EC2. You upload a virtual machine image (think: copy of a hard drive) and Amazon runs it for you. No longer do you have to worry about managing bandwidth, installing operating systems on servers, replacing hard drives, and all the stuff that IT departments used to have to do. Consequently
  75. many people have built their companies on EC2 and its sister storage service, S3. Many of these are startups, attracted because it lets them scale without substantial investment, but many EC2 customers (Eli Lilly, for example) are sizable. Amazon’s services are getting a lot of use,
  76. as this graph shows. Amazon’s Web Services overtook Amazon.com, in amount of traffic, in mid-2007, just 18 months after launch.
  77. Google also have a utility computing play: you write your program to run on their environment and away you go, all of Google’s more than 2 million servers are yours for the using. Like Amazon, you’re charged by the CPU-hour.
  78. Microsoft are getting into the game as well, with their Live platform. According to a recent Economist article, they’re rolling out
  79. 35,000/month 2,500/container $500M/data centre 35,000 new servers every month in data centers. They’re building the data centers from 40-foot shipping containers that each house 2,500 servers. The new data center in Chicago cost $500M. These numbers aren’t unusual for the industry.
  80. “Cloud” What do Utility Computing and Software as a Service have in common? What unites Google Apps, your ISP, Amazon EC2, and Salesforce.com?
  81. SEP They all add up to the same thing: they make computing
  82. Someone Else’s Problem someone else’s problem. It’s
  83. Outsourced IT Outsourced IT. Every program on every machine is a pain in someone’s ass. IT departments must keep the versions up to date, deal with bitrot (when Windows stops working and everything has to be reinstalled), etc. Using Google Docs means all the administrative hassle of Microsoft Office has become Google’s problem. And part and parcel of outsourced IT is
  84. Outsourced Security outsourced security. It’s a pain in the bum locking down machines and applications against
  85. Hackers, Viruses, et al. bad guys. Every IT department asks itself “can Google do a better job of this than I can?” and probably answers, correctly, “yes”.
  86. Privacy & Security But because there’s a strong relationship between privacy and security, namely it’s impossible to have privacy without good security,
  87. Outsourced Privacy using Google Docs means that we’ve outsourced privacy as well.
  88. Where’s the problem? Many people ask “so what?” when you tell them that they’ve outsourced their privacy. “Google will look after me, right?” they say. There are two reasons why this isn’t necessarily true.
  89. (a) Google decides First, Google gets to decide how much security to put around your data, and have no doubt--it is a decision. There’s always
  90. the next threat another threat to defend against. If you’re looking for
  91. perfect security perfect security, then
  92. perfect security you might as well look for perfect happiness. There is no such thing as perfect security, just acceptable levels of risk. Your company decides upon that acceptable level every time they decide not to train their IT staff in the latest security practices, every time they choose between one antivirus product and another, every time they decide not to buy a $75,000 network security appliance. That’s perfectly normal. What’s different in the world of the cloud, though, is that
  93. decide you don’t get to decide. Your vendor does. And no vendor will tell you all the security precautions and procedures they have. Their security is
  94. opaque opaque. Few vendors even report incidents, and even fewer countries have mandatory reporting laws. So you’re asked to make a security (and thus privacy) decision on the basis of imperfect, or even absent, information. That’s the first reason that outsourced privacy isn’t all good.
  95. (b) Gubmint = Black Hats The second reason is that it’s not just rogue teenagers hellbent on getting your credit card details that you have to worry about. As the EU directive clearly shows, governments are black hats, bad guys. They just have
  96. subpoena / court order / search warrant / request different attack vectors from those of your average 21 year old Ukrainian virus writer. And as
  97. YouTube usernames IP addresses Viacom Viacom’s logfile data grab shows, many private companies have learned to piggyback on the judiciary’s and legislature’s attack vectors.
  98. Warshak v USA The situation’s even worse in the USA. Until now there have been two standards for searching your records:
  99. constitutional Constitutional, preventing unreasonable search and seizure, which covers stuff in your home. Constitutional protections are hard to circumvent, as they were written by angry 18th century revolutionaries and tend to be quite explicit in their distrust of government. The other protection is
  100. statutory statutory, the laws that fill the gaps in the Constitution. Statutory protections guard your data outside the home
  101. statutory = weaker and generally require law enforcement to have less justification than do the constitutional standards. This difference between two standards means law enforcement can more easily intercept your data outside the home than in it -- bad news for hosted apps like Google Docs, Mail, etc. Let me go over this again:
  102. data in the home if you download your mail to your laptop and keep it there,
  103. data in the home SAFE! the police need a search warrant to get to it, and the burden of probable cause that goes with it. But if you use Gmail and
  104. data in the cloud keep your mail on Google’s servers,
  105. data in the cloud PWNED! and the police need only a court order, which has a much lower hurdle to clear. A recent lawsuit,
  106. Warshak v USA Warshak v USA has led to a ruling that the two should be treated the same (and Constitutional standards should apply to getting data from ISPs). It’s bouncing around appeals at the moment, and the double standard will continue until and unless it’s resolved in Warshak’s favour.
  107. “Relying on the Government to ensure your privacy is like asking a peeping Tom to install your window blinds.” - John Perry Barlow In short, security and privacy are as much threatened by legal and regulatory means as by viral and Trojan.
  108. But that’s enough about the past. Let’s get back to the Future of the Cloud.
  109. Future of the Cloud Did you hear those capitals? Futurism needs capital letters. I don’t plan to make predictions about
  110. CO2-burning flying cars we’ll all be driving, or
  111. the silver robots that will do our bidding. All I do is identify
  112. trends trends, growing numbers of similar things done by people who live on the cutting edge of what’s possible with today’s technology, because Alan Kay knew how to do futurism:
  113. “The best way to predict the future is to invent it.” - Alan Kay He invented object-oriented programming, the windowing system, pulldown menus, GUIs and computer interfaces as we know them today basically. So I look for people or projects that seem to me to be inventing the future. Here are a few.
  114. Let’s start in an unlikely place: the foot. The Nike+ is a shoe with an accelerometer that counts steps. It communicates with an iPod and syncs your running record to a web site. People who use it report a new way of thinking about their run: it’s become a
  115. “My run is now a videogame, and I want you to play with me.” - Jane McGonigal video game. You can have collaborative and competitive challenges and you get virtual trophies. Now many pieces of gym equipment can also report to the iPod, so people can track their treadmill, cross-trainer, and bike miles on the same web site as they keep track of their running.
  116. Or take the Amazon Kindle. This is an electronic book reader--super high resolution screen, keyboard, and all that--but the main innovation behind it is cellular connectivity. You don’t buy a network contract when you buy the Kindle, but the books you buy (from Amazon.com, naturally) arrive via the cellular data network. It was introduced a year ago, and Amazon will have sold 380,000 of them by the end of this year.
  117. This is the Wattson. It’s a power meter that reports the information to you, not just the power company. This is the display; there’s also a transmitter that clips onto a mains cable. There’s companion software, Holmes, that uploads your energy usage to the Holmes web site and lets you track and compare over time.
  118. This is Botanicalls, a gizmo that sends Twitter updates when your plants need watering. What you see is a sensor that clips into the potplant’s soil and a network cable to send the updates.
  119. Andy Stanford-Clark, an IBM “Master Inventor”, has instrumented his house and hooked it up to Twitter. You can follow his house and see his power consumption, when the phone rings, when the motion-sensitive lights turn on and off, etc. These people use Twitter, by the way, because Twitter runs a free SMS gateway in the USA and so it has become a cheap and easy way for programs to send SMS. For example,
  120. Former Apple engineer Gordon Meyer hooked his doorbell up to Twitter. Now, no matter where he is, he knows if his doorbell is ringing.
  121. This is the Availabot by Matt Webb and Jack Schulze. It’s a little toy that plugs into your USB port and stands to attention whenever a particular instant message buddy comes online.
  122. These researchers from Iowa State University are holding sensors that will help farmers understand nutrient and water flow in their soil. They’re 2 inches by 4 inches at the moment and will live underground, communicating wirelessly with a central computer.
  123. WineM is a prototype of a smart wine rack: a reader senses the RFID tags on bottles and uses lights to display the type of wine so you don’t have to turn the bottle to check the label. When you add a new bottle to your collection, the hardware scans the UPC code and uses public databases to translate that into a variety of wine--no keyboard necessary.
  124. This is a MobileTEEN GPS unit. AIG, before they needed bailing out, offered Teen GPS insurance. In exchange for lower premiums (anyone here tried to insure a teenage driver lately? You know what I’m talking about) your teen (or their car) must carry this GPS device around, which reports their location back to AIG. You can tell AIG’s web site to SMS you if your teenager speeds, and you can even set up a GeoFENCE (that’s a registered service mark, by the way) and it’ll SMS you if they leave that area.
  125. This is the Dash mobile GPS unit for your car. It has a cellular network card in it, and uploads your location to the Dash servers. In return, you get incredibly accurate up-to-the-second traffic information as reported by the Dash units of the other cars in the city.
  126. This is the next President of the United States of America, using his Blackberry. The Blackberry is a mobile phone with email; it lets you send and receive email no matter where you are. The latest news is that Obama will have to give up his Blackberry because the emails will be a matter of public record, and because it’s not a good idea for a cellphone company to have a database from which the location of the President of the United States of America can easily be found.
  127. Ok, enough examples.
  128. Common? What do these devices have in common?
  129. Web meets World They all connect the Internet to a physical device, whether it’s a potplant, a teenage driver, or a book. In some cases the devices are
  130. sensors sensors, reporting back speeds, power consumption, Nitrogen concentration. In other cases the devices are
  131. displays displays, reflecting the state of the networked data: whether your friend is online or which books you have paid for. But in none of these cases is
  132. dumb device the device dumb. We’re entering an age of
  133. smart devices smart devices, where
  134. everything everything
  135. will be will be
  136. networked networked. Not because it’s cool and trendy but because
  137. the network the network
  138. makes makes
  139. the device the device
  140. better better, or because
  141. the device the device
  142. makes makes
  143. the network the network
  144. better better. You might be thinking this is all sounding a bit
  145. science fiction science fiction. You’re right, there are a few science fiction authors who have done a great job of predicting the near future. One of them,
  146. William Gibson William Gibson, was interviewed by Rolling Stone on their 40th anniversary. He was asked about the major challenges we faced, and he identified
  147. ubiquitous computing ubiquitous computing, this idea that everything is connected all the time. He said: One of the things our grandchildren will find
  148. quaintest quaintest about us is that we distinguish
  149. the digital the digital from
  150. the real the real,
  151. the virtual the virtual, from
  152. the real the real
  153. literally impossible In the future, that will become literally impossible
  154. cyberspace The distinction between cyberspace and
  155. ! cyberspace that which isn't cyberspace is going to be
  156. unimaginable unimaginable. But you don’t have to look to science fiction to see that as we go through life today, we leave
  157. invisible traces invisible traces of our presence,
  158. data trails data trails that linger behind us like contrails in the air after the plane has disappeared from sight. You know we do it now: we leave trails in
  159. credit card credit card databases,
  160. phone calls phone company databases,
  161. ISPs ISP databases. But when we pay cash, or visit someone in real life, or use a payphone, we can
  162. go off the grid go off the grid, be
  163. anonymous anonymous. But as the web meets the world, this will be harder and harder to do. Each new
  164. service = database service is backed by a database, and that database is vulnerable to
  165. Viacom bad guys,
  166. Directive 2006/24/EC governments, and
  167. AOL incompetence. I know it sounds grim, but it’s
  168. BAD not all bad.
  169. easy It’s not easy, but it’s not bad. There are three things that would make a lot of these problems go away.
  170. (0) Don’t collect it The most obvious solution is simply not to collect the data in the first place. A lot of these advertising-driven companies are packrats--think of Google and the two year lifetime of unanonymized data (after the Viacom ruling, by the way, they announced they were changing those two-year lifetimes to nine months). They keep this data because it might be useful to their future selves and not for you or your future self.
  171. (1) Same goals Next, vendors should have the same goals as you do. Amazon does: when you run your web site on EC2, you’re paying Amazon for that service. Amazon isn’t mining what you do and they’re not storing your data for later use. Google’s business model, however, is advertising. So you get GMail and it looks like it’s free, but the price you pay is Google’s data collection.
  172. (2) Cryptography Finally, your data should be encrypted in such a way that the service provider can’t decrypt it without you. We can do this technically, but few companies do it in practice.
  173. problems But there are problems with these solutions, and they cut to the heart of what’s so attractive about the very things that could threaten privacy:
  174. 0) data makes services better It’d be sad if Google weren’t to collect data because the data that Google collects makes its services better. And we benefit when this happens. Yes, it makes Google rich, but it also makes the Internet navigable, our email spam-free, and fills the world with unicorns and rainbows.
  175. 1) free is cheap Aligning interests is all very well, but advertising business models make services free that would otherwise be a direct cost. We can all host our domains and our email on Google without paying a cent, whereas ISPs typically charge for it. Aligning Google’s interests with our privacy interests may damage our pecuniary interests. And, finally,
  176. 2) shared data makes individual experiences better If you encrypt personally-identifying data, you make it very difficult to create those sites that crunch everyone’s data and make suggestions or recommendations. If the Holmes web site can’t see the power use of my Wattson, how can I compare myself to other people?
  177. impossible? So is it impossible to do this right? To respect privacy yet have a cloud application?
  178. impossible? No, there are companies trying hard to do it right. For example,
  179. Wesabe is a Quicken-like program on the web, like Mint. These guys get privacy right. First, they don’t hold the keys to your Internet banking: you run an uploader on your PC that fetches your electronic statements from the bank and then sends them to Wesabe. Second, Wesabe encrypt your personal data so your records can’t be subpoena’d because your records can’t be tracked back to you. Third, it’s a commercial service and not advertising supported--your best interests are their best interests. And finally, Wesabe still manages to provide collective intelligence (you’re not saving as much as everyone else, for example). I contract to O’Reilly Media, who invested in Wesabe precisely because they get this so right.
  180. So here we are, nearly to the end. What did we learn?
  181. (1) Web meets World Physical devices are being integrated with the Web, or “cloud” if you will
  182. (2) Data in the cloud can be a privacy problem Data in the cloud can be a privacy problem, because
  183. (3) You’ve outsourced privacy you’ve outsourced your privacy, so you’re vulnerable to attack not just from hackers but also from
  184. (4) Governments governments,
  185. (5) Competitors competitors, and
  186. (6) Incompetence AOL. And while it is possible to build useful services while
  187. (7) Choose not to collect choosing not to gather some data,
  188. (8) Cryptography encrypting the data you do collect, and
  189. (9) Aligning interests making sure that your service provider doesn’t have a motive to undermine your privacy ...
  190. easy It’s not easy. Thank you, I’ll now take
  191. questions? nathan@torkington.com questions.
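
An aside on the AOL story (slides 4-18): the following is a minimal Python sketch of why turning usernames into unique IDs does not anonymize a search log. The usernames, the "cheap flights" query, and the function names `pseudonymize` and `profiles` are all hypothetical, made up for illustration; this is not AOL's actual process or data format, only the general idea the talk describes.

```python
from collections import defaultdict
from itertools import count


def pseudonymize(log):
    """Replace each username with a stable numeric ID (hypothetical scheme)."""
    ids = {}
    next_id = count(1)
    released = []
    for user, query in log:
        if user not in ids:
            ids[user] = next(next_id)
        released.append((ids[user], query))
    return released


def profiles(released):
    """Group the released queries by ID, as any reader of the data could."""
    grouped = defaultdict(list)
    for uid, query in released:
        grouped[uid].append(query)
    return dict(grouped)


# Hypothetical sample log of (username, query) pairs.
log = [
    ("alice", "numb fingers"),
    ("bob", "cheap flights"),
    ("alice", "60 single men"),
    ("alice", "dog that urinates on everything"),
]

released = pseudonymize(log)
by_id = profiles(released)

for uid, queries in by_id.items():
    print(uid, queries)
```

The usernames are gone from `released`, but every query from one person still carries the same ID, so the queries can be read together as a profile. If any of them contains a name, an address, or a town, the whole profile is tied to a person again.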