Digital Trails Dave King 1 5 10 Part 1 D3

  1. 1. Digital Traces and Trails: Extracting Intelligence from the Collective Interactions of Web and Mobile Users   Dave King HICSS-44 Tutorial January 5, 2010
  2. 2. Agenda <ul><li>Digital Traces & Trails – </li></ul><ul><ul><li>Some examples </li></ul></ul><ul><ul><li>Expansion in the volume and types of traces </li></ul></ul><ul><li>Mining Digital Traces & Trails </li></ul><ul><li>Some Comments about Privacy </li></ul>
  3. 3. Pop Quiz
  4. 4. Background <ul><li>Dave King, SVP of Product Development, Product Management - JDA Software </li></ul><ul><li>Experience </li></ul><ul><ul><li>6 Years with JDA Software </li></ul></ul><ul><ul><li>27 Years - Enterprise Software </li></ul></ul><ul><ul><li>15 Years as a University Professor </li></ul></ul><ul><li>Education </li></ul><ul><ul><li>Ph.D. in Sociology and Statistics from University of North Carolina at Chapel Hill (long time ago) </li></ul></ul>
  5. 5. Background <ul><li>12 years as Co-Chair of the Internet & Digital Economy Track (HICSS) </li></ul><ul><li>Long Time Interest in various aspects of E-Commerce & Business Intelligence </li></ul><ul><li>Tutorial topic reflects a personal interest in </li></ul><ul><ul><li>The data produced by various networks and network devices, </li></ul></ul><ul><ul><li>The examination of those data with advanced analytical techniques </li></ul></ul><ul><ul><li>And some of the social issues and problems associated with that analysis. </li></ul></ul>
  6. 6. Proliferation of Traces & Trails <ul><li>Our lives have been leaving increasingly complete and detailed traces in cyberspace as two-way electronic communications devices have proliferated and diversified. Telephones were the first such devices to find widespread use; they soon yielded telephone billing data – records of when, where and by whom calls were made. Then bank ATM machines and point-of-sale terminals began to produce transaction records. As personal computers were plugged into commercial online networks, they too began to create electronic trails… There is more of this to come. As switched video networks become extensively used for everyday purposes – shopping, banking, selecting movies, social contact, political assembly – they potentially will grab and keep much more detailed portraits of private lives than have ever been made before. And wearable devices – ones that continuously monitor your medical condition, for example, or perhaps a cybersex suit that some journalists have avidly imagined – may construct the most up-close and intimate records. </li></ul>
  7. 7. Where there's data … <ul><li>Data mining technologies are pervasive in our society. They are designed to capture, aggregate, and analyze our digital footprints, such as purchases, Internet search strings, blogs, and travel patterns in an attempt to profile individuals for a variety of applications (Jason Millar, Problem for Predictive Data Mining, Lessons from Identity Trail, 2009) </li></ul>
  8. 8. Digital Trails: Their Value & Misuse <ul><li>The value of this data is unprecedented in the history of mankind. If you consider the sum of your online searching, mapping, communicating, blogging, news reading, shopping, and browsing, you should realize that you've revealed a very complete picture of yourself and placed it on the servers of a select few online companies. </li></ul><ul><li>The thin veneer of anonymity on the web is insufficient to protect you from revealing your identity. If you aren't even a little concerned, you should be. The value of this information is staggering and ripe for misuse. </li></ul>
  9. 9. Digital Traces and/or Trails: Informal Definitions <ul><li>Digital traces refer to the traces of activities and behaviors that people leave when they interact in digital environments ( </li></ul><ul><li>Digital trails refer to the associations or interconnections of these traces with other traces and with other sources of information </li></ul>
  10. 10. Digital Traces & Trails: Intention Unintentional Intentional Lifelogging Everyday Acts Interactions on Social Media/Networks
  11. 11. Lifelogging Steve Mann (the world’s first cyborg) – Cyborglogging ( Jennifer Ringley – Lifecasting thru the JenniCam (1996-2003) Mitch Maddox (aka DotComGuy) – 2002 Daniel P.W. Ellis Audio Lifelogging (2005-2007) Lisa Emily Batey– Lifecasting from Tokyo thru the (2007)
  12. 12. Lifelogging: MyLifeBits – Gordon Bell “I’m losing my mind… By the way so are you.” “Soon… you will have the capacity for Total Recall. You will be able to summon up everything you have ever seen, heard, or done. And you will be in total control, able to retrieve as much or as little as you want at any given time.”
  13. 13. Total Recall: Reminiscent of the Memex “ A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.” Vannevar Bush (1945)
  14. 14. Total Recall: SenseCam
  15. 15. MyLifeBits: The Research Project MyLifeBits store database Voice annotation tool Text annotation tool Telephone capture tool TV capture tool TV EPG download tool Radio capture & EPG PocketPC transfer tool PocketRadio player Import files MyLifeBits Shell files Legacy applications Browser tool Internet IM capture MAPI interface Legacy email client GPS import & Map display SenseCam Screen saver
  16. 16. Total Recall: Gordon Bell’s Trove
  17. 17. Total Recall: The 1 TB Life <ul><li>1TB gives you 65+ years of: </li></ul><ul><ul><li>100 email messages a day (5KB each) </li></ul></ul><ul><ul><li>100 web pages a day (50KB each) </li></ul></ul><ul><ul><li>5 scanned pages a day (100KB each) </li></ul></ul><ul><ul><li>1 book every 10 days (1 MB each) </li></ul></ul><ul><ul><li>10 photos per day (400 KB JPEG each) </li></ul></ul><ul><ul><li>8 hours per day of sound - e.g. telephone, voice annotations, and meeting recordings (8 Kb/s) </li></ul></ul><ul><ul><li>1 new music CD every 10 days (45 min each at 128 Kb/s) </li></ul></ul><ul><li>It will take you 5 years to fill up your 80 GB drive </li></ul><ul><li>Want video? Buy more cheap drives (1 TB/year lets you record 4 hours/day of 1.5 Mb/s video) </li></ul>
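The storage figures on slide 17 can be roughly checked with a few lines of arithmetic. The sketch below is not from the talk; it assumes decimal units (1 KB = 1,000 bytes, 1 Kb/s = 1,000 bits per second) and a 365-day year, neither of which the slide specifies, and the helper name `stream_bytes` is made up for illustration.

```python
# Rough check of the "1 TB life" arithmetic on slide 17.
# Assumption: decimal units (1 KB = 1,000 bytes); the slide does not say.
KB, MB, GB, TB = 1e3, 1e6, 1e9, 1e12


def stream_bytes(seconds, kilobits_per_second):
    """Bytes needed to store a stream of the given length and bit rate."""
    return seconds * kilobits_per_second * 1000 / 8


# Average bytes added per day for each item listed on the slide.
daily_bytes = {
    "email": 100 * 5 * KB,
    "web pages": 100 * 50 * KB,
    "scanned pages": 5 * 100 * KB,
    "books": 1 * MB / 10,                       # one 1 MB book every 10 days
    "photos": 10 * 400 * KB,
    "sound": stream_bytes(8 * 3600, 8),         # 8 hours at 8 Kb/s
    "music": stream_bytes(45 * 60, 128) / 10,   # one 45-minute CD every 10 days
}

total_per_day = sum(daily_bytes.values())
years_per_tb = TB / total_per_day / 365
years_per_80gb = 80 * GB / total_per_day / 365
video_bytes_per_year = stream_bytes(4 * 3600, 1500) * 365  # 4 hours/day at 1.5 Mb/s

print(f"{total_per_day / MB:.1f} MB/day")
print(f"1 TB lasts about {years_per_tb:.0f} years")
print(f"80 GB lasts about {years_per_80gb:.1f} years")
print(f"video needs about {video_bytes_per_year / TB:.2f} TB/year")
```

Under these assumptions the totals land in the same range as the slide's figures (a little over 60 years per terabyte, about 5 years for 80 GB, and close to 1 TB per year of video); binary units would push the first number somewhat higher.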
  18. 18. Total Recall: The Benefits <ul><li>Ability to recover particular events, names, faces, and words </li></ul><ul><ul><li>A log of your vital statistics and medical history </li></ul></ul><ul><ul><li>A digital memory of people you met, conversations you had, places you visited, and events you participated in. </li></ul></ul><ul><ul><li>A complete archive of your work and play, and your work habits. </li></ul></ul><ul><li>Ability to sort and sift through your digital memories to uncover patterns in your life </li></ul><ul><ul><li>Your life can be chronicled, condensed, cross-correlated, and plotted out for you in useful and illuminating ways </li></ul></ul><ul><ul><li>Something you could never have gleaned with your unaided brain </li></ul></ul>
  19. 19. Life Recorders <ul><li>“ Life Recorders May Be This Century’s Wrist Watch.”, Arrington, M. (Sept 6, 2009) </li></ul>
  20. 20. Digital Traces: How much is a free smartphone worth? In the fall of 2008, 100 undergraduate students living in Random Hall at M.I.T. agreed to swap their privacy for free smartphones for one year by participating in an MIT study aimed at understanding the impact of social interaction on social diffusion. When the participating students dialed other students, sent e-mails, or listened to songs, the researchers knew… Every moment the students had their Windows Mobile smartphones with them, the researchers knew where they were and who was nearby. [This translated into 350,000 hours of data, e.g. 65,000 phone calls, 25,000 SMS messages, 3.3 million scanned Bluetooth devices and 2.5 million scanned 802.11 WLAN APs]
  21. 21. Reality Mining: Human Dynamics Group
  22. 22. Reality Mining: Human Dynamics Group
  23. 23. Sociometric Badges: Modeling Workplace Interaction <ul><li>MIT researchers deployed their Sociometric badge platform for a period of one month (20 working days) at a Chicago-area data server configuration firm that consisted of 28 employees, with 23 participating in the study. Each employee was instructed to wear a Sociometric badge every day from the moment they arrived at work until they left their office. </li></ul><ul><li>Sociometric badges enable the researchers to track daily human activities, to extract speech features and non-linguistic signals in real time, to locate individuals in the workplace (within 1.5 meters), to detect other workers in close proximity, and to capture f2f interaction time. </li></ul><ul><li>In total, 1,900 hours of data were collected, with a median of 80 hours per employee. </li></ul>
  24. 24. Reality Mining: Results <ul><li>Mobile phone features can be used to accurately identify relationships between individuals and to predict the sharing of music within a social network. </li></ul><ul><li>In the workplace, complex problems are best solved with f2f interaction. </li></ul>
  25. 25. Social Media Landscape Unintentional Intentional Lifelogging Everyday Acts Interactions on Social Media/Networks
  26. 26. Universal McCann Survey of Social Media Usage, Attitudes & Interests: Wave 1 (9/06): 15 countries, 7,500 Internet users; Wave 2 (6/07): 21 countries, 10,000 Internet users; Wave 3 (3/08): 29 countries, 17,000 Internet users; Wave 4 (3/09): 38 countries, 23,000 Internet users <ul><li>Survey representative of the Active Internet Universe aged 16 to 54 (online at least every other day) </li></ul><ul><li>“Online applications, platforms and media, which aim to facilitate interaction, collaboration and the sharing of content” </li></ul>
  27. 27. Universal McCann Survey: All Social Media have grown over the 4 Waves
  28. 28. Universal McCann Survey: Social Networks Note: There are variations in the absolute numbers of users
  29. 29. Universal McCann Survey Trends <ul><li>Social networks continue to grow. Nearly two-thirds of active internet users have now joined a social network site, up from 57% in Wave 3. </li></ul><ul><li>Social networks are now a regular part of the online experience with 64.1% of active internet users spending time managing their profile. </li></ul><ul><li>Wave 4 reveals that social networks are becoming the dominant platform for content creation and content sharing. Users are starting to focus their digital life around the likes of Facebook, MySpace and Orkut. </li></ul>
  30. 30. Type of Profile Information (e.g. Facebook) <ul><li>Basic Information: </li></ul><ul><ul><li>Networks </li></ul></ul><ul><ul><li>Sex </li></ul></ul><ul><ul><li>Birthday </li></ul></ul><ul><ul><li>Hometown </li></ul></ul><ul><ul><li>Relationship Status </li></ul></ul><ul><ul><li>Looking for </li></ul></ul><ul><li>Education and Work: </li></ul><ul><ul><li>Grad School </li></ul></ul><ul><ul><li>College </li></ul></ul><ul><ul><li>High School </li></ul></ul><ul><ul><li>Employer (Name, Time Period, Location) </li></ul></ul><ul><ul><li>Friends </li></ul></ul><ul><li>Personal Information: </li></ul><ul><ul><li>Activities </li></ul></ul><ul><ul><li>Interests </li></ul></ul><ul><ul><li>Favorite Music </li></ul></ul><ul><ul><li>Favorite TV Shows </li></ul></ul><ul><ul><li>Favorite Movies </li></ul></ul><ul><ul><li>Favorite Books </li></ul></ul><ul><li>Contact Information: </li></ul><ul><ul><li>Email </li></ul></ul><ul><ul><li>Current City </li></ul></ul>
  31. 31. Information Revelation on Facebook (4000 CMU Students) Information Revelation and Privacy in Online Social Networks. ACM Workshop on Privacy in the Electronic Society 2005. Ralph Gross - Alessandro Acquisti.
  32. 32. Key Social Network Activities
  33. 33. 1% Rule (or something like that) It's an emerging rule of thumb that suggests that if you get a group of 100 people online then one will create content, 10 will "interact" with it (commenting or offering improvements) and the other 89 will just view it.
  34. 34. The Evidence of the 1% Rule <ul><li>YouTube -- each day there are 100 million downloads and 65,000 uploads - which is 1,538 downloads per upload - and 20m unique users per month. </li></ul><ul><ul><li>That puts the "creator to consumer" ratio at just 0.5%, but it's early days yet; not everyone has discovered YouTube (and it does make downloading much easier than uploading, because any web page can host a YouTube link). </li></ul></ul><ul><li>Wikipedia -- 50% of all article edits are done by 0.7% of users, and more than 70% of all articles have been written by just 1.8% of all users. </li></ul><ul><li>Yahoo Groups discussion lists -- "1% of the user population might start a group; 10% of the user population might participate actively, and actually author content, whether starting a thread or responding to a thread-in-progress; 100% of the user population benefits from the activities of the above groups," he noted on his blog ( ) in February. </li></ul>
  35. 35. Day in the life … Recording without even trying We log onto computers at school and work, use our debit cards to buy lunch, scan our membership cards at the gym; the list goes on and on.  With each of these everyday acts we leave a digital bread crumb that enables others to track our movements. But how often do we stop and wonder, who is following these virtual trails? Unintentional Intentional Lifelogging Everyday Acts Interactions on Social Media/Networks
  36. 36. Day in the life …
  37. 37. What’s in the Trail? <ul><li>Type of trail </li></ul><ul><li>Time of trail </li></ul><ul><li>Initiated from (location) </li></ul><ul><li>Collected by </li></ul><ul><li>Data captured </li></ul><ul><li>Where stored </li></ul><ul><li>Accessible by (w/o sale) </li></ul><ul><li>Sold to </li></ul><ul><li>Privacy constraints </li></ul><ul><li>Government access </li></ul>
  38. 38. Sample Trail – Internet Activity
  39. 39. Sample Trail – Cell Phone
  40. 40. Web Trails: From IT to Marketing Hits Pages Visits Visitors (
  41. 41. Web Trails: Any Guesses <ul><li>June 30, 1998 </li></ul><ul><li>Lou Montulli </li></ul><ul><li>Netscape Communications Corp (US) </li></ul><ul><li>Persistent client state in a hypertext transfer protocol based client-server system </li></ul><ul><li>Answer – Cookie (aka tracking cookie, browser cookie, HTTP cookie) </li></ul>U.S. Patent No. 5,774,670
  42. 42. Web Trails: Cookies 1 2 3 <ul><li>Text stored on a user's computer by a web browser. </li></ul><ul><li>A cookie consists of one or more name-value pairs (e.g. user preferences, shopping cart contents, session identifier…) </li></ul><ul><li>Sent as an HTTP header by a web server to a web browser and then sent back unchanged by the browser each time it accesses that server. </li></ul>
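The round trip described on slide 42 (server sets name-value pairs, browser sends them back unchanged) can be sketched with Python's standard `http.cookies` module. This is only an illustration: the cookie names and values (`session_id`, `theme`) are invented, and no real browser or server is involved.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie response header.
# The name "session_id" and its value are made up for this sketch.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["path"] = "/"
set_cookie_header = outgoing["session_id"].OutputString()

# Browser side: on each later request to the same server, the stored
# name-value pairs come back unchanged in a Cookie request header.
cookie_header = "session_id=abc123; theme=dark"

# Server side again: parse the returned header into name-value pairs.
incoming = SimpleCookie()
incoming.load(cookie_header)
pairs = {name: morsel.value for name, morsel in incoming.items()}

print(set_cookie_header)
print(pairs)
```

Because the same identifier returns with every request, the server can link otherwise unrelated page views into a single trail.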
  43. 43. Web Trails: Tracking across Multiple Sites Source: Croll & Power. Complete Web Monitoring. O’Reilly (2009)
  44. 44. Web Trails: What’s in a name <ul><li>Web Bug, Web Beacon, Tracking Bug, Tracking Pixel, Pixel Tag, 1×1 GIF, Clear GIF, Transparent GIF </li></ul><ul><li>This is what the user sees: </li></ul>Pixel is here
  45. 45. What’s in a name <ul><li>Examples </li></ul><ul><ul><li><img src=&quot;; width=1 height=1 border=0> </li></ul></ul><ul><ul><li><img width=1 height=1 border=0 src=&quot; &db_afcr=4B31-C2FB-10E2C&event=reghome&group=register& time=1999. 6.37&quot;> </li></ul></ul><ul><li>What information can be tracked? Some examples </li></ul><ul><ul><li>The IP address of the computer that fetched the Web Bug </li></ul></ul><ul><ul><li>The URL of the page that the Web Bug is located on </li></ul></ul><ul><ul><li>The URL of the Web Bug image </li></ul></ul><ul><ul><li>The time the Web Bug was viewed </li></ul></ul><ul><ul><li>The type of browser that fetched the Web Bug image </li></ul></ul><ul><ul><li>A previously set cookie value </li></ul></ul>
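To make the list on slide 45 concrete, here is a minimal sketch of what a server could record when a browser fetches a 1×1 tracking image. It is a hypothetical illustration, not any vendor's implementation: the function name `log_web_bug_hit`, the field names, the IP address, and the example URLs are all assumptions.

```python
from datetime import datetime, timezone
from http.cookies import SimpleCookie


def log_web_bug_hit(client_ip, path, headers, now=None):
    """Collect what a server can learn from one request for a 1x1 image."""
    cookies = SimpleCookie()
    cookies.load(headers.get("Cookie", ""))
    return {
        "ip": client_ip,                        # address that fetched the bug
        "page_url": headers.get("Referer"),     # page the bug is embedded in
        "bug_url": path,                        # URL of the image itself
        "time": (now or datetime.now(timezone.utc)).isoformat(),
        "browser": headers.get("User-Agent"),   # type of browser
        "cookies": {k: m.value for k, m in cookies.items()},  # earlier cookie
    }


# Hypothetical request; every value below is invented for the example.
hit = log_web_bug_hit(
    "203.0.113.7",
    "/pixel.gif?event=reghome",
    {
        "Referer": "http://example.com/register",
        "User-Agent": "ExampleBrowser/1.0",
        "Cookie": "uid=42",
    },
    now=datetime(2010, 1, 5, tzinfo=timezone.utc),
)
print(hit)
```

Each key in the returned record corresponds to one of the six bullet points on the slide.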
  46. 46. Web Trails: Page Tag (JavaScript) Source: Croll & Power. Complete Web Monitoring. O’Reilly (2009)
  47. 47. Web Trails: Ad Clicks & Analysis Source: Croll & Power. Complete Web Monitoring. O’Reilly (2009)
  48. 48. Everyday Activities: Magnitude <ul><li>When it comes to producing data, we’re prolific. Those of us wielding cell phones, laptops, and credit cards fatten our digital dossiers every day, simply by living… </li></ul><ul><li>In a single month, Yahoo alone gathers 110 billion pieces of data about its customers… Each person visiting sites in Yahoo’s network of advertisers leaves behind, on average, a trail of 2,520 clues. </li></ul>
  49. 49. Everyday Activities: Magnitude <ul><li>In a given year a conservative estimate of twenty digital transactions a day means that more than 7,000 transactions become associated with a particular individual – upwards of a half million in a lifetime (Jason Millar, Problem for Predictive Data Mining, Lessons from Identity Trail, 2009). </li></ul>
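The estimate on slide 49 is simple multiplication, sketched below. The 70-year lifetime is an assumption chosen for illustration; the slide gives only the daily rate and the rounded totals.

```python
# Back-of-the-envelope check of the transaction estimate on slide 49.
TRANSACTIONS_PER_DAY = 20   # the slide's "conservative estimate"
LIFETIME_YEARS = 70         # assumption; the slide does not state a lifespan

per_year = TRANSACTIONS_PER_DAY * 365
per_lifetime = per_year * LIFETIME_YEARS

print(per_year)      # more than 7,000 a year
print(per_lifetime)  # upwards of half a million in a lifetime
```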
  50. 50. Creating Digital Trails: The Internet of Things Traffic Cameras, Electronic Tolls, Traffic Cameras, Transit Cards, Passports, Security Badges, Time Clocks, Payment Cards, Loyalty Cards, Membership Cards, Digital Cameras, Video Recorders, Voice Recorders, Health Recorders, Sleep Recorders, Wireless Scales, Mobile Phones, GPS Trainers, GPS Devices, Photo Geotagging, WPS Devices, RFID Tags, Event RFID Tags, Ubisense Sensors
  51. 51. Digital Trails: Location-Based Systems Networked (e.g. mobile phone triangulation), Handset (e.g. GPS trilateration), Hybrid (e.g. A-GPS or XPS)
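As a rough illustration of the trilateration mentioned on slide 51, the sketch below recovers a 2D position from distances to three known points. It is a deliberate simplification: real GPS works in three dimensions and must also solve for receiver clock error, and the function name and sample coordinates are assumptions made for the example.

```python
import math


def trilaterate(anchors, distances):
    """Estimate (x, y) from distances to three known 2D anchor points."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    # Subtracting circle 1 from circles 2 and 3 removes the squared unknowns
    # and leaves two linear equations of the form a*x + b*y = c.
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is ambiguous")
    # Solve the 2x2 system with Cramer's rule.
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


# Made-up example: three anchors and exact distances to the point (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_point = (3.0, 4.0)
distances = [math.dist(true_point, a) for a in anchors]
estimate = trilaterate(anchors, distances)
print(estimate)
```

With noisy distance measurements the three circles no longer meet at one point, so practical systems use more anchors and a least-squares fit instead of an exact solve.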
  52. 52. Digital Trail: (and
  53. 53. Digital Trail:
  54. 54. Creating Digital Trails: RFID
  55. 55. Creating Digital Trails: RFID Critics <ul><li>Tracking and Identifying: </li></ul><ul><li>Vehicles and Commuters </li></ul><ul><li>Animals </li></ul><ul><li>Product Inventory </li></ul><ul><li>People </li></ul>
  56. 56. Creating Digital Trails: Some Examples RFID Tag Sensor Ubisense.Net
  57. 57. What can you do with location data?
  58. 58. Pop Quiz

Editor's Notes

  • The Web became mainstream enough for marketers to care: 38 million Internet users in 1994, but roughly 1.5 billion by January 2009, a 40-fold increase. Analytics became visitor-centric: logging had to move from individual requests for pages to user visits. Analysts devised ways to segment visitors so they could decide which browsers, campaigns, promotions, countries, or referring sites were producing the best business results, and optimize their websites accordingly.
  • Used with most hosted Analytics Services (e.g. Google Analytics)
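The move from individual page requests to visits and visitors, described in the first note above, can be sketched as a small grouping step. The 30-minute inactivity timeout is a common convention assumed here, not something stated in the notes, and the function name and sample data are made up.

```python
from collections import defaultdict

VISIT_TIMEOUT = 30 * 60  # seconds of inactivity that end a visit (assumption)


def summarize(hits):
    """Roll (visitor_id, unix_timestamp) page requests up into counts."""
    by_visitor = defaultdict(list)
    for visitor_id, timestamp in hits:
        by_visitor[visitor_id].append(timestamp)

    visits = 0
    for times in by_visitor.values():
        times.sort()
        visits += 1  # the first request always opens a visit
        for previous, current in zip(times, times[1:]):
            if current - previous > VISIT_TIMEOUT:
                visits += 1  # a long gap starts a new visit
    return {
        "pages": sum(len(times) for times in by_visitor.values()),
        "visits": visits,
        "visitors": len(by_visitor),
    }


# Made-up log: visitor "a" returns after a long gap, visitor "b" views one page.
sample = [("a", 0), ("a", 600), ("a", 4000), ("b", 100)]
stats = summarize(sample)
print(stats)
```

In practice the visitor identifier would usually come from a cookie, which is why cookie-based tracking and visitor-centric analytics developed together.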