12 gordon bell



Presentation by Gordon Bell, Principal Researcher at Microsoft Research Silicon Valley Laboratory, titled "Where's all that data? What's it good for?". It was presented at our Fujitsu North America Technology Forum 2012, held in Santa Clara, CA on Jan. 25th, 2012. The theme of the event was "From Sensor Networks to Human Networks: Turning Big Data into Actionable Wisdom".

Published in: Technology

  1. Where’s all that data? What’s it good for?
     Gordon Bell, Microsoft Research Silicon Valley Laboratory
     Fujitsu 5th Technology Forum: From Sensor Networks to Human Networks: Turning Big Data into Actionable Wisdom
     25 January 2012
  2. Where do you get all those bits? Some stories…
     • World’s commercial transactions
     • The Cloud
     • Personal lives from recording everything (MyLifeBits)
       – Individuals
       – Social sites
       – Libraries, e.g. Mormon Library for preserving member archives
     • Fourth Paradigm of Science based on data
       – Our world is being instrumented for observing everything
     • Monitoring the earth and water for energy, food, and pleasure
  3. Courtesy of Barabba (or Steve Haeckel, IBM) or someone else
  4. [Figure labels: Commercial Transactions; People & All Their Bits; Science & 4th Paradigm; Real time, Real World Sense & Effect]
     Courtesy of Gordon Bell, Barabba, Steve Haeckel, IBM and probably someone else
  5. Lifelogging
     With extreme lifelogging, all of us will have the ability to recall or have recalled everything we’ve ever said, seen, and done… just like today’s political candidates.
     Are people basically narcissistic?
  6. My five lifelogging epiphanies
     1. It’s capture and digitization (1998)
     2. It’s organization and recall (2001)
     3. It’s a transaction processor for everything in and about your life (2005)
     4. It’s your true e-memory (2007). Bio-memory is just the metadata and URL for e-memory
     5. Your e-memory is everywhere and beyond your control (2011)
  7. The challenge now
     With extreme lifelogging, all of us will have the ability to recall or have recalled everything we’ve ever said, seen, and done… just like today’s political candidates.
     The challenge: collecting the bits from the individuals.
     • Where are the bits?
     • Can they be recalled?
     • Who owns them?
     • How much does it cost to store forever?
  8. Let’s look at individuals & their e-memories
  9. I’m losing my mind
  10. THE ULTIMATE DIARY: WHAT IF YOU COULD REMEMBER EVERYTHING?
      “Dalgliesh, he knew, had almost total recall.” (P.D. James, Death of an Expert Witness)
  11. Everything you’ve ever read
  12. Everything you’ve ever seen
  13. Everything you’ve ever heard
  14. And much more… your very state
      • Location, bio, temperature, light level… sensors galore
      • Your heart beat, blood pressure, stress level, etc.
  15. As little or as much as you like. Certainly much more than ever before.
      IF YOU WANT, YOU CAN HAVE TOTAL RECALL
  17. Storage: cheap and abundant
      [Chart: capacity in gigabytes (GB), axis marked 1 to 10,000, by year from 2000 to 2015; series: PC (3.5"), Notebook (2.5"), PDA (2"), Cellphone (Flash)]
  18. Recall: search, analyze & present
  19. Next 10 years will see revolution of life and society
      TOTAL RECALL IS INEVITABLE
      A TOOL OR THE ULTIMATE DIARY?
  20. We think life-blogging is nuts!
      NOT LIFE-BLOGGING
  21. Personal and private
      LIFE-LOGGING, NOT LIFE-BLOGGING
  22. Recording/Sensors
  23. Why just paper?
  24. Memex
      As We May Think, Vannevar Bush, 1945
      “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”
      • Full-text search, text & audio annotations, and hyperlinks
  25. Recording Everything
  26. Bits per person… A One Terabyte, Low Resolution Life
      • 2000: VL res life can be stored in a TB (GB/month)
      • 2005: MyLifeBits captured about 1 GB per month…
        – Very little audio, video, lower resolution photos
        – Web pages, photos, and video take up the space
      • 2010: 10-20 Terabytes is more realistic
        – SenseCam: 3 GB/month at 3 samples/minute
        – Audio: 17 GB/month
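
The capture rates on this slide can be checked with simple arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TB = 1000 GB) and constant monthly rates, neither of which the slide states, and the function names are made up for this example.

```python
# Back-of-the-envelope check of the slide's storage figures.
# Assumptions (not from the slide): 1 TB = 1000 GB, constant capture rates.

GB_PER_TB = 1000


def years_to_fill(capacity_tb: float, gb_per_month: float) -> float:
    """Years of capture that fit in capacity_tb at a steady monthly rate."""
    return capacity_tb * GB_PER_TB / gb_per_month / 12


def lifetime_tb(gb_per_month: float, years: float) -> float:
    """Terabytes accumulated over the given number of years."""
    return gb_per_month * 12 * years / GB_PER_TB


# Low-resolution life: about 1 GB/month (the 2005 MyLifeBits rate).
low_res_years = years_to_fill(1, 1.0)

# Higher-resolution life: SenseCam (3 GB/month) plus audio (17 GB/month).
sensecam_plus_audio = 3 + 17
high_res_tb = lifetime_tb(sensecam_plus_audio, low_res_years)

print(f"1 TB lasts about {low_res_years:.1f} years at 1 GB/month")
print(f"The same span at {sensecam_plus_audio} GB/month needs about {high_res_tb:.0f} TB")
```

At 1 GB per month a terabyte covers roughly a lifetime, and adding SenseCam and audio capture over the same span lands at the top of the slide's 10-20 TB range.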
  27. Special Persons Archives
      • Charles Vest: former President of MIT
      • Einstein
      • National Library of Medicine: Lederberg
      • Salman Rushdie
      • LDS Church (Mormon)
  28. Public 21st century figure legacies
      • Charles Vest, president of MIT from 1990 to 2004, delivered a hard drive with nearly all of the files of his 14-year tenure to the MIT Archivist. It included speeches and letters (drafts), presentations, planning documents, meeting minutes, e-mails and a few photos. The only items Vest had deleted were a few files about his personal finances.
      • Nothing had been scanned, so there was no incoming material such as letters, web page views, or articles, unless they were attachments.
  29. www.alberteinstein.info
  30. www.alberteinstein.info
  31. National Library of Medicine
      • Top 30: 265 GB; 181K files; 1.5 MB/file
        – 99.99% TIFF images
        – less than .01% plain text
        – less than .01% HTML files
        – less than .001% AVI video
      • Web derivative files: 36 GB; 75K files; 0.5 MB/file
      • Collections:
        Items     Pages     Video   Who
        18,615    49,951    8       Lederberg
        1,738     37,110    25      RMP
        1,054     10,811    5       Koop
        580       4,374     1       Avery
        469       1,833     -       Crick
        279       1,160     1       Pauling
        302       893       -       Varmus
        …
        27.5 K    143 K
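
The per-file averages quoted for the National Library of Medicine follow from dividing total size by file count. A minimal check, assuming decimal units (1 GB = 1000 MB) and reading "181K" and "75K" as 181,000 and 75,000 files; the helper name is hypothetical:

```python
# Check the slide's MB/file averages.
# Assumptions (not stated on the slide): 1 GB = 1000 MB, "K" = 1000 files.

MB_PER_GB = 1000


def mean_file_size_mb(total_gb: float, file_count: int) -> float:
    """Average file size in MB for a collection of the given total size."""
    return total_gb * MB_PER_GB / file_count


top_30_mb = mean_file_size_mb(265, 181_000)      # "Top 30" collections
derivative_mb = mean_file_size_mb(36, 75_000)    # web derivative files

print(f"Top 30: {top_30_mb:.2f} MB/file")
print(f"Web derivatives: {derivative_mb:.2f} MB/file")
```

Both results round to the slide's figures of 1.5 and 0.5 MB/file.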
  32. Lederberg Finder page
  33. Lederberg papers official reports Number of document segments
  34. Emory University: Rushdie Archives
      Snooping Through Salman Rushdie's Computer: http://www.youtube.com/watch?v=pBtFNpgzlsg
      • 200 cardboard boxes
      • Four emulated Macs; 18 Gbytes; 40,000 files
  35. SenseCam: Recording everything seen
  36. GB with SenseCam & Voice Recorder. Camera now available: Viconreview.com
  37. Capturing every step
  38. The “killer app”… Health!
  39. Capturing every heartbeat
      • 72.6 beats/min; 38.16 million beats/year
      • 3.13 billion beats per life
      • Battery life: the expected time to next surgery!
        – St. Jude battery was 4-4.5 years, or ETS
        – Medtronic current, 8 years
  40. Audiometric Test 050117
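
The heartbeat totals on this slide are consistent with each other, which a short calculation shows. This sketch assumes a 365-day year and a constant heart rate; the lifespan it derives is implied by the slide's numbers, not stated on it.

```python
# Reproduce the slide's heartbeat arithmetic.
# Assumptions (not from the slide): 365-day year, constant heart rate.

MINUTES_PER_YEAR = 60 * 24 * 365


def beats_per_year(beats_per_minute: float) -> float:
    """Heartbeats in one year at a steady rate."""
    return beats_per_minute * MINUTES_PER_YEAR


def implied_lifespan_years(total_beats: float, beats_per_minute: float) -> float:
    """Years needed to reach total_beats at a steady rate."""
    return total_beats / beats_per_year(beats_per_minute)


yearly = beats_per_year(72.6)                      # about 38.16 million
lifespan = implied_lifespan_years(3.13e9, 72.6)    # about 82 years
battery_beats = 8 * yearly                         # beats over an 8-year battery

print(f"{yearly / 1e6:.2f} million beats/year")
print(f"3.13 billion beats implies about {lifespan:.0f} years")
print(f"An 8-year battery spans about {battery_beats / 1e6:.0f} million beats")
```

So "3.13 billion beats per life" corresponds to a lifespan of roughly 82 years at 72.6 beats per minute.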
  41. Sensors with IP On Everything
  42. HR, weight, BP, distance,
  43. Health monitoring devices
  44. In-body health sensing: PillCam; nanobot in the bloodstream; EndoSure Wireless Pressure Sensor in an aneurysm sac
  45. Navigenics Report 2006-03
  46. Health Monitoring: “Your husband just died… here’s his black box”
  47. [Chart: scale from 0 to 100; categories listed below]
      • Work: email, IM, social sites
      • Work: Who & When
      • Work: Legacy documents
      • Work: Web pages
      • >Work: T&M (VIBE)
      • >Work: Meetings
      • >Work: Telephone…
      • Home: Finance, Legal
      • Learning: Books, journals, etc.
      • Health: PHR
      • >>Health: Diet & Exercise
      • Health: On & in-body metrics
      • Life: Music (CDs, cassettes,…
      • Life: Photos
      • Life: Memorabilia, ephemera
      • >Life: Tracked Days
      • Life: Video Productions
      • Life: SenseCam Days
  48. Bits per person… A One Terabyte, Low Resolution Life
      • 2000: VL res life can be stored in a TB (GB/month)
      • 2005: MyLifeBits captured about 1 GB per month…
        – Very little audio, video, lower resolution photos
        – Web pages, photos, and video take up the space
      • 2010: 10-20 Terabytes is more realistic
        – SenseCam: 3 GB/month at 3 samples/minute
        – Audio: 17 GB/month
  49. A View of Preserving Digital Lives
      • Preserving the analog life of a 20th century person: 10-100 GB. 2 Mpgs, 100K images, 100-1,000 hrs. video
        – Won’t analog people need to be converted to digital… if not, they’re really gone and forgotten history?
        – Gresham’s Law: digital lives drive out analog lives
      • How will a 21st century, digital person be preserved?
        – Which “lives” of a person, e.g. personal, professional?
        – Depth of each life?
        – Size. Who’s in a library’s digital lifeboat?
      • Preserving everybody?
        – Role of public institutions vs. the cloud for “all of us”
  50. Fire in the Library, Technology Review, January 2012
  51. How far do we trust our institutions to save lives?
      • Re a comment on NPR in late January (http://www.npr.org/templates/story/story.php?storyId=99372779) about people saving recordings of early pre-bluegrass American folk music:
      • "He considered giving his collection to the Library of Congress, … Alden says he worried that they'd be hard for musicians … to access, and that they'd gather dust lying …, what librarian … would let someone into the stacks with a banjo or a fiddle …?"
      • … they're burning CDs and shipping them all over, which is the "lots of copies keeps stuff safe" philosophy (www.lockss.com). They haven't taken the next step and put them online, and anyway don't have a virtual place to put them that has a good chance of surviving and caring for them in perpetuity.
  52. Scientific Data Deluge
      • CERN detectors
      • Radio telescopes
      • New telescopes and observatories
      • Gene sequencers
      • Global weather sensors
      • Earth science sensors
  53. Science Paradigms
      1. Thousand years ago: science was empirical, describing natural phenomena
      2. Last few hundred years: theoretical branch using models, generalizations: $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$
      3. Last few decades (FORTRAN): a computational branch simulating complex phenomena
      4. Today: data-intensive science, data exploration (eScience); unify theory, experiment, and simulation
         – Data captured by instruments or generated by simulation
         – Processed by software
         – Information/knowledge stored in computer
         – Scientist analyzes database/files using data management and statistics
      Jim Gray, NRC-CSTB, 2007-01
  54. • Make sure the scientists have a data problem; otherwise they won’t take the time to talk with you
      • Define 20 questions/plots: this drives the technical design, but also helps cross-discipline communication
      • Spread the 20 questions/plots across “easy”, “tricky”, “too hard to do now”
      • Ask about sharing and security and get to shared pragmatic consensus
      • Don’t forget to write the papers on both sides; they help drive adoption
      Courtesy Catharine van Ingen
  55. Synthesizing Imagery, Sensors, Models and Field Data
      • Climate classification: ~1 MB (1 file)
      • FLUXNET curated sensor dataset: 30 GB (960 files)
      • Vegetative clumping: ~5 MB (1 file)
      • NASA MODIS imagery archives: 5 TB (600K files)
      • FLUXNET curated field dataset: 2 KB (1 file)
      • NCEP/NCAR: ~100 MB (4K files)
      Sizes given are for 1 US year; 20 US years ~ 1 global land surface year
  56. [Figure labels: Global Scale; Archive; Continental US; Reprojection; Reduction; Download]
  57. By the numbers…
      • 22 months
      • 2 CS interns; 1 architect; 1 science intern; 1 senior scientist; 3 hangers-on
      • 522 K cpu hours
      • 14 TB upload
      • 10 TB max storage
      • 5 TB download
      • 2.3 B storage operations
      • 1.3 M re-projected tiles
      • 25 M reduction files
      • (TBD) VM scaleup/scaledown operations
      • (TBD) lines of (non-MatLab) code
      • $79K external billing
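
A few derived ratios help put these totals in perspective. The sketch below is only illustrative: it assumes 730 hours per month (8,760 / 12), and the cost ratio is an upper bound because it treats the whole $79K bill as compute, even though the bill presumably also covers storage and transfer. None of these derived figures appear on the slide.

```python
# Rough ratios derived from the slide's project totals.
# Assumptions (not from the slide): 730 hours per month; the full external
# bill is attributed to compute, which overstates the per-hour cost.

HOURS_PER_MONTH = 8760 / 12

cpu_hours = 522_000
project_months = 22
external_billing_usd = 79_000
reprojected_tiles = 1_300_000
reduction_files = 25_000_000

# Average number of cores busy if the work were spread evenly over the project.
avg_busy_cores = cpu_hours / (project_months * HOURS_PER_MONTH)

# Upper bound on cost per CPU hour.
usd_per_cpu_hour = external_billing_usd / cpu_hours

# Reduction files produced per re-projected tile.
files_per_tile = reduction_files / reprojected_tiles

print(f"Average busy cores: {avg_busy_cores:.1f}")
print(f"Cost per CPU hour (upper bound): ${usd_per_cpu_hour:.2f}")
print(f"Reduction files per tile: {files_per_tile:.1f}")
```

In practice the load was probably bursty rather than even, so the average core count says little about peak scale.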
  58. The South Esk Hydrological Sensor Web: Next-Generation Catchment Management
      Water for a Healthy Country
      Andrew Terhorst, Tasmanian ICT Centre
      (Hobart WSM real time) award winner, 9 September 2011
  59. The sustainability challenge…
      • Australia is the driest inhabited continent
      • River flows can be extremely fickle/unreliable
      • Sustainable management of freshwater resources requires good situation awareness
      [Diagram: "Requires good situation awareness" at the centre, surrounded by: flood early warning; water regulations; hydro-power generation; reservoir management; water quality; water trading; environmental flows; irrigation planning]
      2011 iAwards - Sustainability and Green IT
  60. South Esk River, Tasmania
      • Catchment receives variable rainfall; river flows are very erratic
      • Water resource managers require better situation awareness for managing water restrictions
      • Sustainability goal is to maximise water harvesting opportunities without compromising environmental flows
  61. Hydro-meteorological sensor network
  62. Integrating sensor data from multiple agencies
  63. Project goal
      Develop a prototype water information system made up of two linked sub-systems:
      • Continuous flow forecast system, based on emerging Sensor Web standards
      • Provenance management system, which provides information on how flow forecasts are produced
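
To make the idea of two linked sub-systems concrete, here is a minimal sketch of a forecast that carries its own provenance record. Everything in it is hypothetical: the class names, the toy linear model, and the units are invented for illustration, and it does not use any Sensor Web standard or reflect how the South Esk system was actually built.

```python
# Illustrative only: a flow forecast that records how it was produced.
# All names and the toy model are assumptions, not the project's design.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Observation:
    sensor_id: str
    agency: str
    rainfall_mm: float


@dataclass
class ProvenanceRecord:
    model_name: str
    model_version: str
    input_sensor_ids: List[str]
    agencies: List[str]
    produced_at: datetime


@dataclass
class FlowForecast:
    flow_m3s: float
    provenance: ProvenanceRecord


def forecast_flow(observations: List[Observation],
                  runoff_coeff: float = 0.5) -> FlowForecast:
    """Toy model: flow is proportional to mean rainfall (not real hydrology)."""
    if not observations:
        raise ValueError("at least one observation is required")
    mean_rain = sum(o.rainfall_mm for o in observations) / len(observations)
    provenance = ProvenanceRecord(
        model_name="toy-linear-runoff",
        model_version="0.1",
        input_sensor_ids=[o.sensor_id for o in observations],
        agencies=sorted({o.agency for o in observations}),
        produced_at=datetime.now(timezone.utc),
    )
    return FlowForecast(flow_m3s=runoff_coeff * mean_rain, provenance=provenance)


observations = [
    Observation("gauge-1", "agency-a", 10.0),
    Observation("gauge-2", "agency-b", 20.0),
]
forecast = forecast_flow(observations)
print(forecast.flow_m3s, forecast.provenance.agencies)
```

Attaching the record to the forecast, rather than logging it separately, means anyone consuming the forecast can see which sensors, agencies, and model version produced it.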
  64. Current practice
      [Diagram: Application Layer (Numeric Models, Decision Support Tools) on top of Sensor Layer (Physical Sensors, Observation Archives)]
  65. Paradigm shift
      [Diagram: Application Layer (Numeric Models, Semantic Broker, Decision Support Tools); Sensor Web Services Layer; Sensor Layer (Physical Sensors, Observation Archives)]
  66. Architectural framework
      [Diagram labels: Network Management and Provenance; Scientific workflow; Sensor data feeds; Atmospheric models; Flow forecast models; Clients]
  68. Key system features
      [Diagram; recoverable labels listed below]
      • Groups: Unique Architecture; Quality; Value Proposition; Generic applications
      • Interoperable; provenance management; highly scalable; re-locatable; redundancy; open; standards-based; reusable software components
      • First hydrological sensor web built in Australia
      • Rapid integration of sensor assets
      • Uses near real-time data feeds from multiple agencies
      • Improved understanding of natural system behaviour
      • Published research articles
      • Included in the Global Earth Observing System of Systems implementation pilot
      • Described as next-generation water information system in ITU technology briefing
      • Enables sustainable management of scarce water resources
      • Serves regulators and community
      • Provides economic benefit to irrigators
      • Serve other purposes, e.g. flood warning, fire-danger risk assessment
  69. The end