MyLifeBits Jim Gemmell February, 2005
Conclusion
  • We have entered an era of virtually unlimited storage, enabling the lifetime store
  • To make the store useful we need annotation, typed links, and database features
  • More capture, more correlation – less work by the user
Collaborators
  • Chief inspiration & guinea pig: Gordon Bell
  • Software development lead: Roger Lueder
  • MSR Collaborators: Lyndsay Williams, Ken Wood, Kentaro Toyama, Ron Logan, Steve Drucker, Curtis Wong, Mary Czerwinski, Brian Meyers
  • Interns: Josh Blumenstock, Evan Salomon, Aleks Aris
Outline
  • What is MyLifeBits
  • History/Motivation
  • MyLifeBits system outline
  • Demo
  • Future work
MyLifeBits is:
  • An experiment in lifetime storage
      • Digitizing Gordon Bell’s past
      • Capturing more of his future
  • A software system
      • Capture
      • Storage & retrieval
      • Organization & annotation
  • Minimum requirement: fulfill Vannevar Bush’s 1945 “Memex” vision
Memex
  • As We May Think, Vannevar Bush, 1945
  • “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”
  • Full-text search, text & audio annotations, and hyperlinks
I am data
The guinea pig
  • Has now scanned virtually all:
      • Books written (and read when possible)
      • Personal documents (correspondence including memos and email, bills, legal documents, papers written, …)
      • Photos
      • Posters, paintings, photos of things (artifacts, …medals, plaques)
      • Home movies and videos
      • CD collection
      • And, of course, all PC files
  • Now recording: phone, radio, TV (movies), web pages… conversations and meetings to come
  • Paperless throughout 2002. 12” scanned, 12’ discarded.
  • Only 44 GB, incl. 10 wma, 14 SQL!!!
  • Video: o(100) + 500 mov
The 1 TB Life
  • 1 TB gives you 65+ years of:
      • 100 email messages a day (5 KB each)
      • 100 web pages a day (50 KB each)
      • 5 scanned pages a day (100 KB each)
      • 1 book every 10 days (1 MB each)
      • 10 photos per day (400 KB JPEG each)
      • 8 hours per day of sound, e.g. telephone, voice annotations, and meeting recordings (8 Kb/s)
      • 1 new music CD every 10 days (45 min each at 128 Kb/s)
  • It will take you 5 years to fill up your 80 GB drive
  • Want video? Buy more cheap drives (1 TB/year lets you record 4 hours/day of 1.5 Mb/s video)
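The arithmetic behind this slide can be checked with a short script. This is only a sketch: the slide does not say which units it uses, so binary units (1 KB = 1024 bytes, 1 TB = 2**40 bytes) and 365-day years are assumptions, and the names `stream_bytes` and `daily_bytes` are invented for illustration.

```python
# Back-of-the-envelope check of the "1 TB life" budget.
# Assumptions (not stated on the slide): binary units and 365-day years.
KB, MB, GB, TB = 1024, 1024**2, 1024**3, 1024**4


def stream_bytes(hours, kbit_per_s):
    """Bytes produced by a constant-bitrate stream lasting `hours`."""
    return hours * 3600 * kbit_per_s * 1024 / 8


# Average bytes captured per day for each item on the slide.
daily_bytes = {
    "email (100 x 5 KB)": 100 * 5 * KB,
    "web pages (100 x 50 KB)": 100 * 50 * KB,
    "scanned pages (5 x 100 KB)": 5 * 100 * KB,
    "books (1 MB every 10 days)": 1 * MB / 10,
    "photos (10 x 400 KB)": 10 * 400 * KB,
    "sound (8 h at 8 Kb/s)": stream_bytes(8, 8),
    "music CD (45 min at 128 Kb/s, every 10 days)": stream_bytes(0.75, 128) / 10,
}

total_per_day = sum(daily_bytes.values())
years_per_tb = TB / total_per_day / 365
years_per_80gb = 80 * GB / total_per_day / 365
# Separate question: how much does 4 h/day of 1.5 Mb/s video cost per year?
video_tb_per_year = stream_bytes(4, 1.5 * 1024) * 365 / TB

for name, size in daily_bytes.items():
    print(f"{name:46s} {size / MB:6.2f} MB/day")
print(f"total: {total_per_day / MB:.1f} MB/day")
print(f"1 TB lasts about {years_per_tb:.0f} years")
print(f"80 GB lasts about {years_per_80gb:.1f} years")
print(f"4 h/day of 1.5 Mb/s video: {video_tb_per_year:.2f} TB/year")
```

Under these assumptions the totals land close to the slide's figures: a bit under 70 years per terabyte, a little over 5 years for an 80 GB drive, and just under 1 TB per year for the video case. With decimal units the lifetime figure comes out a few years lower.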
Trying to fill a terabyte in a year
Gordon’s lifetime collection < 30 GB (12 GB is music CDs)

  Item                   Per TB         Per day
  Photo (400 KB JPEG)    2.7M photos    7.3K photos
  1 MB document          1.0M docs      2.9K docs
  128 kb/s audio         18.6K hours    51 hours
  256 kb/s video         9.3K hours     26 hours
  1.5 Mb/s video         1.6K hours     4 hours
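The table entries can be reproduced with a few lines of arithmetic. The sketch below assumes binary units (1 TB = 2**40 bytes, 1 kb/s = 1024 bits per second) and a 365-day year; the slide does not state its units, and the helper names are made up for this example.

```python
# How much of each item fits in 1 TB, and how much per day fills it in a year.
# Assumptions (not stated on the slide): binary units, 365-day year.
TB = 1024**4
DAYS_PER_YEAR = 365


def items_per_tb(item_bytes):
    """Number of fixed-size items that fit in one terabyte."""
    return TB / item_bytes


def hours_per_tb(kbit_per_s):
    """Hours of a constant-bitrate stream that fit in one terabyte."""
    bytes_per_second = kbit_per_s * 1024 / 8
    return TB / bytes_per_second / 3600


# (label, unit, quantity per TB)
rows = [
    ("Photo (400 KB JPEG)", "photos", items_per_tb(400 * 1024)),
    ("1 MB document", "docs", items_per_tb(1024**2)),
    ("128 kb/s audio", "hours", hours_per_tb(128)),
    ("256 kb/s video", "hours", hours_per_tb(256)),
    ("1.5 Mb/s video", "hours", hours_per_tb(1.5 * 1024)),
]

for label, unit, per_tb in rows:
    per_day = per_tb / DAYS_PER_YEAR
    print(f"{label:22s} {per_tb:12,.0f} {unit}/TB {per_day:10,.1f} {unit}/day")
```

With these assumptions the results agree with the slide to within rounding, which suggests (but does not prove) that the original figures were computed with binary units.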
“yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can be profligate and enter material freely” - Vannevar Bush, 1945
So you’ve got it – now what do you do with it?
  • Can you find anything?
  • Can you organize that many objects?
  • Once you find it, will you know what it is?
  • Once you’ve found it once, could you find it again?
“A record if it is to be useful … must be continuously extended, it must be stored, and above all it must be consulted”
“The difficulty seems to be, not so much that we publish unduly … but rather that publication has been extended far beyond our present ability to make real use of the record”
- Vannevar Bush
MyLifeBits Software MyLifeBits store database Voice annotation tool Telephone capture tool TV capture tool TV EPG download tool Radio capture & EPG PocketPC transfer tool PocketRadio player Import files MyLifeBits Shell Browser tool Internet IM capture GPS import & Map display SenseCam Screen saver Text annotation tool MAPI interface Legacy email client Outlook interface files Legacy applications VIBE logging
Entities & Links Annotates Caller in Phone Call Photo of Event Transcludes
MyLifeBits Schema (simplified): Images; Music; Phone calls; Resources; Relationships; Relationship types; Entity types; Resource entities; Event types; Event log; Events; Tasks; People; Notes; Email Messages; Saved searches
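To make the idea of resources joined by typed links concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the MyLifeBits schema: the table names, column names, and sample rows are all invented for illustration, and the link type names are borrowed from the "Entities & Links" slide.

```python
import sqlite3

# Hypothetical, much-reduced schema: resources plus typed links between them.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE resources (
        id    INTEGER PRIMARY KEY,
        kind  TEXT NOT NULL,          -- e.g. 'person', 'photo', 'event'
        title TEXT NOT NULL
    );
    CREATE TABLE link_types (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE     -- e.g. 'photo of', 'annotates'
    );
    CREATE TABLE links (
        source_id INTEGER NOT NULL REFERENCES resources(id),
        target_id INTEGER NOT NULL REFERENCES resources(id),
        type_id   INTEGER NOT NULL REFERENCES link_types(id)
    );
    """
)

# Made-up sample data, for illustration only.
conn.executemany(
    "INSERT INTO resources (id, kind, title) VALUES (?, ?, ?)",
    [
        (1, "person", "Example Caller"),
        (2, "phone call", "Example call"),
        (3, "photo", "IMG_0001.jpg"),
        (4, "event", "Example workshop"),
        (5, "note", "Example voice note"),
    ],
)
conn.executemany(
    "INSERT INTO link_types (id, name) VALUES (?, ?)",
    [(1, "caller in"), (2, "photo of"), (3, "annotates")],
)
conn.executemany(
    "INSERT INTO links (source_id, target_id, type_id) VALUES (?, ?, ?)",
    [(1, 2, 1), (3, 4, 2), (5, 3, 3)],
)


def linked_to(target_title, link_name):
    """Titles of resources that point at `target_title` via `link_name`."""
    rows = conn.execute(
        """
        SELECT src.title
        FROM links
        JOIN resources AS src ON src.id = links.source_id
        JOIN resources AS dst ON dst.id = links.target_id
        JOIN link_types AS lt ON lt.id = links.type_id
        WHERE dst.title = ? AND lt.name = ?
        ORDER BY src.title
        """,
        (target_title, link_name),
    ).fetchall()
    return [title for (title,) in rows]


print(linked_to("Example workshop", "photo of"))
print(linked_to("IMG_0001.jpg", "annotates"))
```

Keeping link types in their own table means a new kind of relationship is a new row rather than a schema change, which is one plausible reading of "typed links" in the deck.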
DEMO
Future work: new capture modes/devices
  • SenseCam
  • Deja View
  • Body Media
  • Quindi
Future work: Visualizations
“Don't give me a little card image and say, "That's all you've got, because that's what I thought you should want for your virtual shoebox." There have got to be multiple modalities and the designers have to be able to deal with that. … don't metaphor me in, don't give me only one way of looking at things.” - Andy van Dam, Hypertext '87 Keynote Address
  • Next Media
  • U. Maryland
  • IN-SPIRE
  • Web Scout
Future work: UI
  • UI Improvements
  • User studies
Future work: Content analysis & Data Mining
  • Is MyLifeBits just enough rope to hang yourself with?
  • MyLifeBits must become MyPersonalAssistant
  • Content analysis and data mining
      • Doc similarity & “clean living”
      • Document meta-data extraction
  • “Creative thought and essentially repetitive thought are very different things. For the latter there are, and may be, powerful mechanical aids” – Vannevar Bush
Future work: scaling
  • Just starting to hit performance problems
  • Stress tests & design modifications
www.MyLifeBits.com http://research.microsoft.com/CARPE2004
BONUS SLIDES
Everything goes in a database
  • You need all the features of a database (Consistency, Indexing, Pivoting, Queries, Speed/scalability, Backup, replication)
  • If you don’t use one, you will find yourself creating one!
  • Files as blobs, also sync with file system for legacy apps
  • SQL
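As a rough illustration of "files as blobs" with database indexing, the sketch below stores file contents in a SQLite table and looks them up by content hash. It is an assumption-laden toy, not the MyLifeBits store: the table layout, the use of SHA-256, and the function names are all hypothetical, and syncing back to the file system is left out.

```python
import hashlib
import sqlite3

# Hypothetical table: one row per stored file, content kept as a BLOB.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE files (
        id      INTEGER PRIMARY KEY,
        path    TEXT NOT NULL,
        sha256  TEXT NOT NULL,
        content BLOB NOT NULL
    )
    """
)
# The index lets "have I already stored this?" be answered without a scan.
conn.execute("CREATE INDEX files_by_hash ON files (sha256)")


def store(path, data):
    """Insert a file's bytes as a blob and return its row id."""
    digest = hashlib.sha256(data).hexdigest()
    cursor = conn.execute(
        "INSERT INTO files (path, sha256, content) VALUES (?, ?, ?)",
        (path, digest, data),
    )
    return cursor.lastrowid


def load(file_id):
    """Return the stored bytes for a row id."""
    row = conn.execute(
        "SELECT content FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    return bytes(row[0])


def paths_with_same_content(data):
    """All stored paths whose content hashes to the same value as `data`."""
    digest = hashlib.sha256(data).hexdigest()
    rows = conn.execute(
        "SELECT path FROM files WHERE sha256 = ? ORDER BY path", (digest,)
    ).fetchall()
    return [path for (path,) in rows]


first_id = store("letters/example.txt", b"hello, memex")
store("backup/example-copy.txt", b"hello, memex")
store("notes/other.txt", b"something else")

print(load(first_id))
print(paths_with_same_content(b"hello, memex"))
```

The point of the slide shows up even in this toy: once content sits next to its metadata, duplicate detection and lookup become ordinary queries instead of custom file-walking code.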
CARPE ’04
The First ACM Workshop on Continuous Archival & Retrieval of Personal Experiences
October 15th, 2004
Columbia University, New York, NY, USA
Dear Appy, How committed are you? Signed, Lost and Forgotten Data

Dear Appy, I'm having trouble with long-term commitment -- not on my end, heaven knows, but from the apps that created me and with whom I like to associate. Over time, these pesky apps evolve and they simply don't recognize the data that they once helped create! But, we data progeny -- and there are lots of us -- feel that as our creators, these apps should be responsible for eternal support. But the little problem with recognition isn't the worst of it – sometimes the apps even disappear altogether. I ask you, is it expecting too much for 20-something year old data like me to be interpretable by my app (e.g. Acrobat, DB2, Draw, Eudora, Office, Quicken, or RealNetworks), or am I just associating with irresponsible apps? If things continue on their current path, it seems I will be completely un-interpretable within 20 to 50 years! My apps will move to other platforms, or evolve to be more Internet- or Next-Big-Thing-centric...

By Gordon Bell
http://research.microsoft.com/~gbell
A Storocratic Oath
  • Do no harm to dates (File creation, Photo taken)
  • Do no harm to device created & other meta-data. Camera data & location data are sacred.
  • Support & aid the creation of critical meta-data.
      • When/how the user feels like it
      • Auto-magically!
  • Maintain user confidentiality
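One small, practical reading of "do no harm to dates" is to copy files in a way that keeps their timestamps. The sketch below uses `shutil.copy2` from the Python standard library, which copies file contents and then tries to preserve metadata such as the modification time; the function name `archive_copy` and the sample file are invented for this example. Note that this does not cover everything on the slide: file creation time in particular is not preserved on every platform.

```python
import os
import shutil
import tempfile


def archive_copy(src_path, dst_dir):
    """Copy a file into dst_dir, keeping its modification time where possible."""
    os.makedirs(dst_dir, exist_ok=True)
    # copy2 copies the data and then the file's timestamps and permission bits.
    return shutil.copy2(src_path, dst_dir)


# Demo with a throwaway file whose modification time is set to an old date.
work_dir = tempfile.mkdtemp()
original = os.path.join(work_dir, "photo.jpg")
with open(original, "wb") as handle:
    handle.write(b"not really a JPEG")

OLD_TIMESTAMP = 1_000_000_000  # seconds since the epoch (September 2001)
os.utime(original, (OLD_TIMESTAMP, OLD_TIMESTAMP))

copied = archive_copy(original, os.path.join(work_dir, "archive"))
print(int(os.path.getmtime(original)), int(os.path.getmtime(copied)))
```

A plain `shutil.copy` or a read-then-write loop would stamp the copy with the time of the copy, which is exactly the kind of silent metadata damage the oath warns against.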
Classification wish list
  • Download classifications rather than build them
  • Definitions & synonyms should help find what I want
  • Today it is too expensive to manually classify my scanned paper. E.g. “right time” meta-data is critical!
  • Next year I hope “the system” can classify papers and other documents, e.g. bills
  • In 10 years I expect all documents to appear electronically & classified with a little help from me
Personal Search is not Professional or Web search
System sees every entry & access Everything, not just a professional life Limited to SIS, not an infinite amount, covers a profession & personal life Web as seen by search engines MyLifeBits Knowledge breadth e.g. Dewey classification Depth e.g. information item types & coverage Professional user
