Big Data and Tangibles - TEI 13


Slides created for the Tangible, Embedded, and Embodied Interaction (TEI) conference, 2013

Published in: Education, Technology

  1. From Big Data to Insights: Opportunities and Challenges for TEI in Genomics
     Orit Shaer, Ali Mazalek, Brygg Ullmer, Miriam K. Konkel
  2. Outline
     • Introduction to genomics/motivation
     • Design challenges
     • Case studies
     • Opportunities for TEI
     • Going forward
  3. Genomics
     “While the work is a challenge, making genetics interactive is potentially as transformative as the move from batch processing to time sharing” (Bafna, V., et al., Communications of the ACM, Jan 2013)
  4. Project flow: Genome Sequencing Project Sequencing Centers High-throughput Sequencing Draft Sequence Finished Sequence Sequence Archiving Genome Annotation DNA Sequence Protein Prediction Pathways Comparative Analysis Target Selection
  5. TEI for Scientists
     • Schkolne, Ishii, and Schroder 2004
     • Gillet et al. 2005
     • Brooks et al. 1990: Project GROPE
     • Tabard, A., et al. 2011: eLabBench
  6. Challenges
     • Scale
     • Heterogeneous Data
     • Diverse Audience
  7. Scale
     • Filesystem @ Broad Inst.: 13+ PB
     • One run of an Illumina HiSeq 2500: 6 billion paired-end sequences (600 gigabases, or 120 Gb/day)
     • Thousand Genomes project: 692 collaborators, 110 institutions, >15 groups in (bi-)weekly conference calls
     • Blue Waters cluster: >380K CPU cores + >3K GPUs
  8. Heterogeneous Data
  9. Diverse Audience
     • Genomic Scientists
     • Citizen Scientists
     • General Public
     • Future Scientists
  10. How can TEI systems be designed to
      • Empower citizens to make informed health decisions?
      • Communicate scientific data to communities?
      • Enhance learning of complex concepts?
      • Support experts interacting with big data?
  11. Challenges
      • Scale
      • Heterogeneous Data
      • Diverse Audience
  12. Case Studies
      • Tabletop Genome Browsing & Primer Design
      • Tangible-targeted Computational Genomics
      • Tangibles For Visualizing Systems Biology
  13. Locate, Learn, Retrieve, Annotate, Compare
  14. Tangibles-targeted computational genomics
      Human genome: understanding ca. 2012 (pie chart)
      • Chart values: 48.4%, 1.0%, 2.4%, 46.6%, 1.6%
      • Chart labels: Mobile elements; Processed pseudogenes; Tandem repeats & low complexity DNA; Dark matter; Protein & RNA coding regions
      • Composition of other primate genomes is very similar
  15. Example projects: rhesus, orangutan, human, marmoset genomes
      • Often multi-institution, multi-person efforts
        – Above articles: ~250, 100 co-authors
      • Often long duration (e.g., 4-6 years before first publication)
      • Iterative fusion of computational and “wet bench” analyses
      • Some analyses “big CPU” (e.g., 200 CPU cores for weeks); others, “big RAM” (200+ GB RAM)
  16. Tangible Visualization: persistent representations of people, projects, activities…
      Interactions 2012.07: Entangling space, form, light, time, computational STEAM, and cultural artifacts
  17. CS3: Systems Biology Modeling
  18. Lessons learned
      TEI can facilitate immediate, visible, and easily reversible manipulations
      • How to design TEI for open-ended creative inquiries?
      Tangible representations can facilitate multi-stage workflows
      • Important for execution and tracking of complex analyses
      • Need parametrized, annotatable representations of complex large datasets
      TEI could facilitate collaboration for distributed and co-located teams
      • Large interdisciplinary teams and distributed work are common in this area
      • Users can jointly manipulate assumptions and see consequences
      Tangible tools can support understanding and discovery
      • Provide access to different pieces of the problem (data, reactions)
      • Help users form accurate mental models through tangible/embodied manipulation
  19. Opportunities for TEI
      • Engagement
      • Understanding Complex Problems
      • Visualizing Biological Data
      • Enabling Large Collaborations
      • Supporting Diverse Audiences
      • Managing Varied Timescales
  20. Understanding Complex Problems
  21. Enabling Large Collaborations
  22. Managing Varied Timescales
      Powers of 10,000:
      • Milliseconds
      • Minutes
      • Months
      • Millennia
      Entangling Space, Form, Light, Time, Computational STEAM, and Cultural Artifacts
      Examples
      • Many genome projects: 5+ years
      • Sequencing Lincoln’s DNA: under active discussion since 1991
      • Most of us sequenced within a decade? Materially impacting all our descendants
  23. Going forward
      • Some aspects with broad TEI, computational science synergies
        – How to visualize and engage data, activity, progress spanning many systems, people, places, timescales?
        – What representational forms, device ecologies, most appropriate for large, abstract data?
        – Facilitating engagement with big data in ways that highlight connections between multiple forms of evidence
      • Some aspects specific to genomics
        – 2023: anticipate most of us in room + many thousands of species having genomes fully or partially sequenced
        – Commonalities, distinctions in engagements by scientists, students, street people, senators, senior citizens, solicitors, …
  24. THANKS!
      Orit Shaer, Ali Mazalek, Brygg Ullmer, Miriam Konkel
      Consuelo Valdes (Wellesley College) and Andy Wu (Georgia Tech).
      This work has been partially funded by NSF IIS-1017693, DRL-097394084, and CNS-1126739.
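
The figures quoted on slides 7 and 22 can be sanity-checked with a little arithmetic. The sketch below is not from the slides: the variable and function names are illustrative, and the 30-day month and 365.25-day year are assumptions made only to turn the named timescales into seconds.

```python
# Back-of-envelope checks for figures quoted on slides 7 and 22.
# Slide values are copied as stated; unit conversions are assumptions
# (30-day month, 365.25-day year) chosen only for illustration.

# Slide 7: one Illumina HiSeq 2500 run.
READS_PER_RUN = 6 * 10**9      # "6 billion paired-end sequences"
BASES_PER_RUN = 600 * 10**9    # "600 gigabases"
BASES_PER_DAY = 120 * 10**9    # "120 Gb/day"

bases_per_read = BASES_PER_RUN // READS_PER_RUN  # average bases per sequence
run_days = BASES_PER_RUN // BASES_PER_DAY        # implied run length in days

# Slide 22: the four named timescales, expressed in seconds.
SECONDS = {
    "millisecond": 1e-3,
    "minute": 60.0,
    "month": 30 * 86_400.0,                  # assumed 30-day month
    "millennium": 1000 * 365.25 * 86_400.0,  # assumed 365.25-day year
}


def step_ratios(scales):
    """Return the ratio between each consecutive pair of timescales."""
    values = list(scales.values())
    return [later / earlier for earlier, later in zip(values, values[1:])]


ratios = step_ratios(SECONDS)

if __name__ == "__main__":
    print(f"bases per read: {bases_per_read}")
    print(f"run length: {run_days} days")
    for name, ratio in zip(list(SECONDS)[1:], ratios):
        print(f"step up to {name}: x{ratio:,.0f}")
```

Under these assumptions the run works out to 100 bases per sequence over 5 days, and each timescale step is a factor between 10,000 and 100,000, so "powers of 10,000" reads as an order-of-magnitude label rather than an exact ratio.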