From Big Data to Insights: Opportunities and Challenges for TEI in Genomics
Orit Shaer, Ali Mazalek, Brygg Ullmer, Miriam K. Konkel
Introduction to genomics/motivation
Opportunities for TEI
“While the work is a challenge, making genetics
interactive is potentially as
transformative as the move from batch
processing to time sharing”
-Bafna V. et al. Communications of the ACM Jan 2013
Schkolne, Ishii, and Schroder 2004.
TEI for Scientists
Gillet et al. 2005; Brooks et al. 1990
Tabard, A., et al. 2011. eLabBench.
Filesystem @ Broad Inst.: 13+PB
One run of an Illumina HiSeq 2500:
6 billion paired-end sequences
(600 gigabases, or 120Gb/day)
Thousand Genomes project:
>15 groups in (bi-)weekly
Blue Waters cluster:
>380K CPU cores
+ >3K GPUs
Citizen Scientist, General Public
How can TEI systems be designed to
• Empower citizens to make informed health decisions?
• Communicate scientific data to communities?
• Enhance learning of complex concepts?
• Support experts interacting with big data?
Tabletop Genome Browsing
& Primer Design
Tangibles For Visualizing
Human genome: understanding ca. 2012
Tandem repeats & low
Protein & RNA coding
Composition of other primate genomes is very similar
Tangibles-targeted computational genomics
Example projects: rhesus, orangutan, human, marmoset genomes
• Often multi-institution, multi-person efforts
– Above articles: ~250, 100 co-authors
• Often long duration (e.g., 4-6 years before first publication)
• Iterative fusion of computational and “wet bench” analyses
• Some analyses “big CPU” (e.g., 200 CPU cores for weeks);
others, “big RAM” (200+GB RAM)
of people, projects, activities…
Interactions 2012.07: Entangling space, form, light, time, computational STEAM,
and cultural artifacts
TEI can facilitate immediate, visible, and easily reversible manipulations
• How to design TEI for open-ended creative inquiries?
Tangible representations can facilitate multi-stage workflows
• Important for execution and tracking of complex analyses
• Need parametrized, annotatable representations of complex large datasets
TEI could facilitate collaboration for distributed and co-located teams
• Large interdisciplinary teams and distributed work are common in this area
• Users can jointly manipulate assumptions and see consequences
Tangible tools can support understanding and discovery
• Provide access to different pieces of the problem (data, reactions)
• Help users form accurate mental models through tangible/embodied manipulation
Opportunities for TEI Engagement
Understanding Complex Problems
Visualizing Biological Data
Enabling Large Collaborations
Supporting Diverse Audiences
Managing Varied Timescales
Powers of 10,000:
Entangling Space, Form, Light, Time, Computational STEAM, and Cultural Artifacts
• Many genome projects: 5+ years
• Sequencing Lincoln’s DNA: under
active discussion since 1991
• Most of us sequenced within decade?
materially impacting all our descendants
• Some aspects w/ broad TEI, computational science synergies
• How to visualize and engage data, activity, progress spanning
many systems, people, places, timescales?
• What representational forms, device ecologies, most
appropriate for large, abstract data?
• Facilitating engagement with big data in ways that highlight
connections between multiple forms of evidence
• Some aspects specific to genomics
• 2023: anticipate most of us in room + many thousands of
species having genomes fully or partially sequenced
• Commonalities, distinctions in engagements by scientists,
students, street people, senators, senior citizens, solicitors, …
Orit Shaer: email@example.com
Ali Mazalek: firstname.lastname@example.org
Brygg Ullmer: email@example.com
Miriam Konkel: firstname.lastname@example.org
Consuelo Valdes (Wellesley College) and Andy Wu (Georgia Tech).
This work has been partially funded by NSF IIS-1017693, DRL-097394084, and CNS-1126739.