1. Will Data Science Approaches Impact Our Science?
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
peb6a@virginia.edu
https://www.slideshare.net/pebourne
6/15/17 Dataverse 2017
2. We Have Already Been Impacted
• Deep learning is a part of data science
• Definition
• Feature extraction from large, sometimes sparse, datasets
• Examples from this meeting
• Residue identification for protein-ligand docking, trained on PMC and applied to PubMed
• Improving ranking in fold recognition
• Folding membrane proteins
• Prediction of RNA-RNA interactions
4. Data analytics has become a major economic driver in the private and government sectors…
It could be argued that academic research is now behind the curve, not ahead of it...
This has implications for our science, both in the research itself and in the recruitment of students and faculty.
5. Areas of Data Science
Yes, we have been doing much of this, but much of it is becoming mainstream, and we can learn from that.
• Effective use of cloud computing.
• Model-based inference.
• Predictive modeling.
• Recognizing unusual and anomalous behavior in data.
• Geometry and topology of data.
• Indexing, labelling and retrieval, query engines.
• Large-scale optimization.
• Machine learning.
• Network and graph theory.
• Privacy and security.
• Monitoring and analytics.
• Economics, governance and ethics.
• Controlled vocabularies.
• Visualization.
• Compression.
• Data mining.
6. What motivates me to want to have this discussion…
I am developing a Data Science Institute where the students are discovering open data and analytical techniques that make me think, "Hmm, I could apply this to my own research."
It seems worthwhile to have a discussion to see a) whether you agree and b) what you might share with our community; c) we can't ignore it...
8. Through distortions in its social use that include bias, amplification, and invention, citation can be used to generate information cascades resulting in unfounded authority of claims.
[From Tim Clark]
9. Possible Items for Discussion
• Data
• New open data sources relevant to 3Dsig attendees
• Analytics
• Little-known analytical techniques that could be more broadly applicable to our community, e.g., for clustering, machine learning, statistical rigor …
• Societal
• Retaining researchers in the field
• Attracting data scientists to work on our problems