The data deluge has become a reality in today's scientific research. What does it mean for the future science workforce? How can you prepare yourself to embrace the data challenges and opportunities? This presentation provides an overview of data science and what it means to you as future researchers and career scientists.
1. Data Science: An Emerging
Field for Future Jobs
Jian Qin
School of Information Studies
Syracuse University
A presentation for the Graduate School, Syracuse University
February 22, 2013
2. Talk points
› Data science (DS) and data scientists in the context of research data
› Implications and expectations for the future research workforce
› Preparing for the challenges and opportunities
GRADUATION SCHOOL PRESENTATION 2013-2-22 2
3. Feeling the pressure of data deluge in the digital information world …
http://readwrite.com/2011/11/17/infographic-data-deluge---8-ze
4. …in science research
http://www.sciencemag.org/content/331/6018.cover-expansion
5. …in our health care
http://ars.els-cdn.com/content/image/1-s2.0-S1053811905002508-gr4.jpg
7. Shift in Science Paradigms
› Thousand years ago: Science was empirical, describing natural phenomena
› A few hundred years ago: Theoretical branch using models, generalizations
› A few decades ago: A computational approach simulating complex phenomena
› Today: Data exploration (eScience) unify theory, experiment, and simulation
-- Data captured by instruments or generated by simulator
-- Processed by software
-- Information/Knowledge stored in computer
-- Scientist analyzes database/files using data management and statistics
Gray, J. & Szalay, A. (2007). eScience – A transformed scientific method. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt
8. Research data collections
Size                       Metadata Standards        Management
Larger, discipline-based   Multiple, comprehensive   Organized, Institutionalized
Smaller, team-based        None or random            Heroic individual inside the team
9. Emerging concepts that are going to stay and matter to your career
10. What is data science?
“An emerging area of work concerned with the collection, presentation, analysis, visualization, management, and preservation of large collections of information.”
Stanton, J. (2012). Introduction to Data Science. http://ischool.syr.edu/media/documents/2012/3/DataScienceBook1_1.pdf
11. Data science and scientific research
Management domain: Plan, design, consult for, implement, and evaluate data management projects and services
Technical domain: Ingest, store, organize, merge, filter, and transform data and create analysis-ready data
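The technical-domain steps named on this slide can be illustrated with a minimal Python sketch. It is not part of the original talk: the CSV contents, column names, and function names are all hypothetical, chosen only to show ingest, filter, transform, merge, and organize producing a small analysis-ready result. The "store" step is left out to keep the example short.

```python
import csv
import io
from statistics import mean

# Hypothetical raw inputs: two small CSV "files" held in memory.
READINGS_CSV = """sample_id,temperature_c
s1,21.5
s2,
s3,19.0
s1,22.5
"""

SAMPLES_CSV = """sample_id,site
s1,lake
s2,river
s3,river
"""


def ingest(text):
    """Ingest: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_valid(rows):
    """Filter: drop rows whose temperature value is missing."""
    return [r for r in rows if r["temperature_c"].strip()]


def transform(rows):
    """Transform: convert temperature strings to floats."""
    return [{**r, "temperature_c": float(r["temperature_c"])} for r in rows]


def merge(readings, samples):
    """Merge: attach each sample's site to its readings."""
    site_by_id = {s["sample_id"]: s["site"] for s in samples}
    return [{**r, "site": site_by_id[r["sample_id"]]} for r in readings]


def organize(rows):
    """Organize: group temperatures by site and average each group."""
    by_site = {}
    for r in rows:
        by_site.setdefault(r["site"], []).append(r["temperature_c"])
    return {site: mean(values) for site, values in sorted(by_site.items())}


readings = transform(filter_valid(ingest(READINGS_CSV)))
analysis_ready = merge(readings, ingest(SAMPLES_CSV))
summary = organize(analysis_ready)
print(summary)
```

In practice a data scientist would more likely reach for a database or a data-frame library for these steps; the standard library is used here only so the sketch runs without extra dependencies.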
12. Data management is essential
Scientific Data Management Specialist
• Design, develop, implement, and manage high-throughput automatic data processing infrastructure for large databases in a mature system
• Develop and improve the infrastructure supporting this system
• Interface with multiple data providers to design, build, and maintain their customized databases
• Clarify requirements, feature requests and bug reports for software developers and assist in testing code.
Laboratory Data Management Specialist
• Administer operational database
• Assure the quality of data and database content
• Interact closely with researchers, lab managers, and platform coordinators
• Track deliverables against budget and prepare data reports
• Collaborate closely with IT and bioinformatics colleagues
• Assist IT in gathering workflow requirements
• Test changes and updates in IT systems
• Create and maintain app documentation
Data Modeling/Management Specialist
• Work closely with the high performance computing and the IT manager
• Develop a data model for complex multi-scale rocks
• Design and organize a database and complex queries
• Integrate and manage multi-scale rocks subjected to large-scale scientific computing applications
http://www.ingrainrocks.com/data-management-specialist/
http://www.bioinformatics.org/forums/forum.php?forum_id=9670
13. “We’re increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.”
Loukides, M. (2011). What is data science? Sebastopol, CA: O’Reilly.
14. Emerging job market: Data scientists
› Data scientists are more likely to be involved across the
data lifecycle:
– Acquiring new data sets: 33%
– Parsing data sets: 29%
– Filtering and organizing data: 40%
– Mining data for patterns: 30%
– Advanced algorithms to solve analytical problems: 29%
– Representing data visually: 38%
– Telling a story with data: 34%
– Interacting with data dynamically: 37%
– Making business decisions based on data: 40%
http://mashable.com/2012/01/13/career-of-the-future-data-scientist-infographic/
15. Are you ready for the data challenges and opportunities?
16. What is expected of data scientists?
› Ability to use a wide variety of tools for documentation, analysis, and reporting of data
› Knowledge of a subject domain
› Data modeling, database and query design
› OS, Programming languages
› Collaboration, communication, and co-ordination
› Content and repository systems
› Encoding languages
17. Analytical skills: domain modeling
› Requirement analysis
› Workflow analysis
› Data modeling
› Data transformation needs analysis
› Data provenance needs analysis
– Interview skills, analysis and generalization skills
– Ability to capture components and sequences in workflows
– Ability to translate domain analysis into data models
– Ability to envision the data model within the larger system architecture
18. Analytical skills: from data sources to patterns, relationships, and trends
› Data
› Analytical tools
› “Hacking”
› Knowledge products
19. Data management skills: data lifecycle and infrastructural services
› Metadata standards
› Encoding language
› Semantic control
› Identify management
Processed, transformed, derived, calculated, … data
› Common data format
› Image formats
› Matrix formats
› Microarray file formats
› Communication protocols
Infrastructural services
• Data source discovery
• Data curation
• Data preservation
• Data integration and mashup
• Data citation, publication, and distribution
• Data linking and interoperability
• …
20. Technology skills with excellent communication skills
TECHNOLOGY SKILLS
› Operating systems
› Repository systems
› Database systems
› Programming languages
› Encoding languages
› Specialized programming
COMMUNICATION SKILLS
› Interviews
› “Ice breaking”
› Community building
› Institutionalization
› Stakeholder buy-in
22. Four tracks: choose what you are good at
Data Science core course: Applied data science, Databases
› Data analytics
› Data storage and management
› General system management
› Data visualization
http://ischool.syr.edu/future/cas/datascience.aspx
23. The iSchool’s version of data science education
Eventually the iSchool data science program will build the foundation for super data scientists…
› Ability to use a wide variety of tools for documentation, analysis, and report of data
› Knowledge of a subject domain
› Data modeling, database and query design
› OS, Programming languages
› Collaboration, communication, and co-ordination
› Content and repository systems
› Encoding languages
24. Thank You!
Questions?