Data Science: An Emerging Field for Future Jobs


The data deluge has become a reality in today's scientific research. What does it mean for the future science workforce? How can you prepare yourself to embrace the data challenges and opportunities? This presentation provides an overview of data science and what it means to you as future researchers and career scientists.

  1. Data Science: An Emerging Field for Future Jobs. Jian Qin, School of Information Studies, Syracuse University. A presentation for the Graduate School, Syracuse University, February 22, 2013
  2. Talk points › Data science (DS) and data scientists in the context of research data › Implications and expectations of future research workforce › Preparing for the challenges and opportunities GRADUATION SCHOOL PRESENTATION 2013-2-22
  3. Feeling the pressure of data deluge in the digital information world … infographic-data-deluge---8-ze
  4. …in science research 331/6018.cover-expansion
  5. …in our health care
  6. …in our neighborhood market=boston&region_id=112&region_ type=1&v=8
  7. Shift in Science Paradigms. Thousand years ago: science was empirical, describing natural phenomena. A few hundred years ago: theoretical branch using models, generalizations. A few decades ago: a computational approach simulating complex phenomena. Today: data exploration (eScience), unify theory, experiment, and simulation -- Data captured by instruments or generated by simulator -- Processed by software -- Information/Knowledge stored in computer -- Scientist analyzes database/files using data management and statistics. Gray, J. & Szalay, A. (2007). eScience – A transformed scientific method.
  8. Research data collections. Size: larger, discipline-based; Metadata standards: multiple, comprehensive; Management: organized, institutionalized. Size: smaller, team-based; Metadata standards: none or random; Management: heroic individual inside the team.
  9. Emerging concepts that are going to stay and matter to your career
  10. What is data science? “An emerging area of work concerned with the collection, presentation, analysis, visualization, management, and preservation of large collections of information.” Stanton, J. (2012). Introduction to Data Science. DataScienceBook1_1.pdf
  11. Data science and scientific research. Management domain: plan, design, consult for, implement, and evaluate data management projects and services. Technical domain: ingest, store, organize, merge, filter, and transform data and create analysis-ready data.
  12. Data management is essential. Scientific Data Management Specialist: • Design, develop, implement, and manage high-throughput automatic data processing infrastructure for large databases in a mature system • Develop and improve the infrastructure supporting this system • Interface with multiple data providers to design, build, and maintain their customized databases • Clarify requirements, feature requests and bug reports for software developers and assist in testing code • Create and maintain app documentation. Laboratory Data Management Specialist: • Administer operational database • Assure the quality of data and database content • Interact closely with researchers, lab managers, and platform coordinators • Track deliverables against budget and prepare data reports • Collaborate closely with IT and bioinformatics colleagues • Assist IT in gathering workflow requirements • Test changes and updates in IT systems. Data Modeling/Management Specialist: • Work closely with the high performance computing and the IT manager • Develop a data model for complex multi-scale rocks • Design and organize a database and complex queries • Integrate and manage multi-scale rocks subjected to large-scale scientific computing applications. data-management-specialist/ forums/forum.php?forum_id=9670
  14. Emerging job market: Data scientists › Data scientists are more likely to be involved across the data lifecycle: – Acquiring new data sets: 33% – Parsing data sets: 29% – Filtering and organizing data: 40% – Mining data for patterns: 30% – Advanced algorithms to solve analytical problems: 29% – Representing data visually: 38% – Telling a story with data: 34% – Interacting with data dynamically: 37% – Making business decisions based on data: 40% the-future-data-scientist-infographic/
  15. Are you ready for the data challenges and opportunities?
  16. What is expected of data scientists? Data scientists: › Ability to use a wide variety of tools for documentation, analysis, and report of data › Knowledge of a subject domain › Data modeling, database and query design › OS, programming languages › Collaboration, communication, and co-ordination › Content and repository systems › Encoding languages
  17. Analytical skills: domain modeling. Tasks: › Requirement analysis › Workflow analysis › Data modeling › Data transformation needs analysis › Data provenance needs analysis. Skills: › Interview skills, analysis and generalization skills › Ability to capture components and sequences in workflows › Ability to translate domain analysis into data models › Ability to envision the data model within the larger system architecture
  18. Analytical skills: from data sources to patterns, relationships, and trends › Analytical tools › “Hacking” › Knowledge › Data products
  19. Data management skills: data lifecycle and infrastructural services. Metadata standards; Encoding language; Semantic control; Identify management; Infrastructural services. Processed, transformed, derived, calculated, … data. Common data format; Image formats; Matrix formats; Microarray file formats; Communication protocols. • Data source discovery • Data curation • Data preservation • Data integration and mashup • Data citation, publication, and distribution • Data linking and interoperability • …
  20. Technology skills with excellent communication skills. TECHNOLOGY SKILLS › Operating systems › Repository systems › Database systems › Programming languages › Encoding languages › Specialized programming. COMMUNICATION SKILLS › Interviews › “Ice breaking” › Community building › Institutionalization › Stakeholder buy-in
  22. Four tracks: choose what you are good at › Data analytics › Data storage and management › General system management › Data visualization. Data Science core course: Applied data science, Databases. future/cas/ datascience.aspx
  23. The iSchool’s version of data science education. Data scientists: › Ability to use a wide variety of tools for documentation, analysis, and report of data › Knowledge of a subject domain › Data modeling, database and query design › OS, programming languages › Collaboration, communication, and co-ordination › Content and repository systems › Encoding languages. Eventually the iSchool data science program will build the foundation for super data scientists…
  24. Thank You! Questions?