Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Educating a New Breed ofData Scientists for ScientificData ManagementJian QinSchool of Information StudiesSyracuse Univers...
DS	   Talk points        ›  Data science (DS) and data scientists in the context of           scientific data        ›  An...
What is data science?DS	                     “An emerging area of work                   concerned with the collection,   ...
DS	   Data science and scientific research    Plan, design, consult                                     Ingest, store,    ...
What data scientists are expected to do:               the job market       DS	                                 Laboratory...
DS	         “We’re increasingly finding data in        the wild, and data scientists are        involved with gathering da...
What data scientists are expected to do:DS	   the difference from tradition        ›  Data scientists are more likely to b...
How should educational        programs address the        challenge?        A case of the CAS in Data Science        progr...
Story 1: cognitive-demanding workflows andDS	   data management›  Domain: Thermochronology and tectonics›  What’s involved...
DS	   Story 2: highly automated workflows    ›  Domain: Astrophysics: gravitational wave detection    ›  What’s involved: ...
Ability to use a        Knowledge                                                               DataDS	      wide variety ...
DS	Analytical    skills: domain modeling   Requirement analysis                                Interview skills, analysis ...
Analytical skills: from data sources to patterns,DS	   relationships, and trends                                       Ana...
Data management skills: data lifecycle andDS	 infrastructural services      Metadata    Encoding       Semantic         Id...
Technology skills with excellent communicationDS	   skills        TECHNOLOGY SKILLS                    COMMUNICATION SKILL...
No superman model for beginning dataDS	   scientists              Data            analytics                               ...
DS	     The CAS in Data Science program at SU          ›  Required:             –  Data Administration Concepts and Databa...
What we learned from the programDS	   development        ›  Data science is a moving target with multiple focal points    ...
DS	   Reconciliation of the two views of data science         “An emerging area of work                              “We’r...
The iSchool’s version of data scienceDS	    education                               Ability to use a       Knowledge      ...
DS	        eScience Librarianship Curriculum Project:        http://eslib.ischool.syr.edu/        Science Data Literacy Pr...
Upcoming SlideShare
Loading in …5
×

Educating a New Breed of Data Scientists for Scientific Data Management

3,044 views

Published on

This presentation reports the data science curriculum development and implementation at Syracuse iSchool, which has shaped by the fast changing data-intensive environment not only for science but also for business and research at large.

Published in: Education
  • Be the first to comment

Educating a New Breed of Data Scientists for Scientific Data Management

  1. 1. Educating a New Breed ofData Scientists for ScientificData ManagementJian QinSchool of Information StudiesSyracuse UniversityMicrosoft eScience Workshop, Chicago, October 9, 2012
  2. 2. DS Talk points ›  Data science (DS) and data scientists in the context of scientific data ›  An iSchool version of the DS curriculum ›  Findings and lessons from implementing the DS curriculum ›  A new breed of data scientists: the iSchool approach 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 2
  3. 3. What is data science?DS “An emerging area of work concerned with the collection, presentation, analysis, visualization, management, and preservation of large collections of information.” Stanton, J. (2012). Introduction to Data Science. http://ischool.syr.edu/media/documents/2012/3/ DataScienceBook1_1.pdf 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 3
  4. 4. DS Data science and scientific research Plan, design, consult Ingest, store, for, implement, and organize, merge, evaluate data filter, and transform management projects data and create and services analysis-ready data 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 4
  5. 5. What data scientists are expected to do: the job market DS Laboratory Data Data Modeling/ Management Specialist Management SpecialistScientific Data Management •  Administer operational database •  Working closely with the highSpecialist •  Assure the quality of data performance computing and•  Design, develop, implement, and database content the IT manager manage high-throughput automatic •  Interact closely with researchers, •  Develop a data model for data processing infrastructure for lab managers, and platform complex multi-scale rocks large databases in a mature system coordinators •  Design and organize a•  Develop and improve the •  Track deliverables against budget database and complex infrastructure supporting this system and prepare data reports queries•  Interface with multiple data providers •  Collaborate closely with IT and •  Integrate and mange multi- to design, build, and maintain their bioinformatics colleagues scale rocks subjected to customized databases •  Assist IT in gathering workflow large-scale scientific•  Clarify requirements, feature requirements computing applications requests and bug reports for software •  Test changes and updates in IT systems http://www.ingrainrocks.com/ developers and assist in testing data-management-specialist/ code. •  Create and maintain app documentation http://www.bioinformatics.org/ forums/forum.php?forum_id=9670 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 5
  6. 6. DS “We’re increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.” Loukides, M. (2011). What is data science? Sebastopol, CA: O’Reilly. 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 6
  7. 7. What data scientists are expected to do:DS the difference from tradition ›  Data scientists are more likely to be involved across the data lifecycle: –  Acquiring new data sets: 33% –  Parsing data sets: 29% –  Filtering and organizing data: 40% –  Mining data for patterns: 30% –  Advanced algorithms to solve analytical problems: 29% –  Representing data visually: 38% –  Telling a story with data: 34% –  Interacting with data dynamically: 37% –  Making business decisions based on data: 40% http://mashable.com/2012/01/13/career-of- 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 7 the-future-data-scientist-infographic/
  8. 8. How should educational programs address the challenge? A case of the CAS in Data Science program at Syracuse iSchoolDS 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 8
  9. 9. Story 1: cognitive-demanding workflows andDS data management›  Domain: Thermochronology and tectonics›  What’s involved: rock samples from drilling and field observation, sliced and grained rock samples›  Data types: Excel data files (lots of them), spectrum and microscopic images, annotations›  Analysis: modeling and sensemaking by combining data from multiple data files with specialized software›  Bottleneck problem: manually matching/merging/filtering data is extremely cumbersome and the problem is compounded by the difficulty finding the right data files
  10. 10. DS Story 2: highly automated workflows ›  Domain: Astrophysics: gravitational wave detection ›  What’s involved: data ingestion from laser interferometers, raw data calibration and segmentation, workflow management, provenance ›  Data types: streaming data from the laser interferometers, images ›  Analysis: detection of “events” ›  Bottleneck problem: tracking of data and processes and the relationships between them 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 10
  11. 11. Ability to use a Knowledge DataDS wide variety of a subject modeling, tools for domain documentation, database and analysis, and query design report of data Data OS, Collaboration, communication, scientists Programming languages and co- ordination Content and Encoding What are repository languages systems expected of data scientists? 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 11
  12. 12. DS Analytical skills: domain modeling Requirement analysis Interview skills, analysis and generalization skills Workflow analysis Ability to capture components and sequences in workflows Data modeling Ability to translate domain analysis Data transformation into data models needs analysis Ability to envision the data model Data provenance within the larger system architecture needs analysis
  13. 13. Analytical skills: from data sources to patterns,DS relationships, and trends Analytical tools “Hacking” Knowledge Data products 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 13
  14. 14. Data management skills: data lifecycle andDS infrastructural services Metadata Encoding Semantic Identify Infrastructural standards language control management services Processed, transformed, derived, calculated, … data •  Data source discovery •  Data curation Common data format Image formats •  Data preservation Matrix formats •  Data integration and Microarray file formats mashup Communication protocols •  Data citation, publication, and distribution •  Data linking and interoperability •  … 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 14
  15. 15. Technology skills with excellent communicationDS skills TECHNOLOGY SKILLS COMMUNICATION SKILLS ›  Operation systems ›  Interviews ›  Repository systems ›  “Ice breaking” ›  Database systems ›  Community building ›  Programming languages ›  Institutionalization ›  Encoding languages ›  Stakeholder buy-in ›  Specialized programming 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 15
  16. 16. No superman model for beginning dataDS scientists Data analytics Data storage and management Data scientists Core: Applied data science Databases General system management Data visualization 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 16
  17. 17. DS The CAS in Data Science program at SU ›  Required: –  Data Administration Concepts and Database Management –  Applied Data Science ›  Elective: Data Analytics Data Storage and Data Visualization •  Data Mining Management •  Information Architecture for •  Basics of Information •  Technologies for Web Internet Services Retrieval Systems Content Management •  Information Visualization •  Natural Language •  Foundations of Digital Data Processing •  Creating, Managing, and General Systems Management •  Advanced Information Preserving Digital Assets •  Enterprise Technologies Analytics •  Data Warehousing •  Managing Information Systems •  Research Methods •  Advanced Database Projects •  Statistical Methods Management •  Information Systems Analysis 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 17
  18. 18. What we learned from the programDS development ›  Data science is a moving target with multiple focal points –  Versions from statistics, computer science, and library and information science ›  Skills vs. theories –  Students are anxious to learn skills but not so interested in theories –  Theories help build visions ›  Sufficient hands-on time for technologies and tools ›  Authentic learning through real-world data management projects 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 18
  19. 19. DS Reconciliation of the two views of data science “An emerging area of work “We’re increasingly finding concerned with the data in the wild, and data collection, presentation, scientists are involved with analysis, visualization, gathering data, massaging it management, and into a tractable form, making it preservation of large tell its story, and presenting that collections of information.” story to others.” Stanton, J. (2012). Introduction to Data Science. Loukides, M. (2011). What is data http://ischool.syr.edu/media/documents/2012/3/ science? Sebastopol, CA: O’Reilly. DataScienceBook1_1.pdf 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 19
  20. 20. The iSchool’s version of data scienceDS education Ability to use a Knowledge wide variety of a subject Data tools for domain modeling, documentation, database and analysis, and query design Eventually the report of data iSchool data science program will build Data OS, Collaboration, the foundation for communication, scientists Programming languages and co- super data ordination scientists… Content and Encoding repository languages systems 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 20
  21. 21. DS eScience Librarianship Curriculum Project: http://eslib.ischool.syr.edu/ Science Data Literacy Project: http://sdl.syr.edu/ CAS in Data Science: http://ischool.syr.edu/future/cas/ datascience.aspx 10/9/12 MICROSOFT ESCIENCE WORKSHOP 2012 21

×