
Data Science - Poster - Kirk Borne - RDAP12



Data Science: The Revolution in Science Education

Poster at Research Data Access & Preservation Summit 23 March 2012

Published in: Technology, Education


Data Science: The Revolution in Science Education
Kirk D. Borne (George Mason University -- School of Physics, Astronomy, & Computational Sciences)

SUMMARY:
  • Huge quantities of data are being generated, collected, and stored within all scientific, research, business, government, and personal domains (including social networks of all sorts).
  • Two significant challenges of this BIG DATA flood are addressed here:
    o Training the next-generation workforce to manage and expertly use these data … “The Rise of the Data Scientist”
    o Discovering the hidden knowledge and surprises within the data … Transforming our repositories from a data representation to a knowledge representation
  • So how do we address these challenges?
  • First, we must face it – i.e., the future researchers that we train as well as knowledge workers (those who extract knowledge from data and information) must recognize the need and face the challenge.
  • Second, we need algorithms, tools, and methodologies from the discipline of Data Science:
    o … for Big Data management, data mining & knowledge discovery, efficient & effective indexing, data fusion & integration, visual analytics, relevance analysis, dimension reduction, feature selection, semantic mark-up, knowledge mining, knowledge-reuse, knowledge self extraction, self-description, recommendation systems, and more.

Visualize This: A sea of Data (sea of CDs)
This is the CD Sea in Kilmington, England (600,000 CDs ~ 300 TB). More data is different!

More Data is Different – Data Science is Essential
  • The message should be clear: “more data is not simply more data, but more data is different.” BIG DATA has big volume, velocity, and variety!
  • Numerous federal agencies (and others, of course) have addressed this, including the August 9, 2010 announcement from the White House OSTP:
    o Big Data is a national challenge and a national priority, along with healthcare and national security.
    o See (#87)
  • International initiative by the CODATA organization to address this challenge:
    o ADMIRE = Advanced Data Methods and Information technologies for Research and Education
  • Many U.S. national study groups in the sciences have issued reports on the urgency of establishing both research and educational programs to face the Big Data challenges.
  • Each of these reports has issued a call to action …

Knowledge Discovery from BIG DATA: many names and many responses …
  • Data Mining
  • Machine Learning (ML)
  • Exploratory Data Analysis (EDA)
  • Intelligent Data Analysis (IDA)
  • Data Analytics
  • Predictive Analytics
  • Discovery Informatics
  • On-Line Analytical Processing
  • Business Intelligence (BI)
  • Business Analytics
  • Customer Relationship Management
  • Target Marketing
  • Cross-Selling
  • Market Basket Analysis
  • Credit Scoring
  • Case-Based Reasoning (CBR)
  • Connecting the Dots
  • Intrusion Detection Systems (IDS)
  • Recommendation / Personalization Systems!

Data Science Education: Two Perspectives
  • Informatics in Education – working with data in all learning settings
    o Informatics (Data Science) enables transparent reuse and analysis of data in inquiry-based classroom learning.
    o Learning is enhanced when students work with real data and information (especially online data) that are related to the topic (any topic) being studied.
    o (“Using Data in the Classroom”)
  • An Education in Informatics – students are specifically trained:
    o … to access large distributed data repositories
    o … to conduct meaningful inquiries into the data
    o … to mine, visualize, and analyze the data
    o … to make objective data-driven inferences, discoveries, and decisions
  • Numerous Data Science programs now exist at several universities (GMU, Caltech, RPI, Michigan, Cornell, U. Illinois, and others …)
    o (Computational & Data Sciences @ GMU)

Data Science: A National Imperative
  1. National Academies report: Bits of Power: Issues in Global Access to Scientific Data (1997)
  2. NSF (National Science Foundation) report: Knowledge Lost in Information: Research Directions for Digital Libraries (2003)
  3. NSF report: Cyberinfrastructure for Environmental Research and Education (2003)
  4. NSB (National Science Board) report: Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century (2005)
  5. NSF report with the Computing Research Association: Cyberinfrastructure for Education and Learning for the Future: A Vision and Research Agenda (2005)
  6. NSF Atkins Report: Revolutionizing Science & Engineering Through Cyberinfrastructure: Report of the NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure (2005)
  7. NSF report: The Role of Academic Libraries in the Digital Data Universe (2006)
  8. National Research Council, National Academies Press report: Learning to Think Spatially (2006)
  9. NSF report: Cyberinfrastructure Vision for 21st Century Discovery (2007)
  10. JISC/NSF Workshop report on Data-Driven Science & Repositories (2007)
  11. DOE report: Visualization and Knowledge Discovery: Report from the DOE/ASCR Workshop on Visual Analysis and Data Exploration at Extreme Scale (2007)
  12. DOE report: Mathematics for Analysis of Petascale Data Workshop Report (2008)
  13. NSTC Interagency Working Group on Digital Data report: Harnessing the Power of Digital Data for Science and Society (2009)
  14. National Academies report: Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age (2009)
  15. NSF report: Data-Enabled Science in the Mathematical and Physical Sciences (2010)

Goals of Data Science Education
  • Primary Goal: to increase students’ understanding of the role that data & information play across all disciplines, and to increase the students’ ability to use the technologies and methodologies associated with data acquisition, management, search, mining, analysis, and visualization.
  • Secondary goals:
    o To increase students’ abilities to use databases for inquiry.
    o To increase students’ abilities to acquire, process, and explore data with the use of a computer.
    o To increase students’ confidence and comfort in using data to address real-world problems (in their chosen scientific discipline, or in any endeavor).
    o To increase students’ awareness of ethical issues pertaining to data and information, including privacy, ownership, proper attribution, misuse and abuse of statistics and graphs, data falsification, and objective reasoning from data.
    o To demonstrate and to share the joy of discovery from data.

CDS Undergraduate Program at GMU:
  • CDS = Computational and Data Sciences
  • Undergraduate B.S. degree program at GMU since 2008
  • The DATA SCIENCE component of the curriculum was developed with the support of a grant from the National Science Foundation:
    o CUPIDS = Curriculum for an Undergraduate Program In Data Sciences
  • Primary Goal: to increase students’ understanding of the role that data plays across the sciences as well as to increase the students’ ability to use the technologies associated with data acquisition, mining, analysis, and visualization.
  • Objectives – students are trained:
    o … to access large distributed data repositories
    o … to conduct meaningful inquiries into the data
    o … to mine, visualize, and analyze the data
    o … to make objective data-driven inferences, discoveries, and decisions
  • Core CDS courses & electives:
    o CDS 101 – Introduction to Computational and Data Sciences
    o CDS 130 – Computing for Scientists
    o CDS 151 – Data Ethics
    o CDS 251 – Introduction to Scientific Programming
    o CDS 301 – Scientific Information and Data Visualization
    o CDS 302 – Scientific Data and Databases
    o CDS 401 – Scientific Data Mining
    o CDS 410 – Modeling and Simulations I
    o CDS 411 – Modeling and Simulations II

Addressing the D2K (Data-to-Knowledge) Challenge
  • Complete end-to-end application of Informatics (BIG DATA Science):
    o Data management, metadata management, data search, information extraction, data mining, knowledge discovery, knowledge representation
    o All steps are necessary – a skilled workforce is needed to take data to information and then take information to knowledge.
    o Applies to any discipline.

Concluding Remarks and Reflections:
  • Now is the time to implement data-oriented methodologies (Informatics / Data Science) into all degree programs – training the next-generation workforce to use data for knowledge discovery and decision support.
  • We have a grand opportunity now to establish dialogue and collaboration across diverse data-intensive research and application communities.
  • Students with a broad interest in computers and sciences will benefit from these types of programs: Computational and Data Sciences.
    o Actual quote from high school senior visiting the university: “I plan to major in biology, but I wish I could do something with computers also.”
  • Students graduating with a traditional discipline-based bachelor’s degree in science generally do not have the background necessary to participate as productive members of modern interdisciplinary scientific research teams, which are becoming increasingly computational- and data-intensive.
  • The motivating theme and goal of science degree programs should be to train the next-generation scientists in the tools and techniques of cyber-enabled science (e-Science) to prepare them to confront the emerging petascale challenges of data-intensive science.
  • It is also good for society in general that all members of the 21st century workforce are trained in computational and data science skills – i.e., computational literacy and data literacy for all citizens!