
Data Science - Poster - Kirk Borne - RDAP12


  1. Data Science: The Revolution in Science Education
Kirk D. Borne <kborne@gmu.edu> (George Mason University -- School of Physics, Astronomy, & Computational Sciences)

SUMMARY:
• Huge quantities of data are being generated, collected, and stored within all scientific, research, business, government, and personal domains (including social networks of all sorts).
• Two significant challenges of this BIG DATA flood are addressed here:
  o Training the next-generation workforce to manage and expertly use these data … “The Rise of the Data Scientist”
  o Discovering the knowledge and surprises that are hidden within the data … transforming our repositories from a data representation to a knowledge representation
• So how do we address these challenges?
• First, we must face it – i.e., the future researchers that we train as well as knowledge workers (those who extract knowledge from data and information) must recognize the need and face the challenge.
• Second, we need algorithms, tools, and methodologies from the discipline of Data Science:
  o … for Big Data management, data mining & knowledge discovery, efficient & effective indexing, data fusion & integration, visual analytics, relevance analysis, dimension reduction, feature selection, semantic mark-up, knowledge mining, knowledge reuse, knowledge extraction, self-description, recommendation systems, and more.

[Figure] Visualize This: a sea of data (sea of CDs). This is the CD Sea in Kilmington, England (600,000 CDs ~ 300 TB). More data is different!

Knowledge Discovery from BIG DATA: many names and many responses …
• Data Mining
• Machine Learning (ML)
• Exploratory Data Analysis (EDA)
• Intelligent Data Analysis (IDA)
• Data Analytics
• Predictive Analytics
• Discovery Informatics
• On-Line Analytical Processing
• Business Intelligence (BI)
• Business Analytics
• Customer Relationship Management
• Target Marketing
• Cross-Selling
• Market Basket Analysis
• Credit Scoring
• Case-Based Reasoning (CBR)
• Connecting the Dots
• Intrusion Detection Systems (IDS)
• Recommendation / Personalization Systems!

More Data is Different – Data Science is Essential
• The message should be clear: “more data is not simply more data, but more data is different.” BIG DATA has big volume, velocity, and variety!
• Numerous federal agencies (and others, of course) have addressed this, including the August 9, 2010 announcement from the White House OSTP:
  o Big Data is a national challenge and a national priority, along with healthcare and national security.
  o See http://www.aip.org/fyi (#87)
• International initiative by the CODATA organization to address this challenge:
  o ADMIRE = Advanced Data Methods and Information technologies for Research and Education
• Many U.S. national study groups in the sciences have issued reports on the urgency of establishing both research and educational programs to face the Big Data challenges.
• Each of these reports has issued a call to action …

Data Science: A National Imperative
  1. National Academies report: Bits of Power: Issues in Global Access to Scientific Data (1997). http://www.nap.edu/catalog.php?record_id=5504
  2. NSF (National Science Foundation) report: Knowledge Lost in Information: Research Directions for Digital Libraries (2003). http://www.sis.pitt.edu/~dlwkshop/report.pdf
  3. NSF report: Cyberinfrastructure for Environmental Research and Education (2003). http://www.ncar.ucar.edu/cyber/cyberreport.pdf
  4. NSB (National Science Board) report: Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century (2005). http://www.nsf.gov/nsb/documents/2005/LLDDC_report.pdf
  5. NSF report with the Computing Research Association: Cyberinfrastructure for Education and Learning for the Future: A Vision and Research Agenda (2005). http://www.cra.org/reports/cyberinfrastructure.pdf
  6. NSF Atkins Report: Revolutionizing Science & Engineering Through Cyberinfrastructure: Report of the NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure (2005). http://www.nsf.gov/od/oci/reports/atkins.pdf
  7. NSF report: The Role of Academic Libraries in the Digital Data Universe (2006). http://www.arl.org/bm~doc/digdatarpt.pdf
  8. National Research Council, National Academies Press report: Learning to Think Spatially (2006). http://www.nap.edu/catalog.php?record_id=11019
  9. NSF report: Cyberinfrastructure Vision for 21st Century Discovery (2007). http://www.nsf.gov/od/oci/ci_v5.pdf
  10. JISC/NSF Workshop report on Data-Driven Science & Repositories (2007). http://www.sis.pitt.edu/~repwkshop/NSF-JISC-report.pdf
  11. DOE report: Visualization and Knowledge Discovery: Report from the DOE/ASCR Workshop on Visual Analysis and Data Exploration at Extreme Scale (2007). http://www.sc.doe.gov/ascr/ProgramDocuments/Docs/DOE-Visualization-Report-2007.pdf
  12. DOE report: Mathematics for Analysis of Petascale Data Workshop Report (2008). http://www.sc.doe.gov/ascr/ProgramDocuments/Docs/PetascaleDataWorkshopReport.pdf
  13. NSTC Interagency Working Group on Digital Data report: Harnessing the Power of Digital Data for Science and Society (2009). http://www.nitrd.gov/about/Harnessing_Power_Web.pdf
  14. National Academies report: Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age (2009). http://www.nap.edu/catalog.php?record_id=12615
  15. NSF report: Data-Enabled Science in the Mathematical and Physical Sciences (2010). http://www.cra.org/ccc/docs/reports/DES-report_final.pdf

Data Science Education: Two Perspectives
• Informatics in Education – working with data in all learning settings
  o Informatics (Data Science) enables transparent reuse and analysis of data in inquiry-based classroom learning.
  o Learning is enhanced when students work with real data and information (especially online data) that are related to the topic (any topic) being studied.
  o http://serc.carleton.edu/usingdata/ (“Using Data in the Classroom”)
• An Education in Informatics – students are specifically trained:
  o … to access large distributed data repositories
  o … to conduct meaningful inquiries into the data
  o … to mine, visualize, and analyze the data
  o … to make objective data-driven inferences, discoveries, and decisions
• Data Science programs now exist at several universities (GMU, Caltech, RPI, Michigan, Cornell, U. Illinois, and others …)
  o http://spacs.gmu.edu/ (Computational & Data Sciences @ GMU)

Goals of Data Science Education
• Primary Goal: to increase students’ understanding of the role that data & information play across all disciplines, and to increase the students’ ability to use the technologies and methodologies associated with data acquisition, management, search, mining, analysis, and visualization.
• Secondary goals:
  o To increase students’ abilities to use databases for inquiry.
  o To increase students’ abilities to acquire, process, and explore data with the use of a computer.
  o To increase students’ confidence and comfort in using data to address real-world problems (in their chosen scientific discipline, or in any endeavor).
  o To increase students’ awareness of ethical issues pertaining to data and information, including privacy, ownership, proper attribution, misuse and abuse of statistics and graphs, data falsification, and objective reasoning from data.
  o To demonstrate and to share the joy of discovery from data.

CDS Undergraduate Program at GMU: http://spacs.gmu.edu/
• CDS = Computational and Data Sciences
• Undergraduate B.S. degree program at GMU since 2008
• The DATA SCIENCE component of the curriculum was developed with the support of a grant from the National Science Foundation:
  o CUPIDS = Curriculum for an Undergraduate Program In Data Sciences
• Primary Goal: to increase students’ understanding of the role that data plays across the sciences as well as to increase the students’ ability to use the technologies associated with data acquisition, mining, analysis, and visualization.
• Objectives – students are trained:
  o … to access large distributed data repositories
  o … to conduct meaningful inquiries into the data
  o … to mine, visualize, and analyze the data
  o … to make objective data-driven inferences, discoveries, and decisions
• Core CDS courses & electives:
  o CDS 101 – Introduction to Computational and Data Sciences
  o CDS 130 – Computing for Scientists
  o CDS 151 – Data Ethics
  o CDS 251 – Introduction to Scientific Programming
  o CDS 301 – Scientific Information and Data Visualization
  o CDS 302 – Scientific Data and Databases
  o CDS 401 – Scientific Data Mining
  o CDS 410 – Modeling and Simulations I
  o CDS 411 – Modeling and Simulations II

Addressing the D2K (Data-to-Knowledge) Challenge
• Complete end-to-end application of Informatics (BIG DATA Science):
  o Data management, metadata management, data search, information extraction, data mining, knowledge discovery, knowledge representation
  o All steps are necessary – a skilled workforce is needed to take data to information and then take information to knowledge.
  o Applies to any discipline.

Concluding Remarks and Reflections:
• Now is the time to implement data-oriented methodologies (Informatics / Data Science) into all degree programs – training the next-generation workforce to use data for knowledge discovery and decision support.
• We have a grand opportunity now to establish dialogue and collaboration across diverse data-intensive research and application communities.
• Students with a broad interest in computers and sciences will benefit from these types of programs: Computational and Data Sciences.
  o Actual quote from high school senior visiting the university: “I plan to major in biology, but I wish I could do something with computers also.”
• Students graduating with a traditional discipline-based bachelor’s degree in science generally do not have the background necessary to participate as productive members of modern interdisciplinary scientific research teams, which are becoming increasingly computational- and data-intensive.
• The motivating theme and goal of science degree programs should be to train the next-generation scientists in the tools and techniques of cyber-enabled science (e-Science) to prepare them to confront the emerging petascale challenges of data-intensive science.
• It is also good for society in general that all members of the 21st century workforce are trained in computational and data science skills – i.e., computational literacy and data literacy for all citizens!
