Big Data and Computer Science Education

Keynote from "Consortium for Computing Sciences in Colleges — Northeastern Region" 4/25/2014

Published in: Technology, Education

    1. Big Data Meets Computer Science. Jim Hendler, Tetherless World Professor of Computer, Web and Data Sciences; Director, Rensselaer Institute for Data Exploration and Applications. @jahendler
    2. The Rensselaer “IDEA” (idea.rpi.edu)
    3. The Rensselaer IDEA … Across Applications (corresponding to Challenges Identified in the Rensselaer Plan 2024)
       • Healthcare Analytics
       • Business Systems
       • Built and Natural Environments
       • Virtual and Augmented Reality
       • Cyber-Resiliency
       • Policy, Ethics and Open Government
       • Materials Informatics
       • Data-driven Physical/Life Sciences
    4. The Rensselaer IDEA: Developing a Comprehensive “Data Science” Research Agenda (P. Fox and J. Hendler, The Science of Data Science, Big Data, 2(2), in press)
    5. The Rensselaer IDEA: Graduate Projects in IDEA
       • IDEA and CCI (HPC): technologies to enable Rensselaer researchers to work with data at larger scales and in new ways
       • Population-scale cognitive computing models for “human intensive” agent-based simulations
       • IDEA and EMPAC (Performing arts center): provide next generation data exploration tools
       • Multi-person data visualization tools for big-data applications
       • IDEA and Watson: New direction in Cognitive Computation
       • How do we go from Question/Answering to Open Web Data exploration?
       • IDEA and CBIS (Ctr for Biotechnology & Interdisciplinary Studies): Data-driven Informatics
       • Can we couple semantics and big data to find new medical uses for already approved drugs?
    6. The Rensselaer IDEA: External Projects and Partnerships. Emergency Room Care Language and Agents Large-scale Healthcare Analytics In Discussion Jumpstart (Proposal underway) Built and Natural Biome data-driven science and engineering Cognitive Computing Collaborative Research Initiative
    7. Campus Data Infrastructure
       • Metadata: Title, Author, Author Email, Licence, Subject, Keyword, Data Type
       • Diagram labels: Dataset; CDF; RPI Object Deposit; RPI Research Network; RPI-ID Request; Share Knowledge; Join Network; Allocate a universal accessible RPI-ID; Register Metadata; Upload Any Data
       • RPI Research Object Registration and Deposit
       • RPI Research Collaboration and Community Network
    8. Requires going Beyond the Database: Discovery, Integrate, Visualize, Explain (Thinking outside the Database box, Strata talk, 2013 - https://www.youtube.com/watch?v=Cob5oltMGMc)
    9. At new scales (and in new ways). Fox and Hendler, Changing the Equation on Scientific Visualization, Science, 2/11 - http://www.sciencemag.org/content/331/6018/705.short
    10. A Whole New World
        • But what about undergraduate education? Where do we train the students who can take on projects needing
        • statistics and analytics
        • informatics
        • data science challenges
        • machine learning
        • unstructured data
        • cognitive computation
        • …
    11. Computer Science Education?
        • Programming is a necessary skill – not sufficient
        • and we mostly teach it wrong…
          – (For my heresies about teaching programming, see “Let’s Help Computer Science Students Crack the Code,” 3/13, http://chronicle.com/article/Lets-Help-Computer-Science/137649/ )
        • The computing environment of today is nothing like the computing environment of the 70s,
          – but the curriculum hasn’t changed much since I was in school
          – but the fundamentals are NOT all the same
          – data-oriented computations involve graphs, memory intensive algorithms, machine learning, …
    12. Deploying these ideas at RPI
        • Innovation in the interdisciplinary Information Technology Program
          – Renamed Information Technology and Web Science, 2011
        • for more on Web Science, see
          – Berners-Lee et al., Creating a Science of the World Wide Web, Science, 2006, https://www.sciencemag.org/content/313/5788/769.summary
          – Hendler et al., Web Science: An Interdisciplinary Approach to Understanding the Web, CACM, 7/2008, http://cacm.acm.org/magazines/2008/7/5366-web-science/fulltext
    13. IT and Web Science
        • First IT academic program in U.S.
        • First web science degree program in U.S.; First undergraduate web science degree anywhere
        • BS in ITWS (20 concentrations) and MS in IT (10 concentrations)
        • PhD in Multi-Disciplinary Sciences
        • http://itws.rpi.edu
          – I was Director 2008-2012
          – Now directed by Peter Fox (whose slides I stole for this section)
    14. Technical Track, Courses, Concentrations
        • Computer Engineering Track
          – Courses: 1) ECSE-2610 Computer Components and Operations 2) ENGR-2350 Embedded Control 3) ECSE-2660 Computer Architecture, Networking and Operating Systems
          – Concentrations: Civil Engineering; Computer Hardware; Computer Networking (hardware focus); Mechanical/Aeronautical Eng.
        • Computer Science Track
          – Courses: 1) CSCI-2200 Foundations of Computer Science 2) CSCI-2300 Introduction to Algorithms 3) CSCI-2500 Computer Organization
          – Concentrations: Cognitive Science; Computer Networking (software focus); Information Security; Machine and Computational Learning
        • Information Systems Track
          – Courses: 1) CSCI-2200 Foundations of Computer Science 2) CSCI-2500 Computer Organization 3) Four credits from the following: CSCI-2220 Programming in Java (2 credits); CSCI-2961 Program in Python (2 credits); CSCI-2300 Introduction to Algorithms (4 credits); ITWS-49XX Web Systems Development II (4 credits)
          – Concentrations: Arts; Communication; Economics; Entrepreneurship; Finance; Management Information Systems; Medicine; Pre-law; Psychology; STS
        • Web Science Track
          – Courses: 1) CSCI-2200 Foundations of Computer Science 2) CSCI-2500 Computer Organization 3) One of the following: CSCI-49XX Web Systems Development II; Web/Data Course approved by ITWS Curriculum Committee
          – Concentrations: Data Science; Science Informatics; Web Technologies
    15. CHANGES TO THE MASTER’S IN INFORMATION TECHNOLOGY PROGRAM
        • In Spring 2013 the MS in IT core curriculum was revised to include Data Analytics.
        • Networking core classes were replaced with Data Analytics core classes: Data Science, Database Mining, X-informatics, and Data Analytics (a new class offered in Spring 2014).
        • The MS in IT program also added two new concentrations: Data Science and Analytics, and Information Dominance.
        • The Information Dominance concentration was developed for a new Navy program that will educate a select group of 5-10 naval officers a year in the skills needed for military cyberspace operations. Two officers started in Fall 2013 and three began in Spring 2014.
    16. MS in IT Required Core Courses (IT Core Area: Course Number, Course Title, Term(s) Offered)
        • Database Systems: CSCI-4380 Database Systems (Fall/Spring)
        • Data Analytics: ITWS-6350 Data Science (Fall)
        • Software Design and Engineering: CSCI-4440 Software Design and Documentation (Fall); ITWS-6400 X-Informatics (Spring)
        • Management of Technology*: ITWS-6300 Business Issues for Engineers and Scientists (Professional Track Only) (Fall/Spring)
        • Human Computer Interaction: COMM-6420 Foundations of HCI Usability (Fall); COMM-696X Human Media Interaction (Spring)
        * For the research track, replace ITWS-6300 Business Issues for Engineers and Scientists with one of the two semester courses ITWS-6980 Master’s Project or ITWS-6990 Master’s Thesis.
        Advanced Core options for students who have previously completed a Core Course (IT Core Area: Course Number, Course Title, Term(s) Offered)
        • Database Systems: CSCI-6390 Database Mining (Fall); ITWS-6350 Data Science (Fall); ITWS-696X Semantic E-Science (Fall)
        • Data Analytics: CSCI-6390 Database Mining (Fall); ITWS-6400 X-Informatics (Spring); ITWS-696X Data Analytics (Spring)
        • Software Design: CSCI-6500 Distributed Computing Over the Internet (Fall); ECSE-6780 Software Engineering II (Fall); ITWS-696X Semantic E-Science (Fall)
        • Management of Technology: MGMT-6080 Networks, Innovation and Value Creation (Fall); MGMT-6140 Information Systems for Management (Spring)
        • Human Computer Interaction: COMM-6620 Information Architecture (Spring); COMM-6770 User-Centered Design (Fall); COMM-696X Interactive Media Design (Summer)
    17. Two New MS in IT Concentrations (Concentration: Course Number, Course Name, Term(s) Offered)
        • Data Science and Analytics
          Data and Information analytics extends analysis (descriptive and predictive models to obtain knowledge from data) by using insight from analyses to recommend action or to guide and communicate decision-making. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with an entire methodology. Key topics include: advanced statistical computing theory, multivariate analysis, and application of computer science courses such as data mining and machine learning and change detection by uncovering unexpected patterns in data.
          – Select two or three of the following courses: ITWS-6350 Data Science (Fall); ITWS-6400 X-Informatics (Spring); ITWS-696X Data Analytics (Spring); ITWS-696X Semantic E-Science (Fall); ITWS-696X Advanced Semantic Technologies* (Spring)
          – If only two of the above were chosen, select one more of the following courses: COMM-6620 Information Architecture (Spring); CSCI-4020 Computer Algorithms (Spring); CSCI-4150 Introduction to AI (Fall); CSCI-6390 Database Mining (Fall); CSCI-4220 or CSCI-6220 Network Programming or Parallel Algorithm Design (Spring); ISYE-4220 Optimization Algorithms and Applications (Fall); ISYE-6180 Knowledge Discovery with Data Mining (Spring); MGMT-696X Technology Foundations for Business Analytics (Fall); MGMT-696X Predictive Analytics Using Social Media (Spring)
        • Information Dominance
          The Information Dominance concentration prepares students for careers designing, building, and managing secure information systems and networks. The concentration includes advanced study in encryption and network security, formal models and policies for access control in databases and application systems, secure coding techniques, and other related information assurance topics. The combination of coursework provides comprehensive coverage of issues and solutions for utilizing high assurance systems for tactical decision-making. It prepares students for careers ranging from secure information systems analyst, to information security engineer, to field information manager and chief information officer. It is also appropriate for all IT professionals who want to enhance their knowledge of how to use pervasive information in situational awareness, operations scenarios, and decision-making.
          – Select two or three of the following courses: ISYE-6180 Knowledge Discovery with Data Mining (Spring); CSCI-6960 Cryptography and Network Security I (Fall); ITWS-4370 Information System Security (Spring); CSCI-4650 Networking Laboratory I (Fall/Spring); MGMT-7760 Risk Management (Fall); ISYE-4310 Ethics of Modeling for Industrial Systems Engineering (Fall)
          – If only two of the above were chosen, select one more of the following courses: CSCI-6390 Database Mining (Fall); CSCI-6968 Cryptography and Network Security II (Spring); CSCI-4660 Networking Laboratory II (Fall/Spring); ECSE-6860 Evaluation Methods for Decision Making (Fall); ISYE-6500 Information and Decision Technologies for Industrial and Service Systems (Fall/Spring); CSCI-496X Computational Analysis of Social Processes (Fall)
    18. Also at RPI
        • Data Science Research Center and Data Science Education Center (dsrc.rpi.edu, 2009)
        • http://www.rpi.edu/about/inside/issue/v4n17/datacenter.html
          – Over 45: research faculty, post-docs, grad students, staff, undergraduates…
        • Data is one of the Rensselaer Plan’s five thrusts
        • Other key faculty
          – Fran Berman (Center for Digital Society and RDA)
          – Bulent Yener (DSRC Director)
          – Peter Fox (ITWS Director)
    19. More RPI Curricula
        • Environmental Science with Geoinformatics concentration
        • Bio, geo, chem, astro, materials - informatics
        • GIS for Science
        • Visualization (new summer program)
        • Multi-disciplinary science program - PhD in Data and Web Science
        • DATUM: Data in Undergraduate Math! (Bennett)
        • Missing – intermediate statistics
        • Graphs – significant potential here – must teach!
    20. 5-6 years in…
        • Science and interdisciplinary from the start!
          – Not a question of: do we train scientists to be technical/data people, or do we train technical people to learn the science
          – It’s a skill/course level approach that is needed
        • We teach methodology and principles over technology
        • Data science must be a skill, and natural like using instruments, writing/using codes
        • Team/collaboration aspects are key
        • Foundations and theory must be taught – for data, as well as programming
    21. Summary
