SlideShare a Scribd company logo
Lessons from Data Science Program at
Indiana University: Curriculum, Students
and Online Delivery
NSF/TCPP Workshop on Parallel and Distributed Computing
Education
Edupar at IPDPS 2015
May 25, 2015
5/25/2015 1
Geoffrey Fox
gcf@indiana.edu http://www.infomall.org
School of Informatics and Computing
Digital Science Center
Indiana University Bloomington
School of Informatics and
Computing
5/25/2015 2
Background of the School
• The School of Informatics was established in 2000 as first of
its kind in the United States.
• Computer Science was established in 1971 and became part
of the school in 2005.
• Library and Information Science
was established in 1951 and
became part of the school
in 2013.
• Now named the School of
Informatics and Computing.
• Data Science added January 2014
• Engineering to be added Fall 2016
5/25/2015 3
What Is Our School About?
The broad range of computing and information technology:
science, a broad range of applications and human and
societal implications.
United by a focus on
information and technology,
our extensive programs
include:
• Computer Science
• Informatics
• Information Science
• Library Science
• Data Science (virtual - starting)
• Engineering (real - expected)
5/25/2015 4
Size of School
(2014-2015)
• Faculty ~100
• Students
Undergraduate 1,472
Master’s 649
Ph.D. 243
• Female Undergraduates 21%
(68% since 2007)
• Female Graduate Students 28%
(4% since 2007)
Undergraduates mainly Informatics (75%);
Graduates mainly Computer Science
5/25/2015 5
Data Science
5/25/2015 6
SOIC Data Science Program
• Cross Disciplinary Faculty – 31 in School of Informatics and Computing, a
few in statistics and expanding across campus
• Affordable online and traditional residential curricula or mix thereof
• Masters, Certificate, PhD Minor in place; Full PhD being studied
• http://www.soic.indiana.edu/graduate/degrees/data-science/index.html
• Note data science mentioned in faculty advertisements but unlike other
parts of School, there are no dedicated faculty
75/25/2015
It is around 7% of School
looking at fraction of
enrolled students summing
graduate and
undergraduate levels
IU Data Science Program and Degrees
• Program managed by cross disciplinary Faculty in Data Science.
Currently Statistics and Informatics and Computing School but
plans to expand scope to full campus
• A purely online 4-course Certificate in Data Science has been
running since January 2014
– Some switched to Online Masters
– Most students are professionals taking courses in “free time”
• A campus wide Ph.D. Minor in Data Science has been approved.
• Masters in Data Science (10-course) approved October 2014
• Exploring PhD in Data Science
• Courses labelled as “Decision-maker” and “Technical” paths
where McKinsey says an order of magnitude more (1.5 million by
2018) unmet job openings in Decision-maker track
85/25/2015
McKinsey Institute on Big Data Jobs
• There will be a shortage of talent necessary for organizations to take
advantage of big data. By 2018, the United States alone could face a
shortage of 140,000 to 190,000 people with deep analytical skills as
well as 1.5 million managers and analysts with the know-how to use
the analysis of big data to make effective decisions.
• IU Data Science Decision Maker Path aimed at 1.5 million jobs.
Technical Path covers the 140,000 to 190,000
http://www.mckinsey.com/mgi/publications/big_data/index.asp
95/25/2015
Job Trends
Big Data about an
order of magnitude
larger than data
science
19 May 2015 Jobs
3475 for “data science“
2277 for “data scientist“
19488 for “big data”
http://www.indeed.com/jobtrends?
q=%22Data+science%22%2C+%
22data+scientist%22%2C+%22bi
g+data%22%2C&l=
5/25/2015 10
What is Data Science?
• The next slide gives a definition arrived by a NIST study group fall 2013.
• The previous slide says there are several jobs but that’s not enough! Is this a
field – what is it and what is its core?
– The emergence of the 4th or data driven paradigm of science illustrates
significance - http://research.microsoft.com/en-
us/collaboration/fourthparadigm/
– Discovery is guided by data rather than by a model
– The End of (traditional) science http://www.wired.com/wired/issue/16-07
is famous here
• Another example is recommender systems in Netflix, e-commerce etc.
– Here data (user ratings of movies or products) allows an empirical
prediction of what users like
– Here we define points in spaces (of users or products), cluster them etc.
– all conclusions coming from data
5/25/2015 11
Data Science Definition from NIST Public Working Group
• Data Science is the extraction of actionable knowledge directly from data
through a process of discovery, hypothesis, and analytical hypothesis
analysis.
• A Data Scientist is a
practitioner who has sufficient
knowledge of the overlapping
regimes of expertise in
business needs, domain
knowledge, analytical skills
and programming expertise to
manage the end-to-end
scientific method process
through each stage in the big
data lifecycle.
See Big Data Definitions in
http://bigdatawg.nist.gov/V1_output_docs.php
5/25/2015 12
Some Existing Online Data Science
Activities
• Indiana University Masters is “blended”: online and/or
residential; other universities offer residential
• We discount online classes so that total cost of 10 courses is
~$11,500 (in state price)
30$35,490
5/25/2015 13
Computational Science
• Computational science has important similarities to data
science but with a simulation rather than data analysis flavor.
• Although a great deal of effort went into with meetings and
several academic curricula/programs, it didn’t take off
– In my experience not a lot of students were interested and
– The academic job opportunities were not great
• Data science has more jobs; maybe it will do better?
• Can we usefully link these concepts?
• PS both use parallel computing!
• In days gone by, I did research in particle physics
phenomenology which in retrospect was an early form of data
science using models extensively
5/25/2015 14
Data Science Curriculum
5/25/2015 15
IU Data Science Program: Masters
• Masters Fully approved by University and State October 14 2014 and
started January 2015
• Blended online and residential (any combination)
– Online offered at in-state rates (~$1100 per course)
– Hybrid (online for a year and then residential) surprisingly not
popular
• Informatics, Computer Science, Information and Library Science in
School of Informatics and Computing and the Department of
Statistics, College of Arts and Science, IUB
• 30 credits (10 conventional courses)
• Basic (general) Masters degree plus tracks
– Currently only track is “Computational and Analytic Data Science”
– Other tracks expected such as Biomedical Data Science
165/25/2015
Data Science Enrollment
• Fall 2015, about 240 new applicants to program; cap enrollment
• Certificate in Data Science (started January 2014)
– Current 29
– 16 applicants (12 admits, 9 accepts)
• (reduced from last year as prefer Masters)
– Plus 40 students in special executive education certificate through Kelley
Business School – currently just one class
• Online Masters in Data Science (started January 2015)
– Current 54
– 42 applicants; just 2 “hybrid” (28 admits, 19 accepts)
– Transfers from certificate have head start
• Residential Masters in Data Science (started January 2015)
– Current 3
– 187 applicants (101 admits, 55 accepts)
• Expected Data Science total enrollment Fall 2015 150-170
5/25/2015 17
Advertising Campaign
• Comparison of “Adwords” results for Three Masters Programs
• Security Informatics
• Data Science
• Information and Library Science
• CPC Cost per Click
• CTR Click Through Rate
5/25/2015 18
Program Adwords
timeframe
Adwords
Cost
# clicks CTR CPC #
applications
Security 12/1-4/30 $13K 2,577 0.20% $5.10 26
Data
Science
10/31-3/30 $17K 38,544 1.28% $0.43 267
ILS 9/1-4/30 $18K 4,382 0.11% $4.08 199
19
Indiana
University
Data
Science Site
5/25/2015
3 Types of Students
• Professionals wanting skills to improve job or
“required” by employee to keep up with technology
advances
• Traditional sources of IT Masters
• Students in non IT fields wanting to do “domain
specific data science”
5/25/2015 20
What do students want?
• Degree with some relevant curriculum
– Data Science and Computer Science distinct BUT
• Important goal often “Optional Practical Training” OPT
allowing graduated students visa to work for US
companies
– Must have spent at least a year in US in residential
program
• Residential CS Masters (at IU) 95% foreign students
• Online program students quite varied but mostly USA
professionals aiming to improve/switch job
215/25/2015
IU and Competition
• With Computer Science, Informatics, ILS, Statistics, IU has particularly broad
unrivalled technology base
– Other universities have more domain data science than IU
• Existing Masters in US in table. Many more certificates and related degrees
(such as business analytics)
School Program Campus Online Degree
Columbia University Data Science Yes No MS 30 cr
Illinois Institute of
Technology
Data Science Yes No MS 33 cr
New York University Data Science Yes No MS 36 cr
University of California
Berkeley School of
Information
Master of Information
and Data Science
Yes Yes M.I.D.S
University of Southern
California
Computer Science with
Data Science
Yes No MS 27 cr
5/25/2015 22
Data Science Curriculum
Faculty in Data Science is “virtual department”
4 course Certificate: purely online, started January 2014
10 course Masters: online/residential, started January 2015
5/25/2015 23
Basic Masters Course Requirements
• One course from two of three technology areas
– I. Data analysis and statistics
– II. Data lifecycle (includes “handling of research data”)
– III. Data management and infrastructure
• One course from (big data) application course cluster
• Other courses chosen from list maintained by Data Science Program
curriculum committee (or outside this with permission of advisor/ Curriculum
Committee)
• Capstone project optional
• All students assigned an advisor who approves course choice.
• Due to variation in preparation label courses
– Decision Maker
– Technical
• Corresponding to two categories in McKinsey report – note Decision Maker
had an order of magnitude more job openings expected
5/25/2015 24
Computational and Analytic Data Science track
• For this track, data science courses have been reorganized into categories reflecting
the topics important for students wanting to prepare for computational and analytic
data science careers for which a strong computer science background is necessary.
Consequently, students in this track must complete additional requirements,
• 1) A student has to take at least 3 courses (9 credits) from Category 1 Core
Courses. Among them, B503 Analysis of Algorithms is required and the student
should take at least 2 courses from the following 3:
– B561 Advanced Database Concepts,
– [STAT] S520 Introduction to Statistics OR (New Course) Probabilistic Reasoning
– B555 Machine Learning OR I590 Applied Machine Learning
• 2) A student must take at least 2 courses from Category 2 Data Systems, AND, at
least 2 courses from Category 3 Data Analysis. Courses taken in Category 1 can
be double counted if they are also listed in Category 2 or Category 3.
• 3) A student must take at least 3 courses from Category 2 Data Systems, OR, at
least 3 courses from Category 3 Data Analysis. Again, courses taken in Category 1
can be double counted if they are also listed in Category 2 or Category 3. One of
these courses must be an application domain course
5/25/2015 25
Admissions Criterion
• Decided by Data Science Program Curriculum
Committee
• Need some computer programming experience (either
through coursework or experience), and a
mathematical background and knowledge of statistics
will be useful
• Tracks can impose stronger requirements
• 3.0 Undergraduate GPA
• A 500 word personal statement
• GRE scores are required for all applicants.
• 3 letters of recommendation
5/25/2015 26
Geoffrey Fox’s
Online Data Science Classes I
Same class offered as
• MOOC
• Residential class
• Online class for credit
5/25/2015 27
Some Online Data Science Classes
• BDAA: Big Data Applications & Analytics
– Used to be called X-Informatics
– ~40 hours of video mainly discussing applications (The X in
X-Informatics or X-Analytics) in context of big data and
clouds https://bigdatacourse.appspot.com/course
• BDOSSP: Big Data Open Source Software and Projects
http://bigdataopensourceprojects.soic.indiana.edu/
– ~27 Hours of video discussing HPC-ABDS and use on
FutureSystems for Big Data software
• Both divided into sections (coherent topics), units (~lectures)
and lessons (5-20 minutes) in which student is meant to stay
awake
285/25/2015
• 1 Unit: Organizational Introduction
• 1 Unit: Motivation: Big Data and the Cloud; Centerpieces of the Future Economy
• 3 Units: Pedagogical Introduction: What is Big Data, Data Analytics and X-Informatics
• SideMOOC: Python for Big Data Applications and Analytics: NumPy, SciPy, MatPlotlib
• SideMOOC: Using FutureSystems for Java and Python
• 4 Units: X-Informatics with X= LHC Analysis and Discovery of Higgs particle
– Integrated Technology: Explore Events; histograms and models; basic statistics (Python and some in Java)
• 3 Units on a Big Data Use Cases Survey
• SideMOOC: Using Plotviz Software for Displaying Point Distributions in 3D
• 3 Units: X-Informatics with X= e-Commerce and Lifestyle
• Technology (Python or Java): Recommender Systems - K-Nearest Neighbors
• Technology: Clustering and heuristic methods
• 1 Unit: Parallel Computing Overview and familiar examples
• 4 Units: Cloud Computing Technology for Big Data Applications & Analytics
• 2 Units: X-Informatics with X = Web Search and Text Mining and their technologies
• Technology for Big Data Applications & Analytics : Kmeans (Python/Java)
• Technology for Big Data Applications & Analytics: MapReduce
• Technology for Big Data Applications & Analytics : Kmeans and MapReduce Parallelism (Python/Java)
• Technology for Big Data Applications & Analytics : PageRank (Python/Java)
• 3 Units: X-Informatics with X = Sports
• 1 Unit: X-Informatics with X = Health
• 1 Unit: X-Informatics with X = Internet of Things & Sensors
• 1 Unit: X-Informatics with X = Radar for Remote Sensing
Big Data Applications & Analytics Topics
Red = Software
295/25/2015
http://x-informatics.appspot.com/course
Example
Google
Course
Builder
MOOC
4 levels
Course
Sections (15)
Units(37)
Lessons(~250)
Video 38.5 hrs
Units are
roughly
traditional
lecture
Lessons are
~15 minute
segments
5/25/2015 30
https://bigdatacoursespring2015.appspot.com/course
http://x-informatics.appspot.com/course
Example
Google
Course
Builder
MOOC
The Physics
Section
expands to 4
units and 2
Homeworks
Unit 9 expands
to 5 lessons
Lessons played
on YouTube
“talking head
video +
PowerPoint”
5/25/2015 31
32
5/25/2015
Course Home Page showing Syllabus
33
Note that we have a course – section – unit – lesson hierarchy (supported by Mooc
Builder) with abstracts available at each level of hierarchy. The home page has overview
information (shown earlier) plus a list of all sections and a syllabus shown above.
5/25/2015
A typical lesson (the first in unit 21) Note links to all
37 units across the top
345/25/2015
MOOC Version
• Offered at https://bigdatacourse.appspot.com/preview
• Open to everybody
• Uses no University resources
• Updated December 2014
• One of two SoIC MOOCs named one of “7 great MOOCs for techies”
by ComputerWorld http://www.computerworld.com/article/2849569/7-
great-moocs-for-techies-all-free-starting-soon.html November 2014
• May 14 2015 3562 enrolled – small by MOOC standards
• Students from 108 countries
– 1020 USA
– 916 India
– 180 Brazil
– ~130 France, Spain, UK
5/25/2015 35
Associate's degree 73
Bachelor's degree 1078
Doctorate 243
High School and equivalent 176
Master's degree 1257
Other 42
(blank) 693
Student Starting Level
Age Distribution: Average 34
5/25/2015 36
Homeworks
• These are online within Google Course Builder for the MOOC
with peer assessment. In the 3 credit offerings, all graded
material (homework and projects) is conducted traditionally
through Indiana University Oncourse (superceded by Canvas).
• Oncourse was additionally used to assign which videos should
be watched each week and the discussion forum topics
described later (these were just “special homeworks in
Oncourse).
• In the non-residential data science certificate class, the
students were on a variable schedule (as typically working full
time and many distractions; one for example had faculty
position interviews) and considerable latitude was given for
video and homework completion dates.
375/25/2015
Discussion Forums
• Each offering had a separate set of electronic discussion forums which
were used for class announcements (replicating Oncourse) and for
assigned discussions.
• Following slide illustrates an assigned discussion on the implications of the
success of e-commerce for the future of “real malls”. The students were
given “participation credit” for posting here and these were very well
received.
• Later offerings made greater use of these forums. Based on student
feedback, we encouraged even greater participation through students both
posting and commenting.
• Note I personally do not like specialized (walled garden) forums and the
class forums were set up using standard Google Community Groups with a
familiar elegant interface. These community groups also link well to Google
Hangouts described later.
• As well as interesting topics, all class announcements were made in the
“Instructor” forum repeating information posted at Oncourse. Of course no
sensitive material such as returned homework was posted on Google site.
385/25/2015
The community group for one of classes and
one forum (“No more malls”)
5/25/2015 39
Hangouts and Adobe Connect
• For the purely online offering, we supplemented the asynchronous
material described above with real-time interactive Google Hangout
video sessions.
• Given varied time zones and weekday demands on students, these
were held at 1pm Eastern on Sundays.
• Google Hangouts are conveniently scheduled from community page
and offer interactive video and chat capabilities that were well
received. Other technologies such as Skype are also possible.
• Hangouts are restricted to 10-15 people which was sufficient for this
section but in general insufficient. Not all of 12 students attended a
given class.
• The Hangouts focused on general data science issues and the
mechanics of the class.
• Augment Hangout by non-video Adobe Connect session
405/25/2015
Figure 6: Community Events for Online
Data Science Certificate Course
415/25/2015
In class Sessions
• The residential sections had regular in class sessions; one 90
minute session per class each week. This was originally two
sessions but reduced to one partly because online videos turned
these into “flipped classes” with less need for in class time and
partly to accommodate more students (77 total graduate and
undergraduate) in two groups with separate classes.
• These classes were devoted to discussions of course material,
homework and largely the discussion forum topics. This part of
course was not greatly liked by the students – especially the
undergraduate section which voted in favor of a model with only
the online components (including the discussion forums which
they recommended expanding).
• In particular the 9.30am start time was viewed as too early and
intrinsically unattractive.
425/25/2015
Geoffrey Fox’s
Online Data Science Classes II
5/25/2015 43
5/25/2015 44
Kaleidoscope of (Apache) Big Data Stack (ABDS) and HPC Technologies
Cross-
Cutting
Functions
1) Message
and Data
Protocols:
Avro, Thrift,
Protobuf
2) Distributed
Coordination:
Google
Chubby,
Zookeeper,
Giraffe,
JGroups
3) Security &
Privacy:
InCommon,
Eduroam
OpenStack
Keystone,
LDAP, Sentry,
Sqrrl, OpenID,
SAML OAuth
4)
Monitoring:
Ambari,
Ganglia,
Nagios, Inca
17) Workflow-Orchestration: ODE, ActiveBPEL, Airavata, Pegasus, Kepler, Swift, Taverna, Triana, Trident, BioKepler, Galaxy, IPython, Dryad,
Naiad, Oozie, Tez, Google FlumeJava, Crunch, Cascading, Scalding, e-Science Central, Azure Data Factory, Google Cloud Dataflow, NiFi (NSA),
Jitterbit, Talend, Pentaho, Apatar, Docker Compose
16) Application and Analytics: Mahout , MLlib , MLbase, DataFu, R, pbdR, Bioconductor, ImageJ, OpenCV, Scalapack, PetSc, Azure Machine
Learning, Google Prediction API & Translation API, mlpy, scikit-learn, PyBrain, CompLearn, DAAL(Intel), Caffe, Torch, Theano, DL4j, H2O, IBM
Watson, Oracle PGX, GraphLab, GraphX, IBM System G, GraphBuilder(Intel), TinkerPop, Google Fusion Tables, CINET, NWB, Elasticsearch, Kibana,
Logstash, Graylog, Splunk, Tableau, D3.js, three.js, Potree, DC.js
15B) Application Hosting Frameworks: Google App Engine, AppScale, Red Hat OpenShift, Heroku, Aerobatic, AWS Elastic Beanstalk, Azure, Cloud
Foundry, Pivotal, IBM BlueMix, Ninefold, Jelastic, Stackato, appfog, CloudBees, Engine Yard, CloudControl, dotCloud, Dokku, OSGi, HUBzero,
OODT, Agave, Atmosphere
15A) High level Programming: Kite, Hive, HCatalog, Tajo, Shark, Phoenix, Impala, MRQL, SAP HANA, HadoopDB, PolyBase, Pivotal HD/Hawq,
Presto, Google Dremel, Google BigQuery, Amazon Redshift, Drill, Kyoto Cabinet, Pig, Sawzall, Google Cloud DataFlow, Summingbird
14B) Streams: Storm, S4, Samza, Granules, Google MillWheel, Amazon Kinesis, LinkedIn Databus, Facebook Puma/Ptail/Scribe/ODS, Azure Stream
Analytics, Floe
14A) Basic Programming model and runtime, SPMD, MapReduce: Hadoop, Spark, Twister, MR-MPI, Stratosphere (Apache Flink), Reef, Hama,
Giraph, Pregel, Pegasus, Ligra, GraphChi, Galois, Medusa-GPU, MapGraph, Totem
13) Inter process communication Collectives, point-to-point, publish-subscribe: MPI, Harp, Netty, ZeroMQ, ActiveMQ, RabbitMQ,
NaradaBrokering, QPid, Kafka, Kestrel, JMS, AMQP, Stomp, MQTT, Marionette Collective, Public Cloud: Amazon SNS, Lambda, Google Pub Sub,
Azure Queues, Event Hubs
12) In-memory databases/caches: Gora (general object from NoSQL), Memcached, Redis, LMDB (key value), Hazelcast, Ehcache, Infinispan
12) Object-relational mapping: Hibernate, OpenJPA, EclipseLink, DataNucleus, ODBC/JDBC
12) Extraction Tools: UIMA, Tika
11C) SQL(NewSQL): Oracle, DB2, SQL Server, SQLite, MySQL, PostgreSQL, CUBRID, Galera Cluster, SciDB, Rasdaman, Apache Derby, Pivotal
Greenplum, Google Cloud SQL, Azure SQL, Amazon RDS, Google F1, IBM dashDB, N1QL, BlinkDB
11B) NoSQL: Lucene, Solr, Solandra, Voldemort, Riak, Berkeley DB, Kyoto/Tokyo Cabinet, Tycoon, Tyrant, MongoDB, Espresso, CouchDB,
Couchbase, IBM Cloudant, Pivotal Gemfire, HBase, Google Bigtable, LevelDB, Megastore and Spanner, Accumulo, Cassandra, RYA, Sqrrl, Neo4J,
Yarcdata, AllegroGraph, Blazegraph, Facebook Tao, Titan:db, Jena, Sesame
Public Cloud: Azure Table, Amazon Dynamo, Google DataStore
11A) File management: iRODS, NetCDF, CDF, HDF, OPeNDAP, FITS, RCFile, ORC, Parquet
10) Data Transport: BitTorrent, HTTP, FTP, SSH, Globus Online (GridFTP), Flume, Sqoop, Pivotal GPLOAD/GPFDIST
9) Cluster Resource Management: Mesos, Yarn, Helix, Llama, Google Omega, Facebook Corona, Celery, HTCondor, SGE, OpenPBS, Moab, Slurm,
Torque, Globus Tools, Pilot Jobs
8) File systems: HDFS, Swift, Haystack, f4, Cinder, Ceph, FUSE, Gluster, Lustre, GPFS, GFFS
Public Cloud: Amazon S3, Azure Blob, Google Cloud Storage
7) Interoperability: Libvirt, Libcloud, JClouds, TOSCA, OCCI, CDMI, Whirr, Saga, Genesis
6) DevOps: Docker (Machine, Swarm), Puppet, Chef, Ansible, SaltStack, Boto, Cobbler, Xcat, Razor, CloudMesh, Juju, Foreman, OpenStack Heat,
Sahara, Rocks, Cisco Intelligent Automation for Cloud, Ubuntu MaaS, Facebook Tupperware, AWS OpsWorks, OpenStack Ironic, Google Kubernetes,
Buildstep, Gitreceive, OpenTOSCA, Winery, CloudML, Blueprints, Terraform, DevOpSlang, Any2Api
5) IaaS Management from HPC to hypervisors: Xen, KVM, Hyper-V, VirtualBox, OpenVZ, LXC, Linux-Vserver, OpenStack, OpenNebula,
Eucalyptus, Nimbus, CloudStack, CoreOS, rkt, VMware ESXi, vSphere and vCloud, Amazon, Azure, Google and other public Clouds
Networking: Google Cloud DNS, Amazon Route 53
21 layers
Over 350
Software
Packages
May 15
2015
Big Data & Open Source Software Projects Overview
• This course studies DevOps and software used in many commercial
activities to study Big Data.
• The backdrop for course is the ~350 software subsystems HPC-ABDS
(High Performance Computing enhanced - Apache Big Data Stack)
illustrated at http://hpc-abds.org/kaleidoscope/
• The cloud computing architecture underlying ABDS and contrast of this with
HPC.
• The main activity of the course is building a significant project using multiple
HPC-ABDS subsystems combined with user code and data.
• Projects will be suggested or students can chose their own
• http://cloudmesh.github.io/introduction_to_cloud_computing/class/lesson/projects.html
• For more information,
see: http://bigdataopensourceprojects.soic.indiana.edu/ and
• http://cloudmesh.github.io/introduction_to_cloud_computing/class/bdossp_sp15/week_plan.html
• 25 Hours of Video
• Probably too much for semester class
5/25/2015 45
5/25/2015 46
5/25/2015 47
5/25/2015 48
Unexpected Lessons
• We learnt some things from current offering of BDOSSP class
– 40 online students from around the world
• The hyperlinking of material caused students NOT to go through material
systematically
– Suggest go to structured hierarchy as in BDAA Course
– Followed from use of Canvas as mundane LMS plus multiple web
resources (Microsoft Office Mix and our computer support pages)
• Students did not use email and discussion groups in Canvas; we switched to
emails and list serves to the their main (not IU) email
• Very erratic progress due to different time zones and interruption of full time
job for each student
– Difficult to have communal “help” sessions and to give interactive support
at time student wanted
• OpenStack fragile!
5/25/2015 49
MOOC’s
5/25/2015 50
Background on MOOC’s
• MOOC’s are a “disruptive force” in the educational
environment
– Coursera, Udacity, Khan Academy and many others
• MOOC’s have courses and technologies
• Google Course Builder and OpenEdX are open source
MOOC technologies
• Blackboard and others are learning management systems
with (some) MOOC support
• Coursera Udacity etc. have internal proprietary MOOC
software
• This software is LMS++
• LMS= Learning Management system
515/25/2015
MOOC Style Implementations
• Courses from commercial sources, universities and
partnerships
• Courses with 100,000 students (free)
• Georgia Tech a leader in rigorous academic
curriculum – MOOC style Masters in Computer
Science (pay tuition, get regular GT degree)
• Interesting way to package tutorial material for
computers and software e.g.
– E.g. Course online programming laboratories supported by
MOOC modules on how to use system
525/25/2015
535/25/2015
MOOCs in SC community
• Activities like CI-Tutor and HPC University are community
activities that have collected much re-usable education
material
• MOOC’s naturally support re-use at lesson or higher level
– e.g. include MPI on XSEDE MOOC as part of many parallel
programming classes
• Need to develop agreed ways to use backend servers (HPC or
Cloud) to support MOOC laboratories
– Students should be able to take MOOC classes from tablet or phone
• Parts of MOOC’s (Units or Sections) can be used as modules
to enhance classes in outreach activities
545/25/2015
Cloud MOOC Repository
55
http://iucloudsummerschool.appspot.com/preview
5/25/2015
Online Education
5/25/2015 56
Potpourri of Online Technologies
• Canvas (Indiana University Default): Best for interface with IU grading and
records
• Google Course Builder: Best for management and integration of
components
• Ad hoc web pages: alternative easy to build integration
• Microsoft Mix: Simplest faculty preparation interface
• Adobe Presenter/Camtasia: More powerful video preparation that support
subtitles but not clearly needed
• Google Community: Good social interaction support
• YouTube: Best user interface for videos (without Mix PowerPoint support)
• Hangout: Best for instructor-students online interactions (one instructor to 9
students with live feed). Hangout on air mixes live and streaming (30 second
delay from archived YouTube) and more participants
• OpenEdX at one time future of Google Course Builder and getting easier to
use but still significant effort
575/25/2015
Four Online Platforms I
5/25/2015 58
CourseBuilder OpenEdx IU Canvas OfficeMix Plugin for
Powerpoint
OpenSource Yes Yes No N/A
Microsoft Integration
(Office 365,
Onedrive, Azure
cloud)
No Yes. Predicted to be
included in the
upcoming release.
No N/A
Analytics Some analytics
included but not
comprehensive. Still
needs more
development.
No analytics included
but there is a version
0 alpha release
Analytics API
available for use.
External apps can be
developed
Very basic Very basic but more
useful than Canvas
Peer reviews Yes Yes Yes No
Four Online Platforms II
5/25/2015 59
CourseBuilder OpenEdx IU Canvas OfficeMix
Plugin for
Powerpoint
LTI Compliance
(Learning
Technologies
Integration)
Yes. CB as a
LTI provider or
consumer.
Yes. Yes.
Functionality
might be limited
by IU.
N/A
Ease of use
and
customization
scale
5/10 for students,
faculty, developers
7/10 – ease of use
by students, faculty
3/10 – customization
by developer
N/A PowerPoint
Slide labelled
Videos could
be an
advantage
Ease of
Deployment
10/10 1/10 N/A N/A
Cost Almost none; Can
rise with increase
usage of cloud
transactions but
usually a very low
cost operation
Very expensive
to deploy and
maintain the
servers; Need a
dedicated staff
for administering
servers;
IU provided N/A
1 – not easy
10 – very easy
Four Online Platforms III
5/25/2015 60
CourseBuilder OpenEdx IU Canvas OfficeMix
Plugin for
Powerpoint
Unique
Features and
Functionality
Skill maps
BigQuery for
Analytics
Good UI for
course
administration;
Integrated
forums,
grading, content
area, and much
more.
Export/Import
Grades
Enables faculty
to record their
own videos and
insert
interactive
content such as
quizzes,
programming
test-bed etc.
Common
features
Certificates
Generation
Supported;
Quizzes;
Assessments;
Peer Reviews;
Autograding
Certificate
Generation
Supported;
Quizzes;
Assessments;
Autograding;
Quizzes;
Assessments;
Peer Review
N/A
Summary
5/25/2015 61
Lessons / Insights
• Data Science is a very healthy area
• At IU, I expect to grow in interest although set up as a
program has strange side effects
• Not clear if Online education is taking off but may be
distorted by US Company hiring practices
• I teach all my classes – residential or online -- with
online lectures
• All of this straightforward but hard work
• Current open source and proprietary MOOC software
not very satisfactory; “easy” to do better
• No reason to differentiate MOOC and general LMS
5/25/2015 62
Details of Masters Degree
Computational and Analytic Data
Science track
635/25/2015
Computational and Analytic Data Science track
• Category 1: Core Courses
• CSCI B503 Analysis of Algorithms
• CSCI B555 Machine Learning OR INFO I590 Applied Machine
Learning
• CSCI B561 Advanced Database Concepts
• STAT S520 Introduction to Statistics OR (New Course) Probabilistic
Reasoning
• Category 2: Data Systems
• CSCI B534 Distributed Systems CSCI B561 Advanced Database
Concepts, CSCI B662 Database Systems & Internal Design
• CSCI B649 Cloud Computing CSCI B649 Advanced Topics in
Privacy
• CSCI P538 Computer Networks
• INFO I533 Systems & Protocol Security & Information Assurance
• ILS Z534: Information Retrieval: Theory and Practice
645/25/2015
Computational and Analytic Data Science track
• Category 3: Data Analysis
• CSCI B565 Data Mining
• CSCI B555 Machine Learning
• INFO I590 Applied Machine Learning
• INFO I590 Complex Networks and Their Applications
• STAT S520 Introduction to Statistics
• (New Course) Probabilistic Reasoning
• (New Course CSCI) Algorithms for Big Data
• Category 4: Elective Courses
• CSCI B551 Elements of Artificial Intelligence
• CSCI B553 Probabilistic Approaches to Artificial Intelligence
• CSCI B659 Information Theory and Inference
• CSCI B661 Database Theory and Systems Design
• INFO I519 Introduction to Bioinformatics
• INFO I520 Security For Networked Systems
• INFO I529 Machine Learning in Bioinformatics
• INFO I590 Relational Probabilistic Models
• ILS Z637 - Information Visualization
• Every course in 500/600 SOIC related to data that is not in the list
• All courses from STAT that are 600 and above
655/25/2015
Details of Masters Degree
General Track
665/25/2015
General Track: Areas I and II
• I. Data analysis and statistics: gives students skills to
develop and extend algorithms, statistical approaches, and
visualization techniques for their explorations of large scale
data. Topics include data mining, information retrieval,
statistics, machine learning, and data visualization and will be
examined from the perspective of “big data,” using examples
from the application focus areas described in Section IV.
• II. Data lifecycle: gives students an understanding of the data
lifecycle, from digital birth to long-term preservation. Topics
include data curation, data stewardship, issues related to
retention and reproducibility, the role of the library and data
archives in digital data preservation and scholarly
communication and publication, and the organizational, policy,
and social impacts of big data.
675/25/2015
General Track: Areas III and IV
• III. Data management and infrastructure: gives students skills to
manage and support big data projects. Data have to be described,
discovered, and actionable. In data science, issues of scale come to the
fore, raising challenges of storage and large-scale computation. Topics
in data management include semantics, metadata, cyberinfrastructure
and cloud computing, databases and document stores, and security and
privacy and are relevant to both data science and “big data” data
science.
• IV. Big data application domains: gives students experience with data
analysis and decision making and is designed to equip them with the
ability to derive insights from vast quantities and varieties of data. The
teaching of data science, particularly its analytic aspects, is most
effective when an application area is used as a focus of study. The
degree will allow students to specialize in one or more application areas
which include, but are not limited to Business analytics, Science
informatics, Web science, Social data informatics, Health and
Biomedical informatics.
685/25/2015
I. Data Analysis and Statistics
• CSCI B503 Analysis of Algorithms
• CSCI B553 Probabilistic Approaches to Artificial Intelligence
• CSCI B652: Computer Models of Symbolic Learning
• CSCI B659 Information Theory and Inference
• CSCI B551: Elements of Artificial Intelligence
• CSCI B555: Machine Learning
• CSCI B565: Data Mining
• INFO I573: Programming for Science Informatics
• INFO I590 Visual Analytics
• INFO I590 Relational Probabilistic Models
• INFO I590 Applied Machine Learning
• ILS Z534: Information Retrieval: Theory and Practice
• ILS Z604: Topics in Library and Information Science: Big Data Analysis for Web and Text
• ILS Z637: Information Visualization
• STAT S520 Intro to Statistics
• STAT S670: Exploratory Data Analysis
• STAT S675: Statistical Learning & High-Dimensional Data Analysis
• (New Course CSCI) Algorithms for Big Data
• (New Course CSCI) Probabilistic Reasoning
• All courses from STAT that are 600 and above
695/25/2015
II. Data Lifecycle
• INFO I590: Data Provenance
• INFO I590 Complex Systems
• ILS Z604 Scholarly Communication
• ILS Z636: Semantic Web
• ILS Z652: Digital Libraries
• ILS Z604: Data Curation
• (New Course INFO): Social and Organizational Informatics of
Big Data
• (New Course ILS: Project Management for Data Science
• (New Course ILS): Big Data Policy
705/25/2015
III. Data Management and Infrastructure
• CSCI B534: Distributed Systems
• CSCI B552: Knowledge-Based Artificial Intelligence
• CSCI B561: Advanced Database Concepts
• CSCI B649: Cloud Computing (offered online)
• CSCI B649 Advanced Topics in Privacy
• CSCI B649: Topics in Systems: Cloud Computing for Data Intensive Sciences
• CSCI B661: Database Theory and System Design
• CSCI B662 Database Systems & Internal Design
• CSCI B669: Scientific Data Management and Preservation
• CSCI P536: Operating Systems
• CSCI P538 Computer Networks
• INFO I520 Security For Networked Systems
• INFO I525: Organizational Informatics and Economics of Security
• INFO I590 Complex Networks and their Applications
• INFO I590: Topics in Informatics: Data Management for Big Data
• INFO I590: Topics in Informatics: Big Data Open Source Software and Projects
• ILS S511: Database
• Every course in 500/600 SOIC related to data that is not in the list
715/25/2015
IV. Application areas
• CSCI B656: Web mining
• CSCI B679: Topics in Scientific Computing: High Performance Computing
• INFO I519 Introduction to Bioinformatics
• INFO I529 Machine Learning in Bioinformatics
• INFO I533 Systems & Protocol Security & Information Assurance
• INFO I590: Topics in Informatics: Big Data Applications and Analytics
• INFO I590: Topics in Informatics: Big Data in Drug Discovery, Health and
Translational Medicine
• ILS Z605: Internship in Data Science
• Kelley School of Business: business analytics course(s)
• Other courses from Indiana University e.g. Physics Data Analysis
725/25/2015
Typical Paths through Degree
735/25/2015
Technical Track of General DS Masters
• Year 1 Semester 1:
– INFO 590: Topics in Informatics: Big Data Applications and Analytics
– ILS Z604: Big Data Analytics for Web and Text
– STAT S520: Intro to Statistics
• Year 1: Semester 2:
– CSCI B661: Database Theory and System Design
– ILS Z637: Information Visualization
– STAT S670: Exploratory Data Analysis
• Year 1: Summer:
– CSCI B679: Topics in Scientific Computing: High Performance
Computing
• Year 2: Semester 3:
– CSCI B555: Machine Learning
– STAT S670: Exploratory Data Analysis
– CSCI B649: Cloud Computing
745/25/2015
Computational and Analytic Data Science track
• Year 1 Semester 1:
– B503 Analysis of Algorithms
– B561 Advanced Database Concepts
– S520 Introduction to Statistics
• Year 1: Semester 2:
– B649 Cloud Computing
– Z534: Information Retrieval: Theory and Practice
– B555 Machine Learning
• Year 1: Summer:
– ILS 605: Internship in Data Science
• Year 2: Semester 3:
– B565 Data Mining
– I520 Security For Networked Systems
– Z637 - Information Visualization
755/25/2015
An Information-oriented Track
• Year 1 Semester 1:
– INFO 590: Topics in Informatics: Big Data Applications and Analytics
– ILS Z604 Big Data Analytics for Web and Text.
– STAT S520 Intro to Statistics
• Year 1: Semester 2:
– CSCI B661 Database Theory and System Design
– ILS Z637: Information Visualization
– ILS Z653: Semantic Web
• Year 1: Summer:
– ILS 605: Internship in Data Science
• Year 2: Semester 3:
– ILS Z604 Data Curation
– ILS Z604 Scholarly Communication
– INFO I590: Data Provenance
765/25/2015

More Related Content

What's hot

Introduction to Data Science - Week 3 - Steps involved in Data Science
Introduction to Data Science - Week 3 - Steps involved in Data ScienceIntroduction to Data Science - Week 3 - Steps involved in Data Science
Introduction to Data Science - Week 3 - Steps involved in Data Science
Ferdin Joe John Joseph PhD
 
Luciano uvi hackfest.28.10.2020
Luciano uvi hackfest.28.10.2020Luciano uvi hackfest.28.10.2020
Luciano uvi hackfest.28.10.2020
Joanne Luciano
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Ilkay Altintas, Ph.D.
 
Deploying Open Learning Analytics at a National Scale
Deploying Open Learning Analytics at a National ScaleDeploying Open Learning Analytics at a National Scale
Deploying Open Learning Analytics at a National Scale
michaeldwebb
 
Ontologies for Emergency & Disaster Management
Ontologies for Emergency & Disaster Management Ontologies for Emergency & Disaster Management
Ontologies for Emergency & Disaster Management
Stephane Fellah
 
Ontologies for Crisis Management: A Review of State of the Art in Ontology De...
Ontologies for Crisis Management: A Review of State of the Art in Ontology De...Ontologies for Crisis Management: A Review of State of the Art in Ontology De...
Ontologies for Crisis Management: A Review of State of the Art in Ontology De...
streamspotter
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
Geoffrey Fox
 
Research Data Alliance Plenary 9: DDRI Working Group Session
Research Data Alliance Plenary 9: DDRI Working Group SessionResearch Data Alliance Plenary 9: DDRI Working Group Session
Research Data Alliance Plenary 9: DDRI Working Group Session
amiraryani
 
Data Analytics.03. Data processing
Data Analytics.03. Data processingData Analytics.03. Data processing
Data Analytics.03. Data processing
Alex Rayón Jerez
 
Data Science - Poster - Kirk Borne - RDAP12
Data Science - Poster - Kirk Borne - RDAP12Data Science - Poster - Kirk Borne - RDAP12
Data Science - Poster - Kirk Borne - RDAP12
ASIS&T
 
TUW-ASE-Summer 2015: Advanced Services Engineering - Introduction
TUW-ASE-Summer 2015: Advanced Services Engineering - IntroductionTUW-ASE-Summer 2015: Advanced Services Engineering - Introduction
TUW-ASE-Summer 2015: Advanced Services Engineering - Introduction
Hong-Linh Truong
 
Big Data Analytics: Challenge or Opportunity?
Big Data Analytics: Challenge or Opportunity?Big Data Analytics: Challenge or Opportunity?
Big Data Analytics: Challenge or Opportunity?
NUS-ISS
 
Ci days notre_dame_april2010
Ci days notre_dame_april2010Ci days notre_dame_april2010
Ci days notre_dame_april2010
University of Illinois at Urbana-Champaign
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
University of Washington
 
Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...
Geoffrey Fox
 
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...
LIBER Europe
 
Enabling Data-Intensive Science Through Data Infrastructures
Enabling Data-Intensive Science Through Data InfrastructuresEnabling Data-Intensive Science Through Data Infrastructures
Enabling Data-Intensive Science Through Data Infrastructures
LIBER Europe
 
DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...
DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...
DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...
Dr.-Ing. Thomas Hartmann
 
Machine Learning using Big data
Machine Learning using Big data Machine Learning using Big data
Machine Learning using Big data
Vaibhav Kurkute
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
The HDF-EOS Tools and Information Center
 

What's hot (20)

Introduction to Data Science - Week 3 - Steps involved in Data Science
Introduction to Data Science - Week 3 - Steps involved in Data ScienceIntroduction to Data Science - Week 3 - Steps involved in Data Science
Introduction to Data Science - Week 3 - Steps involved in Data Science
 
Luciano uvi hackfest.28.10.2020
Luciano uvi hackfest.28.10.2020Luciano uvi hackfest.28.10.2020
Luciano uvi hackfest.28.10.2020
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
 
Deploying Open Learning Analytics at a National Scale
Deploying Open Learning Analytics at a National ScaleDeploying Open Learning Analytics at a National Scale
Deploying Open Learning Analytics at a National Scale
 
Ontologies for Emergency & Disaster Management
Ontologies for Emergency & Disaster Management Ontologies for Emergency & Disaster Management
Ontologies for Emergency & Disaster Management
 
Ontologies for Crisis Management: A Review of State of the Art in Ontology De...
Ontologies for Crisis Management: A Review of State of the Art in Ontology De...Ontologies for Crisis Management: A Review of State of the Art in Ontology De...
Ontologies for Crisis Management: A Review of State of the Art in Ontology De...
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
 
Research Data Alliance Plenary 9: DDRI Working Group Session
Research Data Alliance Plenary 9: DDRI Working Group SessionResearch Data Alliance Plenary 9: DDRI Working Group Session
Research Data Alliance Plenary 9: DDRI Working Group Session
 
Data Analytics.03. Data processing
Data Analytics.03. Data processingData Analytics.03. Data processing
Data Analytics.03. Data processing
 
Data Science - Poster - Kirk Borne - RDAP12
Data Science - Poster - Kirk Borne - RDAP12Data Science - Poster - Kirk Borne - RDAP12
Data Science - Poster - Kirk Borne - RDAP12
 
TUW-ASE-Summer 2015: Advanced Services Engineering - Introduction
TUW-ASE-Summer 2015: Advanced Services Engineering - IntroductionTUW-ASE-Summer 2015: Advanced Services Engineering - Introduction
TUW-ASE-Summer 2015: Advanced Services Engineering - Introduction
 
Big Data Analytics: Challenge or Opportunity?
Big Data Analytics: Challenge or Opportunity?Big Data Analytics: Challenge or Opportunity?
Big Data Analytics: Challenge or Opportunity?
 
Ci days notre_dame_april2010
Ci days notre_dame_april2010Ci days notre_dame_april2010
Ci days notre_dame_april2010
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
 
Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...
 
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...
Roadmaps, Roles and Re-engineering: Developing Data Informatics Capability in...
 
Enabling Data-Intensive Science Through Data Infrastructures
Enabling Data-Intensive Science Through Data InfrastructuresEnabling Data-Intensive Science Through Data Infrastructures
Enabling Data-Intensive Science Through Data Infrastructures
 
DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...
DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...
DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...
 
Machine Learning using Big data
Machine Learning using Big data Machine Learning using Big data
Machine Learning using Big data
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
 

Similar to Lessons from Data Science Program at Indiana University: Curriculum, Students and Online Delivery

Creating Interactive Dashboards with Microsoft Excel
Creating Interactive Dashboards with Microsoft ExcelCreating Interactive Dashboards with Microsoft Excel
Creating Interactive Dashboards with Microsoft Excel
AACRAO
 
Launch of the Week: Eastern Washington University
Launch of the Week: Eastern Washington UniversityLaunch of the Week: Eastern Washington University
Launch of the Week: Eastern Washington University
Laura Faccone
 
Graduate Engineering Programs at USC (India_Fall 2016)
Graduate Engineering Programs at USC (India_Fall 2016)Graduate Engineering Programs at USC (India_Fall 2016)
Graduate Engineering Programs at USC (India_Fall 2016)
Laura Amalia Hartman
 
Priming the Computer Science Pump
Priming the Computer Science PumpPriming the Computer Science Pump
Priming the Computer Science Pump
WeTeach_CS
 
2019 01 16 data matters - v6 - Using data to support the student digital expe...
2019 01 16 data matters - v6 - Using data to support the student digital expe...2019 01 16 data matters - v6 - Using data to support the student digital expe...
2019 01 16 data matters - v6 - Using data to support the student digital expe...
jisc_digital_insights
 
TOP UNIVERSITIES IN US FOR MS IN DATA SCIENCE
TOP UNIVERSITIES IN US FOR MS IN DATA SCIENCETOP UNIVERSITIES IN US FOR MS IN DATA SCIENCE
TOP UNIVERSITIES IN US FOR MS IN DATA SCIENCE
SKILL-LYNC SUPPORT
 
Liverpool 2018 presentation
Liverpool 2018 presentationLiverpool 2018 presentation
LiFT Scholars Program: Supporting the Next Generation of Technology Leaders
LiFT Scholars Program: Supporting the Next Generation of Technology LeadersLiFT Scholars Program: Supporting the Next Generation of Technology Leaders
LiFT Scholars Program: Supporting the Next Generation of Technology Leaders
Dr. Molly Morin
 
Week #5 slide show
Week #5 slide showWeek #5 slide show
Week #5 slide show
Cheryl Simon
 
Project Lead the Way
Project Lead the WayProject Lead the Way
Project Lead the Way
NAFCareerAcads
 
Overview of Effective Learning Analytics Using data and analytics to support ...
Overview of Effective Learning Analytics Using data and analytics to support ...Overview of Effective Learning Analytics Using data and analytics to support ...
Overview of Effective Learning Analytics Using data and analytics to support ...
Bart Rienties
 
Keeping Pace with K-12 Digital Learning: Annual Review of Policy & Practice
Keeping Pace with K-12 Digital Learning: Annual Review of Policy & PracticeKeeping Pace with K-12 Digital Learning: Annual Review of Policy & Practice
Keeping Pace with K-12 Digital Learning: Annual Review of Policy & Practice
Michigan Virtual Learning Research Institute
 
Education broadband session 11 stories
Education broadband session   11 storiesEducation broadband session   11 stories
Education broadband session 11 stories
Michael Flood
 
Instructional materials & ict
Instructional materials & ictInstructional materials & ict
Instructional materials & ict
Bright Micah
 
Learning analytics and the learning and teaching journey | Prof Deborah West ...
Learning analytics and the learning and teaching journey | Prof Deborah West ...Learning analytics and the learning and teaching journey | Prof Deborah West ...
Learning analytics and the learning and teaching journey | Prof Deborah West ...
Blackboard APAC
 
AppCEDA Webinar UW-Madison 12-10-15
AppCEDA Webinar UW-Madison 12-10-15AppCEDA Webinar UW-Madison 12-10-15
AppCEDA Webinar UW-Madison 12-10-15
Matt Griswold
 
CHEDTEB - Student Survey
CHEDTEB - Student SurveyCHEDTEB - Student Survey
CHEDTEB - Student Survey
Petr Novotný
 
Aligning IT and University Strategy - Paul Curran - Jisc Digital Festival 2014
Aligning IT and University Strategy - Paul Curran - Jisc Digital Festival 2014Aligning IT and University Strategy - Paul Curran - Jisc Digital Festival 2014
Aligning IT and University Strategy - Paul Curran - Jisc Digital Festival 2014
Jisc
 
The Long Road from Reactive to Proactive: Developing an Accessibility Strategy
The Long Road from Reactive to Proactive: Developing an Accessibility StrategyThe Long Road from Reactive to Proactive: Developing an Accessibility Strategy
The Long Road from Reactive to Proactive: Developing an Accessibility Strategy
3Play Media
 
Pam Muth and Lisa Bolton: Optimising QILT to improve the student experience
Pam Muth and Lisa Bolton: Optimising QILT to improve the student experiencePam Muth and Lisa Bolton: Optimising QILT to improve the student experience
Pam Muth and Lisa Bolton: Optimising QILT to improve the student experience
Studiosity.com
 

Similar to Lessons from Data Science Program at Indiana University: Curriculum, Students and Online Delivery (20)

Creating Interactive Dashboards with Microsoft Excel
Creating Interactive Dashboards with Microsoft ExcelCreating Interactive Dashboards with Microsoft Excel
Creating Interactive Dashboards with Microsoft Excel
 
Launch of the Week: Eastern Washington University
Launch of the Week: Eastern Washington UniversityLaunch of the Week: Eastern Washington University
Launch of the Week: Eastern Washington University
 
Graduate Engineering Programs at USC (India_Fall 2016)
Graduate Engineering Programs at USC (India_Fall 2016)Graduate Engineering Programs at USC (India_Fall 2016)
Graduate Engineering Programs at USC (India_Fall 2016)
 
Priming the Computer Science Pump
Priming the Computer Science PumpPriming the Computer Science Pump
Priming the Computer Science Pump
 
2019 01 16 data matters - v6 - Using data to support the student digital expe...
2019 01 16 data matters - v6 - Using data to support the student digital expe...2019 01 16 data matters - v6 - Using data to support the student digital expe...
2019 01 16 data matters - v6 - Using data to support the student digital expe...
 
TOP UNIVERSITIES IN US FOR MS IN DATA SCIENCE
TOP UNIVERSITIES IN US FOR MS IN DATA SCIENCETOP UNIVERSITIES IN US FOR MS IN DATA SCIENCE
TOP UNIVERSITIES IN US FOR MS IN DATA SCIENCE
 
Liverpool 2018 presentation
Liverpool 2018 presentationLiverpool 2018 presentation
Liverpool 2018 presentation
 
LiFT Scholars Program: Supporting the Next Generation of Technology Leaders
LiFT Scholars Program: Supporting the Next Generation of Technology LeadersLiFT Scholars Program: Supporting the Next Generation of Technology Leaders
LiFT Scholars Program: Supporting the Next Generation of Technology Leaders
 
Week #5 slide show
Week #5 slide showWeek #5 slide show
Week #5 slide show
 
Project Lead the Way
Project Lead the WayProject Lead the Way
Project Lead the Way
 
Overview of Effective Learning Analytics Using data and analytics to support ...
Overview of Effective Learning Analytics Using data and analytics to support ...Overview of Effective Learning Analytics Using data and analytics to support ...
Overview of Effective Learning Analytics Using data and analytics to support ...
 
Keeping Pace with K-12 Digital Learning: Annual Review of Policy & Practice
Keeping Pace with K-12 Digital Learning: Annual Review of Policy & PracticeKeeping Pace with K-12 Digital Learning: Annual Review of Policy & Practice
Keeping Pace with K-12 Digital Learning: Annual Review of Policy & Practice
 
Education broadband session 11 stories
Education broadband session   11 storiesEducation broadband session   11 stories
Education broadband session 11 stories
 
Instructional materials & ict
Instructional materials & ictInstructional materials & ict
Instructional materials & ict
 
Learning analytics and the learning and teaching journey | Prof Deborah West ...
Learning analytics and the learning and teaching journey | Prof Deborah West ...Learning analytics and the learning and teaching journey | Prof Deborah West ...
Learning analytics and the learning and teaching journey | Prof Deborah West ...
 
AppCEDA Webinar UW-Madison 12-10-15
AppCEDA Webinar UW-Madison 12-10-15AppCEDA Webinar UW-Madison 12-10-15
AppCEDA Webinar UW-Madison 12-10-15
 
CHEDTEB - Student Survey
CHEDTEB - Student SurveyCHEDTEB - Student Survey
CHEDTEB - Student Survey
 
Aligning IT and University Strategy - Paul Curran - Jisc Digital Festival 2014
Aligning IT and University Strategy - Paul Curran - Jisc Digital Festival 2014Aligning IT and University Strategy - Paul Curran - Jisc Digital Festival 2014
Aligning IT and University Strategy - Paul Curran - Jisc Digital Festival 2014
 
The Long Road from Reactive to Proactive: Developing an Accessibility Strategy
The Long Road from Reactive to Proactive: Developing an Accessibility StrategyThe Long Road from Reactive to Proactive: Developing an Accessibility Strategy
The Long Road from Reactive to Proactive: Developing an Accessibility Strategy
 
Pam Muth and Lisa Bolton: Optimising QILT to improve the student experience
Pam Muth and Lisa Bolton: Optimising QILT to improve the student experiencePam Muth and Lisa Bolton: Optimising QILT to improve the student experience
Pam Muth and Lisa Bolton: Optimising QILT to improve the student experience
 

More from Geoffrey Fox

AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
Geoffrey Fox
 
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Geoffrey Fox
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
Geoffrey Fox
 
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Geoffrey Fox
 
Big Data HPC Convergence
Big Data HPC ConvergenceBig Data HPC Convergence
Big Data HPC Convergence
Geoffrey Fox
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
Geoffrey Fox
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
Geoffrey Fox
 
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
Geoffrey Fox
 
Experience with Online Teaching with Open Source MOOC Technology
Experience with Online Teaching with Open Source MOOC TechnologyExperience with Online Teaching with Open Source MOOC Technology
Experience with Online Teaching with Open Source MOOC Technology
Geoffrey Fox
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
Geoffrey Fox
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
Geoffrey Fox
 
Big Data and Clouds: Research and Education
Big Data and Clouds: Research and EducationBig Data and Clouds: Research and Education
Big Data and Clouds: Research and Education
Geoffrey Fox
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
Geoffrey Fox
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Geoffrey Fox
 
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
Geoffrey Fox
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
Geoffrey Fox
 
Remarks on MOOC's
Remarks on MOOC'sRemarks on MOOC's
Remarks on MOOC's
Geoffrey Fox
 
FutureGrid Computing Testbed as a Service
 FutureGrid Computing Testbed as a Service FutureGrid Computing Testbed as a Service
FutureGrid Computing Testbed as a Service
Geoffrey Fox
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Geoffrey Fox
 
51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack
Geoffrey Fox
 

More from Geoffrey Fox (20)

AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
 
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
 
Big Data HPC Convergence
Big Data HPC ConvergenceBig Data HPC Convergence
Big Data HPC Convergence
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
 
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
 
Experience with Online Teaching with Open Source MOOC Technology
Experience with Online Teaching with Open Source MOOC TechnologyExperience with Online Teaching with Open Source MOOC Technology
Experience with Online Teaching with Open Source MOOC Technology
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
Big Data and Clouds: Research and Education
Big Data and Clouds: Research and EducationBig Data and Clouds: Research and Education
Big Data and Clouds: Research and Education
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
 
Remarks on MOOC's
Remarks on MOOC'sRemarks on MOOC's
Remarks on MOOC's
 
FutureGrid Computing Testbed as a Service
 FutureGrid Computing Testbed as a Service FutureGrid Computing Testbed as a Service
FutureGrid Computing Testbed as a Service
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
 
51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack
 

Recently uploaded

How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17
Celine George
 
Skimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S EliotSkimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S Eliot
nitinpv4ai
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
GeorgeMilliken2
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
Nguyen Thanh Tu Collection
 
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxBeyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
EduSkills OECD
 
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
Himanshu Rai
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
iammrhaywood
 
Pharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
danielkiash986
 
Standardized tool for Intelligence test.
Standardized tool for Intelligence test.Standardized tool for Intelligence test.
Standardized tool for Intelligence test.
deepaannamalai16
 
CIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdfCIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdf
blueshagoo1
 
skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)
Mohammad Al-Dhahabi
 
How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17
Celine George
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
RidwanHassanYusuf
 
Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
PsychoTech Services
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
MysoreMuleSoftMeetup
 
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
EduSkills OECD
 
Nutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour TrainingNutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour Training
melliereed
 
Stack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 MicroprocessorStack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 Microprocessor
JomonJoseph58
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Henry Hollis
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
siemaillard
 

Recently uploaded (20)

How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17
 
Skimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S EliotSkimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S Eliot
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
 
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxBeyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
 
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
 
Pharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
 
Standardized tool for Intelligence test.
Standardized tool for Intelligence test.Standardized tool for Intelligence test.
Standardized tool for Intelligence test.
 
CIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdfCIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdf
 
skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)
 
How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
 
Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
 
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
 
Nutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour TrainingNutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour Training
 
Stack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 MicroprocessorStack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 Microprocessor
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
 

Lessons from Data Science Program at Indiana University: Curriculum, Students and Online Delivery

  • 1. Lessons from Data Science Program at Indiana University: Curriculum, Students and Online Delivery NSF/TCPP Workshop on Parallel and Distributed Computing Education Edupar at IPDPS 2015 May 25, 2015 5/25/2015 1 Geoffrey Fox gcf@indiana.edu http://www.infomall.org School of Informatics and Computing Digital Science Center Indiana University Bloomington
  • 2. School of Informatics and Computing 5/25/2015 2
  • 3. Background of the School • The School of Informatics was established in 2000 as first of its kind in the United States. • Computer Science was established in 1971 and became part of the school in 2005. • Library and Information Science was established in 1951 and became part of the school in 2013. • Now named the School of Informatics and Computing. • Data Science added January 2014 • Engineering to be added Fall 2016 5/25/2015 3
  • 4. What Is Our School About? The broad range of computing and information technology: science, a broad range of applications and human and societal implications. United by a focus on information and technology, our extensive programs include: • Computer Science • Informatics • Information Science • Library Science • Data Science (virtual - starting) • Engineering (real - expected) 5/25/2015 4
  • 5. Size of School (2014-2015) • Faculty ~100 • Students Undergraduate 1,472 Master’s 649 Ph.D. 243 • Female Undergraduates 21% (68% since 2007) • Female Graduate Students 28% (4% since 2007) Undergraduates mainly Informatics (75%); Graduates mainly Computer Science 5/25/2015 5
  • 7. SOIC Data Science Program • Cross Disciplinary Faculty – 31 in School of Informatics and Computing, a few in statistics and expanding across campus • Affordable online and traditional residential curricula or mix thereof • Masters, Certificate, PhD Minor in place; Full PhD being studied • http://www.soic.indiana.edu/graduate/degrees/data-science/index.html • Note data science mentioned in faculty advertisements but unlike other parts of School, there are no dedicated faculty 75/25/2015 It is around 7% of School looking at fraction of enrolled students summing graduate and undergraduate levels
  • 8. IU Data Science Program and Degrees • Program managed by cross disciplinary Faculty in Data Science. Currently Statistics and Informatics and Computing School but plans to expand scope to full campus • A purely online 4-course Certificate in Data Science has been running since January 2014 – Some switched to Online Masters – Most students are professionals taking courses in “free time” • A campus wide Ph.D. Minor in Data Science has been approved. • Masters in Data Science (10-course) approved October 2014 • Exploring PhD in Data Science • Courses labelled as “Decision-maker” and “Technical” paths where McKinsey says an order of magnitude more (1.5 million by 2018) unmet job openings in Decision-maker track 85/25/2015
  • 9. McKinsey Institute on Big Data Jobs • There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions. • IU Data Science Decision Maker Path aimed at 1.5 million jobs. Technical Path covers the 140,000 to 190,000 http://www.mckinsey.com/mgi/publications/big_data/index.asp 95/25/2015
  • 10. Job Trends Big Data about an order of magnitude larger than data science 19 May 2015 Jobs 3475 for “data science“ 2277 for “data scientist“ 19488 for “big data” http://www.indeed.com/jobtrends? q=%22Data+science%22%2C+% 22data+scientist%22%2C+%22bi g+data%22%2C&l= 5/25/2015 10
  • 11. What is Data Science? • The next slide gives a definition arrived by a NIST study group fall 2013. • The previous slide says there are several jobs but that’s not enough! Is this a field – what is it and what is its core? – The emergence of the 4th or data driven paradigm of science illustrates significance - http://research.microsoft.com/en- us/collaboration/fourthparadigm/ – Discovery is guided by data rather than by a model – The End of (traditional) science http://www.wired.com/wired/issue/16-07 is famous here • Another example is recommender systems in Netflix, e-commerce etc. – Here data (user ratings of movies or products) allows an empirical prediction of what users like – Here we define points in spaces (of users or products), cluster them etc. – all conclusions coming from data 5/25/2015 11
  • 12. Data Science Definition from NIST Public Working Group • Data Science is the extraction of actionable knowledge directly from data through a process of discovery, hypothesis, and analytical hypothesis analysis. • A Data Scientist is a practitioner who has sufficient knowledge of the overlapping regimes of expertise in business needs, domain knowledge, analytical skills and programming expertise to manage the end-to-end scientific method process through each stage in the big data lifecycle. See Big Data Definitions in http://bigdatawg.nist.gov/V1_output_docs.php 5/25/2015 12
  • 13. Some Existing Online Data Science Activities • Indiana University Masters is “blended”: online and/or residential; other universities offer residential • We discount online classes so that total cost of 10 courses is ~$11,500 (in state price) 30$35,490 5/25/2015 13
  • 14. Computational Science • Computational science has important similarities to data science but with a simulation rather than data analysis flavor. • Although a great deal of effort went into with meetings and several academic curricula/programs, it didn’t take off – In my experience not a lot of students were interested and – The academic job opportunities were not great • Data science has more jobs; maybe it will do better? • Can we usefully link these concepts? • PS both use parallel computing! • In days gone by, I did research in particle physics phenomenology which in retrospect was an early form of data science using models extensively 5/25/2015 14
  • 16. IU Data Science Program: Masters • Masters Fully approved by University and State October 14 2014 and started January 2015 • Blended online and residential (any combination) – Online offered at in-state rates (~$1100 per course) – Hybrid (online for a year and then residential) surprisingly not popular • Informatics, Computer Science, Information and Library Science in School of Informatics and Computing and the Department of Statistics, College of Arts and Science, IUB • 30 credits (10 conventional courses) • Basic (general) Masters degree plus tracks – Currently only track is “Computational and Analytic Data Science” – Other tracks expected such as Biomedical Data Science 165/25/2015
  • 17. Data Science Enrollment • Fall 2015, about 240 new applicants to program; cap enrollment • Certificate in Data Science (started January 2014) – Current 29 – 16 applicants (12 admits, 9 accepts) • (reduced from last year as prefer Masters) – Plus 40 students in special executive education certificate through Kelley Business School – currently just one class • Online Masters in Data Science (started January 2015) – Current 54 – 42 applicants; just 2 “hybrid” (28 admits, 19 accepts) – Transfers from certificate have head start • Residential Masters in Data Science (started January 2015) – Current 3 – 187 applicants (101 admits, 55 accepts) • Expected Data Science total enrollment Fall 2015 150-170 5/25/2015 17
  • 18. Advertising Campaign • Comparison of “Adwords” results for Three Masters Programs • Security Informatics • Data Science • Information and Library Science • CPC Cost per Click • CTR Click Through Rate 5/25/2015 18 Program Adwords timeframe Adwords Cost # clicks CTR CPC # applications Security 12/1-4/30 $13K 2,577 0.20% $5.10 26 Data Science 10/31-3/30 $17K 38,544 1.28% $0.43 267 ILS 9/1-4/30 $18K 4,382 0.11% $4.08 199
  • 20. 3 Types of Students • Professionals wanting skills to improve job or “required” by employee to keep up with technology advances • Traditional sources of IT Masters • Students in non IT fields wanting to do “domain specific data science” 5/25/2015 20
  • 21. What do students want? • Degree with some relevant curriculum – Data Science and Computer Science distinct BUT • Important goal often “Optional Practical Training” OPT allowing graduated students visa to work for US companies – Must have spent at least a year in US in residential program • Residential CS Masters (at IU) 95% foreign students • Online program students quite varied but mostly USA professionals aiming to improve/switch job 215/25/2015
  • 22. IU and Competition • With Computer Science, Informatics, ILS, Statistics, IU has particularly broad unrivalled technology base – Other universities have more domain data science than IU • Existing Masters in US in table. Many more certificates and related degrees (such as business analytics) School Program Campus Online Degree Columbia University Data Science Yes No MS 30 cr Illinois Institute of Technology Data Science Yes No MS 33 cr New York University Data Science Yes No MS 36 cr University of California Berkeley School of Information Master of Information and Data Science Yes Yes M.I.D.S University of Southern California Computer Science with Data Science Yes No MS 27 cr 5/25/2015 22
  • 23. Data Science Curriculum Faculty in Data Science is “virtual department” 4 course Certificate: purely online, started January 2014 10 course Masters: online/residential, started January 2015 5/25/2015 23
  • 24. Basic Masters Course Requirements • One course from two of three technology areas – I. Data analysis and statistics – II. Data lifecycle (includes “handling of research data”) – III. Data management and infrastructure • One course from (big data) application course cluster • Other courses chosen from list maintained by Data Science Program curriculum committee (or outside this with permission of advisor/ Curriculum Committee) • Capstone project optional • All students assigned an advisor who approves course choice. • Due to variation in preparation label courses – Decision Maker – Technical • Corresponding to two categories in McKinsey report – note Decision Maker had an order of magnitude more job openings expected 5/25/2015 24
  • 25. Computational and Analytic Data Science track • For this track, data science courses have been reorganized into categories reflecting the topics important for students wanting to prepare for computational and analytic data science careers for which a strong computer science background is necessary. Consequently, students in this track must complete additional requirements, • 1) A student has to take at least 3 courses (9 credits) from Category 1 Core Courses. Among them, B503 Analysis of Algorithms is required and the student should take at least 2 courses from the following 3: – B561 Advanced Database Concepts, – [STAT] S520 Introduction to Statistics OR (New Course) Probabilistic Reasoning – B555 Machine Learning OR I590 Applied Machine Learning • 2) A student must take at least 2 courses from Category 2 Data Systems, AND, at least 2 courses from Category 3 Data Analysis. Courses taken in Category 1 can be double counted if they are also listed in Category 2 or Category 3. • 3) A student must take at least 3 courses from Category 2 Data Systems, OR, at least 3 courses from Category 3 Data Analysis. Again, courses taken in Category 1 can be double counted if they are also listed in Category 2 or Category 3. One of these courses must be an application domain course 5/25/2015 25
  • 26. Admissions Criterion • Decided by Data Science Program Curriculum Committee • Need some computer programming experience (either through coursework or experience), and a mathematical background and knowledge of statistics will be useful • Tracks can impose stronger requirements • 3.0 Undergraduate GPA • A 500 word personal statement • GRE scores are required for all applicants. • 3 letters of recommendation 5/25/2015 26
  • 27. Geoffrey Fox’s Online Data Science Classes I Same class offered as • MOOC • Residential class • Online class for credit 5/25/2015 27
  • 28. Some Online Data Science Classes • BDAA: Big Data Applications & Analytics – Used to be called X-Informatics – ~40 hours of video mainly discussing applications (The X in X-Informatics or X-Analytics) in context of big data and clouds https://bigdatacourse.appspot.com/course • BDOSSP: Big Data Open Source Software and Projects http://bigdataopensourceprojects.soic.indiana.edu/ – ~27 Hours of video discussing HPC-ABDS and use on FutureSystems for Big Data software • Both divided into sections (coherent topics), units (~lectures) and lessons (5-20 minutes) in which student is meant to stay awake 285/25/2015
  • 29. • 1 Unit: Organizational Introduction • 1 Unit: Motivation: Big Data and the Cloud; Centerpieces of the Future Economy • 3 Units: Pedagogical Introduction: What is Big Data, Data Analytics and X-Informatics • SideMOOC: Python for Big Data Applications and Analytics: NumPy, SciPy, MatPlotlib • SideMOOC: Using FutureSystems for Java and Python • 4 Units: X-Informatics with X= LHC Analysis and Discovery of Higgs particle – Integrated Technology: Explore Events; histograms and models; basic statistics (Python and some in Java) • 3 Units on a Big Data Use Cases Survey • SideMOOC: Using Plotviz Software for Displaying Point Distributions in 3D • 3 Units: X-Informatics with X= e-Commerce and Lifestyle • Technology (Python or Java): Recommender Systems - K-Nearest Neighbors • Technology: Clustering and heuristic methods • 1 Unit: Parallel Computing Overview and familiar examples • 4 Units: Cloud Computing Technology for Big Data Applications & Analytics • 2 Units: X-Informatics with X = Web Search and Text Mining and their technologies • Technology for Big Data Applications & Analytics : Kmeans (Python/Java) • Technology for Big Data Applications & Analytics: MapReduce • Technology for Big Data Applications & Analytics : Kmeans and MapReduce Parallelism (Python/Java) • Technology for Big Data Applications & Analytics : PageRank (Python/Java) • 3 Units: X-Informatics with X = Sports • 1 Unit: X-Informatics with X = Health • 1 Unit: X-Informatics with X = Internet of Things & Sensors • 1 Unit: X-Informatics with X = Radar for Remote Sensing Big Data Applications & Analytics Topics Red = Software 295/25/2015
  • 30. http://x-informatics.appspot.com/course Example Google Course Builder MOOC 4 levels Course Sections (15) Units(37) Lessons(~250) Video 38.5 hrs Units are roughly traditional lecture Lessons are ~15 minute segments 5/25/2015 30 https://bigdatacoursespring2015.appspot.com/course
  • 31. http://x-informatics.appspot.com/course Example Google Course Builder MOOC The Physics Section expands to 4 units and 2 Homeworks Unit 9 expands to 5 lessons Lessons played on YouTube “talking head video + PowerPoint” 5/25/2015 31
  • 33. Course Home Page showing Syllabus 33 Note that we have a course – section – unit – lesson hierarchy (supported by Mooc Builder) with abstracts available at each level of hierarchy. The home page has overview information (shown earlier) plus a list of all sections and a syllabus shown above. 5/25/2015
  • 34. A typical lesson (the first in unit 21) Note links to all 37 units across the top 345/25/2015
  • 35. MOOC Version • Offered at https://bigdatacourse.appspot.com/preview • Open to everybody • Uses no University resources • Updated December 2014 • One of two SoIC MOOCs named one of “7 great MOOCs for techies” by ComputerWorld http://www.computerworld.com/article/2849569/7- great-moocs-for-techies-all-free-starting-soon.html November 2014 • May 14 2015 3562 enrolled – small by MOOC standards • Students from 108 countries – 1020 USA – 916 India – 180 Brazil – ~130 France, Spain, UK 5/25/2015 35 Associate's degree 73 Bachelor's degree 1078 Doctorate 243 High School and equivalent 176 Master's degree 1257 Other 42 (blank) 693 Student Starting Level
  • 36. Age Distribution: Average 34 5/25/2015 36
  • 37. Homeworks • These are online within Google Course Builder for the MOOC with peer assessment. In the 3 credit offerings, all graded material (homework and projects) is conducted traditionally through Indiana University Oncourse (superceded by Canvas). • Oncourse was additionally used to assign which videos should be watched each week and the discussion forum topics described later (these were just “special homeworks in Oncourse). • In the non-residential data science certificate class, the students were on a variable schedule (as typically working full time and many distractions; one for example had faculty position interviews) and considerable latitude was given for video and homework completion dates. 375/25/2015
  • 38. Discussion Forums • Each offering had a separate set of electronic discussion forums which were used for class announcements (replicating Oncourse) and for assigned discussions. • Following slide illustrates an assigned discussion on the implications of the success of e-commerce for the future of “real malls”. The students were given “participation credit” for posting here and these were very well received. • Later offerings made greater use of these forums. Based on student feedback, we encouraged even greater participation through students both posting and commenting. • Note I personally do not like specialized (walled garden) forums and the class forums were set up using standard Google Community Groups with a familiar elegant interface. These community groups also link well to Google Hangouts described later. • As well as interesting topics, all class announcements were made in the “Instructor” forum repeating information posted at Oncourse. Of course no sensitive material such as returned homework was posted on Google site. 385/25/2015
  • 39. The community group for one of classes and one forum (“No more malls”) 5/25/2015 39
  • 40. Hangouts and Adobe Connect • For the purely online offering, we supplemented the asynchronous material described above with real-time interactive Google Hangout video sessions. • Given varied time zones and weekday demands on students, these were held at 1pm Eastern on Sundays. • Google Hangouts are conveniently scheduled from community page and offer interactive video and chat capabilities that were well received. Other technologies such as Skype are also possible. • Hangouts are restricted to 10-15 people which was sufficient for this section but in general insufficient. Not all of 12 students attended a given class. • The Hangouts focused on general data science issues and the mechanics of the class. • Augment Hangout by non-video Adobe Connect session 405/25/2015
  • 41. Figure 6: Community Events for Online Data Science Certificate Course 415/25/2015
  • 42. In class Sessions • The residential sections had regular in class sessions; one 90 minute session per class each week. This was originally two sessions but reduced to one partly because online videos turned these into “flipped classes” with less need for in class time and partly to accommodate more students (77 total graduate and undergraduate) in two groups with separate classes. • These classes were devoted to discussions of course material, homework and largely the discussion forum topics. This part of course was not greatly liked by the students – especially the undergraduate section which voted in favor of a model with only the online components (including the discussion forums which they recommended expanding). • In particular the 9.30am start time was viewed as too early and intrinsically unattractive. 425/25/2015
  • 43. Geoffrey Fox’s Online Data Science Classes II 5/25/2015 43
  • 44. 5/25/2015 44 Kaleidoscope of (Apache) Big Data Stack (ABDS) and HPC Technologies Cross- Cutting Functions 1) Message and Data Protocols: Avro, Thrift, Protobuf 2) Distributed Coordination: Google Chubby, Zookeeper, Giraffe, JGroups 3) Security & Privacy: InCommon, Eduroam OpenStack Keystone, LDAP, Sentry, Sqrrl, OpenID, SAML OAuth 4) Monitoring: Ambari, Ganglia, Nagios, Inca 17) Workflow-Orchestration: ODE, ActiveBPEL, Airavata, Pegasus, Kepler, Swift, Taverna, Triana, Trident, BioKepler, Galaxy, IPython, Dryad, Naiad, Oozie, Tez, Google FlumeJava, Crunch, Cascading, Scalding, e-Science Central, Azure Data Factory, Google Cloud Dataflow, NiFi (NSA), Jitterbit, Talend, Pentaho, Apatar, Docker Compose 16) Application and Analytics: Mahout , MLlib , MLbase, DataFu, R, pbdR, Bioconductor, ImageJ, OpenCV, Scalapack, PetSc, Azure Machine Learning, Google Prediction API & Translation API, mlpy, scikit-learn, PyBrain, CompLearn, DAAL(Intel), Caffe, Torch, Theano, DL4j, H2O, IBM Watson, Oracle PGX, GraphLab, GraphX, IBM System G, GraphBuilder(Intel), TinkerPop, Google Fusion Tables, CINET, NWB, Elasticsearch, Kibana, Logstash, Graylog, Splunk, Tableau, D3.js, three.js, Potree, DC.js 15B) Application Hosting Frameworks: Google App Engine, AppScale, Red Hat OpenShift, Heroku, Aerobatic, AWS Elastic Beanstalk, Azure, Cloud Foundry, Pivotal, IBM BlueMix, Ninefold, Jelastic, Stackato, appfog, CloudBees, Engine Yard, CloudControl, dotCloud, Dokku, OSGi, HUBzero, OODT, Agave, Atmosphere 15A) High level Programming: Kite, Hive, HCatalog, Tajo, Shark, Phoenix, Impala, MRQL, SAP HANA, HadoopDB, PolyBase, Pivotal HD/Hawq, Presto, Google Dremel, Google BigQuery, Amazon Redshift, Drill, Kyoto Cabinet, Pig, Sawzall, Google Cloud DataFlow, Summingbird 14B) Streams: Storm, S4, Samza, Granules, Google MillWheel, Amazon Kinesis, LinkedIn Databus, Facebook Puma/Ptail/Scribe/ODS, Azure Stream Analytics, Floe 14A) Basic Programming model and runtime, SPMD, MapReduce: Hadoop, Spark, Twister, MR-MPI, Stratosphere (Apache Flink), Reef, Hama, Giraph, Pregel, Pegasus, Ligra, GraphChi, Galois, Medusa-GPU, MapGraph, Totem 13) Inter process communication Collectives, point-to-point, publish-subscribe: MPI, Harp, Netty, ZeroMQ, ActiveMQ, RabbitMQ, NaradaBrokering, QPid, Kafka, Kestrel, JMS, AMQP, Stomp, MQTT, Marionette Collective, Public Cloud: Amazon SNS, Lambda, Google Pub Sub, Azure Queues, Event Hubs 12) In-memory databases/caches: Gora (general object from NoSQL), Memcached, Redis, LMDB (key value), Hazelcast, Ehcache, Infinispan 12) Object-relational mapping: Hibernate, OpenJPA, EclipseLink, DataNucleus, ODBC/JDBC 12) Extraction Tools: UIMA, Tika 11C) SQL(NewSQL): Oracle, DB2, SQL Server, SQLite, MySQL, PostgreSQL, CUBRID, Galera Cluster, SciDB, Rasdaman, Apache Derby, Pivotal Greenplum, Google Cloud SQL, Azure SQL, Amazon RDS, Google F1, IBM dashDB, N1QL, BlinkDB 11B) NoSQL: Lucene, Solr, Solandra, Voldemort, Riak, Berkeley DB, Kyoto/Tokyo Cabinet, Tycoon, Tyrant, MongoDB, Espresso, CouchDB, Couchbase, IBM Cloudant, Pivotal Gemfire, HBase, Google Bigtable, LevelDB, Megastore and Spanner, Accumulo, Cassandra, RYA, Sqrrl, Neo4J, Yarcdata, AllegroGraph, Blazegraph, Facebook Tao, Titan:db, Jena, Sesame Public Cloud: Azure Table, Amazon Dynamo, Google DataStore 11A) File management: iRODS, NetCDF, CDF, HDF, OPeNDAP, FITS, RCFile, ORC, Parquet 10) Data Transport: BitTorrent, HTTP, FTP, SSH, Globus Online (GridFTP), Flume, Sqoop, Pivotal GPLOAD/GPFDIST 9) Cluster Resource Management: Mesos, Yarn, Helix, Llama, Google Omega, Facebook Corona, Celery, HTCondor, SGE, OpenPBS, Moab, Slurm, Torque, Globus Tools, Pilot Jobs 8) File systems: HDFS, Swift, Haystack, f4, Cinder, Ceph, FUSE, Gluster, Lustre, GPFS, GFFS Public Cloud: Amazon S3, Azure Blob, Google Cloud Storage 7) Interoperability: Libvirt, Libcloud, JClouds, TOSCA, OCCI, CDMI, Whirr, Saga, Genesis 6) DevOps: Docker (Machine, Swarm), Puppet, Chef, Ansible, SaltStack, Boto, Cobbler, Xcat, Razor, CloudMesh, Juju, Foreman, OpenStack Heat, Sahara, Rocks, Cisco Intelligent Automation for Cloud, Ubuntu MaaS, Facebook Tupperware, AWS OpsWorks, OpenStack Ironic, Google Kubernetes, Buildstep, Gitreceive, OpenTOSCA, Winery, CloudML, Blueprints, Terraform, DevOpSlang, Any2Api 5) IaaS Management from HPC to hypervisors: Xen, KVM, Hyper-V, VirtualBox, OpenVZ, LXC, Linux-Vserver, OpenStack, OpenNebula, Eucalyptus, Nimbus, CloudStack, CoreOS, rkt, VMware ESXi, vSphere and vCloud, Amazon, Azure, Google and other public Clouds Networking: Google Cloud DNS, Amazon Route 53 21 layers Over 350 Software Packages May 15 2015
  • 45. Big Data & Open Source Software Projects Overview • This course studies DevOps and software used in many commercial activities to study Big Data. • The backdrop for course is the ~350 software subsystems HPC-ABDS (High Performance Computing enhanced - Apache Big Data Stack) illustrated at http://hpc-abds.org/kaleidoscope/ • The cloud computing architecture underlying ABDS and contrast of this with HPC. • The main activity of the course is building a significant project using multiple HPC-ABDS subsystems combined with user code and data. • Projects will be suggested or students can chose their own • http://cloudmesh.github.io/introduction_to_cloud_computing/class/lesson/projects.html • For more information, see: http://bigdataopensourceprojects.soic.indiana.edu/ and • http://cloudmesh.github.io/introduction_to_cloud_computing/class/bdossp_sp15/week_plan.html • 25 Hours of Video • Probably too much for semester class 5/25/2015 45
  • 49. Unexpected Lessons • We learnt some things from current offering of BDOSSP class – 40 online students from around the world • The hyperlinking of material caused students NOT to go through material systematically – Suggest go to structured hierarchy as in BDAA Course – Followed from use of Canvas as mundane LMS plus multiple web resources (Microsoft Office Mix and our computer support pages) • Students did not use email and discussion groups in Canvas; we switched to emails and list serves to the their main (not IU) email • Very erratic progress due to different time zones and interruption of full time job for each student – Difficult to have communal “help” sessions and to give interactive support at time student wanted • OpenStack fragile! 5/25/2015 49
  • 51. Background on MOOC’s • MOOC’s are a “disruptive force” in the educational environment – Coursera, Udacity, Khan Academy and many others • MOOC’s have courses and technologies • Google Course Builder and OpenEdX are open source MOOC technologies • Blackboard and others are learning management systems with (some) MOOC support • Coursera Udacity etc. have internal proprietary MOOC software • This software is LMS++ • LMS= Learning Management system 515/25/2015
  • 52. MOOC Style Implementations • Courses from commercial sources, universities and partnerships • Courses with 100,000 students (free) • Georgia Tech a leader in rigorous academic curriculum – MOOC style Masters in Computer Science (pay tuition, get regular GT degree) • Interesting way to package tutorial material for computers and software e.g. – E.g. Course online programming laboratories supported by MOOC modules on how to use system 525/25/2015
  • 54. MOOCs in SC community • Activities like CI-Tutor and HPC University are community activities that have collected much re-usable education material • MOOC’s naturally support re-use at lesson or higher level – e.g. include MPI on XSEDE MOOC as part of many parallel programming classes • Need to develop agreed ways to use backend servers (HPC or Cloud) to support MOOC laboratories – Students should be able to take MOOC classes from tablet or phone • Parts of MOOC’s (Units or Sections) can be used as modules to enhance classes in outreach activities 545/25/2015
  • 57. Potpourri of Online Technologies • Canvas (Indiana University Default): Best for interface with IU grading and records • Google Course Builder: Best for management and integration of components • Ad hoc web pages: alternative easy to build integration • Microsoft Mix: Simplest faculty preparation interface • Adobe Presenter/Camtasia: More powerful video preparation that support subtitles but not clearly needed • Google Community: Good social interaction support • YouTube: Best user interface for videos (without Mix PowerPoint support) • Hangout: Best for instructor-students online interactions (one instructor to 9 students with live feed). Hangout on air mixes live and streaming (30 second delay from archived YouTube) and more participants • OpenEdX at one time future of Google Course Builder and getting easier to use but still significant effort 575/25/2015
  • 58. Four Online Platforms I 5/25/2015 58 CourseBuilder OpenEdx IU Canvas OfficeMix Plugin for Powerpoint OpenSource Yes Yes No N/A Microsoft Integration (Office 365, Onedrive, Azure cloud) No Yes. Predicted to be included in the upcoming release. No N/A Analytics Some analytics included but not comprehensive. Still needs more development. No analytics included but there is a version 0 alpha release Analytics API available for use. External apps can be developed Very basic Very basic but more useful than Canvas Peer reviews Yes Yes Yes No
  • 59. Four Online Platforms II 5/25/2015 59 CourseBuilder OpenEdx IU Canvas OfficeMix Plugin for Powerpoint LTI Compliance (Learning Technologies Integration) Yes. CB as a LTI provider or consumer. Yes. Yes. Functionality might be limited by IU. N/A Ease of use and customization scale 5/10 for students, faculty, developers 7/10 – ease of use by students, faculty 3/10 – customization by developer N/A PowerPoint Slide labelled Videos could be an advantage Ease of Deployment 10/10 1/10 N/A N/A Cost Almost none; Can rise with increase usage of cloud transactions but usually a very low cost operation Very expensive to deploy and maintain the servers; Need a dedicated staff for administering servers; IU provided N/A 1 – not easy 10 – very easy
  • 60. Four Online Platforms III 5/25/2015 60 CourseBuilder OpenEdx IU Canvas OfficeMix Plugin for Powerpoint Unique Features and Functionality Skill maps BigQuery for Analytics Good UI for course administration; Integrated forums, grading, content area, and much more. Export/Import Grades Enables faculty to record their own videos and insert interactive content such as quizzes, programming test-bed etc. Common features Certificates Generation Supported; Quizzes; Assessments; Peer Reviews; Autograding Certificate Generation Supported; Quizzes; Assessments; Autograding; Quizzes; Assessments; Peer Review N/A
  • 62. Lessons / Insights • Data Science is a very healthy area • At IU, I expect to grow in interest although set up as a program has strange side effects • Not clear if Online education is taking off but may be distorted by US Company hiring practices • I teach all my classes – residential or online -- with online lectures • All of this straightforward but hard work • Current open source and proprietary MOOC software not very satisfactory; “easy” to do better • No reason to differentiate MOOC and general LMS 5/25/2015 62
  • 63. Details of Masters Degree Computational and Analytic Data Science track 635/25/2015
  • 64. Computational and Analytic Data Science track • Category 1: Core Courses • CSCI B503 Analysis of Algorithms • CSCI B555 Machine Learning OR INFO I590 Applied Machine Learning • CSCI B561 Advanced Database Concepts • STAT S520 Introduction to Statistics OR (New Course) Probabilistic Reasoning • Category 2: Data Systems • CSCI B534 Distributed Systems CSCI B561 Advanced Database Concepts, CSCI B662 Database Systems & Internal Design • CSCI B649 Cloud Computing CSCI B649 Advanced Topics in Privacy • CSCI P538 Computer Networks • INFO I533 Systems & Protocol Security & Information Assurance • ILS Z534: Information Retrieval: Theory and Practice 645/25/2015
  • 65. Computational and Analytic Data Science track • Category 3: Data Analysis • CSCI B565 Data Mining • CSCI B555 Machine Learning • INFO I590 Applied Machine Learning • INFO I590 Complex Networks and Their Applications • STAT S520 Introduction to Statistics • (New Course) Probabilistic Reasoning • (New Course CSCI) Algorithms for Big Data • Category 4: Elective Courses • CSCI B551 Elements of Artificial Intelligence • CSCI B553 Probabilistic Approaches to Artificial Intelligence • CSCI B659 Information Theory and Inference • CSCI B661 Database Theory and Systems Design • INFO I519 Introduction to Bioinformatics • INFO I520 Security For Networked Systems • INFO I529 Machine Learning in Bioinformatics • INFO I590 Relational Probabilistic Models • ILS Z637 - Information Visualization • Every course in 500/600 SOIC related to data that is not in the list • All courses from STAT that are 600 and above 655/25/2015
  • 66. Details of Masters Degree General Track 665/25/2015
  • 67. General Track: Areas I and II • I. Data analysis and statistics: gives students skills to develop and extend algorithms, statistical approaches, and visualization techniques for their explorations of large scale data. Topics include data mining, information retrieval, statistics, machine learning, and data visualization and will be examined from the perspective of “big data,” using examples from the application focus areas described in Section IV. • II. Data lifecycle: gives students an understanding of the data lifecycle, from digital birth to long-term preservation. Topics include data curation, data stewardship, issues related to retention and reproducibility, the role of the library and data archives in digital data preservation and scholarly communication and publication, and the organizational, policy, and social impacts of big data. 675/25/2015
  • 68. General Track: Areas III and IV • III. Data management and infrastructure: gives students skills to manage and support big data projects. Data have to be described, discovered, and actionable. In data science, issues of scale come to the fore, raising challenges of storage and large-scale computation. Topics in data management include semantics, metadata, cyberinfrastructure and cloud computing, databases and document stores, and security and privacy and are relevant to both data science and “big data” data science. • IV. Big data application domains: gives students experience with data analysis and decision making and is designed to equip them with the ability to derive insights from vast quantities and varieties of data. The teaching of data science, particularly its analytic aspects, is most effective when an application area is used as a focus of study. The degree will allow students to specialize in one or more application areas which include, but are not limited to Business analytics, Science informatics, Web science, Social data informatics, Health and Biomedical informatics. 685/25/2015
  • 69. I. Data Analysis and Statistics • CSCI B503 Analysis of Algorithms • CSCI B553 Probabilistic Approaches to Artificial Intelligence • CSCI B652: Computer Models of Symbolic Learning • CSCI B659 Information Theory and Inference • CSCI B551: Elements of Artificial Intelligence • CSCI B555: Machine Learning • CSCI B565: Data Mining • INFO I573: Programming for Science Informatics • INFO I590 Visual Analytics • INFO I590 Relational Probabilistic Models • INFO I590 Applied Machine Learning • ILS Z534: Information Retrieval: Theory and Practice • ILS Z604: Topics in Library and Information Science: Big Data Analysis for Web and Text • ILS Z637: Information Visualization • STAT S520 Intro to Statistics • STAT S670: Exploratory Data Analysis • STAT S675: Statistical Learning & High-Dimensional Data Analysis • (New Course CSCI) Algorithms for Big Data • (New Course CSCI) Probabilistic Reasoning • All courses from STAT that are 600 and above 695/25/2015
  • 70. II. Data Lifecycle • INFO I590: Data Provenance • INFO I590 Complex Systems • ILS Z604 Scholarly Communication • ILS Z636: Semantic Web • ILS Z652: Digital Libraries • ILS Z604: Data Curation • (New Course INFO): Social and Organizational Informatics of Big Data • (New Course ILS: Project Management for Data Science • (New Course ILS): Big Data Policy 705/25/2015
  • 71. III. Data Management and Infrastructure • CSCI B534: Distributed Systems • CSCI B552: Knowledge-Based Artificial Intelligence • CSCI B561: Advanced Database Concepts • CSCI B649: Cloud Computing (offered online) • CSCI B649 Advanced Topics in Privacy • CSCI B649: Topics in Systems: Cloud Computing for Data Intensive Sciences • CSCI B661: Database Theory and System Design • CSCI B662 Database Systems & Internal Design • CSCI B669: Scientific Data Management and Preservation • CSCI P536: Operating Systems • CSCI P538 Computer Networks • INFO I520 Security For Networked Systems • INFO I525: Organizational Informatics and Economics of Security • INFO I590 Complex Networks and their Applications • INFO I590: Topics in Informatics: Data Management for Big Data • INFO I590: Topics in Informatics: Big Data Open Source Software and Projects • ILS S511: Database • Every course in 500/600 SOIC related to data that is not in the list 715/25/2015
  • 72. IV. Application areas • CSCI B656: Web mining • CSCI B679: Topics in Scientific Computing: High Performance Computing • INFO I519 Introduction to Bioinformatics • INFO I529 Machine Learning in Bioinformatics • INFO I533 Systems & Protocol Security & Information Assurance • INFO I590: Topics in Informatics: Big Data Applications and Analytics • INFO I590: Topics in Informatics: Big Data in Drug Discovery, Health and Translational Medicine • ILS Z605: Internship in Data Science • Kelley School of Business: business analytics course(s) • Other courses from Indiana University e.g. Physics Data Analysis 725/25/2015
  • 73. Typical Paths through Degree 735/25/2015
  • 74. Technical Track of General DS Masters • Year 1 Semester 1: – INFO 590: Topics in Informatics: Big Data Applications and Analytics – ILS Z604: Big Data Analytics for Web and Text – STAT S520: Intro to Statistics • Year 1: Semester 2: – CSCI B661: Database Theory and System Design – ILS Z637: Information Visualization – STAT S670: Exploratory Data Analysis • Year 1: Summer: – CSCI B679: Topics in Scientific Computing: High Performance Computing • Year 2: Semester 3: – CSCI B555: Machine Learning – STAT S670: Exploratory Data Analysis – CSCI B649: Cloud Computing 745/25/2015
  • 75. Computational and Analytic Data Science track • Year 1 Semester 1: – B503 Analysis of Algorithms – B561 Advanced Database Concepts – S520 Introduction to Statistics • Year 1: Semester 2: – B649 Cloud Computing – Z534: Information Retrieval: Theory and Practice – B555 Machine Learning • Year 1: Summer: – ILS 605: Internship in Data Science • Year 2: Semester 3: – B565 Data Mining – I520 Security For Networked Systems – Z637 - Information Visualization 755/25/2015
  • 76. An Information-oriented Track • Year 1 Semester 1: – INFO 590: Topics in Informatics: Big Data Applications and Analytics – ILS Z604 Big Data Analytics for Web and Text. – STAT S520 Intro to Statistics • Year 1: Semester 2: – CSCI B661 Database Theory and System Design – ILS Z637: Information Visualization – ILS Z653: Semantic Web • Year 1: Summer: – ILS 605: Internship in Data Science • Year 2: Semester 3: – ILS Z604 Data Curation – ILS Z604 Scholarly Communication – INFO I590: Data Provenance 765/25/2015