Introduction to Data Science
Frank Kienle
Lecture overview
Personal Introduction
§ PhD and Habilitation at Technical University of Kaiserslautern, Germany
§ Lecturer 2008 – 2012 with a focus on the implementation (microelectronics)
of complex algorithms
§ 2013 – 2017 with Blue Yonder (www.blue-yonder.com), first as a senior data
scientist, later as director of data science consulting
§ Since 2014 Privatdozent at TUKL with a focus on teaching data science
practice
The lecture addresses students who are interested in big data, programming
skills, and business models. All three topics are covered; examples are
presented with respect to predictive models in Python.
The Internet of Things describes the change in technology in which modern
information technology penetrates all industrial processes. Here, every device,
machine, and sensor is connected to gather information.
The age of data gathering started about 10 years ago and is often referred to
as big data. Today, big data is any data that is expensive to manage and
hard to extract value from.
Predictive analytics is the art of extracting value from big data with the goal
of increasing industrial revenues.
Lecture Context
01.05.17 Frank Kienle p. 3
In this lecture we focus on predictive modeling (machine learning) in Python and
on how to solve the related business problem. Programming skills are mandatory
for a data scientist; thus, programming exercises have to be done by the students.
Predictive models forecast the future given historic data sets, which makes
machine learning mandatory. In this lecture we will use the scikit-learn
Python library to demonstrate pitfalls and best practices in solving a problem.
Note that full coverage of these topics is not possible. Thus, only basic
concepts are sketched using the Python programming language.
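The idea of forecasting from historic data can be illustrated with a minimal sketch. It deliberately uses only the Python standard library rather than scikit-learn, and everything in it is an assumption made for illustration: the weekly sales numbers are invented, and the helper names fit_line, predict and mean_absolute_error are hypothetical. The model is fitted on the older part of the history and checked on the most recent weeks, which were held back.

```python
# Minimal sketch: fit a linear trend on historic data and check it on
# held-out data. All numbers are invented for illustration.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Average absolute difference between prediction and observation."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical history: week index and units sold (exactly linear here).
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [12, 14, 16, 18, 20, 22, 24, 26]

# Time-ordered split: train on the older weeks, hold back the newest two.
train_x, test_x = weeks[:6], weeks[6:]
train_y, test_y = sales[:6], sales[6:]

model = fit_line(train_x, train_y)
holdout_error = mean_absolute_error(model, test_x, test_y)
forecast_week_9 = predict(model, 9)

print("slope, intercept:", model)
print("error on held-out weeks:", holdout_error)
print("forecast for week 9:", forecast_week_9)
```

Holding back the most recent observations mimics the real situation, in which the model has to predict data it has never seen. Real data is noisy, so the held-out error will normally be larger than in this idealized example.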
One of the chief pitfalls of data analysis is attempting to solve the wrong problem.
Thus, the lecture focuses heavily on the business side and on how to ask the
right data questions. People responsible for solving data science problems in
industry need to solve a business problem. This job profile is often called
data scientist.
'Data Scientist: The Sexiest Job of the 21st Century', HBR article @
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
Students' prerequisites:
Nowadays every topic can be found on the internet. Teaching facts and
testing them is not the purpose of the lecture. The idea is to widen the scope of
data science by working on 'real or artificial' use cases.
•  slides and online resources will be provided (see the information collateral uploaded at
http://de.slideshare.net/frankkienle )
•  important topics will be presented in compressed form within the lecture; however,
the information collateral already provides all mandatory information
•  discussions are always related to a use case; real data sets are used to demonstrate
problems and pitfalls (hacking skills in Python have to be developed)
•  active participation and an open discussion philosophy
•  doing the Python programming exercises is a prerequisite for the exam
Lecture Overview: applied inverted classroom concept
with a strong focus on teaching concepts
Teaching Facts
Teachers help students learn facts—that is, verifiable pieces of specific information.
Facts take a variety of forms, including definitions, names, dates, and formulae.
Sample question used when teaching facts: “What is this?”
Teaching Skills
Teachers also want students to learn skills. Skills are best considered a type of
learning that gets better with practice. Practicing programming will likely make
you more efficient (and perhaps more effective as well). Methods for teaching skills usually
involve practice in which the teacher gives quick feedback on the student's
performance. Sample feedback used when teaching skills: “That time was better.
Can you tell what you did differently?”
Teaching Facts, Skills, Concepts*
*https://people.ucsc.edu/~ktellez/facts-skills-con.html
Teaching Concepts
Teachers are generally most concerned with conceptual learning because it helps
learners to understand why.
Concepts are distinguished from facts in that they are a much broader, deeper type
of knowledge. Learning a concept should help the learner generalize from the
teaching context to other, different contexts.
Concepts are also different from facts and skills because they involve relationships
or processes. Teaching for concepts can take many forms.
One common method for conceptual development is the use of examples and
non-examples, with a focus on attributes/criteria for inclusion. Teachers also engage in
hypothetical questioning and systems analysis instruction for teaching concepts.
•  What is a data scientist?
•  Skill sets and different profiles of data scientists
•  Introduction to Big Data
•  Machine Learning (part 1 to 3)
•  Introduction to Databases
•  Programming/Hacking day: the goal is to enable a quick start for beginners and to give
hints to more advanced programmers
•  Use case preparation (programming work, mandatory homework)
•  Business Models/Business Frameworks
•  DevOps and professional environments
•  Data Science: best practices
Basic Building Blocks (many personal perspectives)
Additional Information Collateral
Building data science teams
Data science teams need people with the skills and curiosity to ask the big
questions.
@http://radar.oreilly.com/2011/09/building-data-science-teams.html
The Field Guide to Data Science (2015 version, recommended reading)
@https://www.boozallen.com/content/dam/boozallen_site/sig/pdf/publications/2015-field-guide-to-data-science-160211215115.pdf
Data Science Work/Overview
Why Software Is Eating The World
Marc Andreessen, August 20, 2011
@http://www.wsj.com/articles/… (recommended reading)
Big data: The next frontier for innovation, competition, and productivity
McKinsey 2011, full report
@http://www.mckinsey.com/insights/business_technology/…
The age of analytics: Competing in a data-driven world
McKinsey 2016, full report
@http://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world
Data Driven Business
Learning Python
http://docs.python-guide.org/en/latest/intro/learning/
Python Koans (recommended exercise)
https://bitbucket.org/gregmalcolm/python_koans
A Crash Course in Python for Scientists
http://nbviewer.jupyter.org/gist/rpmuller/5920182
Python Programming
All Python exercises/snippets are based on 4 sources
Introduction to Data Science
@https://www.coursera.org/course/datasci
(worth a look)
Full Topic: Relational Databases, Relational Algebra,
Full Topic: MapReduce,
NoSQL Introduction and Eventual Consistency
Machine Learning (Stanford)
@https://www.coursera.org/course/ml
(worth a look)
Topics I – IV, VII, X
Data Science/Machine Learning (online courses)
Handbook of Statistical Analysis and Data Mining Applications,
R. Nisbet, J. Elder, G. Miner, ISBN: 978-0-123747655
recommended reading (Chapter 20: Top 10 Data Mining Mistakes)
Data Analytics/ Data Science Books (high level books, easy reading)
Data Science for Business: What You Need to Know about Data Mining and
Data-Analytic Thinking
Foster Provost, Tom Fawcett, ISBN: 978-1449361327
Amazon Web Services (AWS) Tutorials
http://docs.aws.amazon.com/gettingstarted/latest/awsgsg-intro/gsg-aws-tutorials.html
Using Vagrant and Ansible (worth trying)
http://docs.ansible.com/guide_vagrant.html
Platforms/Deployment
