6. Introductions
n Application Developer, Web Group
n Concurrent Instructor
q Building Web Applications (MGTI)
q Data Analysis with Python (ITAO)
q Survey of Software Engineering (ITAO)
n At Notre Dame
q 8 years in OIT
q 2 years in Mendoza (Senior Web Developer)
q 5+ years in MarComm
7. Course Overview: Resources
n Al Sweigart (2017) Automate the Boring Stuff with Python (ABSP)
q E-copy is available for free at https://automatetheboringstuff.com
n Jake VanderPlas (2016) Python Data Science Handbook: Essential
Tools for Working with Data (PDSH) (O'Reilly Media)
q Available in bookstore
q Available from online shops (a free online version also exists)
n Access to Vocareum, a cloud-based class management system
REFERENCE
n Wes McKinney (2017) Python for Data Analysis, 2nd Edition (PyDA)
(O’Reilly Media)
10. Course Overview: Vocareum for Online Homework
n In order to enroll in the class on Vocareum you must
click and finish Homework 0 on Sakai.
q Remember there are two parts to submit
n This semester, access to Vocareum is provided
free to Notre Dame students through funding from
the Provost Learning Initiative grant.
n Each set of assignments is due on a given day
q Do not wait until the last day to finish the set
q A penalty is applied for late submissions (see the syllabus for
more details)
11. Course Overview: Objectives
n Analyzing data is all about asking the right questions and then
presenting your results in the best way possible to get your message
across.
q Understand the fundamentals of the Python programming language and learn to
code.
q Gain hands-on experience in data analysis by working with large amounts of data
in Python.
q Write programs to collect, process, and store data for future analysis.
q Apply data analytics tools to extract relevant information and present results as
business reports.
q Tell a story with the data, using short summaries and
visualizations.
12. Course Overview: Semi-Flip Approach
n Specific links and pages from the textbook
are presented at the end of each class.
q You need to practice them before you come to the
next class.
q If you are seeing the material for the first
time in class, it can be very confusing.
n Most classes are split between
lecture slides/notes and in-class practice
sessions.
13. Course Overview: Lab Computer Usage
n We will be using the lab computers
q Hands-on experience
q In-class assignments
q Exams
n Using any application other than the stipulated software
(such as email, messaging, or social media) is strictly
prohibited.
n Any information that is displayed on these screens
during the class time is considered public.
q I can use software to project any desktop that is being
used!!
14. Course Overview: Lab Computer Usage
n Just Joking!
n It is highly recommended that you use your own
laptop for homework as well as in-class
learning and exams
n Download Anaconda for Free
n Easy to install on both Windows and Mac
n Contains everything you need to complete
this course
15. Course Overview: Grading
n Online (Vocareum) Homework
q 20%
n Group Project
q 20%
n In-Class Assignments (3)
q 15%
n Midterm Exam
q 20%
n Final Exam
q 25%
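As a worked example of how the weights above combine, here is a minimal sketch. The weights come from this slide; the component scores and the names `WEIGHTS`, `course_grade`, and `example_scores` are hypothetical, chosen only for illustration. The syllabus remains the authority on how grades are actually computed.

```python
# Weights taken from the grading slide; they sum to 100%.
WEIGHTS = {
    "homework": 0.20,
    "group_project": 0.20,
    "in_class_assignments": 0.15,
    "midterm": 0.20,
    "final": 0.25,
}


def course_grade(scores):
    """Return the weighted course grade for component scores on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "homework": 90,
    "group_project": 85,
    "in_class_assignments": 80,
    "midterm": 75,
    "final": 88,
}

# 18 + 17 + 12 + 15 + 22 = 84
print(round(course_grade(example_scores), 2))
```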
16. Course Overview: Topics
Fundamentals of Python Programming
• Python basics
• IPython notebook
• Flow Control
• Functions
• Lists and Dictionaries
Data Manipulation, Wrangling, and Visualization with Python
• Data Manipulation
• NumPy
• Pandas
• SciPy
• Data Visualization
• Matplotlib
• Web Scraping
Advanced Data Analysis with Python
• Introduction to ML
• scikit-learn
• Linear Regression
• Simple classification
• Clustering
17. Provide Feedback – Early and Often
n Constructive and actionable feedback is
always welcome throughout the semester.
n Share your feedback personally (email, Slack
or office hours).
q If you don’t feel comfortable doing this you can
use the anonymous feedback form on Sakai.
20. Who is a Data Analyst?
Data Scientist (n.): Person who is better at statistics than any
software engineer and better at software engineering than any
statistician.
-- Josh Wills
http://www.datasciencecentral.com
24. What is a program?
n A set of instructions for a computer to
perform a task
Let us change our traditional attitude to the
construction of programs: Instead of imagining
that our main task is to instruct a computer
what to do, let us concentrate rather on
explaining to humans what we want the
computer to do.
Donald E. Knuth, Literate Programming, 1984
25. What is a program?
n A set of instructions for a computer to
perform a task
n Typically a program
q Takes input data
q Processes the data according to its instructions
q Produces an output
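The three parts listed above can be seen in a very small Python program. This is only an illustrative sketch: the data values and the names `average` and `daily_sales` are made up, and a real program would more likely read its input from a file or from the user.

```python
def average(numbers):
    """Process step: compute the arithmetic mean of a list of numbers."""
    return sum(numbers) / len(numbers)


# Input: hypothetical data, hard-coded here for simplicity.
daily_sales = [120, 95, 143, 110]

# Process: apply the instructions to the input data.
result = average(daily_sales)

# Output: show the result to the user.
print("Average daily sales:", result)
```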
27. Python Programming
n Python is one of the most popular dynamic
languages, along with R, Ruby, Julia, etc.
n It is widely used by data scientists in both
academia and industry.
n Compared to lower-level programming
languages like C/C++: “Python reads like
kindergarten math and is easy on the layman’s
eye. It requires less code to complete basic
tasks, making it an economical language to
learn.”
29. Why Python?
n Readable and structured code
q Easy to learn
n It has a vibrant open source community.
q This means it is continuously evolving and getting
better every day.
n A rich ecosystem of libraries makes it an ideal and
essential language to learn in the analytics
domain.
q NumPy, SciPy, Pandas, Matplotlib, NLTK, Django
30. Setting Expectations for this course
n Students will become familiar with Python as a language and
learn techniques for data analytics, including:
q Data Collection
q Data Extraction and Manipulation
q Data Analysis methods
n This course does not teach
q Software engineering techniques
q App development
q Object-oriented programming concepts
q Check out Survey of Software Engineering if you are interested
n I want to make sure you learn enough Python and data analytics
to be able to do more advanced content by yourself.
31. Python 2 vs Python 3
n The Python community is undergoing a
transition from the Python 2 series to the Python 3
series
n Most simple code runs on both, but Python 3 is not fully backward compatible with Python 2 (!)
n We’ll learn Python 3
q Anytime you install or create code, remember to
always use Python 3
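A quick way to confirm which series you are running is sketched below, together with two well-known Python 3 behaviors: `print` is a function, and `/` performs true division. The variable names are illustrative only.

```python
import sys

# Confirm the interpreter is Python 3 before running course code.
assert sys.version_info.major == 3, "This course uses Python 3"

# print is a function in Python 3 (in Python 2 it was a statement).
print("Running Python", sys.version_info.major)

# In Python 3, / is true division and // is floor division.
true_quotient = 7 / 2    # 3.5
floor_quotient = 7 // 2  # 3
print(true_quotient, floor_quotient)
```

In Python 2, `7 / 2` with two integers would give `3`, which is one reason code written for one series may behave differently on the other.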
32. Programming Resources
n Python Tutor
q http://pythontutor.com/
q This is the best way to visualize your code and
see how the code gets executed step by step.
q Strongly recommended for first-time learners.
33. Additional Online Resources
n A Byte of Python
q https://python.swaroopch.com/
n LinkedIn Learning (Lynda)
q A playlist of videos is available on Sakai
n DataCamp
q Intro to Python for Data Science (link available on Sakai)
q Intermediate Python for Data Science (link available on Sakai)
n This is a fast-paced course. These additional resources are
strongly recommended to complement in-class lectures if you
have never programmed before.
34. Best way to learn Python
n Practice, then practice and then practice more!
n Google and Stack Overflow are your best
friends!
n Programming is a creative activity
n You have to learn to break down a problem into
smaller tasks
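As a hedged illustration of breaking a problem into smaller tasks, the sketch below answers "which word appears most often in a sentence?" by splitting the work into three small functions. The function names and the sample sentence are hypothetical, and this is one possible decomposition, not the only one.

```python
from collections import Counter


def normalize(text):
    """Task 1: lowercase the text and drop punctuation."""
    return "".join(ch.lower() for ch in text if ch.isalnum() or ch.isspace())


def tokenize(text):
    """Task 2: split the text into words."""
    return text.split()


def most_common_word(text):
    """Task 3: combine the smaller tasks to answer the original question."""
    counts = Counter(tokenize(normalize(text)))
    return counts.most_common(1)[0]


sample = "Practice, then practice, and then practice more!"
print(most_common_word(sample))  # ('practice', 3)
```

Each small function can be written and tested on its own before the pieces are combined.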