The document discusses Python, natural language processing (NLP), and Alteryx. It begins with defining Python and comparing it to other programming languages. It then discusses how NLP is used in different industries and defines Alteryx, explaining how it is used and why it is popular. The document provides an agenda and learning objectives for a session on these topics.
2. AGENDA
Define Python and compare it to other programming languages
Discuss NLP and how it is used in different industries
Define Alteryx, explain how it is used and why it is so popular today in the marketplace
11/25/2019
Learning Objectives
Disclaimer: The following slides do not reflect the opinions of my employer; they are my own personal opinions.
3.
4. ABOUT ME – FROM FINANCIAL PLANNING TO DATA SCIENCE
5. LinkedIn post
You are making progress even when you do not feel you are.
Grit
GFC
Reskill / Retrain
Be uncomfortable
Tip of the Iceberg
6. ABOUT ME – NEW LIFE IN DATA SCIENCE
7.
8. WHAT IS CRISP DM?
Cross Industry Standard Process for Data Mining
▪ Data Science
▪ Data Analytics
▪ Consulting
13. LAUNCH JUPYTER NOTEBOOK
Mac users: in Terminal, type jupyter notebook to launch your notebook.
14. PYTHON
Reproducibility
Turn your code into a slide deck for your boss
Open Source
Notebooks – code + output
Data Visualisation
Produce a report as a data analyst
Deep Learning
Machine Learning
20. CHECK WHICH JOIN IS APPROPRIATE FOR YOU
▪ Do you want to preserve user names on the left?
▪ How will you treat N/A’s?
▪ An inner join may reduce your data points
▪ What problem are you trying to answer?
https://javarevisited.blogspot.com/2012/11/how-to-join-three-tables-in-sql-query-mysql-sqlserver.html#axzz665Ch4Lx2
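To make the join questions above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The users and orders tables, their columns and their values are hypothetical, made up purely for illustration. It shows how an inner join drops the user with no matching order, while a left join preserves every user name on the left and fills the gap with a missing value.

```python
import sqlite3

# Hypothetical example tables: users (left) and orders (right).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cho');
    INSERT INTO orders VALUES (1, 20.0), (3, 35.5);
""")

# Inner join: only users that have a matching order survive.
inner_rows = conn.execute(
    "SELECT u.name, o.amount FROM users u "
    "INNER JOIN orders o ON u.user_id = o.user_id "
    "ORDER BY u.user_id"
).fetchall()

# Left join: every user is kept; a missing order shows up as None (SQL NULL).
left_rows = conn.execute(
    "SELECT u.name, o.amount FROM users u "
    "LEFT JOIN orders o ON u.user_id = o.user_id "
    "ORDER BY u.user_id"
).fetchall()

print(inner_rows)  # [('Ana', 20.0), ('Cho', 35.5)]
print(left_rows)   # [('Ana', 20.0), ('Ben', None), ('Cho', 35.5)]
```

Which result is right depends on the problem you are trying to answer: the inner join has fewer data points, the left join has an N/A you must decide how to treat.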
21. WRITE PYTHON FILE TO R
Write to CSV and read into R
Call Python from R Markdown
▪ https://rstudio.github.io/reticulate/
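The first option (write to CSV and read into R) could look something like the sketch below. It uses only Python's standard csv module. The rows, the column names and the file name python_output.csv are assumptions for illustration, not taken from the session.

```python
import csv
import os
import tempfile

# Hypothetical rows produced by some Python analysis step.
rows = [
    {"complaint_id": 1, "category": "billing"},
    {"complaint_id": 2, "category": "service"},
]


def write_rows_to_csv(rows, path):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Written to the system temp directory here; any shared path would do.
csv_path = os.path.join(tempfile.gettempdir(), "python_output.csv")
write_rows_to_csv(rows, csv_path)

# On the R side, the file could then be loaded with:
#   df <- read.csv("<path to python_output.csv>")
```

The CSV route keeps the two languages fully decoupled; reticulate (linked above) is the alternative when you want Python and R in the same R Markdown document.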
29. MODEL EVALUATION METRICS
The confusion matrix is a breakdown of predictions into a table showing correct predictions and the types of incorrect predictions made. Ideally, you will only see non-zero numbers on the diagonal, which means that all your predictions were correct!
● Precision is a measure of a classifier’s exactness: of the cases predicted positive, the share that really are positive. The higher the precision, the fewer false positives.
● Recall is a measure of a classifier’s completeness: of the actual positive cases, the share the classifier finds. The higher the recall, the more cases the classifier covers.
● The F1 Score or F-score is the harmonic mean of precision and recall.
● Area under the ROC curve (AUC) summarises how well the classifier ranks positive cases above negative ones across all thresholds.
Source: Udacity
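As a worked illustration of the metrics above, here is a small sketch in plain Python for the binary case. The labels in y_true and y_pred are made-up example data, and the helper names are hypothetical. In practice a library such as scikit-learn would normally do this for you.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = harmonic mean."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print(tp, fp, fn, tn)         # 3 1 1 3
print(precision, recall, f1)  # 0.75 0.75 0.75
```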
32. OBTAIN THE DATA
▪ When a new complaint comes in, we want to assign it to one of 4 categories. The classifier assumes that each new complaint is assigned to one and only one category.
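The "one and only one category" assumption can be sketched as picking the single best-scoring category for each complaint. The example below is a toy keyword counter, not a trained model and not the method used in the session. The four category names and their keywords are hypothetical.

```python
# Hypothetical categories and keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "billing": {"charge", "fee", "invoice"},
    "credit_card": {"card", "limit"},
    "loan": {"loan", "repayment", "interest"},
    "service": {"rude", "wait", "staff"},
}


def assign_category(complaint):
    """Return exactly one category: the one with the most keyword hits."""
    words = set(complaint.lower().split())
    scores = {
        category: len(words & keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    # max() returns a single key, so each complaint gets one category only;
    # ties go to the category listed first in CATEGORY_KEYWORDS.
    return max(scores, key=scores.get)


print(assign_category("I was charged a late fee on my invoice"))
print(assign_category("The staff were rude"))
```

A real classifier would replace the keyword counts with learned scores, but the final step is the same: take the single highest-scoring category.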
33. WHAT IS CRISP DM?
Cross Industry Standard Process for Data Mining
Data Understanding
Obtain Data
38. ALTERYX DESIGNER
Excel on steroids
Self-service analytics – you do not need to code
What can it do?
Workflow Optimization – compress file size, speed up processing
Data Preparation and Data Blending (joins)
Writing fast and accurate expressions
Using Macros in workflows
Containers
Data Parsing – Text to Columns
Outputs files to CSV, an Alteryx workflow and even a Tableau Hyper file
Predictive analytics
39.
▪ Open source languages Python and R do not appear in Gartner’s 2018 Magic Quadrant
▪ Alteryx is a ‘challenger’ in machine learning and data science
Machine Learning Platforms
40.
Choosing the right Alteryx Tool
https://community.alteryx.com/t5/Alteryx-Community-Resources/Designer-Cheat-Sheet/ta-p/371793?attachment-id=36217
42. WEBSCRAPING
Python and Alteryx
We know you want to impress your boss and do web scraping
First check whether your company allows you to scrape someone else’s data
Speak with your business risk team
You may be breaching copyright law or some companies’ terms and conditions
Some websites are locked down by organizations
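One technical signal you can check from Python is a site's robots.txt file, which states which paths automated clients are asked to stay away from. The sketch below uses the standard library's urllib.robotparser on a made-up robots.txt. The rules and the example.com URLs are hypothetical. It is only one input and does not replace the checks with your company and business risk team described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at <site>/robots.txt.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch(useragent, url) reports whether the rules permit fetching the URL.
allowed = parser.can_fetch("*", "https://example.com/public/page.html")
blocked = parser.can_fetch("*", "https://example.com/private/data.html")

print(allowed)  # True
print(blocked)  # False
```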
47. MY TWO CENTS – DATA SCIENCE CAREER TIPS
Meetup.com, e.g. data science, data engineering, R-Ladies, Sydney Women in Machine Learning and Data Science, Women Who Code, PyData, Docker, Kubernetes, AWS
Lunch & Learn
Hackathons - Data Science is a team sport
Data Science community – Get Involved
Data Science blogs
Learn Python, R and SQL
Internship
48. Job Boards – Buyer Beware
Ask the recruiter what duties you will be performing in your data scientist role
Ask the hiring manager what duties you will be performing in your data scientist role
Do not proceed to a further interview if the hiring manager says ‘yes you will be doing reporting and we may do data science in 18 months’ time’ - Run for your life!
Keep on applying for data science roles, get a mentor and speak to the community. Do you have peace?
Data Science Interviews
Roles and responsibilities
Skills & Experience