This document discusses trends in data science and the use of Python. It provides an overview of WeCloudData's education and training programs in data science, machine learning, big data, cloud computing, and artificial intelligence. It describes various part-time and full-time learning paths covering topics such as Python, SQL, machine learning algorithms, deep learning, data engineering, big data tools and platforms, and cloud computing with AWS. It also includes information on career services and past student outcomes like job placements and salaries.
1. Python for Data Science
Trends and Use Cases
WeCloudData
@WeCloudData | weclouddata | tordatascience
2. WeCloudData
Education | Career | Consulting
• Analytics Bootcamp
• Career Services
• Mentorship
• ISA
• Diploma Programs
• Part-time Programs
• Corporate Training
• Data Science
• Machine Learning
• Big Data
• Cloud
3. DS Career Panel
Join our kick a**
instructor team!
Help
corporates
upskill their
employees
Data/Cloud
Skills Training
for Canadians
DS Diploma
Toronto Institute of
Data Science
Training
Reskill | Upskill
AI Bootcamp
Communities
Meetups
Networking
Hiring Event
AI Expert
Instructing
Consulting
DS/AI
DE
Cloud
6 months
Success rate
89%
75k Salary
Project-based
Training
Upcoming Events
40% Referral by WCD
Bring real-world
client projects to
the classroom
Apache Spark Event
ML/AI Workshops
4. Data Science Part-Time
Learning Path
Prerequisites
• Python Foundation
• SQL for Data Science
Data Science Learning Path
Data Science w/ Python
• Data wrangling
• Data Visualization
• Predictive Modeling
Applied ML
• ML algorithms
• 2 Projects
• Interview Practice
Big Data
• Big data tools
• ML at scale
• ML deployment
• Job referrals
5. Scala & Spark for DE
Linux Command Line
Docker | Kubernetes
Scala Programming
Spark In Depth
ETL for DE
Hadoop | Hive | Presto
Data Ingestion & Integration
Talend
Airflow & Pipelines
Real-time Analytics
Apache Kafka
Spark Streaming
Apache Flink
Apache Beam
Spark for DE | Big Data & ETL | Real-time Analytics
Learn to build data pipelines, scale data processing with big data tools, and deploy real-time applications and machine learning models at scale.
Data Engineering
Learning Path
Data Engineering Part-Time
Part-time Program
6. AWS Big Data - Part-Time
Learning Path
Learn AWS big data tools and
platforms and get certified as AWS
Certified Big Data Specialist
Cloud Computing
AWS Track
Learn AWS Big
Data Tools
Hands-on
Project
Certification
Exam Prep
02/02/2020 | 10/12/2019
Learn AWS
Solution Architect
Hands-on
Project
Certification
Exam Prep
7. Applied Deep Learning
Applied AI – Part-Time
Learning Path
Artificial Intelligence
Program
Deep Learning for NLP
Deep Learning Capstone
Machine Learning in Healthcare
https://www.youtube.com/watch?v=39rSzfpYsvA
8. Landing a Data Scientist Job
Key Factors
P(Get Interview) = 0.4 S + 0.25 E + 0.25 R + 0.1 N
P(Ace Interview) = 0.25 S + 0.3 C + 0.4 B + 0.05 P
P(Offer) = P(Get Interview) × P(Ace Interview)
S: Skills
E: Experience
R: Resume
N: Network
C: Communication
B: Business Cases
P: Preparation
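The weighted-factor model on this slide can be tried out numerically. The following is a minimal sketch, assuming each factor is scored between 0 and 1; the weights come from the slide, while the `candidate` scores and the function names are hypothetical and chosen only for illustration.

```python
# Weights as listed on the slide. Factor scores are assumed to lie in [0, 1].
INTERVIEW_WEIGHTS = {"S": 0.40, "E": 0.25, "R": 0.25, "N": 0.10}
ACE_WEIGHTS = {"S": 0.25, "C": 0.30, "B": 0.40, "P": 0.05}


def weighted_score(weights, scores):
    """Return the weighted sum of the factor scores named in `weights`."""
    return sum(weight * scores[factor] for factor, weight in weights.items())


def p_offer(scores):
    """P(Offer) = P(Get Interview) x P(Ace Interview)."""
    p_interview = weighted_score(INTERVIEW_WEIGHTS, scores)
    p_ace = weighted_score(ACE_WEIGHTS, scores)
    return p_interview * p_ace


# Hypothetical candidate: strong skills, weaker network and business cases.
candidate = {"S": 0.8, "E": 0.5, "R": 0.6, "N": 0.4, "C": 0.7, "B": 0.5, "P": 0.9}
print(round(p_offer(candidate), 3))
```

Because Skills carries weight in both factors, raising `S` moves both terms of the product, which is one way to read the slide's emphasis on skills.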
11. Prerequisites
• Python Foundation
• SQL for Data Science
Training: Data Science Immersive (PCC Approved Diploma Program)
Data Science w/ Python
• Data wrangling
• Data Visualization
• Predictive Modeling
Applied ML
• ML algorithms
• 2 Projects
• Interview Practice
Big Data
• Big data tools
• ML at scale
• ML deployment
• Job referrals
+ Experience
• Industry Intern
• Consulting Project
+ Career Support
• Resume
• Referral (50%)
P(Get Interview) = 0.4 S + 0.25 E + 0.25 R + 0.1 N
P(Ace Interview) = 0.25 S + 0.3 C + 0.4 B + 0.05 P
S: Skills, E: Experience, R: Resume, N: Network, C: Communication, B: Business Cases, P: Preparation
12. Data Science Diploma Program – Jan 2020
Syllabus
12-week Diploma Program
W1–W3
Python
• Py: Basics
• Py: DataTypes
• Py: Strings
• Py: Functions
• Py: Class
• Py: IDEs (PyCharm)
Learning to Code
• SQL
• Linux | Docker
• GitHub
• AWS
Data Science w/ Python
• Py: Functions
• Py: Class/OOP
• DS: Numpy
• DS: Pandas
• DS: Viz
• DS: API
• Scraping Project
W4
ML: Classifier
• ML: KNN
• ML: Logistic
• ML: SVM
• ML: Evaluation
• ML: Cross-val
W5
ML: Classifier
• ML: Trees
• ML: Ensembles
• ML: Tuning
• ML: Imbalanced
• ML: Pipeline
W6
Review Week
• Review
• SQL Quiz
• ML Quiz
• Interview Practice
• ML Project #1
W7
ML: Regression
• Py: Pandas Adv
• ML: Stats
• ML: Linear Algebra
• ML: Optimization
• ML: Regression
W8
ML: Clustering/NLP
• ML: Text Processing
• ML: Topic Model
• ML: Clustering
• ML: Dimension Reduction
• Interview Practice
• Client Project Kickoff
W9
ML: Neural Net
• ML: Neural Net
• ML: Keras
• ML: CNN
• ML Project #2
• Interview Practice
W10–W12
Big Data
• BD: Hadoop
• BD: Hive
• BD: SQL on Hadoop
• BD: Spark
Big Data
• BD: Spark DF
• BD: NoSQL
• Interview Practice
Big Data
• Big Data Project
• Spark Machine Learning
• Model Deployment
• Rest API
• Model in Production
13. Data Science Diploma Program – Jan 2020
Syllabus
Client Project Career/Referral
Other Bootcamps
15. Hands-on Project
Bring real industry-level project experience to the classroom
By working on real projects, we mean:
• You will help startups set up data pipelines in AWS
• You will work on forecast models to optimize inventories for hundreds of millions of device sales
• Your customer segmentation models will shape how a startup manages marketing campaigns
• You will help the client save AWS cost by 200% by migrating computing to Apache Spark
• Your machine learning models will help companies retain high-value customers
• Your work will be presented to the CEOs
16. 153k 13
Market
Research
Student Success
Job Placement
6 months: 89%
2 months: 56%
Data Scientist
Security Analyst
Senior Analyst
Data Scientist
Data Engineer
70k 0 New grad
98k 2 FSA
73k 0 New Grad
63k 0 New Grad
78k 3 PWC
Sr Data Scientist
83k Salary
50%
Referral
by WCD
120k 13 QA
Data Scientist
Data Scientist 80k 2 Data Analyst
100k 0 Geology (New Grad)
Data Scientist
70k 0 Statistics (New Grad)
ML Engineer
20. Data Scientist
The Types
Operational DS
Focus: data wrangling, working with large/small messy data, building predictive models
Strength: data handling, tools, business knowledge
ML Engineer
Focus: ML model deployment, data pipelines
Strength: coding, algorithms, machine learning, platforms and tools
ML Researcher
Focus: algorithm development, research, IP
Strength: ML/DL algorithms, implementation, research
DS Product Manager
Focus: product strategy, business communications, project management
Strength: product sense, business requirements, DS acumen
32. Resources
Python
Coding Practice
Coding & Interviews
• LeetCode
• HackerRank
Book Statistics Online Courses
Udemy
• Complete Python Bootcamp
Datacamp
• Introduction to Python
33. Data Science
Importance of foundations
Data Science
Machine Learning
Big Data
Data Engineering
Deep Learning
ML Engineering
Focus on one programming language at a time
• Get good at it
Must have skills
• Python
• SQL
34. Data Science
What’s next?
Prerequisites
• Python Foundation
• SQL for Data Science
Data Science Learning Path
Data Science w/ Python
• Data wrangling
• Data Visualization
• Predictive Modeling
Applied ML
• ML algorithms
• 2 Projects
• Interview Practice
Big Data
• Big data tools
• ML at scale
• ML deployment
• Job referrals
Nov 16 | Nov 3 | Nov 16 | Nov 23 | Oct 19
37. Why Python?
Python in a data science project
[Figure: Data → Data Science Pipelines → Data Product. Pipeline stages: Parse/Filter, Classify, Synthesize. Pipeline components: URL Parsing, POI Database, Context Extraction, Topic Modeling, Content Classification, Location Classify, Signal Aggregation, Taste Merging, Taste Scoring, POI Context Builder, Rule-based Predictor, ML Predictor, Location Attributes, Home/Work Predictor, Co-location Location Graph, POI Prediction, Ontology/Taxonomy, Contexts. Data products: Targeting, Profiles, Personalization.]
Python libraries
• sklearn
• gensim
• nltk
• mrjob
• PySpark
38. Python Data Management
Structured Data with Pandas DataFrame
Row Index    Area     Population
California   423967   38332521
Florida      170312   19552860
Illinois     149995   12882135
New York     141297   19651127
Texas        695662   26448193
[Figure: anatomy of a DataFrame. Each row is labelled by a row index, each column by a column index, and the cells hold the values.]
# row access returns Series
states.loc['Florida']
area          170312
population    19552860
# column access returns Series
states['area']
California    423967
Florida       170312
Illinois      149995
New York      141297
Texas         695662
# index based selection returns DataFrame
states.iloc[1:3, :1]
Row Index    Area
Florida      170312
Illinois     149995
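The three access patterns on this slide can be reproduced with a small runnable sketch. It assumes the smaller figures (423967 and so on) are areas and the larger ones (38332521 and so on) are populations; the lowercase column names `area` and `population` follow the `states['area']` call on the slide.

```python
import pandas as pd

# Rebuild the slide's table; the row index holds the state names.
states = pd.DataFrame(
    {
        "area": [423967, 170312, 149995, 141297, 695662],
        "population": [38332521, 19552860, 12882135, 19651127, 26448193],
    },
    index=["California", "Florida", "Illinois", "New York", "Texas"],
)

# Label-based row access returns a Series indexed by column name.
florida = states.loc["Florida"]

# Column access returns a Series indexed by state.
area = states["area"]

# Position-based selection: rows at positions 1 and 2, first column only.
# Slicing keeps two dimensions, so the result is a DataFrame.
subset = states.iloc[1:3, :1]

print(type(florida).__name__, type(area).__name__, type(subset).__name__)
```

Note that `iloc` slices exclude the end position, so `1:3` selects Florida and Illinois but not New York.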
39. Python Data Management
Pandas - GroupBy()
Input (DataFrame)
City       Ticket Sales
Toronto    100
Montreal   50
Toronto    20
Halifax    40
Montreal   30
Halifax    60
Split (DataFrameGroupBy)
Toronto    100, 20
Montreal   50, 30
Halifax    40, 60
Apply (mean) (DataFrameGroupBy)
Toronto    60
Montreal   40
Halifax    50
Combine (DataFrame)
City       Ticket Sales
Toronto    60
Montreal   40
Halifax    50
import pandas as pd

# Input DataFrame for the GroupBy example
df = pd.DataFrame({'city': ['Toronto', 'Montreal', 'Toronto', 'Halifax',
                            'Montreal', 'Halifax'],
                   'sales': [100, 50, 20, 40, 30, 60]})
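The split-apply-combine steps can be sketched on the same data. The per-city results shown on the slide (60, 40, 50) match the group mean rather than the group sum, so this sketch computes both for comparison; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Toronto", "Montreal", "Toronto", "Halifax", "Montreal", "Halifax"],
    "sales": [100, 50, 20, 40, 30, 60],
})

# Split: group rows by city (lazy, nothing is computed yet).
grouped = df.groupby("city")

# Apply + combine: aggregate each group and collect one row per city.
mean_sales = grouped["sales"].mean()
total_sales = grouped["sales"].sum()

# reset_index turns the city index back into a regular column.
summary = mean_sales.reset_index()
print(summary)
```

By default `groupby` sorts the group keys, so the combined result is ordered Halifax, Montreal, Toronto rather than in order of first appearance.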
40. Python Data Management
Pandas - Join/Merge
# 1-to-1 join
pd.merge(employee, hr, how='inner', on='employee')
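A runnable version of the merge call needs two frames that share an `employee` column. The sketch below invents both: the names, the `group` column, and the `hire_year` column are hypothetical sample data, not taken from the slides.

```python
import pandas as pd

# Hypothetical sample frames; each employee appears at most once per frame.
employee = pd.DataFrame({
    "employee": ["Ana", "Ben", "Cy"],
    "group": ["Eng", "Ops", "Eng"],
})
hr = pd.DataFrame({
    "employee": ["Ben", "Ana", "Dee"],
    "hire_year": [2015, 2018, 2020],
})

# Inner join keeps only the keys present in both frames (Ana and Ben).
# validate="one_to_one" raises an error if either side has duplicate keys.
joined = pd.merge(employee, hr, how="inner", on="employee",
                  validate="one_to_one")
print(joined)
```

Switching `how` to `"left"`, `"right"`, or `"outer"` keeps unmatched rows from one or both sides and fills the missing columns with NaN.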
Other features
• Pivot Tables
• Window Functions
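The two other features named here can be illustrated briefly. This is a sketch on hypothetical quarterly sales data (the `quarter` column and its values are assumptions), showing one pivot table and two window-style calculations.

```python
import pandas as pd

# Hypothetical data: one row per city and quarter.
sales = pd.DataFrame({
    "city": ["Toronto", "Toronto", "Montreal", "Montreal"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "sales": [100, 20, 50, 30],
})

# Pivot table: cities as rows, quarters as columns, summed sales as values.
pivot = sales.pivot_table(index="city", columns="quarter",
                          values="sales", aggfunc="sum")

# Window functions: running total and 2-period rolling mean within each city.
sales["running_total"] = sales.groupby("city")["sales"].cumsum()
sales["rolling_mean"] = sales.groupby("city")["sales"].transform(
    lambda s: s.rolling(2, min_periods=1).mean()
)
print(pivot)
print(sales)
```

Unlike a plain aggregation, the window calculations return one value per input row, so the results can be attached as new columns.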
57. Trends
Model Deployment in Cloud
1. ETL on EMR using Spark
2. Save Model to S3: s3://weclouddata/models/gbm20190612
3. Start notebook instance and deploy model
4. Start SageMaker Spark container for prediction API
[Diagram components: SageMaker, EMR, ECR, S3, Notebook, Transform, Inference, SageMaker Spark ML Container]
61. Python Programming
Why Python?
• Python is the most popular data science and AI programming language
• Many employers prefer candidates with Python skills
• Mastering Python opens up not only Data Scientist jobs, but also Data Engineer and DevOps roles
62. Python Programming
Syllabus
Pre-course: Installation & Preview
• Python Installation
• Jupyter Introduction
• Python Introduction
• DS Introduction
• Twitter Dev API Setup
Day 1: Python Basics
• Python use cases
• Branching
• Loops
• Data Types: list, tuple, set
• Functions
• Lab: social media analytics with Python
Day 2: Intermediate Python
• Data Types: String
• Data Types: Dictionary
• Comprehensions
• Regular expression
• Modules & Packages
• Class and Object
• Interview – Prepare for Python Interview Tests
• Lab – Class and object
Day 3: Python Data Analysis
• Pandas introduction
• Intro to visualization with Python
• Accessing database with Python
• Use case: Python for ML and AI
• Project: Building your first ML algorithm with Python
63. Data Science with Python
Syllabus (Weekend Cohort – 8 sessions/32 hours)
Self-paced Lectures
• Python review
• Intro to Pandas
• Intro to Visualization
W1–W6
Data Collection
• Web scraping basics
• BeautifulSoup
• Selenium
• Project #1 kickoff
• Matplotlib Review
EDA & Data Visualizations
• Project #1 Presentation
• Seaborn | Plotly
• Map Visualization
• Building analytics dashboard with Dash
• Project #2 kickoff
Predictive Modeling
• Project #2 Presentation
• Predictive modeling lifecycle
• Introduction to sklearn
• Regression analysis
Data Science
• Data Science Intro
• Analyze Toronto Open Data
Data Wrangling
• Advanced Pandas (Merge and Joins)
• Advanced Aggregations
• Querying databases
• Reporting with Pandas and Pivot
Statistics and Linear Algebra
• Intro to Statistics and Linear Algebra
• Scipy for statistics
• Numpy for linear algebra
• Time series forecasting with Prophet
W7
Predictive Modeling
• Classification models
• Model evaluation
• Predicting Toronto TTC delay using Sklearn
W8
Final Review
• Course review
• Introduction to Machine Learning
• Introduction to Big Data
64. Data Science with Python
Hands-on Projects
This course is instructor-led and project-based. Students will be able to apply the data science
skills acquired during the lectures to 2 hands-on projects. The 2 projects will make your
2 Data Science Projects
• Web Data Analytics
• Data Storytelling (Dashboard + Heroku Deployment)
Project #1: Web Data Analysis
• Data Collection: BeautifulSoup, Selenium
• Data Cleaning: Pandas, Matplotlib
• Data Analysis: Matplotlib, Pandas, SQLAlchemy
Project #2: Data Storytelling
• Visualization: Dash, Plotly
• App Deployment: Heroku, Flask
• Storytelling: Insight Analysis, Presentations
Project 2 Demo
68. Data Science with Python
Student Project Demo
Fishing Online
1. Open a webpage and get HTML of search results page
2. Locate page elements by XPath
3. Extract target data
4. Next URL? If yes, repeat from step 1 with the next page; if no, continue
5. Process data using pandas and save data into .csv file
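The scraping loop in this student project (fetch a results page, locate elements by XPath, extract the data, follow the next-page link, then process with pandas and save as CSV) can be sketched without network access. Everything specific below is an assumption for illustration: the in-memory `PAGES` dictionary stands in for live HTTP requests, the markup and class names are invented, and the standard library's `ElementTree` is used for its limited XPath support on well-formed markup, where a real project would more likely use Selenium or lxml.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical in-memory "site": URL -> well-formed HTML of a results page.
PAGES = {
    "/search?page=1": """
        <html><body>
          <div class="item"><span class="name">Rod A</span><span class="price">25.00</span></div>
          <div class="item"><span class="name">Reel B</span><span class="price">40.50</span></div>
          <a class="next" href="/search?page=2">Next</a>
        </body></html>""",
    "/search?page=2": """
        <html><body>
          <div class="item"><span class="name">Line C</span><span class="price">9.99</span></div>
        </body></html>""",
}


def scrape(start_url):
    """Walk the result pages from start_url and return the items as a DataFrame."""
    rows, url = [], start_url
    while url is not None:
        root = ET.fromstring(PAGES[url])  # stand-in for fetching the page
        # Locate each result element by XPath and extract the target fields.
        for item in root.findall(".//div[@class='item']"):
            rows.append({
                "name": item.findtext("span[@class='name']"),
                "price": float(item.findtext("span[@class='price']")),
            })
        # Next URL? Follow the link if present, otherwise stop.
        next_link = root.find(".//a[@class='next']")
        url = next_link.get("href") if next_link is not None else None
    return pd.DataFrame(rows)


df = scrape("/search?page=1")
csv_text = df.to_csv(index=False)  # pass a file path instead to write a .csv file
print(csv_text)
```

Keeping the pagination check inside the loop means the scraper stops on its own at the last page, with no need to know the page count in advance.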
73. Info & Motivation
• Type: Public
• Traded as: TSX: ATZ
• Industry: Fashion
• Founded: 1984
• Founder: Brian Hill
• Headquarters: Vancouver, British Columbia, Canada
• Products: Clothing
85. Conclusion
• Business casual clothes are priced higher than other categories
• More transactions/purchases happen on weekends
• Sale event – good deal for famous brands
• Promotions influence stock changes
86. Challenges
• Saving data as a tree structure (.json)
• Loading data
• Moving root node properties to child nodes
• Analyzing data using Pandas
• Visualization – Plotly (multiple chart types)
87. Next Step
• Detailed size distribution of brands / products
• Influence of discount depth
• Stock refill timing
• Long-term data analysis (winter vs. summer)