Big Data for Data Scientists
Trends and Use Cases
WeCloudData
@WeCloudData | weclouddata | tordatascience
Career
Services
Meetup
Events
Introduction
Data Skills
Training
WeCloudData offers Toronto’s first data
science accelerator program. We specialize
in teaching leading-edge tools such as AWS,
Spark, and Machine Learning, and we help our
corporate clients upskill/reskill their data
teams.
WCD works with some of the most
talented and experienced data
science experts to deliver public
and corporate trainings. We
currently have 21 part-time and 2
full-time instructors.
Our instructors bring their analytical
expertise from various industries,
teach students advanced tools
such as Python, Hadoop, Spark,
and AWS, and mentor students on
end-to-end data projects.
Introduction
Faculty Team
21
Instructors
10
Teaching
Assistants
Python for SAS
and SQL Users
Machine
Learning |
Deep Learning
Big Data
Executive
Workshops
Product & Services
Corporate Training
We offer customized corporate training to Canadian
companies with flexible schedules and learning
support!
We help train, upskill, and
reskill data teams!
Python for SAS Users
Machine Learning
Big Data
AI/DS for Executives
Corporate Data Programs
We’ve delivered customized trainings to many large Canadian companies
WeCloudData
Corporate
Program
Introduction
Communities we’re building
8,000 members
120 events
We organize one of the most active DS
communities in Canada!
Workshop Provider
Conference/Clients
Workshop Provider
TMLS Conference
November, 2018
Workshop Provider
TD Canada
Analytics Month
October, 2018
• Machine Learning Open Data
• Spark ML and MLflow
• Deep Learning with PyTorch
• Python for SAS Users
• Machine Learning with Python
Workshop Provider
Big Data & AI
Toronto 2019
June, 2019
• Big Data in AWS Cloud
• Spark for Data Science
• Moving from On-Prem to Cloud
WeCloudData is a workshop provider of choice for conferences and vendors in Toronto, thanks to our expertise and
specialization.
Analytics Events
We help companies with hiring/branding events
WeCloudData organizes one of the
largest and most active data science
communities in Toronto with 7,500
members and 110 past events. We
help companies facilitate mini-
conferences and help them run hiring
events.
Timeline: 2005, 2007, 2008, 2010, 2011, 2012, 2014, 2015, 2016, 2018
Instructor
Shaohua Zhang
• Co-founder and CEO of WeCloudData. Lead instructor for the corporate training program
• Certified SAS Predictive Modeler since 2007 (among the first 20 in the world)
• Helped build and lead the data science team at BlackBerry (2010 – 2015)
• Helping Communitech incubator and Open Data Exchange mentor startups on data strategies
• Specializes in machine learning, big data, and cloud computing
Learning Path
Data Science Program
Prerequisites
Data Science
Learning Path
Learn to build ML
models using
Sklearn
ML Applied
Master data
wrangling with
Python
Data Science
w/ Python
Harness big data
with Hadoop, Hive,
Presto, and
AtScale
Big Data
Build your portfolio
with hands-on
Capstone projects
ML Advanced
Machine Learning
at Scale with
PySpark ML and
Real-time
Deployment
Spark
Contact us about the courses:
• info@weclouddata.com
Upcoming courses:
• https://weclouddata.com/upcoming-course-schedule
Linux/Docker
Scala
Spark
Programming for
Data Engineering
Hadoop/Hive
Data Ingestion
Workflow
NoSQL
ETL (Big Data)
Spark Internals
Spark Tunings
Spark In-Depth
Kafka
Spark Streaming
Apache Flink
Realtime Analytics
Scaling ML
Model Deployment
Pipeline Automation
Machine Learning
Engineering
Learn to build data pipelines, scale
data processing with big data tools,
and deploy real-time
applications and machine learning
models at scale.
Data Engineering
Learning Path
Learning Path
Data Engineering Program
Contact us about the courses:
• info@weclouddata.com
Upcoming courses:
• https://weclouddata.com/upcoming-course-schedule
Data Scientist
Data Jobs in the Market
Data Handling | Complex Analytics | Big Data | Storytelling
Data Science
Data Scientist
Coding/Tools
Math/ML Storytelling
Data
Scientist
Linux
Python/Scala/Java
Cloud (AWS)
Hadoop, Spark
Statistics
Linear Algebra
Regression
Classification
Clustering
NLP
Presentation
Use cases
Project Mgmt
Communications
Data Science
Essential Skills
Data Scientist Data Analyst
Data Science
Job requirements
Data
Application
Scraping/API
Labeled data
Infra/
Platform
RDBMS
Hadoop
Cloud
Data Engineering
ETL
Enrichment
Dataflow
automation
AI/ML
Python
ML
Deployment
Prediction API
Stream
processing
Data Science
The Myth
Data Scientist
The Types
Operational DS
Focus: data wrangling, working with
large/small messy data, building
predictive models
Strength: data handling, tools, business
knowledge
ML Engineer
Focus: ML model deployment, data
pipelines
Strength: coding, algorithms, machine
learning, platforms and tools
ML Researcher
Focus: algorithm development,
research, IP
Strength: ML/DL algorithms,
implementation, research
DS Product Mngr
Focus: product strategy, business
communications, project management
Strength: product sense, business
requirements, DS acumen
Data scientists are like unicorns: they’re hard to find. Let’s
focus on building data science teams in which data scientists,
engineers, and analysts work towards the same goal.
Data Science Team
2008 2010 2015 2016 2018
Predictive
Modeler
Grad
School
Data
Scientist
Data
Scientist
Instructor
DS
Trainer
Mentor
My DS Journey
Shaohua Zhang
Operational
Data Scientist
Product
Manager
Data/ML
Engineer
Tools
Projects
Churn
Up-sell/Cross-sell
Social Network
Recommender
Big Data
Cloud
Chatbot
Deployment
HR | Retail | Digital Analytics
Predictive Maintenance
Predictive
Modeler
Growth Acquisition Maturity Decline Loss
● Lead Gen
● Digital Mktg
● Mobile Ads
● Cross/Up-sell
● Segmentation
● CLTV
● Taste graph
● Personalization
● Loyalty Management
● Context-based Mktg
● Churn models
● Retention
Acquisition
Models
LTV Loyalty
Management
Retention Winback
Customer
Value
● Winback
models
Predict high risk customers
Data
Scientist
Data
Scientist
Twitter API
Data
Scientist
Business
Our new product feature received a lot of negative reviews.
- Can we do some analysis?
The analysis looks good. Can we build a small tool?
DS
Trainer
Big Data Analytics
Data Collection
Credit Approval
Client    Age  Gender  Annual Salary  Months in Residence  Months in Job  Current Debt  Paid off Credit
Client 1  23   M       $30,000        36                   12             $5,000        Yes
Client 2  30   F       $45,000        12                   12             $1,000        Yes
Client 3  19   M       $15,000        3                    1              $10,000       No
Client 4  25   M       $25,000        12                   27             $15,000       ?
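A minimal sketch of how the Credit Approval rows could be prepared for modelling. The field names, the 0/1 encoding, and the helper functions are illustrative assumptions, not something the slides specify. The point is that Clients 1 to 3 carry a known outcome (training data), while Client 4 with "?" is the row to be predicted.

```python
# Rows copied from the Credit Approval table; "?" marks the unknown outcome.
RAW_ROWS = [
    ("Client 1", 23, "M", "$30,000", 36, 12, "$5,000", "Yes"),
    ("Client 2", 30, "F", "$45,000", 12, 12, "$1,000", "Yes"),
    ("Client 3", 19, "M", "$15,000", 3, 1, "$10,000", "No"),
    ("Client 4", 25, "M", "$25,000", 12, 27, "$15,000", "?"),
]


def parse_dollars(text):
    """Turn a string such as "$30,000" into the integer 30000."""
    return int(text.replace("$", "").replace(",", ""))


def prepare(row):
    """Convert one raw row into numeric features plus an optional label."""
    name, age, gender, salary, residence, job, debt, paid = row
    features = {
        "age": age,
        "is_male": 1 if gender == "M" else 0,  # hypothetical encoding
        "annual_salary": parse_dollars(salary),
        "months_in_residence": residence,
        "months_in_job": job,
        "current_debt": parse_dollars(debt),
    }
    label = {"Yes": 1, "No": 0}.get(paid)  # None when the outcome is unknown
    return name, features, label


prepared = [prepare(row) for row in RAW_ROWS]
training = [p for p in prepared if p[2] is not None]  # labeled rows
to_score = [p for p in prepared if p[2] is None]      # rows to predict
```

The labeled rows would feed a classifier; the unlabeled row is what the model is asked to score.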
Data Preparation
Data Processing Engines
Relational
Database
NewSQL
Google F1
NoSQL
GraphDB
Search
Cache
Databases
Analytics Tools
Analytics Data Pipelines
Credit: https://arxiv.org/pdf/1409.3809.pdf
GET /velox/catify/predict?userid=22&song=277568
GET /velox/catify/predict_top_k?userid=22&k=100
Velox
Prediction
Service
Model
Manager
Web
Application
HDFS
The Missing Piece
ML Prediction API
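As a rough illustration of the two prediction requests shown above, the sketch below builds and parses those URLs with the Python standard library. The host `localhost:8080` and the function names are hypothetical; only the paths and the `userid`, `song`, and `k` parameters come from the slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://localhost:8080"  # hypothetical host, not from the slides


def predict_url(user_id, song_id):
    """URL asking for a single (user, song) prediction."""
    query = urlencode({"userid": user_id, "song": song_id})
    return f"{BASE_URL}/velox/catify/predict?{query}"


def predict_top_k_url(user_id, k):
    """URL asking for the top-k predictions for one user."""
    query = urlencode({"userid": user_id, "k": k})
    return f"{BASE_URL}/velox/catify/predict_top_k?{query}"


def parse_request(url):
    """Split a request URL into its path and integer query parameters."""
    parts = urlsplit(url)
    params = {key: int(values[0]) for key, values in parse_qs(parts.query).items()}
    return parts.path, params
```

A web application would send these requests to the prediction service and render the returned scores; the service side is not sketched here.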
HADOOP ECOSYSTEM
Big Data – 4 V’s
Volume
Velocity
Variety
Value
“More data cross the
internet every second
than were stored in the
entire internet just 20
years ago” - Big Data: The
Management Revolution
(HBR)
Internet
• 2.3 zettabytes/day
(2014)
Facebook
• 500 TB/day
(2012)
Programmatic Ads
• 200ms
Fraud Detection
• 400ms
Fraud Prevention
• 50ms
Structured
• Relational
Unstructured
• Image / Voice / Text
Semi-structured
• Graph
“Regardless of its size,
data is worthless if not
turned into actionable
insight”
Internet
o 2.5 exabytes (2.5 × 10^18 bytes) per day – 2012
o 2.3 zettabytes (2.3 × 10^21 bytes) per day – 2014
Facebook
o 500+ terabytes per day
o 100+ petabytes in a single Hadoop cluster
“More data cross the internet every second
than were stored in the entire internet just
20 years ago” - Big Data: The Management
Revolution (HBR)
Volume | Velocity | Variety
Big Data - Volume
Volume | Velocity | Variety
video
Big Data – Velocity
Demo
• Data Variety
  o Structured
    - Table
    - Relational
  o Unstructured
    - Text
    - Image
    - Audio/Video
  o Semi-structured
    - XML
    - JSON
    - Graph
Big Data – Variety
History of Big Data (Hadoop)
Hadoop
Big Data
Map Reduce
Apache Spark
Big data - Google Trends
Google MapReduce Paper
Doug Cutting got hired by Yahoo! to work on Hadoop
Spark took off
Knowing more
tools is always
helpful.
Knowing how to
put them to work
together is more
important!
Hadoop Ecosystem
Single Node Architecture
• Traditionally, computation has been CPU bound
• Complex computation on small data
• For decades, the primary push was to increase the computing power of a single machine
Scale Up vs. Scale Out
• Single Node Architecture
• Advantages of scaling up
  o Programming is easier than distributed computing
  o Faster processing on smaller data
• Disadvantages of scaling up
  o Hardware cost
  o Limited scalability
• Advantages of scale-out systems
  o Scalability
  o Cost
Traditional Distributed Systems: Problems
• Modern large scale processing is distributed across
machines
• Often hundreds or thousands of nodes
• Focuses on distributing the processing workload
• Powerful compute nodes
• Separate systems for data storage
• Fast network connections to connect them
• Problems with these distributed systems:
• Complex programming model
• It is difficult to deal with partial failures of the system
• Bandwidth limitations
• Data consistency
• Typically at compute time, data is copied to the compute
nodes
• This doesn’t scale to today’s big data
problems!
Data Becomes the Bottleneck
• Traditional distributed systems
don’t scale to today’s Internet-
scale data
• Getting data to the processor becomes the bottleneck
• Disk I/O is slow
• Network bandwidth is a bottleneck
• Solution → move computation to the data!
Internet
o 2.5 exabytes (2.5 × 10^18 bytes) per day – 2012
o 2.3 zettabytes (2.3 × 10^21 bytes) per day – 2014
Facebook
o 500+ terabytes per day
o 100+ petabytes in a single Hadoop cluster
Modern Distributed Computing Cluster
• Cluster architecture
• A medium-to-large Hadoop cluster consists of a two-level or three-level architecture built with rack-mounted servers. Each rack of servers is interconnected using a 1 Gigabit Ethernet (1GbE) switch. Each rack-level switch is connected to a cluster-level switch (which is typically a larger port-density 10GbE switch).
Stunning Photos Of Google's Massive Data Centers: http://www.forbes.com/pictures/edej45emjgl/up-above-the-massive-floor/
split
node1 node2 node3 node4
Block 1 Block 2 Block 3
HDFS
Hadoop Distributed File System
• The blocks are replicated to nodes throughout the cluster
• Based on the replication factor (3 by default)
• Replication increases reliability and performance
• Reliability: can tolerate node or disk failures without losing data
• Performance: more opportunities for data locality
HDFS - Replications
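The splitting and replication idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the 128 MB block size is an assumed value (the slides do not give one), the round-robin placement is a deliberate simplification rather than HDFS's actual rack-aware policy, and the function names are made up for illustration. The replication factor of 3 and the node names DN1 to DN4 follow the slides.

```python
import math

BLOCK_SIZE_MB = 128       # assumption: block size is not given on the slides
REPLICATION_FACTOR = 3    # default replication factor stated on the slide
DATA_NODES = ["DN1", "DN2", "DN3", "DN4"]


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return block ids for a file; the last block may be partially filled."""
    count = math.ceil(file_size_mb / block_size_mb)
    return [f"block_{i + 1}" for i in range(count)]


def place_replicas(blocks, nodes=DATA_NODES, replication=REPLICATION_FACTOR):
    """Toy round-robin placement; replicas are distinct while replication <= len(nodes)."""
    placement = {}
    for index, block in enumerate(blocks):
        placement[block] = [nodes[(index + r) % len(nodes)] for r in range(replication)]
    return placement


def surviving_replicas(placement, failed_node):
    """Replica locations that remain after one node is lost."""
    return {
        block: [node for node in nodes if node != failed_node]
        for block, nodes in placement.items()
    }


blocks = split_into_blocks(300)        # a 300 MB file
placement = place_replicas(blocks)
after_failure = surviving_replicas(placement, "DN3")
```

Losing any single node still leaves at least two copies of every block, which is the reliability argument; having several copies also gives the scheduler more nodes where a task can run next to its data.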
split
DN1 DN2 DN3 DN4
Block 1 Block 2 Block 3
• The NameNode stores all metadata
• Information about file locations in
HDFS
• Information about file ownership and
permissions
• Name of the individual blocks
• Locations of the blocks
• Metadata is stored on disk and read
into memory when the NameNode
daemon starts up
• Changes/Edits to the files are written to
the logs
The Name Node
file → /user/lab/myFile.txt
replication → 3
blocks → red, green, blue
block locations → …
Name Node
DN1 DN2 DN3 DN4
Stages: Splitting → Map → Combine → Shuffle/Sort → Reduce
Documents:
I wish to wish the wish you wish to wish, but if you wish the wish the witch wishes, I won’t wish the wish you wish to wish
Splitting:
Split 1: I wish to wish the wish you wish to
Split 2: wish, but if you wish the wish the
Split 3: witch wishes, I won’t wish the wish you wish to wish
Map: every word is emitted with a count of 1
Combine (per split):
Split 1: I 1, wish 4, to 2, the 1, you 1
Split 2: wish 3, but 1, if 1, you 1, the 2
Split 3: witch 1, wishes 1, I 1, won’t 1, wish 4, the 1, you 1, to 1
Shuffle/Sort and Reduce:
but 1, I 2, if 1, to 3, the 4, witch 1, wishes 1, won’t 1, wish 11, you 3
MapReduce handles these
automatically for you!!
MapReduce - WordCount
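The WordCount flow can be imitated on a single machine in plain Python. This is only a sketch of the idea, not Hadoop code: the function names are made up, the splits are cut by word count rather than by line as on the slide, and the optional combine step is skipped. It uses a straight apostrophe in "won't".

```python
import re
from collections import defaultdict

TEXT = ("I wish to wish the wish you wish to wish, but if you wish the wish "
        "the witch wishes, I won't wish the wish you wish to wish")


def split_document(text, num_splits=3):
    """Splitting: cut the document into roughly equal chunks of words."""
    words = text.split()
    size = -(-len(words) // num_splits)  # ceiling division
    return [words[i:i + size] for i in range(0, len(words), size)]


def map_phase(words):
    """Map: emit a (word, 1) pair per word, dropping punctuation."""
    pairs = []
    for word in words:
        token = re.sub(r"[^\w']", "", word)
        if token:
            pairs.append((token, 1))
    return pairs


def shuffle(mapped_splits):
    """Shuffle/Sort: group all values that share the same key."""
    groups = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the grouped values for each key, in key order."""
    return {key: sum(values) for key, values in sorted(groups.items())}


splits = split_document(TEXT)
counts = reduce_phase(shuffle(map_phase(s) for s in splits))
```

In a real cluster the map calls run in parallel on the nodes holding each split, and the framework performs the shuffle over the network; only the map and reduce functions are written by the user.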
Hive
MapReduce (Java)
Slave
Slave
Slave
Hadoop
Hive
HDFS
hive> create table tweets_filter as
    > select * from tweets
    > where to_date(ts) in ('2010-03-02', '2010-03-03');
Hive Driver
Interpret the query
Optimize the computation
Create job plan and send to Hadoop
Hive CLI
TT 1
MySQL Metastore
Master
Job2398564
Apache Hive
Map
JobTracker
NameNode
TT 2
TT 3
Impala | Presto
SQL on Hadoop
Presto
Advantage
Daily/Hourly Batch Jobs
Interactive Queries
SQL on any datasets
Apache Kylin (OLAP)
AtScale
Processing Engine
10x – 100x
MapReduce vs. Spark
Multi-core CPUs ↔ RAM: 10 GB/s
Hard Drive: 100 MB/s
SSD: 600 MB/s
Network: 1 Gb/s or 125 MB/s
Nodes in a different rack: 0.1 Gb/s
RAM vs. Disk vs. Network
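A quick back-of-the-envelope calculation shows why these numbers matter. The sketch assumes the usual pairing of the figures on this slide (RAM 10 GB/s, SSD 600 MB/s, hard drive 100 MB/s, network 1 Gb/s = 125 MB/s, cross-rack 0.1 Gb/s = 12.5 MB/s), uses decimal units (1 GB = 1000 MB), and ignores latency and parallelism. The names are illustrative.

```python
# Sequential throughput in MB/s; the medium-to-number pairing is an assumption.
THROUGHPUT_MB_PER_S = {
    "RAM": 10_000,                          # 10 GB/s
    "SSD": 600,
    "Hard drive": 100,
    "Network (1 Gb/s)": 125,
    "Cross-rack network (0.1 Gb/s)": 12.5,
}


def seconds_to_read(size_gb, mb_per_s):
    """Time to stream size_gb gigabytes at a given throughput."""
    return size_gb * 1000 / mb_per_s


def scan_times(size_gb):
    """Seconds needed to read size_gb from each medium."""
    return {
        medium: seconds_to_read(size_gb, rate)
        for medium, rate in THROUGHPUT_MB_PER_S.items()
    }


times = scan_times(100)  # reading 100 GB once
```

Under these assumptions, reading 100 GB takes about 10 seconds from RAM, about 17 minutes from a hard drive, and over two hours across racks, which is the intuition behind keeping working data in memory.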
• A unified platform that supports many data processing needs
including
• Batch processing (Spark)
• Stream processing (Spark Streaming)
• Interactive (SparkSQL)
• Iterative (MLlib, ML, GraphX, GraphFrame)
Spark - Unified Data Platform
One size fits many!
Visualization
Advanced Analytics
Data Processing
Database
Data Scientist Toolbox (Big Data)
Enterprise - Traditional
Visualization
Advanced Analytics
Data Processing
Platform
Data Scientist Toolbox (Big Data)
Enterprise – New/Cloud
Visualization
Advanced Analytics
Data Processing
Data Lake
Data Scientist Toolbox (Big Data)
Startups | Tech | Digital Labs | Big Data Teams
Visualization
Advanced Analytics
Data Processing
Data Lake
Data Scientist Toolbox (Big Data)
Enterprise – New Trend
Big Data for Data Scientists
Course Detail
Big Data for Data Scientists
About this course
• For learners who want to get started with big data, the sheer number of tools in the ecosystem can feel overwhelming and confusing. With a well-structured curriculum and instructors who have years of industry experience implementing big data solutions, the Big Data for Data Scientists course will help you focus on learning the tools that matter the most.
• This course covers several popular big data platforms and frameworks that modern data scientists and analysts need to master. Throughout the course, students learn to integrate different tools such as Hadoop, Hive, Presto, AWS, and NoSQL to solve real-world data challenges.
• The course is built around an end-to-end big data pipeline that processes terabyte-scale data (billions of records) in a cloud environment. Students gain first-hand experience with data collection, ingestion, distributed storage, distributed processing, and interactive visualization.
• Many big data use cases are covered to help consolidate the learning. Most importantly, students gain real-life experience and the confidence to apply what they have learned to their data science projects at work.
Big Data for Data Scientists
Who is this course for?
• This course serves as a great foundational course for professionals who want to switch careers, graduates who want to get into this field as data scientists, and big data enthusiasts who want to learn the hottest big data tools such as Hadoop, Hive, Presto, AWS, and NoSQL and apply them to solve real-world big data problems.
• For new graduates and job seekers, this course teaches the essential big data tools and concepts required for modern data scientist jobs, and complementary big data interview questions will get you prepared for interview challenges.
• For data scientists who want to gain new skills, the course will give you a comprehensive view of the big data ecosystem and prepare you for big data tasks at work.
• For tech-savvy project managers who want to gain a comprehensive understanding of big data use cases and lifecycles, the hands-on project in this course gives you exactly what you hope for.
Big Data for Data Scientists
Learning outcome
After this course, students will be able to
• Take on real data challenges in the workplace and demonstrate experience and an advantage in the job market by adding the learned skills to their resume
• Show a solid understanding of the big data ecosystem and various real-world use cases
• Work comfortably with different big data platforms such as Hortonworks and AWS EMR, run Hive ETL pipelines, and query large datasets with Presto
• Build and automate data pipelines with Apache Airflow and build a project demo via a visualization dashboard with Superset
• Draw on real-world experience from a hands-on project to convince managers and peers that they are ready for big data projects at work
Big Data for Data Scientists
Instructor – Shaohua Zhang
• Co-founder and CEO of WeCloudData. Lead instructor for the Big Data course and the
corporate training program
• Helped build and lead the data science team at BlackBerry (2010 – 2015)
• Helping Communitech incubator and Open Data Exchange mentor startups on data strategies
• Specializes in machine learning, big data, and cloud computing
Big Data for Data Scientists
Prerequisites
• You do not need prior experience with programming languages such as Python, but it helps!
• Familiarity with Linux commands, SQL, and relational database concepts
• An understanding of your company’s big data use cases, technologies, and goals will motivate and direct your focus in this course
Lecture  Content
1   Big Data
    • Introduction to Big Data
    • Big Data Use Cases
    • AWS – EC2/S3
2   Hadoop
    • Hadoop Distributed File System (HDFS)
    • MapReduce with Python
    • AWS – EMR
3   Apache Hive | Sqoop
    • Hive Introduction
    • Hive Queries
    • Apache Sqoop
    • Project kick-off
4   SQL on Hadoop
    • Presto/Impala
    • Apache Kylin/AtScale
5   NoSQL
    • Amazon DynamoDB
    • Cassandra
    • Elasticsearch
6   Data Pipeline
    • Data pipeline with Airflow
    • Visualization with Superset
    • Project Discussion
7   Spark Core
    • Introduction to Spark Core
    • Spark RDD Operations
8   Spark DataFrame | SQL
    • Spark DataFrame and SQL
    • Complex Transformations and UDFs
9   Spark Performance Tuning
    • Spark Internals
    • Performance Tuning
10  Spark ML
    • Spark Machine Learning API
    • Building Classification and Regression Models
11  Spark ML II
    • Recommender System with Spark
    • Deep Learning on Spark
12  Spark Streaming
    • Kafka/Kinesis
    • Spark Streaming
    • Project Presentation
Syllabus
Big Data for Data Scientists
Syllabus (Weekend Cohort – 12 sessions/48 hours)
Big Data for Data Scientists
Industry Use Cases
In this course, we not only teach students how to use the big data tools, but also cover common
use cases. Understanding real-world use cases and industry best practices allows
students to apply their skills to their company’s data problems.
Use Cases
• Big data use cases in retail personalization
• Big data use cases for retail banking
• Big data use cases for fraud analytics
• Big data use cases in compliance analytics
• Big data use cases in online advertising
Big Data for Data Scientists
Hands-on Project
This course is instructor-led and project-based. Students will apply the big data
knowledge acquired during the lectures to build an end-to-end big data project.
Project: Building an AWS-based Big Data Pipeline
• Real-time data collection and ingestion via Kinesis and NoSQL
• Build Hive databases and ETL pipelines
• Interactive data analysis with Presto
• Building streaming MOLAP cubes with Apache Kylin
• Real-time dashboard with Apache Superset
• Workflow automation with Apache Airflow
Data Size: 500 GB ~ 1 TB
Records: 1 billion +
Twitter API
Kinesis
Student Project Demo
Stock price prediction using Twitter sentiment and deep learning
Student Project Demo
Real-time Twitter Sentiment Pipeline
Big Data for Data Scientists
Learning Support
The support you will receive during this course includes:
• Mentorship and advice from an industry expert
• In-classroom learning assistance by our assistant instructor
• Online learning support on Slack from instructor and TA
• Hands-on labs and projects to help you apply what you learn
• Additional resources to help you gain advanced knowledge
• Help from our learning advisor on how to choose the learning path and
specialization courses after the Big Data course
Big Data for Data Scientists
Testimonials
This course really helped me with in-depth explanation and application of Cloud and Big Data technologies. The lead instructor is
very enthusiastic and gifted with years of industry experience as a chief data scientist. The course has a well-designed,
systematic curriculum structure where you get to learn each component of the Big Data Ecosystem with a big picture of the
whole Machine-Learning pipeline (online and offline).
Jason Lee
Student Testimonial
I took the Big Data course with WeCloudData. The course introduces the latest big data tools and platforms such as Apache
Hadoop and Amazon Web Services, as well as real-world use cases and industrial best practices. The course also includes an end-
to-end group project which will definitely be something you can be proud of.
I chose this course basically because my company uses Apache Spark and the Hadoop distributed system, and I wanted to learn
more about them. Surprisingly, what I learned from this course has been far beyond my expectations! I wish I had known WeCloudData
earlier so that I wouldn't have struggled so much at work.
I would also like to express my gratitude and appreciation to the instructor Shaohua in this course. He is extraordinarily
knowledgeable and experienced, one of the best instructors I have ever seen! The way he approaches theory is really
straightforward and easy to understand. He is nice and patient while answering questions as well, and always makes sure every
student is on the right track. The program managers of WeCloudData are kind and amiable too. It was a great pleasure to talk
with them!
Grace Tian
Big Data for Data Scientists
How to convince your employer
Do you know that most employers will reimburse the training costs?
• We have a detailed course syllabus and email template that you can use to convince
your manager that this is the right course for you and a good investment for your
company
• You will have a completed project and presentation that you can use to demo to
your manager and showcase your newly minted Big Data skills and get ready for
more interesting data analytics projects
Big Data for Data Scientists
Price
Course Pricing
Big Data & Spark for DS $2000 + tax
Upcoming WeCloud Events
Event Schedules
Upcoming Events
Schedule
Track                    Meetup Org                Topic                                                        Date
Big Data                 WeCloudData               Big Data for Data Scientist – Open Class                     Jun 4
Big Data                 WeCloudData               Spark on Kubernetes                                          Jun 5
Big Data                 Lightbend                 Running Kafka on Kubernetes with Strimzi                     Jun 11
Cloud                    Big Data & AI Conference  Machine Learning from Experimentation to Production on AWS   Jun 12
Big Data                 Big Data & AI Conference  Transforming big data from On-premise to the Cloud           Jun 12
Big Data                 Big Data & AI Conference  Spark for Data Science                                       Jun 13
Data Science             Big Data & AI Conference  Moving Towards a Python Environment                          Jun 13
Big Data | Data Science  WeCloudData               Machine Learning Deployment with Spark and Amazon SageMaker  Jun 16
Big Data | Data Science  WeCloudData               Apache Spark Hands-on Workshop                               Jun 18
tordatascience
For details, visit https://www.meetup.com/tordatascience/
TYPE OF DATA JOB SEEKERS

More Related Content

What's hot

Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureMark Tabladillo
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
Global AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure DatabricksGlobal AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure DatabricksAlberto Diaz Martin
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
Building Data Science into Organizations: Field Experience
Building Data Science into Organizations: Field ExperienceBuilding Data Science into Organizations: Field Experience
Building Data Science into Organizations: Field ExperienceDatabricks
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIAmazon Web Services
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine LearningMark Tabladillo
 
An Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIAn Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIInside Analysis
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Databricks
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineeringNovita Sari
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists jlacefie
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Enterprise and multi-tier Power BI deployments with Azure DevOps.
Enterprise and multi-tier Power BI deployments with Azure DevOps.Enterprise and multi-tier Power BI deployments with Azure DevOps.
Enterprise and multi-tier Power BI deployments with Azure DevOps.Marc Lelijveld
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?James Serra
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"Rob Winters
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
Data Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris RobisonData Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris RobisonDatabricks
 

What's hot (20)

Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Data engineering
Data engineeringData engineering
Data engineering
 
Global AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure DatabricksGlobal AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure Databricks
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Building Data Science into Organizations: Field Experience
Building Data Science into Organizations: Field ExperienceBuilding Data Science into Organizations: Field Experience
Building Data Science into Organizations: Field Experience
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AI
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning
 
An Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIAn Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BI
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
 
Big Data with Azure
Big Data with AzureBig Data with Azure
Big Data with Azure
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineering
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Enterprise and multi-tier Power BI deployments with Azure DevOps.
Enterprise and multi-tier Power BI deployments with Azure DevOps.Enterprise and multi-tier Power BI deployments with Azure DevOps.
Enterprise and multi-tier Power BI deployments with Azure DevOps.
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
Data Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris RobisonData Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris Robison
 

Big Data for Data Scientists - Info Session

  • 1. Big Data for Data Scientists Trends and Use Cases WeCloudData @WeCloudData @WeCloudData tordatascience weclouddata WeCloudData tordatascience
  • 2. Career Services Meetup Events Introduction Data Skills Training WeCloudData offers Toronto’s first data science accelerator program. We specialize in teaching leading-edge tools such as AWS, Spark, and Machine Learning, and we help our corporate clients upskill/reskill their data teams.
  • 3. WCD works with some of the most talented and experienced data science experts to deliver public and corporate trainings. We currently have 21 part-time and 2 full-time instructors. Our instructors bring their analytical expertise from various industries, teach students advanced tools such as Python, Hadoop, Spark, and AWS, and mentor students on end-to-end data projects. Introduction Faculty Team 21 Instructors 10 Teaching Assistants
  • 4. Python for SAS and SQL Users Machine Learning | Deep Learning Big Data Executive Workshops Product & Services Corporate Training We offer customized corporate training to Canadian companies with flexible schedules and learning support! We help train, upskill, and reskill data teams!
  • 5. Python for SAS Users Machine Learning Big Data AI/DS for Executives Corporate Data Programs We’ve delivered customized trainings to many large Canadian companies WeCloudData Corporate Program We offer customized corporate training to Canadian companies with flexible schedules and learning support! We help train, upskill, and reskill data teams!
  • 6. Introduction Communities we’re building 8,000 members 120 events We organize one of the most active DS communities in Canada!
  • 7. Workshop Provider Conference/Clients Workshop Provider TMLS Conference November, 2018 Workshop Provider TD Canada Analytics Month October, 2018 • Machine Learning Open Data • Spark ML and MLflow • Deep Learning with PyTorch • Python for SAS Users • Machine Learning with Python Workshop Provider Big Data & AI Toronto 2019 June, 2019 • Big Data in AWS Cloud • Spark for Data Science • Moving from On-Prem to Cloud WeCloudData is the conference workshop choice of vendors in Toronto due to our expertise and specialty.
  • 8. Analytics Events We help companies with hiring/branding events WeCloudData organizes one of the largest and most active data science communities in Toronto with 7,500 members and 110 past events. We help companies facilitate mini-conferences and help them run hiring events.
  • 9. 2005 2007 2008 2010 2011 2015 2012 2014 2016 2018 Instructor Shaohua Zhang • Co-founder and CEO of WeCloudData. Lead instructor for the corporate training program • Certified SAS Predictive Modeler since 2007 (among the first 20 in the world) • Helped build and lead the data science team at BlackBerry (2010 – 2015) • Helping Communitech incubator and Open Data Exchange mentor startups on data strategies • Specializes in machine learning, big data, and cloud computing
  • 10. Learning Path Data Science Program Prerequisites Data Science Learning Path Learn to build ML models using Sklearn ML Applied Master data wrangling with Python Data Science w/ Python Harness big data with Hadoop, Hive, Presto, and AtScale Big Data Build your portfolio with hands-on Capstone projects ML Advanced Machine Learning at Scale with PySpark ML and Real-time Deployment Spark Contact us about the courses: • info@weclouddata.com Upcoming courses: • https://weclouddata.com/upcoming-course-schedule
  • 11. Linux/Docker Scala Spark Programming for Data Engineering Hadoop/Hive Data Ingestion Workflow NoSQL ETL (Big Data) Spark Internals Spark Tuning Spark In-Depth Kafka Spark Streaming Apache Flink Realtime Analytics Scaling ML Model Deployment Pipeline Automation Machine Learning Engineering Learn to build data pipelines, scale data processing with big data tools, and deploy real-time applications and machine learning models at scale. Data Engineering Learning Path Learning Path Data Engineering Program Contact us about the courses: • info@weclouddata.com Upcoming courses: • https://weclouddata.com/upcoming-course-schedule
  • 13. Data Jobs in the Market Data Handling Complex Analytics Big Data Storytelling Data Science Data Scientist
  • 14. Coding/Tools Math/ML Storytelling Data Scientist Linux Python/Scala/Java Cloud (AWS) Hadoop, Spark Statistics Linear Algebra Regression Classification Clustering NLP Presentation Use cases Project Mgmt Communications Data Science Essential Skills
  • 15. Data Scientist Data Analyst Data Science Job requirements
  • 17. Data Scientist The Types Operational DS Focus: data wrangling, working with large/small messy data, building predictive models Strength: data handling, tools, business knowledge ML Engineer Focus: ML model deployment, data pipelines Strength: coding, algorithms, machine learning, platforms and tools ML Researcher Focus: algorithm development, research, IP Strength: ML/DL algorithms, implementation, research DS Product Manager Focus: product strategy, business communications, project management Strength: product sense, business requirements, DS acumen
  • 18. Data Scientists are like unicorns… so they’re hard to find. Let’s focus on building the data science teams.. that have data scientists, engineers, and analysts working towards the same goal. Data Science Team
  • 19. 2008 2010 2015 2016 2018 Predictive Modeler Grad School Data Scientist Data Scientist Instructor DS Trainer Mentor My DS Journey Shaohua Zhang Operational Data Scientist Product Manager Data/ML Engineer Tools Projects Churn Up-sell/Cross-sell Social Network Recommender Big Data Cloud Chatbot Deployment HR | Retail | Digital Analytics Predictive Maintenance
  • 20. Predictive Modeler Acquisition Growth Maturity Decline Loss ● Lead Gen ● Digital Mktg ● Mobile Ads ● Cross/Up-sell ● Segmentation ● CLTV ● Taste graph ● Personalization ● Loyalty Management ● Context-based Mktg ● Churn models ● Retention Acquisition Models LTV Loyalty Management Retention Winback Customer Value ● Winback models Predict high risk customers
  • 23. Twitter API Data Scientist Business Our new product feature received a lot of negative reviews. Can we do some analysis?
  • 24. Data Scientist Business Our new product feature received a lot of negative reviews. Can we do some analysis? The analysis looks good. Can we build a small tool?
  • 28. Credit Approval: Data Preparation
      Client   | Age | Gender | Annual Salary | Months in Residence | Months in Job | Current Debt | Paid off Credit
      Client 1 | 23  | M      | $30,000       | 36                  | 12            | $5,000       | Yes
      Client 2 | 30  | F      | $45,000       | 12                  | 12            | $1,000       | Yes
      Client 3 | 19  | M      | $15,000       | 3                   | 1             | $10,000      | No
      Client 4 | 25  | M      | $25,000       | 12                  | 27            | $15,000      | ?
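The data preparation step on this slide can be sketched in plain Python. This is a minimal illustration, not the course's own code: the shortened column names and the `to_features` helper are hypothetical, and the only values used are the four rows shown on the slide. Clients with a known outcome become training rows, and Client 4 (marked "?") becomes the row to score.

```python
# Rows copied from the slide; the shortened column names are hypothetical.
clients = [
    {"age": 23, "gender": "M", "salary": 30000, "months_residence": 36,
     "months_job": 12, "debt": 5000, "paid_off": "Yes"},
    {"age": 30, "gender": "F", "salary": 45000, "months_residence": 12,
     "months_job": 12, "debt": 1000, "paid_off": "Yes"},
    {"age": 19, "gender": "M", "salary": 15000, "months_residence": 3,
     "months_job": 1, "debt": 10000, "paid_off": "No"},
    {"age": 25, "gender": "M", "salary": 25000, "months_residence": 12,
     "months_job": 27, "debt": 15000, "paid_off": None},  # Client 4: unknown
]


def to_features(row):
    """Encode one client record as a numeric feature vector."""
    return [
        row["age"],
        1 if row["gender"] == "M" else 0,  # simple binary encoding
        row["salary"],
        row["months_residence"],
        row["months_job"],
        row["debt"],
    ]


# Labeled rows form the training set; unlabeled rows are the ones to predict.
labeled = [r for r in clients if r["paid_off"] is not None]
X_train = [to_features(r) for r in labeled]
y_train = [1 if r["paid_off"] == "Yes" else 0 for r in labeled]
X_score = [to_features(r) for r in clients if r["paid_off"] is None]
```

The resulting `X_train`, `y_train`, and `X_score` lists are in the shape most ML libraries expect for fitting a classifier and scoring new records.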
  • 33. Credit: https://arxiv.org/pdf/1409.3809.pdf GET /velox/catify/predict?userid=22&song=277568 GET /velox/catify/predict_top_k?userid=22&k=100 Velox Prediction Service Model Manager Web Application HDFS The Missing Piece ML Prediction API
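To make the two request shapes on this slide concrete, the sketch below parses them and dispatches to a toy scorer. It is only an illustration of the URL pattern: `TOY_SCORES`, `parse_predict_request`, and `handle` are hypothetical names, the scores are made up, and nothing here reflects how Velox itself is implemented.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical stand-in for a trained model: (userid, song) -> score.
TOY_SCORES = {(22, 277568): 0.91, (22, 1001): 0.40, (22, 1002): 0.75}


def parse_predict_request(url):
    """Split a request URL into its endpoint name and integer parameters."""
    parsed = urlparse(url)
    endpoint = parsed.path.rsplit("/", 1)[-1]
    params = {key: int(values[0]) for key, values in parse_qs(parsed.query).items()}
    return endpoint, params


def handle(url):
    """Serve a single-item prediction or a top-k list from the toy scores."""
    endpoint, params = parse_predict_request(url)
    user = params["userid"]
    if endpoint == "predict":
        return TOY_SCORES.get((user, params["song"]), 0.0)
    if endpoint == "predict_top_k":
        ranked = sorted(
            ((score, song) for (u, song), score in TOY_SCORES.items() if u == user),
            reverse=True,
        )
        return [song for _, song in ranked[: params["k"]]]
    raise ValueError(f"unknown endpoint: {endpoint}")
```

For example, `handle("/velox/catify/predict?userid=22&song=277568")` returns the stored score for that user and song, while the `predict_top_k` form returns the highest-scoring song IDs for the user.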
  • 35. Big Data – 4 V’s: Volume, Velocity, Variety, Value. “More data cross the internet every second than were stored in the entire internet just 20 years ago” - Big Data: The Management Revolution (HBR). Volume: Internet • 2.3 zettabytes/day (2014); Facebook • 500 TB/day (2012). Velocity: Programmatic Ads • 200ms; Fraud Detection • 400ms; Fraud Prevention • 50ms. Variety: Structured • Relational; Unstructured • Image / Voice / Text; Semi-structured • Graph. Value: “Regardless of its size, data is worthless if not turned into actionable insight”
  • 36. Big Data - Volume. Internet: 2.5 exabytes (2.5 × 10^18 bytes) per day in 2012; 2.3 zettabytes (2.3 × 10^21 bytes) per day in 2014. Facebook: 500+ terabytes per day; 100+ petabytes in a single Hadoop cluster. “More data cross the internet every second than were stored in the entire internet just 20 years ago” - Big Data: The Management Revolution (HBR)
  • 38. Big Data – Variety. Structured: table, relational. Unstructured: text, image, audio/video. Semi-structured: XML, JSON, graph.
  • 39. History of Big Data (Hadoop) Hadoop Big Data Map Reduce Apache Spark Big data - Google Trends Google MapReduce Paper Doug Cutting got hired by Yahoo! to work on Hadoop Spark took off
  • 40. Knowing more tools is always helpful. Knowing how to put them to work together is more important!
  • 42. Single Node Architecture • Traditionally, computation has been CPU bound • Complex computation on small data • For decades, the primary push is to increase the computing power of a single machine
  • 43. Scale Up vs. Scale Out • Single Node Architecture • Scaling up advantage • Programming is easier than distributed computing • Faster processing on smaller data • Scale up disadvantage • Hardware cost • Scalability • Advantage of scale-out systems • Scalability • Cost
  • 44. Traditional Distributed Systems: Problems • Modern large scale processing is distributed across machines • Often hundreds or thousands of nodes • Focuses on distributing the processing workload • Powerful compute nodes • Separate systems for data storage • Fast network connections to connect them • Problems with these distributed systems: • Complex programming model • It is difficult to deal with partial failures of the system • Bandwidth limitations • Data consistency • Typically at compute time, data is copied to the compute nodes • This doesn’t scale to today’s big data problems!
  • 45. Data Becomes the Bottleneck • Traditional distributed systems don’t scale to today’s Internet-scale data • Getting data to the computer processor becomes the bottleneck • Disk I/O is slow • Network bandwidth is a bottleneck • Solution → moving computation to the data! Internet o 2.5 exabytes (2.5 × 10^18 bytes) per day – 2012 o 2.3 zettabytes (2.3 × 10^21 bytes) per day - 2014 Facebook o 500+ terabytes per day o 100+ petabytes in a single Hadoop cluster
  • 46. Modern Distributed Computing Cluster • Cluster architecture • A medium-to-large Hadoop cluster consists of a two-level or three-level architecture built with rack-mounted servers. Each rack of servers is interconnected using a 1 Gigabit Ethernet (1GbE) switch. Each rack-level switch is connected to a cluster-level switch (which is typically a larger port-density 10GbE switch) Stunning Photos Of Google's Massive Data Centers: http://www.forbes.com/pictures/edej45emjgl/up-above-the-massive-floor/
  • 47. HDFS (Hadoop Distributed File System): a file is split into blocks (Block 1, Block 2, Block 3), which are stored across the nodes of the cluster (node1, node2, node3, node4)
  • 48. HDFS - Replications • The blocks are replicated to nodes throughout the cluster • Based on the replication factor (3 by default) • Replication increases reliability and performance • Reliability: can tolerate the loss of a node without losing data • Performance: more opportunities for data locality. Figure: a file split into Block 1, Block 2, Block 3, replicated across DN1, DN2, DN3, DN4
  • 49. The Name Node • The NameNode stores all metadata • Information about file locations in HDFS • Information about file ownership and permissions • Names of the individual blocks • Locations of the blocks • Metadata is stored on disk and read into memory when the NameNode daemon starts up • Changes/Edits to the files are written to the logs. Example: file → /user/lab/myFile.txt; replication → 3; blocks → red, green, blue; block locations → … Figure: Name Node with DN1, DN2, DN3, DN4
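The splitting and replication described above can be illustrated with a small simulation. This is a sketch under stated assumptions: the function names are hypothetical, the round-robin placement is a simplification (real HDFS placement also takes racks into account), and sizes are plain integers rather than real block sizes.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks that a file of file_size splits into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified placement for illustration only; it ignores rack topology.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def surviving_replicas(placement, failed_node):
    """Count the replicas of each block that remain after one node fails."""
    return {
        block: sum(1 for node in holders if node != failed_node)
        for block, holders in placement.items()
    }


data_nodes = ["DN1", "DN2", "DN3", "DN4"]
blocks = split_into_blocks(file_size=300, block_size=128)
placement = place_replicas(len(blocks), data_nodes)
after_failure = surviving_replicas(placement, failed_node="DN1")
```

With a replication factor of 3 on four nodes, every block still has at least two copies after any single node fails, which is the reliability benefit the slide refers to.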
  • 50. MapReduce - WordCount. Documents: “I wish to wish the wish you wish to wish, but if you wish the wish the witch wishes, I won’t wish the wish you wish to wish”. Stages: Splitting, Map, Combine, Shuffle/Sort, Reduce. The input is split across mappers, each mapper emits (word, 1) pairs, the pairs are combined per split, shuffled and sorted by word, and reduced to a final count per word. MapReduce handles these automatically for you!!
  • 52. Apache Hive. Hive CLI: hive> create table tweets_filter as select * from tweets where to_date(ts) in ('2010-03-02', '2010-03-03'). Hive Driver: interpret the query, optimize the computation, create the job plan and send it to Hadoop. Components: Hive CLI, MySQL Metastore, Master (JobTracker, NameNode) running map job Job2398564, three Slaves (TT 1, TT 2, TT 3), Hadoop HDFS
  • 54. Apache Presto Advantage Daily/Hourly Batch Jobs Interactive Queries
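The shape of the query on this slide (create a table from a date-filtered select) can be tried locally with Python's built-in `sqlite3` module. This is an analogy rather than Hive itself: SQLite's `date()` stands in for Hive's `to_date()`, the sample rows and column names are made up, and the second date is assumed to be 2010-03-03. Hive would compile the same statement into distributed jobs instead of running it in-process.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER, ts TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        (1, "2010-03-02 10:15:00", "first"),   # hypothetical sample rows
        (2, "2010-03-03 23:59:59", "second"),
        (3, "2010-03-04 00:00:01", "third"),
    ],
)

# Same structure as the slide's query, with date() in place of to_date().
conn.execute(
    "CREATE TABLE tweets_filter AS "
    "SELECT * FROM tweets "
    "WHERE date(ts) IN ('2010-03-02', '2010-03-03')"
)

filtered_ids = [row[0] for row in conn.execute("SELECT id FROM tweets_filter ORDER BY id")]
```

Only the rows whose timestamp falls on one of the two listed dates end up in `tweets_filter`.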
  • 55. Apache Presto Advantage Daily/Hourly Batch Jobs Interactive Queries SQL on any datasets
  • 60. RAM vs. Disk vs. Network. Multi-core CPUs to RAM: 10 GB/s. Hard Drive: 100 MB/s. SSD: 600 MB/s. Network: 1 Gb/s or 125 MB/s. Nodes in a different rack: 0.1 Gb/s
  • 61. Spark - Unified Data Platform • A unified platform that supports many data processing needs including • Batch processing (Spark) • Stream processing (Spark Streaming) • Interactive (SparkSQL) • Iterative (MLlib, ML, GraphX, GraphFrame) One size fits many!
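A quick calculation shows why these throughput figures matter. The sketch below estimates how long it takes to stream 100 GB through each path. The pairing of each figure with a device is an assumed reading of the slide, the 100 GB size is an arbitrary example, and real systems will differ because of caching, parallelism, and contention.

```python
# Sequential throughput in MB/s. The device-to-figure pairing is an assumed
# reading of the slide, not a measurement.
THROUGHPUT_MB_PER_S = {
    "RAM": 10_000,                     # 10 GB/s
    "SSD": 600,
    "Hard drive": 100,
    "Network": 125,                    # 1 Gb/s
    "Network, different rack": 12.5,   # 0.1 Gb/s
}


def seconds_to_read(size_gb, mb_per_s):
    """Time to stream size_gb gigabytes at mb_per_s (using 1 GB = 1000 MB)."""
    return size_gb * 1000 / mb_per_s


for device, rate in THROUGHPUT_MB_PER_S.items():
    print(f"{device:>24}: {seconds_to_read(100, rate):>8.0f} s to read 100 GB")
```

Under these assumptions the gap between reading from memory and pulling the same data across racks is close to three orders of magnitude, which is the motivation for keeping computation near the data.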
  • 62. Visualization Advanced Analytics Data Processing Database Data Scientist Toolbox (Big Data) Enterprise - Traditional
  • 63. Visualization Advanced Analytics Data Processing Platform Data Scientist Toolbox (Big Data) Enterprise – New/Cloud
  • 64. Visualization Advanced Analytics Data Processing Data Lake Data Scientist Toolbox (Big Data) Startups | Tech | Digital Labs | Big Data Teams
  • 65. Visualization Advanced Analytics Data Processing Data Lake Data Scientist Toolbox (Big Data) Enterprise – New Trend
  • 66. Big Data for Data Scientists Course Detail
  • 67. Learning Path Data Science Program Prerequisites Data Science Learning Path Learn to build ML models using Sklearn ML Applied Master data wrangling with Python Data Science w/ Python Harness big data with Hadoop, Hive, Presto, and AtScale Big Data Build your portfolio with hands-on Capstone projects ML Advanced Machine Learning at Scale with PySpark ML and Real-time Deployment Spark Contact us about the courses: • info@weclouddata.com Upcoming courses: • https://weclouddata.com/upcoming-course-schedule
  • 68. Big Data for Data Scientists About this course • For learners who want to get started with big data, the sheer number of tools in the ecosystem always feels overwhelming and confusing. With a well-structured curriculum and instructors who have years of industry experience implementing big data solutions, the Big Data for Data Scientist will help you focus on learning the tools that matter the most. • This course covers several popular big data platforms and frameworks that modern data scientists and analysts need to master. Students learn throughout the course to integrate different tools such as Hadoop, Hive, Presto, AWS, and NoSQL to solve real- world data challenges. • The course is built around an end-to-end big data pipeline to process terabyte scale data (billions of records) in a cloud environment. Students gain first-hand experience on data collection, ingestion, distributed storage, distributed processing, and interactive visualizations. • Many big data use cases will be covered to help consolidate the learnings and most importantly students gain real-life experience and confidence to apply the knowledge learned back to their data science projects at work.
  • 69. Big Data for Data Scientists Who is this course for? • This is a foundational course for professionals who want to switch careers, graduates who want to enter the field as data scientists, and big data enthusiasts who want to learn popular big data tools such as Hadoop, Hive, Presto, AWS, and NoSQL and apply them to real-world big data problems. • For new graduates and job seekers, this course teaches the essential big data tools and concepts required for modern data scientist jobs, and complementary big data interview questions will get you prepared for interview challenges. • For data scientists who want to gain new skills, the course gives you a comprehensive view of the big data ecosystem and prepares you for big data tasks at work. • For tech-savvy project managers who want a comprehensive understanding of big data use cases and lifecycles, the hands-on project in this course provides exactly that.
  • 70. Big Data for Data Scientists Learning outcome After this course, students will be able to • Take on real data challenges in the workplace and demonstrate experience and an advantage in the job market with the learned skills added to their resume • Gain a solid understanding of the big data ecosystem and various real-world use cases • Work comfortably with different big data platforms such as Hortonworks and AWS EMR, run Hive ETL pipelines, and query large datasets with Presto • Build and automate data pipelines with Apache Airflow and build a project demo with a Superset visualization dashboard • Gain real-world experience through a hands-on project and convince managers and peers that they are ready for big data projects at work
  • 71. Big Data for Data Scientists Instructor – Shaohua Zhang • Co-founder and CEO of WeCloudData. Lead instructor for the Big Data course and the corporate training program • Helped build and lead the data science team at BlackBerry (2010 – 2015) • Helps the Communitech incubator and Open Data Exchange mentor startups on data strategies • Specializes in machine learning, big data, and cloud computing
  • 72. Big Data for Data Scientists Prerequisites • You do not need prior experience with programming languages such as Python, but it helps! • Familiarity with Linux commands, SQL, and relational database concepts • An understanding of your company’s big data use cases, technologies, and goals will motivate and direct your focus in this course
  • 73. Big Data for Data Scientists Syllabus (Weekend Cohort – 12 sessions/48 hours) Lecture Content
      1. Big Data: Introduction to Big Data; Big Data Use Cases; AWS – EC2/S3
      2. Hadoop: Hadoop Distributed File System (HDFS); MapReduce with Python; AWS – EMR
      3. Apache Hive | Sqoop: Hive Introduction; Hive Queries; Apache Sqoop; Project kick-off
      4. SQL on Hadoop: Presto/Impala; Apache Kylin/AtScale
      5. NoSQL: Amazon DynamoDB; Cassandra; Elasticsearch
      6. Data Pipeline: Data pipeline with Airflow; Visualization with Superset; Project Discussion
      7. Spark Core: Introduction to Spark Core; Spark RDD Operations
      8. Spark DataFrame | SQL: Spark DataFrame and SQL; Complex Transformations and UDFs
      9. Spark Performance Tuning: Spark Internals; Performance Tuning
      10. Spark ML: Spark Machine Learning API; Building Classification and Regression Models
      11. Spark ML II: Recommender System with Spark; Deep Learning on Spark
      12. Spark Streaming: Kafka/Kinesis; Spark Streaming; Project Presentation
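As a rough illustration of the "MapReduce with Python" topic in session 2, the sketch below simulates the map, shuffle, and reduce phases of a word count on a single machine. It is not the course's own lab code and does not use Hadoop or Hadoop Streaming; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key; on a cluster the framework does this step."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts collected for each word, as a reducer would."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order on in-memory input."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, not course data.
    sample = ["big data for data scientists", "big data tools"]
    print(word_count(sample))
```

On a real cluster the mapper and reducer would run as separate processes on different nodes and the shuffle would move data over the network; keeping the three phases as separate functions here is only meant to make that division of work visible.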
  • 74. Big Data for Data Scientists Industry Use Cases In this course, we teach students not only how to use the big data tools but also their common use cases. Understanding real-world use cases and industry best practices allows students to apply their skills to their company’s data problems. Use Cases • Big data use cases in retail personalization • Big data use cases for retail banking • Big data use cases for fraud analytics • Big data use cases in compliance analytics • Big data use cases in online advertising
  • 75. Big Data for Data Scientists Hands-on Project This course is instructor-led and project-based. Students apply the big data knowledge acquired during the lectures to build an end-to-end big data project. Project: Building an AWS-based Big Data Pipeline • Real-time data collection and ingestion via Kinesis and NoSQL • Build Hive databases and ETL pipelines • Interactive data analysis with Presto • Building streaming MOLAP cubes with Apache Kylin • Real-time dashboard with Apache Superset • Workflow automation with Apache Airflow Data Size: 500 GB ~ 1 TB Records: 1 billion+ [Diagram: Twitter API, Kinesis]
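To make the idea of workflow automation in the project above more concrete, here is a minimal sketch of how a scheduler can derive a valid run order from task dependencies. It uses Python's standard-library `graphlib` (Python 3.9+) rather than Apache Airflow, so none of this is Airflow's API. The task names and the dependencies between them are assumptions inferred loosely from the project bullet list, not the actual project design.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical tasks: each key maps to the set of tasks it depends on.
# These names and edges are illustrative assumptions only.
PIPELINE: Dict[str, Set[str]] = {
    "collect_tweets": set(),
    "ingest_kinesis": {"collect_tweets"},
    "hive_etl": {"ingest_kinesis"},
    "presto_analysis": {"hive_etl"},
    "kylin_cube": {"hive_etl"},
    "superset_dashboard": {"presto_analysis", "kylin_cube"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which every task runs after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step, task in enumerate(execution_order(PIPELINE), start=1):
        print(step, task)
```

The relative order of `presto_analysis` and `kylin_cube` is not fixed because neither depends on the other; a workflow tool could run such tasks in parallel. A dependency cycle would make `static_order()` raise `graphlib.CycleError`.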
  • 76. Student Project Demo Stock price prediction using Twitter sentiment and deep learning
  • 77. Student Project Demo Real-time Twitter Sentiment Pipeline
  • 78. Student Project Demo Real-time Twitter Sentiment Pipeline
  • 79. Big Data for Data Scientists Learning Support The support you will receive during this course includes • Mentorship and advice from an industry expert • In-classroom learning assistance from our assistant instructor • Online learning support on Slack from the instructor and TA • Hands-on labs and projects to help you apply what you learn • Additional resources to help you gain advanced knowledge • Help from our learning advisor on choosing a learning path and specialization courses after the Big Data course
  • 80. Big Data for Data Scientists Testimonials "This course really helped me with in-depth explanation and application of Cloud and Big Data technologies. The lead instructor is very enthusiastic and gifted with years of industry experience as a chief data scientist. The course has a well-designed with systematic curriculum structure where you get to learn each component of the Big Data Ecosystem with a big picture of the whole Machine-Learning pipeline (online and offline)." – Jason Lee "I took the Big Data course with WeCloudData. The course introduces the latest big data tools and platforms such as Apache Hadoop and Amazon Web Services, as well as real-world use cases and industrial best practices. The course also includes an end-to-end group project which will definitely be something you can be proud of. I chose this course basically because my company uses Apache Spark and Hadoop distributed system, and I would like to learn more about it. Surprisingly, what I learned from this course has been far beyond my expectation! I wish I knew WeCloudData earlier so that I wouldn't have been that struggled at work. I would also like to express my gratitude and appreciation to the instructor Shaohua in this course. He is extraordinarily knowledgeable and experienced, one of the best instructor I have ever seen! The way he approaches to a theory is really straightforward and easy to understand. He is nice and patient while answering questions as well, and always makes sure every student is on the right track. The program managers of WeCloudData are kind and amiable too. It was a great pleasure to talk with them!" – Grace Tian
  • 81. Big Data for Data Scientists How to convince your employer Do you know that most employers will reimburse the training costs? • We have a detailed course syllabus and email template that you can use to convince your manager that this is the right course for you and a good investment for your company • You will have a completed project and presentation that you can use to demo to your manager and showcase your newly minted Big Data skills and get ready for more interesting data analytics projects
  • 82. Big Data for Data Scientists Price Course Pricing Big Data & Spark for DS $2000 + tax
  • 84. Upcoming Events Schedule (for details, visit https://www.meetup.com/tordatascience/)
      • Jun 4: Big Data for Data Scientist – Open Class (Meetup org: WeCloudData; track: Big Data)
      • Jun 5: Spark on Kubernetes (Meetup org: WeCloudData; track: Big Data)
      • Jun 11: Running Kafka on Kubernetes with Strimzi (Meetup org: Lightbend; track: Big Data)
      • Jun 12: Machine Learning from Experimentation to Production on AWS (Meetup org: Big Data & AI Conference; track: Cloud)
      • Jun 12: Transforming big data from On-premise to the Cloud (Meetup org: Big Data & AI Conference; track: Big Data)
      • Jun 13: Spark for Data Science (Meetup org: Big Data & AI Conference; track: Big Data)
      • Jun 13: Moving Towards a Python Environment (Meetup org: Big Data & AI Conference; track: Data Science)
      • Jun 16: Machine Learning Deployment with Spark and Amazon SageMaker (Meetup org: WeCloudData; track: Big Data | Data Science)
      • Jun 18: Apache Spark Hands-on Workshop (Meetup org: WeCloudData; track: Big Data | Data Science)
  • 85. TYPE OF DATA JOB SEEKERS