DevSpace would like to thank our
sponsors!
The Buzzwords
• Big Data
• Fast Data
• Dark Data
• Unstructured Data
• Data Mining
• Data Vizualization
• Predictive Analytics
• Machine Learning
• [Deep] Neural Network
The Growth
The Demand
• National gap for analytical expertise at 140k+ by 2017. –McKinsey 2011
• Shortage of 100k Data Scientists by 2020. –Gartner 2012
• 90% of clients need expertise, 40% cite lack of talent. –Accenture 2014
• Survey finds 83% of data scientists see shortage. –Crowdflower 2016
The Salary
https://www.paysa.com/salaries/data-scientist--t
The Definition
A data scientist is a job title for an employee or
business intelligence (BI) consultant who excels at
analyzing data, particularly large amounts of data, to
help a business gain a competitive edge.
–WhatIs.com
The Definition
The Breakdown
Data
•Define
•Collect
•Store
•Explore
Scientist
•Hypothesis
•Plan Approach
•Analysis
•Report Results
The Job
• Educate the business
• Look for problems to solve
• Research new techniques
• Collate data for analysis (ETL)*
• Implement algorithms
• Design big data-capable architecture
• Present insights
The Wrangling
Sample the Data
•Random
•Stratified
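A minimal sketch of the difference between the two sampling styles, using only the Python standard library. The customer rows, the "plan" field, and the 10% fraction are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=0):
    """Draw the same fraction of rows from every stratum (group)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # keep at least one row per stratum
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical data: 10 "paid" customers and 90 "free" customers.
rows = [{"id": i, "plan": "paid" if i < 10 else "free"} for i in range(100)]

# A plain random sample may contain no "paid" rows at all.
random_sample = random.Random(0).sample(rows, 10)

# A stratified sample preserves the 10/90 split: 1 "paid" row and 9 "free" rows.
strat_sample = stratified_sample(rows, "plan", 0.1)
```

Stratifying matters most when one group is rare, because a simple random sample can under-represent it or miss it entirely.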
Reconcile Missing Data
•Discard
•Infer
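The two options above can be sketched as follows. Mean imputation is only one possible way to "infer" a value and is used here as an assumption; the "age" field and the rows are hypothetical.

```python
from statistics import mean

def discard_missing(rows, field):
    """Drop every row whose value for `field` is missing."""
    return [r for r in rows if r[field] is not None]

def infer_missing(rows, field):
    """Fill missing values with the mean of the observed values (mean imputation)."""
    fill = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

# Hypothetical rows with one missing age.
rows = [{"age": 30}, {"age": None}, {"age": 40}]

kept = discard_missing(rows, "age")    # two rows remain
filled = infer_missing(rows, "age")    # the missing age becomes 35
```

Discarding is simpler but shrinks the data set; inferring keeps every row at the cost of introducing estimated values.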
Normalize Numeric Values
•Standard Unit of Measure
•Subtract Average (Mean = 0)
•Divide by Standard Deviation
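As a rough illustration of the last two bullets, subtracting the mean and dividing by the standard deviation gives what is usually called a z-score. The heights below are hypothetical and assumed to already share one unit (centimeters).

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z = standardize(heights_cm)
print([round(v, 2) for v in z])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```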
Reduce Dimensionality
•Irrelevant Input Variables
•Redundant Input Variables
Add Derivative Values
•Generalize Attributes
•Discretize Attributes to Categories
•Binarize Categorical Attributes
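A small sketch of discretizing a numeric attribute into categories and then binarizing the category (often called one-hot encoding). The age boundaries and category names are assumptions made for the example.

```python
CATEGORIES = ["minor", "adult", "senior"]

def discretize_age(age):
    """Map a numeric age to a category; the cut-offs are illustrative."""
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"

def one_hot(value, categories):
    """Binarize a categorical value: one 0/1 column per category."""
    return [1 if value == c else 0 for c in categories]

ages = [12, 40, 70]
encoded = [one_hot(discretize_age(a), CATEGORIES) for a in ages]
print(encoded)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```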
Design Training Data
•Select
•Combine
•Aggregate
Power and Log transformation
•Approximate Normal Distribution
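To show why a log transformation can bring skewed data closer to a symmetric, normal-like shape, here is a sketch on hypothetical values that double at each step. The skewness helper is written out by hand for the example and is not taken from the slides.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: 0 for symmetric data, positive for a long right tail."""
    mu = mean(values)
    sigma = pstdev(values)
    return mean((v - mu) ** 3 for v in values) / sigma ** 3

# Hypothetical right-skewed measurements (each value doubles the previous one).
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log2(v) for v in raw]  # 0.0, 1.0, ..., 6.0

raw_skew = skewness(raw)        # clearly positive: long right tail
logged_skew = skewness(logged)  # approximately 0: symmetric after the transform
```

Real data will rarely become perfectly symmetric, so it is worth checking the distribution again after transforming.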
The Analysis Tools
The Tool Trends
Python
KNIME
RapidMiner
R
SPSS
SAS
Hadoop
The Top Tools
• SQL
• Excel
• Python
• R
• MySQL
The Languages
• R
• Python
• Java/Scala
• Stata
• SAS
• SPSS
• Matlab
• Julia
• Kafka/Storm
http://www.kdnuggets.com/2015/05/r-vs-python-data-science.html
The Math
• basic statistics (e.g., p-value)
• statistical modeling
• statistical tests
• experiment design
• distributions
• maximum likelihood estimators
• probability theory
• linear algebra
• multivariable calculus
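As one concrete instance of the p-value item above, the sketch below computes an exact one-sided binomial p-value. The coin-flip scenario is hypothetical; the null hypothesis is that the coin is fair.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) if the null hypothesis is true."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical experiment: 9 heads in 10 flips of a supposedly fair coin.
p = binomial_p_value(9, 10)
print(f"{p:.4f}")  # 0.0107
```

A value this small says the observed result would be unusual under a fair coin; it does not give the probability that the coin is fair.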
The Visualization Tools
• Tableau (enterprise visualization products) - www.tableau.com
• ggvis (R visualization package) - ggvis.rstudio.com
• ggplot (plotting system) - ggplot.yhathq.com
• D3.js (declarative DOM manipulation) - d3js.org
• Vega (visualization grammar) - trifacta.github.com/vega
• Rickshaw (charting library) - code.shutterstock.com/rickshaw
• modest maps (map library) - modestmaps.com
• Chart.js (plotting library) - www.chartjs.org
The Machine Learning
Concepts
•k-nearest neighbors
•random forests
•ensemble methods
•…use Python libraries!
Tools
•Weka - www.cs.waikato.ac.nz/ml/weka/
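The slide recommends using Python libraries in practice; the from-scratch sketch below is only meant to show the idea behind k-nearest neighbors. The training points and labels are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k closest training points.

    `train` is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training set with two classes.
train = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.0), "small"),
    ((8.0, 8.0), "large"), ((9.0, 7.5), "large"), ((8.5, 9.0), "large"),
]

print(knn_predict(train, (1.2, 1.4)))  # small
print(knn_predict(train, (8.2, 8.1)))  # large
```

Because the vote depends on raw distances, features are usually normalized first so that no single attribute dominates.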
The Results
• Report
• Presentation
• Demo
• Prototype
• Component
The Skills
• Data Analyst (A)
• Data Engineer (B)
• Academic (Ab)
• Generalist (AB)
http://blog.udacity.com/2014/11/data-science-job-skills.html
The Path
1. Fundamentals
2. Statistics
3. Programming
4. ML
5. Text Mining
6. Visualization
7. Big Data
8. Data Munging
9. Toolbox
Fundamentals
1. Matrices & Linear Algebra
2. Hash Functions, Binary Tree, O(n)
3. Relational Algebra, DB Basics
4. Inner, Outer, Cross, Theta Join
5. CAP Theorem
6. Tabular Data
7. Data Frames & Series
8. Sharding
9. OLAP
Fundamentals
10. Multidimensional Data Model
11. ETL
12. Reporting vs BI vs Analytics
13. JSON & XML
14. NoSQL
15. Regex
16. Vendor Landscape
17. Env Setup
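The join types named in item 4 of the fundamentals list can be sketched with plain Python lists. The customers and orders tables and their field names are hypothetical, and the nested-loop approach is chosen for clarity rather than speed.

```python
from itertools import product

# Hypothetical tables.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
orders = [
    {"customer_id": 1, "total": 30},
    {"customer_id": 1, "total": 12},
    {"customer_id": 3, "total": 99},
]

def cross_join(left, right):
    """Every left row paired with every right row."""
    return list(product(left, right))

def theta_join(left, right, predicate):
    """Cross join filtered by an arbitrary condition."""
    return [(l, r) for l, r in product(left, right) if predicate(l, r)]

def inner_join(left, right):
    """Theta join whose condition is key equality (an equi-join)."""
    return theta_join(left, right, lambda l, r: l["id"] == r["customer_id"])

def left_outer_join(left, right):
    """Inner join plus unmatched left rows, padded with None."""
    result = []
    for l in left:
        matches = [r for r in right if l["id"] == r["customer_id"]]
        if matches:
            result.extend((l, r) for r in matches)
        else:
            result.append((l, None))
    return result

big_orders = theta_join(
    customers, orders,
    lambda l, r: l["id"] == r["customer_id"] and r["total"] > 20,
)
```

Here the cross join yields 6 pairs, the inner join 2, and the left outer join 3, because Grace has no orders and is kept with a None placeholder.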
Statistics
1. Pick a Dataset
2. Descriptive Statistics
3. Exploratory Data Analysis
4. Histograms
5. Percentiles and Outliers
6. Probability Theory
7. Bayes' Theorem
8. Random Variables
9. Cumulative Distribution Function (CDF)
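A worked example of Bayes' theorem from the list above, P(A|B) = P(B|A) P(A) / P(B). The screening-test numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # Total probability of a positive test, with or without the condition.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical test: 1% prevalence, 90% sensitivity, 5% false positive rate.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"{p:.3f}")  # 0.154
```

Even with a fairly accurate test, the low prior keeps the posterior near 15%, which is the kind of result that makes the theorem worth studying early.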
The Training
• Coursera - www.coursera.org
• edX - www.edx.org
• Udacity - www.udacity.com
• Kaggle - www.kaggle.com
• YouTube - projects.iq.harvard.edu/stat110/youtube
• Boot Camps

From Developer to Data Scientist


Editor's Notes

  1. Chris Gardner
  2. Trying to understand the world of Data Science and “Big Data” can be overwhelming. Not only is it huge, but it is constantly changing! Innovation is shifting from Infrastructure and Analytics toward Applications. http://mattturck.com/wp-content/uploads/2016/03/Big-Data-Landscape-2016-v18-FINAL.png
  3. Before we discuss data scientists, let’s look at some common buzzwords. Big Data: data sets so large that they require specialized techniques to analyze. Fast Data: data whose utility declines over time (fast ingest, streaming, preparation, analytics, user response). Dark Data: data that people don’t know is there, don’t know how to access, aren’t allowed to access, or that systems haven’t been set up to leverage yet. Data Mining: examining large data sets to generate new insights. Predictive Analytics: extracting information from existing data to determine patterns and predict future outcomes. Machine Learning: the use of algorithms to learn from and make predictions on data. Deep Neural Network: graphical models in which data is computed upon by successive layers of nodes. Any other major buzzwords? Let’s talk about Data Science. http://www.zipfianacademy.com/blog/post/46864003608/a-practical-intro-to-data-science
  4. Within the last few years the amount of data being generated has drastically increased. The majority of this new data is unstructured.
  5. As storage prices dropped and the world became increasingly computerized, the amount of data being gathered grew exponentially, as did the opportunities to benefit from its analysis. McKinsey predicts a need for expertise. Gartner predicts a talent shortage. Accenture sees a widespread, unmet need. A survey of the industry confirms the shortage. Studies and surveys are great, but money talks!
  6. Software Engineer market salary of $118k. Developer market salary of $94k. So we know there’s a need, but what is a Data Scientist?
  7. Domain Expertise! https://digitalmarketing.temple.edu/romannicholas/2016/01/28/when-i-grow-up-i-want-to-be-a-data-scientist/
  8. Data munging is the process of converting “raw” data into a format that can be consumed. Input Variable = “Feature”. On power and log transformations: a large portion of machine learning models are based on the assumption of a linear relationship (i.e., the output is linearly dependent on the input), as well as normally distributed error with constant standard deviation. However, reality does not exactly align with this assumption in many cases. http://horicky.blogspot.com.au/2012/05/predictive-analytics-data-preparation.html
  9. http://r4stats.com/articles/popularity/
  10. The O’Reilly survey better represents the open-source community. Conclusion: Among the software that tends to be used as a collection of pre-written methods, R, SAS, SPSS and Stata tend to always be in the top, with R and SAS occasionally swapping places depending on the criteria used. I don’t include Python in this group as I rarely see someone using it exclusively to call pre-written routines. Using commodity hardware and open source software, Hadoop’s distributed file system (HDFS) facilitates the storage, management and rapid analysis of vast datasets across distributed clusters of servers. Hadoop MapReduce: persists to disk (slow). Hive/Presto (new): SQL-like abstraction layers. Pig: workflow-driven abstraction. Spark*: runs in-memory, standalone or with Hadoop. http://r4stats.com/articles/popularity/
  11. Tool
  12. A = Analyst, B = Builder. Don’t forget Domain Expertise!
  13. Author: Swami Chandrasekaran http://nirvacana.com/thoughts/becoming-a-data-scientist/
  14. 1. Python: Learn Python Programming From Scratch by Udemy; Learn to program in Python by Codecademy; LearnPython.org interactive Python tutorial
      2. Machine Learning: Machine learning online; Operational Intelligence and Machine Data with Splunk
      3. R Language: R Basics – R Programming Language Introduction by Udemy; Introduction to R at DataCamp; Learn R at Code School
      4. Big Data: Big Data University; Big Data and Hadoop Essentials by Udemy; Basic overview of Big Data Hadoop by Udemy
      5. Statistics: Statistics One by Coursera; Statistics and Probability; Probability & Statistics
      6. Data Mining: Data Mining and Web Scraping: How to Convert Sites into Data by Udemy; Data Mining by Coursera
      7. SQL: Interactive Online SQL Training for Beginners; Sachin Quickly Learns (SQL) – Structured Query Language by Udemy; SQL Tutorial by w3schools
      8. Java: Learn Java: The Java Programming Tutorial For Beginners by Udemy; Learn Java – Free Interactive Java Tutorial; Learn Java Programming From Scratch by Udemy