Demystifying Data Science
A Realistic Perspective
By
R Venkat Raman
WHY THE NEED TO DEMYSTIFY?
• So Many Definitions
• So Many Assumptions
• So Many Expectations
• So Much Hype
So Many Definitions
WHAT IS DATA SCIENCE?
“Data science is the field of study that combines domain expertise, programming
skills, and knowledge of math and statistics to extract meaningful insights from data.”
“Data science is the discipline of making data useful.”
“Data Science as a multi-disciplinary subject encompasses the use of mathematics, statistics, and
computer science to study and evaluate data. The key objective of Data Science is to extract valuable
information for use in strategic decision making, product development, trend analysis and forecasting.”
“Data science is a ‘concept to unify statistics, data analysis, machine learning and their
related methods’ in order to ‘understand and analyze actual phenomena’ with data.”
WHAT IS DATA SCIENCE?
THE VENN DIAGRAMS
WHO IS A DATA SCIENTIST?
“An ideal data scientist is someone who has both the engineering skills to acquire and manage
large data sets, and also has the statistician’s skills to extract value from the large data sets and
present that data to a large audience”
“A data scientist is someone who blends, math, algorithms, and an understanding of human
behaviour with the ability to hack systems together to get answers to interesting human questions
from data”
“A Data Scientist is a person who does Data Science”
“Person who is better at statistics than any software engineer and
better at software engineering than any statistician.”
So Many Assumptions
HOW PEOPLE PERCEIVE DATA SCIENCE
So Many Expectations
BECOMING A DATA SCIENTIST – QUICKLY !!
GETTING RICH QUICKLY !!
So Much Hype
CASE OF OLD WINE IN NEW BOTTLE ?
ARTIFICIAL GENERAL INTELLIGENCE ?
THE HYPE CYCLE – WHERE ARE WE ?
We are here
Why This Buzz Now ?
INCREASED STORAGE AND COMPUTING POWER
THE STATISTICS – MACHINE LEARNING DIVERGENCE
• In the 20th century, computing and storage power were limited. Statisticians therefore had to infer a great deal from a
sample, so inferential statistics was heavily used and relied upon.
• Fast forward to today: computing and storage power have increased substantially, which has enabled machine learning and deep
learning to blossom. In machine/deep learning, more data is generally better, since predictions tend to improve with more quality
training data. This thinking diverges from 20th-century statistical thinking.
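The contrast above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not part of the original slides: the synthetic population, the linear model y = 3x + 2, the noise levels, and the helper names `mean_with_interval`, `fit_line`, and `make_data`. The first half estimates a population mean from a sample, with an approximate 95% confidence interval (inferential thinking). The second half fits a line by least squares on growing training sets (the "more data" thinking).

```python
import random
import statistics

def mean_with_interval(sample, z=1.96):
    """Sample mean with an approximate 95% confidence interval."""
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / len(sample) ** 0.5
    return mean, (mean - z * std_error, mean + z * std_error)

def fit_line(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def make_data(rng, n):
    """Hypothetical data: y = 3x + 2 plus Gaussian noise (sd = 2)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + 2 + rng.gauss(0, 2) for x in xs]
    return xs, ys

rng = random.Random(0)

# Inference from a sample: the interval narrows as the sample grows.
population = [rng.gauss(50, 10) for _ in range(100_000)]
for n in (100, 10_000):
    mean, (low, high) = mean_with_interval(rng.sample(population, n))
    print(f"n={n:>6}: mean={mean:.2f}, 95% CI=({low:.2f}, {high:.2f})")

# Learning from data: the fitted line tends toward (3, 2) as training data grows.
for n in (10, 100, 5_000):
    slope, intercept = fit_line(*make_data(rng, n))
    print(f"n={n:>6}: slope={slope:.3f}, intercept={intercept:.3f}")
```

Both halves benefit from more data. The difference is one of emphasis: the first quantifies uncertainty about a population from a limited sample, while the second cares mainly about how well the fitted model predicts.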
EXPLOSION OF DATA
• 2.5 quintillion bytes of data are created each day [1]
• 90% of the data in the world today has been created in the last two years alone [1]
• More than 3.7 billion humans use the internet [1]
• Every minute, Snapchat users share 527,760 photos, users watch 4,146,600 YouTube videos, 456,000 tweets are sent on Twitter, and Instagram users post 46,740 photos
• Close to 3 billion smartphone users in the world
[1] Report as of 2018
There is tremendous scope to extract insights from this data!
Hence the demand for Data Scientists.
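To put the per-minute figures on the same daily scale as the bytes-per-day figure, here is a small arithmetic sketch in Python. The input numbers are the ones quoted above. The variable names and the even split of daily bytes across internet users are illustrative assumptions, not claims from the cited report.

```python
MINUTES_PER_DAY = 24 * 60

# Per-minute figures quoted on the slide (2018 report).
per_minute = {
    "Snapchat photos shared": 527_760,
    "YouTube videos watched": 4_146_600,
    "tweets sent": 456_000,
    "Instagram photos posted": 46_740,
}
per_day = {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}

BYTES_PER_DAY = 2_500_000_000_000_000_000  # 2.5 quintillion = 2.5 * 10**18
INTERNET_USERS = 3_700_000_000             # 3.7 billion

exabytes_per_day = BYTES_PER_DAY / 10**18
# Assumption for illustration only: data is spread evenly over internet users.
megabytes_per_user = BYTES_PER_DAY / INTERNET_USERS / 10**6  # decimal megabytes

for name, count in per_day.items():
    print(f"{name}: {count:,} per day")
print(f"{exabytes_per_day} EB per day, about {megabytes_per_user:.0f} MB per internet user")
```

At these rates, 456,000 tweets per minute comes to roughly 657 million per day, and 2.5 quintillion bytes is 2.5 exabytes, or on the order of 676 MB per internet user per day under the even-split assumption.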
Let’s Demystify
THE VARIOUS FACETS OF DATA SCIENCE?
DATA SCIENCE – A TEAM EFFORT
Data Engineers – What They Do
• Create Data pipelines
• Evaluate Databases
• Design Schemas
• Perform ETL
Data Engineers – Skill Set
• Knowledge of Databases
• Scripting skills (Linux commands)
• Knowledge of Cloud technologies
• SQL commands
Data Scientists – What They Do
• Apply statistical/Machine learning techniques to solve business problems
• Perform R&D
• Innovate new solutions
• Develop Data science products
Data Scientists – Skill Set
• Knowledge of statistical and mathematical concepts
• Knowledge of various statistical/ML algorithms
• Scripting skills (R/Python)
• SQL commands
Software Engineers – What They Do
• Help design UI (front end coding)
• Do backend coding
• Help deploy data science solution in production
• Automate the entire process
Software Engineers – Skill Set
• Knowledge of Programming concepts
• Programming languages
• Knowledge of Databases
• Knowledge of RESTful APIs
• Scripting skills (Linux commands)
Data Storytellers/Translators – What They Do
• Communicate Data Science solutions in Business friendly/non technical terms
• Understand business requirements and translate them to Data science problems
• Design persuasive Data visualizations
Data Storytellers/Translators – Skill Set
• High level understanding of statistics and ML concepts
• Business acumen
• Good soft skills
• Creativity
• Persuasion and articulation
Tools Used: (not captured in the extracted slide text)
WHY ARE DATA SCIENTISTS VALUED?
THE DATA SCIENTIST TALENT STACK
IDEA INSPIRED BY SCOTT ADAMS’S TALENT STACK THEORY
Knowledge of Inner Workings of Algorithms
Statistics/Maths Skills
Coding/Technical Skills
Persuasion/Storytelling
THE PATH TO BECOME A DATA SCIENTIST
• Can anyone become a Data Scientist ?
Yes
• Can a person become a Data Scientist just by taking some MOOCs or short courses over 3-6 months ?
No
HOW GOOD ARE THE MOOCS AND KAGGLE COMPETITIONS?
TOO MUCH SIGNALING
MOOCs
• There are thousands of courses available online now.
• While the courses may be useful for building knowledge or as a repository for revising concepts, course certificates by themselves do not guarantee a Data Science job.
• Millions of people take the same courses, and the solutions to the questions in these MOOCs are easily hackable or available.
Kaggle Competitions
• Kaggle competitions are more about showcasing processing speed or ensemble techniques than intellectual rigor.
• Real-life data is never as clean as the data given in Kaggle competitions.
• But Kaggle kernels are useful.
GETTING HIRED AS A DATA SCIENTIST
HOW TO IMPROVE VISIBILITY AND BECOME EMPLOYABLE
• Focus on a specific area such as NLP, Computer Vision, Marketing Analytics, or Classical Statistical applications. Try to be a specialist rather than a generalist.
• This strategy will work to gain entry into the field of Data Science. But as one gains more experience, it becomes harder to stay a specialist unless one is in an academic framework.
• Write technical and non-technical blogs
• Try the Feynman technique of learning things
• Do pet projects, develop small products, put the code on GitHub
• Learn niche and complementary skills like putting code in production or how to dockerize code.
• Network with Data Scientists in Industry and Academia
• Follow Data Scientists on Twitter or LinkedIn
• As an Institution or Individual, start Data Science podcasts
BLUE OCEAN STRATEGY – BECOME A DATA SCIENCE TRANSLATOR
REFERENCES & RESOURCES
Slide 4:
https://www.datarobot.com/wiki/data-science/
https://www.kdnuggets.com/2018/09/what-is-data-science.html
https://en.wikipedia.org/wiki/Data_science
https://www.digitalvidya.com/blog/what-is-data-science/
Slide 5:
https://www.datasciencecentral.com/profiles/blogs/difference-of-data-science-machine-learning-and-data-mining
https://towardsdatascience.com/introduction-to-statistics-e9d72d818745
Slide 6:
https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/
https://twitter.com/josh_wills/status/198093512149958656?lang=en
Slide 8:
https://me.me/i/data-scientist-31-1-120-0-what-my-friends-think-15a983c0fbc54a91a76d8b25d1c5daaa
Slide 11:
http://blog.fusemachines.com/data-scientist-sexiest-job-21st-century/
Slide 14:
https://www.cnbc.com/2018/03/13/elon-musk-at-sxsw-a-i-is-more-dangerous-than-nuclear-weapons.html
https://www.newyorker.com/magazine/2018/05/14/how-frightened-should-we-be-of-ai
https://www.forbes.com/sites/forbestechcouncil/2017/12/04/why-we-should-be-afraid-of-intelligent-machines/#74fbc13f6be1
Slide 15:
https://www.botxo.co/2018/09/03/our-take-on-the-gartner-hype-cycle/
Slide 17:
https://ourworldindata.org/technological-progress
Slide 18:
https://www.socialmediatoday.com/news/how-much-data-is-generated-every-minute-infographic-1/525692/
https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/#7cfa86f460ba
https://blog.microfocus.com/how-much-data-is-created-on-the-internet-each-day/
Slide 20:
https://blog.jedox.com/artificial-intelligence-business-intelligence-fpa-part-2/
Slide 23:
https://www.amazon.com/Win-Bigly-Persuasion-World-Matter/dp/0735219710
Slide 27:
https://www.forbes.com/sites/bernardmarr/2018/03/12/forget-data-scientists-and-hire-a-data-translator-instead/#4b209212848a
https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/analytics-translator
https://sloanreview.mit.edu/article/why-your-company-needs-data-translators/
Thank You !!
R Venkat Raman

More Related Content

What's hot

Data science 101
Data science 101Data science 101
Data science 101
University of West Florida
 
Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First Course
Arnab Majumdar
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
ryanorban
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Niko Vuokko
 
Data Scientists Are Analysts Are Also Software Engineers
Data Scientists Are Analysts Are Also Software EngineersData Scientists Are Analysts Are Also Software Engineers
Data Scientists Are Analysts Are Also Software Engineers
Domino Data Lab
 
Data Science 101
Data Science 101Data Science 101
Data Science 101
odsc
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
Andrei Savu
 
Data science
Data scienceData science
Data science
GitanshuSharma1
 
Course - Machine Learning Basics with R
Course - Machine Learning Basics with R Course - Machine Learning Basics with R
Course - Machine Learning Basics with R
Persontyle
 
Evaluation of big data analysis
Evaluation of big data analysisEvaluation of big data analysis
Evaluation of big data analysis
Καρολίνα Κάτι
 
Datascienceindia article
Datascienceindia articleDatascienceindia article
Datascienceindia article
HimanshuPise1
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
Mark West
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
Sri Ambati
 
Data science Big Data
Data science Big DataData science Big Data
Data science Big Data
sreekanthricky
 
The Big Data Dream Team
The Big Data Dream TeamThe Big Data Dream Team
The Big Data Dream Team
Accenture Analytics
 
The Other 99% of a Data Science Project
The Other 99% of a Data Science ProjectThe Other 99% of a Data Science Project
The Other 99% of a Data Science Project
Eugene Mandel
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
Edureka!
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data Science
Ajay Ohri
 
Data Science Applications | Data Science For Beginners | Data Science Trainin...
Data Science Applications | Data Science For Beginners | Data Science Trainin...Data Science Applications | Data Science For Beginners | Data Science Trainin...
Data Science Applications | Data Science For Beginners | Data Science Trainin...
Edureka!
 
Data Science
Data ScienceData Science
Data Science
Amit Singh
 

What's hot (20)

Data science 101
Data science 101Data science 101
Data science 101
 
Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First Course
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data Scientists Are Analysts Are Also Software Engineers
Data Scientists Are Analysts Are Also Software EngineersData Scientists Are Analysts Are Also Software Engineers
Data Scientists Are Analysts Are Also Software Engineers
 
Data Science 101
Data Science 101Data Science 101
Data Science 101
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
Data science
Data scienceData science
Data science
 
Course - Machine Learning Basics with R
Course - Machine Learning Basics with R Course - Machine Learning Basics with R
Course - Machine Learning Basics with R
 
Evaluation of big data analysis
Evaluation of big data analysisEvaluation of big data analysis
Evaluation of big data analysis
 
Datascienceindia article
Datascienceindia articleDatascienceindia article
Datascienceindia article
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
Data science Big Data
Data science Big DataData science Big Data
Data science Big Data
 
The Big Data Dream Team
The Big Data Dream TeamThe Big Data Dream Team
The Big Data Dream Team
 
The Other 99% of a Data Science Project
The Other 99% of a Data Science ProjectThe Other 99% of a Data Science Project
The Other 99% of a Data Science Project
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data Science
 
Data Science Applications | Data Science For Beginners | Data Science Trainin...
Data Science Applications | Data Science For Beginners | Data Science Trainin...Data Science Applications | Data Science For Beginners | Data Science Trainin...
Data Science Applications | Data Science For Beginners | Data Science Trainin...
 
Data Science
Data ScienceData Science
Data Science
 

Similar to Demystifying Data Science

LSESU a Taste of R Language Workshop
LSESU a Taste of R Language WorkshopLSESU a Taste of R Language Workshop
LSESU a Taste of R Language Workshop
Korkrid Akepanidtaworn
 
Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)
SayyedYusufali
 
Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)
SayyedYusufali
 
Data science training in hydpdf converted (1)
Data science training in hydpdf  converted (1)Data science training in hydpdf  converted (1)
Data science training in hydpdf converted (1)
SayyedYusufali
 
Data Science Training and Placement
Data Science Training and PlacementData Science Training and Placement
Data Science Training and Placement
AkhilGGM
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
DIGITALSAI1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
KumarNaik21
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
SayyedYusufali
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
VamsiNihal
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabad
saitejavella
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
Nithinsunil1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
VamsiNihal
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
SayyedYusufali
 
data science training and placement
data science training and placementdata science training and placement
data science training and placement
SaiprasadVella
 
online data science training
online data science trainingonline data science training
online data science training
DIGITALSAI1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
VamsiNihal
 
data science online training in hyderabad
data science online training in hyderabaddata science online training in hyderabad
data science online training in hyderabad
VamsiNihal
 
Best data science training in Hyderabad
Best data science training in HyderabadBest data science training in Hyderabad
Best data science training in Hyderabad
KumarNaik21
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
Nithinsunil1
 
DATA SCIENCE.pptx.pdf
DATA SCIENCE.pptx.pdfDATA SCIENCE.pptx.pdf
DATA SCIENCE.pptx.pdf
RahulTr22
 

Similar to Demystifying Data Science (20)

LSESU a Taste of R Language Workshop
LSESU a Taste of R Language WorkshopLSESU a Taste of R Language Workshop
LSESU a Taste of R Language Workshop
 
Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)
 
Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)
 
Data science training in hydpdf converted (1)
Data science training in hydpdf  converted (1)Data science training in hydpdf  converted (1)
Data science training in hydpdf converted (1)
 
Data Science Training and Placement
Data Science Training and PlacementData Science Training and Placement
Data Science Training and Placement
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabad
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
data science training and placement
data science training and placementdata science training and placement
data science training and placement
 
online data science training
online data science trainingonline data science training
online data science training
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
 
data science online training in hyderabad
data science online training in hyderabaddata science online training in hyderabad
data science online training in hyderabad
 
Best data science training in Hyderabad
Best data science training in HyderabadBest data science training in Hyderabad
Best data science training in Hyderabad
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
 
DATA SCIENCE.pptx.pdf
DATA SCIENCE.pptx.pdfDATA SCIENCE.pptx.pdf
DATA SCIENCE.pptx.pdf
 

Recently uploaded

一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 

Recently uploaded (20)

一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 

Demystifying Data Science

  • 1. Demystifying Data Science A Realistic Perspective By R Venkat Raman
  • 2. • So Many Definitions • So Many Assumptions • So Many Expectations • So Much Hype WHY THE NEED TO DEMYSTIFY ? R Venkat Raman
  • 3. So Many Definitions R Venkat Raman
  • 4. WHAT IS DATA SCIENCE? “Data science is the field of study that combines domain expertise, programming skills, and knowledge of math and statistics to extract meaningful insights from data.” “Data science is the discipline of making data useful.” “Data Science as a multi-disciplinary subject encompasses the use of mathematics, statistics, and computer science to study and evaluate data. The key objective of Data Science is to extract valuable information for use in strategic decision making, product development, trend analysis and forecasting.” “Data science is a ‘concept to unify statistics, data analysis, machine learning and their related methods’ in order to ‘understand and analyze actual phenomena’ with data.” R Venkat Raman
  • 5. WHAT IS DATA SCIENCE? THE VENN DIAGRAMS R Venkat Raman
  • 6. WHO IS A DATA SCIENTIST? “An ideal data scientist is someone who has the both the engineering skills to acquire and manage large data sets, and also has the statistician’s skills to extract value from the large data sets and present that data to a large audience” “A data scientist is someone who blends, math, algorithms, and an understanding of human behaviour with the ability to hack systems together to get answers to interesting human questions from data” “A Data Scientist is a person who does Data Science” “Person who is better at statistics than any software engineer and better at software engineering than any statistician.” R Venkat Raman
  • 7. So Many Assumptions R Venkat Raman
  • 8. HOW PEOPLE PERCEIVE DATA SCIENCE R Venkat Raman
  • 9. So Many Expectations R Venkat Raman
  • 10. BECOMING DATA SCIENTIST – QUICKLY !! R Venkat Raman
  • 11. GETTING RICH QUICKLY !! R Venkat Raman
  • 12. So Much Hype R Venkat Raman
  • 13. CASE OF OLD WINE IN NEW BOTTLE ? R Venkat Raman
  • 15. THE HYPE CYCLE – WHERE ARE WE ? We are here R Venkat Raman
  • 16. Why This Buzz Now ? R Venkat Raman
  • 17. INCREASED STORAGE AND COMPUTING POWER THE STATISTICS – MACHINE LEARNING DIVERGENCE • In the 20th century, the computing and storage power was less. This required statisticians to infer a lot of things from a sample. Hence inferential statistics was heavily used and relied upon. • Fast forward now, the computing and storage power has increased substantially. This enabled machine learning and Deep learning to blossom. In Machine/Deep Learning, more data the better as the prediction improves with more quality training data. This thinking is divergent from a 20th century statistical thinking. R Venkat Raman
  • 18. EXPLOSION OF DATA
      • 2.5 quintillion bytes of data are created each day [1]
      • 90% of the data in the world today was created in the last two years alone [1]
      • More than 3.7 billion people use the internet [1]
      • Every minute, Snapchat users share 527,760 photos, users watch 4,146,600 YouTube videos, 456,000 tweets are sent on Twitter, and Instagram users post 46,740 photos
      • Close to 3 billion smartphone users in the world
      [1] Report as of 2018
      There is tremendous scope to extract insights from this data! Hence the demand for data scientists.
  • 20. THE VARIOUS FACETS OF DATA SCIENCE?
  • 21. DATA SCIENCE – A TEAM EFFORT
      Data Engineers
        What they do:
          • Create data pipelines
          • Evaluate databases
          • Design schemas
          • Perform ETL
        Skill set:
          • Knowledge of databases
          • Scripting skills (Linux commands)
          • Knowledge of cloud technologies
          • SQL commands
      Data Scientists
        What they do:
          • Apply statistical/machine learning techniques to solve business problems
          • Perform R&D
          • Innovate new solutions
          • Develop data science products
        Skill set:
          • Knowledge of statistical and mathematical concepts
          • Knowledge of various statistical/ML algorithms
          • Scripting skills (R/Python)
          • SQL commands
      Software Engineers
        What they do:
          • Help design UI (front-end coding)
          • Do back-end coding
          • Help deploy data science solutions in production
          • Automate the entire process
        Skill set:
          • Knowledge of programming concepts
          • Programming languages
          • Knowledge of databases
          • Knowledge of RESTful APIs
          • Scripting skills (Linux commands)
      Data Storytellers/Translators
        What they do:
          • Communicate data science solutions in business-friendly, non-technical terms
          • Understand business requirements and translate them into data science problems
          • Design persuasive data visualizations
        Skill set:
          • High-level understanding of statistics and ML concepts
          • Business acumen
          • Good soft skills
          • Creativity
          • Persuasion and articulation
      Tools used: (not captured in this transcript)
  • 22. WHY DATA SCIENTISTS ARE VALUED?
  • 23. THE DATA SCIENTIST TALENT STACK: IDEA INSPIRED BY SCOTT ADAMS'S TALENT STACK THEORY
      • Knowledge of the inner workings of algorithms
      • Statistics/maths skills
      • Coding/technical skills
      • Persuasion/storytelling
  • 24. THE PATH TO BECOME A DATA SCIENTIST
      • Can anyone become a data scientist? Yes.
      • Can a person become a data scientist just by taking some MOOCs or short courses for 3-6 months? No.
  • 25. HOW GOOD ARE THE MOOCS AND KAGGLE COMPETITIONS? TOO MUCH SIGNALING
      MOOCs
        • There are thousands of courses available online now.
        • The courses may be useful for building knowledge or as a repository for revising concepts, but the course certificates by themselves do not guarantee a data science job.
        • Millions of people take the same courses, and the solutions to the questions in these MOOCs are easily hackable or readily available.
      Kaggle Competitions
        • Kaggle competitions are more about showcasing processing speed or ensemble techniques than intellectual rigor.
        • Real-life data is never as clean as the data given in Kaggle competitions.
        • But Kaggle kernels are useful.
  • 26. GETTING HIRED AS A DATA SCIENTIST: HOW TO IMPROVE VISIBILITY AND BECOME EMPLOYABLE
      • Focus on a specific area such as NLP, computer vision, marketing analytics, or classical statistical applications. Try to be a specialist rather than a generalist.
      • This strategy works for gaining entry into the field of data science. As one gains more experience, however, it becomes harder to stay a specialist unless one is in an academic setting.
      • Write technical and non-technical blogs
      • Try the Feynman technique of learning things
      • Do pet projects, develop small products, and put the code on GitHub
      • Learn niche and complementary skills, such as putting code into production or dockerizing code
      • Network with data scientists in industry and academia
      • Follow data scientists on Twitter or LinkedIn
      • As an institution or individual, start a data science podcast
  • 27. BLUE OCEAN STRATEGY – BECOME A DATA SCIENCE TRANSLATOR
  • 28. REFERENCES & RESOURCES
      Slide 4:
        https://www.datarobot.com/wiki/data-science/
        https://www.kdnuggets.com/2018/09/what-is-data-science.html
        https://en.wikipedia.org/wiki/Data_science
        https://www.digitalvidya.com/blog/what-is-data-science/
      Slide 5:
        https://www.datasciencecentral.com/profiles/blogs/difference-of-data-science-machine-learning-and-data-mining
        https://towardsdatascience.com/introduction-to-statistics-e9d72d818745
      Slide 6:
        https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/
        https://twitter.com/josh_wills/status/198093512149958656?lang=en
      Slide 8:
        https://me.me/i/data-scientist-31-1-120-0-what-my-friends-think-15a983c0fbc54a91a76d8b25d1c5daaa
      Slide 11:
        http://blog.fusemachines.com/data-scientist-sexiest-job-21st-century/
      Slide 14:
        https://www.cnbc.com/2018/03/13/elon-musk-at-sxsw-a-i-is-more-dangerous-than-nuclear-weapons.html
        https://www.newyorker.com/magazine/2018/05/14/how-frightened-should-we-be-of-ai
        https://www.forbes.com/sites/forbestechcouncil/2017/12/04/why-we-should-be-afraid-of-intelligent-machines/#74fbc13f6be1
  • 29. REFERENCES & RESOURCES
      Slide 15:
        https://www.botxo.co/2018/09/03/our-take-on-the-gartner-hype-cycle/
      Slide 17:
        https://ourworldindata.org/technological-progress
      Slide 18:
        https://www.socialmediatoday.com/news/how-much-data-is-generated-every-minute-infographic-1/525692/
        https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/#7cfa86f460ba
        https://blog.microfocus.com/how-much-data-is-created-on-the-internet-each-day/
      Slide 20:
        https://blog.jedox.com/artificial-intelligence-business-intelligence-fpa-part-2/
      Slide 23:
        https://www.amazon.com/Win-Bigly-Persuasion-World-Matter/dp/0735219710
      Slide 27:
        https://www.forbes.com/sites/bernardmarr/2018/03/12/forget-data-scientists-and-hire-a-data-translator-instead/#4b209212848a
        https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/analytics-translator
        https://sloanreview.mit.edu/article/why-your-company-needs-data-translators/
  • 30. Thank You !!
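
Slide 17 describes how 20th-century statisticians had to infer a lot from a sample. The sketch below is an illustration only and is not taken from the deck: it assumes a hypothetical population of 100,000 normally distributed values (mean 50, standard deviation 10) and made-up function names. It shows the sampling side of the argument, where the uncertainty of an estimate shrinks roughly with the square root of the sample size. It does not train any machine learning model.

```python
import math
import random
import statistics


def mean_with_standard_error(sample):
    """Return (sample mean, standard error of the mean)."""
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, sem


def standard_errors_by_size(sizes, seed=0):
    """Draw one sample per size from a hypothetical population
    and report the mean and its standard error for each."""
    rng = random.Random(seed)
    # Assumed population for illustration: 100,000 values from N(50, 10).
    population = [rng.gauss(50, 10) for _ in range(100_000)]
    results = {}
    for n in sizes:
        sample = rng.sample(population, n)
        results[n] = mean_with_standard_error(sample)
    return results


if __name__ == "__main__":
    for n, (mean, sem) in standard_errors_by_size([30, 300, 3000, 30000]).items():
        # 1.96 * SEM approximates the half-width of a 95% confidence interval.
        print(f"n={n:>6}  mean={mean:6.2f}  95% CI half-width ~ {1.96 * sem:.3f}")
```

With only 30 observations the interval around the estimate is wide, which is why careful inference mattered when data and compute were scarce. With tens of thousands of observations the interval becomes narrow.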

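The figures on slide 18 are easier to compare once they are converted to common units. The sketch below is a rough back-of-the-envelope calculation, not part of the original deck. It assumes decimal SI units (1 exabyte = 10^18 bytes) and assumes the per-minute rates hold constant for a full day, which is a simplification. The constant and function names are hypothetical.

```python
# Figures quoted on slide 18 (report as of 2018).
BYTES_PER_DAY = 2_500_000_000_000_000_000  # 2.5 quintillion bytes
INTERNET_USERS = 3_700_000_000             # 3.7 billion people
MINUTES_PER_DAY = 24 * 60

PER_MINUTE = {
    "Snapchat photos shared": 527_760,
    "YouTube videos watched": 4_146_600,
    "Tweets sent": 456_000,
    "Instagram photos posted": 46_740,
}


def to_exabytes(num_bytes):
    """Convert bytes to exabytes, assuming 1 EB = 10**18 bytes."""
    return num_bytes / 10**18


def per_day(per_minute_counts):
    """Scale per-minute counts to a day, assuming a constant rate."""
    return {name: count * MINUTES_PER_DAY for name, count in per_minute_counts.items()}


def megabytes_per_user(num_bytes, users):
    """Average data volume per user in megabytes (1 MB = 10**6 bytes)."""
    return num_bytes / users / 10**6


if __name__ == "__main__":
    print(f"Daily volume: {to_exabytes(BYTES_PER_DAY)} EB")
    print(f"Per internet user: {megabytes_per_user(BYTES_PER_DAY, INTERNET_USERS):.0f} MB/day")
    for name, count in per_day(PER_MINUTE).items():
        print(f"{name}: {count:,} per day")
```
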
Editor's Notes

  1. Sources : https://www.datarobot.com/wiki/data-science/ https://www.kdnuggets.com/2018/09/what-is-data-science.html https://en.wikipedia.org/wiki/Data_science https://www.digitalvidya.com/blog/what-is-data-science/
  2. Sources : https://www.datasciencecentral.com/profiles/blogs/difference-of-data-science-machine-learning-and-data-mining https://towardsdatascience.com/introduction-to-statistics-e9d72d818745
  3. Sources : https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/ https://twitter.com/josh_wills/status/198093512149958656?lang=en
  4. Sources: https://me.me/i/data-scientist-31-1-120-0-what-my-friends-think-15a983c0fbc54a91a76d8b25d1c5daaa
  5. Source : http://blog.fusemachines.com/data-scientist-sexiest-job-21st-century/
  6. Source : https://www.cnbc.com/2018/03/13/elon-musk-at-sxsw-a-i-is-more-dangerous-than-nuclear-weapons.html https://www.newyorker.com/magazine/2018/05/14/how-frightened-should-we-be-of-ai https://www.forbes.com/sites/forbestechcouncil/2017/12/04/why-we-should-be-afraid-of-intelligent-machines/#74fbc13f6be1
  7. Source : https://www.botxo.co/2018/09/03/our-take-on-the-gartner-hype-cycle/
  8. Source: https://ourworldindata.org/technological-progress
  9. Source: https://www.socialmediatoday.com/news/how-much-data-is-generated-every-minute-infographic-1/525692/ https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/#7cfa86f460ba https://blog.microfocus.com/how-much-data-is-created-on-the-internet-each-day/
  10. Source : https://blog.jedox.com/artificial-intelligence-business-intelligence-fpa-part-2/
  11. Source : https://blog.jedox.com/artificial-intelligence-business-intelligence-fpa-part-2/
  12. Source: Inspired by Scott Adams Talent stack idea from his book – Win Bigly https://www.amazon.com/Win-Bigly-Persuasion-World-Matter/dp/0735219710
  13. Sources: https://www.forbes.com/sites/bernardmarr/2018/03/12/forget-data-scientists-and-hire-a-data-translator-instead/#4b209212848a https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/analytics-translator https://sloanreview.mit.edu/article/why-your-company-needs-data-translators/