Tools for Data Science
Vadim Y. Bichutskiy
@vybstat
Data Science Seminar, GMU
April 10, 2015
So you want to be a data scientist?
—  Good news
    —  Data is everywhere
    —  “Big Data”, “Analytics”, and “Data Science” are changing the world
    —  Hot and sexy
    —  Lots of opportunity to get creative and innovate
    —  Many open problems
    —  Fun!
    —  Demand is off the charts / low supply
    —  High salaries
—  Bad news
    —  Requires lots of education: PhD is NOT enough
    —  Can be overwhelming and stressful
    —  Theory, practical tools, experience
    —  Long working hours
    —  Not enough sleep
    —  Bad for health?
    —  Versatile, flexible, curious
    —  Continuous training
https://www.whitehouse.gov/blog/2015/02/18/white-house-names-dr-dj-patil-first-us-chief-data-scientist
What’s Data Science?
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
O'Neil, Cathy and Schutt, Rachel, Doing Data Science: Straight Talk from the Frontline, O’Reilly, 2014
Data scientists: “Create order from chaos”
Statistics courses
Data collection, processing, cleaning is 80% of the effort
O'Neil, Cathy and Schutt, Rachel, Doing Data Science: Straight Talk from the Frontline, O’Reilly, 2014
Stats/CSI PhD
O'Neil, Cathy and Schutt, Rachel, Doing Data Science: Straight Talk from the Frontline, O’Reilly, 2014
“Data science is a team sport” --DJ Patil
O'Neil, Cathy and Schutt, Rachel, Doing Data Science: Straight Talk from the Frontline, O’Reilly, 2014
http://www.datasciencecentral.com/profiles/blogs/what-technology-tool-skills-do-data-scientists-jobs-require
Data Science Skills
—  Core
    —  R
    —  Python
    —  SQL/NoSQL/database concepts
    —  Unix command line
    —  Statistics/machine learning/CS
    —  Graph Theory/Networks
    —  Data visualization/dashboards: Tableau, D3
    —  Data representation: JSON, XML
    —  Communication, domain expertise
—  Project/position dependent
    —  Java/C++
    —  Amazon Web Services/Cloud computing
    —  Hadoop
    —  JavaScript/PHP/Web frameworks
—  Emerging
    —  Scala
    —  Swift
    —  Spark/Cluster computing
    —  Real-time Analytics
    —  Docker, Vagrant
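To make a few of the core skills concrete, here is a minimal Python sketch that touches three of them at once: parsing JSON, a small cleaning step, and a SQL aggregate. It is an illustration only, not part of the original talk. The sample records, the field names (skill, tier, hours) and the helper names load_records and hours_by_tier are all made up for this example. Only the Python standard library is used.

```python
import json
import sqlite3

# Hypothetical sample data: names and numbers are invented for illustration.
RAW_JSON = """
[
  {"skill": "R",      "tier": "core",     "hours": 40},
  {"skill": "Python", "tier": "core",     "hours": 44},
  {"skill": "Scala",  "tier": "emerging", "hours": null},
  {"skill": "Spark",  "tier": "emerging", "hours": 12}
]
"""


def load_records(raw):
    """Parse JSON text and drop records whose 'hours' value is missing."""
    records = json.loads(raw)
    return [r for r in records if r.get("hours") is not None]


def hours_by_tier(records):
    """Load records into an in-memory SQLite table and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE skills (skill TEXT, tier TEXT, hours INTEGER)")
    # Named placeholders are filled from the keys of each record dict.
    conn.executemany(
        "INSERT INTO skills VALUES (:skill, :tier, :hours)", records
    )
    rows = conn.execute(
        "SELECT tier, COUNT(*), SUM(hours) FROM skills "
        "GROUP BY tier ORDER BY tier"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for tier, n_skills, total_hours in hours_by_tier(load_records(RAW_JSON)):
        print(tier, n_skills, total_hours)
```

The record with a null value is filtered out before it reaches the database, which is a tiny instance of the cleaning work that tends to dominate real projects.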
Tools Usage
http://www.oreilly.com/data/free/2014-data-science-salary-survey.csp
Tool Salaries
http://www.oreilly.com/data/free/2014-data-science-salary-survey.csp
“Microsoft-Excel-SQL”
“Hadoop-Java-Cloud Computing”
“R-Python-Analytics”
“MySQL-D3-JavaScript”
“Old tools”
Amazon MLaaS
http://aws.amazon.com/blogs/aws/amazon-machine-learning-make-data-driven-decisions-at-scale/
Resources (1)
—  R
    —  http://www.r-project.org/
    —  http://www.rstudio.com/
—  Python
    —  https://www.python.org/
—  JSON
    —  http://json.org/
—  Amazon Web Services
    —  http://aws.amazon.com/
    —  http://aws.amazon.com/blogs/aws/
—  Hadoop
    —  https://hadoop.apache.org/
Resources (2)
—  Scala
    —  http://www.scala-lang.org/
—  Spark
    —  https://spark.apache.org/
    —  https://spark.apache.org/docs/latest/
—  Docker
    —  https://www.docker.com/
—  Vagrant
    —  https://www.vagrantup.com/
—  Swift
    —  apple.co/1CAAKQA
