Tools for Data Science
Vadim Y. Bichutskiy
@vybstat
Data Science Seminar, GMU
April 10, 2015
So you want to be a data scientist?
—  Good news
    —  Data is everywhere
    —  “Big Data”, “Analytics”, and “Data Science” are changing the world
    —  Hot and sexy
    —  Lots of opportunity to get creative and innovate
    —  Many open problems
    —  Fun!
    —  Demand is off the charts / low supply
    —  High salaries
—  Bad news
    —  Requires lots of education: a PhD is NOT enough
    —  Can be overwhelming and stressful
    —  Theory, practical tools, experience
    —  Long working hours
    —  Not enough sleep
    —  Bad for health?
    —  Versatile, flexible, curious
    —  Continuous training
https://www.whitehouse.gov/blog/2015/02/18/white-house-names-dr-dj-patil-first-us-chief-data-scientist
What’s Data Science?
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
O'Neil, Cathy and Schutt, Rachel, Doing Data Science: Straight Talk from the Frontline, O’Reilly, 2014
Data scientists: “Create order from chaos”
Statistics courses
Data collection, processing, cleaning is 80% of the effort
O'Neil, Cathy and Schutt, Rachel, Doing Data Science: Straight Talk from the Frontline, O’Reilly, 2014
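The 80% figure refers to the unglamorous work of getting raw data into usable shape. As a rough illustration of what "cleaning" involves, here is a minimal Python sketch using only the standard library: it trims stray whitespace, maps common missing-value markers to `None`, normalizes capitalization, and converts types. The sample rows, the `MISSING` marker set, and the helper names `clean_value` and `clean_rows` are hypothetical; in practice the same steps are usually done with R data frames or pandas.

```python
import csv
import io

# Hypothetical raw export: inconsistent spacing, case, and missing-value markers.
RAW = """name,age,city
 Alice ,34,fairfax
Bob,,ARLINGTON
carol,N/A, Fairfax 
"""

# Assumed set of strings that should be treated as "no value".
MISSING = {"", "na", "n/a", "null"}


def clean_value(value):
    """Trim whitespace and map common missing-value markers to None."""
    value = value.strip()
    return None if value.lower() in MISSING else value


def clean_rows(text):
    """Parse CSV text and return a list of cleaned dictionaries."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {key.strip(): clean_value(val) for key, val in record.items()}
        # Normalize capitalization of text fields that are present.
        for field in ("name", "city"):
            if row[field] is not None:
                row[field] = row[field].title()
        # Convert the age column from text to an integer when present.
        if row["age"] is not None:
            row["age"] = int(row["age"])
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in clean_rows(RAW):
        print(row)
```

Each rule is a judgment call (for example, whether "N/A" really means missing), which is part of why this stage takes so much of the effort.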
Stats/CSI PhD
O'Neil, Cathy and Schutt, Rachel, Doing Data Science: Straight Talk from the Frontline, O’Reilly, 2014
“Data science is a team sport” --DJ Patil
O'Neil, Cathy and Schutt, Rachel, Doing Data Science: Straight Talk from the Frontline, O’Reilly, 2014
http://www.datasciencecentral.com/profiles/blogs/what-technology-tool-skills-do-data-scientists-jobs-require
Data Science Skills
—  Core
    —  R
    —  Python
    —  SQL/NoSQL/database concepts
    —  Unix command line
    —  Statistics/machine learning/CS
    —  Graph Theory/Networks
    —  Data visualization/dashboards: Tableau, D3
    —  Data representation: JSON, XML
    —  Communication, domain expertise
—  Project/position dependent
    —  Java/C++
    —  Amazon Web Services/Cloud computing
    —  Hadoop
    —  JavaScript/PHP/Web frameworks
—  Emerging
    —  Scala
    —  Swift
    —  Spark/Cluster computing
    —  Real-time Analytics
    —  Docker, Vagrant
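To show what the "Data representation: JSON, XML" skill involves, here is a small sketch that parses the same hypothetical record from both formats with Python's standard `json` and `xml.etree.ElementTree` modules. The record contents and the `from_json`/`from_xml` helpers are illustrative assumptions, not material from the talk.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in two common representations.
JSON_TEXT = '{"talk": {"title": "Tools for Data Science", "year": 2015, "tags": ["R", "Python", "SQL"]}}'

XML_TEXT = """<talk year="2015">
  <title>Tools for Data Science</title>
  <tags><tag>R</tag><tag>Python</tag><tag>SQL</tag></tags>
</talk>"""


def from_json(text):
    """JSON maps directly onto Python dicts, lists, strings, and numbers."""
    talk = json.loads(text)["talk"]
    return {"title": talk["title"], "year": talk["year"], "tags": talk["tags"]}


def from_xml(text):
    """XML is a tree of elements; types must be converted by hand."""
    root = ET.fromstring(text)
    return {
        "title": root.findtext("title"),
        "year": int(root.get("year")),  # attributes are always strings
        "tags": [tag.text for tag in root.iter("tag")],
    }


if __name__ == "__main__":
    print(from_json(JSON_TEXT) == from_xml(XML_TEXT))
```

JSON values arrive already typed (the year is a number), while XML attributes and element text come back as strings that must be converted explicitly.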
Tools Usage
http://www.oreilly.com/data/free/2014-data-science-salary-survey.csp
Tool Salaries
http://www.oreilly.com/data/free/2014-data-science-salary-survey.csp
“Microsoft-Excel-SQL”
“Hadoop-Java-Cloud Computing”
“R-Python-Analytics”
“MySQL-D3-JavaScript”
“Old tools”
Amazon MLaaS
http://aws.amazon.com/blogs/aws/amazon-machine-learning-make-data-driven-decisions-at-scale/
Resources (1)
—  R
    —  http://www.r-project.org/
    —  http://www.rstudio.com/
—  Python
    —  https://www.python.org/
—  JSON
    —  http://json.org/
—  Amazon Web Services
    —  http://aws.amazon.com/
    —  http://aws.amazon.com/blogs/aws/
—  Hadoop
    —  https://hadoop.apache.org/
Resources (2)
—  Scala
    —  http://www.scala-lang.org/
—  Spark
    —  https://spark.apache.org/
    —  https://spark.apache.org/docs/latest/
—  Docker
    —  https://www.docker.com/
—  Vagrant
    —  https://www.vagrantup.com/
—  Swift
    —  apple.co/1CAAKQA
