DEMYSTIFYING
DATA SCIENCE &
ANALYTICS
757ColorCoded Guillermo A. Fisher
Guillermo A. Fisher
Husband, father, software engineer,
data wrangler.
https://bklyn.dev
STUFF I DO
Data & Analytics Cloud Enablement Leader
Service Delivery, Project Sponsorship, Technical
Leadership, People Management
https://www.cloudreach.com/careers/
STUFF THAT’S HAPPENING
757ColorCoded
We exist to educate and empower local people of color to
achieve careers in technology and improve their lives.
May 25, 2019 at 2:00 PM: Build a Website, Part II: WordPress 101
Slover Library (Lower Level), 235 E Plume St, Norfolk, VA 23510
https://757colorcoded.org
https://757ColorCoded.slack.com
STUFF THAT’S HAPPENING
RevolutionConf
RevolutionConf is a two-day, platform- and
language-agnostic software development conference.
June 6-7, 2019
Wyndham Oceanfront, Virginia Beach, VA 23451
https://revolutionconf.com
https://diversity.revolutionconf.com
STUFF THAT’S HAPPENING
SQL Saturday
A day of Data Platform and SQL Server training for all levels. Admission
to this event is free; however, there is a $12 fee for lunch &
refreshments. Please register soon, as seating is limited.
June 8, 2019
ECPI University, 5555 Greenwich Road, Virginia Beach, VA 23462
https://www.sqlsaturday.com/839/EventHome.aspx
AGENDA
⊗ Some Definitions
⊗ The Big Data Problem
⊗ The Data Science Hierarchy of Needs
⊗ Data Science & Analytics Roles
SOME DEFINITIONS
What do all of those buzzwords mean?
Big data is a term used to refer
to data sets that are too large
or complex for traditional
data-processing application software
to adequately deal with.
https://en.wikipedia.org/wiki/Big_data
TERABYTES
⊗ 1 TB = 1,500 CD-ROMs
⊗ 2 TB = 130,000 digital photos
⊗ 10 TB = 1 year of data from the
Hubble Space Telescope
https://www.lifewire.com/terabytes-gigabytes-amp-petabytes-how-big-are-they-4125169
PETABYTES
⊗ 1 PB = 20,000,000 filing cabinets
⊗ 1 PB = 10,000 hours of TV
⊗ 2.5 PB = capacity of a human brain
https://www.makeuseof.com/tag/memory-sizes-gigabytes-terabytes-petabytes/
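To make the terabyte and petabyte comparisons concrete, here is a minimal sketch of the arithmetic behind them. It assumes decimal (SI) units, where 1 TB is 10^12 bytes; the helper names `to_bytes` and `implied_size` are made up for this illustration, and the item counts are taken straight from the slides.

```python
# Decimal (SI) storage units, in bytes. Binary units (TiB, PiB) differ slightly.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to bytes."""
    return amount * UNITS[unit]


def implied_size(total, unit, items, out_unit):
    """Size of one item, in out_unit, if `items` of them fill `total` `unit`."""
    return to_bytes(total, unit) / items / UNITS[out_unit]


# Figures from the slides: 1 TB = 1,500 CD-ROMs, 2 TB = 130,000 photos,
# 1 PB = 10,000 hours of TV.
cd_mb = implied_size(1, "TB", 1_500, "MB")       # about 667 MB per CD-ROM
photo_mb = implied_size(2, "TB", 130_000, "MB")  # about 15.4 MB per photo
tv_gb = implied_size(1, "PB", 10_000, "GB")      # 100 GB per hour of TV

print(f"CD-ROM: {cd_mb:.0f} MB, photo: {photo_mb:.1f} MB, TV hour: {tv_gb:.0f} GB")
```

Working backwards like this is a quick sanity check on any "X equals so many Y" comparison.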
Data analytics is the pursuit of
extracting meaning from raw
data using specialized computer
systems. These systems
transform, organize, and model
the data to draw conclusions
and identify patterns...
https://www.informatica.com/services-and-training/glossary-of-terms/data-analytics-definition.html
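The "transform, organize, and model" steps in that definition can be sketched in a few lines of Python. The order records below are invented for illustration, and a simple average stands in for the "model"; real analytics systems do the same kind of work at far larger scale.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: (region, order amount as text), as it might arrive in a log.
raw_orders = [
    ("east", "120.50"), ("west", "80.00"), ("east", "99.50"),
    ("west", "40.00"), ("north", "300.00"),
]

# Transform: parse the text amounts into numbers.
parsed = [(region, float(amount)) for region, amount in raw_orders]

# Organize: group the amounts by region.
by_region = defaultdict(list)
for region, amount in parsed:
    by_region[region].append(amount)

# Model: summarize each group with its average order value.
average_order = {region: mean(amounts) for region, amounts in by_region.items()}

# Draw a conclusion: which region has the highest average order?
top_region = max(average_order, key=average_order.get)
print(average_order, top_region)
```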
https://www.tableau.com/en-gb/products/dashboard-starters
Data science is an
interdisciplinary field that uses
scientific methods, processes,
algorithms and systems to
extract knowledge and insights
from data in various forms, both
structured and unstructured…
https://en.wikipedia.org/wiki/Data_science
https://towardsdatascience.com/data-science-interview-guide-4ee9f5dc778
Machine learning (ML) is the scientific
study of algorithms and statistical
models that computer systems use to
effectively perform a specific task
without using explicit instructions,
relying on patterns and inference
instead.
https://en.wikipedia.org/wiki/Machine_learning
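As a small illustration of "relying on patterns and inference" rather than explicit instructions, the sketch below never hard-codes the rule linking hours studied to exam score. It estimates the rule from examples using ordinary least squares, one of the simplest learning methods. The data and the names `fit_line` and `predict` are made up for this example.

```python
from statistics import mean

# Hypothetical training data: hours studied -> exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


# "Training": the program works out the pattern from the examples.
slope, intercept = fit_line(hours, scores)


def predict(x):
    """Apply the learned line to an input the program has not seen."""
    return slope * x + intercept


print(slope, intercept, predict(6.0))
```

Libraries such as TensorFlow and PyTorch follow the same fit-then-predict idea with far more flexible models.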
Artificial intelligence: the theory and
development of computer systems able
to perform tasks that normally require
human intelligence, such as visual
perception, speech recognition,
decision-making, and translation
between languages.
https://en.oxforddictionaries.com/definition/artificial_intelligence
THE BIG DATA PROBLEM
Why all the fuss?
The never-ending stream of
information is incredibly useful
for businesses, but it can also
be a challenge to draw relevant
insights from such a large data
pool.
https://marketinginsidergroup.com/strategy/big-data-trends-you-should-know-about-in-2018-infographic/
THE DATA SCIENCE
HIERARCHY OF NEEDS
When is an organization ready for Data Science
& Analytics?
https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
DATA SCIENCE &
ANALYTICS ROLES
What do Data teams look like?
COMMON SKILLS
Skills
⊗ Excellent written & verbal communication
⊗ Effective collaboration
⊗ Database knowledge (MySQL, PostgreSQL, Cassandra, MongoDB, Redis)
⊗ Proficiency in SQL
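For a sense of what day-to-day SQL looks like, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. SQLite is not one of the databases listed above; it is swapped in only because it needs no server. The `events` table and its rows are hypothetical.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, event) VALUES (?, ?)",
    [(1, "login"), (1, "purchase"), (2, "login"), (3, "login"), (3, "purchase")],
)

# How many distinct users triggered each kind of event?
rows = conn.execute(
    """
    SELECT event, COUNT(DISTINCT user_id) AS users
    FROM events
    GROUP BY event
    ORDER BY users DESC
    """
).fetchall()
conn.close()

print(rows)
```

The same `SELECT ... GROUP BY` query would run largely unchanged on MySQL or PostgreSQL.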
DATA ENGINEER
Responsibilities
⊗ Build and maintain architectures that
support big data systems such as
ETL pipelines.
⊗ Create applications and tools to
support data scientists and data
analysts.
⊗ Collaborate with data scientists to
build algorithms to derive meaning
from data sets.
Skills
⊗ Python, Java, Scala, C++
⊗ Spark, PySpark
⊗ Jupyter notebooks
⊗ Data warehousing
⊗ Data storage
⊗ ETL
⊗ Basic machine learning (TensorFlow,
PyTorch, MXNet)
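An ETL (extract, transform, load) pipeline can be sketched at toy scale with only the standard library. Everything here is an assumption made for illustration: the CSV contents, the `signups` table, and the cleaning rules. Production pipelines would typically use tools such as Spark and a real data warehouse.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw export with messy names, a missing plan, and a bad date.
RAW_CSV = """name,signup_date,plan
 Ada ,2019-05-01,pro
Grace,2019-05-03,
linus,not-a-date,free
"""


def extract(text):
    """Extract: read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, default the plan, drop rows with invalid dates."""
    clean = []
    for row in rows:
        date = row["signup_date"].strip()
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            continue  # skip rows whose date is not YYYY-MM-DD
        name = row["name"].strip().title()
        plan = row["plan"].strip() or "free"
        clean.append((name, date, plan))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signups (name TEXT, signup_date TEXT, plan TEXT)"
    )
    conn.executemany("INSERT INTO signups VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT name, signup_date, plan FROM signups ORDER BY name"
).fetchall()
print(loaded)
```

Keeping the three stages as separate functions makes each one easy to test and replace on its own.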
DATA ANALYST
Responsibilities
⊗ Collect and clean data from disparate
data sources for analysis.
⊗ Identify, analyze, and interpret data to
uncover patterns or trends.
⊗ Generate domain-specific reporting,
visualizations, and dashboards.
Skills
⊗ Business Intelligence & Data
Visualization Tools (Tableau, Power
BI, Qlik)
⊗ Microsoft Excel
⊗ Analytics Tools (Google Analytics,
Google Tag Manager, etc.)
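The analyst responsibilities above (collect and clean data from disparate sources, then uncover trends) might look like this in miniature. Both data sources are invented: a web export that stores numbers as text, and an app report that uses a different date format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical source 1: a web analytics export with numbers stored as text.
web_export = [
    {"month": "2019-01", "visits": "1,200"},
    {"month": "2019-02", "visits": "1,500"},
    {"month": "2019-03", "visits": "1,800"},
]
# Hypothetical source 2: a mobile app report with a different date format.
app_report = [("Jan 2019", 300), ("Feb 2019", 300), ("Mar 2019", 600)]

# Clean both sources into one shape: "YYYY-MM" -> total visits.
totals = Counter()
for row in web_export:
    totals[row["month"]] += int(row["visits"].replace(",", ""))
for label, visits in app_report:
    month = datetime.strptime(label, "%b %Y").strftime("%Y-%m")
    totals[month] += visits

# Trend: month-over-month growth as a fraction of the previous month.
months = sorted(totals)
growth = {
    cur: (totals[cur] - totals[prev]) / totals[prev]
    for prev, cur in zip(months, months[1:])
}

for month in months:
    change = f"{growth[month]:+.0%}" if month in growth else "n/a"
    print(month, totals[month], change)
```

In practice the final report would usually be a chart or dashboard in a tool such as Tableau or Power BI rather than printed text.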
DATA SCIENTIST
Responsibilities
⊗ Collect and clean data from disparate
data sources for analysis.
⊗ Train and deploy models to predict
outcomes.
⊗ Communicate findings to business
stakeholders.
Skills
⊗ R, Python, Java, Scala
⊗ Jupyter notebooks
⊗ Machine learning (TensorFlow,
PyTorch, MXNet)
⊗ Statistics, Linear Algebra
⊗ Business Intelligence & Data
Visualization Tools (Tableau, Power
BI, Qlik)
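"Train models to predict outcomes" can be illustrated with a deliberately simple classifier. The sketch below uses a nearest-centroid rule, chosen only because it fits in a few lines; the customer data, labels, and function names are hypothetical. It does not cover deployment.

```python
from math import dist
from statistics import mean

# Hypothetical labeled data: (monthly logins, support tickets) -> outcome.
train = [
    ((20, 1), "stayed"), ((25, 0), "stayed"), ((18, 2), "stayed"),
    ((3, 5), "churned"), ((5, 4), "churned"), ((2, 6), "churned"),
]
# Held-out examples the model never sees during training.
test = [((22, 1), "stayed"), ((4, 5), "churned")]


def fit(examples):
    """Train: compute the mean feature vector (centroid) for each label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Predict the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: dist(model[label], features))


model = fit(train)

# Evaluate on the held-out data before trusting any prediction.
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
print(accuracy, predict(model, (6, 3)))
```

Checking accuracy on data held out from training is the habit that matters here, whatever model is used.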
GET STARTED
Courses
⊗ Kaggle: Your Home for Data Science
⊗ Coursera
⊗ Udemy
⊗ A Cloud Guru ($29/month)
Certifications
⊗ Microsoft Professional Program in Big Data* ($99)
⊗ AWS Certified Cloud Practitioner ($100)
⊗ Google Cloud Certified Associate Cloud Engineer ($125)
THANKS!
Any questions?
https://bklyn.dev
Demystifying Data Science & Analytics - 757ColorCoded 2019
