The document provides an overview of data science including definitions, careers, applications and tools. It defines data science, describes the typical steps in a data science project including understanding the problem, acquiring and preparing data, analyzing data, modeling data, visualizing results and deploying solutions. It also discusses careers in data engineering, data analysis, machine learning engineering and as a data scientist. Finally, it covers popular tools and frameworks used in data science like Anaconda, Jupyter Notebooks and examples of data science applications.
2. Agenda
1. Definition of Data Science
2. Careers in Data Science
3. AI vs ML vs DL vs DS
4. Applications
5. DS Tools
3. Definition
What are the steps in a Data Science project?
- Define the question we want to answer and the objectives we want to achieve: define the business problem
- Which data are relevant (too much data is waste), and how can we collect them?
- Exploratory analysis and processing: treating and cleaning the data
- Allow people without prior knowledge to understand the data (charts?)
- Data modeling (AI, machine learning)
- Communicate the findings (visualization through dashboards)
- Deploy: software implementation, automating our solution
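The cleaning and exploratory steps listed above can be illustrated with a minimal sketch. The records, the field names (`city`, `rent`) and the helper functions are hypothetical and exist only for illustration; a real project would typically rely on a dedicated data library rather than plain Python.

```python
import statistics

# Hypothetical raw records, as they might arrive from a collection step.
raw_rows = [
    {"city": "Recife ", "rent": "1200"},
    {"city": "recife", "rent": "1500"},
    {"city": "Natal", "rent": ""},       # missing value
    {"city": "Natal", "rent": "900"},
    {"city": "natal", "rent": "n/a"},    # unparseable value
]


def clean(rows):
    """Normalize the city name and drop rows whose rent cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            rent = float(row["rent"])
        except ValueError:
            continue  # skip missing or malformed values
        cleaned.append({"city": row["city"].strip().title(), "rent": rent})
    return cleaned


def summarize(rows):
    """Return the mean rent per city as a simple exploratory statistic."""
    by_city = {}
    for row in rows:
        by_city.setdefault(row["city"], []).append(row["rent"])
    return {city: statistics.mean(values) for city, values in by_city.items()}


clean_rows = clean(raw_rows)
summary = summarize(clean_rows)
print(summary)
```

Dropping unparseable rows is only one possible policy; depending on the business problem, imputing a value or flagging the row may be more appropriate.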
10. Definition
1. Understand a Business Problem
• Ask questions and define objectives
2. Data Acquisition
• Web scraping
• APIs
• Direct download from URLs
• Surveys and crowdworkers (Amazon Mechanical Turk)
3. Data Preparation
• Cleaning
• Transformation
• Very time-consuming
4. Exploratory Data Analysis
• Visualization
• Statistics
• Distributions
• Feature selection
5. Data Modeling
• Machine learning
• Train a model that best fits the business requirements
• Prediction models
6. Visualization and Communication
• Documentation
• Talk back to the business
• Reports and dashboards
7. Deploy and Maintenance
• Production environment
• Real-time analytics
• Maintenance of project performance
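As a rough illustration of the Data Modeling step (training a prediction model), the sketch below fits a straight line by ordinary least squares and checks it on held-out data. The data (apartment size versus rent), the function names and the train/test split are assumptions made for the example; they are not taken from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Average absolute gap between predictions and observed values."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: apartment size (square meters) and monthly rent.
sizes = [30, 45, 50, 60, 75, 80, 100]
rents = [900, 1300, 1400, 1700, 2100, 2200, 2800]

# Train on the first five examples, evaluate on the remaining two.
train_x, test_x = sizes[:5], sizes[5:]
train_y, test_y = rents[:5], rents[5:]

model = fit_line(train_x, train_y)
error = mean_absolute_error(model, test_x, test_y)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, test MAE={error:.2f}")
```

Evaluating on data the model did not see during training is what links the fitted model back to the business requirement: the held-out error estimates how the predictions would behave on new cases.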
14. Amount of Code
"The ML code is at the heart of a real-world ML production system, but that box often represents only 5% or less of the overall code of that total ML production system."
Source: https://developers.google.com/machine-learning/crash-course/production-ml-systems
16. Careers in Data Science
• Data Engineer
• Data Analyst/Business Intelligence
• ML/DL Engineer
• Data Scientist
18. Definition: Data Engineer
19. Definition: Data Analyst
20. Definition: ML/DL Engineer
21. Definition: Data Scientist
22. Careers in Data Science
• Data Engineer + Big Data
• Data Analyst/Business Intelligence + Big Data
• ML/DL Engineer + Big Data
• Data Scientist + Big Data
23. Definition
Big Data
• Data too large for a spreadsheet
• Requires alternative data-processing software
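One way to picture why such data needs a different approach is to process it as a stream, one record at a time, instead of loading everything at once. The sketch below is only an illustration of that idea using Python's standard `csv` module; the column names and the in-memory stand-in for a large file are assumptions, and it is not a substitute for dedicated big data tooling.

```python
import csv
import io


def running_mean(rows, column):
    """Compute the mean of one column while holding a single row in memory."""
    count = 0
    total = 0.0
    for row in rows:
        total += float(row[column])
        count += 1
    return total / count if count else None


# Stand-in for a file too large to open in a spreadsheet. In practice this
# would be a file object opened from disk rather than an in-memory string.
fake_file = io.StringIO("trip_id,fare\n1,10.0\n2,14.0\n3,18.0\n")

# DictReader yields rows lazily, so the whole file is never loaded at once.
reader = csv.DictReader(fake_file)
mean_fare = running_mean(reader, "fare")
print(mean_fare)
```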
29. Data Hackers survey on the Data Science job market
• Dataset
• https://www.kaggle.com/datahackers/pesquisa-data-hackers-2019
• Detailed analysis
• https://www.kaggle.com/jsaguiar/brazilian-ds-market
46. Applications
ML applications
• Uber – Dynamic pricing
• Airbnb – Dynamic pricing
• YouTube – Next-video recommendation
• Quinto Andar – Rent calculator
• Nubank – Credit analysis
47. Applications
DL applications
• Google – Google Translate
• Android – Voice search
• Facebook – Facial recognition
• SkinVision App – Skin cancer
• My Heritage – Photo colorization
58. Activity Objective
• Get familiar with a remote data science development environment: Google Colab
• Import a utility for image colorization
• First steps with Python