15. Data Science Job Roles
Data scientists: Design data modeling processes to create algorithms and
predictive models and perform custom analysis.
Data analysts: Manipulate large data sets and use them to identify trends
and reach meaningful conclusions to inform strategic business decisions.
Data engineers: Clean, aggregate, and organize data from disparate
sources and transfer it to data warehouses.
Business intelligence specialists: Identify trends in data sets.
Data architects: Design, create, and manage an organization’s data
architecture.
25. OSEMN
O — Obtaining our data
S — Scrubbing / Cleaning our data
E — Exploring / Visualizing our data will allow us to find patterns and
trends
M — Modeling our data will give us our predictive power as a wizard
N — Interpreting our data
26. Business Questions
1. How can we translate data into dollars?
2. What impact do I want to make with this data?
3. What business value does our model bring to the table?
4. What will save us lots of money?
5. What can be done to make our business run more efficiently?
27. Obtain Your Data
As a rule of thumb, there are a few things to take into consideration
when obtaining your data. Identify all of your available datasets (which
can come from the internet or from external/internal databases), then
extract the data into a usable format (.csv, .json, .xml, etc.).
Skills Required:
1. Database Management: MySQL, PostgreSQL, MongoDB
2. Querying Relational Databases
3. Retrieving Unstructured Data: text, videos, audio files, documents
4. Distributed Storage: Hadoop, Apache Spark/Flink
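
To make the "extract the data into a usable format" step concrete, here is a minimal sketch that pulls records from three kinds of sources (CSV text, JSON text, and a relational table) into one list of dictionaries. Everything in it is an assumption made for illustration: the sample data, the customers table, and the helper names (load_csv, load_json, load_sql, obtain_all) are hypothetical, and an in-memory SQLite database stands in for a real MySQL or PostgreSQL server.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs standing in for downloaded files or API responses.
CSV_TEXT = "customer_id,region,spend\n1,north,120.5\n2,south,80.0\n"
JSON_TEXT = '[{"customer_id": 3, "region": "east", "spend": 42.0}]'


def load_csv(text):
    """Parse CSV text into a list of dicts, converting fields to real types."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "customer_id": int(row["customer_id"]),
            "region": row["region"],
            "spend": float(row["spend"]),
        })
    return rows


def load_json(text):
    """Parse a JSON array of records into a list of dicts."""
    return json.loads(text)


def build_demo_db():
    """Create an in-memory SQLite database with one hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER, region TEXT, spend REAL)"
    )
    conn.execute("INSERT INTO customers VALUES (4, 'west', 15.25)")
    conn.commit()
    return conn


def load_sql(conn):
    """Query the relational table and return its rows as dicts."""
    conn.row_factory = sqlite3.Row
    cursor = conn.execute(
        "SELECT customer_id, region, spend FROM customers ORDER BY customer_id"
    )
    return [dict(row) for row in cursor]


def obtain_all():
    """Combine every available source into one list of records."""
    conn = build_demo_db()
    try:
        return load_csv(CSV_TEXT) + load_json(JSON_TEXT) + load_sql(conn)
    finally:
        conn.close()


records = obtain_all()
for record in records:
    print(record)
```

Converting every source to the same record shape up front keeps the later scrubbing and exploring steps independent of where each row came from.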
28. “Good data science is more about the questions you pose of the data
rather than data munging and analysis”
— Riley Newman
29. Scrubbing / Cleaning Your Data
This phase of the pipeline should require the most time and effort,
because the results and output of your machine learning model are only
as good as what you put into it. Basically, garbage in, garbage out.
30. Scrubbing / Cleaning Your Data
Objective:
1. Examine the data: understand every feature you’re working with, identify
errors, missing values, and corrupt records
2. Clean the data: throw away, replace, and/or fill missing values/errors
Skills Required:
1. Scripting language: Python, R, SAS
2. Data Wrangling Tools: Python Pandas, R
3. Distributed Processing: Hadoop MapReduce, Spark
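
The two objectives above (examine, then clean) can be sketched in a few lines. This is only an illustration under stated assumptions: the RAW records, the rule that a negative age is an error, and the choice to fill bad ages with the median are all hypothetical, and plain Python is used instead of Pandas so the example runs without extra packages.

```python
from statistics import median

# Hypothetical raw records: None marks a missing value, and a negative
# age is assumed to be a data-entry error.
RAW = [
    {"id": 1, "age": 34, "city": "Boston"},
    {"id": 2, "age": None, "city": "boston "},
    {"id": 3, "age": -5, "city": "Chicago"},
    {"id": 4, "age": 51, "city": None},
    {"id": 5, "age": 29, "city": "CHICAGO"},
]


def examine(rows):
    """Count missing and invalid values so we know what needs cleaning."""
    report = {"missing_age": 0, "invalid_age": 0, "missing_city": 0}
    for row in rows:
        if row["age"] is None:
            report["missing_age"] += 1
        elif row["age"] < 0:
            report["invalid_age"] += 1
        if row["city"] is None:
            report["missing_city"] += 1
    return report


def clean(rows):
    """Throw away rows without a city, fill bad ages, normalize city text."""
    valid_ages = [
        r["age"] for r in rows if r["age"] is not None and r["age"] >= 0
    ]
    fill_age = median(valid_ages)  # replacement for missing or invalid ages
    cleaned = []
    for row in rows:
        if row["city"] is None:
            continue  # throw away: no sensible default for a missing city
        age = row["age"]
        if age is None or age < 0:
            age = fill_age  # replace or fill
        cleaned.append({
            "id": row["id"],
            "age": age,
            "city": row["city"].strip().title(),  # "boston " becomes "Boston"
        })
    return cleaned


report = examine(RAW)
cleaned = clean(RAW)
print(report)
for row in cleaned:
    print(row)
```

Whether to drop, replace, or fill a value depends on the feature and the business question; the median fill here is one common choice, not a rule.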
31. Exploring (Exploratory Data Analysis)
Understand your data through visualizations and statistical testing.
Objective:
1. Find patterns in your data through visualizations and charts
2. Extract features by using statistics to identify and test significant variables
Skills Required:
1. Python: NumPy, Matplotlib, pandas, SciPy
2. R: ggplot2, dplyr
3. Inferential statistics
4. Experimental Design
5. Data Visualization
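
As one way to "use statistics to identify and test significant variables", the sketch below measures how strongly two variables move together (Pearson correlation) and then uses a permutation test to estimate how often a relationship that strong would appear by chance. It covers only the statistical-testing side of this step, not the charts. The AD_SPEND and SALES numbers are made up, the function names are hypothetical, and the standard library is used in place of NumPy or SciPy so the example is self-contained.

```python
import random
from statistics import mean

# Hypothetical data: weekly ad spend and the sales recorded that week.
AD_SPEND = [10, 20, 30, 40, 50, 60, 70, 80]
SALES = [25, 41, 58, 85, 96, 125, 138, 160]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate a two-sided p-value by shuffling ys to break any real link."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


r = pearson_r(AD_SPEND, SALES)
p_value = permutation_p_value(AD_SPEND, SALES)
print(f"correlation: {r:.3f}")
print(f"permutation p-value: {p_value:.4f}")
```

A small p-value suggests the variable is worth carrying into the modeling step, though correlation on its own does not show that ad spend causes sales.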