Talk for PyCon 2017 titled "Doing Data Science the Industry Way! From Data Pulling to Cleaning to Data Ingestion to Making Predictions Using Python"
1. Doing Data Science the Industry Way! From Data Pulling to Cleaning to Data Ingestion to Making Predictions Using Python
2. What's Data Science and Why Do It?
The science of making sense out of data.
It gives us insight into how we are doing.
A new vertical of research: data-driven research!
3. How Does Python Fit In?
Python is used in every step of data science, from pulling to predictions.
Python makes life easy for us.
4. Data Science Lifecycle, Step 1: The Pulling
The foremost step in data science.
It defines the base on which the rest of the data science work is done.
Python helps us pull data from various sources.
5. Pulling Contd.
Python can do pulling tasks like:
Scraping the web.
Pulling from API calls.
Querying databases.
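The three pulling tasks above can be sketched with the standard library alone. This is a minimal illustration, not the talk's own code: the link-scraping parser, the `orders` table, and the function names are hypothetical, and the API helper is shown but not called in the demo because it needs network access.

```python
import json
import sqlite3
import urllib.request
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collects the href of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def scrape_links(html):
    """Scraping: pull every link target out of an HTML page."""
    parser = LinkScraper()
    parser.feed(html)
    return parser.links


def pull_from_api(url):
    """API call: GET a JSON endpoint and decode it (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def pull_from_database(conn, min_amount):
    """Database pull: fetch matching rows with a parameterised SQL query."""
    cursor = conn.execute(
        "SELECT id, amount FROM orders WHERE amount >= ? ORDER BY id",
        (min_amount,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    page = '<p><a href="/a">A</a> <a href="/b">B</a></p>'
    print(scrape_links(page))  # ['/a', '/b']

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 5.0), (2, 20.0), (3, 12.5)])
    print(pull_from_database(conn, 10))  # [(2, 20.0), (3, 12.5)]
```

In a real project a dedicated HTTP or scraping library might replace `urllib` and `html.parser`; the stdlib versions keep the sketch dependency-free.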
7. Step 2: The Cleaning, and Why Cleaning Is Done
The second step in data science is cleaning the data.
Raw data is rarely fit to be stored or used as-is.
Data mostly arrives from its sources uncleaned.
8. Cleaning with Python
Python plays a major role in this step.
Python is widely used for data cleaning.
We can clean data however we need using Python.
Python scripts can work through millions of records in a matter of minutes, depending on the workload.
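A typical cleaning pass can be sketched as a single function over a list of records. This is only an illustration under assumed inputs: the `id` and `email` fields are hypothetical, and the rules shown (trim whitespace, drop incomplete rows, normalise types and casing, de-duplicate) are common examples rather than a complete cleaning policy.

```python
# Hypothetical fields that every cleaned record must have.
REQUIRED_FIELDS = ("id", "email")


def clean_records(raw_records):
    """Return cleaned copies of raw_records: trimmed, complete, normalised, de-duplicated."""
    cleaned = []
    seen_ids = set()
    for record in raw_records:
        # Trim whitespace from string fields and turn empty strings into None.
        row = {}
        for key, value in record.items():
            if isinstance(value, str):
                value = value.strip() or None
            row[key] = value
        # Drop records that are missing any required field.
        if any(row.get(field) is None for field in REQUIRED_FIELDS):
            continue
        # Normalise types and casing so equal values compare equal.
        row["id"] = int(row["id"])
        row["email"] = row["email"].lower()
        # Keep only the first record seen for each id.
        if row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])
        cleaned.append(row)
    return cleaned


raw = [
    {"id": " 1 ", "email": " Ana@Example.com "},
    {"id": "2", "email": ""},                 # missing email: dropped
    {"id": "1", "email": "ana@example.com"},  # duplicate id: dropped
    {"id": "3", "email": "bo@example.com"},
]
print(clean_records(raw))
# [{'id': 1, 'email': 'ana@example.com'}, {'id': 3, 'email': 'bo@example.com'}]
```

For very large datasets a columnar library such as pandas is often used for the same steps; the plain-Python version just makes each rule explicit.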
10. Step 3: Data Structuring
After the data has been cleaned, and before it is pushed to its destination, it needs to be organised well.
Well-organised data makes it easier to discover what is in the data.
11. Structuring with Python
Python can help in this step too!
Python can manipulate JSON data and reshape it into whatever structure is needed.
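As an example of reshaping JSON, the sketch below turns a flat array of order rows into one nested document per customer. The field names (`customer`, `order_id`, `amount`) and the target shape are assumptions chosen for illustration; the right structure always depends on where the data is going next.

```python
import json
from collections import defaultdict


def structure_orders(flat_json):
    """Reshape a flat JSON array of order rows into one document per customer."""
    rows = json.loads(flat_json)
    # Group the order rows under their customer.
    by_customer = defaultdict(list)
    for row in rows:
        by_customer[row["customer"]].append(
            {"order_id": row["order_id"], "amount": row["amount"]}
        )
    # Build one nested document per customer, sorted by customer name.
    structured = [
        {
            "customer": customer,
            "orders": orders,
            "total": round(sum(order["amount"] for order in orders), 2),
        }
        for customer, orders in sorted(by_customer.items())
    ]
    return json.dumps(structured, indent=2)


flat = json.dumps([
    {"customer": "b", "order_id": 1, "amount": 10.0},
    {"customer": "a", "order_id": 2, "amount": 5.5},
    {"customer": "b", "order_id": 3, "amount": 2.25},
])
print(structure_orders(flat))
```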
13. The Data Warehouse
A large store of data accumulated from a wide range of sources within a company and used to guide management decisions.
All the pulling, cleaning and structuring leads here.
The treasure chest of key hidden insights in the data.
The place to measure the span of your wings.
14. Data Ingestion
The process of pushing data into the right destination is called data ingestion.
The destination needs to have the correct structure before ingestion is done.
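Before pushing data, it helps to check that each record matches the structure the destination expects. The sketch below assumes a hypothetical destination schema, a mapping from column name to Python type, and splits incoming records into those ready for ingestion and those to reject; a real warehouse would usually enforce its own schema as well.

```python
# Hypothetical destination schema: column name -> expected Python type.
DESTINATION_SCHEMA = {"id": int, "email": str, "amount": float}


def split_for_ingestion(records, schema=DESTINATION_SCHEMA):
    """Separate records that fit the destination schema from those that do not."""
    ready, rejected = [], []
    for record in records:
        # A record fits if it has exactly the schema's columns, each of the right type.
        fits = set(record) == set(schema) and all(
            isinstance(record[column], expected) for column, expected in schema.items()
        )
        (ready if fits else rejected).append(record)
    return ready, rejected


ready, rejected = split_for_ingestion([
    {"id": 1, "email": "a@example.com", "amount": 2.0},
    {"id": "2", "email": "b@example.com", "amount": 1.0},  # id has the wrong type
    {"id": 3, "email": "c@example.com"},                   # amount column missing
])
print(len(ready), len(rejected))  # 1 2
```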
15. Warehousing and Ingestion with Python
Python makes our life easy in this step too.
Python helps us build the data flows that lead to the warehouse.
Because Python can script the whole job, from pulling to flowing, in one language, the pipeline can be built and run quickly.
Python can also set up the structure of the destination (the warehouse) before data is ingested into it.
16. Contd.
Python fills the gap between the data and the destination (the warehouse), making the two compatible with each other.
Python acts as the carrier from start to finish.
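A minimal ingestion step can be sketched with `sqlite3` standing in for the warehouse, which is an assumption made only so the example runs anywhere. It first structures the destination (creating the table if needed) and then pushes all records in one transaction, so a bad record does not leave a half-loaded batch. The `orders` table and its columns are hypothetical.

```python
import sqlite3


def ingest(conn, records):
    """Structure the destination table, then push records into it in one transaction."""
    # Structure the destination before ingesting (no-op if the table already exists).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    # 'with conn' commits on success and rolls the whole batch back if any insert fails.
    with conn:
        conn.executemany(
            "INSERT INTO orders (order_id, customer, amount) "
            "VALUES (:order_id, :customer, :amount)",
            records,
        )
    # Return how many rows the destination now holds.
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(ingest(conn, [
    {"order_id": 1, "customer": "a", "amount": 5.0},
    {"order_id": 2, "customer": "b", "amount": 7.5},
]))  # 2
```

Production warehouses usually offer bulk-load paths that are faster than row inserts, but the shape of the job (prepare the destination, then load atomically) stays the same.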
18. Predictive Modeling! The Fruit Finally Ripens
The process of making predictions from millions of records of processed data using machine learning models is called predictive modeling.
All the steps above are done to achieve this.
19. Predictive Modeling Contd.
Machine learning is used in this step.
The large chunks of data collected earlier are now put to use.
20. Predictive Modeling Using Python
Python is the heart of this data science step!
All the machine learning models are written in this step, in Python, and they can even be written from scratch.
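To show what "from scratch" can mean, here is a minimal sketch of one of the simplest predictive models, ordinary least-squares linear regression on a single feature, written in plain Python. It is an illustrative choice of model, not the one used in the talk; the function names are hypothetical.

```python
def fit_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (closed form)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("x values must not all be equal")
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, xs):
    """Apply a fitted (slope, intercept) model to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


model = fit_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(model)                # (2.0, 1.0)
print(predict(model, [5]))  # [11.0]
```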
21. Python Takes Full Charge of Predictive Modeling
Python does everything, from developing models and fine-tuning parameters to visualizing the results.
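Fine-tuning parameters often means trying several settings and keeping the one that predicts best on data the model was not trained on. The sketch below does this for one assumed example: a single-feature ridge regression whose penalty `lam` shrinks the slope, chosen by grid search on a hold-out validation set. The candidate values and function names are hypothetical.

```python
def fit_ridge_1d(xs, ys, lam):
    """Fit y = slope * x + intercept, shrinking the slope with an L2 penalty lam >= 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x + lam == 0:
        raise ValueError("need varying x values or a positive penalty")
    # Minimising the squared error plus lam * slope**2 gives this closed form.
    slope = cov_xy / (var_x + lam)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given data."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def tune_lambda(train, valid, candidates=(0.0, 0.1, 1.0, 10.0)):
    """Grid search: fit on train for each penalty, keep the lowest validation error."""
    results = {}
    for lam in candidates:
        model = fit_ridge_1d(*train, lam)
        results[lam] = mse(model, *valid)
    best = min(results, key=results.get)
    return best, results


best, results = tune_lambda(([1, 2, 3, 4], [3, 5, 7, 9]), ([5, 6], [11, 13]))
print(best)  # 0.0
```

With this noiseless toy data the unpenalised fit wins; on noisy real data a positive penalty can generalise better, which is exactly what the validation split is there to detect.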