Machine Learning Using Python
Instructor: Shubham Sharma
Outline
• Some Training Pics
• Why learn Python for Machine Learning?
• Python Libraries For Machine Learning
• Machine Learning key phases
• Series and Data-frames
• Case Studies: Loan Prediction Problem
• Python/Predictive model in data analytics
• Main Resources
Some Training pics
Why learn Python for Machine Learning?
• Python has gathered a lot of interest recently as a language of choice
for data analysis. I had compared it against SAS & R some time back.
Here are some reasons which go in favour of learning Python:
• Open source – free to install
• Awesome online community
• Very easy to learn
• Can become a common language for data science and production of
web-based analytics products.
Python Libraries For Machine Learning
• NumPy stands for Numerical Python. The most powerful feature of NumPy is the n-dimensional array.
This library also contains basic linear algebra functions, Fourier transforms, advanced random
number capabilities and tools for integration with lower-level languages such as Fortran, C and
C++.
• SciPy stands for Scientific Python. SciPy is built on NumPy. It is one of the most useful libraries for a
variety of high-level science and engineering modules such as discrete Fourier transforms, linear
algebra, optimization and sparse matrices.
• Matplotlib for plotting a vast variety of graphs, from histograms to line plots to heat plots.
You can use the Pylab feature in the IPython notebook (ipython notebook --pylab=inline) to use these
plotting features inline. If you omit the inline option, pylab converts the IPython environment
into an environment very similar to Matlab. You can also use LaTeX commands to add math to your
plot.
• Pandas for structured data operations and manipulations. It is extensively used for data munging
and preparation. Pandas was added relatively recently to the Python ecosystem and has been instrumental in
boosting Python’s usage in the data science community.
• Scikit-learn for machine learning. Built on NumPy, SciPy and matplotlib, this
library contains a lot of efficient tools for machine learning and statistical
modeling including classification, regression, clustering and dimensionality
reduction.
• Statsmodels for statistical modeling. Statsmodels is a Python module that allows
users to explore data, estimate statistical models, and perform statistical tests. An
extensive list of descriptive statistics, statistical tests, plotting functions, and
result statistics is available for different types of data and each estimator.
• Seaborn for statistical data visualization. Seaborn is a library for making attractive
and informative statistical graphics in Python. It is based on matplotlib. Seaborn
aims to make visualization a central part of exploring and understanding data.
• Bokeh for creating interactive plots, dashboards and data applications on modern
web browsers. It empowers the user to generate elegant and concise graphics in
the style of D3.js. Moreover, it has the capability of high-performance
interactivity over very large or streaming datasets.
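As a small illustration of the NumPy features named in the list above (the n-dimensional array and the basic linear algebra functions), here is a minimal sketch. The array contents and the 2x2 linear system are made-up values chosen only for the example.

```python
import numpy as np

# An n-dimensional array: six numbers arranged as 2 rows x 3 columns.
grid = np.arange(6).reshape(2, 3)
print(grid.shape, grid.ndim)

# Element-wise arithmetic works on the whole array at once.
doubled = grid * 2

# Basic linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)
```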
Python Libraries For Machine Learning
• Blaze for extending the capability of NumPy and Pandas to distributed and
streaming datasets. It can be used to access data from a multitude of sources
including Bcolz, MongoDB, SQLAlchemy, Apache Spark, PyTables, etc. Together
with Bokeh, Blaze can act as a very powerful tool for creating effective
visualizations and dashboards on huge chunks of data.
• Scrapy for web crawling. It is a very useful framework for getting specific patterns
of data. It has the capability to start at a website's home URL and then dig through
web pages within the website to gather information.
• SymPy for symbolic computation. It has wide-ranging capabilities from basic
symbolic arithmetic to calculus, algebra, discrete mathematics and quantum
physics. Another useful feature is the capability of formatting the result of the
computations as LaTeX code.
• Requests for accessing the web. It works similarly to the standard Python
library urllib2 (urllib.request in Python 3) but is much easier to code. You will find subtle differences with
urllib2 but for beginners, Requests might be more convenient.
Machine Learning key phases
• We will take you through the 3 key phases:
• Data Exploration – finding out more about the data we have
• Data Munging – cleaning the data and playing with it to make it better
suit statistical modeling
• Predictive Modeling – running the actual algorithms and having fun
Python math and cmath libs
• math
• It provides access to the mathematical functions defined by the C
standard.
• These functions cannot be used with complex numbers.
• cmath
• It provides access to mathematical functions for complex numbers.
The functions in this module accept integers, floating-point numbers
or complex numbers as arguments. They will also accept any Python
object that has either a __complex__() or a __float__() method.
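The difference between the two modules can be seen in a short sketch. The `Celsius` class is a hypothetical helper invented here only to show that `cmath` accepts objects defining `__float__()`.

```python
import math
import cmath

# math works on real numbers only.
print(math.sqrt(16))

# Asking math for the square root of a negative number raises ValueError.
try:
    math.sqrt(-1)
    math_error = None
except ValueError as exc:
    math_error = str(exc)
print("math.sqrt(-1) failed:", math_error)

# cmath returns a complex result instead.
root = cmath.sqrt(-1)
print(root)


class Celsius:
    """Hypothetical wrapper type that knows how to convert itself to float."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __float__(self):
        return float(self.degrees)


# cmath functions also accept objects that define __float__().
converted = cmath.sqrt(Celsius(4))
print(converted)
```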
Series and Dataframes
• NumPy and SciPy Documentation
• Introduction to Series and Dataframes
• A Series can be understood as a 1-dimensional labelled / indexed array.
You can access individual elements of a Series through these labels.
• A DataFrame is the 2-dimensional counterpart: a labelled table in which each column is a Series.
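A minimal sketch of label-based access follows. The labels `a`, `b`, `c` and all values are toy data invented for illustration.

```python
import pandas as pd

# A Series: one-dimensional values, each with a label.
income = pd.Series([5000, 3000, 4500], index=["a", "b", "c"], name="income")
print(income["b"])          # access a single element by its label

# A DataFrame: several columns that share the same row labels.
df = pd.DataFrame(
    {"income": [5000, 3000, 4500], "loan": [120.0, 66.0, 141.0]},
    index=["a", "b", "c"],
)
column = df["income"]       # selecting one column gives back a Series
cell = df.loc["b", "loan"]  # row label and column label together
print(type(column).__name__, cell)
```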
Practice data set – Loan Prediction Problem
• Steps:
• Step 1: Installation
• Install ipython
• Install pandas
• Install numpy
• Install matplotlib
• Then
Practice data set – Loan Prediction Problem
• Step 2: Begin with exploration
• To begin, start the IPython interface in inline Pylab mode by typing the following at your terminal / Windows
command prompt / PyDev (Eclipse) console (in current Jupyter releases, run jupyter notebook and use %matplotlib inline instead):
• ipython notebook --pylab=inline
• Importing libraries and the data set:
• import pandas as pd
• import numpy as np
• import matplotlib.pyplot as plt
• df = pd.read_csv("/home/kunal/Downloads/Loan_Prediction/train.csv")  # read the dataset into a DataFrame using pandas
• df.head(10)  # first 10 rows
• df.describe()  # summary statistics of the numeric columns
• df['Property_Area'].value_counts()  # frequency of each category
Practice data set – Loan Prediction Problem
• Step 3: Distribution analysis
• Let's start by plotting the histogram of ApplicantIncome using the following command:
• df['ApplicantIncome'].hist(bins=50)
• Next, we look at box plots to understand the distributions. A box plot for ApplicantIncome can be plotted by:
• df.boxplot(column='ApplicantIncome')
• Categorical variable analysis
• temp1 = df['Credit_History'].value_counts(ascending=True)
• temp2 = df.pivot_table(values='Loan_Status', index=['Credit_History'], aggfunc=lambda x: x.map({'Y':1,'N':0}).mean())
• print('Frequency Table for Credit History:')
• print(temp1)
• print('\nProbability of getting loan for each Credit History class:')
• print(temp2)
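The categorical analysis above can be tried without the real file. The sketch below uses a six-row toy table (invented values, same column names as the slide) and computes the frequency table and the approval rate per credit-history class. It uses `groupby` instead of `pivot_table`, which is a substitution made here for readability.

```python
import pandas as pd

# Toy stand-in for the loan data; the values are made up for illustration.
df = pd.DataFrame({
    "Credit_History": [1.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    "Loan_Status": ["Y", "Y", "N", "N", "N", "Y"],
})

# Frequency table: how many applicants fall in each credit-history class.
counts = df["Credit_History"].value_counts(ascending=True)

# Approval probability per class: map Y/N to 1/0, then average within each group.
approved = df["Loan_Status"].map({"Y": 1, "N": 0})
approval_rate = approved.groupby(df["Credit_History"]).mean()

print("Frequency Table for Credit History:")
print(counts)
print("\nProbability of getting loan for each Credit History class:")
print(approval_rate)
```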
Practice data set – Loan Prediction Problem
• Using matplotlib to plot the graphs
• import matplotlib.pyplot as plt
• fig = plt.figure(figsize=(8,4))
• ax1 = fig.add_subplot(121)
• temp1.plot(kind='bar', ax=ax1)  # draw on the left subplot explicitly
• ax1.set_xlabel('Credit_History')
• ax1.set_ylabel('Count of Applicants')
• ax1.set_title("Applicants by Credit_History")
• ax2 = fig.add_subplot(122)
• temp2.plot(kind='bar', ax=ax2)  # draw on the right subplot explicitly
• ax2.set_xlabel('Credit_History')
• ax2.set_ylabel('Probability of getting loan')
• ax2.set_title("Probability of getting loan by credit history")
Practice data set – Loan Prediction Problem
• These two plots can also be visualized by combining them in a stacked
chart:
• temp3 = pd.crosstab(df['Credit_History'], df['Loan_Status'])
• temp3.plot(kind='bar', stacked=True, color=['red','blue'], grid=False)
• Step 4: Data munging in Python using Pandas
• Check missing values in the dataset
• df.isnull().sum()  # number of missing values in each column
• df['LoanAmount'] = df['LoanAmount'].fillna(df['LoanAmount'].mean())  # fill missing loan amounts with the column mean
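To see the missing-value check and the mean fill in action without the real file, here is a minimal sketch on a four-row toy table (invented values, column names borrowed from the slide).

```python
import numpy as np
import pandas as pd

# Toy data with gaps; the values are made up for illustration.
df = pd.DataFrame({
    "LoanAmount": [100.0, np.nan, 200.0, np.nan],
    "Gender": ["Male", None, "Female", "Male"],
})

# Count the missing values in every column.
missing = df.isnull().sum()
print(missing)

# Replace missing loan amounts with the mean of the observed ones.
# Assigning the result back avoids relying on inplace=True on a column selection.
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].mean())
print(df["LoanAmount"].tolist())
```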
Practice data set – Loan Prediction Problem
• Step 5: Building a predictive model in Python
• scikit-learn requires all inputs to be numeric, so the categorical variables are first encoded as numbers. This can be done using the following code:
• from sklearn.preprocessing import LabelEncoder
• var_mod = ['Gender','Married','Dependents','Education','Self_Employed','Property_Area','Loan_Status']
• le = LabelEncoder()
• for i in var_mod:
•     df[i] = le.fit_transform(df[i])  # replace each category with an integer code
• df.dtypes
• Python is really a great tool, and is becoming an increasingly popular language
among data scientists.
Python with JSON, CSV and Pandas
• Best blogs
• https://www.dataquest.io/blog/python-json-tutorial/
• http://blog.danwin.com/examples-of-web-scraping-in-python-3-x-for-data-journalists/
• https://automatetheboringstuff.com/chapter14/
Python/Predictive model in data analytics
• Predictive modeling is a process that uses data
mining and probability to forecast outcomes. Each model is made up
of a number of predictors, which are variables that are likely to
influence future results.
• sklearn.preprocessing.LabelEncoder
• It converts categorical Pandas data into integer labels that scikit-learn can use.
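A minimal sketch of what `LabelEncoder` does to a categorical column is shown below. The four-row table is toy data, and keeping one encoder per column in the `encoders` dictionary is a choice made here, not something the slides prescribe.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy categorical data; the values are made up for illustration.
df = pd.DataFrame({
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban"],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

encoders = {}
for col in ["Property_Area", "Loan_Status"]:
    le = LabelEncoder()
    # Classes are sorted, then each category is replaced by its position.
    df[col] = le.fit_transform(df[col])
    encoders[col] = le

print(df)
print(list(encoders["Property_Area"].classes_))

# A stored encoder can translate the integer codes back to the original labels.
original = encoders["Property_Area"].inverse_transform(df["Property_Area"])
```

Keeping a separate encoder for each column makes it possible to call `inverse_transform` later; reusing a single encoder in a loop keeps only the mapping of the last column fitted.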
Python/Perfect way to build a Predictive Model
• Broadly, building a predictive model can be divided into 4 parts.
• Descriptive analysis on the Data – 50% time
• Data treatment (Missing value and outlier fixing) – 40% time
• Data Modelling – 4% time
• Estimation of performance – 6% time
Python/Perfect way to build a Predictive Model
• Descriptive Analysis
• Descriptive statistics is the initial stage of analysis, used to describe and
summarize data. The availability of a large amount of data and very efficient
computational methods has strengthened this area of statistics. Below are the
steps involved:
• Variable Identification
• Univariate Analysis
• Bi-variate Analysis
• Missing values treatment
• Outlier treatment
• Variable transformation
• Variable creation
Python/Perfect way to build a Predictive Model
• Data treatment:
• An important aspect of the statistical treatment of data is the handling of
errors. Methods to treat missing values:
• Deletion: It is of two types: list-wise deletion and pair-wise deletion.
• Mean/Mode/Median Imputation: Imputation is a method to fill in the
missing values with estimated ones.
• Prediction Model: A prediction model is one of the more sophisticated methods for
handling missing data. Here, we create a predictive model to estimate
values that will substitute the missing data.
• KNN Imputation: In this method of imputation, the missing values of
an attribute are imputed using the given number of attributes that are
most similar to the attribute whose values are missing.
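The first two treatments in the list above, deletion and mean/mode/median imputation, can be sketched with Pandas alone. The table below is toy data with invented column names and values; prediction-model and KNN imputation are not shown.

```python
import numpy as np
import pandas as pd

# Toy data with missing entries; the values are made up for illustration.
df = pd.DataFrame({
    "income": [2500.0, np.nan, 4000.0, 3500.0, np.nan],
    "area": ["Urban", "Rural", None, "Urban", "Urban"],
})

# List-wise deletion: drop every row that has at least one missing value.
listwise = df.dropna()

# Imputation: fill numeric gaps with the median, categorical gaps with the mode.
imputed = df.copy()
imputed["income"] = imputed["income"].fillna(imputed["income"].median())
imputed["area"] = imputed["area"].fillna(imputed["area"].mode()[0])

print(len(df), "rows before deletion,", len(listwise), "rows after")
print(imputed)
```

Deletion keeps only fully observed rows and can shrink a small data set quickly, while imputation keeps every row at the cost of introducing estimated values.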
Python/Perfect way to build a Predictive Model
• Data Modelling: In the case of bigger data, you can consider running a
Random Forest. This will take the maximum amount of time.
• Estimation of Performance: This is the measurement of model performance.
I find k-fold with k=7 highly effective for taking my initial bet. This finally
takes 1-2 minutes to execute and document.
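One way to combine the two points above is to score a Random Forest with 7-fold cross-validation in scikit-learn. This is a minimal sketch under stated assumptions: the data is synthetic (generated by `make_classification`), and the sample size, number of trees and accuracy metric are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic classification data standing in for a real, prepared data set.
X, y = make_classification(n_samples=210, n_features=8, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# k-fold with k=7: train on six folds, score on the seventh, rotate through all.
cv = KFold(n_splits=7, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("Accuracy per fold:", scores.round(3))
print("Mean accuracy: %.3f" % scores.mean())
```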
Python/time-series-forecast-study
• The problem is to predict the number of monthly sales of champagne for the Perrin
Freres label (named for a region in France).
• The dataset provides the number of monthly sales of champagne from January 1964 to
September 1972, or just under 10 years of data.
• Download the dataset as a CSV file and place it in your current working directory
with the filename "champagne.csv".
• The steps of this project that we will work through are as follows.
• Environment.
• Problem Description.
• Test Harness.
• Persistence.
• Data Analysis.
• ARIMA Models.
• Model Validation.
Python/time-series-forecast-study
• 3. Test Harness
• We must develop a test harness to investigate the data and evaluate candidate models.
• This involves two steps:
• Defining a Validation Dataset.
• Developing a Method for Model Evaluation.
• 3.1 Validation Dataset
• The code below will load the dataset as a Pandas Series and split it into two, one for model development
(dataset.csv) and the other for validation (validation.csv).
• import pandas as pd
• # Series.from_csv was removed in pandas 1.0; read_csv followed by squeeze gives the same Series
• series = pd.read_csv('champagne.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
• split_point = len(series) - 12  # hold out the last 12 months for validation
• dataset, validation = series[0:split_point], series[split_point:]
• print('Dataset %d, Validation %d' % (len(dataset), len(validation)))
• dataset.to_csv('dataset.csv')
• validation.to_csv('validation.csv')
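The split can be checked without downloading the file. The sketch below builds a placeholder monthly series covering the same period, January 1964 to September 1972, and applies the same 12-month hold-out. The values are just a counter, not champagne sales.

```python
import pandas as pd

# Placeholder monthly series for January 1964 .. September 1972 (105 months).
# The values are a simple counter, NOT the real sales figures.
index = pd.date_range("1964-01-01", periods=105, freq="MS")
series = pd.Series(range(105), index=index, name="sales")

# Hold out the final 12 months for validation.
split_point = len(series) - 12
dataset, validation = series.iloc[:split_point], series.iloc[split_point:]

print("Dataset %d, Validation %d" % (len(dataset), len(validation)))
print("Development data ends:", dataset.index[-1].date())
print("Validation data starts:", validation.index[0].date())
```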
Python/time-series-forecast-study
• The specific contents of these files are:
• dataset.csv: Observations from January 1964 to September 1971 (93
observations)
• validation.csv: Observations from October 1971 to September 1972 (12
observations)
• 3.2 Model Evaluation
• Model evaluation will only be performed on the data
in dataset.csv prepared in the previous section.
• Model evaluation involves two elements:
• Performance Measure.
• Test Strategy.
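The slides do not say which performance measure or test strategy is used. As one possible pairing, the sketch below assumes root mean squared error (RMSE) as the measure and walk-forward validation as the strategy, applied to a persistence forecast (predict the previous observation). The eight observations are invented numbers.

```python
import math

# Invented observations standing in for the development data.
observations = [10, 12, 14, 13, 15, 18, 17, 20]

# Assumed test strategy: walk-forward validation over the second half.
train_size = len(observations) // 2
train, test = observations[:train_size], observations[train_size:]

history = list(train)
predictions = []
for actual in test:
    # Persistence forecast: the next value equals the last value seen.
    predictions.append(history[-1])
    # Once the true value is known, add it to the history.
    history.append(actual)

# Assumed performance measure: root mean squared error.
squared_errors = [(p - a) ** 2 for p, a in zip(predictions, test)]
rmse = math.sqrt(sum(squared_errors) / len(test))
print("Predictions:", predictions)
print("RMSE: %.3f" % rmse)
```

A persistence forecast like this is a common baseline: any candidate model should achieve a lower error than it under the same evaluation.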
Python/Recommender systems
• Movie recommender
• https://cambridgespark.com/content/tutorials/implementing-your-own-recommender-systems-in-Python/index.html
• Matrix factorization recommender
• https://beckernick.github.io/matrix-factorization-recommender/
Python/Main resources
• I definitely recommend cs109.org (Harvard CS 109).
There are a few different courses, but the one I used when I was
learning was Dataquest.
• https://www.dataquest.io/blog/pandas-cheat-sheet/
• https://www.dataquest.io/blog/data-science-portfolio-project/
• Loan prediction
• https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/

一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 

Python ml

  • 2. Outlines
      • Some Training Pics
      • Why learn Python for Machine Learning?
      • Python Libraries for Machine Learning
      • Machine Learning key phases
      • Series and Data-frames
      • Case Studies: Loan Prediction Problem
      • Python/Predictive model in data analytics
      • Main Resources
  • 6. Why learn Python for Machine Learning?
      • Python has gathered a lot of interest recently as a language for data analysis. I compared it against SAS and R some time back. Here are some reasons in favour of learning Python:
      • Open source: free to install
      • Awesome online community
      • Very easy to learn
      • Can become a common language for data science and for production of web-based analytics products
  • 7. Python Libraries for Machine Learning
      • NumPy stands for Numerical Python. Its most powerful feature is the n-dimensional array. The library also contains basic linear algebra functions, Fourier transforms, advanced random number capabilities, and tools for integration with lower-level languages such as Fortran, C and C++.
      • SciPy stands for Scientific Python and is built on NumPy. It is one of the most useful libraries for a variety of high-level science and engineering modules such as discrete Fourier transforms, linear algebra, optimization and sparse matrices.
      • Matplotlib plots a vast variety of graphs, from histograms to line plots to heat plots. You can use the Pylab feature in the IPython notebook (ipython notebook --pylab=inline) to use these plotting features inline. If you omit the inline option, pylab turns the IPython environment into one very similar to Matlab. You can also use LaTeX commands to add math to your plot.
      • Pandas is for structured data operations and manipulations. It is extensively used for data munging and preparation. Pandas was added relatively recently to the Python ecosystem and has been instrumental in boosting Python's usage in the data science community.
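As a rough illustration of the NumPy features listed on this slide (n-dimensional arrays, linear algebra, Fourier transforms, random numbers), here is a minimal sketch. All arrays and values are made up for the example; they are not from the slides.

```python
import numpy as np

# An n-dimensional array: here 2 rows by 3 columns.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print(a.shape, a.ndim)

# Basic linear algebra: solve the small system A x = b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)

# Fourier transform of a short illustrative signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
print(spectrum)

# Random numbers from a seeded generator, so repeated runs match.
rng = np.random.default_rng(seed=0)
sample = rng.normal(size=5)
print(sample)
```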
  • 8. Python Libraries for Machine Learning
      • Scikit-learn for machine learning. Built on NumPy, SciPy and matplotlib, this library contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction.
      • Statsmodels for statistical modeling. Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.
      • Seaborn for statistical data visualization. Seaborn is a library for making attractive and informative statistical graphics in Python. It is based on matplotlib. Seaborn aims to make visualization a central part of exploring and understanding data.
      • Bokeh for creating interactive plots, dashboards and data applications in modern web browsers. It lets the user generate elegant and concise graphics in the style of D3.js. Moreover, it supports high-performance interactivity over very large or streaming datasets.
  • 9. Python Libraries for Machine Learning
      • Blaze for extending the capability of NumPy and Pandas to distributed and streaming datasets. It can be used to access data from a multitude of sources including Bcolz, MongoDB, SQLAlchemy, Apache Spark, PyTables, etc. Together with Bokeh, Blaze can act as a very powerful tool for creating effective visualizations and dashboards on huge chunks of data.
      • Scrapy for web crawling. It is a very useful framework for getting specific patterns of data. It can start at a website's home URL and then dig through the web pages within the site to gather information.
      • SymPy for symbolic computation. It has wide-ranging capabilities, from basic symbolic arithmetic to calculus, algebra, discrete mathematics and quantum physics. Another useful feature is the ability to format the result of a computation as LaTeX code.
      • Requests for accessing the web. It works similarly to the standard Python library urllib2 (urllib.request in Python 3) but is much easier to code. You will find subtle differences from urllib2, but for beginners Requests might be more convenient.
  • 10. Machine Learning key phases
      • We will take you through the 3 key phases:
      • Data Exploration: finding out more about the data we have
      • Data Munging: cleaning the data and playing with it to make it better suit statistical modeling
      • Predictive Modeling: running the actual algorithms and having fun
  • 11. Python math and cmath libs
      • math provides access to the mathematical functions defined by the C standard. These functions cannot be used with complex numbers.
      • cmath provides access to mathematical functions for complex numbers. The functions in this module accept integers, floating-point numbers or complex numbers as arguments. They also accept any Python object that has either a __complex__() or a __float__() method.
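The difference between the two modules can be seen in a short sketch. The `Reading` class is a hypothetical helper, introduced only to show that cmath accepts objects with a `__float__()` method.

```python
import math
import cmath

# math works on real numbers only.
print(math.sqrt(16))

# A negative argument has no real square root, so math raises ValueError.
try:
    math.sqrt(-1)
    math_failed = False
except ValueError:
    math_failed = True

# cmath returns a complex result instead.
root = cmath.sqrt(-1)
print(root)


class Reading:
    """Hypothetical object that can be converted to a float."""

    def __float__(self):
        return 4.0


# cmath converts the object through __float__() before computing.
from_object = cmath.sqrt(Reading())
print(from_object)
```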
  • 12. Series and Dataframes
      • Numpy and Scipy Documentation
      • Introduction to Series and Dataframes
      • A Series can be understood as a one-dimensional labelled / indexed array. You can access individual elements of a Series through these labels.
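A minimal sketch of label-based access, assuming made-up labels and values (the column names echo the loan data set used later, but the numbers are illustrative only):

```python
import pandas as pd

# A Series: one-dimensional values, each with a label.
income = pd.Series([5849, 4583, 3000], index=["a", "b", "c"])

# Access an element through its label rather than its position.
print(income["b"])

# A DataFrame is a table whose columns are Series sharing one index.
frame = pd.DataFrame(
    {"ApplicantIncome": [5849, 4583, 3000],
     "LoanAmount": [130.0, 128.0, 66.0]},
    index=["a", "b", "c"],
)

# Select a single cell by row label and column name.
print(frame.loc["b", "LoanAmount"])
```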
  • 13. Practice data set – Loan Prediction Problem
      • Steps:
      • Step 1: installation
      • Install ipython
      • Install pandas
      • Install numpy
      • Install matplotlib
  • 14. Practice data set – Loan Prediction Problem
      • Step 2: begin with exploration
      • To begin, start the IPython interface in inline Pylab mode by typing the following at your terminal / Windows command / PyDev (Eclipse) prompt:
      • ipython notebook --pylab=inline
      • Importing libraries and the data set:
      • import pandas as pd
      • import numpy as np
      • import matplotlib.pyplot as plt  # the plotting functions live in matplotlib.pyplot, not the top-level package
      • df = pd.read_csv("/home/kunal/Downloads/Loan_Prediction/train.csv")  # Reading the dataset in a dataframe using Pandas
      • df.head(10)  # first 10 rows
      • df.describe()  # summary statistics of the numerical columns
      • df['Property_Area'].value_counts()  # frequency of each category
  • 15. Practice data set – Loan Prediction Problem
      • Step 3: Distribution analysis
      • Let's start by plotting the histogram of ApplicantIncome using the following command:
      • df['ApplicantIncome'].hist(bins=50)
      • Next, we look at box plots to understand the distributions. The box plot for ApplicantIncome can be plotted by:
      • df.boxplot(column='ApplicantIncome')
      • Categorical variable analysis
      • temp1 = df['Credit_History'].value_counts(ascending=True)
      • temp2 = df.pivot_table(values='Loan_Status', index=['Credit_History'], aggfunc=lambda x: x.map({'Y':1,'N':0}).mean())
      • print('Frequency Table for Credit History:')
      • print(temp1)
      • print('\nProbability of getting loan for each Credit History class:')
      • print(temp2)
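To see what the frequency table and the approval probability contain without the real file, the same idea can be tried on a tiny made-up frame. The six rows below are invented, and `groupby` is used here as an alternative to `pivot_table` for the same per-class mean.

```python
import pandas as pd

# Six invented applicants; only the column names come from the slides.
toy = pd.DataFrame({
    "Credit_History": [1.0, 1.0, 1.0, 0.0, 0.0, 1.0],
    "Loan_Status":    ["Y", "Y", "N", "N", "N", "Y"],
})

# Frequency table: how many applicants fall in each credit-history class.
counts = toy["Credit_History"].value_counts(ascending=True)
print(counts)

# Map Y/N to 1/0, then average within each class to get the approval rate.
approved = toy["Loan_Status"].map({"Y": 1, "N": 0})
approval_rate = approved.groupby(toy["Credit_History"]).mean()
print(approval_rate)
```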
  • 16. Practice data set – Loan Prediction Problem
      • Using matplotlib for plotting graphs
      • import matplotlib.pyplot as plt
      • fig = plt.figure(figsize=(8,4))
      • ax1 = fig.add_subplot(121)
      • ax1.set_xlabel('Credit_History')
      • ax1.set_ylabel('Count of Applicants')
      • ax1.set_title("Applicants by Credit_History")
      • temp1.plot(kind='bar', ax=ax1)  # draw on the left subplot explicitly
      • ax2 = fig.add_subplot(122)
      • temp2.plot(kind='bar', ax=ax2)  # draw on the right subplot explicitly
      • ax2.set_xlabel('Credit_History')
      • ax2.set_ylabel('Probability of getting loan')
      • ax2.set_title("Probability of getting loan by credit history")
  • 17. Practice data set – Loan Prediction Problem
      • These two plots can also be combined in a stacked chart:
      • temp3 = pd.crosstab(df['Credit_History'], df['Loan_Status'])
      • temp3.plot(kind='bar', stacked=True, color=['red','blue'], grid=False)
      • 4. Data Munging in Python: Using Pandas
      • Check missing values in the dataset:
      • df.apply(lambda x: sum(x.isnull()), axis=0)
      • Fill missing LoanAmount values with the column mean:
      • df['LoanAmount'] = df['LoanAmount'].fillna(df['LoanAmount'].mean())  # assign back rather than using inplace=True on a column selection
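The missing-value check and the mean fill can be tried on a small made-up frame before running them on the real data. The values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Four invented rows with one missing LoanAmount.
toy = pd.DataFrame({
    "LoanAmount": [100.0, np.nan, 140.0, 120.0],
    "ApplicantIncome": [3000, 4500, 5200, 4100],
})

# Count missing values per column.
missing = toy.isnull().sum()
print(missing)

# Replace the missing LoanAmount with the mean of the observed values.
toy["LoanAmount"] = toy["LoanAmount"].fillna(toy["LoanAmount"].mean())
print(toy)
```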
  • 18. Practice data set – Loan Prediction Problem
      • 5. Building a Predictive Model in Python
      • scikit-learn needs numeric inputs, so the categorical columns are first converted to integer codes. This can be done using the following code:
      • from sklearn.preprocessing import LabelEncoder
      • var_mod = ['Gender','Married','Dependents','Education','Self_Employed','Property_Area','Loan_Status']
      • le = LabelEncoder()
      • for i in var_mod: df[i] = le.fit_transform(df[i])  # refit the encoder on each column in turn
      • df.dtypes
      • Python is really a great tool, and is becoming an increasingly popular language among data scientists.
  • 19. Python with JSON, CSV with Pandas
      • Best blogs
      • https://www.dataquest.io/blog/python-json-tutorial/
      • http://blog.danwin.com/examples-of-web-scraping-in-python-3-x-for-data-journalists/
      • https://automatetheboringstuff.com/chapter14/
  • 20. Python/Predictive model in data analytics
      • Predictive modeling is a process that uses data mining and probability to forecast outcomes. Each model is made up of a number of predictors, which are variables that are likely to influence future results.
      • sklearn.preprocessing.LabelEncoder()
      • It converts categorical values in a Pandas column into integer codes that scikit-learn can use.
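A small sketch of what LabelEncoder does to one categorical column. The four rows are invented, and `Property_Area_code` is a hypothetical column name used only here.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Invented rows for a single categorical column.
toy = pd.DataFrame({"Property_Area": ["Urban", "Rural", "Semiurban", "Urban"]})

le = LabelEncoder()
# fit_transform learns the distinct classes (sorted) and returns their codes.
toy["Property_Area_code"] = le.fit_transform(toy["Property_Area"])

print(list(le.classes_))
print(toy)

# inverse_transform maps codes back to the original labels.
restored = le.inverse_transform(toy["Property_Area_code"])
print(list(restored))
```

The codes follow the sorted order of the labels, so they carry no meaning beyond identity. For nominal input features, one-hot encoding is a common alternative.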
  • 21. Python/Perfect way to build a Predictive Model
      • Predictive modeling is a process that uses data mining and probability to forecast outcomes. Each model is made up of a number of predictors, which are variables that are likely to influence future results.
      • Broadly, it can be divided into 4 parts:
      • Descriptive analysis of the data: 50% of the time
      • Data treatment (missing value and outlier fixing): 40% of the time
      • Data modelling: 4% of the time
      • Estimation of performance: 6% of the time
  • 22. Python/Perfect way to build a Predictive Model
      • Descriptive Analysis
      • Descriptive statistics is the initial stage of analysis, used to describe and summarize data. The availability of large amounts of data and very efficient computational methods has strengthened this area of statistics. The steps involved are:
      • Variable identification
      • Univariate analysis
      • Bi-variate analysis
      • Missing values treatment
      • Outlier treatment
      • Variable transformation
      • Variable creation
  • 23. Python/Perfect way to build a Predictive Model
      • Data treatment
      • An important aspect of the statistical treatment of data is the handling of errors. Methods to treat missing values:
      • Deletion: It is of two types, list-wise deletion and pair-wise deletion.
      • Mean / mode / median imputation: Imputation is a method to fill in the missing values with estimated ones.
      • Prediction model: A prediction model is one of the more sophisticated methods for handling missing data. Here, we create a predictive model to estimate values that will substitute for the missing data.
      • KNN imputation: In this method, the missing values of an attribute are imputed from a given number (k) of observations that are most similar to the observation whose value is missing.
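Three of the treatments listed on this slide can be compared on a tiny invented table. This is a sketch under stated assumptions: the data is made up, and scikit-learn's `KNNImputer` with `n_neighbors=2` stands in for KNN imputation; the slides do not name a specific implementation.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Invented data: the last applicant has a missing LoanAmount.
toy = pd.DataFrame({
    "ApplicantIncome": [2000.0, 2100.0, 8000.0, 8200.0, 2050.0],
    "LoanAmount":      [60.0,   64.0,   250.0,  260.0,  np.nan],
})

# List-wise deletion: drop every row that has any missing value.
listwise = toy.dropna()

# Median imputation: fill each gap with the median of its column.
median_filled = toy.fillna(toy.median())

# KNN imputation: average LoanAmount over the 2 rows closest in income.
knn = KNNImputer(n_neighbors=2)
knn_filled = pd.DataFrame(knn.fit_transform(toy), columns=toy.columns)

print(len(listwise))
print(median_filled.loc[4, "LoanAmount"])
print(knn_filled.loc[4, "LoanAmount"])
```

On this data the median fill uses the middle of all observed loans, while the KNN fill uses only the two low-income neighbours, so the two imputed values differ substantially.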
  • 24. Python/Perfect way to build a Predictive Model
      • Data Modelling: For bigger data, you can consider running a Random Forest. This will take the maximum amount of time.
      • Estimation of Performance: This measures how well the model performs. I find k-fold cross-validation with k=7 highly effective for my initial bet. It finally takes 1-2 minutes to execute and document.
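A hedged sketch of the two steps on this slide: a Random Forest evaluated with 7-fold cross-validation. The data is synthetic (generated with `make_classification`), and the forest size and accuracy metric are assumptions chosen for the example, not settings from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic classification data standing in for a prepared data set.
X, y = make_classification(n_samples=210, n_features=8, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# k-fold with k=7: each fold is held out once while the other six train.
cv = KFold(n_splits=7, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(scores)
print("mean accuracy: %.3f, std: %.3f" % (scores.mean(), scores.std()))
```

The spread across the seven fold scores gives a rough sense of how stable the estimate is.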
  • 25. Python/time-series-forecast-study
      • The problem is to predict the number of monthly sales of champagne for the Perrin Freres label (named for a region in France).
      • The dataset provides the number of monthly sales of champagne from January 1964 to September 1972, or just under 9 years of data.
      • Download the dataset as a CSV file and place it in your current working directory with the filename "champagne.csv".
      • The steps of this project that we will work through are as follows:
      • Environment.
      • Problem Description.
      • Test Harness.
      • Persistence.
      • Data Analysis.
      • ARIMA Models.
      • Model Validation.
  • 26. Python/time-series-forecast-study
      • 3. Test Harness
      • We must develop a test harness to investigate the data and evaluate candidate models.
      • This involves two steps:
      • Defining a Validation Dataset.
      • Developing a Method for Model Evaluation.
      • 3.1 Validation Dataset
      • The code below loads the dataset as a Pandas Series and splits it into two, one part for model development (dataset.csv) and the other for validation (validation.csv).
      • from pandas import read_csv
      • series = read_csv('champagne.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')  # Series.from_csv was removed from recent pandas releases
      • split_point = len(series) - 12  # hold back the final 12 months
      • dataset, validation = series[0:split_point], series[split_point:]
      • print('Dataset %d, Validation %d' % (len(dataset), len(validation)))
      • dataset.to_csv('dataset.csv', header=False)
      • validation.to_csv('validation.csv', header=False)
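The split logic can be checked without the CSV file by using a placeholder monthly series of the same length and date range. The values below are just 0 to 104, not real sales figures.

```python
import pandas as pd

# Placeholder series: 105 month-start dates from January 1964 onward.
index = pd.date_range("1964-01-01", periods=105, freq="MS")
series = pd.Series(range(105), index=index, dtype="float64")

# Hold back the final 12 months for validation.
split_point = len(series) - 12
dataset, validation = series.iloc[:split_point], series.iloc[split_point:]

print("Dataset %d, Validation %d" % (len(dataset), len(validation)))
print(dataset.index[-1].date(), validation.index[0].date())
```

Holding back the most recent observations, rather than a random sample, keeps the validation set strictly in the future relative to the development data.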
  • 27. Python/time-series-forecast-study
      • The specific contents of these files are:
      • dataset.csv: Observations from January 1964 to September 1971 (93 observations)
      • validation.csv: Observations from October 1971 to September 1972 (12 observations)
      • 3.2 Model Evaluation
      • Model evaluation will only be performed on the data in dataset.csv prepared in the previous section.
      • Model evaluation involves two elements:
      • Performance Measure.
      • Test Strategy.
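The slide names the two elements but not the specific choices. As one plausible sketch, the example below assumes root mean squared error (RMSE) as the performance measure and walk-forward validation as the test strategy, applied to the persistence baseline listed among the project steps. The function names and the six values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def walk_forward_persistence(values, n_train):
    """Step through the test portion one observation at a time.

    At each step the forecast is the last value seen so far (persistence),
    then the true observation is added to the history.
    """
    history = list(values[:n_train])
    test = list(values[n_train:])
    predictions = []
    for observed in test:
        predictions.append(history[-1])
        history.append(observed)
    return test, predictions


# Invented values, not the champagne data.
values = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
test, predictions = walk_forward_persistence(values, n_train=3)
error = rmse(test, predictions)

print(predictions)
print("RMSE: %.3f" % error)
```

Any candidate model can be swapped in for the persistence line inside the loop, and its RMSE compared against this baseline.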
  • 28. Python/Recommender systems
      • Movie recommendation
      • https://cambridgespark.com/content/tutorials/implementing-your-own-recommender-systems-in-Python/index.html
      • Matrix factorization recommender
      • https://beckernick.github.io/matrix-factorization-recommender/
  • 29. Python/Main resources
      • I definitely recommend cs109.org (Harvard CS 109). There are a few different courses, but the one I used when I was learning was Dataquest.
      • https://www.dataquest.io/blog/pandas-cheat-sheet/
      • https://www.dataquest.io/blog/data-science-portfolio-project/
      • Loan prediction
      • https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/