
Data Science With Python | Python For Data Science | Python Data Science Course | Simplilearn

This Data Science with Python presentation will help you understand what Data Science is, the basics of Python for data analysis, why to learn Python, how to install Python, Python libraries for data analysis, exploratory analysis using Pandas, Series and DataFrames, the loan prediction problem, data wrangling using Pandas, building a predictive model using Scikit-learn, and implementing a logistic regression model in Python. It is aimed at beginners who are new to Python for data analysis and gives an overview of the basic concepts you need to get started. Now, let us understand how Python is used in Data Science for data analysis.

This Data Science with Python presentation will cover the following topics:

1. What is Data Science?
2. Basics of Python for data analysis
- Why learn Python?
- How to install Python?
3. Python libraries for data analysis
4. Exploratory analysis using Pandas
- Introduction to series and dataframe
- Loan prediction problem
5. Data wrangling using Pandas
6. Building a predictive model using Scikit-learn
- Logistic regression

This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you'll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.

Why learn Data Science?
Data scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. Data scientist is the pinnacle rank in an analytics organization. Glassdoor ranked data scientist first in its 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.

You can gain in-depth knowledge of Data Science by taking our Data Science with Python certification training course. With Simplilearn's Data Science certification training course, you will prepare for a career as a Data Scientist as you master the concepts and techniques.

Learn more at: https://www.simplilearn.com

Published in: Education

  1. 1. What’s in it for you? • What is Data Science? • Basics of Python for Data Analysis: Why learn Python? How to install Python? • Python Libraries for Data Analysis • Exploratory analysis using Pandas: Introduction to Series and DataFrame, Loan Prediction Problem • Data Wrangling using Pandas • Building a Predictive Model using Scikit-learn: Logistic Regression
  2. 2. What is Data Science? Data Science is about finding and exploring data in the real world, and then using that knowledge to solve business problems. Examples: • Service Planning: Restaurants can predict how many customers will visit on a weekend and plan their food inventory to handle the demand • Customer Prediction: A system can be trained on customer behavior patterns to predict the likelihood of a customer buying a product
  3. 3. Why Python? Let’s first understand why we want to use Python.
  4. 4. Why Python? Usage statistics based on Google Trends indicate that Python is currently more popular than R or SAS for Data Science!
  5. 5. Why Python? But there are various factors you should consider before deciding which language is best for your Data Analysis: • Speed • Packages • Design Goal
  8. 8. Why Python? • Design Goal: Python's syntax rules help in building applications with a concise and readable code base • Packages: There are numerous Python packages to choose from, such as pandas to aggregate and manipulate data, and Seaborn or matplotlib to visualize relational data, to mention a few • Speed: Studies suggest that Python is faster than several widely used languages. We can also further speed up Python using algorithms and tools
  9. 9. Installing Python Now, let’s install Python to begin the fun
  10. 10. Installing Python • Go to: http://continuum.io/downloads • Scroll down to download the graphical installer suitable for your operating system • After successful installation, you can launch Jupyter notebook from Anaconda Navigator • Anaconda comes with pre-installed libraries • In this tutorial, we will be working in Jupyter notebook using Python 3
  11. 11. Python libraries for Data Analysis Let’s get to know some important Python libraries for Data Analysis
  12. 12. Python libraries for Data Analysis There are many interesting libraries that have made Python popular with Data Scientists:
  13. 13. Python libraries for Data Analysis • SciPy: the most useful library for a variety of high-level science and engineering modules such as discrete Fourier transform, linear algebra, optimization and sparse matrices • Pandas: for structured data operations and manipulations; it is extensively used for data munging and preparation • NumPy: its most powerful feature is the n-dimensional array; this library also contains basic linear algebra functions, Fourier transforms and advanced random number capabilities • Matplotlib: for plotting a vast variety of graphs, from histograms to line plots to heat plots • Scikit-learn: contains a lot of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction
  14. 14. Python libraries for Data Analysis Additional libraries you might need: • networkx & igraph • TensorFlow • BeautifulSoup • os
  15. 15. Python libraries for Data Analysis • os for operating system and file operations • networkx and igraph for graph-based data manipulations • TensorFlow • BeautifulSoup for web scraping
  16. 16. What is SciPy? SciPy is a set of scientific and numerical tools for Python • It currently supports special functions, integration, ordinary differential equation (ODE) solvers, gradient optimization, and others • It has fully-featured versions of the linear algebra modules • It is built on top of NumPy
  17. 17. What is NumPy? NumPy is the fundamental package for scientific computing with Python. It contains: • A powerful N-dimensional array object • Tools for integrating C/C++ and Fortran code • Useful linear algebra, Fourier transform, and random number capabilities
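To make the NumPy capabilities listed above concrete, here is a minimal sketch. The array values are made up for illustration and are not taken from the slides.

```python
import numpy as np

# N-dimensional array: reshape six consecutive integers into 2 rows x 3 columns.
a = np.arange(6).reshape(2, 3)

# Linear algebra: invert a small diagonal matrix (illustrative values).
m = np.array([[2.0, 0.0], [0.0, 4.0]])
m_inv = np.linalg.inv(m)

# Fourier transform: the FFT of a constant signal has only a zero-frequency term.
spectrum = np.fft.fft(np.ones(4))

# Random numbers: a seeded generator gives reproducible samples.
rng = np.random.default_rng(0)
samples = rng.random(3)

print(a.shape, a.ndim)
print(m_inv)
print(spectrum)
print(samples)
```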
  18. 18. What is Pandas? Pandas is used for structured data operations and manipulations • The most useful Data Analysis library in Python • Instrumental in increasing the use of Python in the Data Science community • It is extensively used for data munging and preparation
  19. 19. Exploratory analysis using Pandas Let’s understand the two most common terms used in Pandas: • Series • DataFrame
  20. 20. Exploratory analysis using Pandas • Series: a one-dimensional object that can hold any data type, such as integers, floats and strings • DataFrame: a two-dimensional object whose columns can have different data types
  21. 21. Exploratory analysis using Pandas [Figure: a Series with its default index, and a DataFrame with its default index and default column names]
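The default index and default column names can be seen in a small sketch like the following. The values are invented purely for illustration.

```python
import pandas as pd

# A Series is one-dimensional; without an explicit index it gets 0..n-1.
s = pd.Series([10, 2.5, "text"])

# A DataFrame built from nested lists gets a default integer index
# and default integer column names.
df = pd.DataFrame([[1, "a"], [2, "b"], [3, "c"]])

print(list(s.index))
print(list(df.index))
print(list(df.columns))
```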
  23. 23. Exploratory analysis using Pandas Problem Statement: Based on customer data, predict whether a particular customer’s loan will be approved or not
  24. 24. Exploratory analysis using Pandas Now, let’s explore our data using Pandas!
  25. 25. Exploratory analysis using Pandas Import the necessary libraries and read the dataset using read_csv() function:
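A minimal sketch of this step follows. The slides' dataset is not included in this transcript, so the CSV text below is a hypothetical stand-in, and its column names are assumptions modeled on the columns the slides mention.

```python
import io

import pandas as pd

# Hypothetical stand-in for the loan dataset file; in practice you would
# pass the path of your own CSV file to read_csv instead.
csv_text = """Loan_ID,ApplicantIncome,LoanAmount,Credit_History,Loan_Status
LP001,5849,,1,Y
LP002,4583,128,1,N
LP003,3000,66,1,Y
LP004,2583,120,,Y
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# head() shows the first rows so you can check the data loaded as expected.
print(df.head())
```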
  26. 26. Exploratory analysis using Pandas You can call describe() function to describe all the columns:
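As a sketch on a small made-up DataFrame (column names are assumptions): by default, describe() summarizes only the numeric columns of a mixed DataFrame, and passing include="all" covers the remaining columns too.

```python
import pandas as pd

# Illustrative data; not the dataset used in the slides.
df = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, 3000, 2583],
    "LoanAmount": [130.0, 128.0, 66.0, 120.0],
    "Property_Area": ["Urban", "Rural", "Urban", "Urban"],
})

# Count, mean, std, min, quartiles and max for the numeric columns.
summary = df.describe()

# include="all" adds non-numeric columns (count, unique, top, freq).
full_summary = df.describe(include="all")

print(summary)
print(full_summary)
```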
  27. 27. Exploratory analysis using Pandas Let’s see the distribution of the numerical values: 1. Loan Amount
  28. 28. Exploratory analysis using Pandas 2. Applicant Income
  29. 29. Exploratory analysis using Pandas Categorical values’ distribution using matplotlib library: Credit History
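The slide shows this distribution as a matplotlib plot. As a plot-free alternative, the same counts can be inspected with value_counts(); the column name and values below are assumptions for illustration.

```python
import pandas as pd

# Illustrative credit-history flags, including one missing value.
credit_history = pd.Series([1.0, 1.0, 0.0, 1.0, None, 1.0], name="Credit_History")

# Absolute frequency of each category (missing values are skipped by default).
counts = credit_history.value_counts()

# Relative frequency of each category among the non-missing values.
shares = credit_history.value_counts(normalize=True)

print(counts)
print(shares)
```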
  30. 30. Exploratory analysis using Pandas Hence, ‘LoanAmount’ and ‘ApplicantIncome’ need Data Wrangling, as some extreme values are observed!
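The slides spot the extreme values visually. One common numeric alternative, shown here only as a sketch and not as the method used in the presentation, is the 1.5 x IQR rule. The loan amounts are invented.

```python
import pandas as pd

# Illustrative loan amounts with one obviously extreme value.
loan_amount = pd.Series([100, 110, 120, 125, 130, 140, 700], name="LoanAmount")

# Interquartile range: the spread of the middle 50% of the data.
q1 = loan_amount.quantile(0.25)
q3 = loan_amount.quantile(0.75)
iqr = q3 - q1

# Flag values more than 1.5 * IQR outside the quartiles.
is_extreme = (loan_amount < q1 - 1.5 * iqr) | (loan_amount > q3 + 1.5 * iqr)
outliers = loan_amount[is_extreme]

print(outliers)
```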
  31. 31. Data Wrangling using Pandas Before proceeding further, let’s understand what Data Wrangling is and why we need it
  32. 32. Data Wrangling using Pandas Data Wrangling: the process of cleaning and unifying messy and complex data sets • It reveals more information about your data • Enables decision-making skills in the organization • Helps to gather meaningful and precise data for the business
  33. 33. Data Wrangling using Pandas You can see if your data has missing values:
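A minimal sketch of counting missing values per column, using a small made-up DataFrame with assumed column names:

```python
import numpy as np
import pandas as pd

# Illustrative data with a few missing entries.
df = pd.DataFrame({
    "LoanAmount": [np.nan, 128.0, 66.0, 120.0],
    "Credit_History": [1.0, 1.0, np.nan, np.nan],
    "ApplicantIncome": [5849, 4583, 3000, 2583],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing = df.isnull().sum()
print(missing)
```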
  34. 34. Data Wrangling using Pandas And then you can replace the missing values:
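The transcript does not say which replacement values the slides use. The sketch below assumes two common choices: the column mean for a numeric column and the most frequent value for a flag column. Data and column names are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data with missing entries.
df = pd.DataFrame({
    "LoanAmount": [np.nan, 128.0, 66.0, 120.0],
    "Credit_History": [1.0, 1.0, np.nan, 0.0],
})

# Assumption: fill the numeric column with its mean.
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].mean())

# Assumption: fill the flag column with its most frequent value (the mode).
df["Credit_History"] = df["Credit_History"].fillna(df["Credit_History"].mode()[0])

print(df)
```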
  35. 35. Data Wrangling using Pandas You can access the data types of each column in a DataFrame:
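For example, the dtypes attribute lists one data type per column. The DataFrame here is a made-up illustration.

```python
import pandas as pd

# Illustrative data with a text, an integer and a float column.
df = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002"],
    "ApplicantIncome": [5849, 4583],
    "LoanAmount": [130.5, 128.0],
})

# dtypes is a Series mapping each column name to its data type.
print(df.dtypes)
```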
  36. 36. Data Wrangling using Pandas You can perform basic math operations to know more about your data:
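A short sketch of such operations on a single column (the values are invented):

```python
import pandas as pd

# Illustrative loan amounts.
loan_amount = pd.Series([100, 120, 140, 200], name="LoanAmount")

# Basic summary arithmetic on the column.
mean_amount = loan_amount.mean()
median_amount = loan_amount.median()
max_amount = loan_amount.max()

print(mean_amount, median_amount, max_amount)
```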
  37. 37. Data Wrangling using Pandas You can combine your DataFrames: DataFrame objects can be combined using simple concatenation, provided they have the same columns. The slide's example uses numpy to create an array of a specified shape and fill it with random values
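A sketch of simple concatenation, assuming two DataFrames with the same columns that are filled with random values from numpy. The column names "a" and "b" are placeholders.

```python
import numpy as np
import pandas as pd

# np.random.rand creates an array of the given shape filled with random values.
df1 = pd.DataFrame(np.random.rand(3, 2), columns=["a", "b"])
df2 = pd.DataFrame(np.random.rand(2, 2), columns=["a", "b"])

# Stack the rows of both frames; ignore_index renumbers the rows 0..n-1.
combined = pd.concat([df1, df2], ignore_index=True)

print(combined)
```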
  38. 38. Data Wrangling using Pandas
  39. 39. Data Wrangling using Pandas You can also combine DataFrames that do not have an identical structure:
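When the columns differ, pd.concat by default keeps the union of the columns and fills the gaps with NaN; join="inner" keeps only the shared columns. A sketch with placeholder column names:

```python
import pandas as pd

# Two frames that share column "a" but differ in the other column.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "c": [7, 8]})

# Default (outer) behavior: union of columns, missing cells become NaN.
combined = pd.concat([df1, df2], ignore_index=True)

# Inner behavior: keep only the columns present in both frames.
common = pd.concat([df1, df2], join="inner", ignore_index=True)

print(combined)
print(common)
```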
  40. 40. Data Wrangling using Pandas You can create a merged dataframe using the merge() function based on the key:
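A minimal sketch of a key-based merge. The key column "Loan_ID" and the data are assumptions for illustration; by default merge() keeps only the keys found in both frames.

```python
import pandas as pd

# Illustrative frames sharing the key column "Loan_ID".
amounts = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003"],
    "LoanAmount": [130, 128, 66],
})
statuses = pd.DataFrame({
    "Loan_ID": ["LP001", "LP003", "LP004"],
    "Loan_Status": ["Y", "Y", "N"],
})

# Inner join on the key: only LP001 and LP003 appear in both frames.
merged = pd.merge(amounts, statuses, on="Loan_ID")

print(merged)
```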
  41. 41. Model Building using Scikit-learn Now that we have done data wrangling, let’s build a predictive model
  42. 42. Model Building using Scikit-learn We will use the Scikit-learn module, as it provides a range of supervised and unsupervised learning algorithms
  43. 43. Model Building using Scikit-learn Importing the required scikit-learn module:
  44. 44. Model Building using Scikit-learn Extracting the variables and then splitting the data into train and test:
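A sketch of this step on randomly generated data. The feature columns, the target column "Loan_Status", and the 75/25 split ratio are all assumptions; the transcript does not state which ones the slides use.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Randomly generated stand-in data with 20 rows.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "ApplicantIncome": rng.integers(1000, 10000, size=20),
    "LoanAmount": rng.integers(50, 300, size=20),
    "Credit_History": rng.integers(0, 2, size=20),
    "Loan_Status": rng.integers(0, 2, size=20),
})

# Independent variables (features) and dependent variable (target).
X = df[["ApplicantIncome", "LoanAmount", "Credit_History"]]
y = df["Loan_Status"]

# Hold out 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)
```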
  45. 45. Model Building using Scikit-learn In this case, we will use a Logistic Regression model. Logistic Regression is appropriate when the dependent variable is binary
  46. 46. Model Building using Scikit-learn Fitting the Logistic Regression model to the training data:
  47. 47. Model Building using Scikit-learn Predicting the test results:
  48. 48. Model Building using Scikit-learn To describe the performance of the model let’s build the confusion matrix on test data:
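The fit, predict and confusion-matrix steps can be sketched end to end as follows. The data is synthetic and the model uses scikit-learn's default settings, so the numbers will not match those in the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic two-feature data with a noisy binary target (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the model on the training data.
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict the class labels of the held-out test data.
y_pred = model.predict(X_test)

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_test, y_pred)
print(cm)
```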
  49. 49. Model Building using Scikit-learn Let’s calculate ACCURACY and PRECISION from the confusion matrix: [Figure: confusion matrix with cells labeled True Positive, False Positive, False Negative and True Negative]
  50. 50. Model Building using Scikit-learn Let’s calculate ACCURACY and PRECISION from the confusion matrix: • Accuracy: Overall, how often is the classifier correct? (TP+TN)/total = (103+18)/150 = 121/150 ≈ 0.81 • Precision: When it predicts yes, how often is it correct? TP/predicted yes = 103/130 ≈ 0.79
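The arithmetic on this slide can be reproduced directly from the counts it quotes (TP = 103, TN = 18, 150 test rows, 130 predicted yes). The false positive and false negative counts are not stated on the slide; they are derived here from those four numbers.

```python
# Counts quoted on the slide.
tp = 103             # true positives
tn = 18              # true negatives
total = 150          # all test rows
predicted_yes = 130  # rows the model predicted as "yes"

# Derived counts (not stated on the slide).
fp = predicted_yes - tp      # predicted yes, actually no
fn = total - tp - tn - fp    # predicted no, actually yes

# Accuracy: share of all predictions that are correct.
accuracy = (tp + tn) / total

# Precision: share of "yes" predictions that are correct.
precision = tp / (tp + fp)

print(fp, fn)
print(round(accuracy, 4), round(precision, 4))
```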
  51. 51. Model Building using Scikit-learn We can also find the accuracy through Python module:
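The transcript does not show which function the slides call. One option is scikit-learn's accuracy_score, with precision_score as its counterpart for precision. The label lists below are made up for illustration.

```python
from sklearn.metrics import accuracy_score, precision_score

# Illustrative actual and predicted labels (1 = approved, 0 = not approved).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

# Fraction of predictions that match the actual labels.
accuracy = accuracy_score(y_true, y_pred)

# Fraction of predicted positives that are actually positive.
precision = precision_score(y_true, y_pred)

print(accuracy, precision)
```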
  52. 52. Model Building using Scikit-learn So, we have built a model with about 80% accuracy
  53. 53. Summary • Data Science and its popularity with Python • Data analysis libraries in Python • Series and DataFrame in Pandas • Exploratory analysis • Data wrangling • Logistic Regression using Scikit-learn
