Data analysis with pandas and scikit-learn
- Data Preparation
- Data Modeling & Prediction
- Data Visualisation
- Grouping of Data
Data analysis provides:
We have analysed a large body of transactional data provided by a company, helping
to improve revenue and to increase customer acquisition, retention, and satisfaction.
Why we care about it
Healthcare analytics examines patterns in healthcare data in order to decide how
clinical care can be enhanced while limiting excessive costs. Predictive analysis is a key driver for
improving patient care, reducing costs and bringing greater efficiencies to the healthcare industry.
We look forward to applying the following methods to group, sort and analyse data and to build
predictive models.
Pandas
Pandas - Python library providing data analysis features similar to those found in:
- R
- Matlab
- SAS
Key features provided by Pandas:
- reading, writing and analysing large data sets
- time series-specific functionality
- easy handling of missing data in floating point as well as non-floating point data
- automatic and explicit data alignment
- powerful, flexible group by functionality to perform split-apply-combine operations on data sets
- intuitive merging and joining of large data sets
- hierarchical labeling of axes
- fast computation
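As a minimal sketch of several of the features listed above (missing-data handling, split-apply-combine grouping, merging, and time series resampling), the following uses small made-up tables. The table contents and the column names (customer_id, amount, date, segment) are assumptions for illustration only and do not come from the analysed data.

```python
import numpy as np
import pandas as pd

# Hypothetical transactional data; values and column names are illustrative.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "amount": [10.0, np.nan, 5.0, 7.5, 20.0],
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-03", "2024-01-02", "2024-01-05", "2024-01-04",
    ]),
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["retail", "retail", "corporate"],
})

# Missing data: replace the missing amount with 0.0 (one of several options).
clean = transactions.fillna({"amount": 0.0})

# Split-apply-combine: total and number of transactions per customer.
per_customer = clean.groupby("customer_id")["amount"].agg(["sum", "count"])

# Merging: attach the customer segment, then aggregate per segment.
merged = clean.merge(customers, on="customer_id", how="left")
per_segment = merged.groupby("segment")["amount"].sum()

# Time series: total amount per calendar day.
daily = clean.set_index("date")["amount"].resample("D").sum()

print(per_customer)
print(per_segment)
print(daily)
```

Filling missing values with zero is only one possible choice; dropping the rows with dropna() or interpolating may suit other data better.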
Scikit-learn
Open source machine learning library for the Python programming language
Key features:
* supervised learning, in which the data comes with additional attributes that we want to predict:
- classification (identifying which category an object belongs to)
- regression (predicting a continuous-valued attribute associated with an object)
* unsupervised learning, in which the training data consists of a set of input vectors
x without any corresponding target values. The goal in such problems may be to discover
groups of similar examples within the data:
- clustering (automatic grouping of similar objects into sets)
* preprocessing (transforming input data such as text for use with machine learning
algorithms)
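The difference between the two learning modes can be sketched on synthetic data. The example below is an illustration under assumptions, not the models built for this project: the data comes from make_blobs, and the choice of LogisticRegression, StandardScaler and KMeans is ours.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 2-D points in three groups; y holds the true group of each point.
X, y = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Supervised learning: the labels y are used to train a classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
# Preprocessing (scaling) and the classifier are chained in one pipeline.
classifier = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)

# Unsupervised learning: only X is used; the labels are never shown to KMeans.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

print(f"classification accuracy on held-out data: {accuracy:.2f}")
print(f"number of clusters found: {len(set(cluster_labels))}")
```

The cluster numbers returned by KMeans are arbitrary identifiers, so they need not match the numbering used in y even when the grouping itself is the same.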
Data visualisation
Seaborn - Python visualisation library that provides a high-level interface for
drawing attractive statistical graphics
Key features:
- high-level abstractions for structuring grids of plots that let you easily build
complex visualizations
- a function to plot statistical time series data
- functions that visualize matrices of data
- tools that fit and visualize linear regression models
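Two of these features, fitting and drawing a linear regression and visualising a matrix of data, can be sketched as follows. The data is randomly generated and the column names x and y are assumptions for illustration; the non-interactive Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no display required

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: y depends roughly linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
df = pd.DataFrame({"x": x, "y": 2.0 * x + rng.normal(0, 1, size=100)})

fig, (ax_reg, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot with a fitted linear regression line.
sns.regplot(data=df, x="x", y="y", ax=ax_reg)

# Heatmap of the correlation matrix between the columns.
corr = df.corr()
sns.heatmap(corr, annot=True, ax=ax_heat)

fig.tight_layout()
```

Calling fig.savefig with a file name writes the figure to disk; in an interactive session, plt.show() displays it instead.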
