Definition and basic features of data analysis with python, pandas and scikit-learn. Brief explanation about most powerful features. Introduction part.
My INSURER PTE LTD - Insurtech Innovation Award 2024
Data analysis with pandas and scikit-learn
1. Data analysis with pandas
and scikit-learn
- Data Preparation
- Data Modeling & Prediction
- Data Visualisation
- Grouping of Data
Data analysis provides:
We have worked on analysis of big scope of transactional data provides by company, helping
to improve revenue values, increase customer acquisition, retention, and satisfaction.
Why do we care about it
Health care analytics allows the examination of patterns in healthcare data in order to decide how
clinical care can be enhanced while limiting excessive costs. Predictive analysis is a key driver for
improving patient care, reducing costs and bringing greater efficiencies to the healthcare industry.
We are looking forward to apply the following methods to group, sort, analyse data and build
predictive models.
2. Pandas
Pandas - python library providing data analysis features, similar to:
- R
- Matlab
- SAS
Key features provided by Pandas:
- reading, writing and analysing big data
- time series-specific functionality
- easy handling of missing data in floating point as well as non-floating point data
- automatic and explicit data alignment
- powerful, flexible group by functionality to perform split-apply-combine operations on data sets
- intuitive merging and joining large data sets
- hierarchical labeling of axes
- fast computation
3. Scikit-learn
Open source machine learning library for the Python programming language
Key features:
* supervised learning, in which the data comes with additional attributes that we want to predict
(Click here to go to the scikit-learn supervised learning page) :
- classification (Identifying to which category an object belongs to.)
- regression (Predictions)
- clustering (Automatic grouping of similar objects into sets)
- preprossessing (Transforming input data such as text for use with machine learning
algorithms.)
* unsupervised learning, in which the training data consists of a set of input vectors
x without any corresponding target values. The goal in such problems may be to discover
groups of similar examples within the data
4. Data visualization
Seaborn - python visualization library, provides a high-level interface for
drawing attractive statistical graphics
Key features:
- high-level abstractions for structuring grids of plots that let you easily build
complex visualizations
- a function to plot statistical timeseries data
- functions that visualize matrices of data
- tools that fit and visualize linear regression models