DATA ANALYSIS PACKAGES
DATA ANALYSIS USING PYTHON
Pandas is a software library for the Python programming language, used for data manipulation and
analysis.
Pandas is well suited for many different kinds of data:
● Tabular data with heterogeneously-typed columns.
● Ordered and unordered time series data.
● Arbitrary matrix data with row and column labels.
● Any other form of observational/statistical data sets.
The data actually need not be labeled at all to be
placed into a pandas data structure.
PANDAS OPERATIONS
● Slicing the dataframe
● Joining and merging
● Concatenation
● Changing the index
● Data conversion
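As a rough illustration of the operations listed above, the following sketch runs each one on two small frames. The frames, column names and values are hypothetical and chosen only for the example.

```python
import pandas as pd

# Hypothetical example data; names and values are illustrative only.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "y": [10.0, 20.0, 40.0]})

# Slicing: the first two rows, then a single column.
first_two = left.iloc[:2]
x_column = left["x"]

# Joining and merging: inner join on the shared "key" column.
merged = pd.merge(left, right, on="key", how="inner")

# Concatenation: stack two frames with the same columns vertically.
stacked = pd.concat([left, left], ignore_index=True)

# Changing the index: use the "key" column as the row labels.
indexed = left.set_index("key")

# Data conversion: cast the integer column to float.
converted = left.astype({"x": "float64"})
```

An inner join keeps only the keys present in both frames, so the merged result here holds the rows for "a" and "b".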
PANDAS DATA STRUCTURES
Pandas introduces two new data structures to Python:
● Series
● DataFrame
PANDAS DATA STRUCTURES: SERIES
● One-dimensional array-like object containing data and labels (or index)
● Lots of ways to build a Series
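A few of the common ways to build a Series can be sketched as follows. The variable names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a list: pandas supplies a default integer index 0..n-1.
from_list = pd.Series([4, 7, -5, 3])

# From a NumPy array.
from_array = pd.Series(np.array([1.5, 2.5]))

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2})

# From a scalar: the value is repeated for every label in the index.
from_scalar = pd.Series(0.0, index=["x", "y"])
```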
SERIES - WORKING WITH THE INDEX
● A Series index can be specified
● Single values can be selected by index
● Multiple values can be selected with multiple indexes
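The three points above might look like this in practice; the labels and values are arbitrary examples.

```python
import pandas as pd

# Specify the index explicitly when building the Series.
s = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# Select a single value by its index label.
single = s["a"]

# Select several values by passing a list of labels;
# the result is a new Series in the requested order.
several = s[["c", "a", "d"]]
```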
SERIES - WORKING WITH THE INDEX
● Think of a Series as a fixed-length, ordered dict
● However, unlike a dict, index items don't have to be unique
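A small sketch of the dict analogy and of where it breaks down, using made-up labels: with a repeated label, lookup returns every matching entry rather than a single value.

```python
import pandas as pd

# The label "a" appears twice, which a dict would not allow.
s = pd.Series([1, 2, 3], index=["a", "a", "b"])

# Dict-like membership test on the index labels.
has_b = "b" in s

# A unique label returns a scalar; a repeated label returns a Series.
unique_value = s["b"]
repeated_values = s["a"]

# The index itself reports whether its labels are unique.
labels_unique = s.index.is_unique
```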
SERIES - OPERATIONS
● Filtering
● NumPy-type operations on data
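Filtering and NumPy-style operations can be sketched as below, assuming an example Series with arbitrary values. In each case the index labels travel with the data.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# Filtering: a boolean condition selects the matching entries.
positive = s[s > 0]

# Element-wise arithmetic, as with a NumPy array.
doubled = s * 2

# NumPy functions accept a Series and return a Series.
roots = np.sqrt(positive)
```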
SERIES - INCOMPLETE DATA
Pandas can accommodate incomplete
data
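As an illustration, a label with no matching data is stored as NaN, and pandas offers helpers to detect, fill or drop such entries. The population figures below are invented for the example.

```python
import pandas as pd

# Hypothetical data: there is no entry for "California".
populations = {"Ohio": 35000, "Texas": 71000}
s = pd.Series(populations, index=["California", "Ohio", "Texas"])

# Detect missing entries (True where the value is NaN).
missing = s.isnull()

# Replace missing entries with a default value.
filled = s.fillna(0)

# Or drop them altogether.
dropped = s.dropna()
```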
SERIES- AUTOMATIC ALIGNMENT
Unlike a NumPy ndarray, a Series is automatically
aligned by index label in arithmetic operations.
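A minimal sketch of alignment, with arbitrary labels: values are matched by label rather than by position, and labels present in only one operand produce NaN.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "w"])

# "y" and "z" exist in both Series and are added label by label;
# "x" and "w" exist in only one, so their result is NaN.
total = a + b
```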
DATA STRUCTURES: DATAFRAME
● Spreadsheet-like data structure containing an ordered collection of columns
● Has both a row and column index
● Can be thought of as a dict of Series (sharing one index)
DATAFRAME
Creation with dictionary of equal-length lists
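For example, each dict key becomes a column and each list supplies that column's values. The data here is illustrative only.

```python
import pandas as pd

# Every list has the same length, one entry per row.
data = {
    "state": ["Ohio", "Ohio", "Nevada"],
    "year": [2000, 2001, 2001],
    "pop": [1.5, 1.7, 2.4],
}
frame = pd.DataFrame(data)

# The column order can be set explicitly with the columns argument.
frame_ordered = pd.DataFrame(data, columns=["year", "state", "pop"])
```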
DATAFRAME
Creation with a dictionary of dictionaries
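With nested dicts, the outer keys become columns and the inner keys become row labels. A sketch with invented figures follows; an inner key missing from one column yields NaN in that cell.

```python
import pandas as pd

nested = {
    "Nevada": {2001: 2.4, 2002: 2.9},
    "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6},
}

# Outer keys -> columns, inner keys -> row index.
frame = pd.DataFrame(nested)

# "Nevada" has no entry for 2000, so that cell is NaN.
nevada_2000_missing = pd.isnull(frame.loc[2000, "Nevada"])
```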
DATAFRAME
● Columns can be retrieved as Series:
dict notation
attribute notation
● Rows can be retrieved by position (iloc attribute) or by
name (loc attribute); the older ix attribute was removed in pandas 1.0
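A sketch of column and row retrieval on an example frame with made-up data. It uses loc and iloc, the current accessors for lookup by label and by position; ix is no longer available in recent pandas releases.

```python
import pandas as pd

frame = pd.DataFrame(
    {"state": ["Ohio", "Ohio", "Nevada"], "year": [2000, 2001, 2001]},
    index=["one", "two", "three"],
)

# Columns as Series: dict notation and attribute notation.
by_key = frame["state"]
by_attribute = frame.state

# Rows as Series: by position with iloc, by name with loc.
row_by_position = frame.iloc[0]
row_by_name = frame.loc["two"]
```

Attribute notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so dict notation is the safer default.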
DATAFRAME
New Columns can be added (by
computation or direct assignment)
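Both ways of adding a column can be sketched as follows; the column names and values are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame({"state": ["Ohio", "Nevada"], "pop": [1.5, 2.4]})

# Direct assignment: a scalar is broadcast to every row.
frame["debt"] = 0.0

# Computation: derive new columns from existing ones.
frame["is_ohio"] = frame["state"] == "Ohio"
frame["pop_doubled"] = frame["pop"] * 2
```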
DATAFRAME - REINDEXING
Creation of new object with the data
conformed to a new index
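A short sketch of reindexing with arbitrary labels: reindex returns a new object, rearranges the data to match the new index, and introduces NaN (or a chosen fill value) for labels that were not present.

```python
import pandas as pd

s = pd.Series([4.5, 7.2, -5.3], index=["d", "b", "a"])

# New object conformed to the new index; "c" had no value, so it is NaN.
reindexed = s.reindex(["a", "b", "c", "d"])

# A fill value can be supplied for the new labels instead of NaN.
filled = s.reindex(["a", "b", "c", "d"], fill_value=0.0)
```

The original Series `s` is left unchanged.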
INSTALLATION OF PANDAS
To install the pandas module, run in a terminal: sudo apt-get install python-pandas (python3-pandas on current Debian/Ubuntu releases)
PANDAS
Pandas is an open source Python library for data analysis.
It provides high-level data structures and tools that help data scientists and data analysts
manipulate data simply and easily.
NUMPY
NumPy is the standard package for scientific and numerical computing in Python:
● Powerful N-dimensional array object.
● Sophisticated (broadcasting) functions.
● Tools for integrating C/C++ and Fortran code.
● Useful linear algebra, Fourier transform and random number capabilities.
● NumPy provides the underlying data structure (the ndarray).
● NumPy has to be installed first; packages that build on it, such as SciPy, can then be installed.
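The capabilities listed above can be sketched briefly as follows. The arrays and the random seed are arbitrary choices for the example.

```python
import numpy as np

# N-dimensional array object: a 2 x 3 array holding 0..5.
a = np.arange(6).reshape(2, 3)

# Broadcasting: the 1-D row is added to every row of the 2-D array.
row = np.array([10, 20, 30])
broadcast_sum = a + row

# Linear algebra: invert a small diagonal matrix.
m = np.array([[2.0, 0.0], [0.0, 4.0]])
inverse = np.linalg.inv(m)

# Fourier transform of a unit impulse (every coefficient equals 1).
spectrum = np.fft.fft(np.array([1.0, 0.0, 0.0, 0.0]))

# Random numbers from a seeded generator, uniform on [0, 1).
rng = np.random.default_rng(seed=0)
samples = rng.random(3)
```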
INSTALLATION OF NUMPY
To install the NumPy module, run in a terminal: sudo apt-get install python-numpy (python3-numpy on current Debian/Ubuntu releases)
SCIPY
SciPy (Scientific Python) is a collection of mathematical functions and algorithms
built on top of NumPy, Python's numerical extension.
● SciPy depends on NumPy's data structure.
● SciPy is useful for data processing and prototyping of systems.
● SciPy provides various high-level commands and classes for manipulating and visualizing
data.
● SciPy provides the application layer.
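As a small illustration of SciPy's high-level routines, the sketch below integrates and minimizes two simple functions. The functions are arbitrary examples chosen because their exact answers are known.

```python
from scipy import integrate, optimize

# Numerical integration of x^2 over [0, 1]; the exact value is 1/3.
# quad returns the integral and an estimate of the absolute error.
area, error_estimate = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

# Minimize (x - 2)^2 over the real line; the minimum is at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)
minimizer = result.x
```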
INSTALLATION OF SCIPY
To install the SciPy module, run in a terminal: sudo apt-get install python-scipy (python3-scipy on current Debian/Ubuntu releases)
MATPLOTLIB
● Matplotlib is a Python module for visualization.
● Matplotlib allows you to easily make line graphs, pie charts, histograms and other
professional-grade figures.
● Using Matplotlib you can customize every aspect of a figure.
● When used within IPython, Matplotlib has interactive features like zooming and panning.
● It supports different GUI back ends on all operating systems, and can also export graphics to
common vector and raster formats: PDF, SVG, JPG, PNG, BMP, GIF, etc.
● Matplotlib tries to make easy things easy and hard things possible.
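A minimal sketch of a line graph and a histogram, with a few customizations and PNG export. The data is invented, and the non-interactive "Agg" back end is an assumption made so that the example runs without a display; in an interactive session it can be omitted and `plt.show()` used instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no GUI needed
import matplotlib.pyplot as plt

# One figure with two panels side by side.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Line graph with customized color, marker, labels and legend.
ax_line.plot([0, 1, 2, 3], [0, 1, 4, 9], color="tab:blue", marker="o", label="y = x^2")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")
ax_line.set_title("Line graph")
ax_line.legend()

# Histogram of a small sample.
ax_hist.hist([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5)
ax_hist.set_title("Histogram")

# Export to PNG; here into an in-memory buffer instead of a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Passing a file name such as "figure.pdf" or "figure.svg" to `savefig` selects the output format from the extension.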
INSTALLATION OF MATPLOTLIB
To install the Matplotlib module, run in a terminal: sudo apt-get install python-matplotlib (python3-matplotlib on current Debian/Ubuntu releases)
SCI-KIT LEARN
● SciKits are add-on modules for SciPy that are too specialized to be included in
SciPy itself; scikit-learn is the SciKit for machine learning.
● Scikit-learn is built upon SciPy, so NumPy and SciPy must be installed to use it; Pandas and
IPython are commonly installed alongside it but are not strictly required.
● Scikit-learn helps to quickly implement popular algorithms on your dataset.
● Scikit-learn includes tools for many standard machine-learning tasks.
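To give a sense of how quickly an algorithm can be applied to a dataset, here is a minimal sketch that fits a linear regression with scikit-learn. The tiny dataset is made up and follows y = 2x + 1 exactly, so the fitted parameters are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: one feature per row, targets y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Estimators share the same interface: create, fit, predict.
model = LinearRegression()
model.fit(X, y)

# Predict the target for a new input, x = 4.
prediction = model.predict(np.array([[4.0]]))
```

Other scikit-learn estimators follow the same fit and predict pattern, so swapping in a different algorithm usually changes only the line that creates the model.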
INSTALLATION OF SCI-KIT LEARN
If you already have a working installation of numpy and scipy, the easiest way to install scikit-learn is using pip:
pip install -U scikit-learn