Comparing EDA with classical and Bayesian analysis
There are several approaches to data analysis.
Classical data analysis
The Bayesian approach incorporates prior probability distribution
knowledge into the analysis steps:
Probability distribution:
The probability distribution is one of the most important concepts in statistics.
• It has wide applications in business, engineering, medicine, and other major sectors.
• It is mainly used to make predictions about a random experiment based on a sample.
For example, in business it is used to predict whether a new strategy will bring the company a profit or a loss, and in the medical field it is used in hypothesis testing.
• Data analysts and data scientists freely mix the steps of the preceding approaches to get meaningful insights from the data.
• In addition, it is difficult to judge or estimate which model is best for data analysis.
• Each approach has its own paradigm and suits different types of data analysis.
Software tools available for EDA
• Several software tools are available to facilitate EDA. Some of the open-source ones are outlined here:
1. Python: An open-source programming language widely used in data analysis, data mining, and data science.
2. R programming language: An open-source programming language widely used in statistical computation and graphical data analysis.
3. Weka: An open-source data mining package that includes several EDA tools and algorithms.
4. KNIME: An open-source tool for data analysis, based on Eclipse.
Getting started with EDA
NumPy - Basics
• NumPy stands for Numerical Python.
• Travis Oliphant created the NumPy package in 2005.
What is NumPy?
• NumPy is a Python module for working with multidimensional arrays and matrices.
• It is well suited to scientific and mathematical calculations because it is fast and efficient.
Why NumPy?
• NumPy provides a convenient and efficient way to handle large amounts of data.
• NumPy is also very convenient for matrix multiplication and data reshaping.
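To make the matrix multiplication point concrete, here is a minimal sketch. The two matrices A and B are made-up illustration values, not data from the slides. It contrasts the @ operator (matrix product) with * (element-wise product).

```python
import numpy as np

# Two small matrices; the values are arbitrary illustration data.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# The @ operator computes the matrix product (rows of A times columns of B).
product = A @ B

# The * operator multiplies matching elements instead.
elementwise = A * B

print("Matrix product:\n", product)
print("Element-wise product:\n", elementwise)
```

The two results differ, which is a common source of confusion when moving from plain Python lists to arrays.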
Create Array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
Output:
[1 2 3 4 5]
<class 'numpy.ndarray'>
Example
import numpy as np
# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
NumPy Array Creation
import numpy as np
# Creating array from list with type float
a = np.array([[1, 2, 4], [5, 8, 7]], dtype='float')
print("Array created using passed list:\n", a)
# Creating array from tuple
b = np.array((1, 3, 2))
print("\nArray created using passed tuple:\n", b)
NumPy Array Creation
# Creating a 3x4 array with all zeros
c = np.zeros((3, 4))
print("An array initialized with all zeros:\n", c)
# Creating a 2x2 array with random values in [0, 1)
e = np.random.random((2, 2))
print("A random array:\n", e)
arange Function in NumPy
# Create a sequence of integers from 0 up to
# (but not including) 30, in steps of 5
f = np.arange(0, 30, 5)
print("A sequential array with steps of 5:\n", f)
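One detail of arange() is easy to miss: the stop value itself is left out. The sketch below shows this, along with two ways to include the endpoint. The names g and h are introduced here only for illustration.

```python
import numpy as np

# The stop value (30) is excluded, so the last element is 25.
f = np.arange(0, 30, 5)

# Moving the stop just past 30 brings 30 into the sequence.
g = np.arange(0, 31, 5)

# linspace() takes a count instead of a step and includes both ends.
h = np.linspace(0, 30, 7)

print(f)
print(g)
print(h)
```

Note that linspace() returns floats, while arange() with integer arguments returns integers.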
Reshape From 1-D to 2-D
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)
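Reshaping only works when the new shape holds exactly the same number of elements. A short sketch of two related behaviours follows: passing -1 lets NumPy work out one dimension, and an incompatible shape raises a ValueError. The names auto and error_message are illustrative only.

```python
import numpy as np

arr = np.arange(1, 13)  # the 12 values 1..12

# -1 asks NumPy to infer that dimension: 12 elements / 3 columns = 4 rows.
auto = arr.reshape(-1, 3)
print(auto.shape)

# A 5x3 shape needs 15 elements, so NumPy refuses to reshape 12 values.
error_message = None
try:
    arr.reshape(5, 3)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```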
Searching Arrays
• Use the where() method to search an array.
• Find the indexes where the value is 4:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)
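The value printed by where() is a tuple holding one index array per dimension, which can look odd at first. A small sketch of how to get at the indexes directly, and how to search with a different condition (the even-number search is an added illustration):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

# where() returns a tuple; for a 1-D array the indexes are its first item.
x = np.where(arr == 4)
indexes = x[0]
print(indexes)

# Any boolean condition works, for example the positions of even values.
even_positions = np.where(arr % 2 == 0)[0]
print(even_positions)
```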
Data Distribution
• A data distribution is a list of all possible values and how often each value occurs.
• Such lists are important when working with statistics and data science.
• NumPy's random module offers methods that return randomly generated data distributions.
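As a rough illustration of "all possible values and how often each occurs", the sketch below tabulates a small made-up sample with np.unique(). The sample values are assumptions chosen for the example, not data from the slides.

```python
import numpy as np

# A small made-up sample.
values = np.array([3, 5, 5, 7, 7, 7, 7, 9])

# return_counts=True gives each distinct value and its number of occurrences.
unique_values, counts = np.unique(values, return_counts=True)

# Dividing by the total turns counts into relative frequencies.
frequencies = counts / counts.sum()

for value, count, freq in zip(unique_values, counts, frequencies):
    print(value, count, freq)
```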
Random Distribution
• A random distribution is a set of random numbers.
• It can be generated with the choice() method of NumPy's random module.
• The choice() method lets us specify the probability of each value.
• Each probability is a number between 0 and 1, where
• 0 means that the value will never occur and
• 1 means that the value will always occur.
• The probabilities of all values must sum to 1.
Example:
Generate a 1-D array containing 100 values, where each value has to be 3, 5, 7 or 9.
• The probability for the value to be 3 is set to 0.1
• The probability for the value to be 5 is set to 0.3
• The probability for the value to be 7 is set to 0.6
• The probability for the value to be 9 is set to 0
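The example above could be written roughly as follows, using np.random.choice(). The seed call is an addition made here so that repeated runs give the same array; it is not part of the exercise.

```python
import numpy as np

# Seeding is an assumption added for reproducibility.
np.random.seed(0)

# p lists one probability per candidate value, in the same order.
x = np.random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=100)

print(x)
# Because its probability is 0, the value 9 never appears.
print(np.unique(x))
```

Passing a tuple such as size=(3, 5) instead of 100 would return a 2-D array drawn from the same distribution.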
Pandas
What is Pandas?
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and manipulating data.
• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.
What is a DataFrame?
• A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet.
• DataFrames are widely used in modern data analytics because they are a flexible and intuitive way of storing and working with data.
Why Use Pandas?
• Pandas allows us to analyze big data and draw conclusions based on statistical theories.
• Pandas can clean messy data sets and make them readable and relevant.
• Relevant data is very important in data science.
What Can Pandas Do?
Pandas gives you answers about the data:
• Is there a correlation between two or more columns?
• What is the average value?
• Max value?
• Min value?
Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
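A minimal sketch of these questions in code, assuming a tiny made-up table. The column names duration and calories and all values are hypothetical. Rows with empty values are dropped first, then the summary questions are answered on the cleaned data.

```python
import numpy as np
import pandas as pd

# Hypothetical data; np.nan marks the empty cells.
df = pd.DataFrame({
    "duration": [60, 45, 30, np.nan, 60],
    "calories": [400.0, 300.0, 200.0, 250.0, np.nan],
})

# Cleaning: drop every row that contains an empty value.
cleaned = df.dropna()

mean_duration = cleaned["duration"].mean()   # average value
max_calories = cleaned["calories"].max()     # max value
min_calories = cleaned["calories"].min()     # min value

# Correlation between the two columns (1.0 means perfectly linear).
correlation = cleaned["duration"].corr(cleaned["calories"])

print(cleaned)
print(mean_duration, max_calories, min_calories, correlation)
```

Dropping rows is only one cleaning strategy; filling empty cells with fillna() is another option when rows are too valuable to discard.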
Create Labels
Example: Create your own labels
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
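Once a Series has labels, values can be looked up by label rather than by position. A short sketch that continues the same data (the names y_value and subset are illustrative):

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a, index=["x", "y", "z"])

# Look up a single value by its label.
y_value = myvar["y"]
print(y_value)

# A list of labels selects several values at once.
subset = myvar[["x", "z"]]
print(subset)
```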
Pandas DataFrames
What is a DataFrame?
A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array or a table with rows and columns.
Pandas Read CSV and JSON
Read CSV Files
• A simple way to store big data sets is to use CSV (comma-separated values) files.
• CSV files contain plain text in a well-known format that can be read by almost any tool, including Pandas.
• The examples here use a CSV file called 'data.csv'.
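To show what reading CSV looks like without depending on a file on disk, the sketch below feeds read_csv() a small in-memory text. The column names and values are hypothetical stand-ins and are not the real contents of 'data.csv'.

```python
import io
import pandas as pd

# Hypothetical CSV text: a header line followed by comma-separated rows.
csv_text = """Duration,Pulse,Calories
60,110,409.1
60,117,479.0
45,109,282.4
"""

# read_csv() accepts a path or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # column types are inferred from the text
```

With a real file, the call would simply take the path, for example pd.read_csv('data.csv').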
Example: Create a DataFrame from a dictionary
import pandas as pd
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
[Slide images not reproduced: the data.csv file and the resulting output]
Pandas: Read JSON
• Big data sets are often stored or extracted as JSON.
• JSON is plain text in the format of an object, and it is well known in the world of programming, including Pandas.
The examples here use a JSON file called 'data.json'.
Load the JSON file into a DataFrame:
import pandas as pd
df = pd.read_json('data.json')
print(df.to_string())
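For a self-contained illustration of the JSON layout that read_json() expects by default (one object per column, keyed by row label), here is a sketch using an in-memory string. The keys and values are hypothetical and are not the real contents of 'data.json'.

```python
import io
import pandas as pd

# Hypothetical JSON: each column maps row labels to values.
json_text = '{"Duration": {"0": 60, "1": 45}, "Pulse": {"0": 110, "1": 109}}'

# Wrapping the text in StringIO gives read_json() a file-like object.
df = pd.read_json(io.StringIO(json_text))

# to_string() prints the whole DataFrame without truncating rows.
print(df.to_string())
```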
Pandas - Analyzing DataFrames
Viewing the data
• One of the most used methods for getting a quick overview of a DataFrame is the head() method.
• head() returns the first rows of the DataFrame (5 by default); pass a number to choose how many.
• Get a quick overview by printing the first 10 rows of the DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head(10))
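A few companion calls are often used alongside head(). The sketch below builds a made-up 20-row DataFrame (the columns row_id and value are hypothetical) so that it runs without 'data.csv'.

```python
import pandas as pd

# A made-up 20-row table so the example needs no external file.
df = pd.DataFrame({
    "row_id": range(1, 21),
    "value": [i * i for i in range(1, 21)],
})

first_five = df.head()     # no argument: the first 5 rows
first_ten = df.head(10)    # the first 10 rows
last_three = df.tail(3)    # tail() counts from the end instead

print(first_ten)
print(last_three)

# info() prints column names, non-null counts, and data types.
df.info()
```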
https://www.w3schools.com/python/pandas/default.asp
