Exploratory Data Analysis
By: Vishwas Narayan
Principles of Analytic Graphics
Principle 1: Show comparisons
Principle 2: Show causality, mechanism, explanation, systematic structure
Principle 3: Show multivariate data
Principle 4: Integrate the evidence
Principle 5: Describe and document the evidence with appropriate labels, scales, sources, etc.
Principle 6: Content is king
Graphs
● To understand the properties of the data
● To find patterns
● To suggest modeling strategy
● To “debug” analysis
Characteristics of exploratory graphs
● They are made quickly
● A large number can be made
● The goal is personal understanding
● Axes/legends are generally cleaned up (later)
● Color and size are used primarily to convey information
One Dimensional Summaries
Mean, median, mode, max, min
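As a minimal sketch of these one-dimensional summaries, the snippet below computes each of them with Python's standard library. The sample values are made up for illustration and are not taken from any dataset in these slides.

```python
import statistics

# Hypothetical sample values, invented purely for illustration.
values = [2, 4, 4, 5, 7, 9, 11]

# Compute the one-dimensional summaries listed on the slide.
summary = {
    "mean": statistics.mean(values),      # arithmetic average
    "median": statistics.median(values),  # middle value of the sorted data
    "mode": statistics.mode(values),      # most frequent value
    "max": max(values),                   # largest value
    "min": min(values),                   # smallest value
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Comparing the mean with the median is a quick first check for skew: when the two differ noticeably, a few extreme values are probably pulling the mean.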
Two Dimensional Plots
Multiple or overlaid plots
Scatter plots
Smooth scatter plots
Plots with more than two dimensions
Coplots
Spinning plots
Use of color and size
Base plotting system
Annotate with text, lines, points, axis
Convenient to use
Difficult to translate to others once a plot has been created
A plot is just a series of commands
Lattice system
Each plot is created with a call to a specific function.
Spacing and margins are set automatically; the whole plot is specified through the function's parameters, which can be numerous.
Panel plots show the data at different levels of a third variable.
The ggplot2 system
Splits the difference between base and lattice in a number of ways
Automatically deals with spacing, text, and titles, but also allows you to annotate by adding to a plot
Superficial similarity to lattice, but generally easier and more intuitive to use
Scaling
Vertical Scaling
Horizontal Scaling
Today, cost is the key
SHAP framework
JMeter tool
Train Test Scree Plots
What is Data Wrangling?
What you just did now is Data Wrangling.
Lattice plotting system
Lattice data system behavior
Hierarchical Clustering
K-means clustering
Dimension Reduction

Editor's Notes

  • #3 https://simplystatistics.org/