DATA ANALYSIS
CHARAK RAY
libra.charak@gmail.com
COURSE CONTENTS
•Core Data Analysis
• 1D analysis
• 2D analysis: both quantitative
• 2D analysis: both nominal
• Learning multivariate correlation
• Principal components (PCA) and SVD: Mathematical foundations
• Principal components (PCA) and SVD: Applications
• Clustering with k-means
INTRO: WHAT IS CORE DATA
ANALYSIS?
Four main parts
1. Data Mining and data patterns and their use
2. Core data analysis: two main goals for
Knowledge Enhancing
3. Visualization: How it works
4. Illustrative data cases
INTRO: DATA MINING AND DATA PATTERNS
AND THEIR USE
•Is it Data Mining?
• Well, what is Data Mining?
• Generically, Data Mining is looking for (i) patterns in data stored in (ii) Databases
as part of (iii) Knowledge Discovery
• Core data analysis does not care about (ii) Databases
• Core data analysis does care about (ia) specific patterns in data as part of
(iia) Knowledge Enhancing
INTRO: EXAMPLE OF PATTERN
DOUBLE SUCCESS 1
The History of Laws for planetary motion
Double success
Ptolemy (c. 150 AD):
• Sun and planets circle Earth
• Does not match data well
INTRO: EXAMPLE OF PATTERN
DOUBLE SUCCESS 2
The History of Laws for planetary motion
• Copernicus (c. 1540):
• Planets circle Sun
• Does not match data well either
INTRO: EXAMPLE OF PATTERN
DOUBLE SUCCESS 3
Laws for planetary motion:
Kepler (c. 1605):
• 1st Law: Planets revolve around the Sun in ellipses (ovals)
• 2d Law: Speed changes: the further away from the Sun, the slower
• Does either
INTRO: EXAMPLE OF PATTERN
DOUBLE SUCCESS 4
Planet     Period (years)   Distance (average, relative to that of Earth)
Mercury    0.241            0.39
Venus      0.615            0.72
Earth      1.00             1.00
Mars       1.88             1.52
Jupiter    11.8             5.20
Saturn     29.5             9.54
Uranus     84.0             19.18
Neptune    165              30.06
Pluto      248              39.44
3d Law: Is there any relation between speed/period and distance?
INTRO: EXAMPLE OF PATTERN
DOUBLE SUCCESS 5
3d Kepler’s Law: Is there any relation between speed/period and distance?
No straight line fits…
INTRO: EXAMPLE OF PATTERN
DOUBLE SUCCESS 6
3d Kepler’s Law (1619):
[J. Napier invented logarithms (1614)]
$\log P = \tfrac{3}{2} \log D$, that is, $P^2 = D^3$
(P is the period, D the average distance)
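The log-log relation can be checked against the planet table given earlier. The sketch below is only an illustration, not Kepler's procedure: it fits an ordinary least-squares line to log P against log D, and the helper name `fit_line` is an assumption introduced here. If the 3d Law holds, the slope should come out close to 3/2 and the intercept close to 0.

```python
import math

# (period in years, average distance relative to Earth's), from the planet table
planets = {
    "Mercury": (0.241, 0.39),
    "Venus": (0.615, 0.72),
    "Earth": (1.00, 1.00),
    "Mars": (1.88, 1.52),
    "Jupiter": (11.8, 5.20),
    "Saturn": (29.5, 9.54),
    "Uranus": (84.0, 19.18),
    "Neptune": (165, 30.06),
    "Pluto": (248, 39.44),
}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Take logarithms of both features, then fit a straight line
log_d = [math.log10(d) for _, d in planets.values()]
log_p = [math.log10(p) for p, _ in planets.values()]
slope, intercept = fit_line(log_d, log_p)
print(f"log P = {slope:.3f} * log D + {intercept:.3f}")
```

No line fits the raw values, but after the logarithmic transformation a line fits almost exactly.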
INTRO: EXAMPLE OF PATTERN
DOUBLE SUCCESS 7
Kepler’s three laws: What is so grand?
Substantiated theoretically by
R. Hooke (1635-1703) and I. Newton (1642-1727)
UNIVERSAL GRAVITATION LAW
Mathematical equation, cornerstone of modern science
INTRO: EXAMPLE OF PATTERN
FAILURE? 1
Imagine this:
Broad street, Soho, London,
Cholera outbreak September 1854
Dr. Snow’s report: “On proceeding to the spot, I found
that nearly all the deaths had taken place within a short
distance of the pump.”
Dr John Snow’s map:
Cases of death
labeled by ticks.
The handle of pump
removed 7/9/1854.
INTRO 1: EXAMPLE OF PATTERN
FAILURE? 2
Myth: Death stopped. Data analysis won.
Fact: Data analysis lost. The health commission rejected the water
pump theory, as contradicting the science of the day (cholera outbreak
caused by “concentrated noxious atmospheric influence, no doubt
emanating from putrefying organic matter”). The handle of the pump
was ordered back. Death stopped because all died already.
More deaths occurred in further cholera outbreaks until R. Koch discovered
and publicized Vibrio cholerae in 1883.
Dr John Snow’s map:
A case of death is labeled by a tick
PATTERN FOUND
Success: if
Compatible with existing knowledge
Failure: if
Not compatible with existing knowledge
Advice
• Find a pattern
• Interpret using existing knowledge
• Care not whether interpretation is
compatible
INTRO: WHAT IS CORE DATA
ANALYSIS II 1
• Core data analysis does care about (ia) specific patterns in data as part of (iia) Knowledge
Enhancing
• What are these (ia), (iia) specifics?
• They have something to do with the notion of Knowledge
• Statements of fact (“I teach this class.”) – factual
• Statements of pattern, regularity (“Professors teach classes.”) – structural
INTRO: WHAT IS CORE DATA
ANALYSIS II 2
• Core data analysis does care about (ia) specific patterns in data as part of (iia) Knowledge
Enhancing
• (ia), (iia) specifics relate to elements of structural knowledge
• Elements of Structural knowledge:
• Concepts (“Professor”, “Teach”, “Class”)
• Statements of relation between concepts (“Professors teach classes.”) – structural
INTRO: WHAT IS CORE DATA
ANALYSIS II
•List elements of structural knowledge,
•concepts and
•statements of relation among them, for
•3d Kepler’s Law
•Dr Snow’s cholera outbreak map
INTRO: WHAT IS CORE DATA
ANALYSIS II 3
• Core data analysis does care about (ia) deriving concepts and statements of relation between them
from data
• (iia) Structural Knowledge Enhancing, generically, via either of the two pathways
• Two pathways for Structural Knowledge Enhancing
• Summarization: Developing Concepts
• Correlation: Deriving Statements of relation between concepts
INTRO: WHAT IS CORE DATA
ANALYSIS II 4
• Two pathways for Structural Knowledge Enhancing
• Summarization: Developing Concepts
• Correlation: Deriving Statements of relation between concepts
• Two major formats:
• Quantitative (both concepts and statements)
• 3d Kepler’s Law:
Period^2 = Distance^3
• Categorical (both concepts and statements)
• Dr Snow’s conclusion:
Cholera death is caused by pump water
INTRO II: STRUCTURAL
KNOWLEDGE ENHANCING GENERIC
METHODS
•Two pathways & Two formats
• Summarization methods:
• Quantitative: Principal component analysis (PCA)
• Categorical: Cluster analysis
• Correlation methods:
• Quantitative: Regression
• Categorical: Classifier
INTRO II: THREE POSSIBLE LAYERS
OF STUDY
              Pro                                Con
• Systems     Usable now, Simple                 Short-lived, Too many
• Concepts    Awareness                          Superficial
• Methods     Workable, Extendable, Long-term    Technical, Boring
INTRO II: COURSE CONTENTS
REVIEW
•Summarization: PCA (Weeks 6 and 7), Cluster
analysis (Week 8)
•Correlation: Classifier (Week 5), (no Regression, sorry;
if needed, go to Statistics, Econometrics and Neural Networks
courses)
•Prequel: 1D and 2D analyses to study basic
concepts and basic methods
•Pre-prequel: Intro – Data and problems
INTRO II: RELATION TO OTHER
APPROACHES
• Classical mathematical statistics: data is just a vehicle to fit and test
mathematical models in the applied domain (say, in data analysis a feature is
a column in a table; in statistics it is modelled as a random variable)
• Machine Learning: prediction rules are to be built incrementally (say, here PCA is
a major method; in machine learning it is just a way to preprocess the data)
• Data Mining: adding new knowledge by finding
interesting patterns in databases, which is the initial
stage of knowledge discovery (CDA is part of that,
up to databases)
OVERALL: METHODS are SAME, PERSPECTIVES DO DIFFER
INTRO III: VISUALIZATION
• Visualization of data is an important activity assisting data analysis by a human in many ways
including
A. Highlighting
B. Integrating different aspects
C. Manipulating (not shown)
A few examples follow.
INTRO III: VISUALIZATION
A. Highlighting 1
Figure 1. A fragment of the London Tube
map made after H. Beck (1931); the
central part is highlighted by
disproportionate scaling. Rejected by the
authorities for a long while, it has become
a standard for metro maps worldwide.
INTRO III: VISUALIZATION
A. Highlighting 2: Cheating by distortion
Figure 2. A decline in the relative number of
general practitioner doctors in California in the 1970s
is conveniently visualized using 1D size-related, not
2D area-related, scaling of a picture of a doctor.
INTRO III: VISUALIZATION
Highlighting 3: Cheating by
distortion
Figure 3. Another unintended distortion: a newspaper’s
self-satisfaction report (July 2005) is visualized with bars
that grow from the 500,000 mark rather than from 0.
A 25% advantage has visually grown ten-fold!
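The arithmetic behind such a distortion can be sketched as follows. The two circulation figures are hypothetical, chosen only for illustration (the slide does not give the newspaper's numbers), and `visual_ratio` is a helper name introduced here.

```python
def visual_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar lengths when both bars start at `baseline`."""
    return (a - baseline) / (b - baseline)


# Hypothetical figures: `ours` is 25% larger than `theirs`
ours, theirs = 650_000, 520_000

true_ratio = visual_ratio(ours, theirs)                      # bars grow from 0
drawn_ratio = visual_ratio(ours, theirs, baseline=500_000)   # bars grow from 500,000

print(f"true ratio:  {true_ratio:.2f}")
print(f"drawn ratio: {drawn_ratio:.2f}")
```

With these made-up figures, a true advantage of 1.25 times is drawn as a bar 7.5 times longer; the closer the smaller value lies to the baseline, the stronger the exaggeration.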
INTRO III: VISUALIZATION
B. Integrating aspects 1
Figure 4. Con Edison company’s power grid screen over
Manhattan NY. Grid repair problems are dealt with on the fly
by sending operators upon seeing disorders on the screen.
INTRO III: VISUALIZATION
B. Integrating aspects 2
Figure 5. Minard’s (1869) depiction of Napoleon’s lost
campaign of 1812, integrating space, time and the strength of
the French army.
INTRO III: VISUALIZATION
B. Integrating
aspects 3
Figure 6. The
structure of research
activities of CENTRIA
(UNL, Lisbon) in 2007
represented over ACM
Computer Subjects
Classification 1998.
INTRO IV: ILLUSTRATIVE DATA CASES
Company name   Income, $mln   MShare, %   NSup   EC    Sector
Aversiona      19.0           43.7        2      No    Utility
Antyops        29.4           36.0        3      No    Utility
Astonite       23.9           38.0        3      No    Industrial
Bayermart      18.4           27.9        2      Yes   Utility
Breaktops      25.7           22.3        3      Yes   Industrial
Bumchista      12.1           16.9        2      Yes   Industrial
Civiok         23.9           30.2        4      Yes   Retail
Cyberdam       27.2           58.0        5      Yes   Retail
Case 1: Companies 1
Companies characterized by mixed scale features; the first three companies making product A, the next three
making product B, and the last two product C.
Metadata: A. Features and Domain knowledge
1) Income, $ Mln;
2) MShare - Market share, per cent;
3) NSup - Number of principal suppliers;
4) ECommerce - Yes e-trade or No;
5) Sector - (a) Retail, (b) Utility, and (c) Industrial.
B. Main production (A,B,C)
C. Feature scale types (3 main types)
INTRO IV: ILLUSTRATIVE DATA CASES
Case 1: Companies 2
Feature: Maps entities to feature values (Synonyms: Variable,
Attribute, Character, Parameter)
Feature. Quantitative scale: Arithmetic averaging makes
sense
Examples: 1) Income, 2) Mshare, 3) NSup
INTRO IV: ILLUSTRATIVE DATA CASES
Case 1: Companies 3
Feature. Nominal scale: Disjoint categories; only the comparison “equal or
not” makes sense (special case of categorical scales)
Example: 5) Sector (Retail, Utility, Industrial are its values)
Feature. Binary scale: Two disjoint categories, “Yes” and “No”
Shares properties of the nominal scale and, if 1/0 coded, of the quantitative scale
Example: 4) ECommerce
INTRO IV: QUANTITATIVE CODING
Case 1: Companies 4
Quantitative coding: Each category is made into a 1/0 binary (dummy) feature “Does
it hold? 1 if Yes, 0 if No.”
Entity   Income   MShare   NSup   EC?   Util?   Indu?   Retail?
1        19.0     43.7     2      0     1       0       0
2        29.4     36.0     3      0     1       0       0
3        23.9     38.0     3      0     0       1       0
4        18.4     27.9     2      1     1       0       0
5        25.7     22.3     3      1     0       1       0
6        12.1     16.9     2      1     0       1       0
7        23.9     30.2     4      1     0       0       1
8        27.2     58.0     5      1     0       0       1
Company data 8x5 converted to the quantitative format 8x7
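The conversion can be sketched in a few lines. This is a minimal illustration using only the standard library; the names `companies`, `SECTORS` and `encode` are assumptions introduced here, and the order of the dummy columns (Utility, Industrial, Retail) follows the coded table.

```python
# Company data 8x5: (Income, MShare, NSup, EC, Sector), company names omitted
companies = [
    (19.0, 43.7, 2, "No", "Utility"),
    (29.4, 36.0, 3, "No", "Utility"),
    (23.9, 38.0, 3, "No", "Industrial"),
    (18.4, 27.9, 2, "Yes", "Utility"),
    (25.7, 22.3, 3, "Yes", "Industrial"),
    (12.1, 16.9, 2, "Yes", "Industrial"),
    (23.9, 30.2, 4, "Yes", "Retail"),
    (27.2, 58.0, 5, "Yes", "Retail"),
]

# One 1/0 dummy feature per category of the nominal feature Sector
SECTORS = ["Utility", "Industrial", "Retail"]


def encode(row):
    """Turn one mixed-scale row into an all-quantitative row."""
    income, mshare, nsup, ec, sector = row
    ec_flag = 1 if ec == "Yes" else 0                        # binary feature
    sector_flags = [1 if sector == s else 0 for s in SECTORS]
    return [income, mshare, nsup, ec_flag, *sector_flags]


coded = [encode(row) for row in companies]
for entity, row in enumerate(coded, start=1):
    print(entity, row)
```

The binary feature needs one column only, while the three-category nominal feature takes three columns, which is why 8x5 becomes 8x7.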
INTRO IV: ILLUSTRATIVE DATA CASES
Case 1: Companies 5
Data analysis:
• How to map companies to the screen with their similarity reflected in distances
between points? (Summarization/visualization)
• Would clustering of companies reflect the product? What features would be
involved then? (Summarization)
• Can rules be derived to predict the product for another company, coming outside
of the table? (Correlation)
• Is there any relation between the structural features (NSup, EC, Sector) and
market-related features (Income, MShare)? (Correlation)
INTRO IV: ILLUSTRATIVE DATA CASES
Case 2: Iris 1
Anderson–Fisher Iris 150x4 data of three taxa:
Specimen (1-150)   Taxon
1-50               Iris setosa (diploid)
51-100             Iris versicolor (tetraploid)
101-150            Iris virginica (hexaploid)
Features
W1 Sepal length
W2 Sepal width
W3 Petal length
W4 Petal width
INTRO IV: DATA CASES
Case 2: Iris 2
#    I Iris setosa          II Iris versicolor     III Iris virginica
     w1   w2   w3   w4      w1   w2   w3   w4      w1   w2   w3   w4
1    5.1  3.5  1.4  0.3     6.4  3.2  4.5  1.5     6.3  3.3  6.0  2.5
2    4.4  3.2  1.3  0.2     5.5  2.4  3.8  1.1     6.7  3.3  5.7  2.1
3    4.4  3.0  1.3  0.2     5.7  2.9  4.2  1.3     7.2  3.6  6.1  2.5
4    5.0  3.5  1.6  0.6     5.7  3.0  4.2  1.2     7.7  3.8  6.7  2.2
5    5.1  3.8  1.6  0.2     5.6  2.9  3.6  1.3     7.2  3.0  5.8  1.6
6    4.9  3.1  1.5  0.2     7.0  3.2  4.7  1.4     7.4  2.8  6.1  1.9
7    5.0  3.2  1.2  0.2     6.8  2.8  4.8  1.4     7.6  3.0  6.6  2.1
8    4.6  3.2  1.4  0.2     6.1  2.8  4.7  1.2     7.7  2.8  6.7  2.0
9    5.0  3.3  1.4  0.2     4.9  2.4  3.3  1.0     6.2  3.4  5.4  2.3
50   5.1  3.5  1.4  0.2     6.0  2.2  4.0  1.0     6.5  3.2  5.1  2.0
Data analysis
• Visualise the data so that similar specimens are mapped into
points that are near each other, and dissimilar ones into points far apart
• Build a predictor of sepal sizes from the petal sizes (to lessen the
burden of measurement)
• Build a predictor of taxa (classifier) based on the petal/sepal
sizes
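As a rough illustration of the last task, the sketch below builds a nearest-centroid classifier from the specimens listed in the table. This is one simple technique chosen here for brevity, not the method the course develops (classifiers are covered in Week 5); the names `sample`, `centroid` and `classify` are assumptions.

```python
import math

# Specimens from the table: (w1, w2, w3, w4) = sepal length, sepal width,
# petal length, petal width
sample = {
    "setosa": [
        (5.1, 3.5, 1.4, 0.3), (4.4, 3.2, 1.3, 0.2), (4.4, 3.0, 1.3, 0.2),
        (5.0, 3.5, 1.6, 0.6), (5.1, 3.8, 1.6, 0.2), (4.9, 3.1, 1.5, 0.2),
        (5.0, 3.2, 1.2, 0.2), (4.6, 3.2, 1.4, 0.2), (5.0, 3.3, 1.4, 0.2),
        (5.1, 3.5, 1.4, 0.2),
    ],
    "versicolor": [
        (6.4, 3.2, 4.5, 1.5), (5.5, 2.4, 3.8, 1.1), (5.7, 2.9, 4.2, 1.3),
        (5.7, 3.0, 4.2, 1.2), (5.6, 2.9, 3.6, 1.3), (7.0, 3.2, 4.7, 1.4),
        (6.8, 2.8, 4.8, 1.4), (6.1, 2.8, 4.7, 1.2), (4.9, 2.4, 3.3, 1.0),
        (6.0, 2.2, 4.0, 1.0),
    ],
    "virginica": [
        (6.3, 3.3, 6.0, 2.5), (6.7, 3.3, 5.7, 2.1), (7.2, 3.6, 6.1, 2.5),
        (7.7, 3.8, 6.7, 2.2), (7.2, 3.0, 5.8, 1.6), (7.4, 2.8, 6.1, 1.9),
        (7.6, 3.0, 6.6, 2.1), (7.7, 2.8, 6.7, 2.0), (6.2, 3.4, 5.4, 2.3),
        (6.5, 3.2, 5.1, 2.0),
    ],
}


def centroid(rows):
    """Feature-wise mean of a list of specimens."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


centroids = {taxon: centroid(rows) for taxon, rows in sample.items()}


def classify(specimen):
    """Assign the taxon whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda t: math.dist(specimen, centroids[t]))


# How many of the listed specimens are assigned to their own taxon
correct = sum(classify(r) == t for t, rows in sample.items() for r in rows)
total = sum(len(rows) for rows in sample.values())
print(f"{correct} of {total} listed specimens classified correctly")
```

Checking the rule on the same specimens it was built from only shows that the three taxa are well separated in this sample; it says little about how the rule would do on new specimens.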
INTRO IV: DATA CASES
Case 3: Intrusion attack 1
Features
1) Pr, the protocol-type, which is either tcp or icmp or udp (a nominal feature),
2) BySD, the number of data bytes from source to destination,
3) SH, the number of connections to the same host as the current one in the past two seconds,
4) SS, the number of connections to the same service as the current one in the past two
seconds,
5) SE, the rate of connections (per cent in SHCo) that have SYN errors,
6) RE, the rate of connections (per cent in SHCo) that have REJ errors,
7) A, the type of attack (ap - apache, sa - saint, sm - smurf, and no attack) (a nominal feature).
Pr    BySD    SH   SS   SE     RE     A      Pr    BySD   SH   SS   SE   RE   A
Tcp   62344   16   16   0      0.94   Ap     Tcp   287    14   14   0    0    no
Tcp   60884   17   17   0.06   0.88   Ap     Tcp   308    1    1    0    0    no
Tcp   59424   18   18   0.06   0.89   Ap     Tcp   284    5    5    0    0    no
Tcp   59424   19   19   0.05   0.89   Ap     Udp   105    2    2    0    0    no
Tcp   59424   20   20   0.05   0.9    Ap     Udp   105    2    2    0    0    no
Tcp   75484   21   21   0.05   0.9    Ap     Udp   105    2    2    0    0    no
INTRO IV: DATA CASES
Case 3: Intrusion attack 2
Data analysis
• Build a classifier to judge whether the system functions normally or is under
attack (Correlation);
• Is there any relation between the protocol and the type of attack? (Correlation);
• Visualize the data reflecting similarity of the patterns (Summarization).
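For the second question, a natural first step with two nominal features is a contingency table of co-occurrence counts. The sketch below cross-tabulates Pr against A for the twelve connections listed in the table; the variable names are assumptions, and only the standard library is used.

```python
from collections import Counter

# (Pr, A) pairs of the twelve connections listed in the table
pairs = (
    [("Tcp", "Ap")] * 6      # six tcp connections under apache attack
    + [("Tcp", "no")] * 3    # three normal tcp connections
    + [("Udp", "no")] * 3    # three normal udp connections
)

counts = Counter(pairs)
protocols = sorted({p for p, _ in pairs})
attacks = sorted({a for _, a in pairs})

# Print the contingency table: rows are protocols, columns are attack types
print("Pr".ljust(6) + "".join(a.ljust(6) for a in attacks))
for p in protocols:
    print(p.ljust(6) + "".join(str(counts[(p, a)]).ljust(6) for a in attacks))
```

In this small fragment every apache attack comes over tcp and no udp connection is an attack; whether that holds beyond these twelve rows is exactly what the analysis has to establish.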
TOPICS COVERED:
1. Data Mining and data patterns and their use: if
you find a pattern, interpret it!
2. Knowledge Enhancing: summarize to concepts,
correlate to statements of relation.
3. Visualize: to highlight or integrate aspects.
4. Illustrative data cases: concept of feature,
feature scale, data table, data analysis
problem.
THANK YOU…

More Related Content

What's hot

Text analysis and its Importance.pdf
Text analysis and its Importance.pdfText analysis and its Importance.pdf
Text analysis and its Importance.pdfVivekDixit486466
 
Quantitative and qualitative analysis of data
Quantitative and qualitative analysis of dataQuantitative and qualitative analysis of data
Quantitative and qualitative analysis of dataNisha M S
 
The Importance of Data Visualization
The Importance of Data VisualizationThe Importance of Data Visualization
The Importance of Data VisualizationCenterline Digital
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
 
Data Visualization Design Best Practices Workshop
Data Visualization Design Best Practices WorkshopData Visualization Design Best Practices Workshop
Data Visualization Design Best Practices WorkshopJSI
 
Importance of Data Analytics
 Importance of Data Analytics Importance of Data Analytics
Importance of Data AnalyticsProduct School
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Edureka!
 
Data Driven Decisions: Building an Insight Driven Culture
Data Driven Decisions: Building an Insight Driven CultureData Driven Decisions: Building an Insight Driven Culture
Data Driven Decisions: Building an Insight Driven CultureAmazon Web Services
 
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...Simplilearn
 
Introduction to Data Visualization
Introduction to Data Visualization Introduction to Data Visualization
Introduction to Data Visualization Ana Jofre
 
Data Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityData Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityIkbal Ahmed
 
Data visualization
Data visualizationData visualization
Data visualizationHoang Nguyen
 
Data Visualization
Data VisualizationData Visualization
Data Visualizationsimonwandrew
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualizationDr. Hamdan Al-Sabri
 
Principles of data visualisation 2021
Principles of data visualisation 2021Principles of data visualisation 2021
Principles of data visualisation 2021Marié Roux
 

What's hot (20)

Text analysis and its Importance.pdf
Text analysis and its Importance.pdfText analysis and its Importance.pdf
Text analysis and its Importance.pdf
 
Quantitative and qualitative analysis of data
Quantitative and qualitative analysis of dataQuantitative and qualitative analysis of data
Quantitative and qualitative analysis of data
 
The Importance of Data Visualization
The Importance of Data VisualizationThe Importance of Data Visualization
The Importance of Data Visualization
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
 
Data analysis
Data analysisData analysis
Data analysis
 
Data Visualization Design Best Practices Workshop
Data Visualization Design Best Practices WorkshopData Visualization Design Best Practices Workshop
Data Visualization Design Best Practices Workshop
 
Data Visualization.pptx
Data Visualization.pptxData Visualization.pptx
Data Visualization.pptx
 
Data analytics vs. Data analysis
Data analytics vs. Data analysisData analytics vs. Data analysis
Data analytics vs. Data analysis
 
Importance of Data Analytics
 Importance of Data Analytics Importance of Data Analytics
Importance of Data Analytics
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
 
Data Driven Decisions: Building an Insight Driven Culture
Data Driven Decisions: Building an Insight Driven CultureData Driven Decisions: Building an Insight Driven Culture
Data Driven Decisions: Building an Insight Driven Culture
 
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
 
Introduction to Data Visualization
Introduction to Data Visualization Introduction to Data Visualization
Introduction to Data Visualization
 
Data Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityData Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & Normality
 
Data visualization
Data visualizationData visualization
Data visualization
 
Data Visualization
Data VisualizationData Visualization
Data Visualization
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 
Data analysis
Data analysisData analysis
Data analysis
 
Principles of data visualisation 2021
Principles of data visualisation 2021Principles of data visualisation 2021
Principles of data visualisation 2021
 

Similar to DATA ANALYSIS

Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving UpPaco Nathan
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Big Data Spain
 
The End(s) of e-Research
The End(s) of e-ResearchThe End(s) of e-Research
The End(s) of e-ResearchEric Meyer
 
Diagram webinar gould_30oct12
Diagram webinar gould_30oct12Diagram webinar gould_30oct12
Diagram webinar gould_30oct12Tom Kuipers
 
Innovative design methods for data science - beyond brainstorming
Innovative design methods for data science - beyond brainstormingInnovative design methods for data science - beyond brainstorming
Innovative design methods for data science - beyond brainstormingAkin Osman Kazakci
 
Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIPaul Groth
 
CSCW in Times of Social Media
CSCW in Times of Social MediaCSCW in Times of Social Media
CSCW in Times of Social MediaHendrik Drachsler
 
Integration of oreChem with the eCrystals repository for crystal structures
Integration of oreChem with the eCrystals repository for crystal structuresIntegration of oreChem with the eCrystals repository for crystal structures
Integration of oreChem with the eCrystals repository for crystal structuresMark Borkum
 
SE2016 BigData Denis Reznik "Data driven future"
SE2016 BigData Denis Reznik "Data driven future"SE2016 BigData Denis Reznik "Data driven future"
SE2016 BigData Denis Reznik "Data driven future"Inhacking
 
Data Science definition
Data Science definitionData Science definition
Data Science definitionCarloLauro1
 
Let's talk about Data Science
Let's talk about Data ScienceLet's talk about Data Science
Let's talk about Data ScienceCarlo Lauro
 
Kid171 chap0 english version
Kid171 chap0 english versionKid171 chap0 english version
Kid171 chap0 english versionFrank S.C. Tseng
 
Term Paper Presentation
Term Paper PresentationTerm Paper Presentation
Term Paper PresentationShubham Singh
 

Similar to DATA ANALYSIS (20)

Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
 
The End(s) of e-Research
The End(s) of e-ResearchThe End(s) of e-Research
The End(s) of e-Research
 
Sensors1(1)
Sensors1(1)Sensors1(1)
Sensors1(1)
 
R - datascience
R - datascienceR - datascience
R - datascience
 
Diagram webinar gould_30oct12
Diagram webinar gould_30oct12Diagram webinar gould_30oct12
Diagram webinar gould_30oct12
 
Innovative design methods for data science - beyond brainstorming
Innovative design methods for data science - beyond brainstormingInnovative design methods for data science - beyond brainstorming
Innovative design methods for data science - beyond brainstorming
 
Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AI
 
CSCW in Times of Social Media
CSCW in Times of Social MediaCSCW in Times of Social Media
CSCW in Times of Social Media
 
Integration of oreChem with the eCrystals repository for crystal structures
Integration of oreChem with the eCrystals repository for crystal structuresIntegration of oreChem with the eCrystals repository for crystal structures
Integration of oreChem with the eCrystals repository for crystal structures
 
Denis Reznik Data driven future
Denis Reznik Data driven futureDenis Reznik Data driven future
Denis Reznik Data driven future
 
SE2016 BigData Denis Reznik "Data driven future"
SE2016 BigData Denis Reznik "Data driven future"SE2016 BigData Denis Reznik "Data driven future"
SE2016 BigData Denis Reznik "Data driven future"
 
Data Science definition
Data Science definitionData Science definition
Data Science definition
 
Let's talk about Data Science
Let's talk about Data ScienceLet's talk about Data Science
Let's talk about Data Science
 
Kid171 chap0 english version
Kid171 chap0 english versionKid171 chap0 english version
Kid171 chap0 english version
 
05 astrostat feigelson
05 astrostat feigelson05 astrostat feigelson
05 astrostat feigelson
 
Big Data and IOT
Big Data and IOTBig Data and IOT
Big Data and IOT
 
Term Paper Presentation
Term Paper PresentationTerm Paper Presentation
Term Paper Presentation
 
Metric Take Home Lab
Metric Take Home LabMetric Take Home Lab
Metric Take Home Lab
 
ACT Science Coffee - Luisa Buinhas
ACT Science Coffee - Luisa BuinhasACT Science Coffee - Luisa Buinhas
ACT Science Coffee - Luisa Buinhas
 

More from CHARAK RAY

SENSING ENTREPRENEURIAL OPPORTUNITY.pptx
SENSING ENTREPRENEURIAL OPPORTUNITY.pptxSENSING ENTREPRENEURIAL OPPORTUNITY.pptx
SENSING ENTREPRENEURIAL OPPORTUNITY.pptxCHARAK RAY
 
BUSINESS, TRADE & COMMERCE
BUSINESS, TRADE & COMMERCEBUSINESS, TRADE & COMMERCE
BUSINESS, TRADE & COMMERCECHARAK RAY
 
INTRODUCTION TO BUSINESS MANAGEMENT
INTRODUCTION TO BUSINESS MANAGEMENTINTRODUCTION TO BUSINESS MANAGEMENT
INTRODUCTION TO BUSINESS MANAGEMENTCHARAK RAY
 
CASE STUDY.pptx
CASE STUDY.pptxCASE STUDY.pptx
CASE STUDY.pptxCHARAK RAY
 
HUMAN TRANSFORMATION.pptx
HUMAN TRANSFORMATION.pptxHUMAN TRANSFORMATION.pptx
HUMAN TRANSFORMATION.pptxCHARAK RAY
 
ENTREPRENEURSHIP.pptx
ENTREPRENEURSHIP.pptxENTREPRENEURSHIP.pptx
ENTREPRENEURSHIP.pptxCHARAK RAY
 
Research Methodology
Research MethodologyResearch Methodology
Research MethodologyCHARAK RAY
 
PROJECT REPORT ON COLD STORAGE
PROJECT REPORT ON COLD STORAGEPROJECT REPORT ON COLD STORAGE
PROJECT REPORT ON COLD STORAGECHARAK RAY
 
PROJECT FINANCE
PROJECT FINANCEPROJECT FINANCE
PROJECT FINANCECHARAK RAY
 
LINEAR ALGEBRA, WITH OPTIMIZATION
LINEAR ALGEBRA, WITH OPTIMIZATIONLINEAR ALGEBRA, WITH OPTIMIZATION
LINEAR ALGEBRA, WITH OPTIMIZATIONCHARAK RAY
 
TRAINING OF POLLING PERSONNEL
TRAINING OF POLLING PERSONNELTRAINING OF POLLING PERSONNEL
TRAINING OF POLLING PERSONNELCHARAK RAY
 
WRITING AN ABSTRACT
WRITING AN ABSTRACTWRITING AN ABSTRACT
WRITING AN ABSTRACTCHARAK RAY
 
BUSINESS STUDIES PROJECT ON PRINCIPLES OF MANAGEMENT
BUSINESS STUDIES PROJECT ON PRINCIPLES OF MANAGEMENTBUSINESS STUDIES PROJECT ON PRINCIPLES OF MANAGEMENT
BUSINESS STUDIES PROJECT ON PRINCIPLES OF MANAGEMENTCHARAK RAY
 
BUSINESS STUDIES PROJECT ON MARKETING MANAGEMENT
BUSINESS STUDIES PROJECT ON MARKETING MANAGEMENTBUSINESS STUDIES PROJECT ON MARKETING MANAGEMENT
BUSINESS STUDIES PROJECT ON MARKETING MANAGEMENTCHARAK RAY
 
BUSINESS STUDIES PROJECT GUIDELINES
BUSINESS STUDIES PROJECT GUIDELINESBUSINESS STUDIES PROJECT GUIDELINES
BUSINESS STUDIES PROJECT GUIDELINESCHARAK RAY
 
ROYAL CLEAN SERVICES.
ROYAL CLEAN SERVICES.ROYAL CLEAN SERVICES.
ROYAL CLEAN SERVICES.CHARAK RAY
 
ROYAL MEGA ORGANIC FOOD PARK
ROYAL MEGA ORGANIC FOOD PARK ROYAL MEGA ORGANIC FOOD PARK
ROYAL MEGA ORGANIC FOOD PARK CHARAK RAY
 

More from CHARAK RAY (20)

SENSING ENTREPRENEURIAL OPPORTUNITY.pptx
SENSING ENTREPRENEURIAL OPPORTUNITY.pptxSENSING ENTREPRENEURIAL OPPORTUNITY.pptx
SENSING ENTREPRENEURIAL OPPORTUNITY.pptx
 
CH-1C.pptx
CH-1C.pptxCH-1C.pptx
CH-1C.pptx
 
CH-1B.pptx
CH-1B.pptxCH-1B.pptx
CH-1B.pptx
 
BUSINESS, TRADE & COMMERCE
BUSINESS, TRADE & COMMERCEBUSINESS, TRADE & COMMERCE
BUSINESS, TRADE & COMMERCE
 
INTRODUCTION TO BUSINESS MANAGEMENT
INTRODUCTION TO BUSINESS MANAGEMENTINTRODUCTION TO BUSINESS MANAGEMENT
INTRODUCTION TO BUSINESS MANAGEMENT
 
CASE STUDY.pptx
CASE STUDY.pptxCASE STUDY.pptx
CASE STUDY.pptx
 
HUMAN TRANSFORMATION.pptx
HUMAN TRANSFORMATION.pptxHUMAN TRANSFORMATION.pptx
HUMAN TRANSFORMATION.pptx
 
ENTREPRENEURSHIP.pptx
ENTREPRENEURSHIP.pptxENTREPRENEURSHIP.pptx
ENTREPRENEURSHIP.pptx
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 
MS WORD
MS WORDMS WORD
MS WORD
 
PROJECT REPORT ON COLD STORAGE
PROJECT REPORT ON COLD STORAGEPROJECT REPORT ON COLD STORAGE
PROJECT REPORT ON COLD STORAGE
 
PROJECT FINANCE
PROJECT FINANCEPROJECT FINANCE
PROJECT FINANCE
 
LINEAR ALGEBRA, WITH OPTIMIZATION
LINEAR ALGEBRA, WITH OPTIMIZATIONLINEAR ALGEBRA, WITH OPTIMIZATION
LINEAR ALGEBRA, WITH OPTIMIZATION
 
TRAINING OF POLLING PERSONNEL
TRAINING OF POLLING PERSONNELTRAINING OF POLLING PERSONNEL
TRAINING OF POLLING PERSONNEL
 
WRITING AN ABSTRACT
WRITING AN ABSTRACTWRITING AN ABSTRACT
WRITING AN ABSTRACT
 
BUSINESS STUDIES PROJECT ON PRINCIPLES OF MANAGEMENT
BUSINESS STUDIES PROJECT ON PRINCIPLES OF MANAGEMENTBUSINESS STUDIES PROJECT ON PRINCIPLES OF MANAGEMENT
BUSINESS STUDIES PROJECT ON PRINCIPLES OF MANAGEMENT
 
BUSINESS STUDIES PROJECT ON MARKETING MANAGEMENT
BUSINESS STUDIES PROJECT ON MARKETING MANAGEMENTBUSINESS STUDIES PROJECT ON MARKETING MANAGEMENT
BUSINESS STUDIES PROJECT ON MARKETING MANAGEMENT
 
BUSINESS STUDIES PROJECT GUIDELINES
BUSINESS STUDIES PROJECT GUIDELINESBUSINESS STUDIES PROJECT GUIDELINES
BUSINESS STUDIES PROJECT GUIDELINES
 
ROYAL CLEAN SERVICES.
ROYAL CLEAN SERVICES.ROYAL CLEAN SERVICES.
ROYAL CLEAN SERVICES.
 
ROYAL MEGA ORGANIC FOOD PARK
ROYAL MEGA ORGANIC FOOD PARK ROYAL MEGA ORGANIC FOOD PARK
ROYAL MEGA ORGANIC FOOD PARK
 

DATA ANALYSIS

  • 2. COURSE CONTENTS •Core Data Analysis • 1D analysis • 2D analysis: both quantitative • 2D analysis: both nominal • Learning multivariate correlation • Principal components (PCA) and SVD: Mathematical foundations • Principal components (PCA) and SVD: Applications • Clustering with k-means
  • 3. INTRO: WHAT IS CORE DATA ANALYSIS? Four main parts 1. Data Mining and data patterns and their use 2. Core data analysis: two main goals for Knowledge Enhancing 3. Visualization: How it works 4. Illustrative data cases
  • 4. INTRO: DATA MINING AND DATA PATTERNS AND THEIR USE • Is it Data Mining? • Well, what is Data Mining? • Generically, Data Mining is looking for (i) patterns in data stored in (ii) databases as part of (iii) Knowledge Discovery • Core data analysis is not concerned with (ii) databases • Core data analysis is concerned with (ia) specific patterns in data as part of (iia) Knowledge Enhancing
  • 5. INTRO: EXAMPLE OF PATTERN DOUBLE SUCCESS 1 The History of Laws for planetary motion Double success Ptolemy (c. 150 a.d.): • Sun and planets • circle Earth • Does not match data well
  • 6. INTRO: EXAMPLE OF PATTERN DOUBLE SUCCESS 2 The History of Laws for planetary motion • Copernicus (c. 1540): • Planets circle the Sun • Does not match data well either
  • 7. INTRO: EXAMPLE OF PATTERN DOUBLE SUCCESS 3 Laws for planetary motion: Kepler (c. 1605): • 1st Law: Planets revolve around the Sun in ellipses (ovals) • 2nd Law: Speed changes: the farther from the Sun, the slower • Matches data well
  • 8. INTRO: EXAMPLE OF PATTERN DOUBLE SUCCESS 4
      Planet    Period (years)   Distance (average, relative to that of Earth)
      Mercury   0.241            0.39
      Venus     0.615            0.72
      Earth     1.00             1.00
      Mars      1.88             1.52
      Jupiter   11.8             5.20
      Saturn    29.5             9.54
      Uranus    84.0             19.18
      Neptune   165              30.06
      Pluto     248              39.44
      3rd Law: Is there any relation between speed/period and distance?
  • 9. INTRO: EXAMPLE OF PATTERN DOUBLE SUCCESS 5 Kepler’s 3rd Law: Is there any relation between speed/period and distance? No straight line fits…
  • 10. INTRO: EXAMPLE OF PATTERN DOUBLE SUCCESS 6 Kepler’s 3rd Law (1619): [J. Napier invented the logarithm (1614)] $\log P = \frac{3}{2} \log D$, that is, $P^2 = D^3$
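The log-log relation on this slide can be checked directly against the planet table from slide 8. The sketch below is only an illustration, not part of the original deck: it fits a least-squares line to (log D, log P) using the standard library, and the helper name `fit_loglog` is an assumption introduced here.

```python
import math

# (period in years, mean distance relative to Earth's), copied from the slide's table.
planets = {
    "Mercury": (0.241, 0.39),
    "Venus": (0.615, 0.72),
    "Earth": (1.00, 1.00),
    "Mars": (1.88, 1.52),
    "Jupiter": (11.8, 5.20),
    "Saturn": (29.5, 9.54),
    "Uranus": (84.0, 19.18),
    "Neptune": (165, 30.06),
    "Pluto": (248, 39.44),
}


def fit_loglog(pairs):
    """Least-squares fit of log(P) = slope * log(D) + intercept."""
    xs = [math.log(d) for _, d in pairs]
    ys = [math.log(p) for p, _ in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_loglog(list(planets.values()))
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A slope close to 3/2 with an intercept close to 0 corresponds to $P^2 = D^3$; no straight line fits the raw values, but one fits their logarithms.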
  • 11. INTRO: EXAMPLE OF PATTERN DOUBLE SUCCESS 7 Kepler’s three laws: What is so grand? Substantiated theoretically by R. Hooke (1635-1703) and I. Newton (1642-1727) UNIVERSAL GRAVITATION LAW Mathematical equation, cornerstone of modern science
  • 12. INTRO: EXAMPLE OF PATTERN FAILURE? 1 Imagine this: Broad Street, Soho, London, cholera outbreak, September 1854. Dr. Snow’s report: “On proceeding to the spot, I found that nearly all the deaths had taken place within a short distance of the pump.” Dr John Snow’s map: cases of death are labeled by ticks. The handle of the pump was removed on 7/9/1854.
  • 13. INTRO: EXAMPLE OF PATTERN FAILURE? 2 Myth: Deaths stopped. Data analysis won. Fact: Data analysis lost. The health commission rejected the water pump theory as contradicting the science of the day (cholera outbreak caused by “concentrated noxious atmospheric influence, no doubt emanating from putrefying organic matter”). The handle of the pump was ordered to be put back. Deaths stopped because all had already died. More deaths occurred in further cholera outbreaks until R. Koch discovered and publicized Vibrio cholerae in 1883. Dr John Snow’s map: each case of death is labeled by a tick.
  • 14. PATTERN FOUND Success: if compatible with existing knowledge Failure: if not compatible with existing knowledge Advice • Find a pattern • Interpret it using existing knowledge • Do not worry whether the interpretation is compatible
  • 15. INTRO: WHAT IS CORE DATA ANALYSIS II 1 • Core data analysis is concerned with (ia) specific patterns in data as part of (iia) Knowledge Enhancing • What are these (ia), (iia) specifics? • They have something to do with the notion of Knowledge • Statements of fact (“I teach this class.”): factual • Statements of pattern, regularity (“Professors teach classes.”): structural
  • 16. INTRO: WHAT IS CORE DATA ANALYSIS II 2 • Core data analysis is concerned with (ia) specific patterns in data as part of (iia) Knowledge Enhancing • (ia), (iia) specifics relate to elements of structural knowledge • Elements of structural knowledge: • Concepts (“Professor”, “Teach”, “Class”) • Statements of relation between concepts (“Professors teach classes.”): structural
  • 17. INTRO: WHAT IS CORE DATA ANALYSIS II • List the elements of structural knowledge, • concepts and • statements of relation among them, for • Kepler’s 3rd Law • Dr Snow’s cholera outbreak map
  • 18. INTRO: WHAT IS CORE DATA ANALYSIS II 3 • Core data analysis is concerned with (ia) deriving concepts and statements of relation between them from data • (iia) Structural Knowledge Enhancing, generically, via either of the two pathways • Two pathways for Structural Knowledge Enhancing • Summarization: Developing Concepts • Correlation: Deriving Statements of relation between concepts
  • 19. INTRO: WHAT IS CORE DATA ANALYSIS II 4 • Two pathways for Structural Knowledge Enhancing • Summarization: Developing Concepts • Correlation: Deriving Statements of relation between concepts • Two major formats: • Quantitative (both concepts and statements): Kepler’s 3rd Law, $\text{Period}^2 = \text{Distance}^3$ • Categorical (both concepts and statements): Dr Snow’s conclusion: cholera death is caused by pump water
  • 20. INTRO II: STRUCTURAL KNOWLEDGE ENHANCING GENERIC METHODS • Two pathways & Two formats • Summarization methods: • Quantitative: Principal component analysis (PCA) • Categorical: Cluster analysis • Correlation methods: • Quantitative: Regression • Categorical: Classifier
  • 21. INTRO II: THREE POSSIBLE LAYERS OF STUDY
      Layer      Pro                               Con
      Systems    Usable now, Simple                Short lived, Too many
      Concepts   Awareness                         Superficial
      Methods    Workable, Extendable, Long-term   Technical, Boring
  • 22. INTRO II: COURSE CONTENTS REVIEW • Summarization: PCA (Weeks 6 and 7), Cluster analysis (Week 8) • Correlation: Classifier (Week 5), (no Regression, sorry; if needed, go to Statistics, Econometrics and Neural Networks courses) • Prequel: 1D and 2D analyses to study basic concepts and basic methods • Pre-prequel: Intro: Data and problems
  • 23. INTRO II: RELATION TO OTHER APPROACHES • Classical mathematical statistics: data is just a vehicle to fit and test mathematical models in the applied domain (say, in data analysis a feature is a column in a table; in statistics it is modeled as a random variable!) • Machine Learning: prediction rules to be built incrementally (say, here PCA is a major method; in machine learning it is just a method to preprocess the data) • Data Mining: adding new knowledge by finding interesting patterns in databases, which is the initial stage of knowledge discovery (CDA is part of that, except for the databases) OVERALL: METHODS ARE THE SAME, PERSPECTIVES DIFFER
  • 24. INTRO III: VISUALIZATION • Visualization of data is an important activity assisting data analysis by a human in many ways including A. Highlighting B. Integrating different aspects C. Manipulating (not shown) A few examples follow.
  • 25. INTRO III: VISUALIZATION A. Highlighting 1 Figure 1. A fragment of the London Tube map made after H. Beck (1933); the central part is highlighted by disproportionate scaling. Rejected by the authorities for a long while, it is now a standard for metro maps worldwide.
  • 26. INTRO III: VISUALIZATION A. Highlighting 2: Cheating by distortion Figure 2. A decline in the relative number of general practitioners in California in the 1970s is conveniently visualized by scaling a picture of a doctor by its 1D size rather than by its 2D area.
  • 27. INTRO III: VISUALIZATION A. Highlighting 3: Cheating by distortion Figure 3. Another unintended distortion: a newspaper’s self-satisfaction report (July 2005) is visualized with bars that start at the 500,000 mark rather than at 0. A 25% advantage has visually grown ten-fold!
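The arithmetic behind the truncated-axis effect in Figure 3 can be sketched as follows. The slide does not give the newspaper's actual figures, so the two values below are hypothetical, chosen only to reproduce the proportions it describes (a 25% advantage, bars starting at 500,000); the function name `bar_ratio` is likewise an assumption.

```python
def bar_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights when the value axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


# Hypothetical values (not from the slide): `a` is about 25% larger than `b`.
a, b = 642_857, 514_286

true_ratio = bar_ratio(a, b)                      # axis starts at 0
visual_ratio = bar_ratio(a, b, baseline=500_000)  # axis starts at 500,000

print(f"true ratio:   {true_ratio:.2f}")
print(f"visual ratio: {visual_ratio:.2f}")
```

With these assumed values the honest ratio is about 1.25, while the bars drawn from the 500,000 mark differ by a factor of about 10.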
  • 28. INTRO III: VISUALIZATION B. Integrating aspects 1 Figure 4. Con Edison company’s power grid screen over Manhattan NY. Grid repair problems are dealt with on the fly by sending operators upon seeing disorders on the screen.
  • 29. INTRO III: VISUALIZATION B. Integrating aspects 2 Figure 5. Minard’s (1869) depiction of Napoleon’s lost campaign of 1812, integrating space, time and strength of the French army.
  • 30. INTRO III: VISUALIZATION B. Integrating aspects 3 Figure 6. The structure of research activities of CENTRIA (UNL, Lisbon) in 2007 represented over ACM Computer Subjects Classification 1998.
  • 31. INTRO IV: ILLUSTRATIVE DATA CASES Case 1: Companies 1
      Company name   Income, $mln   MShare, %   NSup   EC    Sector
      Aversiona      19.0           43.7        2      No    Utility
      Antyops        29.4           36.0        3      No    Utility
      Astonite       23.9           38.0        3      No    Industrial
      Bayermart      18.4           27.9        2      Yes   Utility
      Breaktops      25.7           22.3        3      Yes   Industrial
      Bumchista      12.1           16.9        2      Yes   Industrial
      Civiok         23.9           30.2        4      Yes   Retail
      Cyberdam       27.2           58.0        5      Yes   Retail
      Companies characterized by mixed scale features; the first three companies make product A, the next three product B, and the last two product C.
      Metadata: A. Features and domain knowledge: 1) Income, $ mln; 2) MShare - market share, per cent; 3) NSup - number of principal suppliers; 4) ECommerce - Yes (e-trade) or No; 5) Sector - (a) Retail, (b) Utility, and (c) Industrial. B. Main production (A, B, C). C. Feature scale types (3 main types).
  • 32. INTRO IV: ILLUSTRATIVE DATA CASES Case 1: Companies 2 Metadata: A. Features and Domain knowledge 1) Income, $ Mln; 2) Mshare - Market share , per cent; 3) NSup - Number of principal suppliers; 4) ECommerce - Yes e-trade or No; 5) Sector - (a) Retail, (b) Utility, and (c) Industrial. Feature: Maps entities to feature values (Synonyms: Variable, Attribute, Character, Parameter) Feature. Quantitative scale: Arithmetic averaging makes sense Examples: 1) Income, 2) Mshare, 3) NSup
  • 33. INTRO IV: ILLUSTRATIVE DATA CASES Case 1: Companies 3 Metadata: A. Features and Domain knowledge 1) Income, $ Mln; 2) Mshare - Market share, per cent; 3) NSup - Number of principal suppliers; 4) ECommerce - Yes e-trade or No; 5) Sector - (a) Retail, (b) Utility, and (c) Industrial. Feature. Nominal scale: Disjoint categories; only the comparison “equal or not” makes sense (a special case of categorical scales). Example: 5) Sector (Retail, Utility, Industrial are its values). Feature. Binary scale: Two disjoint categories, “Yes” and “No”; shares properties of both the nominal scale and, if 1/0 coded, the quantitative scale. Example: 4) ECommerce
  • 34. INTRO IV: QUANTITATIVE CODING Case 1: Companies 4
      Company name   Income, $mln   MShare, %   NSup   EC    Sector
      Aversiona      19.0           43.7        2      No    Utility
      Antyops        29.4           36.0        3      No    Utility
      Astonite       23.9           38.0        3      No    Industrial
      Bayermart      18.4           27.9        2      Yes   Utility
      Breaktops      25.7           22.3        3      Yes   Industrial
      Bumchista      12.1           16.9        2      Yes   Industrial
      Civiok         23.9           30.2        4      Yes   Retail
      Cyberdam       27.2           58.0        5      Yes   Retail
      Quantitative coding: each category is made into a 1/0 binary (dummy) feature: “Does it hold? 1 if Yes, 0 if No.”
      Entity   Income   MShare   NSup   EC?   Util?   Indu?   Retail?
      1        19.0     43.7     2      0     1       0       0
      2        29.4     36.0     3      0     1       0       0
      3        23.9     38.0     3      0     0       1       0
      4        18.4     27.9     2      1     1       0       0
      5        25.7     22.3     3      1     0       1       0
      6        12.1     16.9     2      1     0       1       0
      7        23.9     30.2     4      1     0       0       1
      8        27.2     58.0     5      1     0       0       1
      Company data 8x5 converted to the quantitative format 8x7
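The conversion described on this slide can be sketched in a few lines. This is an illustrative sketch rather than the author's code: the data are the eight companies from the slide, while the names `SECTORS`, `encode` and `coded` are assumptions introduced here.

```python
# Company data from the slide: name, income ($mln), market share (%),
# number of principal suppliers, e-commerce (Yes/No), sector.
companies = [
    ("Aversiona", 19.0, 43.7, 2, "No", "Utility"),
    ("Antyops", 29.4, 36.0, 3, "No", "Utility"),
    ("Astonite", 23.9, 38.0, 3, "No", "Industrial"),
    ("Bayermart", 18.4, 27.9, 2, "Yes", "Utility"),
    ("Breaktops", 25.7, 22.3, 3, "Yes", "Industrial"),
    ("Bumchista", 12.1, 16.9, 2, "Yes", "Industrial"),
    ("Civiok", 23.9, 30.2, 4, "Yes", "Retail"),
    ("Cyberdam", 27.2, 58.0, 5, "Yes", "Retail"),
]

# Column order of the dummy features, matching the slide: Util?, Indu?, Retail?
SECTORS = ["Utility", "Industrial", "Retail"]


def encode(row):
    """Turn one mixed-scale row into a fully quantitative row of 7 features."""
    _name, income, mshare, nsup, ec, sector = row
    ec_flag = 1 if ec == "Yes" else 0                       # binary feature, 1/0 coded
    dummies = [1 if sector == s else 0 for s in SECTORS]    # one dummy per category
    return [income, mshare, nsup, ec_flag] + dummies


coded = [encode(row) for row in companies]
for entity, row in enumerate(coded, start=1):
    print(entity, row)
```

The three quantitative features pass through unchanged, the binary feature becomes one 1/0 column, and the three-valued nominal feature becomes three 1/0 columns, which is how 5 features become 7.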
  • 35. INTRO IV: ILLUSTRATIVE DATA CASES Case 1: Companies 5 Data analysis on the company table: • How to map companies to the screen with their similarity reflected in distances between points? (Summarization/visualization) • Would clustering of companies reflect the product? What features would be involved then? (Summarization) • Can rules be derived to predict the product for another company, coming from outside the table? (Correlation) • Is there any relation between the structural features (NSup, EC, Sector) and the market-related features (Income, MShare)? (Correlation)
  • 36. INTRO IV: ILLUSTRATIVE DATA CASES Case 2: Iris 1 Anderson–Fisher Iris 150x4 data of three taxa. Specimens (1-150) by taxon: 1-50 Iris setosa (diploid); 51-100 Iris versicolor (tetraploid); 101-150 Iris virginica (hexaploid). Features: w1 Sepal length; w2 Sepal width; w3 Petal length; w4 Petal width
  • 37. INTRO IV: DATA CASES Case 2: Iris 2
      #    I Iris setosa         II Iris versicolor    III Iris virginica
           w1   w2   w3   w4     w1   w2   w3   w4     w1   w2   w3   w4
      1    5.1  3.5  1.4  0.3    6.4  3.2  4.5  1.5    6.3  3.3  6.0  2.5
      2    4.4  3.2  1.3  0.2    5.5  2.4  3.8  1.1    6.7  3.3  5.7  2.1
      3    4.4  3.0  1.3  0.2    5.7  2.9  4.2  1.3    7.2  3.6  6.1  2.5
      4    5.0  3.5  1.6  0.6    5.7  3.0  4.2  1.2    7.7  3.8  6.7  2.2
      5    5.1  3.8  1.6  0.2    5.6  2.9  3.6  1.3    7.2  3.0  5.8  1.6
      6    4.9  3.1  1.5  0.2    7.0  3.2  4.7  1.4    7.4  2.8  6.1  1.9
      7    5.0  3.2  1.2  0.2    6.8  2.8  4.8  1.4    7.6  3.0  6.6  2.1
      8    4.6  3.2  1.4  0.2    6.1  2.8  4.7  1.2    7.7  2.8  6.7  2.0
      9    5.0  3.3  1.4  0.2    4.9  2.4  3.3  1.0    6.2  3.4  5.4  2.3
      50   5.1  3.5  1.4  0.2    6.0  2.2  4.0  1.0    6.5  3.2  5.1  2.0
      Data analysis • Visualise the data so that similar specimens are mapped to points that are near each other, and dissimilar ones to points that are far apart • Build a predictor of sepal sizes from the petal sizes (to lessen the burden of measurement) • Build a predictor of taxa (classifier) based on the petal/sepal sizes
  • 38. INTRO IV: DATA CASES Case 3: Intrusion attack 1
      Features: 1) Pr, the protocol-type, which is either tcp or icmp or udp (a nominal feature); 2) BySD, the number of data bytes from source to destination; 3) SH, the number of connections to the same host as the current one in the past two seconds; 4) SS, the number of connections to the same service as the current one in the past two seconds; 5) SE, the rate of connections (per cent in SHCo) that have SYN errors; 6) RE, the rate of connections (per cent in SHCo) that have REJ errors; 7) A, the type of attack (ap - apache, sa - saint, sm - smurf, and no attack) (a nominal feature)
      Pr    BySD    SH   SS   SE     RE     A
      Tcp   62344   16   16   0      0.94   Ap
      Tcp   60884   17   17   0.06   0.88   Ap
      Tcp   59424   18   18   0.06   0.89   Ap
      Tcp   59424   19   19   0.05   0.89   Ap
      Tcp   59424   20   20   0.05   0.9    Ap
      Tcp   75484   21   21   0.05   0.9    Ap
      Tcp   287     14   14   0      0      no
      Tcp   308     1    1    0      0      no
      Tcp   284     5    5    0      0      no
      Udp   105     2    2    0      0      no
      Udp   105     2    2    0      0      no
      Udp   105     2    2    0      0      no
  • 39. INTRO IV: DATA CASES Case 3: Intrusion attack 2 Data analysis on the intrusion table: • Build a classifier to judge whether the system functions normally or is under attack (Correlation); • Is there any relation between the protocol and the type of attack? (Correlation); • Visualize the data reflecting similarity of the patterns (Summarization).
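For two nominal features, such as protocol and attack type, a natural first step is a contingency (cross-classification) table of co-occurrence counts. The sketch below is an illustration under stated assumptions: it uses only the twelve rows shown in the Case 3 excerpt, and the variable names are introduced here, not taken from the deck.

```python
from collections import Counter

# (protocol, attack) pairs for the twelve connections shown in the excerpt.
records = [
    ("tcp", "ap"), ("tcp", "ap"), ("tcp", "ap"),
    ("tcp", "ap"), ("tcp", "ap"), ("tcp", "ap"),
    ("tcp", "no"), ("tcp", "no"), ("tcp", "no"),
    ("udp", "no"), ("udp", "no"), ("udp", "no"),
]

crosstab = Counter(records)                      # joint counts
protocols = sorted({p for p, _ in records})
attacks = sorted({a for _, a in records})

# Print the contingency table: rows are protocols, columns are attack types.
print("Pr".ljust(6) + "".join(a.ljust(6) for a in attacks))
for p in protocols:
    print(p.ljust(6) + "".join(str(crosstab[(p, a)]).ljust(6) for a in attacks))
```

In this small excerpt every apache attack arrives over tcp and no udp connection is an attack; twelve rows are far too few to draw a conclusion, but the same table built on the full data set is the starting point for the correlation question above.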
  • 40. TOPICS COVERED: 1. Data Mining and data patterns and their use: if you find a pattern, interpret it! 2. Knowledge Enhancing: summarize to concepts, correlate to statements of relation. 3. Visualize: to highlight or integrate aspects. 4. Illustrative data cases: concept of feature, feature scale, data table, data analysis problem.
