SlideShare a Scribd company logo
1 of 23
Download to read offline
Matters of Discussion
Introduction to Data & Analytics
Getting to Know your data and dataset.
Analytic case
1
Compiled for Nasscom Associate Analytics
Types of Data Sets
2
 Record
 Relational records
 Data matrix, e.g., numerical matrix,
crosstabs
 Document data: text documents: term-
frequency vector
 Transaction data
 Graph and network
 World Wide Web
 Social or information networks
 Molecular Structures
 Ordered
 Video data: sequence of images
 Temporal data: time-series
 Sequential Data: transaction sequences
 Genetic sequence data
 Spatial, image and multimedia:
 Spatial data: maps
 Image data:
 Video data:
Data Objects
 Data sets are made up of data objects.
 A data object represents an entity.
 Examples:
 sales database: customers, store items, sales
 medical database: patients, treatments
 university database: students, professors, courses
 Also called samples , examples, instances, data points,
objects, tuples.
 Data objects are described by attributes.
 Database rows -> objects; columns ->attributes.
3
Compiled for Nasscom
Attributes
 Attribute (or dimensions, features,
variables): a data field, representing a
characteristic or feature of a data object.
 E.g., customer _ID, name, address
 Types:
 Nominal
 Binary
 Numeric: quantitative
 Interval-scaled
 Ratio-scaled
4
Compiled for Nasscom Associate Analytics
Attribute Types
 Nominal: categories, states, or “names of things”
 Hair_color = {auburn, black, blond, brown, grey, red, white}
 marital status, occupation, zip codes
 A variable with values which have no numerical value
 Binary
 Nominal attribute with only 2 states (0 and 1)
 Symmetric binary: both outcomes equally important
 e.g., gender
 Asymmetric binary: outcomes not equally important.
 e.g., medical test (positive vs. negative)
 Convention: assign 1 to most important outcome (e.g., HIV
positive)
 Ordinal
 Values have a meaningful order (ranking) but magnitude
between successive values is not known.
 Size = {small, medium, large}, grades, army rankings
 A variable with values which have no numerical value
5
Numeric Attribute Types
 Quantity (integer or real-valued)
 Interval
 Measured on a scale of equal-sized units
 Values have order
 E.g., temperature in C˚or F˚, calendar dates
 No true zero-point
 Ratio
 Inherent zero-point
 We can speak of values as being an order of
magnitude larger than the unit of measurement
(10 K˚ is twice as high as 5 K˚).
 e.g., temperature in Kelvin, length, counts,
monetary quantities
6
Compiled for Nasscom Associate Analytics
Discrete vs. Continuous Attributes
 Discrete Attribute
 Has only a finite or countably infinite set of values
 E.g., zip codes, profession, or the set of words in
a collection of documents
 Sometimes, represented as integer variables
 Note: Binary attributes are a special case of discrete
attributes
 Continuous Attribute
 Has real numbers as attribute values
 E.g., temperature, height, or weight
 Practically, real values can only be measured and
represented using a finite number of digits
 Continuous attributes are typically represented as
floating-point variables
7
Compiled for Nasscom Associate Analytics
• Document database
 A document can be represented by thousands of
attributes, each recording the frequency of a particular
word (such as keywords) or phrase in the document.
8
Compiled for Nasscom Associate Analytics
Time series Data
 A time series is a series of data points
indexed (or listed or graphed) in time order.
 Most commonly, a time series is a sequence
taken at successive equally spaced points in
time.
 Thus it is a sequence of discrete-time data.
 Examples of time series are heights of ocean
tides, USD value, market share value, many
more..
9
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
10
Example of Time Series Data
11
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Field Example topics
Economics Gross Domestic Product (GDP), Consumer Price Index (CPI), S&P 500
Index, and unemployment rates
Social
sciences
Birth rates, population, migration data, political indicators
Epidemiology Disease rates, mortality rates, mosquito populations
Medicine Blood pressure tracking, weight tracking, cholesterol measurements,
heart rate monitoring
Physical
sciences
Global temperatures, monthly sunspot observations, pollution levels.
Time Series Components
12
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Time Series Components – cont..
13
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Time Series Components---cont..
 Trend:- general tendency of the data to increase or
decrease during a long period of time.
 'long term' movement in a time series without calendar
related and irregular effects.
 population growth, price inflation and general economic
changes.
14
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Time Series Components—cont..
How do we identify seasonality or seasonal pattern?
 With respect to calendar related effects.
 large seasonal increase in December retail sales due
to Christmas shopping.
 magnitude of the seasonal component increases
over time , as the trend does.
15
periodic time series
Time Series Components--Cycle
 A cyclic pattern exists when data exhibit rises and
falls that are not of fixed period.
 The duration of these fluctuations is usually of at
least 2 years.
 If the fluctuations are not of fixed period then they
are cyclic else seasonal.
16
Within the time
interval,
the fluctuations
are not fixed
Time Series Components-summary
These components are defined as follows:
 Level: The average value in the series.
 Trend: The increasing or decreasing value in
the series.
 Seasonality: The repeating short-term cycle
in the series.
 Cyclic: data exhibit rises and falls that are not
of fixed period
 Noise: The random variation in the series.
17
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Specific Data Analytic case
18
Compiled for Nasscom Associate Analytics
19
Cosine Similarity in Data Analytic Apps
• A document can be represented by thousands of attributes, each
recording the frequency of a particular word (such as keywords) or
phrase in the document.
• Other vector objects: gene features in micro-arrays, …
• Applications: information retrieval, biologic taxonomy, gene
feature mapping, ...
• Cosine measure: If d1 and d2 are two vectors (e.g., term-frequency
vectors), then
cos(d1, d2) = (d1  d2) /||d1|| ||d2|| ,
where  indicates vector dot product, ||d||: the length of
vector d
20
Example: Cosine Similarity
• cos(d1, d2) = (d1  d2) /||d1|| ||d2|| ,
where  indicates vector dot product, ||d|: the length of vector d
• Ex: Find the similarity between documents 1 and 2.
d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)
d1d2 = 5*3+0*0+3*2+0*0+2*1+0*1+0*1+2*1+0*0+0*1 = 25
||d1||= (5*5+0*0+3*3+0*0+2*2+0*0+0*0+2*2+0*0+0*0)0.5=(42)0.5 =
6.481
||d2||= (3*3+0*0+2*2+0*0+1*1+1*1+0*0+1*1+0*0+1*1)0.5=(17)0.5
= 4.12
cos(d1, d2 ) = 0.94
TASK FOR YOU[A2]
1. Investigate the attributes, dimensions,
features, or variables with a suitable
scenario and prepare your critical report.
 nominal
 binary
 ordinal
 numeric: quantitative
21
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
TASK FOR YOU [A3]
2. Investigate the numerous time series
components in context to the business
analytics and application modeling.
22
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Cheers For the Great Patience!
Query Please?
23
Compiled for Nasscom Associate Analytics

More Related Content

Similar to 1.1. Associate Analytics.pptx

Introduction to Data Science With R Notes
Introduction to Data Science With R NotesIntroduction to Data Science With R Notes
Introduction to Data Science With R NotesLakshmiSarvani6
 
Data Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdfData Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdfRAJVEERKUMAR41
 
Using timeseries extraction the mining.ppt
Using timeseries extraction the mining.pptUsing timeseries extraction the mining.ppt
Using timeseries extraction the mining.pptjohnsaida3101
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptJapnaamKaurAhluwalia
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptRajnishSingh367990
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptAkhtaruzzamanlimon1
 
Data Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityData Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityIkbal Ahmed
 
DATA UNIT-3.pptx
DATA UNIT-3.pptxDATA UNIT-3.pptx
DATA UNIT-3.pptxbibha737
 
Probability and statistics
Probability and statisticsProbability and statistics
Probability and statisticsCyrus S. Koroma
 
Business statistics review
Business statistics reviewBusiness statistics review
Business statistics reviewFELIXARCHER
 
Data What Type Of Data Do You Have V2.1
Data   What Type Of Data Do You Have V2.1Data   What Type Of Data Do You Have V2.1
Data What Type Of Data Do You Have V2.1TimKasse
 
Introduction to Statistics and Arithmetic Mean
Introduction to Statistics and Arithmetic MeanIntroduction to Statistics and Arithmetic Mean
Introduction to Statistics and Arithmetic MeanMamatha Upadhya
 
BIOSTATISTICS (MPT) 11 (1).pptx
BIOSTATISTICS (MPT) 11 (1).pptxBIOSTATISTICS (MPT) 11 (1).pptx
BIOSTATISTICS (MPT) 11 (1).pptxVaishnaviElumalai
 

Similar to 1.1. Associate Analytics.pptx (20)

Introduction to Data Science With R Notes
Introduction to Data Science With R NotesIntroduction to Data Science With R Notes
Introduction to Data Science With R Notes
 
Time series.ppt
Time series.pptTime series.ppt
Time series.ppt
 
Data Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdfData Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdf
 
02 data
02 data02 data
02 data
 
Data Types
Data TypesData Types
Data Types
 
Using timeseries extraction the mining.ppt
Using timeseries extraction the mining.pptUsing timeseries extraction the mining.ppt
Using timeseries extraction the mining.ppt
 
Survey data & sampling
Survey data & samplingSurvey data & sampling
Survey data & sampling
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.ppt
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.ppt
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.ppt
 
Data Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityData Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & Normality
 
DATA UNIT-3.pptx
DATA UNIT-3.pptxDATA UNIT-3.pptx
DATA UNIT-3.pptx
 
EDA
EDAEDA
EDA
 
Probability and statistics
Probability and statisticsProbability and statistics
Probability and statistics
 
Business statistics review
Business statistics reviewBusiness statistics review
Business statistics review
 
Statistics Reference Book
Statistics Reference BookStatistics Reference Book
Statistics Reference Book
 
02Data-osu-0829.pdf
02Data-osu-0829.pdf02Data-osu-0829.pdf
02Data-osu-0829.pdf
 
Data What Type Of Data Do You Have V2.1
Data   What Type Of Data Do You Have V2.1Data   What Type Of Data Do You Have V2.1
Data What Type Of Data Do You Have V2.1
 
Introduction to Statistics and Arithmetic Mean
Introduction to Statistics and Arithmetic MeanIntroduction to Statistics and Arithmetic Mean
Introduction to Statistics and Arithmetic Mean
 
BIOSTATISTICS (MPT) 11 (1).pptx
BIOSTATISTICS (MPT) 11 (1).pptxBIOSTATISTICS (MPT) 11 (1).pptx
BIOSTATISTICS (MPT) 11 (1).pptx
 

Recently uploaded

IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsbaAdobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsbas73678sri
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
prediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approachprediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approachAdekunleJoseph4
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
Inference rules in artificial intelligence
Inference rules in artificial intelligenceInference rules in artificial intelligence
Inference rules in artificial intelligencePriyadharshiniG41
 
Adobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvwAdobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvws73678sri
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...ThinkInnovation
 
Film cover research.pptx for media courseowrk
Film cover research.pptx for media courseowrkFilm cover research.pptx for media courseowrk
Film cover research.pptx for media courseowrk494f574xmv
 
testingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdftestingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdfDSP Mutual Fund
 
Data Discovery With Power Query in excel
Data Discovery With Power Query in excelData Discovery With Power Query in excel
Data Discovery With Power Query in excelKapilSidhpuria3
 

Recently uploaded (19)

IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsbaAdobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
prediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approachprediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approach
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
Inference rules in artificial intelligence
Inference rules in artificial intelligenceInference rules in artificial intelligence
Inference rules in artificial intelligence
 
Adobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvwAdobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvw
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
 
Film cover research.pptx for media courseowrk
Film cover research.pptx for media courseowrkFilm cover research.pptx for media courseowrk
Film cover research.pptx for media courseowrk
 
testingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdftestingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdf
 
Data Discovery With Power Query in excel
Data Discovery With Power Query in excelData Discovery With Power Query in excel
Data Discovery With Power Query in excel
 

1.1. Associate Analytics.pptx

  • 1. Matters of Discussion Introduction to Data & Analytics Getting to Know your data and dataset. Analytic case 1 Compiled for Nasscom Associate Analytics
  • 2. Types of Data Sets 2  Record  Relational records  Data matrix, e.g., numerical matrix, crosstabs  Document data: text documents: term- frequency vector  Transaction data  Graph and network  World Wide Web  Social or information networks  Molecular Structures  Ordered  Video data: sequence of images  Temporal data: time-series  Sequential Data: transaction sequences  Genetic sequence data  Spatial, image and multimedia:  Spatial data: maps  Image data:  Video data:
  • 3. Data Objects  Data sets are made up of data objects.  A data object represents an entity.  Examples:  sales database: customers, store items, sales  medical database: patients, treatments  university database: students, professors, courses  Also called samples , examples, instances, data points, objects, tuples.  Data objects are described by attributes.  Database rows -> objects; columns ->attributes. 3 Compiled for Nasscom
  • 4. Attributes  Attribute (or dimensions, features, variables): a data field, representing a characteristic or feature of a data object.  E.g., customer _ID, name, address  Types:  Nominal  Binary  Numeric: quantitative  Interval-scaled  Ratio-scaled 4 Compiled for Nasscom Associate Analytics
  • 5. Attribute Types  Nominal: categories, states, or “names of things”  Hair_color = {auburn, black, blond, brown, grey, red, white}  marital status, occupation, zip codes  A variable with values which have no numerical value  Binary  Nominal attribute with only 2 states (0 and 1)  Symmetric binary: both outcomes equally important  e.g., gender  Asymmetric binary: outcomes not equally important.  e.g., medical test (positive vs. negative)  Convention: assign 1 to most important outcome (e.g., HIV positive)  Ordinal  Values have a meaningful order (ranking) but magnitude between successive values is not known.  Size = {small, medium, large}, grades, army rankings  A variable with values which have no numerical value 5
  • 6. Numeric Attribute Types  Quantity (integer or real-valued)  Interval  Measured on a scale of equal-sized units  Values have order  E.g., temperature in C˚or F˚, calendar dates  No true zero-point  Ratio  Inherent zero-point  We can speak of values as being an order of magnitude larger than the unit of measurement (10 K˚ is twice as high as 5 K˚).  e.g., temperature in Kelvin, length, counts, monetary quantities 6 Compiled for Nasscom Associate Analytics
  • 7. Discrete vs. Continuous Attributes  Discrete Attribute  Has only a finite or countably infinite set of values  E.g., zip codes, profession, or the set of words in a collection of documents  Sometimes, represented as integer variables  Note: Binary attributes are a special case of discrete attributes  Continuous Attribute  Has real numbers as attribute values  E.g., temperature, height, or weight  Practically, real values can only be measured and represented using a finite number of digits  Continuous attributes are typically represented as floating-point variables 7 Compiled for Nasscom Associate Analytics
  • 8. • Document database  A document can be represented by thousands of attributes, each recording the frequency of a particular word (such as keywords) or phrase in the document. 8 Compiled for Nasscom Associate Analytics
  • 9. Time series Data  A time series is a series of data points indexed (or listed or graphed) in time order.  Most commonly, a time series is a sequence taken at successive equally spaced points in time.  Thus it is a sequence of discrete-time data.  Examples of time series are heights of ocean tides, USD value, market share value, many more.. 9 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 10. 10
  • 11. Example of Time Series Data 11 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan] Field Example topics Economics Gross Domestic Product (GDP), Consumer Price Index (CPI), S&P 500 Index, and unemployment rates Social sciences Birth rates, population, migration data, political indicators Epidemiology Disease rates, mortality rates, mosquito populations Medicine Blood pressure tracking, weight tracking, cholesterol measurements, heart rate monitoring Physical sciences Global temperatures, monthly sunspot observations, pollution levels.
  • 12. Time Series Components 12 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 13. Time Series Components – cont.. 13 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 14. Time Series Components---cont..  Trend:- general tendency of the data to increase or decrease during a long period of time.  'long term' movement in a time series without calendar related and irregular effects.  population growth, price inflation and general economic changes. 14 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 15. Time Series Components—cont.. How do we identify seasonality or seasonal pattern?  With respect to calendar related effects.  large seasonal increase in December retail sales due to Christmas shopping.  magnitude of the seasonal component increases over time , as the trend does. 15 periodic time series
  • 16. Time Series Components--Cycle  A cyclic pattern exists when data exhibit rises and falls that are not of fixed period.  The duration of these fluctuations is usually of at least 2 years.  If the fluctuations are not of fixed period then they are cyclic else seasonal. 16 Within the time interval, the fluctuations are not fixed
  • 17. Time Series Components-summary These components are defined as follows:  Level: The average value in the series.  Trend: The increasing or decreasing value in the series.  Seasonality: The repeating short-term cycle in the series.  Cyclic: data exhibit rises and falls that are not of fixed period  Noise: The random variation in the series. 17 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 18. Specific Data Analytic case 18 Compiled for Nasscom Associate Analytics
  • 19. 19 Cosine Similarity in Data Analytic Apps • A document can be represented by thousands of attributes, each recording the frequency of a particular word (such as keywords) or phrase in the document. • Other vector objects: gene features in micro-arrays, … • Applications: information retrieval, biologic taxonomy, gene feature mapping, ... • Cosine measure: If d1 and d2 are two vectors (e.g., term-frequency vectors), then cos(d1, d2) = (d1  d2) /||d1|| ||d2|| , where  indicates vector dot product, ||d||: the length of vector d
  • 20. 20 Example: Cosine Similarity • cos(d1, d2) = (d1  d2) /||d1|| ||d2|| , where  indicates vector dot product, ||d|: the length of vector d • Ex: Find the similarity between documents 1 and 2. d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0) d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1) d1d2 = 5*3+0*0+3*2+0*0+2*1+0*1+0*1+2*1+0*0+0*1 = 25 ||d1||= (5*5+0*0+3*3+0*0+2*2+0*0+0*0+2*2+0*0+0*0)0.5=(42)0.5 = 6.481 ||d2||= (3*3+0*0+2*2+0*0+1*1+1*1+0*0+1*1+0*0+1*1)0.5=(17)0.5 = 4.12 cos(d1, d2 ) = 0.94
  • 21. TASK FOR YOU[A2] 1. Investigate the attributes, dimensions, features, or variables with a suitable scenario and prepare your critical report.  nominal  binary  ordinal  numeric: quantitative 21 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 22. TASK FOR YOU [A3] 2. Investigate the numerous time series components in context to the business analytics and application modeling. 22 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 23. Cheers For the Great Patience! Query Please? 23 Compiled for Nasscom Associate Analytics