SlideShare a Scribd company logo
1 of 23
Matters of Discussion
Introduction to Data & Analytics
Getting to Know your data and dataset.
Analytic case
1
Compiled for Nasscom Associate Analytics
Types of Data Sets
2
 Record
 Relational records
 Data matrix, e.g., numerical matrix,
crosstabs
 Document data: text documents: term-
frequency vector
 Transaction data
 Graph and network
 World Wide Web
 Social or information networks
 Molecular Structures
 Ordered
 Video data: sequence of images
 Temporal data: time-series
 Sequential Data: transaction sequences
 Genetic sequence data
 Spatial, image and multimedia:
 Spatial data: maps
 Image data:
 Video data:
Data Objects
 Data sets are made up of data objects.
 A data object represents an entity.
 Examples:
 sales database: customers, store items, sales
 medical database: patients, treatments
 university database: students, professors, courses
 Also called samples , examples, instances, data points,
objects, tuples.
 Data objects are described by attributes.
 Database rows -> objects; columns ->attributes.
3
Compiled for Nasscom
Attributes
 Attribute (or dimensions, features,
variables): a data field, representing a
characteristic or feature of a data object.
 E.g., customer _ID, name, address
 Types:
 Nominal
 Binary
 Numeric: quantitative
 Interval-scaled
 Ratio-scaled
4
Compiled for Nasscom Associate Analytics
Attribute Types
 Nominal: categories, states, or “names of things”
 Hair_color = {auburn, black, blond, brown, grey, red, white}
 marital status, occupation, zip codes
 A variable with values which have no numerical value
 Binary
 Nominal attribute with only 2 states (0 and 1)
 Symmetric binary: both outcomes equally important
 e.g., gender
 Asymmetric binary: outcomes not equally important.
 e.g., medical test (positive vs. negative)
 Convention: assign 1 to most important outcome (e.g., HIV
positive)
 Ordinal
 Values have a meaningful order (ranking) but magnitude
between successive values is not known.
 Size = {small, medium, large}, grades, army rankings
 A variable with values which have no numerical value
5
Numeric Attribute Types
 Quantity (integer or real-valued)
 Interval
 Measured on a scale of equal-sized units
 Values have order
 E.g., temperature in C˚or F˚, calendar dates
 No true zero-point
 Ratio
 Inherent zero-point
 We can speak of values as being an order of
magnitude larger than the unit of measurement
(10 K˚ is twice as high as 5 K˚).
 e.g., temperature in Kelvin, length, counts,
monetary quantities
6
Compiled for Nasscom Associate Analytics
Discrete vs. Continuous Attributes
 Discrete Attribute
 Has only a finite or countably infinite set of values
 E.g., zip codes, profession, or the set of words in
a collection of documents
 Sometimes, represented as integer variables
 Note: Binary attributes are a special case of discrete
attributes
 Continuous Attribute
 Has real numbers as attribute values
 E.g., temperature, height, or weight
 Practically, real values can only be measured and
represented using a finite number of digits
 Continuous attributes are typically represented as
floating-point variables
7
Compiled for Nasscom Associate Analytics
• Document database
 A document can be represented by thousands of
attributes, each recording the frequency of a particular
word (such as keywords) or phrase in the document.
8
Compiled for Nasscom Associate Analytics
Time series Data
 A time series is a series of data points
indexed (or listed or graphed) in time order.
 Most commonly, a time series is a sequence
taken at successive equally spaced points in
time.
 Thus it is a sequence of discrete-time data.
 Examples of time series are heights of ocean
tides, USD value, market share value, many
more..
9
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
10
Example of Time Series Data
11
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Field Example topics
Economics Gross Domestic Product (GDP), Consumer Price Index (CPI), S&P 500
Index, and unemployment rates
Social
sciences
Birth rates, population, migration data, political indicators
Epidemiology Disease rates, mortality rates, mosquito populations
Medicine Blood pressure tracking, weight tracking, cholesterol measurements,
heart rate monitoring
Physical
sciences
Global temperatures, monthly sunspot observations, pollution levels.
Time Series Components
12
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Time Series Components – cont..
13
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Time Series Components---cont..
 Trend:- general tendency of the data to increase or
decrease during a long period of time.
 'long term' movement in a time series without calendar
related and irregular effects.
 population growth, price inflation and general economic
changes.
14
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Time Series Components—cont..
How do we identify seasonality or seasonal pattern?
 With respect to calendar related effects.
 large seasonal increase in December retail sales due
to Christmas shopping.
 magnitude of the seasonal component increases
over time , as the trend does.
15
periodic time series
Time Series Components--Cycle
 A cyclic pattern exists when data exhibit rises and
falls that are not of fixed period.
 The duration of these fluctuations is usually of at
least 2 years.
 If the fluctuations are not of fixed period then they
are cyclic else seasonal.
16
Within the time
interval,
the fluctuations
are not fixed
Time Series Components-summary
These components are defined as follows:
 Level: The average value in the series.
 Trend: The increasing or decreasing value in
the series.
 Seasonality: The repeating short-term cycle
in the series.
 Cyclic: data exhibit rises and falls that are not
of fixed period
 Noise: The random variation in the series.
17
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Specific Data Analytic case
18
Compiled for Nasscom Associate Analytics
19
Cosine Similarity in Data Analytic Apps
• A document can be represented by thousands of attributes, each
recording the frequency of a particular word (such as keywords) or
phrase in the document.
• Other vector objects: gene features in micro-arrays, …
• Applications: information retrieval, biologic taxonomy, gene
feature mapping, ...
• Cosine measure: If d1 and d2 are two vectors (e.g., term-frequency
vectors), then
cos(d1, d2) = (d1  d2) /||d1|| ||d2|| ,
where  indicates vector dot product, ||d||: the length of
vector d
20
Example: Cosine Similarity
• cos(d1, d2) = (d1  d2) /||d1|| ||d2|| ,
where  indicates vector dot product, ||d|: the length of vector d
• Ex: Find the similarity between documents 1 and 2.
d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)
d1d2 = 5*3+0*0+3*2+0*0+2*1+0*1+0*1+2*1+0*0+0*1 = 25
||d1||= (5*5+0*0+3*3+0*0+2*2+0*0+0*0+2*2+0*0+0*0)0.5=(42)0.5 =
6.481
||d2||= (3*3+0*0+2*2+0*0+1*1+1*1+0*0+1*1+0*0+1*1)0.5=(17)0.5
= 4.12
cos(d1, d2 ) = 0.94
TASK FOR YOU[A2]
1. Investigate the attributes, dimensions,
features, or variables with a suitable
scenario and prepare your critical report.
 nominal
 binary
 ordinal
 numeric: quantitative
21
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
TASK FOR YOU [A3]
2. Investigate the numerous time series
components in context to the business
analytics and application modeling.
22
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Cheers For the Great Patience!
Query Please?
23
Compiled for Nasscom Associate Analytics

More Related Content

Similar to 1.1. Associate Analytics.pptx

Introduction to Data Science With R Notes
Introduction to Data Science With R NotesIntroduction to Data Science With R Notes
Introduction to Data Science With R NotesLakshmiSarvani6
 
Data Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdfData Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdfRAJVEERKUMAR41
 
Using timeseries extraction the mining.ppt
Using timeseries extraction the mining.pptUsing timeseries extraction the mining.ppt
Using timeseries extraction the mining.pptjohnsaida3101
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptAkhtaruzzamanlimon1
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptJapnaamKaurAhluwalia
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptRajnishSingh367990
 
Data Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityData Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityIkbal Ahmed
 
DATA UNIT-3.pptx
DATA UNIT-3.pptxDATA UNIT-3.pptx
DATA UNIT-3.pptxbibha737
 
Probability and statistics
Probability and statisticsProbability and statistics
Probability and statisticsCyrus S. Koroma
 
Business statistics review
Business statistics reviewBusiness statistics review
Business statistics reviewFELIXARCHER
 
Data What Type Of Data Do You Have V2.1
Data   What Type Of Data Do You Have V2.1Data   What Type Of Data Do You Have V2.1
Data What Type Of Data Do You Have V2.1TimKasse
 
Introduction to Statistics and Arithmetic Mean
Introduction to Statistics and Arithmetic MeanIntroduction to Statistics and Arithmetic Mean
Introduction to Statistics and Arithmetic MeanMamatha Upadhya
 
BIOSTATISTICS (MPT) 11 (1).pptx
BIOSTATISTICS (MPT) 11 (1).pptxBIOSTATISTICS (MPT) 11 (1).pptx
BIOSTATISTICS (MPT) 11 (1).pptxVaishnaviElumalai
 

Similar to 1.1. Associate Analytics.pptx (20)

Introduction to Data Science With R Notes
Introduction to Data Science With R NotesIntroduction to Data Science With R Notes
Introduction to Data Science With R Notes
 
Time series.ppt
Time series.pptTime series.ppt
Time series.ppt
 
Data Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdfData Science and Machine learning-Lect01.pdf
Data Science and Machine learning-Lect01.pdf
 
02 data
02 data02 data
02 data
 
Data Types
Data TypesData Types
Data Types
 
Using timeseries extraction the mining.ppt
Using timeseries extraction the mining.pptUsing timeseries extraction the mining.ppt
Using timeseries extraction the mining.ppt
 
Survey data & sampling
Survey data & samplingSurvey data & sampling
Survey data & sampling
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.ppt
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.ppt
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.ppt
 
Data Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityData Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & Normality
 
DATA UNIT-3.pptx
DATA UNIT-3.pptxDATA UNIT-3.pptx
DATA UNIT-3.pptx
 
EDA
EDAEDA
EDA
 
Statistics Reference Book
Statistics Reference BookStatistics Reference Book
Statistics Reference Book
 
Probability and statistics
Probability and statisticsProbability and statistics
Probability and statistics
 
Business statistics review
Business statistics reviewBusiness statistics review
Business statistics review
 
02Data-osu-0829.pdf
02Data-osu-0829.pdf02Data-osu-0829.pdf
02Data-osu-0829.pdf
 
Data What Type Of Data Do You Have V2.1
Data   What Type Of Data Do You Have V2.1Data   What Type Of Data Do You Have V2.1
Data What Type Of Data Do You Have V2.1
 
Introduction to Statistics and Arithmetic Mean
Introduction to Statistics and Arithmetic MeanIntroduction to Statistics and Arithmetic Mean
Introduction to Statistics and Arithmetic Mean
 
BIOSTATISTICS (MPT) 11 (1).pptx
BIOSTATISTICS (MPT) 11 (1).pptxBIOSTATISTICS (MPT) 11 (1).pptx
BIOSTATISTICS (MPT) 11 (1).pptx
 

Recently uploaded

Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 

Recently uploaded (20)

Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 

1.1. Associate Analytics.pptx

  • 1. Matters of Discussion Introduction to Data & Analytics Getting to Know your data and dataset. Analytic case 1 Compiled for Nasscom Associate Analytics
  • 2. Types of Data Sets 2  Record  Relational records  Data matrix, e.g., numerical matrix, crosstabs  Document data: text documents: term- frequency vector  Transaction data  Graph and network  World Wide Web  Social or information networks  Molecular Structures  Ordered  Video data: sequence of images  Temporal data: time-series  Sequential Data: transaction sequences  Genetic sequence data  Spatial, image and multimedia:  Spatial data: maps  Image data:  Video data:
  • 3. Data Objects  Data sets are made up of data objects.  A data object represents an entity.  Examples:  sales database: customers, store items, sales  medical database: patients, treatments  university database: students, professors, courses  Also called samples , examples, instances, data points, objects, tuples.  Data objects are described by attributes.  Database rows -> objects; columns ->attributes. 3 Compiled for Nasscom
  • 4. Attributes  Attribute (or dimensions, features, variables): a data field, representing a characteristic or feature of a data object.  E.g., customer _ID, name, address  Types:  Nominal  Binary  Numeric: quantitative  Interval-scaled  Ratio-scaled 4 Compiled for Nasscom Associate Analytics
  • 5. Attribute Types  Nominal: categories, states, or “names of things”  Hair_color = {auburn, black, blond, brown, grey, red, white}  marital status, occupation, zip codes  A variable with values which have no numerical value  Binary  Nominal attribute with only 2 states (0 and 1)  Symmetric binary: both outcomes equally important  e.g., gender  Asymmetric binary: outcomes not equally important.  e.g., medical test (positive vs. negative)  Convention: assign 1 to most important outcome (e.g., HIV positive)  Ordinal  Values have a meaningful order (ranking) but magnitude between successive values is not known.  Size = {small, medium, large}, grades, army rankings  A variable with values which have no numerical value 5
  • 6. Numeric Attribute Types  Quantity (integer or real-valued)  Interval  Measured on a scale of equal-sized units  Values have order  E.g., temperature in C˚or F˚, calendar dates  No true zero-point  Ratio  Inherent zero-point  We can speak of values as being an order of magnitude larger than the unit of measurement (10 K˚ is twice as high as 5 K˚).  e.g., temperature in Kelvin, length, counts, monetary quantities 6 Compiled for Nasscom Associate Analytics
  • 7. Discrete vs. Continuous Attributes  Discrete Attribute  Has only a finite or countably infinite set of values  E.g., zip codes, profession, or the set of words in a collection of documents  Sometimes, represented as integer variables  Note: Binary attributes are a special case of discrete attributes  Continuous Attribute  Has real numbers as attribute values  E.g., temperature, height, or weight  Practically, real values can only be measured and represented using a finite number of digits  Continuous attributes are typically represented as floating-point variables 7 Compiled for Nasscom Associate Analytics
  • 8. • Document database  A document can be represented by thousands of attributes, each recording the frequency of a particular word (such as keywords) or phrase in the document. 8 Compiled for Nasscom Associate Analytics
  • 9. Time series Data  A time series is a series of data points indexed (or listed or graphed) in time order.  Most commonly, a time series is a sequence taken at successive equally spaced points in time.  Thus it is a sequence of discrete-time data.  Examples of time series are heights of ocean tides, USD value, market share value, many more.. 9 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 10. 10
  • 11. Example of Time Series Data 11 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan] Field Example topics Economics Gross Domestic Product (GDP), Consumer Price Index (CPI), S&P 500 Index, and unemployment rates Social sciences Birth rates, population, migration data, political indicators Epidemiology Disease rates, mortality rates, mosquito populations Medicine Blood pressure tracking, weight tracking, cholesterol measurements, heart rate monitoring Physical sciences Global temperatures, monthly sunspot observations, pollution levels.
  • 12. Time Series Components 12 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 13. Time Series Components – cont.. 13 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 14. Time Series Components---cont..  Trend:- general tendency of the data to increase or decrease during a long period of time.  'long term' movement in a time series without calendar related and irregular effects.  population growth, price inflation and general economic changes. 14 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 15. Time Series Components—cont.. How do we identify seasonality or seasonal pattern?  With respect to calendar related effects.  large seasonal increase in December retail sales due to Christmas shopping.  magnitude of the seasonal component increases over time , as the trend does. 15 periodic time series
  • 16. Time Series Components--Cycle  A cyclic pattern exists when data exhibit rises and falls that are not of fixed period.  The duration of these fluctuations is usually of at least 2 years.  If the fluctuations are not of fixed period then they are cyclic else seasonal. 16 Within the time interval, the fluctuations are not fixed
  • 17. Time Series Components-summary These components are defined as follows:  Level: The average value in the series.  Trend: The increasing or decreasing value in the series.  Seasonality: The repeating short-term cycle in the series.  Cyclic: data exhibit rises and falls that are not of fixed period  Noise: The random variation in the series. 17 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 18. Specific Data Analytic case 18 Compiled for Nasscom Associate Analytics
  • 19. 19 Cosine Similarity in Data Analytic Apps • A document can be represented by thousands of attributes, each recording the frequency of a particular word (such as keywords) or phrase in the document. • Other vector objects: gene features in micro-arrays, … • Applications: information retrieval, biologic taxonomy, gene feature mapping, ... • Cosine measure: If d1 and d2 are two vectors (e.g., term-frequency vectors), then cos(d1, d2) = (d1  d2) /||d1|| ||d2|| , where  indicates vector dot product, ||d||: the length of vector d
  • 20. 20 Example: Cosine Similarity • cos(d1, d2) = (d1  d2) /||d1|| ||d2|| , where  indicates vector dot product, ||d|: the length of vector d • Ex: Find the similarity between documents 1 and 2. d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0) d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1) d1d2 = 5*3+0*0+3*2+0*0+2*1+0*1+0*1+2*1+0*0+0*1 = 25 ||d1||= (5*5+0*0+3*3+0*0+2*2+0*0+0*0+2*2+0*0+0*0)0.5=(42)0.5 = 6.481 ||d2||= (3*3+0*0+2*2+0*0+1*1+1*1+0*0+1*1+0*0+1*1)0.5=(17)0.5 = 4.12 cos(d1, d2 ) = 0.94
  • 21. TASK FOR YOU[A2] 1. Investigate the attributes, dimensions, features, or variables with a suitable scenario and prepare your critical report.  nominal  binary  ordinal  numeric: quantitative 21 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 22. TASK FOR YOU [A3] 2. Investigate the numerous time series components in context to the business analytics and application modeling. 22 Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
  • 23. Cheers For the Great Patience! Query Please? 23 Compiled for Nasscom Associate Analytics