SlideShare a Scribd company logo
1 of 4
Download to read offline
1
Data Mining for Big Data
Murat Yazıcı
Valide-I Atik Mh. Boybey Sk. No:6 D:4
34664 Uskudar/Istanbul, Turkey
asiscrus.muratyazici@gmail.com
In the Age of Information and Technology, the amount of information which can be stored is increasing
fromday to day. Increasing the amount of information has led to the need to obtain meaningful
information.We can say that the process of obtaining meaningful information from a large volume of
datais an information discovery process. A number of methods have been needed and developed in the
information discovery process.These methods constitute the data mining techniques such as Cluster
Analysis, Regression Analysis, Social Network Analysis, Time Series Analysis, Market Basket Analysis,
Outlier Detection, Text Mining etc.Data mining users aim to get prediction and making decisions for the
future by obtaining some links, patterns and rules with these techniques in the light of available
data.Because of the importance of obtaining relevant information, the need and importance of data
mining is increasing from day to day.
One of the Data Mining definitions accepted by the world is “Data Mining is a set of techniques and
insights which generating new knowledge for stages of the decision of organizations, and allows us to
make predictions and plans about the future”.
Figure: Jiawei Han and Micheline Kamber,
“Data Mining Concepts and Techniques”, page:6.
1.Data Cleaning (to remove noise and inconsistent
data)
2.Data Integration (where multiple data sources
may be combined)
3.Data Selection (where data relevant to the analysis
task are retrieved from the database)
4.Data Transformation (where data are transformed
or consolidated into forms appropriate for mining by
performing summary or aggregation operations, for
instance)
5.Data Mining (an essential process where
intelligent methods are applied in order to extract
data patterns)
6.Pattern Evaluation (to identify the truly interesting
patterns representing knowledge based on some
interestingness measures)
7.Knowledge Presentation (where visualization and
knowledge representation techniques are used to
present the mined knowledge to the user)
2
Some Techniques:
. Cluster Analysis: Cluster Analysis is a data mining technique which separates objects with similar
features to each other from other objects in a data set and collects similar objects in a group. Many
clusters can be formed in a data set. This technique is often used in many fieldssuch as Machine
Learning, Pattern Recognition, Image Analysis, Information Retrieval and Bioinformatics etc. This
technique don’t have a specific algorithm. There are many algorithm used in this technique such as Thek-
Means Clustering, The k-Medoids Clustering, Hierarchical Clustering, Density-based Clustering and Fuzzy
Clustering etc.This technique can be used for different purposes such as segmenting the market and
determining target markets, product positioning and new product development, product preferences,
buying behavior of consumers, analysis of distribution channels and consumer groups, market structure
analysis, and comparison of market’s differences, or similarities from other markets.
. Regression Analysis:Regression Analysis is a statistical analysis method which aims to estimate the
relationships among variables. At the same time, this method aims to carry out interpolation and
extrapolation after estimating relationships among variables. This technique is often used in many fields
such as Research, Statistics, Mathematics, Finance, Economy, Medicine, Sociology, Modeling etc. There
are many types for these analysis methods. For example, Multiple Linear Regression, Basic Linear
Regression, Nonlinear Regression, Ridge Regression, Fuzzy Regression, Robust Regression, Fuzzy Robust
Regression, Logistic Regression, Bayesian Linear Regression, Percentage Regression, Nonparametric
Regression, Local Regression, Segmented Regression, Stepwise Regression etc.Using different algorithms
has brought about many different techniques as above.Sales figures of a product may wanted to
establish with a model which estimates the future aboutsales figures of a product. Empirical duration of
common stocks may wanted to establish with a model. Thus, predictions can be made for the future
about common stocks. This technique can be usedfor the identification of prognostically relevant risk
factors and the calculation of risk scores for individual prognostication in medicine.
. Social Network Analysis (SNA): Social network analysis views social relationships by using Network
Theory which consists nodes representing individual actors within the networkand ties which represent
relationships between the individuals. SNA usessocial network diagram where nodes are represented as
points, and ties are represented as lines for visualization. This analysis is often used in the Sociology,
Anthropology, Biology, Communication Studies, Economics,Geography, History, Information
Science, Organizational Studies, Political Science, Social Psychology, Development Studies, and Socio-
linguistics etc.Companies may purpose to analysis customer’s social network. Thus, they can
determinespecific people who has asignificantly effect on other people, and the width of the network.
Afterward, companies can support activitiessuch as customer interaction and analysis, marketing and
business intelligence needs. Some organizations may want to determine development of leader
engagement strategies, analysis of individual and group engagement and media use, and community-
based problem solving. Social network analysis is also used in intelligence, counter-intelligence and law
enforcement activities.The NSA (National Security Agency) has been performing social network analysis
on Call Detail Records (CDRs), also known as metadata, since shortly after the September 11 Attacks.
3
. Time Series Analysis:Time Series Analysis is a method which analyzes time series data in order to obtain
meaningful statistics. As well as, Time Series Analysis aim to establish a model which predicts future
values based on previously observed values.This analysis method is often used many areas such as
Statistics, Econometrics, Mathematical Finance, Seismology, Meteorology,Geophysics, Control
Engineering, Astronomy and Communications Engineering. Methods of time series analysis are divided
into parametric and non-parametric, linear and non-linear, univariate and multivariatemethods.Some of
methods in time series analysis are the autoregressive (AR) models, the integrated (I) models,
the moving average (MA) models, autoregressive moving average (ARMA) and autoregressive
integrated moving average (ARIMA) models and autoregressive fractionally integrated moving
average (ARFIMA). Furthermore,to investigate the effect of the season on the data is an important issue
in Time Series Analysis. The effect of the season on the data may produce incorrect results. Some
seasonal adjustment techniques such asX-12-ARIMA, TRAMO/SEATS and STAMP can be used to eliminate
the effect of the season.Some organizations want to forcaste their product’s sales figure for the future.
By based on stocks prices, the next opening or closing price of the stock may wanted to forcaste. In
meteorology, forcasting of the next days’ weather may wanted to determine.In astronomy, stars which
show periodic light level changesmay wanted to establish with a model about their periodic light level. In
seismology, the occurrence of earthquakes may be wanted to predict with a model which is based on
time series.
. Market Basket Analysis: Market Basket Analysis is a data mining technique which tries to discover
interesting relations between variables in large databases. This technique is often used today in many
application areas including Web Usage Mining, Intrusion Detection, Continuous Production and
Bioinformatics. Many algorithms for generating association rules were presented. Some of well known
algorithms in this analysis are Apriori, Eclat and FP-Growth.Some organizations which sell products may
want to use this analysis technique to determine customers purchasing behavior. Whereby, they can
make strategies for marketing. Forexample, we assume that a company revealed thatthere is a high
correlation between the two products after doing Market Basket Analysis. The company may want to
offer these two products on the same shelf by creating sales strategy for sale. Thus, It can increase its
sales figures. Another area of common use of Market Basket Analysis is theIntrusion Detection. The
security of our computer systems and data are at continual risk. The extensive growth of the Internet
and increasing availability of tools and tricks for intruding and attacking networks have prompted the
intrusion detection to become a critical component of network administration. Market Basket Analysis
can be applied to find relationships among system attributes describing the network data. The
relationships among system attributes can provide an insight regarding the selection of useful attributes
for intrusion detection.
.Outlier Detection: In statistics, an outlier is an observation point which is distant from other
observations.The presence of outlier value in a data set can cause to incorrect analysis results. So, some
techniques are need about Outlier Detection.Outlier Detection is a data mining technique which is used
in variety fields such as the Intrusion Detection, Fraud Detection, Fault Detection, System Health
Monitoring, Event Detection in Sensor Network and Detecting Eco-system Disturbances etc. In outlier
detection process, there are more than one technique. Some of the popular techniques are the Distance
4
Based Techniques, One Class Support Vector Machines, Replicator Neural NetworksandCluster
Analysis based on Outlier Detection.This data mining technique can beused in credit card fraud
detection, clinical trials, voting irregularity analysis, data cleansing, network intrusion, severe weather
prediction, geographic information systems, and athlete performance analysis etc.
. Text Mining: Text Mining is a data mining technique which aims to obtain useful information from
atext. Some of text mining tasks are the text categorization, text clustering, concept/entity extraction,
production of granular taxonomies, sentiment analysis, document summarization,and entity relation
modelingetc. Text Mining techniques involve linguistic, statistical, and machine learning techniques. Text
Miningincludes some issue such as information retrieval, lexical analysis to study word frequency
distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques
including link and association analysis, visualization, and predictive analytics. Text mining is often used
many areas such as the Security, Biomedical, Software, Online Media, Marketing, Sentiment Analysis,
Academic Applications etc.Some organizations in public, or private sector may want tomonitor and
analyze online plain text sources such as Internet News, blogs, etc. for security purposes. Within public
sector, some software are needed for tracking and monitoring terrorist activities. In Marketing
Applications, Text Mining can beused in analytical customer relationship management . In Sentiment
Analysis, Text Mining can beused fordetermining the attitude of writer or speaker.

More Related Content

What's hot

IRJET - Data Mining and Machine Learning for Cyber Security
IRJET - Data Mining and Machine Learning for Cyber SecurityIRJET - Data Mining and Machine Learning for Cyber Security
IRJET - Data Mining and Machine Learning for Cyber SecurityIRJET Journal
 
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...Editor IJMTER
 
Data Mining based on Hashing Technique
Data Mining based on Hashing TechniqueData Mining based on Hashing Technique
Data Mining based on Hashing Techniqueijtsrd
 
A Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient AlgorithmA Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient AlgorithmIOSR Journals
 
Trend analysis-of-time-series-data-using-data-mining-techniques By Raihan Sikdar
Trend analysis-of-time-series-data-using-data-mining-techniques By Raihan SikdarTrend analysis-of-time-series-data-using-data-mining-techniques By Raihan Sikdar
Trend analysis-of-time-series-data-using-data-mining-techniques By Raihan Sikdarraihansikdar
 
What is Data analytics and it's importance ?
What is Data analytics and it's importance ?What is Data analytics and it's importance ?
What is Data analytics and it's importance ?AbhayDhupar
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and predictionDataminingTools Inc
 
Credit Card Fraud Detection - Anomaly Detection
Credit Card Fraud Detection - Anomaly DetectionCredit Card Fraud Detection - Anomaly Detection
Credit Card Fraud Detection - Anomaly DetectionLalit Jain
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining ProcessMarc Berman
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection methodIJSRD
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Edureka!
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Reviewijdpsjournal
 

What's hot (17)

IRJET - Data Mining and Machine Learning for Cyber Security
IRJET - Data Mining and Machine Learning for Cyber SecurityIRJET - Data Mining and Machine Learning for Cyber Security
IRJET - Data Mining and Machine Learning for Cyber Security
 
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
 
data analysis-mining
data analysis-miningdata analysis-mining
data analysis-mining
 
Data mining
Data mining Data mining
Data mining
 
Data Mining based on Hashing Technique
Data Mining based on Hashing TechniqueData Mining based on Hashing Technique
Data Mining based on Hashing Technique
 
A Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient AlgorithmA Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient Algorithm
 
Datamining
DataminingDatamining
Datamining
 
Trend analysis-of-time-series-data-using-data-mining-techniques By Raihan Sikdar
Trend analysis-of-time-series-data-using-data-mining-techniques By Raihan SikdarTrend analysis-of-time-series-data-using-data-mining-techniques By Raihan Sikdar
Trend analysis-of-time-series-data-using-data-mining-techniques By Raihan Sikdar
 
What is Data analytics and it's importance ?
What is Data analytics and it's importance ?What is Data analytics and it's importance ?
What is Data analytics and it's importance ?
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Credit Card Fraud Detection - Anomaly Detection
Credit Card Fraud Detection - Anomaly DetectionCredit Card Fraud Detection - Anomaly Detection
Credit Card Fraud Detection - Anomaly Detection
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection method
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Data analysis
Data analysisData analysis
Data analysis
 
Data analytics
Data analyticsData analytics
Data analytics
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Review
 

Viewers also liked

Industrial Catalog Design
Industrial Catalog DesignIndustrial Catalog Design
Industrial Catalog Designunderwoodco
 
Materials for civil and construction engineers 3rd edition ( Solution manual )
Materials for civil and construction engineers 3rd edition ( Solution manual )Materials for civil and construction engineers 3rd edition ( Solution manual )
Materials for civil and construction engineers 3rd edition ( Solution manual )Dyarbaban
 
Dr. Teodross Avery-Detailed Resume 10:23:16 (Pages) 2
Dr. Teodross Avery-Detailed Resume 10:23:16 (Pages) 2Dr. Teodross Avery-Detailed Resume 10:23:16 (Pages) 2
Dr. Teodross Avery-Detailed Resume 10:23:16 (Pages) 2Dr. Teodross Avery
 
горийска № 8 публичная школа
горийска № 8 публичная школагорийска № 8 публичная школа
горийска № 8 публичная школаschool
 
2013 extrait de confidentialité entre l'identité humaine et la juridique conf...
2013 extrait de confidentialité entre l'identité humaine et la juridique conf...2013 extrait de confidentialité entre l'identité humaine et la juridique conf...
2013 extrait de confidentialité entre l'identité humaine et la juridique conf...Jacques-Antoine Normandin
 
Sampling and low-rank tensor approximations
Sampling and low-rank tensor approximationsSampling and low-rank tensor approximations
Sampling and low-rank tensor approximationsAlexander Litvinenko
 
Aulas Virtuales
Aulas VirtualesAulas Virtuales
Aulas Virtualesjohneboi
 
Los mensajes subliminales en la comunicación visual
Los mensajes subliminales en la comunicación visualLos mensajes subliminales en la comunicación visual
Los mensajes subliminales en la comunicación visualluzconde
 
Como subir una presentacion en prezi calameo y slideshare
Como subir una presentacion en prezi calameo y slideshareComo subir una presentacion en prezi calameo y slideshare
Como subir una presentacion en prezi calameo y slidesharediego muñoz
 
Homeland Security workshop
Homeland Security workshopHomeland Security workshop
Homeland Security workshopCandice Martinez
 
MEDIADAS DE TENDENCIA CENTRAL PARA UN CONJUNTO DE DATOS AGRUPADOS Y NO AGRUPADOS
MEDIADAS DE TENDENCIA CENTRAL PARA UN CONJUNTO DE DATOS AGRUPADOS Y NO AGRUPADOSMEDIADAS DE TENDENCIA CENTRAL PARA UN CONJUNTO DE DATOS AGRUPADOS Y NO AGRUPADOS
MEDIADAS DE TENDENCIA CENTRAL PARA UN CONJUNTO DE DATOS AGRUPADOS Y NO AGRUPADOSEduardo Guadalupe Damián
 
2013 demande formelle à reno bernier de renoncer à son serment de confidentia...
2013 demande formelle à reno bernier de renoncer à son serment de confidentia...2013 demande formelle à reno bernier de renoncer à son serment de confidentia...
2013 demande formelle à reno bernier de renoncer à son serment de confidentia...Jacques-Antoine Normandin
 
Trabajo en clase
Trabajo en claseTrabajo en clase
Trabajo en clasewavilamonte
 

Viewers also liked (19)

Industrial Catalog Design
Industrial Catalog DesignIndustrial Catalog Design
Industrial Catalog Design
 
Materials for civil and construction engineers 3rd edition ( Solution manual )
Materials for civil and construction engineers 3rd edition ( Solution manual )Materials for civil and construction engineers 3rd edition ( Solution manual )
Materials for civil and construction engineers 3rd edition ( Solution manual )
 
Story board
Story boardStory board
Story board
 
Resume format - 2
Resume format - 2Resume format - 2
Resume format - 2
 
Dr. Teodross Avery-Detailed Resume 10:23:16 (Pages) 2
Dr. Teodross Avery-Detailed Resume 10:23:16 (Pages) 2Dr. Teodross Avery-Detailed Resume 10:23:16 (Pages) 2
Dr. Teodross Avery-Detailed Resume 10:23:16 (Pages) 2
 
горийска № 8 публичная школа
горийска № 8 публичная школагорийска № 8 публичная школа
горийска № 8 публичная школа
 
APRENDIZAJE INVERTIDO
APRENDIZAJE INVERTIDOAPRENDIZAJE INVERTIDO
APRENDIZAJE INVERTIDO
 
2013 extrait de confidentialité entre l'identité humaine et la juridique conf...
2013 extrait de confidentialité entre l'identité humaine et la juridique conf...2013 extrait de confidentialité entre l'identité humaine et la juridique conf...
2013 extrait de confidentialité entre l'identité humaine et la juridique conf...
 
20161030 주일예배 예배
20161030 주일예배   예배20161030 주일예배   예배
20161030 주일예배 예배
 
Sampling and low-rank tensor approximations
Sampling and low-rank tensor approximationsSampling and low-rank tensor approximations
Sampling and low-rank tensor approximations
 
Aulas Virtuales
Aulas VirtualesAulas Virtuales
Aulas Virtuales
 
Jadwal praktikum biologi umum
Jadwal praktikum biologi umumJadwal praktikum biologi umum
Jadwal praktikum biologi umum
 
Los mensajes subliminales en la comunicación visual
Los mensajes subliminales en la comunicación visualLos mensajes subliminales en la comunicación visual
Los mensajes subliminales en la comunicación visual
 
Como subir una presentacion en prezi calameo y slideshare
Como subir una presentacion en prezi calameo y slideshareComo subir una presentacion en prezi calameo y slideshare
Como subir una presentacion en prezi calameo y slideshare
 
Homeland Security workshop
Homeland Security workshopHomeland Security workshop
Homeland Security workshop
 
MEDIADAS DE TENDENCIA CENTRAL PARA UN CONJUNTO DE DATOS AGRUPADOS Y NO AGRUPADOS
MEDIADAS DE TENDENCIA CENTRAL PARA UN CONJUNTO DE DATOS AGRUPADOS Y NO AGRUPADOSMEDIADAS DE TENDENCIA CENTRAL PARA UN CONJUNTO DE DATOS AGRUPADOS Y NO AGRUPADOS
MEDIADAS DE TENDENCIA CENTRAL PARA UN CONJUNTO DE DATOS AGRUPADOS Y NO AGRUPADOS
 
2013 demande formelle à reno bernier de renoncer à son serment de confidentia...
2013 demande formelle à reno bernier de renoncer à son serment de confidentia...2013 demande formelle à reno bernier de renoncer à son serment de confidentia...
2013 demande formelle à reno bernier de renoncer à son serment de confidentia...
 
Trabajo en clase
Trabajo en claseTrabajo en clase
Trabajo en clase
 
Aprendendo com os erros
Aprendendo com os errosAprendendo com os erros
Aprendendo com os erros
 

Similar to Data Mining for Big Data-Murat Yazıcı

McKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docxMcKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docxandreecapon
 
Study of Data Mining Methods and its Applications
Study of  Data Mining Methods and its ApplicationsStudy of  Data Mining Methods and its Applications
Study of Data Mining Methods and its ApplicationsIRJET Journal
 
DM UNIT_5 ppt for btech final year students
DM UNIT_5 ppt for btech final year studentsDM UNIT_5 ppt for btech final year students
DM UNIT_5 ppt for btech final year studentssriharipatilin
 
Regression and correlation
Regression and correlationRegression and correlation
Regression and correlationVrushaliSolanke
 
Fundamentals of data mining and its applications
Fundamentals of data mining and its applicationsFundamentals of data mining and its applications
Fundamentals of data mining and its applicationsSubrat Swain
 
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis
IRJET-	 Fault Detection and Prediction of Failure using Vibration AnalysisIRJET-	 Fault Detection and Prediction of Failure using Vibration Analysis
IRJET- Fault Detection and Prediction of Failure using Vibration AnalysisIRJET Journal
 
Running Head Data Mining in The Cloud .docx
Running Head Data Mining in The Cloud                            .docxRunning Head Data Mining in The Cloud                            .docx
Running Head Data Mining in The Cloud .docxhealdkathaleen
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfDr. Radhey Shyam
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleDr. Radhey Shyam
 
DataMining Techniq
DataMining TechniqDataMining Techniq
DataMining TechniqRespa Peter
 
Prognosis - An Approach to Predictive Analytics- Impetus White Paper
Prognosis - An Approach to Predictive Analytics- Impetus White PaperPrognosis - An Approach to Predictive Analytics- Impetus White Paper
Prognosis - An Approach to Predictive Analytics- Impetus White PaperImpetus Technologies
 
A LITERATURE REVIEW ON DATAMINING
A LITERATURE REVIEW ON DATAMININGA LITERATURE REVIEW ON DATAMINING
A LITERATURE REVIEW ON DATAMININGCarrie Romero
 
Uncover Trends and Patterns with Data Science.pdf
Uncover Trends and Patterns with Data Science.pdfUncover Trends and Patterns with Data Science.pdf
Uncover Trends and Patterns with Data Science.pdfUncodemy
 
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfKIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfDr. Radhey Shyam
 

Similar to Data Mining for Big Data-Murat Yazıcı (20)

McKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docxMcKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docx
 
Study of Data Mining Methods and its Applications
Study of  Data Mining Methods and its ApplicationsStudy of  Data Mining Methods and its Applications
Study of Data Mining Methods and its Applications
 
DM UNIT_5 ppt for btech final year students
DM UNIT_5 ppt for btech final year studentsDM UNIT_5 ppt for btech final year students
DM UNIT_5 ppt for btech final year students
 
Unit 4 Advanced Data Analytics
Unit 4 Advanced Data AnalyticsUnit 4 Advanced Data Analytics
Unit 4 Advanced Data Analytics
 
Regression and correlation
Regression and correlationRegression and correlation
Regression and correlation
 
Data mining
Data miningData mining
Data mining
 
Fundamentals of data mining and its applications
Fundamentals of data mining and its applicationsFundamentals of data mining and its applications
Fundamentals of data mining and its applications
 
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis
IRJET-	 Fault Detection and Prediction of Failure using Vibration AnalysisIRJET-	 Fault Detection and Prediction of Failure using Vibration Analysis
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis
 
Data Mining
Data MiningData Mining
Data Mining
 
Running Head Data Mining in The Cloud .docx
Running Head Data Mining in The Cloud                            .docxRunning Head Data Mining in The Cloud                            .docx
Running Head Data Mining in The Cloud .docx
 
Data Mining
Data MiningData Mining
Data Mining
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdf
 
Dk24717723
Dk24717723Dk24717723
Dk24717723
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycle
 
DataMining Techniq
DataMining TechniqDataMining Techniq
DataMining Techniq
 
Prognosis - An Approach to Predictive Analytics- Impetus White Paper
Prognosis - An Approach to Predictive Analytics- Impetus White PaperPrognosis - An Approach to Predictive Analytics- Impetus White Paper
Prognosis - An Approach to Predictive Analytics- Impetus White Paper
 
A LITERATURE REVIEW ON DATAMINING
A LITERATURE REVIEW ON DATAMININGA LITERATURE REVIEW ON DATAMINING
A LITERATURE REVIEW ON DATAMINING
 
Uncover Trends and Patterns with Data Science.pdf
Uncover Trends and Patterns with Data Science.pdfUncover Trends and Patterns with Data Science.pdf
Uncover Trends and Patterns with Data Science.pdf
 
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfKIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
 
Big data overview
Big data overviewBig data overview
Big data overview
 

Data Mining for Big Data-Murat Yazıcı

  • 1. 1 Data Mining for Big Data Murat Yazıcı Valide-I Atik Mh. Boybey Sk. No:6 D:4 34664 Uskudar/Istanbul, Turkey asiscrus.muratyazici@gmail.com In the Age of Information and Technology, the amount of information which can be stored is increasing fromday to day. Increasing the amount of information has led to the need to obtain meaningful information.We can say that the process of obtaining meaningful information from a large volume of datais an information discovery process. A number of methods have been needed and developed in the information discovery process.These methods constitute the data mining techniques such as Cluster Analysis, Regression Analysis, Social Network Analysis, Time Series Analysis, Market Basket Analysis, Outlier Detection, Text Mining etc.Data mining users aim to get prediction and making decisions for the future by obtaining some links, patterns and rules with these techniques in the light of available data.Because of the importance of obtaining relevant information, the need and importance of data mining is increasing from day to day. One of the Data Mining definitions accepted by the world is “Data Mining is a set of techniques and insights which generating new knowledge for stages of the decision of organizations, and allows us to make predictions and plans about the future”. Figure: Jiawei Han and Micheline Kamber, “Data Mining Concepts and Techniques”, page:6. 1.Data Cleaning (to remove noise and inconsistent data) 2.Data Integration (where multiple data sources may be combined) 3.Data Selection (where data relevant to the analysis task are retrieved from the database) 4.Data Transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance) 5.Data Mining (an essential process where intelligent methods are applied in order to extract data patterns) 6.Pattern Evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures) 7.Knowledge Presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
  • 2. 2 Some Techniques: . Cluster Analysis: Cluster Analysis is a data mining technique which separates objects with similar features to each other from other objects in a data set and collects similar objects in a group. Many clusters can be formed in a data set. This technique is often used in many fieldssuch as Machine Learning, Pattern Recognition, Image Analysis, Information Retrieval and Bioinformatics etc. This technique don’t have a specific algorithm. There are many algorithm used in this technique such as Thek- Means Clustering, The k-Medoids Clustering, Hierarchical Clustering, Density-based Clustering and Fuzzy Clustering etc.This technique can be used for different purposes such as segmenting the market and determining target markets, product positioning and new product development, product preferences, buying behavior of consumers, analysis of distribution channels and consumer groups, market structure analysis, and comparison of market’s differences, or similarities from other markets. . Regression Analysis:Regression Analysis is a statistical analysis method which aims to estimate the relationships among variables. At the same time, this method aims to carry out interpolation and extrapolation after estimating relationships among variables. This technique is often used in many fields such as Research, Statistics, Mathematics, Finance, Economy, Medicine, Sociology, Modeling etc. There are many types for these analysis methods. For example, Multiple Linear Regression, Basic Linear Regression, Nonlinear Regression, Ridge Regression, Fuzzy Regression, Robust Regression, Fuzzy Robust Regression, Logistic Regression, Bayesian Linear Regression, Percentage Regression, Nonparametric Regression, Local Regression, Segmented Regression, Stepwise Regression etc.Using different algorithms has brought about many different techniques as above.Sales figures of a product may wanted to establish with a model which estimates the future aboutsales figures of a product. Empirical duration of common stocks may wanted to establish with a model. Thus, predictions can be made for the future about common stocks. This technique can be usedfor the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication in medicine. . Social Network Analysis (SNA): Social network analysis views social relationships by using Network Theory which consists nodes representing individual actors within the networkand ties which represent relationships between the individuals. SNA usessocial network diagram where nodes are represented as points, and ties are represented as lines for visualization. This analysis is often used in the Sociology, Anthropology, Biology, Communication Studies, Economics,Geography, History, Information Science, Organizational Studies, Political Science, Social Psychology, Development Studies, and Socio- linguistics etc.Companies may purpose to analysis customer’s social network. Thus, they can determinespecific people who has asignificantly effect on other people, and the width of the network. Afterward, companies can support activitiessuch as customer interaction and analysis, marketing and business intelligence needs. Some organizations may want to determine development of leader engagement strategies, analysis of individual and group engagement and media use, and community- based problem solving. Social network analysis is also used in intelligence, counter-intelligence and law enforcement activities.The NSA (National Security Agency) has been performing social network analysis on Call Detail Records (CDRs), also known as metadata, since shortly after the September 11 Attacks.
  • 3. 3 . Time Series Analysis:Time Series Analysis is a method which analyzes time series data in order to obtain meaningful statistics. As well as, Time Series Analysis aim to establish a model which predicts future values based on previously observed values.This analysis method is often used many areas such as Statistics, Econometrics, Mathematical Finance, Seismology, Meteorology,Geophysics, Control Engineering, Astronomy and Communications Engineering. Methods of time series analysis are divided into parametric and non-parametric, linear and non-linear, univariate and multivariatemethods.Some of methods in time series analysis are the autoregressive (AR) models, the integrated (I) models, the moving average (MA) models, autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models and autoregressive fractionally integrated moving average (ARFIMA). Furthermore,to investigate the effect of the season on the data is an important issue in Time Series Analysis. The effect of the season on the data may produce incorrect results. Some seasonal adjustment techniques such asX-12-ARIMA, TRAMO/SEATS and STAMP can be used to eliminate the effect of the season.Some organizations want to forcaste their product’s sales figure for the future. By based on stocks prices, the next opening or closing price of the stock may wanted to forcaste. In meteorology, forcasting of the next days’ weather may wanted to determine.In astronomy, stars which show periodic light level changesmay wanted to establish with a model about their periodic light level. In seismology, the occurrence of earthquakes may be wanted to predict with a model which is based on time series. . Market Basket Analysis: Market Basket Analysis is a data mining technique which tries to discover interesting relations between variables in large databases. This technique is often used today in many application areas including Web Usage Mining, Intrusion Detection, Continuous Production and Bioinformatics. Many algorithms for generating association rules were presented. Some of well known algorithms in this analysis are Apriori, Eclat and FP-Growth.Some organizations which sell products may want to use this analysis technique to determine customers purchasing behavior. Whereby, they can make strategies for marketing. Forexample, we assume that a company revealed thatthere is a high correlation between the two products after doing Market Basket Analysis. The company may want to offer these two products on the same shelf by creating sales strategy for sale. Thus, It can increase its sales figures. Another area of common use of Market Basket Analysis is theIntrusion Detection. The security of our computer systems and data are at continual risk. The extensive growth of the Internet and increasing availability of tools and tricks for intruding and attacking networks have prompted the intrusion detection to become a critical component of network administration. Market Basket Analysis can be applied to find relationships among system attributes describing the network data. The relationships among system attributes can provide an insight regarding the selection of useful attributes for intrusion detection. .Outlier Detection: In statistics, an outlier is an observation point which is distant from other observations.The presence of outlier value in a data set can cause to incorrect analysis results. So, some techniques are need about Outlier Detection.Outlier Detection is a data mining technique which is used in variety fields such as the Intrusion Detection, Fraud Detection, Fault Detection, System Health Monitoring, Event Detection in Sensor Network and Detecting Eco-system Disturbances etc. In outlier detection process, there are more than one technique. Some of the popular techniques are the Distance
  • 4. 4 Based Techniques, One Class Support Vector Machines, Replicator Neural NetworksandCluster Analysis based on Outlier Detection.This data mining technique can beused in credit card fraud detection, clinical trials, voting irregularity analysis, data cleansing, network intrusion, severe weather prediction, geographic information systems, and athlete performance analysis etc. . Text Mining: Text Mining is a data mining technique which aims to obtain useful information from atext. Some of text mining tasks are the text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization,and entity relation modelingetc. Text Mining techniques involve linguistic, statistical, and machine learning techniques. Text Miningincludes some issue such as information retrieval, lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization, and predictive analytics. Text mining is often used many areas such as the Security, Biomedical, Software, Online Media, Marketing, Sentiment Analysis, Academic Applications etc.Some organizations in public, or private sector may want tomonitor and analyze online plain text sources such as Internet News, blogs, etc. for security purposes. Within public sector, some software are needed for tracking and monitoring terrorist activities. In Marketing Applications, Text Mining can beused in analytical customer relationship management . In Sentiment Analysis, Text Mining can beused fordetermining the attitude of writer or speaker.