Assignment 1: Application
Survey on Data Mining and
Data Warehousing
CSCI 4144, Winter 2016
ID: B00707506, Student Name: Patrick Walter
1/15/2016
Link-prediction in Social Networks: A Survey
Introduction
A social network consists of two main components, a set of social actors and a set of
connections. In many cases the social actors represent people, while the connections represent
any form of social interaction, collaboration or influence. It follows that a social network can be
easily represented by a graph with the actors being nodes, and the connections being edges.
The popularity of online social networks has exploded over the past decade. Social networks
have expanded from narrow contexts, such as researchers who have collaborated with each
other or employees at a company who have worked together, to networks which can
connect anyone in the world.
Given that social networks are often based on people, they tend to be highly dynamic,
with actors constantly making new interactions and connections with each other. In many
applications it is beneficial to be able to make predictions about these future connections. The
link-prediction problem was defined by Jon Kleinberg and David Liben-Nowell as follows:
“Given a snapshot of a social network at time t, we seek to accurately predict the edges that
will be added to the network during the interval from time t to a given future time t’”
(Liben-Nowell & Kleinberg, 2007). Using link-prediction a system can model the evolution of the
network based on features that are intrinsic to the network. An example of the link-prediction
problem is seen in social networks such as Facebook and other web-based social networks.
Facebook has systems that suggest that users connect with other users whom they may
know, or with companies they may like. These suggestions may create a more engaging
experience for users when they can easily make connections with their friends. Link-predictions
can also be used by companies to make suggestions on employees that should work together
on new projects. Thus many companies have a vested interest in developing effective
link-prediction systems.
Using Location-based Data to Make Better Predictions
Many link-prediction systems rely heavily on making predictions based on 2-hop
neighbours, or friends-of-friends. This is a result of the scale of most social networks, which is in the
millions of nodes, and of the likelihood of two nodes making a connection declining exponentially
with each hop. Social networks that deploy location-based information such as check-ins
offer a way to predict links between nodes that are not neighbours. By exploiting
the location data of nodes, link-predictions can be made for nodes sharing one or more
locations. These nodes may not be within the 2-hop neighbourhood of each other, and
therefore the link between them could not be predicted by a friends-of-friends system. The new
link made by these place-friends can be predicted by using the check-in information of the two
nodes. Thus the problem, as defined by a group of researchers from the University of Cambridge, is:
“how do we design a link prediction system which exploits data about user check-ins” (Scellato,
Noulas, & Mascolo, 2011).
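The difference between the two kinds of candidate pairs can be illustrated with a minimal sketch. It is not the authors' implementation; the function names, the dictionary layout, and the toy users and places are all assumptions made for illustration.

```python
from itertools import combinations


def friends_of_friends(friends):
    """Unconnected pairs that share at least one friend (2-hop neighbours)."""
    pairs = set()
    for nbrs in friends.values():
        for a, b in combinations(sorted(nbrs), 2):
            if b not in friends.get(a, set()):
                pairs.add((a, b))
    return pairs


def place_friends(checkins, friends):
    """Unconnected pairs that have checked in at one or more common places."""
    visitors = {}
    for user, places in checkins.items():
        for place in places:
            visitors.setdefault(place, set()).add(user)
    pairs = set()
    for users in visitors.values():
        for a, b in combinations(sorted(users), 2):
            if b not in friends.get(a, set()):
                pairs.add((a, b))
    return pairs


# Toy data, made up for illustration only.
friends = {
    "ann": {"bob"},
    "bob": {"ann", "cat"},
    "cat": {"bob"},
    "dan": set(),
}
checkins = {
    "ann": {"cafe", "gym"},
    "bob": {"library"},
    "cat": {"park"},
    "dan": {"gym"},
}

fof = friends_of_friends(friends)
pf = place_friends(checkins, friends)
print("friends-of-friends candidates:", fof)
print("candidates found only through shared places:", pf - fof)
```

In this toy network "ann" and "dan" have no friend in common, so a friends-of-friends system would never consider them, but they share a check-in location and therefore appear as place-friends.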
Solution Technology
The solution that Scellato, Noulas, & Mascolo used came in the form of supervised
learning. For each pair of users the link prediction is based on a set of features that describe the
pair. These features are based on both common social links and common and overlapping
location data. To create the training data, simple labelling is applied. For each snapshot, the
features of every pair of users who are not yet connected are computed; the pairs that
become connected in the next snapshot are labelled positive and the others are labelled negative. Using the created
training data, classifiers are trained to construct models which can classify test data. Because the
data has a heavily skewed class distribution, using a supervised method allows
for effective discovery of inter-class boundaries and therefore better classification (2011).
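The labelling step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the two features (number of common friends and number of common places) are stand-ins for the richer feature set used in the paper, and the function names and toy snapshots are hypothetical.

```python
from itertools import combinations


def pair_features(a, b, friends, checkins):
    """Two illustrative features: shared friends and shared check-in places."""
    common_friends = len(friends[a] & friends[b])
    common_places = len(checkins[a] & checkins[b])
    return [common_friends, common_places]


def build_training_set(friends_t, checkins_t, friends_next):
    """Label each pair unconnected at time t by whether it links in the next snapshot."""
    rows = []
    for a, b in combinations(sorted(friends_t), 2):
        if b in friends_t[a]:
            continue  # already connected at time t, so not a prediction target
        label = 1 if b in friends_next.get(a, set()) else 0
        rows.append(((a, b), pair_features(a, b, friends_t, checkins_t), label))
    return rows


# Snapshot at time t (toy data, made up for illustration).
friends_t = {"ann": {"bob"}, "bob": {"ann", "cat"}, "cat": {"bob"}, "dan": set()}
checkins_t = {"ann": {"cafe", "gym"}, "bob": {"library"}, "cat": {"park"}, "dan": {"gym"}}
# Next snapshot: ann and dan have become connected.
friends_next = {"ann": {"bob", "dan"}, "bob": {"ann", "cat"}, "cat": {"bob"}, "dan": {"ann"}}

training = build_training_set(friends_t, checkins_t, friends_next)
for pair, features, label in training:
    print(pair, features, label)
```

The resulting feature vectors and labels could then be passed to any supervised classifier. Only one of the four candidate pairs is positive here, which mirrors on a small scale the skewed class distribution mentioned above.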
Evaluation
Using multiple supervised learning implementations, Scellato, Noulas, & Mascolo were
able to empirically show that using place-data increased the performance of a link-prediction
system. Random forests and model trees with linear regression gave the best performance in
their research. It was noted that the link-prediction was more accurate in predicting links
that would be made by place-friends, since it was able to exploit location-based user activity
(2011).
Allowing for Positive and Negative Links in Link-prediction Networks
In the real world, not all connections between actors in a social network are positive.
Some online social networks have implemented this concept by having actors able to create
connections that can be either positive or negative, for example “friend” or “foe”. A group of
researchers from Stanford and Cornell University “study online social networks in which
relationships can be either positive (indicating relations such as friendship) or negative
(indicating relations such as opposition or antagonism)” (Leskovec, Huttenlocher, &
Kleinberg, 2010). In their research, Leskovec, Huttenlocher, & Kleinberg discuss how the
sign of a given link interacts with other links in the same neighbourhood or other links
throughout the entire network. In terms of the link-prediction problem, they ask what predictions
can be made about the configurations of link signs in a real social network (2010). They define
the edge sign prediction problem as follows: “given a social network with signs on all its edges,
but the sign on the edge from node u to node v, denoted s(u, v), has been “hidden.” How
reliably can we infer this sign s(u, v) using the information provided by the rest of the
network?” (Leskovec, Huttenlocher, & Kleinberg, 2010).
Solution Technology
To solve the edge sign prediction problem, Leskovec, Huttenlocher, & Kleinberg
implemented a solution using a logistic regression classifier, a form of supervised learning. Since
most networks exhibited a skewed distribution of positive and negative links, the group
used two approaches. One approach used a full dataset in which only about one fifth of the
connections were negative, and the other used a balanced dataset with an equal distribution of
signs. In order to use this machine-learning approach, features must be defined that describe
pairs of actors with a hidden link. Two sets of features are used. One set, called the degree
features, is based on the signed degree of the two nodes (2010). The
other set, called the triad features, is based on the joint relationships the two nodes have with
other nodes in their neighbourhood, similar to the friends-of-friends features used in Scellato,
Noulas, and Mascolo’s research.
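The two feature families can be sketched roughly as below. This is an illustrative reading, not the authors' code: the signed-degree function covers only a simplified subset of degree features, and the triad function assumes that a triad type is determined by the direction and sign of each of the two edges through a shared neighbour w, which gives 2 x 2 x 2 x 2 = 16 possible types. The edge dictionary and node names are made up.

```python
from collections import Counter


def signed_degrees(edges, u, v):
    """Signed out-degree of u and signed in-degree of v, ignoring the hidden edge (u, v)."""
    feats = Counter()
    for (a, b), sign in edges.items():
        if (a, b) == (u, v):
            continue
        tag = "pos" if sign > 0 else "neg"
        if a == u:
            feats["out_" + tag + "_u"] += 1
        if b == v:
            feats["in_" + tag + "_v"] += 1
    return feats


def triad_counts(edges, u, v):
    """Count triads u-w-v by direction and sign of both edges (at most 16 types)."""
    counts = Counter()
    others = {n for edge in edges for n in edge} - {u, v}
    for w in others:
        # Edges between u and w, in either direction.
        uw = [("fwd", edges[(u, w)])] if (u, w) in edges else []
        uw += [("back", edges[(w, u)])] if (w, u) in edges else []
        # Edges between w and v, in either direction.
        wv = [("fwd", edges[(w, v)])] if (w, v) in edges else []
        wv += [("back", edges[(v, w)])] if (v, w) in edges else []
        for d1, s1 in uw:
            for d2, s2 in wv:
                counts[(d1, s1, d2, s2)] += 1
    return counts


# Directed signed edges: +1 is a positive link, -1 a negative link (toy data).
edges = {
    ("u", "w1"): 1,
    ("w1", "v"): 1,
    ("v", "w2"): -1,
    ("w2", "u"): -1,
    ("u", "x"): -1,
    ("u", "v"): 1,  # the edge whose sign is treated as hidden
}

degree_feats = signed_degrees(edges, "u", "v")
triad_feats = triad_counts(edges, "u", "v")
print(dict(degree_feats))
print(dict(triad_feats))
```

Concatenating such counts into one vector per hidden edge gives an input that a logistic regression classifier could be trained on.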
Evaluation
In total there are 23 features used to describe each hidden link: 7 degree features and
16 triad features. Leskovec, Huttenlocher, & Kleinberg evaluated the solution on the
basis of each set of features by representing each set as a vector. What stood out the most in
the evaluation was that predictions based on their models significantly outperformed a
previous study which used propagation to go beyond the 2-hop neighbourhood on the same
dataset. This suggests that sign prediction can be understood based solely on the signs of other
links in the same one-step neighbourhood. In general, using the full dataset gave much higher
accuracy, with about a 15% improvement over random guessing (2010).
Using Continuous-valued Links in Link-prediction Networks
In the previously mentioned case of link-prediction using location-based information,
the researchers treated links as binary relations, and in the edge sign prediction problem the
links were treated as ternary relations. Researchers at Purdue University believe that
“in online social networks the low cost of link formation can lead to networks with
heterogeneous relationship strengths (e.g., acquaintances and best friends mixed together)”
(Xiang, Neville, & Rogati, 2010). Xiang, Neville, & Rogati developed a model to
estimate the strength of links in a social network based on interaction activity and user
similarity. This challenge extends the link-prediction problem: the group believes that
treating links as binary relations increases the amount of noise learned by a prediction
model, because strong and weak links are treated equally. In most online social networks, creating links
comes at such a low cost that many links may be much less significant than others. Including
these insignificant links in the learned model can greatly degrade the performance of the
system (2010).
Solution Technology
To build their model, Xiang, Neville, & Rogati implemented an
unsupervised method to infer the strength of links in a network. These strength values are
continuous, representing a range from weak to strong relationships (2010). More specifically the
researchers “formulate a latent variable model to infer (hidden) relationship strengths and
develop a coordinate ascent optimization procedure for inference” (Xiang, Neville, & Rogati,
2010). A Gaussian distribution was used to model the conditional probability of link strengths given
the similarity of the actors involved in each link. The latent variable model is estimated by
maximum likelihood, and a gradient-based method is used to optimize
the parameters of the model (Xiang, Neville, & Rogati, 2010).
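As a rough illustration of the Gaussian component only, the sketch below evaluates the log-density of a latent strength value given a similarity vector. The linear form of the mean (a weighted sum of similarity features), the parameter names, and the numbers are assumptions made for illustration; the full model also involves interaction data and a coordinate ascent procedure, which are not shown.

```python
import math


def strength_log_likelihood(z, similarity, weights, variance):
    """Log-density of strength z under a Gaussian centred on weights . similarity."""
    mean = sum(w * s for w, s in zip(weights, similarity))
    return -0.5 * math.log(2 * math.pi * variance) - (z - mean) ** 2 / (2 * variance)


# Hypothetical similarity features for one pair of users and hypothetical parameters.
similarity = [1.0, 0.5]
weights = [0.4, 0.2]
variance = 0.25

near = strength_log_likelihood(0.5, similarity, weights, variance)
far = strength_log_likelihood(1.5, similarity, weights, variance)
print(near, far)
```

A strength value close to the similarity-based mean receives a higher log-likelihood than one far from it, which is the quantity a maximum likelihood procedure would try to increase when fitting the weights.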
Evaluation
Evaluation was based on two measures, the autocorrelation improvement and the
classification improvement. In terms of autocorrelation, “the relationship-strength network has
significantly higher autocorrelation than the friendship graph in all cases” (Xiang, Neville, &
Rogati, 2010). Using a Gaussian random field semi-supervised classification algorithm and
comparing with other works, the group reports that their model “results in the highest classification
performance for all tasks, suggesting that [their] approach to summarizing the rich profile and
interaction information in online social networks leads to a single meaningful relationship graph
which can improve subsequent knowledge discovery and prediction tasks” (Xiang, Neville, &
Rogati, 2010).
Drivers and Enablers of Data Mining and Data Warehousing
There are many factors that create a demand for data mining and data warehousing
technologies. Many companies, organizations, and institutions have an interest in extracting
information and knowledge from their stored and incoming data. Some groups seek to use their
data to create monetary value while others seek to understand how to serve their customers or
employees better. With today’s widespread use of technology and the World Wide Web, society
is creating new data at alarming rates. In order to handle this endless stream of data many
companies turn to data mining and warehousing technologies. Companies can use data
mining to make better business decisions, better target their customers, and find new ways to
market their products and services. The amount of data created and stored far exceeds the
capabilities of traditional data analysis tools and creates a demand for data mining.
The decreasing cost of computational power and storage is facilitating the widespread
use of data mining and data warehousing in the business world. Globalization is also driving
these technologies as the world becomes more interconnected in online communities. The
increasing availability of data collection devices such as smart phones is also contributing to the
use of data mining. Increasingly, datasets are becoming openly available to the public from
many governments and organizations. The abundance of data, the low cost of computational
power, and the use of open and free software create an environment that fosters data mining.
References
Leskovec, J., Huttenlocher, D., & Kleinberg, J. (2010). Predicting Positive and Negative Links in Online
Social Networks. WWW '10: Proceedings of the 19th international conference on World Wide
Web (pp. 641-650). New York, NY, USA: ACM.
Liben-Nowell, D., & Kleinberg, J. (2007). The Link-Prediction Problem for Social Networks. Journal of the
American Society for Information Science and Technology, 58(7), 1019-1031.
Scellato, S., Noulas, A., & Mascolo, C. (2011). Exploiting Place Features in Link Prediction on
Location-based Social Networks. KDD '11: Proceedings of the 17th ACM SIGKDD international
conference on Knowledge discovery and data mining (pp. 1046-1054). New York, NY: ACM.
Xiang, R., Neville, J., & Rogati, M. (2010). Modeling Relationship Strength in Online Social Networks.
WWW '10: Proceedings of the 19th international conference on World Wide Web (pp. 981-990).
New York, NY, USA: ACM.
Questions
a) Why are DM and DW technologies becoming important tools for today's business world?
With the growth of data being collected by businesses, data warehousing technologies are
becoming more important. Companies need data warehousing technologies to easily access
aggregate information from their data. Businesses also seek to integrate data from multiple
different database systems with different designs and schemas. Data warehousing technology
allows a company to store its data based on groupings. With all this data, companies need
to make sense of it all. Data mining technologies allow businesses to turn the information
stored in their data warehouses into knowledge. Data mining aids businesses in
making decisions and sheds light on interesting correlations that would otherwise be unknown.
In today’s online world, data is what drives businesses and data mining is the methodology of
producing knowledge from vast amounts of data.
b) What are the main differences between data mining, traditional statistical data analysis,
and information retrieval?
Data mining is the process of extracting knowledge from large amounts of data which
involves several steps that turn raw data into knowledge that is easily understood by
humans. Traditional statistical data analysis cannot handle large amounts of data.
Information retrieval, in terms of database systems, only involves accessing and retrieving
data, creating aggregate values, or performing deductive queries.
c) How is a data warehouse model different from a relational database model? Why is DW
technology more advanced in supporting business management?
A relational database is simply a collection of tables. Each table has columns and rows and
each cell can be accessed independently or an aggregate query may be applied to a subset
of cells. In order to access any data from a relational database, queries must be made in a
relational query language. This is quite different from a data warehouse, which is a
repository of information from many sources stored under a unified schema. Data in a data
warehouse is stored in a way that it can provide information in a historical perspective and
in a summarized manner. Data warehouses are multidimensional and each cell contains
some aggregate measure. These properties make a data warehouse better suited to supporting business
management. For example a manager can easily access the aggregate sales of a particular
product by region, or year, or region and year, or any other combination of attributes.
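The sales example can be made concrete with a small sketch of a multidimensional cube in which every combination of attributes has its aggregate computed ahead of time. This is only an in-memory illustration of the idea, not how a production data warehouse is implemented; the function name, the record layout, and the sales figures are invented.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Precompute the summed measure for every combination of dimensions."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


# Invented sales records for illustration.
sales = [
    {"product": "widget", "region": "east", "year": 2014, "amount": 100.0},
    {"product": "widget", "region": "west", "year": 2014, "amount": 50.0},
    {"product": "widget", "region": "east", "year": 2015, "amount": 70.0},
    {"product": "gadget", "region": "east", "year": 2015, "amount": 30.0},
]

cube = build_cube(sales, ("product", "region", "year"), "amount")

# Each question is answered by a lookup instead of a new scan of the raw rows.
print(cube[("product", "region")][("widget", "east")])  # 170.0
print(cube[("product", "year")][("widget", 2015)])      # 70.0
print(cube[()][()])                                     # 250.0
```

Each cell of the cube holds an aggregate measure, so moving between "by region", "by year", and "by region and year" is a matter of choosing a different key rather than recomputing from the base tables.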
d) What is the main difference between using OLAP on a DW and using SQL on a traditional
database for supporting business decision making?
Using on-line analytical processing operations allows data to be presented at different
levels of abstraction to accommodate different viewpoints. This is useful in a business
environment as different departments may want to see the company’s data in different
ways. Using OLAP is typically much faster than SQL aggregate queries as the aggregates are
precomputed and don’t need computationally expensive operations such as joins.

 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Link Prediction Survey

  • 1. Assignment 1: Application Survey on Data Mining and Data Warehousing CSCI 4144, Winter 2016 ID: B00707506, Student Name: Patrick Walter 1/15/2016
  • 2. Link-prediction in Social Networks: A Survey
    Introduction
    A social network consists of two main components: a set of social actors and a set of connections. In many cases the social actors represent people, while the connections represent any form of social interaction, collaboration or influence. It follows that a social network can be easily represented by a graph with the actors being nodes and the connections being edges. The popularity of online social networks has exploded over the past decade. Social networks have expanded from networks of researchers who have collaborated with each other, or employees at a company who have worked together, to networks which can connect anyone in the world.
    Given that social networks are often based on people, they are often highly dynamic, with actors constantly making new interactions and connections with each other. In many applications it is beneficial to be able to make predictions about these future connections. The link-prediction problem was defined by Jon Kleinberg and David Liben-Nowell as follows: “Given a snapshot of a social network at time t, we seek to accurately predict the edges that will be added to the network during the interval from time t to a given future time t’” (Liben-Nowell & Kleinberg, 2007). Using link-prediction, a system can model the evolution of the network based on features that are intrinsic to the network. An example of the link-prediction problem is seen in Facebook and other web-based social networks. Facebook has systems that suggest that users make connections with other users who they may
  • 3. know, or with companies they may like. These suggestions may create a more engaging experience for users when they can easily make connections with their friends. Link-predictions can also be used by companies to suggest employees who should work together on new projects. Thus many companies have a vested interest in developing effective link-prediction systems.
    Using Location-based Data to Make Better Predictions
    Many link-prediction systems rely heavily on making predictions based on 2-hop neighbours, or friends-of-friends. This is a result of the scale of most social networks being in the millions of nodes, and the likelihood of two nodes making a connection declining exponentially with each hop. Social networks that deploy location-based information such as check-ins offer a way to predict links that do not occur between neighbouring nodes. By exploiting the location data of nodes, link-predictions can be made for nodes sharing one or more of these locations. These nodes may not be within the 2-hop neighbourhood of each other, and therefore the link between them could not be predicted by a friends-of-friends system. The new link made by these place-friends can be predicted by using the check-in information of the two nodes. Thus the problem, as defined by a group of researchers from the University of Cambridge, is: “how do we design a link prediction system which exploits data about user check-ins” (Scellato, Noulas, & Mascolo, 2011).
    Solution Technology
    The solution that Scellato, Noulas, & Mascolo used came in the form of supervised learning. For each pair of users the link prediction is based on a set of features that describe the
  • 4. pair. These features are based on both common social links and common and overlapping location data. To create the training data, simple labelling is applied. For each snapshot, the features of every disjoint pair of users are computed; then, in the next snapshot, the pairs that become connected are labelled positive and the others are labelled negative. Using the created training data, classifiers are trained to construct models which can classify test data. Because the data has a heavily skewed class distribution, using a supervised method allows for effective discovery of inter-class boundaries to perform better classification (2011).
    Evaluation
    Using multiple supervised learning implementations, Scellato, Noulas, & Mascolo were able to show empirically that using place data increased the performance of a link-prediction system. Random forests and model trees with linear regression gave the best performance in their research. It was noted that the link-prediction was more accurate in predicting links that would be made by place-friends, since they were able to exploit location-based user activity (2011).
    Allowing for Positive and Negative Links in Link-prediction Networks
    In the real world, not all connections between actors in a social network are positive. Some online social networks have implemented this concept by allowing actors to create connections that can be either positive or negative, for example “friend” or “foe”. A group of researchers from Stanford and Cornell University “study online social networks in which relationships can be either positive (indicating relations such as friendship) or negative (indicating relations such as opposition or antagonism).” (Leskovec, Huttenlocher, &
  • 5. Kleinberg, 2010). In their research, Leskovec, Huttenlocher, & Kleinberg discuss how the sign of a given link interacts with other links in the same neighbourhood or other links throughout the entire network. In terms of the link-prediction problem, they ask what predictions can be made about the configurations of link signs in a real social network (2010). They define the edge sign prediction problem as follows: “given a social network with signs on all its edges, but the sign on the edge from node u to node v, denoted s(u, v), has been “hidden.” How reliably can we infer this sign s(u, v) using the information provided by the rest of the network?” (Leskovec, Huttenlocher, & Kleinberg, 2010).
    Solution Technology
    To solve the edge sign prediction problem, Leskovec, Huttenlocher, & Kleinberg implemented a solution using a logistic regression classifier, a form of supervised learning. Since most networks exhibited a skewed distribution of positive and negative signed links, the group used two approaches. One approach used a full dataset in which only about one fifth of the connections were negative, and the other used a balanced dataset with an equal distribution of signs. In order to use this machine-learning approach, features must be defined that describe pairs of actors with a hidden link. Two sets of features are used. One set, called the degree features, is based on the signed degree of the two nodes (2010). The other, called the triad features, is based on the joint relationships the two nodes have with other nodes in their neighbourhood, similar to the friends-of-friends features used in Scellato, Noulas, & Mascolo’s research.
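The degree and triad features described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `sign_features`, the edge-dictionary representation, and the exact choice of seven degree counts are assumptions made for the example. The triad part counts, for every common neighbour w, each combination of edge direction and sign on the u-w and w-v sides, which gives 2 x 2 x 2 x 2 = 16 possible triad types.

```python
from collections import Counter


def sign_features(edges, u, v):
    """Sketch of features for the hidden edge (u, v) in a signed, directed graph.

    edges maps (source, target) -> +1 or -1. The representation and the exact
    feature list are assumptions for illustration only.
    """
    # Treat the edge whose sign we want to predict as hidden.
    visible = {e: s for e, s in edges.items() if e != (u, v)}

    # Signed out-degree of u and signed in-degree of v.
    out_pos = sum(1 for (a, _), s in visible.items() if a == u and s > 0)
    out_neg = sum(1 for (a, _), s in visible.items() if a == u and s < 0)
    in_pos = sum(1 for (_, b), s in visible.items() if b == v and s > 0)
    in_neg = sum(1 for (_, b), s in visible.items() if b == v and s < 0)

    def neighbours(x):
        # Neighbours of x, ignoring edge direction.
        return ({b for (a, b) in visible if a == x}
                | {a for (a, b) in visible if b == x})

    common = (neighbours(u) & neighbours(v)) - {u, v}

    degree = [out_pos, out_neg, in_pos, in_neg,
              len(common), out_pos + out_neg, in_pos + in_neg]

    # Triad features: one counter per (direction, sign, direction, sign) type.
    triads = Counter()
    for w in common:
        for uw_dir, uw_edge in (("u->w", (u, w)), ("w->u", (w, u))):
            if uw_edge not in visible:
                continue
            for wv_dir, wv_edge in (("w->v", (w, v)), ("v->w", (v, w))):
                if wv_edge not in visible:
                    continue
                triads[(uw_dir, visible[uw_edge], wv_dir, visible[wv_edge])] += 1
    return degree, triads


# Hypothetical toy network: the sign of a -> b is the one to be predicted.
toy_edges = {
    ("a", "b"): 1,
    ("a", "c"): 1,
    ("c", "b"): -1,
    ("d", "b"): 1,
    ("a", "d"): -1,
}
degree_features, triad_features = sign_features(toy_edges, "a", "b")
```

In a full pipeline, the two feature groups would be flattened into one vector per hidden edge and passed to a classifier such as logistic regression.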
  • 6. Evaluation
    In total there are 23 features used to describe each hidden link: 7 degree features and 16 triad features. Leskovec, Huttenlocher, & Kleinberg evaluated the solution on the basis of each set of features by representing each set as a vector. What stood out the most in the evaluation was that predictions based on their models significantly outperformed a previous study which used propagation to go beyond the 2-hop neighbourhood on the same dataset. This means that sign prediction can be understood based solely on the signs of other links in the same one-step neighbourhood. In general, using the full dataset achieved much higher accuracy, with about a 15% improvement over random guessing (2010).
    Using Continuous-valued Links in Link-prediction Networks
    In the previously mentioned case of link-prediction using location-based information, the researchers treated links as binary relations, and in the edge sign prediction problem the links were evaluated as ternary relations. Researchers at Purdue University believe that “in online social networks the low cost of link formation can lead to networks with heterogeneous relationship strengths (e.g., acquaintances and best friends mixed together).” (Xiang, Neville, & Rogati, 2010). Xiang, Neville, & Rogati developed a model to predict and estimate the strength of links in a social network based on interaction activity and user similarity. This challenge extends the link-prediction problem, as the group believes that treating links as binary relations increases the amount of noise learned by a prediction model because strong and weak links are treated equally. In most online social networks, creating links comes at such a low cost that many links may be much less significant than others. Including
  • 7. these insignificant links in the learned model can greatly degrade the performance of the system (2010).
    Solution Technology
    To build their model, Xiang, Neville, & Rogati implemented an unsupervised method to infer the strength of links in a network. These strength values are continuous, representing a range from weak to strong relationships (2010). More specifically, the researchers “formulate a latent variable model to infer (hidden) relationship strengths and develop a coordinate ascent optimization procedure for inference.” (Xiang, Neville, & Rogati, 2010). A Gaussian distribution was used to model the conditional probability of strengths given the similarity of the actors involved in each link. Maximum likelihood is used to estimate the latent variable model, and a gradient-based method is used to optimize the parameters of the model (Xiang, Neville, & Rogati, 2010).
    Evaluation
    Evaluation was based on two measures: the autocorrelation improvement and the classification improvement. In terms of autocorrelation, “the relationship-strength network has significantly higher autocorrelation than the friendship graph in all cases” (Xiang, Neville, & Rogati, 2010). Using a Gaussian random field semi-supervised classification algorithm and comparing with other works, the group reports that their model “results in the highest classification performance for all tasks, suggesting that [their] approach to summarizing the rich profile and interaction information in online social networks leads to a single meaningful relationship graph
  • 8. which can improve subsequent knowledge discovery and prediction tasks.” (Xiang, Neville, & Rogati, 2010).
    Drivers and Enablers of Data Mining and Data Warehousing
    There are many factors that create a demand for data mining and data warehousing technologies. Many companies, organizations, and institutions have an interest in extracting information and knowledge from their stored and incoming data. Some groups seek to use their data to create monetary value, while others seek to understand how to serve their customers or employees better. With today’s widespread use of technology and the World Wide Web, society is creating new data at alarming rates. In order to handle this endless stream of data, many companies turn to data mining and warehousing technologies. Many companies can use data mining to make better business decisions, better target their customers, and find new ways to market their products and services. The amount of data created and stored far exceeds the capabilities of any traditional data analysis tools and creates a demand for data mining.
    The decreasing cost of computational power and storage is facilitating the widespread use of data mining and data warehousing in the business world. Globalization is also driving these technologies as the world becomes more interconnected in online communities. The increasing availability of data collection devices such as smart phones is also contributing to the use of data mining. Increasingly, datasets are becoming openly available to the public from many governments and organizations. The abundance of data, the low cost of computational power, and the use of open and free software create an environment that fosters data mining.
  • 9. References
    Leskovec, J., Huttenlocher, D., & Kleinberg, J. (2010). Predicting Positive and Negative Links in Online Social Networks. WWW '10: Proceedings of the 19th International Conference on World Wide Web (pp. 641-650). New York, NY, USA: ACM.
    Liben-Nowell, D., & Kleinberg, J. (2007). The Link-Prediction Problem for Social Networks. Journal of the American Society for Information Science and Technology, 58(7), 1019-1031.
    Scellato, S., Noulas, A., & Mascolo, C. (2011). Exploiting Place Features in Link Prediction on Location-based Social Networks. KDD '11: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1046-1054). New York, NY, USA: ACM.
    Xiang, R., Neville, J., & Rogati, M. (2010). Modeling Relationship Strength in Online Social Networks. WWW '10: Proceedings of the 19th International Conference on World Wide Web (pp. 981-990). New York, NY, USA: ACM.
  • 10. Questions
    a) Why are DM and DW technologies becoming important tools for today's business world?
    With the growth of data being collected by businesses, data warehousing technologies are becoming more important. Companies need data warehousing technologies to easily access aggregate information from their data. Businesses also seek to integrate data from multiple database systems with different designs and schemas. Data warehousing technology allows a company to store its data based on groupings. With all this data, companies need to make sense of it all. Data mining technologies allow businesses to turn the information stored in their data warehouses into knowledge. Data mining aids businesses in making decisions and sheds light on interesting correlations that would otherwise be unknown. In today’s online world, data is what drives businesses, and data mining is the methodology for producing knowledge from vast amounts of data.
    b) What are the main differences between data mining, traditional statistical data analysis, and information retrieval?
    Data mining is the process of extracting knowledge from large amounts of data, which involves several steps that turn raw data into knowledge that is easily understood by humans. Traditional statistical data analysis cannot handle large amounts of data. Information retrieval, in terms of database systems, only involves accessing and retrieving data, creating aggregate values, or performing deductive queries.
  • 11. c) How is a data warehouse model different from a relational database model? Why is DW technology more advanced in supporting business management?
    A relational database is simply a collection of tables. Each table has columns and rows, and each cell can be accessed independently, or an aggregate query may be applied to a subset of cells. In order to access any data from a relational database, queries must be made in a relational query language. This is much different from a data warehouse, which is a repository of information from many sources stored under a unified schema. Data in a data warehouse is stored in such a way that it can provide information from a historical perspective and in a summarized manner. Data warehouses are multidimensional, and each cell contains some aggregate measure. These properties make a data warehouse better suited to supporting business management. For example, a manager can easily access the aggregate sales of a particular product by region, or year, or region and year, or any other combination of attributes.
    d) What are the main differences between using OLAP on a DW and using SQL on a traditional database for supporting business decision making?
    Using on-line analytical processing operations allows data to be presented at different layers of abstraction to accommodate different viewpoints. This is useful in a business environment, as different departments may want to see the company’s data in different ways. Using OLAP is much faster than SQL aggregate queries, as the aggregates are precomputed and don’t need computationally expensive operations such as joins.
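The idea of precomputed aggregates in answer d) can be illustrated with a small sketch. It is a toy model rather than how any particular OLAP product works: the function name `build_cube`, the row layout, and the sample sales figures are all hypothetical. Every combination of dimension values is summed once when the cube is built, so a later query by region, by year, or by region and year becomes a single lookup.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dims, measure):
    """Precompute the sum of `measure` for every subset of `dims`.

    A toy stand-in for an OLAP cube; real systems use far more compact
    storage and usually materialise only some of the aggregates.
    """
    cube = defaultdict(float)
    for row in rows:
        # Add this row to every grouping it belongs to, from the grand
        # total (no dimensions) up to the most detailed cell (all dimensions).
        for k in range(len(dims) + 1):
            for group in combinations(dims, k):
                key = tuple((d, row[d]) for d in group)
                cube[key] += row[measure]
    return dict(cube)


# Hypothetical sales records for a single product.
sales = [
    {"region": "East", "year": 2015, "amount": 10.0},
    {"region": "East", "year": 2016, "amount": 20.0},
    {"region": "West", "year": 2015, "amount": 5.0},
]
cube = build_cube(sales, dims=("region", "year"), measure="amount")

# Each query is now a dictionary lookup instead of a scan or a join.
total_sales = cube[()]
east_sales = cube[(("region", "East"),)]
east_2016_sales = cube[(("region", "East"), ("year", 2016))]
```

The trade-off is storage and build time: the number of precomputed cells grows quickly with the number of dimensions, which is why the aggregation is done ahead of time rather than per query.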