This document discusses frameworks for analyzing evolution in social networks. It covers four key dimensions for studying evolution: how time is dealt with, the objective of study, definitions of communities, and whether evolution is the objective or assumption. Methods described include modeling networks across time, learning single or multiple models, community detection algorithms, and dynamic probabilistic models. Challenges include handling multiple data streams, accounting for entity properties over time, and ensuring community models evolve smoothly.
This document provides a comparison of SQL and NoSQL databases. It summarizes the key features of SQL databases, including their use of schemas, SQL query languages, ACID transactions, and examples like MySQL and Oracle. It also summarizes features of NoSQL databases, including their large data volumes, scalability, lack of schemas, eventual consistency, and examples like MongoDB, Cassandra, and HBase. The document aims to compare the different approaches of SQL and NoSQL for managing data.
Wine Quality Analysis Using Machine LearningMahima -
Wine industries use Product Quality Certification to promote their products and become a concern for every individual who consumes any product. But it's not possible to ensure wine quality by experts with such a huge demand for the product as it will increase the cost. It allows building a model using machine learning techniques with a user interface which predicts the quality of the wine by selecting the important parameters.
Data Models In Database Management SystemAmad Ahmad
This document discusses different types of data models used in database management systems (DBMS), including record-based, relational, network, hierarchical, and entity-relationship (ER) models. It provides an overview of key concepts like data, information, databases, and data models. For each model type, it describes how data is organized and represented. For example, it explains that the relational model organizes data into two-dimensional tables with attributes and tuples, while the hierarchical model structures data in a tree configuration. The ER model views data as entities and relationships between entities.
The document provides an overview of Big Data technology landscape, specifically focusing on NoSQL databases and Hadoop. It defines NoSQL as a non-relational database used for dealing with big data. It describes four main types of NoSQL databases - key-value stores, document databases, column-oriented databases, and graph databases - and provides examples of databases that fall under each type. It also discusses why NoSQL and Hadoop are useful technologies for storing and processing big data, how they work, and how companies are using them.
Big Data Sources PowerPoint Presentation Slides SlideTeam
Manage the big data efficiently using our big data sources PowerPoint deck. These 25 slides presentation deck gives you the access to highlight the information related to the most complex topic in the most desired manner. This PPT deck has been designed by our creative and proficient designers who have great designing skills and knowledge about the concept of big data. Our PPT deck helps you make your people understand that from where the data comes and where it gets stored. Every individual today is using one or another device to accomplish their day to day tasks but only few know how they can assure to protect their data and how it is saved in the cloud storage. To make a presentation where you have to share the information about big data sources you can use this big data sources PowerPoint deck. Some key presentation slides that are included in the deck are cloud, web, internet of things, media, databases, data warehouse appliances etc. Disprove exaggerated claims with our Big Data Sources PowerPoint Presentation Slides. Force the boastful to accept their error.
This document provides an overview of a hospital management system project. It discusses conducting a feasibility study, planning and scheduling the project, performing system analysis, designing the system architecture including databases and user interfaces, implementing the system, and testing it. The project aims to automate the manual record keeping process in hospitals to make it more efficient and reduce errors. It will develop a web-based system using Java and MySQL to digitize key functions like patient registration, appointment scheduling, billing, and more. Testing will ensure the system meets requirements before users accept it.
This is a presentation on Handwritten Digit Recognition using Convolutional Neural Networks. Convolutional Neural Networks give better results as compared to conventional Artificial Neural Networks.
This document provides an introduction to NoSQL databases. It discusses the history and limitations of relational databases that led to the development of NoSQL databases. The key motivations for NoSQL databases are that they can handle big data, provide better scalability and flexibility than relational databases. The document describes some core NoSQL concepts like the CAP theorem and different types of NoSQL databases like key-value, columnar, document and graph databases. It also outlines some remaining research challenges in the area of NoSQL databases.
This document provides a comparison of SQL and NoSQL databases. It summarizes the key features of SQL databases, including their use of schemas, SQL query languages, ACID transactions, and examples like MySQL and Oracle. It also summarizes features of NoSQL databases, including their large data volumes, scalability, lack of schemas, eventual consistency, and examples like MongoDB, Cassandra, and HBase. The document aims to compare the different approaches of SQL and NoSQL for managing data.
Wine Quality Analysis Using Machine LearningMahima -
Wine industries use Product Quality Certification to promote their products and become a concern for every individual who consumes any product. But it's not possible to ensure wine quality by experts with such a huge demand for the product as it will increase the cost. It allows building a model using machine learning techniques with a user interface which predicts the quality of the wine by selecting the important parameters.
Data Models In Database Management SystemAmad Ahmad
This document discusses different types of data models used in database management systems (DBMS), including record-based, relational, network, hierarchical, and entity-relationship (ER) models. It provides an overview of key concepts like data, information, databases, and data models. For each model type, it describes how data is organized and represented. For example, it explains that the relational model organizes data into two-dimensional tables with attributes and tuples, while the hierarchical model structures data in a tree configuration. The ER model views data as entities and relationships between entities.
The document provides an overview of Big Data technology landscape, specifically focusing on NoSQL databases and Hadoop. It defines NoSQL as a non-relational database used for dealing with big data. It describes four main types of NoSQL databases - key-value stores, document databases, column-oriented databases, and graph databases - and provides examples of databases that fall under each type. It also discusses why NoSQL and Hadoop are useful technologies for storing and processing big data, how they work, and how companies are using them.
Big Data Sources PowerPoint Presentation Slides SlideTeam
Manage the big data efficiently using our big data sources PowerPoint deck. These 25 slides presentation deck gives you the access to highlight the information related to the most complex topic in the most desired manner. This PPT deck has been designed by our creative and proficient designers who have great designing skills and knowledge about the concept of big data. Our PPT deck helps you make your people understand that from where the data comes and where it gets stored. Every individual today is using one or another device to accomplish their day to day tasks but only few know how they can assure to protect their data and how it is saved in the cloud storage. To make a presentation where you have to share the information about big data sources you can use this big data sources PowerPoint deck. Some key presentation slides that are included in the deck are cloud, web, internet of things, media, databases, data warehouse appliances etc. Disprove exaggerated claims with our Big Data Sources PowerPoint Presentation Slides. Force the boastful to accept their error.
This document provides an overview of a hospital management system project. It discusses conducting a feasibility study, planning and scheduling the project, performing system analysis, designing the system architecture including databases and user interfaces, implementing the system, and testing it. The project aims to automate the manual record keeping process in hospitals to make it more efficient and reduce errors. It will develop a web-based system using Java and MySQL to digitize key functions like patient registration, appointment scheduling, billing, and more. Testing will ensure the system meets requirements before users accept it.
This is a presentation on Handwritten Digit Recognition using Convolutional Neural Networks. Convolutional Neural Networks give better results as compared to conventional Artificial Neural Networks.
This document provides an introduction to NoSQL databases. It discusses the history and limitations of relational databases that led to the development of NoSQL databases. The key motivations for NoSQL databases are that they can handle big data, provide better scalability and flexibility than relational databases. The document describes some core NoSQL concepts like the CAP theorem and different types of NoSQL databases like key-value, columnar, document and graph databases. It also outlines some remaining research challenges in the area of NoSQL databases.
This Presentation is about NoSQL which means Not Only SQL. This presentation covers the aspects of using NoSQL for Big Data and the differences from RDBMS.
This document discusses static and dynamic websites and how to create dynamic web pages in Java. A static website contains fixed HTML pages while a dynamic website pulls content from a database and can update dynamically. Dynamic pages are created using client-side or server-side scripting. In Java, servlets are used to generate dynamic web pages. Servlets are Java programs that run on a web server and can handle complex requests. The servlet container processes servlet requests and responses, providing a platform for dynamic web pages.
This document provides an overview of key concepts in database systems and management. It discusses the purpose of database systems in organizing and managing data, the relational and object-based models for structuring data, and languages like SQL for defining, manipulating and querying data. The document also outlines different levels of abstraction, database schemas and instances, and components of database management systems.
The document proposes a heart attack prediction system using fuzzy C-means clustering. The system takes in a patient's medical attributes like age, blood pressure, and artery thickness from their records. It then uses a fuzzy C-means algorithm to cluster this data and predict the patient's risk of a heart attack. The system is intended to help doctors make earlier diagnoses compared to only relying on their experience and a patient's records.
The document outlines a syllabus for an Internet of Things Technology course. It includes 5 modules that will be covered over the semester. Evaluation will consist of 3 internal assessments weighted at 30%, 40%, and 30% respectively, covering different portions of the syllabus. Students must attain a minimum of 85% attendance and assignments will be due before each internal assessment. The class website and online testing platform are also indicated.
This document provides an overview of database system concepts and architecture. It discusses different data models including conceptual, physical and implementation models. It also covers database languages, interfaces, utilities and centralized versus distributed (client-server) architectures. Specifically, it describes hierarchical and network data models, the three schema architecture, data independence, DBMS languages like DDL and DML, and different DBMS classifications including relational, object-oriented and distributed systems.
This document is a project report for developing a social networking site submitted as part of a master's degree program. It discusses the existing system's limitations in allowing people to voice violations, injustice, and corruption happening around them. The proposed system aims to provide a common platform for citizens of India to discuss these issues and take appropriate action. It describes the system's modules, development strategy using prototyping, and technical feasibility of the project. In summary, the document outlines a social media platform to promote social responsibility in India by enabling citizens to report issues and participate in online discussions.
This document discusses cloud computing, including definitions of cloud computing, the different types of cloud computing services (SaaS, PaaS, IaaS), examples of cloud platforms like Google Cloud, and advantages like reduced costs, scalability, and environmental benefits compared to traditional computing. It also notes some disadvantages like reliance on internet connectivity and lack of access offline.
Comparison of approaches to security taken by Tableau and Power BI to help make wise choices when planning out enterprise level deployments. View the video recording and download this deck at: https://senturus.com/events/enterprise-security-tableau-vs-power-bi/
Senturus offers a full spectrum of services for business analytics. Our resource library has hundreds of free live and recorded webinars, blog posts, demos and unbiased product reviews available on our website at: http://www.senturus.com/senturus-resources/.
- Problems with traditional data centers.
- Cloud computing definition, deployment, and services models.
- Essential characteristics of cloud services.
- IaaS examples.
- PaaS examples.
- SaaS examples.
- Cloud enabling technologies such as grid computing, utility computing, service oriented architecture (SOA), The Internet, Multi-tenancy, Web 2.0, Automation and Virtualization.
This document provides an overview of a course on computer vision called CSCI 455: Intro to Computer Vision. It acknowledges that many of the course slides were modified from other similar computer vision courses. The course will cover topics like image filtering, projective geometry, stereo vision, structure from motion, face detection, object recognition, and convolutional neural networks. It highlights current applications of computer vision like biometrics, mobile apps, self-driving cars, medical imaging, and more. The document discusses challenges in computer vision like viewpoint and illumination variations, occlusion, and local ambiguity. It emphasizes that perception is an inherently ambiguous problem that requires using prior knowledge about the world.
The document discusses embedded SQL statements and stored procedures in Oracle databases. Embedded SQL allows SQL statements to be placed within a host programming language like C/C++. Stored procedures are named PL/SQL blocks that perform specific database-related tasks. Triggers are PL/SQL blocks that automatically execute in response to data changes, such as inserts or updates.
A multimedia database stores different types of media such as text, audio, video and images. It can be organized by linking metadata to the actual media files or by embedding the media files within the database. A multimedia database management system provides support for creating, storing, accessing, querying and controlling multimedia data and formats. Examples of large multimedia databases include YouTube's database of over 100 million videos watched daily which has grown to over 45 terabytes, and Google's database containing over 33 trillion search entries.
Diabetes is a disease which is rapidly increasing all over the world. It occurs when pancreas does not produce sufficient insulin, or body can not sufficiently use insulin it produces. Diabetes person has increase blood glucose in the body. One of the major problem diabetic patients suffers from is the Diabetic Retinopathy (DR) and blindness. Since the number of diabetes patients is continuously increasing, it increases the data as well.
This document provides an overview of SQL and NoSQL databases. It defines SQL as a language used to communicate with relational databases, allowing users to query, manipulate, and retrieve data. NoSQL databases are defined as non-relational and allow for flexible schemas. The document compares key aspects of SQL and NoSQL such as data structure, querying, scalability and provides examples of popular SQL and NoSQL database systems. It concludes that both SQL and NoSQL databases will continue to be important with polyglot persistence, using the best database for each storage need.
This document describes a proposed Aadhar Secure Travel Identity system that would allow citizens in India to have a single unique identification number linked to their passport, driver's license, and criminal records. The system would integrate government departments like Aadhar, Crime Records, Passport Office, and Regional Transport Authority. It would allow citizens to easily apply for and manage documents without agents. Law enforcement could also use the system to track citizens and block travel if needed during an investigation. The document outlines the proposed system architecture, modules, and technical requirements.
We provide platforms and frameworks for rapid development to unify Everything on the Internet for innovations to be created. Our customers has chosen our technology to be on the edge into the future of Internet. The future of Internet is not about a single technology or protocol, but to make them coexist.
Defining Data Science
• What Does a Data Science Professional Do?
• Data Science in Business
• Use Cases for Data Science
• Installation of R and R studio
Online Music Store is a website that contains recent Bollywood songs and videos.
User can listen as well as download the songs.
Songs are stored in database in the form of bytes.
Reduces storage space and load.
Makes the website quick .
ABSTRACT
This paper introduces an evolutionary approach to enhance the process of finding central nodes in mobile networks. This can provide essential information and important applications in mobile and social networks. This evolutionary approach considers the dynamics of the network and takes into consideration the central nodes from previous time slots. We also study the applicability of maximal cliques algorithms in mobile social networks and how it can be used to find the central nodes based on the discovered maximal cliques. The experimental results are promising and show a significant enhancement in finding the central nodes.
LCF is a temporal approach to link prediction in dynamic social networks. It proposes a new predictor called Latest Common Friend (LCF) that incorporates temporal aspects. Social networks are modeled as sequences of snapshots over time periods. Each edge is assigned a weight based on timestamp. LCF score for node pairs is the cumulative weight of their common friends, giving more weight to friends with later timestamps. LCF outperforms traditional predictors like Common Neighbor, Adamic-Adar and Jaccard coefficient on 8 real-world dynamic network datasets based on average AUC scores. Modeling networks temporally and weighting edges by timestamp allows LCF to better predict future links in dynamic social networks.
This Presentation is about NoSQL which means Not Only SQL. This presentation covers the aspects of using NoSQL for Big Data and the differences from RDBMS.
This document discusses static and dynamic websites and how to create dynamic web pages in Java. A static website contains fixed HTML pages while a dynamic website pulls content from a database and can update dynamically. Dynamic pages are created using client-side or server-side scripting. In Java, servlets are used to generate dynamic web pages. Servlets are Java programs that run on a web server and can handle complex requests. The servlet container processes servlet requests and responses, providing a platform for dynamic web pages.
This document provides an overview of key concepts in database systems and management. It discusses the purpose of database systems in organizing and managing data, the relational and object-based models for structuring data, and languages like SQL for defining, manipulating and querying data. The document also outlines different levels of abstraction, database schemas and instances, and components of database management systems.
The document proposes a heart attack prediction system using fuzzy C-means clustering. The system takes in a patient's medical attributes like age, blood pressure, and artery thickness from their records. It then uses a fuzzy C-means algorithm to cluster this data and predict the patient's risk of a heart attack. The system is intended to help doctors make earlier diagnoses compared to only relying on their experience and a patient's records.
The document outlines a syllabus for an Internet of Things Technology course. It includes 5 modules that will be covered over the semester. Evaluation will consist of 3 internal assessments weighted at 30%, 40%, and 30% respectively, covering different portions of the syllabus. Students must attain a minimum of 85% attendance and assignments will be due before each internal assessment. The class website and online testing platform are also indicated.
This document provides an overview of database system concepts and architecture. It discusses different data models including conceptual, physical and implementation models. It also covers database languages, interfaces, utilities and centralized versus distributed (client-server) architectures. Specifically, it describes hierarchical and network data models, the three schema architecture, data independence, DBMS languages like DDL and DML, and different DBMS classifications including relational, object-oriented and distributed systems.
This document is a project report for developing a social networking site submitted as part of a master's degree program. It discusses the existing system's limitations in allowing people to voice violations, injustice, and corruption happening around them. The proposed system aims to provide a common platform for citizens of India to discuss these issues and take appropriate action. It describes the system's modules, development strategy using prototyping, and technical feasibility of the project. In summary, the document outlines a social media platform to promote social responsibility in India by enabling citizens to report issues and participate in online discussions.
This document discusses cloud computing, including definitions of cloud computing, the different types of cloud computing services (SaaS, PaaS, IaaS), examples of cloud platforms like Google Cloud, and advantages like reduced costs, scalability, and environmental benefits compared to traditional computing. It also notes some disadvantages like reliance on internet connectivity and lack of access offline.
Comparison of approaches to security taken by Tableau and Power BI to help make wise choices when planning out enterprise level deployments. View the video recording and download this deck at: https://senturus.com/events/enterprise-security-tableau-vs-power-bi/
Senturus offers a full spectrum of services for business analytics. Our resource library has hundreds of free live and recorded webinars, blog posts, demos and unbiased product reviews available on our website at: http://www.senturus.com/senturus-resources/.
- Problems with traditional data centers.
- Cloud computing definition, deployment, and services models.
- Essential characteristics of cloud services.
- IaaS examples.
- PaaS examples.
- SaaS examples.
- Cloud enabling technologies such as grid computing, utility computing, service oriented architecture (SOA), The Internet, Multi-tenancy, Web 2.0, Automation and Virtualization.
This document provides an overview of a course on computer vision called CSCI 455: Intro to Computer Vision. It acknowledges that many of the course slides were modified from other similar computer vision courses. The course will cover topics like image filtering, projective geometry, stereo vision, structure from motion, face detection, object recognition, and convolutional neural networks. It highlights current applications of computer vision like biometrics, mobile apps, self-driving cars, medical imaging, and more. The document discusses challenges in computer vision like viewpoint and illumination variations, occlusion, and local ambiguity. It emphasizes that perception is an inherently ambiguous problem that requires using prior knowledge about the world.
The document discusses embedded SQL statements and stored procedures in Oracle databases. Embedded SQL allows SQL statements to be placed within a host programming language like C/C++. Stored procedures are named PL/SQL blocks that perform specific database-related tasks. Triggers are PL/SQL blocks that automatically execute in response to data changes, such as inserts or updates.
A multimedia database stores different types of media such as text, audio, video and images. It can be organized by linking metadata to the actual media files or by embedding the media files within the database. A multimedia database management system provides support for creating, storing, accessing, querying and controlling multimedia data and formats. Examples of large multimedia databases include YouTube's database of over 100 million videos watched daily which has grown to over 45 terabytes, and Google's database containing over 33 trillion search entries.
Diabetes is a disease which is rapidly increasing all over the world. It occurs when pancreas does not produce sufficient insulin, or body can not sufficiently use insulin it produces. Diabetes person has increase blood glucose in the body. One of the major problem diabetic patients suffers from is the Diabetic Retinopathy (DR) and blindness. Since the number of diabetes patients is continuously increasing, it increases the data as well.
This document provides an overview of SQL and NoSQL databases. It defines SQL as a language used to communicate with relational databases, allowing users to query, manipulate, and retrieve data. NoSQL databases are defined as non-relational and allow for flexible schemas. The document compares key aspects of SQL and NoSQL such as data structure, querying, scalability and provides examples of popular SQL and NoSQL database systems. It concludes that both SQL and NoSQL databases will continue to be important with polyglot persistence, using the best database for each storage need.
This document describes a proposed Aadhar Secure Travel Identity system that would allow citizens in India to have a single unique identification number linked to their passport, driver's license, and criminal records. The system would integrate government departments like Aadhar, Crime Records, Passport Office, and Regional Transport Authority. It would allow citizens to easily apply for and manage documents without agents. Law enforcement could also use the system to track citizens and block travel if needed during an investigation. The document outlines the proposed system architecture, modules, and technical requirements.
We provide platforms and frameworks for rapid development to unify Everything on the Internet for innovations to be created. Our customers has chosen our technology to be on the edge into the future of Internet. The future of Internet is not about a single technology or protocol, but to make them coexist.
Defining Data Science
• What Does a Data Science Professional Do?
• Data Science in Business
• Use Cases for Data Science
• Installation of R and R studio
Online Music Store is a website that contains recent Bollywood songs and videos.
User can listen as well as download the songs.
Songs are stored in database in the form of bytes.
Reduces storage space and load.
Makes the website quick .
ABSTRACT
This paper introduces an evolutionary approach to enhance the process of finding central nodes in mobile networks. This can provide essential information and important applications in mobile and social networks. This evolutionary approach considers the dynamics of the network and takes into consideration the central nodes from previous time slots. We also study the applicability of maximal cliques algorithms in mobile social networks and how it can be used to find the central nodes based on the discovered maximal cliques. The experimental results are promising and show a significant enhancement in finding the central nodes.
LCF is a temporal approach to link prediction in dynamic social networks. It proposes a new predictor called Latest Common Friend (LCF) that incorporates temporal aspects. Social networks are modeled as sequences of snapshots over time periods. Each edge is assigned a weight based on timestamp. LCF score for node pairs is the cumulative weight of their common friends, giving more weight to friends with later timestamps. LCF outperforms traditional predictors like Common Neighbor, Adamic-Adar and Jaccard coefficient on 8 real-world dynamic network datasets based on average AUC scores. Modeling networks temporally and weighting edges by timestamp allows LCF to better predict future links in dynamic social networks.
FAST AND ACCURATE MINING THE COMMUNITY STRUCTURE: INTEGRATING CENTER LOCATING...Nexgen Technology
TO GET THIS PROJECT COMPLETE SOURCE ON SUPPORT WITH EXECUTION PLEASE CALL BELOW CONTACT DETAILS
MOBILE: 9791938249, 0413-2211159, WEB: WWW.NEXGENPROJECT.COM,WWW.FINALYEAR-IEEEPROJECTS.COM, EMAIL:Praveen@nexgenproject.com
NEXGEN TECHNOLOGY provides total software solutions to its customers. Apsys works closely with the customers to identify their business processes for computerization and help them implement state-of-the-art solutions. By identifying and enhancing their processes through information technology solutions. NEXGEN TECHNOLOGY help it customers optimally use their resources.
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKSijcsit
This paper introduces an evolutionary approach to enhance the process of finding central nodes in mobile networks. This can provide essential information and important applications in mobile and social networks. This evolutionary approach considers the dynamics of the network and takes into consideration the central nodes from previous time slots. We also study the applicability of maximal cliques algorithms in mobile social networks and how it can be used to find the central nodes based on the discovered maximal cliques. The experimental results are promising and show a significant enhancement in finding the central nodes.
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...Subhajit Sahu
Highlighted notes during research with Prof. Dip Sankar Banerjee, Prof. Kishore Kothapalli:
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dynamic Graphs.
https://ieeexplore.ieee.org/document/9384277
There are 3 types of community detection methods:
Divisive, Agglomerative, and Multi-level (usually better).
In this paper, heuristics for skipping out most likely unaffected vertices for a modularity-based community detection method like Louvain and SLM (Smart Local Moving) is given. All edge batches are undirected, and sorted by source vertex id. For edge additions, source vertex i, highest modularity changing edge vertex j*, i's neighbors, and j*'s community are marked as affected. For edge deletions, where i and j must be in the same community, i, j, i's neighbors, and i's community are marked as affected. Performance is compared with static, dynamic baseline (incremental), and this method (both Louvain and SLM). Comparison is also done with "DynaMo" and "Batch" community detection methods.
[ICDE 2014] Incremental Cluster Evolution Tracking from Highly Dynamic Networ...Pei Lee
We describe the complete skills of tracking cluster evolution patterns in large evolving networks in this talk. In simple words, given a dynamic graph which is updated at each time moment, how can we incrementally monitor the evolution patterns of graph clusters? Typical evolution patterns include appear/disappear, grow/decay, merge/split. We discussed the incremental computation framework, in contrast to the traditional graph snapshot sequence approach. The ICDE 2014 paper can be found at http://www.cs.ubc.ca/~peil/research.html
This document presents a study analyzing the dynamics of interacting multiplex networks. The objectives are to analyze popular social networking platforms like Facebook and Twitter as multiplex networks through common users, and to study how interactions in one network layer can influence dynamics in another layer using a multiplex network model with different topologies. The study collects data from Facebook and Twitter to construct a multiplex network and identifies common users across the networks. It then analyzes citation and co-authorship networks as interacting layers to study how productivity and citations in one network correlate with the other network. Key results show influence of higher degree nodes on citations and collaboration patterns across the network layers.
Centrality Prediction in Mobile Social NetworksIJERA Editor
By analyzing evolving centrality roles using time dependent graphs, researchers may predict future centrality values. This may prove invaluable in designing efficient routing and energy saving strategies and have profound implications on evolving social behavior in dynamic social networks. In this paper, we propose a new method to predict centrality values of nodes in a dynamic environment. The proposed method is based on calculating the correlation between current and past measure of centrality for each corresponding node, which is used to form a composite vector to represent the given state of centralities. The performance of the proposed method is evaluated through simulated predictions on data sets from real mobile networks. Results indicate significantly low prediction error rate occurs, with a suitable implementation of the proposed method.
This document discusses dynamic community discovery in networks. It begins with an introduction to the goal of identifying meaningful group structures in networks and modeling real-world social networks as dynamic graphs. It then discusses what communities are, why dynamic analysis is needed, and types of communities. The document outlines a model for dynamic community analysis and rules for more complex analysis like birth, death, merging and splitting of communities. It also discusses algorithms for tracking communities across time steps and gives an example of applying this to a real mobile operator network.
This document summarizes a research paper that compares the performance of 23 different activation functions when used in the gates of LSTM networks for classification tasks. The paper trains LSTM networks with different activation functions on three datasets: IMDB, Movie Reviews, and MNIST. It finds that LSTM networks using the Elliott activation function and its modifications achieved the lowest average error on all three datasets, outperforming the commonly used sigmoid activation function. The paper provides the first extensive study and comparison of various activation functions in LSTM networks for classification.
This document discusses predicting new friendships in social networks using temporal information. It describes research on predicting new links in social networks over time using supervised learning models trained on temporal features from past network interactions. The researchers used anonymized Facebook data over 28 months to train decision tree and neural network classifiers to predict new relationships, finding models using temporal information performed better than those without it.
This document summarizes previous work on modeling and analyzing network traffic at the source level. It discusses how traffic at network aggregation points has been found to exhibit self-similarity and long-range dependence, which can be explained by the superposition of many ON/OFF sources with heavy-tailed distributions. The document then describes an experimental study where the authors collected traffic traces from various sources, analyzed the statistical properties, and fitted distributions to metrics like interarrival times and packet sizes in order to model source traffic for simulation purposes.
Power system and communication network co simulation for smart grid applicationsIndra S Wahyudi
This paper proposes a power system and communication network co-simulation framework that integrates a power system dynamics simulator (PSLF) and a network simulator (NS2). The framework uses a global scheduler to run the simulation in a discrete event-driven manner, eliminating errors from separate synchronization of the simulators. As a case study, an agent-based remote backup relay system is simulated using this co-simulation framework to demonstrate its effectiveness in evaluating smart grid applications involving interactions between power and communication systems.
Sub-Graph Finding Information over Nebula Networksijceronline
Social and information networks have been extensively studied over years. This paper studies a new query on sub graph search on heterogeneous networks. Given an uncertain network of N objects, where each object is associated with a network to an underlying critical problem of discovering, top-k sub graphs of entities with rare and surprising associations returns k objects such that the expected matching sub graph queries efficiently involves, Compute all matching sub graphs which satisfy "Nebula computing requests" and this query is useful in ranking such results based on the rarity and the interestingness of the associations among nebula requests in the sub graphs. "In evaluating Top k-selection queries, "we compute information nebula using a global structural context similarity, and our similarity measure is independent of connection sub graphs". We need to compute the previous work on the matching problem can be harnessed for expected best for a naive ranking after matching for large graphs. Top k-selection sets and search for the optimal selection set with the large graphs; sub graphs may have enormous number of matches. In this paper, we identify several important properties of top-k selection queries, We propose novel top–K mechanisms to exploit these indexes for answering interesting sub graph queries efficiently.
Predicting electricity consumption using hidden parametersIJLT EMAS
data mining technique to forecast power demand of a
biological region based on the metrological conditions. The value
forecast analytical data mining technique is implement with the
Hidden Marko Model. The morals of the factor such as heat,
clamminess and municipal celebration on which influence
operation depends and the everyday utilization morals compose
the data. Data mining operation are perform on this
chronological data to form a forecast model which is able of
predict every day utilization provide the meteorological
parameter. The steps of information detection of data process are
implemented. The data is preprocessed and fed to HMM for
guidance it. The educated HMM network is used to predict the
electricity demand for the given meteorological conditions.
1) The document discusses how the topology of digital tourism ecosystems can impact opinion sharing and consensus building among stakeholders. It analyzes network data from three similar Italian tourism destinations.
2) Spectral analysis is conducted on the adjacency and Laplacian matrices of the three networks to assess key network parameters without needing to compute dynamic diffusion processes directly.
3) The results show that the network structures of the whole tourism ecosystems have lower critical thresholds for information diffusion and synchronization than their individual real or virtual components considered separately, indicating more efficient spreading of ideas and consensus reaching at the full ecosystem level.
FORMAL MODELING AND VERIFICATION OF MULTI-AGENTS SYSTEM USING WELLFORMED NETScsandit
Multi-agent systems are asynchronous and distributed computer systems. These characteristics make them also a discrete-event dynamic system. It is, therefore, important to analyze the behavior of such systems to ensure that they terminate correctly and satisfy other important properties. This paper presents a formal modeling and analysis of MAS, based on Well-formed Nets, in order to ensure the absence of any undesired or unexpected behavior. To validate our ontribution, we consider the timetable problem, which is a multi-agent resource allocation problem.
Forecasting Electric Energy Demand using a predictor model based on Liquid St...Waqas Tariq
Electricity demand forecasts are required by companies who need to predict their customers’ demand, and by those wishing to trade electricity as a commodity on financial markets. It is hard to find the right prediction method for a given application if not a prediction expert. Recent works show that Liquid State Machines (LSM’s) can be applied to the prediction of time series. The main advantage of the LSM is that it projects the input data in a high-dimensional dynamical space and therefore simple learning methods can be used to train the readout. In this paper we present an experimental investigation of an approach for the computation of time series prediction by employing Liquid State Machines (LSM) in the modeling of a predictor in a case study for short-term and long-term electricity demand forecasting. Results of this investigation are promising, considering the error to stop training the readout, the number of iterations of training of the readout and that no strategy of seasonal adjustment or preprocessing of data was achieved to extract non-correlated data out of the time series.
Forecasting Electric Energy Demand using a predictor model based on Liquid St...Waqas Tariq
Electricity demand forecasts are required by companies who need to predict their customers’ demand, and by those wishing to trade electricity as a commodity on financial markets. It is hard to find the right prediction method for a given application if not a prediction expert. Recent works show that Liquid State Machines (LSM’s) can be applied to the prediction of time series. The main advantage of the LSM is that it projects the input data in a high-dimensional dynamical space and therefore simple learning methods can be used to train the readout. In this paper we present an experimental investigation of an approach for the computation of time series prediction by employing Liquid State Machines (LSM) in the modeling of a predictor in a case study for short-term and long-term electricity demand forecasting. Results of this investigation are promising, considering the error to stop training the readout, the number of iterations of training of the readout and that no strategy of seasonal adjustment or preprocessing of data was achieved to extract non-correlated data out of the time series.
Event triggered control design of linear networked systems with quantizationsISA Interchange
This paper is concerned with the control design problem of event-triggered networked systems with both state and control input quantizations. Firstly, an innovative delay system model is proposed that describes the network conditions, state and control input quantizations, and event-triggering mechanism in a unified framework. Secondly, based on this model, the criteria for the asymptotical stability analysis and control synthesis of event-triggered networked control systems are established in terms of linear matrix inequalities (LMIs). Simulation results are given to illustrate the effectiveness of the proposed method.
Human computer Interaction ch1-the human.pdfJayaprasanna4
Human cognition and perception involve receiving sensory input, processing and storing information, and applying it through reasoning, problem-solving and skill. The key senses are vision, hearing, touch, smell and taste. Vision involves light being focused on the retina and processed in the brain to interpret signals about size, depth, brightness and color. Hearing detects sound waves that are processed to identify pitch, loudness and timbre. Memory has three types - sensory, short-term and long-term. Thinking uses reasoning like deduction and induction, and problem-solving applies knowledge through means-ends analysis or analogy. Emotion influences human capabilities and involves both cognitive and physiological responses.
computer Networks Error Detection and Correction.pptJayaprasanna4
This document discusses error detection and correction in data transmission. It covers the following key points:
- There are two main types of errors: single-bit errors and burst errors. Burst errors are more common in serial transmission.
- Error detection verifies data accuracy without having the original message. It uses redundancy like vertical and longitudinal redundancy checks. Cyclic redundancy checks use polynomial division to detect errors.
- Error correction automatically fixes certain errors. Single-bit error correction reverses the value of the altered bit. Hamming codes use additional redundant bits to detect and correct single-bit errors.
HUman computer Interaction Socio-organizational Issues.pptJayaprasanna4
This document discusses socio-organizational issues and stakeholder requirements in systems design. It covers topics such as organizational conflicts that can impact system acceptance, identifying stakeholder needs in context, and socio-technical models to understand human and technical requirements. Methodologies covered include soft systems methodology to take a broader view, participatory design to involve users, and ethnographic research to study users unbiasedly.
human computer Interaction cognitive models.pptJayaprasanna4
This document discusses cognitive models used to represent users of interactive systems. It describes hierarchical models that represent a user's task and goal structure, as well as linguistic models that represent the user-system grammar. GOMS and CCT are two common cognitive models discussed in detail, with both modeling a user's goals, operators, and methods in a hierarchical structure. The document also covers Keystroke-Level Modeling (KLM) which uses cognitive understandings to predict user performance times for simple tasks.
World wide web and Hyper Text Markup LanguageJayaprasanna4
The document discusses hypertext, multimedia, and the World Wide Web. It covers understanding hypertext and how it differs from linear text through its non-linear structure and links. It also discusses finding information on the web through navigation, structure, bookmarks and search engines. Finally, it outlines some of the underlying web technologies including protocols, servers, clients and networking that allow information to be delivered and accessed on the internet.
The Monte Carlo method uses randomness to solve deterministic problems. It relies on the law of large numbers, which states that averaging a large number of random samples approximates the expected value. The method involves:
1) Defining a probability space with many random points;
2) Applying the constraints of the problem;
3) Discriminating between points inside and outside the constraints;
4) Using the internal points to define the solution space. The more points used, the more accurate the simulation.
The document discusses activity planning for projects. It covers estimating activity durations, developing detailed activity schedules, and identifying risks. A project consists of related activities, and finishes when all activities are complete. Activities must have clear start and end points and estimated durations. Network diagrams like CPM and PERT are used to plan activity sequences and identify the critical path, which determines the overall project duration. Slack time and floats help schedule activities and identify scheduling flexibility.
The document discusses various techniques for estimating software effort, including parametric models, expert judgment, analogy, and bottom-up and top-down approaches. It describes the bottom-up approach as breaking a project into tasks, estimating effort for each, and summing totals. Top-down uses parametric models relating effort to system size and productivity factors. Function point analysis and COSMIC function points are presented as top-down methods to measure system size independently of programming language.
The document discusses activity planning for projects. It covers estimating activity durations, developing detailed activity schedules, and identifying risks. A project consists of related activities, and finishes when all activities are complete. Activities must have clear start and end points and estimated durations. Network diagrams like CPM and PERT are used to plan activity sequences and identify the critical path, which determines the overall project duration. Slack time and floats help schedule activities and identify scheduling flexibility.
This document outlines the objectives and content of the IT8075 Software Project Management course. The course aims to teach students how to plan, manage, and deliver successful software projects. It covers topics such as project planning, estimation techniques, risk management, project monitoring and control, and people management. The course is divided into 5 units which cover areas such as the software development life cycle, effort estimation methods, scheduling, risk analysis, and staff management in projects. The intended learning outcomes include understanding basic project management principles and gaining knowledge of processes, estimation, risk identification, and using principles to define project structure and track progress.
Path-vector routing uses spanning trees to determine routes between sources and destinations based on policies imposed by each source rather than least-cost. BGP is an interdomain routing protocol that uses path-vector routing, with each AS advertising routing information to neighboring ASes. BGP ensures loop-free paths by enumerating the list of ASes in a path and allows ASes to apply local policies to select the best routes.
This document discusses multicasting and multicast routing protocols. It begins with an introduction to multicasting versus unicasting. It then discusses multicast trees, including source-based trees and group-shared trees. It provides overviews of several multicast routing protocols: DVMRP, a distance-vector protocol; MOSPF, an extension of OSPF; and CBT, which uses a rendezvous-point tree with a core router. Key concepts like reverse path forwarding and pruning/grafting to support dynamic membership are also summarized.
This document provides an introduction to computer networks and the fundamental concepts of data communication. It discusses the key components of data communication systems including messages, senders, receivers, transmission medium, and protocols. It also describes different types of data representation and data flow including simplex, half-duplex, and full-duplex. Additionally, it covers network criteria such as performance, reliability, and security. Finally, it discusses various topics in computer networks including physical topologies (e.g. mesh, star, bus, ring), categories of networks (LAN, MAN, WAN), and their characteristics.
This document discusses error detection and correction techniques used in data communication. It describes different types of errors like single-bit errors and burst errors. It then explains various error detection methods like vertical redundancy check (VRC), longitudinal redundancy check (LRC), cyclic redundancy check (CRC), and checksum that work by adding redundant bits. The document also covers error correction techniques like single-bit error correction using Hamming code which allows detecting and correcting single-bit errors.
This document discusses different types of connecting devices used to connect local area networks (LANs) or segments of LANs. It describes five categories of connecting devices based on the layer of the Internet model in which they operate: 1) passive hubs and repeaters that operate below or at the physical layer, 2) bridges that operate at the physical and data link layers, 3) routers that operate at the physical, data link, and network layers, and 4) gateways that can operate at all five layers. Specific devices like hubs, bridges, and routers are explained in terms of their functions, capabilities, and how they differ from each other.
The document contains questions and answers related to computer networks:
1. The presentation layer of the OSI model performs translation, encryption, and compression functions.
2. A sine wave with a frequency of 6 Hz has a period of 0.1515 seconds.
3. In stop-and-wait protocol, the sender transmits one frame and waits for an acknowledgment before sending the next frame. If no acknowledgment is received within a timeout period, the sender retransmits the original frame.
The document discusses session tracking techniques in servlets. It describes four main techniques: cookies, hidden form fields, URL rewriting, and HTTP sessions. Cookies are the simplest technique and involve assigning a unique session ID to each client as a cookie. Hidden form fields maintain state by storing information in hidden form fields and transmitting it across requests. URL rewriting appends a session ID to the URL. HTTP sessions involve saving client-specific information on the server side in an HTTP session object.
JDBC provides a standard Java API for accessing databases. It allows Java programs to connect to databases, send SQL statements, and process the results. The key steps in using JDBC include: 1) loading the appropriate JDBC driver, 2) establishing a connection, 3) creating statements to query the database, 4) processing result sets, and 5) closing connections. JDBC supports both two-tier and three-tier architectures and allows transactions to group SQL statements.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
UNLOCKING HEALTHCARE 4.0: NAVIGATING CRITICAL SUCCESS FACTORS FOR EFFECTIVE I...amsjournal
The Fourth Industrial Revolution is transforming industries, including healthcare, by integrating digital,
physical, and biological technologies. This study examines the integration of 4.0 technologies into
healthcare, identifying success factors and challenges through interviews with 70 stakeholders from 33
countries. Healthcare is evolving significantly, with varied objectives across nations aiming to improve
population health. The study explores stakeholders' perceptions on critical success factors, identifying
challenges such as insufficiently trained personnel, organizational silos, and structural barriers to data
exchange. Facilitators for integration include cost reduction initiatives and interoperability policies.
Technologies like IoT, Big Data, AI, Machine Learning, and robotics enhance diagnostics, treatment
precision, and real-time monitoring, reducing errors and optimizing resource utilization. Automation
improves employee satisfaction and patient care, while Blockchain and telemedicine drive cost reductions.
Successful integration requires skilled professionals and supportive policies, promising efficient resource
use, lower error rates, and accelerated processes, leading to optimized global healthcare outcomes.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
2. Contents
Evolution in Social Networks
Models and Algorithms for Social Influence Analysis
Algorithms and Systems for Expert Location in Social
Networks
Link Prediction in Social Networks
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
2
3. • Informally, evolution refers to a change that
manifests itself across the time axis.
Evolution in Social Networks
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
3
4. Evolution in Social Networks
Framework
Challenges of Social Network Streams
Incremental Mining for Community Tracing
Tracing Smoothly Evolving Communities
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
4
5. Framework
The activities of the participants of a social network
constitute a stream.
Different perspectives for the study of evolution on
a social network
Modeling a Network across the Time Axis
Evolution across Four Dimensions
Challenges of Social Network Streams
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
5
6. Modeling a Network across the
Time Axis
The objective of a stream mining algorithm is to
maintain an up-to-date model of the data.
This translates into adapting the model learned to
the currently visible records.
Only records within the window called Sliding
Window are considered for model adaptation.
One model Mi is built at each timepoint i and
adapted at the next timepoint into model Mi+1 using
the contents of the time window.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
6
7. Evolution across Four Dimensions
Four dimensions are associated to knowledge
discovery in social networks:
Dealing with Time
Objective of Study
Definition of a Community
Evolution as Objective vs Assumption
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
7
8. Dimension 1: Dealing with Time
two different threads on the analysis of dynamic
social networks:
learn a single model that captures the dynamics of the
network by exploiting information on how the network
has changed from one timepoint to the next.
learn one model at each timepoint and adapt it to the
data arriving at the next timepoint.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
8
9. Dimension 1: Dealing with Time
(Cont…)
The object of study for the first thread is a time
series, for the second one it is a stream.
First thread can exploit all information available
about the network. second thread cannot know how
the network will grow
The second thread monitors communities, the first thread
derives laws of evolution that govern all communities in a
given network.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
9
10. Dimension 2: Objective of Study
Understanding how social networks are formed and how they
evolve.
This is pursued mainly within the first thread of Dimension 1
seeks to find laws of evolution that explain the dynamics of the large
social networks we see in the Web.
Community evolution monitoring is pursued by algorithms that
derive and adapt models over a stream of user interactions or
user postings.
The underpinnings come from two domains:
stream clustering
dynamic probabilistic modeling
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
10
11. Dimension 3: Definition of a
Community
A community may be defined by structure
It may be defined by proximity or similarity of
members
depends strongly on the underlying model of
computation:
if stream clustering is used, then communities are crisp
clusters;
if fuzzy clustering is used, then communities are fuzzy
clusters
if dynamic probabilistic modeling is used, then
communities are latent variables
11
12. Dimension 4: Evolution as
Objective vs Assumption
1. methods that discretize the time axis to study how
communities have changed from one timepoint to
the next.
2. methods that make the assumption of smooth
evolution across time and learn community models
that preserve temporal smoothness
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
12
13. A framework for Social Network
Evolution
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
13
14. Challenges of Social Network
Streams
community monitoring involves multiple, interrelated
streams (challenge C1).
community monitoring requires taking account of the
information on the data entities in each of the
streams involved (C2).
community monitoring involves entities that have the
properties of perennial objects (C3)
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
14
15. Incremental Mining for
Community Tracing
tracing the same community at different timepoints
communities across time while ensuring temporal
smoothness
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
15
16. Tracing Smoothly Evolving
Communities
Assume that communities are smoothly evolving
constellations of interacting entities.
Then community learning over time translates into
finding a sequence of models
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
16
17. Temporal Smoothness for Clusters
Two aspects of model quality or model cost:
Snapshot cost captures the quality of the clustering
learned at each time interval
Temporal cost measures the (dis-)similarity of a
clustering learned at a particular timepoint to the
clustering learned at the previous timepoint
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
17
18. Temporal Smoothness for
Clusters(Cont…)
cluster monitoring translates to the optimization
problem of finding a sequence of models that
minimizes the overall cost
cost of model ξt to be learned from the entity
similarity matrix Mt valid at timepoint t is
The "change parameter" β expresses the
importance of temporal smoothness between the old
model ξt−1 and the new one ξt relatively to the
quality of the new model.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
18
19. Temporal Smoothness for
Clusters(Cont…)
The matrix Mt is computed by considering
local similarity of entities:"local" refers to the current
snapshot of the underlying bipartite graph,
"temporal similarity", which reflects similarity with
entities at earlier moments.
This approach assumes that entities (users) found to
be similar in the past are likely to be still similar.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
19
20. Temporal Smoothness for
Clusters(Cont…)
Temporal Smoothness for Clusters
slightly remodeled the cost function to allow for tuning
the impact of the snapshot cost explicitly:
two ways of modeling temporal smoothness :
1. preservation of cluster membership at timepoint t
2. preservation of cluster quality at t
Designed for a graph that has a fixed, a priori
known, number of nodes.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
20
21. Dynamic Probabilistic Models
Data generating process is described by a number
of latent variables.
Behavior of each entity is described/ determined
by each latent variable to a different degree
(contribution), while all contributions add to 1.
A community(Latent Variables) determines each
activity observed at each timepoint with some
probability
latent variables describe the stream(s) of activities
rather than the (stream of) entities,
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
21
22. Dynamic Probabilistic Models
(Cont…)
Discovering and Monitoring Latent Communities
Dealing with Assumptions on the Data Generating
Process
Extending the Notion of Community
Complexity: quadratic to the number of nodes, if no
optimizations are undertaken
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
22
24. Contents
Introduction
Influence Related Statistics
Social Similarity and Influence
Influence Maximization in Viral Marketing
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
24
25. Social influence
Social influence refers to the behavior change of
individuals affected by others in a network
Advertising
Social recommendation
Marketing
Social security
…
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
25
26. Introduction
The strength of social influence depends on many
factors
the strength of relationships between people in the
network
the network distance between users, temporal effects,
characteristics of networks and individuals in the
network.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
26
27. Influence Related Statistics
Edge measures
relate the influence-based concepts and measures on a
pair of nodes.
explain simple influence-related processes and
interactions between individual nodes.
Node measures
Centrality t o measure the importance of a node in the
network
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
27
28. Edge Measures
Tie strength
Weak ties
Edge betweenness
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
28
29. 1.Tie Strength
Tie strength between two nodes depends on the
overlap of their neighborhoods
Tie Strength
nA,nB are the neighborhoods of A and B
The higher tie strength, the easier for two
individuals to trust one another.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
29
30. Triadic Closure
Triadic closure can be corollared from tie
strength.
If A-B and A-C are strong ties, then B-C are likely to be
a strong tie.
If A-B and A-C are weak ties, then B-C are less likely to
be strong tie.
Related to social balance theory
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
30
31. Triadic Closure (Cont…)
Triadic closure is measured by the clustering
coefficient of the network
nΔ be the number of triangles in the network
|E| be the number of edges
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
31
32. 2.Weak Ties
Weak tie
The overlap of neighborhoods is small
Local bridge
No overlapping neighbors
Global bridge
Removal of A-B may result in the disconnection of the
connected component containing A and B .
In real network, global bridges are rare compared to
local bridges
The effect of local and global bridges is quite similar
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
32
33. 3.Edge Betweenness
Edge betweenness
The total amount of flow across an edge A-B
The information flow between A and B are evenly
distributed on the shortest paths between A and B.
Graph partitioning is an application of edge
betweenness
Gradually remove edges of high betweenness to turn
the network into disconnected components
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
33
34. Node Measures - centrality
to measure the importance of a node in the network.
centrality measures can be grouped into two categories
Radial measures
computed by random walks that start or end from a given node, including:
Degree ,Katz and its extensions ,Closeness
Medial measures
computed by the random walks that pass through a given node
Examples: Node betweenness ,structural holes
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
34
35. Node Measures - centrality Cont…
Radial measures are further categorized into:
Volume measures
fix the length of walks and find the volume ( of the walks
limited by the length.
Length measures
Fix the volume of the target nodes and find the length of
walks to reach the target volume.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
35
36. Degree
radial and volume-based measures.
degree centrality
The degree centrality of node i is defined to
be the degree of the node:
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
36
37. Degree Cont…
The Katz centrality
based on the diffusion behavior in the network.
counts the number of walks starting from a node, while
penalizing longer walks.
ei is a column vector whose ith element is 1, and all
other elements are 0.
The value of β is a positive penalty constant
between 0 and 1.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
37
38. Degree Cont…
Bonacich centrality
allows for negative values of β.
negative weight allows to subtract the even-numbered
walks from the odd-numbered walks which have an
interpretation in exchange networks
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
38
39. Degree Cont…
The Hubbell centrality
X is a matrix and y is a vector.
X = βA and y = βA1 lead to the Katz centrality
X = βA and y = A1 lead to the Bonacich centrality.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
39
40. Degree Cont…
The eigenvector centrality
the principal eigenvector of the matrix A, is related to
the Katz centrality
the eigenvector centrality is the limit of the Katz
centrality as β approaches 1/λ from below
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
40
41. 2.Closeness
radial and length based measures.
the length based measures count the length of the
walks.
Freeman’s closeness centrality
measures the centrality by computing the average of
the shortest distances to all other nodes.
S be the matrix whose (i, j)th element contains the
length of the shortest path from node i to j
1 is the all one vector.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
41
42. 3.Node Betweenness
Freeman’s betweenness centrality
measures how much a given node lies in the shortest
paths of other nodes.
bjk is the number of shortest paths from node j to k,
bjik be the number of shortest paths from node j to k that
pass through node i.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
42
43. Node Betweenness Cont…
Newman’s betweenness centrality
Based on random walks on the graph.
Idea: instead of considering shortest paths, it considers
all possible walks and computes the betweenness from
these different walks.
matrix whose (j, k)th element k contains the
probability of a random walk from j to k, which contains i
as an intermediate node.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
43
44. Structural holes
A node is a structural hole if it is connected to
multiple local bridges.
The actor who serves as a structural hole can
interconnect information originating from multiple
non-interacting parties.
Therefore, this actor is structurally important to the
connectivity of diverse regions of the network.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
44
46. Social Similarity and Influence
A central problem for social influence is to
understand the interplay between similarity and
social ties
Homophily
Existential Test for Social Influence
Influence and actions
Influence and Interaction
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
46
47. Homophily
Homophily
An actor in the social network tends to be similar to their connected
neighbors.
Originated from different mechanisms
Social influence
Indicates people tend to follow the behaviors of their friends
Selection
Indicates people tend to create relationships with other people who are
already similar to them
Confounding variables
Other unknown variables exist, which may cause friends to behave similarly
with one another.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
47
48. Generative models for selection
and influence
Holme-Newman Generative model
initially place the M edges of the network uniformly at
random between vertex pairs
assign opinions to vertices uniformly at random.
At each step either:
move an edge to lie between two individuals whose opinions
agree (selection process)
change the opinion of an individual to agree with one of
their neighbors (influence process)
selection tend to generate a large number of small
clusters, while social influence will generate large
coherent clusters.
48
49. Generative models for selection
and influence
Limitation of Holme-Newman Generative model
Every vertex at a given time can only have one opinion.
Crandall et al. introduced multi-dimensional opinion
vectors to better model complex social networks.
Assume a set of m possible activities in the social network.
Each node v at time t has an m-dimensional vector v(t),
where the ith coordinate of v(t) represents the extent to
which person v is engaging
use cosine similarity to compute the similarity between two
people. in activity i.
More data is required in order to learn the parameters.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
49
50. Quantifying influence and
selection
Numerator is the conditional probability that an
unlinked pair will become linked
numerator is the same probability for unlinked pairs
whose similarity exceeds the threshold .
Values greater than one indicate the presence of
selection.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
50
51. Quantifying influence and
selection Cont…
numerator is the conditional probability that similarity
increases from time t−1 to t between two nodes that
became linked at time t
Denominator is the probability that the similarity increases
from time t−1 to t between two nodes that were not
linked at time t − 1.
values greater than one indicate the presence of
influence.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
51
52. Matrix alignment
framework
incorporates the temporal information to learn the
weight of different attributes for establishing
relationships between users.
This can be done by optimizing (minimizing) the
following objective function:
diagonal elements of W correspond to the vector of weights
of attributes and denotes the Frobenius norm
A distortion distance function is used to measure the degree
of influence and selection.
52
53. Matrix alignment
framework Cont…
used to analyze influence and selection.
it does not differentiate the influence from different
angles
How do we quantify the strength of those social
influences?
How do we construct a model and estimate the model
parameters for real large networks?
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
53
55. Topical Factor Graph
(TFG)model
to formalize the topic-level social influence analysis
into a unified graphical model,
Presents Topical Affinity Propagation (TAP) for
model learning.
the goal of the model is to simultaneously capture
user interests, similarity between users, and network
structure.
The TFG model has:
a set of observed variables {vi}Ni =1 and
a set of hidden vectors {yi}Ni =1
55
56. Topical Factor Graph
(TFG)model
The hidden vector yi ∈ {1, . . . ,N}T models the
topic-level influence from other nodes to node vi.
Each element yz i takes the value from the set {1, . .
. ,N},
represents the node that has the highest probability to
influence node vi on topic z.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
56
58. Shuffle test for social influence
Goal
to test whether social influence exists in a social
network?
Intuition
if social influence does not play a role, then the timing
of activation should be independent of the timing of
activation of other users.
Edge reversal test: if the social correlation is not
determined by social influence, then reversing the edge
direction will not change the social correlation estimate
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
58
59. Randomization Test of Social
Influence
Model social network as a time-evolving graph
Gt=(V, Et), and the nodes with attribute at time t is
Xt
Main idea
Selection and social influence can be differentiated
through the autocorrelation between Xt and Gt
Selection process
Xt-1 determines the social network at Gt
Social influence
Gt-1 determines the node attributes at time t, i.e., Xt
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
59
62. Learning Influence Probabilities
Goal: Learn user influence and action influence from
historical actions
Assumption
If user vi performs an action y at time t and later his
friend vj also perform the action, then there is an
influence from vi to vj
User influence probability
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
62
63. Learning Influence Probabilities
Action influence probability
is the difference between the time when vj
performing the action and the time when user vi
performing the action, given eij=1; represents the
action propagation score
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
63
65. Influence and Autocorrelatoin
Autocorrelation
A set of linked users eij=1 and an attribute matrix X
associated with these users, as the correlation between
the values of X on these instance pairs
Social influence, diffusion processes, and the principle
of homophily give rise to auto correlated observations
Behavior correlation
utilize the observed behavior correlation to predict the
collective behavior in social media through
Social dimension extraction
Discriminative learning
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
65
66. Influence and Grouping Behavior
Grouping behavior
E.g., user’s participation behavior into a forum
four factors in online forums that potentially influence people’s behavior in
joining communities
Friends of Reply Relationship.
Describe how users are influenced by the number of neighbors with whom they have
ever had any reply relationship.
Community Sizes
Quantify the ‘popularity’ of information
Average Ratings of Top Posts
Measure how the authority of information impacts user behavior.
Similarities of Users
If two users are ‘similar’ in a certain way, what is the correlation of the sets of
communities they join?
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
66
68. Influence Maximization
Influence maximization
Minimize marketing cost and more generally to
maximize profit.
E.g., to get a small number of influential users to adopt
a new product, and subsequently trigger a large
cascade of further adoptions.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
68
69. Problem Abstraction
We associate each user with a status: active or
inactive
The status of the chosen set of users (seed nodes) to
market is viewed as active
Other users are viewed as inactive
Influence maximization
Initially all users are considered inactive
Then the chosen users are activated, who may further
influence their friends to be active as well
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
69
70. Alg1: High-degree heuristic
Choose the seed nodes according to their degree.
Intuition
The nodes with more neighbors would arguably tend to
impose more influence upon its direct neighbors.
Know as “degree centrality”
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
70
71. Alg2: Low-distance Heuristic
Consider the nodes with the shortest paths to other
nodes as seed nodes
Intuition
Individuals are more likely to be influenced by those
who are closely related to them.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
71
72. Alg3: Degree Discount Heuristic
General idea
If u has been selected as a seed, then when considering
selecting v as a new seed based on its degree, we
should not count the edge v->u
Referred to SingleDiscount
Specifically, for a node v with dv neighbors of
which tv are selected as seeds, we should discount
v’s degree by 2tv +(dv-tv)tvp
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
72
73. Diffusion Influence Model
Linear Threshold Model
Cascade Model
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
73
74. Linear Threshold Model
General idea
Whether a given node will be active can be based on an
arbitrary monotone function of its neighbors that are
already active.
Formalization
fv : map subsets of v’s neighbors to real numbers in [0,1]
θv : a threshold for each node
S: the set of neighbors of v that are active in step t-1
Node v will turn active in step t if fv(S)> θv
Specifically, in [Kempe, 2003], fv is defined as ,
where bv,u can be seen as a fixed weight, satisfying
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
74
75. Cascade Model
Cascade model
pv(u,S) : the success probability of user u activating user v
Independent cascade model
pv(u,S) is a constant, meaning that whether v is to be active
does not depend on the order v’s neighbors try to activate
it.
Key idea: Flip coins c in advance -> live edges
Fc(A): People influenced under outcome c (set cover !)
F(A) = Sum cP(c) Fc(A) is submodular as well
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
75
76. Algorithms and
Systems for Expert Location in Social Networks
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
76
77. Expert-location problem
Given a particular task and a set of candidates,
one often wants to identify the right expert (or set
of experts) that can perform the given task.
Expert location without graph constraints.
an important aspect of expert location is finding out the
expertise of different individuals from the available
data.
Expert location with score propagation.
problem of inferring the expertise of individuals using
their connections with other experts
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
77
78. Expert team formation problem
problem of identifying a team of experts that can
communicate well
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
78
79. Expert Location without Graph
Constraints
Given a query Q that consists of a list of skills,
identify a subset of candidates X ⊆ X who have the
skills specified by the query.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
79
80. Expert Loc. with Score Propagation
Given a query Q that consists of a list of skills, and
the social graph G, identify an initial subset of
candidates X ⊆ X who have the skills specified by the
query.
Use the input graph among experts to propagate the
initial expertise scores to rerank experts or to identify
new experts.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
80
81. Expert Team Formation
Given a query Q that consists of a list of skills, and
the social graph G, identify a subset of candidates X
⊆ X so that the chosen candidates cover the required
skills and the collaboration/communication cost of X
is minimized.
The collaboration/communication cost is defined over
the input graph G.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
81
82. Expert Location with Score Propagation
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
82
83. A two-step process
1. using language model or heuristic rules to compute
an initial expertise score for each candidate
2. using graph-based ranking algorithms to
propagate scores computed in the step 1 and
rerank experts.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
83
85. The PageRank Algorithm
Models the probability that a web page will be
visited by a random surfer.
At each time step, the surfer proceeds from his
current web page u to a randomly chosen web
page v following one of two strategies:
1. When u does not have outlinks, the surfer jumps to an
arbitrary page in the Web.
2. When u has outlinks, the surfer jumps to an arbitrary
page in the Web with probability α, and a random
page that u hyperlinks to with probability 1 − α.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
85
86. The PageRank Algorithm (Cont…)
when the surfer follows the above combined
process, the probability of him visiting a webpage
u, denoted by π(u), converges to a fixed, steady-
state quantity.
We call this probability π(u) the PageRank of u.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
86
87. HITS Algorithm
There are two types of web pages useful as results for broad-
topic searches:
Authority
Authoritative source for the broad topic search
Hubs
there are many other pages that contain lists of links to authoritative pages
on a specific topic.
A good hub is one that points to many good authorities; a
good authority page is one that is linked to by many
goodhubs.
Therefore, each web page can be assigned two scores – an
authority score and a hub score.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
87
88. Expert Score Propagation
Algorithm 1:Assume user A has answered questions
for users U1, . . . , Un,then the expertise ranking score
of A, denoted by ER(A), is computed as follows:
C(Ui) is the total number of users who have helped Ui
α is the damping factor which can be set between 0
and 1.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
88
89. Algorithm 2
The expert score of a user vi, denoted by s(vi), is
updated using the following rules:
w ((vj, vi), e) is the propagation coeffient and e ∈ Rji
is one kind of relationship from person vj to vi.
U denotes the set of neighbors to vi and Rji is the set
of all relationships between vj and vi.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
89
90. Algorithm 3
Employee ej ’s expertise score p(q|ej) on a given
topic q is smoothed using the following equation:
α is weighting parameter and Nj is the number of
neighbors for employee ej .
p(q|ej ) and p(q|ei) are the initial scores for
employee ej and his neighbor ei, respectively.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
90
92. Introduction
The objective is to find a group of experts in the
network who can
collectively perform a task in an effective manner.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
92
93. Metrics
TheMyers-Briggs Type Indicator (MBTI)
frequently-employed personality assessment tool.
help people to identify the sort of jobs where they would be “most
comfortable and effective".
used in improving and predicting team performance
Herrmann Brain Dominance Instrument (HBDI)
reflects individual’s affinity for creativity, facts, form and feelings
used in several experimental settings
Kolbe Conative Index (KCI)
a psychometric system that measures conation
effective in predicting team’s performance.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
93
94. Forming Teams of Experts
Problem 1 [Team Formation] Given a set of n
individuals denoted by X = {1, . . . , n}, a graph
G(X,E), and task T, find X ⊆ X, so that (∪i∈X’Xi) ∩ T
= T, and the communication cost Cc(X’) is minimized.
Diameter (R): Given graph G(X,E) and a set of
individuals X ⊆ X, we define the diameter
communication cost of X, denoted by Cc-R (X), to be
the diameter of the subgraph G[X].
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
94
95. Forming Teams of Experts (Cont…)
Minimum Spanning Tree (Mst): Given graph G(X,E)
and X ⊆ X, we define the Mst communication cost of
X, denoted by Cc-Mst (X), to be the cost of the
minimum spanning tree on the subgraph G[X].
The Team Formation problem with communication
function Cc-R is called the Diameter-Tf problem.
Team Formation problem with communication
function Cc-Mst is called the Mst-Tf problem.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
95
98. Link Prediction in Social Networks
98
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
99. The “Link Prediction Problem”
Given a snapshot of a social network, can we infer
which new interactions among its members are likely to
occur in the near future?
Based on “proximity” of nodes in a network
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
99
100. Variations
Link completion
Example: When a user buys five books online and the
name of one book is corrupted in transfer. A link
completion algorithm could infer the name of the
missing book based on the user's name and the other
books she bought
Alice, Bob, and a third person attended a meeting.
Given people’s previous co-occurrences, who is the third
person?
Link discovery
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
100
101. Definition of Link Prediction
Generalized Definition
Existence (i.e., does a link exist?)
Classification (i.e., what type of the relationship?)
Regression (i.e., how does the user rate the item?)
Cardinality (i.e., is there more than one link between same pair of nodes?)
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
101
102. Introduction
Natural examples of social networks:
Nodes = people/entities
Edges = interaction/ collaboration
Nodes Edges
Scientists in a discipline Co-authors of a paper
Employees in a large
company
Working on a project
Business Leaders Serve together on a
board
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
102
103. Motivation
Understanding how social networks evolve
The link prediction problem
Given a snapshot of a social network at time t, we seek
to accurately predict the edges that will be added to the
network during the interval (t, t’)
?
103
104. Why?
To suggest interactions or collaborations that haven’t yet
been utilized within an organization
To monitor terrorist networks - to deduce possible
interaction between terrorists (without direct evidence)
Used in Facebook and Linked In to suggest friends
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
104
105. Motivation
Co-authorship network for scientists
Scientists who are “close” in the network
will have common colleagues & circles –
likely to collaborate
Caveat: Scientists who have never collaborated might
in future - hard to predict
Goal: make that intuitive notion precise;
understand which measures of
“proximity” lead to accurate predictions
A
B
C
D
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
105
106. Goals
Present measures of proximity
Understand relative effectiveness of network proximity
measures (adapted from graph theory, CS, social sciences)
Prove that prediction by proximity outperforms random
predictions by a factor of 40 to 50
Prove that subtle measures outperform more direct
measures
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
106
107. Challenges of Link Prediction
Data sparse
Missing data/relationship
Imbalance
So many possibilities, so few choices
Referred as inaccurate problem due to low accuracy in
practice
Accuracy vs. Scalability
Modeling (unobserved/unknown factors)
Tasks of approximation/optimization
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
107
108. Methods for Link Prediction
Take the input graph during training period Gcollab
Pick a pair of nodes (x, y)
Assign a connection weight score(x, y)
Make a list in descending order of score
score is a measure of proximity
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
108
110. Graph distance & Common Neighbors
Graph distance: (Negated) length
of shortest path between x and y
Common Neighbors: A and C have
2 common neighbors, more likely to
collaborate
A
B
C
D
E
(A, C) -2
(C, D) -2
(A, E) -3
A
B
C
D
E
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
110
111. Jaccard’s coefficient and Adamic / Adar
Jaccard’s coefficient: same as
common neighbors, adjusted for
degree
Adamic / Adar: weighting rarer
neighbors more heavily
A
B
C
D
E
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
111
112. Preferential Attachment
Probability that a new collaboration involves x is
proportional to T(x), current neighbors of x
score (x, y) :=
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
112
113. Considering all paths: Katz
Katz: measure that sums over
the collection of paths,
exponentially damped by length
(to count short paths heavily)
β is chosen to be a very small value (for
dampening)
A
B
C
D
E
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
113
114. Hitting time, PageRank
Hitting time: expected number of steps for a random
walk starting at x to reach y
Commute time:
If y has a large stationary probability, Hx,y is small. To
counterbalance, we can normalize
PageRank: to cut down on long random walks, walk can
return to x with a probablity α at every step y
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
114
115. SimRank
Defined by this recursive definition: two nodes are
similar to the extent that they are joined by similar
neighbors
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
115
116. Low-rank approximation
Treat the graph as an adjacency matrix
Compute the rank-k matrix Mk (noise-reduction)
x is a row, y is a row, score(x, y) = inner product of rows
r(x) and r(y)
-A B C
A 1 0
B 1 1
C 0 1
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
116
117. Unseen bigrams and Clustering
Unseen bigrams: Derived from language modeling
Estimating frequency of unseen bigrams – pairs of
words (nodes here) that co-occur in a test corpus but not
in the training corpus
Clustering: deleting tenuous edges in Gcollab through a
clustering procedure and running predictors on the
“cleaned-up” subgraph
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
117
118. Results
The results are presented as:
1. Factor improvement of proposed predictors over
Random predictor
Graph distance predictor
Common neighbors predictor
2. Relative performance vs. the above predictors
3. Common Predictions
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
118
119. Feature based Link Prediction
119
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
120. Drawbacks of Unsupervised Algorithms
Generality
Unsupervised methods are domain-specific
Unstable not only from one network to the other, but
also from one graph-instance to another
Variance Reduction and Sampling Issues
Ensemble could possibly remove important information
Samples of networks with ill-behaving distributions
produce networks with different properties
Imbalance
Agnostic to class distributions by definition
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
120
123. Pros & Cons of Supervised
Algorithms
Generality
Generalize well to many environments in the sense that they
can adjust models depending on posteriorinformation.
Variance Reduction and Sampling Issues
Classification algorithms, like decision tree, can befit from
reduced variance by placing them in an ensemble
framework.
Imbalance
Capable of balancing data and focus on class boundaries.
XLimitation
Unavailability of labels for trainings.
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
123
125. Motivation
Most real-world data are stored in relational DBMS
Few learning algorithms are capable of handling
data in its relational form; thus we have to resort to
“flattening” the data in order to do analysis
As a result, we lose relational information which might
be crucial to understanding the data
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
125
126. Probabilistic Models of Bayesian
networks (BNs)
BNs for a given domain involves a pre-
specified set of attributes whose relationship to
each other is fixed in advance
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
126
127. Bayesian Networks
Good Writer Smart
Quality
Accepted
Review
Length
nodes = domain variables
edges = direct causal influence
Network structure encodes conditional independencies:
I(Review-Length , Good-Writer | Reviewer-Mood)
Reviewer
Mood
0.6 0.4
w
s
w
0.3 0.7
0.1 0.9
0.4 0.6
s
w
s
s
w
S
W P(Q| W, S)
conditional probability table (CPT)
127
128. BN Semantics
Compact & natural representation:
nodes k parents O(2k n) vs. O(2n) params
natural parameters
conditional
independencies
in BN structure
+ local
CPTs
full joint
distribution
over domain
=
W
M
S
Q
A
L
)
,
(
)
|
(
)
,
|
(
)
|
(
)
(
)
(
)
,
,
,
,
,
(
q
m
a
P
m
l
P
s
w
q
P
w
m
P
s
P
w
P
a
l
q
m
s
w
P
|
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
128
129. Reasoning in BNs
Full joint distribution answers any query
P(event | evidence)
Allows combination of different types of reasoning:
Causal: P(Reviewer-Mood | Good-Writer)
Evidential: P(Reviewer-Mood | not Accepted)
Intercausal: P(Reviewer-Mood | not Accepted,
Quality)
W
M
S
Q
A
L
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
129
130. To compute
l
q
m
s
w
a
l
q
m
s
w
P
a
P
,
,
,
,
)
,
,
,
,
,
(
)
(
l
q
m
s
w
q
m
a
P
m
l
P
s
w
q
P
w
m
P
s
P
w
P
,
,
,
,
)
,
|
(
)
|
(
)
,
|
(
)
|
(
)
(
)
(
factors
mood good writer
good
pissy
good
pissy
false
true
true
false
0.7
0.3
0.1
0.9 A factor is a function from
values of variables to
positive real numbers
Variable Elimination
130
131. Some Other Inference Algorithms
Exact
Junction Tree [Lauritzen & Spiegelhalter 88]
Cutset Conditioning [Pearl 87]
Approximate
Loopy Belief Propagation [McEliece et al 98]
Likelihood Weighting [Shwe & Cooper 91]
Markov Chain Monte Carlo [eg MacKay 98]
Gibbs Sampling [Geman & Geman 84]
Metropolis-Hastings [Metropolis et al 53, Hastings 70]
Variational Methods [Jordan et al 98]
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
131
132. Parameter Estimation in BNs
Assume known dependency structure G
Goal: estimate BN parameters q
entries in local probability models,
q is good if it’s likely to generate observed data.
MLE Principle: Choose q* so as to maximize l
Alternative: incorporate a prior
)
,
|
(
log
)
,
:
( G
D
P
G
D
l q
q
)
]
[
Pa
|
(
, u
X
x
X
P
u
x
q
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
133
133. Learning With Complete Data
Fully observed data: data consists of set of
instances, each with a value for all BN variables
With fully observed data, we can compute
= number of instances with , and
and similarly for other counts
We then estimate
s
w
q
N ,
,
s
w
s
w
q
s
w
q
N
N
s
w
q
P
,
,
,
,
, )
,
|
(
q
q w s
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
134
134. Learning with Missing Data:
Expectation-Maximization (EM)
Can’t compute
But
1. Given parameter values, can compute expected counts:
2. Given expected counts, estimate parameters:
Begin with arbitrary parameter values
Iterate these two steps
Converges to local maximum of likelihood
s
w
q
N ,
,
i
i
i
i
i
s
w
q s
w
q
P
N
E
instances
,
, )
evidence
|
,
,
(
]
[
]
[
]
[
)
,
|
(
,
,
,
,
,
s
w
s
w
q
s
w
q
N
E
N
E
s
w
q
P
q
this requires BN inference
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
135
135. Structure search
Begin with an empty network
Consider all neighbors reached by a search operator
that are acyclic
add an edge
remove an edge
reverse an edge
For each neighbor
compute ML parameter values
compute score(s) =
Choose the neighbor with the highest score
Continue until we reach a local maximum
)
(
log
)
,
|
(
log *
s
P
s
D
P s
q
*
s
q
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
136
136. Limitations of BNs
Inability to generalize across collection of individuals
within a domain
if you want to talk about multiple individuals in a domain,
you have to talk about each one explicitly, with its own local
probability model
Domains have fixed structure: e.g. one author, one
paper and one reviewer
if you want to talk about domains with multiple inter-related
individuals, you have to create a special purpose network
for the domain
For learning, all instances have to have the same set
of entities
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
137
137. Relational Schema
Author
Good Writer
Author of
Has Review
• Describes the types of objects and relations in the
database
Review
Paper
Quality
Accepted
Mood
Length
Smart
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
139
140. Probabilistic Relational Model
Length
Mood
Author
Good Writer
Paper
Quality
Accepted
Review
Smart
3
.
0
7
.
0
4
.
0
6
.
0
8
.
0
2
.
0
9
.
0
1
.
0
,
,
,
,
,
t
t
f
t
t
f
f
f
P(A | Q, M)
M
Q
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
142
141. Fixed relational skeleton :
– set of objects in each class
– relations between them
Author A1
Paper P1
Author: A1
Review: R1
Review R2
Review R1
Author A2
Relational Skeleton
Paper P2
Author: A1
Review: R2
Paper P3
Author: A2
Review: R2
Primary Keys
Foreign Keys
Review R2
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
143
142. Author A1
Paper P1
Author: A1
Review: R1
Review R2
Review R1
Author A2
PRM defines distribution over instantiations of attributes
PRM w/ Attribute Uncertainty
Paper P2
Author: A1
Review: R2
Paper P3
Author: A2
Review: R2
Good Writer
Smart
Length
Mood
Quality
Accepted
Length
Mood
Review R3
Length
Mood
Quality
Accepted
Quality
Accepted
Good Writer
Smart
144
143. A Portion of the BN
P2.Accepted
P2.Quality r2.Mood
P3.Accepted
P3.Quality
3
.
0
7
.
0
4
.
0
6
.
0
8
.
0
2
.
0
9
.
0
1
.
0
,
,
,
,
,
t
t
f
t
t
f
f
f
P(A | Q, M)
M
Q
Pissy
Low
3
.
0
7
.
0
4
.
0
6
.
0
8
.
0
2
.
0
9
.
0
1
.
0
,
,
,
,
,
t
t
f
t
t
f
f
f
P(A | Q, M)
M
Q
r3.Mood
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
145
144. A Portion of the BN
P2.Accepted
P2.Quality r2.Mood
P3.Accepted
P3.Quality
Pissy
Low
r3.Mood
High 3
.
0
7
.
0
4
.
0
6
.
0
8
.
0
2
.
0
9
.
0
1
.
0
,
,
,
,
,
t
t
f
t
t
f
f
f
P(A | Q, M)
M
Q
Pissy
IFETCEM.E CSEIII SEMNE7012-SNAUNIT4-PPT
146
146. PRM: Aggregate Dependencies
sum, min, max,
avg, mode, count
Length
Mood
Paper
Quality
Accepted
Review
Review R1
Length
Mood
Review R2
Length
Mood
Review R3
Length
Mood
Paper P1
Accepted
Quality
mode
3
.
0
7
.
0
4
.
0
6
.
0
8
.
0
2
.
0
9
.
0
1
.
0
,
,
,
,
,
t
t
f
t
t
f
f
f
P(A | Q, M)
M
Q
150
147. PRM with AU Semantics
))
.
(
|
.
(
)
,
,
|
( ,
.
A
x
parents
A
x
P
P S
x A
x
S
I
Attributes
Objects
probability distribution over completions I:
PRM relational skeleton
+ =
Author
Paper
Review
Author
A1
Paper
P2
Paper
P1
Review
R3
Review
R2
Review
R1
Author
A2
Paper
P3
151