This document discusses graph mining and summarizes the key points from a presentation. The presentation addresses five main problems: (1) what real graphs look like, (2) how graphs evolve over time, (3) how to generate realistic graphs, (4) how to identify influential (‘master-mind’) nodes in a graph, and (5) how to track communities over time. For problem 4, it presents CenterPiece Subgraph, a method that uses random walk with restart to identify central nodes connecting multiple query nodes. It concludes that Kronecker graphs can accurately model the properties of real graphs and how those properties change over time.
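Random walk with restart (RWR), mentioned above for problem 4, can be sketched with plain power iteration. The sketch assumes an undirected graph stored as a dense NumPy adjacency matrix and a restart probability of 0.15 (an assumed value, not one from the slides). `and_scores` is a hypothetical helper: it multiplies the per-query RWR scores so that only nodes close to every query node score highly, which is one simple reading of "central nodes connecting multiple query nodes". The full CenterPiece Subgraph method also extracts a small connecting subgraph, which is not shown here.

```python
import numpy as np


def rwr_scores(adj: np.ndarray, seed: int, restart: float = 0.15,
               tol: float = 1e-10, max_iter: int = 1000) -> np.ndarray:
    """Random walk with restart (RWR) from node `seed`, by power iteration.

    Iterates r <- (1 - c) * W @ r + c * e_seed, where W is the
    column-normalized adjacency matrix and c is the restart probability.
    """
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    # Divide each column by its sum; columns of isolated nodes stay all-zero.
    w = np.divide(adj, col_sums, out=np.zeros((n, n)), where=col_sums > 0)
    e = np.zeros(n)
    e[seed] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1 - restart) * (w @ r) + restart * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


def and_scores(adj: np.ndarray, queries, restart: float = 0.15) -> np.ndarray:
    """Hypothetical AND-style combination: product of per-query RWR scores.

    A node scores high only if it is close to *every* query node.
    """
    scores = np.ones(adj.shape[0])
    for q in queries:
        scores *= rwr_scores(adj, q, restart)
    return scores


# Usage: a star with hub 0 and leaves 1..4; the queries are leaves 1, 2, 3.
star = np.zeros((5, 5))
for leaf in range(1, 5):
    star[0, leaf] = star[leaf, 0] = 1.0
print(int(np.argmax(and_scores(star, [1, 2, 3]))))  # prints 0 (the hub)
```

Power iteration converges here because the restart term shrinks the error by a factor of at most 1 - c per step; for large graphs a sparse matrix would replace the dense one.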
4. CMU SCS
Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs?
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time
MMDS 08
C. Faloutsos
5. Problem#1: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)
6. Graphs - why should we care?
Figure panels: Internet Map [lumeta.com]; Friendship Network [Moody ’01]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]
7. Graphs - why should we care?
• IR: bi-partite graphs (doc-terms)
Figure: bipartite graph linking documents D1 … DN to terms T1 … TM
• web: hyper-text graph
• ... and more:
8. Graphs - why should we care?
• network of companies & board-of-directors members
• ‘viral’ marketing
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and anomaly detection
• ....
9. Problem #1 - network and graph mining
• What does the Internet look like?
• What does the web look like?
• What is ‘normal’/‘abnormal’?
• Which patterns/laws hold?
11. Laws and patterns
• Are real graphs random?
• A: NO!!
– Diameter
– in- and out- degree distributions
– other (surprising) patterns
12. Solution#1
• Power law in the degree distribution [SIGCOMM99]
Figure: internet domains, log(degree) vs. log(rank), with att.com and ibm.com marked; slope -0.82
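The rank exponent on this slide is the slope of a straight-line fit in log-log space. A minimal sketch, assuming ordinary least squares via `np.polyfit` (a simple choice; the slide does not say how -0.82 was fitted):

```python
import numpy as np


def rank_exponent(degrees) -> float:
    """Least-squares slope of log(degree) versus log(rank).

    Degrees are sorted in decreasing order so rank 1 is the highest-degree
    node; zero degrees are dropped because log(0) is undefined.
    """
    d = np.sort(np.asarray(degrees, dtype=float))[::-1]
    d = d[d > 0]
    ranks = np.arange(1, len(d) + 1, dtype=float)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(d), 1)
    return float(slope)


# Usage on synthetic degrees that follow degree ~ rank^(-0.82) exactly
# (made-up data, not the Internet measurements from the slide).
synthetic = 1000.0 * np.arange(1, 501, dtype=float) ** -0.82
print(round(rank_exponent(synthetic), 2))  # -0.82
```

The synthetic degrees follow the power law exactly; real degree sequences are noisy, especially in the tail, so the fitted slope depends on which ranks are included.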
13. Solution#1’: Eigen Exponent E
• A2: power law in the eigenvalues of the adjacency matrix
Figure: eigenvalue vs. rank of decreasing eigenvalue, May 2001; exponent = slope, E = -0.48
14. Solution#1’: Eigen Exponent E
Figure: eigenvalue vs. rank of decreasing eigenvalue, May 2001; exponent = slope, E = -0.48
• [Papadimitriou, Mihail, ’02]: slope is ½ of rank exponent
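The eigen exponent is computed the same way, but from the largest eigenvalues of the adjacency matrix. The sketch below assumes an undirected graph (a symmetric matrix, so `np.linalg.eigvalsh` applies) and uses a hypothetical union of stars to illustrate the ½ relationship: a star with m leaves has largest eigenvalue exactly √m, so when the top eigenvalues track the square roots of the top degrees, the eigenvalue slope is half the degree-rank slope.

```python
import numpy as np


def eigen_exponent(adj: np.ndarray, k: int = 20) -> float:
    """Slope of log(eigenvalue) vs log(rank) for the k largest eigenvalues.

    Assumes a symmetric (undirected) adjacency matrix; values that are zero
    or negative up to round-off are dropped before taking logs.
    """
    eig = np.linalg.eigvalsh(adj)   # real eigenvalues, ascending order
    top = eig[::-1][:k]             # largest first
    top = top[top > 1e-9]
    ranks = np.arange(1, len(top) + 1, dtype=float)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(top), 1)
    return float(slope)


def star_union(leaf_counts) -> np.ndarray:
    """Hypothetical test graph: disjoint stars, star i has leaf_counts[i] leaves."""
    n = len(leaf_counts) + sum(leaf_counts)
    adj = np.zeros((n, n))
    nxt = len(leaf_counts)          # hubs are nodes 0 .. len(leaf_counts) - 1
    for hub, m in enumerate(leaf_counts):
        for leaf in range(nxt, nxt + m):
            adj[hub, leaf] = adj[leaf, hub] = 1.0
        nxt += m
    return adj


# Hub degrees 144, 36, 16, 9 follow degree ~ rank^(-2); the top eigenvalues
# are their square roots 12, 6, 4, 3, which follow rank^(-1).
adj = star_union([144, 36, 16, 9])
print(round(eigen_exponent(adj, k=4), 6))  # -1.0
```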
16. The Peer-to-Peer Topology [Jovanovic+]
• Count versus degree
• Number of adjacent peers follows a power-law
17. More power laws:
• citation counts (citeseer.nj.nec.com 6/2001)
Figure: log(count) vs. log(#citations), with Ullman marked
18. More power laws:
• web hit counts [w/ A. Montgomery]
Figure: Web Site Traffic, log(count) vs. log(in-degree), with a Zipf fit; labels: ‘ebay’, users, sites
21. Problem#2: Time evolution
• with Jure Leskovec (CMU/MLD)
• and Jon Kleinberg (Cornell – sabb. @ CMU)
23. Evolution of the Diameter
• Prior work on Power Law graphs hints at slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)
• What is happening in real data?
• Diameter shrinks over time
24. Diameter – ArXiv citation graph
• Citations among physics papers
• 1992 – 2003
• One graph per year
Figure: diameter vs. time [years]
25. Diameter – “Autonomous Systems”
• Graph of Internet
• One graph per day
• 1997 – 2000
Figure: diameter vs. number of nodes
26. Diameter – “Affiliation Network”
• Graph of collaborations in physics – authors linked to papers
• 10 years of data
Figure: diameter vs. time [years]
27. CMU SCS
Diameter – “Patents”
• Patent citation network
• 25 years of data
[Figure: diameter vs. time [years]]
Temporal Evolution of the Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that N(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1)? Is it 2 * E(t)?
• A: over-doubled!
  – But obeying the ``Densification Power Law’’: E(t) ∝ N(t)^a, with densification exponent 1 < a < 2
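The densification exponent can be estimated from a sequence of snapshots by a least-squares fit of log E(t) against log N(t); the slope is a. This is a minimal plain-Python sketch, and the function name is hypothetical.

```python
import math


def densification_exponent(nodes, edges):
    """Least-squares slope a in log E(t) = a * log N(t) + c over the snapshots."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

With a = 1.66 (the patent-citation value), doubling N(t) multiplies E(t) by 2^1.66 ≈ 3.16, which is why the number of edges "over-doubles".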
Densification – Patent Citations
• Citations among patents granted
• 1999
  – 2.9 million nodes
  – 16.5 million edges
• Each year is a datapoint
[Figure: E(t) vs. N(t), log-log; slope 1.66]
Densification – Autonomous Systems
• Graph of Internet
• 2000
  – 6,000 nodes
  – 26,000 edges
• One graph per day
[Figure: E(t) vs. N(t), log-log; slope 1.18]
Problem#3: Generation
• Given a growing graph with count of nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
Problem Definition
• Given a growing graph with count of nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
– Static Patterns
Power Law Degree Distribution
Power Law eigenvalue and eigenvector distribution
Small Diameter
– Dynamic Patterns
Growth Power Law
Shrinking/Stabilizing Diameters
• Idea: Self-similarity
– Leads to power laws
– Communities within communities
–…
Kronecker Product – a Graph
[Figure: adjacency matrix; intermediate stage; resulting adjacency matrix]
Kronecker Product – a Graph
• Continuing to multiply with G1, we obtain G4 and so on …
[Figure: G4 adjacency matrix]
Properties:
• We can PROVE that
  – Degree distribution is multinomial ~ power law
  – Diameter: constant
  – Eigenvalue distribution: multinomial
  – First eigenvector: multinomial
• See [Leskovec+, PKDD’05] for proofs
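A small numerical check of why these properties are tractable, assuming a symmetric 0/1 initiator G1 (the 3-node chain with self-loops used in the test is a hypothetical choice): both the degrees and the eigenvalues of a Kronecker power are products of those of G1, so they follow multinomial patterns. This illustrates the claims; it is not the proofs referenced above, and the function name is hypothetical.

```python
import numpy as np
from collections import Counter


def kronecker_power_with_predictions(g1, k):
    """Return G_k = G1 ⊗ ... ⊗ G1 (k factors) together with the degrees and
    eigenvalues of G_k predicted from G1 alone."""
    g1 = np.asarray(g1, dtype=float)
    deg1 = g1.sum(axis=1)             # degrees of G1 (self-loops counted once)
    eig1 = np.linalg.eigvalsh(g1)     # eigenvalues of the symmetric G1
    gk, deg_k, eig_k = g1, deg1, eig1
    for _ in range(k - 1):
        gk = np.kron(gk, g1)
        deg_k = np.kron(deg_k, deg1)              # degrees multiply
        eig_k = np.outer(eig_k, eig1).ravel()     # eigenvalues multiply
    return gk, deg_k, eig_k
```

For G1 = [[1,1,0],[1,1,1],[0,1,1]] (degrees 2, 3, 2), G4 has 81 nodes whose degrees take only five values: 16 nodes of degree 16, 32 of degree 24, 24 of degree 36, 8 of degree 54 and 1 of degree 81, i.e. binomial multiplicities.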
Problem Definition
• Given a growing graph with nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
– Static Patterns
Power Law Degree Distribution
Power Law eigenvalue and eigenvector distribution
Small Diameter
– Dynamic Patterns
Growth Power Law
Shrinking/Stabilizing Diameters
• First and only generator for which we can prove all these properties
Stochastic Kronecker Graphs
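• Create N1×N1 probability matrix P1
• Compute the kth Kronecker power Pk
• For each entry puv of Pk include an edge (u,v) with probability puv
[Figure: P1 → Kronecker multiplication → probability matrix P2 → flip biased coins → instance matrix G2]
$$
P_1 = \begin{pmatrix} 0.4 & 0.2 \\ 0.1 & 0.3 \end{pmatrix}, \qquad
P_2 = P_1 \otimes P_1 = \begin{pmatrix}
0.16 & 0.08 & 0.08 & 0.04 \\
0.04 & 0.12 & 0.02 & 0.06 \\
0.04 & 0.02 & 0.12 & 0.06 \\
0.01 & 0.03 & 0.03 & 0.09
\end{pmatrix}
$$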
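The three steps above can be sketched with NumPy, assuming a dense probability matrix and one independent biased coin per entry. This naive version flips O(N²) coins for N = N1^k nodes, so it only suits small k; the function name is hypothetical.

```python
import numpy as np


def stochastic_kronecker(p1, k, seed=None):
    """Return (Pk, adjacency) for a stochastic Kronecker graph.
    Pk is the k-th Kronecker power of the N1 x N1 probability matrix p1;
    entry (u, v) becomes an edge with probability Pk[u, v]."""
    p1 = np.asarray(p1, dtype=float)
    pk = p1
    for _ in range(k - 1):
        pk = np.kron(pk, p1)
    rng = np.random.default_rng(seed)
    adj = (rng.random(pk.shape) < pk).astype(int)   # one biased coin per entry
    return pk, adj
```

With P1 = [[0.4, 0.2], [0.1, 0.3]] and k = 2, the returned probability matrix is the P2 shown on the slide; the sampled adjacency matrix differs from seed to seed.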
Experiments
• How well can we match real graphs?
– Arxiv: physics citations:
• 30,000 papers, 350,000 citations
• 10 years of data
– U.S. Patent citation network
• 4 million patents, 16 million citations
• 37 years of data
– Autonomous systems – graph of internet
• Single snapshot from January 2002
• 6,400 nodes, 26,000 edges
• We show both static and temporal patterns
(Q: how to fit the parameters?)
A:
• Stochastic version of Kronecker graphs +
• Max likelihood +
• Metropolis sampling
• [Leskovec+, ICML’07]
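A compressed sketch of two ingredients named here, under simplifying assumptions: the log-likelihood of an observed adjacency matrix under a stochastic Kronecker probability matrix Pk for a given node correspondence (permutation), and a Metropolis sampler over permutations that proposes swapping two nodes. The full fitting procedure in [Leskovec+, ICML’07] also updates the initiator parameters by gradient ascent and uses faster likelihood approximations; neither is shown here, and the function names are hypothetical.

```python
import math
import random

import numpy as np


def log_likelihood(adj, pk, perm):
    """log P(G | Pk, perm), where node u of G maps to row/column perm[u] of Pk.
    Edges contribute log p_uv and non-edges log(1 - p_uv)."""
    p = pk[np.ix_(perm, perm)]
    a = np.asarray(adj, dtype=bool)
    return float(np.log(p[a]).sum() + np.log1p(-p[~a]).sum())


def metropolis_permutations(adj, pk, steps, seed=0):
    """Metropolis sampling over node correspondences using random swap proposals."""
    rng = random.Random(seed)
    n = len(adj)
    perm = list(range(n))
    current = log_likelihood(adj, pk, perm)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]              # propose a swap
        proposed = log_likelihood(adj, pk, perm)
        # Accept with probability min(1, exp(proposed - current)).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            current = proposed
        else:
            perm[i], perm[j] = perm[j], perm[i]          # reject: undo the swap
    return perm, current
```

Swapping two nodes keeps every state a valid permutation, and the acceptance rule favors correspondences under which the observed edges are more probable.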
Experiments on real AS graph
[Figure panels: degree distribution; adjacency matrix eigenvalues; hop plot; network value]
Conclusions
• Kronecker graphs have:
– All the static properties
Heavy tailed degree distributions
Small diameter
Multinomial eigenvalues and eigenvectors
– All the temporal properties
Densification Power Law
Shrinking/Stabilizing Diameters
– We can formally prove these results
Center-Piece Subgraph (CePS)
• Given Q query nodes
• Find the center-piece subgraph (≤ b nodes)
• Applications:
  – Social Networks
  – Law Enforcement, …
• Idea:
  – Proximity -> random walk with restarts
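A minimal sketch of this idea, assuming an undirected graph given as adjacency lists: compute a random walk with restart (RWR) score vector from each query node by iterating r = c·W·r + (1 − c)·e_q, where W is the column-normalized adjacency matrix and 1 − c is the restart probability; then score every node by the product of its per-query scores (an AND-style combination) and keep the top b non-query nodes. Picking the top-b nodes is a simplification, since CePS additionally extracts a connected subgraph; the names and the value c = 0.9 are assumptions.

```python
import math


def rwr(adj, seed, c=0.9, iters=100):
    """Random walk with restart from `seed`; iterates r = c*W*r + (1-c)*e_seed,
    where W is the column-normalized adjacency matrix of an undirected graph."""
    n = len(adj)
    r = [0.0] * n
    r[seed] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        for u in range(n):
            if adj[u]:                      # spread c * r[u] evenly over neighbors
                share = c * r[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        nxt[seed] += 1.0 - c                # restart mass returns to the query node
        r = nxt
    return r


def ceps_and(adj, queries, budget, c=0.9):
    """Top `budget` non-query nodes by the product of their per-query RWR scores."""
    per_query = [rwr(adj, q, c) for q in queries]
    score = [math.prod(r[j] for r in per_query) for j in range(len(adj))]
    candidates = [j for j in range(len(adj)) if j not in queries]
    return sorted(candidates, key=lambda j: score[j], reverse=True)[:budget]
```

On the small graph 3–0–1–2–4 with queries 0 and 2, node 1 (the only node close to both queries) ranks first, while the leaves 3 and 4 are each close to only one query.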
Case Study: AND query
[Figure: AND query on R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan]
Tensors for time evolving graphs
• [Jimeng Sun+, KDD’06]
• [Jimeng Sun+, SDM’07]
• [CF, Kolda, Sun, SDM’07 tutorial]
Social network analysis
• Static: find community structures
• Dynamic: monitor community structure evolution; spot abnormal individuals; abnormal time-stamps
[Figure: author × keyword matrices over time, 1990 to 2004, with DB and DM communities]
Application 1: Multiway latent semantic indexing (LSI)
[Figure: author × keyword × time tensor (1990 to 2004) with projection matrices Uauthors and Ukeyword; DB and DM clusters, e.g. Michael Stonebraker, Philip Yu; pattern, query]
• Projection matrices specify the clusters
• Core tensors give cluster activation level
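One way to obtain projection matrices and a core tensor of this kind is a truncated higher-order SVD (a Tucker decomposition) of the author × keyword × time tensor, sketched below with NumPy: each projection matrix holds the top singular vectors of one mode's unfolding, and the core tensor is the data projected onto them. This is a batch stand-in for the incremental methods cited (DTA/STA/WTA), not those methods themselves; the function names are hypothetical.

```python
import numpy as np


def unfold(X, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by the other modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def hosvd(X, ranks):
    """Truncated HOSVD of a 3-way tensor: projection matrices plus core tensor."""
    Us = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        Us.append(U[:, :r])                  # top-r left singular vectors of this mode
    core = np.einsum("ijk,ia,jb,kc->abc", X, *Us)    # X x1 U1^T x2 U2^T x3 U3^T
    return core, Us


def reconstruct(core, Us):
    """Map the core tensor back to the original (author, keyword, time) space."""
    return np.einsum("abc,ia,jb,kc->ijk", core, *Us)
```

Columns of `Us[0]` would group authors and columns of `Us[1]` keywords; entries of the core tensor weight how strongly each author group, keyword group and temporal pattern co-occur.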
Bibliographic data (DBLP)
• Papers from VLDB and KDD conferences
• Construct 2nd order tensors with yearly
windows with <author, keywords>
• Each tensor: 4584×3741
• 11 timestamps (years)
Multiway LSI
Authors | Keywords | Year | Group
michael carey, michael stonebraker, h. jagadish, hector garcia-molina | queri, parallel, optimization, concurr, objectorient | 1995 | DB
surajit chaudhuri, mitch cherniack, michael stonebraker, ugur çetintemel | distribut, systems, view, storage, servic, process, cache | 2004 | DB
jiawei han, jian pei, philip s. yu, jianyong wang, charu c. aggarwal | streams, pattern, support, cluster, index, gener, queri | 2004 | DM
• Two groups are correctly identified: Databases and Data mining
• People and concepts are drifting over time
Network forensics
• Directional network flows
• A large ISP with 100 POPs, each POP with 10Gbps link capacity [Hotnets2004]
  – 450 GB/hour with compression
• Task: identify abnormal traffic patterns and find out their cause
[Figure: source × destination traffic matrices, normal traffic vs. abnormal traffic]
(with Prof. Hui Zhang and Dr. Yinglian Xie)
Conclusions
Tensor-based methods (WTA/DTA/STA):
• spot patterns and anomalies on time-evolving graphs, and
• on streams (monitoring)
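A simplified sketch of subspace-based anomaly spotting on source × destination traffic matrices, assuming earlier time steps are mostly normal: learn source and destination subspaces from the history by SVD, then flag a new matrix whose relative reconstruction error after projection is large. It illustrates the idea behind tensor-based monitoring but is not WTA, DTA or STA; the names, the rank and the 0.3 threshold are assumptions.

```python
import numpy as np


def fit_subspaces(history, rank):
    """Learn source and destination subspaces from past (source x destination) matrices."""
    by_source = np.hstack(history)        # sources x (T * destinations)
    by_dest = np.vstack(history)          # (T * sources) x destinations
    U = np.linalg.svd(by_source, full_matrices=False)[0][:, :rank]
    V = np.linalg.svd(by_dest, full_matrices=False)[2][:rank].T
    return U, V


def relative_residual(X, U, V):
    """Share of X (Frobenius norm) not explained by the learned subspaces."""
    approx = U @ (U.T @ X @ V) @ V.T
    return float(np.linalg.norm(X - approx) / max(np.linalg.norm(X), 1e-12))


def flag_anomalies(history, new_matrices, rank=1, threshold=0.3):
    """Indices of new matrices whose relative residual exceeds the threshold."""
    U, V = fit_subspaces(history, rank)
    return [t for t, X in enumerate(new_matrices)
            if relative_residual(X, U, V) > threshold]
```

A sudden flow between one source and one destination that does not fit the usual pattern leaves most of its energy in the residual, so that time step is flagged.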
Virus propagation
• How do viruses/rumors propagate?
• Blog influence?
• Will a flu-like virus linger, or will it become extinct soon?
The model: SIS
• ‘Flu’ like: Susceptible-Infected-Susceptible
• Virus ‘strength’ s = β/δ
[Figure: node N with neighbors N1, N2, N3; an infected neighbor infects a healthy node with Prob. β per link; an infected node becomes healthy again with Prob. δ]
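A discrete-time simulation of the SIS model, assuming synchronous steps in which each infected node independently recovers with probability δ and independently infects each healthy neighbor with probability β per step. The function names are hypothetical; this is a sketch for experimentation, not the formal model of the cited work.

```python
import random


def sis_step(adj, infected, beta, delta, rng):
    """One synchronous SIS step; returns the new set of infected nodes."""
    nxt = set()
    for u in infected:
        if rng.random() >= delta:          # u stays infected with prob. 1 - delta
            nxt.add(u)
    for u in infected:
        for v in adj[u]:
            if v not in infected and rng.random() < beta:
                nxt.add(v)                 # healthy neighbor infected with prob. beta
    return nxt


def simulate_sis(adj, initial, beta, delta, steps, seed=None):
    """Number of infected nodes after each step, starting from `initial`."""
    rng = random.Random(seed)
    infected = set(initial)
    history = [len(infected)]
    for _ in range(steps):
        infected = sis_step(adj, infected, beta, delta, rng)
        history.append(len(infected))
    return history
```

Running it with different β/δ on the same graph is a quick way to see the virus either die out or linger.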
Epidemic threshold τ
of a graph: the value of τ such that, if the strength s = β/δ < τ, an epidemic cannot happen
Thus,
• given a graph
• compute its epidemic threshold
Epidemic threshold τ
What should τ depend on?
• avg. degree? and/or highest degree?
• and/or variance of degree?
• and/or third moment of degree?
• and/or diameter?
Epidemic threshold
• [Theorem] We have no epidemic if
  $\beta/\delta < \tau = 1/\lambda_{1,A}$
  where β is the attack prob., δ the recovery prob., τ the epidemic threshold, and $\lambda_{1,A}$ the largest eigenvalue of the adj. matrix A
Proof: [Wang+03]
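Following the theorem, the threshold of an undirected graph can be computed from the largest eigenvalue of its symmetric adjacency matrix. A minimal NumPy sketch with hypothetical names; it uses a dense eigen-solver, so large sparse graphs would call for power iteration or a sparse iterative solver instead.

```python
import numpy as np


def epidemic_threshold(adj):
    """tau = 1 / lambda_1, with lambda_1 the largest eigenvalue of a symmetric adjacency matrix."""
    lam1 = np.linalg.eigvalsh(np.asarray(adj, dtype=float))[-1]   # ascending order
    return 1.0 / lam1


def below_threshold(beta, delta, adj):
    """True if the virus strength s = beta / delta is below the threshold."""
    return beta / delta < epidemic_threshold(adj)
```

For a star with 4 leaves, λ1 = 2 and τ = 0.5: a virus with β = 0.1, δ = 0.3 (s ≈ 0.33) is guaranteed to die out, while β = 0.2, δ = 0.3 (s ≈ 0.67) is not covered by the theorem's guarantee.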
Blog analysis
• with Mary McGlohon (CMU)
• Jure Leskovec (CMU)
• Natalie Glance (now at Google)
• Mat Hurst (now at MSR)
• [SDM’07]
Cascades on the Blogosphere
[Figure: blogs B1–B4 with posts a–e; blogosphere = blogs + posts; blog network = links among blogs; post network = links among posts]
Q1: popularity-decay of a post?
Q2: degree distributions?
Q1: popularity over time
[Figure: # in links vs. days after post, log-log; slope -1.6]
Post popularity drops off – exponentially?
POWER LAW!
Exponent? -1.6 (close to -1.5: Barabási’s stack model)
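To check "exponential or power law?" on a decay curve, one can fit both models by least squares on log(in-links): a straight line against log(days) for a power law and against days for an exponential, then compare residuals. A minimal sketch with hypothetical names, assuming strictly positive counts.

```python
import math


def _fit(xs, ys):
    """Least-squares line y = slope * x + b; returns (slope, sum of squared errors)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    b = my - slope * mx
    return slope, sum((y - (slope * x + b)) ** 2 for x, y in zip(xs, ys))


def compare_decay(days, inlinks):
    """Fit log(in-links) against log(days) (power law) and against days (exponential)."""
    logy = [math.log(y) for y in inlinks]
    exponent, power_sse = _fit([math.log(d) for d in days], logy)
    rate, exp_sse = _fit(list(days), logy)
    return {"power_exponent": exponent, "power_sse": power_sse,
            "exp_rate": rate, "exp_sse": exp_sse}
```

The model with the smaller residual is the better description; on a power-law decay the log-log fit wins and its slope is the exponent.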
Q2: degree distribution
44,356 nodes, 122,153 edges. Half of the blogs belong to the largest connected component.
[Figure: count vs. blog in-degree, log-log]
• in-degree slope: -1.7
• out-degree slope: -3
• ‘rich get richer’
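‘Rich get richer’ is commonly modeled by preferential attachment. A minimal sketch, assuming each new blog links to exactly one earlier blog chosen with probability proportional to (in-degree + 1); it produces a heavy-tailed in-degree distribution but makes no attempt to reproduce the -1.7 and -3 slopes above, and the names are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n, seed=0):
    """Each new node links to one earlier node, chosen with probability
    proportional to (in-degree + 1)."""
    rng = random.Random(seed)
    targets = [0]                # node 0 appears once for its "+1" term
    edges = []
    for new in range(1, n):
        old = rng.choice(targets)
        edges.append((new, old))
        targets.append(old)      # old node gained an in-link: one more entry
        targets.append(new)      # "+1" term for the newcomer
    return edges


def in_degrees(edges):
    """In-degree of every node that received at least one link."""
    return Counter(old for _, old in edges)
```

Keeping one list entry per unit of (in-degree + 1) makes `rng.choice` sample proportionally without recomputing weights at every step.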
Next steps:
• edges with categorical attributes and/or timestamps
• nodes with attributes
• scalability (hadoop – PetaByte scale)
– first eigenvalue; diameter [done]
– rest eigenvalues; community detection [to be done]
– modularity, anomalies, etc.
• visualization (-> summarization)
E.g.: self-* system @ CMU
• >200 nodes
• target: 1 PetaByte
Scalability
• Google: > 450,000 processors in clusters of ~2000 processors each
  Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture”, IEEE Micro 2003
• Yahoo: 5 PB of data [Fayyad, KDD’07]
• Problem: machine failures, on a daily basis
• How to parallelize data mining tasks, then?
• A: map/reduce – hadoop (open-source clone) http://hadoop.apache.org/
2’ intro to hadoop
• master-slave architecture; n-way replication (default n=3)
• ‘group by’ of SQL (in parallel, fault-tolerant way)
• e.g., find a histogram of word frequencies
  – slaves compute local histograms
  – master merges into global histogram
select course_id, count(*)
from ENROLLMENT
group by course_id
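The local-histogram/merge pattern on this slide can be sketched in plain Python: each ‘slave’ builds a word histogram for its chunk (the map side) and the ‘master’ sums them (the reduce side). This only mimics the data flow and does not use Hadoop's API; in Hadoop itself the per-chunk aggregation would typically be done by a combiner, followed by a reduce over keys.

```python
from collections import Counter
from functools import reduce


def map_phase(chunk):
    """'Slave' side: local word histogram for one chunk of text lines."""
    local = Counter()
    for line in chunk:
        local.update(line.split())
    return local


def reduce_phase(local_histograms):
    """'Master' side: merge the local histograms into a global one."""
    return reduce(lambda total, part: total + part, local_histograms, Counter())


def word_histogram(chunks):
    """Global word-frequency histogram over all chunks."""
    return reduce_phase(map(map_phase, chunks))
```

Grouping by word and counting is exactly the ‘group by … count(*)’ query above, with the grouping split across machines.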
OVERALL CONCLUSIONS
• Graphs: Self-similarity and power laws work when textbook methods fail!
• New patterns (shrinking diameter!)
• New generator: Kronecker
• SVD / tensors / RWR: valuable tools
• hadoop/mapReduce for scalability
References
• Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast Random Walk with Restart and Its Applications. ICDM 2006, Hong Kong.
• Hanghang Tong and Christos Faloutsos. Center-Piece Subgraphs: Problem Definition and Fast Solutions. KDD 2006, Philadelphia, PA.
• Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over Time: Densification Laws, Shrinking Diame… KDD 2005, Chicago, IL. ("Best Research Paper" award).
• Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, and Christos Faloutsos. Realistic, Mathematically Tractable Graph Generation. ECML/PKDD 2005, Porto, Portugal, 2005.
• Jure Leskovec and Christos Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication. ICML 2007, Corvallis, OR, USA.
• Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, and Christos Faloutsos. NetProbe: A Fast and Scalable System for Fraud Detection in Onl… WWW 2007, Banff, Alberta, Canada, May 8-12, 2007.
• Jimeng Sun, Dacheng Tao, and Christos Faloutsos. Beyond Streams and Graphs: Dynamic Tensor Analysis. KDD 2006, Philadelphia, PA.
• Jimeng Sun, Yinglian Xie, Hui Zhang, and Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs. SDM, Minneapolis, Minnesota, Apr 2007.
Diameter first, DPL second
Check diameter formulas
As the network grows the distances between nodes slowly grow
CHECKMARKS
There is increasing interest in graph mining. In this talk, we introduce center-piece subgraphs. That is, given Q query nodes, we want to find nodes, and the resulting subgraph, that have strong connections to all or most of the query nodes.
Our CePS algorithm takes three parameters as input: the Q query nodes; the budget b, indicating how large the resulting subgraph is; and the k_softAND coefficient, which I will explain later.
Here is an illustrative example: the upper figure is the original graph with three query nodes (the red ones), and the importance of the intermediate nodes is indicated in the bottom figure.
Throughout the talk, we use color to indicate the importance of the nodes: redder means more important. Subsequently, its center-piece subgraph could be as follows.
In a social network, say an authorship network, we can use CePS to discover the query authors' common advisor, or an influential author in the field that the Q query nodes belong to. In law enforcement, the query nodes could be the suspects, and with CePS we can find the master-mind.
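The k_softAND coefficient can be read as asking for nodes where at least k of the Q query-specific random walks are likely to meet. Assuming the per-query RWR scores of a node are treated as independent probabilities, the "at least k of Q" probability follows from a short dynamic program; k = Q reduces to the AND (product) score and k = 1 to OR. This reading is an assumption about the notes, and the function name is hypothetical.

```python
def k_soft_and(probs, k):
    """Probability that at least k of the independent events occur, given their
    individual probabilities (e.g. one RWR score per query node)."""
    dist = [1.0]                          # dist[m] = P(exactly m events so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for m, q in enumerate(dist):
            nxt[m] += q * (1 - p)         # this event does not occur
            nxt[m + 1] += q * p           # this event occurs
        dist = nxt
    return sum(dist[k:])
```

For scores 0.5, 0.2 and 0.1, the AND score (k = 3) is 0.01, the 2-of-3 score is 0.15 and the OR score (k = 1) is 0.64.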
Since Christos is here, I will not make any comments on this face.