Shunya Kimura presents on mining the social graph using Neo4j. The presentation discusses past work using relational databases and the challenges of maintenance and scalability. Graph databases like Neo4j are introduced as an alternative that can handle large graphs with millions of nodes and relationships. Examples are provided on loading sample social graph data into an embedded Neo4j instance and performing simple traversals on the graph.
5. Motivation for social graph analysis
Tens of millions of nodes, hundreds of millions of edges.
Graph algorithms have diversified as distributed processing technology has developed.
Challenging.
6. Number of users on mixi
[Figure: number of member IDs on mixi by year, 2007 to 2011; y-axis from 0 to 30,000,000.]
56. GraphDB Neo4j
• True ACID transactions
• High availability
• Scales to billions of nodes and relationships
• High speed querying through traversals
             Single instance (GPLv3)    Multiple instance (AGPLv3)
Embedded     EmbeddedGraphDatabase      HighlyAvailableGraphDatabase
Standalone   Neo4j Server               Neo4j Server high availability mode
http://neo4j.org/
62. My other favorite Neo4j features
• RESTful APIs
• Query language (Cypher)
• Full indexing
– Lucene
• Built-in graph algorithms
– A*, Dijkstra
– High-speed traversal
• Gremlin support
– Works like a query language
http://www.tinkerpop.com/post/4633229547/tinkerpop-graph-stack
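To make the Dijkstra bullet concrete, here is a minimal plain-Java sketch of the algorithm on a tiny weighted graph. It is not Neo4j's implementation and does not use the Neo4j API; the class name DijkstraSketch, the helper types, and the sample graph are all assumptions made for illustration. It also assumes non-negative edge weights, which Dijkstra's algorithm requires.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Plain-Java Dijkstra sketch (hypothetical names; not Neo4j's implementation). */
final class DijkstraSketch {

    /** Directed, weighted edge to a target vertex. */
    static final class Edge {
        final String target;
        final double weight;
        Edge(String target, double weight) {
            this.target = target;
            this.weight = weight;
        }
    }

    /** Priority-queue entry: a vertex and its tentative distance from the source. */
    static final class QueueEntry {
        final String vertex;
        final double distance;
        QueueEntry(String vertex, double distance) {
            this.vertex = vertex;
            this.distance = distance;
        }
    }

    /** Returns the shortest distance from source to every reachable vertex. */
    static Map<String, Double> shortestDistances(Map<String, List<Edge>> graph, String source) {
        Map<String, Double> dist = new HashMap<String, Double>();
        PriorityQueue<QueueEntry> queue = new PriorityQueue<QueueEntry>(
                (a, b) -> Double.compare(a.distance, b.distance));
        dist.put(source, 0.0);
        queue.add(new QueueEntry(source, 0.0));
        while (!queue.isEmpty()) {
            QueueEntry current = queue.poll();
            // Skip stale entries that were superseded by a shorter path.
            if (current.distance > dist.get(current.vertex)) {
                continue;
            }
            List<Edge> edges = graph.getOrDefault(current.vertex, Collections.<Edge>emptyList());
            for (Edge edge : edges) {
                double candidate = current.distance + edge.weight;
                Double known = dist.get(edge.target);
                if (known == null || candidate < known) {
                    dist.put(edge.target, candidate);
                    queue.add(new QueueEntry(edge.target, candidate));
                }
            }
        }
        return dist;
    }

    /** Small made-up graph: A->B (1), A->C (4), B->C (2), B->D (5), C->D (1). */
    static Map<String, List<Edge>> sampleGraph() {
        Map<String, List<Edge>> g = new HashMap<String, List<Edge>>();
        g.put("A", Arrays.asList(new Edge("B", 1.0), new Edge("C", 4.0)));
        g.put("B", Arrays.asList(new Edge("C", 2.0), new Edge("D", 5.0)));
        g.put("C", Arrays.asList(new Edge("D", 1.0)));
        return g;
    }

    public static void main(String[] args) {
        System.out.println(shortestDistances(sampleGraph(), "A"));
    }
}
```

In the sample graph the cheapest route from A to D goes through B and C (1 + 2 + 1 = 4) rather than over the direct B to D edge (1 + 5 = 6).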
65-70. Introduction to a simple Neo4j use case
[Diagram, built up over six slides: a grid of deployment options with columns "Single node" and "Multi node" and rows "Embedded" and "Server"; each cell hosts an analysis system.]
71. Introduction to simple
embedded Neo4j
• Insert Vertices & make Relationships
• Single node & Embedded
• Traversal sample
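72. Insert vertices, make relationship
// Imports assume the Neo4j 1.x API used on the slides.
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public final class InputVertex {
    public static void main(final String[] args) {
        // Open (or create) an embedded database stored under /tmp/neo4j.
        GraphDatabaseService graphDb = new EmbeddedGraphDatabase("/tmp/neo4j");
        // All writes happen inside a transaction.
        Transaction tx = graphDb.beginTx();
        try {
            Node firstNode = graphDb.createNode();
            firstNode.setProperty("Name", "Kimura");
            Node secondNode = graphDb.createNode();
            secondNode.setProperty("Name", "Kato");
            // Directed relationship: Kimura -[LIKE]-> Kato.
            firstNode.createRelationshipTo(secondNode,
                    DynamicRelationshipType.withName("LIKE"));
            tx.success(); // mark the transaction for commit
        } finally {
            tx.finish(); // commit if marked successful, otherwise roll back
        }
        graphDb.shutdown();
    }
}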
73. Insert vertices,
make relationship
public final class InputVertex {
public static void main(final String[] args) {
GraphDatabaseService graphDb = new
EmbeddedGraphDatabase("/tmp/neo4j");
Transaction tx = graphDb.beginTx();
try {
Node firstNode = graphDb.createNode();
firstNode.setProperty("Name", "Kimura");
Node secondNode = graphDb.createNode();
secondNode.setProperty("Name", "Kato");
firstNode.createRelationshipTo(secondNode,
DynamicRelationshipType.withName("LIKE"));
tx.success();
} finally {
tx.finish();
}
graphDb.shutdown();
}
}
make relationship
public final class InputVertex {
public static void main(final String[] args) { ID: 1
GraphDatabaseService graphDb = new NAME: kimura
EmbeddedGraphDatabase("/tmp/neo4j");
Transaction tx = graphDb.beginTx();
try {
Node firstNode = graphDb.createNode();
ID: 3
firstNode.setProperty("Name", "Kimura"); Relation: Like
Node secondNode = graphDb.createNode();
secondNode.setProperty("Name", "Kato");
firstNode.createRelationshipTo(secondNode,
DynamicRelationshipType.withName("LIKE"));
tx.success();
} finally { ID: 2
tx.finish(); NAME: Kato
}
graphDb.shutdown();
}
}
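79. Batch Insert
• Not thread-safe, not transactional
• But very fast!
import java.util.HashMap;
import java.util.Map;
import org.neo4j.graphdb.DynamicRelationshipType;
// Package as in Neo4j 1.x releases of the period; later versions moved it.
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public final class Batch {
    public static void main(final String[] args) {
        BatchInserter inserter = new BatchInserterImpl("/tmp/neo4j",
                BatchInserterImpl.loadProperties("/tmp/neo4j.props"));
        // The same map is reused for both nodes; its values are overwritten in between.
        Map<String, Object> prop = new HashMap<String, Object>();
        prop.put("Name", "Kimura");
        prop.put("Age", 21);
        long node1 = inserter.createNode(prop);
        prop.put("Name", "Kato");
        prop.put("Age", 21);
        long node2 = inserter.createNode(prop);
        // null means the relationship has no properties.
        inserter.createRelationship(node1, node2,
                DynamicRelationshipType.withName("LIKE"), null);
        inserter.shutdown();
    }
}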
85. Traversal sample
• You can specify the traversal criteria
public static void main(final String[] args) {
    GraphDatabaseService graphDB = new EmbeddedGraphDatabase(args[0]);
    Node node = graphDB.getNodeById(1);
    Traverser friends = node.traverse(
            // traversal order: DEPTH_FIRST or BREADTH_FIRST
            Order.DEPTH_FIRST,
            // termination condition: END_OF_GRAPH or DEPTH_ONE
            StopEvaluator.END_OF_GRAPH,
            // which nodes to return: ALL_BUT_START_NODE, ALL, or a custom isReturnableNode()
            ReturnableEvaluator.ALL_BUT_START_NODE,
            // relationship type to traverse
            DynamicRelationshipType.withName("LIKE"),
            // edge direction to traverse: OUTGOING, INCOMING, or BOTH
            Direction.OUTGOING);
    for (Node nodeBuf : friends) {
        TraversalPosition currentPosition = friends.currentPosition();
    }
}
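To make the criteria above concrete without depending on Neo4j, here is a minimal plain-Java sketch of a breadth-first walk over outgoing edges. It is an illustration only, not the Neo4j implementation: the class TraversalSketch, the adjacency-map representation, the maxDepth parameter (1 roughly mirrors DEPTH_ONE, Integer.MAX_VALUE roughly mirrors END_OF_GRAPH) and the names Sato and Suzuki are all assumptions introduced here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TraversalSketch {
    /**
     * Breadth-first walk over outgoing edges only.
     * Returns every reached node except the start node,
     * and does not expand nodes that sit at maxDepth.
     */
    static List<String> traverse(Map<String, List<String>> outgoing,
                                 String start, int maxDepth) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        depths.add(0);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int depth = depths.remove();
            if (depth > 0) {
                result.add(current);      // "all but start node"
            }
            if (depth == maxDepth) {
                continue;                 // stop condition reached
            }
            for (String next : outgoing.getOrDefault(current, List.of())) {
                if (visited.add(next)) {  // true only on the first visit
                    queue.add(next);
                    depths.add(depth + 1);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical "LIKE" edges, stored as outgoing adjacency lists.
        Map<String, List<String>> likes = Map.of(
                "Kimura", List.of("Kato", "Sato"),
                "Kato", List.of("Suzuki"));
        System.out.println(traverse(likes, "Kimura", 1)); // prints [Kato, Sato]
    }
}
```

Switching the queue to a stack would give a depth-first order, and swapping the adjacency map for one keyed by target node would correspond to following INCOMING edges.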
101. Experiment
• Store mixi's social graph in Neo4j
• Conditions
• Machine: 24-core CPU, 65 GB memory
• Neo4j: BatchInsert, Community edition, embedded mode
• Data
• Nodes: 15 million; edges: 600 million
• Processing time: 513 min 17 s (about 8.6 h)
102. Network Dataset
• Stanford Large Network Dataset Collection (SNAP)
• SNAP has a wide variety of graph data:
• Social networks
• Communication networks
• Citation networks
• Collaboration networks
• Web graphs
• Product co-purchasing networks
• Internet peer-to-peer networks
• Road networks
• Autonomous systems graphs
• Signed networks
• Wikipedia networks and metadata
• Memetracker and Twitter
http://snap.stanford.edu/data/index.html
115. Centrality
• Centrality
• A measure of the importance of each node
• Closeness centrality
• PageRank
• Degree centrality
• Betweenness centrality
• Eigenvector centrality
• Centralization
123. Degree centrality
• The simplest measure.
• Count the number of edges of each node.
• e.g. the number of friends
[Figure: example graph with each node labeled by its degree; labels shown: 2, 1, 1, 5, 2, 1, 2]
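The counting rule above can be sketched in a few lines of plain Java. This is a minimal illustration under stated assumptions, not code from the talk: the class DegreeCentrality, the edge-list representation (each edge is a two-element array, treated as undirected) and the names Sato and Suzuki are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DegreeCentrality {
    /** Counts, for every node, how many undirected edges touch it. */
    static Map<String, Integer> degrees(List<String[]> edges) {
        Map<String, Integer> degree = new HashMap<>();
        for (String[] edge : edges) {
            // Each edge adds one to the degree of both of its endpoints.
            degree.merge(edge[0], 1, Integer::sum);
            degree.merge(edge[1], 1, Integer::sum);
        }
        return degree;
    }

    public static void main(String[] args) {
        // Hypothetical friendship edges.
        List<String[]> edges = List.of(
                new String[] {"Kimura", "Kato"},
                new String[] {"Kimura", "Sato"},
                new String[] {"Kimura", "Suzuki"});
        // Kimura has degree 3; every other node has degree 1.
        System.out.println(degrees(edges));
    }
}
```

Nodes with no edges never appear in the edge list, so they are absent from the result rather than reported with degree 0.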
124. Degree distribution of mixi
• Random sample of 1,000 users
• Summary of the degree distribution
Min    1st Qu.  Median  Mean   3rd Qu.  Max
1.00   3.00     10.00   25.69  30.00    903.00
131. Clustering coefficient
• Random sample of 1,000 users
• Summary of the clustering coefficient
Min    1st Qu.  Median  Mean    3rd Qu.  Max
0.00   0.00     0.1157  0.2071  0.2667   1.000
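For readers unfamiliar with the measure summarized above, the sketch below computes the local clustering coefficient of one node using the standard definition: with k neighbors and e edges among those neighbors, C = 2e / (k(k-1)), taken here as 0 when k < 2. This is only an illustration; the transcript does not show how the talk computed its values. The class ClusteringCoefficient, the undirected simple-graph assumption (no self-loops, no duplicate edges) and the single-letter node names are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ClusteringCoefficient {
    /** Builds an undirected adjacency map from a list of two-element edges. */
    static Map<String, Set<String>> adjacency(List<String[]> edges) {
        Map<String, Set<String>> adj = new HashMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    /** Fraction of neighbor pairs of the node that are themselves connected. */
    static double local(Map<String, Set<String>> adj, String node) {
        List<String> neighbors = new ArrayList<>(adj.getOrDefault(node, Set.of()));
        int k = neighbors.size();
        if (k < 2) {
            return 0.0; // fewer than two neighbors: no pairs to connect
        }
        int links = 0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                if (adj.get(neighbors.get(i)).contains(neighbors.get(j))) {
                    links++;
                }
            }
        }
        return 2.0 * links / ((double) k * (k - 1));
    }

    public static void main(String[] args) {
        // Triangle a-b-c plus a pendant node d attached to a.
        Map<String, Set<String>> adj = adjacency(List.of(
                new String[] {"a", "b"},
                new String[] {"b", "c"},
                new String[] {"a", "c"},
                new String[] {"a", "d"}));
        System.out.println(local(adj, "b")); // prints 1.0
    }
}
```

In this example, node a has three neighbors of which only one pair (b, c) is linked, so its coefficient is 1/3, while the pendant node d gets 0.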
139. • Visualize my social graph on mixi
• Edge weighting
• Amount of communication (color, thickness)
• Vertex weighting
• Clustering coefficient (color, thickness)
• Visualization tool: Gephi
http://gephi.org/
141. • Motivation for social graph mining
• Overview of graph databases
• Introduction to Neo4j
• Samples of graph analysis with R
• Introduction to the visualization tool Gephi