The document discusses community detection in networks. It defines communities as groups of nodes that are densely connected to each other but sparsely connected to other dense groups. Examples include friend groups in social networks and voting coalitions in legislative networks. The document illustrates community detection with a social network and a voting network, and notes that community assignments depend on context. It also discusses factors such as directedness, edge weights, and resolution, which a given method may or may not incorporate and which can affect the communities that are detected.
2. Definition – Simple Version
• Broadly: a group of nodes that are relatively densely connected to each other but sparsely connected to other dense groups in the network
  - Porter, Onnela, Mucha. Communities in Networks. Notices of the AMS, 2009.
• Examples:
  - Cliques in a high school social network
  - Voting coalitions in Congress
  - Consumer types in a network of co-purchases
Michael J. Bommarito II, Daniel Martin Katz
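As a rough illustration of "densely connected inside, sparsely connected outside", the sketch below compares the internal and external edge density of a candidate group. The toy graph and the function name `density_profile` are hypothetical, and this density comparison is only one of several ways the definition could be made concrete.

```python
def density_profile(edges, group):
    """Return (internal_density, external_density) for a candidate community.

    internal_density: share of possible within-group pairs that are edges.
    external_density: share of possible group-to-outside pairs that are edges.
    """
    group = set(group)
    nodes = {u for edge in edges for u in edge}
    outside = nodes - group

    internal = sum(1 for u, v in edges if u in group and v in group)
    boundary = sum(1 for u, v in edges if (u in group) != (v in group))

    possible_internal = len(group) * (len(group) - 1) // 2
    possible_boundary = len(group) * len(outside)

    internal_density = internal / possible_internal if possible_internal else 0.0
    external_density = boundary / possible_boundary if possible_boundary else 0.0
    return internal_density, external_density


# Hypothetical toy graph: two triangles joined by a single bridge edge.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

inside, outside = density_profile(EDGES, {0, 1, 2})
print(f"internal density: {inside:.2f}, external density: {outside:.2f}")
```

A group whose internal density is much higher than its external density matches the simple definition above.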
3. Example – Social Networks
Imagine this graph...
4. Example – Social Networks
Vertices: People
Edges: Friendship
What factors might affect the formation of friendships in a high school social network?
Ideas: Age, Gender, Class, Race, Interests
How might we assign communities to this network?
5. Example – Social Networks
Vertices: People
Edges: Friendship
[Figure labels: Girls, Boys]
What factors might affect the formation of friendships in a high school social network?
Ideas: Age, Gender, Class, Race, Interests
How might we assign communities to this network?
6. Example – Voting Coalitions
Vertices: People
Edges: Co-voted at least once
Now let's look at the same network as if it represented co-voting in the Senate.
Ideas: Issue position, geography, ethnicity, gender
How might we assign communities to this network?
7. Example – Voting Coalitions
Vertices: People
Edges: Co-voted at least once
[Figure labels: Republicans, Democrats, Independents]
Now let's look at the same network as if it represented co-voting in the Senate.
Ideas: Issue position, geography, ethnicity, gender
How might we assign communities to this network?
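A minimal sketch of how a "co-voted at least once" edge list might be built from roll-call data. The vote records are hypothetical, and the sketch assumes that "co-voted" means two people cast the same vote on the same bill, which the slides do not spell out.

```python
from itertools import combinations

# Hypothetical roll-call data: bill -> {voter: vote}.
VOTES = {
    "bill_1": {"A": "yea", "B": "yea", "C": "nay"},
    "bill_2": {"A": "yea", "B": "nay", "C": "nay"},
    "bill_3": {"A": "nay", "B": "nay", "C": "yea"},
}


def co_vote_counts(votes):
    """Count, for each pair of voters, the bills on which they voted the same way."""
    counts = {}
    for ballots in votes.values():
        for u, v in combinations(sorted(ballots), 2):
            if ballots[u] == ballots[v]:
                counts[(u, v)] = counts.get((u, v), 0) + 1
    return counts


counts = co_vote_counts(VOTES)

# Unweighted graph: an edge exists if the pair co-voted at least once.
edges = sorted(counts)
print(edges)
```

Keeping `counts` instead of only `edges` would preserve how often each pair agreed, at the cost of requiring a method that accepts weighted edges.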
8. Context!
Note that we have assigned community membership differently despite observing the same graph!
Community detection is not a concept that can be divorced from context.
10. Directedness
Many methods do not incorporate direction!
Many methods that do incorporate direction do not allow for bidirected edges.
Different software packages may implement the same method with or without support for directed edges.
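When a method ignores direction, the usual workaround is to collapse the directed graph into an undirected one. The sketch below, using a hypothetical "follows" relation and hypothetical helper names, shows what that step keeps and what it discards.

```python
def to_undirected(directed_edges):
    """Collapse directed edges into undirected pairs, dropping direction and self-loops."""
    return {frozenset((u, v)) for u, v in directed_edges if u != v}


def reciprocated(directed_edges):
    """Return the pairs that are connected in both directions (bidirected edges)."""
    edge_set = set(directed_edges)
    return {
        frozenset((u, v))
        for u, v in edge_set
        if u != v and (v, u) in edge_set
    }


# Hypothetical directed relation: (follower, followed).
FOLLOWS = [("ann", "bob"), ("bob", "ann"), ("cat", "ann")]

undirected = to_undirected(FOLLOWS)
mutual = reciprocated(FOLLOWS)

# After collapsing, the mutual ann-bob tie and the one-way cat->ann tie
# both appear as a single undirected edge and can no longer be told apart.
print(len(undirected), "undirected edges;", len(mutual), "were mutual")
```

Checking for reciprocated pairs before collapsing is one way to see how much information the conversion would lose.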
11. Weights
Unweighted
• Binary relationships
• Data limitations
Weighted
• Relationship strength
• Frequency of relationship
• Flow
12. Weights
Note edge thickness.
Unweighted
• Binary relationships
• Data limitations
Weighted
• Relationship strength
• Frequency of relationship
• Flow
13. Weights
Many methods do not incorporate edge weights!
Methods that do incorporate edge weights may differ in acceptable values!
• Integers or real weights
• Strictly positive weights
Different software packages may implement the same method with or without support for weighted edges.
14. Resolution
Resolution is a concept inherited from optics. According to Wiki, Optical resolution describes the ability of an imaging system to resolve detail in the object that is being imaged.
High resolution (15.1MP)
• Can make out many details!
• But…
• Details may be noise
• Sometimes they don't matter!
Low resolution
• Can't read a word!
• But…
• Can focus on broad regions
• Noise is out of focus
15. Resolution
Same graphs!
High resolution (microscopic)
Low resolution (macroscopic)
16. Resolution
Different hypotheses or questions correspond to different resolutions.
Different methods are more or less effective at detecting community structure at different resolutions.
Modularity-based methods cannot detect structure below a known resolution limit.
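The resolution limit can be illustrated with a ring of small cliques, a standard textbook example that is not taken from these slides. The sketch assumes the usual modularity formula Q = sum_i [ e_i/m - (d_i/2m)^2 ], with e_i the edges inside module i, d_i the total degree of its vertices, and m the total number of edges. The sizes (30 cliques of 5 vertices) are assumed example values.

```python
def modularity_from_modules(modules, m):
    """Q = sum over modules of e_i/m - (d_i / 2m)^2.

    modules: list of (e_i, d_i) pairs; m: total number of edges in the network.
    """
    return sum(e / m - (d / (2 * m)) ** 2 for e, d in modules)


n_cliques, k = 30, 5                      # assumed example sizes
clique_edges = k * (k - 1) // 2           # edges inside one clique
m = n_cliques * clique_edges + n_cliques  # plus one ring edge per clique

# Partition 1: every clique is its own community.
# Two ring edges leave each clique, so d = 2 * clique_edges + 2.
q_single = modularity_from_modules(
    [(clique_edges, 2 * clique_edges + 2)] * n_cliques, m
)

# Partition 2: adjacent cliques merged in pairs.
# Each pair holds two cliques plus the ring edge that joins them.
q_pairs = modularity_from_modules(
    [(2 * clique_edges + 1, 4 * clique_edges + 4)] * (n_cliques // 2), m
)

print(f"one community per clique: Q = {q_single:.4f}")
print(f"cliques merged in pairs:  Q = {q_pairs:.4f}")
```

Merging the cliques in pairs scores higher, even though each clique is the intuitive community. A method that maximizes Q would therefore miss the smaller structure in this graph.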
17. Overlapping Communities
Palla, Derenyi, Farkas, Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 2005.
18. Computational Complexity Refresher
Computational complexity is a serious issue!
Data is becoming more abundant and more detailed.
Many quantitative research projects hinge on the feasibility of calculations.
Understanding computational complexity can allow you to communicate with department IT personnel or computer scientists to solve your problem.
Make sure your project is feasible before committing the time!
19. Computational Complexity Refresher
Computational complexity in the context of modern computing is primarily focused on two resources:
1. Time: How long does it take to perform a sequence of operations?
  • CPU/GPU
  • Exact vs. approximate solutions
2. Storage: How much space does it take to store our problem?
  • Memory and persistent storage (to a lesser degree)
  • Data representations
We tend to communicate time and storage complexity through Big-O notation.
20. Computational Complexity Refresher
In computational complexity, Big-O notation conveys information about how time and storage costs scale with inputs.
• O(1): constant - independent of input
• O(n): scales linearly with the size of input
• O(n^2): scales quadratically with the size of input
• O(n^3): scales cubically with the size of input
These terms often occur with log n terms and are then given the prefix quasi- (for example, O(n log n) is quasilinear).
For graph algorithms, the input n is typically
• |V|, the number of vertices
• |E|, the number of edges
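How storage scales with |V| and |E| can be made concrete by comparing two common graph representations. This is a rough sketch that counts stored entries rather than bytes. The function names and the assumed density of about five edges per vertex are illustrative choices, not figures from the slides.

```python
def adjacency_matrix_cells(num_vertices):
    """Cells stored by a dense adjacency matrix: O(|V|^2)."""
    return num_vertices * num_vertices


def adjacency_list_entries(num_vertices, num_edges):
    """Entries stored by an undirected adjacency list: O(|V| + |E|).

    Each undirected edge appears once in the list of each endpoint.
    """
    return num_vertices + 2 * num_edges


for n in (1_000, 10_000, 100_000):
    e = 5 * n  # assumed sparse graph: about five edges per vertex
    print(f"|V|={n:>7}  matrix cells={adjacency_matrix_cells(n):>14}  "
          f"list entries={adjacency_list_entries(n, e):>9}")
```

A tenfold increase in |V| multiplies the matrix size by one hundred but the list size by only ten, which is why the choice of data representation matters for feasibility.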
21. Taxonomy of Methods
This taxonomy of methods follows the history of their development.
• Divisive Methods
  • Edge-betweenness (2002)
• Modularity Methods
  • Fast-greedy (2004)
  • Leading Eigenvector (2006)
• Dynamic Methods
  • Clique percolation (2005)
  • Walktrap (2005)
22. Edge Betweenness
Publication(s): Girvan, Newman. Community structure in social and biological networks. PNAS, 2002.
Basic Idea: Divide the network into successively smaller pieces by finding and removing edges that bridge communities.
Constraints:
• Can be adapted to directed networks (igraph).
• Can be adapted to weights (no public software).
Time Complexity: O(|V|^3) in general, O(|V|^2 log |V|) for special cases
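One divisive step can be sketched in plain Python. The code below is an illustrative sketch rather than the authors' or igraph's implementation. It computes shortest-path edge betweenness with a Brandes-style accumulation, removes the single highest-scoring edge, and reads off the connected components. The toy graph (two triangles joined by one bridge) is an assumption, and a full run would repeat the step and recompute betweenness after every removal.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness for an unweighted, undirected graph."""
    scores = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # BFS from source, counting shortest paths (sigma) to every vertex.
        dist = {source: 0}
        sigma = {v: 0 for v in adj}
        sigma[source] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Push path counts back from the farthest vertices toward the source.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                scores[frozenset((v, w))] += share
                delta[v] += share
    # Every undirected path was counted once from each of its endpoints.
    return {edge: s / 2.0 for edge, s in scores.items()}


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        result.append(comp)
    return result


# Toy graph (assumed): triangles {0,1,2} and {3,4,5} joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

scores = edge_betweenness(adj)
bridge = max(scores, key=scores.get)   # edge carrying the most shortest paths
u, v = tuple(bridge)
adj[u].discard(v)
adj[v].discard(u)
communities = components(adj)
print(sorted(bridge), [sorted(c) for c in communities])
```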
24. Quick Aside – Zach's Karate Club
Zachary's Karate Club: Social network of friendships between 34 members of a karate club at a US university in the 1970s
Event: During the observation period, the club broke into 2 smaller clubs. This split occurred along a pre-existing social division between the two communities in the network.
Drawn from the Paper: Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 1977.
Download the Data: http://www-personal.umich.edu/~mejn/netdata/
25. Edge Betweenness
Only misclassification
26. Edge Betweenness
Betweenness tends to get the big picture right.
However, resolution can be a problem!
Do not draw conclusions about small communities from this algorithm alone.
27. Modularity
• e_i is the number of edges in module i
• d_i is the total degree of vertices in module i
• m is the total number of edges in the network
Q is the difference between the observed connectivity within modules and its expected value under the configuration model (degree distribution fixed).
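The slide's equation is not preserved in the text. The standard form consistent with the definitions above is Q = sum_i [ e_i/m - (d_i/(2m))^2 ], and the sketch below assumes that form for an undirected, unweighted graph. The function name, the edge-list input and the toy graph are illustrative assumptions.

```python
def modularity(edges, membership):
    """Q = sum_i [ e_i/m - (d_i/(2m))^2 ] for an undirected, unweighted graph.

    edges: list of (u, v) pairs; membership: dict mapping vertex -> module label.
    """
    m = len(edges)
    internal = {}  # e_i: edges with both endpoints in module i
    degree = {}    # d_i: total degree of the vertices in module i
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Toy graph (assumed): two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

q_split = modularity(edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})
q_whole = modularity(edges, {v: "a" for v in range(6)})
print(f"two triangles as modules: Q = {q_split:.4f}")
print(f"everything in one module: Q = {q_whole:.4f}")
```

Putting every vertex in one module gives Q = 0, because the observed and expected internal connectivity then coincide. The two-triangle split scores 5/14.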
28. Modularity
Remember our previous discussion on computational complexity?
Modularity maximization is an NP-hard problem.
This means that no polynomial-time algorithm for finding the exact maximum is known, and none exists unless P = NP!
Practical methods therefore try to solve for approximate solutions.
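Why exhaustive search is hopeless can be seen by counting candidate partitions. The number of ways to split n vertices into communities is the Bell number B(n). The sketch below computes it with the Bell triangle, and the function name is an illustrative choice.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)], where B(n) counts the partitions of n items."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each row of the Bell triangle starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


bells = bell_numbers(34)
for n in (5, 10, 20, 34):
    print(f"{n} vertices: {bells[n]} possible partitions")
```

Even for a network with only 34 vertices, the count is far too large to enumerate, which is why heuristics are used in practice.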
29. Modularity
Benjamin H. Good, Yves-Alexandre de Montjoye & Aaron Clauset, The Performance of Modularity Maximization in Practical Contexts, Phys. Rev. E 81, 046106 (2010)
30. Fast Greedy
Publication(s):
• Newman. Fast algorithm for detecting community structure in networks. Phys. Rev. E, 2004.
• Clauset, Newman, Moore. Finding community structure in very large networks. Phys. Rev. E, 2004.
• Wakita, Tsurumi. Finding Community Structure in Mega-scale Social Networks. 2007.
Basic Idea: Try to greedily assemble larger and larger communities from the ground up. Start by placing each vertex in its own community and then combine the communities that produce the best modularity at that step.
Constraints:
• Can be adapted to directed edges (no public software).
• Can be adapted to weights (igraph).
Time Complexity: O(|E||V| log |V|) worst case
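The bottom-up idea can be sketched naively in plain Python. This is not the optimized algorithm from the cited papers. It recomputes modularity from scratch for every candidate merge, tries every pair of communities rather than only connected ones, and stops at the first step where no merge improves Q. The modularity formula Q = sum_i [ e_i/m - (d_i/(2m))^2 ], the function names and the toy graph are assumptions of the sketch.

```python
def modularity(edges, membership):
    """Q = sum_i [ e_i/m - (d_i/(2m))^2 ] for an undirected, unweighted graph."""
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def greedy_merge(edges):
    """Merge the pair of communities with the best modularity until none improves Q."""
    vertices = sorted({v for edge in edges for v in edge})
    membership = {v: v for v in vertices}  # every vertex starts in its own community
    best_q = modularity(edges, membership)
    while True:
        labels = sorted(set(membership.values()))
        best_pair = None
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                # Trial partition with community b absorbed into community a.
                trial = {v: (a if c == b else c) for v, c in membership.items()}
                q = modularity(edges, trial)
                if q > best_q:
                    best_q, best_pair = q, (a, b)
        if best_pair is None:
            return membership, best_q
        a, b = best_pair
        membership = {v: (a if c == b else c) for v, c in membership.items()}


# Toy graph (assumed): two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
membership, q = greedy_merge(edges)
print(membership, round(q, 4))
```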
31. Fast Greedy
Fast-Greedy also tends to aggressively create larger communities to the detriment of smaller communities.
Why is this node red instead of blue?
32. Leading Eigenvector
Publication(s):
• Newman. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E, 2006.
• Leicht, Newman. Community structure in directed networks. Phys. Rev. Lett., 2008.
Basic Idea: Use the sign of the components of the leading eigenvector of the modularity matrix to sequentially divide the network.
Constraints:
• Can be adapted to directed edges (no public software).
• Can be adapted to weights (igraph).
Time Complexity: O(|V|^2)
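A single bisection can be sketched without any numerical library. The code below assumes Newman's modularity matrix B_ij = A_ij - k_i k_j / 2m and finds its leading eigenvector by power iteration. The diagonal shift, the iteration count and the start vector are implementation choices of this sketch, not details from the slides. A full method would also check that the leading eigenvalue is positive and then repeat the split inside each part.

```python
def leading_eigenvector_split(adj_matrix, iterations=500):
    """Split vertices into two groups by the sign of the leading eigenvector of B."""
    n = len(adj_matrix)
    degrees = [sum(row) for row in adj_matrix]
    two_m = sum(degrees)
    # Modularity matrix: observed adjacency minus the configuration-model expectation.
    B = [[adj_matrix[i][j] - degrees[i] * degrees[j] / two_m for j in range(n)]
         for i in range(n)]
    # Shift by a bound on the eigenvalue magnitudes so that power iteration
    # converges to the most positive eigenvalue rather than the largest in magnitude.
    shift = max(sum(abs(x) for x in row) for row in B)
    x = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        y = [sum(B[i][j] * x[j] for j in range(n)) + shift * x[i] for i in range(n)]
        norm = sum(value * value for value in y) ** 0.5
        x = [value / norm for value in y]
    return [0 if value >= 0 else 1 for value in x]


# Toy graph (assumed): triangles {0,1,2} and {3,4,5} joined by the bridge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = leading_eigenvector_split(A)
print(labels)
```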
33. Leading Eigenvector
Note that the leading eigenvector's results seem to split the difference between edge betweenness and fast-greedy in this case.
Why are these nodes not a part of the larger modules?
34. Walktrap
Publication(s): Pons, Latapy. Computing communities in large networks using random walks. JGAA, 2006.
Basic Idea: Simulate many short random walks on the network and compute pairwise similarity measures based on these walks. Use these similarity values to aggregate vertices into communities.
Constraints:
• Can be adapted to directed edges (igraph).
• Can be adapted to weights (igraph).
• Can alter resolution by walk length (igraph).
Time Complexity: depends on walk length, O(|V|^2 log |V|) typically
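The pairwise similarity step can be sketched as follows. Instead of sampling walks, this sketch propagates the walk probabilities exactly for a fixed number of steps and then compares two vertices with a degree-normalized distance. Both choices, the function names and the toy graph are assumptions of the sketch, and it covers only the distance, not the aggregation into communities.

```python
def walk_probabilities(adj, steps):
    """Row i: probability that a `steps`-step random walk from vertex i ends at each vertex.

    Assumes every vertex has at least one neighbor.
    """
    vertices = sorted(adj)
    index = {v: i for i, v in enumerate(vertices)}
    rows = []
    for start in vertices:
        p = [0.0] * len(vertices)
        p[index[start]] = 1.0
        for _ in range(steps):
            nxt = [0.0] * len(vertices)
            for v in vertices:
                mass = p[index[v]]
                if mass:
                    # The walker moves to a uniformly chosen neighbor.
                    for w in adj[v]:
                        nxt[index[w]] += mass / len(adj[v])
            p = nxt
        rows.append(p)
    return vertices, rows


def walk_distance(adj, steps, a, b):
    """Degree-normalized distance between the walk distributions of a and b."""
    vertices, rows = walk_probabilities(adj, steps)
    ia, ib = vertices.index(a), vertices.index(b)
    return sum((rows[ia][k] - rows[ib][k]) ** 2 / len(adj[v])
               for k, v in enumerate(vertices)) ** 0.5


# Toy graph (assumed): triangles {0,1,2} and {3,4,5} joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

same_side = walk_distance(adj, 3, 0, 1)
across = walk_distance(adj, 3, 0, 5)
print(f"distance(0, 1) = {same_side:.3f}, distance(0, 5) = {across:.3f}")
```

Vertices in the same triangle end up close together and vertices in different triangles far apart, which is the signal the aggregation step relies on. Changing `steps` changes how far the walker can wander, which is how walk length alters resolution.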
35. Walktrap
36. Walktrap
Walktrap assigns vertices to different communities than previous algorithms.
Note that the simulated walk length can be changed to alter resolution.
Furthermore, simulation is stochastic and thus results may change even after fixing the walk length and input graph!
37. Method Comparison
Panels: Edge-Betweenness, Fast-Greedy, Walktrap, Leading Eigenvector
38. Recommended Software - igraph
• Core Library: C
• Interfaces: Python, R, Ruby
• Features: Graph operations & algorithms, random graph generation, graph statistics, community detection, visualization layout, plotting
• URL: http://igraph.sourceforge.net/
• Documentation: http://igraph.sourceforge.net/documentation.html
40. Frontiers of Community Detection: Temporal Network Dynamics
Gergely Palla, Albert-Laszlo Barabasi & Tamas Vicsek, Quantifying Social Group Evolution, Nature 446:7136, 664-667 (2007)
41. Frontiers of Community Detection: Community Structure Over Scales, Time Period, etc.
Science 14 May 2010, Vol. 328. no. 5980, pp. 876 - 878
42. Community Detection Review Articles
Some Useful Review Articles:
Mason A. Porter, Jukka-Pekka Onnela and Peter J. Mucha. 2009. Communities in Networks. Notices of the American Mathematical Society 56: 1082-1166.
Santo Fortunato. 2010. Community detection in graphs. Physics Reports 486: 75-174.
43. A Transition to Our Sink Method Paper
• Now we are going to transition to a specific project, where we apply some of the ideas contained herein
• Provide a very brief introduction to the Exponential Random Graph Models (p*)
44. Our Sink Paper – Physica A
45. Dynamic Acyclic Digraphs
• We are interested in conducting community detection in the special case of dynamic acyclic digraphs …
• Before we transition to the full presentation, some background:
• Dynamic = Changing both Locally and Globally
• Digraph = Directed Graph
• Acyclic = No cycles, because current documents generally cannot cite documents in the future
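Whether a citation network is really acyclic can be checked directly. The sketch below uses Kahn's topological-sort algorithm on a dictionary that maps each document to the documents it cites. The document names are hypothetical and the function name is an illustrative choice.

```python
from collections import deque


def is_acyclic(citations):
    """Kahn's algorithm: True if the directed citation graph has no cycles.

    citations: dict mapping each document to the list of documents it cites.
    """
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    in_degree = {v: 0 for v in nodes}
    for cited in citations.values():
        for v in cited:
            in_degree[v] += 1
    # Start from documents that nothing cites, then peel the graph layer by layer.
    queue = deque(v for v in nodes if in_degree[v] == 0)
    visited = 0
    while queue:
        v = queue.popleft()
        visited += 1
        for w in citations.get(v, ()):
            in_degree[w] -= 1
            if in_degree[w] == 0:
                queue.append(w)
    # Vertices on a cycle never reach in-degree zero and are never visited.
    return visited == len(nodes)


# Hypothetical cases: later documents cite earlier ones only.
citations = {
    "case_2010": ["case_1990", "case_2001"],
    "case_2001": ["case_1990"],
    "case_1990": [],
}
print(is_acyclic(citations))

# Two documents citing each other would break acyclicity.
print(is_acyclic({"a": ["b"], "b": ["a"]}))
```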
46. Dynamic Acyclic Digraphs
Case to Case Judicial Citation Networks are Dynamic Acyclic Digraphs
So are Academic Citation Networks, Patents, etc.