This document discusses Gephi, a modular software application for exploring and manipulating dynamic network graphs. It describes Gephi's architecture, including its use of the NetBeans Platform and its modular design. The document outlines Gephi's dynamic network visualization and analysis capabilities, such as its Dynamic API, Timeline component, and handling of temporal network events. It also covers how dynamic network data can be imported into Gephi using formats like GEXF and via a streaming API, and provides examples of applications like social network and contact network analysis.
Gephi: dynamic features
1. Gephi and network dynamics: technology and applications
Sébastien Heymann
CNRS - UPMC, Laboratoire d'Informatique de Paris 6
ISCN Dynamic Network Day 2012
24 May 2012
3. Notions of dynamics
Generally, software uses the notion of snapshots: the state of the graph at each moment in time.
Example: Stanford SoNIA (Skye Bender-deMoll and Daniel A. McFarland (2006), "The Art and Science of Dynamic Network Visualization," Journal of Social Structure, 7(2)).
4. Notions of dynamics in Gephi
• No snapshots.
• Instead, a "lifetime" for nodes, edges, and attributes.
5. Temporal intervals
6. Sliding window
[Diagram: a fixed-width window slides along the timeline interval; with each tick (0 to 3) the window advances one step along the time axis (0 to 6).]
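To make the tick/window mechanics concrete, here is a minimal self-contained Java sketch (illustrative names, not Gephi API) that advances a fixed-width window along the timeline interval one tick at a time:

public class SlidingWindowDemo {
    public static void main(String[] args) {
        double timelineStart = 0.0, timelineEnd = 6.0; // the visible timeline interval
        double windowWidth = 3.0;                      // width of the sliding window
        double tick = 1.0;                             // how far the window advances per step

        // Slide the window until its right edge reaches the end of the interval.
        for (double t = timelineStart; t + windowWidth <= timelineEnd; t += tick) {
            System.out.printf("window = [%.1f, %.1f)%n", t, t + windowWidth);
        }
    }
}

With the values above this prints four windows, [0.0, 3.0) through [3.0, 6.0), matching the four ticks in the diagram.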
8. Gephi: modular architecture
Stand-alone application or Java library (the Gephi Toolkit).
9. NetBeans Platform
"The NetBeans Platform is a generic framework for Swing applications. It provides the 'plumbing' that, before, every developer had to write themselves."
10. Gephi: modules
11. Dynamic API
An API dedicated to dynamic network states and events. Browsing a dynamic network uses the Timeline component and defines a "visible interval" (i.e. a sub-graph); this API is responsible for holding and modifying that value (a usage sketch follows this list).
• Retrieve/set the current visible interval
• Get the current time format (date, double, datetime)
• Create a DynamicGraph, a utility class for applying a sliding window to a dynamic graph
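A minimal Java sketch of how this API might be driven from the Gephi Toolkit. The Lookup pattern is standard on the NetBeans Platform, but the exact class and method names below (DynamicController, getVisibleInterval, setVisibleInterval, createDynamicGraph) are assumptions modeled on the 0.8-era Dynamic API and should be checked against the Javadoc:

import org.gephi.dynamic.api.DynamicController;
import org.gephi.dynamic.api.DynamicGraph;
import org.gephi.dynamic.api.DynamicModel;
import org.gephi.graph.api.GraphController;
import org.gephi.graph.api.GraphModel;
import org.openide.util.Lookup;

public class DynamicApiSketch {
    public static void main(String[] args) {
        // Services are resolved through the NetBeans Platform Lookup.
        DynamicController controller = Lookup.getDefault().lookup(DynamicController.class);
        DynamicModel model = controller.getModel();

        // Retrieve the visible interval driven by the Timeline component.
        double low = model.getVisibleInterval().getLow();
        double high = model.getVisibleInterval().getHigh();

        // Narrow the view to the first half of the interval.
        controller.setVisibleInterval(low, low + (high - low) / 2);

        // Wrap the current graph in a DynamicGraph to apply a sliding window
        // (factory method name assumed; check the Dynamic API Javadoc).
        GraphModel graphModel = Lookup.getDefault().lookup(GraphController.class).getModel();
        DynamicGraph dynamicGraph = model.createDynamicGraph(graphModel.getGraph());
    }
}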
12. Dynamic statistics
• Select the size of the sliding window
• Select the progression step
• Metrics per window: # nodes, # edges, degree, clustering coefficient (see the sketch below)
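To illustrate what such a dynamic statistic computes, this self-contained Java sketch (illustrative, not Gephi code) slides a window over timestamped edges and reports the node count, edge count, and average degree of each window:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WindowedStats {
    record TimedEdge(String source, String target, double time) {}

    public static void main(String[] args) {
        List<TimedEdge> edges = List.of(
            new TimedEdge("A", "B", 0.5),
            new TimedEdge("B", "C", 1.5),
            new TimedEdge("A", "C", 2.5),
            new TimedEdge("C", "D", 4.0));

        double windowWidth = 2.0, step = 1.0;
        for (double t = 0.0; t + windowWidth <= 5.0; t += step) {
            double lo = t, hi = t + windowWidth;
            Set<String> nodes = new HashSet<>();
            long edgeCount = 0;
            for (TimedEdge e : edges) {
                if (e.time() >= lo && e.time() < hi) { // edge active in this window
                    nodes.add(e.source());
                    nodes.add(e.target());
                    edgeCount++;
                }
            }
            // For an undirected graph, average degree = 2E / N.
            double avgDegree = nodes.isEmpty() ? 0 : 2.0 * edgeCount / nodes.size();
            System.out.printf("[%.1f, %.1f): %d nodes, %d edges, avg degree %.2f%n",
                              lo, hi, nodes.size(), edgeCount, avgDegree);
        }
    }
}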
13. Timeline
14. Timeline animation
15. Sparklines and intervals of existence for the dynamic attributes
The existence, color, and size of nodes are updated in real time in the visualization.
16. Data import
• Excel spreadsheet with "start" and "end" columns (see the sketch below)
• Database with "start" and "end" columns
• Graph file in GEXF
• Stream of network events through the Graph Streaming API
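For instance, an edge table with lifetimes could look like this sketch (the Source/Target headers follow Gephi's spreadsheet-import convention; the names and dates are illustrative):

Source,Target,Start,End
alice,bob,2012-01-10,2012-03-15
bob,carol,2012-02-01,2012-05-30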
17. GEXF
• GEXF is an XML format (see the sketch below).
• A standard promoted by the Gephi Consortium.
• Specification started in 2007; stable version released in Dec. 2010.
• Covers topology, attributes, hierarchy, phylogeny, and dynamics (open/closed intervals, time periods).
• Extensible via namespaces.
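As an illustration, a minimal dynamic GEXF file might look as follows (assuming the 1.2draft schema; the node and edge lifetimes are made up):

<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph mode="dynamic" defaultedgetype="undirected" timeformat="double">
    <nodes>
      <node id="0" label="Alice" start="0.0" end="4.0"/>
      <node id="1" label="Bob" start="1.0"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1" start="2.0" end="3.0"/>
    </edges>
  </graph>
</gexf>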
18. Stream of events
An HTTP server is provided by the GraphStreaming plugin. Events:
• an: Add node
• cn: Change node
• dn: Delete node
• ae: Add edge
• ce: Change edge
• de: Delete edge
Example: add node A (JSON format)
{"an":{"A":{"label":"Node A","size":2}}}
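Such events can be pushed to the plugin's HTTP server. Here is a minimal Java client sketch; the endpoint URL (http://localhost:8080/workspace0?operation=updateGraph) reflects the plugin's usual defaults but is an assumption to check against your configuration:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StreamEventExample {
    public static void main(String[] args) throws Exception {
        // Assumed default endpoint of the GraphStreaming master server.
        URL url = new URL("http://localhost:8080/workspace0?operation=updateGraph");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        // One event: add node "A" with a label and a size.
        String event = "{\"an\":{\"A\":{\"label\":\"Node A\",\"size\":2}}}";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(event.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}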
20. Applications
• Temporal evolution of the blogosphere
• Contact networks (SocioPatterns.org/datasets)
• Document mining (Quid, Inc.)
• Visualization of Twitter activity (retweets or hashtags, e.g. the Royal Wedding)
• Real-time crawls
• Others, e.g. source code evolution
21. Face-to-face contacts
SocioPatterns.org (Alain Barrat, Ciro Cattuto et al.)
J. Stehlé et al., "High-Resolution Measurements of Face-to-Face Contact Patterns in a Primary School," PLoS ONE 6(8): e23176.
Network of contacts aggregated over the first day.