This document discusses trends driving the emergence of NoSQL databases and provides an overview of NoSQL. The key trends include: (1) rapidly increasing data set sizes, (2) greater connectivity of data, and (3) more semi-structured and decentralized content. These trends have challenged the performance and architecture of traditional relational databases. NoSQL databases emerged in response and come in four main categories: key-value stores, BigTable clones, document databases, and graph databases. Each has a different data model suited to different types of use cases. Looking ahead, the best approach may be "polyglot persistence" using both SQL and NoSQL solutions.
Neo4j - The Benefits of Graph Databases (OSCON 2009)Emil Eifrem
This document discusses the benefits of graph databases like Neo4j compared to relational databases. It notes that data is becoming more connected and semi-structured, and graph databases can better represent these types of connections. It provides examples of building a node space and traversing relationships in Neo4j using code snippets. It also discusses scaling Neo4j through replication and partitioning. In summary, graph databases like Neo4j are well-suited for complex, connected data and can provide better performance than relational databases for network and semantic queries.
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)Emil Eifrem
Presentation given at nosql east 2009 in Atlanta. Introduces the NOSQL space by offering a framework for categorization and discusses the benefits of graph databases. Oh, and also includes some tongue-in-cheek party poopers about sucky things in the NOSQL space.
The document discusses the rise of graph databases and their benefits over traditional SQL databases. It notes four trends driving growth in data size, connectivity, semi-structured data, and decoupled architectures that have led to the rise of NoSQL databases including key-value, column-oriented, document, and graph databases. It provides an overview of the graph database model, which represents data as nodes and relationships, and an example using the graph database Neo4j.
This document discusses Grails integration with Neo4j graph databases. It begins with an introduction to graph databases and Neo4j. It then covers the Grails Neo4j plugin which allows using Neo4j as the persistence layer for Grails domain classes. Finally, it addresses some challenges in mapping the Grails domain model to the Neo4j nodespace and potential solutions.
This document discusses trends driving the emergence of NoSQL databases and provides an overview of NoSQL. The key trends include: (1) rapidly increasing data set sizes, (2) greater connectivity of data, and (3) more semi-structured and decentralized content. These trends have challenged the performance and architecture of traditional relational databases. NoSQL databases emerged in response and come in four main categories: key-value stores, BigTable clones, document databases, and graph databases. Each has a different data model suited to different types of use cases. Looking ahead, the best approach may be "polyglot persistence" using both SQL and NoSQL solutions.
Neo4j - The Benefits of Graph Databases (OSCON 2009)Emil Eifrem
This document discusses the benefits of graph databases like Neo4j compared to relational databases. It notes that data is becoming more connected and semi-structured, and graph databases can better represent these types of connections. It provides examples of building a node space and traversing relationships in Neo4j using code snippets. It also discusses scaling Neo4j through replication and partitioning. In summary, graph databases like Neo4j are well-suited for complex, connected data and can provide better performance than relational databases for network and semantic queries.
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)Emil Eifrem
Presentation given at nosql east 2009 in Atlanta. Introduces the NOSQL space by offering a framework for categorization and discusses the benefits of graph databases. Oh, and also includes some tongue-in-cheek party poopers about sucky things in the NOSQL space.
The document discusses the rise of graph databases and their benefits over traditional SQL databases. It notes four trends driving growth in data size, connectivity, semi-structured data, and decoupled architectures that have led to the rise of NoSQL databases including key-value, column-oriented, document, and graph databases. It provides an overview of the graph database model, which represents data as nodes and relationships, and an example using the graph database Neo4j.
This document discusses Grails integration with Neo4j graph databases. It begins with an introduction to graph databases and Neo4j. It then covers the Grails Neo4j plugin which allows using Neo4j as the persistence layer for Grails domain classes. Finally, it addresses some challenges in mapping the Grails domain model to the Neo4j nodespace and potential solutions.
The document provides an outline for a presentation on graph-based data models. It introduces some key concepts about graphs and how they are used to model real-world interconnected data. It discusses how early adopters of graph technologies grew by focusing on data relationships. The document also covers graph data structures, graph databases, and graph query languages like Cypher and Gremlin.
This document provides an overview of graph databases. It discusses how graph data is naturally represented as nodes connected by edges, unlike relational databases which require joins. Graph databases allow for fast traversal of connected data and enable querying connected subgraphs. Popular graph database models include property graphs and RDF triple stores. Neo4j is introduced as a widely used graph database management system that uses labels, properties, relationships, and Cypher query language.
Gremlin is a Turing-complete, graph-based programming language developed for key/value-pair multi-relational graphs called property graphs. Gremlin makes extensive use of XPath 1.0 to support complex graph traversals. Connectors exist to various graph databases and frameworks. This language has application in the areas of graph query, analysis, and manipulation.
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...Gezim Sejdiu
Over the past decade, vast amounts of machine-readable structured information have become available through the automation of research processes as well as the increasing popularity of knowledge graphs and semantic technologies.
A major and yet unsolved challenge that research faces today is to perform scalable analysis of large scale knowledge graphs in order to facilitate applications like link prediction, knowledge base completion, and question answering.
Most machine learning approaches, which scale horizontally (i.e. can be executed in a distributed environment) work on simpler feature vector based input rather than more expressive knowledge structures.
On the other hand, the learning methods which exploit the expressive structures, e.g. Statistical Relational Learning and Inductive Logic Programming approaches, usually do not scale well to very large knowledge bases owing to their working complexity.
This talk gives an overview of the ongoing project Semantic Analytics Stack (SANSA) which aims to bridge this research gap by creating an out of the box library for scalable, in-memory, structured learning.
Performance of graph query languages:
Analysis on theperformance of graph querylanguages: comparative study of Cypher, Gremlin and native access in Neo4j
Over the past decade, vast amounts of machine-readable structured information have become available through the automation of research processes as well as the increasing popularity of knowledge graphs and semantic technologies.
Today, we count more than 10,000 datasets made available online following Semantic Web standards.
A major and yet unsolved challenge that research faces today is to perform scalable analysis of large-scale knowledge graphs in order to facilitate applications in various domains including life sciences, publishing, and the internet of things.
The main objective of this thesis is to lay foundations for efficient algorithms performing analytics, i.e. exploration, quality assessment, and querying over semantic knowledge graphs at a scale that has not been possible before.
First, we propose a novel approach for statistical calculations of large RDF datasets, which scales out to clusters of machines.
In particular, we describe the first distributed in-memory approach for computing 32 different statistical criteria for RDF datasets using Apache Spark.
Many applications such as data integration, search, and interlinking, may take full advantage of the data when having a priori statistical information about its internal structure and coverage.
However, such applications may suffer from low quality and not being able to leverage the full advantage of the data when the size of data goes beyond the capacity of the resources available.
Thus, we introduce a distributed approach of quality assessment of large RDF datasets.
It is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new scalable metrics that can be applied to big data.
Based on the knowledge of the internal statistics of a dataset and its quality, users typically want to query and retrieve large amounts of information.
As a result, it has become difficult to efficiently process these large RDF datasets.
Indeed, these processes require, both efficient storage strategies and query-processing engines, to be able to scale in terms of data size.
Therefore, we propose a scalable approach to evaluate SPARQL queries over distributed RDF datasets by translating SPARQL queries into Spark executable code.
We conducted several empirical evaluations to assess the scalability, effectiveness, and efficiency of our proposed approaches.
More importantly, various use cases i.e. Ethereum analysis, Mining Big Data Logs, and Scalable Integration of POIs, have been developed and leverages by our approach.
The empirical evaluations and concrete applications provide evidence that our methodology and techniques proposed during this thesis help to effectively analyze and process large-scale RDF datasets.
All the proposed approaches during this thesis are integrated into the larger SANSA framework.
Relational databases were conceived to digitize paper forms and automate well-structured business processes, and still have their uses. But RDBMS cannot model or store data and its relationships without complexity, which means performance degrades with the increasing number and levels of data relationships and data size. Additionally, new types of data and data relationships require schema redesign that increases time to market.
A native graph database like Neo4j naturally stores, manages, analyzes, and uses data within the context of connections meaning Neo4j provides faster query performance and vastly improved flexibility in handling complex hierarchies than SQL.
This presentation is a review of the NoSQL spaces I did for the X Jornades de Programari Lliure in Barcelona.
You will see a complete review of the NoSQL movement, use cases, technology review, an special review of what are the Graph Databases. And more....
Special thanks to @Hagenburger, @sbitxu, @jannis and the inspiration of the big @jimwebber and the amazing community.
using Spring and MongoDB on Cloud FoundryJoshua Long
This talk introduces how to build MongoDB applications with Spring Data MongoDB on Cloud Foundry. Spring Data provides rich support for easily building applications that work on multiple data stores.
Graph databases are a type of NoSQL database that is optimized for storing and querying connected data and relationships. A graph database represents data in graphs consisting of nodes and edges, where the nodes represent entities and the edges represent relationships between the entities. Graph databases are well-suited for applications that involve complex relationships and connected data, such as social networks, knowledge graphs, and recommendation systems. They allow for flexible querying of relationships and connections via graph traversal operations.
The document describes mm-ADT, a proposed multi-model abstract datatype that aims to provide a universal data structure, processing model, and instruction set that can support various database models like graph, document, relational etc. in a common framework. Key goals are to release a stable mm-ADT specification, compiler and virtual machine, as well as a basic reference implementation by early 2020. The presentation focuses on the universal data structure component, describing how mm-ADT would define custom datatypes, instances, access paths and more using a bytecode language.
Ivan Herman - Semantic Web Activities @ W3Csssw2012
This document summarizes Ivan Herman's presentation on the Semantic Web and Linked Data at the W3C Business Group on Oil, Gas, and Chemicals meeting in Houston on February 13, 2012. The presentation covered topics such as using ontologies and vocabularies to structure and integrate data on the web, technologies like SPARQL and RDFa, converting relational databases to RDF, and ongoing work at the W3C on standards like RDF 1.1 and the Linked Data platform.
Neo4j is a powerful and expressive tool for storing, querying and manipulating data. However modeling data as graphs is quite different from modeling data under a relational database. In this talk, Michael Hunger will cover modeling business domains using graphs and show how they can be persisted and queried in Neo4j. We'll contrast this approach with the relational model, and discuss the impact on complexity, flexibility and performance.
The document discusses digital worlds and applications at both the enterprise and national scales in the United States healthcare system. It notes the massive scale of healthcare data sources, including hundreds of thousands of healthcare offices and databases containing information on hundreds of millions of patients. The critical importance of making sense of this vast amount of heterogeneous healthcare data to improve human lives and health outcomes is also emphasized.
The Business Model Canvas is a strategy and innovation tool to visualize, challenge, and invent business models.
It is outlined in the book Business Model Generation http://www.businessmodelgeneration.com
Web navigation systems for information seeking (updated in Feb 2015)Jack Zheng
- User experience, information architecture, and web navigation
- An evaluation framework to examine major web navigation systems from a human information behavior and user interface perspective.
- Advantages and weaknesses of major types of web navigation system.
This document outlines the typical lifecycle of a software project from planning and analysis through deployment, maintenance, and evaluation. It breaks the process down into key phases including planning, analysis, design, build, testing, deployment, and maintenance. The goal is to show how a system is defined, designed, built, tested, installed, supported, and improved over its lifetime from conception through final evaluation by customers.
This document provides an overview of the IT 4983 Capstone Project course at Kennesaw State University from 2012-2017. It describes how students work in teams on real-world IT projects for external clients over 3 months. Example project types include application development, system implementation, analysis/research, and hybrids. It highlights several featured projects and client feedback, demonstrating how the course provides valuable industry experience for students.
Mobile Web Overview https://www.edocr.com/v/k52p5vj4/Jack Zheng
This document provides an overview of mobile web development. It discusses trends in mobile usage, definitions of mobile web and applications, options for developing mobile content like native, web and hybrid apps. It also covers strategies for mobile websites like responsive design and considerations for mobile design like touch interfaces. Development tools, frameworks and best practices for mobile web are also mentioned.
The document provides an overview of the evolution and trends of web technologies and applications. It discusses the key stages in the evolution of the web from pre-web to modern mobile web. These stages include the early/simple web using static HTML, the dynamic web enabled by server-side processing, the web as a platform supported by mature frameworks, and advances like Web 2.0 and responsive design for mobile. It also covers fundamental technologies, components, servers, processing capabilities, and trends that have shaped the landscape of web development.
This document discusses information systems and their components. It describes the inputs, outputs, boundaries, environments, and purposes of systems. It provides examples of journal publishing review systems and KSU's student information system as complex systems composed of interconnected sub-systems. The document also includes links to Wikipedia pages about systems, systems thinking, and information systems for additional reference.
Multidimensional Perceptual Map for Project Prioritization and Selection - 20...Jack Zheng
Traditional perceptual maps are created using scatter charts or quadrant diagrams, which are based on two dimensions (X and Y axes). Then data items are plotted on the plane based on their values for the two attributes.
The multidimensional perceptual map does not rely on the definition of any fixed axes. The map is composed of smaller areas (cells), which are characterized by a vector of values that represent multiple attributes (dimensions). The positioning of data items in the map is determined by its calculated measure (usually Euclidean distance) again each cell. An unsupervised clustering technique called Self-Organizing Map (SOM) is used to generate such maps.
The multidimensional perceptual map ca be used in many areas including project portfolio management, project prioritization, marketing research, product evaluation, performance management, portfolio management, etc.
The document provides an outline for a presentation on graph-based data models. It introduces some key concepts about graphs and how they are used to model real-world interconnected data. It discusses how early adopters of graph technologies grew by focusing on data relationships. The document also covers graph data structures, graph databases, and graph query languages like Cypher and Gremlin.
This document provides an overview of graph databases. It discusses how graph data is naturally represented as nodes connected by edges, unlike relational databases which require joins. Graph databases allow for fast traversal of connected data and enable querying connected subgraphs. Popular graph database models include property graphs and RDF triple stores. Neo4j is introduced as a widely used graph database management system that uses labels, properties, relationships, and Cypher query language.
Gremlin is a Turing-complete, graph-based programming language developed for key/value-pair multi-relational graphs called property graphs. Gremlin makes extensive use of XPath 1.0 to support complex graph traversals. Connectors exist to various graph databases and frameworks. This language has application in the areas of graph query, analysis, and manipulation.
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...Gezim Sejdiu
Over the past decade, vast amounts of machine-readable structured information have become available through the automation of research processes as well as the increasing popularity of knowledge graphs and semantic technologies.
A major and yet unsolved challenge that research faces today is to perform scalable analysis of large scale knowledge graphs in order to facilitate applications like link prediction, knowledge base completion, and question answering.
Most machine learning approaches, which scale horizontally (i.e. can be executed in a distributed environment) work on simpler feature vector based input rather than more expressive knowledge structures.
On the other hand, the learning methods which exploit the expressive structures, e.g. Statistical Relational Learning and Inductive Logic Programming approaches, usually do not scale well to very large knowledge bases owing to their working complexity.
This talk gives an overview of the ongoing project Semantic Analytics Stack (SANSA) which aims to bridge this research gap by creating an out of the box library for scalable, in-memory, structured learning.
Performance of graph query languages:
Analysis on theperformance of graph querylanguages: comparative study of Cypher, Gremlin and native access in Neo4j
Over the past decade, vast amounts of machine-readable structured information have become available through the automation of research processes as well as the increasing popularity of knowledge graphs and semantic technologies.
Today, we count more than 10,000 datasets made available online following Semantic Web standards.
A major and yet unsolved challenge that research faces today is to perform scalable analysis of large-scale knowledge graphs in order to facilitate applications in various domains including life sciences, publishing, and the internet of things.
The main objective of this thesis is to lay foundations for efficient algorithms performing analytics, i.e. exploration, quality assessment, and querying over semantic knowledge graphs at a scale that has not been possible before.
First, we propose a novel approach for statistical calculations of large RDF datasets, which scales out to clusters of machines.
In particular, we describe the first distributed in-memory approach for computing 32 different statistical criteria for RDF datasets using Apache Spark.
Many applications such as data integration, search, and interlinking, may take full advantage of the data when having a priori statistical information about its internal structure and coverage.
However, such applications may suffer from low quality and not being able to leverage the full advantage of the data when the size of data goes beyond the capacity of the resources available.
Thus, we introduce a distributed approach of quality assessment of large RDF datasets.
It is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new scalable metrics that can be applied to big data.
Based on the knowledge of the internal statistics of a dataset and its quality, users typically want to query and retrieve large amounts of information.
As a result, it has become difficult to efficiently process these large RDF datasets.
Indeed, these processes require, both efficient storage strategies and query-processing engines, to be able to scale in terms of data size.
Therefore, we propose a scalable approach to evaluate SPARQL queries over distributed RDF datasets by translating SPARQL queries into Spark executable code.
We conducted several empirical evaluations to assess the scalability, effectiveness, and efficiency of our proposed approaches.
More importantly, various use cases i.e. Ethereum analysis, Mining Big Data Logs, and Scalable Integration of POIs, have been developed and leverages by our approach.
The empirical evaluations and concrete applications provide evidence that our methodology and techniques proposed during this thesis help to effectively analyze and process large-scale RDF datasets.
All the proposed approaches during this thesis are integrated into the larger SANSA framework.
Relational databases were conceived to digitize paper forms and automate well-structured business processes, and still have their uses. But RDBMS cannot model or store data and its relationships without complexity, which means performance degrades with the increasing number and levels of data relationships and data size. Additionally, new types of data and data relationships require schema redesign that increases time to market.
A native graph database like Neo4j naturally stores, manages, analyzes, and uses data within the context of connections meaning Neo4j provides faster query performance and vastly improved flexibility in handling complex hierarchies than SQL.
This presentation is a review of the NoSQL spaces I did for the X Jornades de Programari Lliure in Barcelona.
You will see a complete review of the NoSQL movement, use cases, technology review, an special review of what are the Graph Databases. And more....
Special thanks to @Hagenburger, @sbitxu, @jannis and the inspiration of the big @jimwebber and the amazing community.
using Spring and MongoDB on Cloud FoundryJoshua Long
This talk introduces how to build MongoDB applications with Spring Data MongoDB on Cloud Foundry. Spring Data provides rich support for easily building applications that work on multiple data stores.
Graph databases are a type of NoSQL database that is optimized for storing and querying connected data and relationships. A graph database represents data in graphs consisting of nodes and edges, where the nodes represent entities and the edges represent relationships between the entities. Graph databases are well-suited for applications that involve complex relationships and connected data, such as social networks, knowledge graphs, and recommendation systems. They allow for flexible querying of relationships and connections via graph traversal operations.
The document describes mm-ADT, a proposed multi-model abstract datatype that aims to provide a universal data structure, processing model, and instruction set that can support various database models like graph, document, relational etc. in a common framework. Key goals are to release a stable mm-ADT specification, compiler and virtual machine, as well as a basic reference implementation by early 2020. The presentation focuses on the universal data structure component, describing how mm-ADT would define custom datatypes, instances, access paths and more using a bytecode language.
Ivan Herman - Semantic Web Activities @ W3Csssw2012
This document summarizes Ivan Herman's presentation on the Semantic Web and Linked Data at the W3C Business Group on Oil, Gas, and Chemicals meeting in Houston on February 13, 2012. The presentation covered topics such as using ontologies and vocabularies to structure and integrate data on the web, technologies like SPARQL and RDFa, converting relational databases to RDF, and ongoing work at the W3C on standards like RDF 1.1 and the Linked Data platform.
Neo4j is a powerful and expressive tool for storing, querying and manipulating data. However modeling data as graphs is quite different from modeling data under a relational database. In this talk, Michael Hunger will cover modeling business domains using graphs and show how they can be persisted and queried in Neo4j. We'll contrast this approach with the relational model, and discuss the impact on complexity, flexibility and performance.
The document discusses digital worlds and applications at both the enterprise and national scales in the United States healthcare system. It notes the massive scale of healthcare data sources, including hundreds of thousands of healthcare offices and databases containing information on hundreds of millions of patients. The critical importance of making sense of this vast amount of heterogeneous healthcare data to improve human lives and health outcomes is also emphasized.
The Business Model Canvas is a strategy and innovation tool to visualize, challenge, and invent business models.
It is outlined in the book Business Model Generation http://www.businessmodelgeneration.com
Web navigation systems for information seeking (updated in Feb 2015)Jack Zheng
- User experience, information architecture, and web navigation
- An evaluation framework to examine major web navigation systems from a human information behavior and user interface perspective.
- Advantages and weaknesses of major types of web navigation system.
This document outlines the typical lifecycle of a software project from planning and analysis through deployment, maintenance, and evaluation. It breaks the process down into key phases including planning, analysis, design, build, testing, deployment, and maintenance. The goal is to show how a system is defined, designed, built, tested, installed, supported, and improved over its lifetime from conception through final evaluation by customers.
This document provides an overview of the IT 4983 Capstone Project course at Kennesaw State University from 2012-2017. It describes how students work in teams on real-world IT projects for external clients over 3 months. Example project types include application development, system implementation, analysis/research, and hybrids. It highlights several featured projects and client feedback, demonstrating how the course provides valuable industry experience for students.
Mobile Web Overview https://www.edocr.com/v/k52p5vj4/Jack Zheng
This document provides an overview of mobile web development. It discusses trends in mobile usage, definitions of mobile web and applications, options for developing mobile content like native, web and hybrid apps. It also covers strategies for mobile websites like responsive design and considerations for mobile design like touch interfaces. Development tools, frameworks and best practices for mobile web are also mentioned.
The document provides an overview of the evolution and trends of web technologies and applications. It discusses the key stages in the evolution of the web from pre-web to modern mobile web. These stages include the early/simple web using static HTML, the dynamic web enabled by server-side processing, the web as a platform supported by mature frameworks, and advances like Web 2.0 and responsive design for mobile. It also covers fundamental technologies, components, servers, processing capabilities, and trends that have shaped the landscape of web development.
This document discusses information systems and their components. It describes the inputs, outputs, boundaries, environments, and purposes of systems. It provides examples of journal publishing review systems and KSU's student information system as complex systems composed of interconnected sub-systems. The document also includes links to Wikipedia pages about systems, systems thinking, and information systems for additional reference.
Multidimensional Perceptual Map for Project Prioritization and Selection - 20...Jack Zheng
Traditional perceptual maps are created using scatter charts or quadrant diagrams, which are based on two dimensions (X and Y axes). Then data items are plotted on the plane based on their values for the two attributes.
The multidimensional perceptual map does not rely on the definition of any fixed axes. The map is composed of smaller areas (cells), which are characterized by a vector of values that represent multiple attributes (dimensions). The positioning of data items in the map is determined by its calculated measure (usually Euclidean distance) again each cell. An unsupervised clustering technique called Self-Organizing Map (SOM) is used to generate such maps.
The multidimensional perceptual map ca be used in many areas including project portfolio management, project prioritization, marketing research, product evaluation, performance management, portfolio management, etc.
These slides were originally a tutorial presented for the SIG preceding the May 2009 meeting of the PRISM Forum.
They attempt to give a survey of the technologies, tools, and state of the world with respect to the Semantic Web as of the first half of 2009.
Webinar: Back to Basics: Thinking in DocumentsMongoDB
New applications, users and inputs demand new types of data, like unstructured, semi-structured and polymorphic data. Adopting MongoDB means adopting to a new, document-based data model.
While most developers have internalized the rules of thumb for designing schemas for relational databases, these rules don't apply to MongoDB. Documents can represent rich data structures, providing lots of viable alternatives to the standard, normalized, relational model. In addition, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.
In this session, Buzz Moschetti explores how you can take advantage of MongoDB's document model to build modern applications.
The document discusses schema design considerations for modeling data in MongoDB. It notes that while MongoDB is schemaless, applications are still responsible for schema design. It compares relational and MongoDB schema designs, highlighting that MongoDB uses embedded documents, has no joins, and requires duplicating or precomputing data. The document provides recommendations like combining related objects, optimizing for specific use cases, and doing aggregation work during writes rather than reads.
Back to Basics Webinar 1: Introduction to NoSQLMongoDB
This document provides an overview of an introduction to NoSQL webinar. It discusses why NoSQL databases were created, the different types of NoSQL databases including key-value stores, column stores, graph stores, multi-model databases and document stores. It provides details on MongoDB, describing how MongoDB stores data as JSON-like documents with dynamic schemas and supports features like indexing, aggregation and geospatial queries. The webinar agenda is also outlined.
The NoSQL movement has introduced four new database architectural patterns that complement, but not replace, traditional relational and analytical databases. This presentation will introduce these four patterns and discuss their relative strengths and weaknesses for solving a variety of business problems. These problems include Big Data (scalability), search, high availability and agility. For each type of problem we look at how NoSQL databases take different approaches to solving these problems and how you can use this knowledge to find the right database architecture for your business challenges.
MongoDB Schema Design: Four Real-World ExamplesMike Friedman
This document discusses different schema designs for common use cases in MongoDB. It presents four cases: (1) modeling a message inbox, (2) retaining historical data within limits, (3) storing variable attributes efficiently, and (4) looking up users by multiple identities. For each case, it analyzes different modeling approaches, considering factors like query performance, write performance, and whether indexes can be used. The goal is to help designers choose an optimal schema based on their application's access patterns and scale requirements.
Spring Data Graph is an integration library for the open source graph database Neo4j and has been around for over a year, evolving from its infancy as brainchild of Rod Johnson and Emil Eifrem. It supports transparent AspectJ based POJO to Graph Mapping, a Neo4jTemplate API and extensive support for Spring Data Repositories. It can work with an embedded graph database or with the standalone Neo4j Server.
The session starts with a short introduction to graph databases. Following that, the different approaches using Spring Data Graph are explored in the Cineasts.net web-app, a social movie database which is also the application of the tutorial in the Spring Data Graph Guidebook. The session will also cover creating a green-field project using the Spring Roo Addon for Spring Data Graph and deploying the App to CloudFoundry.
Django and Neo4j - Domain modeling that kicks assTobias Lindaaker
Presentation about using Neo4j from Django presented at OSCON 2010, Portland OR.
Sample code is available at: https://svn.neo4j.org/components/neo4j.py/trunk/src/examples/python/djangosites/blog/
Graph databases are well suited for complex, interconnected data. Neo4j is a graph database that represents data as nodes connected by relationships. It allows for complex queries and traversals of graph structures. Unlike relational databases, graph databases can directly model real world networks and relationships without needing to flatten the data.
This document provides an introduction to NoSQL databases. It discusses that NoSQL refers to non-relational databases that are not based on SQL and are focused on scalability. Some common types of NoSQL databases include column stores, key-value stores, document stores, graph databases, and XML databases. NoSQL databases are designed to handle large volumes of data across many servers and provide high availability with no single point of failure. Common uses of NoSQL databases include distributed systems like social networks where data is highly distributed and needs to be replicated across servers.
While mathematicians have used graph theory since the 18th century to solve problems, the software patterns for graph data are new to most developers. To enable "mass adoption" of graph technology, we need to establish the right abstractions, access APIs, and data models.
RDF triples, while of paramount importance in establishing RDF graph semantics, are a low-level abstraction, much like using assembly language. For practical and productive “graph programming” we need something different.
Similarly, existing declarative graph query languages (such as SPARQL and Cypher) are not always the best way to access graph data, and sometimes you need a simpler interface (e.g., GraphQL), or even a different approach altogether (e.g., imperative traversals such as with Gremlin).
Ora Lassila is a Principal Graph Technologist in the Amazon Neptune graph database group. He has a long experience with graphs, graph databases, ontologies, and knowledge representation. He was a co-author of the original RDF specification as well as a co-author of the seminal article on the Semantic Web.
An Introduction to NOSQL, Graph Databases and Neo4jDebanjan Mahata
Neo4j is a graph database that stores data in nodes and relationships. It allows for efficient querying of connected data through graph traversals. Key aspects include nodes that can contain properties, relationships that connect nodes and also contain properties, and the ability to navigate the graph through traversals. Neo4j provides APIs for common graph operations like creating and removing nodes/relationships, running traversals, and managing transactions. It is well suited for domains that involve connected, semi-structured data like social networks.
This document discusses trends driving the adoption of NoSQL databases, including increasing data size, connectivity of information, semi-structured data, and distributed application architectures. It describes four categories of NoSQL databases - aggregate-oriented, key-value stores, column family (BigTable), and document databases - and provides examples and comparisons of their pros and cons.
This document discusses collaborative filtering and recommender systems. It begins with an overview of non-relational databases and graph databases. It then discusses collaborative filtering, including calculating similarity scores between users or items, predicting ratings for unseen items, and making recommendations. Specific methods discussed include Euclidean distance, Pearson correlation, and user-based filtering. The goal of collaborative filtering is to increase sales, market share, and targeted advertising by making personalized recommendations to users.
The document discusses the rise of NoSQL databases as an alternative to traditional relational databases. It provides a brief history of NoSQL, noting that new types of applications and data led developers to look for databases that offer more flexibility and scalability. It also describes the main types of NoSQL databases - key-value stores, graph stores, column stores, and document stores - and discusses some of the advantages of NoSQL databases like flexibility, scalability, availability and lower costs.
The document introduces MongoDB as an open source, high performance database that is a popular NoSQL option. It discusses how MongoDB stores data as JSON-like documents, supports dynamic schemas, and scales horizontally across commodity servers. MongoDB is seen as a good alternative to SQL databases for applications dealing with large volumes of diverse data that need to scale.
This document provides an overview of graph databases and Neo4j. It discusses how graph databases are better suited than relational databases for interconnected data and have simpler data models. Neo4j is highlighted as a graph database that uses nodes, edges and properties to represent data and uses the Cypher query language. It is fully ACID compliant, open source, and has a large active community.
This document provides an introduction to GraphTO, a graph database conference. It discusses who the organizers are and provides an introduction to graph databases and concepts. It highlights how graph databases are better suited than SQL for complex, connected data and provides examples of querying and visualizing graph data using technologies like Neo4j, Cypher, and SPARQL. Finally, it discusses loading patent grant data from XML into a graph and available resources for working with graph databases.
Selecting the right database type for your knowledge management needs.Synaptica, LLC
This presentation looks at relational vs. graph databases and their advantages and disadvantages in storing semantic data for taxonomies and ontologies.
This paper discusses implementing NoSQL databases for robotics applications. NoSQL databases are well-suited for robotics because they can store massive amounts of data, retrieve information quickly, and easily scale. The paper proposes using a NoSQL graph database to store robot instructions and relate them according to tasks. MapReduce processing is also suggested to break large robot data problems into parallel pieces. Implementing a NoSQL system would allow building more intelligent humanoid robots that can process billions of objects and learn quickly from massive sensory inputs.
A presentation of the Neo4j graph database given at QCon SF 2008. It describes why relational databases are increasingly unfit for many applications today and why graphs may be a good fit. It also covers the fundamentals of how to program with Neo4j.
Graph applications were once considered “exotic” and expensive. Until recently, few software engineers had much experience putting graphs to work. However, the use cases are now becoming more commonplace.
This talk explores a practical use case, one which addresses key issues of data governance and reproducible research, and depends on sophisticated use of graph technology.
Consider: some academic disciplines such as astronomy enjoy a wealth of data — mostly open data. Popular machine learning algorithms, open source Python libraries, and distributed systems all owe much to those disciplines and their history of big data.
Other disciplines require strong guarantees for privacy and security. Datasets used in social science research involve confidential details about human subjects: medical histories, wages, home addresses for family members, police records, etc.
Those cannot be shared openly, which impedes researchers from learning about related work by others. Reproducibility of research and the pace of science in general are limited. Nonetheless, social science research is vital for civil governance, especially for evidence-based policymaking (US federal law since 2018).
Even when data may be too sensitive to share openly, often the metadata can be shared. Constructing knowledge graphs of metadata about datasets — along with metadata about authors, their published research, methods used, data providers, data stewards, and so on — that provides effective means to tackle hard problems in data governance.
Knowledge graph work supports use cases such as entity linking, discovery and recommendations, axioms to infer about compliance, etc. This talk reviews the Rich Context AI competition and the related ADRF framework used now by more than 15 federal agencies in the US.
We’ll explore knowledge graph use cases, use of open standards and open source, and how this enhances reproducible research. Social science research for the public sector has much in common with data use in industry.
Issues of privacy, security, and compliance overlap, pointing toward what will be required of banks, media channels, etc., and what technologies apply. We’ll look at comparable work emerging in other parts of industry: open source projects, open standards emerging, and in particular a new set of features in Project Jupyter that support knowledge graphs about data governance.
Similar to NOSQL Overview, Neo4j Intro And Production Example (QCon London 2010) (20)
Startups in Sweden vs Startups in Silicon Valley, 2015 editionEmil Eifrem
Differences between running a startup in Sweden and a startup in Silicon Valley. Bonus: How Neo Technology (Neo4j) uses aspects of Scandinavian culture as a competitive advantage. Presented at the Nordic themed Monki Gräs in London, Jan of 2015.
Btw, we at Neo4j are hiring: http://neo4j.com/jobs/ :)
This document contains snippets from a Neo Technology conference presentation on graphs and graph databases. It discusses how graphs can be used to model real-world domains like social networks, telecommunications networks, financial networks, healthcare networks, and more. It also provides examples of how specific companies like Accenture are using graph databases and outlines Neo Technology's roadmap for improving the user experience of its graph database platform.
An Overview of the Emerging Graph Landscape (Oct 2013)Emil Eifrem
Recent years have seen an explosion of technologies for managing, processing and analyzing graphs, ranging from community projects like Apache Giraph, to vendor led products such as Neo4j and spin outs from established companies like Twitter’s FlockDB. The sheer number of technologies makes it difficult to keep track of these tools and what sets them apart, even for those of us who are active in the space!
But all graph technologies are not created equal. This session will provide a high level framework for making sense of the emerging graph landscape. It will describe the three dominant graph data models today, define top level categories like graph compute engines (Graphlab, Giraph, Pegasus, YarcData, etc) and graph databases (Neo4j, FlockDB, OrientDB, etc) and discuss common characteristics and important properties of each category.
Startups in Sweden vs Startups in Silicon ValleyEmil Eifrem
Differences between running a startup in Sweden and a startup in Silicon Valley. Presented at Stanford's "European Entrepreneurship & Innovation" (http://www.europeanentrepreneursatstanford.com) in Jan of 2012.
NOSQL part of the SpringOne 2GX 2010 keynoteEmil Eifrem
The document discusses Spring Data support for non-relational databases (NOSQL) to address challenges of proliferating and complex data not suitable for relational databases. It provides examples of using Spring Data with the Neo4j graph database to model complex domains like social networks by adding graph features to existing JPA data models and handling relationships as entities rather than raw database operations. A new Spring Roo add-on is presented for simplified Neo4j integration.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
9. Trend 2: connectedness
Giant
Global
Information connectivity
Graph
(GGG)
Ontologies
RDF
Folksonomies
Tagging
User-
Wikis
generated
content
Blogs
RSS
Hypertext
Text
documents web 1.0 web 2.0 “web 3.0”
1990 2000 2010 2020
10. Trend 3: semi-structure
Individualization of content!
In the salary lists of the 1970s, all elements had
exactly one job
In the salary lists of the 2000s, we need 5 job
columns! Or 8? Or 15?
Trend accelerated by the decentralization of
content generation that is the hallmark of the age
of participation (“web 2.0”)
11. Aside: RDBMS performance
Relational database
Salary List
Performance
Majority of
Webapps
Social network
Semantic
}
Trading
custom
Data complexity
18. Four (emerging) NOSQL categories
Key-value stores
Based on Amazon's Dynamo paper
Data model: (global) collection of K-V pairs
Example: Dynomite, Voldemort, Tokyo*
BigTable clones
Based on Google's BigTable paper
Data model: big table, column families
Example: HBase, Hypertable, Cassandra
19. Four (emerging) NOSQL categories
Document databases
Inspired by Lotus Notes
Data model: collections of K-V collections
Example: CouchDB, MongoDB
Graph databases
Inspired by Euler & graph theory
Data model: nodes, rels, K-V on both
Example: AllegroGraph, Sones, Neo4j
20. NOSQL data models
Data size
Key-value stores
Bigtable clones
Document
databases
Graph databases
Data complexity
21. NOSQL data models
Data size
Key-value stores
Bigtable clones
Document
databases
Graph databases
(This is still billions of
90% nodes & relationships)
of
use
cases
Data complexity
23. The Graph DB model: representation
Core abstractions: name = “Emil”
age = 29
Nodes sex = “yes”
Relationships between nodes
1 2
Properties on both
type = KNOWS
time = 4 years 3
type = car
vendor = “SAAB”
model = “95 Aero”
24. Example: The Matrix
name = “The Architect”
name = “Morpheus”
rank = “Captain”
name = “Thomas Anderson”
occupation = “Total badass” 42
age = 29
disclosure = public
KNOWS KNOWS CODED_BY
1 KN O
7 3 WS
13
S
KN name = “Cypher”
KNOW
OW last name = “Reagan”
S
name = “Agent Smith”
disclosure = secret version = 1.0b
age = 3 days age = 6 months language = C++
2
name = “Trinity”
25. Code (1): Building a node space
GraphDatabaseService graphDb = ... // Get factory
// Create Thomas 'Neo' Anderson
Node mrAnderson = graphDb.createNode();
mrAnderson.setProperty( "name", "Thomas Anderson" );
mrAnderson.setProperty( "age", 29 );
// Create Morpheus
Node morpheus = graphDb.createNode();
morpheus.setProperty( "name", "Morpheus" );
morpheus.setProperty( "rank", "Captain" );
morpheus.setProperty( "occupation", "Total bad ass" );
// Create a relationship representing that they know each other
mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
// ...create Trinity, Cypher, Agent Smith, Architect similarly
26. Code (1): Building a node space
GraphDatabaseService graphDb = ... // Get factory
Transaction tx = neo.beginTx();
// Create Thomas 'Neo' Anderson
Node mrAnderson = graphDb.createNode();
mrAnderson.setProperty( "name", "Thomas Anderson" );
mrAnderson.setProperty( "age", 29 );
// Create Morpheus
Node morpheus = graphDb.createNode();
morpheus.setProperty( "name", "Morpheus" );
morpheus.setProperty( "rank", "Captain" );
morpheus.setProperty( "occupation", "Total bad ass" );
// Create a relationship representing that they know each other
mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
// ...create Trinity, Cypher, Agent Smith, Architect similarly
tx.commit();
27. Code (1b): Defining RelationshipTypes
// In package org.neo4j.graphdb
public interface RelationshipType
{
String name();
}
// In package org.yourdomain.yourapp
// Example on how to roll dynamic RelationshipTypes
class MyDynamicRelType implements RelationshipType
{
private final String name;
MyDynamicRelType( String name ){ this.name = name; }
public String name() { return this.name; }
}
// Example on how to kick it, static-RelationshipType-like
enum MyStaticRelTypes implements RelationshipType
{
KNOWS,
WORKS_FOR,
}
30. The Graph DB model: traversal
Traverser framework for name = “Emil”
high-performance traversing age = 31
sex = “yes”
across the node space
1 2
type = KNOWS
time = 4 years 3
type = car
vendor = “SAAB”
model = “95 Aero”
31. Example: Mr Anderson’s friends
name = “The Architect”
name = “Morpheus”
rank = “Captain”
name = “Thomas Anderson”
occupation = “Total badass” 42
age = 29
disclosure = public
KNOWS KNOWS CODED_BY
1 KN O
7 3 WS
13
S
KN name = “Cypher”
KNOW
OW last name = “Reagan”
S
name = “Agent Smith”
disclosure = secret version = 1.0b
age = 3 days age = 6 months language = C++
2
name = “Trinity”
32. Code (2): Traversing a node space
// Instantiate a traverser that returns Mr Anderson's friends
Traverser friendsTraverser = mrAnderson.traverse(
Traverser.Order.BREADTH_FIRST,
StopEvaluator.END_OF_GRAPH,
ReturnableEvaluator.ALL_BUT_START_NODE,
RelTypes.KNOWS,
Direction.OUTGOING );
// Traverse the node space and print out the result
System.out.println( "Mr Anderson's friends:" );
for ( Node friend : friendsTraverser )
{
System.out.printf( "At depth %d => %s%n",
friendsTraverser.currentPosition().getDepth(),
friend.getProperty( "name" ) );
}
33. name = “The Architect”
name = “Morpheus”
rank = “Captain”
name = “Thomas Anderson”
occupation = “Total badass” 42
age = 29
disclosure = public
KNOWS KNOWS CODED_BY
1 KN O
7 3 WS
13
S
KN name = “Cypher”
KNOW
OW last name = “Reagan”
S
name = “Agent Smith”
disclosure = secret version = 1.0b
age = 3 days age = 6 months language = C++
2
name = “Trinity”
$ bin/start-neo-example
Mr Anderson's friends:
At depth 1 => Morpheus
friendsTraverser = mrAnderson.traverse(
Traverser.Order.BREADTH_FIRST, At depth 1 => Trinity
StopEvaluator.END_OF_GRAPH, At depth 2 => Cypher
ReturnableEvaluator.ALL_BUT_START_NODE,
RelTypes.KNOWS,
At depth 3 => Agent Smith
Direction.OUTGOING ); $
34. Example: Friends in love?
name = “The Architect”
name = “Morpheus”
rank = “Captain”
name = “Thomas Anderson”
occupation = “Total badass” 42
age = 29
disclosure = public
KNOWS KNOWS CODED_BY
1 7 3 KN O
WS
13
S
KN
KNOW
name = “Cypher”
OW last name = “Reagan”
S
name = “Agent Smith”
LO disclosure = secret version = 1.0b
VE age = 6 months language = C++
S
2
name = “Trinity”
35. Code (3a): Custom traverser
// Create a traverser that returns all “friends in love”
Traverser loveTraverser = mrAnderson.traverse(
Traverser.Order.BREADTH_FIRST,
StopEvaluator.END_OF_GRAPH,
new ReturnableEvaluator()
{
public boolean isReturnableNode( TraversalPosition pos )
{
return pos.currentNode().hasRelationship(
RelTypes.LOVES, Direction.OUTGOING );
}
},
RelTypes.KNOWS,
Direction.OUTGOING );
36. Code (3a): Custom traverser
// Traverse the node space and print out the result
System.out.println( "Who’s a lover?" );
for ( Node person : loveTraverser )
{
System.out.printf( "At depth %d => %s%n",
loveTraverser.currentPosition().getDepth(),
person.getProperty( "name" ) );
}
37. name = “The Architect”
name = “Morpheus”
rank = “Captain”
name = “Thomas Anderson”
occupation = “Total badass” 42
age = 29
disclosure = public
KNOWS KNOWS KN O CODED_BY
1 7 3 WS
13
S
KN
KNOW
name = “Cypher”
OW last name = “Reagan”
S
name = “Agent Smith”
LO disclosure = secret version = 1.0b
VE age = 6 months language = C++
S 2
name = “Trinity”
$ bin/start-neo-example
new ReturnableEvaluator()
Who’s a lover?
{
public boolean isReturnableNode(
TraversalPosition pos)
At depth 1 => Trinity
{ $
return pos.currentNode().
hasRelationship( RelTypes.LOVES,
Direction.OUTGOING );
}
},
38. Bonus code: domain model
How do you implement your domain model?
Use the delegator pattern, i.e. every domain entity
wraps a Neo4j primitive:
// In package org.yourdomain.yourapp
class PersonImpl implements Person
{
private final Node underlyingNode;
PersonImpl( Node node ){ this.underlyingNode = node; }
public String getName()
{
return (String) this.underlyingNode.getProperty( "name" );
}
public void setName( String name )
{
this.underlyingNode.setProperty( "name", name );
}
}
39. Domain layer frameworks
Qi4j (www.qi4j.org)
Framework for doing DDD in pure Java5
Defines Entities / Associations / Properties
Sound familiar? Nodes / Rel’s / Properties!
Neo4j is an “EntityStore” backend
Jo4neo (http://code.google.com/p/jo4neo)
Annotation driven
Weaves Neo4j-backed persistence into domain
objects at runtime
40. Neo4j system characteristics
Disk-based
Native graph storage engine with custom binary
on-disk format
Transactional
JTA/JTS, XA, 2PC, Tx recovery, deadlock
detection, MVCC, etc
Scales up
Many billions of nodes/rels/props on single JVM
Robust
6+ years in 24/7 production
41. Social network pathExists()
12
~1k persons
3
7 1 Avg 50 friends per
person
pathExists(a, b) limit
36
41 77 depth 4
5
Two backends
Eliminate disk IO so
warm up caches
42. Social network pathExists()
2
Emil
1 5
7
Mike Kevin
3 John
Marcus
9 4
Bruce Leigh
# persons query time
Relational database 1 000 2 000 ms
Graph database (Neo4j) 1 000 2 ms
Graph database (Neo4j) 1 000 000 2 ms
43.
44.
45. Pros & Cons compared to RDBMS
+ No O/R impedance mismatch (whiteboard friendly)
+ Can easily evolve schemas
+ Can represent semi-structured info
+ Can represent graphs/networks (with performance)
- Lacks in tool and framework support
- Few other implementations => potential lock in
- No support for ad-hoc queries
+
46. Query languages
SPARQL – “SQL for linked data”
Ex: ”SELECT ?person WHERE {
?person neo4j:KNOWS ?friend .
?friend neo4j:KNOWS ?foe .
?foe neo4j:name “Larry Ellison” .
}”
Gremlin – “perl for graphs”
Ex: ”./outE[@label='KNOWS']/inV[@age > 30]/@name”
47. The Neo4j ecosystem
Neo4j is an embedded database
Tiny teeny lil jar file
Component ecosystem
index
meta-model
graph-matching
remote-graphdb
sparql-engine
...
See http://components.neo4j.org
48. Example: Neo4j-RDF
Neo4j-RDF triple/quad store
OWL SPARQL
RDF
Metamodel Graph
match
Neo4j
49. Language bindings
Neo4j.py – bindings for Jython and CPython
http://components.neo4j.org/neo4j.py
Neo4jrb – bindings for JRuby (incl RESTful API)
http://wiki.neo4j.org/content/Ruby
Neo4jrb-simple
http://github.com/mdeiters/neo4jr-simple
Clojure
http://wiki.neo4j.org/content/Clojure
Scala (incl RESTful API)
http://wiki.neo4j.org/content/Scala
53. Scale out – replication
Rolling out Neo4j HA... soon :)
Master-slave replication, 1st configuration
MySQL style... ish
Except all instances can write, synchronously
between writing slave & master (strong consistency)
Updates are asynchronously propagated to the
other slaves (eventual consistency)
This can handle billions of entities...
… but not 100B
54. Scale out – partitioning
Sharding possible today
… but you have to do manual work
… just as with MySQL
Great option: shard on top of resilient, scalable
OSS app server , see: www.codecauldron.org
Transparent partitioning? Neo4j 2.0
100B? Easy to say. Sliiiiightly harder to do.
Fundamentals: BASE & eventual consistency
Generic clustering algorithm as base case, but
give lots of knobs for developers
56. Case: Enterprise Content Management
Background:
Enterprise Content Management (think: “CMS
but also with non-web content,” or “big
filesystem on the webotubes”)
Thousands of users
Various content types: PDFs, images, videos, doc
files, organization-specific XML formats
“Multi-tentant SaaS”
57. Outline
A saga in three parts
Part I: we're a file system on the web
Part II: sharing is caring
Part III: profit
58. Part I: We're a file system on the web
Let's get something out there
We shall store files in folders
Ya know, versions are kinda cool
59. Part I: concept model
Root
Folder
Sub v. 2
File 1
File 2 v. 1
60. Part I: SQL? NOSQL?
So hmm, this whole relational database thingie...
Modeling hierarchies?
Doable but kinda painful.
Sucky code. And hmm, quite a lot of joins.
Activity feeds
Wouldn't it be cool if you could subscribe to a
folder and get changes fed to you.
Whoa, massive amount of joins!
Denormalization, write explosion, code complexity.
61. Part I: concept model
Root
Folder
Sub v. 2
File 1 How do you represent this in a
graph database?
File 2 v. 1
62. Tadaa!
Root
Folder How do you implement activity
feeds?
Sub v. 2
File 1 Easier when you do ~1M
traversals per second. :)
File 2 v. 1
No need to denormalize and
aggregate events at each
folder level.
63. Part II: Sharing is caring
We're oh-so SaaS and multi-tentant
Would be useful if we could share content between
organizations
Since we're all kinda running on top of the same
system (not just same software) anyway
64. Part II: concept model (a)
Trifork
Root
Folder
Sub v. 2
File 1
File 2 v. 1
Share
Sub
link File 1
InfoQ
Root File 2
65. Part II: Security
Whoa, guys, security?
Customer sez: we need to model organizations
And suborganizations
And hierarchical user groups
Customer sez: and add some security to all that
So add ACLs
And incremental security
66. Admin
Group
U2 U1
Trifork
Org
+R
+W
Trifork
Root -W
Folder
Sub v. 2
File 1
File 2 v. 1
Share
Sub
link File 1
InfoQ
Root File 2
+RW
U3
67. Part II: Keyword translations
Customer sez: we need to cut costs
Ouch
But we spend a lot of time on manually translating
keyword lists and things like that
Let's model that in the graph!
Also, this whole graph thing is really kinda flexible...
so let's throw in some topologies while we're at it!
68. Admin
Group
U2 U1 Tree
Trifork Woods
Org EN
EN
+R Forest
EN
+W Plant
Träd
EN
Skog SV
Trifork SV
Root -W
Folder
Sub v. 2
File 1
File 2 v. 1
Share
Sub
link File 1
InfoQ
Root File 2
+RW
U3
69. Part III: profit
Customer sez:
I heart the cash
If my customers make money, I make money
Developers: “Gives me multi-tentant ecommerce!”
Owait, say waht?
70. Part III: multi-tentant e-commerce?
Conceptual breakdown:
Every org can “enable e-commerce,” thereby
making their content sellable
Within every org, one should be able to model a
supply chain of creator → syndicator → distributor
→ customer
The distributor assigned by region and sets price:
E.x. one dist for Scand, one for the UK
Due to inter-org sharing (remember?), the same
content can belong to several e-commerce
“spheres”
71. QCon
Org Syndicator
Role Ecom
sph
SI
Distributor
Role
Sub
File 1
Folder
DR
Price
List
World
Scand.
U1
72. Finding the price
So how do you actually figure out the price
Throw the distributor
Which is regionally bound
Per content
Per e-commerce sphere
That's a shortest path algo!
73. QCon
Org Syndicator
Role Ecom
sph
SI
Distributor
Sub
Finding the price: Role
File 1
Folder
Equivalent to finding the
SI
shortest path from U1 to File1 DR
Price
along “purple and black” List
Price
relationship types. List DR JAOO
World Org
Scand. Distributor
Role
U1
74. Admin
Group
U2 U1 Tree
Trifork Woods
Org EN
EN
+R Forest
EN
+W Plant
Träd
EN
Skog SV
Trifork SV
Root -W
QCon
Folder Org Syndicator
Role Ecom
sph
Sub v. 2 SI
Distributor
File 1 Role
Sub
File 1
Folder
File 2 v. 1
Share SI
DR
Price
List
Price
Sub
List DR
link File 1 JAOO
World Org
InfoQ
Root File 2
Scand. Distributor
+RW Role
U1
U3
75. ECM conclusions
One example of an evolving model
Site had a lots of content, lots of users
High read load
Moderate write load
Only backend: Neo4j
“You go to a graph database for the performance...
but you stay for the flexibility!”
76. How ego are you? (aka other impls?)
Franz’ AllegroGraph (http://agraph.franz.com)
Proprietary, Lisp, RDF-oriented but real graphdb
Sones graphDB (http://sones.com)
Proprietary, .NET, cloud-only, req invite for test
Kloudshare (http://kloudshare.com)
Graph database in the cloud, still stealth mode
Google Pregel (http://bit.ly/dP9IP)
We are oh-so-secret
Some academic papers from ~10 years ago
G = {V, E} #FAIL
77. Conclusion
Graphs && Neo4j => teh awesome!
Available NOW under AGPLv3 / commercial license
AGPLv3: “if you’re open source, we’re open source”
If you have proprietary software? Must buy a
commercial license
But the first one is free!
Download
http://neo4j.org
Feedback
http://lists.neo4j.org