The document discusses graph databases and provides examples of real-world graphs from Facebook, the New York Times, and a phone bill. It compares the data structure of graph databases to relational databases and shows how graph databases can store connections between entities as nodes with relationships as edges. Properties like tags can also be added to nodes and edges in a graph database to provide additional information.
Graph databases are well-suited for storing and querying multi-relational data. They provide better performance, flexibility, and agility than relational databases for such data. Tests showed graph databases like Neo4j outperforming relational databases, returning results faster and for more records as the depth and complexity of queries increased. Cypher, Neo4j's query language, starts queries, matches patterns, and returns and filters results through clauses such as START, MATCH, RETURN, and WHERE. Graph databases are used successfully by many large companies that need to handle complex relationships in data.
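As a hedged illustration of those clauses, a query in the START-based Cypher dialect shown later in this deck might look as follows (the index name, property, and relationship type here are invented for the example):

START user = node:people(name = 'Alice')
MATCH user-[:KNOWS]->friend
WHERE friend.age > 21
RETURN friend.name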
This document provides an overview of graph databases. It discusses how graph data is naturally represented as nodes connected by edges, unlike relational databases which require joins. Graph databases allow for fast traversal of connected data and enable querying connected subgraphs. Popular graph database models include property graphs and RDF triple stores. Neo4j is introduced as a widely used graph database management system that uses labels, properties, relationships, and Cypher query language.
A presentation of Apache TinkerPop's Gremlin language with running examples over the MovieLens dataset. Presented August 19, 2015 at NoSQL NOW in San Jose, California.
Relational databases were conceived to digitize paper forms and automate well-structured business processes, and still have their uses. But RDBMS cannot model or store data and its relationships without complexity, which means performance degrades with the increasing number and levels of data relationships and data size. Additionally, new types of data and data relationships require schema redesign that increases time to market.
A native graph database like Neo4j naturally stores, manages, analyzes, and uses data within the context of connections, which means Neo4j provides faster query performance and far greater flexibility in handling complex hierarchies than SQL.
Graph databases use graph structures to represent and store data, with nodes connected by edges. They are well-suited for interconnected data. Unlike relational databases, graph databases allow for flexible schemas and querying of relationships. Common uses of graph databases include social networks, knowledge graphs, and recommender systems.
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive (Sachin Aggarwal)
We will give a detailed introduction to Apache Spark and why and how Spark can change the analytics world. Apache Spark's memory abstraction is the RDD (Resilient Distributed Dataset). One of the key reasons why Apache Spark is so different is the introduction of RDDs. You cannot do anything in Apache Spark without knowing about RDDs. We will give a high-level introduction to RDDs, and in the second half we will have a deep dive into RDDs.
Jesse Anderson (Smoking Hand)
This early-morning session offers an overview of what HBase is, how it works, its API, and considerations for using HBase as part of a Big Data solution. It will be helpful for people who are new to HBase, and also serve as a refresher for those who may need one.
This document discusses troubleshooting Redis. Some key points:
- Redis is single-threaded, so commands like KEYS, FlushAll, and deleting large collections can be slow. It's better to use SCAN instead of KEYS.
- Creating Redis database snapshots (RDB files) and rewriting the append-only file (AOF) can cause high disk I/O and CPU usage. It's best to disable automatic rewrites.
- Monitoring memory usage and fragmentation is important to avoid performance issues. The maxmemory setting also needs monitoring to prevent out-of-memory errors.
- Network and replication failures need solutions like DNS failover or using Zookeeper for coordination to maintain high availability of Redis.
Max De Marzi gave an introduction to graph databases using Neo4j as an example. He discussed trends in big, connected data and how NoSQL databases like key-value stores, column families, and document databases address these trends. However, graph databases are optimized for interconnected data by modeling it as nodes and relationships. Neo4j is a graph database that uses a property graph data model and allows querying and traversal through its Cypher query language and Gremlin scripting language. It is well-suited for domains involving highly connected data like social networks.
This document provides an overview of MongoDB for Java developers. It discusses what MongoDB is, how it compares to relational databases, common use cases, data modeling approaches, CRUD operations, indexing, aggregation, replication, sharding, and tools for integrating MongoDB with Java applications. The document contains multiple code examples and concludes with a demonstration of building a sample app with MongoDB.
Data Privacy with Apache Spark: Defensive and Offensive Approaches (Databricks)
In this talk, we’ll compare different data privacy techniques & protection of personally identifiable information and their effects on statistical usefulness, re-identification risks, data schema, format preservation, read & write performance.
We’ll cover different offense and defense techniques. You’ll learn what k-anonymity and quasi-identifiers are. Discover the world of suppression, perturbation, obfuscation, encryption, tokenization, and watermarking with elementary code examples, in case no third-party products can be used. We’ll see what approaches might be adopted to minimize the risks of data exfiltration.
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...) (Beat Signer)
The document discusses Structured Query Language (SQL) and its history and components. It notes that SQL is a declarative query language used to define database schemas, manipulate data through queries, and control transactions. The document outlines SQL's data definition language for defining schemas and data manipulation language for querying and modifying data. It also provides examples of SQL statements for creating tables and defining constraints.
This document provides an overview of different database types including relational, NoSQL, document, key-value, graph, and column family databases. It discusses the history and drivers behind the development of NoSQL databases, as well as concepts like horizontal scaling, the CAP theorem, and eventual consistency. Specific databases are also summarized, including MongoDB, Redis, Neo4j, and HBase.
Slides for Data Syndrome one hour course on PySpark. Introduces basic operations, Spark SQL, Spark MLlib and exploratory data analysis with PySpark. Shows how to use pylab with Spark to create histograms.
MySQL is a popular open source database system. It can be downloaded from the MySQL website for free. MySQL allows users to create, manipulate and store data in databases. A database contains tables which store data in a structured format. Structured Query Language (SQL) is used to perform operations like querying and manipulating data within MySQL databases. Some common SQL queries include SELECT, INSERT, UPDATE and DELETE.
MongoDB is an open-source, document-oriented database that provides high performance and horizontal scalability. It uses a document-model where data is organized in flexible, JSON-like documents rather than rigidly defined rows and tables. Documents can contain multiple types of nested objects and arrays. MongoDB is best suited for applications that need to store large amounts of unstructured or semi-structured data and benefit from horizontal scalability and high performance.
This document provides an overview of NoSQL databases. It begins with a brief history of relational databases and Edgar Codd's 1970 paper introducing the relational model. It then discusses modern trends driving the emergence of NoSQL databases, including increased data complexity, the need for nested data structures and graphs, evolving schemas, high query volumes, and cheap storage. The core characteristics of NoSQL databases are outlined, including flexible schemas, non-relational structures, horizontal scaling, and distribution. The major categories of NoSQL databases are explained - key-value, document, graph, and column-oriented stores - along with examples like Redis, MongoDB, Neo4j, and Cassandra. The document concludes by discussing use cases and
This document provides an overview of Spark SQL and its architecture. Spark SQL allows users to run SQL queries over SchemaRDDs, which are RDDs with a schema and column names. It introduces a SQL-like query abstraction over RDDs and allows querying data in a declarative manner. The Spark SQL component consists of Catalyst, a logical query optimizer, and execution engines for different data sources. It can integrate with data sources like Parquet, JSON, and Cassandra.
This presentation about Hadoop architecture will help you understand the architecture of Apache Hadoop in detail. In this video, you will learn what is Hadoop, components of Hadoop, what is HDFS, HDFS architecture, Hadoop MapReduce, Hadoop MapReduce example, Hadoop YARN and finally, a demo on MapReduce. Apache Hadoop offers a versatile, adaptable and reliable distributed computing big data framework for a group of systems with capacity limit and local computing power. After watching this video, you will also understand the Hadoop Distributed File System and its features along with the practical implementation.
Below are the topics covered in this Hadoop Architecture presentation:
1. What is Hadoop?
2. Components of Hadoop
3. What is HDFS?
4. HDFS Architecture
5. Hadoop MapReduce
6. Hadoop MapReduce Example
7. Hadoop YARN
8. Demo on MapReduce
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distribution datasets (RDD) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying DataFrames
Who should take up this Big Data and Hadoop Certification Training Course?
Big Data career opportunities are on the rise, and Hadoop is quickly becoming a must-know technology for the following professionals:
1. Software Developers and Architects
2. Analytics Professionals
3. Senior IT professionals
4. Testing and Mainframe professionals
5. Data Management Professionals
6. Business Intelligence Professionals
7. Project Managers
8. Aspiring Data Scientists
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
MySQL Group Replication is a new 'synchronous', multi-master, auto-everything replication plugin for MySQL introduced with MySQL 5.7. It is the perfect tool for small 3-20 machine MySQL clusters to gain high availability and high performance. It stands for high availability because the failure of a replica doesn't stop the cluster. Failed nodes can rejoin the cluster and new nodes can be added in a fully automatic way - no DBA intervention required. It is high performance because multiple masters process writes, not just one as with MySQL Replication. Running applications on it is simple: no read-write splitting, no fiddling with eventual consistency and stale data. The cluster offers strong consistency (generalized snapshot isolation).
It is based on Group Communication principles, hence the name.
The document summarizes Spark SQL, which is a Spark module for structured data processing. It introduces key concepts like RDDs, DataFrames, and interacting with data sources. The architecture of Spark SQL is explained, including how it works with different languages and data sources through its schema RDD abstraction. Features of Spark SQL are covered such as its integration with Spark programs, unified data access, compatibility with Hive, and standard connectivity.
Combine Spring Data Neo4j and Spring Boot to quickl... (Neo4j)
Speakers: Michael Hunger (Neo Technology) and Josh Long (Pivotal)
Spring Data Neo4j 3.0 is here and it supports Neo4j 2.0. Neo4j is a tiny graph database with a big punch. Graph databases are eminently suited to asking interesting questions and doing analysis. Want to load the Facebook friend graph? Build a recommendation engine? Neo4j's just the ticket. Join Spring Data Neo4j lead Michael Hunger (@mesirii) and Spring Developer Advocate Josh Long (@starbuxman) for a look at how to build smart, graph-driven applications with Spring Data Neo4j and Spring Boot.
This document discusses Spark RDD operations and running Spark applications in Scala. It covers Spark transformations like map, filter, and reduce. It also covers Spark actions and different types of RDDs. It provides examples of running Spark in local, standalone, and YARN modes and submitting Spark jobs via spark-submit in those modes. It includes questions and answers about Spark concepts.
The document discusses various topics related to data modeling and query optimization in Hive including:
- File formats like text, Parquet, and ORC that can be used in Hive
- Different types of Hive tables like external, managed, and views
- Data layout techniques in Hive like partitioning, bucketing, and handling skewed data to optimize query performance
- Best practices for using partitioning, bucketing, and skews depending on the type of data and query patterns
This document discusses different types of distributed databases. It covers data models like relational, aggregate-oriented, key-value, and document models. It also discusses different distribution models like sharding and replication. Consistency models for distributed databases are explained including eventual consistency and the CAP theorem. Key-value stores are described in more detail as a simple but widely used data model with features like consistency, scaling, and suitable use cases. Specific key-value databases like Redis, Riak, and DynamoDB are mentioned.
This document provides an overview of a Neo4j basic training session. The training will cover querying graph patterns with Cypher, designing and implementing a graph database model, and evolving existing graphs to support new requirements. Attendees will learn about graph modeling concepts like nodes, relationships, properties and labels. They will go through a modeling workflow example of developing a graph model to represent airport connectivity data from a CSV file and querying the resulting graph.
Solving the Disconnected Data Problem in Healthcare Using MongoDB (MongoDB)
1) The document discusses how Zephyr Health is solving the problem of disconnected healthcare data by building a platform that ingests and integrates data from various sources using algorithms and MongoDB.
2) It organizes data into entity-centric profiles and uses a graph-based index to allow complex queries across the integrated data.
3) The platform powers various analytical applications that help address real business problems by leveraging the integrated data in a standardized way.
Oracle Database Administration Training - Part 3 (faradars)
The Oracle database is without doubt one of the most powerful software products for managing very large volumes of data. The goal of this training is to learn the complex concepts of database architecture and the administrative challenges of the database, which will help you pick up the material quickly and get closer to your goals.
Topics covered in this training:
Oracle database architecture
Preparing the database environment
Creating an Oracle database
Managing Oracle's memory structures
Configuring the network environment in Oracle
...
For more details and to obtain this training, please visit the link below:
http://faradars.org/courses/fvorc9408
Sam Weaver, a MongoDB Product Manager, introduces MongoDB Compass. He discusses the need for Compass due to customer requests for quicker prototyping, less friction on handovers, and easier learning of MongoDB Query Language (MQL). He demos Compass' features like viewing schemas and sampling data from MongoDB databases. Finally, he outlines future plans like supporting more database operations and statistics, and sharing queries.
Data Modeling for Integration of NoSQL with a Data Warehouse (Daniel Upton)
Learn to model data to be visible and accessible between NoSQL Big Data repositories and your RDBMS Data Warehouse. Learn how specific RDBMS Data Warehouse data modeling approaches establish flexible integration with NoSQL data sets that do not play by E.F. Codd’s rules.
- Data modeling for NoSQL databases is different than relational databases and requires designing the data model around access patterns rather than object structure. Key differences include not having joins so data needs to be duplicated and modeling the data in a way that works for querying, indexing, and retrieval speed.
- The data model should focus on making the most of features like atomic updates, inner indexes, and unique identifiers. It's also important to consider how data will be added, modified, and retrieved factoring in object complexity, marshalling/unmarshalling costs, and index maintenance.
- The _id field can be tailored to the access patterns, such as using dates for time-series data to keep recent
This presentation covers several aspects of modeling data and domains with a graph database like Neo4j. The graph data model allows high-fidelity modeling. Using the first-class relationships of the graph model allows much higher forms of normalization than you would use in a relational database.
Video here: https://vimeo.com/67371996
The document discusses choosing between SQL and NoSQL databases. It covers the evolution of data architectures from traditional client-server models to newer distributed NoSQL solutions. It provides an overview of different data store types like SQL, NoSQL, key-value, document, column family, and graph databases. The document advises picking the right data model based on business needs, use cases, data storage requirements, and growth patterns then evaluating solutions based on pros and cons. It concludes that for large, growing data, both SQL and NoSQL solutions may be needed.
Webinar: Back to Basics: Thinking in Documents (MongoDB)
New applications, users and inputs demand new types of data, like unstructured, semi-structured and polymorphic data. Adopting MongoDB means adopting a new, document-based data model.
While most developers have internalized the rules of thumb for designing schemas for relational databases, these rules don't apply to MongoDB. Documents can represent rich data structures, providing lots of viable alternatives to the standard, normalized, relational model. In addition, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.
In this session, Buzz Moschetti explores how you can take advantage of MongoDB's document model to build modern applications.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance and, you’ll hear from a specific customer and their use case to take advantage of fast performance on enormous datasets leveraging economies of scale on the AWS platform.
Speakers:
Ian Meyers, AWS Solutions Architect
Toby Moore, Chief Technology Officer, Space Ape
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Introduction of Cybersecurity with OSS at Code Europe 2024 (Hiroshi SHIBATA)
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Trusted Execution Environment for Decentralized Process Mining (LucaBarbaro3)
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Best 20 SEO Techniques To Improve Website Visibility In SERP (Pixlogix Infotech)
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Monitoring and Managing Anomaly Detection on OpenShift.pdf (Tosin Akinosho)
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf (Chart Kalyan)
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
This presentation provides valuable insights into effective cost-saving techniques on AWS. Learn how to optimize your AWS resources by rightsizing, increasing elasticity, picking the right storage class, and choosing the best pricing model. Additionally, discover essential governance mechanisms to ensure continuous cost efficiency. Whether you are new to AWS or an experienced user, this presentation provides clear and practical tips to help you reduce your cloud costs and get the most out of your budget.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
3. OVERVIEW
Why it makes sense to know about graph databases
„Graph databases will come into vogue. One key gap in the Hadoop ecosystem is for graph databases, which support rich mining and visualization of relationships, influence, and behavioral propensities. The market for graph databases will boom in 2012 as companies everywhere adopt them for social media analytics, marketing campaign optimization, and customer experience fine-tuning. We will see VCs put big money behind graph database and analytics startups. Many big data platform and tool vendors will acquire the startups to supplement their expanding Hadoop, NoSQL, and enterprise data warehousing (EDW) portfolios. Social graph analysis, although not a brand-new field, will become one of the most prestigious specialties in the data science arena, focusing on high-powered drilldown into polystructured behavioral data sets.“
Source: http://blogs.forrester.com/james_kobielus/11-12-19-the_year_ahead_in_big_data_big_cool_new_stuff_looms_large
4. OVERVIEW
Example of a real-world graph - facebook
Source: http://www.facebook.com/press/info.php?statistics
5. OVERVIEW
Example of a real-world graph - NYT „Cascade“
Source: http://nytlabs.com/projects/cascade.html
8.-15. OVERVIEW
Delimitation to RDBMS - property graph
(These slides build up the same data set side by side: relational tables on the RDBMS side, and on the GraphDB side a diagram with person vertices 0-4 connected by "knows" edges plus tag vertices (.NET, Java, PKI, NoSQL) attached with significance weights.)

RDBMS side:

Person               Knows_rel
Id  Name             Id_1  Id_2
0   Henning Rauch    1     0
1   René Peinl       1     2
2   Foo Bar          1     3
3   Bruce Schneier   1     4
4   Linus Torvalds   0     1
                     0     2
                     0     3
                     0     4
                     3     4
                     4     3

Tag           Tags_rel
Id  Name      Tag_Id  Person_Id  Significance
0   .NET      0       0          5
1   Java      1       1          5
2   PKI       2       1          6
3   NoSQL     2       3          10
              3       0          7
              3       1          7
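To make the mapping concrete, the same data can be expressed directly in Cypher, the Neo4j query language shown later in this deck. This is a minimal, hedged sketch: the labels, property keys, and the TAGGED_WITH relationship type are invented for the example, and the CREATE syntax follows later Cypher versions rather than the START-based dialect of these slides.

CREATE (henning:Person {name: 'Henning Rauch'})
CREATE (rene:Person {name: 'René Peinl'})
CREATE (nosql:Tag {name: 'NoSQL'})
CREATE (henning)-[:KNOWS]->(rene)
CREATE (rene)-[:KNOWS]->(henning)
CREATE (henning)-[:TAGGED_WITH {significance: 7}]->(nosql)
CREATE (rene)-[:TAGGED_WITH {significance: 7}]->(nosql)

The significance values live on the relationships themselves, which is exactly the property-graph feature the Tags_rel table has to emulate with an extra column.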
17. OVERVIEW
Delimitation to RDBMS - Scalability
• Relation tables act as a global index over linked data
• The bigger the relation table, the longer it takes to get the interesting information (e.g. local neighbourhood of data)
• Solution of graph databases: information on relationships (aka edges) is stored locally on the vertex
(For illustration, the slide repeats the Knows_rel and Tags_rel tables from the previous slides.)
18. OVERVIEW
Delimitation to RDBMS - example of complexity
• Task: Find the persons that are known to Id 0.
• Linear table scan: O(n)
• Index scan: O(log n)
• Because of this dependency on n, RDBMS do not perform well on recursive search algorithms
• Graph databases solve this task in O(1)
(The slide shows the Knows_rel table again for reference.)
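As a hedged sketch in the START-based Cypher dialect used later in this deck (assuming the internal node id 0 and a lowercase "knows" relationship type, both taken from the example):

START me = node(0)
MATCH me-[:knows]->person
RETURN person

Since each vertex holds its own adjacency, the cost of this lookup depends only on the size of node 0's neighbourhood, not on the total size of a Knows_rel table.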
19.-24. OVERVIEW
Delimitation to other NoSQL products
(These slides build up a chart plotting data size against data complexity. In order of increasing complexity the product families are: Key/Value stores, Bigtable clones, Document databases, Graph databases, and In-memory graph databases. A marker notes that > 90% of use cases fall within this range.)
Source: http://www.slideshare.net/jexp/neo4j-graph-database-presentation-german
33. NEO4J
Overview
• Graph database + Lucene index
• ACID (isolation level read committed)
• High availability in enterprise edition
• 32 billion vertices, 32 billion edges, 64 billion properties
• Embedded or via REST API
• Support for the Blueprints project
34. NEO4J
Architecture
(Layer diagram, top to bottom:)
Cypher/Gremlin | Java/Ruby/.../C# API
REST API
Core API (Java)
Caches (files and objects) | HA
Record files | Transaction log
Disk(s)
Source: http://www.slideshare.net/rheehot/eo4j-12713065
35.-38. NEO4J
Example of the on-disk layout
(These slides build up a diagram of three vertices - Alice (Age: 23), Bob (Age: 42) and Carol (Age: 22) - connected by "knows" relationships. Property records hang off each vertex, and the relationship records of a vertex are chained as doubly linked lists. Each relationship record carries four pointers, with the diagram distinguishing existent from nonexistent pointers:)
SP = Source Previous
SN = Source Next
EP = End Previous
EN = End Next
Source: https://github.com/thobe/presentations
39. NEO4J
In-memory layout (cache)
(Diagram: a vertex object holds its ID, its relationship ID references grouped by relationship type - for each type an "in" list and an "out" list of relationships R1 ... Rn - and a map of property keys and values. An edge object holds its ID, start vertex, end vertex, type, and its own property map.)
• Transformation of the doubly linked list (on-disk) into objects
• Increases the traversal speed
Source: https://github.com/thobe/presentations
40. NEO4J
Traversal
• Relationship expander (delivers the edges of a vertex)
• Evaluators (evaluate whether a vertex is going to be traversed and whether it should be taken into the result set)
• Projection of the result set (e.g. "take the last vertex of the path")
• Uniqueness level (sets, in steps, whether a node may be visited several times)
Source: https://github.com/thobe/presentations
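Cypher exposes a declarative slice of this traversal machinery. As a rough, hedged sketch over the example graph (the node id, relationship type, and depth bound are illustrative), a depth-bounded traversal could be written as:

START me = node(0)
MATCH p = me-[:knows*1..3]->other
RETURN other, length(p)

The *1..3 bound and the RETURN clause loosely correspond to the evaluator and projection steps described above.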
41. NEO4J
Cypher & Gremlin

Feature      Gremlin                                      Cypher
Paradigm     Imperative programming                       Declarative programming
Description  • Developed by Marko Rodriguez (Tinkerpop)   • In-house development
             • Based on XPath to describe the traversal   • Cypher provides greater opportunities for optimization
             • Developed using Groovy                     • Good for traversals that need back tracking
             • 30-50% faster on „simple“ traversals       • Output is a table

Example (Gremlin):
outE[label=HAS_CART].inV
  .outE[label=CONTAINS_ITEM].inV
  .inE[label=PURCHASED].outV
  .outE[label=PURCHASED].inV

Example (Cypher):
START me=node:people(name={myname})
MATCH me-[:HAS_CART]->cart-[:CONTAINS_ITEM]->item,
      item<-[:PURCHASED]-user-[:PURCHASED]->recommendation
RETURN recommendation

START d=node(1), e=node(2)
MATCH p = shortestPath( d-[*..15]->e )
RETURN p

Source: https://github.com/thobe/presentations
43. NEO4J
Pricing

Edition       License               Description                                      Price (annual)
„Community“   Open Source (GPLv3)   Complete database including a basic              0 €
                                    management frontend
„Advanced“    Commercial and AGPL   + Monitoring, better management frontend         6,000 €
                                    and support
„Enterprise“  Commercial and AGPL   + Enterprise frontend, HA and premium support    24,000 €
45. INFINITEGRAPH
Overview
• Distributed graph database
• Implemented in C++ (APIs in Java, C#, Python, etc.)
• Based on Objectivity/DB (distributed object database)
• Established 1988 in Sunnyvale, California
• Enterprise-customers + US-government
• Support for Blueprints
63. INFINITEGRAPH
Pricing

Edition                    License     Description                        Price (annual)
„InfiniteGraph FREE“       Free        Complete database but limited      0 €
                                       to 1 million vertices or edges
„Pay as you go“            Commercial  No limitation                      starts at approx. 5,000 $
                                                                          (depends on count of vertices and edges)
„Unit or site licensing“   Commercial  Focus on „bigger“ environments     no price available
Source: http://objectivity.com/products/infinitegraph/overview
65. FALLEN-8
Overview
• In-memory graph database
• Implemented in C# (platform independent thanks to Mono)
• 4 billion vertices or edges; each element can have approx. 65,000 properties
• Indexes on vertices and/or edges
• Core is open source (MIT license), plugins can have any license
66. FALLEN-8
Persistence
• Persistence in the form of „save-points“ (all vertices and edges are serialized en bloc)
• Commodity hardware can (de)serialize approx. 2 million vertices or edges per second; at that rate, for example, a graph of 100 million elements saves or loads in under a minute
• Saving blocks only write operations
• Performance + reliability
67. FALLEN-8
Architecture
(Layer diagram, top to bottom:)
Services
Traversal-framework | Index-framework
Core API
Vertices and edges
RAM
68. FALLEN-8
Architecture and some plugins
(Layer diagram, top to bottom:)
HA + ACID transactions
REST API (via JSON) + management/query frontend
Traversal-framework (incl. path analysis) | Index-framework (incl. R* tree index)
Core API
Vertices and edges
RAM