The document describes mm-ADT, a proposed multi-model abstract data type that aims to provide a universal data structure, processing model, and instruction set capable of supporting various database models (graph, document, relational, etc.) in a common framework. Key goals are to release a stable mm-ADT specification, compiler, and virtual machine, as well as a basic reference implementation, by early 2020. The presentation focuses on the universal data structure component, describing how mm-ADT would define custom datatypes, instances, access paths, and more using a bytecode language.
Gremlin is a Turing-complete, graph-based programming language developed for key/value-pair multi-relational graphs called property graphs. Gremlin makes extensive use of XPath 1.0 to support complex graph traversals. Connectors exist to various graph databases and frameworks. This language has application in the areas of graph query, analysis, and manipulation.
Faunus is a graph analytics engine built atop the Hadoop distributed computing platform. The graph representation is a distributed adjacency list, whereby a vertex and its incident edges are co-located on the same machine. Querying a Faunus graph is possible with a MapReduce-variant of the Gremlin graph traversal language. A Gremlin expression compiles down to a series of MapReduce-steps that are sequence optimized and then executed by Hadoop. Results are stored as transformations to the input graph (graph derivations) or computational side-effects such as aggregates (graph statistics). Beyond querying, a collection of input/output formats are supported which enable Faunus to load/store graphs in the distributed graph database Titan, various graph formats stored in HDFS, and via arbitrary user-defined functions. This presentation will focus primarily on Faunus, but will also review the satellite technologies that enable it.
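The compilation step above can be illustrated with a toy sketch (not Faunus's actual API): a Gremlin-style query such as "count edges by label" becomes a map phase over each vertex's co-located edge list and a reduce phase that aggregates per key, mirroring the distributed adjacency-list layout.

```python
from collections import defaultdict

# Each entry co-locates a vertex with its incident out-edges,
# mirroring Faunus's distributed adjacency list.
graph = {
    "hercules": [("battled", "nemean"), ("battled", "hydra"), ("father", "jupiter")],
    "jupiter":  [("brother", "neptune")],
}

def map_phase(vertex, edges):
    # Emit (edge label, 1) for every incident edge of this vertex.
    for label, _target in edges:
        yield (label, 1)

def reduce_phase(pairs):
    # Sum counts per label, as a Hadoop reducer would per key.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

edge_counts = reduce_phase(
    pair for v, es in graph.items() for pair in map_phase(v, es)
)
print(edge_counts)  # {'battled': 2, 'father': 1, 'brother': 1}
```

In the real system each map task runs on the machine holding its vertex partition, and the shuffle/reduce is handled by Hadoop.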
ACM DBPL Keynote: The Graph Traversal Machine and Language (Marko Rodriguez)
The document summarizes the components of the Gremlin traversal machine:
- The graph is stored in memory and represents the data as vertices, edges, and properties.
- Traversers represent computational threads that traverse the graph. They track their location, path history, and program counter in the traversal program.
- Traversals are programs that manipulate traversers to query and update the graph.
- Gremlin is agnostic to the underlying graph database and can traverse graphs in various systems like TinkerGraph, Titan, Neo4j, and distributed processors.
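The traverser concept above can be sketched in a few lines (a hypothetical toy model, not TinkerPop's API): a traverser carries its current location and its path history as steps move it through an in-memory adjacency-list graph.

```python
graph = {"a": ["b", "c"], "b": ["c"], "c": []}

class Traverser:
    def __init__(self, location, path=None):
        self.location = location
        self.path = (path or []) + [location]  # path history grows per hop

def out(traversers):
    # The out() step maps each traverser to one traverser per outgoing edge.
    for t in traversers:
        for neighbor in graph[t.location]:
            yield Traverser(neighbor, t.path)

# In Gremlin terms, g.V("a").out().out(): two hops from vertex "a".
result = list(out(out([Traverser("a")])))
print([t.path for t in result])  # [['a', 'b', 'c']]
```

A real traverser also carries a program counter and bulk count, but location plus path history is enough to see how results like path histories fall out of the machine.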
Neo4j is a powerful and expressive tool for storing, querying and manipulating data. However modeling data as graphs is quite different from modeling data under a relational database. In this talk, Michael Hunger will cover modeling business domains using graphs and show how they can be persisted and queried in Neo4j. We'll contrast this approach with the relational model, and discuss the impact on complexity, flexibility and performance.
Tutorial: Semantic Wikis and Applications (Mark Greaves)
This document outlines an agenda for a tutorial on semantic wikis and applications. The tutorial will include introductions to Semantic MediaWiki, diving deeper into its features, applications of semantic wikis, extensions for Semantic MediaWiki developed by various contributors, connecting Semantic MediaWiki with MS Office, augmenting it with a triple store, discussing future development, and concluding with a question and answer session, followed by a 30 minute break.
The document discusses OrientDB, a document-graph database. It provides an overview of key OrientDB concepts like documents, vertices, edges, classes, clusters, and properties. It also compares the relational and graph data models. The presentation was given by Greg McCarvell and introduces Node.js integration with OrientDB through examples.
Overloading in Overdrive: A Generic Data-Centric Messaging Library for DDS (Sumant Tambe)
The document summarizes the Data Distribution Service (DDS) data-centric communication model. DDS uses a typed data-centric approach where:
1) Data objects have a type and obey type rules.
2) Middleware maintains the state of each data object and caches objects for applications to read.
3) Objects are identified by keys and have configurable quality of service settings for reliability, ownership, history and more.
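The three points above can be sketched as a toy cache (hypothetical, not the DDS API): the "middleware" keeps the latest state of each keyed data object, readers observe the cache rather than individual messages, and a simplified HISTORY setting bounds how many samples per instance are retained.

```python
class DataCache:
    def __init__(self, history_depth=1):
        self.history_depth = history_depth  # simplified HISTORY QoS knob
        self.instances = {}                 # instance key -> recent samples

    def write(self, key, sample):
        samples = self.instances.setdefault(key, [])
        samples.append(sample)
        # Retain only the most recent samples, per the history setting.
        del samples[:-self.history_depth]

    def read(self, key):
        return list(self.instances.get(key, []))

cache = DataCache(history_depth=2)
cache.write("sensor-1", {"temp": 20})
cache.write("sensor-1", {"temp": 21})
cache.write("sensor-1", {"temp": 22})
print(cache.read("sensor-1"))  # [{'temp': 21}, {'temp': 22}]
```

Real DDS adds typed topics and many more QoS policies (reliability, ownership, durability), but the keyed-instance cache is the core of the data-centric model.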
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc... (Michael Rys)
When analyzing big data, you often have to process data at scale that is not rectangular in nature and you would like to scale out your existing programs and cognitive algorithms to analyze your data. To address this need and make it easy for the programmer to add her domain specific code, U-SQL includes a rich extensibility model that allows you to process any kind of data, ranging from CSV files over JSON and XML to image files and add your own custom operators. In this presentation, we will provide some examples on how to use U-SQL to process interesting data formats with custom extractors and functions, including JSON, images, use U-SQL’s cognitive library and finally show how U-SQL allows you to invoke custom code written in Python and R.
Slides for the SQL Saturday 635 presentation, Vancouver BC, August 2017.
Apache Doris (incubating) is an MPP-based interactive SQL data warehouse for reporting and analysis, open-sourced by Baidu. Doris integrates technology from Google Mesa and Apache Impala. Unlike other popular SQL-on-Hadoop systems, Doris is designed as a simple, tightly coupled system that does not depend on other systems. It provides not only high-concurrency, low-latency point-query performance but also high-throughput ad-hoc analytical queries, and it supports both batch data loading and near-real-time mini-batch loading. Doris also provides high availability, reliability, fault tolerance, and scalability. Its main features are simplicity (of development, deployment, and use) and the ability to meet many data-serving requirements in a single system.
NoSQL - MongoDB: agility, scalability, performance. This talk covers the basics of NoSQL and MongoDB. Why do some projects require an RDBMS and others a NoSQL database? What are the pros and cons of NoSQL vs. SQL? How is data stored and transferred in MongoDB? What query language is used? How does MongoDB support high availability and automatic failover through replication? What is sharding, and how does it help support scalability? The talk also covers the newest levels of concurrency: collection-level and document-level.
The document discusses Structured Query Language (SQL). It describes SQL as a declarative query language used to define database schemas, manipulate data through queries, and perform operations like insert, update, delete. It also outlines SQL's data definition language for defining database structure and data types, and its data manipulation language for conducting queries and CRUD operations. The document provides a brief history of SQL and describes the SQL standard.
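The DDL/DML split described above can be shown in a small runnable example using Python's built-in sqlite3 module (the table and names are illustrative, not from the document):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the database structure and data types.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# DML: insert, update, query, and delete.
conn.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))
conn.execute("INSERT INTO users (name) VALUES (?)", ("Grace",))
conn.execute("UPDATE users SET name = ? WHERE name = ?", ("Ada Lovelace", "Ada"))
rows = conn.execute("SELECT name FROM users ORDER BY id").fetchall()
print(rows)  # [('Ada Lovelace',), ('Grace',)]
conn.execute("DELETE FROM users WHERE name = ?", ("Grace",))
```

The declarative flavor is visible in the SELECT: you state which rows you want, and the engine decides how to retrieve them.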
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark (Vital.AI)
This document provides an overview of MetaQL, which allows composing queries across NoSQL, SQL, SPARQL, and Spark databases using a domain model. Key points include:
- MetaQL uses a domain model to define concepts and compose typed queries in code that can execute across different databases.
- This separates concerns and improves developer efficiency over managing schemas and databases separately.
- Examples demonstrate MetaQL queries in graph, path, select, and aggregation formats across SQL, NoSQL, and RDF implementations.
Utah Code Camp, Spring 2016. http://utahcodecamp.com In this presentation I describe modern C++, which assumes the features introduced in the C++11/14 standards. An overview of the new features is given, along with some idioms for modern C++ based on those features.
U-SQL - Azure Data Lake Analytics for Developers (Michael Rys)
This document introduces U-SQL, a language for big data analytics on Azure Data Lake Analytics. U-SQL unifies SQL with imperative coding, allowing users to process both structured and unstructured data at scale. It provides benefits of both declarative SQL and custom code through an expression-based programming model. U-SQL queries can span multiple data sources and users can extend its capabilities through C# user-defined functions, aggregates, and custom extractors/outputters. The document demonstrates core U-SQL concepts like queries, joins, window functions, and the metadata model, highlighting how U-SQL brings together SQL and custom code for scalable big data analytics.
MongoDB .local London 2019: Best Practices for Working with IoT and Time-seri... (MongoDB)
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
• Common components of an IoT solution
• The challenges involved with managing time-series data in IoT applications
• Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance
• How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
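The schema-design point above can be illustrated with a small sketch of two common time-series layouts, using plain Python dicts in place of MongoDB documents (the field names and bucket-per-hour choice are illustrative assumptions, not the talk's specific recommendation):

```python
from datetime import datetime

# Design 1: one document per reading (simple, but high per-document
# and index overhead at IoT ingest rates).
per_reading = {"device": "sensor-1",
               "ts": datetime(2019, 6, 1, 10, 15),
               "value": 21.5}

# Design 2: the "bucket" pattern - one document per device per hour,
# holding an array of readings, reducing document count and index size.
def bucket_key(device, ts):
    return (device, ts.replace(minute=0, second=0, microsecond=0))

buckets = {}
for device, ts, value in [
    ("sensor-1", datetime(2019, 6, 1, 10, 15), 21.5),
    ("sensor-1", datetime(2019, 6, 1, 10, 45), 22.0),
]:
    doc = buckets.setdefault(
        bucket_key(device, ts), {"device": device, "readings": []}
    )
    doc["readings"].append({"ts": ts, "value": value})

print(len(buckets))  # 1: both readings share the same device-hour bucket
```

Fewer, larger documents generally mean fewer index entries and better memory locality, which is exactly the memory/disk trade-off the talk examines.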
Application Development with Oracle NoSQL Database 3.0 (Anuj Sahni)
The document introduces table-based data modeling features for Oracle NoSQL Database. It discusses using tables to simplify application data modeling with familiar concepts like tables and data types. Examples show how to model user and email data using tables, including defining the schema using DDL, querying the data using DML, and indexing the tables. The document also provides an example of modeling user and email data from an email client application to illustrate how to approach data modeling.
MongoDB .local Houston 2019: Best Practices for Working with IoT and Time-ser... (MongoDB)
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
• Common components of an IoT solution
• The challenges involved with managing time-series data in IoT applications
• Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance
• How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
The document provides an overview of developing iOS applications including the required language (Objective-C), frameworks (Cocoa Touch), tools, and development process. It discusses setting up a Mac development environment, learning Objective-C syntax and concepts like classes, methods, properties, protocols, and the iOS application layers including Cocoa Touch.
Garbage collection has largely removed the need to think about memory management when you write Java code, but there is still a benefit to understanding and minimizing the memory usage of your applications, particularly with the growing number of deployments of Java on embedded devices. This session gives you insight into the memory used as you write Java code and provides you with guidance on steps you can take to minimize your memory usage and write more-memory-efficient code. It shows you how to
• Understand the memory usage of Java code
• Minimize the creation of new Java objects
• Use the right Java collections in your application
• Identify inefficiencies in your code and remove them
Video available from Parleys.com:
https://www.parleys.com/talk/how-write-memory-efficient-java-code
This document proposes a software architecture to address complex and dynamic data modeling challenges. The proposed solution has four main components: [1] An OSGi-based architecture for modularity, reusability and dynamic updates. [2] A graph database (Neo4j) to flexibly store relationships and enable natural queries. [3] A user interface built with AngularJS and D3.js for rich, data-driven visualization. [4] The use of a "mad developer" to implement the architecture. The architecture aims to reduce complexity, support dynamic data and provide a flexible yet user-friendly interface.
[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features (Andrew Liu)
Let's talk about how you can get the most out of Azure DocumentDB. In this session we will dive deep into the mechanics of DocumentDB and explain the various levers available to tune performance and scale. From partitioned collections to global databases to advanced indexing and query features - this session will equip you with the best practices and nuggets of information that will become invaluable tools in your toolbox for building blazingly fast large-scale applications.
Apache Arrow and Its Impact on the Database Industry, April 20, 2021 (Andrew Lamb)
The talk motivates why Apache Arrow and related projects (e.g. DataFusion) are a good choice for implementing modern analytic database systems. It reviews the major components of most databases, explains where Apache Arrow fits in, and describes the additional integration benefits of using Arrow.
Cloud Spanner is the first and only relational database service that is both strongly consistent and horizontally scalable. With Cloud Spanner you enjoy all the traditional benefits of a relational database: ACID transactions, relational schemas (and schema changes without downtime), SQL queries, high performance, and high availability. But unlike any other relational database service, Cloud Spanner scales horizontally, to hundreds or thousands of servers, so it can handle the highest of transactional workloads.
Introduction to Data Analytics with Pandas [PyCon CZ] (Alexander Hendorf)
Pandas is the Swiss multipurpose knife for data analysis in Python. With Pandas, data analysis is easy and simple, but there are some concepts you need to get your head around first, such as DataFrames and Series.
The talk will provide an introduction to Pandas for beginners and covers:
reading and writing data across multiple formats (CSV, Excel, JSON, SQL, HTML, …)
statistical data analysis and aggregation
working with built-in data visualisation
the inner mechanics of Pandas: DataFrames, Series, and NumPy
how to work effectively with Pandas
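The DataFrame/Series and aggregation basics listed above look like this in practice (a minimal example with made-up data, assuming pandas is installed):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Prague", "Prague", "Brno"],
    "temp": [21.0, 23.0, 19.0],
})

# A single column of a DataFrame is a Series, backed by a NumPy array.
temps = df["temp"]
print(temps.mean())  # 21.0

# Aggregation: average temperature per city.
by_city = df.groupby("city")["temp"].mean()
print(by_city["Brno"])  # 19.0
```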
The document discusses the idea of the universal graph - that everything can be modeled as a graph of vertices and edges. It proposes some open problems regarding how to model processes and qualia as part of the universal graph structure, and how one might manipulate the laws of physics by altering the processes that govern the evolution of the physical world graph.
Similar to mm-ADT: A Multi-Model Abstract Data Type
The Gremlin traversal machine is composed of three components: a graph, a traversal, and a set of traversers. Learn how these components interact to enable distributed, vendor-agnostic, OLTP/OLAP-based graph computing.
This talk was presented live at DataStax's Support Summit in Carmel, CA (April 2017) and Engineering Summit in Las Vegas, NV (May 2017).
This document summarizes the key concepts and components of Gremlin's graph traversal machinery:
- Gremlin uses a traversal language to express graph queries via step composition, with steps mapping traversers between domains.
- Traversals are compiled to bytecode and optimized by traversal strategies before being executed by the Gremlin machine.
- The Gremlin machine consists of steps implementing functions that process traverser streams. Their composition forms the traversal.
- Gremlin is language-agnostic, with language variants translating to a shared bytecode that interacts with the Java-based implementation.
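The "steps as functions over traverser streams" idea above can be sketched as generator composition in Python (a hedged toy model of the concept, not TinkerPop's implementation; the graph data is invented):

```python
graph = {"v1": {"name": "marko", "out": ["v2", "v3"]},
         "v2": {"name": "vadas", "out": []},
         "v3": {"name": "josh",  "out": []}}

def V(_):
    yield from graph                     # start step: emit all vertex ids

def out(stream):
    for v in stream:
        yield from graph[v]["out"]       # flat-map to adjacent vertices

def values(key):
    def step(stream):
        for v in stream:
            yield graph[v][key]          # map each vertex to a property value
    return step

def traversal(*steps):
    # A traversal is the left-to-right composition of its steps.
    def run(source):
        stream = source
        for step in steps:
            stream = step(stream)
        return list(stream)
    return run

# Equivalent in spirit to g.V().out().values("name")
print(traversal(V, out, values("name"))(None))  # ['vadas', 'josh']
```

Traversal strategies would rewrite the step list before execution, e.g. fusing or reordering steps, without changing the result.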
This presentation was given on January 17, 2016 at the GraphDay conference in Austin, Texas. The slides demonstrate the use of wave dynamics in graph structures. Moreover, they demonstrate how to implement quantum processes on graph structures.
There is an associated article available at http://arxiv.org/abs/1511.06278 (Quantum Walks with Gremlin).
A presentation of Apache TinkerPop's Gremlin language with running examples over the MovieLens dataset. Presented August 19, 2015 at NoSQL NOW in San Jose, California.
This document discusses graph-based computing and traversal using Gremlin and Titan. It provides examples of querying a graph about relationships between characters in Greek mythology like Hercules. Traversal operations are demonstrated to find other characters Hercules may know or which actor played him in a movie. The value of graph analysis for insights and recommendations is also discussed.
Who am I and why do I feel that the world is not infinitely perfect? Which technologies should I use to rectify this situation? Enter the graph and the graph traversal.
The document discusses graphs and graph databases. It introduces the concept of property graphs and how they can intuitively model complex relationships between entities. It discusses how graph traversal enables expressive querying and numerous analyses of graph data. The document uses examples involving Greek mythology to illustrate graph concepts and traversal queries.
There is nothing more fascinating and utterly mind-bending than traversing a graph. Those who succumb to this data processing pattern euphorically suffer from graph pathology.
This is a case study of the Graph Addict.
Gremlin is a graph traversal language that connects to various graph databases/frameworks.
* Neo4j [http://neo4j.org]
* OrientDB [http://orientechnologies.com]
* DEX [http://www.sparsity-technologies.com/dex]
* OpenRDF Sail [http://openrdf.org]
* JUNG [http://jung.sourceforge.net]
This lecture addresses the state of Gremlin as of version 0.9 (April 16, 2011).
This tutorial/lecture addresses various aspects of the graph traversal language Gremlin. In particular, the presentation focuses on Gremlin 0.7 and its application to graph analysis and manipulation.
Memoirs of a Graph Addict: Despair to Redemption (Marko Rodriguez)
This document summarizes a lecture about graph databases and graph structures. It discusses graph databases as an alternative to relational databases that allows for direct linking of objects without joins. It then describes the speaker's 10 years of experience working with graph structures and applications. Finally, it outlines the TinkerPop product suite for working with graph databases.
I is the identity matrix.
The Multi-Relational Path Algebra:
- Allows single-relational algorithms to be applied to multi-relational graphs
- Provides a universal framework for defining paths through a multi-relational graph
- Enables the computation of multiple primary eigenvectors, each corresponding to a different path definition
- In effect, provides multiple definitions of centrality for a multi-relational network
- Is Turing complete—any computable path can be expressed
- Is a general framework—applies to any multi-relational data model
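The path-algebra idea above can be sketched in a few lines of NumPy (my own toy illustration, not from the talk): each relation type gets its own adjacency matrix, and matrix products define multi-relational paths whose entries count typed walks.

```python
import numpy as np

# Toy 3-vertex multi-relational graph: one adjacency matrix per edge type.
friend = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 0]])
coauthor = np.array([[0, 0, 0],
                     [0, 0, 1],
                     [1, 0, 0]])

# Path definition "friend then coauthor": the matrix product counts
# the number of such 2-step typed paths between every vertex pair.
friend_coauthor = friend @ coauthor
print(friend_coauthor)
# Entry (i, j) is the number of friend->coauthor paths from i to j.
```

An eigenvector of such a derived path matrix then yields a path-specific notion of centrality, which is the sense in which the algebra provides "multiple definitions of centrality."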
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco... (Marko Rodriguez)
A graph is a data structure that links a set of vertices by a set of edges. Modern graph databases support multi-relational graph structures, where there exist different types of vertices (e.g. people, places, items) and different types of edges (e.g. friend, lives at, purchased). By means of index-free adjacency, graph databases are optimized for graph traversals and are interacted with through a graph traversal engine. A graph traversal is defined as an abstract path whose instance is realized on a graph dataset. Graph databases and traversals can be used for searching, scoring, ranking, and in concert, recommendation. This presentation will explore graph structures, algorithms, traversal algebras, graph-related software suites, and a host of examples demonstrating how to solve real-world problems, in real-time, with graphs. This is a whirlwind tour of the theory and application of graphs.
A Perspective on Graph Theory and Network Science (Marko Rodriguez)
The graph/network domain has been driven by the creativity of numerous individuals from disparate areas of the academic and the commercial sector. Examples of contributing academic disciplines include mathematics, physics, sociology, and computer science. Given the interdisciplinary nature of the domain, it is difficult for any single individual to objectively realize and speak about the space as a whole. Any presentation of the ideas is ultimately biased by the formal training and expertise of the individual. For this reason, I will simply present on the domain from my perspective---from my personal experiences. More specifically, from my perspective biased by cognitive and computer science.
This is an autobiographical lecture on my life (so far) with graphs/networks.
A graph is a structure composed of a set of vertices (i.e.~nodes, dots) connected to one another by a set of edges (i.e.~links, lines). The concept of a graph has been around since the late 19th century, however, only in recent decades has there been a strong resurgence in the development of both graph theories and applications. In applied computing, since the late 1960s, the interlinked table structure of the relational database has been the predominant information storage and retrieval paradigm. With the growth of graph/network-based data and the need to efficiently process such data, new data management systems have been developed. In contrast to the index-intensive, set-theoretic operations of relational databases, graph databases make use of index-free traversals. This presentation will discuss the graph traversal programming pattern and its application to problem-solving with graph databases.
The document discusses network data structures and semantic networks. It provides examples of undirected, directed, and semantic networks. It describes how PageRank can be applied to semantic networks using grammar-based random walkers that follow the relationships between nodes defined in an ontology. It also lists related publications by the author on modeling systems and computations as semantic networks.
The document summarizes a two-year project to develop an ontology and data model to represent scholarly works and their usage. It will analyze bibliographic data and usage data from sources like journals, papers, and online usage logs to develop metrics to quantify the scholarly community. The first year will focus on developing the ontology and algorithms while the second year will analyze the results and report findings.
mm-ADT: A Multi-Model Abstract Data Type
1. mm-ADT
A Multi-Model Abstract Datatype
Dr. Marko A. Rodriguez
Founder, RReduX Inc.
Project Management Committee, Apache TinkerPop
Rodriguez, M.A., “mm-ADT: A Multi-Model Abstract Datatype,” ApacheCon 2019, Las Vegas, NV, September 2019.
Funded by: RReduX
Collaborators
Daniel Kuppitz and Stephen Mallette
2. WARNING
This Slide Presentation Should Not Be Used as a Reference
As of September 2019, there exists a rough draft of the mm-ADT specification (0.1-alpha) and a Java-based prototype of the mm-ADT-bc compiler. The specifics presented here are likely to change as the project matures.
The goal is to release the following artifacts by early 2020.
1.) A stable 1.0 mm-ADT specification document.
2.) A Java-based mm-ADT-bc compiler and virtual machine.
3.) A basic reference implementation of an mm-ADT database.
5. Universal Database Components
UNIVERSAL DATA STRUCTURE (80% solid)
UNIVERSAL PROCESSING MODEL (90% solid)
UNIVERSAL INSTRUCTION SET (60% solid)
as of Sept, 2019
The presentation is primarily on the universal data structure.
10. mm-ADT Compliant
The Benefit to Component Providers

QUERY LANGUAGE PROVIDERS: Your query language's queries can execute real-time, near-time, or batch-time over any database.
PROCESSING ENGINE PROVIDERS: Your processing engine can be programmed by any query language and compute over any database.
STORAGE SYSTEM PROVIDERS: Your database can be processed by many processors whose queries are defined by many different query languages.

Shared benefits: multi-model, programmability, a larger user base.
11. mm-ADT
A Multi-Model Abstract Datatype

Any Data Structure: KEY/VALUE, JSON/XML DOCUMENT, PROPERTY/RDF GRAPH, RELATIONAL, WIDE-COLUMN, QUANTUM, … (yes. any data structure.)

Any Language: SQL, SPARQL, CYPHER, GREMLIN, GRAPHQL, CQL, QUERY DOCUMENT, QUANTUM UNITARY MATRICES (yes. any data query language.)

Storage/Processing Agnosticism: RECORD LAYOUT, INDEX STRUCTURES, PARTITIONING, DENORMALIZATIONS, SORT ORDERS, …; PULL-BASED ITERATION, PUSH-BASED REACTIVE, MESSAGE PASSING, LINEAR SCAN/FILTER, LAZY/STRICT EVALUATION, QUANTUM WAVE INTERFERENCE (yes. any algorithm.)

Universal Operators: GET, PUT, FILTER, SELECT, PROJECT, ORDER, DEDUP, GROUP, COUNT, SUM, MEAN, COALESCE, REPEAT, BRANCH, ARITHMETIC (yes. addition. :))
12. mm-ADT Requirements
A Cluster-Oriented Bytecode and Virtual Machine

- A type system for capturing database models and schemas: datatypes, composite structures, constraints, foreign keys, cardinalities, dependencies.
- A database perspective w/ loose coupling for future use cases: indices, denormalizations, read/write costs, transactions, offloading computations, complex data access paths, multi-user, distributed data, data locality, real-time/batch executions.
- A bytecode foundation for use across software and languages: compatible with Java, C++, Go, etc.-based databases.
- Reasonable compilation strategies to/from: SQL, SPARQL, Cypher, Gremlin, GraphQL, QueryDocs, …future.
- A universal processing model able to capture various strategies: single-threaded, single-machine queries; distributed, cluster-oriented queries; lazy and space-efficient for main-memory queries (real-time); eager and time-efficient for disk-based queries (batch); Turing-complete for any known query/algorithm.
14. mm-ADT Bytecode Uses
model-ADT Definitions vs. model-ADT Queries

Created by the engineers developing an mm-ADT compliant storage system. Defines the storage system's abstract data type. A complex bytecode, but even for feature-rich storage systems, it's typically less than 100 lines of code. (The most complex model to date is 65 lines of code.)

Created by the storage system user. A subtype of the storage system's model-ADT. Best understood as the user's application's "schema." Can be generated by the storage system via its schema language.

Created by higher-level query languages. However, it can be written directly by a human user. A simple bytecode with a similar structure to the popular query language of the respective model-ADT.

[model,pg=>mm,
  [define,vertex,…]
  [define,edge,…]
  [define,db,…]]

(model definition supported by the storage system)

[model,social=>pg,
  [define,person,vertex&…]
  [define,likes,edge&…]
  [define,db,…]]

(sub-model definitions describing the user's domain of discourse)

[db][get,age]
[dedup]
[sum]

(instance data manipulation and analysis)

When compiling query bytecode, definitions are used for type inference/checking, query optimization, and cross-model embeddings.

The language used for all examples is the (somewhat) human readable/writable bytecode language called mm-ADT-bc.
17. Custom Datatypes
Type Definitions

[define,person,[name:@str,age:@int]]
(opcode: define; symbol: person; structure: [name:@str,age:@int])
@str
[is,[type][eq, str ]]: @obj => @str?
@int
[is,[type][eq, int ]]: @obj => @int?
@person
[is,[get,name][type][eq, str ]][is,[get,age][type][eq, int ]]: @obj => @person?
@inst
mm-ADT types are filter bytecode (predicates). If an object is unfiltered by the bytecode, then the object is that type.
[define,person,[name:[is,[type][eq, str ]],age:[is,[type][eq, int ]]]]
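The idea that a type is a filter predicate ("if an object is unfiltered by the bytecode, then the object is that type") can be sketched in plain Python — my own illustration, not the mm-ADT implementation:

```python
# A "type" is just a predicate over objects; an object is an instance of the
# type iff the predicate does not filter it out.
is_str = lambda obj: isinstance(obj, str)   # plays the role of @str
is_int = lambda obj: isinstance(obj, int)   # plays the role of @int

# @person: [is,[get,name][type][eq,str]][is,[get,age][type][eq,int]]
def is_person(obj):
    return (isinstance(obj, dict)
            and is_str(obj.get("name"))
            and is_int(obj.get("age")))

print(is_person({"name": "marko", "age": 29}))    # True
print(is_person({"name": "marko", "age": "29"}))  # False: age fails the @int filter
```

Note how @person composes the @str and @int predicates, just as the expanded [define,person,…] bytecode above composes the underlying [is,…] filters.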
18. Custom Datatypes
Type Refinement

[define,person,[name:@str,age:@int>e(0)]]

[define,probability, @real>e(0.0)<e(1.0)]
[define,varchar255, @str&[is,[len][lte,255]]]
[define,short, @int>e(-32767)<e(32767)]
[define,pair, [@obj,@obj]]
[define,triple, [@obj,@obj,@obj]]

[define,years,@int>e(0)]
[define,person,[name:@str,age:@years]]

In the first definition, @int>e(0) is an anonymous type; @years is the equivalent named type. Anonymous types are the norm in mm-ADT.
20. Custom Datatypes
Type Quantifiers

[define,person,[name:@str,age:@int>e(0),friend:@person?]]

? : {0,1}
* : {0,infty}
+ : {1,infty}
{2} : {2,2}
{3,} : {3,infty}
{3,5} : 3 to 5

[define,none, @obj{0} ]
[define,some, @obj{1} ] // @obj
[define,maybe,@obj{0,1}] // @obj?

Quantifiers are any algebraic ring with unity: [mult][add][sub][zero][one]. Quantifiers are {min,max} intervals, e.g. @str?, @person?, @vertex?. All standard query languages are bound to the natural numbers with standard multiplication and addition definitions, e.g. {0,0} + {1,1} = {0,1}, i.e. @none | @some = @maybe.

Unitary Matrix Quantifiers: quantifiers can be matrices composed of complex numbers ([define,q,@unitary], @obj{[1,0,1,0]}). Constructive and destructive wave interference patterns can be incorporated into a query. This does not radically alter the bytecode, as quantifiers are fundamental to mm-ADT.

Real Number Quantifiers: quantifiers can be real numbers, for example values between 0.0 and 1.0 ([define,q,@real>e(0.0)<e(1.0)], @obj{0.92}). Real quantifiers can model energy diffusions, fuzzy set semantics, and probabilistic/stochastic behavior.
21. Custom Datatypes
Subtypes

[define,person,[name:@str,age:@int>e(0),friend:@person?]]
[define,loner,@person&[friend:@person{0}]]

loner <: person (loner is a subtype of the super type person).

Type composition:
[name:@str,age:@int>e(0),friend:@person?]&[friend:@person{0}]
=>
[name:@str,age:@int>e(0),friend:@person{0}]

Quantifier composition:
{x1,x2}&{y1,y2} => {x1*y1,x2*y2}
{x1,x2}|{y1,y2} => {min(x1,y1),max(x2,y2)}
e.g. {0,1}&{0,0} => {0,0} => {0}

Composition through & and | is defined for each type, with default definitions from the underlying fundamental type. Note that there are also axioms for the composition of a type's instruction rewrites, not discussed in this presentation.
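The quantifier composition rules can be sketched directly as a toy model over {min,max} interval quantifiers (my own illustrative Python, not mm-ADT):

```python
def q_and(x, y):
    # {x1,x2}&{y1,y2} => {x1*y1,x2*y2}
    return (x[0] * y[0], x[1] * y[1])

def q_or(x, y):
    # {x1,x2}|{y1,y2} => {min(x1,y1),max(x2,y2)}
    return (min(x[0], y[0]), max(x[1], y[1]))

# loner <: person — composing ? ({0,1}) with {0} ({0,0}) on the friend field:
print(q_and((0, 1), (0, 0)))  # (0, 0), i.e. {0}

# @none | @some = @maybe:
print(q_or((0, 0), (1, 1)))   # (0, 1), i.e. @obj?
```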
23. mm-ADT Datatypes
The General Structure of a Type

A{x,y} <= [i]*
  -> [f]+ => [j]*
  -> [g]+ => [k]*
  -> [h]+ => [l]*

The components of a type: the type symbol (A), the type quantifier ({x,y}), the mapsFrom token (<=), the instance access ([i]*), and the type instructions, each made of an instruction pattern ([f]+), a mapsTo token (=>), and an instruction rewrite ([j]*). The type is represented in mm-ADT-bc syntax. All mm-ADT bytecode generating languages leverage these components.

Inspired by OOP, where methods/instructions are grouped by domain class/type. Types with instructions denote function branches: the specs f : A → B (with def f(a) = b, domain A, range B) and g : A → C correspond to

A <= [i]
  -> [f] => [B <= [j]]
  -> [g] => [C <= [k]]

i.e. A -> [f] => B and A -> [g] => C.

Paths from type to type via instructions denote function composition: A -> [f] => B -> [h] => D is f · h : A → D.
24. mm-ADT Datatypes
Instance Data Access Path

[define,person,
  [name:@str~x,age:@int] <= [db][get,people][is,[get,name][eq,~x]]]

The A <= […] clause is the type's instance access (canonical representation). mm-ADT is a cluster-oriented programming language where data access has different costs depending on locality.

@person&[name:marko,age:29]   all constant values (an instance)
@person&[age:29]              name unbound: a type reference to "people 29 years of age"
@person&[name:marko]          accessible via <=
25. mm-ADT Datatypes
Instruction Rewrites

[define,person,[name:@str,age:@int]]
[define,db,person*
  -> [order,[gt,[get,age]]] =>
  -> [dedup,[get,name]] =>
  -> [count] => [ref,@int <= [db][count]]
  -> [is,[get,name][eq,@str~x]] => [ref,@person? <= [db][is,[get,name][eq,~x]]]

The empty rewrites are no-ops; [count] rewrites to an aggregate; [is,[get,name][eq,…]] rewrites to an index lookup. All [ref] instructions are resolved by the storage system or processing engine and are used to access secondary structures.

Submitted bytecode (O(n) linear scan) is rewritten to an O(log n) index query:
[db][is,[get,name][eq,marko]] => [ref,@person? <= [db][is,[get,name][eq,marko]]]
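A minimal sketch of what such a rewrite buys at runtime (hypothetical Python, not mm-ADT): the submitted bytecode scans the primary structure linearly, while the rewritten [ref] resolves against a secondary structure.

```python
# Hypothetical storage: a primary structure plus a secondary name index.
people = [{"name": "marko", "age": 29}, {"name": "kuppitz", "age": 33}]
name_index = {p["name"]: p for p in people}  # unique-name index

def linear_scan(name):
    # Submitted bytecode: [db][is,[get,name][eq,name]] -- O(n) scan.
    return [p for p in people if p["name"] == name]

def index_lookup(name):
    # Rewritten bytecode: [ref,@person? <= [db][is,...]] -- index access.
    p = name_index.get(name)
    return [p] if p is not None else []

# Same answer, different access path and cost.
print(linear_scan("marko") == index_lookup("marko"))  # True
```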
26. Relational model-ADT
Primary and Secondary Structures

Primary Structure: the structure associated with the database's conceptual model.
Secondary Structures: the auxiliary structures used to improve query performance and ensure data integrity.

(Diagram: a relational schema with str, bool, int, and int columns, surrounded by its secondary structures: an index, a denormalization, a sort order, unique checks, and aggregates.)
28. CREATE TABLE people (
name varchar(255),
age int
)
[define,person,[name:@str?,age:@int?]]
[define,persons,@person*]
[define,db,[people:@persons]]
Relational model-ADT
Primary Structure (Subtype)
29. Relational model-ADT
Secondary Structures (Primary Key)
[define,person,[name:@str,age:@int?]]
[define,persons,@person*
-> [dedup,[get,name]] => ]
[define,db,[people:@persons]]
CREATE TABLE people (
name varchar(255),
age int,
PRIMARY KEY(name)
)
30. Relational model-ADT
Secondary Structures (Sort Orders)
[define,person,[name:@str,age:@int?]]
[define,persons,@person*
-> [dedup,[get,name]] =>
-> [order,[gt,[get,name]]] => ]
[define,db,[people:@persons]]
CREATE TABLE people (
name varchar(255),
age int,
PRIMARY KEY(name,ASC)
)
35. [define,person,[name:@str~x,
age:@int>e(0),
friend:@person&[name:@str]] <= [db][get,people]
[is,[get,name]
[eq,~x]]
[define,persons,@person*
-> [dedup,[get,name]] =>
-> [order,[gt,[get,name]]] =>
-> [is,[get,name][eq,@str~x]] => [ref,@person&[name:~x]?
-> [is,[get,age][eq,@int~y]] => [ref,@person&[name:~x,age:~y]?]]
-> [is,[get,age][eq,@int~y]] => [ref,@person&[age:~y]*
-> [is,[get,name][eq,@str~x]] => [ref,@person&[name:~x,age:~y]?]]]
[define,db,[people:@persons]]
Relational model-ADT
Secondary Structures (Multi-Key Index)
CREATE TABLE people (
name varchar(255),
age int NOT NULL,
friend varchar(255),
PRIMARY KEY(name,ASC),
CHECK (age >= 0),
FOREIGN KEY (friend) REFERENCES people(name),
CREATE UNIQUE INDEX name_idx ON people(name),
CREATE INDEX name_age_idx ON people(name,age)
)
36. [define,person,[name:@str~x,
age:@int>e(0),
friend:@person&[name:@str]] <= [db][get,people]
[is,[get,name]
[eq,~x]]
[define,persons,@person*
-> [dedup,[get,name]] =>
-> [order,[gt,[get,name]]] =>
-> [is,[get,name][eq,@str~x]] => [ref,@person&[name:~x]?
-> [is,[get,age][eq,@int~y]] => [ref,@person&[name:~x,age:~y]?]]
-> [is,[get,age][eq,@int~y]] => [ref,@person&[age:~y]*
-> [is,[get,name][eq,@str~x]] => [ref,@person&[name:~x,age:~y]?]]
-> [count] => [ref,@int <= [db][get,people]
[count]]]
[define,db,[people:@persons]]
Relational model-ADT
Secondary Structures (Aggregates)
CREATE TABLE people (
name varchar(255),
age int NOT NULL,
friend varchar(255),
PRIMARY KEY(name,ASC),
CHECK (age >= 0),
FOREIGN KEY (friend) REFERENCES people(name),
CREATE UNIQUE INDEX name_idx ON people(name),
CREATE INDEX name_age_idx ON people(name,age)
)
37. [define,person,[name:@str~x,
age:@int>e(0),
friend:@person&[name:@str]] <= [db][get,people]
[is,[get,name]
[eq,~x]
-> [get,friend][get,friend] => ]
[define,persons,@person*
-> [dedup,[get,name]] =>
-> [order,[gt,[get,name]]] =>
-> [is,[get,name][eq,@str~x]] => [ref,@person&[name:~x]?
-> [is,[get,age][eq,@int~y]] => [ref,@person&[name:~x,age:~y]?]]
-> [is,[get,age][eq,@int~y]] => [ref,@person&[age:~y]*
-> [is,[get,name][eq,@str~x]] => [ref,@person&[name:~x,age:~y]?]]
-> [count] => [ref,@int <= [db][get,people]
[count]]]
[define,db,[people:@persons]]
Relational model-ADT
Secondary Structures (Domain Logic)
CREATE TABLE people (
name varchar(255),
age int NOT NULL,
friend varchar(255),
PRIMARY KEY(name,ASC),
CHECK (age >= 0),
FOREIGN KEY (friend) REFERENCES people(name),
CREATE UNIQUE INDEX name_idx ON people(name),
CREATE INDEX name_age_idx ON people(name,age)
)
Domain logic: friends are pair bonded, so the rewrite -> [get,friend][get,friend] => is a no-op. It's a weird example, but we got stuck with such a simple model.
38. model-ADT Data Access Graph
Primary and Secondary Structures Specify Data Access Paths

Secondary structures "teleport" the processor from one location in the primary structure to another: a reference jumps into a secondary structure and a dereference returns to the primary structure.
39. Data Access Path
Primary Structure Only

CREATE TABLE people (
  name varchar(255) NOT NULL,
  age int NOT NULL
)

[define,person,[name:@str,age:@int]]
[define,persons,@person*]
[define,db,[people:@persons]]

How many unique names are there for people over 21 years of age?

[db]                     initial
[get,people]             flatmap
[is,[get,age][gt,21]]    filter
[get,name]               map
[dedup]                  barrier
[count]                  reduce

With only the primary structure available, every @person is streamed through every step: time is spent dereferencing the primary structure.
40. Data Access Path
Primary and Secondary Structures

CREATE TABLE people (
  name varchar(255),
  age int NOT NULL,
  PRIMARY KEY(name),
  CREATE INDEX idx ON people(age)
)

[define,person,[name:@str,age:@int]]
[define,persons,@person*
  -> [dedup,[get,name]] =>
  -> [is,[get,age][@rel~x,@int~y]] => [ref,@person[age:~x(~y)]*
    -> [get,name] => [ref,@str* <= […]
      -> [dedup] =>
      -> [count] => [ref,@int <= […]]]]]
[define,db,[people:@persons]]

[db][get,people]         initial
[is,[get,age][gt,21]]    index query: [ref,@person[age:gt(21)]*]
[get,name]               names in the index
[dedup]                  no-op: names are unique
[count]                  index hit count

Runtime instruction rewrites turn the linear scan into an index query, so time is spent in the secondary structures (reference/dereference) rather than scanning the primary structure.
41. Common model-ADTs
Primary Structure
KEY-VALUE STORE
[model,kv=>mm,
[define,k,@num|@str]
[define,v,@obj]
[define,kv,[@k,@v]]
[define,db,@kv*]]
PROPERTY GRAPH DATABASE
[model,pg=>mm,
[define,properties,[(@str:@str|@num|@bool)*]]
[define,element,@properties&[id:@obj,label:@str]]
[define,edge,@element&[outV:@vertex,inV:@vertex]]
[define,vertex,@element&[inE:@edge*,outE:@edge*]]
[define,db,@vertex*]]
RELATIONAL DATABASE
[model,rdb=>mm,
[define,value,@bool|@num|@str]
[define,row,[(@str:@value)*]]
[define,table,@row*]
[define,db,[(@str:@table)*]]]
DOCUMENT DATABASE
[model,doc=>mm,
[define,dval,@bool|@num|@str|@dobj|@dlist]
[define,dlist,[(@dval)*]]
[define,dobj,[(@str:@dval)*]]
[define,doc,@dobj&[_id:@str]]
[define,collection,@doc*]
[define,db,[(@str:@collection)*]]]
A model-ADT is an abstract datatype modeled/embedded in mm-ADT. The [model] opcode declares a model's domain of discourse.
42. [model, rdb=>mm, [define,…][define,…][define,…]]
[model, pg=>rdb,[define,…][define,…][define,…]]
[model, doc=>rdb,[define,…][define,…][define,…]]
[model, kv=>rdb,[define,…][define,…][define,…]]
[model,quantum=>pg, [define,…][define,…][define,…]]
Multiple model-ADTs
Models, Embeddings, and Queries
(diagram: rdb, pg, doc, and kv each embedded in mm — the most expressive model)
The storage system implements mm-ADT objects for its supported models; one model is aligned with the storage system. An X model may be embedded in a Y model, and all models must have an embedding path to mm-ADT.
property graph query (Gremlin=>pg)
[db][is,[get,label][eq,person]]
[get,outE]
[is,[get,label][eq,phones]]
[get,inV]
[get,home]
document query (MongoDB Query=>doc)
[db][get,people]
[get,phones]
[get,home]
key/value query (REST=>kv)
[db][get,people]
[get,phones]
[get,home]
relational query (SQL=>rdb)
[db][get,people]
[at,[db][get,phones],
[get,person_id][is,[eq,[get,id]]]]
[get,home]
A query language compiles to the most aligned model-ADT. The generated bytecode is translated to mm-ADT via rewrites.
SQL, Gremlin, MongoDB Query, REST API, and QQL (quantum) each compile to model-ADT query bytecode; model-ADT definition bytecode rewrites it into mm-ADT bytecode (mm-ADT-bc).
X must be a subset of Y to be embedded. All embeddings are injective: X embedded in Y states that there exists a bijection from X to some subset of Y.
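Injectivity can be stated concretely: the embedding must never map two distinct X objects to the same Y object. A hedged sketch, with an invented kv-into-rdb embedding (one key/value pair becomes one relational row) and a finite-sample injectivity check:

```python
# Hedged sketch: embed a kv pair into a single-row relational representation.
# The table and column names are hypothetical, for illustration only.

def embed_kv_in_rdb(pair):
    key, value = pair
    # one kv pair -> one row of a two-column table
    return ("kv_table", (("key", key), ("value", value)))

def is_injective(embedding, sample):
    # distinct inputs must map to distinct outputs over the sample
    images = [embedding(x) for x in sample]
    return len(images) == len(set(images))

sample = [("a", 1), ("a", 2), ("b", 1)]
ok = is_injective(embed_kv_in_rdb, sample)
```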
45. [db] :@obj{0} => @db
[get,people] :@db => @person*
[is, :@person => @person?
[get,name] :@person => @str
[eq,marko]] :@str => @bool
[get,age] :@person => @int
(trace: [db] → [people:…]; [get,people] → [name:marko,…] [name:stephen,…] [name:kuppitz,…]; [is,[get,name][eq,marko]] → [name:marko,…] (marko ⇒ true); [get,age] → 29)
mm-ADT Bytecode
Query Execution
What are the ages of the people named marko?
Universal Computing via Streams
Universal computing via streams: real-time, near-time, batch-time; single machine, cluster-oriented; RAM-based, disk-based.
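Slide 45's typed trace can be mimicked by a toy stream evaluator in which each instruction is a map or filter over a stream of dictionaries. This is an illustrative reduction, not the mm-ADT virtual machine; the `DB` value and helper names are invented for the example.

```python
# Hedged sketch: evaluating [db][get,people][is,[get,name][eq,marko]][get,age]
# as a chain of Python generators over dictionaries.

DB = {"people": [
    {"name": "marko", "age": 29},
    {"name": "stephen", "age": 35},
    {"name": "kuppitz", "age": 28},
]}

def db(_stream):                      # [db]    : @obj{0} => @db
    yield DB

def get(key):                         # [get,k] : flatmaps when the value is a list
    def step(stream):
        for obj in stream:
            value = obj[key]
            if isinstance(value, list):
                yield from value
            else:
                yield value
    return step

def is_(predicate):                   # [is,bc] : @person => @person?
    def step(stream):
        return (obj for obj in stream if predicate(obj))
    return step

pipeline = [db, get("people"), is_(lambda p: p["name"] == "marko"), get("age")]

stream = iter(())
for instruction in pipeline:
    stream = instruction(stream)
ages = list(stream)
```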
46. Stream Ring Theory
An Algebraic Ring for Stream Computing
f*g “Multiplication” is instruction composition
f+g “Addition” is stream branching (clone/split)
-f “Negative” inverts the object quantifier
0 [_]{0}: @obj* => @obj{0}
1 [_]{1}: @obj* => @obj*
Axioms
f+(g+h) = (f+g)+h      (addition associativity)
f*(g*h) = (f*g)*h      (multiplication associativity)
f*(g+h) = (f*g)+(f*h)  (left distributive)
(f+h)*g = (f*g)+(h*g)  (right distributive)
f*0 = 0                (multiplicative zero)
f*1 = f                (multiplicative one)
f+0 = f                (additive zero)
f+1 = f+1              (additive one: no further simplification)
f-f = f+(-f) = 0       (negative)
The stream ring is the product of the quantifier ring and the instruction ring.
Turing Complete
https://zenodo.org/record/2565243
47. Stream Ring Theory
Addition, Multiplication, and Distribution
(diagrams: the streams a + b, a * b * c, and a * b * (c + (d*e)) * f)
(a+b)*c = (a*c)+(b*c)
a*(b+c) = (a*b)+(a*c)
@inst~a * @inst~b = ~a~b
@inst~a + @inst~b = [branch,~a,~b]
@inst~a{x,y} * @inst~b{w,z} = ~a~b{x*w,y*z}
@inst~a{x,y} + @inst~b{w,z} = [branch,~a,~b]{x+w,y+z}
Instruction and object quantifiers must be drawn from the same algebraic ring, as instruction quantifiers modulate object quantifiers.
48. Stream Ring Theory
Additive and Multiplicative Identities
(diagrams: a + 0 = a and a * 1 = a, where a(x) = y)
0 = [_]{0}  (filter)
1 = [_]{1}  (identity)
a + 0 = a
a * 1 = a

a + b - ab = x?
x(a + b - ab) = xa + xb + -x(ab):
0 + 0 + 0 = 0
x + 0 + 0 = x
0 + x + 0 = x
x + x + -x = x
x{0,1} => x?  (set union a ∪ b)

a + b = x{0,2}
x(a + b) = xa + xb:
0 + 0 = 0
0 + x = x
x + 0 = x
x + x = x{2}
x{0,2}  (multi-set union a b)
a: x => x?
b: x => x?
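The a + b - ab derivation can be checked mechanically: modeling each filter as a function from an object to quantifier 0 (dropped) or 1 (kept), the composite yields a {0,1} quantifier (a filter, i.e. set union), while plain a + b can yield quantifier 2 (multi-set union). A sketch with two invented example filters:

```python
# Hedged sketch: filters a and b map an object x to quantifier 0 or 1.
# a + b - ab keeps x iff at least one filter keeps it, never duplicating it
# (set union); a + b alone can emit x twice (multi-set union).

def compose_union(a, b):
    return lambda x: a(x) + b(x) - a(x) * b(x)   # quantifier in {0,1}

def compose_multiset(a, b):
    return lambda x: a(x) + b(x)                 # quantifier in {0,2}

over_21 = lambda p: 1 if p["age"] > 21 else 0
named_marko = lambda p: 1 if p["name"] == "marko" else 0

union = compose_union(over_21, named_marko)
multi = compose_multiset(over_21, named_marko)

marko = {"name": "marko", "age": 29}
results = (union(marko), multi(marko))
```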
49. Stream Ring Theory
Binomial and Multinomial Expansions
(diagram: (a+b)*(b+c) expands, via the FOIL method, into four parallel pipelines)
(a+b)*(b+c) = ab + ac + b² + bc
53. SELECT name FROM people WHERE age < 29
[db][get,people]
[is,[get,age][lt,29]]
[get,name]
g.V(1).out('created')
 .in('created')
 .hasId(neq(1))
 .groupCount().by('name')
[db][get,V]
[is,[get,id][eq,1]]
[get,outE]
[is,[get,label][eq,created]]
[get,inV]
[get,inE]
[is,[get,label][eq,created]]
[get,outV]
[is,[get,id][neq,1]]
[group,[get,name],[_],[count]]
person(name:marko){friends{name}}
[db][get,people]
[is,[get,name][eq,marko]]
[get,friends]
[get,name]
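The SQL example above can be mirrored by a toy compiler: a (projection, table, column, operator, literal) tuple becomes the rdb-model bytecode [db][get,table][is,[get,col][op,literal]][get,projection]. The function and its argument shape are invented for illustration; real mm-ADT compilers target a full bytecode specification.

```python
# Hedged sketch: compile SELECT proj FROM table WHERE col op literal
# into rdb-model bytecode, rendered as nested lists.

def compile_select(projection, table, where_col, op, literal):
    return [["db"],
            ["get", table],
            ["is", [["get", where_col], [op, literal]]],
            ["get", projection]]

bytecode = compile_select("name", "people", "age", "lt", 29)
# mirrors: [db][get,people][is,[get,age][lt,29]][get,name]
```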
Query Language Compilation
Generating model-ADT Bytecode
54. Query Language Compilation
Query Language and Storage System Decoupling
[model,doc=>mm] [model,rdb=>mm] [model,pg=>mm]
model-ADT embeddings use rewrite rules to encode one model-ADT within another. This enables query bytecode intended for one model-ADT to execute against another model-ADT.
Query languages compile to their respective model-ADT bytecode specification.
doc=>rdb
doc=>pg
rdb=>doc
rdb=>pg
pg=>doc
pg=>rdb
mm-ADT objects or language translation
55. mm-ADT Components
Storage System, Processing Engine, and Query Language
[model,wc=>mm,…]
[model,kv=>mm,…]
[model,pg=>mm,…]
An mm-ADT storage system publishes the model-ADTs it supports (and is optimized for). Within a particular model-ADT context, the database will produce respective mm-ADT objects for the processor to consume. For all unsupported models, a model-ADT embedding can be used to translate to a supported model-ADT. However, while semantically correct, the storage system might lack appropriate secondary structures.

An mm-ADT processing engine is able to accept arbitrary mm-ADT bytecode and generate an execution pipeline that can read/write mm-ADT objects to/from the underlying mm-ADT compliant storage system. mm-ADT processing engines are agnostic to the model-ADT bytecode encoding. They must faithfully implement every instruction in the mm-ADT instruction set.

An mm-ADT query language has a compiler to a specific model-ADT bytecode specification. The processing engine translates the language compiler's model-ADT bytecode into an execution pipeline. At runtime, the processor and storage system communicate via mm-ADT objects to ultimately yield the answer to the query.
[db][get,outE]
[get,inV]
[get,name]
66. Next Generation Models
mm-ADT is for cluster-oriented, general-purpose computing.
Marko A. Rodriguez
Daniel Kuppitz
Stephen Mallette
Patronage from RReduX and DataStax
Credits