Outrageous ideas for Graph Databases
Almost every graph database vendor raised money in 2021. I am glad they did, because they are going to need the money. Our current Graph Databases are terrible and need a lot of work. There, I said it. It's the ugly truth in our little niche industry. That's why, despite waiting for over a decade for the "Year of the Graph" to come, we still haven't set the world on fire. Graph databases can be painfully slow, they can't handle non-graph workloads, their APIs are clunky, and their query languages are either hard to learn or hard to scale. Most graph projects require expert shepherding to succeed. 80% of the work takes 20% of the time, but that last 20% takes forever. The graph database vendors optimize for new users, not grizzled veterans. They optimize for sales, not solutions. Come listen to a rant by an industry OG on where we could go from here if we took the time to listen to the users that haven't given up on us yet.
"The boom, the bust, the adjust and the unknown"
The industry around us changes at a faster pace than ever before.
This will force the different stakeholders to reevaluate their strategy and how they will decide to move forward.
#Zer0Con2024
Scaling ScyllaDB Storage Engine with State-of-Art CompactionScyllaDB
Log Structured Merge (LSM) tree storage engines are known for very fast writes. This LSM tree structure is used by ScyllaDB to immutable Sorted Strings Tables (SSTables) on disk. These fast writes come with a tradeoff in terms of read and space amplification. While compaction processes can help mitigate this, the RUM conjecture states that only two amplification factors can be optimized at the extent of a third. Learn how ScyllaDB leverages RUM conjecture and controller theory, to deliver a state-of-art LSM-tree compaction for its users.
URP? Excuse You! The Three Kafka Metrics You Need to KnowTodd Palino
What do you really know about how to monitor a Kafka cluster for problems? Is your most reliable monitoring your users telling you there’s something broken? Are you capturing more metrics than the actual data being produced? Sure, we all know how to monitor disk and network, but when it comes to the state of the brokers, many of us are still unsure of which metrics we should be watching, and what their patterns mean for the state of the cluster. Kafka has hundreds of measurements, from the high-level numbers that are often meaningless to the per-partition metrics that stack up by the thousands as our data grows.
We will thoroughly explore three key monitoring concepts in the broker, that will leave you an expert in identifying problems with the least amount of pain:
Under-replicated Partitions: The mother of all metrics
Request Latencies: Why your users complain
Thread pool utilization: How could 80% be a problem?
We will also discuss the necessity of availability monitoring and how to use it to get a true picture of what your users see, before they come beating down your door!
MyRocks is an open source LSM based MySQL database, created by Facebook. This slides introduce MyRocks overview and how we deployed at Facebook, as of 2017.
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...Databricks
Working with our customers, developers and partners around the world, it's clear DevOps has become increasingly critical to a team's success. Continuous integration (CI) and continuous delivery (CD) which is part of DevOps, embody a culture, set of operating principles, and collection of practices that enable application development teams to deliver code changes more frequently and reliably. In this session, we will cover how you can automate your entire process from code commit to production using CI/CD pipelines in Azure DevOps for Azure Databricks applications. Using CI/CD practices, you can simplify, speed and improve your cloud development to deliver features to your customers as soon as they're ready.
Amazon AWS basics needed to run a Cassandra Cluster in AWSJean-Paul Azar
There is a lot of advice on how to configure a Cassandra cluster on AWS. Not every configuration meets every use case.
Best way to know how to deploy Cassandra on AWS is to know the basics of AWS. Part 1: We start covering AWS (as it applies to Cassandra). Later we go into detail with AWS Cassandra specifics.
"The boom, the bust, the adjust and the unknown"
The industry around us changes at a faster pace than ever before.
This will force the different stakeholders to reevaluate their strategy and how they will decide to move forward.
#Zer0Con2024
Scaling ScyllaDB Storage Engine with State-of-Art CompactionScyllaDB
Log Structured Merge (LSM) tree storage engines are known for very fast writes. This LSM tree structure is used by ScyllaDB to immutable Sorted Strings Tables (SSTables) on disk. These fast writes come with a tradeoff in terms of read and space amplification. While compaction processes can help mitigate this, the RUM conjecture states that only two amplification factors can be optimized at the extent of a third. Learn how ScyllaDB leverages RUM conjecture and controller theory, to deliver a state-of-art LSM-tree compaction for its users.
URP? Excuse You! The Three Kafka Metrics You Need to KnowTodd Palino
What do you really know about how to monitor a Kafka cluster for problems? Is your most reliable monitoring your users telling you there’s something broken? Are you capturing more metrics than the actual data being produced? Sure, we all know how to monitor disk and network, but when it comes to the state of the brokers, many of us are still unsure of which metrics we should be watching, and what their patterns mean for the state of the cluster. Kafka has hundreds of measurements, from the high-level numbers that are often meaningless to the per-partition metrics that stack up by the thousands as our data grows.
We will thoroughly explore three key monitoring concepts in the broker, that will leave you an expert in identifying problems with the least amount of pain:
Under-replicated Partitions: The mother of all metrics
Request Latencies: Why your users complain
Thread pool utilization: How could 80% be a problem?
We will also discuss the necessity of availability monitoring and how to use it to get a true picture of what your users see, before they come beating down your door!
MyRocks is an open source LSM based MySQL database, created by Facebook. This slides introduce MyRocks overview and how we deployed at Facebook, as of 2017.
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...Databricks
Working with our customers, developers and partners around the world, it's clear DevOps has become increasingly critical to a team's success. Continuous integration (CI) and continuous delivery (CD) which is part of DevOps, embody a culture, set of operating principles, and collection of practices that enable application development teams to deliver code changes more frequently and reliably. In this session, we will cover how you can automate your entire process from code commit to production using CI/CD pipelines in Azure DevOps for Azure Databricks applications. Using CI/CD practices, you can simplify, speed and improve your cloud development to deliver features to your customers as soon as they're ready.
Amazon AWS basics needed to run a Cassandra Cluster in AWSJean-Paul Azar
There is a lot of advice on how to configure a Cassandra cluster on AWS. Not every configuration meets every use case.
Best way to know how to deploy Cassandra on AWS is to know the basics of AWS. Part 1: We start covering AWS (as it applies to Cassandra). Later we go into detail with AWS Cassandra specifics.
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
NoSQL includes a wide range of different database technologies and were developed as a result of surging volume of data stored. Relational databases are not capable of coping with this huge volume and faces agility challenges. This is where NoSQL databases have come in to play and are popular because of their features. The session covers the following topics to help you choose the right NoSQL databases:
Traditional databases
Challenges with traditional databases
CAP Theorem
NoSQL to the rescue
A BASE system
Choose the right NoSQL database
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
Piotr Sarna, Software Engineer at Turso
Understanding and tapping your driver’s performance potential.
Piotr Sarna discusses how to get the most out of a driver, particularly from the performance perspective, and select a driver that’s a good fit for your needs.
ScyllaDB recently launched our Scylla Cloud database as a service, which combines the speed and power of the Scylla NoSQL database with the ease of a fully managed cloud service. Scylla Cloud relieves your team of day-to-day cluster management so you can focus on creating modern, interactive applications that respond to queries in milliseconds.
Join us for an overview of Scylla Cloud, including a live demo of how to launch and connect to a cluster, how to create and query a table, and how to run a few operations, all in minutes.
Storing time series data with Apache CassandraPatrick McFadin
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself as a solid choice now you can learn how to do it. We'll look at possible data models and the the choices you have to be successful. Then, let's open the hood and learn about how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
Building Open Data Lakes on AWS with Debezium and Apache HudiGary Stafford
Build a simple open data lake on AWS using a combination of open-source software (OSS), including Red Hat’s Debezium, Apache Kafka, and Kafka Connect for change data capture (CDC), and Apache Hive, Apache Spark, Apache Hudi, and Hudi’s DeltaStreamer for managing our data lake. We will use fully-managed AWS services to host the open data lake components, including Amazon RDS, Amazon MKS, Amazon EKS, and EMR.
Link to the blog post and video: https://garystafford.medium.com/building-open-data-lakes-with-debezium-and-apache-hudi-c3370d3f86fb
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data. How can you prevent this from happening? Enter the modern data warehouse, which is able to handle and excel with these new trends. It handles all types of data (Hadoop), provides a way to easily interface with all these types of data (PolyBase), and can handle “big data” and provide fast queries. Is there one appliance that can support this modern data warehouse? Yes! It is the Analytics Platform System (APS) from Microsoft (formally called Parallel Data Warehouse or PDW) , which is a Massively Parallel Processing (MPP) appliance that has been recently updated (v2 AU1). In this session I will dig into the details of the modern data warehouse and APS. I will give an overview of the APS hardware and software architecture, identify what makes APS different, and demonstrate the increased performance. In addition I will discuss how Hadoop, HDInsight, and PolyBase fit into this new modern data warehouse.
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Denes, C++ Developer at ScyllaDB
What performance-minded engineers need to know.
Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals – specifically, what to look for if you need latency and/or throughput improvements.
Build the foundation for success with ScyllaDB
Ready to try out ScyllaDB and want to make sure you’re “doing it right?” We’ll help you get up and running, fast. Spend an hour with our architects for a crash course in what ScyllaDB is all about, the core concepts you need to know, and a step-by-step demonstration of how to get started.
During the live, interactive one-hour session, you will learn:
- Critical considerations for designing a NoSQL system and NoSQL data model
- The technology underlying ScyllaDB’s high performance, availability, and scalability – and best practices for taking advantage of it
- How to install, deploy and operate a full working ScyllaDB system, including multi-data center deployment, monitoring, and connecting an application to the ScyllaDB cluster
By the end of the session, you’ll have the knowledge and tools you need to get ScyllaDB running on your laptop, connect your application to it, and see what it’s like to use ScyllaDB for your specific use case.
Ceph is an open source project, which provides software-defined, unified storage solutions. Ceph is a distributed storage system which is massively scalable and high-performing without any single point of failure. From the roots, it has been designed to be highly scalable, up to exabyte level and beyond while running on general-purpose commodity hardware.
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
In this talk I will introduce you to a Docker container that provides you an easy way to do distributed graph processing using Apache Spark GraphX and a Neo4j graph database. You'll learn how to analyze big data graphs that are exported from Neo4j and consequently updated from the results of a Spark GraphX analysis. The types of analysis I will be talking about are PageRank, connected components, triangle counting, and community detection.
Database technologies have evolved to be able to store big data, but are largely inflexible. For complex graph data models stored in a relational database there may be tedious transformations and shuffling around of data to perform large scale analysis.
Fast and scalable analysis of big data has become a critical competitive advantage for companies. There are open source tools like Apache Hadoop and Apache Spark that are providing opportunities for companies to solve these big data problems in a scalable way. Platforms like these have become the foundation of the big data analysis movement.
Speakers
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
NoSQL includes a wide range of different database technologies and were developed as a result of surging volume of data stored. Relational databases are not capable of coping with this huge volume and faces agility challenges. This is where NoSQL databases have come in to play and are popular because of their features. The session covers the following topics to help you choose the right NoSQL databases:
Traditional databases
Challenges with traditional databases
CAP Theorem
NoSQL to the rescue
A BASE system
Choose the right NoSQL database
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
Piotr Sarna, Software Engineer at Turso
Understanding and tapping your driver’s performance potential.
Piotr Sarna discusses how to get the most out of a driver, particularly from the performance perspective, and select a driver that’s a good fit for your needs.
ScyllaDB recently launched our Scylla Cloud database as a service, which combines the speed and power of the Scylla NoSQL database with the ease of a fully managed cloud service. Scylla Cloud relieves your team of day-to-day cluster management so you can focus on creating modern, interactive applications that respond to queries in milliseconds.
Join us for an overview of Scylla Cloud, including a live demo of how to launch and connect to a cluster, how to create and query a table, and how to run a few operations, all in minutes.
Storing time series data with Apache CassandraPatrick McFadin
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself as a solid choice now you can learn how to do it. We'll look at possible data models and the the choices you have to be successful. Then, let's open the hood and learn about how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
Building Open Data Lakes on AWS with Debezium and Apache HudiGary Stafford
Build a simple open data lake on AWS using a combination of open-source software (OSS), including Red Hat’s Debezium, Apache Kafka, and Kafka Connect for change data capture (CDC), and Apache Hive, Apache Spark, Apache Hudi, and Hudi’s DeltaStreamer for managing our data lake. We will use fully-managed AWS services to host the open data lake components, including Amazon RDS, Amazon MKS, Amazon EKS, and EMR.
Link to the blog post and video: https://garystafford.medium.com/building-open-data-lakes-with-debezium-and-apache-hudi-c3370d3f86fb
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data. How can you prevent this from happening? Enter the modern data warehouse, which is able to handle and excel with these new trends. It handles all types of data (Hadoop), provides a way to easily interface with all these types of data (PolyBase), and can handle “big data” and provide fast queries. Is there one appliance that can support this modern data warehouse? Yes! It is the Analytics Platform System (APS) from Microsoft (formally called Parallel Data Warehouse or PDW) , which is a Massively Parallel Processing (MPP) appliance that has been recently updated (v2 AU1). In this session I will dig into the details of the modern data warehouse and APS. I will give an overview of the APS hardware and software architecture, identify what makes APS different, and demonstrate the increased performance. In addition I will discuss how Hadoop, HDInsight, and PolyBase fit into this new modern data warehouse.
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Denes, C++ Developer at ScyllaDB
What performance-minded engineers need to know.
Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals – specifically, what to look for if you need latency and/or throughput improvements.
Build the foundation for success with ScyllaDB
Ready to try out ScyllaDB and want to make sure you’re “doing it right?” We’ll help you get up and running, fast. Spend an hour with our architects for a crash course in what ScyllaDB is all about, the core concepts you need to know, and a step-by-step demonstration of how to get started.
During the live, interactive one-hour session, you will learn:
- Critical considerations for designing a NoSQL system and NoSQL data model
- The technology underlying ScyllaDB’s high performance, availability, and scalability – and best practices for taking advantage of it
- How to install, deploy and operate a full working ScyllaDB system, including multi-data center deployment, monitoring, and connecting an application to the ScyllaDB cluster
By the end of the session, you’ll have the knowledge and tools you need to get ScyllaDB running on your laptop, connect your application to it, and see what it’s like to use ScyllaDB for your specific use case.
Ceph is an open source project, which provides software-defined, unified storage solutions. Ceph is a distributed storage system which is massively scalable and high-performing without any single point of failure. From the roots, it has been designed to be highly scalable, up to exabyte level and beyond while running on general-purpose commodity hardware.
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
In this talk I will introduce you to a Docker container that provides you an easy way to do distributed graph processing using Apache Spark GraphX and a Neo4j graph database. You'll learn how to analyze big data graphs that are exported from Neo4j and consequently updated from the results of a Spark GraphX analysis. The types of analysis I will be talking about are PageRank, connected components, triangle counting, and community detection.
Database technologies have evolved to be able to store big data, but are largely inflexible. For complex graph data models stored in a relational database there may be tedious transformations and shuffling around of data to perform large scale analysis.
Fast and scalable analysis of big data has become a critical competitive advantage for companies. There are open source tools like Apache Hadoop and Apache Spark that are providing opportunities for companies to solve these big data problems in a scalable way. Platforms like these have become the foundation of the big data analysis movement.
Speakers
What does OOP stand for?
When Object Oriented Programming(OOP) is taught so extensively, do computer programmers, specifically within games development, realise what it's possibly doing to productivity and performance? I explain my own view from experience in personal projects and professional work.
This talk was given to the Edinburgh meet of IGDA Scotland, on 2011/07/27.
Rental Cars and Industrialized Learning to Rank with Sean DownesDatabricks
Data can be viewed as the exhaust of online activity. With the rise of cloud-based data platforms, barriers to data storage and transfer have crumbled. The demand for creative applications and learning from those datasets has accelerated. Rapid acceleration can quickly accrue disorder, and disorderly data design can turn the deepest data lake into an impenetrable swamp.
In this talk, I will discuss the evolution of the data science workflow at Expedia with a special emphasis on Learning to Rank problems. From the heroic early days of ad-hoc Spark exploration to our first production sort model on the cloud, we will explore the process of industrializing the workflow. Layered over our story, I will share some best practices and suggestions on how to keep your data productive, or even pull your organization out of the data swamp.
A production of software stacks is an important part of a healthy software ecosystem. This talk is about most advanced open technology for the software stacks creation and validation, provided by Apache BigTop (incubating). I am going to discuss the advantages of the project, challenges our project and community is facing, and future plans.
Presenter: Konstantin Boudnik, PhD
Linked Data: The Real Web 2.0 (from 2008)Uche Ogbuji
"Linking Open Data (LOD) is a community initiative moving the Web from the idea of separated documents to a wide information space of data. The key principles of LOD are that it is simple, readily adaptable by Web developers, and complements many other popular Web trends. Linked, open data is the real substance of Web 2.0, and not flashy AJAX effects. Learn how to make your data more widely used by making its components easier to discover, more valuable, and easier for people to reuse—in ways you might not anticipate."
business model, business model canvas, mission model, mission model canvas, customer development, lean launchpad, lean startup, stanford, startup, steve blank, entrepreneurship, I-Corps, Stanford
PyData Frankfurt - (Efficient) Data Exchange with "Foreign" EcosystemsUwe Korn
As a Data Scientist/Engineer in Python, we focus in our work to solve problems with large amounts of data but still stay in Python. This is where we are the most effective and feel comfortable. Libraries like Pandas and NumPy provide us with efficient interfaces to deal with this data while still getting optimal performance. The main problem appears when we have to deal with systems outside of our comfort ecosystem. We need to write cumbersome and mostly slow conversion code that ingests data from there into our pipeline until we can work efficiently. Using Apache Arrow and Parquet as base technologies, we get a set of tools that eases this interaction and also brings us a huge performance improvement. As part of the talk we will show a basic problem where we take data coming from a Java application through Python into using these tools.
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...PyData
Modern Data Science is enabling NASA's engineers uncover actionable information from our "dark" data coffers. From starting small to operating at scale, Rob will discuss applications in telemetry, workforce analytics and liberating data from the Mars Rovers. Tools include iPython, Pandas, Boto and more.
The need to process huge data is increasing day by day. Processing huge data involves compute, network and storage. In terms of Big Data, What it takes to innovate and what is innovation at the end? This talk provide high level details on the need of big data and capabilities of Mapr converged data platform.
Speaker: Vijaya Saradhi Uppaluri, Technical Director at MapR Technologies
Outrageous ideas for Graph Databases
Almost every graph database vendor raised money in 2021. I am glad they did, because they are going to need the money. Our current Graph Databases are terrible and need a lot of work. There I said it. It's the ugly truth in our little niche industry. That's why despite waiting for over a decade for the "Year of the Graph" to come we still haven't set the world on fire. Graph databases can be painfully slow, they can't handle non-graph workloads, their APIs are clunky, their query languages are either hard to learn or hard to scale. Most graph projects require expert shepherding to succeed. 80% of the work takes 20% of the time, but that last 20% takes forever. The graph database vendors optimize for new users, not grizzly veterans. They optimize for sales not solutions. Come listen to a Rant by an industry OG on where we could go from here if we took the time to listen to the users that haven't given up on us yet.
Los estafadores ahora están utilizando métodos más sofisticados y dinámicos con tarjetas de crédito, el blanqueo de dinero y otros tipos de fraude. El aprovechamiento de la tecnología gráfica le permitirá ver más allá de los puntos de datos individuales y descubrir patrones difíciles de detectar.
How to Make a Field invisible in Odoo 17Celine George
It is possible to hide or invisible some fields in odoo. Commonly using “invisible” attribute in the field definition to invisible the fields. This slide will show how to make a field invisible in odoo 17.
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
The Roman Empire A Historical Colossus.pdfkaushalkr1407
The Roman Empire, a vast and enduring power, stands as one of history's most remarkable civilizations, leaving an indelible imprint on the world. It emerged from the Roman Republic, transitioning into an imperial powerhouse under the leadership of Augustus Caesar in 27 BCE. This transformation marked the beginning of an era defined by unprecedented territorial expansion, architectural marvels, and profound cultural influence.
The empire's roots lie in the city of Rome, founded, according to legend, by Romulus in 753 BCE. Over centuries, Rome evolved from a small settlement to a formidable republic, characterized by a complex political system with elected officials and checks on power. However, internal strife, class conflicts, and military ambitions paved the way for the end of the Republic. Julius Caesar’s dictatorship and subsequent assassination in 44 BCE created a power vacuum, leading to a civil war. Octavian, later Augustus, emerged victorious, heralding the Roman Empire’s birth.
Under Augustus, the empire experienced the Pax Romana, a 200-year period of relative peace and stability. Augustus reformed the military, established efficient administrative systems, and initiated grand construction projects. The empire's borders expanded, encompassing territories from Britain to Egypt and from Spain to the Euphrates. Roman legions, renowned for their discipline and engineering prowess, secured and maintained these vast territories, building roads, fortifications, and cities that facilitated control and integration.
The Roman Empire’s society was hierarchical, with a rigid class system. At the top were the patricians, wealthy elites who held significant political power. Below them were the plebeians, free citizens with limited political influence, and the vast numbers of slaves who formed the backbone of the economy. The family unit was central, governed by the paterfamilias, the male head who held absolute authority.
Culturally, the Romans were eclectic, absorbing and adapting elements from the civilizations they encountered, particularly the Greeks. Roman art, literature, and philosophy reflected this synthesis, creating a rich cultural tapestry. Latin, the Roman language, became the lingua franca of the Western world, influencing numerous modern languages.
Roman architecture and engineering achievements were monumental. They perfected the arch, vault, and dome, constructing enduring structures like the Colosseum, Pantheon, and aqueducts. These engineering marvels not only showcased Roman ingenuity but also served practical purposes, from public entertainment to water supply.
Acetabularia Information For Class 9 .docxvaibhavrinwa19
Acetabularia acetabulum is a single-celled green alga that in its vegetative state is morphologically differentiated into a basal rhizoid and an axially elongated stalk, which bears whorls of branching hairs. The single diploid nucleus resides in the rhizoid.
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Biological screening of herbal drugs: Introduction and Need for
Phyto-Pharmacological Screening, New Strategies for evaluating
Natural Products, In vitro evaluation techniques for Antioxidants, Antimicrobial and Anticancer drugs. In vivo evaluation techniques
for Anti-inflammatory, Antiulcer, Anticancer, Wound healing, Antidiabetic, Hepatoprotective, Cardio protective, Diuretics and
Antifertility, Toxicity studies as per OECD guidelines
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...Levi Shapiro
Letter from the Congress of the United States regarding Anti-Semitism sent June 3rd to MIT President Sally Kornbluth, MIT Corp Chair, Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic
harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or
unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied
students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
Honest Reviews of Tim Han LMA Course Program.pptxtimhan337
Personal development courses are widely available today, with each one promising life-changing outcomes. Tim Han’s Life Mastery Achievers (LMA) Course has drawn a lot of interest. In addition to offering my frank assessment of Success Insider’s LMA Course, this piece examines the course’s effects via a variety of Tim Han LMA course reviews and Success Insider comments.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
1. Outrageous Ideas
Data Day Texas - January 28, 2023
For Graph Databases
Welcome to this talk on Outrageous Ideas for Graph Databases.
2. @maxdemarzi
maxdemarzi.com
GitHub.com/maxdemarzi
Max De Marzi
My name is Max De Marzi. Follow me on Twitter at maxdemarzi, check out my blog at maxdemarzi.com, or read my bad code on github.com slash maxdemarzi. I've spent the last part of my career teaching people about graph databases.
3. In fact, if you go to the earliest blog post and check out the date, you'll see… January 2012.
4. Ten Years
In the Graph game.
That’s 10 years, in the graph game telling people about graphs.
5. But I am not an ivory tower academic, writing about stuff I don't actually have first-hand experience with.
6. I work the field. I write code. I get my hands dirty and I am the one getting yelled at when the stuff doesn't work.
7. But I’m not here to talk about me, I’m here to talk about Graphs… and that ladies, gentlemen and everyone else, is called a chart not a graph. That chart tells us that
Graph Databases have grown in popularity more than the other categories.
8. 1.8%
However, even after 10 years, we haven't broken past 2%. Look at Document Stores, which broke into double digits, and we can't break 2 percent.
12. Ideas are Wrong
• Too Many Back-ends (aka Tinkerpop is wrong)
• No lessons applied from Relational Databases
• API is incomplete (bulk)
• Query Languages are Incompetent
- Peter Boncz 2018
He started off saying that the ideas were wrong. Tinkerpop as a front end has too many back-end systems (we'll get to that). That we learned nothing from relational databases. That we provided an incomplete API, specifically APIs to do bulk operations. And then he said the query languages were incompetent.
13. Implementation is Wrong
• Nodes as Objects sucks
• No internal algebras
• Incompetent Query Optimizers
• Incompetent Query Executors
• Incompetent Engineering
• Incompetent Engineers (allegedly)
- Peter Boncz 2018
It wasn't just the ideas he was criticizing, it was the implementations too. Representing nodes as full-on Java Objects takes way too much memory: no compression, no way to do fast scans, no internal algebras, incompetent query optimizers, incompetent query executors, incompetent engineering. The word of that day was incompetence… and yes, I added that last one, because he might as well have said it at this point. He couldn't hurt anyone's feelings any harder.
14. Last year, Peter Boncz was back to talk about: The Sorry State of Graph Database Systems. Had we learned nothing in 4 years?
18. 1. Data that should be accessed together is all over the place. You see it in Triple Stores and in the way Schema-less Graphs store property chains of Nodes and
Relationships.
19. 2. Too many joins. You have to go chase this data down, which means your query planner has to work very hard instead of just scanning a record.
20. 3. Triple stores have no concept of Objects so the query optimizer treats each property independently
21. 4. Graph Databases should stop being special little snowflakes and be more like Relational Databases.
22. 5. Graph Databases built on Key Value stores can’t do bulk operations and the API overhead will kill you.
23. 6. The query languages are a trap. If the optimizer can't do it, you'll have to forget the query language. This is how Cypher betrays you.
25. It is at this point that we stand at a fork in the road ahead of us. There are two directions in which we could go. We could explore some of Peter’s suggestions. But then
this talk would be called something like….
26. Completely Sensible and
Utterly Boring Ideas
Data Day Texas - January 28, 2023
For Graph Databases
Completely Sensible and Utterly Boring Ideas for Graph Databases. But it’s not.
27. Outrageous Ideas
Data Day Texas - January 28, 2023
For Graph Databases
It’s called outrageous ideas, so let’s get on with it.
31. We’re going to 1969. October 1969. So we’ll catch a ride with Bill and Ted instead.
32. We are going back to our roots. The Codasyl Model. The Network Model… and in keeping with our theme of 1969 we are dropping acid.
33. Drop ACID
Idea One
First idea, is to drop ACID because in almost all use cases, we are NOT the primary database.
34. We are the Robin to the Batman. We are a sidekick.
35. We are the Emotional Support Database. We help keep it together, but we are not the primary database of record.
36. We are the Mini-Me to the Dr. Evil. We complete them, and as much as we may try to look like them, we aren’t.
37. Vendor: I bet they are thinking about buying a Graph Database
Customer: Why did someone take a photo of us trying to sleep?
No Customer lies in bed at night thinking about buying a graph database. Let’s face it. They already have a database. But it can’t satisfy all their needs. They already tried
some kinky solutions like denormalizing data, adding materialized views, and clustered indexes, but it didn’t do the trick and now they need something new to spice
things up. But we’re there to help, not take over.
46. Let’s talk about A1, this is a 2020 paper about a distributed in memory graph database from Microsoft. I’ll skip the details and jump right into the performance testing for
which they went all out. They built a cluster of 245 Machines with Intel E5-2673 processors. I had to look that one up.
49. 12 Cores x 2 x 245 Servers = 5,880 Cores
So 5,880 cores. Almost 6,000 cores on this cluster. This is the ultimate dream for a lot of people. A massively distributed in-memory graph database. Can you imagine what kind of performance they got? Well, you don't have to imagine, because the paper tells us.
50. 2 hop Query
They performed a two-hop query: start with Steven Spielberg, go to the movies he directed and then to the actors who were in those movies, and get a count. They managed 20,000 queries per second.
53. They distributed the nodes randomly across the cluster. Can you imagine? Every single time they traverse a relationship they have to take a network hit. My mind is blown, hope yours is too.
54. Distribute On Cores
not on Servers
Idea Two
So idea number two. Distribute on Cores, and not on Servers.
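As a rough illustration of the idea (a minimal sketch, not the actual partitioning scheme of any particular product), the shard a node lives on can simply be a function of its id, so a traversal step becomes a method call or an in-box message to another core rather than a network hop to another server:
-- Minimal sketch in Lua, assuming a box with 64 cores and one shard pinned per core.
local CORES = 64
-- Hypothetical shard assignment: the owning shard is derived from the node id itself,
-- so finding a node's owner never requires asking another machine.
local function shard_for(node_id)
  return node_id % CORES
end
-- Crossing shards is message passing between cores inside one server,
-- not a round trip across the data center network.
print(shard_for(12345)) --> 57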
55. Why are we here? It’s the Big Question. We aren’t here to have an existential crisis. I’m talking about why are you here at this tech conference? I’ll tell you why.
56. To prepare for the future. To do that, we have to answer one simple question:
57. Before the future comes the present and today Intel Xeon processors have up to 60 cores.
58. Who knows how many cores they will have in the future?
59. The internet knows. Late this year we get 64 cores, in 2024 we're getting 128 cores, and soon thereafter at least 344 cores, with a potential for 512 or 528 cores according to internal leaks at Intel. https://www.youtube.com/watch?v=h20inMLeDnE
64. 64 cores in the cloud.
Hold on you say. You need big RAM to feed all these cores?
65. Take a look at this beauty. Oh not this kind of RAM?
66. 4TB
Computer RAM, ok. How about 4TB today on a single socket? Is your graph bigger than 4TB?
Tomorrow that will be 8TB and before you know it 32 and 64TB.
68. …and that’s not all. Much like SANs today can let you use a scalable shared pool of hard drive space across a network, CXL technology will let you use a scalable shared
pool of memory across a network.
69. If you want to learn more, watch this presentation from Gustavo Alonso.
https://www.youtube.com/watch?v=KekKAKI0Aho
70. At Google, 90% of all analytics workloads operate on less than 1 TB of data.
Dr. Hannes Mühleisen, creator of DuckDB, reminding us that at Google, 90% of all analytics workloads operate on less than 1 terabyte of data.
71. Does your data fit in a single server today? Will it fit in a single server tomorrow?
73. You don’t have a single gremlin, you have many of them. The Groovy one, the Python one, the Ruby one, the Scala one, the Rust one, they all look similar but they aren’t
the same.
75. Tinkerpop Standard?
Around 100 vendor-dependent features
Do they allow Lambdas?
What kind of Indexing?
But is it the Standard? No way. Each vendor sets which combination of 100 features they support, along with a bunch of other differences amongst them, like allowing lambdas and the indexing behind the scenes. This is what Peter was complaining about earlier. What I know is that Gremlin is good at two things:
76. One is giving developers impostor syndrome: because it is so hard to learn, it turns many people away from graphs.
77. The second thing Gremlin is good at is allowing those that do make it through the learning curve to start thinking in paths. Start thinking "depth first", which is an important concept to understand when it comes to graph queries. So it's not all bad.
78. Then we have Cypher. Here he is eating the juicy steak in the matrix. It tastes so good, but you know it’s not real.
79. Customer Workloads
• Between a Dozen and a Hundred Trivial Queries
• Between 0 and a Dozen Non-Trivial Queries
• A lucky few have All Trivial Queries
• Most have 1 Non-Trivial Query and small variations
Cypher can handle the Trivial queries just fine. Some customers have all trivial queries and are blissfully happy. But most have at least 1 big non-trivial query. That recommendation engine, that shortest-path-finding query, that multi-source bi-directional weighted traversal, etc. This is where Cypher dies. Literally. He gets electrocuted by Tank.
80. So when that happens, we have APOC! Awesome Procedures on Cypher. A library of 450 plus Java Stored Procedures that actually make Cypher usable out of the
matrix and in the real world.
81. Wait… What about GSQL?
use graph ldbc
drop query i_short_2
create query i_short_2(INT vid) for graph ldbc {
SetAccum<INT> @@postSet;
SetAccum<INT> @@commentsSet;
SetAccum<INT> @@creatorSet;
SetAccum<INT> @@messageSet;
SetAccum<INT> @@replySet;
SetAccum<INT> @@postFromReplySet;
SetAccum<INT> @@replyToPostSet;
SumAccum<INT> @@current;
SetAccum<INT> @@resultID;
SetAccum<INT> @@visitedSet;
SumAccum<INT> @postID;
SumAccum<INT> @creatorID;
SumAccum<STRING> @creatorFirst;
SumAccum<STRING> @creatorLast;
INT tempMessageID;
INT tempCreator;
STRING tempFirst;
STRING tempLast;
INT postID;
INT tempPostID;
INT length;
INT size;
INT cur;
Person = {person.*};
Creator = {person.*};
Message = {post.*, comments.*};
Prev = {comments.*};
Post = {post.*};
Comments ={comments.*};
Reply = {comments.*};
Reply1 ={comments.*};
ReplyToPost = {comments.*};
Result = {post.*, comments.*};
CurrentReply = {comments.*};
length = Comments.size();
//get person from vid
Person = SELECT s
FROM Person:s
WHERE s.id == vid;
//get latest message
Message = SELECT s
FROM Message:s-((post_hasCreator_person|comments_hasCreator_person):e)->person:t
WHERE t.id == vid
ORDER BY s.creationDate DESC
LIMIT 10;
Message = SELECT s FROM Message:s
ACCUM @@messageSet += s.id,
@@visitedSet += s.id;
PostSet = SELECT s FROM Message:s-(post_hasCreator_person)->:t
ACCUM @@postSet += s.id;
// PRINT PostTest;
//get comment in message
Reply = SELECT s
FROM Message:s-(comments_hasCreator_person)->:t
WHERE t.id == vid
ACCUM @@replySet += s.id,
@@visitedSet += s.id;
Reply1 = SELECT s FROM Comments:s WHERE s.id IN @@replySet;
// PRINT Reply1, @@replySet;
ReplyToPost = SELECT s FROM Reply1:s-(comments_replyOf_post)->:t
ACCUM @@replyToPostSet += s.id,
@@visitedSet += s.id;
// PRINT @@replyToPostSet;
// PRINT ReplyToPost, @@replyToPostSet;
// //for each comment in message, get 1 hop comment to post
FOREACH item IN @@replySet DO
IF item != -1 THEN
CurrentReply = SELECT s FROM Reply1:s WHERE s.id == item;
//PRINT CurrentReply;
size = CurrentReply.size();
WHILE size != 0 LIMIT 100 DO
Prev = SELECT s FROM CurrentReply:s ACCUM cur = s.id;
CurrentReply = SELECT t
FROM Comments:s-(comments_replyOf_comments)->:t
WHERE s.id == cur
ACCUM @@visitedSet += t.id;
size = CurrentReply.size();
IF size == 0 THEN BREAK; END;
END;
CurrentReply = SELECT s
FROM Prev:s
ACCUM @@replyToPostSet += s.id;
//PRINT CurrentReply;
END;
END;
// PRINT @@replyToPostSet;
//
//get post from 1 hop comment
Post = SELECT s
FROM Post:s-(comments_replyOf_post_reverse)->:t
WHERE t.id IN @@replyToPostSet
ACCUM @@postFromReplySet += s.id;
// PRINT Post;
//get post creator info
Post = SELECT s
FROM Post:s-(post_hasCreator_person)->:t
ACCUM s.@creatorID = t.id,
s.@creatorFirst = t.firstName,
s.@creatorLast = t.lastName;
ACCUM @@postFromReplySet += s.id;
// PRINT Post;
//get post creator info
Post = SELECT s
FROM Post:s-(post_hasCreator_person)->:t
ACCUM s.@creatorID = t.id,
s.@creatorFirst = t.firstName,
s.@creatorLast = t.lastName;
// PRINT Post;
//pass person info and postID to 1 hop comment
ReplyToPost = SELECT t
FROM Post:s-(comments_replyOf_post_reverse)->:t
ACCUM t.@postID = s.id,
t.@creatorID = s.@creatorID,
t.@creatorFirst = s.@creatorFirst,
t.@creatorLast = s.@creatorLast,
@@replyToPostSet += t.id;
// PRINT ReplyToPost;
// //the foreach block pass person info and postID to visited comments in post
FOREACH item IN @@replyToPostSet DO
IF item != 0 THEN
Temp = SELECT s FROM ReplyToPost:s WHERE s.id == item
ACCUM tempMessageID = s.id,
tempCreator = s.@creatorID,
tempFirst = s.@creatorFirst,
tempLast = s.@creatorLast,
tempPostID = s.@postID;
//
//// //save person info and PostID from 1 kop comments to message set
Result = SELECT s
FROM Result:s
WHERE s.id IN @@visitedSet
ACCUM CASE WHEN s.id == item THEN
s.@creatorID = tempCreator,
s.@creatorFirst = tempFirst,
s.@creatorLast = tempLast,
s.@postID = tempPostID,
@@resultID += s.id
END;
size = Temp.size();
//filter result set by visited comments
Result = SELECT s FROM Result:s WHERE s.id IN @@visitedSet;
// PRINT tempCreator;
// PRINT "-----------------debug--------------------";
//
// PRINT Result;
//
// PRINT "------debug-----";
//pass post creator info to all visited comment
WHILE size != 0 LIMIT 100 DO
TempReplyTemp= SELECT t
FROM Temp:s-(comments_replyOf_comments_reverse)->:t
ACCUM tempMessageID = s.@creatorID,
tempFirst = s.@creatorFirst,
tempLast = s.@creatorLast,
tempPostID = s.@postID;
IF TempReplyTemp.size() == 1 THEN
Result = SELECT s
FROM Result:s
ACCUM CASE WHEN s.id == tempMessageID THEN
s.@creatorID = tempCreator,
s.@creatorFirst = tempFirst,
s.@creatorLast = tempLast,
s.@postID = postID
END;
size = TempReplyTemp.size();
END;
END;
END;
END;
//
//
//
// PRINT "---------------Result-------------------------";
//pass post creator to post in message set
FOREACH item IN @@postSet DO
IF item != -1 THEN
TempPost = SELECT s
FROM Result:s-(post_hasCreator_person)->:t
WHERE s.id == item
ACCUM tempCreator = t.id,
tempFirst = t.firstName,
tempLast = t.lastName;
Result = SELECT s FROM Result:s
ACCUM CASE WHEN s.id IN @@postSet THEN
s.@postID = s.id,
s.@creatorID = tempCreator,
s.@creatorFirst = tempFirst,
s.@creatorLast = tempLast
END;
END;
END;
Result = SELECT s FROM Result:s
WHERE s.id IN @@messageSet
Order by s.creationDate DESC, s.id DESC;
PRINT Result.id, Result.content, Result.imageFile, Result.creationDate, Result.@postID,
Result.@creatorID, Result.@creatorFirst, Result.@creatorLast;
}
install query i_short_2
GSQL can't decide if it's a query language or a programming language, so it just kind of accumulates a lot of lines of code, and it's a pain to work with for all but the people who get paid by the hour to write this stuff.
82. So GQL? That's the new standard, like SQL, that the vendors have been building? The problem here is that it will still need APOC, or APOG I guess, and then you can kiss your standard goodbye.
83. Programming Languages
instead of Query Languages
Idea Three
Idea Three is to use actual programming languages instead of query languages.
84. There is a blog post from Ted Neward called the “Vietnam of Computer Science” talking about the war of ORMs and Relational Databases. This is my spin on the subject
about Declarative Query Languages.
85. The Lie
• In Declarative Query Languages (like SQL, Cypher, GQL, etc.) developers are supposed to:
• specify what is to be done
• instead of how to do it.
Let's start off with the L I E. Can you spot it? It's subtle. It says "In Declarative Query Languages developers are supposed to specify what is to be done instead of how to do it".
86. The Problem: A "simple" query
• Find the customers who decreased their purchase amounts on their most recent order
• A contest for who could beat Joe Celko performance-wise on 10k rows of data
Let's look at an example. The problem is a simple query: find the customers who ordered less on their most recent order than the one before that. This was the subject of a contest Joe Celko ran back in the day to see who could write a faster query on 10k rows of data. Look at that horrible mess, that was Joe's query. https://www.red-gate.com/simple-talk/databases/sql-server/t-sql-programming-sql-server/celkos-sql-stumper-the-data-warehouse-problem/
87. 44 Different Queries
• There are at least 44 different ways to write: "Find the customers who decreased their purchase amounts on their most recent order"
• 30 Unique Timings
• At least 30 ways for the Query Planner and Optimizer to execute
I remember this challenge because I entered two queries. There were 44 in total: 44 different ways to write that sentence in SQL, and 30 unique timings to go with them. So at least 30 ways for the query planner and query optimizer to execute those queries. The queries range in performance from 46ms to 10 seconds, just on 10 thousand rows of data. Can you imagine the timing range on 10 million rows of data? The fastest queries are 10x faster than the middle of the pack and 20x faster than all but the worst, which we will ignore because Ramesh was probably trolling.
88. You end up not only having to be an expert in the query language, but also how to manipulate the query planner and query optimizer to take full advantage of the
mechanical sympathy of the database engine to run your queries optimally. This is worse than just telling the database how to execute the query.
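To make this concrete, here is a minimal sketch, in plain Lua with a made-up table layout, of what "just telling the database how" could look like for Celko's problem: walk each customer's orders once and compare the last two amounts, with no query planner to out-guess.
-- Minimal sketch, plain Lua. Assumed data shape (hypothetical, for illustration only):
-- orders_by_customer[customer_id] is a list of { amount = ... } sorted oldest first.
local function customers_who_decreased(orders_by_customer)
  local result = {}
  for customer_id, orders in pairs(orders_by_customer) do
    local n = #orders
    -- The "how" is explicit: look only at the last two orders of each customer.
    if n >= 2 and orders[n].amount < orders[n - 1].amount then
      result[#result + 1] = customer_id
    end
  end
  return result
end
-- Tiny usage example with made-up data.
local sample = {
  alice = { { amount = 100 }, { amount = 80 } }, -- decreased on the most recent order
  bob   = { { amount = 50 }, { amount = 70 } },  -- increased
}
for _, id in ipairs(customers_who_decreased(sample)) do print(id) end --> alice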
89. It’s not the fossil fuel industry killing the planet, its all those ine
ffi
cient database queries running on ever growing data that will doom us all.
91. Idea Four
No More Database Drivers
Idea Four is No More Database Drivers. It’s just one more thing to get in the way. You’ll spend your time answering “oh sorry we don’t have a Go Driver or Rust Driver or
Zig Driver or Julia Driver or whatever the cool kids are using this month”…and you’ll have to hire a bunch of people to build and maintain these things. It’s going to cost a
lot of money and be a royal pain. Trust me on this one.
92. Some of Peter’s Ideas
Schema, Vectorization, JIT, SIMD
A sprinkle of Peter's ideas: actual Schema, vectorized query execution where possible, Just-In-Time query compilation, taking advantage of SIMD where possible. I mean sure, why not, these aren't bad ideas.
93. Never trust vendor Benchmarks
Before I say anything more, please remember to never trust vendor benchmarks. Never ever.
94. Anyway, one day I got really mad at the performance I was getting. And I do mean really mad. Mad enough to write a few thousand lines of C code.
95. 8.3m vs 330m r/s/c*
3m vs 175m r/s/c*
*Relationships Traversed Per Second Per Core
40-60x Faster
So I wrote the bare in-memory data structures needed to duplicate what Neo4j was doing in C and compared a couple of traversals. The top one goes through 50 million relationships per query; the second does the same, but checks a property on those relationships before traversing. From 8 million to 330 million. From 3 million to 174 million. That's 40 to 60 times faster.
96. But I’m comparing apples and oranges. One is a database meant to handle any workload. The other is handcrafted code meant to handle two queries that we have
complete control over.
97. So does that mean everyone should just do a couple of shots and build their own handmade graph services? Not really. What it means is that there is plenty of room to
make the current databases better and build new and faster databases.
98. I got no patience and I hate waiting
Just like Jay-Z. I have no patience and I hate waiting.
102. Sorry, I meant Selfish. Graphs are the only thing I know, and if the current vendors don't fix their offerings then I might be in the same sinking ship as the Hadoop Experts.
103. I want to build 4 me
A graph db that has:
• Better performance
• A lot faster (hopefully)
• Can handle diverse workloads
• Properties in Traversals
• An easy interface
• HTTP + JSON
• A programming language
• For complex queries
I want to build for my needs. A graph database that is Faster, Better, Easier, and more Flexible by following some of the hardware trends we talked about in this presentation.
104. "You can have a second computer once you've shown you know how to use the first one." —Paul Barham
And planning for a Scale-Up System using Lots of RAM, Lots of Cores, on a Single Server. Replicated (eventually) but not Distributed.
105. Seastar Framework
• Shared Nothing Multicore
• "Server per core"
• Message Passing
• Futures and Promises
• High Performance Networking
Using the Seastar framework with its "server per core", futures and promises, and high-performance networking.
106. We avoid shared memory and locking; think of each core as a server, message-passing events within the physical box instead of via the network. No ACID needed (maybe).
107. On 4 Cores
190k Requests / Second
Stupid fast, with latencies low enough for AdTech use cases.
108. On 4 Cores with DPDK
280k Requests / Second
We can use DPDK (Data Plane Development Kit) to go even faster skipping the network driver and talking to the network card directly… even on the Cloud. Yes. I’m only
getting an empty node, but the other graph databases can’t even say hello that fast.
109. Schema: Not Optional
• Nodes have a single Type
• No multiple labels
• Properties have a Type
• Bool, Int, Double, String, List
• Nodes of the same Type have the same properties
• Like any sane database
With a Schema, because in the real world, data has schema. A single type for Nodes and Relationships, because multiple labels were a terrible mistake. Let's make things sane again.
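As a sketch only, defining such a schema from a query might look something like this; the helper names (NodeTypeInsert, NodePropertyTypeAdd, RelationshipTypeInsert) are assumptions for illustration, not a documented API:
-- Hypothetical schema helpers; names and signatures are assumptions.
NodeTypeInsert("User")                          -- every node gets exactly one type
NodePropertyTypeAdd("User", "name", "string")   -- every property has a declared type
NodePropertyTypeAdd("User", "age", "integer")
NodePropertyTypeAdd("User", "tags", "list")
RelationshipTypeInsert("FOLLOWS")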
110. HTTP + JSON: Universal
• You can talk to it from your browser
• You can talk to it from any programming language
• No drivers needed, no custom protocol
Let's talk via HTTP and JSON, from any language, no drivers needed, no custom binary protocols, you can even talk to it from your browser window.
111. Lua: "Moon" in Portuguese
• Proven
• Used in embedded systems and games
• Fast
• Fastest scripting language I know of, and using LuaJIT
• Powerful, small and free (MIT)
Using Lua as the Query Language because it's proven in the field and used in embedded systems and games where performance matters. Using LuaJIT, the fastest scripting language I know of.
112. Lua as a Query Language
• Simple Queries
We'll take whatever the last line of the query is and turn it into JSON. For example, getting a node.
113. Lua as a Query Language
• Simple Queries
• Pipelined Queries
Or doing a bunch of stuff, related or unrelated, in a pipeline or batch.
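Again with assumed helper names, a pipelined query is just several statements batched into one request, with the last line's value returned:
-- Several operations, related or not, in one request (hypothetical helpers).
local max = NodeAdd("User", "max", '{"name":"Max"}')
local jaws = NodeAdd("Movie", "jaws", '{"title":"Jaws"}')
RelationshipAdd("LIKES", "User", "max", "Movie", "jaws") -- last line, returned as JSON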
114. Lua as a Query Language
• Simple Queries
• Pipelined Queries
• Complex Queries
You have a real programming language to do complex queries, plus helper functions for accessing the database and soon-to-come vectorized procedures for faster data processing.
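For a complex query you get loops, conditionals and local state; here is a sketch of a deduplicated two-hop "friends of friends" traversal, again with made-up helper names:
-- Sketch of a complex query: friends-of-friends, deduplicated (hypothetical helpers).
local me = NodeGetId("User", "max")
local seen, second_hop = {}, {}
for _, friend in ipairs(NodeGetNeighborIds(me, "FOLLOWS")) do
  for _, fof in ipairs(NodeGetNeighborIds(friend, "FOLLOWS")) do
    if fof ~= me and not seen[fof] then
      seen[fof] = true
      second_hop[#second_hop + 1] = fof
    end
  end
end
NodesGet(second_hop) -- last line: the resolved nodes come back as JSON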
115. Look at that pretty UI. I built that myself. Let’s traverse 50M relationships in 10 seconds. Too Slow?
116. Remember about 100 slides ago when Peter Boncz was complaining about graph databases not having bulk APIs? Turns out he was right. Here we can go about 5x faster by traversing in bulk instead of one at a time. It makes the query simpler too.
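A rough before-and-after of what that means, with the same caveat that the helper names are assumptions: the bulk form hands the whole frontier to the engine in one call instead of making one call per node:
-- One-at-a-time: a separate helper call for every node on the frontier (hypothetical helpers).
local start = NodeGetId("User", "max")
local total = 0
for _, id in ipairs(NodeGetNeighborIds(start, "KNOWS")) do
  total = total + #NodeGetNeighborIds(id, "KNOWS")
end
-- Bulk: hand the whole frontier to the engine in one call and let it batch the work.
local frontier = NodeGetNeighborIds(start, "KNOWS")
NodesGetNeighborIds(frontier, "KNOWS") -- last line: all second-hop ids, returned at once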
117. Oh hey I forgot to talk about Dgraph and GraphQL. Do we really need it here? We are already returning JSON and can return it in any way we want. A single request can
be one query or one hundred, related or not.
118. SIMD: For Vectorized Execution
• Already in Find with Predicate
• Will be added to Math and Data Manipulation Functions
• Sprinkled in wherever it can to speed things up
Borrowing the EVE library for SIMD vectorized execution. Already making finding nodes and relationships with a predicate faster; will be adding math and data manipulation functions, as well as sprinkling it in wherever we can.
119. 4 Layer Design
HTTP
Lua (in Thread)
Peered
Shard
A very simple 4-layer design: HTTP in the front, Lua (if needed) in thread, a Peered layer to coordinate multi-shard requests, and a Shard layer to actually work with the data.
120. Blog Posts
maxdemarzi.com
I’ve been writing my progress on my blog at maxdemarzi.com so you don’t walk blind into a 20,000 line C++ codebase. A little behind on where the code base is, but will
catch up soon.
126. Todos (means all of us)
• C++ Dev: ragedb
• Java Dev: rage-assured
• Scala Dev: benchmarks
• JavaScript Dev: UI
• DevRel: Home Page
• DevOps: Docker + Packaging
• Anyone: Use it, report bugs, request features
Just remember that "todos" in Spanish means all of us; whatever your skill set is, I have something you can help with.