When it’s time to choose the database technology for your app, the choices can be overwhelming. Should you choose SQL or NoSQL? Open source or proprietary? Self-hosted or hosted? If you’re not already familiar with graph databases, you might be tempted to ignore them as an option. But that could be a mistake.
In this tools-in-action session, I'll discuss the benefits of graph databases. Then I'll demonstrate how you can quickly and easily get started with a graph database using a hosted version of the open-source graph computing framework Apache TinkerPop. I'll show you how you can use APIs and the Gremlin graph traversal query language to perform CRUD (create, read, update, and delete) operations. I'll even show you how you can use Gremlin to quickly create a simple recommendation engine for your users.
New to graph databases? No problem! Come learn all you need to get started here!
#DevoxxFR #IntroToGraph @Lauren_Schaefer
How do you make a graph?
[Diagram: four nodes (nouns) connected by edges (verbs); the nodes and edges each hold properties stored as key: value pairs.]
Many more use cases
• Modeling social networks
• Diagnosing psychosis with word analysis
• Analyzing the spread of epidemics
• Modeling a bio network
• Visualizing a social/economic/political network
That’s all for now…
• To access the resources associated with this presentation…
- visit http://ibm.biz/devoxxfr_tia_slides
• To go deeper, attend my hands-on lab on Friday! (and tell your friends)
• To continue to learn more about Lauren, IBM Graph, and Bluemix, follow
@Lauren_Schaefer
@IBMGraph
@IBMBluemix
Editor's Notes
Tried to learn a little French on my flight here by watching Beauty and the Beast, but unfortunately nothing stuck. Instead, I’ll be presenting here in full-blown American English.
Prereqs: put a couple of extra images in the images folder and redeploy
Open the web ide
Open the graph query editor
l.au.ren.jh.ayw.ard@gmail.com
Graph databases are a type of NoSQL database that focuses not only on the data being stored but also on the relationships between the data.
In relational databases, you have tables. In graph databases, you have nodes and edges.
Nodes are also called vertices. Think of the nodes as your nouns. It’s where you store information about your people, places, and things.
Think of the edges as your verbs. It’s where you store information about the actions that connect your nodes.
Property graphs allow you to store properties (or information about your data) in your nodes and edges.
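To make the node, edge, and property vocabulary concrete, here is a minimal sketch in plain Java. It is a hypothetical in-memory model: the class and method names (`PropertyGraph`, `addVertex`, `addEdge`) are my own and are not IBM Graph's or TinkerPop's API. Vertices and edges each carry a label plus a map of key/value properties.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory property graph; not IBM Graph's or TinkerPop's API.
class PropertyGraph {
    // A vertex (node) is a "noun": a label plus key/value properties.
    static final class Vertex {
        final long id;
        final String label;
        final Map<String, Object> properties = new HashMap<>();

        Vertex(long id, String label) {
            this.id = id;
            this.label = label;
        }
    }

    // An edge is a "verb" from one vertex to another; it can carry properties too.
    static final class Edge {
        final String label;
        final Vertex from;
        final Vertex to;
        final Map<String, Object> properties = new HashMap<>();

        Edge(String label, Vertex from, Vertex to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    final List<Vertex> vertices = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();
    private long nextId = 1;

    // Copies alternating key, value arguments into a property map.
    private static void putAll(Map<String, Object> target, Object... keyValues) {
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            target.put((String) keyValues[i], keyValues[i + 1]);
        }
    }

    Vertex addVertex(String label, Object... keyValues) {
        Vertex v = new Vertex(nextId++, label);
        putAll(v.properties, keyValues);
        vertices.add(v);
        return v;
    }

    Edge addEdge(String label, Vertex from, Vertex to, Object... keyValues) {
        Edge e = new Edge(label, from, to);
        putAll(e.properties, keyValues);
        edges.add(e);
        return e;
    }

    public static void main(String[] args) {
        PropertyGraph g = new PropertyGraph();
        Vertex dale = g.addVertex("user", "username", "dale");
        Vertex alaska = g.addVertex("print", "name", "Alaska", "price", 50);
        Edge order = g.addEdge("buys", dale, alaska);
        System.out.println(order.from.properties.get("username") + " -" + order.label + "-> "
                + order.to.properties.get("name"));
    }
}
```

Running `main` prints `dale -buys-> Alaska`. A real graph database layers persistence, indexes, and a query language such as Gremlin on top of this basic shape.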
DB Engines
Graph databases are increasing in popularity…but why? Graphs have some very important use cases.
Let’s start with people. When people shop online, it can be incredibly helpful to see what other people like them purchased. Real-time recommendation engines can be efficiently run on graph databases.
Graph databases can be used to determine the shortest or fastest path.
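To see why graphs suit path questions, consider that finding the route with the fewest hops is a breadth-first search over each node's neighbors. The sketch below assumes an unweighted graph of hypothetical place names. A "fastest" path over weighted edges would need something like Dijkstra's algorithm instead. It illustrates the idea only, not how any particular graph database computes paths.

```java
import java.util.*;

// Fewest-hops path via breadth-first search (BFS) over an unweighted adjacency list.
// Illustrative only: a weighted "fastest" path would need Dijkstra's algorithm instead.
class ShortestPath {
    static List<String> shortestPath(Map<String, List<String>> adjacency, String start, String goal) {
        Map<String, String> parent = new HashMap<>(); // how we first reached each node
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(goal)) {
                // Walk the parent links back to the start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String node = goal; node != null; node = parent.get(node)) {
                    path.addFirst(node);
                }
                return path;
            }
            for (String next : adjacency.getOrDefault(current, Collections.emptyList())) {
                if (visited.add(next)) { // true only the first time we see this node
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList(); // goal is unreachable from start
    }

    public static void main(String[] args) {
        // Hypothetical one-way connections between places.
        Map<String, List<String>> routes = new HashMap<>();
        routes.put("A", Arrays.asList("B", "C"));
        routes.put("B", Arrays.asList("D"));
        routes.put("C", Arrays.asList("D"));
        routes.put("D", Arrays.asList("E"));
        System.out.println(shortestPath(routes, "A", "E")); // [A, B, D, E]
    }
}
```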
Let’s talk about the internet of things. We live in a smart, connected world. Storing data about devices and all of their connections can be done quite easily in a graph database.
Snapchat glasses
One other common use case when you start combining people, places, and things is detecting fraud. When you need to search for patterns of fraudulent behavior, graph databases are a fantastic option. You can search for complex credit card fraud schemes. You can also search for things like insurance fraud.
So should you care about graph databases? Absolutely! In this connected world where we need to do more with our data than just access it, graph databases help us efficiently store and gain insight into the connections and patterns of our data.
It turns out that setting up a graph database can be really complicated. Apache TinkerPop is a very popular open source graph computing framework. If you want to use TinkerPop, you’ll need to set up a graph database such as Titan, which in turn requires you to set up and configure Cassandra or HBase, plus Elasticsearch.
Assuming you like the system you set up, you’ll have to keep all of the systems up to date and ensure they’re always running. I don’t know about you, but that sounds a bit overwhelming to me. I’d rather just focus on my app and my data.
IBM Graph is a fully managed graph database service on Bluemix. You may have heard of Software as a Service. You can think of IBM Graph as TinkerPop as a Service. All of the concepts we learn today apply to graph databases in general, or at least to TinkerPop-based ones. We’ll be using the Gremlin graph traversal language, which is part of Apache TinkerPop and is not specific to IBM Graph.
IBM Graph advertises three big perks:
It’s highly available, so your data is always accessible.
It scales seamlessly, so you don’t have to worry about updating your graph database when your app becomes the next big thing.
It’s managed 24/7 by IBM experts, so you don’t have to worry about upgrading or migrating your data as the technologies change.
The biggest perk that we’ll see today is that it’s easy to set up and get going. We aren’t going to spend time messing with setting up a graph database.
Nouns: nodes or vertices
Verbs: edges
A user can buy many prints. A print can be bought by many users.
MULTI: allows multiple edges of the same label between any pair of vertices.
Other options include SIMPLE, MANY2ONE, ONE2MANY, and ONE2ONE.
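To illustrate what the multiplicity options mean, here is a hypothetical Java checker that enforces each rule for a single edge label. It is not IBM Graph's or Titan's API. The rules follow Titan's definitions: SIMPLE allows at most one edge of the label between any pair of vertices, MANY2ONE at most one outgoing edge per vertex, ONE2MANY at most one incoming edge per vertex, and ONE2ONE at most one of each.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker for edge-label multiplicity rules; not IBM Graph's or Titan's API.
class EdgeMultiplicity {
    enum Multiplicity { MULTI, SIMPLE, MANY2ONE, ONE2MANY, ONE2ONE }

    // One edge of this label, from an "out" vertex to an "in" vertex.
    static final class Edge {
        final String out;
        final String in;

        Edge(String out, String in) {
            this.out = out;
            this.in = in;
        }
    }

    private final Multiplicity multiplicity;
    private final List<Edge> edges = new ArrayList<>();

    EdgeMultiplicity(Multiplicity multiplicity) {
        this.multiplicity = multiplicity;
    }

    // Adds the edge and returns true, or returns false if it would break the rule.
    boolean addEdge(String out, String in) {
        boolean samePair = false; // an edge already joins out -> in
        boolean outUsed = false;  // out already has an outgoing edge of this label
        boolean inUsed = false;   // in already has an incoming edge of this label
        for (Edge e : edges) {
            if (e.out.equals(out) && e.in.equals(in)) samePair = true;
            if (e.out.equals(out)) outUsed = true;
            if (e.in.equals(in)) inUsed = true;
        }
        boolean allowed;
        switch (multiplicity) {
            case MULTI:    allowed = true; break;                 // anything goes
            case SIMPLE:   allowed = !samePair; break;            // one edge per vertex pair
            case MANY2ONE: allowed = !outUsed; break;             // one outgoing edge per vertex
            case ONE2MANY: allowed = !inUsed; break;              // one incoming edge per vertex
            case ONE2ONE:  allowed = !outUsed && !inUsed; break;  // one of each
            default: throw new IllegalStateException("Unknown multiplicity: " + multiplicity);
        }
        if (allowed) edges.add(new Edge(out, in));
        return allowed;
    }

    public static void main(String[] args) {
        EdgeMultiplicity buys = new EdgeMultiplicity(Multiplicity.MULTI);
        System.out.println(buys.addEdge("dale", "Alaska")); // true
        System.out.println(buys.addEdge("dale", "Alaska")); // true: a repeat purchase is a second edge
        EdgeMultiplicity simple = new EdgeMultiplicity(Multiplicity.SIMPLE);
        System.out.println(simple.addEdge("dale", "Alaska")); // true
        System.out.println(simple.addEdge("dale", "Alaska")); // false: pair already connected
    }
}
```

With MULTI, a user who buys the same print twice simply gets a second `buys` edge, which fits the many-to-many relationship between users and prints.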
For simplicity
Not storing passwords
Shipping address stored at time of purchase only
Only one print per order
Simple payment method storage
Not written
All properties have a cardinality of SINGLE, meaning each property can hold only one value. The other options are SET and LIST.
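Here is a hypothetical sketch of what SINGLE, SET, and LIST cardinality mean when a value is written to a property. The `setProperty` helper and the tag values are illustrative assumptions, not a real database API. Tags would only become relevant if the schema were extended to support them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of property cardinality; not a real database API.
class PropertyCardinality {
    enum Cardinality { SINGLE, SET, LIST }

    // Returns the property's values after writing `value` under the given cardinality.
    static List<Object> setProperty(Cardinality cardinality, List<Object> current, Object value) {
        List<Object> updated = new ArrayList<>(current);
        switch (cardinality) {
            case SINGLE:
                updated.clear(); // the new value replaces the old one
                updated.add(value);
                break;
            case SET:
                if (!updated.contains(value)) updated.add(value); // duplicates are ignored
                break;
            case LIST:
                updated.add(value); // duplicates are kept, in insertion order
                break;
        }
        return updated;
    }

    public static void main(String[] args) {
        List<Object> description = Arrays.asList("so much fun");
        System.out.println(setProperty(Cardinality.SINGLE, description, "A favorite pic!")); // [A favorite pic!]
        List<Object> tags = Arrays.asList("mountains");
        System.out.println(setProperty(Cardinality.SET, tags, "mountains"));  // [mountains]
        System.out.println(setProperty(Cardinality.LIST, tags, "mountains")); // [mountains, mountains]
    }
}
```

The SINGLE case is why updating a print's description with `property('description', ...)` replaces the old text instead of adding a second value.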
We can extend the schema later to add other properties, vertices, and edges (for example, if we want to allow users to tag prints).
Indexes allow you to search your graph.
We can extend the schema later to add other indexes. For example, we might want to add an index on the buys edge to search by date.
Composite – search by exact match. Mixed – the search doesn’t have to be an exact match. For example, if you want to search for a user whose last name starts with Sch, you would use a mixed index. Likewise, if you want to search for prints whose price is less than $100, you would use a mixed index.
Unique – should the property value be required to be unique?
indexOnly – restrict the index to a particular vertex or edge label
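As a rough mental model of the composite/mixed distinction, a composite index behaves like a hash map that only answers exact-match lookups, while a mixed index behaves more like a sorted structure that can also answer prefix and range queries. This is a hypothetical Java sketch; real engines (for example, Titan backed by Elasticsearch) are far more involved. The class names and sample values are illustrative assumptions.

```java
import java.util.*;

// Hypothetical index sketch: composite = exact match only; mixed = prefix and range queries too.
class IndexSketch {
    // Composite-style index: exact value -> vertex ids, with an optional uniqueness constraint.
    static final class CompositeIndex {
        private final Map<Object, Set<Long>> entries = new HashMap<>();
        private final boolean unique;

        CompositeIndex(boolean unique) { this.unique = unique; }

        void put(Object value, long vertexId) {
            Set<Long> ids = entries.computeIfAbsent(value, k -> new TreeSet<>());
            if (unique && !ids.isEmpty() && !ids.contains(vertexId)) {
                throw new IllegalStateException("Duplicate value for unique index: " + value);
            }
            ids.add(vertexId);
        }

        Set<Long> exact(Object value) {
            return entries.getOrDefault(value, Collections.emptySet());
        }
    }

    // Mixed-style string index: a sorted map lets us answer "starts with" queries.
    static final class MixedStringIndex {
        private final TreeMap<String, Set<Long>> entries = new TreeMap<>();

        void put(String value, long vertexId) {
            entries.computeIfAbsent(value, k -> new TreeSet<>()).add(vertexId);
        }

        Set<Long> prefix(String prefix) {
            Set<Long> result = new TreeSet<>();
            // Keys sharing the prefix sit together at the start of the tail map.
            for (Map.Entry<String, Set<Long>> e : entries.tailMap(prefix, true).entrySet()) {
                if (!e.getKey().startsWith(prefix)) break;
                result.addAll(e.getValue());
            }
            return result;
        }
    }

    // Mixed-style numeric index: range queries such as "price < 100".
    static final class MixedNumberIndex {
        private final TreeMap<Double, Set<Long>> entries = new TreeMap<>();

        void put(double value, long vertexId) {
            entries.computeIfAbsent(value, k -> new TreeSet<>()).add(vertexId);
        }

        Set<Long> lessThan(double bound) {
            Set<Long> result = new TreeSet<>();
            for (Set<Long> ids : entries.headMap(bound, false).values()) result.addAll(ids);
            return result;
        }
    }

    public static void main(String[] args) {
        CompositeIndex byUsername = new CompositeIndex(true);
        byUsername.put("dale", 1L);
        System.out.println(byUsername.exact("dale")); // [1]
        System.out.println(byUsername.exact("dal"));  // [] (no partial matches)

        MixedStringIndex byLastName = new MixedStringIndex();
        byLastName.put("Schaefer", 1L);
        byLastName.put("Schmidt", 2L);
        byLastName.put("Smith", 3L);
        System.out.println(byLastName.prefix("Sch")); // [1, 2]

        MixedNumberIndex byPrice = new MixedNumberIndex();
        byPrice.put(50, 10L);
        byPrice.put(150, 11L);
        System.out.println(byPrice.lessThan(100)); // [10]
    }
}
```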
Gremlin graph traversal language
Create
Show home page
def gt = graph.traversal();
gt.addV(label, 'print', 'name', 'cruise', 'description', 'so much fun', 'price', '50', 'imgPath', '006.jpg','type', 'print');
Read
Find the node for the Alaska print
def gt = graph.traversal();
gt.V().hasLabel('print').has('name', 'Alaska');
Find the node for the Alaska print by id
gt.V(idnumber);
Who bought the Alaska print? (copy first one)
gt.V().hasLabel('print').has('name', 'Alaska').in();
View the orders that include the Alaska print
gt.V().hasLabel('print').has('name', 'Alaska').inE();
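The last two queries differ in whether the traversal stops at the edges or continues through them. This hypothetical Java analogue treats each `buys` edge as an order. The `in` and `inE` helpers are mine, not TinkerPop's implementation, and the sample purchases are made up. `inE` returns the incoming edges themselves, and `in` follows them back to the buyers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory analogue of Gremlin's inE() and in() steps; not TinkerPop's implementation.
class InVsInE {
    // A "buys" edge from a user to a print; here each edge stands for one order.
    static final class Edge {
        final String label;
        final String outVertex;
        final String inVertex;

        Edge(String label, String outVertex, String inVertex) {
            this.label = label;
            this.outVertex = outVertex;
            this.inVertex = inVertex;
        }

        @Override
        public String toString() {
            return outVertex + " -" + label + "-> " + inVertex;
        }
    }

    // Like ...has('name', 'Alaska').inE(): the incoming edges themselves (the orders).
    static List<Edge> inE(List<Edge> edges, String vertex) {
        return edges.stream().filter(e -> e.inVertex.equals(vertex)).collect(Collectors.toList());
    }

    // Like ...has('name', 'Alaska').in(): the vertices at the other end (the buyers).
    // No deduplication, so a user with two orders appears twice, as in Gremlin.
    static List<String> in(List<Edge> edges, String vertex) {
        return inE(edges, vertex).stream().map(e -> e.outVertex).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
                new Edge("buys", "jason", "Alaska"),
                new Edge("buys", "joy", "Alaska"),
                new Edge("buys", "jason", "Japan"));
        System.out.println(inE(edges, "Alaska")); // [jason -buys-> Alaska, joy -buys-> Alaska]
        System.out.println(in(edges, "Alaska"));  // [jason, joy]
    }
}
```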
Update
Show Alaska
def gt = graph.traversal();
gt.V().hasLabel('print').has('name', 'Alaska');
def gt = graph.traversal();
gt.V().hasLabel('print').has('name', 'Alaska').property('description', 'A favorite pic!');
Delete
Show the list of users
Drop dale
def gt = graph.traversal();
gt.V().hasLabel('user').has('username', 'dale');
gt.V().hasLabel('user').has('username', 'dale').drop();
Show the list of users again
Drop all of the orders that Deanna has placed. First, show Deanna’s orders
def gt = graph.traversal();
gt.V().hasLabel('user').has('username', 'deanna');
gt.V().hasLabel('user').has('username', 'deanna').outE();
gt.V().hasLabel('user').has('username', 'deanna').outE().drop();
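Dropping Dale and dropping Deanna's orders behave differently. In TinkerPop, removing a vertex also removes every edge attached to it, while `outE().drop()` removes only the edges and leaves the vertex in place. This hypothetical Java sketch mimics both behaviors. It is not TinkerPop's implementation, and the sample data is made up.

```java
import java.util.*;

// Hypothetical in-memory illustration of the two kinds of drop() used above.
class DropSketch {
    static final class Edge {
        final String out;
        final String in;

        Edge(String out, String in) {
            this.out = out;
            this.in = in;
        }
    }

    final Set<String> vertices = new LinkedHashSet<>();
    final List<Edge> edges = new ArrayList<>();

    // Like ...has('username', 'dale').drop(): the vertex plus all incident edges.
    void dropVertex(String v) {
        vertices.remove(v);
        edges.removeIf(e -> e.out.equals(v) || e.in.equals(v));
    }

    // Like ...has('username', 'deanna').outE().drop(): only the outgoing edges.
    void dropOutEdges(String v) {
        edges.removeIf(e -> e.out.equals(v));
    }

    public static void main(String[] args) {
        DropSketch g = new DropSketch();
        g.vertices.addAll(Arrays.asList("dale", "deanna", "Alaska", "Japan"));
        g.edges.add(new Edge("dale", "Japan"));
        g.edges.add(new Edge("deanna", "Alaska"));
        g.edges.add(new Edge("deanna", "Japan"));
        g.dropVertex("dale");
        g.dropOutEdges("deanna");
        System.out.println(g.vertices + " " + g.edges.size()); // [deanna, Alaska, Japan] 0
    }
}
```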
Recommender systems (also called recommendation systems, platforms, or engines) generate personalized recommendations for users. They vary in complexity and accuracy, and some work in real time while others do not.
Relevant recommendations can be incredibly helpful to your app’s users, which can in turn be incredibly helpful to your app’s engagement rate and sales. Win, win!
Let me show you how I created a recommendation engine for the home page of Lauren’s Lovely Landscapes that displays the top 3 personalized recommendations for users who are signed in.
The idea behind the TinkerPop recommendation recipe is collaborative filtering. Essentially, collaborative filtering assumes that if users have something in common (for example, they’ve purchased the same item), they are likely to have other things in common as well.
A major strength of graph databases is the ability to quickly generate real-time recommendations through collaborative filtering.
Let me show you how this works.
This is a graph representation of the sample data for Lauren’s Lovely Landscapes. Here you can see four users and six prints. The edges connecting the users and the prints represent purchases.
Let’s say we want to find personalized recommendations for the user Dale.
We’ll begin by looking to see what prints Dale has purchased.
We can see he’s purchased Las Vegas, Australia, and Japan.
Now, we’ll traverse the graph to see which users have bought those prints. We can see Jason, Joy, and Deanna have purchased those prints.
Now, we’ll traverse out from those users to see what prints those users have bought. This will show us what prints Dale might also be interested in buying. We can exclude the prints that Dale has already purchased since we don’t want to recommend to him something he’s already purchased.
We can see the prints that Jason, Joy, and Deanna have bought are Alaska and Antarctica. By counting the edges, we can see that Alaska has been purchased 3 times and Antarctica has been purchased 2 times. Therefore, Alaska will be our top recommendation for Dale.
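The same walk-through can be written as plain Java over a list of purchases. The purchase data below is hypothetical and was chosen only so the counts match the example (Alaska 3, Antarctica 2). The comments map each step to the corresponding part of the Gremlin query. It is a sketch of collaborative filtering, not the TinkerPop recipe's own code.

```java
import java.util.*;

// Hypothetical in-memory collaborative filtering that mirrors the walk-through.
class CollaborativeFiltering {
    // purchases: one {username, printName} pair per "buys" edge.
    static List<Map.Entry<String, Long>> recommend(List<String[]> purchases, String user, int limit) {
        // 1. Prints the user already bought (like .out("buys").aggregate("bought")).
        Set<String> bought = new HashSet<>();
        for (String[] p : purchases) {
            if (p[0].equals(user)) bought.add(p[1]);
        }
        // 2. Other users who bought any of those prints (like .in("buys").where(neq("buyer")).dedup()).
        Set<String> coBuyers = new HashSet<>();
        for (String[] p : purchases) {
            if (bought.contains(p[1]) && !p[0].equals(user)) coBuyers.add(p[0]);
        }
        // 3. Count their purchases of prints the user doesn't own
        //    (like .out("buys").where(without("bought")).groupCount()).
        Map<String, Long> counts = new HashMap<>();
        for (String[] p : purchases) {
            if (coBuyers.contains(p[0]) && !bought.contains(p[1])) counts.merge(p[1], 1L, Long::sum);
        }
        // 4. Highest count first, keep the top `limit` (like .order(local).by(valueDecr).limit(local, 3)).
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return ranked.subList(0, Math.min(limit, ranked.size()));
    }

    public static void main(String[] args) {
        // Hypothetical purchases, chosen so the counts match the walk-through.
        List<String[]> purchases = Arrays.asList(
                new String[] {"dale", "Las Vegas"}, new String[] {"dale", "Australia"},
                new String[] {"dale", "Japan"},
                new String[] {"jason", "Las Vegas"}, new String[] {"jason", "Alaska"},
                new String[] {"jason", "Antarctica"},
                new String[] {"joy", "Australia"}, new String[] {"joy", "Alaska"},
                new String[] {"deanna", "Japan"}, new String[] {"deanna", "Alaska"},
                new String[] {"deanna", "Antarctica"});
        System.out.println(recommend(purchases, "dale", 3)); // [Alaska=3, Antarctica=2]
    }
}
```

Note that Dale gets only two recommendations here, which is exactly the situation the home page has to handle when it wants three.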
Query for personalized recommendations
def gt = graph.traversal();
java.util.function.Function byNameImgPath = { Vertex v -> "" + v.value("name") + ":" + v.value("imgPath") };
gt.V().hasLabel("user").has("username", "dale").as("buyer")
.out("buys").aggregate("bought")
.in("buys").where(neq("buyer")).dedup()
.out("buys").where(without("bought"))
.groupCount().by(byNameImgPath).order(local).by(valueDecr).limit(local, 3);
The home page for Lauren's Lovely Landscapes aims to display three personalized recommendations, but the query above might not generate three. That can happen when (1) the user is not signed in, so there are no purchases to start from, or (2) the user has already purchased all or nearly all of the prints that the other buyers have purchased, so fewer than three prints are left to recommend.
Query to find top-most purchased prints
def gt = graph.traversal();
java.util.function.Function byNameImgPath = { Vertex v -> "" + v.value("name") + ":" + v.value("imgPath") };
gt.V().has("type", "user")
.out("buys")
.groupCount().by(byNameImgPath).order(local).by(valueDecr).limit(local, 3);
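One way to fill the remaining slots is to merge the personalized results with the top-most purchased prints. The sketch below is hypothetical. The `fill` helper, the sample rankings, and the choice to skip best sellers the user already owns are my assumptions, not necessarily how Lauren's Lovely Landscapes does it.

```java
import java.util.*;

// Hypothetical helper: fill the home page's recommendation slots with personalized
// picks first, then with best sellers; not the app's actual code.
class HomePageRecommendations {
    static List<String> fill(List<String> personalized, List<String> topSellers,
                             Set<String> alreadyOwned, int slots) {
        Set<String> picks = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (String print : personalized) {
            if (picks.size() == slots) break;
            picks.add(print);
        }
        for (String print : topSellers) {
            if (picks.size() == slots) break;
            // Assumption: don't recommend best sellers the user already owns.
            if (!alreadyOwned.contains(print)) picks.add(print);
        }
        return new ArrayList<>(picks);
    }

    public static void main(String[] args) {
        // Hypothetical best-seller ranking, most purchased first.
        List<String> topSellers = Arrays.asList("Alaska", "Japan", "Antarctica", "Las Vegas");
        // Signed-in user with one personalized pick who already owns Japan.
        System.out.println(fill(Arrays.asList("Alaska"), topSellers,
                new HashSet<>(Arrays.asList("Japan")), 3)); // [Alaska, Antarctica, Las Vegas]
        // Anonymous visitor: no personalized picks, nothing owned.
        System.out.println(fill(Collections.emptyList(), topSellers,
                Collections.emptySet(), 3)); // [Alaska, Japan, Antarctica]
    }
}
```

If owned prints are skipped, the best-seller list needs to be longer than the number of open slots. Otherwise the page can still come up short, so the `limit(local, 3)` in the query above would need to be raised for this approach.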