The database for the Web
Why do I need another DBMS?

But the question is:
are DBMSs ready for the Web age?
Web means:
Hundreds of users today, thousands or millions tomorrow

Web means:
Idea → demo in a few weeks

Web means:
Fast and frequent changes of requirements and data structures

Web means:
social + relationships + interconnections = graph

Web means:
Low resource usage, runs on cheap hardware

Web means:
 Speak HTTP, REST and JSON

Why is the database so important in modern applications?

Modern applications are mostly I/O bound, not CPU bound

NoSQL = “Not Only SQL”
= make the best choice for your use case

Can I trust new DBMSs?

Non-exhaustive list of NoSQL products:

AllegroGraph, Amazon SimpleDB, Amazon Dynamo, Dynomite, BerkeleyDB,
Google BigTable, Cassandra, CouchDB, DB4O, HBase, Hypertable, Hive,
Jackrabbit, InfiniteGraph, InfoGrid, Memcached, MemcacheDB, Mnesia,
M/DB/GT.M, MongoDB, Neo4j, OrientDB, Pig, Project Voldemort, RavenDB,
Redis, Riak, Scalaris, Sesame, Sones, Terrastore, Tokyo Cabinet/Tyrant,
Yahoo! PNUTS/Sherpa
The “NoSQL” container groups very different products:
no standard,
difficult to choose,
difficult to learn

Can I have a fast, scalable, flexible DBMS
with ACID Tx, SQL and security
that is easy to use and maintain?

The fastest NoSQL document-graph DBMS

12+ years of research

1+ year of design and development

best features of the newest NoSQL solutions
+ best features of relational DBMSs
+ new ideas and concepts

Zero config
download, unzip, run!
cut & paste the db
No dependencies on third-party software
no conflicts with other software
just 1 MB of runtime libraries

 records per second

Schema-less
schema is not mandatory, relaxed model,
collect heterogeneous documents all together

Schema-full
schema with constraints on fields and validation rules
Customer.age > 17
Customer.address not null
Customer.surname is mandatory
Customer.email matches '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'
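The rules above can be mirrored in plain Java to show what each one checks. This is a minimal sketch, not OrientDB's schema API: `CustomerValidator`, its `customer` factory and its `validate` method are hypothetical, treating the email field as optional is an assumption, and compiling the slide's regular expression (an email-address pattern) case-insensitively is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical validator mirroring the slide's Customer rules;
// it is NOT the OrientDB schema API.
final class CustomerValidator {
    // The slide's email pattern; CASE_INSENSITIVE is an assumption so that
    // lower-case addresses also match the [A-Z] character classes.
    private static final Pattern EMAIL = Pattern.compile(
            "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE);

    // Builds a customer document; null arguments leave the field out.
    static Map<String, Object> customer(Integer age, String address, String surname, String email) {
        Map<String, Object> doc = new HashMap<>();
        if (age != null) doc.put("age", age);
        if (address != null) doc.put("address", address);
        if (surname != null) doc.put("surname", surname);
        if (email != null) doc.put("email", email);
        return doc;
    }

    // Returns the violated rules; an empty list means the document is valid.
    static List<String> validate(Map<String, Object> customer) {
        List<String> errors = new ArrayList<>();
        Object age = customer.get("age");
        if (!(age instanceof Integer) || (Integer) age <= 17) errors.add("age > 17");
        if (customer.get("address") == null) errors.add("address not null");
        if (!customer.containsKey("surname")) errors.add("surname is mandatory");
        // Email is treated as optional (assumption): checked only when present.
        Object email = customer.get("email");
        if (email != null && !EMAIL.matcher(email.toString()).matches()) errors.add("email matches pattern");
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate(customer(34, "Via Roma 1", "Garulli", "luca@example.com"))); // prints []
        System.out.println(validate(customer(17, null, "Garulli", null))); // prints [age > 17, address not null]
    }
}
```

Returning the list of violated rules, rather than failing on the first one, lets a caller report every problem with a document at once.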

Schema-mixed
schema with mandatory and optional fields + constraints
the best of schema-less and schema-full modes

ACID Transactions

db.begin();
try {
  // your code
  ...
  db.commit();
} catch( Exception e ) {
  db.rollback();
}
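A transaction groups writes so that they are committed together or rolled back together. The toy, single-threaded store below illustrates only that atomicity part (not isolation or durability); `TxStore` and its `begin`/`put`/`commit`/`rollback` methods are hypothetical and are not OrientDB classes. Writes go to a private workspace and become visible only on commit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy store: writes inside a transaction are buffered in a
// private workspace, published by commit() and discarded by rollback().
final class TxStore {
    private final Map<String, String> committed = new HashMap<>();
    private Map<String, String> workspace; // null when no transaction is active

    void begin() {
        if (workspace != null) throw new IllegalStateException("transaction already active");
        workspace = new HashMap<>();
    }

    void put(String key, String value) {
        if (workspace == null) throw new IllegalStateException("no active transaction");
        workspace.put(key, value);
    }

    String get(String key) {
        // The writer sees its own uncommitted changes.
        if (workspace != null && workspace.containsKey(key)) return workspace.get(key);
        return committed.get(key);
    }

    void commit() {
        if (workspace == null) throw new IllegalStateException("no active transaction");
        committed.putAll(workspace); // publish all buffered writes together
        workspace = null;
    }

    void rollback() {
        workspace = null; // discard all buffered writes
    }

    // Runs the begin/try/commit/catch/rollback pattern; 'fail' simulates an error.
    static String demo(boolean fail) {
        TxStore db = new TxStore();
        db.begin();
        try {
            db.put("name", "Luke");
            if (fail) throw new IllegalStateException("simulated failure");
            db.commit();
        } catch (Exception e) {
            db.rollback();
        }
        return db.get("name");
    }

    public static void main(String[] args) {
        System.out.println(demo(false) + " / " + demo(true)); // prints Luke / null
    }
}
```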

Complex types
native support for collections, maps (key/value)
and embedded documents
no more additional tables to handle them

Relationships are direct links
no relational JOINs to connect multiple tables
Load trees and graphs in a few ms!

select * from employee where name like '%Jay%' and status=0

For most of the queries a programmer needs every day,
SQL is simpler, more readable and
more compact than scripting (Map/Reduce)

SELECT SUM(price) AS prices, SUM(cost) AS costs,
       SUM(price) - SUM(cost) AS margin,
       (SUM(price) - SUM(cost)) / SUM(price) AS marginPercent
  FROM Balance

function (key, values) {
   var price = 0.0, cost = 0.0, margin = 0.0, marginPercent = 0.0;
   for (var i = 0; i < values.length; i++) {
      price += values[i].price;
      cost += values[i].cost;
   }
   margin = price - cost;
   marginPercent = margin / price;
   return {
      price:         price,
      cost:          cost,
      margin:        margin,
      marginPercent: marginPercent
   };
}
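For comparison, here is the same aggregation as a plain Java reduction. It computes what both the SQL query and the reduce function compute: the sums of price and cost, their difference (margin), and margin divided by total price. The `BalanceAggregate` class and the parallel-array row layout are hypothetical simplifications.

```java
import java.util.Arrays;

// Hypothetical sketch: the Balance aggregation as a plain Java reduction.
final class BalanceAggregate {
    // Returns {price, cost, margin, marginPercent} for rows given as parallel arrays.
    static double[] aggregate(double[] prices, double[] costs) {
        if (prices.length != costs.length) throw new IllegalArgumentException("row count mismatch");
        double price = 0.0, cost = 0.0;
        for (int i = 0; i < prices.length; i++) {
            price += prices[i];
            cost += costs[i];
        }
        double margin = price - cost;
        // A ratio (0.25 means 25%), as in the reduce function.
        double marginPercent = margin / price;
        return new double[] {price, cost, margin, marginPercent};
    }

    public static void main(String[] args) {
        double[] r = aggregate(new double[] {100.0, 300.0}, new double[] {80.0, 220.0});
        System.out.println(Arrays.toString(r)); // prints [400.0, 300.0, 100.0, 0.25]
    }
}
```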
Asynchronous Query
invokes a callback when a record matches the condition
doesn't collect the result set
perfect for immediate results
useful to compute aggregates
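The callback style can be sketched in plain Java. Records are scanned one at a time, each match is pushed to a listener, and no result list is built, which is why the pattern suits running aggregates. `AsyncScan` and its `scan` method are hypothetical, and OrientDB's own asynchronous query and listener classes are not reproduced here. For simplicity the sketch runs on the caller's thread: what it shares with the slide is the push-style callback, not concurrency.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical callback-style query: each matching record is handed to
// 'onMatch' as soon as it is found; no result set is accumulated.
final class AsyncScan {
    static <T> void scan(Iterable<T> records, Predicate<T> condition, Consumer<T> onMatch) {
        for (T record : records) {
            if (condition.test(record)) onMatch.accept(record);
        }
    }

    // Aggregate computed inside the callback: total length of matching names.
    static int totalLengthOfNamesStartingWith(List<String> names, String prefix) {
        int[] total = {0}; // array so the lambda can update it
        scan(names, n -> n.startsWith(prefix), n -> total[0] += n.length());
        return total[0];
    }

    public static void main(String[] args) {
        List<String> nicks = Arrays.asList("Lvca", "Leo", "Luisa", "Jay");
        System.out.println(totalLengthOfNamesStartingWith(nicks, "L")); // prints 12
    }
}
```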

Written in Java
runs everywhere a JRE 1.5+ is available
robust engine

Language bindings
Java as native
JRuby, Scala and JavaScript ready
C, C++, Ruby, Node.js in progress

Your language is not supported (yet)?
Write an adapter using the C, Java or HTTP binding

HTTP RESTful
firewall friendly
use it from the web browser
use it from the ESB (SOA)

Native JSON
{
  "@rid": "26:10",
  "@class": "Developer",
  "name": "Luca",
  "surname": "Garulli",
  "company": "19:76"
}

Import/Export
uses JSON format
online operations (no need to stop the database)

Binary protocol
fast compressed JSON over TCP/IP
available for Java
and soon C, C++ and Ruby

RB+Tree index
the best of B+Tree and RB-Tree (red-black tree)
fast browsing, low insertion cost
it's a new algorithm (soon public)

OO Inheritance
Definition of classes of documents
Classes can extend other classes
Queries are polymorphic

                    Person
               name    : string
               surname : string
                /            \
         Customer            Provider
 orders : List<Order>   products : List<Product>
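Polymorphic queries behave like ordinary Java subclassing: selecting from `Person` also returns `Customer` and `Provider` instances. The classes below are hypothetical stand-ins for the schema classes (orders and products are simplified to strings), and `selectFrom` is an in-memory filter, not OrientDB's query engine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class hierarchy mirroring the schema above.
class Person {
    final String name, surname;
    Person(String name, String surname) { this.name = name; this.surname = surname; }
}

class Customer extends Person {
    final List<String> orders = new ArrayList<>();   // order IDs, simplified
    Customer(String name, String surname) { super(name, surname); }
}

class Provider extends Person {
    final List<String> products = new ArrayList<>(); // product IDs, simplified
    Provider(String name, String surname) { super(name, surname); }
}

final class PolymorphicQuery {
    // "select from <type>": every record that is an instance of the
    // requested class, including instances of its subclasses.
    static <T> List<T> selectFrom(Class<T> type, List<?> records) {
        List<T> result = new ArrayList<>();
        for (Object r : records) {
            if (type.isInstance(r)) result.add(type.cast(r));
        }
        return result;
    }

    static List<Person> sample() {
        return Arrays.asList(new Person("Luke", "Skywalker"),
                new Customer("Luca", "Garulli"), new Provider("Leo", "Rossi"));
    }

    public static void main(String[] args) {
        System.out.println(selectFrom(Person.class, sample()).size());   // prints 3
        System.out.println(selectFrom(Customer.class, sample()).size()); // prints 1
    }
}
```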

Hooks
similar to triggers
catch events against records, databases and transactions
implement custom cascade deletion algorithms
enforce constraints

Fetch plans
 Choose what to fetch on query and document loading
Documents not fetched will be lazy-loaded on request
   | customer
   +---------> Customer
   |            5:233
   | city             country
   +---------> City ---------> Country
   |           11:2              12:3
   | orders
   +--------->* [OrderItem OrderItem OrderItem]
                [ 8:12        8:19     8:23   ]
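Lazy loading of documents left out of a fetch plan can be sketched as a memoizing link: the target record is loaded only the first time it is dereferenced. `LazyLink` and the loader function are hypothetical. The record ID `5:233` is taken from the diagram, and a counter shows how many loads actually happen.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical lazy link: holds a record ID and loads the target on first access.
final class LazyLink<T> {
    private final String rid;
    private final Function<String, T> loader;
    private T target; // cached after the first load

    LazyLink(String rid, Function<String, T> loader) {
        this.rid = rid;
        this.loader = loader;
    }

    T get() {
        if (target == null) target = loader.apply(rid); // load on demand
        return target;
    }

    // Demo: fake storage keyed by record ID, counting real loads.
    static int loadsAfterTwoAccesses() {
        Map<String, String> storage = new HashMap<>();
        storage.put("5:233", "Customer");
        int[] loads = {0};
        LazyLink<String> customer = new LazyLink<>("5:233", id -> {
            loads[0]++;
            return storage.get(id);
        });
        customer.get();
        customer.get(); // second access uses the cached target
        return loads[0];
    }

    public static void main(String[] args) {
        System.out.println(loadsAfterTwoAccesses()); // prints 1
    }
}
```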

Security
users and roles, encrypted passwords
fine-grained privileges

4 storage modes

Embedded mode
really fast
runs in the same JVM as the application,
fewer resources, no TCP/IP used

Client/server mode
client and server are separate JVMs
thousands of concurrent clients
remote TCP/IP binary transport

Distributed mode
distributes database clusters across
multiple servers (alpha status)

In-memory mode
Database lives only in memory
       No disk is used
   Destroyed at shutdown

User API
            Document Database

Key/Value Database       Graph Database

             Object Database

Document Database
   the base of all DB implementations
   documents have dynamic structure
something like a smart Map<String,Object>
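The idea of “a smart Map<String,Object>” can be made concrete with a tiny document type: a class name plus a dynamic field map with a chained setter and a typed getter. `Doc` is a hypothetical illustration of the idea only. OrientDB's `ODocument` offers much more.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal document: a class name plus a dynamic field map.
final class Doc {
    private final String className;
    private final Map<String, Object> fields = new LinkedHashMap<>(); // keeps insertion order

    Doc(String className) { this.className = className; }

    // Chained setter: any field name, any value type, no schema required.
    Doc field(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    // Typed getter: the caller states the expected type.
    @SuppressWarnings("unchecked")
    <T> T field(String name) {
        return (T) fields.get(name);
    }

    String className() { return className; }

    @Override
    public String toString() { return className + fields; }

    public static void main(String[] args) {
        Doc city = new Doc("City").field("name", "Rome");
        Doc person = new Doc("Person").field("name", "Luke").field("surname", "Skywalker").field("city", city);
        Doc embedded = person.field("city"); // embedded document
        String cityName = embedded.field("name");
        System.out.println(cityName); // prints Rome
        System.out.println(person);   // prints Person{name=Luke, surname=Skywalker, city=City{name=Rome}}
    }
}
```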

Open the database 'demo' from a remote server
ODatabaseDocumentTx db = new ODatabaseDocumentTx( "remote:localhost/demo" );
db.open( "admin", "admin" );
try {
  …
} finally {
  db.close();
}

Create a document
ODocument doc = new ODocument( db, "Person" );
doc.field( "name", "Luke" );
doc.field( "surname", "Skywalker" );

// Relationship: the "city" field points to another document
doc.field( "city", new ODocument( "City" ).fields( "name", "Rome" ) );
doc.save();

SQL Query
List<ODocument> result = db.query(
  new OSQLSynchQuery<ODocument>( "select * from person where city.name = 'Rome'" ) );

for( ODocument d : result ) {
  System.out.println( "Person: " + d.field( "name" ) + " " + d.field( "surname" ) );
}

Native Query
List<ODocument> result = new ONativeSynchQuery<ODocument, OQueryContextNativeSchema<ODocument>>(
    db, "Profile", new OQueryContextNativeSchema<ODocument>() ) {

  public boolean filter( OQueryContextNativeSchema<ODocument> iRecord ) {
    return iRecord.field( "city" ).field( "name" ).eq( "Rome" ).and().field( "name" ).like( "G%" ).go();
  }
}.execute();

Update a document
List<ODocument> result = db.query(
  new OSQLSynchQuery<ODocument>( "select * from person where city.name = 'Rome'" ) );

for( ODocument d : result ) {
  d.field( "local", true );
  d.save();
}

// ...or with a single SQL command
int changed = db.command(
  new OSQLCommand( "update person set local = true where city.name = 'Rome'" ) ).execute();

Delete a document
List<ODocument> result = db.query(
  new OSQLSynchQuery<ODocument>( "select * from person where city.name = 'Rome'" ) );

for( ODocument d : result ) {
  d.delete();
}

// ...or with a single SQL command
int deleted = db.command(
  new OSQLCommand( "delete from person where city.name = 'Rome'" ) ).execute();

from/to JSON
System.out.println( document.toJSON() );

document.fromJSON( "{ \"@class\": \"Developer\", \"name\": \"Luca\", \"surname\": \"Garulli\" }" );

Use hooks (triggers)
public class HookTest extends ORecordHookAbstract {
  public void saveProfile() {
    ODatabaseObjectTx database = new ODatabaseObjectTx( "remote:localhost/demo" );
    database.open( "writer", "writer" );
    …
  }

  public void onRecordAfterCreate( ORecord<?> iRecord ) {
    System.out.println( "Record created successfully" );
  }
}
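A hook mechanism is essentially the observer pattern: the database keeps a list of registered hooks and calls them around record operations. The sketch below is hypothetical (`RecordHook` and `HookedStore` are not OrientDB types) and models only an after-create event.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook interface: called after a record has been created.
interface RecordHook {
    void onRecordAfterCreate(String record);
}

// Hypothetical store that notifies registered hooks, trigger-style.
final class HookedStore {
    private final List<String> records = new ArrayList<>();
    private final List<RecordHook> hooks = new ArrayList<>();

    void registerHook(RecordHook hook) { hooks.add(hook); }

    void create(String record) {
        records.add(record);
        for (RecordHook h : hooks) h.onRecordAfterCreate(record); // fire after the write
    }

    // Demo: a hook that logs every created record.
    static List<String> demo() {
        HookedStore store = new HookedStore();
        List<String> log = new ArrayList<>();
        store.registerHook(r -> log.add("Record created successfully: " + r));
        store.create("Profile#1");
        store.create("Profile#2");
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```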

Key/Value Database
bucket / key / value
HTTP RESTful protocol
Hazelcast plug-in to distribute the database

Key/Value = RB+Tree
works mainly on custom RB+Tree indexes
a sort of Map<String,Map<String,Record>>
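The bucket/key/value model maps directly onto the nested map described above. In this sketch `java.util.TreeMap` (a red-black tree) stands in for the RB+Tree indexes. `KeyValueDb` and its methods are hypothetical, and values are plain strings rather than records.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical bucket/key/value store: Map<bucket, Map<key, value>>.
// java.util.TreeMap stands in for the RB+Tree index and keeps keys sorted.
final class KeyValueDb {
    private final Map<String, TreeMap<String, String>> buckets = new TreeMap<>();

    void put(String bucket, String key, String value) {
        buckets.computeIfAbsent(bucket, b -> new TreeMap<>()).put(key, value);
    }

    String get(String bucket, String key) {
        TreeMap<String, String> b = buckets.get(bucket);
        return b == null ? null : b.get(key);
    }

    // Sorted keys make a range scan [from, to) cheap.
    SortedMap<String, String> range(String bucket, String from, String to) {
        TreeMap<String, String> b = buckets.get(bucket);
        if (b == null) return new TreeMap<>();
        return b.subMap(from, to);
    }

    static KeyValueDb sample() {
        KeyValueDb db = new KeyValueDb();
        db.put("users", "luca", "Luca Garulli");
        db.put("users", "luke", "Luke Skywalker");
        db.put("users", "mark", "Mark");
        return db;
    }

    public static void main(String[] args) {
        KeyValueDb db = sample();
        System.out.println(db.get("users", "luke"));              // prints Luke Skywalker
        System.out.println(db.range("users", "l", "m").keySet()); // prints [luca, luke]
    }
}
```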

Object Database
wrapper on top of the Document Database
binds POJOs to and from the database
no OR-mapping complexity
no enhancement, no Java proxies

POJO mapping
uses reflection to bind POJO fields
caches reflection metadata at start-up
1-to-1 binding
options configurable via @annotations
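A reflection-based 1-to-1 binding can be sketched with `java.lang.reflect.Field`. Every declared field of a POJO is read into a map (the document side) and written back by name, and the per-class field cache mirrors the point about caching reflection metadata. `PojoBinder` is hypothetical, and annotations, collections and references are ignored.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reflection binder: POJO fields <-> Map<String, Object>, matched 1-to-1 by name.
final class PojoBinder {
    // Reflection metadata is computed once per class and then reused.
    private static final Map<Class<?>, Field[]> CACHE = new HashMap<>();

    private static Field[] fieldsOf(Class<?> type) {
        return CACHE.computeIfAbsent(type, t -> {
            Field[] fields = t.getDeclaredFields();
            for (Field f : fields) f.setAccessible(true);
            return fields;
        });
    }

    // POJO -> document: copy every non-static field.
    static Map<String, Object> toDocument(Object pojo) {
        Map<String, Object> doc = new LinkedHashMap<>();
        try {
            for (Field f : fieldsOf(pojo.getClass())) {
                if (!Modifier.isStatic(f.getModifiers())) doc.put(f.getName(), f.get(pojo));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    // Document -> POJO: set each non-static field whose name appears in the document.
    static void fromDocument(Map<String, Object> doc, Object pojo) {
        try {
            for (Field f : fieldsOf(pojo.getClass())) {
                if (!Modifier.isStatic(f.getModifiers()) && doc.containsKey(f.getName())) {
                    f.set(pojo, doc.get(f.getName()));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Example POJO with private fields and no annotations.
    static final class Person {
        private String name;
        private String surname;
    }

    // Round trip: POJO -> document -> fresh POJO.
    static String roundTripSurname() {
        Person original = new Person();
        original.name = "Luke";
        original.surname = "Skywalker";
        Person copy = new Person();
        fromDocument(toDocument(original), copy);
        return copy.surname;
    }

    public static void main(String[] args) {
        System.out.println(roundTripSurname()); // prints Skywalker
    }
}
```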

Open the database
ODatabaseObjectTx db = new ODatabaseObjectTx( "remote:localhost/demo" );
db.open( "admin", "admin" );
try {
  …
} finally {
  db.close();
}
Same usage as the Document Database, but the class is different

Create a persistent POJO
Person person = new Person();
person.setName( "Luke" );
person.setSurname( "Skywalker" );
person.setCity( new City( "Rome" ) );
db.save( person );

Polymorphic SQL Query
List<Person> result = db.query(
  new OSQLSynchQuery<Person>( "select from person where city.name = 'Rome'" ) );

// Queries are polymorphic: subclasses of Person (e.g. Customer)
// can be part of the result set
for( Person p : result ) {
  if( p instanceof Customer )
    System.out.println( "Customer: " + p.getName() + " " + p.getSurname() );
}

Graph Database
 wrapper on top of Document Database
Few simple concepts: Vertex, Edge,
       Property and Index

Example of a Graph

TinkerPop technologies
  sort of “standard” for GraphDB
 a lot of free open-source projects

TinkerPop Blueprints
basic API to interact with GraphDB
   implements transactional and
 indexable property graph model
        bidirectional edges

GraphDB & Blueprints API

OrientGraph graph = new OrientGraph( "local:/tmp/db/graph" );

Vertex actor = graph.addVertex(null);
actor.setProperty("name", "Leonardo");
actor.setProperty("surname", "Di Caprio");

Vertex movie = graph.addVertex(null);
movie.setProperty("name", "Inception");

Edge edge = graph.addEdge(null, actor, movie, "StarredIn");
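The property-graph model behind the Blueprints calls above fits in a few Java classes: vertices and edges both carry properties, and every edge has a label and two ends and can be walked in either direction. This is a hypothetical in-memory sketch, not the Blueprints or OrientDB implementation. The class names only echo the Blueprints concepts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory property graph with bidirectional edges.
final class TinyGraph {
    static final class Vertex {
        final Map<String, Object> properties = new HashMap<>();
        final List<Edge> outEdges = new ArrayList<>();
        final List<Edge> inEdges = new ArrayList<>();
    }

    static final class Edge {
        final Vertex out, in; // the edge goes from 'out' to 'in'
        final String label;
        final Map<String, Object> properties = new HashMap<>();
        Edge(Vertex out, Vertex in, String label) { this.out = out; this.in = in; this.label = label; }
    }

    final List<Vertex> vertices = new ArrayList<>();

    Vertex addVertex() {
        Vertex v = new Vertex();
        vertices.add(v);
        return v;
    }

    // Registers the edge on both ends so it can be walked in either direction.
    Edge addEdge(Vertex out, Vertex in, String label) {
        Edge e = new Edge(out, in, label);
        out.outEdges.add(e);
        in.inEdges.add(e);
        return e;
    }

    // Demo mirroring the slide: who starred in "Inception"?
    static String starOfInception() {
        TinyGraph g = new TinyGraph();
        Vertex actor = g.addVertex();
        actor.properties.put("name", "Leonardo");
        actor.properties.put("surname", "Di Caprio");
        Vertex movie = g.addVertex();
        movie.properties.put("name", "Inception");
        g.addEdge(actor, movie, "StarredIn");
        // Walk backwards from the movie through its incoming edge.
        Edge e = movie.inEdges.get(0);
        return e.label + ": " + e.out.properties.get("name");
    }

    public static void main(String[] args) {
        System.out.println(starOfInception()); // prints StarredIn: Leonardo
    }
}
```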


Gremlin
scripting language,
easy to learn and understand
used for operations against graphs

Graph example

Load graph
Run the console, open the database and load a graph in XML format

marko:~/software/gremlin$ ./

         (o o)

gremlin> $_g := orientdb:open('/tmp/graph/test')

gremlin> g:load('data/graph-example-1.xml')

gremlin> $_g
Display the outgoing edges of the vertices whose name equals 'marko',
then the names of the vertices those edges point to (inV)

gremlin> g:key-v('name','marko')/outE

gremlin> g:key-v('name','marko')/outE/inV/@name

gremlin> g:close()
API summary
object, key/value and graph elements all work on top of the Document Database

you can always access the underlying document

changes to the document are reflected in the
object, key/value and graph elements
and vice versa
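Changes propagate in both directions when every view is a thin wrapper that reads and writes the same underlying field map instead of copying it. A hypothetical sketch with two such views, a graph-style `VertexView` and an object-style `PersonView` (neither is an OrientDB class):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: two views sharing one underlying document map.
final class SharedViews {
    // Graph-style view: properties are read and written straight through to the document.
    static final class VertexView {
        private final Map<String, Object> doc;
        VertexView(Map<String, Object> doc) { this.doc = doc; }
        void setProperty(String key, Object value) { doc.put(key, value); }
        Object getProperty(String key) { return doc.get(key); }
    }

    // Object-style view: typed accessors over the same map.
    static final class PersonView {
        private final Map<String, Object> doc;
        PersonView(Map<String, Object> doc) { this.doc = doc; }
        String getName() { return (String) doc.get("name"); }
        void setName(String name) { doc.put("name", name); }
    }

    static String demo() {
        Map<String, Object> document = new HashMap<>();
        VertexView vertex = new VertexView(document);
        PersonView person = new PersonView(document);
        vertex.setProperty("name", "Luca");     // write through the graph view
        String seenByObject = person.getName(); // visible in the object view
        person.setName("Luke");                 // write through the object view
        return seenByObject + "," + vertex.getProperty("name") + "," + document.get("name");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints Luca,Luke,Luke
    }
}
```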

Cluster of distributed server nodes
synchronous, asynchronous and read-only
load balancing between clients ↔ servers and servers ↔ servers

Synchronous scenario
     Server #1 owns all the data, used for reads/writes
   Server #2 is the backup replica, can be used for reads
      Clients receive ack only when both are updated
           Server #1 and #2 are Always   Consistent

Client A         Client B            Client A        Client B

    Server #1                              Server #2
    (Leader+Owner)                         (Synchronous)

            DB                                  DB

Asynchronous scenario
As for synchronous, but:
Server #2 is eventually consistent
clients receive the ack as soon as Server #1 is updated

Client A        Client B                    Client A        Client B

    Server #1                                   Server #2
    (Leader+Owner)                             (Asynchronous)

           DB                                          DB
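The difference between the synchronous and asynchronous scenarios is when the client gets its acknowledgement. In the hypothetical sketch below, the leader applies the write and then either copies it to the replica before returning (synchronous) or queues it for later delivery (asynchronous). `ReplicatedPair` is not OrientDB code. Real replication also involves networking, logging and failure handling.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical leader + replica pair illustrating when the client is acked.
final class ReplicatedPair {
    final Map<String, String> leader = new HashMap<>();   // Server #1
    final Map<String, String> replica = new HashMap<>();  // Server #2
    private final Queue<String[]> pending = new ArrayDeque<>(); // asynchronous backlog
    private final boolean synchronous;

    ReplicatedPair(boolean synchronous) { this.synchronous = synchronous; }

    // Returning from write() is the moment the client would receive its ack.
    void write(String key, String value) {
        leader.put(key, value);
        if (synchronous) {
            replica.put(key, value);                // ack only after both are updated
        } else {
            pending.add(new String[] {key, value}); // ack now; the replica catches up later
        }
    }

    // Delivers queued changes: the replica becomes (eventually) consistent.
    void propagate() {
        String[] change;
        while ((change = pending.poll()) != null) replica.put(change[0], change[1]);
    }

    // What a reader of the replica sees right after the ack.
    static String replicaReadAfterAck(boolean synchronous) {
        ReplicatedPair pair = new ReplicatedPair(synchronous);
        pair.write("customer", "Luca");
        return pair.replica.get("customer");
    }

    static String replicaReadAfterPropagate() {
        ReplicatedPair pair = new ReplicatedPair(false);
        pair.write("customer", "Luca");
        pair.propagate();
        return pair.replica.get("customer");
    }

    public static void main(String[] args) {
        System.out.println(replicaReadAfterAck(true));   // prints Luca
        System.out.println(replicaReadAfterAck(false));  // prints null (not propagated yet)
        System.out.println(replicaReadAfterPropagate()); // prints Luca
    }
}
```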

Server #1 logs changes while Server #2 is disconnected
Transparent client switch to healthy servers (alpha status)
Running transactions will be repeated transparently (v0.9.26)

Client A        Client B             Client A        Client B

     Server #1                           Server #2
    (Leader+Owner)                       (Asynchronous)

           DB                                   DB

Mixed scenario
Use Server #1 and #2 for cases when strict consistency is required
Use Server #3 for cases when eventual consistency is acceptable

                             Server #1
 Server #2                                                    Server #3
 (Synchronous)                               update-delay=0   (Asynchronous)

      DB                                                           DB

Cluster-level granularity
Place the “owner” close to the clients to reduce latency
[Diagram: Server Main and Server USA, each owning one cluster (Europe, USA) and replicating it to the other asynchronously (update-delay=0).]

Real world scenario I
Distribute data across multiple sites
Play with sync/async + delay
Keep synchronous copies close and propagate asynchronously
[Diagram: Server Farm Europe with Server Main (Owner) and Main Copy (Synchronous); Server China (Owner) with China Copy (Synchronous); Server USA with USA Copy (Synchronous); asynchronous propagation between sites; customer groups in the USA, Europe and Asia.]

Real world scenario II
Put server nodes in a chain
Load-balance requests
[Diagram: Server Main and Server Copy; Server China (Asynchronous) feeding the read-only nodes China East, China West and China North; Server USA (Asynchronous) feeding read-only nodes such as USA South. Updating a European customer propagates the change along the chain.]
Choose the best strategy
for your use case

Client A → Server Main (Leader-Owner): consistent reads, direct writes
Client B → Server Copy: consistent reads, delegated writes
Client C → Server China (Asynchronous): eventually consistent reads, delegated writes
Client D → China North (Read-only): eventually consistent reads, no writes
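The four client profiles (A to D) above follow from the role of the node each client talks to. A hypothetical sketch of that mapping, where `Routing`, `NodeRole` and `WriteMode` are invented names: writes are applied directly only on the owner, forwarded to the owner from synchronous and asynchronous copies, and refused on read-only nodes, while reads are strictly consistent only on the owner and its synchronous copy.

```java
// Hypothetical mapping from a node's role to what its clients get.
final class Routing {
    enum NodeRole { OWNER, SYNCHRONOUS_COPY, ASYNCHRONOUS_COPY, READ_ONLY }

    enum WriteMode { DIRECT, DELEGATED_TO_OWNER, REJECTED }

    static boolean consistentReads(NodeRole role) {
        // Only the owner and its synchronous copies are always up to date.
        return role == NodeRole.OWNER || role == NodeRole.SYNCHRONOUS_COPY;
    }

    static WriteMode writeMode(NodeRole role) {
        switch (role) {
            case OWNER:
                return WriteMode.DIRECT;
            case SYNCHRONOUS_COPY:
            case ASYNCHRONOUS_COPY:
                return WriteMode.DELEGATED_TO_OWNER;
            default:
                return WriteMode.REJECTED; // READ_ONLY
        }
    }

    public static void main(String[] args) {
        for (NodeRole r : NodeRole.values()) {
            System.out.println(r + ": consistent reads=" + consistentReads(r) + ", writes=" + writeMode(r));
        }
    }
}
```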

Enhanced SQL
SQL is not enough for collections, maps, trees and graphs:
the SQL syntax needs to be enhanced
Easy syntax derived from the JDO/JPA standards

SQL & relationships
select from Account where = 'Italy'

select from Account where addresses contains ( = 'Italy')

SQL & trees/graphs
select from Profile where friends traverse(0,7) ( sex = 'female' )

        (Soon new specific operators for trees and graphs)
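One way to read `friends traverse(0,7) ( sex = 'female' )` is as a breadth-first walk along the `friends` links, testing every profile reached between depth 0 (the starting record) and depth 7, and succeeding if any of them satisfies the condition. The sketch below encodes that reading as an assumption. The exact OrientDB semantics (for example whether depth 0 includes the starting record) may differ, and `Traverse` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical reading of "friends traverse(min,max) (condition)":
// breadth-first search over 'friends' links, testing nodes at depth min..max.
final class Traverse {
    static boolean matches(String start, Map<String, List<String>> friends,
                           int minDepth, int maxDepth, Predicate<String> condition) {
        Queue<String> queue = new ArrayDeque<>();
        Map<String, Integer> depth = new HashMap<>();
        queue.add(start);
        depth.put(start, 0);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            int d = depth.get(current);
            if (d >= minDepth && condition.test(current)) return true;
            if (d == maxDepth) continue; // do not expand beyond maxDepth
            for (String next : friends.getOrDefault(current, Collections.emptyList())) {
                if (!depth.containsKey(next)) { // visit each profile once (handles cycles)
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return false;
    }

    // Sample data: luca -> leo -> luisa (female), with a cycle back to luca.
    static boolean demo(int maxDepth) {
        Map<String, List<String>> friends = new HashMap<>();
        friends.put("luca", Arrays.asList("leo"));
        friends.put("leo", Arrays.asList("luisa", "luca"));
        Map<String, String> sex = new HashMap<>();
        sex.put("luca", "male");
        sex.put("leo", "male");
        sex.put("luisa", "female");
        return matches("luca", friends, 0, maxDepth, p -> "female".equals(sex.get(p)));
    }

    public static void main(String[] args) {
        System.out.println(demo(7)); // prints true (luisa is reached at depth 2)
        System.out.println(demo(1)); // prints false
    }
}
```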

SQL & strings
select from Profile where name.toUpperCase() = 'LUCA'

select from City where,3).toUpperCase() = 'TAL'

select from Agenda where phones contains ( number.indexOf( '+39' ) > -1 )

select from Agenda where email matches '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'

SQL & conversions
select from Shapes where area.toFloat() > 3.14

select from Agenda where birthDate.toDateTime() > '1976-10-26 07:00:00'

select from Workflow where completed.toBoolean() = true

SQL & schema-less
select from Profile where any() like '%Jay%'

select from Stock where all() is not null
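In a schema-less class the field names are not known in advance, so `any()` and `all()` apply the condition to every field of the record. A hypothetical sketch over a record modeled as a `Map<String, Object>`. Treating a record with no fields as satisfying `all()` (vacuous truth) is an assumption, and `likeJay` only approximates `LIKE '%Jay%'` with a substring test.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical any()/all() over the fields of a schema-less record.
final class AnyAll {
    static boolean any(Map<String, Object> record, Predicate<Object> condition) {
        return record.values().stream().anyMatch(condition);
    }

    static boolean all(Map<String, Object> record, Predicate<Object> condition) {
        return record.values().stream().allMatch(condition); // true for an empty record
    }

    // Rough equivalent of LIKE '%Jay%' on a single value.
    static boolean likeJay(Object value) {
        return value != null && value.toString().contains("Jay");
    }

    static Map<String, Object> sample() {
        Map<String, Object> profile = new HashMap<>();
        profile.put("nick", "Jay");
        profile.put("city", "Rome");
        profile.put("age", null); // field present but empty
        return profile;
    }

    public static void main(String[] args) {
        System.out.println(any(sample(), AnyAll::likeJay));         // prints true
        System.out.println(all(sample(), value -> value != null)); // prints false
    }
}
```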

SQL & collections
select from Tree where children contains ( married = true )

select from Tree where children containsAll ( married = true )

select from User where roles containsKey 'shutdown'

select from Graph where edges.size() > 0
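The collection operators differ in their quantifier. `contains (cond)` is true when at least one element satisfies the condition, `containsAll (cond)` when every element does, and `containsKey` checks the keys of a map. A hypothetical Java rendering with streams; `CollectionOps` is an invented name and the element types are simplified to maps.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical Java rendering of the SQL collection operators.
final class CollectionOps {
    // Condition used in the examples: ( married = true ).
    static final Predicate<Map<String, Object>> MARRIED =
            child -> Boolean.TRUE.equals(child.get("married"));

    // "children contains (cond)": at least one element matches.
    static <T> boolean contains(Collection<T> items, Predicate<T> cond) {
        return items.stream().anyMatch(cond);
    }

    // "children containsAll (cond)": every element matches.
    static <T> boolean containsAll(Collection<T> items, Predicate<T> cond) {
        return items.stream().allMatch(cond);
    }

    // "roles containsKey 'shutdown'": the map has that key.
    static boolean containsKey(Map<String, ?> map, String key) {
        return map.containsKey(key);
    }

    // Sample children: one married, one not.
    static List<Map<String, Object>> children() {
        Map<String, Object> a = new HashMap<>();
        a.put("married", true);
        Map<String, Object> b = new HashMap<>();
        b.put("married", false);
        return Arrays.asList(a, b);
    }

    public static void main(String[] args) {
        System.out.println(contains(children(), MARRIED));    // prints true
        System.out.println(containsAll(children(), MARRIED)); // prints false
        System.out.println(containsKey(Collections.singletonMap("shutdown", "allowed"), "shutdown")); // prints true
    }
}
```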

SQL & documents
select from Vehicle where @class = 'Car'

select from Friend where @version > 100

select from File where @size > 1000000

ORIENT database v.0.9.23
Type 'help' to display all the commands supported.

> connect remote:localhost/demo admin admin
Connecting to database [remote:localhost/demo] with user 'admin'...OK

> select from profile where nick.startsWith('L')
  #| REC ID |NICK                |SEX                 |AGE                 |
  0|    10:0|Lvca                |male                |34
  1|    10:3|Leo                 |male                |22
  2|    10:7|Luisa               |female              |27
3 item(s) found. Query executed in 0.013 sec(s).

> close
Disconnecting from the database [demo]...OK

> quit
OrientDB Studio/SQL query

                 Resultset is editable and
                 changes are immediately

OrientDB Studio/db structure

                      Physical structure
                         of database

OrientDB Studio/server profiler

                     Statistics and timing are
                      collected in real-time

Always free
Open source, Apache 2 license
free for any purpose,
even commercial

Support, training, consulting, mentoring
by a network of companies through Orient Technologies

OrientDB for Java developers: 8 hours
OrientDB Master Development: 14 hours
OrientDB for SOA: 6 hours
OrientDB and the power of graphs: 6 hours
OrientDB for DBA: 6 hours
OrientPlanet for Web Developers: 6 hours

Certification Program
to be part of the network:
deliver courses
share revenue from support
work as a consultant


Luca Garulli
Author of the OrientDB and Roma <Meta> Framework open source projects
Member of JSR#12 (JDO 1.0) and JSR#243 (JDO 2.0)
CTO at Asset Data and Orient Technologies
Technical Manager at Romulus consortium
@Rome, Italy

