Objectivity/DB: A Multipurpose NoSQL Database

Objectivity/DB - A Multi-Purpose
NoSQL Database

Leon Guzenda and Lenny Hoffman

Lunch and Learn - December 12, 2013
© Objectivity, Inc. 2013

Agenda
• Objectivity
• NoSQL?
• Objectivity/DB
• Typical Configurations
• Placement Manager


Objectivity, Inc.
• A privately held, profitable company founded in 1988 to
provide object database software for engineering, scientific
and defense applications.
• Defence and Intelligence Community customers include:
US DoD/DoE/DHS, BaE, Boeing, General Dynamics,
L-3, Lockheed Martin, NGC, Raytheon, Titan...
• Equipment manufacturing customers include: Alcatel, Ciena,
Emerson, Ericsson, Iridium, NEC, Qualcomm, Siemens...
• “Big Science” customers include Berkeley NL, Brookhaven NL, CERN,
Lawrence Livermore NL, FermiLab, JHU Sloan Digital Sky Survey, Los
Alamos NL, Max Planck...


Typical Deployments...
Objectivity/DB is used to
aggregate knowledge and
disseminate it back into
the field

HUMINT


...Typical Deployments...
Astronomy


...Typical Deployments

Zeus Anesthesia

Base Station Controller

Iridium LEO Network

High Energy Physics

Process Control

ODBMS Evolution
• 1980s
– “Performance, Performance, Performance!”
– Primarily scientific and engineering applications

• 1990s
– Embedded applications
– Reliability and Scalability
– New languages and Operating Systems

DATA
MANIPULATION
Applications tended
to generate data and
relationships

– Large deployments in the scientific domain

• 2000s

RELATIONSHIP
ANALYTICS

– Ease of use and instrumentation

– Query languages

Applications ingest
and correlate data
and relationships

– Performance and scalability
– Grids and Clouds
– Embedded systems, government and more...


High Ingest Systems
• 2000
– System handles 1 billion objects per day
– Correlates them with existing knowledge
– Raises alerts when new information is found
– Services about 100 analysts at one agency

• 2010
– System handles about 50 trillion objects every day
– Services 1000+ analysts at six agencies

• 2013
– A multi-sensor platform collects 2 Petabytes per day...


NoSQL?


Not Only SQL – A group of 4 primary technologies
Object Database

Key-Value Store

Highly
Interconnected

Simple
Leon Guzenda at ICOODB 2010


NoSQL?
•All of these features could have been obtained
from “Commercial Off The Shelf” ODBMSs:
• Unique object/document IDs

• Flexible object clustering
• Effectivity (data in a relationship)
Shared-Nothing
• Geospatial and multi-dimensional indexing
Fully distributed
• Hash table (key-value) lookups
“No-lock” and novel transaction modes
• Hyperspace = = single logical view of a federation
Iterators and fast, predictable traversals
• Multi-way replication
Fast scans
• High Availability
Optimization for random access
• Text searching

• Sharding
•
•
•
•
•
•

• In Memory Database configurability

Source: ICOODB 2011

Not Only SQL – Is There Actually Anything New?
Other NoSQL

Objectivity/DB

Rediscovered Database Technologies

Scalable Collections:
• Hash Maps
• Plus Trees, Collections, B-Tree
indices and fast navigation

Key-Value Store

Named Groups of Objects:
• Document, Volume, Chapter,
Paragraph, Sentence,
Word…
• Or JSON, XML etc.

Objectivity/SQL++:
• SQL with object extensions
• Every row has a ROWID


OBJECTIVITY/DB


Objects
•
•
•
•
•
•
•
•
•

Accessible via a unique identifier (OID)
Can be linked by named relationships
Can be indexed in multiple ways
Can be included in multiple collections
Can be named in multiple scopes
Can be dynamically clustered
Can be versioned
Can be transient or persistent
Interoperable across multiple languages
(including SQL++) on multiple platforms

Architectural Features
• Single Logical View
• Simple, Distributed Servers
• Replaceable Components (plug-ins)


SINGLE LOGICAL VIEW


SINGLE LOGICAL VIEW

Single Logical View of Millions of Exabytes


Distributed Architecture
Client

Simple, Distributed Servers

Application

Lock Server
Lock Server

Smart
Cache

Lock Server
Query Server
Lock Server
Data Server

Objectivity/DB

Enhances scalability and availability

Parallel Query Engine
Application

Lock Server
Lock Server

Smart
Cache

Lock Server
Query Server
Query Server

Objectivity/DB

PQE

Lock Server
Data Server

Task
Splitter

The Task Splitter aims queries at specific
databases and containers

Filter/Gat
eway

Filters can run complex qualification methods.
Gateways can access other databases or search engines.

Replaceable components for smarter optimization

Replaceable Components


Faster Navigation
PROBLEM: Find all (N) of the Suspects linked to a chosen Incident

Incident_Table

Relational solution:
N * 2 B-Tree lookups
N * 2 logical reads
B-Trees


Join_Table

Suspect_Table

Faster Navigation
PROBLEM: Find all (N) of the Suspects linked to a chosen Incident

Incident_Table

Join_Table

Suspect_Table

Relational solution:
N * 2 B-Tree lookups
N * 2 logical reads
B-Trees

Incident

Objectivity/DB solution:
1
B-Tree lookup
1 + N logical reads

7


Suspects

...Relationship Analytics...
Relational Database

Try constructing the SQL query to find the shortest path between these two rows


...Relationship Analytics...
Relational Database

Object Database

With Objectivity/DB it’s a
few lines of code

Typical Configurations


Configuration Considerations
Processor Power

System Latency

Input/Output Volumes

Concurrency

Data Locality

Data Usage (random/scan,
volatility, query profiles...)

Storage media limitations

Geography and Mobility

Network Bandwidth

Security

Caching

Availability


Standalone

Multithreaded process(es) run on a laptop
or workstation using the local disk(s).


Network of Workstations


In-Memory Database

The reference data is read into a large
cache that is preserved across
transactions, eliminating costly I/Os.


Client-Server


ODBC Server

Allows industry standard tools to access the
federation.


Service-Oriented Architecture

Particularly useful where the link between
the client and the data is very slow.


Parallel Query Engine


High Performance Cluster
BaBAR Project
• “Data Intensive

Computing”
• 2000 CPU Farm
• 40 Terabytes of disk
• 1 Petabyte of tapes

First “public” project to
build a Petabyte+ database


High Availability Cluster


Geographically Distributed


Grid/Cloud Deployment

Grid/Cloud


Logger/Archiver


Hybrid…


…Hybrid
Sources

Ingest

Correlation & Analytics


Visual Analytics

Actions

Flexibility Rules!
• The Single Logical View, Simple Distributed
Servers, Replaceable Components and Small
Footprint.....
• ... Provide Many Deployment Options


Flexibility Rules!
• The Single Logical View, Simple Distributed
Servers, Replaceable Components and Small
Footprint.....
• ... Provide Many Deployment Options
Let's look at physical placement of data in more detail.


www.Objectivity.com

Flexible
Placement
Lenny Hoffman
Chief Architect, Objectivity

Placement
• The vast majority of DBMSs place data
– On disk using some type of organization.
– On different disks.
– On different nodes.

• A lot of what differentiates DBMSs is how they place
data.
• The way data is placed greatly affects the applicability
of the DBMS for certain models (e.g. relational,
document, graph) as well as different use cases.
– Grouping by sets is good for relational, but not so good for
document or graph.

Model Influences Placement
• Most DBMSs are model specific:
– Relational – Key-Value – Document – Graph

• They naturally layout (place) data in ways that best support
those models and the use cases they tend to attract.
– Relational DBMSs tend to place like type data together as that
performs best given their set based access.
– Key-Value DBMSs tend to place data according to key values
(hashed), as that makes looking up by key fast.
– Document DBMSs tend to place data in hierarchies that represent
documents to make working with a document fast.
– Graph DBMSs tend to place data according to its connectivity with
other data as that makes navigation fast.

Placement Configurability
• Performance is the main reason behind placement
choices. Orders of magnitude differences can occur
between efficient and inefficient data placements.
• DBMSs place data based on their understanding and
experience with the use cases that fit with the model
and their target markets.
• Knowing that they can’t in a general way place data
that is best for every application they usually make
placement configurable to some degree.

Typical Placement Configurability
• DBMSs usually provide configurability within the
general framework of placement they do:
– Table spaces, Fragmenting, partitioning, sharding.
– Node assignments.
– Limited clustering (e.g. across joins).

• Most do not provide the ability to change the
framework, in whole or in part.
– When you model or use data differently.
– When some of your data does not fit well with general
assumptions made by the framework.

Fully Configurable Placement
• Model independent.
• Choosing which objects (e.g. records/rows) are placed together
(clustering).
– Because they are going to be used together.

• Which ones are placed separately.
– Because they are going to be used separately and may be distributed
separately.

• Which ones you base a primary, secondary, tertiary, etc. organization on.
• Having different organizations for different data.
– Different types of data.
– The same types but used differently.
– Data used in different modes, e.g. demo vs. production.

• Distributing data based on what is used together and locality of use.

Placement Model
• The Placement Model is the means by which a database designer
expresses a placement design and what is used by the
Placement Manager when placing objects or helping the query
system look them up.
• A Placement Model primarily consists of rules and placers:
– Rule -- expresses the conditions that are to lead to a particular object
placer.
– Object Placer -- responsible for placing objects into containers and
keeping track of where they where placed.
– Container Placer -- responsible for placing containers into databases
and keeping track of where they were placed.
– Database Placer -- responsible for placing database files (including
those for its externalized containers) into storage locations (host/path
combinations) and keeping track where they were placed.

Place According to Connectivity

Place According to Related (Partition)

Place According to Value (Partition)
A

D

3

D

E

8

F

H

12

J

K

55

P

Q

106

S

Z

197

Placement Summary
• Placement has a huge influence on performance.
• You can mix and match all possible placement
arrangements.
• You can use different placement schemes for different parts
of your data.
• We place it, therefore we find it (efficiently).
• It is easy to try different placement schemes to see what
works best for you.
• You can have different placement schemes for demo vs.
production, etc.
• You can evolve your placement as models are versioned.

Thank you!
Q&A
Follow us on twitter: @infinitegraph
and @objectivitydb

Objectivity/DB: A Multipurpose NoSQL Database

More Related Content

Viewers also liked

Similar to Objectivity/DB: A Multipurpose NoSQL Database

More from InfiniteGraph

Recently uploaded

Objectivity/DB: A Multipurpose NoSQL Database