Introduction to NoSQL
(Not Only SQL)
By Antonio Castellón :: Multi-Disciplinary Engineer – Computer Science
May, 2015 - for Philip Morris International R&D
Problem : Data Complex
Problem : Data Complex to Model
Problem : Dynamic Data ( Uncertainty )
End-user requirements and the data itself sometimes generate
different types of uncertainty
The NoSQL Jungle
Data – NoSQL – Different implementations
CURRENTLY
+150
Data - NoSQL – Comparing data structure
Image from: http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
Data - NoSQL – Compare
98% of the business
requirements
There are still billions of
nodes and relationships
Data - NoSQL – Keys to fit
               Key-value store   Column store   Document store    Graph database
Performance    High              High           High              Variable
Scalability    High              High           Variable (High)   Variable
Flexibility    High              Moderate       High              High
Complexity     None              Low            Low               High
Functionality  Variable (None)   Minimal        Variable (Low)    Graph Theory
Data – Our selection
Graph
Databases
Data – Graph Databases – Why?
Flexible data structure
It doesn't matter if the relationships change in the future.
Closer match to business logic
Data – Graph Databases – Why?
Natural query system
You say what you want, not how to get it.
with recursive cluster (party, path, depth) as (
    select cast(@userId as character varying),
           cast(@userId as character varying),
           1
    union
    (
        select (case
                    when this.party = amc.userA then amc.userB
                    when this.party = amc.userB then amc.userA
                end),
               (this.path || '.' || (case
                    when this.party = amc.userA then amc.userB
                    when this.party = amc.userB then amc.userA
                end)),
               this.depth + 1
        from cluster this, chat amc
        where ((this.party = amc.userA and position(amc.userB in this.path) = 0)
            or (this.party = amc.userB and position(amc.userA in this.path) = 0))
          and this.depth < @depth + 1
    )
)
select party, path
from cluster
where not exists (
    select *
    from cluster c2
    where cluster.party = c2.party
      and (
          char_length(cluster.path) > char_length(c2.path)
          or (char_length(cluster.path) = char_length(c2.path)) and (cluster.path > c2.path)
      )
)
order by party, path;
SQL = several hours to execute
VS
START b = node:User(UserId='Manolo')
MATCH (b)--(friend)--(friendoffriend)
RETURN count(friendoffriend)
Cypher Language = 635ms
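As a rough illustration of what the Cypher pattern above asks for, the following Python sketch counts two-hop paths (start, friend, friend of friend) over an in-memory adjacency map. It is not how Neo4j executes the query; the helper names and the sample relationships are hypothetical, and only the user 'Manolo' comes from the slide.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def count_two_hop_paths(adjacency, start):
    """Count paths start -- friend -- friend_of_friend.

    The relationship used for the first hop is not reused for the
    second hop, so a path never steps straight back to the start node.
    Paths are counted, not distinct nodes, so a person reachable through
    two different friends is counted twice.
    """
    count = 0
    for friend in adjacency[start]:
        for friend_of_friend in adjacency[friend]:
            if friend_of_friend != start:
                count += 1
    return count


# Hypothetical chat relationships, invented for this sketch.
edges = [
    ("Manolo", "Ana"),
    ("Manolo", "Luis"),
    ("Ana", "Eva"),
    ("Luis", "Eva"),
    ("Ana", "Luis"),
]
graph = build_adjacency(edges)
print(count_two_hop_paths(graph, "Manolo"))
```

The point of the sketch is that the traversal only touches the neighbours of the start node and their neighbours, whereas the recursive SQL version has to build and filter path strings over the whole chat table.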
Data - Graph Databases – Why?
Fits very well with complex data
Data - Graph Databases – Why?
Fits very well with Bio-Informatics
0.9 billion
relationships
Data – Graph Databases – Why?
Fast Prototyping and development
We don't need to spend much time defining a fine-grained schema up front.
Data - Graph Databases – What is it?
Properties
Labels
Relationships
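The three building blocks listed above can be sketched as a minimal property-graph model in Python. This is an illustrative assumption, not the internal representation of any particular graph database; the class names, the CHATS_WITH relationship type and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node carries labels (its roles) and free-form properties."""
    node_id: int
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed link between two nodes, with its own properties."""
    start: Node
    end: Node
    rel_type: str
    properties: dict = field(default_factory=dict)


def nodes_with_label(nodes, label):
    """Return the nodes that carry the given label."""
    return [node for node in nodes if label in node.labels]


# Hypothetical sample data for the sketch.
manolo = Node(1, {"User"}, {"UserId": "Manolo"})
ana = Node(2, {"User", "Admin"}, {"UserId": "Ana"})
chat = Relationship(manolo, ana, "CHATS_WITH", {"messages": 12})

# New properties can be added later without changing any schema.
ana.properties["team"] = "Lab"

print([node.properties["UserId"] for node in nodes_with_label([manolo, ana], "User")])
```

Because properties are plain key/value pairs attached to each node or relationship, adding a new attribute or a new relationship type does not require a schema migration.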
Data - Graph Databases - Implemented by …
Data - Graph Databases – Top 3
Name: OrientDB
  API: Java Traverser API, Blueprints, Rexster
  Query Methods: Own SQL-like Query Language, Gremlin
  Consistency: ACID, MVCC
  Staff (people) / Community: 3 / Low

Name: Neo4j
  API: Java, Python, JPython, Ruby, JRuby, JavaScript (Node.js), PHP, .NET, Django, Clojure, Spring, Scala, or REST (any language)
  Query Methods: Cypher (native/preferred), Native Java APIs (special cases), Traverser API, REST, Blueprints, Gremlin
  Consistency: ACID
  Staff (people) / Community: 42 / Very High

Name: DEX
  API: Java, C++, .NET
  Query Methods: Native Java, C# and C++ APIs, Blueprints, Gremlin
  Consistency: Consistency, durability and partial isolation and atomicity
  Staff (people) / Community: 5 / ?
Data - Graph Databases - Neo4j customers
Data - Graph Database - Neo4j - Partners
End
Thank you for your attention.

Editor's Notes

  • #3 Data is complex by definition: there are too many relationships between different nodes and different domains.
  • #4 Fitting the "real" world into a standard entity-relationship model is a nightmare, and it is a source of errors if something needs to change in the future (introducing new properties, new objects, new relationships, etc.).
  • #5 The important thing in any design is to capture at least 99% of the user requirements correctly, but this is impossible if the users generate uncertainty for different reasons (and also when there are different users with different domains or points of view).
  • #7 Each solution makes sense in its own right: it was developed to solve a specific problem, and this is exactly what the architect needs to analyze. What are the constraints and structures of your data, and what do you want to do with them? Only after this reasoning will you be able to choose between the different solutions on the market.
  • #8 Not all NoSQL databases are the same, and not all of them fit your requirements/expectations.
  • #9 NoSQL is not a silver bullet, but it fits very well with Big Data.
  • #10 Again, you need to evaluate your type of data and what kind of analysis you want to do with it.
  • #11 It's a complement. This technology appeared several years ago, but in recent years it has been driven by requirements for scalability, clustering and performance.
  • #12 The data model follows the thinking of the domain expert (e.g. lab people) rather than that of the IT expert. Good reference: http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
  • #13 http://www.slideshare.net/ayeeson/0221-cypher-for-sql-professionals