On Storing Big Data
Ilias Flaounas
Intelligent Systems Lab
30 October 2012
I. Flaounas (Intelligent Systems Lab) 30 October 2012 1 / 16
Storing Big Data
Data play an increasingly important role in business and
science.
Storing, searching, sharing, analysing and visualising big data has
become a challenge.
Storage in particular is often overlooked as an issue.
Note that sometimes a MySQL database is not enough.
Hadoop offers an out-of-the-box distributed filesystem for storing data
files. However, the challenge appears when someone needs DB
capabilities, frequent updates or real-time processing.
The Problems
Traditional relational databases can reach their performance
limits.
Data keep arriving at high velocity, in high volume, and with high variety.
Common practices to increase performance fail after a while: buying a
faster server, getting more RAM, using materialised views, fine-tuning
queries...
Furthermore, “ALTER TABLE” becomes impractical on very large tables.
Backups and data availability become an issue.
NoSQL Movement
The term is too broad and too new to define precisely.
Wikipedia: “NoSQL (Not only SQL) DB systems are often highly
optimized for retrieve and append operations and often offer little
functionality beyond record storage.”
No schema
No joins between tables
No common query language (like SQL)
No ACID (atomicity, consistency, isolation, durability)
On the other hand, you gain horizontal scalability and high performance.
Also, most NoSQL systems are Map/Reduce ready and/or integrate with
Hadoop.
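To make the Map/Reduce idea concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop's API or that of any particular database; the documents, the "Tags" field and the function names are assumptions chosen for illustration. It counts how often each tag occurs across a set of documents.

```python
from collections import defaultdict

# Hypothetical documents; the "Tags" field is illustrative only.
docs = [
    {"Title": "a", "Tags": ["News", "Scotland"]},
    {"Title": "b", "Tags": ["News"]},
    {"Title": "c", "Tags": ["International", "News"]},
]

def map_doc(doc):
    """Map step: emit one (tag, 1) pair per tag in a document."""
    for tag in doc.get("Tags", []):
        yield tag, 1

def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def map_reduce(documents):
    # Shuffle: group emitted values by key before reducing.
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_counts(key, values) for key, values in grouped.items()}

tag_counts = map_reduce(docs)
```

In a real cluster the map calls would run in parallel on the machines that hold the data, and only the grouped pairs would travel over the network.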
NoSQL DBs
There are many different systems under the NoSQL ‘umbrella’. Each one
is optimised with different application scenarios in mind, and makes different
trade-offs.
Document based: CouchDB, MongoDB,...
Key-value: Cassandra, Dynamo, Riak,...
Tabular based: BigTable, HBase,...
Memory based: Memcached, Redis, others optimised for solid-state
disks...
Specialised for graphs: Neo4j, InfiniteGraph,...
Specialised for full-text search: Lucene, Solr...
Understand your requirements and then make a choice.
Oracle response
May 2011: Oracle issues a white paper titled “Debunking the NoSQL
Hype”.
The conclusion:
“Go for the tried and true path. Don’t be risking your data on NoSQL
databases.”
October 2011: Oracle releases the “Oracle NoSQL Database”. The white
paper is now reachable only via Google archives.
Example: MongoDB
MongoDB (from “humongous”) is an open source, high-performance,
schema-free, document-oriented database.
Document-Oriented storage
No predefined schema
High Performance
Easy to add new “columns” (fields) to data “rows” (documents)
No joins between tables
Easy to scale horizontally: Auto-Sharding
Automatic fail-over: invisible to applications
Full Index Support
Map/Reduce ready - Can bind with Hadoop
Eventually consistent when reading from secondaries
Open source, but developed and maintained by the company “10gen”
Document based DB
A document is represented in a JSON-like format:
{
  "_id" : 12345678,
  "Link" : "http://news.scotsman.com/abc.html",
  "Title" : "Blah blah blah",
  "Content" : "More blah blah",
  "OutletID" : 14,
  "Date" : ISODate("2011-11-17T20:33:15.097Z"),
  "_Hash" : 550973592,
  "Tags" : [ "International", "News", "Scotland" ]
}
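A rough way to see what "no predefined schema" buys you is to treat a collection as a list of Python dictionaries. The sketch below uses only the standard library, not a MongoDB driver; the collection contents, the "Language" field and the add_field helper are hypothetical. It adds a new field to one document without touching the others and without any table-wide migration.

```python
import json

# Hypothetical in-memory "collection": a list of documents (dicts).
collection = [
    {"_id": 1, "Title": "First article", "OutletID": 14},
    {"_id": 2, "Title": "Second article", "OutletID": 14, "Tags": ["News"]},
]

def add_field(documents, doc_id, field, value):
    """Add a field to one document only; other documents are untouched."""
    for doc in documents:
        if doc["_id"] == doc_id:
            doc[field] = value
            return doc
    raise KeyError(doc_id)

updated = add_field(collection, 1, "Language", "en")

# Each document may now carry a different set of fields.
fields_per_doc = [sorted(doc) for doc in collection]

# Documents made of plain values serialise directly to JSON.
serialised = json.dumps(collection[0], sort_keys=True)
```

The price of this flexibility is that the application, not the database, has to cope with documents that lack a given field.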
Single Server
A single machine stores the DB, e.g. MySQL.
Master/Slave
Two machines in Master/Slave configuration.
MongoDB - Replication
Automatic failover: the master is elected among the servers.
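The election idea can be sketched in a few lines of Python. This is a deliberately simplified model and not MongoDB's actual election protocol: the Member class, the last_write counter and the tie-breaking rule are assumptions for illustration. A new master is chosen only if a strict majority of members is reachable, and the member with the most recent data wins.

```python
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    reachable: bool
    last_write: int  # hypothetical counter of the last replicated write

def elect_master(members):
    """Return the name of the new master, or None if no majority is reachable."""
    alive = [m for m in members if m.reachable]
    # Require a strict majority so two isolated groups cannot both elect a master.
    if len(alive) * 2 <= len(members):
        return None
    # Prefer the member with the most recent data; break ties by name.
    best = max(alive, key=lambda m: (m.last_write, m.name))
    return best.name

members = [
    Member("db1", reachable=False, last_write=120),  # the failed master
    Member("db2", reachable=True, last_write=118),
    Member("db3", reachable=True, last_write=119),
]
new_master = elect_master(members)
```

The majority rule is why replica sets are usually deployed with an odd number of voting members.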
MongoDB - Sharding
Data is partitioned horizontally across servers (shards).
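One simple way to picture horizontal partitioning is to route every document to a shard based on a hash of its key. The Python sketch below is an assumption-laden illustration: the shard names and helper functions are made up, and hashing modulo the number of shards is a simplification, since MongoDB itself divides the shard-key space into chunks and assigns chunks to shards.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

def shard_for(key, shards=SHARDS):
    """Map a shard-key value to one shard using a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

def distribute(documents, key_field="_id"):
    """Group documents by the shard their key hashes to."""
    placement = {shard: [] for shard in SHARDS}
    for doc in documents:
        placement[shard_for(doc[key_field])].append(doc)
    return placement

docs = [{"_id": i, "OutletID": i % 5} for i in range(1000)]
placement = distribute(docs)
sizes = {shard: len(items) for shard, items in placement.items()}
```

Because the hash is stable, a read for a given key can go straight to the one shard that holds it instead of querying every machine.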
MongoDB
If a new shard is added, data is rebalanced automatically.
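Automatic balancing can be pictured as moving chunks of data from the fullest shard to the emptiest one until the counts are even. The sketch below is a toy model in Python, not MongoDB's balancer: the chunk names, the cluster layout and the "differ by at most one" stopping rule are assumptions for illustration.

```python
def rebalance(chunks_by_shard):
    """Move chunks from the fullest to the emptiest shard until counts differ by at most 1.

    Returns the list of migrations as (chunk, source, destination) tuples.
    """
    migrations = []
    while True:
        fullest = max(chunks_by_shard, key=lambda s: len(chunks_by_shard[s]))
        emptiest = min(chunks_by_shard, key=lambda s: len(chunks_by_shard[s]))
        if len(chunks_by_shard[fullest]) - len(chunks_by_shard[emptiest]) <= 1:
            return migrations
        chunk = chunks_by_shard[fullest].pop()
        chunks_by_shard[emptiest].append(chunk)
        migrations.append((chunk, fullest, emptiest))

# Two existing shards holding six hypothetical chunks each.
cluster = {
    "shard0": [f"chunk{i}" for i in range(6)],
    "shard1": [f"chunk{i}" for i in range(6, 12)],
}
cluster["shard2"] = []  # a new, empty shard joins the cluster
moves = rebalance(cluster)
counts = {shard: len(chunks) for shard, chunks in cluster.items()}
```

Only four of the twelve chunks have to move here, which is the point of balancing by chunk rather than redistributing every document.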
MongoDB
No single point of failure; reads and writes are distributed.
Big Data come with Big Problems
Maintenance of infrastructure: it is easier to manage one server than
ten
Need to adapt legacy software
Training people on the new technologies
Designing DB – splitting data among machines for maximum I/O
Bugs may be present, ‘simple’ features may be missing, and new versions
come out too often...
Security
Thank you!