Copyright © 2011 LOGTEL
NoSQL (big data)
Samuel Dratwa
Samuel.dratwa@gmail.com
Agenda
• Big Data / NoSQL – technology aspect
• How big is BIG?
• The motivation behind NoSQL
• CAP theorem
• Partitions / fragmentation
• The different NoSQL models
  - Key Value
  - Column-Based
  - Document store
  - Big Table
  - Graph
• The NoSQL way of thinking (using graphs)
• Big Data – applications (what can we do with it?)
It’s hype (!)
Big Data Definition
• No single standard definition…
“Big Data” is data whose scale, diversity, and
complexity require new architecture,
techniques, algorithms, and analytics to
manage it and extract value and hidden
knowledge from it…
NoSQL humor
http://geekandpoke.typepad.com/
10 GB? 10 TB? 10 PB?
How big is BIG ?
The 4 V’s
Characteristics of Big Data:
1-Scale (Volume)
• Data Volume
• 44x increase from 2009 to 2020
• From 0.8 zettabytes to 35 ZB
• Data volume is increasing exponentially
Exponential increase in
collected/generated data
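The two volumes quoted above are consistent with the "44x" figure (35 / 0.8 = 43.75) and imply a compound growth of roughly 41% per year, which is what "exponential" means here. A small sketch of that arithmetic; the class and method names are illustrative only:

```java
// Implied growth behind the "44x" figure: 0.8 ZB (2009) -> 35 ZB (2020).
public class DataGrowth {

    // Ratio between the end and start volumes.
    static double growthFactor(double startZb, double endZb) {
        return endZb / startZb;
    }

    // Constant yearly rate r such that start * (1 + r)^years == end.
    static double annualGrowthRate(double startZb, double endZb, int years) {
        return Math.pow(endZb / startZb, 1.0 / years) - 1.0;
    }

    public static void main(String[] args) {
        double factor = growthFactor(0.8, 35.0);
        double rate = annualGrowthRate(0.8, 35.0, 2020 - 2009);
        System.out.printf("factor = %.2fx, yearly growth = %.1f%%%n", factor, rate * 100);
    }
}
```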
The 4 V’s
Characteristics of Big Data:
2-Complexity (Variety)
• Various formats, types, and
structures
• Text, numerical, images, audio,
video, sequences, time series, social
media data, multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be
generating/collecting many types of
data
The 4 V’s
Characteristics of Big Data:
3-Speed (Velocity)
• Data is being generated fast and needs to be processed fast
• Online data analytics
• Late decisions → missing opportunities
• Examples
  - E-Promotions: based on your current location, your purchase history and what you like → send promotions right now for the store next to you
  - Healthcare monitoring: sensors monitor your activities and body → any abnormal measurement requires an immediate reaction
The 4 V’s
Who’s Generating Big Data
Social media and networks
(all of us are generating data)
Scientific instruments
(collecting all sorts of data)
Mobile devices
(tracking all objects all the time)
Sensor technology and
networks
(measuring all kinds of data)
• Progress and innovation are no longer hindered by the ability to collect data
• But by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion
The Model Has
Changed…
• The Model of Generating/Consuming Data has Changed
Old Model: Few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming
data
NoSQL motivation
Why Now?
• Explosion of social media sites (Facebook, Twitter) with large data needs
• Explosion of storage needs in large web sites such as Google and Yahoo
  - Much of the data is not files
• Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
• Shift to dynamically-typed data with frequent schema changes
• Open-source community
Parallel Databases and Data Stores
• Relational databases – mainstay of business
• Web-based applications caused load spikes
  - Especially true for public-facing e-commerce sites
• Many application servers, one database
  - Easy to parallelize application servers to 1000s of servers, harder to parallelize databases to the same scale
• First solution: memcached (in-memory) or other caching mechanisms to reduce database access
Scaling Up
• What if the dataset is huge and the number of transactions per second is very high?
• Use multiple servers to host the database
  - ‘scaling out’ or ‘horizontal scaling’
• Parallel databases have been around for a while
  - But they are expensive, and designed for decision support, not OLTP (Online Transaction Processing)
Scaling RDBMS – Master/Slave
• Master-Slave
  - All writes are written to the master; all reads are performed against the replicated slave databases
  - Good for read-mostly applications with very few updates
  - Critical reads may return stale data, as writes may not yet have propagated to the slaves
  - Large data sets can pose problems, as the master needs to duplicate data to the slaves
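A toy model of the stale-read problem described above, assuming asynchronous replication: a write lands on the master immediately but reaches the slave only when the pending replication log is applied, so a read in between sees the old value. All names here are hypothetical; real replication is far more involved.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Master/slave replication with asynchronous propagation (toy model).
public class MasterSlaveDemo {

    private final Map<String, String> master = new HashMap<>();
    private final Map<String, String> slave = new HashMap<>();
    // Writes accepted by the master but not yet applied on the slave.
    private final Queue<Map.Entry<String, String>> replicationLog = new ArrayDeque<>();

    // All writes go to the master; propagation to the slave happens later.
    void write(String key, String value) {
        master.put(key, value);
        replicationLog.add(Map.entry(key, value));
    }

    // All reads are served by the slave, which may lag behind the master.
    String read(String key) {
        return slave.get(key);
    }

    // Simulates the slave catching up with the master.
    void replicate() {
        Map.Entry<String, String> e;
        while ((e = replicationLog.poll()) != null) {
            slave.put(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        MasterSlaveDemo db = new MasterSlaveDemo();
        db.write("balance:42", "100");
        System.out.println("before replication: " + db.read("balance:42")); // null (stale read)
        db.replicate();
        System.out.println("after replication:  " + db.read("balance:42")); // 100
    }
}
```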
Scaling RDBMS - Partitioning
• Partitioning
  - Divide the database across many machines
  - E.g. hash or range partitioning
  - Handled transparently by parallel databases, but they are expensive
• “Sharding”
  - Divide data amongst many cheap databases (MySQL/PostgreSQL)
  - Manage parallel access in the application
  - Scales well for both reads and writes
  - Not transparent: the application needs to be partition-aware
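A minimal sketch of the partition-aware routing a sharded application has to do itself, contrasting hash and range partitioning. The shard names and the two-shard split on the first letter are assumptions for illustration.

```java
import java.util.List;

// Application-side shard routing (hypothetical shard names).
public class ShardRouter {

    private final List<String> shards;

    ShardRouter(List<String> shards) {
        this.shards = shards;
    }

    // Hash partitioning: spreads keys evenly across all shards,
    // but a range query (e.g. all names from A to C) must ask every shard.
    String shardByHash(String key) {
        return shards.get(Math.floorMod(key.hashCode(), shards.size()));
    }

    // Range partitioning on the first character, assuming exactly two shards:
    // keys sorting at or before 'M' go to the first shard, the rest to the second.
    // Neighbouring keys stay together, but a popular range can become a hot spot.
    String shardByRange(String key) {
        char first = Character.toUpperCase(key.charAt(0));
        return shards.get(first <= 'M' ? 0 : 1);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("mysql-0", "mysql-1"));
        for (String name : List.of("Katja", "Mark", "Samuel", "Sylvia")) {
            System.out.println(name + " -> hash: " + router.shardByHash(name)
                    + ", range: " + router.shardByRange(name));
        }
    }
}
```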
What is NoSQL?
• Stands for Not Only SQL
• Class of non-relational data storage systems
  - E.g. BigTable, Dynamo, PNUTS/Sherpa, ..
• Usually do not require a fixed table schema, nor do they use the concept of joins
• All NoSQL offerings relax one or more of the ACID properties (we will discuss this with the CAP theorem)
• Not a backlash/rebellion against RDBMS
• SQL is a rich query language that cannot be rivaled by the current list of NoSQL offerings
NoSQL Data Storage: Classification
• NoSQL solutions fall into 4 major areas:
• Uninterpreted key/value or ‘the big hash table’
  - Amazon S3 (Dynamo)
  - Voldemort
  - Scalaris
• Column-based, with interpreted keys
  - Cassandra, BigTable, HBase, Sherpa/PNuts
• Others
  - CouchDB (document-based)
  - Neo4J (graph-based)
NoSQL ecosystem
Big Data Landscape 2015
Architecture
MapReduce
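As a rough illustration of the MapReduce model, here is the classic word-count example run in a single process. The map, shuffle and reduce phases mirror what a framework such as Hadoop distributes across many machines, but this is a sequential sketch, not Hadoop's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Word count in the MapReduce style, run sequentially in one process.
public class WordCount {

    // Map: one input line -> list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle: group the emitted values by key (the framework does this between phases).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce: combine all values collected for one key.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));   // in a cluster, map tasks run in parallel
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, counts) -> result.put(word, reduce(counts)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Big data is big", "NoSQL is not only SQL")));
        // {big=2, data=1, is=2, nosql=1, not=1, only=1, sql=1}
    }
}
```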
ACID
Atomic: Either the whole process of a
transaction is done or none is.
Consistency: Database constraints
(application-specific) are preserved.
Isolation: It appears to the user as if only
one process executes at a time. (Two
concurrent transactions will not see one
another’s transaction while “in flight”.)
Durability: The updates made to the
database in a committed transaction will be
visible to future transactions. (Effects of a
process do not get lost if the system crashes.)
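A toy illustration of atomicity (together with a consistency constraint) using an in-memory undo log: if any step of a transfer fails, every value it touched is restored, so either the whole transfer happens or none of it does. This is a teaching simplification; the class and the "no negative balance" rule are hypothetical and do not describe how any particular database implements transactions.

```java
import java.util.HashMap;
import java.util.Map;

// Atomicity sketch: a transfer updates both balances or neither.
public class AtomicTransfer {

    final Map<String, Integer> balances = new HashMap<>();

    // Assumes both accounts already exist in the map.
    void transfer(String from, String to, int amount) {
        // Undo log: the values this transaction is about to change.
        Map<String, Integer> undoLog = new HashMap<>();
        undoLog.put(from, balances.get(from));
        undoLog.put(to, balances.get(to));
        try {
            balances.put(from, balances.get(from) - amount);
            if (balances.get(from) < 0) {
                // Hypothetical consistency constraint: no negative balances.
                throw new IllegalStateException("insufficient funds");
            }
            balances.put(to, balances.get(to) + amount);
        } catch (RuntimeException e) {
            balances.putAll(undoLog);   // roll back every touched value
            throw e;
        }
    }

    public static void main(String[] args) {
        AtomicTransfer db = new AtomicTransfer();
        db.balances.put("alice", 100);
        db.balances.put("bob", 0);
        db.transfer("alice", "bob", 30);          // commits
        try {
            db.transfer("alice", "bob", 500);     // fails half-way and is rolled back
        } catch (IllegalStateException e) {
            System.out.println("rolled back: " + e.getMessage());
        }
        System.out.println(db.balances.get("alice") + " / " + db.balances.get("bob")); // 70 / 30
    }
}
```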
CAP Theorem
• Three properties of a system
  - Consistency (all copies have the same value)
  - Availability (the system can run even if parts have failed)
  - Partitions (the network can break into two or more parts, each with active systems that can’t talk to other parts)
• Brewer’s CAP “Theorem”: you can have at most two of these three properties for any system
• Very large systems will partition at some point
  - So they must choose between consistency and availability
  - Traditional databases choose consistency
  - Most Web applications choose availability
    - Except for specific parts such as order processing
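The trade-off above can be sketched as a toy replica that, once cut off from its peers, either refuses to answer (choosing consistency, as order processing would) or keeps answering with possibly stale data (choosing availability, as a social feed might). The class, enum and scenario are hypothetical illustrations, not a real database API.

```java
// Toy CAP trade-off: one replica that may be cut off from its peers.
public class CapReplica {

    enum Mode { CONSISTENT, AVAILABLE }   // CP or AP behaviour during a partition

    private final Mode mode;
    private final String lastKnownValue;  // what this replica saw before the partition
    private boolean partitioned;          // true while peers are unreachable

    CapReplica(Mode mode, String lastKnownValue) {
        this.mode = mode;
        this.lastKnownValue = lastKnownValue;
    }

    void setPartitioned(boolean partitioned) {
        this.partitioned = partitioned;
    }

    // CP: refuse to answer rather than risk returning a stale value.
    // AP: always answer, even though other replicas may hold a newer value.
    String read() {
        if (partitioned && mode == Mode.CONSISTENT) {
            throw new IllegalStateException("unavailable: cannot confirm the latest value");
        }
        return lastKnownValue;
    }

    public static void main(String[] args) {
        CapReplica orders = new CapReplica(Mode.CONSISTENT, "order#7=PAID");
        CapReplica feed = new CapReplica(Mode.AVAILABLE, "posts=42");
        orders.setPartitioned(true);
        feed.setPartitioned(true);
        System.out.println("feed answers: " + feed.read());
        try {
            orders.read();
        } catch (IllegalStateException e) {
            System.out.println("orders refuse: " + e.getMessage());
        }
    }
}
```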
The reminder
Dial 1-800-remind ......
• Available, Consistent – not Partitioned
  - Not available ...
• Available, Partitioned – not Consistent
• Consistent, Partitioned – not Available
The proof…
Availability
• Traditionally, availability was thought of as the server/process being available “five 9’s” (99.999%) of the time.
• However, in a system with a large number of nodes, at almost any point in time there’s a good chance that a node is either down or there is a network disruption among the nodes.
• We want a system that is resilient in the face of network disruption.
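To make the "good chance that a node is down" claim concrete: assuming nodes fail independently and each is up with probability p, the chance that at least one of N nodes is down at a given moment is 1 - p^N. The per-node availability and cluster size below are illustrative assumptions.

```java
// Probability that at least one node in a cluster is down at a given moment,
// assuming independent failures and the same availability for every node.
public class ClusterAvailability {

    static double probabilitySomeNodeDown(double perNodeAvailability, int nodes) {
        return 1.0 - Math.pow(perNodeAvailability, nodes);
    }

    public static void main(String[] args) {
        // Illustrative numbers: 1,000 commodity nodes, each up 99.9% of the time.
        System.out.printf("%.1f%%%n", 100 * probabilitySomeNodeDown(0.999, 1000));
    }
}
```

With 1,000 nodes at 99.9% each this prints about 63.2%: some node is down most of the time, even though each individual node is highly available.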
Eventual Consistency
• When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
• For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
• Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
  - Soft state: copies of a data item may be inconsistent
  - Eventually consistent: copies become consistent at some later time if there are no more updates to that data item
BASE in Cassandra
[Figure: the client sends a query to the Cassandra cluster; the closest replica (Replica A) returns the result, while Replicas B and C receive digest queries and send back digest responses; a read repair is performed if the digests differ]
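A simplified sketch of the read-repair flow in the Cassandra diagram: the full value comes from the closest replica and only digests from the others; a mismatch triggers a repair that pushes the newest version (by write timestamp) back to every replica. `Objects.hash` stands in for Cassandra's real digest, and coordinators, consistency levels and tombstones are ignored; the names are hypothetical.

```java
import java.util.List;
import java.util.Objects;

// Simplified read repair: full read from the closest replica, digest reads
// from the others, and a write-back of the newest version on mismatch.
public class ReadRepair {

    static final class Replica {
        String value;
        long timestamp;   // write timestamp; the newest write wins

        Replica(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        // Stand-in for a real digest (a hash of the stored data).
        int digest() {
            return Objects.hash(value, timestamp);
        }
    }

    static String read(Replica closest, List<Replica> others) {
        boolean digestsDiffer = false;
        for (Replica r : others) {
            if (r.digest() != closest.digest()) {
                digestsDiffer = true;
            }
        }
        if (!digestsDiffer) {
            return closest.value;                 // all replicas agree
        }
        // Mismatch: compare full versions and keep the newest one...
        Replica newest = closest;
        for (Replica r : others) {
            if (r.timestamp > newest.timestamp) {
                newest = r;
            }
        }
        String value = newest.value;
        long timestamp = newest.timestamp;
        // ...then write it back to every replica (the read repair).
        closest.value = value;
        closest.timestamp = timestamp;
        for (Replica r : others) {
            r.value = value;
            r.timestamp = timestamp;
        }
        return value;
    }

    public static void main(String[] args) {
        Replica a = new Replica("v1", 1);   // closest replica, stale
        Replica b = new Replica("v2", 2);   // holds the latest write
        Replica c = new Replica("v1", 1);
        System.out.println(read(a, List.of(b, c)));    // v2
        System.out.println(a.value + " " + c.value);   // v2 v2
    }
}
```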
Common Advantages
• Cheap, easy to implement (open source)
• Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned
  - When data is written, the latest version is on at least one node and is then replicated to other nodes
  - Down nodes are easily replaced
  - No single point of failure
• Easy to distribute
• No schema required
What am I giving up?
• joins
• group by
• order by
• ACID transactions
• SQL as a sometimes frustrating but still powerful query language
• easy integration with other applications that support SQL
Distributed Key-Value Data Stores
• Distributed key-value data storage systems allow key-value pairs to be stored (and retrieved by key) in a massively parallel system
  - E.g. Google BigTable, Yahoo! Sherpa/PNUTS, Amazon Dynamo, ..
• Partitioning, high availability etc. are completely transparent to the application
• Sharding systems and key-value stores don’t support many relational features
  - No join operations (except within a partition)
  - No referential integrity constraints across partitions
  - etc.
Flexible Data Model
Rockets
Key | Value
1 | name: Rocket-Powered Roller Skates; toon: Ready, Set, Zoom; inventoryQty: 5; brakes: false
2 | name: Little Giant Do-It-Yourself Rocket-Sled Kit; toon: Beep Prepared; inventoryQty: 4; brakes: false
3 | name: Acme Jet Propelled Unicycle; toon: Hot Rod and Reel; inventoryQty: 1; wheels: 1
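The Rockets table can be modeled as a key-value store whose values are schema-less attribute maps: records 1 and 2 carry `brakes` while record 3 carries `wheels` instead. A sketch using plain Java maps, not any particular NoSQL product's API:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// The "Rockets" example as a key-value store with schema-less values.
public class Rockets {

    static Map<Integer, Map<String, Object>> load() {
        Map<Integer, Map<String, Object>> store = new TreeMap<>();
        store.put(1, record("Rocket-Powered Roller Skates", "Ready, Set, Zoom", 5, "brakes", false));
        store.put(2, record("Little Giant Do-It-Yourself Rocket-Sled Kit", "Beep Prepared", 4, "brakes", false));
        store.put(3, record("Acme Jet Propelled Unicycle", "Hot Rod and Reel", 1, "wheels", 1));
        return store;
    }

    // Helper: the attributes every record happens to share, plus one record-specific attribute.
    private static Map<String, Object> record(String name, String toon, int qty,
                                              String extraKey, Object extraValue) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        r.put("toon", toon);
        r.put("inventoryQty", qty);
        r.put(extraKey, extraValue);
        return r;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, Object>> rockets = load();
        // Lookup is by key only; no schema says which attributes a value has.
        System.out.println(rockets.get(3).get("wheels"));   // 1
        System.out.println(rockets.get(3).get("brakes"));   // null: attribute absent
    }
}
```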
HBase
Google BigTable / HBase data model
• Tables are sorted by row
• The table schema only defines its column families
  - Each family consists of any number of columns
  - Each column consists of any number of versions
  - Columns only exist when inserted; NULLs are free
  - Columns within a family are sorted and stored together
• Everything except table names is byte[]
• (Row, Family:Column, Timestamp) → Value
[Figure: row key, column family, timestamp and value]
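A minimal in-memory sketch of the (Row, Family:Column, Timestamp) → Value model using nested sorted maps, with `String` standing in for `byte[]`. The class name and the sample row and column names are hypothetical; this mimics the logical model only, not HBase's client API or its storage format.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Logical BigTable/HBase model: (row, family:column, timestamp) -> value.
// Strings stand in for byte[]; rows, columns and versions are kept sorted.
public class SparseTable {

    // row -> "family:column" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String familyColumn, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(familyColumn, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Latest version of a cell; null if the column was never written
    // (absent columns take no space at all: "NULLs are free").
    String get(String row, String familyColumn) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(familyColumn)) {
            return null;
        }
        return columns.get(familyColumn).firstEntry().getValue();
    }

    public static void main(String[] args) {
        SparseTable t = new SparseTable();
        t.put("com.example.www", "contents:html", 1L, "<html>v1</html>");
        t.put("com.example.www", "contents:html", 2L, "<html>v2</html>");
        t.put("com.example.www", "anchor:home", 1L, "Example");
        System.out.println(t.get("com.example.www", "contents:html")); // <html>v2</html>
        System.out.println(t.get("com.example.www", "anchor:other"));  // null
    }
}
```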
Splunk – document-based
Splunk – log analysis
PNUTS Data Storage Architecture
[Figure: a hash ring running from 0 to 1 (1/2 marked) with nodes A–F; keys are placed on the ring at h(key1) and h(key2), with N=3]
Partitioning And Replication
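The ring in the figure illustrates consistent hashing, the scheme described for Amazon's Dynamo: nodes and keys are hashed onto the same ring, each key is stored on the first node clockwise from h(key) and replicated on the next N-1 nodes. A sketch with nodes A–F and N=3 as in the figure; the hash function (a 32-bit mix of `String.hashCode`) is an assumption, and real systems also use virtual nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Consistent-hashing ring with N-way replication (Dynamo-style sketch).
public class HashRing {

    // Ring position -> node name, kept sorted so we can walk "clockwise".
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    HashRing(List<String> nodes) {
        for (String node : nodes) {
            ring.put(position(node), node);
        }
    }

    // Illustrative hash: mixes String.hashCode (murmur3 finalizer), kept non-negative.
    static int position(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h & 0x7fffffff;
    }

    // Preference list: the first n nodes found walking clockwise from h(key).
    List<String> replicasFor(String key, int n) {
        int start = position(key);
        List<String> walk = new ArrayList<>(ring.tailMap(start, true).values());
        walk.addAll(ring.headMap(start, false).values());   // wrap around past the top
        return walk.subList(0, Math.min(n, walk.size()));
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(List.of("A", "B", "C", "D", "E", "F"));
        System.out.println("key1 -> " + ring.replicasFor("key1", 3));
        System.out.println("key2 -> " + ring.replicasFor("key2", 3));
    }
}
```

Adding or removing a node only moves the keys between that node and its neighbours on the ring, which is why this scheme suits clusters where nodes come and go.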
Should I be using NoSQL Databases?
• For almost all of us, regular relational databases are THE correct solution
• NoSQL data storage systems make sense for applications that need to deal with very large semi-structured data
  - Log analysis
  - Social networking feeds
Graph in practice
(thanks to Luca Garulli)
...how to think «graphically» with
one of the most common domains
in the enterprise world:
the classic CRM* domain
* today an RDBMS is used in 99% of cases
Let’s take a real example – CRM
Every developer knows
the Relational Model(?),
but who knows the
Graph one?
Back to school:
Graph Theory crash course
(Sam) --Likes--> (NoSQL lecture)
Basic Graph
(Sam {name: Samuel, surname: Dratwa, company: SADOT}) --Likes {since: 2012}--> (NoSQL Lecture {editions: [Comverse, Tel-Aviv]})
• Vertices and edges can have properties
• Edges are directed
Property Graph Model*
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
(Sam) --> (NoSQL lecture)
An edge connects 2 vertices: use multiple edges to represent 1-N and N-M relationships
Edges - Arcs
[Figure: a graph of the vertices Sam, Avital, Doron and "NoSQL lecture", connected by edges labeled Likes, FriendOf and Joins]
Congratulations,
this is your diploma in
«Graph Theory»
Registry system: Customer, Address
Order system: Order, Stock
Domain: minimal CRM
How does a Relational DBMS
manage relationships?
JOIN Customer.Address -> Address.Id
Customer
Id Name Address
10 Samuel 34
11 Katja 44
34 Sylvia 54
56 Mark 66
88 Steve 68
Address
Id Location
34 Rome, London
44 Cologne
54 Rome
66 New Mexico
68 Palo Alto
Relational World: 1-1 Relationships
Inverse JOIN Address.Customer -> Customer.Id
Customer
Id Name
10 Samuel
11 Katja
34 Sylvia
56 Mark
88 Steve
Address
Id Customer Location
24 10 Rome
33 10 London
44 34 Rome
66 11 Cologne
68 88 Palo Alto
Relational World: 1-N Relationships
Additional table with 2 JOINs
(1) CustomerAddress.Id -> Customer.Id and
(2) CustomerAddress.Address -> Address.Id
Customer
Id Name
10 Samuel
11 Katja
34 Sylvia
56 Mark
88 Steve
Address
Id Location
24 Rome
33 London
44 Rome
66 Cologne
68 Palo Alto
CustomerAddress
Id Address
10 24
10 33
34 24
Relational World: N-M Relationships
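A sketch of how the N-M example is resolved relationally: every traversal from a customer to its locations goes through the CustomerAddress link table and then a lookup on Address.Id (a `HashMap` stands in for the index). The data is copied from the tables above; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Relational N-M traversal over the example tables.
public class RelationalJoin {

    static final Map<Integer, String> CUSTOMER = Map.of(
            10, "Samuel", 11, "Katja", 34, "Sylvia", 56, "Mark", 88, "Steve");
    static final Map<Integer, String> ADDRESS = Map.of(
            24, "Rome", 33, "London", 44, "Rome", 66, "Cologne", 68, "Palo Alto");
    // CustomerAddress rows: {Id (customer), Address}
    static final int[][] CUSTOMER_ADDRESS = {{10, 24}, {10, 33}, {34, 24}};

    static List<String> locationsOf(String customerName) {
        List<String> locations = new ArrayList<>();
        for (int[] row : CUSTOMER_ADDRESS) {
            // JOIN (1): CustomerAddress.Id -> Customer.Id
            if (customerName.equals(CUSTOMER.get(row[0]))) {
                // JOIN (2): CustomerAddress.Address -> Address.Id, one index lookup per row
                locations.add(ADDRESS.get(row[1]));
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        System.out.println(locationsOf("Samuel"));   // [Rome, London]
        System.out.println(locationsOf("Sylvia"));   // [Rome]
    }
}
```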
What’s wrong with the
Relational Model?
These are all JOINs executed every time you traverse a relationship!
[Figure: the Customer, Address and CustomerAddress tables of the N-M example, with every link between them marked as a JOIN]
The JOIN is the evil!
Why not JOIN
• A JOIN means searching for a key in another table
• The first rule to improve performance is
indexing all the keys
• An index speeds up searches but slows down inserts, updates and deletes
• So in the best case a JOIN is a lookup in an index
• This is done per single join!
• If you traverse hundreds of relationships
you’re executing hundreds of JOINs
Index Lookup
is it really that fast?
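[Figure: a balanced tree of alphabetical ranges: A-Z splits into A-L and M-Z; A-L into A-D and E-L; M-Z into M-R and S-Z; A-D into A-B and C-D; E-L into E-G and H-L; E-G into E-F and G; H-L into H-J and K-L]
Index algorithms are all similar and based on balanced trees
Index Lookup: how does it work?
Think of an Address Book where we have to find Samuel’s phone number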
[Figure: the same balanced tree, descended level by level until the entry is found]
Found!
Each lookup takes X steps, where X grows with the index size!
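A quick way to see the X steps growing with index size: binary search over a sorted array halves the candidate range at each step, just like descending a balanced tree, so the number of steps grows as log2(N). A sketch, with binary search used as a stand-in for a real B-tree index:

```java
// Counts the comparisons needed to find a key by halving the search range,
// as a balanced-tree index does on each level.
public class LookupSteps {

    static int stepsToFind(int[] sortedKeys, int key) {
        int lo = 0, hi = sortedKeys.length - 1, steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] == key) {
                return steps;
            }
            if (sortedKeys[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return steps; // key not present
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000}) {
            int[] keys = new int[n];
            for (int i = 0; i < n; i++) {
                keys[i] = i;
            }
            // Look up the last key, which needs the maximum number of halvings.
            System.out.println(n + " entries -> " + stepsToFind(keys, n - 1) + " steps");
        }
    }
}
```

Looking up the last key takes 10 steps with 1,000 entries and 20 steps with 1,000,000: a thousand times more data only doubles the cost of one lookup, but that cost is paid again on every JOIN.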
An index lookup is executed
for each JOIN
Querying more tables can easily
produce millions of JOINs/Lookups!
The rule: more entries
= more lookup steps = slower JOIN
Is there a better way to
manage relationships?
How does GraphDB manage
index-free relationships?
OrientDB: an Open Source (Apache 2) document-graph NoSQL DBMS
that supports transactions, extended SQL, Multi-Master replication, etc.
Vertex Sam (RID #13:35): label: ‘Customer’, name: ‘Sam’, out: [#14:54]
Edge Lives (RID #14:54): label: ‘Lives’, out: [#13:35], in: [#13:100]
Vertex Rome (RID #13:100): label: ‘Address’, name: ‘Rome’, in: [#14:54]
The Record ID (RID) is a physical position
OrientDB: traverse a relationship
A GraphDB handles relationships as a physical LINK to the record, assigned when the edge is created
On the other side, an RDBMS computes the relationship every time you query the database
Isn’t that crazy?!
This means jumping from an O(log N) algorithm to a near-O(1) one: traversal cost is no longer affected by database size!
This is huge in the Big Data age
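A sketch of the idea behind index-free adjacency: each vertex keeps direct references to its edges and each edge to its endpoints, so following a relationship is a pointer hop whose cost depends on the vertex's own edges, not on the size of the database. Java object references play the role of OrientDB's RIDs here; this is an illustration of the principle, not OrientDB's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Index-free adjacency: relationships are stored as direct references.
public class Graph {

    static final class Vertex {
        final String label, name;
        final List<Edge> out = new ArrayList<>();
        final List<Edge> in = new ArrayList<>();

        Vertex(String label, String name) {
            this.label = label;
            this.name = name;
        }
    }

    static final class Edge {
        final String label;
        final Vertex from, to;

        Edge(String label, Vertex from, Vertex to) {
            this.label = label;
            this.from = from;
            this.to = to;
            from.out.add(this);   // the link is stored once, when the edge is created
            to.in.add(this);
        }
    }

    // "Who lives here?": follow incoming 'Lives' edges straight to their source vertices.
    static List<String> residentsOf(Vertex address) {
        List<String> names = new ArrayList<>();
        for (Edge e : address.in) {
            if (e.label.equals("Lives")) {
                names.add(e.from.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Vertex sam = new Vertex("Customer", "Sam");
        Vertex rome = new Vertex("Address", "Rome");
        new Edge("Lives", sam, rome);
        System.out.println(residentsOf(rome)); // [Sam]
    }
}
```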
$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.2.0-SNAPSHOT (www.orientdb.org)
Type 'help' to display all the commands supported.
orientdb> create vertex V set name = 'Sam', label = 'Customer'
Created vertex #13:35 in 0.03 secs
orientdb> create vertex V set name = 'Rome', label = 'Address'
Created vertex #13:100 in 0.02 secs
orientdb> create edge E from #13:35 to #13:100 set label = 'Lives'
Created edge #14:54 in 0.02 secs
Create the graph in SQL
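import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.record.impl.ODocument;

OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
graph.open("admin", "admin"); // assumption: the database exists and uses the default admin/admin account
ODocument sam = graph.createVertex();
sam.field("name", "Sam");
sam.field("label", "Customer");
ODocument rome = graph.createVertex();
rome.field("name", "Rome");
rome.field("label", "Address");
// Link the two vertices with a 'Lives' edge and persist it
ODocument edge = graph.createEdge(sam, rome).field("label", "Lives");
edge.save();
graph.close();
Create the graph in Java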
orientdb> select in[label='Lives'].out from V where
label = 'Address' and name = 'Rome'
---+--------+--------------------+--------------------+--------------------+
#| REC ID |label |out |in |
---+--------+--------------------+--------------------+--------------------+
0| 13:35|Sam |[#14:54] | |
---+--------+--------------------+--------------------+--------------------+
1 item(s) found. Query executed in 0.007 sec(s).
orientdb> select * from V where label = 'Address' AND
in[label='Lives'].size() > 0
---+--------+--------------------+--------------------+--------------------+
#| REC ID |label |out |in |
---+--------+--------------------+--------------------+--------------------+
0| 13:100| Rome | |[#14:54] |
---+--------+--------------------+--------------------+--------------------+
1 item(s) found. Query executed in 0.007 sec(s).
Query the graph in SQL
import java.util.List;
import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.OCommandSQL;

OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
graph.open("admin", "admin"); // assumption: default admin/admin account
// GET ALL THE CUSTOMERS FROM ROME, ITALY
List<ODocument> result = graph.command(new OCommandSQL(
    "select in[label='Lives'].out from V where label = 'Address' and name = ?")
).execute("Rome");
for (ODocument v : result) {
    System.out.println("Result: " + v.field("label"));
}
graph.close();
Output:
Result: Sam
Query the graph in Java
Query vs. traversal
• Once you have a well-connected database in the form of a Super Graph, you can cross records instead of querying them!
• All you need is some root vertices from which to start traversing
[Figure: a graph with the customers Sam, John and Sylvia, the orders 2332 and 8834, the White Soap product, and the grouping vertices Customers, Stocks and Special Customers, which act as root vertices]
Query vs. traversal
Assuming that the root node #30:0 (Customers) links all the Customer vertices:
Get all the customers:
orientdb> select out.in from #30:0
Get all the customers who bought at least one 'White Soap' product:
orientdb> select * from ( select out.in from #30:0) where
out.in.out[label='Bought'].in.name = 'White Soap'
Query the graph in SQL
Demo time!
Should I be using NoSQL Databases?
• For almost all of us, regular relational databases are THE correct solution
• NoSQL data storage systems make sense for applications that need to deal with very, very large semi-structured data
  - Log analysis
  - Social networking feeds
WHAT CAN WE DO WITH
BIG DATA?
What’s driving Big Data
Traditional analytics:
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
Big Data analytics:
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time
Value of Big Data Analytics
• Big data is more real-time in
nature than traditional DW
applications
• Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps
• Shared nothing, massively parallel
processing, scale out
architectures are well-suited for
big data apps
USN with related technical areas
What is collecting all this data?
Web Browsers
• Microsoft’s Internet Explorer
• Mozilla’s Firefox (non-profit foundation, used to be Netscape)
• Google’s Chrome
• Apple’s Safari
Search Engines
• Google’s
• Microsoft’s
• Yahoo’s
• IAC Search’s
• Time-Warner’s AOL
What is collecting all this data?
Smartphones & Apps
• Apple’s iPhone (Apple O/S)
• Samsung, HTC, Nokia, Motorola (Android O/S)
• RIM Corp’s BlackBerry (BlackBerry O/S)
Tablet Computers & Apps
• Apple’s iPad
• Samsung’s Galaxy
• Amazon’s Kindle Fire
What is collecting all this data?
Hospitals & Other Medical Systems
• Pharmacies
• Laboratories
• Imaging Centers
• Emergency Medical Services (EMS)
• Hospital Information Systems
• Doc-in-a-Box
• Electronic Medical Records
• Blood Banks
• Birth & Death Records
Banking & Phone Systems
• Can you hear me now? (Heh heh heh!)
What is collecting all this data?
A real pain in the apps! What are they collecting?
• Restaurant reservations
(Open Table)
• Weather in L.A. in 3 days
(Weather+)
• Side effects of medications
(MedWatcher)
• 3-star hotels in New Orleans
(Priceline)
• Which PC should I buy and where
(PriceCheck)
Big Brother Needs Big Data
In March 2012, the Obama Administration announced the Big Data Research
and Development Initiative, $200 million in new R&D investments, which will
explore how Big Data could be used to address important problems facing the
government. The initiative was composed of 84 different Big Data programs
spread across six departments.
http://tinyurl.com/85oytkj
The U.S. Federal Government owns six of the ten most powerful supercomputers
in the world.
How Companies Like Amazon Use Big
Data To Make You Love Them
Last month, I talked to Amazon customer service about my malfunctioning
Kindle, and it was great. Thirty seconds after putting in a service request on
Amazon’s website, my phone rang, and the woman on the other end--let’s call
her Barbara--greeted me by name and said, "I understand that you have a
problem with your Kindle." We resolved my problem in under two minutes,
we got to skip the part where I carefully spell out my last name and address,
and she didn’t try to upsell me on anything. After nearly a decade of ordering
stuff from Amazon, I never loved the company as much as I did at that
moment.
The fact is, Amazon has been collecting my information for years--not just
addresses and payment information but the identity of everything I’ve ever
bought or even looked at. And while dozens of other companies do that,
too, Amazon’s doing something remarkable with theirs. They’re using that
data to build our relationship.
Article by Sean Madden, May 2012, an expert in service design and innovation strategy.
How Can You Avoid Big Data?
• Pay cash for everything!
• Never go online!
• Don’t use a telephone!
• Don’t use Kroger or Harris Teeter cards!
• Don’t fill any prescriptions!
• Never leave your house!
Key concept of Big Data
• Store everything
• Don’t delete anything
• Schema is a bottleneck
• Always think in parallel
• Remember the CAP theorem
Thank you!!!
…and please fill in the evaluation form

05 No SQL Sudarshan.ppt
 
No SQL Databases.ppt
No SQL Databases.pptNo SQL Databases.ppt
No SQL Databases.ppt
 
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
 
Vargas polyglot-persistence-cloud-edbt
Vargas polyglot-persistence-cloud-edbtVargas polyglot-persistence-cloud-edbt
Vargas polyglot-persistence-cloud-edbt
 
مقدمة عن NoSQL بالعربي
مقدمة عن NoSQL بالعربيمقدمة عن NoSQL بالعربي
مقدمة عن NoSQL بالعربي
 
NOSQL
NOSQLNOSQL
NOSQL
 
Big Data
Big DataBig Data
Big Data
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
NoSQL Basics - a quick tour
NoSQL Basics - a quick tourNoSQL Basics - a quick tour
NoSQL Basics - a quick tour
 
The NoSQL Movement
The NoSQL MovementThe NoSQL Movement
The NoSQL Movement
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunities
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic Web
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic Web
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
Long and winding road - Chile 2014
Long and winding road - Chile 2014Long and winding road - Chile 2014
Long and winding road - Chile 2014
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lake
 
SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?
 

More from Samuel Dratwa

Artificial Intelligence (and the telecom industry)
Artificial Intelligence (and the telecom industry)Artificial Intelligence (and the telecom industry)
Artificial Intelligence (and the telecom industry)
Samuel Dratwa
 
IoT (and M2M and WoT) From the Operators (CSP) perspective
IoT (and M2M and WoT) From the Operators (CSP) perspectiveIoT (and M2M and WoT) From the Operators (CSP) perspective
IoT (and M2M and WoT) From the Operators (CSP) perspective
Samuel Dratwa
 
Introduction to Cloud Computing 2021
Introduction to Cloud Computing 2021Introduction to Cloud Computing 2021
Introduction to Cloud Computing 2021
Samuel Dratwa
 
Telecom Abbreviations
Telecom AbbreviationsTelecom Abbreviations
Telecom Abbreviations
Samuel Dratwa
 
מונחים טכנולוגים למנהלי הדרכה באמדוקס
מונחים טכנולוגים למנהלי הדרכה באמדוקסמונחים טכנולוגים למנהלי הדרכה באמדוקס
מונחים טכנולוגים למנהלי הדרכה באמדוקס
Samuel Dratwa
 
Amdocs ai s1
Amdocs ai s1Amdocs ai s1
Amdocs ai s1
Samuel Dratwa
 
Basic networking 07-2012
Basic networking 07-2012Basic networking 07-2012
Basic networking 07-2012
Samuel Dratwa
 
רשתות חברתיות ככלי מידע עסקי 2012
רשתות חברתיות ככלי מידע עסקי 2012רשתות חברתיות ככלי מידע עסקי 2012
רשתות חברתיות ככלי מידע עסקי 2012
Samuel Dratwa
 
NGN & IMS
NGN & IMSNGN & IMS
NGN & IMS
Samuel Dratwa
 
The future telecom
The future telecomThe future telecom
The future telecom
Samuel Dratwa
 
Web 2.0 (and the telecom industry)
Web 2.0 (and the telecom industry)Web 2.0 (and the telecom industry)
Web 2.0 (and the telecom industry)
Samuel Dratwa
 
רשתות חברתיות ומידע עסקי - או למה צריך להיות שם
רשתות חברתיות ומידע עסקי - או למה צריך להיות שםרשתות חברתיות ומידע עסקי - או למה צריך להיות שם
רשתות חברתיות ומידע עסקי - או למה צריך להיות שם
Samuel Dratwa
 

More from Samuel Dratwa (12)

Artificial Intelligence (and the telecom industry)
Artificial Intelligence (and the telecom industry)Artificial Intelligence (and the telecom industry)
Artificial Intelligence (and the telecom industry)
 
IoT (and M2M and WoT) From the Operators (CSP) perspective
IoT (and M2M and WoT) From the Operators (CSP) perspectiveIoT (and M2M and WoT) From the Operators (CSP) perspective
IoT (and M2M and WoT) From the Operators (CSP) perspective
 
Introduction to Cloud Computing 2021
Introduction to Cloud Computing 2021Introduction to Cloud Computing 2021
Introduction to Cloud Computing 2021
 
Telecom Abbreviations
Telecom AbbreviationsTelecom Abbreviations
Telecom Abbreviations
 
מונחים טכנולוגים למנהלי הדרכה באמדוקס
מונחים טכנולוגים למנהלי הדרכה באמדוקסמונחים טכנולוגים למנהלי הדרכה באמדוקס
מונחים טכנולוגים למנהלי הדרכה באמדוקס
 
Amdocs ai s1
Amdocs ai s1Amdocs ai s1
Amdocs ai s1
 
Basic networking 07-2012
Basic networking 07-2012Basic networking 07-2012
Basic networking 07-2012
 
רשתות חברתיות ככלי מידע עסקי 2012
רשתות חברתיות ככלי מידע עסקי 2012רשתות חברתיות ככלי מידע עסקי 2012
רשתות חברתיות ככלי מידע עסקי 2012
 
NGN & IMS
NGN & IMSNGN & IMS
NGN & IMS
 
The future telecom
The future telecomThe future telecom
The future telecom
 
Web 2.0 (and the telecom industry)
Web 2.0 (and the telecom industry)Web 2.0 (and the telecom industry)
Web 2.0 (and the telecom industry)
 
רשתות חברתיות ומידע עסקי - או למה צריך להיות שם
רשתות חברתיות ומידע עסקי - או למה צריך להיות שםרשתות חברתיות ומידע עסקי - או למה צריך להיות שם
רשתות חברתיות ומידע עסקי - או למה צריך להיות שם
 

Recently uploaded

一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
tzu5xla
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
actyx
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
1tyxnjpia
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
eudsoh
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
dataschool1
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
osoyvvf
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
Vietnam Cotton & Spinning Association
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
lzdvtmy8
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
vasanthatpuram
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
Bisnar Chase Personal Injury Attorneys
 

Recently uploaded (20)

一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
 

Big Data NoSQL 1017

  • 1. Copyright © 2011 LOGTEL NoSQL (big data) Samuel Dratwa Samuel.dratwa@gmail.com
  • 2. Agenda • Big Data / NoSQL – technology aspect • How big is BIG? • The motivation behind NoSQL • CAP theorem • Partitions / fragmentation • The different NoSQL models: Key Value, Column-Based, Document store, Big Table, Graph • The NoSQL way of thinking (using graphs) • Big Data – applicative (what can we do with it)
  • 3. It's a hype (!)
  • 4. Big Data Definition • No single standard definition… “Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it…
  • 5. NoSQL humor http://geekandpoke.typepad.com/
  • 6. How big is BIG? 10 GB? 10 TB? 10 PB?
  • 7. The 4 V's
  • 8. Characteristics of Big Data: 1-Scale (Volume) • Data volume • 44x increase from 2009 to 2020: from 0.8 zettabytes to 35 ZB • Data volume is increasing exponentially (chart: exponential increase in collected/generated data)
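The 44x figure on the Volume slide is simply the ratio of the two volumes. A small Java sketch of that arithmetic, plus the compound annual growth rate it implies over the 11 years from 2009 to 2020 (the annual-rate calculation is an illustration added here, not a figure from the slide):

```java
// Worked arithmetic for the Volume slide: 0.8 ZB (2009) -> 35 ZB (2020).
class DataGrowth {
    // Overall growth factor between two volumes.
    static double growthFactor(double startZb, double endZb) {
        return endZb / startZb;
    }

    // Compound annual growth rate implied by that factor over the given number of years.
    static double annualGrowthRate(double startZb, double endZb, int years) {
        return Math.pow(endZb / startZb, 1.0 / years) - 1.0;
    }

    public static void main(String[] args) {
        double factor = growthFactor(0.8, 35.0);                 // the slide's "44x"
        double cagr = annualGrowthRate(0.8, 35.0, 2020 - 2009);  // 11 years
        System.out.printf("factor = %.2f, yearly growth = %.1f%%%n", factor, cagr * 100);
    }
}
```

Run as is, it reports a factor of about 43.75 and a yearly growth of roughly 41%: "increasing exponentially" means a roughly constant percentage increase every year.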
  • 9. The 4 V's
  • 10. Characteristics of Big Data: 2-Complexity (Variety) • Various formats, types, and structures • Text, numerical, images, audio, video, sequences, time series, social media data, multi-dim arrays, etc. • Static data vs. streaming data • A single application can be generating/collecting many types of data
  • 11. The 4 V's
  • 12. Characteristics of Big Data: 3-Speed (Velocity) • Data is being generated fast and needs to be processed fast • Online data analytics • Late decisions → missed opportunities • Examples: • E-Promotions: based on your current location, your purchase history and what you like → send promotions right now for the store next to you • Healthcare monitoring: sensors monitor your activities and body → any abnormal measurement requires an immediate reaction
  • 13. The 4 V's
  • 14. Who's Generating Big Data • Social media and networks (all of us are generating data) • Scientific instruments (collecting all sorts of data) • Mobile devices (tracking all objects all the time) • Sensor technology and networks (measuring all kinds of data) • Progress and innovation are no longer hindered by the ability to collect data • But by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion
  • 15. The Model Has Changed… • The model of generating/consuming data has changed • Old model: few companies are generating data, all others are consuming data • New model: all of us are generating data, and all of us are consuming data
  • 16. NoSQL motivation
  • 17. Why Now? • Explosion of social media sites (Facebook, Twitter) with large data needs • Explosion of storage needs in large web sites such as Google and Yahoo • Much of the data is not files • Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service) • Shift to dynamically-typed data with frequent schema changes • Open-source community
  • 18. Parallel Databases and Data Stores • Relational databases – mainstay of business • Web-based applications caused spikes, especially for public-facing e-Commerce sites • Many application servers, one database • Easy to parallelize application servers to 1000s of servers, harder to parallelize databases to the same scale • First solution: memcached (in-memory cache) or other caching mechanisms to reduce database access
  • 19. Scaling Up • What if the dataset is huge and there is a very high number of transactions per second? • Use multiple servers to host the database: ‘scaling out’ or ‘horizontal scaling’ • Parallel databases have been around for a while • But they are expensive, and designed for decision support, not OLTP (Online Transaction Processing)
  • 20. Scaling RDBMS – Master/Slave • All writes are written to the master; all reads are performed against the replicated slave databases • Good for read-mostly applications with very few updates • Critical reads may be incorrect, as writes may not yet have propagated to the slaves • Large data sets can pose problems, as the master needs to duplicate data to the slaves
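A minimal sketch of the read/write split described on the master/slave slide, assuming a hypothetical client-side router and made-up server names (`db-master`, `replica-1`, ...); it only decides where a statement should go and does not talk to a real database. Reads rotate round-robin over the replicas, and a read flagged as critical goes to the master, since a replica may not have received the latest write yet:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router: writes go to the master, reads are spread over the replicas.
class MasterSlaveRouter {
    private final String master;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    MasterSlaveRouter(String master, List<String> replicas) {
        this.master = master;
        this.replicas = List.copyOf(replicas);
    }

    // All writes go to the single master.
    String routeWrite() {
        return master;
    }

    // Non-critical reads rotate over the replicas; critical reads (which must see the
    // latest write) go to the master, because replication to the slaves may lag.
    String routeRead(boolean critical) {
        if (critical || replicas.isEmpty()) {
            return master;
        }
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }
}
```

Sending critical reads to the master is one common way to live with replication lag; the trade-off is extra load on the one server that already takes every write.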
  • 21. Scaling RDBMS – Partitioning • Partitioning: divide the database across many machines, e.g. hash or range partitioning; handled transparently by parallel databases, but they are expensive • “Sharding”: divide data amongst many cheap databases (MySQL/PostgreSQL); manage parallel access in the application; scales well for both reads and writes; not transparent, the application needs to be partition-aware
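Sharding, as described above, puts the partitioning logic in the application. A minimal sketch of hash partitioning, assuming a hypothetical `ShardedStore` class in which each in-memory map stands in for one cheap MySQL/PostgreSQL instance:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical partition-aware client: each map stands in for a separate database.
class ShardedStore {
    private final List<Map<String, String>> shards = new ArrayList<>();

    ShardedStore(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Hash partitioning: the key alone decides which shard owns it.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    void put(String key, String value) {
        shards.get(shardFor(key)).put(key, value);
    }

    String get(String key) {
        return shards.get(shardFor(key)).get(key);
    }

    int sizeOfShard(int i) {
        return shards.get(i).size();
    }
}
```

Note the design choice: with a plain modulo over the shard count, changing the number of shards remaps most keys to different shards, which is one reason large key-value stores use consistent hashing instead.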
  • 22. What is NoSQL? • Stands for Not Only SQL • Class of non-relational data storage systems, e.g. BigTable, Dynamo, PNUTS/Sherpa, … • Usually do not require a fixed table schema, nor do they use the concept of joins • All NoSQL offerings relax one or more of the ACID properties (we will talk about the CAP theorem) • Not a backlash/rebellion against RDBMS • SQL is a rich query language that cannot be rivaled by the current list of NoSQL offerings
  • 23. NoSQL Data Storage: Classification • NoSQL solutions fall into 4 major areas: • Uninterpreted key/value or ‘the big hash table’: Amazon S3 (Dynamo), Voldemort, Scalaris • Column-based, with interpreted keys: Cassandra, BigTable, HBase, Sherpa/PNUTS • Others: CouchDB (document-based), Neo4J (graph-based)
  • 24. NoSQL ecosystem
  • 25.
  • 26. Big Data Landscape 2015
  • 27.
  • 28. Architecture
  • 29. MapReduce
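The MapReduce slide is a diagram only. As an illustration of the model it names, here is an in-memory Java sketch of the classic word-count job, split into its three phases (map, shuffle, reduce). In a real framework such as Hadoop each phase runs distributed across many machines; this single-process sketch does not attempt that:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the MapReduce phases for word counting.
class WordCountMapReduce {
    // Map: emit (word, 1) for every word in one input record.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle: group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce: combine all values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) pairs.addAll(map(line));   // map tasks are independent
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> result.put(word, reduce(ones)));  // one reduce per key
        return result;
    }
}
```

Because map calls are independent per record and reduce calls are independent per key, both phases can be spread over many machines; only the shuffle needs to move data between them.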
  • 30. ACID • Atomic: either the whole process of a transaction is done or none is. • Consistency: database constraints (application-specific) are preserved. • Isolation: it appears to the user as if only one process executes at a time. (Two concurrent transactions will not see one another's changes while “in flight”.) • Durability: the updates made to the database in a committed transaction will be visible to future transactions. (Effects of a process do not get lost if the system crashes.)
  • 31. CAP Theorem • Three properties of a system: • Consistency (all copies have the same value) • Availability (the system can run even if parts have failed) • Partition tolerance (the network can break into two or more parts, each with active systems that can't talk to the other parts) • Brewer's CAP “Theorem”: you can have at most two of these three properties for any system • Very large systems will partition at some point • So they must choose between consistency and availability • Traditional databases choose consistency • Most Web applications choose availability • Except for specific parts such as order processing
  • 32. The reminder: Dial 1-800-remind… • Available, Consistent – not Partitioned • Not available… • Available, Partitioned – not Consistent • Consistent, Partitioned – not Available
  • 33. The proof…
  • 34. CAP Theorem (same content as slide 31)
  • 35. Availability • Traditionally thought of as the server/process being available five 9's (99.999%). • However, in a system with many nodes, at almost any point in time there's a good chance that a node is down or that there is a network disruption among the nodes. • We want a system that is resilient in the face of network disruption
  • 36. Eventual Consistency • When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent • For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service • Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID • Soft state: copies of a data item may be inconsistent • Eventually consistent: copies become consistent at some later time if there are no more updates to that data item
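A minimal sketch of eventual consistency, assuming last-writer-wins (highest timestamp) as the conflict-resolution rule and a hypothetical anti-entropy step in which replicas exchange their state; real systems use various other rules and exchange protocols. Replicas accept writes independently and can disagree for a while, but once updates stop and the replicas have exchanged state, every copy holds the same value:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One replica of a key-value store using last-writer-wins (LWW) merging.
class LwwReplica {
    record Versioned(String value, long timestamp) {}

    final Map<String, Versioned> data = new HashMap<>();

    // A local write is accepted immediately (soft state: other replicas may still differ).
    void write(String key, String value, long timestamp) {
        merge(key, new Versioned(value, timestamp));
    }

    String read(String key) {
        Versioned v = data.get(key);
        return v == null ? null : v.value();
    }

    // Keep whichever version has the newer timestamp; on a tie the current value stays.
    void merge(String key, Versioned incoming) {
        data.merge(key, incoming, (cur, inc) -> inc.timestamp() > cur.timestamp() ? inc : cur);
    }

    // Anti-entropy: every replica pushes its state to every other replica.
    static void antiEntropy(List<LwwReplica> replicas) {
        for (LwwReplica from : replicas) {
            for (LwwReplica to : replicas) {
                if (from != to) {
                    from.data.forEach(to::merge);
                }
            }
        }
    }
}
```

Ties on equal timestamps keep the current value here, so two replicas holding different values with the same timestamp would not converge; a real system needs a tie-breaker (for example, comparing node ids).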
  • 37.
  • 38.
  • 39.
  • 40.
  • 41. BASE in Cassandra (diagram: the client sends a query to the Cassandra cluster; the closest replica, Replica A, returns the full result, while Replicas B and C receive a digest query and return digest responses; read repair is performed if the digests differ)
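The read path in the diagram can be sketched in a simplified, single-process form: the closest replica returns the full data, the other replicas return only a digest (a hash of their copy), and a mismatch triggers a read repair. The class and method names are hypothetical and this is not Cassandra's code; whether a real Cassandra read repairs, and which replicas it contacts, depends on the consistency level and configuration. Here the repair picks the newest timestamp and writes it back to all replicas:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a digest read with read repair.
class DigestReadRepair {
    record Cell(String value, long timestamp) {}

    static final class Replica {
        final Map<String, Cell> rows = new HashMap<>();

        Cell read(String key) {
            return rows.get(key);
        }

        // A digest is a hash of the stored cell: much smaller to send than the data itself.
        byte[] digest(String key) {
            Cell c = rows.get(key);
            String payload = (c == null) ? "" : c.value() + "@" + c.timestamp();
            try {
                return MessageDigest.getInstance("SHA-256").digest(payload.getBytes(StandardCharsets.UTF_8));
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e);  // every Java platform must provide SHA-256
            }
        }
    }

    // replicas.get(0) plays the "closest replica"; returns the value handed back to the client.
    static String read(List<Replica> replicas, String key) {
        Replica closest = replicas.get(0);
        Cell result = closest.read(key);                  // full data from the closest replica
        byte[] expected = closest.digest(key);
        boolean mismatch = false;
        for (Replica r : replicas.subList(1, replicas.size())) {
            mismatch |= !Arrays.equals(expected, r.digest(key));  // digest query / response
        }
        if (mismatch) {
            // Read repair: find the newest cell across replicas and write it back to all of them.
            for (Replica r : replicas) {
                Cell c = r.read(key);
                if (c != null && (result == null || c.timestamp() > result.timestamp())) {
                    result = c;
                }
            }
            for (Replica r : replicas) {
                r.rows.put(key, result);
            }
        }
        return (result == null) ? null : result.value();
    }
}
```

Sending digests instead of full rows keeps the common case (all replicas agree) cheap; the full data is only collected from everyone when the digests disagree.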
  • 42. Common Advantages • Cheap, easy to implement (open source) • Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned • When data is written, the latest version is on at least one node and is then replicated to other nodes • Down nodes are easily replaced • No single point of failure • Easy to distribute • Don't require a schema
  • 43. What am I giving up? • joins • group by • order by • ACID transactions • SQL as a sometimes frustrating but still powerful query language • easy integration with other applications that support SQL
  • 44. Distributed Key-Value Data Stores • Distributed key-value data storage systems allow key-value pairs to be stored (and retrieved by key) in a massively parallel system, e.g. Google BigTable, Yahoo! Sherpa/PNUTS, Amazon Dynamo, … • Partitioning, high availability etc. are completely transparent to the application • Sharding systems and key-value stores don't support many relational features: no join operations (except within a partition), no referential integrity constraints across partitions, etc.
  • 45. Flexible Data Model: a “Rockets” key-value store in which each value is a set of name/value pairs
      Key 1: name = Rocket-Powered Roller Skates; toon = Ready, Set, Zoom; inventoryQty = 5; brakes = false
      Key 2: name = Little Giant Do-It-Yourself Rocket-Sled Kit; toon = Beep Prepared; inventoryQty = 4; brakes = false
      Key 3: name = Acme Jet Propelled Unicycle; toon = Hot Rod and Reel; inventoryQty = 1; wheels = 1
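The Rockets example maps naturally onto a key-value store whose values are schemaless property maps. A short sketch with plain Java collections (the class and helper names are illustrative, not from any NoSQL product):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// The "Rockets" example: integer keys, each mapped to its own set of name/value pairs.
class RocketsStore {
    static Map<Integer, Map<String, Object>> rockets() {
        Map<Integer, Map<String, Object>> rockets = new TreeMap<>();
        rockets.put(1, props("name", "Rocket-Powered Roller Skates", "toon", "Ready, Set, Zoom",
                "inventoryQty", 5, "brakes", false));
        rockets.put(2, props("name", "Little Giant Do-It-Yourself Rocket-Sled Kit", "toon", "Beep Prepared",
                "inventoryQty", 4, "brakes", false));
        // Record 3 has "wheels" instead of "brakes": no schema change is needed.
        rockets.put(3, props("name", "Acme Jet Propelled Unicycle", "toon", "Hot Rod and Reel",
                "inventoryQty", 1, "wheels", 1));
        return rockets;
    }

    // Builds an insertion-ordered property map from alternating name/value arguments.
    static Map<String, Object> props(Object... nameValuePairs) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            m.put((String) nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return m;
    }
}
```

Because each value is just a map, record 3 can carry `wheels` instead of `brakes` without touching the other records, which is the point of the slide.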
  • 46. HBase
  • 47. Google • Tables are sorted by row • The table schema only defines its column families • Each family consists of any number of columns • Each column consists of any number of versions • Columns only exist when inserted; NULLs are free • Columns within a family are sorted and stored together • Everything except table names is byte[] • (Row, Family:Column, Timestamp) → Value
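A toy, in-memory model of the data model on the slide, (Row, Family:Column, Timestamp) → Value, built from nested sorted maps. It is an illustrative sketch, not the HBase or BigTable API; for readability the row and column keys are Strings, while values are `byte[]` as on the slide:

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Toy model of the BigTable/HBase layout: (row, family:column, timestamp) -> value.
class ToyBigTable {
    private final Set<String> families;  // the only part of the schema declared up front
    // row -> "family:column" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows = new TreeMap<>();

    ToyBigTable(Set<String> families) {
        this.families = Set.copyOf(families);
    }

    void put(String row, String family, String column, long timestamp, byte[] value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        // Columns are created on first insert; absent cells take no space ("NULLs are free").
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, byte[]>>())
            .computeIfAbsent(family + ":" + column, c -> new TreeMap<Long, byte[]>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of the cell, or null if it was never written.
    byte[] get(String row, String family, String column) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(row);
        if (columns == null) return null;
        NavigableMap<Long, byte[]> versions = columns.get(family + ":" + column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Row keys in sorted order, the order in which a scan visits them.
    Iterable<String> rowKeys() {
        return rows.navigableKeySet();
    }
}
```

Only the column families are fixed when the table is created; columns appear on first write, each write with a new timestamp adds a version, and a read returns the newest one.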
  • 48. Splunk – document-based
  • 49. Splunk – log analysis
  • 50. PNUTS Data Storage Architecture
  • 51. Partitioning and Replication (diagram: nodes A to F placed on a hash ring running from 0 through 1/2 back to 1; h(key1) and h(key2) mark where two keys land on the ring; replication factor N=3)
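The partitioning diagram shows nodes and keys hashed onto the same ring, with N=3. A minimal consistent-hashing sketch, assuming CRC32 as the hash function purely to stay dependency-free, one ring position per node (production systems typically add virtual nodes), and the rule that a key is stored on the first node clockwise from h(key) plus the next N-1 nodes:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Consistent-hashing ring with a replication factor.
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();  // ring position -> node
    private final int replicationFactor;

    HashRing(List<String> nodes, int replicationFactor) {
        this.replicationFactor = replicationFactor;
        for (String node : nodes) ring.put(hash(node), node);
    }

    // Maps a string onto the ring [0, 2^32); CRC32 is only a stand-in hash here.
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // The first node clockwise from h(key), followed by the next nodes around the ring.
    List<String> nodesFor(String key) {
        List<String> owners = new ArrayList<>();
        Long pos = ring.ceilingKey(hash(key));
        if (pos == null) pos = ring.firstKey();          // wrap around past the top of the ring
        while (owners.size() < Math.min(replicationFactor, ring.size())) {
            owners.add(ring.get(pos));
            pos = ring.higherKey(pos);
            if (pos == null) pos = ring.firstKey();
        }
        return owners;
    }
}
```

Unlike a plain modulo over the node count, adding or removing a node only moves the keys in the ring segment next to that node.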
  • 52. Should I be using NoSQL Databases? • For almost all of us, regular relational databases are THE correct solution • NoSQL data storage systems make sense for applications that need to deal with very large semi-structured data, e.g. log analysis and social networking feeds
  • 53. Graph in practice (thanks to Luca Garulli)
  • 54. Let's take a real example: CRM. …how to think «graphically» with one of the most common domains in the enterprise world: the classic CRM* domain (* today an RDBMS is used in 99% of cases)
  • 55. Every developer knows the Relational Model(?), but who knows the Graph one?
  • 56. Back to school: Graph Theory crash course
  • 57. Basic Graph (diagram: Sam → Likes → NoSQL lecture)
  • 58. Property Graph Model*: vertex Sam {name: Samuel, surname: Dratwa, company: SADOT} → Likes {since: 2012} → vertex NoSQL Lecture {editions: [Comverse, Tel-Aviv]} • Vertices and edges can have properties • Edges are directed (* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model)
  • 59. Edges – Arcs: an edge connects 2 vertices; use multiple edges to represent 1-N and N-M relationships (diagram: Sam → NoSQL lecture)
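The property graph model from the last two slides can be sketched in a few lines of Java. The classes are hypothetical: both vertices and edges carry a property map, every edge is directed from an out-vertex to an in-vertex, and a 1-N relationship is simply several edges leaving the same vertex:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal property graph: properties on vertices and edges, directed labelled edges.
class PropertyGraph {
    static final class Vertex {
        final Map<String, Object> props = new HashMap<>();
        final List<Edge> out = new ArrayList<>();   // edges leaving this vertex
        final List<Edge> in = new ArrayList<>();    // edges arriving at this vertex

        Vertex(Map<String, Object> props) {
            this.props.putAll(props);
        }
    }

    record Edge(Vertex from, Vertex to, String label, Map<String, Object> props) {}

    // One edge links exactly two vertices; 1-N and N-M relationships are several edges.
    static Edge connect(Vertex from, Vertex to, String label, Map<String, Object> props) {
        Edge e = new Edge(from, to, label, props);
        from.out.add(e);
        to.in.add(e);
        return e;
    }
}
```

For example, the slide's graph is `connect(sam, lecture, "Likes", Map.of("since", 2012))`; if Sam liked a second lecture, his vertex would simply have two entries in `out`.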
  • 60. (Diagram: vertices Sam, Avital, Doron and NoSQL lecture connected by Likes, FriendOf and Joins edges)
  • 61. Congratulations, this is your diploma in «Graph Theory»
  • 62. Domain: minimal CRM (diagram: Customer, Address, Order and Stock entities across a Registry system and an Order system)
  • 63. How does a relational DBMS manage relationships? (same CRM diagram: Customer, Address, Order and Stock across the Registry system and the Order system)
  • 64. Relational World: 1-1 Relationships (JOIN Customer.Address -> Address.Id)
      Customer
        Id | Name   | Address
        10 | Samuel | 34
        11 | Katja  | 44
        34 | Sylvia | 54
        56 | Mark   | 66
        88 | Steve  | 68
      Address
        Id | Location
        34 | Rome, London
        44 | Cologne
        54 | Rome
        66 | New Mexico
        68 | Palo Alto
  • 65. Relational World: 1-N Relationships (inverse JOIN Address.Customer -> Customer.Id)
      Customer
        Id | Name
        10 | Samuel
        11 | Katja
        34 | Sylvia
        56 | Mark
        88 | Steve
      Address
        Id | Customer | Location
        24 | 10       | Rome
        33 | 10       | London
        44 | 34       | Rome
        66 | 11       | Cologne
        68 | 88       | Palo Alto
  • 66. Relational World: N-M Relationships (additional table with 2 JOINs: (1) CustomerAddress.Id -> Customer.Id and (2) CustomerAddress.Address -> Address.Id)
      Customer
        Id | Name
        10 | Samuel
        11 | Katja
        34 | Sylvia
        56 | Mark
        88 | Steve
      Address
        Id | Location
        24 | Rome
        33 | London
        44 | Rome
        66 | Cologne
        68 | Palo Alto
      CustomerAddress
        Id | Address
        10 | 24
        10 | 33
        34 | 24
  • 67. What's wrong with the Relational Model?
  • 68. The JOIN is the evil! (diagram: the Customer, Address and CustomerAddress tables from slide 66, with every link between them annotated) These are all JOINs executed every time you traverse a relationship!
  • 69. Why not JOIN • A JOIN means searching for a key in another table • The first rule to improve performance is indexing all the keys • An index speeds up searches but slows down inserts, updates and deletes • So in the best case a JOIN is a lookup in an index • This is done per single JOIN! • If you traverse hundreds of relationships, you're executing hundreds of JOINs
  • 70. Index lookup: is it really that fast?
  • 71. Index Lookup: how does it work? Index algorithms are all similar and based on balanced trees. Think of an address book where we have to find Samuel's phone number. (Diagram: a balanced tree that splits A-Z into A-L and M-Z, then into A-D, E-L, M-R and S-Z, and so on down to leaves such as A-B, C-D, E-F, G, H-J and K-L.)
  • 72. Found! Each lookup takes X steps, where X grows with the index size! (same tree diagram)
  • 73. An index lookup is executed for each JOIN. Querying more tables can easily produce millions of JOINs/lookups! Here is the rule: more entries = more lookup steps = slower JOIN
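To make "more entries = more lookup steps" concrete: a balanced-tree index behaves like a binary search over sorted keys, so the number of steps grows with log2 of the number of entries. A small sketch that counts those steps (binary search over an int array stands in for the tree here):

```java
// Counts the comparisons a binary search needs among n sorted keys; like a balanced-tree
// index lookup, the count grows with log2(n).
class IndexLookupCost {
    static int lookupSteps(int[] sortedKeys, int target) {
        int lo = 0, hi = sortedKeys.length - 1, steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] == target) return steps;
            if (sortedKeys[mid] < target) lo = mid + 1; else hi = mid - 1;
        }
        return steps;  // key not found: steps spent before giving up
    }

    static int[] keys(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        return a;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000}) {
            // A missing key forces the search all the way down to a leaf.
            System.out.println(n + " keys -> " + lookupSteps(keys(n), -1) + " steps per lookup");
        }
    }
}
```

Going from a thousand to a million keys roughly doubles the steps per lookup (about 10 versus about 20), and that cost is paid again on every JOIN.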
  • 74. Is there a better way to manage relationships?
  • 75. How does a GraphDB manage index-free relationships?
  • 76. OrientDB: an open source (Apache 2) document-graph NoSQL DBMS that supports transactions, extended SQL, multi-master replication, etc.
  • 77. OrientDB: traverse a relationship (diagram: vertex Sam, RID #13:35 {label: 'Customer', name: 'Sam', out: [#14:54]} → edge Lives, RID #14:54 {label: 'Lives', out: [#13:35], in: [#13:100]} → vertex Rome, RID #13:100 {label: 'Address', name: 'Rome', in: [#14:54]}). The Record ID (RID) is a physical position.
  • 78. A GraphDB handles relationships as a physical LINK to the record, assigned when the edge is created. On the other side, an RDBMS computes the relationship every time you query the database. Isn't that crazy?!
  • 79. This means jumping from an O(log N) algorithm to a near O(1) one: the traversal cost is no longer affected by the database size! This is huge in the Big Data age
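The idea of slides 77 to 79, storing the link once when the edge is created and then just following it, can be illustrated with a toy in-memory model. This is a minimal sketch, not OrientDB's storage engine: plain Java object references stand in for RIDs, and the class and method names are hypothetical. The `out`/`in` fields mirror the slide 77 layout (a vertex's `out` lists its outgoing edges; an edge's `out` is its source and `in` its target).

```java
import java.util.ArrayList;
import java.util.List;

final class LinkedGraph {
    static final class Vertex {
        final String label;
        final String name;
        final List<Edge> out = new ArrayList<>(); // outgoing edges (slide 77: out)
        final List<Edge> in = new ArrayList<>();  // incoming edges (slide 77: in)
        Vertex(String label, String name) { this.label = label; this.name = name; }
    }

    static final class Edge {
        final String label;
        final Vertex out; // source vertex, like the edge's out: [#13:35]
        final Vertex in;  // target vertex, like the edge's in: [#13:100]
        Edge(String label, Vertex out, Vertex in) { this.label = label; this.out = out; this.in = in; }
    }

    // The links are stored once, when the edge is created, on both vertices.
    static Edge createEdge(Vertex from, Vertex to, String label) {
        Edge e = new Edge(label, from, to);
        from.out.add(e);
        to.in.add(e);
        return e;
    }

    // Follow outgoing edges with the given label: pure pointer chasing, no index.
    // The cost depends on this vertex's own edges, not on the database size.
    static List<Vertex> traverseOut(Vertex v, String label) {
        List<Vertex> result = new ArrayList<>();
        for (Edge e : v.out) {
            if (e.label.equals(label)) result.add(e.in);
        }
        return result;
    }

    // Follow incoming edges back to their source, like in[label='Lives'].out.
    static List<Vertex> traverseIn(Vertex v, String label) {
        List<Vertex> result = new ArrayList<>();
        for (Edge e : v.in) {
            if (e.label.equals(label)) result.add(e.out);
        }
        return result;
    }

    public static void main(String[] args) {
        Vertex sam = new Vertex("Customer", "Sam");
        Vertex rome = new Vertex("Address", "Rome");
        createEdge(sam, rome, "Lives");
        System.out.println(traverseOut(sam, "Lives").get(0).name); // Rome
        System.out.println(traverseIn(rome, "Lives").get(0).name); // Sam
    }
}
```

Unlike the JOIN sketch, no lookup structure is consulted during traversal: each hop dereferences a link that was stored when the edge was created.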
  • 80. Create the graph in SQL
      $luca> cd bin
      $luca> ./console.sh
      OrientDB console v.1.2.0-SNAPSHOT (www.orientdb.org)
      Type 'help' to display all the commands supported.
      orientdb> create vertex V set name = 'Sam', label = 'Customer'
      Created vertex #13:35 in 0.03 secs
      orientdb> create vertex V set name = 'Rome', label = 'Address'
      Created vertex #13:100 in 0.02 secs
      orientdb> create edge E from #13:35 to #13:100 set label = 'Lives'
      Created edge #14:54 in 0.02 secs
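  • 81. Create the graph in Java
      // OrientDB 1.x graph API; assumes an existing local database
      // opened with the default admin/admin credentials.
      import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
      import com.orientechnologies.orient.core.record.impl.ODocument;

      OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
      graph.open("admin", "admin");
      // Two vertices: the customer and the address
      ODocument sam = graph.createVertex();
      sam.field("name", "Sam");
      sam.field("label", "Customer");
      ODocument rome = graph.createVertex();
      rome.field("name", "Rome");
      rome.field("label", "Address");
      // The edge links both records physically
      ODocument edge = graph.createEdge(sam, rome).field("label", "Lives");
      edge.save();
      graph.close();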
  • 82. Query the graph in SQL
      orientdb> select in[label='Lives'].out from V where label = 'Address' and name = 'Rome'
      ---+--------+--------------------+--------------------+--------------------+
        #| REC ID |label               |out                 |in                  |
      ---+--------+--------------------+--------------------+--------------------+
        0|   13:35|Sam                 |[#14:54]            |                    |
      ---+--------+--------------------+--------------------+--------------------+
      1 item(s) found. Query executed in 0.007 sec(s).
      orientdb> select * from V where label = 'Address' AND in[label='Lives'].size() > 0
      ---+--------+--------------------+--------------------+--------------------+
        #| REC ID |label               |out                 |in                  |
      ---+--------+--------------------+--------------------+--------------------+
        0|  13:100|Rome                |                    |[#14:54]            |
      ---+--------+--------------------+--------------------+--------------------+
      1 item(s) found. Query executed in 0.007 sec(s).
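  • 83. Query the graph in Java
      // OrientDB 1.x; assumes an existing local database with admin/admin credentials.
      import java.util.List;
      import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
      import com.orientechnologies.orient.core.record.impl.ODocument;
      import com.orientechnologies.orient.core.sql.OCommandSQL;

      OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
      graph.open("admin", "admin");
      // Get all the customers from Rome, Italy
      List<ODocument> result = graph.command(
          new OCommandSQL("select in[label='Lives'].out from V where label = 'Address' and name = ?")
      ).execute("Rome");
      for (ODocument v : result) {
          System.out.println("Result: " + v.field("label"));
      }
      graph.close();
      -----------------------------------------------------------------------------------
      Result: Sam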
  • 84. Query vs. traversal • Once you have a well-connected database in the form of a super graph, you can cross records instead of querying them! • All you need is some root vertices from which to start traversing
  • 85. Query vs. traversal
    [Figure: graph with the vertices Customers, Sam, John, Sylvia, Order 2332, Order 8834, White Soap, Stocks and Special Customers, with the callout "This is a root vertex"]
  • 86. Query the graph in SQL
    Supposing that the root vertex #30:0 (Customers) links all the Customer vertices:
    Get all the customers:
      orientdb> select out.in from #30:0
    Get all the customers who bought at least one 'White Soap' product:
      orientdb> select * from ( select out.in from #30:0 ) where out.in.out[label='Bought'].in.name = 'White Soap'
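The second query on slide 86 walks root, then customers, then their orders, then `Bought` edges to products. The same walk can be written as a plain traversal that starts at the root vertex and never consults an index. This is a minimal sketch: the vertex names come from slide 85, but the wiring between them, the `Customer` and `Placed` link labels and the `Shampoo` product are hypothetical, as are all class and method names. `follow(null)` plays the role of `out.in` (follow any outgoing link).

```java
import java.util.ArrayList;
import java.util.List;

final class RootTraversal {
    static final class Link {
        final String label;
        final Node target;
        Link(String label, Node target) { this.label = label; this.target = target; }
    }

    static final class Node {
        final String name;
        final List<Link> out = new ArrayList<>();
        Node(String name) { this.name = name; }

        void link(String label, Node target) { out.add(new Link(label, target)); }

        // Vertices reached through outgoing links; label == null means "any label".
        List<Node> follow(String label) {
            List<Node> result = new ArrayList<>();
            for (Link l : out) {
                if (label == null || l.label.equals(label)) result.add(l.target);
            }
            return result;
        }
    }

    // Same shape as slide 86's query: root -> customers -> orders -[Bought]-> products.
    static List<String> customersWhoBought(Node root, String product) {
        List<String> result = new ArrayList<>();
        for (Node customer : root.follow(null)) {
            boolean bought = false;
            for (Node order : customer.follow(null)) {
                for (Node item : order.follow("Bought")) {
                    if (item.name.equals(product)) bought = true;
                }
            }
            if (bought) result.add(customer.name);
        }
        return result;
    }

    // Names from slide 85; the wiring, the "Customer"/"Placed" labels and
    // the "Shampoo" product are hypothetical.
    static Node buildSampleGraph() {
        Node customers = new Node("Customers"); // root vertex (#30:0 on slide 86)
        Node sam = new Node("Sam"), john = new Node("John"), sylvia = new Node("Sylvia");
        Node order2332 = new Node("Order 2332"), order8834 = new Node("Order 8834");
        Node whiteSoap = new Node("White Soap"), shampoo = new Node("Shampoo");
        customers.link("Customer", sam);
        customers.link("Customer", john);
        customers.link("Customer", sylvia);
        sam.link("Placed", order2332);
        sylvia.link("Placed", order8834);
        order2332.link("Bought", whiteSoap);
        order8834.link("Bought", shampoo);
        return customers;
    }

    public static void main(String[] args) {
        Node root = buildSampleGraph();
        System.out.println(customersWhoBought(root, "White Soap")); // [Sam]
    }
}
```

The work done is proportional to the part of the graph reachable from the root, not to the total size of any table.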
  • 87. Demo time!
  • 88. Should I be using NoSQL databases? • For almost all of us, regular relational databases are THE correct solution • NoSQL data storage systems make sense for applications that need to deal with very, very large semi-structured data, such as: • Log analysis • Social networking feeds
  • 89. WHAT CAN WE DO WITH BIG DATA?
  • 90. What's driving Big Data
    Traditional analytics:
    - Ad-hoc querying and reporting
    - Data mining techniques
    - Structured data, typical sources
    - Small to mid-size datasets
    Big Data analytics:
    - Optimizations and predictive analytics
    - Complex statistical analysis
    - All types of data, and many sources
    - Very large datasets
    - More real-time
  • 91. Value of Big Data Analytics • Big data is more real-time in nature than traditional DW applications • Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps • Shared-nothing, massively parallel processing, scale-out architectures are well-suited for big data apps
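The shared-nothing, massively parallel pattern can be sketched on a single machine: records are hash-partitioned, each worker aggregates only its own partition without touching shared state, and the partial results are merged at the end. This is a minimal sketch in which threads stand in for cluster nodes; the class and method names and the click-log sample are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SharedNothingCount {
    static Map<String, Integer> countByKey(List<String> records, int partitions) {
        // 1. Partition: every record is routed to exactly one partition by hash.
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < partitions; i++) parts.add(new ArrayList<>());
        for (String r : records) parts.get(Math.floorMod(r.hashCode(), partitions)).add(r);

        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            // 2. Each worker counts only its own partition: no shared mutable state.
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (List<String> part : parts) {
                partials.add(pool.submit(() -> {
                    Map<String, Integer> local = new HashMap<>();
                    for (String r : part) local.merge(r, 1, Integer::sum);
                    return local;
                }));
            }
            // 3. Merge the partial results (a coordinator step in a real cluster).
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> f : partials) {
                f.get().forEach((k, v) -> total.merge(k, v, Integer::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical click log; any list of keys works.
        List<String> clicks = List.of("home", "search", "home", "cart", "home", "search");
        System.out.println(countByKey(clicks, 3)); // HashMap iteration order may vary
    }
}
```

Adding capacity means adding partitions and workers rather than a bigger central server, which is the scale-out property the slide refers to.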
  • 93. USN with related technical areas
  • 94. What is collecting all this data? Web browsers and search engines: Microsoft's Internet Explorer, Mozilla's Firefox (non-profit foundation, used to be Netscape), Google's Chrome, Apple's Safari; Google's, Microsoft's, Yahoo's and IAC Search's search engines; Time-Warner's AOL Explorer
  • 95. What is collecting all this data? Smartphones & apps: Apple's iPhone (Apple O/S); Samsung, HTC, Nokia, Motorola (Android O/S); RIM Corp's BlackBerry (BlackBerry O/S). Tablet computers & apps: Apple's iPad, Samsung's Galaxy, Amazon's Kindle Fire
  • 96. What is collecting all this data? Hospitals & other medical systems: pharmacies, laboratories, imaging centers, Emergency Medical Services (EMS), hospital information systems, doc-in-a-box, electronic medical records, blood banks, birth & death records. Banking & phone systems: "Can you hear me now?" (Heh heh heh!)
  • 97. What is collecting all this data? A real pain in the apps! What are they collecting? • Restaurant reservations (Open Table) • Weather in L.A. in 3 days (Weather+) • Side effects of medications (MedWatcher) • 3-star hotels in New Orleans (Priceline) • Which PC should I buy and where (PriceCheck)
  • 98. Big Brother Needs Big Data In March 2012, the Obama Administration announced the Big Data Research and Development Initiative, $200 million in new R&D investments, which will explore how Big Data could be used to address important problems facing the government. The initiative was composed of 84 different Big Data programs spread across six departments. http://tinyurl.com/85oytkj The U.S. Federal Government owns six of the ten most powerful supercomputers in the world.
  • 99. How Companies Like Amazon Use Big Data To Make You Love Them Last month, I talked to Amazon customer service about my malfunctioning Kindle, and it was great. Thirty seconds after putting in a service request on Amazon's website, my phone rang, and the woman on the other end--let's call her Barbara--greeted me by name and said, "I understand that you have a problem with your Kindle." We resolved my problem in under two minutes, we got to skip the part where I carefully spell out my last name and address, and she didn't try to upsell me on anything. After nearly a decade of ordering stuff from Amazon, I had never loved the company as much as I did at that moment. The fact is, Amazon has been collecting my information for years--not just addresses and payment information but the identity of everything I've ever bought or even looked at. And while dozens of other companies do that, too, Amazon's doing something remarkable with theirs. They're using that data to build our relationship. Article by Sean Madden, an expert in service design and innovation strategy, May 2012.
  • 100. How Can You Avoid Big Data? • Pay cash for everything! • Never go online! • Don’t use a telephone! • Don’t use Kroger or Harris Teeter cards! • Don’t fill any prescriptions! • Never leave your house!
  • 101. Key concepts of Big Data • Store everything • Don't delete anything • Schema is a bottleneck • Always think in parallel • Remember the CAP theorem
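Three of these concepts (store everything, don't delete anything, schema is a bottleneck) combine naturally into an append-only log of schema-less records: any set of fields is accepted, and an update appends a new version instead of overwriting the old one. This is a minimal sketch: the class and method names are hypothetical, as is the convention that every record carries an `id` field. The backward linear scan in `latest` is deliberately naive; a real store would index it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class AppendOnlyStore {
    private final List<Map<String, Object>> log = new ArrayList<>();

    // Any set of fields is accepted: there is no schema to migrate.
    int append(Map<String, Object> record) {
        log.add(Map.copyOf(record)); // immutable copy; nothing is ever overwritten
        return log.size() - 1;       // position of the new record in the log
    }

    // Latest version for an id: scan backwards; older versions remain stored.
    Map<String, Object> latest(Object id) {
        for (int i = log.size() - 1; i >= 0; i--) {
            if (id.equals(log.get(i).get("id"))) return log.get(i);
        }
        return null;
    }

    // Full history, read-only: every version ever written.
    List<Map<String, Object>> history() { return Collections.unmodifiableList(log); }

    public static void main(String[] args) {
        AppendOnlyStore store = new AppendOnlyStore();
        store.append(Map.of("id", 1, "name", "Sam"));
        store.append(Map.of("id", 1, "name", "Sam", "city", "Rome")); // new field, no migration
        System.out.println(store.latest(1).get("city")); // Rome
        System.out.println(store.history().size());      // 2
    }
}
```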
  • 102. Thank you!!! …and please fill in the evaluation form