1. This document is intended only for AVEA İletişim Hizmetleri A.Ş. ("AVEA"), its dealers, employees and/or others specifically authorised. The contents of this document are confidential, and any disclosure, copying, distribution and/or taking of any action in reliance on the content of this document is prohibited. AVEA is not liable for the transmission of this document in any manner to any third parties that are not authorised to receive it.
Hadoop & NoSQL
New Generation Database Systems
Ramazan FIRIN
22.04.2014
2. AGENDA
• Big Data
• Hadoop
• NoSQL
• Graph DB and Neo4j
• Possible Usage in Telco
• Demo
3. Executive Summary
AVEA
• Big Data is a new IT trend
• Hadoop and NoSQL can be used to process Big Data
• Possible usage areas in Telco:
- Preventing churn
- Offering customer-specific campaigns
- Acquiring more customers
4. Big Bang = Big Data
5. What is Big Data?
Datasets that are too awkward to work with using traditional, hands-on database management tools.
8. Big Data Sources
1. Social network profiles - Facebook, LinkedIn, Yahoo, Google
2. Social influencers - blog comments, user forums, review sites
3. Activity-generated data - application logs, sensor data
4. Public data - Wikipedia, IMDb, etc.
5. Data warehouse appliances - transactional data
6. Network and in-stream monitoring
7. Legacy documents
13. Storage for Big Data
If we can't use a relational database, how can we store it?
1) Hadoop
2) NoSQL
14. What is Hadoop?
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
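The "simple programming models" in this definition chiefly refer to MapReduce: a job is split into a map step that emits key/value pairs and a reduce step that combines all values sharing a key. The sketch below shows that data flow for a word count in plain Python; the function names are hypothetical, everything runs in a single process, and it is not Hadoop's own API.

```python
from collections import defaultdict
from itertools import chain

# Single-process sketch of the MapReduce model (map -> shuffle -> reduce).
# This is NOT Hadoop's Java API; Hadoop runs many mappers and reducers in
# parallel on different machines, while this toy runs everything in one process.


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data is big", "hadoop processes big data"]
    print(word_count(docs))
    # {'big': 3, 'data': 2, 'hadoop': 1, 'is': 1, 'processes': 1}
```

Because each mapper only sees its own lines and each reducer only sees one key's values, both phases can be spread across machines; that independence is what the framework exploits.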
18. Hadoop Ecosystem
Pig - data-processing language (Pig Latin) that simplifies Hadoop programming
Hive - SQL-like queries (HiveQL) over data stored in Hadoop
HBase - random read/write access to tables with billions of rows and millions of columns (NoSQL)
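HBase can hold "billions of rows and millions of columns" because rows are sparse maps addressed by a row key and a family:qualifier column name. The toy class below is a hypothetical, in-memory illustration of that logical layout only. It is not the HBase client API, and it ignores cell versions, regions and storage.

```python
from collections import defaultdict

# Toy in-memory model of HBase's logical data layout. ToyWideTable is a
# hypothetical class, not the HBase client API, and it ignores cell versions
# (timestamps), regions and persistence.
# Layout: row key -> "family:qualifier" -> value. Rows are sparse: each row
# stores only the columns written to it, so a table can have a huge number of
# distinct columns without spending space on empty cells.


class ToyWideTable:
    def __init__(self, families):
        self.families = set(families)   # column families are declared up front
        self.rows = defaultdict(dict)    # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        """Random write: set one cell addressed by row key and column."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][column] = value

    def get(self, row_key, column=None):
        """Random read: one cell, or a copy of the whole row if column is None."""
        row = self.rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)

    def scan(self, start_key, stop_key):
        """Return rows with start_key <= key < stop_key, in sorted key order."""
        return [
            (key, dict(self.rows[key]))
            for key in sorted(self.rows)
            if start_key <= key < stop_key
        ]


if __name__ == "__main__":
    table = ToyWideTable(["info", "usage"])
    table.put("cust#001", "info:name", "Alice")
    table.put("cust#001", "usage:2014-04", 512)
    table.put("cust#002", "info:name", "Bob")
    print(table.get("cust#001", "info:name"))   # Alice
    print(table.scan("cust#001", "cust#002"))
```

The row keys and column names here are made up for illustration; the point is that a row only pays for the cells it actually has.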
22. What is NoSQL?
• Stands for Not Only SQL
• Non-relational
• Cheap, easy to implement
• Scalability
– Vertically - add more resources (CPU, RAM, disk) to a single server
– Horizontally - add more servers to the cluster
• No pre-defined schema
• No join operations
• Usually not fully ACID; designed around the trade-offs of the CAP theorem
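Horizontal scaling in most NoSQL stores rests on partitioning (sharding): each key lives on one node chosen by a deterministic rule, so adding nodes adds capacity. A minimal sketch with hash partitioning follows; the node and key names are hypothetical, and the modulo scheme is chosen for brevity rather than taken from any particular product.

```python
import hashlib

# Minimal sketch of horizontal scaling by hash partitioning (sharding).
# Node names are hypothetical. Each key is stored on exactly one node, chosen
# from a hash of the key, so adding nodes adds storage and throughput.
# Plain modulo hashing is used for brevity: it moves many keys when the node
# count changes, which is why real systems often prefer consistent hashing or
# range-based sharding.


def shard_for(key, nodes):
    """Pick the node responsible for a key (stable across runs, unlike hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribute(keys, nodes):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[shard_for(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    subscribers = [f"subscriber:{i}" for i in range(10)]
    for node, keys in distribute(subscribers, ["node-a", "node-b", "node-c"]).items():
        print(node, len(keys))
```

Vertical scaling, by contrast, would keep one node and give it more hardware, which eventually hits the limits of a single machine.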
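26. Redis Persistence
• RDB - takes a snapshot of the dataset at configured intervals
Fast
May lose several minutes of data if the server is killed (kill -9)
• AOF - logs every write operation
Still fast enough
May lose about 1 second of data if the server is killed (kill -9), with the default fsync-every-second policy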
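The difference between the two modes can be modelled in a few lines: a snapshot loses everything written after it, while an append-only log can be replayed to rebuild the data. The class below is a hypothetical toy, not Redis code, and it assumes every log entry reached disk.

```python
# Toy model of Redis's two persistence styles (hypothetical class, not Redis
# internals):
#  - RDB style: periodically copy the whole dataset (a point-in-time snapshot);
#  - AOF style: append every write command to a log and replay it on restart.
# The sketch assumes every AOF entry reached disk; with Redis's default
# fsync-every-second policy, up to about one second of writes could be missing.


class ToyPersistentStore:
    def __init__(self):
        self.data = {}
        self.snapshot = {}   # last RDB-style snapshot
        self.aof = []        # append-only log of ("SET", key, value) entries

    def set(self, key, value):
        self.data[key] = value
        self.aof.append(("SET", key, value))

    def take_snapshot(self):
        self.snapshot = dict(self.data)

    def recover_from_snapshot(self):
        """After a crash: everything written since the last snapshot is lost."""
        return dict(self.snapshot)

    def recover_from_aof(self):
        """After a crash: rebuild the dataset by replaying the log in order."""
        rebuilt = {}
        for command, key, value in self.aof:
            if command == "SET":
                rebuilt[key] = value
        return rebuilt


if __name__ == "__main__":
    store = ToyPersistentStore()
    store.set("counter", 100)
    store.take_snapshot()
    store.set("counter", 112)             # written after the snapshot
    print(store.recover_from_snapshot())  # {'counter': 100}
    print(store.recover_from_aof())       # {'counter': 112}
```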
27. Redis Commands
$ redis-cli set counter 100
OK
$ redis-cli incr counter
(integer) 101
$ redis-cli incr counter
(integer) 102
$ redis-cli incrby counter 10
(integer) 112
Set commands:
Add: SADD
Read: SMEMBERS, SRANDMEMBER, SPOP (SPOP also removes the member it returns)
Remove: SREM
Other: SINTER, SUNION, SDIFF, SCARD, SMOVE, SISMEMBER
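To make the semantics of these commands concrete without a running server, the hypothetical MiniRedis class below emulates them on Python dicts and sets. It reproduces the counter replies 101, 102 and 112 from the session above, but it is not a Redis client, and it skips Redis's error cases.

```python
# Pure-Python emulation of the semantics of the Redis commands above.
# MiniRedis is a hypothetical teaching class: it talks to no Redis server and
# is not the API of any Redis client library. Error handling (e.g. INCR on a
# non-numeric value, or using a set command on a string key) is left out.


class MiniRedis:
    def __init__(self):
        self.store = {}

    # String counters
    def set(self, key, value):
        self.store[key] = str(value)   # Redis stores strings; INCR parses them
        return "OK"

    def incrby(self, key, amount):
        value = int(self.store.get(key, "0")) + amount   # missing key counts as 0
        self.store[key] = str(value)
        return value

    def incr(self, key):
        return self.incrby(key, 1)

    # Sets
    def sadd(self, key, *members):
        members_set = self.store.setdefault(key, set())
        before = len(members_set)
        members_set.update(members)
        return len(members_set) - before   # how many members were new

    def srem(self, key, *members):
        members_set = self.store.get(key, set())
        removed = len(members_set & set(members))
        members_set.difference_update(members)
        return removed

    def smembers(self, key):
        return set(self.store.get(key, set()))

    def sismember(self, key, member):
        return int(member in self.store.get(key, set()))   # 1 or 0, like Redis

    def scard(self, key):
        return len(self.store.get(key, set()))

    def sinter(self, *keys):
        return set.intersection(*(self.smembers(k) for k in keys))

    def sunion(self, *keys):
        return set.union(*(self.smembers(k) for k in keys))

    def sdiff(self, first, *others):
        return self.smembers(first).difference(*(self.smembers(k) for k in others))


if __name__ == "__main__":
    r = MiniRedis()
    print(r.set("counter", 100))     # OK
    print(r.incr("counter"))         # 101
    print(r.incr("counter"))         # 102
    print(r.incrby("counter", 10))   # 112
    r.sadd("plan:gold", "alice", "bob")
    r.sadd("plan:roaming", "bob", "carol")
    print(r.sinter("plan:gold", "plan:roaming"))   # {'bob'}
```

The key names such as plan:gold are illustrative; set intersections like this are a common way to find subscribers that belong to several segments at once.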
28. Redis Commands – Lists
$ redis-cli rpush messages "Hello how are you?"
(integer) 1
$ redis-cli rpush messages "Fine thanks. I'm having fun with Redis"
(integer) 2
$ redis-cli rpush messages "I should look into this NOSQL thing ASAP"
(integer) 3
$ redis-cli lrange messages 0 2
1) "Hello how are you?"
2) "Fine thanks. I'm having fun with Redis"
3) "I should look into this NOSQL thing ASAP"
Use cases:
• Chat systems
• Pagination
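Two details of these list commands matter for chat history and pagination: RPUSH replies with the new length of the list, and LRANGE treats both indices as inclusive, with negative indices counting from the end. A pure-Python emulation under those rules follows; MiniRedisLists and page() are hypothetical helpers, not a Redis client API.

```python
# Pure-Python emulation of Redis RPUSH / LRANGE semantics (not a Redis client).
# LRANGE indices are zero-based and the stop index is inclusive; negative
# indices count from the end of the list (-1 is the last element).


class MiniRedisLists:
    def __init__(self):
        self.lists = {}

    def rpush(self, key, *values):
        lst = self.lists.setdefault(key, [])
        lst.extend(values)
        return len(lst)   # RPUSH replies with the list length after the push

    def lrange(self, key, start, stop):
        lst = self.lists.get(key, [])
        n = len(lst)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        stop = min(stop, n - 1)   # an out-of-range stop is clamped to the last element
        if start > stop:
            return []
        return lst[start:stop + 1]


def page(redis_lists, key, page_no, page_size):
    """Hypothetical pagination helper: page 0 holds the oldest messages."""
    start = page_no * page_size
    return redis_lists.lrange(key, start, start + page_size - 1)


if __name__ == "__main__":
    r = MiniRedisLists()
    r.rpush("messages", "Hello how are you?")                          # 1
    r.rpush("messages", "Fine thanks. I'm having fun with Redis")      # 2
    r.rpush("messages", "I should look into this NOSQL thing ASAP")    # 3
    print(r.lrange("messages", 0, 2))     # all three messages
    print(r.lrange("messages", -1, -1))   # only the newest message
    print(page(r, "messages", 1, 2))      # second page of size 2: the third message
```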
31. MongoDB Features
• JSON / BSON support
• RESTful support
• CRUD operations
• SQL-like queries
• Indexing
• Auto-sharding
• Built-in replication and high availability
• Aggregation framework
34. MongoDB vs SQL
SQL | MongoDB
SELECT * FROM users | db.users.find()
SELECT id, user_id, status FROM users | db.users.find( { }, { user_id: 1, status: 1 } )
SELECT * FROM users WHERE status = "A" | db.users.find( { status: "A" } )
SELECT user_id, status FROM users WHERE status = "A" | db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )
SELECT * FROM users WHERE user_id LIKE "%bc%" | db.users.find( { user_id: /bc/ } )
SELECT * FROM users WHERE status = "A" ORDER BY user_id ASC | db.users.find( { status: "A" } ).sort( { user_id: 1 } )
SELECT * FROM users LIMIT 5 OFFSET 10 | db.users.find().limit(5).skip(10)
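What each find() call computes can be emulated on a plain list of dicts. The hypothetical find() below supports only what the table uses (equality and regex filters, inclusion projections, one sort key, skip and limit) and is not the MongoDB driver. It does reflect one server rule: skip is applied before limit, whatever order they are chained in.

```python
import re

# Pure-Python sketch of what the db.users.find(...) calls compute, run on a list
# of dicts. find() is a hypothetical helper, not the MongoDB driver API.
# Supported: equality and regex filters, inclusion projections, one sort key,
# skip and limit.


def find(docs, query=None, projection=None, sort=None, skip=0, limit=0):
    query = query or {}

    def matches(doc):
        for field, condition in query.items():
            value = doc.get(field)
            if isinstance(condition, re.Pattern):   # like { user_id: /bc/ }
                if not (isinstance(value, str) and condition.search(value)):
                    return False
            elif value != condition:                 # like { status: "A" }
                return False
        return True

    result = [doc for doc in docs if matches(doc)]
    if sort:                                         # sort=("user_id", 1) or -1
        field, direction = sort
        result.sort(key=lambda doc: doc[field], reverse=(direction == -1))
    # Like the MongoDB server: sort first, then skip, then limit (0 = no limit).
    result = result[skip:]
    if limit:
        result = result[:limit]
    if projection:
        keep = {field for field, include in projection.items() if include}
        if projection.get("_id", 1):                 # _id is returned unless excluded
            keep.add("_id")
        result = [{k: v for k, v in doc.items() if k in keep} for doc in result]
    return result


if __name__ == "__main__":
    users = [
        {"_id": 1, "user_id": "abc123", "status": "A"},
        {"_id": 2, "user_id": "xyz789", "status": "B"},
        {"_id": 3, "user_id": "bcd456", "status": "A"},
    ]
    # SELECT user_id, status FROM users WHERE status = "A" ORDER BY user_id ASC
    print(find(users, {"status": "A"}, {"user_id": 1, "status": 1, "_id": 0}, sort=("user_id", 1)))
    # SELECT * FROM users WHERE user_id LIKE "%bc%"
    print(find(users, {"user_id": re.compile("bc")}))
```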
41. RDBMS Support ACID
• Atomicity - a transaction is all or nothing
• Consistency - only valid data is written to the database
• Isolation - concurrent transactions behave as if they ran one after another (serially), so each sees correct data
• Durability - once a transaction is committed, its changes survive crashes and restarts
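Atomicity and consistency can be demonstrated with Python's built-in sqlite3 module, with SQLite standing in for a full RDBMS. In the sketch below (the table, account names and amounts are hypothetical), a transfer consists of two UPDATE statements. A CHECK constraint rejects negative balances, and because both statements run in one transaction, a rejected debit also undoes the credit that preceded it. The in-memory database means durability is not shown.

```python
import sqlite3

# Minimal ACID illustration with Python's built-in sqlite3 module (SQLite
# stands in for any RDBMS here). A money transfer is two UPDATEs; the CHECK
# constraint enforces consistency, and the transaction makes the pair atomic.


def make_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed, so the credit to dst was rolled back as well


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    conn = make_bank()
    print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 80}
    print(transfer(conn, "bob", "alice", 500), balances(conn))  # False {'alice': 70, 'bob': 80}
```

The second transfer first credits alice, then fails on bob's debit; the rollback restores alice's balance, so no money is created.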
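42. NoSQL and the CAP Theorem
A distributed system can guarantee at most two of the following three properties at the same time:
Consistency: all nodes give the same answer
Availability: every node always answers and accepts updates
Partition tolerance: the system keeps working if some nodes go quiet (for example, during a network partition)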
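The trade-off can be seen in a deliberately simplified two-replica toy: while the replicas cannot reach each other, a write must either be refused (consistent but not available) or accepted on one side only (available but temporarily inconsistent). The classes below are hypothetical and do not model any specific database's replication protocol.

```python
# Toy two-replica key-value store illustrating the CAP trade-off during a
# partition. Hypothetical classes; no real replication protocol is modelled.
#  - "CP" mode refuses writes while partitioned (keeps replicas consistent).
#  - "AP" mode accepts writes locally (stays available, replicas diverge).


class Replica:
    def __init__(self):
        self.data = {}


class TwoReplicaStore:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False   # True while the two replicas cannot communicate

    def write(self, replica_index, key, value):
        if not self.partitioned:
            for replica in self.replicas:   # normal case: replicate to both nodes
                replica.data[key] = value
            return True
        if self.mode == "CP":
            return False                    # give up availability, not consistency
        self.replicas[replica_index].data[key] = value   # AP: accept locally only
        return True

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)

    def consistent(self):
        return self.replicas[0].data == self.replicas[1].data


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoReplicaStore(mode)
        store.write(0, "x", 1)
        store.partitioned = True
        accepted = store.write(0, "x", 2)
        print(mode, "accepted:", accepted, "consistent:", store.consistent())
    # CP accepted: False consistent: True
    # AP accepted: True consistent: False
```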
47. Graph DB Usage Areas
• Recommendations
• Business Intelligence
• Social networking
• MDM (Master Data Management)
• System Management
• Time Series data
• Product Catalogue
• Web Analytics
• Scientific Computing
• Indexing your slow RDBMS
49. Neo4j
• Leading graph database
• Transaction support (ACID)
• Indexing
• Querying
• REST support
• Disk-based
• Open source
• Traversal framework
• High performance (traverses 1,000,000+ relationships per second)
• Robust (in 24/7 operation since 2003)
• Massive scalability
50. Neo4j Data Model
Neo4j stores data as nodes and relationships. Both nodes and relationships can have properties.
[Figure: Node1 (properties: name, surname) and Node2 (properties: name, surname), connected by a relationship of type "knows" with the property "Date of meeting"]
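The same model can be sketched as a small in-memory property graph in Python. Graph and its methods are hypothetical, not the Neo4j driver or Cypher; the breadth-first within() call illustrates the kind of "friends of friends" traversal that graph databases are built for. The linear scan in neighbours() keeps the sketch short; a graph engine would not scan every relationship on each hop.

```python
from collections import deque

# Minimal in-memory property graph mirroring the data model above
# (hypothetical classes, not the Neo4j driver API): nodes and relationships
# both carry a property dict, and each relationship has a type such as "knows".


class Graph:
    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.rels = []    # (start id, type, end id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, start, rel_type, end, **props):
        self.rels.append((start, rel_type, end, props))

    def neighbours(self, node_id, rel_type):
        """Nodes reachable over one outgoing relationship of the given type."""
        return [end for start, t, end, _ in self.rels if start == node_id and t == rel_type]

    def within(self, start, rel_type, max_depth):
        """Breadth-first traversal: nodes reachable in 1..max_depth hops, with their depth."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if seen[current] == max_depth:
                continue
            for nxt in self.neighbours(current, rel_type):
                if nxt not in seen:
                    seen[nxt] = seen[current] + 1
                    queue.append(nxt)
        return {node: depth for node, depth in seen.items() if node != start}


if __name__ == "__main__":
    g = Graph()
    g.add_node(1, name="Ali", surname="Kaya")
    g.add_node(2, name="Ece", surname="Demir")
    g.add_node(3, name="Can", surname="Aydin")
    g.relate(1, "knows", 2, date_of_meeting="2013-05-01")
    g.relate(2, "knows", 3, date_of_meeting="2014-01-15")
    print(g.within(1, "knows", 2))   # {2: 1, 3: 2}: a friend and a friend of a friend
```

The names and dates are invented sample data; only the shape (properties on both nodes and relationships, typed relationships) follows the slide.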
54. Who Uses Neo4j?
• Cisco - master data management
• Telenor Group - customer organization structure (203 million subscribers)
• Deutsche Telekom - social football site (150 million subscribers)
55. OrientDB
• The document-graph database
• ACID support
• SQL and native queries
• Schema-less, schema-full and schema-mixed modes
• Roles + security
• Functions
• HTTP / RESTful / JSON / binary protocol support
• Hooks
• Fetch plans
• Inheritance
• 200,000 inserts per second (6 M node traversals with cache)
56. FluxGraph
• Temporal graph database
• Checkpoint support
• Compatible with Neo4j
61. NoSQL Usage
• Cisco is building a master data management system based on Neo4j, and this is
actually our first Fortune 500 customer. They found us about two years ago when they
tried to build this big, complex hierarchy inside of Oracle RAC. In Oracle RAC, they had
response time in minutes, and then when they replaced it [with] Neo4j, they had
response times in milliseconds.
Emil Eifrem – Neo4j CEO
• NHS tears out its Oracle Spine in favour of open source
http://www.theregister.co.uk/2013/10/10/nhs_drops_oracle_for_riak/
• AMD: Why we had to evacuate 276TB from Oracle DB to Hadoop
http://www.theregister.co.uk/2014/03/24/amd_hadoop_migration/