Hadoop & NoSQL: New Generation Database Systems

Published in: Technology, Business

  1. 1. Hadoop & NoSQL: New Generation Database Systems. Ramazan FIRIN, 22.04.2014. This document is intended only for AVEA İletişim Hizmetleri A.Ş. ("AVEA"), its dealers, employees and/or others specifically authorised. The contents of this document are confidential, and any disclosure, copying, distribution and/or taking of any action in reliance on the contents of this document is prohibited. AVEA is not liable for the transmission of this document in any manner to any third parties that are not authorised to receive it.
  2. 2. AGENDA • Big Data • Hadoop • NoSQL • Graph DB and Neo4j • Possible Usage in Telco • Demo
  3. 3. Executive Summary AVEA • Big Data is a new IT trend • Hadoop and NoSQL can be used to process Big Data • Possible usage areas in Telco: - Preventing churn - Offering customer-specific campaigns - Acquiring more customers
  4. 4. Big Bang = Big Data
  5. 5. What is Big Data? Datasets that are too awkward to work with using traditional, hands-on database management tools.
  6. 6. Big Data - 3V Concept
  7. 7. Big Data to Smart Data (cover of The Economist)
  8. 8. Big Data Sources 1. Social network profiles - Facebook, LinkedIn, Yahoo, Google 2. Social influencers - blog comments, user forums, review sites 3. Activity-generated data - application logs, sensor data 4. Public - Wikipedia, IMDb, etc. 5. Data warehouse appliances - transactional data 6. Network and in-stream monitoring 7. Legacy documents
  9. 9. Big Data Approach
  10. 10. Sample Usage - 360° View of the Customers
  11. 11. Big Data Solutions - Oracle Big Data Appliance
  12. 12. Big Data Solutions - IBM PureData
  13. 13. Storage for Big Data: If we can't use a relational database, how can we store it? 1) Hadoop 2) NoSQL
  14. 14. What is Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
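
The best-known of Hadoop's "simple programming models" is MapReduce. The sketch below imitates its three stages (map, shuffle, reduce) in a single Python process to show the shape of the model. It is an illustration only: the function names are made up for this example, and a real Hadoop job would run these stages on many machines through Hadoop's own APIs.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence on an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "hadoop processes big data"]))
```

Because each map call sees only one line and each reduce call sees only one key, both stages can be spread over many nodes without the functions themselves changing.
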
  15. 15. History
  16. 16. Hadoop Components
  17. 17. Hadoop Architecture
  18. 18. Hadoop Ecosystem • Pig - a data processing language that simplifies Hadoop programming • Hive - SQL-like queries • HBase - random read/write access to tables with billions of rows and millions of columns (NoSQL)
  19. 19. NoSQL
  20. 20. RDBMS Performance
  21. 21. Join is a killer...
  22. 22. What is NoSQL? • Stands for Not Only SQL • Non-relational • Cheap, easy to implement • Scalability - Vertical: add more resources (CPU, memory, disk) to a single server - Horizontal: add more servers • No predefined schema • No join operations • Generally not ACID; design trade-offs are described by the CAP theorem
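
Horizontal scaling usually means splitting the keys across several servers. One simple way to do that, assumed here purely for illustration, is to hash each key and take the result modulo the number of shards. `shard_for` and `distribute` are hypothetical helper names, not part of any product mentioned in this deck.

```python
import hashlib


def shard_for(key, shard_count):
    """Pick a shard index by hashing the key; md5 is used only as a stable, well-spread hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(keys, shard_count):
    """Assign every key to exactly one shard and return the per-shard key lists."""
    shards = {index: [] for index in range(shard_count)}
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards


if __name__ == "__main__":
    placement = distribute([f"user:{i}" for i in range(1000)], 4)
    for index, members in placement.items():
        print(index, len(members))
```

A known weakness of plain modulo hashing is that changing the shard count moves most keys to a different shard; production systems often use consistent hashing or range-based schemes to limit that movement.
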
  23. 23. Key-Value Stores - Redis, Voldemort
  24. 24. Redis Features • Data Types • Publish / Subscribe • Transactions • Replication • Persistence • Partitioning
  25. 25. Redis Data Types • Strings • Lists • Sets • Sorted Sets • Hashes
  26. 26. Redis Persistence • RDB - takes snapshots at intervals; fast, but may lose several minutes of data on kill -9 • AOF - logs every operation; still fast enough, and may lose about 1 second of data on kill -9
  27. 27. Redis Commands
       $ redis-cli set counter 100
       OK
       $ redis-cli incr counter
       (integer) 101
       $ redis-cli incr counter
       (integer) 102
       $ redis-cli incrby counter 10
       (integer) 112
       Set-type commands: add (SADD); get (SPOP, SRANDMEMBER, SMEMBERS); delete (SREM); other (SINTER, SUNION, SCARD, SDIFF, SMOVE, SISMEMBER)
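
To make the semantics of the counter and set commands concrete, here is a toy in-memory stand-in written in plain Python. `MiniStore` is a made-up class for illustration, not the Redis client API; it only mirrors the return values of SET, INCR/INCRBY, SADD and SINTER.

```python
class MiniStore:
    """Toy in-memory key-value store that mimics a few Redis return values."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return "OK"

    def incrby(self, key, amount=1):
        # A missing key is treated as 0 before incrementing, as INCR/INCRBY do.
        value = int(self._data.get(key, 0)) + amount
        self._data[key] = value
        return value

    def sadd(self, key, *members):
        target = self._data.setdefault(key, set())
        before = len(target)
        target.update(members)
        return len(target) - before  # count of members that were actually new

    def sinter(self, *keys):
        sets = [self._data.get(key, set()) for key in keys]
        return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    store = MiniStore()
    store.set("counter", 100)
    print(store.incrby("counter"), store.incrby("counter"), store.incrby("counter", 10))
```
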
  28. 28. Redis Commands - Lists
       $ redis-cli rpush messages "Hello how are you?"
       (integer) 1
       $ redis-cli rpush messages "Fine thanks. I'm having fun with Redis"
       (integer) 2
       $ redis-cli rpush messages "I should look into this NOSQL thing ASAP"
       (integer) 3
       $ redis-cli lrange messages 0 2
       1) "Hello how are you?"
       2) "Fine thanks. I'm having fun with Redis"
       3) "I should look into this NOSQL thing ASAP"
       Typical uses: chat systems, pagination.
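
Pagination over a list relies on LRANGE taking an inclusive stop index, so `0 2` returns three items. The sketch below reproduces that indexing rule on an ordinary Python list. `lrange` and `page` are illustrative helpers written for this example, not calls into a Redis client.

```python
def lrange(items, start, stop):
    """Return items[start..stop] with an inclusive stop; negative indexes count from the end."""
    size = len(items)
    if start < 0:
        start = max(size + start, 0)
    if stop < 0:
        stop = size + stop
    if stop < 0 or start > stop:
        return []
    return items[start:stop + 1]


def page(items, page_number, page_size):
    """Fetch one zero-based page by converting it to an inclusive index range."""
    start = page_number * page_size
    return lrange(items, start, start + page_size - 1)


if __name__ == "__main__":
    messages = [f"message {i}" for i in range(10)]
    print(page(messages, 1, 3))
```
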
  29. 29. Redis - Publish/Subscribe
       redis 127.0.0.1:6379> PUBLISH myradioshow "Good morning everyone!"
       (integer) 0
       redis 127.0.0.1:6379> PUBLISH myradioshow "How ya'll doin tonight?"
       (integer) 0
       redis 127.0.0.1:6379> PUBLISH myradioshow "Hello? Is anyone listening? I'm not wearing pants."
       (integer) 0
       redis 127.0.0.1:6379> SUBSCRIBE myradioshow
       Reading messages... (press Ctrl-C to quit)
       1) "subscribe"
       2) "myradioshow"
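
In the session above every PUBLISH returns 0 because the reply is the number of subscribers that received the message, and nobody had subscribed yet. Published messages are not stored for later subscribers. The toy in-process broker below shows the same behaviour; `Broker` is a hypothetical class for illustration and does not talk to Redis.

```python
from collections import defaultdict


class Broker:
    """Toy in-process publish/subscribe broker; messages are fire-and-forget."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Register a callable to be invoked for each later message on the channel."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to current subscribers only and return how many received it."""
        receivers = self._subscribers.get(channel, [])
        for callback in receivers:
            callback(message)
        return len(receivers)


if __name__ == "__main__":
    broker = Broker()
    print(broker.publish("myradioshow", "Good morning everyone!"))
    broker.subscribe("myradioshow", print)
    print(broker.publish("myradioshow", "Hello? Is anyone listening?"))
```
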
  30. 30. Document Databases - CouchDB, MongoDB
  31. 31. MongoDB Features • JSON / BSON support • RESTful support • CRUD operations • SQL-like queries • Indexing • Auto-sharding • Built-in replication and high availability • Aggregation framework
  32. 32. Terminology
  33. 33. Sharding
  34. 34. MongoDB vs SQL
       SQL:     SELECT * FROM users
       MongoDB: db.users.find()
       SQL:     SELECT id, user_id, status FROM users
       MongoDB: db.users.find( { }, { user_id: 1, status: 1 } )
       SQL:     SELECT * FROM users WHERE status = "A"
       MongoDB: db.users.find( { status: "A" } )
       SQL:     SELECT user_id, status FROM users WHERE status = "A"
       MongoDB: db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )
       SQL:     SELECT * FROM users WHERE user_id like "%bc%"
       MongoDB: db.users.find( { user_id: /bc/ } )
       SQL:     SELECT * FROM users WHERE status = "A" ORDER BY user_id ASC
       MongoDB: db.users.find( { status: "A" } ).sort( { user_id: 1 } )
       SQL:     SELECT * FROM users LIMIT 5 SKIP 10
       MongoDB: db.users.find().limit(5).skip(10)
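
The idea behind `find(query, projection)` is that a query is itself a document: a record matches when its fields equal the values given. The sketch below applies that idea to a plain list of Python dictionaries. It is a simplification under stated assumptions: only equality matching is supported, the projection is a list of field names rather than MongoDB's projection document, and `find` here is a local function, not a driver call.

```python
def find(documents, query=None, projection=None):
    """Return documents whose fields equal every value in `query`, keeping only projected fields."""
    query = query or {}
    results = []
    for doc in documents:
        if all(doc.get(field) == value for field, value in query.items()):
            if projection:
                doc = {field: doc[field] for field in projection if field in doc}
            results.append(doc)
    return results


if __name__ == "__main__":
    users = [
        {"user_id": "abc1", "status": "A", "age": 30},
        {"user_id": "xyz9", "status": "B", "age": 41},
    ]
    # Roughly: SELECT user_id, status FROM users WHERE status = "A"
    print(find(users, {"status": "A"}, ["user_id", "status"]))
```
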
  35. 35. Column Family Stores - Cassandra, HBase
  36. 36. Cassandra Features • Proven • Rich data model • Scalable • Distributed and decentralized • High-performance reads/writes • Fault tolerant • No single point of failure (SPOF) • Schema-free
  37. 37. Cassandra Cluster
  38. 38. Benchmark
  39. 39. Architecture
  40. 40. Consistency Levels • ANY • ONE • TWO • THREE • QUORUM • LOCAL_QUORUM • EACH_QUORUM • ALL
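
The QUORUM levels are easier to reason about with the arithmetic written out. A quorum is a strict majority of the replicas, and a read is guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor. The helper names below are made up for this sketch.

```python
def quorum(replication_factor):
    """Replicas that must respond at QUORUM: a strict majority of all replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """A read overlaps the latest write on at least one replica when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    print(quorum(rf))
    print(is_strongly_consistent(quorum(rf), quorum(rf), rf))
    print(is_strongly_consistent(1, 1, rf))
```

With a replication factor of 3, QUORUM reads and QUORUM writes touch 2 replicas each and therefore always overlap, while ONE/ONE is faster but can return stale data.
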
  41. 41. RDBMS Supports ACID • Atomicity - a transaction is all or nothing • Consistency - only valid data is written to the database • Isolation - concurrent transactions behave as if they had run serially • Durability - once a transaction is committed, its changes survive failures
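
Atomicity and consistency can be observed directly with SQLite, which ships with Python. In this sketch the `accounts` table, the account names and the `transfer` helper are all invented for the example. The credit is applied first so that a failing debit leaves something to roll back.

```python
import sqlite3


def transfer(conn, source, target, amount):
    """Move `amount` between accounts; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balance of every account as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

A call such as `transfer(conn, "alice", "bob", 500)` returns `False` and leaves both balances untouched, because the invalid debit causes the already-applied credit to be undone.
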
  42. 42. NoSQL and the CAP Theorem • Consistency: all nodes give the same answer • Availability: nodes always answer and accept updates • Partition tolerance: the system continues working if some nodes go quiet • During a network partition, a system has to give up either consistency or availability
  43. 43. Visual Guide to NoSQL Systems
  44. 44. Graph Databases - Neo4j, InfoGrid, InfiniteGraph
  45. 45. Graph DB: A graph database uses graph structures with nodes, edges, and properties to represent and store data.
  46. 46. NoSQL Performance
  47. 47. Graph DB Usage Areas • Recommendations • Business Intelligence • Social networking • MDM • System Management • Time series data • Product Catalogue • Web Analytics • Scientific Computing • Indexing your slow RDBMS
  48. 48. Neo4j
  49. 49. Neo4j • Leading graph database • Transaction support (ACID) • Indexing • Querying • REST support • Disk-based • Open source • Traversal framework • High performance (traverses 1,000,000+ relationships per second) • Robust (in 24/7 operation since 2003) • Massive scalability
  50. 50. Neo4j Data Model: Neo4j has nodes and relationships. Nodes and relationships have properties. Example: Node1 (properties: name, surname) and Node2 (properties: name, surname) are connected by a relationship of type "knows" with the property "date of meeting".
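
The property-graph model described on this slide can be sketched in a few lines of Python: nodes carry properties, relationships have a type and their own properties, and queries are traversals. `PropertyGraph`, the sample people and the meeting date are all invented for illustration; this is not the Neo4j API or its storage format.

```python
from collections import deque


class PropertyGraph:
    """Toy in-memory property graph with typed, directed relationships."""

    def __init__(self):
        self.nodes = {}          # node id -> properties
        self.relationships = []  # (start id, type, end id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, start, rel_type, end, **properties):
        self.relationships.append((start, rel_type, end, properties))

    def neighbours(self, node_id, rel_type):
        """Node ids reachable from `node_id` over one outgoing relationship of this type."""
        return [end for start, kind, end, _ in self.relationships
                if start == node_id and kind == rel_type]

    def reachable(self, start, rel_type, max_depth):
        """Breadth-first traversal: node ids within `max_depth` hops of `start`."""
        seen, queue, found = {start}, deque([(start, 0)]), []
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for nxt in self.neighbours(node, rel_type):
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    queue.append((nxt, depth + 1))
        return found


if __name__ == "__main__":
    graph = PropertyGraph()
    graph.add_node(1, name="Ayse", surname="Yilmaz")
    graph.add_node(2, name="Mehmet", surname="Kaya")
    graph.add_node(3, name="Elif", surname="Demir")
    graph.relate(1, "KNOWS", 2, date_of_meeting="2014-04-22")
    graph.relate(2, "KNOWS", 3)
    print(graph.reachable(1, "KNOWS", 2))
```

"Friends of friends" is then a depth-2 traversal rather than a self-join, which is the main reason graph stores are proposed for relationship-heavy data.
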
  51. 51. Relational Databases are Graphs!
  52. 52. Cypher for Queries
  53. 53. Neo4j Performance http://www.neotechnology.com/2012/10/20-billion-relationships-imported-into-neo4j-on-ec2/
  54. 54. Who uses Neo4j? • Cisco - Master Data Management • Telenor Group: customer organization structure (203 million subscribers) • Deutsche Telekom: social football site (150 million subscribers)
  55. 55. OrientDB • The document-graph database • ACID support • SQL and native queries • Schema-less, schema-full and schema-mixed modes • Roles + security • Functions • HTTP / RESTful / JSON / binary support • Hooks • Fetch plans • Inheritance • 200,000 inserts per second (6 M node traversals with cache)
  56. 56. FluxGraph • Temporal graph database • Has checkpoints • Compatible with Neo4j
  57. 57. Graphs of Telecommunications
  58. 58. CDR Analysis by Graph
  59. 59. Spring Data
  60. 60. Spring Data Neo4j
  61. 61. NoSQL Usage • "Cisco is building a master data management system based on Neo4j, and this is actually our first Fortune 500 customer. They found us about two years ago when they tried to build this big, complex hierarchy inside of Oracle RAC. In Oracle RAC, they had response time in minutes, and then when they replaced it [with] Neo4j, they had response times in milliseconds." (Emil Eifrem, Neo4j CEO) • NHS tears out its Oracle Spine in favour of open source: http://www.theregister.co.uk/2013/10/10/nhs_drops_oracle_for_riak/ • AMD: Why we had to evacuate 276TB from Oracle DB to Hadoop: http://www.theregister.co.uk/2014/03/24/amd_hadoop_migration/
  62. 62. Statistics
  63. 63. Magic Quadrant for Operational Database Management Systems
  64. 64. NoSQL Market Size
  65. 65. NoSQL Engine Ranking
  66. 66. NoSQL in Enterprise Apps
  67. 67. Use of NoSQL Products
  68. 68. Database Market Share
  69. 69. Web Application Architecture
  70. 70. Polyglot Persistence
  71. 71. Thanks
