Big data hadoop-no sql and graph db-final


    1. Big Data - Hadoop - NoSQL and Graph Database. Ramazan FIRIN, 20.11.2012. This document is intended only for AVEA İletişim Hizmetleri A.Ş. ("AVEA"), its dealers, employees and/or others specifically authorised. The contents of this document are confidential, and any disclosure, copying, distribution and/or taking of any action in reliance on the content of this document is prohibited. AVEA is not liable for the transmission of this document in any manner to any third parties that are not authorised to receive it.
    2. AGENDA • Big Data • Hadoop • NoSQL • Graph DB and Neo4j • Possible Usage in Telco • Demo
    3. Executive Summary • Big Data is a new IT trend • Hadoop and NoSQL can be used to process Big Data • Possible usage areas in Telco: - Prevent churn - Offer customer-specific campaigns - Gain more customers (AVEA R&D / MW Development)
    4. What is Big Data? Datasets that are too awkward to work with using traditional, hands-on database management tools.
    5. Big Data - 3V Concept
    6. Big Data Sources: 1) Social network profiles: Facebook, LinkedIn, Yahoo, Google; 2) Social influencers: blog comments, user forums, review sites; 3) Activity-generated data: application logs, sensor data; 4) Public data: Wikipedia, IMDb, etc.; 5) Data warehouse appliances: transactional data; 6) Network and in-stream monitoring; 7) Legacy documents
    7. Big Data to Smart Data (cover of The Economist)
    8. Volume
    9. New Data Sources - Internet • 2 billion internet users by 2011 • Twitter processes 7 terabytes of data every day • Facebook processes 10 terabytes of data every day • 4.6 billion mobile phones • Google processes 24 petabytes of data every day
    10. Big Data Approach
    11. Big Data Design
    12. Big Data Usage by Sector
    13. Sample Usage - 360° View of the Customers
    14. Sample Usage - Customer Sentiment
    15. Sample Usage - Detect Churn Pattern
    16. Sample Usage - Health
    17. Big Data Market
    18. Big Data Solutions - Oracle Big Data Appliance
    19. Big Data Solutions - IBM PureData
    20. Top 10 Technology Trends 2012 from CSC
    21. Gartner: Top 10 IT Trends for 2013
    22. Gartner: 10 Critical IT Trends for the Next Five Years • The third trend is bigger data and storage: • By 2015, big data demand will generate 1 million jobs in the Global 1000, • but only one-third of those jobs will be filled, due to a shortage of talent. • Analytics and pattern recognition are key. • New specialized ARM-based servers are emerging for specialty analytics.
    23. HADOOP
    24. What is HADOOP? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
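Hadoop's core programming model is MapReduce: a map function turns each input record into key/value pairs, the framework groups those pairs by key (the shuffle), and a reduce function combines each group. The sketch below simulates the three phases for a word count in plain, single-process Python. It does not use Hadoop's Java API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record (a line of text).
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group all intermediate values by key, as the framework
    # does between the map and reduce phases.
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs hadoop", "hadoop processes big data"]
    print(word_count(docs))
    # → {'big': 2, 'data': 2, 'hadoop': 2, 'needs': 1, 'processes': 1}
```

On a real cluster the mappers and reducers run in parallel on many nodes and the shuffle moves data across the network; the developer mainly supplies the map and reduce functions.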
    25. History
    26. Hadoop Components
    27. HADOOP ARCHITECTURE
    28. Hadoop Ecosystem • Pig - simplifies Hadoop programming; a data processing language • Hive - SQL-like queries • HBase - random read/write access to billions of rows and millions of columns (NoSQL)
    29. Other Google Research
    30. NoSQL
    31. RDBMS PERFORMANCE
    32. Joins are the killer...
    33. What is NoSQL? • Stands for Not Only SQL • Non-relational • Cheap and easy to implement • Scalability: - Vertical: add more resources (CPU, RAM, storage) to a single server - Horizontal: add more servers to the cluster • No predefined schema • No join operations • Often relaxes ACID guarantees, designed around the trade-offs described by the CAP theorem
    34. NoSQL DB Types: 1) Key-value stores 2) Document databases 3) Column family stores 4) Graph databases
    35. Key-Value Stores - Redis, Voldemort
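Key-value stores expose little more than put, get and delete by key, which is what makes horizontal scaling straightforward: the key alone decides which server holds the value. A minimal sketch, assuming simple hash-modulo partitioning over in-memory dictionaries that stand in for servers; the class and method names are hypothetical and this is not the API of Redis or Voldemort.

```python
import hashlib
from typing import Optional


class PartitionedKVStore:
    """Toy key-value store that spreads keys across several in-memory 'nodes'."""

    def __init__(self, node_count: int) -> None:
        if node_count < 1:
            raise ValueError("node_count must be at least 1")
        # Each dict stands in for one storage server (no networking, no replication).
        self.nodes: list[dict[str, str]] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> dict[str, str]:
        # Use a stable hash: Python's built-in hash() of str is salted per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, value: str) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._node_for(key).get(key)

    def delete(self, key: str) -> None:
        self._node_for(key).pop(key, None)


store = PartitionedKVStore(node_count=3)
store.put("subscriber:1001", "prepaid")
print(store.get("subscriber:1001"))  # prepaid
```

Hash-modulo placement remaps most keys whenever the node count changes; production stores such as Cassandra use consistent hashing instead, so that adding a node moves only a fraction of the keys.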
    36. Document Databases - CouchDB, MongoDB
    37. Column Family Stores - Cassandra, HBase
    38. Graph Databases - Neo4j, InfoGrid, InfiniteGraph
    39. RDBMSs Support ACID • Atomicity - a transaction is all or nothing • Consistency - a transaction only moves the database from one valid state to another (only valid data is written) • Isolation - concurrent transactions behave as if they ran one after another (serially) • Durability - once a transaction commits, its changes survive crashes and restarts
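Atomicity is easiest to see in a transaction that fails halfway. The sketch below uses Python's built-in `sqlite3` module (SQLite is an ACID relational database); the `account` table, the `balance >= 0` rule and the `transfer` helper are hypothetical. The transfer deliberately credits the destination first, so when the debit breaks the CHECK constraint, the rollback visibly undoes a write that had already succeeded.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # CHECK constraint = a consistency rule the database enforces on every write.
    conn.execute(
        "CREATE TABLE account (id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commit the seed rows
        conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    return conn


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction; False if it was rolled back."""
    try:
        # The connection context manager commits on success and rolls back on error.
        with conn:
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated CHECK (balance >= 0); the credit above was undone too.
        return False


conn = open_demo_db()
print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 30}
print(transfer(conn, "alice", "bob", 500), balances(conn))  # False {'alice': 70, 'bob': 30}
```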
    40. NoSQL and the CAP Theorem
    41. NoSQL and the CAP Theorem • Consistency - every client always sees the same view of the data • Availability - every client can always read and write • Partition tolerance - the system keeps working even when network failures split the nodes into groups that cannot reach each other. You can pick only two of the three...
    42. Visual Guide to NoSQL Systems
    43. NoSQL Complexity
    44. NoSQL Performance
    45. Job Trends
    46. Graph DB and Neo4j
    47. Graph DB: A graph database uses graph structures with nodes, edges, and properties to represent and store data.
    48. Graph DB Usage Areas • Recommendations • Business Intelligence • Social networking • MDM • System Management • Time series data • Product Catalogue • Web Analytics • Scientific Computing • Indexing your slow RDBMS
    49. Relational Databases are Graphs!
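One way to read "relational databases are graphs": every row is a node, and every foreign-key pair in a join table is an edge. A minimal sketch using Python's built-in `sqlite3`; the `person`/`knows` schema, the sample names and the helper functions are hypothetical.

```python
import sqlite3
from collections import defaultdict


def open_demo_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical person/knows schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE knows (
            person_id INTEGER NOT NULL REFERENCES person(id),
            friend_id INTEGER NOT NULL REFERENCES person(id)
        );
        INSERT INTO person VALUES (1, 'Ali'), (2, 'Ayse'), (3, 'Mehmet');
        INSERT INTO knows VALUES (1, 2), (2, 3);
    """)
    return conn


def load_graph(conn: sqlite3.Connection) -> dict[str, set[str]]:
    """Read rows as nodes and foreign-key pairs as directed edges."""
    adjacency: dict[str, set[str]] = defaultdict(set)
    rows = conn.execute("""
        SELECT a.name, b.name
        FROM knows k
        JOIN person a ON a.id = k.person_id
        JOIN person b ON b.id = k.friend_id
    """)
    for source, target in rows:
        adjacency[source].add(target)
    return dict(adjacency)


print(load_graph(open_demo_db()))  # {'Ali': {'Ayse'}, 'Ayse': {'Mehmet'}}
```

In SQL, each extra hop (friend of a friend, and so on) needs one more self-join on `knows`; in the adjacency form, each hop is a dictionary lookup.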
    50. Neo4j • Leading graph database • Open source • Transaction support (ACID) • Traversal framework • High performance (traverses 1,000,000+ relationships per second) • Indexing • Querying • REST support • Robust (in 24/7 operation since 2003) • Disk based • Massive scalability
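A traversal starts at one node and follows relationships outward, which is how questions such as "friends of friends" are answered without joins. The sketch below is a plain breadth-first search with a depth limit over an adjacency-list dictionary; it only illustrates the idea and is not Neo4j's traversal framework API. The sample graph and the `within_hops` name are hypothetical.

```python
from collections import deque


def within_hops(graph: dict[str, list[str]], start: str, max_depth: int) -> dict[str, int]:
    """Map every node reachable from `start` in at most `max_depth` hops to its hop count."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # reached the depth limit: do not expand further
        for neighbour in graph.get(node, []):
            if neighbour not in depth:  # visit each node once, so cycles are safe
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


knows = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"]}
print(within_hops(knows, "A", 2))  # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 2}
```

Because the search is breadth-first, the recorded hop count is always the shortest one; node `F` is three hops away and is therefore excluded at depth 2.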
    51. Neo4j Data Model: Neo4j has nodes and relationships, and both nodes and relationships have properties. Example: Node1 (properties: name, surname) is connected to Node2 (properties: name, surname) by a relationship of type "knows" with the relationship property "date of meeting".
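The slide's example maps directly onto a property graph: two person nodes carrying `name` and `surname`, joined by a `KNOWS` relationship that carries its own date-of-meeting property. A minimal in-memory sketch using Python dataclasses; the classes, method names and sample values are hypothetical and are not Neo4j's Java API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Node:
    node_id: int
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    end: Node
    rel_type: str
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Graph:
    nodes: list[Node] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)

    def create_node(self, **properties: object) -> Node:
        node = Node(len(self.nodes), dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start: Node, end: Node, rel_type: str, **properties: object) -> Relationship:
        # Relationships are first-class records with a type and their own properties.
        rel = Relationship(start, end, rel_type, dict(properties))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node: Node, rel_type: str) -> list[Relationship]:
        return [r for r in self.relationships if r.start is node and r.rel_type == rel_type]


g = Graph()
node1 = g.create_node(name="Ali", surname="Yilmaz")
node2 = g.create_node(name="Ayse", surname="Demir")
g.relate(node1, node2, "KNOWS", date_of_meeting=date(2010, 5, 1))

for rel in g.outgoing(node1, "KNOWS"):
    print(rel.end.properties["name"], rel.properties["date_of_meeting"])  # Ayse 2010-05-01
```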
    52. Neo4j Performance: http://www.neotechnology.com/2012/10/20-billion-relationships-imported-into-neo4j-on-ec2/
    53. Who uses Neo4j? • Cisco - Master Data Management • Telenor Group - customer organization structure (203 million subscribers) • Deutsche Telekom - social football site (150 million subscribers)
    54. Cypher for Querying
    55. Sample Code
    56. Spring Data Neo4j
    57. Neoclipse
    58. Product Catalog
    59. Sample OM Data Model
    60. Hardware Calculating Tool
    61. Hardware Calculating Tool Result: calculation result for the production environment • 4 physical machines • 3 nodes on every machine • 1024 MHz CPU • 65536 MB RAM
    62. OrientDB • The document-graph database • ACID support • SQL and native queries • Schema-less, schema-full and schema-mixed modes • Roles + security • Functions • HTTP / RESTful / JSON / binary support • Hooks • Fetch plans • Inheritance • 200,000 inserts per second (6 M node traversals with cache)
    63. FluxGraph • Temporal graph database • Supports checkpoints • Compatible with Neo4j
    64. Examples for Telcos • CDR • Routing • Social graphs • Master Data Management • Spatial and LBS • Network topology analysis • Neo4j and Android
    65. CDR Analysis
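CDR analysis fits a graph naturally: each call detail record is an edge from caller to callee, and aggregating many records yields a weighted call graph. A minimal sketch, assuming a hypothetical three-field record (`caller`, `callee`, `duration_sec`); real CDR formats carry many more fields, and the function names are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class CDR(NamedTuple):
    # Hypothetical minimal record layout; real CDRs have many more fields.
    caller: str
    callee: str
    duration_sec: int


def build_call_graph(cdrs: Iterable[CDR]) -> dict[str, Counter]:
    """Aggregate CDRs into a directed graph: caller -> {callee: total talk seconds}."""
    graph: dict[str, Counter] = defaultdict(Counter)
    for cdr in cdrs:
        graph[cdr.caller][cdr.callee] += cdr.duration_sec
    return dict(graph)


def top_contacts(graph: dict[str, Counter], subscriber: str, n: int = 3) -> list[str]:
    """The n numbers the subscriber talks to longest (ties keep first-seen order)."""
    return [callee for callee, _ in graph.get(subscriber, Counter()).most_common(n)]


cdrs = [CDR("A", "B", 120), CDR("A", "C", 30), CDR("A", "B", 60), CDR("B", "A", 45)]
graph = build_call_graph(cdrs)
print(top_contacts(graph, "A"))  # ['B', 'C']
```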
    66. Master Data Management
    67. Network Management
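For network management, a typical graph question is impact analysis: if one element fails, which elements depend on it directly or indirectly? A minimal sketch, assuming a hypothetical topology in which each element lists the elements it depends on and any single failed dependency takes an element down; the element names and the `impacted_by` function are illustrative.

```python
def impacted_by(depends_on: dict[str, list[str]], failed: str) -> set[str]:
    """Return every element that directly or indirectly depends on `failed`."""
    # Invert "X depends on Y" so we can walk from the failed element
    # to everything that relies on it.
    dependants: dict[str, list[str]] = {}
    for element, requirements in depends_on.items():
        for required in requirements:
            dependants.setdefault(required, []).append(element)

    impacted: set[str] = set()
    stack = [failed]
    while stack:  # iterative depth-first search over the inverted edges
        current = stack.pop()
        for element in dependants.get(current, []):
            if element not in impacted:
                impacted.add(element)
                stack.append(element)
    return impacted


topology = {
    "cell-1": ["bts-1"], "cell-2": ["bts-1"], "cell-3": ["bts-2"],
    "bts-1": ["bsc-1"], "bts-2": ["bsc-1"],
}
print(sorted(impacted_by(topology, "bts-1")))  # ['cell-1', 'cell-2']
```

Elements with redundant paths (for example, dual-homed links) would need an "all dependencies failed" rule instead of the single-failure rule assumed here.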
    68. Cell Network Analysis
    69. Sample Scenarios • Customer-specific campaigns • Prevent churn • Get more customers • Special offers for campaigns
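The "Prevent churn" scenario can be framed as a graph question. One simple heuristic, an assumption here rather than the presentation's method, is to flag active subscribers who spend most of their talk time with contacts who have already churned. The input shape (talk seconds per caller/callee pair), the `churn_risk` name and the 0.5 threshold are all hypothetical.

```python
def churn_risk(
    call_seconds: dict[str, dict[str, int]],
    churned: set[str],
    threshold: float = 0.5,
) -> dict[str, float]:
    """Return {subscriber: share of talk time with churned contacts} for at-risk subscribers.

    A subscriber is at risk when the share is >= threshold (hypothetical rule).
    """
    at_risk: dict[str, float] = {}
    for subscriber, contacts in call_seconds.items():
        if subscriber in churned:
            continue  # already lost; nothing left to prevent
        total = sum(contacts.values())
        if total == 0:
            continue  # no call history, no signal
        lost = sum(secs for contact, secs in contacts.items() if contact in churned)
        share = lost / total
        if share >= threshold:
            at_risk[subscriber] = share
    return at_risk


calls = {"A": {"B": 300, "C": 100}, "D": {"B": 50, "E": 450}, "B": {"A": 200}}
print(churn_risk(calls, churned={"B"}))  # {'A': 0.75}
```

Subscribers flagged this way could then be candidates for the customer-specific campaigns listed above; the threshold would need tuning against real churn data.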
    70. Thanks
