Co-existence or Competition? - RDBMS and Hadoop


Speaker notes:
  • No centralized control: data redundancy, data inconsistency, data cannot be shared, standards cannot be enforced, security issues, integrity cannot be maintained, data dependence. Centralized control (a DBMS): minimal data redundancy, data consistency, data can be shared, standards can be enforced, security can be enforced, integrity can be maintained, data independence.
  • Can all the data be structured? Will we be able to store all the data in tables, i.e., can we model all the data? Should we discard the raw data after extracting the required structured data from the log files, or should we archive it?
  • Take the example of students using the facilities provided by a college.
  • Two core components: HDFS and Map-Reduce. Machines are unreliable. Hadoop separates distributed, fault-tolerant computing code from application logic. There is no need to worry about the identity of a machine: you interact with a cluster, not a bunch of machines. Analysis workloads span multiple machines. Hadoop runs as a cloud (cluster), and possibly on a cloud (e.g., Amazon EC2).
  • The consumer is interested in social networking and online purchasing/booking. The service provider is interested in data, advertisements or revenue generation, and reporting for internal housekeeping. Challenge: recommendation, i.e., publishing the advertisements that the consumer treats as information or is interested in.

1. RDBMS and Hadoop - Co-existence or Competition?
   Ram Mohan
   Copyright © 2011 Flytxt B.V. All rights reserved. 1/16/2012
2. Session Agenda
   • Introduction to RDBMS
   • What are Hadoop and Map-Reduce?
   • Hadoop and RDBMS – A comparison
   • Co-existence – A practical example: Master Website
   • Q&A
3. Relational DBMS
   • Based on the relational model (set theory and first-order predicate logic)
   • Data is represented as rows and columns of tables
   • Relational terminology
     ◦ Tuple (row)
     ◦ Attribute (column)
     ◦ Relation (table)
   • Integrity constraints
     ◦ Primary key
     ◦ Foreign key
     ◦ Alternate key
   • ACID properties of transactions
     ◦ Atomicity
     ◦ Consistency
     ◦ Isolation
     ◦ Durability
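The constraints and the ACID properties listed above can be seen in a minimal sketch using Python's built-in `sqlite3` module, with SQLite standing in for any RDBMS. The `customer` and `account` tables, their columns, and the transfer scenario are hypothetical, chosen only for illustration. The sketch shows the database rejecting a duplicate primary key, a duplicate alternate key, and a dangling foreign key, and rolling back a two-step update as one unit when the second step violates a constraint.

```python
import sqlite3

# In-memory database; SQLite enforces foreign keys only when this pragma is on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema for illustration.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,          -- primary key
    email       TEXT NOT NULL UNIQUE          -- alternate (candidate) key
);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
    balance     INTEGER NOT NULL CHECK (balance >= 0)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'smith@example.com')")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", [(10, 1, 100), (11, 1, 0)])
conn.commit()

def rejected(sql):
    """Return True if the statement violates a constraint and is refused."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_pk = rejected("INSERT INTO customer VALUES (1, 'jones@example.com')")
dup_ak = rejected("INSERT INTO customer VALUES (2, 'smith@example.com')")
orphan_fk = rejected("INSERT INTO account VALUES (12, 99, 50)")  # no customer 99

def transfer(src, dst, amount):
    """Atomicity: both updates commit together, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE account_id = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE account_id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

# Debiting 500 from account 10 would make it negative, so the CHECK fails
# and the credit already applied to account 11 is rolled back as well.
ok = transfer(10, 11, 500)
balances = dict(conn.execute("SELECT account_id, balance FROM account"))
print(dup_pk, dup_ak, orphan_fk, ok, balances)
```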
4. Normalization
   • Normalization: the process of removing data redundancy by decomposing the relations in a database.
   • Denormalization: deliberately introducing redundancy to improve query performance.
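A minimal pure-Python sketch of normalization, assuming a hypothetical denormalized table built by copying each supplier's name and city from the example data onto every one of its shipment rows. Decomposing on the dependency S# → (SNAME, CITY) stores each supplier fact once. Joining the pieces back (denormalizing) restores the original rows, which confirms that the decomposition loses nothing. The function names are hypothetical.

```python
# Hypothetical denormalized table: supplier name and city repeated on every shipment row.
flat = [
    ("S1", "Smith", "London", "P1", 300),
    ("S1", "Smith", "London", "P2", 200),
    ("S1", "Smith", "London", "P3", 400),
    ("S2", "Jones", "Paris",  "P1", 300),
    ("S2", "Jones", "Paris",  "P2", 400),
    ("S3", "Blake", "Paris",  "P2", 200),
]

def normalize(rows):
    """Decompose on the functional dependency S# -> (SNAME, CITY)."""
    suppliers = {}   # S# -> (SNAME, CITY), each supplier fact stored once
    shipments = []   # (S#, P#, QTY)
    for s, name, city, p, qty in rows:
        # A repeated supplier must carry identical facts; a mismatch is the kind
        # of inconsistency that redundancy makes possible.
        if suppliers.setdefault(s, (name, city)) != (name, city):
            raise ValueError(f"inconsistent supplier data for {s}")
        shipments.append((s, p, qty))
    return suppliers, shipments

def denormalize(suppliers, shipments):
    """Join the relations back: redundancy returns, traded for read speed."""
    return [(s, *suppliers[s], p, qty) for s, p, qty in shipments]

suppliers, shipments = normalize(flat)
print(f"{len(suppliers)} supplier facts stored once, instead of once per row ({len(flat)} rows)")
assert denormalize(suppliers, shipments) == flat  # the decomposition is lossless
```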
5. Relational DBMS
6. Example Data

   Suppliers
   S#   SNAME   STATUS   CITY
   S1   Smith   20       London
   S2   Jones   10       Paris
   S3   Blake   30       Paris

   Parts
   P#   PNAME   COLOR   WEIGHT   CITY
   P1   Nut     Red     12       London
   P2   Bolt    Green   17       Paris
   P3   Screw   Blue    17       Rome
   P4   Screw   Red     14       London

   Shipments
   S#   P#   QTY
   S1   P1   300
   S1   P2   200
   S1   P3   400
   S2   P1   300
   S2   P2   400
   S3   P2   200
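The example data can be loaded and queried with Python's built-in `sqlite3` module. This is a sketch: the table names `s`, `p`, `sp` and the columns `sno` and `pno` are hypothetical renamings of S# and P#, because `#` is not valid in an unquoted SQL identifier, and the two queries are illustrative choices rather than ones taken from the slides. The first query joins all three relations. The second uses an outer join so that a part with no shipments (P4) still appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s  (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT);
CREATE TABLE p  (pno TEXT PRIMARY KEY, pname TEXT, color TEXT, weight INTEGER, city TEXT);
CREATE TABLE sp (sno TEXT REFERENCES s(sno),
                 pno TEXT REFERENCES p(pno),
                 qty INTEGER,
                 PRIMARY KEY (sno, pno));
""")
conn.executemany("INSERT INTO s VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"), ("S3", "Blake", 30, "Paris")])
conn.executemany("INSERT INTO p VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London")])
conn.executemany("INSERT INTO sp VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200)])

# Names of suppliers who ship at least one red part (three-way join).
red_suppliers = [row[0] for row in conn.execute("""
    SELECT DISTINCT s.sname
    FROM s JOIN sp ON sp.sno = s.sno
           JOIN p  ON p.pno  = sp.pno
    WHERE p.color = 'Red'
    ORDER BY s.sname""")]

# Total quantity shipped per part; LEFT JOIN keeps parts with no shipments.
totals = dict(conn.execute("""
    SELECT p.pno, COALESCE(SUM(sp.qty), 0)
    FROM p LEFT JOIN sp ON sp.pno = p.pno
    GROUP BY p.pno"""))

print(red_suppliers, totals)
```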
7. Five computers & a 640k ;-)
   • "I think there is a world market for about five computers." Thomas Watson, 1943, Chairman of the board of IBM
   • "640k ought to be enough for anybody." Attributed to Bill Gates, 1981
   • Moore's Law
8. The Big Data Challenges
   • The sources of data, and the amount of data to analyze, are growing exponentially
   • Data goes stale because data warehouse (DW) solutions cannot ingest vast amounts of data fast enough
   • Performance is lacking for advanced analytics and complex queries
   • The number of users, and their concurrency, is increasing rapidly
9. Hadoop Architecture
10. Hadoop – HDFS (Hadoop Distributed File System)
   • Reliably stores petabytes of replicated data across thousands of nodes
     ◦ Data is divided into 64 MB blocks, and each block is replicated three times (both values are configurable)
   • Master/slave architecture
     ◦ The master NameNode holds the file system metadata, including block locations
     ◦ Slave DataNodes manage the blocks on their local file systems
   • Built on commodity hardware
     ◦ No RAID required
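The block and replication figures on the slide translate into simple storage arithmetic. This is a sketch, assuming the slide's 64 MB block size and replication factor of 3 (both configurable per cluster and per file); the function name `hdfs_footprint` is hypothetical. It relies on the fact that HDFS does not pad a file's last block, so raw disk use is the file size times the replication factor, not the block count times the block size.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB   # block size quoted on the slide (configurable)
REPLICATION = 3        # replication factor quoted on the slide (configurable)

def hdfs_footprint(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (blocks, block replicas, raw bytes on disk) for a file of file_size bytes."""
    blocks = -(-file_size // block_size)   # ceiling division; an empty file has 0 blocks
    replicas = blocks * replication        # copies placed on different DataNodes
    raw_bytes = file_size * replication    # the last block is not padded to full size
    return blocks, replicas, raw_bytes

for size_mb in (200, 1024):
    blocks, replicas, raw = hdfs_footprint(size_mb * MB)
    print(f"{size_mb} MB file: {blocks} blocks, {replicas} replicas, {raw // MB} MB raw")
```

For example, a 200 MB file becomes 4 blocks (three full 64 MB blocks plus one 8 MB block), 12 block replicas spread across DataNodes, and 600 MB of raw storage.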
12. Map-Reduce Model
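The Map-Reduce model can be illustrated with the classic word-count job, simulated in a single Python process. This sketches the model only and is not Hadoop's Java API. In Hadoop, the map and reduce functions run in parallel on many nodes, and the framework performs the shuffle (grouping intermediate values by key) across the network. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all the values seen for one key."""
    return key, sum(values)

def run_job(records):
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    # Keys reach reducers in sorted order, mirroring Hadoop's sort before reduce.
    return dict(reduce_phase(k, vs) for k, vs in sorted(shuffle(intermediate).items()))

counts = run_job(["Hadoop and RDBMS", "RDBMS or Hadoop", "co-existence"])
print(counts)  # {'and': 1, 'co-existence': 1, 'hadoop': 2, 'or': 1, 'rdbms': 2}
```

Because each mapper sees only its own record and each reducer sees only one key's values, the same two functions can be spread across any number of machines without changing them.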
13. Hadoop – Limitations
   • Not intended for real-time querying
   • Does not support efficient random access (HDFS files are written once and read mostly sequentially)
   • Significant learning curve
   • Provides bare-bones functionality out of the box, but scaling is built in and inexpensive
14. Where SQL Makes Life Easy
   • Joining
     ◦ In a single query, get all the products in an order together with their product information
   • Secondary indexing
     ◦ Get a CustomerId by e-mail
   • Referential integrity
   • Real-time analysis
   • Millions of people are trained in SQL and relational data modelling
   • An RDBMS provides tremendous functionality, but is extremely difficult and costly to scale
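Joining and secondary indexing, the first two points above, can be sketched with Python's built-in `sqlite3` module. The schema (`customer`, `product`, `orders`, `order_item`), its sample rows, and the index name `idx_customer_email` are hypothetical. `EXPLAIN QUERY PLAN` is used only to confirm that the e-mail lookup goes through the secondary index rather than scanning the table; its exact wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer   (customer_id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE product    (product_id  INTEGER PRIMARY KEY, name TEXT, price_cents INTEGER);
CREATE TABLE orders     (order_id    INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(customer_id));
CREATE TABLE order_item (order_id    INTEGER REFERENCES orders(order_id),
                         product_id  INTEGER REFERENCES product(product_id),
                         quantity    INTEGER,
                         PRIMARY KEY (order_id, product_id));
-- Secondary index: a lookup by e-mail no longer needs a full table scan.
CREATE INDEX idx_customer_email ON customer(email);

INSERT INTO customer   VALUES (1, 'smith@example.com'), (2, 'jones@example.com');
INSERT INTO product    VALUES (1, 'Nut', 5), (2, 'Bolt', 12), (3, 'Screw', 8);
INSERT INTO orders     VALUES (100, 2);
INSERT INTO order_item VALUES (100, 1, 50), (100, 3, 20);
""")

def customer_id_by_email(email):
    """Secondary-index lookup: CustomerId by e-mail, or None if unknown."""
    row = conn.execute("SELECT customer_id FROM customer WHERE email = ?", (email,)).fetchone()
    return row[0] if row else None

def order_lines(order_id):
    """All products in one order with their product information, in a single query."""
    return conn.execute("""
        SELECT p.name, p.price_cents, oi.quantity
        FROM order_item AS oi JOIN product AS p ON p.product_id = oi.product_id
        WHERE oi.order_id = ?
        ORDER BY p.name""", (order_id,)).fetchall()

# The fourth column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(r[3] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT customer_id FROM customer WHERE email = ?", ("x",)))

print(customer_id_by_email("jones@example.com"), order_lines(100))
print(plan)
```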
15. Master Website – A Practical Example
16. Master Website – RDBMS Use Cases
   • Profile information provided during sign-up
   • Intelligence generated, i.e., the output of the analytic jobs
   • Online purchase records and account management
   • Reporting tools
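Storing the intelligence generated by analytic jobs in the RDBMS is where the two systems meet: a batch job produces aggregates, and the relational side serves them to reporting tools. This is a sketch using Python's `sqlite3`, assuming the job writes one `key<TAB>value` line per record (the layout of Hadoop's default `TextOutputFormat`). The table name, the interests, and the counts are hypothetical illustration values, not real results.

```python
import sqlite3

# Hypothetical job output: one "key<TAB>value" line per record (interest -> user count).
job_output = """sports\t1200
music\t950
travel\t430
"""

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE interest_summary (
                    interest   TEXT PRIMARY KEY,
                    user_count INTEGER NOT NULL)""")

def load_job_output(text):
    """Replace the reporting table's contents with the latest job output."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split("\t", 1)
        rows.append((key, int(value)))
    with conn:  # one transaction: readers never see a half-loaded table
        conn.execute("DELETE FROM interest_summary")
        conn.executemany("INSERT INTO interest_summary VALUES (?, ?)", rows)
    return len(rows)

loaded = load_job_output(job_output)

# Reporting side: ordinary SQL over the aggregated results.
top = conn.execute("""SELECT interest, user_count FROM interest_summary
                      ORDER BY user_count DESC LIMIT 2""").fetchall()
print(loaded, top)
```

Re-running the load after each batch job keeps the table current without the reporting tools needing to know anything about Hadoop.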
17. Master Website – Hadoop Use Cases
   • Generating intelligence from the continuous stream of data
     ◦ Wall posts on Facebook
   • Adding new tags, based on the old logs available, when new requirements arise
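The second Hadoop use case depends on keeping the raw logs rather than only the fields extracted from them at first. When a new tagging requirement appears, the whole archive can be re-scanned. This is a single-process sketch of that re-scan, assuming a hypothetical `user_id<TAB>post text` log format and hypothetical keyword-based tag rules. On a cluster, `tag_post` would play the role of the map function and the counting would happen in the reduce step.

```python
from collections import Counter

# Hypothetical archived raw log: "user_id<TAB>wall post text".
archived_log = [
    "u1\tWatching the cricket final tonight",
    "u2\tBooked flights to Paris!",
    "u1\tNew running shoes for the marathon",
    "u3\tCricket practice was rained out",
]

# Hypothetical new requirement: tag users by interest using keyword rules.
NEW_TAG_RULES = {
    "sports": {"cricket", "marathon", "running"},
    "travel": {"flights", "paris", "hotel"},
}

def tag_post(text, rules):
    """Map step: return the set of tags whose keywords appear in one post."""
    words = {w.strip("!.,?").lower() for w in text.split()}
    return {tag for tag, keywords in rules.items() if words & keywords}

def retag(log_lines, rules):
    """Re-scan the whole archive and count (user, tag) pairs, as a batch job would."""
    counts = Counter()
    for line in log_lines:
        user, text = line.split("\t", 1)
        for tag in tag_post(text, rules):
            counts[(user, tag)] += 1
    return counts

print(retag(archived_log, NEW_TAG_RULES))
```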
18. A Practical Example – Facebook Architecture
19. THANK YOU