Co-existence or competition - RDBMS and Hadoop

Harnessing Hadoop for Big Data - Series III - Presentation on 'Co-existence or Competition - RDBMS and Hadoop'

Published in: Technology
  1. RDBMS and Hadoop - Co-existence or competition
     Ram Mohan
     Copyright © 2011 Flytxt B.V. All rights reserved. 1/16/2012
  2. Session Agenda!
     • Introduction to RDBMS
     • What is Hadoop and Map-Reduce
     • Hadoop and RDBMS – A comparison
     • Co-Existence – Practical Example - Master Website
     • Q&A
  3. Relational DBMS
     • Based on relational mathematics principles
     • Data is represented as rows and columns of a table
     • Relational Terminology
       ◦ Tuple (Row)
       ◦ Attribute (Column)
       ◦ Relation (Table)
     • Integrity Constraints
       ◦ Primary Key
       ◦ Foreign Key
       ◦ Alternate Key
     • ACID Test
       ◦ Atomicity
       ◦ Consistency
       ◦ Isolation
       ◦ Durability
  4. Normalization
     • Normalization: the process of removing data redundancy by decomposing relations in a database.
     • Denormalization: carefully introduced redundancy to improve query performance.
  5. Relational DBMS
  6. Example Data

     S#   SNAME   STATUS   CITY
     S1   Smith   20       London
     S2   Jones   10       Paris
     S3   Blake   30       Paris

     P#   PNAME   COLOR   WEIGHT   CITY
     P1   Nut     Red     12       London
     P2   Bolt    Green   17       Paris
     P3   Screw   Blue    17       Rome
     P4   Screw   Red     14       London

     S#   P#   QTY
     S1   P1   300
     S1   P2   200
     S1   P3   400
     S2   P1   300
     S2   P2   400
     S3   P2   200

  7. Five computers & a 640k ;-)
     • "I think there is a world market for about five computers." Thomas Watson, Chairman of the board of IBM, 1943
     • "640k ought to be enough for anybody." Attributed to Bill Gates in 1981.
     • Moore’s Law
  8. The Big Data Challenges
     • The number of data sources and the amount of data to analyze are growing exponentially
     • Stale data exists because DW solutions cannot ingest the vast amounts of data fast enough
     • Lack of performance for advanced analytics and complex queries
     • The number of users and their concurrency are increasing rapidly
  9. Hadoop Architecture
  10. Hadoop – HDFS (Hadoop Distributed File System)
      • Reliably stores petabytes of replicated data across thousands of nodes
        ◦ Data is divided into 64 MB blocks, and each block is replicated three times
      • Master/Slave architecture
        ◦ Master NameNode contains block locations
        ◦ Slave DataNode manages blocks on the local FS
      • Built on local commodity hardware
        ◦ No RAID required
  12. Map-Reduce Model
  13. Hadoop – Limitations
      • Is not intended for real-time querying
      • Does not support random access
      • Significant learning curve
      • Provides barebones functionality out of the box, but scaling is built in and inexpensive
  14. Where SQL Makes Life Easy
      • Joining
        ◦ In a single query, get all products in an order with their product information
      • Secondary Indexing
        ◦ Get CustomerId by e-mail
      • Referential Integrity
      • Real-time Analysis
      • Millions are trained in SQL and relational data modelling
      • RDBMS provides tremendous functionality, but is extremely difficult and costly to scale
  15. Master Website – A Practical Example
  16. Master Website – RDBMS Use Cases
      • Profile information provided during sign-up
      • Intelligence generated, i.e. the output of the analytic jobs
      • Online purchasing track records and account management
      • Reporting tools
  17. Master Website – Hadoop Use Cases
      • Generating intelligence from the continuous stream of data
        ◦ Wall posts on Facebook
      • New tags to be added, based on the old logs available, due to new requirements
  18. A Practical Example – Facebook Architecture
  19. THANK YOU
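
The points on slide 14 (joining, secondary indexing, referential integrity) can be tried against the example data from slide 6. The following is a minimal sketch using Python's built-in sqlite3 module. The table names (suppliers, parts, shipments), the column names, the index name and the helper functions are assumptions made for illustration; the slides only give the column headings S#, P# and QTY. The lookup by supplier name stands in for the "Get CustomerId by e-mail" case.

```python
import sqlite3

# Table and column names are illustrative; the rows come from slide 6.
SCHEMA = """
CREATE TABLE suppliers (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT);
CREATE TABLE parts (pno TEXT PRIMARY KEY, pname TEXT, color TEXT, weight INTEGER, city TEXT);
CREATE TABLE shipments (
    sno TEXT REFERENCES suppliers(sno),
    pno TEXT REFERENCES parts(pno),
    qty INTEGER,
    PRIMARY KEY (sno, pno)
);
-- Secondary index: look rows up by a non-key column.
CREATE INDEX idx_suppliers_sname ON suppliers(sname);
"""

SUPPLIERS = [("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
             ("S3", "Blake", 30, "Paris")]
PARTS = [("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
         ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London")]
SHIPMENTS = [("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
             ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200)]


def build_db():
    """Create an in-memory database loaded with the slide 6 data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO suppliers VALUES (?, ?, ?, ?)", SUPPLIERS)
    conn.executemany("INSERT INTO parts VALUES (?, ?, ?, ?, ?)", PARTS)
    conn.executemany("INSERT INTO shipments VALUES (?, ?, ?)", SHIPMENTS)
    conn.commit()
    return conn


def parts_supplied_by(conn, sno):
    """Joining: one query returns each shipped part with its part information."""
    return conn.execute(
        """SELECT p.pno, p.pname, p.color, sp.qty
           FROM shipments AS sp JOIN parts AS p ON p.pno = sp.pno
           WHERE sp.sno = ? ORDER BY p.pno""",
        (sno,),
    ).fetchall()


def supplier_number_by_name(conn, sname):
    """Secondary indexing: find the key by a non-key attribute."""
    row = conn.execute("SELECT sno FROM suppliers WHERE sname = ?", (sname,)).fetchone()
    return row[0] if row else None


def insert_shipment(conn, sno, pno, qty):
    """Referential integrity: a shipment for an unknown part is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO shipments VALUES (?, ?, ?)", (sno, pno, qty))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_db()
    print(parts_supplied_by(db, "S1"))
    print(supplier_number_by_name(db, "Jones"))
    print(insert_shipment(db, "S3", "P9", 100))  # P9 is not in the parts table
```

Doing the same work on raw files in Hadoop would mean writing the join and the lookup by hand, which is the contrast slides 13 and 14 draw.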

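
The HDFS figures on slide 10 (64 MB blocks, each replicated three times) translate into simple arithmetic. The sketch below is illustrative only: the function name and the returned keys are made up for this example, and it assumes a final partial block occupies only its actual size, so raw usage is the file size times the replication factor.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB block size, as stated on slide 10
REPLICATION = 3                # each block replicated three times, per slide 10


def hdfs_footprint(file_size_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Estimate block count, replica count and raw bytes for one file.

    Assumes the last, partially filled block takes only its real size on disk.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    blocks = -(-file_size_bytes // block_size)  # ceiling division on integers
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        "raw_bytes": file_size_bytes * replication,
    }


if __name__ == "__main__":
    one_gib = 1024 * 1024 * 1024
    print(hdfs_footprint(one_gib))
```

With these numbers a 1 GiB file splits into 16 blocks, stored as 48 block replicas that take 3 GiB of raw disk across the cluster.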
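
Slide 12 shows the Map-Reduce model only as a diagram, so here is a rough single-process sketch of the three phases (map, shuffle, reduce) as a word count. It is plain Python, not the Hadoop API; every function name and the sample records are hypothetical. In a real cluster the map and reduce calls would run in parallel on many nodes, and the framework would perform the shuffle.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["hadoop stores data", "rdbms stores data", "Hadoop scales"]
    print(run_job(sample))
```

The same shape fits the use case on slide 17: the records would be log lines or wall posts, and the map step would emit tags instead of words.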