Co-existence or Competition? - RDBMS and Hadoop
RDBMS and Hadoop - Co-existence or Competition?

  • No centralized control:
      ◦ Data redundancy
      ◦ Data inconsistency
      ◦ Data cannot be shared
      ◦ Standards cannot be enforced
      ◦ Security issues
      ◦ Integrity cannot be maintained
      ◦ Data dependence
    Centralized control:
      ◦ No data redundancy
      ◦ Data consistency
      ◦ Data can be shared
      ◦ Standards can be enforced
      ◦ Security can be enforced
      ◦ Integrity can be maintained
      ◦ Data independence
  • Can all the data be structured? Will we be able to store all the data in tables, i.e. can we model all the data? Should we discard the data after extracting the required structured data from the log files, or should we archive it?
  • Take the example of students using the facilities provided by a college.
  • Two core components: HDFS and Map-Reduce.
      ◦ Machines are unreliable.
      ◦ Separates distributed, fault-tolerant computing code from application logic.
      ◦ No need to worry about the identity of a machine; lets you interact with a cluster, not a bunch of machines.
      ◦ Analysis workloads span multiple machines.
      ◦ Runs as a cloud (cluster) and possibly on a cloud (EC2).
  • Consumer is interested in:
      ◦ Social networking
      ◦ Online purchasing/booking
    Service provider is interested in:
      ◦ Data
      ◦ Advertisements or revenue generation
      ◦ Reporting – for internal housekeeping
    Challenges:
      ◦ Recommendation – publishing the advertisements that the consumer treats as useful information or is interested in.

Transcript

  • 1. RDBMS and Hadoop - Co-existence or competition
    Ram Mohan
    Copyright © 2011 Flytxt B.V. All rights reserved 1/16/2012
  • 2. Session Agenda
      ◦ Introduction to RDBMS
      ◦ What is Hadoop and Map-Reduce
      ◦ Hadoop and RDBMS – A comparison
      ◦ Co-Existence – Practical Example - Master Website
      ◦ Q&A
  • 3. Relational DBMS Based on Relational Mathematics principles Data is represented in terms of rows and columns of a table Relational Terminology ◦ Tuple (Row) ◦ Attribute (Column) ◦ Relation (Table) Integrity Constraints ◦ Primary Key ◦ Foreign Key ◦ Alternate Key ACID Test ◦ Atomicity ◦ Consistency ◦ Isolation ◦ Durability Copyright © 2011 Flytxt B.V. All rights reserved 1/16/2012 3
  • 4. Normalization Normalization - process of removing data redundancy by decomposing relations in a Database. De normalization - carefully introduced redundancy to improve query performance. Copyright © 2011 Flytxt B.V. All rights reserved 1/16/2012 4
  • 5. Relational DBMS
  • 6. Example Data

    S#  SNAME  STATUS  CITY
    S1  Smith  20      London
    S2  Jones  10      Paris
    S3  Blake  30      Paris

    P#  PNAME  COLOR  WEIGHT  CITY
    P1  Nut    Red    12      London
    P2  Bolt   Green  17      Paris
    P3  Screw  Blue   17      Rome
    P4  Screw  Red    14      London

    S#  P#  QTY
    S1  P1  300
    S1  P2  200
    S1  P3  400
    S2  P1  300
    S2  P2  400
    S3  P2  200
  • 7. Five computers & a 640k ;-)
    "I think there is a world market for about five computers"
    Thomas Watson, 1943, Chairman of the board of IBM
    "640k ought to be enough for anybody"
    Attributed to Bill Gates in 1981.
    Moore's Law
  • 8. The Big Data Challenges
      ◦ The number of data sources and the amount of data to analyze are growing exponentially.
      ◦ Stale data exists because data warehouse (DW) solutions cannot ingest the vast amounts of data fast enough.
      ◦ Lack of performance for advanced analytics and complex queries.
      ◦ The number of users and their concurrency are increasing rapidly.
  • 9. Hadoop Architecture
  • 10. Hadoop – HDFS (Hadoop Distributed File System)
    Reliably stores petabytes of replicated data across thousands of nodes
      ◦ Data is divided into 64 MB blocks, and each block is replicated three times
    Master/Slave architecture
      ◦ The master NameNode contains block locations
      ◦ Each slave DataNode manages blocks on its local FS
    Built on local commodity hardware
      ◦ No RAID required
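The block and replication figures on the HDFS slide lend themselves to a quick back-of-the-envelope calculation. The sketch below uses the 64 MB block size and threefold replication from the slide; the function name `hdfs_footprint` and its return keys are made up for illustration, and the calculation ignores NameNode metadata and other overhead.

```python
import math

BLOCK_SIZE_MB = 64  # block size stated on the slide
REPLICATION = 3     # each block is replicated three times


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Estimate how a single file is laid out across the cluster."""
    # The last block may be partially filled, hence the ceiling.
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,                              # logical blocks tracked by the NameNode
        "block_replicas": blocks * replication,        # physical copies held by DataNodes
        "raw_storage_mb": file_size_mb * replication,  # disk consumed across the cluster
    }


print(hdfs_footprint(1024))
# {'blocks': 16, 'block_replicas': 48, 'raw_storage_mb': 3072}
```

A 1 GB file therefore occupies 16 blocks and roughly 3 GB of raw disk, which is the price paid for tolerating machine failures without RAID.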
  • 12. Map-Reduce Model
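The transcript keeps only the title of the Map-Reduce slide, so the following is a single-process sketch of the general map, shuffle, and reduce phases using the usual word-count example. It is not taken from the slides and does not use the Hadoop API; all function names are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["hadoop stores data", "hadoop processes data"])
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; the application author writes only the two small functions.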
  • 13. Hadoop – Limitations
      ◦ Is not intended for real-time querying.
      ◦ Does not support random access.
      ◦ Significant learning curve.
    Provides bare-bones functionality out of the box, but scaling is built in and inexpensive.
  • 14. Where SQL Makes Life Easy
    Joining
      ◦ In a single query, get all products in an order with their product information
    Secondary indexing
      ◦ Get CustomerId by e-mail
    Referential integrity
    Real-time analysis
    Millions are trained in SQL and relational data modelling.
    RDBMS provides tremendous functionality, but is extremely difficult and costly to scale.
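The three conveniences listed above (joining, secondary indexing, referential integrity) can be tried out on the example data from slide 6 with Python's built-in sqlite3 module. This is a sketch under assumptions: the table names `supplier`, `part`, and `shipment`, the column names `s_no` and `p_no`, and the index name are not in the slides, and the lookup by supplier name stands in for the slide's "Get CustomerId by e-mail".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE supplier (s_no TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT);
CREATE TABLE part (p_no TEXT PRIMARY KEY, pname TEXT, color TEXT, weight INTEGER, city TEXT);
CREATE TABLE shipment (
    s_no TEXT REFERENCES supplier(s_no),
    p_no TEXT REFERENCES part(p_no),
    qty  INTEGER,
    PRIMARY KEY (s_no, p_no)
);
CREATE INDEX idx_supplier_sname ON supplier(sname);  -- secondary index
""")
conn.executemany("INSERT INTO supplier VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"), ("S3", "Blake", 30, "Paris"),
])
conn.executemany("INSERT INTO part VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London"),
])
conn.executemany("INSERT INTO shipment VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
])
conn.commit()

# Joining: one query returns every part shipped by S1 with its part details.
parts_for_s1 = conn.execute(
    "SELECT p.p_no, p.pname, p.color, sh.qty"
    " FROM shipment AS sh JOIN part AS p ON p.p_no = sh.p_no"
    " WHERE sh.s_no = ? ORDER BY p.p_no",
    ("S1",),
).fetchall()

# Secondary indexing: look up a key by a non-key column.
supplier_id = conn.execute(
    "SELECT s_no FROM supplier WHERE sname = ?", ("Jones",)
).fetchone()[0]

# Referential integrity: a shipment for an unknown supplier is rejected.
try:
    conn.execute("INSERT INTO shipment VALUES ('S9', 'P1', 100)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(parts_for_s1, supplier_id, rejected)
```

Each of these takes a line or two of SQL here, whereas on plain Hadoop the equivalent join, lookup, or constraint check would have to be written as application code.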
  • 15. Master Website – A Practical Example
  • 16. Master Website – RDBMS Use Cases
      ◦ Profile information provided during sign-up
      ◦ Intelligence generated, i.e. the output of the analytic jobs
      ◦ Online purchase records and account management
      ◦ Reporting tools
  • 17. Master Website – Hadoop Use Cases
    Generating intelligence from the continuous stream of data
      ◦ Wall posts on Facebook
    Adding new tags from the old logs available, to meet new requirements
  • 18. A Practical Example – Facebook Architecture
  • 19. THANK YOU