RDBMS and Hadoop - Co-existence or competition
Ram Mohan




            Copyright © 2011 Flytxt B.V. All rights reserved      1/16/2012
Session Agenda!
   • Introduction to RDBMS
   • What is Hadoop and Map-Reduce
   • Hadoop and RDBMS – A comparison
   • Co-Existence – Practical Example - Master Website
   • Q&A




Relational DBMS
   • Based on the mathematical relational model (set theory and predicate logic)
   • Data is represented as rows and columns of a table
   • Relational Terminology
    ◦ Tuple (Row)
    ◦ Attribute (Column)
    ◦ Relation (Table)
   • Integrity Constraints
    ◦ Primary Key
    ◦ Foreign Key
    ◦ Alternate Key
   • ACID Test
    ◦ Atomicity
    ◦ Consistency
    ◦ Isolation
    ◦ Durability
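
The constraints and the atomicity property listed above can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the customer and orders tables and the place_orders helper are hypothetical names chosen for illustration, not part of the original deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,       -- primary key
        email TEXT NOT NULL UNIQUE       -- alternate (candidate) key
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)  -- foreign key
    );
""")
conn.execute("INSERT INTO customer (id, email) VALUES (1, 'a@example.com')")
conn.commit()


def place_orders(conn, customer_ids):
    """Insert one order per customer id as a single all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for cid in customer_ids:
                conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (cid,))
        return True
    except sqlite3.IntegrityError:
        return False


ok = place_orders(conn, [1, 1])       # both rows reference an existing customer
failed = place_orders(conn, [1, 99])  # customer 99 does not exist, so the batch is rolled back
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(ok, failed, order_count)
```

The second call inserts one valid row before hitting the foreign-key violation; because the whole batch runs in one transaction, that valid row is rolled back as well.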




Normalization
   • Normalization - the process of removing data redundancy by decomposing
    relations in a database.
   • Denormalization - deliberately reintroducing redundancy to improve query
    performance.
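
As a rough illustration of the two definitions above, the sketch below decomposes a flat list of rows into two relations and then joins them back. The row values and the normalize/denormalize helper names are assumptions made for this example only.

```python
# Denormalized rows: the supplier's city is repeated on every shipment row.
flat = [
    {"supplier": "S1", "city": "London", "part": "P1", "qty": 300},
    {"supplier": "S1", "city": "London", "part": "P2", "qty": 200},
    {"supplier": "S2", "city": "Paris", "part": "P1", "qty": 300},
]


def normalize(rows):
    """Split the flat rows so that each supplier's city is stored exactly once."""
    suppliers = {}
    shipments = []
    for row in rows:
        suppliers[row["supplier"]] = row["city"]
        shipments.append((row["supplier"], row["part"], row["qty"]))
    return suppliers, shipments


def denormalize(suppliers, shipments):
    """Join the two relations back into flat rows, reintroducing the redundancy."""
    return [
        {"supplier": s, "city": suppliers[s], "part": p, "qty": q}
        for s, p, q in shipments
    ]


suppliers, shipments = normalize(flat)
rejoined = denormalize(suppliers, shipments)
print(suppliers)
print(rejoined == flat)
```

In the normalized form a change of city touches one entry; in the flat form it touches every shipment row for that supplier, but reads need no join.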




Relational DBMS




Example Data

Suppliers (S)
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris

Parts (P)
P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12       London
P2   Bolt    Green   17       Paris
P3   Screw   Blue    17       Rome
P4   Screw   Red     14       London

Shipments (SP)
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S2   P1   300
S2   P2   400
S3   P2   200
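
To see how the three tables relate, here is a minimal sketch that loads the example data into an in-memory SQLite database and joins them. The table names (supplier, part, shipment), the column names sno and pno, and the query itself are assumptions for illustration; the slide only gives the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT);
    CREATE TABLE part (pno TEXT PRIMARY KEY, pname TEXT, color TEXT, weight INTEGER, city TEXT);
    CREATE TABLE shipment (
        sno TEXT REFERENCES supplier(sno),
        pno TEXT REFERENCES part(pno),
        qty INTEGER,
        PRIMARY KEY (sno, pno)
    );
""")
conn.executemany("INSERT INTO supplier VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
])
conn.executemany("INSERT INTO part VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"),
    ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"),
    ("P4", "Screw", "Red", 14, "London"),
])
conn.executemany("INSERT INTO shipment VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
])
conn.commit()

# Which suppliers ship parts stored in Paris, and in what quantity?
rows = conn.execute("""
    SELECT s.sname, p.pname, sp.qty
    FROM shipment AS sp
    JOIN supplier AS s ON s.sno = sp.sno
    JOIN part AS p ON p.pno = sp.pno
    WHERE p.city = 'Paris'
    ORDER BY s.sname
""").fetchall()
print(rows)
```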


Five computers & a 640k ;-)

"I think there is a world market for about five computers"
      Thomas Watson, 1943, Chairman of the board of IBM

Moore’s Law

"640k ought to be enough for anybody"
      Attributed to Bill Gates in 1981.




The Big Data Challenges
   • The number of data sources and the amount of data to analyze are growing
    exponentially
   • Data goes stale because data warehouse (DW) solutions cannot ingest the vast
    amounts of data fast enough
   • Lack of performance for advanced analytics and complex queries
   • The number of users and their concurrency are increasing rapidly




Hadoop Architecture




Hadoop – HDFS (Hadoop Distributed File System)
   • Reliably stores petabytes of replicated data across thousands of nodes
    ◦ Data divided into 64 MB blocks, each block replicated three times
   • Master/Slave architecture
    ◦ Master NameNode contains block locations
    ◦ Slave DataNode manages blocks on local FS
   • Built on local commodity hardware
    ◦ No RAID required
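
The block and replication figures above translate into simple arithmetic. The sketch below is a back-of-the-envelope calculation, not an HDFS API; the hdfs_footprint function is a hypothetical helper, and it assumes the last block of a file occupies only as much disk as it actually holds.

```python
import math

BLOCK_SIZE_MB = 64   # block size from the slide
REPLICATION = 3      # replication factor from the slide


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (logical blocks, stored block replicas, raw MB consumed across the cluster)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    raw_mb = file_size_mb * replication  # assumes a partial last block is not padded
    return blocks, replicas, raw_mb


one_gb = hdfs_footprint(1024)
small = hdfs_footprint(100)
print(one_gb, small)
```

Under these assumptions a 1 GB file becomes 16 blocks and 48 stored replicas, which is why usable capacity is roughly a third of raw disk.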




Map-Reduce Model
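
The slide's diagram is not reproduced here. As a stand-in, the following is a single-process sketch of the map, shuffle and reduce phases using word count. It is plain Python, not the Hadoop API; all function names are chosen for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    # Sum the ones emitted for this word.
    return sum(counts)


lines = ["Hadoop stores data", "Hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
print(counts)
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; the data flow is the same.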




Hadoop – Limitations
   • Not intended for real-time querying
   • Does not support random access
   • Significant learning curve
   • Provides bare-bones functionality out of the box, but scaling is built in and
    inexpensive




Where SQL Makes Life Easy
   • Joining
    ◦ In a single query, get all products in an order with their product information
   • Secondary Indexing
    ◦ Get CustomerId by e-mail
   • Referential Integrity
   • Real-time Analysis
   • Millions are trained in SQL and relational data modelling
   • RDBMS provides tremendous functionality, but is extremely difficult and
    costly to scale
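
The "Get CustomerId by e-mail" case above can be sketched with a secondary index in SQLite. The customer table, the idx_customer_email index name and the customer_id_by_email helper are hypothetical names for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL
    );
    -- Secondary index: lets the engine find a row by e-mail without scanning the table.
    CREATE UNIQUE INDEX idx_customer_email ON customer (email);
""")
conn.executemany(
    "INSERT INTO customer (customer_id, email) VALUES (?, ?)",
    [(1, "ann@example.com"), (2, "bob@example.com"), (3, "cy@example.com")],
)
conn.commit()


def customer_id_by_email(conn, email):
    """Return the customer id for an e-mail address, or None if there is no match."""
    row = conn.execute(
        "SELECT customer_id FROM customer WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None


found = customer_id_by_email(conn, "bob@example.com")
missing = customer_id_by_email(conn, "nobody@example.com")

# Ask SQLite how it would run the lookup; the last column of each row describes the step.
plan = [row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT customer_id FROM customer WHERE email = ?", ("x",)
)]
print(found, missing, plan)
```

In plain HDFS files there is no such index, so the same lookup would mean scanning the data with a job.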




Master Website – A Practical Example




Master Website – RDBMS Use Cases
   • Profile information provided during sign-up
   • Generated intelligence, i.e. the output of the analytics jobs
   • Online purchase records and account management
   • Reporting tools




Master Website – Hadoop Use Cases
   • Generating intelligence from a continuous stream of data
    ◦ Wall posts on Facebook
   • Adding new tags derived from the old logs already available, when new
    requirements arise




A Practical Example – Facebook Architecture




THANK YOU





More Related Content

Similar to Co existence or Competitions? RDBMS and Hadoop

Run Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramRun Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramSkillspeed
 
HKG15-The Machine: A new kind of computer- Keynote by Dejan Milojicic
HKG15-The Machine: A new kind of computer- Keynote by Dejan MilojicicHKG15-The Machine: A new kind of computer- Keynote by Dejan Milojicic
HKG15-The Machine: A new kind of computer- Keynote by Dejan MilojicicLinaro
 
Idc Hpc Web Conf Predictions 2010 Final
Idc Hpc Web Conf Predictions 2010 FinalIdc Hpc Web Conf Predictions 2010 Final
Idc Hpc Web Conf Predictions 2010 FinalChris O'Neal
 
Dataiku & Snowflake Meetup Berlin 2020
Dataiku & Snowflake Meetup Berlin 2020Dataiku & Snowflake Meetup Berlin 2020
Dataiku & Snowflake Meetup Berlin 2020Harald Erb
 
Exploring the Next Wave of 10GbE with Crehan Research
Exploring the Next Wave of 10GbE with Crehan ResearchExploring the Next Wave of 10GbE with Crehan Research
Exploring the Next Wave of 10GbE with Crehan ResearchEmulex Corporation
 
The Future of Distributed Databases
The Future of Distributed DatabasesThe Future of Distributed Databases
The Future of Distributed DatabasesNuoDB
 
Scalability 09262012
Scalability 09262012Scalability 09262012
Scalability 09262012Mike Miller
 
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDBReal-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDBVoltDB
 
Dennis Wisnowsky Presentation
Dennis Wisnowsky PresentationDennis Wisnowsky Presentation
Dennis Wisnowsky PresentationMediabistro
 
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?MarketingArrowECS_CZ
 
Demystifying Modern PLM - Technology
Demystifying Modern PLM - TechnologyDemystifying Modern PLM - Technology
Demystifying Modern PLM - TechnologyOleg Shilovitsky
 
Demystifying Modern PLM Sessions. Part 1: Technology
Demystifying Modern PLM Sessions. Part 1: TechnologyDemystifying Modern PLM Sessions. Part 1: Technology
Demystifying Modern PLM Sessions. Part 1: TechnologyOleg Shilovitsky
 
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Sematext Group, Inc.
 
DDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your HardwareDDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your Hardwareinside-BigData.com
 
Design Choices for Cloud Data Platforms
Design Choices for Cloud Data PlatformsDesign Choices for Cloud Data Platforms
Design Choices for Cloud Data PlatformsAshish Mrig
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drillJulien Le Dem
 
Tcod a framework for the total cost of big data - december 6 2013 - winte...
Tcod   a framework for the total cost of big data  - december 6 2013  - winte...Tcod   a framework for the total cost of big data  - december 6 2013  - winte...
Tcod a framework for the total cost of big data - december 6 2013 - winte...Richard Winter
 
Machine Learning for z/OS
Machine Learning for z/OSMachine Learning for z/OS
Machine Learning for z/OSCuneyt Goksu
 

Similar to Co existence or Competitions? RDBMS and Hadoop (20)

Run Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramRun Your First Hadoop 2.x Program
Run Your First Hadoop 2.x Program
 
HKG15-The Machine: A new kind of computer- Keynote by Dejan Milojicic
HKG15-The Machine: A new kind of computer- Keynote by Dejan MilojicicHKG15-The Machine: A new kind of computer- Keynote by Dejan Milojicic
HKG15-The Machine: A new kind of computer- Keynote by Dejan Milojicic
 
Idc Hpc Web Conf Predictions 2010 Final
Idc Hpc Web Conf Predictions 2010 FinalIdc Hpc Web Conf Predictions 2010 Final
Idc Hpc Web Conf Predictions 2010 Final
 
Dataiku & Snowflake Meetup Berlin 2020
Dataiku & Snowflake Meetup Berlin 2020Dataiku & Snowflake Meetup Berlin 2020
Dataiku & Snowflake Meetup Berlin 2020
 
Exploring the Next Wave of 10GbE with Crehan Research
Exploring the Next Wave of 10GbE with Crehan ResearchExploring the Next Wave of 10GbE with Crehan Research
Exploring the Next Wave of 10GbE with Crehan Research
 
The Future of Distributed Databases
The Future of Distributed DatabasesThe Future of Distributed Databases
The Future of Distributed Databases
 
Scalability 09262012
Scalability 09262012Scalability 09262012
Scalability 09262012
 
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDBReal-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
 
EMC config Hadoop
EMC config HadoopEMC config Hadoop
EMC config Hadoop
 
Dennis Wisnowsky Presentation
Dennis Wisnowsky PresentationDennis Wisnowsky Presentation
Dennis Wisnowsky Presentation
 
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
 
Demystifying Modern PLM - Technology
Demystifying Modern PLM - TechnologyDemystifying Modern PLM - Technology
Demystifying Modern PLM - Technology
 
Demystifying Modern PLM Sessions. Part 1: Technology
Demystifying Modern PLM Sessions. Part 1: TechnologyDemystifying Modern PLM Sessions. Part 1: Technology
Demystifying Modern PLM Sessions. Part 1: Technology
 
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
 
DDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your HardwareDDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your Hardware
 
Design Choices for Cloud Data Platforms
Design Choices for Cloud Data PlatformsDesign Choices for Cloud Data Platforms
Design Choices for Cloud Data Platforms
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Solving Big Data Problems
Solving Big Data ProblemsSolving Big Data Problems
Solving Big Data Problems
 
Tcod a framework for the total cost of big data - december 6 2013 - winte...
Tcod   a framework for the total cost of big data  - december 6 2013  - winte...Tcod   a framework for the total cost of big data  - december 6 2013  - winte...
Tcod a framework for the total cost of big data - december 6 2013 - winte...
 
Machine Learning for z/OS
Machine Learning for z/OSMachine Learning for z/OS
Machine Learning for z/OS
 

More from Flytxt

Flytxt corporate brochure
Flytxt corporate brochureFlytxt corporate brochure
Flytxt corporate brochureFlytxt
 
The Omnichannel Opportunity in Digital World: Unlocking the potential of conn...
The Omnichannel Opportunity in Digital World: Unlocking the potential of conn...The Omnichannel Opportunity in Digital World: Unlocking the potential of conn...
The Omnichannel Opportunity in Digital World: Unlocking the potential of conn...Flytxt
 
Data analytics is a game changer for telcos in the digital era
Data analytics is a game changer for telcos in the digital eraData analytics is a game changer for telcos in the digital era
Data analytics is a game changer for telcos in the digital eraFlytxt
 
Omni channel customer experience
Omni channel customer experienceOmni channel customer experience
Omni channel customer experienceFlytxt
 
Analytics tools drive customer experience in the digital age
Analytics tools drive customer experience in the digital ageAnalytics tools drive customer experience in the digital age
Analytics tools drive customer experience in the digital ageFlytxt
 
Enhancing Connected Customer Experience through Mobile Consumer Analytics
 Enhancing Connected Customer Experience through Mobile Consumer Analytics Enhancing Connected Customer Experience through Mobile Consumer Analytics
Enhancing Connected Customer Experience through Mobile Consumer AnalyticsFlytxt
 
Flytxt: Personalizing Engagement
Flytxt: Personalizing EngagementFlytxt: Personalizing Engagement
Flytxt: Personalizing EngagementFlytxt
 
Flytxt a unique success story in big data analytics
Flytxt a unique success story in big data analyticsFlytxt a unique success story in big data analytics
Flytxt a unique success story in big data analyticsFlytxt
 
Flytxt brochure
Flytxt brochureFlytxt brochure
Flytxt brochureFlytxt
 
Roadmap to realizing the value of telco data – opportunities, challenges, use...
Roadmap to realizing the value of telco data – opportunities, challenges, use...Roadmap to realizing the value of telco data – opportunities, challenges, use...
Roadmap to realizing the value of telco data – opportunities, challenges, use...Flytxt
 
Improving Collaborative Filtering Based Recommenders Using Topic Modelling
Improving Collaborative Filtering Based Recommenders Using Topic ModellingImproving Collaborative Filtering Based Recommenders Using Topic Modelling
Improving Collaborative Filtering Based Recommenders Using Topic ModellingFlytxt
 
Afaqs Reporter: Strategise, Leap & Lead with Mobile Marketing
Afaqs Reporter: Strategise, Leap & Lead with Mobile MarketingAfaqs Reporter: Strategise, Leap & Lead with Mobile Marketing
Afaqs Reporter: Strategise, Leap & Lead with Mobile MarketingFlytxt
 
Data analytics driven customer experience programs
Data analytics driven customer experience programsData analytics driven customer experience programs
Data analytics driven customer experience programsFlytxt
 
Deriving economic value for CSPs with Big Data [read-only]
Deriving economic value for CSPs with Big Data [read-only]Deriving economic value for CSPs with Big Data [read-only]
Deriving economic value for CSPs with Big Data [read-only]Flytxt
 
Warid uganda big data experience
Warid uganda   big data experienceWarid uganda   big data experience
Warid uganda big data experienceFlytxt
 
7th prepaid mobile summit presentation by Abhay Doshi
7th prepaid mobile summit presentation by Abhay Doshi7th prepaid mobile summit presentation by Abhay Doshi
7th prepaid mobile summit presentation by Abhay DoshiFlytxt
 
Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stackFlytxt
 
Co-existence or competition - RDBMS and Hadoop
Co-existence or competition  - RDBMS and HadoopCo-existence or competition  - RDBMS and Hadoop
Co-existence or competition - RDBMS and HadoopFlytxt
 

More from Flytxt (18)

Flytxt corporate brochure
Flytxt corporate brochureFlytxt corporate brochure
Flytxt corporate brochure
 
The Omnichannel Opportunity in Digital World: Unlocking the potential of conn...
The Omnichannel Opportunity in Digital World: Unlocking the potential of conn...The Omnichannel Opportunity in Digital World: Unlocking the potential of conn...
The Omnichannel Opportunity in Digital World: Unlocking the potential of conn...
 
Data analytics is a game changer for telcos in the digital era
Data analytics is a game changer for telcos in the digital eraData analytics is a game changer for telcos in the digital era
Data analytics is a game changer for telcos in the digital era
 
Omni channel customer experience
Omni channel customer experienceOmni channel customer experience
Omni channel customer experience
 
Analytics tools drive customer experience in the digital age
Analytics tools drive customer experience in the digital ageAnalytics tools drive customer experience in the digital age
Analytics tools drive customer experience in the digital age
 
Enhancing Connected Customer Experience through Mobile Consumer Analytics
 Enhancing Connected Customer Experience through Mobile Consumer Analytics Enhancing Connected Customer Experience through Mobile Consumer Analytics
Enhancing Connected Customer Experience through Mobile Consumer Analytics
 
Flytxt: Personalizing Engagement
Flytxt: Personalizing EngagementFlytxt: Personalizing Engagement
Flytxt: Personalizing Engagement
 
Flytxt a unique success story in big data analytics
Flytxt a unique success story in big data analyticsFlytxt a unique success story in big data analytics
Flytxt a unique success story in big data analytics
 
Flytxt brochure
Flytxt brochureFlytxt brochure
Flytxt brochure
 
Roadmap to realizing the value of telco data – opportunities, challenges, use...
Roadmap to realizing the value of telco data – opportunities, challenges, use...Roadmap to realizing the value of telco data – opportunities, challenges, use...
Roadmap to realizing the value of telco data – opportunities, challenges, use...
 
Improving Collaborative Filtering Based Recommenders Using Topic Modelling
Improving Collaborative Filtering Based Recommenders Using Topic ModellingImproving Collaborative Filtering Based Recommenders Using Topic Modelling
Improving Collaborative Filtering Based Recommenders Using Topic Modelling
 
Afaqs Reporter: Strategise, Leap & Lead with Mobile Marketing
Afaqs Reporter: Strategise, Leap & Lead with Mobile MarketingAfaqs Reporter: Strategise, Leap & Lead with Mobile Marketing
Afaqs Reporter: Strategise, Leap & Lead with Mobile Marketing
 
Data analytics driven customer experience programs
Data analytics driven customer experience programsData analytics driven customer experience programs
Data analytics driven customer experience programs
 
Deriving economic value for CSPs with Big Data [read-only]
Deriving economic value for CSPs with Big Data [read-only]Deriving economic value for CSPs with Big Data [read-only]
Deriving economic value for CSPs with Big Data [read-only]
 
Warid uganda big data experience
Warid uganda   big data experienceWarid uganda   big data experience
Warid uganda big data experience
 
7th prepaid mobile summit presentation by Abhay Doshi
7th prepaid mobile summit presentation by Abhay Doshi7th prepaid mobile summit presentation by Abhay Doshi
7th prepaid mobile summit presentation by Abhay Doshi
 
Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stack
 
Co-existence or competition - RDBMS and Hadoop
Co-existence or competition  - RDBMS and HadoopCo-existence or competition  - RDBMS and Hadoop
Co-existence or competition - RDBMS and Hadoop
 

Recently uploaded

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Co existence or Competitions? RDBMS and Hadoop

  • 1. RDBMS and Hadoop - Co-existence or competition Ram Mohan Copyright © 2011 Flytxt B.V. All rights reserved 1/16/2012
  • 2. Session Agenda!  Introduction to RDBMS  What is Hadoop and Map-Reduce  Hadoop and RDBMS – A comparison  Co-Existence – Practical Example - Master Website  Q&A Copyright © 2011 Flytxt B.V. All rights reserved 1/16/2012 2
  • 3. Relational DBMS  Based on Relational Mathematics principles  Data is represented in terms of rows and columns of a table  Relational Terminology ◦ Tuple (Row) ◦ Attribute (Column) ◦ Relation (Table)  Integrity Constraints ◦ Primary Key ◦ Foreign Key ◦ Alternate Key  ACID Test ◦ Atomicity ◦ Consistency ◦ Isolation ◦ Durability Copyright © 2011 Flytxt B.V. All rights reserved 1/16/2012 3
  • 4. Normalization  Normalization - process of removing data redundancy by decomposing relations in a Database.  De normalization - carefully introduced redundancy to improve query performance. Copyright © 2011 Flytxt B.V. All rights reserved 1/16/2012 4
  • 5. Relational DBMS Copyright © 2011 Flytxt B.V. All rights reserved 1/16/2012 5
  • 6. Example Data S# SNAME STATUS CITY S1 Smith 20 London S2 Jones 10 Paris S3 Blake 30 Paris P# PNAME COLOR WEIGHT CITY P1 Nut Red 12 London P2 Bolt Green 17 Paris P3 Screw Blue 17 Rome P4 Screw Red 14 London S# P# QTY S1 P1 300 S1 P2 200 S1 P3 400 S2 P1 300 S2 P2 400 S3 P2 200 Copyright © 2011 Flytxt B.V. All rights reserved 1/16/2012 6
  • 7. Five computers & a 640k ;-)
      • "I think there is a world market for about five computers" (Thomas Watson, Chairman of the board of IBM, 1943)
      • "640k ought to be enough for anybody" (attributed to Bill Gates in 1981)
      • Moore's Law
  • 8. The Big Data Challenges
      • The sources of data and the amount of data to analyze are growing exponentially
      • Stale data exists because data warehouse (DW) solutions cannot ingest the vast amounts of data fast enough
      • Lack of performance for advanced analytics and complex queries
      • The number of users and the concurrency of users are increasing rapidly
  • 9. Hadoop Architecture
  • 10. Hadoop – HDFS (Hadoop Distributed File System)
      • Reliably stores petabytes of replicated data across thousands of nodes
        ◦ Data is divided into 64 MB blocks, and each block is replicated three times
      • Master/Slave architecture
        ◦ The master NameNode holds the block locations
        ◦ Each slave DataNode manages blocks on its local file system
      • Built on local commodity hardware
        ◦ No RAID required
  • 11. Hadoop – HDFS (Hadoop Distributed File System)
  • 12. Map-Reduce Model
  • 13. Hadoop – Limitations
      • Is not intended for real-time querying
      • Does not support random access
      • Significant learning curve
      • Provides bare-bones functionality out of the box, but scaling is built in and inexpensive
  • 14. Where SQL Makes life easy
      • Joining
        ◦ In a single query, get all products in an order with their product information
      • Secondary Indexing
        ◦ Get CustomerId by e-mail
      • Referential Integrity
      • Real-time Analysis
      • Millions are trained in SQL and relational data modelling
      • RDBMS provides tremendous functionality, but is extremely difficult and costly to scale
  • 15. Master Website – A Practical Example
  • 16. Master Website – RDBMS Use Cases
      • Profile information provided during sign-up
      • Intelligence generated, i.e. the output of the analytic jobs
      • Any online purchasing track records and account management
      • Reporting tools
  • 17. Master Website – Hadoop Use Cases
      • Generating intelligence from the continuous stream of data
        ◦ Wall posts on Facebook
      • New tags to be added based on the old logs available, due to new requirements
  • 18. A Practical Example – Facebook Architecture
  • 19. THANK YOU
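
The points on slides 3 and 14 (joins, secondary indexing, referential integrity, atomic transactions) can be illustrated with a minimal sketch that loads the example data from slide 6 into an in-memory SQLite database. The table names (supplier, part, shipment), the column names, and the index name are assumptions made for this illustration and do not come from the deck. The lookup by city stands in for the "Get CustomerId by e-mail" case, since the example data has no customer table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Schema for the slide 6 data; all identifiers here are illustrative assumptions.
conn.executescript("""
CREATE TABLE supplier (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT);
CREATE TABLE part (pno TEXT PRIMARY KEY, pname TEXT, color TEXT, weight INTEGER, city TEXT);
CREATE TABLE shipment (
    sno TEXT REFERENCES supplier(sno),
    pno TEXT REFERENCES part(pno),
    qty INTEGER,
    PRIMARY KEY (sno, pno)
);
CREATE INDEX idx_supplier_city ON supplier(city);  -- secondary index on a non-key column
""")

with conn:  # one transaction: commits on success, rolls back on error
    conn.executemany("INSERT INTO supplier VALUES (?, ?, ?, ?)", [
        ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"), ("S3", "Blake", 30, "Paris"),
    ])
    conn.executemany("INSERT INTO part VALUES (?, ?, ?, ?, ?)", [
        ("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
        ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London"),
    ])
    conn.executemany("INSERT INTO shipment VALUES (?, ?, ?)", [
        ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
        ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
    ])

# Joining: one query returns supplier S1's shipments with supplier and part details.
s1_shipments = conn.execute("""
    SELECT s.sname, p.pname, sh.qty
    FROM shipment sh
    JOIN supplier s ON s.sno = sh.sno
    JOIN part p ON p.pno = sh.pno
    WHERE s.sno = 'S1'
    ORDER BY p.pno
""").fetchall()

# Secondary indexing: look suppliers up by a non-key attribute.
paris_suppliers = [row[0] for row in conn.execute(
    "SELECT sno FROM supplier WHERE city = 'Paris' ORDER BY sno")]

# Referential integrity plus atomicity: the second insert refers to a supplier
# that does not exist, so the whole transaction (both inserts) is rolled back.
rolled_back = False
try:
    with conn:
        conn.execute("INSERT INTO shipment VALUES ('S3', 'P1', 100)")
        conn.execute("INSERT INTO shipment VALUES ('S9', 'P1', 100)")
except sqlite3.IntegrityError:
    rolled_back = True

shipment_count = conn.execute("SELECT COUNT(*) FROM shipment").fetchone()[0]
print(s1_shipments, paris_suppliers, rolled_back, shipment_count)
```

SQLite is used only because it ships with Python; the same statements should carry over to a server RDBMS with minor dialect changes.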

Editor's Notes

  1. No centralized control; data redundancy; data inconsistency; data cannot be shared; standards cannot be enforced; security issues; integrity cannot be maintained; data dependence.
     Centralized control; no data redundancy; data consistency; data can be shared; standards can be enforced; security can be enforced; integrity can be maintained; data independence.
  2. Can all the data be structured? Will we be able to store all the data in the tables, i.e. can we model all the data? Should we discard the data after getting the required structured data from the log files, or should we archive it?
  3. Take the example of students using the facilities provided by college.
  4. Two core components – HDFS & Map-Reduce
      • Machines are unreliable
      • Separates distributed fault-tolerant computing code from application logic
      • No need to worry about the identity of a machine; lets you interact with a cluster, not a bunch of machines
      • Analysis workloads span multiple machines
      • Runs as a cloud (cluster) and possibly on a cloud (EC2)
  5. Consumer interested in:
      • Social networking
      • Online purchasing/booking
     Data the service provider is interested in:
      • Advertisements or revenue generation
      • Reporting – for internal housekeeping
     Challenges:
      • Recommendation – publishing those advertisements that the consumer regards as information or is interested in
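
The HDFS figures on slide 10 and the Map-Reduce model named in note 4 can be made concrete with a small single-process sketch. It is not the Hadoop API: hdfs_footprint, map_reduce, and the sample posts are hypothetical names and data made up for this illustration, and only the 64 MB block size and the replication factor of three come from the slides.

```python
from collections import defaultdict

MB = 1024 * 1024

def hdfs_footprint(file_size_bytes, block_size=64 * MB, replication=3):
    """Return (number of blocks, raw bytes stored once every block is replicated)."""
    blocks = -(-file_size_bytes // block_size)  # integer ceiling division
    # A partly filled last block only occupies its actual size,
    # so raw usage is simply the file size times the replication factor.
    return blocks, file_size_bytes * replication

def map_reduce(records, mapper, reducer):
    """Run the map, shuffle, and reduce phases in one process."""
    groups = defaultdict(list)
    for record in records:                 # map: emit (key, value) pairs per record
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: group the values by key
    # reduce: collapse each key's list of values into one result
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

def word_mapper(post):
    for word in post.lower().split():
        yield word, 1

def count_reducer(word, ones):
    return sum(ones)

# Hypothetical wall posts standing in for the continuous stream on slide 17.
posts = [
    "Hadoop scales out",
    "SQL joins are easy",
    "Hadoop and SQL coexist",
]

blocks, raw_bytes = hdfs_footprint(1024 * MB)  # a 1 GB log file
word_counts = map_reduce(posts, word_mapper, count_reducer)
print(blocks, raw_bytes // MB, word_counts["hadoop"], word_counts["sql"])
```

On a real cluster the map and reduce functions would run in parallel on the nodes that hold the blocks; the sketch shows only the data flow between the phases.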