  • 1. Why Every Couchbase Deployment Should Be Paired With Hadoop: Couchbase Server + Cloudera CDH
  • 2. pol·y·glot /ˈpäliˌglät/ Adjective: Knowing or using several languages. Noun: A person who knows several languages. Synonyms: multilingual. per·sist·ence /pərˈsistəns/ Noun: The continued or prolonged existence of something. Synonyms: perseverance, tenacity, pertinacity, stubbornness
  • 3. Motivation: Until recently, the architecture behind our persistence systems was designed for:
    • Extremely limited RAM
    • Limited storage capacity
    • Limited I/O throughput
    • Simple transformation on data, if any
  • 4. What is Hadoop? Highly scalable. Unstructured data. Open source. Big Data Operating System. Changing the World One Petabyte at a Time
  • 5. What is Hadoop? Simplest unit of compute and storage (diagram labels: Disks, Application, CPU, Data)
  • 6. What is Hadoop? And when it grows? (diagram labels: Application, Data)
  • 7. What is Hadoop? And when it grows more?
  • 8. What is Hadoop? NoSQL to the rescue! (diagram labels: Application, Data)
  • 9. What is Hadoop? Hadoop is a different paradigm (diagram labels: Application, Data)
  • 10. What is Sqoop? Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS. (sqoop.apache.org)
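The import, transform, export cycle described on this slide can be sketched as a small shell script. This is only an illustration under stated assumptions: the JDBC URL, user name, table names, and HDFS paths are hypothetical and do not come from the deck, password handling is left out, and the script prints each command instead of running it unless `RUN=1` is set.

```shell
#!/bin/sh
# Hypothetical connection details; none of these appear in the slides.
JDBC_URL="jdbc:mysql://db.example.com/shop"
DB_USER="etl_user"

# Print the command by default; execute it only when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Step 1: copy an RDBMS table into a directory on HDFS.
sqoop_import() {
  run sqoop import --connect "$JDBC_URL" --username "$DB_USER" \
    --table "$1" --target-dir "$2"
}

# Step 3: push an HDFS directory back into an RDBMS table.
sqoop_export() {
  run sqoop export --connect "$JDBC_URL" --username "$DB_USER" \
    --table "$1" --export-dir "$2"
}

sqoop_import orders /data/orders
# Step 2 (the MapReduce transform) is workload specific and not shown here.
sqoop_export order_summary /data/order_summary
```

Keeping the dry-run default makes it easy to review the generated commands before pointing them at a real cluster.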
  • 11. What is Sqoop? Traditional ETL (diagram labels: T, Data, Application, Data)
  • 12. What is Sqoop? A different paradigm (diagram labels: Application, Data, Data)
  • 13. What is Sqoop? A very scalable different paradigm (diagram labels: three Application/Data pairs feeding one Data store)
  • 14. What is Sqoop? Where did the Transform go? (diagram labels: four groups of T T T, Application, Data)
  • 15. Sqoop Details
    • Sqoop 1.4.1 bundled in CDH4
    • Sqoop 2.0 coming soon
    • Default connection is via JDBC
    • Lots of custom connectors
      – Couchbase, VoltDB, Vertica
      – Teradata, Netezza
      – Oracle, MySQL, Postgres
  • 16. COMMON USE CASES
  • 17. Ad and offer targeting: 40 milliseconds to respond with the decision. (diagram with steps numbered 1 to 3; labels: events; profiles, campaigns; profiles, real time campaign statistics)
  • 18. Ad Targeting: Moving Parts (diagram labels: Ad Targeting Platform, Couchbase Server Cluster, Logs, flume flow, sqoop import, sqoop export, Hadoop Cluster)
  • 19. Content and Recommendation Targeting: 1 events; 2 user profiles; 3 make recommendations (diagram labels: Content Oriented Site, Legacy Relational Database)
  • 20. Content Driven Site: Moving Parts. In order to keep up with changing needs on richer, more targeted content that is delivered to larger and larger audiences very quickly, data behind content driven sites is shifting to Couchbase. Hadoop excels at complex analytics which may involve multiple steps of processing which incorporate a number of different data sources. (diagram labels: Content Driven Web Site, Couchbase Server Cluster, Legacy RDBMS, Logs, flume flow, sqoop import, sqoop export, Hadoop Cluster)
  • 21. DEMO!
  • 22. Here be a demo which shows a workload against Couchbase, sqooping that over into Hadoop, running some processing there, then sqooping the data back to Couchbase. Possibly using Oozie to drive Sqoop processing.
  • 24. Couchbase Import and Export
    $ sqoop import --connect http://localhost:8091/pools --table DUMP
    $ sqoop import --connect http://localhost:8091/pools --table BACKFILL_5
    $ sqoop export --connect http://localhost:8091/pools --table DUMP --export-dir DUMP
    • For imports, table must be:
      – DUMP: All keys currently in Couchbase
      – BACKFILL_n: All key mutations for n minutes
    • Specified --username maps to bucket
      – By default set to "default" bucket
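The table-name rule on this slide (DUMP or BACKFILL_n for imports, with --username selecting the bucket) could be wrapped in a small helper along these lines. Treat it as a sketch, not the connector's own tooling: the function names and the `profiles` bucket are hypothetical, the URL is the one shown on the slide, and the helper only prints the command it would run.

```shell
#!/bin/sh
# Cluster address as shown on the slide.
POOLS_URL="http://localhost:8091/pools"

# Succeeds only for DUMP or BACKFILL_<one or more digits>.
is_valid_import_table() {
  case "$1" in
    DUMP) return 0 ;;
    BACKFILL_*)
      n=${1#BACKFILL_}
      case "$n" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
      esac ;;
    *) return 1 ;;
  esac
}

# Usage: couchbase_import_cmd TABLE [BUCKET]
# Without BUCKET no --username is passed, so the "default" bucket applies.
couchbase_import_cmd() {
  table=$1
  bucket=${2:-}
  if ! is_valid_import_table "$table"; then
    echo "invalid import table: $table" >&2
    return 1
  fi
  if [ -n "$bucket" ]; then
    echo "sqoop import --connect $POOLS_URL --username $bucket --table $table"
  else
    echo "sqoop import --connect $POOLS_URL --table $table"
  fi
}

# Usage: couchbase_export_cmd TABLE EXPORT_DIR
couchbase_export_cmd() {
  echo "sqoop export --connect $POOLS_URL --table $1 --export-dir $2"
}

couchbase_import_cmd DUMP
couchbase_import_cmd BACKFILL_5 profiles   # "profiles" is a hypothetical bucket
couchbase_export_cmd DUMP DUMP
```

Rejecting a bad table name before Sqoop starts avoids launching a job that the connector would not accept.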
  • 25. QUESTIONS?
  • 26. THANK YOU! Get Couchbase Server at http://www.couchbase.com/download. Give us feedback at: http://www.couchbase.com/forums
  • 27. Image attribution
    • TRS-80 computer: http://www.fotopedia.com/items/flickr-455238557