CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

Published in: Technology


  1. Why Every Couchbase Deployment Should Be Paired With Hadoop: Couchbase Server + Cloudera CDH
  2. pol·y·glot /ˈpäliˌglät/ Adjective: Knowing or using several languages. Noun: A person who knows several languages. Synonyms: multilingual. per·sist·ence /pərˈsistəns/ Noun: The continued or prolonged existence of something. Synonyms: perseverance, tenacity, pertinacity, stubbornness.
  3. Motivation. Until recently, the architecture behind our persistence systems was designed for: • Extremely limited RAM • Limited storage capacity • Limited I/O throughput • Simple transformation on data, if any
  4. What is Hadoop? • Highly scalable • Unstructured data • Open source • Big Data Operating System • Changing the World One Petabyte at a Time
  5. What is Hadoop? Simplest unit of compute and storage. (Diagram labels: Disks, Application, CPU, Data)
  6. What is Hadoop? And when it grows? (Diagram labels: Application, Data)
  7. What is Hadoop? And when it grows more?
  8. What is Hadoop? NoSQL to the rescue! (Diagram labels: Application, Data)
  9. What is Hadoop? Hadoop is a different paradigm. (Diagram labels: Application, Data)
  10. What is Sqoop? Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS. (Source: sqoop.apache.org)
  11. What is Sqoop? Traditional ETL. (Diagram labels: T, Data, Application, Data)
  12. What is Sqoop? A different paradigm. (Diagram labels: Application, Data, Data)
  13. What is Sqoop? A very scalable different paradigm. (Diagram labels: Application, Data, Application, Data, Application, Data, Data)
  14. What is Sqoop? Where did the Transform go? (Diagram labels: TTT TTT TTT TTT, Application, Data)
  15. Sqoop Details • Sqoop 1.4.1 bundled in CDH4 • Sqoop 2.0 coming soon • Default connection is via JDBC • Lots of custom connectors: Couchbase, VoltDB, Vertica; Teradata, Netezza; Oracle, MySQL, Postgres
  16. COMMON USE CASES
  17. Ad and offer targeting: 40 milliseconds to respond with the decision. (Diagram labels: 1 events; 2 profiles, campaigns; 3 profiles, real time campaign statistics)
  18. Ad Targeting: Moving Parts. (Diagram labels: Ad Targeting Platform, Logs, Couchbase Server Cluster, flume flow, sqoop import, sqoop export, Hadoop Cluster)
  19. Content and Recommendation Targeting. (Diagram labels: Content Oriented Site, Legacy Relational Database; 1 events; 2 user profiles; 3 make recommendations)
  20. Content Driven Site: Moving Parts. In order to keep up with changing needs for richer, more targeted content that is delivered to larger and larger audiences very quickly, data behind content driven sites is shifting to Couchbase. Hadoop excels at complex analytics, which may involve multiple processing steps that incorporate a number of different data sources. (Diagram labels: Content Driven Web Site, Couchbase Server Cluster, Legacy RDBMS, Logs, flume flow, sqoop import, sqoop export, sqoop import, Hadoop Cluster)
  21. DEMO!
  22. Here be a demo which shows a workload against Couchbase, sqooping that over into Hadoop, running some processing there, then sqooping the data back to Couchbase, possibly using Oozie to drive Sqoop processing.
  23. RUNNING SQOOP AND OPTIONS
  24. Couchbase Import and Export: $ sqoop import --connect http://localhost:8091/pools --table DUMP $ sqoop import --connect http://localhost:8091/pools --table BACKFILL_5 $ sqoop export --connect http://localhost:8091/pools --table DUMP --export-dir DUMP • For imports, table must be: DUMP (all keys currently in Couchbase) or BACKFILL_n (all key mutations for n minutes) • Specified --username maps to bucket; by default set to the "default" bucket
  25. QUESTIONS?
  26. THANK YOU! Get Couchbase Server at http://www.couchbase.com/download Give us feedback at: http://www.couchbase.com/forums
  27. Image attribution • TRS-80 computer: http://www.fotopedia.com/items/flickr-455238557
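
The Couchbase import and export commands from the "Couchbase Import and Export" slide can be wrapped in a few small shell helpers. This is a minimal sketch, not the presenters' demo script: the function names, the `DRY_RUN` switch, and the `COUCHBASE_URL` variable are hypothetical, and passing `--username` explicitly assumes it selects the bucket as the slide describes. Only the `sqoop` options that appear on the slide are used.

```shell
#!/bin/sh
# Sketch only: builds the Sqoop command lines shown on the import/export slide.
# Hypothetical helpers (not from the slides): run, couchbase_dump,
# couchbase_backfill, couchbase_export, DRY_RUN, COUCHBASE_URL.

COUCHBASE_URL="${COUCHBASE_URL:-http://localhost:8091/pools}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, anything else = execute it

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Import all keys currently in the bucket (table DUMP).
# $1: bucket name, passed as --username; defaults to "default".
couchbase_dump() {
  run sqoop import --connect "$COUCHBASE_URL" --username "${1:-default}" --table DUMP
}

# Import key mutations for n minutes (table BACKFILL_n).
# $1: n in minutes, $2: optional bucket name.
couchbase_backfill() {
  minutes="$1"
  case "$minutes" in
    ''|*[!0-9]*) echo "minutes must be a whole number" >&2; return 1 ;;
  esac
  run sqoop import --connect "$COUCHBASE_URL" --username "${2:-default}" --table "BACKFILL_$minutes"
}

# Export an HDFS directory back into the bucket.
# $1: export directory, $2: optional bucket name.
couchbase_export() {
  run sqoop export --connect "$COUCHBASE_URL" --username "${2:-default}" --table DUMP --export-dir "$1"
}
```

With the default dry-run setting, `couchbase_backfill 5` only prints the `sqoop import` line for table `BACKFILL_5`, so the command can be checked before it is run against a real cluster by setting `DRY_RUN=0` ahead of loading the script.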
