Apache Sqoop: Unlocking Hadoop for Your Relational Database

Kathleen Ting, Technical Account Manager @ Cloudera and Sqoop Committer

Unlocking data stored in an organization's RDBMS and transferring it to Apache Hadoop is a major concern in the big data industry. Apache Sqoop enables users with information stored in existing SQL tables to use new analytic tools like Apache HBase and Apache Hive. This talk will go over how to deploy and apply Sqoop in your environment, as well as how to transfer data from MySQL, Oracle, PostgreSQL, SQL Server, Netezza, Teradata, and other relational systems. In addition, we'll show you how to keep table data and Hadoop in sync by importing data incrementally, and how to customize transferred data by calling various database functions.

Published in: Technology

Transcript

  • 1. Unlocking Hadoop for Your Relational DB
      Kathleen Ting | @kate_ting
      Technical Account Manager, Cloudera | Sqoop PMC Member
      Hadoop User Group UK, 10 April 2014
  • 2. Who Am I?
      • Started 3 yr ago as 1st Cloudera Support Eng
      • Now manages Cloudera's 2 largest customers
      • Sqoop Committer, PMC Member
      • Co-Author of the Apache Sqoop Cookbook
  • 3. What is Sqoop?
      • Apache Top-Level Project
      • SQl to hadOOP
      • Tool to transfer data from relational databases
          • Teradata, MySQL, PostgreSQL, Oracle, Netezza
      • To/From Hadoop ecosystem
          • HDFS (text, sequence file), Hive, HBase, Avro
  • 4. Why Sqoop?
      • Efficient/Controlled resource utilization
          • Concurrent connections, Time of operation
      • Datatype mapping and conversion
          • Automatic, and User override
      • Metadata propagation
          • Sqoop Record
          • Hive Metastore
          • Avro
  • 5. Agenda
      Sqoop 1
      • Sqoop 1 Architecture
      • Sqoop 1 Command Line
      • Sqoop 1 Examples
      • Sqoop 1 Challenges
      • Troubleshooting Sqoop 1
      • Common Sqoop 1 Issues
          • Protecting Your Password
          • Sqoop Works on CLI Not in Oozie
          • Choosing Proper Connector
          • Overriding Type Mapping
      Sqoop 2
      • Sqoop 2 Architecture
      • Sqoop 2 Design Goals
      • Sqoop 2 UI in Hue
      Resources
  • 7. Sqoop 1 Architecture
  • 8. Sqoop 1 Command Line
      sqoop TOOL PROPS ARG [-- EXTRA]
      • TOOL: import, export
      • PROPS
          • Hadoop (java) properties
          • -Dwhatever.whenever=yes
      • ARG
          • Generic Sqoop arguments
          • --table, --connect, ...
      • EXTRA
          • connector specific
          • --schema (PostgreSQL and Microsoft SQL Server)
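The TOOL PROPS ARG [-- EXTRA] ordering above can be sketched by assembling a command from its four parts. This is only an illustration: the PostgreSQL host, database, and schema names are placeholders that do not come from the talk, and the command is printed rather than executed.

```shell
# Assemble a Sqoop 1 command line in the order TOOL PROPS ARG [-- EXTRA].
# Host, database, table and schema values are hypothetical placeholders.
tool="import"
props="-Dwhatever.whenever=yes"
args="--connect jdbc:postgresql://db.example.com/sqoop --table cities"
extra="--schema public"

# The bare "--" separates generic Sqoop arguments from connector-specific ones.
cmd="sqoop $tool $props $args -- $extra"

# Print the command instead of running it, so the ordering is easy to inspect.
echo "$cmd"
```

Keeping the -D properties directly after the tool name and the connector-specific options after the bare `--` mirrors the layout on the slide.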
  • 9. Sqoop 1 Example
      sqoop import \
        --connect jdbc:mysql://mysql.example.com/sqoop \
        --username sqoop \
        --password sqoop \
        --table cities
      sqoop export \
        --connect jdbc:mysql://mysql.example.com/sqoop \
        --username sqoop \
        --password sqoop \
        --table cities \
        --export-dir /temp/cities
  • 10. Sqoop 1 Challenges
      • Cryptic, contextual command line arguments
      • Security concerns
      • Type mapping is not clearly defined
      • Client needs access to Hadoop binaries/configuration and database
      • JDBC model is enforced
  • 11. Troubleshooting Sqoop 1
      • Versions: Sqoop, Hadoop, OS, JDBC
      • Console log after running with the --verbose flag
          • Capture the entire output via sqoop import … &> sqoop.log
      • Entire Sqoop command including the options-file if applicable
      • Expected output and actual output
      • Table definition
      • Small input data set that triggers the problem
          • Especially with export, malformed data is often the culprit
      • Hadoop task logs
          • Often the task logs contain further information describing the problem
      • Permissions on input files
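Capturing the whole console log, as the checklist above suggests, can be sketched with a small wrapper. The `capture` helper and the log path are assumptions made for illustration, and a stand-in command replaces `sqoop import --verbose …` so the sketch runs without Sqoop installed.

```shell
# Run any command and capture stdout and stderr together in one log file.
# "> file 2>&1" is the portable spelling of bash's "&> file".
capture() {
  log="$1"
  shift
  "$@" > "$log" 2>&1
}

# Hypothetical log location used only for this demo.
log_file="${TMPDIR:-/tmp}/sqoop-demo.log"

# Stand-in command that writes to both streams; with Sqoop installed you
# would pass the full "sqoop import --verbose ..." command line instead.
capture "$log_file" sh -c 'echo to-stdout; echo to-stderr >&2'

cat "$log_file"
```

Both streams matter: stack traces and progress messages can end up on different streams, and a log holding only one of them may miss the error.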
  • 12. Troubleshooting Sqoop 1
      Imported table has more rows than source table?
      • Data contains char used as Hive's delimiters
      • Clean up data
          • --hive-drop-import-delims
              • Removes \n, \t, and \01 char
          • --hive-delims-replacement "SPECIAL"
              • Replaces \n, \t, and \01 char with string SPECIAL
      • Not restricted to Hive - any import job using text files
      • Ensure output files have one line per imported row
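Why an embedded delimiter inflates the row count can be shown without Sqoop at all. The sketch below uses a made-up row and emulates the clean-up with `tr`, removing the characters the slide lists; it is not how Sqoop implements the option.

```shell
# One source row whose text column contains an embedded newline
# (sample data invented for this illustration).
row_with_newline='1,San Francisco
Bay Area'

# Written as-is to a text file, the single row spans two lines,
# so a line-oriented reader such as Hive sees two rows.
raw_lines=$(printf '%s\n' "$row_with_newline" | wc -l | tr -d ' ')

# Emulate the clean-up by deleting \n, \t and \01 from the field.
clean=$(printf '%s' "$row_with_newline" | tr -d '\n\t\001')
clean_lines=$(printf '%s\n' "$clean" | wc -l | tr -d ' ')

echo "raw: $raw_lines line(s), cleaned: $clean_lines line(s)"
```

Comparing `wc -l` on the output files with the source table's row count is a quick way to check the "one line per imported row" condition.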
  • 14. Common Sqoop 1 Issues
      • Protecting Your Password
      • Sqoop Works on CLI Not in Oozie
      • Choosing Proper Connector
      • Overriding Type Mapping
  • 16. Protecting Your Password
      sqoop import \
        --connect jdbc:mysql://mysql.example.com/sqoop \
        --username sqoop \
        --table cities \
        -P
      sqoop import \
        --connect jdbc:mysql://mysql.example.com/sqoop \
        --username sqoop \
        --table cities \
        --password-file my-sqoop-password
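A password file for the second form might be prepared as in the sketch below. The temporary path is an assumption, and where the file has to live (local disk or HDFS) depends on your Sqoop version and setup. The file is written without a trailing newline because, as far as I know, Sqoop uses the entire file content as the password.

```shell
# Hypothetical location for the password file used in this demo.
pw_file="${TMPDIR:-/tmp}/my-sqoop-password"
rm -f "$pw_file"

# Create the file with owner-only permissions from the start, and write
# the password with printf so that no trailing newline is stored.
( umask 077; printf '%s' 'sqoop' > "$pw_file" )

# Make the file read-only for its owner.
chmod 400 "$pw_file"

# Five bytes for "sqoop" confirms that no newline was appended.
bytes=$(wc -c < "$pw_file" | tr -d ' ')
echo "$bytes"

# The import would then reference it, for example:
#   sqoop import ... --password-file "$pw_file"
```

Compared with `--password` on the command line, this keeps the secret out of the shell history and the process list.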
  • 18. Sqoop Works on CLI Not in Oozie
      Character parameter '|' has multiple characters; only the first will be used.
      Got error creating database manager: java.io.IOException: No manager for connect string: "jdbc:teradata..."
  • 19. Sqoop Works on CLI Not in Oozie
      sqoop import --password "spEci@l$" --connect 'jdbc:x:/yyy;db=sqoop'
      • Remove all escaping that you've added for the shell
      • Use <arg> tags rather than <command>, because the content of each <arg> is treated as one parameter
      • Put all -D parameters into the configuration section
      • Install the driver into the workflow's lib/ directory or the shared action library /user/oozie/share/lib/sqoop/
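The advice about removing shell escaping follows from what the shell actually hands to a program. The sketch below uses a hypothetical `show_args` helper to print each argument as received; the quotes are consumed by the shell and never reach Sqoop, so inside an Oozie workflow the values would be written without them.

```shell
# Print each argument exactly as a program would receive it from the shell.
show_args() {
  for a in "$@"; do
    printf '[%s]\n' "$a"
  done
}

# Same quoting as on the slide: double quotes around the password,
# single quotes around the connect string.
received=$(show_args --password "spEci@l$" --connect 'jdbc:x:/yyy;db=sqoop')
echo "$received"
```

Each bracketed line corresponds to what one `<arg>` element would hold, which is why the quotes and any backslash escapes should be dropped there.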
  • 21. Choosing Proper Connector
      • JDBC driver is a dependency for all three connectors
      • Sqoop automatically chooses the most suitable connector (OraOop, built-in, Generic JDBC Connector)
      • Or explicitly choose:
          --connection-manager com.quest.oraoop.OraOopConnManager
  • 23. Overriding Type Mapping
      --map-column-java parameter
      • comma separated list of key-value pairs
      • key = exact column name
      • value = target Java type
      sqoop import --map-column-java c1=Float,c2=String,c3=String ...
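When the list of overrides is long, the comma-separated value can be built from individual pairs. This is a small sketch using the column names from the slide; the loop is one possible way to assemble the string, not something the talk prescribes.

```shell
# Build the comma-separated key=value list for --map-column-java
# from individual column=JavaType pairs.
mapping=""
for pair in c1=Float c2=String c3=String; do
  # Insert a comma only when the list already has an entry.
  mapping="${mapping:+$mapping,}$pair"
done

echo "--map-column-java $mapping"
```

The keys must match the column names exactly, so generating the list from a single source of truth helps avoid typos.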
  • 25. Sqoop 2 Architecture
  • 26. Sqoop 2 Design Goals
      • Security and Separation of Concerns
          • Role based access and use
      • Ease of extension
          • No low-level Hadoop knowledge needed
          • No functional overlap between Connectors
      • Ease of Use
          • Uniform functionality
          • Domain specific interactions
  • 27. Sqoop 2 UI in Hue
      • Troubleshooting
          • sqoop.log file is located in @LOGDIR@ and the rest should be in server/logs/*
          • Look for catalina.out, catalina.log, localhost-*.log
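Scanning the server logs named above might look like the following sketch. The `scan_logs` helper, the search pattern, and the sample log line are assumptions for illustration; the demo runs against a temporary directory rather than a real server/logs/ directory.

```shell
# Search the log files named on the slide for errors and exceptions.
# The directory to scan is passed as the first argument.
scan_logs() {
  dir="$1"
  for f in "$dir"/catalina.out "$dir"/catalina*.log "$dir"/localhost-*.log; do
    if [ -f "$f" ]; then
      # Adding /dev/null makes grep print the file name with each match.
      grep -n -i -E 'error|exception' "$f" /dev/null | tail -n 5
    fi
  done
  return 0
}

# Demo with a made-up log file in a temporary directory.
demo_dir=$(mktemp -d)
printf 'INFO start\nSEVERE: Exception in connector\n' > "$demo_dir/catalina.out"

found=$(scan_logs "$demo_dir")
echo "$found"
```

Against a real installation you would pass the server's log directory instead of the temporary one.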
  • 39. Resources
      • Sqoop 2: http://archive-primary.cloudera.com/cdh5/cdh/5/sqoop2/
      • Sqoop 1