Apache Sqoop: Unlocking Hadoop for Your Relational Database

Kathleen Ting, Technical Account Manager @ Cloudera and Sqoop Committer

Unlocking data stored in an organization's RDBMS and transferring it to Apache Hadoop is a major concern in the big data industry. Apache Sqoop enables users with information stored in existing SQL tables to use new analytic tools like Apache HBase and Apache Hive. This talk will go over how to deploy and apply Sqoop in your environment, as well as how to transfer data from MySQL, Oracle, PostgreSQL, SQL Server, Netezza, Teradata, and other relational systems. In addition, we'll show you how to keep table data and Hadoop in sync by importing data incrementally, and how to customize transferred data by calling various database functions.


  1. Unlocking Hadoop for Your Relational DB
     Kathleen Ting | @kate_ting
     Technical Account Manager, Cloudera | Sqoop PMC Member
     Hadoop User Group UK, 10 April 2014
  2. Who Am I?
     •  Started 3 yr ago as 1st Cloudera Support Eng
     •  Now manages Cloudera's 2 largest customers
     •  Sqoop Committer, PMC Member
     •  Co-Author of the Apache Sqoop Cookbook
  3. What is Sqoop?
     •  Apache Top-Level Project
     •  SQl to hadOOP
     •  Tool to transfer data from relational databases
        •  Teradata, MySQL, PostgreSQL, Oracle, Netezza
     •  To/From Hadoop ecosystem
        •  HDFS (text, sequence file), Hive, HBase, Avro
  4. Why Sqoop?
     •  Efficient/Controlled resource utilization
        •  Concurrent connections, Time of operation
     •  Datatype mapping and conversion
        •  Automatic, and User override
     •  Metadata propagation
        •  Sqoop Record
        •  Hive Metastore
        •  Avro
  5. Agenda
     Sqoop 1
     •  Sqoop 1 Architecture
     •  Sqoop 1 Command Line
     •  Sqoop 1 Examples
     •  Sqoop 1 Challenges
     •  Troubleshooting Sqoop 1
     •  Common Sqoop 1 Issues
        •  Protecting Your Password
        •  Sqoop Works on CLI Not in Oozie
        •  Choosing Proper Connector
        •  Overriding Type Mapping
     Sqoop 2
     •  Sqoop 2 Architecture
     •  Sqoop 2 Design Goals
     •  Sqoop 2 UI in Hue
     Resources
  7. Sqoop 1 Architecture
     [architecture diagram; image not captured in the transcript]
  8. Sqoop 1 Command Line
     sqoop TOOL PROPS ARG [-- EXTRA]
     •  TOOL: import, export
     •  PROPS
        •  Hadoop (Java) properties
        •  -Dwhatever.whenever=yes
     •  ARG
        •  Generic Sqoop arguments
        •  --table, --connect, ...
     •  EXTRA
        •  Connector specific
        •  --schema (PostgreSQL and Microsoft SQL Server)
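The argument order on the slide can be sketched as a small dry run. This is only an illustration: the PostgreSQL host `pg.example.com` and the schema name `public` are hypothetical, and the command is printed rather than executed, so Sqoop does not need to be installed.

```shell
#!/usr/bin/env bash
# Assemble a Sqoop 1 command in the order TOOL PROPS ARG [-- EXTRA].
# Nothing is executed; the command is only printed for inspection.

# TOOL
tool=(import)
# PROPS: Hadoop (Java) properties, using the placeholder from the slide
props=(-Dwhatever.whenever=yes)
# ARG: generic Sqoop arguments (host and database are hypothetical)
args=(--connect jdbc:postgresql://pg.example.com/sqoop --username sqoop --table cities)
# EXTRA: connector-specific arguments, placed after a bare "--"
extra=(--schema public)

cmd=(sqoop "${tool[@]}" "${props[@]}" "${args[@]}" -- "${extra[@]}")

# Print the assembled command, shell-quoted, on one line.
printf '%q ' "${cmd[@]}"
printf '\n'
```

Keeping each group in its own array makes it harder to put a `-D` property after the generic arguments or a connector-specific flag before the `--` separator.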
  9. Sqoop 1 Example
     sqoop import \
       --connect jdbc:mysql://mysql.example.com/sqoop \
       --username sqoop --password sqoop \
       --table cities
     sqoop export \
       --connect jdbc:mysql://mysql.example.com/sqoop \
       --username sqoop --password sqoop \
       --table cities \
       --export-dir /temp/cities
  10. Sqoop 1 Challenges
     •  Cryptic, contextual command line arguments
     •  Security concerns
     •  Type mapping is not clearly defined
     •  Client needs access to Hadoop binaries/configuration and database
     •  JDBC model is enforced
  11. Troubleshooting Sqoop 1
     •  Versions: Sqoop, Hadoop, OS, JDBC
     •  Console log after running with the --verbose flag
        •  Capture the entire output via sqoop import … &> sqoop.log
     •  Entire Sqoop command including the options-file if applicable
     •  Expected output and actual output
     •  Table definition
     •  Small input data set that triggers the problem
        •  Especially with export, malformed data is often the culprit
     •  Hadoop task logs
        •  Often the task logs contain further information describing the problem
     •  Permissions on input files
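As a rough sketch of the "capture the entire output" step, the helper below records the exact command line together with its stdout and stderr in a single log file. The name `run_logged` is hypothetical, and a harmless stand-in command is used so the sketch runs without Sqoop installed.

```shell
#!/usr/bin/env bash
# Run a command, recording the exact command line plus all of its
# stdout and stderr in one log file (the same effect as `&> sqoop.log`).
run_logged() {
  local log=$1; shift
  {
    printf 'command:'
    printf ' %q' "$@"
    printf '\n'
    "$@"
  } > "$log" 2>&1
}

# Stand-in for: run_logged sqoop.log sqoop import --verbose ...
log=$(mktemp)
run_logged "$log" sh -c 'echo to-stdout; echo to-stderr >&2'
cat "$log"
```

Writing the command line into the same file means the log alone covers two items on the checklist: the console output and the entire Sqoop command.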
  12. Troubleshooting Sqoop 1
     Imported table has more rows than source table?
     •  Data contains char used as Hive's delimiters
     •  Clean up data
        •  --hive-drop-import-delims
           •  Removes \n, \t, and \01 char
        •  --hive-delims-replacement "SPECIAL"
           •  Replaces \n, \t, and \01 char with string SPECIAL
     •  Not restricted to Hive - any import job using text files
        •  Ensure output files have one line per imported row
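The row-count mismatch can be reproduced without Sqoop or Hive. The sketch below uses a made-up row with an embedded newline and plain shell string operations to mimic the effect of the two options; it illustrates the symptom and the fix, not how Sqoop implements them.

```shell
#!/usr/bin/env bash
# One source row whose text column contains an embedded newline
# (the row contents are invented for illustration).
row=$'1,San Francisco\nBay Area,USA'

# Written as-is to a text file, the single row spans two lines,
# so a line-oriented reader counts two rows.
raw_lines=$(printf '%s\n' "$row" | wc -l)

# Dropping the embedded newline (the effect of --hive-drop-import-delims
# on this field) restores one line per row.
clean_row=${row//$'\n'/}
clean_lines=$(printf '%s\n' "$clean_row" | wc -l)

# Replacing it with a string instead (the effect of
# --hive-delims-replacement "SPECIAL" on this field).
replaced_row=${row//$'\n'/SPECIAL}

echo "raw lines:   $((raw_lines))"
echo "clean lines: $((clean_lines))"
echo "replaced:    $replaced_row"
```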
  14. Common Sqoop 1 Issues
     •  Protecting Your Password
     •  Sqoop Works on CLI Not in Oozie
     •  Choosing Proper Connector
     •  Overriding Type Mapping
  16. Protecting Your Password
     sqoop import \
       --connect jdbc:mysql://mysql.example.com/sqoop \
       --username sqoop \
       --table cities \
       -P
     sqoop import \
       --connect jdbc:mysql://mysql.example.com/sqoop \
       --username sqoop \
       --table cities \
       --password-file my-sqoop-password
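A minimal sketch of preparing the file passed to `--password-file` follows. The file name matches the slide, the password value is a placeholder, and the file is created in a temporary directory purely for demonstration. Sqoop uses the file's entire content as the password, so the sketch avoids writing a trailing newline.

```shell
#!/usr/bin/env bash
# Create a password file suitable for --password-file.
# Assumptions: placeholder password, temporary directory for the demo.
umask 077                                  # new files are private to the owner
pwfile=$(mktemp -d)/my-sqoop-password

# printf '%s' writes no trailing newline, so the file holds exactly
# the password characters and nothing else.
printf '%s' 'sqoop' > "$pwfile"
chmod 400 "$pwfile"                        # read-only for the owner

wc -c < "$pwfile"                          # byte count equals the password length
ls -l "$pwfile"
```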
  18. Sqoop Works on CLI Not in Oozie
     Character parameter '|' has multiple characters; only the first will be used.
     Got error creating database manager: java.io.IOException: No manager for connect string: "jdbc:teradata..."
  19. Sqoop Works on CLI Not in Oozie
     sqoop import --password "spEci@l$" --connect 'jdbc:x:/yyy;db=sqoop'
     •  Remove all escaping that you've added for the shell
     •  Use <arg> tags rather than <command>: the content of each <arg> is treated as a single parameter
     •  Put all -D parameters into the configuration section
     •  Install the driver into the workflow's lib/ directory or the shared action library /user/oozie/share/lib/sqoop/
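To see which values belong in the workflow once shell escaping is removed, the sketch below lets the shell strip its own quoting and prints each resulting argument as one `<arg>` element. It is only an illustration: the leading `sqoop` is left out on the assumption that the Oozie Sqoop action starts from the tool name, single quotes are used for the password so the trailing `$` is unambiguously literal, and no XML escaping is performed.

```shell
#!/usr/bin/env bash
# The arguments as typed on the command line. The quotes are shell syntax
# and are not part of the values Sqoop receives.
args=(import --password 'spEci@l$' --connect 'jdbc:x:/yyy;db=sqoop')

# Emit one <arg> element per argument, holding the bare value.
# Note: values containing &, < or > would additionally need XML escaping.
xml=$(for a in "${args[@]}"; do printf '<arg>%s</arg>\n' "$a"; done)
echo "$xml"
```

Each printed line carries one parameter with no surrounding quotes, which is the form the bullets above call for.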
  21. Choosing Proper Connector
     •  JDBC driver is a dependency for all three connectors
     •  Sqoop automatically chooses the optimal connector (OraOop, built-in, Generic JDBC Connector)
     •  Or explicitly choose:
        --connection-manager com.quest.oraoop.OraOopConnManager
  23. Overriding Type Mapping
     --map-column-java parameter
     •  Comma-separated list of key-value pairs
     •  key = exact column name
     •  value = target Java type
     sqoop import --map-column-java c1=Float,c2=String,c3=String ...
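When there are many overrides, the list can be assembled from individual pairs. The sketch below uses the column names from the slide and only prints the resulting command fragment; the elided remainder of the command stays elided.

```shell
#!/usr/bin/env bash
# Build the value for --map-column-java from column=JavaType pairs.
# Column names are taken from the slide and must match the table exactly.
pairs=(c1=Float c2=String c3=String)

# Join the pairs with commas and no spaces, so the list stays one argument.
mapping=$(IFS=,; printf '%s' "${pairs[*]}")

echo "sqoop import --map-column-java $mapping ..."
```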
  25. Sqoop 2 Architecture
     [architecture diagram; image not captured in the transcript]
  26. Sqoop 2 Design Goals
     •  Security and Separation of Concerns
        •  Role based access and use
     •  Ease of extension
        •  No low-level Hadoop knowledge needed
        •  No functional overlap between Connectors
     •  Ease of Use
        •  Uniform functionality
        •  Domain specific interactions
  27. Sqoop 2 UI in Hue
     •  Troubleshooting
        •  sqoop.log file is located in @LOGDIR@ and the rest should be in server/logs/*
        •  Look for catalina.out, catalina.log, localhost-*.log
  39. Resources
     •  Sqoop 2: http://archive-primary.cloudera.com/cdh5/cdh/5/sqoop2/
     •  Sqoop 1