Habits of Effective Sqoop Users

Kathleen Ting
Kathleen TingAuthor, Committer, Technical Account Manager at Cloudera
Habits of Effective Sqoop Users
Kate Ting, Customer Operations Engineer
kate@cloudera.com
Halp! Sqoop doesn't work!

    Now what?


2
Agenda

    •  First Things First
    •  Common Problems
    •  MySQL
      –  Connection Failure
      –  Importing into Hive
    •  Oracle
      –  Case-Sensitive Catalog Query Errors
      –  Sqoop Export Failing
    •  Effective Sqoop Habits

3
Agenda

    •  First Things First
    •  Common Problems
    •  MySQL
      –  Connection Failure
      –  Importing into Hive
    •  Oracle
      –  Case-Sensitive Catalog Query Errors
      –  Sqoop Export Failing
    •  Effective Sqoop Habits

4
First Things First
    Save time by providing this upfront:
    •  Versions: Sqoop, Hadoop, OS, JDBC
    •  Run with --verbose flag then attach log
    •  Sqoop command including options-file
    •  Expected output vs. actual output
    •  Table definition
    •  Input data set that triggers problem
    •  Hadoop task logs
    •  Check permissions on input files
    •  Divide and conquer
       –  e.g. Import that creates and populates a Hive table is failing
           •  First, do the import alone
           •  Second, create a Hive table without the import using the create-hive-
              table tool



5
Common Problems


6
Agenda

    •  First Things First
    •  Common Problems
    •  MySQL
      –  Connection Failure
      –  Importing into Hive
    •  Oracle
      –  Case-Sensitive Catalog Query Errors
      –  Sqoop Export Failing
    •  Effective Sqoop Habits

7
MySQL: Connection Failure
    java.lang.RuntimeException: java.lang.RuntimeException:
    com.mysql.jdbc.exceptions.jdbc4 .CommunicationsException: Communications link failure

    The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received
    any packets from the server.
    at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:164 )
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:606)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.ja va:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)




8
MySQL: Connection Failure
    •    Problem: Communications Link Failure caused by incorrect permissions.
    •    Solution:
          –  Verify that you can connect to the database from the node where you
             are running Sqoop:
               •  $ mysql --host=<IP Address> --database=test --user=<username> --
                  password=<password>
          –  Add the network port for the server to your my.cnf file
          –  Set up a user account to connect via Sqoop. Grant permissions to the
             user to access the database over the network:
               •  Log into MySQL as root mysql -u root -p<ThisIsMyPassword>
               •  Issue the following command: mysql> grant all privileges on test.* to
                  'testuser'@'%' identified by 'testpassword'




9
MySQL: Importing into Hive
 •  Troubleshooting tips:
     –  Look at /tmp/${user}/hive.log
        •  Identifies exceptions during the load
     –  Look at /user/hive/warehouse
        •  View contents of the imported data




10
Agenda

 •  First Things First
 •  Common Problems
 •  MySQL
     –  Connection Failure
     –  Importing into Hive
 •  Oracle
     –  Case-Sensitive Catalog Query Errors
     –  Sqoop Export Failing
 •  Effective Sqoop Habits

11
Oracle: Case-Sensitive Catalog Query Errors
 INFO manager.OracleManager: Time zone has been set to GMT
 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
 INFO manager.SqlManager: Executing SQL statement:
 SELECT t.* FROM addlabel_pris t WHERE 1=0
 DEBUG manager.OracleManager$ConnCache: Caching
 released connection for jdbc:oracle:thin:
 ERROR sqoop.Sqoop: Got exception running Sqoop:
 java.lang.NullPointerException
 java.lang.NullPointerException
 at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:148)
 at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:187)
 at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362)
 at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
 at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
 at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
 at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
 at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)




12
Oracle: Case-Sensitive Catalog Query Errors

 •  Problem: NPE caused by using the wrong
    case for the user name and table name.
 •  Solution: Always specify the user and table
    names in upper case (unless it was
    created with mixed/lower case within
    quotes).




13
Oracle: Sqoop Export Failing
 INFO mapred.JobClient: Running job: job_201109231340_0785
 INFO mapred.JobClient: map 0% reduce 0%
 INFO mapred.JobClient: Task Id :
 attempt_201109231340_0785_m_000000_0, Status : FAILED
 java.lang.NullPointerException
 at
 com.cloudera.sqoop.mapreduce.db.DataDrivenDBRecordReader.getSe
 lectQuery(DataDrivenDBRecordReader.java:87)
 at com.cloudera.sqoop.mapreduce.db.DBRecordReader.nextKeyValue
 (DBRecordReader.java:225)
 at org.apache.hadoop.mapred.MapTask
 $NewTrackingRecordReader.nextKeyValue(MapTask.java:455)
 at org.apache.hadoop.mapreduce.MapContext.nextKeyValue
 (MapContext.java:67)




14
Oracle: Sqoop Export Failing
 •  Problem: IllegalArgumentException
    caused by not non-owner trying to connect
    to the table.
 •  Solution: Prefix the table name with the
    schema, for example
    SchemaName.OracleTableName.




15
Agenda

 •  First Things First
 •  Common Problems
 •  MySQL
     –  Connection Failure
     –  Importing into Hive
 •  Oracle
     –  Case-Sensitive Catalog Query Errors
     –  Sqoop Export Failing
 •  Effective Sqoop Habits

16
Effective Sqoop Habits

 •  Do create an empty export table.

 •  Don’t use the same table for both import
    and export.




17
Effective Sqoop Habits

 •  Do use --escaped-by option during import
    and --input-escaped-by during export.
 •  Do use fields-terminated-by during import
    and input-fields-terminated-by during
    export.

 •  Don’t reverse them.



18
Effective Sqoop Habits

 •  Do specify the direct mode option (--
    direct), if you use the direct connector.

 •  Don’t specify the query, if you use the
    direct connector.




19
How Do You Eat an Elephant?

 •  One bite at a time
     –  Versions
     –  Verbose flag
     –  Console log
     –  Exact command, etc

 •  Sqoop Troubleshooting Guide
     –  http://archive.cloudera.com/cdh/3/sqoop/
        SqoopUserGuide.html#_troubleshooting


20
1 of 20

Recommended

Apache Sqoop: Unlocking Hadoop for Your Relational Database by
Apache Sqoop: Unlocking Hadoop for Your Relational Database Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database huguk
3.5K views39 slides
From oracle to hadoop with Sqoop and other tools by
From oracle to hadoop with Sqoop and other toolsFrom oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other toolsGuy Harrison
19.1K views34 slides
Advanced Sqoop by
Advanced Sqoop Advanced Sqoop
Advanced Sqoop Yogesh Kulkarni
4.5K views14 slides
HiveServer2 by
HiveServer2HiveServer2
HiveServer2Schubert Zhang
27.7K views18 slides
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools by
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsYahoo Developer Network
2.6K views14 slides
SQL to Hive Cheat Sheet by
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat SheetHortonworks
500.4K views3 slides

More Related Content

What's hot

Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab by
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
143 views14 slides
Why your Spark job is failing by
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failingSandy Ryza
69.1K views50 slides
11. From Hadoop to Spark 2/2 by
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2Fabio Fumarola
8.9K views93 slides
Data analysis scala_spark by
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_sparkYiguang Hu
251 views8 slides
Moving Data Between Exadata and Hadoop by
Moving Data Between Exadata and HadoopMoving Data Between Exadata and Hadoop
Moving Data Between Exadata and HadoopEnkitec
2.1K views24 slides
Data Analytics Service Company and Its Ruby Usage by
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
21K views51 slides

What's hot(19)

Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab by CloudxLab
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab143 views
Why your Spark job is failing by Sandy Ryza
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failing
Sandy Ryza69.1K views
11. From Hadoop to Spark 2/2 by Fabio Fumarola
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2
Fabio Fumarola8.9K views
Data analysis scala_spark by Yiguang Hu
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
Yiguang Hu251 views
Moving Data Between Exadata and Hadoop by Enkitec
Moving Data Between Exadata and HadoopMoving Data Between Exadata and Hadoop
Moving Data Between Exadata and Hadoop
Enkitec2.1K views
Data Analytics Service Company and Its Ruby Usage by SATOSHI TAGOMORI
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
SATOSHI TAGOMORI21K views
DataEngConf SF16 - Collecting and Moving Data at Scale by Hakka Labs
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale
Hakka Labs676 views
Spark zeppelin-cassandra at synchrotron by Duyhai Doan
Spark zeppelin-cassandra at synchrotronSpark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotron
Duyhai Doan1.1K views
Apache zeppelin the missing component for the big data ecosystem by Duyhai Doan
Apache zeppelin the missing component for the big data ecosystemApache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystem
Duyhai Doan2.1K views
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab by CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab421 views
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha... by Lucidworks
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Lucidworks1.5K views
Tanel Poder - Performance stories from Exadata Migrations by Tanel Poder
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata Migrations
Tanel Poder6.3K views
Introduction to Apache Hive by Avkash Chauhan
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
Avkash Chauhan13.1K views
Using Morphlines for On-the-Fly ETL by Cloudera, Inc.
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETL
Cloudera, Inc.18.8K views
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku by Redis Labs
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuPostgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Redis Labs2.4K views
Tanel Poder Oracle Scripts and Tools (2010) by Tanel Poder
Tanel Poder Oracle Scripts and Tools (2010)Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder46.2K views
How to build your query engine in spark by Peng Cheng
How to build your query engine in sparkHow to build your query engine in spark
How to build your query engine in spark
Peng Cheng4.7K views
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab by CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab378 views

Viewers also liked

Apache sqoop with an use case by
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use caseDavin Abraham
7.7K views32 slides
New Data Transfer Tools for Hadoop: Sqoop 2 by
New Data Transfer Tools for Hadoop: Sqoop 2New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2DataWorks Summit
18.9K views45 slides
The Transporter 4 MySQL by
The Transporter 4 MySQLThe Transporter 4 MySQL
The Transporter 4 MySQLSagar Nikam
608 views80 slides
Apache Sqoop: A Data Transfer Tool for Hadoop by
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopCloudera, Inc.
19.9K views22 slides
Hadoop and rdbms with sqoop by
Hadoop and rdbms with sqoop Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Guy Harrison
7.8K views24 slides
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2 by
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2Cloudera, Inc.
1.7K views43 slides

Viewers also liked(10)

Apache sqoop with an use case by Davin Abraham
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use case
Davin Abraham7.7K views
New Data Transfer Tools for Hadoop: Sqoop 2 by DataWorks Summit
New Data Transfer Tools for Hadoop: Sqoop 2New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2
DataWorks Summit18.9K views
The Transporter 4 MySQL by Sagar Nikam
The Transporter 4 MySQLThe Transporter 4 MySQL
The Transporter 4 MySQL
Sagar Nikam608 views
Apache Sqoop: A Data Transfer Tool for Hadoop by Cloudera, Inc.
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for Hadoop
Cloudera, Inc.19.9K views
Hadoop and rdbms with sqoop by Guy Harrison
Hadoop and rdbms with sqoop Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop
Guy Harrison7.8K views
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2 by Cloudera, Inc.
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Cloudera, Inc.1.7K views
Big data components - Introduction to Flume, Pig and Sqoop by Jeyamariappan Guru
Big data components - Introduction to Flume, Pig and SqoopBig data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and Sqoop
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop by Someshwar Kale
Learning Apache HIVE - Data Warehouse and Query Language for HadoopLearning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
Someshwar Kale1.9K views
Sqoop on Spark for Data Ingestion by DataWorks Summit
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
DataWorks Summit16.3K views

Similar to Habits of Effective Sqoop Users

Getting started with Riak in the Cloud by
Getting started with Riak in the CloudGetting started with Riak in the Cloud
Getting started with Riak in the CloudInes Sombra
1.1K views55 slides
Node object and roles - Fundamentals Webinar Series Part 3 by
Node object and roles - Fundamentals Webinar Series Part 3Node object and roles - Fundamentals Webinar Series Part 3
Node object and roles - Fundamentals Webinar Series Part 3Chef
21.5K views65 slides
Building an Impenetrable ZooKeeper - Kathleen Ting by
Building an Impenetrable ZooKeeper - Kathleen TingBuilding an Impenetrable ZooKeeper - Kathleen Ting
Building an Impenetrable ZooKeeper - Kathleen Tingjaxconf
726 views27 slides
Writing Better Haskell by
Writing Better HaskellWriting Better Haskell
Writing Better Haskellnkpart
544 views26 slides
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC by
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACKristofferson A
10.5K views45 slides
Capacity Management/Provisioning (Cloud's full, Can't build here) by
Capacity Management/Provisioning (Cloud's full, Can't build here)Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)andyhky
574 views17 slides

Similar to Habits of Effective Sqoop Users(20)

Getting started with Riak in the Cloud by Ines Sombra
Getting started with Riak in the CloudGetting started with Riak in the Cloud
Getting started with Riak in the Cloud
Ines Sombra1.1K views
Node object and roles - Fundamentals Webinar Series Part 3 by Chef
Node object and roles - Fundamentals Webinar Series Part 3Node object and roles - Fundamentals Webinar Series Part 3
Node object and roles - Fundamentals Webinar Series Part 3
Chef21.5K views
Building an Impenetrable ZooKeeper - Kathleen Ting by jaxconf
Building an Impenetrable ZooKeeper - Kathleen TingBuilding an Impenetrable ZooKeeper - Kathleen Ting
Building an Impenetrable ZooKeeper - Kathleen Ting
jaxconf726 views
Writing Better Haskell by nkpart
Writing Better HaskellWriting Better Haskell
Writing Better Haskell
nkpart544 views
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC by Kristofferson A
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Kristofferson A10.5K views
Capacity Management/Provisioning (Cloud's full, Can't build here) by andyhky
Capacity Management/Provisioning (Cloud's full, Can't build here)Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)
andyhky574 views
Building Out Your Kafka Developer CDC Ecosystem by confluent
Building Out Your Kafka Developer CDC  EcosystemBuilding Out Your Kafka Developer CDC  Ecosystem
Building Out Your Kafka Developer CDC Ecosystem
confluent373 views
BTV PHP - Building Fast Websites by Jonathan Klein
BTV PHP - Building Fast WebsitesBTV PHP - Building Fast Websites
BTV PHP - Building Fast Websites
Jonathan Klein1.6K views
Workflow Engines for Hadoop by Joe Crobak
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
Joe Crobak38.4K views
Host Health Monitoring with Docker Run by Noah Zoschke
Host Health Monitoring with Docker RunHost Health Monitoring with Docker Run
Host Health Monitoring with Docker Run
Noah Zoschke3K views
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194... by Nilesh Panchal
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...
Nilesh Panchal5K views
DrupalSouth 2015 - Performance: Not an Afterthought by Nick Santamaria
DrupalSouth 2015 - Performance: Not an AfterthoughtDrupalSouth 2015 - Performance: Not an Afterthought
DrupalSouth 2015 - Performance: Not an Afterthought
Nick Santamaria377 views
Kubernetes Walk Through from Technical View by Lei (Harry) Zhang
Kubernetes Walk Through from Technical ViewKubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical View
Lei (Harry) Zhang1.3K views
Breaking the Monolith - Microservice Extraction at SoundCloud by Jan Kischkel
Breaking the Monolith - Microservice Extraction at SoundCloudBreaking the Monolith - Microservice Extraction at SoundCloud
Breaking the Monolith - Microservice Extraction at SoundCloud
Jan Kischkel965 views
TryStack: A Sandbox for OpenStack Users and Admins by Anne Gentle
TryStack: A Sandbox for OpenStack Users and AdminsTryStack: A Sandbox for OpenStack Users and Admins
TryStack: A Sandbox for OpenStack Users and Admins
Anne Gentle1.9K views
Troubleshooting Hadoop: Distributed Debugging by Great Wide Open
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
Great Wide Open1.1K views
MariaDB: in-depth (hands on training in Seoul) by Colin Charles
MariaDB: in-depth (hands on training in Seoul)MariaDB: in-depth (hands on training in Seoul)
MariaDB: in-depth (hands on training in Seoul)
Colin Charles9.2K views

Recently uploaded

Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit... by
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...ShapeBlue
159 views25 slides
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O... by
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...ShapeBlue
132 views13 slides
The Power of Heat Decarbonisation Plans in the Built Environment by
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built EnvironmentIES VE
79 views20 slides
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... by
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc
170 views29 slides
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue by
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueShapeBlue
263 views23 slides
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online by
KVM Security Groups Under the Hood - Wido den Hollander - Your.OnlineKVM Security Groups Under the Hood - Wido den Hollander - Your.Online
KVM Security Groups Under the Hood - Wido den Hollander - Your.OnlineShapeBlue
221 views19 slides

Recently uploaded(20)

Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit... by ShapeBlue
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
ShapeBlue159 views
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O... by ShapeBlue
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
ShapeBlue132 views
The Power of Heat Decarbonisation Plans in the Built Environment by IES VE
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built Environment
IES VE79 views
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... by TrustArc
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc170 views
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue by ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
ShapeBlue263 views
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online by ShapeBlue
KVM Security Groups Under the Hood - Wido den Hollander - Your.OnlineKVM Security Groups Under the Hood - Wido den Hollander - Your.Online
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online
ShapeBlue221 views
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha... by ShapeBlue
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
ShapeBlue180 views
Confidence in CloudStack - Aron Wagner, Nathan Gleason - Americ by ShapeBlue
Confidence in CloudStack - Aron Wagner, Nathan Gleason - AmericConfidence in CloudStack - Aron Wagner, Nathan Gleason - Americ
Confidence in CloudStack - Aron Wagner, Nathan Gleason - Americ
ShapeBlue130 views
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R... by ShapeBlue
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
ShapeBlue173 views
State of the Union - Rohit Yadav - Apache CloudStack by ShapeBlue
State of the Union - Rohit Yadav - Apache CloudStackState of the Union - Rohit Yadav - Apache CloudStack
State of the Union - Rohit Yadav - Apache CloudStack
ShapeBlue297 views
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue by ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlueCloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
ShapeBlue138 views
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... by James Anderson
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
James Anderson160 views
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ... by ShapeBlue
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...
ShapeBlue186 views
"Surviving highload with Node.js", Andrii Shumada by Fwdays
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada
Fwdays56 views
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ... by ShapeBlue
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
ShapeBlue184 views

Habits of Effective Sqoop Users

  • 1. Habits of Effective Sqoop Users Kate Ting, Customer Operations Engineer kate@cloudera.com
  • 2. Halp! Sqoop doesn't work! Now what? 2
  • 3. Agenda •  First Things First •  Common Problems •  MySQL –  Connection Failure –  Importing into Hive •  Oracle –  Case-Sensitive Catalog Query Errors –  Sqoop Export Failing •  Effective Sqoop Habits 3
  • 4. Agenda •  First Things First •  Common Problems •  MySQL –  Connection Failure –  Importing into Hive •  Oracle –  Case-Sensitive Catalog Query Errors –  Sqoop Export Failing •  Effective Sqoop Habits 4
  • 5. First Things First Save time by providing this upfront: •  Versions: Sqoop, Hadoop, OS, JDBC •  Run with --verbose flag then attach log •  Sqoop command including options-file •  Expected output vs. actual output •  Table definition •  Input data set that triggers problem •  Hadoop task logs •  Check permissions on input files •  Divide and conquer –  e.g. Import that creates and populates a Hive table is failing •  First, do the import alone •  Second, create a Hive table without the import using the create-hive- table tool 5
  • 7. Agenda •  First Things First •  Common Problems •  MySQL –  Connection Failure –  Importing into Hive •  Oracle –  Case-Sensitive Catalog Query Errors –  Sqoop Export Failing •  Effective Sqoop Habits 7
  • 8. MySQL: Connection Failure java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4 .CommunicationsException: Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:164 ) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:606) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.ja va:1127) at org.apache.hadoop.mapred.Child.main(Child.java:264) 8
  • 9. MySQL: Connection Failure •  Problem: Communications Link Failure caused by incorrect permissions. •  Solution: –  Verify that you can connect to the database from the node where you are running Sqoop: •  $ mysql --host=<IP Address> --database=test --user=<username> -- password=<password> –  Add the network port for the server to your my.cnf file –  Set up a user account to connect via Sqoop. Grant permissions to the user to access the database over the network: •  Log into MySQL as root mysql -u root -p<ThisIsMyPassword> •  Issue the following command: mysql> grant all privileges on test.* to 'testuser'@'%' identified by 'testpassword' 9
  • 10. MySQL: Importing into Hive •  Troubleshooting tips: –  Look at /tmp/${user}/hive.log •  Identifies exceptions during the load –  Look at /user/hive/warehouse •  View contents of the imported data 10
  • 11. Agenda •  First Things First •  Common Problems •  MySQL –  Connection Failure –  Importing into Hive •  Oracle –  Case-Sensitive Catalog Query Errors –  Sqoop Export Failing •  Effective Sqoop Habits 11
  • 12. Oracle: Case-Sensitive Catalog Query Errors INFO manager.OracleManager: Time zone has been set to GMT DEBUG manager.SqlManager: Using fetchSize for next query: 1000 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM addlabel_pris t WHERE 1=0 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin: ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException java.lang.NullPointerException at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:148) at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:187) at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362) at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423) at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180) at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219) at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228) at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237) 12
  • 13. Oracle: Case-Sensitive Catalog Query Errors •  Problem: NPE caused by using the wrong case for the user name and table name. •  Solution: Always specify the user and table names in upper case (unless it was created with mixed/lower case within quotes). 13
  • 14. Oracle: Sqoop Export Failing INFO mapred.JobClient: Running job: job_201109231340_0785 INFO mapred.JobClient: map 0% reduce 0% INFO mapred.JobClient: Task Id : attempt_201109231340_0785_m_000000_0, Status : FAILED java.lang.NullPointerException at com.cloudera.sqoop.mapreduce.db.DataDrivenDBRecordReader.getSe lectQuery(DataDrivenDBRecordReader.java:87) at com.cloudera.sqoop.mapreduce.db.DBRecordReader.nextKeyValue (DBRecordReader.java:225) at org.apache.hadoop.mapred.MapTask $NewTrackingRecordReader.nextKeyValue(MapTask.java:455) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue (MapContext.java:67) 14
  • 15. Oracle: Sqoop Export Failing •  Problem: IllegalArgumentException caused by not non-owner trying to connect to the table. •  Solution: Prefix the table name with the schema, for example SchemaName.OracleTableName. 15
  • 16. Agenda •  First Things First •  Common Problems •  MySQL –  Connection Failure –  Importing into Hive •  Oracle –  Case-Sensitive Catalog Query Errors –  Sqoop Export Failing •  Effective Sqoop Habits 16
  • 17. Effective Sqoop Habits •  Do create an empty export table. •  Don’t use the same table for both import and export. 17
  • 18. Effective Sqoop Habits •  Do use --escaped-by option during import and --input-escaped-by during export. •  Do use fields-terminated-by during import and input-fields-terminated-by during export. •  Don’t reverse them. 18
  • 19. Effective Sqoop Habits •  Do specify the direct mode option (-- direct), if you use the direct connector. •  Don’t specify the query, if you use the direct connector. 19
  • 20. How Do You Eat an Elephant? •  One bite at a time –  Versions –  Verbose flag –  Console log –  Exact command, etc •  Sqoop Troubleshooting Guide –  http://archive.cloudera.com/cdh/3/sqoop/ SqoopUserGuide.html#_troubleshooting 20

Editor's Notes

  1. Sqoopdoes not guarantee intuitive error messages. But I guarantee that in the next ten minutes you will either learn or be reminded of a fewtips to make your next debugging session more effective.
  2. Specifying the query bypasses the direct connector