Habits of Effective Sqoop Users
Kate Ting, Customer Operations Engineer
kate@cloudera.com
Halp! Sqoop doesn't work!

    Now what?


2
Agenda

    •  First Things First
    •  Common Problems
    •  MySQL
      –  Connection Failure
      –  Importing into Hive
    •  Oracle
      –  Case-Sensitive Catalog Query Errors
      –  Sqoop Export Failing
    •  Effective Sqoop Habits

3
Agenda

    •  First Things First
    •  Common Problems
    •  MySQL
      –  Connection Failure
      –  Importing into Hive
    •  Oracle
      –  Case-Sensitive Catalog Query Errors
      –  Sqoop Export Failing
    •  Effective Sqoop Habits

4
First Things First
    Save time by providing this upfront:
    •  Versions: Sqoop, Hadoop, OS, JDBC
    •  Run with --verbose flag then attach log
    •  Sqoop command including options-file
    •  Expected output vs. actual output
    •  Table definition
    •  Input data set that triggers problem
    •  Hadoop task logs
    •  Check permissions on input files
    •  Divide and conquer
        –  e.g. An import that creates and populates a Hive table is failing
            •  First, do the import alone
            •  Second, create the Hive table without the import using the
               create-hive-table tool (see the sketch below)
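       A minimal sketch of that divide-and-conquer approach, assuming a MySQL
       source; the connect string, table, and credentials are placeholders,
       not values from the deck:

          # Step 1: run the HDFS import by itself
          $ sqoop import --connect jdbc:mysql://db.example.com/test \
              --username testuser --password testpassword \
              --table mytable --verbose

          # Step 2: create the Hive table separately with the create-hive-table tool
          $ sqoop create-hive-table --connect jdbc:mysql://db.example.com/test \
              --username testuser --password testpassword \
              --table mytable --hive-table mytable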



5
Common Problems


6
Agenda

    •  First Things First
    •  Common Problems
    •  MySQL
      –  Connection Failure
      –  Importing into Hive
    •  Oracle
      –  Case-Sensitive Catalog Query Errors
      –  Sqoop Export Failing
    •  Effective Sqoop Habits

7
MySQL: Connection Failure
    java.lang.RuntimeException: java.lang.RuntimeException:
    com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

    The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received
    any packets from the server.
    at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:164)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:606)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)




8
MySQL: Connection Failure
    •    Problem: Communications Link Failure caused by incorrect permissions.
    •    Solution:
          –  Verify that you can connect to the database from the node where you
             are running Sqoop:
                •  $ mysql --host=<IP Address> --database=test --user=<username>
                   --password=<password>
           –  Add the network port for the server to your my.cnf file
           –  Set up a user account to connect via Sqoop. Grant the user permission
              to access the database over the network (see the sketch below):
                •  Log into MySQL as root: mysql -u root -p<ThisIsMyPassword>
                •  Issue the following command: mysql> grant all privileges on test.* to
                   'testuser'@'%' identified by 'testpassword'
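     A minimal end-to-end sketch of those steps; the host, database, and
     credentials are placeholders rather than values from the deck:

          # 1. Confirm basic connectivity from the node running Sqoop
          $ mysql --host=db.example.com --database=test --user=testuser --password=testpassword

          # 2. As root, grant the Sqoop user network access to the database
          mysql> GRANT ALL PRIVILEGES ON test.* TO 'testuser'@'%' IDENTIFIED BY 'testpassword';
          mysql> FLUSH PRIVILEGES;

          # 3. Verify that Sqoop itself can reach the database
          $ sqoop list-tables --connect jdbc:mysql://db.example.com/test \
              --username testuser --password testpassword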




9
MySQL: Importing into Hive
 •  Troubleshooting tips:
     –  Look at /tmp/${user}/hive.log
        •  Identifies exceptions during the load
     –  Look at /user/hive/warehouse
        •  View contents of the imported data
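   A rough sketch of those checks, assuming the default log and warehouse
   locations and a hypothetical table named mytable:

      # Check the Hive client log for exceptions raised during the load
      $ tail -100 /tmp/${USER}/hive.log

      # List and inspect the imported files under the Hive warehouse
      $ hadoop fs -ls /user/hive/warehouse/mytable
      $ hadoop fs -cat /user/hive/warehouse/mytable/part-m-00000 | head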




10
Agenda

 •  First Things First
 •  Common Problems
 •  MySQL
     –  Connection Failure
     –  Importing into Hive
 •  Oracle
     –  Case-Sensitive Catalog Query Errors
     –  Sqoop Export Failing
 •  Effective Sqoop Habits

11
Oracle: Case-Sensitive Catalog Query Errors
 INFO manager.OracleManager: Time zone has been set to GMT
 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
 INFO manager.SqlManager: Executing SQL statement:
 SELECT t.* FROM addlabel_pris t WHERE 1=0
 DEBUG manager.OracleManager$ConnCache: Caching
 released connection for jdbc:oracle:thin:
 ERROR sqoop.Sqoop: Got exception running Sqoop:
 java.lang.NullPointerException
 java.lang.NullPointerException
 at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:148)
 at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:187)
 at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362)
 at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
 at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
 at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
 at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
 at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)




12
Oracle: Case-Sensitive Catalog Query Errors

 •  Problem: NPE caused by using the wrong
    case for the user name and table name.
 •  Solution: Always specify the user and table
    names in upper case (unless they were
    created with mixed/lower case within
    quotes), as in the sketch below.
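 A minimal import sketch, assuming an Oracle thin-driver connect string; the
 host, service name, user, and table names are placeholders:

    # User and table names given in upper case
    $ sqoop import --connect jdbc:oracle:thin:@//oracledb.example.com:1521/ORCL \
        --username SCOTT --password tiger \
        --table EMPLOYEES --verbose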




13
Oracle: Sqoop Export Failing
 INFO mapred.JobClient: Running job: job_201109231340_0785
 INFO mapred.JobClient: map 0% reduce 0%
 INFO mapred.JobClient: Task Id :
 attempt_201109231340_0785_m_000000_0, Status : FAILED
 java.lang.NullPointerException
 at com.cloudera.sqoop.mapreduce.db.DataDrivenDBRecordReader.getSelectQuery(DataDrivenDBRecordReader.java:87)
 at com.cloudera.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:225)
 at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455)
 at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)




14
Oracle: Sqoop Export Failing
 •  Problem: IllegalArgumentException
    caused by a non-owner trying to connect
    to the table.
 •  Solution: Prefix the table name with the
    schema, for example
    SchemaName.OracleTableName.
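 A hedged export sketch showing the schema-qualified table name; the connect
 string, schema, table, and HDFS directory are placeholders:

    $ sqoop export --connect jdbc:oracle:thin:@//oracledb.example.com:1521/ORCL \
        --username SCOTT --password tiger \
        --table HR.EMPLOYEES \
        --export-dir /user/scott/employees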




15
Agenda

 •  First Things First
 •  Common Problems
 •  MySQL
     –  Connection Failure
     –  Importing into Hive
 •  Oracle
     –  Case-Sensitive Catalog Query Errors
     –  Sqoop Export Failing
 •  Effective Sqoop Habits

16
Effective Sqoop Habits

 •  Do create an empty export table.

 •  Don’t use the same table for both import
    and export.
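 A minimal illustration of a dedicated, empty export table in MySQL; the
 table names are hypothetical:

    -- Keep the export target separate from the import source,
    -- and create it empty before running sqoop export
    mysql> CREATE TABLE export_target LIKE import_source;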




17
Effective Sqoop Habits

 •  Do use the --escaped-by option during import
    and --input-escaped-by during export.
 •  Do use --fields-terminated-by during import
    and --input-fields-terminated-by during
    export.

 •  Don’t reverse them (see the sketch below).
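 A sketch of matched import/export options; the delimiters shown (comma and
 backslash) and all names are placeholders:

    # Import: use the output-formatting forms
    $ sqoop import --connect jdbc:mysql://db.example.com/test \
        --username testuser --password testpassword --table mytable \
        --fields-terminated-by ',' --escaped-by '\\' \
        --target-dir /user/someuser/mytable

    # Export: use the matching input-parsing forms
    $ sqoop export --connect jdbc:mysql://db.example.com/test \
        --username testuser --password testpassword --table export_target \
        --input-fields-terminated-by ',' --input-escaped-by '\\' \
        --export-dir /user/someuser/mytable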



18
Effective Sqoop Habits

 •  Do specify the direct mode option
    (--direct) if you use the direct connector.

 •  Don’t specify the query if you use the
    direct connector.
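 A direct-mode sketch against MySQL; the connect string, table, and
 credentials are placeholders:

    # Direct mode: let the connector move the whole table; no --query here
    $ sqoop import --connect jdbc:mysql://db.example.com/test \
        --username testuser --password testpassword \
        --table mytable --direct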




19
How Do You Eat an Elephant?

 •  One bite at a time
     –  Versions
     –  Verbose flag
     –  Console log
     –  Exact command, etc.

 •  Sqoop Troubleshooting Guide
     –  http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_troubleshooting


20


Editor's Notes

  1. Sqoop does not guarantee intuitive error messages. But I guarantee that in the next ten minutes you will either learn or be reminded of a few tips to make your next debugging session more effective.
  2. Specifying the query bypasses the direct connector.