SlideShare a Scribd company logo
1 of 29
Download to read offline
Apache Sqoop:
                        Highlights of Sqoop 2


   Kathleen Ting (kathleen at cloudera dot com)
   Alexander Alten-Lorenz (alexander at cloudera dot com)

                             May 2012



Wednesday, May 23, 12
What is Sqoop?


     • Bulk data transfer tool
           o   Import/Export from relational database, enterprise
               data warehouse, NoSQL systems
           o   Populate tables in Hive, HBase
           o   Schedule Oozie automated import/export tasks
           o   Support plugins via Connector based architecture




Wednesday, May 23, 12
Sqoop 1 Architecture




Wednesday, May 23, 12
Sqoop 1 Challenges


      • Cryptic, contextual command line arguments
      • Tight coupling between data transfer and
        serialization format
      • Security concerns with openly shared
        credentials
      • Not easy to manage config/install
      • Not easy to monitor map job
      • Connectors are forced to follow JDBC model




Wednesday, May 23, 12
Sqoop 2 Architecture




Wednesday, May 23, 12
Agenda
     • Ease of Use
        o Sqoop 1: Client-side Tool
        o Sqoop 2: Sqoop as a Service
        o Client Interface
        o Sqoop 1: Service Level Integration
        o Sqoop 2: Service Level Integration
     • Ease of Extension
        o Sqoop 1: Implementing Connectors
        o Sqoop 2: Implementing Connectors
        o Sqoop 1: Using Connectors
        o Sqoop 2: Using Connectors
     • Security
        o Sqoop 1: Security
        o Sqoop 2: Security
        o Sqoop 1: Accessing External Systems
        o Sqoop 2: Accessing External Systems
        o Sqoop 1: Resource Management
        o Sqoop 2: Resource Management

Wednesday, May 23, 12
Sqoop 1: Client-side Tool



       • Sqoop 1 is a client-side tool
             o   Client-side installation + configuration
                    Connectors locally installed
                    Local configuration, requiring root privileges
                    JDBC drivers needed locally
                    Database connectivity needed locally




Wednesday, May 23, 12
Sqoop 2: Sqoop as a Service

      • Server-side installation + configuration
                  Connectors configured in one place, managed by Admin/
                   run by Operator
                  JDBC drivers in one place
                  Database connectivity needed on the server




Wednesday, May 23, 12
Client Interface

      • Sqoop 1 client interface:
           o    Command-Line Interface (CLI) based, thus scriptable



      • Sqoop 2 client interface:
            o   CLI based, thus scriptable
            o   Web based, thus accessible
            o   REST API exposed for external tool integration




Wednesday, May 23, 12
Sqoop 1: Service Level Integration

      • Hive, HBase
           o   Requires local installation


      • Oozie
           o   von Neumann(esque) integration:
                  Packaged Sqoop as an action
                  Then ran Sqoop from node machines, causing one MR job
                   to be dependent on another MR job
                  Error-prone, difficult to debug




Wednesday, May 23, 12
Sqoop 2: Service Level Integration

      • Hive, HBase
           o   Server-side integration

      • Oozie
           o   REST API integration




Wednesday, May 23, 12
Ease of Use (summary)



     Sqoop 1                           Sqoop 2
     Client-side install               Server-side install
     CLI based                         CLI + Web based
     Client access to Hive, HBase      Server access to Hive, HBase
     Oozie and Sqoop tightly coupled   Oozie finds REST API




Wednesday, May 23, 12
Agenda
      • Ease of Use
         o Sqoop 1: Client-side Tool
         o Sqoop 2: Sqoop as a Service
         o Client Interface
         o Sqoop 1: Service Level Integration
         o Sqoop 2: Service Level Integration
      • Ease of Extension
         o Sqoop 1: Implementing Connectors
         o Sqoop 2: Implementing Connectors
         o Sqoop 1: Using Connectors
         o Sqoop 2: Using Connectors
      • Security
         o Sqoop 1: Security
         o Sqoop 2: Security
         o Sqoop 1: Accessing External Systems
         o Sqoop 2: Accessing External Systems
         o Sqoop 1: Resource Management
         o Sqoop 2: Resource Management
Wednesday, May 23, 12
Sqoop 1: Implementing Connectors


        • Connectors forced to follow JDBC model
              o   Connectors limited/required to use common JDBC
                  vocabulary (URL, database, table, etc)

        • Connectors must implement all Sqoop
          functionality that they want to support
              o   New functionality not avail for old connectors




Wednesday, May 23, 12
Sqoop 2: Implementing Connectors

      • Connectors are not restricted to JDBC model
           o   Connectors can define own vocabulary

      • Common functionality abstracted out of
        connectors
           o   Connectors only responsible for data transport
           o   Common Reduce phase implements functionality
           o   Ensures that connectors benefit from future dev of
               functionality




Wednesday, May 23, 12
Different Options, Different Results

  Which is running MySQL?

  $ sqoop import --connect jdbc:mysql://localhost/db 
  --username foo --table TEST

  $ sqoop import --connect jdbc:mysql://localhost/db 
  --driver com.mysql.jdbc.Driver --username foo --table
  TEST

        – Different options can lead to unpredictable results

        – Sqoop 2 requires explicit selection of connector thus
          disambiguating the process




Wednesday, May 23, 12
Sqoop 1: Using Connectors
      • Choice of connector is implicit
           o   In a simple case, based on the URL in the --connect
               string used to access the database
           o   Specification of different options can lead to different
               connector selection
           o   Error-prone but good for power users
      • Requires knowledge of database idiosyncrasies
           o   e.g. Couchbase doesn’t need to specify a table
               name, which is required causing --table to get
               overloaded as backfill or dump operation
           o   e.g. --null-string representation not supported by all
               connectors
      • Functionality limited to what the implicitly chosen
        connector supports
Wednesday, May 23, 12
Sqoop 2: Using Connectors

      • User makes explicit connector choice
           o   Less error-prone, more predictable
      • User need not be aware of the functionality of all
        connectors
           o   Couchbase users need not care that other
               connectors use tables
      • Common functionality available to all connectors
           o   Connectors need not worry about downstream
               functionality, transformations, integration with other
               systems




Wednesday, May 23, 12
Ease of Extension (summary)



     Sqoop 1                                   Sqoop 2

     Connector forced to follow JDBC model     Connector given free rein
     Connectors must implement functionality   Connectors benefit from common framework
                                               of functionality
     Connector selection is implicit           Connector selection is explicit




Wednesday, May 23, 12
Agenda
      • Ease of Use
         o Sqoop 1: Client-side Tool
         o Sqoop 2: Sqoop as a Service
         o Client Interface
         o Sqoop 1: Service Level Integration
         o Sqoop 2: Service Level Integration
      • Ease of Extension
         o Sqoop 1: Implementing Connectors
         o Sqoop 2: Implementing Connectors
         o Sqoop 1: Using Connectors
         o Sqoop 2: Using Connectors
      • Security
         o Sqoop 1: Security
         o Sqoop 2: Security
         o Sqoop 1: Accessing External Systems
         o Sqoop 2: Accessing External Systems
         o Sqoop 1: Resource Management
         o Sqoop 2: Resource Management
Wednesday, May 23, 12
Sqoop 1: Security

      • Inherits/propagates Kerberos principal for the
        jobs it launches

      • Access to files on HDFS can be controlled via
        HDFS security

      • Sqoop operates as command line Hadoop client

      • No support for securing access to external
        systems
           o   E.g. relational database


Wednesday, May 23, 12
Sqoop 2: Security
     • Inherits/propagates Kerberos principal for the
       jobs it launches

     • Access to files on HDFS can be controlled via
       HDFS security

     • Sqoop operates as server based application

     • Support for securing access to external systems
       via role-based access to Connection objects
           o   Admins create/edit/delete Connections
           o   Operators use Connections

     • Audit trail logging
Wednesday, May 23, 12
Sqoop 1: Accessing External Systems

      • Every invocation requires necessary credentials
        to access external systems (e.g. relational
        database)
           o   Workaround: Admin creates a limited access user in
               lieu of giving out password
                  Doesn't scale
                  Permission granularity is hard to obtain

      • Hard to prevent misuse once credentials are
        given




Wednesday, May 23, 12
Sqoop 2: Accessing External Systems

      • Sqoop 2 introduces Connections as First-Class
        Objects
           o   Connection encompass credentials
           o   Connections created once, then used many times for
               various import/export Jobs
           o   Connections created by Admin, used by Operator
                  Safeguard credential access from end user

      • Restrict scope: connections can be restricted
        based on operation (import/export)
           o   Operators cannot abuse credentials



Wednesday, May 23, 12
Sqoop 1: Resource Management

      • No explicit resource management policy
           o   User specifies number of map jobs to run
           o   Can’t throttle load on external systems
           o   Can overwhelmed a cluster
           o   Can acquire the whole stack memory on complex
               queries




Wednesday, May 23, 12
Sqoop 2: Resource Management

      • Connections allow specification of resource
        policy
           o   Admin can limit the total number of physical
               Connections open at one time
           o   Connections can be disabled




Wednesday, May 23, 12
Security (summary)



      Sqoop 1                                   Sqoop 2

      Support only for Hadoop security          Support for Hadoop security and role-based
                                                access control to external systems
      High risk of abusing access to external   Reduced risk of abusing access to external
      systems                                   systems
      No resource management policy             Resource management policy




Wednesday, May 23, 12
Takeaway

    Sqoop 2 Highights:
      • Ease of Use: Sqoop as a Service

      • Ease of Extension: Connectors benefit from shared
        functionality

      • Security: Connections as First-Class objects, Role-
        based Security




Wednesday, May 23, 12
Current Status: work-in-progress

      • Sqoop 2 Development: https://
        issues.apache.org/jira/browse/SQOOP-365

      • Sqoop 2 Design: https://cwiki.apache.org/
        confluence/display/SQOOP/Sqoop+2




Wednesday, May 23, 12

More Related Content

What's hot

Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...confluent
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaJoe Stein
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...DataWorks Summit
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupGwen (Chen) Shapira
 
Cross-Site BigTable using HBase
Cross-Site BigTable using HBaseCross-Site BigTable using HBase
Cross-Site BigTable using HBaseHBaseCon
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...Yahoo Developer Network
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEkawamuray
 
Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overviewconfluent
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Knoldus Inc.
 
LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...kawamuray
 
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEkawamuray
 
Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Kafka connect 101
Kafka connect 101Kafka connect 101
Kafka connect 101Whiteklay
 
Visualizing Kafka Security
Visualizing Kafka SecurityVisualizing Kafka Security
Visualizing Kafka SecurityDataWorks Summit
 
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseCloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseMingliang Liu
 
Schema registry
Schema registrySchema registry
Schema registryWhiteklay
 

What's hot (20)

YARN and the Docker container runtime
YARN and the Docker container runtimeYARN and the Docker container runtime
YARN and the Docker container runtime
 
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data Meetup
 
Cross-Site BigTable using HBase
Cross-Site BigTable using HBaseCross-Site BigTable using HBase
Cross-Site BigTable using HBase
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINE
 
Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overview
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1
 
Apache phoenix
Apache phoenixApache phoenix
Apache phoenix
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
 
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Kafka connect 101
Kafka connect 101Kafka connect 101
Kafka connect 101
 
Visualizing Kafka Security
Visualizing Kafka SecurityVisualizing Kafka Security
Visualizing Kafka Security
 
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseCloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
 
Schema registry
Schema registrySchema registry
Schema registry
 

Viewers also liked

Sqoop2 refactoring for generic data transfer - Hadoop Strata Sqoop Meetup
Sqoop2 refactoring for generic data transfer - Hadoop Strata Sqoop MeetupSqoop2 refactoring for generic data transfer - Hadoop Strata Sqoop Meetup
Sqoop2 refactoring for generic data transfer - Hadoop Strata Sqoop Meetupaaamase
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use caseDavin Abraham
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopCloudera, Inc.
 
Introduction to Apache Sqoop
Introduction to Apache SqoopIntroduction to Apache Sqoop
Introduction to Apache SqoopAvkash Chauhan
 
From oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other toolsFrom oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other toolsGuy Harrison
 

Viewers also liked (6)

Sqoop2 refactoring for generic data transfer - Hadoop Strata Sqoop Meetup
Sqoop2 refactoring for generic data transfer - Hadoop Strata Sqoop MeetupSqoop2 refactoring for generic data transfer - Hadoop Strata Sqoop Meetup
Sqoop2 refactoring for generic data transfer - Hadoop Strata Sqoop Meetup
 
Introduction to sqoop
Introduction to sqoopIntroduction to sqoop
Introduction to sqoop
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use case
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for Hadoop
 
Introduction to Apache Sqoop
Introduction to Apache SqoopIntroduction to Apache Sqoop
Introduction to Apache Sqoop
 
From oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other toolsFrom oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other tools
 

Similar to Highlights Of Sqoop2

May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsYahoo Developer Network
 
SQOOP AND IOTS ARCHITECTURE AND ITS APPLICATION.ppt
SQOOP AND IOTS ARCHITECTURE AND ITS APPLICATION.pptSQOOP AND IOTS ARCHITECTURE AND ITS APPLICATION.ppt
SQOOP AND IOTS ARCHITECTURE AND ITS APPLICATION.pptAjajKhan23
 
Navigating OpenStack Networking
Navigating OpenStack NetworkingNavigating OpenStack Networking
Navigating OpenStack NetworkingPLUMgrid
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectKaufman Ng
 
OpenStack Networking and Automation
OpenStack Networking and AutomationOpenStack Networking and Automation
OpenStack Networking and AutomationAdam Johnson
 
An Evaluation of OpenStack Deployment Frameworks
An Evaluation of OpenStack Deployment FrameworksAn Evaluation of OpenStack Deployment Frameworks
An Evaluation of OpenStack Deployment Frameworksshane_gibson
 
NaaS in OpenStack - CloudCamp Moscow
NaaS in OpenStack - CloudCamp MoscowNaaS in OpenStack - CloudCamp Moscow
NaaS in OpenStack - CloudCamp MoscowIlya Alekseyev
 
Optimising nfv service chains on open stack using docker
Optimising nfv service chains on open stack using dockerOptimising nfv service chains on open stack using docker
Optimising nfv service chains on open stack using dockerRahul Krishna Upadhyaya
 
Scientific Computing - Hardware
Scientific Computing - HardwareScientific Computing - Hardware
Scientific Computing - Hardwarejalle6
 
Latest Advance Animated Ado.Net With JDBC
Latest Advance Animated Ado.Net With JDBC Latest Advance Animated Ado.Net With JDBC
Latest Advance Animated Ado.Net With JDBC Tarun Jain
 
Optimising nfv service chains on open stack using docker
Optimising nfv service chains on open stack using dockerOptimising nfv service chains on open stack using docker
Optimising nfv service chains on open stack using dockerAnanth Padmanabhan
 
Optimising nfv service chains on open stack using docker
Optimising nfv service chains on open stack using dockerOptimising nfv service chains on open stack using docker
Optimising nfv service chains on open stack using dockerSatya Sanjibani Routray
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Lucidworks
 
Leveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersLeveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersRan Ziv
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitinbloomreacheng
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Nitin S
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationNitin Sharma
 
MesosCon EU 2017 - Criteo - Operating Mesos-based Infrastructures
MesosCon EU 2017 - Criteo - Operating Mesos-based InfrastructuresMesosCon EU 2017 - Criteo - Operating Mesos-based Infrastructures
MesosCon EU 2017 - Criteo - Operating Mesos-based Infrastructurespierrecdn -
 
ONUG Tutorial: Bridges and Tunnels Drive Through OpenStack Networking
ONUG Tutorial: Bridges and Tunnels Drive Through OpenStack NetworkingONUG Tutorial: Bridges and Tunnels Drive Through OpenStack Networking
ONUG Tutorial: Bridges and Tunnels Drive Through OpenStack Networkingmarkmcclain
 

Similar to Highlights Of Sqoop2 (20)

May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
 
SQOOP AND IOTS ARCHITECTURE AND ITS APPLICATION.ppt
SQOOP AND IOTS ARCHITECTURE AND ITS APPLICATION.pptSQOOP AND IOTS ARCHITECTURE AND ITS APPLICATION.ppt
SQOOP AND IOTS ARCHITECTURE AND ITS APPLICATION.ppt
 
Navigating OpenStack Networking
Navigating OpenStack NetworkingNavigating OpenStack Networking
Navigating OpenStack Networking
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
 
OpenStack Networking and Automation
OpenStack Networking and AutomationOpenStack Networking and Automation
OpenStack Networking and Automation
 
An Evaluation of OpenStack Deployment Frameworks
An Evaluation of OpenStack Deployment FrameworksAn Evaluation of OpenStack Deployment Frameworks
An Evaluation of OpenStack Deployment Frameworks
 
NaaS in OpenStack - CloudCamp Moscow
NaaS in OpenStack - CloudCamp MoscowNaaS in OpenStack - CloudCamp Moscow
NaaS in OpenStack - CloudCamp Moscow
 
Optimising nfv service chains on open stack using docker
Optimising nfv service chains on open stack using dockerOptimising nfv service chains on open stack using docker
Optimising nfv service chains on open stack using docker
 
Scientific Computing - Hardware
Scientific Computing - HardwareScientific Computing - Hardware
Scientific Computing - Hardware
 
Latest Advance Animated Ado.Net With JDBC
Latest Advance Animated Ado.Net With JDBC Latest Advance Animated Ado.Net With JDBC
Latest Advance Animated Ado.Net With JDBC
 
Optimising nfv service chains on open stack using docker
Optimising nfv service chains on open stack using dockerOptimising nfv service chains on open stack using docker
Optimising nfv service chains on open stack using docker
 
Optimising nfv service chains on open stack using docker
Optimising nfv service chains on open stack using dockerOptimising nfv service chains on open stack using docker
Optimising nfv service chains on open stack using docker
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
 
Leveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersLeveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive Clusters
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin Presentation
 
MesosCon EU 2017 - Criteo - Operating Mesos-based Infrastructures
MesosCon EU 2017 - Criteo - Operating Mesos-based InfrastructuresMesosCon EU 2017 - Criteo - Operating Mesos-based Infrastructures
MesosCon EU 2017 - Criteo - Operating Mesos-based Infrastructures
 
ONUG Tutorial: Bridges and Tunnels Drive Through OpenStack Networking
ONUG Tutorial: Bridges and Tunnels Drive Through OpenStack NetworkingONUG Tutorial: Bridges and Tunnels Drive Through OpenStack Networking
ONUG Tutorial: Bridges and Tunnels Drive Through OpenStack Networking
 
Introduction to Software Defined Networking (SDN)
Introduction to Software Defined Networking (SDN)Introduction to Software Defined Networking (SDN)
Introduction to Software Defined Networking (SDN)
 

More from Alexander Alten-Lorenz (12)

Is big data dead?
Is big data dead?Is big data dead?
Is big data dead?
 
Creating a value chain with IoT
Creating a value chain with IoTCreating a value chain with IoT
Creating a value chain with IoT
 
Big Data in an modern Enterprise
Big Data in an modern EnterpriseBig Data in an modern Enterprise
Big Data in an modern Enterprise
 
The Future of Energy
The Future of EnergyThe Future of Energy
The Future of Energy
 
Beyond Hadoop and MapReduce
Beyond Hadoop and MapReduceBeyond Hadoop and MapReduce
Beyond Hadoop and MapReduce
 
Sentry - An Introduction
Sentry - An Introduction Sentry - An Introduction
Sentry - An Introduction
 
Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013
 
Bi with apache hadoop(en)
Bi with apache hadoop(en)Bi with apache hadoop(en)
Bi with apache hadoop(en)
 
BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)
 
Apache Flume (NG)
Apache Flume (NG)Apache Flume (NG)
Apache Flume (NG)
 
Filesystems, RPC and HDFS
Filesystems, RPC and HDFSFilesystems, RPC and HDFS
Filesystems, RPC and HDFS
 
Big Data mit Apache Hadoop
Big Data mit Apache HadoopBig Data mit Apache Hadoop
Big Data mit Apache Hadoop
 

Recently uploaded

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Recently uploaded (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Highlights Of Sqoop2

  • 1. Apache Sqoop: Highlights of Sqoop 2 Kathleen Ting (kathleen at cloudera dot com) Alexander Alten-Lorenz (alexander at cloudera dot com) May 2012 Wednesday, May 23, 12
  • 2. What is Sqoop? • Bulk data transfer tool o Import/Export from relational database, enterprise data warehouse, NoSQL systems o Populate tables in Hive, HBase o Schedule Oozie automated import/export tasks o Support plugins via Connector based architecture Wednesday, May 23, 12
  • 4. Sqoop 1 Challenges • Cryptic, contextual command line arguments • Tight coupling between data transfer and serialization format • Security concerns with openly shared credentials • Not easy to manage config/install • Not easy to monitor map job • Connectors are forced to follow JDBC model Wednesday, May 23, 12
  • 6. Agenda • Ease of Use o Sqoop 1: Client-side Tool o Sqoop 2: Sqoop as a Service o Client Interface o Sqoop 1: Service Level Integration o Sqoop 2: Service Level Integration • Ease of Extension o Sqoop 1: Implementing Connectors o Sqoop 2: Implementing Connectors o Sqoop 1: Using Connectors o Sqoop 2: Using Connectors • Security o Sqoop 1: Security o Sqoop 2: Security o Sqoop 1: Accessing External Systems o Sqoop 2: Accessing External Systems o Sqoop 1: Resource Management o Sqoop 2: Resource Management Wednesday, May 23, 12
  • 7. Sqoop 1: Client-side Tool • Sqoop 1 is a client-side tool o Client-side installation + configuration  Connectors locally installed  Local configuration, requiring root privileges  JDBC drivers needed locally  Database connectivity needed locally Wednesday, May 23, 12
  • 8. Sqoop 2: Sqoop as a Service • Server-side installation + configuration  Connectors configured in one place, managed by Admin/ run by Operator  JDBC drivers in one place  Database connectivity needed on the server Wednesday, May 23, 12
  • 9. Client Interface • Sqoop 1 client interface: o Command-Line Interface (CLI) based, thus scriptable • Sqoop 2 client interface: o CLI based, thus scriptable o Web based, thus accessible o REST API exposed for external tool integration Wednesday, May 23, 12
  • 10. Sqoop 1: Service Level Integration • Hive, HBase o Requires local installation • Oozie o von Neumann(esque) integration:  Packaged Sqoop as an action  Then ran Sqoop from node machines, causing one MR job to be dependent on another MR job  Error-prone, difficult to debug Wednesday, May 23, 12
  • 11. Sqoop 2: Service Level Integration • Hive, HBase o Server-side integration • Oozie o REST API integration Wednesday, May 23, 12
  • 12. Ease of Use (summary) Sqoop 1 Sqoop 2 Client-side install Server-side install CLI based CLI + Web based Client access to Hive, HBase Server access to Hive, HBase Oozie and Sqoop tightly coupled Oozie finds REST API Wednesday, May 23, 12
  • 13. Agenda • Ease of Use o Sqoop 1: Client-side Tool o Sqoop 2: Sqoop as a Service o Client Interface o Sqoop 1: Service Level Integration o Sqoop 2: Service Level Integration • Ease of Extension o Sqoop 1: Implementing Connectors o Sqoop 2: Implementing Connectors o Sqoop 1: Using Connectors o Sqoop 2: Using Connectors • Security o Sqoop 1: Security o Sqoop 2: Security o Sqoop 1: Accessing External Systems o Sqoop 2: Accessing External Systems o Sqoop 1: Resource Management o Sqoop 2: Resource Management Wednesday, May 23, 12
  • 14. Sqoop 1: Implementing Connectors • Connectors forced to follow JDBC model o Connectors limited/required to use common JDBC vocabulary (URL, database, table, etc) • Connectors must implement all Sqoop functionality that they want to support o New functionality not avail for old connectors Wednesday, May 23, 12
  • 15. Sqoop 2: Implementing Connectors • Connectors are not restricted to JDBC model o Connectors can define own vocabulary • Common functionality abstracted out of connectors o Connectors only responsible for data transport o Common Reduce phase implements functionality o Ensures that connectors benefit from future dev of functionality Wednesday, May 23, 12
  • 16. Different Options, Different Results Which is running MySQL? $ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST $ sqoop import --connect jdbc:mysql://localhost/db --driver com.mysql.jdbc.Driver --username foo --table TEST – Different options can lead to unpredictable results – Sqoop 2 requires explicit selection of connector thus disambiguating the process Wednesday, May 23, 12
  • 17. Sqoop 1: Using Connectors • Choice of connector is implicit o In a simple case, based on the URL in the --connect string used to access the database o Specification of different options can lead to different connector selection o Error-prone but good for power users • Requires knowledge of database idiosyncrasies o e.g. Couchbase doesn’t need to specify a table name, which is required causing --table to get overloaded as backfill or dump operation o e.g. --null-string representation not supported by all connectors • Functionality limited to what the implicitly chosen connector supports Wednesday, May 23, 12
  • 18. Sqoop 2: Using Connectors • User makes explicit connector choice o Less error-prone, more predictable • User need not be aware of the functionality of all connectors o Couchbase users need not care that other connectors use tables • Common functionality available to all connectors o Connectors need not worry about downstream functionality, transformations, integration with other systems Wednesday, May 23, 12
  • 19. Ease of Extension (summary) Sqoop 1 Sqoop 2 Connector forced to follow JDBC model Connector given free rein Connectors must implement functionality Connectors benefit from common framework of functionality Connector selection is implicit Connector selection is explicit Wednesday, May 23, 12
  • 20. Agenda • Ease of Use o Sqoop 1: Client-side Tool o Sqoop 2: Sqoop as a Service o Client Interface o Sqoop 1: Service Level Integration o Sqoop 2: Service Level Integration • Ease of Extension o Sqoop 1: Implementing Connectors o Sqoop 2: Implementing Connectors o Sqoop 1: Using Connectors o Sqoop 2: Using Connectors • Security o Sqoop 1: Security o Sqoop 2: Security o Sqoop 1: Accessing External Systems o Sqoop 2: Accessing External Systems o Sqoop 1: Resource Management o Sqoop 2: Resource Management Wednesday, May 23, 12
  • 21. Sqoop 1: Security • Inherits/propagates Kerberos principal for the jobs it launches • Access to files on HDFS can be controlled via HDFS security • Sqoop operates as command line Hadoop client • No support for securing access to external systems o E.g. relational database Wednesday, May 23, 12
  • 22. Sqoop 2: Security • Inherits/propagates Kerberos principal for the jobs it launches • Access to files on HDFS can be controlled via HDFS security • Sqoop operates as server based application • Support for securing access to external systems via role-based access to Connection objects o Admins create/edit/delete Connections o Operators use Connections • Audit trail logging Wednesday, May 23, 12
  • 23. Sqoop 1: Accessing External Systems • Every invocation requires necessary credentials to access external systems (e.g. relational database) o Workaround: Admin creates a limited access user in lieu of giving out password  Doesn't scale  Permission granularity is hard to obtain • Hard to prevent misuse once credentials are given Wednesday, May 23, 12
  • 24. Sqoop 2: Accessing External Systems • Sqoop 2 introduces Connections as First-Class Objects o Connection encompass credentials o Connections created once, then used many times for various import/export Jobs o Connections created by Admin, used by Operator  Safeguard credential access from end user • Restrict scope: connections can be restricted based on operation (import/export) o Operators cannot abuse credentials Wednesday, May 23, 12
  • 25. Sqoop 1: Resource Management • No explicit resource management policy o User specifies number of map jobs to run o Can’t throttle load on external systems o Can overwhelmed a cluster o Can acquire the whole stack memory on complex queries Wednesday, May 23, 12
  • 26. Sqoop 2: Resource Management • Connections allow specification of resource policy o Admin can limit the total number of physical Connections open at one time o Connections can be disabled Wednesday, May 23, 12
  • 27. Security (summary) Sqoop 1 Sqoop 2 Support only for Hadoop security Support for Hadoop security and role-based access control to external systems High risk of abusing access to external Reduced risk of abusing access to external systems systems No resource management policy Resource management policy Wednesday, May 23, 12
  • 28. Takeaway Sqoop 2 Highights: • Ease of Use: Sqoop as a Service • Ease of Extension: Connectors benefit from shared functionality • Security: Connections as First-Class objects, Role- based Security Wednesday, May 23, 12
  • 29. Current Status: work-in-progress • Sqoop 2 Development: https:// issues.apache.org/jira/browse/SQOOP-365 • Sqoop 2 Design: https://cwiki.apache.org/ confluence/display/SQOOP/Sqoop+2 Wednesday, May 23, 12