HLoader
Data Ingestion
from Oracle Databases
to Hadoop Clusters
Automatically
On-Demand
8/13/2015 HLoader – A. Bose, D. Stein 2
Problem
– Control and monitor data transfer
using Sqoop, a CLI tool for bulk data transfer
– Two in one
two distinct Summer Student task proposals for basically
the same job
Problem
– Frequent requests
different users with different but similar use cases
ATLAS Job Monitoring, CMS Job Monitoring, CMS data popularity, ACCLOG
– Manually executed job
that can be partially automated
Requirements
– Run jobs…
… incrementally
… communicate with
the end user
– Handle failures
retry, notify, prevent
– Be secure, stay safe
authorize, authenticate the users
without exchanging passwords
– Use what’s provided
Run on the CERN-provided
infrastructure
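The "retry, notify" part of the failure-handling requirement could be sketched roughly as follows. This is only an illustration and not HLoader's actual code: `run_with_retries`, the `notify` callback, and the default of three attempts are all assumptions made for the example.

```python
import time


def run_with_retries(task, notify, max_attempts=3, delay_seconds=0.0):
    """Call task() until it succeeds or max_attempts is reached.

    notify(message) is a hypothetical callback that informs the job owner.
    Returns True on success, False if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            task()
        except Exception as exc:
            # Report the failure, then wait before the next attempt (if any).
            notify("attempt %d/%d failed: %s" % (attempt, max_attempts, exc))
            if attempt < max_attempts:
                time.sleep(delay_seconds)
        else:
            notify("transfer succeeded on attempt %d" % attempt)
            return True
    return False
```

The caller decides what "notify" means (e-mail, a log row, a status field), which keeps the retry logic independent of the communication channel.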
Solution overview
[Architecture diagram: Client, Meta DB, REST API, Agent, Oracle Databases, FIM, Hadoop Clusters, …]
1. Provided infrastructure
Oracle Databases and Hadoop Clusters
2. Transfer data
the user wants to transfer data, so they create a new job: what, when, and where to transfer
3. Execute the transfer on behalf of the user
schedule and execute the job at the requested time (and inform the user of the status)
4. Update if needed
if the user requested incremental updates, schedule the next run after the given interval
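Steps 2 to 4 can be illustrated with a small scheduling sketch. The job is described by what, where, and when to transfer, and the next run is derived from the start time and the interval. The field names follow the HL_JOBS columns shown on the meta DB slide, but the sample values and the `next_run` helper are assumptions for the example, not the project's implementation.

```python
from datetime import datetime, timedelta

# A hypothetical job description: what, where, and when to transfer.
job = {
    "source_object_name": "EVENTS",           # what (sample value)
    "destination_path": "/user/demo/events",  # where (sample value)
    "start_time": datetime(2015, 8, 13),      # when
    "interval": timedelta(days=1),            # None would mean a one-off job
}


def next_run(start_time, interval, now):
    """Return the next time the job should run, or None if it is finished."""
    if now <= start_time:
        return start_time
    if interval is None:
        return None  # one-off job whose start time has already passed
    # Whole intervals elapsed since the start, then step one interval further.
    periods = (now - start_time) // interval + 1
    return start_time + periods * interval
```

A scheduler built this way only needs the two stored values and the current time, so it can recompute the schedule after a restart.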
Solution security
1. CERN SSO authentication
no password exchange
2. Authorization
only available (by ownership) and enabled (by configuration) Oracle servers can be used
3. Kerberos SSH tunneling
a separate user logs in to the clusters, without a password
4. Secure password input
other users cannot see the password as plaintext anywhere
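The authorization rule in point 2 amounts to intersecting two sets: the servers a user owns something on, and the servers that have been enabled. A minimal sketch, assuming both are available as plain Python collections (the data layout and the function names are hypothetical):

```python
def allowed_servers(username, owners, enabled):
    """Return the Oracle servers this user may read from.

    owners  -- dict: server name -> set of usernames that own a schema there
    enabled -- set of server names the administrators have configured
    """
    return sorted(
        server for server, users in owners.items()
        if username in users and server in enabled
    )


def authorize(username, server, owners, enabled):
    """Raise PermissionError unless the server is both owned and enabled."""
    if server not in allowed_servers(username, owners, enabled):
        raise PermissionError("%s may not use %s" % (username, server))
    return True
```

The username would come from the SSO layer, so no password ever reaches this check.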
Solution modularity
1. DB connector agnostic
SQLAlchemy supports several dialects; other connectors can also be integrated
2. Interchangeable scheduler
chosen based on the servers and the needed schedule complexity
3. Flexible communication with Hadoop
besides commands through SSH, Oozie could also be used
4. Client communicates through the REST API
5. Changeable Sqoop JDBC driver
normal or fast connectors where possible
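One way to keep the communication with Hadoop interchangeable (point 3) is a small runner interface with one implementation per transport. The sketch below is an assumption about how this could look, not the project's code: the class names and `make_runner` are invented for the example, and the Oozie variant is deliberately left as a stub. The `ssh` options used (`BatchMode`, `GSSAPIAuthentication`) are standard OpenSSH client options.

```python
from abc import ABC, abstractmethod


class Runner(ABC):
    """Common interface: turn a shell command into something executable."""

    @abstractmethod
    def plan(self, command):
        """Return the argv list that would run `command` on the cluster."""


class SSHRunner(Runner):
    def __init__(self, host, user):
        self.host = host
        self.user = user

    def plan(self, command):
        # BatchMode disables password prompts; GSSAPI enables Kerberos login.
        return [
            "ssh",
            "-o", "BatchMode=yes",
            "-o", "GSSAPIAuthentication=yes",
            "%s@%s" % (self.user, self.host),
            command,
        ]


class OozieRunner(Runner):
    def plan(self, command):
        # Placeholder only: Oozie submission is not sketched here.
        raise NotImplementedError("Oozie submission is not sketched here")


RUNNERS = {"ssh": SSHRunner, "oozie": OozieRunner}


def make_runner(kind, **kwargs):
    """Pick a runner implementation by name (names are hypothetical)."""
    return RUNNERS[kind](**kwargs)
```

The agent would then pass the result of `plan()` to something like `subprocess.run`, so swapping the transport does not touch the scheduling code.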
Solution infrastructure
1. PostgreSQL On-Demand
with PostgreSQL and SQLAlchemy connector
2. Central WebServices
DFS | Windows > IIS 8.5 > FastCGI > Python 2.7 > Flask
3. Agent running separately
on a DB locally managed server, OpenStack or WebServices (TBD)
4. Client hosted with the REST API
for easy usage and updates; could be separate
Solution meta DB
– HL_SERVERS
server_id (PK), server_address, server_name
– HL_CLUSTERS
cluster_id (PK), cluster_address, cluster_name
– HL_JOBS
job_id (PK), source_server_id (FK), source_schema_name, source_object_name, destination_cluster_id (FK), destination_path, owner_username, sqoop_nmap, sqoop_splitting_column, sqoop_incremental_method, sqoop_direct, start_time, interval, job_last_update
– HL_TRANSFERS
transfer_id (PK), scheduler_transfer_id, job_id (FK), transfer_status, transfer_start, transfer_last_update, last_modified_value
– HL_LOGS
log_id (PK), transfer_id (FK), log_source, log_path, log_content
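To make the relationships between the five tables concrete, here is a minimal sketch that creates them in an in-memory SQLite database. SQLite from the Python standard library is used only as a stand-in so the example runs anywhere; the slides name PostgreSQL with SQLAlchemy for the real service. Table and column names come from the slide, while every column type, the NOT NULL choices, and the `create_meta_db` helper are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE HL_SERVERS (
    server_id      INTEGER PRIMARY KEY,
    server_address TEXT NOT NULL,
    server_name    TEXT NOT NULL
);
CREATE TABLE HL_CLUSTERS (
    cluster_id      INTEGER PRIMARY KEY,
    cluster_address TEXT NOT NULL,
    cluster_name    TEXT NOT NULL
);
CREATE TABLE HL_JOBS (
    job_id                   INTEGER PRIMARY KEY,
    source_server_id         INTEGER NOT NULL REFERENCES HL_SERVERS(server_id),
    source_schema_name       TEXT,
    source_object_name       TEXT,
    destination_cluster_id   INTEGER NOT NULL REFERENCES HL_CLUSTERS(cluster_id),
    destination_path         TEXT,
    owner_username           TEXT,
    sqoop_nmap               INTEGER,
    sqoop_splitting_column   TEXT,
    sqoop_incremental_method TEXT,
    sqoop_direct             INTEGER,
    start_time               TEXT,
    "interval"               TEXT,
    job_last_update          TEXT
);
CREATE TABLE HL_TRANSFERS (
    transfer_id           INTEGER PRIMARY KEY,
    scheduler_transfer_id TEXT,
    job_id                INTEGER NOT NULL REFERENCES HL_JOBS(job_id),
    transfer_status       TEXT,
    transfer_start        TEXT,
    transfer_last_update  TEXT,
    last_modified_value   TEXT
);
CREATE TABLE HL_LOGS (
    log_id      INTEGER PRIMARY KEY,
    transfer_id INTEGER NOT NULL REFERENCES HL_TRANSFERS(transfer_id),
    log_source  TEXT,
    log_path    TEXT,
    log_content TEXT
);
"""


def create_meta_db(path=":memory:"):
    """Create the five meta DB tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
    conn.executescript(SCHEMA)
    return conn
```

Read this way, a job points at one source server and one destination cluster, each execution of a job is a transfer, and each transfer can have several log entries.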
Solution restrictions
– Only allow tables and views to be imported
the DB is responsible for evaluating and checking the queries
– Selected (preconfigured) source databases
gradual introduction for new users
– Preset destination folder structure
with restricted access rights, avoiding collision, unauthorized access
– Basic Sqoop command logic (for now)
e.g., with a primary key, only one PK attribute
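The "basic Sqoop command logic" can be pictured as a function that turns one job description into a `sqoop import` argument list. This is a hedged sketch, not the command HLoader actually generates: `build_sqoop_import` and its parameters (`jdbc_url`, `db_username`, `password_file`, `check_column`, `last_value`) are hypothetical, and the job keys reuse the HL_JOBS column names. The command-line options themselves are standard Sqoop 1 import options.

```python
def build_sqoop_import(job, jdbc_url, db_username, password_file,
                       check_column=None, last_value=None):
    """Build the argv list for a Sqoop 1 import from a job description."""
    table = "%s.%s" % (job["source_schema_name"], job["source_object_name"])
    argv = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", db_username,
        "--password-file", password_file,  # keeps the password off the command line
        "--table", table,
        "--target-dir", job["destination_path"],
        "--num-mappers", str(job["sqoop_nmap"]),
    ]
    if job.get("sqoop_splitting_column"):
        argv += ["--split-by", job["sqoop_splitting_column"]]
    if job.get("sqoop_direct"):
        argv.append("--direct")
    method = job.get("sqoop_incremental_method")  # "append" or "lastmodified"
    if method and check_column:
        argv += ["--incremental", method, "--check-column", check_column]
        if last_value is not None:
            argv += ["--last-value", str(last_value)]
    return argv
```

Returning a list rather than a single string avoids shell-quoting problems when schema or path names contain unusual characters.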
Solution current state
1. Client (Kate)
in progress; meanwhile the REST interface can be used
2. REST API (Daniel)
almost ready, missing the new job processing interface
3. Agent Scheduling (Ani)
basically ready, can schedule jobs and update itself after job description modifications
4. Agent Runners (Daniel)
working for initial imports, soon to be able to execute incremental updates
partially working SSH and REST monitoring
Solution future work
– Support more database connectors SQLA/NoSQL
– Support alternative runners like Oozie
– Prepare for Sqoop 2
– Integrate with Hive
– Resolve restrictions
– Release on GitHub with an Open Source license
Summary
– Easily expandable framework and service
for transferring data from Oracle to Hadoop
– Designed with automation in mind
minimal administrator intervention needed
– Service built for easy usage
easy to use for the routine jobs
Workflow tools
– GitLab
– JIRA
– Slack
– Jenkins CI
Contributors
– Anirudha Bose
– Dániel Stein
– Antonio Romero Marin
– Domenico Giordano
– Kacper Surdy
– Katarzyna Maria Dziedziniewicz-Wójcik
– Manuel Martín Márquez
– Zbigniew Baranowski

 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
LIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.pptLIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.ppt
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
addressing modes in computer architecture
addressing modes  in computer architectureaddressing modes  in computer architecture
addressing modes in computer architecture
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 

HLoader – Automated Incremental Hadoop Data Loader Service and Framework

  • 2. HLoader: Data Ingestion from Oracle Databases to Hadoop Clusters, Automatically, On-Demand (8/13/2015, HLoader – A. Bose, D. Stein)
  • 3. Problem
      – Control and monitor data transfer: using Sqoop, a CLI tool for bulk data transfer
      – Two in one: two distinct Summer Student task proposals for basically the same job
  • 4. Problem
      – Frequent requests: different users with different but similar use cases (ATLAS Job Monitoring, CMS Job Monitoring, CMS data popularity, ACCLOG)
      – Manually executed job: one that can be partially automated
  • 5. Requirements
      – Run jobs: incrementally; communicate with the end user
      – Handle failures: retry, notify, prevent
      – Be secure, stay safe: authorize and authenticate the users without exchanging passwords
      – Use what's provided: run on the CERN-provided infrastructure
  • 6. Solution overview
      [Architecture diagram: Client, Meta DB, REST API, Agent, Oracle Databases, FIM, Hadoop Clusters]
      1. Provided infrastructure: Oracle Databases and Hadoop Clusters
      2. Transfer data: the user wants to transfer data, so they create a new job (what, when, where to transfer)
      3. Execute the transfer on behalf of the user: schedule and execute the job at the requested time (also inform the user of the status)
      4. Update if needed: if the user requested incremental updates, schedule the next one after the given interval
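Steps 3 and 4 of the overview (run at the requested time, then repeat after the given interval) can be sketched as a small scheduling helper. This is only an illustration under assumptions: the function name `next_run` and its arguments are hypothetical and are not taken from the HLoader code base.

```python
from datetime import datetime, timedelta


def next_run(start_time, interval, now):
    """Return the first scheduled run at or after `now`.

    `interval` is None for a one-off job; once `start_time` has passed,
    such a job has no further runs and None is returned.
    """
    if now <= start_time:
        return start_time
    if interval is None:
        return None
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    elapsed = now - start_time
    # Ceiling division: number of whole intervals needed to reach `now`.
    steps = -(-elapsed // interval)
    return start_time + steps * interval


start = datetime(2015, 8, 13, 12, 0)
print(next_run(start, timedelta(hours=6), datetime(2015, 8, 14, 1, 0)))
```

Computing the next run from the original start time, rather than from the time the previous run finished, keeps the schedule from drifting when a transfer takes long.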
  • 7. Solution security
      1. CERN SSO authentication: no password exchange
      2. Authorization: only available (ownership) and enabled (configured) Oracle servers can be used
      3. Kerberos SSH tunneling: a separate user logs in to the clusters, without a password
      4. Secure password input: other users cannot see the password as plaintext anywhere
  • 13. Solution modularity
      1. DB connector agnostic: SQLAlchemy supports several dialects; other connectors can also be integrated
      2. Interchangeable scheduler: based on the servers and the needed schedule complexity
      3. Flexible communication with Hadoop: besides commands through SSH, Oozie could also be used
      4. Client communicating using REST API
      5. Changeable Sqoop JDBC driver: normal or fast connectors if possible
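One way to picture point 3 (SSH today, possibly Oozie later) is a small runner interface that the agent depends on. The class names `Runner`, `LocalRunner` and `SSHRunner` below are hypothetical, and the host and user are placeholders. `GSSAPIAuthentication` is a standard OpenSSH client option, used here on the assumption that Kerberos tickets provide the password-less login described under security.

```python
import shlex
from abc import ABC, abstractmethod


class Runner(ABC):
    """Turns a Sqoop argument list into the argv the agent would execute."""

    @abstractmethod
    def argv(self, sqoop_args):
        raise NotImplementedError


class LocalRunner(Runner):
    """Runs Sqoop on the machine hosting the agent (useful for testing)."""

    def argv(self, sqoop_args):
        return list(sqoop_args)


class SSHRunner(Runner):
    """Runs Sqoop on a cluster node through SSH with Kerberos (GSSAPI)."""

    def __init__(self, user, host):
        self.user = user
        self.host = host

    def argv(self, sqoop_args):
        # The remote shell re-parses the command, so quote every argument.
        remote = " ".join(shlex.quote(arg) for arg in sqoop_args)
        return ["ssh", "-o", "GSSAPIAuthentication=yes",
                "{}@{}".format(self.user, self.host), remote]


runner = SSHRunner("hloader", "cluster.example.org")
print(runner.argv(["sqoop", "import", "--table", "MY TABLE"]))
```

An Oozie-based runner would be another subclass; the rest of the agent would not need to change as long as it only talks to `Runner`.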
  • 20. Solution infrastructure
      1. PostgreSQL: On-Demand, with PostgreSQL and SQLAlchemy connector
      2. Central WebServices: DFS | Windows > IIS 8.5 > FastCGI > Python 2.7 > Flask
      3. Agent running separately: on DB locally managed server, OpenStack or WebServices (TBD)
      4. Client hosted with the REST API: for easy usage and update, could be separate
  • 26. Solution meta DB
      – HL_SERVERS: server_id (PK), server_address, server_name
      – HL_CLUSTERS: cluster_id (PK), cluster_address, cluster_name
      – HL_JOBS: job_id (PK), source_server_id (FK), source_schema_name, source_object_name, destination_cluster_id (FK), destination_path, owner_username, sqoop_nmap, sqoop_splitting_column, sqoop_incremental_method, sqoop_direct, start_time, interval, job_last_update
      – HL_TRANSFERS: transfer_id (PK), scheduler_transfer_id, job_id (FK), transfer_status, transfer_start, transfer_last_update, last_modified_value
      – HL_LOGS: log_id (PK), transfer_id (FK), log_source, log_path, log_content
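The meta DB tables above can be tried out locally with a sketch like the following. It uses an in-memory SQLite database from the Python standard library instead of the PostgreSQL instance the slides mention. The column types, the NOT NULL constraints and the assignment of the transfer and log columns to HL_TRANSFERS and HL_LOGS are assumptions; the slide only gives the names and the PK/FK markers.

```python
import sqlite3

SCHEMA = """
CREATE TABLE HL_SERVERS (
    server_id      INTEGER PRIMARY KEY,
    server_address TEXT NOT NULL,
    server_name    TEXT
);
CREATE TABLE HL_CLUSTERS (
    cluster_id      INTEGER PRIMARY KEY,
    cluster_address TEXT NOT NULL,
    cluster_name    TEXT
);
CREATE TABLE HL_JOBS (
    job_id                   INTEGER PRIMARY KEY,
    source_server_id         INTEGER NOT NULL REFERENCES HL_SERVERS(server_id),
    source_schema_name       TEXT NOT NULL,
    source_object_name       TEXT NOT NULL,
    destination_cluster_id   INTEGER NOT NULL REFERENCES HL_CLUSTERS(cluster_id),
    destination_path         TEXT NOT NULL,
    owner_username           TEXT NOT NULL,
    sqoop_nmap               INTEGER,
    sqoop_splitting_column   TEXT,
    sqoop_incremental_method TEXT,
    sqoop_direct             INTEGER,  -- boolean stored as 0/1
    start_time               TEXT,     -- ISO 8601 timestamp
    interval                 INTEGER,  -- seconds between updates (assumed unit)
    job_last_update          TEXT
);
CREATE TABLE HL_TRANSFERS (
    transfer_id           INTEGER PRIMARY KEY,
    scheduler_transfer_id TEXT,
    job_id                INTEGER NOT NULL REFERENCES HL_JOBS(job_id),
    transfer_status       TEXT,
    transfer_start        TEXT,
    transfer_last_update  TEXT,
    last_modified_value   TEXT
);
CREATE TABLE HL_LOGS (
    log_id      INTEGER PRIMARY KEY,
    transfer_id INTEGER NOT NULL REFERENCES HL_TRANSFERS(transfer_id),
    log_source  TEXT,
    log_path    TEXT,
    log_content TEXT
);
"""


def create_meta_db(path=":memory:"):
    """Create the five HL_* tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


conn = create_meta_db()
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
print([row[0] for row in tables])
```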
  • 27. Solution restrictions
      – Only allow tables and views to be imported: the DB is responsible for evaluating and checking the queries
      – Selected (preconfigured) source databases: gradual introduction for new users
      – Preset destination folder structure: with restricted access rights, avoiding collisions and unauthorized access
      – Basic Sqoop command logic (for now): e.g., with primary key, only one PK attribute
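As an illustration of the "basic Sqoop command logic", a job record with the HL_JOBS fields could be mapped to a `sqoop import` command line roughly as below. The function `build_sqoop_import` is hypothetical and not the HLoader implementation. The flags themselves are standard Sqoop 1 import arguments. Reusing the splitting column as the incremental check column is an assumption, made because the slides mention a single primary-key attribute. Credentials are left out on purpose.

```python
def build_sqoop_import(job, jdbc_url, last_value=None):
    """Return the argv list for importing one table or view with Sqoop.

    `job` is a dict keyed by HL_JOBS column names. When `last_value` is
    given and the job has an incremental method, an incremental import
    is requested instead of a full one.
    """
    table = "{}.{}".format(job["source_schema_name"], job["source_object_name"])
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", job["destination_path"],
        "--num-mappers", str(job["sqoop_nmap"]),
    ]
    split_column = job.get("sqoop_splitting_column")
    if split_column:
        args += ["--split-by", split_column]
    if job.get("sqoop_direct"):
        args.append("--direct")

    method = job.get("sqoop_incremental_method")
    if method and last_value is not None:
        if method not in ("append", "lastmodified"):
            raise ValueError("unsupported incremental method: " + method)
        if not split_column:
            raise ValueError("incremental import needs a check column")
        # Assumption: the check column is the same single-attribute key
        # that is used for splitting.
        args += ["--incremental", method,
                 "--check-column", split_column,
                 "--last-value", str(last_value)]
    return args


job = {
    "source_schema_name": "MYSCHEMA",
    "source_object_name": "EVENTS",
    "destination_path": "/user/hloader/events",
    "sqoop_nmap": 4,
    "sqoop_splitting_column": "EVENT_ID",
    "sqoop_incremental_method": "append",
    "sqoop_direct": False,
}
print(" ".join(build_sqoop_import(job, "jdbc:oracle:thin:@//db.example.org:1521/svc", 1000)))
```

Returning a list rather than a single string leaves quoting to whichever runner ends up executing the command.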
  • 28. Solution current state
      1. Client (Kate): in progress; meanwhile the REST interface can be used
      2. REST API (Daniel): almost ready, missing the new job processing interface
      3. Agent Scheduling (Ani): basically ready, can schedule jobs and update itself after job description modifications
      4. Agent Runners (Daniel): working for initial imports, soon to be able to execute incremental updates; SSH and REST monitoring partially working
  • 30. Solution future work
      – Support more database connectors (SQLA/NoSQL)
      – Support alternative runners like Oozie
      – Prepare for Sqoop 2
      – Integrate with Hive
      – Resolve restrictions
      – Release on GitHub with an Open Source license
  • 31. Summary
      – Easily expandable framework and service for transferring data from Oracle to Hadoop
      – Designed with automation in mind: minimal administrator intervention needed
      – Service built for easy usage: easy to use for the routine jobs
  • 32. Workflow tools
      – GitLab
      – JIRA
      – Slack
      – Jenkins CI
  • 33. Contributors
      – Anirudha Bose
      – Dániel Stein
      – Antonio Romero Marin
      – Domenico Giordano
      – Kacper Surdy
      – Katarzyna Maria Dziedziniewicz-Wójcik
      – Manuel Martín Márquez
      – Zbigniew Baranowski
  • 34. HLoader