Email notifications from HBase data
Hadoop : HBase
Coprocessors and
Oozie jobs
Jinith Joseph
• Introduction to HBase coprocessors and Oozie Java jobs
• Notifications from HBase using HBase coprocessors & Oozie jobs
• Implementation on Cloudera CDH 5.4.0 without Flume
Requirement
Scope of Coprocessors
• System Level : Coprocessors can be configured to work on all tables and regions
• Table Level : Coprocessors can be configured to work on all regions of a specific table
Types of Coprocessors
• Observer coprocessor ( acts like triggers in conventional databases )
This allows users to insert custom code by overriding the upcall methods provided by the coprocessor framework. The callback
functions are executed from core HBase code when certain events occur.
• Endpoint coprocessor ( resembles stored procedures in conventional databases )
Users can invoke the endpoint at any time from the client; it is executed remotely at the target region or all regions, and the
results are returned to the client.
Overview of HBase Coprocessor
Observer Coprocessor
Three types of Observers
• Region Observer
These observers provide hooks into data manipulation events like get, put, delete, scan, etc. on HBase tables.
For every table region there is an instance of the RegionObserver coprocessor, but the scope of the
observers can be set to a specific region or across all regions.
• WAL Observer
Provides hooks for write-ahead log (WAL) related operations. These run in the context of WAL
processing and are used for WAL writing and reconstruction events (e.g. bulk load of HBase tables using HFiles).
• Master Observer
These observers provide hooks into DDL operations like create / alter / delete table. The master observer
runs within the context of the HBase master.
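As a quick orientation before the full implementation later in this deck, below is a minimal sketch of a region observer against the HBase 1.x coprocessor API (the class name and log output are illustrative only):

public class NoopRegionObserver — minimal region observer sketch:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal region observer: does nothing but report each put it sees.
public class NoopRegionObserver extends BaseRegionObserver {
    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                        Put put, WALEdit edit, Durability durability) throws IOException {
        System.out.println("postPut fired for row " + Bytes.toString(put.getRow()));
    }
}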
Oozie
Apache Oozie is a system for running workflows of dependent jobs, and contains two main
engines :
• workflow engine – stores and runs workflows composed of different types of Hadoop jobs
• coordinator engine – runs workflow jobs based on predefined schedules and data availability. It allows the
user to define and execute recurrent and interdependent workflow jobs
The Oozie web UI is available at the URL : http://namenode:11000
Using an Oozie workflow, different tasks can be set up as part of the workflow, including :
• Hive script, Pig, Spark, Java, Sqoop, MapReduce, Shell, Ssh, HDFS Fs, Email, Streaming, Distcp, etc.
Oozie workflows generally follow a state transition diagram to report their status:
Start → Actions → End on success; an Error during an action transitions to Fail.
How to Connect these?
1. Data is inserted into the HBase data table.
2. The Region Observer coprocessor's postPut() is invoked after the data has been inserted into the HBase table.
3. The coprocessor (Java) writes a history record to an HBase log table and invokes an Oozie job with parameters on a different thread.
4. The Oozie Java job, using the 3rd-party jars mail.jar and activation.jar, frames the email content from the parameters and sends the email out to the users over SMTP.
* Might not be an ideal way to run in a production environment. Please consider the rate of data and the threads invoked to run Oozie jobs.
Region Observer Coprocessor – postPut()
• The client code is written in Java; it overrides the upcall methods of the coprocessor framework to
implement an Observer coprocessor, which is initiated as soon as data is available in the HBase
table.
• The insertion of the data does not complete until the coprocessor hooks have finished. So any bulky code
in the coprocessor (overriding put events) will cost us time when inserting data into the HBase tables. So here we
put our functionality ( sending out emails ) in an Oozie job which runs on a separate thread
and is triggered from the HBase observer coprocessor
Oozie workflow – Java job using 3rd-party jars
• The Oozie workflow contains a Java job to send out emails using SMTP; it is invoked using the Oozie Java API
• The workflow has a dedicated folder in HDFS with the below structure and files ( each file is described later.. )
- /user/oozie/OozieWFConfigs/emailAppDef
- job.properties ( file holding the specific properties of the workflow )
- workflow.xml ( the workflow definition )
- /lib
- <Our Jar File>.jar, activation.jar , mail.jar
How we use HBase coprocessor & Oozie
Target : Write a simple Java program with a main class to send out emails to users using SMTP.
Steps :
1. Create a simple Java project, with package org.
2. Create a class Emails.java as below :
EmailJava.jar Implementation
package org;

// Imports added for completeness: JavaMail (mail.jar) and JAF (activation.jar)
// must be on the classpath when compiling and running.
import java.util.Date;
import java.util.Properties;
import javax.activation.DataHandler;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;
import javax.mail.util.ByteArrayDataSource;
import com.sun.mail.smtp.SMTPTransport;
import sun.misc.BASE64Decoder; // JDK-internal API, pre-Java 8; see the note later

public class Emails {
    private static byte[] Data;
    public static void main(String[] args) {
        try {
            String barcode = args[0];
            String pdfContent = args[1];
            // Decode the base64-encoded content passed as the second argument
            BASE64Decoder decoder = new BASE64Decoder();
            Data = decoder.decodeBuffer(pdfContent);
            // Configure the SMTP session
            Properties props = System.getProperties();
            props.put("mail.smtp.host", "mail.company.com");
            props.put("mail.smtp.auth", "true");
            Session session = Session.getInstance(props, null);
            Message msg = new MimeMessage(session);
            msg.setFrom(new InternetAddress("oozie@company.com"));
            msg.setRecipients(Message.RecipientType.TO, InternetAddress.parse("<user email address>", false));
            msg.setSubject("Your Purchase Receipt " + barcode);
            msg.setHeader("X-Mailer", "Company CRM Communications");
            String content = "Thank you for shopping. Please find the attached receipt for your purchase.";
            MimeBodyPart textBodyPart = new MimeBodyPart();
            textBodyPart.setText(content);
3. Compile the code with mail.jar and activation.jar included in the referenced libraries to create
EmailJava.jar
EmailJava.jar Implementation
            // Attach the decoded PDF bytes as a second body part
            MimeBodyPart pdfBodyPart = new MimeBodyPart();
            pdfBodyPart.setDataHandler(new DataHandler(new ByteArrayDataSource(Data, "application/pdf")));
            pdfBodyPart.setFileName("Receipt_" + barcode + ".pdf");
            MimeMultipart mimeMultipart = new MimeMultipart();
            mimeMultipart.addBodyPart(textBodyPart);
            mimeMultipart.addBodyPart(pdfBodyPart);
            msg.setContent(mimeMultipart);
            msg.setSentDate(new Date());
            // Connect to the SMTP server and send the message
            SMTPTransport t = (SMTPTransport) session.getTransport("smtp");
            System.out.println("Attempting to connect to the SMTP Server");
            t.connect("mail.company.com", "<SMTP_UserName>", "<SMTP_Password>");
            System.out.println("Connection succeeded. Attempting to send email.");
            t.sendMessage(msg, msg.getAllRecipients());
            System.out.println("Response: " + t.getLastServerResponse());
            t.close();
        }
        catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
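Before wiring the workflow in, the jar can be sanity-checked from a plain main(). Below is a minimal sketch, assuming the Emails class above is on the classpath together with mail.jar and activation.jar; the barcode and sample content are made-up test values:

package org;

import sun.misc.BASE64Encoder; // same pre-Java 8 codec the coprocessor uses later

// Hypothetical local test driver: base64-encodes sample content and
// invokes Emails.main() with the same two arguments the Oozie java action passes.
public class EmailsLocalTest {
    public static void main(String[] args) throws Exception {
        byte[] sampleContent = "qualifier1=value1|qualifier2=value2".getBytes();
        String encoded = new BASE64Encoder().encodeBuffer(sampleContent);
        Emails.main(new String[] { "TEST-BARCODE-001", encoded });
    }
}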
Target : As the Oozie workflow has to be triggered from a coprocessor, we first
implement an Oozie workflow to send out emails to users. We define the workflow
using XML and test it within the Cloudera environment.
Steps :
1. In the HDFS file browser create a new folder “emailAppDef” ( say in the path :
“/user/oozie/OozieWFConfigs/emailAppDef” )
2. Within the folder create a file “workflow.xml”. Edit the file to have contents like below :
Oozie workflow- Implementation
<workflow-app name="Email" xmlns="uri:oozie:workflow:0.5">
<start to="java-95a1"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="java-95a1">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>org.Emails</main-class>
<arg>${barcode}</arg>
<arg>${pdfContent}</arg>
<file>/user/oozie/OozieWFConfigs/emailAppDef/lib/mail.jar#mail.jar</file>
<file>/user/oozie/OozieWFConfigs/emailAppDef/lib/activation.jar#activation.jar</file>
<file>/user/oozie/OozieWFConfigs/emailAppDef/lib/EmailJava.jar#EmailJava.jar</file>
</java>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
Steps :
3. With the XML in the last step, we have declared that the main class is “org.Emails”, and
included 3 files – mail.jar, activation.jar and EmailJava.jar , where EmailJava.jar is the program we have
written and contains the main class “org.Emails”
4. Create another file within the same directory “emailAppDef” with the name “job.properties”. This file
holds the properties of the workflow. The contents of the job.properties file are :
nameNode=hdfs://<namenode IP Address>:8020
jobTracker=<namenode IP Address>:8021
queueName=default
oozie.wf.application.path=/user/jinith.joseph/OozieWFConfigs/emailAppDef
oozie.use.system.libpath=true
5. Create a new folder “lib” within “emailAppDef” , which holds the main class jar file and the dependent
3rd-party jars ( scripting this is sketched after this slide ).
6. Add EmailJava.jar, mail.jar and activation.jar to the “lib” folder.
7. We have now finished defining the workflow to invoke a Java job from Oozie.
Oozie workflow- Implementation
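Steps 5 and 6 can also be scripted. Below is a minimal sketch using the Hadoop FileSystem API, assuming the three jars sit in the current working directory and the HDFS layout shown above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Creates the workflow's lib folder in HDFS and uploads the three jars.
public class UploadWorkflowLibs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path lib = new Path("/user/oozie/OozieWFConfigs/emailAppDef/lib");
        fs.mkdirs(lib);
        for (String jar : new String[] { "EmailJava.jar", "mail.jar", "activation.jar" }) {
            // "./<jar>" as the local source path is an assumption
            fs.copyFromLocalFile(new Path(jar), new Path(lib, jar));
        }
        fs.close();
    }
}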
Target : Create a coprocessor that keeps a history of all insertions into an HBase table and
triggers an Oozie workflow from Java.
Steps : Create a simple Java project referencing the Hadoop, logging, HBase, Common and
Oozie Client jars. Create a Java class as below, overriding the start() and
postPut() functions, which are executed when there is any activity in the region and
after a record is added to the HBase table, respectively.
HBase Coprocessor - Implementation
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.util.Bytes;

public class incCoProc extends BaseRegionObserver {
    private byte[] master;
    private byte[] family;
    private byte[] flags = Bytes.toBytes("flags");
    private Log log;
    @Override
    public void start(CoprocessorEnvironment e) throws IOException {
        Configuration conf = e.getConfiguration();
        // "master" (log table name) and "family" (column family) are read from
        // the key=value arguments supplied when the coprocessor is attached
        master = Bytes.toBytes(conf.get("master"));
        family = Bytes.toBytes(conf.get("family"));
    }
The start function is executed when
the coprocessor is associated with an HBase
table or with all of them. You can provide
parameters to this function.
Here, the start function accepts a couple of
arguments to be initiated. I have used two
arguments here, picked from
the configuration, which denote a table
name and a family name for further
operations.
HBase Coprocessor - Implementation
@Override
public void postPut(ObserverContext<RegionCoprocessorEnvironment> e, Put put, WALEdit edit, Durability durability) {
    try {
        final RegionCoprocessorEnvironment env = e.getEnvironment();
        final byte[] row = put.getRow();
        // Re-read the full row that was just written
        Get get = new Get(row);
        final Result result = env.getRegion().get(get);
        // Hand the logging + Oozie submission off to a separate thread so
        // the client put() is not blocked while the workflow is invoked
        new Thread(new Runnable() {
            public void run() {
                try {
                    insert(env, row, result);
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
            }
        }).start();
    }
    catch (IOException ex) {
    }
}
postPut(), which we override here, is
called after any insertion on the
associated HBase tables.
Here we get the environment details
and the row details of the new record in the
HBase table, and call a function on a
separate thread to minimize insertion time
to the parent HBase table.
* postPut() is executed after each insert,
and any synchronous work in it adds
directly to the insertion time, which is why
the heavy actions are moved to a separate thread.
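Spawning a raw Thread per put, as above, can exhaust region server resources under high write rates — the caveat flagged earlier in the deck. One possible mitigation, sketched below under the assumption that it is merged into incCoProc, is a small bounded pool shared by all postPut() calls:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the pieces incCoProc would gain if the per-put
// Thread were replaced by a small bounded pool.
public class BoundedHandoff {
    // one pool per coprocessor instance; size 4 is an arbitrary example
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // postPut() would call this instead of new Thread(task).start()
    public void submit(Runnable insertTask) {
        pool.submit(insertTask);
    }

    // stop(CoprocessorEnvironment) should drain the pool on unload
    public void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
    }
}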
HBase Coprocessor - Implementation
private void insert(RegionCoprocessorEnvironment env, byte[] row, Result r) throws IOException {
    Table masterTable = env.getTable(TableName.valueOf(master));
    try {
        // Flatten the row into "qualifier=value|qualifier=value|..."
        CellScanner scanner = r.cellScanner();
        StringBuilder str = new StringBuilder();
        int count = 0;
        while (scanner.advance()) {
            Cell cell = scanner.current();
            byte[] qualifier = CellUtil.cloneQualifier(cell);
            byte[] value = CellUtil.cloneValue(cell);
            if (count > 0)
                str.append("|");
            str.append(Bytes.toString(qualifier)).append("=").append(Bytes.toString(value));
            count++;
        }
        // Write the flattened row plus status flags to the log table
        Put put = new Put(row);
        put.addColumn(family, row, Bytes.toBytes(str.toString()));
        put.addColumn(flags, Bytes.toBytes("EmailSend"), Bytes.toBytes("false"));
        put.addColumn(flags, Bytes.toBytes("Archived"), Bytes.toBytes("false"));
        put.addColumn(flags, Bytes.toBytes("Flattened"), Bytes.toBytes("false"));
        masterTable.put(put);
        // Base64-encode the content and hand it to the Oozie workflow
        BASE64Encoder encoder = new BASE64Encoder();
        final String pdfContent = encoder.encodeBuffer(Bytes.toBytes(str.toString()));
        OozieJobInvoke(pdfContent);
    }
    catch (Exception ex) {
    }
    finally {
        masterTable.close();
    }
}
In this insert function, we read the
data of the row newly inserted into the
parent HBase table, and insert the same
with some extra fields ( flags ) into a log
table ( master ).
For every Put on table A (with a row key
"<row id>"), a cell is put into the master
table with row key "<row id>", qualifier
"<row id>" and, as value, a concatenation of
the qualifiers and values of the row in A.
The master table name and column family are
passed as arguments.
Finally we initiate an Oozie job call with
the content of the data ( pdfContent )
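A side note on the codec: BASE64Encoder and BASE64Decoder live in sun.misc, a JDK-internal package that later JDKs removed. If the cluster runs Java 8 or newer, a sketch of the standard replacement:

import java.util.Base64;

// Drop-in replacement for the sun.misc classes used above. The MIME variants
// are chosen because, like BASE64Encoder/BASE64Decoder, they produce and
// accept line breaks in the encoded text.
public class Base64Codec {
    public static String encode(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }
    public static byte[] decode(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }
}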
HBase Coprocessor - Implementation
private void OozieJobInvoke(String pdfContent) {
    // Client pointed at the Oozie server's REST endpoint
    OozieClient wc = new OozieClient("http://hdfs:hdfs@<name node>:11000/oozie");
    Properties conf = wc.createConfiguration();
    conf.setProperty("nameNode", "hdfs://<name node>:8020");
    conf.setProperty("jobTracker", "<name node>:8032");
    conf.setProperty("queueName", "default");
    conf.setProperty("oozie.libpath", "${nameNode}/user/oozie/OozieWFConfigs/emailAppDef/lib");
    conf.setProperty("oozie.use.system.libpath", "true");
    conf.setProperty("oozie.wf.rerun.failnodes", "true");
    conf.setProperty("oozieProjectRoot", "${nameNode}/user/jinith.joseph/OozieWFConfigs/emailAppDef");
    conf.setProperty("appPath", "${nameNode}/user/jinith.joseph/OozieWFConfigs/emailAppDef");
    conf.setProperty(OozieClient.APP_PATH, "${appPath}/workflow.xml");
    // Workflow parameters consumed by the <arg> elements in workflow.xml
    conf.setProperty("barcode", "0MN20151228102N21452");
    conf.setProperty("pdfContent", pdfContent);
    try {
        String jobId = wc.run(conf);
    } catch (Exception r) {
    }
}
In this function, where the Oozie job is called,
we have specified various configurations
for the Oozie job to pick up, and then set
the project path and the application path
(the HDFS path of the Oozie
workflow XML we created and
uploaded).
Once OozieClient.run() is called, the
Oozie job is submitted. oozie.libpath
denotes the jar files that
have to be included in the Oozie workflow.
The Oozie workflow takes a couple of arguments
which can be set using the conf.setProperty()
function (barcode, pdfContent).
As we have already defined the Oozie
workflow to execute a Java job, the Java job
is called and the email sent.
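To verify the submission, the job id returned by run() can be polled for status through the same client. A minimal sketch using the Oozie client API; the server URL is the placeholder from above:

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

// Hypothetical check: poll a submitted workflow until it leaves RUNNING.
public class OozieStatusCheck {
    public static void main(String[] args) throws Exception {
        OozieClient wc = new OozieClient("http://<name node>:11000/oozie");
        String jobId = args[0]; // id returned by wc.run(conf)
        WorkflowJob job = wc.getJobInfo(jobId);
        while (job.getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(5000);
            job = wc.getJobInfo(jobId);
        }
        System.out.println("Workflow finished with status: " + job.getStatus());
    }
}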
HBase Coprocessor – Assign to an HBase
table
Target : Associate the HBase coprocessor with an HBase table
Steps :
1. Upload the coprocessor jar file to an HDFS location
2. Start the hbase shell and create the required tables.
3. Disable the table to which the coprocessor is being associated.
4. Use the below command to associate the HBase coprocessor with the HBase table ( a programmatic equivalent is sketched after the notes below ) :
hbase(main):049:0> alter '<HBase table name>', METHOD => 'table_att', 'coprocessor' =>
'hdfs:///user/oozie/JARS/incCoProc.jar|com.incCoProc||master=log,family=family'
While the coprocessor is being associated with the HBase table, you will see the regions being updated on the shell.
5. Enable the table to which the coprocessor was attached.
6. If you run describe '<HBase table name>' from the hbase shell, you should see that the coprocessor is attached to the HBase table.
7. Try inserting new records into the HBase table to which the coprocessor was attached; you should see that a history is maintained
in the “log” table and emails sent out to the configured user.
master and family are
arguments passed to
the HBase coprocessor,
where the overridden start()
function uses them for the other
operations.
As a result, every insertion into
the observed table is
recorded in the “log” table
under the column family “family”.
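The same attachment can also be done programmatically. Below is a minimal sketch using the HBase 1.x client API (HTableDescriptor.addCoprocessor); the table name, jar path, class name and key=value arguments are the placeholders from the shell command above:

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Hypothetical programmatic equivalent of the 'alter ... table_att' shell command.
public class AttachCoprocessor {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName table = TableName.valueOf("<HBase table name>");
            admin.disableTable(table);
            HTableDescriptor desc = admin.getTableDescriptor(table);
            Map<String, String> kvs = new HashMap<String, String>();
            kvs.put("master", "log");     // log table name
            kvs.put("family", "family");  // column family for the log rows
            desc.addCoprocessor("com.incCoProc",
                    new Path("hdfs:///user/oozie/JARS/incCoProc.jar"),
                    Coprocessor.PRIORITY_USER, kvs);
            admin.modifyTable(table, desc);
            admin.enableTable(table);
        }
    }
}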
HBase Coprocessor – Unset from an HBase
table
Target : Unset the HBase coprocessor from an HBase table
Steps :
1. Start the hbase shell.
2. Disable the table to which the coprocessor is associated.
3. Use the below command to unset the HBase coprocessor from the HBase table :
hbase(main):049:0> alter '<HBase table name>', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
4. Enable the table from which the coprocessor was detached.
5. Try inserting new records into the HBase table; no actions will be triggered.
Thank You!
Hope you have got an introduction to HBase coprocessors and Oozie jobs and how they can be
combined for various functionalities.