Real time web app integration with hadoop on docker

Author: Rajasekaran Kandhasamy
Real-Time Web App Integration with Hadoop On Docker
Reference Architecture
Introduction
Big Data Deployment
Data Exchange/Gateway Layer [Docker Host 1]
Data Center/Store/Processing Layer [Docker Host 2]
Packaging and bundling:
Real Time Web Application-Big Data: Integration Architecture
Broker.jar:
ml-jobs.jar:
Hadoop Job Configurations through Real Time Web Application:
Manage Jobs UI Layout:
Future Enhancements:
Add/Edit Job UI Layout:
Execute Button Action:
View Job Status Button Action:
Sample real-time web application job details table structures:
Job table:
Job status table:
Monitor Job Status from Real-time Web Application
Current Implementation [Spark Listener Instrumentation]:
Design option 2 [Polling HDFS Consumer]:
Design option 3 [UI Uses HBase for Job status]:
Design option 4 [UI Uses Spark History Server REST API]:

Introduction
• With the emergence of Docker containers, building and shipping applications has become much simpler. A Hadoop deployment on Docker provides a lightweight, disposable environment for learning and exploring new technology, trying out new ideas, and running continuous integration before testing at scale.
• For the single-node setup we used the Hortonworks Docker image; for the multi-node setup we used two Docker hosts. [Setting this up is a separate process.]
• The proposed big data system consists of the following Hadoop components:
  o Apache Spark – for large-scale data processing
  o Apache Sqoop – to integrate OLTP databases
  o Apache HBase – NoSQL store
  o Apache Kafka – acts as a distributed ESB between the real-time application and the cluster
  o Apache Flume – web application log streaming
  o Hadoop core [HDFS, YARN]
• In this document, we will see:
  o how Hadoop modules are deployed in a cluster environment with Docker support
  o how each module interacts with the others
  o how we configure different Hadoop-related jobs through the real-time web application
  o how we execute Hadoop jobs through the real-time web application
  o how we monitor job submission status through the real-time web application
Big Data Deployment
• The proposed big data system consists of one “Data Exchange/Gateway Layer [Docker Host 1]” and one or more “Data Center/Data Store/Data Processing Layer [Docker Host 2 (N)]”.
• Horizontal scaling of the “Data Store/Processing Layer” is achieved by adding Docker hosts with the “docker-machine create <HOST-NAME>” command.
• There is no separate cluster setup for Spark, because:
  o data processing leverages the existing YARN environment,
  o hardware utilization is higher,
  o no separate machines are needed for Spark.
Data Exchange/Gateway Layer [Docker Host 1]
• Communication between the real-time web application and the big data cluster/ecosystem happens through this layer.
• Mediator components such as Flume, Kafka, MapReduce clients and Spark clients are deployed here as individual Docker containers in Docker Host 1.
• No data storage or distributed computation takes place here.
• All big data job submissions from the real-time web application [including Spark, MapReduce and Sqoop jobs] go through Kafka, to enforce a generic adapter design pattern.
• Data ingestion is done through Flume sink(s) and the Sqoop server.
• Data processing is initiated through Kafka and Spark.
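Routing every job type through Kafka into a single entry point can be sketched as a small generic adapter: each job type gets its own executor behind one common interface, and a dispatcher picks the executor from the job type carried in the message. This is a minimal sketch under stated assumptions, not the author's code; JobRequest, JobAdapter, JobDispatcher and the job-type strings are hypothetical names, and the Kafka consumer that would call dispatch() is left out so the routing logic stands alone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical job request as it might arrive on Job-Submit-Queue.
record JobRequest(String jobName, String jobType, Map<String, String> params) {}

// Generic adapter contract: one implementation per job type (Spark, MapReduce, Sqoop, ...).
interface JobAdapter {
    String submit(JobRequest request);
}

// Routes each incoming request to the adapter registered for its job type.
final class JobDispatcher {
    private final Map<String, JobAdapter> adapters = new HashMap<>();

    void register(String jobType, JobAdapter adapter) {
        adapters.put(jobType, adapter);
    }

    String dispatch(JobRequest request) {
        JobAdapter adapter = adapters.get(request.jobType());
        if (adapter == null) {
            throw new IllegalArgumentException("No adapter registered for job type: " + request.jobType());
        }
        return adapter.submit(request);
    }
}
```

With this shape, supporting a new job type means registering one more adapter, while the submission path from the web application stays unchanged.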
Data Center/Store/Processing Layer [Docker Host 2]
• Dedicated to data storage and distributed data processing.
• Data access is more secure, and cloud-based multi-tenancy can be enabled for data as well as for processing.
• A horizontal scaling approach is used to add a new Docker host/hardware or a new tenant.
Packaging and bundling:
• Sqoop:
  o Sqoop server artifacts are packaged together and deployed in the Sqoop server location on Docker Host 1.
  o Sqoop client artifacts are packaged together and deployed in the Kafka location on Docker Host 1.
• Kafka:
  o Kafka producers and consumers are packaged separately and deployed in the Kafka location on Docker Host 1.
• Flume:
  o The Flume sink is packaged separately and deployed in the Flume location on Docker Host 1.
Note: Some Kafka producers and Flume agents may run in the real-time web application server; these need to be deployed accordingly.

Real Time Web Application-Big Data: Integration Architecture
[Figure: Integration architecture diagram. Labels: Web Application, Modules Database, Log Configuration; Data Exchange/Gateway Layer (Docker Host 1): Kafka Server (Job-Submit-Queue, Job-Status-Queue, KafkaBootStrap), Broker.jar with SparkLauncher, Sqoop Server, Flume HBase Sink, Spark Server with ml-jobs.jar (SparkListener, ML Logic Classes, Job Status Updater); Data Store/Processing Layer (Docker Host 2 ... Docker Host N, horizontal scaling): Hadoop ecosystem with YARN (RM, NM), HDFS (NN, DN), HBase (Master, Region) and ZK; job status from YARN.]

Broker.jar:
• This jar runs as a service that starts the Kafka consumers [Java programs]; it is deployed in the Kafka server location [Docker Host 1]. Note: the Kafka server must already be running.
• One of the Kafka consumers inside this jar, “SparkJobExecutor.java”, listens to the queue “Job-Submit-Queue”.
• Whenever a job is submitted through the Kafka producer in the real-time web application, this consumer consumes the message and launches the Spark client based on the incoming parameters.
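How “SparkJobExecutor.java” turns an incoming message into a Spark client launch is not shown here. As a minimal sketch, assuming the message carries the job's FQCN, the path of ml-jobs.jar and optional application arguments (the key names fqcn, jar, appName, deployMode and args are hypothetical), the executor could build a spark-submit command that targets YARN, in line with the decision not to run a separate Spark cluster. The --master, --deploy-mode, --class and --name options are standard spark-submit options; Spark also ships a programmatic launcher, org.apache.spark.launcher.SparkLauncher, with equivalent setters (setMaster, setDeployMode, setMainClass, setAppResource).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Builds a spark-submit command line from the parameters of a Job-Submit-Queue message.
// Parameter key names (fqcn, jar, appName, deployMode, args) are hypothetical.
final class SparkSubmitCommand {
    private SparkSubmitCommand() {}

    static List<String> from(Map<String, String> params) {
        String fqcn = require(params, "fqcn");
        String jar = require(params, "jar");

        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--master");
        cmd.add("yarn");                 // reuse the cluster's YARN; no standalone Spark cluster
        cmd.add("--deploy-mode");
        cmd.add(params.getOrDefault("deployMode", "cluster"));
        cmd.add("--class");
        cmd.add(fqcn);                   // ML implementation class inside ml-jobs.jar
        cmd.add("--name");
        cmd.add(params.getOrDefault("appName", fqcn));
        cmd.add(jar);
        // Application arguments, whitespace-separated in this sketch.
        String args = params.getOrDefault("args", "").trim();
        if (!args.isEmpty()) {
            cmd.addAll(List.of(args.split("\\s+")));
        }
        return cmd;
    }

    private static String require(Map<String, String> params, String key) {
        String value = params.get(key);
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("Missing required parameter: " + key);
        }
        return value;
    }
}
```

A caller could then start the client with new ProcessBuilder(SparkSubmitCommand.from(params)).inheritIO().start(), assuming spark-submit is on the PATH of the Spark client container.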
ml-jobs.jar:
• This jar is deployed in the Spark server location [Docker Host 1].
• It contains the machine learning implementations that can run in the Hadoop distributed environment.
• All machine learning implementations must implement a “SparkListener” in order to publish job status to “Job-Status-Queue”, so that the real-time web application can consume the job messages from this queue and update them in the OLTP database.
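The format of the messages on “Job-Status-Queue” is not specified. As a minimal sketch, assuming a flat message with a job id, a state and a timestamp (hypothetical fields, encoded here as semicolon-separated key=value pairs to stay dependency-free; a real deployment would more likely use JSON or Avro), the Spark side and the web application could share one small codec so both ends agree on the contract.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical status message exchanged over Job-Status-Queue.
record JobStatusMessage(String jobId, String state, long timestampMillis) {

    // Encodes as "jobId=...;state=...;ts=..."; values are assumed not to contain ';' or '='.
    String encode() {
        return "jobId=" + jobId + ";state=" + state + ";ts=" + timestampMillis;
    }

    // Parses the encoding above; rejects messages that lack any required field.
    static JobStatusMessage decode(String text) {
        Map<String, String> fields = new HashMap<>();
        for (String pair : text.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        String jobId = fields.get("jobId");
        String state = fields.get("state");
        String ts = fields.get("ts");
        if (jobId == null || state == null || ts == null) {
            throw new IllegalArgumentException("Malformed status message: " + text);
        }
        return new JobStatusMessage(jobId, state, Long.parseLong(ts));
    }
}
```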
Hadoop Job Configurations through Real Time Web Application:
• Introduce a “Manage Big Data” link in the UI main menu.
• The “Manage Big Data” page consists of a “Manage Jobs” tab.
Manage Jobs UI Layout:
• Contains a “Job Search” widget and a “Job Lists” data table that shows the list of configured jobs.
• “Job Lists” contains “Add”, “Delete”, “Edit”, “Execute” and “View Job Status” buttons to perform the corresponding actions on configured jobs.
• The “Add”/“Edit” job dialog screen provides job name, description, job type, FQCN and execution mode configurations.
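The fields captured by the “Add”/“Edit” dialog map naturally onto a small value type that the web application can validate before saving it to the OLTP database. A minimal sketch, assuming hypothetical job types (SPARK, MAPREDUCE, SQOOP) and execution modes (ON_DEMAND, SCHEDULED); the FQCN check is only a syntactic Java class-name pattern, not a guarantee that the class exists in ml-jobs.jar.

```java
import java.util.regex.Pattern;

// Job types and execution modes below are hypothetical examples.
enum JobType { SPARK, MAPREDUCE, SQOOP }
enum ExecutionMode { ON_DEMAND, SCHEDULED }

// Configuration captured by the Add/Edit job dialog.
record JobConfig(String name, String description, JobType type, String fqcn, ExecutionMode mode) {

    // Dot-separated Java identifiers, e.g. com.example.ml.ChurnModel.
    private static final Pattern FQCN =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)*");

    // Compact constructor: validates input before the record is created.
    JobConfig {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("Job name is required");
        }
        if (type == null || mode == null) {
            throw new IllegalArgumentException("Job type and execution mode are required");
        }
        if (fqcn == null || !FQCN.matcher(fqcn).matches()) {
            throw new IllegalArgumentException("Invalid FQCN: " + fqcn);
        }
        description = description == null ? "" : description.trim(); // description is optional
    }
}
```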
Future Enhancements:
• Enable rule-based routing information for jobs,
• Enable rule-based alerts or events for jobs,
• More UI options for Camel route configurations.
Add/Edit Job UI Layout:
• Clicking “Add/Save” saves the above configuration in the real-time web application OLTP database.
• The user can see the saved data in a table view.

Execute Button Action:
View Job Status Button Action:
• The user can view executed job status and log information from the clusters.
• This information is retrieved from the job status table.
• A Kafka consumer running inside the real-time web application periodically listens to “Job-Status-Queue” from the big data cluster and updates job status in the OLTP database.
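On the web application side, the consumer's work reduces to applying each status message to the job status table. A minimal sketch, assuming a hypothetical JobStatusRepository interface standing in for the OLTP table (an in-memory implementation is included so the example runs without a database) and assuming a single consumer thread calls onMessage for each record read from “Job-Status-Queue”. Ignoring messages older than the stored row is a design choice of this sketch, not something the source specifies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical row of the job status table.
record JobStatusRow(String jobId, String state, long updatedAtMillis) {}

// Stand-in for the OLTP job status table.
interface JobStatusRepository {
    Optional<JobStatusRow> find(String jobId);
    void upsert(JobStatusRow row);
}

// In-memory implementation so the sketch runs without a database.
final class InMemoryJobStatusRepository implements JobStatusRepository {
    private final Map<String, JobStatusRow> rows = new HashMap<>();

    public Optional<JobStatusRow> find(String jobId) {
        return Optional.ofNullable(rows.get(jobId));
    }

    public void upsert(JobStatusRow row) {
        rows.put(row.jobId(), row);
    }
}

// Applies status messages; find-then-upsert is not atomic, so one consumer thread is assumed.
final class JobStatusUpdater {
    private final JobStatusRepository repository;

    JobStatusUpdater(JobStatusRepository repository) {
        this.repository = repository;
    }

    // Returns true if the table was updated; messages older than the stored row are ignored.
    boolean onMessage(String jobId, String state, long timestampMillis) {
        Optional<JobStatusRow> current = repository.find(jobId);
        if (current.isPresent() && current.get().updatedAtMillis() > timestampMillis) {
            return false;
        }
        repository.upsert(new JobStatusRow(jobId, state, timestampMillis));
        return true;
    }
}
```

In a real consumer, onMessage would be called from the KafkaConsumer.poll(Duration) loop subscribed to “Job-Status-Queue”, and the repository would write to the OLTP job status table.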

Sample real-time web application job details table structures:
Job table:
Job status table:
Monitor Job Status from Real-time Web Application
Current Implementation [Spark Listener Instrumentation]:
• The executable Spark job should register a Spark listener.
• The Spark listener event handler(s) are triggered for each job event.
• The event handler publishes the job status message to “Job-Status-Queue”.
• The user can view job status in the “View Job Status” tab of the real-time web application UI, from the OLTP database.
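In Spark, a listener is a subclass of org.apache.spark.scheduler.SparkListener whose callbacks (for example onJobStart and onJobEnd) fire on scheduler events; it can be registered with SparkContext.addSparkListener or through the spark.extraListeners setting. To keep the example free of Spark and Kafka dependencies, the sketch below models only the handler logic: a hypothetical StatusPublisher stands in for the Kafka producer writing to “Job-Status-Queue”, and the callbacks take plain job ids and a success flag instead of Spark's event objects. It is an assumption about how the handlers could be written, not the author's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over the Kafka producer that writes to Job-Status-Queue.
interface StatusPublisher {
    void publish(String key, String message);
}

// Handler logic that a SparkListener subclass could delegate to from onJobStart/onJobEnd.
final class JobStatusEventHandler {
    private final String appJobId;          // id of the job row in the web application (hypothetical)
    private final StatusPublisher publisher;

    JobStatusEventHandler(String appJobId, StatusPublisher publisher) {
        this.appJobId = appJobId;
        this.publisher = publisher;
    }

    void onJobStart(int sparkJobId) {
        publisher.publish(appJobId, "sparkJob=" + sparkJobId + ";state=RUNNING");
    }

    void onJobEnd(int sparkJobId, boolean succeeded) {
        String state = succeeded ? "SUCCEEDED" : "FAILED";
        publisher.publish(appJobId, "sparkJob=" + sparkJobId + ";state=" + state);
    }
}

// Test double that records what would have been sent to Kafka.
final class RecordingPublisher implements StatusPublisher {
    final List<String> sent = new ArrayList<>();

    public void publish(String key, String message) {
        sent.add(key + " -> " + message);
    }
}
```

Using the web application's job id as the message key means that, with Kafka's default partitioning, all status messages for one job land in the same partition and are consumed in the order they were sent.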
Design option 2 [Polling HDFS Consumer]:
• The executable Spark job writes job status to HDFS.
• A Camel HDFS polling consumer in the real-time web application keeps checking the above location; once it finds a job status file, it reads the file content and writes it to the database.
• The user can view job status in the “Job Status” tab of the real-time web application UI, from the database.
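The polling consumer in option 2 follows a simple loop: list the status directory, read each new file, store its content, then move the file aside so it is not processed twice. The option itself uses Apache Camel's HDFS component; the sketch below is a swapped-in, dependency-free illustration of the same pattern using java.nio.file on a local directory standing in for the HDFS location, with a hypothetical “.status” file suffix and “processed” subdirectory, and a BiConsumer standing in for the database write.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// One polling pass over a status directory; a local directory stands in for the HDFS location.
final class StatusFilePoller {
    private final Path inbox;
    private final Path processed;
    private final BiConsumer<String, String> sink; // (file name, content) -> database write

    StatusFilePoller(Path inbox, BiConsumer<String, String> sink) throws IOException {
        this.inbox = inbox;
        this.processed = Files.createDirectories(inbox.resolve("processed"));
        this.sink = sink;
    }

    // Reads every *.status file, hands its content to the sink, then moves the file aside
    // so the next pass does not process it again. Returns the number of files handled.
    int pollOnce() throws IOException {
        List<Path> batch = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inbox, "*.status")) {
            for (Path file : files) {
                batch.add(file);
            }
        }
        for (Path file : batch) {
            String content = Files.readString(file, StandardCharsets.UTF_8);
            sink.accept(file.getFileName().toString(), content);
            Files.move(file, processed.resolve(file.getFileName()));
        }
        return batch.size();
    }
}
```

A scheduler would call pollOnce() at a fixed interval. To avoid reading half-written files, the writing job would ideally write under a temporary name and rename the file to its “.status” name only when it is complete.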
Design option 3 [UI Uses HBase for Job status]:
• The executable Spark job writes job status to HBase.
• The user can view job status in the “Job Status” tab of the real-time web application UI, from HBase.
Design option 4 [UI Uses Spark History Server REST API]:
• Enable the Spark history server.
• The user can view job status in the “Job Status” tab of the real-time web application UI by using the Spark history server REST API.
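The Spark history server exposes its REST API under /api/v1 (port 18080 by default), for example /applications and /applications/[app-id]/jobs, the latter accepting an optional status filter (running, succeeded, failed or unknown). It only has data for applications that write event logs (spark.eventLog.enabled=true). A minimal sketch of the UI backend's call using the JDK's java.net.http.HttpClient, assuming a hypothetical host name “spark-history” and returning the raw JSON for the UI to render:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Reads job status for one application from the Spark history server REST API.
final class SparkHistoryClient {
    private final String baseUrl; // e.g. "http://spark-history:18080" (hypothetical host name)
    private final HttpClient http = HttpClient.newHttpClient();

    SparkHistoryClient(String baseUrl) {
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    // Builds <base>/api/v1/applications/<appId>/jobs, optionally filtered by job status
    // (running, succeeded, failed or unknown). appId is used as-is.
    URI jobsUri(String appId, String status) {
        String uri = baseUrl + "/api/v1/applications/" + appId + "/jobs";
        if (status != null) {
            uri += "?status=" + status;
        }
        return URI.create(uri);
    }

    // Returns the raw JSON job list; parsing and rendering are left to the UI layer.
    String fetchJobs(String appId, String status) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(jobsUri(appId, status)).GET().build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("History server returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

For applications run in YARN cluster mode, the Spark documentation describes the application segment as [base-app-id]/[attempt-id]; passing that combined value as appId works here because the id is not URL-encoded.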
