Author: Rajasekaran Kandhasamy
Real-Time Web App Integration with Hadoop On Docker
Reference Architecture
Introduction
Big Data Deployment
Data Exchange/Gateway Layer [Docker Host 1]
Data Center/Store/Processing Layer [Docker Host 2]
Packaging and bundling
Real Time Web Application - Big Data: Integration Architecture
Broker.jar
ml-jobs.jar
Hadoop Job Configurations through Real Time Web Application
Manage Jobs UI Layout
Future Enhancements
Add/Edit Job UI Layout
Execute Button Action
View Job Status Button Action
Sample real-time web application job details table structures
Job table
Job status table
Monitor Job Status from Real-time Web Application
Current Implementation [Spark Listener Instrumentation]
Design option 2 [Polling HDFS Consumer]
Design option 3 [UI Uses HBase for Job Status]
Design option 4 [UI Uses Spark History Server REST API]
Introduction
• Docker containers simplify the process of building and shipping apps. A Hadoop deployment on Docker provides a lightweight, disposable environment for learning and exploring new technology, trying out new ideas, and doing continuous integration before testing at scale.
• For the single-node setup we used the Hortonworks Docker image; for the multi-node setup we used 2 Docker hosts. [Setting this up is a separate process.]
• The proposed big data system consists of Hadoop supporting components such as:
  o Apache Spark – for large-scale data processing
  o Apache Sqoop – to integrate OLTP databases
  o Apache HBase – NoSQL store
  o Apache Kafka – acts as a distributed ESB between the real-time application and the cluster
  o Apache Flume – web application log streaming
  o Hadoop core [DFS, YARN]
• This document covers:
  o how the Hadoop modules are deployed in a cluster environment with Docker support
  o how the modules interact with each other
  o how different Hadoop-related jobs are configured through the real-time web application
  o how Hadoop jobs are executed through the real-time web application
  o how job submission status is monitored through the real-time web application
Big Data Deployment
• The proposed big data system consists of one “Data Exchange/Gateway Layer [Docker Host 1]” and one or more “Data Center/Data Store/Data Processing Layer [Docker Host 2 (N)]”.
• Horizontal scaling can be achieved in the “Data Store/Processing Layer” by adding an additional Docker host using the “docker-machine create <HOST-NAME>” command.
• There is no separate cluster setup for Spark. Why?
  o Spark leverages the YARN environment for data processing,
  o the hardware is better utilized,
  o no separate machine is needed for Spark.
Data Exchange/Gateway Layer [Docker Host 1]
• Communication between the real-time web application and the big data cluster/ecosystem happens through this layer.
• Mediator components such as Flume, Kafka, MapReduce clients and Spark clients are deployed here as individual Docker containers in Docker Host 1.
• No data storage or distributed computation happens here.
• All big data job submissions from the real-time web application [Spark jobs, MapReduce jobs, Sqoop jobs] go through Kafka to enforce a generic adapter design pattern.
• Data ingestion is done through Flume sink(s) and the Sqoop server.
• Data processing is initiated through Kafka and Spark.
Data Center/Store/Processing Layer [Docker Host 2]
• This layer is dedicated to data storage and distributed data processing.
• Access to the data is more secure here, and the layer can enable cloud-based multi-tenancy for data as well as for processing.
• Use the horizontal scaling approach to add a new Docker host/hardware or to add a new tenant.
Packaging and bundling:
• Sqoop:
  o Sqoop server artifacts are packaged together and deployed in the Docker Host 1 Sqoop server location.
  o Sqoop client artifacts are packaged together and deployed in the Docker Host 1 Kafka location.
• Kafka:
  o Kafka producers and consumers are packaged separately and deployed in the Docker Host 1 Kafka location.
• Flume:
  o The Flume sink is packaged separately and deployed in the Docker Host 1 Flume location.
Note: Some Kafka producers and Flume agents may run on the real-time web application server; these need to be deployed there accordingly.
Real Time Web Application - Big Data: Integration Architecture
[Figure: integration architecture diagram. Recoverable labels: Web Application (Modules, Database, Log Configuration); Data Exchange/Gateway Layer on Docker Host 1 (Kafka Server with Job-Submit-Queue and Job-Status-Queue, Broker.jar with KafkaBootStrap and SparkLauncher, Spark Server with ml-jobs.jar containing SparkListener, ML Logic Classes and Job Status Updater, Sqoop Server, Flume HBase Sink); Data Store/Processing Layer on Docker Host 2 to Docker Host N (YARN RM/NM, HDFS NN/DN, HBase Master/Region, ZK) with horizontal scaling; job status from YARN.]
Broker.jar:
• This jar runs as a service that starts the Kafka consumers [Java programs] and is deployed in the Kafka server location [Docker Host 1]. Note: the Kafka server is already running at this point.
• One of the Kafka consumers inside this jar, “SparkJobExecutor.java”, listens to the queue “Job-Submit-Queue”.
• Whenever a job is submitted through the Kafka producer available in the real-time web application, this consumer consumes the message and launches the Spark client based on the incoming parameters.
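The consume-and-launch step described above can be sketched roughly as follows. This is a minimal, dependency-free sketch and not the actual SparkJobExecutor code: the "key=value;key=value" message format, the parameter names (fqcn, jar, mode) and the class name SparkJobExecutorSketch are assumptions made for illustration, and the sketch builds a plain spark-submit argument list rather than calling Spark's launcher API or a real Kafka consumer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: map a Job-Submit-Queue message to a Spark launch command (hypothetical format). */
class SparkJobExecutorSketch {

    /** Parses an assumed "key=value;key=value" message body into an ordered map. */
    static Map<String, String> parseMessage(String message) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : message.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return params;
    }

    /** Builds spark-submit arguments for a job that runs on YARN. */
    static List<String> buildCommand(Map<String, String> params) {
        String fqcn = params.get("fqcn"); // fully qualified class name of the job
        String jar = params.get("jar");   // jar that contains the job class
        if (fqcn == null || jar == null) {
            throw new IllegalArgumentException("message must carry 'fqcn' and 'jar'");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--master");
        cmd.add("yarn");
        cmd.add("--deploy-mode");
        cmd.add(params.getOrDefault("mode", "cluster")); // assumed default execution mode
        cmd.add("--class");
        cmd.add(fqcn);
        cmd.add(jar);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical message as the web application's Kafka producer might send it.
        String message = "fqcn=com.example.ml.KMeansJob;jar=ml-jobs.jar;mode=cluster";
        System.out.println(String.join(" ", buildCommand(parseMessage(message))));
    }
}
```

Keeping the message a flat set of parameters means the same queue can carry Spark, MapReduce and Sqoop submissions, with each consumer picking out only the keys it understands.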
ml-jobs.jar:
• This jar is deployed in the Spark server location [Docker Host 1].
• It contains a list of machine learning implementations that can run in the Hadoop distributed environment.
• All machine learning implementations must implement “SparkListener” in order to publish the job status to “Job-Status-Queue”, so that the real-time web application can consume the job messages from this queue and update the OLTP database.
Hadoop Job Configurations through Real Time Web Application:
• Introduce a “Manage Big Data” link in the UI main menu.
• The “Manage Big Data” page consists of a “Manage Jobs” tab.
Manage Jobs UI Layout:
• Contains a “Job Search” widget and a “Job Lists” data table; the data table shows the list of configured jobs.
• “Job Lists” contains “Add”, “Delete”, “Edit”, “Execute” and “View Job Status” buttons to perform the corresponding actions on configured jobs.
• The “Add” or “Edit” job dialog screen provides job name, description, job type, FQCN and execution mode configurations.
Future Enhancements:
• Enable rule-based routing information for jobs,
• Enable rule-based alerts or events for jobs,
• More UI options on Camel route configurations.
Add/Edit Job UI Layout:
• Clicking “Add/Save” should save the above configuration in the real-time web application OLTP database.
• The user can see the saved data in the table view.
Execute Button Action:
View Job Status Button Action:
• The user can view the status and log information of executed jobs from the clusters.
• This information is retrieved from the job status table.
• A Kafka consumer running inside the real-time web application periodically listens to “Job-Status-Queue” from the big data cluster and updates the job status in the OLTP database.
Sample real-time web application job details table structures:
Job table:
Job status table:
Monitor Job Status from Real-time Web Application
Current Implementation [Spark Listener Instrumentation]:
• The executable Spark job should register a Spark Listener.
• The Spark Listener event handler(s) will be triggered for each job event.
• The event handler should publish the job status message to “Job-Status-Queue”.
• The user can view the job status in the “View Job Status” tab of the real-time web application UI, which reads it from the OLTP database.
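The publish-and-consume flow above can be illustrated with a small in-memory sketch. It is not the actual implementation: a BlockingQueue stands in for the “Job-Status-Queue” Kafka topic, a map stands in for the job status table in the OLTP database, and the "jobId|status" message format and all class and method names are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** In-memory sketch of the job status flow; no Kafka or Spark dependency. */
class JobStatusFlowSketch {

    /** Stand-in for the "Job-Status-Queue" Kafka topic. */
    static final BlockingQueue<String> JOB_STATUS_QUEUE = new LinkedBlockingQueue<>();

    /** Stand-in for the job status table: job id -> latest status. */
    static final Map<String, String> JOB_STATUS_TABLE = new ConcurrentHashMap<>();

    /** Cluster side: what a listener event handler would do for each job event. */
    static void publishStatus(String jobId, String status) {
        JOB_STATUS_QUEUE.add(jobId + "|" + status); // assumed "jobId|status" format
    }

    /** Web application side: drain pending messages and update the status table. */
    static int consumePending() {
        int handled = 0;
        String message;
        while ((message = JOB_STATUS_QUEUE.poll()) != null) {
            int sep = message.indexOf('|');
            JOB_STATUS_TABLE.put(message.substring(0, sep), message.substring(sep + 1));
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        publishStatus("job-1", "STARTED");
        publishStatus("job-1", "SUCCEEDED");
        consumePending();
        System.out.println(JOB_STATUS_TABLE.get("job-1"));
    }
}
```

Because messages are applied in queue order, the table always ends up holding the most recent status published for each job.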
Design option 2 [Polling HDFS Consumer]:
• The executable Spark job should write the job status to HDFS.
• The real-time web application's Camel HDFS polling consumer keeps checking the above location; once it finds a job status file, it reads the content of the file and writes it to the database.
• The user can view the job status in the “Job Status” tab of the real-time web application UI, which reads it from the database.
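The polling idea in this option can be sketched as a single polling pass. This is only an illustration under stated assumptions: a local directory stands in for the HDFS location, plain java.nio calls stand in for the Camel HDFS polling consumer, and the "<jobId>.status" file naming and the class name StatusFilePollerSketch are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of one polling pass over a status directory (local stand-in for HDFS). */
class StatusFilePollerSketch {

    /** Reads every "<jobId>.status" file, records its content, then deletes the file. */
    static Map<String, String> pollOnce(Path statusDir) throws IOException {
        // Collect matching files first so nothing is deleted while the listing is open.
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(statusDir, "*.status")) {
            for (Path file : files) {
                found.add(file);
            }
        }
        Map<String, String> statuses = new HashMap<>();
        for (Path file : found) {
            String name = file.getFileName().toString();
            String jobId = name.substring(0, name.length() - ".status".length());
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            statuses.put(jobId, content); // the real consumer would write this to the database
            Files.delete(file);           // consume the file so it is not processed twice
        }
        return statuses;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-status");
        Files.write(dir.resolve("job-1.status"), "SUCCEEDED".getBytes(StandardCharsets.UTF_8));
        System.out.println(pollOnce(dir));
    }
}
```

Deleting each file after reading it is one simple way to avoid reporting the same status twice; moving processed files to an archive location would be an alternative.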
Design option 3 [UI Uses HBase for Job Status]:
• The executable Spark job should write the job status to HBase.
• The user can view the job status in the “Job Status” tab of the real-time web application UI, which reads it from HBase.
Design option 4 [UI Uses Spark History Server REST API]:
• Enable the Spark history server.
• The user can view the job status in the “Job Status” tab of the real-time web application UI, which uses the Spark history server REST API.
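As a rough illustration of this option, the sketch below builds the URL of the history server's jobs endpoint for one application. The path /api/v1/applications/[app-id]/jobs and the default port 18080 come from Spark's monitoring REST API; the host name, the application id and the class name HistoryServerUrlSketch are hypothetical. For applications run in YARN cluster mode, the documented path may also need an attempt id after the application id.

```java
import java.net.URI;

/** Sketch: build the Spark history server REST URL that lists the jobs of one application. */
class HistoryServerUrlSketch {

    /** Default port of the Spark history server. */
    static final int DEFAULT_HISTORY_PORT = 18080;

    /** Returns the "jobs" endpoint for the given application id. */
    static URI jobsEndpoint(String host, int port, String appId) {
        return URI.create("http://" + host + ":" + port
                + "/api/v1/applications/" + appId + "/jobs");
    }

    public static void main(String[] args) {
        // Host name and application id below are placeholders for illustration.
        System.out.println(jobsEndpoint("docker-host-1", DEFAULT_HISTORY_PORT, "application_1_0001"));
    }
}
```

The web application would issue an HTTP GET against this URL and render the returned JSON job list in the “Job Status” tab, which avoids maintaining a separate status queue or table.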

The Art of the Pitch: WordPress Relationships and Sales
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 

Real-Time Web App Integration with Hadoop on Docker

  • 1. Author: Rajasekaran Kandhasamy
    Real-Time Web App Integration with Hadoop on Docker
    Reference Architecture

    Introduction
    Big Data Deployment
      Data Exchange/Gateway Layer [Docker Host 1]
      Data Center/Store/Processing Layer [Docker Host 2]
      Packaging and bundling
    Real Time Web Application - Big Data: Integration Architecture
      Broker.jar
      ml-jobs.jar
    Hadoop Job Configurations through Real Time Web Application
      Manage Jobs UI Layout
        Future Enhancements
        Add/Edit Job UI Layout
        Execute Button Action
        View Job Status Button Action
      Sample real-time web application job details table structures
        Job table
        Job status table
      Monitor Job Status from Real-time Web Application
        Current Implementation [Spark Listener Instrumentation]
        Design option 2 [Polling HDFS Consumer]
        Design option 3 [UI Uses HBase for Job status]
        Design option 4 [UI Uses Spark History Server REST API]
  • 2. Introduction
    - The emergence of Docker containers simplifies the process of building and shipping apps. A Hadoop deployment on Docker provides a lightweight, disposable environment for learning and exploring new technology, trying out new ideas, and doing continuous integration before testing at scale.
    - For the single-node setup we used the Hortonworks Docker image; for the multi-node setup we used 2 Docker hosts. [Setting this up is a separate process.]
    - The proposed big data system consists of Hadoop and supporting components such as:
      o Apache Spark: large-scale data processing
      o Apache Sqoop: integration with OLTP databases
      o Apache HBase: NoSQL store
      o Apache Kafka: acts as a distributed ESB between the real-time application and the cluster
      o Apache Flume: web application log streaming
      o Hadoop core [DFS, YARN]
    - This document covers:
      o how the Hadoop modules are deployed in a cluster environment with Docker support
      o how the modules interact with each other
      o how different Hadoop-related jobs are configured through the real-time web application
      o how Hadoop jobs are executed through the real-time web application
      o how job submission status is monitored through the real-time web application

    Big Data Deployment
    - The proposed big data system consists of one “Data Exchange/Gateway Layer [Docker Host 1]” and one or more “Data Center/Data Store/Data Processing Layer [Docker Host 2(N)]”.
    - Horizontal scaling can be achieved in the “Data Store/Processing Layer” by adding Docker hosts with the “docker-machine create <HOST-NAME>” command.
    - There is no separate cluster setup for Spark. Why?
      o Spark leverages the YARN environment for data processing.
      o Hardware utilization is higher.
      o No separate machine is needed for Spark.

    Data Exchange/Gateway Layer [Docker Host 1]
    - Communication between the real-time web application and the big data cluster/ecosystem happens through this layer.
    - Mediator components such as Flume, Kafka, MapReduce clients and Spark clients are deployed here as individual Docker containers in Docker Host 1.
    - No data storage or distributed computation takes place here.
    - All big data job submissions from the real-time web application (Spark, MapReduce and Sqoop jobs) go through Kafka to enforce a generic adapter design pattern.
    - Data ingestion is done through Flume sink(s) and the Sqoop server.
  • 3. - Data processing is initiated through Kafka and Spark.

    Data Center/Store/Processing Layer [Docker Host 2]
    - This layer is dedicated to data storage and distributed data processing.
    - Data access is more secure, and the layer can enable cloud-based multi-tenancy for data as well as for processing.
    - Use horizontal scaling to add a new Docker host/hardware or to add a new tenant.

    Packaging and bundling:
    - Sqoop:
      o Sqoop server artifacts are packaged together and deployed in the Docker Host 1 Sqoop server location.
      o Sqoop client artifacts are packaged together and deployed in the Docker Host 1 Kafka location.
    - Kafka:
      o Kafka producers and consumers are packaged separately and deployed in the Docker Host 1 Kafka location.
    - Flume: the Flume sink is packaged separately and deployed in the Docker Host 1 Flume location.
    Note: Some Kafka producers and Flume agents may run on the real-time web application server; they need to be deployed there accordingly.
  • 4. Real Time Web Application - Big Data: Integration Architecture
    [Architecture diagram. Labels recoverable from the figure:
    Web Application: Modules, Database, Log Configuration.
    Docker Host 1 (Data Exchange/Gateway Layer): Flume with HBase Sink; Kafka Server with Job-Submit-Queue and Job-Status-Queue; Broker.jar (KafkaBootStrap, SparkLauncher); Sqoop Server; Spark Server with ml-jobs.jar (SparkListener, ML Logic Classes, Job Status Updater).
    Docker Host 2 to Docker Host N (Data Store/Processing Layer, horizontal scaling): YARN (RM, NM), HDFS (NN, DN), HBase (Master, Region), ZK.
    Job status is reported from YARN.]
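The diagram routes every job through two Kafka queues, “Job-Submit-Queue” and “Job-Status-Queue”. The original does not specify the message format, so the following is only a minimal sketch of what a generic job-submit envelope could look like, assuming a JSON payload. The field names (`job_id`, `job_type`, `fqcn`, `execution_mode`) and the set of job type strings are hypothetical; only the queue names come from the document.

```python
import json
from dataclasses import asdict, dataclass

# Queue names are taken from the architecture diagram.
JOB_SUBMIT_TOPIC = "Job-Submit-Queue"
JOB_STATUS_TOPIC = "Job-Status-Queue"

# Hypothetical job type labels for the three job kinds the document lists.
JOB_TYPES = {"SPARK", "MAPREDUCE", "SQOOP"}


@dataclass(frozen=True)
class JobSubmitMessage:
    """One envelope shape for all job kinds (hypothetical field names)."""

    job_id: str
    job_type: str        # one of JOB_TYPES
    fqcn: str            # fully qualified class name of the job entry point
    execution_mode: str  # for example "client" or "cluster"

    def to_bytes(self) -> bytes:
        # Serialize to UTF-8 JSON, the form a Kafka producer would send.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @staticmethod
    def from_bytes(payload: bytes) -> "JobSubmitMessage":
        # Parse a consumed message and reject job types the adapter does not know.
        data = json.loads(payload.decode("utf-8"))
        if data.get("job_type") not in JOB_TYPES:
            raise ValueError(f"unsupported job type: {data.get('job_type')!r}")
        return JobSubmitMessage(**data)
```

Keeping one envelope for Spark, MapReduce and Sqoop jobs is what lets a single consumer side decide how to launch each job, which is the point of the generic adapter pattern mentioned for the gateway layer.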
  • 5. Broker.jar:
    - This jar runs as a service that starts the Kafka consumers [Java programs]. It is deployed in the Kafka server location [Docker Host 1]. Note: the Kafka server must already be started.
    - One of the Kafka consumers inside this jar, “SparkJobExecutor.java”, listens to the queue “Job-Submit-Queue”.
    - Whenever a job is submitted through the Kafka producer in the real-time web application, this consumer consumes the message and launches the Spark client based on the incoming parameters.

    ml-jobs.jar:
    - This jar is deployed in the Spark server location [Docker Host 1].
    - It contains the machine learning implementations that can run in the Hadoop distributed environment.
    - Every machine learning implementation must implement “SparkListener” in order to publish the job status to “Job-Status-Queue”, so that the real-time web application can consume job messages from this queue and update the OLTP database.

    Hadoop Job Configurations through Real Time Web Application:
    - Introduce a “Manage Big Data” link in the UI main menu.
    - The “Manage Big Data” page consists of a “Manage Jobs” tab.

    Manage Jobs UI Layout:
    - Contains a “Job Search” widget and a “Job Lists” data table that shows the list of configured jobs.
    - “Job Lists” contains “Add”, “Delete”, “Edit”, “Execute” and “View Job Status” buttons to perform the corresponding actions on configured jobs.
    - The “Add” or “Edit” job dialog screen provides job name, description, job type, FQCN and execution mode configurations.

    Future Enhancements:
    - Enable rule-based routing information for jobs.
    - Enable rule-based alerts or events for jobs.
    - More UI options for Camel route configurations.

    Add/Edit Job UI Layout:
    - Clicking “Add/Save” should save the above configuration in the real-time web application OLTP database.
    - The user can see the saved data in the table view.
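The fields captured in the Add/Edit dialog (job name, job type, FQCN, execution mode) are what a consumer such as SparkJobExecutor needs in order to launch a job. The original implementation is in Java and its launch code is not shown, so the sketch below swaps in a different, simpler technique: it builds a `spark-submit` command line from a job message. The dictionary keys and the mapping of "execution mode" to `--deploy-mode` are assumptions made for illustration; the `spark-submit` flags used are standard ones.

```python
from typing import Dict, List


def build_spark_submit_command(job: Dict[str, str], app_jar: str) -> List[str]:
    """Translate a job message into a spark-submit argument list.

    The keys job_name, job_type, fqcn and execution_mode are hypothetical
    names for the fields the Add/Edit dialog collects.
    """
    if job["job_type"] != "SPARK":
        raise ValueError("this executor only handles Spark jobs")
    if job["execution_mode"] not in ("client", "cluster"):
        raise ValueError(f"unknown execution mode: {job['execution_mode']!r}")
    return [
        "spark-submit",
        "--master", "yarn",                      # jobs run on YARN, no separate Spark cluster
        "--deploy-mode", job["execution_mode"],  # assumed meaning of "execution mode"
        "--class", job["fqcn"],                  # entry point inside the application jar
        "--name", job["job_name"],
        app_jar,
    ]
```

The returned list can be handed to `subprocess.run(...)` on a host where `spark-submit` is installed. Returning a list instead of a shell string avoids quoting problems when job names contain spaces.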
  • 6. Execute Button Action:

    View Job Status Button Action:
    - The user can view the status of executed jobs and log information from the clusters.
    - This information is retrieved from the job status table.
    - A Kafka consumer running inside the real-time web application periodically listens to “Job-Status-Queue” from the big data cluster and updates the job status in the OLTP database.
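The status consumer described above boils down to "take a message from Job-Status-Queue, write the latest status for that job into the database". A minimal sketch of that write path follows, using an in-process SQLite database as a stand-in for the OLTP database and a JSON payload as the message format. The table name, column names and message fields are all hypothetical; the document does not show the real job status table structure.

```python
import json
import sqlite3


def create_job_status_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per job holding its latest known status.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS job_status (
               job_id     TEXT PRIMARY KEY,
               status     TEXT NOT NULL,
               detail     TEXT,
               updated_at TEXT NOT NULL
           )"""
    )


def apply_status_message(conn: sqlite3.Connection, payload: bytes) -> None:
    """Store the status carried by one consumed message, replacing any older row."""
    msg = json.loads(payload.decode("utf-8"))
    with conn:  # commits on success, rolls back if the statement fails
        conn.execute(
            "INSERT OR REPLACE INTO job_status (job_id, status, detail, updated_at) "
            "VALUES (?, ?, ?, ?)",
            (msg["job_id"], msg["status"], msg.get("detail"), msg["updated_at"]),
        )
```

In a real deployment the loop that calls `apply_status_message` would be driven by a Kafka consumer subscribed to “Job-Status-Queue”; the UI then only reads from the table, as the bullets above describe.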
  • 7. Sample real-time web application job details table structures:
    Job table:
    Job status table:

    Monitor Job Status from Real-time Web Application

    Current Implementation [Spark Listener Instrumentation]:
    - The executable Spark job should register a Spark listener.
    - The Spark listener event handler(s) are triggered for each job event.
    - The event handler should publish the job status message to “Job-Status-Queue”.
    - The user can view the job status in the “View Job Status” tab in the real-time web application UI, served from the OLTP database.

    Design option 2 [Polling HDFS Consumer]:
    - The executable Spark job should write the job status to HDFS.
    - A Camel HDFS polling consumer in the real-time web application keeps checking that location; once it finds a job status file, it reads the file content and writes it to the database.
    - The user can view the job status in the “Job Status” tab in the real-time web application UI, served from the database.

    Design option 3 [UI Uses HBase for Job status]:
    - The executable Spark job should write the job status to HBase.
    - The user can view the job status in the “Job Status” tab in the real-time web application UI, served from HBase.

    Design option 4 [UI Uses Spark History Server REST API]:
    - Enable the Spark history server.
    - The user can view the job status in the “Job Status” tab in the real-time web application UI, which uses the Spark history server REST API.
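For design option 4, the Spark history server exposes job information under `/api/v1/applications/[app-id]/jobs`, where each job entry carries a `status` such as `RUNNING`, `SUCCEEDED`, `FAILED` or `UNKNOWN`. The sketch below shows one way the web application could turn that list into a single status for the “Job Status” tab. The roll-up rule (any failure wins, then any running job, then all succeeded) and the function names are assumptions, not something the original specifies.

```python
import json
from typing import Callable, Dict, List
from urllib.request import urlopen


def jobs_url(base_url: str, app_id: str) -> str:
    # base_url is the history server address, e.g. "http://<host>:18080".
    return f"{base_url.rstrip('/')}/api/v1/applications/{app_id}/jobs"


def http_get_json(url: str):
    # Plain HTTP GET returning the decoded JSON body.
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def overall_status(jobs: List[Dict]) -> str:
    """Collapse per-job statuses into one value (assumed roll-up rule)."""
    statuses = {job["status"] for job in jobs}
    if "FAILED" in statuses:
        return "FAILED"
    if "RUNNING" in statuses:
        return "RUNNING"
    if statuses == {"SUCCEEDED"}:
        return "SUCCEEDED"
    return "UNKNOWN"  # empty list or only UNKNOWN entries


def fetch_application_status(
    base_url: str, app_id: str, get_json: Callable[[str], List[Dict]] = http_get_json
) -> str:
    # get_json is injectable so the logic can be exercised without a live server.
    return overall_status(get_json(jobs_url(base_url, app_id)))
```

Compared with the listener-based implementation, this option needs no instrumentation inside the Spark jobs, at the cost of the UI polling the history server instead of receiving pushed status messages.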