Apache Falcon
DevOps
Sanjeev Tripurari
Tech Lead Operations@inmobi
Falcon is a feed processing and feed management system aimed at making it easier for
end consumers to onboard their feed processing and feed management on Hadoop
clusters.
http://falcon.apache.org
What’s on GRID
/user/sanjeev
/user/mohit
/user/Iliyas
/projects/meetup
/projects/support
/data/stream/click
/data/stream/beacon
Basic Components
Falcon
• Prism
• Server
• Client
ActiveMQ
Oozie
Hadoop
What’s in It for DevOps
Cluster
NameNode, JT, Oozie, ActiveMQ, Colo
Feed
Data, DataPath, Lifetime, Retention, Owner, Replication
Process
Job, Queue, Priority, Parallelism, Input, Output, Workflow
Basic Environment Setup
Prism
UK: Server, Oozie, ActiveMQ, HDFS - MR1/2
US: Server, Oozie, ActiveMQ, HDFS - MR1/2
Logical Setup
UK: uk-clusterAlpha, uk-clusterBeta
US: us-clusterGamma
prism
Falcon Entity Operation
Command:
falcon entity -submit -type [cluster/feed/process] -file cluster-definition.xml
falcon entity -list -type [cluster/feed/process]
Cluster
• Submit
• Delete
falcon entity -type [feed/process] -name [processname/feedname] -[OPTIONS]
Feed/Process OPTIONS
• schedule
• status
• list
• touch
• dependency
• definition
• update
• delete
Falcon Cluster
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
<interfaces>
<interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
<interface type="write" endpoint="hdfs://nn.cluster.my.com:8020" version="0.20.2-cdh3u3"/>
<interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
<interface type="workflow" endpoint="http://oozie.cluster.my.com:11000/oozie/" version="3.1.6"/>
<interface type="messaging" endpoint="tcp://amq.cluster.my.com:61616?daemon=true" version="5.4.3"/>
</interfaces>
<locations>
<location name="staging" path="/store/falcon/staging"/>
<location name="temp" path="/tmp"/>
<location name="working" path="/store/falcon/working"/>
</locations>
<properties>
<property name="colo.name" value="uk"/>
</properties>
</cluster>
Falcon Feed
<feed description="input feed" name="uk-inputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<groups>input</groups>
<frequency>hours(1)</frequency>
<late-arrival cut-off="hours(6)" />
<clusters>
<cluster name="uk-clusterAlpha" type="source">
<validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
<retention limit="hours(24)" action="delete" />
</cluster>
</clusters>
<locations>
<location type="data" path="/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
</locations>
<ACL owner="sanjeev" group="users" permission="0x755" />
<schema location="/none" provider="none" />
</feed>
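The `location` path in the feed above is a template: each feed instance resolves `${YEAR}`, `${MONTH}`, `${DAY}` and `${HOUR}` from its instance time. The following is a minimal Python sketch of that substitution, not Falcon's own code; the function name `expand_feed_path` is hypothetical and only the variables used in this deck (plus `${MINUTE}`) are handled.

```python
from datetime import datetime, timezone


def expand_feed_path(template: str, instance_time: datetime) -> str:
    """Replace Falcon-style ${VAR} placeholders with zero-padded time fields."""
    values = {
        "YEAR": f"{instance_time.year:04d}",
        "MONTH": f"{instance_time.month:02d}",
        "DAY": f"{instance_time.day:02d}",
        "HOUR": f"{instance_time.hour:02d}",
        "MINUTE": f"{instance_time.minute:02d}",
    }
    path = template
    for name, value in values.items():
        path = path.replace("${" + name + "}", value)
    return path


template = "/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}"
instance = datetime(2015, 2, 20, 18, 0, tzinfo=timezone.utc)
print(expand_feed_path(template, instance))
# prints /user/sanjeev/falcon/input/2015/02/20/18
```

The hourly frequency of the feed means one such directory is expected per hour within the validity window.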
Falcon Feed
<feed description="output feed" name="uk-outputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<groups>output</groups>
<frequency>hours(1)</frequency>
<late-arrival cut-off="hours(6)" />
<clusters>
<cluster name="uk-clusterAlpha" type="source">
<validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
<retention limit="hours(24)" action="delete" />
</cluster>
</clusters>
<locations>
<location type="data" path="/user/sanjeev/falcon/output/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
</locations>
<ACL owner="sanjeev" group="users" permission="0x755" />
<schema location="/none" provider="none" />
</feed>
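Both feeds declare `<retention limit="hours(24)" action="delete" />`. As a rough illustration of what that limit means, the Python sketch below treats an instance as eligible for deletion once it is older than the limit. It is an assumption-laden approximation, not Falcon's eviction code: the helper names `parse_limit` and `is_past_retention` are hypothetical, and only `minutes(N)`, `hours(N)` and `days(N)` expressions are parsed.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_limit(expr: str) -> timedelta:
    """Turn an expression such as 'hours(24)' into a timedelta."""
    match = re.fullmatch(r"(minutes|hours|days)\((\d+)\)", expr)
    if match is None:
        raise ValueError(f"unsupported limit expression: {expr!r}")
    unit, count = match.groups()
    return timedelta(**{unit: int(count)})


def is_past_retention(instance_time: datetime, current_time: datetime, limit_expr: str) -> bool:
    """True when the instance is older than the retention limit."""
    return current_time - instance_time > parse_limit(limit_expr)


instance = datetime(2015, 2, 20, 18, 0, tzinfo=timezone.utc)
print(is_past_retention(instance, datetime(2015, 2, 21, 17, 0, tzinfo=timezone.utc), "hours(24)"))
print(is_past_retention(instance, datetime(2015, 2, 21, 19, 0, tzinfo=timezone.utc), "hours(24)"))
```

The first call covers an instance that is 23 hours old, the second one that is 25 hours old.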
Falcon Process
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="falcon-sanjeev-process" xmlns="uri:falcon:process:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<clusters>
<cluster name="uk-clusterAlpha">
<validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
</cluster>
</clusters>
<parallel>1</parallel>
<frequency>hours(1)</frequency>
<timezone>UTC</timezone>
<inputs>
<input end="today(18,0)" start="today(18,0)" feed="uk-inputfeed" name="input" />
</inputs>
<outputs>
<output instance="now(0,0)" feed="uk-outputfeed" name="output" />
</outputs>
<properties>
<property name="fileTime" value="${formatTime(dateOffset(instanceTime(), 1, 'DAY'), 'yyyy-MMM-dd')}"/>
<property name="user" value="${user()}"/>
<property name="baseTime" value="${today(0,0)}"/>
</properties>
<workflow engine="oozie" path="/user/sanjeev/falcon/workflow" />
<retry policy="periodic" delay="minutes(10)" attempts="3" />
</process>
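The `start`, `end` and `instance` attributes in the process above use Falcon's `now(hours, minutes)` and `today(hours, minutes)` expressions. As commonly described, `now` is an offset from the process instance time, while `today` is an offset from the start of that instance's day. The Python sketch below mirrors that reading under the assumption that everything is in UTC, as the process timezone is here; the two functions are illustrative stand-ins, not Falcon APIs.

```python
from datetime import datetime, timedelta, timezone


def now(instance_time: datetime, hours: int, minutes: int) -> datetime:
    """Offset relative to the process instance time."""
    return instance_time + timedelta(hours=hours, minutes=minutes)


def today(instance_time: datetime, hours: int, minutes: int) -> datetime:
    """Offset relative to 00:00 of the instance's day."""
    midnight = instance_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(hours=hours, minutes=minutes)


instance = datetime(2015, 2, 21, 3, 0, tzinfo=timezone.utc)
print(today(instance, 18, 0).isoformat())  # input feed instance for today(18,0)
print(now(instance, 0, 0).isoformat())     # output feed instance for now(0,0)
```

Under this reading, every hourly process instance on a given day reads the same 18:00 input instance and writes an output instance stamped with its own instance time.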
Oozie Workflow
<workflow-app xmlns="uri:oozie:workflow:0.3" name="fs-workflow">
<start to="fs-cmds"/>
<action name="fs-cmds">
<fs>
<mkdir path='${output}'/>
</fs>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
What’s on HDFS
Input Feed: /user/sanjeev/falcon/input/2015/02/20/00
Input Feed: /user/sanjeev/falcon/input/2015/02/20/18
Output Feed: /user/sanjeev/falcon/output/
Workflow: /user/sanjeev/falcon/workflow/workflow.xml
falcon entity -type cluster -submit -file uk-clusterAlpha.xml
falcon entity -type feed -submit -file uk-inputfeed.xml
falcon entity -type feed -submit -file uk-outputfeed.xml
falcon entity -type process -submitAndSchedule -file falcon-sanjeev-process.xml
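The four commands above follow the dependency order of the entities: the process refers to the feeds, and the feeds refer to the cluster, so the cluster goes first and the process last. A small Python sketch that only builds the same command lines in that order (it does not run them) could look like this; the helper name `submit_commands` is hypothetical.

```python
def submit_commands(cluster_file, feed_files, process_file):
    """Return falcon CLI argument lists in dependency order: cluster, feeds, process."""
    commands = [
        ["falcon", "entity", "-type", "cluster", "-submit", "-file", cluster_file]
    ]
    for feed_file in feed_files:
        commands.append(
            ["falcon", "entity", "-type", "feed", "-submit", "-file", feed_file]
        )
    commands.append(
        ["falcon", "entity", "-type", "process", "-submitAndSchedule", "-file", process_file]
    )
    return commands


commands = submit_commands(
    "uk-clusterAlpha.xml",
    ["uk-inputfeed.xml", "uk-outputfeed.xml"],
    "falcon-sanjeev-process.xml",
)
for command in commands:
    print(" ".join(command))
```

Each list could be handed to `subprocess.run` on a host where the `falcon` client is installed; that step is left out here so the sketch stays runnable anywhere.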
Typical Production Process and Workflow
(process) process-click-convert
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="process-click-convert" xmlns="uri:falcon:process:0.1">
<clusters>
<cluster name="uk-clusterAlpha">
<validity start="2015-01-15T00:00Z" end="2100-01-01T00:00Z"/>
</cluster>
<cluster name="us-clusterGamma">
<validity start="2015-01-15T00:30Z" end="2100-01-01T00:00Z"/>
</cluster>
</clusters>
<parallel>2</parallel>
<order>FIFO</order>
<frequency>minutes(30)</frequency>
<timezone>UTC</timezone>
<inputs>
<input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/>
</inputs>
<outputs>
<output name="Output" feed="feed-click-convert" instance="now(0,-30)"/>
</outputs>
<properties>
<property name="queueName" value="stream"/>
<property name="jobPriority" value="NORMAL"/>
</properties>
<workflow path="/projects/support/click/conversion" lib="/projects/support/lib"/>
</process>
(workflow) /projects/support/click/conversion/workflow.xml
<workflow-app xmlns='uri:oozie:workflow:0.3' name='click-conversion'>
<start to='click-convert' />
<action name='click-convert'>
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${Output}"/>
<delete path="${wf:conf('Output.stats')}"/>
<delete path="${wf:conf('Output.tmp')}"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.job.priority</name>
<value>${jobPriority}</value>
</property>
</configuration>
<main-class>com.my.cluster.io.Driver</main-class>
<arg>-inputpath</arg><arg>${Input}</arg>
<arg>-outputpath</arg><arg>${Output}</arg>
<arg>-statspath</arg><arg>${wf:conf("Output.stats")}</arg>
<arg>-stagingpath</arg><arg>${wf:conf("Output.tmp")}</arg>
</java>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
Falcon Instance Operation
Command:
falcon instance -type [feed/process] -[status/list]
falcon instance -type [feed/process] -name [processname/feedname] {-start YYYY-MM-DDTHH:MMZ -end YYYY-MM-DDTHH:MMZ} -[OPTIONS]
Feed/Process OPTIONS
• status
• list
• logs
• kill
• rerun
• suspend
• resume
Monitoring
• Falcon CLI
• Oozie CLI
• ActiveMQ
• falcon entity -type process -name falcon-sanjeev-process -dependency
(cluster) uk-clusterAlpha
(feed) uk-inputfeed - [Input]
(feed) uk-outputfeed - [Output]
• falcon instance -type process -name falcon-sanjeev-process -start 2015-02-20T18:00Z -end 2015-02-23T00:00Z -status
Consolidated Status: SUCCEEDED
Instances:
Instance Cluster SourceCluster Status Start End Details Log
-----------------------------------------------------------------------------------------------
2015-02-20T18:00Z uk-clusterAlpha - SUCCEEDED 2015-02-20T18:00Z 2015-02-20T18:01Z - http://oozie.cluster.my.com:11000/oozie/?job=0229074-150205100814135-oozie-oozi-W
Monitoring Dashboard
• https://github.com/ajayyadav/falcon-dashboard
Onboarding Pipeline
• Group all processes
• Minutely, hourly, daily, weekly, monthly
• Group related feeds
• Verify that all process jars and workflows are pushed to the cluster
• Verify ownership of all feed and process directories
• Verify that owners have job-scheduling access roles on the particular cluster
• Validate the feeds
• Submit and schedule the feeds, so that retention and replication are in place
• Dry-run the process schedule
• Submit and schedule the process
• Document the feed SLA, HDFS usage, and retention period for monitoring
• Document the process SLA, to observe delays
Challenges
• Tightly integrated with Oozie
• Monitoring and onboarding need streamlining
• Real-time changes to schedule times and queues
Advantages
• Development is very aggressive
• The industry is adopting it quickly
• Once onboarded, focus is needed only on the set of critical processes
• Easy shutdown and upgrade, as all the running jobs are managed by Oozie
• DevOps can easily set up and manage data
Thank You

 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 

Apache Falcon - Sanjeev Tripurari

  • 2. Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters. http://falcon.apache.org
  • 4. Basic Components: Falcon (Prism, Server, Client), ActiveMQ, Oozie, Hadoop
  • 5. What’s in it for DevOps
      • Cluster: NameNode, JT, Oozie, ActiveMQ, Colo
      • Feed: Data, DataPath, Lifetime, Retention, Owner, Replication
      • Process: Job, Queue, Priority, Parallelism, Input, Output, Workflow
  • 6. Basic Environment Setup: two colos, UK and US, with a single Prism in front of them. Each colo runs its own Falcon Server, Oozie, and ActiveMQ on top of HDFS with MR1/2.
  • 8. Falcon Entity Operation
      Command:
        falcon entity -submit -type [cluster/feed/process] -file cluster-definition.xml
        falcon entity -list -type [cluster/feed/process]
      Cluster operations: submit, delete
        falcon entity -type [feed/process] -name [processname/feedname] -[OPTIONS]
      Feed/Process OPTIONS: schedule, status, list, touch, dependency, definition, update, delete
  • 9. Falcon Cluster
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
        <interfaces>
          <interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
          <interface type="write" endpoint="hdfs://nn..cluster.my.com:8020" version="0.20.2-cdh3u3"/>
          <interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
          <interface type="workflow" endpoint="http://oozie..cluster.my.com:11000/oozie/" version="3.1.6"/>
          <interface type="messaging" endpoint="tcp://amq..cluster.my.com:61616?daemon=true" version="5.4.3"/>
        </interfaces>
        <locations>
          <location name="staging" path="/store/falcon/staging"/>
          <location name="temp" path="/tmp"/>
          <location name="working" path="/store/falcon/working"/>
        </locations>
        <properties>
          <property name="colo.name" value="uk"/>
        </properties>
      </cluster>
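Before submitting a cluster definition, it can help to check that every interface is declared. The following is a minimal Python sketch, not part of Falcon: it assumes the five interface types shown on the slide are the expected set, and `missing_interfaces` and `SAMPLE` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the cluster definition on the slide.
NS = {"f": "uri:falcon:cluster:0.1"}
# Interface types that appear on the slide; treating them as the
# expected set is an assumption of this sketch.
EXPECTED = {"readonly", "write", "execute", "workflow", "messaging"}


def missing_interfaces(cluster_xml: str) -> set:
    """Return the expected interface types absent from a cluster definition."""
    root = ET.fromstring(cluster_xml)
    found = {i.get("type") for i in root.findall("f:interfaces/f:interface", NS)}
    return EXPECTED - found


# Deliberately incomplete definition, trimmed from the slide.
SAMPLE = """<cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
    <interface type="write" endpoint="hdfs://nn..cluster.my.com:8020" version="0.20.2-cdh3u3"/>
  </interfaces>
</cluster>"""

print(sorted(missing_interfaces(SAMPLE)))
```

A check like this only catches omissions; whether the endpoints are reachable still has to be verified against the cluster itself.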
  • 10. Falcon Feed
      <feed description="input feed" name="uk-inputfeed" xmlns="uri:falcon:feed:0.1"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <groups>input</groups>
        <frequency>hours(1)</frequency>
        <late-arrival cut-off="hours(6)" />
        <clusters>
          <cluster name="uk-clusterAlpha" type="source">
            <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
            <retention limit="hours(24)" action="delete" />
          </cluster>
        </clusters>
        <locations>
          <location type="data" path="/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
        </locations>
        <ACL owner="sanjeev" group="users" permission="0x755" />
        <schema location="/none" provider="none" />
      </feed>
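The feed's location path is a template with one directory per hourly instance. The sketch below shows how such a template resolves to a concrete HDFS path; it is an illustration in Python, not Falcon's own code, and `expand_feed_path` is a hypothetical helper.

```python
from datetime import datetime, timezone


def expand_feed_path(template: str, instance: datetime) -> str:
    """Substitute the time variables used in the feed's location path."""
    values = {
        "${YEAR}": f"{instance.year:04d}",
        "${MONTH}": f"{instance.month:02d}",
        "${DAY}": f"{instance.day:02d}",
        "${HOUR}": f"{instance.hour:02d}",
    }
    for token, value in values.items():
        template = template.replace(token, value)
    return template


TEMPLATE = "/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}"
first_instance = datetime(2015, 2, 20, 18, tzinfo=timezone.utc)
print(expand_feed_path(TEMPLATE, first_instance))
# /user/sanjeev/falcon/input/2015/02/20/18
```

The instance used here is the feed's validity start, so the result is the first directory the feed expects to find data in.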
  • 11. Falcon Feed
      <feed description="output feed" name="uk-outputfeed" xmlns="uri:falcon:feed:0.1"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <groups>output</groups>
        <frequency>hours(1)</frequency>
        <late-arrival cut-off="hours(6)" />
        <clusters>
          <cluster name="uk-clusterAlpha" type="source">
            <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
            <retention limit="hours(24)" action="delete" />
          </cluster>
        </clusters>
        <locations>
          <location type="data" path="/user/sanjeev/falcon/output/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
        </locations>
        <ACL owner="sanjeev" group="users" permission="0x755" />
        <schema location="/none" provider="none" />
      </feed>
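Both feeds declare `retention limit="hours(24)" action="delete"`. As a rough illustration of what that means for an hourly feed, the Python sketch below lists the instances that fall outside a 24-hour window at a given moment. It assumes that an instance becomes eligible once its time is older than "now minus limit"; the exact boundary handling in Falcon may differ, and `expired_instances` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def expired_instances(instances, now, limit=timedelta(hours=24)):
    """Return the instance times strictly older than now - limit."""
    cutoff = now - limit
    return [t for t in instances if t < cutoff]


start = datetime(2015, 2, 20, 18, tzinfo=timezone.utc)
# Thirty hourly instances, beginning at the feed's validity start.
hourly = [start + timedelta(hours=h) for h in range(30)]
now = datetime(2015, 2, 22, 0, tzinfo=timezone.utc)

old = expired_instances(hourly, now)
print(len(old), old[0].isoformat(), old[-1].isoformat())
```

With these inputs the cutoff is 2015-02-21T00:00Z, so only the six instances from the evening of 20 February are selected.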
  • 12. Falcon Process
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <process name="falcon-sanjeev-process" xmlns="uri:falcon:process:0.1"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <clusters>
          <cluster name="uk-clusterAlpha">
            <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
          </cluster>
        </clusters>
        <parallel>1</parallel>
        <frequency>hours(1)</frequency>
        <timezone>UTC</timezone>
        <inputs>
          <input end="today(18,0)" start="today(18,0)" feed="uk-inputfeed" name="input" />
        </inputs>
        <outputs>
          <output instance="now(0,0)" feed="uk-outputfeed" name="output" />
        </outputs>
        <properties>
          <property name="fileTime" value="${formatTime(dateOffset(instanceTime(), 1, 'DAY'), 'yyyy-MMM-dd')}"/>
          <property name="user" value="${user()}"/>
          <property name="baseTime" value="${today(0,0)}"/>
        </properties>
        <workflow engine="oozie" path="/user/sanjeev/falcon/workflow" />
        <retry policy="periodic" delay="minutes(10)" attempts="3" />
      </process>
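The validity window and the frequency together determine how many instances the process will run. The Python sketch below enumerates them for the values on the slide. It assumes the validity end is exclusive and handles only `minutes`, `hours`, and `days` frequencies; `parse_frequency` and `instance_times` are hypothetical helpers, not Falcon APIs.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_frequency(expr: str) -> timedelta:
    """Turn a frequency such as 'hours(1)' into a timedelta."""
    match = re.fullmatch(r"(minutes|hours|days)\((\d+)\)", expr)
    if match is None:
        raise ValueError(f"unsupported frequency: {expr}")
    return timedelta(**{match.group(1): int(match.group(2))})


def parse_time(text: str) -> datetime:
    """Parse the 'YYYY-MM-DDTHH:MMZ' form used in validity attributes."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%MZ").replace(tzinfo=timezone.utc)


def instance_times(start: str, end: str, frequency: str):
    """Yield instance times from start up to, but not including, end."""
    step = parse_frequency(frequency)
    current, stop = parse_time(start), parse_time(end)
    while current < stop:
        yield current
        current += step


times = list(instance_times("2015-02-20T18:00Z", "2015-02-23T00:00Z", "hours(1)"))
print(len(times), times[0].isoformat(), times[-1].isoformat())
```

Under these assumptions the window yields 54 hourly instances, the last one at 2015-02-22T23:00Z.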
  • 13. Oozie Workflow
      <workflow-app xmlns="uri:oozie:workflow:0.3" name="fs-workflow">
        <start to="fs-cmds"/>
        <action name="fs-cmds">
          <fs>
            <mkdir path='${output}'/>
          </fs>
          <ok to="end"/>
          <error to="fail"/>
        </action>
        <kill name="fail">
          <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
      </workflow-app>
  • 14. What’s on HDFS
      Input Feed: /user/sanjeev/falcon/input/2015/02/20/00
      Input Feed: /user/sanjeev/falcon/input/2015/02/20/18
      Output Feed: /user/sanjeev/falcon/output/
      Workflow: /user/sanjeev/falcon/workflow/workflow.xml
      falcon entity -type cluster -submit -file uk-clusterAlpha.xml
      falcon entity -type feed -submit -file uk-inputfeed.xml
      falcon entity -type feed -submit -file uk-outputfeed.xml
      falcon entity -type process -submitAndSchedule -file falcon-sanjeev-process.xml
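The four commands above follow the dependency order: the cluster first, then the feeds that reference it, then the process that references the feeds. A small Python sketch of that ordering is shown here. It only builds the argument lists and does not run anything; `submit_commands` is a hypothetical helper, and the flags are the ones used on the slide.

```python
# Entities must exist before anything that refers to them is submitted.
ORDER = ["cluster", "feed", "process"]


def submit_commands(entities):
    """Build falcon CLI argument lists for (type, file) pairs in dependency order."""
    ranked = sorted(entities, key=lambda entity: ORDER.index(entity[0]))
    commands = []
    for entity_type, path in ranked:
        # As on the slide: feeds and clusters are submitted, the process is
        # submitted and scheduled in one step.
        action = "-submitAndSchedule" if entity_type == "process" else "-submit"
        commands.append(["falcon", "entity", "-type", entity_type, action, "-file", path])
    return commands


entities = [
    ("process", "falcon-sanjeev-process.xml"),
    ("feed", "uk-inputfeed.xml"),
    ("cluster", "uk-clusterAlpha.xml"),
    ("feed", "uk-outputfeed.xml"),
]
commands = submit_commands(entities)
for argv in commands:
    print(" ".join(argv))
```

Each list could be handed to `subprocess.run` on a host where the Falcon client is installed; that step is left out so the sketch runs anywhere.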
  • 15. Typical Production Process and Workflow
      (process) process-click-convert
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <process name="process-click-convert" xmlns="uri:falcon:process:0.1">
        <clusters>
          <cluster name="uk-clusterAlpha">
            <validity start="2015-01-15T00:00Z" end="2100-01-01T00:00Z"/>
          </cluster>
          <cluster name="us-clusterGamma">
            <validity start="2015-01-15T00:30Z" end="2100-01-01T00:00Z"/>
          </cluster>
        </clusters>
        <parallel>2</parallel>
        <order>FIFO</order>
        <frequency>minutes(30)</frequency>
        <timezone>UTC</timezone>
        <inputs>
          <input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/>
        </inputs>
        <outputs>
          <output name="Output" feed="feed-click-convert" instance="now(0,-30)"/>
        </outputs>
        <properties>
          <property name="queueName" value="stream"/>
          <property name="jobPriority" value="NORMAL"/>
        </properties>
        <workflow path="/projects/support/click/conversion" lib="/projects/support/lib"/>
      </process>
      (workflow) /projects/support/click/conversion/workflow.xml
      <workflow-app xmlns='uri:oozie:workflow:0.3' name='click-conversion'>
        <start to='click-convert' />
        <action name='click-convert'>
          <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
              <delete path="${Output}"/>
              <delete path="${wf:conf('Output.stats')}"/>
              <delete path="${wf:conf('Output.tmp')}"/>
            </prepare>
            <configuration>
              <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
              </property>
              <property>
                <name>mapred.job.priority</name>
                <value>${jobPriority}</value>
              </property>
            </configuration>
            <main-class>com.my.cluster.io.Driver</main-class>
            <arg>-inputpath</arg><arg>${Input}</arg>
            <arg>-outputpath</arg><arg>${Output}</arg>
            <arg>-statspath</arg><arg>${wf:conf("Output.stats")}</arg>
            <arg>-stagingpath</arg><arg>${wf:conf("Output.tmp")}</arg>
          </java>
          <ok to="end" />
          <error to="fail" />
        </action>
        <kill name="fail">
          <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name='end' />
      </workflow-app>
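The input window of this process is expressed with `now(0,-30)` and `now(0,-1)`. The sketch below works the window out for one instance, assuming that `now(hours, minutes)` is an offset from the instance's nominal time, as Falcon's expression language describes it. `now_el` is a hypothetical stand-in written in Python, not the real evaluator.

```python
from datetime import datetime, timedelta, timezone


def now_el(instance_time: datetime, hours: int, minutes: int) -> datetime:
    """Offset an instance time the way now(hours, minutes) is assumed to."""
    return instance_time + timedelta(hours=hours, minutes=minutes)


# One 30-minute instance of process-click-convert on uk-clusterAlpha.
instance = datetime(2015, 1, 15, 1, 0, tzinfo=timezone.utc)

input_start = now_el(instance, 0, -30)
input_end = now_el(instance, 0, -1)
output_instance = now_el(instance, 0, -30)

print("input :", input_start.isoformat(), "to", input_end.isoformat())
print("output:", output_instance.isoformat())
```

For the 01:00 instance this gives an input window of 00:30 to 00:59 and an output instance at 00:30, which means each run consumes the half hour that has just closed.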
  • 16. Falcon Instance Operation
      Command:
        falcon instance -type [feed/process] -[status/list]
        falcon instance -type [feed/process] -name [processname/feedname] {-start YYYY-MM-DDTHH:MMZ -end YYYY-MM-DDTHH:MMZ} -[OPTIONS]
      Feed/Process OPTIONS: status, list, logs, kill, rerun, suspend, resume
  • 17. Monitoring
      • Falcon CLI
      • Oozie CLI
      • ActiveMQ
      • falcon entity -type process -name falcon-sanjeev-process -dependency
          (cluster) uk-clusterAlpha
          (feed) uk-inputfeed - [Input]
          (feed) uk-outputfeed - [Output]
      • falcon instance -type process -name falcon-sanjeev-process -start 2015-02-20T18:00Z -end 2015-02-23T00:00Z -status
          Consolidated Status: SUCCEEDED
          Instances:
          Instance  Cluster  SourceCluster  Status  Start  End  Details  Log
          -----------------------------------------------------------------------------------------------
          2015-02-20T18:00Z  uk-clusterAlpha  -  SUCCEEDED  2015-02-20T18:00Z  2015-02-20T18:01Z  -  http://oozie..cluster.my.com:11000/oozie/?job=0229074-150205100814135-oozie-oozi-W
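For monitoring scripts, the instance rows in the status output can be split into named fields. The Python sketch below assumes the whitespace-separated, eight-column layout shown on the slide and a Details value without spaces; real output may vary between Falcon versions, and `parse_status_row` is a hypothetical helper.

```python
# Column names follow the header row on the slide.
COLUMNS = ["instance", "cluster", "source_cluster", "status",
           "start", "end", "details", "log"]


def parse_status_row(line: str) -> dict:
    """Split one instance row of the status output into named fields."""
    fields = line.split(None, len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"unexpected row layout: {line!r}")
    return dict(zip(COLUMNS, fields))


# The row shown on the slide.
ROW = ("2015-02-20T18:00Z uk-clusterAlpha - SUCCEEDED 2015-02-20T18:00Z "
       "2015-02-20T18:01Z - "
       "http://oozie..cluster.my.com:11000/oozie/?job=0229074-150205100814135-oozie-oozi-W")

row = parse_status_row(ROW)
print(row["instance"], row["status"], row["log"])
```

Filtering parsed rows on `status` is then enough to pick out the instances that need a rerun.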
  • 19. Onboarding Pipeline
      • Group all processes by frequency: minutely, hourly, daily, weekly, monthly
      • Group related feeds
      • Verify that all process jars and workflows are pushed to the cluster
      • Verify ownership of all feed and process directories
      • Verify that owners have job-scheduling access roles on the particular cluster
      • Validate the feeds
      • Submit and schedule the feeds, so that retention and replication are in place
      • Dry-run the process schedule
      • Submit and schedule the process
      • Document the feed SLA, HDFS usage, and retention period for monitoring
      • Document the process SLA, to observe delays
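The first onboarding step, grouping processes by how often they run, can be sketched from the `<frequency>` values alone. This is an illustrative Python snippet using the two processes from this deck; the mapping of units to group names, including treating multiples of seven days as weekly, is an assumption, and `frequency_group` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed mapping from frequency unit to onboarding group.
LABELS = {"minutes": "minutely", "hours": "hourly", "days": "daily", "months": "monthly"}


def frequency_group(expr: str) -> str:
    """Map a frequency such as 'minutes(30)' to an onboarding group name."""
    match = re.fullmatch(r"(\w+)\((\d+)\)", expr)
    if match is None or match.group(1) not in LABELS:
        raise ValueError(f"unsupported frequency: {expr}")
    unit, count = match.group(1), int(match.group(2))
    if unit == "days" and count % 7 == 0:
        return "weekly"
    return LABELS[unit]


# Process names and frequencies as they appear in the entity definitions.
processes = {
    "process-click-convert": "minutes(30)",
    "falcon-sanjeev-process": "hours(1)",
}

groups = defaultdict(list)
for name, frequency in processes.items():
    groups[frequency_group(frequency)].append(name)

print(dict(groups))
```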
  • 20. Challenges
      • Tightly integrated with Oozie
      • Monitoring and onboarding need to be streamlined
      • Changing schedule times and queues in real time
    Advantages
      • Development is very aggressive
      • Industry is adopting it quickly
      • Once onboarded, focus is needed only on the set of critical processes
      • Easy shutdown and upgrade, as all the running jobs are managed by Oozie
      • DevOps can easily set up and manage data