Apache Falcon - Sanjeev Tripurari

Apache Falcon by Sanjeev Tripurari presented in DevOps Bangalore Meetup group on Feb 21st 2015

Published in: Technology

  1. Apache Falcon: DevOps. Sanjeev Tripurari, Tech Lead Operations@inmobi
  2. Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters. http://falcon.apache.org
  3. What’s on GRID
      /user/sanjeev
      /user/mohit
      /user/Iliyas
      /projects/meetup
      /projects/support
      /data/stream/click
      /data/stream/beacon
  4. Basic Components
      Falcon
        • Prism
        • Server
        • Client
      ActiveMQ
      Oozie
      Hadoop
  5. What’s in for DevOps
      Cluster: NameNode, JT, Oozie, ActiveMQ, Colo
      Feed: Data, DataPath, Lifetime, Retention, Owner, Replication
      Process: Job, Queue, Priority, Parallelism, Input, Output, Workflow
  6. Basic Environment Setup
      Prism
      UK: Server, Oozie, ActiveMQ, HDFS - MR1/2
      US: Server, Oozie, ActiveMQ, HDFS - MR1/2
  7. Logical Setup
      prism
      UK: uk-clusterAlpha, uk-clusterBeta
      US: us-clusterGamma
  8. Falcon Entity Operation
      Command:
        falcon entity -submit -type [cluster/feed/process] -file cluster-definition.xml
        falcon entity -list -type [cluster/feed/process]
      Cluster
        • Submit
        • Delete
      Command:
        falcon entity -type [feed/process] -name [processname/feedname] -[OPTIONS]
      Feed/Process OPTIONS
        • schedule
        • status
        • list
        • touch
        • dependency
        • definition
        • update
        • delete
  9. Falcon Cluster
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
        <interfaces>
          <interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
          <interface type="write" endpoint="hdfs://nn..cluster.my.com:8020" version="0.20.2-cdh3u3"/>
          <interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
          <interface type="workflow" endpoint="http://oozie..cluster.my.com:11000/oozie/" version="3.1.6"/>
          <interface type="messaging" endpoint="tcp://amq..cluster.my.com:61616?daemon=true" version="5.4.3"/>
        </interfaces>
        <locations>
          <location name="staging" path="/store/falcon/staging"/>
          <location name="temp" path="/tmp"/>
          <location name="working" path="/store/falcon/working"/>
        </locations>
        <properties>
          <property name="colo.name" value="uk"/>
        </properties>
      </cluster>
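A quick sanity check on a cluster entity before submitting it is to list the interfaces and locations it declares. The following is only a sketch using Python's standard library: the helper name `summarize_cluster` is hypothetical, and the embedded XML is a trimmed copy of the slide's definition, not a complete entity.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's cluster entity (illustrative, not complete).
CLUSTER_XML = """
<cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
    <interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
  </interfaces>
  <locations>
    <location name="staging" path="/store/falcon/staging"/>
    <location name="working" path="/store/falcon/working"/>
  </locations>
</cluster>
"""

# The entity uses a default namespace, so lookups need a prefix mapping.
NS = {"c": "uri:falcon:cluster:0.1"}


def summarize_cluster(xml_text):
    """Hypothetical helper: collect the name, colo, interfaces and locations."""
    root = ET.fromstring(xml_text.strip())
    interfaces = {
        node.get("type"): node.get("endpoint")
        for node in root.findall("c:interfaces/c:interface", NS)
    }
    locations = {
        node.get("name"): node.get("path")
        for node in root.findall("c:locations/c:location", NS)
    }
    return {
        "name": root.get("name"),
        "colo": root.get("colo"),
        "interfaces": interfaces,
        "locations": locations,
    }


summary = summarize_cluster(CLUSTER_XML)
print(summary["name"], summary["colo"])
for kind, endpoint in summary["interfaces"].items():
    print(kind, endpoint)
```

The same function can be pointed at the full file contents to confirm that the five interface types and the staging and working paths are present before running `falcon entity -submit`.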
  10. Falcon Feed
      <feed description="input feed" name="uk-inputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <groups>input</groups>
        <frequency>hours(1)</frequency>
        <late-arrival cut-off="hours(6)" />
        <clusters>
          <cluster name="uk-clusterAlpha" type="source">
            <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
            <retention limit="hours(24)" action="delete" />
          </cluster>
        </clusters>
        <locations>
          <location type="data" path="/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
        </locations>
        <ACL owner="sanjeev" group="users" permission="0x755" />
        <schema location="/none" provider="none" />
      </feed>
  11. Falcon Feed
      <feed description="input feed" name="uk-outputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <groups>output</groups>
        <frequency>hours(1)</frequency>
        <late-arrival cut-off="hours(6)" />
        <clusters>
          <cluster name="uk-clusterAlpha" type="source">
            <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
            <retention limit="hours(24)" action="delete" />
          </cluster>
        </clusters>
        <locations>
          <location type="data" path="/user/sanjeev/falcon/output/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
        </locations>
        <ACL owner="sanjeev" group="users" permission="0x755" />
        <schema location="/none" provider="none" />
      </feed>
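The feed definitions above combine an hourly frequency, a validity window and a path template. As a rough illustration of how those fit together, the sketch below expands the input feed's template for each hourly instance and lists the instances older than the 24-hour retention limit. The function names are hypothetical, treating the validity end as exclusive is an assumption, and the actual eviction decisions are made by Falcon, not by code like this.

```python
from datetime import datetime, timedelta

# Path template copied from the uk-inputfeed definition.
PATH_TEMPLATE = "/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}"


def instance_path(template, instance_time):
    """Substitute the date variables for one feed instance."""
    return (
        template.replace("${YEAR}", f"{instance_time:%Y}")
        .replace("${MONTH}", f"{instance_time:%m}")
        .replace("${DAY}", f"{instance_time:%d}")
        .replace("${HOUR}", f"{instance_time:%H}")
    )


def hourly_instances(start, end):
    """Instance times for frequency hours(1); the end is treated as exclusive (assumption)."""
    current = start
    while current < end:
        yield current
        current += timedelta(hours=1)


def past_retention(instances, as_of, limit=timedelta(hours=24)):
    """Instances older than the retention limit at time `as_of`."""
    cutoff = as_of - limit
    return [t for t in instances if t < cutoff]


validity_start = datetime(2015, 2, 20, 18, 0)
validity_end = datetime(2015, 2, 23, 0, 0)
instances = list(hourly_instances(validity_start, validity_end))

print(len(instances))
print(instance_path(PATH_TEMPLATE, instances[0]))
for t in past_retention(instances, as_of=datetime(2015, 2, 22, 0, 0)):
    print("eligible for deletion:", instance_path(PATH_TEMPLATE, t))
```

Swapping `input` for `output` in the template gives the corresponding paths for uk-outputfeed.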
  12. Falcon Process
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <process name="falcon-sanjeev-process" xmlns="uri:falcon:process:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <clusters>
          <cluster name="uk-clusterAlpha">
            <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
          </cluster>
        </clusters>
        <parallel>1</parallel>
        <frequency>hours(1)</frequency>
        <timezone>UTC</timezone>
        <inputs>
          <input end="today(18,0)" start="today(18,0)" feed="uk-inputfeed" name="input" />
        </inputs>
        <outputs>
          <output instance="now(0,0)" feed="uk-outputfeed" name="output" />
        </outputs>
        <properties>
          <property name="fileTime" value="${formatTime(dateOffset(instanceTime(), 1, 'DAY'), 'yyyy-MMM-dd')}"/>
          <property name="user" value="${user()}"/>
          <property name="baseTime" value="${today(0,0)}"/>
        </properties>
        <workflow engine="oozie" path="/user/sanjeev/falcon/workflow" />
        <retry policy="periodic" delay="minutes(10)" attempts="3" />
      </process>
  13. Oozie Workflow
      <workflow-app xmlns="uri:oozie:workflow:0.3" name="fs-workflow">
        <start to="fs-cmds"/>
        <action name="fs-cmds">
          <fs>
            <mkdir path='${output}'/>
          </fs>
          <ok to="end"/>
          <error to="fail"/>
        </action>
        <kill name="fail">
          <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
      </workflow-app>
  14. What’s on HDFS
      Input Feed: /user/sanjeev/falcon/input/2015/02/20/00
      Input Feed: /user/sanjeevt/falcon/input/2015/02/20/18
      Output Feed: /user/sanjeevt/falcon/output/
      Workflow: /user/sanjeevt/falcon/workflow/workflow.xml
      falcon entity -type cluster -submit -file uk-clusterAlpha.xml
      falcon entity -type feed -submit -file uk-inputfeed.xml
      falcon entity -type feed -submit -file uk-outputfeed.xml
      falcon entity -type process -submitAndSchedule -file falcon-sanjeev-process.xml
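The four submissions above follow the order in which the entities reference each other: the feeds name the cluster, and the process names both the cluster and the feeds. A small sketch of generating the commands in that order is shown below. It only builds and prints the command lines using the flags shown on the slide. The function name is hypothetical, and nothing is executed.

```python
import shlex


def falcon_submit_commands(cluster_files, feed_files, process_files):
    """Return argv lists in dependency order: clusters, then feeds, then processes."""
    commands = []
    for path in cluster_files:
        commands.append(["falcon", "entity", "-type", "cluster", "-submit", "-file", path])
    for path in feed_files:
        commands.append(["falcon", "entity", "-type", "feed", "-submit", "-file", path])
    for path in process_files:
        commands.append(
            ["falcon", "entity", "-type", "process", "-submitAndSchedule", "-file", path]
        )
    return commands


commands = falcon_submit_commands(
    ["uk-clusterAlpha.xml"],
    ["uk-inputfeed.xml", "uk-outputfeed.xml"],
    ["falcon-sanjeev-process.xml"],
)
for argv in commands:
    # Printed for review only; running them is left to the operator.
    print(shlex.join(argv))
```

Keeping the commands as argv lists rather than strings makes it straightforward to hand them to a runner later without re-quoting file names.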
  15. Typical Production Process and Workflow
      (process) process-click-convert
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <process name="process-click-convert" xmlns="uri:falcon:process:0.1">
        <clusters>
          <cluster name="uk-clusterAlpha">
            <validity start="2015-01-15T00:00Z" end="2100-01-01T00:00Z"/>
          </cluster>
          <cluster name="us-clusterGamma">
            <validity start="2015-01-15T00:30Z" end="2100-01-01T00:00Z"/>
          </cluster>
        </clusters>
        <parallel>2</parallel>
        <order>FIFO</order>
        <frequency>minutes(30)</frequency>
        <timezone>UTC</timezone>
        <inputs>
          <input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/>
        </inputs>
        <outputs>
          <output name="Output" feed="feed-click-convert" instance="now(0,-30)"/>
        </outputs>
        <properties>
          <property name="queueName" value="stream"/>
          <property name="jobPriority" value="NORMAL"/>
        </properties>
        <workflow path="/projects/support/click/conversion" lib="/projects/support/lib"/>
      </process>
      (workflow) /projects/support/click/conversion/workflow.xml
      <workflow-app xmlns='uri:oozie:workflow:0.3' name='click-conversion'>
        <start to='click-convert' />
        <action name='click-convert'>
          <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
              <delete path="${Output}"/>
              <delete path="${wf:conf('Output.stats')}"/>
              <delete path="${wf:conf('Output.tmp')}"/>
            </prepare>
            <configuration>
              <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
              </property>
              <property>
                <name>mapred.job.priority</name>
                <value>${jobPriority}</value>
              </property>
            </configuration>
            <!-- nn.cluster.my.com -->
            <main-class>com.my.cluster.io.Driver</main-class>
            <arg>-inputpath</arg><arg>${Input}</arg>
            <arg>-outputpath</arg><arg>${Output}</arg>
            <arg>-statspath</arg><arg>${wf:conf("Output.stats")}</arg>
            <arg>-stagingpath</arg><arg>${wf:conf("Output.tmp")}</arg>
          </java>
          <ok to="end" />
          <error to="fail" />
        </action>
        <kill name="fail">
          <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name='end' />
      </workflow-app>
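The `now(0,-30)` and `now(0,-1)` expressions in process-click-convert select the input window for each 30-minute run. The sketch below mimics them in plain Python under the assumption that `now(hours, minutes)` is an offset from the instance time and `today(hours, minutes)` is an offset from midnight of the instance's day, which is how the Falcon documentation describes these expressions. The functions are stand-ins for illustration, not Falcon's implementation.

```python
from datetime import datetime, timedelta


def now(instance_time, hours, minutes):
    """Stand-in for Falcon's now(h, m): offset from the instance time (assumed semantics)."""
    return instance_time + timedelta(hours=hours, minutes=minutes)


def today(instance_time, hours, minutes):
    """Stand-in for Falcon's today(h, m): offset from midnight of the instance's day (assumed)."""
    midnight = instance_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(hours=hours, minutes=minutes)


# One instance of process-click-convert, which runs every 30 minutes.
instance_time = datetime(2015, 1, 15, 1, 0)

input_start = now(instance_time, 0, -30)      # start="now(0,-30)"
input_end = now(instance_time, 0, -1)         # end="now(0,-1)"
output_instance = now(instance_time, 0, -30)  # instance="now(0,-30)"

print("input window:", input_start.isoformat(), "to", input_end.isoformat())
print("output instance:", output_instance.isoformat())
print("today(18,0):", today(instance_time, 18, 0).isoformat())
```

Under these assumptions, the run at 01:00 reads the click-stream data from 00:30 through 00:59 and writes the output instance stamped 00:30, so each run covers the half hour that has just closed.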
  16. Falcon Instance Operation
      Command:
        falcon instance -type [feed/process] -[status/list]
        falcon instance -type [feed/process] -name [processname/feedname] {-start YYYY-MM-DDTHH:MMZ -end YYYY-MM-DDTHH:MMZ} -[OPTIONS]
      Feed/Process OPTIONS
        • status
        • list
        • logs
        • kill
        • rerun
        • suspend
        • resume
  17. Monitoring
      • Falcon CLI
      • Oozie CLI
      • ActiveMQ
      • falcon entity -type process -name falcon-sanjeev-process -dependency
          (cluster) uk-clusterAlpha
          (feed) uk-inputfeed - [Input]
          (feed) uk-outputfeed - [Output]
      • falcon instance -type process -name falcon-sanjeev-process -start 2015-02-20T18:00Z -end 2015-02-23T00:00Z -status
          Consolidated Status: SUCCEEDED
          Instances:
          Instance           Cluster          SourceCluster  Status     Start              End                Details  Log
          -----------------------------------------------------------------------------------------------
          2015-02-20T18:00Z  uk-clusterAlpha  -              SUCCEEDED  2015-02-20T18:00Z  2015-02-20T18:01Z  -        http://oozie..cluster.my.com:11000/oozie/?job=0229074-150205100814135-oozie-oozi-W
  18. Monitoring Dashboard
      • https://github.com/ajayyadav/falcon-dashboard
  19. Onboarding Pipeline
      • Group all processes
          • Minutely, Hourly, Daily, Weekly, Monthly
      • Group related feeds
      • Verify all process jars and workflows are pushed to the cluster
      • Verify ownership of all feed and process directories
      • Verify owners have job scheduling access roles in the particular cluster
      • Validate the feeds
      • Submit and schedule the feeds, so retention and replication are in place
      • Dry-run the process schedule
      • Submit and schedule the process
      • Document the FEED SLA, HDFS usage and retention period for monitoring
      • Document the PROCESS SLA, to observe delays
  20. Challenges
      • Tightly integrated with Oozie
      • Monitoring and onboarding need to be streamlined
      • Real-time changes to schedule time and queues
      Advantages
      • Development is very aggressive
      • Industry is adopting it quickly
      • Once onboarded, focus only needs to be on the set of critical processes
      • Easy shutdown and upgrade, as all the running jobs are managed by Oozie
      • DevOps can easily set up and manage data
  21. Thank You
