Apache Falcon
DevOps
Sanjeev Tripurari
Tech Lead Operations@inmobi
Falcon is a feed processing and feed management system aimed at making it easier for
end consumers to onboard their feed processing and feed management on Hadoop
clusters.
http://falcon.apache.org
What’s on GRID
/user/sanjeev
/user/mohit
/user/Iliyas
/projects/meetup
/projects/support
/data/stream/click
/data/stream/beacon
Basic Components
Falcon
• Prism
• Server
• Client
ActiveMQ
Oozie
Hadoop
What’s in it for DevOps
Cluster: NameNode, JT, Oozie, ActiveMQ, Colo
Feed: Data, DataPath, Lifetime, Retention, Owner, Replication
Process: Job, Queue, Priority, Parallelism, Input, Output, Workflow
Basic Environment Setup
Prism
UK: Server, Oozie, ActiveMQ, HDFS - MR1/2
US: Server, Oozie, ActiveMQ, HDFS - MR1/2
Logical Setup
UK: uk-clusterAlpha, uk-clusterBeta
US: us-clusterGamma
prism
Falcon Entity Operation
Command:
falcon entity -submit -type [cluster/feed/process] -file cluster-definition.xml
falcon entity -list -type [cluster/feed/process]
Cluster
• Submit
• Delete
falcon entity -type [feed/process] -name [processname/feedname] -[OPTIONS]
Feed/Process OPTIONS
• schedule
• status
• list
• touch
• dependency
• definition
• update
• delete
Falcon Cluster
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
    <interface type="write" endpoint="hdfs://nn.cluster.my.com:8020" version="0.20.2-cdh3u3"/>
    <interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
    <interface type="workflow" endpoint="http://oozie.cluster.my.com:11000/oozie/" version="3.1.6"/>
    <interface type="messaging" endpoint="tcp://amq.cluster.my.com:61616?daemon=true" version="5.4.3"/>
  </interfaces>
  <locations>
    <location name="staging" path="/store/falcon/staging"/>
    <location name="temp" path="/tmp"/>
    <location name="working" path="/store/falcon/working"/>
  </locations>
  <properties>
    <property name="colo.name" value="uk"/>
  </properties>
</cluster>
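Before submitting a cluster entity, it can help to check that the definition declares every interface the slide lists. The following is a minimal sketch, not part of Falcon itself: the `missing_interfaces` helper and the `EXPECTED` set are assumptions made for illustration, and the XML is a trimmed copy of the slide's definition with its placeholder hostnames.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the cluster definition on the slide.
NS = {"c": "uri:falcon:cluster:0.1"}

# Interface types that appear in the slide's cluster definition (assumed complete).
EXPECTED = {"readonly", "write", "execute", "workflow", "messaging"}

# Trimmed copy of the slide's cluster entity; hostnames are placeholders.
CLUSTER_XML = """
<cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
    <interface type="write" endpoint="hdfs://nn.cluster.my.com:8020" version="0.20.2-cdh3u3"/>
    <interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
    <interface type="workflow" endpoint="http://oozie.cluster.my.com:11000/oozie/" version="3.1.6"/>
    <interface type="messaging" endpoint="tcp://amq.cluster.my.com:61616?daemon=true" version="5.4.3"/>
  </interfaces>
</cluster>
""".strip()


def missing_interfaces(xml_text):
    """Return the expected interface types that the cluster XML does not declare."""
    root = ET.fromstring(xml_text)
    found = {i.get("type") for i in root.findall("c:interfaces/c:interface", NS)}
    return EXPECTED - found


print(missing_interfaces(CLUSTER_XML) or "all interfaces present")
```

A check like this only catches structural omissions; whether the endpoints are reachable still has to be verified on the cluster.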
Falcon Feed
<feed description="input feed" name="uk-inputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <groups>input</groups>
  <frequency>hours(1)</frequency>
  <late-arrival cut-off="hours(6)" />
  <clusters>
    <cluster name="uk-clusterAlpha" type="source">
      <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
      <retention limit="hours(24)" action="delete" />
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
  </locations>
  <ACL owner="sanjeev" group="users" permission="0x755" />
  <schema location="/none" provider="none" />
</feed>
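The `${YEAR}/${MONTH}/${DAY}/${HOUR}` pattern in the feed location maps each hourly instance to one HDFS directory. The sketch below shows that mapping for the validity window on the slide. The helper names are hypothetical, and treating the validity end as exclusive is an assumption.

```python
from datetime import datetime, timedelta

# Location pattern copied from the input feed definition.
TEMPLATE = "/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}"


def instance_path(template, t):
    """Substitute the time variables of the pattern with zero-padded values."""
    return (template
            .replace("${YEAR}", f"{t.year:04d}")
            .replace("${MONTH}", f"{t.month:02d}")
            .replace("${DAY}", f"{t.day:02d}")
            .replace("${HOUR}", f"{t.hour:02d}"))


def hourly_instances(start, end):
    """Yield hourly instance times in [start, end); the exclusive end is an assumption."""
    t = start
    while t < end:
        yield t
        t += timedelta(hours=1)


# Validity window from the feed definition (UTC).
start = datetime(2015, 2, 20, 18, 0)
end = datetime(2015, 2, 23, 0, 0)

paths = [instance_path(TEMPLATE, t) for t in hourly_instances(start, end)]
print(paths[0])    # /user/sanjeev/falcon/input/2015/02/20/18
print(len(paths))  # 54
```

Listing the expected directories this way gives a quick reference for checking what should exist on HDFS for a given window.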
Falcon Feed
<feed description="output feed" name="uk-outputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <groups>output</groups>
  <frequency>hours(1)</frequency>
  <late-arrival cut-off="hours(6)" />
  <clusters>
    <cluster name="uk-clusterAlpha" type="source">
      <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
      <retention limit="hours(24)" action="delete" />
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/user/sanjeev/falcon/output/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
  </locations>
  <ACL owner="sanjeev" group="users" permission="0x755" />
  <schema location="/none" provider="none" />
</feed>
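Both feeds declare `retention limit="hours(24)" action="delete"`. As a rough illustration of what that limit implies, the sketch below parses the limit expression and lists the instances older than the cutoff. This is not Falcon's eviction code: the helper names are hypothetical, only the `minutes`, `hours` and `days` units are handled, and treating "strictly older than the limit" as the eviction rule is an assumption.

```python
import re
from datetime import datetime, timedelta


def parse_duration(expr):
    """Parse expressions such as 'hours(24)' into a timedelta (months are not handled)."""
    m = re.fullmatch(r"(minutes|hours|days)\((\d+)\)", expr)
    if not m:
        raise ValueError(f"unsupported duration expression: {expr!r}")
    return timedelta(**{m.group(1): int(m.group(2))})


def expired(instances, limit, now):
    """Return instance times strictly older than now minus the retention limit."""
    cutoff = now - parse_duration(limit)
    return [t for t in instances if t < cutoff]


# Thirty hourly instances starting at the feed's validity start.
instances = [datetime(2015, 2, 20, 18) + timedelta(hours=h) for h in range(30)]
now = datetime(2015, 2, 22, 0)

old = expired(instances, "hours(24)", now)
print(len(old), old[0], old[-1])
```

With these inputs the cutoff is 2015-02-21T00:00, so the six instances from 18:00 to 23:00 on 2015-02-20 fall outside the retention window.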
Falcon Process
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="falcon-sanjeev-process" xmlns="uri:falcon:process:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <clusters>
    <cluster name="uk-clusterAlpha">
      <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
    </cluster>
  </clusters>
  <parallel>1</parallel>
  <frequency>hours(1)</frequency>
  <timezone>UTC</timezone>
  <inputs>
    <input end="today(18,0)" start="today(18,0)" feed="uk-inputfeed" name="input" />
  </inputs>
  <outputs>
    <output instance="now(0,0)" feed="uk-outputfeed" name="output" />
  </outputs>
  <properties>
    <property name="fileTime" value="${formatTime(dateOffset(instanceTime(), 1, 'DAY'), 'yyyy-MMM-dd')}"/>
    <property name="user" value="${user()}"/>
    <property name="baseTime" value="${today(0,0)}"/>
  </properties>
  <workflow engine="oozie" path="/user/sanjeev/falcon/workflow" />
  <retry policy="periodic" delay="minutes(10)" attempts="3" />
</process>
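The `today(18,0)` and `now(0,0)` expressions select feed instances relative to the time of each process instance. The sketch below models them as plain functions, on the understanding that `now(h, m)` offsets from the instance time and `today(h, m)` offsets from the start of the instance's day. These Python functions are illustrative stand-ins, not Falcon's expression-language implementation.

```python
from datetime import datetime, timedelta


def now(instance_time, hours, minutes):
    """Offset from the process instance time itself."""
    return instance_time + timedelta(hours=hours, minutes=minutes)


def today(instance_time, hours, minutes):
    """Offset from midnight of the day the process instance belongs to."""
    midnight = instance_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(hours=hours, minutes=minutes)


# A hypothetical hourly instance of falcon-sanjeev-process.
instance = datetime(2015, 2, 21, 5, 0)

print(today(instance, 18, 0))  # input window start and end: 2015-02-21 18:00:00
print(now(instance, 0, 0))     # output instance: 2015-02-21 05:00:00
print(now(instance, 0, -30))   # 30 minutes earlier: 2015-02-21 04:30:00
```

Under this reading, every hourly run of the process on a given day takes the same 18:00 input instance and writes an output instance for its own hour.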
Oozie Workflow
<workflow-app xmlns="uri:oozie:workflow:0.3" name="fs-workflow">
  <start to="fs-cmds"/>
  <action name="fs-cmds">
    <fs>
      <mkdir path='${output}'/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
What’s on HDFS
Input Feed: /user/sanjeev/falcon/input/2015/02/20/00
Input Feed: /user/sanjeev/falcon/input/2015/02/20/18
Output Feed: /user/sanjeev/falcon/output/
Workflow: /user/sanjeev/falcon/workflow/workflow.xml
falcon entity -type cluster -submit -file uk-clusterAlpha.xml
falcon entity -type feed -submit -file uk-inputfeed.xml
falcon entity -type feed -submit -file uk-outputfeed.xml
falcon entity -type process -submitAndSchedule -file falcon-sanjeev-process.xml
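The order of the commands above matters: the process refers to the feeds, and the feeds refer to the cluster, so the cluster is submitted first and the process last. A minimal sketch of scripting that order follows. The `submission_commands` helper is hypothetical, it only builds the argument lists and does not invoke the Falcon client, and the flags are the ones shown on the slide.

```python
def submission_commands(cluster_file, feed_files, process_file):
    """Build falcon CLI argument lists in dependency order: cluster, feeds, process."""
    cmds = [["falcon", "entity", "-type", "cluster", "-submit", "-file", cluster_file]]
    for feed_file in feed_files:
        cmds.append(["falcon", "entity", "-type", "feed", "-submit", "-file", feed_file])
    cmds.append(["falcon", "entity", "-type", "process",
                 "-submitAndSchedule", "-file", process_file])
    return cmds


commands = submission_commands(
    "uk-clusterAlpha.xml",
    ["uk-inputfeed.xml", "uk-outputfeed.xml"],
    "falcon-sanjeev-process.xml",
)

for cmd in commands:
    print(" ".join(cmd))
```

To actually run them, each list could be passed to `subprocess.run(cmd, check=True)` on a host where the `falcon` client is installed and configured, stopping at the first failure.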
Typical Production Process and Workflow
(process) process-click-convert
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="process-click-convert" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="uk-clusterAlpha">
      <validity start="2015-01-15T00:00Z" end="2100-01-01T00:00Z"/>
    </cluster>
    <cluster name="us-clusterGamma">
      <validity start="2015-01-15T00:30Z" end="2100-01-01T00:00Z"/>
    </cluster>
  </clusters>
  <parallel>2</parallel>
  <order>FIFO</order>
  <frequency>minutes(30)</frequency>
  <timezone>UTC</timezone>
  <inputs>
    <input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/>
  </inputs>
  <outputs>
    <output name="Output" feed="feed-click-convert" instance="now(0,-30)"/>
  </outputs>
  <properties>
    <property name="queueName" value="stream"/>
    <property name="jobPriority" value="NORMAL"/>
  </properties>
  <workflow path="/projects/support/click/conversion" lib="/projects/support/lib"/>
</process>
(workflow) /projects/support/click/conversion/workflow.xml
<workflow-app xmlns='uri:oozie:workflow:0.3' name='click-conversion'>
  <start to='click-convert' />
  <action name='click-convert'>
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${Output}"/>
        <delete path="${wf:conf('Output.stats')}"/>
        <delete path="${wf:conf('Output.tmp')}"/>
      </prepare>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
        <property>
          <name>mapred.job.priority</name>
          <value>${jobPriority}</value>
        </property>
      </configuration>
      <main-class>com.my.cluster.io.Driver</main-class>
      <arg>-inputpath</arg><arg>${Input}</arg>
      <arg>-outputpath</arg><arg>${Output}</arg>
      <arg>-statspath</arg><arg>${wf:conf("Output.stats")}</arg>
      <arg>-stagingpath</arg><arg>${wf:conf("Output.tmp")}</arg>
    </java>
    <ok to="end" />
    <error to="fail" />
  </action>
  <kill name="fail">
    <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name='end' />
</workflow-app>
Falcon Instance Operation
Command:
falcon instance -type [feed/process] -[status/list]
falcon instance -type [feed/process] -name [processname/feedname] {-start YYYY-MM-DDTHH:MMZ -end YYYY-MM-DDTHH:MMZ} -[OPTIONS]
Feed/Process OPTIONS
• status
• list
• logs
• kill
• rerun
• suspend
• resume
Monitoring
• Falcon CLI
• Oozie CLI
• ActiveMQ
• falcon entity -type process -name falcon-sanjeev-process -dependency
(cluster) uk-clusterAlpha
(feed) uk-inputfeed - [Input]
(feed) uk-outputfeed - [Output]
• falcon instance -type process -name falcon-sanjeev-process -start 2015-02-20T18:00Z -end 2015-02-23T00:00Z -status
Consolidated Status: SUCCEEDED
Instances:
Instance Cluster SourceCluster Status Start End Details Log
-----------------------------------------------------------------------------------------------
2015-02-20T18:00Z uk-clusterAlpha - SUCCEEDED 2015-02-20T18:00Z 2015-02-20T18:01Z - http://oozie.cluster.my.com:11000/oozie/?job=0229074-150205100814135-oozie-oozi-W
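For monitoring, the status output above can be scanned for instances that did not succeed. The sketch below is one way to do that, assuming the whitespace-separated column layout shown on the slide. Real output may differ between Falcon versions, and the `parse_instances` and `not_succeeded` helpers are hypothetical.

```python
import re

# Status output as shown on the slide (separator line shortened).
SAMPLE = """Consolidated Status: SUCCEEDED
Instances:
Instance Cluster SourceCluster Status Start End Details Log
-----------------------------------------------
2015-02-20T18:00Z uk-clusterAlpha - SUCCEEDED 2015-02-20T18:00Z 2015-02-20T18:01Z - http://oozie.cluster.my.com:11000/oozie/?job=0229074-150205100814135-oozie-oozi-W
"""

# Column names mirror the header row; the layout is an assumption based on the slide.
COLUMNS = ["instance", "cluster", "source_cluster", "status",
           "start", "end", "details", "log"]

# Data rows start with an instance timestamp such as 2015-02-20T18:00Z.
ROW = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}Z\s")


def parse_instances(text):
    """Turn each data row into a dict keyed by column name."""
    rows = []
    for line in text.splitlines():
        if ROW.match(line):
            fields = line.split(None, len(COLUMNS) - 1)
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


def not_succeeded(rows):
    """Return the rows whose status is anything other than SUCCEEDED."""
    return [r for r in rows if r["status"] != "SUCCEEDED"]


rows = parse_instances(SAMPLE)
print(rows[0]["instance"], rows[0]["status"])
print(len(not_succeeded(rows)))
```

The split assumes that the Details column contains no spaces; a row with free-text details would need a stricter parser.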
Monitoring Dashboard
• https://github.com/ajayyadav/falcon-dashboard
Onboarding Pipeline
• Group all processes
• Minutely, Hourly, Daily, Weekly, Monthly
• Group related feeds
• Verify that all process jars and workflows are pushed to the cluster
• Verify ownership of all feed and process directories
• Verify that owners have job-scheduling access roles on the relevant cluster
• Validate the feeds
• Submit and schedule the feeds, so that retention and replication are in place
• Dry-run the process schedule
• Submit and schedule the process
• Document the feed SLA, HDFS usage, and retention period for monitoring
• Document the process SLA, to observe delays
Challenges
• Tightly integrated with Oozie
• Monitoring and onboarding need to be streamlined
• Real-time changes to schedule time and queues
Advantages
• Development is very active
• Industry adoption has been quick
• Once onboarded, focus is needed only on the set of critical processes
• Easy shutdown and upgrade, as all running jobs are managed by Oozie
• DevOps can easily set up and manage data
Thank You
