Architecture of campaign analytics
The issues in the old campaign analytics processes
Building pipeline management framework for robust computing environment
2. Overview
Architecture of Campaign Analytics
What are the issues in the old Campaign Analytics processes
Build Pipeline Management Framework for robust computing environment
4. What are the issues the framework needs to solve
Consistent and robust framework
Make adding a new analytics job easier
Ability to coordinate complex workflows (serialized and parallel processing)
It should support the catch-up feature
It should make debugging and tracing easier
5. What does Oozie provide?
Workflow Engine
Workflow definition
A DAG of control flow nodes and action nodes (connected with transition arrows)
Workflow Nodes
Control flow nodes (start, end, decision, fork, join, kill node)
Action nodes (Map/Reduce, Pig, Java, script, etc.)
Parameterization of Workflow
Job Properties
EL functions (Basic EL, WF EL, Hadoop EL, HDFS EL)
Oozie Console
Oozie Client and API
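To make the workflow-definition idea concrete, here is a minimal sketch of a DAG with fork and join control nodes, written in plain Python rather than Oozie's own workflow format. The node names (aggregate, report_by_region, report_by_channel) are hypothetical and only illustrate how serialized and parallel steps can be ordered. It is not how Oozie itself schedules nodes.

```python
# Hypothetical workflow: node name -> nodes it transitions to.
# "fork" fans out to two actions that may run in parallel;
# "join" continues only after both of them have finished.
WORKFLOW = {
    "start": ["aggregate"],
    "aggregate": ["fork"],
    "fork": ["report_by_region", "report_by_channel"],
    "report_by_region": ["join"],
    "report_by_channel": ["join"],
    "join": ["end"],
    "end": [],
}


def execution_waves(dag, start="start"):
    """Group nodes into waves; a node runs once all its predecessors are done."""
    # Count incoming transitions for every node.
    pending = {node: 0 for node in dag}
    for targets in dag.values():
        for target in targets:
            pending[target] += 1

    waves = []
    wave = [start]
    while wave:
        waves.append(sorted(wave))
        next_wave = []
        for node in wave:
            for target in dag[node]:
                pending[target] -= 1
                if pending[target] == 0:
                    next_wave.append(target)
        wave = next_wave
    return waves


for step, nodes in enumerate(execution_waves(WORKFLOW)):
    print(step, nodes)
```

Nodes that share a wave (here the two report actions) have no dependency on each other, so they could run in parallel. Every other node runs serially.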
7. Campaign Analytics Pipeline Management Framework
The Campaign Analytics Pipeline Management Framework (PMF) is built on top of Oozie.
PMF defines the campaign analytics processing pipelines. Each pipeline includes a set of workflows.
PMF organizes, schedules, and coordinates the campaign analytics jobs. It also provides a built-in catch-up feature to make the pipelines robust.
The Oozie workflow engine executes the workflows and sends job status to the Oozie server.
Jobs are monitored and traced through the Oozie console.
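The slides do not describe how the catch-up feature works internally. A minimal sketch, assuming catch-up means re-running every scheduled period that was missed since the last successful run, could look like this. The function name missed_runs and the timestamps are illustrative assumptions, not part of PMF.

```python
from datetime import datetime, timedelta


def missed_runs(last_success, now, interval):
    """Return the scheduled run times after last_success, up to and including now."""
    runs = []
    scheduled = last_success + interval
    while scheduled <= now:
        runs.append(scheduled)
        scheduled += interval
    return runs


# Illustrative values: the hourly pipeline last succeeded at 09:00
# and the scheduler comes back at 12:30 after an outage.
last_ok = datetime(2024, 1, 1, 9, 0)
now = datetime(2024, 1, 1, 12, 30)
backlog = missed_runs(last_ok, now, timedelta(hours=1))

for run_time in backlog:
    print("catch-up run for", run_time.isoformat())
```

Passing timedelta(days=1) or timedelta(weeks=1) as the interval would cover the daily and weekly pipelines under the same assumption.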
8. PMF & Oozie Execution Environment
PMF Servers
Own the pipeline definitions
Pass workflow tasks to Oozie through the Oozie client
Oozie Server
Executes workflow tasks
Manages task status
Hadoop Cluster
Workflow definition deployed in HDFS
M/R processes run on the cluster
Oozie Console
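As a rough illustration of the hand-off from a PMF server to the Oozie server, the sketch below builds (but does not send) a job submission request for Oozie's HTTP web services API, which is what the Oozie client uses under the hood. The server URL, user name, and HDFS path are hypothetical placeholders, and the slides do not say whether PMF uses the Java client, the command line, or HTTP directly.

```python
import urllib.request
from xml.etree import ElementTree as ET


def build_job_config(properties):
    """Serialize job properties as a Hadoop-style <configuration> XML document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="utf-8")


def build_submit_request(oozie_url, properties):
    """Prepare a POST that submits and starts a workflow job; nothing is sent here."""
    return urllib.request.Request(
        url=oozie_url.rstrip("/") + "/v1/jobs?action=start",
        data=build_job_config(properties),
        headers={"Content-Type": "application/xml;charset=UTF-8"},
        method="POST",
    )


request = build_submit_request(
    "http://oozie.example.com:11000/oozie",  # hypothetical Oozie server
    {
        "user.name": "analytics",  # hypothetical submitting user
        # Workflow definition deployed in HDFS (hypothetical path).
        "oozie.wf.application.path": "hdfs://namenode/apps/campaign/hourly",
    },
)
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen(request) would return the new job's id as JSON, and that id is what the Oozie console shows for monitoring and tracing.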
10. Current Workflows
PMF manages three pipelines (hourly, daily, and weekly)
Includes 12 workflows
Map/Reduce jobs run per month: ~100,000