Wengines, Workflows, and 2 years of advanced data processing in Apache OODT

With the advent of OODT-215 and OODT-491, there has been a tremendous amount of work to port our next generation Workflow Management system (cutely dubbed "WEngine" for "workflow engine") from an isolated branch into the mainline trunk.

The WEngine system brings amazing advantages, including explicit support for branch and bounds in workflow models; prioritized thread pooling and queueing at both the per-task and per-workflow level; global workflow-level conditions (pre and post); condition and workflow timeouts; and an entirely new, more descriptive state model complete with failure codes and checkpointing.

WEngine is currently processing the NPOESS Preparatory Project (NPP) PEATE testbed and its thousands of jobs per day, and is being slowly introduced into processing of an entire snow and ice climatology for the Western US and Alaska for the U.S. National Climate Assessment (NCA), working with the world's best snow hydrologists and snow scientists.

With all of those new features, what's an Apache OODT user and fan to do? How can you use WEngine in your system? How does it work today? How will it work tomorrow? We'll answer those questions and more in this fly-by-the-seat-of-your-pants exciting super talk!

Published in: Technology

  1. 1. Wengines, Workflows, and 2 years of advanced data processing in Apache OODT. Chris A. Mattmann: Senior Computer Scientist, NASA JPL; Adjunct Assistant Professor, USC; Member, Apache Software Foundation
  2. 2. Agenda: Apache OODT; Workflow Support (Workflow1); Wengine features (NPP, others); History and Status; Where we're at. 28-Feb-2013 ACNA2013-Mattmann
  3. 3. And you are? Senior Computer Scientist at NASA JPL in Pasadena, CA, USA. Software Architecture/Engineering Prof at Univ. of Southern California. Apache Executive Officer and Member involved in OODT (PMC), Tika (PMC), Nutch (PMC), Incubator (PMC), SIS (PMC), Gora (PMC), Airavata (PMC), cTAKES (Mentor), and lots of other projects.
  4. 4. History of Apache OODT: “Oldies but goodies”: information integration, 1st generation CAS, 1999-2003. “Hard man”: 2nd generation “better CAS”, 2003-2005. “Matt man and Crew”: next generation CAS and open source @TheASF, 2005-present.
  5. 5. Context: http://oodt.apache.org/components/maven/workflow/development/developer.html
  6. 6. Workflow Manager: some terminology
  7. 7. “The Beginning of Workflow”: Chris and Paul learn about workflows, 2004. Raj Buyya, A Taxonomy of Workflow Management Systems for Grid Computing. Workflow Patterns: http://workflowpatterns.com
  8. 8. “The Beginning: More”: Paul is initially more interested in workflows than Chris. Chris becomes interested in workflows b/c of this mission: http://oco.jpl.nasa.gov/
  9. 9. 2005 – Oh No, a “mission!” Was forced/signed up to be the “Lead Process Control System (PCS) developer” for OCO. Was worried b/c the existing CAS couldn't support OCO. Schemed/brainstormed with Paul about what to do.
  10. 10. What is Workflow Management? Modeling, executing and monitoring groups of one or more Workflow Tasks. Tasks could be: a script file; a Java process; an external command; a call to a web service; many more…
  11. 11. Workflow: Workflow has many definitions. It's typically represented as a graph (figure: Task A, Task B, Task C, Task D, Task E). In traditional science data pipeline systems, this graph is constrained to be a sequential set of process nodes.
  12. 12. The State of Things: The existing CAS was able to handle sequential science data pipelines very well. It handles them as a set of individual tasks that are mapped to a product type. Tasks are kicked off on ingestion of a product, or by other tasks. However, the approach and process to executing pipelines and tasks was ad hoc. A task can kick off another task, but by communicating directly with the database to insert its “id” in the “next task” table. Tasks are only grouped by product type, so you need to have a product type to have a group of associated tasks. Additionally, the approach didn't allow for parallel execution of tasks: tasks were put into a global queue. Also, tasks from different “workflows” can compete against one another because the queue is global. Also, control patterns are ad hoc; standard control flow is not supported.
  13. 13. New Requirements and Drivers: Workflow should be represented as a graph. This will allow for true parallelism. Workflow Management should support identified workflow patterns, especially control-flow. The current level of support for control-flow has to a large extent been relegated to tasks. A collection of tasks is associated with a product ingestion and there is only a priority to sort out the order of execution. Data-flow should be captured. The workflow should be able to minimally hook together input and output streams between tasks. Workflow need not have any interaction with a database. What if I want to persist a workflow in XML? Or as a flat file, or some other lightweight format?
  14. 14. Architectural Implications: Workflow Repositories: places to go and fetch an “abstract” workflow description from. Workflow Execution Engines: give it an abstract workflow, and let it rip; turns an abstract workflow into a “Workflow Instance”; should allow monitoring of the workflow instance. System interface: associate abstract workflows with “events”. This way, workflows can be tied to things other than just product ingestion.
  15. 15. How is this different from the existing CAS? The Workflow Repository need not be a relational database. It could be a flat file, a (set of) XML file(s), or an object database. Factories create Workflow Repositories, which create Workflows. Tasks are associated with “Workflows”, not “Product Types”. This decouples workflow from the File Management aspects of the CAS. Conditions can be pre or post, as opposed to the existing CAS, where “Rules” are effectively pre-conditions on a task and there is no concept of a post condition.
  16. 16. How is this different from the existing CAS? Workflows are interfaces. They could be backed by a directed graph, by an iterator (i.e., a sequential pipeline), or by a HashMap. Workflow Tasks have clearly separated dynamic and static metadata, and they can share metadata. Dynamic metadata is passed via the Workflow Engine between all the tasks in a workflow; they can all read/write to it. Static metadata is associated with each workflow task. Workflow Events are captured and delivered via Workflow Listeners, which are interfaces. There can be many different backend implementations of Workflow Listeners.
  17. 17. Workflow Execution: Once you've got a Workflow, how do you execute it and turn it into a Workflow Instance? You hand it off to a Workflow Engine.
  18. 18. What does the Workflow Engine do? The Workflow Engine manages: a configurable, extensible thread pool (“Worker Threads” are used to process the Workflow Instance they are each handed); a queue of worker threads if there aren't any available workers in the thread pool to process a Workflow; monitoring of which Workers are handling which Workflow Instances, and the state and status of each Workflow Instance. Workflow Engines execute instances of Workflows.
  19. 19. What's the external interface to the system? Event-based: event names come into the Workflow Manager. The Workflow Manager looks up any Workflows associated with the event name. The Workflow Manager then calls the Workflow Repository to obtain representations of the Workflow. The Workflow Manager then hands off Workflow representations to the Workflow Engine for execution. The current implementation uses XML-RPC, but it's an interface, so it could use REST/HTTP/SOAP/etc.
  20. 20. The Workflow Manager: So, how do we put all of these things together? Well, something like this. A Workflow Manager has: one or more Workflow Repositories to obtain abstract Workflow descriptions from; one or more Workflow Engines to execute Workflows on; one or more external interfaces.
  21. 21. We called this “Workflow1”. Worked great for OCO.
  22. 22. Properties of Workflow1: ThreadPool Workflow Engine. 1 thread per entire workflow instance. Worked very well for routine production pipeline processing: we know that we will run A <= X <= B jobs per day, where A is a good minimal bound on the max threads per JVM (totally OS dependent; 256 is a large number) and B is the maximal number of threads that doesn't bound the JVM.
  23. 23. ThreadPool was: http://svn.apache.org/repos/asf/oodt/trunk/workflow/src/main/resources/workflow.properties ; based on java.util.concurrent ThreadPoolExecutor; easily configurable; if you ran out of threads, scale horizontally and add more JVMs.
  24. 24. Portion of workflow config for ThreadPool Executor
  25. 25. Other Workflow1 Stuff: Branch and bounds was supported implicitly. You want branch and bounds? 1. Define N>1 Workflows that are mapped to an event name. 1a. Define the N+1 workflow to be the “reducer”. 2. They will be executed in parallel, hence the branch. 3. The bounds is handled by a pre-condition on the N+1 task.
  26. 26. Metadata context keys (figure: Task T1, Task T2, Task T3 and Task T4 in a Workflow Instance with a “Shared Metadata Context”). Task 1: InputFiles: File1.txt; OutputFiles: File2.txt. Task 2: InputFiles: foobar.txt; OutputFiles: foo2.txt, foo1.txt. Task 3: OrbitNumber: 900041. Task 4:
  27. 27. Problems with keys: Key naming collision: tasks needed to handle this explicitly in “production rules”. No grouping of keys: grouping was achieved using a “_” key naming scheme, e.g., PCS_InputFiles, PCS_CrawlForDirs.
  28. 28. Enter this guy. Not the one on the left, that's my son. Brian Foster, now at Google, curses!
  29. 29. And this mission: http://npp.gsfc.nasa.gov NPOESS Preparatory Project (NPP), now called Suomi NPP. Sounder PEATE Testbed Element.
  30. 30. They told Brian this: a little different than the OCO use case. So, the next THREE years' worth of jobs, we'd like to submit today… and then have your “workflow manager” manage the jobs for the next 3 years. This effectively blew up our thread pool workflow engine.
  31. 31. Random David Woollard sighting. David Woollard and Brian Foster had to figure out how to solve the NPP problem. Decided we need a new workflow manager… branch/fork/sigh.
  32. 32. Not their fault: Paul R. and I and others didn't have time to fully watch this, and other OODT PMC members weren't really vested in those particular components. Brian was learning and doing great, and we decided in the end that going off into a branch and not destroying Workflow1 users in the trunk was better than having to integrate everything… so we punted.
  33. 33. NPP Pipeline – more SCF than ops system (figure: pipeline diagram with nodes labeled MetOpA IASI L1C Granule, IASI L1C GPolygon Map File, MetOpA AMSU-A L1B Granule, AMSU-A L1B GPolygon Map File, MetOpA MHS L1B Granule, MHS L1B GPolygon Map File, and MetOpA Orbit Boundary File)
  34. 34. Enter “Workflow2” or “Wengine”. What sucks about Workflow1? Can't explicitly model branch and bounds: fixed through “sequential” and “parallel” processors, Paul R.'s idea (OODT-70). No global level workflow conditions: added them (OODT-205). Really only pre conditions in Workflow1: add post conditions (OODT-502).
  35. 35. More improvements: Condition timeouts: OK, it's timed out waiting for a file, run anyways (OODT-207). Optional or required: allowing boolean OR based conditionals (test this and report its success, but don't block) (OODT-208). Better failure state reporting and checkpointing (OODT-206).
  36. 36. Yes, more improvements: Workflow Metadata keys, https://oodt.jpl.nasa.gov/jira/browse/OODT-303 (internal JPL JIRA; was already fixed in ASF JIRA in 0.1-incubating). By group, e.g., PCS/InputFilesGroup/InputFiles, PCS/Output/MetFileWriter, PCS/FileManagerUrl, Task1/SomeKey1. Collect all keys for a group: wmet.search("PCS") -> all keys, can interrogate for values.
  37. 37. And more… Workflow Lifecycle Management: state-driven execution, inversion of control. What this literally means: in PCS stat and in PCS OPSUI you see more states.
  38. 38. Runner Framework: Workflow1 had facilities to submit jobs to Resource Manager or to run them on its own locally. Was a hack inside of IterativeWorkflowProcessorThread. Brian F. turned this into an explicit interface. Could hook Workflow directly to, e.g., Hadoop. I'm not convinced this was the right way to do this, but I applaud the cleanup of my code.
  39. 39. Sub Workflows: Workflows whose sub-tasks can be other workflows (OODT-211). Yes, this is recursive, and mind blowing. (Figure labels: Task T1, Task T3, Task T4, workflow.)
  40. 40. “Dynamic Workflows”: This is one of my favorites (OODT-209). % ./wmgr-client --url http://localhost:9001 --operation --dynWorkflow --taskIds id1,id2,id3 (figure: Task id1, Task id2, Task id3)
  41. 41. Enough, how can I use all this stuff? Brian's code existed as forked and unsupported (by community) in the NPP repo at JPL. Brian, by his own awesomeness, realizes before he leaves me for Google in 2011 that we need to push it to Apache: http://svn.apache.org/repos/asf/oodt/branches/wengine-branch (last working PEATE version).
  42. 42. Chris spends 2 years figuring out what Brian did. OODT-215: my initial “god” issue to solve everything in JIRA; tried to break the problem down into manageable steps. Still took me 2 years, with help from Paul R. and from Brian (even though he left for Google he still works on Apache OODT, muwahahah). OODT-491: “Finish line tasks for Wengine”.
  43. 43. Wengine support in trunk first appears in Apache OODT 0.4, but was largely a work in progress, and well… didn't fully work. Apache OODT 0.5 happens: back compat restored for “Workflow1” style engines; Chris and Brian clean up a ton of the branch stuff, and finish most of OODT-491. Apache OODT 0.6: we finish for real real real.
  44. 44. Who will use Wengine? PEATE uses it today: their job processing requirements as an SCF are quite large. U.S. National Climate Assessment (NCA) project, “Snow Hydrology for the Western US and Alaska”: will tell you about this on the next slides.
  45. 45. Talk Part #2: Doing stuff with Wengine and why you should care
  46. 46. JPL Snow Server: http://snow.jpl.nasa.gov Full bore processing and delivery system: near real time and historical processing; dust forcing and snow covered area products; tower data; GIS interfaces; CSV, JSON, GeoTIFF data format download.
  47. 47. MODIS Snow Covered Area and Grain Size (MODSCAG): JPL MODSCAG algorithm (Painter et al 2009). Spectral mixture analysis of MODIS Surface Reflectance products. Daily 500 m coverage in late morning and early afternoon from NASA satellites Terra and Aqua. (Figure: Upper Colorado River Basin, March 9, 2009. Credit: Tom Painter.)
  48. 48. MODSCAG Processing: Two Products/Two Inputs. MODIS tiles are defined by their horizontal and vertical tile IDs (the 2 characters after the h and the v respectively). Historical Tiles over the Western United States (LPDAAC): time range 2000 - present; tiles h08v04, h08v05, h09v05, h09v04, h10v04. LPDAAC is NASA Land Processes data center located at the USGS Earth Resources Observation and Science (EROS) Center in Sioux Falls, South Dakota. MODIS Near Real-Time Products (LANCE MODIS NRT): time range Dec 2011 - present; Western United States, High Asia.
  49. 49. Credit: Cameron Goodale
  50. 50. Credit: Cameron Goodale
  51. 51. MODDRFS: Dust Radiative Forcing in Snow from MODIS (figure: Dust Radiative Forcing in W/m2, color scale from 0 to 300; 17 May 2009; Painter and Bryant, 2012)
  52. 52. Now, what have I cooked up for today? I have an Orion SkyQuest XT8 Classic Dobsonian Telescope. I also have an iPhone 5.
  53. 53. I had a few days of time for some great lunar science
  54. 54. As it turns out those images have metadata
  55. 55. Add metadata: Geocoding, WGS84 lat, lng. Planetary met, TARGET=MOON, etc.
  56. 56. Found Hugin
  57. 57. Wanted to do something cool with it. Discovered enshape. Figured out how to make it combine images.
  58. 58. Getting started: Workflow2 Quick Start on OODT Wiki: https://cwiki.apache.org/OODT/workflow2-quick-start-guide.html OODT documentation sucks! Check the wiki, it's better there.
  59. 59. Will now show you some workflow stuff. Dreams of moon images, died. Will illustrate dynWorkflows.
  60. 60. What's left? Supporting looking up workflows by category (needed to say “give me all workflows that aren't ‘done’”) (OODT-517). Fix the resource manager runner (OODT-518). Fix all the wall clock and per task timing (OODT-519).
  61. 61. Want to help? dev@oodt.apache.org. OODT-215 and OODT-491 homework. Get a beer with me or Brian. I bribe you?
  62. 62. Questions. Thanks! Chris Mattmann, @chrismattmann, mattmann@apache.org
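
Slides 19 and 20 describe an event-based Workflow Manager that looks up workflows by event name in a repository and hands them to an engine. The flow can be sketched roughly as below. This is an illustrative Python toy, not Apache OODT code (OODT is written in Java); the names `InMemoryRepository`, `RecordingEngine`, `WorkflowManager`, `handle_event` and the event name `"ingest"` are all hypothetical.

```python
class InMemoryRepository:
    """Hypothetical repository: maps event names to abstract workflow names."""

    def __init__(self, workflows_by_event):
        self._by_event = workflows_by_event

    def workflows_for_event(self, event_name):
        return self._by_event.get(event_name, [])


class RecordingEngine:
    """Hypothetical engine that only records what it was asked to start."""

    def __init__(self):
        self.started = []

    def start(self, workflow, metadata):
        # Copy the metadata so each workflow instance gets its own context.
        self.started.append((workflow, dict(metadata)))


class WorkflowManager:
    """Ties an external event to a repository lookup and an engine hand-off."""

    def __init__(self, repository, engine):
        self.repository = repository
        self.engine = engine

    def handle_event(self, event_name, metadata):
        workflows = self.repository.workflows_for_event(event_name)
        for workflow in workflows:
            self.engine.start(workflow, metadata)
        return len(workflows)


repo = InMemoryRepository({"ingest": ["workflow-a", "workflow-b"]})
engine = RecordingEngine()
manager = WorkflowManager(repo, engine)
started_count = manager.handle_event("ingest", {"OrbitNumber": "900041"})
unknown_count = manager.handle_event("no-such-event", {})
```

Because the manager only depends on the two small interfaces, the repository could be backed by XML files or a database and the transport by XML-RPC or REST without changing this flow, which is the decoupling the slides argue for.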
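
Slides 18, 22 and 23 describe the Workflow1 engine as a thread pool with one thread per workflow instance, where extra instances wait in a queue. A minimal sketch of that idea follows, using Python's `concurrent.futures.ThreadPoolExecutor` as a stand-in for the Java `ThreadPoolExecutor` named on the slide. The functions `run_workflow_instance`, `run_all` and `add_one` are hypothetical and not part of OODT.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow_instance(instance_id, tasks, metadata):
    """Run every task of one workflow instance in order on a single worker thread."""
    for task in tasks:
        task(metadata)  # each task reads and writes the shared dynamic metadata
    return instance_id, metadata


def add_one(metadata):
    """Hypothetical task: bump a counter in the shared metadata."""
    metadata["count"] = metadata.get("count", 0) + 1


def run_all(workflows, max_workers=2):
    """Submit all instances; those beyond max_workers wait in the executor's queue."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_workflow_instance, instance_id, tasks, {})
            for instance_id, tasks in workflows.items()
        ]
        return dict(future.result() for future in futures)


results = run_all({"wf-1": [add_one, add_one], "wf-2": [add_one], "wf-3": []})
```

With `max_workers=2` and three instances, the third instance is queued until a worker frees up. This is also why submitting years of jobs at once (slide 30) strains a design that dedicates a thread to each running instance.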
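
Slides 27 and 36 contrast flat `PCS_InputFiles` style keys with grouped keys such as `PCS/InputFilesGroup/InputFiles`, and mention collecting all keys of a group via `wmet.search("PCS")`. The sketch below shows one way such a grouped lookup could behave. Only the key names and the `search` call shape come from the slides; the `GroupedMetadata` class, its `put` and `get` methods, and the example values are assumptions, not the OODT implementation.

```python
class GroupedMetadata:
    """Toy metadata store whose keys are '/'-separated group paths."""

    def __init__(self):
        self._values = {}

    def put(self, key, value):
        self._values[key] = value

    def get(self, key):
        return self._values[key]

    def search(self, group):
        """Return all keys that live under the given group, in sorted order."""
        prefix = group.rstrip("/") + "/"
        return sorted(key for key in self._values if key.startswith(prefix))


wmet = GroupedMetadata()
wmet.put("PCS/InputFilesGroup/InputFiles", ["File1.txt"])
wmet.put("PCS/FileManagerUrl", "http://localhost:9000")  # placeholder value
wmet.put("Task1/SomeKey1", "value")

pcs_keys = wmet.search("PCS")
input_files = wmet.get("PCS/InputFilesGroup/InputFiles")
```

Matching on the prefix `"PCS/"` rather than `"PCS"` keeps a group such as `PCSExtra/...` from being returned by accident, which is the kind of naming collision the flat `_` scheme left to each task.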
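
Slides 34, 35 and 39 introduce explicit “sequential” and “parallel” processors, condition timeouts that run the task anyway, and sub-workflows. The composite sketch below illustrates how those three ideas can fit together. It is a hypothetical Python model and not the Wengine API; the class names `Task`, `Sequential`, `Parallel` and `Conditional` and their parameters are invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class Task:
    """Leaf node: records its name when run."""

    def __init__(self, name):
        self.name = name

    def run(self, log):
        log.append(self.name)


class Sequential:
    """Runs children one after another; children may be tasks or other processors."""

    def __init__(self, *children):
        self.children = children

    def run(self, log):
        for child in self.children:
            child.run(log)


class Parallel:
    """Runs children concurrently (the branch) and waits for all of them (the bound)."""

    def __init__(self, *children):
        self.children = children

    def run(self, log):
        with ThreadPoolExecutor(max_workers=max(1, len(self.children))) as pool:
            futures = [pool.submit(child.run, log) for child in self.children]
            for future in futures:
                future.result()


class Conditional:
    """Polls a pre-condition; once it holds or the timeout expires, runs the child."""

    def __init__(self, condition, child, timeout_s, poll_s=0.01):
        self.condition = condition
        self.child = child
        self.timeout_s = timeout_s
        self.poll_s = poll_s

    def run(self, log):
        deadline = time.monotonic() + self.timeout_s
        while not self.condition() and time.monotonic() < deadline:
            time.sleep(self.poll_s)
        self.child.run(log)  # timed out or satisfied: run either way


workflow = Sequential(
    Task("T1"),
    Parallel(Task("T2a"), Task("T2b")),  # a sub-workflow used as a step
    Conditional(lambda: False, Task("T3"), timeout_s=0.05),
)
log = []
workflow.run(log)
```

Because every node exposes the same `run` method, a processor can be nested wherever a task is allowed, which is the recursion slide 39 refers to. The order of `T2a` and `T2b` in `log` is not deterministic, since they run on separate threads.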
