Amazon cloud based video transcoding


Amazon Cloud, video transcoding, stream processing, Hadoop, MapReduce, Kinesis, Storm, job scheduling, auto scaling, load balancing, workflow management, video standards, MPEG-1/-2, MPEG-4 AVC, H.264, HEVC, H.265, audio standards, MP3, AC-3, AAL, ALAC.



Usage Rights

© All Rights Reserved

    Amazon cloud based video transcoding: Presentation Transcript

    • AMAZON CLOUD-BASED VIDEO TRANSCODING Yu Huang Sunnyvale, California yu.huang07@gmail.com
    • Page 2 OUTLINE
      • Parallel computing: parallelism types; parallel programming models;
      • Cloud computing: private, public or hybrid; Amazon Web Service (AWS);
      • MapReduce, a divide-and-conquer method: implementations of MapReduce (MR); Hadoop, a Java implementation of MapReduce;
      • Distributed stream processing: Lambda structure (stream processing + batch processing);
      • Cloud-based transcoding: Netflix, HDCloud, Amazon, …;
      • Key issues in cloud-based transcoding: load balancing, auto-scaling, monitoring, fault tolerance, job scheduling, automatic workflow cloud management.
    • Page 4 PARALLELISM TYPES
      • Data-level parallelism;
      • Task-level parallelism;
      • Instruction-level parallelism;
      • Bit-level parallelism;
      • Flynn's classical taxonomy;
      • Shared memory architecture;
      • Distributed memory architecture;
      • Hybrid distributed-shared memory architecture.
    • Page 5 PARALLEL PROGRAMMING MODELS
      • Shared memory model: tasks share a common address space, which they read and write asynchronously; a single process can have multiple, concurrent execution paths; POSIX Threads and OpenMP (multi-threaded);
      • Distributed memory model: tasks use their own local memory and exchange data by sending and receiving messages; Message Passing Interface (MPI);
      • Data parallel model: tasks work collectively on the same data structure or data set;
      • Hybrid: MPI+OpenMP, GPU, Google MapReduce, Yahoo S4, Twitter Storm.
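
The difference between the shared memory and message passing models can be illustrated with a small sketch. It is only an illustration: both variants run as Python threads in one process, and the function names `shared_memory_sum` and `message_passing_sum` are made up for this example. A real distributed memory program would use something like MPI across machines.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Shared memory model: all threads update one variable guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # serialize writes to the shared variable
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks):
    """Message passing model: workers keep local state and send their results."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # "send" the partial result as a message

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # "receive" exactly one message per worker
    return sum(mailbox.get() for _ in chunks)
```

In the first variant correctness depends on the lock; in the second, no state is shared and the only coordination is the exchange of messages.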
    • Page 6 WHAT IS CLOUD COMPUTING?
      • A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (Ethernet), usually for large Internet services;
      • Dynamic provision of services and resource pools in a coordinated fashion;
      • Cloud computing infrastructure is just a web service interface to operating system virtualization (via hypervisor);
      • Heterogeneous by virtualization;
      • Everything as a service (XaaS);
      • Data intensive: big data;
      • Distributed parallel;
      • Like utility computing;
      • Not grid computing;
      • Large-scale datacenter.
    • Page 7 PUBLIC, PRIVATE OR HYBRID?
      • Private: data and processes are managed within the organization without restrictions of network bandwidth, security exposures and legal requirements;
      • Public: resources are dynamically provisioned on a fine-grained, self-service basis over the internet via web applications/services;
      • Hybrid: multiple internal/external providers.
    • Page 8 AMAZON WEB SERVICE
      • Elastic Compute Cloud (EC2) is a web service that provides resizable compute capacity in the cloud, on which a customer can set up server instances as needed;
      • Elastic Block Store (EBS) offers persistent storage for Amazon EC2 instances;
      • Elastic MapReduce (EMR) uses Hadoop to distribute and process vast amounts of data across a resizable cluster of EC2 instances;
      • Simple Storage Service (S3) stores and retrieves data (in buckets) addressable using a URL;
      • CloudWatch with Auto Scaling: monitors metrics and automatically scales Amazon EC2 capacity up or down;
      • Elastic Load Balancing (ELB) distributes incoming application traffic across multiple EC2 instances;
      • Simple Queue Service (SQS) provides a hosted queue for storing messages as they travel between components;
      • Simple Notification Service (SNS) provides publish/subscribe messaging for sending notifications from the cloud;
      • CloudFront: web service for CDN, integrates with other Web Services to distribute content with low latency, high transfer speeds, and no commitments;
      • Storage Gateway provides seamless and secure integration between an on-premises IT environment and AWS's storage infrastructure;
      • Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze data.
    • Page 9 AMAZON WEB SERVICE
      • Amazon Kinesis: service for real-time processing of streaming data at massive scale;
      • Glacier: low-cost, secure and durable storage for data archiving and backup;
      • Simple Workflow (SWF): task coordination and state management for cloud apps;
      • DynamoDB is a fast, fully managed NoSQL database service for huge data;
      • ElastiCache makes it easy to deploy, operate, and scale an in-memory cache in the cloud, supporting open-source caching engines: memcached and Redis;
      • Elastic Beanstalk automatically handles the deployment details of capacity provisioning, load balancing, auto-scaling, and application health monitoring;
      • CloudSearch sets up, manages, and scales a search solution for your website or app;
      • Elastic Transcoder (ETS) transcodes S3-hosted videos into mobile formats;
      • Amazon Mechanical Turk enables a diverse, on-demand workforce, with which developers can build human intelligence directly into their applications;
      • Amazon Web Service (AWS) Management Console provides a GUI for EC2, Elastic MapReduce (EMR), and CloudFront, with additional Amazon infrastructure services;
      • Amazon Machine Image (AMI) is simply a packaged-up environment for the instance.
    • Page 10 AMAZON WEB SERVICE CloudFront (CDN)
    • Page 11 MAPREDUCE – A DIVIDE-AND-CONQUER METHOD
      • Borrowed from functional programming (LISP);
      • Separates the details of the original problem from the details of parallelism;
      • map() produces one or more intermediate key/value pairs from the split input ("shards");
      • reduce() combines intermediate key/value pairs into final files after partitioning and sorting by key;
      • Scales from a single machine to a large cluster of machines;
      • Fault tolerance: handled at the level of map or reduce tasks;
      • Locality: distributed GFS chunks;
      • Bottleneck: the reduce phase can't start until the map phase is completely finished.
    • Page 12 AN EXAMPLE OF MAPREDUCE: INVERTED INDEX
      • Map over all documents: emit term as key, (doc#, tf) as value, and emit other information as necessary (e.g., term position);
      • Sort/shuffle: group postings by term;
      • Reduce: gather and sort the postings (e.g., by doc# or tf); write postings to disk;
      • Input: Doc 1 "one fish, two fish"; Doc 2 "red fish, blue fish"; Doc 3 "cat in the hat";
      • Map output as term → (doc#, tf): Doc 1: one → (1, 1), two → (1, 1), fish → (1, 2); Doc 2: red → (2, 1), blue → (2, 1), fish → (2, 2); Doc 3: cat → (3, 1), hat → (3, 1);
      • Shuffle and sort: aggregate values by keys;
      • Reduce output: fish → (1, 2), (2, 2); one → (1, 1); two → (1, 1); red → (2, 1); blue → (2, 1); cat → (3, 1); hat → (3, 1).
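
The inverted index example can be sketched in a few lines of single-process Python. This is a toy model of the map, shuffle and reduce steps, not Hadoop code; the function names are hypothetical, and unlike the slide it does not drop words such as "in" and "the".

```python
from collections import Counter, defaultdict


def map_doc(doc_id, text):
    """Map: emit (term, (doc_id, tf)) for each distinct term of one document."""
    terms = text.lower().replace(",", " ").split()
    for term, tf in Counter(terms).items():
        yield term, (doc_id, tf)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_postings(term, postings):
    """Reduce: gather the postings of one term and sort them by doc id."""
    return term, sorted(postings)


def build_inverted_index(docs):
    pairs = [kv for doc_id, text in docs.items() for kv in map_doc(doc_id, text)]
    return dict(reduce_postings(t, p) for t, p in shuffle(pairs).items())


DOCS = {1: "one fish, two fish", 2: "red fish, blue fish", 3: "cat in the hat"}
INDEX = build_inverted_index(DOCS)
```

With the three documents from the slide, `INDEX["fish"]` holds the postings `[(1, 2), (2, 2)]`, matching the reduce output above.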
    • Page 13 IMPLEMENTATIONS OF MAPREDUCE (MR)
      • Google: original MapReduce implementation;
      • Hadoop: Yahoo;
      • Amazon Elastic MapReduce;
      • MR for multicore/multiprocessor: Stanford Phoenix;
      • MR for GPU: Mars;
      • MR for Cell BE (Broadband Engine);
      • Py(thon)doop;
      • BashReduce;
      • Cloud MapReduce (Amazon EC2);
      • ……
    • Page 14 HADOOP: JAVA IMPLEMENTATION OF MR
      • HDFS: data storage and transfer, the counterpart of GFS in Hadoop; NameNode, DataNode, master node, slave node;
      • JobTracker: scheduling; JobConf and JobClient;
      • TaskTracker: status; TaskRunner; map or reduce;
      • Data in/out: HDFS block size for input splits; number of reducers for output;
      • Error handling: replication (3 by default); task failure: report;
      • Job scheduler: FIFO, Fair, Capacity, …
    • Page 15 MAPREDUCE IN HADOOP (HIGH LEVEL)
      • [Figure] A client computer submits a MapReduce job to the JobTracker on the master node; each slave node runs a TaskTracker that manages task instances.
    • Page 16 MAPREDUCE IN HADOOP (LOW LEVEL)
      • [Figure] Input-Shuffle-Output: input file → InputFormat → InputSplit → RecordReader → Mapper → (intermediates) → Partitioner → shuffling → Reducer → OutputFormat → RecordWriter → output file.
    • Page 17 HADOOP STREAMING VS HADOOP PIPES
      • Hadoop Streaming: API that allows programs written in virtually any language to be used as Hadoop Mapper and Reducer implementations; uses stdin/stdout and text format; any language (C/C++, Perl, Python, shell, etc.);
      • Hadoop Pipes: API that provides close coupling between C++ application code and Hadoop; uses sockets and binary format (more efficient); C++ library required.
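
The stdin/stdout text contract of Hadoop Streaming can be sketched as two line-oriented functions for a word count. The names `mapper` and `reducer` are illustrative. The sketch assumes tab-separated `key<TAB>value` lines and that the reducer receives its input sorted by key, which the framework's sort phase would provide between the two steps.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' text line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines sharing a key (input sorted by key)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"
```

Locally, `reducer(sorted(mapper(lines)))` imitates the map, sort and reduce sequence. In a real job each function would sit in its own script that reads `sys.stdin` and prints its output lines.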
    • Page 18 HADOOP-RELATED OPEN SOURCE PROJECTS
      • HBase: NoSQL in Hadoop (BigTable in Google);
      • Pig: data processing in Hadoop (Yahoo);
      • Mahout: machine learning with Hadoop;
      • Hive: data warehousing in Hadoop (Facebook);
      • Cascading: works over the Hadoop API for defining, sharing and executing data-processing workflows;
      • Chukwa: data collection for monitoring large distributed systems;
      • Flume: distributed collection and aggregation of large amounts of log data with HDFS;
      • ZooKeeper: centralized service for maintaining configuration information and coordination, useful for Hadoop.
    • Page 19 DISTRIBUTED STREAM PROCESSING
      • Real-time, low latency;
      • Beyond MapReduce and Hadoop; Hadoop 2.0: YARN;
      • Lambda structure: batch and stream processing;
      • Data capture: Hadoop Chukwa; Apache Flume; Facebook Scribe; LinkedIn Kafka; Twitter Summingbird; Taobao TimeTunnel;
      • Stream processing: Twitter Storm (Spouts & Bolts); Yahoo S4 (actor programming: PE & Adaptor); Esper/NEsper (Complex Event Processing, CEP); IBM StreamBase; Facebook Puma1/2/3 (Insights & Data Freeway); LinkedIn Samza (YARN); Amazon Kinesis; Taobao Galaxy (银河); HStreaming.
    • LAMBDA ARCHITECTURE
      • Equation "query = function(all data)", which is the basis of all data systems (data is more than information);
      • Human fault-tolerance: the system is unsusceptible to data loss or data corruption;
      • Data immutability: store data in its rawest form, immutable and in perpetuity;
      • Re-computation: with the two principles above it is always possible to (re)compute results;
      • Layered structure:
        • Batch layer: unrestrained batch compute, horizontally scalable, high latency, read-only database, raw dataset, overrides the speed layer (like Hadoop);
        • Speed layer: only new data, stream processing, continuous compute, transactional, limited storage of windowed data (such as Storm);
        • Serving layer: loads the batch views and serves queries on them by random access;
      • Any view, batch or real-time, can be discarded and everything recreated from the master data; mistakes are corrected via recomputation:
        • Write bad data? Remove the data and recompute;
        • Bug in view generation? Just recompute the view;
      • Data storage is highly optimized.
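
A minimal sketch of "query = function(all data)" with a batch view and a speed view, assuming a simple per-key counting workload. The class `LambdaViews` and its methods are invented for this illustration. A real system would keep the master data in something like HDFS and compute the views with tools such as Hadoop and Storm.

```python
from collections import Counter


class LambdaViews:
    """Toy Lambda architecture: append-only master data plus two derived views."""

    def __init__(self):
        self.master = []             # immutable, append-only raw events
        self.batch_view = Counter()  # recomputed from scratch by the batch layer
        self.batch_upto = 0          # number of events covered by the batch view

    def append(self, key):
        self.master.append(key)

    def run_batch(self):
        """Batch layer: recompute the whole view from the master data."""
        self.batch_view = Counter(self.master)
        self.batch_upto = len(self.master)

    def speed_view(self):
        """Speed layer: only the events that arrived after the last batch run."""
        return Counter(self.master[self.batch_upto:])

    def query(self, key):
        """Serving: merge the (stale) batch view with the real-time view."""
        return self.batch_view[key] + self.speed_view()[key]
```

Because every view is derived from `master`, discarding `batch_view` and calling `run_batch()` again restores it, which is the recomputation property the slide emphasizes.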
    • Lambda Architecture Flowchart
    • DATA ANALYTICS SYSTEM ARCHITECTURE [Figure; labels: Facebook, Apache, online transaction processing]
    • Page 23 AMAZON KINESIS
      • Kinesis scales elastically for real-time processing of streaming big data;
      • Kinesis requires that a user create at least two applications, a "Producer" and a Kinesis application (also called a "Worker"), using Amazon's Kinesis APIs;
      • The "Producer" takes data from some source and converts it into a "Kinesis Stream," a continuous flow of 50-kilobyte data chunks sent in the form of HTTP PUTs;
      • The "Worker" takes the data from the Kinesis Stream and does whatever processing is required;
      • The Kinesis application can run on any type of Amazon EC2 instance, and Kinesis will auto-scale the instances to handle varying streaming loads;
      • The Kinesis SDK libraries, used to create Kinesis Producers and applications, are only available for Java, but you can write your Kinesis applications in any language by calling the Kinesis APIs directly;
      • Stream output is typically sent to Amazon's S3, DynamoDB, or Redshift;
      • Kinesis can create DAGs of Kinesis applications and data streams.
    • Page 24 AMAZON KINESIS  High-level Architecture
    • Page 25 AMAZON KINESIS
    • Page 26 AMAZON KINESIS
      • Terminology:
        • Data Record: the unit of data stored in a Kinesis stream, composed of a sequence number, a partition key, and a data blob, which is an un-interpreted, immutable sequence of bytes;
        • Stream: an ordered sequence of data records, distributed into shards; each record in the stream has a sequence number that is assigned by the service;
        • Shard: a uniquely identified group of data records in a Kinesis stream; a stream is composed of multiple shards, each of which provides a fixed unit of capacity, so the data capacity of the stream is a function of the number of shards that you specify for the stream;
        • Partition Key: used to group data by shard within the stream; it is associated with each data record to determine which shard a given data record belongs to;
        • Sequence Number: assigned by the Kinesis service when a record is put into the stream; sequence numbers generally increase over time;
        • Kinesis Application: a consumer of a Kinesis stream that commonly runs on a fleet of Amazon EC2 instances; a Kinesis application may also emit data to a variety of other data services such as Amazon S3, Amazon EMR, or Redshift;
        • Amazon Kinesis Client Library: compiled into the application to enable fault-tolerant consumption of data from the stream; a DynamoDB table stores control data;
        • Application Name: identifies the application;
        • Producers: entities that submit records to the Amazon Kinesis stream.
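
How a partition key selects a shard can be sketched as follows. The slide does not describe the mechanism; the Kinesis documentation states that partition keys are mapped to 128-bit integers with an MD5 hash and that each shard owns a range of hash keys. The even split in `shard_ranges` and both function names are assumptions made for this sketch.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit integer


def shard_ranges(num_shards):
    """Split the hash key space evenly (assumption: real ranges are set per stream)."""
    step = HASH_SPACE // num_shards
    return [
        (i * step, HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1)
        for i in range(num_shards)
    ]


def shard_for(partition_key, ranges):
    """Return the index of the shard whose hash key range contains the key's hash."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for shard_id, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return shard_id
    raise ValueError("hash key outside all shard ranges")
```

Records with the same partition key always land in the same shard, so the choice of key controls both ordering guarantees and how evenly load spreads over the shards.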
    • Page 27 CLOUD-BASED TRANSCODING
      • Transcoding: computationally expensive process;
      • High peak-to-trough ratio ("bursty");
      • CapEx hurts media companies more than ever;
      • Rapidly growing need-set requires high scaling;
      • Open source for transcoding: FFmpeg, MPlayer, MEncoder, VLC Media Player, x264;
      • Video/audio bitrates and containers: 3GP, ASF, FLV, MP4, MOV, AVI, …;
      • Commercial software: On2 (Google), Elemental, Harmonic, …;
      • Decode-then-encode, or transcoding limited to the compressed domain?
      • Video split and merge: cost and gain.
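
The "split and merge: cost and gain" trade-off can be made concrete with a back-of-the-envelope sketch. Both functions are hypothetical. The model assumes chunks transcode independently and perfectly in parallel, and it ignores that real splits normally have to fall on keyframe boundaries.

```python
def plan_segments(duration_s, segment_s):
    """Return (start, length) pairs that cover the interval [0, duration_s)."""
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration and segment length must be positive")
    segments = []
    start = 0.0
    while start < duration_s:
        length = min(segment_s, duration_s - start)
        segments.append((start, length))
        start += length
    return segments


def estimated_wall_time(transcode_s, workers, split_s, merge_s):
    """Idealized model: split cost + perfectly parallel transcode + merge cost."""
    return split_s + transcode_s / workers + merge_s
```

For example, a job that takes 600 s on one machine, spread over 10 workers with 5 s of splitting and 15 s of merging, gives `estimated_wall_time(600, 10, 5, 15)` = 80 s under this model. The gain shrinks as the split and merge costs grow relative to the transcode time.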
    • Page 28 (DIGITAL) VIDEO TRANSCODING
      • Transcoding is the direct analog-to-analog or digital-to-digital conversion of one encoding to another;
      • Transcoding is commonly a lossy process, introducing generation loss; however, transcoding can be lossless if the input is losslessly compressed and the output is either losslessly compressed or uncompressed;
      • The most popular definition of transcoding refers to a two-step process in which the original data/file is decoded to an intermediate uncompressed format, which is then encoded into the target format;
      • In contrast to a conversion, the prefix "trans" emphasizes a conversion from a source to a different destination;
      • The practicable solution is to treat the original format as the master copy and to produce subsequent finished versions from copies of that master;
      • Real-time transcoding in a many-to-many way (any input format to any output format) is becoming a necessity to provide true search capability for any multimedia content on any mobile device, with over 500 million videos on the web and a plethora of mobile devices.
    • Page 29 FFMPEG FOR AUDIO TRANSCODING
      • Basic options/flags that can be used for audio transcoding:
        • -ar <value>: sets the audio sampling frequency of the output file; common values are 22050, 44100 and 48000 Hz;
        • -ac <value>: sets the number of audio channels;
        • -ab <value>: sets the bitrate of the audio, e.g. -ab 128k for a 128 kbit/s bitrate; the higher the value, the better the audio quality;
        • -an: stands for "no audio recording" and can be used to strip the audio stream from a media file;
        • -acodec <codec>: chooses the audio codec to use, e.g. MP3 output needs the libmp3lame codec, specified with -acodec libmp3lame.
    • Page 30 FFMPEG FOR VIDEO TRANSCODING
      • Basic options/flags that can be used for video transcoding:
        • -b <value>: sets the bitrate of the video file;
        • -r <value>: sets the frame rate;
        • -s <resolution>: specifies the resolution of the output file, e.g. 1280x720, 1920x1080;
        • -aspect <ratio>: specifies the aspect ratio of the output file, e.g. 16:9, 4:3;
        • -qscale <value>: a quantization scale, which is basically a quality scale for variable bitrate coding, with a lower number indicating a higher quality; running the same command with and without the -qscale flag makes the quality difference easy to see;
        • -vcodec <codec>: specifies the codec to use for video transcoding; the default codec is based upon the output file format.
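
The audio and video flags listed on these two slides can be assembled into a command line programmatically. The helper `build_ffmpeg_args` below is a hypothetical convenience function, not part of FFmpeg. It only builds the argument list and uses the flag spellings from the slides. Newer FFmpeg releases document stream-specifier forms such as `-b:v`, `-b:a`, `-c:v` and `-c:a` for the same purposes.

```python
def build_ffmpeg_args(src, dst, *, vcodec=None, video_bitrate=None, frame_rate=None,
                      size=None, aspect=None, acodec=None, audio_bitrate=None,
                      sample_rate=None, channels=None, no_audio=False):
    """Build an ffmpeg argument list from the options named on the slides."""
    args = ["ffmpeg", "-i", src]
    video_opts = [("-vcodec", vcodec), ("-b", video_bitrate), ("-r", frame_rate),
                  ("-s", size), ("-aspect", aspect)]
    audio_opts = [("-acodec", acodec), ("-ab", audio_bitrate),
                  ("-ar", sample_rate), ("-ac", channels)]
    for flag, value in video_opts:
        if value is not None:
            args += [flag, str(value)]
    if no_audio:
        args.append("-an")  # strip the audio stream entirely
    else:
        for flag, value in audio_opts:
            if value is not None:
                args += [flag, str(value)]
    args.append(dst)
    return args
```

The resulting list could be passed to `subprocess.run(args, check=True)` on a machine where `ffmpeg` is installed. Building a list rather than a shell string avoids quoting problems with file names.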
    • Page 31 COMMONLY USED AUDIO CODECS
      • Apple Lossless Audio Codec (ALAC);
      • Sony ATRAC Advanced Lossless (AAL);
      • Dolby Digital (AC-3, Audio Codec 3);
      • MPEG-1/-2 Audio Layer I, II, III (MP3);
      • MPEG-2 Part 7/MPEG-4 Part 3: Advanced Audio Coding (AAC);
      • ITU: G.719/G.722;
      • Windows Media Audio (WMA).
    • Page 38 COMMONLY USED VIDEO CODECS
      • H.265/MPEG-H HEVC codecs: x265;
      • H.264/MPEG-4 AVC codecs: DivX Pro, QuickTime, x264;
      • H.263/MPEG-4 Part 2 codecs: Xvid, DivX;
      • H.262/MPEG-2 codecs: x262;
      • Microsoft: WMV;
      • Google: VP6, VP6-E, VP6-S, VP7, VP8, VP9.
    • Page 42 VIDEO STANDARDS: MPEG-4 AVC (H.264)
      • [Figure: encoder block diagram] The input video signal is split into macroblocks of 16x16 pixels; blocks: coder control, transform/scaling/quantization, scaling and inverse transform, intra-frame prediction, motion compensation (intra/inter switch), motion estimation, de-blocking filter, embedded decoder, entropy coding; signals: control data, quantized transform coefficients, motion data, output video signal.
    • Page 43 VIDEO STANDARDS: MPEG-1/-2 VS H.264
      • Table, one feature per line; columns: MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
        • Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
        • Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
        • Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
        • Quantization: scalar quantization with step size of constant increment | scalar quantization with step size of constant increment | vector quantization | scalar quantization with step size increasing at the rate of 12.5%
        • Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC
        • Motion estimation & compensation: yes | yes | yes | yes, more flexible, up to 16 MVs per MB
        • Playback & random access: yes | yes | yes | yes
    • Page 44 VIDEO STANDARDS: MPEG-1/-2 VS H.264
      • Table, one feature per line; columns: MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
        • Pel accuracy: integer, ½-pel | integer, ½-pel | integer, ½-pel, ¼-pel | integer, ½-pel, ¼-pel
        • Profiles: no | 5 | 8 | 4
        • Reference picture: one | one | one | multiple
        • Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
        • Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
        • Error robustness: synchronization & concealment | data partitioning, FEC for important packet transmission | synchronization, data partitioning, header extension, reversible VLCs | data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
        • Transmission rate: up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
        • Compatibility with previous standards: n/a | yes | yes | no
        • Encoder complexity: low | medium | medium | high
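
The quantization entry for H.264 in the comparison can be illustrated numerically. In H.264 the quantizer step size doubles for every increase of 6 in the quantization parameter (QP), which corresponds to roughly 12% growth per QP step, in line with the 12.5% quoted in the table. The closed form below is an approximation of the standard's tabulated step sizes, and `approx_qstep` is a name chosen for this sketch.

```python
import math


def approx_qstep(qp):
    """Approximate H.264 quantizer step size: 0.625 at QP 0, doubling every 6 QP."""
    return 0.625 * 2 ** (qp / 6)


# Growth factor between two consecutive QP values under this approximation.
PER_STEP_GROWTH = 2 ** (1 / 6)
```

Here `PER_STEP_GROWTH` is about 1.12, and `approx_qstep(qp + 6)` is twice `approx_qstep(qp)` for any `qp`. MPEG-1/-2 instead increase the step size by a constant increment.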
    • Page 45 LATEST VIDEO STANDARDS: HEVC (H.265)
      • HEVC provides the following new features:
        • Parallel processing with tiles and wavefronts;
        • Ultra-low delay processing with dependent slices;
        • Quadtree partitioning for prediction and transform with more and larger block sizes;
        • Inter-picture prediction block merging;
        • Advanced motion vector prediction (AMVP);
        • 8/7-tap luma and 4-tap chroma interpolation;
        • High-throughput transform coefficient coding;
        • Transform skip mode for screen content coding;
        • CABAC as the only entropy coder;
        • Sample adaptive offset in-loop filtering (SAO).
    • Page 47 LATEST VIDEO STANDARDS: H.264 VS H.265
    • Page 48 LATEST VIDEO STANDARDS: H.264 VS H.265
    • Page 49 NETFLIX: AMAZON CLOUD-BASED TRANSCODING
      • Import encrypted source files into Simple Storage Service (S3);
      • Utilize elastic infrastructure (EC2) for encoding;
      • Encrypt and store encoded files in S3;
      • AWS services used: EC2, Elastic Block Store (EBS), S3, SimpleDB, Elastic Load Balancing (ELB), Elastic MapReduce (EMR), Simple Queue Service (SQS), etc.
    • Page 50 HD CLOUD: VIDEO TRANSCODING IN THE CLOUD
      • Diversion Media's SaaS, used through a web-based GUI and RESTful APIs;
      • Uses Amazon's Elastic Compute Cloud (EC2) service;
      • HD Cloud uses 3 main concepts in its workflow: Stores, Profiles, Jobs;
        • Stores can be FTP servers, a CDN, or an Amazon S3 account;
        • Set up Profiles to do multi-bitrate and multi-dimension encodes all at once;
        • Use the Jobs pages to batch ingest multiple master files all at once;
      • RESTful API supports integration with VMS providers, file transfer accelerators, CDNs, security vendors, and legacy transcoding systems.
    • Page 53 PANDASTREAM: VIDEO ENCODING
      • Elegant REST API makes integration with a web app easy for listing, creating, editing and deleting videos;
      • Runs completely within Amazon's Web Services, utilising EC2, S3 and SimpleDB;
      • Streams video to users cost-effectively through CloudFront (CDN);
      • Outputs HD video in H.264 for multiple players and devices by using flexible encoding profiles with full control of FFmpeg;
      • Panda gem for painless integration with Ruby on Rails.
    • Page 54 ANKODER: VIDEO ENCODING ON-DEMAND
      • A web service built on top of Amazon's SQS, S3 and EC2;
      • Encode on demand: automatic scaling to keep the queuing time within 10 minutes under high load;
      • Flexible Recipe workflow to customize transcoding tasks and convert to multiple formats in a single request;
      • REST API adapts to any custom workflow;
      • Transparent upload and external storage.
    • Page 55 KEY ISSUES IN CLOUD-BASED TRANSCODING  Load balancing;  Auto-scaling;  Monitoring;  Fault-tolerance;  Job scheduling;  Automatic workflow cloud management.
    • Page 56 What is Job Scheduling?  A job scheduler is a program that enables an enterprise to schedule and, in some cases, monitor computer "batch" jobs (units of work, such as the running of a payroll program).  A job scheduler can initiate and manage jobs automatically by processing prepared job control language statements or through equivalent interaction with a human operator.  Functions:  Avoid starvation;  Maximize throughput;  Minimize response time;  Optimal use of resources.
    • Page 57 Job Scheduling in Hadoop
      • A pluggable component from Hadoop version 0.17/0.18/0.19;
        • Default scheduler: FIFO;
        • Fair Scheduler: built by Facebook;
        • Capacity Scheduler: Yahoo!'s scheduler;
        • Dynamic Priority Scheduler: proposed by users in 2008.
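
The difference between FIFO and fair scheduling can be sketched with a toy ordering of queued jobs. This is a deliberate simplification: Hadoop's Fair Scheduler shares cluster capacity among pools over time, whereas `fair_order` here merely round-robins over per-user queues. Both function names are hypothetical.

```python
from collections import deque


def fifo_order(jobs):
    """FIFO: run jobs strictly in submission order. jobs is a list of (user, name)."""
    return [name for _, name in jobs]


def fair_order(jobs):
    """Round-robin over per-user queues, a simplified stand-in for fair sharing."""
    queues = {}
    for user, name in jobs:
        queues.setdefault(user, deque()).append(name)
    order = []
    while queues:
        for user in list(queues):  # copy the keys so entries can be deleted
            order.append(queues[user].popleft())
            if not queues[user]:
                del queues[user]
    return order
```

With three jobs from one user followed by one job from a second user, FIFO makes the second user wait behind all three. The round-robin order runs the second user's job right after the first user's first job, which is the starvation-avoidance goal of a job scheduler.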
    • Page 58 What is Load Balancing?
      • Load balancing is a method to distribute workload across multiple servers, network interfaces, hard drives, or other computing resources;
      • A load balancer provides the means by which instances of applications can be provisioned and de-provisioned automatically, without requiring changes to the network or its configuration;
      • Capacity: determine the maximum connection rate that the various solutions are capable of supporting;
      • Failover: continuation of the service after a failure.
    • Page 59 Load Balancing Technologies
      • Nginx: can run either as a web server or as a load balancer and reverse proxy in front of Apache web servers;
      • HAProxy: an open-source software application that provides high-availability and load balancing features: http://haproxy.1wt.eu/;
      • AWS Elastic Load Balancer (ELB): distributes incoming traffic among multiple AWS instances (much like HAProxy); it can span Availability Zones (AZ) and distribute traffic to different AZs;
      • Zeus Technologies' Load Balancer (a subset of Traffic Manager);
      • Citrix's NetScaler: web application delivery and load balancer.
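
Two common balancing policies can be sketched in a few lines. The classes below are illustrative only and do not reflect how Nginx, HAProxy or ELB are implemented. The backend names are placeholders, and health checks and failover are left out.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Send each new request to the backend with the fewest active connections."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def acquire(self):
        backend = min(self.active, key=self.active.get)  # ties: first listed wins
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1
```

Round robin suits requests of similar cost. Least connections adapts better when request durations vary widely, as they do for transcoding jobs of different lengths.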
    • Page 60 What is Auto-Scaling?
      • Auto-scaling: the system scales up/down when the load increases/decreases; the ability to handle an increasing amount of work gracefully;
      • Vertical scalability (scaling up): maintain performance levels as concurrent requests increase;
      • Horizontal scalability (scaling out): meet demand through replication across a pool of servers;
      • Dimensions:
        • Load: handling increasing load by adding resources;
        • Geographic: maintaining performance in geographically distributed systems;
        • Functional: adding new features using minimum effort.
    • Page 61 Auto-Scaling Techniques
      • Amazon's CloudWatch: EC2 (CPU, disk/network I/O), ELB;
      • RightScale: using Amazon SQS and S3;
        • RightGrid (a back-end batch processing framework);
        • Alert Escalations (a front-end, alert-based Server Array for horizontal scaling);
      • Scalr: using Amazon EC2, EBS and S3 (open source);
      • Lifeguard: Amazon S3, EC2 and SQS;
      • Kaavo: IMOD.
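
A threshold-based horizontal scaling rule of the kind driven by CPU metrics can be sketched as below. The function `desired_capacity`, its thresholds and its step size are assumptions for illustration and do not correspond to any particular product's API or defaults.

```python
def desired_capacity(current, avg_cpu, *, scale_out_at=70.0, scale_in_at=30.0,
                     min_size=1, max_size=10, step=1):
    """Return the new instance count for an average CPU utilization in percent."""
    if avg_cpu > scale_out_at:
        target = current + step   # load is high: scale out
    elif avg_cpu < scale_in_at:
        target = current - step   # load is low: scale in
    else:
        target = current          # inside the dead band: do nothing
    return max(min_size, min(max_size, target))
```

The gap between the two thresholds acts as a dead band that keeps the group from oscillating, and the `min_size`/`max_size` clamp bounds both cost and availability.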
    • Page 62 What is Workflow Management?
      • A workflow is a loosely coupled parallel application that consists of a set of computational tasks linked via data-flow and control-flow dependencies;
      • It defines how tasks are structured, who performs them, what their relative order is, how they are synchronized, how information flows to support the tasks, and how tasks are tracked;
      • An activity is a discrete step in a business process (workflow); activities are orchestrated together in a workflow;
      • "Service choreography": description of the coordination between two or more parties;
      • "Service orchestration": the business process is modeled using workflows.
    • Page 63 Workflow Software for Cloud Environ.
      • Yahoo's Oozie workflow engine: provides workflow management and a coordination engine to manage jobs running on Hadoop;
      • Kaavo's IMOD (SaaS): application-centric management of cloud resources, working on EC2, Eucalyptus, Rackspace;
      • Amazon Simple Workflow (SWF): task coordination and state management for cloud apps;
      • LinkedIn Azkaban: a workflow scheduler that allows the independent pieces to be declaratively assembled into a single workflow;
      • Pegasus WMS (USC): used for Amazon EC2, S3 and Eucalyptus;
      • Cascading: a data processing API for defining and executing workflows;
      • DAGMan: part of the Condor project (workload management by UW-Madison); represents a collection of job dependencies as a directed acyclic graph.
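
A workflow expressed as a directed acyclic graph of task dependencies can be sketched with Python's standard `graphlib` module (Python 3.9+). The task names in `WORKFLOW` are a hypothetical transcoding pipeline, not taken from any of the tools listed above.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
WORKFLOW = {
    "split": {"ingest"},
    "transcode_chunk_0": {"split"},
    "transcode_chunk_1": {"split"},
    "merge": {"transcode_chunk_0", "transcode_chunk_1"},
    "publish": {"merge"},
}


def execution_order(workflow):
    """Return one valid sequential order that respects every dependency."""
    return list(TopologicalSorter(workflow).static_order())
```

Any order returned starts with `ingest` and ends with `publish`. The two chunk tasks have no dependency on each other, which is exactly what lets a workflow engine run them in parallel on separate instances.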