The Actor model of concurrent computation discretizes a problem into a series of independent units or actors that interact only through the exchange of messages. Without direct coupling between individual components, an Actor-based system is inherently concurrent and fault-tolerant. These traits lend themselves to so-called “Big Data” applications in which the volume of data to analyze requires a distributed multi-system design. For a practical demonstration of the Actor computational model, a system was developed to assist with the automated analysis of Nondestructive Evaluation (NDE) datasets using the open source Myriad Data Reduction Framework. A machine learning model trained to detect damage in two-dimensional slices of C-Scan data was deployed in a streaming data processing pipeline. To demonstrate the flexibility of the Actor model, the pipeline was deployed on a local system and re-deployed as a distributed system without recompiling, reconfiguring, or restarting the running application.
Agenda
• Distributed Processing Architectures
• Actor Model
• Defect Detection Algorithm
• Sample Results
• Q & A

Introduction
Myriad Desktop UI
Emphysic Actor Model for NDE Analysis
Architecture
Comparing Distributed Processing Models
• Apache Spark – Batch Processing Model (a.k.a. Map-Reduce)
• Apache Storm – Stream Processing Model
• Akka – Actor Processing Model
Benefits of Actor Model
• Lightweight
  • 1 actor ~ 300 bytes in RAM
• Fault-tolerant
  • “Let it crash”
• Configurable
• Understandable
Architecture
Defect Detection Structure – Overview
• Actor-based “pipeline parallelism” structure
• Algorithm is divided into a series of concurrent stages
• Each stage consists of a central routing Actor, one or more worker Actors, and a work queue
• Output of one stage is the input to the subsequent stage
• Pyramid Actor blurs and subsamples the data, sending each step to a Window Actor
• Window Actor sends each window to a Defect Scanner Actor
• Defect Scanner Actor sends its results to the Reporter Actor
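The stage structure above can be sketched with plain threads and queues. This is an illustrative Python sketch, not Myriad's or Akka's actual API: the shared queue plays the role of the routing Actor, handing work items to whichever worker is idle, and every result lands in the next stage's queue.

```python
import queue
import threading

def stage(in_q, out_q, work, n_workers):
    """One pipeline stage: a shared work queue feeding several worker
    threads, with every result pushed to the next stage's queue."""
    def worker():
        while True:
            item = in_q.get()
            if item is None:        # shutdown signal
                in_q.put(None)      # re-queue so sibling workers stop too
                break
            out_q.put(work(item))
    for _ in range(n_workers):
        threading.Thread(target=worker, daemon=True).start()

# Two chained stages: scale each data point, then offset it
q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
stage(q_in, q_mid, lambda x: 2 * x, n_workers=3)
stage(q_mid, q_out, lambda x: x + 1, n_workers=2)
for i in range(5):
    q_in.put(i)
q_in.put(None)
results = sorted(q_out.get() for _ in range(5))
```

Because each stage only touches its own input and output queues, adding workers to a stage or relocating a stage does not change the rest of the pipeline.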
Algorithms & Components
Pyramid Actor – Pyramid algorithm
• Blur
  • Convolve with a blur kernel, usually Box or Gaussian
  • Gaussian is usually approximated with 3 Box filter passes
  • Must account for edges
• Subsample
  • Also known as down-sampling or decimation
  • Take every nth element
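A minimal sketch of the pyramid step, using 1-D data for brevity (illustrative Python, not the Pyramid Actor's actual code): three box-filter passes approximate a Gaussian blur, edges are handled by clamping the filter window to the signal, and subsampling keeps every nth element.

```python
def box_blur(data, radius=1):
    """One pass of a 1-D box filter; the window is clamped at the edges."""
    n = len(data)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = data[lo:hi]
        out.append(sum(window) / len(window))
    return out

def blur(data, passes=3):
    """Three box passes approximate a Gaussian blur."""
    for _ in range(passes):
        data = box_blur(data)
    return data

def subsample(data, step=2):
    """Decimation: keep every step-th element."""
    return data[::step]

def pyramid(data, levels=3):
    """Blur then subsample repeatedly, collecting each level."""
    out = [data]
    for _ in range(levels - 1):
        data = subsample(blur(data))
        out.append(data)
    return out
```

Each level halves the data size, which is exactly the 80×80 → 40×40 → 20×20 → 10×10 progression shown on the Gaussian Pyramid slide (there in two dimensions).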
Algorithms & Components
Gaussian Pyramid
[Figure: four pyramid steps – Step 1: 80×80, Step 2: 40×40, Step 3: 20×20, Step 4: 10×10]
Algorithms & Components
Window Actor – Sliding Window Algorithm
• Scan across each dataset
• Each window is scanned independently for defect signals
• Tradeoffs:
  • Speed of scan is affected by the size of the input data, the size of the window, and the amount of overlap (smaller step size)
  • A smaller step size makes it more likely to detect an ROI, but also more likely to find the same ROI multiple times
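The sliding window extraction can be sketched as follows (illustrative Python, not the Window Actor's actual code), assuming the 2-D data is stored row-major in a flat list:

```python
def sliding_windows(data, rows, cols, win=3, step=1):
    """Yield (row, col, patch) for every win x win window of a 2-D
    grid stored row-major in a flat list. A smaller step means more
    overlap between successive windows."""
    for r in range(0, rows - win + 1, step):
        for c in range(0, cols - win + 1, step):
            patch = [data[(r + i) * cols + (c + j)]
                     for i in range(win) for j in range(win)]
            yield r, c, patch
```

The tradeoff from the slide falls straight out of the loop bounds: halving `step` roughly quadruples the number of windows scanned, increasing both the chance of catching an ROI and the chance of reporting it more than once.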
Algorithms & Components
Defect Scan Actor – Defect scanner interface
• Simple interface – get data, return True if defect found
• Bundle an online learning algorithm with an (optional) preprocessor into a single small (~10 kB) binary package, or
• Parallelize existing algorithms
  • No need to port to Java: external code (Python, MATLAB, C++, etc.) can be invoked with a system call
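The "get data, return True if defect found" contract might look like the following sketch. Both class names are hypothetical; `ThresholdScanner` is a toy stand-in for the trained model, flagging any amplitude spike above a threshold:

```python
from abc import ABC, abstractmethod

class DefectScanner(ABC):
    """Hypothetical sketch of the scanner contract from the slide:
    receive a window of data, answer True if a defect is present."""

    @abstractmethod
    def scan(self, window):
        """Return True if a defect signal is found in `window`."""

class ThresholdScanner(DefectScanner):
    """Toy stand-in for the ML model: flags any amplitude spike."""
    def __init__(self, threshold):
        self.threshold = threshold

    def scan(self, window):
        return max(window) > self.threshold
```

Because the contract is just data in, boolean out, the implementation behind `scan` can equally be an online learner, a ported legacy algorithm, or a system call into Python, MATLAB, or C++ code.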
Algorithms & Components
Reporter Actor – Reporting ROI Results
• Compiles the results of defect scanning
• Every stage in the process adds metadata to the message:
  • Data ingestion – data source
  • Pyramid – scaling factor
  • Sliding Window – position within the scaled data
  • Defect detection – ROI found
• Metadata allows the Reporting stage to locate each ROI relative to the original input
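The coordinate bookkeeping the metadata enables can be sketched in a few lines (field names are illustrative, not Myriad's actual metadata schema): the sliding-window position locates the ROI within the scaled data, and the pyramid's scaling factor maps that back to the original input.

```python
def to_original_coords(roi_row, roi_col, window_row, window_col, scale):
    """Map an ROI found inside a scaled window back to the original
    input: offset by the window's position within the scaled data,
    then undo the pyramid's subsampling factor (e.g. scale=2 for the
    40x40 level of an 80x80 input)."""
    return ((window_row + roi_row) * scale,
            (window_col + roi_col) * scale)
```
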
Demonstration
Using A Model
• Training data
  • 2-D slices of ultrasonic C-Scan data
  • 15×15 elements
• Model
  • Passive Aggressive learning algorithm
  • Sobel edge detection preprocessing
• Pipeline – 423 workers
  • 1 Ingestor
  • 2 Scalers
  • 4 Pre-processors
  • 128 Sliding Windows
  • 256 Defect Scanners
  • 32 Reporters
• Sample Input
  • 33 separate data files (CSV, JPEG, TIFF, etc.)
  • 60 million data points
• Single System Single Process (SSSP)
  • Eight cores, 32 GB RAM
  • 1 process
• Single System Multiprocess (SSMP)
  • Eight cores, 32 GB RAM
  • 184 processes
• Multisystem Multiprocess (MSMP)
  • Eight cores, 32 GB RAM local
  • Eight cores, 32 GB RAM remote (Azure VM)
  • 88 local processes, 128 remote (216 total)
Sample Throughputs

Trial Number                  SSSP      SSMP      MSMP
1                             302.66    106.69    107.28
2                             299.16     99.43    106.94
3                             297.00    111.87    106.11
4                             303.22    110.20    106.05
5                             299.39    103.83    106.13
Mean Processing Time [s]      300.28    106.40    106.50
Mean Throughput [Points/s]    2.07E+05  5.87E+05  5.85E+05
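The mean processing times in the table can be reproduced directly from the per-trial numbers:

```python
# Per-trial processing times in seconds, copied from the table above
trials = {
    "SSSP": [302.66, 299.16, 297.00, 303.22, 299.39],
    "SSMP": [106.69, 99.43, 111.87, 110.20, 103.83],
    "MSMP": [107.28, 106.94, 106.11, 106.05, 106.13],
}
means = {k: sum(v) / len(v) for k, v in trials.items()}
```
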
When designing a distributed processing system there are two primary models you’ll encounter. Batch processing is typically used for “slow” data: you already have a large amount of data and/or it’s acceptable for processing to take hours. Stream processing is for “fast” data, where the data arrives continuously and/or you need to analyze it in or near real time. Actor processing is a third model that’s not as well known as the first two, but it is used in Google’s Go programming language, in the Erlang programming language, and, if you dig deep, inside Spark’s own structure. Actors are independent entities that neither know nor care about the rest of the system; they interact only through their mailboxes.
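The mailbox idea above can be reduced to a few lines. This is a minimal Python sketch under stated assumptions (one thread per actor, `None` as a poison-pill shutdown message), not Akka's or Myriad's implementation:

```python
import queue
import threading

class Actor:
    """Minimal actor sketch: a private mailbox drained by one thread.
    Other code interacts with the actor only by sending it messages."""

    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, msg):
        self.mailbox.put(msg)

    def join(self):
        self.mailbox.put(None)   # poison pill ends the message loop
        self._thread.join()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:
                break
            self.handler(msg)

# Usage: an actor that just records what it receives
log = []
logger = Actor(log.append)
logger.send("scan complete")
logger.join()
```

Because all interaction goes through `send`, the caller never needs to know whether the handler runs on this thread, another process, or another machine, which is what makes the model so easy to redistribute.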
Often when considering a distributed processing structure you’ll have a mental model of your application and how you envision processing the data. Batch and stream processing tend to force you to adapt your mental model to their mode of operation, while as a lower-level architecture the Actor model is more easily adapted to fit your approach.
Each stage runs simultaneously and is itself concurrent. Stages can be on the same or different systems – as long as a stage is reachable at a URL it can be anywhere and can be moved dynamically (i.e. without recompiling or even restarting the running application).
The Gaussian Pyramid stage provides scale invariance to our defect detection. If a flaw signal were much larger than the area the defect detection algorithm scans, for example, it might not be detected without considering the input data at multiple scales.
The Sliding Window stage extracts subsets or “windows” of data from the output of the Gaussian Pyramid stage. The size of the window is determined by the size of the input expected by the defect detection algorithm.
Although in the present work we use machine learning algorithms, any algorithm in any language can be built into a distributed data processing pipeline.
Each stage of our detection pipeline not only sends output data to the subsequent stage but metadata as well. The result is by the time we get to the end of the pipeline, we have sufficient information about what’s happened to the data that we can visually indicate detected anomalies on the original input.
In this video, a mid-range 8-core desktop is used to spin up more than 400 Actors to build a local processing pipeline. We can build an ad-hoc distributed pipeline by updating the URL of one or more stages to point at a remote system. We see that ROI are often detected several times, which is normal in machine vision applications – here it’s because we’re seeing the same ROI multiple times as we resize and raster across the input. Later builds of the desktop tool reduce this visual clutter (union of bounding boxes, non-maximum suppression techniques, confidence thresholds, etc.).
Each of the applications uses the algorithm we’ve outlined and only differs in the number of processes and systems in the pipeline. As expected multiple processes are able to process our sample dataset much faster than a single process.
At first glance the chart on the left doesn’t provide much support for a distributed architecture, since the distributed pipeline achieves nearly the same data throughput as the single-system version. In a processing pipeline, however, it’s not just data points per second we’re interested in; we also care about the system’s capacity to gracefully handle bursts or long-term increases in data. One way to gauge capacity is to measure resource usage during processing, and as the chart on the right shows, the single-system multiprocess application is using virtually all available resources while the distributed architecture is not. This suggests that the single system is likely already running at peak capacity and would not be able to handle an increase in data.
When the sample input is doubled, both the distributed and the single-process systems handle the increase gracefully (doubled input leads to roughly doubled processing time). In contrast, the single-system multiprocess configuration, which was already using 100% of the CPU, could not absorb the increase and had not completed processing after more than 6 hours of runtime.