Ultimately, in Reverse Time Seismic Migration (RTM), the coherence between two wavefields is determined across all depth-common gathers (i.e., source-receiver pairings) of seismic-reflection data. Because coherence between the two wavefields minimizes the impact of artifacts in the imaged section (or volume) arising from complex geological structures (e.g., folds, faults, domes, steeply dipping lithological interfaces), seismic-reflection data processed via RTM most accurately depicts reflectors in their actual locations in space and time (e.g., Zhou, Practical Seismic Data Analysis, Cambridge University Press, 2014).
In the classical approach to RTM, forward modeling of the three-dimensional wave equation (3D-WEM) produces source wavefields that are computed using the Finite Difference Method (FDM) and then stored to disk. In a subsequent step, and on a per-gather basis, the source wavefields are read from disk so that they can be cross-correlated with the backward-propagated (i.e., time-reversed) wavefields corresponding to the receivers - a step that again requires the FDM modeling kernel for the 3D-WEM. The inherent requirement for disk I/O involving multi-terabyte volumes of seismic-reflection data during the application of the imaging condition (i.e., the cross-correlation step) results in a performance penalty well known to be highly problematic throughout the petroleum-exploration industry.
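The cross-correlation step referred to above is conventionally the zero-lag imaging condition: at each subsurface point, the forward-propagated source wavefield and the back-propagated receiver wavefield are multiplied pointwise and summed over time steps. A minimal NumPy sketch (array names and grid sizes are illustrative only, not taken from any particular RTM code):

```python
import numpy as np

def imaging_condition(src_wavefield, rcv_wavefield):
    """Zero-lag cross-correlation imaging condition.

    src_wavefield, rcv_wavefield: arrays of shape (nt, nz, nx),
    the forward-propagated source wavefield and the time-reversed
    receiver wavefield, sampled at the same nt time steps.
    Returns a partial image of shape (nz, nx) for one gather.
    """
    assert src_wavefield.shape == rcv_wavefield.shape
    # Pointwise product summed over the time axis (zero temporal lag).
    return np.sum(src_wavefield * rcv_wavefield, axis=0)

# Toy example on a tiny grid: 4 time steps, 2 x 3 spatial samples.
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 2, 3))
R = rng.standard_normal((4, 2, 3))
image = imaging_condition(S, R)
print(image.shape)  # (2, 3)
```

In production the per-gather partial images are accumulated into the final section or volume; it is the repeated retrieval of `src_wavefield` from disk, once per gather, that dominates the cost discussed above.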
Over the past decade or so, General Purpose Graphics Processing Units (GPGPUs) have been employed to significantly reduce the burden of disk I/O in executing RTM. Broadly speaking, in applying RTM’s imaging condition, algorithms have made effective and efficient use of both the memory hierarchy and the parallel-processing capabilities inherent in GPGPUs. Despite the progress that has been made, particularly in the implementation of algorithms using CUDA for programming GPGPUs, the computational performance of RTM remains an active area of research that continues to engage academia as well as industry.
The need to cross-correlate two wavefields in the application of RTM’s imaging condition remains one of two fundamental challenges with the use of the method in practice (e.g., Liu et al., Computers & Geosciences 59, 17–23, 2013). In a significant departure from previous approaches, this computational challenge is addressed here through the introduction of Resilient Distributed Datasets (RDDs) for RTM’s precomputed source wavefields. RDDs are a relatively recent abstraction for in-memory computing ideally suited to distributed computing environments such as clusters (Zaharia et al., NSDI 2012, http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf). RDDs were originally introduced for Big Data Analytics and popularized (e.g., Lumb, “8 Reasons Apache Spark is So Hot”, insideBIGDATA, http://insidebigdata.com/2015/03/06/8-reasons-apache-spark-hot/, 2015) through the open-source implementation known as Apache Spark.
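The essence of the RDD-based approach is that the precomputed source wavefields are persisted in memory, keyed by gather, and reused during the imaging condition instead of being re-read from disk. The following plain-Python sketch illustrates that idea in a single process; in the actual approach the cache would be a Spark RDD partitioned and persisted across a cluster. All function names, seeds, and grid sizes here are hypothetical stand-ins, not part of any published RTM implementation:

```python
import numpy as np

# Hypothetical stand-ins for the two FDM modeling passes: forward
# modeling of the source wavefield and back-propagation of the
# receiver wavefield, both returning (nt, nz, nx) arrays per gather.
def forward_model(gather_id, nt=4, nz=2, nx=3):
    rng = np.random.default_rng(gather_id)
    return rng.standard_normal((nt, nz, nx))

def backpropagate(gather_id, nt=4, nz=2, nx=3):
    rng = np.random.default_rng(1000 + gather_id)
    return rng.standard_normal((nt, nz, nx))

gather_ids = [0, 1, 2]

# "Persist" the source wavefields in memory, keyed by gather id,
# instead of writing them to disk (the role played by a cached RDD).
source_cache = {g: forward_model(g) for g in gather_ids}

# Apply the zero-lag imaging condition per gather against the cached
# source wavefield, then accumulate the partial images (the reduce
# step in the distributed setting).
image = sum(
    np.sum(source_cache[g] * backpropagate(g), axis=0)
    for g in gather_ids
)
print(image.shape)  # (2, 3)
```

The dictionary here plays the role of Spark's `persist()`/`cache()` on an RDD of `(gather_id, wavefield)` pairs: the expensive forward-modeled wavefields are computed once and retrieved at memory speed during cross-correlation.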