"In the oil and gas industry, the effective use of vast amounts of data has long been recognized as important to operational performance. Measuring key performance indicators is routine in well construction, but systematically analyzing performance against a large data bank of offset wells with statistical methods is not common practice. Performing that statistical analysis in real time is even less common. With the adoption of distributed computing platforms such as Apache Spark, new opportunities become available to use large-scale time-series data sets to optimize performance. Two case studies are presented in this talk: the rate of penetration (ROP) and the amount of vibration per run.
By collecting real-time telemetry data and comparing it with historical sample data sets within the Databricks Unified Analytics Platform, the optimization team was able to quickly determine, with statistical confidence, whether the performance being delivered matched or exceeded past performance. This is especially important when trying new techniques on highly variable data. Replacing anecdotal evidence with statistical analysis makes decision making more precise and better informed. In this talk we'll share how we accomplished this and the lessons learned along the way."
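
As a rough illustration of the kind of comparison described above, the sketch below checks whether a current run's mean ROP exceeds that of historical offset runs using a one-sided permutation test. The abstract does not say which statistical test the team used, so the choice of a permutation test is an assumption, as are the function name `permutation_test_greater` and the sample values, which are invented for illustration and are not real well data. It uses only the Python standard library; on Spark the same logic would be applied to aggregated per-run samples.

```python
import random


def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_test_greater(current, historic, n_resamples=10_000, seed=0):
    """One-sided permutation test: is mean(current) greater than mean(historic)?

    Returns the observed difference in means and an approximate p-value.
    (Hypothetical helper for illustration, not the method from the talk.)
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = _mean(current) - _mean(historic)
    pooled = list(current) + list(historic)
    n_current = len(current)

    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = _mean(pooled[:n_current]) - _mean(pooled[n_current:])
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimated p-value above zero.
    p_value = (at_least_as_extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical ROP samples in ft/hr, invented for illustration only.
historic_rop = [82.0, 75.5, 90.1, 68.3, 79.9, 85.2, 73.4, 88.0, 77.6, 81.3]
current_rop = [95.4, 101.2, 89.7, 98.8, 104.5, 92.3]

diff, p = permutation_test_greater(current_rop, historic_rop)
print(f"difference in mean ROP: {diff:.2f} ft/hr, p-value: {p:.4f}")
```

A permutation test makes no assumption about the shape of the underlying distribution, which is one reason it might suit highly variable drilling data; a small p-value suggests the improvement is unlikely to be due to chance alone.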