UNCERTAINTY QUANTIFICATION OF
        GEOSCIENCE PREDICTION MODELS
               BASED ON SUPPORT VECTOR
                ...
Outline

• Geoscience modelling under uncertainty
• Machine learning based geomodels
• Semi-supervised SVR reservoir model...
Outline

• Geoscience modelling under uncertainty
• Machine learning based geomodels
• Semi-supervised SVR reservoir model...
Uncertainty Quantification (UQ) Framework
                Natural System                                                  ...
Adaptive Stochastic Optimisation for UQ
Sampling
prior                                   iteration
distribution
          ...
Search for Matching Models Challenge
• FW simulation of multiple models generated
 for different combinations of parameter...
UQ Framework with fast ML approximation
                                                                                  ...
Challenges in Geomodelling

• Improve representation of the reality with
  geologically realistic models based on identifi...
Aims
Uncertainty quantification with a geomodel
which is able to improve geological realism
by more effective use of prior...
Outline

• Geoscience modelling under uncertainty
• Machine learning based geomodels
• Semi-supervised SVR reservoir model...
Support Vector Regression (SVR)
• Linear regression in hyperspace                                     L
                  ...
Semi-supervised Learning Concept

 • Supervised learning with a tutor
    – Learn from known input and output
      (e.g. ...
Kernel Methods on Geo-manifolds
 • Data-driven models incorporate prior knowledge on the domain
   of the problem using gr...
Semi-supervised Approach
• Manifold assumption: data actually lie on the
  low-dimensional manifold in the input space
• G...
Sources of Geo-manifold for Reservoir Models

 Geo-manifold for reservoir model can be elicited
 from prior information:
 ...
Semi-supervised SVR Geomodel

                                     Prior information



                                  ...
Outline

• Geoscience modelling under uncertainty
• Machine learning based geomodels
• Semi-supervised SVR reservoir model...
Case Study
Stanford VI: a realistic synthetic reservoir data set
• Fluvial clastic reservoir:
        - sinuous channels
 ...
Variability in Facies Modelling
                 Multi-point simulation realisations




Training Image   Hard well data  ...
Case Study
2D layer slices from different geological section:              porosity truth case
    • sinuous channels
    ...
Stochastic Sampling for Matched Models
• 640 models generated in 8D parameter space
• 40 good fitting models with misfit <...
Fitted Model: Property Distribution
Realistic reproduction of geological structures detected from the prior data:
– fluvia...
Fitted Model Forecast: Fluvial Channels case
Oil and water
production from
7 largest producing
wells:
● History data
  (tr...
Variability of Uncertain Model Properties
• Correlation
  - kernel size σ                                  σ              ...
Non-uniqueness of Semi-supervised SVR
Stochastic realisations, based on geo-manifolds generated with
different random seed...
Impact of Noise in Seismic Data
Original seismic data   with injected noise N(0,σ)     ● unlabelled data




             ...
Production: Stochastic Realisations
Realisations of a single
fitted model with unique
set of parameters
Oil production pro...
Multiple matching models vs Truth case porosity

    Multiple good fitting φ models                               Truth ca...
Fitted Model Forecast: Delta Front case
Oil and water
production from
7 largest producing
wells:

● History data
  (truth ...
Fitted Model Forecast: Delta Front case
Oil production from
7 largest producing
wells:

● History data
  (truth case + noi...
Forecast with Uncertainty
Confidence P10/P90 interval for
production forecast based on
multiple models:
Total oil and wate...
Uncertainty of Model Parameters
Posterior
probability
distribution of the
geomodel
parameters:
• Kernel width
 – correlati...
Outline

• Geoscience modelling under uncertainty
• Machine learning based geomodels
• Semi-supervised SVR reservoir model...
Conclusions
• A novel learning based model of petroleum reservoir based on
  capturing complex dependencies from data.
•  ...
Further work
• Extension to 3D case by adding one more input to the SVR model
• Integrate other relevant data from outcrop...
Acknowledgments

• J. Caers and S. Castro of Stanford University for providing Stanford
  VI case study
• UK EPSRC grant (...
Research Summary

 • Developed a novel model for petroleum reservoir based on capturing
   complex dependencies from data ...
Multiple good fitting φ models

Labelled (●) & unlabelled (+) data   Seismic data
                                        ...
Next Steps

• Production uncertainty forecasting based on the inference of the
  generated HM models.
• Extension to 3D ca...
Aims

Uncertainty quantification with a geomodel
which is able to improve geological realism
by more effective use of prio...
Content

• Motivation and Aims
• Semi-supervised learning concept
  – Support Vector Machine (SVM) recap
• Machine learnin...
Impact of Noise in Seismic Data
   In a real case additional data (seismic) are usually noisier
     than in our synthetic...
Seismic Data Polluted with Noise
Gaussian noise with zero mean and 3 different std.dev σ is added.




   N(0, σ)         ...
Filtering
Only a low frequency component is left after filtering




 N(0, σ)                     N(0, 2σ)                ...
Geo-manifold
Unlabelled points are generated only in the cells below the threshold




  N(0, σ)                      N(0,...
Porosity SVR Estimates for Noisy Data
Noise level: 1 σ        Noise level: 2 σ          Noise level: 3 σ




Geo-manifold ...
Prediction with a Large Noise Level
Noise level: 3σ




Even with large noise levels the channel
continuity can be traced ...
Impact of Inherent Non-uniqueness
Stochastic realisations
of water production
from 6 largest
producing wells
NA Sampling: Misfit Distribution
Misfit of models
generated by NA


Lowest misfit = 188
NA Sampling: Parameter Distributions
Histogram of
parameter values
for the generated
models


Models generated
by NA home ...
Support Vector Machine (SVM)

Linear separation problem
                              1       αi = 0        Normal Samples...
10,00 Modelling and analysis of geophysical data using geostatistics and machine learning Vasily Demyanov – Heriot–Watt Institute, Edinburgh (U.K.)

Intelligent Analysis of Environmental Data (S4 ENVISA Workshop 2009)

Published in: Technology, Education
1 Comment
3 Likes
Statistics
Notes
  • Outstanding display. Really clear together with helpful
    Sharika
    http://winkhealth.com http://financewink.com
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total views
2,376
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
153
Comments
1
Likes
3
Embeds 0
No embeds

No notes for slide


  1. 1. UNCERTAINTY QUANTIFICATION OF GEOSCIENCE PREDICTION MODELS BASED ON SUPPORT VECTOR REGRESSION. V. Demyanov (1), A. Pozdnoukhov (2), M. Kanevski (3), M. Christie (1). (1) Institute of Petroleum Engineering, Heriot-Watt University, Edinburgh, UK, vasily.demyanov@pet.hw.ac.uk; (2) National Centre for Geocomputation, National University of Ireland, Maynooth; (3) Institute of Geomatics and Risk Analysis, University of Lausanne
  2. 2. Outline • Geoscience modelling under uncertainty • Machine learning based geomodels • Semi-supervised SVR reservoir model – Case study – Robustness to noise – Predictions with uncertainty • Conclusions
  3. 3. Outline • Geoscience modelling under uncertainty • Machine learning based geomodels • Semi-supervised SVR reservoir model – Case study – Robustness to noise – Predictions with uncertainty • Conclusions
  4. 4. Uncertainty Quantification (UQ) Framework [Diagram; box labels: Natural System; Observed Data; Mathematical Model (parameters, pde); Computer Simulation (discretisation, timestep), computationally expensive; Simulated vs Data; MISMATCH; Model parameters; Forecast Uncertainty. The inset plots have the horizontal axis "time (days)".]
  5. 5. Adaptive Stochastic Optimisation for UQ Sampling prior iteration distribution Evaluation: Model 1 Model 2 Model New Model 3 Ranking Reproduction simulation population …………  Mismatch Model n calculation Ensemble of Models Sampling algorithms: • Genetic algorithms • Particle swarm optimisation • Ant Colony optimisation Inferred Ensemble of • Neighbourhood Models for prediction Inference approximation
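The sample, evaluate, rank and reproduce loop on this slide can be sketched in a few lines. This is a minimal illustration only, not any of the listed algorithms: the sum-of-squares misfit, the uniform prior, the Gaussian perturbation of the best models, and every name (`adaptive_sampling`, `toy_simulator`, `n_keep`, and so on) are assumptions made for the example.

```python
import random


def misfit(params, observed, simulate):
    """Sum-of-squares mismatch between simulated and observed data (assumed form)."""
    simulated = simulate(params)
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def adaptive_sampling(simulate, observed, bounds, n_init=20, n_iter=10,
                      n_keep=5, n_new=10, seed=0):
    """Return all evaluated models as (misfit, params), sorted by misfit."""
    rng = random.Random(seed)
    # Initial population: uniform prior inside the parameter bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_init)]
    ensemble = [(misfit(p, observed, simulate), p) for p in population]
    for _ in range(n_iter):
        # Ranking: keep the currently best-fitting models as parents.
        ensemble.sort(key=lambda item: item[0])
        parents = [p for _, p in ensemble[:n_keep]]
        # Reproduction: perturb a random parent and clip to the bounds.
        for _ in range(n_new):
            parent = rng.choice(parents)
            child = []
            for value, (lo, hi) in zip(parent, bounds):
                step = 0.1 * (hi - lo)
                child.append(min(hi, max(lo, value + rng.gauss(0.0, step))))
            ensemble.append((misfit(child, observed, simulate), child))
    ensemble.sort(key=lambda item: item[0])
    return ensemble


def toy_simulator(params):
    """Hypothetical stand-in for the expensive forward simulation."""
    slope, intercept = params
    return [slope * t + intercept for t in range(5)]


observed = toy_simulator([2.0, 1.0])
ensemble = adaptive_sampling(toy_simulator, observed, bounds=[(0.0, 5.0), (0.0, 5.0)])
best_misfit, best_params = ensemble[0]
```

Every evaluated model is kept in the ensemble rather than only the best one, because the inference step needs the whole population, not a single optimum.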
  6. 6. Search for Matching Models Challenge • FW simulation of multiple models generated for different combinations of parameter values is computationally expensive • High-dimensional parameter space remains fairly empty and poorly described despite thousands of generated models Number of parameters Region of computational efficiency 100-10,000 FW runs Number of points per axis
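The reason the parameter space stays "fairly empty" is simple arithmetic: a regular grid with k points per axis in d parameters needs k^d forward runs. The short sketch below works this out; the function names are illustrative, and the run budgets are taken from the figures quoted on the slides (up to 10,000 runs here, 640 models in 8D in the case study).

```python
def grid_size(points_per_axis, n_parameters):
    """Number of forward runs needed to fill a regular grid."""
    return points_per_axis ** n_parameters


def max_points_per_axis(budget, n_parameters):
    """Largest integer k such that k ** n_parameters <= budget."""
    k = 1
    while (k + 1) ** n_parameters <= budget:
        k += 1
    return k


runs_for_10_points_in_8d = grid_size(10, 8)        # 100,000,000 runs
resolution_with_10000_runs = max_points_per_axis(10_000, 8)   # 3 points per axis
resolution_with_640_runs = max_points_per_axis(640, 8)        # 2 points per axis
```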
  7. 7. UQ Framework with fast ML approximation [Diagram; box labels: Observed Data; Natural System; Machine Learning; Mathematical Model (parameters, pde); Computer Simulation (discretisation, timestep); Simulated vs Data; MISMATCH; Model parameters; Forecast Uncertainty. The inset plots have the horizontal axis "time (days)".]
  8. 8. Challenges in Geomodelling • Improve representation of the reality with geologically realistic models based on identifiable parameters. • More effective use of information from various sources by incorporating prior geological and expert knowledge with associated uncertainty • Uncertainty propagation from data into the model without “freezing” assumptions and predefined model dependencies.
  9. 9. Aims Uncertainty quantification with a geomodel which is able to improve geological realism by more effective use of prior information • Model petrophysical properties in a fluvial reservoir using a robust machine learning approach – semi-supervised Support Vector Regression (SVR) • Reproduce realistic geological structures and inherent uncertainty of the geomodel • Integrate additional spatial data that are non-linearly correlated with reservoir properties.
  10. 10. Outline • Geoscience modelling under uncertainty • Machine learning based geomodels • Semi-supervised SVR reservoir model – Case study – Robustness to noise – Predictions with uncertainty • Conclusions
  11. 11. Support Vector Regression (SVR) • Linear regression in hyperspace: $f(x) = w \cdot x + b$ • Complexity control with training errors: $\min \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{L} \xi_i$ • SVR is formulated in terms of dot products of input data: $(x \cdot x') \to K(x, x')$, where $K(x, x_i)$ is a symmetric, positive-definite kernel function. • The kernel trick projects data into a sufficiently high-dimensional space: $f(x) = \sum_{i=1}^{L} y_i \alpha_i K(x, x_i) + b$, where the $x_i$ with non-zero $\alpha_i$ are the support vectors.
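A small sketch of how a fitted kernel model is evaluated may help here: the prediction is a weighted sum of kernel values between the query point and the support vectors, plus a bias. The Gaussian kernel matches the kernel width σ used later in the deck; the support vectors, weights (standing in for the products y_i α_i) and bias below are made-up numbers, not fitted values.

```python
import math


def gaussian_kernel(x, x_prime, sigma):
    """Gaussian (RBF) kernel with width sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_expansion(x, support_vectors, weights, bias, sigma):
    """Evaluate f(x) = sum_i weight_i * K(x, x_i) + bias."""
    return sum(w * gaussian_kernel(x, sv, sigma)
               for sv, w in zip(support_vectors, weights)) + bias


# Hypothetical model: two support vectors far apart, so each dominates locally.
support_vectors = [(0.0, 0.0), (10.0, 10.0)]
weights = [0.5, -0.2]
bias = 0.1

prediction_at_first_sv = kernel_expansion((0.0, 0.0), support_vectors, weights, bias, sigma=1.0)
```

With these numbers the second support vector contributes almost nothing at the origin, so the prediction there is very close to 0.5 + 0.1. A larger σ would spread each support vector's influence further, which is why the kernel width acts as a spatial correlation size.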
  12. 12. Semi-supervised Learning Concept • Supervised learning with a tutor – Learn from known input and output (e.g. multi-layer perceptron neural network) • Unsupervised learning without a tutor – Learn from known inputs only, no outputs are available (e.g. Kohonen classification maps) • Semi-supervised learning – Learn from a combination of data: • Labelled with both known input and output • Unlabelled with only input available (manifold)
  13. 13. Kernel Methods on Geo-manifolds • Data-driven models incorporate prior knowledge on the domain of the problem using graph models of natural manifolds • Kernel function enforces continuity along the graph model – manifold – obtained from the prior information Spiral manifold Conventional regression Semi-supervised represented by estimate based on regression estimation unlabelled points (+) labelled data only (●) follows the smoothness along the graph
  14. 14. Semi-supervised Approach • Manifold assumption: data actually lie on the low-dimensional manifold in the input space • Geometry of the manifold can be estimated with unlabelled data: – incorporate natural similarities in data – enforce smoothness on the manifold • Manifold carries physical information and incorporates prior physical knowledge • Geo-manifold can reflect stochastic nature of the inherent model uncertainty
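The manifold assumption can be illustrated with a toy graph model. The sketch below is not the semi-supervised SVR of the talk: it swaps in plain neighbour averaging on a k-nearest-neighbour graph, which is enough to show how unlabelled points carry the two labelled values smoothly along the graph. All names and the example points are assumptions.

```python
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour adjacency sets over all points."""
    adjacency = [set() for _ in points]
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def smooth_on_graph(points, labels, k=2, n_sweeps=200):
    """Fix labelled values; set each unlabelled value to its neighbours' mean."""
    adjacency = knn_graph(points, k)
    values = [labels.get(i, 0.0) for i in range(len(points))]
    for _ in range(n_sweeps):
        for i in range(len(points)):
            if i in labels:
                continue  # labelled points keep their known output
            values[i] = sum(values[j] for j in adjacency[i]) / len(adjacency[i])
    return values


# Eleven points along a line; only the two end points are labelled.
points = [(float(i), 0.0) for i in range(11)]
labels = {0: 0.0, 10: 1.0}
values = smooth_on_graph(points, labels)
```

The nine unlabelled points end up with values between the two labels, varying gradually along the chain, which is the "smoothness on the manifold" the slide refers to.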
  15. 15. Sources of Geo-manifold for Reservoir Models Geo-manifold for reservoir model can be elicited from prior information: – on-site spatial data (seismic, well logs) – other relevant data (outcrops, modern analogues, lab experiments) – expert knowledge in a non-parametric form – parametric geological models (object shapes, process models) – training image based models
  16. 16. Semi-supervised SVR Geomodel Prior information SVR Learning Seismic data Machine + geo-manifold unlabelled data Stanford VI synthetic case study Semi-supervised (SVR) • poro&perm labelled data from wells
  17. 17. Outline • Geoscience modelling under uncertainty • Machine learning based geomodels • Semi-supervised SVR reservoir model – Case study – Robustness to noise – Predictions with uncertainty • Conclusions
  18. 18. Case Study Stanford VI: a realistic synthetic reservoir data set • Fluvial clastic reservoir: - sinuous channels - meandering channels - delta front • Geomodel: - multi-point statistics models - sedimentation process model • “Hard” poro/perm data from wells • Synthetic seismic data: - 6 attributes: AI, EI, λ, μ, Sw, Poisson ratio S. Castro, J. Caers and T. Mukerji
  19. 19. Variability in Facies Modelling Multi-point simulation realisations Training Image Hard well data Soft probabilistic data based on seismic
  20. 20. Case Study 2D layer slices from different geological section: porosity truth case • sinuous channels • delta front SVR geomodel (tuneable or fixed parameters): • Spatial correlation size – Gaussian kernel width σ • Continuity strength – Impact of unlabelled data of the manifold • Smoothness along the manifold – Number of unlabelled points in the manifold – Number of neighbours in kernel regression • Prior belief level for seismic data – Weight of additional seismic input (scaling parameter) • Trade-off between goodness of fit and complexity – Regularisation term C determines balance between training error and margin max – Classification error
  21. 21. Stochastic Sampling for Matched Models • 640 models generated in 8D parameter space • 40 good fitting models with misfit < 250 • Misfit minimisation: [plot] • Generated models home in on the regions of good fit: [plots with axes channel porosity, channel permeability, shale porosity and shale permeability; misfit scale with ticks from 170 to 5000]
  22. 22. Fitted Model: Property Distribution Realistic reproduction of geological structures detected from the prior data: – fluvial channels – thin mud channel boundaries – point-bars porosity truth case
  23. 23. Fitted Model Forecast: Fluvial Channels case Oil and water production from 7 largest producing wells: ● History data (truth case + noise) ○ Validation truth case forecast data Matched model
  24. 24. Variability of Uncertain Model Properties • Correlation - kernel size σ σ σ channel sands shale • Smoothness along the manifold - number of unlabelled points N N N channel sands shale • Impact of additional data (seismic) on the predicted variables scaling porosity scaling for permeability • Seismic interpretation uncertainty Amplitude threshold for channel/shale boundary
  25. 25. Non-uniqueness of Semi-supervised SVR Stochastic realisations, based on geo-manifolds generated with different random seeds, represent inherent non-uniqueness of the model with the given combination of the parameter values Realisation 1 Realisation 2 Truth case
  26. 26. Impact of Noise in Seismic Data Original seismic data with injected noise N(0,σ) ● unlabelled data Semi-SVM porosity Truth case porosity Semi-SVM porosity for N(0,2σ) added noise
  27. 27. Production: Stochastic Realisations Realisations of a single fitted model with unique set of parameters Oil production profiles for 10 stochastic realisations for 6 wells: ● History data (truth case + noise) ○ Validation truth case forecast data Oil production profiles for semi-SVR model realisations
  28. 28. Multiple matching models vs Truth case porosity Multiple good fitting φ models Truth case φ The river delta front structure is very similar for different models due to the very clean synthetic seismic with no noise.
  29. 29. Fitted Model Forecast: Delta Front case Oil and water production from 7 largest producing wells: ● History data (truth case + noise) Fitted model Truth case
  30. 30. Fitted Model Forecast: Delta Front case Oil production from 7 largest producing wells: ● History data (truth case + noise) Fitted model Truth case
  31. 31. Forecast with Uncertainty Confidence P10/P90 interval for production forecast based on multiple models: Total oil and water production profiles: ● History data (truth case + noise) ○ Validation truth case forecast data P10/P90 production forecast confidence bounds
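A P10/P90 envelope of this kind can be computed by taking percentiles across the ensemble at each time step. The sketch below assumes plain statistical percentiles with linear interpolation (the 10th and 90th percentile of the forecasts); which end is called P10 varies between conventions, and the ensemble here is a made-up set of linear profiles, not the talk's results.

```python
def percentile(values, fraction):
    """Percentile with linear interpolation; fraction is between 0 and 1."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def forecast_envelope(profiles, low=0.1, high=0.9):
    """Per-time-step lower and upper bounds across an ensemble of profiles."""
    n_steps = len(profiles[0])
    lower_bound = [percentile([p[t] for p in profiles], low) for t in range(n_steps)]
    upper_bound = [percentile([p[t] for p in profiles], high) for t in range(n_steps)]
    return lower_bound, upper_bound


# Hypothetical ensemble: 11 production profiles over 4 time steps.
profiles = [[rate * t for t in range(4)] for rate in range(11)]
p10, p90 = forecast_envelope(profiles)
```

The envelope is only as good as the ensemble behind it: it describes the spread of the matched models, so a population that has collapsed onto one region of parameter space will give an over-confident interval.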
  32. 32. Uncertainty of Model Parameters Posterior probability distribution of the geomodel parameters: • Kernel width – correlation – for poro & perm in sand or shale • Continuity in sand and shale bodies – by N unlab • Impact of seismic data to poro & perm – weight
  33. 33. Outline • Geoscience modelling under uncertainty • Machine learning based geomodels • Semi-supervised SVR reservoir model – Case study – Robustness to noise – Predictions with uncertainty • Conclusions
  34. 34. Conclusions • A novel learning-based model of a petroleum reservoir based on capturing complex dependencies from data. • Semi-supervised SVR geomodel takes into account natural similarities in space and data relations: – Reproduction of geological structures and anisotropy of fluvial systems in a realistic way based on prior information on geo-manifold represented by unlabelled data – Robustness to noise and flexible control of signal/noise levels in data to detect geologically interpretable information – Stochastic non-uniqueness inherent to the model is represented by the distribution of unlabelled data • Multiple fitted models match both production history and the validation data in the forecast • Uncertainty of the SVR model is quantified by inference of the multiple generated models, which provide uncertainty forecast envelope based on posterior probability
  35. 35. Further work • Extension to 3D case by adding one more input to the SVR model • Integrate other relevant data from outcrops and lab experiments • Apply SVR modelling approach with Bayesian UQ framework to application in different fields: environmental and climate modelling, epidemiology, etc. • 2 PhD positions in the Uncertainty Quantification project: – Geologist, data integration – Uncertainty modelling with machine learning Apply to vasily.demyanov@pet.hw.ac.uk
  36. 36. Acknowledgments • J. Caers and S. Castro of Stanford University for providing Stanford VI case study • UK EPSRC grant (GR/T24838/01) • Swiss National Science Foundation for funding “GeoKernels: kernel- based methods for geo- and environmental sciences” • Sponsors of Heriot-Watt Uncertainty Quantification project:
  37. 37. Research Summary • Developed a novel model for petroleum reservoirs based on capturing complex dependencies from data with learning methods. • The novel model provides multiple HM models for different fluvial reservoirs (sinuous channels, delta front): both the production history and the validation data in the forecast are matched • Benefits of the novel data-driven geomodelling approach: – Reproduces realistic geological structure and anisotropy of property distribution. – Robust to noise in prior data – Relates to identifiable properties: continuity, correlation, prior belief in data, etc. • Model uncertainty is described by the inference of multiple models – Posterior confidence intervals describe the forecast uncertainty – Uncertainty of the model parameters is quantified by posterior probability distributions
  38. 38. Multiple good fitting φ models Labelled (●) & unlabelled (+) data Seismic data Prior information Learning Machine (SVR)
  39. 39. Next Steps • Production uncertainty forecasting based on the inference of the generated HM models. • Extension to 3D case by adding one more input to the SVR model • Integrate other relevant data from outcrops and lab experiments
  40. 40. Aims Uncertainty quantification with a geomodel which is able to improve geological realism by more effective use of prior information • Explore robustness of semi-supervised SVR geomodel to noisy data • Develop a way to reproduce inherent uncertainty of the semi-supervised SVR geomodel by stochastic realisations • Integrate semi-supervised SVR geomodel into the Bayesian uncertainty quantification framework
  41. 41. Content • Motivation and Aims • Semi-supervised learning concept – Support Vector Machine (SVM) recap • Machine learning based geomodel – Noise pollution experiment – Inherent non-uniqueness of SVR-based model – SVR geomodel in Bayesian sampling framework • Conclusions
  42. 42. Impact of Noise in Seismic Data In a real case additional data (seismic) are usually noisier than in our synthetic case Seismic is processed through a low pass filter to build a manifold of unlabelled points: Elastic impedance Filtering low frequency Channel geo-manifold component from seismic defined by unlabelled points
  43. 43. Seismic Data Polluted with Noise Gaussian noise with zero mean and 3 different std.dev σ is added. N(0, σ) N(0, 2σ) N(0, 3σ) Truth case
  44. 44. Filtering Only a low frequency component is left after filtering N(0, σ) N(0, 2σ) N(0, 3σ) Truth case
  45. 45. Geo-manifold Unlabelled points are generated only in the cells below the threshold N(0, σ) N(0, 2σ) N(0, 3σ) Truth case
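The noise experiment on the last few slides has three steps: add zero-mean Gaussian noise, keep only the low-frequency component, and generate unlabelled points in cells below a threshold. A one-dimensional sketch of that pipeline is given below. The moving average is a stand-in for the low-pass filter, which the slides do not specify, and the signal, noise level, window and threshold are all made-up values.

```python
import random


def add_gaussian_noise(signal, std, rng):
    """Add zero-mean Gaussian noise N(0, std) to every cell."""
    return [v + rng.gauss(0.0, std) for v in signal]


def moving_average(signal, half_width):
    """Simple low-pass filter: mean over a window, truncated at the edges."""
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half_width): i + half_width + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def cells_below_threshold(signal, threshold):
    """Indices of cells where unlabelled points would be generated."""
    return [i for i, v in enumerate(signal) if v < threshold]


# Hypothetical 1D "seismic" profile: a low-value channel in cells 20 to 29.
clean = [0.0 if 20 <= i < 30 else 1.0 for i in range(50)]
rng = random.Random(0)
sigma = 0.3

manifold_cells = {}
for level in (1, 2, 3):
    noisy = add_gaussian_noise(clean, level * sigma, rng)
    filtered = moving_average(noisy, half_width=2)
    manifold_cells[level] = cells_below_threshold(filtered, threshold=0.5)

clean_cells = cells_below_threshold(moving_average(clean, 2), 0.5)
```

Comparing `manifold_cells[level]` against `clean_cells` gives a quick feel for how the recovered channel footprint degrades as the noise level grows.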
  46. 46. Porosity SVR Estimates for Noisy Data Noise level: 1 σ Noise level: 2 σ Noise level: 3 σ Geo-manifold becomes less concentrated and the channel “erodes” as the noise level increases Truth case
  47. 47. Prediction with a Large Noise Level Noise level: 3σ Even with large noise levels the channel continuity can be traced in SVR prediction although it is barely visible in the input data Truth case
  48. 48. Impact of Inherent Non-uniqueness Stochastic realisations of water production from 6 largest producing wells
  49. 49. NA Sampling: Misfit Distribution Misfit of models generated by NA Lowest misfit = 188
  50. 50. NA Sampling: Parameter Distributions Histogram of parameter values for the generated models Models generated by NA home in on the regions of good fit
  51. 51. Support Vector Machine (SVM) Linear separation problem. [Figure labels: margins $w \cdot x + b = 1$ and $w \cdot x + b = -1$; $\alpha_i = 0$: normal samples; $0 < \alpha_i < C$: support vectors (SV); $\alpha_i = C$: support vectors, untypical or noisy.] Soft margin: $\min \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{L} \xi_i$, where $\xi_i \ge 0$ are slack variables that allow noisy samples and outliers to lie inside or on the outer side of the margin. Trade-off between margin maximisation and training error minimisation. Increase space dimension to solve separation problem linearly.
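To make the soft-margin trade-off concrete, the sketch below evaluates the objective for a fixed separating plane. For given w and b, the smallest slack that satisfies the margin constraint is ξ_i = max(0, 1 − y_i (w·x_i + b)); the samples, labels and the value of C are made up for the example.

```python
def slack(w, b, x, y):
    """Smallest slack xi >= 0 with y * (w.x + b) >= 1 - xi."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)


def soft_margin_objective(w, b, samples, labels, C):
    """0.5 * ||w||^2 + C * sum of slacks for a fixed (w, b)."""
    norm_sq = sum(wi * wi for wi in w)
    total_slack = sum(slack(w, b, x, y) for x, y in zip(samples, labels))
    return 0.5 * norm_sq + C * total_slack


# Hypothetical plane x1 = 0 with four samples.
w, b = [1.0, 0.0], 0.0
samples = [(2.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (-2.0, 0.0)]
labels = [1, 1, 1, -1]

slacks = [slack(w, b, x, y) for x, y in zip(samples, labels)]
objective = soft_margin_objective(w, b, samples, labels, C=10.0)
```

Here the first and last samples lie outside the margin (zero slack), the second lies inside the margin (slack 0.5), and the third is on the wrong side of the plane (slack 1.5). A larger C penalises those two more heavily, pushing the solution towards fewer training errors at the cost of a narrower margin.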
