10,00 Modelling and analysis of geophysical data using geostatistics and machine learning Vasily Demyanov – Heriot–Watt Institute, Edinburgh (U.K.)

10,00 Modelling and analysis of geophysical data using geostatistics and machine learning
Vasily Demyanov – Heriot–Watt Institute, Edinburgh (U.K.)
Intelligent Analysis of Environmental Data (S4 ENVISA Workshop 2009)

Published in: Technology, Education


Transcript

  • 1. UNCERTAINTY QUANTIFICATION OF GEOSCIENCE PREDICTION MODELS BASED ON SUPPORT VECTOR REGRESSION V. Demyanov (1), A. Pozdnoukhov (2), M. Kanevski (3), M. Christie (1) • (1) Institute of Petroleum Engineering, Heriot-Watt University, Edinburgh, UK, vasily.demyanov@pet.hw.ac.uk • (2) National Centre for Geocomputation, National University of Ireland, Maynooth • (3) Institute of Geomatics and Risk Analysis, University of Lausanne
  • 2. Outline • Geoscience modelling under uncertainty • Machine learning based geomodels • Semi-supervised SVR reservoir model – Case study – Robustness to noise – Predictions with uncertainty • Conclusions
  • 3. Outline • Geoscience modelling under uncertainty • Machine learning based geomodels • Semi-supervised SVR reservoir model – Case study – Robustness to noise – Predictions with uncertainty • Conclusions
  • 4. Uncertainty Quantification (UQ) Framework [Figure: workflow diagram. Labels: Natural System; Observed Data; Mathematical Model (parameters, PDE); Computer Simulation (discretisation, timestep), computationally expensive; Simulated vs Data; Mismatch; Model parameters; Forecast Uncertainty. The embedded plots have time (days) on the horizontal axis.]
  • 5. Adaptive Stochastic Optimisation for UQ • Loop (diagram labels): sampling the prior distribution; ensemble of models (Model 1, Model 2, Model 3, …, Model n); evaluation: model simulation and mismatch calculation; ranking; reproduction; new population; iteration • Sampling algorithms: genetic algorithms, particle swarm optimisation, ant colony optimisation, neighbourhood approximation • Inference: inferred ensemble of models for prediction
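The sample, evaluate, rank, reproduce loop on this slide can be sketched as a minimal evolutionary search. This is an illustration only, not the sampler used in the talk: the `misfit` function is a hypothetical stand-in for a forward simulation plus mismatch calculation, and the population size, mutation width, and parameter bounds are all assumptions.

```python
import random

def misfit(params):
    # Hypothetical stand-in for "run simulation + compute mismatch":
    # squared distance from an arbitrary target point.
    target = [0.3, 0.7]
    return sum((p - t) ** 2 for p, t in zip(params, target))

def evolve(n_models=20, n_iterations=30, n_keep=5, step=0.05, seed=0):
    rng = random.Random(seed)
    # Sample the prior: uniform on [0, 1] for each of two parameters.
    population = [[rng.random(), rng.random()] for _ in range(n_models)]
    best_history = []
    for _ in range(n_iterations):
        # Evaluation and ranking: lowest misfit first.
        ranked = sorted(population, key=misfit)
        best_history.append(misfit(ranked[0]))
        parents = ranked[:n_keep]
        # Reproduction: keep the parents, add perturbed copies clipped to [0, 1].
        children = []
        while len(parents) + len(children) < n_models:
            parent = rng.choice(parents)
            child = [min(1.0, max(0.0, p + rng.gauss(0.0, step))) for p in parent]
            children.append(child)
        population = parents + children
    return sorted(population, key=misfit), best_history

ensemble, best_history = evolve()
```

Because the best models are carried over unchanged, the lowest misfit never increases from one iteration to the next; the final `ensemble` plays the role of the "ensemble of models" passed on to inference.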
  • 6. Search for Matching Models Challenge • Forward (FW) simulation of multiple models generated for different combinations of parameter values is computationally expensive • High-dimensional parameter space remains fairly empty and poorly described despite thousands of generated models [Figure axes: number of parameters; number of points per axis. Marked region: computational efficiency at 100-10,000 FW runs.]
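A short calculation shows why the parameter space stays "fairly empty". The sketch below assumes a full regular grid (points per axis raised to the number of parameters), which is a simplification chosen for illustration; the budget of 10,000 runs is the upper end of the range quoted on the slide, and the function names are hypothetical.

```python
def grid_size(points_per_axis, n_parameters):
    # Number of forward runs needed for a full grid over the parameter space.
    return points_per_axis ** n_parameters

def max_points_per_axis(budget, n_parameters):
    # Largest full-grid resolution that fits within the run budget.
    points = 1
    while grid_size(points + 1, n_parameters) <= budget:
        points += 1
    return points

for n_parameters in (2, 4, 8):
    print(n_parameters, max_points_per_axis(10_000, n_parameters))
```

With 10,000 runs, two parameters allow 100 points per axis, four parameters allow 10, and eight parameters allow only 3.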
  • 7. UQ Framework with fast ML approximation [Figure: the UQ workflow diagram with a Machine Learning block added. Labels: Natural System; Observed Data; Mathematical Model (parameters, PDE); Computer Simulation (discretisation, timestep); Machine Learning; Simulated vs Data; Mismatch; Model parameters; Forecast Uncertainty. The embedded plots have time (days) on the horizontal axis.]
  • 8. Challenges in Geomodelling • Improve the representation of reality with geologically realistic models based on identifiable parameters. • Make more effective use of information from various sources by incorporating prior geological and expert knowledge with its associated uncertainty. • Propagate uncertainty from data into the model without “freezing” assumptions and predefined model dependencies.
  • 9. Aims Uncertainty quantification with a geomodel which is able to improve geological realism by more effective use of prior information • Model petrophysical properties in a fluvial reservoir using a robust machine learning approach – semi-supervised Support Vector Regression (SVR) • Reproduce realistic geological structures and inherent uncertainty of the geomodel • Integrate additional spatial data that are non-linearly correlated with reservoir properties.
  • 10. Outline • Geoscience modelling under uncertainty • Machine learning based geomodels • Semi-supervised SVR reservoir model – Case study – Robustness to noise – Predictions with uncertainty • Conclusions
  • 11. Support Vector Regression (SVR) • Linear regression in hyperspace: $f(x) = w \cdot x + b$ • Complexity control with training errors: $\min_{w} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{L} \xi_i$ • SVR is formulated in terms of dot products of input data: $(x \cdot x') \to K(x, x')$, where $K(x, x_i)$ is a symmetric and positive-definite kernel function. • The kernel trick projects data into a sufficiently high-dimensional space: $f(x) = \sum_{i=1}^{L} y_i \alpha_i K(x, x_i) + b$, where the $x_i$ entering the sum are the support vectors
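To make the kernel expansion concrete, here is a minimal sketch that evaluates $f(x) = \sum_i y_i \alpha_i K(x, x_i) + b$ with a Gaussian kernel of width σ. The support vectors, weights, and offset are hypothetical numbers chosen only so the example runs; no training is performed, and each `weights` entry simply stands for the product $y_i \alpha_i$.

```python
import math

def gaussian_kernel(x, x_i, sigma):
    # K(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2)); symmetric and positive definite.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def predict(x, support_vectors, weights, b, sigma):
    # f(x) = sum_i weight_i * K(x, x_i) + b, where weight_i stands for the
    # product y_i * alpha_i in the slide's notation.
    return sum(w * gaussian_kernel(x, sv, sigma)
               for sv, w in zip(support_vectors, weights)) + b

# Hypothetical values, chosen only to make the example run.
support_vectors = [(0.0, 0.0), (1.0, 0.0)]
weights = [0.5, -0.25]
value = predict((0.0, 0.0), support_vectors, weights, b=0.1, sigma=1.0)
```

A larger `sigma` makes each support vector influence a wider neighbourhood, which is why the kernel width acts as a spatial correlation size in the geomodel.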
  • 12. Semi-supervised Learning Concept • Supervised learning with a tutor – Learn from known input and output (e.g. multi-layer perceptron neural network) • Unsupervised learning without a tutor – Learn from known inputs only, no outputs are available (e.g. Kohonen classification maps) • Semi-supervised learning – Learn from a combination of data: • Labelled with both known input and output • Unlabelled with only input available (manifold)
  • 13. Kernel Methods on Geo-manifolds • Data-driven models incorporate prior knowledge on the domain of the problem using graph models of natural manifolds • Kernel function enforces continuity along the graph model – manifold – obtained from the prior information [Figure panels: Spiral manifold represented by unlabelled points (+); Conventional regression estimate based on labelled data only (●); Semi-supervised regression estimation follows the smoothness along the graph]
  • 14. Semi-supervised Approach • Manifold assumption: data actually lie on the low-dimensional manifold in the input space • Geometry of the manifold can be estimated with unlabelled data: – incorporate natural similarities in data – enforce smoothness on the manifold • Manifold carries physical information and incorporates prior physical knowledge • Geo-manifold can reflect stochastic nature of the inherent model uncertainty
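The manifold assumption can be illustrated with a toy example in which a few labelled values are spread along a graph built from unlabelled points. This is not the kernel construction used in the talk: it swaps in a simple graph-based label propagation (repeated neighbour averaging) as a stand-in, and the half-circle geometry, the connection radius, and the function names are assumptions.

```python
import math

def build_graph(points, radius):
    # Connect every pair of points closer than `radius` (undirected graph).
    neighbours = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < radius:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours

def propagate(neighbours, labels, n_iterations=2000):
    # labels: {point index: known value}. Unlabelled points start at 0 and are
    # repeatedly replaced by the mean of their graph neighbours.
    values = [labels.get(i, 0.0) for i in range(len(neighbours))]
    for _ in range(n_iterations):
        updated = list(values)
        for i, nbrs in neighbours.items():
            if i not in labels and nbrs:
                updated[i] = sum(values[j] for j in nbrs) / len(nbrs)
        values = updated
    return values

# 11 points along a half circle: a toy one-dimensional manifold in 2D.
angles = [math.pi * k / 10 for k in range(11)]
points = [(math.cos(a), math.sin(a)) for a in angles]
spacing = math.dist(points[0], points[1])
graph = build_graph(points, radius=1.5 * spacing)
# Only the two end points are labelled.
estimate = propagate(graph, labels={0: 0.0, 10: 1.0})
```

The estimate varies smoothly with position along the arc rather than with straight-line distance in the input space, which is the behaviour the slide describes as smoothness on the manifold.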
  • 15. Sources of Geo-manifold for Reservoir Models Geo-manifold for reservoir model can be elicited from prior information: – on-site spatial data (seismic, well logs) – other relevant data (outcrops, modern analogues, lab experiments) – expert knowledge in a non-parametric form – parametric geological models (object shapes, process models) – training image based models
  • 16. Semi-supervised SVR Geomodel [Diagram labels: Prior information; Seismic data + geo-manifold unlabelled data; poro & perm labelled data from wells; Semi-supervised SVR; Learning Machine (SVR); Stanford VI synthetic case study]
  • 17. Outline • Geoscience modelling under uncertainty • Machine learning based geomodels • Semi-supervised SVR reservoir model – Case study – Robustness to noise – Predictions with uncertainty • Conclusions
  • 18. Case Study Stanford VI: a realistic synthetic reservoir data set • Fluvial clastic reservoir: - sinuous channels - meandering channels - delta front • Geomodel: - multi-points statistics models - sedimentation process model • “Hard” poro/perm data from wells •Synthetic seismic data: - 6 attributes: AI, EI, λ, μ, Sw, Poisson ratio S. Castro, J. Caers and T. Mukerji
  • 19. Variability in Facies Modelling Multi-point simulation realisations Training Image Hard well data Soft probabilistic data based on seismic
  • 20. Case Study 2D layer slices from different geological section: porosity truth case • sinuous channels • delta front SVR geomodel (tuneable or fixed parameters): • Spatial correlation size – Gaussian kernel width σ • Continuity strength – Impact of unlabelled data of the manifold • Smoothness along the manifold – Number of unlabelled points in the manifold – Number of neighbours in kernel regression • Prior belief level for seismic data – Weight of additional seismic input (scaling parameter) • Trade-off between goodness of fit and complexity – Regularisation term C determines balance between training error and margin max – Classification error
  • 21. Stochastic Sampling for Matched Models • 640 models generated in 8D parameter space • 40 good fitting models with misfit < 250 • Misfit minimisation • Generated models home in on the regions of good fit [Figure: generated models plotted against channel porosity, channel permeability, shale porosity and shale permeability, with a misfit scale from 170 to 5000]
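Selecting the good fitting models amounts to scoring each generated model against the history data and keeping those below a misfit threshold. The talk does not state its misfit definition, so the least-squares form below is an assumption, as are the model names and production numbers; only the threshold of 250 comes from the slide.

```python
def misfit(simulated, observed, sigma):
    # Least-squares misfit: sum of squared residuals scaled by 2 * sigma^2.
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / (2.0 * sigma ** 2)

def select_matched(models, observed, sigma, threshold):
    # Keep the names of models whose misfit falls below the threshold.
    return [name for name, simulated in models.items()
            if misfit(simulated, observed, sigma) < threshold]

# Hypothetical production histories (arbitrary units), for illustration only.
observed = [100.0, 90.0, 80.0]
models = {
    "model_a": [101.0, 91.0, 79.0],   # residuals 1, 1, -1
    "model_b": [120.0, 70.0, 100.0],  # residuals 20, -20, 20
}
matched = select_matched(models, observed, sigma=1.0, threshold=250.0)
```

Here `model_a` scores 1.5 and is kept, while `model_b` scores 600 and is rejected.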
  • 22. Fitted Model: Property Distribution Realistic reproduction of geological structures detected from the prior data: – fluvial channels – thin mud channel boundaries – point-bars porosity truth case
  • 23. Fitted Model Forecast: Fluvial Channels case Oil and water production from 7 largest producing wells: ● History data (truth case + noise) ○ Validation truth case forecast data Matched model
  • 24. Variability of Uncertain Model Properties • Correlation - kernel size σ (channel, sands, shale) • Smoothness along the manifold - number of unlabelled points N (channel, sands, shale) • Impact of additional data (seismic) on the predicted variables: scaling for porosity, scaling for permeability • Seismic interpretation uncertainty: amplitude threshold for channel/shale boundary
  • 25. Non-uniqueness of Semi-supervised SVR Stochastic realisations, based on geo-manifolds generated with different random seeds, represent inherent non-uniqueness of the model with the given combination of the parameter values Realisation 1 Realisation 2 Truth case
  • 26. Impact of Noise in Seismic Data Original seismic data with injected noise N(0,σ) ● unlabelled data Semi-SVM porosity Truth case porosity Semi-SVM porosity for N(0,2σ) added noise
  • 27. Production: Stochastic Realisations Realisations of a single fitted model with unique set of parameters Oil production profiles for 10 stochastic realisations for 6 wells: ● History data (truth case + noise) ○ Validation truth case forecast data Oil production profiles for semi-SVR model realisations
  • 28. Multiple matching models vs Truth case porosity Multiple good fitting φ models Truth case φ The river delta front structure is very similar for different models due to the very clean synthetic seismic with no noise.
  • 29. Fitted Model Forecast: Delta Front case Oil and water production from 7 largest producing wells: ● History data (truth case + noise) Fitted model Truth case
  • 30. Fitted Model Forecast: Delta Front case Oil production from 7 largest producing wells: ● History data (truth case + noise) Fitted model Truth case
  • 31. Forecast with Uncertainty Confidence P10/P90 interval for production forecast based on multiple models: Total oil and water production profiles: ● History data (truth case + noise) ○ Validation truth case forecast data P10/P90 production forecast confidence bounds
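A P10/P90 envelope can be computed by taking percentiles across the ensemble at each time step. The sketch below uses linear interpolation between order statistics and a made-up ensemble; both are assumptions, and the talk may weight models by posterior probability rather than treat them equally. Note also that some petroleum conventions define P10 as the value exceeded with 10% probability, which would swap the two bounds.

```python
def percentile(values, q):
    # Linear interpolation between order statistics, q in [0, 100].
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

def forecast_envelope(profiles, low=10, high=90):
    # profiles: one production profile per model, all with the same time steps.
    # Returns a (low, high) percentile pair for every time step.
    return [(percentile(step, low), percentile(step, high))
            for step in zip(*profiles)]

# Hypothetical ensemble: 11 models, 2 time steps each.
profiles = [[float(m), 10.0 * m] for m in range(11)]
envelope = forecast_envelope(profiles)
```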
  • 32. Uncertainty of Model Parameters Posterior probability distribution of the geomodel parameters: • Kernel width – correlation – for poro & perm in sand or shale • Continuity in sand and shale bodies – by N unlab • Impact of seismic data to poro & perm – weight
  • 33. Outline • Geoscience modelling under uncertainty • Machine learning based geomodels • Semi-supervised SVR reservoir model – Case study – Robustness to noise – Predictions with uncertainty • Conclusions
  • 34. Conclusions • A novel learning-based model of a petroleum reservoir that captures complex dependencies from data. • Semi-supervised SVR geomodel takes into account natural similarities in space and data relations: – Reproduction of geological structures and anisotropy of fluvial systems in a realistic way, based on prior information on the geo-manifold represented by unlabelled data – Robustness to noise and flexible control of signal/noise levels in data to detect geologically interpretable information – Stochastic non-uniqueness inherent to the model is represented by the distribution of unlabelled data • Multiple fitted models match both production history and the validation data in the forecast • Uncertainty of the SVR model is quantified by inference of the multiple generated models, which provide an uncertainty forecast envelope based on posterior probability
  • 35. Further work • Extension to 3D case by adding one more input to the SVR model • Integrate other relevant data from outcrops and lab experiments • Apply the SVR modelling approach with the Bayesian UQ framework in other fields: environmental and climate modelling, epidemiology, etc. • 2 PhD positions in the Uncertainty Quantification project: – Geologist, data integration – Uncertainty modelling with machine learning Apply to vasily.demyanov@pet.hw.ac.uk
  • 36. Acknowledgments • J. Caers and S. Castro of Stanford University for providing Stanford VI case study • UK EPSRC grant (GR/T24838/01) • Swiss National Science Foundation for funding “GeoKernels: kernel- based methods for geo- and environmental sciences” • Sponsors of Heriot-Watt Uncertainty Quantification project:
  • 37. Research Summary • Developed a novel model for petroleum reservoirs that captures complex dependencies from data with learning methods. • The novel model provides multiple history-matched (HM) models for different fluvial reservoirs (sinuous channels, delta front); both the production history and the validation data in the forecast are matched • Benefits of the novel data-driven geomodelling approach: – Reproduces realistic geological structure and anisotropy of property distribution – Robust to noise in prior data – Relates to identifiable properties: continuity, correlation, prior belief in data, etc. • Model uncertainty is described by the inference of multiple models – Posterior confidence intervals describe the forecast uncertainty – Uncertainty of the model parameters is quantified by posterior probability distributions
  • 38. Multiple good fitting φ models Labelled (●) & unlabelled (+) data Seismic data Prior information Learning Machine (SVR)
  • 39. Next Steps • Production uncertainty forecasting based on the inference of the generated HM models. • Extension to 3D case by adding one more input to the SVR model • Integrate other relevant data from outcrops and lab experiments
  • 40. Aims Uncertainty quantification with a geomodel which is able to improve geological realism by more effective use of prior information • Explore robustness of semi-supervised SVR geomodel to noisy data • Develop a way to reproduce inherent uncertainty of the semi-supervised SVR geomodel by stochastic realisations • Integrate semi-supervised SVR geomodel into the Bayesian uncertainty quantification framework
  • 41. Content • Motivation and Aims • Semi-supervised learning concept – Support Vector Machine (SVM) recap • Machine learning based geomodel – Noise pollution experiment – Inherent non-uniqueness of SVR-based model – SVR geomodel in Bayesian sampling framework • Conclusions
  • 42. Impact of Noise in Seismic Data In a real case, additional data (seismic) are usually noisier than in our synthetic case. Seismic is processed through a low-pass filter to build a manifold of unlabelled points. [Figure panels: Elastic impedance; Filtering low frequency component from seismic; Channel geo-manifold defined by unlabelled points]
  • 43. Seismic Data Polluted with Noise Gaussian noise with zero mean and 3 different std.dev σ is added. N(0, σ) N(0, 2σ) N(0, 3σ) Truth case
  • 44. Filtering Only a low frequency component is left after filtering N(0, σ) N(0, 2σ) N(0, 3σ) Truth case
  • 45. Geo-manifold Unlabelled points are generated only in the cells below the threshold N(0, σ) N(0, 2σ) N(0, 3σ) Truth case
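The three steps of the noise experiment (add Gaussian noise, keep the low-frequency component, place unlabelled points in cells below a threshold) can be sketched on a one-dimensional trace. Everything specific here is an assumption made for illustration: the moving average stands in for the unspecified low-pass filter, the trace is a toy signal in which low values mark the channel, and the threshold and noise level are arbitrary.

```python
import random

def add_noise(signal, std, seed=0):
    # Add zero-mean Gaussian noise N(0, std) to every cell.
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, std) for s in signal]

def low_pass(signal, half_width=1):
    # Moving average; the window is truncated at the edges.
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half_width): i + half_width + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

def manifold_cells(signal, threshold):
    # Indices of cells below the threshold: candidate unlabelled points.
    return [i for i, s in enumerate(signal) if s < threshold]

# Toy 1D "seismic" trace: low values mark the channel (an assumption).
trace = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
clean_cells = manifold_cells(low_pass(trace), threshold=0.5)
noisy_cells = manifold_cells(low_pass(add_noise(trace, std=0.3)), threshold=0.5)
```

Without noise the selected cells are exactly the channel cells 3, 4 and 5; rerunning with larger `std` values is a simple way to see how the set of selected cells degrades.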
  • 46. Porosity SVR Estimates for Noisy Data Noise level: 1σ Noise level: 2σ Noise level: 3σ The geo-manifold becomes less concentrated and the channel “erodes” as the noise level increases Truth case
  • 47. Prediction with a Large Noise Level Noise level: 3σ Even with large noise levels the channel continuity can be traced in SVR prediction although it is barely visible in the input data Truth case
  • 48. Impact of Inherent Non-uniqueness Stochastic realisations of water production from 6 largest producing wells
  • 49. NA Sampling: Misfit Distribution Misfit of models generated by NA Lowest misfit = 188
  • 50. NA Sampling: Parameter Distributions Histogram of parameter values for the generated models Models generated by NA home in on the regions of good fit
  • 51. Support Vector Machine (SVM) Linear separation problem • Margin hyperplanes: $w \cdot x + b = 1$ and $w \cdot x + b = -1$ • $\alpha_i = 0$: normal samples • $0 < \alpha_i < C$: support vectors (SV) • $\alpha_i = C$: support vectors, untypical or noisy • Soft margin: $\min_{w} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{L} \xi_i$, with slack variables $\xi_i \ge 0$ that allow noisy samples and outliers to lie inside or on the outer side of the margin • Trade-off between margin maximisation and training error minimisation • Increase space dimension to solve the separation problem linearly
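The soft-margin trade-off can be checked numerically by evaluating the objective $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$ for a fixed separating line. The sketch uses the standard slack definition $\xi_i = \max(0, 1 - y_i (w \cdot x_i + b))$; the data, `w`, `b` and `C` are hypothetical, and nothing is optimised.

```python
def soft_margin_objective(w, b, samples, labels, C):
    # samples: feature vectors; labels: +1 or -1.
    slacks = []
    for x, y in zip(samples, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        # xi_i = max(0, 1 - y_i (w.x_i + b)): zero on or outside the margin.
        slacks.append(max(0.0, 1.0 - y * score))
    norm_sq = sum(wi * wi for wi in w)
    return 0.5 * norm_sq + C * sum(slacks), slacks

# Hypothetical 1D data with w = 1, b = 0: margins at x = +1 and x = -1.
samples = [(2.0,), (0.5,), (-3.0,), (0.25,)]
labels = [1, 1, -1, -1]
objective, slacks = soft_margin_objective((1.0,), 0.0, samples, labels, C=10.0)
```

The second sample lies inside the margin (slack 0.5) and the fourth is on the wrong side (slack 1.25), so a larger `C` penalises these two points more heavily relative to the margin term.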
