Add MISFIT_PREPROCESSOR to ERT template #217

Draft
wants to merge 17 commits into
base: master

wouterjdb
Contributor

@wouterjdb wouterjdb commented Oct 14, 2020

This PR implements scaling of correlated observations using ERT's built-in PCA scaling method.


Contributor checklist

  • 🎉 This PR closes Too many observations used when history matching Norne #206.
  • 📜 I have broken down my PR into the following tasks:
    • Add STD_SCALE_CORRELATED_OBS TRUE (On its way out - should use MISFIT_PREPROCESSOR)
    • Bump ERT version to latest master commit
    • Bump libres to version v6.0.0.rc0
    • Bump ERT and libres to released 2.16 and v6.0.0 (waiting for release)
    • Add MISFIT_PREPROCESSOR as PRE_FIRST_UPDATE hooked workflow
    • Test and visualize results
  • 🤖 I have added tests, or extended existing tests, to cover any new features or bugs fixed in this PR.
  • 📖 I have considered adding a new entry in CHANGELOG.md.
  • 📚 I have considered updating the documentation.

@wouterjdb wouterjdb added the enhancement New feature or request label Oct 14, 2020
@wouterjdb wouterjdb self-assigned this Oct 14, 2020
@wouterjdb wouterjdb changed the title Add STD_SCALE_CORRELATED_OBS TRUE to ERT template Add MISFIT_PREPROCESSOR to ERT template Oct 14, 2020
@wouterjdb
Contributor Author

wouterjdb commented Feb 9, 2021

✔️ kmeans clustering has now been added equinor/semeio#286

🚫 currently still blocked by equinor/ert#1316

@wouterjdb
Contributor Author

✔️ kmeans clustering has now been added equinor/semeio#286

✔️ speed improvement for many observations equinor/ert#1316

@wouterjdb
Contributor Author

🚫 Currently blocked: the new commits are not yet on PyPI.

@wouterjdb
Contributor Author

Both packages are now updated on PyPI (2.21.b0 and 1.0.b0).

✔️ Ready for testing.

@wouterjdb wouterjdb removed the blocked label Feb 22, 2021
@edubarrosTNO
Contributor

I have tested the MISFIT_PREPROCESSOR option in the Norne case by using the code in the branch of this PR.
With this workflow job enabled, ERT writes some files to a subfolder inside the FlowNet output folder (<FLOWNET_OUTPUT_FOLDER>/reports/default_0):

  1. Inside subfolder CorrelatedObservationsScalingJob, 3 files are created:
    a. scale_factor.json: [34.63, 14.76]
    b. svd.json: a 2D array of size (33, 2) containing what appears to be two lists of 33 singular values in decreasing order.
    c. workflow-log.txt: a text file with some information about the calculation of the scaling factors stored in scale_factor.json - in this case two blocks of information indicating the number of primary components, number of observations and a list of observation keys used to calculate the scaling factor.

  2. Inside subfolder MisfitPreprocessorJob, 4 files are created:
    a. clusters.json: a Python dictionary of dictionaries associating the observation keys to their numbering
    b. correlation_matrix.csv: a rather large CSV file (950 MB), which was hard to inspect given its size (but I believe it is a square Nobs x Nobs matrix).
    c. svd.json: a 2D array of size (33, 1) containing what appears to be a list of 33 singular values in decreasing order (same as one of the lists stored in 1.b)
    d. workflow-log.txt: a text file with some information about the obtained clusters of observations - in this case two clusters as stored in clusters.json, cluster 0 and cluster 1, with their respective list of observation keys and numbering (cluster 1 appears to contain many more observation keys than cluster 0)
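
For anyone who wants to repeat this inspection, here is a minimal sketch of how the files listed above could be summarized without opening the large CSV in an editor. The folder and file names are taken from the list above; the helper names `load_json_if_present` and `summarize_reports` are hypothetical, and the code assumes that scale_factor.json and clusters.json are plain JSON and that correlation_matrix.csv has one matrix row per line.

```python
import json
from pathlib import Path


def load_json_if_present(path):
    """Return the parsed JSON content of `path`, or None if the file is missing."""
    path = Path(path)
    if not path.is_file():
        return None
    with path.open() as handle:
        return json.load(handle)


def summarize_reports(report_dir):
    """Collect a small summary of the two job subfolders (hypothetical helper)."""
    report_dir = Path(report_dir)
    scaling_dir = report_dir / "CorrelatedObservationsScalingJob"
    misfit_dir = report_dir / "MisfitPreprocessorJob"

    scale_factors = load_json_if_present(scaling_dir / "scale_factor.json")
    clusters = load_json_if_present(misfit_dir / "clusters.json")

    csv_path = misfit_dir / "correlation_matrix.csv"
    n_lines = None
    if csv_path.is_file():
        # Count lines lazily instead of loading the whole file into memory.
        with csv_path.open() as handle:
            n_lines = sum(1 for _ in handle)

    return {
        "scale_factors": scale_factors,
        "n_clusters": None if clusters is None else len(clusters),
        "correlation_matrix_lines": n_lines,
    }
```

Calling `summarize_reports("<FLOWNET_OUTPUT_FOLDER>/reports/default_0")` with the real output folder substituted would return the scaling factors, the number of clusters and the line count of the correlation matrix file; missing files simply show up as None.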

@edubarrosTNO
Contributor

All in all, the only thing I could infer from these output files is that 2 clusters of observations are formed and that each is assigned a scaling factor calculated from some singular value decomposition or PCA (with 33 non-zero singular values). It remains unclear why there are 2 clusters and how the singular values are used to determine the scaling factors.
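
As a starting point for that question, here is a sketch of one common PCA-based convention for turning singular values into a scaling factor. This is an assumption on my side and has not been checked against the code that produced the files above: the number of retained components is the smallest number whose cumulative share of variance (proportional to the squared singular values) reaches a threshold, and the factor is sqrt(N_obs / N_components). The function names and the 0.95 default threshold are hypothetical.

```python
import math


def n_components_for_threshold(singular_values, threshold=0.95):
    """Smallest number of leading components whose cumulative variance share
    reaches `threshold` (variance share assumed proportional to s**2)."""
    squared = [s * s for s in singular_values]
    total = sum(squared)
    cumulative = 0.0
    for count, value in enumerate(squared, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(squared)


def scaling_factor(n_observations, n_components):
    """Assumed convention: sqrt(number of observations / retained components)."""
    return math.sqrt(n_observations / n_components)
```

Under this assumption, squaring a reported factor would give the number of observations per retained component (for example 14.76² ≈ 217.9), which could be compared with the counts printed in workflow-log.txt. It would not explain how the number of clusters is chosen.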

Another observation: when I ran it a second time, the output of MISFIT_PREPROCESSOR differed from the first attempt. In the second run, 3 clusters seem to have been formed. The scaling factor of cluster 0 remained close to the one calculated in the first attempt, and the scaling factors of clusters 1 and 2 add up approximately to the scaling factor of cluster 1 in the first attempt (suggesting that, in the second run, the old cluster 1 was split into two clusters). In summary, there seems to be some randomness associated with MISFIT_PREPROCESSOR despite the fact that the RANDOM_SEED fixed in the ERT config file is the same in both runs. This should be reported in the ERT repository.
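
To make such a report reproducible, the clusters.json files of two runs could be compared independently of the cluster numbering. The sketch below assumes the layout described earlier, a dictionary mapping each cluster id to a dictionary keyed by observation key; it only looks at the observation keys and ignores the numbering stored under each key. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_partition(clusters_json_path):
    """Read clusters.json and return the clustering as a set of frozensets of
    observation keys, so that the cluster numbering does not matter."""
    with Path(clusters_json_path).open() as handle:
        clusters = json.load(handle)
    return {frozenset(members) for members in clusters.values()}


def compare_partitions(first, second):
    """Report which groups of observation keys appear in only one of the runs."""
    return {
        "identical": first == second,
        "only_in_first": sorted(sorted(group) for group in first - second),
        "only_in_second": sorted(sorted(group) for group in second - first),
    }
```

If one cluster of the first run was split in the second run, the original group shows up under `only_in_first` and its pieces under `only_in_second`, while unchanged clusters are not listed.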

To conclude: based on my tests with the Norne example, I would not recommend merging this PR branch to master before we better understand what this option is doing and ensure that we can control any randomness associated with it. If we do proceed with merging, my advice would be to expose this as an optional setting in the FlowNet config file and have it disabled by default. The large number of failing FlowNet simulations when this option was enabled prevented me from determining whether it would help mitigate the problem of having a very large number of observations in our FlowNet runs.

@wouterjdb wouterjdb added this to the HM open Norne Model milestone Apr 26, 2021
@wouterjdb wouterjdb removed their assignment Jun 17, 2021
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Too many observations used when history matching Norne
3 participants