After this tutorial you should know how to run MCMC, implement systematic uncertainties new samples and how to plot standard Bayesian diagnostic.
Current setup include single sample with several cross-section and oscillation parameters.
MaCh3 is predominantly C++ software although some functionality are available through python as well. To access them please use tale of contents.
- How to Start?
- How to Run MCMC
- How to Develop Model of Systematic Uncertainties
- How to Develop New Samples
- MCMC Diagnostic
- Useful Settings
- How to Plot?
To compile simply
mkdir build;
cd build;
cmake ../ -DPYTHON_ENABLED=ON [DPYTHON_ENABLED not mandatory]
make -jN [set number of threads]
make install
then
source bin/setup.MaCh3.sh
source bin/setup.MaCh3Tutorial.sh
source bin/setup.NuOscillator.sh
alternatively you can use containers by
docker pull ghcr.io/mach3-software/mach3tutorial:alma9latest
To reed more how to use containers check our wiki here
To run MCMC simply
./bin/MCMCTutorial Inputs/FitterConfig.yaml
Congratulations! 🎉 You have just completed finished you first MCMC chain.
Being able to visualise and analyse output of MCMC is standard procedure after chain has finished, you can produce simple plots with
./bin/ProcessMCMC bin/TutorialDiagConfig.yaml Test.root
where Test.root is the output of running MCMCTutorial as described here One of plots you will encounter is:
It is marginalised posterior of a single parameter. This is main output of MCMC.
WARNING Your posterior may look very shaky and slightly different to one in example. This is because you run chain with low number of steps. Meaning you don't have enough statistic to build posterior distribution. You can easily modify in Inputs/FitterConfig.yaml
General:
MCMC:
NSteps: 10000
It is good homework to increase number of steps and see how much more smooth posterior becomes, but at the cost of having to wait more.
Warning: If you modified files in main folder not build you will have to call make install!
ProcessMCMC has much more plotting options, we recommend to see here to get better idea what each plot mean. We especially recommend comparing 2D posteriors with correlation matrix and playing with triangle plots.
You can then take the output of running ProcessMCMC which will be called something like _Process.root, and make fancier error plots from it using the GetPostfitParamPlots
app like:
GetPostfitParamPlots Test_drawCorr.root
Output should look like file below, and it convey same information as individual posteriors, it is more compact way of presenting same information useful for comparison especially if you have order of hundred parameters
In the next step you gonna modify analysis setup and repeat steps.
First let's better understand Inputs/SystematicModel.yaml
. This config controls what systematic uncertainties will be used in the analysis for example like this:
- Systematic:
Names:
FancyName: Norm_Param_0
Error: 0.11
FlatPrior: false
IsFixed: false
Mode: [ 0 ]
ParameterBounds: [0, 999.]
ParameterGroup: Xsec
TargetNuclei: [12, 16]
KinematicCuts:
- TrueQ2: [0.25, 0.5]
ParameterValues:
Generated: 1.
PreFitValue: 1.
DetID: 985
StepScale:
MCMC: 0.2
Type: Norm
If you want to read more about implementation please go here
As first step let's modify Error: 0.11
to Error: 2.0
, this should significantly modify error which should be noticeable in MCMC.
Lastly we need to modify name of output file. This is governed by manager class (read more here) modify OutputFile: "Test.root"
in Inputs/FitterConfig.yaml
to for example
OutputFile: "Test_Modified.root"
.
Now let's run MCMC again
./bin/MCMCTutorial Inputs/FitterConfig.yaml
Congratulations! 🎉 Next step is to compare both chains.
Now that you have two chains you can try comparing them using following.
./bin/ProcessMCMC bin/TutorialDiagConfig.yaml Test.root Default_Chain Test_Modified.root Modified_Chain
This will produce pdf file with overlayed posteriors. Most should be similarly except modified parameter.
Sometimes you may want to fix parameter, for example if it causing problem to the fitter or you want to run fit with and without parameters to compare results. To fix parameter just pass name in the Inputs/FitterConfig.yaml
General:
Systematics:
XsecFix: [ "Norm_Param_0" ]
First we gonna investigate how to modify sample, let's take a look at Inputs/SamplePDF_Tutorial.yaml
. Each sample has set of cuts right now we only introduce cut on TrueNeutrinoEnergy
.
SelectionCuts:
- KinematicStr: "TrueNeutrinoEnergy"
Bounds: [ 0., 4 ]
We can introduce additional cut for example on Q2 by expanding config like this:
SelectionCuts:
- KinematicStr: "TrueNeutrinoEnergy"
Bounds: [ 0., 4 ]
- KinematicStr: "TrueQ2"
Bounds: [ 0.6, 1 ]
You can also easily switch variable in which you bin sample.
Binning:
XVarStr : "TrueNeutrinoEnergy"
XVarBins: [0., 0.5, 1., 1.25, 1.5, 1.75, 2., 2.25, 2.5, 2.75, 3., 3.25, 3.5, 3.75, 4., 5., 6., 10.]
You can try again to run MCMC and compare all 3 chains
./bin/ProcessMCMC bin/TutorialDiagConfig.yaml Test.root Default_Chain Test_Modified.root Modified_Chain Test_Modified_Sample.root ModifiedSameple_Chain
Up to this point we only modified sample but how to add new one? First make copy of sample config Inputs/SamplePDF_Tutorial.yaml
and call it Inputs/SamplePDF_User.yaml
. For the moment feel free to change name, binning etc but keep inputs the same. Go wild! Next go to Inputs/FitterConfig.yaml
General:
TutorialSamples: ["Inputs/SamplePDF_Tutorial.yaml"]
To add you newly implemented sample you will have to expand config to for example:
General:
TutorialSamples: ["Inputs/SamplePDF_Tutorial.yaml", "Inputs/SamplePDF_User.yaml"]
MaCh3 has access to many oscillation engines via NuOscillator framework. First you can check features using following command
bin/mach3-config --features
MULTITHREAD MR2T2 PSO Minuit2 Prob3ppLinear NuFast
This way you can easily access information about MaCh3 features, fitter engines and most importantly oscillation engines.
By default we use NuFast, however to change to for example Prob3++ one have to modify sample config Inputs/SamplePDF_Tutorial.yaml
:
NuOsc:
NuOscConfigFile: "Inputs/NuOscillator/Prob3ppLinear.yaml"
In most cases this is enough. However you have to be aware that some engines require different number of parameters. In this example NuFast requires one additional parameter compared with Prob3ppLinear which is Ye. You will have to remove Ye from Inputs/Osc_Test.yaml
Another useful setting is whether you want binned or unbinned oscillations, if you want to do this simple got to Inputs/NuOscillator/Prob3ppLinear.yaml
and modify following path to Unbinned.
General:
CalculationType: "Binned"
Not everything can be done by modifying config in sample implementation. Actual implementation is insamplePDF/samplePDFTutorial
this class inherits from samplePDFFDBase
. The latter class deals with actual reweighting and all heavy lifting. while samplePDFTutorial deals with MC loading etc. This is because each experiment has slightly different MC format and different information available.
Crucial part of MCMC is diagnostic whether chain converged or not. You can produce diagnostic by running.
./bin/DiagMCMC Test.root bin/TutorialDiagConfig.yaml
This will produce plethora of diagnostic however one most often checked are autocreation's which indicate how correlated are MCMC steps which are n-steps apart. We want autocreation's to drop fast. You can read about other diagnostic here on here
Best way to reduce auto-corelations is by step size tunning. There are two step-scale available.
Global which affect identically every parameter and it is proportional to all parameters can be found in Inputs/FitterConfig.yaml
:
General:
Systematics:
XsecStepScale: 0.05
or individual step scale affecting single parameters which highly depend on parameters boundary sensitivity etc can be found in Inputs/SystematicModel.yaml
.
- Systematic:
Names:
FancyName: Norm_Param_0
StepScale:
MCMC: 0.2
We recommend chasing both scales running MCMC again and later producing auto-corelations. Understanding how auto-corelations change while playing with step-size is very useful skill.
At this point, you should be aware that to have a smooth posterior distribution, you may need a lot of steps, which can be time-consuming. A great property of MCMC is that once a chain reaches the stationary distribution (or, in other words, converges), it samples the same distribution. This means we can run several chains in parallel. Computing clusters give us the ability to run thousands of MCMC chains in parallel, allowing us to accumulate steps very fast.
The only caveat of this method is that chains have to converge to the same stationary distribution (there can only be one stationary distribution but chains can stuck in local minima or not converge due to wrong step-size tunning). To validate if chains converged we can use RHat.
In the following example 1000 indicate number of toys we want to sample while other arguments indicate different chains. You can pass as many chains as you want
./bin/RHat 1000 MCMC_Test_1.root MCMC_Test_2.root MCMC_Test_3.root MCMC_Test_4.root
RHat is estimator of variance between chains, in other words it should be peaking at 1. Single entry in histogram refers to single parameters. If your distribution has a tail reaching beyond 1.1 (according to Gellman) then this maybe indicate some chains haven't converged to the same distribution. Which MUST be investigated in analysis (in this tutorial main culprit will be number of steps)
Once you validated that chains converged you may need to merge them
./bin/CombineMaCh3Chains -o MergedChain.root MCMC_Test_1.root MCMC_Test_2.root MCMC_Test_3.root MCMC_Test_4.root
This works very similarly to hadd although has some advantages, main one being it checks if chains were run with the same settings. If for example one chains was run with different systematic parameters then this should be caught and raised.
There are plenty of useful settings in Fitting Algorithm: Most likely you run MCMC but what if you want to use algorithm like Minuit2?
General:
FittingAlgorithm: "MCMC"
In Inputs/FitterConfig.yaml
you should switch following setting to "Minuit2"
LLH Type: By default we use Barlow-Beeston LLH, however several are implemented. For example by changing config you can use Poisson or maybe IceCube.
LikelihoodOptions:
TestStatistic: "Barlow-Beeston"
Read more here
There are a number of apps included to make plots from the results of your fits, llh scans etc. You can find more details on them and how they work in the main MaCh3 wiki here. There you will also find some instructions on how you can write yor own plotting scripts.
The plotting library is configured using yaml files. You can see some examples of such config files in the plotting directory, and a detailed explanation of them is given in the wiki.
Some examples on how to make some "standard" plots are given below.
You can run MCMC in very similar way as MCMC
./bin/LLHScanTutorial Inputs/FitterConfig.yaml
You can plot the results of an LLH scan using the aptly named PlotLLH app like so
PlotLLH LLH_Test.root
where LLH_Test.root is the result of running the LLH scan as described here.
It is possible to compare several files simply by:
PlotLLH LLH_Test.root LLH_Test_2.root
If you have installed the python interface for MaCh3 as described here then you can also use the provided python plotting module. The details on how to write custom python scripts using this plotting module are detailed in the wiki. Here we will walk you through some example scripts.
For the examples here, we will use matplotlib and numpy. These can be installed using the provided requirements.txt by doing
pip install -r requirements.txt
but note that these are just what is used for an example for the purpose of this tutorial. When making your own plots you are totally free to use whatever plotting libraries you like!
You can use the example script MCMCPlotting-tutorial.py to plot the raw MCMC chain values that were obtained in the how to run MCMC section above by doing
python plotting/MCMC-plotting-tutorial.py Test.root
This will give you some plots that look something like
After you have run this chain through the MCMC processor as described in the How To Plot? section, you can pass the processed file to 1DPosterior-tutorial.py like
python plotting/1DPosterior-tutorial.py Test_Process.root
which will plot the projected one dimensional posteriors which should look something like
Similarly you can use LLHScan-plotting-tutorial.py to plot the LLH scans you made in the How to run LLH scan section like
python plotting/LLHScan-plotting-tutorial.py LLH_Test.root
which will give you some plots that look something like
If you encountered any issues or find something confusing please contact us:
- Kamil Skwarczynski
Royal Holloway, University of London, UK