Statistical methods 2021

Lecture notes

  • PDF with lecture notes: notes
  • The notes are still a work in progress. Please let us know if you spot errors and mistakes.
  • A script for the videos below is here. It is a bit improved version of the lecture notes, some topics are a bit more developed in the script than in the lecture notes.


Introduction to statistics

Parameter estimation

Hypothesis testing

Multivariate analysis



  • HW1 on the Introduction part: solve problems described here and send us a scanned copy (or photo) of your solution
  • HW2 on the Maximum Likelihood Method part: go through all 10 examples below and do the exercise described in MLE_homework.cpp
  • HW3 on the Hypothesis Testing: solve problems described here and send us a scanned copy (or photo) of your solution as well as your code
  • HW4 on the MVA: go through all 3 examples on MVA below and then:
    • Build a ROC curve for classifiers Fisher, BDT and MLP from the examples. Use the testing sample that’s used in the examples.
    • Calculate integral of each ROC curve and use it to compare the three methods.
  • HW5 on the Unfolding: find it at the bottom of this page


Examples: Maximum Likelihood Method

  • You will need to install the ROOT framework

1. Simple example illustrating the method of maximum likelihood

  • The gaussian-distributed data are generated using MC random number generator: TRandom2::Gaus
  • Likelihood function is defined in the “logL”.
  • Likelihood calculated for three cases: the real values of parameters, obviously wrong values, and values maximising the likelihood
  • NOTE: the class TF1 is used to find max likelihood in this example. This is just an illustration, do not use this method in real life. Practically usable methods of fitting are shown below.
  • Source code: MLE_example01.cpp
  • To execute:

    root -l MLE_example01.cpp+

2. Examples of various methods for estimating the fitted parameter uncertainty

  • Pseudo-data are drawn from the exponential pdf
  • Fit is performed again using the TF1 class. Do not use this method in the real life!
  • Uncertainty of the fitted parameter estimated using four methods: analytical calculation, RFC bound, graphic method and MC method.
  • Source code: MLE_example02.cpp
  • root -l MLE_example02.cpp+

3. Example of fitting with the two-parameter function using Minuit2 class. Illustration of the “accept/reject” MC method.

  • Pseudo-data are this time generated “by hand” using the accept/reject method
  • Likelihood function is implemented using c++ lambda-function and passed to the Minuit2 minimizer.
  • This is the first example usable in practical situations. However, it requires a lot of work since the fitted function must be implemented by hand together with its proper normalization factor.
  • The fit is repeated many times with different pseudo-data samples. The results are displayed as a 2D distribution. This illustrates relation between the parameter uncertainty estimated using the RFC bound technique and the uncertainty estimated using the pseudo-experiments.
  • Source code: MLE_example03.cpp
  • root -l MLE_example03.cpp+

4. Fitting in RooFit

  • The previous example rewritten using RooFit library
  • RooFit is a standard component of ROOT. It allows comfortable definition of the fitted PDF using the c++ classes. We recommend using RooFit for the ML fits in ROOT.
  • This example also illustrates relation between the individual parameter uncertainties and the likelihood contour.
  • This example uses pseudo-data from the previous example. Therefore, you need to run the previous script first!
  • source code: MLE_example04.cpp
  • root -l MLE_example04.cpp+

5. Pull distributions

  • Fit from the previous example is repeated many times using pseudo-data generated using RooFit (yes, RooFit can generate any pseudo-data using any PDF)
  • Pull distribution is created: pull = (theta – theta_i) / sigma_theta_i
    where theta is the estimated parameter, theta_i is parameter estimated in the i-th pseudo-experiment and sigma_theta_i is its uncertainty.
  • For unbiased estimate and correctly estimated uncertainty, pull distribution should have mean at 0 and standard deviation of 1. This is a standard way of checking bias and stability of the ML fit (remember that ML can be in principle biased. Also, because the minimization is done numerically, it can suffer from non-convergence, instabilities, etc.)
  • source code: MLE_example05.cpp

6. Relation between the likelihood contour and the parameter uncertainty

  • Again, the ML fit using data from the example 3. The likelihood contour is drawn around the estimated parameter values.
  • The fit is repeated 10000-times using pseudo-data drawn from the PDF with the true values of parameters. Results of these pseudo-experiments are filled in the 2D histogram. This histogram is visualised using a single contour (option TH2F::Draw(“cont1”)).
  • This contour is roughly elliptical and has roughly the same are as the ML contour drawn previously. However, it is centered around the true value rather than around the estimate.
  • Both contour are overlapping but are shifted with respect to each other. The ML contour can be interpreted as an uncertainty region covering the true value of the parameter with a certain probability (c.f. Neyman method).
  • NOTE: the code runs for a long time. You can .speed it up by reducing the number of pseudo-experiments, but then the pseudo-result contour will not be very nicely rendered.
  • MLE_example06.cpp

7. Binned maximum likelihood method

  • Comparison of different methods of fitting the histogram: least-square fit, binned ML fit and unbinned ML fit.
  • MLE_example07.cpp

8. Defining custom PDFs in RooFit

  • Examples of construction a custom PDF using RooFit classes.
  • MLE_example08.cpp

9. Extended ML fit

  • Comparison of the extended ML fit and ML fit for a two-component PDF.
  • Number of signal and background events extracted from the classic ML fit is calculated and compared to the results of the extended ML fit.
  • While the numbers themselves agree, uncertainties estimated in the classic ML fit do not reflect uncertainty of the total number of events, which one has to take into account when e.g. measuring cross-section.
  • MLE_example09.cpp

10. Constrained ML fit

  • Example of definition of the constrained ML fit.
  • MLE_example10.cpp

11. Template fit

  • See the next section on hypothesis testing (HistFitter tool).


Examples: MVA

1. Comparison of methods “Orthogonal cuts”, “Fisher discriminant”, “BDT” and “ANN” with the use of the TMVA package

  • TMVA Libraries offer a unified interface to the usage of several MVA methods
  • The example is based on a simulation of Lambda hyperon decays to a proton and a pion
  • The following discriminating variables are used:
    • Chi2_min of the reconstruction of a secondary decay vertex (a measure of the vertex reconstruction quality)
    • Transverse momentum
    • Transverse decay length significance – distance of the vertex from the primary vertex in the transverse plane divided by an uncertainty of this quantity reconstruction
    • Transverse impact parameter significance – transverse distance of the Lambda hyperon trajectory from the primary vertex divided by an uncertainty on this quantity reconstruction
  • The MC sample is split into two halves: training and testing samples
  • Four selection methods are trained with the use of the training sample:
    • Cuts: application of 4 selection criteria on the discriminating variables
    • Fisher: Fisher Discriminant
    • BDT: Boosted Decision Tree
    • MLP: Artificial Neural Network Multilayer Perceptron
  • Each method has some parameters, so-called metaparameters, that need to be set by the user. In the example, they are set to some “reasonable” default
  • The training results are saved to an output ROOT file. Try to open the file in the ROOT’s TBrowser and look at the saved histograms
  • Optimized values of all parameters of each method is saved in and XML file in the “weights/” folder. You will use it for your work
  • TMVA compares the different methods with the use of the ROC integral. Note the section “Evaluation results ranked by best signal efficiency and purity (area)” in the TMVA printout on the screen
  • TMVA also performs an overtraining test. Signal efficiencies are compared for the training and the testing samples for three values of the background rejection. A much larger significance for the training sample is a clear sign of the classifier overtraining.
  • Code: MVA_example01.cpp
  • How to run it: root -l MVA_example01.cpp+

2. MVA classifiers and the testing sample

  • We will read the optimized weights for three methods from the previous step (Fisher, BDT, MLP) and we will draw the test statistic distributions estimated with the testing sample.
  • A separation of signal and background is clear in the displayed historams
  • Code: MVA_example02.cpp

3. MVA application to the data

  • We will apply the classifiers from the step 1 to the data
  • Cut values are set “by eye” such that all methods had similar background rejections
  • The histograms “Mass of selected events” displays the Lambda invariant mass spectrum selected by 4 methods
  • The amount of signal is different for the different methods. The best performance is achieved by the MLP and BDT. Cuts have the lowest signal efficiency.
  • The panel MVA selection regions displays a region selected by each method
  • Code: MVA_example03.cpp

Problem: Unfolding

  • You will need ROOT for your work with the RooUnfold package.
  • Download RooUnfold and compile it according to the instructions in the section Building the Library on the gitlab. You can also just Install the Library with pip.
  • Run
    • python examples/
  • Read the example
    • It uses Gaus(x; 0, 2) as a truth spectrum.
    • Detector effects are simulated with a function smear that simulates a finite detector efficiency and resolution as well as a shift of the measured values wrt. the truth ones.
    • The example shows how to create the response matrix
      • It uses BreitWigner(x; 0.3, 2.5) as a truth spectrum used to build the matrix
    • Finally, the measured spectrum is unfolded with a method of the user’s choice
      • The default is RooUnfoldBayes which is the D’Agostini’s method

Homework on Unfolding

  • Take the example code examples/ and compare results (unfolded spectra) of three different methods:
    • RooUnfoldBayes (D’Agostini’s method)
    • RooUnfoldSvd (SVD)
    • RooUnfoldIds (IDS – Iterative Dynamically stabilized method of data unfolding)
    • Your output should be a plot showing the three spectra
  • Use pseudoexperiments to confirm that the D’Agostini’s method estimates statistical uncertainties on the unfolded spectrum correctly:
    • Again, take examples/
    • Print out statistical uncertainties of the unfolded spectrum estimated with the D’Agostini’s method. Something like:
      • for i in range(1, hReco.GetNbinsX()+1): print hReco.GetBinError(i)
    • Estimate the statistical uncertainties with the use of pseudoexperiments:
      • In each pseudoexperiment:
        • Prepare the “measured” spectrum such that event counts in the individual bins are random numbers drawn from a Poisson distribution: gRandom.Poisson(hMeas.GetBinContent(i))
        • Unfold the “measured” spectrum
      • Estimate the covariance matrix for the unfolded spectrum bins:
        • cov[bin i, bin j] = 1/nToy ∑_k (unf_k.GetBinContent(i) – mean[unf_bin_i])*(unf_k.GetBinContent(j) – mean[unf_bin_j])
        • unf_k is the unfolded spectrum from the k-th pseudoexperiment, mean[unf_bin_i] is the mean value of a bin content (averaged over pseudoexperiments)
        • nToy is the number of pseudoexperiments. Choose 100
        • Compare the square root of cov[bin i, bin i] with the uncertainties estimated by the D’Agostini’s method