Analysis tutorial #4: Histograms

In ROOT, histograms are implemented using the classes TH1D (or TH1F), TH2D (or TH2F) and TH3D (TH3F). The number indicates how many variables are displayed. The D or F stands for the precision of the bin content. Usually, we use “D” histograms whose bin content uses double precision. However, in some cases it might be sufficient to use float precision (“F” histogram) esp. when we are short of memory (e.g. if you create 10000 histograms with very fine binning it is better to use the float precision).

Most commonly, one uses 1-variable histograms. It will also be the case in our example. For example, to create an empty 1D histogram for the Higgs mass variable, use the following constructor:

TH1D* hist = new TH1D("ditau_m", ";#it{m}_{#tau#tau} [GeV];Events", 30, 0, 300);
hist->Sumw2();
  • The first parameter is the histogram name. When we save the class into the ROOT file, this name is used. So it should ideally be descriptive and unique.
  • The second parameter contains histogram and axes titles. The titles are separated by a semicolon  “;” like this: “histogram title;x-axis title;y-axis title”. In our example, we decided not to give the histogram any title, so the first part is empty. For the x-axis title we use latex-like syntax which is supported by ROOT.  The variable “m” will be typeset in italics with the “ττ” subscript.
  • The last 3 parameters are: number of bins, x-axis minimum, y-axis maximum. So in our case we will have 30 bins with bin size of 10 GeV.
  • In the second line, we tell the histogram class to also store errors. This we always want to do. Using histograms without errors usually leads to incorrectly interpreted statistical properties of the aggregated data so you will always want to call “Sumw2” on your histograms when you create them.

To add one value into the histogram, call it’s “Fill” method:

hist->Fill(125., 1);

In our example, this will add 1 into the 13th bin. The second (optional) parameter is the event weight. By default, value of 1 is used. It means each call of the Fill method adds +1 into one of the bins. Because sometimes we work with weighted data (e.g. in case of the MC simulation as will be discussed later), we can also use non-integer values of the second parameter.

One can also set the whole content of the histogram bin “by hand”. This is done using the following methods:

hist->SetBinContent(10, 12);
hist->SetBinError(10, 0);
  • The first parameter is the bin index. The bins are indexed from 1 unlike lists and vectors in python and c++!
  • The second parameter is the value/error that will be stored in the bin.

Now that we know the basics of the histogram class, let’s implement them into our program. We want to modify the event loop so that instead of printing the mass on the screen it stores it in the histogram which is then saved in the output root file. The most obvious way to implement this would be to add the histogram as a member attribute of the EventLoop class and then call its Fill method from the EventLoop’s execute method. This would certainly work, however, we would loose flexibility of the EventLoop implementation. So let’s implement the histogram-filling code into a separate class, instead of hard-coding it into the EventLoop.

We will need a new header and source files. We will call them Algorithm.h and Algorithm.cpp. Algorithm class will need the following features:

  • Access to the data. So it must have an attribute of “Data*” type which will point to the instance of the Data class created by the EventLoop. The pointer must be passed from the EventLoop to the Algorithm in the initialisation stage before the event loop starts.
  • It must hold instances of the histogram classes
  • It must  have an “execute” method which when called fills the histograms using variables from the Data instance.

Firstly, let’s make the class declaration in the “Algorithm.h”:

#ifndef ALGORITHM_H
#define ALGORITHM_H

#include "TString.h"
#include "TH1D.h"
#include "Data.h"

class Algorithm {
 public: 
  /**
  * @brief Construct a new Event Loop object
  */
  Algorithm();

  /**
  * @brief Initialize the algorithm
  * @param data - pointer to the data-access class instance
  */
  void initialize(Data* data);

  /**
  * @brief Execute. Here the stuff happens
  */
  void execute();

  /**
  * @brief Pointer to the histogram class instance defined as public attribute
  */
  TH1D* h_ditau_m = 0; // must be initialized to 0

 protected:

  /**
  * @brief Instance of the data-access class
  */
  Data* m_data = 0;

};

#endif

And super-simple implementation of the class’s body in “Algorithm.cpp”:

#include "Algorithm.h"

Algorithm::Algorithm() {
}

void Algorithm::initialize(Data* data) {
 m_data = data;
}

void Algorithm::execute() {
 // here we fill the histograms. We protect the code against the empty pointers.
 if(h_ditau_m) h_ditau_m->Fill( m_data->ditau_mmc_mlm_m );
}
  • Note that nowhere in this code we create the “TH1D” class instances. Also, we did not include any code that saves the filled histograms into the output root file. This is because it is much easier to do these things in Python, so it will come a bit later.

When you compile the new classes, do not forget to add them into the Makefile and Linkdef.h files!

The last bit in the c++ code is to modify the EventLoop class so that it initialises and executes the algorithm. Open EventLoop.h file and modify:

#ifndef EVENTLOOP_H
#define EVENTLOOP_H

#include <vector>
#include "TString.h"
#include "TChain.h"
#include "Data.h"
#include "Algorithm.h"

class EventLoop {
 public: 
  /**
  * @brief Construct a new Event Loop object
  */
  EventLoop();

  /**
  * @brief Initialize the event loop
  */
  void initialize();

  /**
  * @brief Execute the event loop
  */
  void execute();

  /**
  * @brief list of input ROOT file names
  */
  std::vector<TString> inputFiles;

  /**
  * @brief Name of the TTree instance. Must be same in all files
  */
  TString treeName;

  /**
  * @brief List of algorithms to be executed in the event loop
  */
  std::vector<Algorithm*> algorithms;

 protected:

  /**
  * @brief Instance of the TChain class used to read the data 
  */
  TChain* m_chain = 0; // pointer is initialized to zero

  /**
  * @brief Instance of the data-access class
  */
  Data* m_data = 0;

};

#endif
  • Here, we have just added include for the Algorithm class and a public attribute “std::vector<Algorithm*> algorithms”.

In the source file we just make sure that “initialize” and “execute” methods of the algorithms class are executed:

#include "EventLoop.h"
#include <iostream>
#include <stdexcept>

EventLoop::EventLoop() {
 // nothing to do here
}

void EventLoop::initialize() {
 // create an instance of the TChain class
 m_chain = new TChain(treeName);

 // loop through the input files and add them to the chain
 for(auto inputFile : inputFiles) {
  m_chain->Add(inputFile);
  std::cout << "Added file: " << inputFile << std::endl;
 }

 // create an instance of the Data class. Here the variables
 // are linked with the tree using the SetBranchAddress method
 m_data = new Data(m_chain);

 // initialise the algorithms
 for(auto algorithm : algorithms) {
  algorithm->initialize(m_data);
 }
}

void EventLoop::execute() {
 // sanity check. m_chain must not be zero
 if(!m_chain) {
  throw std::runtime_error("Calling execute while the event loop was not initialized.");
 }

 // here we do the actual event loop
 for(int i=0; i<m_chain->GetEntries(); ++i) {
  // event number printout
  if(i%1000==0) {
   std::cout << "Event " << i << std::endl;
  }

  // read the data for i-th event
  m_chain->GetEntry(i);

  // execute all the algorithms attached to this event loop
  for(auto algorithm : algorithms) {
   algorithm->execute();
  } 
 }
}
  • Note that we have created algorithms attribute as a list (std::vector). This is because we foresee that in the future we might want to call multiple instances of the algorithm from a single event loop.
  • In the c++ code we do not create instances of the Algorithm class anywhere. This will be done in the python steering script.

Finally, we have to update the runMe.py.

# here we load the shared library created using the Makefile
from ROOT import gSystem
gSystem.Load("Analysis.so")

# now we can create instance of the class EventLoop
from ROOT import EventLoop
eventLoop = EventLoop()

# set the attributes. All public attributes are accessible

# 1. name of the tree. In this example, we use H->tautau analysis, where
# tree is called NOMINAL
eventLoop.treeName = "NOMINAL"

# 2. add the input files. We use MC simulation of the
# gluon-gluon fusion production of the Higgs boson.
eventLoop.inputFiles.push_back('../ggH.root')

# create algorithm(s)
from ROOT import Algorithm
alg = Algorithm()

# create the histograms
from ROOT import TH1D
alg.h_ditau_m = TH1D("ditau_m", ";#it{m}_{#tau#tau} [GeV];Events", 30, 0, 300)
alg.h_ditau_m.Sumw2()

# add the algorithm into the event loop
eventLoop.algorithms.push_back( alg )

# initialize and execute the event loop
eventLoop.initialize()
eventLoop.execute()

# Create a new ROOT file
from ROOT import TFile
f = TFile.Open("histograms.root", "recreate")
f.cd()

# save the histogram
alg.h_ditau_m.Write()

# close the file
f.Close()

As you can see, most changes happened here.

  • We create Algorithm class instance “alg”
  • We create instance of the TH1D class and pass it to the Algorithm’s attribute “alg.h_ditau_m”. This means that we set the pointer inside the Algorithm class and the histogram can now be filled when the execute method is called
  • Then we add the algorithm instance “alg” into the “eventLoop.algorithms” list.
  • Finally, we create a new ROOT file with TFile class and save the histogram into it. The second argument of the TFile.Open method is set to “recreate” which means the file is opened for writing and it will be overwritten if it already exists.

After running the program, check the content of the file “histograms.root” that was just created:

> python runMe.py
...

> root histograms.root
Attaching file histograms.root as _file0...
(TFile *) 0x563bd3382b20

> .ls

TFile** histograms.root 
 TFile* histograms.root 
 KEY: TH1D ditau_m;1 

> ditau_m->Draw()

Call of method “Draw” of the TH1D class displays the histogram on the screen:

Screenshot 2021-07-15 at 16.30.28

As an exercise, you can modify the code and more histograms. For instance, you can add histograms of the leading and sub-leading lepton transverse momenta. The lepton momenta are already implemented in the Data class and the variables are called tau_0_p4 and tau_1_p4. They are of non-trivial type TLorentzVector so you will have to check the class’s documentation to see how the transverse component can be accessed.

Now we have functioning program that does something useful. However, some of the pieces of the python code are still a bit cumbersome. In the next exercise we will show how to exploit python classes and inheritance to make more modular code

Next section →