A typical dataset you’ll use in your analysis can be viewed as a table of numbers. Each row corresponds to one recorded collision event and each column to a measured variable: the transverse momentum of a particle, its energy, pseudorapidity, etc. In practice, the variables can sometimes have more complicated types (e.g. a 4-momentum can be stored as a single variable of type TLorentzVector).
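To make the table picture concrete, here is a minimal sketch in plain C++ (no ROOT involved). The struct “Event”, its fields “pt”, “energy” and “eta”, and the helper “meanPt” are hypothetical names chosen only for illustration; real n-tuples define their own columns.

```cpp
#include <vector>

// One row of the table: a single recorded collision event.
// The field names are illustrative only, not taken from any real n-tuple.
struct Event {
    double pt;      // transverse momentum of a particle
    double energy;  // energy of the particle
    double eta;     // pseudorapidity of the particle
};

// The dataset is a sequence of rows; each struct member plays the role of a column.
using Dataset = std::vector<Event>;

// Example of processing one column: the average transverse momentum.
double meanPt(const Dataset& data) {
    if (data.empty()) return 0.0;
    double sum = 0.0;
    for (const Event& e : data) sum += e.pt;
    return sum / data.size();
}
```

In ROOT the same idea is expressed with a TTree rather than a std::vector, but the row/column picture is the same.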
In ROOT, datasets are stored using the TTree class. TTree provides access to the data and handles I/O operations, data compression, etc. Instances of TTree are stored in “.root” files, which we usually call “n-tuples”.
To analyse the dataset, one typically has to go through all the events and process the variables of interest. This is called the event loop. Let’s now write a simple event loop using ROOT’s classes. First, we need a skeleton class; let’s call it EventLoop. Here is the header file “EventLoop.h”:
    #ifndef EVENTLOOP_H
    #define EVENTLOOP_H

    class EventLoop {
    public:
        /**
         * @brief Construct a new Event Loop object
         */
        EventLoop();

        /**
         * @brief Initialize the event loop
         */
        void initialize();

        /**
         * @brief Execute the event loop
         */
        void execute();
    };

    #endif
This time, we have two methods (apart from the constructor): “initialize”, where we open the data file and load the TTree instance stored in it, and “execute”, where we run the actual event loop.
Now let’s add some data members to the class. In C++, we can have public attributes (accessible from the outside) and private/protected members, which can only be accessed from within the class’s own methods (protected members are also accessible from derived classes). Let’s summarise which attributes are needed:
- Name of the data file. This should be a public attribute so that it can be set after we create an instance of EventLoop. Also, because one dataset is very often split into multiple files, it’s better to declare it as a vector (array) of strings.
- Instance of the TTree class that we use to read the data. Because we allow multiple input files, we actually have to use the “TChain” class, which extends TTree and allows access to multiple files. This attribute can be made protected, because the TChain instance is handled internally and there is no need to access it from the outside.
- Finally, to pull out the TTree object from the file one needs to know its name (sometimes called the key). Because we do not want to hard-code too many things, we make another public attribute bearing the TTree name.
So, let’s update our header file:
    #ifndef EVENTLOOP_H
    #define EVENTLOOP_H

    #include <vector>
    #include <TString.h>
    #include <TChain.h>

    class EventLoop {
    public:
        /**
         * @brief Construct a new Event Loop object
         */
        EventLoop();

        /**
         * @brief Initialize the event loop
         */
        void initialize();

        /**
         * @brief Execute the event loop
         */
        void execute();

        /**
         * @brief list of input ROOT file names
         */
        std::vector<TString> inputFiles;

        /**
         * @brief Name of the TTree instance. Must be same in all files
         */
        TString treeName;

    protected:
        /**
         * @brief Instance of the TChain class used to read the data
         */
        TChain* m_chain = 0;
    };

    #endif
- Note that at the beginning of the file we include the headers for all the classes we need: TString for string (text) variables and vector for lists. The “m_chain” pointer is initialised to 0 (a null pointer).
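The in-class initialisation of the pointer can be illustrated with a small standalone sketch (plain C++11, no ROOT). The class “Reader” and its member “m_data” are hypothetical stand-ins; “nullptr” is the C++11 spelling of a null pointer and has the same effect here as the “0” used in the header.

```cpp
// Stand-in for a class with a pointer member; hypothetical, not part of ROOT.
class Reader {
public:
    // Empty constructor: the default member initializer below still applies.
    Reader() {}

    // True once some data object has been attached.
    bool isInitialized() const { return m_data != nullptr; }

    // Attach a data object (the caller keeps ownership in this sketch).
    void attach(int* data) { m_data = data; }

protected:
    // Default member initializer (C++11): every constructor starts from a null pointer.
    int* m_data = nullptr;
};
```

Because the pointer has a well-defined initial value, a method can later check whether initialisation has happened before using it.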
Now let’s implement the body in “EventLoop.cpp”:
    #include "EventLoop.h"
    #include <iostream>
    #include <stdexcept>

    EventLoop::EventLoop() {
        // nothing to do here
    }

    void EventLoop::initialize() {
        // create an instance of the TChain class
        m_chain = new TChain(treeName);

        // loop through the input files and add them to the chain
        for(auto inputFile : inputFiles) {
            m_chain->Add(inputFile);
            std::cout << "Added file: " << inputFile << std::endl;
        }
    }

    void EventLoop::execute() {
        // sanity check. m_chain must not be zero
        if(!m_chain) {
            throw std::runtime_error("Calling execute while the event loop was not initialized.");
        }

        // here we do the actual event loop
        for(int i=0; i<m_chain->GetEntries(); ++i) {
            // event number printout
            if(i%1000==0) {
                std::cout << "Event " << i << std::endl;
            }

            // read the data for i-th event
            m_chain->GetEntry(i);
        }
    }
- At the beginning we have the usual round of includes for the classes used later. The constructor stays empty because the only variable that needs initialising (m_chain) is already initialised in the header file.
- In the “initialize” method we create a new instance of the TChain class. The constructor takes one parameter, the name of the tree as stored in the data file. The for-loop over the input files follows. In this example I use the C++11 range-based for-loop, which simply means that the body of the loop is executed once for each item in the “inputFiles” vector.
- In the “execute” method, the loop over all the events is performed. “m_chain->GetEntries()” returns the number of events in all the files added to the TChain. “m_chain->GetEntry(i)” reads the i-th entry.
- Note that nothing is hard-coded. The names of the files and trees can be set via public attributes.
- Note that the code doesn’t do anything useful. It just loops through the events, but we do not attempt to access the variables. That will come in the next example.
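The range-based for-loop used in “initialize” is shorthand for a classic index loop. The following sketch shows both forms producing the same result. It is plain C++ with std::string standing in for TString, and the function names “joinRangeFor” and “joinIndexed” are hypothetical. Writing “const auto&” instead of “auto” avoids copying each element; this is a common idiom, not something the code above requires.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Join file names using a C++11 range-based for loop.
std::string joinRangeFor(const std::vector<std::string>& files) {
    std::string out;
    for (const auto& file : files) {  // 'file' refers to each element in turn
        out += file + ";";
    }
    return out;
}

// The same logic written as a classic index-based loop.
std::string joinIndexed(const std::vector<std::string>& files) {
    std::string out;
    for (std::size_t i = 0; i < files.size(); ++i) {
        out += files[i] + ";";
    }
    return out;
}
```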
Now we have to compile the code. We use modified versions of the “Linkdef.h” and “Makefile” files we made for the previous example.
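The files themselves are not shown here. As a rough sketch only, a “Linkdef.h” for this class might look like the following. It assumes ROOT 6, where the dictionary generator (rootcling) defines “__CLING__”; the exact contents depend on the previous example and on your ROOT version, so treat every line as an assumption rather than the tutorial’s actual file.

```cpp
// Linkdef.h (sketch, assumed contents): tells ROOT's dictionary generator
// which classes to expose to the interpreter and to Python.
#ifdef __CLING__
#pragma link off all globals;
#pragma link off all classes;
#pragma link off all functions;

// Request a dictionary for our event loop class.
#pragma link C++ class EventLoop+;
#endif
```

The “#ifdef” guard means an ordinary C++ compiler skips the pragmas, so they only take effect when the dictionary is generated.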
Note that in this “Makefile” we have named the program “Analysis”, so the shared library is called “Analysis.so”. Recompile everything using the standard “make” command:

    > make clean
    > make
Finally, we have to prepare a steering script where we set the file paths and the tree name. In this example, some arbitrary MC simulation files are used. We downloaded them at the beginning of the tutorial.
So let’s now create the script called “runMe.py”:
    # here we load the shared library created using the Makefile
    from ROOT import gSystem
    gSystem.Load("Analysis.so")

    # now we can create instance of the class EventLoop
    from ROOT import EventLoop
    eventLoop = EventLoop()

    # set the attributes. All public attributes are accessible
    # 1. name of the tree. In this example the tree is called NOMINAL
    eventLoop.treeName = "NOMINAL"
    # 2. add the input files. We use MC simulation of the
    # gluon-gluon fusion production of the Higgs boson.
    eventLoop.inputFiles.push_back('../ggH.root')

    # initialize and execute the event loop
    eventLoop.initialize()
    eventLoop.execute()
When you execute the script in Python you’ll see it loop through all the events contained in the specified file.
    > python runMe.py
    Added file: ../ggH.root
    Event 0
    Event 1000
    Event 2000
    Event 3000
    ... and so on
Because things are written generically, the same class could be used as an event loop for any kind of ROOT file. However, since it does not access the data it is not very useful as it stands. We will learn to read data in the next example.