In the previous example we have created an event loop class that can open data files and run through all the events. Now let’s add the bit where we actually access the data and read some variables.
First, let’s learn how to see which objects are stored in a ROOT file and which variables are stored in a tree. We will use ROOT’s command line for this. You can load a ROOT file into the ROOT environment by running root with the file path given as a command-line argument. For example, using our test files from the Higgs analysis:
> root ../ggH.root
Attaching file ../ggH.root...
(TFile *) 0x56497ffd4d50
ROOT informs you that it has opened the file and that a pointer to the TFile instance (which represents the ROOT file in memory) is available in the variable called “_file0”. We can list the file content using the TFile method “ls”. In our example, you’ll see something like this:
> _file0->ls()
TFile** ../ggH.root
 TFile* ../ggH.root
  KEY: TTree NOMINAL;1 NOMINAL
  KEY: TH1D cutflow_muon_NOMINAL;1 cutflow_muon_NOMINAL
  KEY: TH1D cutflow_ele_NOMINAL;1 cutflow_ele_NOMINAL
  KEY: TH1D cutflow_pho_NOMINAL;1 cutflow_pho_NOMINAL
  KEY: TH1D cutflow_tau_NOMINAL;1 cutflow_tau_NOMINAL
  KEY: TH1D cutflow_mc_hs_jet_NOMINAL;1 cutflow_mc_hs_jet_NOMINAL
  KEY: TH1D cutflow_mc_pileup_jet_NOMINAL;1 cutflow_mc_pileup_jet_NOMINAL
  KEY: TH1D cutflow_HSM_common;1 Number of accepted events
  KEY: TH1D h_metadata;1
  KEY: TH1D h_metadata_theory_weights;1
You see the list of objects stored in the file together with their types. Note that there is an object of type “TTree” called “NOMINAL”, which we loaded in the previous example using the “TChain” class. Apart from it, there are other objects stored in the file (of type “TH1D”); we will come back to them later.
NOTE: the same printout can be obtained with the ROOT shortcut command (the dot is part of the command!):
> .ls
Let’s have a look at the TTree NOMINAL. There are many ways you can browse its content and we will try some of them.
Using TBrowser class
ROOT has a graphical user interface provided by the TBrowser class. To launch it, you just need to create an instance of this class from ROOT’s command line:
> TBrowser b
You will get a window like this:
In the left panel you see the file we have opened. You can expand the list to see the objects stored in the file and then you can expand the TTree object NOMINAL to see the variables stored in the tree:
You can even make simple plots using TBrowser. Try to double-click on a variable (e.g. “ditau_mmc_mlm_m”) and you will get a histogram of the reconstructed Higgs mass:
While TBrowser is useful for quickly checking the file content, it is impractical if you need to copy the names of many variables or find out their types.
TTree::Show or TTree::Print methods
Calling the “Show” method from the ROOT command line gives you a list of all variables in the tree, but without their types. The “Print” method gives you more information, including the type of each variable.
> NOMINAL->Show()
HLT_2e17_lhvloose_nod0_L12EM15VHI = 0
HTXS_Njets_pTjet25 = 0
HTXS_Njets_pTjet30 = 0
HTXS_Stage0_Category = 0
HTXS_Stage1_1_Category_pTjet25GeV = 0
HTXS_Stage1_1_Category_pTjet30GeV = 0
HTXS_Stage1_1_Fine_Category_pTjet25GeV = 0
HTXS_Stage1_1_Fine_Category_pTjet30GeV = 0
HTXS_Stage1_Category_pTjet25GeV = 0
HTXS_Stage1_Category_pTjet30GeV = 0
HTXS_errorMode = 0
HTXS_prodMode = 0
NOMINAL_pileup_combined_weight = 0
NOMINAL_pileup_random_lb_number = 0
NOMINAL_pileup_random_run_number = 0
PRW_DATASF_1down_pileup_combined_weight = 0
PRW_DATASF_1up_pileup_combined_weight = 0
boson_0_truth_p4 = NULL
boson_0_truth_pdgId = 0
boson_0_truth_q = 0
boson_0_truth_status = 0
...
> NOMINAL->Print()
******************************************************************************
*Tree :NOMINAL : NOMINAL *
*Entries : 19785 : Total = 87411639 bytes File Size = 40373126 *
* : : Tree compression factor = 2.16 *
******************************************************************************
*Br 0 :HLT_2e17_lhvloose_nod0_L12EM15VHI : *
* | HLT_2e17_lhvloose_nod0_L12EM15VHI/i *
*Entries : 19785 : Total Size= 91611 bytes File Size = 12741 *
*Baskets : 98 : Basket Size= 1024 bytes Compression= 7.00 *
*............................................................................*
*Br 1 :HTXS_Njets_pTjet25 : HTXS_Njets_pTjet25/I *
*Entries : 19785 : Total Size= 90081 bytes File Size = 24508 *
*Baskets : 98 : Basket Size= 1024 bytes Compression= 3.58 *
*............................................................................*
*Br 2 :HTXS_Njets_pTjet30 : HTXS_Njets_pTjet30/I *
*Entries : 19785 : Total Size= 90081 bytes File Size = 23839 *
*Baskets : 98 : Basket Size= 1024 bytes Compression= 3.68 *
*............................................................................*
...
Using MakeClass method
In my experience, the easiest way to access the list of variables inside the tree is using the MakeClass method of the TTree class. Let’s give it a try:
> NOMINAL->MakeClass("tree")
Info in <TTreePlayer::MakeClass>: Files: tree.h and tree.C generated from TTree: NOMINAL
(int) 0
The command has created two files, tree.h and tree.C. We do not care about the .C file but let’s have a look at the header file: tree.h
At the beginning of the file you will find a list of all variables together with their types, in a form that is ready to be copied into your own code.
Reading data from TTree
Now that we know how to get the list of variables stored in the tree we can go ahead and write our own code that will access the data. First, let’s make a skeleton class, e.g. called “Data”. The header file “Data.h”:
#ifndef DATA_H
#define DATA_H

#include "TTree.h"

class Data {
 public:
  /**
   * @brief Construct a new Data object
   *
   * @param tree - pointer to the TTree (or TChain) class
   */
  Data(TTree* tree);

 protected:
  /**
   * @brief pointer to the TTree (or TChain) class
   */
  TTree* m_tree = 0;
};

#endif
And the source “Data.cpp”:
#include "Data.h"

Data::Data(TTree* tree) : m_tree(tree)
{
}
A couple of remarks:
- In this code we use the TTree class, while in the event loop we used TChain. Because TChain inherits from TTree, a pointer to a TChain instance can always be assigned to a TTree pointer, but not the other way around:
TChain* chain = new TChain("NOMINAL");
TTree* tree = chain; // this works
TTree* tree2 = new TTree();
chain = tree2; // this doesn't compile
- This time, we have a constructor that takes one parameter: a pointer to the TTree (or TChain) instance. This pointer is stored in the class attribute m_tree. Apart from this, the constructor is empty (for now).
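The pointer-conversion rule in the first remark is ordinary C++ inheritance, so it can be tried without ROOT. The following is a minimal sketch, assuming two hypothetical stand-in classes, BaseTree and DerivedChain (they are not ROOT classes, and describe and isReallyAChain are illustrative helper names). It shows that a derived-class pointer is accepted wherever a base-class pointer is expected, while the opposite direction has to be requested explicitly and can fail at run time.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for TTree and TChain; they are NOT the ROOT classes.
class BaseTree {
public:
  virtual ~BaseTree() = default;
  virtual std::string kind() const { return "tree"; }
};

class DerivedChain : public BaseTree {
public:
  std::string kind() const override { return "chain"; }
};

// A function that expects the base class, in the same way as the
// Data constructor expects a TTree pointer.
std::string describe(BaseTree* tree) {
  assert(tree != nullptr);
  return tree->kind();
}

// Going from base to derived must be requested explicitly and checked:
// dynamic_cast returns nullptr if the object is not really a DerivedChain.
bool isReallyAChain(BaseTree* tree) {
  return dynamic_cast<DerivedChain*>(tree) != nullptr;
}
```

Calling describe with the address of a DerivedChain compiles without any cast, and the virtual call still reaches the derived class. A line such as `DerivedChain* c = new BaseTree();` is rejected by the compiler, which is the same kind of error as the chain = tree2 assignment above.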
To compile this new class, you have to add it to the “Makefile” and to the “Linkdef.h” file. You have now learned enough to do it yourself.
Now that we have the skeleton ready and checked that it compiles, we can add variables we want to read from the ROOT file. The variables are read from the TTrees like this:
// first we create C++ variables of the correct type
Float_t someFloatVar;
Int_t someIntVar;
TLorentzVector *someFourMomentumVar = 0;

// then we pass their addresses to the tree
tree->SetBranchAddress("someFloatVar", &someFloatVar);
tree->SetBranchAddress("someIntVar", &someIntVar);
tree->SetBranchAddress("someFourMomentumVar", &someFourMomentumVar);

// event loop
for(int i=0; i<tree->GetEntries(); ++i) {
  // here the values from the file get copied into memory
  tree->GetEntry(i);
  // now we can work with the variables, e.g. print them
  std::cout << someFloatVar << std::endl;
}
- It is important that the type of each variable in our C++ code is the same as the type stored in the file. This is where the “tree.h” file comes in handy: we can copy the variable declarations from there and be sure the types are correct.
- The SetBranchAddress method takes the name of the variable as stored in the tree as its first parameter, and a pointer to the C++ variable where we want the value stored as its second. The “&” operator takes the address of the variable, so a pointer (i.e. an address in memory) is passed, not the value itself.
- The name of the C++ variable and the name of the variable in the file do not have to be the same. However, it is good practice to keep them identical for consistency.
- For more complicated types like TLorentzVector it is better to define the C++ variable itself as a pointer (note the * in the someFourMomentumVar declaration). It is important to set the pointer to 0, otherwise you get a segmentation violation error when running the code!
- Data files usually contain more variables than are needed for your analysis. Do not read all of them, as it is not efficient. It is always better to access only the variables that are actually needed.
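To make the address-passing mechanism in the remarks above more concrete, here is a ROOT-free toy model. ToyTree and its methods fill, setBranchAddress and getEntry are hypothetical names invented for this illustration; the real TTree reads many types from disk, while this sketch only keeps float columns in memory. It shows the two key points: the tree remembers where the caller's variable lives, and each call to getEntry overwrites that variable with the value for the requested entry. Branches that were never connected are simply not copied.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy class, NOT ROOT's TTree: one float column per branch name.
class ToyTree {
public:
  // Store one more value (one more entry) for the given branch.
  void fill(const std::string& name, float value) {
    m_columns[name].push_back(value);
  }

  // Remember WHERE the caller wants values of this branch to be written.
  void setBranchAddress(const std::string& name, float* address) {
    assert(address != nullptr);
    m_addresses[name] = address;
  }

  // Copy the values of entry i into every connected variable.
  // Branches without a registered address are skipped entirely.
  void getEntry(std::size_t i) {
    for (auto& item : m_addresses) {
      *item.second = m_columns.at(item.first).at(i);
    }
  }

private:
  std::map<std::string, std::vector<float>> m_columns;
  std::map<std::string, float*> m_addresses;
};
```

In this toy the stored type and the caller's type are both float by construction. With a real tree the library writes raw values to the address it was given, which is why a mismatch between the declared C++ type and the type in the file leads to wrong values.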
Now that we know how things work, let’s implement some real variables in our “Data” class. The example ntuple contains pre-processed data from the H->tautau Monte Carlo simulation. It contains hundreds of variables. We will not try to explain their meaning (it is not important for the exercise) and we will of course not try to read all of them, but only a few:
- Reconstructed Higgs boson mass. It is stored in a variable called ditau_mmc_mlm_m (a complicated name)
- Type of the leading and sub-leading lepton: tau_0 and tau_1
- Momentum of the leading and sub-leading lepton: tau_0_p4 and tau_1_p4
Look up the declaration of these variables in the “tree.h” file and copy them into the public section of the Data class declaration:
#ifndef DATA_H
#define DATA_H

#include "TTree.h"
#include "TLorentzVector.h"

class Data {
 public:
  /**
   * @brief Construct a new Data object
   *
   * @param tree - pointer to the TTree (or TChain) class
   */
  Data(TTree* tree);

  /**
   * @brief Tree variables
   */
  Float_t ditau_mmc_mlm_m;
  UInt_t tau_0;
  UInt_t tau_1;
  TLorentzVector *tau_0_p4 = 0; //MUST SET POINTERS TO 0!
  TLorentzVector *tau_1_p4 = 0; //MUST SET POINTERS TO 0!

 protected:
  /**
   * @brief pointer to the TTree (or TChain) class
   */
  TTree* m_tree = 0;
};

#endif
- Note that we have added the #include “TLorentzVector.h” directive for the TLorentzVector class. Otherwise the code would not compile, because the compiler would not know what this type is. The simple types (float, int, uint, etc.) do not need include statements.
- We have declared the variables in the public section of the class so that we can access them from outside the class.
- Note that for the TLorentzVector variables the pointer must be initialised to 0. It is important, otherwise you get a segmentation error.
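The reason behind the last remark can be illustrated with a small ROOT-free sketch. FourVector and readInto are hypothetical names made up for this example; they only mimic the general idea of a reader that is handed the address of a pointer. The assumption modelled here is that a null pointer tells the reader "there is no object yet, create one", while any other value is trusted to point to a valid object. An uninitialised pointer holds a random address, so the reader would write into memory that does not belong to it.

```cpp
#include <cassert>

// Hypothetical stand-in for a class-type variable such as TLorentzVector.
struct FourVector {
  double px = 0, py = 0, pz = 0, e = 0;
};

// Toy reader, NOT ROOT code. It receives the address of the user's pointer.
// A null pointer is taken to mean "no object yet, please create one";
// any other value is trusted to point to a valid object and is written through.
void readInto(FourVector** address, const FourVector& stored) {
  assert(address != nullptr);
  if (*address == nullptr) {
    *address = new FourVector();  // the reader allocates on first use
  }
  **address = stored;  // with a garbage pointer this write would crash
}
```

In this toy the caller is responsible for deleting the object once it is no longer needed. As far as the ROOT documentation of TTree::SetBranchAddress describes it, the real tree also creates the object itself when it is given the address of a null pointer, and in that case the tree owns the object.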
We also need to add the “SetBranchAddress” calls. These go into the constructor in the “Data.cpp” file. Since the variables are defined as attributes of the Data class, they are accessible from within any of the class’s methods, including the constructor. So we can simply do:
#include "Data.h"

Data::Data(TTree* tree) : m_tree(tree)
{
  m_tree->SetBranchAddress("ditau_mmc_mlm_m", &ditau_mmc_mlm_m);
  m_tree->SetBranchAddress("tau_0", &tau_0);
  m_tree->SetBranchAddress("tau_1", &tau_1);
  m_tree->SetBranchAddress("tau_0_p4", &tau_0_p4);
  m_tree->SetBranchAddress("tau_1_p4", &tau_1_p4);
}
Now that we have our simple data access class, let’s integrate it into the event loop. We need to include “Data.h” in EventLoop.h and add a member attribute “Data* m_data” (EventLoop.h):
#ifndef EVENTLOOP_H
#define EVENTLOOP_H

#include <vector>
#include "TString.h"
#include "TChain.h"
#include "Data.h"

class Event​Loop {
 public:
  /**
   * @brief Construct a new Event Loop object
   */
  EventLoop();

  /**
   * @brief Initialize the event loop
   */
  void initialize();

  /**
   * @brief Execute the event loop
   */
  void execute();

  /**
   * @brief list of input ROOT file names
   */
  std::vector<TString> inputFiles;

  /**
   * @brief Name of the TTree instance. Must be same in all files
   */
  TString treeName;

 protected:
  /**
   * @brief Instance of the TChain class used to read the data
   */
  TChain* m_chain = 0; // pointer is initialized to zero

  /**
   * @brief Instance of the data-access class
   */
  Data* m_data = 0;
};

#endif
In the source file we need to create an instance of the Data class and access it in the event loop (EventLoop.cpp):
#include "EventLoop.h"
#include <iostream>
#include <stdexcept>

EventLoop::EventLoop() {
  // nothing to do here
}

void EventLoop::initialize() {
  // create an instance of the TChain class
  m_chain = new TChain(treeName);

  // loop through the input files and add them to the chain
  for(auto inputFile : inputFiles) {
    m_chain->Add(inputFile);
    std::cout << "Added file: " << inputFile << std::endl;
  }

  // create an instance of the Data class. Here the variables
  // are linked with the tree using the SetBranchAddress method
  m_data = new Data(m_chain);
}

void EventLoop::execute() {
  // sanity check. m_chain must not be zero
  if(!m_chain) {
    throw std::runtime_error("Calling execute while the event loop was not initialized.");
  }

  // here we do the actual event loop
  for(int i=0; i<m_chain->GetEntries(); ++i) {
    // event number printout
    if(i%1000==0) {
      std::cout << "Event " << i << std::endl;
    }

    // read the data for i-th event
    m_chain->GetEntry(i);

    // now we can work with the variables. Let's for example print Higgs mass
    // but only for every 1000th event
    if(i%1000==0) {
      std::cout << "m = " << m_data->ditau_mmc_mlm_m << std::endl;
    }
  }
}
- Note that we use class inheritance as mentioned above. When we create the instance of the Data class, we pass the pointer to the TChain instance into a constructor that expects a TTree pointer. Everything compiles and runs just fine because TChain inherits from TTree.
- In the last block we can access the attribute ditau_mmc_mlm_m because it was defined as public. This is not how things are usually done in C++ (it breaks so-called encapsulation), but it is the simplest implementation, so we will use it.
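For readers who wonder what the encapsulated alternative mentioned in the last remark might look like, here is a minimal ROOT-free sketch. EncapsulatedData, higgsMass and loadEntry are hypothetical names chosen for this illustration and are not part of the tutorial code. The variable is private, so code outside the class can read it through a getter but cannot overwrite it by accident.

```cpp
#include <cassert>

// Hypothetical variant of a data-access class, reduced to one variable
// and with no ROOT dependency.
class EncapsulatedData {
public:
  // Read-only access: code outside the class cannot modify the value.
  float higgsMass() const {
    assert(m_loaded && "no entry has been loaded yet");
    return m_ditau_mmc_mlm_m;
  }

  // Stand-in for what reading an entry from the tree would do.
  void loadEntry(float massFromFile) {
    m_ditau_mmc_mlm_m = massFromFile;
    m_loaded = true;
  }

private:
  float m_ditau_mmc_mlm_m = 0.0f;
  bool m_loaded = false;
};
```

The price is one extra method per variable, which is why the public-attribute shortcut is a reasonable choice for a small exercise like this one.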
Recompile as usual.
> make clean
> make
Run the example using the python script “runMe.py”:
> python runMe.py
Added file: ../ggH.root
Event 0
m = 117.665
Event 1000
m = 164.447
Event 2000
m = 61.6836
Event 3000
m = 53.453
Event 4000
m = 114.276
Event 5000
m = 198.691
Event 6000
m = 38.7263
Event 7000
Now we have a simple example that actually reads some data from the file. However, at the moment it does not do anything useful with them :-). In the next example, we will show how to create histograms and fill them with values read from the file.