Analysis tutorial #5: Polishing the code using Python classes

Python is an object-oriented language, just like c++. So far, we have not used classes when writing the steering script. Classes can be very useful when you have a complicated setup and do not want to copy the same code into many places.

In this example, we will create python classes that will serve as “wrappers” around the c++ classes. Then we will use class inheritance to create different configurations without making unnecessary copies of the same code.

Let’s start with the Algorithm class. Open “runMe.py” and look for the places where we do something with the c++ class Algorithm. It gets created, its histograms are initialised and we call the Sumw2 function on them, the instance is passed to the event loop, and finally we store the histograms in the output ROOT file. It is all done in the following lines:

from ROOT import Algorithm
alg = Algorithm()
from ROOT import TH1D
alg.h_ditau_m = TH1D("ditau_m", ";#it{m}_{#tau#tau} [GeV];Events", 30, 0, 300)
alg.h_ditau_m.Sumw2()
eventLoop.algorithms.push_back( alg )
from ROOT import TFile
f = TFile.Open("histograms.root", "recreate")
f.cd()
alg.h_ditau_m.Write()
f.Close()

We will now move this code into python classes and optimize it a little. We will make two python files (modules): one will contain everything that is generic and the other will contain the configuration-specific code. First, we create a file called “Algorithm.py” containing the generic Algorithm class (we chose to give the python class the same name as the c++ one, but it can have any name):

from ROOT import Algorithm as AlgorithmCpp
from ROOT import TFile

class Algorithm(object):
 """ Wrapper around the c++ Algorithm class
 """
 def __init__(self, name):
  self.name = name

  self.alg = AlgorithmCpp()

 def setSumw2(self):
  """ Calls Sumw2 for all the histograms
  """
  for attrName in dir(self.alg):
   attr = getattr(self.alg, attrName)
   if hasattr(attr, "Sumw2"):
    attr.Sumw2()

 def save(self, prefix):
  """ Saves the histograms into the output file.
  """
  # save everything into the ROOT file.
  f = TFile.Open("histograms.{}.{}.root".format(prefix,self.name), "recreate")
  f.cd()

  # printout
  print("Histograms saved into file: {}".format(f.GetName()))

  # Loop through all attributes of self.alg
  # and save all objects that have a "Write" method
  for attrName in dir(self.alg):
   attr = getattr(self.alg, attrName)
   if hasattr(attr, "Write"):
    attr.Write()

  # close the file
  f.Close()
  • In the first line we import the c++ Algorithm class. Because we have named the Python class also “Algorithm” we have to rename the c++ one to AlgorithmCpp. This is done using the “as AlgorithmCpp” construct.
  • Unlike in c++, Python constructors are always called “__init__” and their first argument is always the instance itself, by convention named “self”. “self” is a reference to the current class instance (in c++ the same thing is called “this”). Instance attributes are referenced as “self.var1”, “self.var2” in the class’s methods. Beware, without “self.” one accesses local variables and not the attributes! In the constructor of the python Algorithm class we create the member attribute “self.alg”, which will hold the instance of the c++ class. We also name the algorithm using the attribute “self.name”, which is passed as an argument of the constructor. This name is used later when naming the output file.
  • The second method, called “setSumw2”, enables storage of the sum of squared weights (needed for correct statistical errors) for all the algorithm’s histograms. Since we want this class to be generic, we had to rewrite this bit of code to work without knowing the concrete histogram names. Fortunately, in python this is very easy (unlike in c++!). Let’s look at the code more closely:
    • We have a for loop over the elements of “dir(self.alg)”. The Python function “dir” returns a list of the attribute names of a given class instance, so in this case we have a loop over all the attributes of the c++ class Algorithm.
    • The actual reference to the attribute (note that it can itself be an instance of a class) is retrieved using “getattr” function and stored in the variable “attr”.
    • Then we check whether “attr” has an attribute “Sumw2”. Since this will only be the case if “attr” is an instance of a ROOT histogram class (it is very unlikely that some other class would have a method of this name), we can simply call it. This way, when more histograms are added to the c++ class Algorithm, this code will still work.
  • Finally, the “save” method works with the same logic. This time we are looking for the method “Write” instead of “Sumw2”, but otherwise it’s the same thing. If “attr” has a method “Write”, it is a ROOT object that can be saved in the ROOT file, so we save it.
  • The name of the output file is constructed using this expression:
    “histograms.{}.{}.root”.format(prefix,self.name)
    where “prefix” is a parameter of the “save” method while “self.name” is the algorithm’s name we have initialised in the constructor. So if the algorithm is named “AlgDefault” and the prefix is “ggH”, the output file will be called “histograms.ggH.AlgDefault.root”. The file name string is created using the Python string method “format”, which is a powerful tool for string manipulation and one of the reasons to prefer python over c++ for any string-related operations :-).
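
The attribute scan used in “setSumw2” and “save” can be tried out without ROOT. The following is only a minimal sketch: “FakeHist” and “FakeAlg” are hypothetical stand-ins for a ROOT histogram and for the c++ Algorithm class, and are not part of the tutorial code.

```python
class FakeHist(object):
    """Hypothetical stand-in for a ROOT histogram (not a ROOT class)."""
    def __init__(self, name):
        self.name = name
        self.sumw2_called = False

    def Sumw2(self):
        # only records the call; the real method changes the histogram
        self.sumw2_called = True


class FakeAlg(object):
    """Hypothetical stand-in for the c++ Algorithm class."""
    def __init__(self):
        self.h_ditau_m = FakeHist("ditau_m")
        self.h_other = FakeHist("other")
        self.cut_value = 42.0  # plain number: has no Sumw2, so it is skipped


alg = FakeAlg()

# same loop as in setSumw2: visit every attribute and act only on
# those that provide a "Sumw2" method
found = []
for attrName in dir(alg):
    attr = getattr(alg, attrName)
    if hasattr(attr, "Sumw2"):
        attr.Sumw2()
        found.append(attrName)

print(found)

# the file name is built the same way as in the "save" method
fileName = "histograms.{}.{}.root".format("ggH", "AlgDefault")
print(fileName)
```

Only the two histogram-like attributes are picked up; the number “cut_value” and the built-in attributes returned by “dir” are ignored because they have no “Sumw2” method.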

Secondly, we will make the configuration-specific classes. Let’s call the file “Algorithms.py”. As an example, we create two classes with different histogram binning:

from Algorithm import Algorithm
from ROOT import TH1D

# -----------------------------------
class AlgDefault(Algorithm):
 """ Default set of histograms
 """
 def __init__(self):
  # call inherited constructor
  Algorithm.__init__(self, "AlgDefault")

  # create histograms
  self.alg.h_ditau_m = TH1D("ditau_m", ";#it{m}_{#tau#tau} [GeV];Events", 30, 0, 300)

# -----------------------------------
class AlgFineBinning(Algorithm):
 """ set of histograms with fine binning
 """
 def __init__(self):
  # call inherited constructor
  Algorithm.__init__(self, "AlgFineBinning")

  # create histograms: fine binning
  self.alg.h_ditau_m = TH1D("ditau_m", ";#it{m}_{#tau#tau} [GeV];Events", 300, 0, 300)
  • These classes are now much simpler. Each contains only a constructor, which does nothing but initialise the histograms. In class “AlgDefault” the number of bins of “ditau_m” is set to 30, in “AlgFineBinning” to 300.
  • Both classes inherit from the generic “Algorithm” class.
  • Note that we have to call the constructor of the parent class! Otherwise, the instance would not be initialised properly.
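
To see why the parent constructor call matters, here is a small sketch that runs without ROOT. The classes “Base”, “Good” and “Broken” are hypothetical and only mimic the structure of the wrapper classes above; a dictionary stands in for the c++ Algorithm instance.

```python
class Base(object):
    """Hypothetical stand-in for the generic Algorithm wrapper."""
    def __init__(self, name):
        self.name = name
        self.alg = {}  # stands in for the c++ Algorithm instance


class Good(Base):
    def __init__(self):
        # parent constructor is called first, so self.alg exists
        Base.__init__(self, "Good")
        self.alg["h_ditau_m"] = "histogram placeholder"


class Broken(Base):
    def __init__(self):
        # parent constructor is NOT called: self.alg was never created
        self.alg["h_ditau_m"] = "histogram placeholder"


good = Good()
print(good.name)

try:
    Broken()
    error = None
except AttributeError as err:
    error = err
print("Broken() failed: {}".format(error))
```

The “Broken” class fails with an AttributeError because “self.alg” is only created inside the parent constructor.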

Now that we’ve got the Algorithm class out of the way, we can move to the event loop. In the “runMe.py” script we have the following code that creates the event loop:

from ROOT import EventLoop
eventLoop = EventLoop()
eventLoop.treeName = "NOMINAL"
eventLoop.inputFiles.push_back('../ggH.root')
eventLoop.algorithms.push_back( alg )
eventLoop.initialize()
eventLoop.execute()

While the beginning and ending of the script are generic pieces of code that will always have to be executed like this, the middle two lines refer to concrete data samples. If we want to read different samples, we will have to change these lines. So again, we have a mixture of generic and concrete code. Like before, we will split it into two files.

Firstly, the generic code will go into “EventLoop.py” file:

from ROOT import EventLoop as EventLoopCpp

class EventLoop(object):
 """ Wrapper around the c++ class event loop
 """ 
 def __init__(self, name):
  # set the event loop name
  self.name = name

  # now we can create instance of the c++ class EventLoop
  self.eventLoop = EventLoopCpp()

  # set the tree name attribute
  self.eventLoop.treeName = "NOMINAL"

  # keep the list of python Algorithm class instances
  self.algs = []

 def addAlgorithm(self, algorithm):
  """ add an algorithm into this event loop
  """
  algorithm.setSumw2()
  self.algs += [ algorithm ]
  self.eventLoop.algorithms.push_back(algorithm.alg)
 
 def addAlgorithms(self, algorithms):
  """ add multiple algorithms into this event loop
  """
  for alg in algorithms:
   self.addAlgorithm(alg)

 def execute(self):
  """ initialize and execute the event loop
  """
  self.eventLoop.initialize()
  self.eventLoop.execute()

 def save(self):
  """ save histograms from all algorithms
  """
  for alg in self.algs:
   alg.save(self.name)
  • Like before, we create an instance of the c++ class EventLoop in the constructor and save it as the “self.eventLoop” member attribute.
  • The method “addAlgorithm” adds the c++ Algorithm class instance to the c++ EventLoop class instance. In addition, it calls the method “algorithm.setSumw2” we have written earlier and keeps the python wrapper in the list “self.algs”. This way we make sure that all histograms added to the event loop have the errors turned on.
  • In the “execute” method we initialise and execute the event loop. We have merged the two calls into a single method because at this point there is no need to keep them separate.
  • Finally, the “save” method calls “save” for all algorithms scheduled in the event loop, passing the event loop name as the file name prefix.
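
The bookkeeping done by “addAlgorithm” and “save” can be followed in a ROOT-free sketch. “FakeAlgorithm” and “FakeEventLoop” are hypothetical stand-ins that only record which methods were called; they are not the tutorial classes themselves.

```python
class FakeAlgorithm(object):
    """Hypothetical stand-in for the python Algorithm wrapper."""
    def __init__(self, name):
        self.name = name
        self.log = []  # records the calls made by the event loop

    def setSumw2(self):
        self.log.append("setSumw2")

    def save(self, prefix):
        self.log.append("save histograms.{}.{}.root".format(prefix, self.name))


class FakeEventLoop(object):
    """Reduced copy of the EventLoop wrapper logic, without ROOT."""
    def __init__(self, name):
        self.name = name
        self.algs = []

    def addAlgorithm(self, algorithm):
        # errors are switched on as soon as the algorithm is scheduled
        algorithm.setSumw2()
        self.algs.append(algorithm)

    def save(self):
        # the event loop name becomes the prefix of each output file
        for alg in self.algs:
            alg.save(self.name)


loop = FakeEventLoop("ggH")
a = FakeAlgorithm("AlgDefault")
loop.addAlgorithm(a)
loop.save()
print(a.log)
```

The log shows that “setSumw2” is called when the algorithm is added, and that the output file name combines the event loop name with the algorithm name.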

Since any specific configuration of the EventLoop is related to the samples we want to process, we name the second python file (module) “Samples.py”:

from EventLoop import EventLoop

# -----------------------------------
class SampleGGH(EventLoop):
 """ event loop over the gluon-gluon fusion sample
 """ 
 def __init__(self):
  # call the inherited constructor
  EventLoop.__init__(self, "ggH")

  # add the ggH samples into the event loop
  self.eventLoop.inputFiles.push_back('../ggH.root')

# -----------------------------------
class SampleVBFH(EventLoop):
 """ event loop over the vector boson fusion sample
 """ 
 def __init__(self):
  # call the inherited constructor
  EventLoop.__init__(self, "VBFH")

  # add the VBFH sample into the event loop
  self.eventLoop.inputFiles.push_back('../vbfH.root')
  • In each class we first call the inherited constructor. Then we set the sample-specific options, i.e. we add the file names of the given sample to the “inputFiles” list.
  • We have created two examples of derived classes: one with the gluon-gluon fusion Higgs sample we have used so far, the second with the vector boson fusion sample, also from the Higgs analysis.
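
A sample that is split over several files could follow the same pattern. The sketch below is an assumption-laden illustration that runs without ROOT: “FakeVector” and “FakeEventLoopCpp” are hypothetical stand-ins for the c++ objects, “SampleTwoFiles” is an invented class, and the file names are placeholders rather than real tutorial files.

```python
class FakeVector(list):
    """Hypothetical stand-in for the c++ vector holding the file names."""
    def push_back(self, item):
        self.append(item)


class FakeEventLoopCpp(object):
    """Hypothetical stand-in for the c++ EventLoop class."""
    def __init__(self):
        self.inputFiles = FakeVector()


class EventLoop(object):
    """Reduced version of the generic wrapper, without ROOT."""
    def __init__(self, name):
        self.name = name
        self.eventLoop = FakeEventLoopCpp()


class SampleTwoFiles(EventLoop):
    """Hypothetical sample that is split into two input files."""
    def __init__(self):
        # call the inherited constructor
        EventLoop.__init__(self, "twoFiles")

        # placeholder file names, not real tutorial files
        for fileName in ["../part1.root", "../part2.root"]:
            self.eventLoop.inputFiles.push_back(fileName)


sample = SampleTwoFiles()
print("{}: {}".format(sample.name, list(sample.eventLoop.inputFiles)))
```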

Lastly, we need to modify the steering script “runMe.py” to make use of new classes. It will be much simpler now:

# here we load the shared library created using the Makefile
from ROOT import gSystem
gSystem.Load("Analysis.so")

# now we can create instance of the class EventLoop
from Samples import *
eventLoop = SampleGGH()

# create algorithm
from Algorithms import * 
algs = []
algs += [ AlgDefault() ]
algs += [ AlgFineBinning() ]

# add the algorithm into the event loop
eventLoop.addAlgorithms( algs )

# initialize and execute the event loop
eventLoop.execute()

# save plots from all algorithms in the loop
eventLoop.save()
  • The construct “from Samples import *” means that all public names (here, the classes) from the module “Samples” are imported.
  • In this example we schedule both available algorithms “AlgDefault” and “AlgFineBinning” into the same event loop. So in the same loop we will get two sets of histograms, one with default binning and one with finer bins.

When you run the new script, you should see two files created: “histograms.ggH.AlgDefault.root” and “histograms.ggH.AlgFineBinning.root” both containing the plot(s).
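
If you want to check the result from a script, a small helper along the following lines could be used. It is only a sketch: the function names “expectedOutputs” and “missingOutputs” are made up for this illustration, and the file names are built with the same pattern as in the “save” method.

```python
import os


def expectedOutputs(sampleName, algNames):
    """Build the file names following the pattern used in Algorithm.save."""
    return ["histograms.{}.{}.root".format(sampleName, algName)
            for algName in algNames]


def missingOutputs(fileNames, directory="."):
    """Return the subset of fileNames not found in the given directory."""
    return [fileName for fileName in fileNames
            if not os.path.isfile(os.path.join(directory, fileName))]


names = expectedOutputs("ggH", ["AlgDefault", "AlgFineBinning"])
print(names)
print("missing: {}".format(missingOutputs(names)))
```

After a successful run of “runMe.py” in the current directory, the list of missing files should be empty.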

In this example, by rewriting the previous code using python classes, we have added more lines into it, but now the code is more structured and it will be easier to expand. We showed how you can configure your algorithms in a minimalist way using python inheritance. In the next example we will try to use this when implementing some basic event selection into our code.
