A TMVA example in pyROOT

Posted on August 27, 2011. Filed under: Uncategorized | Tags: , , , , , , |

This is an example showing how to use TMVA using python/pyROOT. TMVA is a toolkit for multivariate analysis in ROOT which is widely used in High Energy Physics data analysis.

The example code in this post is made available under the Apache License Version 2.0. If you want to experiment with the code examples shown, it’s best to do so in a newly created directory. It has been tested with ROOT 5.30/00 which includes TMVA 4.1.2 . Copying and pasting of the example code below directly into the interactive python shell is strongly discouraged as crashes of the python interpreter were observed when doing this. Also, the indentation is not taken into account correctly, leading to syntax errors. Copying the code into a python file and then running this file with python -i however should work.

Generating the samples

For simplicity and to allow for easy visualization, we will work in two dimensions. First, we’ll generate two samples (‘signal’ and ‘background’) drawn from Gaussian distributions with different means and fill them into a TNtuple:

import ROOT

# create a TNtuple
ntuple = ROOT.TNtuple("ntuple","ntuple","x:y:signal")

# generate 'signal' and 'background' distributions
for i in range(10000):
    # throw a signal event centered at (1,1)
    ntuple.Fill(ROOT.gRandom.Gaus(1,1), # x
                ROOT.gRandom.Gaus(1,1), # y
                1)                      # signal
    # throw a background event centered at (-1,-1)
    ntuple.Fill(ROOT.gRandom.Gaus(-1,1), # x
                ROOT.gRandom.Gaus(-1,1), # y
                0)                       # background

In order to visualize the generated distributions, we can do the following:

# keeps objects otherwise removed by garbage collected in a list
gcSaver = []

# create a new TCanvas

# draw an empty 2D histogram for the axes
histo = ROOT.TH2F("histo","",1,-5,5,1,-5,5)

# draw the signal events in red
ntuple.Draw("y:x","signal > 0.5","same")

# draw the background events in blue
ntuple.Draw("y:x","signal <= 0.5","same")

In my case, the resulting plot looked like this:

where the red points correspond to ‘signal’ and the blue points correspond to ‘background’. We can start training a classifier which attempts to label individual points as ‘signal’ or ‘background’ based on the value of their coordinates x and y.

The following code is inspired by the standard TMVA classification example ($ROOTSYS/tmva/test/TMVAClassification.C). First, we’ll initialize TMVA and create a factory object:


# note that it seems to be mandatory to have an
# output file, just passing None to TMVA::Factory(..)
# does not work. Make sure you don't overwrite an
# existing file.
fout = ROOT.TFile("test.root","RECREATE")

factory = ROOT.TMVA.Factory("TMVAClassification", fout,

The parameters given to the constructor of ROOT.TMVA.Factory are described in section 3.1 of the current TMVA user’s guide. Verbosity is disabled with the !V (‘not verbose’) option while !Silent (‘not silent’) still allows for some level of reporting. Color enables the use of color and DrawProgressBar enables displaying the progress of training. Currently, some of these options actually correspond to the default setting so need not necessarily be specified. The values given to the Transformations parameter only affect testing and visualization, not the training (according to the manual).

Now we declare which variables should be used for classification (the ‘independent’ variables) and add the tree for signal and background events (which are stored in the same tree in this example, a cut on the variable signal is used to distinguish between signal and background events):



# cuts defining the signal and background sample
sigCut = ROOT.TCut("signal > 0.5")
bgCut = ROOT.TCut("signal <= 0.5")

factory.PrepareTrainingAndTestTree(sigCut,   # signal events
                                   bgCut,    # background events

The parameters given to PrepareTrainingAndTestTree are described in section 3.1.4 of the current TMVA user’s guide. Essentially, the available events are separated randomly into two equally large sets for training and testing and verbosity is disabled.

We then configure a classifier to learn to distinguish the two samples generated above. As an example, we use a boosted decision tree:

method = factory.BookMethod(ROOT.TMVA.Types.kBDT, "BDT",


The options (described in section 8.12.2 of the current TMVA user’s guide) can be summarized as follows:

  • the help text should not be printed
  • verbosity is disabled
  • 850 trees should be used
  • leaf nodes must contain at least 150 events
  • the depth of the trees is limited to 3
  • adaptive boosting should be used
  • the Gini index should be used to select the best variable to be used in each tree node
  • 20 steps should be used when scanning cuts on a variable and
  • no pruning of trees after they have been constructed should be applied.

Examining the trained classifier

After running this, TMVA should have created a file weights/TMVAClassification_BDT.weights.xml which contains the structure of trained classifier. In order to evaluate the classification function at arbitrary coordinates, we create an instance of ROOT.TMVA.Reader:

reader = ROOT.TMVA.Reader()

To calculate the value of the classifier for a given input coordinate (x,y), we first create two arrays (with one element each) for the variables x and y such that later on we can take (C++) references of the first element which we pass to the reader (see also this discussion on the ROOT bulletin board):

import array
varx = array.array('f',[0]) ; reader.AddVariable("x",varx)
vary = array.array('f',[0]) ; reader.AddVariable("y",vary)

Now that we have given the reader the variables x and y, we can read the weights file:


In order to plot the BDT output as function of x and y, we can use the following code snippet:

# create a new 2D histogram with fine binning
histo2 = ROOT.TH2F("histo2","",200,-5,5,200,-5,5)

# loop over the bins of a 2D histogram
for i in range(1,histo2.GetNbinsX() + 1):
    for j in range(1,histo2.GetNbinsY() + 1):
        # find the bin center coordinates
        varx[0] = histo2.GetXaxis().GetBinCenter(i)
        vary[0] = histo2.GetYaxis().GetBinCenter(j)
        # calculate the value of the classifier
        # function at the given coordinate
        bdtOutput = reader.EvaluateMVA("BDT")
        # set the bin content equal to the classifier output


# draw sigma contours around means
for mean, color in (
    ((1,1), ROOT.kRed), # signal
    ((-1,-1), ROOT.kBlue), # background
    # draw contours at 1 and 2 sigmas
    for numSigmas in (1,2):
        circle = ROOT.TEllipse(mean[0], mean[1], numSigmas)


The output should look like this:

The overlaid ellipses are the one and two sigma contours around the mean signal and background. Note that these are ellipses because the x and y axes have a different scale. One can nicely see how the classifier approximates the Gaussian distributions with a set of trees representing rectangular regions.

Typically, one also wants to look at the distribution of the classifier output for signal and background, e.g. to get an idea how well the two samples can be separated (although the ROC curve is more suitable for comparing the performance of different classifiers). TMVA fills a tree for the events in the test sample which we can use for this purpose:

# fill histograms for signal and background from the test sample tree
ROOT.TestTree.Draw("BDT>>hSig(22,-1.1,1.1)","classID == 0","goff")  # signal
ROOT.TestTree.Draw("BDT>>hBg(22,-1.1,1.1)","classID == 1", "goff")  # background

ROOT.hSig.SetLineColor(ROOT.kRed); ROOT.hSig.SetLineWidth(2)  # signal histogram
ROOT.hBg.SetLineColor(ROOT.kBlue); ROOT.hBg.SetLineWidth(2)   # background histogram

# use a THStack to show both histograms
hs = ROOT.THStack("hs","")

# show the histograms

which for me produced the following output:

Enjoy !

Read Full Post | Make a Comment ( 5 so far )

Filling a TNtuple in pyROOT

Posted on October 14, 2010. Filed under: Uncategorized | Tags: , , , |

I recently came across the problem that I wanted to create a TNtuple with 635 variables. I wanted to avoid using a TTree if possible (how to do this is discussed here).

I found a solution in a tutorial (page 17) how to use fill a TNtuple with few variables (TNTuple currently has two Fill(..) methods: One taking up to 15 values and another one taking an array).
So the example in the tutorial did not work for me as I have more than 15 values per row. I managed to do this however with code similar to the following one:

import ROOT, array

# the variable names for the output tuple
varnames = [ ... ]

# open the output file
fout = ROOT.TFile("output.root", "RECREATE")

# create the ntuple
output_tuple = ROOT.TNtuple("tuple","tuple",":".join(varnames))

# loop over events

    # calculate the values to be stored in the output tuple
    values = [ ... ]

    # convert the list of values to an float array and fill the output tuple

# end of loop over events

# write the tuple to the output file and close it
Read Full Post | Make a Comment ( 5 so far )

Python + ROOT in Ubuntu Lucid Lynx

Posted on May 9, 2010. Filed under: Uncategorized | Tags: , , , |

I recently tried to get ROOT in Python (also known as PyROOT) working under the latest Ubuntu release (Lucid Lynx).

It took me a while until I figured out that I installing the Ubuntu package libroot-python5.18
was not sufficient but I also had to install libroot-python-dev in order to get it to work (i.e. do import ROOT from the python interpreter).
I also had to make sure that /usr/lib/root/5.18 was in my PYTHONPATH environment variable.

Read Full Post | Make a Comment ( None so far )

TCanvases in PyROOT

Posted on February 16, 2010. Filed under: Uncategorized | Tags: , , , , , |

When working with TCanvases in PyROOT[1], they can suddenly disappear. This seems to be due to Python’s garbage collection, i.e. when doing the following:

import ROOT

Python at some point thinks, the newly created TCanvas object is not needed any more (i.e. there is no variable pointing to it) and seems to remove it. To illustrate this, I ran the garbage collector by hand:

import gc

and the TCanvas disappears from the screen.

I saw that one can fix this behavior by setting an object ownership flag[2] (on a per-instance basis) by doing:

canv = ROOT.TCanvas()

Profiting from Python’s ability to redefine methods and classes at runtime, I have put the following code in one of my initialization files:

import ROOT

# keep a pointer to the original TCanvas constructor
oldinit = ROOT.TCanvas.__init__

# define a new TCanvas class (inheriting from the original one),
# setting the memory ownership in the constructor
class GarbageCollectionResistentCanvas(ROOT.TCanvas):
  def __init__(self, *args):

# replace the old TCanvas class by the new one
ROOT.TCanvas = GarbageCollectionResistentCanvas

Whenever I create a new TCanvas, the ownership flag is set automatically and the resulting canvas resists the garbage collection.


[1] http://root.cern.ch/drupal/content/how-use-use-python-pyroot-interpreter
[2] http://root.cern.ch/phpBB2/viewtopic.php?t=9786&sid=039b50aac41b93648e49387f188ad741

Read Full Post | Make a Comment ( None so far )

Liked it here?
Why not try sites on the blogroll...