Printing a tensor’s shape in Theano

Posted on January 3, 2016. Filed under: Uncategorized | Tags: , |

While developing code with Theano I got exceptions that the shapes of the quantities given to an operator do not match.

Searching around, one finds examples how to print the values of the elements of tensors (e.g. here) but not necessarily their shape:

from theano import tensor as T, function, printing
x = T.dvector()
printing_op = printing.Print('vector')
printed_x = printing_op(x)
f = function([x], printed_x)
result = f([1, 2, 3])

which yields:

vector __str__ = [ 1.  2.  3.]

While this works nicely for small and few dimensions, inferring the shape from a printout of a large tensor is tedious…

Just printing x.shape where x is a Theano tensor does not give the information we look for. Consider the following example:

from theano import tensor as T, function, printing
x = T.dvector()

def myfunc(x):
    print "x.shape=",x.shape
    return x

f = function([x], myfunc(x))
result = f([1, 2, 3])

which prints the following:

x.shape= Shape.0

which is usually not what we want.

It turns out however that theano.printing.Print(..) can print attributes of a tensor when given a list of attributes to be printed to the attrs parameter of its constructor. To print the shape we can therefore do:

from theano import tensor as T, function, printing
x = T.dvector()

printing_op = printing.Print('vector', attrs = [ 'shape' ])

printed_x = printing_op(x)

f = function([x], printed_x)
result = f([1, 2, 3])

which then gives the desired printout:

vector shape = (3,)

This also works for other attributes like min or max (which can be useful to check the validity of the values in a tensor) etc.

(the above example code is admittedly inspired by examples found on deeplearning.net)

Advertisements
Read Full Post | Make a Comment ( None so far )

A TMVA example in pyROOT

Posted on August 27, 2011. Filed under: Uncategorized | Tags: , , , , , , |

This is an example showing how to use TMVA using python/pyROOT. TMVA is a toolkit for multivariate analysis in ROOT which is widely used in High Energy Physics data analysis.

The example code in this post is made available under the Apache License Version 2.0. If you want to experiment with the code examples shown, it’s best to do so in a newly created directory. It has been tested with ROOT 5.30/00 which includes TMVA 4.1.2 . Copying and pasting of the example code below directly into the interactive python shell is strongly discouraged as crashes of the python interpreter were observed when doing this. Also, the indentation is not taken into account correctly, leading to syntax errors. Copying the code into a python file and then running this file with python -i however should work.

Generating the samples

For simplicity and to allow for easy visualization, we will work in two dimensions. First, we’ll generate two samples (‘signal’ and ‘background’) drawn from Gaussian distributions with different means and fill them into a TNtuple:

import ROOT

# create a TNtuple
ntuple = ROOT.TNtuple("ntuple","ntuple","x:y:signal")

# generate 'signal' and 'background' distributions
for i in range(10000):
    # throw a signal event centered at (1,1)
    ntuple.Fill(ROOT.gRandom.Gaus(1,1), # x
                ROOT.gRandom.Gaus(1,1), # y
                1)                      # signal
    
    # throw a background event centered at (-1,-1)
    ntuple.Fill(ROOT.gRandom.Gaus(-1,1), # x
                ROOT.gRandom.Gaus(-1,1), # y
                0)                       # background

In order to visualize the generated distributions, we can do the following:

# keeps objects otherwise removed by garbage collected in a list
gcSaver = []

# create a new TCanvas
gcSaver.append(ROOT.TCanvas())

# draw an empty 2D histogram for the axes
histo = ROOT.TH2F("histo","",1,-5,5,1,-5,5)
histo.Draw()

# draw the signal events in red
ntuple.SetMarkerColor(ROOT.kRed)
ntuple.Draw("y:x","signal > 0.5","same")

# draw the background events in blue
ntuple.SetMarkerColor(ROOT.kBlue)
ntuple.Draw("y:x","signal <= 0.5","same")

In my case, the resulting plot looked like this:

where the red points correspond to ‘signal’ and the blue points correspond to ‘background’. We can start training a classifier which attempts to label individual points as ‘signal’ or ‘background’ based on the value of their coordinates x and y.

The following code is inspired by the standard TMVA classification example ($ROOTSYS/tmva/test/TMVAClassification.C). First, we’ll initialize TMVA and create a factory object:

ROOT.TMVA.Tools.Instance()

# note that it seems to be mandatory to have an
# output file, just passing None to TMVA::Factory(..)
# does not work. Make sure you don't overwrite an
# existing file.
fout = ROOT.TFile("test.root","RECREATE")

factory = ROOT.TMVA.Factory("TMVAClassification", fout,
                            ":".join([
                                "!V",
                                "!Silent",
                                "Color",
                                "DrawProgressBar",
                                "Transformations=I;D;P;G,D",
                                "AnalysisType=Classification"]
                                     ))

The parameters given to the constructor of ROOT.TMVA.Factory are described in section 3.1 of the current TMVA user’s guide. Verbosity is disabled with the !V (‘not verbose’) option while !Silent (‘not silent’) still allows for some level of reporting. Color enables the use of color and DrawProgressBar enables displaying the progress of training. Currently, some of these options actually correspond to the default setting so need not necessarily be specified. The values given to the Transformations parameter only affect testing and visualization, not the training (according to the manual).

Now we declare which variables should be used for classification (the ‘independent’ variables) and add the tree for signal and background events (which are stored in the same tree in this example, a cut on the variable signal is used to distinguish between signal and background events):

factory.AddVariable("x","F")
factory.AddVariable("y","F") 

factory.AddSignalTree(ntuple)
factory.AddBackgroundTree(ntuple)

# cuts defining the signal and background sample
sigCut = ROOT.TCut("signal > 0.5")
bgCut = ROOT.TCut("signal <= 0.5")

factory.PrepareTrainingAndTestTree(sigCut,   # signal events
                                   bgCut,    # background events
                                   ":".join([
                                        "nTrain_Signal=0",
                                        "nTrain_Background=0",
                                        "SplitMode=Random",
                                        "NormMode=NumEvents",
                                        "!V"
                                       ]))

The parameters given to PrepareTrainingAndTestTree are described in section 3.1.4 of the current TMVA user’s guide. Essentially, the available events are separated randomly into two equally large sets for training and testing and verbosity is disabled.

We then configure a classifier to learn to distinguish the two samples generated above. As an example, we use a boosted decision tree:

method = factory.BookMethod(ROOT.TMVA.Types.kBDT, "BDT",
                   ":".join([
                       "!H",
                       "!V",
                       "NTrees=850",
                       "nEventsMin=150",
                       "MaxDepth=3",
                       "BoostType=AdaBoost",
                       "AdaBoostBeta=0.5",
                       "SeparationType=GiniIndex",
                       "nCuts=20",
                       "PruneMethod=NoPruning",
                       ]))

factory.TrainAllMethods()
factory.TestAllMethods()
factory.EvaluateAllMethods()

The options (described in section 8.12.2 of the current TMVA user’s guide) can be summarized as follows:

  • the help text should not be printed
  • verbosity is disabled
  • 850 trees should be used
  • leaf nodes must contain at least 150 events
  • the depth of the trees is limited to 3
  • adaptive boosting should be used
  • the Gini index should be used to select the best variable to be used in each tree node
  • 20 steps should be used when scanning cuts on a variable and
  • no pruning of trees after they have been constructed should be applied.

Examining the trained classifier

After running this, TMVA should have created a file weights/TMVAClassification_BDT.weights.xml which contains the structure of trained classifier. In order to evaluate the classification function at arbitrary coordinates, we create an instance of ROOT.TMVA.Reader:

reader = ROOT.TMVA.Reader()

To calculate the value of the classifier for a given input coordinate (x,y), we first create two arrays (with one element each) for the variables x and y such that later on we can take (C++) references of the first element which we pass to the reader (see also this discussion on the ROOT bulletin board):

import array
varx = array.array('f',[0]) ; reader.AddVariable("x",varx)
vary = array.array('f',[0]) ; reader.AddVariable("y",vary)

Now that we have given the reader the variables x and y, we can read the weights file:

reader.BookMVA("BDT","weights/TMVAClassification_BDT.weights.xml")

In order to plot the BDT output as function of x and y, we can use the following code snippet:

# create a new 2D histogram with fine binning
histo2 = ROOT.TH2F("histo2","",200,-5,5,200,-5,5)

# loop over the bins of a 2D histogram
for i in range(1,histo2.GetNbinsX() + 1):
    for j in range(1,histo2.GetNbinsY() + 1):
        
        # find the bin center coordinates
        varx[0] = histo2.GetXaxis().GetBinCenter(i)
        vary[0] = histo2.GetYaxis().GetBinCenter(j)
        
        # calculate the value of the classifier
        # function at the given coordinate
        bdtOutput = reader.EvaluateMVA("BDT")
        
        # set the bin content equal to the classifier output
        histo2.SetBinContent(i,j,bdtOutput)

gcSaver.append(ROOT.TCanvas())
histo2.Draw("colz")

# draw sigma contours around means
for mean, color in (
    ((1,1), ROOT.kRed), # signal
    ((-1,-1), ROOT.kBlue), # background
    ):
    
    # draw contours at 1 and 2 sigmas
    for numSigmas in (1,2):
        circle = ROOT.TEllipse(mean[0], mean[1], numSigmas)
        circle.SetFillStyle(0)
        circle.SetLineColor(color)
        circle.SetLineWidth(2)
        circle.Draw()
        gcSaver.append(circle)

ROOT.gPad.Modified()

The output should look like this:

The overlaid ellipses are the one and two sigma contours around the mean signal and background. Note that these are ellipses because the x and y axes have a different scale. One can nicely see how the classifier approximates the Gaussian distributions with a set of trees representing rectangular regions.

Typically, one also wants to look at the distribution of the classifier output for signal and background, e.g. to get an idea how well the two samples can be separated (although the ROC curve is more suitable for comparing the performance of different classifiers). TMVA fills a tree for the events in the test sample which we can use for this purpose:

# fill histograms for signal and background from the test sample tree
ROOT.TestTree.Draw("BDT>>hSig(22,-1.1,1.1)","classID == 0","goff")  # signal
ROOT.TestTree.Draw("BDT>>hBg(22,-1.1,1.1)","classID == 1", "goff")  # background

ROOT.hSig.SetLineColor(ROOT.kRed); ROOT.hSig.SetLineWidth(2)  # signal histogram
ROOT.hBg.SetLineColor(ROOT.kBlue); ROOT.hBg.SetLineWidth(2)   # background histogram

# use a THStack to show both histograms
hs = ROOT.THStack("hs","")
hs.Add(ROOT.hSig)
hs.Add(ROOT.hBg)

# show the histograms
gcSaver.append(ROOT.TCanvas())
hs.Draw()

which for me produced the following output:

Enjoy !

Read Full Post | Make a Comment ( 5 so far )

Using Jython as a configuration language for Java programs

Posted on November 26, 2010. Filed under: Uncategorized | Tags: , , , , |

I recently tried using Jython as a configuration language for a Java program. In fact, this was inspired by the fact that the main C++ project I’m working with at work has switched to python as a configuration language some time ago. This has several advantages:

  • no need to invent a configuration language from scratch
  • no need to write a parser for such a language
  • some users might already know python and thus they will not need to learn a new language (and python/jython is also useful beyond writing configuration files for the program)
  • the power of a programming language:
    • conditional statements
    • looping
    • generate configurations depending on parameters, contents of external files, environment variables etc.

An example configuration file could then look like:

input_files = [ "/etc/fstab", "/etc/mtab" ]
output_file = "/tmp/test.txt"

but we can also use some more complex logic inside the configuration file:

import glob, time
input_files = glob.glob("/etc/*tab")
output_file = "/tmp/test-" + time.strftime("%Y-%m-%d") + ".txt"

This example should give you an idea how powerful the concept is. In fact, whatever is possible in Jython is also possible to do in the configuration file.

It is often natural to group related parameters into a class, such as InputFile in the following example configuration:

input_files = [
  InputFile(name = "/etc/fstab", startLine = 2, endLine = 5, commentChar = '#'),
  InputFile(name = "/etc/mtab", endLine = 3),
  InputFile(name = "/etc/crontab"),
]

The underlying Java implementation of InputFile could look as follows:

public class InputFile
{
  ...

  /** Keyword arguments constructor. */
  public InputFile(PyObject values[], String names[])
  {
    System.out.println("called kwargs constructor: names=" +
      Arrays.asList(names) + " values=" + Arrays.asList(values));
     // store the given values in fields
    ...
  }
}

Note that keyword arguments constructors (requested here) are only supported in recent Jython releases (since 2.5.2-b2 according to the release notes). They are a bit more complicated to implement on the Java side than classic constructors but the use of keyword arguments in the configuration file makes it much more readable.

In order to read a configuration from a Jython file one needs to embed a Jython interpreter into the application. For this, it is necessary to add jython.jar from the Jython installation to the corresponding Java project.

The code for reading the configuration might look like this:

public class JythonConfigReader
{

  public JythonConfigReader(String input_fname)
  {
    // instantiate a Jython interpreter
    PythonInterpreter interpreter = new PythonInterpreter();

    // import configuration objects package in the Jython interpreter
    interpreter.exec("from my.package.with.configuration.classes import InputFile");

    // read the input file with the Jython interpreter
    // note that this will throw an exception if the python
    // code is not valid
    interpreter.execfile(input_fname);

    // get local variables
    PyStringMap locals = (PyStringMap) interpreter.getLocals();

    if (! locals.has_key("input_files"))
      throw new IllegalArgumentException("parameter input_files not found");

    // get the list of input files
    PyObject input_files = locals.__getitem__("input_files");

    if (! (input_files instanceof PyList))
      throw new IllegalArgumentException("parameter input_files is not a list");

    PyList input_files_list = (PyList) input_files;
    Iterator iter = input_files_list.iterator();

    // loop over all elements of the list
    while (iter.hasNext())
    {
      Object obj = iter.next();

      if (! (obj instanceof InputFile))
        throw new IllegalArgumentException("input_files contains objects of type other than InputFile");

      System.out.println("got an input file");

      // process this input file
      ...

    } // loop over all elements of input_files
  }
}

One potential danger of using Jython as a configuration language is that if one only looks at the names of the known parameters in locals() (such as input_files in the above example), one might miss mis-spelled optional parameters. To prevent such problems to some extent one can define a top-level configuration class and add other configuration objects as members, e.g. like:

config = Configuration()
config.input_files = [ InputFile(...), InputFile(...) ]

and then insist that there is only one top level Configuration object.

Read Full Post | Make a Comment ( None so far )

Filling a TNtuple in pyROOT

Posted on October 14, 2010. Filed under: Uncategorized | Tags: , , , |

I recently came across the problem that I wanted to create a TNtuple with 635 variables. I wanted to avoid using a TTree if possible (how to do this is discussed here).

I found a solution in a tutorial (page 17) how to use fill a TNtuple with few variables (TNTuple currently has two Fill(..) methods: One taking up to 15 values and another one taking an array).
So the example in the tutorial did not work for me as I have more than 15 values per row. I managed to do this however with code similar to the following one:

import ROOT, array

# the variable names for the output tuple
varnames = [ ... ]

# open the output file
fout = ROOT.TFile("output.root", "RECREATE")
fout.cd()

# create the ntuple
output_tuple = ROOT.TNtuple("tuple","tuple",":".join(varnames))

# loop over events
...

    # calculate the values to be stored in the output tuple
    values = [ ... ]

    # convert the list of values to an float array and fill the output tuple
    output_tuple.Fill(array.array("f",values))

# end of loop over events

# write the tuple to the output file and close it
fout.cd()
output_tuple.Write()
Read Full Post | Make a Comment ( 5 so far )

Using json from Jython

Posted on July 31, 2010. Filed under: Uncategorized | Tags: , , , , , |

I recently tried to use json from Jython. In CPython, there is a json module which is part of the standard library. However, this was only introduced in python 2.6. The most recent stable version of Jython is 2.5.1 however and it does not know a json module out of the box.

Fortunately, I came across jyson (http://opensource.xhaus.com/projects/show/jyson) which is a pure Java implementation of a json reader/writer. Although the code lives in its own name space, it seems that one can get the basic functionality (json.loads and json.dumps) equivalent to the CPython json module by doing:

import com.xhaus.jyson.JysonCodec as json

Read Full Post | Make a Comment ( 2 so far )

TCanvases in PyROOT

Posted on February 16, 2010. Filed under: Uncategorized | Tags: , , , , , |

When working with TCanvases in PyROOT[1], they can suddenly disappear. This seems to be due to Python’s garbage collection, i.e. when doing the following:

import ROOT
ROOT.TCanvas()

Python at some point thinks, the newly created TCanvas object is not needed any more (i.e. there is no variable pointing to it) and seems to remove it. To illustrate this, I ran the garbage collector by hand:

import gc
gc.collect()

and the TCanvas disappears from the screen.

I saw that one can fix this behavior by setting an object ownership flag[2] (on a per-instance basis) by doing:

canv = ROOT.TCanvas()
ROOT.SetOwnership(canv,False)

Profiting from Python’s ability to redefine methods and classes at runtime, I have put the following code in one of my initialization files:

import ROOT

# keep a pointer to the original TCanvas constructor
oldinit = ROOT.TCanvas.__init__

# define a new TCanvas class (inheriting from the original one),
# setting the memory ownership in the constructor
class GarbageCollectionResistentCanvas(ROOT.TCanvas):
  def __init__(self, *args):
    oldinit(self,*args)
    ROOT.SetOwnership(self,False)

# replace the old TCanvas class by the new one
ROOT.TCanvas = GarbageCollectionResistentCanvas

Whenever I create a new TCanvas, the ownership flag is set automatically and the resulting canvas resists the garbage collection.

References:

[1] http://root.cern.ch/drupal/content/how-use-use-python-pyroot-interpreter
[2] http://root.cern.ch/phpBB2/viewtopic.php?t=9786&sid=039b50aac41b93648e49387f188ad741

Read Full Post | Make a Comment ( None so far )

Liked it here?
Why not try sites on the blogroll...