Andre Holzner's Blog

Printing a tensor’s shape in Theano

aholzner — Sun, 03 Jan 2016 21:10:38 +0000

While developing code with Theano I got exceptions that the shapes of the quantities given to an operator do not match.

Searching around, one finds examples how to print the values of the elements of tensors (e.g. here) but not necessarily their shape:

from theano import tensor as T, function, printing
x = T.dvector()
printing_op = printing.Print('vector')
printed_x = printing_op(x)
f = function([x], printed_x)
result = f([1, 2, 3])

which yields:

vector __str__ = [ 1.  2.  3.]

While this works nicely for small and few dimensions, inferring the shape from a printout of a large tensor is tedious…

Just printing x.shape where x is a Theano tensor does not give the information we look for. Consider the following example:

from theano import tensor as T, function, printing
x = T.dvector()

def myfunc(x):
    print "x.shape=",x.shape
    return x

f = function([x], myfunc(x))
result = f([1, 2, 3])

which prints the following:

x.shape= Shape.0

which is usually not what we want.

It turns out however that theano.printing.Print(..) can print attributes of a tensor when given a list of attributes to be printed to the attrs parameter of its constructor. To print the shape we can therefore do:

from theano import tensor as T, function, printing
x = T.dvector()

printing_op = printing.Print('vector', attrs = [ 'shape' ])

printed_x = printing_op(x)

f = function([x], printed_x)
result = f([1, 2, 3])

which then gives the desired printout:

vector shape = (3,)

This also works for other attributes like min or max (which can be useful to check the validity of the values in a tensor) etc.

(the above example code is admittedly inspired by examples found on deeplearning.net)

An illustration of Minuit minimizing a function

aholzner — Mon, 10 Feb 2014 22:20:34 +0000

Here is an animation of how the function minimizer Minuit minimizes a function. The function to be minimized depends on two parameters so that one can visualize it. It was inspired by the ‘banana shaped valley’ function so that Minuit must take several steps to find the minimum. The value of the goal function is indicated by the color, i.e. the goal function is a ‘spiral’ whose depth increases as one goes more counterclockwise.

The types of the steps (shown at the top left) were inferred from the functions from which it was called. The minimization typically has two alternating phases: numerical determination of the gradient and line search along some promising direction. At the end, the second derivative is calculated (although the function has a discontinuity on one side).

The red point shows the coordinates with which Minuit calls a goal function to be minimized.

The arrow at the top left shows the direction from the previous to the current step. For example during the line search phase, the direction remains mostly fixed or is reversed while during the gradient calculation phase it looks like first the horizontal component is calculated and then the vertical one.

A world map of particle physics

aholzner — Sun, 07 Jul 2013 14:00:10 +0000

not yet complete but growing (click on the map to follow the link to the interactive map):

RooFitExplorer

aholzner — Sat, 13 Oct 2012 13:49:19 +0000

Here is a project which I started to more easily navigate RooFit workspaces:

https://bitbucket.org/andreh/roofitexplorer/wiki/Home

This is a ‘interactive, graphical viewer for RooFit workspaces’. It still in very alpha stage but maybe useful to others (use at your own risk). Constructive feedback and test cases are welcome !

Here is a screenshot:

A list of follow-ups on the recent CNGS/Opera result on the neutrino velocity

aholzner — Wed, 28 Sep 2011 19:41:59 +0000

As found on hep-ex on arXiv (no guarantee that this list is exhaustive, last updated 2011-10-06):

arXiv:1110.1162: Probing neutrino masses with neutrino-speed experiments
arXiv:1110.0821: Testing the OPERA Superluminal Neutrino Anomaly at the LHC
arXiv:1110.0736: Resolving 7 problems with OPERA’s superluminal neutrino experiment
arXiv:1110.0644: A classical model explaining the OPERA velocity paradox
arXiv:1110.0595: Is there a neutrino speed anomaly?
arXiv:1109.6667: Constraints and tests of the OPERA superluminal neutrinos
arXiv:1110.0424: How large is the fraction of superluminal neutrinos at OPERA?
arXiv:1110.0351: OPERA, SN1987a and energy dependence of superluminal neutrino velocity
arXiv:1110.0243: Using an Einstein’s idea to explain OPERA faster than light neutrinos.
arXiv:1110.0241: Superluminal Neutrinos at OPERA Confront Pion Decay Kinematics
arXiv:1110.0239: A simple explanation of OPERA results without strange physics
arXiv:1109.6631: Superluminal Neutrinos and a Curious Phenomenon in the Relativistic Quantum Hamilton-Jacobi Equation
arXiv:1109.6624: Superluminal neutrino and spontaneous breaking of Lorentz invariance
arXiv:1109.6562: New Constraints on Neutrino Velocities
arXiv:1109.6354: Neutrino Shortcuts in Spacetime
arXiv:1109.6296: On the Possibility of Superluminal Neutrino Propagation
arXiv:1109.6238: Comparison of muon and neutrino times from decays of mesons in the atmosphere
arXiv:1109.6097: Neutrino speed anomaly as a signal of Lorentz violation
arXiv:1109.5924: Mass-dependent Lorentz Violation and Neutrino Velocity
arXiv:1109.5749: Superluminal neutrinos at the OPERA?
arXiv:1109.5727: A possible statistical mechanism of anomalous neutrino velocity in OPERA experiment?
arXiv:1109.5721: A comment on the OPERA result and CPT
arXiv:1109.5671: OPERA’s superluminal muon-neutrino velocity and an FPS-type model of Lorentz violation
arXiv:1109.5599: Comments on the recent velocity measurement of the muon neutrinos by the OPERA Collaboration
arXiv:1109.5368: Inconsistence of super-luminal Opera neutrino speed with SN1987A neutrinos burst and with flavor neutrino mixing
arXiv:1109.4980: Superluminal neutrinos in long baseline experiments and SN1987a

Here is the original OPERA preprint: arXiv:1109.4897: Measurement of the neutrino velocity with the OPERA detector in the CNGS beam

A TMVA example in pyROOT

aholzner — Sat, 27 Aug 2011 07:53:00 +0000

This is an example showing how to use TMVA using python/pyROOT. TMVA is a toolkit for multivariate analysis in ROOT which is widely used in High Energy Physics data analysis.

The example code in this post is made available under the Apache License Version 2.0. If you want to experiment with the code examples shown, it’s best to do so in a newly created directory. It has been tested with ROOT 5.30/00 which includes TMVA 4.1.2 . Copying and pasting of the example code below directly into the interactive python shell is strongly discouraged as crashes of the python interpreter were observed when doing this. Also, the indentation is not taken into account correctly, leading to syntax errors. Copying the code into a python file and then running this file with python -i however should work.

Generating the samples

For simplicity and to allow for easy visualization, we will work in two dimensions. First, we’ll generate two samples (‘signal’ and ‘background’) drawn from Gaussian distributions with different means and fill them into a TNtuple:

import ROOT

# create a TNtuple
ntuple = ROOT.TNtuple("ntuple","ntuple","x:y:signal")

# generate 'signal' and 'background' distributions
for i in range(10000):
    # throw a signal event centered at (1,1)
    ntuple.Fill(ROOT.gRandom.Gaus(1,1), # x
                ROOT.gRandom.Gaus(1,1), # y
                1)                      # signal
    
    # throw a background event centered at (-1,-1)
    ntuple.Fill(ROOT.gRandom.Gaus(-1,1), # x
                ROOT.gRandom.Gaus(-1,1), # y
                0)                       # background

In order to visualize the generated distributions, we can do the following:

# keeps objects otherwise removed by garbage collected in a list
gcSaver = []

# create a new TCanvas
gcSaver.append(ROOT.TCanvas())

# draw an empty 2D histogram for the axes
histo = ROOT.TH2F("histo","",1,-5,5,1,-5,5)
histo.Draw()

# draw the signal events in red
ntuple.SetMarkerColor(ROOT.kRed)
ntuple.Draw("y:x","signal > 0.5","same")

# draw the background events in blue
ntuple.SetMarkerColor(ROOT.kBlue)
ntuple.Draw("y:x","signal <= 0.5","same")

In my case, the resulting plot looked like this:

where the red points correspond to ‘signal’ and the blue points correspond to ‘background’. We can start training a classifier which attempts to label individual points as ‘signal’ or ‘background’ based on the value of their coordinates x and y.

The following code is inspired by the standard TMVA classification example ($ROOTSYS/tmva/test/TMVAClassification.C). First, we’ll initialize TMVA and create a factory object:

ROOT.TMVA.Tools.Instance()

# note that it seems to be mandatory to have an
# output file, just passing None to TMVA::Factory(..)
# does not work. Make sure you don't overwrite an
# existing file.
fout = ROOT.TFile("test.root","RECREATE")

factory = ROOT.TMVA.Factory("TMVAClassification", fout,
                            ":".join([
                                "!V",
                                "!Silent",
                                "Color",
                                "DrawProgressBar",
                                "Transformations=I;D;P;G,D",
                                "AnalysisType=Classification"]
                                     ))

The parameters given to the constructor of ROOT.TMVA.Factory are described in section 3.1 of the current TMVA user’s guide. Verbosity is disabled with the !V (‘not verbose’) option while !Silent (‘not silent’) still allows for some level of reporting. Color enables the use of color and DrawProgressBar enables displaying the progress of training. Currently, some of these options actually correspond to the default setting so need not necessarily be specified. The values given to the Transformations parameter only affect testing and visualization, not the training (according to the manual).

Now we declare which variables should be used for classification (the ‘independent’ variables) and add the tree for signal and background events (which are stored in the same tree in this example, a cut on the variable signal is used to distinguish between signal and background events):

factory.AddVariable("x","F")
factory.AddVariable("y","F") 

factory.AddSignalTree(ntuple)
factory.AddBackgroundTree(ntuple)

# cuts defining the signal and background sample
sigCut = ROOT.TCut("signal > 0.5")
bgCut = ROOT.TCut("signal <= 0.5")

factory.PrepareTrainingAndTestTree(sigCut,   # signal events
                                   bgCut,    # background events
                                   ":".join([
                                        "nTrain_Signal=0",
                                        "nTrain_Background=0",
                                        "SplitMode=Random",
                                        "NormMode=NumEvents",
                                        "!V"
                                       ]))

The parameters given to PrepareTrainingAndTestTree are described in section 3.1.4 of the current TMVA user’s guide. Essentially, the available events are separated randomly into two equally large sets for training and testing and verbosity is disabled.

We then configure a classifier to learn to distinguish the two samples generated above. As an example, we use a boosted decision tree:

method = factory.BookMethod(ROOT.TMVA.Types.kBDT, "BDT",
                   ":".join([
                       "!H",
                       "!V",
                       "NTrees=850",
                       "nEventsMin=150",
                       "MaxDepth=3",
                       "BoostType=AdaBoost",
                       "AdaBoostBeta=0.5",
                       "SeparationType=GiniIndex",
                       "nCuts=20",
                       "PruneMethod=NoPruning",
                       ]))

factory.TrainAllMethods()
factory.TestAllMethods()
factory.EvaluateAllMethods()

The options (described in section 8.12.2 of the current TMVA user’s guide) can be summarized as follows:

the help text should not be printed
verbosity is disabled
850 trees should be used
leaf nodes must contain at least 150 events
the depth of the trees is limited to 3
adaptive boosting should be used
the Gini index should be used to select the best variable to be used in each tree node
20 steps should be used when scanning cuts on a variable and
no pruning of trees after they have been constructed should be applied.

Examining the trained classifier

After running this, TMVA should have created a file weights/TMVAClassification_BDT.weights.xml which contains the structure of trained classifier. In order to evaluate the classification function at arbitrary coordinates, we create an instance of ROOT.TMVA.Reader:

reader = ROOT.TMVA.Reader()

To calculate the value of the classifier for a given input coordinate (x,y), we first create two arrays (with one element each) for the variables x and y such that later on we can take (C++) references of the first element which we pass to the reader (see also this discussion on the ROOT bulletin board):

import array
varx = array.array('f',[0]) ; reader.AddVariable("x",varx)
vary = array.array('f',[0]) ; reader.AddVariable("y",vary)

Now that we have given the reader the variables x and y, we can read the weights file:

reader.BookMVA("BDT","weights/TMVAClassification_BDT.weights.xml")

In order to plot the BDT output as function of x and y, we can use the following code snippet:

# create a new 2D histogram with fine binning
histo2 = ROOT.TH2F("histo2","",200,-5,5,200,-5,5)

# loop over the bins of a 2D histogram
for i in range(1,histo2.GetNbinsX() + 1):
    for j in range(1,histo2.GetNbinsY() + 1):
        
        # find the bin center coordinates
        varx[0] = histo2.GetXaxis().GetBinCenter(i)
        vary[0] = histo2.GetYaxis().GetBinCenter(j)
        
        # calculate the value of the classifier
        # function at the given coordinate
        bdtOutput = reader.EvaluateMVA("BDT")
        
        # set the bin content equal to the classifier output
        histo2.SetBinContent(i,j,bdtOutput)

gcSaver.append(ROOT.TCanvas())
histo2.Draw("colz")

# draw sigma contours around means
for mean, color in (
    ((1,1), ROOT.kRed), # signal
    ((-1,-1), ROOT.kBlue), # background
    ):
    
    # draw contours at 1 and 2 sigmas
    for numSigmas in (1,2):
        circle = ROOT.TEllipse(mean[0], mean[1], numSigmas)
        circle.SetFillStyle(0)
        circle.SetLineColor(color)
        circle.SetLineWidth(2)
        circle.Draw()
        gcSaver.append(circle)

ROOT.gPad.Modified()

The output should look like this:

The overlaid ellipses are the one and two sigma contours around the mean signal and background. Note that these are ellipses because the x and y axes have a different scale. One can nicely see how the classifier approximates the Gaussian distributions with a set of trees representing rectangular regions.

Typically, one also wants to look at the distribution of the classifier output for signal and background, e.g. to get an idea how well the two samples can be separated (although the ROC curve is more suitable for comparing the performance of different classifiers). TMVA fills a tree for the events in the test sample which we can use for this purpose:

# fill histograms for signal and background from the test sample tree
ROOT.TestTree.Draw("BDT>>hSig(22,-1.1,1.1)","classID == 0","goff")  # signal
ROOT.TestTree.Draw("BDT>>hBg(22,-1.1,1.1)","classID == 1", "goff")  # background

ROOT.hSig.SetLineColor(ROOT.kRed); ROOT.hSig.SetLineWidth(2)  # signal histogram
ROOT.hBg.SetLineColor(ROOT.kBlue); ROOT.hBg.SetLineWidth(2)   # background histogram

# use a THStack to show both histograms
hs = ROOT.THStack("hs","")
hs.Add(ROOT.hSig)
hs.Add(ROOT.hBg)

# show the histograms
gcSaver.append(ROOT.TCanvas())
hs.Draw()

which for me produced the following output:

Enjoy !

Debugging unresponsive Java GUI applications

aholzner — Sat, 25 Jun 2011 11:53:09 +0000

You probably heard about it: one should not run time-consuming operations on the event dispatch thread in Java GUI applications but use SwingWorker instead to run lengthy tasks, possibly showing a dialog box once the task is completed. One typical case of running tasks on the event dispatch thread is in a callback to the user’s action such as pressing a button or selecting a menu item from a pop-up menu.

I’m working on an application which runs external commands very often and the running time of such commands can be several minutes. Wherever possible, commands are run only on demand and the states of the objects in the application are updated according to the command’s output. While care was taken to use SwingWorker in obvious places the application still became unresponsive (i.e. the windows were not repainted for minutes) because the event dispatching thread was waiting to lock an object which was locked by another thread running a lengthy task (an external command in this case). In the following I describe how I found such locks with the standard Java debugger.

Starting the program and the debugger

The application I was working on was packaged into a single .jar file. So instead of starting the program with

java -jar myapp.jar

I started the Java virtual machine and my application and made them listen to connections from a debugger:

java -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=localhost:10101 -jar myapp.jar

This instructs the Java virtual machine to bind to socket 10101 on localhost to listen to a connection from jdb. The suspend=n part instructs it to start program execution immediately.

WARNING: as far as I understood is there no authentication involved when connecting with jdb (at least not when running it as described above). In other words: all users having access to this machine can also connect to the process you are debugging and control it !

On a second console, I started the Java debugger:

jdb -attach localhost:10101

The command help on the debugger console will give a short list of all commands understood by jdb.

Finding the locks

I waited until the GUI froze. At that point, I did:

suspend

to suspend all threads. I got a quick overview of the state of all threads by using the command threads:

Group system:
  (java.lang.ref.Reference$ReferenceHandler)0x832 Reference Handler           cond. waiting
  (java.lang.ref.Finalizer$FinalizerThread)0x831  Finalizer                   cond. waiting
  (java.lang.Thread)0x830                         Signal Dispatcher           running
  (java.lang.Thread)0x82f                         Java2D Disposer             cond. waiting
  (java.lang.Thread)0x901                         TimerQueue                  cond. waiting
Group main:
  (java.lang.Thread)0x82e                         AWT-XAWT                    running
  (java.lang.Thread)0x82d                         AWT-Shutdown                cond. waiting
  (java.awt.EventDispatchThread)0x82c             AWT-EventQueue-0            waiting in a monitor
  (java.lang.Thread)0x82b                         pool-2-thread-1             cond. waiting
  (java.lang.Thread)0x82a                         pool-1-thread-1             cond. waiting
  (java.lang.Thread)0x829                         pool-1-thread-2             cond. waiting
  (java.lang.Thread)0x828                         pool-1-thread-3             cond. waiting
  (java.lang.Thread)0x827                         pool-1-thread-4             cond. waiting
  (java.lang.Thread)0x826                         pool-1-thread-5             cond. waiting
  (java.lang.Thread)0x825                         pool-1-thread-6             cond. waiting
  (java.lang.Thread)0x824                         pool-1-thread-7             cond. waiting
  (java.lang.Thread)0x823                         pool-1-thread-8             cond. waiting
  (java.lang.Thread)0x8db                         DestroyJavaVM               running
  (java.lang.Thread)0x9cd                         SwingWorker-pool-3-thread-1 cond. waiting
  (java.lang.Thread)0x9d6                         SwingWorker-pool-3-thread-2 waiting in a monitor
  (java.lang.Thread)0x9d8                         SwingWorker-pool-3-thread-3 cond. waiting
  (java.lang.UNIXProcess$1$1)0x9d9                process reaper              running
  (java.lang.Thread)0x9da                         SwingWorker-pool-3-thread-4 waiting in a monitor
  (java.lang.Thread)0x9db                         SwingWorker-pool-3-thread-5 waiting in a monitor
  (java.lang.Thread)0x9dc                         SwingWorker-pool-3-thread-6 waiting in a monitor
  (java.lang.Thread)0x9dd                         SwingWorker-pool-3-thread-7 cond. waiting
  (java.lang.UNIXProcess$1$1)0x9de                process reaper              running
  (java.lang.Thread)0x9df                         SwingWorker-pool-3-thread-8 cond. waiting
  (java.lang.UNIXProcess$1$1)0x9e0                process reaper              running
  (java.lang.UNIXProcess$1$1)0x9e1                process reaper              running

The important thing to note is the line:

  (java.awt.EventDispatchThread)0x82c             AWT-EventQueue-0            waiting in a monitor

Which means that the event dispatch thread is actually waiting to obtain a lock on an object — but which one ? First of all, I wanted to know where (in the source code) the thread was waiting. I could print the stack trace of this thread by doing:

where 0x82c

(0x82c is the thread id obtained from the above output). This gave me something like:

[1] MyTableModel.getValueAt (MyTableModel.java:136)
[2] javax.swing.JTable.getValueAt (JTable.java:2,686)
[3] javax.swing.JTable.prepareRenderer (JTable.java:5,703)
...

So indeed, the thread was stuck somewhere in my code. Looking at the corresponding line of source code, I had to go up a few lines to find the variable (the field myList of the class MyTableModel) in the synchronized statement which I was locking on.

However, I needed also to find out which other thread currently holds the lock on this variable in order to solve the problem. I could do this by doing running the following command in jdb:

lock this.myList

(where the jdb prompt showed me that the current thread was AWT-EventQueue0 because of the previous commands and the current stack frame was 1, so no need to set the current thread or stack frame). The output of the above command was:

com.sun.tools.example.debug.expr.ParseException: Unable to complete expression. Thread not suspended for method invoke
Owned by: SwingWorker-pool-3-thread-1, entry count: 1
Waiting thread: SwingWorker-pool-3-thread-2
Waiting thread: AWT-EventQueue-0

again, from the output of the threads command (see above) I could find the thread id of the thread SwingWorker-pool-3-thread-1 which owned the lock on myList. A quick

where 0x9cd

revealed that indeed the lock on myList was held by a thread which was waiting for an external command to complete:

[1] java.lang.Object.wait (native method)
[2] java.lang.Object.wait (Object.java:485)
[3] java.lang.UNIXProcess.waitFor (UNIXProcess.java:165)
[4] Utils.runCommand (Utils.java:102)
...

and I could also easily find the synchronized statement which acquired the lock on myList.

Facebook Hacker Cup 2011 Round 2 Problem 3 – some discussion

aholzner — Tue, 08 Feb 2011 21:21:00 +0000

Facebook’s Hacker Cup online round 2 took place some time ago. This article discusses solutions to problem 3 ‘Bonus assignments’ (the original problem statement can be found by going to http://www.facebook.com/hackercup and following the corresponding links there). Essentially, one had to count the possibilities to choose N numbers a₁…a_N (where N can range from one to one million) such that:

the smallest of these numbers must be between two given limits A and B while the largest number must be in the range C to D. Obviously, the other N-2 numbers must be between the smallest and largest number. Note that each of these boundaries A,B,C,D may be as large as one million.
at least one of the N numbers must have a non-zero remainder when divided by an unknown integer number P which is greater than one.

The brute force approach would be as follows:

generate all possible sets of N numbers between A and D
- check if the set satisfies requirement 1
- for each accepted set, loop over all possible values P and
  - check whether for each value P at least one a_i fulfills condition 2. If yes count this set in.

The number of sets to generate and investigate is (D-A+1)^N. For each set we must verify that the minimum element is in the range A..B and the maximum element is in the range C..D. Then, for each set, we have to scan over the number P we have to test the remainder of all N numbers after division by P, so the time complexity of the brute force approach is O((D-A+1)^N * D * D). Assuming the most unfavorable values for the parameters we end up with 1’000’000^{1’000’002} = 10^{6’000’012} tests. Comparing this number to the estimated number of atoms in the universe we’d rather enumerate all atoms before trying this brute force approach. And we conclude that there must be a smarter way than generating all possible sets of numbers and test them. Nevertheless, it is instructive to implement the brute force solution to check (with the examples given along with the problem statement) if one has understood the problem.

A first simplification can be achieved by trying to get rid of the ‘asymmetric’ constraints on the minimum and maximum elements. If we could calculate the number of bonus assignments by replacing condition 1 by a simpler constraint, e.g. just one common range A..D for all elements (i.e. the smallest element would not have to be smaller or equal to C) our problem would be simpler. Let’s call this number f(A,D).

We can visualize the pairs of minimum and maximum values of a tuple a₁…a_N as points in the leftmost triangle of the following figure:

The shaded area in the leftmost part of the figure corresponds to those tuples which have their smallest value (vertical axis) in the range A..B and their largest value (horizontal axis) in the range C..D. As indicated in the figure can we write this shaded area as sum and difference of triangular areas which are calculable with by function f, namely:

f(A,D), represented by the first triangle after the equal sign.
We have also counted those assignments for which the largest value is below C, we thus must subtract f(A,C-1), corresponding to the second triangle after the equal sign
similarly we also included those assignments for which the smallest value is above C, we thus must subtract f(B+1,D), the third triangle after the equal sign
we have subtracted the number those assignments which have the lowest value ≥ B+1 and the highest value ≤ C-1 twice, so we must add f(B+1,C-1) again (indicated by the rightmost triangle)

We have simplified the problem to finding all possible assignments of N values which are all in the range A to D.

In order to get closer to finding a solution, let’s first make another simplification and look at the case N = 1. Our task is to cross out all numbers in the range A..D which are integer multiples of any value P greater than 2 (and less than D). So one would start by crossing out all integer multiples of 2, then all integer multiples of 3 as shown in the following diagram:

In this diagram, the numbers being ‘crossed out’ are those on the horizontal axis and the numbers whose multiples we cross out are shown on the vertical axis. Each time a number is ‘crossed out’, a circle is placed in the corresponding column. Prime numbers are shown in red on the vertical axis.

We start by crossing out multiples of 2. Then we cross out those of 3. We notice that we don’t need to cross the multiples of 4 because it is a multiple of 2 and we have crossed out all multiples of 4 already when crossing out all multiples of 2. Indeed, the fact that 4 has been crossed out already when we arrive there tells us that all its multiples have been crossed out already. So as a first generalization, we could think of counting all integer multiples of prime numbers in the range A..D and add all counts together. Numbers like 4 are shown in gray in the figure.

However, we also notice that when we crossed out all multiples of 2 and of 3 we also crossed out multiples of 6 in both cases ! This would mean that by counting the multiples of 2 and adding the number of multiples of 3 we would have double counted all multiples of 6. We thus must subtract the number of multiples of 6 in A..D. Such numbers are shown in blue in the figure.

So we add another rule: for each pair of (distinct) primes p and q we must subtract the number of integer multiples of p*q in the range A..D. Note that we only consider products of distinct primes. For example multiples of p*p are a subset of the multiples of p but are not a subset of the multiples of q, i.e. not all of them are in the overlap of the multiples of p and the multiples of q. The multiples of p*p*q and p*q*q and p*p*q*q etc. on the other hand are already contained in the multiples of p*q.

How many times do we count 30 ? Let’s consider all numbers which divide 30:

30 is a multiple of 2 (which is prime), so we counted it when counting multiples of 2
30 is a multiple of 3 (which is prime), so we counted it again when counting multiples of 3
30 is a multiple of 5 (which is prime), so we counted it again when counting multiples of 5
30 is a multiple of 6, so we have subtracted one from the count when counting multiples of 2 * 3
30 is a multiple of 10, so we have subtracted one from the count when counting 2 * 5
30 is a multiple of 15, so we have subtracted one from the count when counting 3 * 5

Which leaves us with zero at the end, indicating that 30 would not contribute to the number of numbers (in A..D) which are divided by any number P in the range 2..D. But this is not true, 30 is divided by several numbers as we just saw. We can fix this by adding another rule for multiples of three distinct primes (2,3 and 5 in this case) we must (again) add one to the count of a number (the reason for which 15 is shown in green).

We start to see a pattern:

count the number of integer multiples in A..D of all prime numbers P (P ≤ D)
subtract the number of integer multiples in A..D of all pairs of products of prime numbers (P ≤ D)
add the number of integer multiples in A..D of all triples of products of prime numbers (P ≤ D)
…
add the number of integer multiples in A..D of all k-tuples of products of prime numbers (P ≤ D) if k is odd
subtract the number of integer multiples in A..D of all k-tuples of products of prime numbers (P ≤ D) if k is even

By the way the primes and products of distinct prime numbers are (apart from the number 1) exactly the set of square-free integers.

To put things on a more rigorous foundation we notice that we actually try to determine the size of a set (the number of values in A..D which can be written as integer multiple of any P ≥ 2) which we can write as a union of overlapping subsets (namely multiples of one or more primes). Let S_i denote the set of integer multiples in the range A..D of the i’th prime. To calculate the length of the union, we use the Inclusion-Exclusion principle for set unions, which reads:

The intersections are simply the multiples of the products of the (distinct) primes i₁ * … * i_k. We also see that there is a factor (-1)^k-1 which is positive if the number k of distinct primes is odd and negative if k is even, as we suspected above. Incidentally, this is exactly the negative of the Möbius function, which we denote as μ here.

We can tabulate this function for the values 2..D with an algorithm similar to the Sieve of Eratosthenes:

we initialize the table of the values of μ with all ones
whenever we encounter a prime number, we set the corresponding value in the table to -1 (a prime number is the product of an odd number of distinct primes, namely just the ‘product’ of itself).
- We flip the sign of all multiples of the prime. For a number which is the product of k distinct primes, this will happen k times and thus at the end the corresponding value in the table will have a value of (-1)^k. For any multiple of p which is divisible more than once by p (i.e. is not square free), we set the value of μ to zero (and future sign flips will thus leave this value at zero).

The time complexity of the Sieve of Eratosthenes (and thus also of this method of computing the values of the Möbius function) is O(D * log(log(D))) which is growing only slightly faster than linear.

By the way, you certainly already have noticed that for N=1 all numbers in A..D can be written as multiples of a number P ≥ 2, i.e. there will always be zero possible bonus assignments (unless 1 is part of the allowed range for numbers). No matter what number a₁ one chooses, if price the P happens to be a₁, the remainder of a₁ after division by P is zero. In fact, for N dimensions, an N-tuple only needs at least two numbers which are coprime (i.e. have no common non-trivial divisor) which ‘help each other’, i.e. even if the price is set to one of these two coprimes, the other ‘protects’ the tuple. The values of the other N-2 members of the tuple can be any values from the allowed range. However, to correctly count all possible assignments while avoiding double counting (some of the other N-2 members can also have the same value as one of these two coprimes and therefore the number of permutations leading to distinct sets is difficult to count) makes this a tedious task to say the least.

We still need to find an expression for the number of multiples of P in the range A to D (inclusive). This can be derived from the following arguments: let’s first simplify the task again to counting all multiples of P ≤ D. We must be careful with the ‘edge case’ if D is an integer multiple of P:

if D is an integer multiple of P, the number of multiples of P smaller than or equal to D is exactly D/P
if D is not an integer multiple of P, the largest multiple of P smaller than D is P * ⌊ D/P ⌋ and thus the number of integer multiples of P smaller than D is ⌊ D/P ⌋.

One sees that for the first bullet D/P is equal to ⌊ D/P ⌋ so the number of multiples of P less than or equal to D is ⌊ D/P ⌋. To count the number of integer multiples in the range A to D, we subtract the number of multiples of P less than or equal to (A-1) from the number of multiples of P less than or equal to D or in other words: ⌊ D/P ⌋ – ⌊ (A-1)/P ⌋ .

One can compare this to what one gets from the application of Theorem 1 of the paper at http://arxiv.org/abs/1002.3254.

To summarize the case N=1 we can write the number of values in A..D which are an integer multiple of at least one number P (2≤ P ≤ D) as:

How to generalize this to N dimensions ? One would first be tempted to enumerate all allowed N-tuples by trying to enumerate all those for which there is at least one a_i which is not divided by a number P. However, as we pointed out above, this is difficult to deal with, the ‘at least’ being a contributor to this difficulty. It is easier to count the complement, namely all tuples for which a number P divides all elements of the tuple. These tuples are the ‘Cartesian power’ of the set of multiples of P in the range A..D. Or in other words, for each a_i we are free to select any integer multiple of P which is in the range A..D and repetitions of values among a₁..a_N are allowed. Thus the number of N-tuples for which P divides all elements is simply (⌊ D/P ⌋ – ⌊ (A-1)/P ⌋)^N.. We can add up these terms for all values of P under consideration like in the one-dimensional case.

To summarize: for f(A,D) we have the following expression:

And the value sought is (f(A,D) – f(A,C-1) – f(B+1,D) + f(B+1,C+1)) mod 1000000007 as discussed above.

Looking at the time complexity, we note that the above sum involves powers of N (which can be large). An efficient method to calculate such powers is exponentiation by squaring which has O(log(N)) time complexity. The complexity of the sum is therefore O(D * log(N)) and the overall asymptotic behaviour (when taking into account the calculation of the Möbius function) is O(D * (log(N) + log(log(D)))).

Facebook Hacker Cup 2011 Round 2 Problem 1 – some discussion

aholzner — Mon, 07 Feb 2011 12:23:22 +0000

(see also the discussion of problem 3)

Facebook’s Hacker Cup online round 2 took place recently. The first problem (‘Scott’s New Trick’) was about counting products of elements of two series which are smaller than a given value L, modulo a given prime number P (see http://www.facebook.com/hackercup and links therein to see the exact problem statement). These two series were defined as linear homogeneous recurrence relations with constant coefficients.

The brute force approach consists of explicitly calculating all products of the two series (which have length M and N respectively), resulting in a complexity of O(M*N). With M and N allowed to go to ten million, M*N can be as high as 10¹⁴. Even if one was able to test one pair per processor clock cycle, it would still take 10⁵ seconds (more than a day) with a 1 GHz processor with this approach in the worst case.

As we are performing operations on a finite field GF(P), one could imagine that the series of values starts repeating at some point and thus one would not have to calculate pairs of products of all elements of a series but only look at the elements of one period.

However, each element a_i was defined in terms of its two preceding elements a_i-1 and a_i-2 so there are P*P different states (rather than P) when generating the series and thus the period of the two series is bounded by P². With a maximum value of 250’000 for P, P² is 62.5 billions in the worst case, corresponding to 62.5 seconds worth of CPU cycles on a 1 GHz processor. Moreover, the two series a and b need not have the same period and thus one would have to consider the least common multiple in terms of number of pairs of the two periods, which is O(P⁴) in the worst case and thus not feasible.

A more elegant idea is to just count the number of occurrences of each of the possible P values in each series a and b, test for all possible products of integers a_i, b_j smaller than P whether they satisfy the condition (to be less than the given number L) and then weight the accepted products with (number of occurrences of a_i) * (number of occurrences of b_j). Such an approach reduces the complexity from O(N*M) to O(M+N+P²). However, this is still at least at the limit for the calculation of 20 test cases within six minutes if not impossible.

A more efficient solution makes use of discrete logarithms. In other words, we write each integer x_i as x_i = g^y_i where y_i is called the discrete logarithm of x_i with respect to the basis g. g is must be a primitive root modulo P to ensure that for any integer x_i ∈ [0..P-1] there is a value y_i which satisfies this equation. It takes O(P) time to calculate a map of all x_i to their logarithms y_i by simply calculating the powers 0..P-1 of g by repeated multiplication. Thus we can reformulate our problem from:

find all pairs of elements of the field GF(P) whose product is equal to a given value V

find all pairs of elements of the field GF(P) whose sum is equal to a given value log_gV

where we would vary V from 0 to L-1. Note that taking the logarithm of each element in GF(P) simply corresponds to a reordering of the elements which we have to take into account when considering how often an integer appears in the sequences a and b (see above).

What do we gain from this transformation ? When writing down the transformed task as an equation, we want to calculate:

where w_i and v_i are the number of occurrences of element g^w_i in the series a and g^v_i in the series b, respectively. u_k is thus the sum of the products of all weights (i.e. number of occurrences) associated to pairs whose product is g^{i + (k-i)} = g^k. k is the discrete logarithm of V (introduced above) with respect to the base g.

Naturally one would think that one would need to calculate P products for these P equations (k ranges from 0 to P-1), thus still requiring O(P²) operations. However, looking at it a bit more carefully, one realizes that this in fact a discrete (circular) convolution. And for this task, fast convolution algorithms exist which have O(P * log(P)) time complexity. One common approach for fast convolution is to transform the two sequences into Fourier space (using a fast Fourier transform) where the convolution of the series corresponds to the product of the transformed series and transform them back to the original space (using the inverse transform).

Another possibility is to use a Karatsuba like algorithm for convolution which has time complexity O(P^1.585). This algorithm is designed to multiply two numbers written as x_n * bⁿ+…+x₀ * b⁰ and y_n * bⁿ+…+y₀ * b⁰ where the x_i and y_i are the ‘digits’ and b is the radix. On the other hand, multiplying these two sums is this equivalent to finding the convolution of the series of coefficients x and y.

Using Jython as a configuration language for Java programs

aholzner — Fri, 26 Nov 2010 21:18:18 +0000

I recently tried using Jython as a configuration language for a Java program. In fact, this was inspired by the fact that the main C++ project I’m working with at work has switched to python as a configuration language some time ago. This has several advantages:

no need to invent a configuration language from scratch
no need to write a parser for such a language
some users might already know python and thus they will not need to learn a new language (and python/jython is also useful beyond writing configuration files for the program)
the power of a programming language:
- conditional statements
- looping
- generate configurations depending on parameters, contents of external files, environment variables etc.

An example configuration file could then look like:

input_files = [ "/etc/fstab", "/etc/mtab" ]
output_file = "/tmp/test.txt"

but we can also use some more complex logic inside the configuration file:

import glob, time
input_files = glob.glob("/etc/*tab")
output_file = "/tmp/test-" + time.strftime("%Y-%m-%d") + ".txt"

This example should give you an idea how powerful the concept is. In fact, whatever is possible in Jython is also possible to do in the configuration file.

It is often natural to group related parameters into a class, such as InputFile in the following example configuration:

input_files = [
  InputFile(name = "/etc/fstab", startLine = 2, endLine = 5, commentChar = '#'),
  InputFile(name = "/etc/mtab", endLine = 3),
  InputFile(name = "/etc/crontab"),
]

The underlying Java implementation of InputFile could look as follows:

public class InputFile
{
  ...

  /** Keyword arguments constructor. */
  public InputFile(PyObject values[], String names[])
  {
    System.out.println("called kwargs constructor: names=" +
      Arrays.asList(names) + " values=" + Arrays.asList(values));
     // store the given values in fields
    ...
  }
}

Note that keyword arguments constructors (requested here) are only supported in recent Jython releases (since 2.5.2-b2 according to the release notes). They are a bit more complicated to implement on the Java side than classic constructors but the use of keyword arguments in the configuration file makes it much more readable.

In order to read a configuration from a Jython file one needs to embed a Jython interpreter into the application. For this, it is necessary to add jython.jar from the Jython installation to the corresponding Java project.

The code for reading the configuration might look like this:

public class JythonConfigReader
{

  public JythonConfigReader(String input_fname)
  {
    // instantiate a Jython interpreter
    PythonInterpreter interpreter = new PythonInterpreter();

    // import configuration objects package in the Jython interpreter
    interpreter.exec("from my.package.with.configuration.classes import InputFile");

    // read the input file with the Jython interpreter
    // note that this will throw an exception if the python
    // code is not valid
    interpreter.execfile(input_fname);

    // get local variables
    PyStringMap locals = (PyStringMap) interpreter.getLocals();

    if (! locals.has_key("input_files"))
      throw new IllegalArgumentException("parameter input_files not found");

    // get the list of input files
    PyObject input_files = locals.__getitem__("input_files");

    if (! (input_files instanceof PyList))
      throw new IllegalArgumentException("parameter input_files is not a list");

    PyList input_files_list = (PyList) input_files;
    Iterator iter = input_files_list.iterator();

    // loop over all elements of the list
    while (iter.hasNext())
    {
      Object obj = iter.next();

      if (! (obj instanceof InputFile))
        throw new IllegalArgumentException("input_files contains objects of type other than InputFile");

      System.out.println("got an input file");

      // process this input file
      ...

    } // loop over all elements of input_files
  }
}

One potential danger of using Jython as a configuration language is that if one only looks at the names of the known parameters in locals() (such as input_files in the above example), one might miss mis-spelled optional parameters. To prevent such problems to some extent one can define a top-level configuration class and add other configuration objects as members, e.g. like:

config = Configuration()
config.input_files = [ InputFile(...), InputFile(...) ]

and then insist that there is only one top level Configuration object.