# Uncategorized

## Printing a tensor’s shape in Theano

While developing code with Theano I got exceptions that the shapes of the quantities given to an operator do not match.

Searching around, one finds examples how to print the values of the elements of tensors (e.g. here) but not necessarily their shape:

from theano import tensor as T, function, printing x = T.dvector() printing_op = printing.Print('vector') printed_x = printing_op(x) f = function([x], printed_x) result = f([1, 2, 3])

which yields:

vector __str__ = [ 1. 2. 3.]

While this works nicely for small and few dimensions, inferring the shape from a printout of a large tensor is tedious…

Just printing x.shape where x is a Theano tensor does not give the information we look for. Consider the following example:

from theano import tensor as T, function, printing x = T.dvector() def myfunc(x): print "x.shape=",x.shape return x f = function([x], myfunc(x)) result = f([1, 2, 3])

which prints the following:

x.shape= Shape.0

which is usually not what we want.

It turns out however that `theano.printing.Print(..)`

can print attributes of a tensor when given a list of attributes to be printed to the `attrs`

parameter of its constructor. To print the shape we can therefore do:

from theano import tensor as T, function, printing x = T.dvector() printing_op = printing.Print('vector', attrs = [ 'shape' ]) printed_x = printing_op(x) f = function([x], printed_x) result = f([1, 2, 3])

which then gives the desired printout:

vector shape = (3,)

This also works for other attributes like `min`

or `max`

(which can be useful to check the validity of the values in a tensor) etc.

(the above example code is admittedly inspired by examples found on deeplearning.net)

**Read Full Post**|

**Make a Comment**( None so far )

## An illustration of Minuit minimizing a function

Here is an animation of how the function minimizer Minuit minimizes a function. The function to be minimized depends on two parameters so that one can visualize it. It was inspired by the ‘banana shaped valley’ function so that Minuit must take several steps to find the minimum. The value of the goal function is indicated by the color, i.e. the goal function is a ‘spiral’ whose depth increases as one goes more counterclockwise.

The types of the steps (shown at the top left) were inferred from the functions from which it was called. The minimization typically has two alternating phases: numerical determination of the gradient and line search along some promising direction. At the end, the second derivative is calculated (although the function has a discontinuity on one side).

The red point shows the coordinates with which Minuit calls a goal function to be minimized.

The arrow at the top left shows the direction from the previous to the current step. For example during the line search phase, the direction remains mostly fixed or is reversed while during the gradient calculation phase it looks like first the horizontal component is calculated and then the vertical one.

**Read Full Post**|

**Make a Comment**( None so far )

## A world map of particle physics

not yet complete but growing (click on the map to follow the link to the interactive map):

**Read Full Post**|

**Make a Comment**( None so far )

## RooFitExplorer

Here is a project which I started to more easily navigate RooFit workspaces:

https://bitbucket.org/andreh/roofitexplorer/wiki/Home

This is a ‘interactive, graphical viewer for RooFit workspaces’. It still in very alpha stage but maybe useful to others (use at your own risk). Constructive feedback and test cases are welcome !

Here is a screenshot:

**Read Full Post**|

**Make a Comment**( None so far )

## A list of follow-ups on the recent CNGS/Opera result on the neutrino velocity

As found on hep-ex on arXiv (no guarantee that this list is exhaustive, last updated 2011-10-06):

- arXiv:1110.1162: Probing neutrino masses with neutrino-speed experiments
- arXiv:1110.0821: Testing the OPERA Superluminal Neutrino Anomaly at the LHC
- arXiv:1110.0736: Resolving 7 problems with OPERA’s superluminal neutrino experiment
- arXiv:1110.0644: A classical model explaining the OPERA velocity paradox
- arXiv:1110.0595: Is there a neutrino speed anomaly?
- arXiv:1109.6667: Constraints and tests of the OPERA superluminal neutrinos
- arXiv:1110.0424: How large is the fraction of superluminal neutrinos at OPERA?
- arXiv:1110.0351: OPERA, SN1987a and energy dependence of superluminal neutrino velocity
- arXiv:1110.0243: Using an Einstein’s idea to explain OPERA faster than light neutrinos.
- arXiv:1110.0241: Superluminal Neutrinos at OPERA Confront Pion Decay Kinematics
- arXiv:1110.0239: A simple explanation of OPERA results without strange physics
- arXiv:1109.6631: Superluminal Neutrinos and a Curious Phenomenon in the Relativistic Quantum Hamilton-Jacobi Equation
- arXiv:1109.6624: Superluminal neutrino and spontaneous breaking of Lorentz invariance
- arXiv:1109.6562: New Constraints on Neutrino Velocities
- arXiv:1109.6354: Neutrino Shortcuts in Spacetime
- arXiv:1109.6296: On the Possibility of Superluminal Neutrino Propagation
- arXiv:1109.6238: Comparison of muon and neutrino times from decays of mesons in the atmosphere
- arXiv:1109.6097: Neutrino speed anomaly as a signal of Lorentz violation
- arXiv:1109.5924: Mass-dependent Lorentz Violation and Neutrino Velocity
- arXiv:1109.5749: Superluminal neutrinos at the OPERA?
- arXiv:1109.5727: A possible statistical mechanism of anomalous neutrino velocity in OPERA experiment?
- arXiv:1109.5721: A comment on the OPERA result and CPT
- arXiv:1109.5671: OPERA’s superluminal muon-neutrino velocity and an FPS-type model of Lorentz violation
- arXiv:1109.5599: Comments on the recent velocity measurement of the muon neutrinos by the OPERA Collaboration
- arXiv:1109.5368: Inconsistence of super-luminal Opera neutrino speed with SN1987A neutrinos burst and with flavor neutrino mixing
- arXiv:1109.4980: Superluminal neutrinos in long baseline experiments and SN1987a

Here is the original OPERA preprint: arXiv:1109.4897: Measurement of the neutrino velocity with the OPERA detector in the CNGS beam

**Read Full Post**|

**Make a Comment**( None so far )

## A TMVA example in pyROOT

This is an example showing how to use TMVA using python/pyROOT. TMVA is a toolkit for multivariate analysis in ROOT which is widely used in High Energy Physics data analysis.

The example code in this post is made available under the Apache License Version 2.0. If you want to experiment with the code examples shown, it’s best to do so in a newly created directory. It has been tested with ROOT 5.30/00 which includes TMVA 4.1.2 . Copying and pasting of the example code below directly into the interactive python shell is **strongly discouraged** as crashes of the python interpreter were observed when doing this. Also, the indentation is not taken into account correctly, leading to syntax errors. Copying the code into a python file and then running this file with `python -i ` however should work.

## Generating the samples

For simplicity and to allow for easy visualization, we will work in two dimensions. First, we’ll generate two samples (‘signal’ and ‘background’) drawn from Gaussian distributions with different means and fill them into a TNtuple:

import ROOT # create a TNtuple ntuple = ROOT.TNtuple("ntuple","ntuple","x:y:signal") # generate 'signal' and 'background' distributions for i in range(10000): # throw a signal event centered at (1,1) ntuple.Fill(ROOT.gRandom.Gaus(1,1), # x ROOT.gRandom.Gaus(1,1), # y 1) # signal # throw a background event centered at (-1,-1) ntuple.Fill(ROOT.gRandom.Gaus(-1,1), # x ROOT.gRandom.Gaus(-1,1), # y 0) # background

In order to visualize the generated distributions, we can do the following:

# keeps objects otherwise removed by garbage collected in a list gcSaver = [] # create a new TCanvas gcSaver.append(ROOT.TCanvas()) # draw an empty 2D histogram for the axes histo = ROOT.TH2F("histo","",1,-5,5,1,-5,5) histo.Draw() # draw the signal events in red ntuple.SetMarkerColor(ROOT.kRed) ntuple.Draw("y:x","signal > 0.5","same") # draw the background events in blue ntuple.SetMarkerColor(ROOT.kBlue) ntuple.Draw("y:x","signal <= 0.5","same")

In my case, the resulting plot looked like this:

where the red points correspond to ‘signal’ and the blue points correspond to ‘background’. We can start training a classifier which attempts to label individual points as ‘signal’ or ‘background’ based on the value of their coordinates x and y.

The following code is inspired by the standard TMVA classification example ($ROOTSYS/tmva/test/TMVAClassification.C). First, we’ll initialize TMVA and create a factory object:

ROOT.TMVA.Tools.Instance() # note that it seems to be mandatory to have an # output file, just passing None to TMVA::Factory(..) # does not work. Make sure you don't overwrite an # existing file. fout = ROOT.TFile("test.root","RECREATE") factory = ROOT.TMVA.Factory("TMVAClassification", fout, ":".join([ "!V", "!Silent", "Color", "DrawProgressBar", "Transformations=I;D;P;G,D", "AnalysisType=Classification"] ))

The parameters given to the constructor of `ROOT.TMVA.Factory` are described in section 3.1 of the current TMVA user’s guide. Verbosity is disabled with the `!V` (‘not verbose’) option while `!Silent` (‘not silent’) still allows for some level of reporting. `Color` enables the use of color and `DrawProgressBar` enables displaying the progress of training. Currently, some of these options actually correspond to the default setting so need not necessarily be specified. The values given to the `Transformations` parameter only affect testing and visualization, not the training (according to the manual).

Now we declare which variables should be used for classification (the ‘independent’ variables) and add the tree for signal and background events (which are stored in the same tree in this example, a cut on the variable `signal` is used to distinguish between signal and background events):

factory.AddVariable("x","F") factory.AddVariable("y","F") factory.AddSignalTree(ntuple) factory.AddBackgroundTree(ntuple) # cuts defining the signal and background sample sigCut = ROOT.TCut("signal > 0.5") bgCut = ROOT.TCut("signal <= 0.5") factory.PrepareTrainingAndTestTree(sigCut, # signal events bgCut, # background events ":".join([ "nTrain_Signal=0", "nTrain_Background=0", "SplitMode=Random", "NormMode=NumEvents", "!V" ]))

The parameters given to `PrepareTrainingAndTestTree` are described in section 3.1.4 of the current TMVA user’s guide. Essentially, the available events are separated randomly into two equally large sets for training and testing and verbosity is disabled.

We then configure a classifier to learn to distinguish the two samples generated above. As an example, we use a boosted decision tree:

method = factory.BookMethod(ROOT.TMVA.Types.kBDT, "BDT", ":".join([ "!H", "!V", "NTrees=850", "nEventsMin=150", "MaxDepth=3", "BoostType=AdaBoost", "AdaBoostBeta=0.5", "SeparationType=GiniIndex", "nCuts=20", "PruneMethod=NoPruning", ])) factory.TrainAllMethods() factory.TestAllMethods() factory.EvaluateAllMethods()

The options (described in section 8.12.2 of the current TMVA user’s guide) can be summarized as follows:

- the help text should not be printed
- verbosity is disabled
- 850 trees should be used
- leaf nodes must contain at least 150 events
- the depth of the trees is limited to 3
- adaptive boosting should be used
- the Gini index should be used to select the best variable to be used in each tree node
- 20 steps should be used when scanning cuts on a variable and
- no pruning of trees after they have been constructed should be applied.

## Examining the trained classifier

After running this, TMVA should have created a file `weights/TMVAClassification_BDT.weights.xml` which contains the structure of trained classifier. In order to evaluate the classification function at arbitrary coordinates, we create an instance of `ROOT.TMVA.Reader`:

reader = ROOT.TMVA.Reader()

To calculate the value of the classifier for a given input coordinate (x,y), we first create two arrays (with one element each) for the variables x and y such that later on we can take (C++) references of the first element which we pass to the reader (see also this discussion on the ROOT bulletin board):

import array varx = array.array('f',[0]) ; reader.AddVariable("x",varx) vary = array.array('f',[0]) ; reader.AddVariable("y",vary)

Now that we have given the reader the variables x and y, we can read the weights file:

reader.BookMVA("BDT","weights/TMVAClassification_BDT.weights.xml")

In order to plot the BDT output as function of x and y, we can use the following code snippet:

# create a new 2D histogram with fine binning histo2 = ROOT.TH2F("histo2","",200,-5,5,200,-5,5) # loop over the bins of a 2D histogram for i in range(1,histo2.GetNbinsX() + 1): for j in range(1,histo2.GetNbinsY() + 1): # find the bin center coordinates varx[0] = histo2.GetXaxis().GetBinCenter(i) vary[0] = histo2.GetYaxis().GetBinCenter(j) # calculate the value of the classifier # function at the given coordinate bdtOutput = reader.EvaluateMVA("BDT") # set the bin content equal to the classifier output histo2.SetBinContent(i,j,bdtOutput) gcSaver.append(ROOT.TCanvas()) histo2.Draw("colz") # draw sigma contours around means for mean, color in ( ((1,1), ROOT.kRed), # signal ((-1,-1), ROOT.kBlue), # background ): # draw contours at 1 and 2 sigmas for numSigmas in (1,2): circle = ROOT.TEllipse(mean[0], mean[1], numSigmas) circle.SetFillStyle(0) circle.SetLineColor(color) circle.SetLineWidth(2) circle.Draw() gcSaver.append(circle) ROOT.gPad.Modified()

The output should look like this:

The overlaid ellipses are the one and two sigma contours around the mean signal and background. Note that these are ellipses because the x and y axes have a different scale. One can nicely see how the classifier approximates the Gaussian distributions with a set of trees representing rectangular regions.

Typically, one also wants to look at the distribution of the classifier output for signal and background, e.g. to get an idea how well the two samples can be separated (although the ROC curve is more suitable for comparing the performance of different classifiers). TMVA fills a tree for the events in the test sample which we can use for this purpose:

# fill histograms for signal and background from the test sample tree ROOT.TestTree.Draw("BDT>>hSig(22,-1.1,1.1)","classID == 0","goff") # signal ROOT.TestTree.Draw("BDT>>hBg(22,-1.1,1.1)","classID == 1", "goff") # background ROOT.hSig.SetLineColor(ROOT.kRed); ROOT.hSig.SetLineWidth(2) # signal histogram ROOT.hBg.SetLineColor(ROOT.kBlue); ROOT.hBg.SetLineWidth(2) # background histogram # use a THStack to show both histograms hs = ROOT.THStack("hs","") hs.Add(ROOT.hSig) hs.Add(ROOT.hBg) # show the histograms gcSaver.append(ROOT.TCanvas()) hs.Draw()

which for me produced the following output:

Enjoy !

**Read Full Post**|

**Make a Comment**(

**5**so far )

## Debugging unresponsive Java GUI applications

You probably heard about it: one should not run time-consuming operations on the event dispatch thread in Java GUI applications but use SwingWorker instead to run lengthy tasks, possibly showing a dialog box once the task is completed. One typical case of running tasks on the event dispatch thread is in a callback to the user’s action such as pressing a button or selecting a menu item from a pop-up menu.

I’m working on an application which runs external commands very often and the running time of such commands can be several minutes. Wherever possible, commands are run only on demand and the states of the objects in the application are updated according to the command’s output. While care was taken to use `SwingWorker` in obvious places the application still became unresponsive (i.e. the windows were not repainted for minutes) because the event dispatching thread was waiting to lock an object which was locked by another thread running a lengthy task (an external command in this case). In the following I describe how I found such locks with the standard Java debugger.

### Starting the program and the debugger

The application I was working on was packaged into a single .jar file. So instead of starting the program with

java -jar myapp.jar

I started the Java virtual machine and my application and made them listen to connections from a debugger:

java -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=localhost:10101 -jar myapp.jar

This instructs the Java virtual machine to bind to socket 10101 on localhost to listen to a connection from `jdb`. The `suspend=n` part instructs it to start program execution immediately.

**WARNING:** as far as I understood is there no authentication involved when connecting with `jdb` (at least not when running it as described above). In other words: all users having access to this machine can also connect to the process you are debugging and control it !

On a second console, I started the Java debugger:

jdb -attach localhost:10101

The command `help` on the debugger console will give a short list of all commands understood by `jdb`.

### Finding the locks

I waited until the GUI froze. At that point, I did:

suspend

to suspend all threads. I got a quick overview of the state of all threads by using the command `threads`:

Group system: (java.lang.ref.Reference$ReferenceHandler)0x832 Reference Handler cond. waiting (java.lang.ref.Finalizer$FinalizerThread)0x831 Finalizer cond. waiting (java.lang.Thread)0x830 Signal Dispatcher running (java.lang.Thread)0x82f Java2D Disposer cond. waiting (java.lang.Thread)0x901 TimerQueue cond. waiting Group main: (java.lang.Thread)0x82e AWT-XAWT running (java.lang.Thread)0x82d AWT-Shutdown cond. waiting (java.awt.EventDispatchThread)0x82c AWT-EventQueue-0 waiting in a monitor (java.lang.Thread)0x82b pool-2-thread-1 cond. waiting (java.lang.Thread)0x82a pool-1-thread-1 cond. waiting (java.lang.Thread)0x829 pool-1-thread-2 cond. waiting (java.lang.Thread)0x828 pool-1-thread-3 cond. waiting (java.lang.Thread)0x827 pool-1-thread-4 cond. waiting (java.lang.Thread)0x826 pool-1-thread-5 cond. waiting (java.lang.Thread)0x825 pool-1-thread-6 cond. waiting (java.lang.Thread)0x824 pool-1-thread-7 cond. waiting (java.lang.Thread)0x823 pool-1-thread-8 cond. waiting (java.lang.Thread)0x8db DestroyJavaVM running (java.lang.Thread)0x9cd SwingWorker-pool-3-thread-1 cond. waiting (java.lang.Thread)0x9d6 SwingWorker-pool-3-thread-2 waiting in a monitor (java.lang.Thread)0x9d8 SwingWorker-pool-3-thread-3 cond. waiting (java.lang.UNIXProcess$1$1)0x9d9 process reaper running (java.lang.Thread)0x9da SwingWorker-pool-3-thread-4 waiting in a monitor (java.lang.Thread)0x9db SwingWorker-pool-3-thread-5 waiting in a monitor (java.lang.Thread)0x9dc SwingWorker-pool-3-thread-6 waiting in a monitor (java.lang.Thread)0x9dd SwingWorker-pool-3-thread-7 cond. waiting (java.lang.UNIXProcess$1$1)0x9de process reaper running (java.lang.Thread)0x9df SwingWorker-pool-3-thread-8 cond. waiting (java.lang.UNIXProcess$1$1)0x9e0 process reaper running (java.lang.UNIXProcess$1$1)0x9e1 process reaper running

The important thing to note is the line:

(java.awt.EventDispatchThread)0x82c AWT-EventQueue-0 waiting in a monitor

Which means that the event dispatch thread is actually waiting to obtain a lock on an object — but which one ? First of all, I wanted to know where (in the source code) the thread was waiting. I could print the stack trace of this thread by doing:

where 0x82c

(0x82c is the thread id obtained from the above output). This gave me something like:

[1] MyTableModel.getValueAt (MyTableModel.java:136) [2] javax.swing.JTable.getValueAt (JTable.java:2,686) [3] javax.swing.JTable.prepareRenderer (JTable.java:5,703) ...

So indeed, the thread was stuck somewhere in my code. Looking at the corresponding line of source code, I had to go up a few lines to find the variable (the field `myList` of the class `MyTableModel`) in the `synchronized` statement which I was locking on.

However, I needed also to find out which other thread currently holds the lock on this variable in order to solve the problem. I could do this by doing running the following command in `jdb`:

lock this.myList

(where the `jdb` prompt showed me that the current thread was AWT-EventQueue0 because of the previous commands and the current stack frame was 1, so no need to set the current thread or stack frame). The output of the above command was:

com.sun.tools.example.debug.expr.ParseException: Unable to complete expression. Thread not suspended for method invoke Owned by: SwingWorker-pool-3-thread-1, entry count: 1 Waiting thread: SwingWorker-pool-3-thread-2 Waiting thread: AWT-EventQueue-0

again, from the output of the `threads` command (see above) I could find the thread id of the thread `SwingWorker-pool-3-thread-1` which owned the lock on `myList`. A quick

where 0x9cd

revealed that indeed the lock on `myList` was held by a thread which was waiting for an external command to complete:

[1] java.lang.Object.wait (native method) [2] java.lang.Object.wait (Object.java:485) [3] java.lang.UNIXProcess.waitFor (UNIXProcess.java:165) [4] Utils.runCommand (Utils.java:102) ...

and I could also easily find the `synchronized` statement which acquired the lock on `myList`.

**Read Full Post**|

**Make a Comment**(

**1**so far )

## Facebook Hacker Cup 2011 Round 2 Problem 3 – some discussion

Facebook’s Hacker Cup online round 2 took place some time ago. This article discusses solutions to problem 3 ‘Bonus assignments’ (the original problem statement can be found by going to http://www.facebook.com/hackercup and following the corresponding links there). Essentially, one had to count the possibilities to choose N numbers a_{1}…a_{N} (where N can range from one to one million) such that:

- the smallest of these numbers must be between two given limits A and B while the largest number must be in the range C to D. Obviously, the other N-2 numbers must be between the smallest and largest number. Note that each of these boundaries A,B,C,D may be as large as one million.
- at least one of the N numbers must have a non-zero remainder when divided by an unknown integer number P which is greater than one.

The brute force approach would be as follows:

- generate all possible sets of N numbers between A and D
- check if the set satisfies requirement 1
- for each accepted set, loop over all possible values P and
- check whether for each value P
*at least one*a_{i}fulfills condition 2. If yes count this set in.

- check whether for each value P

The number of sets to generate and investigate is (D-A+1)^{N}. For each set we must verify that the minimum element is in the range A..B and the maximum element is in the range C..D. Then, for each set, we have to scan over the number P we have to test the remainder of all N numbers after division by P, so the time complexity of the brute force approach is O((D-A+1)^{N} * D * D). Assuming the most unfavorable values for the parameters we end up with 1’000’000^{1’000’002} = 10^{6’000’012} tests. Comparing this number to the estimated number of atoms in the universe we’d rather enumerate all atoms before trying this brute force approach. And we conclude that there must be a smarter way than generating all possible sets of numbers and test them. Nevertheless, it is instructive to implement the brute force solution to check (with the examples given along with the problem statement) if one has understood the problem.

A first simplification can be achieved by trying to get rid of the ‘asymmetric’ constraints on the minimum and maximum elements. If we could calculate the number of bonus assignments by replacing condition 1 by a simpler constraint, e.g. just one common range A..D for all elements (i.e. the smallest element would not have to be smaller or equal to C) our problem would be simpler. Let’s call this number f(A,D).

We can visualize the pairs of minimum and maximum values of a tuple a_{1}…a_{N} as points in the leftmost triangle of the following figure:

The shaded area in the leftmost part of the figure corresponds to those tuples which have their smallest value (vertical axis) in the range A..B and their largest value (horizontal axis) in the range C..D. As indicated in the figure can we write this shaded area as sum and difference of triangular areas which are calculable with by function f, namely:

- f(A,D), represented by the first triangle after the equal sign.
- We have also counted those assignments for which the largest value is below C, we thus must subtract f(A,C-1), corresponding to the second triangle after the equal sign
- similarly we also included those assignments for which the smallest value is above C, we thus must subtract f(B+1,D), the third triangle after the equal sign
- we have subtracted the number those assignments which have the lowest value ≥ B+1 and the highest value ≤ C-1 twice, so we must add f(B+1,C-1) again (indicated by the rightmost triangle)

We have simplified the problem to finding all possible assignments of N values which are all in the range A to D.

In order to get closer to finding a solution, let’s first make another simplification and look at the case N = 1. Our task is to cross out all numbers in the range A..D which are integer multiples of any value P greater than 2 (and less than D). So one would start by crossing out all integer multiples of 2, then all integer multiples of 3 as shown in the following diagram:

In this diagram, the numbers being ‘crossed out’ are those on the horizontal axis and the numbers whose multiples we cross out are shown on the vertical axis. Each time a number is ‘crossed out’, a circle is placed in the corresponding column. Prime numbers are shown in red on the vertical axis.

We start by crossing out multiples of 2. Then we cross out those of 3. We notice that we don’t need to cross the multiples of 4 because it is a multiple of 2 and we have crossed out all multiples of 4 already when crossing out all multiples of 2. Indeed, the fact that 4 has been crossed out already when we arrive there tells us that all its multiples have been crossed out already. So as a first generalization, we could think of counting all integer multiples of prime numbers in the range A..D and add all counts together. Numbers like 4 are shown in gray in the figure.

However, we also notice that when we crossed out all multiples of 2 and of 3 we also crossed out multiples of 6 in both cases ! This would mean that by counting the multiples of 2 and adding the number of multiples of 3 we would have **double counted** all multiples of 6. We thus must **subtract** the number of multiples of 6 in A..D. Such numbers are shown in blue in the figure.

So we add another rule: for each pair of (distinct) primes p and q we must subtract the number of integer multiples of p*q in the range A..D. Note that we only consider products of *distinct* primes. For example multiples of p*p are a subset of the multiples of p but are **not** a subset of the multiples of q, i.e. not all of them are in the **overlap** of the multiples of p and the multiples of q. The multiples of p*p*q and p*q*q and p*p*q*q etc. on the other hand are already contained in the multiples of p*q.

How many times do we count 30 ? Let’s consider all numbers which divide 30:

- 30 is a multiple of 2 (which is prime), so we counted it when counting multiples of 2
- 30 is a multiple of 3 (which is prime), so we counted it again when counting multiples of 3
- 30 is a multiple of 5 (which is prime), so we counted it again when counting multiples of 5
- 30 is a multiple of 6, so we have subtracted one from the count when counting multiples of 2 * 3
- 30 is a multiple of 10, so we have subtracted one from the count when counting 2 * 5
- 30 is a multiple of 15, so we have subtracted one from the count when counting 3 * 5

Which leaves us with zero at the end, indicating that 30 would **not** contribute to the number of numbers (in A..D) which are divided by any number P in the range 2..D. But this is not true, 30 is divided by several numbers as we just saw. We can fix this by adding another rule for multiples of three distinct primes (2,3 and 5 in this case) we must (again) add one to the count of a number (the reason for which 15 is shown in green).

We start to see a pattern:

**count**the number of integer multiples in A..D of all**prime numbers**P (P ≤ D)**subtract**the number of integer multiples in A..D of all**pairs**of products of prime numbers (P ≤ D)**add**the number of integer multiples in A..D of all**triples**of products of prime numbers (P ≤ D)- …
**add**the number of integer multiples in A..D of all**k-tuples**of products of prime numbers (P ≤ D)**if k is odd****subtract**the number of integer multiples in A..D of all**k-tuples**of products of prime numbers (P ≤ D)**if k is even**

By the way the primes and products of distinct prime numbers are (apart from the number 1) exactly the set of **square-free integers**.

To put things on a more rigorous foundation we notice that we actually try to determine the size of a set (the number of values in A..D which can be written as integer multiple of any P ≥ 2) which we can write as a union of overlapping subsets (namely multiples of one or more primes). Let S_{i} denote the set of integer multiples in the range A..D of the i’th prime. To calculate the length of the union, we use the **Inclusion-Exclusion principle for set unions**, which reads:

The intersections are simply the multiples of the products of the (distinct) primes i_{1} * … * i_{k}. We also see that there is a factor (-1)^{k-1} which is positive if the number k of distinct primes is odd and negative if k is even, as we suspected above. Incidentally, this is exactly the negative of the **Möbius function**, which we denote as μ here.

We can tabulate this function for the values 2..D with an algorithm similar to the **Sieve of Eratosthenes**:

- we initialize the table of the values of μ with all ones
- whenever we encounter a prime number, we set the corresponding value in the table to -1 (a prime number is the product of an odd number of distinct primes, namely just the ‘product’ of itself).
- We flip the sign of all multiples of the prime. For a number which is the product of k distinct primes, this will happen k times and thus at the end the corresponding value in the table will have a value of (-1)
^{k}. For any multiple of p which is divisible**more than once**by p (i.e. is not square free), we set the value of μ to zero (and future sign flips will thus leave this value at zero).

- We flip the sign of all multiples of the prime. For a number which is the product of k distinct primes, this will happen k times and thus at the end the corresponding value in the table will have a value of (-1)

The time complexity of the Sieve of Eratosthenes (and thus also of this method of computing the values of the Möbius function) is O(D * log(log(D))) which is growing only slightly faster than linear.

By the way, you certainly already have noticed that for N=1 *all* numbers in A..D can be written as multiples of a number P ≥ 2, i.e. there will always be zero possible bonus assignments (unless 1 is part of the allowed range for numbers). No matter what number a_{1} one chooses, if price the P happens to be a_{1}, the remainder of a_{1} after division by P is zero. In fact, for N dimensions, an N-tuple only needs at least two numbers which are coprime (i.e. have no common non-trivial divisor) which ‘help each other’, i.e. even if the price is set to one of these two coprimes, the other ‘protects’ the tuple. The values of the other N-2 members of the tuple can be any values from the allowed range. However, to correctly count all possible assignments while avoiding double counting (some of the other N-2 members can also have the same value as one of these two coprimes and therefore the number of permutations leading to distinct sets is difficult to count) makes this a tedious task to say the least.

We still need to find an expression for the number of multiples of P in the range A to D (inclusive). This can be derived from the following arguments: let’s first simplify the task again to counting all multiples of P ≤ D. We must be careful with the ‘edge case’ if D is an integer multiple of P:

- if D is an integer multiple of P, the number of multiples of P smaller than or equal to D is exactly D/P
- if D is
**not**an integer multiple of P, the largest multiple of P smaller than D is P * ⌊ D/P ⌋ and thus the number of integer multiples of P smaller than D is ⌊ D/P ⌋.

One sees that for the first bullet D/P is equal to ⌊ D/P ⌋ so the number of multiples of P less than or equal to D is ⌊ D/P ⌋. To count the number of integer multiples in the range A to D, we subtract the number of multiples of P less than or equal to (A-1) from the number of multiples of P less than or equal to D or in other words: ⌊ D/P ⌋ – ⌊ (A-1)/P ⌋ .

One can compare this to what one gets from the application of Theorem 1 of the paper at http://arxiv.org/abs/1002.3254.

To summarize the case N=1 we can write the number of values in A..D which are an integer multiple of at least one number P (2≤ P ≤ D) as:

How to generalize this to N dimensions ? One would first be tempted to enumerate all *allowed* N-tuples by trying to enumerate all those for which there is **at least one a _{i}** which is

**not**divided by a number P. However, as we pointed out above, this is difficult to deal with, the ‘at least’ being a contributor to this difficulty. It is easier to count the

**complement**, namely all tuples for which a number P divides

**all**elements of the tuple. These tuples are the ‘Cartesian power’ of the set of multiples of P in the range A..D. Or in other words, for each a

_{i}we are free to select any integer multiple of P which is in the range A..D and repetitions of values among a

_{1}..a

_{N}are allowed. Thus the number of N-tuples for which P divides

**all**elements is simply (⌊ D/P ⌋ – ⌊ (A-1)/P ⌋)

^{N.}. We can add up these terms for all values of P under consideration like in the one-dimensional case.

To summarize: for f(A,D) we have the following expression:

And the value sought is (f(A,D) – f(A,C-1) – f(B+1,D) + f(B+1,C+1)) mod 1000000007 as discussed above.

Looking at the time complexity, we note that the above sum involves powers of N (which can be large). An efficient method to calculate such powers is exponentiation by squaring which has O(log(N)) time complexity. The complexity of the sum is therefore O(D * log(N)) and the overall asymptotic behaviour (when taking into account the calculation of the Möbius function) is O(D * (log(N) + log(log(D)))).

**Read Full Post**|

**Make a Comment**(

**1**so far )

## Facebook Hacker Cup 2011 Round 2 Problem 1 – some discussion

(see also the discussion of problem 3)

Facebook’s Hacker Cup online round 2 took place recently. The first problem (‘Scott’s New Trick’) was about counting products of elements of two series which are smaller than a given value L, modulo a given prime number P (see http://www.facebook.com/hackercup and links therein to see the exact problem statement). These two series were defined as linear homogeneous recurrence relations with constant coefficients.

The brute force approach consists of explicitly calculating all products of the two series (which have length M and N respectively), resulting in a complexity of O(M*N). With M and N allowed to go to ten million, M*N can be as high as 10^{14}. Even if one was able to test one pair per processor clock cycle, it would still take 10^{5} seconds (more than a day) with a 1 GHz processor with this approach in the worst case.

As we are performing operations on a finite field GF(P), one could imagine that the series of values starts repeating at some point and thus one would not have to calculate pairs of products of *all* elements of a series but only look at the elements of one period.

However, each element a_{i} was defined in terms of its **two** preceding elements a_{i-1} and a_{i-2} so there are P*P different states (rather than P) when generating the series and thus the period of the two series is bounded by P^{2}. With a maximum value of 250’000 for P, P^{2} is 62.5 billions in the worst case, corresponding to 62.5 seconds worth of CPU cycles on a 1 GHz processor. Moreover, the two series a and b need not have the same period and thus one would have to consider the least common multiple in terms of number of pairs of the two periods, which is O(P^{4}) in the worst case and thus not feasible.

A more elegant idea is to just count the **number of occurrences** of each of the possible P values in each series a and b, test for all possible products of integers a_{i}, b_{j} smaller than P whether they satisfy the condition (to be less than the given number L) and then weight the accepted products with (number of occurrences of a_{i}) * (number of occurrences of b_{j}). Such an approach reduces the complexity from O(N*M) to O(M+N+P^{2}). However, this is still at least at the limit for the calculation of 20 test cases within six minutes if not impossible.

A more efficient solution makes use of **discrete logarithms**. In other words, we write each integer x_{i} as x_{i} = g^{yi} where y_{i} is called the discrete logarithm of x_{i} with respect to the basis g. g is must be a primitive root modulo P to ensure that for any integer x_{i} ∈ [0..P-1] there is a value y_{i} which satisfies this equation. It takes O(P) time to calculate a map of all x_{i} to their logarithms y_{i} by simply calculating the powers 0..P-1 of g by repeated multiplication. Thus we can reformulate our problem from:

- find all pairs of elements of the field GF(P) whose
**product**is equal to a given value V

to

- find all pairs of elements of the field GF(P) whose
**sum**is equal to a given value log_{g}V

where we would vary V from 0 to L-1. Note that taking the logarithm of each element in GF(P) simply corresponds to a reordering of the elements which we have to take into account when considering how often an integer appears in the sequences a and b (see above).

What do we gain from this transformation ? When writing down the transformed task as an equation, we want to calculate:

where w_{i} and v_{i} are the number of occurrences of element g^{wi} in the series a and g^{vi} in the series b, respectively. u_{k} is thus the sum of the products of all weights (i.e. number of occurrences) associated to pairs whose product is g^{i + (k-i)} = g^{k}. k is the discrete logarithm of V (introduced above) with respect to the base g.

Naturally one would think that one would need to calculate P products for these P equations (k ranges from 0 to P-1), thus still requiring O(P^{2}) operations. However, looking at it a bit more carefully, one realizes that this in fact a **discrete (circular) convolution**. And for this task, fast convolution algorithms exist which have O(P * log(P)) time complexity. One common approach for fast convolution is to transform the two sequences into Fourier space (using a fast Fourier transform) where the convolution of the series corresponds to the product of the transformed series and transform them back to the original space (using the inverse transform).

Another possibility is to use a Karatsuba like algorithm for convolution which has time complexity O(P^{1.585}). This algorithm is designed to multiply two numbers written as x_{n} * b^{n}+…+x_{0} * b^{0} and y_{n} * b^{n}+…+y_{0} * b^{0} where the x_{i} and y_{i} are the ‘digits’ and b is the radix. On the other hand, multiplying these two sums is this equivalent to finding the convolution of the series of coefficients x and y.

**Read Full Post**|

**Make a Comment**(

**3**so far )

« Previous Entries