## Facebook Hacker Cup 2011 Round 2 Problem 3 – some discussion

Facebook’s Hacker Cup online round 2 took place some time ago. This article discusses solutions to problem 3 ‘Bonus assignments’ (the original problem statement can be found by going to  http://www.facebook.com/hackercup and following the corresponding links there). Essentially, one had to count the possibilities to choose N numbers a1…aN (where N can range from one to one million) such that:

1. the smallest of these numbers must be between two given limits A and B while the largest number must be in the range C to D. Obviously, the other N-2 numbers must be between the smallest and largest number. Note that each of these boundaries A,B,C,D may be as large as one million.
2. at least one of the N numbers must have a non-zero remainder when divided by an unknown integer number P which is greater than one.

The brute force approach would be as follows:

• generate all possible sets of N numbers between A and D
• check if the set satisfies requirement 1
• for each accepted set, loop over all possible values P and
• check whether for each value P at least one ai fulfills condition 2. If yes count this set in.

The number of sets to generate and investigate is (D-A+1)N. For each set we must verify that the minimum element is in the range A..B and the maximum element is in the range C..D. Then, for each set, we have to scan over the number P we have to test the remainder of all N numbers after division by P, so the time complexity of the brute force approach is O((D-A+1)N * D * D). Assuming the most unfavorable values for the parameters we end up with 1’000’0001’000’002 = 106’000’012 tests. Comparing this number to the estimated number of atoms in the universe we’d rather enumerate all atoms before trying this brute force approach. And we conclude that there must be a smarter way than generating all possible sets of numbers and test them. Nevertheless, it is instructive to implement the brute force solution to check (with the examples given along with the problem statement) if one has understood the problem.

A first simplification can be achieved by trying to get rid of the ‘asymmetric’ constraints on the minimum and maximum elements. If we could calculate the number of bonus assignments by replacing condition 1 by a simpler constraint, e.g. just one common range A..D for all elements (i.e. the smallest element would not have to be smaller or equal to C) our problem would be simpler. Let’s call this number f(A,D).

We can visualize the pairs of minimum and maximum values of a tuple a1…aN as points in the leftmost triangle of the following figure:

The shaded area in the leftmost part of the figure corresponds to those tuples which have their smallest value (vertical axis) in the range A..B and their largest value (horizontal axis) in the range C..D. As indicated in the figure can we write this shaded area as sum and difference of triangular areas which are calculable with by function f, namely:

• f(A,D), represented by the first triangle after the equal sign.
• We have also counted those assignments for which the largest value is below C, we thus must subtract f(A,C-1), corresponding to the second triangle after the equal sign
• similarly we also included those assignments for which the smallest value is above C, we thus must subtract f(B+1,D), the third triangle after the equal sign
• we have subtracted the number those assignments which have the lowest value ≥ B+1 and the highest value ≤ C-1 twice, so we must add f(B+1,C-1) again (indicated by the rightmost triangle)

We have simplified the problem to finding all possible assignments of N values which are all in the range A to D.

In order to get closer to finding a solution, let’s first make another simplification and look at the case N = 1. Our task is to cross out all numbers in the range A..D which are integer multiples of any value P greater than 2 (and less than D). So one would start by crossing out all integer multiples of 2, then all integer multiples of 3 as shown in the following diagram:

In this diagram, the numbers being ‘crossed out’ are those on the horizontal axis and the numbers whose multiples we cross out are shown on the vertical axis. Each time a number is ‘crossed out’, a circle is placed in the corresponding column. Prime numbers are shown in red on the vertical axis.

We start by crossing out multiples of 2. Then we cross out those of 3. We notice that we don’t need to cross the multiples of 4 because it is a multiple of 2 and we have crossed out all multiples of 4 already when crossing out all multiples of 2. Indeed, the fact that 4 has been crossed out already when we arrive there tells us that all its multiples have been crossed out already. So as a first generalization, we could think of counting all integer multiples of prime numbers in the range A..D and add all counts together. Numbers like 4 are shown in gray in the figure.

However, we also notice that when we crossed out all multiples of 2 and of 3 we also crossed out multiples of 6 in both cases ! This would mean that by counting the multiples of 2 and adding the number of multiples of 3 we would have double counted all multiples of 6. We thus must subtract the number of multiples of 6 in A..D. Such numbers are shown in blue in the figure.

So we add another rule: for each pair of (distinct) primes p and q we must subtract the number of integer multiples of p*q in the range A..D. Note that we only consider products of distinct primes. For example multiples of p*p are a subset of the multiples of p but are not a subset of the multiples of q, i.e. not all of them are in the overlap of the multiples of p and the multiples of q. The multiples of p*p*q and p*q*q and p*p*q*q etc. on the other hand are already contained in the multiples of p*q.

How many times do we count 30 ? Let’s consider all numbers which divide 30:

• 30 is a multiple of 2 (which is prime), so we counted it when counting multiples of 2
• 30 is a multiple of 3 (which is prime), so we counted it again when counting multiples of 3
• 30 is a multiple of 5 (which is prime), so we counted it again when counting multiples of 5
• 30 is a multiple of 6, so we have subtracted one from the count when counting multiples of 2 * 3
• 30 is a multiple of 10, so we have subtracted one from the count when counting 2 * 5
• 30 is a multiple of 15, so we have subtracted one from the count when counting 3 * 5

Which leaves us with zero at the end, indicating that 30 would not contribute to the number of numbers (in A..D) which are divided by any number P in the range 2..D. But this is not true, 30 is divided by several numbers as we just saw. We can fix this by adding another rule for multiples of three distinct primes (2,3 and 5 in this case) we must (again) add one to the count of a number (the reason for which 15 is shown in green).

We start to see a pattern:

• count the number of integer multiples in A..D of all prime numbers P (P ≤ D)
• subtract the number of integer multiples in A..D of all pairs of products of prime numbers (P ≤ D)
• add the number of integer multiples in A..D of all triples of products of prime numbers (P ≤ D)
• add the number of integer multiples in A..D of all k-tuples of products of prime numbers (P ≤ D) if k is odd
• subtract the number of integer multiples in A..D of all k-tuples of products of prime numbers (P ≤ D) if k is even

By the way the primes and products of distinct prime numbers are (apart from the number 1) exactly the set of square-free integers.

To put things on a more rigorous foundation we notice that we actually try to determine the size of a set (the number of values in A..D which can be written as integer multiple of any P ≥ 2) which we can write as a union of overlapping subsets (namely multiples of one or more primes). Let Si denote the set of integer multiples in the range A..D of the i’th prime. To calculate the length of the union, we use the Inclusion-Exclusion principle for set unions, which reads:

$\biggl|\bigcup_{i} S_i\biggr| =\sum_{i}\left|S_i\right| -\sum_{i < j}\left|S_i\cap S_j\right| \qquad +\sum_{i < j < k}\left|S_i\cap S_j\cap S_k\right|\ \ldots\ +\ \left(-1\right)^{k-1} \sum_{i_1 < \ldots < i_k}\left|S_{i_1}\cap \ldots \cap S_{i_k}\right| + \ldots\ + \left(-1\right)^{n-1} \left|S_1\cap\ldots\cap S_n\right|$

The intersections $S_{i_1} \cap \ldots \cap S_{i_k}$ are simply the multiples of the products of the (distinct) primes i1 * … * ik. We also see that there is a factor (-1)k-1 which is positive if the number k of distinct primes is odd and negative if k is even, as we suspected above. Incidentally, this is exactly the negative of the Möbius function, which we denote as μ here.

We can tabulate this function for the values 2..D with an algorithm similar to the Sieve of Eratosthenes:

• we initialize the table of the values of μ with all ones
• whenever we encounter a prime number, we set the corresponding value in the table to -1 (a prime number is the product of an odd number of distinct primes, namely just the ‘product’ of itself).
• We flip the sign of all multiples of the prime. For a number which is the product of k distinct primes, this will happen k times and thus at the end the corresponding value in the table will have a value of (-1)k. For any multiple of p which is divisible more than once by p (i.e. is not square free), we set the value of μ to zero (and future sign flips will thus leave this value at zero).

The time complexity of the Sieve of Eratosthenes (and thus also of this method of computing the values of the Möbius function) is O(D * log(log(D))) which is growing only slightly faster than linear.

By the way, you certainly already have noticed that for N=1 all numbers in A..D can be written as multiples of a number P ≥ 2, i.e. there will always be zero possible bonus assignments (unless 1 is part of the allowed range for numbers). No matter what number a1 one chooses, if price the P happens to be a1, the remainder of a1 after division by P is zero. In fact, for N dimensions, an N-tuple only needs at least two numbers which are coprime (i.e. have no common non-trivial divisor) which ‘help each other’, i.e. even if the price is set to one of these two coprimes, the other ‘protects’ the tuple. The values of the other N-2 members of the tuple can be any values from the allowed range. However, to correctly count all possible assignments while avoiding double counting (some of the other N-2 members can also have the same value as one of these two coprimes and therefore the number of permutations leading  to distinct sets is difficult to count) makes this a tedious task to say the least.

We still need to find an expression for the number of multiples of P in the range A to D (inclusive). This can be derived from the following arguments: let’s first simplify the task again to counting all multiples of P ≤ D. We must be careful with the ‘edge case’ if D is an integer multiple of P:

• if D is an integer multiple of P, the number of multiples of P smaller than or equal to D is exactly D/P
• if D is not an integer multiple of P, the largest multiple of P smaller than D is P * ⌊ D/P ⌋ and thus the number of integer multiples of P smaller than D is ⌊ D/P ⌋.

One sees that for the first bullet D/P is equal to ⌊ D/P ⌋ so the number of multiples of P less than or equal to D is ⌊ D/P ⌋. To count the number of integer multiples in the range A to D, we subtract the number of multiples of P less than or equal to (A-1) from the number of multiples of P less than or equal to D or in other words: ⌊ D/P ⌋ – ⌊ (A-1)/P ⌋ .

One can compare this to what one gets from the application of Theorem 1 of the paper at http://arxiv.org/abs/1002.3254.

To summarize the case N=1 we can write the number of values in A..D which are an integer multiple of at least one number P (2≤ P ≤ D) as:

$- \sum_{2 \le P \le D} \mu(P) \cdot \left(\lfloor D/P \rfloor - \lfloor(A-1)/P \rfloor\right)$

How to generalize this to N dimensions ? One would first be tempted to enumerate all allowed N-tuples by trying to enumerate all those for which there is at least one ai which is not divided by a number P. However, as we pointed out above, this is difficult to deal with, the ‘at least’ being a contributor to this difficulty. It is easier to count the complement, namely all tuples for which a number P divides all elements of the tuple. These tuples are the ‘Cartesian power’ of the set of multiples of P in the range A..D. Or in other words, for each ai we are free to select any integer multiple of P which is in the range A..D and repetitions of values among a1..aN are allowed. Thus the number of N-tuples for which P divides all elements is simply (⌊ D/P ⌋ – ⌊ (A-1)/P ⌋)N.. We can add up these terms for all values of P under consideration like in the one-dimensional case.

To summarize: for f(A,D) we have the following expression:

$f(A,D) = (D-A+1)^N - ( - \sum_{2 \le P \le D} \mu(P) \cdot \left(\lfloor D/P \rfloor - \lfloor(A-1)/P \rfloor\right)^N)$

And the value sought is (f(A,D) – f(A,C-1) – f(B+1,D) + f(B+1,C+1)) mod 1000000007 as discussed above.

Looking at the time complexity, we note that the above sum involves powers of N (which can be large). An efficient method to calculate such powers is exponentiation by squaring which has O(log(N)) time complexity. The complexity of the sum is therefore O(D * log(N)) and the overall asymptotic behaviour (when taking into account the calculation of the Möbius function) is O(D * (log(N) +  log(log(D)))).

Read Full Post | Make a Comment ( 1 so far )