Saturday, December 30, 2017

A correlation week

The Dec 4 - Dec 10 week was another of the calm ones, so let's use this post to discuss the interactive NEERC problem I mentioned in the previous one: you work with a device that takes an a as input and computes ad mod n using the following pseudocode:

1   modPow(a, d, n) {
2     r = 1;
3     for (i = 0; i < 60; ++i) {
4       if ((d & (1 << i)) != 0) {
5         r = r * a % n;
6       }
7       a = a * a % n;
8     }
9   }

However, instead of receiving the result of the computation, you only receive the time it took the device to execute it, which is computed in the following way: only the multiplications on lines 5 and 7 take nonzero time, and multiplying x by y modulo n takes (bits(x)+1)*(bits(y)+1), where bits(x) is the length of the binary representation of x without leading zeroes.

You know the number n but not the number d, and your goal is to find d after sending at most 30000 requests to the device. You know that n and d were chosen in the following way: first, two prime numbers p and with bits(p)=bits(q)=30 were picked independently and uniformly at random. Then n was computed as p*q. Then the number m=(p-1)*(q-1) was computed. Then d was picked uniformly at random between 1 and m-1 inclusive, such that it is coprime with m.

The official analysis has a pretty detailed description of the intended solution starting from page 8, so let me just recap it here briefly;
  • We're going to just send 30000 random queries.
  • Lowest bit of d is always 1, let's determine the second lowest.
  • If it's also 1, then we multiply a by a2, otherwise we don't.
  • For queries where a and/or ahave less bits than usual, the overall modPow time will also be less than usual, but only on average, so we can't just tell from one a.
  • However, the correlation between the time to multiply a by a2 and the overall modPow time over all 30000 queries can tell us this bit with extremely high certainty.
  • We can then determine the next bit in the same manner, and so on.
  • This Wikipedia page also describes the same idea.
This solution is very easy to code and generalizes well to bigger constraints, but it might be hard to intuitively feel if 30000 queries gives enough certainty. Egor has suggested another approach which is a bit harder to code but easier to believe that it works (but we have not actually tried to implement it): instead of sending random queries, we will send queries such that one repeated square of them is very small modulo n, say, at most 10 bits. Such a query will give much more signal on whether this repeated square is used in the multiplications, since multiplying 10 by 60 bits, compared to multiplying 60 by 60 bits, is about 3000 nanoseconds faster, and standard deviation of modPow computation time if we just feed it random inputs is on the order of 1700, so the middle point is roughly one standard deviation away from each expectation. If we try, say, 49 such numbers for every bit, then we'll have seven standard deviations and will always guess correctly, so we need only 60*49 which is about 3000 queries to find all bits.

Of course, finding a number that has few bits in a certain power modulo n is a hard task in itself, but here n is small enough that we can factorize it using Pollard's rho algorithm, and after that we just need to invert the said power modulo phi(n) to be able to find the root of this power of any number, including ones with few bits.

Thanks for reading, and check back for more!

No comments:

Post a Comment