Monday, December 11, 2017

A global shift week

The Sep 18 - Sep 24 week featured the second Open Cup stage: the Grand Prix of Ural (problems, results, top 5 on the left). The Seoul National U 2 team continued to amaze, this time claiming the victory by 1 problem to Past Glory and by 2 problems to everybody else. Very well done!

Later on Sunday, Codeforces hosted a contest titled "Manthan, Codefest 17" (problems, results, top 5 on the left, analysis). anta has stumbled on the tricky problem D (less than a third of contestants who submitted got it accepted), but has still came out clearly on top thanks to being the only contestant to submit all problems. Congratulations on the victory!

In my previous summary, I have mentioned a couple of problems. One was with a recursive background: there are many contestants taking part in a competition, and top c will be awarded with t-shirts. There are n available t-shirt sizes (n<=200000). Each contestant has indicated one of two things: either they need a t-shirt with some concrete size, or they need a t-shirt with one of the two concrete adjacent sizes. More precisely, for each size i we know the number ai of contestants (up to 108) that need a t-shirt of size i, and the number bi of contestants (also up to 108) that need either a t-shirt of size i or a t-shirt of size i+1. What is the minimal number of t-shirts we need to buy in order to be able to give t-shirts to top c contestants, no matter which ones end up in top c?

First, we can notice that we need to buy such set of t-shirts that in the resulting (huge) graph there's a full matching for any set of c nodes in the left part. Hall's theorem gives us an equivalent formulation: we need to make sure that for any set of sizes for which we buy x t-shirts in total, and x is less than c, the number of people who can only wear a t-shirt from this set must not exceed x.

Then, we can notice that we can only consider segments of sizes, and not arbitrary sets, in the above condition, as when we have multiple disjoint segments that violate the constraint together, one of them will also provide a constraint violation.

Then, just like in many other problems with constraints on segments, we can buy the t-shirts greedily: going in increasing order of sizes, for each size, we will buy just enough t-shirts to make sure the constraint is not violated for all segments that ends with this size. It doesn't make sense to buy more of this size as we can replace those extra t-shirts with ones of size one higher without making things worse.

Formally speaking, si=max(min(c,ai), min(c,ai+bi-1+ai-1)-si-1, min(c,ai+bi-1+ai-1+bi-2+ai-2)-si-2-si-1, ...). As we move from i to i+1, the arguments to max() change in the following way: the subtracted part increases by si for all terms, the a+b part increases by ai+1+bi, and we get a new term. The latter causes some terms to jump over c, in which case the corresponding min() will forever stay equal to c, and these terms will be zero or negative in all subsequent steps, so we can forget about them.

This leads to the following approach: let's keep a set of terms for which the min() is still less than c, then we only need to be able to add a constant to the positive part of all terms, add a constant to the negative part of all terms, to find all terms where the positive part exceeds c, and to find the term with the largest difference between positive and negative parts. Such set of operations can be most straightforwardly implemented by keeping two priority queues of those terms: ordered by positive part and ordered by difference, and to implement adding a constant by keeping an extra "global shift" variable.

This yields a O(nlogn) solution which is fast enough, but it can also be made O(n) as described in the official analysis.

Another problem I mentioned was: two players have a sequence of 50000 integers, and alternately take numbers from either end of the sequence. When all numbers have been taken, each player computes the bitwise xor of their numbers, and the one with a higher bitwise xor wins. Who will win?

I have described our solution in this comment, but I still feel a bit uneasy about it. It can probably be proven by induction, but it would seem that in such simple setting there should be a simple argument that proves it. I'm really curious to see such argument, please share if you know one!

Finally, I've just remembered an interesting bit about the TopCoder problem which I explained in the previous summary, about the attractions, mornings, evenings and constraints. During the contest, I was unable to come up with a max flow solution even though I felt like one is possible, and out of desperation started to implement backtracking that greedily does things that are obviously good to do, branches when we don't have an obvious next step, and cuts branches that are definitely worse than the current best answer. To my surprise, I've received a wrong answer instead of time limit exceeded on the system test. The reason for that was that I assumed that I've taken a transitive closure of the set of constraints when implementing the solution, and forgot to actually take the closure. After making this fix, the backtracking passed the system test in the practice room in 42ms :)

Thanks for reading, and check back later for more September stories!

A chain cut week

The Sep 11 - Sep 17 week had its contests on the weekend. On Saturday MemSQL Start[c]UP has returned to Codeforces after a three-year absence with Round 1 of its third edition (problems, results, top 5 on the left, my screencast, analysis). moejy0viiiiiv was 25 minutes faster than everybody else to solve all problems, and made sure to protect his lead with a few challenges. Congratulations on the win!

The round had a few nice problems, but let me highlight problem F: there are many contestants taking part in a competition, and top c will be awarded with t-shirts. There are n available t-shirt sizes (n<=200000). Each contestant has indicated one of two things: either they need a t-shirt with some concrete size, or they need a t-shirt with one of the two concrete adjacent sizes. More precisely, for each size i we know the number ai of contestants (up to 108) that need a t-shirt of size i, and the number bi of contestants (also up to 108) that need either a t-shirt of size i or a t-shirt of size i+1. What is the minimal number of t-shirts we need to buy in order to be able to give t-shirts to top c contestants, no matter which ones end up in top c?

Apart from being an interesting problem in general, it's a great example for this recent discussion about stories in problem statements. I think that here the t-shirt and contestant story actually makes the problem statement really easy to understand and think about, while a completely formal mathematical statement would be hard to parse.

On Sunday the new season of the Open Cup started with the Grand Prix of Romania (problems, results, top 5 on the left). Team Past Glory did not show any signs of slowing down and won convincingly by a problem — well done! The great performance and second place of the Seoul National U 2 team was a great surprise, and even more so given that this is a different team from the one that earned gold medals at the last ICPC World Finals for Seoul.

Problem L of this contest showed that we can still find new games with non-trivial solutions in a very simple setup: two players have a sequence of 50000 integers, and alternately take numbers from either end of the sequence. When all numbers have been taken, each player computes the bitwise xor of their numbers, and the one with a higher bitwise xor wins. Who will win?

And right after the end of the Grand Prix, Codeforces held its Round 434 (problems, results, top 5 on the left, analysis). The battle for the first place was quite tight. Congratulations to kmjp who came out on top by a mere 25 points!

In my previous summary, I have mentioned a hard TopCoder problem: you are given 50 attractions in a city. You will visit exactly one attraction per day. Each attraction can be visited either in the morning, costing ai, or in the evening, costing bi. Every time you visit an attraction in the morning after visiting another attraction the previous evening, you pay an extra cost c. Every time you switch from evening to morning, you pay an extra cost d. Additionally, you are given a directed graph of restrictions — a set pairs of attractions that constrain the order of visits: in each pair, the first attraction must be visited before the second one. What is the minimal cost to perform all visits?

First of all, let's split our visits into groups: a group of morning visits, then a group of evening visits, then a group of morning visits, and so on (we need to consider two cases based on whether the very first visit is in the morning or in the evening). Since we only pay extra costs when we switch groups, the total cost depends on the number of groups and on the assignment of visits to groups, but not on the order of visits within each group.

Now let's try to build a network in which a cost of a cut would correspond to the total cost of the visits, so that we can solve our problem with the minimum cut algorithm. For each attraction, let us build a chain of vertices, with all chains having the source and sink as start and end respectively. A chain for one attraction looks like: s->v1->v2->..->vn->t. Let's add infinite capacity edges going backwards in the chain, to guarantee that any finite cut separates some prefix of this chain from the rest. The size of this prefix will correspond to the number of the group that this attraction belongs to, and thus the capacity of the edges should be alternating ai and bi: if the vertex belongs to an even-numbered group, the corresponding edge will contribute ai to the total capacity of the cut, and otherwise bi, just as we need.

In order incorporate a restriction "p must be visited before q" into our network, we will add infinite capacity edges from the vertices of the chain for p to the corresponding vertices of the chain for q. This guarantees that with any finite cut, the position we cut the chain for p will be the same or earlier than the position we cut the chain for q.

Finally, in order to incorporate the morning/evening switching costs, let us add an extra attraction that must be visited after all others, and set up the capacities on its chain not as alternating ai and bi, but rather as 0, c, c+d, 2c+d, 2c+2d, ...

The finite cuts in the resulting network correspond to valid ways of visiting the attractions, and the capacity of the cut is equal to the total visit cost, so now we just need to find the minimum cut. Also note that we have built our network using two relatively standard building blocks: infinite edges can give us constraints on the cut, and a chain of vertices with infinite backwards edges implements a choice from several alternatives with specific costs.

Thanks for reading, and check back soon!

Sunday, December 10, 2017

An almost there week

The Sep 4 - Sep 10 week started with two Codeforces rounds. On Monday, Codeforces Round 432 took place (problems, results, top 5 on the left, analysis). AngryBacon has started with the hardest problem F and solved it very fast, which gave them a clear path to victory. Congratulations!

On Wednesday, Codeforces hosted the next Round 433 (problems, results, top 5 on the left, analysis). In a round with 4 problems solvable within an hour and the fifth one too hard to crack, Um_nik has combined excellent speed with 3 successful challenges for a clear first place. Well done!

On the weekend TopCoder hosted not one but two rounds that finalized the list of finalists for its flagship tournament TopCoder Open 2017. Round 3B took place on Saturday (problems — unavailable now, but hopefully the TopCoder website will be fixed soon, results, top 5 on the left). In order to advance one simply needed to solve either the first two (like qwerty787788, Um_nik and ikatanic did), or the third problem (like rng_58 and RRx did). Congratulations to all advancers!

With 1.5 minutes to go in the challenge phase, I had 344.14 points which would be enough to advance with just the easy problem solved, only to make an incorrect challenge that brought me down to 319.14 which was not enough. Of course I didn't know that 344.14 would be enough during the round itself, but seeing the final scoreboard made me quite annoyed at myself for that -25 :)

Here is the easy problem that brought the challenge phase excitement about: you are given two strings of the same length, which is at most 50. You are allowed to swap corresponding characters in the strings (i-th character of the first string with the i-th character of the second string) as many times as you want. Your goal is to obtain two strings with as large as possible longest common substring. Can you see the right approach?


In between the TopCoder rounds, Russian Code Cup 2017 Final Round saw the top contestants compete for glory and prizes (problems, results, top 5 on the left, my screencast, analysis). xudyh was about half an hour faster than everybody else on the first 5 problems. I have tried to leapfrog him by trying to squeeze in a suboptimal solution for the last problem that is fast to code, but it did not pass and xudyh has maintained his first place — congratulations on the win!

I have found the easiest problem of this round to be the cutest. You are given a set 100 distinct integers ai each up to a million, and need to find any set of 100 integers bj also each up to a million, such that all 1002 pairwise sums ai+bj are distinct. What is the nicest way to achieve that?

And finally, the Wildcard Round of TCO 2017 determined the last two finalists on Sunday (problems — unavailable now, but hopefully the TopCoder website will be fixed soon, results, top 5 on the left, my screencast). Only kuniavski and xudyh have solved the hard problem, so it is only fitting that they qualified for the TCO onsite — great job!

I have once again tried to compensate the poor coding phase performance with lots of successful challenges, but was unable to find enough and got eliminated from the TCO. The hard problem that I couldn't crack stated: you are given 50 attractions in a city. You will visit exactly one attraction per day. Each attraction can be visited either in the morning, costing ai, or in the evening, costing bi. Every time you visit an attraction in the morning after visiting another attraction the previous evening, you pay an extra cost c. Every time you switch from evening to morning, you pay an extra cost d. Additionally, you are given a directed graph of restrictions — a set pairs of attractions that constrain the order of visits: in each pair, the first attraction must be visited before the second one. Can you see the way to use max flow to find the minimal cost to perform all visits?

Thanks for reading, and check back tomorrow for more backfills!

A Chihuahua week

Codeforces hosted its Round 431 during the Aug 28 - Sep 3 week (problems, results, top 5 on the left, analysis). dotorya has managed to solve all four solvable problems in just over one hour, more than 20 minutes faster than the second-placed Reyna and more than 50 minutes faster than everybody else. Well done!

Also during this week, the biannual Petrozavodsk Training Camp for teams taking part in ICPC has completed (results, top 5 on the left). With 9 5-hour contests packed into just 11 days, and many world class teams participating, this is the first detailed comparison of top student teams this season. Congratulations to the Moscow SU team Chihuahua for topping the overall standings with quite a margin!

Thanks for reading, and check back for more.

A yosupo week

Miss me?

The Aug 21 - Aug 27 week started of with TopCoder SRM 720 on Thursday (problems — unavailable now, but hopefully the TopCoder website will be fixed soon, results, top 5 on the left). There was a tense challenge phase battle at the top, but yosupo has made sure it was only for the second place by solving the 1000-pointer — congratulations on the victory!

The said 1000-pointer was concerned with somewhat theoretical isotonic regression in L1 metric. On the outside, it looked like this: you are given an undirected weighted graph with at most 50 vertices and at most 1000 edges, and two spanning trees in it. You have to modify the weights of the edges in such a way that the first given spanning tree is minimal, and the second given spanning tree is maximal, while minimizing the total change of weights.

A few solutions for this problem are described and linked in this comment thread.

Later on the same day we had AIM Tech Round 4 on Codeforces (problems, results, top 5 on the left, analysis). yosupo has continued his great day, this time winning without an extra problem but thanks to solving everything fast enough. Well done!

Finally, Saturday presented us with AtCoder Grand Contest 019 (problems, results, top 5 on the left, my screencastanalysis). This time yosupo was only 6th, and the victory instead went to Um_nik who has dominated the round by finishing all problems with half an hour to spare. Congratulations!

In my previous summary, I have mentioned a Codeforces problem: you are given an array of 300000 numbers, and 300000 queries of the following form: a segment [l;r] and a number k, for which you need to find the smallest number that appears on at least 1/k of all positions in the given segment of the array. Note that k is at most 5, so the number must be quite frequent within the segment.

The first idea is to use Mo's algorithm to handle the segments; we'll keep count of how many of each number are there in the current segment in a simple array, and thus can handle the counting for all segments in O(n*sqrt(n)). In order to find the answer for each segment, we'll need to find the counts that are above 1/5 of their sum. One way to do that would be to maintain a priority queue, but that adds an extra log factor both for queries and for the Mo's part, and some coding complexity.

A nicer way to find the high counts is to use randomization: let's just pick 120 random numbers within our segment, and hope that all numbers that occupy at least 1/5 of the segment will be found. Indeed, the probability of that not occurring for a number is less than (1-1/5)120 , or about 2.3e-12. We have at most 5 interesting numbers, 300000 queries, and around 100 testcases, so the overall probability of an incorrect answer does not exceed 5*100*300000*2.3e-12=3.45e-4, which is good enough.

Thanks for reading, and check back soon as I crawl towards the present :)

Tuesday, October 24, 2017

Commentary stream for TopCoder Open 2017 Finals

TopCoder Open 2017 final round is today at 19:30 CEST. Tune in at https://youtu.be/qrlDoP3JQkI to hear me, pashka and Endagorion discuss! This time we should have problem statements at the beginning of the round :)

Monday, October 23, 2017

Commentary stream for TopCoder Open 2017 Semifinal 2 with Egor

We'll try to do live commentary of TopCoder Open 2017 Semifinal 2 with Egor on https://youtu.be/AqpjM86iZlc  - tune in at 19:30 CEST!