Thursday, 8 April 2010

On finding majority

I have recently given a thought to a following problem:
You are given an array (size N) of some numbers. Find out if there exists a number that appears at least N/2 – 4 times in it. Print out any of them or “Not exists” if there isn’t any.
Expected solution: linear time, constant additional memory.
Sounds silly and very unpractical, I know, but it’s still a quite interesting problem :) Try to solve it yourself first.



Let’s consider a simpler version of this problem. Find a solution with assumption that a number must appear more than N/2 times. Why is it simpler? ... Because it’s a well-known problem in some circles :)

One easy solution for that one is just to sort the array and go through it checking if there is any number that follows given condition. It’s time complexity is O(N log N), though.
Another “obvious” one is to find a median of the array. If the solution exists, it must be a median (for Problem 2). It can be implemented in linear time, but with O(log N) additional memory (at least as far as I remember) and it’s quite complicated.
As you probably expect, there IS a very simple solution for that problem.

typedef int Number;
// checks if given number appears at least howMany times in the array
bool checkIfMajority(Number major, Number array[], int N, int howMany)
{
 int check = 0;
 for (int i = 0; i < N; ++i) if (array[i] == major) ++check;
 return (check >= howMany);
}

// finds a candidate for a majority element
// if a majority element exists it must be equal to the number returned by this function
Number candidateMajority(Number array[], int N)
{
 Number major;
 int counter = 0;
 for (int i = 0; i < N; ++i) {
  if (counter == 0) major = array[i];
  if (array[i] == major) ++counter;
  else --counter;
 }
 return major;
}

void findMajority(Number array[], int N)
{
 Number candidate = candidateMajority(array, N);
 if(checkIfMajority(candidate, array, N, N/2 + 1)) print(candidate);
 else print(“Not exists”);
}

It does exactly 2N comparisons of the elements of the array. Try to prove the solution yourself. It shouldn’t be very difficult.

Let’s go back to our original problem. How to approach it?
N/2 – 4 is rather an unusual number. Since we have just solved the problem for N/2 + 1, probably it is worth seeing what happens with numbers in between. What about N/2? It basically means that we want to find a number that exists at least half of the times. The only difference to the N/2 + 1 problem is that we have to cover the case when there is a number that appears exactly N/2 times and this isn’t a very difficult problem, but it will be very important later. Spend some time to figure out the solution and read the next paragraph afterwards :)

Well, look what happens when we take out one element from the array. We have two cases, then. We either took out the element that was appearing N/2 times or not. In the first case, we are happy because we’ve just found proper element. In the second one we are left with an array of size n-1 and there should be still an element that appears n/2 times. But then it must be then a majority element of the array of size n-1!

The overall code that solves n/2 case looks more or less like the following:

void findMajorityAtLeastHalf(Number array[], int N)
{
 if(checkIfMajority(array[0], array, N, N/2)) print(array[0]);
 else findMajority(array + 1, N – 1);
}

It does about 3N comparisons and the approach can be easily generalized. We discarded an element from the array so that we could use “standard” algorithm for finding majority element. Let’s do it for our original problem.

We can use standard algorithm only one the number appears more times than half of the array. What should be the size of the array, then? Let’s write it as inequality:
M – new size of the array
N/2 – 4 > M/2
N – 8 > M

So, it means then when an array is of size at most N-9 we can apply standard algorithm for finding majority element to find a number that appears at least N/2 – 4 times. If we can take out 9 elements from the array such that none of them are good, we are happy. So what we can do is to manually check first 9 elements. If none of them appears at least N/2 – 4 times, it means that the number that we are looking for is a majority element in the array of size N-9. Let’s see that in code:

void findMajorityNew(Number array[], int N)
{
 for(int i = 0; i < min(N, 9); ++i) if(checkIfMajority(array[i], array, N, N/2-4))
 {
  print(array[i]); return ;
 }
 if(N > 9) findMajority(array + 9, N – 9);
 else print(“Not exists”);
}

It solves our problem and it does about 11N comparisons, which is linear.

We can generalize it a bit more. Let’s find a number that appears at least N/2 – K times, for some value of K. Solution will be similar, just the number of manual checks will be different.

M – new size of the array
N/2 – K > M/2
N – 2K > M
M ≤ N – 2K – 1

In code it would look like this:

void findMajorityK(Number array[], int N, int K)
{
 Number candidate = candidateMajority(array, N – 2K – 1);
 if(checkIfMajority(candidate, array, N, N/2 - K)) print(candidate);
 else
 {
  for(int i = N – 2K – 1; i < N; ++i) if(checkIfMajority(array[i], array, N, N/2 – K))
  {
   print(array[i]);
   return ;
  }
  print(“Not exists”);
 }
}

This one does about (2K+3)N comparisons and hopefully works correct. It’s not much more complicated than the standard algorithm. So, after all we ended up with algorithm that finds an element that appears at least N/2 – K times in the array in time complexity O(KN+N) and constant additional memory.

Maybe the problem isn’t very pretty and useful, but I think it’s quite good for more difficult interviews.

0 comments:

Post a Comment