Have you ever had bad dreams because of not being able to solve any dynamic programming problems?
Yeah, me neither.
Many of my friends hate this technique. Why is that?
Let's see the typical process of learning DP:
1. Encounter hard enough DP problem on some competition / book
2. After not solving it, ask someone how to do it correctly
3. “They’re crazy. How the ?#$% should I be able to come up with this?”
4. Give it a shot - look at DP exercises on the internet / books and finding one from the early chapters that will block you forever
5. “I am never going to learn that!”
6. Give up and (or not) start from the beginning after some time
Been there?
I have, a couple of times. It think it's because of the fact that I’ve been underestimating the "simple" problems. Without fully understanding their solutions I was trying to do the harder ones. That's not the approach we're going to take here! Let's get to work.
Problem I
Count the number of words of size N over alphabet {0,1} that don't contain two 1's next to each other.
Solution for small values:
N = 1: 2 // 0, 1
N = 2: 3 // 00, 01, 10, but not 11
and so on...
Do you know how to solve it already? If not, let's think how to find the correct results in the simplest possible way, not caring about the complexity.
We can try to use what we have learned from the last post. The most natural way of solving problems like that using brute force method is to simulate creation of new words - adding one letter and calling itself recursively. This function won't differ much from the ones implemented in the previous post. We just have to remember that no two consecutive letters are equal to 1.
// maxN is some constant greater than maximal possible value of N on which we want to test the solution
bool words[maxN];
int N;
int generate(int position, bool lastLetter)
{
if (position == N)
{
// we can print words here to see better how it works
return 1; // we found a new word
}
int result = 0;
// counting words with 0 at this position
words[position] = 0;
result = generate(position + 1, 0);
// counting words with 1 at this position
words[position] = 1;
// but only those without two 1's next to each other
if (lastLetter != 1)
result += generate(position + 1, 1);
return result;
}
// and to call it:
// we can use 0 as a lastLetter because 0 doesn't break anything
cout << generate(0, 0);
OK, it looks nice, but why exactly did we do that? It is terribly slow, but that was very important step in solving this problem. Actually, there is not much left to do. Look that we are not using words array in any way. We could remove it and two lines in which some values are assigned.
What does it mean for us?
Since we're not using any values outside the function (except for N which is constant) the only things that influence what function returns are parameters. There can be (N+1) * 2 possible different combinations of parameters ( the first one has values between 0 and N and the second 0 or 1) so there is only linear number of different function calls to get the final result. And here comes DP...
We know that the function returns the same result when we call it with the same set of parameters. So after calculating it for the first time we can store a result somewhere and the next time function will be called, we will just return the result from the previous call. Here is how it can be implemented:
int N, DP[maxN+1][2];
// DP keeps information about values returned by the function
// -1 means that value haven't been calculated yet
// any other is the result of the function
int generate(int position, bool lastLetter)
{
if (position == N)
return 1; // we found a new word
// we calculated the value before
if (DP[position][lastLetter] != -1)
return DP[position][lastLetter];
int result = 0;
// counting words with 0 at this position
result = generate(position + 1, 0);
// counting words with 1 at this position
// but only those without two 1's next to each other
if (lastLetter != 1)
result += generate(position + 1, 1);
DP[position][lastLetter] = result;
return result;
}
// and to call it:
// initializing the array
for(int i = 0; i <= N; ++i) DP[i][0] = DP[i][1] = -1;
// we can use 0 as a lastLetter because it doesn't break anything
cout << generate(0, 0);
How fast does it work now? Let's see: we have 2(N+1) different possible function calls and in each of them we do constant number of operations not carying about other function calls (no loops inside). It's maybe not that obvious (at least it wasn't for me at the beginning) but it means that in total, we do O(N) operations.
To be more clear look at the following code:
int N, DP[maxN+1][2];
int generate(int position, bool lastLetter)
{
if (position == N)
return 1;
if (DP[position][lastLetter] != -1)
return DP[position][lastLetter];
int result = 0;
result = generate(position + 1, 0);
if (lastLetter != 1)
result += generate(position + 1, 1);
DP[position][lastLetter] = result;
return result;
}
// and to call it:
for(int i = 0; i <= N; ++i) DP[i][0] = DP[i][1] = -1;
for(int i = N; i >= 0; --i)
{
generate(i, 0);
generate(i, 1);
}
cout << generate(0, 0);
which is basically the same as:
int N, DP[maxN+1][2];
int generate(int position, bool lastLetter)
{
if (position == N)
return 1;
int result = 0;
result = DP[position+1][0];
if (lastLetter != 1)
result += DP[position+1][1];
DP[position][lastLetter] = result;
return result;
}
// and to call it:
for(int i = N; i >= 0; --i)
{
generate(i, 0);
generate(i, 1);
}
cout << DP[0][0];
and after some thinking the same as:
int N, DP[maxN+1][2];
DP[N][0] = DP[N][1] = 1;
for(int i = N-1; i >= 0; --i)
{
DP[i][0] = DP[i+1][0] + DP[i+1][1];
DP[i][1] = DP[i+1][0];
}
cout << DP[0][0];
and this one does without any doubt linear number of operations!The only difference between fully recursive solution and the last one is that in the former we are doing some kind of lazy calculations - we are only calculating values that we need to find the final result. This technique is often called in books as memoization (resursion with storing calculated values) or top-down approach.
On the other hand the code shown in the last example is a fully iterative solution to this problem. It is often called bottom-up approach.
Which one of them is better?
It depends on the problem. Usually the iterative approach is faster because using recursion takes some time (and more memory). That's how it is in this case. Sometimes memoization can be much faster when we don't need to calculate all intermediate values.
In this case iterative approach has another advantage. Look that in the last code we don't need to keep whole array in memory. Two values from the next row are enough. It can look like this:
int N;
int withZero = 1, withOne = 1;
for(int i = N-1; i >= 0; --i)
{
withZero = withZero + withOne;
withOne = withZero;
}
cout << withZero;
Anyway, memory isn't usually a big problem, however, sometimes optimizing memory, we can greatly decrease the time of the code execution.
I think that after solving many problems it is easier to write a solution using recursion. I prefer to use it whenever possible. It is much easier for me to take into account all constraints in the simple recursive function than (sometimes) doing the same thing in many places in the iterative approach. And the most important thing: the solution is more intuitive - we started with the brute force solution which must have been correct and optimized it without breaking anything. I feel it much easier to prove solutions that way. After some time you don't even think about it because "it obviously must be correct".
I actually haven't written what dynamic programming means, but it isn't very important .It's about storing intermediate values to compute the results faster, which is basically what we were doing. You can always check out wikipedia definition.
In this post I wanted to show some basic intuitions related to DP. They're useful in solving many problems. I hope it was possible to understand the ideas involved.
As a very simple exercise you can solve following problem:
Problem II
Count the number of words of size N over alphabet {a, b, c} that don't contain "ab" and "bc" as the subwords
Solution for small values:
N = 1: 3 // a, b, c
N = 2: 7 // aa, ac, ba, bb, ca, cb, cc, but not ab and bc
and so on...
I haven't checked if the codes are working. If anything is unclear just write below or mail me.

Yeah, I feel young again :D
ReplyDelete