Friday, 21 May 2010

On DP and graphs

Many DP problems can be presented in the graph theory language. It is good to see a relation between those two worlds. Very often it helps to come up with the solution faster and in a more structured way.

Let's solve a simple problem:

Simple Problem
You are given two dimensional array NxM, which is a map of the forest. Each cell contains the information about the number of blueberries in that place. Alice is standing at the north-west corner of the forest and she wants to move to south/east directions only. Help her get to home which at the opposite corner while  collecting the largest number of blueberries.

Graphical example:







Result is: 4 + 7 + 10 + 5 + 3 + 8 + 5 = 42


In a more graph-like way, with all the possible transitions, it looks like this:













I think it's much easier to see now what the problem's author is asking for. As a graph guy would say we are looking for the most expensive path in the directed graph with weights put on nodes.

The good thing about DAGs is that you can order the vertices in a way that all edges are going from left to right. It's called topological sorting. Look at the following picture:

As you see when we pick a rectangle (node), all the edges are coming from the left side. Such ordering is very useful in dynamic programming, because it clearly defines what the sub problems are - the calculation for the node depends only on the nodes to the left.

Let's see how does it relate to the original problem.

We want to calculate the most expensive path for the green to the orange node. For DAGs it can be solved in exactly the same way as the shortest path problem. To get to the orange node we needed to come from one of the nodes to the left. So, we need to calculate first the most expensive paths to the predecessors of the orange one.

It leads to a very simple recursive solution:
Most_expensive_path_to_node(x) := max( Most_expensive_path_to_node(first_predecessor_of_x), Most_expensive_path_to_node(second_predecessor_of_x)) + value_of_the_cell(x);

From the technical side. Nodes are represented by the cells in the two dimensional array [x][y] for x < N, y < M and the edges connect nodes [x][y] -> [x+1][y] and [x][y] -> [x][y+1] wherever possible.

In code it looks like this - recursive version:



int DP[maxN][maxM], cellValue[maxN][maxM], N, M;
bool visited[maxN][maxM];

int mostExpensivePath(int x, int y)
{
 if (x < 0 || y < 0) return 0;
 if (visited[x][y]) return DP[x][y];
 return DP[x][y] = max(mostExpensivePath(x - 1, y), mostExpensivePath(x, y-1)) + celLValue[x][y];
}

// and to call it
memset(visited, 0, sizeof(visited));
cout << mostExpensivePath(N-1, M-1);
Or probably even simpler solution using iterative approach:
int DP[maxN][maxM], cellValue[maxN][maxM], N, M;

// initialization of boundary cases
DP[0][0] = forest[0][0];
for(int i = 1; i < N; ++i)
 DP[i][0] = DP[i-1][0] + cellValue[i][0];
for(int i = 1; i < M; ++i)
 DP[0][i] = DP[0][i-1] + cellValue[0][i];

for(int i = 1; i < N; ++i)
 for(int j = 1; j < M; ++j)
  DP[i][j] = max(DP[i-1][j], DP[i][j-1]) + cellValue[i][j];

cout << DP[N-1][M-1];

The above ideas easily generalize for all DAGs.

Awesome pictures made using Microsoft Visio :)

2 comments:

  1. "weights put on edges": did you mean "on nodes"?

    ReplyDelete