We now move to the next topic, which,
once again, seemingly has nothing to do with
sequence comparison. We are now on a sightseeing tour in
Manhattan. Our goal is to walk from the
intersection shown in blue to the intersection shown in red
and to visit as many attractions as possible. The only restriction is that we can either
move south or east. What should be our strategy to visit the maximum number of attractions on this
trip? We can model Manhattan as a grid and mark
each edge of this grid that has an attraction as an edge of weight one. Then
our goal is to find a path in this graph that has the maximum number of
attractions. Computer scientists sometimes call this a longest path in the graph from the
blue node to the red node. We can actually try to solve this problem
in an arbitrary grid with an arbitrary number of
attractions, like the grid shown here. There are many different ways to travel in
this grid. For example, we can travel like this, and in this case, we’ll
visit 18 attractions. The length of this path is 18. Is it an optimal path?
No, because we can travel differently and, in this case, visit 20 attractions.
What should our strategy be for the optimal exploration of Manhattan?
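To make the grid model concrete, here is a minimal sketch in Python. It assumes a representation that is not from the lecture: `down[i][j]` is the number of attractions on the edge going south from node (i, j), `right[i][j]` is the number on the edge going east, and a path is a string of `S` and `E` moves. The small grid and its weights are made up for illustration and are not the grid on the slide.

```python
def path_weight(down, right, moves):
    """Count the attractions along a path given as a string of 'S'/'E' moves.

    down[i][j]  -- weight of the edge from node (i, j) south to (i + 1, j)
    right[i][j] -- weight of the edge from node (i, j) east to (i, j + 1)
    (This layout is an assumption made for the sketch.)
    """
    i = j = 0          # start at the source, the top-left node
    total = 0
    for move in moves:
        if move == "S":
            total += down[i][j]
            i += 1
        elif move == "E":
            total += right[i][j]
            j += 1
        else:
            raise ValueError(f"unknown move: {move!r}")
    return total


# A made-up grid with 3 x 3 nodes (2 south moves and 2 east moves to the sink).
down = [[1, 0, 2],
        [4, 6, 5]]
right = [[3, 2],
         [3, 2],
         [0, 7]]

print(path_weight(down, right, "EESS"))  # 3 + 2 + 2 + 5 = 12
print(path_weight(down, right, "SESE"))  # 1 + 3 + 6 + 7 = 17
```

Two different paths between the same pair of nodes collect different numbers of attractions, which is exactly why the choice of path matters.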
We need to solve the Manhattan tourist problem. The input is a weighted rectangular grid,
and the output is a longest path from the source
vertex (node), shown in blue, to the sink
vertex (node), shown in red, in this grid. So what should we do? The simplest thing that comes to mind is
to explore a greedy strategy. For example, at the very beginning,
we can either move east or move south. If we move east, we will visit three
attractions immediately. If we move south, we will visit only one. It makes sense to move east.
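The greedy rule just described can be sketched as follows. This is only an illustration: the `down`/`right` layout, the tie-breaking rule (prefer east) and the small grid are assumptions, not the grid from the slide.

```python
def greedy_path(down, right):
    """Walk from the top-left node to the bottom-right node, always taking
    the heavier of the two outgoing edges (ties go east, by assumption).

    down[i][j]  -- weight of the edge from node (i, j) south to (i + 1, j)
    right[i][j] -- weight of the edge from node (i, j) east to (i, j + 1)
    """
    n = len(down)       # number of south moves needed to reach the sink
    m = len(right[0])   # number of east moves needed to reach the sink
    i = j = 0
    total = 0
    moves = []
    while i < n or j < m:
        can_south = i < n
        can_east = j < m
        if can_east and (not can_south or right[i][j] >= down[i][j]):
            total += right[i][j]
            j += 1
            moves.append("E")
        else:
            total += down[i][j]
            i += 1
            moves.append("S")
    return total, "".join(moves)


# Made-up grid: at the source, east offers 3 attractions and south offers 1.
down = [[1, 0, 2],
        [4, 6, 5]]
right = [[3, 2],
         [3, 2],
         [0, 7]]

print(greedy_path(down, right))  # (12, 'EESS')
# The path S, E, S, E collects 1 + 3 + 6 + 7 = 17 in the same grid,
# so the greedy choice is not optimal here.
```

On this toy grid the greedy walk commits to the attractive first edge and then misses the heavier edges further south.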
Let’s do it. Then, once again, we can move
either east or south. We make the choice based on the maximum
number of attractions. We continue like this and finally arrive at the sink, having visited 23
attractions. You have already guessed that this is a very simple
strategy, but not an optimal one. We now need to find the optimal strategy. Another thing we need to keep in mind is that Manhattan is not a perfect
rectangular grid: Broadway cuts across it. We can model such a city as an arbitrary
directed graph, where edges can go from any vertex (node)
to any other node. How do we travel in this graph? To do so, we need to
solve the longest path in a directed graph problem,
where the input is an edge-weighted directed graph with source
and sink nodes, and the output is simply a longest path from source to sink
in this graph. By now you may be wondering why I’m talking about the alignment game and
Manhattan. Do you see a connection between the longest path problem and the
alignment game? It may not be obvious that there is a connection, but let’s try to figure out
what it is. For every column of the alignment
matrix, let’s encode it with an arrow, as shown on the slide. Once we have these arrows, let’s see how
they translate into the new grid that I have constructed
and presented on this slide. The first arrow is diagonal, so let’s move diagonally in our grid. The next arrow is also diagonal, so let’s
move diagonally again. The next arrow is horizontal, so we continue horizontally. Diagonal, diagonal, vertical, diagonal,
vertical, diagonal: using the alignment matrix, we
were able to travel in the Manhattan grid. Now, let’s ask the reverse question. If we have a path in the Manhattan grid, would we be able to construct the
alignment matrix? Let’s try. This is an arbitrary path in the alignment grid.
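The translation between alignment columns and arrows, in both directions, can be sketched in Python. The encoding is an assumption, since the slide is not reproduced here: a column with symbols in both rows is a diagonal move `D`, a column with a symbol only in the first row is a vertical move `V`, and a column with a symbol only in the second row is a horizontal move `H`. The two strings are example inputs chosen so that the arrows come out in the order read aloud above.

```python
def alignment_to_moves(row_v, row_w, gap="-"):
    """Encode each alignment column as 'D', 'V' or 'H' (assumed convention)."""
    moves = []
    for a, b in zip(row_v, row_w):
        if a != gap and b != gap:
            moves.append("D")      # symbols in both rows: diagonal
        elif a != gap:
            moves.append("V")      # symbol only in the first row: vertical
        elif b != gap:
            moves.append("H")      # symbol only in the second row: horizontal
        else:
            raise ValueError("a column cannot consist of two gaps")
    return "".join(moves)


def moves_to_alignment(moves, v, w, gap="-"):
    """Rebuild the two alignment rows from a source-to-sink path in the grid."""
    i = j = 0
    row_v, row_w = [], []
    for move in moves:
        if move == "D":
            row_v.append(v[i]); row_w.append(w[j]); i += 1; j += 1
        elif move == "V":
            row_v.append(v[i]); row_w.append(gap); i += 1
        elif move == "H":
            row_v.append(gap); row_w.append(w[j]); j += 1
        else:
            raise ValueError(f"unknown move: {move!r}")
    if i != len(v) or j != len(w):
        raise ValueError("the path does not end at the sink")
    return "".join(row_v), "".join(row_w)


moves = alignment_to_moves("AT-GTTATA", "ATCGT-C-C")
print(moves)                                            # DDHDDVDVD
print(moves_to_alignment(moves, "ATGTTATA", "ATCGTCC"))
# ('AT-GTTATA', 'ATCGT-C-C')
```

Going from the alignment to the path and back returns the original two rows, which is the sense in which an alignment and a path carry the same information.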
Let’s collect all of its arrows in one place. As soon as we have done that, we can of course construct the alignment matrix, as
shown here. Therefore, alignments are nothing but
paths in the grid. And therefore, to play the alignment game and construct longest common subsequences,
the only thing we need to do is to travel
optimally in the graph. Often, a sequence
comparison problem in biology amounts to building an
appropriate Manhattan. We’ll do this a lot in this lecture. In the case of the longest common subsequence,
what would this Manhattan be? In this case, the diagonal red edges
correspond to matching symbols: a diagonal edge has a score of one
if the two symbols it corresponds to match each other.
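The grid just described can be sketched in Python. The sketch assumes that every horizontal and vertical edge, and every diagonal edge between mismatching symbols, has a score of zero. To get the length of a longest path it fills in node scores row by row, a dynamic programming approach that this lecture has not introduced yet, so treat it as one possible way to travel in this grid rather than the method developed here. The function names and the two strings are illustrative.

```python
def lcs_diagonal_weights(v, w):
    """diag[i][j] is the score of the diagonal edge into node (i + 1, j + 1):
    1 if v[i] matches w[j], otherwise 0."""
    return [[1 if a == b else 0 for b in w] for a in v]


def longest_path_length(v, w):
    """Length of a longest source-to-sink path in the grid built for v and w.

    Assumption: horizontal and vertical edges have weight 0, so the score of
    a node is the best of its three predecessors (above, left, diagonal).
    """
    diag = lcs_diagonal_weights(v, w)
    n, m = len(v), len(w)
    # score[i][j] = length of a longest path from the source to node (i, j);
    # row 0 and column 0 stay 0 because only zero-weight edges lead there.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j],                          # vertical edge
                score[i][j - 1],                          # horizontal edge
                score[i - 1][j - 1] + diag[i - 1][j - 1]  # diagonal edge
            )
    return score[n][m]


print(longest_path_length("ATGTTATA", "ATCGTCC"))  # 4
```

For these two strings the value 4 is the length of a longest common subsequence (for example, ATGT).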
The highest-scoring alignment in this case is simply the longest path in a
properly built Manhattan. To learn how to travel in any
Manhattan, we once again need to talk about a
problem that seemingly has nothing to do with biology:
the change problem.