Wednesday, September 24, 2014

Robust Chessboard Recognition Part 4: A Solution


In Part 1 of this series, I showed how to turn chessboard recognition into an optimization problem. In Part 2, I discussed different ways of solving this optimization problem, deciding on a custom evolutionary algorithm with tailored mutation operators described in Part 3. In this post, I discuss my solution. A video of the final solution is shown to the right.

Basic Solution

The last three posts developed the chessboard recognition problem as an optimization problem to be solved by an evolutionary algorithm using specialized mutation operators. Given the score function that assigns a score to possible quadrilaterals as chessboard boundaries, this simple algorithm works as follows. First, a population of 100 quadrilaterals is generated by placing a virtual camera at random positions around an imaginary chessboard and projecting the chessboard into the camera's frustum. Then the following steps are repeated one hundred times:

  1. Each quadrilateral in the population is scored.
  2. The quadrilateral with the highest score is chosen; for this purpose the best quadrilateral from the last population is included, so that selection is elitist.
  3. The best quadrilateral is mutated using the tailored mutation operations to generate a new population of 10 quadrilaterals.
After one hundred repetitions (generations) of this form, the best quadrilateral is returned as the algorithm's best guess at the chessboard location. This quadrilateral is the best out of 1,090 possible quadrilaterals searched. This basic algorithm runs in less than 10 seconds on an Intel i7 processor with support from an NVIDIA 780 Ti GPU.
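In code, the loop has roughly this shape. This is a minimal sketch, not the actual implementation: `random_quadrilateral()`, `score()`, and `mutate()` are hypothetical stand-ins for the virtual-camera projection, the scoring function from Part 1, and the tailored mutation operators from Part 3.

```python
# Minimal sketch of the elitist evolutionary loop (hypothetical helpers).
def detect_chessboard(generations=100, init_size=100, pop_size=10):
    # Step 0: quadrilaterals projected from random virtual camera poses.
    population = [random_quadrilateral() for _ in range(init_size)]
    best = max(population, key=score)       # Steps 1-2: score and select
    for _ in range(generations):
        # Step 3: the elite parent spawns the next population by mutation.
        population = [mutate(best) for _ in range(pop_size)]
        # Elitism: the old best competes against its own mutants.
        best = max(population + [best], key=score)
    return best
```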

An Elitist Island Model

Although the basic solution works well, it sometimes gets stuck. This problem can be alleviated by running the basic algorithm several times in a distributed fashion. When the chessboard detection system first starts, it runs the basic algorithm eight times in eight separate processes. Each of the eight instances of the algorithm then returns a proposed solution. These eight solutions are compared using the score function, but with the addition of a score component that checks the color of the different board positions, which is too expensive to check for all 8,000 possible solutions. The best of these eight solutions is chosen as the current best guess at the chessboard location for one time step. It takes 10-20 seconds to compute this guess.

Then, for subsequent time steps, four smaller evolutionary algorithms are run. Rather than generating new chessboards from random camera positions, each algorithm is seeded with the initial guess and generates a new population of ten quadrilaterals just as in Step 3 above. Each algorithm runs for just ten more generations, and then the best quadrilaterals from the four algorithms are compared again. Once again, the color information is also checked, and the best quadrilateral is published as the current chessboard guess. These smaller evolutionary runs execute in parallel and complete within 250 milliseconds. So after initialization, the entire model runs at 4 Hz, publishing four successive guesses at the chessboard per second.
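The two phases might look something like this. Again a sketch under assumptions: `detect_chessboard()` is the basic loop above, while `refine(seed)` (a short ten-generation run reseeded from a guess) and `score_with_color(quad)` (the more expensive color-aware score) are hypothetical helpers.

```python
from multiprocessing import Pool

def initial_guess(n_islands=8):
    # Phase 1: run the basic algorithm independently on eight islands.
    with Pool(n_islands) as pool:
        candidates = pool.starmap(detect_chessboard, [()] * n_islands)
    # Migration: only the eight finalists pay for the color check.
    return max(candidates, key=score_with_color)

def track(guess, n_islands=4):
    # Phase 2: four short runs per time step, each seeded with the guess.
    while True:
        with Pool(n_islands) as pool:
            candidates = pool.map(refine, [guess] * n_islands)
        guess = max(candidates + [guess], key=score_with_color)
        yield guess  # roughly one guess every 250 ms after initialization
```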

An algorithm of this form is called an island model. Each distributed evolutionary algorithm is called an island, and when information travels between islands it is called migration. Island models can be very effective because they maintain some diversity across all islands while allowing a focused search on each island. In this case, our island model is elitist: when the islands exchange information, the single best quadrilateral wipes out the past entirely.

Adapting to Change

One of the nice things about an evolutionary algorithm is that it adapts easily to change. If the board is moved while the board detector is running, it only takes a few cycles (5-10 seconds) for the best chessboard guess to adapt to the new reality. While this latency is not fast enough to allow the robot to move its eyes while tracking the board, it is good enough to adjust reasonably if someone accidentally (or intentionally) bumps into the board.

Also, in order to promote stability when the board does not change, it makes sense to keep old detected intersection points, since the lines and intersections detected at each frame may vary. In my approach, old intersection points are rolled off every 5 frames. This is enough to make the board detection robust, for example when a hand is placed between the camera and the board. You can see an example of this in the video above.
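One simple way to implement this rolling window is with a fixed-length queue of per-frame detections; this is a sketch, not necessarily how my code does it:

```python
from collections import deque

FRAME_WINDOW = 5                       # frames to keep before rolling off
recent_frames = deque(maxlen=FRAME_WINDOW)

def update_intersections(new_points):
    """Add this frame's detections; points older than 5 frames roll off."""
    recent_frames.append(new_points)
    # The scoring function sees the union of recent detections, which
    # smooths over frames where a hand hides part of the board.
    return [p for frame in recent_frames for p in frame]
```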

Overall Results

The overall solution is very robust and relatively fast. The image to the right shows what a converged example looks like, and the video above shows how the detection adjusts when the board is obscured or moved.

Wednesday, September 3, 2014

Robust Chessboard Recognition, Part 3: Tailored Mutation Operators


A correctly detected chessboard
In the last two posts, I discussed the chessboard recognition problem and the types of algorithms we could use to solve it. In this post I will describe the solution I developed. As described in the first post, I use an image of the chessboard taken from the iCub's eye, extract lines and intersections, and create a chessboard scoring function that takes a proposed quadrilateral, warps the image to undo the perspective, and then checks to see how many lines and intersections match what is expected from a chessboard. An example of a good quadrilateral is shown to the right.

In the second post, I argued the case for using a simple evolutionary algorithm with custom mutation operators tailored to look for chessboards. In this post, I will describe the mutation operators. In the next post, I'll describe an elitist distributed island model used to select the best chessboard candidates.

Rectangle Mutation

A randomly generated quadrilateral
The general idea is that we start with a quadrilateral with a decent score, and we want to change it in a way that might make it better. At the very beginning, the quadrilaterals are chosen by using a projective model from a camera whose position and orientation are somewhat random above and around the chessboard. This projection generates some random convex quadrilaterals. An example is shown to the left. So let's think about what could be wrong with our quadrilateral, and how we could adjust the quadrilateral to fix it.

Snapping Corners to Points


Leftmost corner snapped to point
Looking at the chessboard in the real image above, we notice that the generated quadrilateral has no relationship to the lines (in red) and intersections (white circles) that we computed for the scoring function. Ideally, the right quadrilateral should have corners that are near intersections on the chessboard. So one of the first mutations we might consider is to snap a corner of the quadrilateral to some intersection that we detected. An example of what happens after this is done is shown to the right. It looks like we have now nearly matched the upper left corner of the real chessboard. The image on the right was generated with just ten trial points.
Top right snapped to point
This snapping operation works because there is a concentration of intersection points inside or near the chessboard, so with high probability we'll pick a point right around the chessboard, and there is a decent probability that we will get a snap point closer to one of the corners. If we do it again, we might get a second corner a bit closer. This is shown to the right. The snap point actually came from the keyboard. But it doesn't matter, since it got us closer to the right answer. Again, I only had to sample 10 snap points to get the image on the right.

But now we're stuck. The snapping operation works by randomly choosing an intersection and then moving the closest corner of the quadrilateral to that point, as long as the corner lies within a threshold of the snap point (50 pixels in a 480 x 640 image here). As can be seen, our two previously snapped corners are now closer to most of the new snap points we might choose than the other two corners are. Despite generating 100 new samples, I could not get a third corner in place. We need something more.
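As a sketch (assuming corners and intersections are (x, y) tuples), the snap operation might look like this:

```python
import math
import random

SNAP_THRESHOLD = 50  # pixels, for a 480 x 640 image

def snap_corner(quad, intersections):
    """Snap the corner nearest a randomly chosen intersection onto it."""
    point = random.choice(intersections)

    def dist(corner):
        return math.hypot(corner[0] - point[0], corner[1] - point[1])

    i = min(range(4), key=lambda k: dist(quad[k]))
    if dist(quad[i]) > SNAP_THRESHOLD:
        return quad               # nearest corner too far; no change
    new_quad = list(quad)
    new_quad[i] = point
    return new_quad
```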

Elongating and Shortening

After six shortening steps
After three shortening steps
We need to move the corners that are below and to the right of the visible area. Specifically, we need them to come close enough to the other corners so that we can snap them into place. One way to do this is to take two opposite sides of the quadrilateral and make them shorter, pulling the lower corners in towards the top two points.

After six more snaps
The actual mutation operation chooses a random side of the quadrilateral, and then picks a percentage by which to shorten or lengthen that side and its opposite. The percentage is chosen from a normal random variable with a variance of 12.5%. In our case, the quadrilateral's corners are off at 1255 pixels on the right, while the image ends at 640 pixels. It will take several operations to pull the right side in towards the image. A code sketch of this operator appears below.

The images to the right show the progress after three and six steps. Notice that we've also gotten further away from the corners. But now the corners at the right and bottom are close enough that we can hope to pull them in by snapping more. The image above is what we get after six snaps.
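Here is that sketch. How the endpoints move is my reading of the operation: shortening a side and its opposite pulls the edge joining their far endpoints toward the edge joining their near ones.

```python
import random

SIGMA = 0.125  # spread of the normal percentage change (12.5% in the post)

def lerp(a, b, t):
    """Move point a toward point b by fraction t (t < 0 moves away)."""
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

def elongate_or_shorten(quad):
    """Shorten or lengthen one randomly chosen side and its opposite.

    Corners are ordered around the quadrilateral. Shortening side
    (i, i+1) and its opposite (i+3, i+2) pulls corners i+1 and i+2
    toward corners i and i+3; a negative factor lengthens instead.
    """
    i = random.randrange(4)
    t = random.gauss(0.0, SIGMA)
    new_quad = list(quad)
    new_quad[(i + 1) % 4] = lerp(quad[(i + 1) % 4], quad[i], t)
    new_quad[(i + 2) % 4] = lerp(quad[(i + 2) % 4], quad[(i + 3) % 4], t)
    return new_quad
```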

Widening and Narrowing

After six narrowing steps
Our quadrilateral is looking better; we're getting close on three out of the four corners. But that fourth corner is still at 881 pixels, while the image is only 480 pixels tall. It needs to be pulled in. This time, we'll use an operation to widen and narrow the quadrilateral. This operation randomly picks a single side, and then pulls both of the corners it is attached to towards the center of that side. We might imagine that while doing so we are moving the camera up or down with respect to the plane of the board. Again we use a normal random variable with a variance of 12.5% to decide the percentage of widening/narrowing.
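Reusing the `lerp` and `SIGMA` helpers from the previous sketch, this operator might be written as:

```python
def widen_or_narrow(quad):
    """Pull both endpoints of a random side toward its midpoint (narrow),
    or push them apart when the sampled factor is negative (widen)."""
    i = random.randrange(4)
    j = (i + 1) % 4
    mid = ((quad[i][0] + quad[j][0]) / 2, (quad[i][1] + quad[j][1]) / 2)
    t = random.gauss(0.0, SIGMA)
    new_quad = list(quad)
    new_quad[i] = lerp(quad[i], mid, t)
    new_quad[j] = lerp(quad[j], mid, t)
    return new_quad
```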

Three elongations and a snap
After six narrowing steps, we get the picture to the right. We've pulled the lowest coordinate up from 881 to 632 pixels, but we're stuck again. The side on which most of the narrowing has been done has pulled down the corner on the right, and we're not going to get much more out of narrowing that side. But we can try to lengthen it back a bit via the elongation operation. And then we can fix the corners by snapping. The image to the left shows the progress after three elongations and a single snap. We're getting a bit closer, but we still need more.

Clipping to Lines

Second clip to right
First clip to bottom
Up to now, I haven't mentioned our most effective mutation operator. We used the intersection points right at the outset, but we didn't use the lines. And yet, if all works well, the outline of the chessboard is right there in our list of lines! So what we do now is to choose a random line, and then move the closest side of the quadrilateral inward to match that line. And look below at what happens after just three steps of clipping the quadrilateral to lines. We now have three of the four sides. But because of all the pieces on the board, it will be hard to get the fourth line, since less of that line was extracted. No matter; we can snap again.
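A sketch of clipping, assuming each detected line is stored in implicit form \((a, b, c)\) with \(ax + by + c = 0\) and \(a^2 + b^2 = 1\) (the representation is my assumption):

```python
import random

def clip_to_line(quad, lines):
    """Move the side nearest to a randomly chosen line onto that line."""
    a, b, c = random.choice(lines)

    def signed_dist(p):
        return a * p[0] + b * p[1] + c  # true distance since (a, b) is unit

    def project(p):
        d = signed_dist(p)
        return (p[0] - d * a, p[1] - d * b)

    # Pick the side whose endpoints are on average closest to the line.
    def side_dist(i):
        return (abs(signed_dist(quad[i]))
                + abs(signed_dist(quad[(i + 1) % 4]))) / 2

    i = min(range(4), key=side_dist)
    new_quad = list(quad)
    new_quad[i] = project(quad[i])
    new_quad[(i + 1) % 4] = project(quad[(i + 1) % 4])
    return new_quad
```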

After another snap
Third clip to left
After snapping, we are now in a position where the elongation/shortening operator should work perfectly. After three elongations and two clips, we are almost done.

Three elongations and a clip

Sliding into Place

Since we can't clip the final line into place, we have to randomly shift the corners. Our final operator picks one corner of the quadrilateral and shifts it a random amount. I chose to use a Poisson random variable with \(\lambda = 5\). A few random shifts that affect the top corner, and then we can clip the left side back into place. A few more random shifts on the right, and then the right side gets clipped into place. We can stop here, since we now have a good match. Once the solution is this close to the optimum, the scoring function can quickly guide an evolutionary algorithm to the correct solution. A sketch of this operator appears after the images below.
Shift top left corner

Clip top left into place

Shift top right into place

Final clip to refine.
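Here is that sketch. Turning the Poisson magnitude into a 2-D displacement is my assumption (an independent signed Poisson shift per coordinate); NumPy supplies the Poisson sampling.

```python
import random
import numpy as np

rng = np.random.default_rng()

def shift_corner(quad):
    """Shift one randomly chosen corner by a small Poisson-sized amount."""
    i = random.randrange(4)
    dx = int(rng.poisson(5)) * random.choice((-1, 1))
    dy = int(rng.poisson(5)) * random.choice((-1, 1))
    new_quad = list(quad)
    new_quad[i] = (quad[i][0] + dx, quad[i][1] + dy)
    return new_quad
```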

Summary of Mutations

It took 40 mutations to achieve the picture above. While that might seem like a lot, a more general-purpose solution can take far longer. For example, differential evolution needs thousands of generations and still does not find matches this good. The clip and snap mutations are very effective at moving the quadrilateral close to a good solution, and the other three mutations are useful to shift everything into the right place using the fitness function.

In the end, all of the mutations mentioned above are useful. Our final evolutionary algorithm mutates a quadrilateral by randomly selecting one of the mutations and applying it. The mutations are listed below, together with the probability of choosing each one.

  1. Clip to Line (40% probability)
  2. Snap to Point (20% probability)
  3. Elongate / Shorten (15% probability)
  4. Widen / Narrow (15% probability)
  5. Shift One Corner (10% probability)
We're counting on clipping and snapping to get us near the right solution, and on elongate and widen to bring us into situations where clipping and snapping work. Finally, small local shifts optimize the solution in the end.
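Putting it together, the full `mutate()` used by the evolutionary algorithm can be a weighted choice over the five operators sketched above. `clip_to_line` and `snap_corner` need the current frame's detections, so in this sketch they are closed over with `functools.partial`:

```python
import random
from functools import partial

def make_mutator(lines, intersections):
    """Build mutate() with the probabilities listed above."""
    mutations = [
        (partial(clip_to_line, lines=lines),                0.40),
        (partial(snap_corner, intersections=intersections), 0.20),
        (elongate_or_shorten,                               0.15),
        (widen_or_narrow,                                   0.15),
        (shift_corner,                                      0.10),
    ]
    ops, weights = zip(*mutations)

    def mutate(quad):
        op = random.choices(ops, weights=weights)[0]
        return op(quad)

    return mutate
```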

In the next post, we'll look at how the evolutionary algorithm operates in real-time to track the chessboard and adjust when the chessboard is moved.

Monday, September 1, 2014

Robust Chessboard Recognition, Part 2: Evolutionary Algorithms


Lines and intersections extracted from the iCub's eyes.
In Part 1 of this series, I discussed my strategy for recognizing occluded or cluttered chessboards. The main idea was to recast the chessboard recognition problem as an optimization problem. I wrote a Python program that extracts lines and intersections from the video images out of the iCub's eyes. An example of these lines is shown to the right. I developed a scoring function that takes a set of four points representing the corners of a quadrilateral and assigns a high number to quadrilaterals that seem to match a chessboard, and a low number to those that don't. This scoring function does a reverse perspective projection and checks to see if the warped image matches the criteria from the last post:
  1. What percentage of the expected horizontal and vertical grid lines are matched?
  2. What percentage of the grid intersection points are matched?
  3. How uniform is the color of the fringe just outside of the warped image?
Now we have a fitness function (also known as an objective function or cost function) that can score each proposed chessboard. We can use a black-box optimization method to find the chessboard that scores best. Specifically, we will use an evolutionary algorithm. These methods are described in this post, and the actual solution is discussed in Part 3.
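The skeleton of such a scoring function might look like the sketch below, using OpenCV for the reverse perspective projection. The helper checks (`grid_line_coverage`, `intersection_coverage`, `fringe_uniformity`) and the weights are hypothetical stand-ins for the three criteria above.

```python
import cv2
import numpy as np

BOARD = 400                                  # unwarped board side, pixels
W_LINES, W_POINTS, W_FRINGE = 1.0, 1.0, 1.0  # illustrative weights

def score(quad, image):
    """Undo the perspective inside quad, then score it as a chessboard."""
    src = np.float32(quad)
    dst = np.float32([(0, 0), (BOARD, 0), (BOARD, BOARD), (0, BOARD)])
    M = cv2.getPerspectiveTransform(src, dst)
    # In practice the warp would include a margin so the fringe just
    # outside the board is visible to the third check.
    warped = cv2.warpPerspective(image, M, (BOARD, BOARD))
    return (W_LINES * grid_line_coverage(warped)
            + W_POINTS * intersection_coverage(warped)
            + W_FRINGE * fringe_uniformity(warped))
```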

Black-box Optimization

A black-box optimization method is given a fitness function with no extra information. The fitness function is just a black box, an engineering term for a component in a design that reliably does some work without the designer needing to know how it works. The fitness function gives no information other than the value at a search point. A black-box optimizer searches through the inputs to the fitness function trying to find points with high scores.

Gradient-based (Newton) Methods

Black-box optimization was originally named to contrast with the gradient-based or Newton methods that everyone learns in calculus. A gradient-based method not only sees the fitness value \(f(x)\), it also sees the gradient \(\nabla f(x)\), the direction in which the fitness increases most rapidly. Newton methods can find the high and low points of a function very quickly because they are given good information about where to look next. Conjugate gradient ascent reduces error with the square of the time. That is, if at time \(t = 1\) the difference between the current fitness and the highest nearby fitness value is \(100\), then at time \(t = 2\) it will be \(10\), at least in theory.

But there are caveats. First, Newton methods need a gradient. In a case like ours, the fitness function is a program, and you can't take the derivative of a program. You could extract a math formula from the program with a lot of hard work, and in this case that formula would even be differentiable (it's piecewise linear by design). But it would take many symbols to write it down. It would be much easier to estimate the gradient by \(\frac{1}{2h}\left(f(x + h) - f(x - h)\right)\) using small increments \(h\) in each direction. However, estimated gradients slow down gradient ascent so that it only reduces error at a rate proportional to the time -- estimating the gradient costs a linear factor. Not only that, but even with a gradient, some functions don't work very well with gradient methods. Like ours.
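For concreteness, here is what a central-difference gradient estimate looks like; the two function evaluations per dimension are exactly the linear cost mentioned above.

```python
import numpy as np

def estimated_gradient(f, x, h=1e-4):
    """Central-difference estimate of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)  # two evals per axis
    return grad
```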

Our scoring function is multi-modal. If you could plot the fitness values over a plane, you would see lots of bumps distributed at the grid points. This is because if you slide the correct quadrilateral for the chessboard over one square, you would still have \(\frac{7}{8}\) of the grid points matched. So in ideal conditions the landscape of our fitness function has a large pyramid over the quadrilateral where the best chessboard is. Then at each grid point in a box around the correct quadrilateral there is a slightly smaller pyramid, followed by another row of pyramids at the next grid points, and so on out to the edges of the board. There is also noise created by extraneous and missing lines and intersections.

If you run a gradient method on this type of fitness function, you will at best get to the top of one of the small pyramids. Although Newton methods are very accurate, they are local. They don't look for the highest point anywhere in the fitness function, just for the one nearest to the starting point. There are hacks like momentum factors that can avoid small local optima. But if all you have is a regular series of local optima, they won't work.

Quasi-Newton and Local Methods

The need for non-gradient methods became apparent from the earliest days of computing, and some were even used by scientists in the Manhattan Project. By the 1960s a number of such methods had been developed -- Compass Search, Powell's method, and the Nelder-Mead (or amoeba) method are three well-known ones. Researchers within the mathematical community call such methods Direct Search. But these methods were not always reliable, and mathematicians disliked the fact that such methods can fail to find even a local optimum in some cases. Instead, they developed quasi-Newton methods such as the BFGS method that provide local convergence guarantees using sophisticated estimation of the gradient.

Still, estimating gradients is not enough to solve multi-modal and noisy problems, and so direct search methods were researched further. Starting from the Nelder-Mead algorithm, Virginia Torczon and her colleagues have developed a much improved method called pattern search.  One nice thing about pattern search is that there are (local) convergence proofs and the convergence rates are known. A general pattern search method performs comparably to the best evolutionary algorithms mentioned below (e.g. CMA-ES, Differential Evolution). However, none of these work as well as the simple method that we will ultimately develop by using the characteristics of the chessboard problem, so no more needs to be said about pattern search here.

Evolutionary Algorithms

In Europe during the 1960s, Hans-Paul Schwefel and Ingo Rechenberg developed a method called evolution strategies, inspired by Darwinian selection, to search for the optima of real functions in Euclidean spaces. In their method, several search points (a population) are placed randomly in the space. A small selection of the best points is then varied in many ways by adding Gaussian random noise to create a new population. The process is repeated as long as desired. There are no convergence guarantees, but the method can work well, and numerous improvements and variations have been developed over the years.
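In code, the core of a simple \((\mu, \lambda)\) evolution strategy is only a few lines; a minimal sketch for maximizing a fitness function f over a box of dim dimensions:

```python
import random

def evolution_strategy(f, dim, mu=5, lam=30, sigma=0.3, generations=100):
    """Minimal (mu, lambda) evolution strategy sketch: maximize f."""
    pop = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(lam)]
    for _ in range(generations):
        # Selection: keep the mu best points of the population.
        parents = sorted(pop, key=f, reverse=True)[:mu]
        # Variation: each child is a random parent plus Gaussian noise.
        pop = [[x + random.gauss(0, sigma) for x in random.choice(parents)]
               for _ in range(lam)]
    return max(pop, key=f)
```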

Around the same time in America, John Holland developed a similar method for binary codes inspired by Mendelian genetics. He called his method a genetic algorithm and published a widely read book named Adaptation in Natural and Artificial Systems in 1975. His student, David Goldberg, popularized these methods in the 1980s, and together with neural networks the terms became widely known in popular science and sci-fi circles.

A genetic algorithm is characterized by a gene encoding, in which solutions to a problem are expressed typically in a binary form as a list of ones and zeros. So to search for solutions to a problem in the real numbers, one might use a fixed decimal encoding. For the integers, one could use two's complement binary numbers. Since any representation inside a computer requires a binary code, genetic algorithms can be easily applied to any object in a digital computer (not necessarily with success).

The other prominent feature of genetic algorithms is crossover. In crossover, two gene encodings are mixed according to some scheme to create a third new gene encoding. Overall, genetic algorithms, like evolution strategies, propose a population of search points, then use the scores to generate a new population using selection, crossover, and mutation. Selection chooses the best points by fitness to generate the next population. Crossover mixes and matches these points, and mutation randomly flips a small number of bits. Populations are generated in this way as long as needed, usually until the population consists of several copies of the same point.
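The two genetic operators are simple to state in code; a minimal sketch over lists of bits:

```python
import random

def one_point_crossover(parent_a, parent_b):
    """Mix two equal-length bit strings at a random cut point."""
    cut = random.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

def bit_flip_mutation(genome, rate=0.01):
    """Flip each bit independently with a small probability."""
    return [bit ^ 1 if random.random() < rate else bit for bit in genome]
```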

These days, standard genetic algorithms and evolution strategies have been amplified, extended, and modified in numerous ways to produce an entire academic field of evolutionary computation, or natural computation. There are optimization methods inspired by the foraging behavior of ants, the migration of geese, and the adaptive defense mechanisms of the immune system.

Performance of Evolutionary Algorithms

A key characteristic of both evolution strategies and genetic algorithms is that neither provides any guarantees at all with respect to performance. They can and often do fail badly, especially when applied to a problem naïvely. For this reason, and because of overstated claims by some supporters, these methods have acquired an unwarranted bad reputation in some circles. Nonetheless, they can work quite well if applied with care. Evolutionary algorithms have been used to design a control system for a finless rocket, to evolve a better radio antenna, and to schedule courses satisfying many constraints.

The study of when and why evolutionary methods work is relatively young. The earliest attempt at explaining their behavior was John Holland's schema theory. However, schema theory is not taken seriously anymore, for good reasons. Kenneth de Jong did early experimental work examining when and why genetic algorithms fail.

Since the mid 1990s, several researchers have begun to study the performance characteristics of evolutionary methods from a formal mathematical point of view. Some success and failure cases are now known. For example, needle-in-a-haystack problems cannot be solved without exponential time. And a simple evolution strategy performs better than line search on the problem of finding an exact match to some unknown string of bits. But the theory that is known lags quite far behind practice. One important fact is that the No Free Lunch theorems imply that there must be a class of problems for which evolutionary methods perform better than gradient methods. It is a fallacy to say that because gradient methods are understood, they must perform better! But it is also a fallacy to assume that the class of problems where evolutionary methods perform better is likely to occur in real-world problems. Still, evolutionary methods are used because they can work well.

In practice, the best evolutionary methods at present for searching real functions seem to be Differential Evolution and CMA-ES (or its generalized version, NES). Both of these methods tend to perform well out of the box, comparably to the pattern search mentioned above. And any of those three are far superior to a Newton or quasi-Newton method on certain multi-modal problems (like the chessboard problem). Differential evolution has even been shown to converge to the global optimum under certain conditions. 
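For a sense of why differential evolution performs well out of the box, the whole core of the common DE/rand/1/bin variant fits in a short sketch (the parameter values here are common defaults, not tuned):

```python
import numpy as np

def differential_evolution(f, bounds, pop_size=40, F=0.8, CR=0.9, gens=200):
    """Minimal DE/rand/1/bin sketch: maximize f over a box of bounds."""
    rng = np.random.default_rng()
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    fit = np.array([f(x) for x in pop])
    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: a donor built from three random population members.
            a, b, c = pop[rng.choice(pop_size, size=3, replace=False)]
            donor = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover between the donor and the current point.
            trial = np.where(rng.random(len(lo)) < CR, donor, pop[i])
            f_trial = f(trial)
            if f_trial > fit[i]:          # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    return pop[np.argmax(fit)]
```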

But to really get a good evolutionary algorithm, one has to take problem characteristics into account. In a way, this defies the original plan for a black-box method. The idea was that we would get good performance on any problem. But the reality is that one gets good performance on some problems by ignoring others. There is a general tradeoff between generality and quality/speed. In a recent paper, I even showed what the optimal solutions look like when you have a model of fitness problems. The catch is that the optimal solution averages over problems (and includes a second internal optimization). So a general model has to average over problems that aren't being solved at the moment, and for this it sacrifices some performance on the problems that are being solved.

Thus if you really want to solve a problem, you have to tinker with the method. One interesting thing about the evolutionary framework is its flexibility. One has two very flexible phases, selection and mutation. The selection phase uses fitness values in order to decide which search points should receive focus, and the mutation phase then tries to combine or alter the selected points to potentially make them better. After choosing a good general selection method, such as tournament selection or even evolutionary annealing, one can adapt the mutation process to match the search problem at hand. One way to manage such adaptations is by using fancy indirect encodings, or one can work directly in the search space using complex mutators.
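Tournament selection, for instance, is only a couple of lines, which is part of why it makes such a convenient general-purpose default:

```python
import random

def tournament_select(population, fitness, k=3):
    """Pick the best of k randomly sampled individuals."""
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)
```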

Our chessboard recognition algorithm will do exactly this: it will directly search quadrilaterals, using the known features of the problem to choose good mutations. In this way, we will be able to find chessboards much faster than any general-purpose method does while still using an evolutionary algorithm. But that is the topic for the next post.