Tuesday, August 12, 2014

Robust Chessboard Recognition, Part 1: Some Tools


Raw image of chessboard from iCub's left eye.
In order for the iCub to play chess, it first needs to see the board. For eyes, the iCub has two Dragonfly cameras with VGA resolution (640 x 480) that produce 30 frames per second. An example frame with the chessboard in it is shown to the right. It might seem easy to figure out where a chessboard is inside of an image. After all, checkerboard patterns are commonly used to calibrate cameras for image processing.  It is true that it is easier to locate the chessboard in an image than it is to solve than many of the other problems we will encounter, but as we will see, it is by no means simple or straightforward to do so.

To find the chessboard, we will use computer vision techniques to extract information about the image. We could use more advanced techniques, such as neural nets, but they are not needed to get a good solution. If we wanted to train a neural network to find the board using supervised techniques (e.g. backpropagation), we would need to gather a large dataset of chessboard images, either by generating images or by collecting and annotating hundreds of thousands of frames from the robot. If we generate the images, then we can't be sure that the learning will transfer from generated images to the real frames. Collecting the images would take many weeks of thankless labor. As an alternative, one could try to use the inherent properties of chessboard images in order to develop an unsupervised training method, but there is no obvious way to do so in this case. So rather than make this task more complicated than it needs to be, we'll just use standard methods for detecting lines and intersections.

Undistorting the Image

Image of chessboard after removing lens distortion.

Take a look at the image from the robot's cameras. Notice that the lines in the picture are not straight; the raw images are distorted by the camera lens. This distortion needs to be removed so that we can extract the lines.  Thanks to the many high-powered methods available in the open source computer vision package Open CV, the distortion can be taken out by just one call to the undistort method once we know the parameters for the camera lens. As discussed here, these parameters can be extracted by another call to a function named findChessboardCorners, which requires nothing more than an image with a chessboard in it as input. The undistorted image is shown to the right.

At this point, you might wonder whether we can just call findChessboardCorners on our raw image and be done with the chessboard recognition problem. The answer, of course, is no. While findChessboardCorners is good at finding unobstructed chessboards that are roughly parallel the frustrum of the camera, it fails utterly when the chessboard is covered with chess pieces and laying at an oblique angle on a table. But this is exactly the situation for which we need to solve the problem, so we have to develop our own method.

Our Strategy to Find the Chessboard

We will use the following properties of the chessboard to develop a robust detection method for chessboards:
  1. Chessboards are square.
  2. Chessboards contain a grid of 9 horizontal and vertical lines, equally spaced.
  3. Chessboards contain an 9 x 9 grid of intersections from these lines, again equally spaced.
  4. Chessboards typically have a monochrome fringe with no lines around the board.
  5. The 64 squares in a chessboard alternate in color, with one dark and one light color.
Chessboard after perspective warp given the board outline.
Many of these statements (such as equally spaced, horizontal, and vertical) require the robot to be looking at the chessboard from directly overhead at a known distance from the board. Since we don't know where the board is, that's too much to ask. However, we know that the chessboard is a square and that its position in the image is determined by a perspective projection from 3D into 2D space, and given any quadrilateral in an image, we can compute the perspective transformation required to warp the quadrilateral back into a square viewed from directly overhead at a known scaling. In fact, Open CV provides us with a function getPerspectiveTransform that does exactly what we want, giving us a 4x4 matrix that we describes the perspective transform (we could also use findHomography, but we're not going to need it). We can "undo" the perspective transformation in order to flatten the chessboard by using warpPerspective. If we knew the quadrilateral outlining the chessboard, then we would get an image like the one to shown here. We would see that the warped image roughly satisfies the five observations about chessboards listed above. 

Of course, we don't know the outline of the chessboard yet, but there's nothing that prevents us from guessing. We could pick any four points in the image that outline a (convex) quadrilateral, undo the perspective transform, and then visually check items 1-5 listed above. Since there are 94,371,840,000 possible outlines of our chessboard, we can't check every quadrilateral, but we could easily check at least some of them. We could check even more if we could find an automated way of checking items 1-5. If we write a program that is smart about how to pick points to check, it might even be able to find the chessboard fast enough to be useful. And that is what we will do, using a custom-built evolutionary algorithm. 

But that's the topic for the next post. In this post, we'll finish off by describing the tools we need in order to check whether a proposed outline of a chessboard is a good outline or not.

Processing Chessboard Images

Chessboard edges detected by Canny edge detector.
The chessboard features listed above were chosen to depend on lines, intersections, and colors. It's easy to see how we can get colors out of an image, but lines and intersections are more difficult. We start off by extracting edges using a Canny edge detector. Edge detectors try to find areas in an image where colors change rapidly. These areas are likely to indicate where one object begins and another ends. They could also indicate color patterns on a single object. We hope to detect the boundaries between chessboard spaces as edges. The picture to the left shows what the Canny edge detector does to our undistorted chessboard image (using Open CV's canny function, after various successive calls to erode and dilate). As expected, the chessboard shows through clearly in the edges, although many edges are chopped up by the chess pieces.


Now we need to pass from edges to lines. The traditional tool for this purpose is the Hough transform, which scans through the image searching for lines at each possible angle. Using the probabilistic version HoughLinesP in Open CV, we get a list of dozens of potential lines satisfying given thresholds on length, etc. These lines will be used to check items 2 and 4 in the list of chessboard features above.

We can easily compute the intersection points of each pair of lines using some basic vector math. However, if there are hundreds of lines, there are tens of thousands of possible intersections, so it's important to be very careful about how we do this. I used numpy to compute all of the intersections in parallel using matrix operations in a matter of milliseconds thanks to built-in hardware acceleration. If we were to loop through all pairs of lines explicitly, it would take far longer. Careful thresholding on the Hough transform is also useful to cut down on the number of lines.

Chessboard with lines and intersections from Hough transform.
All of the intersections give us a long list of points. We keep only the intersection points that intersect inside of the line segments that generated the intersection, and throw away the rest. In addition to these intersection points, we add in the endpoints of each line segment, which may represent an intersection that the edge detector failed to see. These points are candidates for the grid intersections and can be used to check item 3 on the list of chessboard features.


The lines, their intersections, and endpoints are shown to the right, with lines in red and intersections as white circles. The chessboard is clearly visible, but it is also missing some intersections and grid lines. Thus when we match the chessboard, we will have to allow for the possibility that some elements of the chessboard are invisible to us. We are looking for the candidate chessboard that has as many of the desired features as possible.

Chessboard after color filtering to reveal dark and light squares.
As a final step, we can threshold the colors in order to try to discriminate light squares from dark squares as in item 5 of our chessboard features. A picture is shown to the left. Notice that the colors are not reliable; dark pieces on light squares can cause confusion, and our threshold is not perfect. The colors would also fail if our chessboard had different colors or were placed in different lighting. Additionally, computing a score for the color is more computationally expensive than the other steps. For these reasons, we only use the color to select among the best potential chessboards.


Chessboard Detection as an Optimization Problem

Putting the information above together (and omitting some details), we can build a scoring function for potential chessboards. The scoring function takes a quadrilateral \(q\) and warps the image to remove the perspective transform so that the edges of the quadrilateral are located at \(xy\)-coordinates \((-N,-N)\), \((-N,N)\), \((N,N)\), and \((N,-N)\) for some \(N\) (we used \(N=480\) to avoid certain precision issues with Open CV). Then it computes three subscores:

  1. The percentage of horizontal and vertical lines of the chessboard that are matched.
  2. The percentage of intersection points on the chessboard that are matched.
  3. The variation in a fringe of width $N\slash 16$ around the chessboard. 
The first two components give 'partial credit' for lines that are not quite straight and points that are not quite at the intersections; this credit scales with the square of the error in the line angle or the intersection error. This is done so that the score behaves smoothly around the good matches, leading a chessboard search along a smooth path to the best matches. For the line and intersection subscores, the "percentages" may be greater than one, because some lines and points may overlap. The variation of the fringe is simply the sum of the squared differences from the average color, times a scaling factor. 

Thanks to an efficient implementation, plus Open CV's support for offloading items such as perspective warping to our nVidia 780ti graphics card, our score function can be computed in about 200 microseconds. So we can test 5,000 possible chessboards per second. Although we still cannot check all of the billions of possible boards, with a good heuristic choice, we can find very good matches in a matter of seconds. 

As often happens in artificial intelligence problems, our chessboard recognition effort has now led us to an optimization problem. Specifically, we want to find the quadrilateral in the image that results in the best chessboard after we warp it to take out the perspective transform. If \(Q\) is the set of potential quadrilaterals and \(score\) is our scoring function, then we want to choose the quadrilateral \(q\) such that $$q = \mathrm{arg\,min}_{q' \in Q} \,\,score(q').$$ The problem of detecting the chessboard is now reduced to the problem of optimizing the \(score\) function using whichever method works best.


Conclusion

Chessboard detected using our evolutionary algorithm.
Some final comments are in order about why we are solving the problem in such a roundabout manner. After all, it might seem that it would be easier to use a brute force approach (such as the one used to detect chessboards for calibration). For example, we could use the lines in order to detect possible rectangles, and then assemble sets of similar rectangles hierarchically into a chessboard pattern. But in practice, the rectangle-based method is far slower than the ones we have outlined. There are billions of possible rectangles, and it would be difficult to choose which ones to prefer. In the end, our optimization method works quickly and is correct in most instances. It seems doubtful that there is a simpler method for robust chessboard detection that works faster and more elegantly. I would actually prefer to have a board recognition method based on a convolutional neural network (a more complex method), but at this point I am not quite sure either how to get good training data for the network or how to find a good unsupervised training method (I have tried a few approaches).

In the next post, I'll discuss the evolutionary method that we developed and the other optimization methods that we could have chosen.