This page presents an applet illustrating the role of Occam's Razor (actually
the *Rule of Simplicity*, as defined on the
interpretations page) in choosing a decision
boundary based on limited sample data.

The applet challenges you to draw a two-class decision boundary and see how your rule does against the Bayes-optimal and nearest-neighbor rules. A set of training points is displayed, drawn from some distribution (either set by the user or chosen by the applet and concealed). Based on these points, you are invited to draw the decision boundary you think will best classify future data points. The Bayes-optimal and nearest-neighbor decision rules are then calculated and displayed, and a number of further points are plotted, showing the relative success of the three boundaries. You will have trouble beating the Bayes-optimal boundary, but can you beat nearest-neighbor?

**Using the Applet**

- Enter the prior probability for class 1, a number between 0 and 1 (the class 2 prior probability follows implicitly). Then select the distributions for each feature of each class. (For simplicity, we assume that the features are independent.) You can select "Random" to let the applet choose a distribution without revealing it. Click "Done" to go to the next step.
- Build up a training set by plotting a number of sample points. You can plot them one by one or 10 at a time; the fewer points you plot, the tougher the challenge. Class 1 points are shown in red, class 2 in blue. When you're ready to draw your decision boundary, click "Done".
- Draw the boundary by clicking the left mouse button and dragging the mouse. Click "Done" to go to the next step. Note that the boundary you draw must properly partition the points into two separate sets - be careful not to leave any gaps in your boundary, or the applet will ask you to redraw it.
- Now the applet will calculate the Bayes-optimal and nearest-neighbor boundaries. This can take a while, especially if you plotted a lot of sample points, so please be patient.
- When the applet is done calculating, it will draw the three boundaries: the user boundary in green, Bayes-optimal in cyan, nearest-neighbor in magenta. Now it's time to see how the three boundaries match up. Plot more data points and see how the statistics emerge.
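The workflow above can be sketched in a few lines of Python: pick a class according to the prior, sample independent Gaussian features for that class, and score a 1-nearest-neighbor classifier on fresh points. The particular means, standard deviations, and prior here are illustrative assumptions, not the applet's own values.

```python
import math
import random

random.seed(0)

# Hypothetical setup: two independent Gaussian features per class.
PRIOR_1 = 0.5
CLASS_PARAMS = {1: ((0.0, 1.0), (0.0, 1.0)),   # (mean, sd) for each feature
                2: ((2.0, 1.0), (2.0, 1.0))}

def sample_point():
    """Draw a labeled point: choose the class by its prior, then its features."""
    label = 1 if random.random() < PRIOR_1 else 2
    (m1, s1), (m2, s2) = CLASS_PARAMS[label]
    return (random.gauss(m1, s1), random.gauss(m2, s2)), label

train = [sample_point() for _ in range(30)]

def nn_classify(x, training):
    # 1-nearest-neighbor rule: the label of the closest training point.
    return min(training, key=lambda p: math.dist(p[0], x))[1]

test = [sample_point() for _ in range(200)]
acc = sum(nn_classify(x, train) == y for x, y in test) / len(test)
print(f"nearest-neighbor accuracy: {acc:.2f}")
```

With more training points the nearest-neighbor accuracy tends to creep toward the Bayes-optimal accuracy, which is the effect the applet lets you watch directly.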

The challenge here, of making the best guess at a decision boundary based on
limited data, is a fundamental problem of pattern recognition, in some sense
*the* fundamental problem. In principle the Bayes-optimal boundary is the
best possible and cannot be beaten, so we would like our guess to approach it.
Unfortunately the Bayes-optimal boundary requires explicit knowledge of the
distributions followed by the data points, which is rarely available a priori.
The nearest-neighbor boundary is simple to calculate, does not rely on
knowledge of the distributions, and is usually not far from the Bayes-optimal
boundary. Often you will find that your best guess at a decision boundary is
similar in shape to the nearest-neighbor boundary; the nearest-neighbor idea
has a strong intuitive pull.
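For concreteness, here is a minimal sketch of what "explicit knowledge of the distributions" buys you: with known, independent Gaussian features, the Bayes-optimal rule simply picks the class with the larger log prior plus summed log likelihood. The parameter values below are assumed for illustration only.

```python
import math

# Assumed known distributions: independent Gaussian features per class.
PRIOR_1 = 0.5
CLASS_PARAMS = {1: ((0.0, 1.0), (0.0, 1.0)),   # (mean, sd) for each feature
                2: ((2.0, 1.0), (2.0, 1.0))}

def log_gauss(x, mean, sd):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def bayes_classify(point):
    # Bayes rule: maximize log prior + log likelihood; with independent
    # features the log likelihood is a sum over per-feature log densities.
    def score(label, prior):
        return math.log(prior) + sum(
            log_gauss(x, m, s) for x, (m, s) in zip(point, CLASS_PARAMS[label]))
    return 1 if score(1, PRIOR_1) >= score(2, 1 - PRIOR_1) else 2
```

For these symmetric Gaussians with equal priors, the resulting boundary is the straight line midway between the two class means; that smoothness is exactly what the nearest-neighbor boundary lacks.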

However, one feature of the nearest-neighbor rule is counterintuitive. Because it is constructed from a series of interconnecting lines (from the Voronoi diagram), the nearest-neighbor boundary often features sharp corners where its segments intersect. As you can see by experimenting with different distributions, the Bayes-optimal boundary for classes following simple distributions never has these sharp corners, and you are likely to find that the boundaries you draw are smooth as well. Again, there is something intuitive about smooth decision boundaries, perhaps because so much of our experience of the world is with classes following simple distributions. This is an example of the sort of simplifying intuition that is captured by the Rule of Simplicity interpretation of Occam's Razor.
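The sharp corners have a precise geometric origin: the point where three nearest-neighbor cells of the Voronoi diagram meet is equidistant from the three training points involved, i.e. the circumcenter of the triangle they form. A small sketch of that computation, using the standard closed-form circumcenter formula (not the applet's own code):

```python
def circumcenter(a, b, c):
    """Point equidistant from a, b, c - where three Voronoi cells meet."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return ux, uy
```

When the three points carry two different class labels, the decision boundary runs along two of the three bisector segments meeting at this point, producing the kink you see in the applet.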

If you compare the boundaries you draw with those produced by the nearest-neighbor rule, you will often find that your guess has combined these two intuitions: user boundaries typically resemble smoothed or "simplified" versions of the nearest-neighbor boundary. Is this a valid, if unquantitative, improvement? Run through several trials and see how often you can beat the nearest-neighbor rule. Try varying your "smoothing factor": do you do better if you smooth only slightly (producing a slightly rounded version of nearest-neighbor) or heavily (approaching a straight line)?

The applet is slow due to the brute-force approach used to calculate the boundaries, especially the nearest-neighbor boundary. Speed could probably be greatly improved by calculating the Voronoi diagram of the sample points and using this to calculate the nearest neighbors of each new data point. This is left as an exercise for the reader.
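A related speedup, sketched below for 2-D points, swaps the explicit Voronoi diagram for a k-d tree, which answers each nearest-neighbor query in roughly logarithmic time on average rather than by brute-force scanning. This is an illustrative implementation under that assumption, not the applet's code.

```python
import math

def build_kdtree(points, depth=0):
    # Recursive 2-D k-d tree over ((x, y), label) pairs;
    # the split axis alternates with depth.
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nn_query(node, target, best=None):
    """Return the ((x, y), label) pair closest to target."""
    if node is None:
        return best
    (point, label), axis, left, right = node
    if best is None or math.dist(point, target) < math.dist(best[0], target):
        best = (point, label)
    # Descend toward the target first, then backtrack into the far
    # subtree only if the splitting plane could hide a closer point.
    near, far = (left, right) if target[axis] < point[axis] else (right, left)
    best = nn_query(near, target, best)
    if abs(target[axis] - point[axis]) < math.dist(best[0], target):
        best = nn_query(far, target, best)
    return best
```

Classifying a new point is then just `nn_query(tree, point)[1]`, so plotting many test points no longer requires comparing each one against every training sample.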