Vista wrote:
>
> Let me describe my problem with an simpler example.
>
> Let's say I have a 2D plane.
>
> I draw a straight line to split the plane to two parts: Left and Right.
The
> equation for the straightline is y=a*x+b.
>
> In the left part the values of F(x, y) are 1s, while in the right part
the
> values of F(x, y) are 0s.
>
> You can query the value of F for any x and y. That's to say, you give me
a
> location (x, y), I will tell you the function value F(x, y).
>
> What's the most efficient way to discover the value of "a" and "b"?
>
First off, the original message was cross-posted all over the place, so
I'm replying to all the same groups. Apologies to anyone who finds what
follows exceptionally boring.
In the absence of a closed form for F, or some strong known properties
of F (and I'm not sure what those properties might be, given that the
range of F is discrete), I don't know that you'll be able to *prove*
that the line (more generally, hyperplane) that you end up with
separates the zeros from the ones. I suspect that the most you could do
would be to randomly sample the domain of F, compute values, verify
they're all correctly separated by the hyperplane, then give some
probability that the hyperplane is universally accurate.
As to computing the hyperplane, a relatively straightforward approach
would be to compute F for a random sample of points from its domain,
then solve the LP
min \sum_{n \in N} d_n
s.t. a'x_n + b + d_n - e_n \ge 1 (n \in O)
a'x_n + b - d_n + e_n \le -1 (n \in Z)
a, b free
d, e \ge 0
where N is the set of indices of your sample points x_n, O and Z are the
sets of indices where F is one respectively zero, a is vector of
variables of the same dimension as x (which is the vector of arguments
to F), b is scalar, and d_n and e_n are nonnegative variables for n \in
N. The variables for the LP model are a (the hyperplane coefficients),
b (the hyperplane offset from the origin) and d and e (deviations from
the boundary hyperplane in the wrong and right directions, respectively).
There's a literature on classification using math programming, in which
this model is usually called the MSD (minimize sum of deviations) model.
There are various other LP classification models, but the key is that
when the training data can be strongly separated via a linear model
(meaning all ones are in the *interior* of one half space and all zeros
in the *interior* of the other half space), all LP models produce valid
separating hyperplanes (valid in the sense of separating the training
data -- no guarantee what happens with non-training data).
If you're not thrilled about solving an LP, there's also a fairly old
algorithm called the "perceptron algorithm" that will find a valid
separating hyperplane (when one exists) by iterative means, with the
number of iterations bounded. It's known to fail when the data cannot
be separated linearly, but again I believe you are assuming it can be.
Two caveats apply. First, different samples will likely produce
different hyperplanes. Second, let's assume that there really is a
hyperplane separating the ones from the zeros. Unless (a) that
hyperplane contains both ones and zeros and (b) your sample includes at
least one point of each type from the hyperplane, the odds of any
solution (LP or otherwise) finding the exact hyperplane are minimal. On
the other hand, if your sample does happen to contain points of both
flavor from the boundary hyperplane, then the training data cannot be
strongly separated, in which case different LP models actually have
different performances (some may be more prone to missing widely than
others). I don't want to recap all the literature here, particularly as
I don't remember it all :-).
To end on a more positive note, Leo Breiman did some interesting work on
"bagging" classifiers that would apply here. A short and possibly
inaccurate summary: generate a random sample and construct a classifier
(separating hyperplane), say by some LP model; generate a new sample and
repeat (getting a somewhat different hyperplane); repeat ad nauseum;
average the coefficients from the hyperplanes you got and hope for the
best.
/Paul


|