Jeff Miller wrote:
> Hi Bruce,
>
>> The following (from the Wikipedia page on box-plots) expands a
>> bit on what Rich said.
> I had seen that, actually, but thanks for your reply.
>
>> * Indicate outliers by open and closed dots. "Extreme" outliers, or
>> those which lie more than three times the IQR to the left and right from
>> the first and third quartiles respectively, are indicated by the
>> presence of an open dot. "Mild" outliers - that is, those observations
>> which lie more than 1.5 times the IQR from the first and third quartile
>> but are not also extreme outliers - are indicated by the presence of a
>> closed dot.
> What I don't like about this is that I don't think these
> extreme values are outliers in my data set, at least not
> in the sense of an outlier as a data point generated (primarily)
> by some process other than the one I am trying to study.
>
> In general, I am very uncomfortable with any such hard
> and fast definitions of outliers. In my area, people often
> exclude outliers from their data sets (e.g., before
> computation of means, etc). These sorts of statistical
> guidelines seem to evolve quickly into statistical
> dogma, leading people to exclude a data point
> at q3+3.001*IQR but include one at q3+2.999*IQR.
> It seems to me that would probably be a mistake.
Jeff, I agree with your comments about outliers, and doubt that
Tukey ever intended the suspected outliers in box plots to be
treated that way.
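To make the cutoff arithmetic concrete, here is a minimal Python sketch of
the 1.5*IQR and 3*IQR rule from the quoted passage. The function name and
the quartile values (q1 = 10, q3 = 20) are made up for illustration, and
the quartiles are passed in directly because packages differ in how they
compute them.

```python
def fence_label(x, q1, q3):
    """Label x as 'ok', 'mild', or 'extreme' under the 1.5*IQR / 3*IQR rule.

    Hypothetical helper for illustration only; q1 and q3 are assumed to
    have been computed already by whatever quartile convention you use.
    """
    iqr = q3 - q1
    # Distance of x beyond the box: below q1, above q3, or 0 if inside.
    dist = max(q1 - x, x - q3, 0)
    if dist > 3 * iqr:
        return "extreme"
    if dist > 1.5 * iqr:
        return "mild"
    return "ok"


# Made-up quartiles: q1 = 10, q3 = 20, so IQR = 10 and the upper
# "extreme" fence sits at q3 + 3*IQR = 50.
for x in (25, 36, 49.99, 50.01):
    print(x, fence_label(x, q1=10, q3=20))
```

With these made-up quartiles, 49.99 is labelled mild and 50.01 extreme,
although the two values are practically indistinguishable, which is the
arbitrariness Jeff describes.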
Here are some examples of box plots for various kinds of
distributions (including left & right skewed) that might be useful:
http://www.basic.northwestern.edu/statguidefiles/boxplots.html
Cheers,
Bruce
--
Bruce Weaver
bweaver@[EMAIL PROTECTED]
"When all else fails, RTFM."

