org.das2.qds.util.AutoHistogram

Self-configuring histogram that dynamically adjusts its range and bin size as data is added. It also tries to identify outlier points, which are available as a Map from each value to the number of times it was observed. For each bin, a running mean and variance are tracked, which are useful for identifying continuous bins and for computing total moments. It was introduced to support the automatic cadence algorithm, and should be generally useful in data discovery.
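The per-bin running mean and variance can be kept without storing the individual values. The following is a minimal sketch of one common way to do this (Welford's update). The class name RunningStats is hypothetical, and nothing here is taken from the AutoHistogram source, which may accumulate its statistics differently.

```java
/** Hypothetical per-bin accumulator; an illustration, not AutoHistogram's internals. */
public class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the running mean

    /** Fold one value into the running statistics (Welford's update). */
    public void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    public long count() {
        return count;
    }

    public double mean() {
        return mean;
    }

    /** Sample variance; NaN until at least two values have been added. */
    public double variance() {
        return count > 1 ? m2 / (count - 1) : Double.NaN;
    }
}
```

Each bin would hold one such accumulator, so adding a point costs a constant amount of work regardless of how many points the bin already contains.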

AutoHistogram( )


USER_PROP_BIN_START


USER_PROP_BIN_WIDTH


USER_PROP_INVALID_COUNT


USER_PROP_OUTLIERS


USER_PROP_MIN_GT_ZERO


USER_PROP_TOTAL

Long, total number of valid points.


addPropertyChangeListener

addPropertyChangeListener( java.beans.PropertyChangeListener listener ) → void

Parameters

listener - a PropertyChangeListener

Returns:

void (returns nothing)



binOf

binOf( QDataSet hist, double d ) → int

Convenience method for getting the bin index of a value from a completed histogram's metadata. Note that this is inefficient, since it must do HashMap lookups to get the bin width and bin start, so use it carefully.

Parameters

hist - a QDataSet
d - a double

Returns:

the index of the bin for the point.
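When many values need to be binned, the lookup cost can be avoided by reading the bin start and bin width once. The sketch below assumes equal-width bins and the formula floor((d - binStart) / binWidth), which is an assumption rather than a statement about the actual implementation. The class BinOfSketch and its methods are hypothetical, and a plain Map stands in for the histogram's user properties, using the binStart and binWidth keys described under doit.

```java
import java.util.Map;

/** Hypothetical helper; assumes equal-width bins starting at binStart. */
public class BinOfSketch {

    /** Assumed arithmetic: values below binStart yield negative indices. */
    static int binOf(double binStart, double binWidth, double d) {
        return (int) Math.floor((d - binStart) / binWidth);
    }

    /** Reads the two properties once, then bins every value. */
    static int[] binsOf(Map<String, Object> userProps, double[] values) {
        double binStart = (Double) userProps.get("binStart");
        double binWidth = (Double) userProps.get("binWidth");
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = binOf(binStart, binWidth, values[i]);
        }
        return result;
    }
}
```

This sketch does no range checking, so callers would still need to compare the result against the histogram length.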



doit

doit( QDataSet ds ) → QDataSet

Parameters

ds - a QDataSet

Returns:

org.das2.qds.QDataSet



doit

doit( QDataSet ds, QDataSet wds ) → QDataSet

Do a histogram, dynamically shifting the bins and changing the bin size. Returns a QDataSet in which each bin holds the number of points in that bin, with the planes:

total - total of the values in the bin.
depend_0 - the bin names.

The property QDataSet.USER_PROPERTIES contains a map with the following keys:

binStart - Double, the left side of the first bin.
binWidth - Double, the bin width.
total - Integer, the number of points in the distribution.
outliers - Map, outliers and the count of each outlier.

Parameters

ds - a QDataSet
wds - WeightsDataSet or null.

Returns:

a QDataSet



getHistogram

getHistogram( ) → org.das2.qds.DDataSet

Get the histogram of the data accumulated thus far.

Returns:

an org.das2.qds.DDataSet



mean

mean( QDataSet hist ) → org.das2.qds.RankZeroDataSet

Returns the mean of the dataset that has been histogrammed.

Parameters

hist - a QDataSet

Returns:

an org.das2.qds.RankZeroDataSet



moments

moments( QDataSet hist ) → org.das2.qds.RankZeroDataSet

Returns the mean of the dataset that has been histogrammed, with the standard deviation attached as the "stddev" property.

Parameters

hist - a rank 1 dataset with each bin containing the count in each bin. DEPEND_0 are the labels for each bin. The property "means" returns a rank 1 dataset containing the means for each bin. The property "stddevs" contains the standard deviation within each bin.

Returns:

rank 0 dataset (a Datum) whose value is the mean, and the property("stddev") contains the standard deviation
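Per-bin counts, means, and standard deviations are enough to recover the moments of the whole population without revisiting the data. The following is a minimal sketch of that arithmetic, assuming the per-bin values are sample standard deviations (n - 1 in the denominator). The class PooledMoments and its combine method are hypothetical and work on plain arrays, not on the QDataSet properties themselves.

```java
/** Hypothetical helper; combines per-bin statistics into overall moments. */
public class PooledMoments {

    /** Returns {mean, stddev} of the combined population, or NaNs if it is empty. */
    static double[] combine(double[] counts, double[] means, double[] stddevs) {
        double n = 0;
        double sum = 0;
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] == 0) continue; // empty bins may carry NaN means
            n += counts[i];
            sum += counts[i] * means[i];
        }
        if (n == 0) {
            return new double[] {Double.NaN, Double.NaN};
        }
        double mean = sum / n;

        double ss = 0; // total sum of squared deviations from the overall mean
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] == 0) continue;
            // spread inside the bin (zero when the bin holds a single point)
            double within = counts[i] > 1 ? (counts[i] - 1) * stddevs[i] * stddevs[i] : 0.0;
            // offset of the bin's mean from the overall mean
            double between = counts[i] * (means[i] - mean) * (means[i] - mean);
            ss += within + between;
        }
        double stddev = n > 1 ? Math.sqrt(ss / (n - 1)) : Double.NaN;
        return new double[] {mean, stddev};
    }
}
```

For example, a bin holding 1, 2, 3 (mean 2, stddev 1) combined with a bin holding 7, 9 (mean 8, stddev √2) gives the same mean and standard deviation as computing them directly over 1, 2, 3, 7, 9.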



monoExtent

monoExtent( QDataSet dep0 ) → QDataSet

Fast extent calculation that only works when the data is monotonic. Returns null if there is no valid data.

Parameters

dep0 - a QDataSet

Returns:

rank 1 bins dataset or null
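The reason monotonic data allows a fast extent is that the minimum and maximum sit at the ends, so only the invalid points at either end need to be skipped. The sketch below illustrates the idea on a plain array, assuming the values are monotonically increasing and that NaN marks invalid data. Both are assumptions for illustration. The class MonoExtentSketch is hypothetical, and the real method works with QDataSet and its own notion of validity.

```java
/** Hypothetical illustration of an extent search over monotonically increasing data. */
public class MonoExtentSketch {

    /** Returns {min, max}, or null if every value is NaN or the array is empty. */
    static double[] monoExtent(double[] values) {
        int first = 0;
        while (first < values.length && Double.isNaN(values[first])) {
            first++; // skip invalid points at the start
        }
        if (first == values.length) {
            return null; // no valid data
        }
        int last = values.length - 1;
        while (Double.isNaN(values[last])) {
            last--; // skip invalid points at the end; stops at 'first' at the latest
        }
        return new double[] {values[first], values[last]};
    }
}
```

Invalid points in the interior are never inspected, which is what makes this cheaper than a full scan.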



peakIds

peakIds( QDataSet hist ) → QDataSet

Returns a list of all the peaks in the histogram. A peak is defined as a local maximum, together with the adjacent bins that are consistent with the peak population and do not belong to another peak.

Parameters

hist - the output of autohistogram, which has "means" and "stddevs" properties.

Returns:

QDataSet which varies with hist.



peaks

peaks( QDataSet hist ) → QDataSet

Returns a list of all the peaks in the histogram. See peakIds for how peaks are identified. Once the bins of a peak have been identified, the mean and stddev of each peak are returned. Note that the stddev typically reads low, probably because the tails have been removed.

Parameters

hist - the result of AutoHistogram

Returns:

QDataSet rank 1 dataset with length equal to the number of identified peaks



removePropertyChangeListener

removePropertyChangeListener( java.beans.PropertyChangeListener listener ) → void

Parameters

listener - a PropertyChangeListener

Returns:

void (returns nothing)



reset

reset( ) → void

Returns:

void (returns nothing)



simpleRange

simpleRange( QDataSet hist2 ) → QDataSet

Returns the simple range: the min and the max that contain the data.

Parameters

hist2 - the result of autoHistogram.

Returns:

rank 1 bins dataset showing the min and max. value(0) is the min, value(1) is the max.
