org.das2.qds.util.AsciiParser

Class for reading ASCII tables into a QDataSet. This parses a file by breaking it up into records and passing each record off to a delegate record parser. The record parser then breaks the record into fields, and each field is parsed by a delegate field parser. Each column of the table has a Unit, field name, and field label associated with it.

Examples of record parsers include DelimParser, which splits the record by a delimiter such as a tab or comma; RegexParser, which processes each record with a regular expression to get the fields; and FixedColumnsParser, which splits the record by character positions. Examples of field parsers include DOUBLE_PARSER, which parses the value as a double, and UNITS_PARSER, which uses the Unit attached to the column to interpret the value.

When the first record with the correct number of fields is found but is not parseable, we look for field labels and units. The skipLines property tells the parser to skip a given number of header lines before attempting to parse records. Also, commentPrefix identifies lines to be ignored. In either the header or in comments, we look for propertyPattern, and if a property is matched, then the builder property is set. Two Patterns are provided for convenience: NAME_COLON_VALUE_PATTERN and NAME_EQUAL_VALUE_PATTERN.

Adapted to QDataSet model, Jeremy, May 2007.
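
The two-stage delegation described above (a record parser splits the line, then one field parser per column interprets each piece) can be sketched in plain Java. This is only an illustration of the idea and not the AsciiParser implementation; the class TwoStageParseSketch, the method parseRecord, and the use of ToDoubleFunction as a stand-in for a field parser are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for the record-parser / field-parser split; not the library API.
class TwoStageParseSketch {

    /** Split one record on a delimiter regex, then hand each field to that column's parser. */
    static double[] parseRecord(String record, String delimRegex,
                                List<ToDoubleFunction<String>> fieldParsers) {
        String[] fields = record.trim().split(delimRegex);
        if (fields.length != fieldParsers.size()) {
            throw new IllegalArgumentException(
                "expected " + fieldParsers.size() + " fields, got " + fields.length);
        }
        double[] result = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            result[i] = fieldParsers.get(i).applyAsDouble(fields[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        List<ToDoubleFunction<String>> parsers = new ArrayList<>();
        parsers.add(Double::parseDouble);                  // plain double column
        parsers.add(s -> Double.parseDouble(s) * 1000.0);  // hypothetical column that rescales its value
        double[] values = parseRecord("1.5, 2.0", ",", parsers);
        System.out.println(values[0] + " " + values[1]);
    }
}
```

Keeping the splitting step separate from the per-column interpretation is what lets the same field parsers be reused with delimiter, regex, or fixed-column splitting.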

AsciiParser( )

Creates a new instance. This is created and then configured before any files can be parsed.

NAME_COLON_VALUE_PATTERN

pattern for name:value.

NAME_EQUAL_VALUE_PATTERN

pattern for name=value.

PROPERTY_FIELD_NAMES

PROPERTY_FILE_HEADER

PROPERTY_FIRST_RECORD

PROPERTY_FIELD_PARSER

DELIM_COMMA

DELIM_TAB

DELIM_WHITESPACE

UNIT_UTC

Convenient unit for parsing UTC times.

PROP_HEADERDELIMITER

DOUBLE_PARSER

parses the field using Double.parseDouble, Java's double parser.

UNITS_PARSER

delegates to the unit object set for this field to parse the data.

ENUMERATION_PARSER

uses the EnumerationUnits for the field to create a Datum.

PROP_VALIDMIN

PROP_VALIDMAX

addPropertyChangeListener

addPropertyChangeListener( java.beans.PropertyChangeListener l ) → void

Adds a PropertyChangeListener to the listener list.

Parameters

l - The listener to add.

Returns:

void (returns nothing)

getDelimParser

getDelimParser( int fieldCount, String delim ) → DelimParser

give external code more control by letting it assert that an N-column delim parser should be used.

Parameters

fieldCount - an int
delim - the delimiter pattern, such as "," or "\s+"

Returns:

the DelimParser.

getFieldCount

getFieldCount( ) → int

return the number of fields in each record. Note the RecordParsers also have a fieldCount, which should be equal to this. This allows them to be independent of the parser.

Returns:

an int

getFieldIndex

getFieldIndex( String string ) → int

returns the index of the field. Supports the name, or field0, or 0, etc. returns -1 when the column is not identified.

Parameters

string - the label for the field, such as "field2" or "time"

Returns:

-1 or the index of the field.
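
The three accepted spellings (a discovered name, "fieldN", or a bare integer) and the -1 result can be illustrated with a small lookup. This is a sketch under the assumption that names are checked first; FieldIndexSketch and fieldIndex are hypothetical names, and the library's own lookup order may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical lookup illustrating the name / "fieldN" / "N" forms; not the library code.
class FieldIndexSketch {

    /** Returns the column index for a name, "fieldN", or "N", or -1 when nothing matches. */
    static int fieldIndex(List<String> names, String s) {
        int byName = names.indexOf(s);
        if (byName >= 0) {
            return byName;
        }
        // Strip the default "field" prefix, then try to read what is left as an integer.
        String digits = s.startsWith("field") ? s.substring(5) : s;
        try {
            int i = Integer.parseInt(digits);
            return (i >= 0 && i < names.size()) ? i : -1;
        } catch (NumberFormatException ex) {
            return -1;
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("time", "density");
        System.out.println(fieldIndex(names, "time"));    // 0
        System.out.println(fieldIndex(names, "field1"));  // 1
        System.out.println(fieldIndex(names, "speed"));   // -1
    }
}
```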

getFieldLabels

getFieldLabels( ) → String[]

return the labels found for each field. If a label wasn't found, then the name is returned.

Returns:

a java.lang.String[]

getFieldNames

getFieldNames( ) → String[]

return the name of each field. field0, field1, ... are the default names when names are not discovered in the table. Changing the array will not affect internal representation.

Returns:

a java.lang.String[]

getFieldUnits

getFieldUnits( ) → String[]

return the units that were associated with the field. This might also be the channel label for spectrograms. In "field0(str)" or "field0[str]" this is str. elements may be null if not found.

Returns:

a java.lang.String[]
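
To make the "field0(str)" and "field0[str]" convention concrete, the following sketch pulls the bracketed text out of a single column header. The regular expression and the names HeaderUnitsSketch and unitsOf are assumptions made for illustration; the parser's actual header handling is not shown in this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical header splitter for "name(units)" or "name[units]"; not the library code.
class HeaderUnitsSketch {

    // Group 1 is the name, group 2 is the text inside ( ) or [ ].
    private static final Pattern NAME_UNITS =
        Pattern.compile("\\s*([^(\\[]+?)\\s*[(\\[]([^)\\]]*)[)\\]]\\s*");

    /** Returns the bracketed units text, or null when the header has none. */
    static String unitsOf(String header) {
        Matcher m = NAME_UNITS.matcher(header);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(unitsOf("B(nT)"));  // nT
        System.out.println(unitsOf("B[nT]"));  // nT
        System.out.println(unitsOf("time"));   // null
    }
}
```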

getFillValue

getFillValue( ) → double

return the fillValue. numbers that parse to this value are considered to be fill. Note validMin and validMax may be used as well.

Returns:

Value of property fillValue.
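
How fillValue, validMin, and validMax might combine can be sketched as a single predicate. This assumes that a value outside the valid range is treated the same way as a fill value, which this page suggests but does not state outright; FillCheckSketch and isFill are hypothetical names.

```java
// Hypothetical check combining fillValue with the valid range; not the library code.
class FillCheckSketch {

    /** True when the parsed value equals the fill value or lies outside [validMin, validMax]. */
    static boolean isFill(double value, double fillValue, double validMin, double validMax) {
        return value == fillValue || value < validMin || value > validMax;
    }

    public static void main(String[] args) {
        double fill = -1e31;
        System.out.println(isFill(-1e31, fill, 0.0, 100.0));  // true: matches the fill value
        System.out.println(isFill(200.0, fill, 0.0, 100.0));  // true: above validMax
        System.out.println(isFill(5.0, fill, 0.0, 100.0));    // false
    }
}
```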

getHeaderDelimiter

getHeaderDelimiter( ) → String

get the header delimiter

Returns:

the header delimiter.

getRecordParser

getRecordParser( ) → RecordParser

Getter for property recordParser.

Returns:

Value of property recordParser.

getRegexForFormat

getRegexForFormat( String format ) → String

Convert FORTRAN (F77) style format to C-style format specifiers.

Parameters

format - for example "%5d%5d%9f%s"

Returns:

for example "d5,d5,f9,a"

getRegexParser

getRegexParser( String regex ) → RegexParser

return a regex parser for the given regular expression. Groups are used for the fields, for example getRegexParser( 'X (\d+) (\d+)' ) would parse lines like "X 00005 00006".

Parameters

regex - a String

Returns:

the regex parser
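
The way capture groups map to fields can be shown with java.util.regex directly, using the expression from the description. This is a standalone sketch, not the RegexParser class; RegexGroupsSketch and parse are hypothetical names, and parsing every group as a double is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical group-per-field parse using java.util.regex; not the RegexParser itself.
class RegexGroupsSketch {

    /** Returns one value per capture group, or null when the line does not match. */
    static double[] parse(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.matches()) {
            return null;
        }
        double[] values = new double[m.groupCount()];
        for (int i = 0; i < values.length; i++) {
            values[i] = Double.parseDouble(m.group(i + 1));  // groups are numbered from 1
        }
        return values;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("X (\\d+) (\\d+)");
        double[] v = parse(p, "X 00005 00006");
        System.out.println(v[0] + " " + v[1]);  // 5.0 6.0
    }
}
```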

getRegexParserForFormat

getRegexParserForFormat( String format ) → RegexParser

see private TimeParser(String formatString, Map fieldHandlers), which is very similar.

"%5d%5d%9f%s"
"d5,d5,f9,a"

Parameters

format - a String

Returns:

an org.das2.qds.util.AsciiParser.RegexParser

getRichFields

getRichFields( ) → Map

returns the high rank rich fields in a map from NAME to LABEL. NAME:<fieldX> or NAME:<fieldX-fieldY>

Returns:

the high rank rich fields in a map from NAME to LABEL.

getUnits

getUnits( int index ) → Units

Indexed getter for property units.

Parameters

index - Index of the property.

Returns:

Value of the property at index.

getValidMax

getValidMax( ) → double

get the maximum valid value for any field.

Returns:

the validMax

getValidMin

getValidMin( ) → double

get the minimum valid value for any field.

Returns:

validMin

guessDelimParser

guessDelimParser( String line ) → DelimParser

Parameters

line - a String

Returns:

org.das2.qds.util.AsciiParser.DelimParser

guessDelimParser( String line, int lineNumber ) → DelimParser

guessFieldCount

guessFieldCount( String filename ) → int

return the field count that would result in the largest number of records parsed. The entire file is scanned, and for each line the number of decimal fields is counted. At the end of the scan, the fieldCount with the highest record count is returned.

Parameters

filename - the file name, a local file opened with a FileReader

Returns:

the apparent field count.
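
The scan described above amounts to a vote: count the decimal fields on each line, then return the count seen on the most lines. The sketch below works on a list of lines rather than a file, and its definition of a "decimal field" (the DECIMAL pattern) is an assumption; FieldCountSketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical majority vote over per-line decimal field counts; not the library code.
class FieldCountSketch {

    // Assumed shape of a decimal field: optional sign, digits with optional fraction, optional exponent.
    private static final Pattern DECIMAL =
        Pattern.compile("[-+]?(\\d+\\.?\\d*|\\.\\d+)([eE][-+]?\\d+)?");

    static int countDecimalFields(String line) {
        Matcher m = DECIMAL.matcher(line);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Returns the field count shared by the most lines, or 0 if no line has decimal fields. */
    static int guessFieldCount(List<String> lines) {
        Map<Integer, Integer> votes = new HashMap<>();
        for (String line : lines) {
            int n = countDecimalFields(line);
            if (n > 0) {
                votes.merge(n, 1, Integer::sum);
            }
        }
        int best = 0;
        int bestVotes = 0;
        for (Map.Entry<Integer, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestVotes) {
                best = e.getKey();
                bestVotes = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# header", "1.0 2.0 3.0", "4.0 5.0 6.0", "7 8");
        System.out.println(guessFieldCount(lines));  // 3
    }
}
```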

guessLengthForFormat

guessLengthForFormat( String format ) → int

return the length of the format specifier. %30d -> 30 %30d%5f -> 35. TODO: consider String.format(format,1) or String.format(format,1.0).

Parameters

format - a String

Returns:

an int
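
The two examples in the description (%30d gives 30, %30d%5f gives 35) are consistent with summing the explicit widths of the specifiers. The sketch below does exactly that and nothing more; FormatLengthSketch and length are hypothetical names, and specifiers with no explicit width (such as a bare %s) contribute nothing here, which may differ from the library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical width sum over C-style specifiers; not the library code.
class FormatLengthSketch {

    // A specifier with an explicit width, e.g. %30d or %9.3f; group 1 is the width.
    private static final Pattern SPEC = Pattern.compile("%(\\d+)(?:\\.\\d+)?[a-zA-Z]");

    /** Sums the explicit widths of all specifiers in the format. */
    static int length(String format) {
        Matcher m = SPEC.matcher(format);
        int total = 0;
        while (m.find()) {
            total += Integer.parseInt(m.group(1));
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(length("%30d"));     // 30
        System.out.println(length("%30d%5f"));  // 35
    }
}
```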

guessSkipAndDelimParser

guessSkipAndDelimParser( String filename ) → DelimParser

read in records, allowing for a header of non-records before guessing the delim parser. This will return a reference to the DelimParser and set skipLines. DelimParser header field is set as well. One must set the record parser explicitly.

Parameters

filename - a String

Returns:

the record parser to use, or null if no records are found.

guessSkipLines

guessSkipLines( String filename, org.das2.qds.util.AsciiParser.RecordParser recParser ) → int

try to figure out how many lines to skip by looking for the line where the number of fields becomes stable.

Parameters

filename - a String
recParser - an AsciiParser.RecordParser

Returns:

an int
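
One way to read "the line where the number of fields becomes stable" is: the first line from which every remaining line has the same field count as the last line. The sketch below implements that reading over a list of lines with a delimiter regex standing in for the RecordParser. The names SkipLinesSketch and guessSkipLines and the exact stability rule are assumptions, not the library's algorithm.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical "stable field count" search; not the library code.
class SkipLinesSketch {

    private static int fieldCount(String line, String delimRegex) {
        return line.trim().split(delimRegex).length;
    }

    /** Returns the index of the first line of the trailing run that shares the last line's field count. */
    static int guessSkipLines(List<String> lines, String delimRegex) {
        if (lines.isEmpty()) {
            return 0;
        }
        int target = fieldCount(lines.get(lines.size() - 1), delimRegex);
        int i = lines.size() - 1;
        // Walk backwards while the previous line still has the stable count.
        while (i > 0 && fieldCount(lines.get(i - 1), delimRegex) == target) {
            i--;
        }
        return i;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("My data file", "created 2007", "1,2,3", "4,5,6", "7,8,9");
        System.out.println(guessSkipLines(lines, ","));  // 2
    }
}
```

Note that a line of column names with the same number of fields would not be skipped by this rule, which fits the behavior described in the class overview where such a line is inspected for labels and units.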

isHeader

isHeader( int iline, String lastLine, String thisLine, int recCount ) → boolean

returns true if the line is a header or comment.

Parameters

iline - the line number in the file, starting with 0.
lastLine - the last line read.
thisLine - the line we are testing.
recCount - the number of records successfully read.

Returns:

true if the line is a header line.

isIso8601Time

isIso8601Time( String s ) → boolean

quick-n-dirty check to see if a string appears to be an ISO8601 time. minimally 2000-002T00:00, but also 2000-01-01T00:00:00Z etc. Note that external code may explicitly indicate that the field is a time; this is just to catch things that are obviously times.

Parameters

s - a String

Returns:

true if this is clearly an ISO time.
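
A quick check of this kind can be written as one regular expression that accepts both the day-of-year form and the month-day form given above. The pattern below is an assumption chosen to accept those two examples; it is not the library's test, and Iso8601Sketch and looksLikeIsoTime are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical quick ISO8601 test; the pattern is an assumption, not the library's check.
class Iso8601Sketch {

    // yyyy-DDD or yyyy-MM-dd, then 'T', hh:mm, optional :ss(.fraction), optional Z.
    private static final Pattern ISO_TIME =
        Pattern.compile("\\d{4}-(\\d{3}|\\d{2}-\\d{2})T\\d{2}:\\d{2}(:\\d{2}(\\.\\d+)?)?Z?");

    static boolean looksLikeIsoTime(String s) {
        return ISO_TIME.matcher(s.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIsoTime("2000-002T00:00"));        // true
        System.out.println(looksLikeIsoTime("2000-01-01T00:00:00Z"));  // true
        System.out.println(looksLikeIsoTime("2000-01-01"));            // false: no time part
    }
}
```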

isKeepFileHeader

isKeepFileHeader( ) → boolean

Getter for property keepHeader.

Returns:

Value of property keepHeader.

isRichHeader

isRichHeader( String header ) → boolean

return true if the header appears to contain JSON code which could be interpreted as a "Rich Header" (a.k.a. JSONHeadedASCII). This is a very simple test, simply looking for #{ and #} with a colon contained within.

Parameters

header - string containing the commented header.

Returns:

true if parsing as a Rich Header should be attempted.
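
The test described above (a "#{", a "#}", and a colon somewhere between them) can be sketched with plain string searches. RichHeaderSketch and looksLikeRichHeader are hypothetical names, and the exact search order used by the library is not shown in this page.

```java
// Hypothetical version of the "#{ ... : ... #}" test; not the library code.
class RichHeaderSketch {

    /** True when the header contains "#{", a later "#}", and a colon between the two. */
    static boolean looksLikeRichHeader(String header) {
        int open = header.indexOf("#{");
        if (open < 0) {
            return false;
        }
        int close = header.indexOf("#}", open + 2);
        if (close < 0) {
            return false;
        }
        int colon = header.indexOf(':', open + 2);
        return colon >= 0 && colon < close;
    }

    public static void main(String[] args) {
        String header = "#{\n#  \"density\": { \"UNITS\": \"cm^-3\" }\n#}";
        System.out.println(looksLikeRichHeader(header));              // true
        System.out.println(looksLikeRichHeader("# just a comment"));  // false
    }
}
```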

newParser

newParser( int fieldCount ) → AsciiParser

creates a parser with fieldCount fields, named "field0,...,fieldN"

Parameters

fieldCount - the number of fields

Returns:

the file parser

newParser( java.lang.String[] fieldNames ) → AsciiParser

readFile

readFile( String filename, ProgressMonitor mon ) → WritableDataSet

Parse the file using the current settings.

Parameters

filename - the file to read
mon - a monitor

Returns:

a rank 2 dataset.

readFirstParseableRecord

readFirstParseableRecord( String filename ) → String

returns the first record that the record parser parses successfully. The recordParser should be set and configured enough to identify the fields. If no records can be parsed, then null is returned. The first record should be in the first 1000 lines.

Parameters

filename - a String

Returns:

the first parseable line, or null if no such line exists.

readFirstRecord

readFirstRecord( String filename ) → String

return the first record that the parser would parse. If skipLines is more than the total number of lines, or all lines are comments, then null is returned.

Parameters

filename - a String

Returns:

the first line after skip lines and comment lines.

readFirstRecord( java.io.BufferedReader reader ) → String

readStream

readStream( java.io.Reader in, ProgressMonitor mon ) → WritableDataSet

Parse the stream using the current settings.

Parameters

in - the input stream
mon - a ProgressMonitor

Returns:

an org.das2.qds.WritableDataSet

readStream( java.io.Reader in, String firstRecord, ProgressMonitor mon ) → WritableDataSet

readString

readString( String str, ProgressMonitor mon ) → WritableDataSet

Parameters

str - the data, encoded in a UTF-8 string
mon - null or a progress monitor

Returns:

the data

removePropertyChangeListener

removePropertyChangeListener( java.beans.PropertyChangeListener l ) → void

Removes a PropertyChangeListener from the listener list.

Parameters

l - The listener to remove.

Returns:

void (returns nothing)

setCommentPrefix

setCommentPrefix( String comment ) → void

Records starting with this are not processed as data, for example "#". This is initially "#". Setting this to null disables this check.

Parameters

comment - the prefix

Returns:

void (returns nothing)

setDelimParser

setDelimParser( String filename, String delimRegex ) → DelimParser

The DelimParser splits each record into fields using a delimiter like "," or "\s+".

Parameters

filename - filename to read in.
delimRegex - the delimiter, such as "," or "\t" or "\s+"

Returns:

the record parser that will split each line into fields

setDelimParser( String line, String delimRegex, int expectedColumnCount ) → DelimParser
setDelimParser( java.io.Reader in, String delimRegex ) → DelimParser

setFieldParser

setFieldParser( int field, org.das2.qds.util.AsciiParser.FieldParser fp ) → void

set the special parser for a field.

Parameters

field - the field number, 0 is the first column.
fp - the parser

Returns:

void (returns nothing)

setFillValue

setFillValue( double fillValue ) → void

numbers that parse to this value are considered to be fill.

Parameters

fillValue - New value of property fillValue.

Returns:

void (returns nothing)

setFixedColumnsParser

setFixedColumnsParser( String filename, String delim ) → FixedColumnsParser

looks at the first line after skipping, and splits it to calculate where the columns are. The FixedColumnsParser is the fastest of the three parsers.

Parameters

filename - filename to read in.
delim - regex to split the initial line into the fixed columns.

Returns:

the record parser that will split each line.
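
The idea of deriving column positions from one sample line and then cutting later lines by position can be sketched as follows. This assumes whitespace-padded, right-aligned columns and fixes the delimiter to whitespace rather than taking a regex; FixedColumnsSketch, columnsFrom, and split are hypothetical names, and the library's own boundary rule may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical fixed-column splitter for whitespace-padded tables; not the library code.
class FixedColumnsSketch {

    /** Derives {start, end} offsets from a sample line; each column keeps its leading padding. */
    static int[][] columnsFrom(String firstLine) {
        Matcher m = Pattern.compile("\\s+").matcher(firstLine);
        List<int[]> cols = new ArrayList<>();
        int start = 0;
        while (m.find()) {
            if (m.start() == 0) {
                continue;  // leading padding belongs to the first column
            }
            if (m.end() == firstLine.length()) {
                break;     // trailing whitespace does not start a new column
            }
            cols.add(new int[] { start, m.start() });
            start = m.start();
        }
        cols.add(new int[] { start, firstLine.length() });
        return cols.toArray(new int[0][]);
    }

    /** Cuts a line at the precomputed offsets, with no regex work per record. */
    static String[] split(String line, int[][] cols) {
        String[] fields = new String[cols.length];
        for (int i = 0; i < cols.length; i++) {
            int begin = Math.min(cols[i][0], line.length());
            int end = Math.min(cols[i][1], line.length());
            fields[i] = line.substring(begin, end).trim();
        }
        return fields;
    }

    public static void main(String[] args) {
        int[][] cols = columnsFrom("  1.0   2.5  10.0");
        System.out.println(Arrays.toString(split(" 22.0  13.5 100.0", cols)));
    }
}
```

Cutting by precomputed offsets avoids running a regular expression on every record, which is in line with the speed claim made for this parser.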

setFixedColumnsParser( java.io.Reader in, String delim ) → FixedColumnsParser
setFixedColumnsParser( int[] columnOffsets, int[] columnWidths, org.das2.qds.util.AsciiParser.FieldParser[] parsers ) → FixedColumnsParser

setHeaderDelimiter

setHeaderDelimiter( String headerDelimiter ) → void

set the delimiter which explicitly separates header from the data. For example "-------" could be used. Normally the parser just looks at the number of fields and this is sufficient.

Parameters

headerDelimiter - a String

Returns:

void (returns nothing)

setKeepFileHeader

setKeepFileHeader( boolean keepHeader ) → void

Setter for property keepHeader. This is false by default; if true, the file header ignored by skipLines is put into the property PROPERTY_FILE_HEADER.

Parameters

keepHeader - New value of property keepHeader.

Returns:

void (returns nothing)

setPropertyPattern

setPropertyPattern( java.util.regex.Pattern propertyPattern ) → void

specify the Pattern used to recognize properties. Note property values are not parsed, they are provided as Strings. This is a regular expression with two groups for the property name and value. For example, (.+)=(.+)

Parameters

propertyPattern - regular expression Pattern with two groups.

Returns:

void (returns nothing)
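
A two-group pattern such as (.+)=(.+) can be applied to header or comment lines as in the sketch below, which collects name and value as Strings. Stripping a leading "#" and trimming both groups are assumptions made for this example; PropertyPatternSketch and scan are hypothetical names and not part of the library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scan of header lines with a two-group property pattern; not the library code.
class PropertyPatternSketch {

    /** Collects name/value pairs from lines that match the pattern; values stay as Strings. */
    static Map<String, String> scan(List<String> lines, Pattern propertyPattern) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : lines) {
            // Assumed here: a leading comment prefix is dropped before matching.
            String s = line.startsWith("#") ? line.substring(1) : line;
            Matcher m = propertyPattern.matcher(s);
            if (m.matches()) {
                props.put(m.group(1).trim(), m.group(2).trim());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Pattern nameEqualsValue = Pattern.compile("(.+)=(.+)");
        List<String> header = Arrays.asList("# instrument = MAG", "# no property here", "units=nT");
        System.out.println(scan(header, nameEqualsValue));
    }
}
```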

setRecordCountLimit

setRecordCountLimit( int recordCountLimit ) → void

limit the number of records read. parsing will stop once this number of records is read into the result. This is Integer.MAX_VALUE by default.

Parameters

recordCountLimit - an int

Returns:

void (returns nothing)

setRecordParser

setRecordParser( org.das2.qds.util.AsciiParser.RecordParser recordParser ) → void

Setter for property recordParser.

Parameters

recordParser - New value of property recordParser.

Returns:

void (returns nothing)

setRecordStart

setRecordStart( int recordStart ) → void

set the number of records to skip before accumulating the result.

Parameters

recordStart - an int

Returns:

void (returns nothing)

setRegexParser

setRegexParser( java.lang.String[] fieldNames ) → RecordParser

The regex parser is a slow parser, but gives precise control.

Parameters

fieldNames - a java.lang.String[]

Returns:

the parser for each record.

setSkipLines

setSkipLines( int skipLines ) → void

skip a number of lines before trying to parse anything. This can be set to point at the first valid line, and the RecordParser will be configured using that line.

Parameters

skipLines - an int

Returns:

void (returns nothing)

setUnits

setUnits( int index, Units units ) → void

Indexed setter for property units. This now sets the field parser for the field to be a UNITS_PARSER if it is the default DOUBLE_PARSER.

Parameters

index - Index of the property.
units - New value of the property at index.

Returns:

void (returns nothing)

setUnits( org.das2.datum.Units[] u ) → void

setValidMax

setValidMax( double validMax ) → void

set the maximum value for any field. Values above this are to be considered invalid.

Parameters

validMax - a double

Returns:

void (returns nothing)

setValidMin

setValidMin( double validMin ) → void

set the minimum valid value for any field. Values less than this are to be considered invalid.

Parameters

validMin - a double

Returns:

void (returns nothing)

setWhereConstraint

setWhereConstraint( String sparm, String op, String sval ) → void

constrain the result to records where the condition is true. For "eq" the data does not need to be interpreted: string equality is checked for nominal data. Note sval is compared after trimming outside spaces.

Parameters

sparm - column name, such as "field4"
op - constraint, one of eq gt ge lt le ne
sval - String value. For nominal columns, String equality is used.

Returns:

void (returns nothing)
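
The six operators and the string-equality rule for nominal columns can be illustrated with a small predicate applied to the raw field text. This is a sketch, not the library's implementation: WhereSketch and satisfies are hypothetical names, and the choice to fall back to string comparison whenever either side is not a number is an assumption.

```java
// Hypothetical evaluation of a where constraint on one field; not the library code.
class WhereSketch {

    /** Tests fieldText against sval using one of eq, gt, ge, lt, le, ne; both sides are trimmed. */
    static boolean satisfies(String fieldText, String op, String sval) {
        String a = fieldText.trim();
        String b = sval.trim();
        int cmp;
        try {
            cmp = Double.compare(Double.parseDouble(a), Double.parseDouble(b));
        } catch (NumberFormatException ex) {
            // Nominal data: only equality tests are meaningful, so compare the strings.
            switch (op) {
                case "eq": return a.equals(b);
                case "ne": return !a.equals(b);
                default: return false;
            }
        }
        switch (op) {
            case "eq": return cmp == 0;
            case "ne": return cmp != 0;
            case "gt": return cmp > 0;
            case "ge": return cmp >= 0;
            case "lt": return cmp < 0;
            case "le": return cmp <= 0;
            default: throw new IllegalArgumentException("unknown op: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(satisfies(" 5 ", "gt", "3"));   // true
        System.out.println(satisfies("ON", "eq", " ON ")); // true: trimmed string equality
        System.out.println(satisfies("1", "ge", "2"));     // false
    }
}
```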
