TokenStreamToAutomaton
Consumes a TokenStream and creates an Automaton where the transition labels are UTF8 bytes (or Unicode code points if unicodeArcs is true) from the TermToBytesRefAttribute. Between tokens we insert POS_SEP and for holes we insert HOLE.
Functions
Link copied to clipboard
If true, any final offset gaps will result in adding a position hole.
Link copied to clipboard
Whether to generate holes in the automaton for missing positions, true by default.
Link copied to clipboard
Whether to make transition labels Unicode code points instead of UTF8 bytes, false by default
Link copied to clipboard
Pulls the graph (including PositionLengthAttribute) from the provided [ ], and creates the corresponding automaton where arcs are bytes (or Unicode code points if unicodeArcs = true) from each term.