Dee.Analyzer¶
- Subclasses:
Methods¶
- Inherited:
- Structs:
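class | collate_cmp_func (key1, key2, analyzer) |
class | new () |
| add_term_filter (filter_func, *filter_data) |
| analyze (data, terms_out, colkeys_out) |
| collate_cmp (key1, key2) |
| collate_key (data) |
| tokenize (data, terms_out) |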
Virtual Methods¶
- Inherited:
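| do_add_term_filter (filter_func, filter_data) |
| do_analyze (data, terms_out, colkeys_out) |
| do_collate_cmp (key1, key2) |
| do_collate_key (data) |
| do_tokenize (data, terms_out) |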
Properties¶
None
Signals¶
- Inherited:
Fields¶
- Inherited:
Name | Type | Access | Description |
---|---|---|---|
parent | | r | |
priv | | r | |
Class Details¶
- class Dee.Analyzer(**kwargs)¶
- Bases:
- Abstract:
No
- Structure:
All fields in the Dee.Analyzer structure are private and should never be accessed directly.
- classmethod collate_cmp_func(key1, key2, analyzer)¶
- Parameters:
key1 (str) – The first key to compare
key2 (str) – The second key to compare
analyzer (object or None) – The Dee.Analyzer to use for the comparison
- Returns:
-1, 0 or 1 if key1 is less than, equal to, or greater than key2, respectively.
- Return type:
int
A GLib.CompareDataFunc that uses a Dee.Analyzer to compare the keys. This is just a convenience wrapper around Dee.Analyzer.collate_cmp().
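To illustrate how a three-way comparison with a trailing data argument can drive a sort, here is a minimal pure-Python sketch. It does not import Dee: `SketchAnalyzer` and the module-level `collate_cmp_func` are hypothetical stand-ins that mirror only the behaviour documented above (a strcmp()-like result, with the analyzer passed as the last argument).

```python
from functools import cmp_to_key, partial


class SketchAnalyzer:
    """Hypothetical stand-in for Dee.Analyzer; not the real binding."""

    def collate_cmp(self, key1, key2):
        # strcmp()-like result: -1, 0 or 1.
        return (key1 > key2) - (key1 < key2)


def collate_cmp_func(key1, key2, analyzer):
    # Convenience wrapper: same comparison, analyzer passed as user data.
    return analyzer.collate_cmp(key1, key2)


analyzer = SketchAnalyzer()

# Bind the analyzer so the comparator takes just two keys, then sort with it.
compare = partial(collate_cmp_func, analyzer=analyzer)
sorted_keys = sorted(["pear", "apple", "fig"], key=cmp_to_key(compare))
print(sorted_keys)
```

Binding the analyzer with `functools.partial` plays the role that the user-data argument plays for a `GLib.CompareDataFunc`.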
- classmethod new()¶
- Return type:
Dee.Analyzer
- add_term_filter(filter_func, *filter_data)¶
- Parameters:
filter_func (Dee.TermFilterFunc) – Function to call
filter_data (object or None) – Data to pass to filter_func when it is invoked
Register a Dee.TermFilterFunc to be called whenever Dee.Analyzer.analyze() is called.
Term filters can be used to normalize, add, or remove terms from an input data stream.
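The following pure-Python sketch models how registered term filters might normalize and remove terms. It does not use the Dee library. The `SketchAnalyzer` class, its `run_filters` helper, and the filter signature (an input list, an output list, then any user data) are assumptions made for illustration, as is applying the filters in registration order.

```python
class SketchAnalyzer:
    """Hypothetical stand-in that only models filter registration."""

    def __init__(self):
        self._filters = []

    def add_term_filter(self, filter_func, *filter_data):
        # Remember the callback together with its user data.
        self._filters.append((filter_func, filter_data))

    def run_filters(self, terms):
        # Apply the filters in registration order (an assumption).
        for filter_func, filter_data in self._filters:
            terms_out = []
            filter_func(terms, terms_out, *filter_data)
            terms = terms_out
        return terms


def lowercase_filter(terms_in, terms_out):
    # Normalize: emit one lower-cased term per input term.
    terms_out.extend(term.lower() for term in terms_in)


def stopword_filter(terms_in, terms_out, stopwords):
    # Remove: drop every term found in the user-supplied stop list.
    terms_out.extend(term for term in terms_in if term not in stopwords)


analyzer = SketchAnalyzer()
analyzer.add_term_filter(lowercase_filter)
analyzer.add_term_filter(stopword_filter, {"the", "a"})
filtered = analyzer.run_filters(["The", "Quick", "fox"])
print(filtered)
```

Because each filter writes into a fresh output list, a filter can emit fewer or more terms than it receives.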
- analyze(data, terms_out, colkeys_out)¶
- Parameters:
data (str) – The input data to analyze
terms_out (Dee.TermList or None) – A Dee.TermList to place the generated terms in. If None, no terms are generated
colkeys_out (Dee.TermList or None) – A Dee.TermList to place generated collation keys in. If None, no collation keys are generated
Extract terms and/or collation keys from some input data (which is normally, but not necessarily, a UTF-8 string).
The terms and corresponding collation keys will be written in order to the provided Dee.TermLists.
Implementation notes for subclasses: The analysis process must call Dee.Analyzer.tokenize() and run the tokens through all term filters added with Dee.Analyzer.add_term_filter(). Collation keys must be generated with Dee.Analyzer.collate_key().
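The implementation notes above can be sketched in plain Python as a three-stage pipeline: tokenize, filter, then generate collation keys. This is a hedged model, not the Dee implementation. `SketchAnalyzer`, the whitespace tokenizer, and the use of Python lists in place of `Dee.TermList` are all assumptions.

```python
class SketchAnalyzer:
    """Hypothetical stand-in for the analyze() contract; not the real API."""

    def __init__(self):
        self._filters = []

    def add_term_filter(self, filter_func):
        self._filters.append(filter_func)

    def tokenize(self, data, terms_out):
        # Assumed tokenizer: split on whitespace.
        terms_out.extend(data.split())

    def collate_key(self, data):
        # Mirrors the documented default: a plain copy of the input.
        return str(data)

    def analyze(self, data, terms_out, colkeys_out):
        # 1. Tokenize the raw input.
        terms = []
        self.tokenize(data, terms)
        # 2. Run the tokens through every registered term filter.
        for filter_func in self._filters:
            filtered = []
            filter_func(terms, filtered)
            terms = filtered
        # 3. Write terms and collation keys, in order, where requested.
        if terms_out is not None:
            terms_out.extend(terms)
        if colkeys_out is not None:
            colkeys_out.extend(self.collate_key(term) for term in terms)


analyzer = SketchAnalyzer()
analyzer.add_term_filter(lambda tin, tout: tout.extend(t.lower() for t in tin))

terms, colkeys = [], []
analyzer.analyze("Hello Dee World", terms, colkeys)

# Passing None for terms_out skips term output but still yields keys.
only_keys = []
analyzer.analyze("Only Keys", None, only_keys)
print(terms, colkeys, only_keys)
```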
- collate_cmp(key1, key2)¶
- Parameters:
key1 (str) – The first collation key to compare
key2 (str) – The second collation key to compare
- Returns:
-1, 0 or 1 if key1 is less than, equal to, or greater than key2, respectively.
- Return type:
int
Compare collation keys generated by Dee.Analyzer.collate_key() with semantics similar to strcmp(). See also Dee.Analyzer.collate_cmp_func() if you need a version of this function that works as a GLib.CompareDataFunc.
The default implementation in Dee.Analyzer just uses strcmp().
- collate_key(data)¶
- Parameters:
data (str) – The input data to generate a collation key for
- Returns:
A newly allocated collation key. Use Dee.Analyzer.collate_cmp() or Dee.Analyzer.collate_cmp_func() to compare collation keys. Free with GLib.free().
- Return type:
str
Generate a collation key for a set of input data (usually a UTF-8 string passed through tokenization and term filters of the analyzer).
The default implementation just calls GLib.strdup().
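As a rough illustration of why a subclass might replace the default key generation, the sketch below contrasts a copy-only key with a case-folded key. It is plain Python and does not touch the Dee library. Both classes are hypothetical, and case folding is just one assumed normalization.

```python
class SketchAnalyzer:
    """Hypothetical stand-in; models only the documented defaults."""

    def collate_key(self, data):
        # Documented default: the key is simply a copy of the input.
        return str(data)

    def collate_cmp(self, key1, key2):
        # strcmp()-like three-way comparison of two collation keys.
        return (key1 > key2) - (key1 < key2)


class CaseFoldingAnalyzer(SketchAnalyzer):
    """Assumed subclass that makes collation case-insensitive."""

    def collate_key(self, data):
        return data.casefold()


def compare_with(analyzer, left, right):
    # Always compare keys, never the raw input strings.
    return analyzer.collate_cmp(analyzer.collate_key(left),
                                analyzer.collate_key(right))


default_result = compare_with(SketchAnalyzer(), "Zebra", "apple")
folded_result = compare_with(CaseFoldingAnalyzer(), "Zebra", "apple")
print(default_result, folded_result)
```

With the copy-only key, "Zebra" sorts before "apple" because upper-case letters precede lower-case ones; with the folded key the order is reversed.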
- tokenize(data, terms_out)¶
- Parameters:
data (str) – The input data to analyze
terms_out (Dee.TermList) – A Dee.TermList to place the generated tokens in.
Tokenize some input data (which is normally, but not necessarily, a UTF-8 string).
Tokenization splits the input data into constituents (in most cases words), but does not run it through any of the term filters set for the analyzer. It is undefined if the tokenization process itself does any normalization.
- do_add_term_filter(filter_func, filter_data) virtual¶
- Parameters:
filter_func (Dee.TermFilterFunc) – Function to call
filter_data (object or None) – Data to pass to filter_func when it is invoked
Register a Dee.TermFilterFunc to be called whenever Dee.Analyzer.analyze() is called.
Term filters can be used to normalize, add, or remove terms from an input data stream.
- do_analyze(data, terms_out, colkeys_out) virtual¶
- Parameters:
data (str) – The input data to analyze
terms_out (Dee.TermList or None) – A Dee.TermList to place the generated terms in. If None, no terms are generated
colkeys_out (Dee.TermList or None) – A Dee.TermList to place generated collation keys in. If None, no collation keys are generated
Extract terms and/or collation keys from some input data (which is normally, but not necessarily, a UTF-8 string).
The terms and corresponding collation keys will be written in order to the provided Dee.TermLists.
Implementation notes for subclasses: The analysis process must call Dee.Analyzer.tokenize() and run the tokens through all term filters added with Dee.Analyzer.add_term_filter(). Collation keys must be generated with Dee.Analyzer.collate_key().
- do_collate_cmp(key1, key2) virtual¶
- Parameters:
key1 (str) – The first collation key to compare
key2 (str) – The second collation key to compare
- Returns:
-1, 0 or 1 if key1 is less than, equal to, or greater than key2, respectively.
- Return type:
int
Compare collation keys generated by Dee.Analyzer.collate_key() with semantics similar to strcmp(). See also Dee.Analyzer.collate_cmp_func() if you need a version of this function that works as a GLib.CompareDataFunc.
The default implementation in Dee.Analyzer just uses strcmp().
- do_collate_key(data) virtual¶
- Parameters:
data (str) – The input data to generate a collation key for
- Returns:
A newly allocated collation key. Use Dee.Analyzer.collate_cmp() or Dee.Analyzer.collate_cmp_func() to compare collation keys. Free with GLib.free().
- Return type:
str
Generate a collation key for a set of input data (usually a UTF-8 string passed through tokenization and term filters of the analyzer).
The default implementation just calls GLib.strdup().
- do_tokenize(data, terms_out) virtual¶
- Parameters:
data (str) – The input data to analyze
terms_out (Dee.TermList) – A Dee.TermList to place the generated tokens in.
Tokenize some input data (which is normally, but not necessarily, a UTF-8 string).
Tokenization splits the input data into constituents (in most cases words), but does not run it through any of the term filters set for the analyzer. It is undefined if the tokenization process itself does any normalization.
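In PyGObject, a virtual method is normally overridden by defining the `do_`-prefixed method in a Python subclass. The sketch below models that dispatch in plain Python, without importing Dee. `SketchAnalyzer`, `WordAnalyzer`, and the base behaviour of treating the whole input as a single token are assumptions for illustration only.

```python
import re


class SketchAnalyzer:
    """Hypothetical stand-in modelling public-method-to-virtual dispatch."""

    def tokenize(self, data, terms_out):
        # The public method forwards to the overridable implementation.
        self.do_tokenize(data, terms_out)

    def do_tokenize(self, data, terms_out):
        # Assumed base behaviour: treat the whole input as one token.
        terms_out.append(data)


class WordAnalyzer(SketchAnalyzer):
    def do_tokenize(self, data, terms_out):
        # Split into word constituents; no term filters run at this stage.
        terms_out.extend(re.findall(r"\w+", data))


base_tokens = []
SketchAnalyzer().tokenize("Hello, Dee world!", base_tokens)

word_tokens = []
WordAnalyzer().tokenize("Hello, Dee world!", word_tokens)
print(base_tokens, word_tokens)
```

Callers keep using `tokenize()`; only the subclass needs to know about `do_tokenize()`.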