Dee.Analyzer

Subclasses:: Dee.TextAnalyzer

Methods

Inherited:: GObject.Object (37)
Structs:: GObject.ObjectClass (5)

class	`collate_cmp_func` (key1, key2, analyzer)
class	`new` ()
	`add_term_filter` (filter_func, *filter_data)
	`analyze` (data, terms_out, colkeys_out)
	`collate_cmp` (key1, key2)
	`collate_key` (data)
	`tokenize` (data, terms_out)

Virtual Methods

Inherited:: GObject.Object (7)

	`do_add_term_filter` (filter_func, filter_data)
	`do_analyze` (data, terms_out, colkeys_out)
	`do_collate_cmp` (key1, key2)
	`do_collate_key` (data)
	`do_tokenize` (data, terms_out)

Properties

None

Signals

Inherited:: GObject.Object (1)

Fields

Inherited:: GObject.Object (1)

Name	Type	Access	Description
parent	`GObject.Object`	r
priv	`Dee.AnalyzerPrivate`	r

Class Details

class Dee.Analyzer(**kwargs)

Bases:: GObject.Object
Abstract:: No
Structure:: Dee.AnalyzerClass

All fields in the Dee.Analyzer structure are private and should never be accessed directly.

classmethod collate_cmp_func(key1, key2, analyzer)

Parameters:

key1 (str) – The first key to compare
key2 (str) – The second key to compare
analyzer (object or None) – The Dee.Analyzer to use for the comparison

Returns:

-1, 0, or 1 if key1 is less than, equal to, or greater than key2, respectively.

Return type:

int

A GLib.CompareDataFunc using a Dee.Analyzer to compare the keys. This is just a convenience wrapper around Dee.Analyzer.collate_cmp().
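The point of the CompareDataFunc shape is that the analyzer travels as the trailing user-data argument, so a plain two-argument sort comparator has to bind it. The sketch below models this in pure Python so it runs without Dee installed; `ModelAnalyzer` and the module-level `collate_cmp_func` are hypothetical stand-ins, not the binding itself. With the real binding the call would presumably be `Dee.Analyzer.collate_cmp_func(a, b, analyzer)`, per the signature above.

```python
from functools import cmp_to_key


class ModelAnalyzer:
    """Hypothetical stand-in for Dee.Analyzer (not the real binding)."""

    def collate_cmp(self, key1: str, key2: str) -> int:
        # strcmp()-like comparison; only the sign of the result matters.
        return (key1 > key2) - (key1 < key2)


def collate_cmp_func(key1: str, key2: str, analyzer: ModelAnalyzer) -> int:
    # CompareDataFunc shape: (a, b, user_data), where user_data is the analyzer.
    # Like the documented wrapper, it simply delegates to collate_cmp().
    return analyzer.collate_cmp(key1, key2)


analyzer = ModelAnalyzer()
keys = ["pear", "apple", "fig"]
# Bind the analyzer as the third argument so sorted() gets a 2-argument comparator.
ordered = sorted(keys, key=cmp_to_key(lambda a, b: collate_cmp_func(a, b, analyzer)))
print(ordered)  # ['apple', 'fig', 'pear']
```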

classmethod new()

Return type:: Dee.Analyzer

add_term_filter(filter_func, *filter_data)

Parameters:

filter_func (Dee.TermFilterFunc) – Function to call
filter_data (object or None) – Data to pass to filter_func when it is invoked

Register a Dee.TermFilterFunc to be called whenever Dee.Analyzer.analyze() is called.

Term filters can be used to normalize, add, or remove terms from an input data stream.
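To make the filter idea concrete, here is a pure-Python sketch in which plain lists stand in for Dee.TermList. It assumes a filter callback of the form `(terms_in, terms_out, filter_data)` that reads one term list and appends results to another, and it assumes filters are chained in registration order; both are assumptions for illustration, and `lowercase_filter`, `drop_stopwords_filter`, and `run_filters` are hypothetical names. One filter normalizes terms and the other removes them.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model: plain lists stand in for Dee.TermList. A filter reads
# terms_in and appends its results to terms_out (assumed callback shape).
TermFilter = Callable[[List[str], List[str], Optional[object]], None]


def lowercase_filter(terms_in: List[str], terms_out: List[str], data: Optional[object]) -> None:
    # Normalize: emit each term lower-cased.
    for term in terms_in:
        terms_out.append(term.lower())


def drop_stopwords_filter(terms_in: List[str], terms_out: List[str], data: Optional[object]) -> None:
    # Remove: skip terms found in the stopword set passed as filter_data.
    stopwords = data or set()
    for term in terms_in:
        if term not in stopwords:
            terms_out.append(term)


def run_filters(terms: List[str], filters: List[Tuple[TermFilter, Optional[object]]]) -> List[str]:
    # Assumed chaining: each filter's output becomes the next filter's input.
    for func, data in filters:
        out: List[str] = []
        func(terms, out, data)
        terms = out
    return terms


filters = [(lowercase_filter, None), (drop_stopwords_filter, {"the", "a"})]
print(run_filters(["The", "Quick", "Fox"], filters))  # ['quick', 'fox']
```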

analyze(data, terms_out, colkeys_out)

Parameters:

data (str) – The input data to analyze
terms_out (Dee.TermList or None) – A Dee.TermList to place the generated terms in. If None, no terms are generated.
colkeys_out (Dee.TermList or None) – A Dee.TermList to place generated collation keys in. If None, no collation keys are generated.

Extract terms and/or collation keys from some input data (which is normally, but not necessarily, a UTF-8 string).

The terms and corresponding collation keys will be written in order to the provided Dee.TermList objects.

Implementation notes for subclasses: The analysis process must call Dee.Analyzer.tokenize() and run the tokens through all term filters added with Dee.Analyzer.add_term_filter(). Collation keys must be generated with Dee.Analyzer.collate_key().
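The implementation notes describe a three-step pipeline: tokenize, run every term filter, then derive collation keys with collate_key(). The pure-Python sketch below models that contract, including the None handling for either output; `ModelAnalyzer` is hypothetical, lists stand in for Dee.TermList, and the whitespace tokenizer is an assumption rather than Dee's actual behavior.

```python
from typing import List, Optional


class ModelAnalyzer:
    """Hypothetical pure-Python model of the analyze() contract described above."""

    def __init__(self) -> None:
        self._filters = []  # (filter_func, filter_data) pairs, in the order added

    def add_term_filter(self, filter_func, filter_data=None) -> None:
        self._filters.append((filter_func, filter_data))

    def tokenize(self, data: str, terms_out: List[str]) -> None:
        # Assumed tokenizer: split on whitespace; no filters run here.
        terms_out.extend(data.split())

    def collate_key(self, data: str) -> str:
        # Mirrors the documented default: the key is the input unchanged.
        return data

    def analyze(self, data: str, terms_out: Optional[List[str]],
                colkeys_out: Optional[List[str]]) -> None:
        if terms_out is None and colkeys_out is None:
            return  # nothing was requested
        terms: List[str] = []
        self.tokenize(data, terms)                 # 1. tokenize
        for func, fdata in self._filters:          # 2. every term filter, in order
            out: List[str] = []
            func(terms, out, fdata)
            terms = out
        for term in terms:                         # 3. terms and keys, in order
            if terms_out is not None:
                terms_out.append(term)
            if colkeys_out is not None:
                colkeys_out.append(self.collate_key(term))


analyzer = ModelAnalyzer()
analyzer.add_term_filter(lambda tin, tout, _data: tout.extend(t.lower() for t in tin))
terms: List[str] = []
keys: List[str] = []
analyzer.analyze("Hello World", terms, keys)
print(terms, keys)  # ['hello', 'world'] ['hello', 'world']
```

Because the keys are derived from the filtered terms, the i-th collation key always corresponds to the i-th term.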

collate_cmp(key1, key2)

Parameters:

key1 (str) – The first collation key to compare
key2 (str) – The second collation key to compare

Returns:

-1, 0, or 1 if key1 is less than, equal to, or greater than key2, respectively.

Return type:

int

Compare collation keys generated by Dee.Analyzer.collate_key() with semantics similar to strcmp(). See also Dee.Analyzer.collate_cmp_func() if you need a version of this function that works as a GLib.CompareDataFunc.

The default implementation in Dee.Analyzer just uses strcmp().
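Since the default implementation is strcmp(), ordering is by raw byte value rather than by any locale rule, and strcmp() itself only guarantees the sign of its result, so callers are safest testing for negative, zero, or positive. The hypothetical `strcmp_like` helper below models that byte-wise comparison on UTF-8 encoded keys in pure Python, normalized to -1, 0, or 1.

```python
def strcmp_like(key1: str, key2: str) -> int:
    """Hypothetical model of strcmp() on UTF-8 keys, normalized to -1, 0 or 1."""
    b1, b2 = key1.encode("utf-8"), key2.encode("utf-8")
    # bytes objects compare lexicographically by unsigned byte value, like strcmp().
    return (b1 > b2) - (b1 < b2)


print(strcmp_like("apple", "banana"))  # -1
print(strcmp_like("Zebra", "apple"))   # -1: 'Z' (0x5A) sorts before 'a' (0x61)
print(strcmp_like("é", "z"))           # 1: UTF-8 lead byte 0xC3 is greater than 'z' (0x7A)
```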

collate_key(data)

Parameters:: data (str) – The input data to generate a collation key for
Returns:: A newly allocated collation key. Use Dee.Analyzer.collate_cmp() or Dee.Analyzer.collate_cmp_func() to compare collation keys. Free with GLib.free().
Return type:: str

Generate a collation key for a set of input data (usually a UTF-8 string passed through tokenization and term filters of the analyzer).

The default implementation just calls GLib.strdup().
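With the default strdup() key and strcmp() comparison, collation is plain byte order, so capitalized words sort before lowercase ones. The pure-Python sketch below models the usual pattern of computing each key once and then sorting by key; `default_collate_key` and `default_collate_cmp` are hypothetical stand-ins for the default behavior described above, not Dee calls.

```python
from functools import cmp_to_key


def default_collate_key(data: str) -> str:
    # Documented default: the key is simply a copy of the input (strdup()).
    return str(data)


def default_collate_cmp(key1: str, key2: str) -> int:
    # strcmp()-like byte-wise comparison of the UTF-8 encoded keys.
    b1, b2 = key1.encode("utf-8"), key2.encode("utf-8")
    return (b1 > b2) - (b1 < b2)


titles = ["banana", "Cherry", "apple"]
# Compute each key once, then sort the (key, title) pairs by key with collate_cmp.
pairs = [(default_collate_key(t), t) for t in titles]
pairs.sort(key=cmp_to_key(lambda p, q: default_collate_cmp(p[0], q[0])))
print([t for _, t in pairs])  # ['Cherry', 'apple', 'banana']
```

A subclass that wants a different order would override key generation rather than change callers.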

tokenize(data, terms_out)

Parameters:

data (str) – The input data to analyze
terms_out (Dee.TermList) – A Dee.TermList to place the generated tokens in.

Tokenize some input data (which is normally, but not necessarily, a UTF-8 string).

Tokenization splits the input data into constituents (in most cases words), but does not run them through any of the term filters set for the analyzer. It is undefined whether the tokenization process itself performs any normalization.
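As a small illustration of "split, but do not filter", the sketch below uses a regular-expression word tokenizer that appends to a list standing in for Dee.TermList. The `tokenize` function and its `\w+` rule are hypothetical; note that the case of each token survives, because normalization such as lower-casing would be a term filter's job.

```python
import re
from typing import List

_WORD = re.compile(r"\w+")


def tokenize(data: str, terms_out: List[str]) -> None:
    """Hypothetical tokenizer: appends each run of word characters to terms_out.

    No term filters run here, so case and spelling are left untouched.
    """
    terms_out.extend(_WORD.findall(data))


tokens: List[str] = []
tokenize("Hello, wide World!", tokens)
print(tokens)  # ['Hello', 'wide', 'World']
```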

do_add_term_filter(filter_func, filter_data) virtual

Parameters:

filter_func (Dee.TermFilterFunc) – Function to call
filter_data (object or None) – Data to pass to filter_func when it is invoked

Register a Dee.TermFilterFunc to be called whenever Dee.Analyzer.analyze() is called.

Term filters can be used to normalize, add, or remove terms from an input data stream.

do_analyze(data, terms_out, colkeys_out) virtual

Parameters:

data (str) – The input data to analyze
terms_out (Dee.TermList or None) – A Dee.TermList to place the generated terms in. If None, no terms are generated.
colkeys_out (Dee.TermList or None) – A Dee.TermList to place generated collation keys in. If None, no collation keys are generated.

Extract terms and/or collation keys from some input data (which is normally, but not necessarily, a UTF-8 string).

The terms and corresponding collation keys will be written in order to the provided Dee.TermList objects.

Implementation notes for subclasses: The analysis process must call Dee.Analyzer.tokenize() and run the tokens through all term filters added with Dee.Analyzer.add_term_filter(). Collation keys must be generated with Dee.Analyzer.collate_key().

do_collate_cmp(key1, key2) virtual

Parameters:

key1 (str) – The first collation key to compare
key2 (str) – The second collation key to compare

Returns:

-1, 0, or 1 if key1 is less than, equal to, or greater than key2, respectively.

Return type:

int

Compare collation keys generated by Dee.Analyzer.collate_key() with semantics similar to strcmp(). See also Dee.Analyzer.collate_cmp_func() if you need a version of this function that works as a GLib.CompareDataFunc.

The default implementation in Dee.Analyzer just uses strcmp().

do_collate_key(data) virtual

Parameters:: data (str) – The input data to generate a collation key for
Returns:: A newly allocated collation key. Use Dee.Analyzer.collate_cmp() or Dee.Analyzer.collate_cmp_func() to compare collation keys. Free with GLib.free().
Return type:: str

Generate a collation key for a set of input data (usually a UTF-8 string passed through tokenization and term filters of the analyzer).

The default implementation just calls GLib.strdup().
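In PyGObject, a Python subclass overrides a GObject virtual method by defining a method with the `do_` prefix, which is why these entries are named do_collate_key() and so on. Because Dee may not be installed, the sketch below imitates that dispatch in plain Python: the public method forwards to its `do_` counterpart, and a subclass changes only key generation to get case-insensitive ordering. `ModelAnalyzer`, `CaseInsensitiveAnalyzer`, and `sort_with` are hypothetical names; with the real binding the subclass would presumably derive from Dee.Analyzer instead.

```python
from functools import cmp_to_key


class ModelAnalyzer:
    """Hypothetical base class imitating GObject virtual-method dispatch."""

    def collate_key(self, data: str) -> str:
        return self.do_collate_key(data)

    def collate_cmp(self, key1: str, key2: str) -> int:
        return self.do_collate_cmp(key1, key2)

    def do_collate_key(self, data: str) -> str:
        return str(data)  # documented default: a plain copy of the input

    def do_collate_cmp(self, key1: str, key2: str) -> int:
        b1, b2 = key1.encode("utf-8"), key2.encode("utf-8")
        return (b1 > b2) - (b1 < b2)  # strcmp()-like byte order


class CaseInsensitiveAnalyzer(ModelAnalyzer):
    def do_collate_key(self, data: str) -> str:
        # Override only key generation; do_collate_cmp is inherited unchanged.
        return data.casefold()


def sort_with(analyzer: ModelAnalyzer, items):
    # Keys are recomputed per comparison here for brevity; real code would cache them.
    return sorted(items, key=cmp_to_key(
        lambda a, b: analyzer.collate_cmp(analyzer.collate_key(a), analyzer.collate_key(b))))


words = ["banana", "Cherry", "apple"]
print(sort_with(ModelAnalyzer(), words))            # ['Cherry', 'apple', 'banana']
print(sort_with(CaseInsensitiveAnalyzer(), words))  # ['apple', 'banana', 'Cherry']
```

Overriding the key rather than the comparison keeps comparisons cheap: the expensive normalization happens once per item when keys are cached.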

do_tokenize(data, terms_out) virtual

Parameters:

data (str) – The input data to analyze
terms_out (Dee.TermList) – A Dee.TermList to place the generated tokens in.

Tokenize some input data (which is normally, but not necessarily, a UTF-8 string).

Tokenization splits the input data into constituents (in most cases words), but does not run them through any of the term filters set for the analyzer. It is undefined whether the tokenization process itself performs any normalization.
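A subclass typically overrides do_tokenize() to change how input is split while leaving filtering to analyze(). The pure-Python sketch below imitates the public-to-`do_` dispatch and compares an assumed whitespace default with a hypothetical subclass that keeps `#tags` whole and drops other punctuation; neither tokenizer rule is taken from Dee, and `ModelAnalyzer` and `HashtagAnalyzer` are hypothetical names.

```python
import re
from typing import List


class ModelAnalyzer:
    """Hypothetical base imitating how tokenize() dispatches to do_tokenize()."""

    def tokenize(self, data: str, terms_out: List[str]) -> None:
        self.do_tokenize(data, terms_out)

    def do_tokenize(self, data: str, terms_out: List[str]) -> None:
        terms_out.extend(data.split())  # assumed default: whitespace split


class HashtagAnalyzer(ModelAnalyzer):
    """Hypothetical subclass: keeps '#tags' whole and drops other punctuation."""

    _TOKEN = re.compile(r"#?\w+")

    def do_tokenize(self, data: str, terms_out: List[str]) -> None:
        # Only split the input; term filters would be applied later by analyze().
        terms_out.extend(self._TOKEN.findall(data))


plain: List[str] = []
tagged: List[str] = []
ModelAnalyzer().tokenize("Ship it! #release", plain)
HashtagAnalyzer().tokenize("Ship it! #release", tagged)
print(plain)   # ['Ship', 'it!', '#release']
print(tagged)  # ['Ship', 'it', '#release']
```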