Dee.Analyzer

Inheritance: GObject.Object -> Dee.Analyzer

Subclasses:

Dee.TextAnalyzer

Methods

Inherited:

GObject.Object (37)

Structs:

GObject.ObjectClass (5)

class collate_cmp_func (key1, key2, analyzer)

class new ()

add_term_filter (filter_func, *filter_data)

analyze (data, terms_out, colkeys_out)

collate_cmp (key1, key2)

collate_key (data)

tokenize (data, terms_out)

Virtual Methods

Inherited:

GObject.Object (7)

do_add_term_filter (filter_func, filter_data)

do_analyze (data, terms_out, colkeys_out)

do_collate_cmp (key1, key2)

do_collate_key (data)

do_tokenize (data, terms_out)

Properties

None

Signals

Inherited:

GObject.Object (1)

Fields

Inherited:

GObject.Object (1)

Name    Type                 Access  Description
parent  GObject.Object       r
priv    Dee.AnalyzerPrivate  r

Class Details

class Dee.Analyzer(**kwargs)
Bases:

GObject.Object

Abstract:

No

Structure:

Dee.AnalyzerClass

All fields in the Dee.Analyzer structure are private and should never be accessed directly.

classmethod collate_cmp_func(key1, key2, analyzer)
Parameters:
  • key1 (str) – The first key to compare

  • key2 (str) – The second key to compare

  • analyzer (object or None) – The Dee.Analyzer to use for the comparison

Returns:

-1, 0 or 1 if key1 is less than, equal to, or greater than key2, respectively.

Return type:

int

A GLib.CompareDataFunc using a Dee.Analyzer to compare the keys. This is just a convenience wrapper around Dee.Analyzer.collate_cmp().
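
The wrapper pattern described above can be sketched in plain Python without importing Dee. Everything here is an assumption made for illustration: `CaseInsensitiveAnalyzer` is a hypothetical stand-in rather than a real Dee.Analyzer, and the local `collate_cmp_func` only imitates the three-argument shape of a compare-with-user-data callback.

```python
from functools import cmp_to_key, partial


class CaseInsensitiveAnalyzer:
    """Hypothetical stand-in for an analyzer; not the real Dee.Analyzer."""

    def collate_cmp(self, key1, key2):
        # Illustrative ordering: compare case-folded keys, return -1, 0 or 1.
        a, b = key1.casefold(), key2.casefold()
        return (a > b) - (a < b)


def collate_cmp_func(key1, key2, analyzer):
    # Compare-with-user-data shape: the analyzer travels in the trailing
    # argument and the call is forwarded to its collate_cmp().
    return analyzer.collate_cmp(key1, key2)


analyzer = CaseInsensitiveAnalyzer()
keys = ["pear", "apple", "Fig"]
# Bind the trailing argument so the callback fits Python's sorted().
compare = partial(collate_cmp_func, analyzer=analyzer)
ordered = sorted(keys, key=cmp_to_key(compare))
```

Because the ordering comes from the analyzer, `ordered` differs from what a plain `sorted(keys)` would return for mixed-case input.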

classmethod new()
Return type:

Dee.Analyzer

add_term_filter(filter_func, *filter_data)
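Parameters:
  • filter_func (Dee.TermFilterFunc) – The term filter function to register

  • filter_data (object or None) – Data passed to filter_func when it is invoked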

Register a Dee.TermFilterFunc to be called whenever Dee.Analyzer.analyze() is called.

Term filters can be used to normalize, add, or remove terms from an input data stream.
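
The three roles of a term filter (normalize, add, remove) can be illustrated with a small pure-Python model. This is a sketch under stated assumptions, not the Dee API: `FilterChain` is hypothetical, plain lists stand in for Dee.TermList, the `(terms_in, terms_out, filter_data)` callback shape is assumed, and running filters in registration order is a choice of this sketch.

```python
class FilterChain:
    """Hypothetical model of an analyzer's filter list; not the Dee API."""

    def __init__(self):
        self._filters = []

    def add_term_filter(self, filter_func, filter_data=None):
        # Assumption: filters run in the order they were registered.
        self._filters.append((filter_func, filter_data))

    def run(self, terms):
        for filter_func, filter_data in self._filters:
            terms_out = []
            filter_func(terms, terms_out, filter_data)
            terms = terms_out
        return terms


def lowercase(terms_in, terms_out, filter_data):
    # Normalize: fold every term to lower case.
    terms_out.extend(term.lower() for term in terms_in)


def drop_stopwords(terms_in, terms_out, stopwords):
    # Remove: skip terms listed in the filter data.
    terms_out.extend(term for term in terms_in if term not in stopwords)


def add_prefixes(terms_in, terms_out, length):
    # Add: emit each term, plus a prefix of it when the term is longer.
    for term in terms_in:
        terms_out.append(term)
        if len(term) > length:
            terms_out.append(term[:length])


chain = FilterChain()
chain.add_term_filter(lowercase)
chain.add_term_filter(drop_stopwords, {"the"})
chain.add_term_filter(add_prefixes, 3)
filtered = chain.run(["The", "Quick", "fox"])
```

Each filter writes into a fresh output list, so a filter may emit fewer, more, or different terms than it received.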

analyze(data, terms_out, colkeys_out)
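Parameters:
  • data (str) – The input data to analyze

  • terms_out (Dee.TermList) – The Dee.TermList that receives the generated terms

  • colkeys_out (Dee.TermList) – The Dee.TermList that receives the generated collation keys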

Extract terms and/or collation keys from some input data (which is normally, but not necessarily, a UTF-8 string).

The terms and corresponding collation keys are written, in order, to the provided Dee.TermList objects.

Implementation notes for subclasses: The analysis process must call Dee.Analyzer.tokenize() and run the tokens through all term filters added with Dee.Analyzer.add_term_filter(). Collation keys must be generated with Dee.Analyzer.collate_key().
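
The implementation notes above amount to a three-step pipeline: tokenize, filter, then generate collation keys. A minimal pure-Python sketch of that flow follows. `SketchAnalyzer` is a hypothetical model, not the real class; plain lists stand in for Dee.TermList, whitespace tokenization is assumed, and treating a `None` output list as "skip this output" is also an assumption.

```python
class SketchAnalyzer:
    """Hypothetical pure-Python model; not the real Dee.Analyzer."""

    def __init__(self):
        self._filters = []

    def add_term_filter(self, filter_func, filter_data=None):
        self._filters.append((filter_func, filter_data))

    def tokenize(self, data, terms_out):
        # Assumed tokenizer: split on whitespace, no normalization.
        terms_out.extend(data.split())

    def collate_key(self, data):
        # Mirrors the documented default: the key is a copy of the input.
        return str(data)

    def analyze(self, data, terms_out=None, colkeys_out=None):
        # Step 1: tokenize the raw input.
        terms = []
        self.tokenize(data, terms)
        # Step 2: run the tokens through every registered filter.
        for filter_func, filter_data in self._filters:
            filtered = []
            filter_func(terms, filtered, filter_data)
            terms = filtered
        # Step 3: write terms and their collation keys in the same order.
        for term in terms:
            if terms_out is not None:
                terms_out.append(term)
            if colkeys_out is not None:
                colkeys_out.append(self.collate_key(term))


def strip_punctuation(terms_in, terms_out, chars):
    # Example filter: trim the given characters, drop terms left empty.
    for term in terms_in:
        cleaned = term.strip(chars)
        if cleaned:
            terms_out.append(cleaned)


analyzer = SketchAnalyzer()
analyzer.add_term_filter(strip_punctuation, ".,!")
terms, colkeys = [], []
analyzer.analyze("Hello, world! ...", terms, colkeys)
```

Keeping the two output lists index-aligned lets a caller pair each term with its collation key.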

collate_cmp(key1, key2)
Parameters:
  • key1 (str) – The first collation key to compare

  • key2 (str) – The second collation key to compare

Returns:

-1, 0 or 1 if key1 is less than, equal to, or greater than key2, respectively.

Return type:

int

Compare collation keys generated by Dee.Analyzer.collate_key() with similar semantics as strcmp(). See also Dee.Analyzer.collate_cmp_func() if you need a version of this function that works as a GLib.CompareDataFunc.

The default implementation in Dee.Analyzer just uses strcmp().
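
To make the strcmp()-style semantics concrete, here is a small pure-Python sketch that compares the UTF-8 encodings of two keys and clamps the result to -1, 0 or 1. The helper name `strcmp_like` is hypothetical, and the sketch only approximates the documented default; it does not call into Dee.

```python
def strcmp_like(key1, key2):
    """Return -1, 0 or 1 by comparing the UTF-8 encodings byte by byte."""
    a = key1.encode("utf-8")
    b = key2.encode("utf-8")
    return (a > b) - (a < b)


# Byte order is not dictionary order: upper case sorts before lower case.
results = [
    strcmp_like("apple", "banana"),
    strcmp_like("banana", "banana"),
    strcmp_like("banana", "Banana"),
]
```

A byte-wise comparison is cheap, which is why locale-aware ordering is usually moved into the collation key itself rather than into the comparison.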

collate_key(data)
Parameters:

data (str) – The input data to generate a collation key for

Returns:

A newly allocated collation key. Use Dee.Analyzer.collate_cmp() or Dee.Analyzer.collate_cmp_func() to compare collation keys. Free with GLib.free().

Return type:

str

Generate a collation key for a set of input data (usually a UTF-8 string passed through tokenization and term filters of the analyzer).

The default implementation just calls GLib.strdup().
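
The difference between the copy-only default and a normalizing key can be sketched in plain Python. Both function names are hypothetical, and the case-folding variant is only one possible override, shown for illustration rather than taken from Dee.

```python
import unicodedata


def default_collate_key(data):
    # Mirrors the documented default: the key is a copy of the input.
    return str(data)


def folded_collate_key(data):
    # Hypothetical override: normalize and case-fold so that variants of
    # the same word produce the same key.
    return unicodedata.normalize("NFKC", data).casefold()


same_by_default = default_collate_key("Straße") == default_collate_key("STRASSE")
same_when_folded = folded_collate_key("Straße") == folded_collate_key("STRASSE")
```

With the copy-only key the two spellings stay distinct. With the folded key they collapse to one entry, so a byte-wise comparison of the keys treats them as equal.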

tokenize(data, terms_out)
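Parameters:
  • data (str) – The input data to tokenize

  • terms_out (Dee.TermList) – The Dee.TermList that receives the generated tokens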

Tokenize some input data (which is normally, but not necessarily, a UTF-8 string).

Tokenization splits the input data into constituents (in most cases words), but does not run them through any of the term filters set for the analyzer. Whether the tokenization process itself performs any normalization is undefined.
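
As a rough illustration of tokenization without filtering, the following pure-Python sketch splits input into word-like tokens and leaves case and accents untouched. The regular expression and the use of a plain list in place of Dee.TermList are assumptions of this sketch; a real tokenizer may split differently.

```python
import re


def tokenize(data, terms_out):
    # Split on runs of word characters. Case and accents are left as they
    # are, because normalization belongs to term filters in this sketch.
    terms_out.extend(re.findall(r"\w+", data))


terms = []
tokenize("Hello, Wörld! 42 times", terms)
```

Punctuation disappears only because it separates tokens; nothing in this step rewrites the tokens themselves.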

do_add_term_filter(filter_func, filter_data) virtual
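Parameters:
  • filter_func (Dee.TermFilterFunc) – The term filter function to register

  • filter_data (object or None) – Data passed to filter_func when it is invoked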

Register a Dee.TermFilterFunc to be called whenever Dee.Analyzer.analyze() is called.

Term filters can be used to normalize, add, or remove terms from an input data stream.

do_analyze(data, terms_out, colkeys_out) virtual
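Parameters:
  • data (str) – The input data to analyze

  • terms_out (Dee.TermList) – The Dee.TermList that receives the generated terms

  • colkeys_out (Dee.TermList) – The Dee.TermList that receives the generated collation keys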

Extract terms and/or collation keys from some input data (which is normally, but not necessarily, a UTF-8 string).

The terms and corresponding collation keys are written, in order, to the provided Dee.TermList objects.

Implementation notes for subclasses: The analysis process must call Dee.Analyzer.tokenize() and run the tokens through all term filters added with Dee.Analyzer.add_term_filter(). Collation keys must be generated with Dee.Analyzer.collate_key().

do_collate_cmp(key1, key2) virtual
Parameters:
  • key1 (str) – The first collation key to compare

  • key2 (str) – The second collation key to compare

Returns:

-1, 0 or 1 if key1 is less than, equal to, or greater than key2, respectively.

Return type:

int

Compare collation keys generated by Dee.Analyzer.collate_key() with similar semantics as strcmp(). See also Dee.Analyzer.collate_cmp_func() if you need a version of this function that works as a GLib.CompareDataFunc.

The default implementation in Dee.Analyzer just uses strcmp().

do_collate_key(data) virtual
Parameters:

data (str) – The input data to generate a collation key for

Returns:

A newly allocated collation key. Use Dee.Analyzer.collate_cmp() or Dee.Analyzer.collate_cmp_func() to compare collation keys. Free with GLib.free().

Return type:

str

Generate a collation key for a set of input data (usually a UTF-8 string passed through tokenization and term filters of the analyzer).

The default implementation just calls GLib.strdup().

do_tokenize(data, terms_out) virtual
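Parameters:
  • data (str) – The input data to tokenize

  • terms_out (Dee.TermList) – The Dee.TermList that receives the generated tokens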

Tokenize some input data (which is normally, but not necessarily, a UTF-8 string).

Tokenization splits the input data into constituents (in most cases words), but does not run them through any of the term filters set for the analyzer. Whether the tokenization process itself performs any normalization is undefined.