Poppler.Document

Inheritance: GObject.Object -> Poppler.Document

Subclasses:

None

Methods

Inherited:

GObject.Object (37)

Structs:

GObject.ObjectClass (5)

classmethod new_from_bytes (bytes, password)

classmethod new_from_data (data, password)

classmethod new_from_fd (fd, password)

classmethod new_from_file (uri, password)

classmethod new_from_gfile (file, password, cancellable)

classmethod new_from_stream (stream, length, password, cancellable)

create_dests_tree ()

find_dest (link_name)

get_attachments ()

get_author ()

get_creation_date ()

get_creation_date_time ()

get_creator ()

get_form_field (id)

get_id ()

get_keywords ()

get_metadata ()

get_modification_date ()

get_modification_date_time ()

get_n_attachments ()

get_n_pages ()

get_n_signatures ()

get_page (index)

get_page_by_label (label)

get_page_layout ()

get_page_mode ()

get_pdf_conformance ()

get_pdf_part ()

get_pdf_subtype ()

get_pdf_subtype_string ()

get_pdf_version ()

get_pdf_version_string ()

get_permissions ()

get_print_duplex ()

get_print_n_copies ()

get_print_page_ranges ()

get_print_scaling ()

get_producer ()

get_signature_fields ()

get_subject ()

get_title ()

has_attachments ()

has_javascript ()

is_linearized ()

reset_form (fields, exclude_fields)

save (uri)

save_a_copy (uri)

save_to_fd (fd, include_changes)

set_author (author)

set_creation_date (creation_date)

set_creation_date_time (creation_datetime)

set_creator (creator)

set_keywords (keywords)

set_modification_date (modification_date)

set_modification_date_time (modification_datetime)

set_producer (producer)

set_subject (subject)

set_title (title)

Virtual Methods

Inherited:

GObject.Object (7)

Properties

Name                 Type                       Flags  Short Description
author               str                        r/w    The author of the document
creation-date        int                        d/r/w  The date and time the document was created (deprecated)
creation-datetime    GLib.DateTime              r/w    The date and time the document was created
creator              str                        r/w    The software that created the document
format               str                        r      The PDF version of the document
format-major         int                        r      The PDF major version number of the document
format-minor         int                        r      The PDF minor version number of the document
keywords             str                        r/w    Keywords
linearized           bool                       r      Is the document optimized for web viewing?
metadata             str                        r      Embedded XML metadata
mod-date             int                        d/r/w  The date and time the document was modified (deprecated)
mod-datetime         GLib.DateTime              r/w    The date and time the document was modified
page-layout          Poppler.PageLayout         r      Initial Page Layout
page-mode            Poppler.PageMode           r      Page Mode
permissions          Poppler.Permissions        r      Permissions
print-duplex         Poppler.PrintDuplex        r      Duplex Viewer Preference
print-n-copies       int                        r      Number of Copies Viewer Preference
print-scaling        Poppler.PrintScaling       r      Print Scaling Viewer Preference
producer             str                        r/w    The software that converted the document
subject              str                        r/w    Subjects the document touches
subtype              Poppler.PDFSubtype         r      The PDF subtype of the document
subtype-conformance  Poppler.PDFConformance     r      The conformance level of PDF subtype
subtype-part         Poppler.PDFPart            r      The part of PDF conformance
subtype-string       str                        r      The PDF subtype of the document
title                str                        r/w    The title of the document
viewer-preferences   Poppler.ViewerPreferences  r      Viewer Preferences

Signals

Inherited:

GObject.Object (1)

Fields

Inherited:

GObject.Object (1)

Class Details

class Poppler.Document(**kwargs)
Bases:

GObject.Object

Abstract:

No

classmethod new_from_bytes(bytes, password)
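Parameters:
  • bytes (GLib.Bytes) – the PDF data, as a GLib.Bytes

  • password (str or None) – password to unlock the file with, or None

Raises: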

GLib.Error

Returns:

a newly created Poppler.Document, or None

Return type:

Poppler.Document

Creates a new Poppler.Document from bytes. The returned document will hold a reference to bytes.

On error, None is returned, with error set. Possible errors include those in the POPPLER_ERROR and G_FILE_ERROR domains.

New in version 0.82.

classmethod new_from_data(data, password)
Parameters:
  • data (bytes) – the pdf data

  • password (str or None) – password to unlock the file with, or None

Raises:

GLib.Error

Returns:

A newly created Poppler.Document, or None

Return type:

Poppler.Document

Creates a new Poppler.Document. If None is returned, then error will be set. Possible errors include those in the POPPLER_ERROR and G_FILE_ERROR domains.

Note that data is not copied nor is a new reference to it created. It must remain valid and cannot be destroyed as long as the returned document exists.

Deprecated since version 0.82: This requires directly managing length and data. Use Poppler.Document.new_from_bytes() instead.

classmethod new_from_fd(fd, password)
Parameters:
  • fd (int) – a valid file descriptor

  • password (str or None) – password to unlock the file with, or None

Raises:

GLib.Error

Returns:

a new Poppler.Document, or None

Return type:

Poppler.Document

Creates a new Poppler.Document reading the PDF contents from the file descriptor fd. fd must refer to a regular file, or STDIN, and be open for reading. Possible errors include those in the POPPLER_ERROR and G_FILE_ERROR domains. Note that this function takes ownership of fd; you must not operate on it again, nor close it.

New in version 21.12.0.

classmethod new_from_file(uri, password)
Parameters:
  • uri (str) – uri of the file to load

  • password (str or None) – password to unlock the file with, or None

Raises:

GLib.Error

Returns:

A newly created Poppler.Document, or None

Return type:

Poppler.Document

Creates a new Poppler.Document. If None is returned, then error will be set. Possible errors include those in the POPPLER_ERROR and G_FILE_ERROR domains.
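Because new_from_file() takes a URI rather than a filesystem path, a plain path has to be converted first. The following is a minimal sketch, assuming PyGObject and the Poppler typelib (namespace version "0.18") are installed; path_to_uri and open_pdf are hypothetical helper names, not part of Poppler.

```python
from pathlib import Path


def path_to_uri(path):
    """Convert a filesystem path to an absolute file:// URI."""
    return Path(path).resolve().as_uri()


def open_pdf(path, password=None):
    """Open a PDF by path; GLib.Error is raised if loading fails."""
    # Imported lazily so that path_to_uri() works without the bindings.
    import gi
    gi.require_version("Poppler", "0.18")
    from gi.repository import Poppler

    return Poppler.Document.new_from_file(path_to_uri(path), password)
```

Passing a bare path such as "report.pdf" directly to new_from_file() is a common mistake; converting through path_to_uri() avoids it.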

classmethod new_from_gfile(file, password, cancellable)
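Parameters:
  • file (Gio.File) – a Gio.File to load

  • password (str or None) – password to unlock the file with, or None

  • cancellable (Gio.Cancellable or None) – a Gio.Cancellable, or None

Raises: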

GLib.Error

Returns:

a new Poppler.Document, or None

Return type:

Poppler.Document

Creates a new Poppler.Document reading the PDF contents from file. Possible errors include those in the POPPLER_ERROR and G_FILE_ERROR domains.

New in version 0.22.

classmethod new_from_stream(stream, length, password, cancellable)
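Parameters:
  • stream (Gio.InputStream) – a Gio.InputStream to read from

  • length (int) – the stream length, or -1 if not known

  • password (str or None) – password to unlock the file with, or None

  • cancellable (Gio.Cancellable or None) – a Gio.Cancellable, or None

Raises: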

GLib.Error

Returns:

a new Poppler.Document, or None

Return type:

Poppler.Document

Creates a new Poppler.Document reading the PDF contents from stream. Note that the given Gio.InputStream must be seekable or Gio.IOErrorEnum.NOT_SUPPORTED will be returned. Possible errors include those in the POPPLER_ERROR, G_FILE_ERROR and G_IO_ERROR domains.

New in version 0.22.

create_dests_tree()
Returns:

the GLib.Tree, or None

Return type:

GLib.Tree or None

Creates a balanced binary tree of all named destinations in self.

Each tree key is a string in the form returned by Poppler.named_dest_to_bytestring() and contains a destination name. Each tree value is the Poppler.Dest for that named destination. The return value must be freed with GLib.Tree.destroy().

New in version 0.78.

find_dest(link_name)
Parameters:

link_name (str) – a named destination

Returns:

a new Poppler.Dest destination, or None if link_name is not a destination.

Return type:

Poppler.Dest

Creates a Poppler.Dest for the named destination link_name in self.

Note that named destinations are bytestrings, not strings. That means that unless link_name was returned by a Poppler function (e.g. it is Poppler.Dest.named_dest), it needs to be converted to a string using Poppler.named_dest_from_bytestring() before being passed to this function.

The returned value must be freed with Poppler.Dest.free().

get_attachments()
Returns:

a list of available attachments.

Return type:

[Poppler.Attachment]

Returns a GLib.List containing Poppler.Attachment objects. These attachments are unowned, and must be unreffed, and the list must be freed with g_list_free().

get_author()
Returns:

a newly allocated string containing the author of self, or None

Return type:

str

Returns the author of the document

New in version 0.16.

get_creation_date()
Returns:

the date the document was created, or -1

Return type:

int

Returns the date the document was created as seconds since the Epoch

New in version 0.16.

get_creation_date_time()
Returns:

the date the document was created, or None

Return type:

GLib.DateTime or None

Returns the date the document was created as a GLib.DateTime

New in version 20.09.0.
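A GLib.DateTime can be turned into a standard-library datetime when the rest of an application works with Python types. This is a sketch under the assumption that whole-second precision in UTC is enough; to_python_datetime and creation_datetime are hypothetical helper names.

```python
from datetime import datetime, timezone


def to_python_datetime(glib_datetime):
    """Convert a GLib.DateTime (or None) to a timezone-aware UTC datetime."""
    if glib_datetime is None:
        return None  # the document has no such date entry
    # GLib.DateTime.to_unix() gives whole seconds since the Epoch (UTC).
    return datetime.fromtimestamp(glib_datetime.to_unix(), tz=timezone.utc)


def creation_datetime(document):
    """Return the creation date of document as a Python datetime, or None."""
    return to_python_datetime(document.get_creation_date_time())
```

The same helper applies unchanged to the value returned by get_modification_date_time().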

get_creator()
Returns:

a newly allocated string containing the creator of self, or None

Return type:

str

Returns the creator of the document. If the document was converted from another format, the creator is the name of the product that created the original document from which it was converted.

New in version 0.16.

get_form_field(id)
Parameters:

id (int) – an id of a Poppler.FormField

Returns:

a new Poppler.FormField or None if not found

Return type:

Poppler.FormField

Returns the Poppler.FormField for the given id. It must be freed with GObject.Object.unref()

get_id()
Returns:

True if self contains an id, False otherwise

permanent_id:

location to store an allocated string, use GLib.free() to free the returned string

update_id:

location to store an allocated string, use GLib.free() to free the returned string

Return type:

(bool, permanent_id: str, update_id: str)

Returns the PDF file identifier represented as two byte string arrays of size 32. permanent_id is the permanent identifier that is built based on the file contents at the time it was originally created, so that this identifier never changes. update_id is the update identifier that is built based on the file contents at the time it was last updated.

Note that returned strings are not null-terminated, they have a fixed size of 32 bytes.

New in version 0.16.

get_keywords()
Returns:

a newly allocated string containing keywords associated to self, or None

Return type:

str

Returns the keywords associated to the document

New in version 0.16.

get_metadata()
Returns:

a newly allocated string containing the XML metadata, or None

Return type:

str

Returns the XML metadata string of the document

New in version 0.16.

get_modification_date()
Returns:

the date the document was most recently modified, or -1

Return type:

int

Returns the date the document was most recently modified as seconds since the Epoch

New in version 0.16.

get_modification_date_time()
Returns:

the date the document was modified, or None

Return type:

GLib.DateTime or None

Returns the date the document was most recently modified as a GLib.DateTime

New in version 20.09.0.

get_n_attachments()
Returns:

Number of attachments

Return type:

int

Returns the number of attachments in a loaded document. See also Poppler.Document.get_attachments()

New in version 0.18.

get_n_pages()
Returns:

Number of pages

Return type:

int

Returns the number of pages in a loaded document.

get_n_signatures()
Returns:

The number of signatures found in the document

Return type:

int

Returns how many digital signatures self contains. PDF digital signatures ensure that the content has not been altered since the last edit and that it was produced by someone the user can trust.

New in version 21.12.0.

get_page(index)
Parameters:

index (int) – a page index

Returns:

The Poppler.Page at index

Return type:

Poppler.Page

Returns the Poppler.Page indexed at index. This object is owned by the caller.
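get_n_pages() and get_page() together allow walking through a document. A minimal sketch, assuming page indexes start at 0 as in poppler-glib; iter_pages is a hypothetical helper name.

```python
def iter_pages(document):
    """Yield every page of document in order, first page first."""
    for index in range(document.get_n_pages()):
        yield document.get_page(index)
```

For example, `for page in iter_pages(document): ...` visits each Poppler.Page once.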

get_page_by_label(label)
Parameters:

label (str) – a page label

Returns:

The Poppler.Page referenced by label

Return type:

Poppler.Page

Returns the Poppler.Page referenced by label. This object is owned by the caller. label is a human-readable string representation of the page number, and can be document specific. Typically, it is a value such as “iii” or “3”.

By default, “1” refers to the first page.

get_page_layout()
Returns:

a Poppler.PageLayout that should be used when the document is opened

Return type:

Poppler.PageLayout

Returns the page layout that should be used when the document is opened

New in version 0.16.

get_page_mode()
Returns:

a Poppler.PageMode that should be used when the document is opened

Return type:

Poppler.PageMode

Returns a Poppler.PageMode representing how the document should be initially displayed when opened.

New in version 0.16.

get_pdf_conformance()
Returns:

the document’s subtype conformance level

Return type:

Poppler.PDFConformance

Returns the conformance level of self as a Poppler.PDFConformance.

New in version 0.70.

get_pdf_part()
Returns:

the document’s subtype part

Return type:

Poppler.PDFPart

Returns the part of the conforming standard that self adheres to as a Poppler.PDFPart.

New in version 0.70.

get_pdf_subtype()
Returns:

the document’s subtype

Return type:

Poppler.PDFSubtype

Returns the subtype of self as a Poppler.PDFSubtype.

New in version 0.70.

get_pdf_subtype_string()
Returns:

a newly allocated string containing the PDF subtype version of self, or None

Return type:

str or None

Returns the PDF subtype version of self as a string.

New in version 0.70.

get_pdf_version()
Returns:

major_version:

return location for the PDF major version number

minor_version:

return location for the PDF minor version number

Return type:

(major_version: int, minor_version: int)

Returns the major and minor PDF version numbers of self as major_version and minor_version.

New in version 0.16.
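As the return type above shows, the Python binding hands both numbers back as a tuple, so they can be unpacked directly. A small sketch; pdf_version is a hypothetical helper name.

```python
def pdf_version(document):
    """Return the PDF version of document as a 'major.minor' string."""
    major, minor = document.get_pdf_version()
    return f"{major}.{minor}"
```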

get_pdf_version_string()
Returns:

a newly allocated string containing the PDF version of self, or None

Return type:

str

Returns the PDF version of self as a string (e.g. PDF-1.6)

New in version 0.16.

get_permissions()
Returns:

a set of flags from Poppler.Permissions enumeration

Return type:

Poppler.Permissions

Returns the flags specifying which operations are permitted when the document is opened.

New in version 0.16.
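Since the result is a set of flags, individual permissions are tested with a bitwise AND. A minimal sketch; is_permitted is a hypothetical helper name, and the flag passed in would normally be a Poppler.Permissions member such as Poppler.Permissions.OK_TO_PRINT.

```python
def is_permitted(document, flag):
    """Return True if every bit in flag is set in the document's permissions."""
    permissions = document.get_permissions()
    return (permissions & flag) == flag
```

Comparing against flag, rather than testing for a non-zero result, means that a combined flag only passes when all of its bits are granted.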

get_print_duplex()
Returns:

a Poppler.PrintDuplex that should be used when the document is printed

Return type:

Poppler.PrintDuplex

Returns the duplex mode value suggested for printing by the author of the document. Value Poppler.PrintDuplex.NONE means that the document does not specify this preference.

New in version 0.80.

get_print_n_copies()
Returns:

Number of copies

Return type:

int

Returns the suggested number of copies to be printed. This preference should be applied only if returned value is greater than 1 since value 1 usually means that the document does not specify it.

New in version 0.80.
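The rule described above (only honor the value when it is greater than 1) can be sketched as follows. copies_to_print and its user_default parameter are hypothetical names introduced for illustration.

```python
def copies_to_print(document, user_default=1):
    """Pick the number of copies, preferring the document's suggestion."""
    suggested = document.get_print_n_copies()
    # A value of 1 usually means the document states no preference,
    # so fall back to the user's own default in that case.
    return suggested if suggested > 1 else user_default
```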

get_print_page_ranges()
Returns:

an array of Poppler.PageRange items, or None. Free the array when it is no longer needed.

Return type:

[Poppler.PageRange]

Returns the suggested page ranges to print in the form of an array of Poppler.PageRange items and the number of ranges. None means that the document does not specify page ranges for printing.

New in version 0.80.

get_print_scaling()
Returns:

a Poppler.PrintScaling that should be used when the document is printed

Return type:

Poppler.PrintScaling

Returns the print scaling value suggested by the author of the document.

New in version 0.73.

get_producer()
Returns:

a newly allocated string containing the producer of self, or None

Return type:

str

Returns the producer of the document. If the document was converted from another format, the producer is the name of the product that converted it to PDF

New in version 0.16.

get_signature_fields()
Returns:

a list of all signature form fields.

Return type:

[Poppler.FormField]

Returns a GLib.List containing all signature Poppler.FormField objects in the document.

New in version 22.02.0.

get_subject()
Returns:

a newly allocated string containing the subject of self, or None

Return type:

str

Returns the subject of the document

New in version 0.16.

get_title()
Returns:

a newly allocated string containing the title of self, or None

Return type:

str

Returns the document’s title

New in version 0.16.

has_attachments()
Returns:

True, if self has attachments.

Return type:

bool

Returns True if self has any attachments.

has_javascript()
Return type:

bool

Returns whether self has any JavaScript in it.

New in version 0.90.

is_linearized()
Returns:

True if self is linearized, False otherwise

Return type:

bool

Returns whether self is linearized or not. Linearization of PDF enables efficient incremental access of the PDF file in a network environment.

New in version 0.16.

reset_form(fields, exclude_fields)
Parameters:
  • fields ([str] or None) – list of fields to reset

  • exclude_fields (bool) – whether to reset all fields except those in fields

Resets the form fields specified by fields if exclude_fields is False. Resets all others if exclude_fields is True. All form fields are reset regardless of the exclude_fields flag if fields is empty.

New in version 0.90.
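The interaction between fields and exclude_fields is easiest to see as set arithmetic. The function below is only a pure-Python model of the behavior described above, not Poppler's implementation; fields_to_reset and all_fields are hypothetical names.

```python
def fields_to_reset(all_fields, fields, exclude_fields):
    """Model which field names reset_form(fields, exclude_fields) affects."""
    everything = set(all_fields)
    if not fields:
        # An empty (or None) list resets every field, whatever the flag says.
        return everything
    named = set(fields)
    if exclude_fields:
        return everything - named  # reset all fields except the named ones
    return everything & named      # reset only the named fields
```

With form fields "name", "email" and "age", passing fields=["name"] resets only "name" when exclude_fields is False, and resets "email" and "age" when it is True.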

save(uri)
Parameters:

uri (str) – uri of file to save

Raises:

GLib.Error

Returns:

True, if the document was successfully saved

Return type:

bool

Saves self. Any change made in the document, such as form fields filled or annotations added or modified, will be saved. If error is set, False will be returned. Possible errors include those in the G_FILE_ERROR domain.

save_a_copy(uri)
Parameters:

uri (str) – uri of file to save

Raises:

GLib.Error

Returns:

True, if the document was successfully saved

Return type:

bool

Saves a copy of the original self. Any change made in the document, such as form fields filled by the user, will not be saved. If error is set, False will be returned. Possible errors include those in the G_FILE_ERROR domain.
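The choice between save() and save_a_copy() comes down to whether edits should be written out. A minimal sketch, assuming the target is given as a filesystem path that must be turned into a URI; save_pdf and keep_changes are hypothetical names.

```python
from pathlib import Path


def save_pdf(document, path, keep_changes=True):
    """Write document to path, with or without changes made since loading."""
    uri = Path(path).resolve().as_uri()  # both methods expect a URI
    if keep_changes:
        return document.save(uri)      # includes form fills, annotations
    return document.save_a_copy(uri)   # original content only
```

Both methods are listed above as raising GLib.Error, so callers that need to recover from a failed write should wrap the call in a try/except block.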

save_to_fd(fd, include_changes)
Parameters:
  • fd (int) – a valid file descriptor open for writing

  • include_changes (bool) – whether to include user changes (e.g. form fills)

Raises:

GLib.Error

Returns:

True, if the document was successfully saved

Return type:

bool

Saves self. Any change made in the document, such as form fields filled or annotations added or modified, will be saved if include_changes is True, or discarded if include_changes is False.

Note that this function takes ownership of fd; you must not operate on it again, nor close it.

If error is set, False will be returned. Possible errors include those in the G_FILE_ERROR domain.

New in version 21.12.0.

set_author(author)
Parameters:

author (str) – A new author

Sets the document’s author. If author is None, Author entry is removed from the document’s Info dictionary.

New in version 0.46.

set_creation_date(creation_date)
Parameters:

creation_date (int) – A new creation date

Sets the document’s creation date. If creation_date is -1, CreationDate entry is removed from the document’s Info dictionary.

New in version 0.46.

set_creation_date_time(creation_datetime)
Parameters:

creation_datetime (GLib.DateTime or None) – A new creation GLib.DateTime

Sets the document’s creation date. If creation_datetime is None, CreationDate entry is removed from the document’s Info dictionary.

New in version 20.09.0.

set_creator(creator)
Parameters:

creator (str) – A new creator

Sets the document’s creator. If creator is None, Creator entry is removed from the document’s Info dictionary.

New in version 0.46.

set_keywords(keywords)
Parameters:

keywords (str) – New keywords

Sets the document’s keywords. If keywords is None, Keywords entry is removed from the document’s Info dictionary.

New in version 0.46.

set_modification_date(modification_date)
Parameters:

modification_date (int) – A new modification date

Sets the document’s modification date. If modification_date is -1, ModDate entry is removed from the document’s Info dictionary.

New in version 0.46.

set_modification_date_time(modification_datetime)
Parameters:

modification_datetime (GLib.DateTime or None) – A new modification GLib.DateTime

Sets the document’s modification date. If modification_datetime is None, ModDate entry is removed from the document’s Info dictionary.

New in version 20.09.0.

set_producer(producer)
Parameters:

producer (str) – A new producer

Sets the document’s producer. If producer is None, Producer entry is removed from the document’s Info dictionary.

New in version 0.46.

set_subject(subject)
Parameters:

subject (str) – A new subject

Sets the document’s subject. If subject is None, Subject entry is removed from the document’s Info dictionary.

New in version 0.46.

set_title(title)
Parameters:

title (str) – A new title

Sets the document’s title. If title is None, Title entry is removed from the document’s Info dictionary.

New in version 0.46.

Property Details

Poppler.Document.props.author
Name:

author

Type:

str

Default Value:

None

Flags:

READABLE, WRITABLE

The author of the document

Poppler.Document.props.creation_date
Name:

creation-date

Type:

int

Default Value:

-1

Flags:

DEPRECATED, READABLE, WRITABLE

The date the document was created as seconds since the Epoch, or -1

Deprecated since version 20.09.0: This will overflow in 2038. Use creation-datetime instead.
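The 2038 limit mentioned in the deprecation note can be checked with the standard library alone. This sketch assumes the value is held in a signed 32-bit integer, which is what the overflow date implies; INT32_MAX and last_representable are illustrative names.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer

# The last moment that fits in a signed 32-bit count of seconds.
last_representable = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(last_representable.isoformat())  # 2038-01-19T03:14:07+00:00
```

Dates after that moment cannot be expressed through creation-date, which is why creation-datetime is the recommended replacement.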

Poppler.Document.props.creation_datetime
Name:

creation-datetime

Type:

GLib.DateTime

Default Value:

None

Flags:

READABLE, WRITABLE

The GLib.DateTime the document was created.

New in version 20.09.0.

Poppler.Document.props.creator
Name:

creator

Type:

str

Default Value:

None

Flags:

READABLE, WRITABLE

The creator of the document. See also Poppler.Document.get_creator()

Poppler.Document.props.format
Name:

format

Type:

str

Default Value:

None

Flags:

READABLE

The PDF version as string. See also Poppler.Document.get_pdf_version_string()

Poppler.Document.props.format_major
Name:

format-major

Type:

int

Default Value:

1

Flags:

READABLE

The PDF major version number. See also Poppler.Document.get_pdf_version()

Poppler.Document.props.format_minor
Name:

format-minor

Type:

int

Default Value:

0

Flags:

READABLE

The PDF minor version number. See also Poppler.Document.get_pdf_version()

Poppler.Document.props.keywords
Name:

keywords

Type:

str

Default Value:

None

Flags:

READABLE, WRITABLE

The keywords associated to the document

Poppler.Document.props.linearized
Name:

linearized

Type:

bool

Default Value:

False

Flags:

READABLE

Whether the document is linearized. See also Poppler.Document.is_linearized()

Poppler.Document.props.metadata
Name:

metadata

Type:

str

Default Value:

None

Flags:

READABLE

Document metadata in XML format, or None

Poppler.Document.props.mod_date
Name:

mod-date

Type:

int

Default Value:

-1

Flags:

DEPRECATED, READABLE, WRITABLE

The date the document was most recently modified as seconds since the Epoch, or -1

Deprecated since version 20.09.0: This will overflow in 2038. Use mod-datetime instead.

Poppler.Document.props.mod_datetime
Name:

mod-datetime

Type:

GLib.DateTime

Default Value:

None

Flags:

READABLE, WRITABLE

The GLib.DateTime the document was most recently modified.

New in version 20.09.0.

Poppler.Document.props.page_layout
Name:

page-layout

Type:

Poppler.PageLayout

Default Value:

Poppler.PageLayout.UNSET

Flags:

READABLE

The page layout that should be used when the document is opened

Poppler.Document.props.page_mode
Name:

page-mode

Type:

Poppler.PageMode

Default Value:

Poppler.PageMode.UNSET

Flags:

READABLE

The mode that should be used when the document is opened

Poppler.Document.props.permissions
Name:

permissions

Type:

Poppler.Permissions

Default Value:

Poppler.Permissions.OK_TO_PRINT | Poppler.Permissions.OK_TO_MODIFY | Poppler.Permissions.OK_TO_COPY | Poppler.Permissions.OK_TO_ADD_NOTES | Poppler.Permissions.OK_TO_FILL_FORM | Poppler.Permissions.OK_TO_EXTRACT_CONTENTS | Poppler.Permissions.OK_TO_ASSEMBLE | Poppler.Permissions.OK_TO_PRINT_HIGH_RESOLUTION | Poppler.Permissions.FULL

Flags:

READABLE

Flags specifying which operations are permitted when the document is opened

Poppler.Document.props.print_duplex
Name:

print-duplex

Type:

Poppler.PrintDuplex

Default Value:

Poppler.PrintDuplex.NONE

Flags:

READABLE

Duplex Viewer Preference

New in version 0.80.

Poppler.Document.props.print_n_copies
Name:

print-n-copies

Type:

int

Default Value:

1

Flags:

READABLE

Suggested number of copies to be printed for this document

New in version 0.80.

Poppler.Document.props.print_scaling
Name:

print-scaling

Type:

Poppler.PrintScaling

Default Value:

Poppler.PrintScaling.APP_DEFAULT

Flags:

READABLE

Print Scaling Viewer Preference

New in version 0.73.

Poppler.Document.props.producer
Name:

producer

Type:

str

Default Value:

None

Flags:

READABLE, WRITABLE

The producer of the document. See also Poppler.Document.get_producer()

Poppler.Document.props.subject
Name:

subject

Type:

str

Default Value:

None

Flags:

READABLE, WRITABLE

The subject of the document

Poppler.Document.props.subtype
Name:

subtype

Type:

Poppler.PDFSubtype

Default Value:

Poppler.PDFSubtype.UNSET

Flags:

READABLE

Document PDF subtype type

Poppler.Document.props.subtype_conformance
Name:

subtype-conformance

Type:

Poppler.PDFConformance

Default Value:

Poppler.PDFConformance.UNSET

Flags:

READABLE

Document PDF subtype conformance

Poppler.Document.props.subtype_part
Name:

subtype-part

Type:

Poppler.PDFPart

Default Value:

Poppler.PDFPart.UNSET

Flags:

READABLE

Document PDF subtype part

Poppler.Document.props.subtype_string
Name:

subtype-string

Type:

str

Default Value:

None

Flags:

READABLE

Document PDF subtype. See also Poppler.Document.get_pdf_subtype_string()

Poppler.Document.props.title
Name:

title

Type:

str

Default Value:

None

Flags:

READABLE, WRITABLE

The document’s title or None

Poppler.Document.props.viewer_preferences
Name:

viewer-preferences

Type:

Poppler.ViewerPreferences

Default Value:

Poppler.ViewerPreferences.UNSET

Flags:

READABLE

Viewer Preferences