Soup.URI
Fields
Name | Type | Access | Description |
---|---|---|---|
fragment | str | r/w | a fragment identifier within path, or None |
host | str | r/w | the hostname or IP address, or None |
password | str | r/w | a password, or None |
path | str | r/w | the path on host |
port | int | r/w | the port number on host |
query | str | r/w | a query for path, or None |
scheme | str | r/w | the URI scheme (e.g., “http”) |
user | str | r/w | a username, or None |
Details
- class Soup.URI
A Soup.URI represents a (parsed) URI.
Soup.URI supports RFC 3986 (URI Generic Syntax), and can parse any valid URI. However, libsoup only uses “http” and “https” URIs internally; you can use SOUP_URI_VALID_FOR_HTTP() to test if a Soup.URI is a valid HTTP URI.
scheme will always be set in any URI. It is an interned string and is always all lowercase. (If you parse a URI with a non-lowercase scheme, it will be converted to lowercase.) The macros SOUP_URI_SCHEME_HTTP and SOUP_URI_SCHEME_HTTPS provide the interned values for “http” and “https” and can be compared against URI scheme values.
user and password are parsed as defined in the older URI specs (i.e., separated by a colon; RFC 3986 only talks about a single “userinfo” field). Note that password is not included in the output of Soup.URI.to_string(). libsoup does not normally use these fields; authentication is handled via Soup.Session signals.
host contains the hostname, and port the port specified in the URI. If the URI doesn’t contain a hostname, host will be None, and if it doesn’t specify a port, port may be 0. However, for “http” and “https” URIs, host is guaranteed to be non-None (trying to parse an http URI with no host will return None), and port will always be non-0 (because libsoup knows the default value to use when it is not specified in the URI).
path is always non-None. For http/https URIs, path will never be an empty string either; if the input URI has no path, the parsed Soup.URI will have a path of “/”.
query and fragment are optional for all URI types. Soup.form_decode() may be useful for parsing query.
Note that path, query, and fragment may contain %-encoded characters. Soup.URI.new() calls Soup.URI.normalize() on them, but not Soup.URI.decode(). This is necessary to ensure that Soup.URI.to_string() will generate a URI that has exactly the same meaning as the original. (In theory, Soup.URI should leave user, password, and host partially-encoded as well, but this would be more annoying than useful.)
- classmethod decode(part)
Fully %-decodes part.
In the past, this would return None if part contained invalid percent-encoding, but now it just ignores the problem (as Soup.URI.new() already did).
- classmethod encode(part, escape_extra)
- Returns:
the encoded URI part
- Return type:
str
This %-encodes the given URI part and returns the escaped version in allocated memory, which the caller must free when it is done.
- classmethod new(uri_string)
Parses an absolute URI.
You can also pass None for uri_string if you want to get back an “empty” Soup.URI that you can fill in by hand. (You will need to call at least Soup.URI.set_scheme() and Soup.URI.set_path(), since those fields are required.)
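A minimal sketch of parsing with Soup.URI.new(), assuming PyGObject and the libsoup 2.4 typelib are installed. The helper name describe_uri is hypothetical; it returns None both when the bindings are unavailable and when parsing fails, so callers that need to tell the two apart should handle the import themselves.

```python
def describe_uri(uri_string):
    """Parse uri_string with Soup.URI and return its fields as a dict.

    Returns None if the libsoup 2.4 bindings cannot be loaded or if
    Soup.URI.new() rejects the string.
    """
    try:
        import gi
        gi.require_version("Soup", "2.4")
        from gi.repository import Soup
    except (ImportError, ValueError):
        return None  # assumption: treat missing bindings as "no result"

    uri = Soup.URI.new(uri_string)
    if uri is None:
        return None

    return {
        "scheme": uri.get_scheme(),
        "host": uri.get_host(),
        "port": uri.get_port(),
        "path": uri.get_path(),
        "query": uri.get_query(),
        "fragment": uri.get_fragment(),
    }


if __name__ == "__main__":
    # An http URI with an uppercase scheme, no explicit port and no path.
    print(describe_uri("HTTP://example.com?x=1"))
```

Going by the class description above, the scheme should come back lowercased, the port filled in with the http default, and the path set to “/”.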
- classmethod new_with_base(base, uri_string)
- Returns:
a parsed Soup.URI.
- Return type:
Soup.URI
Parses uri_string relative to base.
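To show what parsing relative to a base means, here is a small sketch that uses urllib.parse.urljoin from the Python standard library as a stand-in. It is not libsoup code, and the helper name resolve and the example URLs are made up for illustration; it only demonstrates RFC 3986 style reference resolution.

```python
from urllib.parse import urljoin


def resolve(base, uri_string):
    """Resolve uri_string against base (standard-library stand-in)."""
    return urljoin(base, uri_string)


BASE = "http://example.com/docs/guide/index.html"

if __name__ == "__main__":
    for reference in ("intro.html", "../api/", "/top", "?page=2",
                      "https://other.example/x"):
        print(reference, "->", resolve(BASE, reference))
```

A relative path replaces the last segment of the base path, a leading “/” replaces the whole path, and an absolute reference ignores the base entirely.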
- classmethod normalize(part, unescape_extra)
- Returns:
the normalized URI part
- Return type:
str
%-decodes any “unreserved” characters (or characters in unescape_extra) in part, and %-encodes any non-ASCII characters, spaces, and non-printing characters in part.
“Unreserved” characters are those that are not allowed to be used for punctuation according to the URI spec. For example, letters are unreserved, so Soup.URI.normalize() will turn http://example.com/foo/b%61r into http://example.com/foo/bar, which is guaranteed to mean the same thing. However, “/” is “reserved”, so http://example.com/foo%2Fbar would not be changed, because it might mean something different to the server.
In the past, this would return None if part contained invalid percent-encoding, but now it just ignores the problem (as Soup.URI.new() already did).
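The rule described above can be sketched in pure Python. This is an illustrative re-implementation written for this page, not libsoup's code: the name normalize_part is hypothetical, the unreserved set is taken from RFC 3986 (letters, digits, “-”, “.”, “_”, “~”), and encoding non-ASCII characters as UTF-8 bytes is an assumption.

```python
import re
import string

# RFC 3986 "unreserved" characters.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_part(part, unescape_extra=""):
    """Decode escapes of unreserved characters; encode unsafe characters."""
    allowed = UNRESERVED | set(unescape_extra)

    def decode_if_allowed(match):
        char = chr(int(match.group(1), 16))
        # Leave escapes of reserved characters (such as %2F) untouched.
        return char if char in allowed else match.group(0)

    # Malformed escapes such as "%zz" do not match and pass through as-is.
    decoded = _ESCAPE.sub(decode_if_allowed, part)

    pieces = []
    for char in decoded:
        if char == " " or ord(char) > 127 or not char.isprintable():
            # Assumption: percent-encode the character's UTF-8 bytes.
            pieces.extend("%{:02X}".format(b) for b in char.encode("utf-8"))
        else:
            pieces.append(char)
    return "".join(pieces)


if __name__ == "__main__":
    print(normalize_part("http://example.com/foo/b%61r"))
    print(normalize_part("http://example.com/foo%2Fbar"))
```

The first call decodes %61 because “a” is unreserved; the second leaves %2F alone because “/” is reserved.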
- copy()
- Returns:
a copy of self, which must be freed with Soup.URI.free()
- Return type:
Soup.URI
Copies self.
- copy_host()
Makes a copy of self, considering only the protocol, host, and port.
New in version 2.28.
- equal(uri2)
Tests whether or not self and uri2 are equal in all parts.
- free()
Frees self.
- get_fragment()
- Returns:
self's fragment.
- Return type:
str
Gets self's fragment.
New in version 2.32.
- get_password()
- Returns:
self's password.
- Return type:
str
Gets self's password.
New in version 2.32.
- host_equal(v2)
- Returns:
whether or not the URIs are equal in scheme, host, and port.
- Return type:
bool
Compares self and v2, considering only the scheme, host, and port.
New in version 2.28.
- host_hash()
- Returns:
a hash
- Return type:
int
Hashes self, considering only the scheme, host, and port.
New in version 2.28.
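Comparing and hashing on scheme, host, and port only is useful for grouping URIs by the server they point at. The sketch below illustrates the idea with urllib.parse rather than libsoup; host_key and DEFAULT_PORTS are hypothetical names, and treating the host case-insensitively and filling in default ports are assumptions of this sketch.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: only these two defaults are known to the sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_key(uri_string):
    """Return a hashable (scheme, host, port) key for uri_string."""
    parts = urlsplit(uri_string)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme, 0)
    return (scheme, host, port)


def group_by_host(uri_strings):
    """Group URIs that share a scheme, host, and port."""
    groups = defaultdict(list)
    for uri_string in uri_strings:
        groups[host_key(uri_string)].append(uri_string)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_host([
        "http://example.com/a",
        "http://EXAMPLE.com:80/b?x=1",
        "https://example.com/a",
    ]))
```

Because the key is a plain tuple, it can serve both as the equality test and as the dictionary key, which is the same pairing that host_equal() and host_hash() provide.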
- set_fragment(fragment)
Sets self's fragment to fragment.
- set_host(host)
Sets self's host to host.
If host is an IPv6 IP address, it should not include the brackets required by the URI syntax; they will be added automatically when converting self to a string.
http and https URIs should not have a None host.
- set_password(password)
Sets self's password to password.
- set_port(port)
- Parameters:
port (int) – the port, or 0
Sets self's port to port. If port is 0, self will not have an explicitly-specified port.
- set_query_from_form(form)
- Parameters:
form ({str: str}) – a GLib.HashTable containing HTML form information
Sets self's query to the result of encoding form according to the HTML form rules. See Soup.form_encode_hash() for more information.
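As a rough illustration of the HTML form rules, the sketch below encodes a dictionary with urllib.parse.urlencode from the standard library. This is a stand-in, not a call into libsoup, and query_from_form is a hypothetical name; the exact output of Soup.form_encode_hash() (for example the order of the pairs) may differ.

```python
from urllib.parse import urlencode


def query_from_form(form):
    """Encode a {str: str} mapping as application/x-www-form-urlencoded."""
    return urlencode(form)


if __name__ == "__main__":
    # Spaces become "+", and "&" / "=" inside values are percent-encoded.
    print(query_from_form({"q": "soup uri", "filter": "a&b=c"}))
```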
- set_scheme(scheme)
- Parameters:
scheme (str) – the URI scheme
Sets self's scheme to scheme. This will also set self's port to the default port for scheme, if known.
- to_string(just_path_and_query)
- Parameters:
just_path_and_query (bool) – if True, output just the path and query portions
- Returns:
a string representing self, which the caller must free.
- Return type:
str
Returns a string representing self.
If just_path_and_query is True, this concatenates the path and query together. That is, it constructs the string that would be needed in the Request-Line of an HTTP request for self.
Note that the output will never contain a password, even if self does.
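The two output modes can be illustrated with a small standard-library sketch. It is not libsoup's implementation: to_string_sketch is a hypothetical helper built on urllib.parse, and defaulting an empty path to “/” and re-adding IPv6 brackets are assumptions chosen to mirror the behaviour described on this page.

```python
from urllib.parse import urlsplit, urlunsplit


def to_string_sketch(uri_string, just_path_and_query=False):
    """Rebuild uri_string without its password, or return path plus query."""
    parts = urlsplit(uri_string)
    path = parts.path or "/"  # assumption: empty path becomes "/"

    if just_path_and_query:
        # The form used in the Request-Line of an HTTP request.
        return path + ("?" + parts.query if parts.query else "")

    host = parts.hostname or ""
    if ":" in host:
        host = "[" + host + "]"  # IPv6 literals need brackets in a URI
    netloc = host
    if parts.username:
        netloc = parts.username + "@" + netloc  # the password is dropped
    if parts.port is not None:
        netloc += ":" + str(parts.port)
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    uri = "http://alice:secret@example.com:8080/a/b?x=1#top"
    print(to_string_sketch(uri))
    print(to_string_sketch(uri, just_path_and_query=True))
```

The fragment is left out of the path-and-query form because it is never sent to the server.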