Downloader API
Downloading
wpextract.WPDownloader
WPDownloader(
    target: str,
    out_path: Path,
    data_types: list[str],
    session: Optional[RequestSession] = None,
    json_prefix: Optional[str] = None,
)
Manages the download of data from a WordPress site.
| PARAMETER | DESCRIPTION |
|---|---|
| `target` | the target WordPress site URL. TYPE: `str` |
| `out_path` | the output path for the downloaded data. TYPE: `Path` |
| `data_types` | set of data types to download. TYPE: `list[str]` |
| `session` | request session. Will be created from default constructor if not provided. TYPE: `Optional[RequestSession]` |
| `json_prefix` | prefix to prepend to JSON file names. TYPE: `Optional[str]` |
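The constructor arguments above can be assembled as in the following minimal sketch. The helper names `downloader_kwargs` and `make_downloader`, the example URL, the output directory, and the data type names `"posts"` and `"pages"` are assumptions made for illustration; only the parameter names and the `wpextract.WPDownloader` import path come from the reference above.

```python
from pathlib import Path
from typing import Any, Optional


def downloader_kwargs(
    target: str,
    out_dir: str,
    data_types: list[str],
    json_prefix: Optional[str] = None,
) -> dict[str, Any]:
    """Collect keyword arguments matching the documented WPDownloader signature."""
    return {
        "target": target,
        "out_path": Path(out_dir),       # the signature expects a Path, not a str
        "data_types": list(data_types),
        "session": None,                 # None: a default session is created
        "json_prefix": json_prefix,
    }


def make_downloader(**kwargs: Any):
    """Build a WPDownloader; imported lazily so this sketch loads without the package."""
    from wpextract import WPDownloader

    return WPDownloader(**kwargs)


# Hypothetical values: the site URL and data type names are placeholders.
kwargs = downloader_kwargs(
    "https://example.org",
    "out/example",
    ["posts", "pages"],
    json_prefix="example-",
)
```

Calling `make_downloader(**kwargs)` would then construct the downloader, provided the package is installed.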
download_media_files
download_media_files(
    session: RequestSession,
    dest: Path,
) -> None
Download site media files.
| PARAMETER | DESCRIPTION |
|---|---|
| `session` | the request session to use. TYPE: `RequestSession` |
| `dest` | destination directory for media. TYPE: `Path` |
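A call to `download_media_files` might be wrapped as in the sketch below. The helper `fetch_media` is hypothetical, and creating the destination directory beforehand is an assumption; the reference above does not say whether the method creates it. Any object exposing the documented method can be passed as `downloader`.

```python
from pathlib import Path


def fetch_media(downloader, session, dest_dir) -> Path:
    """Ensure the destination exists, then delegate to download_media_files."""
    dest = Path(dest_dir)
    # ASSUMPTION: the directory is created up front in case the method expects it.
    dest.mkdir(parents=True, exist_ok=True)
    downloader.download_media_files(session, dest)
    return dest
```

Keeping the helper duck-typed means it can be exercised with a stand-in object before being pointed at a real site.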
Configuring Request Behaviour
wpextract.download.RequestSession
RequestSession(
    proxy: Optional[str] = None,
    cookies: Optional[str] = None,
    authorization: Optional[AuthorizationType] = None,
    timeout: Optional[float] = 30,
    wait: Optional[float] = None,
    random_wait: bool = False,
    max_retries: int = 10,
    backoff_factor: float = 0.1,
    max_redirects: int = 20,
    user_agent: Optional[str] = None,
)
Manages HTTP requests and their behaviour.
| PARAMETER | DESCRIPTION |
|---|---|
| `proxy` | a dict containing a proxy server string for HTTP and/or HTTPS connection. TYPE: `Optional[str]` |
| `cookies` | a string in the format of the Cookie header. TYPE: `Optional[str]` |
| `authorization` | a tuple containing login and password, or an `HTTPBasicAuth` or `HTTPDigestAuth` object (see `AuthorizationType`). TYPE: `Optional[AuthorizationType]` |
| `timeout` | maximum time in seconds to wait for a response before giving up. TYPE: `Optional[float]` |
| `wait` | wait time in seconds between requests, `None` to not wait. TYPE: `Optional[float]` |
| `random_wait` | if true, the wait time between requests is multiplied by a random factor between 0.5 and 1.5. TYPE: `bool` |
| `max_retries` | the maximum number of retries before failing. TYPE: `int` |
| `backoff_factor` | factor to wait between successive retries. TYPE: `float` |
| `max_redirects` | maximum number of redirects to follow. TYPE: `int` |
| `user_agent` | User agent to use for requests. Set to … TYPE: `Optional[str]` |
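The timing parameters can be illustrated with a small sketch. `next_wait` follows the documented behaviour of `wait` and `random_wait`. `backoff_delays` is an assumption: the reference only says `backoff_factor` is a factor to wait between retries, so the exponential schedule shown here is one common interpretation, not confirmed behaviour of `RequestSession`. Both function names are hypothetical.

```python
import random
from typing import Optional


def next_wait(wait: Optional[float], random_wait: bool) -> float:
    """Seconds to pause before the next request."""
    if wait is None:
        return 0.0  # None means no waiting between requests
    if random_wait:
        # Documented: multiply the wait by a random factor between 0.5 and 1.5.
        return wait * random.uniform(0.5, 1.5)
    return wait


def backoff_delays(max_retries: int, backoff_factor: float) -> list[float]:
    """ASSUMPTION: exponential backoff, with the delay doubling on each retry."""
    return [backoff_factor * (2 ** attempt) for attempt in range(max_retries)]
```

With `wait=2.0` and `random_wait=True`, each pause would fall between 1 and 3 seconds, which spreads requests out without changing the average rate.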
wpextract.download.AuthorizationType
module-attribute
AuthorizationType = Union[
    tuple[str, str], HTTPBasicAuth, HTTPDigestAuth
]
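The three forms allowed by `AuthorizationType` could be produced as in the sketch below. The helper `make_authorization` and its `scheme` argument are hypothetical, and it is an assumption that `HTTPBasicAuth` and `HTTPDigestAuth` are the classes from `requests.auth`; the alias above does not state where they are imported from.

```python
VALID_SCHEMES = ("tuple", "basic", "digest")


def make_authorization(login: str, password: str, scheme: str = "tuple"):
    """Return credentials in one of the three AuthorizationType forms."""
    if scheme not in VALID_SCHEMES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    if scheme == "tuple":
        return (login, password)  # plain (login, password) tuple
    # ASSUMPTION: the auth classes come from requests.auth.
    # Imported lazily so the tuple form works without requests installed.
    from requests.auth import HTTPBasicAuth, HTTPDigestAuth

    if scheme == "basic":
        return HTTPBasicAuth(login, password)
    return HTTPDigestAuth(login, password)
```

The returned value would then be passed as the `authorization` argument of `RequestSession`.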
wpextract.download.requestsession.DEFAULT_UA
module-attribute
DEFAULT_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"