Downloader API

Downloading

wpextract.WPDownloader

WPDownloader(
    target: str,
    out_path: Path,
    data_types: list[str],
    session: Optional[RequestSession] = None,
    json_prefix: Optional[str] = None,
)

Manages the download of data from a WordPress site.

PARAMETER DESCRIPTION
target

the target WordPress site URL

TYPE: str

out_path

the output path for the downloaded data

TYPE: Path

data_types

list of data types to download

TYPE: list[str]

session

request session to use. If not provided, one is created with the default constructor.

TYPE: Optional[RequestSession] DEFAULT: None

json_prefix

prefix to prepend to JSON file names

TYPE: Optional[str] DEFAULT: None

download

download() -> None

Download and export the requested data lists.

download_media_files

download_media_files(
    session: RequestSession, dest: Path
) -> None

Download site media files.

PARAMETER DESCRIPTION
session

the request session to use

TYPE: RequestSession

dest

destination directory for media

TYPE: Path
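A minimal sketch of how the constructor and download() might be combined, based only on the signatures documented above. The data type names "posts" and "pages" are placeholders (this page does not list the accepted values), and the helper run_download is hypothetical, not part of wpextract.

```python
from pathlib import Path

# Hypothetical values: this reference does not list the accepted data type
# names, so "posts" and "pages" are placeholders.
DATA_TYPES = ["posts", "pages"]


def run_download(target: str, out_dir: Path) -> None:
    """Download the selected data types from `target` into `out_dir`."""
    # Imported lazily so this module can be loaded without wpextract installed.
    from wpextract import WPDownloader

    # Created up front as a precaution; whether WPDownloader creates the
    # directory itself is not stated in this reference.
    out_dir.mkdir(parents=True, exist_ok=True)

    downloader = WPDownloader(
        target=target,
        out_path=out_dir,
        data_types=DATA_TYPES,
        # session is omitted, so a RequestSession is created with defaults
    )
    downloader.download()


# Example call (performs network requests):
# run_download("https://example.org", Path("out"))
```

Leaving session unset keeps the sketch short; pass a configured RequestSession when the target site needs rate limiting, cookies, or authentication.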

Configuring Request Behaviour

wpextract.download.RequestSession

RequestSession(
    proxy: Optional[str] = None,
    cookies: Optional[str] = None,
    authorization: Optional[AuthorizationType] = None,
    timeout: Optional[float] = 30,
    wait: Optional[float] = None,
    random_wait: bool = False,
    max_retries: int = 10,
    backoff_factor: float = 0.1,
    max_redirects: int = 20,
    user_agent: Optional[str] = None,
)

Manages HTTP requests and their behaviour.

PARAMETER DESCRIPTION
proxy

a proxy server string to use for HTTP and/or HTTPS connections

TYPE: Optional[str] DEFAULT: None

cookies

a string in the format of the Cookie header

TYPE: Optional[str] DEFAULT: None

authorization

a (login, password) tuple or requests.auth.HTTPBasicAuth for basic authentication, or requests.auth.HTTPDigestAuth for digest authentication

TYPE: Optional[AuthorizationType] DEFAULT: None

timeout

maximum time in seconds to wait for a response before giving up

TYPE: Optional[float] DEFAULT: 30

wait

wait time in seconds between requests, None to not wait

TYPE: Optional[float] DEFAULT: None

random_wait

If true, the wait time between requests is multiplied by a random factor between 0.5 and 1.5

TYPE: bool DEFAULT: False

max_retries

the maximum number of retries before failing

TYPE: int DEFAULT: 10

backoff_factor

backoff factor applied to the delay between successive retries

TYPE: float DEFAULT: 0.1

max_redirects

maximum number of redirects to follow

TYPE: int DEFAULT: 20

user_agent

User agent string to send with requests. If None, DEFAULT_UA is used.

TYPE: Optional[str] DEFAULT: None
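The timing parameters are easier to reason about with numbers. The sketch below is an illustration, not wpextract's implementation: next_wait follows the wait and random_wait descriptions above, while retry_delays assumes a common exponential backoff formula that this page does not confirm. Both function names are hypothetical.

```python
import random
from typing import Optional


def next_wait(wait: Optional[float], random_wait: bool = False) -> float:
    """Seconds to pause before the next request."""
    if wait is None:
        # None means no pause between requests.
        return 0.0
    if random_wait:
        # Scale by a random factor between 0.5 and 1.5, as described above.
        return wait * random.uniform(0.5, 1.5)
    return wait


def retry_delays(max_retries: int, backoff_factor: float) -> list[float]:
    """Delay before each retry under an ASSUMED exponential backoff scheme."""
    # Assumption: the delay before retry n (1-based) is
    # backoff_factor * 2 ** (n - 1). The real formula may differ.
    return [backoff_factor * 2 ** (n - 1) for n in range(1, max_retries + 1)]


if __name__ == "__main__":
    print(next_wait(2.0, random_wait=True))  # somewhere between 1.0 and 3.0
    print(retry_delays(4, 0.1))
```

Under this assumed formula, the delays double with each retry, so a small backoff_factor still produces long pauses once the retry count approaches the default max_retries of 10.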

wpextract.download.AuthorizationType module-attribute

AuthorizationType = Union[
    tuple[str, str], HTTPBasicAuth, HTTPDigestAuth
]
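As the alias shows, a plain (login, password) tuple is accepted wherever an AuthorizationType is expected. The sketch below builds a RequestSession with such a tuple and a Cookie-header-style string. The credentials, cookie values, and the helpers cookie_header and make_session are placeholders for illustration, not part of wpextract.

```python
def cookie_header(cookies: dict[str, str]) -> str:
    """Join name/value pairs into a Cookie-header-style string."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Placeholder credentials and cookie values, for illustration only.
SESSION_OPTIONS = {
    "authorization": ("editor", "s3cret"),  # the tuple[str, str] form
    "cookies": cookie_header({"lang": "en", "consent": "yes"}),
    "wait": 2.0,          # pause about two seconds between requests
    "random_wait": True,  # vary each pause by a random factor
}


def make_session():
    """Build a RequestSession from the options above."""
    # Imported lazily so this module can be loaded without wpextract installed.
    from wpextract.download import RequestSession

    return RequestSession(**SESSION_OPTIONS)
```

The resulting session can then be passed as the session argument of WPDownloader.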

wpextract.download.requestsession.DEFAULT_UA module-attribute

DEFAULT_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"