Module rok4.storage

Provide functions to read or write data

Available storage types are:

  • S3 (paths are prefixed with s3://)
  • CEPH (paths are prefixed with ceph://)
  • FILE (paths are prefixed with file://, but this is also the default interpretation of unprefixed paths)
  • HTTP (paths are prefixed with http://)
  • HTTPS (paths are prefixed with https://)

Depending on the function, not all storage types are necessarily available.

Reads use an LRU cache with a TTL. It can be configured with environment variables:

  • ROK4_READING_LRU_CACHE_SIZE: number of cached elements. Default 64. Set 0 or a negative integer for an unbounded cache. A power of two makes the cache more efficient.
  • ROK4_READING_LRU_CACHE_TTL: validity duration of a cached element, in seconds. Default 300. Set 0 or a negative integer for a cache without expiration.

To disable the cache (always read data from storage), set ROK4_READING_LRU_CACHE_SIZE to 1 and ROK4_READING_LRU_CACHE_TTL to 1.
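The size/TTL semantics above can be illustrated with a small LRU cache that also expires entries. This is a sketch only, not rok4's own cache. `TTLLRUCache` and `cache_from_env` are hypothetical names. Only the two environment variables and their documented defaults come from the text above.

```python
import os
import time
from collections import OrderedDict


class TTLLRUCache:
    """Illustrative LRU cache with a per-entry time-to-live (hypothetical, not rok4's class)."""

    def __init__(self, maxsize: int, ttl: float):
        self.maxsize = maxsize  # <= 0 means unbounded
        self.ttl = ttl  # <= 0 means entries never expire
        self._data = OrderedDict()  # key -> (expiry timestamp or None, value)

    def get(self, key, loader):
        now = time.monotonic()
        if key in self._data:
            expiry, value = self._data[key]
            if expiry is None or now < expiry:
                self._data.move_to_end(key)  # mark as most recently used
                return value
            del self._data[key]  # expired: drop it and reload below
        value = loader(key)
        expiry = now + self.ttl if self.ttl > 0 else None
        self._data[key] = (expiry, value)  # new keys go to the most recent end
        if self.maxsize > 0:
            while len(self._data) > self.maxsize:
                self._data.popitem(last=False)  # evict the least recently used entry
        return value


def cache_from_env() -> TTLLRUCache:
    # Defaults mirror the documented ones: 64 elements, 300 seconds
    size = int(os.environ.get("ROK4_READING_LRU_CACHE_SIZE", "64"))
    ttl = int(os.environ.get("ROK4_READING_LRU_CACHE_TTL", "300"))
    return TTLLRUCache(size, ttl)
```

With a size of 1, any read of a different key evicts the previous entry. With a TTL of 1 second, a repeated read of the same key soon reloads it. This is why both settings together approximate a disabled cache.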

Using CEPH storage requires the following environment variables:

  • ROK4_CEPH_CONFFILE
  • ROK4_CEPH_USERNAME
  • ROK4_CEPH_CLUSTERNAME

Using S3 storage requires the following environment variables:

  • ROK4_S3_KEY
  • ROK4_S3_SECRETKEY
  • ROK4_S3_URL

To use several S3 clusters, each environment variable has to contain a comma-separated list, all lists having the same number of elements.

Example with 2 S3 clusters:

To specify which cluster to use, the bucket name should be bucket_name@s3.storage.fr or bucket_name@s4.storage.fr. If no host is given in the bucket name (no @), the first S3 cluster is used.
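The multi-cluster convention can be pictured as follows. The three comma-separated lists are zipped into one entry per cluster, keyed by host. A bucket name is then resolved against those hosts. This is a hypothetical sketch, not rok4's code. `load_s3_clusters` and `resolve_bucket` are invented names, and the key values are placeholders. Only the variable names, the host-without-protocol rule and the first-cluster default come from this page.

```python
import os
from typing import Dict, Tuple


def load_s3_clusters() -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: map each cluster host to its credentials and URL."""
    keys = os.environ["ROK4_S3_KEY"].split(",")
    secrets = os.environ["ROK4_S3_SECRETKEY"].split(",")
    urls = os.environ["ROK4_S3_URL"].split(",")
    if not (len(keys) == len(secrets) == len(urls)):
        raise ValueError("ROK4_S3_* lists must have the same number of elements")
    clusters = {}
    for key, secret, url in zip(keys, secrets, urls):
        host = url.split("://", 1)[-1].rstrip("/")  # cluster name = host, without protocol
        clusters[host] = {"key": key, "secret_key": secret, "url": url}
    return clusters


def resolve_bucket(bucket: str, clusters: Dict[str, Dict[str, str]]) -> Tuple[str, str]:
    """Return (bucket name, cluster host); the first cluster is used when there is no '@'."""
    if "@" in bucket:
        name, host = bucket.split("@", 1)
        if host not in clusters:
            raise KeyError(f"Unknown S3 cluster: {host}")
        return name, host
    return bucket, next(iter(clusters))  # dicts keep insertion order


# Placeholder credentials for two clusters (hypothetical values)
os.environ["ROK4_S3_KEY"] = "KEY1,KEY2"
os.environ["ROK4_S3_SECRETKEY"] = "SKEY1,SKEY2"
os.environ["ROK4_S3_URL"] = "https://s3.storage.fr,https://s4.storage.fr"

clusters = load_s3_clusters()
print(resolve_bucket("bucket_name@s4.storage.fr", clusters))  # ('bucket_name', 's4.storage.fr')
print(resolve_bucket("bucket_name", clusters))  # ('bucket_name', 's3.storage.fr')
```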

Functions

def copy(from_path: str, to_path: str, from_md5: str = None) ‑> None

Copy a file or object to a file or object location. If an MD5 sum is provided, it is compared to the sum computed after the copy.

Args

from_path : str
source file/object path, to copy
to_path : str
destination file/object path
from_md5 : str, optional
MD5 sum of the source, recomputed on the destination after the copy and checked. Defaults to None.

Raises

StorageError
Copy issue
MissingEnvironmentError
Missing object storage information
NotImplementedError
Storage type not handled
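The MD5 check described for `copy` can be sketched for the FILE case alone. The file is copied first, then the destination sum is recomputed and compared with `from_md5`. `copy_file_checked` is a hypothetical name. `RuntimeError` stands in for rok4's `StorageError`, which is not imported here.

```python
import hashlib
import shutil
from typing import Optional


def copy_file_checked(from_path: str, to_path: str, from_md5: Optional[str] = None) -> None:
    """Hypothetical FILE-only analogue of copy(): copy, then check the MD5 sum if one is given."""
    shutil.copyfile(from_path, to_path)
    if from_md5 is None:
        return
    # Recompute the sum on the destination and compare it with the expected one
    with open(to_path, "rb") as f:
        to_md5 = hashlib.md5(f.read()).hexdigest()
    if to_md5 != from_md5:
        # rok4 documents a StorageError here; RuntimeError is a stand-in
        raise RuntimeError(f"MD5 mismatch after copy: {to_md5} != {from_md5}")
```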
def disconnect_ceph_clients() ‑> None

Clean CEPH clients

def disconnect_s3_clients() ‑> None

Clean S3 clients

def exists(path: str) ‑> bool

Does the file or object exist?

Args

path : str
path of file/object to test

Raises

MissingEnvironmentError
Missing object storage information
StorageError
Storage read issue
NotImplementedError
Storage type not handled

Returns

bool
True if the file/object exists, False otherwise
def get_data_binary(path: str, range: Tuple[int, int] = None) ‑> str

Load data into a binary string

This function uses an LRU cache, with a default TTL of 5 minutes (see ROK4_READING_LRU_CACHE_TTL).

Args

path : str
path to data
range : Tuple[int, int], optional
offset and size, to make a partial read. Defaults to None.

Raises

MissingEnvironmentError
Missing object storage information
StorageError
Storage read issue
FileNotFoundError
File or object does not exist
NotImplementedError
Storage type not handled

Returns

str
Data binary content
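The `(offset, size)` pair taken by `range` can be illustrated on a local file. The read seeks to `offset` and returns at most `size` bytes. This is a hypothetical FILE-only sketch without caching. The parameter is named `byte_range` here to avoid shadowing Python's built-in `range`. The function returns `bytes`, which is what reading in binary mode yields in Python 3.

```python
from typing import Optional, Tuple


def read_file_binary(path: str, byte_range: Optional[Tuple[int, int]] = None) -> bytes:
    """Hypothetical FILE-only sketch: full read, or `size` bytes starting at `offset`."""
    with open(path, "rb") as f:
        if byte_range is None:
            return f.read()
        offset, size = byte_range
        f.seek(offset)  # jump to the first requested byte
        return f.read(size)  # may return fewer bytes near end of file
```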
def get_data_str(path: str) ‑> str

Load full data into a string

Args

path : str
path to data

Raises

MissingEnvironmentError
Missing object storage informations
StorageError
Storage read issue
FileNotFoundError
File or object does not exist
NotImplementedError
Storage type not handled

Returns

str
Data content
def get_infos_from_path(path: str) ‑> Tuple[StorageType, str, str, str]

Extract the storage type, the unprefixed path, the container and the basename from a path (FILE storage by default)

For FILE storage, the container is the directory and the basename is the file name.

For an object storage (CEPH or S3), the container is the bucket or the pool and the basename is the object name. For an S3 bucket, the format can be bucket_name@cluster_host to use several clusters. The cluster name is the host (without protocol).

Args

path : str
path to analyse

Returns

Tuple[StorageType, str, str, str]
storage type, unprefixed path, the container and the basename
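The decomposition above can be sketched for the FILE, S3 and CEPH cases. The sketch assumes the prefixes listed at the top of this page and splits object paths at the first slash. `StorageType` below is a stand-in enum and `infos_from_path` is a hypothetical name. Neither is rok4's implementation, and any `@host` suffix is simply left in the container.

```python
import os
from enum import Enum
from typing import Tuple


class StorageType(Enum):
    # Stand-in for rok4's StorageType (assumption: values are the path prefixes)
    FILE = "file://"
    S3 = "s3://"
    CEPH = "ceph://"


def infos_from_path(path: str) -> Tuple[StorageType, str, str, str]:
    """Hypothetical sketch returning (type, unprefixed path, container, basename)."""
    for storage_type in (StorageType.S3, StorageType.CEPH, StorageType.FILE):
        if path.startswith(storage_type.value):
            unprefixed = path[len(storage_type.value):]
            break
    else:
        storage_type, unprefixed = StorageType.FILE, path  # default interpretation

    if storage_type == StorageType.FILE:
        # container = directory, basename = file name
        return storage_type, unprefixed, os.path.dirname(unprefixed), os.path.basename(unprefixed)
    # object storage: container = bucket or pool, basename = object name
    container, _, basename = unprefixed.partition("/")
    return storage_type, unprefixed, container, basename
```

For example, `infos_from_path("s3://bucket/dir/obj.tif")` gives `(StorageType.S3, "bucket/dir/obj.tif", "bucket", "dir/obj.tif")`.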
def get_osgeo_path(path: str) ‑> str

Return a GDAL/OGR Open-compliant path and configure storage access

For an S3 input path, the endpoint, access and secret keys are set and the path is built with the "/vsis3" root.

For a FILE input path, only the storage prefix is removed.

Args

path : str
Source path

Raises

NotImplementedError
Storage type not handled

Returns

str
GDAL/OGR Open compliant path
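Only the path rewriting part of `get_osgeo_path` is sketched here. GDAL's `/vsis3/` handler reads its endpoint and credentials from configuration options such as AWS_S3_ENDPOINT, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. Which options rok4 actually sets is an assumption not covered by this sketch. `to_osgeo_path` is a hypothetical name, and the bucket@host form is not handled.

```python
def to_osgeo_path(path: str) -> str:
    """Hypothetical sketch of the path rewriting only (no GDAL configuration)."""
    if path.startswith("s3://"):
        return "/vsis3/" + path[len("s3://"):]  # s3://bucket/key -> /vsis3/bucket/key
    if path.startswith("file://"):
        return path[len("file://"):]  # only the storage prefix is removed
    if "://" not in path:
        return path  # unprefixed paths are already FILE paths
    raise NotImplementedError(f"Storage type not handled: {path}")
```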
def get_path_from_infos(storage_type: StorageType, *args) ‑> str

Write full path from elements

Elements are joined with slashes and prefixed with the storage type.

Args

storage_type : StorageType
Storage type of the path
*args
path elements, joined with a slash

Returns

str
Full path
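The inverse operation is a one-liner under the assumption that each storage type's value is its prefix. The stand-in `StorageType` enum and the name `path_from_infos` are hypothetical.

```python
from enum import Enum


class StorageType(Enum):
    # Stand-in for rok4's StorageType (assumption: values are the path prefixes)
    FILE = "file://"
    S3 = "s3://"
    CEPH = "ceph://"


def path_from_infos(storage_type: StorageType, *args: str) -> str:
    """Hypothetical sketch: prefix with the storage type, join the elements with '/'."""
    return storage_type.value + "/".join(args)


print(path_from_infos(StorageType.S3, "bucket", "dir", "obj.tif"))  # s3://bucket/dir/obj.tif
print(path_from_infos(StorageType.FILE, "/data", "file.txt"))  # file:///data/file.txt
```

A FILE path built from an absolute directory therefore gets three slashes after `file:`.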
def get_size(path: str) ‑> int

Get size of file or object

Args

path : str
path of the file/object whose size is requested

Raises

MissingEnvironmentError
Missing object storage information
StorageError
Storage read issue
NotImplementedError
Storage type not handled

Returns

int
file/object size, in bytes
def hash_file(path: str) ‑> str

Compute the MD5 sum of the provided file

Args

path : str
path to file

Returns

str
hexadecimal MD5 sum
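One common way to compute such a sum without loading a large file into memory is to feed it to `hashlib` in chunks. This is a sketch of that technique. `md5_of_file` and the 1 MiB chunk size are illustrative choices, not necessarily what `hash_file` does.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hexadecimal MD5 sum of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```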

Create a symbolic link

Args

target_path : str
file/object to link
link_path : str
link to create
hard : bool, optional
hard link rather than symbolic. Only for FILE storage. Defaults to False.

Raises

StorageError
link issue
MissingEnvironmentError
Missing object storage information
NotImplementedError
Storage type not handled
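For FILE storage, the `hard` flag maps naturally onto the two standard library calls for links. The sketch below shows only that case. `link_file` is a hypothetical name, and how object storages emulate links is not covered.

```python
import os


def link_file(target_path: str, link_path: str, hard: bool = False) -> None:
    """Hypothetical FILE-only sketch: create a hard or symbolic link."""
    if hard:
        os.link(target_path, link_path)  # hard link: another name for the same inode
    else:
        os.symlink(target_path, link_path)  # symbolic link: stores the target path
```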
def put_data_str(data: str, path: str) ‑> None

Store string data into a file or an object

UTF-8 encoding is used for bytes conversion

Args

data : str
data to write
path : str
destination path, where to write data

Raises

MissingEnvironmentError
Missing object storage informations
StorageError
Storage write issue
NotImplementedError
Storage type not handled
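The UTF-8 conversion described above amounts to encoding the string before writing raw bytes. Here is a FILE-only sketch. `put_str_to_file` is a hypothetical name.

```python
def put_str_to_file(data: str, path: str) -> None:
    """Hypothetical FILE-only sketch: encode the string as UTF-8 bytes, then write them."""
    with open(path, "wb") as f:
        f.write(data.encode("utf-8"))
```

Non-ASCII characters take several bytes: "é" is written as the two bytes `0xC3 0xA9`.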
def remove(path: str) ‑> None

Remove the file/object

Args

path : str
path of file/object to remove

Raises

MissingEnvironmentError
Missing object storage information
StorageError
Storage removal issue
NotImplementedError
Storage type not handled
def size_path(path: str) ‑> int

Return the size of the given path (or, for CEPH, the sum of the sizes of the objects in the .list)

Args

path : str
Source path

Raises

StorageError
Unhandled link or link issue
MissingEnvironmentError
Missing object storage information
NotImplementedError
Storage type not handled

Returns

int
size of the path
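For a FILE directory, one way to obtain "the size of the path" is to sum the sizes of the regular files beneath it. This is only a sketch. `directory_size` is hypothetical, and skipping symbolic links to avoid double counting is an assumption of this sketch, not documented rok4 behaviour.

```python
import os


def directory_size(path: str) -> int:
    """Hypothetical FILE-only sketch: total size in bytes of regular files under path."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):  # skip symlinks to avoid double counting
                total += os.path.getsize(file_path)
    return total
```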