Module rok4.storage
Provide functions to read or write data
Available storage types are:
- S3 (paths are prefixed with s3://)
- CEPH (paths are prefixed with ceph://)
- FILE (paths are prefixed with file://, but this is the default interpretation of a path)
- HTTP (paths are prefixed with http://)
- HTTPS (paths are prefixed with https://)
Depending on the function, not all storage types are necessarily available.
Reads use an LRU cache with a TTL. It can be configured with environment variables:
- ROK4_READING_LRU_CACHE_SIZE: Number of cached elements. Default 64. Set 0 or a negative integer to configure an unbounded cache. A power of two makes the cache more efficient.
- ROK4_READING_LRU_CACHE_TTL: Validity duration of a cached element, in seconds. Default 300. Set 0 or a negative integer to get a cache without expiration.
To disable the cache (always read data from storage), set ROK4_READING_LRU_CACHE_SIZE to 1 and ROK4_READING_LRU_CACHE_TTL to 1.
Using CEPH storage requires the following environment variables:
- ROK4_CEPH_CONFFILE
- ROK4_CEPH_USERNAME
- ROK4_CEPH_CLUSTERNAME
Using S3 storage requires the following environment variables:
- ROK4_S3_KEY
- ROK4_S3_SECRETKEY
- ROK4_S3_URL
To use several S3 clusters, each environment variable has to contain a comma-separated list, with the same number of elements in each.
Example with 2 S3 clusters:
- ROK4_S3_KEY=KEY1,KEY2
- ROK4_S3_SECRETKEY=SKEY1,SKEY2
- ROK4_S3_URL=https://s3.storage.fr,https://s4.storage.fr
To specify the cluster to use, the bucket name should be bucket_name@s3.storage.fr or bucket_name@s4.storage.fr. If no host is defined in the bucket name (no @), the first S3 cluster is used.
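The multi-cluster convention above can be sketched as follows. This is an illustrative reading of the rules, not the module's code: `load_s3_clusters` and `resolve_bucket` are hypothetical helpers, and the exceptions raised here are plain built-ins.

```python
import os
from typing import Dict, Tuple


def load_s3_clusters(env=os.environ) -> Dict[str, dict]:
    """Build a host -> credentials mapping from the comma-separated variables."""
    keys = env["ROK4_S3_KEY"].split(",")
    secrets = env["ROK4_S3_SECRETKEY"].split(",")
    urls = env["ROK4_S3_URL"].split(",")
    if not (len(keys) == len(secrets) == len(urls)):
        raise ValueError("S3 variables must list the same number of elements")

    clusters = {}
    for key, secret, url in zip(keys, secrets, urls):
        host = url.split("://", 1)[-1]  # cluster name is the host, without protocol
        clusters[host] = {"key": key, "secret": secret, "url": url}
    return clusters


def resolve_bucket(bucket: str, clusters: Dict[str, dict]) -> Tuple[str, str]:
    """Return (bucket name, host); use the first cluster when there is no '@'."""
    if "@" in bucket:
        name, host = bucket.split("@", 1)
        if host not in clusters:
            raise KeyError(f"Unknown S3 cluster: {host}")
        return name, host
    return bucket, next(iter(clusters))  # dicts keep insertion order
```

With the example values above, `bucket_name@s4.storage.fr` resolves to the second cluster and a bare `bucket_name` to `s3.storage.fr`.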
Functions
def copy(from_path: str, to_path: str, from_md5: str = None) ‑> None
-
Copy a file or object to a file or object location. If an MD5 sum is provided, it is compared to the sum computed after the copy.
Args
from_path
:str
- source file/object path, to copy
to_path
:str
- destination file/object path
from_md5
:str
, optional- MD5 sum, recomputed after the copy and checked against this value. Defaults to None.
Raises
StorageError
- Copy issue
MissingEnvironmentError
- Missing object storage information
NotImplementedError
- Storage type not handled
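To make the MD5 control concrete, here is a minimal sketch restricted to local files. `copy_local` and `md5_of` are hypothetical names, and `OSError` stands in for the documented `StorageError`; the real function also handles object storages, which this sketch does not attempt.

```python
import hashlib
import os
import shutil
from typing import Optional


def md5_of(path: str) -> str:
    """Hexadecimal MD5 sum of a local file, read by chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_local(from_path: str, to_path: str, from_md5: Optional[str] = None) -> None:
    """Copy a local file, then verify the destination against from_md5 if given."""
    # Create the destination directory if needed (an assumption of this sketch).
    os.makedirs(os.path.dirname(os.path.abspath(to_path)), exist_ok=True)
    shutil.copyfile(from_path, to_path)
    if from_md5 is not None and md5_of(to_path) != from_md5:
        # Built-in exception used to keep the sketch self-contained.
        raise OSError(f"MD5 mismatch after copying {from_path} to {to_path}")
```

Hashing the destination rather than the source is what detects a truncated or corrupted copy.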
def disconnect_ceph_clients() ‑> None
-
Clean CEPH clients
def disconnect_s3_clients() ‑> None
-
Clean S3 clients
def exists(path: str) ‑> bool
-
Does the file or object exist?
Args
path
:str
- path of file/object to test
Raises
MissingEnvironmentError
- Missing object storage information
StorageError
- Storage read issue
NotImplementedError
- Storage type not handled
Returns
bool
- file/object existing status
def get_data_binary(path: str, range: Tuple[int, int] = None) ‑> str
-
Load data into a binary string
This function uses an LRU cache, with a default TTL of 5 minutes (see ROK4_READING_LRU_CACHE_TTL)
Args
path
:str
- path to data
range
:Tuple[int, int]
, optional- offset and size, to make a partial read. Defaults to None.
Raises
MissingEnvironmentError
- Missing object storage information
StorageError
- Storage read issue
FileNotFoundError
- File or object does not exist
NotImplementedError
- Storage type not handled
Returns
str
- Data binary content
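The `range` argument is an (offset, size) pair. As a rough sketch of what a partial read means, assuming a local file, and of how the same pair maps to a standard HTTP `Range` header: `read_local_binary` and `http_range_header` are hypothetical helpers, not part of the module.

```python
from typing import Optional, Tuple


def read_local_binary(path: str, range: Optional[Tuple[int, int]] = None) -> bytes:
    """Read a whole local file, or `size` bytes starting at `offset`.

    The parameter is named `range` to mirror the documented signature,
    which shadows the built-in inside this function.
    """
    with open(path, "rb") as f:
        if range is None:
            return f.read()
        offset, size = range
        f.seek(offset)
        return f.read(size)


def http_range_header(offset: int, size: int) -> str:
    """HTTP byte ranges are inclusive, hence the -1 on the last position."""
    return f"bytes={offset}-{offset + size - 1}"
```

For example, `(2, 4)` selects bytes 2 through 5 of the data.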
def get_data_str(path: str) ‑> str
-
Load full data into a string
Args
path
:str
- path to data
Raises
MissingEnvironmentError
- Missing object storage information
StorageError
- Storage read issue
FileNotFoundError
- File or object does not exist
NotImplementedError
- Storage type not handled
Returns
str
- Data content
def get_infos_from_path(path: str) ‑> Tuple[StorageType, str, str, str]
-
Extract the storage type, the unprefixed path, the tray (container) and the basename from a path (default: FILE storage)
For a FILE storage, the tray is the directory and the basename is the file name.
For an object storage (CEPH or S3), the tray is the bucket or the pool and the basename is the object name. For an S3 bucket, the format can be <bucket name>@<cluster name> to use several clusters. The cluster name is the host (without protocol).
Args
path
:str
- path to analyse
Returns
Tuple[StorageType, str, str, str]
- storage type, unprefixed path, the container and the basename
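The decomposition can be sketched as below. This is an illustration under assumptions, not the library's code: the `StorageType` values shown here simply mirror the documented prefixes, `split_path` is a hypothetical name, and treating HTTP(S) paths like file paths is a guess.

```python
import os
from enum import Enum
from typing import Tuple


class StorageType(Enum):
    # Values mirror the documented prefixes; the real enum may differ.
    CEPH = "ceph://"
    FILE = "file://"
    HTTP = "http://"
    HTTPS = "https://"
    S3 = "s3://"


def split_path(path: str) -> Tuple[StorageType, str, str, str]:
    """Return (storage type, unprefixed path, tray, basename)."""
    for storage_type in StorageType:
        if path.startswith(storage_type.value):
            unprefixed = path[len(storage_type.value):]
            break
    else:
        # No known prefix: FILE is the default interpretation.
        storage_type, unprefixed = StorageType.FILE, path

    if storage_type in (StorageType.S3, StorageType.CEPH):
        # Object storage: the tray is the bucket/pool, the rest is the object name.
        tray, _, basename = unprefixed.partition("/")
    else:
        # FILE (and, by assumption, HTTP/HTTPS): directory and file name.
        tray, basename = os.path.split(unprefixed)
    return storage_type, unprefixed, tray, basename
```

Under these assumptions, `split_path("s3://bucket/dir/object.tif")` yields the tray `bucket` and the basename `dir/object.tif`, while `split_path("/data/dir/file.txt")` yields `/data/dir` and `file.txt`.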
def get_osgeo_path(path: str) ‑> str
-
Return GDAL/OGR Open compliant path and configure storage access
For an S3 input path, the endpoint, access key and secret key are set and the path is built with the "/vsis3" root.
For a FILE input path, only the storage prefix is removed.
Args
path
:str
- Source path
Raises
NotImplementedError
- Storage type not handled
Returns
str
- GDAL/OGR Open compliant path
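The path mapping alone can be sketched without GDAL installed. `to_gdal_path` is a hypothetical helper, and dropping an `@host` suffix from the bucket name is an assumption. The credential setup is omitted here; with GDAL it is typically done through configuration options such as `AWS_S3_ENDPOINT`, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.

```python
def to_gdal_path(path: str) -> str:
    """Map a prefixed path to a path GDAL/OGR can open (sketch)."""
    if path.startswith("s3://"):
        bucket, _, key = path[len("s3://"):].partition("/")
        # Assumption: an optional "@host" cluster suffix is not part of the bucket.
        bucket = bucket.split("@", 1)[0]
        return f"/vsis3/{bucket}/{key}"
    if path.startswith("file://"):
        return path[len("file://"):]
    if "://" in path:
        raise NotImplementedError(f"Storage type not handled: {path}")
    return path  # no prefix: already a plain file path
```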
def get_path_from_infos(storage_type: StorageType, *args) ‑> str
-
Build the full path from its elements
Elements are joined with a slash and prefixed with the storage type
Args
storage_type
:StorageType
- Storage's type for path
Returns
str
- Full path
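The joining rule is simple enough to show in a few lines. This sketch takes the prefix as a plain string instead of a `StorageType`, and `build_path` is a hypothetical name.

```python
def build_path(prefix: str, *args: str) -> str:
    """Join elements with a slash and prepend the storage prefix."""
    return prefix + "/".join(args)


# Example: rebuild an S3 path from its bucket and object name parts.
example = build_path("s3://", "bucket", "dir", "object.tif")
```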
def get_size(path: str) ‑> int
-
Get size of file or object
Args
path
:str
- path of the file/object whose size is requested
Raises
MissingEnvironmentError
- Missing object storage information
StorageError
- Storage read issue
NotImplementedError
- Storage type not handled
Returns
int
- file/object size, in bytes
def hash_file(path: str) ‑> str
-
Compute the MD5 sum of the provided file
Args
path
:str
- path to file
Returns
str
- hexadecimal MD5 sum
def link(target_path: str, link_path: str, hard: bool = False) ‑> None
-
Create a symbolic link
Args
target_path
:str
- file/object to link
link_path
:str
- link to create
hard
:bool
, optional- hard link rather than symbolic. Only for FILE storage. Defaults to False.
Raises
StorageError
- link issue
MissingEnvironmentError
- Missing object storage information
NotImplementedError
- Storage type not handled
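For FILE storage, the `hard` flag corresponds to the usual distinction between the two operating system calls. A minimal sketch, assuming local paths on a file system that supports both kinds of link (`link_local` is a hypothetical name):

```python
import os


def link_local(target_path: str, link_path: str, hard: bool = False) -> None:
    """Create a hard or symbolic link pointing at a local file."""
    if hard:
        # A hard link is a second directory entry for the same data.
        os.link(target_path, link_path)
    else:
        # A symbolic link only stores the target path.
        os.symlink(target_path, link_path)
```

A hard link keeps working if the target is renamed, but it cannot cross file system boundaries, which is consistent with the option being limited to FILE storage.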
def put_data_str(data: str, path: str) ‑> None
-
Store string data into a file or an object
UTF-8 encoding is used for bytes conversion
Args
data
:str
- data to write
path
:str
- destination path, where to write data
Raises
MissingEnvironmentError
- Missing object storage information
StorageError
- Storage write issue
NotImplementedError
- Storage type not handled
def remove(path: str) ‑> None
-
Remove the file/object
Args
path
:str
- path of file/object to remove
Raises
MissingEnvironmentError
- Missing object storage information
StorageError
- Storage removal issue
NotImplementedError
- Storage type not handled
def size_path(path: str) ‑> int
-
Return the size of the given path (or, for CEPH, the sum of the sizes of the objects in the .list)
Args
path
:str
- Source path
Raises
StorageError
- Unhandled link or link issue
MissingEnvironmentError
- Missing object storage information
NotImplementedError
- Storage type not handled
Returns
int
- size of the path