The automated tests on conda environments currently fail because of an outdated version of the …
```python
try:
    os.remove("/tmp/dummy")
except:  # pylint: disable=broad-exception-caught
    pass
fs.copy_to_local(
    f"{proto}://{bucket_name_or_storage_prefix}/khiops-cicd/"
    f"samples/{file}",
    "/tmp/dummy",
)
# We cannot use `unittest.TestCase.assertTrue` in a class method
assert os.path.isfile("/tmp/dummy")
```
- I don't get why this new code was added
- If it stays, I'd refactor it as:
  - Before the loop, create `/tmp/khiops-dummy`
  - Save each file to `/tmp/khiops-dummy/{file}`
  - Do not use `assert`, as this is not a programming error; I think it should be something like `if not os.path.isfile(...): raise RuntimeError`
  - Clean `/tmp/khiops-dummy` afterwards

This avoids the `try`/`except` with `pass` in the middle.
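As a sketch, the suggested refactor might look like this (the helper name `download_samples`, the stubbed `copy_to_local` argument, and the scratch-directory naming are illustrative, not the PR's actual code):

```python
import os
import shutil
import tempfile

def download_samples(copy_to_local, remote_uris):
    """Download each URI into a scratch directory, checking every result.

    `copy_to_local(uri, local_path)` stands in for the file-system helper
    under test; a missing file is reported with RuntimeError rather than
    `assert`, since a failed download is not a programming error.
    """
    # Before the loop, create the scratch directory
    scratch_dir = tempfile.mkdtemp(prefix="khiops-dummy-")
    try:
        for uri in remote_uris:
            local_path = os.path.join(scratch_dir, os.path.basename(uri))
            copy_to_local(uri, local_path)
            if not os.path.isfile(local_path):
                raise RuntimeError(f"Download of {uri} failed")
    finally:
        # Clean the scratch directory afterwards
        shutil.rmtree(scratch_dir)
```

This also removes the need for the `try`/`except` around `os.remove`, since each run starts from a fresh directory.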
During the development phase, when `fs.copy_to_local` was an empty method, the test would pass even if the file download was not successful. This part had to be improved to verify that the file is correctly downloaded (without any error and with the local file created).
Following your suggestions, I improved the snippet a little.
```python
def should_skip_in_a_conda_env(self):
    # The S3 driver is now released for conda too.
    # No need to skip the tests any longer in a conda environment
    return False

def config_exists(self):
```
I think these eliminations should go in a different commit.
```python
class AzureStorageResourceMixin:
    """Azure compatible Storage Resource Mixin
```
I think all these docs should go in the `AzureStorage*Resource` classes.
In the docstring of this class they should point to those, or the content could be duplicated.
Also, I think we should add a URL to a relevant Azure storage documentation page.
folmos-at-orange left a comment:
I think this is an opportunity to have better docs on the file system module.
Currently, very little is documented, and we now have 3 different remote systems, of which only one (S3) is well documented.
I'd add a section for each remote system in the module's docstring. Each section would describe how to configure that system and give a URI example.
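As an illustration, such per-system sections could look like the sketch below (shown as a plain string; the S3 and GCS variable names are the standard ones for those SDKs, the Azure one comes from this PR, and the exact URI forms are placeholders to be checked against the drivers):

```python
FILESYSTEM_MODULE_DOCSTRING = """File-system access for local and remote resources.

Remote Systems
==============

S3
--
Configured via the standard AWS settings (e.g. ``AWS_ACCESS_KEY_ID`` and
``AWS_SECRET_ACCESS_KEY`` environment variables, or the AWS config files).
URI example: ``s3://my-bucket/path/to/file.txt``

Google Cloud Storage
--------------------
Configured via ``GOOGLE_APPLICATION_CREDENTIALS`` pointing to a service
account key file.
URI example: ``gs://my-bucket/path/to/file.txt``

Azure Storage
-------------
Configured via ``AZURE_STORAGE_CONNECTION_STRING``, which embeds the
account name and account key. Only "Files" and "Blobs" storages are
supported.
URI example: ``https://myaccount.blob.core.windows.net/container/blob``
"""
```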
```python
class AzureStorageResourceMixin:
    """Azure compatible Storage Resource Mixin"""
```
I'd add in the body: See `AzureStorageFileResource` and `AzureBlobResource` for more details.
| """ | ||
| Check if the target resource exists (file or directory) | ||
| """ |
Move this to the first docstring line to keep the codebase style.
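For instance, with the summary on the first docstring line (the method name `exists` is illustrative):

```python
def exists(self):
    """Check if the target resource exists (file or directory)"""
    raise NotImplementedError
```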
```python
def remove(self):
    """
    Remove the target resource either it is a file or a directory.
```
Move this to the first docstring line to keep the codebase style.
```python
def list_dir(self):
    """
    List the files of the current directory
```
Move this to the first docstring line to keep the codebase style.
- Among all the Azure storages, Khiops supports only (via its specific driver) "Files" and "Blobs" (Binary Large Objects)
- The only supported authentication method is currently the `AZURE_STORAGE_CONNECTION_STRING` embedding the account name and account key
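For reference, an Azure connection string is a `;`-separated list of `key=value` pairs; a minimal parser along the lines of the `parse_connection_string` helper quoted below might look like this (a sketch, not the PR's actual implementation):

```python
def parse_connection_string(connection_string):
    """Parse 'Key1=Value1;Key2=Value2;...' into a dict of settings."""
    mappings = {}
    for part in connection_string.split(";"):
        if not part:
            continue
        key, _, value = part.partition("=")
        mappings[key] = value
    return mappings
```

Note that `partition` (rather than `split`) keeps any `=` characters inside the value, which matters for base64-encoded account keys.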
| "<container name>/<0 to n virtual folder name(s)>/<blob name>" | ||
| ) | ||
|
|
||
| if __class__.is_netloc_of_file_share(self.uri_info.netloc): |
This if / else should be split and dispatched to the `__init__` methods of the relevant Azure storage classes, IMHO.
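Structurally, the suggested split might look like the sketch below (the class names, the `.file.core.windows.net` suffix test, and the `create_resource` factory are assumptions standing in for the PR's real detection logic and client setup):

```python
from urllib.parse import urlsplit

class AzureStorageResourceMixin:
    """Helpers shared by the per-storage Azure resource classes."""

    def __init__(self, uri):
        self.uri_info = urlsplit(uri)

class AzureStorageFileResource(AzureStorageResourceMixin):
    def __init__(self, uri):
        super().__init__(uri)
        # File-share-specific client setup would go here

class AzureBlobResource(AzureStorageResourceMixin):
    def __init__(self, uri):
        super().__init__(uri)
        # Blob-specific client setup would go here

def create_resource(uri):
    # The branch lives in the factory, not inside the mixin's __init__
    if urlsplit(uri).netloc.endswith(".file.core.windows.net"):
        return AzureStorageFileResource(uri)
    return AzureBlobResource(uri)
```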
```python
# Create the authentication object using the connection string
connection_string = os.environ.get("AZURE_STORAGE_CONNECTION_STRING")
mappings = utils.parse_connection_string(connection_string)
creds = credentials.AzureNamedKeyCredential(
```
I would put this in an attribute named credentials, to be reused in the per-storage Azure classes directly (also see the if / else split comment below).
```python
return create_resource(parent_uri_info(self.uri_info).geturl())

@staticmethod
def _splitall(path):
```
I would turn this method into a per-instance method, because it is only used on an instance attribute:

```python
def _uri_parts(self):
    allparts = []
    path = self.uri_info.path
    while True:
        # Always split on "/" as `path` is a URI
        parts = path.split("/")
        if parts[0] == path:  # sentinel for absolute paths
            allparts.insert(0, parts[0])
            break
        elif parts[1] == path:  # sentinel for relative paths
            allparts.insert(0, parts[1])
            break
        else:
            path = parts[0]
            allparts.insert(0, parts[1])
    return allparts
```

Then, I would also create a `uri_parts` property:

```python
@property
def uri_parts(self):
    return self._uri_parts()
```

and use it instead of `parts` in the per-storage-type Azure classes (also see the if / else split comment).
```python
return allparts

@staticmethod
def is_netloc_of_file_share(netloc):
```
Once the if / else in the mixin's __init__ has been split to the per-storage classes, this method is no longer needed: in create_resource (its only other usage besides the if / else mentioned above), we can use netloc.endswith("...") directly.
```python
# (via its specific driver):
# - Files
# - Blobs (Binary Large Objects)
if AzureStorageResourceMixin.is_netloc_of_file_share(uri_info.netloc):
```
Use netloc.endswith directly (see other comments related to this).
```python
except Exception as delete_error:  # pylint: disable=broad-exception-caught
    delete_directory_error = delete_error

raise delete_directory_error or delete_file_error
```
I would rewrite all this method thus:

```python
file_share_client = self.azure_share_client.get_file_client(
    self.relative_remaining_path
)
# Attempt to delete the file first; if this fails, attempt to delete the
# directory; fail with the Azure exception if this fails
try:
    file_share_client.delete_file()
except Exception:  # pylint: disable=broad-exception-caught
    directory_share_client = self.azure_share_client.get_directory_client(
        self.relative_remaining_path
    )
    directory_share_client.delete_directory()
```

```python
with open(local_path) as input_file:
    file_share_client.upload_file(input_file.read())
```
Why not use `file_share_client.upload_file(read(local_path))` directly?
```python
    self.relative_remaining_path
)
with open(local_path, "wb") as output_file:
    data = file_share_client.download_file()
```
Move this line above the `with` (thus out of its scope)?
```python
with open(local_path) as input_file:
    self.azure_blob_client.upload_blob(input_file.read(), overwrite=True)
```
Why not use `self.azure_blob_client.upload_blob(read(local_path), overwrite=True)` directly?
```python
def copy_to_local(self, local_path):
    with open(local_path, "wb") as output_file:
        data = self.azure_blob_client.download_blob()
```
I'd move this line above the `with` line (thus out of its scope).
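The reordering can be sketched with a stub client (the stub and the free-standing function are illustrative; `download_blob` returning an object with `readall()` mirrors the Azure SDK):

```python
class StubBlobClient:
    """Stand-in for the Azure blob client, for illustration only."""

    class _Downloader:
        def readall(self):
            return b"blob contents"

    def download_blob(self):
        return self._Downloader()

def copy_to_local(blob_client, local_path):
    # Fetch the downloader before opening the file, so the local file is
    # not created or truncated if the download call itself raises
    data = blob_client.download_blob()
    with open(local_path, "wb") as output_file:
        output_file.write(data.readall())
```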
```shell
$CONDA install -y -n py${version}_conda \
    khiops-driver-s3==${KHIOPS_S3_DRIVER_REVISION} \
    khiops-driver-gcs==${KHIOPS_GCS_DRIVER_REVISION}; \
# hardcoded version because the latest version is not released on conda-forge \
```
Precede this comment text with `XXX` in order to better locate it for future removal.
```shell
&& wget -O khiops-gcs.deb https://github.com/KhiopsML/khiopsdriver-gcs/releases/download/${KHIOPS_GCS_DRIVER_REVISION}/khiops-driver-gcs_${KHIOPS_GCS_DRIVER_REVISION}-1-bookworm.amd64.deb \
&& wget -O khiops-s3.deb https://github.com/KhiopsML/khiopsdriver-s3/releases/download/${KHIOPS_S3_DRIVER_REVISION}/khiops-driver-s3_${KHIOPS_S3_DRIVER_REVISION}-1-bookworm.amd64.deb \
&& (dpkg -i --force-all khiops-gcs.deb khiops-s3.deb || true) \
# There is a typo in the latest release tag. Until it is fixed, we need to use the hardcoded folder name
```
Precede the comment text with `XXX`.
```shell
$CONDA install -y -n py${version}_conda \
    khiops-driver-s3==${KHIOPS_S3_DRIVER_REVISION} \
    khiops-driver-gcs==${KHIOPS_GCS_DRIVER_REVISION}; \
# hardcoded version because the latest version is not released on conda-forge \
```
Precede comment with XXX.
```shell
&& wget -O khiops-gcs.deb https://github.com/KhiopsML/khiopsdriver-gcs/releases/download/${KHIOPS_GCS_DRIVER_REVISION}/khiops-driver-gcs_${KHIOPS_GCS_DRIVER_REVISION}-1-${VERSION_CODENAME}.amd64.deb \
&& wget -O khiops-s3.deb https://github.com/KhiopsML/khiopsdriver-s3/releases/download/${KHIOPS_S3_DRIVER_REVISION}/khiops-driver-s3_${KHIOPS_S3_DRIVER_REVISION}-1-${VERSION_CODENAME}.amd64.deb \
&& (dpkg -i --force-all khiops-gcs.deb khiops-s3.deb || true) \
# There is a typo in the latest release tag. Until it is fixed, we need to use the hardcoded folder name
```
Precede comment with XXX.
| f"samples/{file}", | ||
| f"/tmp/khiops-{proto}-dummy", | ||
| ) | ||
| # We cannot use `unittest.TestCase.assertTrue` in a class method |
I'd remove this comment from here.
- The only supported authentication method is currently the `AZURE_STORAGE_CONNECTION_STRING` embedding the account name and account key

Completes #244
TODO Before Asking for a Review

- `main` (or `main-v10`)
- `Unreleased` section of `CHANGELOG.md` (no date)
- `index.html`