
Create new Extract validator #50

Description

@ckunki

Background of Current Implementation, Additional Improvements and Features

  • Check that the file exasol-manifest.json has been extracted from the SLC
    • (the file is appended to the archive last, so its presence reliably indicates that extraction is complete)
    • Check that the file exists on all nodes of the database cluster
  • Support a callback function that reports progress while validation is running
  • More robust tests
    • Mock time.monotonic() in tests involving tenacity's Retrying
    • Inject the validator as a separate object into LanguageContainerDeployer
  • Support configuring timeouts in production usage and via the CLI

Design

  • Create a method in PEC language_container_deployer.py
  • Use method as_udf_path()
  • Add an optional callback for reporting progress (n of m succeeded)
  • Use a context manager for the database schema: with temp_schema(self._pyexasol_conn) as schema:
  • Create a UDF in this schema
  • Create a retry loop with tenacity, limited by a number of retries or a global timeout
    • Run the UDF on all nodes in parallel using GROUP BY IPROC()
    • Each UDF reports whether the expected file exists
    • Aggregate all UDF results
    • Total success requires all UDFs to succeed
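
The retry loop with a progress callback could look roughly like the sketch below. To stay free of third-party dependencies it uses a plain loop over time.monotonic() in place of tenacity, so it illustrates the control flow rather than the proposed implementation. The names wait_for_extraction, check_nodes, on_progress, clock and sleep are hypothetical; check_nodes stands in for running the UDF on all nodes and returning (succeeded, total).

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_extraction(
    check_nodes: Callable[[], Tuple[int, int]],
    timeout_s: float = 60.0,
    pause_s: float = 1.0,
    on_progress: Optional[Callable[[int, int], None]] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check_nodes until all nodes succeed or the global timeout expires.

    check_nodes is a hypothetical stand-in for the UDF query; it returns
    (number of nodes that found the manifest, total number of nodes).
    """
    deadline = clock() + timeout_s
    while True:
        succeeded, total = check_nodes()
        if on_progress is not None:
            on_progress(succeeded, total)  # report "n of m succeeded"
        if total > 0 and succeeded == total:
            return True  # total success: every node found the file
        if clock() >= deadline:
            return False  # global timeout reached
        sleep(pause_s)
```

Passing clock and sleep as parameters is one way to make such a loop testable without real waiting; with tenacity the same effect would come from mocking time.monotonic() as noted in the background section.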

Path Construction in language_container_deployer.py

import re

# Strip the archive extension to get the name of the extraction directory.
name = re.sub(r"\.(tar|tar\.gz|zip|gzip)$", "", file_path.name)
manifest = file_path.parent / name / "exasol-manifest.json"
udf_path = manifest.as_udf_path()
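
To see what the regular expression does to different archive names, here is a small standalone sketch. It uses pathlib.PurePosixPath as a stand-in for the BucketFS path object, so as_udf_path() is not available and is left out; the helper name manifest_path is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Same pattern as in the snippet above.
ARCHIVE_SUFFIX = re.compile(r"\.(tar|tar\.gz|zip|gzip)$")


def manifest_path(archive: PurePosixPath) -> PurePosixPath:
    """Return the manifest location expected after the archive is extracted."""
    name = ARCHIVE_SUFFIX.sub("", archive.name)
    return archive.parent / name / "exasol-manifest.json"


for sample in ("uploads/sample.tar.gz", "uploads/sample.zip", "uploads/sample.tar"):
    # Each of these prints uploads/sample/exasol-manifest.json
    print(manifest_path(PurePosixPath(sample)))
```

Note that a name such as sample.tgz is not covered by the pattern and would be left unchanged, yielding uploads/sample.tgz/exasol-manifest.json.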

Proposal for UDF

--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT manifest(my_path VARCHAR(256)) RETURNS BOOL AS
import os
def run(ctx):
    return os.path.isfile(ctx.my_path)
/

SELECT statement

After uploading file sample.tar.gz:

SELECT IPROC() "Node", manifest('/buckets/uploads/default/sample/exasol-manifest.json') "Manifest"
FROM VALUES BETWEEN 0 AND 5 GROUP BY IPROC();

NODE  MANIFEST
0     true
1     true
2     true
3     true
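
On the Python side, the rows of such a result set could be aggregated into the "n of m succeeded" figure along these lines. This is a minimal sketch that assumes each row arrives as a (node, manifest_found) tuple; the function name summarize and the row shape are assumptions, not part of the proposal.

```python
from typing import Iterable, List, Tuple


def summarize(rows: Iterable[Tuple[int, bool]]) -> Tuple[int, int, List[int]]:
    """Return (succeeded, total, pending_nodes) for rows of (node, manifest_found)."""
    rows = list(rows)
    # Nodes whose UDF did not find the manifest yet (False or NULL/None).
    pending = sorted(node for node, found in rows if not found)
    return len(rows) - len(pending), len(rows), pending


rows = [(0, True), (1, True), (2, False), (3, True)]
succeeded, total, pending = summarize(rows)
print(f"{succeeded} of {total} nodes report the manifest; pending: {pending}")
```

Keeping the list of pending nodes around makes it possible to name the nodes that are still missing the file if the timeout expires.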

Retrying a code block with tenacity

from tenacity import RetryError, Retrying, stop_after_attempt

try:
    for attempt in Retrying(stop=stop_after_attempt(3)):
        with attempt:
            raise Exception('My code is failing!')
except RetryError:
    pass
