ExtractTracker - Python
Below is a deeper dive into the capabilities of the Python implementation of the ExtractTracker submodule. Note that ProcessTracker must be used in conjunction with ExtractTracker.
Registering Extracts
Once the process run has been registered, an extract can be registered by supplying the following variables:
# Register the process run first; the extract is attached to it.
process_run = ProcessTracker(
    process_name='Lahman Teams Load',
    process_type='Stage Load',
    actor_name='New ProcessTracker User',
    tool_name='Spark',
    source_name='Lahman Baseball Dataset',
)
# Register the extract file against that process run.
extract = ExtractTracker(
    process_run=process_run,
    filename='Teams.csv',
    location_name='Lahman Baseball Databank 2018',
    location_path='~/baseballdatabank-master_2018-03-28/baseballdatabank-master/core/',
)
Those variables will be used to populate the data store backend as explained in the following table:
Variable Name | Variable Description | Reference Object | Object Created If It Does Not Exist? |
---|---|---|---|
process_run | An instance of ProcessTracker | Process Tracking | No |
filename | The extract file’s filename | Extract Tracking | Yes |
location | An instance of Extract Location, optional if created already. | Location | No |
location_name | The given name of the location. Optional. | Location | Yes |
location_path | The filepath of the location. Required if location instance not provided. | Location | Yes |
status | The extract file status. Optional. | Extract Status | Yes |
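The required and optional combinations in the table can be summarized in a small sketch. The helper below, `check_extract_arguments`, is hypothetical and is not part of ProcessTracker. It only mirrors the rules as the table states them, and it assumes that a supplied location instance takes precedence over location_path when both are given.

```python
from typing import Optional


def check_extract_arguments(
    process_run: object,
    filename: str,
    location: Optional[object] = None,
    location_name: Optional[str] = None,
    location_path: Optional[str] = None,
    status: Optional[str] = None,
) -> str:
    """Hypothetical check mirroring the table; returns which location source applies."""
    # process_run and filename are always required.
    if process_run is None:
        raise ValueError("process_run is required")
    if not filename:
        raise ValueError("filename is required")

    # Assumption: an existing location instance wins over location_path.
    if location is not None:
        return "location instance"

    # Without a location instance, location_path becomes required.
    if location_path:
        return "location_path"

    # location_name and status are optional, so they are never checked here.
    raise ValueError("location_path is required when no location instance is provided")
```

Under these assumptions, passing only `filename` and `location_path` (as in the registration example) is accepted, while omitting both `location` and `location_path` raises a `ValueError`.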
Changing Extract Status
As extract files are used within a process run, their status will need to be updated:
extract.change_extract_status(status='loading')
Custom extract statuses can be entered, but ProcessTracker only knows what to do with a file once it has one of the default status types. As long as the file's status is eventually changed to one of the defaults, the process flow will continue.
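One way to make sure a file always ends in a terminal status is to wrap the load step in a small helper. The sketch below is illustrative only: `load_with_status` and `RecordingExtract` are hypothetical names, and the status values 'loaded' and 'error' are assumed to be among the default status types. The only call taken from the documentation above is `change_extract_status(status=...)`.

```python
from typing import Callable, List, Protocol


class SupportsExtractStatus(Protocol):
    """Anything exposing change_extract_status, such as an ExtractTracker."""

    def change_extract_status(self, status: str) -> None: ...


def load_with_status(extract: SupportsExtractStatus, load: Callable[[], None]) -> None:
    """Mark the extract as loading, run the load, then record the outcome."""
    extract.change_extract_status(status="loading")
    try:
        load()
    except Exception:
        # Assumed default status name for a failed load.
        extract.change_extract_status(status="error")
        raise
    # Assumed default status name for a successful load.
    extract.change_extract_status(status="loaded")


class RecordingExtract:
    """Stand-in for ExtractTracker that only records status changes."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def change_extract_status(self, status: str) -> None:
        self.history.append(status)
```

With a real tracker, the call would look like `load_with_status(extract, my_load_function)`, where `my_load_function` is whatever callable performs the load. The stand-in class exists so the pattern can be tried without a data store backend.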