ExtractTracker - Python

Below is a deeper dive into the capabilities of the ExtractTracker submodule of the Python implementation. Note that ProcessTracker MUST be used in conjunction with ExtractTracker.

Registering Extracts

Once the process run has been registered, an extract can be registered, provided the following variables are set:

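# Import paths assume the standard process_tracker package layout.
from process_tracker.process_tracker import ProcessTracker
from process_tracker.extract_tracker import ExtractTracker

# Register the process run first; the extract is attached to it.
process_run = ProcessTracker(
    process_name='Lahman Teams Load',
    process_type='Stage Load',
    actor_name='New ProcessTracker User',
    tool_name='Spark',
    source_name='Lahman Baseball Dataset',
)

# Register the extract file against that process run.
extract = ExtractTracker(
    process_run=process_run,
    filename='Teams.csv',
    location_name='Lahman Baseball Databank 2018',
    location_path='~/baseballdatabank-master_2018-03-28/baseballdatabank-master/core/',
)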

Those variables will be used to populate the data store backend as explained in the following table:

ExtractTracker object initialization variables
Variable Name	Variable Description	Reference Object	Created If It Does Not Exist?
process_run	An instance of ProcessTracker	Process Tracking	No
filename	The extract file’s filename	Extract Tracking	Yes
location	An instance of Extract Location, optional if created already.	Location	No
location_name	The given name of the location. Optional.	Location	Yes
location_path	The filepath of the location. Required if location instance not provided.	Location	Yes
status	The extract file status. Optional.	Extract Status	Yes
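
The last column of the table describes a "look up, and create only where allowed" rule. The sketch below is a toy in-memory model of that rule, not ProcessTracker's backend code: the `resolve` helper, the `MissingReference` exception, and the dictionary-based `store` are all hypothetical names introduced purely for illustration.

```python
class MissingReference(LookupError):
    """Raised when a reference that is never auto-created is absent."""


def resolve(store, kind, key, create_if_missing):
    """Return the record for (kind, key), creating it only if allowed."""
    records = store.setdefault(kind, {})
    if key not in records:
        if not create_if_missing:
            raise MissingReference(f"{kind} {key!r} must already exist")
        records[key] = {"name": key}
    return records[key]


# The process run has to be registered beforehand (table: "No").
store = {"process_run": {"Lahman Teams Load": {"name": "Lahman Teams Load"}}}

resolve(store, "process_run", "Lahman Teams Load", create_if_missing=False)
# The extract and the named location are created on first use (table: "Yes").
resolve(store, "extract", "Teams.csv", create_if_missing=True)
resolve(store, "location", "Lahman Baseball Databank 2018", create_if_missing=True)

print(sorted(store))  # ['extract', 'location', 'process_run']
```

In this model, asking for a process run that was never registered raises `MissingReference` instead of silently creating one, which mirrors the "No" entries in the table.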

Changing Extract Status

As extract files are used within a process run, their status will need to be modified:

extract.change_extract_status(status='loading')

Custom extract statuses can be entered, but ProcessTracker only knows what to do with files whose status is one of the default status types. As long as a file's status is eventually changed to one of the defaults, the process flow will continue.
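
One way to make sure an extract always ends up in a final status is to wrap the load step in a small helper. The following is a minimal sketch under stated assumptions: `StubExtract` is a hypothetical stand-in that exposes only `change_extract_status`, the `loading` helper is not part of ProcessTracker, and the final status names `'loaded'` and `'error'` are assumed here and should be replaced with the default status types of your installation.

```python
from contextlib import contextmanager


class StubExtract:
    """Hypothetical stand-in for ExtractTracker; it only records status changes."""

    def __init__(self):
        self.history = []

    def change_extract_status(self, status):
        self.history.append(status)


@contextmanager
def loading(extract, done_status="loaded", failed_status="error"):
    """Mark an extract 'loading', then always move it to a final status.

    'loaded' and 'error' are assumed status names, not confirmed defaults.
    """
    extract.change_extract_status(status="loading")
    try:
        yield extract
    except Exception:
        # The load failed: record the failure status and re-raise.
        extract.change_extract_status(status=failed_status)
        raise
    else:
        extract.change_extract_status(status=done_status)


extract = StubExtract()
with loading(extract):
    pass  # read and load the file here

print(extract.history)  # ['loading', 'loaded']
```

Because the helper relies only on `change_extract_status(status=...)`, the same pattern should apply to a real extract object, and a custom intermediate status can be set inside the `with` block without risking that the file is left in it.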