Backdated Liberator data ingestion
This section describes how a previous day's Liberator dataset can be ingested onto the data platform.
1. Disable the CloudWatch S3 event trigger (optional)
The company that owns the dataset (Farthest Gate) uploads a zipped SQL dump of the whole database to an S3 bucket in the data platform production AWS account at the path s3://dataplatform-prod-liberator-data-storage/parking/.
The file needs to be named `liberator_dump_210604.zip`, where the date stamp at the end is the current date in yymmdd format.
If this file has not been uploaded, you will need to obtain it from Farthest Gate and upload it manually. In this case you will also need to disable the CloudWatch trigger before uploading: open CloudWatch, search for `liberator-to-rds-snapshot-event`, and disable the trigger.
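If you prefer to script the manual upload, the following is a minimal boto3 sketch of the same two actions. It assumes the trigger is an EventBridge (formerly CloudWatch Events) rule with the name above; verify this in the console before relying on it.

```python
import boto3

# Sketch only: assumes the S3 trigger is an EventBridge (CloudWatch Events)
# rule named liberator-to-rds-snapshot-event -- verify in the console first.
events = boto3.client("events")
events.disable_rule(Name="liberator-to-rds-snapshot-event")

# Upload the dump obtained from Farthest Gate to the landing bucket.
s3 = boto3.client("s3")
s3.upload_file(
    "liberator_dump_210604.zip",                 # local file from Farthest Gate
    "dataplatform-prod-liberator-data-storage",  # bucket from the path above
    "parking/liberator_dump_210604.zip",         # key under the parking/ prefix
)
```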
2. Trigger the ECS task to start the backdated ingestion process
- Open ECS and in `Task Definitions` select `liberator-to-rds-snapshot`
- Click on the `Actions` dropdown and select `Run Task`
- This next page includes a lot of options; unless specified below, leave all options as they are:
  - Cluster: `dataplatform-prod-workers`
  - Launch Type: `FARGATE`
  - Operating System Family: `Linux`
  - Cluster VPC: select the one VPC available
  - Subnets: select all subnets available, one by one
  - Auto-assign public IP: `DISABLED`
  - Expand the `Advanced options` section:
    - In `Container Overrides` expand the `liberator-to-rds-snapshot` section:
      - Click on `Add Environment Variable`
      - Set the key to `IMPORT_DATE_OVERRIDE`
      - Set the value to the date used in the Liberator dump file you wish to ingest. This date must be in the following format: yyyy-mm-dd, e.g. 2022-06-01
- Click `Run Task` at the bottom of the page (a scripted equivalent is sketched below)
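For reference, the same launch can be made with boto3. This is a sketch, not part of the documented process: the subnet IDs are placeholders for the values you would pick in the console, and the operating system family is omitted because, in the API, it comes from the task definition's runtime platform rather than the run request.

```python
import boto3

# Sketch of the console's Run Task flow; subnet IDs are placeholders --
# substitute the subnets of the one available VPC, as in the console steps.
ecs = boto3.client("ecs")
ecs.run_task(
    cluster="dataplatform-prod-workers",
    taskDefinition="liberator-to-rds-snapshot",
    launchType="FARGATE",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-aaaa1111", "subnet-bbbb2222"],  # placeholders
            "assignPublicIp": "DISABLED",
        }
    },
    overrides={
        "containerOverrides": [
            {
                "name": "liberator-to-rds-snapshot",
                "environment": [
                    {"name": "IMPORT_DATE_OVERRIDE", "value": "2022-06-01"}
                ],
            }
        ]
    },
)
```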
3. Automated ingestion of backdated Liberator data starts
From here the ingestion is fully automated and closely follows the normal Liberator ingestion process detailed in Liberator Ingestion Process.
The difference between the normal Liberator process and this one is that the ingestion date is taken from the `IMPORT_DATE_OVERRIDE` environment variable that you set in step 2, instead of always assuming we are ingesting data for the current day.
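Note that the dump filename uses a yymmdd date stamp while `IMPORT_DATE_OVERRIDE` expects yyyy-mm-dd. A small illustrative conversion between the two (the helper name is hypothetical, not part of the pipeline):

```python
from datetime import datetime

# Hypothetical helper: derive the IMPORT_DATE_OVERRIDE value (yyyy-mm-dd)
# from a dump filename whose stamp is in yymmdd format.
def import_date_from_filename(filename: str) -> str:
    stamp = filename.rsplit("_", 1)[-1].removesuffix(".zip")  # e.g. "210604"
    return datetime.strptime(stamp, "%y%m%d").strftime("%Y-%m-%d")

print(import_date_from_filename("liberator_dump_210604.zip"))  # -> 2021-06-04
```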
This process triggers a backdated Glue workflow, called `parking-liberator-backdated-data-workflow`, that transfers the data from the landing zone into the raw zone in the correct partition.
When finished, the backdated Liberator data will have been ingested into the raw zone in the partition date that matches the `IMPORT_DATE_OVERRIDE` environment variable that you set in step 2.
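If you want to check on the workflow's progress without the console, one possible approach is to query its runs with boto3 (a sketch, not part of the documented process):

```python
import boto3

# Sketch: list recent runs of the backdated workflow and their statuses.
glue = boto3.client("glue")
runs = glue.get_workflow_runs(Name="parking-liberator-backdated-data-workflow")
for run in runs["Runs"]:
    print(run["WorkflowRunId"], run["Status"], run.get("StartedOn"))
```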
4. Enable the CloudWatch S3 event trigger (optional)
Please re-enable the CloudWatch trigger that you disabled in step 1. This is very important; otherwise the next day's Liberator ingestion will not happen.
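The boto3 counterpart, under the same EventBridge rule assumption as in step 1:

```python
import boto3

# Re-enable the rule disabled in step 1 (same EventBridge assumption).
boto3.client("events").enable_rule(Name="liberator-to-rds-snapshot-event")
```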