Backdated Liberator data ingestion

This section describes how a previous day's Liberator dataset can be ingested onto the data platform.

1. Disable the CloudWatch S3 event trigger (optional)

The company that owns the dataset (Farthest Gate) uploads a zipped SQL dump of the whole database to an S3 bucket in the data platform production AWS account at the path s3://dataplatform-prod-liberator-data-storage/parking/. The file needs to be named in the form liberator_dump_210604.zip, where the date stamp at the end is the current date in yymmdd format.

If this file has not been uploaded, you will need to obtain it from Farthest Gate and upload it manually. In that case you will also need to disable the CloudWatch trigger before uploading.

Open CloudWatch, search for liberator-to-rds-snapshot-event and disable the trigger.
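
If you prefer to script this step, the sketch below does the same thing with boto3. It assumes the trigger is the CloudWatch Events (EventBridge) rule named above, and the local file name is an example that should match the date of the dump you are uploading.

```python
import boto3

DUMP_FILE = "liberator_dump_210604.zip"  # local copy obtained from Farthest Gate
BUCKET = "dataplatform-prod-liberator-data-storage"
RULE_NAME = "liberator-to-rds-snapshot-event"

events = boto3.client("events")
s3 = boto3.client("s3")

# Disable the rule first so the upload does not trigger the normal daily ingestion.
events.disable_rule(Name=RULE_NAME)

# Upload the backdated dump to the path the ingestion process expects.
s3.upload_file(DUMP_FILE, BUCKET, f"parking/{DUMP_FILE}")
```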

2. Trigger the ECS task to start the backdated ingestion process

  • Open ECS and in Task Definitions select liberator-to-rds-snapshot
  • Click on the Actions dropdown and select Run Task
  • This next page includes a lot of options; unless specified below, leave all options as they are:
    • Cluster: dataplatform-prod-workers
    • Launch Type: FARGATE
    • Operating System Family: Linux
    • Cluster VPC: Select the one VPC available
    • Subnets: Select all subnets available one by one
    • Auto-assign public IP: DISABLED
    • Expand the Advanced options section:
      • In Container Overrides expand the liberator-to-rds-snapshot section:
        • Click on Add Environment Variable
        • Set the key to IMPORT_DATE_OVERRIDE
        • Set the value to the date used in the Liberator dump file you wish to ingest. This date must be in the format yyyy-mm-dd, e.g. 2022-06-01
  • Click Run Task at the bottom of the page (a scripted alternative is sketched below)
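
The same task can also be launched from a script. Below is a minimal boto3 sketch of the console steps above; the subnet IDs are placeholders that must be replaced with the real subnets from the cluster VPC.

```python
import boto3

ecs = boto3.client("ecs")

# Placeholder subnet IDs: substitute the real subnets from the cluster VPC.
SUBNETS = ["subnet-aaaa1111", "subnet-bbbb2222"]

ecs.run_task(
    cluster="dataplatform-prod-workers",
    taskDefinition="liberator-to-rds-snapshot",
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": SUBNETS,
            "assignPublicIp": "DISABLED",
        }
    },
    overrides={
        "containerOverrides": [
            {
                "name": "liberator-to-rds-snapshot",
                "environment": [
                    # Date of the dump to ingest, in yyyy-mm-dd format.
                    {"name": "IMPORT_DATE_OVERRIDE", "value": "2022-06-01"},
                ],
            }
        ]
    },
)
```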

3. Automated ingestion of backdated Liberator data starts

From here the ingestion is fully automated and closely follows the normal Liberator ingestion process detailed in Liberator Ingestion Process.

The difference between the normal Liberator process and this one is that the ingestion date is taken from the IMPORT_DATE_OVERRIDE environment variable you set in step 2, instead of always being assumed to be the current day.

This process triggers a backdated Glue workflow, called parking-liberator-backdated-data-workflow, which transfers the data from the landing zone into the correct partition in the raw zone.
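
If you want to check on the workflow without opening the Glue console, a minimal boto3 sketch (assuming read access to Glue in the production account) is:

```python
import boto3

glue = boto3.client("glue")

# List recent runs of the backdated workflow and their statuses.
runs = glue.get_workflow_runs(Name="parking-liberator-backdated-data-workflow")

for run in runs["Runs"]:
    print(run["WorkflowRunId"], run["Status"], run.get("StartedOn"))
```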

When finished, the backdated Liberator data will have been ingested into the raw zone, in the partition whose date matches the IMPORT_DATE_OVERRIDE environment variable you set in step 2.

4. Re-enable the CloudWatch S3 event trigger (optional)

Please re-enable the CloudWatch trigger that you disabled in step 1. This is very important: otherwise the next day's Liberator ingestion will not happen.
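
As with step 1, this can be scripted. Assuming the trigger is the CloudWatch Events (EventBridge) rule named earlier:

```python
import boto3

events = boto3.client("events")

# Re-enable the daily trigger so tomorrow's scheduled ingestion runs as normal.
events.enable_rule(Name="liberator-to-rds-snapshot-event")
```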