Exporting database snapshots to the Data Platform Landing Zone
This section gives a technical overview of uploading data into the Data Platform from an RDS instance in AWS. For step-by-step instructions, refer to the Ingesting RDS snapshot into the Data Platform Landing Zone guide.
The terraform module `db_snapshot_to_s3` provisions the resources required for this export. After deployment, these resources export the data from any instance listed in the `rds_instance_ids` variable by executing the following steps:
- An RDS event subscription is provisioned, which attaches to the defined instance and pushes events to an SNS topic
- When a snapshot of the defined instance is created, a message is sent to an SQS queue
- The sent SQS message invokes the export lambda by passing it the RDS instance id
- The export lambda retrieves the last snapshot created for the received instance id and starts an RDS snapshot export task
- Once the export task is started the lambda queues an SQS message with the export task identifier and the RDS export storage bucket name
- The queued message invokes the copier lambda
- The copier lambda checks the state of the export task and once the export is completed, it transfers the exported data into an S3 bucket in the data platform account
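The export lambda's part of the flow above can be sketched in Python with boto3. This is an illustrative sketch, not the production code: the bucket name, role ARN, key alias, and export task naming are all assumptions.

```python
def latest_snapshot(snapshots):
    """Pick the most recently created usable snapshot from
    describe_db_snapshots output, or None if there is none."""
    complete = [s for s in snapshots if s.get("Status") == "available"]
    if not complete:
        return None
    return max(complete, key=lambda s: s["SnapshotCreateTime"])

def handler(event, context):
    # boto3 is imported inside the handler so the helper above can be
    # exercised without an AWS SDK or credentials present.
    import boto3

    rds = boto3.client("rds")
    for record in event["Records"]:
        # Hypothetical message shape: the SQS body carries the RDS instance id
        instance_id = record["body"]
        snapshots = rds.describe_db_snapshots(
            DBInstanceIdentifier=instance_id
        )["DBSnapshots"]
        snapshot = latest_snapshot(snapshots)
        if snapshot is None:
            continue
        # Start the RDS snapshot export task into the export storage bucket
        rds.start_export_task(
            ExportTaskIdentifier=f"{instance_id}-export",          # assumed naming
            SourceArn=snapshot["DBSnapshotArn"],
            S3BucketName="example-rds-export-storage",             # assumed bucket
            IamRoleArn="arn:aws:iam::123456789012:role/example-export-role",
            KmsKeyId="alias/example-export-key",
        )
```

The copier lambda then polls the export task's state (via `describe_export_tasks`) and copies the exported objects across accounts once it completes.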
Developer notes
There are development modules for testing this setup between the sandbox and Data Platform development accounts. They deploy all resources necessary for a complete test setup.
The modules can be found in the core/29-db-snapshot-to-s3-sandbox.tf file.
To deploy the resources, follow these steps (the instructions are also in the terraform file itself):
- Deploy the `db_snapshot_to_s3_sandbox_resources` module. This will deploy the database and bastion host resources to the sandbox account
- Copy the deployed sandbox database's name to the `rds_instance_ids` array variable in the `env.tfvars` file
- Deploy the `lambda_artefact_storage_for_sandbox_account` module. This will deploy the bucket for Lambda resources
- Deploy the `db_snapshot_to_s3_sandbox` module. This is the module currently used in production for the RDS snapshot to cross-account S3 feature and will deploy all resources required for the end-to-end setup
- Update the raw zone bucket in the Data Platform development account in your workspace with the following bucket and bucket key policy statements. Use these as inputs for `bucket_policy_statements` and `bucket_key_policy_statements` in the raw zone bucket module
Bucket policy statement:
```hcl
sandbox_s3_to_s3_copier_write_access_to_raw_zone_statement = {
  sid    = "AllowSandboxS3toS3CopierWriteAccessToRawZoneUnrestrictedLocation"
  effect = "Allow"
  actions = [
    "s3:ListBucket",
    "s3:PutObject",
    "s3:PutObjectAcl"
  ]
  principals = {
    type = "AWS"
    identifiers = [
      "arn:aws:iam::${var.aws_sandbox_account_id}:root",
      "arn:aws:iam::${var.aws_sandbox_account_id}:role/${local.identifier_prefix}-s3-to-s3-copier-lambda"
    ]
  }
  resources = [
    "${module.raw_zone.bucket_arn}",
    "${module.raw_zone.bucket_arn}/unrestricted/${replace(lower(local.identifier_prefix), "/[^a-zA-Z0-9]+/", "_")}sandbox/*"
  ]
}
```
Bucket key policy statement:
```hcl
sandbox_s3_to_s3_copier_raw_zone_key_statement = {
  sid    = "SandboxS3ToS3CopierAccessToRawZoneKey"
  effect = "Allow"
  actions = [
    "kms:Encrypt",
    "kms:GenerateDataKey*"
  ]
  principals = {
    type = "AWS"
    identifiers = [
      "arn:aws:iam::${var.aws_sandbox_account_id}:root",
      "arn:aws:iam::${var.aws_sandbox_account_id}:role/${local.identifier_prefix}-s3-to-s3-copier-lambda"
    ]
  }
}
```
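The `replace(lower(local.identifier_prefix), "/[^a-zA-Z0-9]+/", "_")` expression in the bucket policy above determines the folder prefix under `unrestricted/` that the copier writes into. If you want to check what prefix a given identifier produces, an equivalent sketch in Python (the function name is ours, not from the codebase):

```python
import re

def raw_zone_prefix(identifier_prefix: str) -> str:
    """Mirror Terraform's replace(lower(identifier_prefix), "/[^a-zA-Z0-9]+/", "_"):
    lowercase the prefix and collapse each run of non-alphanumeric
    characters into a single underscore."""
    return re.sub(r"[^a-zA-Z0-9]+", "_", identifier_prefix.lower())

# The allowed object path in the bucket policy is then:
#   <bucket_arn>/unrestricted/<raw_zone_prefix>sandbox/*
```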
- Uncomment the statement in the sandbox database key policy to allow the RDS snapshot to S3 lambda role access to the key. This must be done after all other resources have been deployed.
- Connect to the database from the sandbox bastion host using AWS Session Manager and import some dummy data for testing. You can import test data by following these steps:
```sh
sudo yum update -y
sudo amazon-linux-extras enable postgresql14
sudo yum install postgresql-server -y
psql --host=<YOUR_DATABASE_HOST> --port=5432 --username=sandbox_db_admin --password --dbname=<YOUR_DATABASE_NAME>
```
Enter the generated database password from Secrets Manager when prompted.
You can use the following resources for a quick test data setup:
schema: https://github.com/LBHackney-IT/social-care-case-viewer-api/blob/master/database/schema.sql
seeds: https://github.com/LBHackney-IT/social-care-case-viewer-api/blob/master/database/seeds.sql
Copy and paste the files' content into the psql terminal.
- Test the setup by taking a manual snapshot of the sandbox database. Once the process is complete, you should have data in your Data Platform development raw zone in the unrestricted area. The data will be in a folder matching the sandbox database name.
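The manual snapshot in the step above can be taken from the AWS console, or scripted. A minimal sketch using boto3, assuming an illustrative identifier naming scheme of our own:

```python
from datetime import datetime, timezone

def snapshot_identifier(instance_id: str, now=None) -> str:
    """Build a unique, RDS-safe identifier for a manual test snapshot
    (naming scheme is illustrative, not a project convention)."""
    now = now or datetime.now(timezone.utc)
    return f"{instance_id}-manual-{now:%Y%m%d-%H%M%S}"

def take_manual_snapshot(instance_id: str):
    # boto3 is imported here so snapshot_identifier stays testable offline
    import boto3

    rds = boto3.client("rds")
    return rds.create_db_snapshot(
        DBInstanceIdentifier=instance_id,
        DBSnapshotIdentifier=snapshot_identifier(instance_id),
    )
```

Taking the snapshot fires the RDS event subscription, which drives the export and copier lambdas end to end.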
Once you have completed testing, you can destroy the resources by completing the steps in reverse order from step 6.