Introduction to Time Series Helpers
Generally useful Time Series Helpers
A step-by-step guide with examples to follow along
Description of the ingestion and refinement pipeline for Alloy Environmental Services data
Overview of the Auto-adjusting Budget Alerts
Description of the backdated ingestion process for Liberator data
Explanation of the CD process for the Data Platform
Explanation of the CI process for the Data Platform
How to connect to the Data Platform from Tableau Online using Redshift or Athena
Connect to the Redshift cluster from Google Data Studio
Create an extract to optimise dashboard performance
What is a data catalogue?
An overview of the data lake, its responsibilities, and how data moves through the zones within the data lake
An overview of the data warehouse, its responsibilities, and how data is served from within the Data Platform
Creating Glue jobs in Terraform
Forecast using Exponential Smoothing
Overview of how db snapshots are exported to the Data Platform Landing Zone
How to perform geospatial enrichment on data held in the data platform
Setting up a PySpark environment on a local machine and running Data Platform scripts
A table of terms and tools used by the Data Platform
A guide to continuous data quality testing in Glue Jobs
A beginner's guide to testing on the Data Platform
Forecast using Holt Winters ETS
Ingesting data using AWS Lambda [step-by-step]
Description of how files are imported from Google to S3
Overview of how data is imported from spreadsheets that are stored in G drive
Overview of how files from external suppliers are imported to the Data Platform Landing Zone
Ingest data from CSV files
Description of how spreadsheet files are ingested from G Drive
Overview of how Academy data is ingested onto the Data Platform from MS SQL databases and distributed to Housing Benefits & Needs and Revenues Departments
Ingesting API data into the Data Platform using an AWS Lambda function
Ingesting database tables into the Data Platform using a JDBC Connection
Ingesting tables from a DynamoDB instance into the Data Platform landing zone
Setting up a new Kafka topic to stream events from a new data entity
Ingesting a snapshot of an RDS instance into the Data Platform landing zone
Description of the ingestion process for Liberator data
Local Notebook Environment Setup
How to add a new Google group for a department
How to add users to a Google group
Elements for optimising Glue jobs
Overview of how data is copied from production to pre-production
Prototyping transformation scripts using a Jupyter Notebook
Using AWS Athena to query data in S3
Description of the ingestion process for RingGo data
There are currently four tiers of role within the data platform project, and they are as follows:
Schedule a Glue job to run when new Liberator data is added to the platform
Description of the ingestion and refinement pipeline for Tascomi planning data
Recommendations to write an API ingestion script for a Lambda in the Data Platform
Due to the variety of data sources we have had to develop several different ingestion methods. These methods and the data being ingested are detailed in this section
A guide on how to carry out common tasks in GitHub
Using AWS Glue Studio to create ETL processes
How to use the data catalogue
Use of the watermarks class for recording Glue job states between runs
Overview of the VPC Peering Connection and its purpose