Ingesting data from Google Sheets
Objective
Ingest data contained within a Google Sheet for use on the Data Platform, optionally setting a recurring schedule to ingest new data.
Intended audience
- Data Analyst
Preparing a Google Sheet for Ingestion
To ensure a Google Sheet is ready for ingestion, follow these steps:
- Open the Google Sheet you want to ingest.
- Ensure all columns have headers – Columns without headers will be ignored.
- Name the document if it is currently unnamed.
- Remove
restricted
tag on the sheet. - Share the sheet with the appropriate service account emails:
- Click
Share
in the top right corner. - Enter the two service email addresses (found in the table below) and set their access to
Viewer
. - Uncheck the
Notify people
box. - Click
Share
. If prompted to confirm sharing outside your organisation, clickShare anyway
.
- Click
- Edit the YAML file as described below. If your department does not have a YAML file, contact the DAP team for configuration assistance.
- Your Google Sheet is now ready for ingestion.
Departmental Service Email Addresses and YAML Files
How to Edit the YAML File
To configure data ingestion for a new sheet, follow these steps using parking as an example. This process ensures that the sheet is correctly mapped in the YAML file.
Steps:
-
Locate the YAML file
- Open the relevant YAML file (e.g.,
parking.yaml
).
- Open the relevant YAML file (e.g.,
-
Append the following configuration
- Scroll to the end of the file.
- Copy and paste the following details, replacing values where necessary:
- department: parking
spreadsheet_title: nas_live_manual_updates_data_load
sheet_name: nas_live_manual_updates_data_load
sheet_id: 1Xf8nBhIW6wyvNOU7JOWSCJwZvsx65T9ZWYSct9vj9Gw
target_table: nas_live_manual_updates_data_load
pause_flag: -
Modify the parameters
department
: Define the department to which the data belongs.spreadsheet_title
: (Optional) A human-readable name for easy identification.sheet_name
: Specify the name of the sheet in the Google Spreadsheet.sheet_id
: Enter the unique Google Spreadsheet identifier.target_table
: Define the table name in the raw zone database (The table name in the raw zone database will be same as target_table specified here and it's not mandatory to add department name in the table name).pause_flag
: Leave blank for scheduled ingestion or set toTrue
to pause.