Ingesting data from Google Sheets
Objective
Ingest data contained within a Google Sheet for use on the Data Platform, optionally setting a recurring schedule to ingest new data.
Intended audience
- Data Analyst
Preparing a Google Sheet for Ingestion
To ensure a Google Sheet is ready for ingestion, follow these steps:
- Open the Google Sheet you want to ingest.
- Ensure all columns have headers – Columns without headers will be ignored.
- Name the document if it is currently unnamed.
- Remove
restrictedtag on the sheet. - Share the sheet with the appropriate service account emails:
- Click
Sharein the top right corner. - Enter the two service email addresses (found in the table below) and set their access to
Viewer. - Uncheck the
Notify peoplebox. - Click
Share. If prompted to confirm sharing outside your organisation, clickShare anyway.
- Click
- Edit the YAML file as described below. If your department does not have a YAML file, contact the DAP team for configuration assistance.
- Your Google Sheet is now ready for ingestion.
Departmental Service Email Addresses and YAML Files
How to Edit the YAML File
To configure data ingestion for a new sheet, follow these steps using parking as an example. This process ensures that the sheet is correctly mapped in the YAML file.
Steps:
-
Locate the YAML file
- Open the relevant YAML file (e.g.,
parking.yaml).
- Open the relevant YAML file (e.g.,
-
Append the following configuration
- Scroll to the end of the file.
- Copy and paste the following details, replacing values where necessary:
- department: parking
spreadsheet_title: nas_live_manual_updates_data_load
sheet_name: nas_live_manual_updates_data_load
sheet_id: 1Xf8nBhIW6wyvNOU7JOWSCJwZvsx65T9ZWYSct9vj9Gw
target_table: nas_live_manual_updates_data_load
pause_flag: -
Modify the parameters
department: Define the department to which the data belongs.spreadsheet_title: (Optional) A human-readable name for easy identification.sheet_name: Specify the name of the sheet in the Google Spreadsheet.sheet_id: Enter the unique Google Spreadsheet identifier.target_table: Define the table name in the raw zone database (The table name in the raw zone database will be same as target_table specified here and it's not mandatory to add department name in the table name).starting_position: (Optional) Specify the first cell to read from using A1 notation, such asA1,A2,W16, orAB123. The import ignores rows above and columns to the left of this cell. The selected cell is treated as the first header cell, with the data rows expected below it. If this value is omitted, blank, or set toNone, the import starts fromA1.pause_flag: Leave blank for scheduled ingestion or set totrueto pause.