Skip to main content

Ingesting data from Google Sheets

Objective

Ingest data contained within a Google Sheet for use on the Data Platform, optionally setting a recurring schedule to ingest new data.

Intended audience

  • Data Analyst

Preparing a Google Sheet for Ingestion

To ensure a Google Sheet is ready for ingestion, follow these steps:

  1. Open the Google Sheet you want to ingest.
  2. Ensure all columns have headers – Columns without headers will be ignored.
  3. Name the document if it is currently unnamed.
  4. Remove restricted tag on the sheet.
  5. Share the sheet with the appropriate service account emails:
    • Click Share in the top right corner.
    • Enter the two service email addresses (found in the table below) and set their access to Viewer.
    • Uncheck the Notify people box.
    • Click Share. If prompted to confirm sharing outside your organisation, click Share anyway.
  6. Edit the YAML file as described below. If your department does not have a YAML file, contact the DAP team for configuration assistance.
  7. Your Google Sheet is now ready for ingestion.

Departmental Service Email Addresses and YAML Files

DepartmentStaging Service Account EmailProduction Service Account EmailYAML File URL
Parkingparking@dataplatform-stg.iam.gserviceaccount.comparking@dataplatform-prod0.iam.gserviceaccount.comLink
Housing Repairshousing-repairs@dataplatform-stg.iam.gserviceaccount.comhousing-repairs@dataplatform-prod0.iam.gserviceaccount.com
Data & Insightdata-and-insight@dataplatform-stg.iam.gserviceaccount.comdata-and-insight@dataplatform-prod0.iam.gserviceaccount.comLink
Financefinance@dataplatform-stg.iam.gserviceaccount.comfinance@dataplatform-prod0.iam.gserviceaccount.com
Environmental Enforcementenv-enforcement@dataplatform-stg.iam.gserviceaccount.comenv-enforcement@dataplatform-prod0.iam.gserviceaccount.com
Planningplanning@dataplatform-stg.iam.gserviceaccount.complanning@dataplatform-prod0.iam.gserviceaccount.com
Unrestrictedunrestricted@dataplatform-stg.iam.gserviceaccount.comunrestricted@dataplatform-prod0.iam.gserviceaccount.com
Sandboxsandbox@dataplatform-stg.iam.gserviceaccount.comsandbox@dataplatform-prod0.iam.gserviceaccount.com
Benefits & Housing Needsbenefits-housing-needs@dataplatform-stg.iam.gserviceaccount.combenefits-housing-needs@dataplatform-prod0.iam.gserviceaccount.com
Revenuesrevenues@dataplatform-stg.iam.gserviceaccount.comrevenues@dataplatform-prod0.iam.gserviceaccount.com
Environmental Servicesenvironmental-services@dataplatform-stg.iam.gserviceaccount.comenvironmental-services@dataplatform-prod0.iam.gserviceaccount.com
Housinghousing@dataplatform-stg.iam.gserviceaccount.comhousing@dataplatform-prod0.iam.gserviceaccount.comLink
Child & Family Serviceschild-fam-services@dataplatform-prod0.iam.gserviceaccount.comchild-fam-services@dataplatform-stg.iam.gserviceaccount.comLink

How to Edit the YAML File

To configure data ingestion for a new sheet, follow these steps using parking as an example. This process ensures that the sheet is correctly mapped in the YAML file.

Steps:

  1. Locate the YAML file

    • Open the relevant YAML file (e.g., parking.yaml).
  2. Append the following configuration

    • Scroll to the end of the file.
    • Copy and paste the following details, replacing values where necessary:
    - department: parking
    spreadsheet_title: nas_live_manual_updates_data_load
    sheet_name: nas_live_manual_updates_data_load
    sheet_id: 1Xf8nBhIW6wyvNOU7JOWSCJwZvsx65T9ZWYSct9vj9Gw
    target_table: nas_live_manual_updates_data_load
    pause_flag:
  3. Modify the parameters

    • department: Define the department to which the data belongs.
    • spreadsheet_title: (Optional) A human-readable name for easy identification.
    • sheet_name: Specify the name of the sheet in the Google Spreadsheet.
    • sheet_id: Enter the unique Google Spreadsheet identifier.
    • target_table: Define the table name in the raw zone database (The table name in the raw zone database will be same as target_table specified here and it's not mandatory to add department name in the table name).
    • pause_flag: Leave blank for scheduled ingestion or set to True to pause.