Export data catalog table or view to Google Sheets
Objective
Export data catalog table or view from AWS Glue to Google Sheets, allowing you to share and analyze data in a spreadsheet format.
Intended audience
- Data Analyst
Preparing a Google Sheet for Export
To ensure a Google Sheet is ready to receive exported table or view data, follow these steps:
- Create or open the Google Sheet where you want to export the data.
- Create your worksheet in the sheet with the desired name.
- Remove restrictedtag on the sheet.
- Share the sheet with the appropriate service account emails:
- Click Sharein the top right corner.
- Enter the two service email addresses (found in the table below) and set their access to Editor(required for writing data to the sheet).
- Uncheck the Notify peoplebox.
- Click Share. If prompted to confirm sharing outside your organisation, clickShare anyway.
 
- Click 
- Add your sheet information to the configuration file as described below.
- Your Google Sheet is now ready to receive exported table or view data.
Departmental Service Email Addresses and YAML Files
Configuration Parameters
Each sheet export configuration can include the following parameters:
- sheet_id: The unique identifier for the Google Sheet (required)
- worksheet_name: The name of the worksheet within the Google Sheet where data will be exported (required)
- glue_database: The database name in Glue data catalog (required)
- glue_table: The table or view name in the Glue database (required)
- custom_query: (Optional) Custom SQL query to filter the catalog data before export.- By default, it's empty and means SELECT * FROM glue_database.glue_table
- Cannot be used together with only_latest_partition
 
- By default, it's empty and means 
- only_latest_partition: (Optional) Boolean (- true/- false). Only allow to be- trueif table is partitioned by import_date. When- true, automatically generates a query to get only the latest partition based on- import_date:- SELECT * FROM "database"."table" WHERE import_date = (SELECT MAX(import_date) FROM "database"."table")- Cannot be used together with custom_query. By default, it'sfalse.
 
- Cannot be used together with 
- sheet_schedule: (Optional) Schedule for this specific sheet export within the DAG run.- If not specified, the task runs daily.
- If you want to export more than once a day, you still need to talk to the DAP to do some config behind the scenes.
- Format: cron expression (e.g., "30 3 * * 5"for Friday at 3:30 AM)
 
Important Notes
- Use either custom_queryORonly_latest_partition, not both.
Example Configuration
# Example for non-partitioned table or view
sheets:
  - sheet_id: 1qgRT9GTSJw9DzJAazbLcN1sjIkqNXEMOMWaQD6m79M0
    worksheet_name: Flat Table
    glue_database: housing_nec_migration
    glue_table: homeowners_flat_database
# Example for custom query
  - sheet_id: 1InRDSKgMGYuuhAB8huORD_mZVZKADuuVNM-egsZ_7hc
    worksheet_name: cfs_test1
    glue_database: child-fam-services-refined-zone
    glue_table: addresses
    custom_query: SELECT id, flat_number, post_code FROM "child-fam-services-raw-zone"."addresses" where street = 'Essex Road'
    sheet_schedule: 30 3 * * 5
# Example for partitioned table
  - sheet_id: 1InRDSKgMGYuuhAB8huORD_mZVZKADuuVNM-egsZ_7hc
    worksheet_name: cfs_test2
    glue_database: child-fam-services-refined-zone
    glue_table: open_cin_cases_vs_supervisions
    only_latest_partition: true