# Import files from Google Drive to S3
## Preparing to import the file from Google Drive
- Open the file you would like to import
- Ensure that all columns have headers. Columns without headers will be lost
- Click **Share** in the top right corner of the sheet
- If the document is unnamed, name it
- Paste the service account email address you have been provided into the email box
- Ensure the suggested email matches the service account email and select it
- In the new window, open the dropdown on the right-hand side and select **Viewer**
- Uncheck the **Notify people** checkbox
- Click **Share**
- You will be asked to confirm sharing outside the organisation; click **Share anyway**
- Your file is now available for import
## Getting the file details
- You will need to obtain the document ID from the URL
- The document ID is the portion of the URL between `https://docs.google.com/file/d/` and `/edit#gid=0`. See the example below
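For example, given a URL like `https://docs.google.com/file/d/1a2B3cDEFexampleID/edit#gid=0` (this ID is made up for illustration), the document ID would be `1a2B3cDEFexampleID`.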
## Setting up the copier lambda
- Before setting up an AWS Glue job, ensure that the relevant department configuration for that account is set up in AWS; see the Adding a department section in managing-departments.md
- Open the Data Platform project. You'll need a GitHub account (which you can create yourself using your Hackney email) and to have been added to the 'LBHackney-IT' team to view this project (you'll need to request this from Rashmi Shetty). If you don't have the correct permissions, you'll get a '404' error.
- Navigate to the main `terraform` directory (data-platform/terraform)
- Open the `65-g-drive-to-s3` terraform file
- Switch to 'edit mode' (using the edit button in the top right)
- Copy one of the existing modules, paste it at the bottom of the file, and update the following fields (see the sketch after this list):
  - `module` = "your-unique-module-name" (it is helpful to keep the same naming convention as your dataset/folder)
  - `lambda_name` = "Your lambda name" (this is what you'll see in the Glue console; it can be the same as your module name)
  - `file_id` = "Your document ID; see the Getting the file details section above"
  - `file_name` = "The name of the file you are importing, including the file extension and using underscores instead of spaces"
  - `service_area` = "The name of the service area folder you would like to store it in, e.g. `housing` or `social-care`" (if this folder doesn't already exist in S3, you can name it here and this script will create it)
- Committing your changes: the Data Platform team needs to approve any changes to the code, so your change won't happen automatically. To submit your change:
  - Provide a description to explain what you've changed
  - Select the option to create a **new branch** for this commit (i.e. the code you've changed). You can just use the suggested name for your branch.
  - Once you click 'Propose changes' you'll have the opportunity to add more detail if needed before submitting it for review.
- You'll receive an email confirming that your changes have been approved.