Read Police API crime street data and transform it
Goal
Read street-level crime data from the Police API:
https://data.police.uk/api/crimes-street/all-crime?date=2024-01&lat=51.5450&lng=-0.0553
Write the raw JSON response to:
s3://dataplatform-stg-raw-zone/data-and-insight/testing/demo/police_api_crime_street/crimes-street-all-crime-hackney.json
Then transform that raw JSON into a partitioned parquet Glue table in the raw zone:
"data-and-insight-raw-zone"."test_tian_demo_police_crime_street"
Partition the transformed table by:
["import_year", "import_month", "import_day", "import_date"]
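The four partition columns can be derived from the date of the import run. The sketch below is one way to do that with the standard library. The function name is hypothetical, and the string formats (zero-padded month and day, compact YYYYMMDD for import_date) are assumptions to check against the transform script.

```python
from datetime import date

PARTITION_COLUMNS = ["import_year", "import_month", "import_day", "import_date"]


def import_partition_values(run_date: date) -> dict:
    """Return string partition values for the given import (run) date."""
    return {
        "import_year": run_date.strftime("%Y"),
        "import_month": run_date.strftime("%m"),  # zero-padded, e.g. "01"
        "import_day": run_date.strftime("%d"),    # zero-padded, e.g. "05"
        # Assumption: import_date uses the compact YYYYMMDD form.
        "import_date": run_date.strftime("%Y%m%d"),
    }
```

Calling import_partition_values(date.today()) gives the values for a run made today. Keeping the values as strings avoids losing the leading zeros when they become partition folder names.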
The Police API street-level crimes method supports a point search using lat, lng, and an optional date. This example uses a fixed point in Hackney and date=2024-01 so the result is small enough for a training run.
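As an illustration of the point search, the sketch below builds the request URL from lat, lng, and the optional date, then fetches and parses the JSON. It uses only the standard library so it runs without extra packages; the training scripts may use requests instead. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.police.uk/api/crimes-street/all-crime"


def build_crimes_street_url(lat, lng, date=None):
    """Build the point-search URL. lat and lng are passed as strings so
    trailing zeros (e.g. "51.5450") are kept exactly as written."""
    params = {}
    if date is not None:
        params["date"] = date  # YYYY-MM, optional
    params["lat"] = lat
    params["lng"] = lng
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_crimes(url, timeout=30):
    """Call the API and return the parsed JSON body (a list of crime records)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For the fixed Hackney point, build_crimes_street_url("51.5450", "-0.0553", "2024-01") reproduces the URL shown in the Goal section, and fetch_crimes(...) on that URL returns the records to write to the raw zone.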
API documentation:
1. Configure the AWS profile on Windows
Open this file:
notepad C:\Users\<your_windows_username>\.aws\config
If the .aws folder or config file does not exist, create it.
Add this profile:
[profile DataPlatformDataAndInsightStg]
sso_start_url = https://hackney.awsapps.com/start
sso_region = eu-west-1
sso_account_id = 120038763019
sso_role_name = DataPlatformDataAndInsightStg
region = eu-west-2
output = json
Then log in:
aws sso login --profile DataPlatformDataAndInsightStg
Check the login works:
aws sts get-caller-identity --profile DataPlatformDataAndInsightStg
2. Install the Python packages
pip install boto3 requests awswrangler pandas
3. Check the scripts
The runnable scripts are stored next to this page:
docs/training-modules/ingest_from_API/police_api_crime_street_raw.py
docs/training-modules/ingest_from_API/police_api_crime_street_transform.py
The first script reads from the API and writes one JSON file to the raw zone.
The second script reads that raw JSON file, flattens the nested API fields, adds import partition columns, writes parquet to the raw zone, and registers a Glue table.
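The flattening step can be sketched as follows. Each crime record from the API contains nested objects such as location and its street; the hypothetical helper below turns them into flat column names joined with an underscore. The sample record is made up for illustration and only mirrors the shape of an API record, and the real transform script may flatten differently.

```python
def flatten_record(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts into a single-level dict."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, name, sep))
        else:
            flat[name] = value  # scalars and None are kept as-is
    return flat


# Illustrative record only (not real data), shaped like an API response item.
sample = {
    "category": "example-category",
    "location": {
        "latitude": "51.5450",
        "longitude": "-0.0553",
        "street": {"id": 1, "name": "On or near Example Street"},
    },
    "outcome_status": None,
    "month": "2024-01",
}
flat_sample = flatten_record(sample)
```

Here flat_sample has keys such as location_latitude and location_street_name. With pandas installed, pandas.json_normalize(records, sep="_") gives a comparable result as a DataFrame, and awswrangler's wr.s3.to_parquet with dataset=True, partition_cols, database, and table is one way to write the partitioned parquet and register the Glue table in a single call.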
Both scripts use the logging module instead of print, so the run output is clear.
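A minimal logging setup of the kind described might look like the sketch below. The logger name, the message format, and the helper function are assumptions for illustration, not taken from the scripts.

```python
import logging

# Configure the root logger once, at the start of the script.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# Hypothetical logger name; the scripts may use a different one.
logger = logging.getLogger("police_api_crime_street")


def log_record_count(records):
    """Log how many records were fetched and return the count."""
    count = len(records)
    logger.info("Fetched %d records from the Police API", count)
    return count
```

Compared with print, each message carries a timestamp and level, which makes it easier to see where a run stopped.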
4. Run the raw ingest
Run police_api_crime_street_raw.py, then check that the raw JSON file has been written to s3://dataplatform-stg-raw-zone/data-and-insight/testing/demo/police_api_crime_street/
5. Run the transform
Run police_api_crime_street_transform.py, then check that the partitioned parquet files exist under: s3://dataplatform-stg-raw-zone/data-and-insight/testing/demo/test_tian_demo_police_crime_street/
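The S3 checks in steps 4 and 5 can also be done from Python. The sketch below lists the object keys under an S3 prefix; the helper names are hypothetical. The S3 client is passed in as an argument, so the same function works for both the raw JSON folder and the parquet folder.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    if not uri.startswith("s3://"):
        raise ValueError(f"Not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


def list_keys(s3_client, uri):
    """Return the object keys under the given S3 URI.

    Uses a single list_objects_v2 call, so at most 1000 keys are returned;
    that is enough for a small training run.
    """
    bucket, prefix = split_s3_uri(uri)
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


# Example usage (needs boto3 and a valid SSO login):
# import boto3
# session = boto3.Session(profile_name="DataPlatformDataAndInsightStg")
# keys = list_keys(
#     session.client("s3"),
#     "s3://dataplatform-stg-raw-zone/data-and-insight/testing/demo/"
#     "test_tian_demo_police_crime_street/",
# )
```

An empty list means nothing has been written under that prefix yet.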
6. Check the result in Athena
Query "data-and-insight-raw-zone"."test_tian_demo_police_crime_street" in Athena to check that the table exists and returns rows.
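One possible check is to count rows per import_date partition. The sketch below only builds the SQL text; the function name is hypothetical, and the query assumes the import_date partition column listed in the Goal section.

```python
def build_partition_count_query(database, table):
    """Return an Athena SQL query that counts rows per import_date."""
    return (
        "SELECT import_date, COUNT(*) AS row_count\n"
        f'FROM "{database}"."{table}"\n'
        "GROUP BY import_date\n"
        "ORDER BY import_date DESC"
    )


query = build_partition_count_query(
    "data-and-insight-raw-zone",
    "test_tian_demo_police_crime_street",
)
```

The resulting text can be pasted into the Athena query editor. With awswrangler installed and a valid login, it can also be passed to wr.athena.read_sql_query(sql=query, database="data-and-insight-raw-zone") to get the result as a DataFrame.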