Ingesting new data entities via event streaming

Objective

To explain the setup needed to ingest a new data entity via event streaming.

Intended audience

  • Data Engineers

Prerequisites

  • Ensure you have completed the prerequisites in our getting set up guide.
  • You will need the schema for the data entity you intend to ingest.

Adding a new topic and schema for the data entity

You will need to upload a schema for this data entity to the registry. Events pushed to the new topic will be checked against this schema; any events that are not compatible will be rejected. The schema file must be a JSON file that follows the Avro schema specification. For an example, you can see the schema for the tenure API events here.
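
For illustration, a minimal Avro record schema in JSON might look like the sketch below. The record name, namespace, and fields are placeholders for this example, not the real tenure schema.

```json
{
  "type": "record",
  "name": "ExampleEntityEvent",
  "namespace": "example.events",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "eventType", "type": "string" },
    { "name": "createdAt", "type": { "type": "long", "logicalType": "timestamp-millis" } },
    { "name": "description", "type": ["null", "string"], "default": null }
  ]
}
```

Note that optional fields are expressed as a union with null (as in description above). JSON does not allow comments, so any explanatory notes belong in the schema's doc attribute rather than inline.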

Once you have the schema file, create a pull request to commit the following changes. If you need help with this, refer to the committing changes section of the Using GitHub guide.

  1. Add the schema file to the schemas folder of the kafka schema registry module in the data platform repository. The file should have the same name as the topic, for example, tenure_api.json.
  2. Add the topic name to the list of topics in the kafka event streaming module (see the sketch after this list).
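
As an illustrative sketch only: assuming the modules in the data platform repository are defined in Terraform, and using hypothetical module and variable names, the change in step 2 might look roughly like this. Check the repository itself for the actual layout and names.

```hcl
# Hypothetical sketch: the real module source and variable names
# in the data platform repository may differ.
module "kafka_event_streaming" {
  source = "../modules/kafka-event-streaming"

  # Each topic name must match its schema file name in the
  # schema registry module, e.g. "tenure_api" <-> tenure_api.json.
  topics = [
    "existing_topic",
    "tenure_api",
  ]
}
```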

Once the pull request has been approved and merged, the deployment pipeline will add the new schema to the registry.