Skip to main content

Getting started with Kafka and Beam

·2 mins
Cover image

I’ve been investigating using Apache Kafka with Apache Beam and set up a small prototype environment using Docker Compose. It’s published on GitHub.

You can start up the Kafka and Zookeeper (which it depends on) with make up.

Once started, we can create a Kafka topic to use for testing with make topic. You can check it’s been created successfully by running make describe to dump some information on the topic.

$ make describe
docker-compose exec kafka kafka-topics --describe --topic words --zookeeper zookeeper:32181
Topic:words	PartitionCount:1	ReplicationFactor:1	Configs:
	Topic: words	Partition: 0	Leader: 1	Replicas: 1	Isr: 1

Now we can use Beam to write some data to that topic. make producer will run the code in src/main/java/com/andrewjones/KafkaProducerExample.java.

To check we now have some data in the topic run make offset. This will tell us there are 5 events in that topic. We can dump the data back out again from Kafka using make dump.

$ make dump
docker-compose exec kafka kafka-console-consumer --bootstrap-server kafka:29092 --topic words --new-consumer --from-beginning --max-messages 5
hi bob
hi
hi sue bob
hi there
hi sue

Finally we can use Beam to read out the data again. We’ll do a simple wordcount on the data in the topic. Run make consumer to run the code in src/main/java/com/andrewjones/KafkaConsumerExample.java.

To see the output, run cat wordcount*.

$ cat wordcounts-0000*
sue: 2
there: 1
bob: 2
hi: 5

Finally close down everything with make down and clean up with make clean.

The full code can be found on GitHub.

Cover image from Unsplash.


Want great, practical advice on implementing data mesh, data products and data contracts?

In my weekly newsletter I share with you an original post and links to what's new and cool in the world of data mesh, data products, and data contracts.

I also include a little pun, because why not? 😅

(Don’t worry—I hate spam, too, and I’ll NEVER share your email address with anyone!)


Andrew Jones
Author
Andrew Jones
I build data platforms that reduce risk and drive revenue.