I’ve been investigating using Apache Kafka with Apache Beam and set up a small prototype environment using Docker Compose. It’s published on GitHub.

You can start up the Kafka and Zookeeper (which it depends on) with make up.

Once started, we can create a Kafka topic to use for testing with make topic. You can check it’s been created successfully by running make describe to dump some information on the topic.

$ make describe
docker-compose exec kafka kafka-topics --describe --topic words --zookeeper zookeeper:32181
Topic:words	PartitionCount:1	ReplicationFactor:1	Configs:
	Topic: words	Partition: 0	Leader: 1	Replicas: 1	Isr: 1

Now we can use Beam to write some data to that topic. make producer will run the code in src/main/java/com/andrewjones/KafkaProducerExample.java.

To check we now have some data in the topic run make offset. This will tell us there are 5 events in that topic. We can dump the data back out again from Kafka using make dump.

$ make dump
docker-compose exec kafka kafka-console-consumer --bootstrap-server kafka:29092 --topic words --new-consumer --from-beginning --max-messages 5
hi bob
hi
hi sue bob
hi there
hi sue

Finally we can use Beam to read out the data again. We’ll do a simple wordcount on the data in the topic. Run make consumer to run the code in src/main/java/com/andrewjones/KafkaConsumerExample.java.

To see the output, run cat wordcount*.

$ cat wordcounts-0000*
sue: 2
there: 1
bob: 2
hi: 5

Finally close down everything with make down and clean up with make clean.

The full code can be found on GitHub.

Cover image from Unsplash.