Getting started with Kafka and Beam
I’ve been investigating using Apache Kafka with Apache Beam and set up a small prototype environment using Docker Compose. It’s published on GitHub.
You can start Kafka, and Zookeeper (which it depends on), with make up.
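The Docker Compose file in the repository defines both services. A minimal sketch of what it might look like is below — the image names are assumptions (the ports match the zookeeper:32181 and kafka:29092 addresses that appear in the commands later in this post; check the repository for the real definitions):

```yaml
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper   # image name is an assumption
    environment:
      ZOOKEEPER_CLIENT_PORT: 32181     # matches the port used by make describe below

  kafka:
    image: confluentinc/cp-kafka       # image name is an assumption
    depends_on:
      - zookeeper
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:32181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1   # single-broker setup
```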
Once started, we can create a Kafka topic to use for testing with make topic. You can check it’s been created successfully by running make describe to dump some information on the topic.
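Under the hood, make topic presumably wraps Kafka’s kafka-topics tool with something like the following — the exact invocation is an assumption, but the partition and replication counts match what make describe reports below:

```shell
# Create the "words" topic inside the kafka container
docker-compose exec kafka kafka-topics --create \
  --topic words \
  --partitions 1 \
  --replication-factor 1 \
  --zookeeper zookeeper:32181
```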
$ make describe
docker-compose exec kafka kafka-topics --describe --topic words --zookeeper zookeeper:32181
Topic:words PartitionCount:1 ReplicationFactor:1 Configs:
Topic: words Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Now we can use Beam to write some data to that topic. make producer will run the code in src/main/java/com/andrewjones/KafkaProducerExample.java.
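In outline, the producer pipeline looks something like the sketch below, using Beam’s KafkaIO connector to write the messages as keyless values. The broker address is an assumption (kafka:29092 inside the Compose network, or however the listener is mapped to the host); see KafkaProducerExample.java for the real code:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaProducerSketch {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // The five messages we later see when dumping the topic.
        p.apply(Create.of("hi bob", "hi", "hi sue bob", "hi there", "hi sue"))
         .apply(KafkaIO.<Void, String>write()
             .withBootstrapServers("localhost:9092") // assumption: host-mapped listener
             .withTopic("words")
             .withValueSerializer(StringSerializer.class)
             .values()); // write values only, with no keys

        p.run().waitUntilFinish();
    }
}
```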
To check we now have some data in the topic, run make offset. This will tell us there are 5 events in the topic. We can dump the data back out of Kafka using make dump.
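make offset presumably wraps Kafka’s GetOffsetShell tool, along these lines (the exact invocation is an assumption; --time -1 asks for the latest offset of each partition):

```shell
docker-compose exec kafka kafka-run-class kafka.tools.GetOffsetShell \
  --broker-list kafka:29092 --topic words --time -1
```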
$ make dump
docker-compose exec kafka kafka-console-consumer --bootstrap-server kafka:29092 --topic words --new-consumer --from-beginning --max-messages 5
hi bob
hi
hi sue bob
hi there
hi sue
Finally, we can use Beam to read the data back out. We’ll do a simple wordcount on the data in the topic. Run make consumer to run the code in src/main/java/com/andrewjones/KafkaConsumerExample.java.
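A sketch of what that consumer pipeline might look like is below: read from the topic with KafkaIO, split each message into words, count them, and write the results to files with the wordcounts prefix. As with the producer, the broker address and record bound are assumptions; KafkaConsumerExample.java has the real code:

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaConsumerSketch {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply(KafkaIO.<Long, String>read()
                .withBootstrapServers("localhost:9092") // assumption: host-mapped listener
                .withTopic("words")
                .withKeyDeserializer(LongDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withMaxNumRecords(5)   // bound the otherwise-unbounded source
                .withoutMetadata())     // KV<Long, String>, dropping Kafka metadata
         .apply(Values.create())        // keep just the message text
         .apply(FlatMapElements.into(TypeDescriptors.strings())
                .via(line -> Arrays.asList(line.split(" "))))
         .apply(Count.perElement())     // KV<word, count>
         .apply(MapElements.into(TypeDescriptors.strings())
                .via(kv -> kv.getKey() + ": " + kv.getValue()))
         .apply(TextIO.write().to("wordcounts")); // produces wordcounts-0000… files

        p.run().waitUntilFinish();
    }
}
```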
To see the output, run cat wordcount*.
$ cat wordcounts-0000*
sue: 2
there: 1
bob: 2
hi: 5
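As a sanity check, these counts follow directly from the five messages dumped earlier — the same split-and-count the pipeline performs can be reproduced in plain Java:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountCheck {
    // Split each line on spaces and count occurrences of each word.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        // The five messages dumped from the topic above.
        Map<String, Long> counts = count(List.of(
                "hi bob", "hi", "hi sue bob", "hi there", "hi sue"));
        System.out.println(counts); // hi appears 5 times, bob and sue twice, there once
    }
}
```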
When you’re done, close everything down with make down and clean up with make clean.
The full code can be found on GitHub.
Cover image from Unsplash.