Apache Kafka is a distributed streaming platform designed to build real-time pipelines, and is used as a message broker or as a replacement for a log aggregation solution for big data applications.
For this demonstration, we leveraged on Tanzu Kubernetes Grid 2.3.0 (Kubernetes 1.26.x) to create a well-configured and highly available infrastructure for the Apache Kafka deployment. The Tanzu Infrastructure played an important role in optimizing the deployment and management of Apache Kafka, adding further value to the backup and restore capabilities.
You can use a Helm chart to get started with an Apache Kafka cluster on Kubernetes.
Once the cluster is deployed and operational, it is important to back up its data regularly and ensure that it can easily be restored whenever needed. Data backup and restore procedures are also important in cases, such as, off-site data migration/data analysis, or application load testing.
This document explains the process to back up and restore a Apache Kafka Service on Tanzu Kubernetes Grid clusters using Velero, an open-source Kubernetes backup/restore tool.
kubectl
and Helm v3
CLI on the client machine.For this demonstration purpose, we use Helm to deploy Apache Kafka using Helm on source cluster and upload some data to it.
kafka
on source cluster for deploying the Apache Kafka Service.Deploy Apache Kafka using Helm by running the following command:
helm install kafka oci://registry-1.docker.io/bitnamicharts/kafka -n kafka
To upload messages into Apache Kafka, perform the following steps:
Get Kafka user Password.
The default user is user1
.
kubectl get secret kafka-user-passwords --namespace kafka -o jsonpath='{.data.client-passwords}' | base64 -d | cut -d , -f 1
Create the client.properties
file with following content:
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
username="user1" \
password="<Password>";
Create a pod that you can use as a Kafka client, and run the following commands to connect to the database:
kubectl run kafka-client --restart='Never' --image docker.io/bitnami/kafka:3.6.1-debian-11-r0 --namespace kafka --command -- sleep infinity
kubectl cp --namespace kafka ./client.properties kafka-client:/tmp/client.properties
kubectl exec --tty -i kafka-client --namespace kafka -- bash
chmod 777 /tmp/client.properties
Upload data by running the kafka-console-producer.sh
script, and insert the data:
kafka-console-producer.sh \
--producer.config /tmp/client.properties \
--broker-list kafka-controller-0.kafka-controller-headless.kafka.svc.cluster.local:9092,kafka-controller-1.kafka-controller-headless.kafka.svc.cluster.local:9092,kafka-controller-2.kafka-controller-headless.kafka.svc.cluster.local:9092 \
--topic test
>one
>two
>three
>four
Verify the messages by running the kafka-console-consumer.sh
script:
kafka-console-consumer.sh \
--consumer.config /tmp/client.properties \
--bootstrap-server kafka.kafka.svc.cluster.local:9092 \
--topic test \
--from-beginning
one
two
three
four
In this section, we’ll use Velero to back up the Apache Kafka deployment including namespace.
Back up the database using velero:
# velero backup create kafka-backup-01b --include-namespaces kafka
Backup request "kafka-backup-01b" submitted successfully.
Run `velero backup describe kafka-backup-01b` or `velero backup logs kafka-backup-01b` for more details.
Check the status of backup by using the following command:
velero backup describe kafka-backup-01b --details
Now, we’ll restore the Apache Kafka deployment on destination cluster.
Run the following command to restore the backup:
# velero restore create --from-backup kafka-backup-01b
Restore request "kafka-backup-01b-20231221113616" submitted successfully.
Run `velero restore describe kafka-backup-01b-20231221113616` or `velero restore logs kafka-backup-01b-20231221113616` for more details.
Ensure that the PVCs are recovere,d and the status of the active PVC is bound:
# kubectl get pvc -n kafka
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-kafka-controller-0 Bound pvc-1627bd3a-f29e-4455-a1ab-46f0c78615b3 8Gi RWO default 2m44s
data-kafka-controller-1 Bound pvc-02315c12-388e-40ff-bd20-e0c6fa5a7283 8Gi RWO default 2m44s
data-kafka-controller-2 Bound pvc-b0ca13e9-3088-42c7-a2f0-89dd9504693b 8Gi RWO default 2m44s
Ensure that the pods are up and running:
# kubectl get pods -n kafka
NAME READY STATUS RESTARTS AGE
kafka-controller-0 1/1 Running 0 2m32s
kafka-controller-1 1/1 Running 0 2m32s
kafka-controller-2 1/1 Running 0 2m32s
Connect to Apache Kafka, and ensure that the messages are intact:
Create a pod that you can use as a Kafka client, and run the following commands to connect to the database:
kubectl run kafka-client --restart='Never' --image docker.io/bitnami/kafka:3.6.1-debian-11-r0 --namespace kafka --command -- sleep infinity
kubectl cp --namespace kafka ./client.properties kafka-client:/tmp/client.properties
kubectl exec --tty -i kafka-client --namespace kafka -- bash
chmod 777 /tmp/client.properties
Verify the messages by running the kafka-console-consumer.sh
script:
kafka-console-consumer.sh \
--consumer.config /tmp/client.properties \
--bootstrap-server kafka.kafka.svc.cluster.local:9092 \
--topic test \
--from-beginning
one
two
three
four
Regular backups of your Apache Kafka deployments are crucial for ensuring data safety and minimizing downtime. By using the procedures explained in this document, you can establish a reliable backup routine and test restoration practices to guarantee a swift and successful recovery when needed.