
WHITE PAPER

Apache Kafka: Key Concepts and Architecture

Ranjith Kumar Gannamani (676122)

Page 1 of 8
Apache Kafka

In today's world, real-time information is continuously generated by applications (business, social, or any other type), and this information needs to be routed reliably and quickly to multiple types of receivers. Most of the time, the applications producing information and the applications consuming it are far apart and inaccessible to each other. This at times leads to redeveloping the producers or consumers just to provide an integration point between them. A mechanism is therefore required for seamless integration between information producers and consumers that avoids rewriting the application at either end.

Apache Kafka is an open-source distributed stream-processing platform that acts as a publish-subscribe (pub-sub) system, providing seamless integration between information producers and consumers. Kafka offers a real-time publish-subscribe solution that overcomes the challenges of consuming real-time data at volumes that may grow by orders of magnitude.

Let’s discuss a few Kafka concepts before digging deeper.

 Producer: An application that sends messages using the Kafka Producer API.
 Consumer: An application that receives messages.
 Message: Information sent from a producer to a consumer through Apache Kafka.
 Broker: A server running Kafka; a cluster consists of one or more brokers.
 Topic: A category/feed name to which messages are stored and published.
 Topic partition: Kafka topics are divided into a number of partitions, which allows you to
split data across multiple brokers.
 Replica: A "backup" of a partition. Replicas do not serve client reads or writes; they exist
to prevent data loss.
 Consumer group: A set of consumer processes that subscribe to a specific topic.

 Offset: The offset is a unique identifier of a record within a partition. It denotes the position
of the consumer in the partition.
 Node: A node is a single computer in the Apache Kafka cluster.
 Cluster: A cluster is a group of nodes i.e., a group of computers.
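To make the partition and offset concepts concrete, here is a minimal sketch of how a record key can be mapped deterministically to one of a topic's partitions. This is an illustrative stand-in, not Kafka's actual default partitioner (which hashes the serialized key bytes with murmur2):

```scala
object PartitionSketch {
  // Simplified stand-in for Kafka's default partitioner:
  // the same key always lands on the same partition, and keys
  // are spread across the available partitions.
  // (Kafka itself hashes the serialized key bytes with murmur2.)
  def partitionFor(key: String, numPartitions: Int): Int =
    math.floorMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    val numPartitions = 3
    for (key <- Seq("order-1", "order-2", "order-1"))
      println(s"$key -> partition ${partitionFor(key, numPartitions)}")
  }
}
```

Because the mapping depends only on the key, all messages with the same key land on the same partition, where their relative order is preserved.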

Let’s discuss the Kafka producer and consumer properties to set in your application code for efficient message transmission and consumption.
Producer properties:

 acks= The number of acknowledgments the producer requires the leader to have received
before considering a request complete.
Eg: acks = 1
 security.protocol= Protocol used to communicate with brokers
Eg: security.protocol=SASL_SSL
 sasl.mechanism= SASL mechanism used for client connections
Eg: sasl.mechanism=PLAIN
 ssl.enabled.protocols= The list of protocols enabled for SSL connections
Eg: ssl.enabled.protocols=TLSv1.2
 request.timeout.ms= The configuration controls the maximum amount of time the client
will wait for the response of a request.
Eg: request.timeout.ms=3000
 retries= Setting a value greater than zero will cause the client to resend any request that
fails with a potentially transient error.
Eg: retries=3
 batch.size= The producer batches records together until the batch reaches this many bytes.
Eg: batch.size=65536
 linger.ms= The producer waits up to this many milliseconds for more records before sending a batch.
Eg: linger.ms=90
 max.in.flight.requests.per.connection= The maximum number of unacknowledged requests the
client will send on a single connection before blocking.
Eg: max.in.flight.requests.per.connection=2

 key.serializer – Serializer class for message keys when sending to a Kafka topic.
Eg: ByteArraySerializer, JsonSerializer, StringSerializer
 value.serializer – Serializer class for message values when sending to a Kafka topic.
Eg: ByteArraySerializer, JsonSerializer, StringSerializer
 bootstrap.servers - A list of host/port pairs to use for establishing the initial connection to
the Kafka cluster.
Eg: localhost:9092
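Collecting the example values above, a producer configuration file might look like the following sketch (the values are the illustrative ones from this list, not tuning recommendations):

```properties
bootstrap.servers=localhost:9092
acks=1
retries=3
batch.size=65536
linger.ms=90
max.in.flight.requests.per.connection=2
request.timeout.ms=3000
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
# Only for secured clusters:
# security.protocol=SASL_SSL
# sasl.mechanism=PLAIN
# ssl.enabled.protocols=TLSv1.2
```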
Consumer Properties:

 key.deserializer – Deserializer class for message keys when reading from a Kafka topic.
Eg: ByteArrayDeserializer, JsonDeserializer, StringDeserializer
 value.deserializer – Deserializer class for message values when reading from a Kafka topic.
Eg: ByteArrayDeserializer, JsonDeserializer, StringDeserializer
 group.id - A unique string that identifies the consumer group this consumer belongs to.
Eg: group.id = something.driver
 enable.auto.commit - If true, the consumer's offsets are periodically committed in the
background.
Eg: enable.auto.commit=false
 max.poll.records - The maximum number of records returned in a single call to poll().
Eg: max.poll.records=1
 auto.offset.reset - What to do when there is no initial offset in Kafka or the current offset
no longer exists on the server.
Eg: auto.offset.reset=earliest
 bootstrap.servers - A list of host/port pairs to use for establishing the initial connection to
the Kafka cluster to receive the messages
Eg: localhost:9092
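Similarly, the consumer settings above can be collected into a configuration sketch (again using the illustrative values from the list):

```properties
bootstrap.servers=localhost:9092
group.id=something.driver
enable.auto.commit=false
max.poll.records=1
auto.offset.reset=earliest
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
```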

If the Kafka cluster is secured, please discuss with the admin on your team before setting these properties in the application code.

ZooKeeper:
ZooKeeper is a top-level Apache project that acts as a centralized service for maintaining naming and configuration data and for providing flexible and robust synchronization within distributed systems. ZooKeeper keeps track of the status of the Kafka cluster nodes, as well as Kafka topics, partitions, etc.

Always start ZooKeeper first and then the Kafka server; otherwise you will see a connection-refused error.

Steps for installing Kafka on Mac using Homebrew:


 brew cask install java
 brew install kafka
Now, start the Zookeeper and Kafka-server
 zookeeper-server-start
 kafka-server-start
Create Kafka topic:
 kafka-topics --create --zookeeper 0.0.0.0:2181 --replication-factor 1 --partitions 1 --topic test
Initialize Producer now:
 kafka-console-producer --broker-list localhost:9092 --topic test
first message
second message
Note: the broker listens on localhost (0.0.0.0) at port 9092; the producer connects to it there.
Start consumer:
 kafka-console-consumer --bootstrap-server localhost:9092 --topic test --from-beginning
first message
second message

You will see the messages in the consumer console after starting the Kafka consumer.
Stop the zookeeper and Kafka server:
 zookeeper-server-stop
 kafka-server-stop

Simple Producer code:

object ProducerExample extends App {

  import java.util.Properties
  import org.apache.kafka.clients.producer._

  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("acks", "1")
  // Uncomment these when the cluster is secured:
  // props.put("security.protocol", "SASL_SSL")
  // props.put("sasl.mechanism", "PLAIN")
  // props.put("ssl.enabled.protocols", "TLSv1.2")
  // props.put("sasl.jaas.config",
  //   s"org.apache.kafka.common.security.plain.PlainLoginModule required username=$username password=$password;")

  val producer = new KafkaProducer[String, String](props)
  val TOPIC = "test"

  // Send 50 "hello" messages, then one final message (51 total)
  for (i <- 1 to 50) {
    val record = new ProducerRecord(TOPIC, "key", s"hello $i")
    producer.send(record)
  }

  val record = new ProducerRecord(TOPIC, "key", "the end " + new java.util.Date)
  producer.send(record)

  producer.close()
}

This code produces a total of 51 messages to the Kafka topic. Some of the properties are commented out because my local Kafka server does not have any security mechanisms enabled; they give you an idea of how to use those properties when producing messages to a Kafka cluster that has security enabled.

Note: The above code is written in Scala.

Simple Consumer code:

object ConsumerExample extends App {

  import java.util.Properties
  import java.util.Collections
  import java.time.Duration
  import org.apache.kafka.clients.consumer.KafkaConsumer
  import scala.collection.JavaConverters._

  val TOPIC = "test"

  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("group.id", "something.driver")
  props.put("auto.offset.reset", "latest")
  props.put("max.poll.records", "1")
  props.put("enable.auto.commit", "false")
  // Uncomment these when the cluster is secured:
  // props.put("security.protocol", "SASL_SSL")
  // props.put("sasl.mechanism", "PLAIN")
  // props.put("ssl.enabled.protocols", "TLSv1.2")

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(Collections.singletonList(TOPIC))

  // Poll in a loop and print every record received.
  // poll(0) is deprecated in newer clients; poll with a timeout instead.
  while (true) {
    val records = consumer.poll(Duration.ofMillis(100))
    for (record <- records.asScala) {
      println(record.toString)
    }
  }
}

References:

https://kafka.apache.org/

