An Introduction to Kafka

Applications in almost every domain work with enormous volumes of data. This raises two main challenges: how to collect large volumes of data, and how to analyze the data once it has been collected. Overcoming both requires a robust messaging system.

Messaging System and its Types

A messaging system is responsible for transferring data from one application to another, so that applications can focus on the data itself rather than on how to share it. Distributed messaging is based on the concept of reliable message queuing: messages are queued asynchronously between client applications and the messaging system.

There are two main types of messaging system:

Point to Point Messaging System
Publish-Subscribe Messaging System

Point to Point Messaging System

In a point-to-point system, messages are persisted in a queue. One or more consumers can read from the queue, but each message can be consumed by at most one consumer. Once a consumer reads a message, it disappears from the queue. Example: an order processing system, where each order should be handled by a single processor.
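The queue behaviour described above can be sketched in a few lines of Python. This is only an in-memory illustration, not how any real broker is implemented; the class name `PointToPointQueue` and the order IDs are made up for the example.

```python
from collections import deque


class PointToPointQueue:
    """Toy in-memory queue: each message goes to at most one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message on read is what makes this point-to-point:
        # no other consumer can see it afterwards.
        return self._messages.popleft() if self._messages else None


queue = PointToPointQueue()
for order_id in ("order-1", "order-2", "order-3"):
    queue.send(order_id)

# Two hypothetical consumers read from the same queue, one after the other.
consumer_a = [queue.receive(), queue.receive()]
consumer_b = [queue.receive(), queue.receive()]
```

Here `consumer_a` receives the first two orders and `consumer_b` receives only the third; its second read returns `None` because the queue is already empty.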

Publish-Subscribe Messaging System

In a publish-subscribe system, messages are persisted in a topic. Unlike in a point-to-point system, consumers can subscribe to one or more topics and consume all the messages in those topics. Message producers are called publishers and message consumers are called subscribers. Example: a satellite TV service such as Dish TV, where every subscriber to a channel receives everything broadcast on it.
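For contrast with a queue, here is a minimal in-memory sketch of a topic in Python. The `Topic` class, its method names, and the subscriber names are all hypothetical; the point is only that reading does not remove a message, so every subscriber sees every message.

```python
class Topic:
    """Toy in-memory topic: every subscriber sees every message."""

    def __init__(self, name):
        self.name = name
        self._log = []        # messages are retained, not removed on read
        self._positions = {}  # subscriber name -> index of next unread message

    def publish(self, message):
        self._log.append(message)

    def subscribe(self, subscriber):
        self._positions.setdefault(subscriber, 0)

    def poll(self, subscriber):
        """Return all messages this subscriber has not yet seen."""
        start = self._positions[subscriber]
        self._positions[subscriber] = len(self._log)
        return self._log[start:]


news = Topic("news")
news.subscribe("alice")
news.subscribe("bob")
news.publish("headline-1")
news.publish("headline-2")

alice_seen = news.poll("alice")
bob_seen = news.poll("bob")
```

Both `alice_seen` and `bob_seen` contain both headlines, because each subscriber tracks its own position in the shared log.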

History of Apache Kafka

LinkedIn needed low-latency ingestion of huge amounts of data from its website into a lambda architecture capable of processing events in real time. No existing solution addressed this need, so Apache Kafka was developed in 2010.

Technologies for batch processing were available, but they exposed their deployment details to downstream users and were not well suited to real-time processing. Kafka was made public in 2011.

Today, LinkedIn runs several clusters of Kafka brokers for different purposes in each data centre. Its teams generally run off the open-source Apache Kafka trunk and put out a new internal release a few times a year. As Kafka usage continued to grow rapidly, they had to solve some significant problems to make all of this work at scale. In the years since Kafka was released as open source, the Engineering team at LinkedIn has developed an entire ecosystem around it.

Introduction to Kafka

Apache Kafka is a fast, scalable, fault-tolerant publish-subscribe messaging system. It provides a platform for building a new generation of distributed applications, and it supports many permanent or ad hoc consumers. Kafka is used to enable communication between producers and consumers through message-based topics. One of its best features is that it is highly available: it is resilient to node failures and supports automatic recovery. This makes Apache Kafka well suited to communication and integration between the components of large-scale, real-world data systems.

Common tasks that Apache Kafka is used for include:

Tracking web activity by storing and sending events for real-time processing.
Alerting and reporting on operational metrics.
Transforming data into a standard format.
Continuously processing data streamed to topics.

Kafka Use Cases

Several use cases show why Apache Kafka is used.

Messaging

Kafka works well as a replacement for a more traditional message broker. It offers better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for large-scale message processing applications.

Metrics

Kafka is a good fit for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.

Event Sourcing

Because Kafka supports very large stored logs, it is an excellent backend for event-sourcing applications.
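In event sourcing, the current state is not stored directly; it is rebuilt by replaying a stored log of events. The sketch below shows the idea in plain Python, with no Kafka involved. The account example, the event names, and the functions `apply_event` and `replay` are assumptions made up for illustration.

```python
def apply_event(balance, event):
    """Apply a single (kind, amount) event to an account balance."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrawn":
        return balance - amount
    raise ValueError(f"unknown event type: {kind}")


def replay(events, initial=0):
    """Rebuild state by applying every event in log order."""
    balance = initial
    for event in events:
        balance = apply_event(balance, event)
    return balance


# The stored log is the source of truth; state is derived from it.
event_log = [("deposited", 100), ("withdrawn", 30), ("deposited", 5)]

current_balance = replay(event_log)        # state after all events
balance_after_two = replay(event_log[:2])  # state at an earlier point in time
```

Because the log is kept, the state at any earlier point can be recovered by replaying only a prefix of it, which is why a durable log works well as a backend for this pattern.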

Kafka Components

Using the following components, Kafka achieves messaging:

Kafka Topic – A topic is essentially a collection of messages; it is how Kafka stores and organizes messages across its system. Topics can be replicated (copied) and partitioned (divided), and can be visualized as logs in which Kafka stores messages. The ability to replicate and partition topics is one of the factors that enable Kafka’s fault tolerance and scalability.

Kafka Producer – It publishes messages to a Kafka topic.

Kafka Consumer – It subscribes to one or more topics, then reads and processes messages from them.

Kafka Broker – A broker manages the storage of messages in topics. A group of more than one broker is called a Kafka cluster.

Kafka ZooKeeper – Kafka uses ZooKeeper to provide brokers with metadata about the processes running in the system, and to facilitate health checking and broker leader election.
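To make the relationship between topics, partitions, producers, and consumers more concrete, here is a toy in-memory model in Python. It does not use a Kafka client library, and the class `PartitionedTopic` and its methods are hypothetical. CRC32 is used as the key hash only because it is deterministic and in the standard library; it is a stand-in for whatever hash a real partitioner uses.

```python
import zlib


class PartitionedTopic:
    """Toy topic split into several append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stand-in hash (an assumption for this sketch): the same key
        # always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Producer side: append a message, return (partition, offset)."""
        partition_id = self.partition_for(key)
        log = self.partitions[partition_id]
        log.append((key, value))
        return partition_id, len(log) - 1

    def read(self, partition_id, offset):
        """Consumer side: read one partition from a given offset onward."""
        return self.partitions[partition_id][offset:]


topic = PartitionedTopic("page-views", num_partitions=3)

# A producer publishes keyed messages.
placements = [
    topic.append(user, page)
    for user, page in [("user-1", "/home"), ("user-2", "/pricing"), ("user-1", "/docs")]
]

# A consumer reads the partition that holds user-1, starting at offset 0.
user1_partition = topic.partition_for("user-1")
user1_pages = [
    value for key, value in topic.read(user1_partition, 0) if key == "user-1"
]
```

In this model, messages with the same key land in the same partition and keep their relative order, while different partitions can be stored and read independently.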

Benefits of Apache Kafka

Low Latency: Apache Kafka offers low latency, of around 10 milliseconds or less. It achieves this by decoupling the message from its consumer, which lets the consumer read the message at any time.

High Throughput: Kafka can handle high-volume, high-velocity message streams, supporting thousands of messages per second. Many companies, such as Uber, use Kafka to load high volumes of data.

Fault tolerance: Kafka is resistant to node or machine failures within the cluster.

Durability: Kafka replicates data, so messages are persisted to disk on more than one broker in the cluster. This makes it durable.

Reduces the need for multiple integrations: All the data that producers write goes through Kafka. A system therefore needs only one integration, with Kafka, which connects it to every producing and consuming system.

Easily accessible: Because all the data is stored in Kafka, it is easily accessible to anyone who needs it.

Distributed System: Apache Kafka has a distributed architecture, which makes it scalable. Partitioning and replication are the two capabilities that underpin it.

Real-Time handling: Apache Kafka can handle real-time data pipelines. Building such a pipeline involves processors, analytics, storage, and so on.

Batch approach: Kafka also supports batch-like use cases. Because it persists data, it can work like an ETL tool.

Scalability: Kafka’s ability to handle a large number of messages simultaneously makes it a scalable software product.
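The fault tolerance and durability points above both rest on replication. The following Python sketch is a deliberately simplified model of that idea, not Kafka's actual replication protocol: it copies every message to all replicas synchronously and picks the lowest broker ID as the new leader, both of which are assumptions made for the example. The class `ReplicatedPartition` is hypothetical.

```python
class ReplicatedPartition:
    """Toy partition whose log is copied to several brokers."""

    def __init__(self, broker_ids):
        self.replicas = {broker_id: [] for broker_id in broker_ids}
        self.leader = min(self.replicas)

    def append(self, message):
        # Simplification: write to every live replica at once.
        for log in self.replicas.values():
            log.append(message)

    def fail_broker(self, broker_id):
        """Simulate losing a broker and, if needed, choose a new leader."""
        del self.replicas[broker_id]
        if not self.replicas:
            raise RuntimeError("all replicas lost")
        if broker_id == self.leader:
            self.leader = min(self.replicas)

    def read_all(self):
        return list(self.replicas[self.leader])


partition = ReplicatedPartition(broker_ids=[1, 2, 3])
partition.append("event-a")
partition.append("event-b")

partition.fail_broker(1)  # the leader goes down
surviving_messages = partition.read_all()
```

After broker 1 fails, another replica takes over as leader and the messages are still readable, which is the property that replication is meant to provide.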

Conclusion

The performance of a Kafka system depends on how accurately data is delivered: every message must reach its consumer correctly and on time, without excessive lag. If a message is not received within the expected time, alerts are raised.
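One common way to watch for late delivery is to track consumer lag, meaning how many messages have been written to a partition but not yet consumed. The Python sketch below shows the arithmetic only. The offset numbers, the threshold, and the function names are invented for illustration; in practice the offsets would come from a monitoring tool or client library.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: messages written but not yet consumed."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


def partitions_over_threshold(lag_by_partition, max_lag):
    """Partitions whose lag is high enough to raise an alert."""
    return sorted(p for p, lag in lag_by_partition.items() if lag > max_lag)


# Hypothetical offsets for a three-partition topic.
log_end_offsets = {0: 120, 1: 95, 2: 300}
committed_offsets = {0: 118, 1: 95, 2: 180}

lag = consumer_lag(log_end_offsets, committed_offsets)
alerts = partitions_over_threshold(lag, max_lag=50)
```

With these numbers only partition 2 exceeds the threshold, so it is the only one that would trigger an alert.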

The post An Introduction to Kafka appeared first on WinWire.
