The benefits of Kafka and why it’s so popular
By Samantha Allen 20 Feb 2020
Kafka is a popular publish-subscribe messaging system for building distributed applications. Because it is scalable, fault tolerant, and offers many other benefits, learning how to use Kafka is a wise idea. While plenty of small-scale open-source projects come and go, Kafka seems to be going from strength to strength in 2020.
So what are the full benefits of Kafka? Why is Kafka so popular, and should you learn how to use it? Read on to find out all you need to know about Kafka.
What is Kafka?
Kafka is an open-source stream-processing software platform developed by LinkedIn and donated to the Apache Software Foundation. It is written in Scala and Java, and it was named after the author Franz Kafka: because it is "a system optimized for writing", a writer's name seemed fitting.
Many developers begin exploring messaging when they need to connect lots of systems together. Kafka solves this problem when other integration patterns, such as sharing a database between services, are too risky or simply not feasible.
Apache Kafka was originally created to connect different internal systems at LinkedIn. LinkedIn needed to rethink capabilities such as data integration and real-time stream processing, and Kafka was built to break away from outdated approaches to these problems. Within the Apache Software Foundation's ecosystem of products, Kafka is a popular piece of the puzzle.
What is Kafka used for?
Kafka’s main use is streaming data into other systems in real time. It was designed to be a unified platform for handling all the real-time data feeds a large company might have.
Kafka can connect to external systems for data import and export via Kafka Connect, and it can process data as it arrives with Kafka Streams, a Java stream-processing library.
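To give a feel for what a stream-processing step does, here is a minimal Python sketch of a stateful operation: keeping a running count per key as records arrive. It is an assumption-laden toy, not the Kafka Streams API (which is a Java library); the `running_counts` function and the `(key, value)` record format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Consume (key, value) records and emit an updated (key, count) after each one.

    Hypothetical toy stand-in for a stateful stream-processing step;
    it is not the Kafka Streams API.
    """
    counts: Dict[str, int] = defaultdict(int)  # local state, keyed by record key
    for key, _value in records:
        counts[key] += 1
        yield key, counts[key]  # emit the new aggregate downstream


if __name__ == "__main__":
    # Page-view events keyed by page name (made-up sample data)
    clicks = [("home", "u1"), ("pricing", "u2"), ("home", "u3")]
    for update in running_counts(clicks):
        print(update)
```

Because the function is a generator, each update is emitted as soon as its input record is processed, rather than after the whole input has been read.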
With Kafka, users can publish data to, and subscribe to data from, any number of systems or real-time applications. Examples include passenger and driver matching at Uber, real-time analytics and predictive maintenance for British Gas’ smart home products, and numerous real-time services across LinkedIn.
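To make the publish-subscribe idea more concrete, here is a minimal Python sketch, assuming a simplified model of a topic as a single append-only list. The `Topic` and `Subscriber` classes are hypothetical illustrations, not Kafka's client API. The key point is that each subscriber tracks its own read position (offset), so one system reading a record does not remove it for anyone else.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    """Hypothetical toy topic: an append-only list of records (not Kafka's API)."""
    records: List[str] = field(default_factory=list)

    def publish(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the newly appended record


@dataclass
class Subscriber:
    """Hypothetical subscriber that remembers its own offset in the topic."""
    name: str
    offset: int = 0

    def poll(self, topic: Topic) -> List[str]:
        new = topic.records[self.offset:]  # everything not yet seen by THIS subscriber
        self.offset = len(topic.records)   # advance past what was just read
        return new


if __name__ == "__main__":
    topic = Topic()
    topic.publish("ride_requested")
    analytics = Subscriber("analytics")
    billing = Subscriber("billing")
    print(analytics.poll(topic))  # ['ride_requested']
    topic.publish("ride_matched")
    print(analytics.poll(topic))  # ['ride_matched']
    print(billing.poll(topic))    # ['ride_requested', 'ride_matched']
```

Adding another subscriber costs nothing on the publishing side: it simply starts reading from its own offset.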
While Kafka is mostly used for real-time analytics and stream processing, you can also use it for log aggregation, messaging, website activity and click-stream tracking, audit trails, metrics collection and monitoring, complex event processing (CEP), ingesting data into Spark or Hadoop, CQRS, message replay, error recovery, and much more.
Why is Kafka so popular?
Kafka’s excellent performance is a major reason for its popularity. Kafka is fast and efficient, and with the right training, it’s easy to set up and use. Its fault-tolerant storage makes it stable and reliable, and its flexible messaging model supports both publish-subscribe and queue-style consumption while scaling well. Kafka is also optimized for efficiency: it groups messages into batches to reduce the overhead of network round trips.
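Why batching helps can be shown with a little arithmetic. The sketch below, a simplified assumption rather than Kafka's actual producer logic, groups messages into batches up to a byte limit and counts how many requests would be needed. Real Kafka producers control batching with settings such as `batch.size` and `linger.ms`; the `batch_messages` function and the 400-byte limit here are hypothetical.

```python
from typing import List


def batch_messages(messages: List[bytes], max_batch_bytes: int) -> List[List[bytes]]:
    """Group messages into batches whose total size stays within max_batch_bytes.

    Hypothetical size-based batching only; a message larger than the limit
    is placed in a batch of its own.
    """
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    for msg in messages:
        if current and current_size + len(msg) > max_batch_bytes:
            batches.append(current)  # this batch is full, close it
            current, current_size = [], 0
        current.append(msg)
        current_size += len(msg)
    if current:
        batches.append(current)  # flush the final partial batch
    return batches


if __name__ == "__main__":
    msgs = [b"x" * 100 for _ in range(10)]  # ten 100-byte messages
    batches = batch_messages(msgs, max_batch_bytes=400)
    print(f"{len(msgs)} messages -> {len(batches)} requests")  # 10 messages -> 3 requests
```

Sending three requests instead of ten means paying the network round-trip cost three times rather than ten, which is the saving batching is after.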
With well-written consumer code, Kafka can support a large number of consumers and retain large amounts of data with very little overhead.
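One reason Kafka scales to many consumers is that a topic is split into partitions, and the consumers in a consumer group divide those partitions between them. Each partition goes to exactly one consumer in the group, which gives queue-style work sharing. Separate groups each receive all the partitions, which gives publish-subscribe. The sketch below assumes a simple round-robin assignment; Kafka itself offers several configurable assignment strategies, and `assign_partitions` is a hypothetical helper.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin partitions across the consumers of ONE group (hypothetical helper).

    Each partition is owned by exactly one consumer in the group; consumers
    beyond the number of partitions receive nothing.
    """
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    partitions = list(range(6))
    # Each group independently gets every partition (publish-subscribe),
    # while consumers inside a group split the partitions (queue-style).
    print(assign_partitions(partitions, ["billing-1", "billing-2"]))
    print(assign_partitions(partitions, ["analytics-1", "analytics-2", "analytics-3"]))
```

Note that adding consumers to a group only spreads the load while there are partitions left to hand out, which is why partition count matters when planning for scale.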
Data science and analytics are an extremely important part of businesses today, so using Kafka to capture data and feed it into your data lakes and real-time analytics systems is a major advantage.
What are the alternatives to Kafka?
Kafka’s popularity has unsurprisingly resulted in similar systems being developed. For example, Amazon Kinesis is modelled after Apache Kafka. Other popular alternatives and competitors to Kafka include ActiveMQ, RabbitMQ, Apache Spark, and Akka.
Kafka is used in cases where JMS, RabbitMQ, and other AMQP-based systems may not even be considered because of the data volumes and responsiveness required.
Apache Kafka has recently added Kafka Streams, which positions Kafka as an alternative to stream-processing platforms such as Apache Spark, Apache Flink, Apache Beam/Google Cloud Dataflow, and Spring Cloud Data Flow.
While Apache Kafka and AWS Kinesis Data Streams are both good choices for real-time data streaming, Kafka has a slight edge. If you need to keep messages for more than 7 days, or want a message size limit you can configure yourself rather than a fixed per-record cap, Kafka is your best choice. However, Apache Kafka requires the right skill level to set up, manage, and support.
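In Kafka, how long messages are kept is configured per topic with `retention.ms` (the broker-wide default comes from `log.retention.hours`, which is 168 hours, or 7 days). The sketch below converts a retention period in days to milliseconds and shows time-based expiry on a toy log. It is a simplification under stated assumptions: the helper names are hypothetical, and real Kafka deletes whole log segments rather than individual records.

```python
from typing import List, Tuple

MS_PER_DAY = 24 * 60 * 60 * 1000


def retention_ms(days: int) -> int:
    """Convert a retention period in days to milliseconds (the unit retention.ms uses)."""
    return days * MS_PER_DAY


def expire(records: List[Tuple[int, str]], now_ms: int, keep_ms: int) -> List[Tuple[int, str]]:
    """Keep only (timestamp_ms, value) records younger than keep_ms.

    Hypothetical toy model: real Kafka removes whole log segments, not single records.
    """
    return [(ts, v) for ts, v in records if now_ms - ts < keep_ms]


if __name__ == "__main__":
    keep = retention_ms(30)
    print(keep)  # 2592000000
    now = retention_ms(40)  # pretend "now" is day 40
    log = [(retention_ms(5), "old"), (retention_ms(20), "recent")]
    print(expire(log, now, keep))  # the day-5 record is past 30 days and is dropped
```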
If your organization lacks Apache Kafka experts or Kafka-skilled employees, then booking onto a Kafka training course is your next move.
You can browse all of our Kafka courses or contact us to find out more.
At Go.Courses our mission is to bring you the world’s best IT courses. Our aim is to make it easy for you to book training and learn new skills. All our courses are trainer-led by experts in their field and available all over the UK and Europe.