Chapter 5. Ingesting Data with the Apache Kafka Spout Connector
Apache Kafka is a high-throughput, distributed messaging system. Apache Storm provides a Kafka spout to facilitate ingesting data from Kafka 0.8.x brokers. Storm developers should include downstream bolts in their topologies to process data ingested with the Kafka spout.
The storm-kafka components include a core-storm spout, as well as a fully transactional Trident spout. Storm-kafka spouts provide the following key features:

* "Exactly once" tuple processing with the Trident API
* Dynamic discovery of Kafka brokers and partitions
Hortonworks recommends that Storm developers use the Trident API. However, use the core-storm API if sub-second latency is critical for your Storm topology.
The core-storm API represents a Kafka spout with the KafkaSpout class. The Trident API provides an OpaqueTridentKafkaSpout class to represent the spout.
To initialize KafkaSpout and
OpaqueTridentKafkaSpout, Storm developers need an instance of a
subclass of the KafkaConfig class, which represents configuration
information needed to ingest data from a Kafka cluster.
The KafkaSpout constructor requires the SpoutConfig subclass. The OpaqueTridentKafkaSpout constructor requires the TridentKafkaConfig subclass.
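As a sketch of how these pieces fit together for the core-storm API, the following snippet constructs a KafkaSpout from a SpoutConfig. The ZooKeeper address, topic name, ZooKeeper root path, and consumer ID are placeholder assumptions; the snippet also assumes the storm-kafka and storm-core jars are on the classpath.

```java
import backtype.storm.spout.SchemeAsMultiScheme;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

// Broker-to-partition mapping is read from ZooKeeper
// (the connect string "localhost:2181" is a placeholder).
BrokerHosts hosts = new ZkHosts("localhost:2181");

// SpoutConfig(hosts, topic, zkRoot, consumerId) -- the topic name,
// ZooKeeper root path, and consumer ID are placeholders.
SpoutConfig spoutConfig = new SpoutConfig(hosts, "test-topic",
        "/kafkastorm", "kafka-spout-id");

// Deserialize raw Kafka messages into string tuples.
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
```

The spout instance can then be registered with a TopologyBuilder like any other spout, with downstream bolts subscribed to its output stream.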
In turn, the constructors for both KafkaSpout and
OpaqueTridentKafkaSpout require an implementation of the
BrokerHosts interface, which is used to map Kafka brokers to
topic partitions. The storm-kafka component provides two implementations of
BrokerHosts: ZkHosts and
StaticHosts.
Use the ZkHosts implementation to dynamically track broker-to-partition mapping. Use the StaticHosts implementation for static broker-to-partition mapping.
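The Trident API follows the same pattern. The sketch below builds an OpaqueTridentKafkaSpout from a TridentKafkaConfig backed by ZkHosts and wires it into a Trident stream; the ZooKeeper address, topic name, and stream transaction ID are placeholder assumptions.

```java
import backtype.storm.spout.SchemeAsMultiScheme;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
import storm.kafka.trident.OpaqueTridentKafkaSpout;
import storm.kafka.trident.TridentKafkaConfig;
import storm.trident.TridentTopology;

// Dynamic broker-to-partition mapping via ZooKeeper (address is a placeholder).
ZkHosts hosts = new ZkHosts("localhost:2181");

// TridentKafkaConfig(hosts, topic) -- the topic name is a placeholder.
TridentKafkaConfig tridentConfig = new TridentKafkaConfig(hosts, "test-topic");
tridentConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

OpaqueTridentKafkaSpout opaqueSpout = new OpaqueTridentKafkaSpout(tridentConfig);

// Register the spout as the source of a Trident stream; the stream ID
// "kafka-stream" is a placeholder used for transactional state tracking.
TridentTopology topology = new TridentTopology();
topology.newStream("kafka-stream", opaqueSpout);
```

Because the spout is opaque transactional, the stream ID passed to newStream identifies the transactional state that gives Trident its "exactly once" processing guarantee.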

