Trident APIs
The following example shows construction of a Hive Trident state using the Storm Trident APIs, followed by details about the code:

    DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
        .withColumnFields(new Fields(colNames))
        .withTimeAsPartitionField("yyyy/MM/dd");
    HiveOptions hiveOptions = new HiveOptions(metaStoreURI, dbName, tblName, mapper)
        .withTxnsPerBatch(10)
        .withBatchSize(1000)
        .withIdleTimeout(10);
    StateFactory factory = new HiveStateFactory().withOptions(hiveOptions);
    TridentState state = stream.partitionPersist(factory, hiveFields, new HiveUpdater(),
        new Fields());

Instantiate an Implementation of the HiveMapper Interface
The storm-hive streaming bolt uses the HiveMapper interface to map the names of tuple fields to the names of Hive table columns. Storm provides two implementations: DelimitedRecordHiveMapper and JsonRecordHiveMapper. Both implementations take the same arguments.

Table 5.3. HiveMapper Arguments
Argument | Data Type | Description
withColumnFields | org.apache.storm.tuple.Fields | The names of the tuple fields that you want to map to table column names.
withPartitionFields | org.apache.storm.tuple.Fields | The names of the tuple fields that you want to map to table partitions.
withTimeAsPartitionField | String | Requests that table partitions be created with names set to system time. Developers can specify any Java-supported date format, such as "yyyy/MM/dd" (pattern letters are case-sensitive).
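
The date pattern passed to withTimeAsPartitionField is case-sensitive, which is easy to get wrong. The sketch below assumes the mapper hands the pattern to java.text.SimpleDateFormat (an assumption about the implementation, not something stated in this section) and compares a lowercase calendar-date pattern with its uppercase look-alike. PartitionPatternSketch and its helper methods are hypothetical names used only for illustration; they are not part of storm-hive.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class PartitionPatternSketch {
    // Formats a date with the given pattern in UTC, using a fixed locale
    // so the result does not depend on the machine's settings.
    static String format(String pattern, Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when);
    }

    // Builds a Date for midnight UTC on the given calendar day.
    static Date utcDate(int year, int month, int day) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.US);
        cal.clear();
        cal.set(year, month, day);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Date when = utcDate(2015, Calendar.FEBRUARY, 10);
        // y = calendar year, d = day of month
        System.out.println(format("yyyy/MM/dd", when)); // prints 2015/02/10
        // Y = week year, D = day of year (February 10 is day 41)
        System.out.println(format("YYYY/MM/DD", when)); // prints 2015/02/41
    }
}
```

If partition names should follow calendar dates, the lowercase form is likely the one to use. The uppercase letters are valid pattern letters, so they produce no error, only unexpected partition names.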
The following sample code illustrates how to use DelimitedRecordHiveMapper. Use one of the two forms:

    ...
    // Map tuple fields to table partitions
    DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
        .withColumnFields(new Fields(colNames))
        .withPartitionFields(new Fields(partNames));

    // Or name table partitions after the system time
    DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
        .withColumnFields(new Fields(colNames))
        .withTimeAsPartitionField("yyyy/MM/dd");
    ...

Instantiate a HiveOptions Class with the HiveMapper Implementation

Use the HiveOptions class to configure the transactions used by Hive to ingest the streaming data, as illustrated in the following code sample.

    ...
    HiveOptions hiveOptions = new HiveOptions(metaStoreURI, dbName, tblName, mapper)
        .withTxnsPerBatch(10)
        .withBatchSize(1000)
        .withIdleTimeout(10);
    ...
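
To get a feel for what the sample values imply, the arithmetic can be sketched in plain Java. This assumes, following the Apache Storm Hive integration documentation, that withBatchSize caps the events written in one Hive transaction and that withTxnsPerBatch sets the number of transactions in one transaction batch. The class and method names below are hypothetical and are not part of storm-hive.

```java
class HiveBatchSizingSketch {
    // Upper bound on the events covered by one transaction batch, assuming
    // each transaction holds at most batchSize events.
    static long maxEventsPerTxnBatch(int txnsPerBatch, int batchSize) {
        return (long) txnsPerBatch * batchSize;
    }

    public static void main(String[] args) {
        // Values taken from the HiveOptions sample: 10 transactions, 1000 events each.
        System.out.println(maxEventsPerTxnBatch(10, 1000)); // prints 10000
    }
}
```

Under those assumptions, the sample settings allow at most 10,000 events per transaction batch. Fewer events may be written if a writer is closed early, for example after the idle timeout.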
See "HiveOptions Class Configuration Properties" for a list of configuration properties for the HiveOptions class.

Instantiate the HiveStateFactory with the HiveOptions class:

    ...
    StateFactory factory = new HiveStateFactory().withOptions(hiveOptions);
    TridentState state = stream.partitionPersist(factory, hiveFields, new HiveUpdater(),
        new Fields());
    ...
Before building your topology code, add the following dependency to your topology pom.xml file:

    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.3.3</version>
    </dependency>

