Improving Splunk and Kafka Platforms with Cloud-Native Technologies – InfoWorld

Intel Select Solutions for Splunk and Kafka on Kubernetes use containers and S3-compliant storage to increase application performance and infrastructure utilization while simplifying the management of hybrid cloud environments.

Executive Summary

Data architects and administrators of modern analytic and streaming platforms like Splunk and Kafka continually look for ways to simplify the management of hybrid or multi-cloud platforms, while also scaling these platforms to meet the needs of their organizations. They are challenged with increasing data volumes and the need for faster insights and responses. Unfortunately, scaling often results in server sprawl, underutilized infrastructure resources and operational inefficiencies.

The release of Splunk Operator for Kubernetes and Confluent for Kubernetes, combined with Splunk SmartStore and Confluent Tiered Storage, offers new options for architectures designed with containers and S3-compatible storage. These new cloud-native technologies, running on Intel architecture and Pure Storage FlashBlade, can help improve application performance, increase infrastructure utilization and simplify the management of hybrid and multi-cloud environments.

Intel and Pure Storage architects designed a new reference architecture called Intel Select Solutions for Splunk and Kafka on Kubernetes and conducted a proof of concept (PoC) to test the value of this reference architecture. Tests were run using Splunk Operator for Kubernetes and Confluent for Kubernetes with Intel IT's high-cardinality production data to demonstrate a real-world scenario.

In our PoC, a nine-node cluster reached a Splunk ingest rate of 886 MBps, while simultaneously completing 400 successful dense Splunk searches per minute, with an overall CPU utilization rate of 58%.1 We also tested Splunk super-sparse searches and Splunk ingest from Kafka data stored locally versus data in Confluent Tiered Storage on FlashBlade, which exhibited remarkable results. The outcomes of this PoC informed the Intel Select Solutions for Splunk and Kafka on Kubernetes.

Keep reading to find out how to build a similar Splunk and Kafka platform that can provide the performance and resource utilization your organization needs to meet the demands of today's data-intensive workloads.

Solution Brief

Business challenge

The ongoing digital transformation of virtually every industry means that modern enterprise workloads utilize massive amounts of structured and unstructured data. For applications like Splunk and Kafka, the explosion of data can be compounded by other issues. First, the traditional distributed scale-out model with direct-attached storage requires multiple copies of data to be stored, driving up storage needs even further. Second, many organizations are retaining their data for longer periods of time for security and/or compliance reasons. These trends create many challenges, including:

Beyond the challenges presented by legacy architectures, organizations often face other difficulties. Large organizations often have Splunk and Kafka platforms in both on-prem and multi-cloud environments. Managing the differences between these environments creates complexity for Splunk and Kafka administrators, architects and engineers.

Value of Intel Select Solutions for Splunk and Kafka on Kubernetes

Many organizations understand the value of Kubernetes, which offers portability and flexibility and works with almost any type of container runtime. It has become the standard across organizations for running cloud-native applications; 69% of respondents from a recent Cloud Native Computing Foundation (CNCF) survey reported using Kubernetes in production.2 To support their customers' desire to deploy Kubernetes, Confluent developed Confluent for Kubernetes, and Splunk led the development of Splunk Operator for Kubernetes.

In addition, Splunk and Confluent have developed new storage capabilities: Splunk SmartStore and Confluent Tiered Storage, respectively. These capabilities use S3-compliant object storage to reduce the cost of massive data sets. Organizations can also maximize data availability by placing data in centralized S3 object storage, and they can reduce application storage requirements by keeping a single copy of the data that was moved to S3 and relying on the S3 platform for data resiliency.

The cloud-native technologies underlying this reference architecture enable systems to quickly process the large amounts of data today's workloads demand; improve resource utilization and operational efficiency; and help simplify the deployment and management of Splunk and Kafka containers.

Solution architecture highlights

We designed our reference architecture to take advantage of the previously mentioned new Splunk and Kafka products and technologies. We ran tests with a proof of concept (PoC) designed to assess Kafka and Splunk performance running on Kubernetes with servers based on high-performance Intel architecture and S3-compliant storage supported by Pure Storage FlashBlade.

Figure 1 illustrates the solution architecture at a high level. The critical software and hardware products and technologies included in this reference architecture are listed below:

Additional information about some of these components is provided in the "A Closer Look at Intel Select Solutions for Splunk and Kafka on Kubernetes" section that follows.

Figure 1. The solution reference architecture uses high-performance hardware and cloud-native software to help increase performance and improve hardware utilization and operational efficiency.

A Closer Look at Intel Select Solutions for Splunk and Kafka on Kubernetes

The ability to run Splunk and Kafka on the same Kubernetes cluster connected to S3-compliant flash storage unleashes seamless scalability with an extraordinary amount of performance and resource utilization efficiency. The following sections describe some of the software innovations that make this possible.

Confluent for Kubernetes and Confluent Tiered Storage

Confluent for Kubernetes provides a cloud-native, infrastructure-as-code approach to deploying Kafka on Kubernetes. It goes beyond the open-source version of Kubernetes to provide a complete, declarative API to build a private cloud Kafka service. It automates the deployment of Confluent Platform and uses Kubernetes to enhance the platform's elasticity, ease of operations and resiliency for enterprises operating at any scale.

Confluent Tiered Storage architecture augments Kafka brokers with the S3 object store via FlashBlade, storing data on the FlashBlade instead of the local storage. Therefore, Kafka brokers contain significantly less state locally, making them more lightweight and rebalancing operations orders of magnitude faster. Tiered Storage simplifies the operation and scaling of the Kafka cluster and enables the cluster to scale efficiently to petabytes of data. With FlashBlade as the backend, Tiered Storage has the performance to make all Kafka data accessible for both streaming consumers and historical queries.

Splunk Operator for Kubernetes and Splunk SmartStore

The Splunk Operator for Kubernetes simplifies the deployment of Splunk Enterprise in a cloud-native environment that uses containers. The Operator simplifies the scaling and management of Splunk Enterprise by automating administrative workflows using Kubernetes best practices.

Splunk SmartStore is an indexer capability that provides a way to use remote object stores to store indexed data. SmartStore makes it easier for organizations to retain data for a longer period of time. Using FlashBlade as the high-performance remote object store, SmartStore holds the single master copy of the warm/cold data. At the same time, a cache manager on the indexer maintains the recently accessed data. The cache manager manages data movement between the indexer and the remote storage tier. The data availability and fidelity functions are offloaded to FlashBlade, which offers N+2 redundancy.4
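The interplay between the indexer cache and the remote object store can be illustrated with a toy model. This is a minimal sketch, not Splunk's implementation: the `BucketCache` class, its method names and the least-recently-used eviction policy are assumptions made for illustration, and a plain dictionary stands in for the S3 object store.

```python
from collections import OrderedDict


class BucketCache:
    """Toy model of an indexer-side cache in front of a remote object store.

    Hypothetical illustration only; eviction is assumed to be
    least-recently-used (LRU).
    """

    def __init__(self, remote_store, capacity):
        self.remote = remote_store      # dict: bucket id -> size (stand-in for S3)
        self.capacity = capacity        # cache capacity, same unit as the sizes
        self.cached = OrderedDict()     # bucket id -> size, least recently used first
        self.used = 0
        self.downloads = 0

    def open_bucket(self, bucket_id):
        """Return where the bucket was served from: 'cache' or 'remote'."""
        if bucket_id in self.cached:
            # Cache hit: mark the bucket as most recently used.
            self.cached.move_to_end(bucket_id)
            return "cache"
        size = self.remote[bucket_id]
        # Cache miss: evict least recently used buckets until the new one fits.
        while self.cached and self.used + size > self.capacity:
            _, evicted_size = self.cached.popitem(last=False)
            self.used -= evicted_size
        self.cached[bucket_id] = size
        self.used += size
        self.downloads += 1
        return "remote"


remote = {"b1": 40, "b2": 40, "b3": 40}
cache = BucketCache(remote, capacity=100)
served_from = [cache.open_bucket(b) for b in ["b1", "b2", "b1", "b3", "b2"]]
print(served_from)
```

In this model, a search over recently accessed buckets is served from local storage, while a search over evicted buckets first pays the download cost from the remote tier, which is why the throughput of the object store matters for cold searches.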

Remote Object Storage Capabilities

Pure Storage FlashBlade is a scale-out, all-flash file and object storage system that is designed to consolidate complete data silos while accelerating real-time insights from machine data using applications such as Splunk and Kafka. FlashBlade's ability to scale performance and capacity is based on five key innovations:

A complete FlashBlade system configuration consists of up to 10 self-contained rack-mounted servers. A single 4U FlashBlade chassis can host up to 15 blades, and a full FlashBlade system configuration can scale up to 10 chassis (150 blades), potentially representing years of data even for higher-ingest systems. Each blade assembly is a self-contained compute module equipped with processors, communication interfaces and either 17 TB or 52 TB of flash memory for persistent data storage. Figure 2 shows how the reference architecture uses Splunk SmartStore and FlashBlade.

Figure 2. Splunk SmartStore using FlashBlade for the remote object store.
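The FlashBlade scale-out figures quoted in this section can be turned into a quick capacity estimate. The sketch below only multiplies the numbers given in the text; the result is raw flash, and usable capacity after redundancy overhead is not stated in the source and would be lower.

```python
# Scale-out limits quoted in the text.
BLADES_PER_CHASSIS = 15
MAX_CHASSIS = 10
BLADE_SIZES_TB = (17, 52)   # the two blade capacities mentioned

max_blades = BLADES_PER_CHASSIS * MAX_CHASSIS

# Raw flash per full system for each blade size (not usable capacity).
raw_capacity_tb = {size: max_blades * size for size in BLADE_SIZES_TB}

print(f"Maximum blades: {max_blades}")
for size, total in raw_capacity_tb.items():
    print(f"{size} TB blades: {total} TB raw ({total / 1000:.2f} PB)")
```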

Proof of Concept Testing Process and Results

The following tests were performed in our PoC:

For all the tests, we used Intel IT's real-world high-cardinality production data from sources such as DNS, Endpoint Detection and Response (EDR) and Firewall, which were collected into Kafka and ingested into Splunk through Splunk Connect for Kafka.

Test #1: Application Performance and Infrastructure Utilization

In this test, we compared the performance of a bare-metal Splunk and Kafka deployment to a Kubernetes deployment. The test consisted of reading data from four Kafka topics and ingesting that data into Splunk, while dense searches were scheduled to run every minute.

Bare-Metal Performance

We started with a bare-metal test using nine physical servers. Five nodes served as Splunk indexers, three nodes as Kafka brokers and one node as a Splunk search head. With this bare-metal cluster, the peak ingest rate was 301 MBps, while simultaneously finishing 90 successful Splunk dense searches per minute (60 in cache, 30 from FlashBlade), with an average CPU utilization of 12%. The average search runtime for the Splunk dense search was 22 seconds.

Addition of Kubernetes

Next, we deployed Splunk Operator for Kubernetes and Confluent for Kubernetes on the same nine-node cluster. Kubernetes spawned 62 containers: 35 indexers, 18 Kafka brokers and nine search heads. With this setup, we reached a peak Splunk ingest rate of 886 MBps, while simultaneously finishing 400 successful Splunk dense searches per minute (300 in cache, 100 from FlashBlade), with an average CPU utilization of 58%. The average search runtime for the Splunk dense search was 16 seconds, a 27% decrease from the average search time on the bare-metal cluster. Figure 3 illustrates the improved CPU utilization gained from containerization using Kubernetes. Figure 4 shows the high performance enabled by the reference architecture.

Figure 3. Deployment of the Splunk Operator for Kubernetes and Confluent for Kubernetes enabled 62 Splunk and Kafka containers on the nine physical servers in the PoC cluster.

Figure 4. Running Splunk Operator for Kubernetes and Confluent for Kubernetes enabled up to 2.9x higher ingest rate, up to 4x more successful dense searches, and a 27% reduction in average Splunk search time, compared to the bare-metal cluster.
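The headline ratios for Test #1 follow directly from the measurements reported above. The short sketch below recomputes them from the published figures; the dictionary keys and variable names are our own, chosen for illustration.

```python
# Figures reported for the nine-node PoC cluster in Test #1.
bare_metal = {"ingest_mbps": 301, "searches_per_min": 90, "search_runtime_s": 22}
kubernetes = {"ingest_mbps": 886, "searches_per_min": 400, "search_runtime_s": 16}


def ratio(metric):
    """Kubernetes result divided by the bare-metal result for one metric."""
    return kubernetes[metric] / bare_metal[metric]


ingest_gain = ratio("ingest_mbps")
search_gain = ratio("searches_per_min")
runtime_reduction = 1 - ratio("search_runtime_s")

# Container counts reported for the Kubernetes deployment.
containers = 35 + 18 + 9   # indexers + Kafka brokers + search heads

print(f"Ingest rate gain:         {ingest_gain:.1f}x")
print(f"Dense search gain:        {search_gain:.1f}x")
print(f"Search runtime reduction: {runtime_reduction:.0%}")
print(f"Containers:               {containers}")
```

The dense search gain works out to roughly 4.4x, which is consistent with the "up to 4x" wording in the Figure 4 caption.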

Test #2: Data Ingest from Kafka Local Storage versus Confluent Tiered Storage

Kafka's two key functions in event streaming are producer (ingest) and consumer (search/read). In the classic Kafka setup, the produced data is kept in the broker's local storage. With Tiered Storage, Confluent offloads the data from local storage to the object store, which enables infinite retention. If a consumer requests data that is no longer in local storage, the data is downloaded from the object store.

To compare the consumer/download performance, we started the Splunk Connect workers for Kafka after one hour of data ingestion into Kafka with all data on the local SSD storage. The Connect workers read the data from Kafka and forwarded it to the Splunk indexers, where we measured the ingest throughput and elapsed time to load all the unconsumed events. During this time, Kafka read the data from the local SSD storage, and Splunk was also writing the hot buckets into the local SSD storage that hosts the hot tier.
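The tiered read path can be sketched as a toy model. This is a simplification under stated assumptions, not Confluent's implementation: the `TieredTopic` class and its methods are hypothetical, segments are counted rather than aged out by time, and dictionaries stand in for the local SSD and the object store.

```python
class TieredTopic:
    """Toy model of a topic whose older segments live only in an object store."""

    def __init__(self, hot_segments):
        if hot_segments < 1:
            raise ValueError("hot_segments must be at least 1")
        self.hot_segments = hot_segments   # newest segments kept on local storage
        self.local = {}                    # segment index -> records (local SSD)
        self.object_store = {}             # segment index -> records (S3 stand-in)
        self.next_index = 0

    def produce(self, records):
        """Write a segment, archive it, and trim local storage to the hot set."""
        index = self.next_index
        self.next_index += 1
        self.local[index] = list(records)
        self.object_store[index] = list(records)
        # Drop every local segment except the newest `hot_segments` ones.
        for old in sorted(self.local)[:-self.hot_segments]:
            del self.local[old]
        return index

    def fetch(self, index):
        """Serve from local storage when possible, else from the object store."""
        if index in self.local:
            return "local", self.local[index]
        return "object-store", self.object_store[index]


topic = TieredTopic(hot_segments=2)
for i in range(5):
    topic.produce([f"event-{i}"])

sources = [topic.fetch(i)[0] for i in range(5)]
print(sources)
```

A consumer that starts from the oldest offset, as the Connect workers do in this test, reads the early segments from the object store and only the most recent ones from local storage.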

We repeated the same test when the topic was enabled with Tiered Storage by starting the Splunk Connect workers for Kafka, which initially read the data out of FlashBlade and later from the local SSD storage for the last 15 minutes. We then measured the ingest throughput and the elapsed time to load all the unconsumed events.

As shown in Figure 5, there is no reduction in the Kafka consumer performance when the broker data is hosted on Tiered Storage on FlashBlade. This reaffirms that offloading Kafka data to the object store, FlashBlade, gives not only similar performance for consumers but also the added benefit of longer retention.

Figure 5. Using Confluent Tiered Storage with FlashBlade enables longer data retention while maintaining (or even improving) the ingest rate.

Test #3: Splunk Super-Sparse Searches in Splunk SmartStore

When data is in the cache, Splunk SmartStore searches are expected to be similar to non-SmartStore searches. When data is not in the cache, search times are dependent on the amount of data to be downloaded from the remote object store to the cache. Hence, searches involving rarely accessed data or data covering longer time periods can have longer response times than experienced with non-SmartStore indexes. However, FlashBlade accelerates the download time considerably in comparison to any other cheap-and-deep object storage available today.4

To demonstrate FlashBlade's ability to accelerate downloads, we tested the performance of a super-sparse search (the equivalent of finding a needle in a haystack); the response time of this type of search is generally tied to I/O performance. The search was initially performed against the data in the Splunk cache to measure the resulting event counts. The search returned 64 events out of several billion events. Following this, the entire cache was evicted from all the indexers, and the same super-sparse search was issued again, which downloaded all the required data from FlashBlade into the cache to perform the search. We discovered that FlashBlade supported a download of 376 GB in just 84 seconds with a maximum download throughput of 19 GBps (see Table 1).

Table 1. Results from Super-Sparse Search

Metric | Results
Downloaded Buckets | 376 GB
Elapsed Time | 84 seconds
Average Download Throughput | 4.45 GBps
Maximum Download Throughput | 19 GBps

A super-sparse search downloading 376 GB in 84 seconds
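The average throughput in Table 1 can be cross-checked from the downloaded volume and the elapsed time. The sketch below uses only the reported figures; the small gap between the computed and reported averages is presumably due to rounding of the published inputs, which is our assumption rather than something the source states.

```python
# Figures reported in Table 1.
downloaded_gb = 376
elapsed_s = 84
reported_avg_gbps = 4.45
reported_max_gbps = 19

# Average throughput implied by volume and elapsed time.
computed_avg_gbps = downloaded_gb / elapsed_s

# Lower bound on the download time if the peak rate had been sustained throughout.
time_at_peak_s = downloaded_gb / reported_max_gbps

print(f"Computed average throughput: {computed_avg_gbps:.2f} GBps")
print(f"Reported average throughput: {reported_avg_gbps:.2f} GBps")
print(f"Time at sustained peak rate: {time_at_peak_s:.1f} s")
```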

Configuration Summary

Introduction

The previous pages provided a high-level discussion of the business value provided by Intel Select Solutions for Splunk and Kafka on Kubernetes, the technologies used in the solution and the performance and scalability that can be expected. This section provides more detail about the Intel technologies used in the reference design and the bill of materials for building the solution.

Intel Select Solutions for Splunk and Kafka on Kubernetes Design

The following tables describe the components required to build this solution. Customers must use firmware with the latest microcode. Tables 2, 3 and 4 detail the key components of our reference architecture and PoC. The selection of software, compute, network, and storage components was essential to achieving the performance gains observed.

Table 2. Key Server Components

Component | Description
CPU | 2x Intel Xeon Platinum 8360Y (36 cores, 2.4 GHz)
Memory | 16x 32 GB DDR4 @ 3200 MT/s
Storage (Cache Tier) | 1x Intel Optane SSD P5800X (1.6 TB)
Storage (Capacity Tier) | 1x SSD DC P4510 (4 TB)
Boot Drive | 1x SSD D3-S4610 (960 GB)
Network | Intel Ethernet Network Adapter E810-XXVDA2 (25 GbE)
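To put the per-server figures in Table 2 in context, the sketch below aggregates them across the PoC cluster. It assumes that all nine PoC nodes match the Table 2 configuration, which the source implies but does not state explicitly, and the cores-per-container figure is a simple average rather than an actual resource allocation.

```python
NODES = 9                 # assumption: every PoC node matches Table 2
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 36
DIMMS_PER_NODE = 16
GB_PER_DIMM = 32
CONTAINERS = 62           # containers spawned in the Kubernetes test

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
total_cores = NODES * cores_per_node

memory_per_node_gb = DIMMS_PER_NODE * GB_PER_DIMM
total_memory_gb = NODES * memory_per_node_gb

# Simple average; real containers need not receive equal shares.
avg_cores_per_container = total_cores / CONTAINERS

print(f"Cores:  {cores_per_node} per node, {total_cores} in the cluster")
print(f"Memory: {memory_per_node_gb} GB per node, {total_memory_gb} GB in the cluster")
print(f"Average physical cores per container: {avg_cores_per_container:.1f}")
```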

Table 3. Software Components

Software | Version
Kubernetes | 1.23.0
Splunk Operator for Kubernetes | 1.0.1
Splunk Enterprise | 8.2.0
Splunk Connect for Kafka | 2.0.2
Confluent for Kubernetes | 2.2.0
Confluent Platform | 7.0.1 using Apache Kafka 3.0.0

Table 4. S3 Object Storage Components
