
Splunk vs. ELK Stack: A Side-by-Side Comparison

Compare the advantages and disadvantages of each vendor in terms of architecture, features and pricing to see who comes out on top.

Architecture

Elastic Stack Architecture

The Elastic architecture uses a variety of methods (Beats, syslog, etc.) to forward data to Logstash, which parses and transforms the data and sends it to the Elasticsearch tier (based on Apache Lucene) for indexing and storage. Kibana is used for visualization. Production installations commonly add Kafka as a message broker between Beats and Logstash to buffer incoming data so that events from Beats are not lost, with Apache ZooKeeper handling Kafka configuration and synchronization. A diagram of these components is shown below.
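To make the flow concrete, here is a minimal in-memory sketch in Python, assuming simple `key=value` log lines. The functions `ship`, `parse`, `index`, and `run_pipeline` are hypothetical stand-ins for the Beats, Logstash, and Elasticsearch roles, and a `deque` stands in for the Kafka buffer; none of them are real Elastic or Kafka APIs. The sketch shows the point that matters for the rest of this comparison: an event only becomes searchable after it has been buffered, parsed, and indexed.

```python
from collections import deque
from typing import Dict, List, Set

# Toy, in-memory model of the Elastic data flow. None of these functions are
# real Beats, Kafka, Logstash, or Elasticsearch APIs.

def ship(raw_lines: List[str], broker: deque) -> None:
    """'Beats' role: forward raw, unparsed events to the broker."""
    broker.extend(raw_lines)

def parse(line: str) -> Dict[str, str]:
    """'Logstash' role: turn a 'key=value key=value' line into fields."""
    return dict(pair.split("=", 1) for pair in line.split())

def index(doc_id: int, doc: Dict[str, str], inverted: Dict[str, Set[int]]) -> None:
    """'Elasticsearch' role: record doc_id under each field:value term."""
    for field, value in doc.items():
        inverted.setdefault(f"{field}:{value}", set()).add(doc_id)

def run_pipeline(raw_lines: List[str]) -> Dict[str, Set[int]]:
    broker: deque = deque()              # 'Kafka' role: buffer between tiers
    inverted: Dict[str, Set[int]] = {}
    ship(raw_lines, broker)
    doc_id = 0
    while broker:                        # drain the buffer in arrival order
        index(doc_id, parse(broker.popleft()), inverted)
        doc_id += 1
    return inverted

idx = run_pipeline(["host=web1 status=500", "host=web2 status=200"])
print(sorted(idx["status:500"]))         # [0]
```

A lookup such as `idx["status:500"]` only succeeds once an event has passed through every stage, which is why any backlog at the parsing or indexing tier shows up directly as search lag.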

Advantages of Elastic Stack:

One advantage of running Elastic in a SaaS deployment is that the complex work of deploying and managing all the components is handled as part of the service. This relieves the customer of a significant amount of operational pain, enabling them to focus on using the data. But this approach is not without its downsides.

Disadvantages of Elastic Stack:

The disadvantages of a SaaS deployment of this complex architecture involve scalability, performance, and cost. Because Elastic in the cloud is a lift-and-shift of its on-prem architecture, all the same challenges of on-prem deployment carry over to the cloud. Since data must be parsed and indexed before it is searchable, there can be a significant lag between when data is ingested and when it is available for search.

This problem becomes more acute during large bursts of data on the ingest side, and the lag is especially problematic when analysts are investigating high-priority operational or security issues. Because data must be parsed before it is indexed, its format must be known in advance. If the format changes, the data must be reindexed, a lengthy process that affects search.

Search performance is another capability to benchmark. Elasticsearch stores each index as one or more “shards,” and Elastic’s sizing guidance keeps individual shards to roughly 50GB or less, so large data volumes end up spread across many shards. Searching data across shards can be slow, and performance degrades as more data is ingested.

Finally, you must consider the storage footprint. Depending on how you decide to index data (truncating the original data or not), the ratio of index size to ingested data can be anywhere from 0.5x to 1.4x. This results in large storage requirements for hot data, so buyers typically opt to keep hot data for only about 30 days. It is possible to search against a warm data tier, but performance is not the same. Inconsistent search speeds caused by the way data is tiered and sharded can make search slow, especially for historical data. This can slow down threat investigations that need to find the first instance of an indicator of compromise (IOC). Similarly, threat-hunting queries are answered more slowly when they run across large data sets.
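The storage and shard arithmetic above can be sketched in a few lines of Python. This is a back-of-the-envelope model, not Elastic’s sizing tool: the 200GB/day workload is hypothetical, the `replicas` parameter (one replica copy per primary shard, Elasticsearch’s default) and the one-index-per-day layout are our assumptions rather than anything stated above, and the 50GB shard ceiling reflects Elastic’s general sizing guidance.

```python
import math

def hot_storage_gb(daily_ingest_gb: float, index_ratio: float,
                   hot_days: int, replicas: int = 1) -> float:
    """Disk needed to keep `hot_days` of data on the hot tier.

    index_ratio: index size divided by raw ingested size (0.5 to 1.4 above).
    replicas: extra copies of each primary shard (an assumption; Elasticsearch
    defaults to 1).
    """
    primary_gb = daily_ingest_gb * index_ratio * hot_days
    return primary_gb * (1 + replicas)

def primary_shards(index_size_gb: float, max_shard_gb: float = 50.0) -> int:
    """Fewest primary shards that keep every shard at or under max_shard_gb."""
    return max(1, math.ceil(index_size_gb / max_shard_gb))

# Hypothetical workload: 200GB/day of raw data, 30 days kept hot,
# and (another assumption) one index created per day.
for ratio in (0.5, 1.4):
    daily_index_gb = 200 * ratio
    print(f"ratio {ratio}: {hot_storage_gb(200, ratio, 30):,.0f} GB hot, "
          f"{primary_shards(daily_index_gb)} primary shards per daily index")
```

With these inputs, the 0.5x to 1.4x index ratio alone moves the size of a 30-day hot tier by a factor of 2.8, before any warm or cold storage is counted.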

Splunk Architecture

The Splunk architecture is similar to Elastic’s, but slightly simpler. Data sources send information to Splunk’s heavy forwarder via rsyslog, NXLog, etc. The heavy forwarder pre-filters the data before sending it to the Splunk indexer. Search heads distribute searches to one or more indexers, and search results are available in the browser through Splunk’s web interface. A simple version of this architecture is shown in the diagram below.
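The search head’s role follows a classic scatter-gather pattern. The toy Python sketch below, which assumes each indexer holds a small in-memory list of events, fans a query out to all indexers in parallel and merges the partial results by timestamp. `search_indexer` and `search_head` are hypothetical names for illustration, not Splunk APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Event = Dict[str, str]

def search_indexer(events: List[Event], field: str, value: str) -> List[Event]:
    """Runs 'on' one indexer: return its local events where field == value."""
    return [e for e in events if e.get(field) == value]

def search_head(indexers: List[List[Event]], field: str, value: str) -> List[Event]:
    """Scatter the query to every indexer in parallel, then gather and merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(indexers))) as pool:
        partials = list(pool.map(lambda ix: search_indexer(ix, field, value), indexers))
    merged = [e for part in partials for e in part]
    return sorted(merged, key=lambda e: e["time"])   # merge step: order by time

# Two hypothetical indexers, each holding a slice of the data.
indexers = [
    [{"time": "10:02", "host": "web1", "status": "500"}],
    [{"time": "10:01", "host": "web2", "status": "500"},
     {"time": "10:03", "host": "web2", "status": "200"}],
]
print([e["host"] for e in search_head(indexers, "status", "500")])  # ['web2', 'web1']
```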

Advantages of Splunk:

Like Elastic, Splunk was designed for on-premises deployment, so its cloud offering is also a lift-and-shift version of the original. The main advantage of Splunk’s architecture compared to Elastic’s is simplicity: although the two are similar, Splunk has fewer moving parts, which means fewer cloud instances to support. But because of the similarities, many of the same drawbacks exist. Since data must be indexed before it can be searched, queries can be delayed, especially during bursts of data. When data models are used, there can be up to 15 minutes of lag between data ingest and searchability.
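Why do bursts hurt so much? If events arrive faster than the indexing tier can process them, the excess queues up, and every queued event waits before it becomes searchable. The Python sketch below models this as a simple first-in, first-out backlog with hypothetical rates (1,000 events per minute of indexing capacity and a three-minute burst at 3,000 per minute). It applies equally to Splunk indexers and Elasticsearch nodes and is not based on either product’s internals.

```python
from typing import List

def indexing_lag(arrivals_per_min: List[float], capacity_per_min: float) -> List[float]:
    """Approximate search lag (minutes) at the end of each minute.

    Simple FIFO model: each minute's arrivals join a backlog, the indexing tier
    removes up to capacity_per_min events, and an event at the back of the
    queue waits backlog / capacity minutes before it is indexed.
    """
    backlog = 0.0
    lags: List[float] = []
    for arrivals in arrivals_per_min:
        backlog = max(0.0, backlog + arrivals - capacity_per_min)
        lags.append(backlog / capacity_per_min)
    return lags

# Hypothetical traffic: steady 800/min, a 3-minute burst at 3,000/min, then steady.
traffic = [800, 800, 3000, 3000, 3000, 800, 800, 800]
print(indexing_lag(traffic, capacity_per_min=1000))
```

In this model the lag peaks at six minutes and is still above five minutes three minutes after the burst has ended, because the backlog can only drain at the spare capacity left over from steady-state traffic.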

Disadvantages of Splunk:

Each indexer can’t ingest much more than 250GB a day, which necessitates a large number of indexers at scale. This significantly increases cloud infrastructure costs, which Splunk passes on to the customer. Lastly, each of Splunk’s premium applications (IT Service Intelligence, Enterprise Security) requires additional data models to run against the indexers, further slowing search performance. This diminished performance affects everything from single-user queries to dashboard refreshes.
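At roughly 250GB per indexer per day, the indexer count is a simple ceiling division. The sketch below uses that figure as its default; the optional `headroom` fraction for absorbing bursts is a hypothetical parameter added for illustration, not a Splunk setting.

```python
import math

def indexers_needed(daily_ingest_gb: float, per_indexer_gb: float = 250.0,
                    headroom: float = 0.0) -> int:
    """Indexers required if each one ingests about per_indexer_gb per day.

    headroom: hypothetical spare-capacity fraction reserved for bursts.
    """
    return math.ceil(daily_ingest_gb * (1 + headroom) / per_indexer_gb)

print(indexers_needed(2000))                 # 2TB/day -> 8
print(indexers_needed(2000, headroom=0.25))  # with 25% burst headroom -> 10
print(indexers_needed(1100))                 # partial indexers round up -> 5
```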

Features & Functionality

Elastic Stack Features

As an open-source product, Elastic provides a rich feature set right out of the box. You can use Elastic not just for log data but also for SIEM, ITOps, and APM use cases. Infrastructure metrics, such as CPU and memory utilization, can be combined with logs to troubleshoot infrastructure. Elastic can import data from a distributed tracing system such as Zipkin to help troubleshoot slow application performance. Elastic also has a SIEM module that includes a detection engine, threat hunting capabilities, case management functions, and some basic endpoint security, along with machine learning (ML) algorithms that spot anomalous behavior or activity to aid detection. With the exception of endpoint security, which is available only with Elastic’s Enterprise-level subscription, all of these features are included.

Elastic does not include a SOAR (security orchestration, automation, and response) platform as part of its solution, but it does offer tight integration with IBM’s Resilient SOAR product.

Splunk Features

Splunk takes a modular approach to functionality. If you want to perform infrastructure monitoring, you have to pay extra for the Splunk ITSI (IT Service Intelligence) premium app. The same is true of Splunk Enterprise Security. The end result is that Splunk’s cloud offering includes the same features as Elastic Cloud, but you need to pay more to access them.

Until October 2019, Splunk did not include distributed tracing as part of ITSI, but its acquisition of SignalFx rectified this. Splunk does have a SOAR platform (Phantom), but it is not included in Splunk Cloud. Users must run Phantom in their own AWS, GCP, or Azure cloud environment and integrate it with Splunk Cloud.

Pricing

It is challenging to compare the costs of these solutions because each company prices its products using different methods. Therefore, rather than compare specific costs, we will compare the pricing models themselves for fairness, simplicity, and predictability.

Elastic Stack Pricing

Elastic’s cost model is based on infrastructure costs for compute, storage, and memory plus three tiers of support. Elastic’s underlying architecture makes it challenging to size the solution for your environment. There is a cost estimator tool on Elastic’s website: https://cloud.elastic.co/pricing.

There are separate estimator tools for “Observability,” “Security,” and “Classic ELK Stack.” You choose your cloud provider (AWS, Azure, GCP) and region, then decide how many CPUs, how much memory, and how many instances you need. Next, you must select whether you require fault tolerance across multiple availability zones. This process must be repeated for additional components such as APM. Here is a screenshot of Elastic’s price estimator tool.

As mentioned above in the Architecture section, Elastic requires a significant number of infrastructure components. Deploying it across multiple availability zones for fault tolerance doubles this already large infrastructure count (or triples it with three zones). Because Elastic’s indexes don’t compress well, you also need a lot of storage, even when storing only a short period of data. The final component is Elastic’s support levels, which are listed at https://www.elastic.co/pricing/. Note that you need Elastic’s highest support package to access its Endpoint Security capabilities. So, even though Elastic is open source, cloud infrastructure costs can add up quickly.
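The multiplier effect of availability zones is easy to see with a rough model. In the Python sketch below, the `Tier` names, instance counts, and hourly prices are hypothetical placeholders rather than Elastic list prices, and we assume, as described above, that each additional zone runs a full copy of the deployment; 730 is the approximate number of hours in a month.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Tier:
    """One deployment component; counts and prices are hypothetical."""
    name: str
    instances: int
    hourly_price_per_instance: float

HOURS_PER_MONTH = 730.0  # roughly 8,760 hours / 12 months

def monthly_cost(tiers: List[Tier], zones: int) -> float:
    """Assumes every availability zone runs a full copy of every tier."""
    if zones < 1:
        raise ValueError("zones must be at least 1")
    per_zone_hourly = sum(t.instances * t.hourly_price_per_instance for t in tiers)
    return per_zone_hourly * zones * HOURS_PER_MONTH

deployment = [
    Tier("elasticsearch-hot", instances=3, hourly_price_per_instance=1.00),
    Tier("kibana", instances=1, hourly_price_per_instance=0.10),
    Tier("apm", instances=1, hourly_price_per_instance=0.10),
]
for zones in (1, 2, 3):
    print(zones, round(monthly_cost(deployment, zones), 2))
```

Storage and support costs sit on top of this compute figure, so the zone multiplier is only part of the total.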

Splunk Pricing

Splunk’s pricing models are complicated and depend on many options. Broadly speaking, they break down into ingest pricing and resource pricing.

The resource-based pricing model is based mostly on the compute power used for your searches. Compute resources are calculated by multiplying your total logical cores by “a premium data %” and adding your premium cores. You also need to be very aware of which searches you run regularly and when they run, since you’ll need to allocate dedicated compute power for those priority searches, and you have to go through this exercise separately for Splunk’s core product, IT Service Intelligence, and Enterprise Security. The idea is that compute power matters less for ingestion than for search, so paying for compute cores dedicated to searching is a better value. But it’s hard to know what data is important until you need it, and this model could result in slower performance for data you hadn’t already designated as “premium.” Lastly, if you don’t add compute power as your total data volume grows, search times will most likely slow down. This resource-based pricing model is new for Splunk, and it remains to be seen how popular it will be.
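Read literally, the compute formula above is total logical cores times the premium data percentage, plus premium cores, repeated for each product. The Python sketch below encodes that reading; it is our interpretation rather than Splunk’s published formula, and every number in it is hypothetical.

```python
from typing import Dict, Tuple

def billable_compute(total_logical_cores: float, premium_data_pct: float,
                     premium_cores: float) -> float:
    """Our literal reading of the description above, not Splunk's published
    formula: total logical cores x premium data share, plus premium cores."""
    if not 0.0 <= premium_data_pct <= 1.0:
        raise ValueError("premium_data_pct must be a fraction from 0 to 1")
    return total_logical_cores * premium_data_pct + premium_cores

# The exercise is repeated per product. All inputs are hypothetical:
# (total logical cores, premium data share, premium cores)
products: Dict[str, Tuple[float, float, float]] = {
    "core": (64, 0.25, 8),
    "itsi": (32, 0.75, 4),
    "enterprise_security": (48, 0.50, 6),
}
for name, inputs in products.items():
    print(name, billable_compute(*inputs))
print("total", sum(billable_compute(*inputs) for inputs in products.values()))
```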

Historically, Splunk’s pricing model has been based on data ingest volume, and this is the model most customers currently have. But this model also has many factors that come into play, each of which adds to your total cost. Splunk charges extra for each “premium application,” such as IT Service Intelligence and Enterprise Security, and the cost of a premium application rises as data volume increases. Splunk also charges extra to encrypt data at rest, and the more data you have, the higher the expense. It charges extra for additional storage, sold in 500GB blocks, to retain data for longer historical periods. And each of these charges has two tiers, depending on whether you choose the “Standard” or “Premium” plan. Before moving to Splunk Cloud, be sure you fully understand which applications you will need for your use cases and how much historical data you need to keep hot, and account for the cost of encrypting data at rest as well as support costs.
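Because extra storage comes in 500GB blocks, retention decisions translate into whole-block purchases. The sketch below estimates the number of blocks for a given retention period; `included_gb` (storage bundled with your plan) and `stored_ratio` (stored size relative to raw ingest) are hypothetical inputs you would replace with figures from your own contract.

```python
import math

BLOCK_GB = 500  # extra storage is sold in 500GB blocks, per the text above

def extra_storage_blocks(daily_ingest_gb: float, retention_days: int,
                         included_gb: float, stored_ratio: float = 1.0) -> int:
    """Whole 500GB blocks needed beyond the storage bundled with the plan.

    included_gb and stored_ratio (stored size / raw size) are hypothetical
    inputs; substitute the figures from your own contract.
    """
    needed_gb = daily_ingest_gb * stored_ratio * retention_days
    excess_gb = max(0.0, needed_gb - included_gb)
    return math.ceil(excess_gb / BLOCK_GB)

# Hypothetical: 100GB/day ingest with 3,000GB of included storage.
for days in (30, 31, 90, 365):
    print(days, extra_storage_blocks(100, days, included_gb=3000))
```

Since blocks are bought whole, a single extra day of retention beyond the included amount can mean paying for an entire additional block.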

Conclusion

Centralized log management has proven its value in a variety of use cases, from ITOps to SecOps and more. Although centralized log management solutions have always posed challenges, today’s SaaS delivery model removes most of them and makes it easier than ever to deploy and run a centralized log management solution. For those comparing ELK Stack vs. Splunk, Devo is the best log management solution for medium-sized to large buyers with multiple environments (data centers and multi-cloud environments) because it delivers the most modern and efficient architecture, offers a rich feature set, and has the most attractive cost model. To learn more about how Devo stacks up against Splunk and Elastic, download the Buyer’s Guide for Centralized Log Management.