Compare Architectures
ELK Stack Architecture
The Elastic architecture uses a variety of methods (Beats, syslog, etc.) to forward data to Logstash, which processes the data and sends it to the Elasticsearch tier (based on Apache Lucene) for indexing and storage. Kibana is used for visualization. In production installations, Kafka is typically placed between Beats and Logstash as a message broker, buffering incoming data so it is not lost during ingest spikes or Logstash downtime. ZooKeeper handles Kafka configuration and synchronization. A diagram of these components is shown below.
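To make the data flow concrete, the minimal Python sketch below shows the two hand-off points described above: a raw event published to Kafka (where Beats hands off to Logstash in a brokered deployment), and a processed event written to Elasticsearch over its standard document API (as Logstash ultimately does). The broker address, topic, and index name are illustrative assumptions, not values from any particular deployment.

```python
# Sketch of the pipeline hand-offs: Beats -> Kafka, then Logstash -> Elasticsearch.
# Hostnames, topic, and index names are assumptions for illustration only.
import json
from datetime import datetime, timezone

import requests
from kafka import KafkaProducer  # pip install kafka-python

KAFKA_BROKER = "localhost:9092"   # assumed broker address
TOPIC = "filebeat-logs"           # assumed topic Beats would write to
ES_URL = "http://localhost:9200"  # assumed Elasticsearch endpoint
INDEX = "app-logs-2024.01.01"     # assumed daily index name

event = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "message": "user login failed",
    "host": "web-01",
    "level": "WARN",
}

# 1) Beats -> Kafka: the raw event is buffered in the broker so a Logstash
#    slowdown or restart does not drop data.
producer = KafkaProducer(
    bootstrap_servers=KAFKA_BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, event)
producer.flush()

# 2) Logstash -> Elasticsearch: after parsing/enrichment, the event is written
#    to an index, where Lucene makes it searchable (and visible in Kibana).
resp = requests.post(f"{ES_URL}/{INDEX}/_doc", json=event, timeout=10)
resp.raise_for_status()
print("indexed:", resp.json()["_id"])
```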
Advantages of ELK Stack:
One advantage of running Elastic as a SaaS deployment is that the complex work of deploying and managing all these components is handled as part of the service. This relieves a significant amount of operational pain for the customer, letting them focus on using the data rather than running the platform. But this approach is not without its downsides.
Disadvantages of ELK Stack:
The disadvantages of a SaaS deployment of this complex architecture involve scalability, performance, and cost. Because Elastic in the cloud is a lift-and-shift of its on-prem architecture, all the same challenges of on-prem deployment carry over to the cloud. Since data must be indexed and parsed before it is searchable, there can be a significant lag from when the data is ingested to when it is available for search.
This problem becomes more acute during big bursts of data on the ingest side, and the lag can be especially problematic when analysts are investigating high-priority operational or security issues. Because data must be parsed prior to ingest, its format must be known; if the format changes, the data must be reindexed, a lengthy process that impacts search.

Another capability to benchmark is search performance. When an index becomes large (>50 GB), Elastic splits it into “shards.” Searching data across shards can be slow, and performance degrades as more data is ingested.

Finally, you must consider the storage footprint. Depending on how you decide to index data (whether or not you truncate the original data), the ratio of ingested data to index size can be anywhere from 0.5x to 1.4x. This results in large storage requirements for hot data, which is why buyers typically keep hot data for only about 30 days. It is possible to search against a warm data tier, but performance is not the same. Inconsistent search speeds caused by the way data is tiered and sharded make searches slow, especially against historical data, which can delay threat investigations that need to find the first instance of an IOC. Similarly, the rate at which threat hunting queries can be answered suffers when querying across large data sets.
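To see how those numbers compound, here is a rough, back-of-the-envelope sizing sketch in Python using the figures cited above (an index-to-ingest ratio of 0.5x to 1.4x, roughly 50 GB per shard, and about 30 days of hot retention). The daily ingest volume and replica count are assumptions chosen purely for illustration.

```python
import math

daily_ingest_gb = 1_000          # assumed: 1 TB/day of raw log data
hot_retention_days = 30          # typical hot-tier retention cited above
shard_size_gb = 50               # approximate size at which indices are sharded
replicas = 1                     # assumed: one replica copy for resilience

for ratio in (0.5, 1.4):         # low and high ends of the index-size ratio
    primary_gb = daily_ingest_gb * ratio * hot_retention_days
    total_gb = primary_gb * (1 + replicas)
    shards = math.ceil(daily_ingest_gb * ratio / shard_size_gb) * hot_retention_days
    print(f"ratio {ratio}x: ~{primary_gb:,.0f} GB primary storage, "
          f"~{total_gb:,.0f} GB with replicas, ~{shards} primary shards to search")
```

At 1 TB/day, the hot tier alone works out to roughly 15 TB to 42 TB of primary index data over 30 days, before replicas, which is why retention windows are kept short.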
Splunk Architecture
The Splunk architecture is similar to Elastic’s, but slightly simpler. Data sources send information to Splunk’s heavy forwarder via rsyslog, NXLog, and similar agents. The heavy forwarder pre-filters the data before sending it to the Splunk indexers. Search heads distribute searches across one or more indexers, and results are available directly in the browser via Splunk’s web interface. A simple version of this architecture is shown in the diagram below.
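The search path can be illustrated with a short Python sketch against Splunk’s standard REST search endpoints: a job is submitted to the search head, which distributes it to the indexers, and the merged results are fetched when the job completes. The hostname, credentials, and query below are assumptions for illustration only.

```python
# Sketch: submit a search to the search head and read back the merged results.
import time

import requests

BASE = "https://splunk-search-head:8089"  # assumed management port on the search head
AUTH = ("admin", "changeme")              # assumed credentials, for illustration only

# Submit the search job; the search head fans it out to one or more indexers.
job = requests.post(
    f"{BASE}/services/search/jobs",
    auth=AUTH,
    data={"search": "search index=main error | head 10", "output_mode": "json"},
    verify=False,  # self-signed certs are common on the management port
)
job.raise_for_status()
sid = job.json()["sid"]

# Poll until the distributed search finishes.
while True:
    status = requests.get(
        f"{BASE}/services/search/jobs/{sid}",
        auth=AUTH,
        params={"output_mode": "json"},
        verify=False,
    ).json()
    if status["entry"][0]["content"]["dispatchState"] == "DONE":
        break
    time.sleep(1)

# Fetch the merged results from the search head.
results = requests.get(
    f"{BASE}/services/search/jobs/{sid}/results",
    auth=AUTH,
    params={"output_mode": "json"},
    verify=False,
).json()
for row in results["results"]:
    print(row.get("_raw", row))
```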
Advantages of Splunk:
Like Elastic, Splunk was designed for on-premises deployment, so its cloud offering is also a lift-and-shift version of the original. The main advantage of Splunk’s architecture compared to Elastic’s is simplicity: although the two are similar, Splunk has fewer moving parts, which results in fewer cloud instances to support.
Disadvantages of Splunk:
Because of the similarities, however, many of the same drawbacks exist. Since data must be indexed before it can be searched, queries can be delayed, especially during bursts of data, and when data models are used there can be up to 15 minutes of lag between data ingest and searchability. In addition, each indexer can’t ingest much more than 250 GB a day, which necessitates a large number of indexers at scale. This significantly increases cloud infrastructure costs, which Splunk passes on to the customer. Lastly, each of Splunk’s add-on applications (IT Service Intelligence, Enterprise Security) requires additional data models to run against the indexers, further slowing search performance. This diminished search performance affects everything from single-user queries to dashboard refreshes.
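The arithmetic behind that indexer scaling concern is straightforward to sketch. The ingest volume, headroom factor, and per-instance cost below are assumptions chosen only to show how the ~250 GB/day ceiling drives instance counts; they are not Splunk figures.

```python
import math

daily_ingest_gb = 10_000              # assumed: 10 TB/day across all data sources
gb_per_indexer_per_day = 250          # practical per-indexer ceiling cited above
headroom = 0.7                        # assumed: run indexers at ~70% of that ceiling
monthly_cost_per_instance = 1_500     # assumed cloud instance cost in USD

indexers = math.ceil(daily_ingest_gb / (gb_per_indexer_per_day * headroom))
print(f"indexers needed: {indexers}")
print(f"estimated monthly indexer cost: ${indexers * monthly_cost_per_instance:,}")
```

At 10 TB/day this works out to dozens of indexers before search heads, forwarders, or storage are counted, which is where the cloud infrastructure cost pressure comes from.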