Search Command> stats, eventstats and streamstats | Splunk (2024)

Getting started with stats, eventstats and streamstats

When I first joined Splunk, like many newbies I needed direction on where to start. Someone gave me some excellent advice:

“Learn the stats and eval commands.”

Putting eval aside for another blog post, let’s examine the stats command. It never ceases to amaze me how many Splunkers are stuck in the “super grep” stage. They just use Splunk to search (happily, I might add) for keywords and phrases over many sources of machine data. Hopefully this post will help advance some folks beyond “super grep” as well as assist those who may be new to Splunk.

When you dive into Splunk’s excellent documentation, you will find that the stats command has a couple of siblings — eventstats and streamstats. In this blog post, I will attempt, by means of a simple web log example, to illustrate how the variations on the stats command work, and how they are different. Stats typically gets a lot of use, but I’ll use it to set the stage for eventstats and streamstats which don’t get as much use. Reference documentation links are included at the end of the post. I will take a very basic, step-by-step approach by going through what is happening with the stats command, and then expand on that example to show how stats differs from eventstats and streamstats. In an effort to keep it simple, I’ll limit the data of interest to five (5) events with the head command. (If you’re cool with stats, scroll on down to eventstats or streamstats.)

As the name implies, stats is for statistics. Per the Splunk documentation:

Description:
Calculate aggregate statistics over the dataset, similar to SQL aggregation. If called without a by clause, one row is produced, which represents the aggregation over the entire incoming result set. If called with a by-clause, one row is produced for each distinct value of the by-clause.

There are also a number of statistical functions at your disposal, such as avg(), count(), distinct_count(), median(), perc<int>(), stdev(), sum(), and sumsq(), to name a few.
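To make the by-clause behavior in the description concrete, here is a minimal Python sketch. It is not Splunk code, only a rough analogue of what `stats sum(<field>)` does with and without a by clause. The function name `stats_sum` and the event values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical events: the clientip and bytes values are invented for illustration.
events = [
    {"clientip": "10.0.0.1", "bytes": 100},
    {"clientip": "10.0.0.2", "bytes": 250},
    {"clientip": "10.0.0.1", "bytes": 50},
]

def stats_sum(rows, field, by=None):
    """Rough analogue of `stats sum(field) [by <by>]`; returns a list of result rows."""
    if by is None:
        # No by clause: a single row that aggregates the whole result set.
        return [{f"sum({field})": sum(r[field] for r in rows)}]
    totals = defaultdict(int)
    for r in rows:
        totals[r[by]] += r[field]
    # With a by clause: one row for each distinct value of the by field.
    return [{by: key, f"sum({field})": total} for key, total in sorted(totals.items())]

print(stats_sum(events, "bytes"))                 # one row
print(stats_sum(events, "bytes", by="clientip"))  # one row per clientip
```

Either way, the output contains only aggregate rows; the individual events are gone.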

So let’s look at a simple search command that sums up the number of bytes per IP address from some web logs.

To begin, do a simple search of the web logs in Splunk and look at five events, along with the byte counts associated with the two IP addresses in the clientip field.

sourcetype=access_combined* | head 5

The fields (and values of those fields) of interest are as follows:

STATS

Splunk users will notice the raw log events in the results area, as well as a number of fields (in addition to bytes and clientip) listed in a column on the left of the screen shot above. Right now we are just interested in the number of bytes per clientip. Using the stats command and the sum function, I can compute the sum of the bytes for each clientip. I’ll also rename the result to “ASimpleSumOfBytes” so that it stands out; the “A” prefix makes it easy to find alphabetically.

sourcetype=access_combined* | head 5 | stats sum(bytes) as ASimpleSumOfBytes by clientip

To understand what happened with the above search take a look at the “search pipeline” section of the Search Manual in the Splunk documentation and pay attention to intermediate tables, as well as the different types of search commands.

Splunk computes the statistics, in this case a sum, and puts them in a table along with the relevant client IP addresses. This is wonderful and easy, but what if one wishes to build on this and include the original byte count (or any other related field) in a table such as this:

sourcetype=access_combined* | head 5 | stats sum(bytes) as ASimpleSumOfBytes by clientip | table bytes, ASimpleSumOfBytes, clientip

Hmmm. What happened to my bytes field? The documentation’s anatomy of a search explains it. With each Splunk command or term, an intermediate table is produced without the user having to issue any command to allocate the tables. The table that stats produces contains only the by-clause field and the computed statistics, so anything that uses the original fields (like bytes), including additional calculations on them, has to be placed before the stats command. See what happens in the screen shot above when we try to add the bytes field to the end of the search command string. This makes the use case for eventstats.
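The disappearing bytes field can be imitated in a few lines of Python. This is only a sketch of the idea, not how Splunk is implemented: the five events, their values, and the helper names `stats_sum_by` and `table` are all invented stand-ins for the search above.

```python
from collections import defaultdict

# Hypothetical stand-ins for the five events; the values are invented.
events = [
    {"clientip": "10.0.0.1", "bytes": 200},
    {"clientip": "10.0.0.2", "bytes": 300},
    {"clientip": "10.0.0.1", "bytes": 100},
    {"clientip": "10.0.0.2", "bytes": 50},
    {"clientip": "10.0.0.1", "bytes": 400},
]

def stats_sum_by(rows, field, by, out):
    """Analogue of `stats sum(field) as out by by`: only `by` and `out` survive."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[by]] += r[field]
    return [{by: key, out: total} for key, total in sorted(totals.items())]

def table(rows, *fields):
    """Analogue of `table`: keep the named columns; a missing field stays empty (None)."""
    return [{f: r.get(f) for f in fields} for r in rows]

summary = stats_sum_by(events, "bytes", "clientip", "ASimpleSumOfBytes")
result = table(summary, "bytes", "ASimpleSumOfBytes", "clientip")
for row in result:
    print(row)  # the bytes column is None in every row
```

The intermediate table handed to `table` simply has no bytes column left to show.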

EVENTSTATS

Notice that the bytes column is empty above: once the stats command has created its table, Splunk no longer knows anything about the original bytes field from earlier in the pipeline. This is where eventstats can be helpful. The eventstats command computes the requested statistics just as stats does, but adds them to the original events as a new field, as shown below:

sourcetype=access_combined* | head 5 | eventstats sum(bytes) as ASimpleSumOfBytes by clientip

Now, just like stats, there are two values (one for each clientip) for ASimpleSumOfBytes, but they are attached to the raw events and can be used for later calculation. Just a note: your indexed raw data is untouched. The added field exists only in the search results that eventstats passes down the pipeline.

If I want to show the bytes field for each event along with the summation and the clientip, I can easily create the table that failed with stats. Note that the sum of all the bytes per clientip is included alongside each of the original bytes values, as the following search illustrates:

sourcetype=access_combined* | head 5 | sort _time | eventstats sum(bytes) as ASimpleSumOfBytes by clientip | table bytes, ASimpleSumOfBytes, clientip
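As a rough Python analogue of the eventstats search (a sketch with invented event values and a hypothetical helper name, not Splunk's implementation), the per-clientip total is computed first and then copied onto every event, so each row keeps its own bytes value:

```python
from collections import defaultdict

# Hypothetical stand-ins for the five events; the values are invented.
events = [
    {"_time": 1, "clientip": "10.0.0.1", "bytes": 200},
    {"_time": 2, "clientip": "10.0.0.2", "bytes": 300},
    {"_time": 3, "clientip": "10.0.0.1", "bytes": 100},
    {"_time": 4, "clientip": "10.0.0.2", "bytes": 50},
    {"_time": 5, "clientip": "10.0.0.1", "bytes": 400},
]

def eventstats_sum_by(rows, field, by, out):
    """Analogue of `eventstats sum(field) as out by by`."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[by]] += r[field]
    # Return a copy of every event with its group's total added as a new field.
    return [{**r, out: totals[r[by]]} for r in rows]

enriched = eventstats_sum_by(events, "bytes", "clientip", "ASimpleSumOfBytes")
for row in enriched:
    print(row["bytes"], row["ASimpleSumOfBytes"], row["clientip"])
```

Because copies are returned, the input events themselves are left unchanged, which mirrors the point that the raw data is untouched.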

STREAMSTATS

Having the statistics added onto the original events is great, but what if one is interested in what is happening in a streaming manner, or as Splunk sees the events in time? Streamstats is your command. To help visualize this, I’m sorting the events in ascending time order using the “_time” internal field, and then trying out streamstats:

sourcetype=access_combined* | head 5 | sort _time | streamstats sum(bytes) as ASimpleSumOfBytes by clientip

Like eventstats, streamstats adds the statistics to the original events, so all of the original data is accessible for further calculations, should we wish. By including time and the original byte count in the table below, we can better see what is going on with the streamstats command.

sourcetype=access_combined* | head 5 | sort _time | streamstats sum(bytes) as ASimpleSumOfBytes by clientip | table _time, clientip, bytes, ASimpleSumOfBytes

As shown in the screen shot above, instead of a total sum for each clientip (as in stats and eventstats), there is a running sum for each event as it is seen in time, each one building on the previous ones for the same clientip. Also note that two of these, the last running sum for each clientip, match the total sum calculated in stats and eventstats.

The difference here is that the value of the calculated field “ASimpleSumOfBytes” for a given event depends on the point in time at which Splunk sees that event. Where is this helpful? As it turns out, lots of the questions that folks have about their data concern what is going on at a specific moment or range in time. Streamstats is extremely useful for this kind of searching and reporting.
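The running-total behavior can be sketched in Python as well. As before, this is an illustration under assumptions rather than Splunk's implementation: the events, timestamps, and the helper name `streamstats_sum_by` are invented. The events start newest first and are sorted by `_time`, mirroring `sort _time` in the search.

```python
from collections import defaultdict

# Hypothetical stand-ins for the five events, newest first; the values are invented.
events = [
    {"_time": 5, "clientip": "10.0.0.1", "bytes": 400},
    {"_time": 4, "clientip": "10.0.0.2", "bytes": 50},
    {"_time": 3, "clientip": "10.0.0.1", "bytes": 100},
    {"_time": 2, "clientip": "10.0.0.2", "bytes": 300},
    {"_time": 1, "clientip": "10.0.0.1", "bytes": 200},
]

def streamstats_sum_by(rows, field, by, out):
    """Analogue of `streamstats sum(field) as out by by`: a running sum per group."""
    running = defaultdict(int)
    result = []
    for r in rows:
        # Each event sees only the events of its group that came before it, plus itself.
        running[r[by]] += r[field]
        result.append({**r, out: running[r[by]]})
    return result

ordered = sorted(events, key=lambda e: e["_time"])  # like `sort _time`
streamed = streamstats_sum_by(ordered, "bytes", "clientip", "ASimpleSumOfBytes")
for row in streamed:
    print(row["_time"], row["clientip"], row["bytes"], row["ASimpleSumOfBytes"])
```

With this invented data the running sums are 200, 300, 300, 350 and 700. The last value for each clientip (700 and 350) equals the total that a stats-style or eventstats-style sum would give for that clientip.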

Below I have included some links to the Splunk Documentation and Answers communities. Check them out. I hope this has been helpful.

Happy Splunking

References:

About the Splunk search language
http://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutthesearchlanguage

Anatomy of a Splunk search
http://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutthesearchpipeline#The_anatomy_of_a_search

Answers post to help understand visualizing time in Splunk, related to streamstats
http://answers.splunk.com/answers/105733/streamstats-is-reversed

Splunker finds a cool use for streamstats
http://blogs.splunk.com/2013/10/31/streamstats-example

The stats page in the Splunk docs
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Stats

Functions that work with Stats
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/CommonStatsFunctions




Author: Manual Maggio
