What is tstats and why is it so much faster than stats? (2024)

tstats is faster than stats because tstats reads only the indexed metadata (the .tsidx files in the buckets on the indexers), whereas stats works on whatever data the preceding commands hand it (in this case the raw events).

Since tstats can only look at the indexed metadata, it can only search fields that are in the metadata. By default, this only includes index-time fields such as sourcetype, host, source, _time, etc.

You can do this:

| tstats count by index sourcetype source

But you can't do this:

| tstats count where status>200 by username

That's because status and username are search-time fields, not index-time fields.
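To make the difference concrete, here is a toy Python model of the idea. It is only a sketch: the event data, the `field::value` term layout, and the function names are assumptions made for illustration, not Splunk's actual on-disk tsidx format. The point is that the tstats-like path answers from a prebuilt lexicon, while the stats-like path has to parse every raw event.

```python
from collections import Counter, defaultdict

# Hypothetical events: (index-time metadata, raw text with search-time fields).
EVENTS = [
    ({"sourcetype": "access", "host": "web1"}, "status=200 username=ann"),
    ({"sourcetype": "access", "host": "web2"}, "status=404 username=bob"),
    ({"sourcetype": "syslog", "host": "web1"}, "status=500 username=ann"),
]


def build_lexicon(events):
    """Index time: map each field::value term to the ids of matching events."""
    lexicon = defaultdict(list)
    for event_id, (meta, _raw) in enumerate(events):
        for field, value in meta.items():
            lexicon[f"{field}::{value}"].append(event_id)
    return lexicon


def count_from_lexicon(lexicon, field):
    """tstats-like: answer from the lexicon alone, never reading raw text."""
    prefix = field + "::"
    return {
        term[len(prefix):]: len(ids)
        for term, ids in lexicon.items()
        if term.startswith(prefix)
    }


def count_from_raw(events, field):
    """stats-like: extract fields from every raw event, then aggregate."""
    counts = Counter()
    for _meta, raw in events:
        extracted = dict(pair.split("=", 1) for pair in raw.split())
        if field in extracted:
            counts[extracted[field]] += 1
    return dict(counts)


LEXICON = build_lexicon(EVENTS)
by_sourcetype = count_from_lexicon(LEXICON, "sourcetype")
by_username_indexed = count_from_lexicon(LEXICON, "username")
by_username_raw = count_from_raw(EVENTS, "username")
```

In this model, `by_username_indexed` comes back empty because username was never written to the lexicon, which mirrors why the second tstats search above returns nothing.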

tstats can run on index-time fields that come from any of the following:

  • Accelerated data models
  • A namespace created by the tscollect search command
  • Index-time fields manually via fields.conf, props.conf, and transforms.conf
  • INDEXED_EXTRACTIONS in props.conf for structured data like CSV

Generally, I recommend using accelerated data models.

References:
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/tstats
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/tscollect
http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Acceleratedatamodels
http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Indextimeversussearchtime
http://docs.splunk.com/Documentation/Splunk/latest/Data/Configureindex-timefieldextraction
http://docs.splunk.com/Splexicon:Tsidxfile

============================ EDIT #2 =============================

For even more in-depth material, see:

2016 talk by David Veuve:
2017 talk again by David Veuve:
2017 talk by me:

============================ EDIT #1 =============================

Since this seems to be a popular answer, I'll go into even more detail:

For our example, let's use the out-of-the-box data model called "Splunk's Internal Server Logs - SAMPLE" at http://127.0.0.1:8000/en-US/app/search/data_model_editor?model=%2FservicesNS%2Fnobody%2Fsearch%2Fdat...

First, run a simple tstats on the DM (doesn't have to be accelerated) to make sure it's working and you get some result:

| tstats count from datamodel=internal_server

Note that I use the DM filename internal_server (i.e. the Object ID), not the "pretty" name. If the DM isn't accelerated, tstats translates to a normal search, so the above command will run:

index=_internal source=*scheduler.log* OR source=*metrics.log* OR source=*splunkd.log* OR source=*license_usage.log* OR source=*splunkd_access.log* | stats count

The translation is defined by the base search of the DM (under "Constraints").
You can verify that you'll get the exact same count from both the tstats and normal search. Make sure you use the same fixed time range (ie from X to Y). Don't do "Last X minutes" since the time range will be different when you run the search ad-hoc.

If the data model is accelerated, new .tsidx index files are created on the indexers at $SPLUNK_DB$/<index_name>/datamodel_summary/<bucket_id>_<indexer_guid>/<search_head_guid>/DM_<app>_<data_model_name>.

If you wanna "see" what's inside these tsidx files then you can do something like:

./splunk cmd walklex /opt/splunk/var/lib/splunk/_internaldb/datamodel_summary/53_FC88CEC5-A07C-40AB-AC9A-6098C3336C42/FC88CEC5-A07C-40AB-AC9A-6098C3336C42/DM_search_internal_server/1457686104-1457046881-3297202533729992781.tsidx "*"

Anyway, tstats basically accesses and searches these special, DM-created tsidx files. You tell tstats which DM to use with the from datamodel=internal_server clause.

Now accelerate the internal_server DM if you haven't already. Pick a window big enough like 7 days and search the last 24 hours for testing. Once the DM is finished accelerating to 100%, then try running this:

| tstats count where index=_internal by sourcetype source

And note how much faster this is compared to

index=_internal | stats count by sourcetype source

If you don't see a big difference then either try increasing the time range/acceleration window or create your own DM on a much chattier data source/index.

Remember that everything has a cost. In this case, you're trading disk space for faster searches. That's why the DM status in the UI tells you how much disk space the DM acceleration takes up. The size is how big the .tsidx files are across all indexers. But this is still a great trade. Disks are cheap; CPUs are not.

As mentioned before, another drawback of tstats is that you can only access either the default index-time fields or custom index-time fields created by DM acceleration. And if you forgot to add a field in the DM then you must stop/delete the DM acceleration, modify the DM, then re-accelerate. On a big DM (like from a large Splunk Enterprise Security environment), this could take hours or days depending on how much data needs to be accelerated and how busy the servers are.

Note that tstats is like stats but more "SQL-like". It can take a where and a by clause too. For example:

| tstats count from datamodel=internal_server where source=*scheduler.log

Which happens to be the same as

| tstats count from datamodel=internal_server where nodename=server.scheduler

That's because this DM has a child node under the Root Event. The name, once again, comes from the "Object ID", not the pretty name label (i.e. use summaryindexing, not "Summary Indexing Searches").

Also note that every field you reference in the DM must be prefixed with the node's Object ID (server in this case), except the 4 default DM fields: _time, host, source, sourcetype. So this won't work:

| tstats count from datamodel=internal_server by name current_size_kb

name and current_size_kb aren't among the 4 default DM fields, so they must be written as server.name and server.current_size_kb. This is also the main reason I choose very short (usually one-letter) node names, since it becomes very annoying to type server. all the time.
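If you build tstats searches programmatically, the prefix rule above can be sketched as a small Python helper. The function names `qualify` and `build_tstats` are hypothetical and exist only for this illustration; the helper just assembles a search string and does not talk to Splunk.

```python
# The 4 default DM fields that never take a node prefix.
DEFAULT_DM_FIELDS = {"_time", "host", "source", "sourcetype"}


def qualify(field, node="server"):
    """Prefix a field with the node's Object ID unless it is a default field."""
    if field in DEFAULT_DM_FIELDS or field.startswith(node + "."):
        return field
    return f"{node}.{field}"


def build_tstats(datamodel, by_fields, node="server"):
    """Assemble a tstats count search with correctly prefixed by-fields."""
    by_clause = " ".join(qualify(f, node) for f in by_fields)
    return f"| tstats count from datamodel={datamodel} by {by_clause}"


query = build_tstats("internal_server", ["name", "current_size_kb", "sourcetype"])
```

Here `query` comes out with server.name and server.current_size_kb prefixed while sourcetype is left alone.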

One last thing worth mentioning is tstats performance. We all know tstats on an accelerated DM is fast since it's mostly reading from disk and minimizing computation, but tstats isn't very good at returning many, many results. So although you can do this:

| tstats count from datamodel=foo by a.apple a.pear a.orange _time span=1s

You really shouldn't. tstats can't return raw events and trying to "trick" it to return raw events by using span=1s is going against its design principle. You can be clever and get raw events if you use tstats inside of a subsearch like this:

index=data [| tstats count from datamodel=foo where a.name="hobbes" by a.id a.user | rename a.* as * | fields - count]
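The subsearch trick works because the rows the subsearch returns are turned into a filter for the outer search. The Python sketch below imitates that expansion under simplifying assumptions: the row values are made up, the helper names are hypothetical, and the exact string Splunk generates may differ in detail.

```python
def rename_and_drop(row, prefix="a.", drop=("count",)):
    """Mimic `rename a.* as * | fields - count` on a single result row."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in row.items()
        if key not in drop
    }


def format_subsearch(rows):
    """Render rows as an OR of ANDed field="value" terms."""
    clauses = []
    for row in rows:
        terms = " AND ".join(f'{key}="{value}"' for key, value in row.items())
        clauses.append(f"( {terms} )")
    return "( " + " OR ".join(clauses) + " )"


# Hypothetical tstats output rows for a.name="hobbes".
tstats_rows = [
    {"a.id": "1", "a.user": "ann", "count": 3},
    {"a.id": "2", "a.user": "bob", "count": 1},
]
filter_expr = format_subsearch([rename_and_drop(r) for r in tstats_rows])
```

The outer search then behaves roughly like index=data followed by `filter_expr`, so only the matching raw events are fetched.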

So basically tstats is really good at aggregating values and reducing rows. tstats will have as bad performance as a normal search (or worse) if your search isn't trying to reduce. For example, if you have 10 million rows in a DM and your tstats is grouping everything by _time span=1s and returning 8 million rows, then that's going to be a slow search even if your tstats is searching on an accelerated DM. But if your tstats is doing something like avg(all.foo) by all.group and returning only 1000 rows (but still searching on 10 million events) then it'll be blazing fast since it's reducing.
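A quick way to see the "reducing" point is to count how many result rows each grouping produces. This Python sketch uses made-up events (one per second, spread over ten groups); the numbers are illustrative only and say nothing about actual Splunk timings.

```python
def result_rows(events, key):
    """An aggregation returns one row per distinct group key."""
    return len({key(event) for event in events})


# Hypothetical events: (epoch second, group name), one event per second.
events = [(second, f"g{second % 10}") for second in range(100_000)]

rows_by_second = result_rows(events, lambda event: event[0])  # like by _time span=1s
rows_by_group = result_rows(events, lambda event: event[1])   # like by all.group
```

Grouping by second returns as many rows as there are events, while grouping by the group field collapses everything to ten rows, which is the kind of search tstats is good at.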

Also note that if you do by _time in tstats, then tstats will automatically bucket _time based on the search time range, similar to timechart (i.e. if you search the last 24 hours, the bucket size will be 30 minutes). You also can't go any more granular than 1 second, so sub-second timestamps will be grouped together.
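The bucketing itself is just flooring each timestamp to the start of its span. Here is a minimal Python sketch; the function name `bucket_start` is hypothetical, the 1800-second span is taken from the 24-hour example above, and how Splunk picks the span for other time ranges is not modeled.

```python
def bucket_start(epoch_seconds, span_seconds):
    """Floor a timestamp to the start of its span-sized bucket."""
    whole_seconds = int(epoch_seconds)  # sub-second precision is discarded
    return whole_seconds - whole_seconds % span_seconds


# Two events within the same second land in the same 1-second bucket.
same_second = bucket_start(1457686104.25, 1) == bucket_start(1457686104.75, 1)

# With a 30-minute span, the event falls into the bucket starting at 1457685000.
half_hour_bucket = bucket_start(1457686104, 1800)
```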

Lastly, tstats has a prestats=t option, but that's another lesson for another day (prestats is like the si-commands in Splunk). If you really want to know more then check out my 2017 conf talk "Speed up your search!".

Hopefully that is enough to get you started. If you get stuck, troubleshoot your tstats by removing clauses one at a time (like the by and where clauses) until you get results again. Eventually you'll end up with the most basic tstats command that will give you results:

| tstats count from datamodel=foo

then add the clauses back one at a time until you spot the one that brings you back to 0 results. Common pitfalls include:

  • Typos (check your datamodel name)
  • Using the pretty name instead of the Object ID
  • Not including the prefix to the non-default DM fields (ie you need to do server.cpu_seconds or all.foo, not just cpu_seconds or foo, but you can just do sourcetype or source)
  • Your by clause includes a field that is null for some events (a common pitfall in stats too); one way to remedy that is to create an evaluated field in the DM and do something like foo=coalesce(foo, "NULL")
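The null pitfall in the last bullet can be modeled in a few lines of Python. This is a sketch with made-up rows: `count_by` imitates a by clause that skips events where the field is missing, and `coalesce` mirrors the eval function of the same name.

```python
from collections import Counter


def count_by(rows, field):
    """Group-by count that skips rows where the field is missing or null."""
    return dict(Counter(row[field] for row in rows if row.get(field) is not None))


def coalesce(*values):
    """Return the first non-None value, or None if there is none."""
    return next((value for value in values if value is not None), None)


# Hypothetical events: two have foo set, two do not.
rows = [{"foo": "a"}, {"foo": None}, {"foo": "a"}, {}]
filled = [{**row, "foo": coalesce(row.get("foo"), "NULL")} for row in rows]

before = count_by(rows, "foo")
after = count_by(filled, "foo")
```

In `before`, the two events without foo silently vanish from the counts; in `after`, they show up under the placeholder value "NULL".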

Good luck and may your searches be fast!

Article information

Author: Melvina Ondricka
