TimeSeries table engine
A table engine storing time series, i.e. a set of values associated with timestamps and tags (or labels):
This is an experimental feature that may change in backwards-incompatible ways in the future releases.
Enable usage of the TimeSeries table engine
with allow_experimental_time_series_table setting.
Input the command set allow_experimental_time_series_table = 1.
Syntax
The keyword SAMPLES has an alias DATA which is kept for backwards compatibility.
Usage
It's easier to start with everything set by default (it's allowed to create a TimeSeries table without specifying a list of columns):
Then this table can be used with the following protocols (a port must be assigned in the server configuration):
Outer columns
Columns of a TimeSeries table are generated automatically. These are outer columns, they store no data, they just provide interface for SELECT/INSERT. Actual data is stored in target tables. Here is the list of the outer columns:
| Name | Type | Description |
|---|---|---|
metric_name | String | The name of the metric |
tags | Map(String, String) | Map of tags (labels) for the time series |
time_series | Array(Tuple(DateTime64(3), Float64)) by default | Array of (timestamp, value) pairs for a time series. The tuple's timestamp and scalar element types can be derived from the samples INNER COLUMNS declaration (see Specifying outer columns) |
metric_family | String | The name of the metric family (for metrics metadata) |
type | String | The type of the metric (e.g. "counter", "gauge") |
unit | String | The unit of the metric |
help | String | The description of the metric |
Example:
metric_name is allowed to be empty on insertion, that means the metric name is specified in tags under __name__, for example:
To insert metrics metadata, insert into the metric_family, type, unit, and help columns:
Specifying outer columns
The outer time_series column can be listed explicitly in a CREATE TABLE statement to override its default Array(Tuple(DateTime64(3), Float64)) type. ClickHouse extracts the timestamp and scalar types from the tuple and propagates them to the inner samples table:
This is equivalent to declaring the timestamp and value column types in the samples INNER COLUMNS clause directly:
If both forms are used in the same CREATE TABLE statement, the declared types must match.
Target tables
A TimeSeries table doesn't have its own data, everything is stored in its target tables.
This is similar to how a materialized view works,
with the difference that a materialized view has one target table
whereas a TimeSeries table has three target tables named samples, tags, and metrics.
The target tables can be either specified explicitly in the CREATE TABLE query
or the TimeSeries table engine can generate inner target tables automatically.
Rows inserted into a TimeSeries table are transformed, split into blocks, and inserted in these three target tables.
The target tables are the following:
Samples table
The samples table contains time series associated with some identifier.
The samples table must have columns:
| Name | Mandatory? | Default type | Possible types | Description |
|---|---|---|---|---|
id | [x] | UUID | any | Identifies a combination of a metric names and tags |
timestamp | [x] | DateTime64(3) | DateTime64(X) | A time point |
value | [x] | Float64 | Float32 or Float64 | A value associated with the timestamp |
Tags table
The tags table contains identifiers calculated for each combination of a metric name and tags.
The tags table must have columns:
| Name | Mandatory? | Default type | Possible types | Description |
|---|---|---|---|---|
id | [x] | UUID | any (must match the type of id in the samples table) | An id identifies a combination of a metric name and tags. The DEFAULT expression specifies how to calculate such an identifier |
metric_name | [x] | LowCardinality(String) | String or LowCardinality(String) | The name of a metric |
<tag_value_column> | [ ] | String | String or LowCardinality(String) or LowCardinality(Nullable(String)) | The value of a specific tag, the tag's name and the name of a corresponding column are specified in the tags_to_columns setting |
tags | [x] | Map(LowCardinality(String), String) | Map(String, String) or Map(LowCardinality(String), String) or Map(LowCardinality(String), LowCardinality(String)) | Map of tags excluding the tag __name__ containing the name of a metric and excluding tags with names enumerated in the tags_to_columns setting |
all_tags | [ ] | Map(String, String) | Map(String, String) or Map(LowCardinality(String), String) or Map(LowCardinality(String), LowCardinality(String)) | Ephemeral column, each row is a map of all the tags excluding only the tag __name__ containing the name of a metric. The only purpose of that column is to be used while calculating id |
min_time | [ ] | Nullable(DateTime64(3)) | DateTime64(X) or Nullable(DateTime64(X)) | Minimum timestamp of time series with that id. The column is created if store_min_time_and_max_time is true |
max_time | [ ] | Nullable(DateTime64(3)) | DateTime64(X) or Nullable(DateTime64(X)) | Maximum timestamp of time series with that id. The column is created if store_min_time_and_max_time is true |
Metrics table
The metrics table contains some information about metrics been collected, the types of those metrics and their descriptions.
The metrics table must have columns:
| Name | Mandatory? | Default type | Possible types | Description |
|---|---|---|---|---|
metric_family_name | [x] | String | String or LowCardinality(String) | The name of a metric family |
type | [x] | LowCardinality(String) | String or LowCardinality(String) | The type of a metric family, one of "counter", "gauge", "summary", "stateset", "histogram", "gaugehistogram" |
unit | [x] | LowCardinality(String) | String or LowCardinality(String) | The unit used in a metric |
help | [x] | String | String or LowCardinality(String) | The description of a metric |
Creation
There are multiple ways to create a table with the TimeSeries table engine.
The simplest statement
will actually create the following table (you can see that by executing SHOW CREATE TABLE my_table):
So the columns were generated automatically and also there are three inner target tables with their own column definitions
stored in the INNER COLUMNS clauses.
Inner target tables have names like .inner_id.samples.xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx,
.inner_id.tags.xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, .inner_id.metrics.xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
and each target table has its own set of columns:
Creating a table AS existing table
Statement CREATE TABLE new_table AS existing_table copies from the existing_table:
SETTINGSINNER COLUMNSfor each kindINNER ENGINEfor each kind
The statement is not allowed if the existing_table has external targets.
The outer column list is regenerated and not copied.
Adjusting types of columns
You can adjust the types of columns in the inner target tables using the INNER COLUMNS clause. For example, to store timestamps in microseconds and values as Float32:
The same clause can be used to specify codecs and other column attributes:
The id column
The id column contains identifiers, every identifier is calculated for a combination of a metric name and tags.
The type and the DEFAULT expression used to generate identifiers can be customized via the TAGS INNER COLUMNS clause:
The id column type must be one of UUID, UInt64, UInt128, or FixedString(16). If no DEFAULT expression is given, ClickHouse will choose it automatically based on the id type. The id types declared in the samples and tags inner tables must match.
The id_generator setting offers the same customization without using the INNER COLUMNS clause:
If the setting is set, it's used to generate id even if the column's DEFAULT contains a different expression.
The tags and all_tags columns
There are two columns containing maps of tags - tags and all_tags. In this example they mean the same, however they can be different
if setting tags_to_columns is used. This setting allows to specify that a specific tag should be stored in a separate column instead of storing
in a map inside the tags column:
This statement will add columns instance and job to the inner tags target table.
In this case the tags column will not contain tags instance and job,
but the all_tags column will contain them. The all_tags column is ephemeral and its only purpose to be used in the DEFAULT expression
for the id column.
Table engines of inner target tables
By default inner target tables use the following table engines:
- the samples table uses MergeTree;
- the tags table uses AggregatingMergeTree because the same data is often inserted multiple times to this table so we need a way
to remove duplicates, and also because it's required to do aggregation for columns
min_timeandmax_time; - the metrics table uses ReplacingMergeTree because the same data is often inserted multiple times to this table so we need a way to remove duplicates.
Other table engines also can be used for inner target tables if it's specified so:
External target tables
It's possible to make a TimeSeries table use a manually created table:
The external tables' column types (id, timestamp, value, and the <tag_value_column>s listed in tags_to_columns) must match what the TimeSeries table would otherwise generate internally (see Samples table, Tags table, and Metrics table for the type constraints). Type mismatches are reported at CREATE time.
The id-generator expression for an external tags target is resolved at INSERT time in the following order: the id_generator setting (if set), then the DEFAULT declared on the external table's id column (if any), then the canonical generator derived from the id type. The setting therefore overrides whatever DEFAULT is declared on the external table — see The id column for details.
Altering settings
Two settings can be changed after CREATE:
id_generatorfilter_by_min_time_and_max_time
Note that changing id_generator while data is already in the tags table can produce different IDs for the same metric+tag combination — old rows keep their old IDs, new rows use the new generator.
The other settings can't be changed with ALTER ... MODIFY SETTING because they are baked into the schema of the inner tables at CREATE time.
Settings
Here is a list of settings which can be specified while defining a TimeSeries table:
| Name | Type | Default | Description |
|---|---|---|---|
id_generator | Expression | depends on id type | Expression that computes the identifier (fingerprint) of a time series from its tags. If unset, the default expression for the id column is used. If the default expression for the id column is also unset then the expression is chosen automatically |
tags_to_columns | Map | Map specifying which tags should be put to separate columns in the tags table. Syntax: {'tag1': 'column1', 'tag2' : column2, ...} | |
use_all_tags_column_to_generate_id | Bool | true | When generating an expression to calculate an identifier of a time series, this flag enables using the all_tags column in that calculation |
store_min_time_and_max_time | Bool | true | If set to true then the table will store min_time and max_time for each time series |
aggregate_min_time_and_max_time | Bool | true | When creating an inner target tags table, this flag enables using SimpleAggregateFunction(min, Nullable(DateTime64(3))) instead of just Nullable(DateTime64(3)) as the type of the min_time column, and the same for the max_time column |
filter_by_min_time_and_max_time | Bool | true | If set to true then the table will use the min_time and max_time columns for filtering time series |
Functions
Here is a list of functions supporting a TimeSeries table as an argument: