export const data = [
{
"label": "api",
"type": "keyword",
"detail": "Use Tenzir's REST API directly from a pipeline.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>api <endpoint> [<request-body>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>api</code> operator interacts with Tenzir's REST API without needing to spin up a\nweb server, making all APIs accessible from within pipelines.</p>\n<div class=\"remark-container info\"><div class=\"remark-container-title info\">OpenAPI</div><p>Visit <a href=\"/api\">Tenzir's REST API specification</a> to see a list of all available\nendpoints.</p></div>\n<h3><code><endpoint></code></h3>\n<p>The endpoint to request, e.g., <code>/pipeline/list</code> to list all pipelines created\nthrough the <code>/pipeline/create</code> endpoint.</p>\n<h3><code>[<request-body>]</code></h3>\n<p>A single string containing the JSON request body to send with the request.</p>\n<h2>Examples</h2>\n<p>List all running pipelines:</p>\n<pre><code>api /pipeline/list\n</code></pre>\n<p>Create a new pipeline and start it immediately.</p>\n<pre><code>api /pipeline/create '{\"name\": \"Suricata Import\", \"definition\": \"from file /tmp/eve.sock read suricata\", \"autostart\": {\"created\": true}}'\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/api"
},
{
"label": "apply",
"type": "keyword",
"detail": "Include the pipeline defined in another file.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>apply <file>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>apply</code> operator searches for the given file, first in the current\ndirectory, and then in <code><config>/apply/</code> for every config directory, for example\n<code>~/.config/tenzir/apply/</code>.</p>\n<p>The <code>.tql</code> extension is automatically added to the filename, unless it already\nhas an extension.</p>",
"docLink": "https://docs.tenzir.com/operators/apply"
},
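/*
The search order described for `apply` can be sketched as a small path
resolver. This is a minimal sketch under stated assumptions: `applyCandidates`
is a hypothetical helper (not a Tenzir API), paths are plain "/"-separated
strings, and "already has an extension" is approximated as "the file name
contains a dot".

```javascript
// Hypothetical helper, not part of Tenzir: returns the paths that
// `apply <file>` would try, in search order, per the description above.
function applyCandidates(file, configDirs) {
  // Only the last path component decides whether an extension is present.
  const base = file.slice(file.lastIndexOf("/") + 1);
  // Append ".tql" unless the file name already has an extension.
  const name = base.includes(".") ? file : `${file}.tql`;
  // First the current directory, then <config>/apply/ for every config dir.
  const candidates = [`./${name}`];
  for (const dir of configDirs) {
    const trimmed = dir.endsWith("/") ? dir.slice(0, -1) : dir;
    candidates.push(`${trimmed}/apply/${name}`);
  }
  return candidates;
}

console.log(JSON.stringify(applyCandidates("enrich-flows", ["~/.config/tenzir"])));
// prints ["./enrich-flows.tql","~/.config/tenzir/apply/enrich-flows.tql"]
```

Presumably the first candidate that exists is used; the existence check is left
out so the sketch stays free of file-system access.
*/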
{
"label": "batch",
"type": "keyword",
"detail": "The batch operator controls the batch size of events.",
"processedHTML": "<div class=\"remark-container warning\"><div class=\"remark-container-title warning\">Expert Operator</div><p>The <code>batch</code> operator is a lower-level building block that lets users explicitly\ncontrol batching, which otherwise is controlled automatically by Tenzir's\nunderlying pipeline execution engine. Use with caution!</p></div>\n<h2>Synopsis</h2>\n<pre><code>batch [--timeout <duration>] [<limit>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>batch</code> operator takes its input and rewrites it into batches of up to the\ndesired size.</p>\n<h3><code>--timeout <duration></code></h3>\n<p>Specifies a maximum latency for events passing through the batch operator. When\nunspecified, an infinite duration is used.</p>\n<h3><code><limit></code></h3>\n<p>An unsigned integer denoting how many events to put into one batch at most.</p>\n<p>Defaults to 65536.</p>\n<h2>Examples</h2>\n<p>Write exactly one NDJSON object at a time to a Kafka topic.</p>\n<pre><code>batch 1 | to kafka -t topic write json -c\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/batch"
},
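/*
The size limit of `batch` can be illustrated on an in-memory array. This is a
minimal sketch, assuming the events arrive as one flat JavaScript array; the
`--timeout` latency bound is not modeled, and `batch` here is a hypothetical
function, not Tenzir's implementation.

```javascript
// Split `events` into consecutive batches of at most `limit` events.
// The default of 65536 mirrors the operator's documented default.
function batch(events, limit = 65536) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < events.length; i += limit) {
    batches.push(events.slice(i, i + limit));
  }
  return batches;
}

console.log(JSON.stringify(batch([1, 2, 3, 4, 5], 2))); // prints [[1,2],[3,4],[5]]
```

With a limit of 1, every batch holds exactly one event, which is what the Kafka
example uses to write one NDJSON object at a time.
*/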
{
"label": "chart",
"type": "keyword",
"detail": "Add metadata to a schema, necessary for rendering as a chart.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>chart line [-x|--x-axis <fields>] [-y|--y-axis <field>]\nchart area [-x|--x-axis <fields>] [-y|--y-axis <field>]\nchart bar [-x|--x-axis <fields>] [-y|--y-axis <field>]\nchart pie [--name <field>] [--value <fields>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>chart</code> operator adds attributes to the schema of the input events,\nthat are used to guide rendering of the data as a chart.\nThe operator does no rendering itself.</p>\n<h3><code>-x|--x-axis <fields></code> (<code>line</code>, <code>area</code>, and <code>bar</code> charts only)</h3>\n<p>Set the field used for the X-axis. Defaults to the first field in the schema.</p>\n<p>Values in this field must be strictly increasing\n(sorted in ascending order, without duplicates)\nwhen creating a <code>line</code> or <code>area</code> chart,\nor unique when creating a <code>bar</code> chart.</p>\n<h3><code>-y|--y-axis <fields></code> (<code>line</code>, <code>area</code>, and <code>bar</code> charts only)</h3>\n<p>Set the fields used for the Y-axis.\nCan either be a single field, or a list of fields spelled with\na list syntax (<code>[field1, field2]</code>).\nDefaults to every field but the first one.</p>\n<h3><code>position=<position></code> (<code>line</code>, <code>area</code>, and <code>bar</code> charts only)</h3>\n<p>Control how the values are grouped when rendered as a chart.\nPossible values are <code>grouped</code> and <code>stacked</code>.\nDefaults to <code>grouped</code>.</p>\n<h3><code>--name <field></code> (<code>pie</code> chart only)</h3>\n<p>Set the field used for the names of the segments.\nDefaults to the first field in the schema.</p>\n<p>Values in this field must be unique.</p>\n<h3><code>--value <fields></code> (<code>pie</code> chart only)</h3>\n<p>Set the fields used for the value of a segment.\nCan either be a single field, or multiple fields delimited with commas\n(<code>field1,field2</code>).\nDefaults to every field but the first 
one.</p>\n<h2>Examples</h2>\n<p>Render most common <code>src_ip</code> values in <code>suricata.flow</code> events as a bar chart:</p>\n<pre><code>export\n| where #schema == \"suricata.flow\"\n| top src_ip\n/* -x and -y are defaulted to `src_ip` and `count` */\n| chart bar\n</code></pre>\n<p>Render historical import throughput statistics as a line chart:</p>\n<pre><code>metrics\n| where #schema == \"tenzir.metrics.operator\"\n| where source == true\n| summarize bytes=sum(output.approx_bytes) by timestamp resolution 1s\n| sort timestamp desc\n| chart line -x timestamp -y bytes\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/chart"
},
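/*
The X-axis requirements for `chart` can be expressed as a small validator. This
is a hypothetical helper, not part of Tenzir, and it assumes the X values are
plain numbers or strings that compare correctly with `<`.

```javascript
// Check the documented X-axis rules: `line` and `area` charts need strictly
// increasing values (ascending, no duplicates); `bar` charts need unique values.
function checkXAxis(kind, values) {
  if (kind === "line" || kind === "area") {
    return values.every((v, i) => i === 0 || values[i - 1] < v);
  }
  if (kind === "bar") {
    return new Set(values).size === values.length;
  }
  throw new Error(`unsupported chart kind: ${kind}`);
}

console.log(checkXAxis("line", [1, 2, 3])); // prints true
console.log(checkXAxis("line", [3, 2, 1])); // prints false
console.log(checkXAxis("bar", ["a", "b", "a"])); // prints false
```

A descending sort fails the `line`/`area` check, while a `bar` chart accepts
any order as long as the values are unique.
*/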
{
"label": "compress",
"type": "keyword",
"detail": "Compresses a stream of bytes.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>compress [--level=<level>] <codec>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>compress</code> operator compresses bytes in a pipeline incrementally with a\nknown codec.</p>\n<p>The <code>compress</code> operator is invoked automatically as a part of <a href=\"to.md\"><code>to</code></a>\nif the resulting file has a file extension indicating compression.\nThis behavior can be circumvented by using <a href=\"save.md\"><code>save</code></a> directly.</p>\n<div class=\"remark-container note\"><div class=\"remark-container-title note\">Streaming Compression</div><p>The operator uses <a href=\"https://arrow.apache.org/docs/cpp/api/utilities.html#compression\">Apache Arrow's compression\nutilities</a> under the hood, and transparently supports\nall options that Apache Arrow supports for streaming compression.</p><p>Besides the supported <code>brotli</code>, <code>bz2</code>, <code>gzip</code>, <code>lz4</code>, and <code>zstd</code>, Apache Arrow\nalso ships with codecs for <code>lzo</code>, <code>lz4_raw</code>, <code>lz4_hadoop</code> and <code>snappy</code>, which\nonly support oneshot compression. Support for them is not currently implemented.</p></div>\n<h3><code>--level=<level></code></h3>\n<p>The compression level to use. The supported values depend on the codec used. If\nomitted, the default level for the codec is used.</p>\n<h3><code><codec></code></h3>\n<p>An identifier of the codec to use. Currently supported are <code>brotli</code>, <code>bz2</code>,\n<code>gzip</code>, <code>lz4</code>, and <code>zstd</code>.</p>\n<h2>Examples</h2>\n<p>Export all events in a Gzip-compressed NDJSON file:</p>\n<pre><code>export\n| write json --compact-output\n| compress gzip\n| save file /tmp/backup.json.gz\n</code></pre>\n<p>Recompress a Zstd-compressed file at a higher compression level:</p>\n<pre><code>load file in.zst\n| decompress zstd\n| compress --level 18 zstd\n| save file out.zst\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/compress"
},
{
"label": "context",
"type": "keyword",
"detail": "Manages a context.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>context create <name> <type> [<args>]\ncontext delete <name>\ncontext update <name> [<args>]\ncontext reset <name>\ncontext save <name>\ncontext load <name>\ncontext inspect <name>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>context</code> operator manages <a href=\"../contexts.md\">context</a> instances.</p>\n<ul>\n<li>\n<p>The <code>create</code> command creates a new context with a unique name.</p>\n</li>\n<li>\n<p>The <code>delete</code> command destroys a given context.</p>\n</li>\n<li>\n<p>The <code>update</code> command adds new data to a given context.</p>\n</li>\n<li>\n<p>The <code>reset</code> command clears the state of a given context, as if it had just\nbeen created.</p>\n</li>\n<li>\n<p>The <code>save</code> command outputs the state of the context, serialized into bytes.\nThe result can be processed further in a pipeline,\ne.g. as an input for the <a href=\"./save.md\"><code>save</code></a> operator,\nor to initialize another context with <code>context load</code>.</p>\n</li>\n<li>\n<p>The <code>load</code> command takes in bytes, likely previously created with\n<code>context save</code>, and initializes the context with that data.</p>\n</li>\n<li>\n<p>The <code>inspect</code> command dumps a specific context's user-provided data, usually\nthe context's content.</p>\n</li>\n</ul>\n<h3><code><name></code></h3>\n<p>The name of the context to create, update, or delete.</p>\n<h3><code><type></code></h3>\n<p>The context type for the new context.</p>\n<p>See the <a href=\"../contexts.md\">list of available context types</a>.</p>\n<h3><code><args></code></h3>\n<p>Context-specific options in the format <code>--key value</code> or <code>--flag</code>.</p>\n<h2>Examples</h2>\n<p>Create a <a href=\"../contexts/lookup-table.md\">lookup table</a> context called <code>feodo</code>:</p>\n<pre><code>context create feodo lookup-table\n</code></pre>\n<p>Replace all previous data in the context <code>feodo</code> 
with data from the <a href=\"https://feodotracker.abuse.ch\">Feodo\nTracker IP Block List</a>, using the <code>ip_address</code>\nfield as the lookup table key:</p>\n<pre><code>from https://feodotracker.abuse.ch/downloads/ipblocklist.json read json --arrays-of-objects\n| context update feodo --clear --key=ip_address\n</code></pre>\n<p>Delete the context named <code>feodo</code>:</p>\n<pre><code>context delete feodo\n</code></pre>\n<p>Inspect all data provided to <code>feodo</code>:</p>\n<pre><code>context inspect feodo\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/context"
},
{
"label": "decapsulate",
"type": "keyword",
"detail": "Decapsulates packet data at link, network, and transport layer.",
"processedHTML": "<div class=\"remark-container warning\"><div class=\"remark-container-title warning\">Deprecated</div><p>This operator will soon be removed in favor of first-class support for functions\nthat can be used in a variety of different operators and contexts.</p></div>\n<h2>Synopsis</h2>\n<pre><code>decapsulate\n</code></pre>\n<h2>Description</h2>\n<p>The <code>decapsulate</code> operator proceses events of type <code>pcap.packet</code> and\ndecapsulates the packet payload by extracting fields at the link, network, and\ntransport layer. The aim is not completeness, but rather exposing commonly used\nfield for analytics.</p>\n<p>The operator only processes events of type <code>pcap.packet</code> and emits events of\ntype <code>tenzir.packet</code>.</p>\n<h3>VLAN Tags</h3>\n<p>While decapsulating packets, <code>decapsulate</code> extracts\n<a href=\"https://en.wikipedia.org/wiki/IEEE_802.1Q\">802.1Q</a> VLAN tags into the nested\n<code>vlan</code> record, consisting of an <code>outer</code> and <code>inner</code> field for the respective\ntags. 
The value of the VLAN tag corresponds to the 12-bit VLAN identifier (VID).\nSpecial values include <code>0</code> (frame does not carry a VLAN ID) and <code>0xFFF</code>\n(reserved value; sometimes wildcard match).</p>\n<h2>Examples</h2>\n<p>Decapsulate packets from a PCAP file:</p>\n<pre><code>from file /tmp/trace.pcap read pcap\n| decapsulate\n</code></pre>\n<p>Extract packets as JSON that have the address 6.6.6.6 as source or destination,\nand destination port 5158:</p>\n<pre><code>read pcap\n| decapsulate\n| where 6.6.6.6 && dport == 5158\n| write json\n</code></pre>\n<p>Query VLAN IDs using <code>vlan.outer</code> and <code>vlan.inner</code>:</p>\n<pre><code>read pcap\n| decapsulate\n| where vlan.outer > 0 || vlan.inner in [1, 2, 3]\n</code></pre>\n<p>Filter packets by <a href=\"https://github.com/corelight/community-id-spec\">Community\nID</a>:</p>\n<pre><code>read pcap\n| decapsulate\n| where community_id == \"1:wCb3OG7yAFWelaUydu0D+125CLM=\"\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/decapsulate"
},
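/*
The VLAN extraction described for `decapsulate` can be sketched for a raw
Ethernet frame. This is an illustration, not Tenzir's implementation:
`vlanTags` is a hypothetical helper, and treating both TPID 0x8100 (802.1Q) and
0x88A8 (802.1ad, commonly used for the outer tag of stacked VLANs) as tag
markers is an assumption about which tags count.

```javascript
// Read up to two VLAN tags from a raw Ethernet frame (Uint8Array) and return
// their 12-bit VLAN identifiers as { outer, inner }, with null for absent tags.
function vlanTags(frame) {
  const tpids = new Set([0x8100, 0x88a8]);
  const vids = [];
  let offset = 12; // The EtherType/TPID follows the two 6-byte MAC addresses.
  while (vids.length < 2 && offset + 4 <= frame.length) {
    const tpid = (frame[offset] << 8) | frame[offset + 1];
    if (!tpids.has(tpid)) break;
    const tci = (frame[offset + 2] << 8) | frame[offset + 3];
    vids.push(tci & 0x0fff); // The low 12 bits of the TCI are the VID.
    offset += 4; // Each tag is 4 bytes: a 2-byte TPID plus a 2-byte TCI.
  }
  return { outer: vids[0] ?? null, inner: vids[1] ?? null };
}

// A frame with an outer tag (VID 10) and an inner tag (VID 20); the MAC
// addresses are left as zeros and the payload is omitted.
const frame = new Uint8Array(20);
frame.set([0x81, 0x00, 0x20, 0x0a, 0x81, 0x00, 0x00, 0x14], 12);
console.log(vlanTags(frame)); // prints { outer: 10, inner: 20 }
```

The special values 0 and 0xFFF are returned as-is; interpreting them is left to
the caller.
*/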
{
"label": "decompress",
"type": "keyword",
"detail": "Decompresses a stream of bytes.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>decompress <codec>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>decompress</code> operator decompresses bytes in a pipeline incrementally with a\nknown codec. The operator supports decompressing multiple concatenated streams\nof the same codec transparently.</p>\n<p>The <code>decompress</code> operator is invoked automatically as a part of <a href=\"from.md\"><code>from</code></a>\nif the source file has a file extension indicating compression.\nThis behavior can be circumvented by using <a href=\"load.md\"><code>load</code></a> directly.</p>\n<div class=\"remark-container note\"><div class=\"remark-container-title note\">Streaming Decompression</div><p>The operator uses <a href=\"https://arrow.apache.org/docs/cpp/api/utilities.html#compression\">Apache Arrow's compression\nutilities</a> under the hood, and transparently supports\nall options that Apache Arrow supports for streaming decompression.</p><p>Besides the supported <code>brotli</code>, <code>bz2</code>, <code>gzip</code>, <code>lz4</code>, and <code>zstd</code>, Apache Arrow\nalso ships with codecs for <code>lzo</code>, <code>lz4_raw</code>, <code>lz4_hadoop</code> and <code>snappy</code>, which\nonly support oneshot decompression. Support for them is not currently implemented.</p></div>\n<h3><code><codec></code></h3>\n<p>An identifier of the codec to use. Currently supported are <code>brotli</code>, <code>bz2</code>,\n<code>gzip</code>, <code>lz4</code>, and <code>zstd</code>.</p>\n<h2>Examples</h2>\n<p>Import Suricata events from a Zstd-compressed file:</p>\n<pre><code>from eve.json.zst\n| import\n\nload file eve.json.zst\n| decompress zstd\n| read suricata\n| import\n</code></pre>\n<p>Convert a Zstd-compressed file into an LZ4-compressed file:</p>\n<pre><code>from in.zst\n| to out.lz4\n\nload file in.zst\n| decompress zstd\n| compress lz4\n| save file out.lz4\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/decompress"
},
{
"label": "deduplicate",
"type": "keyword",
"detail": "Removes duplicate events based on the values of one or more fields.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>deduplicate [<extractor>...]\n [--limit <count>] [--distance <count>] [--timeout <duration>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>deduplicate</code> operator removes duplicates from a stream of events, based\non the value of one or more fields.</p>\n<p>You have three independent configuration options to customize the operator's\nbehavior:</p>\n<ol>\n<li><strong>Limit</strong>: the multiplicity of the events until they are supressed as\nduplicates. A limit of 1 is equivalent to emission of unique events. A limit\nof <em>N</em> means that events with a unique key (defined by the fields) get\nemitted at most <em>N</em> times. For example, <code>GGGYBYYBGYGB</code> with a limit of 2\nyields <code>GGYBYB</code>.</li>\n<li><strong>Distance</strong>: The number of events in sequence since the last occurrence of\na unique event. For example, deduplicating a stream <code>GGGYBYYBGYGB</code> with\ndistance 2 yields <code>GYBBGYB</code>.</li>\n<li><strong>Timeout</strong>: The time that needs to pass until a surpressed event is no\nlonger considered a duplicate. When an event with surpressed key is seen\nbefore the timeout is reached, the timer resets.</li>\n</ol>\n<p>The diagram below illustrates these three options. The different colored boxes\nrefer to events of different schemas.</p>\n<p><img src=\"deduplicate.excalidraw.svg\" alt=\"Deduplicate Configuration Knobs\"></p>\n<h3><code><extractor>...</code></h3>\n<p>A comma-separated list of extractors that identify the fields used for\ndeduplicating. Missing fields are treated as if they had the value <code>null</code>.</p>\n<p>Defaults to the entire event.</p>\n<h3><code>--limit <count></code></h3>\n<p>The number of duplicates allowed before they are removed.</p>\n<p>Defaults to 1.</p>\n<h3><code>--distance <count></code></h3>\n<p>Distance between two events that can be considered duplicates. 
Value of <code>1</code>\nmeans only adjacent events can be considered duplicates. <code>0</code> means infinity.</p>\n<p>Defaults to infinity.</p>\n<h3><code>--timeout <duration></code></h3>\n<p>The amount of time a specific value is remembered for deduplication. For each\nvalue, the timer is reset every time a match for that value is found.</p>\n<p>Defaults to infinity.</p>\n<h2>Examples</h2>\n<p>Consider the following data:</p>\n<pre><code class=\"language-json\">{\"foo\": 1, \"bar\": \"a\"}\n{\"foo\": 1, \"bar\": \"a\"}\n{\"foo\": 1, \"bar\": \"a\"}\n{\"foo\": 1, \"bar\": \"b\"}\n{\"foo\": null, \"bar\": \"b\"}\n{\"bar\": \"b\"}\n{\"foo\": null, \"bar\": \"b\"}\n{\"foo\": null, \"bar\": \"b\"}\n</code></pre>\n<p>For <code>deduplicate --limit 1</code>, all duplicate events are removed:</p>\n<pre><code class=\"language-json\">{\"foo\": 1, \"bar\": \"a\"}\n{\"foo\": 1, \"bar\": \"b\"}\n{\"foo\": null, \"bar\": \"b\"}\n{\"bar\": \"b\"}\n</code></pre>\n<p>If <code>deduplicate bar --limit 1</code> is used, only the field <code>bar</code> is considered when\ndetermining whether an event is a duplicate:</p>\n<pre><code class=\"language-json\">{\"foo\": 1, \"bar\": \"a\"}\n{\"foo\": 1, \"bar\": \"b\"}\n</code></pre>\n<p>And for <code>deduplicate foo --limit 1</code>, only the field <code>foo</code> is considered.\nNote, how the missing <code>foo</code> field is treated as if it had the value <code>null</code>,\ni.e., it's not included in the output.</p>\n<pre><code class=\"language-json\">{\"foo\": 1, \"bar\": \"a\"}\n{\"foo\": null, \"bar\": \"b\"}\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/deduplicate"
},
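/*
The interplay of `--limit` and `--distance` can be modeled on an in-memory
array. This is a minimal sketch, not Tenzir's implementation: `deduplicate` is
a hypothetical function, only top-level field names are supported, and
`--timeout` is left out.

```javascript
// Hypothetical in-memory model of `deduplicate` with --limit and --distance.
function deduplicate(events, { fields = null, limit = 1, distance = 0 } = {}) {
  // The key is the whole event, or the selected fields with missing ones as null.
  const keyOf = (event) =>
    fields === null
      ? JSON.stringify(event)
      : JSON.stringify(fields.map((f) => event[f] ?? null));
  const seen = new Map(); // key -> { count, lastIndex }
  const output = [];
  events.forEach((event, index) => {
    const key = keyOf(event);
    let entry = seen.get(key);
    // Start over if the key is new or was last seen more than `distance`
    // events ago; a distance of 0 means "infinite", so keys are never forgotten.
    if (!entry || (distance > 0 && index - entry.lastIndex > distance)) {
      entry = { count: 0, lastIndex: index };
      seen.set(key, entry);
    }
    entry.lastIndex = index;
    // Emit the event only while its key is still within the limit.
    if (entry.count < limit) {
      entry.count += 1;
      output.push(event);
    }
  });
  return output;
}

const letters = [..."GGGYBYYBGYGB"].map((c) => ({ c }));
const show = (events) => events.map((e) => e.c).join("");
console.log(show(deduplicate(letters, { fields: ["c"], limit: 2 }))); // prints GGYBYB
console.log(show(deduplicate(letters, { fields: ["c"], distance: 2 }))); // prints GYBBGYB
```

Here the distance is measured from the most recent occurrence of a key,
including suppressed ones. That choice reproduces both sequences from the
description, but it is an inference rather than documented behavior.
*/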
{
"label": "delay",
"type": "keyword",
"detail": "Delays events relative to a given start time, with an optional speedup.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>delay [--start <time>] [--speed <factor>] <field>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>delay</code> operator replays a dataflow according to a time field by introducing\nsleeping periods proportional to the inter-arrival times of the events.</p>\n<p>With <code>--speed</code>, you can adjust the sleep time of the time series induced by\n<code>field</code> with a multiplicative factor. This has the effect of making the time\nseries \"faster\" for values great than 1 and \"slower\" for values less than 1.\nUnless you provide a start time with <code>--start</code>, the operator will anchor the\ntimestamps in <code>field</code> to begin with the current wall clock time, as if you\nprovided <code>--start now</code>.</p>\n<p>The diagram below illustrates the effect of applying <code>delay</code> to dataflow. If an\nevent in the stream has a timestamp the precedes the previous event, <code>delay</code>\nemits it instanstly. Otherwise <code>delay</code> sleeps the amount of time to reach the\nnext timestamp. As shown in the last illustration, the <code>--speed</code> factor has a\nscaling effect on the inter-arrival times.</p>\n<p><img src=\"delay.excalidraw.svg\" alt=\"Delay\"></p>\n<p>The options <code>--start</code> and <code>--speed</code> work independently, i.e., you can use them\nseparately or both together.</p>\n<h3><code>--start <time></code></h3>\n<p>The timestamp to anchor the time values around.</p>\n<p>Defaults to the first non-null timestamp in <code>field</code>.</p>\n<h3><code>--speed <speed></code></h3>\n<p>A constant factor to be divided by the inter-arrival time. 
For example, 2.0\ndecreases the event gaps by a factor of two, resulting a twice as fast dataflow.\nA value of 0.1 creates dataflow that spans ten times the original time frame.</p>\n<p>Defaults to 1.0.</p>\n<h3><code><field></code></h3>\n<p>The name of the field containing the timestamp values.</p>\n<h2>Examples</h2>\n<p>Replay the M57 Zeek logs with real-world inter-arrival times from the <code>ts</code>\ncolumn. For example, if event <em>i</em> arrives at time <em>t</em> and <em>i + 1</em> at time <em>u</em>,\nthen the <code>delay</code> operator will wait time <em>u - t</em> after emitting event <em>i</em> before\nemitting event <em>i + 1</em>. If <em>t > u</em> then the operator immediately emits event *i</p>\n<ul>\n<li>1*.</li>\n</ul>\n<pre><code>from https://storage.googleapis.com/tenzir-datasets/M57/zeek-all.log.zst read zeek-tsv\n| delay ts\n</code></pre>\n<p>Replay the M57 Zeek logs at 10 times the original speed. That is, wait <em>(u - t)\n/ 10</em> between event <em>i</em> and <em>i + 1</em>, assuming <em>u > t</em>.</p>\n<pre><code>from https://storage.googleapis.com/tenzir-datasets/M57/zeek-all.log.zst read zeek-tsv\n| delay --speed 10 ts\n</code></pre>\n<p>Replay as above, but start delaying only after <code>ts</code> exceeds <code>2021-11-17T16:35</code>\nand emit all events prior to that timestamp immediately.</p>\n<pre><code>from https://storage.googleapis.com/tenzir-datasets/M57/zeek-all.log.zst read zeek-tsv\n| delay --start \"2021-11-17T16:35\" --speed 10 ts\n</code></pre>\n<p>Adjust the timestamp to the present, and then start replaying in 2 hours from\nnow:</p>\n<pre><code>from https://storage.googleapis.com/tenzir-datasets/M57/zeek-all.log.zst read zeek-tsv\n| timeshift ts\n| delay --start \"in 2 hours\" ts\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/delay"
},
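/*
The waiting times from the `delay` examples can be computed directly. This is
a minimal sketch of the formula stated there (wait (u - t) / speed, or nothing
when u <= t); `delayWaits` is a hypothetical helper, and timestamps are assumed
to be milliseconds.

```javascript
// For each event after the first, return how long to wait after emitting the
// previous event: (u - t) / speed when u > t, otherwise 0 (emit immediately).
function delayWaits(timestamps, speed = 1.0) {
  if (!(speed > 0)) throw new RangeError("speed must be positive");
  const waits = [];
  for (let i = 1; i < timestamps.length; i++) {
    const gap = timestamps[i] - timestamps[i - 1];
    waits.push(gap > 0 ? gap / speed : 0);
  }
  return waits;
}

console.log(JSON.stringify(delayWaits([0, 1000, 500, 3000], 10))); // prints [100,0,250]
```

The sketch ignores `--start` and the anchoring to wall-clock time, so the real
operator's schedule for out-of-order timestamps may differ from this
gap-by-gap model.
*/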
{
"label": "diagnostics",
"type": "keyword",
"detail": "Retrieves diagnostic events from a Tenzir node.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>diagnostics [--live]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>diagnostics</code> operator retrieves diagnostic events from a Tenzir\nnode.</p>\n<h3><code>--live</code></h3>\n<p>Work on all diagnostic events as they are generated in real-time instead of on\ndiagnostic events persisted at a Tenzir node.</p>\n<h2>Schemas</h2>\n<p>Tenzir emits diagnostic information with the following schema:</p>\n<h3><code>tenzir.diagnostic</code></h3>\n<p>Contains detailed information about the diagnostic.</p>\n<p>|Field|Type|Description|\n|:-|:-|:-|\n|<code>pipeline_id</code>|<code>string</code>|The ID of the pipeline that created the diagnostic.|\n|<code>run</code>|<code>uint64</code>|The number of the run, starting at 1 for the first run.|\n|<code>timestamp</code>|<code>time</code>|The exact timestamp of the diagnostic creation.|\n|<code>message</code>|<code>string</code>|The diagnostic message.|\n|<code>severity</code>|<code>string</code>|The diagnostic severity.|\n|<code>notes</code>|<code>list<record></code>|The diagnostic notes. Can be empty.|\n|<code>annotations</code>|<code>list<record></code>|The diagnostic annotations. Can be empty.|</p>\n<p>The record <code>notes</code> has the following schema:</p>\n<p>|Field|Type|Description|\n|:-|:-|:-|\n|<code>kind</code>|<code>string</code>|The kind of note, which is <code>note</code>, <code>usage</code>, <code>hint</code> or <code>docs</code>.|\n|<code>message</code>|<code>string</code>|The message of this note.|</p>\n<p>The record <code>annotations</code> has the following schema:</p>\n<p>|Field|Type|Description|\n|:-|:-|:-|\n|<code>primary</code>|<code>bool</code>|True if the <code>source</code> represents the underlying reason for the diagnostic, false if it is only related to it.|\n|<code>text</code>|<code>string</code>|A message for explanations. 
Can be empty.|\n|<code>source</code>|<code>string</code>|The character range in the pipeline string that this annotation is associated to.|</p>\n<h2>Examples</h2>\n<p>View all diagnostics generated in the past five minutes.</p>\n<pre><code>diagnostics\n| where timestamp > 5 minutes ago\n</code></pre>\n<p>Only show diagnostics that contain the <code>error</code> severity.</p>\n<pre><code>diagnostics\n| where severity == \"error\"\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/diagnostics"
},
{
"label": "discard",
"type": "keyword",
"detail": "Discards all incoming events.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>discard\n</code></pre>\n<h2>Description</h2>\n<p>The <code>discard</code> operator has a similar effect as <code>to file /dev/null write json</code>,\nbut it immediately discards all events without first rendering them with a\nprinter.</p>\n<p>This operator is mainly used to test or benchmark pipelines.</p>\n<h2>Examples</h2>\n<p>Benchmark to see how long it takes to export everything:</p>\n<pre><code>export | discard\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/discard"
},
{
"label": "drop",
"type": "keyword",
"detail": "Drops fields from the input.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>drop <extractor>...\n</code></pre>\n<h2>Description</h2>\n<p>The <code>drop</code> operator removes all fields matching the provided extractors and\nkeeps all other fields. It is the dual to <a href=\"select.md\"><code>select</code></a>.</p>\n<p>In relational algebra, <code>drop</code> performs a <em>projection</em> of the complement of the\nprovided arguments.</p>\n<h3><code><extractor>...</code></h3>\n<p>A comma-separated list of extractors that identify the fields to remove.</p>\n<h2>Examples</h2>\n<p>Remove the fields <code>foo</code> and <code>bar</code>:</p>\n<pre><code>drop foo, bar\n</code></pre>\n<p>Remove all fields of type <code>ip</code>:</p>\n<pre><code>drop :ip\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/drop"
},
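/*
The complement relationship between `drop` and `select` can be shown on a flat
JavaScript object. This is a minimal sketch: nested field paths and type
extractors such as `:ip` are not modeled, and both helper names are
hypothetical.

```javascript
// `dropFields` keeps every field except the named ones; `selectFields` is its
// dual and keeps only the named ones. Field order follows the input record.
function dropFields(record, names) {
  const removed = new Set(names);
  return Object.fromEntries(
    Object.entries(record).filter(([key]) => !removed.has(key)),
  );
}

function selectFields(record, names) {
  const kept = new Set(names);
  return Object.fromEntries(
    Object.entries(record).filter(([key]) => kept.has(key)),
  );
}

const sample = { foo: 1, bar: 2, baz: 3 };
console.log(dropFields(sample, ["foo", "bar"])); // prints { baz: 3 }
console.log(selectFields(sample, ["foo", "bar"])); // prints { foo: 1, bar: 2 }
```

Together, the two results partition the input record, which is the
"projection of the complement" relationship described above.
*/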
{
"label": "enrich",
"type": "keyword",
"detail": "Enriches events with a context.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>enrich <name> [--field <field...>] [--replace] [--filter] [--separate]\n [--yield <field>] [<context-options>]\nenrich <output>=<name> [--field <field...>] [--filter] [--separate]\n [--yield <field>] [<context-options>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>enrich</code> operator applies a context, extending input events with a new field\ndefined by the context.</p>\n<h3><code><name></code></h3>\n<p>The name of the context to enrich with.</p>\n<h3><code><output></code></h3>\n<p>The name of the field in which to store the context's enrichment. Defaults to\nthe name of the context.</p>\n<h3><code>--field <field...></code></h3>\n<p>A comma-separated list of fields, type extractors, or concepts to match.</p>\n<h3><code>--replace</code></h3>\n<p>Replace the given fields with their respective context, omitting all\nmeta-information.</p>\n<h3><code>--filter</code></h3>\n<p>Filter events that do not match the context.</p>\n<p>This option is incompatible with <code>--replace</code>.</p>\n<h3><code>--separate</code></h3>\n<p>When multiple fields are provided, e.g., when using <code>--field :ip</code> to enrich all\nIP address fields, duplicate the event for every provided field and enrich them\nindividually.</p>\n<p>When using the option, the context moves from <code><output>.context.<path...></code> to\n<code><output></code> in the resulting event, with a new field <code><output>.path</code> containing\nthe enriched path.</p>\n<h3><code>--yield <path></code></h3>\n<p>Provide a field into the context object to use as the context instead. 
If the\nkey does not exist within the context, a <code>null</code> value is used instead.</p>\n<h3><code><context-options></code></h3>\n<p>Optional, context-specific options in the format <code>--key value</code> or <code>--flag</code>.\nRefer to the documentation of the individual contexts for these.</p>\n<h2>Examples</h2>\n<p>Apply the <code>lookup-table</code> context <code>feodo</code> to <code>suricata.flow</code> events, using the\n<code>dest_ip</code> field as the field to compare the context key against.</p>\n<pre><code>export\n| where #schema == \"suricata.flow\"\n| enrich feodo --field dest_ip\n</code></pre>\n<p>To return only events that have a context, use:</p>\n<pre><code>export\n| where #schema == \"suricata.flow\"\n| enrich feodo --field dest_ip --filter\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/enrich"
},
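/*
The effect of `--field` and `--filter` with a lookup-table context can be
modeled with a JavaScript `Map`. This is a simplified, hypothetical sketch:
`enrich` here is not Tenzir's implementation, the real output layout (including
meta-information) differs, and storing `null` for unmatched events is an
assumption.

```javascript
// Simplified model of `enrich <name> --field <field> [--filter]`: the context
// is a Map from key to value, and the matched value is stored under `name`.
function enrich(events, table, { name, field, filter = false }) {
  const result = [];
  for (const event of events) {
    const hit = table.has(event[field]);
    // With --filter, events whose field value is not a key in the table are dropped.
    if (filter && !hit) continue;
    result.push({ ...event, [name]: hit ? table.get(event[field]) : null });
  }
  return result;
}

const feodo = new Map([["192.0.2.1", { listed: true }]]);
const flows = [{ dest_ip: "192.0.2.1" }, { dest_ip: "198.51.100.7" }];
console.log(enrich(flows, feodo, { name: "feodo", field: "dest_ip" }).length); // prints 2
console.log(enrich(flows, feodo, { name: "feodo", field: "dest_ip", filter: true }).length); // prints 1
```

The addresses come from the documentation ranges of RFC 5737 and stand in for
real block-list entries.
*/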
{
"label": "enumerate",
"type": "keyword",
"detail": "Prepend a column with row numbers.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>enumerate [<field>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>enumerate</code> operator prepends a new column with row numbers to the beginning\nof the input record.</p>\n<div class=\"remark-container note\"><div class=\"remark-container-title note\">Per-schema Counting</div><p>The operator counts row numbers per schema. We plan to change this behavior with\na in the future once we have a modifer that toggles \"per-schema-ness\"\nexplicitly.</p></div>\n<h3><code><field></code></h3>\n<p>Sets the name of the output field.</p>\n<p>Defaults to <code>#</code> to avoid conflicts with existing field names.</p>\n<h2>Examples</h2>\n<p>Enumerate the input by prepending row numbers:</p>\n<pre><code>from file eve.json read suricata | select event_type | enumerate | write json\n</code></pre>\n<pre><code class=\"language-json\">{\"#\": 0, \"event_type\": \"alert\"}\n{\"#\": 0, \"event_type\": \"flow\"}\n{\"#\": 1, \"event_type\": \"flow\"}\n{\"#\": 0, \"event_type\": \"http\"}\n{\"#\": 1, \"event_type\": \"alert\"}\n{\"#\": 1, \"event_type\": \"http\"}\n{\"#\": 2, \"event_type\": \"flow\"}\n{\"#\": 0, \"event_type\": \"fileinfo\"}\n{\"#\": 3, \"event_type\": \"flow\"}\n{\"#\": 4, \"event_type\": \"flow\"}\n</code></pre>\n<p>Use <code>index</code> as field name instead of the default:</p>\n<pre><code>enumerate index\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/enumerate"
},
{
"label": "export",
"type": "keyword",
"detail": "Retrieves events from a Tenzir node. The dual to import.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>export [--live] [--internal] [--low-priority]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>export</code> operator retrieves events from a Tenzir node.</p>\n<h3><code>--live</code></h3>\n<p>Work on all events that are imported with <code>import</code> operators in real time\ninstead of on events persisted at a Tenzir node.</p>\n<h3><code>--internal</code></h3>\n<p>Export internal events, such as metrics or diagnostics, instead. By default,\n<code>export</code> only returns events that were previously imported with <code>import</code>. In\ncontrast, <code>export --internal</code> exports internal events such as operator metrics.</p>\n<h3><code>--low-priority</code></h3>\n<p>Treat this export with a lower priority, causing it to interfere less with\nregular-priority exports at the cost of potentially running slower.</p>\n<h2>Examples</h2>\n<p>Expose all persisted events as JSON data.</p>\n<pre><code>export | to stdout\n</code></pre>\n<p><a href=\"where.md\">Apply a filter</a> to all persisted events, then <a href=\"head.md\">only expose the first\nten results</a>.</p>\n<pre><code>export | where 1.2.3.4 | head 10 | to stdout\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/export"
},
{
"label": "extend",
"type": "keyword",
"detail": "Appends fields to events.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>extend <field=operand>...\n</code></pre>\n<h2>Description</h2>\n<p>The <code>extend</code> operator appends a specified list of fields to the input. All\nexisting fields remain intact.</p>\n<p>The difference between <code>extend</code> and <a href=\"put.md\"><code>put</code></a> is that <code>put</code> drops all\nfields not explicitly specified, whereas <code>extend</code> only appends fields.</p>\n<p>The difference between <code>extend</code> and <a href=\"replace.md\"><code>replace</code></a> is that <code>replace</code>\noverwrites existing fields, whereas <code>extend</code> leaves existing fields untouched.</p>\n<p>The difference between <code>extend</code> and <a href=\"set.md\"><code>set</code></a> is that <code>set</code> also\nupdates fields that already exist in the data.</p>\n<h3><code><field=operand></code></h3>\n<p>The assignment consists of a <code>field</code> that names the new field and an\n<code>operand</code> that defines its value.</p>\n<h2>Examples</h2>\n<p>Add new fields with fixed values:</p>\n<pre><code>extend secret=\"xxx\", ints=[1, 2, 3], strs=[\"a\", \"b\", \"c\"]\n</code></pre>\n<p>Duplicate a column:</p>\n<pre><code>extend source=src_ip\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/extend"
},
{
"label": "files",
"type": "keyword",
"detail": "Shows file information for a given directory.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>files [<directory>] [-r|--recurse-directories]\n [--follow-directory-symlink]\n [--skip-permission-denied]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>files</code> operator shows file information for all files in the given\ndirectory.</p>\n<h3><code><directory></code></h3>\n<p>The directory to list files in.</p>\n<p>Defaults to the current working directory.</p>\n<h3><code>-r|--recurse-directories</code></h3>\n<p>Recursively list files in subdirectories.</p>\n<h3><code>--follow-directory-symlink</code></h3>\n<p>Follow rather than skip directory symlinks.</p>\n<h3><code>--skip-permission-denied</code></h3>\n<p>Skip directories that would otherwise result in permission-denied errors.</p>\n<h2>Schemas</h2>\n<p>Tenzir emits file information with the following schema.</p>\n<h3><code>tenzir.file</code></h3>\n<p>Contains detailed information about the file.</p>\n<table>\n<thead>\n<tr><th>Field</th><th>Type</th><th>Description</th></tr>\n</thead>\n<tbody>\n<tr><td><code>path</code></td><td><code>string</code></td><td>The file path.</td></tr>\n<tr><td><code>type</code></td><td><code>string</code></td><td>The type of the file (see below).</td></tr>\n<tr><td><code>permissions</code></td><td><code>record</code></td><td>The permissions of the file (see below).</td></tr>\n<tr><td><code>owner</code></td><td><code>string</code></td><td>The file's owner.</td></tr>\n<tr><td><code>group</code></td><td><code>string</code></td><td>The file's group.</td></tr>\n<tr><td><code>file_size</code></td><td><code>uint64</code></td><td>The file size in bytes.</td></tr>\n<tr><td><code>hard_link_count</code></td><td><code>uint64</code></td><td>The number of hard links to the file.</td></tr>\n<tr><td><code>last_write_time</code></td><td><code>time</code></td><td>The time of the last write to the file.</td></tr>\n</tbody>\n</table>\n<p>The <code>type</code> field can have one of the following values:</p>\n<table>\n<thead>\n<tr><th>Value</th><th>Description</th></tr>\n</thead>\n<tbody>\n<tr><td><code>regular</code></td><td>The file is a regular file.</td></tr>\n<tr><td><code>directory</code></td><td>The file is a directory.</td></tr>\n<tr><td><code>symlink</code></td><td>The file is a symbolic link.</td></tr>\n<tr><td><code>block</code></td><td>The file is a block device.</td></tr>\n<tr><td><code>character</code></td><td>The file is a character device.</td></tr>\n<tr><td><code>fifo</code></td><td>The file is a named IPC pipe.</td></tr>\n<tr><td><code>socket</code></td><td>The file is a named IPC socket.</td></tr>\n<tr><td><code>not_found</code></td><td>The file does not exist.</td></tr>\n<tr><td><code>unknown</code></td><td>The file has an unknown type.</td></tr>\n</tbody>\n</table>\n<p>The <code>permissions</code> record contains the following fields:</p>\n<table>\n<thead>\n<tr><th>Field</th><th>Type</th><th>Description</th></tr>\n</thead>\n<tbody>\n<tr><td><code>owner</code></td><td><code>record</code></td><td>The file permissions for the owner.</td></tr>\n<tr><td><code>group</code></td><td><code>record</code></td><td>The file permissions for the group.</td></tr>\n<tr><td><code>others</code></td><td><code>record</code></td><td>The file permissions for others.</td></tr>\n</tbody>\n</table>\n<p>The <code>owner</code>, <code>group</code>, and <code>others</code> records contain the following fields:</p>\n<table>\n<thead>\n<tr><th>Field</th><th>Type</th><th>Description</th></tr>\n</thead>\n<tbody>\n<tr><td><code>read</code></td><td><code>bool</code></td><td>Whether the file is readable.</td></tr>\n<tr><td><code>write</code></td><td><code>bool</code></td><td>Whether the file is writable.</td></tr>\n<tr><td><code>execute</code></td><td><code>bool</code></td><td>Whether the file is executable.</td></tr>\n</tbody>\n</table>\n<h2>Examples</h2>\n<p>Compute the total size of all files in the current directory, including subdirectories:</p>\n<pre><code>files -r\n| summarize total_size=sum(file_size)\n</code></pre>\n<p>Find all named pipes in <code>/tmp</code>:</p>\n<pre><code>files -r --skip-permission-denied /tmp\n| where type == \"fifo\"\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/files"
},
{
"label": "flatten",
"type": "keyword",
"detail": "Flattens nested data.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>flatten [<separator>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>flatten</code> operator acts on <a href=\"../data-model/type-system.md\">container types</a>:</p>\n<ol>\n<li><strong>Records</strong>: Join nested records with a separator (<code>.</code> by default). For\nexample, if a field named <code>x</code> is a record with fields <code>a</code> and <code>b</code>, flattening\nwill lift the nested record into the parent scope by creating two new fields\n<code>x.a</code> and <code>x.b</code>.</li>\n<li><strong>Lists</strong>: Merge nested lists into a single (flat) list. For example,\n<code>[[[2]], [[3, 1]], [[4]]]</code> becomes <code>[2, 3, 1, 4]</code>.</li>\n</ol>\n<p>For records inside lists, <code>flatten</code> \"pushes lists down\" into one list per record\nfield. For example, the record</p>\n<pre><code class=\"language-json\">{\n \"foo\": [\n {\n \"a\": 2,\n \"b\": 1\n },\n {\n \"a\": 4\n }\n ]\n}\n</code></pre>\n<p>becomes</p>\n<pre><code class=\"language-json\">{\n \"foo.a\": [2, 4],\n \"foo.b\": [1, null]\n}\n</code></pre>\n<p>Lists nested in records that are nested in lists will also be flattened. 
For\nexample, the record</p>\n<pre><code class=\"language-json\">{\n \"foo\": [\n {\n \"a\": [\n [2, 23],\n [1,16]\n ],\n \"b\": [1]\n },\n {\n \"a\": [[4]]\n }\n ]\n}\n</code></pre>\n<p>becomes</p>\n<pre><code class=\"language-json\">{\n \"foo.a\": [\n 2,\n 23,\n 1,\n 16,\n 4\n ],\n \"foo.b\": [\n 1\n ]\n}\n</code></pre>\n<p>As you can see from the above examples, flattening also removes <code>null</code> values.</p>\n<h3><code><separator></code></h3>\n<p>The separator string to join the field names of nested records.</p>\n<p>Defaults to <code>.</code>.</p>\n<h2>Examples</h2>\n<p>Consider the following record:</p>\n<pre><code class=\"language-json\">{\n \"src_ip\": \"147.32.84.165\",\n \"src_port\": 1141,\n \"dest_ip\": \"147.32.80.9\",\n \"dest_port\": 53,\n \"event_type\": \"dns\",\n \"dns\": {\n \"type\": \"query\",\n \"id\": 553,\n \"rrname\": \"irc.freenode.net\",\n \"rrtype\": \"A\",\n \"tx_id\": 0,\n \"grouped\": {\n \"A\": [\"tenzir.com\", null]\n }\n }\n}\n</code></pre>\n<p>After <code>flatten</code> the record looks as follows:</p>\n<pre><code class=\"language-json\">{\n \"src_ip\": \"147.32.84.165\",\n \"src_port\": 1141,\n \"dest_ip\": \"147.32.80.9\",\n \"dest_port\": 53,\n \"event_type\": \"dns\",\n \"dns.type\": \"query\",\n \"dns.id\": 553,\n \"dns.rrname\": \"irc.freenode.net\",\n \"dns.rrtype\": \"A\",\n \"dns.tx_id\": 0,\n \"dns.grouped.A\": [\"tenzir.com\"]\n}\n</code></pre>\n<p>Note that <code>dns.grouped.A</code> no longer contains a <code>null</code> value.</p>",
"docLink": "https://docs.tenzir.com/operators/flatten"
},
{
"label": "fluent-bit",
"type": "keyword",
"detail": "Sends and receives events via Fluent Bit.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>fluent-bit [-X|--set <key=value>,...] <plugin> [<key=value>...]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>fluent-bit</code> operator acts as a bridge into the Fluent Bit ecosystem,\nmaking it possible to acquire events from a Fluent Bit <a href=\"https://docs.fluentbit.io/manual/pipeline/inputs\">input plugin</a>\nand process events with a Fluent Bit <a href=\"https://docs.fluentbit.io/manual/pipeline/outputs\">output plugin</a>.</p>\n<p>Syntactically, the <code>fluent-bit</code> operator behaves similarly to an invocation of the\n<code>fluent-bit</code> command-line utility. For example, the invocation</p>\n<pre><code class=\"language-bash\">fluent-bit -o plugin -p key1=value1 -p key2=value2 -p ...\n</code></pre>\n<p>translates to our <code>fluent-bit</code> operator as follows:</p>\n<pre><code class=\"language-bash\">fluent-bit plugin key1=value1 key2=value2 ...\n</code></pre>\n<h3><code>-X|--set <key=value></code></h3>\n<p>A comma-separated list of key-value pairs that represent the global properties\nof the Fluent Bit service, e.g., <code>-X flush=1,grace=3</code>.</p>\n<p>Consult the list of available <a href=\"https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/classic-mode/configuration-file#config_section\">key-value pairs</a> to configure\nFluent Bit according to your needs.</p>\n<p>We recommend factoring these options into the plugin-specific <code>fluent-bit.yaml</code>\nso that they are independent of the <code>fluent-bit</code> operator arguments.</p>\n<h3><code><plugin></code></h3>\n<p>The name of the Fluent Bit plugin.</p>\n<p>Run <code>fluent-bit -h</code> and look under the <strong>Inputs</strong> and <strong>Outputs</strong> sections of the\nhelp text for available plugin names. 
The web documentation often comes with an\nexample invocation near the bottom of the page, which also provides a good idea\nof how to use the operator.</p>\n<h3><code><key=value></code></h3>\n<p>Sets a plugin configuration property.</p>\n<p>The positional arguments of the form <code>key=value</code> are equivalent to the\nmulti-option <code>-p key=value</code> of the <code>fluent-bit</code> executable.</p>\n<h2>Examples</h2>\n<h3>Source</h3>\n<p>Ingest <a href=\"https://docs.fluentbit.io/manual/pipeline/inputs/opentelemetry\">OpenTelemetry</a>\nlogs, metrics, and traces:</p>\n<pre><code>fluent-bit opentelemetry\n</code></pre>\n<p>You can then send JSON-encoded log data to a freshly created API endpoint:</p>\n<pre><code class=\"language-bash\">curl \\\n --header \"Content-Type: application/json\" \\\n --request POST \\\n --data '{\"resourceLogs\":[{\"resource\":{},\"scopeLogs\":[{\"scope\":{},\"logRecords\":[{\"timeUnixNano\":\"1660296023390371588\",\"body\":{\"stringValue\":\"{\\\"message\\\":\\\"dummy\\\"}\"},\"traceId\":\"\",\"spanId\":\"\"}]}]}]}' \\\n http://0.0.0.0:4318/v1/logs\n</code></pre>\n<p>Handle <a href=\"https://docs.fluentbit.io/manual/pipeline/inputs/splunk\">Splunk</a> HTTP\nHEC requests:</p>\n<pre><code>fluent-bit splunk port=8088\n</code></pre>\n<p>Handle <a href=\"https://docs.fluentbit.io/manual/pipeline/inputs/elasticsearch\">Elasticsearch &\nOpenSearch</a>\nBulk API requests or ingest data from Beats (e.g., Filebeat, Metricbeat, Winlogbeat):</p>\n<pre><code>fluent-bit elasticsearch port=9200\n</code></pre>\n<h3>Sink</h3>\n<p>Send events to <a href=\"https://docs.fluentbit.io/manual/pipeline/outputs/slack\">Slack</a>:</p>\n<pre><code>fluent-bit slack webhook=https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX\n</code></pre>\n<p>Send events to\n<a href=\"https://docs.fluentbit.io/manual/pipeline/outputs/splunk\">Splunk</a>:</p>\n<pre><code>fluent-bit splunk host=127.0.0.1 port=8088 tls=on tls.verify=off 
splunk_token=11111111-2222-3333-4444-555555555555\n</code></pre>\n<p>Send events to\n<a href=\"https://docs.fluentbit.io/manual/pipeline/outputs/elasticsearch\">Elasticsearch</a>:</p>\n<pre><code>fluent-bit es host=192.168.2.3 port=9200 index=my_index type=my_type\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/fluent-bit"
},
{
"label": "from",
"type": "keyword",
"detail": "Produces events by combining a connector and a format.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>from <url> [read <format>]\nfrom <path> [read <format>]\nfrom <connector> [read <format>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>from</code> operator produces events at the beginning of a pipeline by bringing\ntogether a <a href=\"../connectors.md\">connector</a> and a <a href=\"../formats.md\">format</a>.</p>\n<p>If given something that looks like a path to a file, the connector can pick\nout a format automatically based on the file extension or the file name.\nThis enables a shorter syntax, e.g., <code>from https://example.com/file.yml</code>\nuses the <code>yaml</code> format. All connectors also have a default format,\nwhich will be used if the format can't be determined from the path.\nFor most connectors, this default format is <code>json</code>. So, for example,\n<code>from stdin</code> uses the <code>json</code> format.</p>\n<p>Additionally, if a file extension indicating compression can be found,\n<a href=\"decompress.md\"><code>decompress</code></a> is automatically used.\nFor example, <code>from myfile.json.gz</code> is automatically gzip-decompressed\nand parsed as JSON, i.e., <code>load myfile.json.gz | decompress gzip | read json</code>.</p>\n<p>The <code>from</code> operator is a pipeline under the hood. In most cases, it is equivalent to\n<code>load <connector> | read <format></code>. However, for some combinations of\nconnectors and formats, the underlying pipeline is a lot more complex. We\nrecommend always using <code>from ... read ...</code> over the <a href=\"load.md\"><code>load</code></a> and\n<a href=\"read.md\"><code>read</code></a> operators.</p>\n<h3><code><connector></code></h3>\n<p>The <a href=\"../connectors.md\">connector</a> used to load bytes.</p>\n<p>Some connectors have connector-specific options. 
Please refer to the\ndocumentation of the individual connectors for more information.</p>\n<h3><code><format></code></h3>\n<p>The <a href=\"../formats.md\">format</a> used to parse events from the loaded bytes.</p>\n<p>Some formats have format-specific options. Please refer to the documentation of\nthe individual formats for more information.</p>\n<h2>Examples</h2>\n<p>Read bytes from stdin and parse them as JSON.</p>\n<pre><code>from stdin read json\nfrom file stdin read json\nfrom file - read json\nfrom - read json\n</code></pre>\n<p>Read bytes from the file <code>path/to/eve.json</code> and parse them as Suricata.\nNote that the <code>file</code> connector automatically assigns the Suricata parser for\n<code>eve.json</code> files when no other parser is specified.\nAlso, when directly passed a filesystem path, the <code>file</code> connector is automatically used.</p>\n<pre><code>from path/to/eve.json\nfrom file path/to/eve.json\nfrom file path/to/eve.json read suricata\n</code></pre>\n<p>Read bytes from the URL <code>https://example.com/data.json</code> over HTTPS and parse them as JSON.\nNote that when <code>from</code> is passed a URL directly, the <code>https</code> connector is automatically used.</p>\n<pre><code>from https://example.com/data.json read json\nfrom https example.com/data.json read json\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/from"
},
{
"label": "hash",
"type": "keyword",
"detail": "Computes a SHA256 hash digest of a given field.",
"processedHTML": "<div class=\"remark-container warning\"><div class=\"remark-container-title warning\">Deprecated</div><p>This operator will soon be removed in favor of first-class support for functions\nthat can be used in a variety of operators and contexts.</p></div>\n<h2>Synopsis</h2>\n<pre><code>hash [-s|--salt=<string>] <field>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>hash</code> operator calculates a hash digest of a given field.</p>\n<h3><code>-s|--salt=<string></code></h3>\n<p>A salt value for the hash.</p>\n<h3><code><field></code></h3>\n<p>The field over which the hash is computed.</p>\n<h2>Examples</h2>\n<p>Hash all values of the field <code>username</code> using the salt value <code>\"B3IwnumKPEJDAA4u\"</code> and store\nthe digest in a new field <code>username_hashed</code>:</p>\n<pre><code>hash --salt=\"B3IwnumKPEJDAA4u\" username\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/hash"
},
{
"label": "head",
"type": "keyword",
"detail": "Limits the input to the first N events.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>head [<limit>]\n</code></pre>\n<h2>Description</h2>\n<p>The semantics of the <code>head</code> operator are the same as those of the equivalent Unix tool:\nprocess a fixed number of events from the input. The operator terminates\nafter it has reached its limit.</p>\n<p><code>head <limit></code> is a shorthand notation for <a href=\"slice.md\"><code>slice --end <limit></code></a>.</p>\n<h3><code><limit></code></h3>\n<p>An unsigned integer denoting how many events to keep.</p>\n<p>Defaults to 10.</p>\n<h2>Examples</h2>\n<p>Get the first ten events:</p>\n<pre><code>head\n</code></pre>\n<p>Get the first five events:</p>\n<pre><code>head 5\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/head"
},
{
"label": "import",
"type": "keyword",
"detail": "Imports events into a Tenzir node. The dual to export.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>import\n</code></pre>\n<h2>Description</h2>\n<p>The <code>import</code> operator persists events in a Tenzir node.</p>\n<h2>Examples</h2>\n<p>Import Zeek conn logs into a Tenzir node.</p>\n<pre><code>from file conn.log read zeek-tsv | import\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/import"
},
{
"label": "load",
"type": "keyword",
"detail": "The load operator acquires raw bytes from a connector.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>load <url>\nload <path>\nload <connector>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>load</code> operator emits raw bytes.</p>\n<p>Notably, it cannot be used together with operators that expect events as input,\nbut rather only with operators that expect bytes, e.g., <a href=\"read.md\"><code>read</code></a> or\n<a href=\"save.md\"><code>save</code></a>.</p>\n<h3><code><connector></code></h3>\n<p>The <a href=\"../connectors.md\">connector</a> used to load bytes.</p>\n<p>Some connectors have connector-specific options. Please refer to the\ndocumentation of the individual connectors for more information.</p>\n<h2>Examples</h2>\n<p>Read bytes from stdin:</p>\n<pre><code>load stdin\n</code></pre>\n<p>Read bytes from the URL <code>https://example.com/file.json</code>:</p>\n<pre><code>load https://example.com/file.json\nload https example.com/file.json\n</code></pre>\n<p>Read bytes from the file <code>path/to/eve.json</code>:</p>\n<pre><code>load path/to/eve.json\nload file path/to/eve.json\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/load"
},
{
"label": "lookup",
"type": "keyword",
"detail": "Performs live filtering of the import feed using a context, and translates context updates into historical queries.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>lookup <context> [--field <field...>] [--separate]\n [--live] [--retro] [--snapshot]\n [--yield <field>] [<context-options>]\nlookup <output>=<context> [--field <field...>] [--separate]\n [--live] [--retro] [--snapshot]\n [--yield <field>] [<context-options>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>lookup</code> operator performs two actions simultaneously:</p>\n<ol>\n<li>Translate context updates into historical queries</li>\n<li>Filter all data that gets ingested into a node with a context</li>\n</ol>\n<p>These two operations combined offer <em>unified matching</em>, i.e., automated retro\nmatching by turning context updates into historical queries, and live matching\nwith a context on the import feed.</p>\n<p>The diagram below illustrates how the operator works:</p>\n<p><img src=\"lookup.excalidraw.svg\" alt=\"lookup\"></p>\n<h3><code><context></code></h3>\n<p>The name of the context to look up with.</p>\n<h3><code><output></code></h3>\n<p>The name of the field in which to store the context's enrichment.</p>\n<p>Defaults to the name of the context.</p>\n<h3><code>--field <field...></code></h3>\n<p>A comma-separated list of fields, type extractors, or concepts to match.</p>\n<h3><code>--separate</code></h3>\n<p>When multiple fields are provided, e.g., when using <code>--field :ip</code> to enrich all\nIP address fields, duplicate the event for every provided field and enrich each copy\nindividually.</p>\n<p>When using this option, the context moves from <code><output>.context.<path...></code> to\n<code><output></code> in the resulting event, with a new field <code><output>.path</code> containing\nthe enriched path.</p>\n<h3><code>--live</code></h3>\n<p>Enables live lookup for incoming events.</p>\n<p>By default, both retro and live lookups are enabled. 
Specifying either <code>--retro</code>\nor <code>--live</code> explicitly disables the other.</p>\n<h3><code>--retro</code></h3>\n<p>Enables retrospective lookups for previously imported events. The <code>lookup</code>\noperator will then apply a context <a href=\"context.md\">after a context update</a>.</p>\n<p>By default, both retro and live lookups are enabled.\nSpecifying either <code>--retro</code> or <code>--live</code> explicitly disables\nthe other.</p>\n<h3><code>--snapshot</code></h3>\n<p>Creates a snapshot of the context at the time of execution. In combination with\n<code>--retro</code>, this will commence a retrospective lookup with that current context\nstate.</p>\n<p>By default, snapshotting is disabled. Not all contexts support this operation.</p>\n<h3><code>--yield <path></code></h3>\n<p>Provide a field into the context object to use as the context instead. If the\nkey does not exist within the context, a <code>null</code> value is used instead.</p>\n<h3><code><context-options></code></h3>\n<p>Optional, context-specific options in the format <code>--key value</code> or <code>--flag</code>.\nRefer to the documentation of the individual contexts for these.</p>\n<h2>Examples</h2>\n<p>Apply the context <code>feodo</code> to incoming <code>suricata.flow</code> events.</p>\n<pre><code>lookup --live feodo --field src_ip\n| where #schema == \"suricata.flow\"\n</code></pre>\n<p>Apply the context <code>feodo</code> to historical <code>suricata.flow</code> events with every update\nto <code>feodo</code>.</p>\n<pre><code>lookup --retro feodo --field src_ip\n| where #schema == \"suricata.flow\"\n</code></pre>\n<p>Apply the context <code>feodo</code> to incoming <code>suricata.flow</code> events, and also apply the\ncontext after an update to <code>feodo</code>.</p>\n<pre><code>lookup feodo --field src_ip\n| where #schema == \"suricata.flow\"\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/lookup"
},
{
"label": "measure",
"type": "keyword",
"detail": "Replaces the input with metrics describing the input.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>measure [--real-time] [--cumulative]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>measure</code> operator yields metrics for each received batch of events or bytes\nusing the following schemas, respectively:</p>\n<pre><code>type tenzir.metrics.events = record {\n timestamp: time,\n schema: string,\n schema_id: string,\n events: uint64,\n}\n</code></pre>\n<pre><code>type tenzir.metrics.bytes = record {\n timestamp: time,\n bytes: uint64,\n}\n</code></pre>\n<h3><code>--real-time</code></h3>\n<p>Emit metrics immediately with every batch, rather than buffering until the\nupstream operator stalls, i.e., is idle or waiting for further input.</p>\n<p>The <code>--real-time</code> option is useful when <code>measure</code> should emit data without\nlatency.</p>\n<h3><code>--cumulative</code></h3>\n<p>Emit running totals for the <code>events</code> and <code>bytes</code> fields rather than per-batch\nstatistics.</p>\n<h2>Examples</h2>\n<p>Get the number of bytes read incrementally for a file:</p>\n<pre><code class=\"language-json\">{\"timestamp\": \"2023-04-28T10:22:10.192322\", \"bytes\": 16384}\n{\"timestamp\": \"2023-04-28T10:22:10.223612\", \"bytes\": 16384}\n{\"timestamp\": \"2023-04-28T10:22:10.297169\", \"bytes\": 16384}\n{\"timestamp\": \"2023-04-28T10:22:10.387172\", \"bytes\": 16384}\n{\"timestamp\": \"2023-04-28T10:22:10.408171\", \"bytes\": 8232}\n</code></pre>\n<p>Get the number of events read incrementally from a file:</p>\n<pre><code class=\"language-json\">{\"timestamp\": \"2023-04-28T10:26:45.159885\", \"events\": 65536, \"schema\": \"suricata.dns\", \"schema_id\": \"d49102998baae44a\"}\n{\"timestamp\": \"2023-04-28T10:26:45.812321\", \"events\": 412, \"schema\": \"suricata.dns\", \"schema_id\": \"d49102998baae44a\"}\n</code></pre>\n<p>Get the total number of events in a file, grouped by schema:</p>\n<pre><code class=\"language-json\">{\"events\": 65948, 
\"schema\": \"suricata.dns\"}\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/measure"
},
{
"label": "metrics",
"type": "keyword",
"detail": "Retrieves metrics events from a Tenzir node.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>metrics [--live]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>metrics</code> operator retrieves metrics events from a Tenzir node. Metrics\nevents are collected every second.</p>\n<h3><code>--live</code></h3>\n<p>Work on all metrics events as they are generated in real time instead of on\nmetrics events persisted at a Tenzir node.</p>\n<h2>Schemas</h2>\n<p>Tenzir collects metrics with the following schemas.</p>\n<h3><code>tenzir.metrics.cpu</code></h3>\n<p>Contains a measurement of CPU utilization.</p>\n<table>\n<thead>\n<tr><th>Field</th><th>Type</th><th>Description</th></tr>\n</thead>\n<tbody>\n<tr><td><code>loadavg_1m</code></td><td><code>double</code></td><td>The load average over the last minute.</td></tr>\n<tr><td><code>loadavg_5m</code></td><td><code>double</code></td><td>The load average over the last 5 minutes.</td></tr>\n<tr><td><code>loadavg_15m</code></td><td><code>double</code></td><td>The load average over the last 15 minutes.</td></tr>\n</tbody>\n</table>\n<h3><code>tenzir.metrics.disk</code></h3>\n<p>Contains a measurement of disk space usage.</p>\n<table>\n<thead>\n<tr><th>Field</th><th>Type</th><th>Description</th></tr>\n</thead>\n<tbody>\n<tr><td><code>path</code></td><td><code>string</code></td><td>The byte measurements below refer to the filesystem on which this path is located.</td></tr>\n<tr><td><code>total_bytes</code></td><td><code>uint64</code></td><td>The total size of the volume, in bytes.</td></tr>\n<tr><td><code>used_bytes</code></td><td><code>uint64</code></td><td>The number of bytes occupied on the volume.</td></tr>\n<tr><td><code>free_bytes</code></td><td><code>uint64</code></td><td>The number of bytes still free on the volume.</td></tr>\n</tbody>\n</table>\n<h3><code>tenzir.metrics.memory</code></h3>\n<p>Contains a measurement of the available memory on the host.</p>\n<table>\n<thead>\n<tr><th>Field</th><th>Type</th><th>Description</th></tr>\n</thead>\n<tbody>\n<tr><td><code>total_bytes</code></td><td><code>uint64</code></td><td>The total available memory, in bytes.</td></tr>\n<tr><td><code>used_bytes</code></td><td><code>uint64</code></td><td>The amount of memory used, in bytes.</td></tr>\n<tr><td><code>free_bytes</code></td><td><code>uint64</code></td><td>The amount of free memory, in bytes.</td></tr>\n</tbody>\n</table>\n<h3><code>tenzir.metrics.operator</code></h3>\n<p>Contains input and output measurements over some amount of time for a single\noperator instantiation.</p>\n<table>\n<thead>\n<tr><th>Field</th><th>Type</th><th>Description</th></tr>\n</thead>\n<tbody>\n<tr><td><code>pipeline_id</code></td><td><code>string</code></td><td>The ID of the pipeline that the associated operator belongs to.</td></tr>\n<tr><td><code>run</code></td><td><code>uint64</code></td><td>The number of the run, starting at 1 for the first run.</td></tr>\n<tr><td><code>hidden</code></td><td><code>bool</code></td><td>True if the pipeline is running for the Explorer.</td></tr>\n<tr><td><code>operator_id</code></td><td><code>uint64</code></td><td>The ID of the operator inside the pipeline referenced above.</td></tr>\n<tr><td><code>source</code></td><td><code>bool</code></td><td>True if this is the first operator in the pipeline.</td></tr>\n<tr><td><code>transformation</code></td><td><code>bool</code></td><td>True if this is neither the first nor the last operator.</td></tr>\n<tr><td><code>sink</code></td><td><code>bool</code></td><td>True if this is the last operator in the pipeline.</td></tr>\n<tr><td><code>internal</code></td><td><code>bool</code></td><td>True if the data flow is considered internal to Tenzir.</td></tr>\n<tr><td><code>timestamp</code></td><td><code>time</code></td><td>The time when this event was emitted (immediately after the collection period).</td></tr>\n<tr><td><code>duration</code></td><td><code>duration</code></td><td>The timespan over which this data was collected.</td></tr>\n<tr><td><code>starting_duration</code></td><td><code>duration</code></td><td>The time spent to start the operator.</td></tr>\n<tr><td><code>processing_duration</code></td><td><code>duration</code></td><td>The time spent processing the data.</td></tr>\n<tr><td><code>scheduled_duration</code></td><td><code>duration</code></td><td>The time that the operator was scheduled.</td></tr>\n<tr><td><code>running_duration</code></td><td><code>duration</code></td><td>The time that the operator was running.</td></tr>\n<tr><td><code>paused_duration</code></td><td><code>duration</code></td><td>The time that the operator was paused.</td></tr>\n<tr><td><code>input</code></td><td><code>record</code></td><td>Measurement of the incoming data stream.</td></tr>\n<tr><td><code>output</code></td><td><code>record</code></td><td>Measurement of the outgoing data stream.</td></tr>\n</tbody>\n</table>\n<p>The records <code>input</code> and <code>output</code> have the following schema:</p>\n<table>\n<thead>\n<tr><th>Field</th><th>Type</th><th>Description</th></tr>\n</thead>\n<tbody>\n<tr><td><code>unit</code></td><td><code>string</code></td><td>The type of the elements, which is <code>void</code>, <code>bytes</code>, or <code>events</code>.</td></tr>\n<tr><td><code>elements</code></td><td><code>uint64</code></td><td>Number of elements that were seen during the collection period.</td></tr>\n<tr><td><code>approx_bytes</code></td><td><code>uint64</code></td><td>An approximation for the number of bytes transmitted.</td></tr>\n</tbody>\n</table>\n<h3><code>tenzir.metrics.process</code></h3>\n<p>Contains a measurement of the amount of memory used by the <code>tenzir-node</code> process.</p>\n<table>\n<thead>\n<tr><th>Field</th><th>Type</th><th>Description</th></tr>\n</thead>\n<tbody>\n<tr><td><code>current_memory_usage</code></td><td><code>uint64</code></td><td>The memory currently used by this process, in bytes.</td></tr>\n<tr><td><code>peak_memory_usage</code></td><td><code>uint64</code></td><td>The peak amount of memory, in bytes.</td></tr>\n<tr><td><code>swap_space_usage</code></td><td><code>uint64</code></td><td>The amount of swap space, in bytes. Only available on Linux systems.</td></tr>\n</tbody>\n</table>\n<h2>Examples</h2>\n<p>Show the CPU usage over the last hour:</p>\n<pre><code class=\"language-c\">metrics\n| where #schema == \"tenzir.metrics.cpu\"\n| where timestamp > 1 hour ago\n| put timestamp, percent=loadavg_1m\n</code></pre>\n<details>\n<summary>Output</summary>\n<pre><code class=\"language-json\">{\n \"timestamp\": \"2023-12-21T12:00:32.631102\",\n \"percent\": 0.40478515625\n}\n{\n \"timestamp\": \"2023-12-21T11:59:32.626043\",\n \"percent\": 0.357421875\n}\n{\n \"timestamp\": \"2023-12-21T11:58:32.620327\",\n \"percent\": 0.42578125\n}\n{\n \"timestamp\": \"2023-12-21T11:57:32.614810\",\n \"percent\": 0.50390625\n}\n{\n \"timestamp\": \"2023-12-21T11:56:32.609896\",\n \"percent\": 0.32080078125\n}\n{\n \"timestamp\": \"2023-12-21T11:55:32.605871\",\n \"percent\": 0.5458984375\n}\n</code></pre>\n</details>\n<p>Get the current memory usage of the <code>tenzir-node</code> process:</p>\n<pre><code class=\"language-c\">metrics\n| where #schema == \"tenzir.metrics.process\"\n| sort timestamp desc\n| head 1\n| put current_memory_usage\n</code></pre>\n<details>\n<summary>Output</summary>\n<pre><code class=\"language-json\">{\n \"current_memory_usage\": 1083031552\n}\n</code></pre>\n</details>\n<p>Show the total pipeline ingress in bytes for every day over the last 
week,\nexcluding pipelines run in the Explorer:</p>\n<pre><code class=\"language-c\">metrics\n| where #schema == \"tenzir.metrics.operator\"\n| where timestamp > 1 week ago\n| where hidden == false and source == true\n| summarize bytes=sum(output.approx_bytes) by timestamp resolution 1 day\n</code></pre>\n<details>\n<summary>Output</summary>\n<pre><code class=\"language-json\">{\n \"timestamp\": \"2023-11-08T00:00:00.000000\",\n \"bytes\": 79927223\n}\n{\n \"timestamp\": \"2023-11-09T00:00:00.000000\",\n \"bytes\": 51788928\n}\n{\n \"timestamp\": \"2023-11-10T00:00:00.000000\",\n \"bytes\": 80740352\n}\n{\n \"timestamp\": \"2023-11-11T00:00:00.000000\",\n \"bytes\": 75497472\n}\n{\n \"timestamp\": \"2023-11-12T00:00:00.000000\",\n \"bytes\": 55497472\n}\n{\n \"timestamp\": \"2023-11-13T00:00:00.000000\",\n \"bytes\": 76546048\n}\n{\n \"timestamp\": \"2023-11-14T00:00:00.000000\",\n \"bytes\": 68643200\n}\n</code></pre>\n</details>\n<p>Show the three operator instantiations that produced the most events in total\nand their pipeline IDs:</p>\n<pre><code class=\"language-c\">metrics\n| where #schema == \"tenzir.metrics.operator\"\n| where output.unit == \"events\"\n| summarize events=max(output.elements) by pipeline_id, operator_id\n| sort events desc\n| head 3\n</code></pre>\n<details>\n<summary>Output</summary>\n<pre><code class=\"language-json\">{\n \"pipeline_id\": \"70a25089-b16c-448d-9492-af5566789b99\",\n \"operator_id\": 0,\n \"events\": 391008694\n}\n{\n \"pipeline_id\": \"7842733c-06d6-4713-9b80-e20944927207\",\n \"operator_id\": 0,\n \"events\": 246914949\n}\n{\n \"pipeline_id\": \"6df003be-0841-45ad-8be0-56ff4b7c19ef\",\n \"operator_id\": 1,\n \"events\": 83013294\n}\n</code></pre>\n</details>\n<p>Get the disk usage over time:</p>\n<pre><code class=\"language-c\">metrics\n| where #schema == \"tenzir.metrics.disk\"\n| sort timestamp\n| put timestamp, used_bytes\n</code></pre>\n<details>\n<summary>Output</summary>\n<pre><code class=\"language-json\">{\n 
\"timestamp\": \"2023-12-21T12:52:32.900086\",\n \"used_bytes\": 461834444800\n}\n{\n \"timestamp\": \"2023-12-21T12:53:32.905548\",\n \"used_bytes\": 461834584064\n}\n{\n \"timestamp\": \"2023-12-21T12:54:32.910918\",\n \"used_bytes\": 461840302080\n}\n{\n \"timestamp\": \"2023-12-21T12:55:32.916200\",\n \"used_bytes\": 461842751488\n}\n</code></pre>\n</details>\n<p>Get the memory usage over time:</p>\n<pre><code class=\"language-c\">metrics\n| where #schema == \"tenzir.metrics.memory\"\n| sort timestamp\n| put timestamp, used_bytes\n</code></pre>\n<details>\n<summary>Output</summary>\n<pre><code class=\"language-json\">{\n \"timestamp\": \"2023-12-21T13:08:32.982083\",\n \"used_bytes\": 48572645376\n}\n{\n \"timestamp\": \"2023-12-21T13:09:32.986962\",\n \"used_bytes\": 48380682240\n}\n{\n \"timestamp\": \"2023-12-21T13:10:32.992494\",\n \"used_bytes\": 48438878208\n}\n{\n \"timestamp\": \"2023-12-21T13:11:32.997889\",\n \"used_bytes\": 48491839488\n}\n{\n \"timestamp\": \"2023-12-21T13:12:33.003323\",\n \"used_bytes\": 48529952768\n}\n</code></pre>\n</details>",
"docLink": "https://docs.tenzir.com/operators/metrics"
},
{
"label": "nics",
"type": "keyword",
"detail": "Shows a snapshot of available network interfaces.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>nics\n</code></pre>\n<h2>Description</h2>\n<p>The <code>nics</code> operator shows a snapshot of all available network interfaces.</p>\n<h2>Schemas</h2>\n<p>Tenzir emits network interface card information with the following schema.</p>\n<h3><code>tenzir.nic</code></h3>\n<p>Contains detailed information about the network interface.</p>\n<p>|Field|Type|Description|\n|:-|:-|:-|\n|<code>name</code>|<code>string</code>|The name of the network interface.|\n|<code>description</code>|<code>string</code>|A brief note or explanation about the network interface.|\n|<code>addresses</code>|<code>list</code>|A list of IP addresses assigned to the network interface.|\n|<code>loopback</code>|<code>bool</code>|Indicates if the network interface is a loopback interface.|\n|<code>up</code>|<code>bool</code>|Indicates if the network interface is up and can transmit data.|\n|<code>running</code>|<code>bool</code>|Indicates if the network interface is running and operational.|\n|<code>wireless</code>|<code>bool</code>|Indicates if the network interface is a wireless interface.|\n|<code>status</code>|<code>record</code>|A record containing detailed status information about the network interface.|</p>\n<p>The record <code>status</code> has the following schema:</p>\n<p>|Field|Type|Description|\n|:-|:-|:-|\n|<code>unknown</code>|<code>bool</code>|Indicates if the network interface status is unknown.|\n|<code>connected</code>|<code>bool</code>|Indicates if the network interface is connected.|\n|<code>disconnected</code>|<code>bool</code>|Indicates if the network interface is disconnected.|\n|<code>not_applicable</code>|<code>bool</code>|Indicates if the network interface is not applicable.|</p>\n<h2>Examples</h2>\n<p>List all connected network interfaces.</p>\n<pre><code>nics\n| where status.connected == true\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/nics"
},
{
"label": "openapi",
"type": "keyword",
"detail": "Shows the node's OpenAPI specification.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>openapi\n</code></pre>\n<h2>Description</h2>\n<p>The <code>openapi</code> operator shows the current Tenzir node's <a href=\"/api\">OpenAPI\nspecification</a> for all available REST endpoint plugins.</p>\n<h2>Examples</h2>\n<p>Render the OpenAPI specification as YAML:</p>\n<pre><code>openapi | write yaml\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/openapi"
},
{
"label": "parse",
"type": "keyword",
"detail": "Applies a parser to the string stored in a given field.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>parse <input> <parser> <args>...\n</code></pre>\n<h2>Description</h2>\n<p>The <code>parse</code> operator parses a given <code><input></code> field of type <code>string</code> using\n<code><parser></code> and replaces this field with the result. <code><parser></code> can be one of the\nparsers in <a href=\"../formats.md\">formats</a>.</p>\n<h2>Examples</h2>\n<p>Parse <a href=\"../formats/cef.md\">CEF</a> from the Syslog messages stored in <code>test.log</code>,\nreturning only the result from CEF parser.</p>\n<pre><code>from test.log read syslog | parse content cef | yield content\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/parse"
},
{
"label": "pass",
"type": "keyword",
"detail": "Does nothing with the input.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>pass\n</code></pre>\n<h2>Description</h2>\n<p>The <code>pass</code> operator relays the input without any modification. It exists\nprimarily for testing and debugging.</p>\n<p>You can think of <code>pass</code> as the \"identity\" operator.</p>\n<h2>Examples</h2>\n<p>Forward the input without any changes:</p>\n<pre><code>pass\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/pass"
},
{
"label": "processes",
"type": "keyword",
"detail": "Shows a snapshot of running processes.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>processes\n</code></pre>\n<h2>Description</h2>\n<p>The <code>processes</code> operator shows a snapshot of all currently running processes.</p>\n<h2>Schemas</h2>\n<p>Tenzir emits process information with the following schema.</p>\n<h3><code>tenzir.process</code></h3>\n<p>Contains detailed information about the process.</p>\n<p>|Field|Type|Description|\n|:-|:-|:-|\n|<code>name</code>|<code>string</code>|The process name.|\n|<code>command_line</code>|<code>list<string></code>|The command line of the process.|\n|<code>pid</code>|<code>uint64</code>|The process identifier.|\n|<code>ppid</code>|<code>uint64</code>|The parent process identifier.|\n|<code>uid</code>|<code>uint64</code>|The user identifier of the process owner.|\n|<code>gid</code>|<code>uint64</code>|The group identifier of the process owner.|\n|<code>ruid</code>|<code>uint64</code>|The real user identifier of the process owner.|\n|<code>rgid</code>|<code>uint64</code>|The real group identifier of the process owner.|\n|<code>priority</code>|<code>string</code>|The priority level of the process.|\n|<code>startup</code>|<code>time</code>|The time when the process was started.|\n|<code>vsize</code>|<code>uint64</code>|The virtual memory size of the process.|\n|<code>rsize</code>|<code>uint64</code>|The resident set size (physical memory used) of the process.|\n|<code>swap</code>|<code>uint64</code>|The amount of swap memory used by the process.|\n|<code>peak_mem</code>|<code>uint64</code>|Peak memory usage of the process.|\n|<code>open_fds</code>|<code>uint64</code>|The number of open file descriptors by the process.|\n|<code>utime</code>|<code>duration</code>|The user CPU time consumed by the process.|\n|<code>stime</code>|<code>duration</code>|The system CPU time consumed by the process.|</p>\n<h2>Examples</h2>\n<p>Show running processes sorted by how long they've been running:</p>\n<pre><code>processes\n| sort startup desc\n</code></pre>\n<p>Show the top five 
running processes by name:</p>\n<pre><code>processes\n| top name\n| head 5\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/processes"
},
{
"label": "pseudonymize",
"type": "keyword",
"detail": "Pseudonymizes fields according to a given method.",
"processedHTML": "<div class=\"remark-container warning\"><div class=\"remark-container-title warning\">Deprecated</div><p>This operator will soon be removed in favor of first-class support for functions\nthat can be used in a variety of different operators and contexts.</p></div>\n<h2>Synopsis</h2>\n<pre><code>pseudonymize -m|--method=<string> -s|--seed=<seed> <extractor>...\n</code></pre>\n<h2>Description</h2>\n<p>The <code>pseudonimize</code> operator replaces IP address using the\n<a href=\"https://en.wikipedia.org/wiki/Crypto-PAn\">Crypto-PAn</a> algorithm.</p>\n<p>Currently, <code>pseudonimize</code> exclusively works for fields of type <code>ip</code>.</p>\n<h3><code>-m|--method=<string></code></h3>\n<p>The algorithm for pseudonimization</p>\n<h3><code>-s|--seed=<seed></code></h3>\n<p>A 64-byte seed that describes a hexadecimal value. When the seed is shorter than\n64 bytes, the operator will append zeros to match the size; when it is longer,\nit will truncate the seed.</p>\n<h3><code><extractor>...</code></h3>\n<p>The list of extractors describing fields to pseudonomize. If an extractor\nmatches types other than IP addresses, the operator will ignore them.</p>\n<h2>Example</h2>\n<p>Pseudonymize all values of the fields <code>src_ip</code> and <code>dest_ip</code> using the\n<code>crypto-pan</code> algorithm and <code>deadbeef</code> seed:</p>\n<pre><code>pseudonymize --method=\"crypto-pan\" --seed=\"deadbeef\" src_ip, dest_ip\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/pseudonymize"
},
{
"label": "publish",
"type": "keyword",
"detail": "Publishes events to a channel with a topic. The dual to",
"processedHTML": "<p><a href=\"subscribe.md\"><code>subscribe</code></a>.</p>\n<h2>Synopsis</h2>\n<pre><code>publish [<topic>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>publish</code> operator publishes events at a node in a channel with the\nspecified topic. Any number of subscribers using the <a href=\"subscribe.md\"><code>subscribe</code></a>\noperator receive the events immediately.</p>\n<h3><code><topic></code></h3>\n<p>An optional topic for publishing events under. The provided topic must be\nunique.</p>\n<p>Defaults to the empty string.</p>\n<h2>Examples</h2>\n<p>Publish Zeek conn logs under the topic <code>zeek-conn</code>.</p>\n<pre><code>from file conn.log read zeek-tsv | publish zeek-conn\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/publish"
},
{
"label": "put",
"type": "keyword",
"detail": "Returns new events that only contain a set of specified fields.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>put <field[=operand]>...\n</code></pre>\n<h2>Description</h2>\n<p>The <code>put</code> operator produces new events according to a specified list of fields.\nAll other fields are removed from the input.</p>\n<p>The difference between <code>put</code> and <a href=\"extend.md\"><code>extend</code></a> is that <code>put</code> drops all\nfields not explicitly specified, whereas <code>extend</code> only appends fields.</p>\n<h3><code><field[=operand]></code></h3>\n<p>The <code>field</code> describes the name of the field to select. The extended form with an\n<code>operand</code> assignment allows for computing functions over existing fields.</p>\n<p>If the right-hand side of the assignment\nis omitted, the field name is implicitly used as an extractor. If multiple\nfields match the extractor, the first matching field is used in the output. If\nno fields match, <code>null</code> is assigned instead.</p>\n<h3>Examples</h3>\n<p>Overwrite values of the field <code>payload</code> with a fixed value:</p>\n<pre><code class=\"language-c\">put payload=\"REDACTED\"\n</code></pre>\n<p>Create connection 4-tuples:</p>\n<pre><code class=\"language-c\">put src_ip, src_port, dst_ip, dst_port\n</code></pre>\n<p>Unlike <a href=\"select.md\"><code>select</code></a>, <code>put</code> reorders fields. If the specified fields\ndo not exist in the input, <code>null</code> values will be assigned.</p>\n<p>You can also reference existing fields:</p>\n<pre><code class=\"language-c\">put src_ip, src_port, dst_ip=dest_ip, dst_port=dest_port\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/put"
},
{
"label": "python",
"type": "keyword",
"detail": "Executes Python code against each event of the input.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>python [--requirements <string>] <code>\npython [--requirements <string>] --file <path>\n</code></pre>\n<div class=\"remark-container info\"><div class=\"remark-container-title info\">Requirements</div><p>A Python 3 (>=3.10) interpreter must be present in the <code>PATH</code> environment\nvariable of the <code>tenzir</code> or <code>tenzir-node</code> process.</p></div>\n<h2>Description</h2>\n<p>The <code>python</code> operator executes user-provided Python code against each event of\nthe input.</p>\n<p>By default, the Tenzir node executing the pipeline creates a virtual environment\ninto which the <code>tenzir</code> Python package is installed. This behavior can be turned\noff in the node configuration using the <code>plugin.python.create-venvs</code> boolean\noption.</p>\n<div class=\"remark-container note\"><div class=\"remark-container-title note\">Performance</div><p>The <code>python</code> operator implementation applies the provided Python code to each\ninput row one bw one. We use\n<a href=\"https://arrow.apache.org/docs/python/index.html\">PyArrow</a> to convert the input\nvalues to native Python data types and back to the Tenzir data model after the\ntransformation.</p></div>\n<h3><code>--requirements <string></code></h3>\n<p>The <code>--requirements</code> flag can be used to pass additional package dependencies in\nthe pip format. When it is used, the argument is passed on to <code>pip install</code> in a\ndedicated virtual environment.</p>\n<p>The string is passed verbatim to <code>pip install</code>. To add multiple dependencies,\nseparate them with a space: <code>--requirements \"foo bar\"</code>.</p>\n<h3><code><code></code></h3>\n<p>The provided Python code describes an event-for-event transformation, i.e., it\nis executed once for each input event and produces exactly output event.</p>\n<p>An implicitly defined <code>self</code> variable represents the event. 
Modify it to alter\nthe output of the operator. Fields of the event can be accessed with the dot\nnotation. For example, if the input event contains fields <code>a</code> and <code>b</code>, then the\nPython code can access and modify them using <code>self.a</code> and <code>self.b</code>. Similarly,\nnew fields are added by assigning to <code>self.fieldname</code> and existing fields can be\nremoved by deleting them from <code>self</code>. A newly added field must have the same\ntype in every event.</p>\n<h3><code>--file &lt;path&gt;</code></h3>\n<p>Instead of providing the code inline, the <code>--file</code> option allows for passing\na path to a file containing the code the operator executes per event.</p>\n<h2>Examples</h2>\n<p>Insert or modify the field <code>x</code> and set it to <code>\"hello, world\"</code>:</p>\n<pre><code>python 'self.x = \"hello, world\"'\n</code></pre>\n<p>Clear the contents of <code>self</code> to remove the implicit input values from the\noutput:</p>\n<pre><code>python '\n self.clear()\n self.x = 23\n'\n</code></pre>\n<p>Define a new field <code>x</code> as the square root of the field <code>y</code>, and remove <code>y</code> from\nthe output:</p>\n<pre><code>python '\n import math\n self.x = math.sqrt(self.y)\n del self.y\n'\n</code></pre>\n<p>Make use of third-party packages:</p>\n<pre><code>python --requirements \"requests~=2.30\" '\n import requests\n requests.post(\"http://imaginary.api/receive\", data=self)\n'\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/python"
},
{
"label": "rare",
"type": "keyword",
"detail": "Shows the least common values. The dual to top.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>rare <field> [--count-field=<count-field>|-c <count-field>]\n</code></pre>\n<h2>Description</h2>\n<p>Shows the least common values for a given field. For each unique value, a new event containing its count will be produced.</p>\n<h3><code><field></code></h3>\n<p>The name of the field to find the least common values for.</p>\n<h3><code>--count-field=<count-field>|-c <count-field></code></h3>\n<p>An optional argument specifying the field name of the count field. Defaults to <code>count</code>.</p>\n<p>The count field and the value field must have different names.</p>\n<h2>Examples</h2>\n<p>Find the least common values for field <code>id.orig_h</code>.</p>\n<pre><code>rare id.orig_h\n</code></pre>\n<p>Find the least common values for field <code>count</code> and present the value amount in a field <code>amount</code>.</p>\n<pre><code>rare count --count-field=amount\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/rare"
},
{
"label": "read",
"type": "keyword",
"detail": "The read operator converts raw bytes into events.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>read <format>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>read</code> operator parses events by interpreting its input bytes in a given\nformat.</p>\n<h3><code><format></code></h3>\n<p>The <a href=\"../formats.md\">format</a> used to convert raw bytes into events.</p>\n<p>Some formats have format-specific options. Please refer to the documentation of\nthe individual formats for more information.</p>\n<h2>Examples</h2>\n<p>Read the input bytes as Zeek TSV logs:</p>\n<pre><code>read zeek-tsv\n</code></pre>\n<p>Read the input bytes as Suricata Eve JSON:</p>\n<pre><code>read suricata\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/read"
},
{
"label": "rename",
"type": "keyword",
"detail": "Renames fields and types.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>rename <name=extractor>...\n</code></pre>\n<h2>Description</h2>\n<p>The <code>rename</code> operator assigns new names to fields or types. Renaming only\nmodifies metadata and is therefore computationally inexpensive. The operator\nhandles nested field extractors as well, but cannot perform field reordering,\ne.g., by hoisting nested fields into the top level.</p>\n<p>Renaming only takes place if the provided extractor on the right-hand side of\nthe assignment resolves to a field or type. Otherwise the assignment does\nnothing. If no extractors match, <code>rename</code> degenerates to <a href=\"pass.md\"><code>pass</code></a>.</p>\n<h3><code><name=extractor>...</code></h3>\n<p>An assignment of the form <code>name=extractor</code> renames the field or type identified\nby <code>extractor</code> to <code>name</code>.</p>\n<h2>Examples</h2>\n<p>Rename events of type <code>suricata.flow</code> to <code>connection</code>:</p>\n<pre><code>rename connection=:suricata.flow\n</code></pre>\n<p>Assign new names to the fields <code>src_ip</code> and <code>dest_ip</code>:</p>\n<pre><code>rename src=src_ip, dst=dest_ip\n</code></pre>\n<p>Give the nested field <code>orig_h</code> nested under the record <code>id</code> the name <code>src_ip</code>:</p>\n<pre><code>rename src=id.orig_h\n</code></pre>\n<p>Same as above, but consider fields at any nesting hierarchy:</p>\n<pre><code>rename src=orig_h\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/rename"
},
{
"label": "repeat",
"type": "keyword",
"detail": "Repeats the input a number of times.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>repeat [<repetitions>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>repeat</code> operator relays the input without any modification, and repeats its\ninputs a specified number of times. It is primarily used for testing and when\nworking with generated data.</p>\n<p>The repeat operator keeps its input in memory. Avoid using it to repeat large\ndata sets.</p>\n<h3><code><repetitions></code></h3>\n<p>The number of times to repeat the input data.</p>\n<p>If not specified, the operator repeats its input indefinitely.</p>\n<h2>Examples</h2>\n<p>Given the following events as JSON:</p>\n<pre><code class=\"language-json\">{\"number\": 1, \"text\": \"one\"}\n{\"number\": 2, \"text\": \"two\"}\n</code></pre>\n<p>The <code>repeat</code> operator will repeat them indefinitely, in order:</p>\n<pre><code>repeat\n</code></pre>\n<pre><code class=\"language-json\">{\"number\": 1, \"text\": \"one\"}\n{\"number\": 2, \"text\": \"two\"}\n{\"number\": 1, \"text\": \"one\"}\n{\"number\": 2, \"text\": \"two\"}\n{\"number\": 1, \"text\": \"one\"}\n{\"number\": 2, \"text\": \"two\"}\n// …\n</code></pre>\n<p>To just repeat the first event 5 times, use:</p>\n<pre><code>head 1 | repeat 5\n</code></pre>\n<pre><code class=\"language-json\">{\"number\": 1, \"text\": \"one\"}\n{\"number\": 1, \"text\": \"one\"}\n{\"number\": 1, \"text\": \"one\"}\n{\"number\": 1, \"text\": \"one\"}\n{\"number\": 1, \"text\": \"one\"}\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/repeat"
},
{
"label": "replace",
"type": "keyword",
"detail": "Replaces the fields matching the given extractors with fixed values.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>replace <extractor=operand>...\n</code></pre>\n<h2>Description</h2>\n<p>The <code>replace</code> operator mutates existing fields by providing a new value.</p>\n<p>The difference between <code>replace</code> and <a href=\"extend.md\"><code>extend</code></a> is that <code>replace</code>\noverwrites existing fields, whereas <code>extend</code> doesn't touch the input.</p>\n<h3><code><extractor=operand></code></h3>\n<p>The assignment consists of an <code>extractor</code> that matches against existing fields\nand an <code>operand</code> that defines the new field value.</p>\n<p>If <code>field</code> does not exist in the input, the operator degenerates to\n<a href=\"pass.md\"><code>pass</code></a>. Use the <a href=\"set.md\"><code>set</code></a> operator to extend fields that cannot\nbe replaced.</p>\n<h3>Examples</h3>\n<p>Replace the field the field <code>src_ip</code> with a fixed value:</p>\n<pre><code>replace src_ip=0.0.0.0\n</code></pre>\n<p>Replace all IP address with a fixed value:</p>\n<pre><code>replace :ip=0.0.0.0\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/replace"
},
{
"label": "save",
"type": "keyword",
"detail": "The save operator saves bytes to a connector.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>save <uri>\nsave <path>\nsave <connector>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>save</code> operator operates on raw bytes.</p>\n<p>Notably, it cannot be used after an operator that emits events, but rather only\nwith operators that emit bytes, e.g., <a href=\"write.md\"><code>write</code></a> or <a href=\"load.md\"><code>load</code></a>.</p>\n<h3><code><connector></code></h3>\n<p>The <a href=\"../connectors.md\">connector</a> used to save bytes.</p>\n<p>Some connectors have connector-specific options. Please refer to the\ndocumentation of the individual connectors for more information.</p>\n<h2>Examples</h2>\n<p>Write bytes to stdout:</p>\n<pre><code>save stdin\n</code></pre>\n<p>Write bytes to the file <code>path/to/eve.json</code>:</p>\n<pre><code>save path/to/eve.json\nsave file path/to/eve.json\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/save"
},
{
"label": "select",
"type": "keyword",
"detail": "Selects fields from the input.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>select <extractor>...\n</code></pre>\n<h2>Description</h2>\n<p>The <code>select</code> operator keeps only the fields matching the provided extractors and\nremoves all other fields. It is the dual to <a href=\"drop.md\"><code>drop</code></a>.</p>\n<p>In relational algebra, <code>select</code> performs a <em>projection</em> of the provided\narguments.</p>\n<h3><code><extractor>...</code></h3>\n<p>A comma-separated list of extractors that identify the fields to keep.</p>\n<h2>Examples</h2>\n<p>Only keep fields <code>foo</code> and <code>bar</code>:</p>\n<pre><code>select foo, bar\n</code></pre>\n<p>Select all fields of type <code>ip</code>:</p>\n<pre><code>select :ip\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/select"
},
{
"label": "serve",
"type": "keyword",
"detail": "Make events available under the \\[/serve REST API",
"processedHTML": "<p>endpoint](/api#/paths/~1serve/post).</p>\n<h2>Synopsis</h2>\n<pre><code>serve [--buffer-size <buffer-size>] <serve-id>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>serve</code> operator bridges between pipelines and the corresponding <code>/serve</code>\n<a href=\"/api#/paths/~1serve/post\">REST API endpoint</a>:</p>\n<p><img src=\"serve.excalidraw.svg\" alt=\"Serve Operator\"></p>\n<p>Pipelines ending with the <code>serve</code> operator exit when all events have been\ndelivered over the corresponding endpoint.</p>\n<h3><code>--buffer-size <buffer-size></code></h3>\n<p>The buffer size specifies the maximum number of events to keep in the <code>serve</code>\noperator to make them instantly available in the corresponding endpoint before\nthrottling the pipeline execution.</p>\n<p>Defaults to <code>64Ki</code>.</p>\n<h3><code><serve-id></code></h3>\n<p>The serve id is an identifier that uniquely identifies the operator. The <code>serve</code>\noperator errors when receiving a duplicate serve id.</p>\n<h2>Examples</h2>\n<h3>Read a Zeek conn.log, 100 events at a time:</h3>\n<pre><code class=\"language-bash\">tenzir 'from file path/to/conn.log read zeek-tsv | serve zeek-conn-logs'\n</code></pre>\n<pre><code class=\"language-bash\">curl \\\n -X POST \\\n -H \"Content-Type: application/json\" \\\n -d '{\"serve_id\": \"zeek-conn-logs\", \"continuation_token\": null, \"timeout\": \"1s\", \"max_events\": 100}' \\\n http://localhost:5160/api/v0/serve\n</code></pre>\n<p>This will return up to 100 events, or less if the specified timeout of 1 second\nexpired.</p>\n<p>Subsequent results for further events must specify a continuation token. 
The\ntoken is included in the response under <code>next_continuation_token</code> if there are\nfurther events to be retrieved from the endpoint.</p>\n<h3>Wait for an initial event</h3>\n<p>This pipeline will produce 10 events after 3 seconds of doing nothing.</p>\n<pre><code class=\"language-bash\">tenzir \"shell \\\"sleep 3; jq --null-input '{foo: 1}'\\\" | read json | repeat 10 | serve slow-events\"\n</code></pre>\n<pre><code class=\"language-bash\">curl \\\n -X POST \\\n -H \"Content-Type: application/json\" \\\n -d '{\"serve_id\": \"slow-events\", \"continuation_token\": null, \"timeout\": \"5s\", \"min_events\": 1}' \\\n http://localhost:5160/api/v0/serve\n</code></pre>\n<p>The call to <code>/serve</code> will wait up to 5 seconds for the first event from the pipeline arriving at the serve operator,\nand return immediately once the first event arrives.</p>",
"docLink": "https://docs.tenzir.com/operators/serve"
},
{
"label": "set",
"type": "keyword",
"detail": "Upserts fields in events.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>set <field=operand>...\n</code></pre>\n<h2>Description</h2>\n<p>The <code>set</code> operator sets a list of fields to the given values. It overwrites old\nvalues of fields matching the <code>field</code> expression, or creates new fields of a\ngiven name otherwise.</p>\n<h3><code><field=operand></code></h3>\n<p>The assignment consists of <code>field</code> that describes the new field name and\n<code>operand</code> that defines the field value. If the field name already exists, the\noperator replaces the value of the field.</p>\n<h3>Examples</h3>\n<p>Upsert new fields with fixed values:</p>\n<pre><code>set secret=\"xxx\", ints=[1, 2, 3], strs=[\"a\", \"b\", \"c\"]\n</code></pre>\n<p>Move a column, replacing the old value with <code>null</code>.</p>\n<pre><code>set source=src_ip, src_ip=null\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/set"
},
{
"label": "shell",
"type": "keyword",
"detail": "Executes a system command and hooks its stdin and stdout into the pipeline.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>shell <command>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>shell</code> operator executes the provided command by spawning a new process.\nThe input of the operator is forwarded to the child's standard input. Similarly,\nthe child's standard output is forwarded to the output of the operator.</p>\n<h3><code><command></code></h3>\n<p>The command to execute and hook into the pipeline processing.</p>\n<p>The value of <code>command</code> is a single string. If you would like to pass a command\nline as you would on the shell, use single or double quotes for escaping, e.g.,\n<code>shell 'jq -C'</code> or <code>shell \"jq -C\"</code>. The command is interpreted by <code>/bin/sh -c</code>.</p>\n<h2>Examples</h2>\n<p>Show a live log from the <code>tenzir-node</code> service:</p>\n<pre><code>shell \"journalctl -u tenzir-node -f | read json\"\n</code></pre>\n<p>Consider the use case of converting CSV to JSON:</p>\n<pre><code class=\"language-bash\">tenzir 'read csv | write json' | jq -C\n</code></pre>\n<p>The <code>write json</code> operator produces NDJSON. Piping this output to <code>jq</code> generates a\ncolored, tree-structured variation that is (arguably) easier to read. Using the\n<code>shell</code> operator, you can integrate Unix tools that rely on\nstdin/stdout for input/output as \"native\" operators that process raw bytes. For\nexample, in this pipeline:</p>\n<pre><code>write json | save stdout\n</code></pre>\n<p>The <a href=\"write.md\"><code>write</code></a> operator produces raw bytes and <a href=\"save.md\"><code>save</code></a>\naccepts raw bytes. 
The <code>shell</code> operator therefore fits right in the middle:</p>\n<pre><code>write json | shell \"jq -C\" | save stdout\n</code></pre>\n<p>Using <a href=\"../language/user-defined-operators.md\">user-defined operators</a>, we can\nexpose this (potentially verbose) post-processing more succinctly in the\npipeline language:</p>\n<pre><code class=\"language-yaml\">tenzir:\n  operators:\n    jsonize:\n      write json | shell \"jq -C\" | save stdout\n</code></pre>\n<p>Now you can use <code>jsonize</code> as a custom operator in a pipeline:</p>\n<pre><code class=\"language-bash\">tenzir 'read csv | where field > 42 | jsonize' &lt; file.csv\n</code></pre>\n<p>This mechanism also allows for wrapping more complex tool invocations.\n<a href=\"https://zeek.org\">Zeek</a>, for example, converts packets into structured network\nlogs. Tenzir already has support for consuming Zeek output with the formats\n<a href=\"../formats/zeek-json.md\"><code>zeek-json</code></a> and\n<a href=\"../formats/zeek-tsv.md\"><code>zeek-tsv</code></a>. But that requires running Tenzir\ndownstream of an existing Zeek instance. Sometimes you want instant Zeek analytics given a\nPCAP trace.</p>\n<p>With the <code>shell</code> operator, you can script a Zeek invocation and readily\npost-process the output with a rich set of operators to filter, reshape,\nenrich, or route the logs as structured data. 
Let's define a <code>zeek</code> operator for\nthat:</p>\n<pre><code class=\"language-yaml\">tenzir:\n  operators:\n    zeek:\n      shell \"zeek -r - LogAscii::output_to_stdout=T\n             JSONStreaming::disable_default_logs=T\n             JSONStreaming::enable_log_rotation=F\n             json-streaming-logs\"\n      | read zeek-json\n</code></pre>\n<p>Processing a PCAP trace now is a matter of calling the <code>zeek</code> operator:</p>\n<pre><code class=\"language-bash\">gunzip -c example.pcap.gz |\n  tenzir 'zeek | select id.orig_h, id.resp_h, id.resp_p | head 3'\n</code></pre>\n<pre><code class=\"language-json\">{\"id\": {\"orig_h\": null, \"resp_h\": null, \"resp_p\": null}}\n{\"id\": {\"orig_h\": \"192.168.168.100\", \"resp_h\": \"83.135.95.78\", \"resp_p\": 0}}\n{\"id\": {\"orig_h\": \"192.168.168.100\", \"resp_h\": \"83.135.95.78\", \"resp_p\": 22}}\n</code></pre>\n<p>NB: because <code>zeek</code> (i.e., <code>shell</code>) consumes bytes, Tenzir implicitly prepends the <code>load stdin</code> source operator, so we can omit it in this pipeline.</p>",
"docLink": "https://docs.tenzir.com/operators/shell"
},
{
"label": "show",
"type": "keyword",
"detail": "Returns information about a Tenzir node.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>show [<aspect>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>show</code> operator offers introspection capabilities to look at various\n<em>aspects</em> of a Tenzir node.</p>\n<h3><code><aspect></code></h3>\n<p>Describes the part of Tenzir to look at.</p>\n<p>Available aspects:</p>\n<ul>\n<li><code>config</code>: shows all current configuration options.</li>\n<li><code>connectors</code>: shows all available <a href=\"../connectors.md\">connectors</a>.</li>\n<li><code>contexts</code>: shows all available contexts.</li>\n<li><code>formats</code>: shows all available <a href=\"../formats.md\">formats</a>.</li>\n<li><code>operators</code>: shows all available <a href=\"../operators.md\">operators</a>.</li>\n<li><code>partitions</code>: shows all table partitions of a remote node.</li>\n<li><code>pipelines</code>: shows all managed pipelines of a remote node.</li>\n<li><code>plugins</code>: shows all loaded plugins.</li>\n</ul>\n<p>We also offer some additional aspects for experts who want to take a deeper\nlook at what's going on:</p>\n<ul>\n<li><code>build</code>: shows compile-time build information.</li>\n<li><code>dependencies</code>: shows information about build-time dependencies.</li>\n<li><code>fields</code>: shows all fields of existing tables at a remote node.</li>\n<li><code>schemas</code>: shows all schema definitions for which data is stored at the node.</li>\n<li><code>serves</code>: shows all pipelines with the <code>serve</code> sink operator currently\navailable from the <code>/serve</code> API endpoint.</li>\n<li><code>types</code>: shows all known types at a remote node.</li>\n</ul>\n<p>When no aspect is specified, all are shown.</p>\n<h2>Examples</h2>\n<p>Show all available connectors and formats:</p>\n<pre><code>show connectors\nshow formats\n</code></pre>\n<p>Show all transformations:</p>\n<pre><code>show operators | where transformation == true\n</code></pre>\n<p>Show all fields and partitions 
at a node:</p>\n<pre><code>show fields\nshow partitions\n</code></pre>\n<p>Show all aspects of a node:</p>\n<pre><code>show\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/show"
},
{
"label": "sigma",
"type": "keyword",
"detail": "Filter the input with Sigma rules and output matching events.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>sigma <rule> [--refresh-interval <refresh-interval>]\nsigma <directory> [--refresh-interval <refresh-interval>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>sigma</code> operator executes <a href=\"https://github.com/SigmaHQ/sigma\">Sigma rules</a> on\nits input. If a rule matches, the operator emits a <code>tenzir.sigma</code> event that\nwraps the input record into a new record along with the matching rule. The\noperator discards all events that do not match the provided rules.</p>\n<p>For each rule, the operator transpiles the YAML into an\n<a href=\"../language/expressions.md\">expression</a> and instantiates a\n<a href=\"where.md\"><code>where</code></a> operator, followed by <a href=\"put.md\"><code>put</code></a> to generate an output.\nHere's how the transpilation works. The Sigma rule YAML format requires a\n<code>detection</code> attribute that includes a map of named sub-expressions called <em>search\nidentifiers</em>. In addition, <code>detection</code> must include a final <code>condition</code> that\ncombines search identifiers using boolean algebra (AND, OR, and NOT) or\nsyntactic sugar to reference groups of search expressions, e.g., using the\n<code>1/all of *</code> or plain wildcard syntax. Consider the following <code>detection</code>\nembedded in a rule:</p>\n<pre><code class=\"language-yaml\">detection:\n  foo:\n    a: 42\n    b: \"evil\"\n  bar:\n    c: 1.2.3.4\n  condition: foo or not bar\n</code></pre>\n<p>We translate this rule piece by piece, building a symbol table of all keys (<code>foo</code> and\n<code>bar</code>). Each sub-expression is a valid expression in itself:</p>\n<ol>\n<li><code>foo</code>: <code>a == 42 && b == \"evil\"</code></li>\n<li><code>bar</code>: <code>c == 1.2.3.4</code></li>\n</ol>\n<p>Finally, we combine the expressions according to <code>condition</code>:</p>\n<pre><code class=\"language-c\">(a == 42 && b == \"evil\") || ! 
(c == 1.2.3.4)\n</code></pre>\n<p>We parse the YAML string values according to Tenzir's richer data model, e.g.,\nthe expression <code>c: 1.2.3.4</code> becomes a field named <code>c</code> and value <code>1.2.3.4</code> of\ntype <code>ip</code>, rather than a <code>string</code>. Sigma also comes with its own <a href=\"https://github.com/SigmaHQ/sigma-specification/blob/main/Taxonomy_specification.md\">event\ntaxonomy</a>\nto standardize field names. The <code>sigma</code> operator currently does not normalize\nfields according to this taxonomy but rather takes the field names verbatim from\nthe search identifier.</p>\n<p>Sigma uses <a href=\"https://github.com/SigmaHQ/sigma-specification/blob/main/Sigma_specification.md#value-modifiers\">value\nmodifiers</a>\nto select a concrete relational operator for a given search predicate. Without a\nmodifier, Sigma uses equality comparison (<code>==</code>) of field and value. For example,\nthe <code>contains</code> modifier changes the relational operator to substring search, and\nthe <code>re</code> modifier switches to a regular expression match. 
The table below shows\nwhat modifiers the <code>sigma</code> operator supports, where ✅ means implemented, 🚧 not\nyet implemented but possible, and ❌ not yet supported:</p>\n<table>\n<thead>\n<tr><th>Modifier</th><th>Use</th><th>sigmac</th><th>Tenzir</th></tr>\n</thead>\n<tbody>\n<tr><td><code>contains</code></td><td>perform a substring search with the value</td><td>✅</td><td>✅</td></tr>\n<tr><td><code>startswith</code></td><td>match the value as a prefix</td><td>✅</td><td>✅</td></tr>\n<tr><td><code>endswith</code></td><td>match the value as a suffix</td><td>✅</td><td>✅</td></tr>\n<tr><td><code>base64</code></td><td>encode the value with Base64</td><td>✅</td><td>✅</td></tr>\n<tr><td><code>base64offset</code></td><td>encode the value as all three possible Base64 variants</td><td>✅</td><td>✅</td></tr>\n<tr><td><code>utf16le</code>/<code>wide</code></td><td>transform the value to UTF-16 little endian</td><td>✅</td><td>🚧</td></tr>\n<tr><td><code>utf16be</code></td><td>transform the value to UTF-16 big endian</td><td>✅</td><td>🚧</td></tr>\n<tr><td><code>utf16</code></td><td>transform the value to UTF-16</td><td>✅</td><td>🚧</td></tr>\n<tr><td><code>re</code></td><td>interpret the value as a regular expression</td><td>✅</td><td>✅</td></tr>\n<tr><td><code>cidr</code></td><td>interpret the value as an IP CIDR</td><td>❌</td><td>✅</td></tr>\n<tr><td><code>all</code></td><td>change the expression logic from OR to AND</td><td>✅</td><td>✅</td></tr>\n<tr><td><code>lt</code></td><td>compare less than (<code><</code>) the value</td><td>❌</td><td>✅</td></tr>\n<tr><td><code>lte</code></td><td>compare less than or equal to (<code><=</code>) the value</td><td>❌</td><td>✅</td></tr>\n<tr><td><code>gt</code></td><td>compare greater than (<code>></code>) the value</td><td>❌</td><td>✅</td></tr>\n<tr><td><code>gte</code></td><td>compare greater than or equal to (<code>>=</code>) the value</td><td>❌</td><td>✅</td></tr>\n<tr><td><code>expand</code></td><td>expand the value to placeholder strings, e.g., <code>%something%</code></td><td>❌</td><td>❌</td></tr>\n</tbody>\n</table>\n<h3><code><rule.yaml></code></h3>\n<p>The rule to match.</p>\n<p>This invocation transpiles <code>rule.yaml</code> at the time of pipeline creation.</p>\n<h3><code><directory></code></h3>\n<p>The directory to watch.</p>\n<p>This invocation watches a directory and attempts to parse each contained file as\na Sigma rule. 
The <code>sigma</code> operator matches if <em>any</em> of the contained rules\nmatch, effectively creating a disjunction of all rules inside the directory.</p>\n<h3><code>--refresh-interval <refresh-interval></code></h3>\n<p>How often the Sigma operator looks at the specified rule or directory of rules\nto update its internal state.</p>\n<p>Defaults to 5 seconds.</p>\n<h2>Examples</h2>\n<p>Apply a Sigma rule to an EVTX file using\n<a href=\"https://github.com/omerbenamram/evtx\"><code>evtx_dump</code></a>:</p>\n<pre><code class=\"language-bash\">evtx_dump -o jsonl file.evtx | tenzir 'read json | sigma rule.yaml'\n</code></pre>\n<p>Apply a Sigma rule over historical data in a node from the last day:</p>\n<pre><code>export | where :timestamp > 1 day ago | sigma rule.yaml\n</code></pre>\n<p>Watch a directory of Sigma rules and apply all of them on a continuous stream of\nSuricata events:</p>\n<pre><code>from file --follow eve.json read suricata | sigma /tmp/rules/\n</code></pre>\n<p>When you add a new file to <code>/tmp/rules</code>, the <code>sigma</code> operator transpiles it and\nwill match it on all subsequent inputs.</p>",
"docLink": "https://docs.tenzir.com/operators/sigma"
},
{
"label": "slice",
"type": "keyword",
"detail": "Keeps a range of events within the half-open interval [begin, end).",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>slice [--begin <begin>] [--end <end>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>slice</code> operator selects a range of events from the input. The semantics of\nthe operator match Python's list slicing.</p>\n<h3><code>--begin <begin></code></h3>\n<p>A signed integer denoting the beginning (inclusive) of the range to keep. Use a\nnegative number to count from the end.</p>\n<h3><code>--end <end></code></h3>\n<p>A signed integer denoting the end (exclusive) of the range to keep. Use a\nnegative number to count from the end.</p>\n<h2>Examples</h2>\n<p>Get the second 100 events:</p>\n<pre><code>slice --begin 100 --end 200\n</code></pre>\n<p>Get the last five events:</p>\n<pre><code>slice --begin -5\n</code></pre>\n<p>Skip the last ten events:</p>\n<pre><code>slice --end -10\n</code></pre>\n<p>Return the last 50 events, except for the last 2:</p>\n<pre><code>slice --begin -50 --end -2\n</code></pre>\n<p>Skip the first and the last event:</p>\n<pre><code>slice --begin 1 --end -1\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/slice"
},
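/*
The slicing rules above (a half-open range [begin, end), with negative bounds counted from the end) can be modelled on an in-memory array. This is a minimal sketch under two assumptions: events are plain JavaScript objects in an array, and `sliceEvents` is a hypothetical helper, not part of Tenzir. It behaves like JavaScript's built-in `Array.prototype.slice`, which uses the same conventions.

```javascript
// Hypothetical helper: models `slice --begin <begin> --end <end>` on an array.
// Missing bounds default to the start/end; negative bounds count from the end.
function sliceEvents(events, begin, end) {
  const n = events.length;
  // Map a possibly negative or missing bound to an index in [0, n].
  const resolve = (i, fallback) => {
    if (i === undefined) return fallback;
    return i < 0 ? Math.max(n + i, 0) : Math.min(i, n);
  };
  const b = resolve(begin, 0);
  const e = resolve(end, n);
  // An empty or inverted range keeps nothing.
  return b < e ? events.slice(b, e) : [];
}

const events = Array.from({ length: 10 }, (_, i) => ({ id: i }));
// Like `slice --begin -5` (or `tail 5`): ids 5, 6, 7, 8, 9
console.log(sliceEvents(events, -5).map((e) => e.id));
// Like `slice --begin 1 --end -1`: ids 1 through 8
console.log(sliceEvents(events, 1, -1).map((e) => e.id));
```

In this model, `tail N` corresponds to `sliceEvents(events, -N)`, and out-of-range bounds such as `--begin -50` on ten events are clamped to the start.
*/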
{
"label": "sockets",
"type": "keyword",
"detail": "Shows a snapshot of open sockets.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>sockets\n</code></pre>\n<h2>Description</h2>\n<p>The <code>sockets</code> operator shows a snapshot of all currently open sockets.</p>\n<h2>Schemas</h2>\n<p>Tenzir emits socket information with the following schema.</p>\n<h3><code>tenzir.socket</code></h3>\n<p>Contains detailed information about the socket.</p>\n<table>\n<thead>\n<tr><th>Field</th><th>Type</th><th>Description</th></tr>\n</thead>\n<tbody>\n<tr><td><code>pid</code></td><td><code>uint64</code></td><td>The process identifier.</td></tr>\n<tr><td><code>process</code></td><td><code>string</code></td><td>The name of the process involved.</td></tr>\n<tr><td><code>protocol</code></td><td><code>uint64</code></td><td>The protocol used for the communication.</td></tr>\n<tr><td><code>local_addr</code></td><td><code>ip</code></td><td>The local IP address involved in the connection.</td></tr>\n<tr><td><code>local_port</code></td><td><code>port</code></td><td>The local port number involved in the connection.</td></tr>\n<tr><td><code>remote_addr</code></td><td><code>ip</code></td><td>The remote IP address involved in the connection.</td></tr>\n<tr><td><code>remote_port</code></td><td><code>port</code></td><td>The remote port number involved in the connection.</td></tr>\n<tr><td><code>state</code></td><td><code>string</code></td><td>The current state of the connection.</td></tr>\n</tbody>\n</table>\n<h2>Examples</h2>\n<p>Show process ID, local, and remote IP address of all sockets:</p>\n<pre><code>sockets\n| select pid, local_addr, remote_addr\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/sockets"
},
{
"label": "sort",
"type": "keyword",
"detail": "Sorts events.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>sort [--stable] <field> [<asc>|<desc>] [<nulls-first>|<nulls-last>]\n</code></pre>\n<h2>Description</h2>\n<p>Sorts events by a provided field.</p>\n<h3><code>--stable</code></h3>\n<p>Preserve the relative order of events that cannot be sorted because the provided\nfields resolve to the same value.</p>\n<h3><code><field></code></h3>\n<p>The name of the field to sort by.</p>\n<h3><code><asc>|<desc></code></h3>\n<p>Specifies the sort order.</p>\n<p>Defaults to <code>asc</code>.</p>\n<h3><code><nulls-first>|<nulls-last></code></h3>\n<p>Specifies how to order null values.</p>\n<p>Defaults to <code>nulls-last</code>.</p>\n<h2>Examples</h2>\n<p>Sort by the <code>timestamp</code> field in ascending order.</p>\n<pre><code>sort timestamp\n</code></pre>\n<p>Sort by the <code>timestamp</code> field in descending order.</p>\n<pre><code>sort timestamp desc\n</code></pre>\n<p>Arrange by field <code>foo</code> and put null values first:</p>\n<pre><code>sort foo nulls-first\n</code></pre>\n<p>Arrange by field <code>foo</code> in descending order and put null values first:</p>\n<pre><code>sort foo desc nulls-first\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/sort"
},
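/*
The ordering options above combine a direction (`asc`/`desc`) with a null placement (`nulls-first`/`nulls-last`). This is a minimal sketch of one way to express that as a comparator. `sortEvents` is a hypothetical helper, not Tenzir's implementation. It assumes top-level fields holding numbers or strings, and it treats a missing field like null. Null placement is assumed to be independent of the direction, as the `sort foo desc nulls-first` example suggests.

```javascript
// Hypothetical helper modelling `sort <field> [asc|desc] [nulls-first|nulls-last]`.
// Missing fields count as null.
function sortEvents(events, field, order = "asc", nulls = "nulls-last") {
  const sign = order === "desc" ? -1 : 1;
  const nullsFirst = nulls === "nulls-first";
  // Array.prototype.sort is stable since ES2019, so ties keep their input
  // order, which is the guarantee `--stable` asks for.
  return [...events].sort((a, b) => {
    const x = a[field] ?? null;
    const y = b[field] ?? null;
    if (x === null || y === null) {
      if (x === y) return 0;
      // Null placement does not flip with the sort direction.
      return (x === null) === nullsFirst ? -1 : 1;
    }
    return x < y ? -sign : x > y ? sign : 0;
  });
}

const sampleEvents = [{ foo: 2 }, { foo: null }, { foo: 1 }, {}, { foo: 3 }];
// `sort foo`: 1, 2, 3, null, null
console.log(sortEvents(sampleEvents, "foo").map((e) => e.foo ?? null));
// `sort foo desc nulls-first`: null, null, 3, 2, 1
console.log(sortEvents(sampleEvents, "foo", "desc", "nulls-first").map((e) => e.foo ?? null));
```
*/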
{
"label": "subscribe",
"type": "keyword",
"detail": "Subscribes to events from a channel with a topic. The dual to publish.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>subscribe [<topic>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>subscribe</code> operator subscribes to events from a channel with the specified\ntopic. Multiple <code>subscribe</code> operators with the same topic receive the same\nevents. It is the dual to <a href=\"publish.md\"><code>publish</code></a>.</p>\n<h3><code><topic></code></h3>\n<p>An optional topic identifying the channel events are published under.</p>\n<h2>Examples</h2>\n<p>Subscribe to the events under the topic <code>zeek-conn</code>:</p>\n<pre><code>subscribe zeek-conn\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/subscribe"
},
{
"label": "summarize",
"type": "keyword",
"detail": "Groups events and applies aggregate functions on each group.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>summarize <[field=]aggregation>... [by <extractor>... [resolution <duration>]]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>summarize</code> operator groups events according to a grouping expression and\napplies an aggregation function over each group. The operator consumes the\nentire input before producing an output.</p>\n<p>Fields that neither occur in an aggregation function nor in the <code>by</code> list\nare dropped from the output.</p>\n<h3><code>[field=]aggregation</code></h3>\n<p>Aggregation functions compute a single value from one or more columns in a given\ngroup. Syntactically, <code>aggregation</code> has the form <code>f(x)</code> where <code>f</code> is the\naggregation function and <code>x</code> is a field.</p>\n<p>By default, the new field is named after the string representation of the\naggregation, e.g., <code>min(timestamp)</code>. You can specify a different name by\nprepending a field assignment, e.g., <code>min_ts=min(timestamp)</code>.</p>\n<p>The following aggregation functions are available:</p>\n<ul>\n<li><code>sum</code>: Computes the sum of all grouped values.</li>\n<li><code>min</code>: Computes the minimum of all grouped values.</li>\n<li><code>max</code>: Computes the maximum of all grouped values.</li>\n<li><code>any</code>: Computes the disjunction (OR) of all grouped values. Requires the\nvalues to be booleans.</li>\n<li><code>all</code>: Computes the conjunction (AND) of all grouped values. 
Requires the\nvalues to be booleans.</li>\n<li><code>distinct</code>: Creates a sorted list of all unique grouped values that are not\nnull.</li>\n<li><code>sample</code>: Takes the first of all grouped values that is not null.</li>\n<li><code>count</code>: Counts all grouped values that are not null.</li>\n<li><code>count_distinct</code>: Counts all distinct grouped values that are not null.</li>\n</ul>\n<h3><code>by <extractor></code></h3>\n<p>The extractors specified after the optional <code>by</code> clause partition the input into\ngroups. If <code>by</code> is omitted, all events are assigned to the same group.</p>\n<h3><code>resolution <duration></code></h3>\n<p>The optional <code>resolution</code> option specifies a duration that sets the\ntolerance when comparing time values in the <code>by</code> clause. For example,\n<code>01:48</code> is rounded down to <code>01:00</code> when a 1-hour <code>resolution</code> is used.</p>\n<p>NB: we introduced the <code>resolution</code> option as a stop-gap measure to compensate for\nthe lack of a rounding function. 
The ability to apply functions in the grouping\nexpression will replace this option in the future.</p>\n<h2>Examples</h2>\n<p>Group the input by <code>src_ip</code> and aggregate all unique <code>dest_port</code> values into a\nlist:</p>\n<pre><code>summarize distinct(dest_port) by src_ip\n</code></pre>\n<p>Same as above, but count the number of unique values instead of producing a\nlist:</p>\n<pre><code>summarize count_distinct(dest_port) by src_ip\n</code></pre>\n<p>Compute the minimum and maximum of the <code>timestamp</code> field per <code>src_ip</code> group:</p>\n<pre><code>summarize min(timestamp), max(timestamp) by src_ip\n</code></pre>\n<p>Compute the minimum and maximum of the <code>timestamp</code> field over all events:</p>\n<pre><code>summarize min(timestamp), max(timestamp)\n</code></pre>\n<p>Create a boolean flag <code>originator</code> that is <code>true</code> if any value in the group is\n<code>true</code>:</p>\n<pre><code>summarize originator=any(is_orig) by src_ip\n</code></pre>\n<p>Create 1-hour groups and produce a summary of network traffic between host\npairs:</p>\n<pre><code>summarize sum(bytes_in), sum(bytes_out) by ts, src_ip, dest_ip resolution 1 hour\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/summarize"
},
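/*
The grouping-and-aggregation model above can be sketched as a hash-based group-by over an array of objects. This is a hypothetical illustration, not Tenzir's implementation. `summarize` and `AGGREGATIONS` are names chosen for the sketch, and it covers only `sum`, `min`, `max`, `count`, `distinct`, and `count_distinct`. `any`, `all`, `sample`, and `resolution` are left out. Null and missing inputs are skipped before aggregating, following the documented null handling of `count` and `distinct`. Distinctness uses JavaScript `Set` semantics, so it only works for primitive values.

```javascript
// Hypothetical aggregation functions over the non-null values of one group.
const AGGREGATIONS = {
  sum: (xs) => (xs.length ? xs.reduce((acc, x) => acc + x, 0) : null),
  min: (xs) => (xs.length ? xs.reduce((a, b) => (b < a ? b : a)) : null),
  max: (xs) => (xs.length ? xs.reduce((a, b) => (b > a ? b : a)) : null),
  count: (xs) => xs.length,
  distinct: (xs) => [...new Set(xs)].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0)),
  count_distinct: (xs) => new Set(xs).size,
};

// `aggregations` maps an output field name to [function, input field];
// `by` lists the grouping fields (empty means a single group).
function summarize(events, aggregations, by = []) {
  // Partition the input; the JSON-encoded grouping values act as the map key.
  const groups = new Map();
  for (const event of events) {
    const keyValues = by.map((field) => event[field] ?? null);
    const key = JSON.stringify(keyValues);
    if (!groups.has(key)) groups.set(key, { keyValues, members: [] });
    groups.get(key).members.push(event);
  }
  // Emit one row per group: the grouping fields plus each aggregate.
  // All other fields are dropped, as described above.
  const rows = [];
  for (const { keyValues, members } of groups.values()) {
    const row = {};
    by.forEach((field, i) => {
      row[field] = keyValues[i];
    });
    for (const [name, [fn, field]] of Object.entries(aggregations)) {
      const values = members
        .map((event) => event[field])
        .filter((v) => v !== null && v !== undefined);
      row[name] = AGGREGATIONS[fn](values);
    }
    rows.push(row);
  }
  return rows;
}

const flows = [
  { src_ip: "10.0.0.1", dest_port: 443 },
  { src_ip: "10.0.0.1", dest_port: 80 },
  { src_ip: "10.0.0.1", dest_port: 443 },
  { src_ip: "10.0.0.2", dest_port: 22 },
];
// Roughly `summarize distinct(dest_port), count_distinct(dest_port) by src_ip`.
console.log(
  summarize(
    flows,
    {
      "distinct(dest_port)": ["distinct", "dest_port"],
      "count_distinct(dest_port)": ["count_distinct", "dest_port"],
    },
    ["src_ip"]
  )
);
```

Like the operator, this sketch must see the entire input before it can emit any row.
*/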
{
"label": "tail",
"type": "keyword",
"detail": "Limits the input to the last N events.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>tail [<limit>]\n</code></pre>\n<h2>Description</h2>\n<p>The semantics of the <code>tail</code> operator are the same as those of the equivalent Unix tool:\nconsume all input and only display the last <em>N</em> events.</p>\n<p><code>tail <limit></code> is a shorthand notation for <a href=\"slice.md\"><code>slice --begin -<limit></code></a>.</p>\n<h3><code><limit></code></h3>\n<p>An unsigned integer denoting how many events to keep.</p>\n<p>Defaults to 10.</p>\n<h2>Examples</h2>\n<p>Get the last ten results:</p>\n<pre><code>tail\n</code></pre>\n<p>Get the last five results:</p>\n<pre><code>tail 5\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/tail"
},
{
"label": "taste",
"type": "keyword",
"detail": "Limits the input to N events per unique schema.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>taste [<limit>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>taste</code> operator provides an exemplary overview of the \"shape\" of the data\ndescribed by the pipeline. This helps to understand the diversity of the\nresult, especially when interactively exploring data. Usually, the first <em>N</em>\nevents are returned, but this is not guaranteed.</p>\n<h3><code><limit></code></h3>\n<p>An unsigned integer denoting how many events to keep per schema.</p>\n<p>Defaults to 10.</p>\n<h2>Examples</h2>\n<p>Get 10 results of each unique schema:</p>\n<pre><code>taste\n</code></pre>\n<p>Get one sample for every unique event type:</p>\n<pre><code>taste 1\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/taste"
},
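/*
The per-schema limit described for `taste` can be sketched as a counter keyed by schema. This is a hypothetical illustration: `taste` here is a plain JavaScript function, not Tenzir code. Because plain objects carry no schema name, the sketch approximates a schema by the sorted set of top-level field names, while Tenzir uses each event's actual schema. The sketch always keeps the first N events per schema. The operator documentation does not guarantee that.

```javascript
// Hypothetical helper modelling `taste <limit>`: keep at most `limit` events
// per schema. `schemaOf` defaults to the sorted top-level field names.
function taste(events, limit = 10, schemaOf = (e) => Object.keys(e).sort().join(",")) {
  const seen = new Map();
  return events.filter((event) => {
    const schema = schemaOf(event);
    const count = seen.get(schema) ?? 0;
    seen.set(schema, count + 1);
    return count < limit;
  });
}

const mixed = [{ a: 1 }, { a: 2 }, { b: "x" }, { a: 3 }, { b: "y" }];
// Like `taste 1`: keeps { a: 1 } and { b: "x" }
console.log(taste(mixed, 1));
```
*/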
{
"label": "timeshift",
"type": "keyword",
"detail": "Adjusts timestamps relative to a given start time, with an optional speedup.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>timeshift [--start <time>] [--speed <factor>] <field>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>timeshift</code> operator adjusts a series of time values by anchoring them\naround a given start time.</p>\n<p>With <code>--speed</code>, you can adjust the relative speed of the time series induced by\n<code>field</code> with a multiplicative factor. This has the effect of making the time\nseries \"faster\" for values greater than 1 and \"slower\" for values less than 1.</p>\n<p>If you do not provide a start time with <code>--start</code>, the operator will anchor the\ntimestamps at the first non-null timestamp.</p>\n<p><img src=\"timeshift.excalidraw.svg\" alt=\"Timeshift\"></p>\n<p>The options <code>--start</code> and <code>--speed</code> work independently, i.e., you can use them\nseparately or both together.</p>\n<h3><code>--start <time></code></h3>\n<p>The timestamp to anchor the time values around.</p>\n<p>Defaults to the first non-null timestamp in <code>field</code>.</p>\n<h3><code>--speed <factor></code></h3>\n<p>A constant factor by which the inter-arrival times are divided. For example, 2.0\nhalves the gaps between events, resulting in a dataflow that is twice as fast.\nA value of 0.1 creates a dataflow that spans ten times the original time frame.</p>\n<p>Defaults to 1.0.</p>\n<h3><code><field></code></h3>\n<p>The name of the field containing the timestamp values.</p>\n<h2>Examples</h2>\n<p>Set the M57 Zeek logs to begin at Jan 1, 1984:</p>\n<pre><code>from https://storage.googleapis.com/tenzir-datasets/M57/zeek-all.log.zst read zeek-tsv\n| timeshift --start 1984-01-01 ts\n</code></pre>\n<p>As above, but also make the time span of the trace 100 times longer:</p>\n<pre><code>from https://storage.googleapis.com/tenzir-datasets/M57/zeek-all.log.zst read zeek-tsv\n| timeshift --start 1984-01-01 --speed 0.01 ts\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/timeshift"
},
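/*
The anchoring and speed rules above amount to a simple formula: new = start + (t - first) / speed, where `first` is the first non-null timestamp. This is a minimal sketch of that formula. It assumes timestamps are milliseconds since the Unix epoch in a top-level field. `timeshift` here is a hypothetical JavaScript helper, not the operator's implementation.

```javascript
// Hypothetical helper modelling `timeshift [--start <time>] [--speed <factor>] <field>`.
// Each timestamp keeps its offset from the first non-null timestamp, divided
// by `speed`, relative to the anchor `start`. Null timestamps pass through.
function timeshift(events, field, { start, speed = 1.0 } = {}) {
  const isSet = (t) => t !== null && t !== undefined;
  const first = events.map((e) => e[field]).find(isSet);
  if (first === undefined) return events; // nothing to anchor
  const anchor = start ?? first; // default anchor: the first non-null timestamp
  return events.map((e) =>
    isSet(e[field]) ? { ...e, [field]: anchor + (e[field] - first) / speed } : e
  );
}

const trace = [{ ts: 1000 }, { ts: 3000 }, { ts: null }, { ts: 5000 }];
// Anchor at 1984-01-01 (UTC) and replay twice as fast: the 2 s gaps become 1 s.
const shifted = timeshift(trace, "ts", { start: Date.UTC(1984, 0, 1), speed: 2 });
console.log(shifted.map((e) => (e.ts === null ? null : new Date(e.ts).toISOString())));
```

A speed below 1 stretches the gaps instead. With `speed: 0.01`, as in the second example above, every gap becomes 100 times longer.
*/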
{
"label": "to",
"type": "keyword",
"detail": "Consumes events by combining a connector and a format.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>to <uri> [write <format>]\nto <path> [write <format>]\nto <connector> [write <format>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>to</code> operator consumes events at the end of a pipeline by bringing together\na <a href=\"../connectors.md\">connector</a> and a <a href=\"../formats.md\">format</a>.</p>\n<p>If given something that looks like a path to a file, the connector can pick\nout a format automatically based on the file extension or the file name.\nThis enables a shorter syntax, e.g., <code>to ./file.csv</code> uses the <code>csv</code> format.\nAll connectors also have a default format, which will be used\nif the format can't be determined by the path. For most connectors,\nthis default format is <code>json</code>.\nSo, for example, <code>to stdout</code> uses the <code>json</code> format.</p>\n<p>Additionally, if a file extension indicating compression can be found,\n<a href=\"compress.md\"><code>compress</code></a> is automatically used. For example, <code>to myfile.json.gz</code> is automatically gzip-compressed and formatted as JSON, i.e.,\n<code>write json | compress gzip | save myfile.json.gz</code>.</p>\n<p>The <code>to</code> operator is a pipeline under the hood. In most cases, it is equivalent to\n<code>write <format> | save <connector></code>. However, for some combinations of\nconnectors and formats the underlying pipeline is a bit more complex. We\nrecommend always using <code>to ... write ...</code> over the <a href=\"write.md\"><code>write</code></a> and\n<a href=\"save.md\"><code>save</code></a> operators.</p>\n<h3><code><connector></code></h3>\n<p>The <a href=\"../connectors.md\">connector</a> used to save bytes.</p>\n<p>Some connectors have connector-specific options. 
Please refer to the\ndocumentation of the individual connectors for more information.</p>\n<h3><code><format></code></h3>\n<p>The <a href=\"../formats.md\">format</a> used to print events to bytes.</p>\n<p>Some formats have format-specific options. Please refer to the documentation of\nthe individual formats for more information.</p>\n<h2>Examples</h2>\n<p>Write events to stdout formatted as CSV.</p>\n<pre><code>to stdout write csv\n</code></pre>\n<p>Write events to the file <code>path/to/eve.json</code> formatted as JSON.</p>\n<pre><code>to path/to/eve.json write json\nto file path/to/eve.json write json\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/to"
},
{
"label": "top",
"type": "keyword",
"detail": "Shows the most common values. The dual to rare.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>top <field> [--count-field=<count-field>|-c <count-field>]\n</code></pre>\n<h2>Description</h2>\n<p>Shows the most common values for a given field. For each unique value, the operator produces a new event containing its count.</p>\n<h3><code><field></code></h3>\n<p>The name of the field to find the most common values for.</p>\n<h3><code>--count-field=<count-field>|-c <count-field></code></h3>\n<p>An optional argument specifying the field name of the count field. Defaults to <code>count</code>.</p>\n<p>The count field and the value field must have different names.</p>\n<h2>Examples</h2>\n<p>Find the most common values for field <code>id.orig_h</code>.</p>\n<pre><code>top id.orig_h\n</code></pre>\n<p>Find the most common values for field <code>count</code> and store the number of occurrences in a field <code>amount</code>.</p>\n<pre><code>top count --count-field=amount\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/top"
},
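/*
The counting behavior described for `top` can be sketched as a frequency map sorted by descending count. This is a hypothetical illustration: `top` here is a plain JavaScript function over a top-level field, not Tenzir code. Two behaviors are assumptions of the sketch and are not documented above. Null or missing values are counted like any other value, and ties keep their first-seen order.

```javascript
// Hypothetical helper modelling `top <field> [--count-field=<name>]`: one output
// event per distinct value with its number of occurrences, most common first.
function top(events, field, countField = "count") {
  if (countField === field) {
    throw new Error("the count field and the value field must have different names");
  }
  const counts = new Map();
  for (const event of events) {
    const value = event[field];
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  // Sorting is stable, so equally frequent values keep first-seen order.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([value, n]) => ({ [field]: value, [countField]: n }));
}

const conns = [{ orig_h: "10.0.0.1" }, { orig_h: "10.0.0.2" }, { orig_h: "10.0.0.1" }];
// 10.0.0.1 with count 2, then 10.0.0.2 with count 1
console.log(top(conns, "orig_h"));
// Like `top count --count-field=amount`
console.log(top([{ count: 7 }, { count: 7 }], "count", "amount"));
```
*/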
{
"label": "unflatten",
"type": "keyword",
"detail": "Unflattens data structures whose field names imply a nested structure.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>unflatten [<separator>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>unflatten</code> operator creates nested records out of record entries whose\nnames include a separator, thus turning flat field names back into nested records.</p>\n<div class=\"remark-container info\"><p><code>unflatten</code> uses a heuristic to determine the unflattened schema. Thus, the\nschema of a record that has been flattened using the <a href=\"flatten.md\"><code>flatten</code></a> operator and\nunflattened afterwards may not be identical to the schema of the unmodified\nrecord.</p></div>\n<h3><code><separator></code></h3>\n<p>The separator string to unflatten records with.</p>\n<p>Defaults to <code>.</code>.</p>\n<h2>Examples</h2>\n<p>Consider the following data:</p>\n<pre><code class=\"language-json\">{\n  \"src_ip\": \"147.32.84.165\",\n  \"src_port\": 1141,\n  \"dest_ip\": \"147.32.80.9\",\n  \"dest_port\": 53,\n  \"event_type\": \"dns\",\n  \"dns.type\": \"query\",\n  \"dns.id\": 553,\n  \"dns.rrname\": \"irc.freenode.net\",\n  \"dns.rrtype\": \"A\",\n  \"dns.tx_id\": 0,\n  \"dns.grouped.A\": [\"tenzir.com\"]\n}\n</code></pre>\n<p>The <code>unflatten</code> operator recreates nested records from fields that contain the <code>.</code>\nseparator:</p>\n<pre><code class=\"language-json\">{\n  \"src_ip\": \"147.32.84.165\",\n  \"src_port\": 1141,\n  \"dest_ip\": \"147.32.80.9\",\n  \"dest_port\": 53,\n  \"event_type\": \"dns\",\n  \"dns\": {\n    \"type\": \"query\",\n    \"id\": 553,\n    \"rrname\": \"irc.freenode.net\",\n    \"rrtype\": \"A\",\n    \"tx_id\": 0,\n    \"grouped\": {\n      \"A\": [\n        \"tenzir.com\"\n      ]\n    }\n  }\n}\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/unflatten"
},
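/*
The separator-based nesting described for `unflatten` can be sketched by splitting each field name and walking or creating nested objects. This is a hypothetical illustration, not the operator's heuristic. `unflatten` here is a plain JavaScript function for a single event. When a scalar and a nested field collide (for example `a` and `a.b`), this sketch simply lets the later field win.

```javascript
// Hypothetical helper modelling `unflatten [<separator>]` for one event:
// split each top-level field name on the separator and rebuild nested objects.
function unflatten(event, separator = ".") {
  const result = {};
  for (const [name, value] of Object.entries(event)) {
    const parts = name.split(separator);
    const leaf = parts.pop();
    let node = result;
    for (const part of parts) {
      const child = node[part];
      // Replace anything that is not a plain nested object (later field wins).
      if (typeof child !== "object" || child === null || Array.isArray(child)) {
        node[part] = {};
      }
      node = node[part];
    }
    node[leaf] = value;
  }
  return result;
}

const flat = { event_type: "dns", "dns.type": "query", "dns.id": 553, "dns.grouped.A": ["tenzir.com"] };
console.log(JSON.stringify(unflatten(flat)));
// {"event_type":"dns","dns":{"type":"query","id":553,"grouped":{"A":["tenzir.com"]}}}
```
*/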
{
"label": "unique",
"type": "keyword",
"detail": "Removes adjacent duplicates.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>unique\n</code></pre>\n<h2>Description</h2>\n<p>The <code>unique</code> operator removes adjacent duplicate events, similar to the Unix tool\n<code>uniq</code>.</p>\n<p>A frequent use case is <a href=\"select.md\">selecting a set of fields</a>, <a href=\"sort.md\">sorting the\ninput</a>, and then removing duplicates from the input.</p>\n<h2>Examples</h2>\n<p>Consider the following data:</p>\n<pre><code class=\"language-json\">{\"foo\": 1, \"bar\": \"a\"}\n{\"foo\": 1, \"bar\": \"a\"}\n{\"foo\": 1, \"bar\": \"a\"}\n{\"foo\": 1, \"bar\": \"b\"}\n{\"foo\": null, \"bar\": \"b\"}\n{\"bar\": \"b\"}\n{\"foo\": null, \"bar\": \"b\"}\n{\"foo\": null, \"bar\": \"b\"}\n</code></pre>\n<p>The <code>unique</code> operator removes adjacent duplicates and produces the following output:</p>\n<pre><code class=\"language-json\">{\"foo\": 1, \"bar\": \"a\"}\n{\"foo\": 1, \"bar\": \"b\"}\n{\"foo\": null, \"bar\": \"b\"}\n{\"bar\": \"b\"}\n{\"foo\": null, \"bar\": \"b\"}\n</code></pre>\n<p>Note that the output still contains the event <code>{\"foo\": null, \"bar\": \"b\"}</code> twice.\nThis is because <code>unique</code> only removes <em>adjacent</em> duplicates.</p>\n<p>To remove <em>all</em> duplicates (including non-adjacent ones), <a href=\"sort.md\"><code>sort</code></a>\nthe input first so that duplicate values lie adjacent to each other. Unlike\ndeduplication via <code>unique</code>, sorting is a blocking operation and consumes\nthe entire input before producing outputs.</p>",
"docLink": "https://docs.tenzir.com/operators/unique"
},
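/*
The adjacent-only deduplication described for `unique` needs just one remembered value: the previous event. This is a minimal sketch, where `unique` is a hypothetical JavaScript helper rather than Tenzir code. Event equality is approximated by comparing `JSON.stringify` encodings. That comparison is sensitive to field order, and it correctly distinguishes `{"bar": "b"}` from `{"foo": null, "bar": "b"}`.

```javascript
// Hypothetical helper modelling `unique`: drop an event if it equals the event
// directly before it, like the Unix `uniq` tool.
function unique(events) {
  const out = [];
  let previous;
  for (const event of events) {
    const key = JSON.stringify(event);
    if (key !== previous) out.push(event);
    previous = key;
  }
  return out;
}

const input = [
  { foo: 1, bar: "a" }, { foo: 1, bar: "a" }, { foo: 1, bar: "a" },
  { foo: 1, bar: "b" }, { foo: null, bar: "b" }, { bar: "b" },
  { foo: null, bar: "b" }, { foo: null, bar: "b" },
];
// 5 events remain, matching the example output in the documentation above.
console.log(unique(input).length);
```

Because only the previous event is remembered, the sketch streams in constant memory. That is why removing non-adjacent duplicates requires a blocking sort first.
*/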
{
"label": "unroll",
"type": "keyword",
"detail": "Unrolls a list by producing multiple events, one for each item.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>unroll <field>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>unroll</code> operator transforms each input event into multiple output events.\nFor each item in the input list, one output event is created, where the list is\nreplaced with its item. The surrounding data is kept as-is.</p>\n<p><img src=\"unroll.excalidraw.svg\" alt=\"Unroll Example\"></p>\n<p>No output events are produced if the list is empty or if the field is <code>null</code>.</p>\n<h2>Examples</h2>\n<p>Consider the following events:</p>\n<pre><code class=\"language-json\">{\"a\": 1, \"b\": [1, 2, 3]}\n{\"a\": 2, \"b\": [1]}\n{\"a\": 3, \"b\": []}\n{\"a\": 4, \"b\": null}\n</code></pre>\n<p><code>unroll b</code> would produce the following output:</p>\n<pre><code class=\"language-json\">{\"a\": 1, \"b\": 1}\n{\"a\": 1, \"b\": 2}\n{\"a\": 1, \"b\": 3}\n{\"a\": 2, \"b\": 1}\n</code></pre>\n<p>The <code>unroll</code> operator can also be used with lists of records:</p>\n<pre><code class=\"language-json\">{\n  \"src\": \"192.168.0.5\",\n  \"conn\": [\n    {\n      \"dest\": \"192.168.0.34\",\n      \"active\": \"381ms\"\n    },\n    {\n      \"dest\": \"192.168.0.120\",\n      \"active\": \"42ms\"\n    },\n    {\n      \"dest\": \"1.2.3.4\",\n      \"active\": \"67ms\"\n    }\n  ]\n}\n</code></pre>\n<p>We can use <code>unroll conn</code> to bring this into a form more suited for analysis.\nFor example, we can then use\n<code>where conn.active > 100ms || conn.dest !in 192.168.0.0/16</code> to filter for relevant\nconnections, which yields:</p>\n<pre><code class=\"language-json\">{\n  \"src\": \"192.168.0.5\",\n  \"conn\": {\n    \"dest\": \"192.168.0.34\",\n    \"active\": \"381.0ms\"\n  }\n}\n{\n  \"src\": \"192.168.0.5\",\n  \"conn\": {\n    \"dest\": \"1.2.3.4\",\n    \"active\": \"67.0ms\"\n  }\n}\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/unroll"
},
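/*
The one-event-per-item behavior described for `unroll` maps directly onto `Array.prototype.flatMap`. This is a minimal sketch where `unroll` is a hypothetical JavaScript helper for a top-level list field, not Tenzir's implementation. Empty lists and null produce no output, as documented. In this sketch, any other non-list value is also dropped, which is an assumption.

```javascript
// Hypothetical helper modelling `unroll <field>`: one output event per list
// item, with the list replaced by that item and all other fields copied.
function unroll(events, field) {
  return events.flatMap((event) => {
    const list = event[field];
    if (!Array.isArray(list)) return []; // null (or a non-list) yields nothing
    return list.map((item) => ({ ...event, [field]: item }));
  });
}

const rows = [
  { a: 1, b: [1, 2, 3] },
  { a: 2, b: [1] },
  { a: 3, b: [] },
  { a: 4, b: null },
];
console.log(unroll(rows, "b"));
// [{ a: 1, b: 1 }, { a: 1, b: 2 }, { a: 1, b: 3 }, { a: 2, b: 1 }]
```
*/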
{
"label": "velociraptor",
"type": "keyword",
"detail": "Submits VQL to a Velociraptor server and returns the response as events.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>velociraptor [-n|--request-name <string>] [-o|--org-id <string>]\n             [-r|--max-rows <uint64>] [-s|--subscribe <artifact>]\n             [-q|--query <vql>] [-w|--max-wait <duration>]\n             [--profile <profile>]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>velociraptor</code> source operator provides a request-response interface to a\n<a href=\"https://docs.velociraptor.app\">Velociraptor</a> server:</p>\n<p><img src=\"velociraptor.excalidraw.svg\" alt=\"Velociraptor\"></p>\n<p>The pipeline operator is the client and it establishes a connection to a\nVelociraptor server. The client request contains a query written in the\n<a href=\"https://docs.velociraptor.app/docs/vql\">Velociraptor Query Language (VQL)</a>, a SQL-inspired language with a <code>SELECT .. FROM .. WHERE</code> structure.</p>\n<p>You can either send a raw VQL query via <code>velociraptor --query \"<vql>\"</code> to a\nserver and process the response, or hook into a continuous feed of artifacts\nvia <code>velociraptor --subscribe <artifact></code>. Whenever a hunt runs that contains\nthis artifact, the server will forward it to the pipeline and emit the artifact\npayload in the response field <code>HuntResults</code>.</p>\n<p>All Velociraptor client-to-server communication is mutually authenticated and\nencrypted via TLS certificates. This means you must provide a client-side\ncertificate, which you can generate as follows. 
(Velociraptor ships as a static\nbinary that we refer to as <code>velociraptor-binary</code> here.)</p>\n<ol>\n<li>\n<p>Create a server configuration <code>server.yaml</code>:</p>\n<pre><code class=\"language-bash\">velociraptor-binary config generate > server.yaml\n</code></pre>\n</li>\n<li>\n<p>Create an API client:</p>\n<pre><code class=\"language-bash\">velociraptor-binary -c server.yaml config api_client --name tenzir client.yaml\n</code></pre>\n<p>Copy the generated <code>client.yaml</code> to your Tenzir plugin configuration\ndirectory as <code>velociraptor.yaml</code> so that the operator can find it:</p>\n<pre><code class=\"language-bash\">cp client.yaml /etc/tenzir/plugin/velociraptor.yaml\n</code></pre>\n</li>\n<li>\n<p>Run the frontend with the server configuration:</p>\n<pre><code class=\"language-bash\">velociraptor-binary -c server.yaml frontend\n</code></pre>\n</li>\n</ol>\n<p>Now you are ready to run VQL queries!</p>\n<h3><code>-n|--request-name <string></code></h3>\n<p>An identifier for the request to the Velociraptor server.</p>\n<p>Defaults to a random UUID.</p>\n<h3><code>-o|--org-id <string></code></h3>\n<p>The ID of the Velociraptor organization.</p>\n<p>Defaults to <code>root</code>.</p>\n<h3><code>-q|--query <vql></code></h3>\n<p>The <a href=\"https://docs.velociraptor.app/docs/vql\">VQL</a> query string.</p>\n<h3><code>-r|--max-rows <uint64></code></h3>\n<p>The maximum number of rows to return in the streamed gRPC messages returned by\nthe server.</p>\n<p>Defaults to 1,000.</p>\n<h3><code>-s|--subscribe <artifact></code></h3>\n<p>Subscribes to a flow artifact.</p>\n<p>This option generates a larger VQL expression under the hood that creates one\nevent per flow and artifact. 
The response has a field <code>HuntResult</code> that\ncontains the result of the hunt.</p>\n<h3><code>-w|--max-wait <duration></code></h3>\n<p>Controls how long to wait before releasing a partial result set.</p>\n<p>Defaults to <code>1 sec</code>.</p>\n<h3><code>--profile <profile></code></h3>\n<p>Specifies the configuration profile for the Velociraptor instance. This enables\nconnecting to multiple Velociraptor instances from the same Tenzir node.</p>\n<p>To use profiles, edit your <code>velociraptor.yaml</code> configuration like this, where\n<code><config></code> refers to the contents of the configuration file created by Velociraptor, and\n<code><profile></code> to the desired profile name.</p>\n<pre><code class=\"language-yaml\"># before\n<config>\n\n# after\nprofiles:\n <profile>:\n <config>\n</code></pre>\n<p>If profiles are defined, the operator defaults to the first profile.</p>\n<h2>Examples</h2>\n<p>Show all processes:</p>\n<pre><code>velociraptor --query \"select * from pslist()\"\n</code></pre>\n<p>Subscribe to a hunt flow that contains the <code>Windows</code> artifact:</p>\n<pre><code>velociraptor --subscribe Windows\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/velociraptor"
},
{
"label": "version",
"type": "keyword",
"detail": "Shows the current version.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>version\n</code></pre>\n<h2>Description</h2>\n<p>The <code>version</code> operator shows the current Tenzir version.</p>\n<h2>Schemas</h2>\n<p>Tenzir emits version information with the following schema.</p>\n<h3><code>tenzir.version</code></h3>\n<p>Contains detailed information about the process version.</p>\n<table>\n<thead>\n<tr><th>Field</th><th>Type</th><th>Description</th></tr>\n</thead>\n<tbody>\n<tr><td><code>version</code></td><td><code>string</code></td><td>The formatted version string.</td></tr>\n<tr><td><code>major</code></td><td><code>uint64</code></td><td>The major release version.</td></tr>\n<tr><td><code>minor</code></td><td><code>uint64</code></td><td>The minor release version.</td></tr>\n<tr><td><code>patch</code></td><td><code>uint64</code></td><td>The patch release version.</td></tr>\n<tr><td><code>tweak</code></td><td><code>uint64</code></td><td>The number of changes since the last release.</td></tr>\n</tbody>\n</table>\n<h2>Examples</h2>\n<p>Use <code>version</code> to show the current version of a development build:</p>\n<pre><code>{\n \"version\": \"v4.6.3-36-gbd4c8a058b-dirty\",\n \"major\": 4,\n \"minor\": 6,\n \"patch\": 3,\n \"tweak\": 36\n}\n</code></pre>\n<p>Use <code>version</code> to show the current version of a release build:</p>\n<pre><code>{\n \"version\": \"v4.7.0\",\n \"major\": 4,\n \"minor\": 7,\n \"patch\": 0,\n \"tweak\": 0\n}\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/version"
},
{
"label": "where",
"type": "keyword",
"detail": "Filters events according to an expression.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>where <expression>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>where</code> operator only keeps events that match the provided\n<a href=\"../language/expressions.md\">expression</a> and discards all other events.</p>\n<p>Use <code>where</code> to extract the subset of the data you are interested in. Tenzir's expression\nlanguage offers various ways to describe the desired data. In particular,\nexpressions work <em>across schemas</em> and thus make it easy to concisely articulate\nconstraints.</p>\n<h3><code><expression></code></h3>\n<p>The <a href=\"../language/expressions.md\">expression</a> to evaluate for each event.</p>\n<h2>Examples</h2>\n<p>Select all events that contain a field with the value <code>1.2.3.4</code>:</p>\n<pre><code>where 1.2.3.4\n</code></pre>\n<p>This expression internally completes to <code>:ip == 1.2.3.4</code>. The type extractor\n<code>:ip</code> describes all fields of type <code>ip</code>. Use field extractors to only consider a\nsingle field:</p>\n<pre><code>where src_ip == 1.2.3.4\n</code></pre>\n<p>As a slight variation of the above: use a nested field name and a temporal\nconstraint on the field named <code>ts</code>:</p>\n<pre><code>where id.orig_h == 1.2.3.4 and ts > 1 hour ago\n</code></pre>\n<p>Subnets are first-class values:</p>\n<pre><code>where 10.10.5.0/25\n</code></pre>\n<p>This expression unfolds to <code>:ip in 10.10.5.0/25 or :subnet == 10.10.5.0/25</code>. It\nmeans \"select all events that contain a field of type <code>ip</code> in the subnet\n<code>10.10.5.0/25</code>, or a field of type <code>subnet</code> that exactly matches <code>10.10.5.0/25</code>\".</p>\n<p>Expressions consist of predicates that can be connected with <code>and</code>, <code>or</code>, and\n<code>not</code>:</p>\n<pre><code>where 10.10.5.0/25 and (orig_bytes > 1 Mi or duration > 30 min)\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/where"
},
{
"label": "write",
"type": "keyword",
"detail": "The write operator converts events into raw bytes.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>write <format>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>write</code> operator prints events and outputs the formatted result as raw\nbytes.</p>\n<h3><code><format></code></h3>\n<p>The <a href=\"../formats.md\">format</a> used to convert events into raw bytes.</p>\n<p>Some formats have format-specific options. Please refer to the documentation of\nthe individual formats for more information.</p>\n<h2>Examples</h2>\n<p>Convert events into JSON:</p>\n<pre><code>write json\n</code></pre>\n<p>Convert events into CSV:</p>\n<pre><code>write csv\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/write"
},
{
"label": "yara",
"type": "keyword",
"detail": "Executes YARA rules on byte streams.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>yara [-B|--blockwise] [-C|--compiled-rules] [-f|--fast-scan] <rule> [<rule>..]\n</code></pre>\n<h2>Description</h2>\n<p>The <code>yara</code> operator applies <a href=\"https://virustotal.github.io/yara/\">YARA</a> rules to\nan input of bytes, emitting rule context upon a match.</p>\n<p><img src=\"yara-operator.excalidraw.svg\" alt=\"YARA Operator\"></p>\n<p>We modeled the operator after the official <a href=\"https://yara.readthedocs.io/en/stable/commandline.html\"><code>yara</code> command-line\nutility</a> to offer a\nfamiliar experience to users of the command-line tool. Similar to the official <code>yara</code>\ncommand, the operator compiles the rules by default, unless you provide the\noption <code>-C|--compiled-rules</code>. To quote from the above link:</p>\n<blockquote>\n<p>This is a security measure to prevent users from inadvertently using compiled\nrules coming from a third-party. Using compiled rules from untrusted sources\ncan lead to the execution of malicious code in your computer.</p>\n</blockquote>\n<p>The operator uses a YARA <em>scanner</em> under the hood that buffers blocks of bytes\nincrementally. Even though the input arrives in non-contiguous blocks of\nmemory, the YARA scanner engine supports matching across block boundaries. For\ncontinuously running pipelines, use the <code>--blockwise</code> option that considers each\nblock as a separate unit. 
Otherwise the scanner engine would simply accumulate\nblocks but never trigger a scan.</p>\n<h3><code>-B|--blockwise</code></h3>\n<p>Match on every byte chunk instead of triggering a scan when the input is exhausted.</p>\n<p>This option makes sense for never-ending dataflows where each chunk of bytes\nconstitutes a self-contained unit, such as a single file.</p>\n<h3><code>-C|--compiled-rules</code></h3>\n<p>Interpret the rules as compiled.</p>\n<p>When providing this flag, you must provide exactly one rule path as a positional\nargument.</p>\n<h3><code>-f|--fast-scan</code></h3>\n<p>Enable fast matching mode.</p>\n<h3><code><rule></code></h3>\n<p>The path to the YARA rule(s).</p>\n<p>If the path is a directory, the operator attempts to recursively add all\ncontained files as YARA rules.</p>\n<h2>Examples</h2>\n<p>The examples below show how you can scan a single file and how you can create a\nsimple rule scanning service.</p>\n<h3>Perform one-shot scanning of files</h3>\n<p>Scan a file with a set of YARA rules:</p>\n<pre><code>load file --mmap evil.exe | yara rule.yara\n</code></pre>\n<div class=\"remark-container info\"><div class=\"remark-container-title info\">Memory Mapping Optimization</div><p>The <code>--mmap</code> flag is merely an optimization that constructs a single contiguous chunk of\nbytes instead of a stream of chunks. Without <code>--mmap</code>, the\n<a href=\"../connectors/file.md\"><code>file</code></a> loader generates a stream of byte chunks and\nfeeds them incrementally to the <code>yara</code> operator. 
This also works, but\nperformance is better due to memory locality when using <code>--mmap</code>.</p></div>\n<p>Let's unpack a concrete example with the following rule in <code>/tmp/test.yara</code>:</p>\n<pre><code class=\"language-yara\">rule test {\n meta:\n string = \"string meta data\"\n integer = 42\n boolean = true\n\n strings:\n $foo = \"foo\"\n $bar = \"bar\"\n $baz = \"baz\"\n\n condition:\n ($foo and $bar) or $baz\n}\n</code></pre>\n<p>You can produce test matches by feeding bytes into the <code>yara</code> operator:</p>\n<pre><code class=\"language-bash\">echo 'foo bar' | tenzir 'load stdin | yara /tmp/test.yara'\n</code></pre>\n<p>You will get one <code>yara.match</code> per matching rule:</p>\n<pre><code class=\"language-json\">{\n \"rule\": {\n \"identifier\": \"test\",\n \"namespace\": \"default\",\n \"tags\": [],\n \"meta\": {\n \"string\": \"string meta data\",\n \"integer\": 42,\n \"boolean\": true\n },\n \"strings\": {\n \"$foo\": \"foo\",\n \"$bar\": \"bar\",\n \"$baz\": \"baz\"\n }\n },\n \"matches\": {\n \"$foo\": [\n {\n \"data\": \"Zm9v\",\n \"base\": 0,\n \"offset\": 0,\n \"match_length\": 3\n }\n ],\n \"$bar\": [\n {\n \"data\": \"YmFy\",\n \"base\": 0,\n \"offset\": 4,\n \"match_length\": 3\n }\n ]\n }\n}\n</code></pre>\n<p>Each match has a <code>rule</code> field describing the rule and a <code>matches</code> record\nindexed by string identifier to report a list of matches per rule string.</p>\n<h3>Build a YARA scanning service</h3>\n<p>Let's say you want to build a service that scans malware samples that you receive\nover a Kafka topic <code>malware</code>.</p>\n<p>Launch the processing pipeline as follows:</p>\n<pre><code>load kafka --topic malware | yara --blockwise /path/to/rules\n</code></pre>\n<p>If you run this pipeline on the command line via <code>tenzir <pipeline></code>, you see\nthe matches arriving as JSON. You could also send the matches via the\n<a href=\"fluent-bit.md\"><code>fluent-bit</code></a> sink to Slack, Splunk, or any other\nFluent Bit output. 
For example, via Slack:</p>\n<pre><code>load kafka --topic malware\n| yara --blockwise /path/to/rules\n| fluent-bit slack webhook=<url>\n</code></pre>\n<p>This pipeline requires that every Kafka message is a self-contained malware\nsample. Because the pipeline runs continuously, we supply the <code>--blockwise</code>\noption so that the <code>yara</code> operator triggers a scan for every Kafka message, as opposed to\naccumulating all messages indefinitely and only initiating a scan when the input\nis exhausted.</p>\n<p>You can now submit a malware sample by sending it to the <code>malware</code> Kafka topic:</p>\n<pre><code>load file --mmap evil.exe | save kafka --topic malware\n</code></pre>\n<p>This pipeline loads the file <code>evil.exe</code> as a single blob and sends it to the Kafka\ntopic <code>malware</code>.</p>",
"docLink": "https://docs.tenzir.com/operators/yara"
},
{
"label": "yield",
"type": "keyword",
"detail": "Extracts nested records with the ability to unfold lists.",
"processedHTML": "<h2>Synopsis</h2>\n<pre><code>yield <extractor>\n</code></pre>\n<h2>Description</h2>\n<p>The <code>yield</code> operator can be used to \"zoom into\" the extracted part of the\nincoming events. It can also return a new event for each element of a list.</p>\n<h3><code><extractor></code></h3>\n<p>The extractor must start with a field name. This can be followed by <code>.</code> and\nanother field name, or by <code>[]</code> to extract all elements from the given list.</p>\n<h2>Examples</h2>\n<p>The schema <code>suricata.dns</code> provides a list of answers for DNS queries. Assume we\nwant to extract all answers for <code>CNAME</code> records.</p>\n<pre><code>from eve.json\n| where #schema == \"suricata.dns\"\n| yield dns.answers[]\n| where rrtype == \"CNAME\"\n</code></pre>",
"docLink": "https://docs.tenzir.com/operators/yield"
}
];
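
// Illustrative sketch, not part of Tenzir and not used by `data` above: two
// hypothetical helpers, `unrollSketch` and `yieldSketch`, that model the
// list-unfolding semantics described in the `unroll` and `yield` entries on
// plain JSON objects. They assume events are ordinary JavaScript objects and
// ignore Tenzir schemas and types; treat them as a reading aid, not as the
// operators' actual implementation.

// `unroll <field>`: emit one event per list item, with the list replaced by
// that item and all surrounding data kept as-is. Only top-level fields are
// supported here (an assumption of this sketch).
function unrollSketch(events, field) {
  const out = [];
  for (const event of events) {
    const list = event[field];
    // Empty lists and null fields produce no output, as documented; this
    // sketch also skips missing and non-list fields (an assumption).
    if (!Array.isArray(list)) continue;
    for (const item of list) {
      // Spreading keeps the other fields; the unrolled key keeps its position.
      out.push({ ...event, [field]: item });
    }
  }
  return out;
}

// `yield <extractor>`: follow a path such as "dns.answers[]", where each name
// selects a field and each "[]" unfolds a list into separate results.
function yieldSketch(events, extractor) {
  // Tokenize "dns.answers[]" into ["dns", "answers", "[]"].
  const steps = extractor.match(/[^.[\]]+|\[\]/g) ?? [];
  let current = events;
  for (const step of steps) {
    const next = [];
    for (const value of current) {
      if (step === "[]") {
        // Unfold lists; non-list values are dropped in this sketch.
        if (Array.isArray(value)) next.push(...value);
      } else if (value !== null && typeof value === "object" && step in value) {
        // Zoom into the named field; values lacking it are dropped.
        next.push(value[step]);
      }
    }
    current = next;
  }
  return current;
}

// Usage with the event shapes from the `unroll` documentation entry:
// unrollSketch([{ a: 1, b: [1, 2, 3] }, { a: 3, b: [] }, { a: 4, b: null }], "b")
//   returns [{ a: 1, b: 1 }, { a: 1, b: 2 }, { a: 1, b: 3 }].
// With `dnsEvents` as a hypothetical array of suricata.dns-shaped objects,
// yieldSketch(dnsEvents, "dns.answers[]").filter((a) => a.rrtype === "CNAME")
//   mirrors `yield dns.answers[] | where rrtype == "CNAME"`.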