A dedicated network sensor appliance is the recommended method for capturing and analyzing live network traffic when performance and throughput are of utmost importance. Hedgehog Linux is a custom Debian-based operating system built to:
- monitor network interfaces
- capture packets to PCAP files
- detect file transfers in network traffic and extract and scan those files for threats
- generate and forward Zeek and Suricata logs, Arkime sessions, and other information to [Malcolm]({{ site.github.repository_url }})
Please see the Hedgehog Linux README for more information.
The options for monitoring traffic on local network interfaces can be configured by running `./scripts/configure`.
Malcolm's `pcap-capture`, `suricata-live`, and `zeek-live` containers can monitor one or more local network interfaces, specified by the `PCAP_IFACE` environment variable in `pcap-capture.env`. These containers are started with additional privileges to allow opening network interfaces in promiscuous mode for capture.
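As a minimal sketch, the capture interface could be set in `pcap-capture.env` as follows. The interface name `eth0` is a placeholder rather than a value from this document; substitute the name your system reports for the interface to be monitored.

```shell
# pcap-capture.env (excerpt)
# "eth0" is a hypothetical interface name used only for illustration.
PCAP_IFACE=eth0
```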
The instances of Suricata and Zeek (in the `suricata-live` and `zeek-live` containers when the `SURICATA_LIVE_CAPTURE` and `ZEEK_LIVE_CAPTURE` environment variables are set to `true`, respectively) analyze traffic on the fly and generate log files containing network session metadata. These log files are in turn scanned by Filebeat and forwarded to Logstash for enrichment and indexing into the OpenSearch document store.
In contrast, the `pcap-capture` container buffers traffic to PCAP files and periodically rotates these files for processing (by Arkime's `capture` utility in the `arkime` container) according to the thresholds defined by the `PCAP_ROTATE_MEGABYTES` and `PCAP_ROTATE_MINUTES` environment variables in `pcap-capture.env`. If for some reason (e.g., a low-resource environment) you also want Zeek and Suricata to process these intermediate PCAP files rather than monitoring the network interfaces directly, you can set `SURICATA_ROTATED_PCAP`/`ZEEK_ROTATED_PCAP` to `true` and `SURICATA_LIVE_CAPTURE`/`ZEEK_LIVE_CAPTURE` to `false`. The only exceptions to this behavior (i.e., the creation of intermediate PCAP files by `netsniff-ng` or `tcpdump` in the `pcap-capture` container, which are periodically rolled over for processing by Arkime) are when running the "Hedgehog" run profile, when using a remote OpenSearch or Elasticsearch instance, or in a Kubernetes-based deployment. In those configurations, users may choose to have Arkime's `capture` tool monitor live traffic on the network interface without using the intermediate PCAP file.
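For the low-resource case described above, the relevant settings might look like the following sketch. Only `PCAP_ROTATE_MEGABYTES` and `PCAP_ROTATE_MINUTES` are stated above to live in `pcap-capture.env`; place the other four variables in whichever `.env` files define them in your deployment. The rotation thresholds shown are illustrative values, not recommendations.

```shell
# Rotation thresholds for the intermediate PCAP files
# (illustrative values, assumed for this example).
PCAP_ROTATE_MEGABYTES=4096
PCAP_ROTATE_MINUTES=10

# Have Suricata and Zeek process the rotated PCAP files...
SURICATA_ROTATED_PCAP=true
ZEEK_ROTATED_PCAP=true

# ...instead of monitoring the network interfaces directly.
SURICATA_LIVE_CAPTURE=false
ZEEK_LIVE_CAPTURE=false
```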
Note that Microsoft Windows and Apple macOS platforms currently run Docker inside of a virtualized environment. Live traffic capture and analysis on those platforms would require additional configuration of virtual interfaces and port forwarding in Docker, which is outside of the scope of this document.
Another configuration for monitoring local network interfaces is to use the `hedgehog` run profile. During Malcolm configuration users are prompted "Run with Malcolm (all containers) or Hedgehog (capture only) profile?" Docker Compose can use profiles to selectively start services. While the `malcolm` run profile runs all of Malcolm's containers (OpenSearch, Dashboards, Logstash, etc.), the `hedgehog` profile runs only the containers necessary for traffic capture.
When configuring the `hedgehog` profile, users must provide connection details for another Malcolm instance to which its network traffic logs will be forwarded.
Malcolm's Logstash instance can also be configured to accept logs from a remote forwarder by running `./scripts/configure` and answering "yes" to "Expose Logstash port to external hosts?" Enabling encrypted transport of these log files is discussed in Configure authentication and the description of the `BEATS_SSL` environment variable in `beats-common.env`.
Configuring Filebeat to forward Zeek logs to Malcolm might look something like this example `filebeat.yml`:

    filebeat.inputs:
    - type: log
      paths:
        - /var/zeek/*.log
      fields_under_root: true
      compression_level: 0
      exclude_lines: ['^\s*#']
      scan_frequency: 10s
      clean_inactive: 180m
      ignore_older: 120m
      close_inactive: 90m
      close_renamed: true
      close_removed: true
      close_eof: false
      clean_renamed: true
      clean_removed: true

    output.logstash:
      hosts: ["192.0.2.123:5044"]
      ssl.enabled: true
      ssl.certificate_authorities: ["/foo/bar/ca.crt"]
      ssl.certificate: "/foo/bar/client.crt"
      ssl.key: "/foo/bar/client.key"
      ssl.supported_protocols: "TLSv1.2"
      ssl.verification_mode: "none"

For environments where high-performance capture is desired, some manual tuning of the parameters of Arkime, Zeek, and Suricata will be necessary. These parameters will vary from situation to situation depending on network traffic characteristics and hardware resources, and may require adjustments over time to get the best performance possible. The following sections can help users know which settings to adjust for individual circumstances. Users should take particular care when determining the number of CPUs used to read from network interfaces (e.g., `ZEEK_LB_PROCS_WORKER_DEFAULT` for Zeek, `ARKIME_TPACKETV3_NUM_THREADS` and `ARKIME_PACKET_THREADS` for Arkime, and `SURICATA_AF_PACKET_IFACE_THREADS` for Suricata) so that these tools are appropriately balanced with regard to the system's available CPU resources.
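One rough way to sanity-check that balance is to add up the threads requested for each tool and compare the total with the number of CPUs online. The sketch below assumes a single capture interface and uses the default value of `2` that this document lists for each variable; the `requested` and `available` names are hypothetical, and this is a back-of-the-envelope check rather than a description of how Malcolm or Hedgehog Linux allocates CPUs.

```shell
# Thread counts as they might appear in the environment files
# (each value is the default listed in this document).
ZEEK_LB_PROCS_WORKER_DEFAULT=2
ARKIME_TPACKETV3_NUM_THREADS=2
ARKIME_PACKET_THREADS=2
SURICATA_AF_PACKET_IFACE_THREADS=2

# Total threads requested across the three tools for one interface.
requested=$((ZEEK_LB_PROCS_WORKER_DEFAULT + SURICATA_AF_PACKET_IFACE_THREADS))
requested=$((requested + ARKIME_TPACKETV3_NUM_THREADS + ARKIME_PACKET_THREADS))

# Number of CPUs currently online (falls back to 1 if getconf is unavailable).
available=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

echo "requested=${requested} available=${available}"
if [ "$requested" -gt "$available" ]; then
  echo "warning: more capture threads requested than CPUs available"
fi
```

With these defaults the total is eight threads, so a sensor with fewer cores than that would likely need lower values for at least one tool.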
Zeek's resource utilization and performance can be tuned using environment variables. These environment variables are the same for both Hedgehog Linux and Malcolm's own monitoring of local network interfaces. For Hedgehog Linux, they are found in [`/opt/sensor/sensor_ctl/control_vars.conf`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/hedgehog-iso/interface/sensor_ctl/control_vars.conf), and for Malcolm they should be added to or modified in [`zeek-live.env`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/config/zeek-live.env.example).
Malcolm and Hedgehog Linux use Zeek's support for AF_Packet sockets for packet capture. Review Zeek's documentation on cluster setup to better understand the parameters discussed below.
The relevant environment variables related to tuning Zeek for live packet capture are:
- `ZEEK_AF_PACKET_BUFFER_SIZE` - AF_Packet ring buffer size in bytes (default `67108864`)
- `ZEEK_AF_PACKET_FANOUT_MODE` - AF_Packet fanout mode (default `FANOUT_HASH`)
- `ZEEK_LB_PROCS_WORKER_DEFAULT` - "Zeek is not multithreaded, so once the limitations of a single processor core are reached the only option currently is to spread the workload across many cores." This value defines the number of processors to be assigned to each group of workers created for each capture interface for load balancing (default `2`). A value of `0` means "autocalculate based on the number of CPUs present in the system."
- `ZEEK_LB_PROCS_WORKER_n` - Explicitly defines the number of processors to be assigned to the group of workers for the n-th capture interface. If unspecified, this defaults to the number of CPUs listed in `ZEEK_PIN_CPUS_WORKER_n` if defined, or `ZEEK_LB_PROCS_WORKER_DEFAULT` otherwise.
- `ZEEK_LB_PROCS_LOGGER` - Defines the number of processors to be assigned to the loggers (default `1`)
- `ZEEK_LB_PROCS_PROXY` - Defines the number of processors to be assigned to the proxies (default `1`)
- `ZEEK_LB_PROCS_CPUS_RESERVED` - If `ZEEK_LB_PROCS_WORKER_DEFAULT` is `0` ("autocalculate"), exclude this number of CPUs from the autocalculation (defaults to `1` (kernel) + `1` (manager) + `ZEEK_LB_PROCS_LOGGER` + `ZEEK_LB_PROCS_PROXY`)
- `ZEEK_PIN_CPUS_WORKER_AUTO` - Automatically pin worker CPUs (default `false`)
- `ZEEK_PIN_CPUS_WORKER_n` - Explicitly defines the processor IDs to be assigned to the group of workers for the n-th capture interface (e.g., `0` means "the first CPU"; `12,13,14,15` means "the last four CPUs" on a 16-core system)
- `ZEEK_PIN_CPUS_OTHER_AUTO` - automatically pin CPUs for manager, loggers, and proxies if possible (default `false`)
- `ZEEK_PIN_CPUS_MANAGER` - list of CPUs to pin for the manager process (default is unset; only used if `ZEEK_PIN_CPUS_OTHER_AUTO` is `false`)
- `ZEEK_PIN_CPUS_LOGGER` - list of CPUs to pin for the logger processes (default is unset; only used if `ZEEK_PIN_CPUS_OTHER_AUTO` is `false`)
- `ZEEK_PIN_CPUS_PROXY` - list of CPUs to pin for the proxy processes (default is unset; only used if `ZEEK_PIN_CPUS_OTHER_AUTO` is `false`)

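
To make the default reservation concrete, the sketch below evaluates the formula given for `ZEEK_LB_PROCS_CPUS_RESERVED` using the default logger and proxy counts. The `reserved`, `total_cpus`, and `remaining` shell variables are hypothetical names used only for this illustration, the 16-CPU system is assumed, and how the remaining CPUs are divided among capture interfaces is not covered here.

```shell
# zeek-live.env (excerpt): request autocalculation of the worker count.
ZEEK_LB_PROCS_WORKER_DEFAULT=0
ZEEK_LB_PROCS_LOGGER=1
ZEEK_LB_PROCS_PROXY=1

# Default reservation: 1 (kernel) + 1 (manager) + loggers + proxies.
reserved=$((1 + 1 + ZEEK_LB_PROCS_LOGGER + ZEEK_LB_PROCS_PROXY))

# On an assumed 16-CPU system, the CPUs not excluded from the autocalculation.
total_cpus=16
remaining=$((total_cpus - reserved))

echo "reserved=${reserved} remaining=${remaining}"
```

Setting `ZEEK_LB_PROCS_CPUS_RESERVED` explicitly would replace the computed default, which may be useful when Arkime and Suricata share the same host.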
Arkime's `capture` process is controlled by settings in its `config.ini` file. Arkime's documentation on High Performance Settings outlines the settings that most influence performance and resource utilization.
Malcolm's default values for Arkime's live traffic capture are mostly already configured for high-performance traffic capture. Some other parameters that influence Arkime's resource utilization and performance can be tuned using environment variables for both Hedgehog Linux and Malcolm's own monitoring of local network interfaces.
For Hedgehog Linux, those values are found in [`/opt/sensor/sensor_ctl/control_vars.conf`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/hedgehog-iso/interface/sensor_ctl/control_vars.conf), from which they are read and used to generate [`config.ini`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/hedgehog-iso/interface/sensor_ctl/arkime/config.ini) by the [`arkime_config_populate.sh` script]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/interface/sensor_ctl/supervisor.init/arkime_config_populate.sh) prior to starting `capture`.
When Malcolm is capturing traffic on its own local network interfaces, the issue becomes a bit more complicated: as described above in the discussion of the `pcap-capture` container, most container-based Malcolm deployments don't actually use Arkime's `capture` to generate Arkime sessions. Instead, intermediate PCAP files are generated by `netsniff-ng` or `tcpdump` and are periodically rolled over for "offline" processing by Arkime `capture`. This being the case, most of the settings dealing with traffic capture don't apply, since (from its point of view) `capture` isn't running against "live" traffic. The only exceptions to this behavior are when running the "Hedgehog" run profile, when using a remote OpenSearch or Elasticsearch instance, or in a Kubernetes-based deployment, in which cases users may choose to have Arkime's `capture` tool monitor live traffic on the network interface without using the intermediate PCAP file. In those cases the `arkime-live` container uses [its environment variables]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/config/arkime-live.env.example) in its [entrypoint]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/interface/sensor_ctl/supervisor.init/arkime/scripts/docker_entrypoint.sh) to populate [`config.ini`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/arkime/etc/config.ini).
The relevant environment variables related to tuning Arkime for live packet capture are:
- `ARKIME_COMPRESSION_TYPE` - the type of seekable compression to use when creating PCAP files (`none`, `zstd`, or `gzip`)
- `ARKIME_COMPRESSION_LEVEL` - the compression level if `ARKIME_COMPRESSION_TYPE` is `gzip` or `zstd`
- `ARKIME_DB_BULK_SIZE` - approximate size of bulk indexing requests to send to OpenSearch/Elasticsearch
- `ARKIME_MAGIC_MODE` - "magicking" mode for HTTP/SMTP bodies
- `ARKIME_MAX_PACKETS_IN_QUEUE` - the number of packets per packet thread that can be waiting to be processed (Arkime will start dropping packets if the queue fills up)
- `ARKIME_PACKET_THREADS` - the number of packet threads used to process packets after the reader has received the packets (default `2`)
- `ARKIME_PCAP_WRITE_METHOD` - how packets are written to disk
- `ARKIME_PCAP_WRITE_SIZE` - buffer size to use when writing PCAP files
- `ARKIME_PCAP_READ_METHOD` - how packets are read from network cards (`tpacketv3` indicates AF_Packet should be used)
- `ARKIME_TPACKETV3_NUM_THREADS` - the number of threads used to read packets from each network interface (default `2`)
- `ARKIME_TPACKETV3_BLOCK_SIZE` - the block size in bytes used for reads from each interface

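
As an illustration, a set of overrides for Arkime's live capture (for example in the `arkime-live` container's environment file, or in `control_vars.conf` on Hedgehog Linux) might look like the following. The thread counts are the defaults noted above, and `tpacketv3` and `zstd` are values named in the list; the compression level is an assumed example value, so check the Arkime documentation for the range valid for the chosen compression type.

```shell
# Arkime live-capture settings (illustrative excerpt)

# Read packets from the network card via AF_Packet.
ARKIME_PCAP_READ_METHOD=tpacketv3
# Reader threads per network interface (default).
ARKIME_TPACKETV3_NUM_THREADS=2
# Threads that process packets after the reader receives them (default).
ARKIME_PACKET_THREADS=2
# One of: none, zstd, gzip.
ARKIME_COMPRESSION_TYPE=zstd
# Assumed example value; only used when the type is gzip or zstd.
ARKIME_COMPRESSION_LEVEL=3
```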
Aside from the settings mentioned above, to quote the Arkime documentation, often issues with traffic capture performance "are not a problem with Arkime, but usually an issue with either the hardware or the packet rate exceeding what the hardware can save to disk." Please read Why am I dropping packets? (and Disk Q issues) from the Arkime FAQ.
Suricata's resource utilization and performance can be tuned using environment variables. These environment variables are the same for both Hedgehog Linux and Malcolm's own monitoring of local network interfaces. For Hedgehog Linux, they are found in [`/opt/sensor/sensor_ctl/control_vars.conf`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/hedgehog-iso/interface/sensor_ctl/control_vars.conf), and for Malcolm they should be added to or modified in [`suricata-live.env`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/config/suricata-live.env.example).
Upon starting, Malcolm's [`suricata_config_populate.py`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/config/shared/bin/suricata_config_populate.py) script generates the `suricata.yaml` configuration file (see `suricata.yaml.in` and the Suricata documentation). The `suricata_config_populate.py` script can use many environment variables when generating `suricata.yaml`. See the `DEFAULT_VARS` array in the script for a full list. Note that the environment variables must be prefixed with `SURICATA_` when defined in `control_vars.conf` or `suricata-live.env`.
The following environment variables related to tuning Suricata for live packet capture may be of particular interest, but this list is by no means exhaustive:
- `SURICATA_AF_PACKET_IFACE_THREADS` - the number of threads used to read packets via the AF_Packet interface (default `2`); a value of `auto` means to use the same number of threads as CPU cores
- `SURICATA_MAX_PENDING_PACKETS` - the number of simultaneous packets that the engine can handle; "setting this higher generally keeps the threads more busy, but setting it too high will lead to degradation" (default `10000`)
- `SURICATA_AF_PACKET_RING_SIZE` - the buffer size (in packets) per-thread; if this is set to `0` (the default), it will be "computed with respect to `max_pending_packets` and the number of threads"

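
A minimal example of these three settings in `suricata-live.env` might look like the following. The values for the pending-packet limit and ring size are the defaults stated above; `auto` is shown only to illustrate the option and may not suit a host where Zeek and Arkime also need CPU cores.

```shell
# suricata-live.env (excerpt)

# Use as many AF_Packet reader threads as there are CPU cores.
SURICATA_AF_PACKET_IFACE_THREADS=auto
# Default limit on packets the engine handles simultaneously.
SURICATA_MAX_PENDING_PACKETS=10000
# 0 (the default) leaves the per-thread ring size to be computed.
SURICATA_AF_PACKET_RING_SIZE=0
```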
See the Suricata documentation on Tuning Considerations and High Performance for a more in-depth treatment of this topic, then cross-reference tuning parameters of interest with the variables in the `DEFAULT_VARS` array in `suricata_config_populate.py` to identify which variables correspond.
Note that for some variables (e.g., something with a sequence like `HOME_NET`) Suricata wants values to be quoted. To accomplish that in the `suricata.env` or `suricata-live.env` environment variable files, use outer single quotes with inner double quotes, like this:

    SURICATA_HOME_NET='"[192.168.0.0/16,10.0.0.0/8,172.16.0.0/12]"'
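
The effect of the nested quoting can be seen with ordinary shell quoting rules, sketched here: the outer single quotes are consumed by the parser and the inner double quotes remain part of the value. This only demonstrates shell behavior; it assumes the environment file is read with shell-style quoting and does not show how Malcolm's own tooling parses the file.

```shell
# Outer single quotes delimit the value; inner double quotes are kept.
SURICATA_HOME_NET='"[192.168.0.0/16,10.0.0.0/8,172.16.0.0/12]"'

# Print the value as a consumer of the variable would see it.
printf '%s\n' "$SURICATA_HOME_NET"
```

The printed value still includes the surrounding double quotes, which is the form Suricata expects for a sequence such as `HOME_NET`.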