
Managing Sophos Linux Sensor resource usage

Prerequisites

To avoid potential misconfigurations interrupting Sophos Linux Sensor (SLS) deployment and configuration, we recommend waiting until after the final stages of testing to use resource limiting features.

Overview

SLS can run with customized limits on resource utilization, in order to prioritize resources for production applications over security data collection.

SLS also employs a circuit breaker which, if the sensor comes under heavy load, sheds security data collection to maintain host performance.

Note

These features limit the volume of telemetry being processed, not the number of alerts being generated.

Hard resource limits

This section describes the design, implementation, and usage of SLS's hard resource limiting capabilities. This feature allows you to set exact limits for CPU and memory resources. It is implemented using Linux cgroups under the CPU and memory subsystems, and SLS uses the cgroup named sophoslinuxsensor. The implementation requires a supervisor process that executes and monitors the actual sensor. This design has several benefits. First, it forces all routines of the SLS process to reside in the cgroup. Second, because the supervisor process must run as the root user, it allows SLS to drop privileges by executing the child sensor process as a separate user. Third, it enables the supervisor process to restart the child sensor process when it exits and to monitor the SLS process for performance and violations.

Quick Start

To enable the resource limiter for SLS and restrict the sensor's CPU and memory usage, add the following block to SLS's configuration file, which is by default at /etc/sophos/runtimedetections-rules.yaml:

use_supervisor: true
use_resource_limits: true

This will enable the resource limiter with its default thresholds of 5% CPU and 1024MB memory. To apply these changes, restart SLS.

Usage

The resource configurations are read from SLS's configuration file, which by default is at /etc/sophos/runtimedetections-rules.yaml. The path to the configuration file may be overridden by setting the RUNTIMEDETECTIONS_CONFIG environment variable. The following section describes the hard resource limiter configuration fields.

Configuration

The following fields are set in the SLS configuration file. They are also bound to environment variables.

  • use_supervisor - Boolean value determining whether or not to use the supervisor and, therefore, the hard resource limits.

    • Environment Variable: RUNTIMEDETECTIONS_USE_SUPERVISOR 
    • Type: boolean
    • Example: true, false
    • Default: false
  • use_resource_limits - Boolean value determining whether or not to use the hard resource limiter functionality of the supervisor.

    • Environment Variable: RUNTIMEDETECTIONS_USE_RESOURCE_LIMITS
    • Type: boolean
    • Example: true, false
    • Default: false
  • memory_limit - The maximum amount of memory that the SLS process is allowed to consume. The string must end in G (gigabyte) or M (megabyte). A special value of "0" indicates no limit.

    • Environment Variable: RUNTIMEDETECTIONS_MEMORY_LIMIT
    • Type: String
    • Example: 512M, 1G, 0
    • Default: 1024M
  • cpu_limit - The percentage of total CPU time that SLS will be allowed to be scheduled for, adjusting for multi-core processors. The special value of 0 indicates no limit.

    • Environment Variable: RUNTIMEDETECTIONS_CPU_LIMIT
    • Type: Integer
    • Example: 10.0, 15, 20.5, 0
    • Default: 5.0

Warning

Avoid managing resources through supervisors like systemd as this can cause unpredictable behavior when dealing with multi-core processors.

  • sensor_user - The user that the SLS process will run as. This is a string of the user name.

    • Environment Variable: RUNTIMEDETECTIONS_SENSOR_USER
    • Type: String
    • Example: myuser, root, grant
    • Default: sophos-spl-user
  • log_cgroup_metrics - Boolean value specifying whether or not to log cgroup metrics to stderr at a 2-minute interval.

    • Environment Variable: RUNTIMEDETECTIONS_LOG_CGROUP_METRICS
    • Type: boolean
    • Example: true, false
    • Default: false
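
The memory_limit format described above can be checked before a value is written to the configuration file. The following is a minimal Python sketch, not part of SLS: the function name parse_memory_limit is hypothetical, and it assumes that M and G are binary multiples (1M = 1024 × 1024 bytes), which the field description does not state.

```python
import re

# Hypothetical helper, not part of SLS: validates a memory_limit string
# against the documented format (digits followed by M or G, or "0").
_MEMORY_LIMIT = re.compile(r"(\d+)([MG])")


def parse_memory_limit(value: str):
    """Return the limit in bytes, or None when the value means 'no limit'."""
    if value == "0":
        return None  # the special value "0" disables the limit
    match = _MEMORY_LIMIT.fullmatch(value)
    if match is None:
        raise ValueError(f"memory_limit must end in M or G, or be '0': {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    # Assumption: binary multiples; the documentation only says
    # "megabyte" and "gigabyte".
    factor = 1024 ** 2 if unit == "M" else 1024 ** 3
    return amount * factor
```

Under that assumption, the default of 1024M and the example value 1G describe the same limit.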

Verification

You can verify that the cgroup configuration is working properly by using the top utility. While SLS is running, top shows the memory and CPU usage of the SLS process as percentages of total resources. For CPU, SLS should never go above the configured CPU limit multiplied by the number of cores on the machine (the shell utility nproc prints the number of cores). For memory, you can calculate the configured limit as a percentage of the machine's total memory, which top displays in KiB by default.
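
The arithmetic behind these two checks can be sketched in Python. The function name expected_top_ceilings is hypothetical, and the sketch assumes the memory limit is given in MiB so that it can be compared against the KiB total shown by top.

```python
def expected_top_ceilings(cpu_limit_percent, cores, memory_limit_mib, total_memory_kib):
    """Return (max %CPU, max %MEM) that top should show for the SLS process."""
    # CPU: the configured limit multiplied by the number of cores (see nproc).
    cpu_ceiling = cpu_limit_percent * cores
    # Memory: convert the limit to KiB (assumed MiB input), then express it
    # as a percentage of the machine's total memory in KiB.
    mem_ceiling = (memory_limit_mib * 1024) / total_memory_kib * 100
    return cpu_ceiling, mem_ceiling


# Example: default limits (5% CPU, 1024M) on a 4-core host with 8 GiB of RAM.
cpu_max, mem_max = expected_top_ceilings(5.0, 4, 1024, 8 * 1024 * 1024)
print(f"%CPU should stay at or below {cpu_max}, %MEM at or below {mem_max}")
```

For this illustrative host, the SLS process should stay at or below 20% CPU and 12.5% memory in top.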

Alert limiter

This section describes the usage of SLS's soft resource limiting capabilities. Alert limiting allows you to set rate limits on alert output to limit the alert volume SLS will transmit to a SIEM, logging stack, or webhook.

Usage

The resource configurations are read in from SLS's configuration file which, by default, is at /etc/sophos/runtimedetections-rules.yaml. The path to the configuration file may be overridden by setting the RUNTIMEDETECTIONS_CONFIG environment variable. The following section describes the alert limiter configuration fields.

Configuration

Alert limiting is specified on a per-output basis, which allows configurations where certain outputs have higher limits than others. By default, no limits are applied and an output will receive all alerts. The following additional keys should be specified on an alert_output to configure alert limiting:

  • limit_period - Duration value indicating the period over which to limit alerts.
    • Example: 60s, 2m
    • Default: none
  • limit_per_period - The number of sustained events allowed per limit_period. Once this number is exceeded within a period, the circuit breaker trips and further alerts are discarded.
    • Example: 100
    • Default: none

The following example configures at most 5 alerts per minute to be written to standard output:

alert_output:
    outputs:
      - type: stdout
        enabled: true
        limit_per_period: 5
        limit_period: '60s'
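
To illustrate how limit_per_period and limit_period interact, the following Python sketch implements a simple fixed-window limiter. This is an assumption for illustration only: the documentation does not state which rate-limiting algorithm SLS uses, and the class name FixedWindowLimiter is hypothetical.

```python
class FixedWindowLimiter:
    """Illustrative fixed-window limiter; not the SLS implementation."""

    def __init__(self, limit_per_period: int, limit_period_s: float):
        self.limit = limit_per_period
        self.period = limit_period_s
        self.window_start = None  # start time of the current window
        self.count = 0            # alerts emitted in the current window

    def allow(self, now: float) -> bool:
        """Return True if an alert arriving at time `now` may be emitted."""
        # Open a new window on the first alert or once the period has elapsed.
        if self.window_start is None or now - self.window_start >= self.period:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # limit reached: the alert is discarded


# Mirrors the example above: at most 5 alerts per 60 seconds.
limiter = FixedWindowLimiter(limit_per_period=5, limit_period_s=60)
decisions = [limiter.allow(float(t)) for t in range(7)]
```

In this sketch the first five alerts in a window pass, the sixth and seventh are dropped, and alerts pass again once 60 seconds have elapsed since the window opened.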

Violations and monitoring

The cgroups for memory and CPU handle violations differently. When the SLS process exceeds its memory limit, the kernel kills it and the supervisor process restarts it. The CPU cgroup uses a concept of periods and quotas. The period is a configured amount of time, and the quota is the number of microseconds of CPU time allowed per period. SLS uses a period of one second, and the quota is based on the configured percentage. When the SLS process has used up its quota of CPU time, it is throttled, meaning it won't be scheduled on the CPU until the end of the period. Both behaviors affect SLS's coverage of telemetry events.
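
As a rough worked example of the period and quota relationship, the Python sketch below derives a quota from a configured percentage. The exact formula SLS uses is not documented here. The sketch assumes the quota scales with the number of cores, based on the statement that cpu_limit adjusts for multi-core processors, and the function name cpu_quota_us is hypothetical.

```python
PERIOD_US = 1_000_000  # one-second period, as described above


def cpu_quota_us(cpu_limit_percent: float, cores: int):
    """Return the CPU quota in microseconds per period, or None for no limit."""
    if cpu_limit_percent == 0:
        return None  # the special value 0 means no CPU limit
    # Assumption: the quota is the configured percentage of one period,
    # scaled by the number of cores on the machine.
    return round(cpu_limit_percent * PERIOD_US * cores / 100)


# Default limit of 5% on a 4-core host.
quota = cpu_quota_us(5.0, 4)
```

Under these assumptions, the default 5% limit on a 4-core host allows 200,000 microseconds of CPU time per one-second period before throttling begins.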

The cgroup exposes statistics about CPU throttling, which the supervisor process then writes to its logs on stderr. This must be turned on via the log_cgroup_metrics configuration option.

Restarts

When the SLS child process exits, whether because of a cgroup violation or for another reason, the supervisor process restarts it. This event is logged to stderr.

Capabilities

As part of your installation, SLS should have the following capabilities:

  • CAP_CHOWN
  • CAP_DAC_OVERRIDE
  • CAP_FOWNER
  • CAP_KILL
  • CAP_SETGID
  • CAP_SETUID
  • CAP_SETPCAP
  • CAP_IPC_LOCK
  • CAP_SYS_PTRACE
  • CAP_SYS_ADMIN
  • CAP_SYSLOG

These capabilities are necessary because the supervisor process executes SLS as an unprivileged user. If you are getting "permission denied" errors, you can verify that these capabilities are set with getcap <sensor_binary>. You can set them with setcap cap_chown,cap_dac_override,cap_fowner,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_ipc_lock,cap_sys_ptrace,cap_sys_admin,cap_syslog=+epi <sensor_binary>.
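
When checking getcap output by eye is error-prone, a small script can compare it against the list above. The following Python sketch is illustrative only: the function name missing_capabilities and the path /path/to/sensor are hypothetical, and it assumes the binary path contains no spaces and that the capability names follow the path on the same line.

```python
import re

# The capabilities listed above, in the lowercase form that getcap prints.
REQUIRED_CAPS = {
    "cap_chown", "cap_dac_override", "cap_fowner", "cap_kill",
    "cap_setgid", "cap_setuid", "cap_setpcap", "cap_ipc_lock",
    "cap_sys_ptrace", "cap_sys_admin", "cap_syslog",
}


def missing_capabilities(getcap_line: str) -> set:
    """Return the required capabilities absent from one line of getcap output."""
    # Drop the leading file path so that a path containing "cap_" is ignored.
    parts = getcap_line.strip().split(None, 1)
    caps_text = parts[1] if len(parts) > 1 else ""
    present = set(re.findall(r"cap_[a-z_]+", caps_text.lower()))
    return REQUIRED_CAPS - present


# Hypothetical output line for a binary that only has two capabilities set.
sample = "/path/to/sensor cap_chown,cap_kill=eip"
print(sorted(missing_capabilities(sample)))
```

An empty result means every capability in the list is present on the binary.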
