
Setting up Investigations with Hadoop HttpFS

Overview

Hadoop is an open-source framework for running data-processing applications in a distributed computing environment. Hadoop HttpFS is a REST API that runs on nodes in a Hadoop cluster and provides a single access point for performing Hadoop file system operations. Using Sophos Linux Sensor (SLS), you can set up investigations to send data to a Hadoop cluster through HttpFS as follows:
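For orientation, HttpFS serves the standard WebHDFS REST interface (by default on port 14000), so every HDFS operation maps to an HTTP request against one endpoint. A minimal sketch of how such request URLs are composed; the host and user names are placeholders:

```python
# Build WebHDFS-style request URLs as served by HttpFS (default port 14000).
# The namenode host and user below are placeholders for your environment.

def httpfs_url(host: str, path: str, op: str, user: str, port: int = 14000) -> str:
    """Compose an HttpFS REST URL for a single HDFS operation."""
    return f"http://{host}:{port}/webhdfs/v1{path}?op={op}&user.name={user}"

# e.g. list the investigations directory that SLS writes to:
print(httpfs_url("namenode.example.com",
                 "/runtimedetections-investigations/",
                 "LISTSTATUS", "hadoop"))
```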

Requirements

  • Hadoop Cluster 
  • SLS running in your environment
  • Kerberos Server (Optional, required for using authentication)
  • Kerberos keytab file for the sensor (Optional, required for using authentication)

1. Configuring SLS:

Edit the configuration file /etc/sophos/runtimedetections-rules.yaml.

Add a sink for HttpFS, enable the sensor to create directories, and turn on the flight recorder. Here's an example:

cloud_meta: auto
blob_storage_create_buckets_enabled: true
investigations:
  reporting_interval: 30s
  sinks:
    - name: "[namenode hostname/ip]:14000/runtimedetections-investigations/"
      backend: httpfs
      automated: true
      type: parquet
      partition_format: "hostname_partition={{.Hostname}}/date_partition={{.Time.Format \"2006-01-02\"}}"
      credentials:
        blob_storage_httpfs_user: [hadoop user to write as]
        blob_storage_httpfs_use_ssl: false
flight_recorder:
  enabled: true
  tables:
    - name: "shell_commands"
      rows: 1000
      enabled: true
    - name: "tty_data"
      rows: 1000
      enabled: true
    - name: "connections"
      rows: 2000
      enabled: true
    - name: "sensor_metadata"
      rows: 500
      enabled: true
    - name: "alerts"
      rows: 100
      enabled: true
    - name: "sensors"
      rows: 10
      enabled: true
    - name: "process_events"
      rows: 4000
      enabled: true
    - name: "container_events"
      rows: 300
      enabled: true
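The partition_format value is a Go template: {{.Hostname}} expands to the sensor's hostname, and {{.Time.Format "2006-01-02"}} formats the timestamp using Go's reference-date layout, which corresponds to strftime's %Y-%m-%d. A hedged sketch of the resulting path, assuming partitions land under each table's directory (the hostname is made up):

```python
from datetime import date

def partition_path(table: str, hostname: str, day: date) -> str:
    """Mimic the partition_format template in the config above.
    Go's reference layout "2006-01-02" is strftime's %Y-%m-%d."""
    return (f"/runtimedetections-investigations/{table}/"
            f"hostname_partition={hostname}/"
            f"date_partition={day.strftime('%Y-%m-%d')}")

print(partition_path("alerts", "web-01", date(2020, 10, 27)))
```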

Save the modified file and restart SLS. Next, check that SLS was able to write to HttpFS by checking HDFS:

$ hdfs dfs -ls /runtimedetections-investigations/

This should list directories for the tables enabled in the config:

drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/alerts
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/connections
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/container_events
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/process_events
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/sensor_metadata
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/sensors

2. Editing SLS config:

After confirming that SLS is properly configured, increase the reporting interval to a value better suited for ongoing use. Here's an example:

cloud_meta: auto
blob_storage_create_buckets_enabled: true
investigations:
  reporting_interval: 5m
  #...
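For intuition, the interval strings appear to use Go-style duration syntax (an assumption based on the 30s and 5m examples): a longer interval means fewer, larger Parquet writes per table, while the short 30s interval is handy for quick verification. A small sketch converting these strings to seconds:

```python
import re

# Assumption: reporting_interval follows Go time.ParseDuration-style
# units; only whole-number s/m/h values are handled in this sketch.
UNITS = {"s": 1, "m": 60, "h": 3600}

def duration_seconds(d: str) -> int:
    """Convert a simple Go-style duration string like '30s' or '5m' to seconds."""
    value, unit = re.fullmatch(r"(\d+)([smh])", d).groups()
    return int(value) * UNITS[unit]

# 5m holds ten times as much data per write as 30s:
print(duration_seconds("5m") // duration_seconds("30s"))
```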

3. Authentication with Kerberos (Optional)

SLS can write to Kerberos-protected HttpFS clusters. Four pieces of information are needed to authenticate:

  • blob_storage_httpfs_krb5_conf: The krb5 client config configured for the relevant Kerberos environment.
    Note: Currently, the only encryption types supported by the client are des3-cbc-sha1-kd and des3-hmac-sha1.
  • blob_storage_httpfs_keytab: Path to the client keytab file.
  • blob_storage_httpfs_principal: The principal in the keytab to use, e.g. "root/webserver-7fc8ddf957-f25w5.default.svc.cluster.local".
  • blob_storage_httpfs_domain: The Kerberos domain (realm) for the principal, e.g. EXAMPLE.COM.
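In standard Kerberos terms, the principal and domain settings combine into a full principal name of the form primary/instance@REALM. A purely illustrative sketch of how the two example values fit together (this shows Kerberos naming conventions, not SLS internals):

```python
def full_principal(principal: str, realm: str) -> str:
    """Join a keytab principal with its Kerberos realm
    (standard primary/instance@REALM form)."""
    return f"{principal}@{realm}"

print(full_principal(
    "root/webserver-7fc8ddf957-f25w5.default.svc.cluster.local",
    "EXAMPLE.COM",
))
```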

Here's an example:

cloud_meta: auto
blob_storage_create_buckets_enabled: true
investigations:
  reporting_interval: 30s
  sinks:
    - name: "[namenode hostname/ip]:14000/runtimedetections-investigations/"
      backend: httpfs
      automated: true
      type: parquet
      partition_format: "hostname_partition={{.Hostname}}/date_partition={{.Time.Format \"2006-01-02\"}}"
      credentials:
        blob_storage_httpfs_auth_type: kerberos
        blob_storage_httpfs_use_ssl: false
        blob_storage_httpfs_krb5_conf: /etc/sophos/krb5.conf
        blob_storage_httpfs_keytab: /etc/sophos/root.keytab
        blob_storage_httpfs_principal: "root/kerberos-sidecar-7fc8ddf957-f25w5.default.svc.cluster.local"
        blob_storage_httpfs_domain: "EXAMPLE.COM"
flight_recorder:
  enabled: true
  tables:
    - name: "shell_commands"
      rows: 1000
      enabled: true
    - name: "tty_data"
      rows: 1000
      enabled: true
    - name: "connections"
      rows: 2000
      enabled: true
    - name: "sensor_metadata"
      rows: 500
      enabled: true
    - name: "alerts"
      rows: 100
      enabled: true
    - name: "sensors"
      rows: 10
      enabled: true
    - name: "process_events"
      rows: 4000
      enabled: true
    - name: "container_events"
      rows: 300
      enabled: true

Deploy the configuration and restart SLS. Next, check that the sensor is able to write to HttpFS by checking HDFS:

$ hdfs dfs -ls /runtimedetections-investigations/ 

This should list directories for the tables enabled in the config:

drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/alerts
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/connections
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/container_events
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/process_events
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/sensor_metadata
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/sensors