Setting up Investigations with Hadoop HttpFS
Overview
Hadoop is an open-source framework for processing data across a distributed computing environment. Hadoop HttpFS is a REST gateway that runs on a node in a Hadoop cluster and exposes HDFS operations through a single access point. Using Sophos Linux Sensor (SLS), you can set up investigations to send data to a Hadoop cluster through HttpFS as follows.
Requirements
- Hadoop Cluster
- SLS running in your environment
- Kerberos server (optional; required only if using authentication)
- Kerberos keytab file for the sensor (optional; required only if using authentication)
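HttpFS serves the WebHDFS REST API, by default on port 14000. Before configuring SLS, it can be worth confirming the endpoint is reachable from the sensor host. A minimal sketch that builds the check URL, assuming a hypothetical NameNode hostname namenode.example.com and Hadoop user hadoop (substitute your own values):

```shell
# Hypothetical values; replace with your HttpFS host and Hadoop user.
NAMENODE="namenode.example.com"
HADOOP_USER="hadoop"

# HttpFS exposes HDFS through the WebHDFS REST API on port 14000.
HTTPFS_URL="http://${NAMENODE}:14000/webhdfs/v1/?op=LISTSTATUS&user.name=${HADOOP_USER}"
echo "${HTTPFS_URL}"

# On a reachable, unsecured cluster this returns a JSON FileStatuses listing of /:
# curl "${HTTPFS_URL}"
```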
1. Configuring SLS:
Edit the configuration file /etc/sophos/runtimedetections-rules.yaml.
Add a sink for HttpFS, enable the sensor to create directories, and turn on the flight recorder. Here's an example:
cloud_meta: auto
blob_storage_create_buckets_enabled: true
investigations:
  reporting_interval: 30s
  sinks:
    - name: "[namenode hostname/ip]:14000/runtimedetections-investigations/"
      backend: httpfs
      automated: true
      type: parquet
      partition_format: "hostname_partition={{.Hostname}}/date_partition={{.Time.Format \"2006-01-02\"}}"
      credentials:
        blob_storage_httpfs_user: [hadoop user to write as]
        blob_storage_httpfs_use_ssl: false
  flight_recorder:
    enabled: true
    tables:
      - name: "shell_commands"
        rows: 1000
        enabled: true
      - name: "tty_data"
        rows: 1000
        enabled: true
      - name: "connections"
        rows: 2000
        enabled: true
      - name: "sensor_metadata"
        rows: 500
        enabled: true
      - name: "alerts"
        rows: 100
        enabled: true
      - name: "sensors"
        rows: 10
        enabled: true
      - name: "process_events"
        rows: 4000
        enabled: true
      - name: "container_events"
        rows: 300
        enabled: true
Save the modified file and restart SLS. Then confirm that SLS can write to HttpFS by listing the target directory in HDFS:
hdfs dfs -ls /runtimedetections-investigations/
This should list directories for the tables enabled in the config, for example:
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/alerts
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/connections
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/container_events
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/process_events
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/sensor_metadata
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/sensors
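Inside each table directory, files are laid out according to partition_format. The value is a Go text/template: {{.Hostname}} expands to the reporting host, and "2006-01-02" is Go's reference-date syntax for YYYY-MM-DD. A sketch of the resulting path for a hypothetical host web-01 reporting on 2020-10-27:

```shell
# Hypothetical example values; SLS fills these in from the sensor at write time.
HOST="web-01"
DAY="2020-10-27"   # Go layout "2006-01-02" means YYYY-MM-DD

PARTITION="hostname_partition=${HOST}/date_partition=${DAY}"
echo "/runtimedetections-investigations/connections/${PARTITION}"
```

Once data has been written, you can confirm the layout with hdfs dfs -ls -R /runtimedetections-investigations/connections/.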
2. Editing SLS config:
After confirming that SLS is properly configured, increase the reporting interval to a value better suited to production. Here's an example:
cloud_meta: auto
blob_storage_create_buckets_enabled: true
investigations:
  reporting_interval: 5m
  #...
3. Authentication with Kerberos (Optional)
SLS can write to Kerberos-protected HttpFS clusters. Four settings are needed to authenticate:

| Setting | Description |
| --- | --- |
| blob_storage_httpfs_krb5_conf | The krb5 client config for the relevant Kerberos environment. Note: currently, the only encryption types supported by the client are des3-cbc-sha1-kd and des3-hmac-sha1. |
| blob_storage_httpfs_keytab | Path to the client keytab file. |
| blob_storage_httpfs_principal | The principal in the keytab to use, e.g. "root/webserver-7fc8ddf957-f25w5.default.svc.cluster.local". |
| blob_storage_httpfs_domain | The realm (domain) for the principal, e.g. EXAMPLE.COM. |
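Before wiring these settings into SLS, you can sanity-check the keytab and principal with the standard MIT Kerberos tools. A sketch using the example values above (your keytab path, principal, and realm will differ); this requires the krb5 client tools on the sensor host:

```shell
# Example values; substitute your own keytab path and principal@REALM.
KEYTAB="/etc/sophos/root.keytab"
PRINCIPAL="root/webserver-7fc8ddf957-f25w5.default.svc.cluster.local@EXAMPLE.COM"

# List the principals stored in the keytab:
klist -kt "${KEYTAB}"

# Request a ticket using the keytab, then confirm one was granted:
kinit -kt "${KEYTAB}" "${PRINCIPAL}" && klist
```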
Here's an example:
cloud_meta: auto
blob_storage_create_buckets_enabled: true
investigations:
  reporting_interval: 30s
  sinks:
    - name: "[namenode hostname/ip]:14000/runtimedetections-investigations/"
      backend: httpfs
      automated: true
      type: parquet
      partition_format: "hostname_partition={{.Hostname}}/date_partition={{.Time.Format \"2006-01-02\"}}"
      credentials:
        blob_storage_httpfs_auth_type: kerberos
        blob_storage_httpfs_use_ssl: false
        blob_storage_httpfs_krb5_conf: /etc/sophos/krb5.conf
        blob_storage_httpfs_keytab: /etc/sophos/root.keytab
        blob_storage_httpfs_principal: "root/kerberos-sidecar-7fc8ddf957-f25w5.default.svc.cluster.local"
        blob_storage_httpfs_domain: "EXAMPLE.COM"
  flight_recorder:
    enabled: true
    tables:
      - name: "shell_commands"
        rows: 1000
        enabled: true
      - name: "tty_data"
        rows: 1000
        enabled: true
      - name: "connections"
        rows: 2000
        enabled: true
      - name: "sensor_metadata"
        rows: 500
        enabled: true
      - name: "alerts"
        rows: 100
        enabled: true
      - name: "sensors"
        rows: 10
        enabled: true
      - name: "process_events"
        rows: 4000
        enabled: true
      - name: "container_events"
        rows: 300
        enabled: true
Deploy the configuration and restart SLS. Then confirm that the sensor can write to HttpFS by listing the target directory in HDFS:
hdfs dfs -ls /runtimedetections-investigations/
This should list directories for the tables enabled in the config, for example:
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/alerts
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/connections
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/container_events
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/process_events
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/sensor_metadata
drwxr--r-- - root supergroup 0 2020-10-27 18:33 /runtimedetections-investigations/sensors
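If you don't have an HDFS client handy, the same check can be made over HttpFS itself using curl's SPNEGO support, assuming a Kerberos ticket is already in the ticket cache (e.g. via kinit). A sketch against the hypothetical host namenode.example.com:

```shell
# --negotiate enables SPNEGO/Kerberos; "-u :" tells curl to take the
# identity from the ticket cache rather than prompting for a password.
curl --negotiate -u : \
  "http://namenode.example.com:14000/webhdfs/v1/runtimedetections-investigations?op=LISTSTATUS"
```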