Data Lake queries

Data Lake queries let you search security and compliance data that your devices upload to the cloud.


Data Lake uploads are turned off by default so that customers can decide which devices to exclude before turning uploads on. If uploads were on by default, customers with large environments might see a sudden increase in network traffic.

You can run Data Lake queries with Live Discover, a feature in our Threat Analysis Center.

Live Discover now lets you choose which data source you use when you set up and run a query:

  • Endpoints that are currently connected.
  • The Data Lake in the cloud.

For help with Live Discover, see Live Discover.

How the Data Lake works

We host the Data Lake and provide scheduled “hydration queries” that define which data your endpoints upload to it.

Before you use Data Lake queries, make sure that data is being uploaded. To turn on uploads, see Data Lake uploads.

We store the data for 90 days.
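As a rough illustration of what the 90-day retention period means for a query, the following Python sketch works out the oldest timestamp that would still be available on a given date. It is not part of the product: the names `RETENTION_DAYS`, `earliest_queryable`, and `is_queryable` are hypothetical, and treating the boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # retention period stated above


def earliest_queryable(now: datetime) -> datetime:
    """Return the oldest timestamp still inside the retention window."""
    return now - timedelta(days=RETENTION_DAYS)


def is_queryable(event_time: datetime, now: datetime) -> bool:
    """True if a record uploaded at event_time is still retained at now.

    Assumption: the window is inclusive at both ends.
    """
    return earliest_queryable(now) <= event_time <= now


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
print(earliest_queryable(now).date())  # 2024-04-01
```

In this example, data uploaded on 1 May 2024 would still be available on 30 June 2024, while data uploaded on 31 March 2024 would not.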

We provide ready-made Data Lake queries that you can run as they are or edit. You can also create your own queries.

Benefits of Data Lake queries

Data Lake queries have some advantages over endpoint queries.

They always give results for all endpoints, whether they’re connected or not.

They can query data that’s up to 90 days old. You can set the time period so that a query returns only as much data as you need.

They can be scheduled.

They can give you access to data uploaded by other Sophos products you're using (shown as “sensors” in Live Discover). For example:

  • Sophos Cloud Optix can upload data from your cloud environments to the Data Lake. You need to turn this on in Sophos Cloud Optix.
  • Sophos Email can upload data if you have M365 integration and turn on Auto search and remediate.
  • Sophos Firewall can upload data if you have Central Firewall Reporting set up.
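The difference in coverage between the two data sources, and the effect of the time period, can be sketched with a small Python model. This is a conceptual illustration only, not how Live Discover is implemented: the `Endpoint` and `Record` types, the functions `live_query` and `data_lake_query`, and the device names are all hypothetical. It also assumes that each device uploaded data before going offline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Endpoint:
    name: str
    connected: bool


@dataclass
class Record:
    endpoint: str
    uploaded_at: datetime


def live_query(endpoints):
    """Endpoint query: only devices that are connected right now can answer."""
    return sorted(e.name for e in endpoints if e.connected)


def data_lake_query(records, now, days):
    """Data Lake query: reads uploaded records inside the chosen time period."""
    cutoff = now - timedelta(days=days)
    return sorted({r.endpoint for r in records if r.uploaded_at >= cutoff})


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
endpoints = [Endpoint("laptop-01", True), Endpoint("laptop-02", False)]
records = [
    Record("laptop-01", now - timedelta(days=1)),
    Record("laptop-02", now - timedelta(days=10)),  # uploaded before going offline
]

print(live_query(endpoints))              # ['laptop-01']
print(data_lake_query(records, now, 30))  # ['laptop-01', 'laptop-02']
print(data_lake_query(records, now, 7))   # ['laptop-01']
```

In this model, the endpoint query misses the disconnected device, while the Data Lake query still returns its uploaded data. Narrowing the time period from 30 days to 7 days reduces the result to the most recent uploads.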