Architecture Guide

Agentless vs. Agent-Based DSPM

The deployment model is the first architectural decision in any DSPM evaluation, and it is the one that most directly constrains the vendor shortlist. Agentless platforms connect to data stores through cloud-native APIs, with no software to deploy and no infrastructure to manage. Agent and collector platforms deploy software closer to the data, on hosts, servers, or network collection points, to reach coverage depth or real-time visibility that API-based approaches cannot deliver. These are different answers to different coverage requirements, not a better option and a worse one.

What agentless means in practice

Agentless DSPM platforms connect to cloud services, data stores, and SaaS applications through their native APIs and cloud provider SDKs. The platform requests read access to data stores (S3 buckets, Snowflake databases, Azure Blob containers, Salesforce objects) and scans them without running any software on the systems that host those data stores.

The operational implication: deployment is fast. A cloud-native organization can grant API permissions, connect the platform to its cloud accounts, and have the first scan results within hours or days. There's no change management process for agent deployment, no compatibility testing with host operating systems, no ongoing agent lifecycle management.

The coverage implication: agentless platforms are bounded by what APIs expose. If a cloud provider's API surfaces the data in a storage service, the platform can scan it. A legacy file server serving shares over CIFS/SMB, a mainframe dataset, or a NAS device on an on-premises network typically exposes no comparable API, so an agentless platform has no path to the data in that location.

The behavioral implication: API-based platforms see data at rest. They scan what exists in storage, classify it, and assess what access configurations apply. They do not observe data access in real time. A user who downloads a sensitive file from S3 generates an event in CloudTrail (provided S3 data events are enabled), not in the DSPM platform's activity feed.
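To make "data at rest" concrete, here is a minimal Python sketch of a single posture pass. It assumes a hypothetical `DataStore` record holding object contents and one access flag already fetched from the provider's API; the regex patterns and all names are illustrative assumptions, not any vendor's classifier. Note what the input lacks: nothing in it records who read an object or when, so the scan can report that sensitive data is exposed but not that it was accessed.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns only; production classifiers add validation and context.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

@dataclass
class StoredObject:
    key: str
    content: str

@dataclass
class DataStore:
    name: str
    publicly_readable: bool  # access configuration as reported by the provider API (hypothetical field)
    objects: list[StoredObject] = field(default_factory=list)

def classify(text: str) -> set[str]:
    """Return the label of every pattern that matches somewhere in the text."""
    return {label for label, rx in PATTERNS.items() if rx.search(text)}

def posture_findings(store: DataStore) -> list[str]:
    """Classify what is stored, then check it against the store's access configuration.

    The input is a snapshot of contents and settings; there are no access events,
    so the result says what is exposed, not who has read it.
    """
    findings = []
    for obj in store.objects:
        labels = classify(obj.content)
        if labels and store.publicly_readable:
            findings.append(f"{store.name}/{obj.key}: {sorted(labels)} publicly readable")
    return findings

if __name__ == "__main__":
    store = DataStore("exports", True, [
        StoredObject("users.csv", "alice@example.com,123-45-6789"),
        StoredObject("readme.txt", "no sensitive data here"),
    ])
    for finding in posture_findings(store):
        print(finding)  # exports/users.csv: ['email', 'us_ssn'] publicly readable
```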

The shadow data advantage. Agentless platforms have a structural advantage for shadow data discovery: because they connect directly to cloud provider APIs, they can enumerate all storage resources in a cloud account, including ones not registered in any governance catalog. An agentless scan finds the forgotten S3 bucket that was created three years ago by a developer for a one-time migration and never cleaned up. An agent-based platform can only find that bucket if the agent happens to be pointed at it.
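Shadow data discovery reduces to a set difference between what the cloud API enumerates and what the governance catalog knows about. A minimal sketch, assuming the enumeration has already been collected (in AWS, for example, from the S3 ListBuckets API in each connected account) and that the catalog can be exported as (account, bucket) pairs; `StorageResource` and `find_shadow_resources` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageResource:
    account: str  # cloud account or subscription the resource lives in
    name: str     # bucket or container name

def find_shadow_resources(enumerated: list[StorageResource],
                          catalog: set[tuple[str, str]]) -> list[StorageResource]:
    """Return resources the cloud API reports but the governance catalog does not list."""
    unregistered = [r for r in enumerated if (r.account, r.name) not in catalog]
    # Sort for stable, reviewable output.
    return sorted(unregistered, key=lambda r: (r.account, r.name))

if __name__ == "__main__":
    enumerated = [
        StorageResource("prod", "customer-data"),
        StorageResource("dev", "tmp-migration-2021"),
        StorageResource("prod", "app-logs"),
    ]
    catalog = {("prod", "customer-data"), ("prod", "app-logs")}
    for r in find_shadow_resources(enumerated, catalog):
        print(f"{r.account}/{r.name}")  # dev/tmp-migration-2021
```

The comparison only works because the enumeration is complete for each connected account; a resource that never appears in `enumerated` can never be flagged.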

What agent-based and collector architectures mean in practice

Agent-based platforms deploy software, either lightweight agents on hosts or network-layer collectors, that sits closer to the data than an API connection can reach. The classic example is Varonis: the platform deploys collectors that sit on or near Windows file servers, NAS devices, and Active Directory, where they capture file system activity in real time. That collector architecture is the only way to achieve the coverage Varonis is built around: real-time observation of every file access, rename, move, and delete across a Windows environment.

The operational implication: deployment is slower and more complex. Collectors need to be sized, installed, and maintained. Compatibility with host operating systems and file system versions must be validated. Agent deployments in complex on-premises environments typically require professional services involvement and take weeks to months to reach full coverage.

The coverage implication: agent and collector architectures reach data sources that APIs cannot. On-premises file servers, network-attached storage, legacy databases, and mainframe environments are all in scope for agent-based platforms. For organizations with significant on-premises data estates, that coverage is the entire reason to choose this architecture in the first place.

The behavioral implication: agent and collector architectures observe access in real time. This is the foundation for behavioral analytics, User and Entity Behavior Analytics (UEBA), Data Detection and Response (DDR), and automated threat detection. Detecting ransomware staging requires watching thousands of rapid sequential file reads and writes. That data does not exist in an API response. It only exists in a real-time file system event stream.
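The ransomware-staging signal can be sketched as a per-user sliding window over the collector's event stream. This minimal Python version assumes a hypothetical `FileEvent` record and a fixed threshold chosen purely for illustration; real behavioral analytics generally baseline each user's normal activity rather than rely on one static limit. The point is the input: the detector consumes individual access events as they occur, which an at-rest API scan never produces.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass
class FileEvent:
    timestamp: float  # seconds
    user: str
    path: str
    action: str       # e.g. "read", "write", "rename", "delete"

class BurstDetector:
    """Flag a user whose file operations exceed max_events within window_seconds.

    The defaults are illustrative assumptions, not tuned detection thresholds.
    """
    WATCHED = {"read", "write", "rename", "delete"}

    def __init__(self, window_seconds: float = 60.0, max_events: int = 1000):
        self.window = window_seconds
        self.max_events = max_events
        self.recent: dict[str, deque] = defaultdict(deque)  # user -> recent event timestamps
        self.flagged: set[str] = set()

    def observe(self, event: FileEvent) -> bool:
        """Consume one event from the stream; return True if this user is over threshold."""
        if event.action not in self.WATCHED:
            return False
        times = self.recent[event.user]
        times.append(event.timestamp)
        # Evict timestamps that have slid out of the window (events assumed in time order).
        while event.timestamp - times[0] > self.window:
            times.popleft()
        if len(times) > self.max_events:
            self.flagged.add(event.user)
            return True
        return False
```

Fed with four writes in four seconds and `max_events=3`, the fourth event trips the detector, while the same number of reads spaced twenty seconds apart never does.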

The three requirements that determine which architecture fits

On-premises and file server coverage

If a meaningful fraction of the sensitive data estate lives on Windows file servers, NAS, SharePoint on-premises, or legacy databases with no cloud API, an agentless platform cannot cover it. The question is not whether some data is on-premises; most enterprise environments have some. The question is whether the on-premises data is a significant part of the data risk requiring coverage. If yes, agent or collector architecture is required. If the on-premises data estate is residual and the primary risk is in cloud and SaaS, agentless is the faster and lower-overhead path.

Real-time behavioral detection requirements

If the use case includes detecting ransomware staging, insider threat, compromised credential behavior, or any threat scenario that requires watching data access as it happens, agentless architecture cannot satisfy it. API-based platforms do not observe real-time data access. Agent or collector architecture is the only path to genuine behavioral detection. If the threat model is data exposure and misconfiguration rather than behavioral threat detection, this requirement does not apply, and agentless platforms are not disadvantaged.

Deployment overhead capacity

Agent and collector deployments require ongoing operational capacity: agent installation, version management, compatibility maintenance, and scaling as the environment changes. In environments that are growing fast, are highly cloud-native, or have limited infrastructure operations capacity, agentless platforms deliver coverage without that overhead. This constraint shows up in practice more often than buyers expect: organizations that underestimate the cost of managing a collector deployment at scale frequently find that DSPM coverage lags the data estate, because new environments aren't connected promptly.
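The three requirements can be read as a simple decision rule. The sketch below is one hypothetical encoding in Python; in particular, the 20% cutoff for a "significant" on-premises share is an assumption standing in for each organization's own judgment of where its data risk sits.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    # Estimated fraction (0.0 to 1.0) of sensitive data on file servers, NAS,
    # or legacy systems with no cloud API.
    onprem_sensitive_share: float
    # Ransomware staging, insider threat, compromised-credential behavior.
    needs_behavioral_detection: bool
    # Capacity to install, upgrade, and scale collectors as the estate changes.
    has_collector_ops_capacity: bool

def recommend_architecture(req: Requirements, significant_share: float = 0.2) -> str:
    """Hypothetical encoding of the three requirements; the cutoff is an assumption."""
    needs_collectors = (
        req.onprem_sensitive_share >= significant_share
        or req.needs_behavioral_detection
    )
    if not needs_collectors:
        # Cloud and SaaS dominate the risk and no real-time detection is needed.
        return "agentless"
    if not req.has_collector_ops_capacity:
        # Collectors are required, but coverage is likely to lag the data estate.
        return "agent/collector (staff the deployment first)"
    return "agent/collector"
```

Operational capacity only matters once collectors are required; an agentless result carries no collector overhead to staff for.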

Hybrid environments: where both architectures exist

Most enterprise environments are hybrid: cloud-native workloads alongside on-premises infrastructure, with the balance shifting toward cloud over time but the on-premises data estate not disappearing. The coverage question in these environments comes down to which architecture covers which layer, rather than which architecture wins outright.

The practical outcome in many hybrid environments is that organizations end up running both: an agentless cloud-native platform covering the cloud and SaaS data estate, and an agent-based platform covering on-premises file servers and behavioral detection. Running both is a reasonable acknowledgment that these are different coverage layers with different architectural requirements, not a sign that the procurement process went wrong somewhere. The actual risk is acquiring both without clearly scoping each to its layer, which produces redundant coverage in some areas and gaps in others.
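Scoping each platform to its layer can be checked mechanically once the inventory and each platform's connected sources are listed. A minimal Python sketch with hypothetical source identifiers: in-scope sources that both platforms scan are redundant coverage, and in-scope sources that neither reaches are gaps.

```python
def scope_check(inventory: set[str], agentless_scope: set[str],
                collector_scope: set[str]) -> dict[str, set[str]]:
    """Compare each platform's connected sources against the in-scope inventory."""
    covered = agentless_scope | collector_scope
    return {
        # In-scope sources both platforms scan: review whether the overlap is deliberate.
        "redundant": inventory & agentless_scope & collector_scope,
        # In-scope sources neither platform reaches.
        "gaps": inventory - covered,
    }

if __name__ == "__main__":
    inventory = {"s3:prod-data", "snowflake:analytics", "sharepoint:hr",
                 "smb:finance-share", "nas:eng-archive"}
    agentless = {"s3:prod-data", "snowflake:analytics", "sharepoint:hr"}
    collector = {"sharepoint:hr", "smb:finance-share"}
    result = scope_check(inventory, agentless, collector)
    print(sorted(result["redundant"]))  # ['sharepoint:hr']
    print(sorted(result["gaps"]))       # ['nas:eng-archive']
```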

The coverage mapping worth establishing before platform selection: which data sources are in scope, how they break down between cloud-native and on-premises, and which of those require behavioral monitoring as distinct from classification and posture. The answers to those questions narrow the architecture decision significantly.
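The coverage mapping can be sketched per source, treating classification and posture separately from behavioral monitoring as the questions above do. This Python sketch assumes a hypothetical `DataSource` record with two flags; a source lands in both layers when it is API-reachable for posture but also needs real-time monitoring.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataSource:
    name: str
    has_cloud_api: bool                # reachable through a cloud provider or SaaS API
    needs_behavioral_monitoring: bool  # real-time access monitoring is in scope

def map_coverage(sources: list[DataSource]) -> dict[str, list[tuple[str, str]]]:
    """Assign each (source, purpose) pair to the layer that can serve it."""
    plan: dict[str, list[tuple[str, str]]] = {"agentless": [], "collector": []}
    for src in sources:
        # Classification and posture: API-reachable sources can go agentless;
        # anything without an API needs an agent or collector path.
        posture_layer = "agentless" if src.has_cloud_api else "collector"
        plan[posture_layer].append((src.name, "posture"))
        # Behavioral monitoring needs a real-time event stream, which in this
        # model only an agent or collector provides.
        if src.needs_behavioral_monitoring:
            plan["collector"].append((src.name, "behavioral"))
    return plan
```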

Where the market is heading

The dominant direction is agentless. Most net-new DSPM deployments over the past three years have been in cloud-native environments where agentless architecture is the natural fit, and most pure-play DSPM platforms founded since 2019 are agentless. Varonis, which predates the DSPM category, is the major exception, and its use of collectors reflects a deliberate design decision rather than a legacy constraint: that architecture is the foundation for the DDR and behavioral analytics capabilities that agentless platforms aren't built to replicate.

Agent-based and collector architectures are not going away. As long as on-premises file servers, NAS environments, and legacy infrastructure hold significant sensitive data, and as long as behavioral threat detection remains a real requirement, collector-based platforms will cover the layer that agentless platforms cannot reach.