S3 Metadata Table Scan for Amazon S3

S3 metadata table scan uses the Amazon S3 metadata tables feature to back up large S3 buckets efficiently, particularly buckets with millions of objects. Instead of enumerating objects through expensive API-based scanning, it reads object information from the metadata tables that Amazon S3 maintains natively.

Resource Requirements

Amazon S3 permissions for metadata table management
AWS Lake Formation configuration for S3 Tables
Local storage for metadata processing
Network bandwidth for metadata downloads

How It Works

Amazon S3 automatically populates the configured metadata tables with object metadata.
For incremental jobs, change detection compares object metadata between scans to identify objects that have changed.
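The comparison step can be illustrated with a minimal Python sketch. It assumes each scan yields one metadata row per object key with an ETag and a size; the ObjectMeta type and the detect_changes function are hypothetical and do not reflect the actual S3 metadata table schema or Commvault Cloud's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectMeta:
    """Hypothetical metadata row: object key, ETag, and size in bytes."""
    key: str
    etag: str
    size: int


def detect_changes(previous, current):
    """Classify object keys by comparing two metadata snapshots.

    previous, current: iterables of ObjectMeta, assumed to hold one row per key.
    Returns (added, modified, removed) as sorted lists of keys.
    """
    prev = {m.key: m for m in previous}
    curr = {m.key: m for m in current}
    added = sorted(k for k in curr if k not in prev)
    removed = sorted(k for k in prev if k not in curr)
    # A key present in both snapshots is modified if its ETag or size changed.
    modified = sorted(
        k for k in curr
        if k in prev and (curr[k].etag, curr[k].size) != (prev[k].etag, prev[k].size)
    )
    return added, modified, removed


# Example:
#   before = [ObjectMeta("a.txt", "e1", 10), ObjectMeta("b.txt", "e2", 20)]
#   after = [ObjectMeta("a.txt", "e1", 10), ObjectMeta("b.txt", "e3", 21),
#            ObjectMeta("c.txt", "e4", 5)]
#   detect_changes(before, after)  # (['c.txt'], ['b.txt'], [])
```

Only added and modified keys would need to be backed up in an incremental job; removed keys are reported for completeness.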

Configuration Concepts

By default, if S3 discovery finds more than 500 million objects, the scan type of the default subclient automatically switches to S3 metadata table scan, and the newly created instance uses the new indexing schema.
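The default switch can be expressed as a simple threshold check. In this sketch, ScanType, choose_scan_type, and METADATA_SCAN_THRESHOLD are hypothetical names; only the 500 million value and the "exceeds" condition come from the behavior described above.

```python
from enum import Enum

# Threshold from the documented default behavior (500 million objects).
METADATA_SCAN_THRESHOLD = 500_000_000


class ScanType(Enum):
    API_SCAN = "api"                        # standard API-based scanning
    METADATA_TABLE_SCAN = "metadata_table"  # S3 metadata table scan


def choose_scan_type(discovered_object_count: int) -> ScanType:
    """Pick the default subclient's scan type after S3 discovery.

    Metadata table scan is chosen only when the count *exceeds* the threshold,
    so exactly 500 million objects still uses API-based scanning.
    """
    if discovered_object_count < 0:
        raise ValueError("object count cannot be negative")
    if discovered_object_count > METADATA_SCAN_THRESHOLD:
        return ScanType.METADATA_TABLE_SCAN
    return ScanType.API_SCAN
```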

For multi-billion object buckets, split content across multiple subclients using bucket and prefix patterns to enable parallel processing. Metadata tables are updated continuously and store data in S3 Tables with automated lifecycle management.
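Because S3 metadata table scan can only be enabled at the bucket level, one way to plan the split is to assign whole buckets to subclients by bucket-name pattern. This hypothetical sketch uses Python's fnmatch glob matching; the subclient names and patterns are illustrative, not Commvault Cloud syntax.

```python
from fnmatch import fnmatchcase


def assign_buckets(buckets, subclient_patterns):
    """Assign whole buckets to subclients by matching bucket-name patterns.

    buckets: iterable of bucket names.
    subclient_patterns: dict mapping a hypothetical subclient name to a list of
        glob patterns such as "logs-*". Subclients are checked in dict order and
        the first match wins, so each bucket lands in at most one subclient.
    Returns (assignments, unmatched): subclient -> sorted bucket list, and a
    sorted list of buckets that matched no pattern.
    """
    assignments = {name: [] for name in subclient_patterns}
    unmatched = []
    for bucket in sorted(buckets):
        for name, patterns in subclient_patterns.items():
            if any(fnmatchcase(bucket, p) for p in patterns):
                assignments[name].append(bucket)
                break
        else:
            # No subclient pattern matched this bucket.
            unmatched.append(bucket)
    return assignments, unmatched
```

Stopping at the first match keeps each bucket in exactly one subclient, which avoids backing up the same bucket twice; any bucket returned as unmatched still needs to be placed in a subclient before it is protected.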

Planning Considerations

S3 metadata table scan reads from metadata tables, which are not updated in real time, and does not include delete markers from versioned buckets.
S3 metadata table scan can only be enabled at the bucket level; it is not supported when the configured subclient content is a subfolder within a bucket.
If your subclient is configured for a specific subfolder, do not enable metadata settings at the parent bucket level, as this will track the entire bucket and significantly increase AWS costs.
AWS Lake Formation must be properly configured in each region where S3 buckets are located.
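A pre-check for the bucket-level restriction might look like the following sketch. The content path formats it accepts ("bucket", "bucket/", "s3://bucket", "bucket/prefix") are assumptions for illustration, and metadata_scan_eligible is a hypothetical helper.

```python
def metadata_scan_eligible(content_path: str) -> bool:
    """Return True if the subclient content covers a whole bucket.

    Assumed (hypothetical) path formats: "bucket", "bucket/", "s3://bucket"
    for a whole bucket, and "bucket/prefix/..." for a subfolder.
    """
    path = content_path.strip()
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    path = path.strip("/")
    if not path:
        raise ValueError("content path does not name a bucket")
    # Any remaining "/" means the content is a subfolder within the bucket,
    # which metadata table scan does not support.
    return "/" not in path
```

A subclient that fails this check should stay on API-based scanning rather than having metadata enabled on the parent bucket, for the cost reason given above.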

S3 Tables and Lake Formation Setup

When S3 metadata table scan is enabled, Commvault Cloud automatically creates metadata table configurations for S3 buckets. This requires the appropriate AWS Lake Formation permissions and S3 Tables access. If your IAM permissions are restricted, the initial job fails. To resolve this, you must:

Configure the IAM user as an AWS Lake Formation administrator for each region
Grant necessary S3 Tables permissions for metadata table creation
Provide access to journal and S3 metadata tables through Lake Formation

Completing these steps allows S3 metadata table processing to work even when the automatic setup requires elevated Lake Formation permissions.
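The first step, adding the IAM user as a Lake Formation administrator in each region, can be scripted with the AWS SDK for Python (boto3). This is a sketch, not a Commvault-provided tool: it takes a boto3 "lakeformation" client as a parameter so it can be exercised without AWS, and the ARN and regions in the usage comment are hypothetical. Running it requires credentials that can read and update the Lake Formation data lake settings.

```python
def ensure_lake_formation_admin(lf_client, principal_arn: str) -> bool:
    """Add principal_arn as a Lake Formation data lake administrator.

    lf_client: a boto3 "lakeformation" client for one region.
    Returns True if the settings were updated, False if already an admin.
    """
    # Read the full current settings so existing values are preserved.
    settings = lf_client.get_data_lake_settings()["DataLakeSettings"]
    admins = settings.get("DataLakeAdmins", [])
    if any(a.get("DataLakePrincipalIdentifier") == principal_arn for a in admins):
        return False  # already an administrator in this region
    settings["DataLakeAdmins"] = admins + [{"DataLakePrincipalIdentifier": principal_arn}]
    # Write back the complete settings object with the extended admin list.
    lf_client.put_data_lake_settings(DataLakeSettings=settings)
    return True


# Example usage (requires boto3 and AWS credentials; values are hypothetical):
#   import boto3
#   principal = "arn:aws:iam::111122223333:user/commvault-backup"
#   for region in ["us-east-1", "eu-west-1"]:
#       client = boto3.client("lakeformation", region_name=region)
#       ensure_lake_formation_admin(client, principal)
```

The function reads the current settings before writing because put_data_lake_settings sets the complete settings object; writing only the new administrator could drop existing ones. The S3 Tables and Lake Formation access grants in the remaining steps are not covered by this sketch.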

Enable Table Buckets Integration

Amazon S3 automatically integrates table buckets with AWS analytics services (Athena, Glue, and Lake Formation) in supported regions.

Steps to Enable Integration

Open the Amazon S3 Console: https://console.aws.amazon.com/s3/
Navigate to Table Buckets in the left navigation pane.
Amazon S3 will automatically attempt to integrate your table buckets in supported regions with AWS analytics services.

Supported Regions

S3 metadata table scan is available in the following AWS regions:

us-east-1 (US East - N. Virginia)
us-east-2 (US East - Ohio)
us-west-2 (US West - Oregon)
eu-west-1 (Europe - Ireland)
ap-northeast-1 (Asia Pacific - Tokyo)
eu-central-1 (Europe - Frankfurt)
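Before enabling metadata table scan, buckets can be checked against this list. In the sketch below, split_by_region_support is hypothetical and the bucket-to-region mapping is an assumed input. If regions come from the S3 GetBucketLocation API, note that it reports an empty location constraint for us-east-1, which the code treats as us-east-1.

```python
# Regions listed in this document as supporting S3 metadata table scan.
SUPPORTED_REGIONS = frozenset({
    "us-east-1", "us-east-2", "us-west-2",
    "eu-west-1", "eu-central-1", "ap-northeast-1",
})


def split_by_region_support(bucket_regions):
    """Split buckets by whether their region supports metadata table scan.

    bucket_regions: dict mapping bucket name -> region code. None or "" is
        treated as us-east-1, matching how GetBucketLocation reports that region.
    Returns (supported, unsupported) as sorted lists of bucket names.
    """
    supported, unsupported = [], []
    for bucket, region in sorted(bucket_regions.items()):
        region = region or "us-east-1"
        (supported if region in SUPPORTED_REGIONS else unsupported).append(bucket)
    return supported, unsupported
```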

Frequently Asked Questions

Why do some objects get backed up twice during S3 metadata table scan jobs?

Object changes can take some time to appear in the S3 metadata tables. To prevent data loss, objects changed during this update window might be backed up in both the current job and the subsequent job, so that no changes are missed because of metadata table update delays.
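The overlap can be sketched as moving the start of each incremental window back by an assumed maximum metadata delay. The metadata_lag parameter and both functions are hypothetical; the actual window that Commvault Cloud uses is not specified here.

```python
from datetime import datetime, timedelta


def incremental_window_start(previous_job_start: datetime, metadata_lag: timedelta) -> datetime:
    """Start of the change window for the next incremental job.

    The window is moved back by metadata_lag, an assumed upper bound on how long
    object changes take to appear in the metadata tables, so late-arriving
    changes are still picked up.
    """
    if metadata_lag < timedelta(0):
        raise ValueError("metadata_lag cannot be negative")
    return previous_job_start - metadata_lag


def objects_to_back_up(objects, previous_job_start, metadata_lag):
    """Return sorted keys whose last-modified time falls inside the window.

    objects: dict mapping object key -> last-modified datetime.
    Objects changed inside the overlap can be selected by two consecutive jobs:
    a duplicate copy is preferred over a missed change.
    """
    start = incremental_window_start(previous_job_start, metadata_lag)
    return sorted(key for key, modified in objects.items() if modified >= start)
```

A larger lag makes missed changes less likely at the cost of more duplicate copies; a lag of zero would miss any change that reached the metadata tables after the previous job had already read them.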

What happens to metadata table configurations when a subclient is disabled or deleted?

Once metadata configuration is enabled at the bucket level, it will not be automatically removed in any of the following scenarios:

Manual content deletion: When a bucket is manually removed from subclient content (between backups), the metadata configuration remains enabled at the bucket level.
Tag-based content changes: For tag-based subclients, if a bucket with metadata enabled no longer qualifies for the subclient because its bucket-level tags changed, the metadata configuration persists.
Subclient deletion: When a subclient is deleted, the metadata configuration remains enabled for all buckets that were part of that subclient.
Disabling metadata scan: When S3 metadata table scan is manually disabled for a subclient, the underlying S3 metadata table configuration and S3 Tables resources are not removed.

In all these cases, the associated S3 metadata table configuration and corresponding S3 Tables resources must be manually cleaned up to avoid unnecessary storage costs and maintain optimal S3 Tables performance.
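Finding which buckets still need cleanup amounts to a set difference between the buckets where a metadata configuration was enabled and the buckets still used by active subclients. This hypothetical sketch assumes you keep a record of both; it only identifies candidates and does not remove anything from AWS.

```python
def orphaned_metadata_buckets(enabled_buckets, active_subclient_content):
    """Buckets whose metadata table configuration is no longer needed.

    enabled_buckets: buckets on which a metadata table configuration was enabled.
    active_subclient_content: dict mapping each remaining subclient that still
        uses metadata table scan -> iterable of bucket names it backs up.
    Returns a sorted list of buckets that still carry a configuration but are
    not covered by any active subclient, i.e. candidates for manual cleanup.
    """
    in_use = set()
    for buckets in active_subclient_content.values():
        in_use.update(buckets)
    return sorted(set(enabled_buckets) - in_use)
```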

Does S3 metadata table scan capture object versioning operations in Amazon S3?

No, S3 metadata tables track current object versions only. Version history and delete markers are not included in S3 metadata table scan operations and require standard API-based scanning to capture.

What happens if Lake Formation is not configured in a region?

If AWS Lake Formation is not set up in a region where your S3 buckets are located, S3 metadata table scan cannot function. You must configure Lake Formation administrators and grant appropriate permissions in each region before enabling S3 metadata table scan.
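A per-region pre-flight check can confirm the administrator setting before S3 metadata table scan is enabled. This sketch is hypothetical: it accepts a factory that returns a boto3 "lakeformation" client for a region, calls get_data_lake_settings, and checks only administrator membership, not the S3 Tables or table-level grants.

```python
def regions_missing_lake_formation_admin(client_for_region, regions, principal_arn):
    """Return regions where principal_arn is not a Lake Formation data lake admin.

    client_for_region: callable returning a Lake Formation client for a region,
        for example: lambda r: boto3.client("lakeformation", region_name=r)
    regions: region codes where the subclient's buckets are located.
    """
    missing = []
    for region in regions:
        settings = client_for_region(region).get_data_lake_settings()["DataLakeSettings"]
        admins = {a.get("DataLakePrincipalIdentifier") for a in settings.get("DataLakeAdmins", [])}
        if principal_arn not in admins:
            missing.append(region)
    return missing
```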