What's in your bucket

What Tonbo Artifacts writes into a BYO bucket, what never appears there, and how to estimate object counts for a security or cost review.

If you brought your own (BYO) bucket, you probably want to know exactly what we write into it, both to verify that the product does what it says and to answer a security or cost review.

What's in the bucket

When you mount a workspace and write to it, the only thing that lands in your bucket is immutable chunks of file data, stored under the workspace's prefix:

your-bucket/
  your-org/my-workspace/
    chunks/
      0/0/1_0_4194304
      0/0/2_0_4194304
      0/0/3_0_1234567
      ...
      0/1/1000_0_4194304
      0/1/1001_0_4194304
      ...

Path convention: chunks/{slice_id / 1_000_000}/{slice_id / 1_000}/{slice_id}_{block_index}_{block_size}, where / is integer division. For example, slice 1004 lands under chunks/0/1/. The two intermediate directories spread chunks across prefixes for S3 listing performance; they carry no meaning of their own.
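To check a key by hand, the convention can be restated in a few lines of Bash. The function name chunk_key is hypothetical and not part of any Tonbo tool; it only applies the formula above, relying on Bash's integer arithmetic for the two divisions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a chunk's key (relative to the workspace prefix)
# from the documented convention
#   chunks/{slice_id / 1_000_000}/{slice_id / 1_000}/{slice_id}_{block_index}_{block_size}
# $(( )) in Bash is integer arithmetic, so the divisions truncate as intended.
chunk_key() {
  local slice_id=$1 block_index=$2 block_size=$3
  printf 'chunks/%d/%d/%d_%d_%d\n' \
    "$(( slice_id / 1000000 ))" \
    "$(( slice_id / 1000 ))" \
    "$slice_id" "$block_index" "$block_size"
}

# The first block of slice 1004 goes into the 0/1/ directory.
chunk_key 1004 0 4194304   # chunks/0/1/1004_0_4194304
```

Note that the second directory is not taken modulo anything, so a slice id of 1234567 maps to chunks/1/1234/.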

File naming:

  • slice_id: monotonic integer assigned by the metadata service.
  • block_index: position of this block within the slice (0 for the first ≤4 MiB block, 1 for the next, and so on). Small files have a single block, so you'll mostly see 0.
  • block_size: the actual byte size of this chunk. The default block is 4 MiB (4,194,304 bytes); the trailing chunk of a file can be smaller.
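The last two fields of a chunk name can be worked out for a slice of known length. A minimal sketch, assuming a slice is cut into consecutive 4 MiB blocks with only the last one allowed to be smaller; slice_blocks is a hypothetical function for illustration, not product code.

```bash
#!/usr/bin/env bash
BLOCK_SIZE=4194304   # default 4 MiB block

# Print "block_index block_size" for each block of a slice of LENGTH bytes,
# i.e. the last two fields of each chunk name that slice would produce.
slice_blocks() {
  local remaining=$1 index=0 size
  while (( remaining > 0 )); do
    # Full 4 MiB block, or whatever is left for the trailing block.
    size=$(( remaining < BLOCK_SIZE ? remaining : BLOCK_SIZE ))
    printf '%d %d\n' "$index" "$size"
    remaining=$(( remaining - size ))
    index=$(( index + 1 ))
  done
}

# A 9,621,175-byte slice: two full blocks plus a smaller tail.
slice_blocks 9621175   # prints: 0 4194304 / 1 4194304 / 2 1232567
```

A 1,234,567-byte slice yields the single line 0 1234567, the same shape as the 3_0_1234567 object in the listing that follows.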

Inspect from your host:

aws s3 ls s3://your-bucket/your-org/my-workspace/chunks/ --recursive --summarize
2026-05-22 18:00:01    4194304 your-org/my-workspace/chunks/0/0/1_0_4194304
2026-05-22 18:00:01    4194304 your-org/my-workspace/chunks/0/0/2_0_4194304
2026-05-22 18:00:01    1234567 your-org/my-workspace/chunks/0/0/3_0_1234567
...
Total Objects: 247
Total Size:    1023421238

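You can read the total size and object count from the bucket directly. Each chunk is one object of at most 4 MiB, so the object count is roughly the stored size divided by 4 MiB: the listing above holds about 976 MiB in 247 objects, and a workspace of roughly 1 GiB shows about 250.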
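The listing can also be checked against the naming convention. A sketch, assuming the `aws s3 ls --recursive` output format shown above (date, time, size, key): the hypothetical function check_chunk_sizes confirms that each chunk's listed size equals the block_size encoded in its name, then prints totals. Keys that don't look like chunk names, and the --summarize lines, are skipped.

```bash
#!/usr/bin/env bash
# Read `aws s3 ls --recursive` output on stdin; report chunks whose listed
# size differs from the block_size in their name. Returns non-zero on mismatch.
check_chunk_sizes() {
  local _d _t size key name bad=0 count=0 total=0
  while read -r _d _t size key; do
    [[ -n $key ]] || continue              # blank or "Total ..." summary line
    name=${key##*/}                        # e.g. 3_0_1234567
    if [[ $name =~ ^[0-9]+_[0-9]+_([0-9]+)$ ]]; then
      count=$(( count + 1 ))
      total=$(( total + size ))
      if [[ ${BASH_REMATCH[1]} != "$size" ]]; then
        echo "size mismatch: $key is $size bytes" >&2
        bad=$(( bad + 1 ))
      fi
    fi
  done
  echo "chunks=$count bytes=$total mismatches=$bad"
  (( bad == 0 ))
}
```

Usage: aws s3 ls s3://your-bucket/your-org/my-workspace/chunks/ --recursive --summarize | check_chunk_sizes. The chunks and bytes totals should agree with the Total Objects and Total Size lines the CLI prints.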

What's not in the bucket

Everything except raw chunk bytes lives in the metadata service. You won't find any of this in your bucket:

  • File and directory names, paths, and tree structure
  • mtime, atime, ctime, file size
  • POSIX permissions, owner UID/GID
  • Symlinks, extended attributes (xattr)
  • Workspace name, handle, labels
  • Grants (who can mount this workspace)
  • Mount sessions (active tokens and their scopes)

What follows from this:

  • You can't reconstruct a file from the bucket alone. A chunk is anonymous bytes with no association to a file path; only the metadata service can map chunks back to files.
  • You can't rename or restructure files by manipulating S3. Renaming foo.txt to bar.txt is a metadata-only operation; the chunks don't move.
  • Chunks aren't deduplicated by content. Each slice gets a fresh id, so writing the same bytes twice (even within one file) produces separate chunks.

This is the structural difference from a "files as S3 objects" approach (Archil, s3fs, goofys): those keep bytes readable by any S3-aware tool with nothing else running, at the cost of turning metadata operations (listing a deep tree, statting many files) into S3 API calls.
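Because only chunks should appear under a workspace's prefix, a security review can verify that directly. A sketch, assuming the `aws s3 ls --recursive` output format shown earlier; find_unexpected_keys is a hypothetical helper that prints every key under the prefix that is not a chunk in the documented layout, including chunks filed under the wrong intermediate directories.

```bash
#!/usr/bin/env bash
# Read `aws s3 ls --recursive` output on stdin; print keys that are not
#   PREFIX/chunks/{slice_id/1000000}/{slice_id/1000}/{slice_id}_{index}_{size}
# Empty output means nothing but well-formed chunks was found.
find_unexpected_keys() {
  local prefix=$1 _d _t _size key rest top mid id
  local re='^chunks/([0-9]+)/([0-9]+)/([0-9]+)_[0-9]+_[0-9]+$'
  while read -r _d _t _size key; do
    [[ -n $key ]] || continue             # blank or summary line
    rest=${key#"$prefix"}                 # key relative to the workspace prefix
    if [[ $key == "$rest" || ! $rest =~ $re ]]; then
      echo "$key"                         # outside the prefix, or not a chunk name
      continue
    fi
    # 10# forces base 10 so a leading zero is not read as octal.
    top=$(( 10#${BASH_REMATCH[1]} ))
    mid=$(( 10#${BASH_REMATCH[2]} ))
    id=$(( 10#${BASH_REMATCH[3]} ))
    if (( top != id / 1000000 || mid != id / 1000 )); then
      echo "$key"                         # directories don't match the slice id
    fi
  done
}
```

Usage: aws s3 ls s3://your-bucket/your-org/my-workspace/ --recursive | find_unexpected_keys your-org/my-workspace/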

Estimating bucket size

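At the default 4 MiB block size, the object count is roughly the workspace's logical size divided by 4 MiB:

Workspace logical size    Object count (approx.)
100 MB                    ~25
1 GB                      ~250
10 GB                     ~2,500
100 GB                    ~25,000
1 TB                      ~250,000

These figures assume mostly large files. Each file with data needs at least one chunk, so a workspace of many small files has about one object per file, far more than the table suggests.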
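For data you are about to load, a closer estimate comes from the file sizes themselves. A rough sketch, assuming each file is written once as a single slice (overwrites and appends can add more chunks) and that empty files store no chunk: a file of N bytes then needs ceil(N / 4 MiB) chunks. estimate_objects is a hypothetical function.

```bash
#!/usr/bin/env bash
# Read one file size in bytes per line on stdin; print the estimated number
# of chunk objects, counting ceil(size / 4 MiB) per non-empty file.
estimate_objects() {
  local block=4194304 size objects=0
  while read -r size; do
    (( size > 0 )) || continue                         # assumed: empty file, no chunk
    objects=$(( objects + (size + block - 1) / block ))  # integer ceiling division
  done
  echo "$objects"
}
```

With GNU find: find /path/to/data -type f -printf '%s\n' | estimate_objects. Three 10 KiB files count as three objects, while dividing their 30 KiB total by 4 MiB would round to zero.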

There is no cross-workspace dedup: 20 workspaces holding the same data hold 20 copies of the chunks. Within a workspace, the metadata service tracks which slices are still referenced; a chunk becomes eligible for garbage collection once nothing references its slice, for example after overwrites, deletes, or slice compaction. Until it is collected, such a chunk still occupies storage, so the bucket can temporarily hold more than the workspace's logical size.

Cleanup on workspace delete

  • Managed workspaces: deleting the workspace triggers a best-effort delete of every chunk under its prefix. Failures during the bulk delete don't block the workspace from transitioning to deleted; orphan chunks get reaped by a background GC.
  • BYO workspaces: deleting the workspace clears metadata-service state but doesn't touch your bucket. Chunks remain under the prefix. To free storage, delete the prefix from your bucket yourself, for example with aws s3 rm s3://your-bucket/your-org/my-workspace/ --recursive. Running the same command with --dryrun first previews which objects would be removed.
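If you script the BYO cleanup, a small guard helps. A minimal sketch; delete_workspace_chunks and the CONFIRM variable are hypothetical. The function refuses empty arguments (an empty variable would change which prefix is targeted), always passes the prefix with a trailing slash as in the command above, and runs aws s3 rm with --dryrun unless CONFIRM=yes is set.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the documented BYO cleanup command.
delete_workspace_chunks() {
  local bucket=$1 org=$2 workspace=$3
  if [[ -z $bucket || -z $org || -z $workspace ]]; then
    echo "usage: delete_workspace_chunks BUCKET ORG WORKSPACE" >&2
    return 2
  fi
  local target="s3://$bucket/$org/$workspace/"   # trailing slash kept on purpose
  if [[ ${CONFIRM:-} == yes ]]; then
    aws s3 rm "$target" --recursive               # actually delete
  else
    aws s3 rm "$target" --recursive --dryrun      # preview only
  fi
}
```

Run it once to review the would-be deletions, then again as CONFIRM=yes delete_workspace_chunks your-bucket your-org my-workspace.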

Relation to other concepts