Sharding Data Layer
The Sharding Data Layer is the engine at the core of Untrace's architecture. It encrypts data on the client, splits the encryption key into threshold shares, encodes the encrypted payload into recoverable shards, and distributes those shard bundles across independent nodes so no single node, operator, or jurisdiction can reconstruct the original file.
Why Sharding?
Traditional storage - cloud, on-premises, or even most blockchains - keeps data whole in one location. That is the root cause of most data breaches: a single high-value target worth attacking.
Untrace's sharding layer eliminates the target. When data is sharded:
- No node holds enough information to breach - fragments are individually meaningless
- No subpoena is sufficient - no single jurisdiction controls enough shards
- No single point of failure exists - the network tolerates node failures without data loss
- No administrator can be compelled - there is no administrator with access to the whole
The Sharding Pipeline
[ Raw Data Blob ]
↓
[ AES-256-GCM or XChaCha20-Poly1305 encryption ]
-> Outputs: encrypted_blob, symmetric_key K
↓
[ Shamir's Secret Sharing applied to K ]
-> Outputs: N key shares
↓
[ Encrypted blob erasure-coded into N recoverable data shards ]
-> Outputs: N encrypted payload shards
↓
[ Each node receives one encrypted shard bundle ]
-> key_share_i, payload_shard_i, shard_commitment_i
↓
[ On-chain: vault ID, access policy, and manifest commitment ]
↓
[ Retrieval: wallet signs nonce-scoped request ]
↓
[ Nodes verify signature and authorization before returning shards ]
↓
[ Client reconstructs K-of-N shares and decrypts locally ]
The encryption and sharding happen entirely client-side. The Untrace network sees only encrypted, individually useless fragments.
The K-of-N guarantee requires two separate mechanisms:
- Shamir's Secret Sharing protects the symmetric key.
- Erasure coding protects payload availability so any threshold set of valid payload shards can recover the encrypted file.
Splitting an encrypted file into ordinary chunks is not enough for K-of-N recovery, because every missing chunk would make the file incomplete. The production design should use Reed-Solomon-style erasure coding or an equivalent recoverable encoding for payload shards.
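As an illustration of the key-share mechanism, here is a minimal Shamir's Secret Sharing sketch over a prime field. This is a toy for intuition only: the names `split_key` and `recover_key` are invented here, and a production system would use a vetted library with constant-time field arithmetic rather than this sketch.

```python
# Minimal Shamir's Secret Sharing sketch over a prime field (illustration only).
import secrets

# 2^521 - 1, a Mersenne prime, comfortably fits a 256-bit symmetric key.
PRIME = 2**521 - 1

def split_key(key_int, k, n):
    """Split key_int into n shares; any k of them recover it."""
    coeffs = [key_int] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):        # Horner evaluation of the polynomial
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_key(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = secrets.randbits(256)               # stand-in for the symmetric key K
shares = split_key(key, k=3, n=5)
assert recover_key(shares[:3]) == key     # any 3 of the 5 shares suffice
assert recover_key(shares[2:]) == key
```

Note this protects only the key K; payload availability still requires the separate erasure-coding step described above.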
Shard Distribution
Node Selection
When a vault is created, the protocol selects N nodes for shard assignment using a deterministic but unpredictable selection algorithm seeded by the vault ID and current epoch:
node_set = select_nodes(
    vault_id,
    epoch,
    n = threshold_config.total_shards,
    constraints = {
        max_per_operator: 1,
        geo_diversity: true,
        as_diversity: true,
        jurisdiction_diversity: true
    }
)
This selection can be anchored on-chain so anyone can confirm that distribution followed the protocol rules.
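A hedged sketch of what such seeded selection could look like, assuming a simple node registry and enforcing only the max_per_operator constraint; the record fields and hash-ranking scheme are illustrative, not the protocol's actual algorithm:

```python
# Deterministic-but-unpredictable node selection seeded by vault ID and epoch.
# Illustrative sketch: real selection also applies geo/AS/jurisdiction constraints.
import hashlib

def select_nodes(vault_id, epoch, n, nodes, max_per_operator=1):
    seed = hashlib.sha256(f"{vault_id}:{epoch}".encode()).digest()
    # Rank candidates by a per-node hash of the seed: reproducible for a
    # given (vault_id, epoch), unpredictable before the epoch is known.
    ranked = sorted(nodes, key=lambda node: hashlib.sha256(seed + node["id"].encode()).digest())
    chosen, per_op = [], {}
    for node in ranked:
        op = node["operator"]
        if per_op.get(op, 0) < max_per_operator:
            chosen.append(node["id"])
            per_op[op] = per_op.get(op, 0) + 1
        if len(chosen) == n:
            break
    return chosen

nodes = [{"id": f"node-{i}", "operator": f"op-{i % 4}"} for i in range(8)]
set_a = select_nodes("vault-42", epoch=7, n=4, nodes=nodes)
assert set_a == select_nodes("vault-42", epoch=7, n=4, nodes=nodes)  # reproducible
ops = {node["operator"] for node in nodes if node["id"] in set_a}
assert len(ops) == 4                     # one shard per operator
```

Because anyone can rerun the same seeded computation, an on-chain anchor of the inputs is enough to make the selection auditable.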
Threshold Configuration
| Sensitivity Level | K (threshold) | N (total shards) | Tolerates |
|---|---|---|---|
| Standard | 3 | 5 | 2 node failures |
| High | 5 | 9 | 4 node failures |
| Maximum | 7 | 13 | 6 node failures |
Enterprises with specific compliance requirements can configure custom (K, N) parameters.
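The Tolerates column follows directly from the threshold math: since any K valid shards suffice for reconstruction, up to N - K nodes can fail. A quick check of the table's rows:

```python
# (K, N) tiers from the table above; tolerated failures are simply N - K.
tiers = {"Standard": (3, 5), "High": (5, 9), "Maximum": (7, 13)}
for name, (k, n) in tiers.items():
    print(f"{name}: any {k} of {n} shards recover the vault; tolerates {n - k} node failures")
```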
Shard Node Authorization
Sharding nodes have their own private keys and signed node identities. They use those keys to authenticate node responses, sign delivery receipts, and participate in audit flows.
Shard release is gated by wallet signatures, not by a ZK proof. A retrieval request must include:
- Vault ID
- Shard generation ID
- Requesting wallet address or DID binding
- Requested action, such as read
- Nonce and expiry
- Chain or policy context
- Wallet signature over the full request
Each node independently verifies:
- The wallet signature matches the requester.
- The request is fresh and cannot be replayed.
- The wallet is authorized by the current vault policy.
- The node is assigned to the requested shard.
- The shard generation has not been revoked or deleted.
Only then does the node return its encrypted shard bundle. The client still reconstructs and decrypts locally.
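To make the gating concrete, here is a minimal sketch of the node-side checks, with an HMAC standing in for the wallet signature scheme (the real protocol verifies wallet or DID signatures against on-chain state; the function name, request fields, and policy structures below are illustrative assumptions):

```python
# Node-side request verification sketch. HMAC is a stand-in for wallet
# signatures; field names and data structures are illustrative only.
import hmac, hashlib, json, time

SEEN_NONCES = set()   # toy replay cache; real nodes would persist and expire this

def verify_request(req, wallet_key, vault_policy, node_shards, revoked):
    body = {k: v for k, v in req.items() if k != "sig"}
    msg = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(wallet_key, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, req["sig"]):
        return False                                  # signature mismatch
    if req["nonce"] in SEEN_NONCES or req["expiry"] < time.time():
        return False                                  # replayed or stale
    if req["wallet"] not in vault_policy["authorized"]:
        return False                                  # not in current policy
    if (req["vault_id"], req["generation"]) not in node_shards:
        return False                                  # shard not assigned here
    if req["generation"] in revoked:
        return False                                  # generation revoked/deleted
    SEEN_NONCES.add(req["nonce"])
    return True

key = b"demo-wallet-key"
body = {"vault_id": "vault-42", "generation": 1, "wallet": "0xabc",
        "action": "read", "nonce": "n-1", "expiry": time.time() + 60}
sig = hmac.new(key, json.dumps(body, sort_keys=True).encode(), hashlib.sha256).hexdigest()
req = {**body, "sig": sig}
policy, assigned = {"authorized": {"0xabc"}}, {("vault-42", 1)}
assert verify_request(req, key, policy, assigned, revoked=set())
assert not verify_request(req, key, policy, assigned, revoked=set())  # nonce replay rejected
```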
Shard Integrity
Each shard bundle is protected by three integrity layers:
1. MAC at the shard level. Each shard is authenticated with a Message Authentication Code. Any tampering with a shard is detected during reconstruction, and the corrupted shard is rejected.
2. Manifest commitment. A Merkle root or equivalent vector commitment for the shard manifest is anchored at vault creation time. At reconstruction, the client verifies retrieved shards against the committed manifest before decrypting.
3. Signed delivery receipts. Nodes sign shard delivery receipts with their node private keys. This creates an auditable trail of which node served which shard generation in response to which wallet-signed request.
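The manifest commitment can be pictured with a minimal Merkle-root sketch. This toy recomputes the root over all shards; a real client would check each retrieved shard against the committed root via a per-shard inclusion proof instead:

```python
# Minimal Merkle-manifest sketch: the root is anchored at vault creation,
# and retrieval re-hashes shards and compares against it. Illustration only.
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate last node on odd levels
        level = [h(a + b) for a, b in zip(level[::2], level[1::2])]
    return level[0]

shards = [b"shard-0", b"shard-1", b"shard-2", b"shard-3", b"shard-4"]
committed = merkle_root(shards)              # anchored on-chain at vault creation
assert merkle_root(shards) == committed      # honest retrieval verifies
tampered = shards[:2] + [b"evil"] + shards[3:]
assert merkle_root(tampered) != committed    # any substitution is detected
```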
Shard Replication and Availability
Beyond the primary N nodes, the protocol can maintain backup replicas to improve availability:
- Each shard has M backup nodes that hold encrypted copies
- Backup nodes activate if a primary node fails availability checks
- The vault owner is never required to manage replication manually
- Replication targets maintain geographic and jurisdictional diversity constraints
Early versions should use challenge-response availability checks before attempting Filecoin-grade PoRep or PoSt. Production storage proofs and slashing are feasible, but they are a later protocol phase.
Shard Lifecycle
Vault Write
-> Shards created, distributed, committed on-chain
-> Nodes begin serving availability checks
Vault Update
-> New shard generation created for the same vault ID
-> Previous generation retained or pruned per policy
-> New manifest commitment anchored
Vault Delete
-> Deletion instruction signed by owner wallet
-> Broadcast to all shard nodes
-> Nodes destroy shards and sign deletion receipts
-> On-chain commitment marked as deleted
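A sketch of what a signed deletion receipt might look like, with HMAC standing in for the node's signing key; the function names and receipt fields are illustrative, not a wire format:

```python
# Deletion-receipt sketch. HMAC stands in for node signatures; in the real
# protocol nodes sign with their registered node private keys.
import hmac, hashlib, json, time

def sign_deletion_receipt(node_key, vault_id, generation):
    receipt = {"vault_id": vault_id, "generation": generation,
               "deleted_at": int(time.time()), "status": "deleted"}
    msg = json.dumps(receipt, sort_keys=True).encode()
    receipt["node_sig"] = hmac.new(node_key, msg, hashlib.sha256).hexdigest()
    return receipt

def verify_deletion_receipt(node_key, receipt):
    body = {k: v for k, v in receipt.items() if k != "node_sig"}
    msg = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(node_key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["node_sig"])

key = b"node-7-signing-key"
receipt = sign_deletion_receipt(key, "vault-42", generation=3)
assert verify_deletion_receipt(key, receipt)   # auditable proof of deletion
```

A receipt like this proves a node claimed deletion; the audits, reputation penalties, and slashing mentioned above are what make that claim economically credible.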
Deletion depends on node compliance and economic enforcement. A complete protocol should combine signed deletion receipts, periodic audits, reputation penalties, and eventual slashing for nodes that fail deletion requirements.
Comparison With Alternative Approaches
| Approach | Breach Possible | Single Point of Failure | Privacy by Default |
|---|---|---|---|
| Centralized cloud | Yes | Yes | No |
| Encrypted cloud | Yes (key theft) | Yes | No |
| IPFS / Filecoin | Yes (content-addressed, public) | Partial | No |
| Untrace Sharding Layer | No | No | Yes |
IPFS and Filecoin store whole data objects - sharding in those systems is primarily about redundancy, not privacy. Untrace sharding is different: each fragment is encrypted, threshold protected, and released only after shard nodes verify wallet-signed authorization. ZK proofs are used separately in dashboard workflows, such as proving facts from bank statement PDFs after a user already has access to their files.