Sharding Data Layer
The Sharding Data Layer is the engine at the core of Untrace's architecture. It encrypts data on the client, splits the encryption key into threshold shares, encodes the encrypted payload into recoverable shards, and distributes those shard bundles across independent nodes so no single node, operator, or jurisdiction can reconstruct the original file.
Why Sharding?
Traditional storage - cloud, on-premises, or even most blockchains - keeps data whole in one location. That is the root cause of most data breaches: a single high-value target worth attacking.
Untrace's sharding layer eliminates the target. When data is sharded:
- No node holds enough information to breach - fragments are individually meaningless
- No subpoena is sufficient - no single jurisdiction controls enough shards
- No single point of failure exists - the network tolerates node failures without data loss
- No administrator can be compelled - there is no administrator with access to the whole
The Sharding Pipeline
[ Raw Data Blob ]
↓
[ AES-256-GCM or XChaCha20-Poly1305 encryption ]
-> Outputs: encrypted_blob, symmetric_key K
↓
[ Shamir's Secret Sharing applied to K ]
-> Outputs: N key shares
↓
[ Encrypted blob erasure-coded into N recoverable data shards ]
-> Outputs: N encrypted payload shards
↓
[ Each node receives one encrypted shard bundle ]
-> key_share_i, payload_shard_i, shard_commitment_i
↓
[ On-chain: vault ID, access policy, and manifest commitment ]
↓
[ Retrieval: wallet signs nonce-scoped request ]
↓
[ Nodes verify signature and authorization before returning shards ]
↓
[ Client reconstructs K-of-N shares and decrypts locally ]
The encryption and sharding happen entirely client-side. The Untrace network sees only encrypted, individually useless fragments.
The K-of-N guarantee requires two separate mechanisms:
- Shamir's Secret Sharing protects the symmetric key.
- Erasure coding protects payload availability so any threshold set of valid payload shards can recover the encrypted file.
Splitting an encrypted file into ordinary chunks is not enough for K-of-N recovery, because every missing chunk would make the file incomplete. The production design should use Reed-Solomon-style erasure coding or an equivalent recoverable encoding for payload shards.
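As an illustration of the key-share mechanism, here is a minimal Shamir's Secret Sharing sketch over a prime field. This is a toy for intuition only: the names `split_key` and `recover_key` are invented here, and a production system would use a vetted library with constant-time field arithmetic rather than this sketch.

```python
# Minimal Shamir's Secret Sharing sketch over a prime field (illustration only).
import secrets

# 2^521 - 1, a Mersenne prime, comfortably fits a 256-bit symmetric key.
PRIME = 2**521 - 1

def split_key(key_int, k, n):
    """Split key_int into n shares; any k of them recover it."""
    coeffs = [key_int] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):        # Horner evaluation of the polynomial
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_key(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = secrets.randbits(256)               # stand-in for the symmetric key K
shares = split_key(key, k=3, n=5)
assert recover_key(shares[:3]) == key     # any 3 of the 5 shares suffice
assert recover_key(shares[2:]) == key
```

Note this protects only the key K; payload availability still requires the separate erasure-coding step described above.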
Shard Distribution
Node Selection
When a vault is created, the protocol selects N nodes for shard assignment using a deterministic but unpredictable selection algorithm seeded by the vault ID and current epoch:
node_set = select_nodes(
    vault_id,
    epoch,
    n = threshold_config.total_shards,
    constraints = {
        max_per_operator: 1,
        geo_diversity: true,
        as_diversity: true,
        jurisdiction_diversity: true
    }
)
This selection can be anchored on-chain so anyone can confirm that distribution followed the protocol rules.
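A hedged sketch of what such seeded selection could look like, assuming a simple node registry and enforcing only the max_per_operator constraint; the record fields and hash-ranking scheme are illustrative, not the protocol's actual algorithm:

```python
# Deterministic-but-unpredictable node selection seeded by vault ID and epoch.
# Illustrative sketch: real selection also applies geo/AS/jurisdiction constraints.
import hashlib

def select_nodes(vault_id, epoch, n, nodes, max_per_operator=1):
    seed = hashlib.sha256(f"{vault_id}:{epoch}".encode()).digest()
    # Rank candidates by a per-node hash of the seed: reproducible for a
    # given (vault_id, epoch), unpredictable before the epoch is known.
    ranked = sorted(nodes, key=lambda node: hashlib.sha256(seed + node["id"].encode()).digest())
    chosen, per_op = [], {}
    for node in ranked:
        op = node["operator"]
        if per_op.get(op, 0) < max_per_operator:
            chosen.append(node["id"])
            per_op[op] = per_op.get(op, 0) + 1
        if len(chosen) == n:
            break
    return chosen

nodes = [{"id": f"node-{i}", "operator": f"op-{i % 4}"} for i in range(8)]
set_a = select_nodes("vault-42", epoch=7, n=4, nodes=nodes)
assert set_a == select_nodes("vault-42", epoch=7, n=4, nodes=nodes)  # reproducible
ops = {node["operator"] for node in nodes if node["id"] in set_a}
assert len(ops) == 4                     # one shard per operator
```

Because anyone can rerun the same seeded computation, an on-chain anchor of the inputs is enough to make the selection auditable.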
Threshold Configuration
| Sensitivity Level | K (threshold) | N (total shards) | Tolerates |
|---|---|---|---|
| Standard | 3 | 5 | 2 node failures |
| High | 5 | 9 | 4 node failures |
| Maximum | 7 | 13 | 6 node failures |
Enterprises with specific compliance requirements can configure custom (K, N) parameters.
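The Tolerates column follows directly from the threshold math: since any K valid shards suffice for reconstruction, up to N - K nodes can fail. A quick check of the table's rows:

```python
# (K, N) tiers from the table above; tolerated failures are simply N - K.
tiers = {"Standard": (3, 5), "High": (5, 9), "Maximum": (7, 13)}
for name, (k, n) in tiers.items():
    print(f"{name}: any {k} of {n} shards recover the vault; tolerates {n - k} node failures")
```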
Shard Node Authorization
Sharding nodes have their own private keys and signed node identities. They use those keys to authenticate node responses, sign delivery receipts, and participate in audit flows.
Shard release is gated by wallet signatures, not by a ZK proof. A retrieval request must include:
- Vault ID
- Shard generation ID
- Requesting wallet address or DID binding
- Requested action, such as read
- Nonce and expiry
- Chain or policy context
- Wallet signature over the full request
Each node independently verifies:
- The wallet signature matches the requester.
- The request is fresh and cannot be replayed.
- The wallet is authorized by the current vault policy.
- The node is assigned to the requested shard.
- The shard generation has not been revoked or deleted.
Only then does the node return its encrypted shard bundle. The client still reconstructs and decrypts locally.
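To make the gating concrete, here is a minimal sketch of the node-side checks, with an HMAC standing in for the wallet signature scheme (the real protocol verifies wallet or DID signatures against on-chain state; the function name, request fields, and policy structures below are illustrative assumptions):

```python
# Node-side request verification sketch. HMAC is a stand-in for wallet
# signatures; field names and data structures are illustrative only.
import hmac, hashlib, json, time

SEEN_NONCES = set()   # toy replay cache; real nodes would persist and expire this

def verify_request(req, wallet_key, vault_policy, node_shards, revoked):
    body = {k: v for k, v in req.items() if k != "sig"}
    msg = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(wallet_key, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, req["sig"]):
        return False                                  # signature mismatch
    if req["nonce"] in SEEN_NONCES or req["expiry"] < time.time():
        return False                                  # replayed or stale
    if req["wallet"] not in vault_policy["authorized"]:
        return False                                  # not in current policy
    if (req["vault_id"], req["generation"]) not in node_shards:
        return False                                  # shard not assigned here
    if req["generation"] in revoked:
        return False                                  # generation revoked/deleted
    SEEN_NONCES.add(req["nonce"])
    return True

key = b"demo-wallet-key"
body = {"vault_id": "vault-42", "generation": 1, "wallet": "0xabc",
        "action": "read", "nonce": "n-1", "expiry": time.time() + 60}
sig = hmac.new(key, json.dumps(body, sort_keys=True).encode(), hashlib.sha256).hexdigest()
req = {**body, "sig": sig}
policy, assigned = {"authorized": {"0xabc"}}, {("vault-42", 1)}
assert verify_request(req, key, policy, assigned, revoked=set())
assert not verify_request(req, key, policy, assigned, revoked=set())  # nonce replay rejected
```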
Shard Integrity
Each shard bundle is protected by three integrity layers:
1. MAC at the shard level. Each shard is authenticated with a Message Authentication Code. Any tampering with a shard is detected during reconstruction, and the corrupted shard is rejected.
2. Manifest commitment. A Merkle root or equivalent vector commitment for the shard manifest is anchored at vault creation time. At reconstruction, the client verifies retrieved shards against the committed manifest before decrypting.
3. Signed delivery receipts. Nodes sign shard delivery receipts with their node private keys. This creates an auditable trail of which node served which shard generation in response to which wallet-signed request.
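The manifest commitment can be pictured with a minimal Merkle-root sketch. This toy recomputes the root over all shards; a real client would check each retrieved shard against the committed root via a per-shard inclusion proof instead:

```python
# Minimal Merkle-manifest sketch: the root is anchored at vault creation,
# and retrieval re-hashes shards and compares against it. Illustration only.
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate last node on odd levels
        level = [h(a + b) for a, b in zip(level[::2], level[1::2])]
    return level[0]

shards = [b"shard-0", b"shard-1", b"shard-2", b"shard-3", b"shard-4"]
committed = merkle_root(shards)              # anchored on-chain at vault creation
assert merkle_root(shards) == committed      # honest retrieval verifies
tampered = shards[:2] + [b"evil"] + shards[3:]
assert merkle_root(tampered) != committed    # any substitution is detected
```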
Shard Replication and Availability
Beyond the primary N nodes, the protocol can maintain backup replicas to improve availability:
- Each shard has M backup nodes that hold encrypted copies
- Backup nodes activate if a primary node fails availability checks
- The vault owner is never required to manage replication manually
- Replication targets maintain geographic and jurisdictional diversity constraints
Early versions should use challenge-response availability checks before attempting Filecoin-grade PoRep or PoSt. Production storage proofs and slashing are feasible, but they are a later protocol phase.
Shard Lifecycle
Vault Write
-> Shards created, distributed, committed on-chain
-> Nodes begin serving availability checks
Vault Update
-> New shard generation created for the same vault ID
-> Previous generation retained or pruned per policy
-> New manifest commitment anchored
Vault Delete
-> Deletion instruction signed by owner wallet
-> Broadcast to all shard nodes
-> Nodes destroy shards and sign deletion receipts
-> On-chain commitment marked as deleted
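A sketch of what a signed deletion receipt might look like, with HMAC standing in for the node's signing key; the function names and receipt fields are illustrative, not a wire format:

```python
# Deletion-receipt sketch. HMAC stands in for node signatures; in the real
# protocol nodes sign with their registered node private keys.
import hmac, hashlib, json, time

def sign_deletion_receipt(node_key, vault_id, generation):
    receipt = {"vault_id": vault_id, "generation": generation,
               "deleted_at": int(time.time()), "status": "deleted"}
    msg = json.dumps(receipt, sort_keys=True).encode()
    receipt["node_sig"] = hmac.new(node_key, msg, hashlib.sha256).hexdigest()
    return receipt

def verify_deletion_receipt(node_key, receipt):
    body = {k: v for k, v in receipt.items() if k != "node_sig"}
    msg = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(node_key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["node_sig"])

key = b"node-7-signing-key"
receipt = sign_deletion_receipt(key, "vault-42", generation=3)
assert verify_deletion_receipt(key, receipt)   # auditable proof of deletion
```

A receipt like this proves a node claimed deletion; the audits, reputation penalties, and slashing mentioned above are what make that claim economically credible.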
Deletion depends on node compliance and economic enforcement. A complete protocol should combine signed deletion receipts, periodic audits, reputation penalties, and eventual slashing for nodes that fail deletion requirements.
Comparison With Alternative Approaches
| Approach | Breach Possible | Single Point of Failure | Privacy by Default |
|---|---|---|---|
| Centralized cloud | Yes | Yes | No |
| Encrypted cloud | Yes (key theft) | Yes | No |
| IPFS / Filecoin | Yes (content-addressed, public) | Partial | No |
| Untrace Sharding Layer | No | No | Yes |
IPFS and Filecoin store whole data objects - sharding in those systems is primarily about redundancy, not privacy. Untrace sharding is different: each fragment is encrypted, threshold protected, and released only after shard nodes verify wallet-signed authorization. ZK proofs are used separately in dashboard workflows, such as proving facts from bank statement PDFs after a user already has access to their files.