One of the main differences between @sunscreentech and other FHE companies is that we chose circuit bootstrapping (CBS) over programmable bootstrapping (PBS) in our tech stack. Here's why we did that 👇🧵

First of all, what is bootstrapping? Bootstrapping is the most sophisticated and compute-intensive component of an FHE scheme. It refreshes a ciphertext, reducing the noise accumulated from homomorphic operations so that further computation is possible.

Programmable Bootstrapping (used by @zama_fhe) refreshes noise and evaluates a lookup table in a single step. It takes an LWE ciphertext as input and returns a new LWE ciphertext, ready for the next lookup. The per-bootstrap latency is low, so on isolated gates it looks attractive.

The trade-off is sequential dependence. Real programs require a chain of bootstraps, and the dependencies between them prevent those computations from running in parallel. This leaves most of the compute resources (cores) idle.

Circuit Bootstrapping (used by @sunscreentech) follows a different path. The bootstrap still consumes an LWE ciphertext, but the output is a GGSW “selector” expressly designed for CMUX operations. Each CMUX is far cheaper than a bootstrap, and because CMUX trees are embarrassingly parallel, they can be distributed efficiently across many compute resources before another expensive bootstrap is required.

That change in dependency structure is decisive: it lets our runtime saturate many-core CPUs and GPUs today, and it maps cleanly onto forthcoming FHE accelerators.

The CMUX is the homomorphic counterpart of the multiplexer, a foundational building block of computing hardware, so we can draw on decades of work on building general-purpose computation from simple multiplexer parts.

PBS circuits generally demand bespoke handling for negacyclic indexing, LUT padding, and format conversions, all of which slow iteration and increase the surface area for bugs.
Note, however, that tfhe-rs abstracts away almost all of this work for PBS if you simply use its default parameters.

Data reuse matters as workloads scale. A GGSW selector produced by one CBS can drive multiple CMUXes, amortizing the expensive step across a wide sub-circuit. PBS offers no comparable reuse; every new gate incurs a fresh bootstrap.

When we benchmarked full 16- and 32-bit arithmetic, the CBS-CMUX pipeline consistently executed with fewer sequential bootstraps and higher overall throughput. Those gains widen as core counts rise, and they align with our long-term hardware roadmap.

For our team @sunscreentech, CBS delivered the right balance: predictable parallelism, a cleaner compute story, and a performance curve that improves with hardware instead of stalling against sequential bottlenecks. That is why CBS is the foundation of our stack and why we continue to double down on its ecosystem.
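
To make the selector-reuse point concrete, here is a minimal plaintext sketch of a CMUX tree. It is not encrypted and is not Sunscreen's or any library's API; the names `cmux` and `cmux_tree_lookup` are hypothetical. Each integer selector bit stands in for a GGSW selector that one circuit bootstrap would produce, and the arithmetic form `d0 + sel * (d1 - d0)` mirrors the shape of the homomorphic CMUX.

```python
def cmux(sel, d1, d0):
    """Plaintext stand-in for a CMUX: returns d1 if sel == 1, else d0."""
    return d0 + sel * (d1 - d0)


def cmux_tree_lookup(selector_bits, table):
    """Pick table[index], where selector_bits are the bits of index, LSB first.

    Returns the selected value and the number of CMUXes evaluated.
    """
    assert len(table) == 1 << len(selector_bits)
    level = list(table)
    cmux_count = 0
    for sel in selector_bits:
        # One selector drives every CMUX in this level. The CMUXes in a level
        # do not depend on each other, so they could run in parallel.
        level = [cmux(sel, level[i + 1], level[i]) for i in range(0, len(level), 2)]
        cmux_count += len(level)
    return level[0], cmux_count


table = [3, 1, 4, 1, 5, 9, 2, 6]
index = 5
bits = [(index >> k) & 1 for k in range(3)]  # 3 selectors for an 8-entry table
value, n_cmux = cmux_tree_lookup(bits, table)
print(value, n_cmux)
```

In this toy model, three selectors drive seven CMUXes for an 8-entry table, and the gap grows with table size: n selectors cover 2^n − 1 CMUXes.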