Cache MX On-chip memory expansion
Overview
The Cache MX IP compresses on-chip L2 and L3 SRAM caches, enabling 2x effective capacity. SRAM caches can take up to 30-50% of an SoC or xPU's silicon real estate and a significant share of the power budget, which grows with physical dimensions. While digital logic scales effectively with process-node shrinks, SRAM essentially stopped scaling from the 5nm to the 3nm node. Growing compute-core counts demand higher SRAM capacity to scale IPC performance effectively, yet increasing SRAM area negatively impacts both die cost and die yield. Cache MX offers a power-, area-, and cost-effective alternative that enables performance scaling at single-digit cycle latency.
Standards
Z-Trainless (proprietary)
Z-ZID (proprietary)
Architecture
Modular architecture enables seamless scalability: multiple independent Cache MX instances can coexist within an SoC without requiring coordination
Architectural configuration parameters are accessible to fine-tune performance
HDL Source Licenses
Synthesizable System Verilog RTL (encrypted)
Implementation constraints
UVM testbench (self-checking)
Vectors for testbench and expected results
User Documentation
Features
On-the-fly compression / decompression of cache lines
Optional secure training on metadata capability
Silicon-verified in TSMC N5
On-the-fly multi-algorithm switching without recompression
Deliverables
Performance evaluation license: C++ compression model for integration into customer performance simulation models
FPGA evaluation license
Encrypted IP delivery (Xilinx)
Applications
Server xPUs, smart devices, and embedded systems deal with a wide range of workload data sets from diverse applications. Cache MX has been evaluated across a wide range of workload benchmarks, including high-performance compute benchmarks such as SPEC2017 INT and SPEC2017 FP, AI/ML benchmarks such as MLPerf Training, and database benchmarks including Renaissance and MonetDB + TPC-H. Cache MX delivers 2x compression on average across this suite at single-digit cycle latencies.
Integration
The Cache MX IP contains the compression and decompression accelerators as an integrated block that can be easily integrated into the SoC cache controller design. The tag array is decoupled from the data array and doubled in size to accommodate the additional tags needed to address more blocks; the data array remains unchanged. Optional custom integration into existing designs is available.
Benefits
The Cache MX compression solution increases L2$, L3$, and SLC cache capacity by 2x at an 80% area and power saving compared to equivalent SRAM capacity. It provides real-time compression, compaction, and transparent memory management, operating at cache speed and throughput.
Performance / KPI
| Feature | Performance |
| --- | --- |
| Compression ratio | 2x across diverse data sets |
| Latency, read & write (cycles) | 5 cycles (ZSD algorithm) |
| Added latency, read & write (%) | 7% (L3$, SLC), 33% (L2$) |
| Performance acceleration | 15-30% |
| Frequency | L2$, L3$, SLC speed |
| IP area | Starting at 0.1 mm² (TSMC 5nm), excluding customer-required tag array modifications |
| Memory technologies supported | On-chip SRAM and VCACHE |
System integration of Cache MX
There are three steps to integrate Cache MX:
1. The cache controller is enhanced with compression / decompression engines packaged in an IP block.
2. The tag array is decoupled from the data array, and the number of tags is increased to address more blocks.
3. The tags receive a slight modification (extra metadata) to support more logical blocks per physical frame; the data array remains unchanged.
Cache MX
The Cache MX compression solution increases the cache capacity by 2x at an 80% area and power saving to comparable SRAM capacity.
Ziptilion™ MX
High-performance, low-latency hardware-accelerated compression at unmatched power efficiency.
Ziptilion™ BW
Delivers up to 25% more (LP)DDR bandwidth at nominal frequency and power, enabling a significantly more performant and energy-efficient SoC.
DenseMem
Double the CXL-connected memory capacity with DenseMem.
NVMe expansion
Extend NVMe storage capacity 2-4x with LZ4 or zstd hardware-accelerated compression.
SphinX
High-performance, low-latency AES-XTS industry-standard encryption / decryption, with independent non-blocking encryption and decryption channels.