A high-level diagram of the Rigel architecture is shown below.
- The baseline Rigel core is 2-wide, in-order, and has a 32-entry unified register file and a single-precision FPU.
- Each core has its own small L1I and L1D caches, and cores are organized into clusters that share a unified L2 cache.
- The L1 and L2 caches within each cluster are coherent with one another, but are not coherent with caches in other clusters.
- All clusters on the chip share a unified L3 cache, also called the global cache.
- Clusters are grouped into tiles, and each tile has a 4-ary tree interconnect that aggregates its clusters into a single global network port.
- Tiles are connected to global cache banks via a multi-stage crossbar.
- The L3 cache is backed by high-performance GDDR5 memory controllers.
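
The hierarchy in the list above can be sketched as a small configuration structure. This is only an illustration: the struct, its field names, and the helper functions are hypothetical and do not come from the simulator's headers, and the text above does not state how many cores, clusters, or tiles the default configuration uses. The sketch also assumes that the leaves of the 4-ary tree are the clusters of a tile.

```cpp
#include <cstdint>

// Hypothetical description of the core/cluster/tile hierarchy.
// None of these names are taken from the Rigel simulator sources.
struct ChipTopology {
    std::uint32_t cores_per_cluster;  // cores sharing one L2 (cluster cache)
    std::uint32_t clusters_per_tile;  // clusters aggregated by one tile-level tree
    std::uint32_t tiles;              // tiles attached to the global crossbar
};

// Total number of cores on the chip.
constexpr std::uint32_t total_cores(const ChipTopology& t) {
    return t.cores_per_cluster * t.clusters_per_tile * t.tiles;
}

// Total number of clusters. Because caches are coherent only within a
// cluster, this is also the number of separate coherence domains below the L3.
constexpr std::uint32_t total_clusters(const ChipTopology& t) {
    return t.clusters_per_tile * t.tiles;
}

// Number of levels a tree of the given arity needs to aggregate `leaves`
// endpoints into a single port (assumption: the leaves are clusters).
constexpr std::uint32_t tree_levels(std::uint32_t leaves, std::uint32_t arity = 4) {
    std::uint32_t levels = 0;
    std::uint32_t reach = 1;
    while (reach < leaves) {
        reach *= arity;
        ++levels;
    }
    return levels;
}
```

With assumed values of 8 cores per cluster, 16 clusters per tile, and 8 tiles, `total_cores` returns 1024, `total_clusters` returns 128, and `tree_levels(16)` returns 2.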
Cores, network, and caches run at 1.2 GHz by default; the frequency can be changed in $RIGEL_SIM/rigel-sim/include/sim.h.
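
When reading simulator output, it can help to convert between cycle counts and wall-clock time at the default clock. The helpers below are a minimal sketch; the constant and function names are hypothetical and are not the identifiers used in sim.h.

```cpp
#include <cstdint>

// Default clock from the text above; the constant name is hypothetical.
constexpr double kDefaultClockHz = 1.2e9;

// Convert a cycle count into nanoseconds at the given clock frequency.
constexpr double cycles_to_ns(std::uint64_t cycles, double clock_hz = kDefaultClockHz) {
    return static_cast<double>(cycles) * 1e9 / clock_hz;
}

// Convert a duration in nanoseconds into whole cycles, rounding up.
constexpr std::uint64_t ns_to_cycles(double ns, double clock_hz = kDefaultClockHz) {
    const double exact = ns * clock_hz / 1e9;
    const std::uint64_t whole = static_cast<std::uint64_t>(exact);
    return (static_cast<double>(whole) < exact) ? whole + 1 : whole;
}
```

At 1.2 GHz one cycle lasts roughly 0.83 ns, so a 12-cycle latency corresponds to 10 ns.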
For more high-level architecture, memory model, and programming model details, see the publications here.
Rigel implements a cycle-accurate GDDR5 DRAM controller with several scheduling algorithms from the literature, including FR-FCFS and FCFS.
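
The difference between the two named policies can be sketched as follows. This is a simplified illustration rather than the simulator's implementation: it treats a request as "ready" when it hits the currently open row of its bank and ignores all DRAM timing constraints. The `DramRequest` type, `kNoOpenRow`, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical request record; field names are illustrative only.
struct DramRequest {
    std::uint64_t arrival_cycle;  // cycle at which the request entered the queue
    std::uint32_t bank;           // target bank
    std::uint32_t row;            // target row within that bank
};

constexpr std::int64_t kNoOpenRow = -1;  // marks a bank with no open row

// FCFS: return the index of the oldest pending request, or -1 if the queue is empty.
int select_fcfs(const std::vector<DramRequest>& queue) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(queue.size()); ++i) {
        if (best < 0 || queue[i].arrival_cycle < queue[best].arrival_cycle) {
            best = i;
        }
    }
    return best;
}

// FR-FCFS: return the index of the oldest request that hits an open row.
// If no request hits, fall back to plain FCFS. open_rows[b] holds the open
// row of bank b, or kNoOpenRow.
int select_fr_fcfs(const std::vector<DramRequest>& queue,
                   const std::vector<std::int64_t>& open_rows) {
    int best_hit = -1;
    for (int i = 0; i < static_cast<int>(queue.size()); ++i) {
        const DramRequest& r = queue[i];
        assert(r.bank < open_rows.size());
        const bool row_hit = open_rows[r.bank] == static_cast<std::int64_t>(r.row);
        if (row_hit &&
            (best_hit < 0 || r.arrival_cycle < queue[best_hit].arrival_cycle)) {
            best_hit = i;
        }
    }
    return best_hit >= 0 ? best_hit : select_fcfs(queue);
}
```

FR-FCFS trades strict arrival order for fewer row activations, which generally raises DRAM throughput at the cost of delaying some older requests.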
The DRAM operates at 6Gbps per pin by default; the frequency and detailed timing constraints can be changed in $RIGEL_SIM/rigel-sim/include/memory/dram.h.
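
The per-pin rate translates into peak channel bandwidth once a channel width is fixed. The sketch below assumes a 32-bit data interface per channel, which is not stated in the text above. The function names are hypothetical and are not taken from dram.h.

```cpp
#include <cstdint>

// Peak bandwidth of one channel in bytes per second:
// per-pin data rate (Gbit/s) times the number of data pins, divided by 8 bits per byte.
constexpr double channel_bandwidth_bytes_per_s(double gbps_per_pin, std::uint32_t data_pins) {
    return gbps_per_pin * 1e9 * data_pins / 8.0;
}

// Bytes one channel can deliver per core clock cycle at the given core frequency.
constexpr double bytes_per_core_cycle(double bandwidth_bytes_per_s, double core_hz) {
    return bandwidth_bytes_per_s / core_hz;
}
```

Under the assumed 32-pin width, 6 Gbps per pin gives a peak of 24 GB/s per channel, or 20 bytes per core cycle at the default 1.2 GHz core clock.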
A high-level diagram of Rigel's DRAM architecture is shown below.