A high-level diagram of the Rigel architecture is shown below.

Rigel Architecture Diagram

  • The baseline Rigel core is 2-wide, in-order, and has a 32-entry unified register file and a single-precision FPU.
  • Each core has its own small L1I and L1D caches, and cores are organized into clusters that share a unified L2 cache.
  • The L1 and L2 caches within each cluster are coherent with one another, but are not coherent with caches in other clusters.
  • All clusters on the chip share a unified L3 cache, also called the global cache.
  • Clusters are grouped into tiles, and each tile has a 4-ary tree interconnect aggregating it into a single global network port.
  • Tiles are connected to global cache banks via a multi-stage crossbar.
  • The L3 cache is backed by high-performance GDDR5 memory controllers.

Cores, network, and caches ran at 1.2GHz by default; the frequency can be changed in $RIGEL_SIM/rigel-sim/include/sim.h.

For more high-level architecture, memory model, and programming model details, see the publications here.

Rigel implements a cycle-accurate GDDR5 DRAM controller with several scheduling algorithms from the literature, including FR-FCFS and FCFS. The DRAM operates at 6Gbps per pin by default; the frequency and detailed timing constraints can be changed in $RIGEL_SIM/rigel-sim/include/memory/dram.h.

A high-level diagram of Rigel's DRAM architecture is shown below.

Rigel Memory System