An important goal of this project is to build an open, integrated, easy-to-use platform for many-core architecture exploration, including a simulator, a compiler toolchain, parallel libraries and applications, and physical design components like RTL. We believe that many of the innovations needed to combine the advantages of general-purpose processors and accelerators will require cross-cutting changes to multiple layers of the computing stack; for example, quantifying the performance, area, power, and programmability impact of an instruction set extension requires integrated RTL, simulator, compiler, libraries, and applications. Most existing tools in this area only cover a subset of these layers, requiring the user to piece together multiple projects in an ad-hoc way or develop their own models for the missing layers. By providing an integrated ecosystem of tools, we hope to enable students and researchers to build reusable and sharable infrastructure for parallel architecture, compiler, programming model, and applications research.
Performance prediction and modeling are more difficult for parallel systems than for uniprocessors, particularly in the shared memory system. Uniprocessor simulators often use heuristics to estimate timing; for example, it is common to simplify the simulator by assigning a fixed latency to cache misses or DRAM accesses. These heuristics provide reasonable performance estimates for uniprocessors, but can produce misleading results for multi-core systems, especially many-core systems where shared caches and memory controllers are highly contended. Our simulator, RigelSim, is execution-driven and models execution structurally, as opposed to simulators that are trace-driven or use event queues. We believe these design choices encourage accurate timing simulation by coupling execution correctness with timing correctness. While these choices make the simulator more difficult to extend in some cases, they also make it more difficult to extend incorrectly.
Many existing simulators use an existing commercial instruction set architecture (ISA), which allows them to use existing compilers, applications, and operating systems. Since one of the goals of the project is to investigate an architecture for future massively parallel applications, the lack of off-the-shelf software was not seen as a large downside. Furthermore, using a new ISA enables easier experimentation for two reasons. First, in order to take advantage of existing software, compilers, and operating systems, any instruction set extensions or core models developed by a researcher must support the full instruction set and semantics of the existing architecture. For example, most instruction sets have an associated memory consistency and cache coherence model; these two aspects of an architecture are very important to parallel performance and efficiency, and we would prefer to allow researchers building on the Rigel architecture to experiment with them, rather than be tied to the design choices made for the original ISA. Second, defining a small ISA makes core models easier to develop. We have released an LLVM-based compiler toolchain integrated with our libraries, applications, and simulator to make cross-compilation for Rigel as easy as possible.