We have designed a memory subsystem for a systolic array that serves as the central hub connecting the instruction set architecture (ISA), the systolic array, and the scheduling subsystems. The architecture comprises an instruction cache (I-cache) for fetching instructions, a data cache (D-cache) for scalar values, a memory arbiter that manages data flow to main memory, and a scratchpad that houses the memory banks and drives outputs to the systolic array. To validate the design, we will test it against a software-controlled main memory written in C++, connected to our Verilog modules through Verilator's DPI-C interface. Both the I-cache and the D-cache are reconfigurable to support different block organizations. The I-cache is optimized for instruction retrieval, while the D-cache stores scalar values parsed from the scheduler core. The memory arbiter grants access to main memory in a fixed priority order: the scratchpad first, then the D-cache, and finally the I-cache. The scratchpad acts as an intermediary, providing two buses that carry matrix inputs and weights into a multiplexer feeding the systolic array. It is implemented as a split-transaction, software-managed cache so that it can handle multiple requests from different sources concurrently.
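The software-controlled main memory can be pictured as a plain C++ array exposed to the RTL through DPI-C. This is a minimal sketch under stated assumptions: the function names `mem_read` and `mem_write`, the word-addressed 32-bit layout, and the memory size are all hypothetical, not the project's actual interface. A Verilog module would call these through matching SystemVerilog `import "DPI-C"` declarations, and Verilator compiles this file together with the generated model.

```cpp
// Hypothetical word-addressed main-memory model for Verilator DPI-C.
#include <cstdint>
#include <vector>

namespace {
// Assumed size: 64 Ki 32-bit words (256 KiB). Purely illustrative.
constexpr unsigned int kMemWords = 1u << 16;
std::vector<uint32_t> g_main_memory(kMemWords, 0);
}  // namespace

// Matching SystemVerilog imports (assumed names and types):
//   import "DPI-C" function int unsigned mem_read(input int unsigned addr);
//   import "DPI-C" function void mem_write(input int unsigned addr,
//                                          input int unsigned data);
extern "C" unsigned int mem_read(unsigned int addr) {
    // Out-of-range addresses wrap around instead of faulting; a real
    // testbench model might flag them as errors instead.
    return g_main_memory[addr % kMemWords];
}

extern "C" void mem_write(unsigned int addr, unsigned int data) {
    g_main_memory[addr % kMemWords] = data;
}
```

Keeping main memory in software means the testbench can preload instruction and matrix images, or inspect results, directly from C++ without adding any RTL.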
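Because the I-cache and D-cache are reconfigurable, one useful way to reason about a given block organization is to look at how it splits an address into tag, index, and offset fields. The C++ sketch below assumes a cache with a power-of-two block size and set count; the `CacheConfig` struct and helper names are hypothetical, and in the RTL these values would correspond to module parameters.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical cache geometry; in RTL these would be Verilog parameters.
struct CacheConfig {
    uint32_t block_bytes;  // bytes per block (power of two)
    uint32_t num_sets;     // number of sets (power of two)
    uint32_t ways;         // associativity (1 = direct-mapped)
};

struct AddressFields {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
};

// Integer log2 for powers of two; throws if v is not a power of two.
inline uint32_t log2_exact(uint32_t v) {
    if (v == 0 || (v & (v - 1)) != 0) {
        throw std::invalid_argument("value must be a power of two");
    }
    uint32_t bits = 0;
    while (v > 1) {
        v >>= 1;
        ++bits;
    }
    return bits;
}

// Split a byte address into tag/index/offset for the given geometry.
inline AddressFields split_address(uint32_t addr, const CacheConfig& cfg) {
    const uint32_t offset_bits = log2_exact(cfg.block_bytes);
    const uint32_t index_bits = log2_exact(cfg.num_sets);
    const uint32_t shift = offset_bits + index_bits;
    AddressFields f;
    f.offset = addr & (cfg.block_bytes - 1);
    f.index = (addr >> offset_bits) & (cfg.num_sets - 1);
    // Guard against shifting a 32-bit value by 32 or more (undefined).
    f.tag = (shift >= 32) ? 0 : (addr >> shift);
    return f;
}

// Total data capacity in bytes (tags and valid bits excluded).
inline uint32_t capacity_bytes(const CacheConfig& cfg) {
    return cfg.block_bytes * cfg.num_sets * cfg.ways;
}
```

For example, a 2-way cache with 16-byte blocks and 64 sets holds 2 KiB of data and uses 4 offset bits and 6 index bits, leaving the remaining 22 bits of a 32-bit address as the tag.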
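The arbiter's fixed priority order (scratchpad, then D-cache, then I-cache) can be modeled one cycle at a time. This behavioral C++ sketch assumes that each requester raises a single request flag per cycle and that at most one request is granted per cycle; the `Requester` enum and `arbitrate` function are hypothetical names, not the RTL's actual signals.

```cpp
#include <optional>

// Requesters listed in descending priority order (assumed encoding).
enum class Requester { Scratchpad, DCache, ICache };

// One cycle of fixed-priority arbitration: returns the granted requester,
// or std::nullopt if nobody is requesting main memory this cycle.
inline std::optional<Requester> arbitrate(bool scratchpad_req,
                                          bool dcache_req,
                                          bool icache_req) {
    if (scratchpad_req) return Requester::Scratchpad;
    if (dcache_req) return Requester::DCache;
    if (icache_req) return Requester::ICache;
    return std::nullopt;
}
```

A fixed-priority scheme is simple and cheap in hardware, but a requester with continuous demand can starve lower-priority ones; here, a scratchpad that requests every cycle would block I-cache refills until it goes idle.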
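A split-transaction interface decouples a request from its response, so a requester can issue several reads before any data returns and then match each response to its request by tag. The behavioral C++ model below illustrates that idea for the scratchpad. The tag scheme, fixed read latency, source identifiers, and the `SplitTxnScratchpad` class are assumptions made for this sketch, not details taken from the design.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical identifiers for scratchpad clients.
enum class Source : uint8_t { InputBus, WeightBus, Arbiter };

struct Response {
    uint32_t tag;   // echoes the request tag so the source can match it
    Source source;  // who issued the request
    uint32_t data;
};

// Reads complete a fixed number of cycles after issue, and several
// requests may be in flight at once. Responses return in issue order.
class SplitTxnScratchpad {
public:
    SplitTxnScratchpad(std::size_t words, unsigned latency)
        : mem_(words, 0), latency_(latency) {}

    void write(uint32_t addr, uint32_t data) { mem_[addr % mem_.size()] = data; }

    // Issue a read; returns immediately with the tag to wait for.
    uint32_t issue_read(Source src, uint32_t addr) {
        const uint32_t tag = next_tag_++;
        pending_.push_back({tag, src, addr, cycle_ + latency_});
        return tag;
    }

    // Advance one cycle and return at most one completed response.
    // Data is sampled at completion time in this simplified model.
    std::optional<Response> tick() {
        ++cycle_;
        if (!pending_.empty() && pending_.front().ready_cycle <= cycle_) {
            const Pending p = pending_.front();
            pending_.pop_front();
            return Response{p.tag, p.source, mem_[p.addr % mem_.size()]};
        }
        return std::nullopt;
    }

    std::size_t in_flight() const { return pending_.size(); }

private:
    struct Pending {
        uint32_t tag;
        Source source;
        uint32_t addr;
        uint64_t ready_cycle;
    };
    std::vector<uint32_t> mem_;
    std::deque<Pending> pending_;
    unsigned latency_;
    uint32_t next_tag_ = 0;
    uint64_t cycle_ = 0;
};
```

With a latency of 2, the input bus and the weight bus can each issue a read in the same cycle; both stay outstanding while the scratchpad keeps accepting requests, and the responses are then delivered one per cycle carrying their original tags.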
Accelerated Matrix Processor [AMP00] Memory Subsystem
May 2, 2025