This report compares the performance of five processor designs: a single-cycle processor, a pipelined processor without caches, a pipelined processor with caches, a multicore processor running single-threaded programs, and a multicore processor running dual-threaded programs. The single-threaded designs are tested with the mergesort.asm assembly file, and the dual-threaded multicore processor is tested with dual.mergesort.asm, so every design performs the same computation. The metrics compared are the estimated synthesis frequency, the average cycles per instruction (CPI) with and without caches, the average latency per instruction, the total execution time with and without caches, the FPGA resources each design requires, and the resulting speedup.
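The timing metrics listed above are related by a few standard formulas, sketched below. This is a minimal illustration, not the report's actual analysis script: the function names and every number in the usage section are hypothetical placeholders, and the per-instruction figure is simply CPI times the clock period, which may differ from how the report defines latency for the pipelined designs.

```python
def cpi(total_cycles: int, instruction_count: int) -> float:
    """Average cycles per instruction."""
    return total_cycles / instruction_count


def execution_time_s(total_cycles: int, freq_mhz: float) -> float:
    """Total execution time in seconds at the given clock frequency."""
    return total_cycles / (freq_mhz * 1e6)


def time_per_instruction_ns(avg_cpi: float, freq_mhz: float) -> float:
    """Average time per instruction in nanoseconds: CPI times the clock period."""
    clock_period_ns = 1e3 / freq_mhz
    return avg_cpi * clock_period_ns


def speedup(baseline_time_s: float, improved_time_s: float) -> float:
    """How many times faster the improved design is than the baseline."""
    return baseline_time_s / improved_time_s


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT measurements from this report.
    cycles, instructions, freq_mhz = 20_000, 10_000, 50.0
    avg_cpi = cpi(cycles, instructions)
    print("CPI:", avg_cpi)
    print("execution time (s):", execution_time_s(cycles, freq_mhz))
    print("time per instruction (ns):", time_per_instruction_ns(avg_cpi, freq_mhz))
```

Because execution time depends on both cycle count and clock frequency, a design with a higher CPI can still finish sooner if it synthesizes at a sufficiently higher frequency.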
The single-cycle processor is the simplest design. It needs no hazard detection or forwarding logic, which keeps its resource usage low, but it is slower because every instruction must complete within one long clock cycle, which limits its clock frequency. The pipelined designs are more complex: they must detect hazards and forward results between stages, but their shorter critical path allows a higher clock frequency, and adding caches further reduces memory latency and execution time. The dual-threaded multicore design exploits parallelism with dual.mergesort.asm, reducing execution time further by running two threads at once. The data analyzed here comes from the synthesis and sweep reports generated by the Makefile and the sweep script, and it highlights the trade-offs among hardware complexity, resource usage, and performance. Multicore processors require more FPGA resources, but they deliver significant reductions in execution time, particularly for parallel workloads.