Tensor - RISC-V MatMul Solver
High-performance matrix multiplication optimized for RISC-V architecture
This project implements a prototype MatMul solver that runs Amadeus computer workloads on RISC-V hardware. It is designed specifically for the **Miner RISC-V Computer Prototype**.
Challenge Objective
Build the fastest MatMul solver for Amadeus computer workloads on RISC-V architecture. The implementation with the highest performance score wins!
Features
Matrix Multiplication Algorithms
1. Basic Algorithm - Standard O(n³) implementation
2. Optimized Algorithm - Cache blocking with loop unrolling
3. RISC-V Algorithm - Vectorized operations with RISC-V extensions
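As a point of reference for the basic algorithm, here is a minimal sketch of the standard O(n³) triple loop. It assumes square, row-major fp32 matrices; the function name `matmul_basic` is hypothetical and not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: C = A * B for row-major n x n fp32 matrices. */
void matmul_basic(const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

The inner loop walks B with a stride of n elements, which is the cache-unfriendly access pattern that the optimized variants try to avoid.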
Performance Optimizations
1. Cache Blocking - Optimal block sizes for L1/L2 cache
2. Loop Unrolling - Reduced loop overhead
3. Memory Alignment - 64-byte alignment for SIMD
4. RISC-V Vector Extensions - SIMD operations where available
5. Prefetching - Memory access optimization
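The sketch below illustrates how cache blocking, loop unrolling, and 64-byte alignment might fit together. It is an illustration under stated assumptions, not the project's actual kernel: the tile edge `BLOCK` of 32, the unroll factor of 4, and the names `alloc_matrix` and `matmul_blocked` are all hypothetical, and the best tile size depends on the target's cache sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK 32 /* assumed tile edge; tune for the target's L1/L2 cache */

/* Allocate a zeroed n x n fp32 matrix on a 64-byte boundary (free with free()). */
float *alloc_matrix(size_t n)
{
    size_t bytes = n * n * sizeof(float);
    /* C11 aligned_alloc wants the size to be a multiple of the alignment. */
    bytes = (bytes + 63) / 64 * 64;
    float *m = aligned_alloc(64, bytes);
    if (m) memset(m, 0, bytes);
    return m;
}

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* C = A * B, row-major n x n, computed tile by tile. */
void matmul_blocked(const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c);
    memset(c, 0, n * n * sizeof(float));
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = min_sz(ii + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = min_sz(kk + BLOCK, n);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = min_sz(jj + BLOCK, n);
                for (size_t i = ii; i < i_end; i++) {
                    float *c_row = &c[i * n];
                    for (size_t k = kk; k < k_end; k++) {
                        float a_ik = a[i * n + k];
                        const float *b_row = &b[k * n];
                        size_t j = jj;
                        /* Inner loop unrolled by 4 to cut loop overhead. */
                        for (; j + 4 <= j_end; j += 4) {
                            c_row[j]     += a_ik * b_row[j];
                            c_row[j + 1] += a_ik * b_row[j + 1];
                            c_row[j + 2] += a_ik * b_row[j + 2];
                            c_row[j + 3] += a_ik * b_row[j + 3];
                        }
                        /* Remainder when the tile width is not a multiple of 4. */
                        for (; j < j_end; j++) {
                            c_row[j] += a_ik * b_row[j];
                        }
                    }
                }
            }
        }
    }
}
```

The i-k-j loop order keeps the innermost accesses to B and C at unit stride, so each tile is reused while it is still resident in cache. Prefetching is left out here because its benefit is highly hardware dependent.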
Benchmarking Framework
1. Multiple matrix sizes - 16x16 to 1024x1024
2. Performance metrics - GFLOPS, memory bandwidth, latency
3. Correctness verification - Floating-point accuracy validation
4. Export formats - JSON and CSV for submission
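To make the export step concrete, here is a small sketch that serializes one benchmark result as a CSV row and as a JSON object. The `bench_result` struct, its field names, and the three-decimal formatting are assumptions made for illustration; the submission schema the challenge expects is not specified here and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record for one benchmark run. */
typedef struct {
    size_t n;        /* matrix edge length */
    double time_ms;  /* execution time in milliseconds */
    double gflops;   /* measured throughput */
} bench_result;

/* Write "n,time_ms,gflops" into buf; returns snprintf's result. */
int result_to_csv(const bench_result *r, char *buf, size_t cap)
{
    assert(r && buf);
    return snprintf(buf, cap, "%zu,%.3f,%.3f", r->n, r->time_ms, r->gflops);
}

/* Write a flat JSON object with the same three fields into buf. */
int result_to_json(const bench_result *r, char *buf, size_t cap)
{
    assert(r && buf);
    return snprintf(buf, cap,
                    "{\"size\":%zu,\"time_ms\":%.3f,\"gflops\":%.3f}",
                    r->n, r->time_ms, r->gflops);
}
```

Both functions return the number of characters that would have been written, so a caller can detect truncation by comparing the result against `cap`.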
Hardware Targets
1. RISC-V chips (Tenstorrent-class hardware)
2. GPU-based simulation (for development)
3. Vector Extensions (RV64GCV or equivalent)
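For targets with the vector extension, a kernel could look roughly like the sketch below. It assumes a toolchain that ships `<riscv_vector.h>` with the `__riscv_`-prefixed RVV intrinsics and defines `__riscv_vector`; on any other target the same code falls back to a scalar loop, so it also builds on a development host. The names `axpy_row` and `matmul_rvv` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* c_row[0..n) += a_ik * b_row[0..n) */
static void axpy_row(float a_ik, const float *b_row, float *c_row, size_t n)
{
#if defined(__riscv_vector)
    for (size_t j = 0; j < n; ) {
        /* Ask the hardware how many fp32 lanes it will process this pass. */
        size_t vl = __riscv_vsetvl_e32m1(n - j);
        vfloat32m1_t vb = __riscv_vle32_v_f32m1(&b_row[j], vl);
        vfloat32m1_t vc = __riscv_vle32_v_f32m1(&c_row[j], vl);
        /* vc = vc + a_ik * vb (vector-scalar multiply-accumulate) */
        vc = __riscv_vfmacc_vf_f32m1(vc, a_ik, vb, vl);
        __riscv_vse32_v_f32m1(&c_row[j], vc, vl);
        j += vl;
    }
#else
    /* Scalar fallback for hosts without the vector extension. */
    for (size_t j = 0; j < n; j++) {
        c_row[j] += a_ik * b_row[j];
    }
#endif
}

/* C = A * B for row-major n x n fp32 matrices. */
void matmul_rvv(const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            c[i * n + j] = 0.0f;
        }
        for (size_t k = 0; k < n; k++) {
            axpy_row(a[i * n + k], &b[k * n], &c[i * n], n);
        }
    }
}
```

Because `vsetvl` returns the active vector length on every pass, the loop needs no separate tail handling and does not hard-code the hardware's vector width.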
Matrix Operations
1. Data Types: fp32, fp16, int8
2. Sizes: Variable (16x16 to 1024x1024)
3. Algorithms: Basic, Blocked, Vectorized
4. Verification: IEEE 754 floating-point accuracy
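Optimized kernels sum products in a different order than the reference, so fp32 results usually differ in the last few bits. One common way to verify them is a combined relative and absolute tolerance check, sketched below. The function name `matrices_close` is hypothetical, and the tolerances are caller-supplied assumptions rather than values prescribed by the challenge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static float abs_f32(float x) { return x < 0.0f ? -x : x; }

/*
 * Returns true when every element of out is within
 * atol + rtol * |ref[i]| of the reference value.
 */
bool matrices_close(const float *ref, const float *out, size_t count,
                    float rtol, float atol)
{
    assert(ref && out);
    for (size_t i = 0; i < count; i++) {
        float diff = abs_f32(ref[i] - out[i]);
        float tol = atol + rtol * abs_f32(ref[i]);
        /* Written as !(diff <= tol) so that a NaN difference also fails. */
        if (!(diff <= tol)) {
            return false;
        }
    }
    return true;
}
```

A call such as `matrices_close(ref, out, n * n, 1e-5f, 1e-6f)` would be a plausible starting point for fp32; fp16 and int8 paths would need looser or exact comparisons respectively.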
Performance Metrics
1. Execution Time (milliseconds)
2. Throughput (GFLOPS)
3. Memory Bandwidth (GB/s)
4. Memory Usage (MB)
5. Correctness (numerical accuracy)
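The throughput, bandwidth, and memory figures can be derived from the matrix size and the measured time. The sketch below assumes square n x n matrices, counts 2n³ floating-point operations per multiplication, and estimates memory traffic as the compulsory minimum of reading A and B once and writing C once. Real traffic is typically higher, and the official scoring formula may differ. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in GFLOPS: one multiply and one add per inner-loop step. */
double matmul_gflops(size_t n, double time_ms)
{
    assert(time_ms > 0.0);
    double flops = 2.0 * (double)n * (double)n * (double)n;
    return flops / (time_ms / 1000.0) / 1e9;
}

/* Footprint of A, B, and C in MB (1 MB = 10^6 bytes here). */
double matmul_footprint_mb(size_t n, size_t elem_bytes)
{
    return 3.0 * (double)n * (double)n * (double)elem_bytes / 1e6;
}

/* Lower-bound bandwidth in GB/s, assuming each matrix moves exactly once. */
double matmul_min_bandwidth_gbs(size_t n, size_t elem_bytes, double time_ms)
{
    assert(time_ms > 0.0);
    double bytes = 3.0 * (double)n * (double)n * (double)elem_bytes;
    return bytes / (time_ms / 1000.0) / 1e9;
}
```

For example, a 1000x1000 fp32 multiplication that takes 2000 ms performs 2 x 10⁹ operations, which works out to 1 GFLOPS over a 12 MB footprint.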
GitHub - https://github.com/ayushsingh82/Tensor
Website - https://tensor-ama.vercel.app/
Demo video - https://drive.google.com/drive/u/0/folders/1fmojYmKSESP09XUIicYL5XUMRU3KuiM0