This post accompanies "Vasuki: Minimizing Makespan for Offline LLM Batch Inference", a group research project I worked on. The project focused on building a first-class offline LLM inference serving system that leverages KV cache offloading and bin-packing to navigate the resulting memory-makespan-cost tradeoff space on commodity hardware. The full project report is available for download here.
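
For intuition only, here is a minimal sketch of the bin-packing flavor of the problem (this is not Vasuki's actual algorithm): requests with estimated KV-cache footprints are packed into GPU-memory-sized batches via first-fit decreasing, and a crude makespan estimate is computed for running those batches back to back. The per-token memory figure, decode rate, and cost model are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int
    max_new_tokens: int

    @property
    def kv_cache_mb(self) -> float:
        # Hypothetical cost model: ~0.5 MB of KV cache per token.
        return 0.5 * (self.prompt_tokens + self.max_new_tokens)


def pack_batches(requests: list[Request], gpu_budget_mb: float) -> list[list[Request]]:
    """First-fit decreasing: place each request into the first batch whose
    remaining KV-cache budget can hold it, opening a new batch otherwise."""
    batches: list[list[Request]] = []
    free_mb: list[float] = []
    for req in sorted(requests, key=lambda r: r.kv_cache_mb, reverse=True):
        for i, free in enumerate(free_mb):
            if req.kv_cache_mb <= free:
                batches[i].append(req)
                free_mb[i] -= req.kv_cache_mb
                break
        else:
            batches.append([req])
            free_mb.append(gpu_budget_mb - req.kv_cache_mb)
    return batches


def estimate_makespan(batches: list[list[Request]], decode_tok_per_s: float = 50.0) -> float:
    """Assumed model: a batch decodes in lock-step, so its wall-clock time is
    set by its longest request; batches run serially on a single GPU."""
    return sum(max(r.max_new_tokens for r in b) / decode_tok_per_s for b in batches)


if __name__ == "__main__":
    reqs = [Request(512, 256), Request(2048, 128), Request(128, 512), Request(1024, 64)]
    batches = pack_batches(reqs, gpu_budget_mb=1500.0)
    print(f"{len(batches)} batches, est. makespan: {estimate_makespan(batches):.1f} s")
```

The real system has to account for far more (offloading KV cache to host memory, heterogeneous request lengths discovered only at runtime, and dollar cost), which is exactly the tradeoff space the report explores.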