On-Prem LLM Deployment Workshop — H100 to vLLM in 3 hours
A three-hour, hands-on workshop where you take a 70B-class open-weights model from cold disk to a stable production endpoint. You bring a laptop and a willingness to read logs. We provide remote H100 hardware, the model weights, and a working reference repository.
The agenda is deliberately narrow. We cover quantization tradeoffs, vLLM configuration for sustained throughput, KV cache sizing, request batching, and the observability you actually need on day one. By the end of the session you will have served real traffic against your own endpoint, and you will leave with a runbook that documents every command, file, and config value that produced it.
This is not an architecture overview. It is a working session for people who are going to deploy a model next month and would prefer to skip the first ten failures. $250 CAD. Capped at twelve attendees so screen-shares stay readable.

