Sovereign AI Box Canada: H100 LLM agents on your hardware
Starting at $45,000 CAD
Sovereign AI Box Canada bundles an H100 GPU with an open-weight LLM and an agent runtime, installed on your hardware in Canada. Eight to sixteen weeks from order to operational, with sovereign data residency by default.
Scope of engagement
What you get
- Hardware tier choice across single H100, dual H100, and eight-H100 configurations with NVLink. This is the spine of every Sovereign AI Box Canada engagement.
- Region choice across AWS Canada Central, Toronto on-prem, Montreal on-prem, and hybrid (on-prem training plus Canadian-region cloud inference)
- LLM size choice across the 8B-parameter Llama 3 or Qwen 2.5 tier, the 70B-parameter Llama 3.1 tier, the 405B-parameter Llama 3.1 research-grade tier, and a mixture tier for multi-model routing
- Agent count choice across one to three agents, four to ten agents, and eleven-plus agents with custom scope priced on the kickoff call
- Installer runbook plus five days of on-site or remote first-deploy onboarding
- Observability stack and audit-trail tooling wired to Canadian-region log stores
- Twelve months hardware-health monitoring across GPU, NVLink fabric, and power
- Quarterly architecture review covering model updates, scaling decisions, and security posture
Timeline
8 to 16 weeks from order to operational for the Sovereign AI Box Canada stack
Deliverables
- Hardware procurement spec sized to your chosen tier (single H100 / dual H100 / 8x H100), region, and concurrency target. This is the core Sovereign AI Box Canada deliverable.
- Installation runbook covering rack mount, NVLink fabric, networking, and power and cooling validation
- First-deploy onboarding running five days on-site at the operator datacentre OR five days remote on a Canadian-region cloud account
- Open-weight LLM bundle from the Llama 3.1 family or the Qwen 2.5 family; operator picks the licence path on the kickoff call
- Agent runtime supporting from one to eleven or more agents per Box, with routing, tool use, and structured event emission
- Observability stack with structured event logs, prompt-traffic visibility, and on-call alerting wired to a Canadian-region log store
- Audit-trail tooling exporting prompt and completion records to a tamper-evident store for regulator review
- Twelve months hardware-health monitoring covering GPU temperature, NVLink fabric, power draw, and cooling capacity
- Quarterly architecture review across the contract year covering model updates, scaling decisions, and security posture
- Handover documentation covering admin access, model retraining workflow, and the agent-promotion path from development to production
Prerequisites
- Datacentre or colocation space OR a Canadian-region cloud commitment (AWS Canada Central is the default; partner referrals available for colocation)
- Network capacity at ten gigabits per second internal and at least one gigabit per second egress so the agent runtime can call external tools where required
- Power and cooling capacity sized to the chosen hardware tier: roughly 700 watts per H100 for the single-H100 tier, scaling to 5.6 kilowatts plus headroom for the eight-H100 tier
- A designated compliance or security stakeholder inside your organisation who owns the audit posture (the Box ships ITSG-33 aware, with Protected B as the default categorisation target)
- A model-licence decision on the kickoff call: Llama 3.1 community licence OR Qwen 2.5 Apache 2.0 OR an alternative open-weight family you have already licensed
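The power prerequisite above is simple arithmetic, and a minimal sketch of it follows. The 700-watt figure comes from this page; the 25 percent headroom fraction and the tier names are assumptions for illustration only, and the calculation covers GPU draw alone, not host CPUs, networking, or cooling overhead.

```python
# Per-GPU draw taken from the prerequisite above (roughly 700 W per H100).
H100_WATTS = 700

# Hypothetical headroom fraction: the page says "plus headroom" without a figure.
ASSUMED_HEADROOM = 0.25

# GPU count for each hardware tier named on this page (keys are illustrative).
GPU_COUNT_BY_TIER = {"single": 1, "dual": 2, "eight": 8}


def gpu_power_budget_watts(tier: str, headroom: float = ASSUMED_HEADROOM) -> tuple[int, int]:
    """Return (base GPU draw, GPU draw with headroom) in watts for a tier."""
    base = GPU_COUNT_BY_TIER[tier] * H100_WATTS
    return base, round(base * (1 + headroom))


for tier in GPU_COUNT_BY_TIER:
    base, padded = gpu_power_budget_watts(tier)
    print(f"{tier}: {base} W GPU draw, {padded} W with assumed headroom")
```

The eight-H100 base figure works out to 5,600 watts, which matches the 5.6 kilowatts quoted above. Your datacentre team should replace the assumed headroom with the figure from the procurement spec.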
Who this is for
- Healthcare buyers running patient data under PHIPA or equivalent provincial privacy frameworks who need an on-prem inference path for an AI deployment
- Fintech buyers running client data under OSFI guidelines who need a hardware-controlled inference path because SaaS LLM APIs do not clear their data-residency review
- Defence and public-sector buyers targeting Protected B classification under ITSG-33 who need hardware procurement plus an open-weight model plus an agent runtime delivered as one engagement
- Federal buyers seeking a Canadian supplier under the federal AI procurement directives
- Private-sector buyers with fifty to five hundred staff, a data-residency mandate from their security stakeholder, and a budget for a hardware-on-prem AI deployment
Customize this engagement
Live configurator arrives in milestone 2. For now, mention any custom scope on the kickoff call.
Frequently asked
What does 'sovereign' actually mean for the Sovereign AI Box Canada stack?
Sovereignty here covers four guarantees. First, data never leaves your Canadian jurisdiction at any point in the prompt and completion path. Second, there is no third-party prompt logging because no third party sits in the path. Third, every prompt, every completion, and every tool call lands in a tamper-evident audit trail you control. Fourth, the evidence pack we ship is regulator-ready against ITSG-33 with a Protected B classification target and against the Treasury Board Directive on Service and Digital. SaaS LLM APIs hit a wall on at least one of these guarantees. The Box clears all four.
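To make "tamper-evident" concrete, here is a minimal sketch of one common technique, a SHA-256 hash chain, in which each entry commits to the one before it. This is an illustration under stated assumptions, not a description of the audit tooling the Box ships; the function names and record fields are hypothetical.

```python
import hashlib
import json

# Placeholder "previous hash" for the first entry in the chain.
GENESIS_HASH = "0" * 64


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append_entry(chain: list, record: dict) -> None:
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(
        {"record": record, "prev_hash": prev_hash, "hash": _digest(prev_hash, record)}
    )


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical records for a prompt, a completion, and a tool call.
trail = []
append_entry(trail, {"kind": "prompt", "text": "summarise file 12"})
append_entry(trail, {"kind": "completion", "text": "summary text"})
append_entry(trail, {"kind": "tool_call", "name": "lookup"})
print(verify_chain(trail))  # True

trail[1]["record"]["text"] = "edited after the fact"
print(verify_chain(trail))  # False
```

A chain like this detects edits but does not prevent them, so a production store would typically also anchor the latest hash somewhere the writer cannot modify.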
Can I run the Box on AWS Canada Central or does it have to be on-prem?
The Box ships four region options. AWS Canada Central suits buyers who want a Canadian-region cloud footprint without colocation. Toronto on-prem suits buyers who already have a Toronto datacentre or a colocation contract. Montreal on-prem suits Quebec buyers under provincial data-residency mandates, including Law 25. Hybrid suits buyers who run model training on-prem and run production inference in the Canadian cloud. The Box recipe lifts cleanly across all four. The choice happens on the kickoff call and is recorded in the architecture decision record.
Which open-weight LLM do I get with the Sovereign AI Box Canada?
The default model families are Llama 3.1 and Qwen 2.5. Llama 3.1 ships under the Meta community licence with the usage and naming clauses that licence requires. Qwen 2.5 ships under Apache 2.0 with the more permissive redistribution terms. The operator picks the licence path on the kickoff call. We also support other open-weight families if the operator has already licensed a model from the Mistral, DeepSeek, or research-foundation pools. Custom model training is excluded from the Box engagement and runs through a separate scope.
What is the configurator for and why does it matter?
The configurator matches your workload to a specific hardware tier, region, LLM size, and agent count. A buyer running a 70B production chat workload at moderate concurrency lands on a dual-H100 tier in the Toronto on-prem region with four to ten agents. A buyer running 405B research workloads at heavy concurrency lands on an eight-H100 tier with NVLink and a single high-quality model. The configurator codifies that decision before procurement starts. The live configurator ships in milestone 2; until then, the dimension table on this page sets out the choices buyers walk through on the kickoff call.
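Until the live configurator ships, the four choice dimensions can be sketched as a small validated record. This is an illustrative sketch only: the class name, field names, and string identifiers are hypothetical, and the allowed values simply mirror the options listed on this page.

```python
from dataclasses import dataclass

# Allowed values mirror the four dimensions listed on this page.
# The identifier spellings are hypothetical, chosen for illustration.
HARDWARE_TIERS = ("single-h100", "dual-h100", "eight-h100-nvlink")
REGIONS = ("aws-canada-central", "toronto-on-prem", "montreal-on-prem", "hybrid")
LLM_SIZES = ("8b", "70b", "405b", "mixture")
AGENT_BANDS = ("1-3", "4-10", "11+")


@dataclass(frozen=True)
class BoxConfig:
    """One point in the configurator's choice space."""

    hardware_tier: str
    region: str
    llm_size: str
    agent_band: str

    def __post_init__(self) -> None:
        # Reject any value that is not one of the published options.
        checks = (
            ("hardware_tier", self.hardware_tier, HARDWARE_TIERS),
            ("region", self.region, REGIONS),
            ("llm_size", self.llm_size, LLM_SIZES),
            ("agent_band", self.agent_band, AGENT_BANDS),
        )
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {allowed}")


# The 70B production chat example described above.
chat_example = BoxConfig("dual-h100", "toronto-on-prem", "70b", "4-10")
print(chat_example)
```

Only the 70B example is encoded because the 405B example above does not state a region or agent count. Those stay open until the kickoff call.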
What is the typical timeline from order to operational on a Sovereign AI Box Canada deployment?
Timeline runs eight to sixteen weeks from order to operational. Hardware lead time on the chosen NVIDIA H100 SKU drives the wide end of the range. The single-H100 tier typically lands at eight to ten weeks. The dual-H100 tier typically lands at ten to twelve weeks. The eight-H100 tier typically lands at twelve to sixteen weeks because the NVLink fabric ordering and the power-and-cooling validation steps both add lead time. We start agent runtime work and observability staging in parallel with hardware delivery so the operational date does not slip on installation alone.
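For planning purposes, the per-tier ranges above translate into a date window. The sketch below uses the week ranges from this answer; the tier keys and the order date are hypothetical, and actual dates depend on hardware lead time at the time of order.

```python
from datetime import date, timedelta

# Typical weeks from order to operational, per tier, as stated above.
LEAD_TIME_WEEKS = {
    "single-h100": (8, 10),
    "dual-h100": (10, 12),
    "eight-h100": (12, 16),
}


def operational_window(order_date: date, tier: str) -> tuple[date, date]:
    """Return the (earliest, latest) typical operational dates for a tier."""
    low_weeks, high_weeks = LEAD_TIME_WEEKS[tier]
    return (
        order_date + timedelta(weeks=low_weeks),
        order_date + timedelta(weeks=high_weeks),
    )


# Hypothetical order date, for illustration only.
earliest, latest = operational_window(date(2025, 1, 6), "dual-h100")
print(f"Typical operational window: {earliest} to {latest}")
```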
What happens if I outgrow the chosen tier on the Sovereign AI Box Canada?
Three upgrade paths exist. First, you add a second Box in parallel and route workloads across both via the agent runtime. This is the cleanest path when the bottleneck is concurrency, not model capability. Second, you scale the tier in place by upgrading the GPU configuration, typically single H100 to dual H100 or dual H100 to eight-H100 with NVLink. This is the right path when the bottleneck is model size and you need 405B in production. Third, you move from a single-model deployment to a mixture deployment where the agent runtime routes prompts across multiple specialised models on one Box. Each upgrade path is documented in the handover runbook with cost and timeline estimates.
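The first upgrade path, routing across two Boxes, can be sketched as a least-in-flight router. This is an assumption about one reasonable routing policy, not the agent runtime's actual behaviour; the class, method names, and Box names are hypothetical.

```python
class BoxRouter:
    """Send each request to the Box with the fewest requests in flight."""

    def __init__(self, box_names: list[str]) -> None:
        # Track how many requests each Box is currently serving.
        self.in_flight = {name: 0 for name in box_names}

    def acquire(self) -> str:
        """Pick the least-loaded Box (first listed wins ties) and mark it busy."""
        name = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[name] += 1
        return name

    def release(self, name: str) -> None:
        """Mark one request on the given Box as finished."""
        if self.in_flight[name] > 0:
            self.in_flight[name] -= 1


router = BoxRouter(["box-a", "box-b"])
first = router.acquire()   # goes to box-a
second = router.acquire()  # goes to box-b, since box-a is now busier
router.release(first)
third = router.acquire()   # box-a is free again, so it is chosen
print(first, second, third)
```

This policy fits the case described above, where the bottleneck is concurrency and both Boxes serve the same model.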
