Sovereign AI Box Canada: H100 LLM agents on your hardware

Starting at $45,000 CAD

Sovereign AI Box Canada bundles an H100 GPU with an open-weight LLM and an agent runtime, installed on your hardware in Canada. Eight to sixteen weeks from order to operational, with sovereign data residency by default.

Scope of engagement

What you get

Hardware tier choice across single H100, dual H100, and eight-H100 configurations with NVLink. This is the spine of every Sovereign AI Box Canada engagement.
Region choice across AWS Canada Central, Toronto on-prem, Montreal on-prem, and hybrid (on-prem training plus Canadian-region cloud inference)
LLM size choice across the 8B-class tier (Llama 3 8B or Qwen 2.5 7B), the 70B-parameter Llama 3.1 tier, the 405B-parameter Llama 3.1 research-grade tier, and a mixture tier for multi-model routing
Agent count choice across one to three agents, four to ten agents, and eleven-plus agents with custom scope priced on the kickoff call
Installer runbook plus five days of on-site or remote first-deploy onboarding
Observability stack and audit-trail tooling wired to Canadian-region log stores
Twelve months hardware-health monitoring across GPU, NVLink fabric, and power
Quarterly architecture review covering model updates, scaling decisions, and security posture
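
As a rough illustration of why LLM size drives the hardware tier choice, the sketch below estimates how many GPUs are needed just to hold the model weights. It rests on assumptions not stated on this page: an 80 GB H100 variant, 2 bytes per parameter for 16-bit weights (1 byte for 8-bit), and no allowance for KV cache, activations, or runtime overhead. The function names are hypothetical, and real sizing happens on the kickoff call.

```python
import math

# Assumption: the 80 GB H100 variant. Adjust if your SKU differs.
H100_MEMORY_GB = 80


def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Memory for model weights only, in GB (1e9 params * bytes / 1e9 bytes per GB)."""
    return params_billions * bytes_per_param


def min_gpus_for_weights(
    params_billions: float,
    bytes_per_param: float = 2.0,
    gpu_memory_gb: float = H100_MEMORY_GB,
) -> int:
    """Lower bound on GPU count: weights only, ignoring KV cache and activations."""
    return math.ceil(weight_memory_gb(params_billions, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    for size in (8, 70, 405):
        print(size, "B params:",
              min_gpus_for_weights(size), "GPUs at 16-bit,",
              min_gpus_for_weights(size, bytes_per_param=1.0), "GPUs at 8-bit")
```

Under these assumptions a 70B model needs at least two GPUs at 16-bit precision, while a 405B model fits within eight GPUs only with 8-bit weights. Treat the result as a floor, not a recommendation, because serving concurrency adds memory on top.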

Timeline

8 to 16 weeks from order to operational for the Sovereign AI Box Canada stack

Deliverables

Hardware procurement spec sized to your chosen tier (single H100 / dual H100 / 8x H100), region, and concurrency target. This is the core Sovereign AI Box Canada deliverable.
Installation runbook covering rack mount, NVLink fabric, networking, and power and cooling validation
First-deploy onboarding running five days on-site at the operator datacentre OR five days remote on a Canadian-region cloud account
Open-weight LLM bundle from the Llama 3.1 family or the Qwen 2.5 family; operator picks the licence path on the kickoff call
Agent runtime supporting one to eleven-plus agents per Box, with routing, tool use, and structured event emission
Observability stack with structured event logs, prompt-traffic visibility, and on-call alerting wired to a Canadian-region log store
Audit-trail tooling exporting prompt and completion records to a tamper-evident store for regulator review
Twelve months hardware-health monitoring covering GPU temperature, NVLink fabric, power draw, and cooling capacity
Quarterly architecture review across the contract year covering model updates, scaling decisions, and security posture
Handover documentation covering admin access, model retraining workflow, and the agent-promotion path from development to production

Prerequisites

Datacentre or colocation space OR a Canadian-region cloud commitment (AWS Canada Central is the default; partner referrals available for colocation)
Network capacity at ten gigabits per second internal and at least one gigabit per second egress so the agent runtime can call external tools where required
Power and cooling capacity sized to the chosen hardware tier: roughly 700 watts per H100 for the single-H100 tier, scaling to 5.6 kilowatts plus headroom for the eight-H100 tier
A named compliance or security stakeholder inside your organisation who owns the audit posture (the Box ships ITSG-33-aware, with Protected B as the default categorisation target)
A model-licence decision on the kickoff call: Llama 3.1 community licence OR Qwen 2.5 Apache 2.0 OR an alternative open-weight family you have already licensed
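
The power prerequisite above is simple arithmetic, sketched here for planning purposes. The 700-watt per-GPU figure comes from this page; the headroom fraction is a hypothetical parameter, since the page says only "plus headroom", and the result covers GPU draw alone (no CPUs, networking, or cooling overhead).

```python
# Roughly 700 W per H100, as stated in the prerequisites.
GPU_WATTS = 700

# GPU count per hardware tier offered for the Box.
TIER_GPUS = {"single": 1, "dual": 2, "eight": 8}


def gpu_power_watts(tier: str) -> int:
    """GPU-only power draw for a tier, in watts."""
    return TIER_GPUS[tier] * GPU_WATTS


def provisioned_watts(tier: str, headroom_fraction: float) -> float:
    """GPU power plus a caller-chosen headroom fraction (hypothetical; not specified on this page)."""
    if headroom_fraction < 0:
        raise ValueError("headroom_fraction must be non-negative")
    return gpu_power_watts(tier) * (1 + headroom_fraction)


if __name__ == "__main__":
    for tier in TIER_GPUS:
        print(tier, gpu_power_watts(tier), "W GPU-only")
```

For the eight-H100 tier this reproduces the 5.6-kilowatt figure. Choose the headroom fraction with your facility team; none is prescribed here.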

Who this is for

Healthcare buyers running patient data under PHIPA or equivalent provincial privacy frameworks who need an on-prem inference path for an AI deployment
Fintech buyers running client data under OSFI guidelines who need a hardware-controlled inference path because SaaS LLM APIs do not clear their data-residency review
Defence and public-sector buyers targeting Protected B classification under ITSG-33 who need hardware procurement plus an open-weight model plus an agent runtime delivered as one engagement
Federal buyers who prefer a Canadian supplier under the federal AI procurement directives
Private-sector buyers running fifty to five hundred staff with a data-residency mandate from their security stakeholder and a budget for a hardware-on-prem AI deployment

Customize this engagement

The live configurator arrives in milestone 2. Until then, raise any custom scope on the kickoff call.

Frequently asked

What does 'sovereign' actually mean for the Sovereign AI Box Canada stack?

Sovereignty here covers four guarantees. First, data never leaves Canadian jurisdiction at any point in the prompt and completion path. Second, there is no third-party prompt logging because no third party sits in the path. Third, every prompt, every completion, and every tool call lands in a tamper-evident audit trail you control. Fourth, the evidence pack we ship is regulator-ready against ITSG-33 with a Protected B classification target and against the Treasury Board Directive on Service and Digital. SaaS LLM APIs fail at least one of these guarantees. The Box meets all four.
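
To make "tamper-evident" concrete, here is a minimal sketch of one common technique, a hash chain, in which each audit entry commits to the hash of the previous entry. This illustrates the general idea only and is not a description of the Box's actual audit tooling; the function names and the record fields are hypothetical.

```python
import hashlib
import json

# Placeholder "previous hash" for the first entry in the chain.
GENESIS_HASH = "0" * 64


def _digest(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, record: dict) -> dict:
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {"record": record, "prev_hash": prev_hash, "hash": _digest(prev_hash, record)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"kind": "prompt", "text": "example prompt"})
    append_entry(log, {"kind": "completion", "text": "example completion"})
    print(verify_chain(log))
```

Editing any stored record after the fact makes `verify_chain` return False, which is the property a reviewer relies on. A production store would also need to protect the latest hash itself, for example by anchoring it outside the system being audited.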

Can I run the Box on AWS Canada Central or does it have to be on-prem?

The Box ships four region options. AWS Canada Central suits buyers who want a Canadian-region cloud footprint without colocation. Toronto on-prem suits buyers who already have a Toronto datacentre or a colocation contract. Montreal on-prem suits Quebec buyers under provincial data-residency mandates, including Law 25. Hybrid suits buyers who run model training on-prem and run production inference in the Canadian cloud. The Box recipe lifts cleanly across all four. The choice happens on the kickoff call and is recorded in the architecture decision record.

Which open-weight LLM do I get with the Sovereign AI Box Canada?

The default model families are Llama 3.1 and Qwen 2.5. Llama 3.1 ships under Meta's Llama 3.1 community licence, including the usage and naming clauses that licence requires. Qwen 2.5 ships under Apache 2.0 with the more permissive redistribution terms. The operator picks the licence path on the kickoff call. We also support other open-weight families if the operator has already licensed a model from the Mistral, DeepSeek, or research-foundation pools. Custom model training is excluded from the Box engagement and runs through a separate scope.

What is the configurator for and why does it matter?

The configurator matches your workload to a specific hardware tier, region, LLM size, and agent count. A buyer running a 70B production chat workload at moderate concurrency lands on a dual-H100 tier with the Toronto on-prem region and four to ten agents. A buyer running 405B research workloads at heavy concurrency lands on an eight-H100 tier with NVLink and a single high-quality model. The configurator codifies that decision before procurement starts. The live configurator ships in milestone 2; until then, the options listed on this page lay out the choices buyers walk through on the kickoff call.
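
The shape of a configurator decision can be sketched as a small data structure. The live configurator has not shipped, so everything here is hypothetical: the class, the function names, and the lookup approach are illustrative. Only the agent-count bands and the two worked examples are taken from this page, and fields the page leaves unspecified are set to None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoxConfig:
    """One point in the choice surface: tier, region, LLM size, agent band."""
    hardware_tier: str
    region: Optional[str]      # None where the page does not specify a region
    llm_size: str
    agent_band: Optional[str]  # None where the page does not specify agents


def agent_band(agent_count: int) -> str:
    """Map an agent count onto the three bands listed under 'What you get'."""
    if agent_count < 1:
        raise ValueError("agent_count must be at least 1")
    if agent_count <= 3:
        return "1-3"
    if agent_count <= 10:
        return "4-10"
    return "11+"  # custom scope, priced on the kickoff call


# The two worked examples given in the FAQ answer, keyed by (LLM size, concurrency).
DOCUMENTED_EXAMPLES = {
    ("70B", "moderate"): BoxConfig("dual-H100", "Toronto on-prem", "70B", "4-10"),
    ("405B", "heavy"): BoxConfig("eight-H100 with NVLink", None, "405B", None),
}


def lookup_example(llm_size: str, concurrency: str) -> Optional[BoxConfig]:
    """Return a documented example, or None if the page gives no guidance."""
    return DOCUMENTED_EXAMPLES.get((llm_size, concurrency))


if __name__ == "__main__":
    print(lookup_example("70B", "moderate"))
    print(agent_band(7))
```

Any workload outside the two documented examples returns None here on purpose: the sketch does not guess at sizing rules the page does not state.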

What is the typical timeline from order to operational on a Sovereign AI Box Canada deployment?

Timeline runs eight to sixteen weeks from order to operational. Hardware lead time on the chosen NVIDIA H100 SKU drives the upper end of the range. The single-H100 tier typically lands at eight to ten weeks. The dual-H100 tier typically lands at ten to twelve weeks. The eight-H100 tier typically lands at twelve to sixteen weeks because the NVLink fabric ordering and the power-and-cooling validation steps both add lead time. We start agent runtime work and observability staging in parallel with hardware delivery so the operational date does not slip on installation alone.
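
For planning, the typical ranges above translate into a date window from the order date. The sketch below is illustrative only: the week ranges are the typical figures quoted on this page, the function and dictionary names are hypothetical, and actual dates depend on hardware lead time.

```python
from datetime import date, timedelta

# Typical order-to-operational ranges in weeks, per hardware tier (from this page).
TIER_WEEKS = {
    "single": (8, 10),
    "dual": (10, 12),
    "eight": (12, 16),
}


def operational_window(tier: str, order_date: date) -> tuple:
    """Return (earliest, latest) typical operational dates for a tier."""
    min_weeks, max_weeks = TIER_WEEKS[tier]
    return (
        order_date + timedelta(weeks=min_weeks),
        order_date + timedelta(weeks=max_weeks),
    )


if __name__ == "__main__":
    earliest, latest = operational_window("dual", date(2025, 1, 6))
    print(earliest.isoformat(), "to", latest.isoformat())
```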

What happens if I outgrow the chosen tier on the Sovereign AI Box Canada?

Three upgrade paths exist. First, you add a second Box in parallel and route workloads across both via the agent runtime. This is the cleanest path when the bottleneck is concurrency, not model capability. Second, you scale the tier in place by upgrading the GPU configuration, typically single H100 to dual H100 or dual H100 to eight-H100 with NVLink. This is the right path when the bottleneck is model size and you need 405B in production. Third, you move from a single-model deployment to a mixture deployment where the agent runtime routes prompts across multiple specialised models on one Box. Each upgrade path is documented in the handover runbook with cost and timeline estimates.
