1 Infrastructure · 2 Workload · 3 Operations · 4 Results

Infrastructure Setup

Define your deployment model and compute tier.

☁️ Cloud
AWS · Azure · GCP · Multi-cloud
🖥️ On-Premise
Own data centre or colocation
⚡ Hybrid
Cloud bursting + on-prem base
NVIDIA H100 80GB
Latest gen · Best for large LLMs & training
$32/hr cloud · $35K/card on-prem
NVIDIA A100 80GB
High performance · Training & inference
$22/hr cloud · $18K/card on-prem
NVIDIA L40S 48GB
Best value for inference workloads
$13/hr cloud · $10K/card on-prem
NVIDIA V100 32GB
Legacy · Cost-effective smaller models
$3.50/hr cloud · $4.5K/card on-prem

4 nodes · 32 GPUs total

Workload Profile

Define how your cluster will be used month to month.

70%

% of time GPUs are actively processing (industry avg 60–75%)

Training: 40% Inference: 60%
40%

Training uses spot pricing (60% off on-demand); inference uses reserved pricing (30% off on-demand)
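As a rough illustration of how the utilisation and training/inference sliders could feed the compute bill, the sketch below blends the spot and reserved discounts into a single hourly rate. The function names and the 730-hour month are assumptions made for this example, and the quoted cloud price is treated as a per-GPU hourly rate; the calculator's actual internals are not shown on this page.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12 months


def blended_gpu_rate(on_demand_rate, training_share,
                     spot_discount=0.60, reserved_discount=0.30):
    """Blend spot-priced training and reserved-priced inference into one $/GPU-hr rate."""
    inference_share = 1.0 - training_share
    training_rate = on_demand_rate * (1.0 - spot_discount)
    inference_rate = on_demand_rate * (1.0 - reserved_discount)
    return training_share * training_rate + inference_share * inference_rate


def monthly_gpu_hours(gpu_count, utilisation, hours_per_month=HOURS_PER_MONTH):
    """GPU-hours actually billed in a month at the given utilisation (0..1)."""
    return gpu_count * hours_per_month * utilisation


def monthly_cloud_compute(gpu_count, utilisation, on_demand_rate, training_share):
    """Monthly cloud compute spend before storage, egress, and support."""
    hours = monthly_gpu_hours(gpu_count, utilisation)
    return hours * blended_gpu_rate(on_demand_rate, training_share)


if __name__ == "__main__":
    # Defaults shown on the page: 32 x H100 at $32/hr, 70% utilisation, 40% training.
    print(round(blended_gpu_rate(32.0, 0.40), 2))
    print(round(monthly_gpu_hours(32, 0.70)))
    print(round(monthly_cloud_compute(32, 0.70, 32.0, 0.40), 2))
```

With the page defaults this works out to about $18.56 per GPU-hour over roughly 16,352 GPU-hours, or a little over $303K per month in compute.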

Number of model invocations or API calls per day

100
Short Q&A
500
Chat turns
2,000
Doc analysis
5,000
Long context

Operational Context

Team, storage, and compliance drive the hidden costs that most calculators miss.

2 engineers

Estimated at $150K/yr per engineer (on-prem carries +30% infra management overhead)

50 TB

Training datasets, active model weights, feature stores

200 TB

Historical datasets, model checkpoints, audit logs

Standard
General enterprise · No regulated data
Baseline cost
Regulated
HIPAA · SOC 2 · PCI-DSS
+20% overhead
Government
FedRAMP · IL4/IL5 · Air-gapped
+45% overhead
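One plausible way to apply the three compliance tiers is as a multiplier on a base monthly cost. The sketch below assumes exactly that; which cost lines the overhead actually applies to is not stated here, so `base_monthly_cost` and the function name are hypothetical.

```python
# Overhead percentages taken from the three compliance cards above.
COMPLIANCE_OVERHEAD = {
    "standard": 0.00,    # baseline cost
    "regulated": 0.20,   # HIPAA, SOC 2, PCI-DSS
    "government": 0.45,  # FedRAMP, IL4/IL5, air-gapped
}


def apply_compliance(base_monthly_cost, tier):
    """Return the base cost uplifted by the tier's overhead.

    Assumption: the overhead is a flat multiplier on whatever base the
    caller passes in; the page does not say which line items it covers.
    """
    try:
        overhead = COMPLIANCE_OVERHEAD[tier.lower()]
    except KeyError:
        raise ValueError(f"unknown compliance tier: {tier!r}") from None
    return base_monthly_cost * (1.0 + overhead)


if __name__ == "__main__":
    for tier in COMPLIANCE_OVERHEAD:
        print(tier, round(apply_compliance(100_000, tier), 2))
```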

TCO Assessment Complete

AI Cluster Cost Analysis

Based on your configuration

Scenario
Elasticity Impact — Cloud scales to zero during idle weeks; on-prem pays regardless
idle weeks / month
$0
cloud GPU cost during idle
on-prem cost during idle
elasticity saving / month
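The elasticity scenario can be approximated by prorating the on-prem monthly bill over the idle weeks, since on-prem pays regardless while cloud GPU cost drops to $0. This is a minimal sketch under that assumption; the proration by weeks and the names used are illustrative, not the page's confirmed method.

```python
WEEKS_PER_MONTH = 52 / 12  # assumption: average weeks in a month


def elasticity_saving(on_prem_monthly_cost, idle_weeks, cloud_idle_cost=0.0):
    """Monthly saving from cloud scaling to zero while on-prem keeps billing.

    Assumption: on-prem cost accrues evenly through the month, so the idle
    share is simply idle_weeks / WEEKS_PER_MONTH of the monthly bill.
    """
    if not 0 <= idle_weeks <= WEEKS_PER_MONTH:
        raise ValueError("idle_weeks must be between 0 and about 4.33")
    on_prem_idle_cost = on_prem_monthly_cost * idle_weeks / WEEKS_PER_MONTH
    return on_prem_idle_cost - cloud_idle_cost


if __name__ == "__main__":
    # Hypothetical on-prem bill of $130K/month with one idle week.
    print(round(elasticity_saving(130_000, 1), 2))
```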

Monthly Cloud Cost
at current utilisation
3-Year Cloud TCO
reserved + spot mix
3-Year On-Prem TCO
incl. CapEx
Break-Even Point
cloud → on-prem payback
Cost / 1M Tokens
cloud inference (reserved)
Training Run Cost
100 GPU-hr fine-tune (spot)
Monthly GPU-Hours
at utilisation
Hidden Cost Share
egress + support overhead
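Two of the tiles above follow directly from figures quoted elsewhere on this page: the 100 GPU-hour fine-tune at spot pricing, and the share of the bill taken by egress and support. The sketch below shows one way to compute them, with hypothetical function names. Cost per 1M tokens is left out because it depends on a throughput figure the page does not state.

```python
def training_run_cost(on_demand_rate, gpu_hours=100, spot_discount=0.60):
    """Cost of a fine-tune billed at spot pricing (60% off on-demand by default)."""
    return gpu_hours * on_demand_rate * (1.0 - spot_discount)


def hidden_cost_share(egress, support, total_monthly_cost):
    """Fraction of the monthly bill made up of egress fees and support fees."""
    if total_monthly_cost <= 0:
        raise ValueError("total_monthly_cost must be positive")
    return (egress + support) / total_monthly_cost


if __name__ == "__main__":
    # H100 at $32/hr on-demand, as listed in the GPU cards.
    print(round(training_run_cost(32.0), 2))
    # Illustrative inputs only: $450 egress and $15,000 support on a $350,000 bill.
    print(round(hidden_cost_share(450, 15_000, 350_000), 4))
```

At the H100 list price, the 100 GPU-hour run comes to about $1,280.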
⚠️ Hidden costs matter: egress fees and enterprise support add significant overhead beyond the raw compute bill. Trinfac's ITFM framework surfaces and allocates these costs transparently across business units.
Cloud Cost Breakdown (Monthly)
3-Year TCO: Cloud vs On-Premise by Category
Detailed Cost Comparison
Cost Category | Cloud (List Price) | Cloud (Enterprise, ~40% committed-use discount) | On-Prem (monthly equivalent)
Key Decision Question: On-Prem Operational Readiness

Does your current team have the internal expertise to manage physical GPU infrastructure at scale? H100 clusters require specialist skills in InfiniBand networking, liquid-cooling systems, and firmware lifecycle management. Engineers with these skills are significantly harder and more expensive to hire than general ML engineers, and often command 30–50% salary premiums.

If that capability does not exist in-house, factor in either a third-party managed service provider (typically $15K–$40K/month for a cluster of this size) or the ramp-up time to build it before the on-premise TCO advantage becomes real. The 3–6 month hardware lead time for H100 allocations is an additional operational variable not reflected in this model.

Cost Assumptions & Methodology
Cloud costs include:
  • GPU compute at public list pricing (AWS · Azure · GCP)
  • Spot / preemptible discount of 60% applied to training workloads
  • Reserved instance discount of 30% applied to inference workloads
  • Managed object storage (hot at $80/TB/mo, cold at $23/TB/mo)
  • Data egress at $0.09/GB — 10% of hot storage assumed to egress monthly
  • Platform support fee at 5% of compute spend
  • ML engineering staff at $150K/yr fully-loaded cost
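The non-compute cloud items in the list above reduce to a few multiplications. The following sketch assumes decimal units (1 TB = 1,000 GB) and takes the compute spend as an input; the function name and return layout are chosen for this example only.

```python
def cloud_monthly_overheads(hot_tb, cold_tb, compute_spend, engineers):
    """Monthly cloud costs other than GPU compute, using the listed rates."""
    hot_storage = hot_tb * 80.0           # $80/TB/month
    cold_storage = cold_tb * 23.0         # $23/TB/month
    egress_gb = hot_tb * 0.10 * 1000.0    # 10% of hot storage; assumes 1 TB = 1,000 GB
    egress = egress_gb * 0.09             # $0.09/GB
    support = compute_spend * 0.05        # 5% of compute spend
    staff = engineers * 150_000 / 12      # $150K/yr fully loaded, per month
    return {
        "hot_storage": hot_storage,
        "cold_storage": cold_storage,
        "egress": egress,
        "support": support,
        "staff": staff,
    }


if __name__ == "__main__":
    # Page defaults: 50 TB hot, 200 TB cold, 2 engineers; compute spend is illustrative.
    for item, cost in cloud_monthly_overheads(50, 200, 300_000, 2).items():
        print(f"{item:>12}: ${cost:,.2f}")
```

With the page defaults, storage comes to $4,000 (hot) plus $4,600 (cold) and egress to about $450 per month, so support (5% of compute) is by far the larger of the two hidden items.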
On-prem costs include (monthly equivalent):
  • GPU + server + networking hardware amortized over 36 months
  • InfiniBand networking ($10K/node for H100/A100, $3K/node for others) amortized over 36 months
  • Facility readiness (power delivery & cooling, $12K/node for H100) amortized over 36 months
  • Storage hardware amortized over 36 months
  • Power at $0.10/kWh with 1.4× PUE data-centre overhead
  • Colocation at $1,500/rack — 2 nodes per rack assumed
  • Internet bandwidth at $1,500/month flat
  • Hardware maintenance at 10% of hardware cost per year
  • NVIDIA Enterprise software licenses at $2,500/GPU/year
  • Risk / redundancy premium at 1.5% of hardware CapEx per year
  • Infrastructure staff at 1.5× cloud rate — hardware-ops overhead included
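To show how these line items might combine into a monthly equivalent, here is a minimal sketch for an H100 cluster. Several inputs are assumptions not given on this page: the per-GPU power draw (`gpu_kw`), the 730-hour month, and the choice to base maintenance and the risk premium on total CapEx. Server chassis and storage hardware prices are not listed either, so only card, InfiniBand, and facility costs enter CapEx unless `storage_hw` is supplied.

```python
import math

HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12 months


def on_prem_monthly(gpu_count, nodes, card_price, engineers,
                    gpu_kw=0.7,               # assumption: per-GPU draw in kW, not stated on the page
                    infiniband_per_node=10_000,
                    facility_per_node=12_000,
                    storage_hw=0.0,           # no storage hardware price is given on the page
                    staff_multiplier=1.5):    # methodology list: 1.5x the cloud staff rate
    """Return (capex, monthly cost breakdown) using the listed assumptions."""
    capex = (gpu_count * card_price
             + nodes * (infiniband_per_node + facility_per_node)
             + storage_hw)
    racks = math.ceil(nodes / 2)              # 2 nodes per rack
    costs = {
        "amortised_capex": capex / 36,        # 36-month amortization
        # Idle GPUs are billed at full power, so no utilisation factor here.
        "power": gpu_count * gpu_kw * HOURS_PER_MONTH * 1.4 * 0.10,
        "colocation": racks * 1_500,
        "bandwidth": 1_500,
        "maintenance": capex * 0.10 / 12,     # 10% of hardware cost per year
        "licences": gpu_count * 2_500 / 12,   # $2,500/GPU/year
        "risk_premium": capex * 0.015 / 12,   # 1.5% of CapEx per year
        "staff": engineers * 150_000 / 12 * staff_multiplier,
    }
    costs["total"] = sum(costs.values())
    return capex, costs


if __name__ == "__main__":
    capex, costs = on_prem_monthly(gpu_count=32, nodes=4, card_price=35_000, engineers=2)
    print(f"CapEx: ${capex:,.0f}")
    for item, cost in costs.items():
        print(f"{item:>16}: ${cost:,.2f}")
```

The staff multiplier is left as a parameter because the Operational Context step quotes a +30% on-prem overhead while this list quotes 1.5×.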
Disclaimers:
  • The Enterprise Cloud column reflects a ~40% committed-use discount on compute and ~25% on storage; actual enterprise agreements vary by vendor and volume.
  • On-premise CapEx includes InfiniBand networking and facility readiness (power delivery & cooling), which are essential for production H100 clusters.
  • Actual GPU hardware lifetime is 4–5 years; 36-month amortization is used for conservatism.
  • Power costs vary by region ($0.04–$0.20/kWh).
  • Idle on-prem GPUs incur full power, colocation, and risk-premium charges regardless of utilisation.
  • H100 hardware allocations typically carry 3–6 month lead times; the model assumes Day 0 availability.
  • One-time migration costs and MLOps pipeline development labour are not included.
  • All figures are planning estimates only; obtain vendor quotes before committing capital.
5-Year Cost Projection
Year | Cloud (Annual) | Cloud (Cumulative) | On-Prem (Annual OpEx) | On-Prem (Cumulative incl. CapEx) | Difference

Cloud costs apply a 3% annual reduction (market pricing trend). On-prem OpEx applies 3% annual inflation (staff, maintenance). Year 1 on-prem includes full hardware CapEx.
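The projection rule above can be sketched as a simple loop. This version assumes Year 1 is billed at the base annual figures, with the 3% cloud reduction and 3% on-prem inflation compounding from Year 2 onward; that timing, and the function names, are assumptions for illustration.

```python
def five_year_projection(cloud_annual, on_prem_annual_opex, on_prem_capex,
                         years=5, cloud_trend=-0.03, opex_inflation=0.03):
    """Build year-by-year rows matching the projection table's columns."""
    rows = []
    cloud_cum = 0.0
    on_prem_cum = float(on_prem_capex)  # Year 1 carries the full hardware CapEx
    for year in range(1, years + 1):
        cloud_year = cloud_annual * (1 + cloud_trend) ** (year - 1)
        opex_year = on_prem_annual_opex * (1 + opex_inflation) ** (year - 1)
        cloud_cum += cloud_year
        on_prem_cum += opex_year
        rows.append({
            "year": year,
            "cloud_annual": cloud_year,
            "cloud_cumulative": cloud_cum,
            "on_prem_opex": opex_year,
            "on_prem_cumulative": on_prem_cum,
            "difference": cloud_cum - on_prem_cum,  # positive: on-prem is cheaper so far
        })
    return rows


def break_even_year(rows):
    """First year in which cumulative on-prem cost is at or below cumulative cloud cost."""
    for row in rows:
        if row["difference"] >= 0:
            return row["year"]
    return None  # no break-even within the projected horizon


if __name__ == "__main__":
    # Illustrative inputs only, not results from this page.
    rows = five_year_projection(cloud_annual=1_000, on_prem_annual_opex=500, on_prem_capex=1_000)
    for row in rows:
        print(row["year"], round(row["cloud_cumulative"], 2), round(row["on_prem_cumulative"], 2))
    print("break-even year:", break_even_year(rows))
```

Returning `None` when on-prem never catches up keeps the "no break-even" case explicit instead of reporting a misleading year.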

Top Cost Optimisation Levers

Download Your TCO Report

Save This Analysis as a PDF

Enter your email to notify our team, then download this full TCO analysis — including all projections and optimisation levers — directly to your device.

✓ Thanks. Our team will follow up with you shortly.