Define how your cluster will be used month to month.
GPU utilisation: 70% (share of time GPUs are actively processing; industry average 60–75%)
Workload mix: Training 40% · Inference 60%
Training uses spot pricing (60% off on-demand); inference uses reserved pricing (30% off on-demand).
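The workload mix above can be read as a weighted average of two discounted prices. The sketch below is a minimal illustration that assumes two things: the mix splits GPU-hours, and both discounts are taken off a single on-demand list price. The function name and the $10/GPU-hr list price are hypothetical and do not come from this calculator.

```python
# Hypothetical sketch: blended hourly GPU price for a training/inference mix.
# Assumes the mix splits GPU-hours and both discounts apply to one list price.
SPOT_DISCOUNT = 0.60      # training on spot: 60% off on-demand
RESERVED_DISCOUNT = 0.30  # inference on reserved: 30% off on-demand


def blended_gpu_rate(on_demand_rate: float, training_share: float) -> float:
    """Return the weighted-average price per GPU-hour for the given mix."""
    if not 0.0 <= training_share <= 1.0:
        raise ValueError("training_share must be between 0 and 1")
    training_rate = on_demand_rate * (1 - SPOT_DISCOUNT)
    inference_rate = on_demand_rate * (1 - RESERVED_DISCOUNT)
    return training_share * training_rate + (1 - training_share) * inference_rate


# Hypothetical $10/GPU-hr list price with the 40% training / 60% inference mix.
print(f"${blended_gpu_rate(10.0, 0.40):.2f}/GPU-hr")
```

With these inputs the blended rate is $5.80/GPU-hr, which is 42% below the hypothetical list price.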
Number of model invocations or API calls per day
Presets: Short Q&A (100) · Chat turns (500) · Doc analysis (2,000) · Long context (5,000)
Operational Context
Team, storage, and compliance drive the hidden costs that most calculators miss.
ML engineering team: 2 engineers
Estimated at $150K/yr per engineer, fully loaded (on-prem carries a 30% infrastructure-management overhead)
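The staff line reduces to simple monthly arithmetic. This sketch uses the figures stated above: $150K/yr per engineer and a +30% on-prem overhead. The function name is hypothetical. The overhead is a parameter because the methodology section gives the on-prem staff rate as 1.5× the cloud rate.

```python
# Hypothetical sketch: monthly ML/infra staff cost, cloud vs on-prem.
ANNUAL_COST_PER_ENGINEER = 150_000  # fully loaded, per the assumption above


def monthly_staff_cost(engineers: int, on_prem: bool,
                       on_prem_overhead: float = 0.30) -> float:
    """Monthly staff cost; on-prem adds an infrastructure-management overhead."""
    base = engineers * ANNUAL_COST_PER_ENGINEER / 12
    return base * (1 + on_prem_overhead) if on_prem else base


print(f"cloud:   ${monthly_staff_cost(2, on_prem=False):,.0f}/month")
print(f"on-prem: ${monthly_staff_cost(2, on_prem=True):,.0f}/month")
```

For 2 engineers this gives $25,000/month on cloud and $32,500/month on-prem. Passing `on_prem_overhead=0.50` gives the 1.5× rate instead ($37,500/month).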
Hot storage: 50 TB (training datasets, active model weights, feature stores)
Cold storage: 200 TB (historical datasets, model checkpoints, audit logs)
Compliance tier:
Standard (general enterprise · no regulated data): baseline cost
Regulated (HIPAA · SOC 2 · PCI-DSS): +20% overhead
Government (FedRAMP · IL4/IL5 · air-gapped): +45% overhead
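The compliance tiers act as multipliers. The page does not say which cost base the overhead applies to, so this sketch assumes it scales the whole monthly bill. The tier keys and the function name are hypothetical.

```python
# Hypothetical sketch: compliance overhead as a multiplier on monthly cost.
# Assumption: the overhead applies to the whole monthly bill.
COMPLIANCE_OVERHEAD = {
    "standard": 0.00,    # general enterprise, no regulated data
    "regulated": 0.20,   # HIPAA, SOC 2, PCI-DSS
    "government": 0.45,  # FedRAMP, IL4/IL5, air-gapped
}


def apply_compliance(base_monthly_cost: float, tier: str) -> float:
    """Scale a monthly cost by the overhead for the given compliance tier."""
    try:
        overhead = COMPLIANCE_OVERHEAD[tier.lower()]
    except KeyError:
        raise ValueError(f"unknown compliance tier: {tier!r}") from None
    return base_monthly_cost * (1 + overhead)


for tier in COMPLIANCE_OVERHEAD:
    print(f"{tier:>10}: ${apply_compliance(100_000, tier):,.0f}")
```

On a hypothetical $100,000 monthly base, this gives $100,000, $120,000 and $145,000 for the three tiers.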
AI Cluster Cost Analysis
Based on your configuration
Scenario: 4 active weeks per month
Elasticity impact: cloud scales to zero during idle weeks, while on-prem pays regardless. The calculator reports the number of idle weeks per month, the cloud GPU cost during idle weeks ($0), the on-prem cost during idle weeks, and the resulting elasticity saving per month.
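The elasticity comparison can be sketched for a single month. This sketch makes three assumptions: a month is treated as 4 weeks, on-prem costs are fixed so each idle week carries a pro-rata share of them, and the "elasticity saving" is the on-prem idle cost minus the cloud idle cost. That last definition is one plausible reading of the figure. The function name and the $80,000 on-prem bill are hypothetical.

```python
# Hypothetical sketch of the elasticity comparison for one month.
WEEKS_PER_MONTH = 4  # assumption: a month is treated as 4 weeks


def elasticity_impact(active_weeks: int, onprem_monthly_cost: float) -> dict:
    """Compare idle-week costs: cloud scales to zero, on-prem keeps paying."""
    if not 0 <= active_weeks <= WEEKS_PER_MONTH:
        raise ValueError(f"active_weeks must be between 0 and {WEEKS_PER_MONTH}")
    idle_weeks = WEEKS_PER_MONTH - active_weeks
    cloud_idle_cost = 0.0  # cloud GPUs are released during idle weeks
    # On-prem costs are fixed, so idle weeks carry their pro-rata share.
    onprem_idle_cost = onprem_monthly_cost * idle_weeks / WEEKS_PER_MONTH
    return {
        "idle_weeks": idle_weeks,
        "cloud_idle_cost": cloud_idle_cost,
        "onprem_idle_cost": onprem_idle_cost,
        "elasticity_saving": onprem_idle_cost - cloud_idle_cost,
    }


print(elasticity_impact(active_weeks=3, onprem_monthly_cost=80_000))
```

With 3 active weeks, the one idle week costs on-prem $20,000, while cloud GPU spend for that week is $0.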
Monthly Cloud Cost (at current utilisation)
3-Year Cloud TCO (reserved + spot mix)
3-Year On-Prem TCO (incl. CapEx)
Break-Even Point (cloud → on-prem payback)
Cost / 1M Tokens (cloud inference, reserved)
Training Run Cost (100 GPU-hr fine-tune, spot)
Monthly GPU-Hours (at utilisation)
Hidden Cost Share (egress + support overhead)
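Three of the metrics above follow directly from the stated inputs. This minimal sketch makes two assumptions: a 730-hour month, and a cost per 1M tokens equal to the reserved hourly price divided by per-GPU token throughput. The GPU count, the list price and the throughput are hypothetical, since none of them appear in this section.

```python
# Hypothetical sketch of three headline metrics from the results panel.
HOURS_PER_MONTH = 730      # assumption: 8,760 hours per year / 12
SPOT_DISCOUNT = 0.60       # training on spot, per the workload-mix input
RESERVED_DISCOUNT = 0.30   # inference on reserved, per the workload-mix input


def monthly_gpu_hours(num_gpus: int, utilisation: float) -> float:
    """GPU-hours actually consumed in a month at the given utilisation."""
    return num_gpus * HOURS_PER_MONTH * utilisation


def training_run_cost(gpu_hours: float, on_demand_rate: float) -> float:
    """Cost of a fine-tuning run billed at the spot price."""
    return gpu_hours * on_demand_rate * (1 - SPOT_DISCOUNT)


def cost_per_million_tokens(on_demand_rate: float, tokens_per_second: float) -> float:
    """Reserved-price cost of generating 1M tokens on one GPU."""
    reserved_rate = on_demand_rate * (1 - RESERVED_DISCOUNT)
    tokens_per_hour = tokens_per_second * 3600
    return reserved_rate / tokens_per_hour * 1_000_000


# Hypothetical inputs: 64 GPUs, $10/GPU-hr list price, 1,000 tokens/s per GPU.
print(f"{monthly_gpu_hours(64, 0.70):,.0f} GPU-hours/month")
print(f"${training_run_cost(100, 10.0):,.2f} per 100 GPU-hr fine-tune")
print(f"${cost_per_million_tokens(10.0, 1_000):.2f} per 1M tokens")
```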
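⚠️ Hidden costs matter: egress fees and enterprise support add overhead beyond the raw compute bill. This model assumes that 10% of hot storage egresses each month at $0.09/GB, plus a platform support fee of 5% of compute spend. Trinfac's ITFM framework surfaces and allocates these costs transparently across business units.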
Does your current team have the internal expertise to manage physical GPU infrastructure at scale? H100 clusters require specialist skills in InfiniBand networking, liquid-cooling systems, and firmware lifecycle management — roles that are significantly harder and more expensive to hire for than general ML engineers, and often command 30–50% salary premiums.
If that capability does not exist in-house, you have two options. One is to budget for a third-party managed service provider (typically $15K–$40K/month for a cluster of this size). The other is to allow ramp-up time to build the capability before counting on the on-premise TCO advantage. The 3–6 month lead time for H100 hardware allocations is a further operational variable not reflected in this model.
Cost Assumptions & Methodology
Cloud costs include:
GPU compute at public list pricing (AWS · Azure · GCP)
Spot / preemptible discount of 60% applied to training workloads
Reserved instance discount of 30% applied to inference workloads
Managed object storage (hot at $80/TB/mo, cold at $23/TB/mo)
Data egress at $0.09/GB — 10% of hot storage assumed to egress monthly
Platform support fee at 5% of compute spend
ML engineering staff at $150K/yr fully loaded cost per engineer
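The cloud cost components listed above add up to a monthly bill. This sketch makes two assumptions: 1 TB = 1,000 GB for egress, and the hidden-cost share is egress plus support divided by the full monthly bill. The $100,000 compute figure and the function name are hypothetical.

```python
# Hypothetical sketch of the monthly cloud bill from the listed assumptions.
HOT_STORAGE_PER_TB = 80.0    # $/TB/month
COLD_STORAGE_PER_TB = 23.0   # $/TB/month
EGRESS_PER_GB = 0.09         # $/GB
EGRESS_FRACTION = 0.10       # share of hot storage egressed each month
GB_PER_TB = 1_000            # assumption: decimal terabytes
SUPPORT_FEE = 0.05           # platform support, as a share of compute spend
ANNUAL_COST_PER_ENGINEER = 150_000


def cloud_monthly_cost(compute: float, hot_tb: float, cold_tb: float,
                       engineers: int) -> dict:
    """Break a month of cloud spend into the components listed above."""
    costs = {
        "compute": compute,
        "hot_storage": hot_tb * HOT_STORAGE_PER_TB,
        "cold_storage": cold_tb * COLD_STORAGE_PER_TB,
        "egress": hot_tb * EGRESS_FRACTION * GB_PER_TB * EGRESS_PER_GB,
        "support": compute * SUPPORT_FEE,
        "staff": engineers * ANNUAL_COST_PER_ENGINEER / 12,
    }
    costs["total"] = sum(costs.values())
    # Hidden cost share: egress + support as a fraction of the total bill.
    costs["hidden_share"] = (costs["egress"] + costs["support"]) / costs["total"]
    return costs


# Hypothetical $100,000/month of GPU compute, 50 TB hot, 200 TB cold, 2 engineers.
c = cloud_monthly_cost(compute=100_000, hot_tb=50, cold_tb=200, engineers=2)
print(f"total ${c['total']:,.0f}/month, hidden share {c['hidden_share']:.1%}")
```

With these inputs, egress is $450/month and support is $5,000/month, on a total of $139,050/month.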
On-prem costs include (monthly equivalent):
GPU + server + networking hardware amortized over 36 months
InfiniBand networking ($10K/node for H100/A100, $3K/node for other GPUs) amortized over 36 months
Power at $0.10/kWh with 1.4× PUE data-centre overhead
Colocation at $1,500/rack per month, assuming 2 nodes per rack
Internet bandwidth at $1,500/month flat
Hardware maintenance at 10% of hardware cost per year
NVIDIA Enterprise software licenses at $2,500/GPU/year
Risk / redundancy premium at 1.5% of hardware CapEx per year
Infrastructure staff at 1.5× the cloud rate, including hardware-operations overhead
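The on-prem components listed above sum to a monthly figure. This minimal sketch makes three assumptions: a 730-hour month, maintenance and the risk premium charged on CapEx including InfiniBand, and power billed at full draw regardless of utilisation. The node count, per-node hardware price and power draw are hypothetical.

```python
import math

# Hypothetical sketch of the monthly on-prem cost from the listed assumptions.
AMORTIZATION_MONTHS = 36
HOURS_PER_MONTH = 730          # assumption: 8,760 hours per year / 12
POWER_PRICE_PER_KWH = 0.10
PUE = 1.4
COLO_PER_RACK = 1_500          # $/rack/month
NODES_PER_RACK = 2
BANDWIDTH = 1_500              # $/month flat
MAINTENANCE_RATE = 0.10        # of hardware CapEx per year
LICENSE_PER_GPU_YEAR = 2_500   # NVIDIA Enterprise software
RISK_RATE = 0.015              # of hardware CapEx per year
STAFF_MULTIPLIER = 1.5         # on-prem staff rate vs the cloud rate
ANNUAL_COST_PER_ENGINEER = 150_000


def onprem_monthly_cost(nodes: int, gpus_per_node: int, hw_cost_per_node: float,
                        kw_per_node: float, gpu_model: str, engineers: int) -> dict:
    """Break a month of on-prem spend into the components listed above."""
    infiniband_per_node = 10_000 if gpu_model in ("H100", "A100") else 3_000
    # Assumption: maintenance and risk premium apply to CapEx incl. InfiniBand.
    capex = nodes * (hw_cost_per_node + infiniband_per_node)
    return {
        "amortized_capex": capex / AMORTIZATION_MONTHS,
        # Power is charged at full draw regardless of utilisation.
        "power": nodes * kw_per_node * HOURS_PER_MONTH * POWER_PRICE_PER_KWH * PUE,
        "colocation": math.ceil(nodes / NODES_PER_RACK) * COLO_PER_RACK,
        "bandwidth": BANDWIDTH,
        "maintenance": capex * MAINTENANCE_RATE / 12,
        "licenses": nodes * gpus_per_node * LICENSE_PER_GPU_YEAR / 12,
        "risk_premium": capex * RISK_RATE / 12,
        "staff": engineers * ANNUAL_COST_PER_ENGINEER / 12 * STAFF_MULTIPLIER,
    }


# Hypothetical cluster: 8 H100 nodes x 8 GPUs, $300K and 10 kW per node.
costs = onprem_monthly_cost(8, 8, 300_000, 10.0, "H100", engineers=2)
print(f"total ${sum(costs.values()):,.0f}/month")
```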
Disclaimers: The Enterprise Cloud column reflects a ~40% committed-use discount on compute and ~25% on storage; actual enterprise agreements vary by vendor and volume. On-premise CapEx includes InfiniBand networking and facility readiness (power delivery and cooling), both essential for production H100 clusters. Typical GPU hardware lifetime is 4–5 years; 36-month amortization is used for conservatism. Power costs vary by region ($0.04–$0.20/kWh). Idle on-prem GPUs are charged full power, colocation, and risk-premium costs regardless of utilisation. H100 hardware allocations typically carry 3–6 month lead times, but the model assumes Day 0 availability. One-time migration costs and MLOps pipeline development labour are not included. All figures are planning estimates only; obtain vendor quotes before committing capital.
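Two of the ranges in the disclaimers, power price and amortization period, can move the monthly on-prem figure noticeably. This sensitivity sketch assumes a 730-hour month and the 1.4 PUE stated above. The 80 kW IT load, the $2.48M CapEx and the function names are hypothetical.

```python
# Hypothetical sensitivity sketch for two ranges named in the disclaimers.
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12
PUE = 1.4


def monthly_power_cost(it_load_kw: float, price_per_kwh: float) -> float:
    """Facility power bill for an IT load running around the clock."""
    return it_load_kw * HOURS_PER_MONTH * PUE * price_per_kwh


def monthly_amortization(capex: float, months: int) -> float:
    """Straight-line CapEx charge per month."""
    return capex / months


# Hypothetical 80 kW IT load and $2.48M of hardware CapEx.
for price in (0.04, 0.10, 0.20):
    print(f"power at ${price:.2f}/kWh: ${monthly_power_cost(80, price):,.0f}/month")
for months in (36, 48, 60):
    print(f"CapEx over {months} months: ${monthly_amortization(2_480_000, months):,.0f}/month")
```

Stretching amortization from 36 to 60 months cuts the monthly CapEx charge by 40% (1 − 36/60). This is why the 36-month choice is the conservative one.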
5-Year Cost Projection
Columns: Year · Cloud (Annual) · Cloud (Cumulative) · On-Prem (Annual OpEx) · On-Prem (Cumulative incl. CapEx) · Difference
Cloud costs apply a 3% annual reduction (market pricing trend). On-prem OpEx applies 3% annual inflation (staff, maintenance). Year 1 on-prem includes full hardware CapEx.
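The projection rules above translate into a short loop. This sketch takes the year-1 cloud and on-prem OpEx figures as inputs and treats "Difference" as cloud cumulative minus on-prem cumulative. The input values and function names are hypothetical. Break-even is reported at year granularity here, which is coarser than a monthly payback figure.

```python
# Hypothetical sketch of the 5-year projection and break-even year.
CLOUD_ANNUAL_CHANGE = -0.03   # cloud prices fall 3% per year
ONPREM_ANNUAL_CHANGE = 0.03   # on-prem OpEx inflates 3% per year


def five_year_projection(cloud_year1: float, onprem_opex_year1: float,
                         capex: float, years: int = 5) -> list:
    """Rows of (year, cloud, cloud_cum, onprem_opex, onprem_cum, difference)."""
    rows = []
    cloud_cum = onprem_cum = 0.0
    for year in range(1, years + 1):
        cloud = cloud_year1 * (1 + CLOUD_ANNUAL_CHANGE) ** (year - 1)
        onprem_opex = onprem_opex_year1 * (1 + ONPREM_ANNUAL_CHANGE) ** (year - 1)
        cloud_cum += cloud
        # Full hardware CapEx lands in year 1.
        onprem_cum += onprem_opex + (capex if year == 1 else 0.0)
        # Assumption: a positive difference means on-prem is cheaper so far.
        rows.append((year, cloud, cloud_cum, onprem_opex, onprem_cum,
                     cloud_cum - onprem_cum))
    return rows


def break_even_year(rows: list):
    """First year in which cumulative on-prem cost is at or below cloud, else None."""
    for year, *_, difference in rows:
        if difference >= 0:
            return year
    return None


rows = five_year_projection(2_000_000, 1_000_000, 2_500_000)
for year, cloud, cloud_cum, opex, onprem_cum, diff in rows:
    print(f"Y{year}: cloud {cloud_cum:>12,.0f}  on-prem {onprem_cum:>12,.0f}  diff {diff:>+12,.0f}")
print("break-even year:", break_even_year(rows))
```

With these hypothetical inputs, on-prem is still behind at the end of year 2 and ahead by year 3.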
Top Cost Optimisation Levers