

NVIDIA Ada Lovelace & Blackwell for AI – Practical GPU Selection

In projects based on large language models (LLMs), the key factor is not theoretical GPU compute power but predictable generation throughput and stability in specific scenarios. This article shows how to select NVIDIA GPUs for AI based on TPS metrics, model size, and deployment scale – from simple chatbots to enterprise environments.

Hardware Context

NVIDIA RTX PRO 6000 – a workstation/enterprise GPU designed for AI workloads, continuous operation, and multi-session environments.

In the following sections, we analyze the Ada Lovelace and Blackwell architectures not through marketing benchmarks but through real inference scenarios: number of users, model size, and target TPS.

1) TPS (tokens/s): Practical LLM Throughput Metric

In production, theoretical metrics don’t directly translate to user experience. For LLMs, the simplest and most understandable metric is TPS.

Level                 | TPS     | TPM              | Typical Effect
Limited fluidity      | 5 TPS   | 300 tokens/min   | noticeable generation delay
Comfortable operation | 20 TPS  | 1,200 tokens/min | stable generation for most use cases
High throughput       | 100 TPS | 6,000 tokens/min | capable of handling more sessions

Methodological note: TPS depends on model, quantization, context length, inference engine, and multi-session profile.
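
As a quick illustration of how these numbers are obtained, here is a minimal TPS/TPM measurement sketch, assuming a generic streaming inference client (fake_stream below is a hypothetical stand-in, not a real API):

```python
import time

def fake_stream(prompt: str):
    """Stand-in for a real streaming inference client (hypothetical)."""
    for token in ("TPS", "is", "tokens", "per", "second"):
        time.sleep(0.05)  # simulate per-token generation latency
        yield token

def measure_tps(generate_stream, prompt: str) -> dict:
    """Measure TPS and TPM for one generation from any token stream."""
    start = time.perf_counter()
    n_tokens = 0
    for _ in generate_stream(prompt):
        n_tokens += 1
    elapsed = time.perf_counter() - start
    tps = n_tokens / elapsed
    return {"tokens": n_tokens, "tps": round(tps, 1), "tpm": round(tps * 60)}

print(measure_tps(fake_stream, "example prompt"))  # ~20 TPS at 0.05 s/token
```

In practice, repeat the measurement across several prompts and concurrent sessions, since single-run TPS rarely reflects multi-session behavior.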

2) 7B / 13B / 70B – What Model Size Means

7B/13B/70B indicate the number of model parameters: 1B = 1 billion parameters. More parameters usually improve response quality and reasoning ability but increase VRAM and GPU throughput requirements.

Class       | Parameters | Typical Use Cases                            | Target TPS
7–8B models | 7–8B       | chatbots, RAG, Q&A, summarization            | 50–100+ TPS
13B models  | 13B        | enterprise AI, documents, longer responses   | 40–70 TPS
70B models  | 70B        | advanced analytics, AI agents, expert tasks  | 15–25 TPS
Practical note: a bigger model does not always yield better business results. Often, 13B with stable TPS provides better utility than 70B with low throughput or high latency.
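
To see why parameter count drives hardware requirements, a back-of-the-envelope VRAM sketch: weights take roughly parameters × bytes per parameter, plus runtime overhead (the 1.2 overhead factor below is an illustrative assumption; real overhead depends on context length and batch size):

```python
def estimate_weight_vram_gb(params_billion: float, bytes_per_param: float,
                            overhead: float = 1.2) -> float:
    """Rough VRAM estimate for model weights plus runtime buffers.

    bytes_per_param: 2.0 for FP16/BF16, 1.0 for INT8/FP8, 0.5 for 4-bit.
    overhead: illustrative multiplier for KV cache and activations.
    """
    return params_billion * bytes_per_param * overhead

print(estimate_weight_vram_gb(7, 2.0))   # 7B at FP16  -> ~16.8 GB
print(estimate_weight_vram_gb(70, 0.5))  # 70B at 4-bit -> ~42 GB
```

This is why 70B models are typically run quantized (INT8/4-bit) even on high-VRAM cards, as reflected in the comparison table in section 5.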

3) Reference to ChatGPT – Model Scale

For comparison: GPT-3 had ~175B parameters. For GPT-4 and newer, OpenAI does not disclose exact parameter counts; estimates vary. Practically, ChatGPT-class services operate at hyperscale and are optimized for parallelism and multi-GPU usage.

Level         | Parameters  | Infrastructure Implication
7–13B         | 7–13B       | usually sufficient for enterprise deployments (RAG/chatbots)
70B           | 70B         | requires a powerful GPU and careful context/quantization selection
GPT-3         | ~175B       | cloud scale; not intended for a single GPU
GPT-4 / newer | undisclosed | hyperscale + optimizations; a 1:1 on-prem comparison is inadequate

4) Mapping Requirements: Scenario → Model → Target TPS

Scenario A

Chatbot / RAG for a department or app

  • Model: 7-8B
  • Goal: stable generation, low latency
  • Target: 50-100+ TPS (single session)
Scenario B

Enterprise AI (complex responses, documents)

  • Model: 13B
  • Goal: better response quality with predictable TPS
  • Target: 40-70 TPS
Scenario C

Advanced analytics and expert tasks

  • Model: 70B
  • Goal: quality and reasoning; trade-off between cost and throughput
  • Target: 15-25 TPS
Scenario D

Enterprise: parallelism + long context

  • Model: 70B+ or multi-session
  • Goal: stable TPS under load, long context (e.g., 32k)
  • Target: 30+ TPS per model + margin for parallelism
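
These targets translate into a capacity requirement by multiplying the per-session target by expected concurrency and adding headroom; a minimal sketch (the 30% headroom is an assumed example, not a fixed rule):

```python
def required_aggregate_tps(per_session_tps: float, concurrent_sessions: int,
                           headroom: float = 0.3) -> float:
    """Aggregate TPS the GPU must sustain so each session keeps its target."""
    return per_session_tps * concurrent_sessions * (1 + headroom)

# Scenario B style: 13B model, 5 concurrent sessions at 20 TPS each
print(required_aggregate_tps(20, 5))  # 130.0 -> compare with per-GPU ranges below
```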

5) TPS Comparison: RTX 6000 Ada vs RTX PRO 6000 Blackwell

Approximate TPS ranges for typical inference scenarios. Values are for preliminary sizing and GPU class selection.

Scenario               | RTX 6000 Ada                   | RTX PRO 6000 Blackwell            | Interpretation
LLM 7–8B (FP16/FP8)    | 90–120 TPS (≈ 5,400–7,200 TPM) | 180–220 TPS (≈ 10,800–13,200 TPM) | higher throughput and more margin for parallelism
LLM 13B (FP16/FP8)     | 45–65 TPS (≈ 2,700–3,900 TPM)  | 95–120 TPS (≈ 5,700–7,200 TPM)    | stable handling of enterprise workloads, more headroom
LLM 70B (INT8 / 4-bit) | 15–20 TPS (≈ 900–1,200 TPM)    | 30–40 TPS (≈ 1,800–2,400 TPM)     | Blackwell limits TPS drops under heavier load
Long context (32k)     | 8–12 TPS (≈ 480–720 TPM)       | 18–25 TPS (≈ 1,080–1,500 TPM)     | critical for large document analysis (law/finance)
Architecture difference in practice: Ada Lovelace is cost-optimal for many inference deployments, while Blackwell justifies the cost when higher parallelism, longer context, and stable TPS under load are required.
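
Reading the table in the other direction, dividing a card's aggregate TPS by a per-session comfort target gives a rough concurrency ceiling (a sketch under the same assumptions as the table above):

```python
def sessions_supported(gpu_aggregate_tps: float, per_session_tps: float = 20) -> int:
    """Rough number of concurrent sessions a GPU class can keep 'comfortable'."""
    return int(gpu_aggregate_tps // per_session_tps)

# 13B model at 20 TPS per session: Ada vs Blackwell, low end of each range
print(sessions_supported(45), sessions_supported(95))  # 2 vs 4 sessions
```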

Why GeForce is not compared to RTX / RTX PRO in production AI

In AI deployments, the question often arises: “why pay more for professional cards when consumer cards are cheaper?” This is based on the incorrect assumption that they are interchangeable. In reality, consumer and professional cards solve different problems.

RTX / RTX PRO cards are designed for continuous operation, predictable workloads, and production environments where stable TPS, multi-session capability, and running larger models with longer context without compromise matter. These parameters determine solution usability in AI.

  • VRAM and model scale: larger models (13B/70B), long context, and multi-session expose consumer card limitations quickly.
  • 24/7 operation: inference loads are continuous; production stability and predictability matter more than peak performance.
  • Enterprise features: GPU virtualization, optimized drivers, profiles, and multi-user scenarios are the foundation of service deployments.
  • Scaling without degradation: increasing users, context, or query complexity requires maintaining stable TPS.

Consumer cards are not a cheaper alternative for production AI – they are for a different use profile. GPU selection should be based on model, target TPS, and SLA requirements. Hence, production environments naturally use RTX 6000 (Ada Lovelace) and RTX PRO 6000 (Blackwell).

Note: applies to production and multi-session projects. GPU selection should always consider target workload, traffic profile, and SLA requirements.

6) Recommendations by Purpose

Recommendation: Ada Lovelace (RTX 6000 Ada) – when cost/TPS matters

  • Workload: chatbots, RAG, enterprise AI on 7–13B, and 70B at limited scale.
  • Priority: high cost efficiency, predictable TPS in standard scenarios.

Recommendation: Blackwell (RTX PRO 6000 Blackwell) – when scale and SLA matter

  • Workload: 70B+ in multi-session environment, long context, enterprise requirements.
  • Priority: higher throughput, stable under load, margin for parallelism.
Conclusion: The most expensive card is justified in projects requiring parallelism, long context, and stable TPS in production.
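
The decision logic above can be condensed into a rule-of-thumb helper; the thresholds below are illustrative assumptions drawn from this article, not vendor guidance:

```python
def recommend_gpu(model_params_b: float, context_tokens: int,
                  concurrent_sessions: int) -> str:
    """Rule-of-thumb Ada vs Blackwell pick based on this article's criteria."""
    if model_params_b >= 70 and concurrent_sessions > 1:
        return "RTX PRO 6000 Blackwell"  # 70B+ in a multi-session environment
    if context_tokens >= 32_000 or concurrent_sessions > 4:
        return "RTX PRO 6000 Blackwell"  # long context / heavy parallelism
    return "RTX 6000 Ada"                # cost-optimal for 7-13B standard loads

print(recommend_gpu(13, 8_000, 3))   # RTX 6000 Ada
print(recommend_gpu(70, 32_000, 4))  # RTX PRO 6000 Blackwell
```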

GPU Selection for AI at ESUS IT

We tailor GPU configurations to the specific model, context length, number of sessions, and target TPS/SLA. If needed, we can provide sizing and an architecture recommendation (Ada/Blackwell).

Methodological note: TPS ranges are approximate. Results depend on model, quantization, context, inference engine, parallelism parameters, and platform configuration (drivers, CPU/RAM, power limits, and cooling).

© ESUS IT • Educational Material: GPUs for AI
