NVIDIA Cards for AI - Ada Lovelace and Blackwell in Practice

NVIDIA Ada Lovelace & Blackwell for AI – Practical GPU Selection
In projects built on large language models (LLMs), the key factor is not theoretical GPU compute power but predictable generation throughput and stability in specific scenarios. This article shows how to select NVIDIA GPUs for AI based on TPS metrics, model size, and deployment scale, from simple chatbots to enterprise environments.
Hardware Context

In the following sections, we analyze Ada Lovelace and Blackwell architectures not through marketing benchmarks but real inference scenarios: number of users, model size, and target TPS.
1) TPS (tokens/s): Practical LLM Throughput Metric
In production, theoretical metrics don’t directly translate to user experience. For LLMs, the simplest and most understandable metric is TPS.
| Level | TPS | TPM | Typical Effect |
|---|---|---|---|
| Limited fluency | 5 TPS | 300 tokens/min | noticeable generation delay |
| Comfortable operation | 20 TPS | 1,200 tokens/min | stable generation for most use cases |
| High throughput | 100 TPS | 6,000 tokens/min | headroom for multiple parallel sessions |
Methodological note: TPS depends on model, quantization, context length, inference engine, and multi-session profile.
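As a quick sanity check, the TPS levels above can be converted to TPM and to expected response times. The sketch below is illustrative; the 500-token response length and the helper names are assumptions, and prompt-processing time is ignored:

```python
def tpm(tps: float) -> float:
    """Convert tokens per second to tokens per minute."""
    return tps * 60

def generation_time_s(tokens: int, tps: float) -> float:
    """Approximate wall-clock seconds to generate `tokens` tokens,
    ignoring prompt-processing latency (a simplifying assumption)."""
    return tokens / tps

# A 500-token answer at the three reference levels from the table:
for level, tps in [("limited", 5), ("comfortable", 20), ("high", 100)]:
    print(f"{level}: {tpm(tps):.0f} TPM, 500 tokens in {generation_time_s(500, tps):.1f} s")
```

At 5 TPS a 500-token answer takes 100 seconds, which is why that level feels noticeably delayed in chat use.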
2) 7B / 13B / 70B – What Model Size Means
7B/13B/70B indicate the number of model parameters: 1B = 1 billion parameters. More parameters usually improve response quality and reasoning ability but increase VRAM and GPU throughput requirements.
| Class | Parameters | Typical Use Cases | Target TPS |
|---|---|---|---|
| 7–8B Models | 7–8B | chatbots, RAG, Q&A, summarization | 50–100+ TPS |
| 13B Models | 13B | enterprise AI, documents, longer responses | 40–70 TPS |
| 70B Models | 70B | advanced analytics, AI agents, expert tasks | 15–25 TPS |
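Parameter count maps roughly to VRAM demand: weights take `parameters × bytes per parameter`, plus overhead for activations and KV cache. The estimator below is a coarse sketch; the 20% overhead factor is an assumption (real usage depends on context length, batch size, and engine):

```python
def vram_gb(params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    """Rough VRAM (GB) for model weights plus ~20% overhead for
    activations/KV cache. `overhead` is a coarse assumption; real
    usage depends on context length and batch size."""
    return params_billion * bytes_per_param * overhead

# FP16 = 2 bytes/param, INT8 = 1, 4-bit ~= 0.5
print(f"7B  FP16 : ~{vram_gb(7, 2):.0f} GB")
print(f"13B FP16 : ~{vram_gb(13, 2):.0f} GB")
print(f"70B 4-bit: ~{vram_gb(70, 0.5):.0f} GB")
```

By this estimate, a 70B model even at 4-bit quantization needs roughly 42 GB, which already rules out most consumer cards on VRAM alone.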
3) Reference to ChatGPT – Model Scale
For comparison: GPT-3 had ~175B parameters. For GPT-4 and newer, OpenAI does not disclose exact parameter counts; estimates vary. Practically, ChatGPT-class services operate at hyperscale and are optimized for parallelism and multi-GPU usage.
| Model class | Parameters | Infrastructure Implication |
|---|---|---|
| 7-13B | 7-13B | usually sufficient for enterprise deployments (RAG/chatbots) |
| 70B | 70B | requires a powerful GPU and careful context/quantization choices |
| GPT-3 | ~175B | cloud scale; not feasible on a single GPU |
| GPT-4 / newer | undisclosed | hyperscale plus heavy optimization; a 1:1 on-prem comparison is not meaningful |
4) Mapping Requirements: Scenario → Model → Target TPS
Chatbot / RAG for a department or app
- Model: 7-8B
- Goal: stable generation, low latency
- Target: 50-100+ TPS (single session)
Enterprise AI (complex responses, documents)
- Model: 13B
- Goal: better response quality with predictable TPS
- Target: 40-70 TPS
Advanced analytics and expert tasks
- Model: 70B
- Goal: quality and reasoning; trade-off between cost and throughput
- Target: 15-25 TPS
Enterprise: parallelism + long context
- Model: 70B+ or multi-session
- Goal: stable TPS under load, long context (e.g., 32k)
- Target: 30+ TPS per model + margin for parallelism
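The mapping above can be encoded as a small lookup table for preliminary sizing. This is a sketch of the article's scenario table, not a vendor tool; the scenario keys and helper name are made up for illustration:

```python
# Scenario -> model class -> target TPS, mirroring the mapping above.
SCENARIOS = {
    "chatbot_rag":           {"model": "7-8B",  "target_tps": (50, 100)},
    "enterprise_docs":       {"model": "13B",   "target_tps": (40, 70)},
    "expert_analytics":      {"model": "70B",   "target_tps": (15, 25)},
    "parallel_long_context": {"model": "70B+",  "target_tps": (30, None)},  # open-ended upper bound
}

def target_for(scenario: str) -> str:
    """Return a human-readable sizing target for a named scenario."""
    s = SCENARIOS[scenario]
    lo, hi = s["target_tps"]
    rng = f"{lo}+" if hi is None else f"{lo}-{hi}"
    return f"model {s['model']}, target {rng} TPS"

print(target_for("enterprise_docs"))
```

Keeping the mapping explicit like this makes it easy to revisit targets when the traffic profile or SLA changes.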
5) TPS Comparison: RTX 6000 Ada vs RTX PRO 6000 Blackwell
Approximate TPS ranges for typical inference scenarios. Values are for preliminary sizing and GPU class selection.
| Scenario | RTX 6000 Ada | RTX PRO 6000 Blackwell | Interpretation |
|---|---|---|---|
| LLM 7–8B (FP16/FP8) | 90-120 TPS ≈ 5,400-7,200 TPM | 180-220 TPS ≈ 10,800-13,200 TPM | higher throughput and more margin for parallelism |
| LLM 13B (FP16/FP8) | 45-65 TPS ≈ 2,700-3,900 TPM | 95-120 TPS ≈ 5,700-7,200 TPM | stable handling of enterprise workloads, more headroom |
| LLM 70B (INT8 / 4-bit) | 15-20 TPS ≈ 900-1,200 TPM | 30-40 TPS ≈ 1,800-2,400 TPM | Blackwell limits TPS drops under heavier load |
| Long context (32k) | 8-12 TPS ≈ 480-720 TPM | 18-25 TPS ≈ 1,080-1,500 TPM | critical for large document analysis (law/finance) |
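For preliminary GPU-class selection, the table's ranges can be reduced to midpoints and checked against a per-session target under concurrency. The even split across sessions is a deliberately naive assumption; real multi-session scaling depends on batching and KV-cache behavior in the inference engine:

```python
# Approximate midpoints of the TPS ranges in the table above.
GPU_TPS = {
    ("RTX 6000 Ada", "7-8B"):             105,
    ("RTX PRO 6000 Blackwell", "7-8B"):   200,
    ("RTX 6000 Ada", "70B"):               17,
    ("RTX PRO 6000 Blackwell", "70B"):     35,
}

def meets_target(gpu: str, model: str, target_tps: float, sessions: int = 1) -> bool:
    """Naive check: assume aggregate TPS divides evenly across sessions.
    Real scaling is engine-dependent (continuous batching can do better)."""
    per_session = GPU_TPS[(gpu, model)] / sessions
    return per_session >= target_tps

print(meets_target("RTX PRO 6000 Blackwell", "7-8B", 50, sessions=4))
print(meets_target("RTX 6000 Ada", "70B", 15, sessions=2))
```

Even this crude model shows why headroom matters: a card that comfortably meets a single-session target can fall below it once a handful of sessions run in parallel.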
Why GeForce is not compared to RTX / RTX PRO in production AI
In AI deployments, the question often arises: “why pay more for professional cards when consumer cards are cheaper?” This is based on the incorrect assumption that they are interchangeable. In reality, consumer and professional cards solve different problems.
RTX / RTX PRO cards are designed for continuous operation, predictable workloads, and production environments where stable TPS, multi-session capability, and running larger models with longer context without compromise matter. These parameters determine solution usability in AI.
- VRAM and model scale: larger models (13B/70B), long context, and multi-session expose consumer card limitations quickly.
- 24/7 operation: inference loads are continuous; production stability and predictability matter more than peak performance.
- Enterprise features: GPU virtualization, optimized drivers, profiles, and multi-user scenarios are the foundation of service deployments.
- Scaling without degradation: increasing users, context, or query complexity requires maintaining stable TPS.
Consumer cards are not a cheaper alternative for production AI – they are for a different use profile. GPU selection should be based on model, target TPS, and SLA requirements. Hence, production environments naturally use RTX 6000 (Ada Lovelace) and RTX PRO 6000 (Blackwell).
Note: applies to production and multi-session projects. GPU selection should always consider target workload, traffic profile, and SLA requirements.
6) Recommendations – Based on Purpose
Recommendation: Ada Lovelace (RTX 6000 Ada) – when cost/TPS matters
- Workload: chatbots, RAG, enterprise AI on 7–13B, and 70B at limited scale.
- Priority: high cost efficiency, predictable TPS in standard scenarios.
Recommendation: Blackwell (RTX PRO 6000 Blackwell) – when scale and SLA matter
- Workload: 70B+ in multi-session environment, long context, enterprise requirements.
- Priority: higher throughput, stable under load, margin for parallelism.
GPU Selection for AI at ESUS IT
We tailor GPU configurations to specific model, context, number of sessions, and target TPS/SLA. We can provide sizing and architecture recommendation (Ada/Blackwell) if needed.
Methodological note: TPS ranges are approximate. Results depend on model, quantization, context, inference engine, parallelism parameters, and platform configuration (drivers, CPU/RAM, power limits, and cooling).
© ESUS IT • Educational Material: GPUs for AI