
How to Choose a GPU for AI and LLMs? A Guide to Models, TPS, and Users


How to Choose a GPU for AI? A Practical Guide Based on Requirements

In AI-driven projects, GPU selection often starts with choosing a specific card model. In practice, this is the wrong starting point. AI infrastructure should be defined by project requirements: use case, model size, number of users, expected TPS, context length, and quantization method. Only then should you determine the appropriate class of solution.

Starting Point: Not the GPU Model, but the Requirements

In AI projects, GPU selection should be driven by workload, not by the product name.

The same GPU may be optimal in one scenario and completely insufficient in another. That’s why GPU selection should begin with workload analysis rather than a product list.

Core principle: first define the AI use case, model size, number of users, TPS, context length, and deployment method. Only then choose the GPU class.

1) Define the AI use case

The first step is answering one question: what will the AI be used for? The answer determines the GPU requirements.

Scenario 1

Chatbots and RAG

Q&A systems, knowledge retrieval, user support, integration with internal documentation.

Scenario 2

Document and data analysis

Processing contracts, reports, and internal datasets with larger context requirements.

Scenario 3

AI agents and automation

Multi-step workflows, tool usage, system integrations, complex processes.

Scenario 4

Expert models and AI-as-a-service

70B+ models, multi-session environments, SLA-driven infrastructure.

Conclusion: two “AI projects” can have completely different hardware requirements.

2) Model size: 7B / 13B / 70B

Model class | Characteristics                        | Typical use
7–8B        | Lightweight, fast, low requirements    | Chatbots, RAG, Q&A
13B         | Balanced quality/performance           | Enterprise AI, document analysis
70B         | High requirements, advanced reasoning  | Expert systems, enterprise workloads
Key rule: bigger is not always better.
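As a rough sanity check, the VRAM needed just to hold a model's weights can be estimated from parameter count and quantization bit-width. The sketch below is a rule of thumb, not vendor guidance; the 20% overhead factor for activations and runtime buffers is an illustrative assumption:

```python
def model_vram_gb(params_b: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Approximate VRAM (GB) to hold the weights of a model with
    `params_b` billion parameters at a given quantization bit-width,
    plus ~20% headroom for activations and runtime buffers (assumed)."""
    return params_b * bits_per_weight / 8 * overhead

# 7B at FP16, 13B at 8-bit, 70B at 4-bit quantization:
for name, params, bits in [("7B", 7, 16), ("13B", 13, 8), ("70B", 70, 4)]:
    print(f"{name} @ {bits}-bit: ~{model_vram_gb(params, bits):.0f} GB")
```

This shows why quantization changes the hardware question entirely: a 70B model at 4-bit can need less VRAM than a 7B model at FP16 would lead you to expect from parameter count alone.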

3) TPS and number of users

TPS (tokens per second) defines generation speed, but must be evaluated alongside concurrent users.

Key question: does it scale under real load?
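The interaction between aggregate throughput and concurrency can be shown with a deliberately naive sketch. Real serving stacks batch requests, so scaling is not this linear, but the per-user effect is directionally right:

```python
def per_user_tps(total_tps: float, concurrent_users: int) -> float:
    """Effective generation speed each user sees when a GPU's aggregate
    throughput is shared evenly across concurrent sessions (simplified:
    ignores batching gains and scheduling overhead)."""
    return total_tps / concurrent_users

# A GPU sustaining an assumed 400 TPS aggregate:
for users in (1, 10, 40):
    print(f"{users} users -> ~{per_user_tps(400, users):.0f} TPS each")
```

A card that feels fast in a single-user demo can drop below comfortable reading speed (roughly 10 to 15 TPS per user) once dozens of sessions share it, which is exactly the "does it scale under real load" question.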

4) Context length

  • 4k–8k – standard
  • 16k–32k – documents
  • 32k+ – advanced workloads
Longer context = more VRAM + lower TPS
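The VRAM cost of longer context comes mostly from the KV cache, which grows linearly with context length per active session. A back-of-the-envelope estimate, using illustrative 7B-class shape parameters (32 layers, 32 KV heads, head dimension 128, FP16 cache), which are assumptions rather than a specific model's published configuration:

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bytes_per_val: int = 2) -> float:
    """Per-session KV-cache size in GB: two tensors (K and V) per layer,
    one kv_heads * head_dim vector per token, bytes_per_val=2 for FP16."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_val / 1e9

# Illustrative 7B-class shape, FP16 cache:
for ctx in (4_096, 32_768):
    print(f"{ctx} tokens -> ~{kv_cache_gb(32, 32, 128, ctx):.1f} GB per session")
```

Multiply the per-session figure by concurrent users and the memory budget for long-context workloads escalates quickly, which is also why long context tends to reduce achievable TPS.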

5) Choosing the GPU class

Class 1

Entry-level

Testing, small deployments

Class 2

Production

Stable multi-user environments (e.g. RTX 6000 Ada)

Class 3

Enterprise

Large-scale AI (e.g. RTX PRO 6000 Blackwell)

Rule: match GPU to workload, not marketing name.
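The three classes above can be condensed into a toy decision rule. The thresholds below are illustrative assumptions for sketching the logic, not vendor sizing guidance:

```python
def gpu_class(model_params_b: float, concurrent_users: int, sla_required: bool) -> str:
    """Map workload requirements to one of the three GPU classes from the
    article. Thresholds are illustrative assumptions, not vendor guidance."""
    if sla_required or model_params_b >= 70:
        return "Class 3: Enterprise"
    if model_params_b >= 13 or concurrent_users > 5:
        return "Class 2: Production"
    return "Class 1: Entry-level"

print(gpu_class(7, 2, False))    # small chatbot pilot
print(gpu_class(13, 20, False))  # multi-user document analysis
print(gpu_class(70, 50, True))   # SLA-driven AI-as-a-service
```

The point is not the specific cutoffs but the shape of the decision: requirements (model size, users, SLA) come first, and the GPU class falls out of them.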

Summary

  • Use case
  • Model size
  • Users
  • TPS
  • Context
  • Quantization
Final takeaway: GPU selection = workload alignment.