RAG-LLM Architecture with Ollama and RAGFlow on Kubuntu Linux
Local AI Infrastructure, Domain-Specific LLMs, Generic LLMs, and Engineering Applications
Abstract
Retrieval-Augmented Generation (RAG) combined with Large Language Models (LLMs) is transforming enterprise computing, engineering analysis, research automation, and intelligent knowledge management. Organizations increasingly seek privacy-preserving local AI infrastructures capable of running on desktops, laptops, workstations, and VPS/cloud environments without dependency on external APIs.
This paper presents a comprehensive guide to installing and configuring a local RAG-LLM environment using Ollama and RAGFlow on Kubuntu Linux systems. The paper also explains the theoretical foundations of RAG systems, vector databases, embeddings, semantic search, and AI workflows.
A major focus is placed on understanding the differences among:
- Generic LLMs
- Domain-specific LLMs
- Fine-tuned models
- Retrieval systems
- Hybrid enterprise AI architectures
The paper further explores practical applications in:
- Electrical engineering
- Computer engineering
- Industrial IoT
- HVDC systems
- Power systems
- Consulting engineering
- Research organizations
- SME digital transformation
1. Introduction
Artificial Intelligence is moving from cloud-centric systems toward localized enterprise AI infrastructure. Organizations increasingly require:
- Data privacy
- Local inference
- Reduced API costs
- Faster response times
- Domain specialization
- Secure knowledge management
Traditional cloud LLM systems suffer from several limitations:
| Limitation | Problem |
|---|---|
| Hallucination | Incorrect information |
| Limited context | Cannot access company documents |
| Privacy concerns | Sensitive data leaves organization |
| API cost | Expensive at scale |
| Internet dependency | Cloud reliance |
| Generic knowledge | Poor domain expertise |
RAG architectures address many of these problems by grounding the model's answers in documents that are stored and retrieved locally.
2. Understanding Large Language Models
What is an LLM?
A Large Language Model is a neural network trained on enormous amounts of text data to predict the next token in a sequence.
Examples include:
| Model | Organization |
|---|---|
| Llama 3 | Meta |
| Mistral | Mistral AI |
| Gemma | Google |
| Qwen | Alibaba Cloud |
| DeepSeek | DeepSeek |
3. Generic LLM vs Domain-Specific LLM
This distinction is critically important in enterprise AI.
3.1 Generic LLM
A generic LLM is trained on broad internet-scale data.
Examples:
- Llama 3
- GPT models
- Mistral
- Gemma
- Qwen
These models understand:
- General language
- Coding
- Mathematics
- Conversation
- Basic science
- Writing tasks
However, they may lack deep knowledge in:
- HVDC engineering
- IEC standards
- Protection systems
- Transient stability
- SCADA protocols
- Industrial systems
Characteristics of Generic LLMs
| Feature | Generic LLM |
|---|---|
| Broad knowledge | Excellent |
| Domain expertise | Moderate |
| Flexibility | High |
| Hallucination risk | Medium to high |
| Enterprise customization | Limited |
| Cost efficiency | Good |
3.2 Domain-Specific LLM
A domain-specific LLM is specialized for a particular industry or knowledge area.
Examples:
| Domain | Specialized Model |
|---|---|
| Medicine | Medical LLM |
| Legal | Legal AI |
| Finance | Financial LLM |
| Engineering | Technical engineering LLM |
| Cybersecurity | Security-focused LLM |
Domain-Specific LLM Training Sources
- Technical Manuals
- Research Papers
- Standards
- Industrial Logs
- SCADA Data
- Engineering Reports
- Protection Studies
- Equipment Documentation
Characteristics of Domain-Specific LLMs
| Feature | Domain-Specific LLM |
|---|---|
| Specialized expertise | Excellent |
| General reasoning | Moderate |
| Hallucination | Lower in domain |
| Training cost | High |
| Fine-tuning complexity | High |
| Industry accuracy | Very high |
3.3 Why Domain-Specific LLMs Matter
Example:
Question:
“Explain commutation failure in LCC-HVDC systems.”
A generic LLM may provide:
- Basic textbook explanation
- Limited practical engineering insight
A domain-specific engineering model may provide:
- Fault current analysis
- Thyristor behavior
- Reactive power dynamics
- Protection logic
- PSCAD/EMTDC interpretation
- Mitigation strategies
This added depth makes the answer far more directly usable in day-to-day engineering work.
4. RAG vs Fine-Tuning
Fine-tuning and RAG are often confused, but they change different parts of the system.
Fine-Tuning
Fine-tuning changes model weights.
Advantages:
- Deep specialization
- Consistent responses
- Better domain adaptation
Disadvantages:
- Expensive
- GPU intensive
- Requires ML expertise
- Hard to update
RAG (Retrieval-Augmented Generation)
RAG keeps the base model unchanged but adds external knowledge retrieval.
Advantages:
- Easier updates
- Lower cost
- Enterprise friendly
- Better document grounding
- No retraining needed
Disadvantages:
- Retrieval quality matters
- More infrastructure components
Enterprise Preference
Many organizations prefer the following combination to expensive fine-tuning:
Generic LLM + RAG Knowledge System + Domain Documents
5. How RAG-LLM Systems Work
Core Pipeline
User Question
      │
      ▼
Embedding Generation
      │
      ▼
Vector Database Search
      │
      ▼
Relevant Document Retrieval
      │
      ▼
Prompt Construction
      │
      ▼
Large Language Model
      │
      ▼
Generated Response
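The pipeline stages can be sketched in a few lines of Python. This is a toy illustration only: word-set overlap (Jaccard similarity) stands in for the embedding model and vector database, the sample documents are invented for the example, and all function names are hypothetical. The final step, sending the prompt to the LLM, is left out so the sketch runs without any server.

```python
import re

def tokenize(text: str) -> set[str]:
    """Toy 'embedding': the set of lowercase word tokens (not a real vector)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap, standing in for vector similarity search."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by similarity to the question and keep the best top_k."""
    query = tokenize(question)
    ranked = sorted(documents,
                    key=lambda doc: similarity(query, tokenize(doc)),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(question: str, context: list[str]) -> str:
    """Prompt construction: retrieved passages first, then the user question."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return ("Answer using only the context below.\n\n"
            f"Context:\n{joined}\n\n"
            f"Question: {question}")

# Invented sample documents, for illustration only.
DOCS = [
    "Commutation failure occurs in LCC HVDC inverters during AC voltage dips.",
    "Differential relays protect transformers against internal faults.",
    "SCADA telemetry reports breaker status to the control center.",
]

question = "What causes commutation failure in LCC HVDC?"
context = retrieve(question, DOCS, top_k=1)
prompt = build_prompt(question, context)
print(prompt)
```

In a real deployment the `tokenize` and `similarity` steps would be replaced by an embedding model and a vector store, and `prompt` would be passed to the language model.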
6. Understanding Embeddings
Embeddings convert text into numerical vectors.
Example:
"HVDC converter transformer"
↓
[0.293, -0.442, 0.983, ...]
Texts with similar meaning map to nearby vectors, so closeness in vector space can stand in for semantic similarity.
Semantic Similarity Example
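A common way to measure how close two embeddings are is cosine similarity. The sketch below uses hypothetical 3-dimensional vectors chosen by hand for illustration; real embedding models produce vectors with hundreds of dimensions, and the values here are not the output of any model.

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical hand-picked vectors, not produced by an embedding model.
vectors = {
    "HVDC converter transformer": [0.9, 0.1, 0.2],
    "LCC-HVDC converter station": [0.8, 0.2, 0.3],
    "SCADA telemetry": [0.1, 0.9, 0.4],
}

query = vectors["HVDC converter transformer"]
for name, vec in vectors.items():
    print(f"{name}: {cosine_similarity(query, vec):.3f}")
```

With these made-up values, the two HVDC phrases score much closer to each other than either does to the SCADA phrase, which is the behavior a retriever relies on.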
7. Components of a RAG System
| Component | Function |
|---|---|
| LLM | Generates responses |
| Embedding model | Creates vectors |
| Vector database | Stores embeddings |
| Retriever | Finds relevant context |
| Prompt builder | Constructs prompts |
| Document parser | Processes files |
| UI | User interaction |
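Between the document parser and the embedding model, parsed text is usually split into smaller chunks so that each vector represents a focused passage. The following is a minimal sketch of word-based chunking with overlap; the function name and the `chunk_size` and `overlap` defaults are assumptions for illustration and do not describe how RAGFlow chunks documents internally.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(100))  # 100 placeholder words
for chunk in chunk_text(sample):
    print(len(chunk.split()), chunk.split()[0], "...", chunk.split()[-1])
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some words twice.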
8. What is Ollama?
Ollama is a local LLM runtime that enables running modern AI models on:
- Linux (including Ubuntu and Kubuntu)
- macOS
- Windows
Features:
- Offline AI
- GPU acceleration
- Local inference
- REST API
- Easy model management
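As an illustration of the REST API, the sketch below builds a request for Ollama's `/api/generate` endpoint using only the Python standard library. It assumes Ollama is listening on its default address `http://localhost:11434` and that the `llama3:8b` model has already been pulled; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address

def build_generate_request(prompt: str, model: str = "llama3:8b") -> urllib.request.Request:
    """Build a POST request for /api/generate with streaming disabled."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt: str, model: str = "llama3:8b") -> str:
    """Send the request and return the generated text (needs a running Ollama)."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

With Ollama running, a call such as `generate("Explain commutation failure in LCC-HVDC systems.")` should return the model's answer as a string.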
9. What is RAGFlow?
RAGFlow is a comprehensive enterprise RAG platform featuring:
- OCR
- Knowledge graphs
- Agent workflows
- Hybrid retrieval
- Multi-user environment
- Document ingestion
- Semantic retrieval
- Vector indexing
10. Complete RAGFlow + Ollama Architecture
                    ┌────────────────────┐
                    │  User Web Browser  │
                    └─────────┬──────────┘
                              │
                              ▼
                    ┌────────────────────┐
                    │  RAGFlow Frontend  │
                    └─────────┬──────────┘
                              │
                              ▼
                    ┌────────────────────┐
                    │  RAGFlow Backend   │
                    └─────────┬──────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
   │ Elasticsearch│    │ Redis Cache  │    │ MinIO Object │
   │ Vector Store │    │ Session Mgmt │    │ Storage      │
   └──────┬───────┘    └──────────────┘    └──────────────┘
          │
          ▼
┌────────────────────┐
│  Embedding Models  │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│   Ollama Runtime   │
│    Llama / Qwen    │
└────────────────────┘
11. Kubuntu Installation Requirements
Recommended Kubuntu Version
| Version | Recommendation |
|---|---|
| Kubuntu 22.04 LTS | Stable |
| Kubuntu 24.04 LTS | Recommended |
12. Hardware Requirements
CPU-Only Deployment
| Resource | Minimum |
|---|---|
| CPU | 4 cores |
| RAM | 16 GB |
| Storage | 50 GB SSD |
GPU Deployment
| Resource | Recommended |
|---|---|
| GPU | NVIDIA RTX 3060+ |
| VRAM | 12 GB+ |
| RAM | 32 GB |
| CPU | Ryzen 7 / Intel i7 |
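The RAM and VRAM figures can be related to model size with simple arithmetic: the weights occupy roughly (parameter count) × (bits per weight) / 8 bytes. The sketch below adds a 20% allowance for the KV cache and runtime buffers; that overhead figure is an assumed placeholder, not a measured value, and actual usage depends on context length and runtime.

```python
def estimate_model_memory_gb(params_billion: float,
                             bits_per_weight: int = 4,
                             overhead_fraction: float = 0.2) -> float:
    """Rough memory estimate in GB (10^9 bytes) for loading a model.

    overhead_fraction is an assumed allowance for KV cache and buffers.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead_fraction) / 1e9

for bits in (4, 8, 16):
    print(f"8B model at {bits}-bit: ~{estimate_model_memory_gb(8, bits):.1f} GB")
```

Under these assumptions an 8-billion-parameter model quantized to 4 bits needs about 4.8 GB, which fits within the 12 GB VRAM recommendation, while the same model at 16-bit precision would not.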
13. Installing Docker
sudo apt update
sudo apt upgrade -y
sudo apt install docker.io docker-compose-v2 git curl -y
Enable Docker:
sudo systemctl enable docker
sudo systemctl start docker
Add your user to the docker group (log out and back in for the change to take effect):
sudo usermod -aG docker $USER
14. Installing Ollama
curl -fsSL https://ollama.com/install.sh | sh
Verify:
ollama --version
15. Installing Models
General Purpose Models
ollama pull llama3:8b
ollama pull mistral:7b
ollama pull qwen3
Domain-Specific Engineering Models
Potential approaches:
| Method | Description |
|---|---|
| Fine-tuning | Train engineering model |
| RAG documents | Add engineering PDFs |
| Hybrid | Generic LLM + engineering retrieval |
16. Installing Embedding Models
ollama pull nomic-embed-text
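Once the embedding model is pulled, vectors can be requested over Ollama's `/api/embed` endpoint. The sketch below is a minimal standard-library client; it assumes Ollama is reachable at `http://localhost:11434`, and the helper names are hypothetical.

```python
import json
import urllib.request

def build_embed_request(texts: list[str],
                        model: str = "nomic-embed-text",
                        base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST request for /api/embed; `input` carries the texts to embed."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_embeddings(body: bytes) -> list[list[float]]:
    """Extract the list of vectors (one per input text) from the JSON response."""
    return json.loads(body)["embeddings"]

def embed(texts: list[str]) -> list[list[float]]:
    """Embed the texts using a running Ollama instance."""
    with urllib.request.urlopen(build_embed_request(texts), timeout=60) as response:
        return parse_embeddings(response.read())
```

For example, `embed(["HVDC converter transformer"])` should return a list containing one vector when the server is running.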
Other options:
| Embedding Model | Purpose |
|---|---|
| bge-large | Accurate retrieval |
| mxbai-embed-large | Enterprise retrieval |
| e5-large | Multilingual |
17. Installing RAGFlow
git clone https://github.com/infiniflow/ragflow.git
cd ragflow
18. Configure Ollama Integration
Edit configuration:
nano docker/.env
Add:
OLLAMA_BASE_URL=http://host.docker.internal:11434
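Linux alternative (with Docker Engine on Linux, host.docker.internal does not resolve by default; 172.17.0.1 is the default address of the docker0 bridge):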
OLLAMA_BASE_URL=http://172.17.0.1:11434
19. Start RAGFlow
cd docker
docker compose up -d
20. Access RAGFlow
21. Engineering Use Cases
Electrical Engineering
HVDC Engineering RAG Example
A user asks:
“Explain transient stability during pole blocking in VSC-HVDC.”
The system retrieves the relevant passages from the indexed engineering documents, then generates grounded engineering explanations.
22. Industrial IoT Applications
RAG systems can integrate with:
Industrial AI Workflow
23. Why Local AI Matters
| Feature | Benefit |
|---|---|
| Offline operation | Works without internet |
| Privacy | Internal data protection |
| Lower cost | No API billing |
| Faster inference | Local execution |
| Customization | Domain specialization |
24. Generic LLM + RAG vs Domain-Specific LLM
25. Recommended Enterprise Architecture
Generic LLM + Enterprise RAG + Domain Documents + Vector Database + Workflow Automation
This provides:
26. Vector Databases
Popular choices:
| Database | Use |
|---|---|
| Elasticsearch | Enterprise scale |
| Qdrant | Lightweight |
| Weaviate | AI-native |
| Milvus | Large scale |
| ChromaDB | Small projects |
27. Security Architecture
Important Security Components
| Component | Purpose |
|---|---|
| Firewall | Access control |
| Reverse proxy | Secure routing |
| HTTPS | Encryption |
| Authentication | User control |
| VPN | Secure remote access |
28. VPS Deployment
Recommended providers:
| Provider | Strength |
|---|---|
| Hetzner | Affordable servers |
| DigitalOcean | Easy deployment |
| Linode | Small enterprise |
| Vultr | GPU VPS |
29. Kubernetes and Scalability
Enterprise deployments may use:
30. Future of RAG Systems
Future enterprise AI trends include:
31. Conclusion
RAG-LLM systems represent a major evolution in enterprise artificial intelligence. By combining retrieval systems with local language models, organizations can build secure, scalable, and domain-aware AI infrastructures.
The combination of Ollama and RAGFlow on Kubuntu Linux creates a powerful foundation for such infrastructures.
For electrical and computer engineering organizations, RAG architectures enable the creation of intelligent assistants capable of understanding domain-specific technical documents.
The future of enterprise AI will likely rely heavily on hybrid architectures where:
Generic LLMs + Domain Knowledge + RAG Retrieval + Workflow Automation + Local Infrastructure
become the standard model for intelligent organizations.
Conceptual Semantic Clustering in Embedding Space
Engineering concepts with similar meaning cluster closely in the vector space used by RAG systems. The x and y values below are illustrative 2-D coordinates.
| Concept | x | y |
|---|---|---|
| HVDC converter | 1.2 | 1.5 |
| LCC HVDC | 1.4 | 1.7 |
| FACTS device | 2.1 | 2.4 |
| Transformer protection | 4.5 | 4.9 |
| Differential relay | 4.8 | 5.1 |
| SCADA telemetry | 7.2 | 7.4 |
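The clustering in the table can be checked numerically by finding, for each concept, its nearest neighbor by Euclidean distance. The sketch uses the coordinates from the table as given; the `nearest` helper is a hypothetical name introduced for this example.

```python
import math

# Coordinates copied from the table above.
POINTS = {
    "HVDC converter": (1.2, 1.5),
    "LCC HVDC": (1.4, 1.7),
    "FACTS device": (2.1, 2.4),
    "Transformer protection": (4.5, 4.9),
    "Differential relay": (4.8, 5.1),
    "SCADA telemetry": (7.2, 7.4),
}

def nearest(concept: str) -> str:
    """Return the other concept with the smallest Euclidean distance."""
    origin = POINTS[concept]
    others = (name for name in POINTS if name != concept)
    return min(others, key=lambda name: math.dist(origin, POINTS[name]))

for concept in POINTS:
    print(f"{concept} -> {nearest(concept)}")
```

A vector database performs the same kind of nearest-neighbor lookup, only over far more points and dimensions and usually with approximate indexes.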
Generic LLM plus RAG vs Domain-Specific LLM
Comparison of practical enterprise AI deployment characteristics.
| Category | Generic LLM + RAG | Domain-Specific LLM |
|---|---|---|
| Deployment simplicity | 9 | 5 |
| Domain expertise | 7 | 10 |
| Update flexibility | 10 | 4 |
| Training cost efficiency | 9 | 3 |
| Infrastructure complexity | 7 | 4 |