If you've been following the explosion of Large Language Models and Retrieval-Augmented Generation applications, you've probably noticed a trend. The vast majority of tutorials, sample code, and applications are built using Python. While Python's prominence in the AI/ML space is well-deserved due to its rich ecosystem, it's time we talked about a powerful alternative that's often overlooked in these conversations - .NET.
As someone who has worked extensively with both Python and .NET for AI applications, I'd like to share a recent project that showcases why .NET deserves more attention in the RAG system conversation.
A .NET-Powered RAG Console Application
I recently built a complete RAG (Retrieval-Augmented Generation) system using .NET and Ollama, which lets you run a variety of LLMs locally. The architecture is clean, the performance is excellent, and the developer experience was surprisingly smooth. Let me walk you through why I believe .NET deserves more recognition in this space.
5 Reasons .NET Shines for RAG and LLM Applications
1. Strong Type System and Developer Productivity
public interface IDocumentProcessor
{
    Task<List<DocumentChunk>> ProcessDocumentAsync(string filePath);
}

public class DocumentChunk
{
    public required string Text { get; set; }
    public required string DocumentName { get; set; }
    public int ChunkNumber { get; set; }
    public required float[] Embedding { get; set; }
}
One of the immediate benefits of .NET is its strong type system. The interface and class definitions above showcase how clean and self-documenting .NET code can be. As RAG systems grow more complex, having compiler-enforced type checking becomes increasingly valuable. This helps catch errors early in development rather than encountering them at runtime.
2. Dependency Injection Built Into the Framework
var serviceProvider = new ServiceCollection()
    .AddLogging(configure => configure.AddConsole())
    .AddSingleton<IDocumentProcessor, PdfDocumentProcessor>()
    .AddSingleton<IVectorStore, SimpleVectorStore>()
    .AddSingleton<IEmbeddingService, OllamaEmbeddingService>()
    .AddSingleton<IChatService, OllamaChatService>()
    .AddSingleton<IRagService, RagService>()
    .BuildServiceProvider();
The dependency injection container in .NET makes it incredibly easy to build modular, testable applications. In the code snippet above, we're registering all our services in the DI container, making them available throughout the application. This promotes loose coupling and makes it easy to swap implementations (for example, switching from Ollama to Azure OpenAI) without changing the consuming code.
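To make that swap concrete, here's a minimal sketch (my own illustration, with a hypothetical AzureOpenAIEmbeddingService that is not part of the project): because consumers depend only on IEmbeddingService, changing providers is a one-line change in the registration.

// Hypothetical alternative provider -- not in the original project.
public class AzureOpenAIEmbeddingService : IEmbeddingService
{
    public Task<float[]> GetEmbeddingsAsync(string text)
    {
        // Call your Azure OpenAI embeddings deployment here instead of Ollama.
        throw new NotImplementedException();
    }
}

// In the DI setup, swap a single registration; nothing else changes:
// .AddSingleton<IEmbeddingService, OllamaEmbeddingService>()
.AddSingleton<IEmbeddingService, AzureOpenAIEmbeddingService>()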
3. Async/Await Model for Handling I/O-Bound Operations
public async Task<string> GetAnswerAsync(string question)
{
    try
    {
        // Get embedding for the question
        var questionEmbedding = await _embeddingService.GetEmbeddingsAsync(question);

        // Retrieve similar documents
        var similarDocuments = await _vectorStore.GetSimilarDocumentsAsync(questionEmbedding, 3);

        // Concatenate the content of similar documents
        var context = new StringBuilder();
        foreach (var doc in similarDocuments)
        {
            context.AppendLine($"From document: {doc.DocumentName}, chunk {doc.ChunkNumber}:");
            context.AppendLine(doc.Text);
            context.AppendLine();
        }

        // Get answer from chat service
        string answer = await _chatService.GetResponseAsync(question, context.ToString());
        return answer;
    }
    catch (Exception ex)
    {
        return $"Error: {ex.Message}";
    }
}
The async/await pattern in .NET is elegant and powerful, making it simple to handle the I/O-bound operations that are common in RAG systems - like API calls to the LLM service, database operations, and file system access. The code remains readable while efficiently managing system resources.
4. Performance and Resource Efficiency
While Python is often lauded for its simplicity and rich ML ecosystem, .NET offers significant performance advantages:
- JIT compilation for optimized execution
- Efficient memory management
- Highly optimized garbage collection
- Superior thread handling for concurrent operations
For production RAG systems that need to handle many concurrent requests or operate within resource constraints, these advantages can be crucial. In my testing, the .NET implementation handled PDF processing and embedding generation with exceptional speed and minimal resource usage.
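As a concrete sketch of that concurrency point (my own illustration, not code from the project), embeddings for many chunks can be generated concurrently with Task.WhenAll instead of one await per chunk; whether this yields a real speedup depends on how your Ollama instance schedules requests:

// Sketch: fan out embedding requests and await them together.
// Assumes the IEmbeddingService interface defined later in this walkthrough.
public static async Task<float[][]> EmbedChunksConcurrentlyAsync(
    IEmbeddingService embeddingService, IReadOnlyList<string> chunks)
{
    // One in-flight task per chunk; async I/O keeps threads free while waiting.
    var tasks = chunks.Select(chunk => embeddingService.GetEmbeddingsAsync(chunk));
    return await Task.WhenAll(tasks);
}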
5. Enterprise-Ready Ecosystem
Many organizations already have significant investments in .NET infrastructure. Building RAG systems with .NET allows for:
- Seamless integration with existing systems and authentication mechanisms
- Familiar deployment patterns (Docker containers, Azure App Services, etc.)
- Reuse of existing developer expertise
- Mature libraries for logging, configuration, and monitoring
The Implementation Details
My .NET RAG implementation follows a clean architecture approach with clearly defined interfaces:
- Document Processing: Extract text from PDFs and chunk it into manageable pieces
- Embedding Generation: Using Ollama's embedding model to create vector representations
- Vector Storage: Simple in-memory vector store with cosine similarity search
- Chat Service: Integration with Llama 3.2 for generating responses based on retrieved context
The entire system is wired together with dependency injection, making each component testable and replaceable.
A Note on Ecosystem Maturity
It's fair to acknowledge that Python's ecosystem for machine learning and AI is more mature. Libraries like Hugging Face Transformers, LangChain, and LlamaIndex have established themselves as industry standards. However, the .NET ecosystem is rapidly evolving:
- Microsoft's Semantic Kernel is being actively developed for .NET
- The ML.NET framework continues to grow
- Community-driven projects are filling gaps in the ecosystem
When to Choose .NET for Your RAG System
While I'm not suggesting that .NET should completely replace Python in the AI space, there are scenarios where it makes perfect sense:
- When your organization already has .NET expertise
- When your application needs to integrate with existing .NET systems
- When performance and resource efficiency are critical
- When you value strong typing and compile-time safety
- When you're building enterprise applications with complex business logic
Step-by-Step Guide to Building Your Own .NET RAG System
Want to build your own .NET-powered RAG application? Here's a detailed walkthrough:
1. Project Setup
# Create a new console application
dotnet new console -n dotnet_console_rag_ollama
cd dotnet_console_rag_ollama
# Add necessary packages
dotnet add package Microsoft.Extensions.DependencyInjection
dotnet add package Microsoft.Extensions.Logging
dotnet add package Microsoft.Extensions.Logging.Console
dotnet add package itext7 # For PDF processing
dotnet add package Newtonsoft.Json
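One note on the HTTP calls you'll see later: the snippets use PostAsJsonAsync from System.Net.Http.Json, which ships as part of the framework on .NET 5 and later. If it doesn't resolve on your target framework, add the package explicitly:

# Only needed on older target frameworks
dotnet add package System.Net.Http.Json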
2. Define Your Interfaces
Start by creating clear interfaces that define the responsibilities of each component:
public interface IDocumentProcessor
{
    Task<List<DocumentChunk>> ProcessDocumentAsync(string filePath);
}

public interface IEmbeddingService
{
    Task<float[]> GetEmbeddingsAsync(string text);
}

public interface IVectorStore
{
    Task AddDocumentAsync(DocumentChunk document);
    Task<List<DocumentChunk>> GetSimilarDocumentsAsync(float[] queryEmbedding, int topK);
}

public interface IChatService
{
    Task<string> GetResponseAsync(string question, string context);
}

public interface IRagService
{
    Task ProcessDocumentsAsync(string folderPath);
    Task<string> GetAnswerAsync(string question);
}
3. Implement Document Processing
Create a PDF document processor that extracts text from PDFs and chunks it:
public class PdfDocumentProcessor : IDocumentProcessor
{
    private readonly ILogger<PdfDocumentProcessor> _logger;
    private readonly IEmbeddingService _embeddingService;
    private const int MaxChunkSize = 1000; // characters per chunk

    public PdfDocumentProcessor(ILogger<PdfDocumentProcessor> logger, IEmbeddingService embeddingService)
    {
        _logger = logger;
        _embeddingService = embeddingService;
    }

    public async Task<List<DocumentChunk>> ProcessDocumentAsync(string filePath)
    {
        _logger.LogInformation($"Processing PDF: {filePath}");

        // Extract text from PDF using iText7
        string extractedText = ExtractTextFromPdf(filePath);

        // Split text into chunks
        var textChunks = ChunkText(extractedText, MaxChunkSize);
        var documentChunks = new List<DocumentChunk>();

        // Create document chunks with embeddings
        for (int i = 0; i < textChunks.Count; i++)
        {
            var embedding = await _embeddingService.GetEmbeddingsAsync(textChunks[i]);
            documentChunks.Add(new DocumentChunk
            {
                Text = textChunks[i],
                DocumentName = Path.GetFileName(filePath),
                ChunkNumber = i + 1,
                Embedding = embedding
            });
        }
        return documentChunks;
    }

    // Implementation details for ExtractTextFromPdf and ChunkText methods...
}
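The snippet above elides those two helper methods, so here is a minimal sketch of what they might look like, using iText7's PdfTextExtractor and naive fixed-size splitting. Treat this as one plausible implementation, not the project's exact code:

// Requires: using iText.Kernel.Pdf; using iText.Kernel.Pdf.Canvas.Parser;
// Sketch only: read every page and concatenate the extracted text.
private static string ExtractTextFromPdf(string filePath)
{
    var text = new StringBuilder();
    using var pdf = new PdfDocument(new PdfReader(filePath));
    for (int page = 1; page <= pdf.GetNumberOfPages(); page++)
    {
        text.AppendLine(PdfTextExtractor.GetTextFromPage(pdf.GetPage(page)));
    }
    return text.ToString();
}

// Sketch only: naive fixed-size character chunks. A production version would
// split on sentence or paragraph boundaries so chunks stay semantically coherent.
private static List<string> ChunkText(string text, int maxChunkSize)
{
    var chunks = new List<string>();
    for (int i = 0; i < text.Length; i += maxChunkSize)
    {
        chunks.Add(text.Substring(i, Math.Min(maxChunkSize, text.Length - i)));
    }
    return chunks;
}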
4. Implement Embedding Service (Ollama Integration)
Create a service that connects to Ollama to generate embeddings:
public class OllamaEmbeddingService : IEmbeddingService
{
    private readonly ILogger<OllamaEmbeddingService> _logger;
    private readonly HttpClient _httpClient;
    private const string EmbeddingModel = "nomic-embed-text";
    private const string OllamaBaseUrl = "http://localhost:11434/api";

    public OllamaEmbeddingService(ILogger<OllamaEmbeddingService> logger)
    {
        _logger = logger;
        _httpClient = new HttpClient();
    }

    public async Task<float[]> GetEmbeddingsAsync(string text)
    {
        try
        {
            var request = new
            {
                model = EmbeddingModel,
                prompt = text
            };
            var response = await _httpClient.PostAsJsonAsync(
                $"{OllamaBaseUrl}/embeddings", request);
            response.EnsureSuccessStatusCode();

            // Ollama's /api/embeddings returns { "embedding": [ ... ] };
            // parse it here (requires: using Newtonsoft.Json.Linq;)
            var json = await response.Content.ReadAsStringAsync();
            var embeddings = JObject.Parse(json)["embedding"]!.ToObject<float[]>()!;
            return embeddings;
        }
        catch (Exception ex)
        {
            _logger.LogError($"Error getting embeddings: {ex.Message}");
            throw;
        }
    }
}
5. Implement Vector Store
Create a simple in-memory vector store with cosine similarity search:
public class SimpleVectorStore : IVectorStore
{
    private readonly List<DocumentChunk> _documents = new List<DocumentChunk>();
    private readonly ILogger<SimpleVectorStore> _logger;

    public SimpleVectorStore(ILogger<SimpleVectorStore> logger)
    {
        _logger = logger;
    }

    public Task AddDocumentAsync(DocumentChunk document)
    {
        _documents.Add(document);
        return Task.CompletedTask;
    }

    public Task<List<DocumentChunk>> GetSimilarDocumentsAsync(float[] queryEmbedding, int topK)
    {
        // Calculate cosine similarity for each document
        var similarities = _documents
            .Select(doc => new
            {
                Document = doc,
                Similarity = CosineSimilarity(queryEmbedding, doc.Embedding)
            })
            .OrderByDescending(x => x.Similarity)
            .Take(topK)
            .Select(x => x.Document)
            .ToList();
        return Task.FromResult(similarities);
    }

    private float CosineSimilarity(float[] vector1, float[] vector2)
    {
        // Standard cosine similarity: dot(a, b) / (|a| * |b|)
        float dot = 0, norm1 = 0, norm2 = 0;
        for (int i = 0; i < vector1.Length; i++)
        {
            dot += vector1[i] * vector2[i];
            norm1 += vector1[i] * vector1[i];
            norm2 += vector2[i] * vector2[i];
        }
        return dot / ((float)Math.Sqrt(norm1) * (float)Math.Sqrt(norm2));
    }
}
6. Implement Chat Service
Create a service that sends prompts to Ollama's LLM:
public class OllamaChatService : IChatService
{
    private readonly ILogger<OllamaChatService> _logger;
    private readonly HttpClient _httpClient;
    private const string ChatModel = "llama3:8b";
    private const string OllamaBaseUrl = "http://localhost:11434/api";

    public OllamaChatService(ILogger<OllamaChatService> logger)
    {
        _logger = logger;
        _httpClient = new HttpClient();
    }

    public async Task<string> GetResponseAsync(string question, string context)
    {
        try
        {
            string systemPrompt = "You are a helpful assistant that answers questions based on the provided context.";
            string prompt = $"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:";
            var request = new
            {
                model = ChatModel,
                prompt = prompt,
                system = systemPrompt,
                stream = false
            };
            var response = await _httpClient.PostAsJsonAsync(
                $"{OllamaBaseUrl}/generate", request);
            response.EnsureSuccessStatusCode();

            // With stream = false, Ollama returns a single JSON object whose
            // "response" field holds the generated text
            var json = await response.Content.ReadAsStringAsync();
            string answer = JObject.Parse(json)["response"]?.ToString() ?? string.Empty;
            return answer;
        }
        catch (Exception ex)
        {
            _logger.LogError($"Error getting response: {ex.Message}");
            return $"Error: {ex.Message}";
        }
    }
}
7. Implement the RAG Service
Create the main service that orchestrates the entire RAG process:
public class RagService : IRagService
{
    private readonly ILogger<RagService> _logger;
    private readonly IDocumentProcessor _documentProcessor;
    private readonly IVectorStore _vectorStore;
    private readonly IEmbeddingService _embeddingService;
    private readonly IChatService _chatService;

    public RagService(
        ILogger<RagService> logger,
        IDocumentProcessor documentProcessor,
        IVectorStore vectorStore,
        IEmbeddingService embeddingService,
        IChatService chatService)
    {
        _logger = logger;
        _documentProcessor = documentProcessor;
        _vectorStore = vectorStore;
        _embeddingService = embeddingService;
        _chatService = chatService;
    }

    public async Task ProcessDocumentsAsync(string folderPath)
    {
        // Process all PDF files in the specified folder
        foreach (var filePath in Directory.GetFiles(folderPath, "*.pdf"))
        {
            var chunks = await _documentProcessor.ProcessDocumentAsync(filePath);
            foreach (var chunk in chunks)
            {
                await _vectorStore.AddDocumentAsync(chunk);
            }
        }
    }

    public async Task<string> GetAnswerAsync(string question)
    {
        // Get embedding for the question
        var questionEmbedding = await _embeddingService.GetEmbeddingsAsync(question);

        // Retrieve similar documents
        var similarDocuments = await _vectorStore.GetSimilarDocumentsAsync(questionEmbedding, 3);

        // Prepare context from similar documents (same pattern as shown earlier)
        var context = new StringBuilder();
        foreach (var doc in similarDocuments)
        {
            context.AppendLine($"From document: {doc.DocumentName}, chunk {doc.ChunkNumber}:");
            context.AppendLine(doc.Text);
            context.AppendLine();
        }

        // Get answer from LLM
        string answer = await _chatService.GetResponseAsync(question, context.ToString());
        return answer;
    }
}
8. Wire Everything Together
In your Program.cs, set up the dependency injection container and create the main application loop:
static async Task Main(string[] args)
{
    // Set up dependency injection
    var serviceProvider = new ServiceCollection()
        .AddLogging(configure => configure.AddConsole())
        .AddSingleton<IDocumentProcessor, PdfDocumentProcessor>()
        .AddSingleton<IVectorStore, SimpleVectorStore>()
        .AddSingleton<IEmbeddingService, OllamaEmbeddingService>()
        .AddSingleton<IChatService, OllamaChatService>()
        .AddSingleton<IRagService, RagService>()
        .BuildServiceProvider();

    var ragService = serviceProvider.GetRequiredService<IRagService>();

    // Process documents
    await ragService.ProcessDocumentsAsync("./Documents");

    // Interactive question loop
    while (true)
    {
        Console.Write("\nYour question (type 'exit' to quit): ");
        string question = Console.ReadLine() ?? string.Empty;
        if (question.ToLower() == "exit") break;

        string answer = await ragService.GetAnswerAsync(question);
        Console.WriteLine($"\nAnswer: {answer}");
    }
}
9. Set Up and Configure Ollama
Before running your application, you need to set up Ollama properly:
- Install Ollama from https://ollama.ai
- For macOS: Download and install the .dmg file
- For Windows: Download and run the installer
- For Linux: Use the install script
curl -fsSL https://ollama.com/install.sh | sh
- Start the Ollama service:
ollama serve
This will start the Ollama API server on http://localhost:11434
- Pull the required models (in a new terminal window):
# Pull the embedding model
ollama pull nomic-embed-text
# Pull the chat model
ollama pull llama3:8b
- Verify the models are correctly installed:
# List all available models
ollama list
You should see both "nomic-embed-text" and "llama3:8b" in the list.
- Test the embedding model:
# Test embedding generation
curl -X POST http://localhost:11434/api/embeddings -d '{
"model": "nomic-embed-text",
"prompt": "This is a test."
}' | head
You should see a JSON response with an "embedding" array containing vector values.
- Test the chat model:
# Test text generation
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama3:8b",
"prompt": "What is retrieval-augmented generation?",
"stream": false
}' | jq '.response'
You should receive a helpful response explaining RAG. If you don't have jq installed, you can omit the | jq '.response' part.
Additional Ollama commands that might be useful:
# See details about a specific model
ollama show nomic-embed-text
# Remove a model you no longer need
ollama rm modelname
# To stop the Ollama server, press Ctrl+C in the terminal running
# ollama serve (or quit the desktop app)
# If you need to update your models, pull them again
ollama pull nomic-embed-text
ollama pull llama3:8b
Note: Make sure Ollama is running at all times while using your RAG application. If you experience connection issues, check whether Ollama is running by visiting http://localhost:11434 in your browser or running curl http://localhost:11434.
10. Test Your Application
Build and run your application:
# Build the application
dotnet build
# Create a Documents directory if it doesn't exist
mkdir -p Documents
# Add some PDF documents to test with
# (You can use sample PDFs or technical documentation)
# Run the application
dotnet run

Place some PDF documents in the ./Documents folder and start asking questions related to the content!
Troubleshooting Common Issues
- Connection refused errors: Make sure Ollama is running with ollama serve
- Model not found errors: Check that your models are properly installed with ollama list
- Embedding dimension mismatch: Make sure you're using a consistent embedding model throughout your code (see the guard sketched after this list)
- Memory issues: If processing large PDFs, you might need to adjust the chunk size or use a smaller model
- No documents found: Check that your PDFs are correctly placed in the ./Documents folder
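If you want to fail fast on that dimension mismatch, a small guard at the top of SimpleVectorStore.GetSimilarDocumentsAsync (my addition, not in the original code) could look like this:

// Hypothetical guard, not in the original project: different embedding models
// produce vectors of different lengths, and cosine similarity breaks when the
// dimensions disagree.
if (_documents.Count > 0 && queryEmbedding.Length != _documents[0].Embedding.Length)
{
    throw new InvalidOperationException(
        $"Query embedding has {queryEmbedding.Length} dimensions but stored " +
        $"documents have {_documents[0].Embedding.Length}. Use the same " +
        "embedding model for indexing and querying.");
}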
Performance Optimization
For better performance, consider these tips:
- Cache embeddings to disk to avoid recalculating them each time (a minimal sketch follows this list)
- Use a more efficient similarity search algorithm such as Approximate Nearest Neighbors
- Add a background service to preprocess documents and update the vector store
- Adjust chunk size based on your specific documents and use cases
- Consider using a persistent vector database like Qdrant or Milvus for large document collections
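As a starting point for the first tip, here's a minimal disk-cache sketch using Newtonsoft.Json (already added during project setup); the file name and layout are my own assumptions:

// Requires: using Newtonsoft.Json;
// Minimal sketch: persist processed chunks once, reload them on later runs
// instead of re-extracting text and re-calling the embedding model.
public static class EmbeddingCache
{
    private const string CachePath = "./embedding_cache.json"; // assumed location

    public static void Save(List<DocumentChunk> chunks) =>
        File.WriteAllText(CachePath, JsonConvert.SerializeObject(chunks));

    public static List<DocumentChunk>? TryLoad() =>
        File.Exists(CachePath)
            ? JsonConvert.DeserializeObject<List<DocumentChunk>>(File.ReadAllText(CachePath))
            : null;
}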
Conclusion
The next time you're planning a RAG system or other LLM-powered application, don't automatically reach for Python without considering the benefits .NET might bring to your specific use case. As my implementation demonstrates, .NET provides a robust, performant foundation for building sophisticated AI applications with clean, maintainable code.
What are your thoughts? Have you tried building RAG systems with .NET, or are you considering it? I'd love to hear about your experiences in the comments below.
The complete source code for this .NET RAG console application is available on GitHub: https://github.com/encryptedtouhid/dotnet_console_rag_ollama
Feel free to check it out, contribute, or adapt it for your own projects.
Happy Coding 👨‍💻