A .NET-Powered RAG Console Application with Ollama

If you've been following the explosion of Large Language Model (LLM) and Retrieval-Augmented Generation (RAG) applications, you've probably noticed a trend: the vast majority of tutorials, sample code, and applications are built in Python. While Python's prominence in the AI/ML space is well deserved thanks to its rich ecosystem, it's time we talked about a powerful alternative that's often overlooked in these conversations: .NET.

As someone who has worked extensively with both Python and .NET for AI applications, I'd like to share a recent project that showcases why .NET deserves more attention in the RAG system conversation.

A .NET-Powered RAG Console Application

I recently built a complete RAG system using .NET and Ollama, which lets you run a range of LLMs locally. The architecture is clean, the performance is excellent, and the developer experience is surprisingly smooth. Let me walk you through why I believe .NET deserves more recognition in this space.

5 Reasons .NET Shines for RAG and LLM Applications

1. Strong Type System and Developer Productivity

public interface IDocumentProcessor
{
    Task<List<DocumentChunk>> ProcessDocumentAsync(string filePath);
}

public class DocumentChunk
{
    public required string Text { get; set; }
    public required string DocumentName { get; set; }
    public int ChunkNumber { get; set; }
    public required float[] Embedding { get; set; }
}

One of the immediate benefits of .NET is its strong type system. The interface and class definitions above showcase how clean and self-documenting .NET code can be. As RAG systems grow more complex, having compiler-enforced type checking becomes increasingly valuable. This helps catch errors early in development rather than encountering them at runtime.
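
For instance, the required members above are enforced at compile time: forgetting to set an embedding is a build error, not a runtime surprise.

// Compiles: all required members are set
var chunk = new DocumentChunk
{
    Text = "some text",
    DocumentName = "sample.pdf",
    ChunkNumber = 1,
    Embedding = new float[] { 0.1f, 0.2f }
};

// Does not compile: the initializer omits the required members
// DocumentName and Embedding, so the compiler rejects it outright.
// var incomplete = new DocumentChunk { Text = "some text" };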

2. Dependency Injection Built Into the Framework

var serviceProvider = new ServiceCollection()
    .AddLogging(configure => configure.AddConsole())
    .AddSingleton<IDocumentProcessor, PdfDocumentProcessor>()
    .AddSingleton<IVectorStore, SimpleVectorStore>()
    .AddSingleton<IEmbeddingService, OllamaEmbeddingService>()
    .AddSingleton<IChatService, OllamaChatService>()
    .AddSingleton<IRagService, RagService>()
    .BuildServiceProvider();

The dependency injection container in .NET makes it incredibly easy to build modular, testable applications. In the code snippet above, we're registering all our services in the DI container, making them available throughout the application. This promotes loose coupling and makes it easy to swap implementations (for example, switching from Ollama to Azure OpenAI) without changing the consuming code.
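
For example, swapping the backing LLM provider is a change at the registration site only. The Azure-flavored class names below are hypothetical stand-ins to illustrate the point, not part of this project:

// Hypothetical swap: same interfaces, different provider
var services = new ServiceCollection();
// ... register logging, document processor, vector store, and RagService as before ...
services.AddSingleton<IEmbeddingService, AzureOpenAiEmbeddingService>(); // was OllamaEmbeddingService
services.AddSingleton<IChatService, AzureOpenAiChatService>();           // was OllamaChatService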

3. Async/Await Model for Handling I/O-Bound Operations

public async Task<string> GetAnswerAsync(string question)
{
    try
    {
        // Get embedding for the question
        var questionEmbedding = await _embeddingService.GetEmbeddingsAsync(question);
        
        // Retrieve similar documents
        var similarDocuments = await _vectorStore.GetSimilarDocumentsAsync(questionEmbedding, 3);
        
        // Concatenate the content of similar documents
        var context = new StringBuilder();
        foreach (var doc in similarDocuments)
        {
            context.AppendLine($"From document: {doc.DocumentName}, chunk {doc.ChunkNumber}:");
            context.AppendLine(doc.Text);
            context.AppendLine();
        }
        
        // Get answer from chat service
        string answer = await _chatService.GetResponseAsync(question, context.ToString());
        
        return answer;
    }
    catch (Exception ex)
    {
        return $"Error: {ex.Message}";
    }
}

The async/await pattern in .NET is elegant and powerful, making it simple to handle the I/O-bound operations common in RAG systems: API calls to the LLM service, database operations, and file system access. The code remains readable while efficiently using system resources.
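
Because every call returns a Task, fanning work out is trivial. Here's a sketch, assuming a textChunks list like the one in the document processor shown later (note that a local Ollama instance may still process the requests serially):

// Issue all embedding requests concurrently and await them together
// ('textChunks' is assumed to be a List<string> of pre-chunked text)
var tasks = textChunks.Select(chunk => _embeddingService.GetEmbeddingsAsync(chunk));
float[][] embeddings = await Task.WhenAll(tasks);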

4. Performance and Resource Efficiency

While Python is often lauded for its simplicity and rich ML ecosystem, .NET offers significant performance advantages:

  • JIT compilation for optimized execution
  • Efficient memory management
  • Highly optimized garbage collection
  • Superior thread handling for concurrent operations

For production RAG systems that need to handle many concurrent requests or operate within resource constraints, these advantages can be crucial. In my testing, the .NET implementation handled PDF processing and embedding generation with exceptional speed and minimal resource usage.

5. Enterprise-Ready Ecosystem

Many organizations already have significant investments in .NET infrastructure. Building RAG systems with .NET allows for:

  • Seamless integration with existing systems and authentication mechanisms
  • Familiar deployment patterns (Docker containers, Azure App Services, etc.)
  • Reuse of existing developer expertise
  • Mature libraries for logging, configuration, and monitoring

The Implementation Details

My .NET RAG implementation follows a clean architecture approach with clearly defined interfaces:

  • Document Processing: Extracts text from PDFs and chunks it into manageable pieces
  • Embedding Generation: Uses Ollama's embedding model to create vector representations
  • Vector Storage: A simple in-memory vector store with cosine similarity search
  • Chat Service: Integrates a local Llama 3 model (llama3:8b) to generate responses from the retrieved context

The entire system is wired together with dependency injection, making each component testable and replaceable.
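
Because every component sits behind an interface, tests can substitute fakes. A minimal sketch (this class is illustrative, not part of the repo):

// A fake embedding service that returns a fixed vector, letting RagService
// and SimpleVectorStore be exercised without a running Ollama instance
public class FakeEmbeddingService : IEmbeddingService
{
    public Task<float[]> GetEmbeddingsAsync(string text) =>
        Task.FromResult(new float[] { 1f, 0f, 0f });
}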

A Note on Ecosystem Maturity

It's fair to acknowledge that Python's ecosystem for machine learning and AI is more mature. Libraries like Hugging Face Transformers, LangChain, and LlamaIndex have established themselves as industry standards. However, the .NET ecosystem is rapidly evolving:

  • Microsoft's Semantic Kernel is being actively developed for .NET
  • The ML.NET framework continues to grow
  • Community-driven projects are filling gaps in the ecosystem

When to Choose .NET for Your RAG System

While I'm not suggesting that .NET should completely replace Python in the AI space, there are scenarios where it makes perfect sense:

  • When your organization already has .NET expertise
  • When your application needs to integrate with existing .NET systems
  • When performance and resource efficiency are critical
  • When you value strong typing and compile-time safety
  • When you're building enterprise applications with complex business logic

Step-by-Step Guide to Building Your Own .NET RAG System

Want to build your own .NET-powered RAG application? Here's a detailed walkthrough:

1. Project Setup

# Create a new console application
dotnet new console -n dotnet_console_rag_ollama
cd dotnet_console_rag_ollama

# Add necessary packages
dotnet add package Microsoft.Extensions.DependencyInjection
dotnet add package Microsoft.Extensions.Logging
dotnet add package Microsoft.Extensions.Logging.Console
dotnet add package itext7 # For PDF processing
dotnet add package Newtonsoft.Json

2. Define Your Interfaces

Start by creating clear interfaces that define the responsibilities of each component:

public interface IDocumentProcessor
{
    Task<List<DocumentChunk>> ProcessDocumentAsync(string filePath);
}

public interface IEmbeddingService
{
    Task<float[]> GetEmbeddingsAsync(string text);
}

public interface IVectorStore
{
    Task AddDocumentAsync(DocumentChunk document);
    Task<List<DocumentChunk>> GetSimilarDocumentsAsync(float[] queryEmbedding, int topK);
}

public interface IChatService
{
    Task<string> GetResponseAsync(string question, string context);
}

public interface IRagService
{
    Task ProcessDocumentsAsync(string folderPath);
    Task<string> GetAnswerAsync(string question);
}

3. Implement Document Processing

Create a PDF document processor that extracts text from PDFs and chunks it:

public class PdfDocumentProcessor : IDocumentProcessor
{
    private readonly ILogger<PdfDocumentProcessor> _logger;
    private readonly IEmbeddingService _embeddingService;
    private const int MaxChunkSize = 1000; // characters per chunk
    
    public PdfDocumentProcessor(ILogger<PdfDocumentProcessor> logger, IEmbeddingService embeddingService)
    {
        _logger = logger;
        _embeddingService = embeddingService;
    }
    
    public async Task<List<DocumentChunk>> ProcessDocumentAsync(string filePath)
    {
        _logger.LogInformation($"Processing PDF: {filePath}");
        
        // Extract text from PDF using iText7
        string extractedText = ExtractTextFromPdf(filePath);
        
        // Split text into chunks
        var textChunks = ChunkText(extractedText, MaxChunkSize);
        
        List<DocumentChunk> documentChunks = new List<DocumentChunk>();
        
        // Create document chunks with embeddings
        for (int i = 0; i < textChunks.Count; i++)
        {
            var embedding = await _embeddingService.GetEmbeddingsAsync(textChunks[i]);
            
            documentChunks.Add(new DocumentChunk
            {
                Text = textChunks[i],
                DocumentName = Path.GetFileName(filePath),
                ChunkNumber = i + 1,
                Embedding = embedding
            });
        }
        
        return documentChunks;
    }
    
    // ExtractTextFromPdf and ChunkText are sketched just after this listing
}
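
The two helpers elided above can be filled in along these lines. This is a minimal sketch, assuming iText7's PdfTextExtractor and naive fixed-size chunking; production code would split on sentence or paragraph boundaries instead:

// Requires: using iText.Kernel.Pdf; using iText.Kernel.Pdf.Canvas.Parser;
private string ExtractTextFromPdf(string filePath)
{
    var text = new StringBuilder();
    using (var pdfDoc = new PdfDocument(new PdfReader(filePath)))
    {
        // Extract the text of each page and concatenate
        for (int page = 1; page <= pdfDoc.GetNumberOfPages(); page++)
        {
            text.AppendLine(PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page)));
        }
    }
    return text.ToString();
}

private List<string> ChunkText(string text, int maxChunkSize)
{
    // Naive chunking: fixed-size character windows
    var chunks = new List<string>();
    for (int i = 0; i < text.Length; i += maxChunkSize)
    {
        chunks.Add(text.Substring(i, Math.Min(maxChunkSize, text.Length - i)));
    }
    return chunks;
}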

4. Implement Embedding Service (Ollama Integration)

Create a service that connects to Ollama to generate embeddings:

public class OllamaEmbeddingService : IEmbeddingService
{
    private readonly ILogger<OllamaEmbeddingService> _logger;
    private readonly HttpClient _httpClient;
    private const string EmbeddingModel = "nomic-embed-text";
    private const string OllamaBaseUrl = "http://localhost:11434/api";

    public OllamaEmbeddingService(ILogger<OllamaEmbeddingService> logger)
    {
        _logger = logger;
        _httpClient = new HttpClient();
    }

    public async Task<float[]> GetEmbeddingsAsync(string text)
    {
        try
        {
            var request = new
            {
                model = EmbeddingModel,
                prompt = text
            };

            var response = await _httpClient.PostAsJsonAsync(
                $"{OllamaBaseUrl}/embeddings", request);
            
            response.EnsureSuccessStatusCode();

            // Parse the JSON body with Newtonsoft's JObject (Newtonsoft.Json.Linq);
            // Ollama's /api/embeddings endpoint returns { "embedding": [ ... ] }
            var json = await response.Content.ReadAsStringAsync();
            var embeddings = JObject.Parse(json)["embedding"]!.ToObject<float[]>()!;
            return embeddings;
        }
        catch (Exception ex)
        {
            _logger.LogError($"Error getting embeddings: {ex.Message}");
            throw;
        }
    }
}

5. Implement Vector Store

Create a simple in-memory vector store with cosine similarity search:

public class SimpleVectorStore : IVectorStore
{
    private readonly List<DocumentChunk> _documents = new List<DocumentChunk>();
    private readonly ILogger<SimpleVectorStore> _logger;
    
    public SimpleVectorStore(ILogger<SimpleVectorStore> logger)
    {
        _logger = logger;
    }
    
    public Task AddDocumentAsync(DocumentChunk document)
    {
        _documents.Add(document);
        return Task.CompletedTask;
    }
    
    public Task<List<DocumentChunk>> GetSimilarDocumentsAsync(float[] queryEmbedding, int topK)
    {
        // Calculate cosine similarity for each document
        var similarities = _documents
            .Select(doc => new
            {
                Document = doc,
                Similarity = CosineSimilarity(queryEmbedding, doc.Embedding)
            })
            .OrderByDescending(x => x.Similarity)
            .Take(topK)
            .Select(x => x.Document)
            .ToList();
            
        return Task.FromResult(similarities);
    }
    
    private float CosineSimilarity(float[] vector1, float[] vector2)
    {
        // Dot product of the vectors divided by the product of their magnitudes
        float dot = 0f, mag1 = 0f, mag2 = 0f;
        for (int i = 0; i < vector1.Length; i++)
        {
            dot += vector1[i] * vector2[i];
            mag1 += vector1[i] * vector1[i];
            mag2 += vector2[i] * vector2[i];
        }
        return dot / (MathF.Sqrt(mag1) * MathF.Sqrt(mag2));
    }
}

6. Implement Chat Service

Create a service that sends prompts to Ollama's LLM:

public class OllamaChatService : IChatService
{
    private readonly ILogger<OllamaChatService> _logger;
    private readonly HttpClient _httpClient;
    private const string ChatModel = "llama3:8b";
    private const string OllamaBaseUrl = "http://localhost:11434/api";
    
    public OllamaChatService(ILogger<OllamaChatService> logger)
    {
        _logger = logger;
        _httpClient = new HttpClient();
    }
    
    public async Task<string> GetResponseAsync(string question, string context)
    {
        try
        {
            string systemPrompt = "You are a helpful assistant that answers questions based on the provided context.";
            string prompt = $"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:";
            
            var request = new
            {
                model = ChatModel,
                prompt = prompt,
                system = systemPrompt,
                stream = false
            };
            
            var response = await _httpClient.PostAsJsonAsync(
                $"{OllamaBaseUrl}/generate", request);
            
            response.EnsureSuccessStatusCode();

            // Parse the JSON body; with stream = false, /api/generate returns
            // the full completion in the "response" field
            var json = await response.Content.ReadAsStringAsync();
            string answer = JObject.Parse(json)["response"]?.ToString() ?? string.Empty;
            return answer;
        }
        catch (Exception ex)
        {
            _logger.LogError($"Error getting response: {ex.Message}");
            return $"Error: {ex.Message}";
        }
    }
}

7. Implement the RAG Service

Create the main service that orchestrates the entire RAG process:

public class RagService : IRagService
{
    private readonly ILogger<RagService> _logger;
    private readonly IDocumentProcessor _documentProcessor;
    private readonly IVectorStore _vectorStore;
    private readonly IEmbeddingService _embeddingService;
    private readonly IChatService _chatService;
    
    public RagService(
        ILogger<RagService> logger,
        IDocumentProcessor documentProcessor,
        IVectorStore vectorStore,
        IEmbeddingService embeddingService,
        IChatService chatService)
    {
        _logger = logger;
        _documentProcessor = documentProcessor;
        _vectorStore = vectorStore;
        _embeddingService = embeddingService;
        _chatService = chatService;
    }
    
    public async Task ProcessDocumentsAsync(string folderPath)
    {
        // Process all PDF files in the specified folder and index their chunks
        foreach (var filePath in Directory.GetFiles(folderPath, "*.pdf"))
        {
            var chunks = await _documentProcessor.ProcessDocumentAsync(filePath);
            foreach (var chunk in chunks)
            {
                await _vectorStore.AddDocumentAsync(chunk);
            }
            _logger.LogInformation($"Indexed {chunks.Count} chunks from {Path.GetFileName(filePath)}");
        }
    }
    
    public async Task<string> GetAnswerAsync(string question)
    {
        // Get embedding for the question
        var questionEmbedding = await _embeddingService.GetEmbeddingsAsync(question);
        
        // Retrieve similar documents
        var similarDocuments = await _vectorStore.GetSimilarDocumentsAsync(questionEmbedding, 3);
        
        // Prepare context from the similar documents (same pattern as in
        // the GetAnswerAsync listing shown earlier)
        var context = new StringBuilder();
        foreach (var doc in similarDocuments)
        {
            context.AppendLine($"From document: {doc.DocumentName}, chunk {doc.ChunkNumber}:");
            context.AppendLine(doc.Text);
            context.AppendLine();
        }
        
        // Get answer from LLM
        string answer = await _chatService.GetResponseAsync(question, context.ToString());
        
        return answer;
    }
}

8. Wire Everything Together

In your Program.cs, set up the dependency injection container and create the main application loop:

static async Task Main(string[] args)
{
    // Set up dependency injection
    var serviceProvider = new ServiceCollection()
        .AddLogging(configure => configure.AddConsole())
        .AddSingleton<IDocumentProcessor, PdfDocumentProcessor>()
        .AddSingleton<IVectorStore, SimpleVectorStore>()
        .AddSingleton<IEmbeddingService, OllamaEmbeddingService>()
        .AddSingleton<IChatService, OllamaChatService>()
        .AddSingleton<IRagService, RagService>()
        .BuildServiceProvider();
    
    var ragService = serviceProvider.GetRequiredService<IRagService>();
    
    // Process documents
    await ragService.ProcessDocumentsAsync("./Documents");
    
    // Interactive question loop
    while (true)
    {
        Console.Write("\nYour question (type 'exit' to quit): ");
        string question = Console.ReadLine() ?? string.Empty;
        
        if (question.ToLower() == "exit") break;
        
        string answer = await ragService.GetAnswerAsync(question);
        Console.WriteLine($"\nAnswer: {answer}");
    }
}

9. Set Up and Configure Ollama

Before running your application, you need to set up Ollama properly:

  1. Install Ollama from https://ollama.ai
    • For macOS: Download and install the .dmg file
    • For Windows: Download and run the installer
    • For Linux: Use the install script curl -fsSL https://ollama.com/install.sh | sh
  2. Start the Ollama service:
    ollama serve
    This will start the Ollama API server on http://localhost:11434
  3. Pull the required models (in a new terminal window):
    # Pull the embedding model
    ollama pull nomic-embed-text
    
    # Pull the chat model
    ollama pull llama3:8b
  4. Verify the models are correctly installed:
    # List all available models
    ollama list
    You should see both "nomic-embed-text" and "llama3:8b" in the list.
  5. Test the embedding model:
    # Test embedding generation
    curl -X POST http://localhost:11434/api/embeddings -d '{
      "model": "nomic-embed-text",
      "prompt": "This is a test."
    }' | head
    You should see a JSON response with an "embedding" array containing vector values.
  6. Test the chat model:
    # Test text generation
    curl -X POST http://localhost:11434/api/generate -d '{
      "model": "llama3:8b",
      "prompt": "What is retrieval-augmented generation?",
      "stream": false
    }' | jq '.response'
    You should receive a helpful response explaining RAG. If you don't have jq installed, you can omit the | jq '.response' part.

Additional Ollama commands that might be useful:

# See details about a specific model
ollama show nomic-embed-text

# Remove a model you no longer need
ollama rm modelname

# If you need to stop the Ollama service, press Ctrl+C in the terminal
# running 'ollama serve' (or quit the Ollama desktop app)

# If you need to update your models
ollama pull nomic-embed-text:latest
ollama pull llama3:8b

Note: Make sure Ollama is running at all times while using your RAG application. If you experience connection issues, check if Ollama is running by visiting http://localhost:11434 in your browser or running curl http://localhost:11434.

10. Test Your Application

Build and run your application:

# Build the application
dotnet build

# Create a Documents directory if it doesn't exist
mkdir -p Documents

# Add some PDF documents to test with
# (You can use sample PDFs or technical documentation)

# Run the application
dotnet run

Place some PDF documents in the ./Documents folder and start asking questions related to the content!

Troubleshooting Common Issues

  • Connection refused errors: Make sure Ollama is running with ollama serve
  • Model not found errors: Check if your models are properly installed with ollama list
  • Embedding dimension mismatch: Make sure you're using a consistent embedding model throughout your code
  • Memory issues: If processing large PDFs, you might need to reduce the chunk size or switch to a smaller model
  • No documents found: Check that your PDFs are correctly placed in the ./Documents folder

Performance Optimization

For better performance, consider these tips:

  • Cache embeddings to disk to avoid recalculating them on every run (see the sketch after this list)
  • Use a more efficient similarity search algorithm such as Approximate Nearest Neighbors
  • Add a background service to preprocess documents and update the vector store
  • Adjust chunk size based on your specific documents and use cases
  • Consider using a persistent vector database like Qdrant or Milvus for large document collections
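
As a minimal sketch of the first tip, here is an IEmbeddingService decorator that caches vectors on disk, keyed by a hash of the chunk text. It assumes the Newtonsoft.Json package the project already references; the class itself is illustrative, not part of the repo:

using System.Security.Cryptography;
using System.Text;
using Newtonsoft.Json;

public class CachedEmbeddingService : IEmbeddingService
{
    private readonly IEmbeddingService _inner;
    private readonly string _cacheDir;

    public CachedEmbeddingService(IEmbeddingService inner, string cacheDir = "./EmbeddingCache")
    {
        _inner = inner;
        _cacheDir = cacheDir;
        Directory.CreateDirectory(_cacheDir);
    }

    public async Task<float[]> GetEmbeddingsAsync(string text)
    {
        // Key each cache file on a hash of the text so identical chunks are reused
        string key = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));
        string path = Path.Combine(_cacheDir, key + ".json");

        if (File.Exists(path))
            return JsonConvert.DeserializeObject<float[]>(await File.ReadAllTextAsync(path))!;

        // Cache miss: ask the wrapped service (e.g. OllamaEmbeddingService) and persist
        var embedding = await _inner.GetEmbeddingsAsync(text);
        await File.WriteAllTextAsync(path, JsonConvert.SerializeObject(embedding));
        return embedding;
    }
}

Register it in the DI container as a wrapper around OllamaEmbeddingService and the rest of the application is unchanged.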

Conclusion

The next time you're planning a RAG system or other LLM-powered application, don't automatically reach for Python without considering the benefits .NET might bring to your specific use case. As my implementation demonstrates, .NET provides a robust, performant foundation for building sophisticated AI applications with clean, maintainable code.

What are your thoughts? Have you tried building RAG systems with .NET, or are you considering it? I'd love to hear about your experiences in the comments below.

The complete source code for this .NET RAG console application is available on my GitHub: https://github.com/encryptedtouhid/dotnet_console_rag_ollama


Feel free to check it out, contribute, or adapt it for your own projects.
Happy Coding 👨‍💻
