Exploring the new AI chat template

.Net

In this post I explore the new .NET AI Chat Web App template (currently in preview) to create a chat application and take a brief look at everything it provides. In the next post I then customize the app so that instead of ingesting PDFs, it ingests the contents of a website and uses that data to answer questions in the chat.

Getting started with the new .NET AI Chat Web App template

The .NET AI Chat Web App is a new template that shows how to get started building a chat style application backed by a large language model (LLM). Chat apps are one of the most prolific use cases for AI (obviously heavily popularised by ChatGPT), and while they’re not always the best way to “add AI” to your app, they can have their uses.

To install the AI template, you can run the following command

dotnet new install Microsoft.Extensions.AI.Templates

This installs the template, making the AI Chat Web App template available using name aichatweb:

> dotnet new install Microsoft.Extensions.AI.Templates The following template packages will be installed: Microsoft.Extensions.AI.Templates Success: Microsoft.Extensions.AI.Templates::9.4.0-preview.2.25216.9 installed the following templates: Template Name Short Name Language Tags

AI Chat Web App aichatweb [C#] Common/AI/Web/Blazor/.NET Aspire

The template includes various options to control how it works, but there are three main aspects to consider:

Do you want the app configured to use Aspire? Yes, of course you do 😉
Which LLM provider do you want to use to power the chat interface:
- GitHub Models. This is a great getting-started option, as it’s free for developers, and is what I use in this post.
- OpenAI. Uses the OpenAI API Platform.
- Azure OpenAI. Uses the Azure OpenAI Service.
- Ollama. Runs locally on your machine, using Ollama with the llama3.2 and all-minilm models.
Which vector embedding store do you want to use for ingesting the data:
- Local. Uses a JSON file on disk, which is great for prototyping.
- Azure AI Search. Uses Azure AI Search, which manages the data ingestion automatically.
- Qdrant. Runs the Qdrant vector database in a docker container.

If you’re new to LLMs and AI then that might all be a bit overwhelming, but there’s basically two different concepts to understand here:

The LLM provider is what provides the AI interface that powers the chat.
The Vector embedding is how the LLM model “ingests” data. For each document you provide it, the LLM provides an array of numbers that you store in a database (or in a JSON file in the local case). The LLM can then run queries against the database and find “similar” data.

For this post, I chose to use GitHub Models for the LLM provider, as it’s free and very easy to get up and running (as I’ll show shortly). For the vector store I chose to store the data locally.

These are the default values for the template for good reason, as they’re pretty much the quickest way to get up and running. You wouldn’t choose these options for production, but they’re ideal for prototyping.

To install the template, you can either use your IDE, or you can use the .NET CLI like so:

dotnet new aichatweb </span>
– output ModernDotNetShowChat
–provider githubmodels </span>
–vector-store local </span>
–aspire true

This creates a full solution consisting of:

A Web project containing the chat app
A “Service Defaults” project—A suggested best practice for creating Aspire applications these days
An “App Host” project—The Aspire host project, that wires up the dependencies

There’s also a .sln file you can open in your IDE:

The solution layout

Inside the solution folder is a README.md file that describes the remaining configuration. For our setup, there’s just one step we need to take: configuring GitHub Models.

Using GitHub Models

The README.md file contains instructions for getting started with GitHub Models:

To use models hosted by GitHub Models, you will need to create a GitHub personal access token. The token should not have any scopes or permissions. See Managing your personal access tokens.

GitHub Models is a service from GitHub that provides developers an easy way to prototype with LLMs and Generative AI. All that you need is a GitHub account, and you can be running against all the latest models from OpenAI and others without having to sign up to those services directly.

GitHub Models is strictly for “prototyping” so it comes with some hefty usage limits and content filters.

To get started with GitHub models you just need to choose a model and retrieve a token:

Go to github.com/marketplace/models.
Click Model: Select a Model at the top left of the page.
Choose a model from the dropdown menu.

After selecting a model, you’ll see the screen below. As you can see, you can get SDK details and see various other configuration options:

The getting started page from GitHub models

We don’t need any to worry about any of that SDK information, as that’s already handled by the .NET NuGet packages and the template. All you need is a personal access token (PAT):

Click Get developer key on the above screen
This redirects to GitHub’s token management page: https://github.com/settings/tokens
Click Generate new token, and generate a fine grained token (not classic)
Enter a name for the token, and an expiry, but do not add any permissions or roles.

After creating the token, you can add it as a secret to your application. You need to add the token as a connection string inside the Aspire AppHost project. You can do that using the IDE editor integration in Visual Studio or Rider, or you can use the command line. For example, for my app, I ran the following (replacing YOUR-API-KEY with the token value):

cd ModernDotNetShowChat.AppHost
dotnet user-secrets set ConnectionStrings:openai “Endpoint=https://models.inference.ai.azure.com;Key=YOUR-API-KEY”

As you can probably tell from the above setting, GitHub Models runs using Azure OpenAI Service, hence the reference to Azure in the connection string. If you choose a different LLM provider then these settings will be different.

With the secret added, everything is ready to take the template for a spin.

Briefly trying out the template

Before we dig in further, we’ll take the standard template for a spin.

You can read more about getting started in the Preview 2 announcement post for the template.

You run the app by running the Aspire AppHost project. This starts the web app (and passes in all the required connection strings). The web app then runs an “ingestion” process against 2 pdf files (about watches) that are available in the content folder. More on this later.

The web app is a “traditional” chat application, just like you’ve seen with ChatGPT or GitHub Copilot Chat. This interface lets you ask questions about the PDFs that were ingested. In the example below I asked the question “Which watches are available”:

Trying out the default template

The chat assistant interprets your question and decides what phrases to search for in the documents. It then answers your question based on the details it finds in the documents, and even provides a link to the file that contains the answer.

This general technique of providing “sources” for the LLM to use, instead of relying on the built-in knowledge is called retrieval-augmented generation (RAG), and is one way to try to ensure that the LLM provides answers grounded in facts. It involves ingesting source data, encoding it as vectors in a vector store, and making this store available to the LLM.

That’s pretty much all there is to the app, but it provides a powerful template that you can extend and modify to work for your own application. For the remainder of the post I look at a couple of points of interest about the template.

The Aspire App Host

We’ll start by looking at the Aspire App Host. This is where the general architecture of the app is defined, and which reveals that there are essentially three components:

The OpenAI (via GitHub models) connection
The Blazor web app
A SQLite database as a cache for the generated embeddings

You can see all this configured in the Program.cs file:

var builder = DistributedApplication.CreateBuilder(args);
var openai = builder.AddConnectionString(“openai”);
var ingestionCache = builder.AddSqlite(“ingestionCache”);
var webApp = builder.AddProject<Projects.ModernDotNetShowChat_Web>(“aichatweb-app”);
webApp.WithReference(openai);
webApp
.WithReference(ingestionCache)
.WaitFor(ingestionCache);
builder.Build().Run();

When you run the AppHost, Aspire initializes the SQLite database and starts the web app, passing in the connection strings.

The web app

The main application is a Blazor Server app. In addition to the standard Blazor and ASP.NET Core services, it contains three main components:

The GitHub Models/OpenAI chat client. This is the core infrastructure for interacting with the OpenAI service using the OpenAI NuGet package and Microsoft.Extensions.AI abstractions. This also registers an embedding generator for OpenAI for converting the ingested documents into vectors that can later be serialised.
The vector store is responsible for persisting the generated embeddings, and for providing the mechanism for OpenAI to search the store as required.
The ingestion cache is an EF Core DbContext that keeps track of which documents have been ingested into the vector store. This allows the app to avoid ingesting the same documents multiple times as well as to remove documents that are no longer available.

The configuration of these services all happens in the Program.cs file of the web app, as shown below. I haven’t reproduced the whole file here, just the configuration related to the above components:

var builder = WebApplication.CreateBuilder(args);
// Add the OpenAI chat client to the container
var openai = builder.AddAzureOpenAIClient(“openai”);
openai.AddChatClient(“gpt-4o-mini”) // Use the ChatGPT 4o mini model
.UseFunctionInvocation()  // Allow the LLM to call local functions in your app
.UseOpenTelemetry(configure: c => // Configure OTel for 
c.EnableSensitiveData = builder.Environment.IsDevelopment());
openai.AddEmbeddingGenerator(“text-embedding-3-small”); // Allow generating text embeddings
// Add an IVectorStore implementation that stores the embeddings in a JSON file
var vectorStore = new JsonVectorStore(Path.Combine(AppContext.BaseDirectory, “vector-store”));
builder.Services.AddSingleton<IVectorStore>(vectorStore);
builder.Services.AddScoped<DataIngestor>(); // Used to ingest embeddings
builder.Services.AddSingleton<SemanticSearch>(); // Used to search embeddings
// Add the EF Core DbContext for tracking which files have been ingested
builder.AddSqliteDbContext<IngestionCacheDbContext>(“ingestionCache”);

When the app starts, it ensures the SQLite database has been created, starts the web app, and then starts the data ingestion:

await DataIngestor.IngestDataAsync(
app.Services,
new PDFDirectorySource(Path.Combine(builder.Environment.WebRootPath, “Data”)));

Much of the chat application uses standard NuGet packages for interacting with the LLM, however the DataIngestor and PdfDirectorySource implementations are specific to the template, and show a general approach to generating text embeddings.

Ingesting data and generating embeddings

The DataIngestor implementation defined in the template manages the storing of text embedding vectors in an IVectorStore based on the implementation in an IIngestionSource, using the IngestionCacheDbContext to track which documents have been previously ingested.

The implementation, reproduced below, is pretty self-explanatory, but I’ve added a few extra comments for clarity:

public class DataIngestor(
ILogger<DataIngestor> logger,
IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator,
IVectorStore vectorStore,
IngestionCacheDbContext ingestionCacheDb)
{
public async Task IngestDataAsync(IIngestionSource source)
{
// Get or create the “collection” for holding the embeddings in the vector store
var vectorCollection = vectorStore.GetCollection<string, SemanticSearchRecord>(“data-moderndotnetshowchat-ingested”);
await vectorCollection.CreateCollectionIfNotExistsAsync();
    <span class="token comment">// Read which documents have already been ingested from the SQLite cache</span>
    <span class="token class-name"><span class="token keyword">var</span></span> documentsForSource <span class="token operator">=</span> ingestionCacheDb<span class="token punctuation">.</span>Documents
        <span class="token punctuation">.</span><span class="token function">Where</span><span class="token punctuation">(</span>d <span class="token operator">=&gt;</span> d<span class="token punctuation">.</span>SourceId <span class="token operator">==</span> source<span class="token punctuation">.</span>SourceId<span class="token punctuation">)</span>
        <span class="token punctuation">.</span><span class="token function">Include</span><span class="token punctuation">(</span>d <span class="token operator">=&gt;</span> d<span class="token punctuation">.</span>Records<span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token comment">// Ask the IIngestionSource for a list of files to delete</span>
    <span class="token class-name"><span class="token keyword">var</span></span> deletedFiles <span class="token operator">=</span> <span class="token keyword">await</span> source<span class="token punctuation">.</span><span class="token function">GetDeletedDocumentsAsync</span><span class="token punctuation">(</span>documentsForSource<span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token comment">// Delete the removed files from the IVectorStore and the SQLite cache</span>
    <span class="token keyword">foreach</span> <span class="token punctuation">(</span><span class="token class-name"><span class="token keyword">var</span></span> deletedFile <span class="token keyword">in</span> deletedFiles<span class="token punctuation">)</span>
    <span class="token punctuation">{</span>
        logger<span class="token punctuation">.</span><span class="token function">LogInformation</span><span class="token punctuation">(</span><span class="token string">"Removing ingested data for {file}"</span><span class="token punctuation">,</span> deletedFile<span class="token punctuation">.</span>Id<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">await</span> vectorCollection<span class="token punctuation">.</span><span class="token function">DeleteBatchAsync</span><span class="token punctuation">(</span>deletedFile<span class="token punctuation">.</span>Records<span class="token punctuation">.</span><span class="token function">Select</span><span class="token punctuation">(</span>r <span class="token operator">=&gt;</span> r<span class="token punctuation">.</span>Id<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        ingestionCacheDb<span class="token punctuation">.</span>Documents<span class="token punctuation">.</span><span class="token function">Remove</span><span class="token punctuation">(</span>deletedFile<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">await</span> ingestionCacheDb<span class="token punctuation">.</span><span class="token function">SaveChangesAsync</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token comment">// Ask the IIngestionSource for a list of new or modified files to ingest</span>
    <span class="token class-name"><span class="token keyword">var</span></span> modifiedDocs <span class="token operator">=</span> <span class="token keyword">await</span> source<span class="token punctuation">.</span><span class="token function">GetNewOrModifiedDocumentsAsync</span><span class="token punctuation">(</span>documentsForSource<span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token comment">// For each new/modified document:</span>
    <span class="token comment">// - Delete the embeddings if they already exist (changed files)</span>
    <span class="token comment">// - Generate the embeddings for the document</span>
    <span class="token comment">// - Save the embeddings in the IVectorStore</span>
    <span class="token comment">// - Record the updated status in the SQLite cache</span>
    <span class="token keyword">foreach</span> <span class="token punctuation">(</span><span class="token class-name"><span class="token keyword">var</span></span> modifiedDoc <span class="token keyword">in</span> modifiedDocs<span class="token punctuation">)</span>
    <span class="token punctuation">{</span>
        logger<span class="token punctuation">.</span><span class="token function">LogInformation</span><span class="token punctuation">(</span><span class="token string">"Processing {file}"</span><span class="token punctuation">,</span> modifiedDoc<span class="token punctuation">.</span>Id<span class="token punctuation">)</span><span class="token punctuation">;</span>

        <span class="token keyword">if</span> <span class="token punctuation">(</span>modifiedDoc<span class="token punctuation">.</span>Records<span class="token punctuation">.</span>Count <span class="token operator">&gt;</span> <span class="token number">0</span><span class="token punctuation">)</span>
        <span class="token punctuation">{</span>
            <span class="token keyword">await</span> vectorCollection<span class="token punctuation">.</span><span class="token function">DeleteBatchAsync</span><span class="token punctuation">(</span>modifiedDoc<span class="token punctuation">.</span>Records<span class="token punctuation">.</span><span class="token function">Select</span><span class="token punctuation">(</span>r <span class="token operator">=&gt;</span> r<span class="token punctuation">.</span>Id<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>

        <span class="token class-name"><span class="token keyword">var</span></span> newRecords <span class="token operator">=</span> <span class="token keyword">await</span> source<span class="token punctuation">.</span><span class="token function">CreateRecordsForDocumentAsync</span><span class="token punctuation">(</span>embeddingGenerator<span class="token punctuation">,</span> modifiedDoc<span class="token punctuation">.</span>Id<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">await</span> <span class="token keyword">foreach</span> <span class="token punctuation">(</span><span class="token class-name"><span class="token keyword">var</span></span> id <span class="token keyword">in</span> vectorCollection<span class="token punctuation">.</span><span class="token function">UpsertBatchAsync</span><span class="token punctuation">(</span>newRecords<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token punctuation">}</span>

        modifiedDoc<span class="token punctuation">.</span>Records<span class="token punctuation">.</span><span class="token function">Clear</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        modifiedDoc<span class="token punctuation">.</span>Records<span class="token punctuation">.</span><span class="token function">AddRange</span><span class="token punctuation">(</span>newRecords<span class="token punctuation">.</span><span class="token function">Select</span><span class="token punctuation">(</span>r <span class="token operator">=&gt;</span> <span class="token keyword">new</span> <span class="token constructor-invocation class-name">IngestedRecord</span> <span class="token punctuation">{</span> Id <span class="token operator">=</span> r<span class="token punctuation">.</span>Key<span class="token punctuation">,</span> DocumentId <span class="token operator">=</span> modifiedDoc<span class="token punctuation">.</span>Id <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

        <span class="token keyword">if</span> <span class="token punctuation">(</span>ingestionCacheDb<span class="token punctuation">.</span><span class="token function">Entry</span><span class="token punctuation">(</span>modifiedDoc<span class="token punctuation">)</span><span class="token punctuation">.</span>State <span class="token operator">==</span> EntityState<span class="token punctuation">.</span>Detached<span class="token punctuation">)</span>
        <span class="token punctuation">{</span>
            ingestionCacheDb<span class="token punctuation">.</span>Documents<span class="token punctuation">.</span><span class="token function">Add</span><span class="token punctuation">(</span>modifiedDoc<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span>

    <span class="token keyword">await</span> ingestionCacheDb<span class="token punctuation">.</span><span class="token function">SaveChangesAsync</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    logger<span class="token punctuation">.</span><span class="token function">LogInformation</span><span class="token punctuation">(</span><span class="token string">"Ingestion is up-to-date"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
}

I won’t dive into the IIngestionSource implementation in this post, as I’ll take a closer look at an alternative implementation in the next post. At a high level, the PDFDirectorySource:

Reads the list of .pdf files available in the source directory.
Uses the last write time for the file to determine whether it has been modified or not.
Uses the PdfPig library to read the text of the PDF document.
Generates an embedding for each paragraph, of each file.

The chat flow and embeddings

So how does this all come together?

The core of the implementation is in the Chat.razor component. This component configures the IChatClient with a system prompt and a tool/function invocator which the LLM can use to search the local embeddings by invoking SearchAsync().

The system prompt and tool are provided as follows:

private const string SystemPrompt = @"
You are an assistant who answers questions about information you retrieve.
Do not answer questions about anything else.
Use only simple markdown to format your responses.
Use the search tool to find relevant information. When you do this, end your
reply with citations in the special XML format:

&lt;citation filename='string' page_number='number'&gt;exact quote here&lt;/citation&gt;

Always include the citation in your response if there are results.

The quote must be max 5 words, taken word-for-word from the search result, and is the basis for why the citation is relevant.
Don't refer to the presence of citations; just emit these tags right at the end, with no surrounding text.
"</span><span class="token punctuation">;</span>
private readonly ChatOptions chatOptions = new();
private readonly List<ChatMessage> messages = new();
protected override void OnInitialized()
{
messages.Add(new(ChatRole.System, SystemPrompt));
chatOptions.Tools = [AIFunctionFactory.Create(SearchAsync)];
}[Description(“Searches for information using a phrase or keyword”)]
private async Task<IEnumerable<string>> SearchAsync(
[Description(“The phrase to search for.")] string searchPhrase,
[Description(“If possible, specify the filename to search that file only. If not provided or empty, the search includes all files.")] string? filenameFilter = null)
{
await InvokeAsync(StateHasChanged);
IReadOnlyList<SemanticSearchRecord> results = await Search.SearchAsync(searchPhrase, filenameFilter, maxResults: 5);
return results.Select(result =>
$"<result filename="{result.FileName}" page_number="{result.PageNumber}">{result.Text}</result>");
}

The system prompt here is interesting; it shows how the prompt tries very hard to restrict the LLM to only producing facts based on the files provided rather than the inherent “knowledge” it has. From what I’ve seen from my testing, this seems to work pretty well!

That’s as far as I’m going to go in this post. In the next post I describe an experiment which starts from this template and modifies it to ingest data from a website instead, so that you can chat and ask questions about the website instead.

Summary

In this post I introduced the new .NET AI Chat Web App template (currently in preview) and showed the default experience of chatting about PDF files. I then described the basic mechanics of the template, and showed some of the code and services around the core features of data ingestion and embedding generation. In the next post I show how you can modify the template to ingest data from a website instead.