In this series of posts, we have been building a desktop application to interface with LLaMA3. We did our setup in Part I and built our app’s backend in Part II. In Part III, the final part of this series, we are going to build a simple client interface with SvelteKit and run our desktop application.

Recap
If you have landed straight on this post, I’d highly recommend taking a quick peek at the first two parts (links below) to get up to speed with the progress.

The series:

Goals

With the backend logic in place, our goal is to take our application to completion.

The final output should look like this: [final app screenshot]

Who is this for?

You’ll feel right at home if you are a programmer with some exposure to Rust and a bit of experience working with Svelte, React, or any other modern client-side framework.

Tools and Libraries

TL;DR

GitHub Repository

A simple interface

With our backend inference up and running, let’s focus our attention on the client interface. We’ll keep this as simple and bare-bones as possible, just enough to get our end-to-end flow working.

We are going to use the Tauri-provided invoke API from the client side to trigger our backend handler ask(). This API ships with the npm package @tauri-apps/api, so let’s go ahead and install it first.

npm install --save @tauri-apps/api
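To get a feel for the API before wiring up the full page, here is a minimal sketch of how invoke maps a client-side call to the ask() command we registered in Part II; the prompt string is just an illustrative value.

import { invoke } from '@tauri-apps/api/tauri';

// `invoke` serializes the arguments to JSON, calls the Rust command
// registered as "ask", and resolves with the command's return value
// (top-level await works here because this is an ES module)
const res = await invoke("ask", { text: "What is Rust?" });
console.log(res);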

Our client-side interface is a single-page application; let’s code it up step by step.

Step 1: CSS overrides to give our client some structure, layout, and theme

We’ll add a very simple app.css to our instruct/src directory.

instruct/src/app.css
/* Some overrides */
html {
    margin: 0;
    padding: 0;
    background-color: transparent !important;
}

body {
    font-family: sans-serif !important;
    background: rgb(48,48,48) url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAQAAAAECAIAAAAmkwkpAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAClJREFUeNpiFBQU5OXlZQADJmFhYQYYYPr16xeQ+vz5M4gDEYPIAwQYAJUPBptp0pKhAAAAAElFTkSuQmCC") repeat !important;
    color: rgb(166,166,166) !important;
}

body * {
    font-size: 1.2rem;
}

.flex {
    display: flex;
}

.flex.flex-col {
    flex-direction: column;
}

.flex.flex-row {
    flex-direction: row;
}

.flex.center {
    align-items: center;
}

.flex.justify {
    justify-content: center;
}

input.input {
    background: rgba(48, 48, 48, 0.5);
    padding: 4px;
    color: rgb(166,166,166) !important;
}

input.input.full {
    width: 100%;
}

Then we’ll add a file instruct/src/routes/+layout.svelte that imports app.css and defines a basic layout.

instruct/src/routes/+layout.svelte
<script>
    import "../app.css";
</script>

<div class="flex flex-row justify">
    <slot/>
</div>

Step 2: Define some types to work with our QA format and inference

instruct/src/lib/types.ts
// Maintain metadata for an inference
export interface Meta {
    n_tokens: number,
    n_secs: number
}

// A simple type to hold our question, answer and meta
export interface QuestionAnswer {
    q: string,
    a: string,
    ts: Date,
    meta?: Meta
}

// A type to mirror the inference response from our backend
export interface Inference {
    text: string,
    meta?: Meta
}
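As a quick illustration, here’s a value that satisfies these types; the numbers are made up, and the field names assume the serde serialization of our Part II response struct.

import type { Inference } from "$lib/types";

// Illustrative only: what a deserialized `ask` response could look like
const sample: Inference = {
    text: "Rust is a systems programming language ...",
    meta: { n_tokens: 128, n_secs: 4 }
};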

Step 3: Create a component QA.svelte

An empty shell for now, but this component will render each question-and-answer turn.

instruct/src/routes/QA.svelte
<script lang="ts">
    import type { QuestionAnswer } from "$lib/types";

    // the prop passed from the parent component
    export let qa: QuestionAnswer;
</script>

<!-- Todo: we'll work on this soon -->

Step 4: Modify the +page.svelte to accept input and invoke the inference

instruct/src/routes/+page.svelte
<script lang="ts">
    import { invoke } from '@tauri-apps/api/tauri';
    import type { Inference, QuestionAnswer } from "$lib/types";
    import Qa from "./QA.svelte";

    // Holds the list of questions and answers
    let qas: QuestionAnswer[] = [];
    // The current question to be asked
    let question: string,
    // A flag to track when an inference is in progress
        asking: boolean = false;

    const command = async () => {
        let res: Inference = await invoke("ask", { text: qas[qas.length - 1].q });

        let idx = qas.length - 1;
        let qa: QuestionAnswer = qas[idx];
        qa.a = res.text;
        qa.meta = res.meta;

        // Reassign to trigger Svelte's reactivity
        qas = [...qas];

        asking = false;
    }

    // A function to actually invoke a command
    const goAsk = async () => {
        asking = true;
        // We use the "__asking__" keyword as a placeholder while the answer is pending
        qas.push({ q: question, a: "__asking__", ts: new Date() });
        question = "";

        qas = [...qas];

        // Inference generation is extremely resource intensive; defer the
        // call to give our UI a chance to update first
        setTimeout(() => {
            command()
        }, 100)
    }
</script>

<div class="canvas flex flex-col">
    <div class="input">
        <input
            type="text"
            bind:value={question}
            on:keyup={(e) => { if(e.key == "Enter") goAsk() }}
            class="input full" placeholder="Ask your question!"
            disabled={asking}
        />
    </div>
    {#each [...qas].reverse() as qa}
        <Qa qa={qa}/>
    {/each}
</div>


<style>
.canvas {
    width: 90%;
    height: 100vh;
    padding: 24px;
    max-width: 2048px;
}
</style>

This is the Svelte page that renders when we launch our application. Quite a lot is happening here, so let’s break it down.

  • We start by importing the invoke API from the @tauri-apps/api package we installed. We also import the types we defined in Step 2 along with the shell component QA.svelte.
  • We declare a bunch of variables to maintain the current state of our app.
  • We then define a function goAsk which is called on:keyup of the Enter key. It defers the actual command() call with a small setTimeout so the UI can paint the pending state before the heavy inference starts (an alternative is sketched after this list).
  • In the markup, we define a layout, a text input element with an on:keyup event binding, and a loop that renders the results through the QA component.
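As an aside, the setTimeout is a pragmatic trick. A minimal alternative sketch using Svelte’s tick(), which resolves once pending state changes have been applied to the DOM, could look like this (same component state as above):

import { tick } from "svelte";

const goAsk = async () => {
    asking = true;
    qas = [...qas, { q: question, a: "__asking__", ts: new Date() }];
    question = "";

    // tick() resolves after Svelte flushes the pending state changes to
    // the DOM, so the "Thinking ..." row renders before the heavy call
    await tick();
    await command();
}

Note that tick() guarantees the DOM update but not necessarily a paint, so the original setTimeout can still be the safer bet.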

Step 5: Showing the result

Let’s go back to the QA.svelte component and modify it to show the response. Here’s the HTML part of QA.svelte that we changed:

instruct/src/routes/QA.svelte
<div class="flex flex-col qa">
    <div class="flex flex-row caption">
        <div class="time">
            {qa.ts.toLocaleString()}
        </div>
        {#if qa.a == "__asking__"}
        <div style="color: red">Thinking ...</div>
        {/if}
        {#if qa.meta}
        <div class="flex flex-row" style="margin-left: auto; gap: 8px">
            <div>Tokens: {qa.meta.n_tokens}</div>
            <div>Time: {qa.meta.n_secs}s</div>
        </div>
        {/if}
    </div>
    <div class="question">
        {qa.q}
    </div>
    {#if qa.a != "__asking__"}
    <div class="answer">
        {qa.a}
    </div>
    {/if}
</div>

This is straightforward; we are just rendering the different fields from our response. The moment of truth … let’s run the app again.

RUST_LOG=info cargo tauri dev --release

And it works! Our end-to-end desktop QA app. [First real inference]

Note
Those unfamiliar with the --release flag: long story short, it makes everything fast in Rust by enabling tons of compiler optimizations, at the cost of longer compile times.

Building

Our QA app is ready, but there’s no fun in running it with the cargo tauri dev command. Let’s build it as a proper desktop app.

Tauri makes this super simple; it’s in fact just a single line:

cargo tauri build --target aarch64-apple-darwin

We are issuing a Tauri CLI command to build, passing the classic Rust target triple.

Note
Adjust your target based on your CPU architecture and OS; see the Tauri Build Guide.
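For example, here are a few common target triples; pick the one that matches your machine:

# Intel Macs
cargo tauri build --target x86_64-apple-darwin

# Linux on x86_64 (GNU)
cargo tauri build --target x86_64-unknown-linux-gnu

# Windows on x86_64 (MSVC)
cargo tauri build --target x86_64-pc-windows-msvc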

Output

Now, on a Mac, hit Cmd + Space to pull up Spotlight Search and search for our app instruct, just as you would for any of your Mac apps.

And there you go: you have built your own personal, private QA application. Cheers to that … 🎉🎉

[final app screenshot]

What next?

Here are some ideas for you to hack around …

  1. Waiting for the full response after an instruction is not cool; figure out a way to stream the response from the backend to the client (see the sketch after this list).

    Hint
    Check out Events in Tauri

  2. Use a LLaVA model instead of our text-only LLaMA3 to make your instruction backend multi-modal.

  3. Convert this instruct-only format into a true chat assistant; use SQLite as a database to load and hop between chats.

  4. Implement a RAG pipeline to talk to your computer’s documents.

  5. From here on, the possibilities are endless …
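For idea 1, the client side of an event-based stream could look roughly like this minimal sketch; the "token" event name and its payload are assumptions, and the backend would need to emit matching events for every generated token.

import { listen } from '@tauri-apps/api/event';

// Inside onMount (or another async context): subscribe to a hypothetical
// "token" event that the backend emits per generated token
const unlisten = await listen("token", (event) => {
    // event.payload carries whatever the backend sends; append it to the
    // answer currently being rendered
    console.log(event.payload);
});

// Later, e.g. in onDestroy, call unlisten() to drop the subscription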

Before we wrap up …

I’d love to hear from you if you liked (or didn’t like) the series. If you found an issue, hit a snag, have some feedback, or just want to chat, reach out at @beingAnubhab. If you have found this series helpful, consider spreading the word; it would be a strong motivator for me to create more.

Acknowledgements & reference

This project is built on the shoulders of stalwarts; a huge shout-out to all of them:

  1. Rust maintainers for this awesome language
  2. The Tauri project and its creators and maintainers
  3. Meta for creating the LLaMA family of models and giving open-source AI a fair shot
  4. HuggingFace🤗 for everything they do
  5. Georgi Gerganov for creating the GGML/GGUF movement
  6. llama.cpp maintainers for moving at breakneck speed
  7. llama_cpp-rs maintainers for the awesome yet simple crate
  8. Quant Factory Team for the GGUF model files

And many, many more …