In this series of posts, we have been building a desktop application to interface with LLaMA3. We did our setup in Part I and built our app’s backend in Part II. In Part III, the final part of this series, we are going to build a simple client interface with SvelteKit and run our desktop application.
Recap
If you have landed straight on this post, I’d highly recommend you take a quick peek at the last 2 parts (links below) to get up to speed with the progress.
The series:
Goals
With the backend logic in place, our goal is to bring the application to completion.
The final output should look like:
Who is this for?
You’ll feel right at home if you are a programmer, have some exposure to Rust, and have a bit of experience working with Svelte, React, or any other modern client-side framework.
Tools and Libraries
- Rust - Install Rust
- Tauri - A cross-platform desktop app toolkit built on Rust
- SvelteKit - For the quick and simple UI
- Llama.cpp - another micro-revolution in democratizing AI models, spearheaded by Georgi Gerganov
- llama_cpp-rs - a Rust library that provides simple, high-level bindings over llama.cpp
TL;DR
A simple interface
With our backend inference up and running, let’s focus our attention on the client interface. We’ll try to keep this as simple and bare-bones as possible, just enough to get our end-to-end flow working.
We are going to use the Tauri-provided invoke API from the client side to trigger our backend handler ask(). This API is available with the npm package @tauri-apps/api, so let’s go ahead and install it first.
npm install --save @tauri-apps/api
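The call shape is simple: invoke takes the name of a registered Rust command plus a JSON-serializable payload and returns a Promise. Here’s a quick, hedged illustration (the payload key text is an assumption and must match whatever your ask() handler from Part II expects; the import path is @tauri-apps/api/tauri on Tauri v1 and @tauri-apps/api/core on v2):

import { invoke } from "@tauri-apps/api/tauri";

// Trigger the `ask` command registered with Tauri in Part II.
// The payload key (`text`) is an assumption; it must match the Rust handler's argument name.
invoke("ask", { text: "What is the capital of France?" }).then((response) => {
  console.log(response);
});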
Our client-side interface is a single-page application; let’s code it up step by step.
Step 1: CSS override to give our client side some structure, layout and theme
We’ll add a very simple app.css to our instruct/src directory.
/* Some overrides */
html {
margin: 0;
padding: 0;
background-color: transparent !important;
}
body {
font-family: sans-serif !important;
background: rgb(48,48,48) url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAQAAAAECAIAAAAmkwkpAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAClJREFUeNpiFBQU5OXlZQADJmFhYQYYYPr16xeQ+vz5M4gDEYPIAwQYAJUPBptp0pKhAAAAAElFTkSuQmCC") repeat !important;
color: rgb(166,166,166) !important;
}
body * {
font-size: 1.2rem;
}
.flex {
display: flex;
}
.flex.flex-col {
flex-direction: column;
}
.flex.flex-row {
flex-direction: row;
}
.flex.center {
align-items: center;
}
.flex.justify {
justify-content: center;
}
input.input {
background: rgba(48, 48, 48, 0.5);
padding: 4px;
color: rgb(166,166,166) !important;
}
input.input.full {
width: 100%;
}
Then we’ll add a file instruct/src/routes/+layout.svelte, import the app.css file, and define a basic layout.
<script>
import "../app.css";
</script>
<div class="flex flex-row justify">
<slot/>
</div>
Step 2: We define some types to work with our QA format and inference. These live in our SvelteKit lib directory (instruct/src/lib), so we can import them as $lib/types later.
// Maintain metadata for an inference
export interface Meta {
n_tokens: number,
n_secs: number
}
// A simple type to hold our question, answer and meta
export interface QuestionAnswer {
q: string,
a: string,
ts: Date,
meta?: Meta
}
// A type to mirror the inference response from our backend
export interface Inference {
text: string,
meta?: Meta
}
Step 3: Create a component QA.svelte
An empty shell for now, but this component will render each question <> answer turn.
<script lang="ts">
import type { QuestionAnswer } from "$lib/types";
// the prop passed from the parent component
export let qa: QuestionAnswer;
</script>
<!-- Todo: we'll work on this soon -->
Step 4: Modify the +page.svelte to accept input and invoke the inference
This is the Svelte page that renders when we launch our application. Quite a lot is happening here, so let’s break it down; a sketch of the full page follows the breakdown.
- We start by importing the invoke API from the @tauri-apps package we installed. We also import the types we defined in Step 2 along with the shell component QA.svelte.
- We declare a bunch of variables to maintain the current state of our app.
- We then define a function goAsk which will be called on:keyup of the Enter key.
- In the HTML, we define a layout, a text input element with an on:keyup event binding, and a loop that renders the results with the QA component.
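Here is a minimal sketch of instruct/src/routes/+page.svelte along those lines. The command name ask comes from Part II; the payload key (text), the invoke import path (@tauri-apps/api/tauri on Tauri v1, @tauri-apps/api/core on v2), and the assumption that QA.svelte sits next to this page are all things to adjust to your own setup:

<script lang="ts">
  import { invoke } from "@tauri-apps/api/tauri";
  import type { Inference, QuestionAnswer } from "$lib/types";
  import QA from "./QA.svelte";

  // Current question text, a busy flag and the list of past turns
  let question = "";
  let asking = false;
  let turns: QuestionAnswer[] = [];

  // Called on:keyup of the input; fires the backend `ask()` command on Enter
  async function goAsk(e: KeyboardEvent) {
    if (e.key !== "Enter" || asking || !question.trim()) return;

    const qa: QuestionAnswer = { q: question, a: "__asking__", ts: new Date() };
    turns = [qa, ...turns];
    question = "";
    asking = true;

    try {
      // `text` is an assumed argument name; match it to the Rust handler from Part II
      const res = await invoke<Inference>("ask", { text: qa.q });
      qa.a = res.text;
      qa.meta = res.meta;
    } catch (err) {
      qa.a = `Error: ${err}`;
    }

    turns = turns; // reassign to trigger Svelte's reactivity
    asking = false;
  }
</script>

<div class="flex flex-col" style="width: 80%; gap: 12px; padding: 12px">
  <input
    class="input full"
    placeholder="Ask me anything ..."
    bind:value={question}
    on:keyup={goAsk}
  />
  {#each turns as qa}
    <QA {qa} />
  {/each}
</div>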
Step 5: Showing the result
Let’s go back to the QA.svelte component and modify it to show the response. Here’s the HTML part of QA.svelte that we changed:
<div class="flex flex-col qa">
<div class="flex flex-row caption">
<div class="time">
{qa.ts.toLocaleString()}
</div>
{#if qa.a == "__asking__"}
<div style="color: red">Thinking ...</div>
{/if}
{#if qa.meta}
<div class="flex flex-row" style="margin-left: auto; gap: 8px">
<div>Tokens: {qa.meta.n_tokens}</div>
<div>Time: {qa.meta.n_secs}s</div>
</div>
{/if}
</div>
<div class="question">
{qa.q}
</div>
{#if qa.a != "__asking__"}
<div class="answer">
{qa.a}
</div>
{/if}
</div>
This is straightforward; we are just showing the different variables from our response. The moment of truth … let’s run this app again:
RUST_LOG=info cargo tauri dev --release
And it works! Our end-to-end Desktop QA App.
Note: For those unfamiliar with the --release flag, long story short, it makes everything fast in Rust by enabling tons of compiler optimizations, at the cost of longer compile times.
Building
Our QA app is ready. But there’s no fun in running it with the cargo tauri dev command. Let’s build it as a desktop app.
Tauri makes this super simple; it’s in fact just a single line.
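For example, on an Apple Silicon Mac (the target triple here is just an assumption for illustration; pick the one that matches your machine and OS):

cargo tauri build --target aarch64-apple-darwin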
We are issuing a tauri CLI command to build, passing the classic Rust target triple.
Note: Adjust your target based on your CPU architecture and OS; see the Tauri Build Guide.
Output
Now, on a Mac, hit Cmd + Space to pull up Spotlight Search and search for our app instruct, the way you’d do for any of your Mac apps.
And there you go, you have built your own, personal, private QA application. Cheers to that … 🎉🎉
What next?
Here are some ideas for you to hack around …
- Waiting for the response after an instruction is not cool; figure out a way of streaming the response from the backend to the client. Hint: check out Events in Tauri (see the sketch after this list).
- Use a LLaVA model instead of our text-only LLaMA3 to make your instruction backend multi-modal.
- Convert this instruct-only format to a true chat assistant; use SQLite as a database to load and hop between chats.
- Implement a RAG pipeline to talk to your computer’s documents.
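For the streaming idea, here is a hedged sketch of the client side using Tauri’s event system. The event name token and its string payload are hypothetical; the Rust handler would need to emit matching events for this to do anything:

<script lang="ts">
  import { onMount } from "svelte";
  import { listen } from "@tauri-apps/api/event";

  let streamed = "";

  onMount(() => {
    // Hypothetical "token" event: the backend must emit one per generated piece of text
    const unlistenPromise = listen<string>("token", (event) => {
      streamed += event.payload; // append each piece as it arrives
    });
    // Clean up the listener when the component is destroyed
    return () => {
      unlistenPromise.then((unlisten) => unlisten());
    };
  });
</script>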
From here on, the possibilities are endless …
Before we wrap up …
I’d love to hear from you if you liked (or didn’t like) the series. If you found an issue or hit a snag, have some feedback, or just want to chat, reach out at @beingAnubhab. If you have found this series helpful, consider spreading the word; it would act as a strong motivator for me to create more.
Acknowledgements & reference
This project is built on the shoulders of stalwarts; a huge shout-out to all of them:
- Rust maintainers for this awesome language
- The Tauri framework and its creators and maintainers
- Meta for creating the LLaMA family of models and giving open-source AI a fair shot
- HuggingFace🤗 for everything they do
- Georgi Gerganov for creating the GGML/GGUF movement
- llama.cpp maintainers for moving at breakneck speed
- llama_cpp-rs maintainers for the awesome yet simple crate
- Quant Factory Team for the GGUF model files
And many, many more …