Part 2: Desktop App for Document QA with RAG - Vector Storage
A DIY style step-by-step guide to building your own cutting-edge GenAI-powered document QA desktop app with RAG. In Part 2 of this series we create our own mini Vector Store.
August 15, 2024 · 21 min · 4443 words
In this series we are building our own local Retrieval Augmented Generation (RAG) desktop application for Document QA.
We got our Embedding flow working along with some smart text-splitting techniques in our previous post. In this post we are building a toy (but working) Vector Database to store, search and retrieve data based on the document embeddings.
A vector store is a specialized database designed to efficiently store, search, and manage high-dimensional vectors, such as embeddings. The primary purpose of a vector store is to enable fast and accurate similarity searches, such as:
Nearest neighbor searches
Approximate nearest neighbor searches
Range searches
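To make the baseline concrete before we optimize anything, here’s a minimal brute-force sketch (a hypothetical helper, not part of our app): score every stored vector against the query and keep the top k. An ANN index exists to beat this exact but O(n·d)-per-query loop at scale.

// Hypothetical helper: exact top-k search by scoring every vector against the query.
// Assumes normalized embeddings, so the dot product works as a similarity measure.
fn top_k_brute_force(query: &[f32], vectors: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| {
            let score = query.iter().zip(v.iter()).map(|(a, b)| a * b).sum::<f32>();
            (i, score)
        })
        .collect::<Vec<_>>();
    // Highest similarity first
    scored.sort_unstable_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}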
Ideally, a vector store or vector database should have the following features:
High-performance indexing: Efficient indexing of high-dimensional vectors for fast search and retrieval.
Scalability: Ability to handle large volumes of vectors and scale horizontally.
Low-latency queries: Fast query response times, even with high-dimensional vectors.
Data management: Support for vector CRUD operations.
Vector stores come in many shapes and forms, from open source to fully managed, from embedded stores to distributed databases. Some well-known battle-tested databases have also started to offer vector storage functionality. I’ll just mention a few common databases I’ve worked with, evaluated, or know of. The decision will ultimately be governed by use case, resources and familiarity.
For specialized vector stores take a look at Faiss, Pinecone, Milvus, Qdrant or Chroma DB. I’ve always found the simplicity of Spotify’s Annoy fascinating. Large battle-tested databases like MongoDB Atlas or Postgres with pgvector also provide vector storage functionality. This list is non-exhaustive.
Note
You might find this article about vector databases helpful.
Now, that brings us to the next BIG decision: which one do we use? We need something small and nimble, yet fast. Since our end goal is to perform local RAG, ideally we’d like to avoid an elaborate setup, Docker containers, or a dedicated server running.
I’d say none of them; let’s build our own bare-minimum vector store. All we need is a reasonably efficient way to look up approximate nearest neighbors from the embeddings we generate, and, hopefully, the process of building one ends up demystifying the fundamentals of efficient vector search for us.
Warning
Our mini vector store is highly experimental and should not be used for production.
Alas, not quite the from-scratch implementation I was gunning for. An obligatory Google search on vector search in Rust led me to this wonderful article by the Fennel AI folks, where they provide a nice, intuitive explanation and a Rust implementation of Approximate Nearest Neighbor search based on Locality Sensitive Hashing.
Recommendation
Please give this article a read to understand the underlying concepts. Here’s the code associated with the article that we are going to be building on.
I’ll just go ahead and modify this implementation to work with Candle Tensor instead of Vec<f32>. We’ll use rayon for parallelization and the dashmap crate for fast concurrent hashmaps, so don’t forget to add them to your dependencies in src-tauri/Cargo.toml.
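For reference, the additions would look something like this (version numbers are indicative, not prescriptive; rand is also needed for the random hyperplane sampling below):

# src-tauri/Cargo.toml (excerpt)
[dependencies]
# .. existing dependencies ..
rayon = "1.10"
dashmap = "6"
rand = "0.8"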
Let’s create a new file src-tauri/src/ann.rs and declare it in our src-tauri/src/lib.rs.
We’ll start by defining the structs to represent the HyperPlane and the inner Nodes of the index.
Because we are using Candle Tensor instead of Vec<f32>, we have done away with the const generic parameter N across the implementation. Candle Tensor also comes with standard tensor ops, making custom implementations of vector add, sub, mul, etc. redundant.
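Here’s a minimal sketch of those types, reconstructed from how they are used in the ANNIndex implementation below; treat the exact shapes and visibility as assumptions:

pub struct HyperPlane {
    coefficients: Tensor,
    constant: f32,
}

impl HyperPlane {
    // Which side of the plane a point falls on is the sign of n·x relative to the
    // plane constant; broadcast_mul lets this work for (dim) and (1, dim) shaped points
    pub fn point_is_above(&self, point: &Tensor) -> candle_core::Result<bool> {
        Ok(self
            .coefficients
            .broadcast_mul(point)?
            .sum_all()?
            .to_scalar::<f32>()?
            >= self.constant)
    }
}

enum Node {
    Inner(Box<InnerNode>),
    Leaf(Box<LeafNode>),
}

struct LeafNode(Vec<usize>);

struct InnerNode {
    hyperplane: HyperPlane,
    left_node: Node,
    right_node: Node,
}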
Let’s define the struct ANNIndex and its methods for indexing and search.
pub struct ANNIndex {
    trees: Vec<Node>,
    ids: Vec<usize>,
    values: Tensor,
}

impl ANNIndex {
    fn build_hyperplane(
        indexes: &[usize],
        all_vecs: &Tensor,
    ) -> candle_core::Result<(HyperPlane, Vec<usize>, Vec<usize>)> {
        // Choosing random indices
        let sample = indexes
            .choose_multiple(&mut rand::thread_rng(), 2)
            .collect::<Vec<_>>();

        // cartesian eq for hyperplane n * (x - x_0) = 0
        // n (normal vector) is the coefs x_1 to x_n
        let (a, b) = (*sample[0], *sample[1]);
        let coefficients = (all_vecs.i(b)? - &all_vecs.i(a)?)?;

        // Calculate the midpoint
        let point_on_plane = ((all_vecs.i(b)? + &all_vecs.i(a)?)? / 2.)?;

        // And build the hyperplane
        let constant = coefficients
            .mul(&point_on_plane)?
            .sum(0)?
            .to_scalar::<f32>()?;
        let hyperplane = HyperPlane {
            coefficients,
            constant,
        };

        // Classify the vectors as being above or below
        let (mut above, mut below) = (vec![], vec![]);
        for &id in indexes.iter() {
            if hyperplane
                .point_is_above(&all_vecs.i(id)?)
                .map_or(false, |d| d)
            {
                above.push(id)
            } else {
                below.push(id)
            };
        }

        Ok((hyperplane, above, below))
    }

    fn build_a_tree(
        max_size: usize,
        indexes: &[usize],
        data: &Tensor,
    ) -> candle_core::Result<Node> {
        // If the number of elements in a tree has reached the configurable `max size of node` then move on
        if indexes.len() <= max_size {
            return Ok(Node::Leaf(Box::new(LeafNode(indexes.to_vec()))));
        }

        // Recursively keep constructing the tree
        let (plane, above, below) = Self::build_hyperplane(indexes, data)?;
        let node_above = Self::build_a_tree(max_size, &above, data)?;
        let node_below = Self::build_a_tree(max_size, &below, data)?;

        Ok(Node::Inner(Box::new(InnerNode {
            hyperplane: plane,
            left_node: node_below,
            right_node: node_above,
        })))
    }

    fn tree_result(query: &Tensor, n: usize, tree: &Node, candidates: &DashSet<usize>) -> usize {
        // take everything in node, if still needed, take from alternate subtree
        match tree {
            Node::Leaf(box_leaf) => {
                let leaf_values = &(box_leaf.0);
                let num_candidates_found = n.min(leaf_values.len());
                leaf_values
                    .iter()
                    .take(num_candidates_found)
                    .for_each(|&c| {
                        candidates.insert(c);
                    });
                num_candidates_found
            }
            Node::Inner(inner) => {
                let above = inner.hyperplane.point_is_above(query).map_or(false, |d| d);
                let (main, backup) = match above {
                    true => (&inner.right_node, &inner.left_node),
                    false => (&inner.left_node, &inner.right_node),
                };

                match Self::tree_result(query, n, main, candidates) {
                    k if k < n => k + Self::tree_result(query, n - k, backup, candidates),
                    k => k,
                }
            }
        }
    }

    /// API for building the index
    pub fn build_index(
        num_trees: usize,
        max_size: usize,
        data: &Tensor,
        vec_ids: &[usize],
    ) -> candle_core::Result<Self> {
        // Trees hold an index into the [unique_vecs] list which is not
        // necessarily its id, if duplicates existed
        let trees = (0..num_trees)
            .map(|_| Self::build_a_tree(max_size, vec_ids, data).unwrap())
            .collect::<Vec<_>>();

        Ok(Self {
            trees,
            ids: vec_ids.to_owned(),
            values: data.to_owned(),
        })
    }

    /// API for search
    pub fn search_approximate(
        &self,
        query: &Tensor,
        top_k: usize,
        cutoff: Option<f32>,
    ) -> candle_core::Result<Vec<(usize, f32)>> {
        let candidates = DashSet::new();

        // Find approximate matches
        self.trees.par_iter().for_each(|tree| {
            Self::tree_result(query, top_k, tree, &candidates);
        });

        // Create lists for return
        let mut approx = vec![];
        let mut idxs = vec![];
        candidates
            .into_iter()
            .filter_map(|i| self.values.get(i).ok().map(|t| (i, t)))
            .for_each(|(idx, t)| {
                approx.push(t);
                idxs.push(idx);
            });

        // Sort by distance: first stack the candidate vectors into a single
        // (num_candidates, dim) tensor, then score them all with one matmul
        let cutoff = cutoff.map_or(0., |c| c);
        let approx = Tensor::stack(&approx, 0)?;
        let mut res = idxs
            .into_iter()
            .zip(
                query
                    .matmul(&approx.t()?)?
                    .squeeze(0)?
                    .to_vec1::<f32>()?,
            )
            .filter_map(|(idx, d)| {
                if d >= cutoff {
                    Some((self.ids[idx], d))
                } else {
                    None
                }
            })
            .collect::<Vec<_>>();
        res.par_sort_unstable_by(|a, b| b.1.total_cmp(&a.1));

        Ok(res[0..top_k.min(res.len())].to_vec())
    }
}
We’ve introduced a new control parameter, cutoff, which, if passed, drops candidates scoring below the given value.
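To make the API concrete, here’s a hypothetical usage sketch; ann_demo, the dimensions and the tree parameters are all made up for illustration:

use candle_core::{Device, IndexOp, Tensor};

fn ann_demo() -> candle_core::Result<()> {
    // 100 random 8-dimensional vectors stand in for real embeddings
    let data = Tensor::rand(-1_f32, 1., (100, 8), &Device::Cpu)?;
    let ids = (0..100_usize).collect::<Vec<_>>();

    // 4 trees, at most 8 vectors per leaf
    let index = ANNIndex::build_index(4, 8, &data, &ids)?;

    // Queries are shaped (1, dim) since search matmuls them against the stacked candidates
    let query = data.i(42)?.unsqueeze(0)?;
    let hits = index.search_approximate(&query, 5, Some(0.))?;
    println!("{hits:?}"); // Vec<(id, score)>, best match first

    Ok(())
}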
Our storage strategy is simple: we’ll maintain everything we need to index and search the data in three files.
A .store file for the serialized struct Store; this acts like a master lookup and serves as the primary entry point for our search and indexing. I’ll use bincode for this, but it could be any serialization format.
An embed.data file would store the embeddings Tensor as safetensors. On application init, the ANNIndex would be built from this data.
A text.data file would save the chunked text corresponding to the tensors. We’ll maintain the starting position and length of each text chunk in struct Store; that way, when we want to find a particular chunk, we can seek to its start position and read length bytes to retrieve it, as sketched just after this list.
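Here’s that retrieval mechanic as a tiny standalone sketch (read_chunk is a hypothetical helper, not the app’s API): one seek to the recorded offset, one exact read of the recorded length.

use std::fs::File;
use std::io::{BufReader, Read, Seek, SeekFrom};

// Hypothetical helper illustrating the lookup described above
fn read_chunk(reader: &mut BufReader<File>, start: u64, length: usize) -> std::io::Result<String> {
    reader.seek(SeekFrom::Start(start))?;
    let mut buf = vec![0_u8; length];
    reader.read_exact(&mut buf)?;
    // The chunks were written as UTF-8 text
    String::from_utf8(buf).map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))
}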
Let’s create a new file src-tauri/src/store.rs and declare it in src-tauri/src/lib.rs.
// .. imports omitted ..

const EMBED_FILE: &str = "embed.data";
const TEXT_FILE: &str = "text.data";
const STORE_FILE: &str = ".store";
const EMBED_TENSOR_NAME: &str = "embed";

/// The store would represent data that is indexed.
/// Upon initiation it'd read (or create) a .store, text.data, embed.data file
/// In `text.data` file we'll maintain bytes of text split into embedding chunks. The start index byte, the length of the chunk and some more metadata will be maintained
/// In `embed.data` file we'll maintain byte representations of the tensor, one per each segment.
/// `.store` file will maintain a bincode serialized representation of the `struct Store`
#[derive(Serialize, Deserialize, Default)]
pub struct Store {
    data: Vec<Data>,
    dir: PathBuf,
    text_size: usize,
    #[serde(skip)]
    data_file: Option<Arc<Mutex<BufReader<File>>>>,
    #[serde(skip)]
    index: Option<ANNIndex>,
    #[serde(skip)]
    bm25: Option<SearchEngine<usize>>,
}

/// In `struct Data` we maintain the source file of the data along with the chunk of text being indexed
#[derive(Serialize, Deserialize, Debug)]
pub struct Data {
    file: FileKind,
    start: usize,
    length: usize,
    deleted: bool,
    indexed: bool,
}

impl Data {
    pub fn file(&self) -> &FileKind {
        &self.file
    }
}

#[derive(Serialize, Deserialize, Clone, Debug, PartialEq)]
pub enum FileKind {
    Html(PathBuf),
    Pdf((PathBuf, usize)),
    Text(PathBuf),
}
Ignore the bm25 field for now; we’ll revisit it when we work on the retrieval strategies.
Time to code up the APIs to load, store and create the indices.
// code omitted

impl Store {
    /// Loads data and initializes indexes if files are present in the given directory or creates them
    pub fn load_from_file(
        dir: &Path,
        num_trees: Option<usize>,
        max_size: Option<usize>,
    ) -> Result<Self> {
        let storefile = dir.join(STORE_FILE);
        let store = if storefile.is_file() {
            let mut store = fs::File::open(storefile)?;
            let mut buf = vec![];
            store.read_to_end(&mut buf)?;

            let mut store = bincode::deserialize::<Store>(&buf)?;
            store.data_file = Some(Arc::new(Mutex::new(BufReader::new(File::open(
                dir.join(TEXT_FILE),
            )?))));

            if let Ok(m) = dir.join(EMBED_FILE).metadata() {
                if m.len() > 0 {
                    store.build_index(num_trees.map_or(16, |n| n), max_size.map_or(16, |sz| sz))?;
                }
            }

            store
        } else {
            fs::File::create_new(storefile)?;
            fs::File::create_new(dir.join(EMBED_FILE))?;
            fs::File::create_new(dir.join(TEXT_FILE))?;

            let mut store = Self {
                dir: dir.to_path_buf(),
                ..Default::default()
            };
            store.data_file = Some(Arc::new(Mutex::new(BufReader::new(File::open(
                dir.join(TEXT_FILE),
            )?))));
            store.save()?;

            store
        };

        Ok(store)
    }

    /// Accepts an iterator of (text, embeddings of text, FileKind) and adds it to the indexed list
    pub fn insert(
        &mut self,
        text_file: &mut File,
        data: &mut impl Iterator<Item = (String, Tensor, FileKind)>,
    ) -> Result<()> {
        // Initialize with a large capacity to avoid frequent re-allocation
        let mut txt_data = Vec::with_capacity(4096);
        let mut tensors = Vec::with_capacity(64);

        while let Some((txt, tensor, file)) = data.by_ref().next() {
            let txt = txt.as_bytes();
            let data = Data {
                file,
                start: self.text_size,
                length: txt.len(),
                deleted: false,
                indexed: true,
            };

            tensors.push(tensor);
            // Append in place instead of re-allocating a new Vec per chunk
            txt_data.extend_from_slice(txt);
            self.data.push(data);
            self.text_size += txt.len();
        }

        Tensor::stack(&tensors, 0)?
            .save_safetensors(EMBED_TENSOR_NAME, self.dir.join(EMBED_FILE))?;
        text_file.write_all(&txt_data)?;
        text_file.sync_all()?;

        self.save()?;

        Ok(())
    }

    /// Serializes and saves the core struct
    pub fn save(&self) -> Result<()> {
        let storebytes = bincode::serialize(&self)?;
        std::fs::write(self.dir.join(STORE_FILE), &storebytes)?;

        Ok(())
    }

    // We break the index builders into separate functions so that the ANN and BM25 indexes can be built in parallel later
    fn build_index(&mut self, num_trees: usize, max_size: usize) -> Result<()> {
        let ann = Self::build_ann(
            &self.dir.join(EMBED_FILE),
            num_trees,
            max_size,
            self.data.len(),
        )?;
        self.index = Some(ann);

        Ok(())
    }

    // Builds the ANN index
    fn build_ann(
        file: &Path,
        num_trees: usize,
        max_size: usize,
        data_len: usize,
    ) -> Result<ANNIndex> {
        let tensors = unsafe {
            safetensors::MmapedSafetensors::new(file)?
                .load(EMBED_TENSOR_NAME, &candle_core::Device::Cpu)?
        };

        let ann = ANNIndex::build_index(
            num_trees,
            max_size,
            &tensors,
            &(0..data_len).collect::<Vec<_>>(),
        )?;

        Ok(ann)
    }
}
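Two small helpers that the test and search below rely on aren’t shown in the listing above: files(), which hands back writable handles to the data files, and chunk(), essentially the read_chunk sketch from earlier as a Store method. Their real definitions aren’t in this post, so treat this as an assumed reconstruction (using anyhow-style errors):

use std::fs::{File, OpenOptions};
use std::io::{Read, Seek, SeekFrom};
use anyhow::anyhow;

impl Store {
    // Assumed helper: append-mode handles to `text.data` and `embed.data`
    pub fn files(&self) -> Result<(File, File)> {
        let text_file = OpenOptions::new()
            .append(true)
            .open(self.dir.join(TEXT_FILE))?;
        let embed_file = OpenOptions::new()
            .append(true)
            .open(self.dir.join(EMBED_FILE))?;

        Ok((text_file, embed_file))
    }

    // Assumed helper: recover a chunk's text with a seek and an exact read
    pub fn chunk(&self, data: &Data) -> Result<String> {
        let mut df = self
            .data_file
            .as_ref()
            .ok_or_else(|| anyhow!("store not initialized"))?
            .lock()
            .map_err(|_| anyhow!("couldn't lock the data file"))?;
        df.seek(SeekFrom::Start(data.start as u64))?;

        let mut buf = vec![0_u8; data.length];
        df.read_exact(&mut buf)?;

        Ok(String::from_utf8(buf)?)
    }
}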
That should suffice, let’s verify that our storage behaves as expected!
#[cfg(test)]
mod tests {
    // imports omitted

    #[derive(Deserialize)]
    struct WikiNews {
        title: String,
        text: String,
    }

    #[test]
    fn storage_init() -> Result<()> {
        let data = {
            let data = std::fs::read_to_string("../test-data/wiki-smoll.json")?;
            serde_json::from_str::<Vec<WikiNews>>(&data)?
        };

        let tdir = tempdir::TempDir::new("storetest")?;
        let mut store = Store::load_from_file(tdir.path(), None, None)?;
        let mut embed = Embed::new(Path::new("../models"))?;

        let mut chunks = data
            .chunks(Embed::STELLA_MAX_BATCH)
            .take(32)
            .filter_map(|c| {
                let batch = c
                    .par_iter()
                    .map(|t| format!("## {}\n{}", t.title, t.text))
                    .collect::<Vec<_>>();

                if let Ok(e) = embed.embeddings(&batch) {
                    let data = batch
                        .par_iter()
                        .enumerate()
                        .map(|(i, t)| {
                            (
                                t.to_string(),
                                e.i(i).unwrap(),
                                FileKind::Text(
                                    Path::new("../test-data/wiki-smoll.json").to_path_buf(),
                                ),
                            )
                        })
                        .collect::<Vec<_>>();
                    Some(data)
                } else {
                    None
                }
            })
            .flatten();

        let (mut text_file, _) = store.files()?;
        store.insert(&mut text_file, &mut chunks)?;

        // Ok, now let's test we have saved it right or not
        for (i, d) in store.data.iter().enumerate() {
            let mut df = store.data_file.as_ref().unwrap().lock().unwrap();
            let mut buf = vec![0_u8; d.length];
            df.seek(std::io::SeekFrom::Start(d.start as u64))?;
            df.read_exact(&mut buf)?;

            let str = std::str::from_utf8(&buf)?;
            assert_eq!(str, format!("## {}\n{}", data[i].title, data[i].text));
        }

        Ok(())
    }
}
Note
This will take a few minutes. If you are in a hurry, tweak this test to work on a smaller dataset!
On a side note, the safetensors save (save_safetensors()) seems to be the slowest part of this block!
cd src-tauri
cargo test storage_init --release -- --nocapture
cd ..
With the fundamentals of storage functional, let’s focus our attention on exposing the method Store::search( .. ).
// code omitted

impl Store {
    // code omitted

    /// API for search into the index
    pub fn search(
        &self,
        qry: &[Tensor],
        qry_str: &[String],
        top_k: usize,
        ann_cutoff: Option<f32>,
        with_bm25: bool, // ignore this for now
    ) -> Result<Vec<(usize, &Data, String, f32)>> {
        // Giving 75% weightage to the ANN search and 25% to BM25 search
        const ALPHA: f32 = 0.75;

        // Let's get the ANN scores and BM25 scores in parallel
        let (ann, _) = rayon::join(
            || {
                let ann = DashMap::new();
                if let Some(index) = &self.index {
                    qry.par_iter().for_each(|q| {
                        let res = match index.search_approximate(q, top_k * 4, ann_cutoff) {
                            Ok(d) => d,
                            Err(e) => {
                                eprintln!("Error in search_approximate: {e}");
                                return;
                            }
                        };

                        res.iter().for_each(|(idx, score)| {
                            let idx = *idx;
                            if let Some(d) = self.data.get(idx) {
                                let txt = if let Ok(c) = self.chunk(d) {
                                    c
                                } else {
                                    return;
                                };

                                let mut e = ann.entry(idx).or_insert((d, txt, *score));
                                if e.2 < *score {
                                    e.2 = *score;
                                }
                            }
                        });
                    });
                }
                ann
            },
            || {
                // We are just scoping in for some retrieval strategy to be discussed and built upon in the future here!
                None::<()> // explicit type keeps the closure's return type inferable until BM25 lands here
            },
        );

        // Now, we have the highest ANN score for the set of queries
        // To normalize the ANN scores, let's go ahead and get the max/ min
        let mut ann_max = 0_f32;
        let mut ann_min = f32::MAX;
        ann.iter().for_each(|j| {
            ann_max = j.2.max(ann_max);
            ann_min = j.2.min(ann_min);
        });
        let ann_div = ann_max - ann_min;

        // Ok, time to normalize our scores and create a combined score for each of them
        let mut combined = ann
            .par_iter()
            .map(|j| {
                let id = *j.key();
                let ann_score = 1. - (j.2 - ann_min) / ann_div;
                // the hardcoded `0.` represents another set of scores to be discussed and computed later
                let combined = ALPHA * ann_score + (1. - ALPHA) * 0.;

                (id, j.0, j.1.clone(), combined)
            })
            .collect::<Vec<_>>();
        combined.par_sort_unstable_by(|a, b| b.3.total_cmp(&a.3));

        Ok(combined[0..top_k.min(combined.len())].to_vec())
    }
}
Note
The code above contains some odd “visit this later” placeholders! Sorry about that; they are part of the fusion retrieval search strategy that we’ll implement later in the series.
Let’s modify our test to create the indexed storage, load from the file and search through it.
#[cfg(test)]
mod tests {
    // imports omitted

    #[derive(Deserialize)]
    struct WikiNews {
        title: String,
        text: String,
    }

    #[test]
    fn storage_init_load_search() -> Result<()> {
        // renamed `storage_init` to `storage_init_load_search`
        let data = {
            let data = std::fs::read_to_string("../test-data/wiki-x-small.json")?;
            serde_json::from_str::<Vec<WikiNews>>(&data)?
        };

        let tdir = tempdir::TempDir::new("storetest")?;
        let mut store = Store::load_from_file(tdir.path(), None, None)?;
        let mut embed = Embed::new(Path::new("../models"))?;

        let mut chunks = data
            .chunks(Embed::STELLA_MAX_BATCH)
            .take(32)
            .filter_map(|c| {
                let batch = c
                    .par_iter()
                    .map(|t| format!("## {}\n{}", t.title, t.text))
                    .collect::<Vec<_>>();

                if let Ok(e) = embed.embeddings(&batch) {
                    let data = batch
                        .par_iter()
                        .enumerate()
                        .map(|(i, t)| {
                            (
                                t.to_string(),
                                e.i(i).unwrap(),
                                // point the metadata at the file we actually read
                                FileKind::Text(
                                    Path::new("../test-data/wiki-x-small.json").to_path_buf(),
                                ),
                            )
                        })
                        .collect::<Vec<_>>();
                    Some(data)
                } else {
                    None
                }
            })
            .flatten();

        let (mut text_file, _) = store.files()?;
        store.insert(&mut text_file, &mut chunks)?;

        // Ok, now let's test we have saved it right or not
        for (i, d) in store.data.iter().enumerate() {
            let mut df = store.data_file.as_ref().unwrap().lock().unwrap();
            let mut buf = vec![0_u8; d.length];
            df.seek(std::io::SeekFrom::Start(d.start as u64))?;
            df.read_exact(&mut buf)?;

            let str = std::str::from_utf8(&buf)?;
            assert_eq!(str, format!("## {}\n{}", data[i].title, data[i].text));
        }

        // Now, let's load from the file
        let store = Store::load_from_file(tdir.path(), Some(16), Some(16))?;
        let qry = embed
            .query(&[
                "What are the latest news about Iraq?".to_string(),
                "What is Firefox?".to_string(),
            ])?
            .to_device(&candle_core::Device::Cpu)?;

        let b = qry.dims2()?.0;
        let qry = (0..b)
            .filter_map(|i| qry.get(i).ok().and_then(|t| t.unsqueeze(0).ok()))
            .collect::<Vec<_>>();

        let res = store.search(&qry[..], &[], 8, Some(0.36), false)?;
        println!("Response length: {}", res.len());
        res.iter().for_each(|(idx, _, txt, score)| {
            println!("Match[{idx}] score[{score}] ----------\n{txt}\n------------------\n")
        });

        Ok(())
    }
}
Note
This will take some time to run. From what I can tell, saving the Candle tensors as safetensors seems to be the slowest block in this flow. Not a lot to be done about that for now!
The vectors used to create the hyperplanes are selected at random, so this test will yield different results on different runs for a small dataset!
cd src-tauri
cargo test storage_init_load_search --release -- --nocapture
cd ..
Let’s look at the output!
running 1 test
Response length: 4
Match score[0.41619474] ----------
## Bulk of Iraqi debt to Paris Club to be forgiven
The nineteen member nations of the Paris Club have agreed to forgive 80 percent of Iraq's debt, reports the Reuters news service. The plan will reduce Iraq's total debt to the member nations to $7.8 billion, from the original $38.9 billion, over a period of four years.
# ... omitted for readability ...
The plan is the culmination of a trans-Atlantic struggle over the amount of Iraqi debt to be forgiven, the United States pushing for a 90 to 95 percent reduction while France argued for much less.
------------------
Match score[0.4155748] ----------
## Suicide car bomb attack in Baghdad
On the one year anniversary of Saddam Hussein's capture, 13 Iraqis were killed according to AP in a suicide car bomb attack by an al-Qaida-linked terrorist. The attack took place about 9 AM Monday, when many Iraquis were arriving to work, outside the Green Zone which is home to the Iraqi interim government and several embassies. The region is under severe protection.
Authorities said the suicide bomber detonated the car while it was in line at a checkpoint. The explosion was very loud and it could be heard throughout the city.
------------------
Match score[0.40933716] ----------
## Lycos Europe ends its anti-spam campaign
Lycos Europe has ended its anti-spam operation: "Make Love Not Spam." A company spokesperson said the objective of the time-limited campaign was to raise people's awareness. The reasons why it ended the campaign was variously reported and speculated in media. The operation, while fairly popular, suffered unexpected troubles and drew criticism from security experts and others from the start.
# ... omitted for readability ...
Next, one of the targeted sites redirected all traffic to the Lycos' server, making Lycos itself a target. The company had maintained that its server was immune from the attack. Lycos stopped distributing the program on December 3, 2004 and asked clients to "stay tuned." The company later ended the program.
On December 6, F-Secure reported a virus email disguised as the anti-spam screensaver. When its attachment (a zip file) is opened, it self-extracts and installs a "Trojan horse" --harmful program disguised as legitimate software. The Trojan horse was set up to monitor keystrokes in order to steal passwords, bank account numbers and other important information.
Lycos' software had been downloaded more than 100,000 times by the end of the campaign.
------------------
Match score[0.4010714] ----------
## Saddam Hussein profited roughly 1B by taking funds from UN program
Investigators said that Saddam Hussein diverted money from the Oil-for-Food Program to pay millions of dollars to families of suicide bombers from the West Bank and Gaza Strip who carried out attacks on Israeli civilians.
Paul Volcker, a former American official investigating the diverted funds scandal, has taken some heat from advocates demanding that he haul United Nations personnel before the US Congress. His reason for not subjecting them to this degree of open scrutiny is that it would have the perverse effect of pushing them into refusing to cooperate with the investigation at all. He plans to release documentary evidence early next year, when his investigation is complete.
------------------
test store::tests::storage_init_load_search ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 3 filtered out; finished in 16.84s
Running unittests src/main.rs (target/release/deps/retaugen-8b0ea0594349d0df)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
That looks reasonable 🎉🥳🎉 … the first, second and fourth results look relevant. The third one is off! We’ll fix these, hopefully, when we incorporate various retrieval strategies in Part 5 of this series.
That was a heavy ~4500 words! We created our Mini Vector Store from scratch, indexed some data into it, and searched it for relevant results. We also figured out a way of persisting the data on disk. We still have a lot to cover before we have a working Document QA application.
Next, we need a way of extracting text from .pdf and .txt files, over to the next post!