naan

Installation

pip install naan

Index data

To see Naan in action, let's first get some data to embed:

import json

import requests


# Download the example sentences and parse the JSON payload
res = requests.get("https://raw.githubusercontent.com/masci/naan/main/example/sentences.json")
res.raise_for_status()
sentences = json.loads(res.text)

Naan tries not to get in the way of how you manage your FAISS index, so the first step is always to set up the FAISS side of things:

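from sentence_transformers import SentenceTransformer
import faiss


# Embed the first 100 sentences; encode() returns one vector per sentence
model = SentenceTransformer("bert-base-nli-mean-tokens")
sentence_embeddings = model.encode(sentences[:100])
# The index dimension must match the embedding dimension
dim = sentence_embeddings.shape[1]
# IndexFlatL2 performs exact (brute-force) search by L2 distance
index = faiss.IndexFlatL2(dim)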
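
To build some intuition for what a flat L2 index does, here is a minimal pure-Python sketch of exact nearest-neighbour search by squared Euclidean distance. It is an illustration only, not FAISS code: the helper names `squared_l2` and `brute_force_search` are hypothetical, and the toy vectors are made up. FAISS performs the same kind of exhaustive comparison in optimized native code.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(
    vectors: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Return (position, squared distance) for the k stored vectors closest to query."""
    scored = [(i, squared_l2(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy 2-dimensional "embeddings" (hypothetical data, for illustration only)
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
nearest = brute_force_search(toy_vectors, [0.9, 1.2], k=2)
```

Every stored vector is compared against the query, so the result is exact, at the cost of search time that grows linearly with the number of vectors.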

Now it's time to wrap the FAISS index with Naan and use it to index data:

from naan import NaanDB


# Create a Naan database from scratch
db = NaanDB("db.naan", index, force_recreate=True)
db.add(sentence_embeddings, sentences)

Naan adds the vector embeddings to the FAISS index and also stores the original sentences, so a vector search returns each matching sentence alongside its ID:

# Reopen an existing Naan database
db = NaanDB("db.naan")
query_embeddings = model.encode(["The book is on the table"])
# Naan's search API mirrors FAISS: pass the query vectors and k (here, the 3 closest vectors)
results = db.search(query_embeddings, 3)
for result in results:
    print(result)
# (5799, 'Two girls are laughing and other girls are watching them')
# (20303, 'A group of football players is running in the field')
# (14418, 'Four boys are sitting in a muddy stream.')
# (28922, 'A group of people playing football is running in the field')
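
The idea of keeping the original text next to each vector can be sketched without FAISS or Naan. The following is a conceptual illustration only, not Naan's implementation: the class `TinyTextStore` and its methods are hypothetical, the data is made up, and a brute-force scan stands in for the FAISS index.

```python
from typing import Dict, List, Sequence, Tuple


class TinyTextStore:
    """Hypothetical in-memory store pairing each vector ID with its source text."""

    def __init__(self) -> None:
        self._vectors: List[Sequence[float]] = []
        self._texts: Dict[int, str] = {}

    def add(self, vectors: Sequence[Sequence[float]], texts: Sequence[str]) -> None:
        if len(vectors) != len(texts):
            raise ValueError("need exactly one text per vector")
        for vector, text in zip(vectors, texts):
            vector_id = len(self._vectors)  # IDs are assigned sequentially
            self._vectors.append(vector)
            self._texts[vector_id] = text

    def search(self, query: Sequence[float], k: int) -> List[Tuple[int, str]]:
        # Rank every stored vector by squared L2 distance to the query
        ranked = sorted(
            range(len(self._vectors)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(self._vectors[i], query)),
        )
        # Resolve each vector ID back to the text it was added with
        return [(i, self._texts[i]) for i in ranked[:k]]


store = TinyTextStore()
store.add([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]], ["north", "east", "mostly east"])
hits = store.search([1.0, 0.0], k=2)
```

The point of the sketch is the lookup step at the end of `search`: the index only knows about vectors and IDs, so a separate mapping is what turns a hit back into readable text.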

License

naan is distributed under the terms of the MIT license.

Release history

0.0.3 (this version): May 12, 2024

0.0.2: May 12, 2024

0.0.1: Mar 4, 2023

Download files

Source Distribution

naan-0.0.3.tar.gz (5.4 kB), uploaded May 12, 2024

Built Distribution

naan-0.0.3-py3-none-any.whl (5.6 kB), uploaded May 12, 2024, Python 3

Hashes for naan-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`9de756c016f1faa6b7ae967ab2d1a30288eb65902c2d29538571a5e67397962e`
MD5	`3c27528751f544436adf66dbf6d39b89`
BLAKE2b-256	`19b8236c6475a8c6770ca04f69c94f66c5e94aa379692d3f8be95dd912fb58ed`

Hashes for naan-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2975cc68b12af16bdddf8fd16bc7988d8d39700dd1813dc9d733dcbc7cc116da`
MD5	`63bb532e5e8e0c08942a21e33f3accd5`
BLAKE2b-256	`0c82285f7de7e05cd2f416f9c2cf419ecb391492f4ab170b6914d87774d45a23`