In today’s fast-paced world, where information accumulates rapidly, the need to retrieve and leverage past insights efficiently is crucial. This led me on a journey to create an open-source semantic search tool for internal documents — named AIWhispr.
🔗 GitHub Repository: AIWhispr
The Problem AIWhispr Solves
Imagine having a decade-old business idea tucked away in your documents, notes, and communications. Semantic search makes it far easier to unearth this kind of knowledge. Unlike traditional keyword search, which often returns irrelevant results, semantic search takes the context and intent behind your query into account and matches it against the meaning of the content in your repository, helping you make informed decisions. For instance, a business idea you shelved because the technology was immature at the time may now be ripe for fruition.
The Vision: Free, Fast, and User-Friendly
AIWhispr sets out to make semantic search accessible to everyone. Its vision encompasses:
Accessibility: The tool is absolutely free, eliminating financial barriers.
Speed: It ensures rapid results, with searches taking less than a second.
User-Friendly: No technical expertise is required; it’s designed for all, not just tech-savvy individuals.
Platform Diversity: AIWhispr aims to cater to various operating systems, including MS Windows, macOS, ChromeOS, and Linux.
File Formats: It handles common file formats like pdf, txt, csv, pptx, docx, and xlsx.
Privacy and Security at the Forefront
Privacy concerns are paramount. AIWhispr respects your data by:
Local Processing: Your data remains local; AIWhispr doesn’t share it with external services.
Open Source: The source code is transparent, ensuring you can review, modify, and distribute it.
Offline Operation: Once installed and configured, AIWhispr operates solely on locally stored content, without internet connectivity.
Efficiency and Flexibility
AIWhispr leverages the power of AI/LLM models to provide exceptional semantic search experiences. Notable features include:
Low Resource Usage: Quality vector embeddings are generated using less than 1 GB of RAM per model.
GPU Leverage: A GPU is not mandatory, but AIWhispr makes use of one when it is available.
Choice of Model: It adopts sentence-transformers/all-mpnet-base-v2 as its default model for similarity search while giving you the flexibility to bring your own model. A simple configuration and a low-code template enable you to integrate a third-party LLM encoder.
Choice of Vector Database: It adopts Typesense as the default storage for extracted text and vectors while giving you the flexibility to integrate with any vector database. A simple configuration and a low-code template enable you to integrate third-party text and vector storage.
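To make the idea of similarity search more concrete, here is a minimal sketch of how a query vector can be ranked against document vectors using cosine similarity. This is not AIWhispr's code: the function names, file names, and three-dimensional vectors are made up for illustration. Real embeddings from a model such as all-mpnet-base-v2 have hundreds of dimensions, and a vector database would normally do the ranking.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for real model embeddings.
docs = {
    "business_idea.docx": [0.9, 0.1, 0.0],
    "meeting_notes.txt": [0.2, 0.8, 0.1],
    "budget.xlsx": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

ranking = rank_documents(query, docs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The document whose vector points in nearly the same direction as the query vector ranks first, regardless of whether the two share any exact keywords.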
Current Progress and Capabilities
AIWhispr is already making progress:
Model Selection: After rigorous testing, it employs sentence-transformers/all-mpnet-base-v2.
Storage and Search: Typesense is used for text and vector storage, so that query vectors can be compared against the stored document vectors.
Versatile Compatibility: It runs on Linux, macOS, and Windows (via WSL).
Data Sources: It supports reading files from local directories, S3 buckets, and Azure Blob containers.
Configuration: Default setting examples are provided, requiring no coding skills.
Integration: AIWhispr offers a search service query API interface for integration with your apps.
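As a rough illustration of the first step in reading from a local directory, the sketch below walks a folder and keeps only files with the extensions listed earlier (pdf, txt, csv, pptx, docx, xlsx). The names is_supported and find_indexable_files are hypothetical and do not come from the AIWhispr code base; S3 and Azure Blob sources would need their own client libraries and are not covered here.

```python
from pathlib import Path

# File extensions the article lists as supported.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".csv", ".pptx", ".docx", ".xlsx"}


def is_supported(filename):
    """Return True if the file name has a supported extension (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def find_indexable_files(root):
    """Recursively collect supported files under the given directory."""
    root_path = Path(root)
    return sorted(
        path for path in root_path.rglob("*")
        if path.is_file() and is_supported(path.name)
    )
```

Calling find_indexable_files on a documents folder returns the paths that a later step could extract text from and encode into vectors.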
Future Avenues
AIWhispr’s journey doesn’t stop here. We are building toward:
Wider Integration: Share examples of integrating other LLMs and vector databases.
Platform Expansion: Extend support to ChromeOS and Google Cloud Storage.
User Experience: Streamline the installation steps to make them user-friendly across different platforms. The current installation requires an understanding of complicated software components such as nginx and uwsgi. We acknowledge that we have to do better here to make semantic search available to everyone.
Visual Search: Enable image search within documents for logos, product images, and more.
Text-to-Image Search: Explore possibilities of searching for text within images.
If you’re intrigued by the potential of semantic search, I encourage you to explore AIWhispr.
It is open source, free, and takes a privacy-first approach.