AI-RAG-docuquery: My AI-Powered Document Search Project

AI-RAG-docuquery is an application that combines artificial intelligence and semantic search so you can query your local documents quickly and in natural language.
It is based on the paradigm of Retrieval-Augmented Generation (RAG): an approach that combines the search for relevant texts (retrieval) with the generation of responses in natural language (generation).

GitHub repo: github.com/paoloronco/AI-RAG-docuquery-app

In practice, the user:

  1. Indexes their files (PDF, Word, Excel, PowerPoint, plain text, Markdown, CSV).
  2. Asks questions in natural language via a simple interface (PyQt6 GUI).
  3. Receives concise answers with citations and direct links to the original sources.

This turns a folder of documents into a kind of personal assistant that answers questions and points you to the relevant sources, so you don't have to open each file and search it manually.
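The retrieve-then-generate flow described above can be sketched in a few lines of plain Python. This is only an illustration, not the project's code: the corpus, the function names, and the toy bag-of-words "embedding" are all assumptions made for the example (the app itself uses neural embeddings and a FAISS index).

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus of (source file, passage text) pairs.
# In the real app these would come from the indexed documents.
PASSAGES = [
    ("manual.pdf", "To reset the router hold the reset button for ten seconds."),
    ("notes.md", "The quarterly report is due on the first Monday of each quarter."),
    ("faq.txt", "The router admin password can be changed from the settings page."),
]

def embed(text):
    """Toy bag-of-words vector (a stand-in for a neural sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, passages, k=2):
    """Retrieval step: return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p[1])), reverse=True)
    return ranked[:k]

def build_prompt(question, hits):
    """Generation step input: a prompt that asks for answers citing numbered sources."""
    context = "\n".join(
        f"[{i}] ({src}) {text}" for i, (src, text) in enumerate(hits, 1)
    )
    return (
        "Answer using only the sources below and cite them as [n].\n"
        f"{context}\nQuestion: {question}"
    )

question = "How do I reset the router?"
hits = retrieve(question, PASSAGES)
prompt = build_prompt(question, hits)
```

The resulting `prompt` is what would be handed to a language model; because each passage keeps its source path, the answer can link back to the original file.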

Why it is useful

Many professionals and students find themselves managing large amounts of documents: technical manuals, handouts, PDF archives. Searching for information in these files is often slow and frustrating.
With AI-RAG-docuquery:

  • You save time thanks to semantic search, which matches meaning rather than exact keywords as classic text search does.
  • The answers are always linked to verifiable sources, so you don't have to trust the model blindly.
  • You can use either local HuggingFace models (even offline) or OpenAI-compatible services, which makes the system flexible to the user's needs.
  • It also works in “No LLM” mode, that is, without a language model, for those who only want the most similar passages found in the documents.
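One way to picture the choice between a model backend and “No LLM” mode is a single function that takes an optional text generator. The sketch below is a hypothetical design, not the app's actual interface: `answer`, `generate`, and `fake_llm` are names invented for illustration, and `generate` stands for any wrapper around a local HuggingFace model or an OpenAI-compatible endpoint.

```python
from typing import Callable, Optional

def answer(
    question: str,
    passages: list[tuple[str, str]],
    generate: Optional[Callable[[str], str]] = None,
) -> str:
    """Answer a question from already-retrieved passages.

    passages: (source, text) pairs, best match first.
    generate: any callable mapping a prompt to text (hypothetical model
              wrapper); None selects "No LLM" mode.
    """
    sources = "\n".join(
        f"[{i}] {src}: {text}" for i, (src, text) in enumerate(passages, 1)
    )
    if generate is None:
        # "No LLM" mode: return the most similar passages verbatim.
        return sources
    prompt = (
        "Use only these sources and cite them as [n].\n"
        f"{sources}\n\nQuestion: {question}"
    )
    return generate(prompt)

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    return "Hold the reset button for ten seconds [1]."

hits = [("manual.pdf", "Hold the reset button for ten seconds.")]
no_llm_result = answer("How do I reset the router?", hits)
llm_result = answer("How do I reset the router?", hits, generate=fake_llm)
```

Keeping the backend behind a plain callable means the retrieval code stays identical whichever mode is selected.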

Main features

  • FAISS Vector Search Engine for document indexing.
  • Multi-format support (PDF, DOCX, PPTX, XLSX, TXT, CSV, MD).
  • PyQt6 graphical interface, intuitive and cross-platform.
  • Simplified OpenAI setup: dedicated popup with API key management and model selection.
  • Compatibility with local HuggingFace models, which automatically run on the CPU or GPU.
  • Builds as a standalone executable (.exe) via PyInstaller, so the app can be distributed without requiring users to install Python.
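Multi-format support usually comes down to dispatching on the file extension and then splitting the extracted text into chunks before indexing. The sketch below shows that pattern with the standard library only, so it covers just TXT, MD, and CSV; PDF, DOCX, PPTX, and XLSX would need third-party parsers that are not shown. `LOADERS`, `extract_text`, and `chunk` are hypothetical names, and the chunk size and overlap are arbitrary assumptions rather than the project's settings.

```python
import csv
from pathlib import Path

def load_text(path: Path) -> str:
    """Read a plain-text or Markdown file."""
    return path.read_text(encoding="utf-8", errors="replace")

def load_csv(path: Path) -> str:
    """Flatten a CSV file into one line of text per row."""
    with path.open(newline="", encoding="utf-8", errors="replace") as fh:
        return "\n".join(" | ".join(row) for row in csv.reader(fh))

# Extension -> loader registry; other formats would register their own loader.
LOADERS = {".txt": load_text, ".md": load_text, ".csv": load_csv}

def extract_text(path) -> str:
    """Pick a loader from the file extension and return the file's text."""
    path = Path(path)
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"Unsupported file type: {path.suffix}")
    return loader(path)

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

pieces = chunk("abcdefghij", size=4, overlap=1)
```

Each chunk would then be embedded and added to the vector index along with the path of the file it came from.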

Skills acquired thanks to the project

Building this project let me work on several technical fronts, consolidating and expanding my skills:

  1. NLP and applied AI
    • Understanding the RAG paradigm and its practical use.
    • Use of Sentence-Transformers for embeddings and HuggingFace models.
    • Integration with external model APIs (OpenAI/compatible).
  2. Information Retrieval
    • Managing vector indexes with FAISS.
    • Hybrid (dense + sparse) retrieval techniques.
    • Creating pipelines for extracting content from heterogeneous formats.
  3. Software Development with Python
    • Modular code organization (indexer, retrieve, loaders, llm_clients).
    • Dependency management and virtual environments.
    • Distribution via PyInstaller with optimizations for heavy libraries like torch and faiss.
  4. GUI development
    • Creating graphical interfaces with PyQt6.
    • Integration of advanced features such as clickable links that open local files.
    • Persisting user configurations via JSON files.
  5. Software Engineering & DevOps
    • Versioning with Git and distribution on GitHub.
    • Package management, cross-platform troubleshooting, and optimization for Windows.
    • Complete documentation with README and roadmap.
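As an illustration of the hybrid (dense + sparse) retrieval mentioned in the list above, here is a minimal sketch of reciprocal rank fusion, one common way to merge a vector-search ranking with a keyword ranking. The source does not say which fusion method the project uses, so treat this as an assumption; the function name, the constant `k=60`, and the document ids are all illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first rankings of document ids into one ranking.

    Each document scores 1 / (k + rank) in every ranking that contains it;
    the scores are summed across rankings.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense = ["doc_b", "doc_a", "doc_c"]   # hypothetical vector-search ranking
sparse = ["doc_a", "doc_d", "doc_b"]  # hypothetical keyword-search ranking
fused = reciprocal_rank_fusion([dense, sparse])
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever are still kept further down.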

Conclusion

AI-RAG-docuquery is a project that combines academic research and practicality: a useful app for simplifying the consultation of personal documents, but also an advanced exercise in software engineering and AI integration.

It allowed me to grow as a developer, moving from theoretical NLP concepts to a working, deployable application, ready for anyone to use.
