RAG "Second Brain" for Technical Docs with Code

Hello all,

I’m building a RAG (Retrieval-Augmented Generation) system as a "second brain" for ~3-4k docs I’ve collected, each with descriptions and code snippets on cloud, OS, and more. Here’s my approach so far:

  • Focus: Starting with retrieval and storage, aiming for quick access to relevant docs.
  • Structure of each document: Does it make sense to also use an LLM API to create a short, standardised abstract of each doc to help with organization and tagging?
  • Storage options: vector DB, with a relational DB for metadata?
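
To make the structuring question concrete, here is a minimal sketch of one possible chunking step: splitting each doc into separate description and code chunks, each carrying its own metadata, so they can be embedded and filtered independently. It assumes the docs are Markdown-style text with triple-backtick code fences, which may not hold for every doc. The names `Chunk` and `split_doc` are hypothetical and used only for illustration.

```python
import re
from dataclasses import dataclass

# Assumption: code snippets are wrapped in triple-backtick fences.
TICKS = "`" * 3
FENCE = re.compile(TICKS + r"(\w*)\n(.*?)" + TICKS, re.DOTALL)


@dataclass
class Chunk:
    doc_id: str  # identifier of the source document
    kind: str    # "prose" or "code"
    lang: str    # language tag from the fence, "" if none
    text: str    # chunk content to embed or index


def split_doc(doc_id: str, body: str) -> list[Chunk]:
    """Split one document into alternating prose and code chunks."""
    chunks = []
    pos = 0
    for match in FENCE.finditer(body):
        # Text between the previous fence and this one is description.
        prose = body[pos:match.start()].strip()
        if prose:
            chunks.append(Chunk(doc_id, "prose", "", prose))
        chunks.append(Chunk(doc_id, "code", match.group(1), match.group(2).strip()))
        pos = match.end()
    # Anything after the last fence is trailing description.
    tail = body[pos:].strip()
    if tail:
        chunks.append(Chunk(doc_id, "prose", "", tail))
    return chunks
```

The `kind` and `lang` fields could then live in the metadata store, with only `text` going to the embedding step.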

Questions:

Any tips for structuring docs with code and descriptions for efficient retrieval?

Has LLM summarization/tagging worked well for your projects?

Which vector DB do you recommend?

Thank you all!