Here’s a comprehensive and categorized list of open source AI stack components that you can mix and match when building GenAI applications — especially when focusing on modularity, scalability, and performance. This includes components for data processing, model serving, retrieval-augmented generation (RAG), vector search, and orchestration.
## 🧠 Foundational Model Alternatives

Models you can self-host or fine-tune:

- **LLMs**
- **Multimodal**
- **Fine-Tuning**
  - QLoRA, LoRA, PEFT (via 🤗 Transformers + PEFT) – see the sketch after this list.
  - Axolotl – full-stack fine-tuning.
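As a minimal sketch of the LoRA/PEFT path with 🤗 Transformers + PEFT; the base model name and the LoRA hyperparameters below are illustrative placeholders, not recommendations:

```python
# Attach a LoRA adapter to a frozen base model with 🤗 Transformers + PEFT.
# Model name and hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"  # any self-hostable causal LM
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8,                                  # low-rank dimension of the adapter
    lora_alpha=16,                        # scaling applied to the adapter output
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable
# Train with transformers.Trainer (or your own loop), then model.save_pretrained("adapter/")
```

QLoRA follows the same pattern with the base model loaded in 4-bit, and Axolotl wraps this whole flow behind a config file instead of hand-written code.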
## 📚 RAG (Retrieval-Augmented Generation) Stack

Tools to power knowledge-based Q&A systems:

- **Embeddings**
  - sentence-transformers – see the sketch after this list.
  - Instructor-XL – instruction-based embeddings.
- **Vector Databases**
- **Document Loaders & Chunking**
  - LangChain or LlamaIndex
  - Haystack – full RAG pipelines.
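A minimal embedding-and-retrieval sketch with sentence-transformers; the in-memory cosine search here stands in for a real vector database, and the model name is just a common small default:

```python
# Embed a few documents and answer a query by cosine similarity.
# In production the brute-force search below would be a vector database lookup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model

docs = [
    "Vector databases index embeddings for fast similarity search.",
    "LoRA trains a small adapter on top of a frozen base model.",
    "Paged attention lets a server batch many concurrent requests efficiently.",
]
doc_emb = model.encode(docs, convert_to_tensor=True)

query = "How do I search embeddings quickly?"
query_emb = model.encode(query, convert_to_tensor=True)

hits = util.semantic_search(query_emb, doc_emb, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {docs[hit['corpus_id']]}")
```

The retrieved chunks then get assembled into the prompt by whichever orchestrator you pick (LangChain, LlamaIndex, or Haystack).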
## 🔧 Serving & Orchestration

Serving models behind APIs and managing prompts, memory, and tool chaining:

- **Model Servers**
  - vLLM – fast LLM serving with paged attention (sketch after this list).
  - TGI – Hugging Face's scalable inference server.
  - Triton Inference Server
  - LMDeploy – model optimization & serving.
- **Agent / Workflow Frameworks**
  - LangChain
  - LlamaIndex
  - Haystack
  - CrewAI – multi-agent framework.
  - AutoGen
- **Prompt Management**
  - PromptLayer
  - Langfuse
  - Helicone (for logging OpenAI usage)
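As a sketch of the model-server layer, here is vLLM's offline Python API; the same engine also backs its OpenAI-compatible HTTP server, and the model name and sampling settings are illustrative placeholders:

```python
# Batched offline generation with vLLM (paged attention under the hood).
# Model name and sampling parameters are illustrative placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain paged attention in one sentence."], params)
print(outputs[0].outputs[0].text)
```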
## 🖼️ Frontend / Chat UI

For chatbots or multimodal interfaces:

- Next.js – UI + SSR/ISR.
- shadcn/ui – design system for building clean UIs.
- Chatbot UI – open-source ChatGPT-style interface.
- Open WebUI – web UI for LM Studio / Ollama.
## 🚀 Inference & Runtime Optimization

- llm.rs – LLM inference in Rust.
- ggml – quantized models that run on CPU (see the sketch after this list).
- exllama – high-performance quantized inference.
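For the ggml/GGUF route, one common way to drive it from Python is the llama-cpp-python binding for llama.cpp; that binding is not part of the list above, so treat this as one possible setup, with the model path as a placeholder:

```python
# Run a quantized GGUF model on CPU via llama-cpp-python (bindings for llama.cpp/ggml).
# The model path is a placeholder for any GGUF file you have downloaded.
from llama_cpp import Llama

llm = Llama(model_path="./models/model-q4_k_m.gguf", n_ctx=2048)
result = llm("Q: What does quantization trade off?\nA:", max_tokens=64, stop=["\n"])
print(result["choices"][0]["text"])
```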
## 🔒 Security & DevOps (for production)

- AuthN/AuthZ: Auth.js (NextAuth), Clerk, Ory, ZITADEL
- Logging/Tracing: Langfuse, OpenTelemetry, Sentry (see the tracing sketch after this list)
- DevOps: Docker, Kubernetes, GitHub Actions, Terraform
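A minimal OpenTelemetry tracing sketch for a RAG request path; it exports spans to the console for demonstration, where a real deployment would export to a collector, and the span names are placeholders:

```python
# Trace the stages of a RAG request with OpenTelemetry (console exporter for demo purposes).
# Span names are illustrative; a real setup would export to an OTLP collector instead.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("genai-app")
with tracer.start_as_current_span("rag.request"):
    with tracer.start_as_current_span("rag.retrieve"):
        pass  # embed the query and hit the vector database
    with tracer.start_as_current_span("llm.generate"):
        pass  # call the model server (vLLM, TGI, ...)
```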
## 🧱 Full Stack Boilerplates

If you're looking to start fast:

- AI Engineer OS – full-stack open source GenAI stack.
- LangChainHub – reusable chains and prompts.
- OpenChatKit – chatbot framework.
- Flowise – visual LangChain builder.
## 🧪 Experimental Tools

- Ollama – run and manage LLMs locally.
- Modal – serverless infra for AI.
- LiteLLM – drop-in proxy for OpenAI-compatible APIs (see the sketch after this list).
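A minimal LiteLLM sketch that routes an OpenAI-style chat call to a local Ollama model; it assumes an Ollama server on its default port, and the model name is a placeholder:

```python
# One OpenAI-style completion() call, routed through LiteLLM to a local Ollama model.
# Assumes an Ollama server at its default address; the model name is a placeholder.
from litellm import completion

response = completion(
    model="ollama/llama3",              # provider/model string understood by LiteLLM
    messages=[{"role": "user", "content": "Say hello in five words."}],
    api_base="http://localhost:11434",  # default Ollama endpoint
)
print(response.choices[0].message.content)
```

Pointing the same call at a hosted provider or at vLLM's OpenAI-compatible server mostly comes down to changing the model string and `api_base`.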