Local AI Models
Run AI models locally for privacy, cost savings, and offline operation. Includes LLM inference, image generation, and speech-to-text.
Available Services
Ollama
Port: 11434 | Memory: 2048 MB | Maturity: Stable
Run large language models locally with an easy-to-use API. Supports Llama, Mistral, Gemma, and many more open-source models.
Features:
- 100+ open-source models
- Simple REST API
- Model management CLI
- Streaming responses
- OpenAI-compatible API
- CPU and GPU support
Supported models include:
- Llama 3.3, Llama 3.2, Llama 3.1
- Mistral, Mixtral
- Gemma 2, CodeGemma
- Phi-3, Qwen 2.5
- DeepSeek-Coder
Skill: ollama-local-llm | Environment: OLLAMA_HOST, OLLAMA_PORT
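As a minimal sketch of the REST API, the Python below sends a single non-streaming prompt to Ollama's `/api/generate` endpoint. It assumes `OLLAMA_HOST` holds a bare hostname and `OLLAMA_PORT` the port number (the exact semantics of these variables in this stack are an assumption), and the `llama3.2` model tag is only an example that must already be pulled.

```python
import json
import os
import urllib.request


def ollama_base_url() -> str:
    # Assumption: OLLAMA_HOST is a hostname and OLLAMA_PORT a port number.
    host = os.environ.get("OLLAMA_HOST", "localhost")
    port = os.environ.get("OLLAMA_PORT", "11434")
    return f"http://{host}:{port}"


def build_generate_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Prepare a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{ollama_base_url()}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2") -> str:
    """Send the request and return the generated text from the JSON reply."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

With the service running, `generate("Why is the sky blue?")` should return the model's answer as a string. Building the request separately from sending it keeps the payload easy to inspect without a live server.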
ComfyUI
Port: 8188 | Memory: 4096 MB | Maturity: Experimental
Node-based visual workflow editor for Stable Diffusion and other generative AI models. Design complex image/video generation pipelines.
Features:
- Node-based workflow editor
- Stable Diffusion support
- ControlNet, LoRA, VAE support
- Custom nodes ecosystem
- REST API
- Batch processing
Requirements:
- NVIDIA GPU with CUDA
- nvidia-docker2 installed
- Minimum 4 GB VRAM (8 GB+ recommended)
Skill: comfyui-generate | Environment: COMFYUI_HOST, COMFYUI_PORT
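The ComfyUI REST API accepts a workflow graph on its `/prompt` route and queues it for execution. The sketch below assumes `COMFYUI_HOST` and `COMFYUI_PORT` are a hostname and port, and that `workflow` is a graph exported from the editor in API format; the function names are illustrative only.

```python
import json
import os
import urllib.request
import uuid


def build_queue_request(workflow: dict, client_id: str) -> urllib.request.Request:
    """Prepare a POST to ComfyUI's /prompt route with an API-format workflow."""
    # Assumption: COMFYUI_HOST is a hostname and COMFYUI_PORT a port number.
    host = os.environ.get("COMFYUI_HOST", "localhost")
    port = os.environ.get("COMFYUI_PORT", "8188")
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow: dict) -> str:
    """Queue the workflow and return the prompt_id assigned by the server."""
    request = build_queue_request(workflow, uuid.uuid4().hex)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["prompt_id"]
```

The returned `prompt_id` identifies the queued job; generated images are written to ComfyUI's output directory once the job finishes.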
Stable Diffusion WebUI
Port: 7860 | Memory: 4096 MB | Maturity: Experimental
Local AI image generation with a web interface. Generate images from text prompts using Stable Diffusion.
Features:
- Text-to-image generation
- Image-to-image transformation
- Inpainting and outpainting
- Model management
- Extensions support
- Batch processing
Requirements:
- NVIDIA GPU with CUDA
- nvidia-docker2 installed
- Minimum 4 GB VRAM
Faster Whisper Server
Port: 8001 | Memory: 1024 MB | Maturity: Beta
Self-hosted speech-to-text transcription service using the Faster Whisper engine for high-performance audio transcription.
Features:
- OpenAI Whisper models
- Fast inference (CTranslate2)
- Multiple languages
- OpenAI-compatible API
- Timestamp support
- CPU and GPU support
Available models:
- tiny, base, small, medium, large
- Multilingual and English-only variants
Skill: whisper-transcribe | Environment: WHISPER_HOST, WHISPER_PORT
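Because the service exposes an OpenAI-compatible API, a transcription request can be sketched as a multipart upload to `/v1/audio/transcriptions`. The code below uses only the standard library and assumes `WHISPER_HOST` and `WHISPER_PORT` are a hostname and port. The `model` value is passed through unchanged, and the exact identifier the server expects is not specified here.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(audio: bytes, filename: str, model: str) -> urllib.request.Request:
    """Prepare a multipart/form-data POST carrying the audio file and model name."""
    # Assumption: WHISPER_HOST is a hostname and WHISPER_PORT a port number.
    host = os.environ.get("WHISPER_HOST", "localhost")
    port = os.environ.get("WHISPER_PORT", "8001")
    boundary = uuid.uuid4().hex
    parts = [
        # Plain form field naming the model to use.
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'.encode("utf-8"),
        # File field header, followed by the raw audio bytes.
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8"),
        audio,
        f"\r\n--{boundary}--\r\n".encode("utf-8"),
    ]
    return urllib.request.Request(
        f"http://{host}:{port}/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path: str, model: str) -> str:
    """Upload the file at `path` and return the transcribed text."""
    with open(path, "rb") as handle:
        request = build_transcription_request(handle.read(), os.path.basename(path), model)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["text"]
```

An OpenAI client library pointed at the same base URL would work as well; the manual multipart body simply avoids an extra dependency.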
Usage Examples
Local LLM Stack
Image Generation Stack (GPU Required)
Complete Local AI Stack
Audio Transcription Stack
Model Management
Ollama Models
Pull models into Ollama with the ollama pull command.
ComfyUI Models
Download Stable Diffusion checkpoints to the comfyui-models volume.
Hardware Requirements
CPU-Only (LLMs)
| Model Size | RAM Required | Performance |
|---|---|---|
| 7B params | 8 GB | Good |
| 13B params | 16 GB | Moderate |
| 34B params | 32 GB | Slow |
| 70B params | 64 GB+ | Very Slow |
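The table above can be read as a simple lookup: pick the largest model size whose RAM requirement fits the machine. A small sketch of that rule, using the table's own thresholds (the function name is illustrative):

```python
from typing import Optional

# (model size, RAM required in GB) from the CPU-only table, smallest first.
CPU_TIERS = [("7B", 8), ("13B", 16), ("34B", 32), ("70B", 64)]


def largest_model_for_ram(ram_gb: float) -> Optional[str]:
    """Return the largest model size that fits in `ram_gb`, or None if none fit."""
    best = None
    for label, required_gb in CPU_TIERS:
        if ram_gb >= required_gb:
            best = label
    return best
```

For example, `largest_model_for_ram(16)` returns `"13B"`, while a machine with 4 GB gets `None`. Fitting in memory says nothing about speed, so the Performance column still applies.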
GPU (Image Generation)
| VRAM | Supported Models | Performance |
|---|---|---|
| 4 GB | SD 1.5 | Slow |
| 6 GB | SD 1.5, SDXL (low res) | Moderate |
| 8 GB | SDXL | Good |
| 12 GB+ | SDXL, SD3 | Excellent |
Performance Tips
Ollama Optimization
- Model Selection: Start with smaller models (7B) for faster inference
- Context Length: Reduce context window for speed
- Quantization: Use Q4 or Q5 quantized models
- GPU Acceleration: Add NVIDIA GPU support with nvidia-docker2
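Two of these tips lend themselves to a quick sketch. The first function estimates the size of the weights alone as parameters times bits per weight divided by 8, treating Q4 as roughly 4 bits per weight; real quantized files carry some overhead, and the context cache needs memory on top. The second builds an `/api/generate` payload that lowers the context window through Ollama's `num_ctx` option. Function names and the default of 2048 are illustrative.

```python
import json


def estimate_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough size of the weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def generate_payload(prompt: str, model: str, num_ctx: int = 2048) -> bytes:
    """JSON body for Ollama's /api/generate with a reduced context window."""
    return json.dumps(
        {
            "model": model,
            "prompt": prompt,
            "stream": False,
            # A smaller num_ctx trades usable context for speed and memory.
            "options": {"num_ctx": num_ctx},
        }
    ).encode("utf-8")
```

By this estimate a 7B model needs about 3.5 GB for weights at 4 bits per weight, versus about 14 GB at 16 bits, which is why quantized models fit on modest hardware.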
Image Generation Optimization
- GPU Required: CPU inference is extremely slow (minutes per image)
- VRAM Management: Close other GPU applications
- Batch Size: Reduce for lower VRAM usage
- Resolution: Start with 512x512, then scale up
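The resolution and batch-size tips map directly onto request parameters. The sketch below assumes the Stable Diffusion WebUI service is the AUTOMATIC1111 web UI started with its HTTP API enabled, which this page does not confirm; under that assumption `/sdapi/v1/txt2img` accepts the fields shown. The base URL default and the step count are illustrative.

```python
import base64
import json
import urllib.request
from typing import List


def build_txt2img_request(
    prompt: str,
    base_url: str = "http://localhost:7860",  # assumed address, port from this page
    width: int = 512,
    height: int = 512,
    batch_size: int = 1,
    steps: int = 20,
) -> urllib.request.Request:
    """Prepare a text-to-image request with conservative, low-VRAM defaults."""
    payload = {
        "prompt": prompt,
        "width": width,
        "height": height,
        "batch_size": batch_size,
        "steps": steps,
    }
    return urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def txt2img(prompt: str, **kwargs) -> List[bytes]:
    """Send the request and return the generated images as raw bytes."""
    request = build_txt2img_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=600) as response:
        images = json.loads(response.read())["images"]
    # The API returns each image as a base64-encoded string.
    return [base64.b64decode(image) for image in images]
```

Starting at 512x512 with a batch size of 1 keeps VRAM use low; raise `width`, `height`, or `batch_size` only after a first image succeeds.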
Privacy & Offline Operation
Local models provide:
- Data Privacy: All processing happens on your infrastructure
- Offline Operation: No internet required after model download
- Cost Savings: No API costs for inference
- Customization: Fine-tune models for specific tasks
- Low Latency: No network round trips