Local LLM Setup

V1 Documentation - Under Heavy Rework. Some sections may be incomplete. For the latest info, check the FAB listing or Discord.

Run AI completely offline on your own machine: no internet connection required, and your data never leaves your hardware. The plugin works with any OpenAI-compatible local inference server.

Supported Servers

Ollama

  1. Download and install from ollama.com
  2. Pull a model: ollama pull llama3.1
  3. Ollama starts automatically and serves on port 11434
  4. In plugin Settings > AI Models > API Key/Local LLM:
    • Provider: Custom
    • Base URL: http://localhost:11434/v1/chat/completions
    • Model Name: llama3.1 (or whatever you pulled)
    • API Key: leave empty
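Under the hood, this configuration amounts to a plain OpenAI-style chat completion POST. A minimal Python sketch of the request shape (the helper name and structure are illustrative, not the plugin's actual internals):

```python
import json

def build_chat_request(base_url, model, messages, api_key=None):
    """Assemble an OpenAI-compatible chat completion request.

    Illustrative only -- mirrors the shape of the request the plugin
    sends, not its real implementation.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:  # local servers like Ollama need no Authorization header
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "messages": messages})
    return base_url, headers, body

url, headers, body = build_chat_request(
    "http://localhost:11434/v1/chat/completions",
    "llama3.1",
    [{"role": "user", "content": "Say hello"}],
)
```

Sending the same payload with curl or a REST client is a quick way to confirm Ollama is answering before pointing the plugin at it.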

LM Studio

  1. Download from lmstudio.ai
  2. Load a model from the LM Studio UI
  3. Start the local server (default port 1234)
  4. In plugin Settings:
    • Provider: Custom
    • Base URL: http://localhost:1234/v1/chat/completions
    • Model Name: the model you loaded (check LM Studio’s server tab)
    • API Key: leave empty

Lemonade (AMD)

  1. Download from lemonade-server.ai
  2. Load a model (optimized for AMD GPUs/NPUs)
  3. Server runs on port 13305
  4. In plugin Settings:
    • Provider: Custom
    • Base URL: http://localhost:13305/api/v1/chat/completions
    • Model Name: your loaded model name (e.g., qwen3.5-9b-FLM)
    • API Key: leave empty

Any OpenAI-Compatible Server

The plugin works with any server that implements the OpenAI /v1/chat/completions endpoint:

  • vLLM
  • text-generation-webui (with OpenAI extension)
  • LocalAI
  • Jan
  • Koboldcpp (with OpenAI API mode)

Configuration Tips

Base URL Format

The URL must point to the chat completions endpoint. Common patterns:

  • http://localhost:PORT/v1/chat/completions
  • http://localhost:PORT/api/v1/chat/completions
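The difference between the two patterns is just an extra path prefix. A small sketch (the `api_prefix` flag is a made-up parameter for illustration):

```python
def chat_endpoint(host, port, api_prefix=False):
    # Some servers (e.g. Lemonade) mount the OpenAI routes under /api.
    prefix = "/api" if api_prefix else ""
    return f"http://{host}:{port}{prefix}/v1/chat/completions"

ollama_url = chat_endpoint("localhost", 11434)
lemonade_url = chat_endpoint("localhost", 13305, api_prefix=True)
```

If a server rejects one form with a 404, try the other before assuming it is incompatible.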

API Key

Leave empty for local servers that don’t require authentication. The plugin skips the Authorization header when no key is set.

Custom Request Params

Use the “Custom Request Params (JSON)” field to pass extra parameters:

{"temperature": 0.7, "max_tokens": 16384}
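Conceptually, those extras are merged into the request body on top of the defaults. A hedged sketch of that merge (the exact semantics inside the plugin may differ):

```python
import json

# Default payload the request would carry anyway (illustrative values).
payload = {
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Hi"}],
}

# The "Custom Request Params (JSON)" field, parsed and merged in:
custom = json.loads('{"temperature": 0.7, "max_tokens": 16384}')
payload.update(custom)  # custom keys extend or override the defaults
```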

Model Selection

Enter the exact model name as your server reports it. For Ollama, this is the model tag (e.g., llama3.1, codellama:13b). For LM Studio, check the server tab for the loaded model identifier.

Limitations

  • Context window: Local models typically have 4K-32K token contexts. The plugin’s system prompt uses several thousand tokens, which may not leave much room for conversation on small models. Use models with 32K+ context for best results.
  • Tool calling: The AI needs a capable model to properly use tools. Small models (7B and under) may struggle with the structured JSON tool format. 13B+ recommended for tool-heavy workflows.
  • Speed: Generation speed depends entirely on your hardware. GPU acceleration is recommended.
  • Just Chat mode: If tool execution fails with your local model, try “Just Chat” mode which sends a minimal payload without tool instructions.
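The context-window point is easy to quantify. With illustrative numbers (an 8K-context model and a system prompt of roughly 4,000 tokens; both are assumptions, not measured plugin values):

```python
context_window = 8192   # e.g. a small local model's context size (assumed)
system_prompt = 4000    # assumed size of the plugin's system prompt
reserve_output = 1024   # tokens reserved for the model's reply

# What remains for the actual conversation history:
conversation_budget = context_window - system_prompt - reserve_output
```

With these numbers only a few thousand tokens remain for conversation, which is why 32K+ context models are recommended.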

Privacy

When using a local LLM, zero data leaves your machine. No API calls are made to any external server. Your prompts, project context, and generated content stay entirely local.
