⚡ ELECTRA API
MakuluLinux AI Backend · Public Access
How it works
1. Join our Patreon: pick a membership tier at patreon.com/makululinux.
2. Sign in here with Patreon: your account is created automatically, your tier is synced, and an API key is generated instantly.
3. Connect your tools: use your API key with any OpenAI-compatible app (Continue.dev, Open WebUI, Cursor, LM Studio…).
Membership Tiers
Tier        Pledge       Daily Requests
Free        $0 / mo      15
Private     $5 / mo      150
Corporal    $10 / mo     330
Sergeant    $20 / mo     660
Major       $40 / mo     1,500
Commander   $100 / mo    Unlimited
💡 We bill by requests per day, not tokens. No surprise costs — just a flat daily limit that resets at midnight UTC.
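Because the limit is a flat daily count that resets at midnight UTC, a client can work out locally how many requests it has left and how long to wait once the quota is used up. The following is a minimal sketch using only the standard library. The `DAILY_LIMITS` table is copied from the tiers above; the function names are illustrative and not part of the Electra API.

```python
from datetime import datetime, timedelta, timezone

# Daily request limits copied from the tier table; None means unlimited.
DAILY_LIMITS = {
    "Free": 15, "Private": 150, "Corporal": 330,
    "Sergeant": 660, "Major": 1500, "Commander": None,
}

def seconds_until_reset(now: datetime) -> float:
    """Seconds from `now` (timezone-aware) until the next midnight UTC."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return (next_midnight - now_utc).total_seconds()

def remaining_requests(tier: str, used_today: int):
    """Requests left today for a tier, or None if the tier is unlimited."""
    limit = DAILY_LIMITS[tier]
    return None if limit is None else max(limit - used_today, 0)
```

A client could call `remaining_requests` before each chat completion and sleep for `seconds_until_reset(datetime.now(timezone.utc))` when it returns 0.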
Base URL: https://makululinux.us:2007
API Version: v1 (OpenAI-compatible)
Your API Key
Keep this key private. Include it in every API request as a Bearer token.
Authentication
Add your API key to the Authorization header of every request:

Authorization: Bearer YOUR_API_KEY

The endpoint is OpenAI-compatible — just change the base_url in your existing client and it will work with Continue.dev, Open WebUI, LM Studio, Cursor, and any other tool that supports custom OpenAI endpoints.
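One way to keep the key private is to read it from the environment rather than hard-coding it in source files. This is a minimal sketch under that assumption; the variable name `ELECTRA_API_KEY` and the helper `auth_headers` are hypothetical and not defined by the Electra API.

```python
import os

def auth_headers(env=None) -> dict:
    """Build request headers from an API key stored in the environment."""
    env = os.environ if env is None else env
    # ELECTRA_API_KEY is a hypothetical variable name chosen for this sketch.
    key = env.get("ELECTRA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set ELECTRA_API_KEY to your Electra API key")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the `headers` argument of any HTTP client that accepts one.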
Code Examples
cURL

    curl https://makululinux.us:2007/v1/chat/completions \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "qwen/qwen3.5-122b-a10b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": false
      }'

Python

    from openai import OpenAI

    client = OpenAI(
        base_url="https://makululinux.us:2007/v1",
        api_key="YOUR_API_KEY"
    )

    response = client.chat.completions.create(
        model="qwen/qwen3.5-122b-a10b",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

JavaScript

    import OpenAI from "openai";

    const client = new OpenAI({
      baseURL: "https://makululinux.us:2007/v1",
      apiKey: "YOUR_API_KEY",
    });

    const response = await client.chat.completions.create({
      model: "qwen/qwen3.5-122b-a10b",
      messages: [{ role: "user", content: "Hello!" }],
    });
    console.log(response.choices[0].message.content);

Settings → Connections → Add Connection

    API Base URL : https://makululinux.us:2007
    API Key      : YOUR_API_KEY

Then go to Models and select any model from the /v1/models list. All models are available immediately with no further configuration.
Endpoints
Method   Path                    Description
POST     /v1/chat/completions    Chat completions (OpenAI-compatible)
GET      /v1/models              List available model IDs
GET      /v1/model-speeds        Live model speed test results

MemPalace Long-Term Memory
POST     /v1/memory/store        Store a conversation turn into vector memory
POST     /v1/memory/retrieve     Retrieve hybrid (recent + semantic) context
DELETE   /v1/memory/clear        Wipe all memories for a conversation
GET      /v1/memory/stats        Memory count and timestamps for a conversation
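As an illustration of the GET endpoints, the sketch below lists model IDs with the standard library only. The response shape is an assumption: this page does not document it, so the code assumes the usual OpenAI-style list, `{"data": [{"id": ...}, ...]}`. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://makululinux.us:2007"

def extract_model_ids(payload: dict) -> list:
    """Pull model IDs out of an OpenAI-style model list (assumed shape)."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]

def list_models(api_key: str) -> list:
    """GET /v1/models and return the model IDs it reports."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_model_ids(json.load(resp))
```

If the server returns a different structure, only `extract_model_ids` needs to change.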
MemPalace Long-Term Memory
🧠 What is MemPalace?

The /v1/chat/completions endpoint is a stateless, OpenAI-compatible pass-through — it has no built-in memory. Your application is responsible for maintaining the messages[] array across turns.
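Since the endpoint is stateless, the client has to resend earlier turns itself. A minimal sketch of one way to keep that messages[] array on the client side follows; the `ChatHistory` class and its `max_turns` cap are hypothetical conveniences, not something the API provides or requires.

```python
class ChatHistory:
    """Client-side record of one conversation's messages[] array."""

    def __init__(self, system_prompt: str, max_turns: int = 10):
        self.system = {"role": "system", "content": system_prompt}
        self.turns = []             # alternating user / assistant messages
        self.max_turns = max_turns  # hypothetical cap to bound request size

    def add_turn(self, user_message: str, assistant_message: str) -> None:
        """Record a completed exchange after the reply has been received."""
        self.turns.append({"role": "user", "content": user_message})
        self.turns.append({"role": "assistant", "content": assistant_message})
        # Keep only the newest max_turns exchanges (two messages each).
        self.turns = self.turns[-2 * self.max_turns:]

    def messages_for(self, user_message: str) -> list:
        """Build the messages[] array to send for the next user message."""
        return [self.system, *self.turns,
                {"role": "user", "content": user_message}]
```

Pass the result of `messages_for(...)` as the `messages` field of the next request, then call `add_turn(...)` with the reply.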

MemPalace is the server-side long-term memory layer that sits alongside chat completions. It uses ChromaDB — a local vector database running on the Electra server — to store every conversation turn as a semantic embedding. When you retrieve memories, you get back a ready-to-inject context block that you prepend to your next prompt.

Memory is scoped per conversation_id — a string you choose and reuse across sessions. Entries are automatically purged after 60 days of inactivity.
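Because the same conversation_id has to be reused across sessions, it can help to derive it deterministically instead of generating a random value each time. The sketch below does this with a name-based UUID; the namespace string `example.com` and the `user`/`topic` inputs are placeholders chosen for illustration, and any stable unique string would serve equally well.

```python
import uuid

# Hypothetical application namespace; replace with a value of your own.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

def conversation_id_for(user: str, topic: str = "default") -> str:
    """Return the same ID every time for the same user and topic."""
    return str(uuid.uuid5(APP_NAMESPACE, f"{user}/{topic}"))
```

The same inputs always map to the same ID, so a restarted client picks up the memories it stored earlier.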

How the hybrid retrieval works
⏱ Short-Term (Recent)
The 2 most recent conversation turns are always included, in chronological order. This ensures the model always has immediate context — what was just said.
🔍 Long-Term (Semantic)
Up to 4 older turns are retrieved by vector similarity to the current query. If the user mentions a topic discussed weeks ago, that memory is surfaced automatically.
Both layers are combined into a single compiled_context string with clear section headers. Prepend it to your system message or the start of your user message before calling /v1/chat/completions.
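To make the two layers concrete, here is a small local simulation of hybrid retrieval. It is not the server's implementation: the real service uses ChromaDB embeddings, while this sketch substitutes a toy bag-of-words vector and cosine similarity. The function names, and the choice to drop turns with zero similarity, are assumptions made for illustration; the section headers match the ones shown in the retrieve response.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def hybrid_retrieve(turns, query, recent_n=2, max_results=4):
    """turns: list of strings, oldest first. Returns (semantic, recent)."""
    recent = turns[-recent_n:] if recent_n else []
    older = turns[:-recent_n] if recent_n else list(turns)
    q = embed(query)
    scored = [(cosine(embed(t), q), t) for t in older]
    # Rank older turns by similarity, ignoring those with no overlap at all.
    ranked = sorted((s for s in scored if s[0] > 0),
                    key=lambda s: s[0], reverse=True)
    return [t for _, t in ranked[:max_results]], recent

def compile_context(semantic, recent) -> str:
    """Join both layers into one string with section headers."""
    return "\n".join([
        "--- LONG-TERM RELEVANT MEMORIES ---", *semantic,
        "--- IMMEDIATE CONTEXT (SHORT-TERM) ---", *recent,
    ])
```

The recent layer is taken by position and the semantic layer by similarity, which is why an old but relevant turn can appear alongside the latest exchange.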
Typical usage flow per turn
1. Retrieve: Call POST /v1/memory/retrieve with the user's new message as the query. Get back compiled_context.
2. Inject: Prepend compiled_context to your system prompt. Build your messages[] array as usual.
3. Complete: Send the enriched request to POST /v1/chat/completions and get the AI reply.
4. Store: Call POST /v1/memory/store with the user message and the reply you just received. The server stores the turn as an embedding for future retrieval.
Code Examples
Store

    curl -X POST https://makululinux.us:2007/v1/memory/store \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "conversation_id": "my-session-abc123",
        "user_message": "My name is Alex and I prefer concise answers.",
        "assistant_message": "Got it, Alex! I will keep my replies short and to the point."
      }'

    # Response:
    # { "stored": true, "conversation_id": "my-session-abc123" }

Retrieve

    curl -X POST https://makululinux.us:2007/v1/memory/retrieve \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "conversation_id": "my-session-abc123",
        "query": "What is my name?",
        "max_results": 4
      }'

    # Response:
    # {
    #   "conversation_id": "my-session-abc123",
    #   "total_memories": 5,
    #   "compiled_context": "--- LONG-TERM RELEVANT MEMORIES ---\n...\n--- IMMEDIATE CONTEXT (SHORT-TERM) ---\n...",
    #   "recent": ["User said: ..."],
    #   "semantic": ["User said: My name is Alex..."]
    # }

Clear and stats

    curl -X DELETE https://makululinux.us:2007/v1/memory/clear \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{ "conversation_id": "my-session-abc123" }'

    # Response:
    # { "cleared": true, "conversation_id": "my-session-abc123" }

    # --- Stats (GET) ---
    curl "https://makululinux.us:2007/v1/memory/stats?conversation_id=my-session-abc123" \
      -H "Authorization: Bearer YOUR_API_KEY"

    # Response:
    # {
    #   "conversation_id": "my-session-abc123",
    #   "total_memories": 5,
    #   "oldest_memory_ts": 1748000000.0,
    #   "newest_memory_ts": 1748100000.0,
    #   "retention_days": 60
    # }

Python

    import requests

    API_KEY = "YOUR_API_KEY"
    BASE_URL = "https://makululinux.us:2007"
    CONV_ID = "my-session-abc123"
    HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

    def chat_with_memory(user_message: str) -> str:
        # 1. Retrieve relevant memories for this query
        mem = requests.post(f"{BASE_URL}/v1/memory/retrieve", headers=HEADERS, json={
            "conversation_id": CONV_ID,
            "query": user_message,
            "max_results": 4
        }).json()
        context = mem.get("compiled_context", "")

        # 2. Build enriched system prompt with memory injected
        system_prompt = "You are a helpful assistant."
        if context:
            system_prompt += f"\n\nPrevious conversation context:\n{context}"

        # 3. Call /v1/chat/completions
        resp = requests.post(f"{BASE_URL}/v1/chat/completions", headers=HEADERS, json={
            "model": "qwen/qwen3.5-122b-a10b",
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_message}
            ],
            "stream": False
        }).json()
        assistant_reply = resp["choices"][0]["message"]["content"]

        # 4. Store this turn for future retrieval
        requests.post(f"{BASE_URL}/v1/memory/store", headers=HEADERS, json={
            "conversation_id": CONV_ID,
            "user_message": user_message,
            "assistant_message": assistant_reply
        })
        return assistant_reply

    # Example conversation: memory persists across sessions
    print(chat_with_memory("My name is Alex and I like Python."))
    print(chat_with_memory("What's a good project idea for me?"))  # Remembers your name + Python pref
Important notes
• conversation_id is your responsibility. Use any unique string: a UUID, username, session token, or hash. The same ID must be used consistently across store and retrieve calls for a session.
• Memory persists across server restarts. ChromaDB is a persistent disk-backed database at /www/chroma_memory_db on the server. Your memories survive server updates.
• Auto-purge after 60 days. A background worker runs nightly and deletes any memory entry with a timestamp older than 60 days. Active sessions stay alive indefinitely as long as new turns keep being stored.
• Memory calls do not count toward your daily request limit. Store, retrieve, clear, and stats operations are free; only /v1/chat/completions calls are counted.
• API key required. All four memory endpoints require a valid Bearer API key in the Authorization header, the same key you use for chat completions.
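The stats response can be used to estimate when the oldest stored entry will become eligible for the nightly purge. This sketch assumes that oldest_memory_ts is a Unix timestamp in seconds, which the sample values suggest but this page does not state; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def purge_deadline(stats: dict) -> datetime:
    """Earliest UTC time at which the oldest entry may be purged.

    Assumes oldest_memory_ts is Unix epoch seconds (not confirmed here).
    """
    oldest = datetime.fromtimestamp(stats["oldest_memory_ts"], tz=timezone.utc)
    return oldest + timedelta(days=stats["retention_days"])

# Sample values taken from the stats response shown earlier.
stats = {"oldest_memory_ts": 1748000000.0, "retention_days": 60}
print(purge_deadline(stats).isoformat())
```

Because the worker runs nightly, the actual deletion may happen some hours after the computed time.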
Manage Membership

To increase your daily request limit, upgrade your Patreon membership. After upgrading on Patreon, click Sync Membership to apply the new tier to your account immediately.

Join us on Discord
support@makululinux.com
© MakuluLinux · Electra API