Electra API · MakuluLinux

Tier	Pledge	Daily Requests
Free	$0 / mo	15
Private	$5 / mo	150
Corporal	$10 / mo	330
Sergeant	$20 / mo	660
Major	$40 / mo	1,500
Commander	$100 / mo	Unlimited ∞

Base URL

https://makululinux.us:2007

API Version

OpenAI-compatible

Your API Key

Keep this key private. Include it in every API request as a Bearer token.

Authentication

Add your API key to the Authorization header of every request:

Authorization: Bearer YOUR_API_KEY

The endpoint is OpenAI-compatible: change the base_url in your existing client and it should work with Continue.dev, Open WebUI, LM Studio, Cursor, and other tools that support custom OpenAI endpoints.
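
Because the key has to stay private, one common approach is to read it from the environment instead of hard-coding it in source files. The sketch below assumes an environment variable named ELECTRA_API_KEY, which is a hypothetical name chosen for illustration and not something the service defines, and builds the Authorization header shown above.

```python
import os

def auth_headers(env_var: str = "ELECTRA_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    ELECTRA_API_KEY is a hypothetical variable name used only in this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your API key before calling the API")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.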

Code Examples

curl https://makululinux.us:2007/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.5-122b-a10b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://makululinux.us:2007/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="qwen/qwen3.5-122b-a10b",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://makululinux.us:2007/v1",
  apiKey:  "YOUR_API_KEY",
});

const response = await client.chat.completions.create({
  model:    "qwen/qwen3.5-122b-a10b",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

Settings → Connections → Add Connection

API Base URL : https://makululinux.us:2007
API Key      : YOUR_API_KEY

Then go to Models and select any model from
the /v1/models list. All models are available
immediately with no further configuration.

Endpoints

Method	Path	Description
POST	`/v1/chat/completions`	Chat completions — OpenAI compatible
GET	`/v1/models`	List available model IDs
GET	`/v1/model-speeds`	Live model speed test results
MemPalace Long-Term Memory
POST	`/v1/memory/store`	Store a conversation turn into vector memory
POST	`/v1/memory/retrieve`	Retrieve hybrid (recent + semantic) context
DELETE	`/v1/memory/clear`	Wipe all memories for a conversation
GET	`/v1/memory/stats`	Memory count and timestamps for a conversation
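
To pick a model ID programmatically, a client can query GET /v1/models. The following is a minimal standard-library sketch. It assumes the response follows the OpenAI list shape ({"data": [{"id": ...}]}), which this page does not show explicitly, and the function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://makululinux.us:2007"

def parse_model_ids(payload: dict) -> list:
    """Extract model IDs, assuming an OpenAI-style {"data": [{"id": ...}]} body."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]

def list_model_ids(api_key: str, timeout: float = 30.0) -> list:
    """Call GET /v1/models and return the model IDs it reports."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(json.load(response))
```

Calling list_model_ids("YOUR_API_KEY") performs the network request; parse_model_ids can be checked offline against a saved response.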

MemPalace Long-Term Memory

🧠 What is MemPalace?

The /v1/chat/completions endpoint is a stateless, OpenAI-compatible pass-through — it has no built-in memory. Your application is responsible for maintaining the messages[] array across turns.

MemPalace is the server-side long-term memory layer that sits alongside chat completions. It uses ChromaDB, a local vector database running on the Electra server, to store each conversation turn you submit as a semantic embedding. When you retrieve memories, you get back a ready-to-inject context block that you prepend to your next prompt.

Memory is scoped per conversation_id, a string you choose and reuse across sessions. Entries are purged automatically once they are more than 60 days old.
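
Since the same conversation_id has to be reused across sessions, it can help to derive it deterministically rather than generate a fresh random value on every run. The sketch below hashes an application name together with a user name. Both inputs and the function name are illustrative assumptions; any stable, unique string would serve equally well.

```python
import hashlib

def conversation_id_for(app_name: str, username: str) -> str:
    """Derive a stable conversation_id from an app name and a user name.

    The same inputs always produce the same ID, so memory stored in one
    session can be retrieved in a later one.
    """
    digest = hashlib.sha256(f"{app_name}:{username}".encode("utf-8")).hexdigest()
    return f"{app_name}-{digest[:16]}"
```

A random UUID works too, provided the application saves it somewhere and reloads it on the next run.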

How the hybrid retrieval works

⏱ Short-Term (Recent)

The 2 most recent conversation turns are always included, in chronological order. This ensures the model always has immediate context — what was just said.

🔍 Long-Term (Semantic)

Up to 4 older turns are retrieved by vector similarity to the current query. If the user mentions a topic discussed weeks ago, that memory is surfaced automatically.

Both layers are combined into a single compiled_context string with clear section headers. Prepend it to your system message or the start of your user message before calling /v1/chat/completions.
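
To make the two layers concrete, here is a client-side illustration of the selection logic. It is not the server's implementation: the service ranks older turns by embedding similarity in ChromaDB, whereas this sketch substitutes a simple word-overlap score so that it runs without dependencies. The section header strings are copied from the sample retrieve response on this page; the function names and the scoring are assumptions.

```python
RECENT_TURNS = 2   # short-term layer: the most recent turns, always included
MAX_SEMANTIC = 4   # long-term layer: upper bound on older turns returned

def overlap_score(query: str, turn: str) -> float:
    """Toy stand-in for vector similarity: Jaccard overlap of lowercase words."""
    q, t = set(query.lower().split()), set(turn.lower().split())
    return len(q & t) / len(q | t) if (q | t) else 0.0

def hybrid_select(turns: list, query: str) -> tuple:
    """Split stored turns (oldest first) into (recent, semantic) layers."""
    recent = turns[-RECENT_TURNS:]          # kept in chronological order
    older = turns[:-RECENT_TURNS]
    ranked = sorted(older, key=lambda turn: overlap_score(query, turn), reverse=True)
    semantic = [t for t in ranked if overlap_score(query, t) > 0][:MAX_SEMANTIC]
    return recent, semantic

def compile_context(recent: list, semantic: list) -> str:
    """Join both layers under section headers like those in the sample response."""
    lines = ["--- LONG-TERM RELEVANT MEMORIES ---", *semantic,
             "--- IMMEDIATE CONTEXT (SHORT-TERM) ---", *recent]
    return "\n".join(lines)
```

In practice you never need to run this yourself, since /v1/memory/retrieve returns compiled_context ready to use. The sketch only shows why a topic from weeks ago can resurface while the last two turns are always present.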

Typical usage flow per turn

1. Retrieve: Call POST /v1/memory/retrieve with the user's new message as the query. Get back compiled_context.

2. Inject: Prepend compiled_context to your system prompt. Build your messages[] array as usual.

3. Complete: Send the enriched request to POST /v1/chat/completions and get the AI reply.

4. Store: Call POST /v1/memory/store with the user message + the reply you just received. The server stores the turn as an embedding for future retrieval.

Code Examples

# --- Store (POST) ---
curl -X POST https://makululinux.us:2007/v1/memory/store \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_id": "my-session-abc123",
    "user_message":    "My name is Alex and I prefer concise answers.",
    "assistant_message": "Got it, Alex! I will keep my replies short and to the point."
  }'

# Response:
# { "stored": true, "conversation_id": "my-session-abc123" }

# --- Retrieve (POST) ---
curl -X POST https://makululinux.us:2007/v1/memory/retrieve \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_id": "my-session-abc123",
    "query":           "What is my name?",
    "max_results":     4
  }'

# Response:
# {
#   "conversation_id":  "my-session-abc123",
#   "total_memories":   5,
#   "compiled_context": "--- LONG-TERM RELEVANT MEMORIES ---\n...\n--- IMMEDIATE CONTEXT (SHORT-TERM) ---\n...",
#   "recent":           ["User said: ..."],
#   "semantic":         ["User said: My name is Alex..."]
# }

# --- Clear (DELETE) ---
curl -X DELETE https://makululinux.us:2007/v1/memory/clear \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_id": "my-session-abc123"
  }'

# Response:
# { "cleared": true, "conversation_id": "my-session-abc123" }

# --- Stats (GET) ---
curl "https://makululinux.us:2007/v1/memory/stats?conversation_id=my-session-abc123" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Response:
# {
#   "conversation_id":  "my-session-abc123",
#   "total_memories":   5,
#   "oldest_memory_ts": 1748000000.0,
#   "newest_memory_ts": 1748100000.0,
#   "retention_days":   60
# }
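
The timestamps in the stats response look like Unix epoch seconds, although this page does not say so explicitly. Under that assumption, the sketch below converts the oldest timestamp to a UTC datetime and estimates when that entry becomes eligible for the purge. The helper name is illustrative.

```python
from datetime import datetime, timedelta, timezone

def purge_estimate(stats: dict) -> tuple:
    """Return (oldest entry time, earliest purge time), both in UTC.

    Assumes oldest_memory_ts is Unix epoch seconds and that an entry is
    purged once it is older than retention_days.
    """
    oldest = datetime.fromtimestamp(stats["oldest_memory_ts"], tz=timezone.utc)
    return oldest, oldest + timedelta(days=stats["retention_days"])

# Values taken from the sample stats response
sample = {"oldest_memory_ts": 1748000000.0, "retention_days": 60}
oldest, purge_after = purge_estimate(sample)
print(oldest.isoformat(), "->", purge_after.isoformat())
```

Because the purge is described as a nightly job, the actual deletion may happen some hours after the computed time.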

import requests

API_KEY  = "YOUR_API_KEY"
BASE_URL = "https://makululinux.us:2007"
CONV_ID  = "my-session-abc123"
HEADERS  = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
TIMEOUT  = 60  # seconds; an arbitrary default, adjust for slower models

def chat_with_memory(user_message: str) -> str:
    # 1. Retrieve relevant memories for this query
    mem = requests.post(f"{BASE_URL}/v1/memory/retrieve", headers=HEADERS, timeout=TIMEOUT, json={
        "conversation_id": CONV_ID,
        "query":           user_message,
        "max_results":     4
    })
    mem.raise_for_status()  # stop here on any HTTP error status

    context = mem.json().get("compiled_context", "")

    # 2. Build enriched system prompt with memory injected
    system_prompt = "You are a helpful assistant."
    if context:
        system_prompt += f"\n\nPrevious conversation context:\n{context}"

    # 3. Call /v1/chat/completions
    resp = requests.post(f"{BASE_URL}/v1/chat/completions", headers=HEADERS, timeout=TIMEOUT, json={
        "model":    "qwen/qwen3.5-122b-a10b",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user",   "content": user_message}
        ],
        "stream": False
    })
    resp.raise_for_status()  # avoids a confusing KeyError on an error body

    assistant_reply = resp.json()["choices"][0]["message"]["content"]

    # 4. Store this turn for future retrieval
    requests.post(f"{BASE_URL}/v1/memory/store", headers=HEADERS, timeout=TIMEOUT, json={
        "conversation_id":   CONV_ID,
        "user_message":      user_message,
        "assistant_message": assistant_reply
    })

    return assistant_reply

# Example conversation — memory persists across sessions
print(chat_with_memory("My name is Alex and I like Python."))
print(chat_with_memory("What's a good project idea for me?"))  # Remembers your name + Python pref

Important notes

• conversation_id is your responsibility. Use any unique string: a UUID, username, session token, or hash. The same ID must be used consistently across store and retrieve calls for a session.

• Memory persists across server restarts. ChromaDB is a persistent disk-backed database at /www/chroma_memory_db on the server. Your memories survive server updates.

• Auto-purge after 60 days. A background worker runs nightly and deletes any memory entry with a timestamp older than 60 days. Active sessions stay alive indefinitely as long as new turns keep being stored.

• Memory calls do not count toward your daily request limit. Store, retrieve, clear, and stats operations are free; only /v1/chat/completions calls are counted.

• API key required. All four memory endpoints require a valid Bearer API key in the Authorization header, the same key you use for chat completions.
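
Since only chat completions count toward the limit and the counter resets at midnight UTC, a client can keep a rough local tally to avoid hitting the cap in the middle of a task. This is a client-side sketch only: the class name is hypothetical, the limit values are taken from the tier table on this page, and the authoritative count remains whatever the server records.

```python
from datetime import datetime, timezone
from typing import Optional

# Daily request limits per tier; None means unlimited
DAILY_LIMITS = {"Free": 15, "Private": 150, "Corporal": 330,
                "Sergeant": 660, "Major": 1500, "Commander": None}

class DailyBudget:
    """Local tally of chat completion calls, reset when the UTC date changes."""

    def __init__(self, tier: str):
        self.limit = DAILY_LIMITS[tier]
        self.day = None
        self.used = 0

    def _roll(self, now: datetime) -> None:
        today = now.astimezone(timezone.utc).date()
        if today != self.day:      # new UTC day: start counting from zero
            self.day, self.used = today, 0

    def remaining(self, now: Optional[datetime] = None) -> Optional[int]:
        """Requests left today, or None for an unlimited tier."""
        self._roll(now or datetime.now(timezone.utc))
        return None if self.limit is None else max(self.limit - self.used, 0)

    def record_chat_completion(self, now: Optional[datetime] = None) -> None:
        """Call once per /v1/chat/completions request; memory calls are not counted."""
        self._roll(now or datetime.now(timezone.utc))
        self.used += 1
```

The now parameter exists mainly so the rollover can be exercised with fixed timestamps; in normal use it can be omitted.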

The daily request count resets at midnight UTC.

Manage Membership

To increase your daily request limit, upgrade your Patreon membership. After upgrading on Patreon, click Sync Membership to apply the new tier to your account immediately.
