Keep this key private. Include it in every API request as a Bearer token.
Authentication
Add your API key to the Authorization header of every request:
Authorization: Bearer YOUR_API_KEY
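As a minimal sketch of the header in practice, the following uses only Python's standard library to build an authorized GET request for /v1/models. The helper names (auth_headers, build_models_request, extract_model_ids, list_models) are illustrative, and the response shape parsed here (a "data" list of objects with an "id" field, as in OpenAI's models listing) is an assumption that this page does not confirm.

```python
import json
import urllib.request

BASE_URL = "https://makululinux.us:2007"


def auth_headers(api_key: str) -> dict:
    """Return the Authorization header described above."""
    return {"Authorization": f"Bearer {api_key}"}


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authorized GET request for /v1/models."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/models", headers=auth_headers(api_key), method="GET"
    )


def extract_model_ids(payload: dict) -> list:
    """Pull model IDs out of a decoded response body.

    ASSUMPTION: the body looks like {"data": [{"id": "..."}, ...]},
    mirroring OpenAI's models listing; adjust if the server differs.
    """
    return [item["id"] for item in payload.get("data", []) if "id" in item]


def list_models(api_key: str) -> list:
    """Send the request and return the model IDs (performs a network call)."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return extract_model_ids(json.load(resp))
```

Calling list_models("YOUR_API_KEY") would perform the actual request; the other helpers are side-effect free, so they can be checked without network access.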
The endpoint is OpenAI-compatible: change the base_url in your existing client
and it will work with Continue.dev, Open WebUI, LM Studio, Cursor, and any other tool that supports custom OpenAI endpoints.
Settings → Connections → Add Connection
API Base URL: https://makululinux.us:2007
API Key: YOUR_API_KEY
Then go to Models and select any model from
the /v1/models list. All models are available
immediately with no further configuration.
Endpoints
Method  Path                  Description
POST    /v1/chat/completions  Chat completions (OpenAI-compatible)
GET     /v1/models            List available model IDs
GET     /v1/model-speeds      Live model speed test results

MemPalace Long-Term Memory
POST    /v1/memory/store      Store a conversation turn into vector memory
POST    /v1/memory/retrieve   Retrieve hybrid (recent + semantic) context
DELETE  /v1/memory/clear      Wipe all memories for a conversation
GET     /v1/memory/stats      Memory count and timestamps for a conversation
MemPalace Long-Term Memory
🧠 What is MemPalace?
The /v1/chat/completions endpoint is a stateless, OpenAI-compatible pass-through —
it has no built-in memory. Your application is responsible for maintaining the
messages[] array across turns.
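Because the endpoint is stateless, the client has to resend the whole conversation on every turn. A minimal sketch of that bookkeeping follows; the ChatHistory class and its method names are hypothetical helpers, not part of the API.

```python
class ChatHistory:
    """Client-side store for the messages[] array of one conversation."""

    def __init__(self, system_prompt: str = "You are a helpful assistant."):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user(self, content: str) -> None:
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content: str) -> None:
        self.messages.append({"role": "assistant", "content": content})

    def payload(self, model: str) -> dict:
        """Return a request body for POST /v1/chat/completions."""
        # Copy the list so later turns do not mutate an already-built payload.
        return {"model": model, "messages": list(self.messages), "stream": False}


history = ChatHistory()
history.add_user("My name is Alex.")
history.add_assistant("Nice to meet you, Alex!")
history.add_user("What is my name?")
body = history.payload("qwen/qwen3.5-122b-a10b")
print(len(body["messages"]))  # 4: one system message plus three dialogue messages
```

Keeping the history in one object makes it easy to trim old turns before the array grows past the model's context window.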
MemPalace is the server-side long-term memory layer that sits alongside chat completions.
It uses ChromaDB — a local vector database running on the Electra server — to store every
conversation turn as a semantic embedding. When you retrieve memories, you get back a ready-to-inject
context block that you prepend to your next prompt.
Memory is scoped per conversation_id — a string you choose
and reuse across sessions. Entries are automatically purged after 60 days of inactivity.
How the hybrid retrieval works
⏱ Short-Term (Recent)
The 2 most recent conversation turns are always included, in chronological order.
This ensures the model always has immediate context — what was just said.
🔍 Long-Term (Semantic)
Up to 4 older turns are retrieved by vector similarity to the current query.
If the user mentions a topic discussed weeks ago, that memory is surfaced automatically.
Both layers are combined into a single compiled_context string with clear section
headers. Prepend it to your system message or the start of your
user message before calling /v1/chat/completions.
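To make the two layers concrete, here is a small local illustration of the selection logic: the 2 most recent turns, plus up to 4 older turns ranked by cosine similarity to the query. It is not the server's implementation. The toy two-dimensional vectors stand in for real embeddings, and the function names and the section-header wording in the compiled string are assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_select(turns, query_vec, recent_n=2, semantic_n=4):
    """turns: list of (text, vector) pairs in chronological order.

    Returns (recent, semantic): the last `recent_n` turns in order, and up to
    `semantic_n` older turns sorted by similarity to the query, best first.
    """
    recent = turns[-recent_n:] if recent_n else []
    older = turns[:-recent_n] if recent_n else turns
    ranked = sorted(older, key=lambda t: cosine(t[1], query_vec), reverse=True)
    return [t[0] for t in recent], [t[0] for t in ranked[:semantic_n]]


def compile_context(recent, semantic):
    """Join both layers into one string (header wording is hypothetical)."""
    parts = []
    if semantic:
        parts.append("[Relevant earlier turns]\n" + "\n".join(semantic))
    if recent:
        parts.append("[Most recent turns]\n" + "\n".join(recent))
    return "\n\n".join(parts)


turns = [
    ("User likes Python.", [1.0, 0.0]),
    ("User owns a cat.", [0.0, 1.0]),
    ("User asked about the weather.", [0.5, 0.5]),
    ("User said hello.", [0.4, 0.6]),
]
recent, semantic = hybrid_select(turns, query_vec=[1.0, 0.1])
print(compile_context(recent, semantic))
```

In this toy run the last two turns form the recent layer, and the Python turn ranks above the cat turn in the semantic layer because its vector points almost the same way as the query.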
Typical usage flow per turn
1. Retrieve: Call POST /v1/memory/retrieve with the user's new message as the query. Get back compiled_context.
2. Inject: Prepend compiled_context to your system prompt. Build your messages[] array as usual.
3. Complete: Send the enriched request to POST /v1/chat/completions and get the AI reply.
4. Store: Call POST /v1/memory/store with the user message and the reply you just received. The server stores the turn as an embedding for future retrieval.
Code Examples
curl -X POST https://makululinux.us:2007/v1/memory/store \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_id": "my-session-abc123",
    "user_message": "My name is Alex and I prefer concise answers.",
    "assistant_message": "Got it, Alex! I will keep my replies short and to the point."
  }'

# Response:
# { "stored": true, "conversation_id": "my-session-abc123" }

import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://makululinux.us:2007"
CONV_ID = "my-session-abc123"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}


def chat_with_memory(user_message: str) -> str:
    # 1. Retrieve relevant memories for this query
    mem = requests.post(f"{BASE_URL}/v1/memory/retrieve", headers=HEADERS, json={
        "conversation_id": CONV_ID,
        "query": user_message,
        "max_results": 4
    }).json()
    context = mem.get("compiled_context", "")

    # 2. Build enriched system prompt with memory injected
    system_prompt = "You are a helpful assistant."
    if context:
        system_prompt += f"\n\nPrevious conversation context:\n{context}"

    # 3. Call /v1/chat/completions
    resp = requests.post(f"{BASE_URL}/v1/chat/completions", headers=HEADERS, json={
        "model": "qwen/qwen3.5-122b-a10b",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        "stream": False
    }).json()
    assistant_reply = resp["choices"][0]["message"]["content"]

    # 4. Store this turn for future retrieval
    requests.post(f"{BASE_URL}/v1/memory/store", headers=HEADERS, json={
        "conversation_id": CONV_ID,
        "user_message": user_message,
        "assistant_message": assistant_reply
    })
    return assistant_reply


# Example conversation: memory persists across sessions
print(chat_with_memory("My name is Alex and I like Python."))
print(chat_with_memory("What's a good project idea for me?"))  # Remembers your name + Python pref

Important notes
• conversation_id is your responsibility. Use any unique string: a UUID, username, session token, or hash. The same ID must be used consistently across store and retrieve calls for a session.
• Memory persists across server restarts. ChromaDB is a persistent, disk-backed database at /www/chroma_memory_db on the server, so your memories survive server updates.
• Auto-purge after 60 days. A background worker runs nightly and deletes any memory entry with a timestamp older than 60 days. Active sessions stay alive indefinitely as long as new turns keep being stored.
• Memory calls do not count toward your daily request limit. Store, retrieve, clear, and stats operations are free; only /v1/chat/completions calls are counted.
• API key required. All four memory endpoints require a valid Bearer API key in the Authorization header, the same key you use for chat completions.
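One way to get a consistent conversation_id is sketched below, assuming you identify users by a username and want one memory per user per application. The function names and the app parameter are illustrative; any scheme works as long as the same input always yields the same ID.

```python
import hashlib
import uuid


def stable_conversation_id(username: str, app: str = "my-app") -> str:
    """Derive the same ID for the same user on every run (hash-based)."""
    digest = hashlib.sha256(f"{app}:{username}".encode("utf-8")).hexdigest()
    return f"{app}-{digest[:16]}"


def new_conversation_id() -> str:
    """Random ID for a fresh conversation; persist it yourself to reuse it."""
    return str(uuid.uuid4())


print(stable_conversation_id("alex") == stable_conversation_id("alex"))  # True
```

The hashed form avoids sending a raw username as the ID, while the random form suits throwaway sessions that should not share memory with earlier ones.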
The daily request limit resets at midnight UTC.
Membership Tiers
Tier       Pledge   Daily Limit
Free       $0/mo    15 requests
Private    $5/mo    150 requests
Corporal   $10/mo   330 requests
Sergeant   $20/mo   660 requests
Major      $40/mo   1,500 requests
Commander  $100/mo  Unlimited ∞
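For client-side budgeting, the tiers listed above can be encoded as a lookup. This is a sketch only: the server remains the source of truth for usage, and the names TIER_LIMITS, remaining_requests, and seconds_until_reset are hypothetical. The unlimited tier is represented as None, and the reset time follows the page's statement that the limit resets at midnight UTC.

```python
from datetime import datetime, timedelta, timezone

# Daily request limits per tier; None means unlimited.
TIER_LIMITS = {
    "Free": 15,
    "Private": 150,
    "Corporal": 330,
    "Sergeant": 660,
    "Major": 1500,
    "Commander": None,
}


def remaining_requests(tier: str, used_today: int):
    """Requests left today for a tier, or None when the tier is unlimited.

    `used_today` should count only /v1/chat/completions calls, since memory
    calls are not counted toward the limit.
    """
    limit = TIER_LIMITS[tier]
    return None if limit is None else max(limit - used_today, 0)


def seconds_until_reset(now: datetime) -> int:
    """Seconds from `now` (timezone-aware) until the next midnight UTC."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((next_midnight - now_utc).total_seconds())


print(remaining_requests("Corporal", 300))  # 30
print(seconds_until_reset(datetime(2025, 1, 1, 23, 0, tzinfo=timezone.utc)))  # 3600
```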
Manage Membership
To increase your daily request limit, upgrade your Patreon membership.
After upgrading on Patreon, click Sync Membership to
apply the new tier to your account immediately.