Deep Dive: How Tego OS v3.0.0's per_user Dedicated Instances Actually Work
v3.0.0 introduces per_user dedicated instances so a single digital avatar can be safely used by a whole team without interference. This post walks through the scheduling model, lifecycle, capacity guardrails, stale-lock cleanup and the Warm Pool roadmap that make the system genuinely production-grade.

One sentence: a single avatar, safely used by a whole team — without anyone stepping on anyone else.
Why per_user had to happen
In Tego OS v2.x, an "avatar" looked like this:
- One avatar mapped to one container;
- Every user of that avatar connected to the same runtime;
- Conversation history, long-term memory, file workspace — all in that single runtime.
Drop that into a real front-line, high-concurrency production scenario — a support center, an IT helpdesk, cross-border e-commerce ops — and three problems show up immediately:
- Session bleed. A is asking about shipping policy, B is chasing a refund — the avatar's context window is squeezed between two unrelated conversations.
- Memory contamination. A "customer preference" stored by A might be overwritten or read by B.
- File conflicts. A's workspace files might get touched by B; nothing is truly per-user.
We received the same customer ask more than once: "Can we give every employee their own runtime?" That's where v3.0.0's per_user comes from.
Scheduling model: split the avatar in two
v3.0.0 adds an instanceMode field on each avatar:
| Mode | Behavior |
|---|---|
| shared (default) | v2-compatible; multiple users share the same runtime |
| per_user | Each user gets a dedicated instance container plus a dedicated S3 namespace |
The scheduling diagram:
flowchart TD
UA[User A] --> GW[Gateway + JWT]
UB[User B] --> GW
UC[User C] --> GW
GW --> SCH[per_user scheduler]
SCH --> CA[A's container<br/>S3: per-user/A/...]
SCH --> CB[B's container<br/>S3: per-user/B/...]
SCH --> CC[C's container<br/>S3: per-user/C/...]
Four key decisions in the scheduler:
- Auth side: JWT carries instanceUserId; every downstream service can identify request ownership.
- Routing side: gateway picks/launches the target container by (avatarId, instanceUserId).
- Data side: S3 read/write paths are rewritten by instanceUserId; business code stays unaware.
- Reclaim side: idle past threshold → graceful shutdown → flush state to S3 → sub-second wakeup.
Failure in any of those four steps would make per_user feel "untrustworthy," so each step needs its own guardrail (below).
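To make the auth and routing decisions concrete, here is a minimal in-memory sketch. It is not Tego OS code: the `InstanceRouter` class, the container naming scheme, and the plain-dict `claims` argument (standing in for an already-verified JWT payload) are all assumptions for illustration. Only `instanceMode`, `instanceUserId`, and the `(avatarId, instanceUserId)` routing key come from the design described above.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceRouter:
    """Toy router (hypothetical); a real gateway would call an orchestrator."""

    # Maps (avatar_id, instance_user_id or None) -> container name.
    containers: dict = field(default_factory=dict)

    def route(self, avatar_id: str, instance_mode: str, claims: dict) -> str:
        """Return the container that should serve this request."""
        if instance_mode == "per_user":
            # Auth side: the JWT claim says who owns the request.
            user_id = claims["instanceUserId"]
            key = (avatar_id, user_id)
            name = f"{avatar_id}--{user_id}"
        else:
            # shared mode: every user of the avatar lands on one runtime.
            key = (avatar_id, None)
            name = f"{avatar_id}--shared"
        # Routing side: reuse a known container, otherwise "launch" one.
        return self.containers.setdefault(key, name)


router = InstanceRouter()
a = router.route("avatar-1", "per_user", {"instanceUserId": "A"})
b = router.route("avatar-1", "per_user", {"instanceUserId": "B"})
a_again = router.route("avatar-1", "per_user", {"instanceUserId": "A"})
print(a, b, a == a_again)
```

The point of keying on the pair rather than on the user alone is that the same person can hold separate instances of different avatars.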
Lifecycle: from PROVISIONING to RECYCLED
The per_user container state machine, roughly:
stateDiagram-v2
[*] --> PROVISIONING: first user connect
PROVISIONING --> READY: container up + heartbeat
READY --> ACTIVE: traffic
ACTIVE --> IDLE: 5min no traffic
IDLE --> RECYCLING: graceful shutdown
RECYCLING --> RECYCLED: state flushed to S3
RECYCLED --> PROVISIONING: next wakeup
PROVISIONING --> FAILED: launch failure
ACTIVE --> FAILED: heartbeat timeout / crash
FAILED --> [*]: stale-lock cleanup → auto-retry
Each state corresponds to clear engineering actions:
- PROVISIONING → READY — launch container, mount S3 namespace, pass health check, report heartbeat.
- READY → ACTIVE — gateway routes traffic, quota counters start.
- ACTIVE → IDLE — 5 minutes of zero traffic by default.
- IDLE → RECYCLING — graceful shutdown signal; container has 45s to flush state to S3.
- RECYCLING → RECYCLED — orchestrator resources reclaimed (pod / volume / service); S3 data preserved.
- RECYCLED → PROVISIONING — next time the user shows up, sub-second wakeup reusing S3 data.
Goals: low steady-state footprint, low wakeup latency, recoverable failure.
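One way to keep such a state machine honest is to encode the allowed transitions as data and reject everything else. The sketch below copies the edges from the state diagram above; the `advance` helper and the choice to treat FAILED as terminal (with a retry starting a fresh PROVISIONING record) are assumptions, not the actual scheduler code. It also adds no IDLE → ACTIVE edge, because the diagram does not show one.

```python
# Allowed transitions, taken from the state diagram above.
TRANSITIONS = {
    "PROVISIONING": {"READY", "FAILED"},
    "READY": {"ACTIVE"},
    "ACTIVE": {"IDLE", "FAILED"},
    "IDLE": {"RECYCLING"},
    "RECYCLING": {"RECYCLED"},
    "RECYCLED": {"PROVISIONING"},
    "FAILED": set(),  # assumption: terminal; a retry creates a new record
}

IDLE_AFTER_S = 5 * 60    # default: 5 minutes of zero traffic
FLUSH_DEADLINE_S = 45    # time a container gets to flush state to S3


def advance(current: str, target: str) -> str:
    """Return the new state, or raise if the edge is not in the diagram."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target


state = "PROVISIONING"
for nxt in ["READY", "ACTIVE", "IDLE", "RECYCLING", "RECYCLED"]:
    state = advance(state, nxt)
print(state)
```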
Three-axis quota guardrails
The biggest risk with per_user is resource abuse. When one avatar serves many people, quotas have to be precise to the person. v3.0.0 introduces three independent axes:
| Axis | Purpose |
|---|---|
| Per-avatar cap | Prevents one avatar from eating a tenant's total resources |
| Per-user cap | Prevents an individual from monopolizing |
| Per-tenant cap | Prevents a tenant from impacting the cluster |
Any axis hitting its threshold puts requests through the standard queue → retry → reject path.
On top of that, a capacity guardrail kicks in once host CPU or memory utilization exceeds 80%: new requests are answered with a 503 and queued rather than being allowed to overload the node. This backpressure is what makes "instance-as-a-service" production-grade.
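A rough sketch of how the three quota axes and the host capacity check could combine into one admission decision. The `Limits` type, the `admit` function, and the idea of counting running instances per axis are illustrative assumptions; the post does not say what unit each cap is measured in. Only the three axes and the 80% threshold come from the text.

```python
from dataclasses import dataclass

CAPACITY_THRESHOLD = 0.80  # host CPU/memory utilization guardrail


@dataclass(frozen=True)
class Limits:
    per_avatar: int
    per_user: int
    per_tenant: int


def admit(usage: dict, limits: Limits, cpu: float, mem: float) -> str:
    """Return 'admit' or the name of the guardrail that blocked the request.

    usage holds current counts under the keys 'avatar', 'user', 'tenant'
    (hypothetical shape).
    """
    # Host protection comes first: a full node helps nobody.
    if max(cpu, mem) > CAPACITY_THRESHOLD:
        return "capacity"
    # Each axis is checked independently; any one can block.
    if usage["user"] >= limits.per_user:
        return "per_user"
    if usage["avatar"] >= limits.per_avatar:
        return "per_avatar"
    if usage["tenant"] >= limits.per_tenant:
        return "per_tenant"
    return "admit"


limits = Limits(per_avatar=50, per_user=2, per_tenant=200)
print(admit({"avatar": 10, "user": 1, "tenant": 40}, limits, cpu=0.35, mem=0.50))
print(admit({"avatar": 10, "user": 2, "tenant": 40}, limits, cpu=0.35, mem=0.50))
print(admit({"avatar": 10, "user": 1, "tenant": 40}, limits, cpu=0.91, mem=0.50))
```

A caller would map `capacity` to the 503-and-queue response and the three axis names to the queue → retry → reject path.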
Heartbeat fallback and stale-lock cleanup
Container failures are normal — assume they will happen:
- process crash;
- network blip;
- node OOM;
- kubelet restart.
Two safety nets in v3.0.0:
Heartbeat fallback
Containers report a heartbeat every 30s. No heartbeat for 3 minutes triggers automatic shutdown, eliminating "zombie container holding quota forever." This means a few users' dead containers can never permanently drain the avatar's quota.
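The check itself can be very small. Below is a sketch, assuming the scheduler keeps a map of container ID to last-heartbeat timestamp; the `find_zombies` name and that data shape are hypothetical, while the 30-second interval and 3-minute timeout come from the text.

```python
HEARTBEAT_INTERVAL_S = 30      # containers report every 30s
HEARTBEAT_TIMEOUT_S = 3 * 60   # silence longer than this triggers shutdown


def find_zombies(last_heartbeat: dict, now: float) -> list:
    """Return IDs of containers whose last heartbeat is older than the timeout."""
    return sorted(
        cid
        for cid, seen_at in last_heartbeat.items()
        if now - seen_at > HEARTBEAT_TIMEOUT_S
    )


# Timestamps in seconds; 'b' has been silent for 200s, 'c' for exactly 180s.
beats = {"a": 990.0, "b": 800.0, "c": 820.0}
print(find_zombies(beats, now=1000.0))  # ['b']
```

With these numbers a container has to miss six consecutive reports before it is shut down, so a single network blip is not enough to kill a healthy instance.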
Stale-lock cleanup
A crash can leave PROVISIONING / PENDING rows stuck. A scheduled scanner flips rows past threshold to FAILED, so the next connection auto-retries.
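As an illustration, the scanner can be pictured as a periodic sweep over deployment rows. The `DeploymentRow` fields, the `sweep_stale` function, and the 10-minute threshold are assumptions, since the post names neither the schema nor the threshold value.

```python
from dataclasses import dataclass

STUCK_STATES = {"PROVISIONING", "PENDING"}


@dataclass
class DeploymentRow:
    avatar_id: str
    user_id: str
    state: str
    updated_at: float  # seconds since some epoch


def sweep_stale(rows: list, now: float, threshold_s: float = 600.0) -> int:
    """Flip rows stuck past the threshold to FAILED; return how many changed."""
    flipped = 0
    for row in rows:
        if row.state in STUCK_STATES and now - row.updated_at > threshold_s:
            row.state = "FAILED"  # the next connection can now retry
            row.updated_at = now
            flipped += 1
    return flipped


rows = [
    DeploymentRow("av1", "A", "PROVISIONING", updated_at=0.0),    # stuck
    DeploymentRow("av1", "B", "PROVISIONING", updated_at=900.0),  # still fresh
    DeploymentRow("av1", "C", "ACTIVE", updated_at=0.0),          # not a lock
]
print(sweep_stale(rows, now=1000.0), [r.state for r in rows])
```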
Together, these mean that as long as the node itself isn't completely broken, per_user self-heals without manual intervention.
S3 namespace: the data substrate of per_user
For per_user to actually work, the S3 path layout can't be a "single shared pot" anymore. v3.0.0 redesigned the namespace:
| Path | Meaning |
|---|---|
| {avatarId}/{key} | Template (admin write, all users read-only) |
| {avatarId}/shared/{key} | Runtime backup for shared mode |
| {avatarId}/per-user/{userId}/{key} | Per-user runtime backup for per_user mode |
Reads use a three-layer fallback: runtime/<key> → template/<key> → legacy seed path.
Writes are rewritten by the gateway to per-user/{userId}/...; business code doesn't need to know who the user is.
Customer-facing meaning: when an employee leaves, you only need to wipe per-user/<userId>/.... Clean and complete.
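The path layout, the read fallback, and the offboarding wipe can be sketched with a plain dict standing in for the bucket. The function names are hypothetical, and because the post does not specify the legacy seed path, the caller passes it in explicitly here. This is a sketch of the idea under those assumptions, not the gateway's actual rewrite logic.

```python
def template_key(avatar_id: str, key: str) -> str:
    return f"{avatar_id}/{key}"


def runtime_key(avatar_id: str, key: str, user_id=None) -> str:
    """Runtime path: shared namespace, or per-user when a user ID is given."""
    if user_id is None:
        return f"{avatar_id}/shared/{key}"
    return f"{avatar_id}/per-user/{user_id}/{key}"


def read_with_fallback(bucket: dict, avatar_id: str, key: str,
                       user_id=None, legacy_key=None):
    """Try runtime, then template, then the (caller-supplied) legacy path."""
    candidates = [runtime_key(avatar_id, key, user_id),
                  template_key(avatar_id, key)]
    if legacy_key is not None:
        candidates.append(legacy_key)
    for candidate in candidates:
        if candidate in bucket:
            return bucket[candidate]
    return None


def wipe_user(bucket: dict, avatar_id: str, user_id: str) -> int:
    """Delete one user's runtime data for one avatar; return keys removed."""
    # Trailing slash matters: user "A" must not match user "AB".
    prefix = f"{avatar_id}/per-user/{user_id}/"
    doomed = [k for k in bucket if k.startswith(prefix)]
    for k in doomed:
        del bucket[k]
    return len(doomed)


bucket = {
    "av1/persona.md": "template persona",
    "av1/per-user/A/notes.md": "A's notes",
    "av1/per-user/AB/notes.md": "AB's notes",
}
print(read_with_fallback(bucket, "av1", "notes.md", user_id="A"))
print(read_with_fallback(bucket, "av1", "persona.md", user_id="A"))
```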
One-click reset of all user instances
To keep canary and rollback under control, v3.0.0 ships a "one-click reset of all user instances" operation:
- stop all per_user containers;
- delete all orchestrator resources (pod / volume / service);
- clear deployment records;
- clear stale locks;
- optionally preserve or wipe S3 data.
In practice it's used in two situations:
- Canary switch: when an avatar moves from shared to per_user, clean orchestrator state first.
- Architecture upgrades: when the underlying contract bumps (scheduler, S3 layout), all user containers should be relaunched on the new version.
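The ordering of the reset steps is the part worth getting right: stop workloads before deleting their resources, and decide about S3 last. Here is a toy sketch over an in-memory `state` dict; the dict keys and the `reset_all_user_instances` name are hypothetical stand-ins for real orchestrator and database calls.

```python
def reset_all_user_instances(state: dict, wipe_s3: bool = False) -> list:
    """Run the reset steps in order and return a log of what was done."""
    steps = []
    state["containers"].clear()
    steps.append("stop per_user containers")
    state["resources"].clear()
    steps.append("delete orchestrator resources")
    state["deployments"].clear()
    steps.append("clear deployment records")
    state["locks"].clear()
    steps.append("clear stale locks")
    if wipe_s3:
        # Only per-user runtime data is dropped; templates stay untouched.
        for key in [k for k in state["s3"] if "/per-user/" in k]:
            del state["s3"][key]
        steps.append("wipe per-user S3 data")
    else:
        steps.append("preserve S3 data")
    return steps


state = {
    "containers": {"av1--A", "av1--B"},
    "resources": {"pod/av1--A", "volume/av1--A", "service/av1--A"},
    "deployments": [{"avatar": "av1", "user": "A"}],
    "locks": {"av1:A"},
    "s3": {"av1/persona.md": "t", "av1/per-user/A/notes.md": "n"},
}
for step in reset_all_user_instances(state, wipe_s3=False):
    print(step)
```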
Warm Pool roadmap
Today's first launch is a 5–15 second cold start. Warm Pool is on the roadmap:
- A scheduler-managed pool of pre-warmed containers;
- On first user connection, bind instanceUserId directly out of the pool;
- Target: bring first-launch latency from "5–15s" down to "< 1s."
Warm Pool's design is being co-developed with BusinessMonitor — pool levels will be exposed as new metrics so ops can dynamically size the pool against actual activity.
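Since Warm Pool is still on the roadmap, nothing below reflects a shipped design. It is only a sketch of the bind-from-pool idea, with a hypothetical `WarmPool` class whose `level` property stands in for the pool-level metric mentioned above.

```python
from collections import deque


class WarmPool:
    """Toy pool of pre-warmed containers (hypothetical design)."""

    def __init__(self, size: int):
        self._idle = deque(f"warm-{i}" for i in range(size))
        self._bound = {}  # instance_user_id -> container name

    @property
    def level(self) -> int:
        """Number of unbound warm containers; a candidate ops metric."""
        return len(self._idle)

    def bind(self, instance_user_id: str):
        """Bind a warm container to the user, or return None (cold start)."""
        if instance_user_id in self._bound:
            return self._bound[instance_user_id]
        if not self._idle:
            return None  # pool exhausted: fall back to a normal launch
        container = self._idle.popleft()
        self._bound[instance_user_id] = container
        return container


pool = WarmPool(size=2)
print(pool.bind("A"), pool.bind("B"), pool.bind("C"), pool.level)
```

When `bind` returns `None`, the caller falls back to today's cold-start path, so an undersized pool degrades latency rather than availability.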
Engineering takeaway
In retrospect, per_user is not a single feature — it's an ensemble:
- Scheduling → slice the avatar by user;
- Data → S3 namespace prefixing;
- Quotas → three independent axes;
- Stability → heartbeat fallback + stale-lock cleanup;
- Recoverability → one-click reset + three-layer fallback;
- Evolvability → Warm Pool ahead.
Only when all of these line up does "one avatar, safely shared by a whole team" stop being a slogan and become a real, deliverable enterprise capability.
| Channel | How to reach us |
|---|---|
| Enterprise demo | 30-minute walkthrough of the four core scenarios |
| Private deployment | support@zhama.com |
| Full platform | https://app.zhama.com |