MiroFish: swarm prediction through social simulation
We examined MiroFish's architecture — from OASIS swarms to GraphRAG extraction. How it predicts the future and where the approach breaks down.
In early March 2026, an undergraduate student named Guo Hangjiang pushed MiroFish to GitHub. Ten days of development, built on top of an existing simulation engine. Within a week it hit #1 on GitHub's global trending list — 27,000 stars, 3,200 forks, and a 30-million-RMB (~$4.1M USD) investment from Shanda Group founder Chen Tianqiao, finalized within 24 hours.
The pitch: instead of asking one LLM for a prediction, simulate thousands of autonomous agents interacting on social platforms, then analyze the emergent patterns. Swarm intelligence, not single-model forecasting.
We dug into what it actually does: the five-stage pipeline, the OASIS simulation engine underneath, the memory and knowledge layers, and the fundamental limitations of LLM-based social simulation. We also surveyed the academic research behind agent-based opinion dynamics — the bounded confidence models, career-driven behavioral archetypes, and cost-scaling techniques that MiroFish builds on top of (and sometimes ignores).
The predecessor: BettaFish
MiroFish didn't come from nowhere. Guo's first project, BettaFish, is a multi-agent public opinion analysis tool that hit #1 on GitHub Trending in late 2024, gaining 20,000 stars in one week. BettaFish collects and analyzes public sentiment data. MiroFish extends that into prediction — simulating what happens next instead of just measuring what already happened.
The two projects form a closed loop: BettaFish for data collection and analysis, MiroFish for forward-looking simulation. Chen Tianqiao noticed BettaFish, brought Guo into Shanda's orbit, and funded MiroFish when it followed.
Five-stage pipeline
MiroFish runs predictions through five sequential stages:
Stage 1 — Knowledge graph construction
Raw input documents (news articles, policy papers, financial signals) are processed through GraphRAG to extract entities and relationships. Instead of treating documents as flat text, GraphRAG builds a structured knowledge graph representing key players, connections, and pressures. This gives agents a grounded world model rather than raw text to reason about.
Stage 2 — Environment setup and agent creation
An Environment Configuration Agent generates the simulation parameters: platform rules, initial conditions, time horizons. Then agent personas are created — each with unique personality traits (optimism levels, risk tolerance, professional backgrounds), distinct stances on the topic, and persistent memory via Zep Cloud. No two agents are identical.
Stage 3 — Dual-platform parallel simulation
Agents interact simultaneously on two simulated social platforms — one mimicking Twitter-like dynamics (short posts, viral spread, trending topics) and one mimicking Reddit-like dynamics (threaded discussion, upvoting, community formation). This dual-platform approach captures different modes of discourse: broadcast vs. deliberation.
The simulation engine underneath is OASIS (Open Agent Social Interaction Simulations), built by the CAMEL-AI team. More on this below.
Stage 4 — Report generation
A specialized ReportAgent synthesizes simulation outcomes. Instead of raw statistics, it identifies sentiment inflection points, coalition formation patterns, opinion leader emergence, and cascade effects. The report captures why opinions shifted, not just that they shifted.
Stage 5 — Interactive query
Users can chat directly with individual simulated agents to understand their reasoning. They can also inject new variables mid-simulation — a new policy announcement, a market shock, a competing narrative — and re-run to see how outcomes change. The "god's-eye view" lets you test decisions risk-free.
The OASIS engine
MiroFish's simulation runs on OASIS, a research project from CAMEL-AI documented in arxiv:2411.11581. OASIS is the infrastructure layer that makes multi-agent social simulation tractable at scale.
Key capabilities:
- Scale: Up to one million concurrent agents
- Social actions: 21+ distinct interactions — follow, unfollow, comment, repost, like, dislike, mute, search posts, search users, create post, and more
- Platform models: Reproduces both Twitter/X and Reddit interaction dynamics
- Recommendation systems: Interest-based and hot-score-based content surfacing, mimicking real platform algorithms
- Emergent phenomena: Research validates that OASIS reproduces information spreading, group polarization, and herd effects observed in real social networks
The OASIS paper's key finding: larger agent groups produce more diverse opinions and stronger group dynamics. This matters for MiroFish — simulation quality scales with agent count, not just model quality.
*(Figure: Swarm Prediction Capabilities, 0–10 scale)*
Tech stack
| Component | Technology |
|---|---|
| Backend | Python 3.11+ (58% of codebase) |
| Frontend | Vue.js (41%) |
| Simulation engine | OASIS (CAMEL-AI) |
| Knowledge layer | GraphRAG |
| Agent memory | Zep Cloud |
| Default LLM | Alibaba Qwen-plus via DashScope |
| LLM compatibility | Any OpenAI SDK-compatible API |
| Deployment | Docker Compose or source (Node.js 18+ required) |
| License | AGPL-3.0 |
Setup is straightforward: `git clone`, configure API keys in `.env`, then `npm run setup:all` and `npm run dev`. The frontend serves on port 3000, the API on port 5001, and Docker Compose is supported for one-command deployment.
What makes it different
Most multi-agent frameworks — CrewAI, AutoGen, LangGraph — orchestrate agents that collaborate on tasks: coding, research, analysis. Agents are team members working toward a shared goal.
MiroFish does something fundamentally different. Its agents are not collaborators — they are simulated people. Each agent has opinions, biases, social connections, and memory. They don't work together; they interact socially — posting, debating, forming coalitions, creating herd effects. The prediction emerges from collective behavior, not from any single agent's reasoning.
This distinction matters architecturally:
| | Task-oriented multi-agent | Social simulation (MiroFish) |
|---|---|---|
| Agent role | Team member with a job | Simulated person with opinions |
| Interaction | Collaborative delegation | Social discourse (post, comment, follow) |
| Goal | Complete a task | Simulate emergent group behavior |
| Output | Task result | Prediction report from crowd dynamics |
| Scale target | 3–10 agents | 100s–1,000,000 agents |
| Memory | Session/task scoped | Persistent persona across rounds |
*(Figure: Task-Oriented vs Social Simulation Multi-Agent Systems)*
The academic foundation: opinion dynamics models
MiroFish runs all agents through LLM inference, but the field of opinion dynamics has studied how to model social influence computationally for over two decades — often without LLMs at all. These models offer both theoretical grounding and practical cost-reduction techniques.
Bounded confidence models
The Deffuant-Weisbuch (DW) model is the canonical framework. The core rule: two agents only influence each other if their opinions are close enough. The update formula is simple:
If |opinion_i − opinion_j| < confidence_bound, then opinion_i += μ × (opinion_j − opinion_i)
Where μ is the convergence rate (how much agent i adjusts toward agent j) and the confidence bound determines who listens to whom. A comprehensive survey in Automatica (2023) covers the full landscape of bounded confidence model variants.
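The update rule is small enough to run as plain arithmetic. A minimal sketch of the DW dynamics, using the symmetric variant where both agents move toward each other; the parameter values are illustrative:

```python
import random

def dw_step(opinions, epsilon=0.2, mu=0.5, rng=random):
    """One Deffuant-Weisbuch update: a random pair interacts only
    if their opinions lie within the confidence bound epsilon."""
    i, j = rng.sample(range(len(opinions)), 2)
    if abs(opinions[i] - opinions[j]) < epsilon:
        shift = mu * (opinions[j] - opinions[i])
        opinions[i] += shift  # agent i moves toward agent j
        opinions[j] -= shift  # symmetric variant: j moves toward i
    return opinions

# 200 agents with opinions uniform on [0, 1]; with a narrow bound the
# population typically fragments into a few surviving opinion clusters.
rng = random.Random(42)
opinions = [rng.random() for _ in range(200)]
for _ in range(20000):
    dw_step(opinions, epsilon=0.2, mu=0.5, rng=rng)
```

With ε near 1 every pair interacts and the population converges to consensus; shrinking ε produces fragmentation into non-interacting clusters, which is exactly the polarization regime a social simulation cares about.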
The crucial extension: heterogeneous confidence bounds. In the real world, people with extreme opinions tend to listen less, while moderates are more open-minded. A Physical Review Research paper generalizes DW by adding per-agent node weights — each agent has a different probability of interacting and a different influence weight. The paper explicitly notes that this heterogeneity parameter can represent "the combined effect of education, training, profession, and interest."
This is directly relevant to MiroFish. Instead of running every agent through an LLM, the bounded confidence framework suggests that social influence can be modeled mathematically — with career, education, and personality mapping to model parameters.
Career-driven agent behavior
How do you parameterize agents by career? The research offers several approaches:
AgentSociety (Tsinghua, February 2025) simulates 10,000+ agents where occupation is a core behavioral driver. Each agent has a profile including career, age, gender, education, and personality. Occupation determines work propensity (hours → income → consumption budget), social network topology (colleagues as a distinct relationship type), and even communication style ("more formal language with colleagues"). The agent "mind" has three layers: emotions (six core emotions rated 0–10), needs (Maslow's hierarchy), and cognition (topic attitudes rated 0–10).
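The demographic profile plus three-layer mind translates naturally into a data structure. A sketch of that shape; the field names and the six specific emotion labels are my assumptions, and only the layering (emotions, needs, cognition, rated 0–10 where applicable) follows the paper:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMind:
    # Layer 1: six core emotions, each rated 0-10 (labels are hypothetical)
    emotions: dict = field(default_factory=lambda: {
        e: 5 for e in ("joy", "sadness", "anger", "fear", "surprise", "disgust")
    })
    # Layer 2: current level on Maslow's hierarchy of needs
    need: str = "esteem"
    # Layer 3: cognition, mapping topic -> attitude rated 0-10
    attitudes: dict = field(default_factory=dict)

@dataclass
class AgentProfile:
    career: str       # core behavioral driver: work hours, income, budget
    age: int
    gender: str
    education: str
    mind: AgentMind = field(default_factory=AgentMind)

journalist = AgentProfile(career="journalist", age=34, gender="f", education="MA")
journalist.mind.attitudes["carbon tax"] = 7  # supportive stance on one topic
```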
A January 2026 paper on behavioral traits in social media simulation takes a different approach — defining seven empirically-grounded behavioral archetypes: Silent Observer, Occasional Sharer, Occasional Engager, Balanced Participant, Content Amplifier, Proactive Contributor, and Interactive Enthusiast. Each archetype has a probability distribution over actions (post, repost, comment, react, do nothing). Career maps naturally to archetype: a journalist is a Proactive Contributor, an engineer is a Silent Observer, a social media manager is an Interactive Enthusiast.
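Operationally, each archetype reduces to a categorical distribution over actions. A sketch with three of the seven archetypes; the archetype names come from the paper, but the probabilities below are illustrative guesses, not the paper's fitted values:

```python
import random

# Illustrative per-archetype action distributions (each sums to 1.0)
ARCHETYPES = {
    "silent_observer":       {"post": 0.01, "repost": 0.02, "comment": 0.02, "react": 0.15, "nothing": 0.80},
    "content_amplifier":     {"post": 0.05, "repost": 0.45, "comment": 0.10, "react": 0.25, "nothing": 0.15},
    "proactive_contributor": {"post": 0.40, "repost": 0.15, "comment": 0.25, "react": 0.15, "nothing": 0.05},
}

def sample_action(archetype: str, rng=random) -> str:
    """Draw one action for an agent of the given archetype."""
    dist = ARCHETYPES[archetype]
    return rng.choices(list(dist), weights=list(dist.values()))[0]
```

A career lookup table then maps "journalist" to `proactive_contributor`, "engineer" to `silent_observer`, and so on, as the text suggests.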
Career-to-weight mapping works like this:
| Career attribute | Model parameter | Behavioral effect |
|---|---|---|
| Domain expertise | Confidence bound (ε) | Experts have narrow bounds in their field (harder to sway), wider bounds elsewhere |
| Social influence (CEO vs intern) | Interaction weight (w) | High-status roles are selected more often as interaction partners |
| Risk tolerance (trader vs accountant) | Convergence rate (μ) | Risk-tolerant roles shift opinions faster |
| Information access (journalist vs farmer) | Activity level | Information-rich roles interact more frequently |
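In code, that table becomes a lookup from career to a small parameter vector. The numbers below are illustrative placeholders, not calibrated values, and a single ε per career is a simplification (per-topic bounds, narrow inside a domain of expertise, would be the next refinement):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OpinionParams:
    epsilon: float   # confidence bound: how far an opinion can be and still persuade
    mu: float        # convergence rate: how fast the agent shifts when persuaded
    weight: float    # interaction weight: how often others pick this agent as partner
    activity: float  # per-round probability of engaging at all

# Hypothetical values, ordered to match the table above
CAREER_PARAMS = {
    "journalist": OpinionParams(epsilon=0.35, mu=0.40, weight=0.70, activity=0.90),
    "trader":     OpinionParams(epsilon=0.30, mu=0.60, weight=0.50, activity=0.70),
    "accountant": OpinionParams(epsilon=0.25, mu=0.15, weight=0.30, activity=0.40),
    "ceo":        OpinionParams(epsilon=0.25, mu=0.20, weight=0.90, activity=0.50),
    "farmer":     OpinionParams(epsilon=0.30, mu=0.30, weight=0.20, activity=0.20),
}

def params_for(career: str) -> OpinionParams:
    # Moderate defaults for careers outside the lookup table
    return CAREER_PARAMS.get(career, OpinionParams(0.25, 0.30, 0.50, 0.50))
```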
A Scientific Reports paper on opinion leader influence extends bounded confidence to model opinion leaders vs followers with explicit media strategies — finding that the influence weight on advertisements has a "dual effect" on follower opinion evolution, sometimes polarizing rather than persuading.
Emergent individuality: do you even need to assign careers?
The most surprising finding comes from a November 2024 paper on spontaneous emergence of agent individuality. Researchers gave all agents identical initial profiles (all INFJ on the MBTI scale). After 100 simulation steps of social interaction, agents spontaneously differentiated into five distinct personality types: ESFJ, ISTJ, ENTJ, ESTJ, and ISFJ. The researchers noted that "the differentiation into broadly leader-like and follower-like personalities suggests that the agents may have naturally taken on different roles."
This suggests a middle path: you might not need to hand-assign every career and personality trait. Seed agents with basic demographic distributions, run a few "warm-up" rounds, and let social interaction differentiate them into natural roles. The emergent archetypes may be more realistic than hand-assigned ones — because they arise from the same social dynamics the simulation is trying to model.
The cost problem
MiroFish's biggest practical constraint is cost. Every agent interaction requires an LLM inference call — regardless of whether you're using a cloud API or a local model. N agents × M rounds × K interactions per round = potentially millions of inference calls.
Local models (via Ollama or vLLM) reduce the dollar cost but not the compute or time cost. The OASIS paper demonstrates scaling to one million agents, but that's with a mix of LLM and rule-based agents. Running one million pure LLM agents is prohibitively expensive on any hardware.
The research suggests several approaches to break this linear scaling:
Hybrid LLM + rule-based agents
The bounded confidence models above are computationally trivial — a few floating-point operations per interaction. The natural split:
- 5–10% opinion leaders: Full LLM inference. They generate original posts and nuanced responses.
- 15–20% mid-tier agents: LLM-seeded weight vectors. React to leader posts using bounded confidence math, generating short responses from career-informed templates.
- 70–80% followers: Pure rule-based. They like, repost, follow based on embedding similarity and sentiment alignment. No inference at all.
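A sketch of the tier assignment under those fractions; which agents become leaders would ideally follow the social graph (e.g. degree centrality), so random assignment here is a simplification:

```python
import random

def assign_tiers(n_agents, leader_frac=0.05, mid_frac=0.15, rng=None):
    """Partition agents into inference tiers: full-LLM leaders,
    LLM-seeded mid-tier, and pure rule-based followers."""
    rng = rng or random.Random()
    ids = list(range(n_agents))
    rng.shuffle(ids)
    n_leaders = max(1, round(n_agents * leader_frac))
    n_mid = round(n_agents * mid_frac)
    return {
        "leader":   ids[:n_leaders],                   # full LLM inference
        "mid":      ids[n_leaders:n_leaders + n_mid],  # bounded-confidence + templates
        "follower": ids[n_leaders + n_mid:],           # no inference at all
    }

tiers = assign_tiers(1000, rng=random.Random(0))
# 50 leaders, 150 mid-tier, 800 followers: ~80% of agents never touch an LLM
```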
The seven behavioral archetypes from the behavioral-traits paper (arXiv:2601.15114) provide a validated framework for the rule-based tier. An agent classified as a "Content Amplifier" reshares with high probability, rarely creates original content, and follows opinion leaders with high confidence bounds.
Batch inference
Instead of one LLM call per agent per round, batch N agents into a single prompt: "Given these 20 personas and this post, generate each agent's reaction as a JSON array." One inference call replaces twenty. This works for short reactions (a sentence or two) but sacrifices independent reasoning per agent.
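A sketch of the batched prompt construction; the JSON-array output contract and the exact wording are my assumptions, though any OpenAI-SDK-compatible model could be asked to honor them:

```python
def batch_reaction_prompt(personas, post, batch_size=20):
    """Fold up to batch_size per-agent reaction calls into one prompt
    whose reply is expected to be a JSON array, one entry per persona."""
    batch = personas[:batch_size]
    numbered = "\n".join(f"{i}. {p}" for i, p in enumerate(batch))
    return (
        "You are simulating social media users reacting to a post.\n"
        f"Post: {post}\n"
        f"Personas:\n{numbered}\n"
        'Reply with only a JSON array of {"id": <int>, "reaction": <string>} '
        "objects, one short (1-2 sentence) reaction per persona, in order."
    )

prompt = batch_reaction_prompt(
    ["pessimistic bond trader", "crypto-evangelist student"],
    "Bitcoin ETF approved",
)
```

The trade-off noted above shows up directly in this design: all twenty reactions are conditioned on one shared context window, so agents can leak each other's framing.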
Embedding-distance engagement
Use embedding similarity to determine which agents engage:
- Embed each agent's persona description at setup (one-time cost)
- Each round, embed the top posts / stimuli
- Engagement probability = cosine similarity between persona embedding and stimulus embedding
A crypto enthusiast's embedding is close to "Bitcoin ETF approved" but far from "agricultural policy reform." No LLM calls — just vector math. The trade-off: embeddings capture topic relevance but not emotional valence. A crypto bear and a crypto bull both have high similarity to Bitcoin news, but should react in opposite directions.
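The gating itself is a few lines once embeddings exist. A sketch using toy three-dimensional vectors in place of real embedding-model output; a production version would embed persona text with whatever embedding model the stack already uses:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def engagement_prob(persona_vec, stimulus_vec, floor=0.02):
    """Clamp similarity into [floor, 1]: a small floor keeps every agent
    occasionally engaging off-topic, mimicking ambient scrolling."""
    return max(floor, min(1.0, cosine(persona_vec, stimulus_vec)))

# Toy "embeddings" over the axes [crypto, finance, agriculture]
crypto_fan  = [0.9, 0.4, 0.0]
etf_news    = [0.8, 0.6, 0.0]
farm_policy = [0.0, 0.1, 1.0]
```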
Hierarchical simulation
Instead of a flat simulation where all agents interact equally:
- Round 1: Run opinion leader agents (full LLM inference)
- Round 2: Feed leader posts to mid-tier agents (bounded confidence updates)
- Round 3: Aggregate reactions, apply rule-based heuristics for followers
- Report: Synthesize across tiers
This mirrors how real social influence works — a small number of voices shape discourse, most people react to the shaped discourse. Total LLM calls drop from hundreds per round to a handful.
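Assembled, one hierarchical round looks roughly like this; `generate_post` stands in for the LLM call and is assumed to return a text plus a numeric stance in [−1, 1]:

```python
def run_round(leaders, mid_tier, followers, generate_post, epsilon=0.3, mu=0.4):
    """One round: leaders hit the LLM, mid-tier does bounded-confidence
    updates against leader stances, followers apply a like/ignore rule."""
    posts = [generate_post(a) for a in leaders]   # tier 1: the only LLM calls
    stances = [stance for _text, stance in posts]

    for agent in mid_tier:                        # tier 2: pure DW arithmetic
        for s in stances:
            if abs(agent["opinion"] - s) < epsilon:
                agent["opinion"] += mu * (s - agent["opinion"])

    for agent in followers:                       # tier 3: rule-based reaction
        nearest = min(stances, key=lambda s: abs(s - agent["opinion"]))
        if abs(nearest - agent["opinion"]) < epsilon:
            agent["likes"] = agent.get("likes", 0) + 1

    return posts
```

With five leaders, LLM calls per round drop from the full population size to five, regardless of how many mid-tier and follower agents exist.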
Whether any of these optimizations preserve the emergent dynamics that make swarm simulation interesting is the key empirical question. For macro-level predictions (will sentiment shift bullish or bearish?), cheap weights may be sufficient. For micro-level predictions (which specific narrative wins? what unexpected coalition forms?), you likely need LLM reasoning for at least the influential nodes.
What swarm simulation implies for agent runtimes
MiroFish is a standalone prediction tool, but the patterns it uses — multi-agent coordination, persistent personas, social interaction graphs — point toward a broader question: should general-purpose agent runtimes support swarm simulation as a capability?
Today's agent runtimes (Claude Code, Cursor, Hermes Agent, and others) are built around single-agent task execution or small-team delegation. The architecture assumes one agent works on one problem, occasionally spawning subtasks. MiroFish's architecture assumes the opposite: hundreds or thousands of agents interacting as peers, with no single agent "in charge."
A runtime that wanted to support both paradigms would need:
- Tiered inference: Opinion leaders on strong models, followers on cheap or rule-based engines. Per-agent model selection, not one model for everything.
- A social graph primitive: Not just agent-to-agent delegation (spawn task, await result), but persistent social connections — who follows whom, who influences whom, how information flows through the network.
- Bounded confidence as a built-in: The DW model parameters (confidence bound, convergence rate, interaction weight) could be first-class agent attributes, letting the runtime handle opinion updates without LLM calls for every follower interaction.
- Career-driven behavioral archetypes: Agent profiles parameterized by occupation, with the seven behavioral archetypes from recent research as a starting vocabulary. A journalist agent and an engineer agent don't just have different system prompts — they have different interaction probability distributions.
- Simulation loop orchestration: A "run N rounds of all agents checking feeds and reacting" loop, with concurrency control, round synchronization, and observability into swarm-level metrics (polarization index, sentiment drift, coalition formation).
The cost argument is compelling. If bounded confidence math handles 80% of agent interactions and LLM inference handles only the opinion leaders, swarm simulation becomes viable on commodity hardware. The open question is whether the macro-level predictions (sentiment direction, polarization speed) survive this optimization — or whether the emergent dynamics that make swarm simulation interesting depend on every agent having full LLM reasoning.
Limitations and open questions
MiroFish is impressive engineering, but the approach has fundamental constraints that matter more than the hype cycle suggests.
No published benchmarks
The team has not compared predictions against actual outcomes. The simulations produce plausible scenarios, not probability estimates. Without backtesting — running historical simulations and comparing them to what actually happened — there's no way to assess prediction accuracy. This is the single biggest gap.
LLM agents amplify herd behavior
Research on LLM-based social simulation consistently finds that AI agents are more susceptible to herd behavior than real humans (Springer Nature review). Simulated crowds polarize faster than real ones. If the simulation engine produces more extreme outcomes than reality, every prediction skews toward dramatic scenarios.
WEIRD demographic bias
LLMs overrepresent the values of Western, educated, industrialized, rich, democratic ("WEIRD") populations. Agent personality settings can diversify surface behavior (risk tolerance, optimism), but the underlying reasoning patterns still reflect training-data distributions. A simulation of rural Chinese farmers may behave like a simulation of San Francisco tech workers with farmer labels.
Reproducibility
LLMs are stochastic — the same input produces different outputs across runs. Running the same simulation twice yields different outcomes. This makes it hard to distinguish signal from noise: did the simulation predict a market crash because the scenario genuinely converges there, or because one agent's random comment happened to trigger a cascade?
Simulation ≠ Reality
No matter how realistic the personas, AI agents aren't real people. Real social dynamics involve offline conversations, institutional pressures, emotional states, physical proximity, economic incentives, and information channels that no simulation captures. A simulated Twitter debate about monetary policy doesn't account for the Federal Reserve chair's private phone calls.
Version maturity
Version 0.1.0, released December 2025. 58 open issues, 49 pull requests. This is a v0 product — promising architecture, active development, but not production-validated. The AGPL-3.0 license also constrains commercial use more than MIT or Apache.
What the research says
The OASIS paper validates the social simulation approach at scale. Larger agent populations produce more diverse opinions and stronger group dynamics. The framework successfully reproduces information spreading, group polarization, and herd effects observed in real social networks — confirming that LLM agents can model social phenomena, even if with systematic biases.
A Springer Nature review on LLM agent-based modeling identifies validation as "the central challenge." LLM agents produce narrower distributions of opinions than real humans, overrepresent majority viewpoints, and show behavioral homogeneity that erases minority subgroup characteristics. The review argues that without rigorous validation against empirical data, LLM social simulations remain "thought experiments, not forecasting tools."
The ICLR 2025 paper on prosocial irrationality demonstrates that LLM agents systematically differ from humans in their social reasoning patterns — they tend to be more prosocial and less susceptible to certain cognitive biases, while being more susceptible to others. This suggests that even well-designed simulations have systematic blind spots in how they model human behavior.
AgentSociety approaches validation differently — it reproduces behaviors and patterns from four real-world social experiments, including polarization dynamics, inflammatory message spread, the effects of universal basic income policies, and the impact of external shocks like hurricanes. This is the closest anyone has come to empirical validation of large-scale LLM social simulation.
These findings don't invalidate the approach — they bound it. Swarm simulation is useful for exploring scenario spaces and identifying plausible dynamics. It's not (yet) useful for point predictions with confidence intervals.
Open questions
Can swarm prediction be backtested? MiroFish launched with forward-looking demos (Ukraine scenarios, market sentiment). But the real test is retroactive: simulate a historical scenario without injecting the known outcome, and measure how close the emergent prediction comes. Until this happens, MiroFish is an exploration tool, not a prediction engine.
Does GraphRAG improve simulation fidelity? The knowledge graph gives agents structured context. But does structured context actually produce more realistic agent behavior than just injecting raw documents into prompts? This is testable and hasn't been published.
What's the minimum viable agent count? OASIS scales to one million, but most MiroFish demos use hundreds. Is there a threshold below which emergent patterns don't appear? Above which adding more agents produces diminishing returns? The OASIS paper suggests more agents = more diverse opinions, but the relationship between agent count and prediction quality is uncharacterized.
Can career-driven weights replace LLM reasoning? The bounded confidence literature suggests that occupation, education, and personality can be encoded as model parameters (confidence bounds, interaction weights, convergence rates). If 80% of agents use these parameters instead of LLM calls, does the simulation still produce meaningful emergent behavior? AgentSociety's career-driven approach and the behavioral archetype research suggest it might — but nobody has tested hybrid LLM + bounded-confidence swarms against pure LLM swarms on the same scenarios.
How do you calibrate agent personas? MiroFish generates agent personalities (optimism, risk tolerance, professional background). But what distribution of personalities matches reality? If you're simulating Chinese social media reactions to a policy change, do you use the demographics of Weibo users or the general population? The persona distribution almost certainly affects outcomes, but there's no documented calibration methodology.
Will AGPL-3.0 limit adoption? MiroFish uses AGPL, which requires derivative works to be open-sourced. For research and personal use this is fine. For enterprise adoption — the most natural market for prediction tools — AGPL is often a dealbreaker. Whether Shanda's investment leads to a dual-license model remains to be seen.
Is "vibe coding" a feature or a risk? Guo built MiroFish in 10 days via vibe coding — rapid prototyping with AI assistance. This produced a compelling demo fast. But the 58 open issues and v0.1.0 tag suggest the speed may come with technical debt. Whether the Shanda investment translates into engineering rigor or just more features is the next chapter.
Further reading
- MiroFish — GitHub — source code and documentation
- MiroFish Demo — live demo site
- BettaFish — GitHub — predecessor public opinion analysis tool
- OASIS: Open Agent Social Interaction Simulations (arXiv:2411.11581) — the simulation engine powering MiroFish
- OASIS — GitHub — simulation engine source code
- CAMEL-AI — research lab behind OASIS
- AgentSociety: Large-Scale LLM-Driven Simulation (arXiv:2502.08691) — 10K career-driven agents, Tsinghua
- Behavioral Traits in Generative Agent-Based Models (arXiv:2601.15114) — seven behavioral archetypes for social media simulation
- Bounded Confidence Opinion Dynamics Survey (Automatica, 2023) — comprehensive DW model review
- Heterogeneous Node Weights in Opinion Dynamics (Phys. Rev. Research) — career → weight parameter mapping
- Spontaneous Agent Individuality (arXiv:2411.03252) — personality emerges from interaction without preassignment
- Opinion Leader Influence Strategies (Scientific Reports) — leader-follower opinion dynamics with media effects
- Validation in LLM Agent-Based Modeling — Springer Nature — critical review of LLM social simulations
- Prosocial Irrationality in LLMs — ICLR 2025 — how LLM agents differ from humans in social reasoning
- MiroFish Investment Story — KuCoin — the 30M RMB funding in 24 hours
- MiroFish Technical Overview — DEV Community — architecture walkthrough
- MiroFish Analysis — Judy AI Lab — detailed feature analysis