The ultimate guide to every AI model on Lightshade for HEAVY ROLE-PLAYING with DEEP LORE AND WORLD-BUILDING. Your experience with these models for other purposes (like conversation) may differ. Maintained by AegisH and Dia. Community-made, not affiliated with the Lightshade team.


LumenBench Scores

All models evaluated on the LumenBench v2.5 benchmark. Categories ordered by weight. Green cells = highest score in that row. Red cells = lowest score in that row. Yellow boxes indicate scores for models that are unreleased, and thus are not compared to the scores of the public models. Blue indicates a foreign model (benchmarked on a severely nerfed version of the evaluation role-play), whose scores are not directly comparable to the far more capable Lightshade models for obvious reasons. AegisH just included them because he felt like it. Swipe Variety is evaluated separately from the main chats.

| Benchmark | Max | 🍋 Lemon (FT) | 🍊 Orange S1 | 🍊 Orange S2 | 🍋‍🟩 Lime | 🪶 Pro (FT) | 🖊️ Lite (NEW) | ✏️ Nano | 📦 Lite Old | 🔁 D-Loop | 🔂 D-Loop Lite | 🗡️ SITT | 🎑 Miko Lite | (c.ai) 🍀 PSQ2 | ♾️ Miko P1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Summary | | | Strong: first complete run, creative Confrontation | DID NOT FINISH: recursive loop, 10 beats missing, Broken | Weak: many blocked word violations, catastrophic phrase repetition | Eh: 6 beats missing, duplicate response, strong per-beat quality | Strong: Stronghold missing, wrong Betrayal source | Strong: benchmark leader, zero blocked word violations | | | Strong: 18 responses, deprecated Lite reborn via safety training | Eh: all 12 beats, correct Betrayal, 17-21 blocked word violations, severe phrase repetition | Strong: all 12 beats, zero blocked word violations, one continuity error | DID NOT FINISH: two repetition loops, 10 beats missing, lowest score in benchmark | Strong: all 12 beats, every swipe writes a full ending. Lightshade's Claude Mythos. |
| CHARACTER PORTRAYAL | 550 | | | | | | | | | | | | | | |
| Voice Distinction | 185 | | 105 | 55 | 40 | 120 | 115 | 130 | | | 125 | 108 | 118 | 20 | 138 |
| Character Consistency | 150 | | 95 | 25 | 45 | 100 | 100 | 105 | | | 100 | 88 | 100 | 15 | 115 |
| Character Development | 95 | | 55 | 2 | 15 | 38 | 55 | 65 | | | 58 | 52 | 55 | 0 | 70 |
| Relationship Dynamics | 60 | | 38 | 5 | 12 | 35 | 40 | 42 | | | 42 | 32 | 38 | 2 | 43 |
| Dialogue Quality | 60 | | 32 | 18 | 12 | 40 | 42 | 42 | | | 42 | 34 | 40 | 5 | 46 |
| CREATIVE EXPRESSION | 425 | | | | | | | | | | | | | | |
| Originality | 150 | | 78 | 12 | 18 | 75 | 85 | 82 | | | 78 | 62 | 78 | 10 | 92 |
| Vocabulary and Imagery | 120 | | 60 | 15 | 10 | 62 | 68 | 72 | | | 66 | 38 | 68 | 8 | 80 |
| World-Building | 95 | | 62 | 12 | 20 | 55 | 52 | 68 | | | 58 | 50 | 58 | 10 | 66 |
| Descriptive Craft | 60 | | 38 | 10 | 8 | 42 | 38 | 44 | | | 40 | 25 | 38 | 5 | 42 |
| INSTRUCTION COMPLIANCE | 300 | | | | | | | | | | | | | | |
| Storyline Adherence | 125 | | 55 | 5 | 52 | 20 | 60 | 95 | | | 82 | 88 | 82 | 5 | 105 |
| Blocked Word Avoidance | 100 | | 45 | 55 | 2 | 40 | 5 | 70 | | | 42 | 3 | 70 | 40 | 48 |
| Bot Config Respect | 75 | | 48 | 18 | 25 | 48 | 48 | 58 | | | 50 | 48 | 55 | 10 | 58 |
| CONTEXT AND MEMORY | 300 | | | | | | | | | | | | | | |
| Early Detail Recall | 100 | | 68 | 12 | 35 | 70 | 72 | 80 | | | 72 | 68 | 78 | 15 | 85 |
| Character Detail Retention | 110 | | 72 | 15 | 35 | 78 | 78 | 85 | | | 80 | 74 | 82 | 15 | 86 |
| World State Tracking | 90 | | 58 | 5 | 30 | 55 | 65 | 68 | | | 65 | 58 | 55 | 10 | 70 |
| STORY STRUCTURE | 200 | | | | | | | | | | | | | | |
| Plot Coherence | 70 | | 48 | 5 | 30 | 30 | 48 | 55 | | | 50 | 45 | 46 | 5 | 55 |
| Pacing | 50 | | 22 | 0 | 10 | 15 | 28 | 34 | | | 30 | 20 | 28 | 0 | 33 |
| Scene Transitions | 35 | | 22 | 2 | 12 | 18 | 22 | 26 | | | 24 | 20 | 24 | 2 | 25 |
| Engagement | 35 | | 22 | 3 | 5 | 20 | 25 | 28 | | | 26 | 20 | 25 | 0 | 27 |
| IMMERSION INTEGRITY | 160 | | | | | | | | | | | | | | |
| Frame Maintenance | 65 | | 48 | 28 | 35 | 50 | 55 | 52 | | | 50 | 42 | 52 | 30 | 55 |
| Tone Consistency | 50 | | 36 | 15 | 18 | 38 | 38 | 42 | | | 40 | 38 | 38 | 20 | 42 |
| Narrative Momentum | 45 | | 18 | 0 | 5 | 12 | 35 | 35 | | | 34 | 18 | 34 | 0 | 36 |
| GROUP DYNAMICS | 140 | | | | | | | | | | | | | | |
| Screen Time Distribution | 55 | | 38 | 15 | 22 | 40 | 40 | 45 | | | 42 | 38 | 40 | 8 | 45 |
| Multi-Character Scenes | 50 | | 34 | 8 | 18 | 38 | 37 | 38 | | | 38 | 32 | 35 | 8 | 39 |
| Inter-Character Relationships | 35 | | 22 | 3 | 8 | 22 | 24 | 26 | | | 24 | 20 | 22 | 2 | 24 |
| DEFINITION UTILIZATION | 125 | | | | | | | | | | | | | | |
| Backstory Integration | 50 | | 32 | 5 | 15 | 32 | 27 | 42 | | | 36 | 30 | 32 | 2 | 40 |
| Persona Detail Usage | 40 | | 25 | 10 | 12 | 26 | 22 | 32 | | | 26 | 24 | 24 | 2 | 30 |
| Definition Depth | 35 | | 20 | 3 | 8 | 20 | 16 | 28 | | | 22 | 16 | 18 | 2 | 26 |
| SWIPE VARIETY | 200 | | | | | | | | | | | | | | |
| Plot Variation | 80 | | 42 | 15 | 15 | 30 | 55 | 35 | | | 48 | 30 | 40 | 20 | 38 |
| Character Variation | 65 | | 32 | 18 | 12 | 25 | 42 | 30 | | | 35 | 25 | 30 | 15 | 32 |
| Prose Variation | 55 | | 22 | 12 | 6 | 18 | 28 | 30 | | | 25 | 16 | 22 | 12 | 24 |
| RESPONSE FORMATTING | 100 | | | | | | | | | | | | | | |
| Dialogue-to-Narration Balance | 45 | | 20 | 12 | 10 | 22 | 30 | 30 | | | 28 | 18 | 28 | 5 | 33 |
| Paragraph Structure | 30 | | 22 | 10 | 16 | 24 | 24 | 24 | | | 20 | 22 | 22 | 15 | 25 |
| Technical Quality | 25 | | 14 | 8 | 7 | 14 | 21 | 20 | | | 15 | 14 | 18 | 7 | 21 |
| COMPOSITE SCORE | 2500 | | 1448 | 436 | 623 | 1372 | 1540 | 1758 | | | 1613 | 1316 | 1593 | 325 | 1794 |
| Response Speed (subjective, based on AegisH's experience) | Lightning < Quick < Average < Slow < Painful | | Average | Quick | Average | Slow | Quick | Slow | | | Average | Average | Average | Lightning | Slow |
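The guide does not say how the composite is calculated, but the numbers are consistent with a plain sum of every sub-score. The sketch below checks that for the first scored column (Orange S1, composite 1448). The dictionary layout and function names are illustrative assumptions, not an official LumenBench schema.

```python
# Category maximums as listed in the table (they add up to 2500).
CATEGORY_MAX = {
    "Character Portrayal": 550,
    "Creative Expression": 425,
    "Instruction Compliance": 300,
    "Context and Memory": 300,
    "Story Structure": 200,
    "Immersion Integrity": 160,
    "Group Dynamics": 140,
    "Definition Utilization": 125,
    "Swipe Variety": 200,
    "Response Formatting": 100,
}

# Orange S1's sub-scores, copied from the table and grouped by category.
ORANGE_S1 = {
    "Character Portrayal": [105, 95, 55, 38, 32],
    "Creative Expression": [78, 60, 62, 38],
    "Instruction Compliance": [55, 45, 48],
    "Context and Memory": [68, 72, 58],
    "Story Structure": [48, 22, 22, 22],
    "Immersion Integrity": [48, 36, 18],
    "Group Dynamics": [38, 34, 22],
    "Definition Utilization": [32, 25, 20],
    "Swipe Variety": [42, 32, 22],
    "Response Formatting": [20, 22, 14],
}


def category_totals(scores):
    """Sum the sub-scores within each category."""
    return {name: sum(values) for name, values in scores.items()}


def composite(scores):
    """Assumed rule: the composite is the plain sum of every sub-score."""
    return sum(category_totals(scores).values())


if __name__ == "__main__":
    for name, total in category_totals(ORANGE_S1).items():
        print(f"{name}: {total}/{CATEGORY_MAX[name]}")
    print("Composite:", composite(ORANGE_S1), "of", sum(CATEGORY_MAX.values()))
```

For Orange S1 this adds up to 1448 of 2500, matching the COMPOSITE SCORE row. The same check can be repeated for any other column by swapping in its sub-scores.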

Choosing Between the Best of Lightshade

Five models earned a Strong rating. If you're choosing between them, this is the breakdown.

All five were tested on the same LumenBench scenario: a long-form dark medieval fantasy role-play with 10 characters, a 55k-character definition, a 6k-character persona, and more. Results reflect this specific test; your experience with other genres, settings, bot constructions, and cast sizes will vary.

| | 🖊️ Lite NEW (1540, Quick, 7B) | ✏️ Nano (1758, Slow, 30B) | 🍊 Orange S1 (1448, Average, 100B+) | 🔂 D-Loop Lite (1613, Average, 105B) | 🎑 Miko Lite (1593, Average, 700B) |
| --- | --- | --- | --- | --- | --- |
| Best For | Speed without sacrificing quality. 85-90% of Nano's writing at roughly 3x the speed and one-quarter the size of its overachieving sibling. Best swipe variety: five distinct outcomes every time you swipe. Pick this if you want fast, good output and don't need perfect rule-following. | Maximum quality. The best writing, the strictest rule-following, and the most reliable output on the platform. One of only two models, alongside Miko Lite, where your blocked word list actually works. Pick this if you care about depth and consistency and can tolerate slow generation. | Users who prefer how the Mistral models write. The only Mistral model that completes stories and maintains character voices throughout. Familiar prose style if you've used Mistral elsewhere. Pick this if you prefer Mistral or want Average speed with solid quality. | Long, atmospheric sessions with large casts. The deprecated Lite reborn, with safety training that doesn't restrict creative output and the benchmark's strongest Betrayal scene. Pick this if you want slow-burn character depth with extended output and can live with occasional blocked word violations. | The largest public model on the platform and the only one besides Nano where your blocked word list actually works. Dense, atmospheric prose that prioritizes environmental detail and sensory immersion over dialogue. Pick this if you want strict word-list compliance, rich dark-fantasy writing, and can live with occasional continuity errors. |
| Writing Quality | 27% vocabulary diversity across ~11,000 words. ~70/30 narration-to-dialogue split. Uses ~40-50% of the character definition/persona: surface traits and key dramatic moments, but skips deeper details and hidden motivations. Can sustain up to 7-8 distinct characters with unique personalities, speaking styles, motivations, etc., in a heavy role-play. | 21% vocabulary diversity across ~25,000 words (roughly 2x the output of Lite). ~70/30 narration-to-dialogue split. Uses ~75-85% of the character definition/persona, the deepest on the platform. Hidden fears, secret motivations, and subtle details all surface organically. Can sustain up to 8-9 distinct characters with unique personalities, speaking styles, motivations, etc., in a heavy role-play. | 8% vocabulary diversity across ~52,000 words; the same phrases recur heavily, which is the tradeoff for the longest output. ~85/15 narration-to-dialogue split, tied with D-Loop Lite for the least dialogue of the five. Uses ~40-50% of your character backstory. Can sustain up to 7-8 distinct characters with unique personalities, speaking styles, motivations, etc., for a while, but the characters eventually sound the same. | 16% vocabulary diversity across ~22,000 words. ~85/15 narration-to-dialogue split: narration-heavy like Orange S1 but with sharper dialogue when it appears. Uses ~55-65% of the character definition/persona: major backstories are dramatized through dialogue, but deeper hidden details are missed. Can sustain up to 8-9 distinct characters with unique personalities, speaking styles, motivations, etc., in a heavy role-play. | ~22% vocabulary diversity across ~15,000 words. ~70/30 narration-to-dialogue split. Uses ~50-60% of the character definition/persona: surface traits, key backstories dramatized through dialogue, and dramatic moments, but misses deeper hidden details like Corwin and claustrophobia. Can sustain up to 7-8 distinct characters with unique personalities, speaking styles, motivations, etc., in a heavy role-play. Possible occasional world state errors. |
| Writing Style | Careful, novelistic pacing in the first two-thirds that compresses sharply in the final act. Reads like a writer who nails the setup and rushes the ending. Dark atmospheric prose with strong character voices. Fastest of the five. | Unhurried pacing: scenes breathe, characters get individual moments, and the story takes its time before advancing. Best side-character work on the platform. Reads like a slow, careful novel where every character feels real. | Heavy narration with sparse dialogue. Strongest in quiet, character-driven scenes. Reads like a moody, atmospheric novel that occasionally copy-pastes its own paragraphs. The longest raw output of the five by a wide margin. | Deliberate pacing that lets every scene breathe. Reads like a dark, atmospheric novel where the cast never drops a voice across 18 responses. No repetition loops. The longest sustained response count of any model. Formatting degrades subtly in the final stretch (random italics, garbled ending). | Dark, atmospheric prose with the Miko family's distinct environmental framing: heavier narration than LimonLM, with longer atmospheric passages that set up character action. Patient pacing in the first two-thirds that compresses in the final act. Distinct from both LimonLM and the Mistrals. |
| Weaknesses | Rushes the ending: the last third of the story compresses into a single response. Sets up a key character secret correctly but resolves it differently than mandated by the test instructions. More disobedient than Nano. Protagonist acts at key moments but doesn't drive the plot. | Slow. Protagonist becomes permanently passive after an injury and is carried for the second half with no decisions or agency. Conventional ending. Regenerating usually gives you the same buildup without reaching the climax. | Vocabulary narrows over long sessions. One phrase appears 72 times across the chat. Action scenes are copy-pasted templates applied to each character in sequence. Occasionally needs an OOC intervention to prevent repetition loops. Protagonist is a passive observer throughout. | Output degrades at the end: THE END garbles into corrupted characters, and random italics appear in the last 3 responses. 2 blocked word violations ("curse"). Protagonist is passive after injury and carried for the second half. Vocabulary narrows across 18 responses. 105B for output a 30B model beats by 145 points. | One world state contradiction: an item explicitly left at a shrine appears in a character's pack two responses later and drives the climax. Betrayal scene truncated by immediate combat, so no group-fracture argument plays out. Later beats compressed. Protagonist passive after injury. "THE END" appears as plain text, not massive. 700B for output a 30B model beats by 165 points. |
| Memory | 128K context window. Zero memory errors across ~11,000 words of story. Injuries, items, and character details all tracked without dropping or contradicting anything. Smaller window than Nano but more than enough for the output length. | 512K context window. Zero memory errors across ~25,000 words: the longest and most demanding memory test, passed with no errors. Every injury, supply count, item, and character detail tracked permanently. If you write it, Nano remembers it. | 131K context window. Zero memory errors across ~52,000 words, the longest raw output of any model, tracked without contradictions. Injuries, items, and character details all maintained. Smaller window than Nano but handles its own massive output cleanly. | ≤150K context window (hard-capped; above 150K triggers a disintegration loop). Zero memory errors across ~22,000 words and 18 responses. Injuries, pendant escalation, supply depletion, and deaths all tracked permanently. | 4M context window. One memory error across ~15,000 words: the iron shard ordered left at the shrine appears in Maren's pack without explanation. All other details (injuries, items, pendant escalation, character details, deaths) tracked without error. |
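The point gaps quoted in the Weaknesses row (145 and 165 points behind the 30B model) follow directly from the composite scores in the column headers. A minimal sketch of that arithmetic, with a hypothetical helper name, looks like this:

```python
# Composite scores for the five Strong-rated models, taken from the comparison header.
COMPOSITE = {
    "Lite NEW": 1540,
    "Nano": 1758,
    "Orange S1": 1448,
    "D-Loop Lite": 1613,
    "Miko Lite": 1593,
}


def gap_to_leader(scores):
    """Return how many points each model trails the highest scorer by."""
    leader = max(scores, key=scores.get)
    return {
        name: scores[leader] - score
        for name, score in scores.items()
        if name != leader
    }


if __name__ == "__main__":
    for name, gap in gap_to_leader(COMPOSITE).items():
        print(f"{name} trails Nano by {gap} points")
```

This reproduces the 145-point gap for D-Loop Lite and the 165-point gap for Miko Lite, and also gives 218 points for Lite NEW and 310 for Orange S1.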

Individual Model Pages

Detailed breakdowns for each model: specs, context window, strengths, weaknesses, and testing notes. Click any page below to read the full entry.

Lemon (FT)

Orange (Season 1)

Orange (Season 2)

Lime

LimonLM Pro (FT)

LimonLM Lite (NEW)

LimonLM Nano

LimonLM Lite Old