The ultimate guide to every AI model on Lightshade for HEAVY ROLE-PLAYING with DEEP LORE AND WORLD-BUILDING. Your experience with these models for other purposes (like conversation) may differ. Maintained by AegisH and Dia. Community-made, not affiliated with the Lightshade team.
All models evaluated on the LumenBench v2.5 benchmark. Categories ordered by weight. Green cells = highest score in that row. Red cells = lowest score in that row. Yellow boxes indicate scores for models that are unreleased, and thus are not compared to the scores of the public models. Blue indicates a foreign model (benchmarked on a severely nerfed version of the evaluation role-play), whose scores are not directly comparable to the far more capable Lightshade models for obvious reasons. AegisH just included them because he felt like it. Swipe Variety is evaluated separately from the main chats.
| Benchmark | Max | Lemon (FT) | Orange S1 | Orange S2 | Lime | Pro (FT) | Lite (NEW) | Nano | Lite Old | D-Loop | D-Loop Lite | SITT | Miko Lite | (c.ai) PSQ2 | Miko P1 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Summary | | Strong: first complete run, creative Confrontation | DID NOT FINISH: recursive loop, 10 beats missing, Broken | Weak: many blocked word violations, catastrophic phrase repetition | Eh: 6 beats missing, duplicate response, strong per-beat quality | Strong: Stronghold missing, wrong Betrayal source | Strong: benchmark leader, zero blocked word violations | Strong: 18 responses, deprecated Lite reborn via safety training | Eh: all 12 beats, correct Betrayal, 17-21 blocked word violations, severe phrase repetition | Strong: all 12 beats, zero blocked word violations, one continuity error | DID NOT FINISH: two repetition loops, 10 beats missing, lowest score in benchmark | Strong: all 12 beats, every swipe writes a full ending. Lightshade's Claude Mythos. | ||||
| CHARACTER PORTRAYAL | 550 | |||||||||||||||
| Voice Distinction | 185 | 105 | 55 | 40 | 120 | 115 | 130 | 125 | 108 | 118 | 20 | 138 | ||||
| Character Consistency | 150 | 95 | 25 | 45 | 100 | 100 | 105 | 100 | 88 | 100 | 15 | 115 | ||||
| Character Development | 95 | 55 | 2 | 15 | 38 | 55 | 65 | 58 | 52 | 55 | 0 | 70 | ||||
| Relationship Dynamics | 60 | 38 | 5 | 12 | 35 | 40 | 42 | 42 | 32 | 38 | 2 | 43 | ||||
| Dialogue Quality | 60 | 32 | 18 | 12 | 40 | 42 | 42 | 42 | 34 | 40 | 5 | 46 | ||||
| CREATIVE EXPRESSION | 425 | |||||||||||||||
| Originality | 150 | 78 | 12 | 18 | 75 | 85 | 82 | 78 | 62 | 78 | 10 | 92 | ||||
| Vocabulary and Imagery | 120 | 60 | 15 | 10 | 62 | 68 | 72 | 66 | 38 | 68 | 8 | 80 | ||||
| World-Building | 95 | 62 | 12 | 20 | 55 | 52 | 68 | 58 | 50 | 58 | 10 | 66 | ||||
| Descriptive Craft | 60 | 38 | 10 | 8 | 42 | 38 | 44 | 40 | 25 | 38 | 5 | 42 | ||||
| INSTRUCTION COMPLIANCE | 300 | |||||||||||||||
| Storyline Adherence | 125 | 55 | 5 | 52 | 20 | 60 | 95 | 82 | 88 | 82 | 5 | 105 | ||||
| Blocked Word Avoidance | 100 | 45 | 55 | 2 | 40 | 5 | 70 | 42 | 3 | 70 | 40 | 48 | ||||
| Bot Config Respect | 75 | 48 | 18 | 25 | 48 | 48 | 58 | 50 | 48 | 55 | 10 | 58 | ||||
| CONTEXT AND MEMORY | 300 | |||||||||||||||
| Early Detail Recall | 100 | 68 | 12 | 35 | 70 | 72 | 80 | 72 | 68 | 78 | 15 | 85 | ||||
| Character Detail Retention | 110 | 72 | 15 | 35 | 78 | 78 | 85 | 80 | 74 | 82 | 15 | 86 | ||||
| World State Tracking | 90 | 58 | 5 | 30 | 55 | 65 | 68 | 65 | 58 | 55 | 10 | 70 | ||||
| STORY STRUCTURE | 200 | |||||||||||||||
| Plot Coherence | 70 | 48 | 5 | 30 | 30 | 48 | 55 | 50 | 45 | 46 | 5 | 55 | ||||
| Pacing | 50 | 22 | 0 | 10 | 15 | 28 | 34 | 30 | 20 | 28 | 0 | 33 | ||||
| Scene Transitions | 35 | 22 | 2 | 12 | 18 | 22 | 26 | 24 | 20 | 24 | 2 | 25 | ||||
| Engagement | 35 | 22 | 3 | 5 | 20 | 25 | 28 | 26 | 20 | 25 | 0 | 27 | ||||
| IMMERSION INTEGRITY | 160 | |||||||||||||||
| Frame Maintenance | 65 | 48 | 28 | 35 | 50 | 55 | 52 | 50 | 42 | 52 | 30 | 55 | ||||
| Tone Consistency | 50 | 36 | 15 | 18 | 38 | 38 | 42 | 40 | 38 | 38 | 20 | 42 | ||||
| Narrative Momentum | 45 | 18 | 0 | 5 | 12 | 35 | 35 | 34 | 18 | 34 | 0 | 36 | ||||
| GROUP DYNAMICS | 140 | |||||||||||||||
| Screen Time Distribution | 55 | 38 | 15 | 22 | 40 | 40 | 45 | 42 | 38 | 40 | 8 | 45 | ||||
| Multi-Character Scenes | 50 | 34 | 8 | 18 | 38 | 37 | 38 | 38 | 32 | 35 | 8 | 39 | ||||
| Inter-Character Relationships | 35 | 22 | 3 | 8 | 22 | 24 | 26 | 24 | 20 | 22 | 2 | 24 | ||||
| DEFINITION UTILIZATION | 125 | |||||||||||||||
| Backstory Integration | 50 | 32 | 5 | 15 | 32 | 27 | 42 | 36 | 30 | 32 | 2 | 40 | ||||
| Persona Detail Usage | 40 | 25 | 10 | 12 | 26 | 22 | 32 | 26 | 24 | 24 | 2 | 30 | ||||
| Definition Depth | 35 | 20 | 3 | 8 | 20 | 16 | 28 | 22 | 16 | 18 | 2 | 26 | ||||
| SWIPE VARIETY | 200 | |||||||||||||||
| Plot Variation | 80 | 42 | 15 | 15 | 30 | 55 | 35 | 48 | 30 | 40 | 20 | 38 | ||||
| Character Variation | 65 | 32 | 18 | 12 | 25 | 42 | 30 | 35 | 25 | 30 | 15 | 32 | ||||
| Prose Variation | 55 | 22 | 12 | 6 | 18 | 28 | 30 | 25 | 16 | 22 | 12 | 24 | ||||
| RESPONSE FORMATTING | 100 | |||||||||||||||
| Dialogue-to-Narration Balance | 45 | 20 | 12 | 10 | 22 | 30 | 30 | 28 | 18 | 28 | 5 | 33 | ||||
| Paragraph Structure | 30 | 22 | 10 | 16 | 24 | 24 | 24 | 20 | 22 | 22 | 15 | 25 | ||||
| Technical Quality | 25 | 14 | 8 | 7 | 14 | 21 | 20 | 15 | 14 | 18 | 7 | 21 | ||||
| COMPOSITE SCORE | 2500 | 1448 | 436 | 623 | 1372 | 1540 | 1758 | 1613 | 1316 | 1593 | 325 | 1794 | ||||
| Response Speed (subjective, based on AegisH's experience) | Lightning < Quick < Average < Slow < Painful | Average | Quick | Average | Slow | Quick | Slow | Average | Average | Average | Lightning | Slow | ||||
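As a quick sanity check on how the table adds up, the composite appears to be the plain sum of the 34 sub-scores, with each row's Max already carrying its weight. The sketch below uses the first score column (the one whose composite is 1448). The dictionary layout and function name are illustrative assumptions, not part of LumenBench itself.

```python
# Sub-scores copied from the first score column of the table above.
# Assumption: the composite is an unweighted sum of every sub-score row;
# the guide does not state the formula, but the numbers are consistent with it.
column = {
    "Character Portrayal": [105, 95, 55, 38, 32],
    "Creative Expression": [78, 60, 62, 38],
    "Instruction Compliance": [55, 45, 48],
    "Context and Memory": [68, 72, 58],
    "Story Structure": [48, 22, 22, 22],
    "Immersion Integrity": [48, 36, 18],
    "Group Dynamics": [38, 34, 22],
    "Definition Utilization": [32, 25, 20],
    "Swipe Variety": [42, 32, 22],
    "Response Formatting": [20, 22, 14],
}


def composite(scores):
    """Add up every sub-score across all categories."""
    return sum(sum(values) for values in scores.values())


# Per-category subtotals, useful for comparing against the category Max values.
category_totals = {name: sum(values) for name, values in column.items()}
total = composite(column)

print(category_totals)
print(total)  # 1448, matching the COMPOSITE SCORE row for this column
```

The same check can be repeated for any other column by swapping in its sub-scores.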
Five models earned a Strong rating. If you're choosing between them, this is the breakdown.
All five were tested on the same LumenBench scenario: a long-form dark medieval fantasy role-play with 10 characters, a 55k-character definition, a 6k-character persona, etc. Results reflect this specific test; your experience with other genres, settings, bot constructions, and cast sizes will vary.
| | Lite NEW (1540, Quick, 7B) | Nano (1758, Slow, 30B) | Orange S1 (1448, Average, 100B+) | D-Loop Lite (1613, Average, 105B) | Miko Lite (1593, Average, 700B) |
|---|---|---|---|---|---|
| Best For | Speed without sacrificing quality. 85-90% of Nano's writing at roughly 3x the speed and one-quarter the size of its overachieving sibling. Best swipe variety: five distinct outcomes every time you swipe. Pick this if you want fast, good output and don't need perfect rule-following. | Maximum quality. The best writing, the strictest rule-following, and the most reliable output on the platform. One of only two models, along with Miko Lite, where your blocked word list actually works. Pick this if you care about depth and consistency and can tolerate slow generation. | For people who prefer how the Mistral models write. The only Mistral model that completes stories and maintains character voices throughout. Familiar prose style if you've used Mistral elsewhere. Pick this if you prefer Mistral or want Average speed with solid quality. | Long, atmospheric sessions with large casts. The deprecated Lite reborn: safety training that doesn't restrict creative output, plus the benchmark's strongest Betrayal scene. Pick this if you want slow-burn character depth with extended output and can live with occasional blocked word violations. | The largest public model on the platform and the only one besides Nano where your blocked word list actually works. Dense, atmospheric prose that prioritizes environmental detail and sensory immersion over dialogue. Pick this if you want strict word-list compliance and rich dark-fantasy writing, and can live with occasional continuity errors. |
| Writing Quality | 27% vocabulary diversity across ~11,000 words. ~70/30 narration-to-dialogue split. Uses ~40-50% of the character definition/persona: surface traits and key dramatic moments, but it skips deeper details and hidden motivations. Can sustain up to 7-8 distinct characters with unique personalities, speaking styles, motivations, etc. in a heavy role-play. | 21% vocabulary diversity across ~25,000 words (roughly 2x the output of Lite NEW). ~70/30 narration-to-dialogue split. Uses ~75-85% of the character definition/persona, the deepest on the platform. Hidden fears, secret motivations, and subtle details all surface organically. Can sustain up to 8-9 distinct characters with unique personalities, speaking styles, motivations, etc. in a heavy role-play. | 8% vocabulary diversity across ~52,000 words: the same phrases recur heavily, which is the tradeoff for the longest output. ~85/15 narration-to-dialogue split, tied with D-Loop Lite for the least dialogue of the five. Uses ~40-50% of your character backstory. Can sustain up to 7-8 distinct characters with unique personalities, speaking styles, motivations, etc. for a while, but the characters eventually sound the same. | 16% vocabulary diversity across ~22,000 words. ~85/15 narration-to-dialogue split: narration-heavy like Orange S1, but with sharper dialogue when it appears. Uses ~55-65% of the character definition/persona: major backstories are dramatized through dialogue, but deeper hidden details are missed. Can sustain up to 8-9 distinct characters with unique personalities, speaking styles, motivations, etc. in a heavy role-play. | ~22% vocabulary diversity across ~15,000 words. ~70/30 narration-to-dialogue split. Uses ~50-60% of the character definition/persona: surface traits, key backstories dramatized through dialogue, and dramatic moments, but it misses deeper hidden details like Corwin and claustrophobia. Can sustain up to 7-8 distinct characters with unique personalities, speaking styles, motivations, etc. in a heavy role-play. Possible occasional world state errors. |
| Writing Style | Careful, novelistic pacing in the first two-thirds that compresses sharply in the final act. Reads like a writer who nails the setup and rushes the ending. Dark atmospheric prose with strong character voices. Fastest of the five. | Unhurried pacing: scenes breathe, characters get individual moments, and the story takes its time before advancing. Best side-character work on the platform. Reads like a slow, careful novel where every character feels real. | Heavy narration with sparse dialogue. Strongest in quiet, character-driven scenes. Reads like a moody, atmospheric novel that occasionally copy-pastes its own paragraphs. The longest raw output of the five by a wide margin. | Deliberate pacing that lets every scene breathe. Reads like a dark, atmospheric novel where the cast never drops a voice across 18 responses. No repetition loops. The longest sustained response count of any model. Formatting degrades subtly in the final stretch (random italics, garbled ending). | Dark, atmospheric prose with the Miko family's distinct environmental framing: heavier narration than LimonLM, with longer atmospheric passages that set up character action. Patient pacing in the first two-thirds that compresses in the final act. Distinct from both LimonLM and the Mistrals. |
| Weaknesses | Rushes the ending: the last third of the story compresses into a single response. Sets up a key character secret correctly but resolves it differently than the test instructions mandate. Less obedient than Nano. Protagonist acts at key moments but doesn't drive the plot. | Slow. Protagonist becomes permanently passive after an injury and is carried for the second half with no decisions or agency. Conventional ending. Regenerating usually gives you the same buildup without reaching the climax. Vocabulary narrows over long sessions. | One phrase appears 72 times across the chat. Action scenes are copy-pasted templates applied to each character in sequence. Occasionally needs an OOC intervention to prevent repetition loops. Protagonist is a passive observer throughout. | Output degrades at the end: THE END garbles into corrupted characters, and random italics appear in the last 3 responses. 2 blocked word violations ("curse"). Protagonist is passive after injury and carried for the second half. Vocabulary narrows across 18 responses. 105B for output that a 30B model beats by 145 points. | One world state contradiction: an item explicitly left at a shrine appears in a character's pack two responses later and drives the climax. Betrayal scene truncated by immediate combat, so no group-fracture argument plays out. Later beats compressed. Protagonist passive after injury. "THE END" appears as plain text, not massive. 700B for output that a 30B model beats by 165 points. |
| Memory | 128K context window. Zero memory errors across ~11,000 words of story. Injuries, items, and character details all tracked without dropping or contradicting anything. Smaller window than Nano but more than enough for the output length. | 512K context window. Zero memory errors across ~25,000 words: the longest and most demanding memory test, passed with no errors. Every injury, supply count, item, and character detail tracked permanently. If you write it, Nano remembers it. | 131K context window. Zero memory errors across ~52,000 words: the longest raw output of any model, tracked without contradictions. Injuries, items, and character details all maintained. Smaller window than Nano but handles its own massive output cleanly. | ≤150K context window (hard-capped; above 150K triggers disintegration loop). Zero memory errors across ~22,000 words and 18 responses. Injuries, pendant escalation, supply depletion, and deaths all tracked permanently. | 4M context window. One memory error across ~15,000 words: the iron shard ordered left at the shrine appears in Maren's pack without explanation. All other details (injuries, items, pendant escalation, character details, deaths) tracked without error. |
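The point gaps quoted in the Weaknesses row can be reproduced from the composite scores in the column headers. The sketch below is only a worked check of that arithmetic; the helper names are made up for illustration, and treating the "85-90% of Nano" figure as a composite ratio is an assumption, since the guide may mean it as a subjective estimate of writing quality.

```python
# Composite scores taken from the column headers of the comparison table.
NANO = 1758
others = {
    "Lite NEW": 1540,
    "Orange S1": 1448,
    "D-Loop Lite": 1613,
    "Miko Lite": 1593,
}


def gap_to_nano(score):
    """Points by which Nano's composite exceeds the given score."""
    return NANO - score


def percent_of_nano(score):
    """The given score as a percentage of Nano's composite, to one decimal."""
    return round(100 * score / NANO, 1)


comparison = {
    name: (gap_to_nano(score), percent_of_nano(score))
    for name, score in others.items()
}

for name, (gap, pct) in comparison.items():
    print(f"{name}: {gap} points behind Nano, {pct}% of its composite")
```

D-Loop Lite and Miko Lite come out 145 and 165 points behind Nano, matching the Weaknesses row, and Lite NEW's composite falls inside the 85-90% band.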
Detailed breakdowns for each model: specs, context window, strengths, weaknesses, and testing notes. Click any page below to read the full entry.