
Why Merging AI Models Misses the Point

April 21, 2026 · 3 min read · AI · Multi-Model

An AI engineer just stitched Claude, Qwen, and GLM into a single 18-billion-parameter model. It runs on a laptop. It passed 40 out of 44 capability tests. And it completely misses the point of why you'd want multiple AI models in the first place.

What happened

Kyle Hessling, an AI infrastructure engineer, created Qwopus-GLM-18B by physically stacking neural network layers from three different models: Qwen 3.5 as the base, reasoning patterns distilled from Claude Opus 4.6, and problem decomposition techniques from GLM-5.1. The result is one frozen model that tries to capture the strengths of all three.

It's technically impressive. He had to write his own merge script from scratch because existing tools couldn't handle Qwen's hybrid architecture. The model uses a "passthrough frankenmerge": raw layer stacking with no weight averaging. Layers 0 through 31 come from one model, layers 32 through 63 from another.

It's also fragile. The model produced garbled code at the layer boundaries and needed a "healing fine-tune" to fix the output. That's the nature of stitching neural networks together: the seams show.
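The layer-stacking idea can be sketched in a few lines. This is a conceptual toy, not Hessling's actual merge script: real frankenmerges operate on tensor state dicts from model checkpoints, and the model names and layer counts below are stand-ins.

```python
# Conceptual sketch of a "passthrough frankenmerge": raw layer stacking
# with no weight averaging. Each "model" is just {layer_index: weights};
# a real merge would copy tensors between checkpoint state dicts.

def passthrough_merge(model_a: dict, model_b: dict, split: int) -> dict:
    """Take layers [0, split) from model_a and [split, end) from model_b."""
    merged = {idx: w for idx, w in model_a.items() if idx < split}
    merged.update({idx: w for idx, w in model_b.items() if idx >= split})
    return merged

# Two toy 64-layer "models" whose weights are just tagged strings.
base = {i: f"base_layer_{i}" for i in range(64)}
donor = {i: f"donor_layer_{i}" for i in range(64)}

merged = passthrough_merge(base, donor, split=32)
# The boundary between layers 31 and 32 is the "seam": the donor's layer 32
# was never trained to consume the base model's layer-31 activations, which
# is exactly where garbled output tends to appear.
```

The healing fine-tune the article mentions exists to smooth that seam: a short training pass so the stacked halves learn to pass activations to each other.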

The real problem it's trying to solve

The instinct behind this project is right: no single AI model is the best at everything. Claude writes beautifully but can't search the web. GPT handles structured tasks well but can feel formulaic. Gemini has real-time knowledge. DeepSeek is precise at code and math. That's why using multiple models matters.

Engineers and power users already know this. They keep multiple tabs open. They copy prompts between ChatGPT and Claude. They compare outputs manually. It's tedious, but it works better than trusting a single model for everything.

Hessling's solution: merge the models into one so you get all strengths simultaneously. It's an engineer's answer to a user's problem.

Why merging is the wrong approach

Model merging is permanent. Once you stitch those layers together, you can't update one model without rebuilding the whole thing. When Claude Opus 5 comes out next quarter, this merged model is stuck on 4.6. When a new model appears that's better at a specific task, you can't swap it in.

It's also a black box. You can't see which "model" contributed what to the output. Was the reasoning Claude-style or GLM-style? You don't know. You can't compare, you can't choose, and you can't learn which model handles your specific task best.

And it breaks at the seams. The garbled output at layer boundaries isn't a bug that got fixed; it's a fundamental weakness of the approach. Neural networks weren't designed to be cut and reassembled.

There's a simpler way

Instead of merging models into one, run them all separately and compare or combine at the answer level.

This is what Council Mode does on Anuma. You write your prompt once. Multiple models respond independently. You see every answer side by side. Then you either pick the best one or generate a Unified Answer that combines the strongest parts from each model.
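The answer-level approach can be sketched like this. Anuma's actual Council Mode API isn't public, so the model callables and function names below are illustrative stand-ins, not its real interface:

```python
# Minimal sketch of answer-level combination: fan one prompt out to
# several models and keep every response separate and attributable.
from concurrent.futures import ThreadPoolExecutor


def ask_council(prompt: str, models: dict) -> dict:
    """Run each model on the same prompt concurrently; return answers by name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        return {name: f.result() for name, f in futures.items()}


# Stub "models" for illustration only; real ones would be API calls.
council = {
    "claude": lambda p: f"[claude] {p}",
    "qwen": lambda p: f"[qwen] {p}",
    "glm": lambda p: f"[glm] {p}",
}

answers = ask_council("Explain layer merging", council)
# Unlike a merged model, every answer here stays attributable to its source,
# so you can compare them, pick one, or synthesize a combined answer from all.
```

Swapping in a new model is a one-line change to the `council` dict, which is the flexibility argument in code form.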

|              | Model merging                | Council Mode                       |
|--------------|------------------------------|------------------------------------|
| Flexibility  | Fixed, can't swap models     | Choose any models, change anytime  |
| Transparency | Black box output             | See each model's answer separately |
| Updates      | Rebuild from scratch         | New models available instantly     |
| Failure mode | Garbled output at layer seams| Each model runs independently      |
| User control | None                         | Compare, choose, or unify          |
| Hardware     | 9.2 GB VRAM minimum          | Any device, any browser            |
| Memory       | None, stateless              | Unified memory across all models   |

The multi-model future isn't about fusion

The instinct to combine AI model strengths is correct. The method matters.

Merging at the neural network level is brittle, opaque, and frozen in time. Merging at the answer level is flexible, transparent, and always up to date. You keep each model's full capability intact, you can see exactly what each one contributed, and you can swap in better models the day they launch.

This is why Anuma built Council Mode and Unified Answer. Not because merging models is a bad idea, but because there's a better way to get the same result: run them all, compare them all, and let the user decide.

The best AI isn't one model trying to be everything. It's all of them, working together, with your memory carrying across every one.
