Yeah, using an API aggregator to run a 7B model is economically strange if you have even a consumer GPU. OpenRouter captures the cream of complex requests (Claude 3.5, o1) that you can't run at home. But even for local hosting, medium models are starting to displace small ones: quantization lets you fit them on accessible hardware, and the quality jump over a small model is massive. So the "Medium is the new Small" trend likely holds for the self-hosted segment as well.
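
The "quantization makes medium fit" claim is easy to sanity-check with napkin math. Here's a rough sketch; the parameter counts, bit widths, and overhead allowance are all illustrative assumptions, not benchmarks:

```python
# Back-of-the-envelope VRAM estimate for running a quantized model locally.
# Assumption: memory ≈ weights (params × bits/8) plus a flat allowance
# for KV cache, activations, and runtime overhead.

def vram_gb(params_b: float, bits: int, overhead_gb: float = 2.0) -> float:
    """Rough VRAM in GB for params_b billion parameters at the given bit width."""
    weights_gb = params_b * bits / 8
    return weights_gb + overhead_gb

# A 7B model at fp16 vs a hypothetical 32B "medium" model at 4-bit:
print(f"7B  @ fp16: ~{vram_gb(7, 16):.0f} GB")
print(f"32B @ q4:   ~{vram_gb(32, 4):.0f} GB")
```

Both estimates land under 24 GB, i.e. in range for a single high-end consumer GPU, which is the whole mechanism behind medium models displacing small ones for self-hosters.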