I've grown increasingly skeptical that public coding benchmarks tell me much about which model is actually worth paying for, and I worry that as demand continues to spike, model providers will silently degrade performance. I did a few manual analyses, but comparing across models turned out to be non-trivial because of differences in token caching and tool-use efficiency, so I wanted a tool for repeatable evaluations.

The goal is an OSS tool that gathers data to help answer questions like: "Would Sonnet have solved most of the issues we gave Opus?" "How much would that have actually saved?" "What about OSS models like Kimi K2.5 or GLM-1?" "The vibes are off; did model performance just regress from last month?"

Right now the project is a bit medium-rare, but it works end-to-end. I've run it successfully against itself, and I'm waiting for my token limits to reset so I can add support for more languages and do a broader run. I'm already seeing a few cases where I could have used 5.4-mini instead of 5.4 for some parts of the implementation.

I'd love any feedback, criticism, and ideas. I'm especially interested in whether this is something you might pay for as a managed service, or whether you would contribute your private test cases to a shared-commons hold-out set to hold AI providers a bit more accountable.

https://repogauge.org
[email protected]
https://github.com/s1liconcow/repogauge

Thanks!
David

Comments URL: https://news.ycombinator.com/item?id=47809457
Points: 1
# Comments: 0