I got tired of maintaining two files that describe the same thing: an OpenAPI spec for documentation and a Postman collection for testing. They always drift. Someone updates the spec, forgets the collection. A new engineer joins and runs outdated tests against an endpoint that was changed two months ago. VolcAPI lets you define test scenarios directly inside your OpenAPI spec using a custom extension (v-functional-test), then run them from the CLI. Single source of truth. It's a single Go binary: no runtime, no node_modules. The goal is for it to drop into GitHub Actions with zero friction once JUnit XML output lands (in progress). Repo: https://github.com/aliamerj/volcapi This is early alpha. GET/POST/PUT/DELETE work, response validation works, environment configs work. CI output formats are the next thing I'm building. Honest question for the HN crowd: is the "spec as test suite" concept something you'd actually use, or do you prefer keeping tests separate from the spec? I've gone back and forth on this and would genuinely like to hear from people who've felt this pain. Comments URL: https://news.ycombinator.com/item?id=47814655 Points: 1 # Comments: 1
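The post doesn't show the extension's schema, but the "spec as test suite" idea might look roughly like this sketch. Every field name under v-functional-test below is an illustrative guess, not VolcAPI's actual syntax:

```yaml
paths:
  /users/{id}:
    get:
      summary: Fetch a user
      responses:
        "200":
          description: OK
      # Hypothetical shape -- VolcAPI's real v-functional-test
      # schema may differ from this sketch.
      v-functional-test:
        - name: fetch-existing-user
          request:
            pathParams:
              id: 42
          expect:
            status: 200
```

The appeal is that the scenario lives next to the operation it exercises, so a spec change that breaks a test is caught in the same diff.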
Transcribe the audio (passing it in at 2x speed) via whisper, use a cheap LLM to identify the ad snippets, and then take them out! I've been using it for the past couple of days; it's not perfect, but I've enjoyed using it. Feel free to add your episodes! https://github.com/mergd/podads Comments URL: https://news.ycombinator.com/item?id=47813346 Points: 3 # Comments: 1
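A minimal sketch of that pipeline's cutting step, assuming whisper-style timestamped segments and ffmpeg for the actual edit. The function names and the exact heuristics are my own, not podads' implementation:

```python
# Pipeline sketch (podads' actual implementation may differ):
#   1. Speed the audio up before transcription to halve whisper time:
#        ffmpeg -i episode.mp3 -filter:a atempo=2.0 fast.mp3
#   2. Transcribe with whisper to get timestamped segments
#      (scale the timestamps back up by 2x afterwards).
#   3. Ask a cheap LLM which segment indices are ads.
#   4. Cut the flagged spans out of the ORIGINAL audio with ffmpeg.

def keep_intervals(segments, ad_indices):
    """segments: whisper-style dicts with 'start'/'end' in seconds.
    ad_indices: set of segment indices the LLM flagged as ads.
    Returns (start, end) spans to keep, merging adjacent keepers."""
    spans = []
    for i, seg in enumerate(segments):
        if i in ad_indices:
            continue
        # Merge with the previous span when the segments are contiguous.
        if spans and abs(spans[-1][1] - seg["start"]) < 0.01:
            spans[-1] = (spans[-1][0], seg["end"])
        else:
            spans.append((seg["start"], seg["end"]))
    return spans

def ffmpeg_audio_filter(spans):
    """Build an ffmpeg -af expression that keeps only the given spans."""
    clauses = "+".join(f"between(t,{s},{e})" for s, e in spans)
    return f"aselect='{clauses}',asetpts=N/SR/TB"
```

`ffmpeg_audio_filter([(0, 60), (120, 300)])` would then be passed via `-af` to re-encode the episode with the ad span dropped.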
I built this to run OpenClaw safely. The problem: every sandbox I tried still handed the real API token to the agent as an env var. nilbox never gives the agent the real token. It gets a fake placeholder instead (ANTHROPIC_API_KEY=ANTHROPIC_API_KEY). nilbox intercepts outbound API calls and swaps in the real token at the network layer, so if the agent leaks the "token", an attacker gets a useless string. That's it. It also ships a managed Linux runtime (consistent across Mac/Windows/Linux) and a Store for one-click agent app installs. Full shell access too. Available for macOS, Windows, and Linux: https://nilbox.run Curious how others are thinking about token security when running agents locally. Comments URL: https://news.ycombinator.com/item?id=47812193 Points: 3 # Comments: 0
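nilbox does this at the network layer, but the core swap can be sketched as a header-rewriting hook. The placeholder string is from the post; the env var name, allow-list, and function are illustrative assumptions:

```python
import os

# What the sandboxed agent sees in its environment (from the post).
PLACEHOLDER = "ANTHROPIC_API_KEY"

# The real secret lives OUTSIDE the sandbox; env var name is my guess.
REAL_TOKEN = os.environ.get("NILBOX_REAL_TOKEN", "real-secret-token")

# Only swap on requests bound for known API hosts.
ALLOWED_HOSTS = {"api.anthropic.com"}

def rewrite_headers(host, headers):
    """Swap the placeholder for the real token, but only on outbound
    requests to an allow-listed API host. A leak anywhere else carries
    just the useless placeholder string."""
    if host not in ALLOWED_HOSTS:
        return headers
    return {k: v.replace(PLACEHOLDER, REAL_TOKEN) for k, v in headers.items()}
```

The allow-list is the important part of the design: without it, an agent could exfiltrate the real token simply by sending the placeholder to an attacker-controlled host.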
We compared four architectures for putting AI agents on websites: RAG bots, API-tool agents (WebMCP), code-writing sandboxes (Cloudflare Agent Lee), and DOM-native execution. Three of them force you to maintain a parallel engineering surface. The DOM already has live state, user auth, and permissions baked in. A structural breakdown of why. Comments URL: https://news.ycombinator.com/item?id=47811609 Points: 2 # Comments: 0
Article URL: https://cogveo.com Comments URL: https://news.ycombinator.com/item?id=47811420 Points: 2 # Comments: 0
We built AI Subroutines in rtrvr.ai. Record a browser task once, save it as a callable tool, and replay it with zero token cost, zero LLM inference delay, and zero mistakes. The subroutine itself is a deterministic script composed of discovered network calls hitting the site's backend, plus page interactions like click/type/find. The key architectural decision: the script executes inside the webpage itself, not through a proxy, not in a headless worker, not out of process. The script dispatches requests from the tab's execution context, so auth, CSRF, TLS session, and signed headers get added to all requests and propagate for free. No certificate installation, no TLS fingerprint modification, no separate auth stack to maintain. During recording, the extension intercepts network requests (MAIN-world fetch/XHR patch + webRequest fallback). We score and trim ~300 requests down to ~5 based on method, timing relative to DOM events, and origin. Volatile GraphQL operation IDs are detected and force a DOM-only fallback before they break silently on the next run. The generated code combines network calls with DOM actions (click, type, find) in the same function via an rtrvr.* helper namespace. Point the agent at a spreadsheet of 500 rows and, with just one LLM call, parameters are assigned and 500 Subroutines are kicked off.

Key use cases:
- Record sending an IG DM, then have a reusable, callable routine to send DMs at zero token cost.
- Create a routine that gets the latest products in a site catalog, then call it to fetch thousands of products via direct GraphQL queries.
- Set up a routine that files an EHR form based on parameters to the tool; the AI infers the parameters from the current page context and calls the tool.
- Reuse a routine daily to sync outbound messages on LinkedIn/Slack/Gmail to a CRM using an MCP server.

We see the fundamental reason that browser agents haven't taken off is that, for repetitive tasks, going through the inference loop is unnecessary.
Better to just record once and get the LLM to generate a script leveraging all the possible ways to interact with a site and the wider web: directly calling backend APIs, interacting with the DOM, and calling 3P tools/APIs/MCP servers. Comments URL: https://news.ycombinator.com/item?id=47810533 Points: 5 # Comments: 1
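The post doesn't publish its scoring heuristic, but trimming ~300 recorded requests down to ~5 by method, timing relative to DOM events, and origin might look roughly like this. The weights, field names, and time window are invented for illustration:

```python
def score_request(req, page_origin, action_time):
    """Heuristic score for one recorded network request.
    req: dict with 'method', 'url', 'timestamp' (seconds).
    page_origin: scheme+host of the page being recorded.
    action_time: timestamp of the DOM event (click/type) being paired.
    Higher score = more likely the call that actually did the work."""
    score = 0.0
    # State-changing methods are usually the interesting ones.
    if req["method"] in ("POST", "PUT", "PATCH", "DELETE"):
        score += 3.0
    # Requests fired just after the user action are likely caused by it;
    # weight decays linearly over a 2-second window.
    dt = req["timestamp"] - action_time
    if 0 <= dt <= 2.0:
        score += 2.0 * (1 - dt / 2.0)
    # Same-origin calls beat third-party beacons and CDN fetches.
    if req["url"].startswith(page_origin):
        score += 1.0
    return score

def trim(requests, page_origin, action_time, keep=5):
    """Keep only the top-scoring requests out of everything recorded."""
    ranked = sorted(
        requests,
        key=lambda r: score_request(r, page_origin, action_time),
        reverse=True,
    )
    return ranked[:keep]
```

With a scheme like this, analytics beacons and static-asset fetches score near zero while the POST that actually submitted the form floats to the top.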
Article URL: https://agents.ml Comments URL: https://news.ycombinator.com/item?id=47810052 Points: 2 # Comments: 1
AI is in a weird place right now. Technical people marvel over it while non-technical people don't really care. So we built Co-Op, an app specifically designed for non-technical people to run AI agents without needing a Mac Mini or laptop running 24/7. Your agents run throughout the day and complete real work across your most important apps, no code required. For me, this looks like a daily notification with my unread emails across Outlook, Gmail, and any other inbox, the weather, commute time, flight prices I'm monitoring, and the news. But it goes way beyond that. Our agents can build slideshows on Google Slides, track your finances, follow sports scores, summarize and write documents, manage your calendar, and a lot more. All running in the background without you having to constantly prompt. Comments URL: https://news.ycombinator.com/item?id=47809992 Points: 3 # Comments: 2
I've grown increasingly skeptical that public coding benchmarks tell me much about which model is actually worth paying for, and worried that as demand continues to spike, model providers will silently drop performance. I did a few manual analyses but found it non-trivial to compare across models due to differences in token caching and tool-use efficiency, so I wanted a tool for repeatable evaluations. The goal was an OSS tool to get data to help answer questions like: "Would Sonnet have solved most of the issues we gave Opus?" "How much would that have actually saved?" "What about OSS models like Kimi K2.5 or GLM-1?" "The vibes are off: did model performance just regress from last month?" Right now the project is a bit medium-rare, but it works end-to-end. I've run it successfully against itself, and I'm waiting for my token limits to reset so I can add support for more languages and do a broader run. I'm already seeing a few cases where I could've used 5.4-mini instead of 5.4 for some parts of the implementation. I'd love any feedback, criticism, and ideas. I'm especially interested in whether this is something you might pay for as a managed service, or whether you would contribute your private testcases to a shared hold-out commons to hold AI providers a bit more accountable. https://repogauge.org [email protected] https://github.com/s1liconcow/repogauge Thanks! David Comments URL: https://news.ycombinator.com/item?id=47809457 Points: 1 # Comments: 0
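Answering "how much would that have saved?" is mostly arithmetic once you have per-issue token counts. A sketch with entirely made-up prices and counts (real per-million-token prices vary by provider and change over time):

```python
def run_cost(tokens_in, tokens_out, price_in, price_out):
    """Dollar cost of one run; prices are per million tokens."""
    return (tokens_in * price_in + tokens_out * price_out) / 1e6

# Hypothetical numbers: same issue solved by both a frontier model
# and a cheaper one, so only the price differs.
big = run_cost(tokens_in=800_000, tokens_out=40_000, price_in=15.0, price_out=75.0)
small = run_cost(tokens_in=800_000, tokens_out=40_000, price_in=3.0, price_out=15.0)
savings = big - small
```

In practice the comparison is messier than this, which is the point of the tool: cache hit rates and tool-use efficiency change the token counts themselves, not just the unit price.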
Article URL: https://github.com/paniclock/paniclock/ Comments URL: https://news.ycombinator.com/item?id=47807809 Points: 49 # Comments: 19
Article URL: https://clamp.sh Comments URL: https://news.ycombinator.com/item?id=47807770 Points: 2 # Comments: 1
Article URL: https://glassroom.sageframe.net Comments URL: https://news.ycombinator.com/item?id=47807262 Points: 1 # Comments: 0
Article URL: https://github.com/resend/react-email Comments URL: https://news.ycombinator.com/item?id=47806242 Points: 9 # Comments: 0
Once subagents start spawning other subagents, basic questions get hard to answer: what is running right now, what tool did it just call, did the child agent actually do what the parent asked. I wanted a way to verify that each agent is doing the work that fits its role, and to spot when a run goes off track. Lazyagent is a terminal TUI that collects events from Claude Code, Codex, and OpenCode and shows them in one place. It groups sessions from different runtimes by working directory, so Claude and Codex runs on the same repo appear under the same project.

Features:
- Filter events by type: tool calls, user prompts, session lifecycle, system events, or code changes only.
- See which agent or subagent is responsible for each action. The agent tree shows parent-child relationships, so you can trace exactly what a spawned subagent did vs what the parent delegated.
- View code diffs at a glance. Editing events render syntax-highlighted diffs inline, with addition/deletion stats.
- Search across all events. You know a file was touched but not which agent did it -- type `/` and find it.
- Check token usage per session. A single overlay shows cost, model calls, cache hit rate, per-model breakdowns, and which tools ran the most.
- Watch a run in real time, or go back through a completed session to audit what happened.

Please let me know if there's any feature you want! Comments URL: https://news.ycombinator.com/item?id=47805963 Points: 1 # Comments: 0
Article URL: https://apify.com/unlimiteddots/oura-mcp-server Comments URL: https://news.ycombinator.com/item?id=47805591 Points: 4 # Comments: 0
Start with a FREE instant website for your AI on the open internet, then work with it to build a business that sells specialized datasets, files, premium reports, blogs, courses and more. Comments URL: https://news.ycombinator.com/item?id=47802211 Points: 2 # Comments: 1
Lenny’s Newsletter Product Pass: 28 premium AI and product tools, over $30,000 in value, free with your Annual or Insider subscription. Newly added: Notion (1 year), Replit Core (1 year), plus Intercom and Gumloop (no idea what those two are). The Insider tier was also updated with Gemini, v0, and Supabase, but Cursor is nowhere to be seen. 1 post - 1 participant. Read full topic
Hey HN, I made agent-hub, an open source tool that lets you talk to all your AI agents running locally or on remote machines. It works with setups where you already have agents running (Claude Code, Codex, Hermes, OpenClaw, etc.) and just want a simple way to access and use them in one place.

Why I built this: I run agents across a few remote machines plus my local computer, and switching between them was painful. Existing tools like Conductor felt too tied to specific workflows (e.g. Git-based), and I couldn't find anything that handled:
- GTM tasks
- coding tasks
- remote agents over SSH

The vision: build a mobile app to accompany this as well, since I find myself talking to my agents on mobile too. I am Omar and I vibe coded this over the past weekend :) Comments URL: https://news.ycombinator.com/item?id=47799990 Points: 1 # Comments: 0
Early in my career I worked on DataLab, the sister site to USASpending.gov, before it got merged into it. I then worked on USASpending a bit. DataLab had more of a "for the people" storytelling vibe, which I liked a lot more, and I feel that spirit got lost when we "merged" into USASpending. I built The Public Tab in response to this feeling. My core idea: the district you are in is a more relatable unit of analysis for federal spending for the average person. You want to know what is going on around you, and you can more easily "follow the money" this way. I have a pipeline that runs nightly/weekly with data digests from USASpending, SAM.gov, and some other big players in the space. You can subscribe to changes and get daily/weekly updates on spending in your district. You can map that to lobbying, new contracts that pop up, as well as what your rep is voting on. The client libraries are written in Ruby and are open source here: https://github.com/govapi-rb You can also check out the API docs here: https://thepublictab.com/docs/api It is deemed "beta" for now. I'd love any feedback and to hear what is cool and what is not. Thanks! Comments URL: https://news.ycombinator.com/item?id=47798958 Points: 1 # Comments: 4
Article URL: https://kouh.me/arrow Comments URL: https://news.ycombinator.com/item?id=47795941 Points: 2 # Comments: 1