智能助手网
Tag aggregation: Runtime

/tag/Runtime

linux.do · 2026-04-18 21:09:24+08:00 · tech

new_api_panic: Panic detected, error: runtime error: invalid memory address or nil pointer dereference. Please submit an issue here: GitHub - QuantumNous/new-api (a unified AI model hub for aggregation & distribution; supports cross-converting various LLMs into OpenAI-, Claude-, or Gemini-compatible formats) | Upstream: {"error":{"message":"Panic detected, error: runtime error: invalid memory address or nil pointer dereference. Please submit a issue here: https://github.com/Calcium-Ion/new-api","type":"new_api_panic"}} 2 posts · 2 participants · Read full topic

hnrss.org · 2026-04-17 05:39:58+08:00 · tech

Hi HN, I've been working on an open-source project to explore a problem I keep running into with LLM systems in production: we give models the ability to call tools, access data, and make decisions, but we don't have a real runtime security layer around them. So I built a system that acts as a control plane for AI behavior, not just infrastructure. GitHub: https://github.com/dshapi/AI-SPM

What it does: the system sits around an LLM pipeline and enforces decisions in real time: * Detects and blocks prompt injection (including obfuscation attempts) * Forces structured tool calls (no direct execution from the model) * Validates tool usage against policies * Prevents data leakage (PII / sensitive outputs) * Streams all activity for detection + audit

Architecture (high-level): * Gateway layer for request control * Context inspection (prompt analysis + normalization) * Policy engine (using Open Policy Agent) * Runtime enforcement (tool validation + sandboxing) * Streaming pipeline (Apache Kafka + Apache Flink) * Output filtering before the response leaves the system

The key idea is: treat the LLM as untrusted, and enforce everything externally.

What broke during testing, some things that surprised me: * Simple pattern-based prompt injection detection is easy to bypass * Obfuscated inputs (base64, unicode tricks) are much more common than expected * Tool misuse is the biggest real risk (not the model itself) * Most "guardrails" don't actually enforce anything at runtime

What I'm unsure about: would really appreciate feedback from people who've worked on similar systems. * Is a general-purpose policy engine like OPA the right abstraction here? * How are people handling prompt injection detection beyond heuristics? * Where should enforcement actually live (gateway vs. execution layer)? * What am I missing in terms of attack surface?

Why I'm sharing: this space feels a bit underdeveloped compared to traditional security. We have CSPM, KSPM, etc., but nothing equivalent for AI systems yet. Trying to explore what that should look like in practice. Would love any feedback, especially critical takes. Comments URL: https://news.ycombinator.com/item?id=47799856 Points: 1 # Comments: 0
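The "forces structured tool calls" and "validates tool usage against policies" steps above can be sketched as follows. The real project uses Open Policy Agent for this; the sketch below is a simplified stand-in with two hard-coded illustrative policies, and every name in it (`ToolCall`, `authorize`, the tool names) is hypothetical, not the project's API.

```python
# Minimal sketch of runtime tool-call validation: the gateway receives a
# structured call (never raw execution from the model) and checks it
# against explicit policies before it is allowed to run.
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)

ALLOWED_TOOLS = {"search", "read_file"}          # allow-list policy
BLOCKED_PATH_PREFIXES = ("/etc/", "/root/")      # data-leakage policy

def authorize(call: ToolCall) -> bool:
    """Return True only if the structured call passes every policy."""
    if call.tool not in ALLOWED_TOOLS:
        return False  # unknown tools are dropped, not executed
    if call.tool == "read_file":
        path = str(call.args.get("path", ""))
        if path.startswith(BLOCKED_PATH_PREFIXES):
            return False  # block reads of sensitive locations
    return True
```

In an OPA-based deployment the body of `authorize` would instead be a query against Rego policies, but the enforcement point, in front of execution rather than inside the model, is the same.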

hnrss.org · 2026-04-16 05:41:39+08:00 · tech

I wanted to share something I have been working on, in one form or another, since about 2019. Springdrift is a persistent, auditable runtime for long-lived agents, written in Gleam on the BEAM. It is my attempt at filling in some of the gaps in agent development. It is designed to do everything an agent like Openclaw can do (and more, eventually), but it can diagnose its own errors and failures, it has a sophisticated safety metacognition system, and it has a character that should not drift. It started out as a machine ethics prototype in Java and gradually morphed into a full agent runtime; all the intermediate variations worth saving are in my GitHub repo. I recall trying to explain to a mentor exactly what I was building. I found it difficult because there was no existing category for this kind of agent. It is not quite an assistant: it does more than run tasks. It is not quite an autonomous agent either, because its autonomy, while real, is bounded. I kept falling back to the example of assistance animals, like guide dogs. That gave me what I needed: a non-human agent with bounded autonomy. But this is not a guide dog, it is an AI system, so I looked to fiction for the final part: JARVIS, K9 from Doctor Who, Rhadamanthus from The Golden Age novel. All of these systems have bounded autonomy and a long-term professional relationship with humans, like a family lawyer or doctor whose services are retained. Hence the type of this system: an Artificial Retainer. The system has lots of interesting features: ambient self-perception, introspection tooling, and a safety system based on computational ethics (Becker) and decision theory (Beach). It is auditable, backed up to git, and can manage its own work with a scheduler and a supervised team of subagents. The website and the accompanying paper provide more details. I make no huge claims for the system; it is pretty new.
What I offer is a reference implementation of a new category of AI agent, one that I think we need. The road to AGI is all very well, and I am not sure Springdrift gets us any closer, but it does represent an attempt to build an intermediate, safe type of agent that we seem to be missing. All feedback and comments welcome! GitHub: https://github.com/seamus-brady/springdrift Arxiv paper: https://arxiv.org/abs/2604.04660 Eval data: https://huggingface.co/datasets/sbrady/springdrift-paper-eva... Website: https://springdrift.ai/ Comments URL: https://news.ycombinator.com/item?id=47785663 Points: 2 # Comments: 0
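The combination described above, bounded autonomy plus an auditable record of everything the agent decides, can be sketched loosely like this. Springdrift itself is written in Gleam and backs its audit trail to git; this Python sketch only illustrates the shape of the idea, and none of its names (`AUTONOMY_SCOPE`, `attempt`) are the project's API.

```python
# Hypothetical sketch of bounded autonomy with an audit trail: the agent
# acts on its own only inside a declared scope, escalates everything
# else, and records every decision either way.
from datetime import datetime, timezone

AUTONOMY_SCOPE = {"schedule_task", "read_notes"}  # the agent's bounds

audit_log: list[dict] = []  # append-only record of every decision

def attempt(action: str) -> str:
    """Perform in-scope actions autonomously; escalate the rest."""
    decision = "executed" if action in AUTONOMY_SCOPE else "escalated"
    audit_log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "decision": decision,
    })
    return decision
```

The point of logging escalations as well as executions is that the audit trail shows not just what the agent did, but what it declined to do on its own authority.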

hnrss.org · 2026-04-14 16:28:16+08:00 · tech

Doors: Server-driven UI framework + runtime for building stateful, reactive web applications in Go. Some highlights: * Front-end framework capabilities in server-side Go: reactive state primitives, dynamic routing, composable components. * No public API layer. No endpoint design needed; a private temporal transport is handled under the hood. * Unified control flow: no context switch between back end and front end. * Integrated web stack: bundle assets, build scripts, serve private files, automate CSP, and ship in one binary. How it works: the Go server is the UI runtime. The web application runs on a stateful server, while the browser acts as a remote renderer and input layer. Security model: every user can interact only with what you render to them. That means you check permissions when you render the button, and that is enough to be sure the related action won't be triggered by anyone else. Mental model: link the DOM to the data it depends on. Limitations: * Does not make sense for static non-interactive sites or client-first apps with simple routing, and is not suitable for offline PWAs. * Load balancing and roll-outs without user interruption require different strategies with a stateful server (mechanics to make this simpler are included). Where it fits best: apps with heavy user flows and complex business logic. A single execution context and no API/endpoint permission-management burden make them easier. Peculiarities: * Purpose-built [Go language extension]( https://github.com/doors-dev/gox ) with its own LSP, parser, and editor plugins; adds HTML as Go expressions and `elem` primitives. * Custom concurrency engine that enables non-blocking event processing, parallel rendering, and tree-aware state propagation. * HTTP/3-ready synchronization protocol (rolling request + streaming, events via regular POST, no WebSockets/SSE). From the author (me): it took me 1 year and 9 months to get to this stage.
I rewrote the framework 6 or 7 times until every part was coherent and every decision felt right or was a reasonable compromise. I am very critical of my own work and I see flaws, but overall it turned out solid; I like the developer experience as a user. The mental model requires a bit of thinking upfront, but pays off with explicit code and a predictable outcome. Code example:

type Search struct {
    input doors.Source[string] // reactive state
}

elem (s Search) Main() {
    ~// subscribe results to state changes
    ~(doors.Sub(s.input, s.results))
}

elem (s Search) results(input string) {
    ~(for _, user := range Users.Search(input) {
        ~(user.Name)
    })
}

Comments URL: https://news.ycombinator.com/item?id=47762851 Points: 4 # Comments: 2
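The security model above ("every user can interact only with what you render to them") is framework-agnostic enough to sketch outside Go: actions get registered per session at render time, so anything that was never rendered for a user simply has no handler that user can invoke. All names below (`Session`, `render_button`, `dispatch`) are hypothetical illustrations, not Doors' API.

```python
# Sketch of render-time permission checking: a handler exists in the
# session only if the button was actually rendered for that user.
import secrets

class Session:
    def __init__(self):
        self._actions = {}  # token -> handler, filled only at render time

    def render_button(self, label, handler, allowed):
        if not allowed:
            return ""  # permission is checked once, at render
        token = secrets.token_hex(8)
        self._actions[token] = handler
        return f'<button data-act="{token}">{label}</button>'

    def dispatch(self, token):
        handler = self._actions.get(token)
        if handler is None:
            # never rendered for this user => cannot be triggered
            raise PermissionError("action was not rendered for this session")
        return handler()
```

The design consequence is the one the post claims: there is no separate endpoint-permission layer to keep in sync with the UI, because the UI is the permission check.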

hnrss.org · 2026-04-14 14:25:47+08:00 · tech

We use Claude Code, Cursor, and Copilot daily. These tools run shell commands, read files, and call APIs on their own. When something goes wrong, you find out after the fact: a .env file gets read, a secret ends up somewhere it should not, a command runs that nobody approved. EDR sees process spawns. Cloud audit logs see API calls. Neither understands that the agent's chain of actions, taken together, is credential theft. Burrow sits between the agent and the machine. You define policies in plain language, like "block any agent from deleting production resources" or "alert if an agent reads AWS credentials and then sends data to an external endpoint." Burrow maps those policies onto the actual tools, MCP servers, and plugins in your environment, then intercepts tool calls at the framework level before they execute. Risky calls get dropped; everything else passes through. Works with Claude Code, Cursor, Copilot, Windsurf, CrewAI, LangChain, LangGraph, and a few more. The CLI and SDK install in under a minute. Free tier for individuals, paid for teams. I ran infrastructure security at a large media company before this and am going full time on Burrow later this month. Happy to answer anything, especially the "does this actually work in production" question. Try it: https://burrow.run Comments URL: https://news.ycombinator.com/item?id=47761957 Points: 3 # Comments: 0
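The chain-of-actions point above is the interesting one: neither a single file read nor a single network call is risky on its own, but a credentials read followed by an external upload is. A stateful interceptor in front of tool execution can catch exactly that sequence. This is a hedged sketch of the idea only; the class, rule, and tool names are invented and are not Burrow's actual API.

```python
# Sketch of sequence-aware tool-call interception: allow individual
# calls, but block the read-credentials-then-exfiltrate chain.
CRED_SUFFIXES = (".env", ".aws/credentials")

class ChainMonitor:
    def __init__(self):
        self.saw_credential_read = False

    def check(self, tool: str, args: dict) -> str:
        """Return 'allow' or 'block' for each intercepted tool call."""
        if tool == "read_file":
            path = str(args.get("path", ""))
            if path.endswith(CRED_SUFFIXES):
                self.saw_credential_read = True  # remember, don't block yet
        if tool == "http_post" and self.saw_credential_read:
            return "block"  # the chain, not the single call, is the risk
        return "allow"
```

This is what per-call EDR and audit logs miss: the `http_post` is only dangerous because of what the session read earlier, so the monitor has to carry state across calls.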

hnrss.org · 2026-04-13 15:48:11+08:00 · tech

Hi HN, since the Berkeley RDI benchmark integrity post recently got a lot of attention here [0], it seems like a good time to share Amber, related work aimed at making agent benchmarks easier to reproduce. Amber grew out of the RDI AgentX-AgentBeats benchmarking competition [1], where the general public was invited to submit agents. To ensure trustworthy results, we needed submissions to be reproducible and to have clear provenance. Reproducibility motivates declarative specifications of benchmarks, and provenance motivates the ability to safely and efficiently run benchmarks on hosted hardware. Once you add support for multi-phase, multi-agent benchmarks (like Werewolf), the design for Amber mostly falls right out. Amber is inspired by the Fuchsia OS Component Framework. Amber's security model is that a component, such as an A2A agent or MCP tool, only serves a component that has explicitly been given a capability to use it. In the context of benchmarks, this means that an agent under test cannot reach into the evaluator, and that a tool can be revoked in a later phase of a benchmark. Amber is a combination of a compiler and a runtime system: the compiler turns manifests describing agents, tools, and how they connect to each other into a deterministic plan. The plan can be executed against different backends like Docker, K8s, KVM, or the host OS. The compiler injects the runtime components necessary to enforce the capability model: sidecar routers that provide guarded connectivity between components, and backend controllers that allow components to create and destroy components at runtime. Amber started out with just static `docker compose`, but benchmarks like TerminalBench and OSWorld required the addition of dynamic components and VM-backed components. Then competition participants wanted an easier way to test locally that didn't involve repeatedly rebuilding Docker images, so Amber got native binary support and a one-liner `amber run` interface.
The concepts borrowed from Fuchsia have held up so far. Right now I'm working on making Amber's observability traces available to the benchmark evaluator so that it can judge based on the path an agent took, rather than just the final answer. Overall, the goal we set out to achieve was to make it easy to reproduce agent benchmark results in a low-trust environment. Amber is not a complete solution, but it takes some burden off of benchmark authors and agent builders. Maybe it's even useful beyond benchmarks. I would be happy for you to batter the conceptual framework! The AgentBeats tau2 benchmark manifest [2] is a real example. The in-tree mixed-site example [3] is a simple demo of Amber end-to-end with `amber run`. [0]: https://news.ycombinator.com/item?id=47733217 [1]: https://rdi.berkeley.edu/agentx-agentbeats.html [2]: https://github.com/RDI-Foundation/tau2-agentbeats/blob/main/... [3]: https://github.com/RDI-Foundation/amber/tree/main/examples/m... Comments URL: https://news.ycombinator.com/item?id=47749007 Points: 1 # Comments: 0
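The manifest-to-plan pipeline described above can be sketched in miniature: compile capability grants into an explicit allow-list of edges, then have the routing layer consult only that plan. The manifest shape and function names below are invented for illustration and are not Amber's real format (the linked tau2 manifest [2] shows the real one).

```python
# Sketch of a capability model compiled from a manifest: a caller can
# reach a callee only if the manifest explicitly granted that edge.
def compile_plan(manifest: dict) -> set:
    """Flatten capability grants into a set of (caller, callee) edges."""
    return {(caller, callee)
            for caller, grants in manifest["capabilities"].items()
            for callee in grants}

def route(plan: set, caller: str, callee: str) -> str:
    """Sidecar-router check: refuse any edge not in the compiled plan."""
    if (caller, callee) not in plan:
        raise PermissionError(f"{caller} has no capability to use {callee}")
    return "connected"
```

Revoking a tool in a later benchmark phase then amounts to compiling a new plan without that edge; the agent under test never holds a route to the evaluator in the first place, because no manifest grants it.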