Article URL: https://github.com/raullenchai/Rapid-MLX Comments URL: https://news.ycombinator.com/item?id=47816238 Points: 1 # Comments: 0
Article URL: https://github.com/yvonboulianne/laeka-rational Comments URL: https://news.ycombinator.com/item?id=47800756 Points: 2 # Comments: 2
Frontier LLMs have very little output diversity, even for open-ended queries. We built Flint to see if we could reverse this. It's a finetuned Qwen3 30B model specifically trained to produce higher-entropy output when asked open-ended questions. Flint significantly increases the NoveltyBench score compared to the base model, without significantly reducing scores on non-creative benchmarks like MMLU-STEM. This shows that divergence tuning doesn't have to be a tax on base capabilities. Flint scores 7.47/10 on NoveltyBench, while most frontier models score between 1.8 and 3.2. Comments URL: https://news.ycombinator.com/item?id=47787580 Points: 4 # Comments: 0
Article URL: https://github.com/yantrikos/tier Comments URL: https://news.ycombinator.com/item?id=47782284 Points: 2 # Comments: 4
Article URL: https://atticsecurity.com/en/blog/why-llms-hate-fake-data-token-proxy/ Comments URL: https://news.ycombinator.com/item?id=47778087 Points: 2 # Comments: 2
I have been on here for nearly 20 years :-) I got laid off from an IT/Dev manager job I'd been at for nearly a decade. I loved the company, the role, and my team, but the company had to downsize. The search that followed took 9 months: 249 applications, 21 screening calls, 7 interviews, and 2 job offers. Somewhere around month 4 I stopped treating it as "send resume, hope" and started building a repeatable system around ChatGPT, Claude, and Gemini. I created a skills database as the source of truth, ran alignment analysis against each JD, did AI-driven interview prep through NotebookLM, and tracked everything in a spreadsheet so I could actually see what was working. That system is what eventually landed me the 7 interviews (3 final interviews) and 2 jobs. I took the first job because I needed something; the second job is my dream job. I wrote it up as a book because I wanted to help other people land their dream jobs without grinding through the same 9 months I did. The whole thing is online for free at careervectorhq.com, no signup, no email wall. I also share every prompt I used, copy-and-paste ready, so you can run the same workflow yourself instead of reverse-engineering it from the text. I am considering turning it into software, but for now I'm just sharing my process. I would genuinely value feedback from this crowd, especially from anyone who's hired recently and can tell me where the advice is off, or where AI-assisted applications are starting to hurt candidates rather than help. Comments URL: https://news.ycombinator.com/item?id=47764864 Points: 1 # Comments: 1
A few weeks ago, ARC-AGI 3 was released. For those unfamiliar, it's a benchmark designed to study agentic intelligence through interactive environments. I'm a big fan of these kinds of benchmarks, as IMO they reveal so much more about the capabilities and limits of agentic AI than static Q&A benchmarks. They are also more intuitive to understand when you can actually see how the model behaves in these environments. I wanted to build something in that spirit, but with an environment that pits two LLMs against each other. My criteria were: 1. Strategic & Real-time. The game had to create genuine tradeoffs between speed and quality of reasoning. Smaller models can make more moves but less strategic ones; larger models move slower but smarter. 2. Good harness. I deliberately avoided visual inputs — models are still too slow and not accurate enough with them (see: Claude playing Pokémon). Instead, a harness translates the game state into structured text, and the game engine renders the agents' responses as fluid animations. 3. Fun to watch. Because benchmarks don't need to be dry bread :) The end result is a Bomberman-style 1v1 game where two agents compete by destroying bricks and trying to bomb each other. You can watch a demo video here: https://youtu.be/4x8tVypmuRk Would love to hear what you think! Comments URL: https://news.ycombinator.com/item?id=47762486 Points: 2 # Comments: 0
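A text harness like the one described above can be sketched as a small serializer that turns the game state into structured text for the LLM prompt. This is only an illustration under assumed types — the grid symbols, field names, and `render_state` function are my own, not the project's actual schema:

```python
def render_state(grid, agents, bombs):
    """Serialize a Bomberman-style game state into structured text.

    grid:   list of strings ('#'=wall, 'B'=brick, '.'=floor)
    agents: dict mapping agent name -> (x, y) position
    bombs:  list of (x, y, fuse_ticks) tuples
    All names and formats here are illustrative assumptions.
    """
    out = ["MAP (#=wall, B=brick, .=floor):"]
    out.extend(grid)
    for name, (x, y) in sorted(agents.items()):
        out.append(f"AGENT {name} at ({x},{y})")
    for x, y, fuse in bombs:
        out.append(f"BOMB at ({x},{y}) explodes in {fuse} ticks")
    return "\n".join(out)
```

Feeding compact, unambiguous text like this sidesteps the slow and error-prone vision path while keeping the full state visible to both agents.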
I have a proposal that addresses long-term memory problems for LLMs when new data arrives continuously (cheaply!). The system involves no code, just two Markdown files. For retrieval, there is a semantic filesystem that makes it easy for LLMs to search using shell commands. It is currently a scrappy v1, but it works better than anything else I have tried. Curious to hear any feedback! Comments URL: https://news.ycombinator.com/item?id=47757552 Points: 22 # Comments: 10
Article URL: https://subralabs.com/lab/on-device-vs-cloud-llm.html Comments URL: https://news.ycombinator.com/item?id=47757131 Points: 2 # Comments: 0
I got tired of repeating myself to my LLM every session. rekal is an MCP server that stores memories in SQLite and retrieves them with hybrid search (BM25 + vectors + recency decay). One file, local embeddings, no API keys. Comments URL: https://news.ycombinator.com/item?id=47744683 Points: 2 # Comments: 2
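Hybrid retrieval of the kind described (BM25 + vectors + recency decay) is commonly implemented as a weighted score fusion. A minimal sketch, assuming pre-normalized BM25 and cosine scores; the weights, half-life, and function names are illustrative, not rekal's actual values:

```python
import time

def recency_decay(ts, now=None, half_life_days=30.0):
    """Exponential decay: a memory's boost halves every half_life_days."""
    now = time.time() if now is None else now
    age_days = (now - ts) / 86400.0
    return 0.5 ** (age_days / half_life_days)

def hybrid_score(bm25, cosine, ts, w_kw=0.4, w_vec=0.4, w_rec=0.2, now=None):
    """Weighted fusion of keyword, vector, and recency signals.

    bm25 and cosine are assumed pre-normalized to [0, 1]; the weights
    sum to 1 so the fused score stays in [0, 1].
    """
    return w_kw * bm25 + w_vec * cosine + w_rec * recency_decay(ts, now=now)
```

In practice the BM25 score would come from SQLite's FTS5 and the cosine score from the stored local embeddings; the fusion step itself is just this weighted sum per candidate memory.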
Hi HN, I built Redactify, a native macOS app that automatically scrubs sensitive personal and financial data, faces, and metadata from documents and images. The motivation: I frequently use Claude and ChatGPT to analyze invoices and contracts, but I hated the friction of sanitizing them first. I also didn't want to blindly trust the "we don't train on API data" promises of model providers when dealing with actual client data. How it works under the hood: Redactify flattens the document and permanently destroys the underlying text and EXIF metadata. It runs entirely on-device, using Apple's native Vision framework for OCR and Core ML for face detection. App Store: https://apps.apple.com/app/id6760609039 I'd love to hear your thoughts on the approach and what you'd want added. Also, if you know of any nasty PDF edge cases (weird encodings, hidden layers) I should test against, please let me know! The ability to automatically clean your clipboard for LLM apps (Gemini, ChatGPT, Claude, ...) is currently in App Store review! Comments URL: https://news.ycombinator.com/item?id=47744106 Points: 3 # Comments: 1
I did a dumb thing by crawling millions of pages to find all the pricing pages I could. Then I fed all of them to ~50 LLMs to see how good or bad they all did. Then I dumped it all on a page. Just because. Here's a post on how I did this: https://pricepage.lol/how-i-built-pricepage-lol Comments URL: https://news.ycombinator.com/item?id=47740358 Points: 3 # Comments: 1