智能助手网 (Smart Assistant Network)
Tag aggregation: LLMs

hnrss.org · 2026-04-16 09:24:14+08:00 · tech

Frontier LLMs have very little output diversity, even for open-ended queries. We built Flint to see if we could reverse this. It's a fine-tuned Qwen3 30B model specifically trained to produce higher entropy when asked open-ended questions. Flint significantly increases the NoveltyBench score compared to the base model, without significantly reducing the score on non-creative benchmarks like MMLU-STEM. This shows that divergence tuning doesn't actually have to be a tax on base capabilities. Flint scores 7.47/10 on NoveltyBench, while most frontier models score between 1.8 and 3.2.
Comments URL: https://news.ycombinator.com/item?id=47787580 Points: 4 # Comments: 0
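
One rough way to picture "higher entropy" on open-ended prompts is to sample the same prompt several times and compute the Shannon entropy over the distinct answers. The sketch below is only an illustration under stated assumptions: it is not NoveltyBench's scoring method or Flint's training objective, the sample answers are made up, and grouping by exact match after light normalization is a simplification (a real evaluation would group semantically equivalent answers).

```python
import math
from collections import Counter


def response_entropy(responses):
    """Shannon entropy (in bits) over exact-match groups of responses."""
    # Assumption: light normalization so case/whitespace changes don't count as novelty.
    normalized = [" ".join(r.lower().split()) for r in responses]
    counts = Counter(normalized)
    total = len(normalized)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical samples for one open-ended prompt such as "Name a color".
low_diversity = ["Blue", "blue", "Blue ", "blue"]
high_diversity = ["Blue", "Teal", "Ochre", "Magenta"]

print(response_entropy(low_diversity))   # every sample collapses to one answer
print(response_entropy(high_diversity))  # four distinct, equally frequent answers
```

With these made-up samples the first list scores 0 bits and the second scores 2 bits, which is the kind of gap a diversity-oriented benchmark is trying to surface.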

hnrss.org · 2026-04-14 20:37:31+08:00 · tech

I have been on here for nearly 20 years :-) I got laid off from an IT/Dev manager job I'd been at for nearly a decade. I loved the company, the role, and my team, but the company had to downsize. The search that followed took 9 months: 249 applications, 21 screening calls, 7 interviews, and 2 job offers.

Somewhere around month 4 I stopped treating it as "send resume, hope" and started building a repeatable system around ChatGPT, Claude, and Gemini. I created a skills database as the source of truth, ran an alignment analysis against each JD, did AI-driven interview prep through NotebookLM, and tracked everything in a spreadsheet so I could actually see what was working. That system is what eventually landed me the 7 interviews (3 of them final rounds) and 2 offers. I took the first job because I needed something; the second is my dream job.

I wrote it up as a book because I wanted to help other people land their dream jobs without grinding through the same 9 months I did. The whole thing is online for free at careervectorhq.com, no signup, no email wall. I also share every prompt I used, copy-and-paste ready, so you can run the same workflow yourself instead of reverse engineering it from the text. I am considering turning it into software, but for now I'm just sharing my process.

I would genuinely value feedback from this crowd, especially from anyone who has hired recently and can tell me where the advice is off, or where AI-assisted applications are starting to hurt candidates rather than help.
Comments URL: https://news.ycombinator.com/item?id=47764864 Points: 1 # Comments: 1
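
To make the funnel numbers in the post concrete, here is a small sketch that turns the four counts into stage-to-stage conversion rates. The counts come from the post; the function name and the choice of stages are illustrative assumptions, not the author's spreadsheet layout.

```python
# Funnel counts taken from the post above.
funnel = [
    ("applications", 249),
    ("screening calls", 21),
    ("interviews", 7),
    ("offers", 2),
]


def stage_conversions(stages):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    return [
        (a_name, b_name, b_count / a_count)
        for (a_name, a_count), (b_name, b_count) in zip(stages, stages[1:])
    ]


for src, dst, rate in stage_conversions(funnel):
    print(f"{src} -> {dst}: {rate:.1%}")

# End-to-end rate from first stage to last (hypothetical summary metric).
overall = funnel[-1][1] / funnel[0][1]
print(f"applications -> offers: {overall:.1%}")
```

Roughly 8.4% of applications led to a screening call, a third of those calls became interviews, and under 1% of applications ended in an offer. Tracking these per resume version or per channel is one way to see what is working.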

hnrss.org · 2026-04-14 15:36:39+08:00 · tech

A few weeks ago, ARC-AGI 3 was released. For those unfamiliar, it's a benchmark designed to study agentic intelligence through interactive environments. I'm a big fan of these kinds of benchmarks, as IMO they reveal much more about the capabilities and limits of agentic AI than static Q&A benchmarks. They are also more intuitive to understand when you can actually see how the model behaves in these environments.

I wanted to build something in that spirit, but with an environment that pits two LLMs against each other. My criteria were:

1. Strategic & real-time. The game had to create genuine tradeoffs between speed and quality of reasoning. Smaller models can make more moves but less strategic ones; larger models move slower but smarter.
2. Good harness. I deliberately avoided visual inputs, since models are still too slow and not accurate enough with them (see: Claude playing Pokémon). Instead, a harness translates the game state into structured text, and the game engine renders the agents' responses as fluid animations.
3. Fun to watch. Because benchmarks don't need to be dry bread :)

The end result is a Bomberman-style 1v1 game where two agents compete by destroying bricks and trying to bomb each other. You can check a demo video here: https://youtu.be/4x8tVypmuRk

Would love to hear what you think!
Comments URL: https://news.ycombinator.com/item?id=47762486 Points: 2 # Comments: 0
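
The "good harness" idea, translating game state into structured text and turning a model's reply back into a move, might look something like the sketch below. Everything here is hypothetical: the `GameState` fields, the map symbols, the action names, and the fallback to `WAIT` are assumptions for illustration, not the project's actual format. The real-time aspect is not modeled.

```python
from dataclasses import dataclass

# Assumed action vocabulary; the real game may use different names.
VALID_ACTIONS = {"UP", "DOWN", "LEFT", "RIGHT", "BOMB", "WAIT"}


@dataclass
class GameState:
    grid: list       # rows of characters: '#' wall, '+' brick, '.' empty
    me: tuple        # (row, col) of the agent being prompted
    opponent: tuple  # (row, col) of the other agent
    bombs: list      # (row, col, ticks_until_explosion)


def render_state(state):
    """Serialize the state into plain text an LLM can read without vision."""
    rows = [list(r) for r in state.grid]
    for r, c, _ in state.bombs:
        rows[r][c] = "o"
    rows[state.opponent[0]][state.opponent[1]] = "B"
    rows[state.me[0]][state.me[1]] = "A"
    lines = ["LEGEND: # wall, + brick, . empty, o bomb, A you, B opponent", "MAP:"]
    lines += ["".join(r) for r in rows]
    lines += [f"BOMB at ({r},{c}) explodes in {t} ticks" for r, c, t in state.bombs]
    lines.append("Reply with one of: " + ", ".join(sorted(VALID_ACTIONS)))
    return "\n".join(lines)


def parse_action(reply):
    """Map a free-form model reply to a legal action, defaulting to WAIT."""
    for token in reply.upper().replace(",", " ").replace(".", " ").split():
        if token in VALID_ACTIONS:
            return token
    return "WAIT"


state = GameState(
    grid=["#####", "#.+.#", "#...#", "#####"],
    me=(1, 1),
    opponent=(2, 3),
    bombs=[(2, 1, 3)],
)
print(render_state(state))
print(parse_action("I'll move right."))
```

Defaulting to `WAIT` on an unparseable reply is one possible design choice: it keeps the game loop moving while penalizing a model that answers slowly or off-format.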

hnrss.org · 2026-04-13 04:27:37+08:00 · tech

Hi HN, I built Redactify, a native macOS app that automatically scrubs sensitive personal and financial data, faces, and metadata from documents and images.

The motivation: I frequently use Claude and ChatGPT to analyze invoices and contracts, but I hated the friction of sanitizing them first. I also didn't want to blindly trust the "we don't train on API data" promises of model providers when dealing with actual client data.

How it works under the hood: Redactify flattens the document and permanently destroys the underlying text and EXIF metadata. It runs entirely on-device, using Apple's native Vision framework for OCR and CoreML for face detection.

App Store: https://apps.apple.com/app/id6760609039

I'd love to hear your thoughts and wishes on the approach. Also, if you know of any nasty PDF edge cases (weird encodings, hidden layers) I should test against, please let me know! An option to automatically clean your clipboard for LLM apps (Gemini, ChatGPT, Claude, etc.) is currently in App Store review.
Comments URL: https://news.ycombinator.com/item?id=47744106 Points: 3 # Comments: 1
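
For the text side of the problem (for example, clipboard contents headed to an LLM app), pattern-based scrubbing could be sketched as below. This is not how Redactify works; the post says it relies on Apple's Vision framework and CoreML. The patterns, labels, and the `scrub` function are illustrative assumptions and are far from exhaustive. The IBAN and card number are well-known test values, not real data.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
# Order matters: IBANs are replaced before the looser card-number pattern runs.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def scrub(text):
    """Replace each match with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = (
    "Invoice for jane.doe@example.com, pay to DE89 3704 0044 0532 0130 00, "
    "card 4111 1111 1111 1111."
)
print(scrub(sample))
```

Replacing matches with typed placeholders rather than deleting them keeps the document readable for the model, which still knows an email or account number was there.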