Complete llama.cpp tutorial for 2026. Install, compile with CUDA/Metal, run GGUF models, tune all inference flags, use the API server, speculative decoding, and benchmark your hardware. https://vucense.com/dev-corner/llama-cpp-tutorial-run-gguf-m... Comments URL: https://news.ycombinator.com/item?id=47812127 Points: 4 # Comments: 0
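The steps the tutorial covers can be sketched roughly as follows. This is an orientation, not a substitute for the linked guide: `model.gguf` is a placeholder path, and the build flags reflect recent llama.cpp conventions (they have changed over time, e.g. `LLAMA_CUBLAS` became `GGML_CUDA`), so check the repository README for your checkout.

```shell
# 1. Clone and build with CUDA (on Apple Silicon, Metal is typically
#    enabled by default; otherwise pass -DGGML_METAL=ON).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# 2. Run a GGUF model: -ngl offloads layers to the GPU, -c sets context size.
./build/bin/llama-cli -m model.gguf -ngl 99 -c 4096 -p "Hello"

# 3. Serve an OpenAI-compatible HTTP API.
./build/bin/llama-server -m model.gguf --port 8080

# 4. Benchmark prompt-processing and token-generation speed on your hardware.
./build/bin/llama-bench -m model.gguf
```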
I'd actually been hesitating for a while. I started out with Ollama because it's simple and convenient, but later found its resource-scheduling behavior rather puzzling, so I wondered whether I could tune it myself. I asked an LLM, and it suggested I'd be better off with llama.cpp. 5 posts - 4 participants. Read full topic
I've been thinking about subscribing to Ollama Pro recently; others say it's great fun and includes access to GLM-5.1, but I don't know what the quota is. 4 posts - 4 participants. Read full topic
I paid for Ollama Cloud Pro to play with GLM-5.1. Current usage: the screenshot above shows 4.4M tokens consumed (I wanted a detailed input/output/cache breakdown, but that's hard to check through axonhub, which I use). Performance numbers are below. 13 posts - 9 participants. Read full topic
Article URL: https://github.com/SvReenen/Deskdrop Comments URL: https://news.ycombinator.com/item?id=47782560 Points: 3 # Comments: 1
I'm a software engineer who works with LLMs professionally (Forward Deployed Engineer at TrueFoundry). Over the past year I built up implementations of five LLM architectures from scratch and wrote a book around them. The progression:
- Ch1: Vanilla encoder-decoder transformer (English-to-Hindi translation)
- Ch2: GPT-2 124M from scratch; loads real OpenAI pretrained weights
- Ch3: Llama 3.2-3B by swapping four components of GPT-2 (LayerNorm to RMSNorm, learned PE to RoPE, GELU to SwiGLU, MHA to GQA); loads Meta's pretrained weights
- Ch4: KV cache, MQA, GQA (inference optimisation)
- Ch5: DeepSeek MLA (absorption trick, decoupled RoPE), DeepSeekMoE, Multi-Token Prediction, FP8 quantisation
All code is open source: https://github.com/S1LV3RJ1NX/mal-code The book provides the explanations, derivations, diagrams, and narrative: https://leanpub.com/adventures-with-llms (free sample available). I wrote it because most resources stop at GPT-2 and I wanted something that covered what's actually in production models today. Happy to answer questions about any of the implementations. Comments URL: https://news.ycombinator.com/item?id=47779084 Points: 2 # Comments: 0
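The Ch3 component swaps lend themselves to small sketches. The LayerNorm-to-RMSNorm change, for instance, amounts to dropping the mean-centering and the bias term. A minimal NumPy illustration of the difference (not the book's code; shapes and the eps value are illustrative):

```python
import numpy as np

def layer_norm(x, weight, bias, eps=1e-5):
    """GPT-2 style: center, scale by the standard deviation, then affine."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps) * weight + bias

def rms_norm(x, weight, eps=1e-5):
    """Llama style: no centering, no bias -- scale by the RMS only."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.random.default_rng(0).standard_normal((2, 8))
y = rms_norm(x, np.ones(8))  # output rows have RMS ~= 1
```

RMSNorm saves the mean computation and one parameter vector per layer while matching LayerNorm's quality in practice, which is why Llama-family models adopted it.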
Probably too many people were freeloading and Ollama couldn't sustain it, so free access to 5.1 is gone. 3 posts - 2 participants. Read full topic
"model is experiencing high volume. while capacity is being added, a subscription is required for access" GLM-5.1 request volume is too high, so for now only paying users can access it. There's no ETA for restoration, and whether it comes back at all is anyone's guess. Great Whale, lead us on another charge! 3 posts - 2 participants. Read full topic
Article URL: https://github.com/adrianium/Scryptian Comments URL: https://news.ycombinator.com/item?id=47764747 Points: 1 # Comments: 1
The introduction of TurboQuant, PolarQuant, and QJL (Quantized Johnson-Lindenstrauss) by Google Research represents more than just a technical optimization. At Vucense, we view this as a landmark moment for Inference Sovereignty https://vucense.com/ai-intelligence/local-llms/turboquant-ex... Comments URL: https://news.ycombinator.com/item?id=47752036 Points: 1 # Comments: 0
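For readers curious what a quantized Johnson-Lindenstrauss sketch looks like in spirit: the classic building block is a random projection followed by 1-bit (sign) quantization, from which angles, and hence inner products, can be estimated. The snippet below is a generic SimHash-style illustration under that assumption, not the TurboQuant/PolarQuant/QJL implementations described in the linked article:

```python
import numpy as np

# Illustrative dimensions: d-dim vectors sketched into m sign bits each.
d, m = 64, 4096
rng = np.random.default_rng(0)
R = rng.standard_normal((m, d))  # random Gaussian JL projection

def sketch(v):
    """Project, then quantize each coordinate to 1 bit (its sign)."""
    return np.sign(R @ v)

def cosine_estimate(sa, sb):
    """For Gaussian projections, P[signs agree] = 1 - angle/pi,
    so the angle (and cosine) can be recovered from bit agreement."""
    agree = np.mean(sa == sb)
    return np.cos(np.pi * (1.0 - agree))

a = rng.standard_normal(d)
b = a + 0.5 * rng.standard_normal(d)  # correlated with a
true_cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
est = cosine_estimate(sketch(a), sketch(b))  # close to true_cos
```

The appeal for KV-cache compression is that each key is stored as m bits instead of d floats, while attention scores (inner products) remain approximately recoverable.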
Yesterday a fellow forum member posted asking about Ollama's subscription, and the icons and interface design in the plan screenshot were quite interesting. So I collected the subscription pages of various AI vendors, each with its own style, for everyone to enjoy.
- Cute: Ollama. Classic white background + a little reading-alpaca icon for each tier.
- Humanistic: Claude. Warm-toned background + hand-drawn, humanistic icons.
- Austere: ChatGPT. Black background + text-only descriptions per tier, no icons.
- Bundle style: Google Gemini. White background + Google's custom typeface. The plans include so many perks that one screen can't fit them all, but the core benefits have been cut back hard.
- Black-and-gold: GLM (international). Black background + gold and silver lighting. The Pro tier gets a building-block icon with a silver glow and a small diamond; the Max tier gets the classic atom icon with a gold glow and a gold crown.
- Promo style: Zhipu (domestic). At a glance it's a classic cloud-vendor promo page, complete with Pinduoduo-style copy.
- Festive: MiniMax. Bright-red banner illustration on top + fairly restrained plan-card design.
- Ticket style: Alibaba Cloud Bailian Coding Plan. New purchases currently only offer the Pro tier, so the earlier multi-plan page has been taken down.
- Musical: Kimi (Moonshot AI). Hovering over each tier shows a different musical staff, which is fun.
Below is Gemini's explanation of each tier's English name: 1 post - 1 participant. Read full topic