Show HN: Seamless – Content-addressed computation caching for Python and bash

Show HN: Seamless – Content-addressed computation caching for Python and bash
Show HN: Seamless – Content-addressed computation caching for Python and bash

Hey HN, Sjoerd de Vries here. I have worked on Seamless for nearly 10 years now. It has been used in my lab, but I was always around for troubleshooting. This is the first time that I think it's ready to stand on its own. I would love to hear your thoughts about it.

It started as a hobby project, I had an itch about programming not being at-your-fingertips enough. Then I applied it to my work as a bioinformatics research engineer. The early versions focused on interactive workflows. After a year or two I realized that to make interactivity work properly, you need really good DAG tracking, so checksums were added everywhere. My lab built a collaborative web server with it that we published. More recently I've rebuilt it around the command line, persistent caching, and remote deployment.

It's still in alpha, but the core is usable.

Core idea: same code + same inputs = same result, identified by checksum. If you've already computed it, you don't compute it again.

Two entry points:

Python:

  from seamless.transformer import direct

  @direct
  def add(a, b):
      import time
      time.sleep(5)
      return a + b

  add(2, 3)   # runs, caches result
  add(2, 3)   # cache hit — instant
Bash:

  seamless-run 'seq 1 10 | tac && sleep 5'    # runs, caches result
  seamless-run 'seq 1 10 | tac && sleep 5'    # cache hit — instant
With persistent caching enabled, results are stored as checksum-to-checksum mappings in a small SQLite database that can be shared with collaborators, so that they get cache hits too.

Execution scales by changing config, not code: in-process, spawned workers, or a Dask-backed HPC cluster.

Remote execution also doubles as a reproducibility test. If your code produces the same result on a clean worker, it's reproducible. If not, Seamless helped you find the problem, whether it's a missing dependency, an undeclared input, or a platform sensitivity.

Built for scientific computing and data pipelines, but works for anything pipeline-shaped.


Comments URL: https://news.ycombinator.com/item?id=47809537

Points: 1

# Comments: 0

来源: hnrss.org查看原文