SciPipe

Fast, flexible Go library for building robust scientific pipelines

SciPipe lets you compose command‑line tools into efficient, parallel, and reproducible workflows written in type‑safe Go code, with fine‑grained control over file naming, streaming between tools, and built‑in audit reporting.

Overview

SciPipe is a Go library that enables developers to create scientific workflows by connecting command‑line programs and native Go processes through a flow‑based programming model. Workflows are defined in Go code, compiled into fast binaries, and support parallel execution, task‑level concurrency, and streaming to minimize disk usage. Built‑in audit logs record every command run, making pipelines reproducible and allowing easy restarts without overwriting existing results.
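To illustrate the flow‑based model described above, here is a minimal sketch that uses only the Go standard library, not the SciPipe API. Each "process" is a function with a channel as its in‑port and another as its out‑port, and each one runs up to a fixed number of tasks concurrently. The names `source`, `mapProc`, and `runPipeline` are hypothetical and chosen for this example only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// source emits each item on its out-port (a channel) and closes it when done.
func source(items []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, it := range items {
			out <- it
		}
	}()
	return out
}

// mapProc reads items from its in-port, applies fn in up to maxTasks
// concurrent tasks, and sends the results on its out-port.
func mapProc(in <-chan string, maxTasks int, fn func(string) string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < maxTasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range in {
				out <- fn(s)
			}
		}()
	}
	// Close the out-port once every task has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// runPipeline wires source -> upper-case -> tag and collects the results.
// Results arrive in arbitrary order, so they are sorted before returning.
func runPipeline(items []string) []string {
	upper := mapProc(source(items), 4, strings.ToUpper)
	tagged := mapProc(upper, 4, func(s string) string { return s + ".done" })
	var results []string
	for r := range tagged {
		results = append(results, r)
	}
	sort.Strings(results)
	return results
}

func main() {
	for _, r := range runPipeline([]string{"a.txt", "b.txt", "c.txt"}) {
		fmt.Println(r)
	}
}
```

In a real SciPipe workflow the processes would typically wrap shell commands and pass file references rather than plain strings; the sketch only shows the wiring idea.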

Deployment

Install SciPipe via the Go toolchain (`go install github.com/scipipe/scipipe/...@latest`) and initialize a Go module for your workflow. Once written, the workflow can be run directly with `go run` or compiled into a self‑contained executable for distribution to any platform that Go supports. The library works equally well for bioinformatics, cheminformatics, and any domain that relies on chained command‑line tools.

Highlights

Intuitive flow‑based programming model using Go channels
Seamlessly wrap any command‑line tool alongside native Go processes
Compiled to fast binaries with built‑in parallel and task‑level concurrency
Streaming support eliminates intermediate files and reduces disk usage
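The streaming idea in the last highlight can be sketched with the standard library alone. This is an assumption‑laden illustration, not SciPipe's own mechanism: an in‑memory `io.Pipe` stands in for the connection between two steps, so the consumer processes records as they are produced and no intermediate file is written. The function names `produce`, `consume`, and `streamRecords` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// produce writes n records to w, one per line, then closes the writer
// so the reader sees end-of-stream.
func produce(w *io.PipeWriter, n int) {
	for i := 1; i <= n; i++ {
		if _, err := fmt.Fprintf(w, "record-%d\n", i); err != nil {
			w.CloseWithError(err)
			return
		}
	}
	w.Close()
}

// consume reads lines from r as they arrive and returns them upper-cased.
func consume(r io.Reader) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		out = append(out, strings.ToUpper(sc.Text()))
	}
	return out, sc.Err()
}

// streamRecords connects producer and consumer through an in-memory pipe.
func streamRecords(n int) ([]string, error) {
	pr, pw := io.Pipe()
	go produce(pw, n)
	return consume(pr)
}

func main() {
	lines, err := streamRecords(3)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Because writes to the pipe block until the consumer reads them, only a small amount of data is held in memory at any time.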

Pros

  • High performance compiled binaries
  • Strong reproducibility via automatic audit reports
  • Flexible integration of existing command‑line tools
  • Portable executables for easy distribution

Considerations

  • Requires familiarity with Go programming
  • No graphical workflow designer
  • Common Workflow Language (CWL) not supported
  • Some advanced workflow features are still under development

Managed products teams compare it with

When teams consider SciPipe, these hosted platforms usually appear on the same shortlist.

Astronomer

Managed Apache Airflow service for orchestrating and monitoring data pipelines in the cloud

Dagster

Data orchestration framework for building reliable pipelines

ServiceNow

Enterprise workflow and IT service management

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Bioinformatics and cheminformatics pipeline development
  • Researchers needing reproducible command‑line workflows
  • Teams comfortable writing Go code
  • Large‑scale data processing that benefits from parallelism

Not ideal when

  • Users prefer GUI‑based workflow tools
  • Projects require CWL compatibility
  • Simple scripts would not justify the overhead of Go
  • No Go toolchain is available to build the workflow

How teams use it

Genome variant calling pipeline

Orchestrates alignment, sorting, and annotation tools with parallel execution, producing reproducible results and audit logs.

Chemical property prediction

Streams molecular data through custom Go analysis and external prediction tools, reducing intermediate storage.

Large‑scale image processing

Chains image conversion utilities while streaming data, achieving high throughput without excessive disk I/O.

Reproducible data‑science experiment

Generates detailed audit reports and enables easy restart of interrupted runs, ensuring full reproducibility.

Tech snapshot

Go: 100%
Shell: 1%

Tags

dataflow, workflow, workflow-engine, bioinformatics, pipeline, go, cheminformatics, scipipe, scientific-workflows, golang, bioinformatics-pipeline, fbp

Frequently asked questions

How do I install SciPipe?

Install Go, then run `go install github.com/scipipe/scipipe/...@latest` and create a Go module for your workflow.

Can I run workflows without writing Go code?

Workflows are defined in Go; however, the `scipipe new` command can scaffold a basic Go file to get started quickly.

Does SciPipe support the Common Workflow Language (CWL)?

No, CWL support is not currently implemented.

How does SciPipe ensure reproducibility?

It creates an audit JSON file for each output and can restart interrupted runs without overwriting existing results.

What platforms can run SciPipe workflows?

Any platform supported by Go. Each compiled binary targets one operating system and architecture, but Go can cross‑compile the same workflow for other targets.

Project at a glance

Dormant
Stars: 1,116
Watchers: 1,116
Forks: 74
License: MIT
Repo age: 10 years old
Last commit: last year
Primary language: Go