Scientific Software: How to Build it Better

Lego is great. Why is it so great?

  • Lego is great because the pieces are simple.
  • Lego is great because you know how it works.
  • Lego is great because it fits together so well.

Stacking rocks is hard. Why is it so hard?

  • Stacking rocks is hard because rocks come in all shapes and sizes.
  • Stacking rocks is hard because you're not immediately sure how to do it best.
  • Stacking rocks is hard because it requires balance and precision.

Left: a huge Lego tower. Right: isolated stacks of rocks.

You can build almost anything with Lego. It's just bricks on bricks on bricks. The shapes of rocks dictate what you can build with them. It's fun to build Lego with friends. Stacking rocks with friends could be fun, but it's more of a solo activity.

Where was I going with this... oh yeah, scientific software!

The Problem with Scientific Software

Scientific software often sucks. It sucks because it's a stack of rocks built by a single person. Their last boss cared only how quickly they could build the tower. They only needed a quick snapshot of it standing tall. Their current boss? Well, their current boss wants a photo of a new stack of rocks on a different beach. Even if the builder cares about their old stack of rocks... keeping it standing doesn't pay the bills any more. Even if the builder likes that people look at their stack of rocks... adding new rocks doesn't pay the bills any more.

How to Build Scientific Software Better

Scientific software could be so much better. How we build matters. We should be aiming to build our scientific software like we build with Lego. We use the pieces we have. Those pieces are tried, tested and solid. If we're doing something totally out there, maybe we could 3D print a custom brick or two. You probably shouldn't be doing that much, though.

A custom 3D printed mermaid tail for your Lego minifigures

The great thing about Lego is that the interfaces between pieces are well defined and easy to understand. If the architect of the Lego city we live in leaves town, someone could probably build a new hospital on the road at the edge of town. It's hard to build upwards on a stack of rocks.

Concrete Advice

Let's move onto some actual advice...

Build Simple

Build simple pieces. Build as few new pieces as possible. Make it obvious what things do. Make it obvious how the pieces fit together.
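
Here's a rough sketch of what simple pieces could look like in Python. Everything in it is made up for illustration: the function names and the toy "image" (a list of rows of numbers) don't come from any real package. Each piece does one thing, takes plain data in and hands plain data back, so it's easy to see how they click together.

```python
"""Hypothetical example: small pieces with obvious interfaces."""


def normalise(image: list[list[float]]) -> list[list[float]]:
    """Rescale pixel values to the range 0 to 1."""
    flat = [value for row in image for value in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0  # avoid dividing by zero for flat images
    return [[(value - low) / span for value in row] for row in image]


def threshold(image: list[list[float]], cutoff: float = 0.5) -> list[list[bool]]:
    """Mark pixels strictly brighter than the cutoff."""
    return [[value > cutoff for value in row] for row in image]


def count_foreground(mask: list[list[bool]]) -> int:
    """Count the pixels marked as foreground."""
    return sum(value for row in mask for value in row)


if __name__ == "__main__":
    image = [[0.0, 2.0], [4.0, 8.0]]
    # The pieces chain together because each output is the next input.
    print(count_foreground(threshold(normalise(image))))
```

Any one of these could be swapped out or reused somewhere else without touching the others.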

Build Together

Lone geniuses can build great things. They can't and won't maintain them forever. You should try to build with other people. You can build bigger than you ever dreamed of. You will learn. You will have fun.

Build to Standards

Nobody wants to spend time deciphering your mess. Don't force them to unless it's absolutely necessary. If there is an accepted way of doing things, you should probably do that.

Build From the Bottom Up

Package small, useful modules whilst building your projects. You won't do it later. Your foundations must be solid. Other people will pick them up if they are useful. Other people will maintain them if they can understand them.

Closing

I feel strongly about this topic. It comes from struggling against unnecessary complexity during my PhD. I spent literal years trying to cobble together an image processing workflow. What I was trying to do looked simple: pass data between a few analysis tools. In practice, it was painful. I spoke with many people who were in the same boat, unable to make the pieces fit together. Nobody paid attention to the interfaces.

I learned Python and built small modules to make things work. Since then, those packages have been used in a number of cool projects.

napari-animation was used by Merlin Lange to make one of Nature's "sharpest science shots" for September 2022.

starfile was used by Brady Johnston to enable visualisations of cellular architecture in Blender via Molecular Nodes.

Utz Ermel and Huy Bui built some of their tools using mine. They didn't have to reinvent the wheel. They got their science done.

I've since worked on a few projects that aren't built like Lego. It's a lot harder to make simple things work.

There is a better way! Please think about the way you are building. Thank you for reading.

Managing Python installations in 2023

The following is a short, opinionated guide on how to manage Python installations for scientific computing in 2023.

Motivation

Why write this post? I work in and around the scientific Python ecosystem, and one of the projects I'm involved in is called napari. Some people without Python experience want to use napari, and they often end up frustrated.

This makes me sad 🙁

Good news, there is a happy path! Once you're on it, I promise working with Python can be a great experience. I'm writing this to share the way I work and help you get on the right track.

Introduction

We are going to use virtual environments to manage our Python installation(s). Specifically, conda environments managed with micromamba.

Why do I need environments?

Python is ubiquitous. It's probably used in lots of different places on your computer already. Code written for specific versions of either the Python language or Python packages won't necessarily work with different versions. A virtual environment is a little box, isolated from the rest of your system, that you can put Python and various packages into. What you do in the box won't affect what's going on outside the box.
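
If you're ever unsure which box you're in, you can ask Python itself. The snippet below is a minimal sketch using only the standard library. The helper name `describe_python` is my own, and it assumes the `CONDA_DEFAULT_ENV` environment variable is set on activation, which is what conda and micromamba normally do.

```python
import os
import sys


def describe_python() -> dict[str, str]:
    """Collect where the running interpreter lives and which environment is active."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # Assumption: set by `conda activate` / `micromamba activate`, absent otherwise.
        "active_env": os.environ.get("CONDA_DEFAULT_ENV", "(none)"),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(f"{key}: {value}")
```

Run it inside and outside an environment and compare the `executable` and `prefix` lines. If they point somewhere different, the box is doing its job.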

Why micromamba?

One of the easiest ways to mess up a conda installation is to install a bunch of stuff into the base environment. micromamba doesn't give you a base environment so you can't mess it up! It's also wicked fast.

Installation

Linux and macOS

  1. Follow the instructions from the mamba documentation
  2. Set conda as an alias of micromamba

    Add the following to your ~/.bashrc (Linux) or ~/.zshrc (macOS)

    alias conda="micromamba"

  3. Reload your shell so that changes take effect
  4. Set conda-forge as the default channel

    conda config --add channels conda-forge

Windows

  1. Follow the instructions from the mamba documentation
  2. Set conda as an alias of micromamba

    Create the alias and add it to your PowerShell profile

    New-Alias -Name conda -Value micromamba
    Export-Alias -Path "$HOME\Documents\PowerShell\Profile.ps1" -Append -Name conda -As Script

  3. Reload your shell so that changes take effect
  4. Set conda-forge as the default channel

    conda config --add channels conda-forge

Usage

Creating and activating environments

To create an environment with a specific version of Python

conda create --name my-new-env python=3.10

To work in the environment we first need to activate it. We do this with

conda activate my-new-env

Once inside the environment, you have access to the packages you have installed there.

(my-new-env) ➜ python --version
Python 3.10.11

Installing packages

You can install most packages into your environment with conda or pip. Install what you can with conda. If a package is available on PyPI but not conda-forge then use pip.

(my-new-env) ➜ conda install numpy
(my-new-env) ➜ pip install numpy
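
To check what actually ended up in your environment, you can ask Python rather than guessing. This is a small sketch using `importlib.metadata` from the standard library. The helper name `installed_version` is mine, and the package names in the example are just the ones mentioned in this post.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("numpy", "napari"):
        print(name, installed_version(name) or "not installed")
```

This reports on whichever environment the running Python belongs to, so activate the environment you care about first.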

Deactivating environments

We can exit an environment with

conda deactivate

Removing environments

We can remove an environment with

conda env remove --name my-new-env

Running commands from outside an environment

To run a command in a specific environment without activating it first, use

conda run --name my-new-env command

This is useful for workflows which rely on software with incompatible dependencies.
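
If you're gluing such a workflow together from a Python script, one way to do it is with `subprocess`. The sketch below is an illustration, not the one true way: the helper names are hypothetical, and it calls `micromamba` directly because a shell alias like `conda` isn't visible to `subprocess`.

```python
import subprocess


def build_run_command(
    env_name: str, command: list[str], executable: str = "micromamba"
) -> list[str]:
    """Build the argument list for running a command inside a named environment."""
    return [executable, "run", "--name", env_name, *command]


def run_in_env(env_name: str, command: list[str]) -> subprocess.CompletedProcess:
    """Run a command in the environment and capture its output as text.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    return subprocess.run(
        build_run_command(env_name, command),
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, `run_in_env("my-new-env", ["python", "--version"]).stdout` would give you the Python version inside `my-new-env`, assuming micromamba is on your PATH and that environment exists.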

Tips

  • Get comfortable with creating/destroying and activating/deactivating environments.
  • One environment per project is a useful ideal. In practice, a general purpose default environment is handy for quick scripting/analysis.

Closing

That's it! Working in conda environments empowers you to install things without worrying about messing up your whole system.

Breaking the Ice

I've never really blogged before and whilst setting this up I found myself thinking the whole thing feels a little too self-important. Saying that, I do occasionally find myself wanting to share things which are a little longer than a tweet and I don't really have a place to put them.

I'm hoping this little blog can be that place 🙂.