Virtual Environments

When working on computational biology projects, or any other kind of project, dependencies can quickly get messy. One project might need Biopython 1.85, while another requires version 1.75 from 2019 because that is the version the study you are trying to reproduce was built on. Keeping dependencies separated by project, instead of installing everything globally, helps you avoid broken setups. This is where Python’s built-in venv module comes in.

Why use a virtual environment?

A virtual environment creates an isolated Python workspace for each project. This means you can:

  • Keep project dependencies separate.

  • Avoid version conflicts between packages.

  • Most importantly for publications, make your projects reproducible and easier to share.

Creating a virtual environment

  • Navigate to your project folder:

cd ~/projects/protein_analysis
  • Create the environment:

python3 -m venv venv

Here, the second venv is just the name of the folder that will store your environment; the first one is the module being run. If you have more than one version of Python installed, you can choose which one to use by adding the version number after python, for example:

python3.12 -m venv venv
  • Activate the environment:

On macOS/Linux:
source venv/bin/activate

When active, you’ll see (venv) before your prompt.

  • Install dependencies:

pip install biopython pandas matplotlib
  • Freeze dependencies for reproducibility:

pip freeze > requirements.txt
  • Deactivate when done:

deactivate
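
If you want to confirm from inside Python that the environment from the steps above is really the one in use (run this while it is still active, before deactivating), a small check like the following may help. It is only a sketch: the function names are my own, and it relies on the fact that inside a venv sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


def describe_interpreter() -> str:
    """Summarise which Python is running and whether it is isolated."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    location = "virtual environment" if in_virtual_environment() else "global installation"
    return f"Python {version} from {sys.prefix} ({location})"


print(describe_interpreter())
```

If the printed path does not end in your project’s venv folder, the environment is probably not active in that terminal.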

Best practices

  • One environment per project. Don’t share environments between unrelated projects.

  • Use requirements.txt or pyproject.toml. This makes it easy for collaborators to recreate your setup, for example with pip install -r requirements.txt inside a fresh environment.

  • Keep environments lightweight. Only install what you need for that project.
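
To check that an environment still matches the versions pinned in requirements.txt, something along these lines could work. Treat it as a minimal sketch under a few assumptions: parse_pins and find_mismatches are hypothetical helper names, only plain name==version lines (the usual pip freeze output) are handled, and versions are compared as exact strings.

```python
from importlib import metadata


def parse_pins(lines):
    """Collect {package: version} from 'name==version' lines, skipping others."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # unpinned or special lines are ignored in this sketch
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {package: (expected, found)} for pins the environment does not satisfy."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed at all
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Illustrative input only; a real run would read the lines from requirements.txt.
example_lines = ["biopython==1.85", "# a comment", "pandas"]
print(parse_pins(example_lines))  # prints {'biopython': '1.85'}
```

With a real file, you would pass the lines of requirements.txt to parse_pins and the result to find_mismatches; an empty dictionary means every pinned package is installed at the expected version.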

Before we wrap up, I would like to share why I decided to switch from Conda to Python’s built-in venv. Firstly, Conda was slow to install things. I know that Mamba, a faster reimplementation of the Conda package manager, can speed things up a bit, but in my experience it was still not as quick as venv with pip. Secondly, I am using a remote university server that only gives me 100 GB of space, and this was quickly filled by Conda packages. When you use venv, all the files the environment needs are stored alongside the project files, which I expect you are also keeping in your working folder on the server, where there are likely terabytes of space. Used this way, venv helps each of your projects stay reproducible and lightweight.
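
Because the environment lives inside the project folder, it is easy to measure how much space it takes. The snippet below is a rough sketch, not a precise disk-usage tool: the helper names are hypothetical, the folder name venv is assumed from the steps above, and symlinks are skipped so that files belonging to the base Python installation are not counted.

```python
from pathlib import Path


def folder_size_bytes(folder):
    """Sum the sizes of all regular files below folder, skipping symlinks."""
    total = 0
    for path in Path(folder).rglob("*"):
        if path.is_file() and not path.is_symlink():
            total += path.stat().st_size
    return total


def human_readable(size_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(size_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


env_folder = Path("venv")  # assumed folder name, as in the steps above
if env_folder.is_dir():
    print(f"{env_folder}: {human_readable(folder_size_bytes(env_folder))}")
```

Run it from the project folder; if no venv folder exists there, it prints nothing.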
