Virtual Environments
When working on computational biology or any other kind of project, dependencies can quickly get messy. One project might need Biopython 1.85, while another requires version 1.75 from 2019 because that is the version the project you are trying to reproduce was built on. Keeping your dependencies separated by project, instead of installing everything globally, avoids these broken setups. This is where Python’s built-in venv module comes in.
Why use a virtual environment?
A virtual environment creates an isolated Python workspace for each project. This means you can:
Keep project dependencies separate.
Avoid version conflicts between packages.
Most importantly for publications, make your projects reproducible and easier to share.
Creating a virtual environment
Navigate to your project folder:
cd ~/projects/protein_analysis

Create the environment:
python3 -m venv venv

Here, venv is just the name of the folder that will store your environment. If you have more than one version of Python installed, you can choose which one to use by calling that interpreter explicitly, for example:
python3.12 -m venv venv

Activate the environment:
source venv/bin/activate

When active, you’ll see (venv) before your prompt.
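The create-and-activate steps above can also be scripted with the venv module’s Python API, which is handy when setting up several project folders. This is a minimal sketch, not part of the workflow above: `create_env`, `env_python` and `in_virtualenv` are hypothetical helper names. It relies on `venv.create`, which does the same setup as `python3 -m venv`, and on the fact that inside a virtual environment `sys.prefix` points to the environment while `sys.base_prefix` still points to the base Python installation.

```python
import os
import subprocess
import sys
import venv
from pathlib import Path

# Hypothetical helpers for illustration; only venv.create and sys.* are standard library.


def create_env(env_dir, with_pip=True):
    """Create a virtual environment, roughly like `python3 -m venv <env_dir>`.

    symlinks mirrors the command-line default: link to the base interpreter
    on POSIX systems, copy it on Windows.
    """
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    return Path(env_dir)


def env_python(env_dir):
    """Path to the environment's interpreter (bin/ on POSIX, Scripts/ on Windows)."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Running inside a virtual environment:", in_virtualenv())
```

From the project folder, `create_env("venv")` followed by `source venv/bin/activate` should leave you in the same state as the commands above, and `in_virtualenv()` run from the activated shell should then return True.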
Install dependencies:
pip install biopython pandas matplotlib

Freeze dependencies for reproducibility:
pip freeze > requirements.txt

Deactivate when done:
deactivate

Best practices:
One environment per project. Don’t share environments between unrelated projects.
Use requirements.txt or pyproject.toml. This makes it easy for collaborators to recreate your setup, for example with pip install -r requirements.txt inside a fresh environment.
Keep environments lightweight. Only install what you need for that project.
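A requirements.txt written by pip freeze also lets a collaborator check that their recreated environment really matches yours. The sketch below assumes the file was produced by pip freeze, so each relevant line has the form name==version; lines in other forms (editable installs, URL requirements) are simply skipped. `parse_pins` and `check_pins` are hypothetical helper names; the installed versions come from the standard library’s `importlib.metadata.version`.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

# Hypothetical helpers for illustration, assuming `pip freeze`-style pins.


def parse_pins(text):
    """Map package name -> pinned version for every `name==version` line."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" not in line:
            continue  # skip blanks, -e installs, URL requirements, etc.
        name, _, pinned = line.partition("==")
        # Drop any environment marker such as `; python_version >= "3.9"`.
        pins[name.strip()] = pinned.split(";", 1)[0].strip()
    return pins


def check_pins(pins, lookup=version):
    """Return {name: (pinned, installed)} for every package that does not match.

    `installed` is None when the package is missing from the environment.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.is_file():
        problems = check_pins(parse_pins(req.read_text()))
        for name, (pinned, installed) in sorted(problems.items()):
            print(f"{name}: pinned {pinned}, installed {installed}")
        print("All pins match." if not problems else f"{len(problems)} mismatch(es).")
    else:
        print("No requirements.txt in the current folder.")
```

Run it with the environment activated from the project folder; "All pins match." means every pinned package is installed at exactly the frozen version.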
Before we wrap up, I would like to share why I decided to switch from Conda to Python’s built-in venv. Firstly, Conda was slow to install packages (I know mamba, a faster reimplementation of the conda package manager, speeds things up, but in my experience it is still not as fast as venv with pip). Secondly, I work on a remote university server that gives me only 100 GB of space in my home directory, and Conda packages, which are stored there by default, fill it quickly. With venv, the files the environment needs are stored inside the project folder, which I expect you also keep in your working area on the server, where there are likely terabytes of space. (pip still keeps a download cache in your home directory, ~/.cache/pip on Linux by default; pip install --no-cache-dir skips it if your quota is tight.) Together with a requirements.txt file, this keeps each of your projects reproducible and lightweight.
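If you want to check the storage argument on your own server, a small script can total how much space an environment folder actually uses. This is a sketch with hypothetical helper names (`folder_size_bytes`, `human_size`). It counts symbolic links by their own size rather than following them, so an environment that links to the base interpreter (the venv default on Linux and macOS) is not charged for a copy of Python.

```python
import os
import sys

# Hypothetical helpers for illustration, not part of venv or pip.


def folder_size_bytes(path):
    """Sum the sizes of all files under `path` without following symlinks."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            # lstat: a symlinked file counts as the link itself, not its target.
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def human_size(n_bytes):
    """Format a byte count with binary units (1 KB = 1024 bytes)."""
    size = float(n_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


if __name__ == "__main__":
    # Usage: python env_size.py path/to/venv  (defaults to ./venv)
    target = sys.argv[1] if len(sys.argv) > 1 else "venv"
    print(f"{target}: {human_size(folder_size_bytes(target))}")
```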