CLI written in Python to search and save data from MIBiG.

pyMIBiG


A small tool to download, match and save sequences from MIBiG.

pyMIBiG can search by "organism name", "compound / product", "biosynthetic class" and "entry quality", combining every argument you add as an intersection. This means that the more arguments you add, the more restrictive your search becomes. It uses the MIBiG download files, which contain less information than the results returned by the MIBiG web search. So, for very specific queries that yield few results, you will be better off using the web interface.
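The intersection behaviour can be pictured with a minimal sketch. The record fields, the toy entries and the `matches` and `search` helpers below are hypothetical and are not taken from the pyMIBiG source. The sketch also assumes case-insensitive substring matching, which may differ from what the tool actually does.

```python
# Illustrative only: field names, toy data and the matching rule are assumptions,
# not pyMIBiG's actual implementation.
entries = [
    {"id": "entry-1", "organism": "Streptomyces coelicolor",
     "product": "actinorhodin", "biosynt": "PKS"},
    {"id": "entry-2", "organism": "Streptomyces griseus",
     "product": "streptomycin", "biosynt": "saccharide"},
    {"id": "entry-3", "organism": "Bacillus subtilis",
     "product": "surfactin", "biosynt": "NRPS"},
]


def matches(entry, **criteria):
    """Return True when the entry satisfies every criterion that was given."""
    return all(
        value.lower() in entry[field].lower()
        for field, value in criteria.items()
        if value is not None  # arguments that were not passed do not filter
    )


def search(entries, **criteria):
    """Keep only the entries that match all criteria (an intersection)."""
    return [entry for entry in entries if matches(entry, **criteria)]


# One criterion keeps both Streptomyces entries; adding a second narrows the result.
broad = search(entries, organism="streptomyces")
narrow = search(entries, organism="streptomyces", biosynt="PKS")
```

Each extra keyword argument can only shrink the result list, which is why highly specific queries may come back empty.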

Usage

Download the pyMIBiG package and execute pymibig -<target>, where <target> is the term you want to search for in the MIBiG database.

You can also install it using pip. In a virtual environment execute:

pip install pymibig

By default pyMIBiG will fetch all entry data and information of a given target.

You may change that using optional arguments passed along with the <target>:

usage: pyMIBiG [-h] [-o ORGANISM] [-p PRODUCT] [-b BIOSYNT] [-c {complete,incomplete,unknown,all}] [-q {low,medium,high,questionable,all}]

A small tool to download, match and save targeted sequences from MIBiG.

options:
  -h, --help            show this help message and exit
  -o ORGANISM, --organism ORGANISM
                        Organism name to query in database.
  -p PRODUCT, --product PRODUCT
                        Compound to query in database.
  -b BIOSYNT, --biosynt BIOSYNT
                        Biosynthetic class to query in database.
  -c {complete,incomplete,unknown,all}, --completeness {complete,incomplete,unknown,all}
                        Loci completeness.
  -q {low,medium,high,questionable,all}, --quality {low,medium,high,questionable,all}
                        Entry quality level.

You have to use at least one of the following arguments: organism, product or biosynt. The others are optional.
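For readers who want to see how such an interface could be declared, here is a minimal `argparse` sketch that mirrors the help text above and enforces the "at least one of organism, product or biosynt" rule. The `build_parser` and `parse_args` helpers are hypothetical names, and the real pyMIBiG code may be organised differently. No defaults are set for `--completeness` and `--quality`, because the help text does not state them.

```python
import argparse


def build_parser():
    """Declare the options listed in the help text (a sketch, not the real source)."""
    parser = argparse.ArgumentParser(
        prog="pyMIBiG",
        description="A small tool to download, match and save targeted "
                    "sequences from MIBiG.",
    )
    parser.add_argument("-o", "--organism",
                        help="Organism name to query in database.")
    parser.add_argument("-p", "--product",
                        help="Compound to query in database.")
    parser.add_argument("-b", "--biosynt",
                        help="Biosynthetic class to query in database.")
    parser.add_argument("-c", "--completeness",
                        choices=["complete", "incomplete", "unknown", "all"],
                        help="Loci completeness.")
    parser.add_argument("-q", "--quality",
                        choices=["low", "medium", "high", "questionable", "all"],
                        help="Entry quality level.")
    return parser


def parse_args(argv=None):
    """Parse argv and require at least one of organism, product or biosynt."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not (args.organism or args.product or args.biosynt):
        # parser.error prints the usage line and exits with status 2
        parser.error("use at least one of: --organism, --product, --biosynt")
    return args
```

With this sketch, `parse_args(["-o", "Streptomyces", "-q", "high"])` succeeds, while `parse_args(["-q", "high"])` exits with a usage error.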

On first execution, pyMIBiG downloads the database files from MIBiG and saves them locally, so an internet connection is needed. After that, it can be used offline.
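The download-once behaviour can be sketched as a small caching helper. The `ensure_local_copy` function and its `download` callback are hypothetical. How pyMIBiG actually stores its files, and under which paths, is not described here.

```python
from pathlib import Path


def ensure_local_copy(path, download):
    """Fetch a file only when it is missing locally.

    `download` is any callable that writes the file to the given path
    (an assumption of this sketch). Returns True when a download happened
    and False when the local copy was reused.
    """
    path = Path(path)
    if path.exists():
        return False  # already cached: no network access needed
    path.parent.mkdir(parents=True, exist_ok=True)
    download(path)
    return True
```

In practice, `download` could wrap a call such as `urllib.request.urlretrieve(url, path)`. Only the first call needs a connection, and later runs find the file and work offline.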

The latest release of pyMIBiG downloads the following from MIBiG Version 4.0 (November 15, 2024):

  • Metadata in compressed format, including several JSON files;
  • Nucleotide sequences of the biosynthetic gene clusters in compressed format, including several GBK files;
  • Amino acid sequence translations of all genes from MIBiG entries, in a single compressed FASTA file.

Version 1.2.7 uses MIBiG Version 3.1 (October 7, 2022).

Output

pyMIBiG will create three files:

  • a FASTA file containing nucleotide sequences
  • a FASTA file containing amino acid sequences
  • a tab-separated value table with information on the selected sequences

The filenames will reflect the parameters used when searching the database.

Note: Retired entries will be listed in the table, but there will be no sequences for them.
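As a rough illustration of how the table and the FASTA files relate, the sketch below writes every record to a tab-separated table but emits FASTA records only for entries that have a sequence. The `render_outputs` helper and the record keys (`accession`, `organism`, `status`, `sequence`) are hypothetical, and the real column layout of pyMIBiG's table may differ.

```python
import csv
import io


def render_outputs(records):
    """Return (tsv_text, fasta_text) for a list of record dicts.

    Every record gets a table row. Records without a sequence
    (for example retired entries) are left out of the FASTA text.
    """
    table = io.StringIO()
    writer = csv.writer(table, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession", "organism", "status"])

    fasta_lines = []
    for rec in records:
        writer.writerow([rec["accession"], rec["organism"], rec["status"]])
        if rec.get("sequence"):
            fasta_lines.append(f">{rec['accession']} {rec['organism']}")
            fasta_lines.append(rec["sequence"])

    fasta_text = "\n".join(fasta_lines) + ("\n" if fasta_lines else "")
    return table.getvalue(), fasta_text
```

A retired entry therefore still shows up as a row in the table, while the FASTA output simply has no record for it.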

Reference

MIBiG 4.0: Advancing Biosynthetic Gene Cluster Curation through Global Collaboration.

License

pyMIBiG is distributed under the terms of the LGPL 3.0 license.

Disclaimer

pyMIBiG is free software and comes with ABSOLUTELY NO WARRANTY. Use at your own risk. The developer of pyMIBiG has no relationship of any kind with MIBiG or the Genomic Standards Consortium.