EggNOG-mapper (a.k.a. emapper.py or just emapper) is a tool for fast functional annotation of novel sequences. It uses precomputed orthologous groups (OGs) and phylogenies from the eggNOG database (http://eggnogdb.embl.de/) to transfer functional information from fine-grained orthologs only.
Common uses of eggNOG-mapper include the annotation of novel genomes, transcriptomes or even metagenomic gene catalogs.
The use of orthology predictions for functional annotation permits a higher precision than traditional homology searches (e.g. BLAST searches), as it avoids transferring annotations from close paralogs (duplicate genes, which are more likely to have diverged in function).
- Python 3.7 (or greater)
- Biopython 1.76 (Python package); Biopython 1.78 should also work as of emapper version 2.1.7
- psutil 5.7.0 (python package)
- xlsxwriter 1.4.3 (python package), only if using the --excel option
- wget (Linux command, required to download the eggNOG-mapper databases with download_eggnog_data.py and to create new Diamond/MMseqs2 databases with create_dbs.py)
- sqlite (>=3.8.2)
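The requirements above can be checked before installing the databases. The following is a minimal sketch, not part of eggNOG-mapper: the function name check_requirements is hypothetical, it only tests whether each package can be found (not its exact version), and the SQLite version it reports is the one linked into the Python interpreter, which may differ from the sqlite command-line tool.

```python
import importlib.util
import shutil
import sqlite3
import sys

# Import names of the Python packages from the list above.
# xlsxwriter is only needed for --excel; wget only for downloading or creating databases.
PACKAGES = {"biopython": "Bio", "psutil": "psutil", "xlsxwriter": "xlsxwriter"}


def check_requirements():
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    report = {
        "python>=3.7": sys.version_info >= (3, 7),
        # Version of the SQLite library used by Python's sqlite3 module.
        "sqlite>=3.8.2": sqlite3.sqlite_version_info >= (3, 8, 2),
        "wget": shutil.which("wget") is not None,
    }
    for name, module in PACKAGES.items():
        # find_spec returns None when the top-level package is not installed.
        report[name] = importlib.util.find_spec(module) is not None
    return report


for requirement, ok in check_requirements().items():
    print(f"{requirement:15s} {'ok' if ok else 'MISSING'}")
```

A missing xlsxwriter or wget entry is only a problem if the corresponding feature is going to be used.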
- ~40 GB for the eggNOG annotation databases (eggnog.db and eggnog.taxa.db)
- ~9 GB for Diamond database of eggNOG sequences (required if using -m diamond, which is the default search mode).
- ~11 GB for MMseqs2 database of eggNOG sequences (~86 GB if MMseqs2 index is created) (required if using -m mmseqs).
- ~3 GB for PFAM database (required if using --pfam_realign options for realignment of queries to PFAM domains).
The size of eggNOG diamond/mmseqs databases created with create_dbs.py is highly variable, depending on the size of the chosen taxonomic groups.
Databases for specific taxonomic ranges can be downloaded (for HMMER) or created (for Diamond and MMseqs2). The size of these databases is highly variable. For the size of HMMER databases, check http://eggnog5.embl.de/#/app/downloads. For Diamond and MMseqs2 databases, the size depends on the number of proteins that belong to those taxonomic ranges. These proteins must be downloaded to create the databases and can be removed afterwards.
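As a rough aid, the approximate sizes listed above can be added up and compared with the free space on the target disk. This is a sketch under assumptions: the function names are hypothetical, the figures are the approximate ones quoted above, and the ~86 GB figure is read here as the size of the MMseqs2 database including its index, which is an interpretation. Databases created with create_dbs.py are not covered, since their size varies.

```python
import shutil

# Approximate sizes in GB, taken from the list above.
ANNOTATION_DB_GB = 40                        # eggnog.db and eggnog.taxa.db
SEARCH_DB_GB = {"diamond": 9, "mmseqs": 11}  # one entry per search mode (-m)
MMSEQS_INDEXED_GB = 86                       # assumed: MMseqs2 database with index
PFAM_GB = 3                                  # only for the --pfam_realign options


def required_gb(mode="diamond", pfam=False, mmseqs_index=False):
    """Estimate disk space (GB) for the annotation databases plus one search database."""
    if mode not in SEARCH_DB_GB:
        raise ValueError(f"no size estimate for mode {mode!r}")
    if mode == "mmseqs" and mmseqs_index:
        search = MMSEQS_INDEXED_GB
    else:
        search = SEARCH_DB_GB[mode]
    return ANNOTATION_DB_GB + search + (PFAM_GB if pfam else 0)


def has_enough_space(path, needed_gb):
    """Compare free space on the filesystem holding `path` with the estimate."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb


print(required_gb("diamond"), "GB needed for the default Diamond setup")
```

For example, has_enough_space(".", required_gb("mmseqs", pfam=True)) checks the current directory's filesystem against the estimate for MMseqs2 plus PFAM.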
- Using --dbmem loads the whole eggnog.db sqlite3 annotation database during the annotation step, and therefore requires ~44 GB of memory.
- Using the --num_servers option when running HMMER in server mode (a.k.a. hmmgpmd, which is used for -m hmmer --usemem, --pfam_realign denovo or hmm_server.py) loads the HMM database as many times as specified in the argument (e.g. --pfam_realign denovo --num_servers 2 loads the PFAM database into memory twice, with up to roughly 2 GB per instance).
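The two memory figures above combine additively, which the following sketch illustrates. It is only arithmetic on the numbers quoted above: the function name is hypothetical, the 2 GB per server is the upper bound given for PFAM, and the memory used by the search step itself is not included.

```python
DBMEM_GB = 44       # whole eggnog.db held in memory with --dbmem
PFAM_SERVER_GB = 2  # upper bound per PFAM server instance, as stated above


def estimate_memory_gb(dbmem=False, num_servers=0, per_server_gb=PFAM_SERVER_GB):
    """Rough extra memory (GB) needed for --dbmem and --num_servers."""
    total = DBMEM_GB if dbmem else 0
    # Each server instance loads its own copy of the HMM database.
    total += num_servers * per_server_gb
    return total


# --dbmem together with --pfam_realign denovo --num_servers 2
print(estimate_memory_gb(dbmem=True, num_servers=2))
```

For an HMM database other than PFAM, per_server_gb would need to be set to the size of that database.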
conda install -c bioconda eggnog-mapper
# Create the directory that will hold the databases
mkdir /home/liu/miniconda3/envs/pgcgap/lib/python3.7/site-packages/data
# Point the EGGNOG_DATA_DIR environment variable at that directory
export EGGNOG_DATA_DIR=/home/liu/miniconda3/envs/pgcgap/lib/python3.7/site-packages/data
# Download the databases
download_eggnog_data.py -P -M -y -f --data_dir /home/liu/miniconda3/envs/pgcgap/lib/python3.7/site-packages/data
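After the download finishes, it can be useful to confirm that the annotation databases are where emapper will look for them. A minimal sketch, assuming only the two file names mentioned earlier (eggnog.db and eggnog.taxa.db); the function name is hypothetical, and the fallback to a data directory relative to the current working directory is a simplification, since emapper's own default lives inside its installation.

```python
import os
from pathlib import Path

# Annotation database files named in the storage requirements.
EXPECTED_FILES = ("eggnog.db", "eggnog.taxa.db")


def missing_db_files(data_dir=None):
    """List expected database files not found in the data directory.

    Uses the explicit argument first, then EGGNOG_DATA_DIR, then "data"
    (assumed here to be relative to the current directory).
    """
    data_dir = Path(data_dir or os.environ.get("EGGNOG_DATA_DIR", "data"))
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


missing = missing_db_files()
print("all annotation databases found" if not missing else f"missing: {missing}")
```

The search databases (Diamond, MMseqs2, PFAM) are not checked here because their file names are not given in this text.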
Similarly, to build a Diamond or MMseqs2 database restricted to specific taxa, use create_dbs.py. For example, to create a Diamond database for Bacteria only:
create_dbs.py -m diamond --dbname bacteria --taxa Bacteria
This will create a bacteria.dmnd Diamond database in the default data directory or in the one specified by the EGGNOG_DATA_DIR environment variable. This database can then be used with emapper.py --dmnd_db bacteria.dmnd. Note that the first time create_dbs.py is run, it takes time to download the eggNOG proteins and create the Diamond or MMseqs2 database. Subsequent calls to create_dbs.py that use the same data directory (the one pointed to by EGGNOG_DATA_DIR or --data_dir, or data/ by default) will not need to download the eggnog5 proteins again. If no more databases are going to be created, the proteins can be removed. For further information, run create_dbs.py --help.
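When annotation runs are launched from a script, the command line for a custom Diamond database can be assembled as a list and passed to subprocess.run. The sketch below only builds and prints the command. The helper name and the file names proteins.faa and bacteria_run are hypothetical, and the -i (input FASTA) and -o (output prefix) flags are used as I understand them from emapper.py --help, so they should be verified against the installed version.

```python
import shlex


def emapper_command(fasta, output_prefix, dmnd_db=None, data_dir=None):
    """Build an emapper.py command line for a Diamond search (not executed here)."""
    cmd = ["emapper.py", "-m", "diamond", "-i", fasta, "-o", output_prefix]
    if dmnd_db:
        # Custom database created with create_dbs.py, e.g. bacteria.dmnd
        cmd += ["--dmnd_db", dmnd_db]
    if data_dir:
        # Overrides the directory given by EGGNOG_DATA_DIR
        cmd += ["--data_dir", data_dir]
    return cmd


cmd = emapper_command("proteins.faa", "bacteria_run", dmnd_db="bacteria.dmnd")
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run the annotation, the list can be passed to subprocess.run(cmd, check=True) once the databases are in place.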