
Subcellular Localizations

A fast, easy, and cost-free way for you to learn more about your proteins =)

Table of Contents
  1. About FastProtein
  2. Installation
  3. Usage
  4. Contact Info
  5. Citation and Acknowledgments

About FastProtein

Developed by the Bioinformatics Laboratory - UFSC and IFSC under the responsibility of Professor Renato Simões Moreira, PhD.

FastProtein is software developed in Java that brings together a wide range of protein characteristics. It runs widely used prediction tools and organizes their results into files and images so that you can get the information quickly and easily.

We use Docker to provide a Linux image (based on Debian) with everything installed to run not only FastProtein but also all the bioinformatics software used in the pipeline. This way, you get a single Docker image for multiple tools rather than for just one.

If you have questions, suggestions, or difficulties regarding the pipeline, please do not hesitate to contact our team here on GitHub or through the Bioinformatics Lab (UFSC).



Output Results

FastProtein can evaluate multiple protein parameters in a single run. As output, you obtain a table with:

  • ID: Protein ID from your FASTA file

  • Length: The length of the protein sequence

  • kDa: Molecular mass in kilodaltons

  • Isoelectric Point: Isoelectric point of the full protein sequence

  • Hydropathy: Hydropathy index of the full protein sequence

  • Aromaticity: Aromaticity index of the full protein sequence

  • Membrane Evidence: In silico evidence that the protein is associated with the membrane

  • Subcellular Localization Prediction: Prediction of the protein's subcellular localization using WoLF PSORT

  • Prediction of Transmembrane Helices in Proteins: Prediction of transmembrane helices in proteins using TMHMM-2.0c and Phobius

  • Prediction of Signal Peptides: Prediction of signal peptides using SignalP-5 and Phobius

  • GPI-Anchored Proteins: Prediction of GPI-anchored proteins using PredGPI

  • E.R Retention Domain: Endoplasmic reticulum retention domains (peptide and position)

  • N-Glyc Domain: N-glycosylation domains (peptide and position)

  • Header: Protein header

  • Gene Ontology, Panther, and Pfam: Protein function and annotation using InterProScan5

  • Charts

  • Alignment by Diamond and Blast

  • and much more...
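
Some of the columns above can be derived directly from the sequence. As a rough illustration only (this is not FastProtein's own Java implementation), the sketch below computes Length and Aromaticity and scans for N-glycosylation sites. It assumes uppercase one-letter sequences, that aromaticity is the fraction of F, W and Y residues, and that N-glycosylation sites follow the common N-X-[S/T] sequon with X other than proline. Whether FastProtein uses exactly these definitions is an assumption, and the helper names `aromaticity` and `find_nglyc` are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not part of FastProtein.

# Print "<length><TAB><aromaticity>" for one sequence.
# Assumption: aromaticity = (number of F, W, Y residues) / length.
aromaticity() {
  printf '%s\n' "$1" | awk '{
    len = length($0)
    aromatic = gsub(/[FWY]/, "&")   # gsub returns the number of matches
    printf "%d\t%.3f\n", len, (len > 0 ? aromatic / len : 0)
  }'
}

# Print "<position><TAB><peptide>" for every N-X-[S/T] sequon (X is not P).
# Positions are 1-based; overlapping sites are reported as well.
find_nglyc() {
  printf '%s\n' "$1" | awk '{
    n = length($0)
    for (i = 1; i <= n - 2; i++) {
      tri = substr($0, i, 3)
      if (tri ~ /^N[^P][ST]$/) print i "\t" tri
    }
  }'
}

aromaticity MKNASFLNPTQNGTW   # prints: 15  0.133 (tab-separated)
find_nglyc  MKNASFLNPTQNGTW   # prints: 3 NAS and 12 NGT (NPT at position 8 is skipped)
```

Reporting both the peptide and its position mirrors the "peptide and position" layout of the domain columns listed above.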



The workflow


Technologies


Installation

Build a local Docker image from scratch (optional)

  1. Clone the repository

    git clone https://github.com/bioinformatics-ufsc/FastProtein
  2. Change directory and build the image

    cd FastProtein
    docker build -t bioinfoufsc/fastprotein:latest .

    Or, if you want to install InterProScan:

    cd FastProtein
    docker build --build-arg INTERPRO_INSTALL=Y -t bioinfoufsc/fastprotein-interproscan:latest .

    If you are using the version with InterProScan, ensure that your Docker has at least 10GB of available RAM to run the program.
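
One way to check the memory guideline above before building or pulling the InterProScan image is to ask Docker how much memory it can see. The sketch below is only a suggestion: it assumes that "10GB" means 10 * 1000^3 bytes and that the total reported by `docker info` is a fair proxy for the RAM available to containers. `has_enough_ram` is a hypothetical helper, not a FastProtein command.

```sh
#!/bin/sh
# Hypothetical helper: succeed if a byte count meets the 10 GB guideline.
# Assumption: "10GB" is read as 10 * 1000^3 bytes.
has_enough_ram() {
  min_bytes=10000000000
  [ "$1" -ge "$min_bytes" ]
}

# Query Docker only if it is installed; `docker info` reports MemTotal in bytes.
if command -v docker >/dev/null 2>&1; then
  mem_total=$(docker info --format '{{.MemTotal}}' 2>/dev/null || true)
  case "$mem_total" in
    ''|*[!0-9]*)
      echo "Could not read the memory total from Docker" ;;
    *)
      if has_enough_ram "$mem_total"; then
        echo "OK: Docker reports $mem_total bytes of memory"
      else
        echo "Warning: Docker reports only $mem_total bytes of memory"
      fi ;;
  esac
fi
```

On Docker Desktop the reported total is the memory assigned to the Docker virtual machine, which can be raised in the application settings.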

Get an image from Docker Hub (recommended)

  1. Pull an image to the host (clean, without InterProScan)
   # Light version, 900 MB compressed
   docker pull bioinfoufsc/fastprotein
  1. Pull an image to the host (with InterProScan)
   # Full version with InterProScan installed (no need to execute Step 4; just change the image name at the end of the command)
   docker pull bioinfoufsc/fastprotein-interpro

Basic running (recommended)

  1. This is the easiest way to run FastProtein
   docker run -it --name FastProtein -p 5000:5000 bioinfoufsc/fastprotein:latest

Now open http://127.0.0.1:5000 in a browser and enjoy!

   Login: admin
   Password: admin
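
The web server may need a moment to come up after `docker run`. As an optional convenience, the sketch below polls the address until it answers. It assumes `curl` is installed on the host; `wait_for_server` is a hypothetical helper, and the number of attempts is an arbitrary choice.

```sh
#!/bin/sh
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Usage: wait_for_server URL [ATTEMPTS]   (one attempt per second, default 30)
wait_for_server() {
  url=$1
  attempts=${2:-30}
  while [ "$attempts" -gt 0 ]; do
    # -s hides the progress meter; any HTTP response counts as "server is up"
    if curl -s -o /dev/null "$url"; then
      return 0
    fi
    attempts=$((attempts - 1))
    # Sleep only if another attempt is coming
    [ "$attempts" -gt 0 ] && sleep 1
  done
  return 1
}

# Example (commented out so that loading this file has no side effects):
# wait_for_server http://127.0.0.1:5000 60 && echo "FastProtein web server is up"
```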

Advanced run settings (CLI mode)

# Step 1 - Create a local directory that will be used to exchange files with Docker (example FastProtein/ inside user home)
#          ~/FastProtein is the work directory
#          ~/FastProtein/runs the directory that stores the FastProtein web server requests
#          
mkdir -p <your_home>/FastProtein/runs

# Step 2 - File sharing - If you are using macOS, you have to share your folder with Docker before going to Step 3

# Step 3 - Create a container named FastProtein that will have the volume associated with the locally created directory. 
#          Port 5000 is used to access the FastProtein web server.
#          PS 1: run this command only once; it creates and starts your container
docker run -it --name FastProtein -p 5000:5000 -v <your_directory_output>:/FastProtein/runs bioinfoufsc/fastprotein:latest
# Step 3.1 - If you have InterProScan on your host, you can direct it to the FastProtein Docker InterProScan directory as follows.
#          The supported version is interproscan-5.61-93.0 (http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.61-93.0/interproscan-5.61-93.0-64-bit.tar.gz)
docker run -it --name FastProtein -p 5000:5000 -v <your_directory_output>:/FastProtein/runs -v <your_interpro_home>:/bioinformatic/interproscan-5.61-93.0 bioinfoufsc/fastprotein:latest
#
# Step 4 - InterProScan installation
#          This step may take ~1 hour total
docker exec -it FastProtein interpro_install

# The 'docker run' command starts the container for the first time. 
# If everything runs without errors, open a browser and go to 127.0.0.1:5000 to access the FastProtein web.

# The following commands are for you to learn how to control a Docker container.
# If you want to STOP the container, the command is:
docker stop FastProtein
# If you want to START the container, the command is:
docker start FastProtein
# To check if your container is running, the command is:
docker ps | grep FastProtein
# If you want to open a shell inside the container, the command is:
docker exec -it FastProtein /bin/bash 
# To exchange files between the host and the container without using a volume, use the command:
# docker cp <local_file> <container_id>:<container_file>
# E.g: copy a fasta file test/human.fasta to /FastProtein (or other directory if you need)
docker cp test/human.fasta FastProtein:/FastProtein
# E.g: copy the folder runs from container to local runs folder
docker cp FastProtein:/FastProtein/runs ./runs
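
Because the `docker run` command in Step 3 is meant to be executed only once, later sessions use `docker start` instead. A minimal sketch of a wrapper that picks the right command is shown below. It assumes the container is named FastProtein as above and that `RUNS_DIR` points at the host directory from Step 1; `next_command` is a hypothetical helper. The command is printed rather than executed so that you can review it first.

```sh
#!/bin/sh
# Hypothetical wrapper for illustration; not part of FastProtein.
CONTAINER=FastProtein
RUNS_DIR="${RUNS_DIR:-$HOME/FastProtein/runs}"   # assumed host directory (Step 1)

# Read container names on stdin (one per line) and print the docker command
# to use: `docker start` if the container exists, `docker run` otherwise.
next_command() {
  if grep -qx "$CONTAINER"; then
    echo "docker start $CONTAINER"
  else
    echo "docker run -it --name $CONTAINER -p 5000:5000 -v $RUNS_DIR:/FastProtein/runs bioinfoufsc/fastprotein:latest"
  fi
}

# Only talk to Docker when it is installed on this machine.
if command -v docker >/dev/null 2>&1; then
  # `docker ps -a` also lists stopped containers
  docker ps -a --format '{{.Names}}' 2>/dev/null | next_command
fi
```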


Usage

FastProtein Web Server

Default IP is 127.0.0.1 and exposed port is 5000.

With the container running, open the following address in a browser to reach the local FastProtein service: 127.0.0.1:5000

Results are written to the directory /FastProtein/runs, which is linked to the local folder ~/FastProtein/runs. A list of zip files is shown on the web page.

Server Screen

Using via docker container (local)

## To learn about the execution parameters, type:
docker exec -it FastProtein fastprotein -h
## Example of execution:
##        input.fasta - proteins to analyze
##        db.fasta - database for blast search (protein FASTA)
##        result_test - folder inside the container that receives the results (/fastprotein/result_test is linked with the local folder ~/fastprotein/result_test)
docker exec -it FastProtein fastprotein -i /example/input.fasta -db /example/db.fasta -o result_test
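
To analyze several FASTA files, the command above can be wrapped in a loop. The sketch below only prints the commands so that they can be checked before running. It assumes the FASTA paths are valid inside the container and names each output folder after its input file; `output_name` and `batch_commands` are hypothetical helpers, and only the -i and -o flags shown above are used.

```sh
#!/bin/sh
# Hypothetical helper: derive an output folder name from a FASTA path,
# e.g. /example/input.fasta -> input_results
output_name() {
  base=$(basename "$1")
  printf '%s_results\n' "${base%.*}"
}

# Print one fastprotein command per FASTA path given as an argument.
# The paths must exist inside the container, not only on the host.
batch_commands() {
  for fasta in "$@"; do
    printf 'docker exec -it FastProtein fastprotein -i %s -o %s\n' \
      "$fasta" "$(output_name "$fasta")"
  done
}

batch_commands /example/input.fasta
# prints: docker exec -it FastProtein fastprotein -i /example/input.fasta -o input_results
```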

Using inside the docker container

## Enter the container
docker exec -it FastProtein /bin/bash 
##
## Execute the command:
fastprotein -h
##
## Simplest execution (the default output is fastprotein_results)
fastprotein -i /example/input.fasta 
##
## Example of a complete execution (with InterProScan and Diamond [default]) with output in folder result_test
fastprotein -i /example/input.fasta -db /example/db.fasta --interpro -o result_test
##
## Example of a complete execution (with InterProScan and BlastP) with output in folder result_test
fastprotein -i /example/input.fasta -db /example/db.fasta -am blastp --interpro -o result_test
##
## The same example but with output in zip mode
fastprotein -i /example/input.fasta -db /example/db.fasta --interpro -o result_test --zip
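
Before launching a long run, it can help to sanity-check the input file. The sketch below is a minimal example, assuming a plain protein FASTA file in which every record starts with `>`. `count_records` and `has_invalid_residues` are hypothetical helpers, not FastProtein commands, and the accepted alphabet (letters, `*` and `-`) is an assumption; the tools in the pipeline may be stricter.

```sh
#!/bin/sh
# Count FASTA records (header lines start with '>').
count_records() {
  grep -c '^>' "$1"
}

# Succeed if any sequence line contains a character outside the assumed
# alphabet: letters (either case), '*' for stop and '-' for gaps.
has_invalid_residues() {
  grep -v '^>' "$1" | grep -q '[^A-Za-z*-]'
}

# Example with a throwaway file; the second record contains a stray digit.
tmp=$(mktemp)
printf '>p1\nMKNASFLNPT\n>p2\nMKT1LL\n' > "$tmp"
echo "records: $(count_records "$tmp")"            # prints: records: 2
if has_invalid_residues "$tmp"; then
  echo "unexpected characters found"               # printed because of the '1'
fi
rm -f "$tmp"
```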

Output console

To see every command the pipeline runs, add the flag -log ALL to the command line.

Use your FastProtein container as your bioinformatics toolbox

The software used within the pipeline can also be run directly from the FastProtein container, as follows:

  ## WoLF PSORT (output = <local_execution_folder>/wolfpsort.out)
  docker exec -it FastProtein wolfpsort animal /example/input.fasta > wolfpsort.out
  ## SignalP5 (output = <local_execution_folder>/signalp.out)
  docker exec -it FastProtein signalp -fasta /example/input.fasta -stdout > signalp.out
  ## Phobius (output = <local_execution_folder>/phobius.out)
  docker exec -it FastProtein phobius -short /example/input.fasta > phobius.out
  ## TMHMM2 (output = <local_execution_folder>/tmhmm2.out)
  docker exec -it FastProtein tmhmm2 /example/input.fasta > tmhmm2.out
  ## PredGPI (output = <local_execution_folder>/predgpi.out)
  docker exec -it FastProtein predgpi /example/input.fasta > predgpi.out
  ## InterProScan5 (output = <shared_folder>/interpro.out)
  docker exec -it FastProtein interproscan -i /example/input.fasta -f tsv -o interpro.out --goterms

Remember, the results are stored INSIDE the container and will appear in your local folder only if the output is set to /fastprotein/



Contact Info

Project Links:


Citation and Acknowledgments

This software was developed using Java 17 (please cite BioJava) and Python 3.

Please cite us:

FastProtein also uses a suite of other software; please cite those tools too:
