GFinisher is a bioinformatics tool developed in the Graduate Program in Sciences - Biochemistry at UFPR.

Small manual


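GFinisher is a genome assembly finishing tool that combines three tools: Fuzzy GC Skew (misassembly detection), jContigSort (contig ordering against a reference genome) and jFGap (gap closing).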


The application can be run in automatic or manual mode. Automatic mode requires a reference genome.

How to run complete pipeline

For this short demonstration, running GFinisher requires a set of assemblies and a reference genome. In this case study, we use the assemblies provided by GAGE-B for the Bacteroides fragilis genome (HiSeq).

The assemblies can be downloaded from the GAGE-B site or from SourceForge. The reference genome can be downloaded from NCBI or here.

If Java is not installed on your computer, download and install it (www.java.com), preferably the 64-bit version.

Download the GFinisher binaries (link).

With all these resources in place, you are ready to run GFinisher.

Graphics mode

Run GFinisher from the system console:


java -Xms2G -Xmx4G -jar GenomeFinisher.jar

On the "Blast" tab, verify the path where BLAST is installed.

 

Go back to the "Basic" tab and choose the msrca_ctg.fasta file for the "Target assembly" field. Then click the "Add" button in the alternative assemblies section and choose the files abyss_ctg.fasta, cabog_ctg.fasta, mira_ctg.fasta, sga_ctg.fasta, soap_ctg.fasta, spades_ctg.fasta and velvet_ctg.fasta.

Add the FQ312004.fna file as the reference genome.

The figure below shows the data setup.

Click the first button in the main toolbar and wait about 15 minutes.

Console/Text mode

Run GFinisher from the system console:


 java -Xms2G -Xmx4G -jar GenomeFinisher.jar -config -create name_of_config.txt
            

Verify and adjust the settings in name_of_config.txt, especially the BLAST path, then run:


 java -Xms2G -Xmx4G -jar GenomeFinisher.jar -config name_of_config.txt -i msrca_ctg.fasta \
    -ref FQ312004.fna \
    -ds abyss_ctg.fasta,cabog_ctg.fasta,mira_ctg.fasta,sga_ctg.fasta,soap_ctg.fasta,spades_ctg.fasta,velvet_ctg.fasta \
    -o ./MaSuRCA/ -v 1
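
When many alternative assemblies are involved, the command above can be assembled by a small script instead of being typed by hand. The following is a minimal sketch, not part of GFinisher: the function name build_gfinisher_command and its parameter names are hypothetical, and only the options that appear in the command above (-config, -i, -ref, -ds, -o, -v) are used.

```python
from typing import List, Sequence


def build_gfinisher_command(
    config: str,
    target: str,
    reference: str,
    alternatives: Sequence[str],
    output_dir: str,
    jar: str = "GenomeFinisher.jar",
    v_value: int = 1,
) -> List[str]:
    """Return the argument list for the console run shown in this manual.

    Hypothetical helper: it only mirrors the example command line and
    does not validate the files or the GFinisher configuration.
    """
    return [
        "java", "-Xms2G", "-Xmx4G", "-jar", jar,
        "-config", config,
        "-i", target,                   # target assembly
        "-ref", reference,              # reference genome
        "-ds", ",".join(alternatives),  # comma-separated, no spaces
        "-o", output_dir,
        "-v", str(v_value),             # value passed to -v, as in the example
    ]


if __name__ == "__main__":
    cmd = build_gfinisher_command(
        "name_of_config.txt",
        "msrca_ctg.fasta",
        "FQ312004.fna",
        ["abyss_ctg.fasta", "cabog_ctg.fasta", "mira_ctg.fasta"],
        "MaSuRCA",
    )
    # Print the command; pass the list to subprocess.run(cmd, check=True) to launch it.
    print(" ".join(cmd))
```

Returning a list rather than a single string lets the same value be printed for inspection or handed directly to subprocess.run without shell quoting issues.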

Results


When GFinisher finishes, all generated data are stored in the MaSuRCA folder. The table below shows some of the reports produced by GFinisher at each step.

Report type             Original data   After contigSort   After Broken FGC Skew   Results
N. contigs              109             44                 210                     11
N. contigs (≥ 500 bp)   109             44                 210                     11
N. contigs (≥ 1000 bp)  102             44                 210                     11
N50                     158716          309306             72014                   663436
L50                     11              6                  23                      3
Dotplot                 (images)
GC Skew                 (images)
Quast report            (images)
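
The N50 and L50 rows can be recomputed from the contig lengths of any assembly. The sketch below assumes the usual definitions (the ones used by QUAST, for example): N50 is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size, and L50 is the number of contigs needed to get there. The function name n50_l50 is illustrative and is not part of GFinisher.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Total is 20, so half is 10: contigs 6 + 5 = 11 reach it.
    print(n50_l50([2, 3, 4, 5, 6]))  # (5, 2)
```

A higher N50 together with a lower L50, as in the "Results" column, indicates that fewer and longer contigs cover half of the assembly.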

Manual mode

Three tools may be executed individually:

Misassemblies detection

Fuzzy GC Skew is used to identify possible errors in assembled contigs. Follow the steps below to detect misassemblies with GFinisher.

  1. Open GFinisher in graphics mode.
  2. Click "Fuzzy GC Skew".
  3. Open your multi-FASTA file (contigs/scaffolds).
  4. Select the "generate Fuzzy GC Skew", "Break down contigs" and "normalize sense" boxes.
  5. For the first run, we recommend a window length of 10,000 bp.
  6. Choose the output filename.
  7. Click the process button.

The "normalize sense" option is recommended when the assembler produces some short contigs (shorter than the window length parameter).
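
To give an idea of what the window length controls, the sketch below computes the classical GC skew, (G - C) / (G + C), over consecutive non-overlapping windows. This is only the basic measure, not GFinisher's Fuzzy GC Skew implementation, whose details are not described in this manual; the function name gc_skew_windows is illustrative, and the default window of 10,000 bp follows the recommendation above.

```python
from typing import List


def gc_skew_windows(sequence: str, window: int = 10_000) -> List[float]:
    """Classical GC skew, (G - C) / (G + C), per non-overlapping window.

    The last window may be shorter than `window`. A window without any
    G or C gets a skew of 0.0 to avoid division by zero.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


if __name__ == "__main__":
    # Tiny window for illustration only.
    print(gc_skew_windows("GGGGCCCCAT", window=4))  # [1.0, -1.0, 0.0]
```

A contig shorter than the window yields a single value, which may be why short contigs need special handling such as the "normalize sense" option.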

GFinisher produces the result below.

jContigSort

jContigSort is used to order contigs based on a reference genome. Follow the steps below.

  1. Open GFinisher in graphics mode.
  2. Click "Contig Sort".
  3. Open your multi-FASTA file (contigs/scaffolds).
  4. Add the reference genome.
  5. Click the "run" button.
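
The idea of ordering contigs against a reference can be illustrated with a few lines of code. The sketch below is not jContigSort's algorithm: it assumes that each contig already has one best alignment to the reference (for example from a BLAST search) and simply sorts the contigs by the start of that alignment. The Hit type and the order_contigs function are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical best alignment of one contig on the reference."""
    contig: str
    ref_start: int  # start position of the alignment on the reference
    reverse: bool   # True if the contig aligns to the reverse strand


def order_contigs(hits: List[Hit]) -> List[Tuple[str, str]]:
    """Return (contig, strand) pairs sorted by reference start position."""
    ordered = sorted(hits, key=lambda hit: hit.ref_start)
    return [(hit.contig, "-" if hit.reverse else "+") for hit in ordered]


if __name__ == "__main__":
    hits = [
        Hit("contig_3", 9000, False),
        Hit("contig_1", 100, True),
        Hit("contig_2", 4500, False),
    ]
    print(order_contigs(hits))
```

Contigs reported on the "-" strand would need to be reverse-complemented to match the reference orientation.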

GFinisher produces the result below. The first report shows the position of each contig relative to the reference, and the second report shows clusters of reference-relative positions; this data can indicate duplicated regions.

jFGap

jFGap is a tool that combines several assemblies to close gaps and improve the target assembly. The original version was written in Matlab (FGAP).

  1. Open GFinisher in graphics mode.
  2. Click the jFGap icon.
  3. Open your target assembly multi-FASTA file (contigs/scaffolds).
  4. Add the alternative assemblies.
  5. Click the "process" button.

If the "process" button is disabled, the path setup may be incomplete (see the parameters tab).

jFGap closes only gaps identified by a specific symbol (see the parameters tab) and does not change the order or orientation of contigs; we therefore recommend running jContigSort before jFGap.
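
Before running jFGap, it can be useful to check where the gaps in the target assembly are. The sketch below lists runs of a gap symbol in a sequence. It assumes the symbol is "N", a common convention for gaps in scaffolds; the actual symbol used by jFGap is the one configured in the parameters tab. The function name find_gaps is illustrative and is not part of GFinisher.

```python
import re
from typing import List, Tuple


def find_gaps(sequence: str, symbol: str = "N") -> List[Tuple[int, int]]:
    """Return (start, end) of each run of `symbol`, 0-based, end exclusive."""
    if len(symbol) != 1:
        raise ValueError("symbol must be a single character")
    # re.escape keeps symbols such as "-" or "." from being read as regex syntax.
    pattern = re.compile(re.escape(symbol) + "+")
    return [(match.start(), match.end()) for match in pattern.finditer(sequence)]


if __name__ == "__main__":
    print(find_gaps("ACGTNNNNACGNNT"))  # [(4, 8), (11, 13)]
```

A sequence with no runs of the symbol returns an empty list, which means there is nothing for a gap-closing step to work on in that sequence.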