
<HTML>
<HEAD>
  <TITLE>
  EMBOSS: degapseq
  </TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" text="#000000">

<table align=center border=0 cellspacing=0 cellpadding=0>
<tr><td valign=top>
<A HREF="/" ONMOUSEOVER="self.status='Go to the EMBOSS home page';return true"><img border=0 src="emboss_icon.jpg" alt="" width=150 height=48></a>
</td>
<td align=left valign=middle>
<b><font size="+6">
degapseq
</font></b>
</td></tr>
</table>
<br>&nbsp;
<p>


<H2>
    Function
</H2>
Removes gap characters from sequences
<H2>
    Description
</H2>

<b>degapseq</b> reads in one or more sequences and writes them out again
minus any gap characters.  In effect it removes gaps from aligned
sequences. 

<p>

In fact, it does more than this: it removes <b>ANY</b> non-alphabetic
character from the input sequence.  As well as the gap characters, it
will remove such things as the '*' that marks the position of a
translated STOP codon in protein sequences. 

<p>

There are many different formats for storing sequences in files.  Some
sequence formats can store aligned sequences, including the positions
where gaps have been introduced to make the sequences align properly. 
Each gap position is marked with a special character.  Different
sequence formats use different gap characters, and some formats use
more than one character to distinguish different types of gap (e.g. 
gaps at the ends of the sequences, internal gaps, gaps introduced by a
program or by a person editing the alignment).  Typical gap characters
are '.', '-' and '~'. 

<p>

When EMBOSS programs read in a sequence that contains gap characters,
all of them are changed internally to '-'.  EMBOSS has only one type of
gap character, so any distinction between different gap types in the
input is lost. 
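
<p>

The reduction to a single gap character can be sketched in Python as
follows.  This is only an illustration and is not taken from the EMBOSS
source: the name normalise_gaps is hypothetical, and the set of input
gap characters is assumed to be just the three typical ones listed
above ('.', '-' and '~'). 

```python
# Hypothetical helper for illustration; not part of EMBOSS.
# Assumption: only '.', '~' and '-' are treated as gap characters.
GAP_CHARS = ".~-"

# Translation table mapping every assumed gap character to '-'.
_TO_DASH = str.maketrans({ch: "-" for ch in GAP_CHARS})


def normalise_gaps(sequence):
    """Return the sequence with every assumed gap character shown as '-'."""
    return sequence.translate(_TO_DASH)


print(normalise_gaps("ACGT..~~ACGT--"))
```

<p>

After this step only one kind of gap remains, so nothing downstream
needs to know which character the input format used. 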

<p>

<b>degapseq</b> removes every non-alphabetic character from the
sequence; in effect, gaps and '*' characters are removed.  The sequence
is then written out. 
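
<p>

The filtering rule described above can be sketched in a few lines of
Python.  This is a minimal illustration rather than the actual
implementation: the function name degap is hypothetical, and it is
assumed that "alphabetic" means the ASCII letters A-Z and a-z. 

```python
# Hypothetical sketch of the rule "keep only alphabetic characters".
# Assumption: only ASCII letters count as alphabetic.
def degap(sequence):
    """Return the sequence with every non-letter character removed."""
    return "".join(ch for ch in sequence if ch.isascii() and ch.isalpha())


print(degap("ACGT....ACGT"))  # gaps in a nucleotide sequence
print(degap("MKV-LS*"))       # a gap and a STOP marker in a protein
```

<p>

Because the test is "is this a letter" rather than "is this a gap", the
'*' in the second call is dropped along with the '-'. 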

<H2>
    Usage
</H2>
<b>Here is a sample session with degapseq</b>
<p>

<p>
<table width="90%"><tr><td bgcolor="#CCFFFF"><pre>

% <b>degapseq dnagap.fasta nogaps.seq </b>
Removes gap characters from sequences

</pre></td></tr></table><p>
<p>
<a href="#input.1">Go to the input files for this example</a><br><a href="#output.1">Go to the output files for this example</a><p><p>


<H2>
    Command line arguments
</H2>

<table CELLSPACING=0 CELLPADDING=3 BGCOLOR="#f5f5ff" ><tr><td>
<pre>
   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     (Gapped) sequence(s) filename and optional
                                  format, or reference (input USA)
  [-outseq]            seqoutall  [<sequence>.<format>] Sequence set(s)
                                  filename and optional format (output USA)

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outseq" associated qualifiers
   -osformat2          string     Output seq format
   -osextension2       string     File name extension
   -osname2            string     Base file name
   -osdirectory2       string     Output directory
   -osdbname2          string     Database name to add
   -ossingle2          boolean    Separate file for each entry
   -oufo2              string     UFO features
   -offormat2          string     Features format
   -ofname2            string     Features file name
   -ofdirectory2       string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

</pre>
</td></tr></table>
<P>
<table border cellspacing=0 cellpadding=3 bgcolor="#ccccff">
<tr bgcolor="#FFFFCC">
<th align="left" colspan=2>Standard (Mandatory) qualifiers</th>
<th align="left">Allowed values</th>
<th align="left">Default</th>
</tr>

<tr>
<td>[-sequence]<br>(Parameter 1)</td>
<td>(Gapped) sequence(s) filename and optional format, or reference (input USA)</td>
<td>Readable sequence(s)</td>
<td><b>Required</b></td>
</tr>

<tr>
<td>[-outseq]<br>(Parameter 2)</td>
<td>Sequence set(s) filename and optional format (output USA)</td>
<td>Writeable sequence(s)</td>
<td><i>&lt;*&gt;</i>.<i>format</i></td>
</tr>

<tr bgcolor="#FFFFCC">
<th align="left" colspan=2>Additional (Optional) qualifiers</th>
<th align="left">Allowed values</th>
<th align="left">Default</th>
</tr>

<tr>
<td colspan=4>(none)</td>
</tr>

<tr bgcolor="#FFFFCC">
<th align="left" colspan=2>Advanced (Unprompted) qualifiers</th>
<th align="left">Allowed values</th>
<th align="left">Default</th>
</tr>

<tr>
<td colspan=4>(none)</td>
</tr>

</table>

<H2>
    Input file format
</H2>
Any valid input sequence USA is allowed.

<p>

The input sequence can be nucleic or protein.

<p>

The input sequence can be gapped or ungapped.

<p>

<a name="input.1"></a>
<h3>Input files for usage example </h3>
<p><h3>File: dnagap.fasta</h3>
<table width="90%"><tr><td bgcolor="#FFCCFF">
<pre>
&gt;FASTA F10002 FASTA FORMAT DNA SEQUENCE
ACGT....ACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGT
</pre>
</td></tr></table><p>


<H2>
    Output file format
</H2>

The output is the input sequence(s) with all gap characters, and any
other non-alphabetic characters, removed.

<p>

<a name="output.1"></a>
<h3>Output files for usage example </h3>
<p><h3>File: nogaps.seq</h3>
<table width="90%"><tr><td bgcolor="#CCFFCC">
<pre>
&gt;FASTA F10002 FASTA FORMAT DNA SEQUENCE
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGTACGTACGTACGT
</pre>
</td></tr></table><p>
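
The relationship between the example input and output files can be
checked with the short Python sketch below.  It is an illustration
only: the sequence lines are copied from dnagap.fasta above, and the
60-character line width is an assumption inferred from the example
output rather than a documented setting. 

```python
import re

# Sequence lines copied from the example input file dnagap.fasta.
input_lines = [
    "ACGT....ACGTACGTACGTACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGT",
]

# Join the lines and drop every character that is not an ASCII letter.
letters = re.sub(r"[^A-Za-z]", "", "".join(input_lines))

# Assumption: output lines are 60 characters wide, as in nogaps.seq.
WIDTH = 60
output_lines = [letters[i:i + WIDTH] for i in range(0, len(letters), WIDTH)]

for line in output_lines:
    print(line)
```

<p>

The input holds 100 characters, four of which are '.' gaps, so 96
letters remain and are written as one line of 60 and one line of 36. 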

<H2>
    Data files
</H2>

None.

<H2>
    Notes
</H2>

None.

<H2>
    References
</H2>


None.

<H2>
    Warnings
</H2>

<b>degapseq</b> removes '*' characters from protein sequences as well as
the gap characters. 

<H2>
    Diagnostic Error Messages
</H2>

None.


<H2>
    Exit status
</H2>

It always exits with status 0.


<H2>
    Known bugs
</H2>

None.


<h2><a name="See also">See also</a></h2>
<table border cellpadding=4 bgcolor="#FFFFF0">
<tr><th>Program name</th><th>Description</th></tr>
<tr>
<td><a href="biosed.html">biosed</a></td>
<td>Replace or delete sequence sections</td>
</tr>

<tr>
<td><a href="codcopy.html">codcopy</a></td>
<td>Reads and writes a codon usage table</td>
</tr>

<tr>
<td><a href="cutseq.html">cutseq</a></td>
<td>Removes a specified section from a sequence</td>
</tr>

<tr>
<td><a href="descseq.html">descseq</a></td>
<td>Alter the name or description of a sequence</td>
</tr>

<tr>
<td><a href="entret.html">entret</a></td>
<td>Reads and writes (returns) flatfile entries</td>
</tr>

<tr>
<td><a href="extractalign.html">extractalign</a></td>
<td>Extract regions from a sequence alignment</td>
</tr>

<tr>
<td><a href="extractfeat.html">extractfeat</a></td>
<td>Extract features from a sequence</td>
</tr>

<tr>
<td><a href="extractseq.html">extractseq</a></td>
<td>Extract regions from a sequence</td>
</tr>

<tr>
<td><a href="listor.html">listor</a></td>
<td>Write a list file of the logical OR of two sets of sequences</td>
</tr>

<tr>
<td><a href="makenucseq.html">makenucseq</a></td>
<td>Creates random nucleotide sequences</td>
</tr>

<tr>
<td><a href="makeprotseq.html">makeprotseq</a></td>
<td>Creates random protein sequences</td>
</tr>

<tr>
<td><a href="maskfeat.html">maskfeat</a></td>
<td>Mask off features of a sequence</td>
</tr>

<tr>
<td><a href="maskseq.html">maskseq</a></td>
<td>Mask off regions of a sequence</td>
</tr>

<tr>
<td><a href="newseq.html">newseq</a></td>
<td>Type in a short new sequence</td>
</tr>

<tr>
<td><a href="noreturn.html">noreturn</a></td>
<td>Removes carriage return from ASCII files</td>
</tr>

<tr>
<td><a href="notseq.html">notseq</a></td>
<td>Exclude a set of sequences and write out the remaining ones</td>
</tr>

<tr>
<td><a href="nthseq.html">nthseq</a></td>
<td>Writes one sequence from a multiple set of sequences</td>
</tr>

<tr>
<td><a href="pasteseq.html">pasteseq</a></td>
<td>Insert one sequence into another</td>
</tr>

<tr>
<td><a href="revseq.html">revseq</a></td>
<td>Reverse and complement a sequence</td>
</tr>

<tr>
<td><a href="seqret.html">seqret</a></td>
<td>Reads and writes (returns) sequences</td>
</tr>

<tr>
<td><a href="seqretsplit.html">seqretsplit</a></td>
<td>Reads and writes (returns) sequences in individual files</td>
</tr>

<tr>
<td><a href="skipseq.html">skipseq</a></td>
<td>Reads and writes (returns) sequences, skipping first few</td>
</tr>

<tr>
<td><a href="splitter.html">splitter</a></td>
<td>Split a sequence into (overlapping) smaller sequences</td>
</tr>

<tr>
<td><a href="trimest.html">trimest</a></td>
<td>Trim poly-A tails off EST sequences</td>
</tr>

<tr>
<td><a href="trimseq.html">trimseq</a></td>
<td>Trim ambiguous bits off the ends of sequences</td>
</tr>

<tr>
<td><a href="union.html">union</a></td>
<td>Reads sequence fragments and builds one sequence</td>
</tr>

<tr>
<td><a href="vectorstrip.html">vectorstrip</a></td>
<td>Strips out DNA between a pair of vector sequences</td>
</tr>

<tr>
<td><a href="yank.html">yank</a></td>
<td>Reads a sequence range, appends the full USA to a list file</td>
</tr>

</table>

<H2>
    Author(s)
</H2>

Gary Williams (gwilliam&nbsp;&copy;&nbsp;rfcgr.mrc.ac.uk)
<br>
MRC Rosalind Franklin Centre for Genomics Research
<br>
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK


<H2>
    History
</H2>

Written (6 March 2001) - Gary Williams

<H2>
    Target users
</H2>
This program is intended to be used by everyone and everything, from naive users to embedded scripts.

<H2>
    Comments
</H2>
None


</BODY>
</HTML>