          
MaM: Multiple Alignment Manipulator
What's MaM?
MaM is a software tool that processes and manipulates multiple
alignments of genomic sequences.
MaM computes the exact locations of common repeat elements in multiple
aligned sequences, as identified by a variety of user-specified programs,
databases, and tables.
It then graphically displays how the alignment quality varies
throughout the aligned sequences, providing separate displays for the
repeat and non-repeat portions.
Latest Version
The latest version of MaM is 1.4.2, last updated on March 20,
2006. Check back here for further updates.
Changelog
Availability
MaM is freely available for non-commercial use. You can download the
source code
here. For more
information, you can download the paper describing MaM here.
WebMaM: Web Interface for MaM
The web interface for MaM is
hosted at LIRMM.
Authors
The authors of the program are Can Alkan and Eray Tüzün.
References
If you use MaM or WebMaM in your projects, please cite the following:
"Manipulating Multiple Sequence Alignments via MaM and WebMaM",
Can Alkan, Eray Tuzun, Jerome Buard, Franck Lethiec, Evan E. Eichler,
Jeffrey A. Bailey, S. Cenk Sahinalp.
Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W295-8.
Documentation
-
Configuration File:
On its first run, regardless of the options given, MaM creates a
configuration file named ".mam-config" in the user's home directory. If the
Unix/Linux utility which is installed, MaM tries to guess the locations of
the RepeatMasker, cross_match, sim4 and gnuplot programs; otherwise, it
creates an empty file. Users are strongly advised to check the contents of
the ".mam-config" file and edit it if necessary. A sample configuration file
can be found here.
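The program lookup described above can be approximated in Python with shutil.which, which searches PATH the same way the which utility does. This is a minimal sketch, not MaM's own code: the helper names are hypothetical, and the key=value layout written by write_config is an assumption, so the real .mam-config format may differ (compare with the sample configuration file).

```python
import shutil
from pathlib import Path

# External programs the MaM documentation says it tries to locate.
PROGRAMS = ["RepeatMasker", "cross_match", "sim4", "gnuplot"]

def guess_program_paths(programs=PROGRAMS):
    """Return {program: full path, or None if not on PATH}, like running `which` on each."""
    return {name: shutil.which(name) for name in programs}

def write_config(path, paths):
    """Write a HYPOTHETICAL key=value file; the real .mam-config layout may differ."""
    lines = [f"{name}={location or ''}" for name, location in paths.items()]
    Path(path).write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    # Report what was found so missing entries can be filled in by hand.
    for name, location in guess_program_paths().items():
        print(f"{name}: {location or 'not found; set it by hand'}")
```

A program reported as not found is exactly the case where the configuration file needs manual editing.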
-
Input Files:
MaM requires a single text file containing the alignment of multiple
sequences. The input formats supported by MaM are Clustal, NEXUS, MEGA, Fasta, and
Phylip.
Click on the file format name
to get an example file of the same alignment.
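Of the supported input formats, aligned Fasta is the simplest to read. The sketch below is a hypothetical helper, not part of MaM: it parses aligned Fasta text into a name-to-sequence mapping (keeping only the first word of each header as the name) and checks that every row has the same length, which any valid alignment must satisfy.

```python
def read_fasta_alignment(text):
    """Parse aligned Fasta text into {name: aligned sequence}, preserving input order."""
    chunks = {}  # name -> list of sequence lines
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty Fasta header")
            name = fields[0]  # first word of the header is the sequence name
            if name in chunks:
                raise ValueError(f"duplicate sequence name: {name}")
            chunks[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks[name].append(line.upper())
    alignment = {n: "".join(parts) for n, parts in chunks.items()}
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) > 1:
        raise ValueError(f"aligned sequences differ in length: {sorted(lengths)}")
    return alignment

example = """>seq1
ACGT-A
>seq2 optional description
ACGTTA
"""
print(read_fasta_alignment(example))
```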
-
Output Files:
The output formats of MaM are the same as the input formats, but
MaM can also write the identity dot representation
format and HTML. Click here
to get an example file. The HTML format looks like the ClustalW
format; however, bases that match the consensus at the
same position (i.e. the majority base in that column) are marked with a cyan
background, and those that differ are marked in pink. If all the bases
in a given column are the same, no marking is done. This is
particularly useful if the overall sequence divergence is low and
the user wishes to see the differences clearly. However, the HTML
output option is not recommended if the overall sequence divergence is
high: in that case the output HTML file will be huge. Click here to get an example HTML file.
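Both the HTML coloring rule and the identity dot representation are decided column by column. The sketch below illustrates them under stated assumptions and is not MaM's implementation: the consensus is taken as the most frequent non-gap base (ties go to the base seen first), gap characters are left unmarked, and in the dot representation a base identical to the consensus is printed as ".". The function names are hypothetical.

```python
from collections import Counter

GAP = "-"

def column_consensus(column):
    """Majority non-gap base in a column (ties: first seen); GAP if the column is all gaps."""
    bases = [c for c in column if c != GAP]
    return Counter(bases).most_common(1)[0][0] if bases else GAP

def html_marks(seqs):
    """Per-sequence list of per-column marks: 'cyan' (matches consensus),
    'pink' (differs), or None (fully conserved column, or a gap)."""
    marks = [[] for _ in seqs]
    for column in zip(*seqs):
        cons = column_consensus(column)
        conserved = len(set(column)) == 1  # identical column: no marking at all
        for row, c in enumerate(column):
            if conserved or c == GAP:
                marks[row].append(None)
            else:
                marks[row].append("cyan" if c == cons else "pink")
    return marks

def identity_dots(seqs):
    """Return the consensus line and each sequence with consensus-matching bases as '.'."""
    consensus = "".join(column_consensus(col) for col in zip(*seqs))
    rows = ["".join("." if c == k and c != GAP else c for c, k in zip(s, consensus))
            for s in seqs]
    return consensus, rows

seqs = ["ACGT", "ACTT", "AGTT"]
print(identity_dots(seqs))
```

Because only the differing characters remain visible, the dot view stays compact even when the colored HTML view would become very large.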
-
Supported programs:
- RepeatMasker
- cross_match
- sim4
- table file (native to MaM): A table file is a plain text file
listing the begin and end locations of the regions of interest (motifs).
Instead of using RepeatMasker, cross_match or sim4 to determine the
repeat/unique regions or intronic/exonic regions, the user can supply
this file to define the regions of interest directly. The subalignments
given in the table file will be treated as "repeats" by the rest of the
MaM processing. The table file format should be as follows:
<sequence_name_1> <start_loc1> <end_loc1>
<sequence_name_2> <start_loc1> <end_loc1>
...
<sequence_name_n> <start_loc1> <end_loc1>
...
<sequence_name_1> <start_loc2> <end_loc2>
<sequence_name_2> <start_loc2> <end_loc2>
...
The user can give as many "interesting" regions as needed, but
the start and end locations must be coordinates in the original
sequence (not coordinates in the alignment). Click here
for the sample table file generated for the above input files, which
marks every sequence in its entirety as an "interesting" region.
- convert (native to MaM): This option can be used
if the user only intends to convert file formats. No further processing
is done; the input alignment is only converted to the formats specified by
the -clustal, -nexus, -mega, -fasta, -phylip, -html
and -identity options.
- none (native to MaM): When this option is
selected, MaM does no processing at all. This option is useful only
in combination with the -alnstats and/or -consensus
option(s) as described below.
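Because table file coordinates refer to the original, ungapped sequences, they have to be translated into alignment columns before any subalignment can be cut. The sketch below assumes whitespace-separated fields and 1-based, inclusive coordinates (the text above specifies neither). parse_table, ungapped_to_alignment and regions_to_columns are hypothetical names, not MaM functions.

```python
def parse_table(text):
    """Parse '<sequence_name> <start> <end>' lines into (name, start, end) tuples.
    Assumes 1-based, inclusive coordinates on the ungapped sequence."""
    regions = []
    for lineno, raw in enumerate(text.splitlines(), 1):
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        if start > end:
            raise ValueError(f"line {lineno}: start {start} > end {end}")
        regions.append((name, start, end))
    return regions

def ungapped_to_alignment(aligned_seq, gap="-"):
    """Map each 1-based position of the original sequence to its 1-based alignment column."""
    mapping = {}
    pos = 0
    for col, c in enumerate(aligned_seq, 1):
        if c != gap:
            pos += 1
            mapping[pos] = col
    return mapping

def regions_to_columns(regions, alignment):
    """Translate (name, start, end) regions into (name, first_column, last_column)."""
    maps = {name: ungapped_to_alignment(seq) for name, seq in alignment.items()}
    out = []
    for name, start, end in regions:
        m = maps[name]  # KeyError if the name is not in the alignment
        if start not in m or end not in m:
            raise ValueError(f"{name}: region {start}-{end} outside sequence of length {len(m)}")
        out.append((name, m[start], m[end]))
    return out

alignment = {"s1": "AC--GT", "s2": "ACTTG-"}
print(regions_to_columns(parse_table("s1 2 3\ns2 1 5\n"), alignment))
```

Here base 3 of s1 (G) sits in alignment column 5 because two gap columns precede it, which is why the table must not be written in alignment coordinates.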
-
Divergence Rate Calculation:
MaM displays the quality of the alignment by sliding a window (whose size
is set by the user) across the whole alignment. MaM computes
the divergence score of the
alignment within each window position and displays how this score
varies over the whole alignment.
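The sliding-window procedure can be sketched as follows, with window and slide mirroring the -ww (window size, default 100) and -sw (slide width, default 10) options. As a stand-in for MaM's own scores, each column contributes the fraction of its characters that differ from its most frequent base, and a window's divergence is the mean over its columns. The exact normalization MaM uses is not given here, and reporting only full-length windows is also an assumption.

```python
from collections import Counter

GAP = "-"

def column_score(column):
    """Stand-in score: fraction of characters (gaps included) differing from the
    most frequent non-gap base; an all-gap column scores 0."""
    bases = [c for c in column if c != GAP]
    if not bases:
        return 0.0
    top = Counter(bases).most_common(1)[0][1]
    return (len(column) - top) / len(column)

def sliding_divergence(seqs, window=100, slide=10, score=column_score):
    """Return (start, end, divergence) for each full window; start is 0-based, end exclusive."""
    per_column = [score(col) for col in zip(*seqs)]
    results = []
    for start in range(0, len(per_column) - window + 1, slide):
        chunk = per_column[start:start + window]
        results.append((start, start + window, sum(chunk) / window))
    return results

print(sliding_divergence(["AAAA", "AAAT"], window=2, slide=1))
```

With the settings of the sample run, the call would be sliding_divergence(seqs, window=1000, slide=100); plotting the third element of each tuple against the window start gives the kind of divergence profile described above.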
To calculate the divergence score within a window, the user has
three choices:
- pairwise deletion: The first
option uses the sum-of-pairs
score, where every pair of bases in the same column of the alignment
is compared, and the ratio of diverging pairs (not
counting gaps) is reported.
- complete deletion: The second
option is the same, except that all columns containing a gap (deletion)
are ignored.
- parsimony score: The last
option returns
the ratio of the bases that differ from the most frequent base in the
same column.
Example: Let a given column of an alignment of 10 sequences be
ATTG--GTCT. For this column, the pairwise deletion
option compares every pair of bases (excluding gaps): AT, AT, AG,
AG, AT, AC, AT; TT, TG, TG, TT, TC, TT; etc. Of the 28 pairs formed by
the 8 bases, 21 diverge, so the sum-of-pairs score is 21. The complete deletion
option discards the whole column since it includes at least one gap (in
this case, two), thus returning a sum-of-pairs score of 0. It would,
however, return the same value as the pairwise deletion
option for every column that does not include a gap. The last option,
parsimony score, first finds that T is the most
abundant character in that column, then counts the number of non-T
characters (the two gaps included), returning 6. In each case, the score
is normalized with respect to the number of sequences and the window
length to compute the divergence rate at the end of the process.
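The three per-column scores can be checked against the example with a short sketch. Under the pairwise deletion definition, the 8 bases of ATTG--GTCT form 28 pairs, 7 of which are identical (six T-T pairs and one G-G pair), so 21 pairs diverge. Following the example, the parsimony count treats gaps as characters that differ from the most abundant base. The function names are hypothetical, and the final normalization by sequence count and window length is left out.

```python
from collections import Counter
from itertools import combinations

GAP = "-"

def pairwise_deletion(column):
    """Count diverging base pairs in a column, skipping any pair that involves a gap."""
    bases = [c for c in column if c != GAP]
    return sum(1 for a, b in combinations(bases, 2) if a != b)

def complete_deletion(column):
    """Same as pairwise_deletion, but a column containing any gap contributes nothing."""
    return 0 if GAP in column else pairwise_deletion(column)

def parsimony(column):
    """Count characters (gaps included) that differ from the most abundant base."""
    bases = [c for c in column if c != GAP]
    if not bases:
        return 0
    top = Counter(bases).most_common(1)[0][1]
    return len(column) - top

col = "ATTG--GTCT"
print(pairwise_deletion(col), complete_deletion(col), parsimony(col))  # 21 0 6
```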
-
List of options:
MaM is a menu-driven program for ease of use. However, it can also be
run with command-line options offering the same functionality, which
allows MaM to be called from other scripts:
- -program: One of
repeatmasker, crossmatch, sim4,
table, convert, or none (if table is selected, the user
should also supply their own table file via the
-exonfile option).
- -exonfile: If table
is selected for the -program option, exonfile
is a table of motif begin and end locations. Otherwise, the user
supplies a cDNA file with this parameter; in that case the selected
program (repeatmasker, crossmatch, sim4)
finds the locations of exons instead of
repeats.
- -update: Toggle updating locations
on/off. This option can
be selected only if the program is set to table.
If set to off, MaM will not update the alignment/sequence coordinates;
instead it will use the alignment coordinates in the table file. This is
useful when the user chooses to cut columns from the alignment without
any further processing.
- -column: single
concatenates all the motifs, and multiple extracts
each motif into a separate file.
- -merge: max
criterion computes the union of two successive overlapping motifs, and
min criterion computes the intersection of two successive
overlapping motifs.
- -keep: If on
is selected, the unique (or intronic) portions of the sequences are
kept; otherwise (off), the repeat (or exonic)
locations are kept.
- -slider: Toggle sliding
window function on/off.
- -pc: Select pairwise
(-pc=p) or complete (-pc=c) deletion; or
parsimony score (-pc=s).
- -sw: Select slide width
(default is 10).
- -ww: Select window size
(default is 100).
- -clustal, -nexus, -mega, -fasta,
-phylip, -html, -identity: Toggle output formats Clustal,
NEXUS, MEGA, Fasta, Phylip, HTML and consensus identity
dot representation respectively.
- -alnstats: Toggle Alignment
Statistics File Dump on/off. Statistical information about the alignment
will be written to a text file with the
extension ".stats". The user can select -program=none
if no further processing is needed. The statistics file includes: the
number of sequences in the alignment, the alignment length, the sequence with
minimum length and its name, the sequence with maximum length and its name,
the average length, the number of perfectly conserved columns, the percentage of
perfectly conserved columns; and a table that shows, for each sequence,
its length and its G+C percentage.
- -consensus: Toggle consensus
sequence output on/off. If set to on,
the consensus sequence will be written to a separate file in Fasta
format. The user can select -program=none if no
further processing is needed.
- -cgaps: Toggle gaps in consensus
sequence on/off. If set to on,
the consensus sequence will include a gap wherever the gap is the most
abundant character in that column. If set to off, no gaps will
be included in the consensus; for columns in which the gap character
is the most abundant, the second most abundant character in that
column will be reported in the consensus.
- -include: Toggle including
consensus in output on/off. If set to on,
the consensus sequence will be included in all output files, as if it
were one of the input sequences.
- -rmasker_opts, -cmatch_opts,
and -sim4_opts: Override default options for RepeatMasker,
cross_match and sim4 respectively.
Example: -rmasker_opts="-no_is -nolow -q -alu".
- -defaults: See defaults for
options.
- -h: Help.
- -V: Verbose (generates a lot of
screen output).
- -v: Version.
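The -consensus/-cgaps behavior and the fields listed for -alnstats can be sketched as below. Assumptions not stated in the text: ties go to the character seen first, all-gap columns are dropped from the consensus when -cgaps is off, sequence lengths exclude gaps, a perfectly conserved column is one whose characters are all the same non-gap base, and G+C percentage is computed over the ungapped length. The function names and dictionary keys are illustrative, not the actual .stats file layout.

```python
from collections import Counter

GAP = "-"

def consensus(alignment, cgaps=False):
    """Most abundant character per column. With cgaps=False, a gap-majority column
    reports the next most abundant base; all-gap columns are then skipped (assumption)."""
    out = []
    for column in zip(*alignment.values()):
        ranked = [ch for ch, _ in Counter(column).most_common()]
        if cgaps or ranked[0] != GAP:
            out.append(ranked[0])
        else:
            bases = [ch for ch in ranked if ch != GAP]
            if bases:
                out.append(bases[0])
    return "".join(out)

def alignment_stats(alignment):
    """Summary similar in spirit to the .stats file; keys here are illustrative."""
    names = list(alignment)
    lengths = {n: len(s.replace(GAP, "")) for n, s in alignment.items()}
    columns = list(zip(*alignment.values()))
    conserved = sum(1 for col in columns if col[0] != GAP and len(set(col)) == 1)
    per_seq = {}
    for n, s in alignment.items():
        ungapped = s.replace(GAP, "").upper()
        gc = sum(ungapped.count(b) for b in "GC")
        per_seq[n] = (len(ungapped), 100.0 * gc / len(ungapped) if ungapped else 0.0)
    shortest = min(names, key=lengths.get)
    longest = max(names, key=lengths.get)
    return {
        "num_sequences": len(names),
        "alignment_length": len(columns),
        "min_length": (shortest, lengths[shortest]),
        "max_length": (longest, lengths[longest]),
        "average_length": sum(lengths.values()) / len(names),
        "conserved_columns": conserved,
        "conserved_percent": 100.0 * conserved / len(columns) if columns else 0.0,
        "per_sequence": per_seq,
    }

aln = {"s1": "AC-T", "s2": "A--T", "s3": "ACGT"}
print(consensus(aln), consensus(aln, cgaps=True))
print(alignment_stats(aln)["conserved_columns"])
```

In this toy alignment, the third column is two-thirds gaps, so it appears as "-" with cgaps on and as the remaining base G with cgaps off.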
-
Sample Run:
The input alignment file used in this sample can be downloaded here. This sample
run uses repeatmasker as the external program and the NPIP file as the exonfile.
The sliding window size is set to 1000 bases, and each window is moved by 100
bases. Two runs are executed: