MaM: Multiple alignment Manipulator

What's MaM?

MaM is a software tool that processes and manipulates multiple alignments of genomic sequences.

MaM computes the exact locations of common repeat elements in multiple aligned sequences, using a variety of user-selected programs, databases, and tables.

It then graphically displays how the alignment quality varies throughout the aligned sequences, providing separate displays for the repeat and non-repeat portions.

Latest Version

The latest version of MaM is 1.4.2; it was last updated on March 20, 2006. Check back here for further updates.

Changelog

Availability

MaM is freely available for non-commercial use. You can download the source code here. For more information, you can download the paper describing MaM here.

WebMaM: Web Interface for MaM

The web interface for MaM is hosted at LIRMM.

Authors

The authors of the program are Can Alkan and Eray Tüzün.

References

If you use MaM or WebMaM in your projects, please cite the following:

"Manipulating Multiple Sequence Alignments via MaM and WebMaM",
Can Alkan, Eray Tuzun, Jerome Buard, Franck Lethiec, Evan E. Eichler, Jeffrey A. Bailey, S. Cenk Sahinalp.
Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W295-8.

Documentation

  1. Configuration File:
    The first time MaM runs, regardless of the options used, it creates a configuration file named ".mam-config" in the user's home directory. If the Unix/Linux utility which is installed, MaM tries to guess the locations of the RepeatMasker, cross_match, sim4 and gnuplot programs; otherwise it creates an empty file. Users are strongly encouraged to check the contents of the ".mam-config" file and edit it if necessary. A sample configuration file can be found here.
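    Before editing ".mam-config" by hand, it can help to see which of the external programs are actually on the PATH. The following is a minimal sketch of such a check, not MaM's own lookup code: it uses Python's shutil.which, which performs the same kind of PATH search as the which utility. The executable names are taken from the text above; whether MaM searches for exactly these names is an assumption, and the layout of ".mam-config" itself is not reproduced here.

```python
import shutil

# Program names as listed in the documentation; the exact executable names
# MaM searches for are an assumption.
TOOLS = ["RepeatMasker", "cross_match", "sim4", "gnuplot"]


def locate_tools(names=TOOLS):
    """Return {name: absolute path, or None if not found on the PATH}."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_tools().items():
        print(f"{name:12s} {path or 'NOT FOUND on PATH'}")
```

    Any program reported as not found would have to be entered in the configuration file manually.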
  2. Input Files:
    MaM requires a single text file containing the alignment of multiple sequences. The input formats supported by MaM are Clustal, NEXUS, MEGA, Fasta, and Phylip. Click on the file format name to get an example file of the same alignment.
  3. Output Files:
    MaM can write output in any of the input formats, and additionally in identity dot representation and in HTML. Click here to get an example file. The HTML format looks the same as the ClustalW format, except that bases matching the consensus at their position (i.e. the majority base in that column) are marked with a cyan background, and bases that differ are marked with pink. If all the bases in a given column are the same, no marking is done. This is particularly useful when the overall sequence divergence is low and the user wishes to see the differences clearly. However, the HTML output option is not recommended when the overall sequence divergence is high, because the output HTML file will then be huge. Click here to get an example HTML file.
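    The colouring rule for the HTML output can be illustrated with a short sketch. This is an illustration of the rule as described above, not MaM's actual HTML writer: the function name html_rows and the use of inline span styles are hypothetical, gaps are treated like any other character, and ties for the majority character are broken by first occurrence (both assumptions).

```python
from collections import Counter
from html import escape


def html_rows(seqs):
    """Return one HTML string per aligned sequence, coloured against the column majority."""
    ncols = len(seqs[0])
    # Majority character of each column (ties: first encountered, an assumption).
    majority = [Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(ncols)]
    # Columns in which every sequence carries the same character get no marking.
    uniform = [len({s[i] for s in seqs}) == 1 for i in range(ncols)]
    rows = []
    for s in seqs:
        cells = []
        for i, ch in enumerate(s):
            if uniform[i]:
                cells.append(escape(ch))
            else:
                colour = "cyan" if ch == majority[i] else "pink"
                cells.append(f'<span style="background:{colour}">{escape(ch)}</span>')
        rows.append("".join(cells))
    return rows


if __name__ == "__main__":
    for row in html_rows(["ACGT", "ACGA", "ACTA"]):
        print(row)
```

    Because every character in a non-uniform column is wrapped in markup, the output grows quickly with divergence, which is why the HTML option suits low-divergence alignments best.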
  4. Supported programs:
    • RepeatMasker
    • cross_match
    • sim4
    • table file (native to MaM): A table file is a text file that lists the begin and end locations of the motifs, i.e. the regions of interest. Instead of using RepeatMasker, cross_match or sim4 to determine the repeat/unique or intronic/exonic regions, the user can supply the interesting regions directly through this option. The subalignments given in the table file are treated as "repeats" by the rest of the MaM processing. The table file format should be as follows:
          <sequence_name_1>  <start_loc1>  <end_loc1>
          <sequence_name_2>  <start_loc1>  <end_loc1>
          ...
          <sequence_name_n>  <start_loc1>  <end_loc1>
          ...
          <sequence_name_1>  <start_loc2>  <end_loc2>
          <sequence_name_2>  <start_loc2>  <end_loc2>
          ...
      The user can give as many "interesting" regions as needed; note that the start and end locations must be coordinates in the original sequence, not coordinates on the alignment. Click here for the sample table file generated for the above input files, which marks each entire sequence as an "interesting" region.
    • convert (native to MaM): This option can be used if the user only intends to convert file formats. No other processing is done; the input alignment is simply converted to the formats specified by the -clustal, -nexus, -mega, -fasta, -phylip, -html and -identity options.
    • none (native to MaM): When this option is selected, MaM does no processing at all. This option is useful only in combination with the -alnstats and/or -consensus option(s) described below.
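    For the table file option above, region coordinates often start out as alignment columns and have to be converted to positions in the original (ungapped) sequences. The sketch below shows one way to do that conversion and to produce lines in the three-column layout shown earlier. All function names are hypothetical, and the choices of 1-based inclusive coordinates, "-" as the gap character, and two spaces as the column separator are assumptions, since the text does not state them.

```python
def alignment_to_sequence_pos(aligned_seq, column, gap="-"):
    """Map a 1-based alignment column to a 1-based position in the ungapped sequence.

    Returns None when the sequence has a gap in that column.
    """
    if aligned_seq[column - 1] == gap:
        return None
    # Position = column index minus the gaps seen up to and including this column.
    return column - aligned_seq[:column].count(gap)


def whole_sequence_regions(alignment, gap="-"):
    """One region per sequence, covering the whole ungapped sequence."""
    return [(name, 1, len(seq.replace(gap, ""))) for name, seq in alignment.items()]


def table_lines(regions):
    """Format (sequence_name, start, end) tuples as whitespace-separated lines."""
    return [f"{name}  {start}  {end}" for name, start, end in regions]


if __name__ == "__main__":
    aln = {"seq1": "AC--GT", "seq2": "ACTTGT"}
    print("\n".join(table_lines(whole_sequence_regions(aln))))
```

    The resulting lines could be saved to a text file and passed to MaM as the table file.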
  5. Divergence Rate Calculation:
    MaM displays the quality of alignment by sliding a window (whose size is determined by the user) through the whole alignment. MaM computes the divergence score for the alignment confined in each window position and displays how this score varies over the whole alignment. For calculating the divergence score within a window, the user has three choices:
    • pairwise deletion: The first option uses the sum-of-pairs score: every pair of bases in the same column of the alignment is compared, and the ratio of diverging pairs (not counting gaps) is reported.
    • complete deletion: The second option is the same, except that all columns containing a gap/deletion are ignored.
    • parsimony score: The last option returns the ratio of bases that differ from the most frequently occurring base in the same column.
    Example: Let a given column of an alignment of 10 sequences be ATTG--GTCT. For this column, the pairwise deletion option compares every pair of bases (excluding gaps): AT, AT, AG, AG, AT, AC, AT; TT, TG, TG, TT, TC, TT; etc., and counts the diverging pairs, returning a sum-of-pairs score of 21. The complete deletion option discards the whole column since it includes at least one (in this case, two) gap, thus returning a sum-of-pairs score of 0. It would, however, return the same value as the pairwise deletion option for any column that does not include a gap. The last option, parsimony score, first determines that T is the most abundant character in that column and then counts the non-T characters (here including the two gaps), returning 6. These scores are then normalized with respect to the number of sequences and the window length to compute the divergence rate at the end of the process.
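    The three scoring options and the sliding window can be sketched as follows. This is a literal reading of the definitions above, not MaM's source code. Counting directly, the example column has 28 non-gap pairs, of which 21 diverge. The parsimony function counts gaps as differing characters, which is what the value 6 in the example implies. The final normalization is left out because its exact formula is not given, and stopping at the last full window is an assumption.

```python
from collections import Counter
from itertools import combinations

GAP = "-"


def pairwise_deletion(column):
    """Number of mismatching pairs among the non-gap characters of a column."""
    bases = [c for c in column if c != GAP]
    return sum(a != b for a, b in combinations(bases, 2))


def complete_deletion(column):
    """Same as pairwise deletion, but a column with any gap contributes 0."""
    return 0 if GAP in column else pairwise_deletion(column)


def parsimony(column):
    """Characters that differ from the most frequent base (gaps counted as differing)."""
    counts = Counter(c for c in column if c != GAP)
    top = counts.most_common(1)[0][1] if counts else 0
    return len(column) - top


def window_scores(seqs, score, ww=100, sw=10):
    """Yield (0-based start column, summed raw score) for each full window."""
    ncols = len(seqs[0])
    columns = ["".join(s[i] for s in seqs) for i in range(ncols)]
    for start in range(0, max(ncols - ww, 0) + 1, sw):
        yield start, sum(score(c) for c in columns[start:start + ww])


if __name__ == "__main__":
    col = "ATTG--GTCT"
    print(pairwise_deletion(col), complete_deletion(col), parsimony(col))
```

    The ww and sw parameters mirror the -ww and -sw options, with the same defaults of 100 and 10.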
  6. List of options:
    MaM is menu-driven for ease of use. However, the same functionality is also available through command line options, which lets the user call MaM from other scripts:
    • -program: One of repeatmasker, crossmatch, sim4, table, convert, or none (if table is selected, the user must also supply a table file via the -exonfile option).
    • -exonfile: If table is selected for the program option, exonfile is a table of motif begin and end locations. Otherwise, the user imports a cDNA file using this parameter; in that case the selected program (repeatmasker, crossmatch, sim4) finds the locations of exons instead of repeats.
    • -update: Toggle updating locations on/off. This option can be selected only if the program is set to table. If set to off, MaM does not update the alignment/sequence coordinates; instead, it uses the alignment coordinates given in the table file. This is useful when the user chooses to cut columns from the alignment without any further processing.
    • -column: single concatenates all the motifs, and multiple extracts each motif in a separate file.
    • -merge: max criterion computes the union of two successive overlapping motifs, and min criterion computes the intersection of two successive overlapping motifs.
    • -keep: If on is selected, the unique (or intronic) portions of the sequences are kept; otherwise (off), the repeat (or exonic) locations are kept.
    • -slider: Toggle sliding window function on/off.
    • -pc: Select pairwise (-pc=p) or complete (-pc=c) deletion; or parsimony score (-pc=s).
    • -sw: Select slide width (default is 10).
    • -ww: Select window size (default is 100).
    • -clustal, -nexus, -mega, -fasta, -phylip, -html, -identity: Toggle output formats Clustal, NEXUS, MEGA, Fasta, Phylip, HTML and consensus identity dot representation respectively.
    • -alnstats: Toggle Alignment Statistics File Dump on/off. A text file containing statistical information is written with the extension ".stats". The user can select -program=none if no further processing is needed. The statistics file includes: the number of sequences in the alignment, the alignment length, the sequence with minimum length and its name, the sequence with maximum length and its name, the average length, the number of perfectly conserved columns, the percentage of perfectly conserved columns; and a table that shows, for each sequence, its length and its G+C percentage.
    • -consensus: Toggle consensus sequence output on/off. If set to on, the consensus sequence is written to a separate file in Fasta format. The user can select -program=none if no further processing is needed.
    • -cgaps: Toggle gaps in consensus sequence on/off. If set to on, the consensus sequence includes a gap wherever the gap is the most abundant character in the column. If set to off, no gaps are included in the consensus; for columns where the gap character is the most abundant one, the second most abundant character in that column is reported in the consensus.
    • -include: Toggle including consensus in output on/off. If set to on, the consensus sequence will be included in all output files, as if it is one of the input sequences.
    • -rmasker_opts, -cmatch_opts, and -sim4_opts: Override default options for repeatmasker, cross_match and sim4 respectively.
      Example : -rmasker_opts="-no_is -nolow -q -alu".
    • -defaults: See defaults for options.
    • -h: Help.
    • -V: Verbose (generates a lot of screen output).
    • -v: Version.
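    The interaction between the -consensus and -cgaps options in the list above can be illustrated with a small sketch. It follows the description of the options rather than MaM's source: the function name is hypothetical, ties between equally abundant characters are broken by first occurrence, and a column consisting only of gaps contributes nothing when gaps are switched off (both assumptions).

```python
from collections import Counter

GAP = "-"


def consensus(seqs, cgaps=True):
    """Majority-rule consensus of equal-length aligned sequences.

    cgaps=True  : keep a gap where the gap is the most abundant character.
    cgaps=False : fall back to the second most abundant character instead.
    """
    out = []
    for i in range(len(seqs[0])):
        # Characters of this column, most abundant first.
        ranked = [ch for ch, _ in Counter(s[i] for s in seqs).most_common()]
        if ranked[0] == GAP and not cgaps:
            ranked = ranked[1:]
        if ranked:  # empty only for an all-gap column with cgaps off
            out.append(ranked[0])
    return "".join(out)


if __name__ == "__main__":
    aln = ["A-GT", "A-GA", "ACTA"]
    print(consensus(aln, cgaps=True))
    print(consensus(aln, cgaps=False))
```

    With gaps switched off, the consensus can be shorter than the alignment only if some column contains nothing but gaps.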
  7. Sample Run:
    The input alignment file used in this sample can be downloaded here. This sample run uses repeatmasker as the external program and the NPIP file as the exonfile. The sliding window size is set to 1000 bases, and each window is moved by 100 bases. Two runs are executed:
 

 

 

     