RAPID-2, An Object Oriented Associative Memory Applicable to Genome Data Processing

Denis Archambaud, Pascal Faudemay

IBP-Masi 1993/32: Rapport de Recherche Masi / Masi research reports
pages - Novembre/November 1999 - Document en anglais.

PostScript : 50 Ko /Kb

Titre / Title: RAPID-2, An Object Oriented Associative Memory Applicable to Genome Data Processing


Résumé :

Abstract : The RAPID-2 system is a set-associative memory organized into virtual circuits.
Such an architecture implies a high flexibility for data storage (structured
data with various types and sizes are supported), and a high-speed execution
possibilities since set associativity involves massive parallelism and an SIMD
organization.

These two features make Rapid-2 a general purpose circuit that can speed-up
applications needing parallel processing over data. Some genome specific
applications can benefit from Rapid -2 to make possible some tasks that
required too much execution time so far.

DNA or proteins sequences alignment is a very important application in nowadays research in biology. Standard sequences alignment usually entails software implementation with disastrous execution time (typically one year). A parallel implementation can solve such a problem. We present the Rapid-2 circuit architecture and functionalities and we detail the implementation of genome sequences alignment. We programmed different algorithms with progressive complexity, adapted to different contexts and requirements.
The first method we programmed is a mere distance calculation between a pattern and a sequence. This distance correspond to the number of pairs to change or delete or insert in the pattern to make it matching with the reference sequence.
This method needs n^2 comparisons (n being the number of pairs in the sequence), a systolic implementation using Rapid-2 parallelism reduces the complexity to 2.n comparisons. We then improved the distance formula to run the Needleman-Wunsch algorithm with gap detection and pair phases calculation. The Dayhoff mutation matrix encoding the distance between all possible pairs can be handled in the circuit with high parallelism. Such complex formulas entirely run in the memory, with minimum transfer between the host computer and the memory.
Execution time evaluations of all these methods are presented. We consider that Rapid-2 could improve nowadays software implementation of a factor 100.


Publications internes Masi 1993 / Masi research reports 1993