DNA Sequencer (aka "Genomatic")
Summary
Develop an FPGA-based design that reads data from an embedded memory and
determines the frequency of occurrence of substrings defined in a
second memory. The frequency of each substring is recorded in an
individual register, which can be accessed using the switches on the Zybo board.
Technical Details
A set of substrings ("codons") will be defined in a coefficients (COE) file for a 32x4-bit memory (the Codon Memory).
Each substring will consist of between one and five nibbles (4-bit
"nucleotides"), delimited in memory by a single 'F', and there will be a
maximum of six codons defined. Remaining locations will be filled with 'F'.
A second memory, also initialized by a COE file, will contain the data to be read. This is known as the Gene Memory and will be organized as a 256x4-bit memory.
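The memory layout described above can be sketched in Python as a way to generate test images. This is only an illustration, not part of the assignment: the function names `pack_codons` and `to_coe` are hypothetical, and the output assumes the usual Xilinx COE keywords (`memory_initialization_radix`, `memory_initialization_vector`) with one hexadecimal nibble per entry.

```python
def pack_codons(codons, depth=32):
    """Lay codons out from address 0, one 'F' after each, then pad with 'F'.

    `codons` is a list of lists of 4-bit values (hypothetical input format).
    """
    nibbles = []
    for codon in codons:
        nibbles.extend(codon)
        nibbles.append(0xF)  # single 'F' delimiter after every codon
    if len(nibbles) > depth:
        raise ValueError("codons do not fit in the Codon Memory")
    # Remaining locations are filled with 'F'.
    return nibbles + [0xF] * (depth - len(nibbles))


def to_coe(nibbles):
    """Render a list of nibbles as COE text (assumed Xilinx syntax, radix 16)."""
    vector = ",\n".join(format(n, "X") for n in nibbles)
    return (
        "memory_initialization_radix=16;\n"
        "memory_initialization_vector=\n" + vector + ";\n"
    )
```

For example, `to_coe(pack_codons([[1, 2, 3], [4]]))` yields a 32-entry image whose first six nibbles are 1, 2, 3, F, 4, F. The same `to_coe` helper could be applied to a 256-entry list for the Gene Memory.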
Upon reset, the FPGA design will begin accessing the memories and
determine the number of occurrences within the gene of each codon
defined in the Codon Memory.
The end of the gene in the Gene Memory will be indicated by two consecutive
nibbles of value 'F'. (Remaining entries are undefined.)
Upon completion, the FPGA will light a DONE light, at which
point the codon frequencies (i.e., counts) are available through
internal Registers 1-6. The contents of a specific register can be
read by applying the appropriate register ID using switches two
through zero. The status of the DONE signal is available on LED0 through "virtual" register zero.
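A software reference model can be handy for checking the hardware's register values against a known answer. The sketch below is one possible model in Python, not the required implementation. All function names are hypothetical, memories are assumed to be plain lists of 4-bit integers, and returning 0 for register ID 7 is an assumption, since the text does not define it.

```python
def parse_codons(codon_mem):
    """Split the Codon Memory contents into codons at each 'F' delimiter."""
    codons, current = [], []
    for nibble in codon_mem:
        if nibble != 0xF:
            current.append(nibble)
        elif current:
            codons.append(current)
            current = []
    if current:  # memory ended without a trailing 'F'
        codons.append(current)
    return codons


def gene_body(gene_mem):
    """Return the nibbles before the first pair of consecutive 'F' values."""
    for i in range(len(gene_mem) - 1):
        if gene_mem[i] == 0xF and gene_mem[i + 1] == 0xF:
            return gene_mem[:i]
    raise ValueError("no end marker (two consecutive 'F' nibbles) found")


def count_codons(codon_mem, gene_mem):
    """Count complete occurrences of each codon; counts[0] maps to Register 1."""
    gene = gene_body(gene_mem)
    counts = []
    for codon in parse_codons(codon_mem):
        n = len(codon)
        hits = sum(1 for i in range(len(gene) - n + 1) if gene[i:i + n] == codon)
        counts.append(hits)
    return counts


def read_register(sel, counts, done=True):
    """Model the value selected by switches two through zero (sel = 0..7)."""
    if sel == 0:
        return int(done)           # "virtual" register zero: DONE status
    if 1 <= sel <= len(counts):
        return counts[sel - 1] & 0xF
    return 0                       # assumption: unused register IDs read as 0
```

With codons `[1, 2]` and `[3]` and a gene of `1 2 3 1 F 2 1 2` followed by `F F`, this model gives counts of 2 and 1; the lone `1` before the single `F` is a partial codon and is not counted.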
Constraints
- Definitions of codons and the gene start at address zero.
- Codons will consist of between one and five nibbles.
- No codon will contain the nibble 'F', although that value can occur within the gene.
- Any given nucleotide will appear in at most one codon, i.e., codons won't share nucleotides.
- Any given nucleotide will appear at most once in a given codon.
- The frequency of occurrence of any codon will not exceed 15.
- The total length of the gene will be no larger than 254 nibbles, and the end of the gene is
indicated by two consecutive nibbles of value 'F'. The contents of the Gene Memory after the end marker are undefined.
- Partial codons may occur in the gene and should not be counted.
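The codon-related constraints above can be checked mechanically before a COE file is loaded. The following Python sketch is one way to do that; the function name `check_codons` and the list-of-lists input format are hypothetical, and the messages are illustrative only.

```python
def check_codons(codons):
    """Return a list of constraint violations for a proposed codon set.

    `codons` is a list of lists of 4-bit values; an empty result means
    the set satisfies the constraints checked here.
    """
    problems = []
    if len(codons) > 6:
        problems.append("more than six codons defined")
    seen = set()  # nucleotides already used by earlier codons
    for idx, codon in enumerate(codons, start=1):
        if not 1 <= len(codon) <= 5:
            problems.append(f"codon {idx}: length must be 1 to 5 nibbles")
        if 0xF in codon:
            problems.append(f"codon {idx}: contains the nibble 'F'")
        if len(set(codon)) != len(codon):
            problems.append(f"codon {idx}: a nucleotide repeats within the codon")
        if seen & set(codon):
            problems.append(f"codon {idx}: shares a nucleotide with another codon")
        seen |= set(codon)
    return problems
```

One consequence of these constraints is worth noting: because no nucleotide repeats within a codon or across codons, two complete matches can never overlap in the gene, so a simple left-to-right scan is enough to count them.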
Sample COE files
The sample files are located here. Enjoy!