@@ -11,85 +11,84 @@ rss_descr = "Solving Rosalind problem CONS — finding a consensus string from a
 
 > ** The Problem**
 >
-> A matrix is a rectangular table of values divided into rows and columns.
-> An m×n matrix has m rows and n columns.
-> Given a matrix A, we write Ai,j.
-> to indicate the value found at the intersection of row i and column j.
-
-> Say that we have a collection of DNA strings,
-> all having the same length n.
-> Their profile matrix is a 4×n matrix P in which P1,
-> j represents the number of times that 'A' occurs in the jth position of one of the strings,
-> P2,j represents the number of times that C occurs in the jth position,
-> and so on (see below).
-
-> A consensus string c is a string of length n
-> formed from our collection by taking the most common symbol at each position;
-> the jth symbol of c therefore corresponds to the symbol having the maximum value
-> in the j-th column of the profile matrix.
-> Of course, there may be more than one most common symbol,
-> leading to multiple possible consensus strings.
+> A matrix is a rectangular table of values divided into rows and columns.
+> An m×n matrix has m rows and n columns.
+> Given a matrix A, we write Ai,j.
+> to indicate the value found at the intersection of row i and column j.
+
+> Say that we have a collection of DNA strings,
+> all having the same length n.
+> Their profile matrix is a 4×n matrix P in which P1,
+> j represents the number of times that 'A' occurs in the jth position of one of the strings,
+> P2,j represents the number of times that C occurs in the jth position,
+> and so on (see below).
+
+> A consensus string c is a string of length n
+> formed from our collection by taking the most common symbol at each position;
+> the jth symbol of c therefore corresponds to the symbol having the maximum value
+> in the j-th column of the profile matrix.
+> Of course, there may be more than one most common symbol,
+> leading to multiple possible consensus strings.
 >
-> ### DNA Strings
-> ```
-> A T C C A G C T
-> G G G C A A C T
-> A T G G A T C T
-> A A G C A A C C
-> T T G G A A C T
-> A T G C C A T T
-> A T G G C A C T
-> ```
+> ** DNA Strings**
+> ```
+> A T C C A G C T
+> G G G C A A C T
+> A T G G A T C T
+> A A G C A A C C
+> T T G G A A C T
+> A T G C C A T T
+> A T G G C A C T
+> ```
 >
-> ### Profile
-> ```
-> A 5 1 0 0 5 5 0 0
-> C 0 0 1 4 2 0 6 1
-> G 1 1 6 3 0 1 0 0
-> T 1 5 0 0 0 1 1 6
-> ```
+> ** Profile**
+> ```
+> A 5 1 0 0 5 5 0 0
+> C 0 0 1 4 2 0 6 1
+> G 1 1 6 3 0 1 0 0
+> T 1 5 0 0 0 1 1 6
+> ```
 >
-> ### Consensus
-> ```A T G C A A C T```
+> **Consensus**
+> ```
+> A T G C A A C T
+> ```
 >
-> **Given:**
-> A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
+> **Given:**
+> A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
 >
-> **Return:**
-> A consensus string and profile matrix for the collection.
-> (If several possible consensus strings exist,
-> then you may return any one of them.)
+> **Return:**
+> A consensus string and profile matrix for the collection.
+> (If several possible consensus strings exist,
+> then you may return any one of them.)
 >
-> **Sample Dataset***
->
-> ```
-> >Rosalind_1
-> ATCCAGCT
-> >Rosalind_2
-> GGGCAACT
-> >Rosalind_3
-> ATGGATCT
-> >Rosalind_4
-> AAGCAACC
-> >Rosalind_5
-> TTGGAACT
-> >Rosalind_6
-> ATGCCATT
-> >Rosalind_7
-> ATGGCACT
-> ```
->
-> **Sample Output**
-> ```
-> ATGCAACT
-> ```
->
-> ```
-> A: 5 1 0 0 5 5 0 0
-> C: 0 0 1 4 2 0 6 1
-> G: 1 1 6 3 0 1 0 0
-> T: 1 5 0 0 0 1 1 6
-> ```
+> **Sample Dataset**
+> ```
+> >Rosalind_1
+> ATCCAGCT
+> >Rosalind_2
+> GGGCAACT
+> >Rosalind_3
+> ATGGATCT
+> >Rosalind_4
+> AAGCAACC
+> > Rosalind_5
+> TTGGAACT
+> >Rosalind_6
+> ATGCCATT
+> >Rosalind_7
+> ATGGCACT
+> ```
+> **Sample Output**
+> ```
+> ATGCAACT
+> ```
+> ```
+> A: 5 1 0 0 5 5 0 0
+> C: 0 0 1 4 2 0 6 1
+> G: 1 1 6 3 0 1 0 0
+> T: 1 5 0 0 0 1 1 6
+> ```
 
 
 The first thing we will need to do is read in the input fasta.
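
For readers following the hunk without the rest of the post, here is a minimal Python sketch of the FASTA-reading step and of the profile and consensus definitions quoted above. The post's own code is not visible in this diff, so the language choice and the names `parse_fasta`, `profile_matrix`, `consensus`, and `SAMPLE` are assumptions made for illustration, not the author's implementation.

```python
# Hypothetical helpers; names and structure are illustrative assumptions.

SAMPLE = """>Rosalind_1
ATCCAGCT
>Rosalind_2
GGGCAACT
>Rosalind_3
ATGGATCT
>Rosalind_4
AAGCAACC
>Rosalind_5
TTGGAACT
>Rosalind_6
ATGCCATT
>Rosalind_7
ATGGCACT
"""


def parse_fasta(text):
    """Map each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Tolerate a stray space after '>' in the header line.
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            # Sequences may be wrapped over several lines.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def profile_matrix(seqs, alphabet="ACGT"):
    """Count, per position, how often each symbol occurs across seqs."""
    n = len(seqs[0])
    if any(len(s) != n for s in seqs):
        raise ValueError("all sequences must have the same length")
    profile = {base: [0] * n for base in alphabet}
    for s in seqs:
        for j, base in enumerate(s):
            profile[base][j] += 1
    return profile


def consensus(profile):
    """Pick a most common symbol in each column of the profile."""
    n = len(next(iter(profile.values())))
    # max() keeps the first maximal key, so ties resolve in alphabet order.
    return "".join(max(profile, key=lambda b: profile[b][j]) for j in range(n))


if __name__ == "__main__":
    seqs = list(parse_fasta(SAMPLE).values())
    prof = profile_matrix(seqs)
    print(consensus(prof))
    for base, counts in prof.items():
        print(f"{base}: {' '.join(map(str, counts))}")
```

Run on the sample dataset from the problem statement, this reproduces the sample output shown in the diff (`ATGCAACT` followed by the four profile rows). Because the problem accepts any valid consensus string, the tie-breaking rule here is an arbitrary choice.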