Problem
The DNA sequence is composed of a series of nucleotides abbreviated as 'A'
, 'C'
, 'G'
, and 'T'
.
- For example,
"ACGAATTCCG"
is a DNA sequence.
When studying DNA, it is useful to identify repeated sequences within the DNA.
Given a string s
that represents a DNA sequence, return all the 10
-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Example 1:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]
Example 2:
Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]
Constraints:
1 <= s.length <= 10^5
s[i]
is either'A'
,'C'
,'G'
, or'T'
.
Solution
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
if (s.length() < 10) {
return Collections.emptyList();
}
boolean[] seen = new boolean[1024 * 1024];
boolean[] added = new boolean[1024 * 1024];
char[] chars = s.toCharArray();
int buf = 0;
int[] map = new int[128];
map['A'] = 0;
map['C'] = 1;
map['G'] = 2;
map['T'] = 3;
List<String> ans = new ArrayList<>(s.length() / 2);
for (int i = 0; i < 10; i++) {
buf = (buf << 2) + map[chars[i]];
}
seen[buf] = true;
for (int i = 10; i < chars.length; i++) {
buf = ((buf << 2) & 0xFFFFF) + map[chars[i]];
if (seen[buf]) {
if (!added[buf]) {
ans.add(new String(chars, i - 9, 10));
added[buf] = true;
}
} else {
seen[buf] = true;
}
}
return ans;
}
}
Explain:
nope.
Complexity:
- Time complexity : O(n).
- Space complexity : O(n).