187. Repeated DNA Sequences

Difficulty:
Related Topics:
Similar Questions:

    Problem

    The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'.

    When studying DNA, it is useful to identify repeated sequences within the DNA.

    Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

      Example 1:

    Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
    Output: ["AAAAACCCCC","CCCCCAAAAA"]
    

    Example 2:

    Input: s = "AAAAAAAAAAAAA"
    Output: ["AAAAAAAAAA"]
    

      Constraints:

    Solution

    class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            if (s.length() < 10) {
                return Collections.emptyList();
            }
            boolean[] seen = new boolean[1024 * 1024];
            boolean[] added = new boolean[1024 * 1024];
            char[] chars = s.toCharArray();
            int buf = 0;
            int[] map = new int[128];
            map['A'] = 0;
            map['C'] = 1;
            map['G'] = 2;
            map['T'] = 3;
            List<String> ans = new ArrayList<>(s.length() / 2);
            for (int i = 0; i < 10; i++) {
                buf = (buf << 2) + map[chars[i]];
            }
            seen[buf] = true;
            for (int i = 10; i < chars.length; i++) {
                buf = ((buf << 2) & 0xFFFFF) + map[chars[i]];
                if (seen[buf]) {
                    if (!added[buf]) {
                        ans.add(new String(chars, i - 9, 10));
                        added[buf] = true;
                    }
                } else {
                    seen[buf] = true;
                }
            }
            return ans;
        }
    }
    

    Explain:

    nope.

    Complexity: