Before attempting this problem, you should be comfortable with:
Sliding Window - Maintaining a fixed-size window that slides across the string to extract substrings
Hash Set / Hash Map - Tracking seen sequences and counting occurrences for duplicate detection
String Manipulation - Extracting substrings efficiently for comparison and storage
1. Hash Set
Intuition
We need to find all 10-letter sequences that appear more than once. A hash set naturally tracks what we have seen before. As we slide a window of length 10 across the string with index l, we check if the current substring cur was already encountered. If so, it is a repeated sequence. Using two sets (one for seen sequences and one for res results) avoids adding duplicates to our answer.
Algorithm
Initialize two sets: seen to track encountered sequences and res to store repeated ones.
Iterate through the string with a sliding window of size 10 using index l.
For each position, extract the 10-character substring cur.
classSolution:deffindRepeatedDnaSequences(self, s:str)-> List[str]:
seen, res =set(),set()for l inrange(len(s)-9):
cur = s[l: l +10]if cur in seen:
res.add(cur)
seen.add(cur)returnlist(res)
publicclassSolution{publicList<String>findRepeatedDnaSequences(String s){Set<String> seen =newHashSet<>();Set<String> res =newHashSet<>();for(int l =0; l < s.length()-9; l++){String cur = s.substring(l, l +10);if(seen.contains(cur)){
res.add(cur);}
seen.add(cur);}returnnewArrayList<>(res);}}
classSolution{public:
vector<string>findRepeatedDnaSequences(string s){if(s.size()<10)return{};
unordered_set<string> seen, res;for(int l =0; l < s.size()-9; l++){
string cur = s.substr(l,10);if(seen.count(cur)){
res.insert(cur);}
seen.insert(cur);}returnvector<string>(res.begin(), res.end());}};
classSolution{/**
* @param {string} s
* @return {string[]}
*/findRepeatedDnaSequences(s){const seen =newSet();const res =newSet();for(let l =0; l < s.length -9; l++){const cur = s.substring(l, l +10);if(seen.has(cur)){
res.add(cur);}
seen.add(cur);}return Array.from(res);}}
publicclassSolution{publicIList<string>FindRepeatedDnaSequences(string s){var seen =newHashSet<string>();var res =newHashSet<string>();for(int l =0; l < s.Length -9; l++){string cur = s.Substring(l,10);if(seen.Contains(cur)){
res.Add(cur);}
seen.Add(cur);}return res.ToList();}}
funcfindRepeatedDnaSequences(s string)[]string{
seen :=make(map[string]bool)
res :=make(map[string]bool)for l :=0; l <len(s)-9; l++{
cur := s[l : l+10]if seen[cur]{
res[cur]=true}
seen[cur]=true}
result :=[]string{}for k :=range res {
result =append(result, k)}return result
}
class Solution {funfindRepeatedDnaSequences(s: String): List<String>{val seen = HashSet<String>()val res = HashSet<String>()for(l in0 until s.length -9){val cur = s.substring(l, l +10)if(cur in seen){
res.add(cur)}
seen.add(cur)}return res.toList()}}
classSolution{funcfindRepeatedDnaSequences(_ s:String)->[String]{var seen =Set<String>()var res =Set<String>()let chars =Array(s)for l in0..<(chars.count -9){let cur =String(chars[l..<(l +10)])if seen.contains(cur){
res.insert(cur)}
seen.insert(cur)}returnArray(res)}}
Time & Space Complexity
Time complexity: O(n)
Space complexity: O(n)
2. Hash Map
Intuition
Instead of using two sets, we can use a hash map mp to count occurrences of each sequence. This lets us add a sequence to the res exactly when its count reaches 2, ensuring we only report it once regardless of how many additional times it appears.
Algorithm
If the string length is less than 10, return an empty list.
Initialize a hash map mp to count occurrences and a result list res.
Slide a window of size 10 across the string using index l.
For each substring cur, increment its count in mp.
When the count becomes exactly 2, add the substring to res.
classSolution:deffindRepeatedDnaSequences(self, s:str)-> List[str]:iflen(s)<10:return[]
mp = defaultdict(int)
res =[]for l inrange(len(s)-9):
cur = s[l: l +10]
mp[cur]+=1if mp[cur]==2:
res.append(cur)return res
publicclassSolution{publicList<String>findRepeatedDnaSequences(String s){if(s.length()<10){returnnewArrayList<>();}Map<String,Integer> mp =newHashMap<>();List<String> res =newArrayList<>();for(int l =0; l < s.length()-9; l++){String cur = s.substring(l, l +10);
mp.put(cur, mp.getOrDefault(cur,0)+1);if(mp.get(cur)==2){
res.add(cur);}}return res;}}
classSolution{public:
vector<string>findRepeatedDnaSequences(string s){if(s.size()<10){return{};}
unordered_map<string,int> mp;
vector<string> res;for(int l =0; l < s.size()-9; l++){
string cur = s.substr(l,10);
mp[cur]++;if(mp[cur]==2){
res.push_back(cur);}}return res;}};
classSolution{/**
* @param {string} s
* @return {string[]}
*/findRepeatedDnaSequences(s){if(s.length <10){return[];}const mp =newMap();const res =[];for(let l =0; l <= s.length -10; l++){const cur = s.substring(l, l +10);
mp.set(cur,(mp.get(cur)||0)+1);if(mp.get(cur)===2){
res.push(cur);}}return res;}}
publicclassSolution{publicIList<string>FindRepeatedDnaSequences(string s){if(s.Length <10){returnnewList<string>();}var mp =newDictionary<string,int>();var res =newList<string>();for(int l =0; l <= s.Length -10; l++){string cur = s.Substring(l,10);if(!mp.ContainsKey(cur)) mp[cur]=0;
mp[cur]++;if(mp[cur]==2){
res.Add(cur);}}return res;}}
funcfindRepeatedDnaSequences(s string)[]string{iflen(s)<10{return[]string{}}
mp :=make(map[string]int)
res :=[]string{}for l :=0; l <=len(s)-10; l++{
cur := s[l : l+10]
mp[cur]++if mp[cur]==2{
res =append(res, cur)}}return res
}
class Solution {funfindRepeatedDnaSequences(s: String): List<String>{if(s.length <10){returnemptyList()}val mp = HashMap<String, Int>()val res = mutableListOf<String>()for(l in0..s.length -10){val cur = s.substring(l, l +10)
mp[cur]= mp.getOrDefault(cur,0)+1if(mp[cur]==2){
res.add(cur)}}return res
}}
classSolution{funcfindRepeatedDnaSequences(_ s:String)->[String]{if s.count <10{return[]}var mp =[String:Int]()var res =[String]()let chars =Array(s)for l in0...(chars.count -10){let cur =String(chars[l..<(l +10)])
mp[cur,default:0]+=1if mp[cur]==2{
res.append(cur)}}return res
}}
Time & Space Complexity
Time complexity: O(n)
Space complexity: O(n)
3. Rabin-Karp Algorithm (Double Hashing)
Intuition
Storing full 10-character strings as keys can be memory intensive. The Rabin-Karp algorithm computes a rolling hash for each window, allowing us to represent each sequence as a number instead. Double hashing (using two different hash bases) minimizes collision probability, making numeric comparisons reliable. As the window slides with index i, we efficiently update the hashes hash1 and hash2 by removing the contribution of the outgoing character and adding the incoming one.
Algorithm
If the string length is less than 10, return an empty list.
Precompute the power values power1 and power2 for both hash bases (raised to the 9th power for the leading digit).
Initialize two rolling hashes hash1 and hash2 to zero.
Iterate through the string with index i, updating both hashes for each character.
Once the window reaches size 10, combine both hashes into a single key.
Use a hash map cnt to count occurrences of each key.
When a key appears the second time, add the corresponding substring to res.
Slide the window by subtracting the outgoing character's hash contribution.
DNA sequences use only four characters (A, C, G, T), each of which can be encoded with just 2 bits. A 10-character sequence therefore fits in 20 bits, well within a single integer. By treating each sequence as a mask, we avoid storing strings entirely and get fast integer operations for comparisons and hashing.
Algorithm
If the string length is less than 10, return an empty list.
Map each nucleotide to a 2-bit value: A=0, C=1, G=2, T=3 in map mp.
Initialize a mask to zero and a hash map cnt for counting.
For each character at index i, shift the mask left by 2 bits and OR with the character's value.
Mask the result with 0xFFFFF to keep only the rightmost 20 bits (10 characters).
Once the window reaches size 10, increment the count for this mask.
When a mask appears the second time, add the corresponding substring to res.
classSolution:deffindRepeatedDnaSequences(self, s:str)->list[str]:iflen(s)<10:return[]
mp ={'A':0,'C':1,'G':2,'T':3}
seen, res =set(),set()
mask =0for i inrange(len(s)):
mask =((mask <<2)| mp[s[i]])&0xFFFFFif i >=9:if mask in seen:
res.add(s[i -9: i +1])else:
seen.add(mask)returnlist(res)
publicclassSolution{publicList<String>findRepeatedDnaSequences(String s){if(s.length()<10)returnnewArrayList<>();Map<Character,Integer> mp =Map.of('A',0,'C',1,'G',2,'T',3);Map<Integer,Integer> cnt =newHashMap<>();List<String> res =newArrayList<>();int mask =0;for(int i =0; i < s.length(); i++){
mask =((mask <<2)| mp.get(s.charAt(i)))&0xFFFFF;if(i >=9){
cnt.put(mask, cnt.getOrDefault(mask,0)+1);if(cnt.get(mask)==2){
res.add(s.substring(i -9, i +1));}}}return res;}}
classSolution{/**
* @param {string} s
* @return {string[]}
*/findRepeatedDnaSequences(s){if(s.length <10)return[];const mp ={A:0,C:1,G:2,T:3};const cnt =newMap();const res =[];let mask =0;for(let i =0; i < s.length; i++){
mask =((mask <<2)| mp[s[i]])&0xfffff;if(i >=9){
cnt.set(mask,(cnt.get(mask)||0)+1);if(cnt.get(mask)===2){
res.push(s.substring(i -9, i +1));}}}return res;}}
publicclassSolution{publicIList<string>FindRepeatedDnaSequences(string s){if(s.Length <10)returnnewList<string>();var mp =newDictionary<char,int>{{'A',0},{'C',1},{'G',2},{'T',3}};var cnt =newDictionary<int,int>();var res =newList<string>();int mask =0;for(int i =0; i < s.Length; i++){
mask =((mask <<2)| mp[s[i]])&0xFFFFF;if(i >=9){if(!cnt.ContainsKey(mask)) cnt[mask]=0;
cnt[mask]++;if(cnt[mask]==2){
res.Add(s.Substring(i -9,10));}}}return res;}}
funcfindRepeatedDnaSequences(s string)[]string{iflen(s)<10{return[]string{}}
mp :=map[byte]int{'A':0,'C':1,'G':2,'T':3}
cnt :=make(map[int]int)
res :=[]string{}
mask :=0for i :=0; i <len(s); i++{
mask =((mask <<2)| mp[s[i]])&0xFFFFFif i >=9{
cnt[mask]++if cnt[mask]==2{
res =append(res, s[i-9:i+1])}}}return res
}
class Solution {funfindRepeatedDnaSequences(s: String): List<String>{if(s.length <10)returnemptyList()val mp =mapOf('A'to0,'C'to1,'G'to2,'T'to3)val cnt = HashMap<Int, Int>()val res = mutableListOf<String>()var mask =0for(i in s.indices){
mask =((mask shl2)or mp[s[i]]!!)and0xFFFFFif(i >=9){
cnt[mask]= cnt.getOrDefault(mask,0)+1if(cnt[mask]==2){
res.add(s.substring(i -9, i +1))}}}return res
}}
classSolution{funcfindRepeatedDnaSequences(_ s:String)->[String]{let chars =Array(s)if chars.count <10{return[]}let mp:[Character:Int]=["A":0,"C":1,"G":2,"T":3]var cnt =[Int:Int]()var res =[String]()var mask =0for i in0..<chars.count {
mask =((mask <<2)| mp[chars[i]]!)&0xFFFFFif i >=9{
cnt[mask,default:0]+=1if cnt[mask]==2{
res.append(String(chars[(i -9)...(i)]))}}}return res
}}
Time & Space Complexity
Time complexity: O(n)
Space complexity: O(n)
Common Pitfalls
Adding Duplicates to the Result
When a sequence appears three or more times, it should only be added to the result once. Using a single set without tracking whether the sequence was already added to the result causes duplicates in the output.
Off-by-One Errors in Window Boundaries
The sliding window must extract exactly 10 characters. Common mistakes include using s[l:l+9] instead of s[l:l+10], or iterating with range(len(s) - 10) instead of range(len(s) - 9), which skips valid windows or causes index errors.
Incorrect Bit Mask Width in the Optimized Approach
When using bit manipulation, each nucleotide requires 2 bits, so a 10-character sequence needs 20 bits. Using the wrong mask value (such as 0xFFFF for 16 bits instead of 0xFFFFF for 20 bits) causes hash collisions between different sequences.