Leetcode 438. Find All Anagrams in a String
Clarification
Okay, so first I’ll restate the problem to make sure I understand it correctly. We are given two strings, s and p. We need to find all starting indices in s where the substring is an anagram of p. An anagram means it uses exactly the same characters with the same frequencies, but the order can be different.
Should I return a list of starting indices or the substrings?
If no valid answer exists, should I return an empty list?
What kind of characters do they have?
Can they be empty string? Can p be longer than s?
Brute-force Approach
A brute force solution would be: check every substring of length p.length(), count the characters, and compare it with p’s frequency count. But this gives us O(mn) because we have to recount every substring from scratch.
Optimized Approach
Since this is a substring problem, we can consider to use sliding window approach trying to optimize it.
What I’ll do is that, first keep a hash map to store the required characters and their amounts of p. I believe we can optimize the hash map into a bucket array because of the limited characters, but I'll save it later when I start to code. I'll maintain a window [left, right] with a fixed size as the same length of p. As I move the right pointer, I add the new character into the window. If the window size becomes larger than p, I remove the leftmost character. To check whether the window is valid or not, I'll need a variable count to store how many characters of p has been included in the window. Therefore, if count is equal to the length of p, the window should be valid.
So why does it work? There are three most important things for a sliding window: when to increase the window; when to shrink the window; when to update the result. For this question we use a fixed-size window. First we increase the window to the length of p, required and count has been updated correspondingly. We can think of required like a chart that shows us what characters need and their amounts to form a valid answer. Whenever we add a new character, we have to decrease its require by one. And if that character is in p, which can be captured by looking into the required array to see if it has a positive value, we need to increase count by one to show that our window gets a character of p. If it's a valid window b comparing count and p.length(), we update the result. And then we move the left pointer by 1 to keep sliding our fixed-size window. We also need to update required and count to keep the information correct by doing the reverse things that we did when increasing the window.
Since both pointers only move forward and never backward, each character enters and leaves the window at most once, this gives us O(n) time. The buildDictionary function cost O(m) time. So the time complexity is O(m + n). The extra space we used is the required array which contains m elements. So the space complexity is O(m).
Code
class Solution {
public List<Integer> findAnagrams(String s, String p) {
int left = 0, right = 0;
int[] required = buildDictionary(p);
int count = 0;
List<Integer> result = new ArrayList<>();
while (right < s.length()) {
while (right < s.length() && right - left < p.length()) {
char cr = s.charAt(right);
if (required[cr] > 0) count++;
required[cr]--;
right++;
}
if (count == p.length())
result.add(left);
char cl = s.charAt(left);
if (required[cl] >= 0) count--;
required[cl]++;
left++;
}
return result;
}
private int[] buildDictionary(String p) {
int[] required = new int[128];
for (char c : p.toCharArray())
required[c]++;
return required;
}
}