28. Find The Index of The First Occurrence in a String - Explanation

Problem Link

Description

You are given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Example 1:

Input: haystack = "neetcodeneetcode", needle = "neet"

Output: 0

Explanation: "neet" occurs at index 0 and 8.
The first occurrence is at index 0, so we return 0.

Example 2:

Input: haystack = "neetcode", needle = "codem"

Output: -1

Constraints:

  • 1 <= haystack.length, needle.length <= 10,000
  • haystack and needle consist of only lowercase English characters.


Topics

Company Tags

Please upgrade to NeetCode Pro to view company tags.



Prerequisites

Before attempting this problem, you should be comfortable with:

  • String Manipulation - Basic operations like substring comparison and character indexing
  • KMP Algorithm (Optional) - Understanding the Longest Proper Prefix Suffix (LPS) array for efficient pattern matching
  • Rolling Hash / Rabin-Karp (Optional) - Using hash functions to compare substrings in O(1) time

1. Brute Force

Intuition

The simplest way to find a substring is to try every possible starting position in the haystack. At each position, we compare characters one by one with the needle. If all characters match, we found our answer. If any character doesn't match, we move to the next starting position and try again.

Algorithm

  1. Iterate through each valid starting position i in the haystack (from 0 to n - m, where n is the haystack length and m is the needle length).
  2. For each starting position, compare characters of the haystack starting at i with characters of the needle.
  3. If all characters match (we reach the end of the needle), return the starting position i.
  4. If any character doesn't match, break out of the inner loop and try the next starting position.
  5. If no match is found after checking all positions, return -1.
class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        n, m = len(haystack), len(needle)
        for i in range(n - m + 1):
            j = 0
            while j < m:
                if haystack[i + j] != needle[j]:
                    break
                j += 1
            if j == m:
                return i
        return -1

Time & Space Complexity

  • Time complexity: O(nm)O(n * m)
  • Space complexity: O(1)O(1)

Where nn is the length of the string heystackheystack and mm is the length of the string needleneedle.


2. Knuth-Morris-Pratt (KMP) Algorithm

Intuition

The brute force approach wastes work by restarting from scratch after each mismatch. KMP improves this by preprocessing the needle to build a "longest proper prefix which is also suffix" (LPS) array. When a mismatch occurs, the LPS array tells us how many characters we can skip, leveraging the pattern structure to avoid redundant comparisons.

Algorithm

  1. Build the LPS array for the needle:
    • Initialize an array of the same length as the needle, filled with zeros.
    • Use two pointers: prevLPS tracks the length of the current longest prefix-suffix, and i scans through the needle.
    • If characters match, increment both pointers and store the value. If they don't match and prevLPS > 0, fall back to the previous LPS value.
  2. Search for the needle in the haystack:
    • Use pointer i for the haystack and j for the needle.
    • If characters match, advance both pointers.
    • If they don't match and j > 0, use the LPS array to determine where to continue matching in the needle.
    • If j reaches the needle length, we found a match; return i - m.
  3. Return -1 if no match is found.
class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        if needle == "": return 0
        lps = [0] * len(needle)

        prevLPS, i = 0, 1
        while i < len(needle):
            if needle[i] == needle[prevLPS]:
                lps[i] = prevLPS + 1
                prevLPS += 1
                i += 1
            elif prevLPS == 0:
                lps[i] = 0
                i += 1
            else:
                prevLPS = lps[prevLPS - 1]

        i = 0  # ptr for haystack
        j = 0  # ptr for needle
        while i < len(haystack):
            if haystack[i] == needle[j]:
                i, j = i + 1, j + 1
            else:
                if j == 0:
                    i += 1
                else:
                    j = lps[j - 1]

            if j == len(needle):
                return i - len(needle)

        return -1

Time & Space Complexity

  • Time complexity: O(n+m)O(n + m)
  • Space complexity: O(m)O(m)

Where nn is the length of the string heystackheystack and mm is the length of the string needleneedle.


3. Z-Algorithm

Intuition

The Z-algorithm computes, for each position in a string, the length of the longest substring starting from that position that matches a prefix of the string. By concatenating the needle, a separator character, and the haystack, any position where the Z-value equals the needle length indicates a match. The algorithm uses a "Z-box" to track previously computed matches and skip redundant comparisons.

Algorithm

  1. Concatenate the needle, a special separator (like $), and the haystack into a single string.
  2. Compute the Z-array for this combined string:
    • Maintain a Z-box defined by [l, r] representing the rightmost substring that matches a prefix.
    • For each position i, use the Z-box to initialize z[i] when possible.
    • Extend z[i] by comparing characters directly.
    • Update the Z-box if the current match extends beyond the previous bounds.
  3. Scan the Z-array starting after the needle and separator. If z[i] equals the needle length, return the corresponding position in the haystack.
  4. Return -1 if no match is found.
class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        if not needle:
            return 0

        s = needle + "$" + haystack
        n = len(s)
        z = [0] * n
        l, r = 0, 0

        for i in range(1, n):
            if i <= r:
                z[i] = min(r - i + 1, z[i - l])
            while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                z[i] += 1
            if i + z[i] - 1 > r:
                l, r = i, i + z[i] - 1

        for i in range(len(needle) + 1, n):
            if z[i] == len(needle):
                return i - len(needle) - 1

        return -1

Time & Space Complexity

  • Time complexity: O(n+m)O(n + m)
  • Space complexity: O(n+m)O(n + m)

Where nn is the length of the string heystackheystack and mm is the length of the string needleneedle.


4. Rabin-Karp Algorithm (Rolling Hash)

Intuition

Instead of comparing characters one by one, we can compare hash values of substrings. If we compute the hash of the needle and the hash of each window in the haystack, matching hashes suggest a potential match. The key insight is using a rolling hash: when sliding the window by one position, we can update the hash in O(1) time by removing the contribution of the outgoing character and adding the incoming one. Using two different hash functions reduces false positives.

Algorithm

  1. Compute the hash of the needle using two different hash functions (different bases and moduli).
  2. Compute the hash of the first window in the haystack (same length as the needle).
  3. Precompute the power values needed to remove the leftmost character's contribution.
  4. Slide the window through the haystack:
    • If both hashes match the needle's hashes, return the current position.
    • Update the rolling hash by removing the leftmost character and adding the new rightmost character.
    • Handle negative values from the modulo operation.
  5. Return -1 if no match is found.
class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        if not needle:
            return 0

        base1, mod1 = 31, 768258391
        base2, mod2 = 37, 685683731

        n, m = len(haystack), len(needle)
        if m > n:
            return -1

        power1, power2 = 1, 1
        for _ in range(m):
            power1 = (power1 * base1) % mod1
            power2 = (power2 * base2) % mod2

        needle_hash1, needle_hash2 = 0, 0
        haystack_hash1, haystack_hash2 = 0, 0

        for i in range(m):
            needle_hash1 = (needle_hash1 * base1 + ord(needle[i])) % mod1
            needle_hash2 = (needle_hash2 * base2 + ord(needle[i])) % mod2
            haystack_hash1 = (haystack_hash1 * base1 + ord(haystack[i])) % mod1
            haystack_hash2 = (haystack_hash2 * base2 + ord(haystack[i])) % mod2

        for i in range(n - m + 1):
            if haystack_hash1 == needle_hash1 and haystack_hash2 == needle_hash2:
                return i

            if i + m < n:
                haystack_hash1 = (haystack_hash1 * base1 - ord(haystack[i]) * power1 + ord(haystack[i + m])) % mod1
                haystack_hash2 = (haystack_hash2 * base2 - ord(haystack[i]) * power2 + ord(haystack[i + m])) % mod2

                haystack_hash1 = (haystack_hash1 + mod1) % mod1
                haystack_hash2 = (haystack_hash2 + mod2) % mod2

        return -1

Time & Space Complexity

  • Time complexity: O(n+m)O(n + m)
  • Space complexity: O(1)O(1)

Where nn is the length of the string heystackheystack and mm is the length of the string needleneedle.


Common Pitfalls

Off-by-One Errors in Loop Bounds

When iterating through the haystack, the loop should run from 0 to n - m inclusive. A common mistake is using n - m - 1 or n - 1 as the upper bound. The former misses the case where the needle appears at the very end, while the latter causes out-of-bounds access when comparing characters.

Not Handling Empty Needle

When the needle is an empty string, the expected return value is 0 (the empty string is found at the beginning of any string). Forgetting to check for this edge case can lead to incorrect behavior or infinite loops in some implementations.