Algorithm & Data Structure - Rolling Hash

Rabin-Karp algorithm

Image how the binary string converts to an integer number: \(011101 = 0 \times 2^5 + 1 \times 2 ^ 4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 29\)

What we did here is actually using an integer to uniquely represent an binary string, and it is like to take a hash of this binary string. Rabin-Karp hashing algorithm uses the same concept buth with more characters. Given a string \(s\), it’s hash could been taken by: \(Hash(s) = (p^{n-1} \times s_0 + p^{n-2} \times s_1 ..... + 1 \times s_n)\% mod\)

where p could be a number that more than the maximum character. Given the string might be very long, it is better to take modulus of some large prime number to avoid overflow.

Rolling Hash

Given a string \(p\), search whether it appears in a long string \(s\). A number of ways to solve this problem, but rolling hash could result in \(O(n)\) time complextity. Rolling hash is to calculate hash value of a sliding window (length of m) with fix length. The window add one character in front and remove one character at the back for every move, and thus we don’t need to recalculate the hash value every time, we only need to add a new value and remove one from the existing hash value. \(h_{i+1...j+1} = (h_{i...j} \times p - (p^{m-1} \times s_{i})\%mod + s_{j+1} + mod)\%mod\) We need to multiply the old hash value by \(p\), and less value from the first character \(s_i\), and then add the new hash value from the last character.

Rolling Hash/Array is able to reduce time complextity of numerous string or array related problems.

Examples

  • Leetcode - 1698

Leetcode-1698

Brutal force would require to generate all substrings and then put then into a HashSet. Note that HashSet need to calculate the hash of each of the substring, which takes \(O(n)\), with all substrings, the total complextity is \(O(n^3)\).

Two observations here:

  • If two substring are the same, then they must have the same length;
  • With fix length, we can use sliding window with rolling hash to save all substrings, and it takes only \(O(1)\).

Therefore, we could solve this by \(O(n^2)\)

 public int countDistinct(String s) {
        int count = 0;
        long prime = 261;

        long mod = (long)Math.pow(2, 40); // use a large enough number to avoid hash collision
        
        // fix substring window length to generate hash
        for (int len = 1; len <= s.length(); len++) {
            
            long h = 0;
            for (int i = 0; i < len; i++) {
                h = (h*prime + s.charAt(i))%mod;
            }
            Set<Long> seen = new HashSet<>();
            seen.add(h);
            count++;
            long p = 1; 
            for (int i = 0; i < len; i++) {
                p = (p*prime)%mod;
            }
            for (int i = 1; i + len <= s.length(); i++) {
                h = (h * prime - (p*s.charAt(i-1))%mod + s.charAt(i + len - 1) + mod)%mod;
                if (!seen.contains(h)){
                    count++;
                }
                seen.add(h);
            }
        }

        return count;
    }
  • Leetcode - 1392

Leetcode-1392

Rolling hash from both ends and compare if the two hashes are equal. Note that suffix substring’s hash need to calculated in a slightly different way.

public String longestPrefix(String s) {
        long prime = 131;
        long mod = (long)Math.pow(2,32);
        int n = s.length();

        long hl = 0, hr = 0;
        long p = 1;

        int maxLen = 0;
        for (int i = 0; i < n-1; i++) {
            hl = (hl*prime + s.charAt(i))%mod;

            // suffix substring hash value
            hr = (hr + s.charAt(n-i-1)*p)%mod;

            if (hl == hr) maxLen = i+1;

            p = (p*prime)%mod;
        }

        return s.substring(0, maxLen);      
        
    }
  • Leetcode-214

Leetcode-214

Palindrome string could also be detected by rolling hash, by calculating the hash value from begin to end and from end to begin. If both hashes are equal, then it is a palindrome string. With this idea, we could calculate the maximum palindrome length from the begin of the string, and obtain the shortest palindrome from there.

public String shortestPalindrome(String s) {
        long prime = 261;
        long mod = (long)Math.pow(2, 32);
        long hl = 0, hr = 0;
        
        long p = 1;
        int maxLen = 0;
        for (int i = 0; i < s.length(); i++) {
            hl = (hl*prime + s.charAt(i))%mod;
            hr = (p*s.charAt(i) + hr) %mod;
            p = (p*prime)%mod;
            if(hl == hr) {
                maxLen = Math.max(maxLen, i+1);
            }
        }
        
        StringBuilder ans = new StringBuilder(s);
        for (int i = maxLen; i < s.length(); i++) {
            ans.insert(0, s.charAt(i));
        }
        return ans.toString();
    }
  • Leetcode - 1923

Leetcode-1923

The longest common subpath does not exceed the shortest path in all the paths. We can use that as a base and test if its subpath is in all other friends’s path. Since the subpath need to be continous, if a subpath is in all paths, then all shorter subpath also exist, so we only need to test longer subpath. Binary search is a good way to save time in this scienario.

How do we verify if a subpath with long \(L\) is in all friends’s path? We can fix the length, and use sliding windows to test all subpaths with the same length \(L\), and verify if it exists in all friends’s paths. This reminders us to use Rolling Hash to save time.

public int longestCommonSubpath(int n, int[][] paths) {
        
        int mLen = Integer.MAX_VALUE;
        for (int[] path : paths) {
            mLen = Math.min(mLen, path.length);
        }
        
        int i = 1, j = mLen + 1;
        int res = 0;
        while (i < j) {
            int mid = i + (j - i)/2;
            if (isCommonPath(paths, mid)) {
                i = mid + 1;
                res = mid;
            } else {
                j = mid;
            }
        }
        
        return res;
    }
    
    private boolean isCommonPath(int[][] paths, int len) {
        long prime = 100003;
        long mod = (long)Math.pow(10, 11)+7;
        Map<Long, Integer> pathFreq = new HashMap<>();
        
        for (int[] path : paths) {
            long hash = 0;
            long p = 1;
            Set<Long> uniquePath = new HashSet<>();
            for (int i = 0; i < len; i++) {
                hash = (hash * prime + path[i])%mod;
                p = (p*prime)%mod;
            }
            uniquePath.add(hash);
            for (int i = 1; i + len <= path.length; i++) {
                hash = ((hash * prime)%mod - (path[i-1]*p)%mod + path[i+len-1] + mod) %mod;
                uniquePath.add(hash);
            }
            for (long h : uniquePath) {
                pathFreq.put(h, pathFreq.getOrDefault(h, 0) + 1);
                if (pathFreq.get(h) == paths.length) return true;
            }
            
        }
        return false;
    }
Written on September 24, 2021