Rolling hash is simply a drawstring hashing method that updates a substring hash successful changeless clip arsenic the model moves. It matters because hunt systems, plagiarism detectors, log scanners, and DNA series devices often request accelerated shape matching complete millions of characters. After reading, you tin instrumentality it safely and explicate its trade-offs.
String algorithms usually comparison characters 1 by one, but hashing converts substrings into compact numeric fingerprints. Rolling hash builds connected hashing and powers the Rabin-Karp algorithm, substring equality checks, copy detection, palindrome queries, and two-dimensional shape matching successful grids aliases images.
You will beryllium capable to take bases and moduli, compute prefix hashes, tally Rabin-Karp, grip collisions, comparison substrings successful O(1), and reply communal GATE-style and question and reply questions pinch clear complexity analysis.
Core Concepts
Rolling hash is not 1 isolated trick. It combines drawstring encoding, polynomial hashing, prefix precomputation, modular arithmetic, sliding model updates, and collision-aware verification. The array beneath covers the modular variants and usage cases you should cognize earlier implementing a production-quality aliases interview-ready solution.
1.Character Encoding
String hashing originates by converting characters into integers. The hash usability does not understand letters directly; it useful connected numbers. For lowercase English letters, a communal mapping is simply a = 1, b = 2, and truthful on. For wide Unicode text, utilizing the codification constituent pinch ord() is safer, though it whitethorn require normalization erstwhile visually identical matter tin person different byte representations.
A acquainted illustration is searching for a customer sanction wrong a CRM export: the connection “Riya” must first go numbers earlier hashing tin begin. An industry-specific illustration is scanning infirmary discharge summaries wherever tokens specified arsenic “HbA1c” and “insulin” should beryllium encoded consistently earlier aesculapian grounds deduplication.
Never representation a valid characteristic to zero successful polynomial hashing unless you afloat understand the consequence. Leading zero values tin make different strings easier to confuse, particularly erstwhile prefixes are involved.Code Example
2.Polynomial Hashing
Polynomial hashing treats a drawstring for illustration a number written successful a civilization base. For a drawstring specified arsenic “abc”, 1 communal shape is 1 × p² + 2 × p¹ + 3 × p⁰, computed nether a ample modulus to support numbers small. The guidelines spreads characteristic positions apart, while the modulus prevents integer overflow and keeps operations efficient.
A acquainted illustration is checking whether 2 saved coupon codes successful an e-commerce app are apt identical earlier doing a afloat drawstring comparison. A much industry-specific illustration is matching invoice reference strings successful banking reconciliation, wherever millions of transaction remarks whitethorn request speedy equality checks.
A modular GATE/interview mobility asks for the clip complexity of hashing a drawstring of magnitude n. Computing 1 polynomial hash from scratch is O(n); comparing already computed hash values is O(1), taxable to collision handling.Code Example
3.Prefix Hashes
Prefix hashes precompute the hash of each prefix of a string, allowing immoderate substring hash to beryllium extracted successful O(1). If H[i] stores the hash of the first one characters, the substring from l to r tin beryllium obtained by subtracting the earlier prefix aft aligning powers of the base.
A acquainted illustration is checking repeated phrases wrong a agelong WhatsApp export without comparing each characteristic each time. A SaaS-specific illustration is comparing way patterns successful API gateway logs, wherever substrings of petition paths specified arsenic “/v1/payments” must beryllium matched quickly.
The astir communal prefix-hash bug is forgetting to multiply the near prefix by the correct guidelines powerfulness earlier subtraction. The look must align some hashes to the aforesaid positional scale.Code Example
4.Rolling Window Hash
A rolling model hash updates the hash of a fixed-length substring erstwhile the model shifts by 1 character. Instead of recomputing the hash for each model successful O(m), it removes the outgoing character, shifts the remaining contribution, and adds the incoming character. This is the halfway capacity betterment down accelerated shape matching.
A acquainted illustration is detecting whether a short OTP template appears wrong SMS matter while scanning inbox messages. An observability illustration is searching a repeated correction signature wrong ample server logs streamed from Kubernetes pods.
For a model of magnitude m, precompute base^(m-1). That worth is needed to region the outgoing characteristic publication earlier shifting the window.Code Example
5.Rabin-Karp Algorithm
The Rabin-Karp algorithm uses rolling hash to hunt a shape of magnitude m wrong a matter of magnitude n. It hashes the shape once, past rolls a model of size m crossed the text. When the hash matches, it verifies the existent substring to debar mendacious positives caused by collisions.
A acquainted illustration is uncovering a train number shape wrong an IRCTC notification message. A content-platform illustration is scanning uploaded article matter for a copyrighted paragraph snippet earlier publication.
The modular answer: Rabin-Karp has expected O(n + m) clip pinch a bully hash and verification only connected hash matches. The worst lawsuit is O(nm) erstwhile galore collisions unit repeated afloat comparisons.Code Example
6.Collision Handling
A collision occurs erstwhile 2 different strings nutrient the aforesaid hash. Hashing is accelerated because it compresses a perchance agelong drawstring into a fixed-size value, but that compression intends collisions are mathematically possible. Correct shape matching codification must either verify campaigner substrings aliases usage a collision-reduction strategy specified arsenic double hashing.
A acquainted illustration is 2 different nutrient point names successful a Zomato paper producing the aforesaid fingerprint by chance. A banking-security illustration is simply a fraud scanner comparing transaction remarks; a hash lucifer should trigger verification, not automatic classification arsenic identical text.
Never opportunity rolling hash guarantees equality. Equal strings person adjacent hashes, but adjacent hashes do not ever beryllium adjacent strings unless the original substring is besides verified aliases a collision-free domain-specific method is used.Code Example
7.Double Hashing
Double hashing computes 2 independent hash values, usually pinch different ample premier moduli. A substring is considered a campaigner lucifer only erstwhile some hashes match. This does not make collisions impossible, but it makes accidental collisions acold little apt than a azygous hash successful emblematic programming-contest and question and reply settings.
A acquainted illustration is comparing PAN-like identifiers successful a cleaned dataset wherever 1 hash whitethorn beryllium excessively risky for deduplication. An ed-tech illustration is detecting copy reply snippets crossed thousands of submissions, wherever mendacious matches tin unfairly emblem students unless the strategy verifies aliases uses stronger hashing.
Double hashing is simply a applicable compromise: faster than afloat comparison for each model and safer than 1 modulus. In security-sensitive applications, usage cryptographic hashes aliases nonstop verification alternatively of relying connected polynomial hashes alone.Code Example
8.Base And Modulus
The guidelines and modulus power some distribution and performance. A mini guidelines tin create much building successful the hash values, while a mediocre modulus tin summation collisions. Common believe is to usage a ample random-looking guidelines and 1 aliases 2 ample premier moduli. In C++, unsigned 64-bit overflow is besides communal because arithmetic people wraps astir modulo 2^64.
A acquainted illustration is hashing merchandise SKUs successful an online market catalogue wherever galore strings stock prefixes specified arsenic “MILK”. A healthcare-analytics illustration is hashing lab-test codes wherever systematic prefixes for illustration “CBC” and “LFT” make bully distribution important.
Use the aforesaid base, modulus, and characteristic mapping for each hash you compare. Hashes computed nether different parameters are not comparable.Code Example
9.Multiple Pattern Search
Rabin-Karp tin hunt aggregate patterns efficiently erstwhile the patterns person the aforesaid length. Hash each shape and shop those hashes successful a group aliases dictionary. Then rotation 1 model crossed the matter and cheque whether the model hash belongs to the pattern-hash set. For different shape lengths, group patterns by magnitude and tally 1 rolling scan per group.
A acquainted illustration is scanning SMS messages for respective bank-alert phrases of adjacent length. A cybersecurity illustration is detecting a group of known malicious bid fragments successful ammunition history logs from unreality servers.
For k patterns of adjacent magnitude m, Rabin-Karp tin hash each patterns successful O(km) and scan the matter successful O(n), pinch verification connected campaigner matches. If shape lengths differ, group by length.Code Example
10.Palindrome Queries
Forward-reverse hashing answers palindrome queries by comparing the hash of a substring successful the original drawstring pinch the corresponding hash successful the reversed string. If some hashes match, the substring is simply a palindrome candidate. With double hashing aliases verification, this becomes a accelerated method for repeated palindrome checks.
A acquainted illustration is checking whether usernames incorporate palindromic segments for a puzzle app. A bioinformatics illustration is identifying reverse-complement-like patterns successful simplified DNA strings aft transforming symbols into a comparable representation.
Index conversion is the main palindrome-hash trap. A substring text[l:r] maps to reversed_text[n-r:n-l] successful the reversed string.Code Example
11.Longest Duplicate Substring
Rolling hash tin beryllium mixed pinch binary hunt to find the longest copy substring. The thought is to conjecture a length, hash each substring of that length, and cheque whether immoderate hash repeats pinch verification. If a copy exists, effort a longer length; otherwise, effort a shorter length.
A acquainted illustration is uncovering repeated clauses successful a rental statement draft. A publishing-platform illustration is locating copied paragraphs crossed uploaded manuscripts wherever repeated substrings tin bespeak reused content.
The tested shape is binary hunt connected reply positive rolling hash check. The emblematic expected complexity is O(n log n), assuming O(n) copy checking for each guessed length.Code Example
12.Two-Dimensional Hashing
Two-dimensional rolling hash extends the aforesaid thought to matrices. First hash horizontal sequences successful each row, past rotation vertically complete those row-hash values. This lets you hunt for a smaller grid shape wrong a larger grid, which is useful for images, committee games, maps, and system tables.
A acquainted illustration is uncovering a mini word-search artifact wrong a schoolhouse puzzle grid. An industry-specific illustration is detecting a repeated layout part successful screenshot testing for a fintech mobile app.
Two-dimensional hashing is conceptually conscionable nested rolling hash: 1 magnitude compresses rows, and the 2nd magnitude compresses the series of statement hashes.Code Example
13.Dynamic String Hashes
Static prefix hashes activity erstwhile the drawstring does not change. For editable strings, you request a information building that supports updates and range-hash queries, specified arsenic a Fenwick character aliases conception tree. Each characteristic contributes its worth multiplied by a positional power, and updates set only the affected position.
A acquainted illustration is checking a changing hunt query while a personification types successful a shopping app. A collaborative-document illustration is maintaining accelerated fingerprints of paragraphs arsenic aggregate editors modify shared argumentation documents.
For move strings, mention the trade-off clearly: prefix hashes springiness O(1) queries but costly updates; Fenwick aliases conception trees springiness O(log n) updates and O(log n) scope queries.Code Example
Rolling hash speeds up campaigner detection, not logical equality. For correctness, harvester it pinch accordant parameters, modular arithmetic discipline, and verification aliases double hashing erstwhile collisions matter.Complexity Summary
Rolling hash is celebrated because it separates preprocessing from querying. Once prefix hashes and powers are ready, substring hashes are constant-time. Rabin-Karp uses this spot to debar comparing the shape against each matter position characteristic by character.
Common Pitfalls
Most rolling hash bugs are not syntax errors; they are mathematical mismatches. Wrong scale ranges, antagonistic modulo results, inconsistent bases, aliases skipped verification tin nutrient answers that walk mini tests but neglect hidden cases.
A rolling hash implementation that passes sample tests tin still beryllium incorrect if it treats modulo subtraction otherwise crossed languages. Python normalizes pinch %, while C and C++ whitethorn request definitive summation of mod earlier taking %.Learning Path
Build rolling hash successful layers. Start pinch mean hashing and modular arithmetic, past move to prefix hashes, Rabin-Karp, collision handling, and precocious applications specified arsenic copy substring hunt and move updates.
Frequently Asked Questions
What is rolling hash?
Rolling hash is simply a hashing method that updates the hash of a moving substring without recomputing it from scratch. It is utilized successful accelerated shape matching, substring equality, copy detection, and algorithms specified arsenic Rabin-Karp.
What is the quality betwixt drawstring hashing and rolling hash?
String hashing converts a afloat drawstring aliases substring into a numeric fingerprint. Rolling hash is simply a circumstantial benignant of drawstring hashing wherever the fingerprint tin beryllium updated efficiently erstwhile a model slides by 1 position.
How does the Rabin-Karp algorithm usage rolling hash?
Rabin-Karp hashes the shape erstwhile and compares it pinch the rolling hash of each matter model of the aforesaid length. When hashes match, the algorithm verifies the existent substring to debar errors caused by collisions.
Is rolling hash amended than KMP?
Rolling hash is elastic and easy to widen to aggregate patterns, substring queries, and copy detection. KMP gives deterministic O(n + m) matching without collisions for a azygous pattern, while Rabin-Karp has expected O(n + m) but tin degrade pinch galore collisions.
Can 2 different strings person the aforesaid hash?
Yes. That business is called a collision. Polynomial hashing reduces the chance of collisions pinch bully parameters, and double hashing reduces it further, but correctness-sensitive codification should still verify campaigner matches.
Which guidelines and modulus should I choose?
Use a guidelines larger than the alphabet size aliases a ample random-looking base, and usage a ample premier modulus specified arsenic 1,000,000,007 aliases 1,000,000,009. For safer comparisons, usage 2 moduli and comparison the brace of hashes.
Why does Rabin-Karp person worst-case O(nm) time?
If galore matter windows person the aforesaid hash arsenic the pattern, the algorithm must verify galore campaigner substrings. Each verification tin return O(m), truthful repeated collisions tin origin O(nm) worst-case time.
When should I usage rolling hash successful interviews?
Use rolling hash erstwhile the problem asks for repeated substring comparison, copy substring search, plagiarism-style matching, palindrome queries, aliases aggregate same-length shape search. Always mention collision handling because interviewers often trial whether you cognize hashing is probabilistic.
Key Takeaways
Rolling hash turns substring comparison into accelerated numeric comparison by utilizing polynomial hashing, prefix hashes, and constant-time sliding-window updates. Rabin-Karp applies this thought to shape matching by hashing the shape erstwhile and rolling crossed the text. Double hashing, observant base-modulus selection, and nonstop verification are basal for collision-safe implementations.
For GATE and interviews, attraction connected the nonstop substring-hash formula, rolling update formula, expected O(n + m) Rabin-Karp complexity, worst-case O(nm) collision behaviour, and the quality betwixt hash equality and drawstring equality. Problems specified arsenic longest copy substring and palindrome queries often harvester rolling hash pinch binary hunt aliases reverse hashing.
The earthy adjacent measurement is to practise classical drawstring algorithms broadside by side: instrumentality naive search, Rabin-Karp, KMP, and prefix-hash substring queries connected the aforesaid trial cases, past comparison correctness, complexity, and collision behaviour.
English (US) ·
Indonesian (ID) ·