The Bitap Algorithm, also referred to as the Shift-Or Algorithm, is a method for searching strings that effectively carries out approximate string matching. This technique is especially valuable for locating a specific pattern within a body of text even in cases where the pattern may contain errors or deviations. Developed in 1980, the Bitap Algorithm was created by John Colquhoun and Gaston Gonnet.
Example:
Let's consider a C++ program that demonstrates the implementation of the Bitap Algorithm for matching strings.
#include <string>
#include <map>
#include <iostream>
using namespace std;
int bitmap_search(string t, string p) {
int m = p.length();
long p_mask[300];
long A = ~1;
if (m == 0)
return -1;
if (m >63) {
cout<<"Pattern is too long!";//if pattern is too long
return -1;
}
for (int i = 0; i <= 299; ++i)
p_mask[i] = ~0;
for (int i = 0; i < m; ++i)
p_mask[p[i]] &= ~(1L << i);
for (int i = 0; i < t.length(); ++i) {
A |= p_mask[t[i]];
A <<= 1;
if ((A & (1L << m)) == 0)
return i - m + 1;
}
return -1;
}
void findPattern(string t, string p) {
int position = bitmap_search(t, p);//initialize the position with the function bitmap_search
if (position == -1)
cout << "\nNo Match\n";
else
cout << "\nPattern found at position : " << position;
}
int main(int argc, char **argv) {
cout << "Enter Text:\n";
string t;
cin >>t;
cout << "Enter Pattern:\n";
string p;
cin >>p;
findPattern(t, p);
}
Output:
Explanation:
This script demonstrates the utilization of the Bitap Algorithm for String Matching. The program includes the following variables: 't' for the input text, 'p' for the input pattern, 'm' indicating the pattern's length, 'p_mask' as an array containing bit masks for individual pattern characters, 'A' for performing bitwise operations, and 'position' indicating the location where the pattern is discovered in the text.
The function 'bitmap_search' is designed to accept the string 't' and the pattern 'p', returning the position of the pattern within the text. If the pattern is not found, it will return -1. This function leverages bit manipulation techniques to conduct the search operation. By creating bit masks for each character in the pattern, it then utilizes bitwise operations to determine the pattern's position within the text.
In addition to 'bitmapsearch', there is another function called the 'find pattern' function. This function takes the input text 't' and the pattern 'p', checking if the pattern exists within the text and displaying its position if found. The 'find pattern' function interacts with the 'bitmapsearch' function by invoking it and then presenting the result. In cases where the pattern is located, the function showcases the position. However, if the pattern is not found, it outputs 'No Match'.
The primary function accepts two string parameters, which represent the user's input string and pattern, respectively. Following this, it invokes the findpattern function to locate the specified pattern and present the outcome.
Example 2:
Let's explore an alternative approach to implementing the Bitap Algorithm for String Matching in C++.
#include <iostream>
#include <bitset>
#include <vector>
using namespace std;
const int ALPHABET_SIZE = 256;
class Bitap {
public:
Bitap(const string& pattern) : pattern_(pattern), patternLength_(pattern.length()) {
preprocess();
}
void search(const string& text) {
int textLength = text.length();
bitset<ALPHABET_SIZE> R(1);
for (int i = 0; i < textLength; ++i) {
R = ((R << 1) & textMasks_[text[i]]) | bitset<ALPHABET_SIZE>(1);
if (R.test(patternLength_ - 1)) {
cout << "Pattern found at index " << i - patternLength_ + 1 << endl;
}
}
}
private:
void preprocess() {
textMasks_.resize(ALPHABET_SIZE, 0);
for (int i = 0; i < patternLength_; ++i) {
textMasks_[pattern_[i]] |= bitset<ALPHABET_SIZE>(1) << i;
}
}
string pattern_;
int patternLength_;
vector<bitset<ALPHABET_SIZE>> textMasks_;
};
int main() {
string text, pattern;
cout << "Enter the text: ";
getline(cin, text);
cout << "Enter the pattern: ";
getline(cin, pattern);
Bitap bitap(pattern);
bitap.search(text);
return 0;
}
Output:
Explanation:
This software module showcases the implementation of the Bitap Algorithm, a powerful tool for string matching tasks. It efficiently scans through a given text to locate a specific pattern, even when the match is approximate. Within this module, the Bitap class is defined, featuring a constructor designed to receive patterns and compute their respective lengths. The class also offers two essential functions: Preprocess and Search.
The Preprocess function is responsible for generating bit masks for each character in the pattern, storing them in the textMasks_vector data structure. Subsequently, the Search function utilizes bitwise operations to identify occurrences of the pattern within the text, ultimately displaying the positions where matches are found.
In the main function of this program, user input for both the text and pattern is solicited. Following this input acquisition, the main function instantiates a Bitap class object with the specified pattern, prompts the user for the necessary text input, and triggers the Search function of the Bitap class to pinpoint and display the positions of pattern matches within the text.