Problem Description:
The gene sequences at the beginning and end of this issue consist of eight characters, comprising the elements "A", "C", "G", and "T". Furthermore, there is a repository of authorized gene alterations. For a mutation to be valid, the gene must be included in the repository. A mutation is characterized as the modification of a single character within the gene sequence.
Determining the minimum number of mutations required to convert the initial gene into the target gene is the primary objective. If it is impossible to achieve this using authentic mutations from the gene bank exclusively, the function should output -1. It is crucial to note that each progression from the initial gene must incorporate a mutation available in the gene bank, while the initial gene itself remains valid even if absent.
A gene string mutation occurs when there is a modification of a single character within the set of possible characters. An example of this mutation is changing 'AACCGGTT' to 'AACCGGTA'.
Sense of intuition:
- Either Breadth-First Search (BFS) or Depth-First Search (DFS) can be used to solve this problem. However, BFS is more practical for determining the shortest path in an unweighted graph, which is comparable to determining the least amount of mutations required.
- The solution's underlying idea is to see every gene string as a graph's node and every legitimate mutation as an edge joining two nodes.
- Using BFS, we examine the graph level by level, working from the start gene to the end gene.
- Because BFS determines the shortest path (least mutations) from the start to the end node, it is perfect for this kind of situation.
- First, initialize a set from the bank to rapidly determine whether a mutation is legitimate.
- Start with a queue that has the start gene initialized in it and a mutation count of 0.
- After that, check a dictionary to determine the potential mutations for each character. Dequeue an element, and then iterate through its characters, which alter each one individually based on the potential for mutation.
- Enqueue a new mutation with a count incremented by one if it is legitimate (in the bank).
- Give back the number of mutations if we get to the last gene.
- Return -1 if the queue is filled up without the last gene being found.
- By applying this BFS method, we explore every potential gene mutation in the fewest steps possible, guaranteeing, if feasible, the fewest mutations required to reach the target gene.
The key steps of BFS are as follows:
Approach to Solve :
The technique employed in the solution, known as Breadth-First Search (BFS), is applied to explore tree or graph data structures. Prior to moving to nodes at the subsequent depth, it examines the neighboring nodes at the current depth. The implementation encompasses various data structures, algorithms, and methodologies.
- The setup of a set is derived from the repository, providing an O(1) efficiency in verifying the presence of a gene string.
A deque, short for double-ended queue, is employed to apply Breadth-First Search (BFS). It follows the First In, First Out (FIFO) principle by adding items at one end and removing them from the opposite end.
Hash Table:
- In order to display possible alterations stemming from a single character, a hash map links every character to three other characters. This simplifies the process of determining the viability of mutations within a genetic string at any specific location.
Traversing the Graph Level by Level:
The process involves exploring the graph one level at a time. It assesses the nodes sequentially using the Breadth-First Search (BFS) technique, considering all adjacent gene sequences reachable from the current gene sequence through a single mutation.
Early Termination:
The function calculates the number of iterations (steps) needed to reach this state immediately upon finding the final gene while exploring.
Graph Trimming:
When a fresh and valid gene sequence is identified within the repository, it is removed from the collection to avoid revisiting gene sequences and potentially entering into loops. This ensures that each gene sequence is visited only once.
Example:
Let's consider an example to demonstrate the smallest genetic mutation in C++.
#include <bits/stdc++.h>
using namespace std;
class Solution {
public:
vector <string> putStrt(string st){
vector <string> r;
for(int x=0; x<st.size();x++){
string temp = st.substr(0, x) + "*" + st.substr(x + 1);
r.push_back(temp);
}
return r;
}
int minMut(string start, string end, vector<string>& bank) {
unordered_map < string, vector <string> > graph;
for(int x=0;x<bank.size();x++){
string st = bank[x];
vector <string> out = putStrt(bank[x]);
for(int y=0; y<out.size();y++){
graph[out[y]].push_back(st);
}
}
queue <string> q;
q.push(start);
set <string> visited;
visited.insert(start);
for(int lvl = 1; !q.empty(); lvl++){
int sz = q.size();
while(sz--){
string node = q.front();
q.pop();
vector <string> out = putStrt(node);
for(int i = 0; i < out.size(); i++){
string u = out[i];
for(int j = 0; j < graph[u].size(); j++){
string v = graph[u][j];
if(visited.count(v)) continue;
if(v == end) return lvl;
visited.insert(v);
q.push(v);
}
}
}
}
return -1;
}
};
int main(){
Solution ob;
vector<string> v = {"AACCGGTA", "AACCGCTA", "AAACGGTA"};
cout << (ob.minMut("AACCGGTT", "AAACGGTA", v));
}
Output: