May 012017
 

String matching is a big research area and there are many data structure (eg. B-Tree, HashMap, Set) that help indexing of strings. There are also many algorithms used for string matching (like KMP, etc.)

The major application of Trie data structure is in storing a dictionary (Collection of strings) such the searching for a word in dictionary become O(k) time operation where k is the number of characters in the word.A HashMap can also do the same, but it can only check if the exact word exists or not. Trie, on the other hand, has many more applications,

  • like trie can be used in prefix-based search (e.g find all valid words with a particular prefix), so it can be used in autocomplete.
  • It takes less space than a Hash-table, because overlapping prefixes of multiple words, share common storage.
  • Trie can check that every word you type is in the dictionary or not.

In this case, Trie is an ordered general tree (not binary tree) where each node has 26 (total number of alphabets) children. Each child is either NULL or a Trie in itself.

Each node of a trie also has an indicator to say, weather or not it is the end-of-word. All leaf nodes are end of words, but there can be a non-leaf node at which a word is terminating.

Below picture represent a Trie.

Each node represent a character, but the path (till a red-node) represent a word. So AT and ATE are both valid words.

The structure of Trie node is as below:

#define NO_OF_ALPHABETS 26
struct TrieNode
{
   TrieNode * link[NO_OF_ALPHABETS];
   bool isEndOfWord;
}

isEndOfWord field is true, if the node represent an end of word. Else, it is false. In the picture, this value is true for all the red-nodes.

It is understood that link[i] represent the (i+1)’th character.

Potentially, there can be any alphabet that may come after the word.

In real implementation, usually we keep a map (Map<Char, TrieNode*>) rather than array to optimise the space. Below are some points about Trie:

  • Root represent the empty string (which is prefix of all the strings).
  • Each Node in the Trie represent either the end of word or a prefix.

If node x is parent of another node y then, String ending at x (represented by node x) is prefix of string represented by Node-y.

Inserting a word in Trie

Inserting is simple, for each character in the trie we follow the path if that character is already present. If that character is not present, then we insert the corresponding nodes, while inserting the last node, we also set the value of isEndOfWord to true.

For example, if we want to insert a new word ‘badge‘ in the previous trie, then path till bad already exist, after that we need to insert two nodes for g and e as shown below.

This is a simple recursion, where we move forward if the character is already present, and add node if it is not present till we reach end of the string.

Code

// TrieNode structure
struct TrieNode
{
   TrieNode* child[NO_OF_ALPHABETS];
   bool isEndOfWord;
   
   // Default constructor, set all the child nodes to NULL
   TrieNode():isEndOfWord(false){
      for(int i=0; i<NO_OF_ALPHABETS; i++)
         child[i] = NULL;
   }
};

// Insert the word in Trie
void insert(TrieNode* root, char* word)
{
   for(int i=0; word[i] != '\0'; i++)
   {
      if(root->child[word[i]-'a'] == NULL)
      {
         root->child[word[i]-'a'] = new TrieNode;
      }
      root = root->child[word[i]-'a'];
   }
   root->isEndOfWord = true;
}

Searching for a word in a Trie

If we want to search whether a word is present in the Trie or not, then we just need to keep tracing the path in Trie corresponding to the characters in word.

  • If there is no path (for example, if the word is bedroom, then till bed we will be able to trace the path, but there is no character path for r after bed, and we return false, indicating that this word is not present in the Trie).
  • If our word has exhausted, The path is also present, but the last node is not EndOfWord then also it is false. For example, if we are searching for word be, then path for be is present, but value of isEndOfWord for e node is false, which indicate that word does not terminate at e. Hence be is not a valid word in Trie.
  • The success case is when there exist a path and the last node in the path is also end-of-word.

Code

// Check if a word is present in the Trie or not
bool isValidWord(TrieNode* r, char* str)
{
   if(str == NULL )
      return true;
   if(*str == '\0')
      return r->isEndOfWord;
   
   if(r->child[*str - 'a'] == NULL)
      return false;
   else
      return isValidWord( r->child[*str - 'a'], str+1);
}

 

 

 

  One Response to “Trie data structure”

Comments (1)
  1. yo
    #include
    using namespace std;
    typedef long long int ll;

    int main()
    {
    ios_base::sync_with_stdio(0);
    cin.tie(0);cout.tie(0);
    ll t;
    scanf(“%lld”,&t);
    while(t–)
    {
    ll n,i,count1=0,lo=0;
    scanf(“%lld”,&n);
    for(i=0;i1){
    count1++;

    }
    }
    if(count1>=2||count1==1&&lo==0)
    printf(“no”);
    else
    printf(“yes”);
    printf(“\n”);
    }
    return 0;
    }

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

(required)

(required)