Tuesday, 26 March 2013

Interview Series – Poly Indexing

Detecting whether a particular piece of text is present is vital during search operations. Sometimes, finding the location of every repeated occurrence of a search-term is even more vital.

Here, we’ll see how to find the indices of multiple occurrences of a search-term in an input string, both when the occurrences are allowed to overlap and when they are not.

It is fairly straightforward to locate the index of the first and last occurrences of a search-term: we can use the IndexOf and LastIndexOf methods. Finding the second occurrence using only IndexOf, or the penultimate occurrence using only LastIndexOf, is slightly trickier.
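As a rough sketch of the "second occurrence" idea, the helper below keeps calling IndexOf, each time resuming one character past the previous hit. The class name NthOccurrence and the method NthIndexOf are made up for illustration (they are not part of .NET), and the sketch assumes overlapping matches should be counted and that an ordinal comparison is wanted.

```csharp
using System;

public static class NthOccurrence
{
    // Hypothetical helper: returns the zero-based index of the n-th (1-based)
    // occurrence of term in text, counting overlapping matches,
    // or -1 if there are fewer than n occurrences.
    public static int NthIndexOf(string text, string term, int n)
    {
        if (string.IsNullOrEmpty(term) || n < 1) return -1;

        int index = -1;
        for (int found = 0; found < n; found++)
        {
            // Resume one character past the previous hit so overlaps are counted.
            index = text.IndexOf(term, index + 1, StringComparison.Ordinal);
            if (index == -1) return -1;
        }
        return index;
    }
}
```

With the sample string "yyyfgyyyhdyyy" and the search-term "yy", NthIndexOf(text, "yy", 2) gives 1 and NthIndexOf(text, "yy", 3) gives 5.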

Consider the string “yyy”. The search-term “yy” occurs once if overlapping is not allowed, and twice (at indices 0 and 1) if it is. Unfortunately, Regex.Matches with a plain pattern reports only the non-overlapping occurrences of a search-term. Hence we need to write our own custom function to find the total occurrences in the overlapping scenario.
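To make the “yyy” example concrete, here is a minimal sketch that counts both ways. OccurrenceCounter and its two methods are names invented for this illustration; the overlapping version assumes an ordinal comparison and simply restarts the search one character after each hit.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OccurrenceCounter
{
    // Non-overlapping count via Regex; Regex.Escape makes the term literal text.
    public static int CountNonOverlapping(string text, string term)
    {
        if (string.IsNullOrEmpty(term)) return 0;
        return Regex.Matches(text, Regex.Escape(term)).Count;
    }

    // Overlapping count: after each hit, resume the search one character later.
    public static int CountOverlapping(string text, string term)
    {
        if (string.IsNullOrEmpty(term)) return 0;

        int count = 0;
        int index = text.IndexOf(term, StringComparison.Ordinal);
        while (index != -1)
        {
            count++;
            index = text.IndexOf(term, index + 1, StringComparison.Ordinal);
        }
        return count;
    }
}
```

For “yyy” and “yy”, CountNonOverlapping returns 1 and CountOverlapping returns 2.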

Also, there’s no single built-in string method that returns the indices of all occurrences of the same search-term in the entire input, be it in the overlapping or in the non-overlapping scenario.

Please check out the 4 video links, where I go through this simulated interview in more detail.

The code typed in during the interview series is as follows, for your reference:

            //                          01234567890123456789012
            //string sentence = "yes noyes tyyes yed yes";
            //string searchWord = "yes";

            //                           0123456789012
            string sentence = "yyyfgyyyhdyyy";
            string searchWord = "yy";

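            int indx1 = sentence.IndexOf(searchWord);      // first occurrence: 0
            int indx2 = sentence.LastIndexOf(searchWord);  // last occurrence: 11
            // Second occurrence: resume one character past the first, so overlaps count: 1
            int indx3 = sentence.IndexOf(searchWord, sentence.IndexOf(searchWord) + 1);
            // Penultimate occurrence: search only the text before the last match,
            // so a match that overlaps the last one is skipped: 6
            int indx4 = sentence.Substring(0, sentence.LastIndexOf(searchWord)).LastIndexOf(searchWord);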

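            // Non-overlapping count (3 here); needs using System.Text.RegularExpressions.
            // Regex.Escape makes sure the search word is treated as literal text.
            int allCount = new Regex(Regex.Escape(searchWord)).Matches(sentence).Count;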

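            // Needs using System.Collections.Generic. Assumes searchWord is not empty.
            var indices = new List<int>();
            int currentIndex = 0;
            int atIndex = 0;

            // Keep searching from currentIndex until IndexOf reports no further match.
            while ((atIndex = sentence.IndexOf(searchWord, currentIndex)) != -1)
            {
                indices.Add(atIndex);
                //currentIndex = atIndex + searchWord.Length;  // non-overlapping: 0, 5, 10
                currentIndex = atIndex + 1;                    // overlapping: 0, 1, 5, 6, 10, 11
            }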
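If you want to reuse the idea outside the interview snippet, one possible way is to wrap the search loop in a method with a flag for overlapping. This is only a sketch: OccurrenceIndexer, AllIndicesOf and the allowOverlap parameter are names I made up here, and the ordinal comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class OccurrenceIndexer
{
    // Hypothetical helper: collects the start index of every occurrence of term.
    // allowOverlap = true steps 1 character past each hit;
    // allowOverlap = false steps past the whole matched term.
    public static List<int> AllIndicesOf(string text, string term, bool allowOverlap)
    {
        var indices = new List<int>();
        if (string.IsNullOrEmpty(term)) return indices;

        int step = allowOverlap ? 1 : term.Length;
        int index = text.IndexOf(term, StringComparison.Ordinal);
        while (index != -1)
        {
            indices.Add(index);
            index = text.IndexOf(term, index + step, StringComparison.Ordinal);
        }
        return indices;
    }
}
```

For "yyyfgyyyhdyyy" and "yy", this yields 0, 1, 5, 6, 10, 11 with overlapping and 0, 5, 10 without. For "yes noyes tyyes yed yes" and "yes", both modes yield 0, 6, 12, 20.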

Thank you for reading this post and watching the videos. Please Subscribe, Comment and Rate the channel if you liked the video.

Go to C# Experiments to access more such content! Thanks again!