Friday, February 10, 2012

Solving the Boggle Game - Recursion, Prefix Tree, and Dynamic Programming

I spent this past weekend designing the game of Boggle. Although it looks like a simple game at a high level, implementing it in a programming language was a great experience. I had to use recursion, sorting, searching, prefix trees (also knows as trie's), and dynamic programming as I was improving the run time of the program. So here is a summary of the work I did in trying to write a very optimal solution to the Boggle game:



*Image courtesy: http://stackoverflow.com/questions/746082/how-to-find-list-of-possible-words-from-a-letter-matrix-boggle-solver

The game of Boggle is played on a N x N board (usually made of cubes that has letters engraved on it). Given a dictionary, you will have to construct words on the board following these rules:

i) The letters in the word must be "adjacent" to each other
ii) Two letters on the board are "adjacent" if they are located on the board next to each other horizontally, vertically, or diagonally.
iii) A word must be 3 or more letters long to be valid

The more words you can construct, the more points you get. Longer words get more points.

In computerse: given an N x N board, and a dictionary containing hundreds of thousands of words, construct those words from the board that are found in the dictionary.

There are a number of approaches that one can take to solve this problem. I designed three different algorithms to solve it. Here are they:

First solution - Recursion + Binary Search:

In this approach, we recurse (using backtracking) through the board and generate all possible words. For each word that are three or more letters long we check to see if it's in the dictionary. If it is, we have a match!

Here is the algorithm steps:

1. Read in the dictionary in to a container in memory.
2. Sort the dictionary.
3. Using backtracking, search the board for words.
4. If a word is found and it contains 3 or more letters, do a binary search on the dictionary to see if the word is there.
5. If searching was successful in the previous step, print the letter out.
6. Continue step 3-5 as long as there are more words to construct on the board.

Complexity of this approach:

In this solution, we do a good job on the dictionary search. Using binary search, we are quickly finding out whether a word is in dictionary or not. But the real bottleneck is in searching the board for words. For an N x N board the search space is O((N*N)!). I am not exactly sure about this number, but you can find some discussions of it here: http://www.danvk.org/wp/2007-08-02/how-many-boggle-boards-are-there/.
(N*N)! is a HUGE number even for N = 5. So this approach is is impractical and out of question for any useful implementation.


Second Solution - Pruned Recursion + Prefix Tree (also known as a Trie):

From the previous approach, our major concern was the enormous search space on the board. Fortunately, using a a prefix tree or trie data structure we could significantly cut down on this search space. The reasoning behind this improvement is that, if a word "abc" does not occur as a prefix to any word in the dictionary there is no reason to keep searching for words after we encounter "abc" on the board. This actually cut down the run time a lot.

Here is the algorithm steps:

1. Read a word from the dictionary file.
2. Insert it into a prefix tree data structure in memory.
3. Repeat steps 1-2 until all words in the dictionary have been inserted into the prefix tree.
4. Using backtracking, search the board for words.
5. If a word is found and it contains 3 or more letters, search the prefix tree for the word.
6. If searching was *not* successful in the previous step, return from this branch of the backtracking stage. (There is no point to continue searching in this branch, nothing in the dictionary as the prefix tree says).
7. If searching was successful in step 5, continue searching by constructing more words along this branch of backtracking and stop when the leaf node has been reached in the prefix tree. (at that point there is nothing more to search).
8. Repeat steps 4-7 as long as there are more words to search in the backtracking.

Complexity of this approach:

This approach significantly improves on the first one. Building a prefix tree our of the dictionary words is O(W * L), where W is the number of words in the dictionary and L is the maximum length of a word in the dictionary.
Searching the board will be of the same order as the dictionary since we are not really searching words that are not in the dictionary. But in reality it will be more work than that as we still need to backtrack along the board to construct new words until we can consult the dictionary prefix tree to know whether it exists or not.

Third and Final Solution - No search space + Dynamic Programming:

The 2nd approach mentioned above was good enough until the board size was 5. Unfortunately with a board size of 6, that too was taking forever to complete!



It got me into thinking - "Dang, this search space is still too big to search! Can I just get rid of it entirely?" And then this idea popped into my mind: instead of random constructing word after word in this infinite ocean of words why don't I take a word from the dictionary and somehow magically check whether that's available on the board or not?

It turns out, we can use a nifty dynamic programming technique to quickly check whether a word (from the dictionary in this case) can be constructed from the board or not!

Here is core point of the dynamic programming idea:


For a word of length k to be found (end location) at the [i, j]-th location of the board, the k-1'th letter of that word must be located in one of the adjacent cells of [i, j].



The base case is k = 1.

A letter of length 1 will be found (end location) in the [i, j]-th cell of the board of the only letter in the word matches the letter in the [i, j]-th location of the board.

Once our dynamic programming table is populated with the base case, we can build on top of that for any word of length k, k > 1.

Here is a sample code for this:

for (k = 2; k < MAX_WORD_LENGTH; ++k)
    for (i = 0; i < N; ++i)
        for (j = 0; j < N; ++j)
            if (board[i][j] == word[k])
            {
                 for all the "adjacent" cells of board[i, j]
                     if we table[k-1][adjacent_i][adjacent_j] is true
                         then table[k][i][j] = true;
             }


*I have the C++ code, if you are interested to take a look leave a comment with your email address.

Run Time Complexity:

The run time complexity of this approach is obvious is pretty obvious. It's O (W * N * N * MAX_WORD_LENGTH). Here N is dimension of the board which is usually between 4 to 6. So essentially the algorithm runs in the order of the size of the dictionary!

I solved a 6 X 6 boggle game with a dictionary with 600,000 words in 0.002 seconds! With I/O it as about 0.57 seconds. But with the trie and binary search approaches, it took much longer!

So, here is the summary of my a weekend's worth of adventure in to the world of algorithms and data structures. Feel free to ask me any questions, or any bug in these algorithms:)

Credit where it is due: My friend Satej who is a ninja problem solver and a PhD student at the UCF explained the dynamic programming solution to me first before I refined and finalized it. Thanks Satej!


Update (5/10/2012):

So many people have asked for the source code that I feel I better add the C++ code here:) The final, polished code for this is not in my home computer here, so if there is a bug please let me know!


#include <cstdio>
#include <iostream>

using namespace std;

const int N = 6; // max length of a word in the board

char in[N * N + 1]; // max length of a word
char board[N+1][N+2]; // keep room for a newline and null char at the end
char prev[N * N + 1];
bool dp[N * N + 1][N][N];

// direction X-Y delta pairs for adjacent cells
int dx[] = {0, 1, 1, 1, 0, -1, -1, -1};
int dy[] = {1, 1, 0, -1, -1, -1, 0, 1};
bool visited[N][N];

bool checkBoard(char* word, int curIndex, int r, int c, int wordLen)
{
    if (curIndex == wordLen - 1)
    {
        //cout << "Returned TRUE!!" << endl;
        return true;
    }
   
    int ret = false;
       
    for (int i = 0; i < 8; ++i)
    {
        int newR = r + dx[i];
        int newC = c + dy[i];
       
        if (newR >= 0 && newR < N && newC >= 0 && newC < N && !visited[newR][newC] && word[curIndex+1] == board[newR][newC])
        {
            ++curIndex;
            visited[newR][newC] = true;
           
            ret = checkBoard(word, curIndex, newR, newC, wordLen);
            if (ret)
                break;
               
            --curIndex;
            visited[newR][newC] = false;
        }
    }
   
    return ret;           
}

int main(int argc, char* argv[])
{
   
    int i, j, k, l;
   
    FILE* fDict = fopen("dict.txt","r");
    FILE* fBoard = fopen("tmp2.txt","r");
   
    for(i = 0; i < N; ++i)
        fgets(board[i], N+2, fBoard);
   
    strcpy(prev,"");
    int pLen = 0;
   
    while(fgets(in, N*N + 1, fDict))
    {
        int len = strlen(in);
        if (in[len-1] == '\n')
        {
            in[len-1] = '\0'; // remove the trailing newline
            --len;
        }
       
        if (len < 3)
            continue; // we only want words longer than 3 or more letter
           
        for(i = 0; i < len && i < pLen; ++i)
        {
            if(prev[i] != in[i])
                break;
        }
       
        int firstMismatch = i; // little optimization: building on previous word (will benefit if the word list is sorted)
       
        if(firstMismatch==0)
        {
            for(i = 0; i < 6; ++i)
            {
                for(j = 0; j < 6; ++j)
                {
                    if(board[i][j] == in[0])
                        dp[0][i][j] = true;
                    else
                        dp[0][i][j] = false;
                }
            }
            firstMismatch = 1;
        }
       
        for(k = firstMismatch; k < len; ++k)
        {
            for(i=0;i<6;++i)
            {
                for(j=0;j<6;++j)
                {   
                    dp[k][i][j] = false;
                           
                    if(board[i][j] != in[k])
                        continue;
                       
                    for(l= 0; l < 8 && !dp[k][i][j]; ++l)
                    {
                        int ti = i + dx[l];
                        int tj = j + dy[l];
                       
                        if(ti < 0 || ti >= 6 || tj < 0 || tj >= 6)
                            continue;
                       
                        if (dp[k-1][ti][tj])
                            dp[k][i][j] = true;
                    }
                }
            }  
        }
       
        // check if the word is tagged as found in the dp table
        bool flag = false;
        for(i = 0; i < 6 && !flag; ++i)
        {
            for(j = 0; j < 6 && !flag; ++j)
            {
                if(dp[len-1][i][j])
                    flag =true;
            }
        }
       
        // dp table says its there, but make sure its in the board and it does not repeat a location in the board
        if(flag)
        {
            //cout << "Checking word: " << in << endl;
            bool verified = false;
           
            for (i = 0; i < N && !verified; ++i)
            {
                for (j = 0; j < N && !verified; ++j)
                {
                    if (in[0] != board[i][j])
                        continue;
                       
                    memset(visited, false, sizeof(visited));
                    visited[i][j] = true;
                   
                    if (checkBoard(in, 0, i, j, len))
                    {
                        cout << in << endl;
                        break;
                    }
                }
            }
        }
       
        strcpy(prev,in);
        pLen=len;
           
    }
   
    return 0;
}


Enjoy!

Update 3/15/2015: I have added a Java version of the boggle game solution on my Github page here: https://github.com/bilash/boggle-solver

The Java version builds a Trie for the dictionary and then uses the dynamic programming approach mentioned above.

99 comments:

  1. i am interested and need the C++ code
    could you provide me them codes
    cemgurel@laben.com.tr

    ReplyDelete
  2. Thanks for reading my blog!

    Please check your inbox for the C++ code.

    ReplyDelete
    Replies
    1. İ still havent received the codes yet do you mind sending me the codes again at cemgurel@laben.com.tr

      Delete
    2. hi can u please lease send me the code ?

      Delete
  3. Please send me a c++ code on yasha.khandelwal02@gmail.com.

    Thanks

    ReplyDelete
    Replies
    1. Can you forward me the c++ code as well? vidurmayor88@gmail.com.

      Thanks :)

      Delete
  4. Replies
    1. This comment has been removed by the author.

      Delete
  5. hi,

    please could you send me the java code for this at dspds.naveen@gmail.com

    thanks

    ReplyDelete
  6. Hi Naveen, I didn't do it in Java. I have C++ code. Let me know if want C++ code for this.

    ReplyDelete
    Replies
    1. Hi Golam,

      Since your code seems to be so popular, I was wondering if you wouldn't mind sending the c++ code one more time to me at ivang1111@sbcglobal.net?

      Thank you so much!!!

      Delete
    2. i too need the c++ code
      can u plz send me the boggler code to karthiksai4555@gmail.com

      Delete
  7. Hi,
    Can u please send me the c++ code to radha.addala@yahoo.co.in

    Thanks in advance

    ReplyDelete
  8. your blog is nice.... now i am currently working on a data structures project...if u send this c++ code it would be helpful.... thankyou

    ReplyDelete
  9. sry my email is dsprojectmaze@gmail.com

    ReplyDelete
  10. my email is juzhax@gmail.com
    do you mind to share the code too ?

    ReplyDelete
  11. Thanks everyone for showing interest in the code. I have sent it to all of you. Please check your inbox.

    ReplyDelete
  12. please send me c++ or java source code
    thanks btw
    alican@hotmail.co.jp

    ReplyDelete
    Replies
    1. Can you send me too here hafizumarmubasher92@gamil.com

      Delete
  13. Very interesting. I'm curious for the source code.
    Can you send it to me?
    eydenjones@gmail.com
    Thx

    ReplyDelete
  14. Sorry for the late reply. Please check your inbox. I sent the code.

    ReplyDelete
  15. Could you send me the code?

    bobsyouruncle@yarr.ca

    ReplyDelete
  16. Hi, that looks very nice.
    Could you send me the C++ code to :

    coffeehouse@free.fr

    Thanks

    ReplyDelete
  17. can you send me the code
    punjab1836@gmail.com

    ReplyDelete
  18. Hi, I'd be interested in seeing the code for this if you could email it to info[at]daveallanson.com

    ReplyDelete
  19. That's really cool, whats the largest board you've tired it on? I'm interested in your code as well.

    lifeforce0 at gmail

    ReplyDelete
  20. Plz mail me c++ code at ashiniit@gmail.com.......
    Thanks a lot

    ReplyDelete
  21. please send me C++ code.
    jamesmark1007@gmail.com
    Thank you

    ReplyDelete
  22. Hi all,

    I updated the post with the source code. Please let me know if you find any bugs, etc. Thanks.

    ReplyDelete
  23. please send me C++ code.
    tzewei91@gmail.com
    Thank you

    ReplyDelete
  24. This is a really awesome read, could you send me the code, jcp1217@gmail.com

    ReplyDelete
  25. To those still asking for the code, I updated the post with the dynamic programming code. It's at the end of the post.

    Thanks for your interest in the code!

    ReplyDelete
  26. can you please send me the java source code its educational purpose i just wana know how it works please send it to
    p4l951@hotmail.com

    ReplyDelete
    Replies
    1. Hi, I only wrote C++ code for this. It should be very straightforward to port it to Java.

      Delete
  27. Can you send me the c++ code @ m.garimella1@gmail.com

    ReplyDelete
  28. If still available, I'd like to a copy of the c++ code
    Thanks larry@televox.cc

    ReplyDelete
  29. can u please mail me the java and c++ codes for the problem.
    thanks my id is abhinavsareen1@gmail.com

    ReplyDelete
  30. Can you forward me the c++ code as well? vidurmayor88@gmail.com.

    Thanks :)

    ReplyDelete
  31. can u please mail me the c++ code too.
    swati_sj00@yahoo.com

    ReplyDelete
  32. Can you please forward me the C++ code?
    akanksha.237@gmail.com

    Thanks :)

    ReplyDelete
  33. plz forward me the c++ code
    tarunnarang1992@gmail.com

    ReplyDelete
  34. This comment has been removed by the author.

    ReplyDelete
  35. To all those that wanted the C++ code, please check the end of the blog post. I added the C++ code in the post. Thanks.

    ReplyDelete
  36. Have you tried this with say a 100x100 boggle? I'm trying to compare your final solution with a c++ solver I implemented. My solver can do 100x100 in <7 seconds with a 178,000 word dictionary on a single 1.4ghz pentium. However using my solution, the time taken for larger puzzles does not scale linearly with the number of cubes.

    ReplyDelete
  37. Hi, no I haven't tied my solution with a large matrix (like 100x100). I guess it will take much longer than 7 seconds. Did you use DP or something else?

    ReplyDelete
    Replies
    1. I break the dictionary words into prefixes and store them in a hash table so look-up is O(1). "AND" becomes "A", "AN", "AND". I do a depth first search using the prefix dictionary to determine at each step if the string I'm building is part of a word. I'm going to post my code to git-hub if you're interested.

      Delete
    2. Interesting... Sure, would love to take a look at your code.

      Delete
    3. github.com/themachineswillwin/fast-boggle-solver

      Delete
    4. Thanks, I will take a look and get back if I have any comments.

      Delete
  38. This comment has been removed by the author.

    ReplyDelete
  39. can u give the codes in netbeans java.please...

    ReplyDelete
  40. can u please mail me the c++ code too.
    ralg9098@gmail.com

    ReplyDelete
  41. can you please email me the c++ code?
    joshua_chua90@yahoo.com

    ReplyDelete
  42. Great post. Enjoyed reading and perusing your code. You stored the board in a char array which is fine for all the tiles except for Qu. While the general approach would be the same, it seems that you would need to change your container and perhaps some of your search algorithms. Just a thought!

    ReplyDelete
  43. Hi, can you send me the C++ code to cronix_psycho@yahoo.com
    Thanks

    ReplyDelete
  44. Hi, can I use your code in a commercial project ?

    Thanks!

    ReplyDelete
    Replies
    1. Sure you can. But please attribute the code to this web page. Thanks. And I would love to know where you are using it!

      Delete
    2. Hi Gola,
      how to go about making a code for a NxN matrix using each alphabet only once......Would we be able to guess what would be the maximum number of words possible given a NxN matrix where N can be >10

      Delete
  45. hey do you think i can take a look at your program? thanks! pnchming@yahoo.com

    ReplyDelete
  46. Unfortunately, the dynamic programming solution is wrong over here as it ignores the fact that each cell can be visited atmost once. Your dp[i][j][k] can go through a cell many times as the state doesn't keep track of the previously visited cells.

    ReplyDelete
    Replies
    1. Finally someone pointed it out. In the wikipedia, it said clearly that "may not use the same letter cube more than once per word."
      Other than that, great post...

      Delete
    2. Exactly. I tried implementing it myself but it didn't work because there's no way to build a proper 'visited' array - when you move onto the next iteration (next letter), there's no way of knowing which path you've taken to reach your current tile.

      Delete
  47. Hey.. Ur explanation is too good.. Thanks.. Can you forward the full c++ code to albertiitm @gmail.com. ... thanks...

    ReplyDelete
  48. the code above doesmt giv any output..
    can u mail me d code?? its quite urgent ;)
    anurag27k@gmail.com
    thanks

    ReplyDelete
  49. This comment has been removed by the author.

    ReplyDelete
  50. Why can't you do the solution 3 but with trie ?
    instead of iterate through all dictionary, you can use trie to match valid word.

    from the board you can check the word with NxNx(max words)

    ReplyDelete
  51. Interested in looking at the c++ code. Could you send it to ryanlh7job@gmail.com?

    ReplyDelete
  52. Can you please send the final C++ code to tanya_thatzme@yahoo.com?

    ReplyDelete
  53. This comment has been removed by the author.

    ReplyDelete
  54. Could I have the C++ code of this solution?
    my gmail: kittysit1220@gmail.com

    Thank you so much

    ReplyDelete
  55. Can you please send me the C++ or java code at profcdd@gmail.com ?

    ReplyDelete
  56. First of all good article. Do you have an c# cod for that? here is my email vasquezjohncullen@yahoo.com I'm very happy if you let me have some

    ReplyDelete
  57. plz give me the code of dp approach thank you
    mail id is abhidreams662@gmail.com

    ReplyDelete
  58. plz mail me c++ code for second and third algorithm.....thank you mial id: hiteshkumar848@gmail.com

    ReplyDelete
  59. This comment has been removed by the author.

    ReplyDelete
  60. I'm not sure why you said solution #2 was taking forever. Using your concept, the implementation using the trie solves a 10x10 in 210 milliseconds.

    ReplyDelete
    Replies
    1. I am surprised you got such a fast runtime using the 2nd approach. If I remember correctly, it was taking a long time to finish with a board size of 6. Maybe I did not do something right... but I really don't remember much at this time.

      Delete
    2. Yeah, he's right. I suspect you were not pruning correctly. If I remove my pruning code, I go from solving a 10x10 in ~200ms (Python) to not solving it because I don't want to wait that long. I think you should retry the trie approach, as it should be a bit faster than the one you landed on (In my implementation, about 200* faster if I don't count trie creation, though I may be implementing it wrong)

      Delete
  61. Hi,
    On your github page, I saw that you mentioned that you have implemented the dynamic approach in C++. Could you please send that to me at akshay.jagirdar.delhi@gmail.com
    Thank you

    ReplyDelete
  62. the 3rd method works if it allows using one character repeatedly as long as not continuously. For example, 'dad' was found if 'da' is on the board. 'lee' was not found if 'le' in on the board.
    the 3rd method is actually BFS+branch-and-bound.

    ReplyDelete
  63. I'd love to understand your solution using dynamic programming, but I cannot follow the description and the code is very cryptic with all those single letter variables.

    ReplyDelete
  64. you mind sending me a copy of your C++ code?

    ReplyDelete
  65. would you be able to send me a copy of the C++ code please? c837@protonmail.com

    ReplyDelete
  66. From where can I get dictionary and board file ?

    ReplyDelete
  67. hi, can you please share the c++ code - keerthivanjari@gmail.com
    Thanks for your time on this blog

    ReplyDelete
  68. Hi , Can you pleae share the C++ code - rudreshaa@gmail.com
    Its very well explained. It would great help if you can share the code.

    ReplyDelete
  69. Hi, Can you please share the c++ code- rakeshvlkn@gmail.com.

    It is explained very well.

    ReplyDelete
  70. I am interested in the source code. My email address is michaelhamarneh@gmail.com.

    ReplyDelete
  71. I am also interested in the C++ source code. My email address is ssidari2018@my.fit.edu

    ReplyDelete
  72. Whatever we gathered information from the blogs, we should implement that in practically then only we can understand that exact thing clearly, but it’s no

    need to do it, because you have explained the concepts very well. It was crystal clear, keep sharing artificial intelligence course online.

    ReplyDelete
  73. Very interesting discussion!! I think that you should write more on this topic, it might not be a taboo subject but generally people are not enough to speak on such topics. To the next.
    Tree Service

    ReplyDelete
  74. Great article, Thanks for sharing such wonderful content.
    Custom Face Mask

    ReplyDelete

  75. Pretty article! I found some useful information in your blog, it was awesome to read, thanks for sharing this great content to my vision, keep sharing.sap fico training institute in Bangalore

    ReplyDelete


  76. That is nice article from you , this is informative stuff . Hope more articles from you . I also want to share some information about Pet Vaccination in vizag

    ReplyDelete