Friday, February 10, 2012

Solving the Boggle Game - Recursion, Prefix Tree, and Dynamic Programming

I spent this past weekend designing the game of Boggle. Although it looks like a simple game at a high level, implementing it in a programming language was a great experience. I had to use recursion, sorting, searching, prefix trees (also knows as trie's), and dynamic programming as I was improving the run time of the program. So here is a summary of the work I did in trying to write a very optimal solution to the Boggle game:



*Image courtesy: http://stackoverflow.com/questions/746082/how-to-find-list-of-possible-words-from-a-letter-matrix-boggle-solver

The game of Boggle is played on a N x N board (usually made of cubes that has letters engraved on it). Given a dictionary, you will have to construct words on the board following these rules:

i) The letters in the word must be "adjacent" to each other
ii) Two letters on the board are "adjacent" if they are located on the board next to each other horizontally, vertically, or diagonally.
iii) A word must be 3 or more letters long to be valid

The more words you can construct, the more points you get. Longer words get more points.

In computerse: given an N x N board, and a dictionary containing hundreds of thousands of words, construct those words from the board that are found in the dictionary.

There are a number of approaches that one can take to solve this problem. I designed three different algorithms to solve it. Here are they:

First solution - Recursion + Binary Search:

In this approach, we recurse (using backtracking) through the board and generate all possible words. For each word that are three or more letters long we check to see if it's in the dictionary. If it is, we have a match!

Here is the algorithm steps:

1. Read in the dictionary in to a container in memory.
2. Sort the dictionary.
3. Using backtracking, search the board for words.
4. If a word is found and it contains 3 or more letters, do a binary search on the dictionary to see if the word is there.
5. If searching was successful in the previous step, print the letter out.
6. Continue step 3-5 as long as there are more words to construct on the board.

Complexity of this approach:

In this solution, we do a good job on the dictionary search. Using binary search, we are quickly finding out whether a word is in dictionary or not. But the real bottleneck is in searching the board for words. For an N x N board the search space is O((N*N)!). I am not exactly sure about this number, but you can find some discussions of it here: http://www.danvk.org/wp/2007-08-02/how-many-boggle-boards-are-there/.
(N*N)! is a HUGE number even for N = 5. So this approach is is impractical and out of question for any useful implementation.


Second Solution - Pruned Recursion + Prefix Tree (also known as a Trie):

From the previous approach, our major concern was the enormous search space on the board. Fortunately, using a a prefix tree or trie data structure we could significantly cut down on this search space. The reasoning behind this improvement is that, if a word "abc" does not occur as a prefix to any word in the dictionary there is no reason to keep searching for words after we encounter "abc" on the board. This actually cut down the run time a lot.

Here is the algorithm steps:

1. Read a word from the dictionary file.
2. Insert it into a prefix tree data structure in memory.
3. Repeat steps 1-2 until all words in the dictionary have been inserted into the prefix tree.
4. Using backtracking, search the board for words.
5. If a word is found and it contains 3 or more letters, search the prefix tree for the word.
6. If searching was *not* successful in the previous step, return from this branch of the backtracking stage. (There is no point to continue searching in this branch, nothing in the dictionary as the prefix tree says).
7. If searching was successful in step 5, continue searching by constructing more words along this branch of backtracking and stop when the leaf node has been reached in the prefix tree. (at that point there is nothing more to search).
8. Repeat steps 4-7 as long as there are more words to search in the backtracking.

Complexity of this approach:

This approach significantly improves on the first one. Building a prefix tree our of the dictionary words is O(W * L), where W is the number of words in the dictionary and L is the maximum length of a word in the dictionary.
Searching the board will be of the same order as the dictionary since we are not really searching words that are not in the dictionary. But in reality it will be more work than that as we still need to backtrack along the board to construct new words until we can consult the dictionary prefix tree to know whether it exists or not.

Third and Final Solution - No search space + Dynamic Programming:

The 2nd approach mentioned above was good enough until the board size was 5. Unfortunately with a board size of 6, that too was taking forever to complete!



It got me into thinking - "Dang, this search space is still too big to search! Can I just get rid of it entirely?" And then this idea popped into my mind: instead of random constructing word after word in this infinite ocean of words why don't I take a word from the dictionary and somehow magically check whether that's available on the board or not?

It turns out, we can use a nifty dynamic programming technique to quickly check whether a word (from the dictionary in this case) can be constructed from the board or not!

Here is core point of the dynamic programming idea:


For a word of length k to be found (end location) at the [i, j]-th location of the board, the k-1'th letter of that word must be located in one of the adjacent cells of [i, j].



The base case is k = 1.

A letter of length 1 will be found (end location) in the [i, j]-th cell of the board of the only letter in the word matches the letter in the [i, j]-th location of the board.

Once our dynamic programming table is populated with the base case, we can build on top of that for any word of length k, k > 1.

Here is a sample code for this:

for (k = 2; k < MAX_WORD_LENGTH; ++k)
    for (i = 0; i < N; ++i)
        for (j = 0; j < N; ++j)
            if (board[i][j] == word[k])
            {
                 for all the "adjacent" cells of board[i, j]
                     if we table[k-1][adjacent_i][adjacent_j] is true
                         then table[k][i][j] = true;
             }


*I have the C++ code, if you are interested to take a look leave a comment with your email address.

Run Time Complexity:

The run time complexity of this approach is obvious is pretty obvious. It's O (W * N * N * MAX_WORD_LENGTH). Here N is dimension of the board which is usually between 4 to 6. So essentially the algorithm runs in the order of the size of the dictionary!

I solved a 6 X 6 boggle game with a dictionary with 600,000 words in 0.002 seconds! With I/O it as about 0.57 seconds. But with the trie and binary search approaches, it took much longer!

So, here is the summary of my a weekend's worth of adventure in to the world of algorithms and data structures. Feel free to ask me any questions, or any bug in these algorithms:)

Credit where it is due: My friend Satej who is a ninja problem solver and a PhD student at the UCF explained the dynamic programming solution to me first before I refined and finalized it. Thanks Satej!


Update (5/10/2012):

So many people have asked for the source code that I feel I better add the C++ code here:) The final, polished code for this is not in my home computer here, so if there is a bug please let me know!


#include <cstdio>
#include <iostream>

using namespace std;

const int N = 6; // max length of a word in the board

char in[N * N + 1]; // max length of a word
char board[N+1][N+2]; // keep room for a newline and null char at the end
char prev[N * N + 1];
bool dp[N * N + 1][N][N];

// direction X-Y delta pairs for adjacent cells
int dx[] = {0, 1, 1, 1, 0, -1, -1, -1};
int dy[] = {1, 1, 0, -1, -1, -1, 0, 1};
bool visited[N][N];

bool checkBoard(char* word, int curIndex, int r, int c, int wordLen)
{
    if (curIndex == wordLen - 1)
    {
        //cout << "Returned TRUE!!" << endl;
        return true;
    }
   
    int ret = false;
       
    for (int i = 0; i < 8; ++i)
    {
        int newR = r + dx[i];
        int newC = c + dy[i];
       
        if (newR >= 0 && newR < N && newC >= 0 && newC < N && !visited[newR][newC] && word[curIndex+1] == board[newR][newC])
        {
            ++curIndex;
            visited[newR][newC] = true;
           
            ret = checkBoard(word, curIndex, newR, newC, wordLen);
            if (ret)
                break;
               
            --curIndex;
            visited[newR][newC] = false;
        }
    }
   
    return ret;           
}

int main(int argc, char* argv[])
{
   
    int i, j, k, l;
   
    FILE* fDict = fopen("dict.txt","r");
    FILE* fBoard = fopen("tmp2.txt","r");
   
    for(i = 0; i < N; ++i)
        fgets(board[i], N+2, fBoard);
   
    strcpy(prev,"");
    int pLen = 0;
   
    while(fgets(in, N*N + 1, fDict))
    {
        int len = strlen(in);
        if (in[len-1] == '\n')
        {
            in[len-1] = '\0'; // remove the trailing newline
            --len;
        }
       
        if (len < 3)
            continue; // we only want words longer than 3 or more letter
           
        for(i = 0; i < len && i < pLen; ++i)
        {
            if(prev[i] != in[i])
                break;
        }
       
        int firstMismatch = i; // little optimization: building on previous word (will benefit if the word list is sorted)
       
        if(firstMismatch==0)
        {
            for(i = 0; i < 6; ++i)
            {
                for(j = 0; j < 6; ++j)
                {
                    if(board[i][j] == in[0])
                        dp[0][i][j] = true;
                    else
                        dp[0][i][j] = false;
                }
            }
            firstMismatch = 1;
        }
       
        for(k = firstMismatch; k < len; ++k)
        {
            for(i=0;i<6;++i)
            {
                for(j=0;j<6;++j)
                {   
                    dp[k][i][j] = false;
                           
                    if(board[i][j] != in[k])
                        continue;
                       
                    for(l= 0; l < 8 && !dp[k][i][j]; ++l)
                    {
                        int ti = i + dx[l];
                        int tj = j + dy[l];
                       
                        if(ti < 0 || ti >= 6 || tj < 0 || tj >= 6)
                            continue;
                       
                        if (dp[k-1][ti][tj])
                            dp[k][i][j] = true;
                    }
                }
            }  
        }
       
        // check if the word is tagged as found in the dp table
        bool flag = false;
        for(i = 0; i < 6 && !flag; ++i)
        {
            for(j = 0; j < 6 && !flag; ++j)
            {
                if(dp[len-1][i][j])
                    flag =true;
            }
        }
       
        // dp table says its there, but make sure its in the board and it does not repeat a location in the board
        if(flag)
        {
            //cout << "Checking word: " << in << endl;
            bool verified = false;
           
            for (i = 0; i < N && !verified; ++i)
            {
                for (j = 0; j < N && !verified; ++j)
                {
                    if (in[0] != board[i][j])
                        continue;
                       
                    memset(visited, false, sizeof(visited));
                    visited[i][j] = true;
                   
                    if (checkBoard(in, 0, i, j, len))
                    {
                        cout << in << endl;
                        break;
                    }
                }
            }
        }
       
        strcpy(prev,in);
        pLen=len;
           
    }
   
    return 0;
}


Enjoy!

58 comments:

  1. i am interested and need the C++ code
    could you provide me them codes
    cemgurel@laben.com.tr

    ReplyDelete
  2. Thanks for reading my blog!

    Please check your inbox for the C++ code.

    ReplyDelete
    Replies
    1. İ still havent received the codes yet do you mind sending me the codes again at cemgurel@laben.com.tr

      Delete
  3. Please send me a c++ code on yasha.khandelwal02@gmail.com.

    Thanks

    ReplyDelete
    Replies
    1. Can you forward me the c++ code as well? vidurmayor88@gmail.com.

      Thanks :)

      Delete
  4. Hi,

    Please check your inbox. Thanks.

    ReplyDelete
  5. hi,

    please could you send me the java code for this at dspds.naveen@gmail.com

    thanks

    ReplyDelete
  6. Hi Naveen, I didn't do it in Java. I have C++ code. Let me know if want C++ code for this.

    ReplyDelete
  7. Hi,
    Can u please send me the c++ code to radha.addala@yahoo.co.in

    Thanks in advance

    ReplyDelete
  8. your blog is nice.... now i am currently working on a data structures project...if u send this c++ code it would be helpful.... thankyou

    ReplyDelete
  9. sry my email is dsprojectmaze@gmail.com

    ReplyDelete
  10. my email is juzhax@gmail.com
    do you mind to share the code too ?

    ReplyDelete
  11. Thanks everyone for showing interest in the code. I have sent it to all of you. Please check your inbox.

    ReplyDelete
  12. please send me c++ or java source code
    thanks btw
    alican@hotmail.co.jp

    ReplyDelete
  13. Very interesting. I'm curious for the source code.
    Can you send it to me?
    eydenjones@gmail.com
    Thx

    ReplyDelete
  14. Sorry for the late reply. Please check your inbox. I sent the code.

    ReplyDelete
  15. Could you send me the code?

    bobsyouruncle@yarr.ca

    ReplyDelete
  16. Hi, that looks very nice.
    Could you send me the C++ code to :

    coffeehouse@free.fr

    Thanks

    ReplyDelete
  17. can you send me the code
    punjab1836@gmail.com

    ReplyDelete
  18. Hi, I'd be interested in seeing the code for this if you could email it to info[at]daveallanson.com

    ReplyDelete
  19. That's really cool, whats the largest board you've tired it on? I'm interested in your code as well.

    lifeforce0 at gmail

    ReplyDelete
  20. Plz mail me c++ code at ashiniit@gmail.com.......
    Thanks a lot

    ReplyDelete
  21. please send me C++ code.
    jamesmark1007@gmail.com
    Thank you

    ReplyDelete
  22. Hi all,

    I updated the post with the source code. Please let me know if you find any bugs, etc. Thanks.

    ReplyDelete
  23. please send me C++ code.
    tzewei91@gmail.com
    Thank you

    ReplyDelete
  24. This is a really awesome read, could you send me the code, jcp1217@gmail.com

    ReplyDelete
  25. To those still asking for the code, I updated the post with the dynamic programming code. It's at the end of the post.

    Thanks for your interest in the code!

    ReplyDelete
  26. can you please send me the java source code its educational purpose i just wana know how it works please send it to
    p4l951@hotmail.com

    ReplyDelete
    Replies
    1. Hi, I only wrote C++ code for this. It should be very straightforward to port it to Java.

      Delete
    2. sure can you send that plz

      Delete
  27. Can you send me the c++ code @ m.garimella1@gmail.com

    ReplyDelete
  28. If still available, I'd like to a copy of the c++ code
    Thanks larry@televox.cc

    ReplyDelete
  29. can u please mail me the java and c++ codes for the problem.
    thanks my id is abhinavsareen1@gmail.com

    ReplyDelete
  30. Can you forward me the c++ code as well? vidurmayor88@gmail.com.

    Thanks :)

    ReplyDelete
  31. can u please mail me the c++ code too.
    swati_sj00@yahoo.com

    ReplyDelete
  32. Can you please forward me the C++ code?
    akanksha.237@gmail.com

    Thanks :)

    ReplyDelete
  33. plz forward me the c++ code
    tarunnarang1992@gmail.com

    ReplyDelete
  34. This comment has been removed by the author.

    ReplyDelete
  35. To all those that wanted the C++ code, please check the end of the blog post. I added the C++ code in the post. Thanks.

    ReplyDelete
  36. Have you tried this with say a 100x100 boggle? I'm trying to compare your final solution with a c++ solver I implemented. My solver can do 100x100 in <7 seconds with a 178,000 word dictionary on a single 1.4ghz pentium. However using my solution, the time taken for larger puzzles does not scale linearly with the number of cubes.

    ReplyDelete
  37. Hi, no I haven't tied my solution with a large matrix (like 100x100). I guess it will take much longer than 7 seconds. Did you use DP or something else?

    ReplyDelete
    Replies
    1. I break the dictionary words into prefixes and store them in a hash table so look-up is O(1). "AND" becomes "A", "AN", "AND". I do a depth first search using the prefix dictionary to determine at each step if the string I'm building is part of a word. I'm going to post my code to git-hub if you're interested.

      Delete
    2. Interesting... Sure, would love to take a look at your code.

      Delete
    3. github.com/themachineswillwin/fast-boggle-solver

      Delete
    4. Thanks, I will take a look and get back if I have any comments.

      Delete
  38. This comment has been removed by the author.

    ReplyDelete
  39. can u give the codes in netbeans java.please...

    ReplyDelete
  40. can u please mail me the c++ code too.
    ralg9098@gmail.com

    ReplyDelete
  41. can you please email me the c++ code?
    joshua_chua90@yahoo.com

    ReplyDelete
  42. Great post. Enjoyed reading and perusing your code. You stored the board in a char array which is fine for all the tiles except for Qu. While the general approach would be the same, it seems that you would need to change your container and perhaps some of your search algorithms. Just a thought!

    ReplyDelete
  43. Hi, can you send me the C++ code to cronix_psycho@yahoo.com
    Thanks

    ReplyDelete
  44. Hi, can I use your code in a commercial project ?

    Thanks!

    ReplyDelete
    Replies
    1. Sure you can. But please attribute the code to this web page. Thanks. And I would love to know where you are using it!

      Delete
    2. Hi Gola,
      how to go about making a code for a NxN matrix using each alphabet only once......Would we be able to guess what would be the maximum number of words possible given a NxN matrix where N can be >10

      Delete
  45. hey do you think i can take a look at your program? thanks! pnchming@yahoo.com

    ReplyDelete
  46. Unfortunately, the dynamic programming solution is wrong over here as it ignores the fact that each cell can be visited atmost once. Your dp[i][j][k] can go through a cell many times as the state doesn't keep track of the previously visited cells.

    ReplyDelete
  47. Hey.. Ur explanation is too good.. Thanks.. Can you forward the full c++ code to albertiitm @gmail.com. ... thanks...

    ReplyDelete