## Friday, February 10, 2012

### Solving the Boggle Game - Recursion, Prefix Tree, and Dynamic Programming

I spent this past weekend designing the game of Boggle. Although it looks like a simple game at a high level, implementing it in a programming language was a great experience. I had to use recursion, sorting, searching, prefix trees (also knows as trie's), and dynamic programming as I was improving the run time of the program. So here is a summary of the work I did in trying to write a very optimal solution to the Boggle game:

*Image courtesy: http://stackoverflow.com/questions/746082/how-to-find-list-of-possible-words-from-a-letter-matrix-boggle-solver

The game of Boggle is played on a N x N board (usually made of cubes that has letters engraved on it). Given a dictionary, you will have to construct words on the board following these rules:

i) The letters in the word must be "adjacent" to each other
ii) Two letters on the board are "adjacent" if they are located on the board next to each other horizontally, vertically, or diagonally.
iii) A word must be 3 or more letters long to be valid

The more words you can construct, the more points you get. Longer words get more points.

In computerse: given an N x N board, and a dictionary containing hundreds of thousands of words, construct those words from the board that are found in the dictionary.

There are a number of approaches that one can take to solve this problem. I designed three different algorithms to solve it. Here are they:

First solution - Recursion + Binary Search:

In this approach, we recurse (using backtracking) through the board and generate all possible words. For each word that are three or more letters long we check to see if it's in the dictionary. If it is, we have a match!

Here is the algorithm steps:

1. Read in the dictionary in to a container in memory.
2. Sort the dictionary.
3. Using backtracking, search the board for words.
4. If a word is found and it contains 3 or more letters, do a binary search on the dictionary to see if the word is there.
5. If searching was successful in the previous step, print the letter out.
6. Continue step 3-5 as long as there are more words to construct on the board.

Complexity of this approach:

In this solution, we do a good job on the dictionary search. Using binary search, we are quickly finding out whether a word is in dictionary or not. But the real bottleneck is in searching the board for words. For an N x N board the search space is O((N*N)!). I am not exactly sure about this number, but you can find some discussions of it here: http://www.danvk.org/wp/2007-08-02/how-many-boggle-boards-are-there/.
(N*N)! is a HUGE number even for N = 5. So this approach is is impractical and out of question for any useful implementation.

Second Solution - Pruned Recursion + Prefix Tree (also known as a Trie):

From the previous approach, our major concern was the enormous search space on the board. Fortunately, using a a prefix tree or trie data structure we could significantly cut down on this search space. The reasoning behind this improvement is that, if a word "abc" does not occur as a prefix to any word in the dictionary there is no reason to keep searching for words after we encounter "abc" on the board. This actually cut down the run time a lot.

Here is the algorithm steps:

1. Read a word from the dictionary file.
2. Insert it into a prefix tree data structure in memory.
3. Repeat steps 1-2 until all words in the dictionary have been inserted into the prefix tree.
4. Using backtracking, search the board for words.
5. If a word is found and it contains 3 or more letters, search the prefix tree for the word.
6. If searching was *not* successful in the previous step, return from this branch of the backtracking stage. (There is no point to continue searching in this branch, nothing in the dictionary as the prefix tree says).
7. If searching was successful in step 5, continue searching by constructing more words along this branch of backtracking and stop when the leaf node has been reached in the prefix tree. (at that point there is nothing more to search).
8. Repeat steps 4-7 as long as there are more words to search in the backtracking.

Complexity of this approach:

This approach significantly improves on the first one. Building a prefix tree our of the dictionary words is O(W * L), where W is the number of words in the dictionary and L is the maximum length of a word in the dictionary.
Searching the board will be of the same order as the dictionary since we are not really searching words that are not in the dictionary. But in reality it will be more work than that as we still need to backtrack along the board to construct new words until we can consult the dictionary prefix tree to know whether it exists or not.

Third and Final Solution - No search space + Dynamic Programming:

The 2nd approach mentioned above was good enough until the board size was 5. Unfortunately with a board size of 6, that too was taking forever to complete!

It got me into thinking - "Dang, this search space is still too big to search! Can I just get rid of it entirely?" And then this idea popped into my mind: instead of random constructing word after word in this infinite ocean of words why don't I take a word from the dictionary and somehow magically check whether that's available on the board or not?

It turns out, we can use a nifty dynamic programming technique to quickly check whether a word (from the dictionary in this case) can be constructed from the board or not!

Here is core point of the dynamic programming idea:

For a word of length k to be found (end location) at the [i, j]-th location of the board, the k-1'th letter of that word must be located in one of the adjacent cells of [i, j].

The base case is k = 1.

A letter of length 1 will be found (end location) in the [i, j]-th cell of the board of the only letter in the word matches the letter in the [i, j]-th location of the board.

Once our dynamic programming table is populated with the base case, we can build on top of that for any word of length k, k > 1.

Here is a sample code for this:

for (k = 2; k < MAX_WORD_LENGTH; ++k)
for (i = 0; i < N; ++i)
for (j = 0; j < N; ++j)
if (board[i][j] == word[k])
{
for all the "adjacent" cells of board[i, j]
then table[k][i][j] = true;
}

Run Time Complexity:

The run time complexity of this approach is obvious is pretty obvious. It's O (W * N * N * MAX_WORD_LENGTH). Here N is dimension of the board which is usually between 4 to 6. So essentially the algorithm runs in the order of the size of the dictionary!

I solved a 6 X 6 boggle game with a dictionary with 600,000 words in 0.002 seconds! With I/O it as about 0.57 seconds. But with the trie and binary search approaches, it took much longer!

So, here is the summary of my a weekend's worth of adventure in to the world of algorithms and data structures. Feel free to ask me any questions, or any bug in these algorithms:)

Credit where it is due: My friend Satej who is a ninja problem solver and a PhD student at the UCF explained the dynamic programming solution to me first before I refined and finalized it. Thanks Satej!

Update (5/10/2012):

So many people have asked for the source code that I feel I better add the C++ code here:) The final, polished code for this is not in my home computer here, so if there is a bug please let me know!

#include <cstdio>
#include <iostream>

using namespace std;

const int N = 6; // max length of a word in the board

char in[N * N + 1]; // max length of a word
char board[N+1][N+2]; // keep room for a newline and null char at the end
char prev[N * N + 1];
bool dp[N * N + 1][N][N];

// direction X-Y delta pairs for adjacent cells
int dx[] = {0, 1, 1, 1, 0, -1, -1, -1};
int dy[] = {1, 1, 0, -1, -1, -1, 0, 1};
bool visited[N][N];

bool checkBoard(char* word, int curIndex, int r, int c, int wordLen)
{
if (curIndex == wordLen - 1)
{
//cout << "Returned TRUE!!" << endl;
return true;
}

int ret = false;

for (int i = 0; i < 8; ++i)
{
int newR = r + dx[i];
int newC = c + dy[i];

if (newR >= 0 && newR < N && newC >= 0 && newC < N && !visited[newR][newC] && word[curIndex+1] == board[newR][newC])
{
++curIndex;
visited[newR][newC] = true;

ret = checkBoard(word, curIndex, newR, newC, wordLen);
if (ret)
break;

--curIndex;
visited[newR][newC] = false;
}
}

return ret;
}

int main(int argc, char* argv[])
{

int i, j, k, l;

FILE* fDict = fopen("dict.txt","r");
FILE* fBoard = fopen("tmp2.txt","r");

for(i = 0; i < N; ++i)
fgets(board[i], N+2, fBoard);

strcpy(prev,"");
int pLen = 0;

while(fgets(in, N*N + 1, fDict))
{
int len = strlen(in);
if (in[len-1] == '\n')
{
in[len-1] = '\0'; // remove the trailing newline
--len;
}

if (len < 3)
continue; // we only want words longer than 3 or more letter

for(i = 0; i < len && i < pLen; ++i)
{
if(prev[i] != in[i])
break;
}

int firstMismatch = i; // little optimization: building on previous word (will benefit if the word list is sorted)

if(firstMismatch==0)
{
for(i = 0; i < 6; ++i)
{
for(j = 0; j < 6; ++j)
{
if(board[i][j] == in[0])
dp[0][i][j] = true;
else
dp[0][i][j] = false;
}
}
firstMismatch = 1;
}

for(k = firstMismatch; k < len; ++k)
{
for(i=0;i<6;++i)
{
for(j=0;j<6;++j)
{
dp[k][i][j] = false;

if(board[i][j] != in[k])
continue;

for(l= 0; l < 8 && !dp[k][i][j]; ++l)
{
int ti = i + dx[l];
int tj = j + dy[l];

if(ti < 0 || ti >= 6 || tj < 0 || tj >= 6)
continue;

if (dp[k-1][ti][tj])
dp[k][i][j] = true;
}
}
}
}

// check if the word is tagged as found in the dp table
bool flag = false;
for(i = 0; i < 6 && !flag; ++i)
{
for(j = 0; j < 6 && !flag; ++j)
{
if(dp[len-1][i][j])
flag =true;
}
}

// dp table says its there, but make sure its in the board and it does not repeat a location in the board
if(flag)
{
//cout << "Checking word: " << in << endl;
bool verified = false;

for (i = 0; i < N && !verified; ++i)
{
for (j = 0; j < N && !verified; ++j)
{
if (in[0] != board[i][j])
continue;

memset(visited, false, sizeof(visited));
visited[i][j] = true;

if (checkBoard(in, 0, i, j, len))
{
cout << in << endl;
break;
}
}
}
}

strcpy(prev,in);
pLen=len;

}

return 0;
}

Enjoy!

Update 3/15/2015: I have added a Java version of the boggle game solution on my Github page here: https://github.com/bilash/boggle-solver

The Java version builds a Trie for the dictionary and then uses the dynamic programming approach mentioned above.

1. i am interested and need the C++ code
could you provide me them codes
cemgurel@laben.com.tr

2. Thanks for reading my blog!

1. İ still havent received the codes yet do you mind sending me the codes again at cemgurel@laben.com.tr

3. Please send me a c++ code on yasha.khandelwal02@gmail.com.

Thanks

1. Can you forward me the c++ code as well? vidurmayor88@gmail.com.

Thanks :)

4. Hi,

1. This comment has been removed by the author.

5. hi,

please could you send me the java code for this at dspds.naveen@gmail.com

thanks

6. Hi Naveen, I didn't do it in Java. I have C++ code. Let me know if want C++ code for this.

1. Hi Golam,

Since your code seems to be so popular, I was wondering if you wouldn't mind sending the c++ code one more time to me at ivang1111@sbcglobal.net?

Thank you so much!!!

7. Hi,

8. your blog is nice.... now i am currently working on a data structures project...if u send this c++ code it would be helpful.... thankyou

9. sry my email is dsprojectmaze@gmail.com

10. my email is juzhax@gmail.com
do you mind to share the code too ?

11. Thanks everyone for showing interest in the code. I have sent it to all of you. Please check your inbox.

12. please send me c++ or java source code
thanks btw
alican@hotmail.co.jp

13. Very interesting. I'm curious for the source code.
Can you send it to me?
eydenjones@gmail.com
Thx

15. Could you send me the code?

bobsyouruncle@yarr.ca

16. Hi, that looks very nice.
Could you send me the C++ code to :

coffeehouse@free.fr

Thanks

17. can you send me the code
punjab1836@gmail.com

18. Hi, I'd be interested in seeing the code for this if you could email it to info[at]daveallanson.com

19. That's really cool, whats the largest board you've tired it on? I'm interested in your code as well.

lifeforce0 at gmail

20. Plz mail me c++ code at ashiniit@gmail.com.......
Thanks a lot

21. please send me C++ code.
jamesmark1007@gmail.com
Thank you

22. Hi all,

I updated the post with the source code. Please let me know if you find any bugs, etc. Thanks.

23. please send me C++ code.
tzewei91@gmail.com
Thank you

24. This is a really awesome read, could you send me the code, jcp1217@gmail.com

25. To those still asking for the code, I updated the post with the dynamic programming code. It's at the end of the post.

Thanks for your interest in the code!

26. can you please send me the java source code its educational purpose i just wana know how it works please send it to
p4l951@hotmail.com

1. java source plz

2. Hi, I only wrote C++ code for this. It should be very straightforward to port it to Java.

3. sure can you send that plz

27. Can you send me the c++ code @ m.garimella1@gmail.com

28. If still available, I'd like to a copy of the c++ code
Thanks larry@televox.cc

29. can u please mail me the java and c++ codes for the problem.
thanks my id is abhinavsareen1@gmail.com

30. Can you forward me the c++ code as well? vidurmayor88@gmail.com.

Thanks :)

31. can u please mail me the c++ code too.
swati_sj00@yahoo.com

32. Can you please forward me the C++ code?
akanksha.237@gmail.com

Thanks :)

33. plz forward me the c++ code
tarunnarang1992@gmail.com

34. This comment has been removed by the author.

35. To all those that wanted the C++ code, please check the end of the blog post. I added the C++ code in the post. Thanks.

36. Have you tried this with say a 100x100 boggle? I'm trying to compare your final solution with a c++ solver I implemented. My solver can do 100x100 in <7 seconds with a 178,000 word dictionary on a single 1.4ghz pentium. However using my solution, the time taken for larger puzzles does not scale linearly with the number of cubes.

37. Hi, no I haven't tied my solution with a large matrix (like 100x100). I guess it will take much longer than 7 seconds. Did you use DP or something else?

1. I break the dictionary words into prefixes and store them in a hash table so look-up is O(1). "AND" becomes "A", "AN", "AND". I do a depth first search using the prefix dictionary to determine at each step if the string I'm building is part of a word. I'm going to post my code to git-hub if you're interested.

2. Interesting... Sure, would love to take a look at your code.

3. github.com/themachineswillwin/fast-boggle-solver

4. Thanks, I will take a look and get back if I have any comments.

38. This comment has been removed by the author.

39. can u give the codes in netbeans java.please...

40. can u please mail me the c++ code too.
ralg9098@gmail.com

41. can you please email me the c++ code?
joshua_chua90@yahoo.com

42. Great post. Enjoyed reading and perusing your code. You stored the board in a char array which is fine for all the tiles except for Qu. While the general approach would be the same, it seems that you would need to change your container and perhaps some of your search algorithms. Just a thought!

43. Hi, can you send me the C++ code to cronix_psycho@yahoo.com
Thanks

44. Hi, can I use your code in a commercial project ?

Thanks!

1. Sure you can. But please attribute the code to this web page. Thanks. And I would love to know where you are using it!

2. Hi Gola,
how to go about making a code for a NxN matrix using each alphabet only once......Would we be able to guess what would be the maximum number of words possible given a NxN matrix where N can be >10

45. hey do you think i can take a look at your program? thanks! pnchming@yahoo.com

46. Unfortunately, the dynamic programming solution is wrong over here as it ignores the fact that each cell can be visited atmost once. Your dp[i][j][k] can go through a cell many times as the state doesn't keep track of the previously visited cells.

1. Finally someone pointed it out. In the wikipedia, it said clearly that "may not use the same letter cube more than once per word."
Other than that, great post...

47. Hey.. Ur explanation is too good.. Thanks.. Can you forward the full c++ code to albertiitm @gmail.com. ... thanks...

48. the code above doesmt giv any output..
can u mail me d code?? its quite urgent ;)
anurag27k@gmail.com
thanks

49. This comment has been removed by the author.

50. Why can't you do the solution 3 but with trie ?
instead of iterate through all dictionary, you can use trie to match valid word.

from the board you can check the word with NxNx(max words)

51. Interested in looking at the c++ code. Could you send it to ryanlh7job@gmail.com?

52. Can you please send the final C++ code to tanya_thatzme@yahoo.com?

53. This comment has been removed by the author.

54. Could I have the C++ code of this solution?
my gmail: kittysit1220@gmail.com

Thank you so much

55. Can you please send me the C++ or java code at profcdd@gmail.com ?

56. great article. thanks

57. First of all good article. Do you have an c# cod for that? here is my email vasquezjohncullen@yahoo.com I'm very happy if you let me have some

58. plz give me the code of dp approach thank you
mail id is abhidreams662@gmail.com

59. plz mail me c++ code for second and third algorithm.....thank you mial id: hiteshkumar848@gmail.com

60. This comment has been removed by the author.

61. I'm not sure why you said solution #2 was taking forever. Using your concept, the implementation using the trie solves a 10x10 in 210 milliseconds.

1. I am surprised you got such a fast runtime using the 2nd approach. If I remember correctly, it was taking a long time to finish with a board size of 6. Maybe I did not do something right... but I really don't remember much at this time.

2. Yeah, he's right. I suspect you were not pruning correctly. If I remove my pruning code, I go from solving a 10x10 in ~200ms (Python) to not solving it because I don't want to wait that long. I think you should retry the trie approach, as it should be a bit faster than the one you landed on (In my implementation, about 200* faster if I don't count trie creation, though I may be implementing it wrong)

62. Hi,
On your github page, I saw that you mentioned that you have implemented the dynamic approach in C++. Could you please send that to me at akshay.jagirdar.delhi@gmail.com
Thank you

63. the 3rd method works if it allows using one character repeatedly as long as not continuously. For example, 'dad' was found if 'da' is on the board. 'lee' was not found if 'le' in on the board.
the 3rd method is actually BFS+branch-and-bound.

64. I'd love to understand your solution using dynamic programming, but I cannot follow the description and the code is very cryptic with all those single letter variables.

65. you mind sending me a copy of your C++ code?

66. Hey what is the board.txt?

67. would you be able to send me a copy of the C++ code please? c837@protonmail.com

68. From where can I get dictionary and board file ?