I spent this past weekend designing the game of Boggle. Although it looks like a simple game at a high level, implementing it in a programming language was a great experience. I had to use recursion, sorting, searching, prefix trees (also knows as trie's), and dynamic programming as I was improving the run time of the program. So here is a summary of the work I did in trying to write a very optimal solution to the Boggle game:

*Image courtesy: http://stackoverflow.com/questions/746082/how-to-find-list-of-possible-words-from-a-letter-matrix-boggle-solver

The game of Boggle is played on a N x N board (usually made of cubes that has letters engraved on it). Given a dictionary, you will have to construct words on the board following these rules:

i) The letters in the word must be "adjacent" to each other

ii) Two letters on the board are "adjacent" if they are located on the board next to each other horizontally, vertically, or diagonally.

iii) A word must be 3 or more letters long to be valid

The more words you can construct, the more points you get. Longer words get more points.

In computerse: given an N x N board, and a dictionary containing hundreds of thousands of words, construct those words from the board that are found in the dictionary.

There are a number of approaches that one can take to solve this problem. I designed three different algorithms to solve it. Here are they:

In this approach, we recurse (using backtracking) through the board and generate all possible words. For each word that are three or more letters long we check to see if it's in the dictionary. If it is, we have a match!

Here is the algorithm steps:

1. Read in the dictionary in to a container in memory.

2. Sort the dictionary.

3. Using backtracking, search the board for words.

4. If a word is found and it contains 3 or more letters, do a binary search on the dictionary to see if the word is there.

5. If searching was successful in the previous step, print the letter out.

6. Continue step 3-5 as long as there are more words to construct on the board.

Complexity of this approach:

In this solution, we do a good job on the dictionary search. Using binary search, we are quickly finding out whether a word is in dictionary or not. But the real bottleneck is in searching the board for words. For an N x N board the search space is O((N*N)!). I am not exactly sure about this number, but you can find some discussions of it here: http://www.danvk.org/wp/2007-08-02/how-many-boggle-boards-are-there/.

(N*N)! is a HUGE number even for N = 5. So this approach is is impractical and out of question for any useful implementation.

From the previous approach, our major concern was the enormous search space on the board. Fortunately, using a a prefix tree or trie data structure we could significantly cut down on this search space. The reasoning behind this improvement is that, if a word "abc" does not occur as a prefix to any word in the dictionary there is no reason to keep searching for words after we encounter "abc" on the board. This actually cut down the run time a lot.

Here is the algorithm steps:

1. Read a word from the dictionary file.

2. Insert it into a prefix tree data structure in memory.

3. Repeat steps 1-2 until all words in the dictionary have been inserted into the prefix tree.

4. Using backtracking, search the board for words.

5. If a word is found and it contains 3 or more letters, search the prefix tree for the word.

6. If searching was *not* successful in the previous step, return from this branch of the backtracking stage. (There is no point to continue searching in this branch, nothing in the dictionary as the prefix tree says).

7. If searching was successful in step 5, continue searching by constructing more words along this branch of backtracking and stop when the leaf node has been reached in the prefix tree. (at that point there is nothing more to search).

8. Repeat steps 4-7 as long as there are more words to search in the backtracking.

Complexity of this approach:

This approach significantly improves on the first one. Building a prefix tree our of the dictionary words is O(W * L), where W is the number of words in the dictionary and L is the maximum length of a word in the dictionary.

Searching the board will be of the same order as the dictionary since we are not really searching words that are not in the dictionary. But in reality it will be more work than that as we still need to backtrack along the board to construct new words until we can consult the dictionary prefix tree to know whether it exists or not.

The 2nd approach mentioned above was good enough until the board size was 5. Unfortunately with a board size of 6, that too was taking forever to complete!

It got me into thinking - "Dang, this search space is still too big to search! Can I just get rid of it

It turns out, we can use a nifty dynamic programming technique to quickly check whether a word (from the dictionary in this case) can be constructed from the board or not!

Here is core point of the dynamic programming idea:

For a word of length k to be found (end location) at the [i, j]-th location of the board, the k-1'th letter of that word must be located in one of the adjacent cells of [i, j].

The base case is k = 1.

A letter of length 1 will be found (end location) in the [i, j]-th cell of the board of the only letter in the word matches the letter in the [i, j]-th location of the board.

Once our dynamic programming table is populated with the base case, we can build on top of that for any word of length k, k > 1.

Here is a sample code for this:

for (k = 2; k < MAX_WORD_LENGTH; ++k)

for (i = 0; i < N; ++i)

for (j = 0; j < N; ++j)

if (board[i][j] == word[k])

{

for all the "adjacent" cells of board[i, j]

if we table[k-1][adjacent_i][adjacent_j] is true

then table[k][i][j] = true;

}

*I have the C++ code, if you are interested to take a look leave a comment with your email address.

Run Time Complexity:

The run time complexity of this approach is obvious is pretty obvious. It's O (W * N * N * MAX_WORD_LENGTH). Here N is dimension of the board which is usually between 4 to 6. So essentially the algorithm runs in the order of the size of the dictionary!

I solved a 6 X 6 boggle game with a dictionary with 600,000 words in 0.002 seconds! With I/O it as about 0.57 seconds. But with the trie and binary search approaches, it took much longer!

So, here is the summary of my a weekend's worth of adventure in to the world of algorithms and data structures. Feel free to ask me any questions, or any bug in these algorithms:)

Credit where it is due: My friend Satej who is a ninja problem solver and a PhD student at the UCF explained the dynamic programming solution to me first before I refined and finalized it. Thanks Satej!

So many people have asked for the source code that I feel I better add the C++ code here:) The final, polished code for this is not in my home computer here, so if there is a bug please let me know!

Enjoy!

*Image courtesy: http://stackoverflow.com/questions/746082/how-to-find-list-of-possible-words-from-a-letter-matrix-boggle-solver

The game of Boggle is played on a N x N board (usually made of cubes that has letters engraved on it). Given a dictionary, you will have to construct words on the board following these rules:

i) The letters in the word must be "adjacent" to each other

ii) Two letters on the board are "adjacent" if they are located on the board next to each other horizontally, vertically, or diagonally.

iii) A word must be 3 or more letters long to be valid

The more words you can construct, the more points you get. Longer words get more points.

In computerse: given an N x N board, and a dictionary containing hundreds of thousands of words, construct those words from the board that are found in the dictionary.

There are a number of approaches that one can take to solve this problem. I designed three different algorithms to solve it. Here are they:

**First solution - Recursion + Binary Search**:In this approach, we recurse (using backtracking) through the board and generate all possible words. For each word that are three or more letters long we check to see if it's in the dictionary. If it is, we have a match!

Here is the algorithm steps:

1. Read in the dictionary in to a container in memory.

2. Sort the dictionary.

3. Using backtracking, search the board for words.

4. If a word is found and it contains 3 or more letters, do a binary search on the dictionary to see if the word is there.

5. If searching was successful in the previous step, print the letter out.

6. Continue step 3-5 as long as there are more words to construct on the board.

Complexity of this approach:

In this solution, we do a good job on the dictionary search. Using binary search, we are quickly finding out whether a word is in dictionary or not. But the real bottleneck is in searching the board for words. For an N x N board the search space is O((N*N)!). I am not exactly sure about this number, but you can find some discussions of it here: http://www.danvk.org/wp/2007-08-02/how-many-boggle-boards-are-there/.

(N*N)! is a HUGE number even for N = 5. So this approach is is impractical and out of question for any useful implementation.

**Second Solution - Pruned Recursion + Prefix Tree (also known as a Trie)**:From the previous approach, our major concern was the enormous search space on the board. Fortunately, using a a prefix tree or trie data structure we could significantly cut down on this search space. The reasoning behind this improvement is that, if a word "abc" does not occur as a prefix to any word in the dictionary there is no reason to keep searching for words after we encounter "abc" on the board. This actually cut down the run time a lot.

Here is the algorithm steps:

1. Read a word from the dictionary file.

2. Insert it into a prefix tree data structure in memory.

3. Repeat steps 1-2 until all words in the dictionary have been inserted into the prefix tree.

4. Using backtracking, search the board for words.

5. If a word is found and it contains 3 or more letters, search the prefix tree for the word.

6. If searching was *not* successful in the previous step, return from this branch of the backtracking stage. (There is no point to continue searching in this branch, nothing in the dictionary as the prefix tree says).

7. If searching was successful in step 5, continue searching by constructing more words along this branch of backtracking and stop when the leaf node has been reached in the prefix tree. (at that point there is nothing more to search).

8. Repeat steps 4-7 as long as there are more words to search in the backtracking.

Complexity of this approach:

This approach significantly improves on the first one. Building a prefix tree our of the dictionary words is O(W * L), where W is the number of words in the dictionary and L is the maximum length of a word in the dictionary.

Searching the board will be of the same order as the dictionary since we are not really searching words that are not in the dictionary. But in reality it will be more work than that as we still need to backtrack along the board to construct new words until we can consult the dictionary prefix tree to know whether it exists or not.

**Third and Final Solution - No search space + Dynamic Programming**:The 2nd approach mentioned above was good enough until the board size was 5. Unfortunately with a board size of 6, that too was taking forever to complete!

It got me into thinking - "Dang, this search space is still too big to search! Can I just get rid of it

*entirely*?" And then this idea popped into my mind: instead of random constructing word after word in this infinite ocean of words why don't I take a word from the dictionary and somehow magically check whether that's available on the board or not?It turns out, we can use a nifty dynamic programming technique to quickly check whether a word (from the dictionary in this case) can be constructed from the board or not!

Here is core point of the dynamic programming idea:

For a word of length k to be found (end location) at the [i, j]-th location of the board, the k-1'th letter of that word must be located in one of the adjacent cells of [i, j].

The base case is k = 1.

A letter of length 1 will be found (end location) in the [i, j]-th cell of the board of the only letter in the word matches the letter in the [i, j]-th location of the board.

Once our dynamic programming table is populated with the base case, we can build on top of that for any word of length k, k > 1.

Here is a sample code for this:

for (k = 2; k < MAX_WORD_LENGTH; ++k)

for (i = 0; i < N; ++i)

for (j = 0; j < N; ++j)

if (board[i][j] == word[k])

{

for all the "adjacent" cells of board[i, j]

if we table[k-1][adjacent_i][adjacent_j] is true

then table[k][i][j] = true;

}

*I have the C++ code, if you are interested to take a look leave a comment with your email address.

Run Time Complexity:

The run time complexity of this approach is obvious is pretty obvious. It's O (W * N * N * MAX_WORD_LENGTH). Here N is dimension of the board which is usually between 4 to 6. So essentially the algorithm runs in the order of the size of the dictionary!

I solved a 6 X 6 boggle game with a dictionary with 600,000 words in 0.002 seconds! With I/O it as about 0.57 seconds. But with the trie and binary search approaches, it took much longer!

So, here is the summary of my a weekend's worth of adventure in to the world of algorithms and data structures. Feel free to ask me any questions, or any bug in these algorithms:)

Credit where it is due: My friend Satej who is a ninja problem solver and a PhD student at the UCF explained the dynamic programming solution to me first before I refined and finalized it. Thanks Satej!

**Update (5/10/2012)**:So many people have asked for the source code that I feel I better add the C++ code here:) The final, polished code for this is not in my home computer here, so if there is a bug please let me know!

#include <cstdio>

#include <iostream>

using namespace std;

const int N = 6; // max length of a word in the board

char in[N * N + 1]; // max length of a word

char board[N+1][N+2]; // keep room for a newline and null char at the end

char prev[N * N + 1];

bool dp[N * N + 1][N][N];

// direction X-Y delta pairs for adjacent cells

int dx[] = {0, 1, 1, 1, 0, -1, -1, -1};

int dy[] = {1, 1, 0, -1, -1, -1, 0, 1};

bool visited[N][N];

bool checkBoard(char* word, int curIndex, int r, int c, int wordLen)

{

if (curIndex == wordLen - 1)

{

//cout << "Returned TRUE!!" << endl;

return true;

}

int ret = false;

for (int i = 0; i < 8; ++i)

{

int newR = r + dx[i];

int newC = c + dy[i];

if (newR >= 0 && newR < N && newC >= 0 && newC < N && !visited[newR][newC] && word[curIndex+1] == board[newR][newC])

{

++curIndex;

visited[newR][newC] = true;

ret = checkBoard(word, curIndex, newR, newC, wordLen);

if (ret)

break;

--curIndex;

visited[newR][newC] = false;

}

}

return ret;

}

int main(int argc, char* argv[])

{

int i, j, k, l;

FILE* fDict = fopen("dict.txt","r");

FILE* fBoard = fopen("tmp2.txt","r");

for(i = 0; i < N; ++i)

fgets(board[i], N+2, fBoard);

strcpy(prev,"");

int pLen = 0;

while(fgets(in, N*N + 1, fDict))

{

int len = strlen(in);

if (in[len-1] == '\n')

{

in[len-1] = '\0'; // remove the trailing newline

--len;

}

if (len < 3)

continue; // we only want words longer than 3 or more letter

for(i = 0; i < len && i < pLen; ++i)

{

if(prev[i] != in[i])

break;

}

int firstMismatch = i; // little optimization: building on previous word (will benefit if the word list is sorted)

if(firstMismatch==0)

{

for(i = 0; i < 6; ++i)

{

for(j = 0; j < 6; ++j)

{

if(board[i][j] == in[0])

dp[0][i][j] = true;

else

dp[0][i][j] = false;

}

}

firstMismatch = 1;

}

for(k = firstMismatch; k < len; ++k)

{

for(i=0;i<6;++i)

{

for(j=0;j<6;++j)

{

dp[k][i][j] = false;

if(board[i][j] != in[k])

continue;

for(l= 0; l < 8 && !dp[k][i][j]; ++l)

{

int ti = i + dx[l];

int tj = j + dy[l];

if(ti < 0 || ti >= 6 || tj < 0 || tj >= 6)

continue;

if (dp[k-1][ti][tj])

dp[k][i][j] = true;

}

}

}

}

// check if the word is tagged as found in the dp table

bool flag = false;

for(i = 0; i < 6 && !flag; ++i)

{

for(j = 0; j < 6 && !flag; ++j)

{

if(dp[len-1][i][j])

flag =true;

}

}

// dp table says its there, but make sure its in the board and it does not repeat a location in the board

if(flag)

{

//cout << "Checking word: " << in << endl;

bool verified = false;

for (i = 0; i < N && !verified; ++i)

{

for (j = 0; j < N && !verified; ++j)

{

if (in[0] != board[i][j])

continue;

memset(visited, false, sizeof(visited));

visited[i][j] = true;

if (checkBoard(in, 0, i, j, len))

{

cout << in << endl;

break;

}

}

}

}

strcpy(prev,in);

pLen=len;

}

return 0;

}

#include <iostream>

using namespace std;

const int N = 6; // max length of a word in the board

char in[N * N + 1]; // max length of a word

char board[N+1][N+2]; // keep room for a newline and null char at the end

char prev[N * N + 1];

bool dp[N * N + 1][N][N];

// direction X-Y delta pairs for adjacent cells

int dx[] = {0, 1, 1, 1, 0, -1, -1, -1};

int dy[] = {1, 1, 0, -1, -1, -1, 0, 1};

bool visited[N][N];

bool checkBoard(char* word, int curIndex, int r, int c, int wordLen)

{

if (curIndex == wordLen - 1)

{

//cout << "Returned TRUE!!" << endl;

return true;

}

int ret = false;

for (int i = 0; i < 8; ++i)

{

int newR = r + dx[i];

int newC = c + dy[i];

if (newR >= 0 && newR < N && newC >= 0 && newC < N && !visited[newR][newC] && word[curIndex+1] == board[newR][newC])

{

++curIndex;

visited[newR][newC] = true;

ret = checkBoard(word, curIndex, newR, newC, wordLen);

if (ret)

break;

--curIndex;

visited[newR][newC] = false;

}

}

return ret;

}

int main(int argc, char* argv[])

{

int i, j, k, l;

FILE* fDict = fopen("dict.txt","r");

FILE* fBoard = fopen("tmp2.txt","r");

for(i = 0; i < N; ++i)

fgets(board[i], N+2, fBoard);

strcpy(prev,"");

int pLen = 0;

while(fgets(in, N*N + 1, fDict))

{

int len = strlen(in);

if (in[len-1] == '\n')

{

in[len-1] = '\0'; // remove the trailing newline

--len;

}

if (len < 3)

continue; // we only want words longer than 3 or more letter

for(i = 0; i < len && i < pLen; ++i)

{

if(prev[i] != in[i])

break;

}

int firstMismatch = i; // little optimization: building on previous word (will benefit if the word list is sorted)

if(firstMismatch==0)

{

for(i = 0; i < 6; ++i)

{

for(j = 0; j < 6; ++j)

{

if(board[i][j] == in[0])

dp[0][i][j] = true;

else

dp[0][i][j] = false;

}

}

firstMismatch = 1;

}

for(k = firstMismatch; k < len; ++k)

{

for(i=0;i<6;++i)

{

for(j=0;j<6;++j)

{

dp[k][i][j] = false;

if(board[i][j] != in[k])

continue;

for(l= 0; l < 8 && !dp[k][i][j]; ++l)

{

int ti = i + dx[l];

int tj = j + dy[l];

if(ti < 0 || ti >= 6 || tj < 0 || tj >= 6)

continue;

if (dp[k-1][ti][tj])

dp[k][i][j] = true;

}

}

}

}

// check if the word is tagged as found in the dp table

bool flag = false;

for(i = 0; i < 6 && !flag; ++i)

{

for(j = 0; j < 6 && !flag; ++j)

{

if(dp[len-1][i][j])

flag =true;

}

}

// dp table says its there, but make sure its in the board and it does not repeat a location in the board

if(flag)

{

//cout << "Checking word: " << in << endl;

bool verified = false;

for (i = 0; i < N && !verified; ++i)

{

for (j = 0; j < N && !verified; ++j)

{

if (in[0] != board[i][j])

continue;

memset(visited, false, sizeof(visited));

visited[i][j] = true;

if (checkBoard(in, 0, i, j, len))

{

cout << in << endl;

break;

}

}

}

}

strcpy(prev,in);

pLen=len;

}

return 0;

}

Enjoy!

i am interested and need the C++ code

ReplyDeletecould you provide me them codes

cemgurel@laben.com.tr

Thanks for reading my blog!

ReplyDeletePlease check your inbox for the C++ code.

İ still havent received the codes yet do you mind sending me the codes again at cemgurel@laben.com.tr

DeletePlease send me a c++ code on yasha.khandelwal02@gmail.com.

ReplyDeleteThanks

Can you forward me the c++ code as well? vidurmayor88@gmail.com.

DeleteThanks :)

Hi,

ReplyDeletePlease check your inbox. Thanks.

hi,

ReplyDeleteplease could you send me the java code for this at dspds.naveen@gmail.com

thanks

Hi Naveen, I didn't do it in Java. I have C++ code. Let me know if want C++ code for this.

ReplyDeleteHi Golam,

DeleteSince your code seems to be so popular, I was wondering if you wouldn't mind sending the c++ code one more time to me at ivang1111@sbcglobal.net?

Thank you so much!!!

Hi,

ReplyDeleteCan u please send me the c++ code to radha.addala@yahoo.co.in

Thanks in advance

your blog is nice.... now i am currently working on a data structures project...if u send this c++ code it would be helpful.... thankyou

ReplyDeletesry my email is dsprojectmaze@gmail.com

ReplyDeletemy email is juzhax@gmail.com

ReplyDeletedo you mind to share the code too ?

Thanks everyone for showing interest in the code. I have sent it to all of you. Please check your inbox.

ReplyDeleteplease send me c++ or java source code

ReplyDeletethanks btw

alican@hotmail.co.jp

Very interesting. I'm curious for the source code.

ReplyDeleteCan you send it to me?

eydenjones@gmail.com

Thx

Sorry for the late reply. Please check your inbox. I sent the code.

ReplyDeleteCould you send me the code?

ReplyDeletebobsyouruncle@yarr.ca

Hi, that looks very nice.

ReplyDeleteCould you send me the C++ code to :

coffeehouse@free.fr

Thanks

can you send me the code

ReplyDeletepunjab1836@gmail.com

Hi, I'd be interested in seeing the code for this if you could email it to info[at]daveallanson.com

ReplyDeleteThat's really cool, whats the largest board you've tired it on? I'm interested in your code as well.

ReplyDeletelifeforce0 at gmail

Plz mail me c++ code at ashiniit@gmail.com.......

ReplyDeleteThanks a lot

please send me C++ code.

ReplyDeletejamesmark1007@gmail.com

Thank you

Hi all,

ReplyDeleteI updated the post with the source code. Please let me know if you find any bugs, etc. Thanks.

please send me C++ code.

ReplyDeletetzewei91@gmail.com

Thank you

This is a really awesome read, could you send me the code, jcp1217@gmail.com

ReplyDeleteTo those still asking for the code, I updated the post with the dynamic programming code. It's at the end of the post.

ReplyDeleteThanks for your interest in the code!

can you please send me the java source code its educational purpose i just wana know how it works please send it to

ReplyDeletep4l951@hotmail.com

java source plz

DeleteHi, I only wrote C++ code for this. It should be very straightforward to port it to Java.

Deletesure can you send that plz

DeleteCan you send me the c++ code @ m.garimella1@gmail.com

ReplyDeleteIf still available, I'd like to a copy of the c++ code

ReplyDeleteThanks larry@televox.cc

can u please mail me the java and c++ codes for the problem.

ReplyDeletethanks my id is abhinavsareen1@gmail.com

Can you forward me the c++ code as well? vidurmayor88@gmail.com.

ReplyDeleteThanks :)

can u please mail me the c++ code too.

ReplyDeleteswati_sj00@yahoo.com

Can you please forward me the C++ code?

ReplyDeleteakanksha.237@gmail.com

Thanks :)

plz forward me the c++ code

ReplyDeletetarunnarang1992@gmail.com

This comment has been removed by the author.

ReplyDeleteTo all those that wanted the C++ code, please check the end of the blog post. I added the C++ code in the post. Thanks.

ReplyDeleteHave you tried this with say a 100x100 boggle? I'm trying to compare your final solution with a c++ solver I implemented. My solver can do 100x100 in <7 seconds with a 178,000 word dictionary on a single 1.4ghz pentium. However using my solution, the time taken for larger puzzles does not scale linearly with the number of cubes.

ReplyDeleteHi, no I haven't tied my solution with a large matrix (like 100x100). I guess it will take much longer than 7 seconds. Did you use DP or something else?

ReplyDeleteI break the dictionary words into prefixes and store them in a hash table so look-up is O(1). "AND" becomes "A", "AN", "AND". I do a depth first search using the prefix dictionary to determine at each step if the string I'm building is part of a word. I'm going to post my code to git-hub if you're interested.

DeleteInteresting... Sure, would love to take a look at your code.

Deletegithub.com/themachineswillwin/fast-boggle-solver

DeleteThanks, I will take a look and get back if I have any comments.

DeleteThis comment has been removed by the author.

ReplyDeletecan u give the codes in netbeans java.please...

ReplyDeletecan u please mail me the c++ code too.

ReplyDeleteralg9098@gmail.com

can you please email me the c++ code?

ReplyDeletejoshua_chua90@yahoo.com

Great post. Enjoyed reading and perusing your code. You stored the board in a char array which is fine for all the tiles except for Qu. While the general approach would be the same, it seems that you would need to change your container and perhaps some of your search algorithms. Just a thought!

ReplyDeleteHi, can you send me the C++ code to cronix_psycho@yahoo.com

ReplyDeleteThanks

Hi, can I use your code in a commercial project ?

ReplyDeleteThanks!

Sure you can. But please attribute the code to this web page. Thanks. And I would love to know where you are using it!

DeleteHi Gola,

Deletehow to go about making a code for a NxN matrix using each alphabet only once......Would we be able to guess what would be the maximum number of words possible given a NxN matrix where N can be >10

hey do you think i can take a look at your program? thanks! pnchming@yahoo.com

ReplyDeleteUnfortunately, the dynamic programming solution is wrong over here as it ignores the fact that each cell can be visited atmost once. Your dp[i][j][k] can go through a cell many times as the state doesn't keep track of the previously visited cells.

ReplyDeleteHey.. Ur explanation is too good.. Thanks.. Can you forward the full c++ code to albertiitm @gmail.com. ... thanks...

ReplyDeletethe code above doesmt giv any output..

ReplyDeletecan u mail me d code?? its quite urgent ;)

anurag27k@gmail.com

thanks

Hi,

ReplyDeleteWould you mind to explain why solution number 3 which compare all possible

solution to dictionary is much better than solution no 2 ?

If you have 1millions words in your dictionary, than you have to iterate at least 1million x N x N.

Does it mean the trie solution is much better because once you build the trie you can reuse it for any other solution and each search will be log n ?

thanks for your explanation.