A sweary—and expertly punctuated—weblog.

Wednesday, March 10, 2010

Here we come a-qwantzling

I've made it no secret on this blog that I love Dinosaur Comics. I find the strip extremely funny—both intellectually and viscerally—in a way that I can't properly explain to people who don't share my appreciation. If you aren't familiar, you should give it a fair hearing. It could change your life for the awesomer.

Ryan North, the writer of Dinosaur Comics, embedded a puzzle in one of his recent comics. Inspired by the cryptographic messages of early modern scientists like Newton and Hooke, he encoded the strip's punchline as an anagram: "12t10o8e7a6l6n6u5i5s5d5h5y3I3r3fbbwwkcmvg", meaning that there are 12 't's, 10 'o's, etc. He left it to his readers to decode the anagram and offered prizes to the first person to return the correct punchline. That was over a week ago.

So far no one has solved it. It's like Excalibur. Or the Riemann hypothesis.

Realizing that his "qwantzle" is challenging, Ryan has been slowly giving out clues. So far, this is all we know:
  1. The solution is a single, reasonably grammatical sentence that fits the context of the strip. It begins with the word "I", contains a colon and a comma (in that order), and ends with a double exclamation mark!!
  2. Letters in the solution are capitalized as in the code, and there are no proper nouns; thus, combined with the first clue, all instances of capital I must be the word "I".
  3. All words in the solution have been used previously in Dinosaur Comics. (DC is searchable, and readers have put together a dictionary of all possible words. My untrimmed dictionary has 14,000 unique words.)
  4. The longest word in the puzzle has 11 letters, and the next-longest word has 8; these words appear sequentially in the solution.
  5. [RN recently posted a final clue: the longest word is 'fundamental'. It helps, I suppose, but I think most people had already guessed that, and in any case the search space is still obscenely large!]

Even with the clues, qwantzle is maddeningly hard. Naively, there are 97! orderings of the letters (fewer distinct ones, since many letters repeat, but still an astronomical number), and even if you incorporate all of the hints, the number of possible word combinations is staggering: far too many for a computer to enumerate. So there's a small community of readers working on heuristic approaches to the problem, trying to combine human intuition with brute-force computational strength. Interestingly, readers have submitted many grammatical sentences that meet the criteria, but none of them has been correct, and so far most of those candidates have come simply from guess-and-check.

I won't lie: I've spent more time than I care to admit on qwantzle. I've taught myself a new programming language and spent a few idle hours crash-coursing on computational linguistics. To show for it, I've developed two approaches that I thought were clever. One takes a valid solution, randomly deletes a few words, and forms a new anagram from the letters of the deleted words; this gives you a way to automatically explore variations on a solution that you think might be pretty close. The other runs a genetic algorithm on letter ordering alone. The letter orderings are ranked according to how well they correlate with DC dialogue, randomly mutated and crossed over, and made to compete in a pseudo-Darwinian process intended to improve the overall quality of the solutions.

But, despite (what I consider) reasonable creativity, these approaches don't work all that well. They WILL spit out technically valid solutions, but they aren't terribly grammatical. My next step should be to include natural language processing techniques (NLP is a field that has had surprising success at computationally characterizing language as it is spoken and written), but I'm having a hard time being optimistic. In general, computers are far inferior to human brains at pattern recognition, and I struggle to believe that a computer could be made to recognize the right answer even if it found it.

I'm certainly on the lookout for new solution ideas, of course. But I think that the problem will remain unsolved until Ryan North finally gives out enough clues—at which time it will be solved by a human brain performing (computer-aided) guess-and-check.

[Note: I originally mistyped the anagram, so if any of you were working on the puzzle using my copy—and I hope you weren't—I'm very sorry and it's fixed now!]

4 comments:

g said...

pen and paper are about as high-tech as i get.

i imagine people have tried starting with the vocabulary of this particular comic itself, identifying which words are possible and which aren't ... which ones might work with the occurrences of the single letters in the anagram ... and which ones fit the word lengths given in the clues.

so for instance, discovery, discovering, discoveries are possible but not discovered (ten letters).

my gut feeling is that it should say something like: "i'll bet the reason is no body has made any discoveries worth stealing / found anything that any one wants to steal; when i do, i'll hide it ... / encode it, so that no one steals my idea!!"

but to get something like that to meet the criteria is of course not easy.

keep us posted.

Matt said...

If you do get sucked into the problem, try the last link in the post. It has nice tools to facilitate pen-and-paper analysis.

I don't think anyone has specifically used the last comic as a corpus. It's actually a pretty good idea, since you'd expect those words to show up frequently, and you would seriously cut down on the size of your search space. But I worry that it's too small to have confidence that it contains ALL the words in the solution.

My guess for the 8/11 combo is "fundamental theories". Based on my extensive experience with DC, I speculate that the phrase goes something like: "I totally have fundamental theories you guys: [insert some blustery "theories" about parties or how T-rex is sexier or smarter than everyone else], woo!!" (I really hope the word "totally" is in the answer; there are enough letters to spell it out twice!)

I had a serious brain wave yesterday for a computational solution that might actually work. I'm going to code it up tonight. I'll let you know how it goes!

Unknown said...

For anyone solving this, I would simply like to note the great possibility of the word "totally" appearing in the solution.

Unknown said...

I just realized you just said that... Great minds think alike, huh?

Oh, that's a lie. Great minds think outside the box.
