Ryan North, the writer of Dinosaur Comics, embedded a puzzle in one of his recent comics. Inspired by the cryptographic messages of early modern scientists like Newton and Hooke, he encoded the strip's punchline as an anagram: "12t10o8e7a6l6n6u5i5s5d5h5y3I3r3fbbwwkcmvg", meaning that there are 12 't's, 10 'o's, and so on. He left it to his readers to decode the anagram and offered prizes to the first person to submit the correct punchline. That was over a week ago.
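The count-prefixed notation is easy to handle mechanically. Here is a minimal Python sketch that expands the code into letter counts and checks whether a candidate sentence uses exactly those letters. The helper names are illustrative and don't come from any existing solver. It assumes capitalization matters (capital I is counted separately from lowercase i) and that spaces and punctuation don't count.

```python
import re
from collections import Counter

# The anagram code as given above: a count, then a letter; bare letters count once.
CODE = "12t10o8e7a6l6n6u5i5s5d5h5y3I3r3fbbwwkcmvg"


def parse_anagram_code(code: str) -> Counter:
    """Expand count-prefixed notation such as '12t3Ibb' into letter counts."""
    counts = Counter()
    for num, letter in re.findall(r"(\d*)([A-Za-z])", code):
        counts[letter] += int(num) if num else 1
    return counts


def is_exact_anagram(sentence: str, code: str) -> bool:
    """True if the sentence's letters match the code exactly.

    Case-sensitive (capital I is not lowercase i); spaces and punctuation
    are ignored.
    """
    letters = Counter(ch for ch in sentence if ch.isalpha())
    return letters == parse_anagram_code(code)


pool = parse_anagram_code(CODE)
print(pool["t"], pool["o"], pool["I"])  # 12 10 3
print(is_exact_anagram("I saw a cat!", "3aIswct"))  # True
```

Any candidate that fails this check can be thrown out immediately, however good it sounds.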
So far no one has solved it. It's like Excalibur. Or the Riemann hypothesis.
Realizing that his "qwantzle" is challenging, Ryan has been slowly giving out clues. So far, this is all we know:
- The solution is a single, reasonably grammatical sentence that fits the context of the strip. It begins with the word "I", contains a colon and a comma (in that order), and ends with a double exclamation mark!!
- Letters in the solution are capitalized as in the code, and there are no proper nouns; thus, combined with the first clue, all instances of capital I must be the word "I".
- All words in the solution have been used previously in Dinosaur Comics. (DC is searchable, and readers have put together a dictionary of all possible words. My untrimmed dictionary has 14,000 unique words.)
- The longest word in the solution has 11 letters and the next-longest has 8; these two words appear one right after the other.
- [RN recently posted a final clue: the longest word is 'fundamental'. It helps, I suppose, but I think most people had already guessed that, and in any case the search space is still obscenely large!]
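Taken together, the clues shrink the dictionary considerably. A rough Python sketch, assuming the dictionary is just a list of words as they appear in the strip: set aside 'fundamental', subtract its letters from the pool, and keep only words of at most 8 letters that can still be spelled from what's left. The function names are hypothetical, and this doesn't enforce the stronger constraints (exactly one 8-letter word, placed right next to 'fundamental').

```python
import re
from collections import Counter

CODE = "12t10o8e7a6l6n6u5i5s5d5h5y3I3r3fbbwwkcmvg"


def letter_pool(code: str) -> Counter:
    """Expand the count-prefixed anagram code into letter counts."""
    counts = Counter()
    for num, letter in re.findall(r"(\d*)([A-Za-z])", code):
        counts[letter] += int(num) if num else 1
    return counts


def fits(letters: str, pool: Counter) -> bool:
    """True if every letter in `letters` is available in `pool` often enough."""
    return all(pool[ch] >= n for ch, n in Counter(letters).items())


def prune_dictionary(words, code=CODE, longest="fundamental", max_other_len=8):
    """Keep words that could still appear alongside the known longest word.

    Hypothetical helper: removes the longest word's letters from the pool,
    then keeps words of at most `max_other_len` letters (apostrophes and
    other non-letters ignored) that can be spelled from what remains.
    """
    remaining = letter_pool(code) - Counter(longest)
    keep = []
    for word in set(words):
        letters = "".join(ch for ch in word if ch.isalpha())
        if len(letters) <= max_other_len and fits(letters, remaining):
            keep.append(word)
    return sorted(keep)


sample = ["tot", "zebra", "fundamental", "toothsome", "I", "mom", "don't"]
print(prune_dictionary(sample))  # ['I', "don't", 'tot']
```

Here 'mom' is rejected because the puzzle's only m is already used by 'fundamental'. Because the check is case-sensitive, dictionary entries with capitals other than I also fall out automatically.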
Even with the clues, qwantzle is maddeningly hard. Naively, there are 97! ways to order the letters, and even if you incorporate all of the hints, the number of possible word combinations is staggering: far too many for a computer to enumerate. So there's a small community of readers working on heuristic approaches to the problem, trying to combine human intuition with brute-force computational strength. So far, though, most candidate solutions have come simply from guess-and-check. Interestingly, readers have submitted many grammatical sentences that meet all the criteria, but none of them has been correct.
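The factorial figure actually overstates the letter-level problem, since swapping two of the 12 t's produces the same string. The number of distinct orderings is the multinomial coefficient n!/(n_t! · n_o! · ...), where each n_x is the count of letter x. The short Python sketch that follows (helper names are illustrative) computes both numbers from the anagram code. The distinct count is far smaller than the factorial but still astronomically large, and that's before word boundaries or grammar enter the picture.

```python
import re
from collections import Counter
from math import factorial, log10

CODE = "12t10o8e7a6l6n6u5i5s5d5h5y3I3r3fbbwwkcmvg"


def letter_counts(code: str) -> Counter:
    """Expand the count-prefixed anagram code into letter counts."""
    counts = Counter()
    for num, letter in re.findall(r"(\d*)([A-Za-z])", code):
        counts[letter] += int(num) if num else 1
    return counts


def distinct_orderings(counts: Counter) -> int:
    """Multinomial coefficient n! / (n_1! * n_2! * ... * n_k!).

    Each division is exact, so integer arithmetic keeps the result precise.
    """
    result = factorial(sum(counts.values()))
    for c in counts.values():
        result //= factorial(c)
    return result


counts = letter_counts(CODE)
n = sum(counts.values())
print(f"naive n!: about 10^{log10(factorial(n)):.0f}")
print(f"distinct orderings: about 10^{log10(distinct_orderings(counts)):.0f}")
```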
I won't lie: I've spent more time on qwantzle than I care to admit. I've taught myself a new programming language and spent a few idle hours on a crash course in computational linguistics. For my trouble, I've developed two approaches that I thought were clever. The first takes a valid solution, randomly deletes a few words, and rearranges the letters of the deleted words into new words; this lets you automatically explore variations on a solution that you think might be close. The second runs a genetic algorithm on letter ordering alone: candidate letter orderings are ranked by how well they correlate with DC dialogue, randomly mutated and crossed over, and made to compete in a pseudo-Darwinian process intended to improve the overall quality of the solutions.
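The delete-and-re-anagram idea fits in a few dozen lines of Python. This is an illustration, not a reproduction of any particular implementation, and it rests on some assumptions: a solution is a list of words, the dictionary is a plain word list, the freed letters are re-spelled as up to three dictionary words by depth-first search, and the new words go where the first deleted word was. All function names and parameters are hypothetical.

```python
import random
from collections import Counter
from itertools import islice


def sub_anagrams(pool, vocab, start=0, max_words=3):
    """Yield word lists (from vocab) that use up exactly the letters in pool.

    Depth-first search; words are taken in non-decreasing vocab order so the
    same multiset of words is not produced twice in different orders.
    """
    if sum(pool.values()) == 0:
        yield []
        return
    if max_words == 0:
        return
    for i in range(start, len(vocab)):
        need = Counter(vocab[i])
        if all(pool[ch] >= n for ch, n in need.items()):
            for rest in sub_anagrams(pool - need, vocab, i, max_words - 1):
                yield [vocab[i]] + rest


def variations(solution, words, k=2, tries=100, per_try=50, seed=0):
    """Delete k random words from solution and re-anagram their letters.

    Hypothetical helper (assumes a non-empty solution and k >= 1): the
    replacement words are spliced in where the first deleted word was, so
    every result uses exactly the original letters.
    """
    rng = random.Random(seed)
    vocab = sorted(set(words))
    k = min(k, len(solution))
    results = set()
    for _ in range(tries):
        idx = sorted(rng.sample(range(len(solution)), k))
        freed = Counter("".join(solution[i] for i in idx))
        kept = [w for i, w in enumerate(solution) if i not in idx]
        for new_words in islice(sub_anagrams(freed, vocab), per_try):
            candidate = kept[: idx[0]] + new_words + kept[idx[0]:]
            results.add(" ".join(candidate))
    return results


vocab = ["I", "act", "ate", "cat", "eat", "eh", "tea", "the"]
for sentence in sorted(variations(["I", "ate", "the", "cat"], vocab, k=1, tries=20)):
    print(sentence)
```

Every returned sentence is still a valid anagram by construction; deciding which ones are grammatical is left to a human.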
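The genetic-algorithm approach can be sketched in the same way. Since "correlate with DC dialogue" could mean many things, this sketch assumes one concrete fitness function: the log-likelihood of the letter sequence under a letter-bigram model with add-one smoothing, trained on a sample of dialogue text you supply (the corpus string in the usage lines is only a placeholder). Mutation swaps two letters, and crossover keeps a prefix of one parent and fills in the remaining letters in the order they appear in the other, so every child is still an exact rearrangement of the same letters. Selection simply keeps the better-scoring half of each generation. Names, parameters, and defaults are all illustrative; the sketch assumes at least two letters and a population of at least four.

```python
import math
import random
from collections import Counter


def bigram_model(corpus: str):
    """Return a scoring function: log-likelihood of a letter sequence under
    a letter-bigram model with add-one smoothing, trained on corpus."""
    text = [ch for ch in corpus.lower() if ch.isalpha()]
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    vocab_size = len(set(text) | set("abcdefghijklmnopqrstuvwxyz"))

    def score(s: str) -> float:
        s = s.lower()
        return sum(
            math.log((pairs[(a, b)] + 1) / (firsts[a] + vocab_size))
            for a, b in zip(s, s[1:])
        )

    return score


def mutate(s: str, rng: random.Random) -> str:
    """Swap two random positions (keeps the letter multiset intact)."""
    chars = list(s)
    i, j = rng.randrange(len(chars)), rng.randrange(len(chars))
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Keep a prefix of parent a, then add the letters a still needs in the
    order they occur in parent b. Assumes a and b use the same letters."""
    cut = rng.randrange(1, len(a))
    needed = Counter(a[cut:])
    tail = []
    for ch in b:
        if needed[ch] > 0:
            tail.append(ch)
            needed[ch] -= 1
    return a[:cut] + "".join(tail)


def evolve(letters, score, pop_size=60, generations=200, mutation_rate=0.3, seed=0):
    """Truncation-selection GA over orderings of `letters`."""
    rng = random.Random(seed)
    pop = ["".join(rng.sample(letters, len(letters))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        survivors = pop[: pop_size // 2]  # best half carries over unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = crossover(a, b, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        pop = survivors + children
    return max(pop, key=score)


dialogue = "the cat sat on the mat and then the dog ate the hat"  # placeholder corpus
score = bigram_model(dialogue)
print(evolve("tehcta", score, pop_size=20, generations=30, seed=1))
```

Keeping the best half unchanged means the top score never gets worse from one generation to the next, though nothing guarantees the result reads as English.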
But despite what I consider reasonable creativity, these approaches don't work all that well. They WILL spit out technically valid solutions, but those solutions aren't terribly grammatical. My next step should be to bring in natural language processing (NLP), a field that has had surprising success at computationally characterizing language as it is spoken and written, but I'm having a hard time being optimistic. In general, computers are far inferior to human brains at this kind of pattern recognition, and I struggle to believe that a computer could be made to recognize the right answer even if it found it.
I'm certainly on the lookout for new solution ideas. But I think the problem will remain unsolved until Ryan North gives out enough clues, at which point it will be solved by a human brain doing (computer-aided) guess-and-check.
[Note: I originally mistyped the anagram, so if any of you were working on the puzzle using my copy—and I hope you weren't—I'm very sorry and it's fixed now!]