PoemToday is a simple Rails web app that statistically generates a poem based on each user's profile and behavior. Every word in each of the site's 6,000 poems is wrapped in a link. When a user clicks a word, the site searches its database for the best-matching poem and redirects the user to the top result, along with the top image from the Flickr API for that word.
Behind the scenes, PoemToday stores the words the user has clicked and the poems the user has visited in a temporary session. When the session holds enough data, the app generates a completely unique "ephemeral" poem using a statistical process known as a Markov chain.
A Markov chain (discrete-time Markov chain or DTMC), named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another on a state space. It is a random process usually characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of "memorylessness" is called the Markov property. Markov chains have many applications as statistical models of real-world processes.
A generator can make more interesting text by making each letter a random function of its predecessor. We could, therefore, read a sample text and count how many times every letter follows an A, how many times they follow a B, and so on for each letter of the alphabet. When we write the random text, we produce the next letter as a random function of the current letter... We can extend this idea to longer sequences of letters [and whole words and poems].
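The idea in the passage above can be sketched at the word level in a few lines of Ruby. This is a minimal illustration, not PoemToday's actual code: the method names `build_chain` and `generate`, the sample text, and the choice of a first-order (one-word) chain are all assumptions made for the example.

```ruby
# Build a first-order, word-level transition table: each word maps to the
# list of words that followed it in the sample text. Duplicates are kept,
# so frequent successors are proportionally more likely to be drawn.
def build_chain(text)
  words = text.downcase.scan(/[a-z']+/)
  chain = Hash.new { |hash, key| hash[key] = [] }
  words.each_cons(2) { |current, following| chain[current] << following }
  chain
end

# Walk the chain. The next word is drawn using only the current word,
# never the words before it (the Markov property).
def generate(chain, start_word, max_words, rng = Random.new)
  output = [start_word]
  current = start_word
  (max_words - 1).times do
    successors = chain.fetch(current, [])
    break if successors.empty? # dead end: this word never had a successor
    current = successors.sample(random: rng)
    output << current
  end
  output.join(" ")
end

# Hypothetical sample text, standing in for a corpus of poems.
sample = "the rose is red the violet is blue the rose is sweet and so are you"
chain = build_chain(sample)
puts generate(chain, "the", 8)
```

Each run prints a different line, because every step is a fresh random draw from the successors of the current word. A larger corpus, or a chain keyed on pairs of words instead of single words, would generally give more natural-sounding output.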
Markov chains are known for their simplicity, as well as their ability to produce remarkably "real" results. The Markov poems generated by PoemToday sound uncanny, as if they were written by humans. Lifelike results can be generated from any sufficiently large dataset, whether text or other input, and have many useful applications. For example, Google's world-changing PageRank algorithm models a random walk across the pages of the web as a Markov chain in order to score them. Another fun application of Markov chains: Garfield. Programmer Josh Millard has created a fascinating app that generates a Garfield strip with Markov dialogue in real time.
Markov chains operate without memory of their previous states or matches. Matches are chosen randomly, as if by a roll of the dice, based on the input data, every time. This is unusual for matching algorithms, which often rely on their historical matches as a factor in determining future ones. In the spirit of the memorylessness of Markov chains, randomly generated poems on PoemToday aren't stored. When you leave or reload poemtoday.com/ephemeral, the session is cleared and your Markov poem is gone.
PoemToday also features a daily email option. The app will email you a poem every morning, along with information about why that poem was matched to you on that day. The daily email matching is based on whatever inputs you share with PoemToday. Currently available inputs are first name, birthday, location and Twitter handle. So, for example, if you tell PoemToday your location, it will look up your weather forecast and match words in your location's forecast summary with poems in its database.
Special occasions, such as holidays, and the New York Times' Most Emailed articles for that day also serve as inputs across all users. See the algorithm Gist below.
When matching users' keyword sources with poems every morning, PoemToday takes into account where in the poem the keywords were found, as well as how frequently each word occurs in English usage, which is provided via the Wordnik API. Extremely high-frequency words, such as articles and prepositions, are automatically removed using a high cut-off score. Frequency is like a golf score: the lower, the better. Rarer words therefore earn more points, because each keyword's frequency is subtracted from 1000. When applicable, the word "birthday", first names and holiday names are treated as keywords and given a frequency score of 0, so they automatically trump other keywords and sources.
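The scoring rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the app's real algorithm: the method name `keyword_points`, the cut-off value of 900, and the sample frequencies are hypothetical, the frequencies are a hard-coded stand-in for values that would come from the Wordnik API, and the weighting by position in the poem is left out because the text does not specify it.

```ruby
# Hypothetical cut-off: the real threshold is not stated in the text.
FREQUENCY_CUTOFF = 900
MAX_POINTS = 1000

# words:            candidate keywords from a user's sources
# frequencies:      word => frequency score (lower means rarer); a stand-in
#                   for values looked up through the Wordnik API
# special_keywords: "birthday", first names, holiday names; these are
#                   forced to a frequency score of 0
def keyword_points(words, frequencies, special_keywords = [])
  words.each_with_object({}) do |word, points|
    frequency = special_keywords.include?(word) ? 0 : frequencies[word]
    # Skip words with no known frequency and very common words
    # such as articles and prepositions.
    next if frequency.nil? || frequency >= FREQUENCY_CUTOFF
    points[word] = MAX_POINTS - frequency
  end
end

# Made-up sample data for illustration only.
frequencies = { "the" => 999, "birthday" => 400, "thunder" => 120, "rain" => 310 }
points = keyword_points(%w[the birthday thunder rain], frequencies, ["birthday"])

# Highest-scoring keywords first.
ranked = points.sort_by { |_word, score| -score }
ranked.each { |word, score| puts "#{word}: #{score}" }
```

With this sample data, "the" is dropped by the cut-off, "birthday" earns the full 1000 points because it is a special keyword, and "thunder" outranks "rain" because it is the rarer word.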
Daily email matches are saved on the user's homepage, with details about the match, including links to the original keyword source where applicable (e.g., Tweets or New York Times articles). Future development plans include additional user feedback mechanisms for the matching algorithms and A/B testing a Natural Language Generation (NLG) engine against the Markov chain.