Word clouds are a well-known information visualization technique for graphically conveying the most frequently occurring words in a body of text. Words are arranged in a "cloud" shape so that the most common words appear closest to the middle of the cloud, with less common words around the periphery. A word's size also reflects its frequency: common words are rendered in a larger font than less common ones. As a result, word clouds are a convenient way of conveying what a passage of text is "about"; at a glance, you can recognize recurring words (and thus recurring themes) within a body of text.
With ReviewCloud for Steam, I sought to apply this technique to user-written reviews of PC games on the digital gaming service Steam. My goal is for a ReviewCloud to provide a quick, at-a-glance view of what the public likes (or hates!) about a given game.
- In order to install ReviewCloud for Steam, you will need a browser plug-in capable of running Greasemonkey scripts. If you are running Firefox, download Greasemonkey. If you are using Chrome, download Tampermonkey.
- Once you have installed Greasemonkey or Tampermonkey, simply click here to download ReviewCloud for Steam. Greasemonkey or Tampermonkey will automatically prompt you to install the script.
- Once you've installed ReviewCloud for Steam, simply visit the store page for any game on Steam (e.g. Half-Life 2). A ReviewCloud will appear automatically just above the "Reviews" section of the page.
HOW DOES IT WORK?
ReviewCloud for Steam is conceptually simple. When a user loads a game's store page, the Steam website presents five of the "most helpful" reviews for that game. The user can press a button to load five additional reviews at a time. When pressed, this button sends a request to the Steam servers for more reviews; ReviewCloud for Steam simply simulates these requests to gather reviews without any input from the user. Once the script has accumulated roughly 50 reviews, it parses and filters them to produce a list of word counts. It then feeds these word counts to jQCloud, which renders the ReviewCloud on the page.
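As a rough illustration of the counting step, the sketch below tallies words across a handful of review strings and converts the tallies into the array of `{text, weight}` objects that jQCloud accepts as input. The function names (`tokenize`, `countWords`, `toCloudWords`), the tokenizing regular expression, and the sample reviews are assumptions made for this example; the actual script may parse reviews differently.

```javascript
// Hypothetical helper: split review text into lowercase word tokens.
// Keeps internal apostrophes so that "don't" stays one word.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+(?:'[a-z]+)*/g) || [];
}

// Count how often each word occurs across all reviews.
function countWords(reviews) {
  const counts = new Map();
  for (const review of reviews) {
    for (const word of tokenize(review)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return counts;
}

// Convert the counts into jQCloud's input format, most frequent first.
// The limit of 50 words is an arbitrary choice for this sketch.
function toCloudWords(counts, limit = 50) {
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([text, weight]) => ({ text, weight }));
}

// Made-up sample data standing in for reviews gathered from Steam.
const sampleReviews = [
  "Great story and great combat.",
  "The story drags, but the combat is fun.",
];

const cloudWords = toCloudWords(countWords(sampleReviews));
console.log(cloudWords);
```

In a userscript, the resulting array would then be handed to jQCloud, for example with `$("#cloud").jQCloud(cloudWords)`, where `#cloud` is a placeholder for whatever container element the script creates. Note that common words such as "the" still appear in this unfiltered output.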
Implementing ReviewCloud for Steam was surprisingly easy; it only took a few hours to get an initial prototype working. The real challenge was making the ReviewCloud interesting and meaningful.
When my first implementation produced a cloud full of common English words ("the", "a", "of", "to", etc.), it became clear that some form of filtering was necessary. The current version of the script filters out roughly 900 of the most commonly used English words. Beyond those, some types of words are less interesting than others. Conjunctions (e.g. "but", "so", "because") tell us nothing about the game being reviewed, so they can safely be eliminated from the cloud. Adverbs (e.g. "quickly", "nearly", "happily") also tell us very little, because most of their meaning is lost without the context in which they occur. Nouns and verbs tend to tell us much more about the reviewers' opinions. Producing a good filter is a difficult balancing act: if you filter too much, you risk censoring interesting information, but if you filter too little, you risk being overloaded with useless information. I anticipate that I will continue to tweak the filtering rules as development continues.
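The filtering idea can be sketched as a simple stop-word lookup. The list below is a tiny illustrative stand-in for the roughly 900 words mentioned above, and the names `STOP_WORDS` and `filterCounts`, along with the sample counts, are assumptions for this example rather than the script's real identifiers or data.

```javascript
// A very small, illustrative stop-word list. The real filter is far
// larger; these entries are taken from the examples in the text.
const STOP_WORDS = new Set([
  "the", "a", "of", "to",            // common English words
  "but", "so", "because",            // conjunctions
  "quickly", "nearly", "happily",    // adverbs
]);

// Return a new Map containing only the words not on the stop list.
function filterCounts(counts, stopWords = STOP_WORDS) {
  const kept = new Map();
  for (const [word, count] of counts) {
    if (!stopWords.has(word)) {
      kept.set(word, count);
    }
  }
  return kept;
}

// Made-up word counts for demonstration.
const rawCounts = new Map([
  ["the", 40],
  ["game", 25],
  ["but", 12],
  ["ending", 9],
  ["quickly", 4],
]);

const filteredCounts = filterCounts(rawCounts);
console.log([...filteredCounts.keys()]);
```

Using a `Set` keeps each lookup cheap even when the list grows to hundreds of entries, and passing the list as a parameter makes it easy to experiment with stricter or looser filters.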
Once my filtering rules were established, I quickly noticed another issue: a lack of context surrounding the words which appeared in the cloud. For example, on some pages, the word "ending" appears quite prominently. But does that mean that reviewers liked the ending? Or did they dislike it? Thankfully, Steam forces users to categorize each review as either positive (thumbs up) or negative (thumbs down). By tracking how often a word appears in positive reviews versus negative reviews, it became possible to add much more nuance to each ReviewCloud. I added a coloring scheme so that words with a strong positive association (i.e. words which appear primarily in positive reviews) are colored blue, whereas words with a strong negative association are colored red. Controversial words (which appear about equally often in positive and negative reviews) are colored grey. This coloring scheme allows an observer to easily see which aspects of a game reviewers like, and which aspects they don't.
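One possible way to implement this coloring is to tally each word separately for positive and negative reviews and then compare the two tallies. In the sketch below, the 70% cutoff for a "strong" association, the function names, the `recommended` field, and the sample reviews are all assumptions for illustration; the text does not say how the script decides where blue and red end and grey begins.

```javascript
// Tally, per word, how many times it occurs in positive vs. negative reviews.
// Each review is assumed to look like { text: string, recommended: boolean }.
function tallyBySentiment(reviews) {
  const tallies = new Map();
  for (const { text, recommended } of reviews) {
    const words = text.toLowerCase().match(/[a-z]+/g) || [];
    for (const word of words) {
      const tally = tallies.get(word) || { positive: 0, negative: 0 };
      if (recommended) {
        tally.positive += 1;
      } else {
        tally.negative += 1;
      }
      tallies.set(word, tally);
    }
  }
  return tallies;
}

// Pick a color from the share of positive occurrences.
// The threshold of 0.7 is an assumed value, not taken from the script.
function colorFor(positive, negative, threshold = 0.7) {
  const total = positive + negative;
  if (total === 0) return "grey";
  const positiveShare = positive / total;
  if (positiveShare >= threshold) return "blue";     // mostly positive reviews
  if (positiveShare <= 1 - threshold) return "red";  // mostly negative reviews
  return "grey";                                     // controversial
}

// Made-up reviews for demonstration.
const taggedReviews = [
  { text: "Loved the ending", recommended: true },
  { text: "The ending made it worth it", recommended: true },
  { text: "Terrible controls", recommended: false },
  { text: "Clunky controls, dull story", recommended: false },
  { text: "Great story", recommended: true },
];

const sentimentTallies = tallyBySentiment(taggedReviews);
for (const word of ["ending", "controls", "story"]) {
  const { positive, negative } = sentimentTallies.get(word);
  console.log(word, colorFor(positive, negative));
}
```

With this sample data, "ending" occurs only in positive reviews, "controls" only in negative ones, and "story" once in each, so the three words come out blue, red, and grey respectively.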
I'd like to thank Luca Ongaro, who wrote the open-source jQCloud script used in ReviewCloud for Steam. jQCloud makes generating word clouds easy!