What could be more fun than playing “The Forest” and embarking on a two-hour thrill ride in a mutant-infested cave? Visualizing the gameplay transcript using an arsenal of free tools, of course. In this article I talk about the process of conducting sentiment analysis on a gaming session, and the lessons I learned along the way. Warning: there may be images and language in this article not suitable for children.
The Forest is a multiplayer survival horror game set on a mysterious island. In the manner of LOST, a commercial flight crashes and strands the players on a forested island that is largely peaceful by day and largely terrifying at night. Players balance their time between low-key tasks — hunting for berries, building forts, crafting tools — and gut-wrenching plunges into a deep and monster-infested cave system in search of supplies and clues about the island’s history.
A few weeks ago, my friends and I logged into The Forest. As is the tradition in our group, we recorded the gameplay and later posted it to YouTube. (One day my children will uncover the video and know just how terrified their father was in zombie horror situations.) Ours was a varied session — we gathered together, sailed around the island in a makeshift boat, established a base (complete with catapult!) on a remote sandbar, explored the forest, and discovered an entrance to the island’s branching cave system. In the darkness we discovered a terrifying, many-limbed monstrosity wading through the corpses of tiny misshapen mutants. After a desperate clash, we just barely managed to defeat it. We spent the remainder of our time exploring and, emerging back into the sunlight, building up our fortress for the next expedition before logging off.
Told as narrative, it seemed to me that the story was structured similarly to many books and movies. We had a cheerful beginning with friends finding each other, a period of rising action as we descended deeper into the cave system, a terrifying climax, and a period of declining tension as we returned to our base and prepared for our next session. I was curious whether the transcript told a story that aligned with what existed in my head.
But how to obtain the transcript? What I had was a YouTube video of our conversation — what I needed was a time-stamped spreadsheet of the words of our conversation that could be visualized!
As it turns out, there are multiple ways to generate a transcript from a YouTube video. In order to auto-generate closed captioning, YouTube itself produces a transcription of the dialog using Google’s own speech-to-text engine. To extract this information, you can dive into the source XML of the page, use one of a variety of online extraction services like ccSubs or DIYCaptions.com, or, as I did, use a free downloadable tool called 4K Video Downloader. After about 30 minutes of fiddling, I had a raw transcript file. 8,000 lines long!
At first I was flush with success at generating a transcript, but upon review of the output I realized there were plenty of limitations to the method I had used. Some of these were resolvable. There were, for example, font color inserts throughout the transcript and garbled HTML replacements for apostrophes that could be removed with a careful search and replace query. Less easy to address were the underlying imperfections of an auto-generated transcript:
- The words of my friends and me were not separated by speaker, so the transcription is the disjointed babble of an apparently sole voice talking to itself in an unbroken monologue (forget comparing the relative tones of speakers, or doing an analysis of who talked more!).
- The transcript did not divide our speech into complete sentences, but instead broke it into 5–7 second chunks which often contained multiple one-word exclamations or fragments of a longer utterance.
- At the level of words captured, the speech-to-text service itself proved to be imperfect. My friends and I said many things in our clash with the mutants of the forest, but I’m pretty sure none of us whispered “you Kevin Bacon, well I’ll good keep that.”
In hindsight, there are dedicated transcription tools (some of which cost money) that might have recognized the different speakers and created a cleaner output. But in my quest to do everything for free (or nearly free) I decided not to wade too deeply into the field of transcription services.
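For a transcript extracted this way, the careful search-and-replace cleanup I described can also be scripted. Here is a minimal Python sketch, assuming the two artifact types I ran into — inline font color tags and HTML entity escapes for apostrophes — though the exact markup in your own file may differ:

```python
import re

def clean_caption_line(line):
    """Strip caption markup from one transcript line.

    Assumes two artifact types: inline font tags like
    <font color="#CCCCCC"> and HTML entity escapes such as &#39;.
    """
    line = re.sub(r"</?font[^>]*>", "", line)  # drop font color inserts
    line = line.replace("&amp;", "&")          # un-escape ampersands first
    line = line.replace("&#39;", "'")          # restore apostrophes
    line = line.replace("&quot;", '"')
    return line.strip()

raw = '<font color="#CCCCCC">I&#39;m pretty sure</font> none of us said that'
print(clean_caption_line(raw))  # I'm pretty sure none of us said that
```

Running this over every line of the raw file gives the same result as the manual search-and-replace, just repeatably.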
The imperfections of my spreadsheet notwithstanding, I now had something that could be analyzed. Something I find fascinating about data is how much information comes in a small package. Even without analysis, my spreadsheet could be refined into many dimensions: the words themselves, relative start and end timestamps, and, since I knew the time that we started playing, the approximate times that these words were spoken.
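That last dimension is just arithmetic: add each caption's relative offset to the session start time. A small sketch (the start time below is hypothetical; the real value came from when we logged on):

```python
from datetime import datetime, timedelta

# Hypothetical session start; substitute the actual log-on time.
SESSION_START = datetime(2020, 5, 16, 21, 30)  # 9:30 pm

def wall_clock(offset_seconds):
    """Convert a caption's relative timestamp into the approximate
    real-world time the words were spoken."""
    return SESSION_START + timedelta(seconds=offset_seconds)

# A phrase 30 minutes into the recording was spoken around 10:00 PM.
print(wall_clock(1800).strftime("%I:%M %p"))
```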
Next, the sentiment analysis! I ended up using an add-on for Google Sheets called AYLIEN Text Analysis, which reviewed a column of text and generated an analysis of (among other things) the polarity of each phrase and its level of confidence in that assessment. Now each phrase was labeled as “positive”, “neutral”, or “negative” and was accompanied by a confidence score on a scale of 0 to 1, with 1 being very high confidence. I had to shell out $10 to complete the analysis of my very long transcript, but I considered the one-time expense worth the learning investment.
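To make the shape of that output concrete, here is a toy word-counting scorer that produces the same kind of (label, confidence) pair per phrase. It is purely illustrative — a stand-in for the add-on's output format, not AYLIEN's actual model — and the tiny word lists are my own invention:

```python
# Illustrative only: crude lexicon-based polarity, not AYLIEN's method.
POSITIVE = {"great", "cool", "awesome", "nice"}
NEGATIVE = {"kill", "god", "terrifying", "fire"}

def score_phrase(phrase):
    """Return a (polarity label, confidence 0-1) pair for a phrase."""
    words = phrase.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos == neg:
        return ("neutral", 0.5)
    label = "positive" if pos > neg else "negative"
    confidence = abs(pos - neg) / len(words)  # crude 0-1 confidence
    return (label, confidence)

print(score_phrase("kill it literally kill it with fire"))
```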
Here again, though, I ran into challenges. Sentiment analysis, particularly automated sentiment analysis, usually needs to be taken with a grain of salt. Without a human layer of correction and training, many sentiment analysis tools fail to capture the tone of what is being said. A scared group of men who wail, for example, “oh God oh God oh God oh God oh God” upon seeing a hideous stuff-of-nightmares monstrosity loom out of the darkness are not expressing a positive sentiment, as my tool initially surmised.
With these caveats in mind I was ready to visualize the data. Here I turned to Tableau Public, the powerful freeware version of Tableau Desktop. It was time to see if a simple narrative emerged from our 8,000-line transcript. Putting time on the X-axis and the number of phrases per minute on the Y, I colorized the phrases by polarity. This was the initial result:
Completely unreadable! But predictable, once I thought about it. Our sentences were, of course, a hodgepodge of positive, neutral, and negative sentiments. In order for this style of graph to be readable, we would have needed to speak only happy thoughts for one stretch of time, and negative thoughts in another, which is not the way that ordinary conversations go. I also realized that this graph was not explorable — I had no idea what phrases were being said when.
Back to the drawing board. I decided to break out the string of sentiments into three separate graphs so that I was tracking positive, neutral, and negative sentiments independently. I also opted for a bar chart, so that each individual phrase could be read.
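The data shaping behind those three tracks is simple: count phrases per minute, separately for each polarity. Tableau does this with drag-and-drop, but a sketch makes it explicit (the `rows` data below is made up for illustration; each entry is a start offset in seconds and a polarity label):

```python
from collections import Counter

# Stand-in for the spreadsheet: (start_seconds, polarity) per phrase.
rows = [
    (5, "positive"), (12, "negative"), (45, "negative"),
    (70, "neutral"), (75, "negative"), (130, "positive"),
]

# One counter per polarity: minute -> number of phrases in that minute.
tracks = {"positive": Counter(), "neutral": Counter(), "negative": Counter()}
for start, polarity in rows:
    tracks[polarity][start // 60] += 1

print(dict(tracks["negative"]))  # e.g. two negative phrases in minute 0
```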
Better! With this format it was possible to isolate the different sentiment tracks and see whether there were minute-by-minute increases and declines in particular sentiments being expressed. It also appeared that there were trends in each track, noticeable waves of negative sentiment and gaps in positive sentiment. Could these waves align with events occurring in the video?
I wanted to refine this graph so that viewers could see events occurring and the sentiment that emerged from our reactions. To accomplish this I would need to address the issues listed above and extract the maximum possible meaning from an imperfect sentiment analysis of a garbled transcript.
To do this, I created a filter of the data so that it would be possible to eschew all but the phrases that the analysis was certain about. Ambiguously worded phrases were cast aside and only phrases with strong sentiment (phrases with swears, for example, or positive words like “great” or “cool”) were retained. Finally, I rewatched the video and made note of points that I considered to be significant, marking the timestamp on each. These events were displayed beneath the sentiment analysis so that it would be possible to compare what we were doing with what we were saying.
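The filter itself is a one-liner once each phrase carries its polarity and confidence score. A sketch, with made-up rows and an illustrative threshold (I am not claiming 0.85 was my exact cutoff):

```python
# Keep only phrases the analysis was certain about; discard the rest.
phrases = [
    {"text": "kill it with fire", "polarity": "negative", "confidence": 0.95},
    {"text": "well I'll good keep that", "polarity": "neutral", "confidence": 0.40},
    {"text": "this catapult is great", "polarity": "positive", "confidence": 0.90},
]

THRESHOLD = 0.85  # illustrative cutoff, not the exact value I used
strong = [p for p in phrases if p["confidence"] >= THRESHOLD]

print([p["text"] for p in strong])  # the ambiguous middle phrase is dropped
```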
Here is the final result.
Now I had an explorable graphic, where noticeable peaks of positive and negative sentiment accompanied key events! Notice the spike of negative comments as we enter the cave system around 10pm, and the wave of negative statements that follow our encounter with the cave monster (e.g., “kill it literally kill it with fire” — yikes). My friends and I got quite a chuckle exploring this information.
The result of this Saturday afternoon’s labor is an imperfect graph that — let’s be honest — only my friends and I will ever truly look at and enjoy. But in going through this process, I can identify several steps where would-be analysts following a similar path could improve upon and refine my approach. A superior transcript and a deeper sentiment analysis might have resulted in an even cleaner narrative to visualize.
But I also hope people take away a sense of optimism about what is possible with free tools. With the exception of a few dollars I threw down to get a sentiment analysis of a particularly long video (had it been an hour shorter I could have done it for nothing) I spent no money to generate the complex visual you see above — not on the video recording tool we used to record our session (OBS), not on the tool we used to download the YouTube transcript (4K Video Downloader), not on Google Sheets, and not on my visualization software, Tableau Public. If we can do these visualizations of conversations between friends, think of what social good could be accomplished with the array of tools at our disposal.