Cluster Analysis View Basics

This post provides a suggested approach to using the basic cluster analysis method. It assumes you have a project open in Symphony.

The basic cluster analysis compares the text of all the comments against each other, then groups them according to similarities.
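
To make the idea concrete, here is a minimal sketch of comparing every comment with every other comment and grouping the similar ones. This is not Symphony's algorithm, which isn't described in this post. The word-overlap (Jaccard) measure, the 0.5 threshold, and the function names are all assumptions chosen for illustration.

```python
from itertools import combinations

def tokens(text):
    """Return the set of lowercase words in a comment, punctuation stripped."""
    return {w.strip(".,!?;:\"'").lower() for w in text.split()} - {""}

def jaccard(a, b):
    """Share of words two comments have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_similar(comments, threshold=0.5):
    """Group comments linked by a pairwise similarity at or above the threshold.

    Hypothetical stand-in for a real clustering engine.
    """
    parent = list(range(len(comments)))  # each comment starts in its own group

    def find(i):
        # Follow parent links to the group's representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    token_sets = [tokens(c) for c in comments]
    for i, j in combinations(range(len(comments)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i, comment in enumerate(comments):
        groups.setdefault(find(i), []).append(comment)
    return list(groups.values())

comments = [
    "Great teamwork in my department",
    "My department has great teamwork",
    "The parking lot is too small",
]
for group in group_similar(comments):
    print(group)
```

With this sample, the two teamwork comments share four of their six distinct words, so they land in one group and the parking comment stays on its own.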

Perform Initial Analysis

    1. Open Cluster Analysis view.
    2. Set a filter on the content you want to analyze. If none of your content is coded, you don’t need a filter.
    3. For Analysis Method, ensure that “Compare comments with each other” is selected.
    4. For Minimum Cluster Size, set the value to 1%-2% of the number of comments in your analysis.
    5. Click the Refresh button on the main toolbar or press F5. When the analysis is complete, the suggested clusters will appear in the Cluster List on the Clusters tab.
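
If you want a quick way to work out the 1%-2% range from step 4, the arithmetic can be sketched as below. The function name and the rounding choices (round the low end down, the high end up, and never go below 1) are my own assumptions, not something Symphony prescribes.

```python
import math

def minimum_cluster_size_range(comment_count, low_pct=1.0, high_pct=2.0):
    """Return (low, high) cluster sizes for the given percentage range."""
    # Round the low end down and the high end up, keeping both at 1 or more.
    low = max(1, math.floor(comment_count * low_pct / 100))
    high = max(low, math.ceil(comment_count * high_pct / 100))
    return low, high

print(minimum_cluster_size_range(1500))
```

For an analysis of 1,500 comments this gives a range of 15 to 30.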

Consolidate Clusters

With the basic cluster analysis, it is likely that you will end up with several clusters whose underlying comments share a common theme. This happens more often with smaller settings for the Minimum Cluster Size and with higher values for the Multi-Theme Cutoff %.

  1. Locate two clusters where the names of the clusters are similar. (For example, in an employee engagement survey, you might have one titled “Good Teamwork” and another titled “We have an effective team environment”.)
  2. Click on the cluster you want to consolidate, then drag and drop it on the cluster you want to keep. The underlying comments will be moved into the remaining cluster.
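
Conceptually, the drag and drop in step 2 amounts to something like the sketch below. The dictionary layout and the `consolidate` function are hypothetical, and skipping comments that are already in the kept cluster is my assumption about the sensible behavior rather than a documented one.

```python
def consolidate(clusters, source, target):
    """Move every comment from `source` into `target`, then drop `source`."""
    if source == target:
        raise ValueError("source and target must be different clusters")
    if target not in clusters:
        raise KeyError(target)
    for comment in clusters.pop(source):
        # Assumption: a comment already in the target is not added twice.
        if comment not in clusters[target]:
            clusters[target].append(comment)
    return clusters

clusters = {
    "Good Teamwork": ["c1", "c2"],
    "We have an effective team environment": ["c2", "c3"],
}
consolidate(clusters, "We have an effective team environment", "Good Teamwork")
print(clusters)
```

After the call, only “Good Teamwork” remains, holding c1, c2, and c3.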

Clean Up Clusters

Ultimately, you want each cluster that you are going to code to contain only the comments that you want coded to it. Ideally, you also want each comment to contain only one theme, or at least to be coded to as many themes as it contains. (Symphony allows you to code each comment to one theme but provides ways of duplicating comments so you can code them as many ways as you like. If you don’t like duplicating comments, you can either split them or see whether Tags are a solution.)

Remove Comments from Clusters

To find comments you don’t want in a cluster, the best place to start is at the bottom of the cluster. By default, the comments are organized in the clusters in descending score order, meaning the comments with the best fit appear at the top of the list. To remove a comment from the cluster, you simply highlight it and click the Remove From Cluster button directly above the comments list.
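
The “start at the bottom” advice can be sketched as follows. The (comment, score) pairs and the score values are made up for illustration; the only thing taken from Symphony is that comments are listed in descending score order.

```python
def weakest_fits(cluster_comments, count=3):
    """Return the `count` lowest-scoring comments, weakest first."""
    # Same ordering as the default list: best fit at the top.
    ranked = sorted(cluster_comments, key=lambda pair: pair[1], reverse=True)
    # The review candidates sit at the bottom of that list.
    return ranked[-count:][::-1]

cluster = [("a", 0.91), ("b", 0.34), ("c", 0.78), ("d", 0.12)]
print(weakest_fits(cluster, count=2))
```

Here the two candidates for removal are d and then b, the comments with the lowest scores.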

Split/Confirm Multi-theme Comments

Comments with multiple themes take a little more effort. For starters, they fall into two groups: those where Symphony “recognizes” more than one theme, and those where it does not. You can tell which ones Symphony believes have multiple themes by looking at the “Clusters” column, which shows the number of clusters Symphony assigned the comment to. If your approach is to have copies (duplicates) made of multi-theme comments, you don’t have to do anything with them; if you choose to code all of them, duplicates will be created. If, however, you don’t want duplicates created, you need either to split the comments or to decide that Symphony is wrong about the additional themes and ensure that the comment appears in only one cluster, in which case you remove it from the other clusters as described above in “Remove Comments from Clusters”.

Alternatively, you can split the comment. You do this using the standard Split Comment capabilities. It is important to note that when you split a comment, the update to your content takes place right away. This is different from the cluster analysis itself, which doesn’t actually change your content; it merely displays the comments in the clusters. Splitting comments also gives you an additional approach to cluster analysis. You can, for example, do nothing but split comments at this point, then run the analysis again. The result will reflect the splits, in that each split portion will be evaluated separately from the text of the original comment.

Working from Comments Tab

So far what I’ve told you about cleaning up clusters has taken place on the Clusters tab, which consists of a list of clusters plus a second list that displays the comments contained in the highlighted cluster. The Comments tab contains the opposite: a list of all the clustered comments, plus a second list that displays the clusters in which the highlighted comment appears. That is, when you click on a comment, it shows you which cluster(s) it is in. The comments list here also includes a column called “Clusters”, which tells you how many clusters each comment appears in. So what’s useful is to click this column twice to get it in descending order so that the comments that appear in the most clusters rise to the top of the list.
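
The effect of sorting on the “Clusters” column can be sketched like this. The `membership` mapping of comment IDs to cluster names is a hypothetical data layout, and dropping the comments that sit in only one cluster is my own shortcut for getting straight to the ones worth reviewing.

```python
def multi_cluster_comments(membership):
    """List (comment, cluster_count) pairs, highest count first.

    Comments that appear in only one cluster are left out.
    """
    counts = {comment: len(names) for comment, names in membership.items()}
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(comment, n) for comment, n in ranked if n > 1]

membership = {
    "c1": ["Good Teamwork"],
    "c2": ["Good Teamwork", "Pay", "Workload"],
    "c3": ["Pay", "Workload"],
}
print(multi_cluster_comments(membership))
```

In this sample, c2 (three clusters) comes first, followed by c3 (two clusters), and c1 is skipped.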

Once the clustered comments list is sorted on Clusters, you can compare individual comments (specifically those that are in more than one cluster) with the clusters they are assigned to and make the same decisions about them as you would on the Clusters tab. That is, you can remove comments from clusters, split comments, or leave them as-is.

To remove a comment from a cluster, you simply click the cluster entry in the Clusters list and click the Remove From Cluster button.

Coding the Data

Everything that’s been done so far has been with a single goal in mind: coding the data. By now, some portion of your comments is organized into clusters, and the quality of these clusters is close to, or better than, your standards. Either way, the primary intent of Cluster Analysis view is to reduce the overall coding effort, thereby reducing costs and compressing the analysis timeline.

At its root, coding can be pretty simple: just highlight the clusters on the Clusters tab you want to code, then click the “Code Highlighted Clusters” button. This might be sufficient. Usually, however, there are residual comments that have been allocated to more than one theme. If you were thorough in the “Clean Up Clusters” section above, the only places you will see this are where you agree with Symphony that the multi-theme comments do in fact contain multiple themes. You now need to decide what to do about them.
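
To see why those residual comments matter, here is a rough sketch of what coding the highlighted clusters implies when a comment still sits in more than one of them. The function and data layout are hypothetical. The one behavior taken from Symphony is that a comment coded from several clusters ends up as several coded copies.

```python
from collections import Counter

def code_highlighted_clusters(clusters, highlighted):
    """Return one (comment, theme) record per comment per highlighted cluster."""
    coded = []
    for name in highlighted:
        for comment in clusters[name]:
            # A comment in two highlighted clusters is recorded twice.
            coded.append((comment, name))
    return coded

clusters = {
    "Good Teamwork": ["c1", "c2"],
    "Workload": ["c2", "c3"],
}
coded = code_highlighted_clusters(clusters, ["Good Teamwork", "Workload"])
copies = Counter(comment for comment, _ in coded)
duplicated = {comment: n for comment, n in copies.items() if n > 1}
print(coded)
print(duplicated)
```

In this sample, c2 is coded twice, once to each theme, which is the duplicate you either accept or avoid by splitting or removing the comment beforehand.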