User-generated content (UGC) is ubiquitous across the web, and platforms and search engines are working to identify spam and UGC that draws little user engagement. Google Research has published a study with a new way to identify and measure the effects of UGC spam. In the past, Google used a combination of email spam identifiers and other tactics to find UGC spam. The newly released research adds an instrument for measuring how that spam affects the people who encounter it.

According to the paper’s authors, “In this paper, we develop the 15-item HaBuT scale, consisting of three sub-scales: Happiness, Burden and Trust that measures user experience with respect to UGC spam. The items in the instrument are analyzed using confirmatory factor analysis with a sample of 700 responses from internet users. This process resulted in an instrument of high reliability and validity. The instrument is a valuable tool for researchers and practitioners interested in designing, implementing, and managing systems that rely on user-generated content and to those studying the impact of UGC spam on user experience.”
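To make the idea of a three-part scale concrete, here is a minimal sketch of how one response to a 15-item instrument could be scored. The quote only states that there are 15 items and three sub-scales; the even split of five items per sub-scale, the 1-to-5 rating range, and the use of a simple mean are all assumptions for illustration, not details taken from the paper.

```python
from statistics import mean

# Hypothetical layout (not from the paper): items 0-4 are Happiness,
# items 5-9 are Burden, items 10-14 are Trust.
SUBSCALES = {
    "happiness": range(0, 5),
    "burden": range(5, 10),
    "trust": range(10, 15),
}


def score_response(answers):
    """Return the mean rating per sub-scale for one 15-item response."""
    if len(answers) != 15:
        raise ValueError("expected 15 item ratings")
    # Assumed rating range of 1 to 5; the paper's actual range may differ.
    if any(not 1 <= a <= 5 for a in answers):
        raise ValueError("ratings must be on the assumed 1-5 scale")
    return {
        name: mean(answers[i] for i in items)
        for name, items in SUBSCALES.items()
    }


# One made-up respondent, grouped by the assumed sub-scales.
example = [4, 5, 4, 4, 3,  2, 1, 2, 2, 3,  5, 4, 4, 5, 4]
scores = score_response(example)
print(scores)
```

Averaging within each sub-scale keeps the three scores on the same range as the individual items, which makes them easy to compare across respondents.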

The researchers identified five basic kinds of spam that appear most often in UGC: gibberish (e.g., asdsad jksjfs sdhd); irrelevant content (e.g., a review of a movie posted for a gaming app); solicitation (e.g., Follow me on twitter @xxxx); abusive language (e.g., idiotic dirty morons); and promotional content (e.g., Instant cash discount, register now).

As platforms like Google push spam rates lower, removing the spam that remains becomes harder. As one of the paper’s authors noted, “The effort required to reduce the spam rates from 5% to 0% is much higher as compared to bringing it down from 10% to 5%.” The same author went on to say, “…There is a business need for prioritization based on which spam types impact user experience negatively and invest time and resources towards building automated systems to tackle those.”

The point of this study was to see how UGC spam affects the way people trust the content they find on sites with UGC. Surprisingly, most kinds of UGC spam have a negligible effect on visitors' overall experience of a website. Abusive language spam was the exception, with a more noticeable impact on consumer experience. That makes it more worthwhile for Google and related properties to focus on abusive language than on solicitation or promotional spam.

As a practical application of this study, in the video that accompanies the research, Dr. Sowmya states that one way she discovered to build trust in Google Play is to show the top positive review and the top negative review. Doing so, she says, gives the user a sense of the pros and cons of a product.

As was reported in another piece about this study, the video also recommends user surveys to measure happiness levels after the solution has been applied. This kind of data can demonstrate to the stakeholders that the initiative was or was not successful.
