
The basics of a good comment system

    May 22, 2015 by max

    Comments are everywhere, from Facebook and Reddit to your local newspaper. Yet a lot of sites (cough… newspapers… cough) seem unable to maintain comment sections of an acceptable quality. Looking at it from the perspective of someone who has followed the development since Slashdot started taking comments seriously in 1997, a short guide may be in order.

    This post will outline the basics of a good comment system. Note that since this is a vast field, some finer nuances may be omitted, and my personal opinions and preferences will probably shine through. These are just the basics. Also, I won’t get into full technical implementations since that would be too much for one blog post, and it would depend on which language/framework is used.

    Also note that this guide applies to sites that have many users and many comments. If you have a small blog where each post gets 5-10 comments, you should just use a standard comment system, or maybe Disqus.

    So let’s get started.

    The basics

    A good comment system consists of three parts:

    1. User profiles. This may seem obvious, yet I still see sites that don’t have them. The profile is a user’s identity when he is commenting on a site.
    2. An upvoting/downvoting mechanism. There are many implementations, among them Facebook’s likes, Reddit’s upvotes and downvotes, and Slashdot’s dropdown choice between interesting/insightful/funny/informative.
    3. A sorting algorithm that will sort the comments based on input from user profiles and the upvoting/downvoting mechanism. This is the vital part that sorts the quality comments from the inevitable trolls, me-too posts and conspiracy nuts.
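
    To make these parts concrete, here is a minimal sketch of the kind of data a comment system tracks. The class and field names are my own illustrative assumptions, not taken from any particular implementation:

        from dataclasses import dataclass, field
        from datetime import datetime

        @dataclass
        class User:
            username: str
            joined: datetime
            karma: int = 0                     # accumulated votes across all comments

        @dataclass
        class Vote:
            voter: "User"
            value: int                         # +1 for an upvote/like, -1 for a downvote

        @dataclass
        class Comment:
            author: User
            body: str
            created: datetime
            parent: "Comment | None" = None    # None for a top-level comment
            votes: list = field(default_factory=list)

            def score(self) -> int:
                # Raw vote tally; the sorting algorithm turns this into a ranking.
                return sum(v.value for v in self.votes)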

    Let’s look a bit more closely at each of these three components.

    User profiles

    The primary function of the user profile is, of course, to identify the user. But it has several other uses that are just as essential:

    • Giving the user an identity. The more of an identity a user has on a given website, the more he will feel part of a community, adhere to rules and netiquette, and ultimately write better comments.
    • Identifying good and bad citizens. Users that write good comments will often do so consistently; users that write bad comments will often do so consistently too. This can be used in the algorithmic placement of comments.
    • The ability to “get to know” other users. A comment stating that something is completely wrong may either be very insightful, very stupid, or just trolling – it depends on the context and to a large degree the user. Being able to recognise the user, look through his comment history, and maybe see his profile adds a lot of context and value.

    A voting mechanism

    The voting mechanism allows others to vote on a user’s contributions. This is the primary input for the sorting algorithm in assessing how valuable a comment is – a comment with a lot of positive votes is almost certainly more valuable than one with none. Voting also serves two subtle psychological purposes. One is to allow users to easily approve or disapprove of a comment, and the other is to give some (hopefully positive) feedback to the commenter. Both help user retention and the feeling of being part of a community.

    There are a number of different implementations to choose from.

    • Likes. Probably the best-known and most versatile voting mechanism, popularised by Facebook. It has the intrinsic advantage that it is cognitively easy to parse – even your mom knows what it means to like something. Pressing the thumbs-up icon is done millions of times every day by non-technical users. It’s a safe bet, and probably the right thing to use if your demographic is older or not net savvy.
    • Upvotes/downvotes. The up and down arrows popularised first by Digg and then by Reddit are for a slightly more tech-savvy crowd. An upvote and a like are of course technically the same, but the psychology behind them is slightly different. A like is exactly what it says, whereas an upvote can mean “I like this”, “this is a worthy and interesting comment”, “this adds to the conversation” or something else. It depends on the site, and it is not trivial to convey to users what an upvote means. Some sites, such as Hacker News, have a stated set of guidelines that users (surprisingly!) adhere to, whereas on a site like Reddit an upvote means different things depending on whether you’re in the “AskHistorians” subreddit or the “aww” subreddit. Downvotes are of course the alter ego of upvotes, but you need to think about the psychology behind them before you implement them. Upvotes are a positive acknowledgement; downvotes are a negative acknowledgement that may deter users from coming back.

    A sorting algorithm

    One of the sad facts about the Internet is that 90% of what people write is crap. 99% if you set the bar high, 80% if you’re an optimist. This applies to comments too, and it means that without some intervention a reader is forced to read through 10 crappy comments before reading a good one. Most people don’t have the time for that.

    That’s why sorting comments is important.

    The ultimate goal is to present the reader with the good comments and allow him to skip over the bad ones. The job of the sorting algorithm is to find these nuggets, and the job of the UX people is to present the nuggets to readers in the best way possible. Note that these are two different things: the algorithm calculates a score, and the design presents the comments to the user based on that score.

    Most algorithmic sorting systems are primarily based on other users’ votes, but presenting only the comments with the most votes presents a problem: how will new comments gain votes? What if comment number 100 is incredibly insightful, but gets no votes because no one reads through 100 comments before they see it? The solution is an algorithm that makes sure comments with many votes rise to the top, but also makes sure that new comments are seen and have a chance to get voted on.

    Probably the simplest version, which works surprisingly well, is Hacker News’. The algorithm is as follows:

    Score = (votes-1) / (time since creation in hours+2)^1.8

    If you’re mathematically inclined you’ll see that votes increase the score while age drags it down: time sits in the denominator, so a comment’s score decays the older it gets. Since comments are listed according to score this means that new comments start at or near the top, allowing other users to see them and vote on them, but quickly fall down the page if they receive no upvotes. Thus the playing field is more level, and late comments still have a chance to rise to the top.
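
    In code, the formula is essentially a one-liner. A minimal sketch, assuming a timezone-aware creation timestamp and the Comment data model sketched earlier:

        from datetime import datetime, timezone

        def hn_score(votes: int, created: datetime, gravity: float = 1.8) -> float:
            """Hacker News-style score: votes push a comment up, age drags it down."""
            # `created` is assumed to be a timezone-aware datetime.
            age_hours = (datetime.now(timezone.utc) - created).total_seconds() / 3600
            return (votes - 1) / (age_hours + 2) ** gravity

        # Sorting is then just a matter of ordering by this score:
        # comments.sort(key=lambda c: hn_score(c.score(), c.created), reverse=True)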

    Reddit’s sorting algorithm works on the same principle of presenting users with a list of comments sorted by score, but the score is calculated somewhat differently. It uses the Wilson score interval, a statistical method developed by Edwin B. Wilson in 1927(!). The idea is to treat the votes on a comment as a sample and rank the comment by the lower bound of the confidence interval for its true upvote ratio – it’s basically like running a small poll on each comment every time a vote is cast. The comment sorting system was created by Randall Munroe of XKCD fame, and he has written a very readable blog post about how it works here.
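
    For reference, here is a sketch of how the Wilson lower bound is typically computed, with the number of upvotes and downvotes as the only inputs (parameter names are mine):

        from math import sqrt

        def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
            """Lower bound of the Wilson score interval for a comment's true upvote ratio.

            z = 1.96 corresponds to 95% confidence. Comments with only a handful of
            votes get a cautious (low) score; well-voted comments converge towards
            their observed upvote ratio.
            """
            n = ups + downs
            if n == 0:
                return 0.0
            p = ups / n
            return (p + z * z / (2 * n)
                    - z * sqrt((p * (1 - p) + z * z / (4 * n)) / n)) / (1 + z * z / n)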

    Amix.dk has a good run-through of both Reddit’s and Hacker News’ algorithms.

    Facebook’s sorting algorithm is complex, often changing and a well-kept secret – so it’s hard to say anything meaningful about how it works. At least anything meaningful that won’t change within a week.

    The old-timer Slashdot solves the problem somewhat differently. The comments are listed chronologically, but the ones that receive few or no votes are hidden from view and require an active click to see. Since its voting system is a dropdown of insightful/informative/interesting/funny, you can choose to sort by one of these if you just want to see the funny comments. Or the insightful ones. The advantage of this solution is that it keeps the chronological nature of the comment section intact, while still presenting only the best comments to the user.

    Note that the above sorting algorithms are just the basics, and that you can, and probably should, add to them and experiment to get it right. Maybe you should include users’ average comment scores in the algorithm, maybe you should add a negative weight to new users, maybe votes from moderators should count double. The possibilities are endless. This is also why it’s important to keep the sorting algorithm separate in your code base, so you can continue to tweak and perfect it.
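
    As an illustration of what such a separate, tweakable ranking function could look like, here is a hypothetical sketch that layers a couple of example weights on top of the Hacker News-style score from earlier. The specific factors and numbers are made up, not recommendations:

        def comment_rank(comment, author) -> float:
            """A tweakable ranking function layered on top of a base score."""
            # Base score: the hn_score() sketch from earlier.
            score = hn_score(comment.score(), comment.created)

            # Hypothetical tweak: trust authors with a good track record slightly more.
            score *= 1.0 + min(author.karma, 1000) / 10_000

            # Hypothetical tweak: dampen brand-new accounts a little.
            if (comment.created - author.joined).days < 7:
                score *= 0.8

            return score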

    Moderation

    If you have a reasonable amount of comments you need moderation. There will always be trolls, personal attacks, haters and plain assholes, and you need to do something about them, because they will infest your community and drive the good users away if you don’t. Nobody wants to spend time writing a thoughtful comment that will be lost in a sea of swearwords, illuminati conspiracies and presumptuous premises. This is a cumulative effect: once you start having bad comments (for some definition of bad, which obviously depends on your community) they will attract more. The same goes for good comments. This is why moderation is important.

    Good moderation is a combination of human and machine effort. The most blatant spam can be caught using standard techniques such as Bayesian filtering, but reasoning about the validity of comments above a very low threshold is still beyond algorithms. There are a few different techniques that can be employed:

    Algorithmic sorting

    The voting algorithm will get you a long way, especially if you have downvotes. Comments that receive a sizable number of downvotes can automatically sink to the bottom, where few people read them. Hacker News has a rather clever system where the text color of a comment gets closer and closer to the background color the more downvotes it receives. After enough downvotes it is practically invisible.
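
    The fade effect itself is just a linear interpolation between the normal text color and the background color. A hypothetical helper might look like this:

        def faded_text_color(score: int, text=(0, 0, 0), background=(255, 255, 255)) -> str:
            """Blend the text color towards the background as the score drops below zero."""
            # Score 0 or better: full text color; a score of -5 or worse: fully faded away.
            blend = min(max(-score, 0), 5) / 5
            r, g, b = (round(t + (bg - t) * blend) for t, bg in zip(text, background))
            return f"#{r:02x}{g:02x}{b:02x}"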

    An additional measure is a “report spam” button that lets users report spam comments. This is useful, since it’s a clear signal: when a user presses it, it is because he thinks a comment is spam. The system should, however, not just delete the comment, since that would be an easy way to game the system and remove comments you disagree with. Instead the report button should feed into the moderation system, so that the action taken is based on a more nuanced set of parameters. These could include the reporting user’s previous posts, average score, or account age; the same parameters for the writer of the comment; and perhaps a message to the moderators. Which brings us to…
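
    As a sketch of the idea, a report handler along those lines might weigh each report by the reporter’s standing and only escalate once enough weight has accumulated. The helpers, thresholds and weights below are made up for illustration:

        from datetime import datetime, timezone

        def flag_for_moderators(comment) -> None:
            """Hypothetical hook: put the comment in the human moderation queue."""
            ...

        def hide_pending_review(comment) -> None:
            """Hypothetical hook: hide the comment until a moderator has looked at it."""
            ...

        def handle_spam_report(comment, reporter, pending_reports: dict) -> None:
            """Weigh a spam report by the reporter's standing instead of acting on it blindly."""
            weight = 1.0
            if reporter.karma > 500:                                      # long-time user in good standing
                weight += 1.0
            if (datetime.now(timezone.utc) - reporter.joined).days < 7:   # brand-new account
                weight *= 0.25

            key = id(comment)   # stand-in for a real comment id
            pending_reports[key] = pending_reports.get(key, 0.0) + weight

            if pending_reports[key] >= 3.0:
                flag_for_moderators(comment)       # humans make the final call
            if pending_reports[key] >= 10.0:
                hide_pending_review(comment)       # hide for review, never silently delete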

    Moderators

    Moderators are the humans that make sure everything works as it is supposed to. These can either be paid moderators, which quickly gets expensive, or power users that volunteer. Typically a hierarchy is employed, with paid staff at the top and a number of volunteers below them. The job of the paid staff is to find and keep good moderators, tweak algorithms and do normal housekeeping. The job of the volunteers is to moderate comments. One important reason for having volunteer moderators is better response time. If moderation is only done by normal employees, response times are typically slow, both because people have other things to do, and because there typically will be no moderation after working hours. A well-kept volunteer-based system, on the other hand, will have almost instant moderation.

    Banning

    Some users just won’t learn. Maybe they are trolls, maybe they have a personal agenda, or maybe they just have nothing better to do. To have a well-functioning community you need to get rid of them, since they can quickly infest and degrade your comment section. Banning can either be automatic or done by moderators, and a ban can either be on the user profile (with the disadvantage that he can just create another) or on the IP address (with the disadvantage that others from that IP can’t join the discussion, plus the problems with dynamic IPs). There is no proven way to completely ban a user and make sure he doesn’t come back, short of requiring personal ID, which is probably taking it a bit too far. For most sites it’s a whack-a-mole game, but the more effective you are at weeding out, the smaller the problem becomes, as bad users find out that their comments won’t be read anyway.

    A clever way of keeping bad users in a trap is hell-banning – they will see their own comments on the site, but the comments will be invisible to everyone else. Often they don’t realise this, and wonder why their snarky comment doesn’t trigger a response, not realising that they are the only ones who can see it. Eventually they will get tired and go somewhere else. A particularly insidious version of hell-banning is to let hell-banned users see comments from other hell-banned users.
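
    In code, hell-banning boils down to a tiny visibility rule. A sketch, assuming a hellbanned flag on the user profile:

        def is_visible(comment, viewer) -> bool:
            """Decide whether a viewer gets to see a comment under hell-banning rules."""
            if not comment.author.hellbanned:
                return True                     # normal comments are visible to everyone
            if viewer is comment.author:
                return True                     # the banned user still sees his own comments
            # The insidious variant: hell-banned users also see each other's comments.
            return viewer is not None and viewer.hellbanned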

    Transparency

    Experience suggests that at least some transparency is important for a good community. If you just delete comments, users are prone to start speculating and eventually get angry. Conspiracy theories about the political bias of moderators, personal agendas and the like are bound to pop up. So are comments about it, and they typically don’t add to the conversation. A good start is a set of guidelines that states what is and isn’t allowed. Being able to contact moderators is another good measure. A switch that allows users to see deleted comments is another good way. Sending an automated message with the guidelines is yet another. Just deleting comments with no reason given is a bad idea unless it’s obviously spam.

    Some sites, such as Hacker News, choose to keep the identity of moderators secret (or at least not publicly available), whereas sites such as Reddit have visible moderators for each subreddit, free for all to see. Slashdot employs a unique system where some users are granted moderation abilities for short time periods based on their past actions. This approach crowdsources the moderation to all users; it may be fairer, and it has the advantage that there is no single moderator with a political agenda, a personal vendetta or other undesirable behavior.

    Design and usability

    Design and usability are important factors. You should strive for a system that makes it easy for new users to join the conversation and, if you have the resources, gives advanced possibilities to advanced users.

    The sign-up process

    The sign-up process should be easy and hassle-free; username and password and maybe e-mail should really be enough. Full name, number of pets, where you are from and sexual orientation are just filler that will drive new users away. I have seen some sites try to use the sign-up process to minimise spam comments by requiring phone numbers or real IDs. I have seen no data to suggest that this works. If your strategy for minimising spam and bad comments is to make it harder to sign up, you’re doing it wrong. Facebook is an exception here – the only reason it works for them is their massive network advantages.

    Writing and reading comments

    Writing a comment should be easy. Again, making it hard or limiting users’ possibilities doesn’t help much against bad comments, but it definitely hinders good comments. This is also the wrong place for moderation. Most well-functioning comment systems seem to have some kind of markdown, so that users can style their comments. This is a big win for longer comments, which otherwise would just be a wall of text. Typically styling is limited to simple things such as bold, indentation, headings, links and unordered lists. Not much, but enough to make a long comment readable. It’s not an absolute must, but with all the free markdown editors available it’s an easy implementation. I have seen some sites limit comments to 500 or 1000 characters, and I’m pretty sure this is a terrible idea. You end up with complaints, comments in 2 or 3 parts, and no one writing thoughtful comments, without any apparent upside.
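
    If you go the markdown route, the rendering side can be as small as converting the text and then whitelisting the output. A sketch using the third-party markdown and bleach packages (my choice of libraries, not a requirement):

        import bleach     # third-party: pip install bleach
        import markdown   # third-party: pip install markdown

        # A deliberately small whitelist: enough to make a long comment readable, nothing more.
        ALLOWED_TAGS = ["p", "strong", "em", "blockquote", "h3", "ul", "ol", "li", "a", "code", "pre"]
        ALLOWED_ATTRIBUTES = {"a": ["href", "title"]}

        def render_comment(raw_text: str) -> str:
            """Convert a user's markdown to HTML, then strip anything outside the whitelist."""
            html = markdown.markdown(raw_text)
            return bleach.clean(html, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES, strip=True)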

    Anonymous posting may have its merits if the conversation is sensitive and involves whistleblowing, sexual orientation, personal problems, or a number of other subjects. Typically users will create a throwaway account that will only be used for one comment thread. In my experience some of these anonymous postings are incredibly interesting, because they touch on subjects that are normally taboo in one way or another. Slashdot has an interesting twist on anonymous posting: when you are logged in you can choose to post anonymously, and your comment will appear with the username “Anonymous Coward” and get an automatic penalty in the voting system to keep anonymous spam and personal attacks near the bottom.

    The discussion between linear (one long list) and threaded (hierarchical, like folder views) comments has been ongoing since newsgroups were the hot thing. The advantage of linear comments is that they are easier to understand for non-technical users, but they are harder to parse for more savvy users. Conversations in particular are a problem for linear comments: replying to another user is a mess, and following a conversation is even more of a mess. Threaded conversations seem to be prevailing as more and more users get used to them. It’s also hard not to notice that almost all well-functioning comment sections have some kind of threaded comment system. On a side note, I’ve more than once heard the argument that threaded comments with unlimited depth are almost impossible to implement. I suggest these people learn about recursion.
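
    For the record, rendering a thread of arbitrary depth really is just a handful of recursive lines. A sketch, assuming each comment keeps a list of its replies (a field I’m adding for illustration):

        def render_thread(comment, depth: int = 0) -> str:
            """Render a comment and, recursively, all of its replies, indented by depth."""
            lines = [("    " * depth) + f"{comment.author.username}: {comment.body}"]
            for reply in sorted(comment.replies, key=lambda c: c.score(), reverse=True):
                lines.append(render_thread(reply, depth + 1))
            return "\n".join(lines)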

    Interaction and psychology

    Why do people spend their time writing comments? To paint with a really broad brush, it’s either because they are bored, have an agenda, are angry or have something interesting to say. The ones that have something interesting to say are usually also the busiest, and the ones with the least patience for friction when making a comment. For this reason it should be simple and quick to comment. The downside is that it will also be simple and quick for users that don’t have anything to add, but that’s what comment sorting and moderation are for. Making it hard to join the conversation is throwing the baby out with the bathwater.

    Actually there’s another reason people spend their time writing comments, and it’s probably the most important one: to feel part of a community, and to get a feeling of acceptance or empowerment from that community. This is why feedback is important. Facebook is the master of this. We all know and love the little red globe at the top right of the page that indicates that someone has liked or responded to something we wrote. As any psychologist can tell you, this brings you closer to the community and gives you a more favorable view of the site. It also promotes discussion, since a user is notified when someone responds to his comment. You absolutely need functionality that easily lets users see responses to what they write – it’s one of the major psychological drivers for spending time writing a long, thoughtful comment.
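
    The mechanics behind that kind of feedback are simple: whenever a reply is saved, record a notification for the author of the parent comment. A hypothetical sketch, reusing the parent field from the data-model sketch above:

        def notify_on_reply(reply, notifications: dict) -> None:
            """Queue a 'someone responded to your comment' notification for the parent's author."""
            parent = reply.parent
            if parent is None or parent.author is reply.author:
                return   # top-level comment, or a user replying to himself
            notifications.setdefault(parent.author.username, []).append(
                f"{reply.author.username} replied to your comment"
            )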

    Karma is the word normally used for accumulated votes/points/likes. The more votes your comments get, the more karma you have. It’s a disputed term that many people have a love/hate relationship with, but it works. Most power users on a site that has karma will follow theirs, and most won’t acknowledge that they do so. It’s a measure of how good a member of the community you are, or, as a psychologist might say, an extension of your ego. Even though it’s just a number it has a profound psychological effect, and it spurs users to write better comments to gain karma. Some sites even have top lists of the users with the most karma.
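
    Karma itself is trivial to compute, which is part of its charm: it’s usually just the sum of votes across everything a user has written. A sketch, building on the data-model sketch above:

        def karma(user, all_comments) -> int:
            """A user's karma: the combined vote tally of every comment he has written."""
            return sum(c.score() for c in all_comments if c.author is user)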

    Closing thoughts

    What was originally intended to be a short guide to comments for noobs ended up being much longer than I thought, and I’ve only covered the basics. This probably goes to show that comments are somewhat more complicated than they first appear, and that a good implementation is not trivial.

    Best of luck to anyone faced with the job of implementing a good comment system.

    If you think the task is monumental and don’t know where to start, you should send me an e-mail – if you have an interesting project I might be interested.