Autoflagging

What is autoflagging?

Autoflagging is a subsystem of metasmoke that lets us automatically cast spam flags on Stack Exchange posts if we’re certain enough that they’re spam. To gain that certainty, we calculate the historical accuracy of the spam checks that caught the post in question, and - essentially - if that accuracy is over a certain mark, we cast some flags. More detail on how exactly that works is at the bottom of this page.
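This page doesn’t spell out how the accuracies of several checks are combined into one certainty value. Below is a minimal sketch. It assumes that “historical accuracy” means the share of past reports caught by the same set of checks that were later confirmed as spam. HistoricalReport, historical_accuracy, should_autoflag and the threshold parameter are hypothetical names, not metasmoke’s actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class HistoricalReport:
    """A past report: the checks (reasons) that caught it and its final verdict."""
    reasons: frozenset[str]
    is_spam: bool


def historical_accuracy(reasons: frozenset[str],
                        history: Sequence[HistoricalReport]) -> float | None:
    """Share of past reports caught by at least these checks that were spam.

    Assumption: accuracy is measured over past reports caught by every one of
    the given checks. Returns None when no such report exists, because there
    is then no history to judge the combination by.
    """
    matching = [r for r in history if reasons <= r.reasons]
    if not matching:
        return None
    return sum(r.is_spam for r in matching) / len(matching)


def should_autoflag(reasons: Iterable[str],
                    history: Sequence[HistoricalReport],
                    threshold: float) -> bool:
    """Flag only if the checks' historical accuracy is above the threshold."""
    accuracy = historical_accuracy(frozenset(reasons), history)
    return accuracy is not None and accuracy > threshold
```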

How good is it?

In the period from 2017-02-01 through 2018-09-11, we successfully cast 114,089 spam flags. Of those, 113,570 were correctly cast - that is, cast on a confirmed spam post - and 519 were cast in error on legitimate posts. Overall, that’s a 99.55% accuracy rate.
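A quick check of the arithmetic, using only the counts given above: the accuracy rate is the share of all cast flags that landed on confirmed spam.

```python
def accuracy_percent(correct: int, in_error: int) -> float:
    """Percentage of cast flags that landed on confirmed spam."""
    return 100 * correct / (correct + in_error)


correct, in_error = 113_570, 519
total = correct + in_error
print(f"{total:,} flags cast, {accuracy_percent(correct, in_error):.2f}% accurate")
# prints "114,089 flags cast, 99.55% accurate"
```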

Chart of flags over time for that period: the tp_count series represents flags cast on spam, and the fp_count series represents flags cast on legitimate posts.

How many flags does it cast?

Autoflagging can cast up to four flags on a post it is confident is spam. However, that’s not a flat value - the number of flags cast varies with the degree of certainty we have. If the system is more than 99.75%* certain, we cast three flags; above 99.90%*, we cast four flags.

This doesn’t apply to sites where fewer flags are required for a post to be deleted automatically. On most Stack Exchange sites, posts require 6 flags to be automatically deleted as spam by the system. However, on The Workplace and on English Language & Usage, the requirement has been lowered to 3. Hence, on these two sites, autoflagging only ever casts one flag automatically, regardless of certainty.

* Correct as of 2018-09-11; subject to change.
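The rules above can be sketched as a small lookup. The thresholds are the ones stated on this page. The page doesn’t say how many flags are cast at 99.75% certainty or below, so the sketch returns None for that range rather than guessing. FLAG_THRESHOLDS, THREE_FLAG_SITES and flags_to_cast are hypothetical names, and the site keys are illustrative.

```python
from __future__ import annotations

# (certainty must be strictly above this, flags to cast), highest first.
# Values as stated on this page, correct as of 2018-09-11 and subject to change.
FLAG_THRESHOLDS: list[tuple[float, int]] = [
    (0.9990, 4),
    (0.9975, 3),
]

# Sites where 3 spam flags, not 6, delete a post (names are illustrative keys).
THREE_FLAG_SITES = {"The Workplace", "English Language & Usage"}


def flags_to_cast(certainty: float, site: str) -> int | None:
    """Number of autoflags for a post, or None below the thresholds given here.

    Below the lowest stated threshold the flag count is unknown, so this
    sketch returns None instead of guessing.
    """
    for minimum, flags in FLAG_THRESHOLDS:
        if certainty > minimum:
            # Sites with a lower deletion threshold only ever get one autoflag.
            return 1 if site in THREE_FLAG_SITES else flags
    return None
```

With these numbers, four autoflags on a six-flag site still leave two flags to be cast by human users, and one autoflag on a three-flag site likewise leaves two.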

Where do the flags come from?

We are able to cast more than one spam flag on each post by using regular users’ spam flags. Stack Exchange users may grant permission for their accounts to be used for autoflagging by signing up on metasmoke. Once signed up, they may specify under what conditions their flags can be used; this enables metasmoke to cast flags in their name.

Moderators’ flags will never be used, regardless of their settings: a moderator’s spam flag is binding and deletes the post immediately, and we don’t want to automatically nuke posts.

How does it work?

This has been alluded to in a number of places here, but for the sake of completeness, autoflagging works along these lines:

  • Users sign up and grant permission for their flags to be used on metasmoke.
    Part of this process involves the user setting up “flag conditions”, which specify at what degree of certainty they personally are happy for their flags to be used, and “site settings”, which govern how many of their flags autoflagging may use per day and on what sites.
  • Orthogonally, SmokeDetector detects spam and sends it to metasmoke.
    When a new report arrives in metasmoke, among other processing, a certainty value is calculated for the post. Using that value, the system works out whether the post is eligible for flagging and, if so, which users’ flags may be used.
  • The system casts flags by selecting randomly from the available users.
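The steps above can be sketched as follows. The sketch assumes that a user’s flag condition is a single minimum certainty and that one daily limit covers all of their chosen sites. The real settings may be finer-grained. FlaggingUser, eligible_users, cast_flags and the site keys are hypothetical names, not metasmoke’s actual models.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class FlaggingUser:
    """Hypothetical summary of one signed-up user's autoflagging settings."""
    name: str
    is_moderator: bool
    min_certainty: float      # from the user's "flag conditions"
    sites: set[str]           # from the user's "site settings"
    max_flags_per_day: int    # from the user's "site settings"
    flags_used_today: int = 0


def eligible_users(users: list[FlaggingUser], site: str,
                   certainty: float) -> list[FlaggingUser]:
    """Users whose settings allow their flag to be used on this post."""
    return [
        u for u in users
        if not u.is_moderator                      # moderators' flags are never used
        and site in u.sites
        and certainty >= u.min_certainty
        and u.flags_used_today < u.max_flags_per_day
    ]


def cast_flags(users: list[FlaggingUser], site: str, certainty: float,
               flags_wanted: int,
               rng: random.Random | None = None) -> list[FlaggingUser]:
    """Pick up to flags_wanted eligible users at random and count their flags."""
    rng = rng if rng is not None else random.Random()
    pool = eligible_users(users, site, certainty)
    chosen = rng.sample(pool, min(flags_wanted, len(pool)))
    for user in chosen:
        user.flags_used_today += 1  # counts against the daily limit
    return chosen
```

Sampling without replacement means no user’s flag is used twice on the same post. If fewer users are eligible than flags are wanted, fewer flags are cast.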

If you’d like more information on the technical side of the system, drop into Charcoal HQ to have a chat about it.

What are spam waves?

Spam waves are a tool that lets metasmoke admins set SmokeDetector to immediately cast any number of autoflags on detected posts matching pre-set criteria. This can include enough autoflags to delete a post immediately, without requiring users to cast any flags manually. This feature is separate from blacklisted/watched keywords - an admin must configure it manually in metasmoke.

Additionally, during a spam wave, SmokeDetector may use flags from any user account with flagging enabled in metasmoke, regardless of the account’s autoflagging configuration (if any). At this time, it is not possible to opt out of this.
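A spam wave can be sketched in the same style. The criteria format (a site plus a title regular expression) and the names SpamWave, Account and wave_flaggers are hypothetical. metasmoke’s real criteria may differ. The key difference from normal autoflagging is the pool of flaggers: every account with flagging enabled, with no per-user conditions applied. The sketch still excludes moderators, following the rule stated earlier on this page.

```python
from __future__ import annotations

import random
import re
from dataclasses import dataclass


@dataclass
class SpamWave:
    """Hypothetical spam-wave criteria configured by a metasmoke admin."""
    site: str
    title_pattern: str
    flags_to_cast: int

    def matches(self, site: str, title: str) -> bool:
        return site == self.site and re.search(self.title_pattern, title) is not None


@dataclass
class Account:
    name: str
    flagging_enabled: bool
    is_moderator: bool = False


def wave_flaggers(wave: SpamWave, site: str, title: str,
                  accounts: list[Account],
                  rng: random.Random | None = None) -> list[Account]:
    """Accounts whose flags a matching spam wave would use (empty if no match)."""
    if not wave.matches(site, title):
        return []
    rng = rng if rng is not None else random.Random()
    # Per-user flag conditions and site settings are ignored; only whether
    # flagging is enabled at all matters.
    pool = [a for a in accounts if a.flagging_enabled and not a.is_moderator]
    return rng.sample(pool, min(wave.flags_to_cast, len(pool)))
```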

If you wish to suggest a new spam wave, ping Makyen or another metasmoke admin for further information on how to proceed. Approval from a Community Manager (CM) is required for a wave that unilaterally deletes posts matching specific criteria. For a wave that only raises the number of flags above the “normal” baseline, approval from the site’s moderators is also acceptable. This tool is reserved for situations involving a severe influx of spam or extremely abusive content.

More information