Each SmokeDetector instance stores a number of data files within its repository directory. These are used to hold state between reboots and for debugging.
All pickles are stored in the “./pickles” directory.
The protocol which is currently used is pickle.HIGHEST_PROTOCOL. Protocol 5 is only supported in Python >= 3.8, and we currently support Python 3.7 <= X <= 3.10. This means that SmokeDetector instances running on Python 3.7 will not be able to read pickle files created by SmokeDetector instances running on Python >= 3.8. In the vast majority of cases, this shouldn't be an issue, because the pickles are intended to be read by the instance which created them. This should be/will be changed to use the highest protocol which is supported by the lowest Python version we support, which currently means protocol 4.
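A minimal sketch of the proposed change, which pins the protocol to the highest version readable by the oldest supported Python (3.7, i.e. protocol 4) instead of using pickle.HIGHEST_PROTOCOL directly (the constant name PICKLE_PROTOCOL is illustrative, not SmokeDetector's actual code):

```python
import pickle

# Use the highest protocol supported by the lowest Python version we
# support. Python 3.7's highest protocol is 4, so cap at 4; the min()
# also handles hypothetical interpreters whose maximum is below 4.
PICKLE_PROTOCOL = min(pickle.HIGHEST_PROTOCOL, 4)

data = {"example.com": 42}
blob = pickle.dumps(data, protocol=PICKLE_PROTOCOL)
assert pickle.loads(blob) == data
```

A pickle written this way can be loaded by any supported interpreter, regardless of which Python version created it.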
Pickle File | Where is the code to dump/load the pickle | Contents | What | Notes | Should sync between SD instances¹ | Is in !!/dump / !!/load |
---|---|---|---|---|---|---|
apiCalls.p | datahandling.py | GlobalVars.api_calls_per_site | Count of the number of bodyfetcher.py scan API calls which were made for the site since the last time the API quota rolled over. | Currently, this is synchronously dumped after every scan API call. A) it should be done asynchronously as a task; B) we don't need it to be dumped that often. It really should only be dumped periodically and/or upon reboot or crash. | No | |
autoIgnoredPosts.p | datahandling.py | GlobalVars.auto_ignored_posts | List of site/post combinations which SD has auto-ignored, which is done for 7 days when a post is detected by only one reason from the list: "all-caps title", "repeating characters in {}", and "repeating words in {}". | Dumped synchronously when a post is added after a scan. Filtered to a 7-day maximum upon boot and dumped synchronously. | Yes | |
blacklistedUsers.p | datahandling.py | GlobalVars.blacklisted_users | Dict of users who have been blacklisted and why. | Dumped synchronously upon add and remove. | Yes | Yes |
bodyfetcherMaxIds.p | datahandling.py | GlobalVars.bodyfetcher.previous_max_ids | Dict by site of the most recent post fetched. Used for all sites other than SO. | Stored as an async task after every bodyfetcher scan, which allows only one dump task to be active at a time. This pickle will go away when scanning for all sites is changed to fetch the most recently active posts, rather than posts by specific ID. | Yes | |
bodyfetcherQueue.p | datahandling.py | GlobalVars.bodyfetcher.queue | Dict by site of the posts currently in the queue to be fetched and scanned. | Stored as an async task after every bodyfetcher scan, which allows only one dump task to be active at a time. How that works should be adjusted a bit: instead of canceling the current task, a new task should just not be added if there's an existing one. | Yes | |
codePrivileges.p | datahandling.py | GlobalVars.code_privileged_users | Set of (chat site, user ID) tuples, obtained from MS, of users who are code privileged. | | No | |
cookies.p | datahandling.py | GlobalVars.cookies | Dict by SE chat site of the cookies obtained for logging into chat. | | No | |
deletionIDs.p | deletionwatcher.py | dict of sites, each a list of post IDs | Posts which are currently being watched by DeletionWatcher. | Dumped as a task upon subscribing. | Yes | |
editActions.p | editwatcher.py | dict of sites, each a list of post IDs | Posts which are currently being watched by EditWatcher. | Supposed to be dumped as a task upon subscribing. However, it appears there's a bug in the code, because the pickle doesn't exist on my instance or my test instance. | Yes | |
falsePositives.p | datahandling.py | GlobalVars.false_positives | List of tuples containing site/post ID. Used to prevent re-reporting posts which have received FP feedback. | Dumped synchronously upon addition. There's no way to remove a post once added. | Yes | |
ignoredPosts.p | datahandling.py | GlobalVars.ignored_posts | List of tuples containing site/post ID. Used to prevent re-reporting posts which have been ignored or received NAA feedback. | Dumped synchronously upon addition. There's no way to remove a post once added. | Yes | Yes |
messageData.p | chatcommunicate.py | chatcommunicate.py _last_messages | The most recent 100 chat messages and 50 reports sent to chat. | Dumped async after every message or report sent to chat. | No² | |
metasmokeCacheData.p | dumped in metasmoke_cache.py; restored in ws.py | {'cache': MetasmokeCache._cache, 'expiries': MetasmokeCache._expiries} | Cache of some of the data received from the metasmoke API. | Dumped async after data is fetched from MS. | No | |
metasmokePostIds.p | datahandling.py | GlobalVars.metasmoke_ids | Cache dict of MS post IDs by SE site API ident/ID tuple. Each contains the largest MS ID for the SE post which existed when the entry was created. | Dumped synchronously upon addition. Entries are only removed if not an int (i.e. invalid). Never updated, even if a newer MS report is created for the post. | No | |
ms_ajax_queue.p | datahandling.py | metasmoke.Metasmoke.ms_ajax_queue | List of dicts describing AJAX calls to MS which failed or were not tried because MS was declared down. | Dumped synchronously upon addition. The intent is that these AJAX calls will be sent to MS once MS is back up and a connection is available. Code doesn't exist to do anything with these yet. | No | |
notifications.p | datahandling.py | GlobalVars.notifications | List of tuples: (int(user_id), chat_site, int(room_id), se_site, always_ping) describing the notifications requested by users. | Dumped synchronously upon change. | Yes | Yes |
postScanStats2.p | datahandling.py | GlobalVars.PostScanStat.stats | Dict by stat key of stats for bodyfetcher scanning by this instance. | Dumped synchronously upon a call to helpers.exit_mode (i.e. upon exit). | No | |
reasonWeights.p | datahandling.py | GlobalVars.reason_weights | Cache dict of reason weights from MS. | Dumped synchronously upon update from MS. Updated from MS upon !!/autoflagged or if more than 1 hour old. | No | |
recentlyScannedPosts.p | datahandling.py | GlobalVars.recently_scanned_posts | Dict by site/ID of posts which were recently scanned. | Dumped synchronously upon a call to helpers.exit_mode (i.e. upon exit). Currently, the data structure is substantially larger than it needs to be, as it contains the post text. This is planned to change to a hash of the post body text and title, along with potentially trimming some of the other data currently included. | Yes | |
seSiteIds.p | datahandling.py | (GlobalVars.site_id_dict_timestamp, GlobalVars.site_id_dict_issues_into_chat_timestamp, GlobalVars.site_id_dict) | Cache of SE site IDs obtained from SE. Used for WebSocket access to specific sites and/or posts. Refreshed every 24 hours, if possible (i.e. SE not down). | Dumped synchronously upon update from SE. | No | |
whitelistedUsers.p | datahandling.py | GlobalVars.whitelisted_users | Set of (user ID, SE site) tuples of users who have been whitelisted. | Dumped synchronously when updated. | Yes | Yes |
whyData.p | datahandling.py | GlobalVars.why_data | List of tuples ("site/post_id", why text), kept to a maximum of 50 entries. | Dumped synchronously when added. | No² | |
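The basic dump/load pattern the table describes can be sketched roughly as follows. This is an illustrative sketch only: dump_pickle/load_pickle are hypothetical names, not SmokeDetector's actual API, though the ./pickles location and the protocol-4 cap match what this page describes.

```python
import os
import pickle

PICKLE_DIR = "./pickles"
# Cap at protocol 4 so pickles remain readable on Python 3.7.
PICKLE_PROTOCOL = min(pickle.HIGHEST_PROTOCOL, 4)


def dump_pickle(name, obj):
    # Write obj to ./pickles/<name>, creating the directory if needed.
    os.makedirs(PICKLE_DIR, exist_ok=True)
    with open(os.path.join(PICKLE_DIR, name), "wb") as f:
        pickle.dump(obj, f, protocol=PICKLE_PROTOCOL)


def load_pickle(name, default=None):
    # Return the unpickled object, or `default` if the file doesn't exist
    # (e.g. on the first boot of a fresh instance).
    path = os.path.join(PICKLE_DIR, name)
    if not os.path.isfile(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, the falsePositives.p entry above would amount to calling dump_pickle("falsePositives.p", GlobalVars.false_positives) synchronously each time a site/post ID tuple is appended.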
File (location) | What |
---|---|
bodyfetcherQueueTimings.txt (./pickles) | Historical timing data for each launch of a scan for a site. |
errorLogs.txt (./) | Limited log output, written when helpers.log_file() is called. Its use is limited. |
errorLog.txt (./) | Errors logged by nocrash.py |
File (location) | What |
---|---|
errorLogs.db (./) | Errors which the SmokeDetector instance has encountered during operation, other than in nocrash.py (i.e. while the SmokeDetector instance is running its primary code). |