Data stored by a SmokeDetector instance

Each SmokeDetector instance stores a number of data files within its repository directory. These are used to hold state between reboots and for debugging.

Pickles

All pickles are stored in the “./pickles” directory.

The protocol currently used is pickle.HIGHEST_PROTOCOL. Protocol 5 is only supported in Python >= 3.8, while we currently support Python 3.7 <= X <= 3.10. This means that SmokeDetector instances running on Python 3.7 will not be able to read pickle files created by instances running on Python >= 3.8. In the vast majority of cases, this shouldn’t be an issue, because the pickles are intended to be read by the instance which created them. This should be/will be changed to use the highest protocol supported by the lowest Python version we support, which currently means protocol 4.
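As a sketch of that planned change (the helper names here are illustrative, not SmokeDetector’s actual API in datahandling.py): pinning the dump protocol to 4, rather than using pickle.HIGHEST_PROTOCOL, produces files that every interpreter in the supported 3.7–3.10 range can read, since loading is always backward compatible.

```python
import pickle

# Protocol 4 is the highest protocol available in Python 3.7, so files
# written this way load on every supported version (3.7 - 3.10), whereas
# pickle.HIGHEST_PROTOCOL on Python >= 3.8 writes protocol 5 files that
# a 3.7 interpreter cannot read.
COMPAT_PROTOCOL = 4

def dump_compat(obj, path):
    """Dump obj with the highest protocol our oldest supported Python has."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=COMPAT_PROTOCOL)

def load_pickle(path):
    """pickle.load() auto-detects the protocol, so no version pinning needed."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Loading never needs to know the protocol; only the dump side has to be pinned.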

| Pickle file | Where is the code to dump/load the pickle | Contents | What | Notes | Should sync between SD instances¹ | Is in !!/dump / !!/load |
|---|---|---|---|---|---|---|
| apiCalls.p | datahandling.py | GlobalVars.api_calls_per_site | Count of the number of bodyfetcher.py scan API calls made for the site since the last time the API quota rolled over. | Currently, this is synchronously dumped after every scan API call. A) it should be done asynchronously as a task; B) we don’t need it to be dumped that often; it really should only be dumped periodically and/or upon reboot or crash. | No | |
| autoIgnoredPosts.p | datahandling.py | GlobalVars.auto_ignored_posts | List of site/post which SD has auto-ignored for 7 days, which is done when a post triggers only one detection from the list: “all-caps title”, “repeating characters in {}”, and “repeating words in {}”. | Dumped synchronously when a post is added after a scan. Filtered to a maximum of 7 days upon boot and dumped synchronously. | Yes | |
| blacklistedUsers.p | datahandling.py | GlobalVars.blacklisted_users | Dict of users who have been blacklisted and why. | Dumped synchronously upon add and remove. | Yes | Yes |
| bodyfetcherMaxIds.p | datahandling.py | GlobalVars.bodyfetcher.previous_max_ids | Dict by site of the most recent post fetched. Used for all sites other than SO. | Stored as an async task after every bodyfetcher scan, which allows only one dump task active at a time. This pickle will go away when scanning is changed for all sites to fetch the most recently active posts, rather than by specific ID. | Yes | |
| bodyfetcherQueue.p | datahandling.py | GlobalVars.bodyfetcher.queue | Dict by site of the posts currently in the queue to be fetched and scanned. | Stored as an async task after every bodyfetcher scan, which allows only one dump task active at a time. How that works should be adjusted a bit: instead of canceling the current task, a new task should just not be added if there’s an existing one. | Yes | |
| codePrivileges.p | datahandling.py | GlobalVars.code_privileged_users | Set of (chat site, user ID) tuples obtained from MS of users who are code privileged. | | No | |
| cookies.p | datahandling.py | GlobalVars.cookies | Dict by SE chat site of the cookies obtained for logging into chat. | | No | |
| deletionIDs.p | deletionwatcher.py | dict of sites, each a list of post IDs | Posts which are currently being watched by DeletionWatcher. | Dumped as a Task upon subscribing. | Yes | |
| editActions.p | editwatcher.py | dict of sites, each a list of post IDs | Posts which are currently being watched by EditWatcher. | Supposed to be dumped as a Task upon subscribing. However, it appears there’s a bug in the code, because the pickle doesn’t exist on my instance or my test instance. | Yes | |
| falsePositives.p | datahandling.py | GlobalVars.false_positives | List of (site, post ID) tuples. Used to prevent re-reporting posts which have received FP feedback. | Dumped synchronously upon addition. There’s no way to remove a post once added. | Yes | |
| ignoredPosts.p | datahandling.py | GlobalVars.ignored_posts | List of (site, post ID) tuples. Used to prevent re-reporting posts which have been ignored or received NAA feedback. | Dumped synchronously upon addition. There’s no way to remove a post once added. | Yes | Yes |
| messageData.p | chatcommunicate.py | chatcommunicate._last_messages | The most recent 100 chat messages and 50 reports sent to chat. | Dumped async after every message or report sent to chat. | No² | |
| metasmokeCacheData.p | dumped in metasmoke_cache.py; restored in ws.py | {'cache': MetasmokeCache._cache, 'expiries': MetasmokeCache._expiries} | Cache of some of the data received from the metasmoke API. | Dumped async after data is fetched from MS. | No | |
| metasmokePostIds.p | datahandling.py | GlobalVars.metasmoke_ids | Cache dict of MS post IDs by SE site API ident/ID tuple. Each contains the largest MS ID for the SE post which existed upon entry creation. | Dumped sync upon addition. Entries are only removed if not an int (i.e. invalid). Never updated, even if a newer MS post report is created. | No | |
| ms_ajax_queue.p | datahandling.py | metasmoke.Metasmoke.ms_ajax_queue | List of dicts describing AJAX calls to MS which failed or were not tried because MS was declared down. | Dumped sync upon addition. The intent is that these AJAX calls will be sent to MS once MS is back up / a connection is available. Code doesn’t exist to do anything with these yet. | No | |
| notifications.p | datahandling.py | GlobalVars.notifications | List of tuples: (int(user_id), chat_site, int(room_id), se_site, always_ping) describing the notifications requested by users. | Dumped sync upon change. | Yes | Yes |
| postScanStats2.p | datahandling.py | GlobalVars.PostScanStat.stats | Dict by stat key of stats for bodyfetcher scanning by this instance. | Dumped sync upon a call to helpers.exit_mode (i.e. upon exit). | No | |
| reasonWeights.p | datahandling.py | GlobalVars.reason_weights | Cache dict of reason weights from MS. | Dumped sync upon update from MS. Updated from MS upon !!/autoflagged or if > 1 hour old. | No | |
| recentlyScannedPosts.p | datahandling.py | GlobalVars.recently_scanned_posts | Dict by site/ID of posts which were recently scanned. | Dumped sync upon a call to helpers.exit_mode (i.e. upon exit). Currently, the data structure is substantially larger than it needs to be, as it contains the post text. This is planned to change to a hash of the post body text and title, as well as potentially trimming some of the other data currently included. | Yes | |
| seSiteIds.p | datahandling.py | (GlobalVars.site_id_dict_timestamp, GlobalVars.site_id_dict_issues_into_chat_timestamp, GlobalVars.site_id_dict) | Cache of SE site IDs obtained from SE. Used for WebSocket access to specific sites and/or posts. | Refreshed every 24 hours, if possible (i.e. SE not down). Dumped sync upon update from SE. | No | |
| whitelistedUsers.p | datahandling.py | GlobalVars.whitelisted_users | Set of (user ID, SE site) tuples of users who have been whitelisted. | Dumped sync when updated. | Yes | Yes |
| whyData.p | datahandling.py | GlobalVars.why_data | List of (“site/post_id”, why text) tuples, kept to a maximum of 50 entries. | Dumped sync when added. | No² | |
  1. Currently, SD instances do not sync these pickles without admin/runner intervention. This may change in the future.
  2. However, not having the pickle does limit some functionality.
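Several of the pickles above (bodyfetcherMaxIds.p, bodyfetcherQueue.p) are dumped as an async task that allows only one dump task active at a time, with the noted preference that a new task simply not be scheduled while one is pending rather than canceling the current one. A minimal sketch of that preferred pattern, using hypothetical names (schedule_dump, _dump_pending) rather than SmokeDetector’s real API:

```python
import pickle
import threading

_dump_lock = threading.Lock()
_dump_pending = False  # True while a dump task is queued or running


def _do_dump(path, obj):
    """Worker: write the pickle, then clear the pending flag."""
    global _dump_pending
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    with _dump_lock:
        _dump_pending = False


def schedule_dump(path, obj):
    """Start a background dump unless one is already pending.

    Returns True if a new dump task was started, False if an existing
    pending task made scheduling another one unnecessary.
    """
    global _dump_pending
    with _dump_lock:
        if _dump_pending:
            return False  # skip: the existing task will run
        _dump_pending = True
    t = threading.Thread(target=_do_dump, args=(path, obj), daemon=True)
    t.start()
    t.join()  # joined here only to keep this sketch deterministic
    return True
```

The design point is that the check-and-set of the pending flag happens under one lock, so two callers racing to dump cannot both start a task; in real use the join would be omitted so the caller is not blocked.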

Text files

| File (location) | What |
|---|---|
| bodyfetcherQueueTimings.txt (./pickles) | Historical timing data for each launch of a scan for a site. |
| errorLogs.txt (./) | Some limited logs output when helpers.log_file() is called. Use is limited. |
| errorLog.txt (./) | Errors logged by nocrash.py. |

Database

| File (location) | What |
|---|---|
| errorLogs.db (./) | Errors which the SmokeDetector instance has encountered during operation, other than in nocrash.py (i.e. while the SmokeDetector instance is running its primary code). |