Purging and Retention
Purge Types
AWIPS purges data in two main ways. The more familiar is the purging of processed data, which controls how long data is stored after it has been decoded and processed.
The second is the purging of raw data, which controls how long data is stored before it has been decoded.
Processed Data Purging
AWIPS uses a plugin-based purge strategy for processed HDF5 data. This lets the user change how long data is retained for each plugin individually, and even set purge rules for specific products within a plugin. A default purge rules file covers products that have no specific rules written.
Note: Purging is triggered by a Quartz timer event that fires at 30 minutes past each hour.
Purging rules are defined in XML files in the Localization Store. On EDEX, most are located in /awips2/edex/data/utility/common_static/base/purge, and follow the base/site localization pattern (e.g. site purge files are in site/XXX/purge rather than base/purge, where XXX is the site identifier).
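The base/site pattern can be illustrated with a short Python sketch. It assumes that a site-level file, when present, takes precedence over the base file with the same name; the function name and the lookup order are illustrative assumptions, not EDEX code.

```python
from pathlib import Path
from typing import Optional

# Root directory taken from the text above.
UTILITY_ROOT = Path("/awips2/edex/data/utility/common_static")


def resolve_purge_file(plugin: str, site: str,
                       root: Path = UTILITY_ROOT) -> Optional[Path]:
    """Return the site-level rule file if it exists, else the base one.

    Assumption: a site file overrides the base file of the same name.
    Returns None when neither file exists.
    """
    name = f"{plugin}PurgeRules.xml"
    candidates = [
        root / "site" / site / "purge" / name,  # site override
        root / "base" / "purge" / name,         # shipped default
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None
```

For example, `resolve_purge_file("radar", "XXX")` would return the site copy of radarPurgeRules.xml if one has been created, and the base copy otherwise.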
Each data set can have its own purge rule, defined in an XML file named after the data set:
ls /awips2/edex/data/utility/common_static/base/purge/
acarsPurgeRules.xml bufruaPurgeRules.xml pirepPurgeRules.xml
acarssoundingPurgeRules.xml ccfpPurgeRules.xml poessoundingPurgeRules.xml
aggregatePurgeRules.xml convsigmetPurgeRules.xml pointsetPurgeRules.xml
airepPurgeRules.xml cwaPurgeRules.xml profilerPurgeRules.xml
...
Time-Based Purge
If a plugin has no XML file, the default rule of 1 day (24 hours) is used, from /awips2/edex/data/utility/common_static/base/purge/defaultPurgeRules.xml:
<purgeRuleSet>
    <defaultRule>
        <period>01-00:00:00</period>
    </defaultRule>
</purgeRuleSet>
Time-based purging is set with the period tag and uses the reference time of the data. The reference time of the data is determined by the decoder.
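To make the period value concrete, the following Python sketch parses it into a duration and compares it against a reference time. The DD-HH:MM:SS layout is inferred from the examples in this section, and the helper names and the strict "older than" comparison are assumptions, not the EDEX implementation.

```python
import re
from datetime import datetime, timedelta

# Layout inferred from values such as 01-00:00:00 and 00-00:15:00.
_PERIOD = re.compile(r"(\d+)-(\d{2}):(\d{2}):(\d{2})")


def parse_period(text: str) -> timedelta:
    """Convert a 'DD-HH:MM:SS' period string into a timedelta."""
    match = _PERIOD.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized period: {text!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def is_expired(ref_time: datetime, now: datetime, period: str) -> bool:
    """Assumed check: data is purgeable once its reference time is older than the period."""
    return now - ref_time > parse_period(period)
```

With the default rule of 01-00:00:00, data whose reference time is 25 hours old would be treated as expired by this sketch, while data 23 hours old would not.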
31-day NEXRAD3 Example
Modify /awips2/edex/data/utility/common_static/base/purge/radarPurgeRules.xml to increase the data retention period from 1 to 31 days:
<purgeRuleSet>
    <defaultRule>
        <period>31-00:00:00</period>
    </defaultRule>
</purgeRuleSet>
Note: you do NOT have to restart EDEX when you change a purge rule!
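The same change can be scripted. The Python sketch below rewrites the default period in a rule set held as a string, assuming the file contains only the un-namespaced elements shown in this section; the function name is hypothetical, and reading or writing the actual file is left to the caller.

```python
import re
import xml.etree.ElementTree as ET


def set_default_period(xml_text: str, period: str) -> str:
    """Return xml_text with <defaultRule><period> replaced by period."""
    # Layout check inferred from the examples in this section.
    if re.fullmatch(r"\d+-\d{2}:\d{2}:\d{2}", period) is None:
        raise ValueError(f"unrecognized period: {period!r}")
    root = ET.fromstring(xml_text)
    element = root.find("./defaultRule/period")
    if element is None:
        raise ValueError("no <defaultRule><period> element found")
    element.text = period
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree does not preserve comments or the XML declaration by default, so a real rule file with either would need more careful handling.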
Frame-Based Purge
Some plugins use frame-based purging, retaining a certain number of product "versions".
/awips2/edex/data/utility/common_static/base/purge/gridPurgeRules.xml
<defaultRule>
    <versionsToKeep>2</versionsToKeep>
    <period>07-00:00:00</period>
</defaultRule>
<rule>
    <keyValue>LAPS</keyValue>
    <versionsToKeep>30</versionsToKeep>
</rule>
<rule regex="true">
    <keyValue>NAM(?:12|20|40)</keyValue>
    <versionsToKeep>2</versionsToKeep>
    <modTimeToWait>00-00:15:00</modTimeToWait>
</rule>
...
In the above example, notice that the default rule keeps 2 versions and sets a 7-day period, while specific models (LAPS and the NAM grids) have rules of their own.
The modTimeToWait tag can be used together with versionsToKeep: it increases versionsToKeep by 1 if data matching the rule has been stored within the modTimeToWait interval.
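The interaction between keyValue, versionsToKeep, and modTimeToWait can be sketched in Python. The Rule class and both helpers are hypothetical, and the sketch assumes that a regex rule must match the whole key and that "stored within modTimeToWait" means the most recent store time is no older than that interval.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Rule:
    """Hypothetical in-memory form of a <rule> element."""
    key_value: str
    versions_to_keep: int
    regex: bool = False
    mod_time_to_wait: Optional[timedelta] = None

    def matches(self, key: str) -> bool:
        # Assumption: regex rules must match the entire key.
        if self.regex:
            return re.fullmatch(self.key_value, key) is not None
        return self.key_value == key


def effective_versions(rule: Rule, last_stored: Optional[datetime],
                       now: datetime) -> int:
    """Keep one extra version while matching data was stored recently."""
    keep = rule.versions_to_keep
    if rule.mod_time_to_wait is not None and last_stored is not None:
        if now - last_stored <= rule.mod_time_to_wait:
            keep += 1
    return keep
```

Under these assumptions, the NAM rule above would keep 3 versions while a matching grid is still arriving (stored within the last 15 minutes) and fall back to 2 once ingest has been quiet for longer than that.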
Purge Logs
Data purge events are logged to the file edex-ingest-purge-[yyyymmdd].log, where [yyyymmdd] is the date stamp.
tail -f edex-ingest-purge-20120327.log
--------START LOG PURGE---------
INFO 2012-03-27 00:30:00,027 [DefaultQuartzScheduler_Worker-3] PurgeLogger: EDEX - PURGE LOGS::Skipped file with invalid fileName: afos-trigger.log
INFO 2012-03-27 00:30:00,193 [DefaultQuartzScheduler_Worker-3] PurgeLogger: EDEX - PURGE LOGS::Removed 1 old files
INFO 2012-03-27 00:31:23,155 [DefaultQuartzScheduler_Worker-3] PurgeLogger: EDEX - PURGE LOGS::Archived 14 files
INFO 2012-03-27 00:31:23,155 [DefaultQuartzScheduler_Worker-3] PurgeLogger: EDEX - PURGE LOGS::Skipped processing 1 files
INFO 2012-03-27 00:31:23,155 [DefaultQuartzScheduler_Worker-3] PurgeLogger: EDEX - PURGE LOGS::---------END LOG PURGE-----------
All Purge Rules
To see all purge rule directories (base, site, configured):
find /awips2/edex/data/utility -name purge
/awips2/edex/data/utility/common_static/base/purge
If any overrides have been made, site directories may also appear in the results of the find command.
Raw Data Purging
Raw data are files that have been brought in by the LDM and recognized by an action in the pqact.conf file. These files are written to subdirectories of /awips2/data_store/. The data remain there until they are purged according to the rules defined in /awips2/edex/data/utility/common_static/base/archiver/purger/RAW_DATA.xml.
If the purge time is too short and the processing latencies on EDEX are too long, EDEX may miss some of this data. In that case, adjust the purge times by changing the <defaultRetentionHours> or <selectedRetentionHours> tag on the relevant data sets.
Default Retention
The defaultRetentionHours tag is defined at the beginning of the RAW_DATA.xml file. It is the duration that will apply to any piece of data that does not fall under an explicitly defined category.
The default value for our EDEX is 1 hour:
<archive>
    <name>Raw</name>
    <rootDir>/awips2/data_store/</rootDir>
    <defaultRetentionHours>1</defaultRetentionHours>
    <category>
    ...
Selected Retention
Data sets are broken up into categories in the RAW_DATA.xml file. These categories are groupings of similar data. Each category has a selectedRetentionHours tag that specifies how long matching data is kept.
For example, there is a Model category which sets the purge time to 3 hours for all grib, bufrmos, and modelsounding data:
...
<category>
    <name>Model</name>
    <selectedRetentionHours>3</selectedRetentionHours>
    <dataSet>
        <dirPattern>(grib|grib2)/(\d{4})(\d{2})(\d{2})/(\d{2})/(.*)</dirPattern>
        <displayLabel>{1} - {6}</displayLabel>
        <dateGroupIndices>2,3,4,5</dateGroupIndices>
    </dataSet>
    <dataSet>
        <dirPattern>(bufrmos|modelsounding)/(\d{4})(\d{2})(\d{2})/(\d{2})</dirPattern>
        <displayLabel>{1}</displayLabel>
        <dateGroupIndices>2,3,4,5</dateGroupIndices>
    </dataSet>
</category>
...
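The Python sketch below shows one way to read a dataSet entry. It applies the first dirPattern from the Model category to a path relative to /awips2/data_store/. It assumes that the numbers in dateGroupIndices and displayLabel refer to regex capture groups (year, month, day, hour for 2,3,4,5) and that retention is measured against the date embedded in the path. The sample path, the function names, and that retention rule are illustrative assumptions.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Values copied from the first <dataSet> of the Model category.
DIR_PATTERN = r"(grib|grib2)/(\d{4})(\d{2})(\d{2})/(\d{2})/(.*)"
DISPLAY_LABEL = "{1} - {6}"
DATE_GROUP_INDICES = "2,3,4,5"


def describe(rel_path: str) -> Optional[Tuple[str, datetime]]:
    """Return (display label, data time) for a matching path, else None."""
    match = re.fullmatch(DIR_PATTERN, rel_path)
    if match is None:
        return None
    # Assumption: the indices name capture groups for year, month, day, hour.
    year, month, day, hour = (
        int(match.group(int(i))) for i in DATE_GROUP_INDICES.split(",")
    )
    # Substitute each {n} in the label with capture group n.
    label = re.sub(r"\{(\d+)\}",
                   lambda g: match.group(int(g.group(1))), DISPLAY_LABEL)
    return label, datetime(year, month, day, hour)


def past_retention(data_time: datetime, now: datetime,
                   retention_hours: int) -> bool:
    """Assumed check: the data time is older than the retention window."""
    return now - data_time > timedelta(hours=retention_hours)
```

For a hypothetical path such as grib2/20200728/18/GFS, `describe` yields the label "grib2 - GFS" and a data time of 2020-07-28 18:00, which the 3-hour Model retention would treat as purgeable any time after 21:00 that day.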
Logging
Raw data purging can be seen in the purge logs as well (/awips2/edex/logs/edex-ingest-purge-[yyyymmdd].log, where [yyyymmdd] is the date stamp).
[centos@tg-atm160027-edex-dev purge]$ grep -i 'archive' /awips2/edex/logs/edex-ingest-purge-20200728.log
INFO 2020-07-28 20:05:23,959 2329 [Purge-Archive] ArchivePurgeManager: EDEX - Start purge of category Raw - Observation, directory "/awips2/data_store/bufrhdw".
INFO 2020-07-28 20:05:23,960 2330 [Purge-Archive] ArchivePurgeManager: EDEX - End purge of category Raw - Observation, directory "/awips2/data_store/bufrhdw", deleted 0 files and directories.
INFO 2020-07-28 20:05:23,961 2331 [Purge-Archive] ArchivePurgeManager: EDEX - Unlocked: "/awips2/data_store/bufrhdw"
INFO 2020-07-28 20:05:23,963 2332 [Purge-Archive] ArchivePurgeManager: EDEX - Locked: "/awips2/data_store/xml"
INFO 2020-07-28 20:05:23,963 2333 [Purge-Archive] ArchivePurgeManager: EDEX - Start purge of category Raw - Products, directory "/awips2/data_store/xml".
INFO 2020-07-28 20:05:23,964 2334 [Purge-Archive] ArchivePurgeManager: EDEX - End purge of category Raw - Products, directory "/awips2/data_store/xml", deleted 5 files and directories.
INFO 2020-07-28 20:05:23,967 2335 [Purge-Archive] ArchivePurgeManager: EDEX - Unlocked: "/awips2/data_store/xml"
INFO 2020-07-28 20:05:23,967 2336 [Purge-Archive] ArchivePurger: EDEX - Raw::Archive Purged 28387 files in 23.8s.
INFO 2020-07-28 20:05:23,979 2337 [Purge-Archive] ArchivePurgeManager: EDEX - Purging directory: "/awips2/edex/data/archive".
INFO 2020-07-28 20:05:23,992 2338 [Purge-Archive] ArchivePurger: EDEX - Processed::Archive Purged 0 files in 25ms.
INFO 2020-07-28 20:05:23,992 2339 [Purge-Archive] ArchivePurger: EDEX - Archive Purge finished. Time to run: 23.9s
...
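When checking whether retention settings are behaving as expected, it can help to total the deletions per directory. The Python sketch below parses the "End purge of category" lines shown above; the regular expression is written against that sample output only, so other EDEX versions may format these lines differently.

```python
import re
from typing import Dict, Iterable

# Pattern derived from the sample ArchivePurgeManager lines above.
END_PURGE = re.compile(
    r'End purge of category (?P<category>.+?), '
    r'directory "(?P<directory>[^"]+)", '
    r'deleted (?P<count>\d+) files and directories\.'
)


def deleted_by_directory(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the reported deletions for each purged directory."""
    totals: Dict[str, int] = {}
    for line in lines:
        match = END_PURGE.search(line)
        if match:
            directory = match.group("directory")
            totals[directory] = totals.get(directory, 0) + int(match.group("count"))
    return totals
```

The function accepts any iterable of lines, so an open log file can be passed directly, for example `deleted_by_directory(open(path))` for a given edex-ingest-purge log.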