gwpy cannot read standard LHO and LLO DQXML files
Created by: robertbruntz
BLUF: Standard LHO/LLO DQXML files don't have start_time_ns attributes (columns), so gwpy can't read them.
Full story: gwpy cannot read standard LHO and LLO DQXML files, which are produced by John Zweizig's SegGeners, for at least two reasons:
- Many of the files have comments that contain commas. Because commas are the field delimiters in ligolw XML files, they are escaped with backslashes, but the ligolw parser that gwpy relies on (python-ligo-lw's ligo.lw, according to the traceback) then chokes on the backslash:
>>> from gwpy.segments import DataQualityFlag
>>> jz_dq_file = '/home/[redacted]/L-DQ_Segments-1283100000-16.xml'
>>> jz_flag = DataQualityFlag.read(jz_dq_file)
[...]
File "/cvmfs/software.igwn.org/conda/envs/igwn/lib/python3.10/site-packages/ligo/lw/ligolw.py", line 940, in characters
raise type(e)("line %d: %s" % (self._locator.getLineNumber(), str(e)))
ValueError: line 38: parse error in 'L1:GRD-HPI_ITMY_OK reported by Guardian system\, flag is on when system is in requested state and guardian node is connected and nomina' near '\' at position 47: unrecognized escape sequence
This can be worked around harmlessly by replacing each \, with something else, such as a semicolon (note that sed -i edits the file in place): sed -i 's/\\,/;/g' /home/[redacted]/L-DQ_Segments-1283100000-16.xml
- The files don't have a start_time_ns field (attribute), which also produces an error:
>>> jz_flag = DataQualityFlag.read(jz_dq_file)
[...]
File "/cvmfs/software.igwn.org/conda/envs/igwn/lib/python3.10/site-packages/ligo/lw/lsctables.py", line 346, in __get__
ns = self.get_ns(obj)
AttributeError: 'Segment' object has no attribute 'start_time_ns'. Did you mean: 'start_time'?
This is not easily fixable. Duncan Macleod suggested that this is because "GWpy uses property decorators defined in the python-ligo-lw library that rely upon the full suite of columns being present. However, it would be fairly easy to patch this to accommodate missing _ns columns."
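To illustrate the "accommodate missing _ns columns" idea, here is a minimal, standalone sketch that reads the segment table straight from the XML with the Python standard library and treats a missing start_time_ns/end_time_ns column as zero. It is not gwpy's or python-ligo-lw's code, and not the patch Duncan described: split_stream and read_segments are hypothetical helpers. The tokenizer assumes the stream is a flat, delimiter-separated token list with double-quoted strings and backslash escapes, and it is deliberately more lenient than ligo.lw about \,. Only the segment table (the active segments) is read; segment_summary is ignored.

```python
import xml.etree.ElementTree as ET


def split_stream(text, delimiter=","):
    """Split a LIGO_LW <Stream> body into a flat list of string tokens.

    Assumed format: tokens separated by `delimiter`, strings wrapped in
    double quotes, and a backslash escaping the next character. Unlike
    ligo.lw, this accepts '\\,' and simply keeps the comma.
    """
    tokens, buf = [], []
    in_quotes = escaped = False
    for ch in text:
        if escaped:                      # keep the escaped character literally
            buf.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes    # quotes are dropped from the token
        elif ch == delimiter and not in_quotes:
            tokens.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    tail = "".join(buf).strip()
    if tail:                             # last row has no trailing delimiter
        tokens.append(tail)
    return tokens


def read_segments(xml_text):
    """Return [(start, end), ...] from the first 'segment' table.

    Missing start_time_ns / end_time_ns columns (or empty values) are
    treated as 0. Times are Python floats, so sub-microsecond precision
    is lost at GPS times around 1e9 s.
    """
    root = ET.fromstring(xml_text)
    for table in root.iter("Table"):
        # Assumption: older files use Name="segment:table", newer ones Name="segment".
        if table.get("Name", "").removesuffix(":table") != "segment":
            continue
        stream = table.find("Stream")
        if stream is None:
            continue
        # Column names may carry a "segment:" prefix; keep only the last part.
        columns = [c.get("Name").split(":")[-1] for c in table.findall("Column")]
        tokens = split_stream(stream.text or "", stream.get("Delimiter", ","))
        ncol = len(columns)
        segments = []
        for i in range(0, len(tokens) - ncol + 1, ncol):
            row = dict(zip(columns, tokens[i:i + ncol]))
            start = int(row["start_time"]) + int(row.get("start_time_ns") or 0) / 1e9
            end = int(row["end_time"]) + int(row.get("end_time_ns") or 0) / 1e9
            segments.append((start, end))
        return segments
    return []
```

The resulting (start, end) tuples could be wrapped in gwpy's SegmentList. That only replaces DataQualityFlag.read if the known segments aren't needed.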
Note that changing how new DQXML files are produced won't fully solve the problem: there are probably millions of existing DQXML files, going back to at least 2015 and possibly earlier, and most of them cannot be changed.
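Because most archived files can't be edited, the sed workaround for the escaped commas could instead be applied to a temporary copy. Below is a minimal plain-Python equivalent of that sed substitution that leaves the original file untouched. unescape_commas_copy is a hypothetical helper, and it assumes the file is uncompressed UTF-8 text.

```python
import os
import tempfile


def unescape_commas_copy(path, replacement=";"):
    """Write a copy of a DQXML file with every literal '\\,' replaced.

    Equivalent to: sed 's/\\,/;/g' FILE > COPY (the original is not modified).
    Returns the path of the copy; the caller is responsible for deleting it.
    Assumes the input is uncompressed UTF-8 text.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # Keep an .xml suffix so the copy looks like the original to readers
    # that pick a format from the file extension (assumption).
    fd, copy_path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text.replace("\\,", replacement))
    return copy_path
```

The returned path could then be passed to DataQualityFlag.read. This only addresses the escaped commas, so for these files the read would still fail on the missing start_time_ns column.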