Performance difference in TimeSeries.read for Virgo raw data files
Created by: maxlalleman
Hello all,
I have been working on converting some coherence calculation code from LIGO to Virgo, and I have encountered a performance issue in gwpy when reading Virgo raw data files compared to LIGO files.
When running this code, I have to read in around 400 channels. For LIGO this gives an acceptable computing time, but because of this issue the computing time for Virgo explodes.
Specifically, it concerns the gwpy.timeseries.TimeSeries.read function.
Here is the code I ran at CIT (on ldas-pcdev1, but the issue is present on all pcdev machines):
from datetime import datetime

from gwpy.timeseries import TimeSeries

# GPS start and end of the 100 s span to read
st = 1251871200
et = st + 100
channel = 'H1:PEM-VAULT_MAG_1030X195Y_COIL_X_DQ'
channel_V = 'V1:Hrec_hoft_16384Hz'

# LIGO read: print a timestamp before and after the call
filename = 'H1_raw.ffl'
print("TimeSeries.read: " + datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
data_read = TimeSeries.read(filename, channel, start=st, end=et)
print("TimeSeries.read: " + datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
print(len(data_read.data))
print(data_read)

# Virgo read: same call pattern with the V1 file list and channel
filename = 'V1_raw.ffl'
print("TimesSeries.read (V1): " + datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
data_read = TimeSeries.read(source=filename, channel=channel_V, start=st, end=et)
print("TimesSeries.read (V1): " + datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
print(len(data_read.data))
print(data_read)
This returns:
TimeSeries.read: 2022-10-26 06:02:57
TimeSeries.read: 2022-10-26 06:02:59
409600
TimeSeries([-461.55807, -310.71957, -121.49655, ..., 1227.6696 ,
1089.4346 , 945.5336 ]
unit: ct,
t0: 1251871200.0 s,
dt: 0.000244140625 s,
name: H1:PEM-VAULT_MAG_1030X195Y_COIL_X_DQ,
channel: H1:PEM-VAULT_MAG_1030X195Y_COIL_X_DQ)
TimesSeries.read (V1): 2022-10-26 06:05:07
TimesSeries.read (V1): 2022-10-26 06:06:22
1638400
TimeSeries([-9.5718820e-21, 8.4890011e-20, 6.1089473e-20, ...,
-2.8835219e-20, -4.7170608e-20, -2.1399083e-20]
unit: h,
t0: 1251871200.0 s,
dt: 6.103515625e-05 s,
name: V1:Hrec_hoft_16384Hz,
channel: V1:Hrec_hoft_16384Hz)
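As you can see, loading the Virgo data takes about 75 seconds, while the LIGO data takes only about 2 seconds to read. The Virgo channel holds four times as many samples (1638400 versus 409600), which does not account for a slowdown of this size.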
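The printed timestamps above only have one-second resolution. A minimal sketch of a more precise comparison follows, using `time.perf_counter` from the standard library. The helper names `timed_read` and `samples_per_second` are hypothetical and not part of gwpy; the throughput figure is only there so that channels with different sample rates can be compared on equal footing.

```python
import time


def timed_read(reader, *args, **kwargs):
    """Call reader(*args, **kwargs) and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = reader(*args, **kwargs)
    return result, time.perf_counter() - t0


def samples_per_second(n_samples, elapsed):
    """Read throughput, independent of the channel's sample rate."""
    return n_samples / elapsed


# Hypothetical usage with gwpy (needs access to the data on the cluster):
#
#   from gwpy.timeseries import TimeSeries
#
#   data, elapsed = timed_read(TimeSeries.read, 'V1_raw.ffl',
#                              'V1:Hrec_hoft_16384Hz',
#                              start=1251871200, end=1251871300)
#   print(f"{elapsed:.2f} s, {samples_per_second(len(data), elapsed):.0f} samples/s")
```

Applied to the figures posted above, this gives roughly 205,000 samples/s for the H1 read (409600 samples in about 2 s) and roughly 22,000 samples/s for the V1 read (1638400 samples in about 75 s).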
This is the case for any channel you choose. The problem also scales with the length of the time series you want to read.
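To put numbers on how the read time grows with the length of the requested span, one could time the same read for several durations. This is a rough sketch only: `scan_durations` is a hypothetical helper, and the reader it takes is any callable accepting a start and end GPS time.

```python
import time


def scan_durations(reader, start, durations):
    """Time reader(start, start + duration) for each duration.

    Returns a list of (duration, elapsed seconds) pairs.
    """
    timings = []
    for duration in durations:
        t0 = time.perf_counter()
        reader(start, start + duration)
        timings.append((duration, time.perf_counter() - t0))
    return timings


# Hypothetical usage with gwpy (needs access to the data on the cluster):
#
#   from gwpy.timeseries import TimeSeries
#
#   def read_v1(start, end):
#       return TimeSeries.read('V1_raw.ffl', 'V1:Hrec_hoft_16384Hz',
#                              start=start, end=end)
#
#   for duration, seconds in scan_durations(read_v1, 1251871200, [10, 50, 100, 200]):
#       print(f"{duration:4d} s of data read in {seconds:6.2f} s")
```

Running the same scan with the H1 file list and channel would show whether both detectors scale the same way and differ only by a constant factor, or whether the Virgo read time grows faster.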
The files H1_raw.ffl and V1_raw.ffl are each a file list (.ffl) that points to the raw data files.
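One thing that may be worth checking is how many raw files each list maps onto the requested interval. The sketch below assumes that every line of the .ffl file starts with a file path, followed by the GPS start time and the duration in seconds. That layout is an assumption, not something shown in this post, and `files_overlapping` is a hypothetical helper.

```python
def files_overlapping(ffl_lines, start, end):
    """Return paths of listed files whose span overlaps [start, end).

    ASSUMPTION: each line reads "<path> <gps_start> <duration> ...".
    """
    paths = []
    for line in ffl_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        path, gps, duration = fields[0], float(fields[1]), float(fields[2])
        # A file covers [gps, gps + duration); keep it if that overlaps the request
        if gps < end and gps + duration > start:
            paths.append(path)
    return paths


# Hypothetical usage:
#
#   with open('V1_raw.ffl') as f:
#       print(len(files_overlapping(f, 1251871200, 1251871300)))
```

Comparing the counts for H1_raw.ffl and V1_raw.ffl would show whether the Virgo read has to open many more files for the same 100 seconds.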
Versions:
- Python: 3.9.13
- NumPy: 1.22.4
- Astropy: 5.1
- GWpy: 2.1.5
- conda env: igwn-py39