Talk:SOFA conventions
Headphone transfer functions (HpTFs)
No public HpIR databases or standard HpIR repositories have been proposed to date. One purpose of HpIR recordings is to compute the equalization filter that compensates the headphone, starting from the raw HpIRs. Question: store the full emitter-receiver matrix or only a one-to-one correspondence?
For each pair of headphones, a collection of individual HpIRs and equalization filters is stored with respect to a specific subject. SubjectID seems to be required. Saved as equalization filters, the data could have a one-to-one correspondence between emitters and receivers, i.e., the data for each receiver correspond to the data for one emitter. It is desirable to have a foreign key to the HRIR set recorded on the same subject in order to provide data for an individual binaural listening experience.
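For illustration, the two options differ only in whether an emitter dimension is kept. A minimal sketch of the array shapes in Python/NumPy; the dimension letters follow SOFA's M/R/E/N terminology, but neither layout is an existing convention:

```python
import numpy as np

M, N = 10, 4096        # M measurements (e.g., re-seatings), N samples per IR

# Option 1: full emitter-receiver matrix - every driver to every microphone.
hpir_full = np.zeros((M, 2, 2, N))   # [M, R (receivers), E (emitters), N]

# Option 2: one-to-one correspondence - left driver -> left mic,
# right driver -> right mic, as for equalization filters.
hpir_pairs = np.zeros((M, 2, N))     # [M, R (each paired with one emitter), N]
```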
For every subject, three data containers - raw, compensated, and eq - could be defined. The processing state could be indicated by the global attribute ProcessingState:
- the raw data from the recordings: GLOBAL_ProcessingState = 'raw'?
- the compensated impulse responses: GLOBAL_ProcessingState = 'compensated by using xxx algorithm with yyy parameters'?
- the equalization filter (in the form of an impulse response) obtained from the inverse HpIR through one or more techniques, e.g., considering the mean HpIR measurement over all repositionings of that device. This could also be described by the ProcessingState, e.g., GLOBAL_ProcessingState = 'inverted eq filters by using XXX regularization and YYY averaging technique'.
The following information is required (a sketch of how these attributes could be stored follows the list):
- headphone model and producer. This could correspond to SourceDescription and EmitterDescription. Specific characteristics of the headphones, such as transducer type, acoustic coupling, and design, are stored in their data sheet.
- eq algorithm: the equalization algorithm used. ProcessingState could be used as a narrative descriptor.
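A minimal sketch of how these attributes could be written, assuming Python with the netCDF4 package (SOFA files are NetCDF-4 containers, and the GLOBAL_ prefix used on this page denotes a global attribute). The convention name, attribute values, and SubjectID attribute are hypothetical:

```python
from netCDF4 import Dataset

with Dataset("hpir_raw.sofa", "w", format="NETCDF4") as f:
    f.Conventions = "SOFA"
    f.SOFAConventions = "SimpleHeadphoneIR"  # hypothetical convention name
    # Processing state of this container: 'raw', 'compensated by ...', or
    # 'inverted eq filters by ...', as proposed above.
    f.ProcessingState = "raw"
    # Headphone model and producer (hypothetical device):
    f.SourceDescription = "XYZ-600, circumaural, open"
    f.EmitterDescription = "left/right headphone drivers"
    # Proposed foreign key to the HRIR set of the same subject:
    f.SubjectID = "subject_003"
```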
Include anthropometric data (AD)
Several requests have been received to "include AD in SOFA", but none of them could specify what actually to store. What is clear is that the AD stored with CIPIC are not sufficient. In ARI, more AD are stored than in CIPIC, but there is no proof that these AD are of any use. So?
Crosstalk cancellation filters (CTCFs)
When saving HRTFs, why not save CTCFs? Check and discuss how the representation of CTCFs differs from the representation of HRTFs. It might be that SimpleFreeFieldHRIR is sufficient and the difference between HRTFs and CTCFs is simply the interpretation. Then, a note in GLOBAL_Comment might be sufficient.
At the same time, the calculation and use of crosstalk cancellation filters goes beyond a general SOFA dataset. One can of course calculate the CTC filters if one wants, using various methods, but they are by no means an inherent part of the measured data, and are more an application-specific post-processing tool for a specific installation. They are, of course, also dependent on the speaker positions of the chosen playback system, and therefore this information should be stored within SOFA if required.
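As an illustration of one such method, a minimal sketch of frequency-domain inversion of the 2x2 HRTF matrix with Tikhonov regularization; the function, its parameters, and the regularization constant are illustrative assumptions, not part of SOFA:

```python
import numpy as np

def ctc_filters(hrir, n_fft=4096, beta=0.005):
    """hrir: array of shape [2 ears, 2 speakers, N samples].
    Returns 2x2 crosstalk-cancellation IRs of length n_fft."""
    H = np.fft.rfft(hrir, n_fft, axis=-1)        # plant matrix per frequency bin
    C = np.empty_like(H)
    for k in range(H.shape[-1]):
        Hk = H[:, :, k]
        # Tikhonov-regularized inverse: C = (H^H H + beta I)^-1 H^H
        C[:, :, k] = np.linalg.solve(Hk.conj().T @ Hk + beta * np.eye(2),
                                     Hk.conj().T)
    c = np.fft.irfft(C, n_fft, axis=-1)
    return np.roll(c, n_fft // 2, axis=-1)       # modeling delay for causality
```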
It seems that more information from researchers about their requirements is needed.
Include calibration data
- Are data available?
- Is a description of the measurement setup available?
- What metadata should be stored?
- How to standardize the description of the calibration?
Corresponding researchers contacted, waiting for response...
Calibration data can have several forms, depending on the measurement in question. If the measurement is a transfer function, and that is its intention, then one could provide the measured transfer function of the measurement chain without the measurement object present. In the case of the HRTF, this is the measured transfer function without the head present. As there are various means of removing the effects of the measurement chain (deconvolution, spectral equalization), the "calibrated" final data may vary between methods. If the objective is to provide the data in its most raw form, for other researchers to use, then the uncalibrated data, along with the necessary calibration files, is of interest. In this way, the frequency response of the key elements, notably the microphones and speakers, can be both examined and taken into account.

In the context of HRTF measurements which use multiple speakers, the calibration of each speaker is crucial in order to avoid spectral coloration which is independent of the head response and therefore an artifact of the measurement system. In such a case, impulse responses should be provided for each speaker, with the microphone oriented on-axis to the speaker. This can be performed with the measurement microphones used for the data acquisition, or with a calibrated measurement microphone, whose model should be clearly stated. If the latter is employed, a microphone measurement file is also necessary, consisting of a calibration measurement for a single speaker and the accompanying impulse responses of all three microphones (again, in the case of an HRTF measurement). For a truly robust analysis of the data, this calibration measurement is made before and after the measurement session, to verify that the measurement system did not change over time, which is not always the case. If the before/after data do not correspond, the data should not be used.

The actual data to be included is therefore a simple impulse response, clearly annotated to indicate the microphone/speaker in question. In a multi-speaker measurement condition, this requires that each entry in the HRTF set includes a speaker ID. In the event that "un-calibrated" data, or already calibrated (processed) data, is provided, there should be a clear text description indicating the calibration method used and any additional post-treatment, such as low-frequency interpolation (how, frequency range), equalization (diffuse-field or otherwise), etc. As these different elements are clearly of more interest to researchers than to industrial users, they cannot be "required elements". A simple required tag should be available, indicating whether the data is "raw" with accompanying calibration data, or "processed".
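To illustrate one of the compensation methods mentioned above, a minimal sketch of regularized spectral division (removing the measurement chain, measured without the head, from a measured IR); the function name and parameters are assumptions:

```python
import numpy as np

def compensate(measured_ir, reference_ir, n_fft=8192, eps=1e-3):
    """Remove the measurement chain (reference = chain measured without the
    head) from a measured IR by regularized spectral division."""
    Y = np.fft.rfft(measured_ir, n_fft)
    X = np.fft.rfft(reference_ir, n_fft)
    # The regularization term bounds the gain where the reference has
    # little energy (e.g., outside the speaker's passband).
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps * np.max(np.abs(X)) ** 2)
    return np.fft.irfft(H, n_fft)
```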
- Seems like we need a good definition of "raw", "processed", and "un-calibrated". Then we could provide an attribute, e.g., ProcessingState. Alternatively, we could use ProcessingState in the same way we use Comments, namely, to include a narrative description of the processing steps. Then we could write GLOBAL_ProcessingState = "Deconvolution with exp. sweep, windowing at ..., filtering at ..." and so on. Applications which do not care could ignore it; researchers who care could interpret it.
In the case of measurements where the absolute level is of importance, the calibration data is more typically that of a pistonphone or similar calibrator which provides a known SPL level. This is of specific interest when distributed microphones are employed to measure the spatial distribution of a sound field. In this instance, the calibration data can be the noted RMS level of the calibrated source for each microphone (noting, of course, the model and settings of the calibrated source and, if possible, the meteorological conditions). Alternatively, to be consistent with the above-mentioned HRTF case, the recorded calibration audio file could be provided, again noting the model and settings used.
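A minimal sketch of how the noted RMS level could be derived from a recorded calibration file, assuming Python/NumPy; the function name and the default calibrator level are assumptions:

```python
import numpy as np

def spl_at_full_scale(calib_recording, calib_spl_db=94.0):
    """Map a recorded calibrator tone to the dB SPL value of digital full
    scale. calib_spl_db: nominal calibrator level (often 94 or 114 dB SPL)."""
    rms = np.sqrt(np.mean(np.asarray(calib_recording) ** 2))
    dbfs = 20 * np.log10(rms)      # RMS level of the recording re full scale
    return calib_spl_db - dbfs     # dB SPL corresponding to 0 dBFS

# Example: a full-scale sine from a 94 dB SPL pistonphone has an RMS of
# 1/sqrt(2) (about -3 dBFS), so 0 dBFS corresponds to roughly 97 dB SPL.
```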
- It seems like we would need to save the IRs from the speaker(s) and microphone(s) together with other metadata like levels. Would it be sufficient to store that information in a separate SOFA file? Applications which do not care would not have this ballast; researchers who care could use it to reproduce all post-processing steps. A new SOFA convention for calibration would do - does it make sense?
- Seems like level would be important. Intermediate results: specifications for the level and a name for the variable are required. Then we could have level as an optional variable in SOFA. Also, some data would be nice.
Include room pictures
- Are data available?
- Is a description of the measurement setup available?
- What metadata should be stored?
- How to standardize the description of the pictures (camera position, room description)? There is FITS for astronomy; maybe we can learn from that format?
Corresponding researchers contacted, waiting for response...