SOFA Specifications

From Sofaconventions
Revision as of 07:02, 17 May 2013 by Isfmiho

Objects

Receiver is any acoustic sensor, such as an ear or a microphone. The number of receivers is not limited in SOFA and defines the size of the data matrix.

Listener is the object incorporating all the receivers. For HRTFs, a listener can be a head or a dummy-head microphone. For DRIRs, a listener represents the microphone-array structure, such as a sphere or a frame. Incorporating the receivers in the listener as a single logical object is important because, in measurements, the orientation and/or position of the listener usually vary without substantial changes in the head-microphone relation. For example, in measurements done for multiple positions in a room, the position of the head varies while the relation between the head and the microphones does not change. Note that only one listener is considered.

Emitter is any acoustic excitation used for the measurement. The number of emitters is not limited in SOFA. The contribution of each emitter is described by the metadata (see later).

Source is the object incorporating all emitters. In SOFA, a source might be a multi-driver loudspeaker (with the individual drivers as emitters), a speaker array (with the individual speakers as emitters), a choir (with the individual singers as emitters), etc. Note that only one source is considered, but the source may incorporate an unlimited number of emitters.

Room is the volume enclosing the measurement setup. In the case of a free-field measurement, the room is not considered. An optional room description is considered for measurements performed in reverberant spaces, either as a direct description of a simple shoebox or as a link to a digital asset exchange file for a more complex description.

Optional objects can be described by including user-defined metadata of a measurement. For example, this might be information about a torso, as in measurements in which the angle between the torso and the head is varied as an independent variable.

Relation between the objects

We use two coordinate systems. Source and listener are defined in the coordinate system of the room, called the global coordinate system. In free field, the global coordinate system is arbitrary. Emitters and receivers each have their own coordinate system, called the local coordinate system. The local coordinate systems of the emitters and receivers are defined relative to the coordinate systems of the source and listener, respectively. With the source and listener at the origin and in the default orientation, the local coordinate systems correspond to the global coordinate system.

Two vectors describe the basic orientation of the source/listener: the “view” vector defines the direction in which the source/listener looks; the “up” vector defines the top of the source/listener. In spherical coordinates, the view vector describes the azimuth and elevation angles of the source/listener. The up vector describes the roll, which is usually not considered in HRTF measurements and is optional. If given, we suggest that the up vector be orthogonal to the view vector. The default basic orientation for the source/listener is the view vector on the x-axis and the up vector on the z-axis.

In order to be flexible in the future, the way the positions and orientations are defined is specified separately for the listener, the source, all emitters, and all receivers. The default coordinate type for the position, view, and up vectors is Cartesian (x y z). When the spherical coordinate system is required, the format is (azimuth elevation distance).

The basic rotations of the source/listener can be further modified. Most HRTF measurements consider only rotations described by the azimuth and elevation angles. These two angles make it possible to describe the rotation of the listener in an intuitive way. However, for arbitrary rotations in 3-dimensional space, the exact order of the rotations becomes important. Rotation descriptions like the "yaw-pitch-roll" system (which is known as DIN 9300 in aviation and is more intuitive) or unit quaternions (which avoid gimbal lock and are computationally efficient) clearly define the order of rotation. Note that complete agreement on the coordinates and coordinate systems has not been reached yet.
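The relation between the two coordinate types can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function names are hypothetical, and the angle convention (degrees, azimuth measured in the x-y plane from the +x axis towards +y, elevation measured from the x-y plane towards +z) is an assumption chosen to match the default view vector on the x-axis and the default up vector on the z-axis.

```python
import math

# Default basic orientation as described in the text.
DEFAULT_VIEW = (1.0, 0.0, 0.0)  # view vector on the x-axis
DEFAULT_UP = (0.0, 0.0, 1.0)    # up vector on the z-axis


def spherical_to_cartesian(azimuth_deg, elevation_deg, distance):
    """Convert (azimuth elevation distance) to Cartesian (x y z).

    Assumed convention (not fixed by the text): angles in degrees,
    azimuth from +x towards +y, elevation from the x-y plane towards +z.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance * math.cos(el) * math.cos(az)
    y = distance * math.cos(el) * math.sin(az)
    z = distance * math.sin(el)
    return (x, y, z)


def cartesian_to_spherical(x, y, z):
    """Convert Cartesian (x y z) to (azimuth elevation distance) in degrees."""
    distance = math.sqrt(x * x + y * y + z * z)
    if distance == 0.0:
        raise ValueError("the zero vector has no direction")
    azimuth = math.degrees(math.atan2(y, x))
    # Clamp against rounding errors before taking the arcsine.
    ratio = max(-1.0, min(1.0, z / distance))
    elevation = math.degrees(math.asin(ratio))
    return (azimuth, elevation, distance)


def is_orthogonal(view, up, tol=1e-9):
    """Check the suggested orthogonality of the view and up vectors."""
    dot = sum(v * u for v, u in zip(view, up))
    return abs(dot) <= tol
```

With these assumptions, the default view vector corresponds to zero azimuth and zero elevation, and the default view and up vectors satisfy the orthogonality suggestion.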

Numeric container

SOFA stores the information in a single file by serializing the data into a binary stream. The serialization is usually done by a numerical container, which defines the format of the binary representation. SOFA files have the extension “.sofa”. In order to avoid custom development of a numerical container, SOFA relies on netCDF-4 (Unidata), which is a set of software libraries and data formats supporting the creation, access, and sharing of scientific data.¹ It is self-describing, network-transparent, and machine-independent; it supports huge files and partial access within a file, and it allows for data compression. netCDF-4 is widely used in the fields of climatology, meteorology, oceanography, and geographic information systems. It is based on HDF5 (HDF5 Group)², a more basic numerical container that is supported by many institutions worldwide.

For SOFA, netCDF offers a structured representation of multidimensional data and metadata. The open-access specifications are freely available and include a complete definition as well as examples of various implementations. Application-programming interfaces are available as pre-compiled libraries for programming languages like C++, Octave, and Java. Note that netCDF is natively supported in Matlab.

netCDF considers conventions, i.e., sets of recommendations within a community on the naming of attributes, variables, and dimensions within a netCDF file. Many conventions exist, mostly in the field of climate and geographical research.³ SOFA proposes conventions related to HRTF/DRIR measurements, in particular for typical HRTF/DRIR measurement setups. According to the netCDF terminology, SOFA defines dimensions and stores data in variables and attributes.

SOFA uses the so-called enhanced data model from netCDF-4, which is based on the classic netCDF data model shown in Fig. 2. Since the enhanced data model is more complex and not yet widely supported across computer systems, we mostly use the classic-model parts of the enhanced model. This allows a simple data representation now while retaining full flexibility for the future. Deeper knowledge of the netCDF format details is not required to read or write netCDF datasets. Interested readers are referred to the User's Manual.⁴

Note that in SOFA, we sometimes refer to the data type “string”, which is defined in the enhanced data model but is not provided in the classic model. Currently, some computer programs like Matlab and Octave have difficulties handling netCDF strings properly; thus, strings as variables are currently not supported. Native support of strings and string arrays is planned after clarification of the technical requirements.

Data

Data represent the numeric description of the acoustic systems and consist of a multidimensional matrix of arbitrary size. Data stored in this format have the flexibility to be in the domain that best accommodates the measurement and the measurement system. Data can be time-domain finite IRs (data type FIR) or infinite-IR filter coefficients (IIRBiquad), with or without separately stored broadband delays. The broadband delay (i.e., time-of-arrival, TOA) can be stored as discrete delays in a matrix or as parameters of a continuous-directional TOA model [26].

Data contain fields (e.g., Data.IR, Data.G) which are functions of the dimension N. The interpretation of N depends on the data type, e.g., for IRs, N represents the sampling interval (i.e., inverse of the sampling rate) or the number of FIR-filter taps. The interpretation is denoted in the attributes of the dimension variable N. The different data types and corresponding fields are shown in Tab. 1.

Theoretically, the HRTFs/DRIRs (as a function of discrete spatial position) can be transformed to functions of continuous spatial frequency and represented in the spherical-harmonic (SH) domain. Advantages like directional continuity or better compactness are the main reasons for such a representation. Even though not provided at the moment, SOFA aims at considering SH data in future conventions (see Sec. 4).
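As a minimal sketch of what a separately stored broadband delay means for an FIR, the following Python function recombines a delay with an impulse response by prepending zeros. The function name and the restriction to non-negative integer delays in samples are assumptions made for this illustration; the text above does not fix the unit or the storage layout of the delay.

```python
def apply_broadband_delay(ir, delay_samples):
    """Return a copy of the impulse response with the delay re-inserted.

    ir            -- FIR taps of one receiver for one measurement
    delay_samples -- broadband delay (TOA) as a non-negative integer number
                     of samples (assumed unit, chosen for this sketch)
    """
    if delay_samples < 0:
        raise ValueError("delay must be non-negative")
    # Prepending zeros shifts the response later in time by delay_samples.
    return [0.0] * delay_samples + list(ir)
```

Storing the delay separately keeps the stored response short; the full response is only rebuilt when it is needed.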

Dimensions

Each netCDF variable has fixed dimensions, and its dimensions must be defined before the variable is created. Thus, in SOFA, netCDF dimensions are pre-defined, see Tab. 2. Data and metadata are described by using these dimensions. User-defined dimensions are currently not provided. Throughout this document, matrix sizes are denoted by [A1 A2 … AI], where Ai represents the length of dimension i of the I-dimensional matrix.

For example, assume a database consisting of one thousand measurements, i.e., M = 1000, obtained for 1000 different rotations of the listener, i.e., ListenerRotation is [M C], using two microphones, i.e., two IRs per measurement, and a sampling rate of 48 kHz. Then, in the netCDF file, M = 1000, R = 2, and C = 3. Further, the netCDF variables “Data.IR”, “ListenerRotation”, and “Data.SamplingRate” have dimensions [M R N], [M C], and [1], respectively.

Variables can have different dimensions. For example, it is possible to provide the ListenerPosition as a single entry, meaning that this one ListenerPosition is valid for all measurements. It is also possible to provide a different ListenerPosition for each measurement. Note that there are restrictions on the variant dimensions: the dimensions must be the pre-defined dimensions, see Tab. 2. The size of the dimensions may change, but the number of dimensions must not. In the above example, valid dimensions of ListenerPosition are [I C] and [M C]. An invalid dimension would be [C], because it changes the number of dimensions.
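The dimension rules from the example can be illustrated with a short Python sketch. It uses plain tuples rather than a netCDF library, and all names are hypothetical. Two assumptions are made: N = 256 is an arbitrary value (the text does not give one), and I is treated as a dimension of length 1, which matches the "single entry" case described above.

```python
# Dimension lengths from the example; N = 256 and I = 1 are assumptions.
DIMS = {"I": 1, "M": 1000, "R": 2, "C": 3, "N": 256}

# Dimension combinations allowed for ListenerPosition in the example:
# one entry for all measurements, or one entry per measurement.
LISTENER_POSITION_ALLOWED = [("I", "C"), ("M", "C")]


def shape_of(dim_names, dims=DIMS):
    """Translate dimension names such as ("M", "R", "N") into sizes."""
    return tuple(dims[name] for name in dim_names)


def nested_shape(value):
    """Return the shape of a rectangular nested list."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def is_valid_shape(shape, allowed_dim_sets, dims=DIMS):
    """True if the shape matches one of the allowed dimension combinations."""
    return any(tuple(shape) == shape_of(names, dims)
               for names in allowed_dim_sets)
```

For instance, a single position stored as `[[0.0, 0.0, 0.0]]` has two dimensions and passes the check, while the flat list `[0.0, 0.0, 0.0]` has only one dimension and is rejected.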

Metadata

Metadata consist of variables and their attributes. General metadata (Tab. 3) cover the most important properties of the measurement and are valid for the global measurement setup. To keep things simple, nested structures within the metadata are not allowed, but grouping by prefixes, e.g., ListenerPosition and ListenerOrientation, is encouraged. Attributes for the geometry description (e.g., source position, listener orientation) extend a value by a further coordinate triplet C. When saved as a variable, date and time use an integer number of seconds from 1974-02-22 00:00:00. When saved as an attribute, a string in the ISO 8601 format “yyyy-mm-dd HH:MM:SS” is used.
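The two date and time representations can be sketched in Python as follows. The reference instant is taken exactly as quoted in the text; the function and constant names are hypothetical, and time zones are ignored here (naive timestamps are an assumption of this sketch).

```python
from datetime import datetime, timedelta

# Reference instant exactly as quoted in the text.
REFERENCE = datetime(1974, 2, 22, 0, 0, 0)

# "yyyy-mm-dd HH:MM:SS" expressed as a strftime/strptime pattern.
ATTRIBUTE_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_variable_seconds(moment):
    """Date and time as an integer number of seconds from REFERENCE."""
    return int((moment - REFERENCE).total_seconds())


def from_variable_seconds(seconds):
    """Inverse of to_variable_seconds."""
    return REFERENCE + timedelta(seconds=seconds)


def to_attribute_string(moment):
    """Date and time as a string for use in an attribute."""
    return moment.strftime(ATTRIBUTE_FORMAT)


def from_attribute_string(text):
    """Inverse of to_attribute_string."""
    return datetime.strptime(text, ATTRIBUTE_FORMAT)
```

Both representations describe the same instant, so a value can be converted from the variable form to the attribute form and back without loss at a resolution of one second.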

Global attributes

General metadata are represented as global attributes in netCDF.

Object variables and their attributes

Other metadata can be a matrix of numeric (integer or float) variables or a string. Attributes can accompany a variable where appropriate. Object-specific metadata describe the objects listener, receivers, source, and emitters (Tabs. 4 and 5). Room-specific metadata describe the room used in the measurements and depend on the attribute RoomType (Tab. 2). Measurement metadata describe other measurement-specific data, like the time of a particular measurement (MeasurementTime), and have the prefix “Measurement”.

User-defined attributes

Must have explicitly defined dimensions using the dimensions given in

Room types

Coordinate systems

Cartesian

x, y, z as a basis

Geographic (Spherical)

DIN 9300

Navigational

Horizontal-Polar