Backend Components
Backends take the features extracted by the Frontend and map them to a fixed-dimensional embedding vector.
Available Backends
1. AASIST (AASIST)
A graph attention network (GAT) based architecture designed for audio deepfake detection and ASV (automatic speaker verification) spoofing detection.
Configuration Signature:
Parameters:
- filts - (list) Filter configuration.
- gat_dims - (list) Graph attention dimensions.
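The graph attention idea behind this backend can be illustrated with a minimal single-head layer in NumPy. This is a sketch of the standard GAT formulation on a fully connected graph, not the AASIST implementation itself. The function name `graph_attention`, the weight shapes, and the suggestion that a setting like gat_dims would fix the output width of each layer are all assumptions made for illustration.

```python
import numpy as np

def graph_attention(h, W, a, negative_slope=0.2):
    """Single-head graph attention over a fully connected graph.

    h: [N, F_in] node features, W: [F_in, F_out] projection,
    a: [2 * F_out] attention vector (all hypothetical shapes).
    Returns (h_out [N, F_out], alpha [N, N]).
    """
    z = h @ W                                   # project every node: [N, F_out]
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a target term.
    src = z @ a[:f_out]                         # [N]
    dst = z @ a[f_out:]                         # [N]
    e = src[:, None] + dst[None, :]             # raw scores e_ij: [N, N]
    e = np.where(e > 0, e, negative_slope * e)  # LeakyReLU
    e = e - e.max(axis=1, keepdims=True)        # stabilise the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # each row sums to 1
    return alpha @ z, alpha

rng = np.random.default_rng(0)
h = rng.standard_normal((5, 8))                 # 5 nodes, 8 features each
W = rng.standard_normal((8, 4)) * 0.1
a = rng.standard_normal(8) * 0.1
h_out, alpha = graph_attention(h, W, a)
print(h_out.shape, alpha.shape)
```

Each output node is a weighted average of all projected nodes, with the weights learned from pairwise scores rather than fixed by graph structure.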
Example:
2. ECAPA-TDNN (ECAPA_TDNN)
A strong speaker-verification backend adapted for deepfake detection. It features channel attention (squeeze-and-excitation blocks) and multi-scale feature aggregation.
Configuration Signature:
Parameters:
- channels - (int) Number of channels in Res2Net blocks (default: 512).
- emb_dim - (int) Output embedding dimension (default: 192).
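The channel attention mentioned above can be sketched as a squeeze-and-excitation step in NumPy. This is a simplified illustration under assumed shapes, not the ECAPA_TDNN code: the name `se_block`, the reduction ratio `r`, and the random weights are hypothetical.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation over a [C, T] feature map.

    w1: [C // r, C] and w2: [C, C // r] form the bottleneck.
    Returns (rescaled x, per-channel gates).
    """
    s = x.mean(axis=1)                          # squeeze: average over time -> [C]
    hidden = np.maximum(w1 @ s + b1, 0.0)       # bottleneck + ReLU -> [C // r]
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))  # sigmoid -> [C]
    return x * gates[:, None], gates            # rescale each channel

rng = np.random.default_rng(0)
C, T, r = 16, 50, 4
x = rng.standard_normal((C, T))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y, gates = se_block(x, w1, np.zeros(C // r), w2, np.zeros(C))
print(y.shape, gates.shape)
```

The gates lie strictly between 0 and 1, so the block can only attenuate channels relative to one another; it never changes the shape of the feature map.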
Example:
3. RawNet2 (RawNet2)
A classic CNN-GRU architecture for ASV spoofing detection.
Configuration Signature:
Parameters:
- filts - (list) Channels for each residual block.
- gru_node - (int) GRU hidden size.
- emb_dim - (int) Output dimension.
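The recurrent half of a CNN-GRU design can be sketched as follows: a GRU reads the frame sequence and its final hidden state serves as a fixed-size summary, whatever the sequence length. This is a minimal NumPy sketch with biases omitted, not the RawNet2 implementation. The name `gru_last_state` is hypothetical, and treating gru_node as the hidden size `H` here is an assumption based on the parameter description.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_last_state(x, params, hidden_size):
    """Run a single-layer GRU over x [B, T, C]; return the last state [B, H]."""
    Wz, Uz, Wr, Ur, Wn, Un = params
    h = np.zeros((x.shape[0], hidden_size))
    for t in range(x.shape[1]):
        xt = x[:, t, :]
        z = sigmoid(xt @ Wz + h @ Uz)           # update gate
        r = sigmoid(xt @ Wr + h @ Ur)           # reset gate
        n = np.tanh(xt @ Wn + r * (h @ Un))     # candidate state
        h = (1.0 - z) * n + z * h               # blend candidate and previous state
    return h

rng = np.random.default_rng(0)
B, T, C, H = 2, 30, 20, 8
# Input weights are [C, H], recurrent weights are [H, H], for each of z, r, n.
params = [rng.standard_normal(s) * 0.1 for s in [(C, H), (H, H)] * 3]
x = rng.standard_normal((B, T, C))
emb = gru_last_state(x, params, H)
print(emb.shape)
```

In a full model a linear layer would typically map this state to the final embedding size; that step is left out here.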
Example:
4. MLP (MLP)
A simple multi-layer perceptron with configurable pooling. It is well suited to self-supervised (SSL) frontends such as Wav2Vec2 and WavLM, which already output high-level features.
Configuration Signature:
Parameters:
- input_dim - (int) Dimension of input features.
- projection - (list[int]) List of hidden layer sizes (e.g., [128, 64]).
- pooling_type - (str) Pooling method: mean, max, or asp (Attentive Statistics Pooling).
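The three pooling options can be compared in a short NumPy sketch. It is an illustration under assumptions rather than the library's code: the function `pool` is hypothetical, and the asp branch scores frames with a single vector `w`, whereas real attentive statistics pooling usually learns the scores with a small network.

```python
import numpy as np

def pool(x, pooling_type, w=None):
    """Collapse the time axis of x [B, T, C].

    mean/max return [B, C]; asp returns [B, 2 * C] (weighted mean and std).
    For asp, w [C] is a hypothetical scoring vector that rates each frame.
    """
    if pooling_type == "mean":
        return x.mean(axis=1)
    if pooling_type == "max":
        return x.max(axis=1)
    if pooling_type == "asp":
        scores = x @ w                                    # [B, T]
        scores = scores - scores.max(axis=1, keepdims=True)
        alpha = np.exp(scores)
        alpha = alpha / alpha.sum(axis=1, keepdims=True)  # frame weights sum to 1
        alpha = alpha[:, :, None]                         # [B, T, 1]
        mu = (alpha * x).sum(axis=1)                      # weighted mean [B, C]
        var = (alpha * x**2).sum(axis=1) - mu**2
        sigma = np.sqrt(np.clip(var, 1e-8, None))         # weighted std [B, C]
        return np.concatenate([mu, sigma], axis=1)
    raise ValueError(f"unknown pooling_type: {pooling_type}")

rng = np.random.default_rng(0)
B, T, C = 3, 40, 16
x = rng.standard_normal((B, T, C))
w = rng.standard_normal(C) * 0.1
shapes = {name: pool(x, name, w).shape for name in ("mean", "max", "asp")}
print(shapes)
```

Note that asp doubles the feature dimension because it concatenates a mean and a standard deviation, so any layer that follows it has to expect 2 * C inputs.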
Example:
5. Res2Net (Nes2Net)
A Res2Net-based convolutional architecture.
Configuration Signature:
Parameters:
- strides - (list) Stride settings for layers.
- filts - (list) Channel counts for layers.
Example:
Input/Output
- Input: Features from Frontend [B, T, C].
- Output: Embedding vector [B, Embedding_Dim].
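The shape contract above can be checked mechanically. The sketch below uses a hypothetical stand-in backend (mean pooling followed by a linear projection) and a hypothetical helper, `check_contract`, that feeds in sequences of different lengths. Neither name comes from the library.

```python
import numpy as np

def toy_backend(features, proj):
    """Hypothetical stand-in backend: mean-pool over time, then project."""
    return features.mean(axis=1) @ proj         # [B, T, C] -> [B, C] -> [B, D]

def check_contract(backend, in_dim, emb_dim, batch=4, lengths=(50, 120)):
    """Return True if the output is [batch, emb_dim] for every sequence length."""
    rng = np.random.default_rng(0)
    for t in lengths:
        feats = rng.standard_normal((batch, t, in_dim))
        if backend(feats).shape != (batch, emb_dim):
            return False
    return True

proj = np.random.default_rng(1).standard_normal((64, 192))
ok = check_contract(lambda f: toy_backend(f, proj), in_dim=64, emb_dim=192)
print(ok)
```

Testing with more than one value of T is the useful part: a backend that only works for a single sequence length does not produce a genuinely fixed-dimensional embedding.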