Using HuggingFace Hub
DeepFense publishes datasets and pretrained models on HuggingFace at huggingface.co/DeepFense.
This guide shows how to use them.
Available Datasets
| Dataset | Samples | Description |
|---|---|---|
| ASVSpoof19 | -- | ASVspoof 2019 LA |
| CompSpoof | 2.5k | Mixed bonafide/spoof compositions |
| CtrSVDD | 93k | Singing voice deepfake detection |
| DECRO | 118k | Cross-lingual speech deepfake detection |
| EnvSDD | 40k | Environmental sound deepfake detection |
| FakeMusicCaps | 28k | Detection of AI-generated music |
| ODSS | 30k | Open Dataset of Synthetic Speech |
| SONICS | 49k | Synthetic song detection |
| SpeechFake | 4.5M | Large-scale speech deepfake |
| SpoofCeleb | 2.6M | Celebrity voice spoofing |
| VCapAV | 90k | Audio-visual deepfake |
| WaveFake | 134k | Waveform-based fake detection |
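Each dataset contains parquet files (*_train.parquet, *_dev.parquet, *_eval.parquet) with the columns ID, path, label, and dataset_name. The parquet files hold metadata only; the audio itself must be obtained separately (see Workflow 1).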
Available Models (455+)
Models follow the naming convention: {Dataset}_{Frontend}_{Backend}_{Aug}_{Seed}
Examples:
- ASV19_WavLM_Nes2Net_NoAug_Seed42
- CodecFake_EAT_Nes2Net_NoAug_Seed240
- FakeMusicCaps_mert_AASIST_NoAug_Seed42
- HABLA_EAT_MLP_NoAug_Seed2
Each model repo contains:
- best_model.pth -- the trained checkpoint
- config.yaml -- the exact config used for training
Download via CLI
List what's available
# List all datasets
deepfense download list-datasets
# List all models (shows first 20)
deepfense download list-models
# Filter models
deepfense download list-models --filter WavLM
deepfense download list-models --filter ASV19 --limit 50
Download a dataset
deepfense download dataset CompSpoof
# Downloads to ./data/CompSpoof/
#   CompSpoof_train.parquet
#   CompSpoof_dev.parquet
#   CompSpoof_eval.parquet
deepfense download dataset ASVSpoof19 --output-dir ./my_data
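After a download it can help to confirm that every split arrived. The sketch below assumes the layout shown above (`<output_dir>/<Dataset>/<Dataset>_<split>.parquet`). The helper names `expected_parquets` and `missing_splits` are hypothetical and not part of DeepFense.

```python
from pathlib import Path

SPLITS = ("train", "dev", "eval")


def expected_parquets(dataset: str, output_dir: str = "./data") -> dict:
    """Map each split name to the parquet path implied by the CLI layout."""
    base = Path(output_dir) / dataset
    return {split: base / f"{dataset}_{split}.parquet" for split in SPLITS}


def missing_splits(dataset: str, output_dir: str = "./data") -> list:
    """Return the split names whose parquet file is not on disk."""
    paths = expected_parquets(dataset, output_dir)
    return [split for split, path in paths.items() if not path.is_file()]


# An empty list means all three splits are present.
print(missing_splits("CompSpoof"))
```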
Download a model
deepfense download model ASV19_WavLM_Nes2Net_NoAug_Seed42
# Downloads to ./models/ASV19_WavLM_Nes2Net_NoAug_Seed42/
#   best_model.pth
#   config.yaml
Download via Python
from deepfense.hub import download_dataset, download_model, list_models
# Download a dataset
parquet_files = download_dataset("CompSpoof", output_dir="./data")
# Returns: ["/abs/path/data/CompSpoof/CompSpoof_train.parquet", ...]
# Download a model
model_files = download_model("ASV19_WavLM_Nes2Net_NoAug_Seed42")
# Returns: {"checkpoint": "/abs/path/...", "config": "/abs/path/..."}
# Search for models
models = list_models(pattern="WavLM")
Complete Workflows
Workflow 1: Train on a HuggingFace dataset
# 1. Download the dataset
deepfense download dataset CompSpoof
# 2. The audio files still need to be on disk.
# The parquet files contain relative paths.
# Download the actual audio following each dataset's README.
# 3. Create a config pointing to the downloaded parquets
# config_compspoof.yaml
exp_name: "CompSpoof_WavLM_AASIST"
output_dir: "./outputs/"
seed: 42
data:
  sampling_rate: 16000
  label_map: {"bonafide": 1, "spoof": 0}
  train:
    dataset_type: "StandardDataset"
    dataset_names: ["CompSpoof"]
    parquet_files: ["./data/CompSpoof/CompSpoof_train.parquet"]
    root_dir: "/path/to/compspoof/audio"  # where the audio files live
    batch_size: 24
    shuffle: True
    base_transform:
      - type: "pad"
        max_len: 64600
  val:
    dataset_type: "StandardDataset"
    dataset_names: ["CompSpoof"]
    parquet_files: ["./data/CompSpoof/CompSpoof_dev.parquet"]
    root_dir: "/path/to/compspoof/audio"
    batch_size: 48
    base_transform:
      - type: "pad"
        max_len: 64600
  test:
    dataset_type: "StandardDataset"
    dataset_names: ["CompSpoof"]
    parquet_files: ["./data/CompSpoof/CompSpoof_eval.parquet"]
    root_dir: "/path/to/compspoof/audio"
    batch_size: 48
    base_transform:
      - type: "pad"
        max_len: 64600
model:
  type: "StandardDetector"
  frontend:
    type: "wavlm"
    args:
      source: "huggingface"
      ckpt_path: "microsoft/wavlm-large"
      freeze: True
  backend:
    type: "AASIST"
    args:
      input_dim: 1024
  loss:
    - type: "OCSoftmax"
      embedding_dim: 32
      w_posi: 0.9
      w_nega: 0.2
      alpha: 20.0
training:
  epochs: 50
  device: "cuda"
  optimizer:
    type: "adam"
    lr: 0.0001
  monitor_metric: "EER"
  monitor_mode: "min"
  metrics:
    EER: {}
    ACC: {}
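The `max_len: 64600` value in the `pad` transform is a length in samples. A quick check, assuming it is applied to audio at the configured `sampling_rate`, shows how much audio each example covers:

```python
sampling_rate = 16000  # data.sampling_rate from the config above
max_len = 64600        # base_transform max_len, in samples

# Samples divided by samples-per-second gives seconds.
duration_s = max_len / sampling_rate
print(f"{duration_s:.4f} s per example")  # 4.0375 s per example
```

If you change `sampling_rate`, scale `max_len` by the same factor to keep the same duration.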
Workflow 2: Evaluate a pretrained model
# 1. Download the model
deepfense download model ASV19_WavLM_Nes2Net_NoAug_Seed42
# 2. Look at the config it was trained with
cat models/ASV19_WavLM_Nes2Net_NoAug_Seed42/config.yaml
# 3. Test it (update the data.test section to point to your test data)
python test.py \
--config models/ASV19_WavLM_Nes2Net_NoAug_Seed42/config.yaml \
--checkpoint models/ASV19_WavLM_Nes2Net_NoAug_Seed42/best_model.pth
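Step 3 asks you to repoint `data.test` at your own data. One way to sketch that edit is a small helper that works on the plain-dict form of the config. It assumes the downloaded config uses the same `data.test` keys as the training config in Workflow 1 (`parquet_files`, `root_dir`, `dataset_names`). The helper name `point_test_split` and the toy config are hypothetical.

```python
import copy


def point_test_split(cfg: dict, parquet_file: str, root_dir: str, dataset_name: str) -> dict:
    """Return a copy of cfg whose data.test section targets the given files."""
    new_cfg = copy.deepcopy(cfg)  # leave the loaded config untouched
    test = new_cfg.setdefault("data", {}).setdefault("test", {})
    test["parquet_files"] = [parquet_file]
    test["root_dir"] = root_dir
    test["dataset_names"] = [dataset_name]
    return new_cfg


# Toy config standing in for the downloaded config.yaml
cfg = {"data": {"test": {"parquet_files": ["old.parquet"], "root_dir": "/old", "batch_size": 48}}}
updated = point_test_split(
    cfg,
    parquet_file="./data/CompSpoof/CompSpoof_eval.parquet",
    root_dir="/path/to/compspoof/audio",
    dataset_name="CompSpoof",
)
print(updated["data"]["test"]["parquet_files"])
```

With OmegaConf, as used in Workflow 3, `OmegaConf.to_container(cfg, resolve=True)` yields such a dict, and `OmegaConf.save(OmegaConf.create(updated), path)` writes the result back to a YAML file.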
Workflow 3: Inference on a single file
import torch
import torchaudio
from omegaconf import OmegaConf
from deepfense.hub import download_model
from deepfense.utils.registry import build_detector
from deepfense.models import *
# 1. Download
files = download_model("ASV19_WavLM_Nes2Net_NoAug_Seed42")
# 2. Load config and build model
cfg = OmegaConf.load(files["config"])
model_cfg = OmegaConf.to_container(cfg.model, resolve=True)
model = build_detector(cfg.model.type, model_cfg)
# 3. Load checkpoint
state = torch.load(files["checkpoint"], map_location="cpu")
model.load_state_dict(state["model_state"])
model.eval()
# 4. Load audio
audio, sr = torchaudio.load("test_audio.wav")
if sr != 16000:
    audio = torchaudio.transforms.Resample(sr, 16000)(audio)
# Downmix to mono, crop to at most 64600 samples, add batch dim.
# Note: clips shorter than 64600 samples are not padded here.
audio = audio.mean(dim=0)[:64600].unsqueeze(0)
# 5. Inference
with torch.no_grad():
    output = model(audio)
    score = output["scores"].item()
print(f"Score: {score:.4f}")
print(f"Prediction: {'bonafide' if score > 0 else 'spoof'}")
Workflow 4: Compare models across datasets (Python)
from deepfense.hub import list_models, download_model
# Find all WavLM + Nes2Net models trained on ASV19
models = list_models(pattern="ASV19_WavLM_Nes2Net")
print(models)
# ['ASV19_WavLM_Nes2Net_NoAug_Seed2',
# 'ASV19_WavLM_Nes2Net_NoAug_Seed42',
# 'ASV19_WavLM_Nes2Net_NoAug_Seed240']
# Download all three seeds for ensemble evaluation
for name in models:
download_model(name, output_dir="./models")
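The guide does not specify how the seeds are combined. One common choice is to average the per-file scores across seeds and then apply the same decision rule as Workflow 3 (a score above 0 means bonafide). The sketch below assumes you have already collected one score per model for a given file. The helper name `ensemble_score` is hypothetical and the score values are made up for illustration.

```python
from statistics import mean


def ensemble_score(scores_by_model: dict) -> float:
    """Average the per-model scores for one audio file."""
    if not scores_by_model:
        raise ValueError("need at least one model score")
    return mean(list(scores_by_model.values()))


# Made-up scores for one file from the three seeds listed above
scores = {
    "ASV19_WavLM_Nes2Net_NoAug_Seed2": 0.5,
    "ASV19_WavLM_Nes2Net_NoAug_Seed42": -0.25,
    "ASV19_WavLM_Nes2Net_NoAug_Seed240": 0.5,
}
avg = ensemble_score(scores)
print(f"Ensemble score: {avg:.4f}")
print("Prediction:", "bonafide" if avg > 0 else "spoof")
```

Averaging raw scores only makes sense when the models share a score scale, which is plausible for seeds of one configuration but not guaranteed across different backends or losses.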
Dataset Format
The HuggingFace parquet files follow the same format DeepFense expects:
| Column | Type | Description |
|---|---|---|
| ID | str | Unique sample identifier |
| path | str | Relative path to audio file |
| label | str | Label string (e.g. "bonafide", "spoof") |
| dataset_name | str | Dataset name |
Important: The path column contains relative paths. You need to set root_dir in your config to point to the directory where the actual audio files are stored.
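A wrong `root_dir` typically only surfaces when audio loading fails. A minimal pre-flight check, assuming each file resolves as `root_dir / path`, could look like this (the helper name `find_missing_audio` is hypothetical):

```python
from pathlib import Path


def find_missing_audio(relative_paths, root_dir):
    """Return the relative paths that do not exist as files under root_dir."""
    root = Path(root_dir)
    return [p for p in relative_paths if not (root / p).is_file()]
```

The `relative_paths` argument can be any iterable of strings, for example the `path` column obtained with `pandas.read_parquet("./data/CompSpoof/CompSpoof_train.parquet")["path"]`. An empty result means every referenced file was found.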
Model Naming Convention
| Part | Examples |
|---|---|
| Dataset | ASV19, CodecFake, FakeMusicCaps, HABLA |
| Frontend | WavLM, Wav2Vec2, EAT, mert |
| Backend | Nes2Net, AASIST, MLP, TCM |
| Augmentation | NoAug, RawBoost, ... |
| Seed | Seed2, Seed42, Seed240 |
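The convention can be split mechanically on underscores. The sketch below assumes that no individual part contains an underscore, which holds for the examples in this guide but may not hold for every published model. The function name `parse_model_name` is hypothetical.

```python
def parse_model_name(name: str) -> dict:
    """Split a model name into the five parts of the naming convention."""
    parts = name.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated parts, got {len(parts)}: {name!r}")
    dataset, frontend, backend, augmentation, seed = parts
    # The seed field looks like "Seed42"; keep only the number.
    if not seed.startswith("Seed") or not seed[4:].isdigit():
        raise ValueError(f"unexpected seed field: {seed!r}")
    return {
        "dataset": dataset,
        "frontend": frontend,
        "backend": backend,
        "augmentation": augmentation,
        "seed": int(seed[4:]),
    }


info = parse_model_name("FakeMusicCaps_mert_AASIST_NoAug_Seed42")
print(info["frontend"], info["backend"], info["seed"])  # mert AASIST 42
```

Parsed names make it straightforward to group the output of `list_models` by frontend, backend, or seed.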