Using HuggingFace Hub

DeepFense publishes datasets and pretrained models on HuggingFace at huggingface.co/DeepFense.

This guide shows how to list, download, and use them from the command line and from Python.

Available Datasets

Dataset	Samples	Description
ASVSpoof19	--	ASVspoof 2019 LA
CompSpoof	2.5k	Mixed bonafide/spoof compositions
CtrSVDD	93k	Singing voice deepfake detection
DECRO	118k	Cross-domain deepfake detection
EnvSDD	40k	Environmental sound deepfake detection
FakeMusicCaps	28k	Fake music generation detection
ODSS	30k	Open-domain speech spoofing
SONICS	49k	Synthetic song detection
SpeechFake	4.5M	Large-scale speech deepfake
SpoofCeleb	2.6M	Celebrity voice spoofing
VCapAV	90k	Audio-visual deepfake
WaveFake	134k	Waveform-based fake detection

Each dataset contains parquet files (*_train.parquet, *_dev.parquet, *_eval.parquet) with columns: ID, path, label, dataset_name.
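
As a quick sanity check after downloading, the four columns can be verified and the label balance counted. The sketch below is a minimal illustration and not part of DeepFense: `summarize_rows` is a hypothetical helper, the sample rows are made up, and it assumes the parquet rows have already been loaded into plain dictionaries (for example with pandas, `read_parquet(...).to_dict("records")`).

```python
from collections import Counter

# Columns listed above for every DeepFense parquet file.
EXPECTED_COLUMNS = {"ID", "path", "label", "dataset_name"}


def summarize_rows(rows):
    """Check that each row has the expected columns, then count labels.

    `rows` is a list of dicts, one per sample (hypothetical helper).
    """
    for index, row in enumerate(rows):
        missing = EXPECTED_COLUMNS - set(row)
        if missing:
            raise ValueError(f"row {index} is missing columns: {sorted(missing)}")
    return Counter(row["label"] for row in rows)


# Made-up rows for illustration only.
rows = [
    {"ID": "utt_0001", "path": "wav/utt_0001.wav", "label": "bonafide", "dataset_name": "CompSpoof"},
    {"ID": "utt_0002", "path": "wav/utt_0002.wav", "label": "spoof", "dataset_name": "CompSpoof"},
    {"ID": "utt_0003", "path": "wav/utt_0003.wav", "label": "spoof", "dataset_name": "CompSpoof"},
]
label_counts = summarize_rows(rows)
print(label_counts)
```

A strongly skewed label count is worth knowing about before training, since it affects how accuracy should be read.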

Available Models (455+)

Models follow the naming convention {Dataset}_{Frontend}_{Backend}_{Augmentation}_{Seed}.

Examples:
- ASV19_WavLM_Nes2Net_NoAug_Seed42
- CodecFake_EAT_Nes2Net_NoAug_Seed240
- FakeMusicCaps_mert_AASIST_NoAug_Seed42
- HABLA_EAT_MLP_NoAug_Seed2

Each model repo contains:
- best_model.pth: the trained checkpoint
- config.yaml: the exact config used for training

Download via CLI

List what's available

# List all datasets
deepfense download list-datasets

# List models (prints the first 20; use --limit to show more)
deepfense download list-models

# Filter models
deepfense download list-models --filter WavLM
deepfense download list-models --filter ASV19 --limit 50

Download a dataset

deepfense download dataset CompSpoof
# Downloads to ./data/CompSpoof/
#   CompSpoof_train.parquet
#   CompSpoof_dev.parquet
#   CompSpoof_eval.parquet

deepfense download dataset ASVSpoof19 --output-dir ./my_data

Download a model

deepfense download model ASV19_WavLM_Nes2Net_NoAug_Seed42
# Downloads to ./models/ASV19_WavLM_Nes2Net_NoAug_Seed42/
#   best_model.pth
#   config.yaml
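
To confirm that a download completed, the model directory can be checked for the two files listed above. This is a minimal sketch using only the standard library; `missing_model_files` is a hypothetical helper, not a DeepFense function.

```python
from pathlib import Path

# File names every model repo is documented to contain.
EXPECTED_FILES = ("best_model.pth", "config.yaml")


def missing_model_files(model_dir):
    """Return the expected files that are absent from `model_dir`."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


missing = missing_model_files("./models/ASV19_WavLM_Nes2Net_NoAug_Seed42")
print("missing files:", missing or "none")
```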

Download via Python

from deepfense.hub import download_dataset, download_model, list_models

# Download a dataset
parquet_files = download_dataset("CompSpoof", output_dir="./data")
# Returns: ["/abs/path/data/CompSpoof/CompSpoof_train.parquet", ...]

# Download a model
model_files = download_model("ASV19_WavLM_Nes2Net_NoAug_Seed42")
# Returns: {"checkpoint": "/abs/path/...", "config": "/abs/path/..."}

# Search for models
models = list_models(pattern="WavLM")

Complete Workflows

Workflow 1: Train on a HuggingFace dataset

# 1. Download the dataset
deepfense download dataset CompSpoof

# 2. The audio files still need to be on disk.
#    The parquet files contain relative paths.
#    Download the actual audio following each dataset's README.

# 3. Create a config pointing to the downloaded parquets

# config_compspoof.yaml
exp_name: "CompSpoof_WavLM_AASIST"
output_dir: "./outputs/"
seed: 42

data:
  sampling_rate: 16000
  label_map: {"bonafide": 1, "spoof": 0}
  train:
    dataset_type: "StandardDataset"
    dataset_names: ["CompSpoof"]
    parquet_files: ["./data/CompSpoof/CompSpoof_train.parquet"]
    root_dir: "/path/to/compspoof/audio"  # where the audio files live
    batch_size: 24
    shuffle: True
    base_transform:
      - type: "pad"
        max_len: 64600
  val:
    dataset_type: "StandardDataset"
    dataset_names: ["CompSpoof"]
    parquet_files: ["./data/CompSpoof/CompSpoof_dev.parquet"]
    root_dir: "/path/to/compspoof/audio"
    batch_size: 48
    base_transform:
      - type: "pad"
        max_len: 64600
  test:
    dataset_type: "StandardDataset"
    dataset_names: ["CompSpoof"]
    parquet_files: ["./data/CompSpoof/CompSpoof_eval.parquet"]
    root_dir: "/path/to/compspoof/audio"
    batch_size: 48
    base_transform:
      - type: "pad"
        max_len: 64600

model:
  type: "StandardDetector"
  frontend:
    type: "wavlm"
    args:
      source: "huggingface"
      ckpt_path: "microsoft/wavlm-large"
      freeze: True
  backend:
    type: "AASIST"
    args:
      input_dim: 1024
  loss:
    - type: "OCSoftmax"
      embedding_dim: 32
      w_posi: 0.9
      w_nega: 0.2
      alpha: 20.0

training:
  epochs: 50
  device: "cuda"
  optimizer:
    type: "adam"
    lr: 0.0001
  monitor_metric: "EER"
  monitor_mode: "min"
  metrics:
    EER: {}
    ACC: {}

# 4. Train
python train.py --config config_compspoof.yaml
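
The config above monitors EER (equal error rate), the operating point where the false-acceptance rate equals the false-rejection rate. The sketch below shows the conventional definition on made-up scores. It is a simplified illustration, not DeepFense's implementation, which may interpolate or differ in detail. It assumes higher scores mean "more bonafide", in line with the label map `bonafide: 1`.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Equal error rate, assuming higher scores mean 'more bonafide'.

    Sweeps every observed score as a threshold and returns the operating
    point where false-acceptance and false-rejection rates are closest.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one score per class")
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(float("inf"))  # threshold that rejects everything
    best_gap, eer = None, None
    for threshold in thresholds:
        # Spoof samples wrongly accepted as bonafide.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        # Bonafide samples wrongly rejected as spoof.
        frr = sum(s < threshold for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up scores for illustration only.
bonafide = [0.9, 0.8, 0.3]
spoof = [0.7, 0.2, 0.1]
print(f"EER: {compute_eer(bonafide, spoof):.3f}")
```

With `monitor_mode: "min"`, a lower EER on the validation split counts as an improvement.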

Workflow 2: Evaluate a pretrained model

# 1. Download the model
deepfense download model ASV19_WavLM_Nes2Net_NoAug_Seed42

# 2. Look at the config it was trained with
cat models/ASV19_WavLM_Nes2Net_NoAug_Seed42/config.yaml

# 3. Edit data.test in that config so it points to your test data, then run the test
python test.py \
    --config models/ASV19_WavLM_Nes2Net_NoAug_Seed42/config.yaml \
    --checkpoint models/ASV19_WavLM_Nes2Net_NoAug_Seed42/best_model.pth

Workflow 3: Inference on a single file

import torch
import torchaudio
from omegaconf import OmegaConf
from deepfense.hub import download_model
from deepfense.utils.registry import build_detector
from deepfense.models import *  # wildcard import, presumably so the model classes register themselves

# 1. Download
files = download_model("ASV19_WavLM_Nes2Net_NoAug_Seed42")

# 2. Load config and build model
cfg = OmegaConf.load(files["config"])
model_cfg = OmegaConf.to_container(cfg.model, resolve=True)
model = build_detector(cfg.model.type, model_cfg)

# 3. Load checkpoint
state = torch.load(files["checkpoint"], map_location="cpu")
model.load_state_dict(state["model_state"])
model.eval()

# 4. Load audio
audio, sr = torchaudio.load("test_audio.wav")
if sr != 16000:
    audio = torchaudio.transforms.Resample(sr, 16000)(audio)
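audio = audio.mean(dim=0)[:64600]  # downmix to mono, crop to at most 64600 samples
# Zero-pad clips shorter than 64600 samples (the training "pad" transform may pad differently)
audio = torch.nn.functional.pad(audio, (0, 64600 - audio.shape[0]))
audio = audio.unsqueeze(0)  # add batch dim -> shape (1, 64600)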

# 5. Inference
with torch.no_grad():
    output = model(audio)
    score = output["scores"].item()

print(f"Score: {score:.4f}")
print(f"Prediction: {'bonafide' if score > 0 else 'spoof'}")
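
The fixed input length used in step 4 corresponds to 64600 / 16000 = 4.0375 seconds of audio. The sketch below illustrates the crop-or-pad logic on plain Python lists so it runs without PyTorch. `fix_length` is a hypothetical helper, and zero-padding is an assumption: the `pad` transform used in training may fill short clips differently (for example by repeating the signal).

```python
SAMPLE_RATE = 16000  # sampling_rate used in the configs above
TARGET_LEN = 64600   # max_len used in the configs above


def fix_length(samples, target_len=TARGET_LEN):
    """Crop long inputs and zero-pad short ones to exactly `target_len`."""
    cropped = list(samples[:target_len])
    return cropped + [0.0] * (target_len - len(cropped))


print(f"window: {TARGET_LEN / SAMPLE_RATE:.4f} s")

short_clip = [0.1] * 16000  # 1 second of audio at 16 kHz
long_clip = [0.1] * 80000   # 5 seconds of audio at 16 kHz
print(len(fix_length(short_clip)), len(fix_length(long_clip)))
```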

Workflow 4: Compare models across datasets (Python)

from deepfense.hub import list_models, download_model

# Find all WavLM + Nes2Net models trained on ASV19
models = list_models(pattern="ASV19_WavLM_Nes2Net")
print(models)
# ['ASV19_WavLM_Nes2Net_NoAug_Seed2',
#  'ASV19_WavLM_Nes2Net_NoAug_Seed42',
#  'ASV19_WavLM_Nes2Net_NoAug_Seed240']

# Download all three seeds for ensemble evaluation
for name in models:
    download_model(name, output_dir="./models")

Dataset Format

The HuggingFace parquet files follow the same format DeepFense expects:

Column	Type	Description
`ID`	str	Unique sample identifier
`path`	str	Relative path to audio file
`label`	str	Label string (e.g. `"bonafide"`, `"spoof"`)
`dataset_name`	str	Dataset name

Important: The path column contains relative paths. You need to set root_dir in your config to point to the directory where the actual audio files are stored.
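
Before training, it can help to confirm that `root_dir` and the relative paths actually line up on disk. The following is a minimal sketch using only the standard library; `find_missing_audio` is a hypothetical helper, and the directory and file name are placeholders.

```python
from pathlib import Path


def find_missing_audio(root_dir, relative_paths):
    """Return the relative paths that do not exist under `root_dir`."""
    root = Path(root_dir)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


# Placeholder values; use your own root_dir and the parquet `path` column.
missing = find_missing_audio("/path/to/compspoof/audio", ["wav/utt_0001.wav"])
print(f"{len(missing)} file(s) missing")
```

If every path is reported missing, `root_dir` most likely points one directory too high or too low.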

Model Naming Convention

{Dataset}_{Frontend}_{Backend}_{Augmentation}_{Seed}

Part	Examples
Dataset	`ASV19`, `CodecFake`, `FakeMusicCaps`, `HABLA`
Frontend	`WavLM`, `Wav2Vec2`, `EAT`, `mert`
Backend	`Nes2Net`, `AASIST`, `MLP`, `TCM`
Augmentation	`NoAug`, `RawBoost`, ...
Seed	`Seed2`, `Seed42`, `Seed240`
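
Since the names are structured, they can be split back into their parts, for example to group models by frontend or seed. The sketch below assumes that no individual part contains an underscore, which holds for the examples in this guide but is not guaranteed for every model on the Hub. `ModelName` and `parse_model_name` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelName:
    dataset: str
    frontend: str
    backend: str
    augmentation: str
    seed: int


def parse_model_name(name):
    """Split `{Dataset}_{Frontend}_{Backend}_{Augmentation}_{Seed}` into fields."""
    parts = name.split("_")
    if len(parts) != 5 or not parts[4].startswith("Seed"):
        raise ValueError(f"unexpected model name: {name!r}")
    dataset, frontend, backend, augmentation, seed = parts
    # "Seed42" -> 42; int() raises ValueError if the suffix is not a number.
    return ModelName(dataset, frontend, backend, augmentation, int(seed[len("Seed"):]))


parsed = parse_model_name("ASV19_WavLM_Nes2Net_NoAug_Seed42")
print(parsed.frontend, parsed.backend, parsed.seed)
```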