Commit b90946c by entropy (verified) · 1 parent: 2eab706

Upload model
README.md ADDED
@@ -0,0 +1,199 @@
---
library_name: transformers
tags: []
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
config.json ADDED
@@ -0,0 +1,46 @@
{
  "architectures": [
    "DecomposerModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_decomposer.DecomposerConfig",
    "AutoModel": "modeling_decomposer.DecomposerModel"
  },
  "comp_sizes": [
    768,
    512,
    256,
    128,
    64,
    32
  ],
  "corr_k_vals": [
    10,
    100
  ],
  "corr_loss_type": "pearson",
  "corr_weight": 1.0,
  "cosine_weight": 1.0,
  "dropout": 0.1,
  "input_size": 768,
  "layer_norm_eps": 1e-12,
  "model_type": "embedding_decomposer",
  "mse_weight": 0.0,
  "n_comp_layers": 4,
  "n_head_layers": 1,
  "n_output": 2,
  "n_refs_batch": 3072,
  "n_refs_total": 0,
  "n_shared_layers": 8,
  "output_sizes": [
    768,
    512,
    256,
    128,
    64,
    32
  ],
  "shared_dim": 1024,
  "torch_dtype": "float32",
  "transformers_version": "4.51.3"
}
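The `auto_map` entries above register the custom classes shipped in this repo, so the checkpoint is meant to be loaded through the Auto classes with remote code enabled. A minimal loading sketch; `"entropy/<this-repo>"` is a placeholder for the model's actual Hub repo id:

from transformers import AutoConfig, AutoModel

repo_id = "entropy/<this-repo>"  # placeholder; substitute the actual Hub repo id

# trust_remote_code is required because config.json maps AutoConfig/AutoModel
# to configuration_decomposer.DecomposerConfig / modeling_decomposer.DecomposerModel.
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
print(type(model).__name__)  # DecomposerModel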
configuration_decomposer.py ADDED
@@ -0,0 +1,67 @@
from typing import List, Optional
from transformers import PretrainedConfig


class DecomposerConfig(PretrainedConfig):
    """
    Config for the embedding-decomposition model.

    Args:
        input_size (int): input embedding size.
        comp_sizes (List[int]): compressed embedding sizes.
        output_sizes (List[int]): desired output dims (for the two blocks).
        n_comp_layers (int): number of FeedForwardLayers per compression head.
        shared_dim (int): common hidden size after input projection.
        n_shared_layers (int): number of FeedForwardLayers in the shared trunk.
        n_head_layers (int): number of FeedForwardLayers in the output head.
        dropout (float): dropout prob in *every* non-final layer.
        layer_norm_eps (float|None): epsilon for LayerNorm (None → no LN).
        n_output (int): number of output embeddings.
        n_refs_batch (int): number of reference embeddings to sample per batch.
        n_refs_total (int): total number of reference embeddings; set to 0 to
            skip creating the reference-embedding tables.
        cosine_weight (float): weight of the one-to-one cosine-similarity loss.
        mse_weight (float): weight of the one-to-one MSE loss.
        corr_weight (float): weight of the pairwise-correlation loss.
        corr_loss_type (str): correlation loss type, "pearson" or "mse".
        corr_k_vals (List[int]): top-k values for weighting the correlation loss.
    """
    model_type = "embedding_decomposer"

    def __init__(
        self,
        input_size: int = 768,
        comp_sizes: List[int] = (768, 512, 256, 128, 64, 32),
        output_sizes: List[int] = (768, 512, 256, 128, 64, 32),
        n_comp_layers: int = 4,
        shared_dim: int = 1024,
        n_shared_layers: int = 8,
        n_head_layers: int = 1,
        dropout: float = 0.1,
        layer_norm_eps: Optional[float] = 1e-12,
        n_output: int = 2,
        n_refs_batch: int = 128,
        n_refs_total: int = 2000,
        cosine_weight: float = 1.0,
        mse_weight: float = 1.0,
        corr_weight: float = 1.0,
        corr_loss_type: str = "pearson",  # "pearson" or "mse"
        corr_k_vals: List[int] = (10, 100),  # tuple default avoids a shared mutable list
        **kwargs,
    ):
        self.input_size = input_size
        self.comp_sizes = list(comp_sizes)
        self.output_sizes = list(output_sizes)
        self.n_comp_layers = n_comp_layers
        self.shared_dim = shared_dim
        self.n_shared_layers = n_shared_layers
        self.n_head_layers = n_head_layers
        self.dropout = dropout
        self.layer_norm_eps = layer_norm_eps
        self.n_output = n_output
        self.n_refs_batch = n_refs_batch
        self.n_refs_total = n_refs_total
        self.cosine_weight = cosine_weight
        self.mse_weight = mse_weight
        self.corr_weight = corr_weight
        self.corr_loss_type = corr_loss_type
        self.corr_k_vals = list(corr_k_vals)
        super().__init__(**kwargs)
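For orientation, the committed config.json corresponds to constructing this class with just three non-default arguments. A sketch, not part of the commit, assuming the file is importable locally (on the Hub it is loaded for you via `trust_remote_code`):

from configuration_decomposer import DecomposerConfig

# Reproduces the committed config.json; all other fields keep their defaults.
config = DecomposerConfig(
    n_refs_batch=3072,
    n_refs_total=0,   # 0 disables the reference-embedding tables
    mse_weight=0.0,   # only the cosine and correlation losses remain active
)
assert config.comp_sizes == [768, 512, 256, 128, 64, 32]
print(config.model_type)  # "embedding_decomposer"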
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4fa2d69d2ea6349ff666b7418c088f3b0a394cc772d956b1771df3dca2a42e52
size 291771256
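The file above is a Git LFS pointer, not the weights themselves; the `oid` and `size` identify the real payload. A small sketch for checking a resolved download against the pointer, assuming the weights have been fetched locally as `model.safetensors`:

import hashlib

# Hash the downloaded file in chunks and compare against the LFS pointer.
h = hashlib.sha256()
size = 0
with open("model.safetensors", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
        size += len(chunk)
print(size)           # expect 291771256
print(h.hexdigest())  # expect 4fa2d69d2ea6...2a42e52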
modeling_decomposer.py ADDED
@@ -0,0 +1,388 @@
from __future__ import annotations
from dataclasses import dataclass
from typing import Dict, List, Optional

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import PreTrainedModel
from transformers.utils import ModelOutput

from .configuration_decomposer import DecomposerConfig


def pairwise_cosine(x: torch.Tensor) -> torch.Tensor:
    """
    x : [B,d] or [N,B,d]
    returns a square similarity matrix:
        [B,B] or [N,B,B]
    """
    x = F.normalize(x, p=2, dim=-1)
    return torch.matmul(x, x.transpose(-1, -2))


def cross_cosine(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """
    a : [M,d] or [N,M,d]
    b : [K,d] (reference set - no extra axis)
    returns:
        [M,K] or [N,M,K]
    """
    a_n = F.normalize(a, 2, -1)
    b_n = F.normalize(b, 2, -1)

    if a.ndim == 2:                                   # [M,d]
        return a_n @ b_n.T                            # [M,K]

    if a.ndim == 3:                                   # [N,M,d]
        return torch.einsum("nmd,kd->nmk", a_n, b_n)  # [N,M,K]

    raise ValueError("cross_cosine: unexpected tensor rank.")


def _drop_diag(M: torch.Tensor) -> torch.Tensor:
    """
    Remove the main diagonal from each similarity matrix.
    Works for 2-D [B,B] or 3-D [N,B,B] tensors.
    """
    if M.ndim == 2:
        n = M.size(0)
        return M.masked_select(
            ~torch.eye(n, dtype=torch.bool, device=M.device)
        ).view(n, n - 1)

    if M.ndim == 3:
        n = M.size(1)
        mask = torch.eye(n, dtype=torch.bool, device=M.device).unsqueeze(0)  # [1,B,B]
        return M.masked_select(~mask).view(M.size(0), n, n - 1)

    raise ValueError("_drop_diag expects a 2-D or 3-D tensor")


def rowwise_pearson(ref: torch.Tensor,
                    pred: torch.Tensor,
                    *,
                    rm_diag: bool = True) -> torch.Tensor:
    """
    Pearson correlation row by row; supports 2-D or 3-D inputs of identical shape.
    Returns the mean correlation error (0 → perfect).
    """
    if rm_diag:
        ref = _drop_diag(ref)
        pred = _drop_diag(pred)

    ref_z = F.normalize(ref - ref.mean(-1, keepdim=True), p=2, dim=-1)
    pred_z = F.normalize(pred - pred.mean(-1, keepdim=True), p=2, dim=-1)
    loss = 1 - (ref_z * pred_z).sum(-1).mean(-1)
    if loss.ndim == 0:
        loss = loss.unsqueeze(0)
    return loss


def similarity_mse(ref: torch.Tensor,
                   pred: torch.Tensor,
                   *,
                   rm_diag: bool = True) -> torch.Tensor:
    if rm_diag:
        ref, pred = _drop_diag(ref), _drop_diag(pred)

    if pred.ndim == 2:
        loss = F.mse_loss(pred, ref).unsqueeze(0)
    elif pred.ndim == 3:
        loss = F.mse_loss(pred,
                          ref.expand_as(pred),
                          reduction="none"
                          ).reshape(pred.size(0), -1).mean(-1)
    else:
        raise ValueError("similarity_mse expects a 2-D or 3-D tensor")

    return loss


def sim_loss(pred: torch.Tensor,   # [N,B,d] or [B,d]
             targ: torch.Tensor,   # [B,d] (ground truth)
             ref: Optional[torch.Tensor],
             k_vals: Optional[List[int]],
             loss_type: str = "pearson") -> torch.Tensor:
    """
    Returns a stacked tensor of losses, one column per loss term
    (1 + len(k_vals) columns in total).
    If `ref` is given we compute cross-similarities pred↔ref / targ↔ref,
    otherwise self-similarities pred↔pred / targ↔targ.
    """
    loss_fn = rowwise_pearson if loss_type == "pearson" else similarity_mse

    if ref is None:  # self-similarity
        p_sim, t_sim = pairwise_cosine(pred), pairwise_cosine(targ)
        rm_diag = True
    else:            # cross-similarity vs. a fixed reference set
        p_sim, t_sim = cross_cosine(pred, ref), cross_cosine(targ, ref)
        rm_diag = False

    losses = [loss_fn(t_sim, p_sim, rm_diag=rm_diag)]

    if k_vals:
        # ranks based on target similarities (works for 2-D or 3-D)
        ranks = t_sim.argsort(-1, descending=True)
        start = 1 if rm_diag else 0
        for k in k_vals:
            idx = ranks[..., start:start + k]
            t_k = torch.gather(t_sim, -1, idx)
            if p_sim.ndim == 2:
                p_k = torch.gather(p_sim, -1, idx)
            elif p_sim.ndim == 3:
                p_k = torch.gather(p_sim, -1, idx.repeat(p_sim.size(0), 1, 1))
            losses.append(loss_fn(t_k, p_k, rm_diag=False))

    return torch.stack(losses, 1)  # [N, 1 + len(k_vals)]


# ─────────────────────────────── building blocks ──────────────────────────────
class FeedForward(nn.Module):
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_out * 2)
        self.fc2 = nn.Linear(d_out, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU-style gating: split the projection, gate one half with SiLU
        x1, x2 = self.fc1(x).chunk(2, -1)
        return self.fc2(F.silu(x1) * x2)


class FeedForwardLayer(nn.Module):
    def __init__(self,
                 d_in: int,
                 d_out: int,
                 *,
                 dropout: float = 0.1,
                 ln_eps: Optional[float] = 1e-12):
        super().__init__()
        self.ff = FeedForward(d_in, d_out)
        self.skip = nn.Linear(d_in, d_out) if d_in != d_out else nn.Identity()
        self.drop = nn.Dropout(dropout)
        self.norm = nn.LayerNorm(d_out, eps=ln_eps) if ln_eps else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(self.ff(self.drop(x)) + self.skip(x))


class OutputLinear(nn.Module):
    def __init__(self,
                 input_size: int,
                 n_head_layers: int,
                 n_output: int,
                 output_sizes: List[int],
                 dropout: float = 0.1,
                 ln_eps: Optional[float] = 1e-12):
        super().__init__()
        self.n_output = n_output
        ff_layers = [FeedForwardLayer(input_size, input_size, dropout=dropout,
                                      ln_eps=None if i == n_head_layers - 1 else ln_eps)
                     for i in range(n_head_layers)]
        self.ff = nn.Sequential(*ff_layers)
        self.layers = nn.ModuleDict({str(d): nn.Linear(input_size, d * n_output)
                                     for d in output_sizes})

    def forward(self, inputs: torch.Tensor, sizes: List[int]):
        inputs = self.ff(inputs)
        # one fused linear over all requested sizes, then slice per size
        weights = torch.cat([self.layers[str(i)].weight for i in sizes])
        biases = torch.cat([self.layers[str(i)].bias for i in sizes])
        outputs = F.linear(inputs, weights, biases)
        output_dict = {}
        current = 0

        slice_sizes = [d * self.n_output for d in sizes]
        for size in slice_sizes:
            p = outputs[:, :, current:current + size]
            p = p.view(p.size(0), p.size(1), self.n_output, size // self.n_output)
            output_dict[size // self.n_output] = p
            current += size
        return output_dict


def get_compression_heads(d_in, comp_sizes, n_layers, add_input_identity=False):
    compression_heads = nn.ModuleDict({})
    for d in comp_sizes:
        enc_layers = []
        for i in range(n_layers):
            last = i == n_layers - 1
            enc_layers.append(
                FeedForwardLayer(
                    d_in,
                    d if last else d_in,
                    dropout=0.0,
                    ln_eps=None if last else 1e-12,
                )
            )
        compression_heads[str(d)] = nn.Sequential(*enc_layers)
    if add_input_identity:
        compression_heads[str(d_in)] = nn.Identity()

    return compression_heads


# ───────────────────────────── output dataclass ───────────────────────────────
@dataclass
class DecomposerOutput(ModelOutput):
    loss: torch.FloatTensor
    loss_terms: Optional[Dict[str, torch.Tensor]] = None
    decomp: Optional[Dict[int, torch.FloatTensor]] = None  # {size: [B, n_sizes, n_output, size]}
    ref_idxs: Optional[torch.LongTensor] = None


# ──────────────────────────────── main model ──────────────────────────────────
class DecomposerModel(PreTrainedModel):
    """Maps an embedding to *n_output* building-block embeddings for every
    requested `output_size`. All loops are left intact for clarity."""
    config_class = DecomposerConfig

    # ---------------------------------------------------------------- init
    def __init__(self, config: DecomposerConfig):
        super().__init__(config)

        # compression heads, so training does not need every embedding size saved
        self.compression_heads = get_compression_heads(config.input_size,
                                                       config.comp_sizes,
                                                       config.n_comp_layers,
                                                       add_input_identity=True)
        # input → shared_dim
        self.in_proj = nn.ModuleDict({
            str(d): FeedForwardLayer(d, config.shared_dim,
                                     dropout=config.dropout,
                                     ln_eps=config.layer_norm_eps)
            for d in config.comp_sizes
        })

        # shared trunk
        blk = lambda: FeedForwardLayer(config.shared_dim,
                                       config.shared_dim,
                                       dropout=config.dropout,
                                       ln_eps=config.layer_norm_eps)
        self.trunk = nn.Sequential(*[blk() for _ in range(config.n_shared_layers)])

        # shared_dim → each output size × n_output
        self.out_proj = OutputLinear(self.config.shared_dim,
                                     self.config.n_head_layers,
                                     config.n_output,
                                     config.output_sizes,
                                     config.dropout,
                                     config.layer_norm_eps)

        # reference embeddings (optional corr loss)
        self.ref_emb = nn.ModuleDict({
            str(d): nn.Embedding(config.n_refs_total, d)
            for d in config.output_sizes if config.n_refs_total
        })

        self.post_init()

    # ---------------------------------------------------------------- forward
    def compress(self,
                 inputs: torch.Tensor,  # [B, input_size]
                 comp_sizes: List[int]):
        compressed = {d: self.compression_heads[str(d)](inputs) for d in comp_sizes}
        return compressed

    def decompose(self,
                  inputs: Dict[int, torch.Tensor],  # {size: [B, size]}
                  output_sizes: List[int]):
        hiddens = []
        for input_size in self.config.comp_sizes:
            if input_size not in inputs:
                continue

            h = self.in_proj[str(input_size)](inputs[input_size])  # [B, shared_dim]
            hiddens.append(h)

        hiddens = torch.stack(hiddens, dim=0)  # [n_sizes, B, shared_dim]
        hiddens = self.trunk(hiddens)

        preds = self.out_proj(hiddens, output_sizes)  # {size: [n_sizes, B, n_output, size]}
        return preds

    def load_targets(self,
                     bb1_ids: torch.LongTensor,   # [B,]
                     bb2_ids: torch.LongTensor):  # [B,]
        targets = {}
        for size in self.config.output_sizes:
            embedding = self.ref_emb[str(size)]
            targets[size] = torch.stack([embedding(bb1_ids), embedding(bb2_ids)], dim=1)
        return targets

    def compute_loss(self,
                     inputs: Dict[int, torch.Tensor],
                     preds: Dict[int, torch.Tensor],
                     targets: Dict[int, torch.Tensor],
                     ref_idxs: Optional[torch.LongTensor] = None):
        device = next(iter(preds.values())).device
        loss_terms: Dict[str, torch.Tensor] = {}
        loss_total = torch.zeros((), device=device)
        cfg = self.config
        for out_size in cfg.output_sizes:
            p = preds[out_size]    # [n_sizes, B, n_out, d]
            t = targets[out_size]  # [B, n_out, d]

            # 1) cosine to target ------------------------------------
            if cfg.cosine_weight > 0:
                cos = 1 - F.cosine_similarity(p, t, dim=-1).view(p.size(0), -1).mean(-1)
                loss_total += cfg.cosine_weight * cos.sum()
                for i, in_size in enumerate(cfg.comp_sizes):
                    loss_terms[f"{in_size}->{out_size}_cos"] = cos[i]

            # 2) mse to target ---------------------------------------
            if cfg.mse_weight > 0:
                mse = F.mse_loss(p, t.expand_as(p), reduction="none").view(p.size(0), -1).mean(-1)
                loss_total += cfg.mse_weight * mse.sum()
                for i, in_size in enumerate(cfg.comp_sizes):
                    loss_terms[f"{in_size}->{out_size}_mse"] = mse[i]

            # 3) correlation losses ----------------------------------
            if cfg.corr_weight:
                flat_p = p.flatten(1, 2)
                flat_t = t.flatten(0, 1)

                with torch.no_grad():
                    ref = self.ref_emb[str(out_size)](ref_idxs)

                ref_corr = sim_loss(flat_p, flat_t, ref,
                                    cfg.corr_k_vals, cfg.corr_loss_type).mean(-1)
                loss_total += cfg.corr_weight * ref_corr.sum()
                for i, in_size in enumerate(cfg.comp_sizes):
                    loss_terms[f"{in_size}->{out_size}_corr_ref"] = ref_corr[i]

        return loss_total, loss_terms

    def forward(self,
                embedding: torch.Tensor,   # [B, size]
                bb1_id: torch.LongTensor,  # [B,]
                bb2_id: torch.LongTensor,  # [B,]
                *,
                ref_idxs: Optional[torch.LongTensor] = None,
                return_preds: bool = False,
                compute_loss: bool = True,
                return_dict: bool = True) -> DecomposerOutput:

        cfg = self.config
        device = embedding.device

        if cfg.corr_weight and cfg.n_refs_total and ref_idxs is None:
            ref_idxs = torch.randint(cfg.n_refs_total,
                                     (cfg.n_refs_batch,),
                                     device=device)

        with torch.no_grad():
            compressed_inputs = self.compress(embedding, cfg.comp_sizes)

        if cfg.input_size in cfg.comp_sizes:
            compressed_inputs[cfg.input_size] = embedding

        preds = self.decompose(compressed_inputs, cfg.output_sizes)

        loss_total = None
        loss_terms = {}
        if compute_loss:
            # targets are only needed for the loss (and only exist when
            # n_refs_total > 0), so they are loaded here, not unconditionally
            targets = self.load_targets(bb1_id, bb2_id)
            loss_total, loss_terms = self.compute_loss(compressed_inputs, preds, targets, ref_idxs)

        decomp = {k: v.permute(1, 0, 2, 3) for k, v in preds.items()}

        return DecomposerOutput(loss=loss_total,
                                loss_terms=loss_terms,
                                decomp=decomp,
                                ref_idxs=ref_idxs)
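To close, a forward-pass sketch under stated assumptions: the repo id is a placeholder, the input embeddings are random stand-ins, and `compute_loss=False` because the shipped config.json sets `n_refs_total` to 0, so no reference or target embedding tables exist for loss computation:

import torch
from transformers import AutoModel

repo_id = "entropy/<this-repo>"  # placeholder for the actual Hub repo id
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()
cfg = model.config  # input_size=768, n_output=2, six comp/output sizes

B = 4
embedding = torch.randn(B, cfg.input_size)  # stand-in for real input embeddings
bb1_id = torch.zeros(B, dtype=torch.long)   # target ids; unused when compute_loss=False
bb2_id = torch.zeros(B, dtype=torch.long)

with torch.no_grad():
    out = model(embedding, bb1_id, bb2_id, compute_loss=False)

# decomp maps each output size d to a [B, n_sizes, n_output, d] tensor:
# n_output building-block embeddings per input, one set per compressed input size.
for d, t in out.decomp.items():
    print(d, tuple(t.shape))  # e.g. 768 (4, 6, 2, 768)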