Interpretation Study - Feature Masking Importance Analysis


To understand which multiomics features act as key biomarkers in cancer subtype classification, and how the model reaches its decisions, we perform an interpretation study. We identify the most influential features (biomarkers) using feature-masking-based importance analysis.

We use kale.interpret for this analysis. It provides a function that systematically masks input features and observes the effect on performance, highlighting which features are most important for classification.

How Is Feature Importance Computed?

The select_top_features_by_masking function in PyKale implements a feature ablation approach to estimate feature importance for multi-omics data.

For each feature in each modality:

  • Temporarily mask (zero out) the feature.

  • Evaluate the model on the test set.

  • Measure the performance drop (e.g., in F1 score). The larger the drop, the more important the feature is.

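  • Importance is calculated as \(\text{Importance}_j=(\text{FullMetric}-\text{MaskedMetric}_j)\times d\), where \(j\) is the feature index, \(\text{FullMetric}\) is the score with no feature masked, \(\text{MaskedMetric}_j\) is the score with feature \(j\) masked, and \(d\) is the number of features in the modality (used to scale the effect).

For demonstration, we use the F1 score as the metric to calculate feature importance.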
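The masking procedure above can be sketched in a few lines of NumPy. This is a minimal illustration and not PyKale's implementation: the names `masking_importance`, `f1_macro`, and `toy_predict` are hypothetical, the model is replaced by a toy rule that only looks at the first feature of the first modality, and the macro average is an assumption about how the F1 score is aggregated across classes.

```python
import numpy as np


def f1_macro(y_true, y_pred):
    """Macro-averaged F1 over the classes present in y_true (assumed averaging)."""
    scores = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


def masking_importance(predict, modalities, y_true, metric=f1_macro):
    """Return one importance array per modality.

    predict is a hypothetical callable that maps a list of
    (n_samples, d_m) arrays to predicted labels.
    """
    full = metric(y_true, predict(modalities))
    importances = []
    for m, x in enumerate(modalities):
        d = x.shape[1]
        scores = np.zeros(d)
        for j in range(d):
            # Copy only the modality being masked; leave the others untouched.
            masked = [a.copy() if i == m else a for i, a in enumerate(modalities)]
            masked[m][:, j] = 0.0  # temporarily zero out feature j
            # Importance_j = (FullMetric - MaskedMetric_j) * d
            scores[j] = (full - metric(y_true, predict(masked))) * d
        importances.append(scores)
    return importances


def toy_predict(modalities):
    """Toy stand-in for a trained model: uses only feature 0 of modality 0."""
    return (modalities[0][:, 0] > 0).astype(int)


# Two toy modalities with 3 and 2 features, four samples, two classes.
x0 = np.array([[1.0, 5.0, 2.0], [2.0, 1.0, 3.0], [-1.0, 4.0, 2.0], [-2.0, 2.0, 1.0]])
x1 = np.array([[0.5, 0.1], [0.2, 0.3], [0.9, 0.4], [0.7, 0.8]])
y = np.array([1, 1, 0, 0])

importances = masking_importance(toy_predict, [x0, x1], y)
top_feature = int(np.argmax(importances[0]))  # index of the most important feature
```

In this toy setup only the feature the rule depends on receives a non-zero score; masking any other feature leaves the predictions, and therefore the metric, unchanged. Multiplying by \(d\) makes the raw drop larger for modalities with more features, which is the scaling described above.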

Full results of the interpretation study

For reference, we list the most important features reported in the original paper:

Table 3: Important features in BRCA dataset.

| Omics | Important features |
| --- | --- |
| mRNA expression | SOX11, AMY1A, SLC6A15, FABP7, SLC6A14, SLC6A2, FGFBP1, DSG1, UGT8, ANKRD45, PI3, SERPINB5, COL11A2, ARHGEF4, SOX10 |
| DNA methylation | GPR37L1, MIR563, OR1J4, ATP10B, KRTAP3-3, FLJ41941, TMEM207, CDH26, MT1DP |
| miRNA expression | hsa-mir-205, hsa-mir-187, hsa-mir-452, hsa-mir-20b, hsa-mir-224, hsa-mir-204 |

Table 4: Important features in ROSMAP dataset.

| Omics | Important features |
| --- | --- |
| mRNA expression | NPNT, CDK18, KIF5A, SPACA6, TCEA3, SYTL1, ARRDC2, APLN |
| DNA methylation | TMC4, AGA, HYAL2, CCL3, TTC15 |
| miRNA expression | hsa-miR-423-3p, hsa-miR-33a, hsa-miR-640, hsa-miR-362-3p, hsa-miR-491-5p, hsa-miR-206, hsa-miR-548b-3p, hsa-miR-127-3p, hsa-miR-106a_hsa-miR-17, hsa-miR-424, hsa-miR-577, hsa-miR-873, hsa-miR-651, hsa-miR-199b-5p, hsa-miR-192, hsa-miR-199a-5p, hsv1-miR-H1 |