Extended Exploration

For those eager to explore further, or simply curious whether tweaking the configuration improves performance, we have prepared a set of optional tasks. They are designed to encourage experimentation without requiring significant changes to the core code.


Task 1: Explore Different Atlases and FCs (20–120+ minutes)

We provide multiple brain atlases and functional connectivity (FC) embeddings to experiment with. You can create a new configuration file or simply uncomment the cfg.DATASET.ATLAS and cfg.DATASET.FC options in the configuration section.

Open-ended questions:

  • Is selecting an appropriate atlas beneficial for building accurate brain disorder diagnosis models?

  • If so, how much improvement can we expect from choosing the best atlas?

  • Does the best-performing atlas help interpret and localize key ROIs relevant to ASD?

  • Or is the choice of atlas less impactful than the choice of functional connectivity method?
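
One way to approach the last question is to run every atlas and FC combination, record one score per run, and compare how much the score moves along each factor. The sketch below is only an illustration: the atlas and FC names are placeholders, the scores are dummy numbers rather than results from this tutorial, and `marginal_spread` is a hypothetical helper, not part of PyKale.

```python
from collections import defaultdict
from statistics import mean


def marginal_spread(scores):
    """Return (atlas_spread, fc_spread) for a {(atlas, fc): score} mapping.

    The spread of a factor is the range (max - min) of its marginal means,
    i.e. the score averaged over all levels of the other factor.
    """
    by_atlas, by_fc = defaultdict(list), defaultdict(list)
    for (atlas, fc), score in scores.items():
        by_atlas[atlas].append(score)
        by_fc[fc].append(score)
    atlas_means = [mean(values) for values in by_atlas.values()]
    fc_means = [mean(values) for values in by_fc.values()]
    return max(atlas_means) - min(atlas_means), max(fc_means) - min(fc_means)


# Placeholder names and dummy numbers only; replace with scores from your own runs.
dummy_scores = {
    ("atlas_a", "fc_x"): 0.60,
    ("atlas_a", "fc_y"): 0.70,
    ("atlas_b", "fc_x"): 0.62,
    ("atlas_b", "fc_y"): 0.72,
}
atlas_spread, fc_spread = marginal_spread(dummy_scores)
print(f"atlas spread: {atlas_spread:.3f}, FC spread: {fc_spread:.3f}")
```

A larger spread for one factor suggests that it matters more in your runs. With only one score per combination this is a rough indication, so repeating runs with different seeds or folds would make the comparison more trustworthy.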


Task 2: Better Phenotypes? (30–60+ minutes)

Our results show that using site labels alone already improves performance. In contrast to Kunda et al. (2022), who used site labels for domain adaptation and treated other phenotypic variables merely as additional features, our implementation in PyKale can integrate all available phenotypic variables into the domain adaptation process when they are specified.

This raises a key question: could leveraging a richer set of phenotypes beyond site information further enhance multi-site model generalization?

Questions to explore:

  • Is the site label alone truly sufficient for effective multi-site data integration?

  • Are there phenotypes with distinct distributions across sites that may introduce bias or noise?

  • Can incorporating those phenotypes improve performance beyond site-only models?
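
A quick way to look for phenotypes whose distribution differs across sites is to summarize each phenotype per site with pandas. The following is a minimal sketch on a toy table: the column names (`SITE_ID`, `AGE_AT_SCAN`, `SEX`) and all values are illustrative assumptions, so substitute the columns of the phenotypic table you loaded earlier.

```python
import pandas as pd

# Toy phenotypic table; column names and values are illustrative assumptions.
pheno = pd.DataFrame(
    {
        "SITE_ID": ["A", "A", "A", "B", "B", "B"],
        "AGE_AT_SCAN": [10.0, 12.0, 14.0, 25.0, 27.0, 29.0],
        "SEX": ["M", "M", "F", "F", "F", "M"],
    }
)

# Numeric phenotype: mean, standard deviation, and sample count per site.
age_by_site = pheno.groupby("SITE_ID")["AGE_AT_SCAN"].agg(["mean", "std", "count"])

# Categorical phenotype: proportion of each category within each site.
sex_by_site = pd.crosstab(pheno["SITE_ID"], pheno["SEX"], normalize="index")

print(age_by_site)
print(sex_by_site)
```

Phenotypes whose per-site summaries differ strongly are natural candidates to include alongside the site label, and they are also the ones most likely to act as confounders.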

Warning

As seen previously, the phenotypic data contain many missing values. This task may therefore be challenging if you are unfamiliar with Python and pandas, because it can require hand-crafted encoding or imputation, as done in preprocess_phenotypic_data.
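
To give a feel for the kind of handling involved, here is a minimal sketch of inspecting missingness per site, imputing, and one-hot encoding with pandas. It does not reproduce preprocess_phenotypic_data; the column names, the median imputation, and the "unknown" category are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy phenotypic table with gaps; column names and values are illustrative assumptions.
pheno = pd.DataFrame(
    {
        "SITE_ID": ["A", "A", "A", "B", "B", "B"],
        "AGE_AT_SCAN": [10.0, 12.0, np.nan, 25.0, 27.0, 29.0],
        "HANDEDNESS": ["R", None, "R", "L", None, None],
    }
)

# Fraction of missing values per site and column.
missing_by_site = (
    pheno.drop(columns="SITE_ID").isna().groupby(pheno["SITE_ID"]).mean()
)

# Simple imputation: overall median for the numeric column,
# an explicit "unknown" category for the categorical column.
imputed = pheno.copy()
imputed["AGE_AT_SCAN"] = imputed["AGE_AT_SCAN"].fillna(imputed["AGE_AT_SCAN"].median())
imputed["HANDEDNESS"] = imputed["HANDEDNESS"].fillna("unknown")

# One-hot encode the categorical phenotype so it can be used as a numeric covariate.
encoded = pd.get_dummies(imputed, columns=["HANDEDNESS"])

print(missing_by_site)
print(encoded.columns.tolist())
```

Columns that are missing almost entirely at some sites may be better dropped than imputed, since the imputed value would otherwise encode the site itself.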


Task 3: More Sites → Better Generalization? (20–60+ minutes)

With ten sites, domain adaptation improves generalization under the leave-p-groups-out (LPGO) setting. This raises new questions:

Things to consider:

  • Does domain adaptation continue to help as we include more sites, or is the benefit limited to fewer-site scenarios?

  • Is there a saturation point where adding more sites stops improving generalization, or even worsens it?

  • Could fewer but more homogeneous sites be better than many heterogeneous ones?
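
To study these questions, you could repeat the evaluation on progressively larger subsets of sites and compare the held-out-site scores. The sketch below shows only the splitting logic, using scikit-learn's `LeavePGroupsOut` on synthetic stand-in data. Whether it matches the tutorial's own cross-validation setup exactly is an assumption, and `lpgo_splits` is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import LeavePGroupsOut

# Synthetic stand-in data: 5 sites with 4 subjects each (not the tutorial's data).
rng = np.random.default_rng(0)
sites = np.repeat(np.arange(5), 4)
X = rng.normal(size=(sites.size, 3))
y = rng.integers(0, 2, size=sites.size)


def lpgo_splits(n_sites, p=1):
    """Keep the first `n_sites` sites and return their leave-p-groups-out splits."""
    keep = sites < n_sites
    cv = LeavePGroupsOut(n_groups=p)
    splits = list(cv.split(X[keep], y[keep], groups=sites[keep]))
    return sites[keep], splits


# Number of train/test splits as more sites are included (p=1 holds out one site).
n_splits = {k: len(lpgo_splits(k)[1]) for k in range(2, 6)}
print(n_splits)
```

For each subset size, you would fit the model on every training split, score it on the held-out site or sites, and plot the mean score against the number of sites. Choosing which sites enter each subset (for example, by sample size or by scanner similarity) is itself part of the experiment.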


These tasks are designed to help you dive deeper into model robustness, generalizability, and interpretability in real-world neuroimaging settings. Feel free to explore, question, and iterate!

We hope you enjoyed this tutorial!

Exercise Solutions