Tutorial 0 - Environment Setup and Configuration

In this tutorial, we will cover the essential steps to set up your development environment on Google Colab and to configure your project using .yml files.

Step 1: Open Google Colab and Set Up the Environment

To begin, you need to open the Google Colab environment where you will run your Jupyter notebooks.

For each executable tutorial in this book, you will find a rocket icon near the top-right area. Click this icon to reveal the “Open in Colab” button, then click the button to launch the notebook in Google Colab.

Change Google Colab Runtime Type

Google Colab provides different runtime types, including CPU, GPU, and TPU. For most machine learning tasks, using a GPU is recommended for faster computation. Click on the “Runtime” option in the top-left menu, then select “Change runtime type” and choose T4 GPU as the hardware accelerator.
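After switching the runtime, it can be useful to confirm that a GPU is actually attached. The cell below is a minimal sketch using only the standard library; it assumes the runtime exposes the `nvidia-smi` command-line tool, which is typical for NVIDIA GPU runtimes but is not guaranteed by this tutorial. The helper name `gpu_visible` is hypothetical.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA driver tools found on this runtime
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False` on Colab, the runtime type has probably not been changed yet or the session needs to be reconnected.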

Install Required Packages

In this tutorial, we will need to install the yacs package, which will be used later for configuration management.

Run the code cell below to install the package:

!pip install yacs

For the other tutorials in this book, you will need to install additional packages, including PyKale, as specified in each tutorial.
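To check that an installation succeeded before moving on, one option is to ask Python whether the module can be found. This is a small sketch, not part of the original tutorial; the helper name `is_installed` is hypothetical, and it only handles top-level module names such as `yacs`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ["yacs"]:
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

The same loop can be extended with the package names each later tutorial asks you to install.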

Copy Files to Colab Runtime Storage

For tutorials in this book, we will need to download files or link shared Google Drive folders to the Colab runtime local storage.

Run the following code cell to copy the necessary files from the workshop repository to your Colab runtime local storage.

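import os
import site
import sys
import warnings

# Silence warnings in this process and in Python subprocesses started from it.
warnings.filterwarnings("ignore")
os.environ["PYTHONWARNINGS"] = "ignore"

# Only run the setup when the notebook is executing on Google Colab.
if "google.colab" in str(get_ipython()):
    # Make packages installed to the user site directory importable.
    sys.path.insert(0, site.getusersitepackages())
    # Clone the repository, copy this tutorial's files to /content/, then clean up.
    !git clone --single-branch -b main https://github.com/pykale/mmai-tutorials
    %cp -r /content/mmai-tutorials/tutorials/setup-config/* /content/
    %rm -r /content/mmai-tutorials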
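Once the copy has finished, a quick check can confirm that the files this tutorial relies on (config.py, configs/base.yml, and configs/alternative.yml) are in place. The following is a sketch under the assumption that the files were copied to /content; the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# Relative paths of the files used later in this tutorial.
EXPECTED = ["config.py", "configs/base.yml", "configs/alternative.yml"]


def missing_files(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


missing = missing_files("/content")
print("All files present" if not missing else f"Missing: {missing}")
```

Outside Colab, where /content usually does not exist, the check simply reports every file as missing.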

(Optional) Link Colab Runtime Storage to Google Drive

For some tutorials, the data files have been downloaded to a shared Google Drive folder. You can access these files by linking your Google Drive to the Colab runtime storage and creating a shortcut to the shared folder in your Google Drive.

Run the following code cell to link your Google Drive to the Colab runtime storage:

from google.colab import drive

drive.mount("/content/drive")
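Because mounting Google Drive is optional, code that reads data may need to fall back to local storage when the Drive folder is absent. The sketch below shows one way to do that. It assumes that a mounted Drive appears under /content/drive/MyDrive, and the folder name `shared-data` and the helper `pick_data_dir` are hypothetical placeholders, not names used by the tutorials.

```python
from pathlib import Path


def pick_data_dir(drive_dir, local_dir):
    """Prefer the Drive folder when it exists, otherwise use local storage."""
    drive_dir, local_dir = Path(drive_dir), Path(local_dir)
    return drive_dir if drive_dir.is_dir() else local_dir


# "shared-data" is a hypothetical shortcut name; replace it with your own.
data_dir = pick_data_dir("/content/drive/MyDrive/shared-data", "/content")
print("Using data directory:", data_dir)
```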

Step 2: File Organization and Configuration

File Structure

Click the folder icon on the left sidebar to open the file explorer. You should see the directory structure below:

    ├───tutorial-0.ipynb
    ├───config.py
    ├───configs
    │   ├───base.yml
    │   ├───alternative.yml

The other tutorials follow a similar standardized directory structure, with additional files as shown below:

    ├───tutorial-**.ipynb
    ├───model.py
    ├───config.py
    ├───configs
    │   ├───base.yml
    │   ├───**.yml
    ├───images
    │   ├───**.png
    │   ├───**.jpg
    ├───helpers
    │   ├───**.py
    ├───extend-reading
    │   ├───**.md

Configuration Using .yml Files

As shown in the file structure above, each tutorial has a config.py script that defines the default configuration settings. The base configurations defined in the config.py file for this toy tutorial are as follows:

from yacs.config import CfgNode

_C = CfgNode()

# Dataset configuration
_C.DATASET = CfgNode()
# Path to the dataset directory
_C.DATASET.DATA_DIR = "/content/"

# Model configuration
_C.MODEL = CfgNode()
# Type of model to use
_C.MODEL.NAME = "MyModel"

These default values can be overridden by the .yml files in the ./configs directory. Run the following code cell to load configs/base.yml, which overrides the default MODEL.NAME value defined in config.py.

from config import get_cfg_defaults

cfg = get_cfg_defaults()
cfg.merge_from_file("configs/base.yml")

print(cfg)

You can see that the MODEL.NAME value has been overridden to "SVM" as specified in the base.yml file.

Now let’s change the configuration using the alternative.yml file.

cfg = get_cfg_defaults()
cfg.merge_from_file("configs/alternative.yml")

print(cfg)

You can see that the DATASET.DATA_DIR and MODEL.NAME values have been overridden to /data and "Transformer", respectively, as specified in the alternative.yml file.
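The override behaviour seen in both cells can be pictured as a recursive merge of nested settings. The sketch below uses plain dictionaries instead of yacs, so it illustrates the idea rather than the library's implementation. The `base` and `alternative` dictionaries contain only the overrides described in the text (the real .yml files may hold more), and rejecting unknown keys is an assumption about how a strict configuration system would behave.

```python
import copy


def merge_overrides(defaults, overrides):
    """Return a copy of `defaults` with values from `overrides` applied."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            # Assumption: unknown keys are rejected rather than silently added.
            raise KeyError(f"Non-existent config key: {key}")
        if isinstance(value, dict) and isinstance(merged[key], dict):
            merged[key] = merge_overrides(merged[key], value)  # recurse into groups
        else:
            merged[key] = value  # leaf value: the override wins
    return merged


defaults = {"DATASET": {"DATA_DIR": "/content/"}, "MODEL": {"NAME": "MyModel"}}
base = {"MODEL": {"NAME": "SVM"}}  # stands in for configs/base.yml
alternative = {  # stands in for configs/alternative.yml
    "DATASET": {"DATA_DIR": "/data"},
    "MODEL": {"NAME": "Transformer"},
}

print(merge_overrides(defaults, base))
print(merge_overrides(defaults, alternative))
```

Settings that an override file does not mention keep their default values, which is why base.yml leaves DATASET.DATA_DIR unchanged.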

The configuration system allows running the tutorials with different settings without modifying the core code, and therefore enables reproducibility and reusability of the code across different experiments.