Overview

Turing Course: An Introduction to Transparent Machine Learning

Welcome to An Introduction to Transparent Machine Learning, part of the Alan Turing Institute’s online learning courses in responsible AI. This self-paced course introduces the essentials of transparent machine learning so that learners from diverse backgrounds can understand and apply it in real-world applications with confidence and trust, filling a gap in the online educational space.

We recast selected content from the popular, award-winning textbook An Introduction to Statistical Learning with Applications in R [James et al., 2021] from the perspective of system and process transparency, following the AI transparency framework in the AI in Financial Services report [Ostmann and Dorobantu, 2021] from the Alan Turing Institute, using Python. The textbook is written by leading academics, is widely used as a standard text in university courses on machine learning, and explains topics in a highly accessible way. Under the chosen framework, transparent machine learning systems cover fully transparent models such as linear regression and “semi-transparent” models such as deep learning, while transparent machine learning processes cover model evaluation and software development methodologies such as cross validation and the software development life cycle.
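To make the distinction between a transparent system and a transparent process concrete, here is a minimal sketch in plain Python. It is an illustration only and is not taken from the course materials: the toy data and the helper names fit_simple_linear_regression and k_fold_mse are assumptions made for this example. The fitted linear regression is a transparent system because every prediction can be read off from two numbers (an intercept and a slope), and cross validation is a transparent process because each step of the evaluation can be inspected.

```python
# Illustrative sketch only: the toy data and helper names are hypothetical,
# not part of the course materials.

def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def k_fold_mse(xs, ys, k):
    """Estimate test error (mean squared error) with k-fold cross validation."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Every k-th observation is held out; the rest are used for fitting.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        intercept, slope = fit_simple_linear_regression(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        squared_errors = [
            (ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx
        ]
        fold_errors.append(sum(squared_errors) / len(squared_errors))
    return sum(fold_errors) / k


# Made-up data lying close to the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]

intercept, slope = fit_simple_linear_regression(xs, ys)
print(f"prediction = {intercept:.3f} + {slope:.3f} * x")
print(f"3-fold cross-validated MSE: {k_fold_mse(xs, ys, k=3):.4f}")
```

In practice a library such as scikit-learn would typically replace both helper functions; they are written out here only so that each step is visible.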

This course is developed as a PyKale repository for deployment as an Alan Turing Institute repository. We welcome your feedback and contributions via issues, discussions, and/or pull requests.

Watch the 10-minute video below for a conversation on this course between Lily Clements and Haiping Lu.

Learning objectives [LOs]

By the end of the course, we expect learners to be able to demonstrate the ability to:

  • [LO1] Understand the theoretical issues and wider context related to transparent machine learning systems and processes.

  • [LO2] Understand the inner workings of a selected number of transparent machine learning algorithms, and interpret the modelling process and the input-output relationships.

  • [LO3] Deploy a practical implementation of transparent machine learning systems and processes in a real-world setting using Python libraries such as scikit-learn.

  • [LO4] Visualise, interpret, and explain transparent machine learning systems and processes in a real-world setting to help stakeholders understand these systems and processes.

  • [LO5] Apply machine learning confidently to their own area of work, regardless of background.

Target audience

Learners from diverse backgrounds, preferably with knowledge and skills in basic mathematics (particularly probability and linear algebra) and Python programming for machine learning. We suggest that those lacking such knowledge and skills go through the Prerequisites and pass the quiz there first.

Discussion forum for Q&A etc

We have a discussion forum as our primary communication channel, e.g. for Q&A, information sharing, and discussion. We also provide a rich set of labels for you to label your posted messages so that similar messages can be easily found. Please ask questions or post information there, rather than emailing the instructors directly. This will help others to benefit from the answers and help build an engaging community. You will need a GitHub account to post questions. Please follow the Code of Conduct when using the discussion forum.

How to go through the course

The course is designed to be self-paced and self-directed. The course materials are organised into a Jupyter Book consisting of 10 chapters, plus prerequisites. Each chapter has a number of sections. Each section is a self-contained unit of learning, mostly taking no more than 30 minutes to complete and ending with an exercise. Answers to exercises are provided but hidden, so that in most cases you can attempt the exercises before checking the solutions. Each chapter has a quiz for assessment. Answers to quizzes are NOT provided. You are advised to score at least 50% before proceeding to the next topic.

For sections prepared as executable Jupyter notebooks, you will see a rocket icon at the top (mid-right). For these sections, we suggest that you first go through the materials in the browser (HTML) and then launch the corresponding Jupyter notebook via the rocket icon (preferably in Colab) to run the code and complete the exercises. Optional: you can also download the Jupyter notebooks and run them locally on your machine after installation.

We designed this course to be self-contained so that most learners can complete it in 40 hours. For those short of time and those who already have the relevant knowledge and skills, going through the materials here should be sufficient. For those lacking such knowledge and skills, those who want to go deeper, and those with more time available, we suggest that you (re)read the textbook An Introduction to Statistical Learning alongside this course. You can buy the book or read the free online version in PDF.

Course outline and structure

Besides the prerequisites, the 10 chapters can be grouped into two parts: part one covers primary topics and part two covers secondary topics.

  • Prerequisites: Basic mathematics and Python programming for machine learning

Primary topics

  • Chapter 1: [System and Process] Introduction to machine learning and transparency

  • Chapter 2: [System] Linear regression

  • Chapter 3: [System] Logistic regression

  • Chapter 4: [Process] Hypothesis testing and software development

  • Chapter 5: [Process] Cross validation and bootstrap

Secondary topics

  • Chapter 6: [Process and System] Feature selection and regularisation

  • Chapter 7: [System and Process] Trees and ensembles

  • Chapter 8: [System] Generalised linear models and support vector machines

  • Chapter 9: [System] Principal component analysis and K-means/hierarchical clustering

  • Chapter 10: [System] Neural networks and deep learning

Instructors and contributors

Haiping Lu
Shuo Zhou

This course is developed by Haiping Lu, a Professor of Machine Learning, and Shuo Zhou, an Academic Fellow in Machine Learning, both at the Department of Computer Science, The University of Sheffield. H. Lu has developed and taught two machine learning courses: Machine Learning and Adaptive Intelligence, from 2018 to 2021, and Scalable Machine Learning, since 2017. S. Zhou was a head teaching assistant for the Machine Learning and Adaptive Intelligence course in fall 2019 and has been teaching the Scalable Machine Learning course since spring 2023.

Mohammod Naimul Islam Suvon contributed all the course exercises and quizzes. Lawrence Schobs reviewed all teaching materials, provided detailed comments, added summaries and references, and refined the materials. After that, H. Lu and S. Zhou checked and refined the materials again for the release.

We are grateful for the valuable feedback received from Catherine Inness, a Data Science Senior Manager at Accenture.

We welcome and recognise all contributions. You can see a list of current contributors in the contributors tab.

Acknowledgements

License

All content except for YouTube videos is released under the MIT License. YouTube videos are embedded according to YouTube’s Terms of Service.