Deep reinforcement learning to optimize fractional radiotherapy in head and neck cancer
Ahmad Nejati Shahidain,1,* Azam Hesami2
1. Department of Biomedical Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran; 2. Lab Solutions Company, Science and Technology Park, Shahid Beheshti University
Introduction: Radiation therapy is one of the main approaches to the treatment of head and neck cancer. Radiotherapy uses ionizing radiation to selectively kill cancer cells while sparing nearby healthy cells as much as possible. Treatment planning is a crucial step in achieving this goal and enhancing the quality of treatment. However, the current practice, which delivers the dose in a fixed, uniform fractionation schedule, does not take into account tumor-specific characteristics that may affect the treatment outcome. Reinforcement learning (RL) can be used to overcome these limitations. Unlike supervised learning, RL frames optimization as a sequential decision problem. Deep reinforcement learning (DRL), which combines RL with deep neural networks, has gained popularity in recent years.
Methods: The main objective of an RL agent is to find an optimal policy that solves the problem as efficiently and effectively as possible.
According to classifications introduced in previous work, the methods RL uses to find an optimal policy are usually divided into the following two categories.
1. Value-based approaches
2. Policy-search (policy-based) approaches
We focus on the former in this research because they typically show better efficiency and more stable performance, whereas the latter optimize the policy directly. Value-based approaches rely on a value function that estimates the expected return obtained by following a given policy, and this estimate is gradually refined as the agent explores the environment. The function that maps each state to this expected return is called the state-value function.
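As a concrete illustration of a value-based update, the following minimal sketch shows tabular Q-learning; the discretized states, dose actions, and parameter values are illustrative placeholders, not the representation used in this work.

    import numpy as np

    # Minimal tabular Q-learning sketch of a value-based update.
    # States (discretized tumor-volume bins) and actions (dose levels)
    # are hypothetical placeholders for illustration only.
    n_states, n_actions = 50, 5          # assumed discretization
    Q = np.zeros((n_states, n_actions))  # action-value estimates
    alpha, gamma = 0.1, 0.99             # learning rate, discount factor

    def q_update(s, a, reward, s_next):
        """One temporal-difference update of the action-value estimate."""
        td_target = reward + gamma * Q[s_next].max()
        Q[s, a] += alpha * (td_target - Q[s, a])

    def greedy_action(s):
        """Policy implied by the learned values: pick the best-valued action."""
        return int(Q[s].argmax())

As the agent interacts with the environment, repeated calls to q_update refine the value estimates, and the greedy policy derived from them approaches the optimal behavior.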
Because of the risks associated with radiation therapy, in vitro and, to an even greater extent, in vivo models are difficult to implement and require considerable time and money. To address these issues, we propose a computational model that represents the evolutionary dynamics of the radiotherapy process using data extracted directly from patients' CT scans.
Results: The most common model for predicting the radiobiological response of a cell population is the linear-quadratic (LQ) model, which is also used in current clinical practice to plan radiotherapy treatment. Despite its generality, the LQ model is ill-suited to the RL approach because it does not account for important secondary factors such as dose rate, the rate of repair of radiation damage, and dose fractionation. It also lacks a temporal component, since it does not describe the temporal evolution of the system. To overcome these limitations, we adopted the Γ-LQ model, which allows the evolution of a cell population to be followed over the entire course of treatment by introducing a set of ordinary differential equations (ODEs) that extend the LQ model over time. The Γ-LQ model is still limited, however, because it does not consider cell regrowth, an important secondary effect that we modeled using the Gompertz ODE. Among the mathematical models describing tumor population growth, the Gompertz model accounts for the decrease in growth rate as tumor volume increases and is the best-studied option.
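To make the combination of these components concrete, the sketch below alternates Gompertz regrowth between fractions with an LQ cell-kill term at each fraction; the radiosensitivity and growth parameters are illustrative values, not the patient-specific parameters estimated in this study, and the fractionation schedule is only an example.

    import numpy as np
    from scipy.integrate import solve_ivp

    # Illustrative parameter values; the study estimates patient-specific
    # parameters from CT scans, which are not reproduced here.
    alpha_r, beta_r = 0.3, 0.03   # LQ radiosensitivity (Gy^-1, Gy^-2)
    rho, K = 0.07, 1e12           # Gompertz growth rate (1/day), carrying capacity

    def gompertz(t, n):
        """Gompertz ODE: growth rate decreases as the population approaches K."""
        return rho * n * np.log(K / n)

    def lq_survival(dose):
        """Linear-quadratic surviving fraction for a single fraction of `dose` Gy."""
        return np.exp(-alpha_r * dose - beta_r * dose ** 2)

    def simulate(n0, doses, dt=1.0):
        """Alternate dt days of Gompertz regrowth with one LQ-kill fraction."""
        n = n0
        history = [n]
        for d in doses:
            n = solve_ivp(gompertz, (0, dt), [n]).y[0, -1]  # regrowth between fractions
            n *= lq_survival(d)                              # instantaneous cell kill
            history.append(n)
        return history

    # Example: 35 daily fractions of 2 Gy starting from 1e9 clonogenic cells.
    trajectory = simulate(1e9, [2.0] * 35)

Such a simulated trajectory is what the virtual environment exposes to the agent, so different fractionation schedules can be compared without exposing patients to risk.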
Conclusion: This research not only provides a personalized, volume-adaptive treatment, but also introduces a virtual radiotherapy environment for daily fractionation based on a set of ordinary differential equations that model tissue radiosensitivity by combining the effects of radiotherapy treatment and cell growth. Model parameters are estimated from routinely collected CT scans using a particle swarm optimization algorithm. This allows the DRL agent to learn the optimal behavior through an iterative process of trial and error with the environment. We conclude that the DRL approach can adapt to the radiation therapy process, optimize its behavior according to different objective functions, and ultimately outperform current clinical practice.
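To illustrate the parameter-estimation step, the following is a minimal particle swarm optimization sketch that fits a single Gompertz growth rate to synthetic volume observations; the data, the fitted parameter, and the PSO hyperparameters are all illustrative stand-ins for the CT-derived quantities and the full parameter set used in this study.

    import numpy as np

    # Synthetic "observations" generated from the closed-form Gompertz solution;
    # the study instead uses tumor volumes extracted from patients' CT scans.
    rng = np.random.default_rng(0)
    K, n0 = 1e12, 1e9
    t_obs = np.arange(0, 30, 5.0)                                    # days
    true_rho = 0.07
    v_obs = K * np.exp(np.log(n0 / K) * np.exp(-true_rho * t_obs))

    def loss(rho):
        """Squared error between model and observed log-volumes."""
        v_model = K * np.exp(np.log(n0 / K) * np.exp(-rho * t_obs))
        return np.mean((np.log(v_model) - np.log(v_obs)) ** 2)

    # Standard PSO update: velocity blends inertia, personal best, and global best.
    n_particles, iters = 20, 100
    pos = rng.uniform(0.01, 0.2, n_particles)   # candidate growth rates
    vel = np.zeros(n_particles)
    pbest = pos.copy()
    pbest_val = np.array([loss(p) for p in pos])
    gbest = pbest[pbest_val.argmin()]

    for _ in range(iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 1e-4, 1.0)
        vals = np.array([loss(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()]

    print("estimated growth rate:", gbest)   # should approach true_rho

Once parameters of this kind are fixed for a patient, the calibrated ODE environment drives the trial-and-error loop in which the DRL agent learns its fractionation policy.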