2D- and 3D-QSAR, molecular docking, and virtual screening of pyrido[2,3-d]pyrimidin-7-one-based CDK4 inhibitors

Cyclin-dependent kinases (CDKs) are important targets for combating various types of cancer. Inhibitors of the CDK4 enzyme are promising agents for clinical use as anticancer agents. In this study, structure-based and ligand-based drug design methods were applied on a dataset of 52 pyrido[2,3-d]pyrimidin-7-one-based CDK4 inhibitors. Predictive 2Dand 3D-quantitative structure-activity relationship (QSAR) models were developed and were analyzed for understanding the important molecular properties that affect the activity. Molecular docking was conducted to analyze the binding interactions between the ligands and the target enzyme. Also, virtual screening of the ChEMBL database was carried out using the validated QSAR model and the molecular docking procedure. A total of six compounds were identified as potentially novel CDK4 inhibitors that have favorable drug-like properties and can serve as lead compounds for the development of anticancer therapeutic agents.


INTRODUCTION
Cyclin-dependent kinases (CDKs) are members of the serine/threonine protein kinases family and are important modulators of the cell cycle process. Targeting the CDKs for the treatment of various types of cancer has been shown to be a valid and promising therapeutic approach (Wenzel and Singh, 2018). The CDK4 enzyme is involved in controlling the progression of the cell cycle's G1 phase and inhibiting this enzyme prevents cell division. As the CDK4 has been observed to be overexpressed in several types of cancer, it has gained attention as a target for small molecules anticancer agents, in particular for the treatment of breast cancer (Bendris et al., 2015;Brown et al., 2015;Pandey et al., 2019). Derivatives of the pyrido[2,3-d]pyrimidin-7one scaffold (shown in Table 1) have been shown to have CDK4 inhibitory activity with modest selectivity towards other kinases (Barvian et al., 2000). However, very limited computational studies have been applied to this class of CDK4 inhibitors. In this study, computer-aided drug design (CADD) methods have been applied to a dataset of CDK4 inhibitors to gain a better understanding and insights into the binding process and the structural requirements for achieving high activity. Ligand-based CADD methods including 2Dand 3D-quantitative structure-activity relationship (QSAR) were used to develop quantitative models that can be used for predicting the activity of novel compounds as well as for demonstrating the important physicochemical properties that affect the activity of the compounds (Kubinyi et al., 2006;Lewis and Wood, 2014). Also, molecular docking, which is a structure-based CADD method, was used to elaborate the binding interactions of the compounds inside the active site of the CDK4 enzyme (Pinzi and Rastelli, 2019;Torres et al., 2019). Finally, virtual screening was carried out on the ChEMBL database by applying the developed and validated QSAR model and the molecular docking to identify potentially active novel CDK4 inhibitors with improved pharmacokinetic properties (Gaulton et al., 2017;Guedes et al., 2018;Neves et al., 2018).

Dataset
The dataset selected for molecular modeling studies compromises 52 CDK4 inhibitors based on the pyrido[2,3-d] pyrimidin-7-one scaffold ( Table 1) that has a phenyl ring attached to secondary amine (Barvian et al., 2000). The structures of the dataset and their reported pIC50 values are listed in Table 1. All the compounds have been assayed using the same method and conditions. The activity of the compounds spans a wide range of more than three orders of magnitude with a good activity distribution. These properties make the dataset suitable for molecular modeling studies, in particular for QSAR analysis (Tropsha, 2010).

2D-QSAR modeling
The 2D-QSAR method uses molecular descriptors that are based on the graph of the structure (i.e., 2D information) and correlates those descriptors with the activity (Lewis and Wood, 2014). In this study, the PaDEL-Descriptor software was used to calculate a total of 1,444 2D-descriptors (Yap, 2011). The dataset was divided into a training set (75%, 39 compounds) and a test set (25%, 13 compounds) using the Kennard--Stone algorithm (Puzyn et al., 2011). Table 1 indicates the training and test set compounds. A common pretreatment process was applied on the calculated descriptors, including the removal of descriptors with constant or near-constant values (a variance cutoff of 0.001 was used) and the removal of highly intercorrelated descriptors (a correlation cutoff of 0.6 was used). The remaining descriptors were used to develop the 2D-QSAR model using multiple linear regression (MLR). A genetic algorithm (GA) was used as a variable selection method (Ambure et al., 2015). The developed model was validated internally and externally using common validation procedures. Internal validation included leave-one-out (LOO) cross-validation (CV), as well as calculation of common parameters such as the root mean square error (RMSE) and correlation coefficient of CV (Q 2 LOO ). The test set was used for external validation to assess the predictive ability of the model (Roy et al., 2016;Veerasamy et al., 2011).

Alignment
The alignment of structures is a crucial step in the process of 3D-QSAR modeling. In this study, two different alignment methods implemented in the Open3DAlign software were used, namely, the atom-based and the pharmacophore-based methods . Both methods require a template for aligning the other structures onto it. Compound 43 was used as the alignment template as it is the most active compound. The atombased method operates by attempting to match the atoms of the template structure and the atoms of the probe structure (i.e., the structure to be aligned). The matching of the atoms is based on the properties of the atoms such as the partial charge . The pharmacophore-based method represents the structures by their pharmacophore features only, and then it attempts to align the structures in a way that maximizes the overlap of their pharmacophore features that are of the same type (Taminau et al., 2008). Both methods were applied to generate two alignments. The alignment with the higher score was selected for developing the 3D-QSAR model.

Model development
The aligned structures were used for developing 3D-QSAR models. The Open3DQSAR software was used for calculating the molecular interactions fields (MIFs) as well as for partial least squares regression (PLS) analysis . Electrostatic and steric MIFs were calculated and the common procedure for variables pretreatment was carried out including the truncation of grid points with extreme energy values (cut off = 30.0 kJ/mol) and the exclusion of variables with constant or near-constant values. Following the variables pretreatment processes, PLS regression was used to develop the statistical models (Kubinyi et al., 2006). The number of PLS components was determined based on the value of the Q 2 . The models were validated internally and externally in a similar manner as described for the 2D-QSAR model.

Molecular docking
Molecular docking is an essential tool in the drug design and discovery process. In this study, docking was used to understand the ligand-protein interactions as well as in virtual screening to identify novel CDK4 inhibitors (Torres et al., 2019). The AutoDock Vina software was used for docking the structures (Trott and Olson, 2010). Although several crystal structures of the CDK4 enzyme are available in the Protein Data Bank (PDB) database, none of them has a cocrystallized ligand, and they are in an inactive state. In this study, a hybrid model of the CDK4 enzyme constructed by Shafiq et al. (2012) was used for the docking experiment. Briefly, the model is constructed using a homology modeling approach from both a CDK4 crystal structure (inactive state) and a CDK2 crystal structure (in the active state) to produce a CDK4 hybrid model in the active state. The model was further refined and validated using molecular dynamics simulation.
Although the CDK4 hybrid model is in the active state, it lacks a ligand in the active site. To do a redocking experiment and validate the docking procedure, the CDK4 hybrid model was superimposed onto a crystal structure of a CDK6 (PDB-ID: 2EUF) enzyme that has a cocrystallized ligand, and then the protein of the CDK6 was removed, leaving the ligand in the hybrid model active site as a ligand-protein complex. The sequence identity between the CDK4 and CDK6 is 68%. This strategy has been successfully used by Al-Warhi et al. (2020). Redocking of the ligand and measurement of the root mean square deviation (RMSD) was carried out to assess the accuracy of the docking procedure.

Virtual screening
To discover novel potentially active CDK4 inhibitors with improved drug-like properties, virtual screening of the ChEMBL database compounds was carried out. The ChEMBL database is widely used in drug discovery as it contains a large number of curated drug-like compounds (Gaulton et al., 2017). Initially, a substructure search was carried out to retrieve compounds from the database that have the general scaffold of the compounds used in the study. In the next step, the compounds were filtered using the Lipinski rule of five to retain only the compounds with good absorption, distribution, metabolism, and excretion (ADME) profile (Pollastri, 2010). The activities of the remaining compounds were predicted using the 3D-QSAR model; initially, the alignment was carried out onto the template structure, and then the developed model was applied to predict the activities. Compounds with submicromolar predicted activities were retained while others were removed. In the final step, the remaining compounds were docked into the target enzyme using the validated docking procedure, and compounds that were successfully docked were visually inspected and selected as potential novel CDK4 inhibitors.

2D-QSAR modeling
A total of five descriptors were selected by the GA-MLR method, the descriptors alongside their contribution and description are shown in Table 2 The initial term in the equation corresponds to the Y-intercept and the standard deviation values are in the parenthesis. The statistical parameters of the obtained 2D-QSAR model via GA-MLR are presented in Table 3. The model exhibited significant internal and external validation parameters, as demonstrated by a Q 2 -value of 0.629, which is higher than the threshold value (0.5) for considering the model predictive. Similarly, the coefficient of determination (R 2 (test) ) value of 0.697 is higher than the threshold value (0.6) for considering the model predictivity (Golbraikh and Tropsha, 2002). Also, both the RMSE and RMSE (test) have Table 2. The molecular descriptors selected by the GA-MLR method.

VR1_Dt
Negative Randic-like eigenvector-based index from detour matrix

MDEO-12
Negative Molecular distance edge between all primary and secondary oxygens

VE1_Dzv
Positive Coefficient sum of the last eigenvector from Barysz matrix/weighted by van der Waals volumes relatively low values, indicating the validity of the model (Wenzel and Singh, 2018). Figure 1 shows the plot of the predicted activity by the model and the observed activity of the compounds.

3D-QSAR modeling
The pharmacophore-based alignment of the compounds was used for 3D-QSAR modeling as it demonstrated a higher score than the atom-based alignment methods. The aligned structures are shown in Figure 2. The 3D-QSAR model exhibited more statistically significant parameters compared to the 2D-QSAR model. In terms of internal validation, the 3D-QSAR model showed a higher value of R 2 , F-value, and Q 2 while the RMSE was lower than the 2D-QSAR model. Also, the external validation parameter R 2 (test) is higher, and the RMSE (test) value was lower, indicating a more predictive model. In general, the improved validation parameters of the 3D-QSAR model indicate the importance of the 3D conformations of the structures in the binding process to the target protein. As the 3D-QSAR model was validated internally and externally and demonstrated predictive ability, the contour maps of the steric and electrostatic fields were visualized onto the most active compound (43).

3D-QSAR visualization
Visualization of the contour maps of the 3D-QSAR model allows for the identification of regions where electrostatic or steric substituents improve or reduce the activity. Figure 3a shows the visualization of the electrostatic field onto compound 43. The red contours represent regions where negatively charged substituents or hydrogen bond acceptors are favorable for high activity. The most significant region is near the phenyl ring, which can be explained by the partially negative nature of the aromatic ring due to the cloud of pi-electrons. Also, the partial negative   charge of the oxygen directly attached to the phenyl ring which can act as a hydrogen bond acceptor in compound 43 contributes to this region. The blue areas represent regions where negatively charged groups or hydrogen bond acceptors reduce the activity.
The most significant areas are positioned near the R 1 group. They can be explained as the compounds with low activity contain substituents with negative partial charges; for instance, compound 19 which is the least active compound contains an ester substituent; similarly, compounds 16, 4, and 14 contain substituents with partially negative ether groups. Figure 3b shows the visualization of the steric field's contour map onto compound 43. The green regions indicate areas where bulky substituents can increase the activity. The most significant contour region is around the R 1 substituent. They can be seen as the most active compounds of the series that have a cycloalkyl group at this position. For instance, the three highest active compounds 43, 45, and 48 contain cyclopentyl, cyclohexyl, and cycloheptyl substituents at this position, respectively. On the other hand, most compounds with moderate to low activity have methyl or polar substituents at this position. Also, a similarly significant area extended from the R 2 position is present. This indicates that analogs with extended R 2 substituents can be beneficial for activity, as in compounds 43 and 48. The yellow regions indicate areas where bulky substituents decrease the activity. The most significant region is extended near the R 1 position; this shows that substituents with high extension from the R 1 position can be detrimental to the activity. The least active compounds 19, 25, and 16 have a substituent with an alkyl chain from this position. Overall, a cycloalkyl group directly attached to the R 1 without extension is the optimum substituent for high activity.

Molecular docking
The redocking experiment of the ligand in the CDK4 hybrid model successfully reproduced a close conformation with an RMSD value of 1.292 Å, indicating the validity of the docking accuracy. The cocrystallized ligand in the 2EUF PDB record is highly similar to the structures of this study. Figure 4 shows the superimposed docked ligand on the original one.
The binding interactions between compound 43 and the residues of the active site are shown in Figure 5. The amine N acts as a hydrogen bond donor forming a hydrogen bond with VAL96; the distance of the interaction was determined to be 2.7 Å, which makes it in the optimum range for a strong hydrogen bond interaction. The other hydrogen bond is formed between the side chain of HIS95 and the ring N at a distance of 3.0 Å (Fabiola et al., 2002). The bi-ring system is stabilized in the binding pocked via several hydrophobic and pi-alkyl interactions with the hydrophobic residues including ALA33, LEU147, VAL72, and ALA157. Those hydrophobic interactions seem to have a crucial role in positioning the structure inside the active site. The cyclopentyl ring is extended in the hydrophobic pocket making favorable hydrophobic interactions with VAL20. The phenyl ring is positioned feasibly to form amide-pi stacked interaction with a backbone amide group of ASP97. Also, the hydrophobic side chain of ILE12 is positioned at a close distance to the phenyl ring for the formation of hydrophobic interactions. Figure 6 summarizes the effects of the different groups on the activity of the compounds deduced from the 3D-QSAR interpretation. Also, the interactions of different parts of the structure elaborated from the molecular docking are shown on the most active compound (43).

Virtual screening
The substructure query of the ChEMBL database using the general scaffold of the studied compounds retrieved a total of 445 matching structures. These structures were preprocessed and filtered using the Lipinski rule of five to only retain compounds with good ADME properties. A total of 226 compounds were retrieved after this step. Those remaining compounds were aligned onto the template compound (43) and the 3D-QSAR model was applied to predict their activities. The 3D-QSAR model was used because it demonstrated a better predictive ability and more statistically significant validation parameters. A total of 90 compounds with submicromolar predicted activity were found. The final stage of the virtual screening involved docking and visual inspection of compounds to select the best lead-like structures. A total of six novel compounds were selected as potential CDK4 inhibitors with good ADME properties. Table 4 shows a list of the chemical structures of the compounds with their respective QSAR-predicted activity and the binding energy calculated by docking. The calculated docking binding energies of the compounds indicate potentially good binding affinity, which is consistent with the predicted activity by the QSAR model. Compound 1 has the highest activity predicted by the QSAR model and the second highest binding affinity calculated by docking, making it the best candidate lead compound. On the other hand, compound 6 has the lowest activity predicted by the QSAR model as well as the least binding affinity calculated by the docking, indicating a lower probability of being highly active. The Lipinski rule of five properties of the identified compounds is reported in Table 5. Overall, all these derivatives have not been previously reported as CDK4 inhibitors and are promising lead-like compounds.