Virtual screening campaigns and ADMET evaluation to unlock the potency of flavonoids from Erythrina as 3CL pro SARS-COV-2 inhibitors

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused the coronavirus disease 2019 (COVID-19) pandemic with more than 6 million deaths worldwide. Flavonoids from the genus Erythrina may inhibit SARS-CoV-2, targeting 3C-like protease (3CL pro ), an enzyme essential in the virus’s growth. Hence, this study aimed to screen 378 flavonoids from Erythrina against 3CL pro , using molecular docking, Lipinski’s rule of five, and in silico absorption, distribution, metabolism, excretion, and toxicity. These virtual screening campaigns suggest that 108 flavonoids have stronger binding energy values (−13.23 to −10.20 kcal/mol) than the N3 inhibitor (−10.14 kcal/mol) as the reference ligand. Some 33 of these flavonoids may be hepatoxicity-and mutagenicity-free. They are also non-hERG I and II inhibitors. Two of them, orientanol E ( 171 ) and erycaffra F ( 57 ), have binding energy values in the top 10 hits and good absorption profiles, despite their poor distribution properties. They may have a high bioavailability in the body and be excreted from the body through feces. Conducted molecular dynamics simulations also support orientanol E ( 171 ) and erycaffra F ( 57 ) as 3CL pro inhibitor candidates. Our study suggests that flavonoids from Erythrina have potential as 3CL pro inhibitors, which help guide further in vitro and in vivo experiments in COVID-19 drug development.

Several studies have suggested that natural products can inhibit 3CL pro (Liu et al., 2021;Su et al., 2021). Myricetin exhibits IC 50 of 0.30 μM to 3CL pro by interacting with the catalytic site residues (Cys145 and His41) (Su et al., 2021). Meanwhile, baicalein shows a potent inhibitory activity to 3CL pro with IC 50 of 0.39 μM (Liu et al., 2021). Both flavonoids have high inhibition activities since their IC 50 values are below 20 μM (Liu et al., 2008) and shed light on the potency of other flavonoids as 3CL pro inhibitors.
Nowadays, in silico approaches are the most commonly used methods to accelerate drug discovery from natural products (Ramírez and Caballero, 2018). One of them is molecular docking, which has been utilized in various studies (Hardianto et al., 2021;Petrou et al., 2022;Raschi et al., 2009;Ul Qamar et al., 2017). Molecular docking has assisted Yamamoto et al. (2022) in virtually screening 3CL pro inhibitors from the Enamine library before in vitro assays. It was also used by Ahmad et al. (2021) as one of the methods to elucidate the molecular mechanism of PF-07321332, an oral drug for COVID-19 treatment, in inhibiting 3CL pro . Therefore, we used molecular docking to screen 378 flavonoids from Erythrina as 3CL pro potential inhibitors. Meanwhile, to evaluate the drug-likeness of the screened flavonoids as orally active drugs, we employed Lipinski's rule of five (Ro5) (Lipinski et al., 2012). We also performed molecular dynamics (MD) simulations to top hits for more realistic energy prediction (Forli et al., 2016).
Another vital aspect of drug development is the absorption, distribution, metabolism, excretion, and toxicity (ADMET) of compounds in the body. Absorption, distribution, metabolism, and excretion (ADME) data provide information to assess drug molecule candidates in crossing the small intestine, which eventually are distributed to target cells, metabolized, and cleared from the human body. Metabolism is an important process in activating some drug molecules. However, it may also convert drug molecules into toxic forms (Daina et al., 2017;Gil et al., 2020;Kavaliauskas et al., 2020). Hence, we evaluate the ADMET properties of flavonoids from Erythrina using an in silico approach via the pkCSM web server (Pires et al., 2015). Our study suggests that flavonoids from Erythrina have potential as 3CL pro inhibitors, which help guide further in vitro and in vivo experiments in COVID-19 drug development.

Flavonoid structures preparation
Fahmy et al. (2018) reported that 378 flavonoids have been isolated from the genus Erythrina. The two-dimensional (2D) structures of the flavonoids were sketched in Chemaxon MarvinSketch (Chemaxon, 2020). The protonated states of the flavonoids at physiological pH (7.4) were predicted using the same program, Chemaxon MarvinSketch. Their three-dimensional (3D) structures with the protonated states at physiological pH were saved in .pdb format. MGLTools 1.5.6 (Forli et al., 2016) was utilized to prepare the 3D structures for molecular docking.

Molecular docking validation
The crystal structure of 3CL pro was retrieved from the Protein Data Bank (PDB) website (https://www.rcsb.org) with PDB ID 6LU7 (Jin et al., 2020). The 3D structures of 3CL pro and its inhibitor (N3) (Saur et al., 2021;Yang et al., 2005) were separated in BIOVIA Discovery Studio (DS) 2020 Client and saved as separated pdb files, while the water and ion molecules were discarded. Subsequently, both structures were subjected to AutoDockTools 1.5.6 for the preparation of molecular docking validation. Nonpolar hydrogen atoms were removed, whereas the polar ones were retained. The Gasteiger and Kollman atomic charges (Forli et al., 2016) were added to the N3 and 3CL pro structures, respectively. The active torsions of the N3 structure were set as suggested by AutoDockTools 1.5.6. Both 3CL pro and N3 structures were saved as a pdbqt format. The dimension of the grid box was set to cover the whole structure of N3. Grid maps at the binding site of 3CL pro were computed using the AutoGrid4.2 program. A Lamarckian genetic algorithm (GA) was selected in molecular docking. The number of GA runs and population size were set to 100 and 300, respectively. Meanwhile, the maximum number of evaluations was 2.500.000 times (medium). The rest of the search parameters were set to defaults, and so were docking parameters. Molecular docking validation was performed by redocking the N3 structure to the binding site of 3CL pro . The validation step was considered a success if the root mean square deviation (RMSD) of the docked N3 was below 2.00 Å (Gohlke et al., 2000;Kissler et al., 2020).

Virtual screening campaigns
The preparation of flavonoid structures in AutoDockTools 1.5.6 followed the steps in the molecular docking validation above. We performed virtual screening campaigns using Raccoon v1.0b on an Ubuntu 20.04 operating system. The grid map and docking parameters used in molecular docking validation were employed as templates. The virtual screening results were analyzed in AutoDockTools 1.5.6.

ADMET prediction
ADMET prediction was conducted through the pkCSM web server (http://biosig.unimelb.edu.au/pkcsm). The absorption criteria consist of water solubility, Caco2 permeability, human intestinal absorption, P-glycoprotein substrate, and P-glycoprotein I and II inhibitors. Hit distribution prediction involves the steady-state volume of distribution (VDss), fraction unbound, blood-brain barrier (BBB), and central nervous system (CNS) permeabilities. The in silico metabolism study includes the prediction of hits as inhibitors of CYP1A2, CYP2C19, CYP2C9, and CYP2D6. In addition, pkCSM also predicts hits as CYP2D6 and CYP3A4 substrates. Total clearance and renal organic cation transporter 2 (OCT2) are two criteria to evaluate the excretion of hits. In evaluating the safety profile of hits, we used pkCSM to predict AMES toxicity and hepatoxicity. Moreover, we employed the web server to predict the potency of hits as hERG I and II inhibitors. We utilized a tidyverse package (Wickham et al., 2019) in an R environment to assist with ADMET analysis.

MD simulations
MD simulations were performed to evaluate the binding energy of the selected compounds compared with the N3 and GC376 inhibitors. The ligand was deployed in the AMBER force field with GAFF (Wang et al., 2004) and the 3CL pro with ff14SB (Maier et al., 2015). Each protein-ligand was solvated in a TIP3P water model as an explicit solvent with a minimum boundary distance of 10 Å. Also, the numbers of Na + and Cl − ions were added to each protein-ligand complex to acquire the neutralization and physiological state. All systems were minimized, heated gradually to 310 K, and equilibrated, which eventually produced 100 ns trajectories. Afterward, binding energy values were computed from the last frame MD trajectory using molecular mechanics with a generalized born and surface area solvation (MMGBSA) method.

Visualization
Molecular interactions between the best hits, which have good ADMET profiles, and the binding site of 3CL pro were visualized using BIOVIA DS 2020 Client. The generated images were processed further using GNU Image Manipulation Program 2.10.

Molecular docking validation
As the validation step in the virtual screening campaign, we conducted a redocking procedure on the crystal structure of SARS-CoV-2 3CL pro (hereafter mentioned as 3CL pro ) and its ligand (N3), which is deposited in the PDB as 6LU7. Such a procedure ensures that the stochastic search method reaches a global minimum (Forli et al., 2016). We achieved a RMSD of less than 2 Å in the redocking procedure by applying grid box dimensions of 26 × 32 × 32 with 1 Å spacing. The best pose of N3 has a binding energy score of −10.14 kcal/mol in binding to 3CL pro .
The conducted virtual screening campaign suggests that many flavonoids from Erythrina are potential 3CL pro inhibitors. Nonetheless, these compounds should fulfill drug-likeness criteria to be active as oral drugs. Therefore, in the next step, we assessed the resulting hits using Lipinski's Ro5 through the SwissADME web server.

Lipinski's Ro5
Several of the 108 hits were further assessed for their drug-likeness using Lipinski's Ro5 through the SwissADME web server. According to Ro5, drugs should have a MW less than 500 Da, HBD and HBA less than 5 and 10, respectively, and logP less than 5. Compounds should fulfill three rules for the minimum requirement as drug-like molecules (Lipinski et al., 2012;Raj, 2021).
Most of the hits, 103 flavonoids, fulfill the minimum requirement of Ro5. Some 101 satisfy all the rules of Ro5, whereas vitexin (11) and bis-sigmodiol (378) violate the rule of HBA number (Table S2). These results suggest that those 103 flavonoids are drug-like molecules and potential orally active drug candidates.

In silico ADMET
Oral is the most popular route for drug administration due to its convenience for patients (Alqahtani et al., 2021). Through oral administration, drug molecules follow a journey including ADME. Therefore, 103 flavonoids satisfying Ro5 also should be further assessed according to their ADME criteria inside the body. Moreover, their toxicities should be evaluated to describe their safety profile. In this work, we utilized the pkCSM web server (http://biosig.unimelb.edu.au/pkcsm/), which offers more comprehensive feature predictions than SwissADME, to study the in silico absorption, metabolism, excretion, and toxicity.
In silico predictions to assess absorption criteria of 103 flavonoids include water solubility, human intestinal absorption, Caco2 permeability, P-glycoprotein substrate, and P-glycoprotein I and II inhibitors. Prediction results suggest that the 103 flavonoids have good water solubility and human intestinal absorption (Table  S3), which are expected since they fulfill Ro5, as predicted by SwissADME. Nonetheless, only 69 flavonoids have good Caco2 permeabilities, with predicted values greater than 0.900. As one of the top hits, compound (171) has a predicted value of 0.771, which fails to satisfy Caco2 permeability. It is worth noting that the Caco2 permeability assay uses a cell line originating from a human colorectal carcinoma which represents the discrimination of drug absorption in the large intestine (Artursson and Karlsson, 1991). In terms of P-glycoprotein-related absorption, the pkCSM web server predicts the 103 flavonoids as P-glycoprotein substrates, implying that they are readily pumped back to the lumen, lowering their absorption. However, 85 of them are P-glycoprotein I inhibitors, 78 are P-glycoprotein II inhibitors, and 64 are inhibitors for both proteins. Compounds (171), (211), and (277), which are on the top hit list, are inhibitors for both P-glycoproteins I and II. Inhibition of P-glycoproteins I and II can increase the bioavailability of molecules that are P-glycoprotein substrates (Finch and Pillans, 2014).
After absorption via the small intestine, drug molecules are distributed to their targets and tissues through the blood and lymphatic circulatory systems. Therefore, distribution is a vital pharmacokinetic parameter for monitoring the number of drug molecules reaching target cells (Twardowski and Ksiazek, 1983;Zhivkova et al., 2015). For the distribution aspect, pkCSM predicts the VDss, fraction unbound in humans, BBB permeability, and CNS permeability.
VDss represents the propensity of drug molecules to reside in the plasma or distribute in tissues. Drugs with higher VDss values are distributed more in tissue than in plasma, thus requiring greater doses, and vice versa.  (Table S4). Therefore, the majority of the flavonoids complying with Ro5 may be more distributed in plasma. However, all these flavonoids have high fractions of bound forms to proteins in the blood plasma, indicated by low values of predicted unbound states. These predictions may reflect the low bioavailability of the flavonoids and their inefficiency in diffusing the membrane cells. A study conducted by Vandyck and Deval (2021) suggests that the low bioavailability of 3CL pro inhibitors can be addressed by injection or inhalation administration. Moreover, cyclodextrin delivery systems such as 2-hydroxypropyl-β-cyclodextrin (Glisoni et al., 2012;Jeulin et al., 2009;Nguyen and Van Den Mooter, 2014), randomly methylated β-cyclodextrin (Pathak et al., 2010), and α-cyclodextrin (αCD) are other solutions for the low bioavailability issue.
Another distribution criterion is BBB permeability. The BBB plays a key role in limiting the distribution of foreign molecules, which may be toxic, to the brain (Faria et al., 2012). BBB permeability can be expressed as logBB, which reflects the ability of drug molecules to enter the brain. None of the 103 hits are readily distributed to the brain (logBB > 0.3) and 45 of them poorly crossed the BBB (logBB < −1) (Table S4). However, the blood-barrier permeability surface product (logPS) prediction, which is derived from more direct measurement, suggests that 38 hits can easily enter the CNS, while 18 others are not permeable, such as compounds (16) and (20). A study conducted by Zanin et al. (2020) revealed that SARS-CoV-2 infects the brain. Therefore, those 38 hits potentially inhibit the virus's growth in the brain. Nevertheless, those hits should be free from mutagenesis and carcinogenesis properties (Gyebi et al., 2021). The next step after drug distribution is metabolism or biotransformation. Metabolism may occur before or after drug molecules reach target cells. It is an essential process in activating drug molecules and inactivating target cells and the excretion system, respectively (Wilde et al., 2019). Cytochrome P450 or the CYP superfamily are groups of enzymes that highly participate in drug metabolism, with CYP2C, CYP2D, and CYP3A subfamilies as the most active (Tushar et al., 2007). The pkCSM web server provides a series of predictions for drug molecules or secondary metabolites likely to be substrates or inhibitors for such proteins. It predicts that none of the hits is a CYP2D6 substrate and 14 hits are CYP3A4 substrates. These 14 hits are potentially metabolized by CYP3A4, increasing their elimination rates and decreasing their bioavailabilities in the body. However, further metabolism prediction suggests that 10 of them are CYP3A4 inhibitors, along with the other 30 hits. Other cytochromes, CYP1A2 and CYP2C19, are potentially inhibited by 29 and 89 hits, respectively. Meanwhile, CYP2C9 may be inhibited by 90 hits (Table S5). Of all hits, compound (259) may inhibit five cytochromes, including CYP1A2, CYP2C19, CYP2C9, CYP2D6, and CYP3A4. Inhibition of one or more cytochromes can increase the bioavailability of other hits from Erythrina plants.
The last step in pharmacokinetics is excretion, which reduces the concentration of drug molecules at their active sites. Slow total clearance of drug molecules via renal or hepatic excretion assists in the maintenance of therapeutic effects. Total clearance affects the bioavailability and half-life of drug molecules, which determine their dose sizes and regimens. The pkCSM web server provides a total clearance prediction, in log(CLtot), representing a combination of renal and hepatic clearance. Some 10 hits are classified as high total clearance with log(CLtot) more than 0.7. Those hits could be quickly cleared from the blood by the liver. Some 56 hits are intermediate total clearance, with log(CLtot) values between 0.3 and 0.7, whereas some 37 hits are low total clearance, with log(CLtot) less than 0.3 (Table S6). The other excretion parameter is the renal OCT2 substrate. OCT2 has a key role in the distribution and renal elimination of endogenous and drug molecules. Its substrates may have adverse interactions with OCT2 inhibitors. Therefore, predicting all hits as OCT2 substrates is beneficial for their potential contraindication and clearance (Pires et al., 2015). Interestingly, pkCSM only predicts erysubin D (329) as an OCT2 substrate. This prediction indicates the safety profile of the majority of the flavonoids from Erythrina plants in terms of contraindication due to OCT2.
Drug candidates often fail in the late phases of development due to their toxicities. Therefore, a preliminary toxicity study through pkCSM would be beneficial. All 103 hits are not hERG I inhibitors, but 37 of them inhibit hERG II (Table  S7). Inhibition of hERG causes heart rhythm abnormality, which may lead to fatal arrhythmia (Raschi et al., 2009). Another toxicity parameter provided by pkCSM is AMES toxicity. The pkCSM web server predicts 96 hits are AMES-toxicity-free, while the other 7 have a positive result prediction (Table S7), suggesting their carcinogenicity potentials. For hepatoxicity prediction, 15 hits potentially damage the liver (Table S7). In total, pkCSM predicts 33 hits free from hepatoxicity and mutagenicity. Additionally, they are non-hERG I and II inhibitors. Two of them, orientanol E (171) and erycaffra F (57), are the fifth and seventh on the top hit list.

Non-bonded interactions analysis
Among 33 flavonoids that are considered hepatoxicityand mutagenicity-free according to in silico ADMET above, orientanol E (171) and erycaffra F (57) are the fifth and seventh on the top hit list of virtual screening (Tables S1 and S7). Therefore, they are potential 3CL pro inhibitor candidates, and their detailed interaction analysis with the enzyme is interesting to explore.
We performed interaction analysis for all poses of orientanol E (171) and erycaffra F (57) in their best clusters (Tables S8-S10). The results (Tables S8-S11) showed that both flavonoids interact with His41 and Cys145, dyad catalytic residues, which are important to 3CL pro inhibition activity (Konwar and Sarma, 2021;Tahir Ul Qamar et al., 2020). Orientanol E (171) forms a salt bridge with His41 through its C7-OH, which is predicted in a deprotonated state (Fig. 1). Such an interaction presents in all poses docked to the binding site of 3CL pro . Furthermore, it also interacts with His41 via π-π T-shape stacking, where its occurrence is 80.5% in all poses in the best cluster. Meanwhile, erycaffra F (57) interacts with His41 solely through a π-alkyl with an occurrence of 92.1% in all poses (Tables S10 and S11). For Cys145, both flavonoids form a hydrophobic alkyl interaction which emerges in 90.2% and 81.6% among all poses of orientanol E (171) and erycaffra F (57), respectively.
Separate studies conducted by Cannalire et al. (2022) and Zhang et al. (2020) suggest that the interaction with His163 is essential for the inhibition of 3CL pro , as demonstrated on baicalein. Orientanol E (171) and erycaffra F (57) also form interactions with His163 ( Figs. 1 and 2). For erycaffra F (57), the interaction with this residue is through an H-bond, which appears in all poses (Tables S10 and S11). In the case of orientanol E (171), His163 makes a hydrophobic π-alkyl with almost all poses of the flavanone (Tables S8 and S9). Additionally, orientanol E (171) forms a hydrophobic alkyl interaction with Met49. It is another vital residue for the inhibition of 3CL pro , situated on the floor of the binding pocket. Xiang et al. (2022) suggested that the interaction with this residue hinders the native substrate from docking at the active site of 3CL pro .
Another binding energy component in molecular docking is torsional energy. Isa et al. (2022) and Kumar et al. (2020) showed that torsional bonds maintain ligand flexibility in the rigid binding site of 3CL pro . According to molecular docking analysis, orientanol E (171) has lower torsional energy than erycaffra F (57) ( Table 4), which is in agreement with their binding energy values. The torsional energy values suggest that orientanol E (171) may have a higher potency than erycaffra F (57) in binding to the active site of 3CL pro due to its flexibility.

MD simulations
Molecular docking is widely used in structuralbased virtual screening due to its rapid evaluation, which was implemented in this study to screen flavonoids in the genus Erythrina against 3CL pro . However, it suffers several limitations, such as employing a rigid receptor and simplifying a scoring function (Forli et al., 2016). Forli et al. (2016) recommended MD simulation method to refine the top hits obtained from molecular docking; MD also mimics a physiological condition which helps to estimate the strength of binding energy between the ligand and the target protein as an experimental study (Kumar et al., 2020). Therefore, we subjected 3CL pro -bound orientanol E (171) and erycaffra F (57) to MD simulations, yielding a 100 ns trajectory for each complex.
Additionally, we conducted MD simulations on the 3CL pro -bound N3 and GC376 as the references. N3 is a potent Figure 1. The visualization of interactions between orientanol E (171) and 3CLpro. The carbon atoms for the residues on the binding site of 3CLpro are in a light-blue color, whereas those for orientanol E (171) are in grey. The orange dash line depicts the interaction of pi anion and attractive charge, the green for hydrogen bond, the purple, and pink for hydrophobic interactions including alkyl, amide-π stacked, π-alkyl, π-π T-shaped, and π-sigma. Meanwhile, the yellow dash line denotes the interaction of π-sulfur, whereas the red for unfavorable interaction. Figure 2. Visualization of interactions between erycaffra F (57) and 3CLpro. The carbon atoms for the residues on the binding site of 3CLpro are in a light-blue color, whereas those for orientanol E (171) are in grey. The orange dash line depicts the interaction of π-anion, the green for hydrogen bond, the light green for carbon hydrogen interaction, the purple and pink for hydrophobic interactions including alkyl, amide-π stacked, π-alkyl, π-π T-shaped, and π-sigma. Meanwhile, the yellow dash line denotes the interaction of π-sulfur, whereas the red for unfavorable interaction. SARS-CoV-2 3CL pro inhibitor, which can also inhibit multiple coronaviruses, like MERS-CoV and SARS-CoV (Jin et al., 2020). Meanwhile, GC376 has been widely investigated as a potential inhibitor targeting 3CL pro in Vero cells against FCoV, SARS-CoV, MERS-CoV, and SARS-CoV-2 (Fu et al., 2020;Sharun et al., 2021). It is also employed as a standard inhibitor in a commercial 3CL pro assay kit (BPS Bioscience, 2022). Hence, the utilization of N3 and GC376 as the references assists in predicting the inhibition potential of orientanol E (171) and erycaffra F (57) to 3CL pro .
Binding energy evaluations from MD simulations are tabulated in Table 3. The binding energy values change from molecular docking. These changes may be due to MD simulations accommodating complete flexibility in both ligands and 3CL pro , allowing intermolecular interaction adjustment (Hardianto et al., 2017(Hardianto et al., , 2019 between them (Table S12). Based on MD simulation, N3 and Paxlovid bind 3CL pro stronger than orientanol E (171) and erycaffra F (57). Nevertheless, orientanol E (171) still has a more negative binding energy value (−35.47 kcal/mol) than GC376 (−20.25 kcal/mol), which is comparable to that of erycaffra F (57) (−20.06 kcal/mol). These results suggest that both orientanol E (171) and erycaffra F (57) are potential 3CL pro inhibitors.

CONCLUSION
In this study, we have performed structure-based virtual screening campaigns on 378 flavonoids from the genus Erythrina against SARS-CoV-2 3CL pro . These virtual screening campaigns filtered 108 hits with binding energy scores stronger than the N3 inhibitor. Some 103 of the hits satisfy Lipinski's Ro5, suggesting their potency as oral drug candidates. Additionally, we conducted an in silico ADMET study using the pkCSM web server on the 108 hits since ADME and safety profiles of molecules are essential in drug discovery and development. The web server suggests that 33 flavonoids are hepatotoxicity-and mutagenicity-free. In addition, they are not hERG I and II inhibitors. Two of them, orientanol E (171) and erycaffra F (57), have binding energy values in the top 10 hits. Binding energy evaluation from MD simulation also supports both orientanol E (171) and erycaffra F (57) as potential 3CL pro inhibitors. Both flavonoids satisfy Ro5 and have good absorption profiles, but their distribution properties are poor according to pkCSM prediction. These issues can be anticipated using delivery systems such as liposome and cyclodextrin. The pkCSM web server predicts both flavonoids as nonsubstrates for CYP2D6 and CYP3A4, suggesting their high bioavailability in the body. According to the web server, they may be cleared from the body through feces. In summary, this study suggests that flavonoids from Erythrina may have the potency to be developed as herbal medicines to treat COVID-19 through inhibition of 3CL pro . Further in vitro and in vivo experiments, however, are required to confirm this notion. Table 2. Non-bonded interaction types between the binding site of 3CL pro and two flavonoids, orientanol E (171), and eryvsffra F (57). These flavonoids have good ADMET profiles and are in the top 10 hits.