Structural depiction and analysis of RasGap protein using molecular dynamics simulations

RasGap is a significantly large protein constituting 1,047 amino acids with vital domains such as SH2, SH3, PH, and C2, N terminal, and RasGap region. However, the structures are available for distinctive domains, and thus in this paper, we predicted the entire 1,047 amino acids’ protein structure using various integrated computational techniques ( ab initio , homology modeling, and fragment-based modeling, which were subsequently subjected to molecular dynamics simulation analysis) to predict the structure and analyze the effects of this dynamic feature on the discrete domains. The findings revealed that RasGap protein has dynamic features like polyglycine, polyproline, and polyglutamine regions in the long N-terminal region. All the models exhibited stable conformations with 78.4%–83.1% residues and showed favored regions in the Ramachandran plot and demonstrated a confident c-score of 0.12 and a dope score between −62,050.261719 and −67,629.390625. The N-terminal polyproline region showed a hydrophobicity index ranging from 0.437 to 0.531, while the SH2_1 ranged from 0.633 to 0.68, SH3_2 ranged from 0.626 to 0.71, and SH3 ranged from 0.589 to 0.651. The molecular dynamics simulations also revealed the opening of the cavity region. Polyproline, with SH2_1, SH3, and SH2_2 domains, is a known and responsible factor in the formation of RasGap– Nck1 complex. Hence, opening a cavity in the studied region of RasGap protein correlates a strong relationship in the formation of the cavity region, which can be further studied in its interaction with the Nck1 protein.


INTRODUCTION
Ras genes are an ubiquitous gene family identified in both animals and plants with a fair degree of conservation. H-ras, K-ras, and N-ras, which encode highly similar proteins with molecular weights of 21,000 (Barbacid, 1987), are extensively studied and are involved in fundamental cellular processes, including survival, differentiation, motility, proliferation, and transcription (Cox and Der, 2010;Giehl, 2005). The extensive studies on the RAS pathway arise from the fact that it is commonly deregulated in human cancers and RAS-associated mutations occur in nearly 30% of human tumors (Maertens and Cichowski, 2014).
RAS constitutes molecular switches that cycle between 'on' and 'off' conformations caused by the binding of guanosine 5'-triphosphate (GTP) or guanosine diphosphate, respectively, through two major regulators -guanine nucleotide exchange factors and GTPaseactivating proteins (Díez et al., 2011). RAS GTPaseactivating proteins (RASGAPs) normally catalyze the RAS-GTP hydrolysis, thereby regulating RAS and have emerged as a class of tumor suppressors that upon inactivation provide an alternative mechanism of activating RAS. RASGAPs are also involved in several other functions. RASGAP has been shown to play important roles in T-cell development and functions. Neurofibromin1, another RASGAP, is shown to promote the positive selection of T cells in the thymus (Oliver et al., 2013;Qiao et al., 2012). Another prototypical RasGap, well expressed in T cells, is p120 RasGap (RASA1). RASA1 is shown to be a regulator of Ras activation in T cells (Downward et al., 1990). However, non-conditional RASA1deficient mice are shown to succumb at an early phase in embryonic development (Henkemeyer et al., 1995).
RASA1 has a COOH-terminal catalytic domain and NH2-terminal part with a proline-rich sequence, PH domain, SH2 and SH3 domains, and other noncatalytic domains, involved in the regulation of several proteins (Pamonsinlapatham et al., 2009). RasGap can act as both a downregulator and effector for Ras (Ekman et al., 1999;Kazlauskas et al., 1990;Klinghoffer et al., 1996). Ras-independent directional cell movement requires the association of RasGap and p190 Rho-GAP (Kulkarni et al., 2000), while inhibition of kinase activity is observed when RasGap binds to Aurora kinase domain (Gigoux et al., 2002). Other roles of RasGap include apoptosis regulation (Pamonsinlapatham et al., 2008;Yang et al., 2001), angiogenesis (Kunath et al., 2003), and protein synthesis (Panasyuk et al., 2008). However, the regulation of catalytic activity of RasGap is yet to be understood completely and positive regulators for the activity are still to be understood. The RasGap PH domain attenuates the activity of the catalytic domain (Drugan et al., 2000), while SH2/SH3 domain-containing proteins Nck1 and Nck2 mediate cytoskeletal rearrangement via membrane-receptor signaling (Blasutig et al., 2008;Buday et al., 2002). Both these adaptor proteins need to be investigated further to understand their interactions with RasGap.
In view of the selective advantages of computational methods, in this study, we have tried to gain insight into the conformational changes in the tertiary structure of human RASGAP employing molecular modeling and dynamics studies. The lack of x-ray crystallographic data for the human RasGap protein indicates the utility of computational methods for structure prediction through homology modeling, fragmentsbased modeling, fold recognition, and threading (Kuhlman and Bradley, 2019;Shah and Gupta, 2014). The objective of the study is to model the three-dimensional (3D) structure of RASA1 and explore its conformational-based analysis.

RASA1_Human sequence
The amino acid sequences of RASA1_Human [AccessionP20936] is 1,047 amino acids long and were retrieved from the UniProtKB public repository database and used for 3D structure modeling and analysis in the current study.

Secondary structure prediction
The first step in studying the protein structure both experimentally as well as computationally is the identification of the protein secondary structure it includes the conserved structural features such as α-helix, extended strand, random coil, ambiguous states, and other states. The prediction of RASA1 secondary structure is carried out using the Garnier-Osguthorpe-Robson (GOR) method (Garnier et al., 1996) and self-optimized prediction method with alignment (SOPMA) (Frishman and Argos, 1997).

Tertiary structure prediction
Template selection In the protein 3D structure prediction and modeling under the homology-based concept, selecting a suitable template is a backbone of model. Finding out a suitable template for modeling position-specific iterative (PSI)-basic local alignment search tool (BLAST) is carried out with a RASA1 protein sequence as a query and protein data bank (PDB) database as a reference (Altschul et al., 1990(Altschul et al., , 1997.

Fragment-based Homology modeling
The homology model for SH3, SH2_2, and RasGap domain is carried out using the SWISS-MODEL servers electing template protein PDB id: 2M51, 2GSB, and 1WER. SWISS-MODEL is a fully automated protein structure homology modeling server, which generates a refined 3D homology model, i.e., taking a protein sequence as a query or input, identifying the optimum aligned template structure using inbuilt BLAST algorithm, and generating a high-quality 3D structure as per the selected template structure. The quality of the model may be lost due to poor sequence-based identity between the target and the template, mostly when it falls below 20%.

Modeling of RASA1 N-Terminal domain 1-280 aa
No structural homology for the N-terminal domain of RASA1 protein led us to predict the structure from online knowledge-based sources such as Robetta and Phyre2 (Kelley et al., 2015;Kim et al., 2004). With poor 3D prediction results, we the adopted fragment-based homology modeling and short fragments ranging from three to five amino acid regular expression patterns were searched in the protein data bank using Jena Library of Biological Macromolecule (Reichert and Sühnel, 2002). Using Jena library's regular expression pattern search algorithm, we identified 59 structural homologs/templates similar to the N-terminal region of RASA1. The templates are listed in Table 1: Supplementary data 1.

SWISS-MODEL server
Using the SWISS-MODEL homology server input, we have modeled two structures using 5M6U.pdb as a template with an identity of 24.90% to a sequence length of 178-445 aa representing the domains SH2_1, SH3, and SH2_2. Template 1WER.pdb with 100% identity to sequence length 714-1,047 is a RasGap domain.

SCRATCH
SCRATCH is an online server for the predicting protein tertiary structure and the structural features are available at http:// www.igb.uci.edu/servers/psss.html (Cheng et al., 2005). The suite combines an evolutionary information in the form of profiles, fragment libraries derived from PDB, and energy functions, and adopts a machine learning algorithm to predict protein structural features and protein tertiary structures (Berman, 2000;Cheng et al., 2005). The suite includes SSpro: three-class secondary structure; SSpro8: eight-class secondary structure; ACCpro: relative solvent accessibility; CONpro: contacts with other residues compared to average; DOMpro: domain boundaries; DISpro: disordered regions; MUpro: effect of single amino acid mutation on stability; DIpro: disulfide bridges; CMAPpro: residue-residue contact maps; and 3Dpro: tertiary structure (Baldi and Cheng, 2004;Cheng et al., 2005;Pollastri and Baldi , 2002;Pollastri and Baldi, 2002). Since SCRATCH server predicts 3D structures for only 400 amino acids, we have modeled for fragments 1-280 and 447-713 using SCRATCH.

Phyre2
Phyre2 is the advanced version of Phyre, which determines the evolutionary profiles using HHblits that calculates the secondary structure from PSI-blast based secondary structure PREDiction (PSIPRED) and is scanned against a database of hidden Markov models of fold library of known structures. Indels are handled by the fragment library of a known protein structure that varies in length from 2 to 225 amino acids (Kelley et al., 2015), and fragments are fitted to the crude model using cyclic coordinate descent method (Canutescu and Dunbrack, 2003) which uses a fast graph-based technique and a side-chain rotamer library to model the side chains in their most appropriate rotamer while passing up the steric clashes (Canutescu and Dunbrack, 2003;Kelley et al., 2015;Xie and Sahinidis, 2006). The 3D structure model for RASA1 was developed using Phyre2 for predicting the protein structure by homology modeling under the 'intensive' mode. Using Phyre2, here we have successfully generated two model, model-1 02-1,046 length of protein sequence and model-2 181-446 amino acid sequence length with high confidence.

I-TASSER
I-TASSER executes TASSER method by predicting a secondary structure using PSIPRED and define secondary structure of proteins as a scoring function. Calculating the backbone, hydrogen bonds, short and long range correlation, and predicted surface area is carried out via the artificial neural network method (Dorn et al., 2014;Roy et al., 2010). Using I-TASSER, we have modeled a full length of sequence 1-1,047 and sequence length from 546 to 1,047 amino acids with high confidence.

Comparative model assembly
It is difficult to predict a protein structure when homology is discrete and minimum. As stated above, the automated modeling tools have generated only one full-length structure from I-TASSER. Hence, here we have adopted a comparative and fragment-based modeling approach by compiling all the fragments and full-length structure as a template to the target sequence for deriving a 3D structure model. A comparative model can be assembled from a framework of small rigid bodies or specific domain 3D structure. The approach is based on the dissection of the protein structure into conserved core regions. In this study, we have made four possible combinations to assemble the entire length of the protein sequence into the 3D structure ( Figure 1). We have used Modeler 9.2 version tool from Sali Lab (Webb and Sali, 2014).

Model evaluation and simulation
All 3D models were qualitatively estimated by ProSA (Sippl, 1993(Sippl, , 2007 and Rampage (Lovell et al., 2003). Finally, all the six models were subjected to MD simulations using Nanoscale Molecular Dynamics (NAMD, formerly Not Another Molecular Dynamics Program) (Phillips et al., 2005) and visualized in VMD 9.2 (Humphrey et al., 1996). The modeled 3D structures were considered as a starting point for MD simulation. A TIP3P water box was applied by using CHARMM force field wherein 22 parameter files for proteins and lipids were utilized for proteins with similar chemical structures. The protein preliminary energy was minimized via 1,000 steps at constant temperature (37°C), followed by the simulation of 1,50,00,000 steps (2 fs) at each run is equal to 30 ns with Langevin dynamics and restart frequency at 500 steps to regulate the dynamic energy, temperature, and pressure of the system.

Impact of the proline-rich region on SH2_1, SH3, and SH2_2 domains of RasGap
A hydropathy index was calculated using Kyte and Dolittle, the method in Protscale, ExPASy website (Gasteiger et al., 2005). A high hydrophobic region was calculated in all the three domains and the distance was calculated for the before and after simulations of the modeled protein using PYMOL (Schrödinger, 2017).

RESULTS AND DISCUSSION
The RasGTPase-activating protein 1 sequence with 1,047 residues was saved in FASTA format. RasGTPase-activating protein 1 holds a specific seven-functional domain reported in Table 1 with a compositional bias of polyglycine, polyproline, and polyglutamic acid sequence in Table 2. The query protein sequences for PSI-BLAST produced a set of sequences with the highest and lowest similar sequences. The secondary structure of the proteins was predicted by GOR and SOPMA. The derived results will annotate the structural features like helix, coil, or strands which would constitute the protein secondary structure and can be used for validation of protein tertiary structure. Attribution of the secondary structure components in the proteins is summarized in Table 3 and supplementary file. Although helices and coils are dominant over the sheets, they could be identified in all sequences length, in both the results.

3D structure modeling
When discussing the model building step within comparative protein structure modeling, it is useful to distinguish between dependent templates and independent templates modeling. Model building without the aid of any template corresponds to the GAPs or no-structural template to the target sequence. Comparing the BLAST results, three regions were mapped with structural templates, i.e., SH3, SH2_2, and RasGap domains, which are listed in Table 4 and shown in Figure 2.
Implementing the knowledge from Jena library's regular expression pattern search algorithm, we identified 59 structural homologs/templates similar to the N-terminal region of RASA1 for sequence length 1-280. A list of templates can be found in the supplementary data 1; these short fragments of 3D crystallized data were varied from three to five amino acids in length and served as structural templates for the comparative multiple template homology modeling. The outcome was not appropriate, and structures are deeply strangulated to each other (Figure 3).
Using the SWISS-MODEL server, we have modeled the 02 structure for sequence length 178-445 aa with 24.90% identities (template 5M6U) and 714-1,047 with 100% identity (template 1WER). Intensive modeling method using Phyre2 exhibited confidence of 858 residues (82%) modeled at >90% accuracy to 1,047 total length of sequence (Figure 4). A fulllength protein structure was predicted by I-Tasser with a c-Score of 0.2 ( Figure 5). Two fragments were modeled using the Scratch server from length 1 to 280, possessing the N-terminal and Sh2_1     domain and 446-713 amino acid length possessing PH2 and C domain. RAS_SH3_SH2 181-446 (template 3D structure) was modeled using Phyre2 and RASGtpaseActi 546-1,047 (template 3D structure) was modeled using I-Tasser.

Model evaluation
The 3D model was estimated qualitatively by two various independent servers. To recognize the error in the generated structure, the PDB format coordinate file is uploaded in the ProSA web. The z-score indicates the overall model quality by comparing the structure to all the experimentally determined protein chains in the current Protein data bank. In the second method, we used geometrical validation by comparing the phi, psi, and omega angles using the Ramchandran plot analysis in RAMPAGE (Table 6).
Comparing the outcomes of all the six subjected 3D structures of RasGap to 30 nanoseconds simulation, the root mean square deviation (RMSD) of all the six models exhibited a linear form (Fig. 10) in the generated molecular dynamics trajectories and the log output indicated a linear form of representation in the deviation of bonds, angle, dihedral, and improper evidence of the stability of dynamic behavior of our generated protein 3D structural models (Fig. 11A-F

Impact of the proline-rich region on SH2_1, SH3, and SH2_2 domains of RasGap
The highlighted peaks in Figures 12 and 13 show the hydrophobic regions in the SH2_1, SH3, SH2_2, and polyproline region at the N-terminal domain of RasGap protein.
The hydrophobic scores are listed in Table 8. As the polyproline region acts as a gateway for the NCK1 protein to interact with the SH2_1, SH3, and SH2_2 domain of RasGap receptor protein, we have calculated the distance between the polyproline region (grey and red color) and SH2_1 (purple color), SH3 (yellow color), and SH2_2 (green color) domain of RasGap receptor protein depicted in CPK view form. The average difference between all four regions before and after the simulation results depicts an opening of the cavity. As the distance is increased between the polyproline region and SH2_1, SH3, and SH2_2 domains of RasGap receptor protein, the distance value is listed in the supplementary data and is shown in Figs. 4-9.
Protein structure prediction is a growing field in which best methods vary only by a minor percent value in performance as per the recent critical assessment of protein structure prediction standards. Template-based protein structure prediction can be upgraded by advances in techniques specific to remote homology detection, alignment, and model quality assessment (Meier and Söding, 2015). In this study, we have shown how to generate a protein 3D structure model from multiple approaches, including   fragment-based and multi-template modeling, when very little or no structural homology is reported. The other factors we have described here is the role of the proline-rich region with respect to SH2_1, SH3, and SH2_2 domains within the RasGap receptor protein responsible to interact with NCK1 protein (Ger et al., 2011).
The 3D model of 1-280 residues has widely failed due to the highly coined region predicted by fragment-based modeling and therefore the modeling followed the multiple template-based modeling. The fragments predicted by the SCRATCH server, Phyre2, I-Tasser, and SWISS-MODEL server served as a good template input to the multiple template modeling. During the molecular dynamic simulation, we noted that the polyproline region from 135 to 145 marks an outbound deviation from the SH2_1, SH3, and SH2_2 domains within the RasGap receptor protein. This opening leads to opening a cavity of the RasGap protein which holds a strong connection in interaction with NCK1 protein.

CONCLUSION
The current work presents a 3D structural model of RasGap constructed by multiple homologs and ab initio methods. The model was refined by molecular dynamics simulations in a water solvent environment. The suitability of the model is indicated by numerous model quality evaluations through PROSA and the Ramachandran plot. The exploration of active site was restricted to the SH2_1, SH3, and SH2_2 domains where NCK1 enzyme interacts. The constructed model provides an alternative to explore the structural characteristics of RasGap at the molecular level. To study its mechanism and interaction, the projected model provides a platform for exploring the residues that play important roles in the catalytic activity with their substrates.

AUTHORS' CONTRIBUTION
Bastikar, VA, Bastikar, VA, and Gupta, PP made an active contribution to the conception and design. Bastikar, VA, Bastikar, A, Gupta, PP, Kumar, P, and Chhajed, SS, carried out the analysis and interpretation of the data and the drafting of the paper. All authors have critically reviewed its content and have approved the final version submitted for publication.
(2) School of Biotechnology and Bioinformatics, D Y Patil Deemed to be University, Navi Mumbai, Maharashtra, India.