Statistical Evaluation and Characterization of Carica papaya Metabolites

Background: Natural resources are often explored for potential pharmacologically active molecules. Carica papaya is a well-known plant, studied for its constituent secondary metabolites and their benefits. Methods: This study focuses on identifying low molecular weight secondary metabolites from Carica papaya as potential pharmacophores, based on their structure. We determined their similarity to pharmaceutically active anti-clotting molecules using a clustering method. Cheminformatic and statistical analyses were carried out on data describing molecular characteristics. Results: We identified Carica papaya compounds as pharmacologically active. Statistical analysis using a clustering method differentiated the Carica papaya compounds on the basis of significant molecular properties. Conclusion: The molecular similarity of secondary metabolites revealed through this study identifies Carica papaya metabolites as possible substitute drugs and preventive medicines against thrombosis.


INTRODUCTION
Plants and animals interact with and depend on one another in innumerable ways. Selective use of resources and changes in the environment modify plant growth and adaptation. These plant responses have enabled the evolution of various metabolites, which provide flavours and colours or form defence mechanisms to cope with competition or survival threats. [1] Some adaptations also provide protection to neighbouring plants. [2] A few such compounds have a bitter smell, are toxic or are cyanogenic. [3] Production of secondary metabolites involves pathways that function in plant survival, reproduction and related processes. The metabolites produced have wide commercial applications in colouration, cosmetics, fragrances and defence mechanisms. [4][5][6][7][8] These low molecular weight compounds have unique and complex structures. Methods such as the study of genetic maps of biosynthetic pathways, targeted metabolite analysis and cDNA-amplified fragment length polymorphism (cDNA-AFLP) provide quantitative expression profiles for metabolite analysis. In one targeted metabolite analysis, transcript profiling by cDNA-AFLP was carried out on jasmonate-elicited tobacco Bright Yellow 2 cells. Extensive jasmonate-mediated metabolic reprogramming was found at the gene level, and this was reflected in metabolite biosynthesis. [9]
This study focuses on the herbaceous plant Carica papaya, of the family Caricaceae. It grows in tropical and many subtropical regions of the world. It has a single unbranched trunk and palmately lobed leaves. The plant is cultivated as a food and cash crop, and is developed to improve yield, fruit quality and nutritional content. It is also widely used in traditional medicine. Various studies of its constituents indicate the presence of phytochemicals that impart pharmacological activity as antioxidant, anti-inflammatory, immunomodulatory or antimicrobial agents. They are used alone or as combination drugs in treating diseases. [12] Papaya is also identified as a rich source of numerous bioactive secondary metabolites such as alkaloids, flavonoids, tannins, saponins and steroids.
Carica papaya is rich in antioxidants. Aqueous extracts of papaya leaves exhibit antioxidant activity and can inhibit the Gram-positive bacteria B. subtilis and S. aureus and the Gram-negative bacteria P. aeruginosa, E. coli, S. typhi and K. pneumoniae. [13] In general, small molecules display unique properties by virtue of their molecular characteristics. Natural resources are a feasible source of compounds with pharmaceutical applications as drugs or nutraceuticals.
In this study, the secondary metabolites identified in the Carica papaya plant were chosen for analysis. We determined the structural characteristics specific to each compound. Functional assignment was based on the benefits identified in the plant parts. A molecular search was carried out for reference molecules, focusing on drugs and potential ligands that impart the specific function. The structural parameters of the compounds were compared and statistically characterized through clustering techniques to determine the structural proximity of the Carica papaya compounds to the reference molecules.

Ligand Dataset preparation
Identification of Compounds from Carica papaya
Plant secondary metabolite composition of Carica papaya was identified from previous studies. Phytochemical screening results were taken for various extracts of the seed, leaf and pulp of the plant. All three extracts, especially the leaf extract, are rich in alkaloids. The extracts also contain glycosides and a high amount of polyphenols. Tannins are present in seeds and leaves, but absent in pulp. Flavonoids are absent in seeds and pulp, but the leaves are rich in flavonoids. [14] The leaf extract of Carica papaya contained flavonoids and related compounds such as Quercetin, 5,7-Dimethoxycoumarin, Chlorogenic acid, Caffeic acid, Kaempferol, p-Coumaric acid and Protocatechuic acid, which imparted anti-viral assembly properties against Dengue 2 virus. [15] Analysis of freeze-dried Carica papaya leaf juice indicated the presence of twenty-four compounds. Compounds including Quinic acid, Malic acid, Protocatechuic acid, Chlorogenic acid, p-Coumaric acid, Caffeic acid, Ferulic acid, Rutin, Nicotiflorin, Myricetin, Fisetin, Morin, Quercetin, Kaempferol, Citropten and Isorhamnetin were identified from their mass spectral patterns and retention times. [16] From the above studies, many compounds, such as Quercetin and Fisetin, were found to be common to extracts of different parts of the plant. A ligand dataset was prepared containing the commonly identified compounds from Carica papaya and low molecular weight pharmacologically active drugs, taken as reference molecules.

Determination of biological significance of Carica papaya
Functional assignment for the Carica papaya secondary metabolites was based on previous studies carried out with the plant extracts. Effective biological activity has been observed with Carica papaya extracts against diseases involving platelets and clot formation. Papaya plant extracts were found to be effective against viral diseases such as Dengue fever. [17] Dengue is caused by a virus that produces a self-limiting acute febrile stage of disease, followed by a critical phase of defervescence. This can also lead to severe illness. Thrombocytopenia and platelet dysfunction occur during the disease. [18] Therapeutic intervention involves understanding disease mechanisms and effective targeting. Platelet activation is important in regulating thrombocytopenia in dengue infection. [19] Platelet count is a very important accessory test, reliably used for the determination of infections including dengue viral infection. [20] Antibodies induced by the viral non-structural protein 1 that cross-react with thrombin and plasminogen are found to participate in the development of haemorrhage in patients with Dengue haemorrhagic fever. This involves the processes of coagulation and fibrinolysis. Thrombogenesis occurs through the aggregation of activated platelets, which attach to strands of fibrinogen. During coagulation, a cascade of enzymatic events leads to the formation of fibrin strands. Clotting can be therapeutically targeted using antiplatelet drugs or anticoagulants. Antiplatelet drugs inhibit platelet aggregation. Anticoagulants inhibit the formation of fibrin strands. Fibrinolytic drugs dissolve existing clots. [21,22] These studies indicate that effective therapeutic intervention with papaya plant extract is possible in thrombocytopenia or platelet dysfunction.

Identification of reference drug molecules
Reference molecules were chosen based on their drug-like properties. Antiplatelet drugs and anticoagulants were selected as reference molecules. Despite the benefits they provide, these molecules also pose specific risks such as postoperative bleeding or unfavourable drug interactions. [22] Molecular information on the compounds was collected from the PubChem database. [23]
Cheminformatic analysis using clustering and visualization
Biological activity of the molecules was assessed against Lipinski's Rule of 5 (RO5). [24] Statistical analysis was carried out to determine the structural similarities among the molecules. Through hierarchical cluster analysis, the compounds were classified on the basis of data similarity. [25] Numerical data comprising molecular parameter values were clustered with the Wolfram Mathematica tool, [26] which supports both classical and modern machine learning-based clustering analysis. We carried out data analysis to produce a clustering tree containing all the compounds from the small molecule dataset. A symbolic tree was first created by hierarchical clustering of the data, and the molecular data, including the various parameter values, were then analysed. The tree is a binary weighted tree in which the weight of each vertex indicates the distance between the subtrees sharing that vertex as root. The information was plotted for graphical visualisation and analysis, using charts and histogram-based information visualization. [26][27][28]
Molecular data for the various parameters, represented as weighted trees, were studied on the basis of the distribution of parameter values across clusters. Clustering by molecular weight placed Caffeic acid and Aspirin in the same cluster. p-Coumaric acid and Protocatechuic acid formed the cluster closest to Caffeic acid and Aspirin. 5,7-Dimethoxycoumarin was also closely related to these two clusters. Quercetin was clustered with the oral anticoagulant Warfarin. This cluster was closely related to the antithrombotic drug Clopidogrel. Fisetin and Kaempferol, which formed a cluster, were also closely related to Clopidogrel. Beyond Clopidogrel, these compounds were closest to the antithrombotic drugs Prasugrel and Cilostazol [Figure 3a]. Based on partition coefficients, Caffeic acid and Protocatechuic acid were clustered together and were close to Aspirin. 5,7-Dimethoxycoumarin, Fisetin, Kaempferol and Ticagrelor were found to be closely clustered [Figure 3b].
Pharmacognosy Research, Vol 13, Issue 3, Jul-Sep, 2021

RESULTS AND DISCUSSION
Clustering based on topological polar surface area (TPSA) indicated that Caffeic acid and Protocatechuic acid belonged to the same cluster, which was more closely related to the antiplatelet compounds Dipyridamole and Prasugrel. 5,7-Dimethoxycoumarin was closely related to this cluster. p-Coumaric acid was clustered with the antithrombotic compound Clopidogrel. Fisetin, Kaempferol and Quercetin were closely related to the oral anticoagulants Rivaroxaban and Apixaban and the antithrombotic molecule Tirofiban [Figure 3c].
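The single-descriptor clusterings reported here can be illustrated with a minimal Python sketch of agglomerative hierarchical clustering. This is not the Wolfram Mathematica workflow used in the study. The molecular weights are approximate values quoted for illustration only and should be checked against PubChem, the single-linkage rule is an assumption (the linkage used in the study is not stated), and the names `MOLECULAR_WEIGHT`, `single_linkage` and `agglomerate` are hypothetical.

```python
from itertools import combinations

# Approximate molecular weights in g/mol. Illustrative values only,
# not the dataset used in the study.
MOLECULAR_WEIGHT = {
    "Caffeic acid": 180.16,
    "Aspirin": 180.16,
    "Protocatechuic acid": 154.12,
    "p-Coumaric acid": 164.16,
    "Quercetin": 302.23,
    "Warfarin": 308.33,
    "Clopidogrel": 321.82,
}


def single_linkage(cluster_a, cluster_b, values):
    """Cluster distance = smallest pairwise gap (assumed linkage rule)."""
    return min(abs(values[a] - values[b]) for a in cluster_a for b in cluster_b)


def agglomerate(values):
    """Repeatedly merge the two closest clusters and record each merge.

    Each recorded merge corresponds to one internal vertex of a binary
    weighted tree; the stored distance plays the role of the vertex weight.
    """
    clusters = [(name,) for name in sorted(values)]
    merges = []
    while len(clusters) > 1:
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(pair[0], pair[1], values),
        )
        distance = single_linkage(left, right, values)
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(left + right)
        merges.append((left, right, round(distance, 2)))
    return merges


merges = agglomerate(MOLECULAR_WEIGHT)
for left, right, distance in merges:
    print(" + ".join(left), "|", " + ".join(right), "| distance", distance)
```

With these illustrative values, the first merges pair Caffeic acid with Aspirin and Quercetin with Warfarin, in line with the molecular weight clustering described in the text. A different linkage rule or a different set of compounds could change the later merges.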
Compounds such as 5,7-Dimethoxycoumarin, Fisetin and Kaempferol shared the same number of hydrogen bond donor atoms mostly with antithrombotic molecules such as Clopidogrel, Prasugrel and Ticagrelor. Acceptor counts of 4, 6 or 7 occurred among compounds in the anticoagulant and antithrombotic groups. The number of rotatable bonds was lowest for the compounds from the papaya plant, at either 1 or 2, indicating the lowest rotational freedom during molecular interaction in comparison with the antithrombotic or anticoagulant drugs. The majority of the compounds were of low molecular weight, with a partition coefficient between 0 and 4 and a topological polar surface area between 51 and 150 Å². Most molecular weights fell between 200 and 400 g/mol, with one compound above 800 g/mol. The partition coefficient of a few compounds was below 0, and the largest number of compounds had a coefficient between 0 and 2. Few compounds had a polar surface area in the ranges 0-50 or 350-400 Å²; the majority had a TPSA value in the range 50-100 Å². The maximum number of rotatable bonds was between 20 and 30 [Figure 4a-c].
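The distribution statements above amount to counting compounds per value range. A minimal Python sketch of such binning follows; it is an illustration rather than the charting routine used in the study. The function name `bin_counts` is hypothetical, and the molecular weights are approximate values for a subset of the compounds named in the text, not the full dataset.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count values per half-open bin [k * width, (k + 1) * width)."""
    counts = Counter()
    for value in values:
        # Floor division also places negative values (e.g. a partition
        # coefficient below 0) in the correct bin.
        lower = (value // bin_width) * bin_width
        counts[(lower, lower + bin_width)] += 1
    return dict(sorted(counts.items()))


# Approximate molecular weights in g/mol for an illustrative subset
# (assumed values, to be checked against PubChem).
ILLUSTRATIVE_MW = [
    154.12, 164.16, 180.16, 180.16, 206.19,
    286.24, 286.24, 302.23, 308.33, 321.82,
]

mw_histogram = bin_counts(ILLUSTRATIVE_MW, 200)
for (lower, upper), count in mw_histogram.items():
    print(f"{lower:g}-{upper:g} g/mol: {'#' * count}")
```

The same function can be applied to partition coefficient or TPSA values with a bin width suited to the descriptor, for example 2 for the partition coefficient or 50 for TPSA.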

CONCLUSION
The Carica papaya plant is a rich source of secondary metabolites. Compounds from Carica papaya satisfy Lipinski's RO5 specifications, which suggests that they are pharmacologically active. Statistical analysis through clustering segregated the compounds on the basis of molecular properties, and in most cases grouped the Carica papaya compounds with antithrombotic drugs rather than with anticoagulant drugs. Standard deviations of the parameter values for compounds from the papaya plant indicate their relatedness to the reference antithrombotic or anticoagulant molecules. This study, based on the pharmacognosy of Carica papaya compounds, shows through clustering of molecular properties their similarity to the reference antithrombotic or anticoagulant molecules. Clustering analysis recognised them as part of a specific group of pharmaceutically active molecules. Grouping with Aspirin or Clopidogrel was significant in classification with respect to other drug properties. The secondary metabolites from Carica papaya show high structural potential as pharmacologically active molecules, which can be studied against clot-related aetiology.
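An RO5 screen of the kind referred to above can be sketched in Python as follows. This is a minimal illustration, assuming the commonly cited thresholds (molecular weight at most 500 g/mol, partition coefficient at most 5, at most 5 hydrogen bond donors and at most 10 acceptors) and the common convention of tolerating one violation. The class and function names are hypothetical, the Aspirin descriptors are approximate, and the second record is an invented molecule used only to show a failing case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    name: str
    molecular_weight: float  # g/mol
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d):
    """List the Rule of 5 criteria that a molecule exceeds."""
    violations = []
    if d.molecular_weight > 500:
        violations.append("molecular weight > 500")
    if d.log_p > 5:
        violations.append("partition coefficient > 5")
    if d.h_bond_donors > 5:
        violations.append("hydrogen bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("hydrogen bond acceptors > 10")
    return violations


def passes_ro5(d, max_violations=1):
    """Assumed convention: at most one violation is tolerated."""
    return len(ro5_violations(d)) <= max_violations


# Approximate descriptor values for Aspirin (to be checked against PubChem).
aspirin = Descriptors("Aspirin", 180.16, 1.2, 1, 4)
# Invented molecule, included only to demonstrate violations.
hypothetical_large = Descriptors("hypothetical large molecule", 850.0, 6.2, 7, 14)

for molecule in (aspirin, hypothetical_large):
    print(molecule.name, passes_ro5(molecule), ro5_violations(molecule))
```

Passing such a screen indicates drug-likeness in the RO5 sense only; it does not by itself establish activity against a particular target.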

GRAPHICAL ABSTRACT SUMMARY
The Carica papaya plant is known for its many medicinal uses. Various secondary metabolites have been identified in the plant. This work analyses the secondary metabolites of the Carica papaya plant on the basis of their structural and molecular properties to determine their pharmacognostic characteristics. The structural characteristics of the metabolites are compared and statistically clustered with antithrombotic molecules through hierarchical clustering methods. The results identify molecular similarities with specific antithrombotic drugs, indicating a structural relatedness that could contribute to their functional benefits.