A Possible Risk Gene for B-Cell Chronic Lymphocytic Leukemia: NRIP1

Background: The underlying mechanisms that cause B-cell chronic lymphocytic leukemia (B-CLL), the most common type of leukemia in adults, remain unclear. The aim of this study is to investigate novel genetic risk factors for B-CLL through a systematic literature review and meta-analysis. Methods: A comprehensive search of electronic databases was completed using Illumina BioEngine. Twenty-one B-CLL case/control bio-sets from four different studies were selected, including 195 B-CLL cases and 31 controls. The top B-CLL risk genes were further analyzed by integrating an online open source B-CLL genetic database. Pathway enrichment analysis (PEA) and network connectivity analysis (NCA) were conducted to identify potential functional associations between the target genes and B-CLL. Results: One novel gene (NRIP1) and two known genes (INPP5F and LEF1) were identified through the meta-analysis as top target genes for B-CLL. These genes play important roles within multiple B-CLL genetic pathways and are closely related to known B-CLL target genes. NCA results also revealed strong functional associations between these genes and B-CLL. Conclusion: This study identified known as well as novel B-CLL target genes and the functional pathways that are involved in B-CLL pathogenesis. Our results may provide new insights into the genetic mechanisms of B-CLL.


INTRODUCTION
Chronic lymphocytic leukemia (CLL) is the most frequent B cell leukemia in elderly patients [1]. The onset age of CLL is mostly over 50, with few occurrences in children [2][3][4]. The cellular origin of CLL is still debated, although this information is critical to understanding its pathogenesis. It has been hypothesized that both environmental and genetic factors play important roles in the development of CLL [5][6][7].
A large number of case-control studies and family-based genetic studies of B-CLL have been conducted to explore candidate genes for the disease [5,6,8,9,10,11,12]. Many studies have shown increased familial risk for CLL [13], with an ~8.5-fold increased relative risk in first-degree relatives [14]. In addition, genome-wide association (GWA) studies identified multiple CLL susceptibility loci [15] and novel genetic variants in familial CLL that were not seen in sporadic CLL [16,17]. Furthermore, multiple-modality genetic data from peripheral blood samples were employed to identify B-CLL genetic determinants [8][9][10][11]. These previous studies built a solid background for B-CLL genetic research, which could be leveraged for the discovery and evaluation of novel risk genes.
The risk estimates from individual studies often lack statistical power due to limited sample sizes and sample specificities in terms of phenotype characteristics. It is also difficult to come to a consistent conclusion when results are spread over a large number of independent studies. Therefore, a meta-analysis of multiple studies could provide a more effective assessment of the genetic risk factors of B-CLL.
In this study, a meta-analysis was conducted on four recent studies (2004-2012). The top genes from the meta-analysis were further analyzed by integrating a curated B-CLL genetic database (B-CLL_GD). The B-CLL_GD database was constructed using a large-scale literature knowledge database, the Pathway Studio (PS) ResNet database. In recent years, the PS ResNet database has been widely used to study modeled relationships between proteins, genes, complexes, cells, tissues and diseases (http://pathwaystudio.gousinfo.com/Mendeley.html). Our study identified novel B-CLL genes and evaluated the effectiveness of integrating meta-analysis with the PS ResNet database to identify and evaluate novel B-CLL risk genes.

Genetic data selection
A systematic search of electronic databases was conducted using the Illumina BaseSpace Correlation Engine (http://www.illumina.com). Fig. 1 presents the diagram for the data selection. A search for 'B-cell chronic lymphocytic leukemia' identified 28 B-CLL studies. Further filter criteria were: (1) the organism is Homo sapiens; (2) the data type is RNA expression; (3) the study is a B-CLL case vs. healthy control study (or includes case/control bio-sets). In total, 21 bio-sets (B-CLL case/control comparisons) from four studies satisfied the study selection criteria and were included in this systematic review and meta-analysis.
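The three filter criteria above can be expressed as a simple predicate over bio-set metadata. The following is a minimal sketch, not the search engine's actual interface: the `BioSet` record, its field names, and the toy entries are all hypothetical and serve only to illustrate the filtering logic.

```python
from dataclasses import dataclass


@dataclass
class BioSet:
    """Hypothetical metadata record for one bio-set (field names are assumptions)."""
    study_id: str
    organism: str
    data_type: str
    comparison: str  # e.g. "case_vs_healthy_control"


def passes_selection(bioset: BioSet) -> bool:
    """Apply the three criteria: human, RNA expression, case vs. healthy control."""
    return (
        bioset.organism == "Homo sapiens"
        and bioset.data_type == "RNA expression"
        and bioset.comparison == "case_vs_healthy_control"
    )


def select_biosets(biosets):
    """Return only the bio-sets that satisfy every criterion, in input order."""
    return [b for b in biosets if passes_selection(b)]


# Toy records for illustration only; these are not real study metadata.
toy_biosets = [
    BioSet("STUDY_A", "Homo sapiens", "RNA expression", "case_vs_healthy_control"),
    BioSet("STUDY_B", "Mus musculus", "RNA expression", "case_vs_healthy_control"),
    BioSet("STUDY_C", "Homo sapiens", "DNA methylation", "case_vs_healthy_control"),
    BioSet("STUDY_D", "Homo sapiens", "RNA expression", "treated_vs_untreated"),
]
selected = select_biosets(toy_biosets)
selected_ids = [b.study_id for b in selected]
```

In this toy input only `STUDY_A` meets all three criteria; each of the other records fails exactly one of them.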

Genetic database B-CLL_GD
The B-CLL_GD is an online B-CLL targeted knowledge database available at 'Bioinformatics Database' (http://database.gousinfo.com/). The database is updated monthly or upon request. The current version of B-CLL_GD is composed of 753 B-CLL target genes (B-CLL_GD→Related Genes), 125 pathways (B-CLL_GD→Related Pathways), and 159 related diseases (B-CLL_GD→Related Diseases). The database also provides supporting references for each B-CLL-gene relation, including the titles and the sentences where the relation has been identified (B-CLL_GD→Ref for Related Genes). This information can be used to locate a detailed description of how a candidate gene/drug is related to B-CLL.
Using B-CLL_GD, further analysis of the B-CLL target genes from the meta-analysis was conducted, including identifying their related B-CLL pathways (B-CLL_GD→Related Pathways) and genes (B-CLL_GD→Related Genes). Here we defined two genes as functionally related if they play roles within the same genetic pathway. Pathway enrichment analysis (PEA) was conducted using Pathway Studio to identify genetic pathways potentially linked to B-CLL [18]. The gene-disease relationships were identified using the network building module of Pathway Studio.
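The definition of functional relatedness used here (two genes share at least one pathway) can be sketched as a set intersection over pathway membership. This is an illustrative sketch only; it does not reproduce Pathway Studio, and the pathway and gene names below are placeholders rather than entries from B-CLL_GD.

```python
from itertools import combinations


def shared_pathways(pathways, gene_a, gene_b):
    """Return the names of all pathways that contain both genes."""
    return {
        name
        for name, members in pathways.items()
        if gene_a in members and gene_b in members
    }


def functionally_related_pairs(pathways, genes):
    """Map each gene pair that shares at least one pathway to those pathways."""
    related = {}
    for gene_a, gene_b in combinations(sorted(genes), 2):
        common = shared_pathways(pathways, gene_a, gene_b)
        if common:
            related[(gene_a, gene_b)] = common
    return related


# Placeholder pathway memberships for illustration only.
toy_pathways = {
    "PATHWAY_1": {"GENE_A", "GENE_B"},
    "PATHWAY_2": {"GENE_B", "GENE_C"},
    "PATHWAY_3": {"GENE_D"},
}
pairs = functionally_related_pairs(
    toy_pathways, {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
)
```

Under this definition `GENE_A` and `GENE_C` are not related, even though each is related to `GENE_B`: relatedness requires direct co-membership in a pathway and is not transitive.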

Selected Datasets
Using the previously identified selection criteria, 21 B-CLL case/control comparison bio-sets from four independent studies were retrieved and assessed (see B-CLL_Meta→Selected Datasets). Only one of the four datasets contained a single case/control study (GSE19147) [8]. In this study, researchers analyzed CD3+ T-cells isolated from patients with B-CLL, providing insights into the role of T-cells in B-CLL. The other three datasets, each containing multiple separate case/control studies, are available at NCBI GEO (ID: GSE2466, GSE26725 and GSE36907).
Dataset GSE26725 was designed to study the relationship between MYB (v-myb myeloblastosis viral oncogene homolog) and the miR-155 host gene in B-CLL [9]. It contains 4 case/control studies. Dataset GSE36907 contains 2 case/control studies [11], including: (1) B-CLL with mutated IgV vs. normal naive CD27-IgD+; (2) B-CLL with wildtype IgV vs. normal naive CD27-IgD+. The third dataset (GSE2466) contained 14 separate case/control studies; the original study found that a gene dosage effect may exert a pathogenic role in B-CLL and that the genomic signature for the VH mutational status might be sex-related [10]. All the bio-set studies are available at http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE2466. Statistics of the included bio-sets are presented in Table 1. Note: c-c refers to case vs. control.

Meta-analysis results
The meta-analysis results were deposited into the 'Bioinformatics Database' (http://database.gousinfo.com), under the title B-CLL_Meta. The top three genes (Score > 90) from the meta-analysis appear in Table 2. Of the three genes listed in Table 2, only one (NRIP1) was not included in the database B-CLL_GD, which suggests that it may be a novel B-CLL risk gene. Further study using B-CLL_GD showed that this novel gene was enriched within multiple B-CLL target pathways and was connected to many other genes that were linked to B-CLL (Table 2, B-CLL_Meta → Related Pathways). Fig. 2 shows the 14 B-CLL pathways that include these three genes. Of note, two of these 14 pathways were among the top 10 B-CLL pathways (B-CLL_Meta → Related Pathways): positive regulation of cell proliferation (0008284) and negative regulation of apoptotic process (0006916), as shown in Fig. 2.
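The screening logic in this paragraph (keep genes whose score exceeds 90, then flag those absent from the known-gene database as candidate novel genes) can be sketched as follows. The gene names, scores, and database contents are placeholders chosen for illustration; they are not values from Table 2 or from B-CLL_GD, and the function names are hypothetical.

```python
def top_genes(scores, threshold=90.0):
    """Return genes with score strictly above the threshold, highest score first."""
    passing = [gene for gene, score in scores.items() if score > threshold]
    return sorted(passing, key=lambda gene: scores[gene], reverse=True)


def split_known_novel(genes, known_genes):
    """Partition genes into those present in the reference database and those absent."""
    known = [gene for gene in genes if gene in known_genes]
    novel = [gene for gene in genes if gene not in known_genes]
    return known, novel


# Placeholder scores and database contents for illustration only.
toy_scores = {"GENE_A": 97.0, "GENE_B": 93.5, "GENE_C": 91.2, "GENE_D": 88.0}
toy_known_db = {"GENE_A", "GENE_C", "GENE_E"}

top = top_genes(toy_scores)
known, novel = split_known_novel(top, toy_known_db)
```

A gene flagged as novel by this kind of lookup is novel only relative to the reference database at the time of the query, which is why the text treats the result as a candidate rather than a confirmed risk gene.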

Network analysis
Additional functional network connectivity analysis (NCA) using PS showed that the novel gene from this meta-analysis (NRIP1) has a strong functional association with B-CLL. These genes influence the pathogenic development of B-CLL through multiple pathways (Fig. 3). Each relation (arrow) is supported by one or more references (see B-CLL_Meta → NRIP1).

DISCUSSION
Although many previous genetic studies have been conducted to discover genetic risk factors for B-CLL, combining the results from these separate studies through meta-analysis could lead to higher statistical power and a more robust point estimate for the disease. In this study, meta-analysis was performed on 21 B-CLL case/control bio-sets extracted from four recent studies. The B-CLL target genes from the meta-analysis were sorted by gene score, which is based on the statistical significance and consistency of the gene across the queried bio-sets. Meta-analysis results suggested three top risk genes (INPP5F, NRIP1 and LEF1) for B-CLL (Score > 90), of which one is novel according to the recently updated database B-CLL_GD. Further analyses were conducted to study the possible correlation between B-CLL and these three genes, especially the novel gene.
Analysis using B-CLL_GD showed that the two known B-CLL target genes, INPP5F and LEF1, are among the top B-CLL_GD genes (see B-CLL_GD → Related Genes).
Results from PEA showed that the two known genes and the one novel gene (NRIP1) are enriched within multiple B-CLL pathways (B-CLL_Meta → Related Pathways) and linked to hundreds of other B-CLL genes. These results support the relationship between the identified genes and B-CLL.
Additional network connectivity analysis (NCA) revealed multiple possible functional associations between B-CLL and the novel gene (Fig. 3). It has been shown that overexpression of NRIP1 could increase the mRNA levels of TNF-alpha [19], while TNF-alpha plays an important role in the progression of B-CLL [20]. TNF-alpha promotes the proliferation of malignant cell clones. Therefore, inhibition of TNF may have therapeutic applications in CLL [21]. This suggests that NRIP1 may play a role in the development of B-CLL through an NRIP1 → TNF → B-CLL pathway.
Many studies indicated that INPP5F and LEF1 may have a role in the therapeutic strategies of CLL [22, 23]. More potential connections between these genes and B-CLL can be identified from the B-CLL_Meta database (see B-CLL_Meta → NRIP1, INPP5F and LEF1), which is available in the open source 'Bioinformatics Database' (http://database.gousinfo.com).
There are several limitations of this meta-analysis. The numbers of B-CLL patients and healthy controls were not well matched (195 B-CLL cases and 31 controls). The unbalanced case/control comparison may influence the accuracy of the results. Additionally, due to space limitations, we mainly focused on the most significant genes (Gene Score > 90). Genes with less significance in this meta-analysis may also have potential linkages to B-CLL. A full list of the top 100 genes used in this meta-analysis is presented in the database B-CLL_Meta.
In summary, this meta-analysis supported the correlation between two genes (INPP5F and LEF1) and B-CLL, and revealed one novel potential risk gene (NRIP1). Network analysis supported the meta-analysis results and identified potential functional pathways and mechanisms in which these genes play important roles in B-CLL. The findings in this study provide new insights into the current genetics research on B-CLL.
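One common way to gauge the cost of a case/control imbalance is the effective sample size, N_eff = 4 / (1/N_cases + 1/N_controls), which gives the size of a balanced design with comparable precision for a two-group comparison. The sketch below applies it to the pooled counts reported here. This is an illustrative calculation and not part of the original analysis; because the 21 bio-sets were analyzed as separate comparisons, the pooled figure is only a rough indication.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective total sample size of an unbalanced two-group design.

    Equals n_cases + n_controls when the groups are equal in size and
    shrinks as the design becomes more unbalanced.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("both group sizes must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Pooled counts reported in this study.
n_eff = effective_sample_size(195, 31)   # about 107
n_total = 195 + 31                       # 226 samples in total
```

With 195 cases and 31 controls, the 226 samples carry roughly the information of a balanced design with about 107 samples, since precision is limited mainly by the smaller control group.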

Fig. 1 Workflow diagram for meta-analysis data selection

Fig. 2 Fourteen B-CLL pathways in which the three genes are enriched. The weight of an edge between two nodes is the number of genes shared by the two pathways; the larger the size and the brighter the color of a node, the larger the number of B-CLL candidate pathways including the gene.

Fig. 3 Network connectivity analysis between NRIP1 and B-CLL. The networks were generated using the 'network building' module of Pathway Studio. For the definition of the entity types and relation types in the figure, please refer to http://pathwaystudio.gousinfo.com/ResNetDatabase.html