---
title: "RNA-Seq Data Network Analysis - Session 2-A"
author: "Akshay Bhat"
date: '`r format(Sys.time(), "Last modified: %d %b %Y")`'
output: 
  html_notebook: 
    toc: yes
    toc_float: yes
    css: stylesheets/styles.css
---
<img src="images/logo-sm.png" style="position:absolute;top:40px;right:10px;" width="200" />


# Part A:


# RNA-Seq Data Network Analysis
Cytoscape is an open source software platform for integrating, visualizing, and analysing measurement data in the context of networks.<br></br>
  
  This protocol describes a network analysis workflow in Cytoscape for differentially expressed genes from an RNA-Seq experiment. Overall workflow:<br></br>
  
  •	Finding a set of differentially expressed genes. <br></br>
  •	Retrieving relevant networks from public databases. <br></br>
  •	Integration and visualization of experimental data. <br></br>
  •	Network functional enrichment analysis. <br></br>
  •	Exporting network visualizations. <br></br>
  
  
## Setup
  
  Install the [stringApp](http://apps.cytoscape.org/apps/stringapp) from within Cytoscape via **Apps → App Store → Show App Store.** <br></br>
  
## OR <br></br>
  
Visit the **Cytoscape App Store** website and install or download the app from there.
<br></br><br></br>
  ![](images/StringApp.png)
<br></br><br></br>
  
## Experimental Data
  For this exercise, we will use a dataset comparing transcriptomic differences between bladder cancer and normal tissue. The study was published by Radvanyi F et al., and we will use a summarized dataset with fold change and p-value from the **EBI Gene Expression Atlas**. The ArrayExpress accession is **E-MTAB-1940**.  
  
* Link to the publication and data: [ArrayExpress E-MTAB-1940](https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-1940?query=E-MTAB-1940)
  
  
<br></br>
  <br></br>
  
* Download the data for *Transcriptomic analysis of bladder cancer reveals convergent molecular pathology*: [BCLA-all.tsv](https://raw.githubusercontent.com/a1aks/Cytoscape_Course/main/Data_Files/BCLA-all.tsv). First select all the contents (Ctrl + A, or Command + A on a Mac), then right-click and use the **Save as** option to save the file.  <br></br>
* To open the tsv data file in Excel, first launch Excel and open a blank workbook. Next, go to **Data → Get External Data → Import Text File...** <br></br>
* In the import wizard, select **Delimited**, and in the next step select **Tab** as the delimiter.
<br></br>
* In the third step, you can select the **Data Format** for every column. The file has 4 columns of data: **Gene ID, Gene Name, fold change and p-value**. **Make sure to change the format for the second column, Gene Name, to Text**, so that Excel does not convert gene symbols such as SEPT1 into dates. You will have to scroll to the right to see the second column.
<br></br>
* Click **Finish** to complete the import.

<br></br>
  
## Editing experimental data
  We are going to define a set of up-regulated genes from the full dataset by filtering on fold change and p-value.
<br></br>
To do this, we need to edit the raw downloaded file to obtain expression information for the features specific to bladder cancer. 

Download the file [E-MTAB-1940-query-results.tsv](https://raw.githubusercontent.com/a1aks/Cytoscape_Course/main/Data_Files/E-MTAB-1940-query-results.tsv) and open it in Microsoft Excel. 

<br></br>
  •	Select the row containing data value headers (row 4) and select **Data → Filter.**<br></br>
  •	In the drop-down for the fold change column, set a filter for fold change greater than 2. This should result in **263** genes.<br></br>
  •	Next, one would normally filter out non-significant changes by filtering on the p-value as well, for example setting p-value less than 0.05. But in this case, all genes with a fold change greater than 2 already meet that cutoff.<br></br>
  •	With the filter active, select and copy all entries in the **Gene Name** column.<br></br>
  <br></br><br></br>
  ![](images/Exp_data.png)
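
The same filter can also be applied outside Excel. The following is a minimal Python sketch, not part of the original protocol. It assumes a tab-separated file with `#` comment lines before the header row. The column names `foldChange` and `pValue` and the toy rows are placeholders, to be replaced with the headers and contents of the real file.

```python
import csv
import io


def upregulated_genes(tsv_text, fc_col, p_col, name_col="Gene Name",
                      min_fc=2.0, max_p=0.05):
    """Return gene names with fold change > min_fc and p-value < max_p."""
    # Drop blank lines and the '#' comment lines that precede the header row.
    lines = [ln for ln in tsv_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    genes = []
    for row in reader:
        try:
            fold_change = float(row[fc_col])
            p_value = float(row[p_col])
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if fold_change > min_fc and p_value < max_p:
            genes.append(row[name_col])
    return genes


# Toy data (made up for illustration, not taken from E-MTAB-1940).
toy = (
    "# comment line\n"
    "Gene ID\tGene Name\tfoldChange\tpValue\n"
    "ENSG1\tAAA\t3.1\t0.001\n"
    "ENSG2\tBBB\t1.2\t0.01\n"
    "ENSG3\tCCC\t2.5\t0.2\n"
    "ENSG4\tDDD\t-4.0\t0.001\n"
)
print(upregulated_genes(toy, "foldChange", "pValue"))
```

With the real file, you would read its text with `open(path).read()` and pass the actual fold change and p-value column headers.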

<br></br><br></br>
  
## Retrieve Networks from STRING
  
  We will use the **STRING** database to retrieve a network relevant to the list of up-regulated genes.
<br></br><br></br>
  •	Launch Cytoscape. In the **Network Search** bar at the top of the **Network Panel**, select **STRING protein query** from the drop-down, and paste in the list of 263 up-regulated genes.<br></br>
  •	Open the options panel and confirm that you are searching **Homo sapiens** with a **Confidence cutoff of 0.40** and **0 Maximum additional interactors.**<br></br>
  •	Click the **search icon** to search. If any of the search terms are ambiguous, a **Resolve Ambiguous Terms** dialog will appear. Click **Import** to continue with the import using the default choices. The resulting network will load automatically, and should have around **173** nodes. <br></br>
  
  <br></br>
  
  ![](images/String-Import.png)
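
For reference, a comparable query can be expressed against the public STRING REST API. The sketch below only builds the request URL and does not contact the server. It assumes the `network` endpoint with its `identifiers`, `species`, `required_score`, and `add_nodes` parameters, and it is not necessarily the call that the stringApp makes internally. The two gene symbols are placeholders and `string_network_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public STRING REST API (TSV output).
STRING_API = "https://string-db.org/api/tsv/network"


def string_network_url(genes, species=9606, confidence=0.40, extra_nodes=0):
    """Build a STRING network query URL for a list of gene symbols."""
    params = {
        # STRING expects identifiers separated by carriage returns (%0D).
        "identifiers": "\r".join(genes),
        # NCBI taxonomy ID; 9606 is Homo sapiens.
        "species": species,
        # The API takes the confidence cutoff on a 0-1000 scale.
        "required_score": int(round(confidence * 1000)),
        # Number of additional interactors to add (0 = query genes only).
        "add_nodes": extra_nodes,
    }
    return f"{STRING_API}?{urlencode(params)}"


url = string_network_url(["KRT5", "TP63"])
print(url)
```

The defaults mirror the settings used in the search dialog: Homo sapiens, a confidence cutoff of 0.40, and no additional interactors.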
<br></br><br></br>
  
## STRING Network Up-Regulated Genes
  
  The resulting network contains up-regulated genes recognized by STRING, and interactions between them with a confidence score of 0.4 or greater.
<br></br><br></br>
  
  ![](images/String-Image1.png)


<br></br><br></br>
  
  The network consists of one large connected component, several smaller components, and some unconnected nodes. We will use only the largest connected component for the rest of the tutorial.<br></br>
  
  •	To select the largest connected component, select **Select → Nodes → Largest subnetwork.**<br></br>
  •	Select **File → New Network → From Selected Nodes, All Edges.**<br></br>
  <br></br><br></br>
  ![](images/String-Image2.png)
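
To make clear what "largest connected component" means, here is a small Python sketch that finds it with a breadth-first search. It is an illustration only and not how Cytoscape implements the menu command. The edge list and node names are made up.

```python
from collections import deque


def largest_component(edges, nodes=()):
    """Return the set of nodes in the largest connected component."""
    # Build an undirected adjacency map; isolated nodes get an empty set.
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    visited, best = set(), set()
    for start in adjacency:
        if start in visited:
            continue
        # Breadth-first search collects everything reachable from 'start'.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        visited |= component
        if len(component) > len(best):
            best = component
    return best


toy_edges = [("A", "B"), ("B", "C"), ("D", "E")]
print(sorted(largest_component(toy_edges, nodes=["F"])))
```

In this toy network the component containing A, B and C is kept, while the D-E pair and the unconnected node F are dropped.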

<br></br>

## Data Integration
  
  Next we will import the RNA-Seq data and use them to create a visualization.<br></br>
  
  •	Load the downloaded **E-MTAB-1940-query-results.tsv** file from the File menu by selecting **Import → Table from File...** Alternatively, drag and drop the data file directly onto the Node Table.<br></br>
  •	Under **Advanced Options...**, enter # in the **Ignore Lines Starting With** field to exclude the additional lines at the beginning of the data file.<br></br>
  •	Select the **query term** column as the **Key column for Network**, then set the **Gene Name** column as the key column of the imported table by clicking on its header and selecting the key symbol.<br></br>
  •	Click **OK** to import. Two new columns of data will be added to the **Node Table.**<br></br>
  
  <br></br><br></br>
  ![](images/Import-data-int.png)
![](images/Import-Columns.png)
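
Conceptually, the import is a join between the node table and the data table on the two key columns. The sketch below illustrates that idea in plain Python. It is an assumption about the general behaviour, not Cytoscape's implementation, and the rows and the `foldChange` and `pValue` column names are toy placeholders.

```python
def merge_by_key(node_rows, data_rows,
                 node_key="query term", data_key="Gene Name"):
    """Attach data columns to each node whose key matches a data row."""
    # Index the data table by its key column for fast lookup.
    lookup = {row[data_key]: row for row in data_rows}
    merged = []
    for node in node_rows:
        new_row = dict(node)
        # Nodes without a matching data row are kept unchanged.
        for column, value in lookup.get(node[node_key], {}).items():
            if column != data_key:
                new_row[column] = value
        merged.append(new_row)
    return merged


# Toy tables (made up for illustration).
toy_nodes = [{"query term": "AAA", "degree": 3},
             {"query term": "ZZZ", "degree": 1}]
toy_data = [{"Gene Name": "AAA", "foldChange": 3.1, "pValue": 0.001}]
for row in merge_by_key(toy_nodes, toy_data):
    print(row)
```

Nodes whose **query term** has no match in **Gene Name** simply receive no new values, which is worth checking for in the Node Table after the import.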
<br></br><br></br>
  
  
## Visualization
  Next, we will create a visualization of the imported data on the network. For more detailed information on data visualization, see the Visualizing Data tutorial.<br></br>
  
  •	In the **Style** tab of the **Control Panel**, switch the style from **STRING** style to **default** in the drop-down at the top.<br></br>
  •	Change the default node **Shape** to **ellipse** and **check Lock node width and height.**<br></br>
  •	Set the default node **Size** to **50.**<br></br>
  •	Set the default node **Fill Color** to **light gray**.<br></br>
  •	Set the default **Border Width** to 2, and make the default **Border Paint** dark gray.<br></br>
  <br></br><br></br>
  ![](images/Visualization-Styles.png)

<br></br><br></br>
  •	For node **Fill Color**, create a continuous mapping on the column `'NMIBC' vs 'normal' .foldChange`.<br></br>
  •	Double-click the color mapping to open the **Continuous Mapping Editor** and click the **Current Palette**. Select the ColorBrewer **yellow-orange-red shades gradient**.<br></br>
  •	Finally, for node **Label**, set a passthrough mapping on the **display name** column.<br></br>
  •	Save your new visualization under **Copy Style...** in the **Options** menu of the **Style** interface, and name it **de genes up**.<br></br>
  <br></br><br></br>
  ![](images/Style-options.png)
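
A continuous mapping turns each fold change value into a color along a gradient. The following Python sketch shows one way this could work, using linear interpolation between three anchor colors. The hex codes are assumed to approximate the ColorBrewer yellow-orange-red palette, and the value range of 2 to 8 is a placeholder. Cytoscape's own interpolation and breakpoints may differ.

```python
def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers."""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


def continuous_color(value, low, high,
                     stops=("#ffeda0", "#feb24c", "#f03b20")):
    """Map a value in [low, high] onto a gradient through the given stops."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp the value to the range, then scale to a position along the stops.
    t = min(max((value - low) / (high - low), 0.0), 1.0)
    position = t * (len(stops) - 1)
    index = min(int(position), len(stops) - 2)
    fraction = position - index
    start, end = hex_to_rgb(stops[index]), hex_to_rgb(stops[index + 1])
    # Interpolate each RGB channel linearly between the two nearest stops.
    rgb = tuple(round(a + (b - a) * fraction) for a, b in zip(start, end))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


for fold_change in (2, 5, 8):
    print(fold_change, continuous_color(fold_change, low=2, high=8))
```

Values outside the range are clamped to the nearest end of the gradient, so an extreme outlier does not change the color scale for the other nodes.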
<br></br><br></br>
  
  
  Apply the **Prefuse Force Directed** layout by clicking the **Apply Preferred Layout** button in the toolbar. The network will now look something like this:
  <br></br><br></br>
  ![](images/Prefuse-force-layout.png)
<br></br><br></br>

<div class="exercise">
# Exercise

## a. STRING Enrichment

The STRING app has built-in enrichment analysis functionality, which includes enrichment for Gene Ontology, InterPro, KEGG Pathways, and PFAM.<br></br>

*	Using the STRING tab of the Results Panel, click the **Functional Enrichment** button. Keep the default settings. What do you see?<br></br>
    ![](images/FunctionalEnrichmentButton.png)
  
  <br></br><br></br>

* When the enrichment analysis is complete, a new tab titled **STRING Enrichment** will open in the **Table Panel**.<br></br>

* The STRING app includes several options for filtering and displaying the enrichment results. The features are all available at the top of the **STRING Enrichment tab**. Filter the table to only show **GO Biological Process.**<br></br>
   
* At the top left of the STRING Enrichment tab, click the filter icon `r icons::fontawesome("filter", style = "solid")`. Select **GO Biological Process** and check the **Remove redundant terms** check-box. Then click **OK.**<br></br>
* Next, add a split donut chart to the nodes representing the top terms by clicking the chart icon at the top of the STRING Enrichment tab.<br></br>
* Explore custom settings via the settings icon in the top right of the STRING Enrichment tab.<br></br>   
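
To give a feel for what an enrichment p-value represents, the sketch below computes a generic over-representation p-value with the hypergeometric distribution. This is a textbook illustration under assumed toy numbers, not a description of the exact statistics or multiple-testing correction that the STRING app applies.

```python
from math import comb


def hypergeom_pvalue(hits, list_size, term_size, background_size):
    """P(at least `hits` annotated genes) when drawing `list_size` genes
    at random from a background in which `term_size` genes carry the term."""
    upper = min(list_size, term_size)
    total = comb(background_size, list_size)
    favourable = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


# Toy numbers: 2 of 2 selected genes carry a term that annotates
# 5 genes in a background of 10.
print(hypergeom_pvalue(hits=2, list_size=2, term_size=5, background_size=10))
```

A small p-value means that the term appears in the network more often than expected by chance given the background.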
    
## b. Repeat the whole experiment using "down-regulated" genes. 

## c. Export your networks

## d. Save in any of the available formats, ready for publishing.
  
  </div>
  
  
### We will now go to the next session, [Session 2B](session2b.nb.html), to learn how to perform functional enrichment.    
  