RNA-Seq Data Network Analysis
Cytoscape is an open source software platform for integrating,
visualizing, and analysing measurement data in the context of
networks.
This protocol describes a network analysis workflow in Cytoscape for
differentially expressed genes from an RNA-Seq experiment.
Overall workflow:
• Finding a set of differentially expressed genes.
• Retrieving relevant networks from public databases.
• Integrating and visualizing experimental data.
• Network functional enrichment analysis.
• Exporting network visualizations.
Setup
Install the stringApp from the Cytoscape App Store. You can do this
from within Cytoscape via Apps → App Store → Show App Store, or by
visiting the Cytoscape App Store in a browser and installing or
downloading it from there.
Experimental Data
For this exercise, we will use a dataset comparing transcriptomic
differences between bladder cancer and normal tissue. The study was
published by Radvanyi F et al., and we will use a summarized dataset
with fold change and p-value from the EBI Gene Expression
Atlas. The ArrayExpress accession is E-MTAB-1940.
- Link to the publication and data: (Here..!)
- Download the data: Transcriptomic analysis of bladder cancer reveals
convergent molecular pathology. (Here..!).
First select all the contents (Ctrl + A, or Command + A on a Mac), then
right-click and choose Save As to save the file.
- To open the TSV data file in Excel, first launch Excel and open a
blank workbook. Next, go to Data → Get External Data → Import
Text File….
- In the import wizard, select Delimited and in the
next step select Tab.
- In the third step, you can select the Data Format
for every column. The file has 4 columns of data: Gene ID, Gene
Name, fold change and p-value. Make sure to change the
format for the second column, Gene Name, to Text. You will have
to scroll to the right to see the second column.
- Click Finish to complete the import.
Editing experimental data
We are going to define a set of up-regulated genes from the full
dataset by filtering for fold change and p-value.
To do so, we need to edit the raw downloaded file so that it contains
expression information only for the features specific to bladder cancer.
Download the following file (Here..!) and open it in Microsoft Excel.
• Select the row containing the data value headers (row 4) and
select Data → Filter.
• In the drop-down for the fold change column, set a filter for fold
change greater than 2. This should result in 263 genes.
• Next, one would normally filter out non-significant changes by
filtering on the p-value as well, for example by setting p-value less
than 0.05. In this case, however, all genes with a fold change greater
than 2 already meet that cutoff.
• With the filter active, select and copy all entries in the Gene Name
column.
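The same filter can also be applied outside Excel. The following is a minimal sketch in Python, assuming a tab-separated file in which extra lines start with `#`. The column names (`foldChange`, `pValue`) and the sample rows are illustrative assumptions, not copied from the actual download, so adjust them to match the real header row.

```python
import csv
import io

# Tiny made-up sample; column names and values are assumptions for
# illustration and do not come from the E-MTAB-1940 download.
SAMPLE_TSV = """# illustrative comment line
Gene ID\tGene Name\tfoldChange\tpValue
ENSG0001\tGENE_A\t3.1\t0.001
ENSG0002\tGENE_B\t1.4\t0.020
ENSG0003\tGENE_C\t2.5\t0.300
ENSG0004\tGENE_D\t-2.8\t0.004
"""

def up_regulated(handle, min_fc=2.0, max_p=0.05):
    """Return gene names with fold change > min_fc and p-value < max_p."""
    # Skip comment lines so the first remaining line is the header row.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return [
        row["Gene Name"]
        for row in reader
        if float(row["foldChange"]) > min_fc and float(row["pValue"]) < max_p
    ]

genes = up_regulated(io.StringIO(SAMPLE_TSV))
print("\n".join(genes))  # one gene name per line, ready to paste
```

To run it on a real file, replace `io.StringIO(SAMPLE_TSV)` with an open file handle for the downloaded data.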

Retrieve Networks from STRING
We will use the STRING database to find a network relevant to the list
of up-regulated genes.
• Launch Cytoscape. In the Network Search bar at the top of the
Network Panel, select STRING protein query from the drop-down, and
paste in the list of 263 up-regulated genes.
• Open the options panel and confirm you are searching Homo sapiens
with a Confidence cutoff of 0.40 and 0 Maximum additional interactors.
• Click the search icon to search. If any of the search terms are
ambiguous, a Resolve Ambiguous Terms dialog will appear. Click Import
to continue with the import using the default choices. The resulting
network will load automatically, and should have around 173 nodes.
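The same query settings can be written down as a Cytoscape command string, which is useful for keeping a record of the parameters used. The sketch below only builds the string and does not contact Cytoscape. The command name and the argument names (`query`, `species`, `cutoff`, `limit`) are assumptions about the stringApp command interface, so check them against the command help in your installation before relying on them.

```python
def string_query_command(genes, species="Homo sapiens", cutoff=0.4, limit=0):
    """Build a command string for a STRING protein query.

    Argument names are assumed, not confirmed by this protocol:
    cutoff = confidence cutoff, limit = maximum additional interactors.
    """
    query = ",".join(genes)
    return (
        f'string protein query query="{query}" '
        f'species="{species}" cutoff={cutoff} limit={limit}'
    )

# Hypothetical gene names, for illustration only.
cmd = string_query_command(["GENE_A", "GENE_B"])
print(cmd)
```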
STRING Network Up-Regulated Genes
The resulting network contains up-regulated genes recognized by
STRING, and interactions between them with a confidence score of 0.4 or
greater.

The network consists of one large connected component, several
smaller subnetworks, and some unconnected nodes. We will use only the
largest connected component for the rest of the tutorial.
• To select the largest connected component, select Select → Nodes →
Largest subnetwork.
• Select File → New Network → From Selected Nodes, All Edges.
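To make clear what "largest connected component" means here, the following sketch finds it with a breadth-first search over a small edge list. It is an illustration of the concept, not a description of how Cytoscape implements the menu item, and the node names are made up.

```python
from collections import defaultdict, deque

def largest_component(nodes, edges):
    """Return the set of nodes in the largest connected component."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    best = set()
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# One three-node component, one pair, and one unconnected node.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]
print(sorted(largest_component(nodes, edges)))
```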

Data Integration
Next we will import the RNA-Seq data and use them to create a
visualization.
• Load the downloaded E-MTAB-1940-query-results.tsv file by selecting
File → Import → Table from File…. Alternatively, drag and drop the data
file directly onto the Node Table.
• In Advanced Options…, enter # in the Ignore Lines Starting With
field to exclude the additional lines at the beginning of the data file.
• Select the query term column as the Key column for Network, and set
the Gene Name column as the key column of the imported table by
clicking on its header and selecting the key symbol.
• Click OK to import. Two new columns of data will be added to the
Node Table.
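The import is a key-based join: each node's query term is matched against the Gene Name column of the table, and the remaining columns are copied onto the matching node. A minimal sketch of that logic follows. The node identifiers, column names, and values are illustrative assumptions, and the sample table is trimmed to three columns.

```python
import csv
import io

# Illustrative data only: node rows as they might appear in the Node Table,
# and an expression table with a '#' line before the header.
node_table = [
    {"name": "9606.P1", "query term": "GENE_A"},
    {"name": "9606.P2", "query term": "GENE_B"},
]
EXPRESSION_TSV = """# comment line to be ignored
Gene Name\tfoldChange\tpValue
GENE_A\t3.1\t0.001
GENE_Z\t2.2\t0.010
"""

def import_table(nodes, handle, network_key="query term", table_key="Gene Name"):
    """Copy non-key columns onto nodes whose network key matches the table key."""
    # Same effect as "Ignore Lines Starting With: #".
    lines = (line for line in handle if not line.startswith("#"))
    by_key = {row[table_key]: row for row in csv.DictReader(lines, delimiter="\t")}
    for node in nodes:
        match = by_key.get(node[network_key])
        if match:
            node.update({k: v for k, v in match.items() if k != table_key})
    return nodes

for node in import_table(node_table, io.StringIO(EXPRESSION_TSV)):
    print(node)
```

Nodes without a matching row (GENE_B here) are left unchanged, and table rows without a matching node (GENE_Z) are ignored.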
Visualization
Next, we will create a visualization of the imported data on the
network. For more detailed information on data visualization, see the
Visualizing Data tutorial.
• In the Style tab of the Control Panel, switch the style from STRING
style to default in the drop-down at the top.
• Change the default node Shape to ellipse and check Lock node width
and height.
• Set the default node Size to 50.
• Set the default node Fill Color to light gray.
• Set the default Border Width to 2, and make the default Border Paint
dark gray.
• For node Fill Color, create a continuous mapping for
'NMIBC' vs 'normal'.foldChange.
• Double-click the color mapping to open the Continuous Mapping Editor
and click the Current Palette. Select the ColorBrewer yellow-orange-red
shades gradient.
• Finally, for node Label, set a passthrough mapping for display name.
• Save your new visualization using Copy Style… in the Options menu of
the Style interface, and name it de genes up.
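A continuous mapping turns each fold change value into a color by interpolating between the colors of the chosen palette. The sketch below shows the idea with simple linear interpolation. The three hex colors are an illustrative yellow-orange-red ramp and the value range of 2 to 10 is an assumption. Cytoscape's own editor may place its color stops differently.

```python
def hex_to_rgb(h):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def continuous_color(value, lo, hi, stops):
    """Linearly interpolate a hex color for value between lo and hi."""
    # Clamp so values outside the range take the end colors.
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    # Locate the pair of neighbouring stops that brackets t.
    pos = t * (len(stops) - 1)
    i = min(int(pos), len(stops) - 2)
    frac = pos - i
    a, b = hex_to_rgb(stops[i]), hex_to_rgb(stops[i + 1])
    rgb = (round(x + (y - x) * frac) for x, y in zip(a, b))
    return "#" + "".join(f"{c:02x}" for c in rgb)

# Illustrative yellow-orange-red stops (assumed, not read from Cytoscape).
STOPS = ["#ffeda0", "#feb24c", "#f03b20"]
for fc in (2.0, 6.0, 10.0):
    print(fc, continuous_color(fc, 2.0, 10.0, STOPS))
```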
Apply the Prefuse Force Directed layout by clicking
the Apply Preferred Layout button in the toolbar. The
network will now look something like this:
