SessionRegFileScoreInfoCsv

Overview

The score_info.csv file provides instructions on which files hold the score information, what type of score was calculated, and which columns in the score files hold the numeric score and the rank of that score. By "instructions" I mean that the app's data loading code reads this file to figure out how to load the score data from the session area into the SQL portion of the app.

If you use the built-in calculation tools in the session part of the app, then the score_info.csv file is updated automatically. If you want to calculate a score yourself and upload the results, then you need to read this page.

The score_info.csv file has metadata about the scores which have been generated for a series/dataset. Most of the scores of interest are two group comparisons -- meaning that two groups of samples are used and then the fold change and p-value are calculated for each gene between the groups. Typically, one column in the samples.csv file is used to define which samples are in group1 and which are in group2.
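As a rough illustration of the two group comparison described above, here is a minimal sketch that picks the samples for each group from one column and computes a per-gene log fold change as a difference of group means. It is not the app's code and not limma or deseq: the "sample" key, the function names, and the toy values are all hypothetical, and the p-value calculation is left out entirely.

```python
from statistics import mean


def split_groups(samples, column, group1, group2):
    """Return the sample ids in group1 and group2 for the given column.

    `samples` is a list of dicts, one per row of a samples table.
    The "sample" key is a hypothetical name for the sample id column.
    """
    g1 = [row["sample"] for row in samples if row[column] == group1]
    g2 = [row["sample"] for row in samples if row[column] == group2]
    return g1, g2


def log_fold_change(log_expr, g1, g2):
    """Per-gene difference of mean log expression: group1 minus group2."""
    return {
        gene: mean(values[s] for s in g1) - mean(values[s] for s in g2)
        for gene, values in log_expr.items()
    }


# Toy data (invented for illustration only).
samples = [
    {"sample": "s1", "treatment": "drug"},
    {"sample": "s2", "treatment": "drug"},
    {"sample": "s3", "treatment": "DMSO"},
    {"sample": "s4", "treatment": "DMSO"},
]
log_expr = {
    "GeneA": {"s1": 8.0, "s2": 9.0, "s3": 6.0, "s4": 6.0},
    "GeneB": {"s1": 5.0, "s2": 5.0, "s3": 7.0, "s4": 8.0},
}

g1, g2 = split_groups(samples, "treatment", "drug", "DMSO")
logfc = log_fold_change(log_expr, g1, g2)
```

Here group1 ("drug") plays the numerator role and group2 ("DMSO") the denominator role, so a positive value means higher expression in the drug samples.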

Here is a tabular view of the csv file. The metadata file is comma-separated, while the data files (e.g. the expression matrix and the score data files) are tab-delimited text.

The column names and contents are:

name: the name of the score. Must be unique per file.
filename: the name of the file that holds the score data. The score file has at least 3 columns: the first column is the feature name, and the other required columns are the score and the rank for each gene. The score is a number (like fold change or FDR); the rank is how this feature ranks for this score, with "1" being the "best." Avoid ties. The score file can have additional columns, but they will be ignored.
description: see below
score_type: usually one of mean, stdev, logFC, FDR [try to use these if possible, since the app sometimes looks for these strings]
rank_threshold: the number of features which are "differential" using this score. For logFC, my two group comparison code puts the number of genes where abs(logFC) > 1.73 [I have no idea why I picked 1.73], and for FDR I put the number of genes with an FDR < 0.05.
score_col: the column number of the score value in the score file. Column zero (0) is the feature name, column one (1) is the first column after the feature name, and so on.
rank_col: the column number of the rank in the score file, counted the same way as score_col.
group1: in a two group comparison, this is the "numerator" value from the column of interest (which you specify in a particular place in the description column; in a future version, I should make this better). See below. If you have a global score, which doesn't use a subset of samples, then you can use "all" here.
group2: in a two group comparison, this is the "denominator." If you have a global score, which doesn't use a subset of samples, then leave it blank. If you want all other samples not already picked in group1, then you can put "all_others" here.
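If you are building these files by hand, the column list above can be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the score name, the filename, the header row in the score file, and the tie-breaking rule (alphabetical by feature name) are my own choices for illustration, not something the app is known to require. The 1.73 cutoff is the one mentioned above.

```python
import csv
import io

# Column order follows the list on this page.
SCORE_INFO_COLUMNS = [
    "name", "filename", "description", "score_type", "rank_threshold",
    "score_col", "rank_col", "group1", "group2",
]


def rank_by_abs_logfc(logfc):
    """Rank 1 = largest abs(logFC); ties are broken by feature name
    (an arbitrary choice) so that no two features share a rank."""
    ordered = sorted(logfc, key=lambda f: (-abs(logfc[f]), f))
    return {feature: i for i, feature in enumerate(ordered, start=1)}


def score_file_text(logfc, ranks):
    """Tab-delimited score file: column 0 = feature, 1 = score, 2 = rank.
    The header row is an assumption; the page does not say if one is needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature", "logFC", "rank"])
    for feature in sorted(ranks, key=ranks.get):
        writer.writerow([feature, logfc[feature], ranks[feature]])
    return buf.getvalue()


def score_info_text(rows):
    """Comma-separated metadata file with one row per score."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=SCORE_INFO_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Toy values, invented for illustration.
logfc = {"GeneA": 2.5, "GeneB": -3.0, "GeneC": 0.4, "GeneD": 2.5}
ranks = rank_by_abs_logfc(logfc)
score_text = score_file_text(logfc, ranks)

row = {
    "name": "drug_vs_DMSO_logFC",        # hypothetical score name
    "filename": "drug_vs_DMSO.txt",      # hypothetical score file name
    "description": "deseq logFC, treatment column: drug vs DMSO",
    "score_type": "logFC",
    "rank_threshold": sum(1 for v in logfc.values() if abs(v) > 1.73),
    "score_col": 1,                      # first column after the feature name
    "rank_col": 2,
    "group1": "drug",
    "group2": "DMSO",
}
info_text = score_info_text([row])
```

Note that the description contains a comma, so it has to be quoted in the csv file; the `csv` module does this automatically.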

More info on "description"

The app sometimes parses the "description" column to grab some info on which columns were used for the two group comparison. In this example, the description in one row is:

limma logFC, celltype column: EC vs Podocytes

If you want to manually create your own description, you should follow the format of this one and keep all the spaces and punctuation the same. Just make these changes:

limma --> deseq
celltype --> [whichever column in your expr.samples.csv file has the group information for the two group comparison]
EC --> [this value in the "celltype" column shows which samples are in the numerator of the two group comparison]
Podocytes --> [this value in the "celltype" column shows which samples are in the denominator of the two group comparison]

So in the end it should look like this (making some random guesses for the column name and values): deseq logFC, treatment column: drug vs DMSO

The group1 value would then be "drug" and the group2 value would be "DMSO".
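To avoid getting the spaces or punctuation wrong, the description can be generated from its parts instead of typed by hand. The sketch below builds a description in the format shown above and parses one back. The regular expression is my own guess at the format, based only on the two examples on this page; it is not the app's actual parser, and the function names are hypothetical.

```python
import re

# Format inferred from the examples on this page:
#   "<method> <score_type>, <column> column: <group1> vs <group2>"
DESCRIPTION_PATTERN = re.compile(
    r"^(?P<method>\S+) (?P<score_type>\S+), (?P<column>.+) column: "
    r"(?P<group1>.+) vs (?P<group2>.+)$"
)


def build_description(method, score_type, column, group1, group2):
    """Assemble a description string with the exact spacing and punctuation."""
    return f"{method} {score_type}, {column} column: {group1} vs {group2}"


def parse_description(description):
    """Return the parts as a dict, or None if the format does not match."""
    match = DESCRIPTION_PATTERN.match(description)
    return match.groupdict() if match else None


built = build_description("deseq", "logFC", "treatment", "drug", "DMSO")
parsed = parse_description("limma logFC, celltype column: EC vs Podocytes")
```

Round-tripping a description through `parse_description` is a quick way to check that the group1 and group2 values in it match what you put in the group1 and group2 columns.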

NOTE: this all assumes you are doing basically a two group comparison. If your model is more complicated, then my parsing of the description will probably not be useful. I don't think too many things will break if you do this differently, but we can see what happens.
