SessionGEO
Create a new session by selecting "Process GEO data"

In the new GEO session, here is the wizard page:

Some of the steps run as background tasks, so you can click “Check Tasks” (red circle) or just wait for them to finish. You can tell that a task is finished when the next step is highlighted in a blue box on the Wizard page.
For this step, you only need to provide a GSE id from GEO; the app then downloads the data from GEO in the background.
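To see what is behind a GSE id, here is a minimal Python sketch that builds the public GEO directory URL where the series matrix file(s) for a series are stored. It follows GEO's folder layout, in which the last three digits of the id are replaced by "nnn". The function name `series_matrix_dir_url` is hypothetical, and this is not necessarily how the app itself performs the download.

```python
import re


def series_matrix_dir_url(gse_id: str) -> str:
    """Return the GEO directory URL holding the series matrix file(s) for a GSE id."""
    gse_id = gse_id.strip().upper()
    if not re.fullmatch(r"GSE\d+", gse_id):
        raise ValueError(f"not a GSE id: {gse_id!r}")
    digits = gse_id[3:]
    # GEO groups series into folders named by replacing the last
    # three digits of the id with "nnn" (GSE12345 -> GSE12nnn).
    stub = "GSE" + digits[:-3] + "nnn"
    return f"https://ftp.ncbi.nlm.nih.gov/geo/series/{stub}/{gse_id}/matrix/"


if __name__ == "__main__":
    print(series_matrix_dir_url("GSE12345"))
```

Opening the printed URL in a browser lists the series matrix files for that series, which is handy for checking what the app will find before you start the step.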
Usually only a single series matrix file is shown in the table. Some GEO series, however, have multiple sub-series because they contain more than one platform type, so you'll need to visit the GEO web entry to decide which of these you would like to process for this series. Each GEO session can handle only one of these platforms at a time, so it gets a little messy when you run into those.
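When a series spans several platforms, the platform is usually visible in the series matrix file name itself. The sketch below assumes the common GEO naming pattern `GSE<id>_series_matrix.txt.gz` for single-platform series and `GSE<id>-GPL<id>_series_matrix.txt.gz` for multi-platform ones; the helper name `split_series_matrix_name` is hypothetical, and the file names in the example are made up for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed naming pattern; the "-GPLnnn" part appears only for multi-platform series.
_NAME = re.compile(r"^(GSE\d+)(?:-(GPL\d+))?_series_matrix\.txt(?:\.gz)?$")


def split_series_matrix_name(filename: str) -> Tuple[str, Optional[str]]:
    """Return (series id, platform id or None) parsed from a series matrix file name."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"not a series matrix file name: {filename!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for name in ["GSE12345_series_matrix.txt.gz",
                 "GSE12345-GPL570_series_matrix.txt.gz"]:
        print(name, split_series_matrix_name(name))
```

You can then look up the GPL id on the GEO web entry to decide which file to process in this session.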
Question: What if the num_features column shows 0 (zero)? Answer: The series matrix file does not contain any of the expression data. You should still click the series_matrix filename, since that will extract the sample annotation metadata (if num_samples is >0). Afterwards, you will need to download the expression data to your computer and upload it separately, or you can try to download the supplementary file directly to this session (instructions are available in the next step). Then be sure to try the "Modify the Expression matrix" step on the Wizard page.
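To make the zero-feature case concrete, here is a small sketch that counts samples and features in the text of a series matrix file. It relies on the standard series matrix layout: a `!Sample_geo_accession` line listing the samples, and expression rows enclosed between `!series_matrix_table_begin` and `!series_matrix_table_end`. The function name is hypothetical, and the app may compute its num_samples and num_features columns differently.

```python
from typing import Iterable, Tuple


def count_samples_and_features(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (num_samples, num_features) for the lines of a series matrix file."""
    num_samples = 0
    num_features = 0
    in_table = False
    header_seen = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("!Sample_geo_accession"):
            # One tab-separated accession per sample after the field name.
            num_samples = len(line.split("\t")) - 1
        elif line.startswith("!series_matrix_table_begin"):
            in_table = True
        elif line.startswith("!series_matrix_table_end"):
            in_table = False
        elif in_table and line.strip():
            if not header_seen:
                header_seen = True  # the "ID_REF" column header row
            else:
                num_features += 1   # one row per probe or gene
    return num_samples, num_features


if __name__ == "__main__":
    annotation_only = [
        '!Sample_geo_accession\t"GSM1"\t"GSM2"',
        "!series_matrix_table_begin",
        '"ID_REF"\t"GSM1"\t"GSM2"',
        "!series_matrix_table_end",
    ]
    print(count_samples_and_features(annotation_only))
```

A file with samples but an empty table, as in the example, is the situation described above: the annotations are usable, but the expression matrix has to come from somewhere else.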
This step runs as a task in the background.
The goal of this step is to extract the sample annotations and expression values from the Rdata file which was generated in the previous step. An important caveat is that some GEO entries have a "series matrix" file with all the sample annotations and gene expression data in it (step 3A below), while other GEO entries have a very small series matrix file with only the sample annotations (step 3B below). Usually the older GEO entries with microarray data have the full data in the series matrix, with the supplementary files holding the CEL files or other raw data. RNA-seq data, by contrast, is not consistently made available, so you may need to look at the supplementary files, contact the author, process the fastq files from SRA or ENA, etc.
Here is a screenshot of part of a GEO entry which shows the link to the series matrix file and any supplementary files. If you follow the series matrix link to the FTP site and see a file which is under 1 MB, then it probably contains just the sample annotations:

If the GEO entry has expression data in its "series_matrix" file (which was downloaded in a prior step), then you will see this page (with some very nice help text, in my opinion):

If the GEO entry does not have expression data in its "series_matrix" file, then you will see this page. The red arrows show which pages you get after clicking those links. You will need to click the blue button to extract the sample annotations into a GEO.samples.csv file. Then go back to this page and click one of the other links, either to grab an expression matrix from a custom URL that you supply or to upload an expression matrix file from your computer. Either of those uploads will end up as a GEO.expr.txt registered file in this session. Once it is there, you can follow these directions to modify the expression matrix.
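Because the sample annotations and the expression matrix come from different sources in this case, it can help to check that they describe the same samples before going further. The sketch below is only an illustration under stated assumptions: it assumes the sample file is comma-separated with a header row and the sample id in the first column, and that the expression file is tab-separated with a header row whose first column is the feature id and whose remaining columns are sample ids. The actual layout of GEO.samples.csv and GEO.expr.txt is not specified here, and the function name is hypothetical.

```python
import csv
from typing import Iterable, List, Tuple


def compare_sample_ids(samples_csv: Iterable[str],
                       expr_txt: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (annotated samples missing from the matrix, matrix columns lacking annotation)."""
    sample_rows = csv.reader(samples_csv)
    next(sample_rows, None)  # skip the header row (assumed present)
    annotated = [row[0] for row in sample_rows if row]

    # Only the header row of the expression matrix is needed here.
    header = next(csv.reader(expr_txt, delimiter="\t"), [])
    matrix_cols = header[1:]  # first column is assumed to hold feature ids

    annotated_set = set(annotated)
    matrix_set = set(matrix_cols)
    missing = [s for s in annotated if s not in matrix_set]
    extra = [c for c in matrix_cols if c not in annotated_set]
    return missing, extra


if __name__ == "__main__":
    samples = ["sample_id,tissue", "GSM1,liver", "GSM2,lung"]
    expr = ["ID_REF\tGSM1\tGSM3", "gene1\t1.0\t2.0"]
    print(compare_sample_ids(samples, expr))
```

With local copies of the files, the same function can be called on open file handles, for example `compare_sample_ids(open("GEO.samples.csv", newline=""), open("GEO.expr.txt", newline=""))`. Any mismatch it reports is a hint that the column names of the uploaded matrix need adjusting.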

After these three steps, you should have the required files of a typical expression session: a series info file, a sample annotation file and, if there was a full series matrix file in GEO, a gene expression matrix file. Go back to the Expression Sessions page to read more about the next steps.
(c) 2015-2025, Needle Genomics LLC