|
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
8 | 8 | "# MDIBL Transcriptome Assembly Learning Module\n", |
9 | | - "# Notebook 1: Setup\n", |
| 9 | + "# Notebook 1: Setup" |
| 10 | + ] |
| 11 | + }, |
| 12 | + { |
| 13 | + "cell_type": "markdown", |
| 14 | + "id": "f62d616c", |
| 15 | + "metadata": {}, |
| 16 | + "source": [ |
| 17 | + "## Overview\n", |
10 | 18 | "\n", |
11 | 19 | "This notebook is designed to configure your virtual machine (VM) to have the proper tools and data in place to run the transcriptome assembly training module." |
12 | 20 | ] |
13 | 21 | }, |
| 22 | + { |
| 23 | + "cell_type": "markdown", |
| 24 | + "id": "60145056", |
| 25 | + "metadata": {}, |
| 26 | + "source": [ |
| 27 | + "## Learning Objectives\n", |
| 28 | + "\n", |
| 29 | + "1. **Understand and utilize shell commands within Jupyter Notebooks:** The notebook explicitly teaches the difference between `!` and `%` prefixes for executing shell commands, and how to navigate directories using `cd` and `pwd`.\n", |
| 30 | + "\n", |
| 31 | + "2. **Set up the necessary software:** Students will install and configure essential tools including:\n", |
| 32 | + " * Java (a prerequisite for Nextflow).\n", |
| 33 | + " * Miniforge (a package manager for bioinformatics tools).\n", |
| 34 | + " * `sra-tools`, `perl-dbd-sqlite`, and `perl-dbi` (specific bioinformatics packages).\n", |
| 35 | + " * Nextflow (a workflow management system).\n", |
| 36 | + " * `gsutil` (for interacting with Google Cloud Storage).\n", |
| 37 | + "\n", |
| 38 | + "3. **Download and organize necessary data:** Students will download the TransPi transcriptome assembly software and its associated resources (databases, scripts, configuration files) from a Google Cloud Storage bucket. This includes understanding the directory structure and file organization.\n", |
| 39 | + "\n", |
| 40 | + "4. **Manage file permissions:** Students will use the `chmod` command to set executable permissions for the necessary files and directories within the TransPi software.\n", |
| 41 | + "\n", |
| 42 | + "5. **Navigate file paths:** The notebook provides examples and explanations for using relative file paths (e.g., `./`, `../`) within shell commands." |
| 43 | + ] |
| 44 | + }, |
| 45 | + { |
| 46 | + "cell_type": "markdown", |
| 47 | + "id": "549be731", |
| 48 | + "metadata": {}, |
| 49 | + "source": [ |
| 50 | + "## Prerequisites\n", |
| 51 | + "\n", |
| 52 | + "* **Operating System:** A Linux-based system is assumed. The setup uses `apt` and `uname`, so a Debian-based distribution such as Ubuntu is expected.\n", |
| 53 | + "* **Shell Access:** The ability to execute shell commands from within the Jupyter Notebook environment (using `!` and `%`).\n", |
| 54 | + "* **Java Development Kit (JDK):** Required for Nextflow.\n", |
| 55 | + "* **Miniforge:** A package manager for installing bioinformatics tools.\n", |
| 56 | + "* **`gsutil`:** The Google Cloud Storage command-line tool. This is crucial for downloading data from Google Cloud Storage." |
| 57 | + ] |
| 58 | + }, |
| 59 | + { |
| 60 | + "cell_type": "markdown", |
| 61 | + "id": "a92f62a0", |
| 62 | + "metadata": {}, |
| 63 | + "source": [ |
| 64 | + "## Get Started" |
| 65 | + ] |
| 66 | + }, |
14 | 67 | { |
15 | 68 | "cell_type": "markdown", |
16 | 69 | "id": "958495ce-339d-4d4d-a621-9ede79a7363c", |
|
71 | 124 | "metadata": {}, |
72 | 125 | "outputs": [], |
73 | 126 | "source": [ |
74 | | - "!pwd" |
| 127 | + "! pwd" |
75 | 128 | ] |
76 | 129 | }, |
77 | 130 | { |
|
89 | 142 | "metadata": {}, |
90 | 143 | "outputs": [], |
91 | 144 | "source": [ |
92 | | - "!sudo apt update\n", |
93 | | - "!sudo apt-get install default-jdk -y\n", |
94 | | - "!java -version" |
| 145 | + "! sudo apt update\n", |
| 146 | + "! sudo apt-get install default-jdk -y\n", |
| 147 | + "! java -version" |
95 | 148 | ] |
96 | 149 | }, |
97 | 150 | { |
98 | 151 | "cell_type": "markdown", |
99 | 152 | "id": "7b3ffb16-3395-4c01-9774-ee568e815490", |
100 | 153 | "metadata": {}, |
101 | 154 | "source": [ |
102 | | - "**Step 3:** Install Mambaforge, which is needed to support the information held within the TransPi databases.\n", |
103 | | - "\n", |
104 | | - ">Mambaforge is a package manager." |
| 155 | + "**Step 3:** Install Miniforge (a package manager), which is needed to support the information held within the TransPi databases." |
105 | 156 | ] |
106 | 157 | }, |
107 | 158 | { |
|
111 | 162 | "metadata": {}, |
112 | 163 | "outputs": [], |
113 | 164 | "source": [ |
114 | | - "!curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n", |
115 | | - "!bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge\n", |
116 | | - "!~/mambaforge/bin/mamba install -c bioconda sra-tools perl-dbd-sqlite perl-dbi -y" |
| 165 | + "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh\n", |
| 166 | + "! bash Miniforge3-$(uname)-$(uname -m).sh -b -p $HOME/miniforge" |
| 167 | + ] |
| 168 | + }, |
| 169 | + { |
| 170 | + "cell_type": "markdown", |
| 171 | + "id": "c5584e2e", |
| 172 | + "metadata": {}, |
| 173 | + "source": [ |
| 174 | + "Next, add it to the path." |
| 175 | + ] |
| 176 | + }, |
| 177 | + { |
| 178 | + "cell_type": "code", |
| 179 | + "execution_count": null, |
| 180 | + "id": "ad030cd1", |
| 181 | + "metadata": {}, |
| 182 | + "outputs": [], |
| 183 | + "source": [ |
| 184 | + "import os\n", |
| 185 | + "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/miniforge/bin\"" |
| 186 | + ] |
| 187 | + }, |
| 188 | + { |
| 189 | + "cell_type": "markdown", |
| 190 | + "id": "7b930ad7", |
| 191 | + "metadata": {}, |
| 192 | + "source": [ |
| 193 | + "Next, using Miniforge and bioconda, install the tools that will be used in this tutorial." |
| 194 | + ] |
| 195 | + }, |
| 196 | + { |
| 197 | + "cell_type": "code", |
| 198 | + "execution_count": null, |
| 199 | + "id": "4d4dd51e", |
| 200 | + "metadata": {}, |
| 201 | + "outputs": [], |
| 202 | + "source": [ |
| 203 | + "! mamba install -c bioconda sra-tools perl-dbd-sqlite perl-dbi -y" |
117 | 204 | ] |
118 | 205 | }, |
119 | 206 | { |
|
131 | 218 | "metadata": {}, |
132 | 219 | "outputs": [], |
133 | 220 | "source": [ |
134 | | - "!curl https://get.nextflow.io | bash\n", |
135 | | - "!chmod +x nextflow\n", |
136 | | - "!./nextflow self-update" |
| 221 | + "! curl https://get.nextflow.io | bash\n", |
| 222 | + "! chmod +x nextflow\n", |
| 223 | + "! ./nextflow self-update" |
137 | 224 | ] |
138 | 225 | }, |
139 | 226 | { |
|
152 | 239 | "metadata": {}, |
153 | 240 | "outputs": [], |
154 | 241 | "source": [ |
155 | | - "!gsutil -m cp -r gs://nigms-sandbox/nosi-inbremaine-storage/TransPi ./" |
| 242 | + "! gsutil -m cp -r gs://nigms-sandbox/nosi-inbremaine-storage/TransPi ./" |
156 | 243 | ] |
157 | 244 | }, |
158 | 245 | { |
|
190 | 277 | "metadata": {}, |
191 | 278 | "outputs": [], |
192 | 279 | "source": [ |
193 | | - "!gsutil -m cp -r gs://nigms-sandbox/nosi-inbremaine-storage/resources ./" |
| 280 | + "! gsutil -m cp -r gs://nigms-sandbox/nosi-inbremaine-storage/resources ./" |
194 | 281 | ] |
195 | 282 | }, |
196 | 283 | { |
|
234 | 321 | "metadata": {}, |
235 | 322 | "outputs": [], |
236 | 323 | "source": [ |
237 | | - "!chmod -R +x ./TransPi/bin" |
| 324 | + "! chmod -R +x ./TransPi/bin" |
238 | 325 | ] |
239 | 326 | }, |
240 | 327 | { |
|
295 | 382 | }, |
296 | 383 | { |
297 | 384 | "cell_type": "markdown", |
298 | | - "id": "f80a7bab-98ae-45a6-845f-ad3c4138575a", |
| 385 | + "id": "ffec658a", |
299 | 386 | "metadata": {}, |
300 | 387 | "source": [ |
301 | | - "## When you are ready, proceed to the next notebook: [`Submodule_02_basic_assembly.ipynb`](./Submodule_02_basic_assembly.ipynb)." |
| 388 | + "## Conclusion\n", |
| 389 | + "\n", |
| 390 | + "This notebook configured the virtual machine for the MDIBL Transcriptome Assembly Learning Module. We updated the system, installed Java, Miniforge, and Nextflow, and downloaded the TransPi program and its associated resources from Google Cloud Storage. The `chmod` command made the TransPi scripts executable. The VM is now ready for the next notebook, `Submodule_02_basic_assembly.ipynb`, which covers the transcriptome assembly process itself. Later modules depend on every step in this notebook having completed." |
302 | 391 | ] |
303 | 392 | }, |
304 | 393 | { |
305 | | - "cell_type": "code", |
306 | | - "execution_count": null, |
307 | | - "id": "934165c2-8fbd-4801-979f-6db5d1e592ea", |
| 394 | + "cell_type": "markdown", |
| 395 | + "id": "666c1e4d", |
308 | 396 | "metadata": {}, |
309 | | - "outputs": [], |
310 | | - "source": [] |
| 397 | + "source": [ |
| 398 | + "## Clean Up\n", |
| 399 | + "\n", |
| 400 | + "Remember to proceed to the next notebook [`Submodule_02_basic_assembly.ipynb`](./Submodule_02_basic_assembly.ipynb) or shut down your instance if you are finished." |
| 401 | + ] |
311 | 402 | } |
312 | 403 | ], |
313 | | - "metadata": {}, |
| 404 | + "metadata": { |
| 405 | + "language_info": { |
| 406 | + "name": "python" |
| 407 | + } |
| 408 | + }, |
314 | 409 | "nbformat": 4, |
315 | 410 | "nbformat_minor": 5 |
316 | 411 | } |
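
The PATH cell in the diff above appends `$HOME/miniforge/bin` every time it runs, so re-running the cell adds duplicate entries. A minimal sketch of an idempotent variant follows. The helper name `add_to_path` is hypothetical and not part of the notebook; the install prefix `~/miniforge` is taken from the `-p $HOME/miniforge` flag in the install cell.

```python
import os


def add_to_path(directory, env=None):
    """Append `directory` to PATH unless it is already listed.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    Returns the resulting PATH string.
    """
    env = os.environ if env is None else env
    # Split the current PATH and drop empty segments.
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.append(directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


# Assumed install prefix, matching `-p $HOME/miniforge` in the install cell.
miniforge_bin = os.path.join(os.path.expanduser("~"), "miniforge", "bin")
add_to_path(miniforge_bin)
```

Appending rather than prepending matches the notebook's behavior: system tools keep priority, and `mamba` is still found because nothing earlier on the PATH provides it.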