Commit 3a29a5b

Updated README

Author: OliverBScott
1 parent 735c275 commit 3a29a5b

1 file changed

Lines changed: 106 additions & 33 deletions

README.md

@@ -150,54 +150,54 @@ Where "command" is one of: tree, network, hiers, aggregate or select.
 
 - ##### Smiles Format:
 
-ScaffoldGraph expects a delimited file where the first column defines a SMILES string, followed by a molecule
-identifier. If an identifier is not specified the program will use a hash of the molecule as an identifier.
+ScaffoldGraph expects a delimited file where the first column defines a SMILES string, followed by a molecule
+identifier. If an identifier is not specified the program will use a hash of the molecule as an identifier.
 
-Example SMILES file:
+Example SMILES file:
 
-```csv
-CCN1CCc2c(C1)sc(NC(=O)Nc3ccc(Cl)cc3)c2C#N CHEMBL4116520
-CC(N1CC(C1)Oc2ccc(Cl)cc2)C3=Nc4c(cnn4C5CCOCC5)C(=O)N3 CHEMBL3990718
-CN(C\C=C\c1ccc(cc1)C(F)(F)F)Cc2coc3ccccc23 CHEMBL4116665
-N=C1N(C(=Nc2ccccc12)c3ccccc3)c4ccc5OCOc5c4 CHEMBL4116261
-...
-```
+```csv
+CCN1CCc2c(C1)sc(NC(=O)Nc3ccc(Cl)cc3)c2C#N CHEMBL4116520
+CC(N1CC(C1)Oc2ccc(Cl)cc2)C3=Nc4c(cnn4C5CCOCC5)C(=O)N3 CHEMBL3990718
+CN(C\C=C\c1ccc(cc1)C(F)(F)F)Cc2coc3ccccc23 CHEMBL4116665
+N=C1N(C(=Nc2ccccc12)c3ccccc3)c4ccc5OCOc5c4 CHEMBL4116261
+...
+```
 
 - ##### SDF Format:
 
-ScaffoldGraph expects an [SDF](https://en.wikipedia.org/wiki/Chemical_table_file) file, where the molecule
-identifier is specified in the title line. If the title line is blank, then a hash of the molecule
-will be used as an identifier.
+ScaffoldGraph expects an [SDF](https://en.wikipedia.org/wiki/Chemical_table_file) file, where the molecule
+identifier is specified in the title line. If the title line is blank, then a hash of the molecule
+will be used as an identifier.
 
-Note: selecting subsets of a graph will not be possible if a name is not supplied
+Note: selecting subsets of a graph will not be possible if a name is not supplied
 
-- #### Output Formats
+- ### Output Formats
 
 - ##### TSV Format (default)
 
-The generate commands (network, hiers, tree) produce an intermediate tsv containing 4 columns:
+The generate commands (network, hiers, tree) produce an intermediate tsv containing 4 columns:
 
-1) Number of rings (hierarchy)
-2) Scaffold SMILES
-3) Sub-scaffold SMILES
-4) Molecule ID(s) (top-level scaffolds (Murcko))
+1) Number of rings (hierarchy)
+2) Scaffold SMILES
+3) Sub-scaffold SMILES
+4) Molecule ID(s) (top-level scaffolds (Murcko))
 
-The aggregate command produces a tsv containing 4 columns
+The aggregate command produces a tsv containing 4 columns
 
-1) Scaffold ID
-2) Number of rings (hierarchy)
-3) Scaffold SMILES
-4) Sub-scaffold IDs
+1) Scaffold ID
+2) Number of rings (hierarchy)
+3) Scaffold SMILES
+4) Sub-scaffold IDs
 
 - ##### SDF Format
 
-An SDF file can be produced by the aggregate and select commands. This SDF is
-formatted according to the SDF specification with added property fields:
+An SDF file can be produced by the aggregate and select commands. This SDF is
+formatted according to the SDF specification with added property fields:
 
-1) TITLE field = scaffold ID
-2) SUBSCAFFOLDS field = list of sub-scaffold IDs
-3) HIERARCHY field = number of rings
-4) SMILES field = scaffold canonical SMILES
+1) TITLE field = scaffold ID
+2) SUBSCAFFOLDS field = list of sub-scaffold IDs
+3) HIERARCHY field = number of rings
+4) SMILES field = scaffold canonical SMILES
 
 
--------------------------------------------------------------------------------
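
The SMILES input convention described in this hunk (first column SMILES, optional second column identifier, hash fallback) can be sketched in plain Python. This is a minimal illustration only: `read_smiles_lines` is a hypothetical helper, and hashing the SMILES text with SHA-1 is an assumption standing in for whatever molecule hash ScaffoldGraph actually uses.

```python
import hashlib


def read_smiles_lines(lines, delimiter=None):
    """Yield (smiles, identifier) pairs from delimited SMILES lines.

    The first column is the SMILES string; the second, if present, is the
    molecule identifier. The SHA-1 fallback is an assumption made for this
    sketch and is not ScaffoldGraph's own hashing scheme.
    """
    for line in lines:
        fields = line.strip().split(delimiter)
        if not fields or fields == ['']:
            continue  # skip blank lines
        smiles = fields[0]
        if len(fields) > 1 and fields[1]:
            identifier = fields[1]
        else:
            # No identifier supplied: fall back to a hash-based identifier.
            identifier = hashlib.sha1(smiles.encode('utf-8')).hexdigest()
        yield smiles, identifier


example = [
    'CCN1CCc2c(C1)sc(NC(=O)Nc3ccc(Cl)cc3)c2C#N CHEMBL4116520',
    'N=C1N(C(=Nc2ccccc12)c3ccccc3)c4ccc5OCOc5c4',
]
records = list(read_smiles_lines(example))
```

The second record has no identifier column, so it receives a 40-character hexadecimal digest. As the README notes for SDF input, molecules without a supplied name cannot later be selected as a subset, so supplying identifiers is preferable where possible.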
@@ -250,8 +250,79 @@ tree = sg.ScaffoldTree.from_smiles('my_smiles_file.smi')
 
 - **Creating custom scaffold prioritisation rules**
 
-TODO
+If required, a user can define their own rules for prioritizing scaffolds during scaffold tree construction.
+Rules can be defined by subclassing one of four rule classes:
+
+BaseScaffoldFilterRule, ScaffoldFilterRule, ScaffoldMinFilterRule or ScaffoldMaxFilterRule
+
+When subclassing, a name property must be defined, along with either a condition, get_property or filter function.
+Examples are shown below:
+
+```python
+import scaffoldgraph as sg
+from scaffoldgraph.prioritization import *
+
+"""
+Scaffold filter rule (must implement name and condition)
+The filter will retain all scaffolds which return a True condition
+"""
+
+class CustomRule01(ScaffoldFilterRule):
+    """Do not remove rings with >= 12 atoms if there are smaller rings to remove"""
+
+    def condition(self, child, parent):
+        removed_ring = child.rings[parent.removed_ring_idx]
+        return removed_ring.size < 12
+
+    @property
+    def name(self):
+        return 'custom rule 01'
+
+"""
+Scaffold min/max filter rule (must implement name and get_property)
+The filter will retain all scaffolds with the min/max property value
+"""
+
+class CustomRule02(ScaffoldMinFilterRule):
+    """Smaller rings are removed first"""
+
+    def get_property(self, child, parent):
+        return child.rings[parent.removed_ring_idx].size
+
+    @property
+    def name(self):
+        return 'custom rule 02'
+
+
+"""
+Scaffold base filter rule (must implement name and filter)
+The filter method must return a list of filtered parent scaffolds
+This rule is used when a more complex rule is required; this example
+defines a tiebreaker rule. Only one scaffold must be left at the end
+of all filter rules in a rule set
+"""
+
+class CustomRule03(BaseScaffoldFilterRule):
+    """Tie-breaker rule (alphabetical)"""
+
+    def filter(self, child, parents):
+        return [sorted(parents, key=lambda p: p.smiles)[0]]
 
+    @property
+    def name(self):
+        return 'custom rule 03'
+```
+
+Custom rules can subsequently be added to a rule set and supplied to the scaffold tree constructor:
+
+```python
+ruleset = ScaffoldRuleSet(name='custom rules')
+ruleset.add_rule(CustomRule01())
+ruleset.add_rule(CustomRule02())
+ruleset.add_rule(CustomRule03())
+
+graph = sg.ScaffoldTree.from_sdf('my_sdf_file.sdf', prioritization_rules=ruleset)
+```
 
--------------------------------------------------------------------------------
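
The three rule styles added in this hunk (condition filter, min/max filter, tie-breaker) can be illustrated without the library. The sketch below is a plain-Python analogy, not ScaffoldGraph's implementation: `Parent`, `keep_if`, `keep_min`, `tie_break` and `apply_rules` are hypothetical names, and skipping a rule that would discard every candidate is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Parent:
    """Hypothetical stand-in for a candidate parent scaffold."""
    smiles: str
    removed_ring_size: int


def keep_if(condition):
    """Filter-style rule: retain parents for which the condition is True."""
    def rule(parents):
        return [p for p in parents if condition(p)]
    return rule


def keep_min(get_property):
    """Min-style rule: retain parents sharing the minimum property value."""
    def rule(parents):
        best = min(get_property(p) for p in parents)
        return [p for p in parents if get_property(p) == best]
    return rule


def tie_break(parents):
    """Tie-breaker: retain the single parent whose SMILES sorts first."""
    return [min(parents, key=lambda p: p.smiles)]


def apply_rules(parents, rules):
    """Apply rules in order, stopping once a single parent remains."""
    for rule in rules:
        if len(parents) == 1:
            break
        filtered = rule(parents)
        # Assumption: a rule that would discard every candidate is skipped.
        if filtered:
            parents = filtered
    return parents


candidates = [
    Parent('c1ccc2ccccc2c1', 6),
    Parent('C1CCC2CCCCC2C1', 6),
    Parent('C1CCCCCCCCCCC1', 12),
]
rules = [
    keep_if(lambda p: p.removed_ring_size < 12),  # analogous to CustomRule01
    keep_min(lambda p: p.removed_ring_size),      # analogous to CustomRule02
    tie_break,                                    # analogous to CustomRule03
]
selected = apply_rules(candidates, rules)
```

Ordering matters here: the first two rules narrow the candidates, and the tie-breaker goes last so that exactly one parent is left, as the README requires of a rule set.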

@@ -292,4 +363,6 @@ ScaffoldGraph uses Travis CI for continuous integration
 * Bemis, G. W. and Murcko, M. A. (1996). The properties of known drugs. 1. molecular frameworks. Journal of Medicinal Chemistry, 39(15), 2887–2893.
 * Schuffenhauer, A., Ertl, P., Roggo, S., Wetzel, S., Koch, M. A., and Waldmann, H. (2007). The scaffold tree visualization of the scaffold universe by hierarchical scaffold classification. Journal of Chemical Information and Modeling, 47(1), 47–58. PMID: 17238248.
 * Varin, T., Schuffenhauer, A., Ertl, P., and Renner, S. (2011). Mining for bioactive scaffolds with scaffold networks: Improved compound set enrichment from primary screening data. Journal of Chemical Information and Modeling, 51(7), 1528–1538.
-* Wetzel, S., Klein, K., Renner, S., Rennerauh, D., Oprea, T. I., Mutzel, P., and Waldmann, H. (2009). Interactive exploration of chemical space with scaffold hunter. Nat Chem Biol, 1875(8), 581–583.
+* Varin, T., Gubler, H., Parker, C., Zhang, J., Raman, P., Ertl, P. and Schuffenhauer, A. (2010) Compound Set Enrichment: A Novel Approach to Analysis of Primary HTS Data. Journal of Chemical Information and Modeling, 50(12), 2067-2078.
+* Wetzel, S., Klein, K., Renner, S., Rennerauh, D., Oprea, T. I., Mutzel, P., and Waldmann, H. (2009). Interactive exploration of chemical space with scaffold hunter. Nat Chem Biol, 1875(8), 581–583.
+* Wilkens, J., Janes, J. and Su, A. (2005). HierS:  Hierarchical Scaffold Clustering Using Topological Chemical Graphs. Journal of Medicinal Chemistry, 48(9), 3182-3193.
