@@ -150,54 +150,54 @@ Where "command" is one of: tree, network, hiers, aggregate or select.
150150
151151 - ##### Smiles Format:
152152
153- ScaffoldGraph expects a delimited file where the first column defines a SMILES string, followed by a molecule
154- identifier. If an identifier is not specified the program will use a hash of the molecule as an identifier.
153+ ScaffoldGraph expects a delimited file where the first column defines a SMILES string, followed by a molecule
154+ identifier. If an identifier is not specified the program will use a hash of the molecule as an identifier.
155155
156- Example SMILES file:
156+ Example SMILES file:
157157
158- ```csv
159- CCN1CCc2c(C1)sc(NC(=O)Nc3ccc(Cl)cc3)c2C#N CHEMBL4116520
160- CC(N1CC(C1)Oc2ccc(Cl)cc2)C3=Nc4c(cnn4C5CCOCC5)C(=O)N3 CHEMBL3990718
161- CN(C\C=C\c1ccc(cc1)C(F)(F)F)Cc2coc3ccccc23 CHEMBL4116665
162- N=C1N(C(=Nc2ccccc12)c3ccccc3)c4ccc5OCOc5c4 CHEMBL4116261
163- ...
164- ```
158+ ```csv
159+ CCN1CCc2c(C1)sc(NC(=O)Nc3ccc(Cl)cc3)c2C#N CHEMBL4116520
160+ CC(N1CC(C1)Oc2ccc(Cl)cc2)C3=Nc4c(cnn4C5CCOCC5)C(=O)N3 CHEMBL3990718
161+ CN(C\C=C\c1ccc(cc1)C(F)(F)F)Cc2coc3ccccc23 CHEMBL4116665
162+ N=C1N(C(=Nc2ccccc12)c3ccccc3)c4ccc5OCOc5c4 CHEMBL4116261
163+ ...
164+ ```
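
As an illustration of the layout described above, the following sketch reads such a file using only the Python standard library. It assumes whitespace-delimited columns and uses a SHA-256 digest of the SMILES text as the fallback identifier. Both choices are assumptions made for this example; the hash ScaffoldGraph actually uses is not specified here, and `read_smiles_lines` is a hypothetical helper, not part of the ScaffoldGraph API.

```python
import hashlib


def read_smiles_lines(lines):
    """Yield (smiles, identifier) pairs from whitespace-delimited lines.

    Hypothetical helper for illustration only; not part of ScaffoldGraph.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        smiles = fields[0]
        if len(fields) > 1:
            identifier = fields[1]
        else:
            # Assumption: fall back to a SHA-256 digest of the SMILES text.
            identifier = hashlib.sha256(smiles.encode("utf-8")).hexdigest()
        yield smiles, identifier


example = [
    "CCN1CCc2c(C1)sc(NC(=O)Nc3ccc(Cl)cc3)c2C#N CHEMBL4116520",
    "N=C1N(C(=Nc2ccccc12)c3ccccc3)c4ccc5OCOc5c4",  # no identifier column
]
records = list(read_smiles_lines(example))
```

The second record has no identifier column, so it receives the digest-based fallback.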
165165
166166 - ##### SDF Format:
167167
168- ScaffoldGraph expects an [SDF](https://en.wikipedia.org/wiki/Chemical_table_file) file, where the molecule
169- identifier is specified in the title line. If the title line is blank, then a hash of the molecule
170- will be used as an identifier.
168+ ScaffoldGraph expects an [SDF](https://en.wikipedia.org/wiki/Chemical_table_file) file, where the molecule
169+ identifier is specified in the title line. If the title line is blank, then a hash of the molecule
170+ will be used as an identifier.
171171
172- Note: selecting subsets of a graph will not be possible if a name is not supplied
172+ Note: selecting subsets of a graph will not be possible if a name is not supplied
173173
174- - #### Output Formats
174+ - ### Output Formats
175175
176176 - ##### TSV Format (default)
177177
178- The generate commands (network, hiers, tree) produce an intermediate tsv containing 4 columns:
178+ The generate commands (network, hiers, tree) produce an intermediate tsv containing 4 columns:
179179
180- 1) Number of rings (hierarchy)
181- 2) Scaffold SMILES
182- 3) Sub-scaffold SMILES
183- 4) Molecule ID(s) (top-level scaffolds (Murcko))
180+ 1) Number of rings (hierarchy)
181+ 2) Scaffold SMILES
182+ 3) Sub-scaffold SMILES
183+ 4) Molecule ID(s) (top-level scaffolds (Murcko))
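
A minimal sketch of reading this four-column layout with the standard library is given below. It assumes tab-separated rows without a header and that multiple molecule IDs in the fourth column are comma-separated; neither detail is stated above, so treat them as assumptions. The sample row and the `parse_rows` helper are made up for illustration.

```python
import csv
import io
from collections import namedtuple

ScaffoldRow = namedtuple(
    "ScaffoldRow", "hierarchy scaffold subscaffold molecule_ids"
)


def parse_rows(handle):
    """Yield ScaffoldRow tuples from a tab-separated handle (hypothetical helper)."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        ids_field = row[3] if len(row) > 3 else ""
        # Assumption: molecule IDs are comma-separated within the column.
        ids = [i.strip() for i in ids_field.split(",") if i.strip()]
        yield ScaffoldRow(int(row[0]), row[1], row[2], ids)


# Illustrative data only, not real ScaffoldGraph output.
sample = "2\tc1ccc(-c2ccccc2)cc1\tc1ccccc1\tMOL1,MOL2\n"
rows = list(parse_rows(io.StringIO(sample)))
```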
184184
185- The aggregate command produces a tsv containing 4 columns
185+ The aggregate command produces a tsv containing 4 columns
186186
187- 1) Scaffold ID
188- 2) Number of rings (hierarchy)
189- 3) Scaffold SMILES
190- 4) Sub-scaffold IDs
187+ 1) Scaffold ID
188+ 2) Number of rings (hierarchy)
189+ 3) Scaffold SMILES
190+ 4) Sub-scaffold IDs
191191
192192 - ##### SDF Format
193193
194- An SDF file can be produced by the aggregate and select commands. This SDF is
195- formatted according to the SDF specification with added property fields:
194+ An SDF file can be produced by the aggregate and select commands. This SDF is
195+ formatted according to the SDF specification with added property fields:
196196
197- 1) TITLE field = scaffold ID
198- 2) SUBSCAFFOLDS field = list of sub-scaffold IDs
199- 3) HIERARCHY field = number of rings
200- 4) SMILES field = scaffold canonical SMILES
197+ 1) TITLE field = scaffold ID
198+ 2) SUBSCAFFOLDS field = list of sub-scaffold IDs
199+ 3) HIERARCHY field = number of rings
200+ 4) SMILES field = scaffold canonical SMILES
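
The fields listed above can be pulled out of such a file with a few lines of standard-library Python, sketched below. The sketch assumes the scaffold ID sits on the title line (the first line of each record), that data items use the usual `>  <NAME>` header followed by value lines and a blank line, and that records end with `$$$$`. The sample record uses an empty placeholder molblock and made-up values, and `parse_sdf_records` is a hypothetical helper rather than part of ScaffoldGraph.

```python
def _parse_record(lines):
    """Collect the title line and data items of one SDF record."""
    record = {"TITLE": lines[0].strip() if lines else ""}
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line:
            start = line.index("<")
            name = line[start + 1:line.index(">", start)]
            values = []
            i += 1
            # Value lines run until the next blank line.
            while i < len(lines) and lines[i].strip():
                values.append(lines[i])
                i += 1
            record[name] = "\n".join(values)
        i += 1
    return record


def parse_sdf_records(text):
    """Yield one dict per record, split on the '$$$$' delimiter."""
    current = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if current:
                yield _parse_record(current)
            current = []
        else:
            current.append(line)


# Illustrative record with an empty placeholder molblock.
sample = "\n".join([
    "scaffold_1",
    "  example",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    ">  <SUBSCAFFOLDS>",
    "scaffold_2, scaffold_3",
    "",
    ">  <HIERARCHY>",
    "2",
    "",
    ">  <SMILES>",
    "c1ccc(-c2ccccc2)cc1",
    "",
    "$$$$",
])
sdf_records = list(parse_sdf_records(sample))
```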
201201
202202
203203--------------------------------------------------------------------------------
@@ -250,8 +250,79 @@ tree = sg.ScaffoldTree.from_smiles('my_smiles_file.smi')
250250
251251- **Creating custom scaffold prioritisation rules**
252252
253- TODO
253+ If required, users can define their own rules for prioritizing scaffolds during scaffold tree construction.
254+ Rules are defined by subclassing one of four rule classes:
255+
256+ BaseScaffoldFilterRule, ScaffoldFilterRule, ScaffoldMinFilterRule or ScaffoldMaxFilterRule
257+
258+ A subclass must define a name property and one of the condition, get_property or filter methods, depending on the base class.
259+ Examples are shown below:
260+
261+ ```python
262+ import scaffoldgraph as sg
263+ from scaffoldgraph.prioritization import *
264+
265+ """
266+ Scaffold filter rule (must implement name and condition)
267+ The filter will retain all scaffolds which return a True condition
268+ """
269+
270+ class CustomRule01(ScaffoldFilterRule):
271+     """Do not remove rings with >= 12 atoms if there are smaller rings to remove"""
272+
273+     def condition(self, child, parent):
274+         removed_ring = child.rings[parent.removed_ring_idx]
275+         return removed_ring.size < 12
276+
277+     @property
278+     def name(self):
279+         return 'custom rule 01'
280+
281+ """
282+ Scaffold min/max filter rule (must implement name and get_property)
283+ The filter will retain all scaffolds with the min/max property value
284+ """
285+
286+ class CustomRule02(ScaffoldMinFilterRule):
287+     """Smaller rings are removed first"""
288+
289+     def get_property(self, child, parent):
290+         return child.rings[parent.removed_ring_idx].size
291+
292+     @property
293+     def name(self):
294+         return 'custom rule 02'
295+
296+
297+ """
298+ Scaffold base filter rule (must implement name and filter)
299+ The filter method must return a list of filtered parent scaffolds
300+ This rule is used when a more complex rule is required, this example
301+ defines a tiebreaker rule. Only one scaffold must be left at the end
302+ of all filter rules in a rule set
303+ """
304+
305+ class CustomRule03(BaseScaffoldFilterRule):
306+     """Tie-breaker rule (alphabetical)"""
307+
308+     def filter(self, child, parents):
309+         return [sorted(parents, key=lambda p: p.smiles)[0]]
254310
311+     @property
312+     def name(self):
313+         return 'custom rule 03'
314+ ```
315+
316+ Custom rules can subsequently be added to a rule set and supplied to the scaffold tree constructor:
317+
318+ ```python
319+ ruleset = ScaffoldRuleSet(name='custom rules')
320+ ruleset.add_rule(CustomRule01())
321+ ruleset.add_rule(CustomRule02())
322+ ruleset.add_rule(CustomRule03())
323+
324+ graph = sg.ScaffoldTree.from_sdf('my_sdf_file.sdf', prioritization_rules=ruleset)
325+ ```
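
To make the idea of a rule set narrowing the candidate parents more concrete, here is a self-contained toy sketch that does not use ScaffoldGraph at all. It mimics the three kinds of rules with plain functions and applies them in order until a single candidate remains. The `ToyScaffold` class, the function names, and the choice to skip a rule that would discard every candidate are all assumptions for this illustration and are not taken from the library's implementation.

```python
class ToyScaffold:
    """Hypothetical stand-in for a parent scaffold candidate."""

    def __init__(self, smiles, removed_ring_size):
        self.smiles = smiles
        self.removed_ring_size = removed_ring_size


def small_ring_condition(child, parents):
    # Condition-style rule: keep parents whose removed ring has < 12 atoms.
    return [p for p in parents if p.removed_ring_size < 12]


def smallest_ring_first(child, parents):
    # Min-style rule: keep parents sharing the minimum removed ring size.
    smallest = min(p.removed_ring_size for p in parents)
    return [p for p in parents if p.removed_ring_size == smallest]


def alphabetical_tiebreaker(child, parents):
    # Base-style rule: return exactly one parent, chosen by SMILES order.
    return [sorted(parents, key=lambda p: p.smiles)[0]]


def apply_rules(child, parents, rules):
    """Apply each rule in order, stopping once one candidate is left."""
    remaining = list(parents)
    for rule in rules:
        filtered = rule(child, remaining)
        # Assumption: a rule that would remove every candidate is ignored.
        if filtered:
            remaining = filtered
        if len(remaining) == 1:
            break
    return remaining


# Made-up candidates; the ring sizes are arbitrary illustration values.
candidates = [
    ToyScaffold("c1ccccc1", 14),
    ToyScaffold("C1CCCCC1", 6),
    ToyScaffold("c1ccncc1", 6),
]
rules = [small_ring_condition, smallest_ring_first, alphabetical_tiebreaker]
selected = apply_rules(None, candidates, rules)
```

Here the first rule drops the candidate with the 14-atom ring, the second leaves the two 6-atom candidates tied, and the tiebreaker picks one of them, so the chain ends with a single parent as the rule set requires.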
255326
256327--------------------------------------------------------------------------------
257328
@@ -292,4 +363,6 @@ ScaffoldGraph uses Travis CI for continuous integration
292363* Bemis, G. W. and Murcko, M. A. (1996). The properties of known drugs. 1. molecular frameworks. Journal of Medicinal Chemistry, 39(15), 2887–2893.
293364* Schuffenhauer, A., Ertl, P., Roggo, S., Wetzel, S., Koch, M. A., and Waldmann, H. (2007). The scaffold tree visualization of the scaffold universe by hierarchical scaffold classification. Journal of Chemical Information and Modeling, 47(1), 47–58. PMID: 17238248.
294365* Varin, T., Schuffenhauer, A., Ertl, P., and Renner, S. (2011). Mining for bioactive scaffolds with scaffold networks: Improved compound set enrichment from primary screening data. Journal of Chemical Information and Modeling, 51(7), 1528–1538.
295- * Wetzel, S., Klein, K., Renner, S., Rennerauh, D., Oprea, T. I., Mutzel, P., and Waldmann, H. (2009). Interactive exploration of chemical space with scaffold hunter. Nat Chem Biol, 1875(8), 581–583.
366+ * Varin, T., Gubler, H., Parker, C., Zhang, J., Raman, P., Ertl, P. and Schuffenhauer, A. (2010). Compound Set Enrichment: A Novel Approach to Analysis of Primary HTS Data. Journal of Chemical Information and Modeling, 50(12), 2067–2078.
367+ * Wetzel, S., Klein, K., Renner, S., Rennerauh, D., Oprea, T. I., Mutzel, P., and Waldmann, H. (2009). Interactive exploration of chemical space with scaffold hunter. Nat Chem Biol, 1875(8), 581–583.
368+ * Wilkens, J., Janes, J. and Su, A. (2005). HierS: Hierarchical Scaffold Clustering Using Topological Chemical Graphs. Journal of Medicinal Chemistry, 48(9), 3182–3193.