WO2015028597A1 - A method for prediction of herg potassium channel inhibition in acidic and zwitterionic compounds - Google Patents

A method for prediction of herg potassium channel inhibition in acidic and zwitterionic compounds

Info

Publication number
WO2015028597A1
WO2015028597A1 PCT/EP2014/068360 EP2014068360W WO2015028597A1 WO 2015028597 A1 WO2015028597 A1 WO 2015028597A1 EP 2014068360 W EP2014068360 W EP 2014068360W WO 2015028597 A1 WO2015028597 A1 WO 2015028597A1
Authority
WO
WIPO (PCT)
Prior art keywords
descriptor
conformers
prediction method
descriptors
maximum
Prior art date
Application number
PCT/EP2014/068360
Other languages
English (en)
French (fr)
Inventor
Nikolai Georgiev NIKOLOV
Eva Bay WEDEBYE
Original Assignee
Technical University Of Denmark
Priority date
Filing date
Publication date
Application filed by Technical University Of Denmark
Publication of WO2015028597A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6872 Intracellular protein regulatory factors and their receptors, e.g. including ion channels
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures

Definitions

  • the present invention provides a method for developing predictive models of hERG ion channel inhibition by use of a training set of compounds selected from acidic and/or zwitterionic compounds, as well as a predictive method of hERG ion channel inhibition activity of compounds selected from acidic and/or zwitterionic compounds. Also, the present invention relates to computer assisted methods of the above. Such methods are useful for the screening of drugs for cardiac toxicity.
  • Ion channels are cellular proteins that regulate the flow of ions, including potassium, calcium, chloride and sodium into and out of cells. Such channels are present in all animal and human cells and affect a variety of processes including neuronal transmission, muscle contraction, and cellular secretion.
  • Potassium (K+) channels are structurally and functionally diverse families of potassium selective channel proteins, which are ubiquitous in cells, and have central importance in regulating a number of key cell functions for example in the brain, heart, pancreas, prostate, kidney, gastro-intestinal tract, small intestine and peripheral blood leukocytes, placenta, lung, spleen, colon, thymus, testis and ovaries, epithelia and inner ear organs.
  • hERG: human ether-a-go-go-related gene
  • The human ether-a-go-go-related gene (hERG) encodes the pore-forming alpha subunit of the hERG potassium ion channel (also called Kv11.1 or KCNH2), which plays a crucial role in repolarization of the heart and mediates the repolarizing I_Kr current in the cardiac action potential.
  • Inhibition or blocking of the channel is associated with QT interval prolongation (long QT syndrome) which in turn may cause torsades de pointes, a potentially fatal arrhythmia (Mitcheson 2000, Sanguinetti 2006).
  • Blockade of hERG has been extensively investigated in the recent decade, including mechanistic studies, a number of in silico approaches (reviews are available e.g. in Aronov 2005, Schiesaro et al. 2011), and new in vitro assays proposed as alternatives to the traditional cost-intensive patch-clamp method.
  • hERG blocking remains an important marker for cardiac risk.
  • hERG inhibition is an important activity that must be avoided during drug development, and the need for assessment of QT prolongation liability of drugs under development is recognized in topic E14 of the International Conference on Harmonization in 2005.
  • a relatively diverse group of drugs have been found to induce arrhythmias by blockage of hERG ion channels.
  • Predictive models for hERG inhibition can assist the elimination of possibly cardiotoxic drug candidates at an early stage in drug design. In this way, both the costs of drug development and the time spent on development may be reduced since the research can be focused on drug candidates with decreased risk of cardiotoxicity.
  • Because hERG is a relatively promiscuous target which has been shown to interact with pharmaceuticals of highly varying structure, the development of an accurate prediction method is a difficult task.
  • hERG ion channel pharmacophore models from a number of different studies agree that charged nitrogen (hydrogen bond acceptor) and aromatic rings (hydrophobic feature) were important features to consider in hERG binding.
  • the article does not disclose the descriptors effective cross-sectional diameter and donor (electrophilic) superdelocalizability.
  • Waring does not disclose or suggest QSAR models for compounds divided based on ionization status or pKa values.
  • the descriptor donor (electrophilic) superdelocalizability on oxygen atoms has been successfully used to model androgen receptor binding (Todorov et al. 2011). However, the article neither indicates nor suggests that the donor (electrophilic) superdelocalizability on nitrogen atoms may be successful for modeling inhibition of the unrelated hERG ion channel.
  • the present invention provides a method for prediction of hERG binding and/or inhibition which is based on a relatively low number of descriptors, and which at the same time gives a high predictive performance and transparency.
  • a high transparency is particularly favourable in the design of new drugs with low cardiotoxicity, since it provides a simplified interpretation of the attributes of a compound which influence hERG binding and/or inhibition activity.
  • the present invention in one aspect provides a method for developing a predictive model of hERG channel inhibiting activity of chemical substances, wherein the predictive model is obtained by training on a set of compounds which is divided based on ionization.
  • the invention relates to a method for predicting hERG channel inhibiting activity of chemical substances selected from the group consisting of acids and zwitterionic compounds, wherein the method comprises the use of a predictive model.
  • the predictive model may be as herein defined.
  • a prediction method for predicting hERG channel inhibiting activity of chemical substances selected from the group consisting of acids and zwitterionic compounds uses a combination of all descriptors a) to c):
  • the prediction method uses a combination of the below descriptors a) to c):
  • the prediction method uses a combination of descriptors comprising: a) a structural descriptor of maximum conformer effective diameter on one or more conformers of a chemical compound (MaxDiamEff), and
  • the invention relates to a computer-assisted method or prediction model further defined as in the present application.
  • an acid is meant as defined by the terms conventionally used in the art.
  • the strength of an acid is commonly described by use of its dissociation constant Ka or the negative logarithm of the dissociation constant, pKa.
  • the larger the value of pKa, the smaller the extent of dissociation at any given pH.
  • a compound is an acid if it has one or more acidic ionogenic groups which have a pKa of less than 16.
  • a weak acid has a pKa in the range of -2 to 16 in water. Strong acids are almost completely dissociated in water and have a pKa of less than -2 as determined experimentally or theoretically.
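The relation between pKa, pH and the extent of dissociation can be illustrated with the Henderson-Hasselbalch equation for a monoprotic acid. The following is a minimal sketch only; the function name and the example pH of 7.4 are assumptions made for illustration and are not taken from the application.

```python
def fraction_dissociated(pka: float, ph: float) -> float:
    """Fraction of a monoprotic acid present in its dissociated (anionic) form.

    Follows from the Henderson-Hasselbalch equation:
    [A-] / ([HA] + [A-]) = 1 / (1 + 10 ** (pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Illustrative comparison at an assumed pH of 7.4:
# the acid with the larger pKa is dissociated to a smaller extent.
stronger = fraction_dissociated(4.0, 7.4)
weaker = fraction_dissociated(9.0, 7.4)
```

At pH equal to pKa the function returns one half, and for a fixed pH the returned fraction decreases as pKa increases.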
  • The applicability domain (AD) of a model is the physico-chemical, structural or biological space, knowledge or information on which the training set of the model has been developed, and for which it is applicable to make predictions for new compounds.
  • AD of a QSAR model is defined on the basis of the training set which has been used for developing the QSAR model.
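One simple way to derive an applicability domain from a training set is a per-descriptor range check. This is a hedged sketch of that generic idea, not the procedure used in the application, which does not specify one here; the descriptor names and all numeric values are made up for illustration.

```python
def descriptor_ranges(training_set):
    """Return {descriptor: (min, max)} over a list of descriptor dicts."""
    names = training_set[0].keys()
    return {
        name: (min(c[name] for c in training_set),
               max(c[name] for c in training_set))
        for name in names
    }


def in_domain(compound, ranges):
    """True if every descriptor of the compound lies within the training range."""
    return all(lo <= compound[name] <= hi for name, (lo, hi) in ranges.items())


# Hypothetical training descriptors (values are illustrative only).
training = [
    {"MaxDiamEff": 8.1, "pKa_acidic": 3.9},
    {"MaxDiamEff": 12.4, "pKa_acidic": 4.6},
    {"MaxDiamEff": 10.0, "pKa_acidic": 5.2},
]
ranges = descriptor_ranges(training)
```

A compound falling outside any range would be flagged as outside the domain, so that its prediction is treated with caution.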
  • a base is meant as defined by the terms conventionally used in the art.
  • a compound is a base if it has one or more basic ionogenic groups which have a pKb below 16.
  • the strength of a base is commonly described by use of its dissociation constant Kb or the negative logarithm of the dissociation constant, pKb.
  • the larger the value of pKb, the smaller the extent of dissociation at any given pH.
  • Binary classification model: A model that predicts two categorical values, such as for example 1) being a hERG ion channel inhibitor or 2) not being a hERG ion channel inhibitor.
  • Half maximal inhibitory concentration: A measure of the effectiveness of a compound in inhibiting biological or biochemical function. This quantitative measure indicates how much of a particular drug or other substance is needed to inhibit a given biological process by half. In other words, it is the half maximal (50%) inhibitory concentration (IC) of a substance (50% IC, or IC50).
  • the biological process is defined by the functionality of the hERG channels, for example the transportation of potassium across a membrane, or such as the repolarization of the I_Kr current in the cardiac action potential.
  • the IC50 of hERG can for example be measured by use of conventional techniques in the art such as patch clamp assays of mammalian cell lines expressing hERG or radioligand binding assays.
  • QSAR: A quantitative structure-activity relationship (QSAR) model uses descriptors (predictor variables derived from physico-chemical properties or theoretical molecular descriptors of chemicals) for prediction of activity of compounds. For example in the case of a hERG inhibition QSAR model, a regression QSAR model relates predictor variables (descriptors) to the hERG binding or inhibition of a compound (IC50, Ki, or % inhibition). A classification QSAR model relates the predictor variables (descriptors) to a categorical value of the response variable.
  • the descriptors can be related to a value of hERG inhibition activity, such as for example hERG IC50. In the case of a binary hERG inhibition QSAR classification model, the descriptors can be related to a value of 1) being a hERG inhibitor or 2) not being a hERG inhibitor.
  • Zwitterionic ampholyte: An amphoteric compound (zwitterionic compound) with both acidic and basic ionogenic groups, wherein the pKa of the acidic group (pKa (acidic)) is less than the pKa of the basic group (pKa (basic)), thus pKa (acidic) < pKa (basic).
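The acid, base and zwitterionic ampholyte definitions can be combined into a small classification routine. This is a minimal sketch under stated assumptions: the function and label names are hypothetical, the labels "ampholyte" and "neutral" are added only to make the function total, and the basic group is described by the pKa of its conjugate acid using pKa + pKb = 14 (water at 25 °C).

```python
from typing import Optional


def ionization_class(pka_acidic: Optional[float],
                     pka_basic: Optional[float]) -> str:
    """Classify a compound from the pKa of its acidic and basic groups.

    pka_acidic: pKa of the most acidic group, or None if there is none.
    pka_basic:  pKa (conjugate acid) of the most basic group, or None.
    """
    is_acid = pka_acidic is not None and pka_acidic < 16.0
    # pKb = 14 - pKa(basic) is an assumption valid for water at 25 degrees C.
    is_base = pka_basic is not None and (14.0 - pka_basic) < 16.0
    if is_acid and is_base:
        # Zwitterionic only when pKa(acidic) < pKa(basic).
        return "zwitterionic" if pka_acidic < pka_basic else "ampholyte"
    if is_acid:
        return "acid"
    if is_base:
        return "base"
    return "neutral"
```

A training set divided based on ionization could then be obtained by grouping compounds on the returned label.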
  • Conformer effective cross-sectional diameter A descriptor that is also called DiamEff.
  • the descriptor is defined as the diameter of the least-diameter cylinder containing the conformer (this parameter depends on the conformation, therefore the three- dimensional coordinates of all atoms are used for calculating the conformer effective cross-sectional diameter).
  • the maximum of DiamEff (MaxDiamEff) over several conformers is a measure of both size and flexibility of the whole structure of a chemical compound. We use the implementation of DiamEff in OASIS Database Manager 1.7.3 (http://oasis-lmc.org) calculated according to the following definition.
  • $$\mathrm{DiamEff} = \min_{l \text{ is a line in } \mathbb{R}^3} \max \{\, d(l, c) \mid c \in C \,\} \qquad (1)$$ where c ranges over the set C of atom positions of the conformer, R^3 is the three-dimensional Euclidean space of real numbers and d(x,y) denotes the Euclidean distance between a line x and a point y in R^3 (the definition of a smallest encompassing cylinder can be found e.g. in (Schomer et al. 2000)).
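The least-diameter cylinder containing a conformer can be approximated numerically by sampling candidate axis directions, projecting the atoms onto the plane perpendicular to each direction, and taking the smallest enclosing circle of the projection. The sketch below is an illustration under assumptions, not the OASIS implementation: the function names, the Fibonacci direction sampling and the brute-force circle search (suitable only for small point sets) are all choices made here. It returns the cylinder diameter, i.e. twice the largest atom-to-axis distance.

```python
import math
from itertools import combinations


def _cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])


def _dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]


def _plane_basis(u):
    """Two orthonormal vectors spanning the plane perpendicular to unit vector u."""
    helper = (1.0, 0.0, 0.0) if abs(u[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = _cross(u, helper)
    norm = math.sqrt(_dot(e1, e1))
    e1 = (e1[0] / norm, e1[1] / norm, e1[2] / norm)
    return e1, _cross(u, e1)


def _enclosing_radius(pts, eps=1e-9):
    """Radius of the smallest circle containing all 2D points (brute force)."""
    if len(pts) < 2:
        return 0.0

    def covers(cx, cy, r):
        return all(math.hypot(x - cx, y - cy) <= r + eps for x, y in pts)

    best = math.inf
    # Candidate circles with two points as a diameter.
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        r = math.hypot(x1 - x2, y1 - y2) / 2.0
        if r < best and covers((x1 + x2) / 2.0, (y1 + y2) / 2.0, r):
            best = r
    # Candidate circumcircles through three points.
    for (x1, y1), (x2, y2), (x3, y3) in combinations(pts, 3):
        d = 2.0 * (x1*(y2 - y3) + x2*(y3 - y1) + x3*(y1 - y2))
        if abs(d) < eps:
            continue  # collinear triple, no circumcircle
        s1, s2, s3 = x1*x1 + y1*y1, x2*x2 + y2*y2, x3*x3 + y3*y3
        cx = (s1*(y2 - y3) + s2*(y3 - y1) + s3*(y1 - y2)) / d
        cy = (s1*(x3 - x2) + s2*(x1 - x3) + s3*(x2 - x1)) / d
        r = math.hypot(x1 - cx, y1 - cy)
        if r < best and covers(cx, cy, r):
            best = r
    return best


def diam_eff(points, n_dirs=2000):
    """Approximate least enclosing-cylinder diameter of 3D points."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    best = math.inf
    for i in range(n_dirs):
        # Fibonacci sampling of a hemisphere (a line has no orientation).
        z = 1.0 - (i + 0.5) / n_dirs
        r = math.sqrt(1.0 - z * z)
        u = (r * math.cos(golden * i), r * math.sin(golden * i), z)
        e1, e2 = _plane_basis(u)
        projected = [(_dot(p, e1), _dot(p, e2)) for p in points]
        best = min(best, 2.0 * _enclosing_radius(projected))
    return best
```

Because only sampled directions are tested, the result is an upper estimate that tightens as `n_dirs` grows. MaxDiamEff would then be the maximum of this value over the conformers of a structure.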
  • Donor (electrophilic) superdelocalizability D_E: The atomic descriptor donor (electrophilic) superdelocalizability is a variant of reactivity indices in the Hückel molecular orbital scheme and was originally defined by (Fukui et al. 1954) and implemented into MOPAC (Stewart 1990 and 1993, http://openmopac.net/manual/super.html) and used in the Oasis DatabaseManager system (http://oasis-lmc.org) to calculate the capability of atoms to make covalent bonds by donation of electrons.
  • the donor (electrophilic) superdelocalizabilities are calculated according to the method described in (Schuurmann 1990A and Schuurmann 1990B):
  • the maximum of the atomic descriptor donor (electrophilic) superdelocalizability D_E (where the maximum is calculated among all nitrogen atoms of the conformer) will be called the conformational maximum of donor (electrophilic) superdelocalizability and denoted by D_E^conformer.
  • the maximum of D_E^conformer on all available conformers (i.e. a set of one or more conformers) is the structural maximum of donor (electrophilic) superdelocalizability, denoted by D_E^structure.
  • D_E^conformer is a conformational descriptor and D_E^structure is a structural descriptor.
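The two-level maximum (over nitrogen atoms within a conformer, then over conformers) can be sketched as follows. The data layout, a list of conformers each holding (element, value) pairs, and the numeric values are assumptions made for illustration; in practice the atomic values would come from a quantum-chemical package.

```python
def conformer_max(conformer, element="N"):
    """Maximum atomic descriptor value over atoms of one element in a conformer.

    conformer: list of (element_symbol, descriptor_value) pairs.
    Returns None if the conformer has no atom of that element.
    """
    values = [value for symbol, value in conformer if symbol == element]
    return max(values) if values else None


def structure_max(conformers, element="N"):
    """Maximum of the conformational maxima over a set of conformers."""
    per_conformer = [conformer_max(c, element) for c in conformers]
    per_conformer = [m for m in per_conformer if m is not None]
    return max(per_conformer) if per_conformer else None


# Hypothetical donor superdelocalizability values for two conformers.
conformers = [
    [("C", 0.21), ("N", 0.30), ("N", 0.27), ("O", 0.35)],
    [("C", 0.22), ("N", 0.28), ("N", 0.33), ("O", 0.34)],
]
```

Here `conformer_max` plays the role of the conformational descriptor and `structure_max` that of the structural descriptor.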
  • Decision tree classifier A non-parametric machine learning technique.
  • a decision tree can be used as a predictive model which maps observations about an item to conclusions about the item's target value. More descriptive names for such tree models are classification trees or regression trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.
  • observations regarding an item are for example various descriptors of the structure and physico-chemical nature of a compound.
  • Minimal diameter of a specific conformation of a conformer (DiamMin): The descriptor is defined as the minimum distance between two parallel planes circumscribing a conformer (Dimitrov et al 2003, see also Brooke and Cronin 2009).
  • DiamMin is defined by formula 4:
  • $$\mathrm{DiamMin} = \min\,(d_2 - d_1) \qquad (4)$$
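The minimum distance between two parallel planes circumscribing a conformer can be estimated by sampling plane normals and measuring the extent of the atom positions along each normal. This is a numerical sketch under assumptions (function names and Fibonacci direction sampling are choices made here), not the implementation referenced in the text.

```python
import math


def hemisphere_directions(n):
    """Roughly uniform unit vectors on a hemisphere (Fibonacci sampling)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for i in range(n):
        z = 1.0 - (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        directions.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return directions


def diam_min(points, n_dirs=2000):
    """Approximate minimal distance between two parallel enclosing planes."""
    best = math.inf
    for ux, uy, uz in hemisphere_directions(n_dirs):
        # Signed positions of all atoms along the candidate plane normal.
        proj = [x * ux + y * uy + z * uz for x, y, z in points]
        best = min(best, max(proj) - min(proj))
    return best
```

As with any sampling approach the value is an upper estimate of the true minimum; it approaches it as `n_dirs` increases.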
  • DiamMax Maximum diameter of a specific conformation of a conformer
  • Van der Waals surface area of a specific conformer (VAN_D_WAALS_SUR.): is defined by conventional methods as the area of a surface formed by the spheres of van der Waals radii around the atoms of the conformer (as described in Meyer 1985). The Van der Waals surface area is calculated by conventional algorithms including tables for the radii of different atoms. One such algorithm is presented in (Gaudio and Takahata 1992).
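A numerical estimate of the surface formed by overlapping van der Waals spheres can be obtained by placing points on each atomic sphere and counting those not buried inside a neighbouring sphere. The sketch below illustrates this generic point-sampling idea; it is not the algorithm of Gaudio and Takahata. The radii are Bondi values in angstrom and are an assumption here, as are all function names.

```python
import math

# Assumed Bondi van der Waals radii in angstrom (not taken from the source).
BONDI_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def sphere_points(n):
    """Roughly uniform points on the unit sphere (Fibonacci sampling)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return points


def vdw_surface_area(atoms, n_points=500):
    """Estimate the van der Waals surface area.

    atoms: list of (element_symbol, (x, y, z)) with coordinates in angstrom.
    """
    unit = sphere_points(n_points)
    total = 0.0
    for i, (elem_i, centre_i) in enumerate(atoms):
        r_i = BONDI_RADII[elem_i]
        exposed = 0
        for ux, uy, uz in unit:
            p = (centre_i[0] + r_i * ux, centre_i[1] + r_i * uy, centre_i[2] + r_i * uz)
            # A surface point is buried if it lies inside any other atom's sphere.
            buried = any(
                math.dist(p, centre_j) < BONDI_RADII[elem_j]
                for j, (elem_j, centre_j) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        total += 4.0 * math.pi * r_i * r_i * exposed / n_points
    return total
```

For an isolated atom the estimate equals the full sphere area; for bonded atoms the overlapping caps are excluded, so the total is smaller than the sum of the individual spheres.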
  • POLAR: Self-polarizability of a nitrogen atom
  • Self-polarizability π_s(r) of an atom r in a conformer was introduced as a reactivity measure for π electron systems by (Coulson and Longuet-Higgins 1947); the all-valence electron formula is defined by formula 6:
  • c_ai is the linear combination of atomic orbitals - molecular orbitals (LCAO-MO) coefficient of atomic orbital a at center r in the molecular orbital i, ε_i is the energy of the i-th molecular orbital and α is, according to formula 3, defined as the average of the HOMO and LUMO energies. It will be understood that the self-polarizability of a nitrogen atom can be calculated from the above formula where r is nitrogen.
  • LCAO-MO: linear combination of atomic orbitals - molecular orbitals
  • POP_LUMO: Population Lowest Unoccupied Molecular Orbital energy
  • Predictive models are commonly used in drug design today as a low-cost tool for screening chemical compounds for biological activity. Predictive modelling is particularly useful for prediction of toxicity. Early-stage screening of the toxicity of compounds can potentially focus the drug development on compounds which have relatively low toxicity, and thereby avoid large costs and time spent in the development of drugs which are later found to be toxic in clinical trials.
  • the predictive quality of models for chemical compounds is closely connected to the quality and number of data available for training, and the complexity of the interaction between the target molecule (for example a receptor or an ion channel) and the binding compound (the ligand). A low quality of data in combination with a high complexity of the interaction results in low predictive performance. Therefore, methods for increasing the quality of the data and/or reducing the complexity of the modelled binding are beneficial in the development of models for prediction.
  • the hERG ion channel is a promiscuous receptor and is a target for compounds of highly variable structure and with diverse physical-chemical properties. This feature of the hERG ion channel increases the complexity of predictive models for hERG inhibition, and makes the task of developing high-performance predictive models more difficult.
  • the inventors of the present invention have surprisingly found that the division of a training set according to the acidic and/or zwitterionic status of compounds can effectively reduce the complexity of the predictive model of hERG ion channel inhibition and lead to high predictive performance and/or a reduction of the number of descriptors of a model developed from the training set.
  • the IC50 results may not be the optimal basis for making a so-called continuous model, since experimental errors lead to different results, which will have an influence on training of a predictive model.
  • the binary classification approach has two possible response variable outcomes, i.e. either a positive prediction (for example equivalent to a hERG inhibiting activity) or a negative prediction (for example equivalent to no inhibiting activity). This is in contrast to the continuous prediction approach where the exact IC50 value of a compound is predicted.
  • the present inventors have found that the use of a binary classification model is useful for prediction of hERG ion channel inhibition.
  • the predictive model is a binary classification model.
  • Such a binary classification model can be developed by use of a binary decision tree classifier.
  • the predictive model is developed by use of a binary decision tree classifier.
  • See5 is a state-of-the-art classifier construction system (Quinlan (1993 and 1997)) using decision trees, a non-parametric machine-learning technique. See5 and its predecessors use formulas based on information theory to evaluate the "goodness" of a test; in particular, they choose the test that extracts the maximum amount of information from a set of cases, given the constraint that only one attribute is tested. To this end, the entropy criterion formula 7 is used:
  • N is the total number of observations
  • k the number of classes
  • n_j is the number of observations belonging to each class.
  • the entropy of an information item is a measure of its randomness or uncertainty or can be taken as a measure of the average amount of information that is supplied by the knowledge of the information item.
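The entropy criterion formula itself did not survive in this text. The sketch below therefore assumes the standard Shannon entropy, H = -sum over the k classes of (n_j / N) * log2(n_j / N), which is consistent with the symbols N, k and n_j defined above, and shows how the information gained by a single-attribute threshold test could be evaluated. It illustrates the general idea only and is not the See5 implementation; function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder
```

A decision tree learner of this kind would evaluate many candidate tests on single descriptors and keep the one with the highest gain at each node.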
  • the decision tree is based on the See5 algorithm or predecessors of See5 as described above.
  • the training set for developing a model of hERG channel inhibiting activity consists of a set of active chemicals (below a certain threshold for IC50), a set of inactive chemicals (over a certain, possibly different from the first, threshold for IC50) and a set of chemicals of marginal activity (with IC50 between the two thresholds, in case they are different).
  • the set of marginals may or may not be used for training. Introducing a set of marginals may also be beneficial for the model performance because a single theoretical breakpoint between active chemicals and inactive chemicals is difficult to define, and specifically in the cases where there is variation in the experimental test results for hERG blocking affinity.
  • a negative prediction (related to a compound that is not a hERG ion channel inhibitor) is associated with a hERG IC50 ≥ 40 µM and/or a positive prediction (related to a compound that is a hERG ion channel inhibitor) is associated with a hERG IC50 ≤ 10 µM.
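The division of a training set into active, inactive and marginal chemicals by two IC50 thresholds can be sketched as below. The default thresholds of 10 µM and 40 µM follow the values given in the text; treating both comparisons as inclusive, and the function and label names, are assumptions made for this illustration.

```python
def label_by_ic50(ic50_um, active_max=10.0, inactive_min=40.0):
    """Assign a training label from a hERG IC50 value in micromolar.

    Inclusive comparisons are an assumption of this sketch.
    """
    if ic50_um <= active_max:
        return "active"
    if ic50_um >= inactive_min:
        return "inactive"
    return "marginal"


def training_labels(ic50_values, use_marginals=False):
    """Label a list of IC50 values, optionally dropping marginal chemicals."""
    labelled = [(v, label_by_ic50(v)) for v in ic50_values]
    if use_marginals:
        return labelled
    return [(v, lab) for v, lab in labelled if lab != "marginal"]
```

Leaving a gap between the two thresholds keeps compounds with ambiguous measurements out of both classes.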
  • descriptors of physico-chemical properties of compounds are known in the art and often used in predictive models of hERG channel inhibition and in QSAR models in general. Such descriptors may be useful in methods of the present invention.
  • the three-dimensional structure of a ligand and/or its target is commonly used for modelling of binding or biological activity since three-dimensional structure often holds important information regarding the properties of the modelled interaction.
  • the three-dimensional conformation of most molecules varies in aqueous solution and often there are multiple stable conformers of the same ligand found close to the conformer of the lowest energy. This means that multiple conformers can potentially be energetically favourable for binding.
  • the conformational variation of a chemical compound can be calculated by use of a number of conventional methods in the field, for example freely available tools such as BALLOON, CONFAB, FROG2, and RDKIT, and commercial tools such as OMEGA, Catalyst and MOE, or as was done for the present invention by use of the GAS algorithm (Mekenyan 2005).
  • the GAS algorithm is a method for coverage of the conformational space of highly flexible chemicals by a limited number of conformers.
  • the GAS algorithm employs a genetic algorithm to minimize 3D similarity among the generated conformers. This makes the problem computationally feasible even for large, flexible molecules, at the cost of non-deterministic character of the algorithm.
  • the fitness of a conformer is not quantified individually, but only in conjunction with the population it belongs to.
  • the approach handles the following stereo-chemical and conformational degrees of freedom: rotation around acyclic single and double bonds, inversion of stereo-centers, flip of free corners in saturated rings, reflection of pyramids on the junction of two or three saturated rings.
  • the fitness function based on maximization of RMS distance between conformers is combined with Shannon function accounting for evenness of conformer distribution across conformational space and a procedure is included for automated determination of the number of conformers needed for an appropriate coverage of conformational space (Mekenyan 2005).
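The idea of scoring a whole population of conformers by how different its members are from each other can be illustrated with a mean pairwise RMS distance. This is a simplified sketch and not the GAS fitness function: it assumes the conformers share the same atom ordering and are already superimposed, it omits the Shannon evenness term, and all names are hypothetical.

```python
import math
from itertools import combinations


def rms_distance(conf_a, conf_b):
    """RMS distance between corresponding atoms of two conformers.

    Each conformer is a list of (x, y, z) tuples in the same atom order.
    No superposition is performed (assumption of this sketch).
    """
    squared = [math.dist(p, q) ** 2 for p, q in zip(conf_a, conf_b)]
    return math.sqrt(sum(squared) / len(squared))


def population_diversity(conformers):
    """Mean pairwise RMS distance over a population of at least two conformers."""
    pairs = list(combinations(conformers, 2))
    return sum(rms_distance(a, b) for a, b in pairs) / len(pairs)
```

Because the score depends on all pairs, a conformer contributes to it only in relation to the other members of the population, in line with the statement that fitness is not quantified individually.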
  • PMM strain-relief procedure
  • Geometry optimization of conformers is further completed by quantum-chemical methods.
  • MOPAC 93 (Stewart 1990 and 1993) is employed by making use of the AM1 Hamiltonian.
  • the conformers are screened to eliminate those whose heat of formation, ΔH_f^0, is greater than the ΔH_f^0 associated with the conformer with absolute energy minimum by more than a specified threshold (the default value used by the OASIS Database Manager software is 20 kcal/mol). Subsequently, conformational degeneracy, due to molecular symmetry and geometry convergence, is detected within a user-defined torsion angle resolution (Mekenyan 2005).
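The energy screening step can be sketched as a simple filter relative to the lowest heat of formation. The 20 kcal/mol default is the value quoted in the text; the data layout, names and example energies are assumptions for illustration.

```python
def filter_by_energy(conformers, window=20.0):
    """Keep conformers within `window` kcal/mol of the lowest heat of formation.

    conformers: list of (name, heat_of_formation_kcal_per_mol) pairs.
    """
    lowest = min(energy for _, energy in conformers)
    return [(name, energy) for name, energy in conformers
            if energy - lowest <= window]


# Hypothetical heats of formation in kcal/mol.
candidates = [("c1", -35.0), ("c2", -30.5), ("c3", -10.0), ("c4", -15.0)]
kept = filter_by_energy(candidates)
```

In this example `c3` lies 25 kcal/mol above the minimum and is removed, while `c4` lies exactly at the 20 kcal/mol limit and is kept.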
  • the set of conformers generated for a given 2D structure can be used as an approximation for the entire conformational variety of the structure and used to formulate and test hypotheses about it.
  • maximum and minimum values of conformational parameters e.g. effective cross-sectional conformer diameter
  • the term “all conformers” or “all available conformers” denotes a set, a selection or group of conformers that is representative for the conformational variance of a given chemical compound or structure.
  • Such a set of conformers may according to the present invention consist of one or more conformers, such as 1 to 500 conformers, or such as 1 to 200 conformers, or such as 1 to 100 conformers, or such as 1 to 50 conformers, or such as 1 to 30 conformers, or such as 1 to 15 conformers, or such as 1 to 10 conformers, or such as 1 to 5 conformers; at least 5 conformers such as 5 to 10 conformers, or such as 5 to 30 conformers, or such as 5 to 50 conformers, or such as 5 to 100 conformers, or such as 5 to 200 conformers, or such as 5 to 500 conformers; at least 10 conformers such as 10 to 30 conformers, such as 10 to 50 conformers, or such as 10 to 100 conformers, or such as 10 to 200 conformers, or such as 10 to 500 conformers; at least
  • Atomic descriptors are used to describe attributes of a specific atom or a specific type of atom in a specific conformation of a chemical structure. Atomic descriptors may have different values for different atoms in the same conformer or different values for different conformers of the same structure. Thus, in one embodiment of the present invention, the predictive model uses one or more atomic descriptors, and/or one or more descriptors derived from atomic descriptors.
  • the prediction method uses one or more atomic descriptors and/or one or more descriptors derived from atomic descriptors, for example one or more atomic descriptors selected from the group consisting of donor (electrophilic) superdelocalizabilities, acceptor (nucleophilic) superdelocalizabilities and atomic self-polarizability.
  • atomic descriptors may be calculated using conventional protocols in the field, for example as implemented by OASIS Database Manager v. 1.7.3 described in http://oasis-lmc.org and Nikolov et al. 2006 or Molecular Orbital PACkage (MOPAC) described in http://openmopac.net/manual.
  • the prediction method uses one or more atomic descriptors and/or one or more descriptors derived from atomic descriptors, wherein such atomic descriptors are for example selected from the group consisting of donor (electrophilic) superdelocalizability (Donor DLC), atomic self-polarizability (POLAR), and partial electron densities of the nitrogen atom in the frontier orbital (POP_LUMO).
  • Donor DLC donor (electrophilic) superdelocalizability
  • POLAR atomic self-polarizability
  • POP_LUMO partial electron densities of the nitrogen atom in the frontier orbital
  • the prediction method uses a descriptor of the reactivity of nitrogen atoms in a set of one or more conformers such as one or more atomic descriptors and/or one or more descriptors derived from a selected type of atomic descriptors, wherein said atomic descriptor is calculated on the nitrogen atoms of each conformer in a set of one or more conformers, and wherein said atomic descriptor is selected from the group consisting of donor (electrophilic) superdelocalizabilities (Donor DLC), atomic self-polarizability (POLAR) and partial electron densities of the nitrogen atom in the frontier orbital (POP_LUMO).
  • said atomic descriptor is calculated on the nitrogen atoms of each conformer in a set of one or more conformers, and said atomic descriptor is selected from the group consisting of donor (electrophilic) superdelocalizabilities (Donor DLC) and atomic self-polarizability (POLAR).
  • Conformational descriptors, for example descriptors of volume, surface descriptors, frontier molecular orbital energies, geometric indices such as effective cross-sectional diameter, maximum distance between atoms (maximum diameter), planarity index, polarizability, electronegativity, heat of formation and geometric topological indices, calculated for each individual conformer or for the ensemble of conformers, are used in a model for prediction of hERG ion channel inhibition.
  • Such conformational descriptors may be calculated using conventional protocols in the field, for example as implemented by OASIS Database Manager v. 1.7.3 described in http://oasis-lmc.org and Nikolov et al. 2006.
  • the prediction methods use a descriptor of the size of one or more conformers such as one or more conformational descriptors and/or one or more descriptors derived from conformational descriptors, wherein said conformational descriptors are selected from the group consisting of minimal diameter (DiamMin), maximum diameter (DiamMax), Van der Waals surface (VAN_D_WAALS_SUR) and effective cross-sectional diameter (DiamEff).
  • the prediction method uses a descriptor of the size of one or more conformers and the descriptor is the effective cross-sectional diameter (DiamEff) or a descriptor derived from the effective cross-sectional diameter (DiamEff) of a set of conformers, such as for example MaxDiamEff.
  • O oxygen
  • N nitrogen
  • C carbon
  • the prediction model uses the maximum of atomic descriptors on all atoms of a given conformer, such as for example selected from the group consisting of donor (electrophilic) superdelocalizability, acceptor (nucleophilic) superdelocalizability, atomic self-polarizability, atomic charge, all atomic descriptors that can be calculated using conventional methods in the field such as the protocols implemented in the OASIS Database Manager v. 1.7.3 as described in http://oasis-lmc.org and Nikolov et al. 2006.
  • the prediction model uses maximum of atomic descriptors of a given conformer selected from atomic descriptors of oxygen (O), nitrogen (N) and/or carbon (C) atoms.
  • the prediction model uses the maximum of atomic descriptors of the acceptor superdelocalizability D_N of a given conformer and/or the maximum of the donor (electrophilic) superdelocalizability D_E of a given conformer calculated on all atoms of a given conformer or selected from atomic descriptors of oxygen (O), nitrogen (N) and/or carbon (C), wherein the maximum of the donor (electrophilic) superdelocalizability D_E on nitrogen atoms of a given conformer (D_E^conformer) is more preferred.
  • the prediction model uses one or more conformational descriptors derived by taking the maximum of the values of a selected atomic descriptor on all nitrogen atoms of a given conformer, wherein said atomic descriptors are selected from the group consisting of donor (electrophilic) superdelocalizability (Donor DLC), atomic self-polarizability (POLAR) and partial electron densities of the nitrogen atom in the frontier orbital (POP_LUMO).
  • the prediction model uses a conformational descriptor derived by taking the maximum of donor (electrophilic) superdelocalizability (Donor DLC) descriptors on all nitrogen atoms of a given conformer.

Structural descriptors

  • Structural descriptors describe attributes of a chemical structure as a whole rather than of individual conformations or atoms.
  • the predictive model thus uses one or more structural (two-dimensional) descriptors, such as for example lipophilicity, topological index (InfoWiener) related to the sum of interatomic distances, counts of the numbers of atoms, bonds, and rings, as well as acidic dissociation constants (pKa) and basic dissociation constants (pKb).
  • structural descriptors may be calculated by using conventional methods in the field, for example such as by using protocols implemented in OASIS Database Manager v. 1.7.3 by Laboratory of Mathematical Chemistry, University of Bourgas, Bulgaria (http://oasis-lmc.org, Nikolov et al. 2006).
  • the ionization state of a chemical compound can influence the hERG ion channel inhibition. Tendencies of ionization are reflected in acidic dissociation constants, basic dissociation constants, pK a , pK b .
  • the predictive model uses at least one structural descriptor of the ionization state of a chemical compound or conformer such as for example one or more descriptors of the selected from the group consisting of acidic dissociation constants, basic dissociation constants, pK a and pK b .
  • the predictive model uses a descriptor of pK a (acidic).
  • the predictive model uses a descriptor of pK a (acidic), wherein the acidic dissociation constants, basic dissociation constants, pKa, pKb are calculated by using protocols of ACD/ToxSuite 2.95 by ACD/Labs as described in ((http://www.acdlabs.com/ products/admet/tox/)).
  • Such a descriptor will then be an attribute of a whole structure and not only of an individual conformer; therefore it is used as a structural descriptor.
  • conformational descriptor used for derivation of a structural descriptor may be in turn derived from an atomic descriptor using the procedure above.
  • the prediction model uses one or more structural descriptors derived from conformational descriptors by taking the maximum and/or the minimum of a conformational descriptor wherein the conformational descriptors are for example selected from effective cross-sectional conformer diameter, included volume and surface descriptors, frontier molecular orbitals energies, geometric indices such as effective cross-sectional diameter, maximum diameter, planarity index, polarizability, electronegativity, heat of formation, geometric topological indices.
  • the predictive model uses a structural descriptor of the maximum of a conformational descriptor, such as for example the maximum of the descriptor of effective cross-sectional conformer diameter (DiamEff).
  • Descriptors derived from conformational descriptors of the size of a compound are for example structural descriptors derived by taking the maximum value of a conformational descriptor of the size of a conformer calculated on a set of one or more conformers. Such structural descriptors are then a descriptor of the size of a compound.
  • the predictive method of the present invention uses a structural descriptor derived by taking the maximum of a selected type of conformational descriptor on a set of one or more conformers, wherein said conformational descriptor is selected from the group consisting of minimal diameter (DiamMin), maximum diameter (DiamMax), Van der Waals surface (VAN_D_WAALS_SUR) and effective cross-sectional diameter (DiamEff).
  • Such structural descriptors are thus selected from the group consisting of MaxDiamMin, MaxDiamMax, MaxVAN_D_WAALS_SUR and MaxDiamEff.
  • the predictive method uses a structural descriptor (MaxDiamEff) derived by taking the maximum of the effective cross-sectional diameter (DiamEff) for a set of one or more conformers.
  • the predictive method uses a structural descriptor derived by taking the maximum of a selected type of conformational descriptor on a set of one or more conformers, wherein said conformational descriptor is derived taking the maximum of a selected type of atomic descriptor on all nitrogen atoms of a given conformer, and wherein said atomic descriptor is selected from the group consisting of donor (electrophilic) superdelocalizability (Donor DLC), atomic self-polarizability (POLAR) and partial electron densities of the nitrogen atom in the frontier orbital (POP_LUMO).
  • Such a structural descriptor is then a structural descriptor (derived from an atomic descriptor) of the reactivity of nitrogen atoms in a set of one or more conformers.
  • when a structural descriptor is derived from the descriptor Donor DLC it is called MaxDonor DLC or D E structure herein.
  • when a structural descriptor is derived from the descriptor POLAR it is called MaxPOLAR herein.
  • when a structural descriptor is derived from the descriptor POP_LUMO it is called MaxPOP_LUMO herein.
  • the prediction method can use a descriptor of the reactivity of nitrogen atoms in a set of one or more conformers which is either a) an atomic descriptor, or b) a conformational descriptor, or c) a structural descriptor.
  • descriptors derived from atomic descriptors are for example a) conformational descriptors derived by taking the maximum value of atomic descriptors calculated on all nitrogen atoms in a conformer or b) structural descriptors derived by taking the maximum value of a) in a set of one or more conformers.
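The two-step derivation described above (atomic values to a conformer-level maximum, then conformer values to a structure-level maximum) can be sketched as follows. This is a minimal illustration, not the OASIS implementation: the data layout (a conformer as a list of `(element, value)` pairs), the function names and the numeric values are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

# Assumed layout: an atom is (element symbol, atomic descriptor value),
# a conformer is a sequence of atoms, a structure is a sequence of conformers.
Atom = Tuple[str, float]
Conformer = Sequence[Atom]

def conformer_descriptor(conformer: Conformer, element: str = "N") -> Optional[float]:
    """Maximum of an atomic descriptor over all atoms of one element.

    Returns None when the conformer has no atom of that element.
    """
    values = [value for symbol, value in conformer if symbol == element]
    return max(values) if values else None

def structural_descriptor(conformers: Sequence[Conformer], element: str = "N") -> Optional[float]:
    """Maximum of the conformer-level descriptor over a set of conformers."""
    per_conformer = [conformer_descriptor(c, element) for c in conformers]
    per_conformer = [v for v in per_conformer if v is not None]
    return max(per_conformer) if per_conformer else None

# Hypothetical Donor DLC values (a.u./eV) for two conformers of one compound.
conformers = [
    [("C", 0.31), ("N", 0.21), ("N", 0.26), ("O", 0.40)],
    [("C", 0.30), ("N", 0.29), ("N", 0.24), ("O", 0.41)],
]
```

With these made-up values the first conformer yields 0.26, and the structure-level descriptor (the analogue of MaxDonor DLC) is 0.29, reached at the second conformer.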
  • the prediction method uses a descriptor of the reactivity of nitrogen atoms which is:
  • the predictive model uses a structural descriptor of the maximum of the donor (electrophilic) superdelocalizability D E on nitrogen atoms of all available conformers of a given chemical compound or structure
  • the prediction method uses a combination of all descriptors a) to c) below:
  • the prediction method uses a combination of all the descriptors a) to c) below:
  • the descriptor of the size of a conformer or a compound is a conformational descriptor selected from the group consisting of minimal diameter (DiamMin), maximum diameter (DiamMax), Van der Waals surface (VAN_D_WAALS_SUR) and effective cross-sectional diameter (DiamEff), or a structural descriptor selected from the group consisting of MaxDiamMin, MaxDiamMax, MaxVAN_D_WAALS_SUR and MaxDiamEff, wherein the structural descriptors of the size are most preferred, and wherein the descriptor of the reactivity on nitrogen atoms is a structural descriptor derived from a selected type of atomic descriptor calculated on all nitrogen atoms of one or more conformers, said descriptor of the reactivity on nitrogen atoms being selected from the group consisting of D E structure, MaxPOLAR, and MaxPOP_LUMO.
  • the prediction method uses a combination of: a) a descriptor of the size of one or more conformers, and
  • descriptors of the size of one or more conformers is a structural descriptor selected from the group consisting of MaxDiamMin, MaxDiamMax,
  • the descriptor of the reactivity on nitrogen atoms is a structural descriptor derived from a selected type of atomic descriptor calculated on all nitrogen atoms of one or more conformers, selected from the group consisting of D E structure, MaxPOLAR, and MaxPOP_LUMO.
  • the prediction method of the present invention uses a combination of the descriptors a) to c) comprising:
  • the predictive model or method uses a combination of all descriptors a to c below :
  • the predictive model uses a combination of descriptors comprising:
  • the predictive model uses a combination of all descriptors a to c below:
  • MaxDiamEff maximum of a conformational descriptor of effective cross-sectional conformer diameter calculated for all available conformers of a given chemical compound or structure
  • a predictive model of hERG channel inhibition can be developed which only uses 1 to 10 descriptors, such as 1 to 5 descriptors, preferably such as 1 to 3 descriptors, such as for example 1 descriptor or 2 descriptors or 3 descriptors.
  • rules may be useful which are based on one or more of the descriptors 1) whether the compound comprises a nitrogen atom, 2) the pKa (acidic) of the compound, 3) the maximum donor (electrophilic) superdelocalizability on the nitrogen atoms, 4) the maximum conformer effective cross-sectional diameter.
  • Predictive thresholds are values of descriptors that can be used to associate a given compound with a biological activity, such as for example hERG ion channel inhibitors and non-inhibitors.
  • Predictive thresholds according to the present invention may vary depending on the data used for training of the methods.
  • the predictive threshold of pKa acidic
  • the predictive threshold of pKa may be in the range of about 0 to about 16, more preferably such as about 2 to about 8, such as about 4 to about 6, such as about 4, or such as about 5, or such as about 6, wherein a value of about 5 is most preferred.
  • the predictive threshold of maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of all conformers (D E structure) may be in the range of about 0.1 a.u./eV to about 0.4 a.u./eV, such as within the range of about 0.2 a.u./eV to about 0.3 a.u./eV, such as about 0.25 a.u./eV to about 0.3 a.u./eV, such as about 0.26 a.u./eV to 0.28 a.u./eV, such as about 0.265 a.u./eV, such as about 0.27 a.u./eV, such as about 0.275 a.u./eV to 0.280 a.u./eV, such as about 0.275 a.u./eV, or such as about 0.276 a.u./eV, or such as about 0.277 a.u./eV.
  • the predictive threshold of maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure may be in the range of about 5 Å to 15 Å, such as 8 Å to 12 Å, such as 9 Å to 11 Å, such as about 9 Å, or such as about 10 Å, such as 10 Å to 10.5 Å, such as about 10.1 Å, or such as about 10.2 Å, or such as about 10.3 Å, or such as in the range of 10.3 Å to 10.4 Å, such as about 10.31 Å, or such as about 10.32 Å, or such as about 10.33 Å, or such as about 10.35 Å, or such as about 10.36 Å, or such as about 10.37 Å, or such as about 10.38 Å, or such as about 10.39 Å, or such as about 10.4 Å, or such as about 10.5 Å, or such as about 11 Å.
  • a predictive model is used wherein the predictive threshold of the pKa (acidic) may be in the range of about 0 to 16, such as about 2 to 8; and the predictive threshold of maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of all conformers (D E structure) may be in the range of about 0.1 a.u./eV to about 0.4 a.u./eV, such as within the range of about 0.2 a.u./eV to about 0.3 a.u./eV; and the predictive threshold of maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure (MaxDiamEff) may be in the range of about 5 Å to 15 Å, such as 8 Å to 12 Å, such as 9 Å to 11 Å.
  • MaxDiamEff: maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure
  • a predictive model is used wherein the predictive threshold of the pKa (acidic) is in the range of about 2 to 8, preferably about 5; and the predictive threshold of maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of all conformers (D E structure) may be in the range of about 0.275 a.u./eV to 0.280 a.u./eV, more preferably about 0.278 a.u./eV; and the predictive threshold of maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure (MaxDiamEff) may be in the range of 9 Å to 11 Å, more preferably in the range of about 10.3 Å to about 10.4 Å, wherein a value of about 10.36 Å is preferred.
  • the predictive model is a classification decision tree using a rule defined as:
  • 1) a positive prediction is returned if all conditions a), b), c) and d) are fulfilled, 2) a negative prediction is returned if condition a) is fulfilled, and one or more of the conditions b), c) and d) are not fulfilled,
  • the compound comprises a nitrogen atom
  • the predictive model is a classification decision tree using a rule defined as:
  • a negative prediction is returned if condition a) is not fulfilled, and one or more of conditions e) and f) is fulfilled,
  • the compound comprises a nitrogen atom
  • the predictive model is a classification decision tree using rules defined as:
  • a positive prediction is returned if all conditions a), b), c) and d) are fulfilled,
  • condition a) is fulfilled, and one or more of the conditions b), c) and d) are not fulfilled,
  • condition a) is not fulfilled, and one or both of conditions b) and d) are not fulfilled
  • the compound comprises a nitrogen atom
  • the predictive model is a classification decision tree using both rules defined and requiring that in case of the presence of nitrogen atoms both the maximum conformer effective cross-sectional diameter and the maximum donor (electrophilic) superdelocalizability on the nitrogen atoms condition should be fulfilled in the same conformer(s) in order for a positive prediction to be generated:
  • a positive prediction is returned if all conditions a), b) and c) are fulfilled,
  • condition a) is not fulfilled, and one or both of conditions b) and d) are not fulfilled
  • the compound comprises a nitrogen atom
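A threshold rule of this kind can be sketched as a small function. Conditions b) to d) are not reproduced in the list above, so the sketch assumes they are the three preferred thresholds given earlier (pKa (acidic) about 5, D E structure about 0.278 a.u./eV, MaxDiamEff about 10.36 Å). It further assumes that a positive prediction requires each descriptor to exceed its threshold; for pKa in particular this direction is an assumption. The function and parameter names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Thresholds:
    # Preferred values quoted in the text; comparison directions are assumed.
    pka_acidic: float = 5.0        # dimensionless
    d_e_structure: float = 0.278   # a.u./eV
    max_diam_eff: float = 10.36    # angstrom

def predict_herg_positive(
    has_nitrogen: bool,
    pka_acidic: float,
    d_e_structure: Optional[float],
    max_diam_eff: float,
    t: Thresholds = Thresholds(),
) -> bool:
    """Return True for a positive (inhibitor) prediction, False otherwise.

    Condition a) is the presence of a nitrogen atom. Nitrogen-free compounds
    are treated as negative here, which is a simplification of the rule set.
    """
    if not has_nitrogen or d_e_structure is None:
        return False
    return (
        pka_acidic > t.pka_acidic
        and d_e_structure > t.d_e_structure
        and max_diam_eff > t.max_diam_eff
    )
```

For example, a hypothetical nitrogen-containing compound with pKa 6.2, D E structure 0.30 a.u./eV and MaxDiamEff 11.0 Å would be predicted positive, while the same compound with pKa 4.0 would be predicted negative.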
  • the applicability domain (AD) of a predictive model defines the physico-chemical, structural or biological space, knowledge or information on which the training set of the model has been developed, and for which it is applicable to make predictions for new compounds.
  • a predictive model may have an AD comprising compounds that are acids and zwitterions.
  • a predictive model has an AD confined by compounds having
  • the AD of a predictive model is confined by compounds having 1.3 ≤ pKa (acidic) ≤ 16.
  • a predictive model is a method for predicting hERG channel inhibiting activity of chemical substances selected from the group consisting of acids and zwitterionic compounds.
  • the present invention provides a predictive model which has an AD comprising compounds that have a maximum conformer effective cross-sectional diameter (MaxDiamEff) ≤ 18.78 Å.
  • AD is confined by compounds having 6.38 Å ≤ Maximum conformer effective cross-sectional diameter (MaxDiamEff) ≤ 18.78 Å.
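A range-based applicability domain check of this kind amounts to two interval tests. The sketch below uses the bounds quoted in the text and assumes they are inclusive; the function name and the constant names are hypothetical.

```python
# Bounds quoted in the text; treating them as inclusive is an assumption.
PKA_ACIDIC_RANGE = (1.3, 16.0)      # dimensionless
MAX_DIAM_EFF_RANGE = (6.38, 18.78)  # angstrom

def in_descriptor_range_domain(pka_acidic: float, max_diam_eff: float) -> bool:
    """True when both descriptors fall inside the quoted AD ranges."""
    pka_ok = PKA_ACIDIC_RANGE[0] <= pka_acidic <= PKA_ACIDIC_RANGE[1]
    diam_ok = MAX_DIAM_EFF_RANGE[0] <= max_diam_eff <= MAX_DIAM_EFF_RANGE[1]
    return pka_ok and diam_ok
```

A compound outside either range would simply receive no prediction rather than a negative one.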
  • the present invention provides a predictive model which has an AD comprising compounds that have either:
  • the AD of the predictive models of the present invention includes compounds having 1.3 ≤ pKa (acidic) ≤ 16, 6.38 Å ≤ Maximum conformer effective cross-sectional diameter (MaxDiamEff) ≤ 18.78 Å, and either:
  • the present invention provides methods for use in the development of predictive models of hERG ion channel inhibition as well as predictive models of hERG ion channel inhibition.
  • methods and models are assisted by a computer.
  • methods and models may be located on conventional computer storage media.
  • one aspect of the present invention is a computer-assisted prediction method or predictive model as defined herein.
  • Another aspect of the invention is a computer program product comprising a computer- assisted prediction method as described herein.
  • Another aspect of the invention is a data carrier comprising a computer-assisted method as described herein.
  • Example 1 Preparation of data sets for training and validation of a predictive model of hERG inhibition
  • Duplicate structures and stereoisomers were identified using the concept of parent 2D structure.
  • the parent 2D structure was taken to be the original 2D structure without any stereo information; for salts, the parent structure was then generated by removing the relevant (inorganic and small organic) counterions.
  • acid and base pKa constants were calculated using the default algorithm in ACD Labs ACD/ToxSuite 2.95 (http://www.acdlabs.com/products/admet/tox/) as well as the other available algorithm in the same system, marked as pKa/ACDLabs.
  • the macrodissociation constants were predicted for standard conditions (25 °C and zero ionic strength) in aqueous solutions by a proprietary algorithm that uses microconstant predictions at the corresponding protonation sites. The algorithm is based on an internal training set of 17593 compounds (http://www.acdlabs.com/products/admet/tox/).
  • the values of the pKa calculated by the default algorithm were compared with the pKa calculated by the alternative algorithm in ACD Labs ToxBoxes 2.95, pKa-ACD/Labs; in case both algorithms found an acidic ionogenic group, the difference between the pKa values according to the two versions was required not to exceed 8, otherwise the pKa value was considered unreliable.
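The consistency check between the two pKa algorithms can be sketched as below. The source only specifies the case in which both algorithms report an acidic group, so the sketch returns `None` (check not applicable) in every other case; that choice, and the function name, are assumptions.

```python
from typing import Optional

def pka_is_reliable(
    pka_default: Optional[float],
    pka_alternative: Optional[float],
    max_difference: float = 8.0,
) -> Optional[bool]:
    """Compare two calculated acidic pKa values for the same compound.

    Returns True/False when both algorithms found an acidic ionogenic group,
    and None when the comparison does not apply (not covered by the text).
    """
    if pka_default is None or pka_alternative is None:
        return None
    return abs(pka_default - pka_alternative) <= max_difference
```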
  • the resulting data set consisted of 1718 experimental data points, of which 1215 were hERG inhibitors (IC50 ≤ 10 µM) and 503 non-inhibitors (IC50 ≥ 40 µM). Prior to any modeling, it was randomly split into a training set T 1 (1374 chemicals, or 80% of the data set) and a validation set V 1 with the remaining 20% (344 chemicals). The ratio of hERG blockers to hERG non-blockers was maintained in the random selection for both the training and the validation sets.
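A class-preserving random split like the one described can be sketched with the standard library alone. This is one common way to do a stratified 80/20 split, not necessarily the procedure used by the applicants; the function name, the seed and the rounding rule are assumptions.

```python
import random
from typing import List, Sequence, Tuple

def stratified_split(
    labels: Sequence[int], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split indices into train/validation while keeping the class ratio."""
    rng = random.Random(seed)
    train: List[int] = []
    valid: List[int] = []
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then cut at the requested fraction.
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        valid.extend(indices[cut:])
    return sorted(train), sorted(valid)

# Class sizes taken from the text: 1215 inhibitors (1) and 503 non-inhibitors (0).
labels = [1] * 1215 + [0] * 503
train_idx, valid_idx = stratified_split(labels)
```

With per-class rounding, the class sizes from the text give 972 + 402 = 1374 training compounds and 243 + 101 = 344 validation compounds, which agrees with the set sizes reported above.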
  • a second validation set was compiled from the training chemicals of the predictive model for hERG blocking included in ACD/Labs ACD/ToxSuite 2.95 (http://www.acdlabs.com/products/admet/tox/, Juska 2008). Salts and mixtures were identified and contradictory experimental results and duplicates were removed in the same way as for the main data set.
  • the parent structure for each of these chemicals was compared to all parent structures in T 1 and V 1; if any match was found, the structure was ignored.
  • a set V 2 was thus constructed, having no structures in common with either T 1 or V 1; moreover, no structures from the latter two sets were stereoisomers of, or salts of the same parent structure as, any of the structures in V 2.
  • the set V 2 contained 242 chemicals, 125 of them actives (hERG IC50 ≤ 10 µM) and 117 inactives (hERG IC50 > 10 µM). Note the different inactivity threshold compared to the training and the first validation sets.
  • the training and/or validation sets described in Example 1 were used, but other conventional methods of generation of conformers may also be useful for the present invention.
  • Example 3 Calculation of structural, conformational and atomic descriptors for training and validation of a predictive model of hERG inhibition
  • the data sets of Example 1 were used and conformers were previously calculated as described in Example 2.
  • the below calculation of descriptors may be done on any other data set of chemical compounds and with any conformers calculated by use of conventional methods in the field.
  • Three groups of descriptors were calculated from the data sets of Example 1 and the conformers as described in Example 2, and used in the descriptor selection.
  • Structural descriptors included lipophilicity, a topological index (InfoWiener) related to the sum of interatomic distances, counts of the numbers of atoms, bonds, and rings, as well as acidic and basic pKa.
  • Conformer descriptors (different values for the different conformers of the same structure) included volume and surface descriptors, frontier molecular orbitals energies, geometric indices such as effective cross-sectional diameter, maximum diameter, planarity index, polarizability, electronegativity, heat of formation, geometric topological indices etc.
  • Atomic descriptors (different values for each atom of each conformer of each structure) included donor (electrophilic) and acceptor (nucleophilic) superdelocalizabilities, atomic self-polarizability, atomic charge and others. The full list of descriptors is presented in Table 1 below.
  • POP_LUMO Q, VWACWN, VWACWP, VWPNSA, VWPPSA.
  • Bond_Order_Hlg, CALC._HEAT_FORM., D_max, DiamEff, DiamMax, DiamMin, DIPOLE MOMENT, E GAP, E_HOMO, ELECTRONEGATIVITY, Electrophilicity, E_LUMO, GEOM._INFO_WIENER, GEOM.
  • ACD/ToxSuite 2.95 by ACD/Labs (http://www.acdlabs.com/products/admet/tox/) was used to calculate the acidic and basic dissociation constants. All other descriptors were calculated using OASIS Database Manager v. 1.7.3 (http://oasis-lmc.org).
  • Example 4 Generating conformational descriptors from atomic descriptors, and structural descriptors from conformational descriptors
  • the data sets of Example 1, conformers of Example 2 and descriptors of Example 3 were used.
  • the below calculation of descriptors may be done on any data set of chemical compounds, with any conformers calculated by use of conventional methods in the field, and with any set of descriptors calculated for a given chemical structure or conformer.
  • MaxDiamEff the maximum of DiamEff, the effective cross-sectional conformer diameter of a conformer
  • the maximum and minimum (on all conformers) of the maximum D E taken on all nitrogen atoms of a conformer etc.
  • Example 5 Derivation of a binary prediction model for hERG ion channel inhibition of acids and zwitterions
  • a subset of chemicals T A was selected from the training set T 1.
  • the subset T A consisted of 153 chemicals with at least one acidic ionogenic group and either no basic ionogenic groups at all or pKa(acidic) ≤ pKa(basic) (acids and zwitterionic ampholytes (AZA)).
  • See5 is a state-of-the-art classifier construction system using decision trees, a non-parametric machine-learning technique.
  • the See5 algorithm (Quinlan (1993 and 1997)) is the latest version of the ID3 and C4.5 algorithms developed by the same author. See5 and its predecessors use formulas based on information theory to evaluate the "goodness" of a test; in particular, they choose the test that extracts the maximum amount of information from a set of cases, given the constraint that only one attribute is tested.
  • the entropy criterion as described previously in formula 3 is used, where N is the total number of observations, k the number of classes and is the number of observations belonging to each class.
  • the entropy of an information item is a measure of its randomness or uncertainty or can be taken as a measure of the average amount of information that is supplied by the knowledge of the information item.
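Formula 3 is not reproduced in this section. As a hedged illustration, the standard Shannon entropy that C4.5-type decision tree learners build on, H = -Σ (N_i / N) log2(N_i / N) summed over the k classes, can be computed as below. The symbol N_i for the per-class count and the function name are notation chosen for this sketch.

```python
import math
from typing import Sequence

def class_entropy(class_counts: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a class distribution given by counts.

    class_counts[i] plays the role of N_i; their sum is N; len() is k.
    Empty classes contribute nothing to the sum.
    """
    total = sum(class_counts)
    entropy = 0.0
    for count in class_counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

# Class counts of the AZA training set quoted in the text: 35 actives, 118 inactives.
h_training = class_entropy([35, 118])
```

A perfectly balanced two-class set has an entropy of 1 bit and a pure set has 0 bits; the 35/118 distribution lies in between, at roughly 0.78 bits.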
  • a confidence value of 5% was used in order to enhance the reliability of the derived rules.
  • the decision tree was required to have a large enough minimum leaf size.
  • a series of decision tree models was produced with different settings for this parameter.
  • the number of actives in T A was 35. Using more than 35 chemicals as the minimum leaf size resulted in no parameters being selected and the trivial classifier being built (classifying all structures as positive).
  • a prediction is positive (hERG IC50 ≤ 10 µM) if
  • D E conformer is the maximum donor (electrophilic) superdelocalizability D E calculated at all nitrogen atoms of the conformer
  • the performance of the derived rule was estimated on the training set of observations as well as on two independent external validation sets.
  • the training set T A consisted of 153 experimental data points, of which 35 were hERG inhibitors (IC50 ≤ 10 µM) and 118 non-inhibitors (IC50 ≥ 40 µM).
  • Validation set V 1A consisted of 35 experimental data points, of which 8 were hERG inhibitors and 27 non-inhibitors. This validation set consisted of 20% of the chemicals from the initial data set, set aside randomly for validation while preserving the inhibitor/non-inhibitor ratio, as described in Section 2.3.
  • a subset of chemicals V 2A was selected from the validation set V 2 .
  • the subset V 2A consisted of 48 chemicals with at least one acidic ionogenic group and either no basic ionogenic groups at all or pKa(acidic) ≤ pKa(basic).
  • the hERG inhibitors were defined as having hERG IC50 ≤ 10 µM, while non-inhibitors (37) were defined as having hERG IC50 > 10 µM. Note the different inactivity threshold compared to the training and the first validation sets.
  • Sensitivity: the ratio of true positives to all positives (whether predicted positive or negative). Specificity: the ratio of true negatives to all negatives (whether predicted positive or negative). Concordance: the ratio of true predictions to all predictions.
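The three performance measures follow directly from the four confusion matrix counts. The sketch below is a plain restatement of the definitions; the function name is hypothetical and the example counts are invented, not taken from the tables of this document.

```python
from typing import Dict

def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity and concordance from confusion matrix counts."""
    return {
        # true positives over all experimentally positive compounds
        "sensitivity": tp / (tp + fn),
        # true negatives over all experimentally negative compounds
        "specificity": tn / (tn + fp),
        # correct predictions over all predictions
        "concordance": (tp + tn) / (tp + fn + tn + fp),
    }

# Invented counts for illustration only.
metrics = classification_metrics(tp=8, fn=2, tn=18, fp=2)
```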
  • the conformerwise (C) application of the rule requires the existence of a conformer satisfying both the diameter and the nitrogen donor (electrophilic) superdelocalizability condition.
  • the structure may be required to have a DiamEff > 10.36 [Å] (reached at some conformers) and D E structure > 0.278 [a.u./eV] (reached at possibly another subset of conformers), thus satisfying the maxima conditions.
  • the latter interpretation can have a certain value because the active conformer does not necessarily have to be the one with the largest effective diameter; MaxDiamEff is a measure of both size and flexibility of the structure.
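The difference between the conformerwise and the structurewise application can be made concrete with a short sketch. Each conformer is represented here as a dictionary holding its DiamEff and its conformer-level nitrogen D E value; this layout, the key names and the numbers are assumptions, and the pKa condition is left out because it does not vary between conformers.

```python
from typing import Dict, Sequence

DIAM_THRESHOLD = 10.36  # angstrom, preferred value from the text
D_E_THRESHOLD = 0.278   # a.u./eV, preferred value from the text

def conformerwise(conformers: Sequence[Dict[str, float]]) -> bool:
    """At least one single conformer must satisfy both conditions."""
    return any(
        c["diam_eff"] > DIAM_THRESHOLD and c["d_e"] > D_E_THRESHOLD
        for c in conformers
    )

def structurewise(conformers: Sequence[Dict[str, float]]) -> bool:
    """Each maximum is taken separately and may come from different conformers."""
    max_diam_eff = max(c["diam_eff"] for c in conformers)
    max_d_e = max(c["d_e"] for c in conformers)
    return max_diam_eff > DIAM_THRESHOLD and max_d_e > D_E_THRESHOLD

# Hypothetical compound: one conformer is large, another is reactive.
example = [
    {"diam_eff": 10.8, "d_e": 0.25},
    {"diam_eff": 9.9, "d_e": 0.29},
]
```

For this invented compound the structurewise test passes while the conformerwise test fails, because no single conformer is both large enough and reactive enough. Any compound that passes conformerwise also passes structurewise.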
  • C conformerwise
  • nitrogen-free AZA would trivially be predicted hERG non-inhibitors. This matches both T A and V 1A, where all nitrogen-free AZA have an IC50 of over 40 µM; V 2A, however, contained three active chemicals of this type. Due to the scarcity of nitrogen-free hERG inhibitors in the AZA set, statistically reliable rules for characterization of their hERG affinity were difficult to produce. Nevertheless, in view of the existence of such chemicals, we applied the following two domain restrictions to the hERG rule (other domain definitions will be considered below). The Nitrogen domain included only the nitrogen-containing AZA.
  • the Extended nitrogen domain included the nitrogen-containing AZA as well as any AZA with pKa ≤ 5 or DiamEff ≤ 10.36 Å. The motivation for this was that these negative rules were derived on the entire set of AZA and did not relate to nitrogen. Within this domain definition, a nitrogen-free AZA would be considered in domain (and predicted negative) if it matches either of the two negative rules on size and pKa; otherwise, it would be considered outside of the model domain. Table 3 lists the results for all AZA as well as for the two domain definitions.
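The two domain definitions can be expressed as a pair of membership tests. The sketch assumes that the negative rules are "pKa at or below 5" and "maximum effective diameter at or below 10.36 Å"; the comparison symbols are damaged in the source, so these directions are an assumption, as are the function and key names.

```python
from typing import Dict

def domain_membership(
    has_nitrogen: bool, pka_acidic: float, max_diam_eff: float
) -> Dict[str, bool]:
    """Membership of one AZA compound in the two domain definitions."""
    # Negative rules derived on the whole AZA set (directions assumed).
    matches_negative_rule = pka_acidic <= 5.0 or max_diam_eff <= 10.36
    return {
        "nitrogen": has_nitrogen,
        "extended_nitrogen": has_nitrogen or matches_negative_rule,
    }
```

Under these assumptions, a nitrogen-free acid with pKa 4.2 is inside the extended domain (and predicted negative), whereas a nitrogen-free acid with pKa 6.0 and a diameter of 11 Å falls outside both domains.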
  • Table 4 presents the performance of the hERG rule for the T A and V 1A sets using the strict and the extended domain definition.
  • Table 6 shows the minima and maxima of the three model parameters, and the results from imposing this restriction on the model domain instead of the structural restriction are shown in Table 7.
  • the hERG blocking affinity (IC50 ≤ 10 µM) of acids and zwitterionic ampholytes is described by an alert based on three descriptor ranges (pKa(acidic), D E structure and MaxDiamEff), where D E structure is the maximum donor (electrophilic) superdelocalizability, D E, calculated at all nitrogen atoms of all (available) conformers and MaxDiamEff is the maximum of the effective cross-sectional diameter.
  • Example 6 Comparison of the binary classification tree model to QSAR models of the same training set
  • the present example illustrates another example of the use of the method wherein the predictive model is obtained by training on a set of compounds which is divided based on ionization as presented herein.
  • the developed predictive model is not a binary decision tree classifier, but a QSAR model built by using Leadscope Predictive Data Miner.
  • the training set T 1 described in Example 1 was also used to develop predictive QSAR models of hERG IC50 ≤ 10 µM vs. hERG IC50 ≥ 40 µM using Leadscope Predictive Data Miner by Leadscope Inc. (Cross et al. 2003, Valerio et al. 2010, http://www.leadscope.com).
  • Leadscope is a software for systematic substructural analysis of a compound set using predefined structural features stored in a template library.
  • the feature library contains approximately 27,000 structural features and the structural features chosen for analysis are motivated by those typically found in small molecules: aromatics, heterocycles, spacer groups, simple substituents.
  • the system can generate training set-dependent structural features (scaffolds), and it also estimates molecular descriptors for each structure: the octanol/water partition coefficient (AlogP), hydrogen bond acceptors, hydrogen bond donors, Lipinski score, atom count, parent compound molecular weight, polar surface area and rotatable bonds.
  • the model building process in Leadscope includes an automated procedure of structural feature and numeric descriptor selection (using t- and Yates' χ² statistic metrics).
  • the Leadscope algorithm for building QSAR models is based on structural features and numeric descriptors using partial logistic regression for a binary response variable.
  • the molecular structures were converted into SD format using OASIS Database Manager 1.7.3 (http://oasis-lmc.org) and imported into Leadscope. The structures were then mined for the predefined structural features from Leadscope's template library by substructure analysis. The selection was done according to Yates' χ²-test. In addition, eight molecular descriptors, the octanol-water partition coefficient (AlogP), hydrogen bond acceptors, hydrogen bond donors, Lipinski score, atom count, parent compound molecular weight, polar surface area and rotatable bonds, were calculated for each structure. Redundant features were removed using the least redundant feature option in Leadscope. Two predictive models were constructed using the above procedure.
  • the model L Total (Leadscope Total) was built on the entire training set (T 1; 1336 chemicals).
  • the model L A (Leadscope AZA) was built on training set T A (acids and zwitterionic ampholytes only, 153 chemicals).
  • Both L Total and L A were estimated by cross-validation and by external validation using validation sets V 1 and V 2 (for L Total) and V 1A and V 2A (for L A).
  • the QSAR models were based on structural features and a small number of 2D calculated descriptors and were estimated by cross-validation and by external validation using the same validation sets as for the proposed alert. More specifically, the predictive models were developed based on the identified set of the structural features and the molecular descriptors, using partial logistic regression (PLR). Using the default mode recommended by Leadscope for the case of unbalanced training sets, three separate sub-models were developed based on three balanced training subsets, randomly selected so as to provide disjoint subsets of negatives. The sub-models were then combined into an overall ensemble model with all models assigned equal weights. The predictive performance of the overall model was evaluated by 10-fold cross-validation and by external validation.
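The balancing scheme described here (several sub-models, each trained on all positives plus its own disjoint random subset of negatives, then averaged with equal weights) can be sketched generically. This is not Leadscope's implementation, whose details are not given in the text; the function names, the subset size rule and the seed are assumptions.

```python
import random
from typing import List, Sequence, Tuple

def balanced_subsets(
    positives: Sequence[int], negatives: Sequence[int], n_models: int = 3, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Pair all positives with n_models disjoint random subsets of negatives."""
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    # Assumed rule: subsets are as large as the positive class, if enough negatives exist.
    size = min(len(positives), len(shuffled) // n_models)
    return [
        (list(positives), shuffled[i * size:(i + 1) * size])
        for i in range(n_models)
    ]

def ensemble_probability(sub_model_probabilities: Sequence[float]) -> float:
    """Equal-weight average of the sub-model positive probabilities."""
    return sum(sub_model_probabilities) / len(sub_model_probabilities)

# Compound identifiers sized like the AZA training set: 35 positives, 118 negatives.
subsets = balanced_subsets(list(range(35)), list(range(100, 218)))
```

With 35 positives and 118 negatives this yields three training subsets of 35 positives and 35 negatives each, with no negative shared between subsets.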
  • PLR partial logistic regression
  • the applicability domain required that a compound had at least 30% Tanimoto structural similarity (the similarity coefficient was proposed by Jaccard in (Jaccard 1901) and independently by Tanimoto in 1957) with a training set compound.
  • the Tanimoto similarity was calculated based on fingerprints of the Leadscope features used for each of the models.
  • Compounds screened with the ensemble model were required to have 30% similarity with at least one training set compound in either sub-model.
  • predictions were required to have a positive prediction probability of over 0.7 for positives and less than or equal to 0.3 for negatives, rendering predictions with probabilities between 0.3 and 0.7 out of the domain.
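The similarity and probability criteria can be combined into one domain test. The sketch represents a fingerprint as a set of feature identifiers, which is an assumption; Leadscope's own fingerprint format is not described here. The function names are likewise illustrative.

```python
from typing import Iterable, Set

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) coefficient of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def prediction_in_domain(
    query: Set[int],
    training_fingerprints: Iterable[Set[int]],
    positive_probability: float,
    min_similarity: float = 0.3,
) -> bool:
    """Apply the 30% similarity rule and the 0.3/0.7 probability cut-offs."""
    similar = any(tanimoto(query, fp) >= min_similarity for fp in training_fingerprints)
    # Over 0.7 counts as a positive call, 0.3 or below as a negative call;
    # anything in between is treated as out of domain.
    confident = positive_probability > 0.7 or positive_probability <= 0.3
    return similar and confident
```

For instance, feature sets {1, 2, 3} and {2, 3, 4} share two of four distinct features, giving a coefficient of 0.5, which satisfies the 30% requirement.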
  • the minimal diameter (DiamMin) of a conformer is defined as the minimum distance between two parallel planes circumscribing the molecule, and the maximum diameter (DiamMax) is defined as the diameter of the smallest sphere circumscribing the molecule (Dimitrov et al 2003, Brooke and Cronin 2009). These descriptors were calculated as defined herein.
  • Van der Waals surface area of a molecule can be defined in the usual way as the area of a surface formed by the spheres of van der Waals radii around the atoms of the molecule (see Meyer 1985).
  • the descriptor of Van der Waals surface area (VAN_D_WAALS_SUR) was likewise calculated as defined herein by using the proprietary algorithm implemented in OASIS Database Manager 1 .7.3, commercially available by the Laboratory of Mathematical Chemistry, University of Burgas, Bulgaria (http://oasis-lmc.org), which in turn uses the free software MOPAC to calculate some of its descriptors.
  • Each of these descriptors correlates with hERG blocking. While the value of the descriptors is in using them in combination with others, even when used alone, they show good agreement with hERG blocking on the training set of AZA as can be seen from Table 1 1 below:
  • D E structure is a structural descriptor of reactivity on the nitrogen atoms of a molecule.
  • Other similar descriptors of reactivity on the nitrogen atoms of a molecule may also be useful in the prediction methods according to the present invention.
  • the performance of other descriptors of reactivity on the nitrogen atoms of a compound, calculated according to Examples 3 and 4, is given in Table 12 below.
  • each of these descriptors correlates with hERG blocking. While the value of the descriptors is in using them in combination with others, even when used alone, they show good agreement with hERG blocking on the training set of AZA.
  • Table 12 (column headings): True negatives | False positives | False negatives | True positives | Sensitivity | Specificity
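Sensitivity and specificity follow from the four confusion-matrix counts by their standard definitions. A minimal sketch (the function names are illustrative and the counts in the usage note are made up, not taken from Table 12):

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of experimentally positive compounds predicted positive."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no positive compounds in the set")
    return true_positives / total


def specificity(true_negatives, false_positives):
    """Fraction of experimentally negative compounds predicted negative."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no negative compounds in the set")
    return true_negatives / total
```

For example, with hypothetical counts of 8 true positives and 2 false negatives, `sensitivity(8, 2)` gives 0.8.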
  • a method for developing a predictive model of hERG channel inhibiting activity of chemical substances wherein the predictive model is obtained by training on a set of compounds which is divided based on ionization.
  • the method according to item 1 wherein the predictive model is obtained by training on a set of compounds selected from a group consisting of acids and/or zwitterionic compounds.
  • the predictive model uses one or more atomic descriptors, and/or descriptors derived from atomic descriptors.
  • the predictive model uses one or more conformational descriptors derived from one or more conformers of a chemical compound or structure.
  • the predictive model uses at least one descriptor of the conformer effective cross-sectional diameter.
  • the predictive model uses one or more structural descriptors derived from conformational descriptors.
  • the predictive model uses at least one descriptor of the conformer effective cross-sectional diameter, such as the conformer effective cross-sectional diameter calculated on each conformer of a chemical compound (DiamEff), and/or the maximum conformer effective cross-sectional diameter calculated on all conformers of a chemical compound (MaxDiamEff).
  • the predictive method uses pKa (acidic) as a descriptor.
  • the predictive model uses a combination of descriptors comprising a descriptor of conformer effective diameter and a descriptor of donor (electrophilic) superdelocalizability on nitrogen atoms.
  • the predictive model uses a predictive threshold of pKa (acidic) in the range of about 0 to about 16, such as about 2 to about 8, such as about 4 to about 6, such as about 4, or such as about 5, or such as about 6.
  • the predictive model uses a predictive threshold of the maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure in the range of about 5 Å to 15 Å, such as 9 Å to 11 Å, such as 10 Å to 10.5 Å, or such as in the range of 10.3 Å to 10.4 Å, such as about 10.36 Å.
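Deriving the structure-level descriptor from per-conformer values, and comparing it with a threshold, can be sketched as below. The per-conformer DiamEff values are assumed to be already computed (for example by the OASIS software); the function names are hypothetical, and the default of 10.36 Å is simply the example threshold quoted above.

```python
def max_diam_eff(conformer_diameters):
    """Return MaxDiamEff: the largest DiamEff (in Å) over all conformers."""
    diameters = list(conformer_diameters)
    if not diameters:
        raise ValueError("at least one conformer is required")
    return max(diameters)


def has_conformer_above(conformer_diameters, threshold=10.36):
    """True if at least one conformer has DiamEff strictly above the threshold."""
    return max_diam_eff(conformer_diameters) > threshold
```

Taking the maximum over conformers means a single sufficiently wide conformer is enough to exceed the threshold, regardless of how many narrower conformers the compound has.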
  • the predictive model uses a predictive threshold of pKa (acidic) in the range of about 0 to 16, such as about 2 to 8, and a predictive threshold of maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of all conformers of a chemical compound (D E structure) in the range of about 0.1 a.u./eV to about 0.4 a.u./eV, such as within the range of about 0.2 a.u./eV to about 0.3 a.u./eV, and a predictive threshold of maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure (MaxDiamEff) in the range of about 5 Å to 15 Å, such as 8 Å to 12 Å, such as 9 Å to 11 Å.
  • maximum donor electrophilic
  • MaxDiamEff maximum conformer effective diameter
  • the method comprises the use of a binary classification model.
  • the training set is sorted with respect to hERG channel inhibiting activity.
  • the training set is sorted with respect to hERG channel inhibiting activity measured as IC50.
  • the training set is confined by compounds having channel inhibiting activity IC50 ≤ 10 µM and compounds having IC50 ≥ 40 µM.
  • a negative prediction is associated with a hERG IC50 ≥ 40 µM.
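Labelling a training set for a binary classifier from measured IC50 values can be sketched as follows. The sketch assumes that positives are compounds with IC50 at or below 10 µM, that negatives are compounds with IC50 at or above 40 µM, and that compounds in between are left out of training; the function name and the use of `None` for excluded compounds are illustrative choices.

```python
def label_by_ic50(ic50_um, positive_max=10.0, negative_min=40.0):
    """Return 1 (hERG positive), 0 (hERG negative) or None (excluded).

    ic50_um is the measured hERG IC50 in micromolar.
    """
    if ic50_um <= positive_max:
        return 1
    if ic50_um >= negative_min:
        return 0
    # Intermediate potency: neither class, so not used for training.
    return None
```

Leaving a gap between the two cut-offs keeps borderline measurements out of the training set, which gives the binary classifier two well-separated classes.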
  • classification model is developed by use of a binary decision tree classifier.
  • domain is confined by compounds having at least one acidic ionogenic group and either a) no basic ionogenic groups, or b) pKa (acidic) ⁇ pKa (basic).
  • domain is confined by compounds having a) at least one nitrogen atom and 0.114 a.u./eV ≤ maximum donor (electrophilic) superdelocalizability on nitrogen atoms (D E structure) ≤ 0.317 a.u./eV, or b) no nitrogen atom and one or more of the following conditions: pKa (acidic) ⁇ 5, or there exists no conformer for which the conformer effective diameter is > 10.36 Å.
  • the predictive method is further defined as in items 2 to 43.
  • a computer-assisted method or prediction method further defined as in any one of the preceding items.
  • a) a structural descriptor of maximum conformer effective diameter on all conformers of a chemical compound (MaxDiamEff), and/or
  • b) a structural descriptor of maximum donor (electrophilic) superdelocalizability on nitrogen atoms on all conformers of a chemical compound (D E structure), and/or
  • the prediction method uses a predictive threshold of pKa (acidic) in the range of about 0 to about 16, such as about 2 to about 8, such as about 4 to about 6, such as about 4, or such as about 5, or such as about 6.
  • Cianchetta G, Li Y, Kang J, Rampe D, Fravolini A, Cruciani G, Vaz RJ (2005),
  • Keseru G, Prediction of hERG potassium channel affinity by traditional and hologram QSAR methods, Bioorganic and Medicinal Chemistry Letters, 2003, Volume 13, Issue 16, pp. 2773-2775.
  • Sanguinetti MC, Jiang C, Curran ME, Keating MT (1995), A mechanistic link between an inherited and an acquired cardiac arrhythmia: HERG encodes the IKr potassium channel, Cell, 81(2):299-307.
  • Sanguinetti MC, Tristani-Firouzi M, hERG potassium channels and cardiac arrhythmia, Nature 2006, 440, 463-469.
PCT/EP2014/068360 2013-08-30 2014-08-29 A method for prediction of herg potassium channel inhibition in acidic and zwitterionic compounds WO2015028597A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP13182391.6 2013-08-30
EP13182391 2013-08-30
EP13194399.5 2013-11-26
EP13194399 2013-11-26

Publications (1)

Publication Number Publication Date
WO2015028597A1 (en) 2015-03-05

Family

ID=51454694

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2014/068360 WO2015028597A1 (en) 2013-08-30 2014-08-29 A method for prediction of herg potassium channel inhibition in acidic and zwitterionic compounds

Country Status (1)

Country Link
WO (1) WO2015028597A1 (tr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016201566A1 (en) * 2015-06-17 2016-12-22 Uti Limited Partnership Systems and methods of selecting compounds with reduced risk of cardiotoxicity using herg models
WO2016201575A1 (en) * 2015-06-17 2016-12-22 Uti Limited Partnership Systems and methods for predicting cardiotoxicity of molecular parameters of a compound based on machine learning algorithms
US9822670B2 (en) 2015-03-19 2017-11-21 General Electric Company Power generation system having compressor creating excess air flow and turbo-expander for cooling inlet air
US9828887B2 (en) 2015-03-19 2017-11-28 General Electric Company Power generation system having compressor creating excess air flow and turbo-expander to increase turbine exhaust gas mass flow
US9863284B2 (en) 2015-03-19 2018-01-09 General Electric Company Power generation system having compressor creating excess air flow and cooling fluid injection therefor
WO2018049376A1 (en) * 2016-09-12 2018-03-15 Cornell University Computational systems and methods for improving the accuracy of drug toxicity predictions
US10515715B1 (en) 2019-06-25 2019-12-24 Colgate-Palmolive Company Systems and methods for evaluating compositions

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ALESSIO COI ET AL: "Quantitative Structure-Activity Relationship Models for Predicting Biological Properties, Developed by Combining Structure- and Ligand-Based Approaches: An Application to the Human Ether-a-go-go-Related Gene Potassium Channel Inhibition", CHEMICAL BIOLOGY & DRUG DESIGN, vol. 74, no. 4, 1 October 2009 (2009-10-01), pages 416 - 433, XP055106607, ISSN: 1747-0277, DOI: 10.1111/j.1747-0285.2009.00873.x *
ARONOV ET AL: "Predictive in silico modeling for hERG channel blockers", DRUG DISCOVERY TODAY, ELSEVIER, RAHWAY, NJ, US, vol. 1, no. 2, 15 January 2005 (2005-01-15), pages 149 - 155, XP027685049, ISSN: 1359-6446, [retrieved on 20050115] *
SCHIESARO ANDREA ET AL: "Prediction of hERG Channel Inhibition Using In Silico Techniques", 2011, ION CHANNELS AND THEIR INHIBITORS SPRINGER-VERLAG BERLIN, HEIDELBERGER PLATZ 3, D-14197 BERLIN, GERMANY, PAGE(S) 191-239, ISSN: null, XP009176729 *
TOBITA M ET AL: "A discriminant model constructed by the support vector machine method for HERG potassium channel inhibitors", BIOORGANIC & MEDICINAL CHEMISTRY LETTERS, PERGAMON, AMSTERDAM, NL, vol. 15, no. 11, 2 June 2005 (2005-06-02), pages 2886 - 2890, XP027801109, ISSN: 0960-894X, [retrieved on 20050602] *
WARING ET AL: "A quantitative assessment of hERG liability as a function of lipophilicity", BIOORGANIC & MEDICINAL CHEMISTRY LETTERS, PERGAMON, AMSTERDAM, NL, vol. 17, no. 6, 20 February 2007 (2007-02-20), pages 1759 - 1764, XP005895406, ISSN: 0960-894X, DOI: 10.1016/J.BMCL.2006.12.061 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9822670B2 (en) 2015-03-19 2017-11-21 General Electric Company Power generation system having compressor creating excess air flow and turbo-expander for cooling inlet air
US9828887B2 (en) 2015-03-19 2017-11-28 General Electric Company Power generation system having compressor creating excess air flow and turbo-expander to increase turbine exhaust gas mass flow
US9863284B2 (en) 2015-03-19 2018-01-09 General Electric Company Power generation system having compressor creating excess air flow and cooling fluid injection therefor
WO2016201575A1 (en) * 2015-06-17 2016-12-22 Uti Limited Partnership Systems and methods for predicting cardiotoxicity of molecular parameters of a compound based on machine learning algorithms
WO2016201566A1 (en) * 2015-06-17 2016-12-22 Uti Limited Partnership Systems and methods of selecting compounds with reduced risk of cardiotoxicity using herg models
US11462303B2 (en) 2016-09-12 2022-10-04 Cornell University Computational systems and methods for improving the accuracy of drug toxicity predictions
WO2018049376A1 (en) * 2016-09-12 2018-03-15 Cornell University Computational systems and methods for improving the accuracy of drug toxicity predictions
US10515715B1 (en) 2019-06-25 2019-12-24 Colgate-Palmolive Company Systems and methods for evaluating compositions
US10839942B1 (en) 2019-06-25 2020-11-17 Colgate-Palmolive Company Systems and methods for preparing a product
US10861588B1 (en) 2019-06-25 2020-12-08 Colgate-Palmolive Company Systems and methods for preparing compositions
US11315663B2 (en) 2019-06-25 2022-04-26 Colgate-Palmolive Company Systems and methods for producing personal care products
US11342049B2 (en) 2019-06-25 2022-05-24 Colgate-Palmolive Company Systems and methods for preparing a product
US10839941B1 (en) 2019-06-25 2020-11-17 Colgate-Palmolive Company Systems and methods for evaluating compositions
US11728012B2 (en) 2019-06-25 2023-08-15 Colgate-Palmolive Company Systems and methods for preparing a product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14758365

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.06.2016)

122 Ep: pct application non-entry in european phase

Ref document number: 14758365

Country of ref document: EP

Kind code of ref document: A1