
An Expert System for Chemical Structure Elucidation

Sean Walker COMP 4200 November 13, 2007

Introduction
I will be discussing an expert system developed to determine the chemical structure of an unknown compound (structure elucidation)
The expert system is implemented on a blackboard

Introduction
Motivation
Structure elucidation is a fundamental component of organic chemistry
It requires a wide range of expertise
Each elucidation technique has its own unique vocabulary that needs to be mastered

An expert system can be used to simplify this process

Introduction
Outline

Outline of presentation:
1) Fundamentals of blackboard systems
2) The expertise being modeled: general spectroscopic techniques
3) Description of the expert system

Blackboard Systems

"Metaphorically, we can think of a set of workers, all looking at the same blackboard: each is able to read everything that is on it and to judge when he has something worthwhile to add to it." (Newell, 1969)

Blackboard Systems
A set of experts independently modify solution elements on a central database to produce a complete solution
The experts communicate solely through their contributions to the central database
Three major components:
1) a globally accessible database (the blackboard)
2) a set of knowledge sources (the experts)
3) a control mechanism (the scheduler)

Blackboard Systems
The Blackboard
The blackboard is structured as an abstraction hierarchy
Problems can be solved from different points by different knowledge sources
Items on the blackboard are called entries
Entries on the same level or on different levels of the hierarchy are linked
Linked entries constitute a potential solution
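The entry-and-link structure described above can be sketched in a few lines of Python. This is only an illustration: the class names, field names and the `linked_group` helper are assumptions made for the sketch and are not taken from the system being presented.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, so linked entries can refer to each other
class Entry:
    """One item on the blackboard (field names are illustrative)."""
    level: int                                  # position in the abstraction hierarchy
    content: str                                # the solution element itself
    links: list = field(default_factory=list)   # linked entries, on any level


class Blackboard:
    def __init__(self):
        self.entries = []

    def add(self, level, content):
        entry = Entry(level, content)
        self.entries.append(entry)
        return entry

    def link(self, a, b):
        # Links are stored in both directions.
        a.links.append(b)
        b.links.append(a)

    def linked_group(self, start):
        """Collect every entry reachable from start by following links."""
        seen, stack = set(), [start]
        while stack:
            entry = stack.pop()
            if entry not in seen:
                seen.add(entry)
                stack.extend(entry.links)
        return seen
```

A group returned by `linked_group` plays the role of a potential solution: it gathers entries from several levels that have been tied together.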

Blackboard Systems
The Knowledge Sources

Knowledge sources are structured as condition-action pairs
The condition component monitors the blackboard for any changes
The action component makes changes to the blackboard when the condition is satisfied
When the condition is satisfied, the knowledge source is triggered and the scheduler decides whether the knowledge source will execute its action

Blackboard Systems
The Scheduler
One or more problem solving strategies are implemented
The scheduler examines the current state of the blackboard and decides which triggered knowledge source to execute, based on the problem solving strategy in place
The scheduler can abandon a strategy and adopt a new one, or ignore a strategy altogether, in order to pursue the most promising solution
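The interplay of condition-action pairs and the scheduler described above can be sketched as a simple control loop. Everything here is hypothetical: the `KnowledgeSource` class, the fixed-priority strategy, the dictionary used as a blackboard and the two toy experts are assumptions chosen to keep the example small, not the design of the system being presented.

```python
class KnowledgeSource:
    def __init__(self, name, condition, action, priority=0):
        self.name = name
        self.condition = condition   # callable(blackboard) -> bool
        self.action = action         # callable(blackboard) -> None
        self.priority = priority     # used by the toy strategy below


def run(blackboard, sources, max_cycles=100):
    """Each cycle, execute the highest-priority triggered knowledge source."""
    trace = []
    for _ in range(max_cycles):
        triggered = [ks for ks in sources if ks.condition(blackboard)]
        if not triggered:
            break                    # nothing left to contribute
        chosen = max(triggered, key=lambda ks: ks.priority)
        chosen.action(blackboard)
        trace.append(chosen.name)
    return trace


# Toy blackboard: raw spectra, derived fragments, and a final structure.
board = {"spectra": {"ir": [1750]}, "fragments": [], "structure": None}


def ir_ready(bb):
    return "ir" in bb["spectra"] and not bb["fragments"]


def ir_act(bb):
    bb["fragments"].append("C=O")


def can_assemble(bb):
    return bool(bb["fragments"]) and bb["structure"] is None


def assemble(bb):
    bb["structure"] = "+".join(bb["fragments"])


sources = [
    KnowledgeSource("assembler", can_assemble, assemble, priority=1),
    KnowledgeSource("ir_expert", ir_ready, ir_act, priority=2),
]
trace = run(board, sources)
```

The experts never call each other. The assembler becomes triggered only because the IR expert changed the blackboard, which is the sole channel of communication.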

Structure Elucidation
Modern structure elucidation is done using spectroscopy
In absorption spectroscopy, a sample of the unknown is irradiated with light of a given frequency and the absorption by the compound is measured
The resulting data is analyzed by an expert to obtain information about the structure of the unknown
The information collected from each spectrum is integrated to determine the complete structure

Spectroscopy
The Electromagnetic Spectrum

Infrared Spectroscopy
Involves the absorption of light in the infrared region of the electromagnetic spectrum
Used primarily to determine what functional groups are present in a molecule
[Figure: example chemical structures]

Infrared Spectroscopy
The broad peak at around 3000 cm-1 indicates the presence of a hydroxyl group (OH)
The strong, sharp peak at around 1750 cm-1 indicates the presence of a carbonyl group
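The reading of the two peaks above can be expressed as a small rule table. This is a minimal sketch: the wavenumber ranges are illustrative windows placed around the two values quoted on the slide, and the function and table names are assumptions, not part of the system being presented.

```python
# (low cm^-1, high cm^-1, required peak shape, functional group)
# The ranges are illustrative assumptions built around the slide's two values.
IR_RULES = [
    (2500, 3600, "broad", "hydroxyl (OH)"),
    (1650, 1800, "sharp", "carbonyl (C=O)"),
]


def functional_groups(peaks):
    """peaks: list of (wavenumber in cm^-1, shape) tuples from an IR spectrum."""
    found = []
    for wavenumber, shape in peaks:
        for low, high, rule_shape, group in IR_RULES:
            matches = low <= wavenumber <= high and shape == rule_shape
            if matches and group not in found:
                found.append(group)
    return found
```

Calling `functional_groups([(3000, "broad"), (1750, "sharp")])` reports both groups, while a sharp peak at 3000 cm-1 matches neither rule.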

UV Spectroscopy
Involves the absorption of light in the ultraviolet region of the electromagnetic spectrum
Used to determine the level of conjugation in the unknown
Conjugation is alternating single and double bonds
[Figure: chemical structure]

UV spectroscopy is not very useful in structure elucidation

Proton NMR
Contains information about the hydrogens in the molecule
Three key aspects:
1) chemical shift: the type of hydrogen
2) integration: the ratio of different types of hydrogens
3) splitting: nearest neighbour relationship

Can be used to identify the presence of certain functional groups
Used primarily to determine how the different functional groups present fit together (the connectivity)

Proton NMR
The peak at around 10 ppm indicates the presence of an aldehyde
[Figure: chemical structure]

The peak at 2.6 ppm is split into 4 peaks (a quartet), indicating that these hydrogens are adjacent to a carbon bearing 3 hydrogens
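The quartet reading above follows the first-order n + 1 rule: a signal with n equivalent neighbouring hydrogens is split into n + 1 peaks. A minimal sketch of that arithmetic, with function names chosen for illustration only:

```python
MULTIPLET_NAMES = {
    1: "singlet", 2: "doublet", 3: "triplet", 4: "quartet",
    5: "quintet", 6: "sextet", 7: "septet",
}


def multiplicity(n_neighbours):
    """Name the pattern produced by n equivalent neighbouring hydrogens."""
    peaks = n_neighbours + 1
    return MULTIPLET_NAMES.get(peaks, f"{peaks}-line multiplet")


def neighbouring_hydrogens(peak_count):
    """Invert the n + 1 rule: k peaks imply k - 1 neighbouring hydrogens."""
    if peak_count < 1:
        raise ValueError("a signal has at least one peak")
    return peak_count - 1
```

Here `neighbouring_hydrogens(4)` gives 3, matching the slide's conclusion that the quartet sits next to a carbon with 3 hydrogens. The rule is a first-order approximation and can fail for signals that lie close together.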

Carbon-13 NMR
Contains information about the carbons in the molecule
Three key aspects:
1) chemical shift: the type of carbon
2) splitting: the number of hydrogens bonded to each carbon
3) number of unique carbons present

Used to determine connectivity

Carbon-13 NMR
The peak at 190 ppm indicates the presence of a carbonyl (C=O)
There are 7 peaks in total, indicating that there are only 7 unique carbons in the molecule

Mass Spectroscopy
Mass spectroscopy is used to determine the molecular formula of the unknown compound
Mass spectroscopy data that provides structural information tends to be unreliable
It is therefore used only to verify a possible structure, or when the other spectral techniques are unsuccessful

Structure Elucidation
Applicability of a Blackboard Architecture
Each type of spectroscopy is unique
A human expert will often analyze a set of spectra as a whole, selectively determining which spectral information to utilize at a given time
The blackboard architecture is ideal for this approach
The blackboard architecture also allows for new experts to be added (new spectroscopic techniques)

The Expert System


The Blackboard
An expert system implemented on a distributed blackboard has been developed to determine the structure of a chemical compound
A sequential implementation of a blackboard would allow only one expert to access the blackboard at a time
In a distributed system, experts can access different sections of the blackboard at the same time

The Expert System


The Blackboard

The hierarchy of the blackboard is based on the complexity of the structures being produced
Low level, basic structures occupy a certain level of the blackboard while more complicated structures occupy a different level

The Expert System


The Experts

There are two main types of experts:


1) Structure generation routines
2) Spectroscopy experts

Structure Generation Routines


Storing Structures
Ideally every possible chemical structure could be stored, but this is not feasible
Even a simple formula such as C23H48 has 5,731,580 structural isomers

Instead, a set of substructures (components) is stored such that any possible structure can be formed from a combination of these components
There are 630 total components
Components are classified as primary, secondary or tertiary components

Structure Generation Routines


Types of Components
1) Primary Components:
Primary components are the most basic components for constructing organic molecules (CH3, CH2, CH, C, CO, OH, O, NH2, NH, N, SH, S, F, Cl, Br, I)

2) Secondary Components:
Secondary components are combinations of primary components There are 86 secondary components

3) Tertiary Components:
Tertiary components are secondary components with a restriction on what the component can bond to

Structure Generation Routines


The structure generation routines produce sets of primary, secondary or tertiary components based on input data
The sets can be further pruned using spectral information
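The generate-then-prune idea above can be sketched by enumerating every combination of primary components whose atoms add up to a molecular formula. This is a simplified illustration under stated assumptions: it covers only the carbon, hydrogen and oxygen components, it ignores valence and any other rules the real routines apply, and the names `PRIMARY`, `component_sets` and `prune` are hypothetical.

```python
# Atom counts (C, H, O) for the C/H/O primary components.
PRIMARY = {
    "CH3": (1, 3, 0), "CH2": (1, 2, 0), "CH": (1, 1, 0), "C": (1, 0, 0),
    "CO": (1, 0, 1), "OH": (0, 1, 1), "O": (0, 0, 1),
}


def component_sets(formula):
    """Return every multiset of components whose atoms sum to formula (C, H, O)."""
    names = list(PRIMARY)
    results = []

    def extend(index, remaining, chosen):
        if index == len(names):
            if remaining == (0, 0, 0):
                results.append(dict(chosen))
            return
        name = names[index]
        atoms = PRIMARY[name]
        count, left = 0, remaining
        # Try 0, 1, 2, ... copies until some atom count would go negative.
        while all(r >= 0 for r in left):
            if count:
                chosen[name] = count
            extend(index + 1, left, chosen)
            count += 1
            left = tuple(r - a for r, a in zip(left, atoms))
        chosen.pop(name, None)

    extend(0, tuple(formula), {})
    return results


def prune(sets, required):
    """Keep only sets containing at least the required count of each component."""
    return [s for s in sets
            if all(s.get(name, 0) >= n for name, n in required.items())]
```

For example, `prune(component_sets((7, 12, 4)), {"CO": 2, "OH": 2})` keeps only the sets for C7H12O4 that contain two carbonyls and two hydroxyls, which is the kind of restriction spectral evidence could supply. Because this sketch omits the system's own generation rules, its set counts should not be expected to match those of the real system.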

Spectroscopy Experts
There is an expert for each type of spectroscopy:
1) Infrared Expert
2) Ultraviolet Expert
3) Proton NMR Expert
4) Carbon-13 NMR Expert
5) Mass Spectroscopy Expert

Spectroscopy Experts
The data contained in a spectrum may be unreliable or ambiguous
e.g. in a proton NMR spectrum, if the difference in chemical shift between two hydrogens is < 1 then the splitting observed may be inaccurate

Heuristic rules are used to handle this ambiguity
Certainty factors are attached to each conclusion drawn from the spectra

Spectroscopy Experts
Each spectral expert translates the data contained in the spectra into molecular fragments
These fragments are placed in an active list, which is used to direct and restrict the structure generation routines
If fragments from different experts conflict, then the fragment with the highest certainty factor is used
The conflicting fragment is placed in an inactive list, which is used in the event that a correct structure is not found using the active list
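The active and inactive lists described above can be sketched as follows. The slides do not say how a conflict is detected, so this example assumes, purely for illustration, that each fragment carries a `site` key and that two fragments conflict when they claim the same site. The `Fragment` class and `resolve` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    site: str          # hypothetical key: the part of the molecule being explained
    structure: str     # the proposed molecular fragment
    certainty: float   # certainty factor attached by the expert
    expert: str        # which spectroscopy expert proposed it


def resolve(fragments):
    """Split fragments into (active, inactive) lists by certainty factor."""
    active = {}
    inactive = []
    for frag in fragments:
        current = active.get(frag.site)
        if current is None:
            active[frag.site] = frag
        elif frag.certainty > current.certainty:
            inactive.append(current)      # demote the weaker claim
            active[frag.site] = frag
        else:
            inactive.append(frag)
    return list(active.values()), inactive
```

The inactive list is kept rather than discarded so that the structure generation routines can fall back on it if the active list does not lead to a correct structure.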

Spectroscopy Experts
The spectroscopy experts are also used to test generated structures for consistency with the spectral information
The system is able to identify when there is not enough information to verify a possible structure

An Example
Formula of unknown: C7H12O4
93 possible sets of primary components are produced
Using these primary sets, 497 sets of secondary components are possible
The number of sets of secondary components can be decreased if the primary component sets are pruned using spectral data

An Example
After pruning the sets of primary components only one possible set remains:
Set contains 2CH3, 2C=O, 2OH, 1C and 2CH2
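As a quick sanity check on the set above, the atoms in its components can be tallied and compared with the formula C7H12O4. The helper names below are assumptions made for this sketch, and the parser handles only simple component strings such as those listed.

```python
import re
from collections import Counter


def atoms(component):
    """Count atoms in a simple component string such as 'CH3' or 'C=O'."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", component):
        counts[symbol] += int(number) if number else 1
    return counts


def total_atoms(component_set):
    """Sum atom counts over a {component: multiplicity} mapping."""
    total = Counter()
    for component, multiplicity in component_set.items():
        for symbol, n in atoms(component).items():
            total[symbol] += n * multiplicity
    return total


# The single set remaining after pruning, as listed on the slide.
pruned_set = {"CH3": 2, "C=O": 2, "OH": 2, "C": 1, "CH2": 2}
```

The tally gives 7 carbons (2 + 2 + 1 + 2), 12 hydrogens (6 + 2 + 4) and 4 oxygens (2 + 2), which agrees with the formula of the unknown.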
[Figure: chemical structure drawings]

Conclusion
Determining the chemical structure of an unknown is an important part of organic chemistry
Expert system technology can be applied to this domain
A blackboard architecture is especially well suited to this task

