
2010 IEEE International Conference on Granular Computing

Knowledge Representation and Expert Systems for Mineral Processing using Infobright
Alberto Rui Frutuoso Barroso, Natural Resources Engineering, School of Engineering, Laurentian University, Sudbury, Canada, ar frutuosobarroso@laurentian.ca
Greg Baiden, Penguin Automated Systems Inc, School of Engineering, Laurentian University, Sudbury, Canada, gbaiden@laurentian.ca
Julia Johnson, Department of Mathematics and Computer Science, Laurentian University, Sudbury, Canada, jjohnson@cs.laurentian.ca

Abstract: Open source tools for Knowledge Representation in databases and the implementation of a real time expert system for mineral processing operations (size reduction and enrichment) are discussed. The use of a column-oriented database system (Infobright IEE) to store quantitative data from sensors that measure feed size distribution, feed rate, aeration rate, pulp density, pH and temperature allows low latency database query responses and real time process control and analysis. Qualitative metadata can be generated with the use of mathematical process models (simulation outputs, reduction equations, transforms), and from the natural language analysis of process data (reagents and ore mineralogy). The toolkits Wordnet and the Natural Language Toolkit (NLTK) were used for metadata generation, processing qualitative text information present in process databases, and for generating data for subsequent inference engine rule checking. We took advantage of the power and ease of the programming language Python to implement a framework for fuzzy and rough set rules generation, and to create an on-line-analytical-processing (OLAP) system for reporting production process parameters.

Keywords: Expert Systems; Artificial Intelligence; Infobright; CLIPS; Pyke; Rough Sets; Fuzzy Sets; OLAP cubes

978-0-7695-4161-7/10 $26.00 © 2010 IEEE. DOI 10.1109/GrC.2010.133

I. INTRODUCTION
In a real-time expert system used in mineral processing operations (grinding, flotation, cyclone separation), multiple types of input data need to be processed and used to control equipment on the mill production floor. Grinding and flotation units have vibration sensors that detect operational malfunctions, high-speed video cameras and processing units that give dimensional values of the rock particles exiting the grinding circuits, gamma ray devices that give density values of the liquid-solid mixtures (slurries), pH sensors, and high-speed temperature and pressure measurements (pumping circuits, autoclaves, etc.). These are examples of quantitative data. In contrast, qualitative data are present in local databases, such as the characteristics of the froth products used, the type of rods (or balls) used in the grinders, parameters adjusted with manual procedures, and operators' comments in the form of short natural language (English) texts containing terse domain-specific abbreviations.

There is evidence [1] that fuzzy set decisions help to process uncertain quantitative data from the equipment sensors, with the option of implementing them in local programmable logic controllers (PLCs) or in a centralized system using OPC data links. In a previous application [2], the qualitative data used to populate the expert system were extracted from local databases, transformed, and loaded (ETL operations) into the knowledge database. In conjunction with [3], we conclude that a qualitative knowledge database for mineral processing can be established only after removing the imprecise data that emerge from the knowledge acquisition phase, which allows subsequent efficient searches by the inference engine (Figure 1, from [4]). Typically the inference engine will parse statements, assign degrees of belief, examine and fire rules, use customized search strategies, provide explanations and justifications, communicate these to users and external programs, and process the problem solving results [4]. NLTK and Wordnet are versatile tools for discarding redundant information from the data before storing it in the knowledge database. They are discussed here as precursors of the rule-based representation, semantic networks, and frames that are the three main methods used for knowledge representation in intelligent decision support systems [5].

Figure 1. Expert System Components
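The way an inference engine can assign a degree of belief to a rule driven by uncertain sensor readings can be sketched as follows. This is a minimal illustration, assuming triangular membership functions and the minimum operator for the fuzzy AND; the breakpoints, the function names, and the "reduce lime" action are hypothetical and are not taken from the plant or from [1].

```python
def triangular(x, low, peak, high):
    """Degree of membership of x in a triangular fuzzy set."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


# Hypothetical fuzzy sets for two flotation sensors (illustrative breakpoints).
def ph_is_high(ph):
    return triangular(ph, 9.0, 11.0, 13.0)


def density_is_low(density):
    return triangular(density, 1.0, 1.25, 1.5)


def fire_rule(ph, density):
    """IF pH is high AND pulp density is low THEN reduce the lime addition.

    The degree of belief is the minimum of the antecedents (fuzzy AND).
    """
    belief = min(ph_is_high(ph), density_is_low(density))
    return {"action": "reduce_lime", "belief": belief}


result = fire_rule(ph=10.5, density=1.375)
print(result)
```

A belief of zero means the rule does not fire at all; intermediate values let the controller scale its response instead of switching abruptly.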

II. COLUMN BASED DATABASE MANAGEMENT SYSTEMS
Column based database management systems (DBMS) read and write database content as entire columns, which allows fast searches in large databases and data warehouses. Various approaches to column based database management systems have been used in different applications, and some solutions have been tuned to specific problems (Netezza Skimmer [6]). Hybrid software-hardware solutions combine a column based architecture with the performance of an SQL query processor implemented in a field programmable gate array (FPGA). Kickfire [7] and Xtremedata [8] are examples of analytic appliances with capacities of up to 10 petabytes and query performance improvements of up to 100x (10x minimum).

A. Infobright system
In contrast with FPGA-based hardware, Infobright [9] is a high performance analytic software system designed to handle specific queries on large data sets. Infobright technology [10] combines a column-oriented database with a Knowledge Grid architecture [11] to deliver low waiting times in data analysis. The data are partitioned, and the physical data structures are self-managing, which eliminates the need for standard database indexes. Infobright scales to 50 terabytes on a single server, and its 10:1 (up to 40:1) data compression significantly reduces the storage needed for the database while delivering rapid responses to complex queries. Infobright supports an extensive set of APIs, so a mineral processing engineer is likely to find a preferred one among them: C, C++, C#, Borland Delphi (via dbExpress), Eiffel, SmallTalk, Java (with a native Java driver implementation), Lisp, Perl, PHP, Python, Ruby, REALbasic, FreeBasic, and Tcl.
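One way to picture how partitioned columns with summary statistics can stand in for standard indexes is sketched below. It assumes a column is cut into fixed-size packs and that each pack keeps its minimum and maximum; the pack size, class name, and sample pH values are illustrative assumptions, and the code is not Infobright's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pack:
    values: List[float]  # the stored column values of this pack
    lo: float            # smallest value in the pack
    hi: float            # largest value in the pack


def build_packs(column, pack_size):
    """Cut a column into packs and record min/max statistics for each."""
    packs = []
    for start in range(0, len(column), pack_size):
        chunk = column[start:start + pack_size]
        packs.append(Pack(chunk, min(chunk), max(chunk)))
    return packs


def count_greater_than(packs, threshold):
    """Count values above threshold, opening only packs that might hold hits."""
    total, opened = 0, 0
    for pack in packs:
        if pack.hi <= threshold:   # no value can qualify: skip the pack
            continue
        if pack.lo > threshold:    # every value qualifies: use statistics only
            total += len(pack.values)
            continue
        opened += 1                # mixed pack: its values must be read
        total += sum(1 for v in pack.values if v > threshold)
    return total, opened


ph_column = [7.1, 7.2, 7.0, 7.3, 9.8, 10.1, 10.4, 9.9, 11.2, 11.5, 11.1, 11.3]
packs = build_packs(ph_column, pack_size=4)
hits, opened = count_greater_than(packs, threshold=10.0)
print(hits, opened)
```

In this toy column only one of the three packs has to be read to answer the query; the other two are resolved from their statistics alone.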
Infobright supports ANSI SQL-92 with some SQL-99 extensions, standard database interfaces (including ODBC, JDBC, and native connections), and 500 database users with up to 32 concurrent queries (depending on the number of CPU cores and the amount of memory), and it admits a variety of schema designs. Two versions are available: a GPL2-licensed, open source Community Edition (ICE) and a commercially licensed Enterprise Edition (IEE). ICE, being a self-contained system on PCs, was convenient for initial implementation and testing, but its limited data types proved a deficiency for the application at hand. A complete comparison matrix is provided on the IEE/ICE site. In summary, it is faster to migrate a MySQL database to IEE than to ICE. Infobright Enterprise Edition (IEE) was obtained for use in this project and others at Laurentian University on the basis of a special academic promotional offer. It was installed on an 8-core server with 8 gigabytes of RAM running the Debian Linux operating system. IEE's MySQL pluggable storage engine architecture allowed the database server to be accessed using an SQL client running on Windows.

III. EXTRACT, TRANSFORM AND LOAD TOOLS
Extract, transform, and load (ETL) are functions used to populate databases. They typically consist of extracting data from outside sources (files or OPC servers), transforming it (deleting, filtering, etc.), and loading it into the target database [10].

A. How will ETL be used for mineral processing data?
ETL operations are needed for equipment operators' comments on error acknowledgments and for log files generated by programmable logic controller (PLC) systems. The PLC that controls a mine hoist or a conveyor transporting the feed (ore) to the mill is a good example of a source of logged parameters, in this case the transported weight, velocity, number of trips, downtime, MTBF, etc.

B. AWK scripts
AWK is a programming language designed for processing text-based data, either in files or data streams, and was created at Bell Labs in the 1970s. The name AWK is derived from the family names of its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. In mineral processing, AWK scripts are used to extract numbers and values from text log files generated by production equipment.

C. Palo ETL server
The key features of Palo are ETL and cube technology. The open source Palo ETL Server 3.0 [12] is extract, transform, and load software designed for importing and exporting large quantities of data to and from Palo databases. Data are extracted from heterogeneous sources, and master and transaction data are transformed and loaded into Palo models, as illustrated by the vertical arrows in Figure 2. The Palo ETL Server allows automatic data imports. Established relational databases can be connected as data sources via a standardized interface. This includes the Infobright databases used in the expert system framework pictured later in this paper.
Complex transformations and aggregations, for example models for cyclone circuits and grinder loops, can be represented within a Palo model. Palo ETL Server 3.0 can be operated from the command line and, more conveniently, through the ETL web client. Palo uses online analytical processing (OLAP) cube technology for its data structure, so data can be manipulated and analyzed from multiple perspectives. Arranging data into cubes overcomes a limitation of relational databases that makes them unsuitable for near-instantaneous analysis and display. We propose to couple the compression, and the resulting speed, of column-oriented relational databases with the modeling of instantaneous phenomena in mineral processing applications that Palo cube technology affords by transforming between relational and OLAP databases.
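The idea of arranging relational rows into a cube that can be read from several perspectives can be sketched in a few lines. This is a minimal illustration, assuming two dimensions (circuit and shift) and one additive measure (tonnes processed); the rows and names are hypothetical, and the code does not use the Palo API.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (circuit, shift, tonnes processed).
facts = [
    ("grinding", "day", 120.0),
    ("grinding", "night", 100.0),
    ("flotation", "day", 80.0),
    ("flotation", "night", 60.0),
]
DIMENSIONS = ("circuit", "shift")


def build_cube(rows):
    """Pre-aggregate the measure for every combination of dimensions.

    A cell key holds one value per dimension; '*' stands for 'all values'.
    """
    cube = defaultdict(float)
    n = len(DIMENSIONS)
    for *dims, measure in rows:
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(dims[i] if i in kept else "*" for i in range(n))
                cube[key] += measure
    return dict(cube)


cube = build_cube(facts)
print(cube[("grinding", "*")])  # one circuit, all shifts
print(cube[("*", "night")])     # all circuits, one shift
print(cube[("*", "*")])         # grand total
```

Because every roll-up is computed once at load time, a report only performs a dictionary lookup, which is the property that makes cubes attractive for near-instantaneous display.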

Figure 2. Jedox Palo ETL Server Architecture [12]

IV. METADATA GENERATION
We present software for use in complex mineral processing projects whose goal is maximum production efficiency. An efficient expert system reduces the time needed to execute optimization cycles. The magnitude of the project dictates that some areas are not yet developed, and metadata generation is one of them. It suffices here to provide one example of metadata generation from a previous project. The Dublin Core metadata element set, ISO 15836:2009 (ISO, 2009), illustrates the need for well structured metadata. The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements (Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights).

A. DCMES used for mineral processing metadata
Only at the highest level (level 1) of the DCMES standard can we normalize process metadata. The layered architecture is diagrammed at http://dublincore.org/metadata-basics/. At level 1, interoperability among applications sharing metadata is based on a shared vocabulary. Participants within an application environment agree upon the terms to use in their metadata and on how those terms are defined. Interoperability with the rest of the world outside of the implementation environment is generally not a priority. Most existing metadata applications operate at level 1. When metadata is automatically generated from raw data present in log servers, compliance with level 1 of the DCMES architecture is under consideration.

Generating metadata from noisy qualitative data, compatible with the Dublin Core qualifiers, requires a powerful language and toolboxes. Complete Wordnet and NLTK toolboxes are available for Python, which therefore provides an excellent replacement for AWK scripts in the task of text extraction.

B. Wordnet and NLTK code example
The following code is an example of a word synonym search using Wordnet and Python. The objective is to transform the text to a level 1 compliant form.

    ## ExpertS.py ##
    # Requires the third-party packages nltk (with the WordNet corpus) and MySQLdb.
    from nltk.corpus import wordnet as wn
    from MySQLdb import Connect

    def GenMeta(words, palavras):
        """Print the WordNet definition of every sense of the first `palavras` words.

        words: list of word strings to look up.
        """
        for word in words[:palavras]:
            for synset in wn.synsets(word):
                print(synset.definition())  # definition() is a method in NLTK 3

    palavras = 10  # number of words to look up
    conn = Connect(host="localhost", user="root", passwd="123456")
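Level 1 interoperability rests on a shared vocabulary, so the generation step can be pictured as a lookup that maps terse operator wording onto agreed terms and wraps the result in a few DCMES elements. The sketch below assumes a small hand-written synonym table in place of a WordNet lookup, so that it runs without any corpus; the table, the function names, and the sample comment are hypothetical.

```python
import re

# Hypothetical shared vocabulary: variant wording -> agreed term.
SHARED_VOCABULARY = {
    "ph": "pH",
    "temp": "temperature",
    "temperature": "temperature",
    "dens": "pulp density",
    "density": "pulp density",
    "hi": "high",
    "high": "high",
    "lo": "low",
    "low": "low",
}


def normalize(comment):
    """Return the agreed terms found in a free-text comment, in order of appearance."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    terms = []
    for token in tokens:
        term = SHARED_VOCABULARY.get(token)
        if term and term not in terms:
            terms.append(term)
    return terms


def dublin_core_record(comment, creator, date):
    """Wrap a comment in a subset of the 15 Simple Dublin Core elements."""
    return {
        "Title": "Operator comment",
        "Creator": creator,
        "Date": date,
        "Type": "Text",
        "Language": "en",
        "Subject": "; ".join(normalize(comment)),
        "Description": comment,
    }


record = dublin_core_record("Temp hi, dens lo on cell 3", "operator-7", "2010-04-12")
print(record["Subject"])
```

Words outside the table (here "on" and "cell") are simply ignored; in a fuller version a WordNet synonym search could propose candidate terms for them.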

C. Fields trim with AWK
Perl and Python are other examples of powerful text processing facilities, but the simplicity of AWK, a Turing-complete programming language, allows lean code to be written to manipulate text-based data and feed it to our real-time expert system. The following code is a function available in the ETL Datamelt Toolkit. Its objective is to trim characters (e.g. spaces or special characters) from the fields of a raw text file. The code is self-documenting through its internal comments.

    #!/usr/bin/awk -f
    # uwe.geercken@datamelt.com
    # http://datamelt.com

    # function to remove blanks on both sides of the string
    function trim(value) {
        sub(/^ */, "", value)
        sub(/ *$/, "", value)
        return value
    }

    # begin of processing
    BEGIN {
        # setting the file's field separator
        FS = ";"
        OFS = ";"
    }

    # trim every field of each record, then print the record
    {
        for (i = 1; i <= NF; i++) {
            $i = trim($i)
        }
        print
    }
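For comparison, the same field trim can be written with the Python standard library. This is a minimal sketch, assuming semicolon-separated records as in the AWK script; the function name and the sample hoist and conveyor lines are hypothetical.

```python
import csv
import io


def trim_fields(lines, delimiter=";"):
    """Yield each record with blanks stripped from both sides of every field."""
    reader = csv.reader(lines, delimiter=delimiter)
    for record in reader:
        yield [field.strip(" ") for field in record]


# Hypothetical raw log lines with stray blanks around the separators.
raw = io.StringIO("hoist 1 ;  12.5; 3 \nconveyor 2;7.25 ;  0\n")
cleaned = list(trim_fields(raw))
for record in cleaned:
    print(";".join(record))
```

Blanks inside a field, such as the one in "hoist 1", are preserved; only leading and trailing blanks are removed, which matches the behavior of the AWK trim function.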


V. BLENDING INFOBRIGHT WITH OPC SERVERS AND DATA MANAGEMENT SYSTEMS
Any database migration involves two essential tasks: first, export the data from the source database and, second, import the data into the target database. The syntax of the Infobright MySQL export command is SELECT ... INTO OUTFILE ... FROM ... WHERE .... The Infobright analytical engine differs from a standard MySQL DBMS in the following respects:
- Declaration of storage engine type
- Lack of need for indices or partition schemes
- Lack of referential integrity checks
- Removal of constraints
- Minor data type differences (ICE, 2010)
- Supported character sets and collations

A. Integration with OPC client-server systems
Mineral processing functions are typically automated with programmable logic controllers (PLCs), which read analog and discrete values from the sensors wired to their I/Os. OPC servers (Figure 3) can be used to map those values to MySQL databases for post-processing [13].

Figure 3. Dataporter CommServer [14]
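Mapping OPC tag values into an Infobright table touches several of the differences listed in this section: the storage engine is declared explicitly, and the table carries no indexes, keys, or constraints. The sketch below only builds the SQL text and a delimited load file; it does not contact a server. The tag names, table layout, and helper names are hypothetical, and the engine name BRIGHTHOUSE is an assumption that should be checked against the Infobright documentation for the installed version.

```python
import csv
import io

# Hypothetical OPC tag samples: (timestamp, tag name, value).
samples = [
    ("2010-04-12 08:00:00", "FlotationCell3.pH", 10.4),
    ("2010-04-12 08:00:00", "Mill1.FeedRate_tph", 212.0),
]


def create_table_sql(table):
    """Build DDL with no indexes, keys, or constraints and an explicit engine."""
    return (
        f"CREATE TABLE {table} ("
        "sampled_at DATETIME, tag VARCHAR(64), value DOUBLE"
        ") ENGINE=BRIGHTHOUSE"  # assumed engine name, see the note above
    )


def to_load_file(rows, delimiter=";"):
    """Serialize samples as delimited text suitable for a bulk load."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


ddl = create_table_sql("opc_samples")
payload = to_load_file(samples)
print(ddl)
print(payload, end="")
```

Writing samples to a delimited file and bulk loading them mirrors the export/import pattern described for migrations, and avoids row-by-row inserts into a column store.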

B. Integration with PI systems
Invensys Process Engineering Suite (PES), Wonderware, and OSIsoft PI systems (Figure 4) are examples of data management systems used in mineral processing operations (Xstrata, Vale Inco). Xstrata is a major global diversified mining group. Xstrata Nickel's Sudbury operation has approximately 900 employees and produces nickel and copper smelter products. In 2008, the Sudbury Smelter's annual production was 64,906 tons of nickel-in-concentrate, 17,811 tons of copper-in-concentrate, and 2,698 tons of cobalt-in-concentrate.

Key data items, to name just a few, among those mentioned in Xstrata's 2009 Regional, Divisional and Site Sustainability Reports (published April 2010) follow, with their units of measure in parentheses.

Environmental indicators:
- Direct energy use (PJ)
- Total energy use (PJ)
- Total water use (ML)
- Direct and total greenhouse gas emissions (both measured in CO2 equivalent million tons)
- Sulphur dioxide stack emissions (tons)
- Oxides of nitrogen stack emissions (tons)
- Total recycling and reuse of water (ML)
- Land disturbed (hectares)
- Land rehabilitated (hectares)

Production indicators:
- Ferrochrome (kt)
- Vanadium pentoxide (k lbs)
- Ferrovanadium (k kg)
- Thermal coal (mt)
- Coking coal (mt)
- Semi-soft coking (mt)
- Total coal (mt)
- Total mined copper (contained metal) (kt)
- Total mined gold (contained metal) (koz)
- Nickel (kt)
- Ferronickel (kt)
- Cobalt (kt)
- Zinc in concentrate production (kt)
- Zinc metal production (kt)
- Lead in concentrate production (kt)
- Lead metal production (kt)

Such measured quantities are related in different ways to other items of interest, for example:
- Indirect energy consumption by primary source
- Energy saved due to conservation and efficiency improvements
- NOx, SOx, and other significant air emissions by type and weight
- Total water discharge by quality and destination
- Total weight of waste by type and disposal method
- Total number and volume of significant spills
- Weight of transported, imported, exported, or treated hazardous waste
- Identity, size, protected status, and biodiversity value of water bodies and related habitats significantly affected by discharges of water and runoff
- Extent of impact of initiatives to mitigate environmental impacts of products and services
- Percentage of products sold and their packaging materials that are reclaimed by category
- Value and number of significant fines and non-monetary sanctions for non-compliance with environmental laws and regulations
- Extent of environmental impacts of transporting products and other goods and materials
- Total amount of land owned, leased, and managed for production activities or extractive use; total land distributed, total land rehabilitated
- The number/percentage of sites identified as requiring biodiversity management plans, and with plans in place
- Percentage of product(s) derived from secondary materials

The local Xstrata operation uses PI systems for operational, event, and real-time data management, recording quantities that eventually feed into national and global reports by extracting data from sensors positioned in production operations. However, users of PI systems need an aid to help them find and evaluate specific data values emitted from sensors and the relationships among their data types. Sensors may either produce or consume data, sometimes switching roles in response to perceived (consumed) inputs from their environment. It is useful to view sensors within a consumer/producer paradigm because the large body of research into data mining in commercial and business applications can then be brought to bear upon the staggering knowledge management needs of a mineral processing plant.
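The consumer/producer view of sensors can be made concrete with a small ranking routine borrowed from commercial recommenders. This is a minimal sketch, assuming each sensor's consumption history is a list of data item names and that candidate items are ranked by how many other sensors consume them (a simple traffic measure); the sensor names, item names, and ranking rule are hypothetical.

```python
from collections import Counter

# Hypothetical consumption history: sensor -> data items it has consumed.
history = {
    "density_gauge": ["feed_rate", "pump_pressure"],
    "ph_probe": ["feed_rate", "reagent_dose"],
    "froth_camera": ["feed_rate", "reagent_dose", "air_flow"],
    "vibration_sensor": ["pump_pressure"],
}


def recommend(consumer, history, top_n=2):
    """Recommend items the consumer has not used, most widely consumed first."""
    already_used = set(history.get(consumer, []))
    traffic = Counter()
    for sensor, items in history.items():
        if sensor == consumer:
            continue
        for item in set(items):
            if item not in already_used:
                traffic[item] += 1
    # Sort by descending traffic, then by name so that ties are deterministic.
    ranked = sorted(traffic.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


suggestions = recommend("vibration_sensor", history)
print(suggestions)
```

Other ranking signals, such as characteristics of the consumer or its past behavior, could be added as extra terms in the sort key.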

Figure 4. OSIsoft PI systems [15]

Recommender systems connect users with items to consume (purchase, view, listen to, etc.) by associating the content of recommended items, or the opinions of other individuals, with the consuming user's actions or opinions [16]. A sensor in its role as consumer expresses an interest in data from its environment, either through its perceptual instrument or through data received from other sensors. Data items from other sensors that might be of interest to a given sensor (the consumer) are recommended based on the sensors on a site that have the most traffic, on certain characteristics of the consumer (e.g. the strength of its signal), or on a historical analysis of the past behavior of the consumer as a prediction of future producer/consumer interactions.

VI. OPEN SOURCE EXPERT SYSTEMS
Expert systems can be implemented from scratch using high level programming languages (Python, Lua, Ruby) and specialized modules for the inference engine (PyFuzzyLib, Pyke). A more desirable approach for engineers is to use a ready-to-populate open source expert system, two possibilities of which are described in the remainder of this section.

A. CLIPS - C Language Integrated Production System
The first versions of CLIPS [17] were developed at the NASA Johnson Space Center in 1984 in an attempt to eliminate the problems of the LISP language. Today CLIPS is a public domain software tool for developing expert systems that supports three programming paradigms: rule-based, object-oriented, and procedural. CLIPS is written in C for portability and speed and interfaces with Python. Its procedural programming capabilities are similar to those found in languages such as C, Java, Ada, and LISP.

B. D3Web Knowledge System
The d3web system [18] is a Java-based prototyping and development toolkit for distributed knowledge systems. It includes a knowledge modelling environment tool (KnowME), a visual knowledge acquisition tool, and an evaluation and management tool. D3web offers various problem-solving methods, including:
- categorical and heuristic rules
- decision trees and decision tables
- set-covering models
- case-based reasoning

VII. TYING IT ALL TOGETHER
The architecture for a real-time expert system in a mineral processing plant is illustrated in Figure 5. The implementation requires an OPC server that translates the different PLC protocols (Modbus, Profibus, CAN, DeviceNet, etc.) into standard TCP/IP socket connections. The OPC server (Figure 4) will populate the Infobright IEE database as a MySQL-compatible database. A list of freely available OPC servers is given in [13].

The OPC server (orange box) sends and receives digital and analog signals to and from PLC and DCS systems. The OPC-Infobright connection (between the orange and blue boxes) is the recommended scenario for a typical mineral processing automation system. In this connection we have Human Machine Interfaces (HMI) and Distributed Computer Systems (DCS) with supervisory control and data acquisition (SCADA) systems. A customized ETL is required for each of these distinct cases. The tools described in [13] are either pre-built or supplied as ready-to-build source code, or both. Some are evaluation versions and some are downloadable from the Web. The grey boxes represent the expert system, which can be implemented using CLIPS, Python (for the connection with the database) and Pyke (for the inference engine), or D3Web. Light yellow marks a typical implementation of report generation and OLAP cube creation. Manual and automatic process control, which allows distinct control strategies (stochastic, heuristic, deterministic, Monte Carlo methods) to be tested and implemented, is placed in the human machine interface block. In future work, the ETL operations for qualitative data will be implemented using Python toolboxes, while the quantitative data will be processed using normal algorithmic calculus implementations.
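The data path of this architecture, from OPC readings through the database to the expert system, can be sketched end to end. This is a minimal illustration under stated assumptions: the standard-library sqlite3 module stands in for the MySQL-compatible Infobright database, the readings are hard-coded instead of coming from an OPC server, and the tags, thresholds, and actions in the rule table are hypothetical.

```python
import sqlite3

# sqlite3 stands in for the MySQL-compatible Infobright database (assumption).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE samples (tag TEXT, value REAL)")


def store(readings):
    """OPC side: write the latest tag readings into the database."""
    db.executemany("INSERT INTO samples VALUES (?, ?)", readings)


# Hypothetical rules: (tag, predicate on the tag's average, recommended action).
RULES = [
    ("cell3.pH", lambda avg: avg > 11.0, "reduce lime addition"),
    ("mill1.vibration", lambda avg: avg > 5.0, "inspect mill bearings"),
]


def infer():
    """Expert system side: query per-tag averages and fire the matching rules."""
    actions = []
    for tag, predicate, action in RULES:
        (avg,) = db.execute(
            "SELECT AVG(value) FROM samples WHERE tag = ?", (tag,)
        ).fetchone()
        if avg is not None and predicate(avg):
            actions.append(action)
    return actions


store([("cell3.pH", 11.5), ("cell3.pH", 11.0), ("mill1.vibration", 2.0)])
actions = infer()
print(actions)
```

Each rule needs a single-column aggregate, which is the kind of query a column-oriented store answers quickly; in a full system the actions would be passed to the HMI block or written back through the OPC server.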

Figure 5. Real Time Mineral Processing Expert System

VIII. CONCLUSION
In the current mineral processing plants in Northern Ontario, Canada (e.g. those producing copper/nickel matte from sulphidic ores), the level of automation is rising because of increased demands for productivity and efficiency and the need to comply with environmental requirements. Hundreds of analog and discrete signals from sensors and control systems are stored in databases, either directly or through real-time data management infrastructures [15]. Implementing a real-time expert system that makes use of such data requires low latency database read cycles and column-oriented databases tuned for performance and single variable analysis. Most automation communications systems can be integrated with an OPC server to send and receive data from central or distributed systems. The divide and conquer strategy implemented in automation systems with local PID and fuzzy logic control makes way for distributed artificial intelligence through remote expert systems. This paper has addressed mineral processing data needs with a focus on the application of data mining and data warehousing techniques. We have provided a framework in which a variety of software is put together to support the development and implementation of a real-time expert system in a mineral processing plant. The software includes CLIPS, D3web, and Pyke, which are freely downloadable, open source, or available for purchase. It was found that the software for ETL (extract, transform, and load) showed variability among applications, requiring evaluation and selection from a variety of available products. Those products have been enumerated. Additionally, the parameters by which the ETL products should be evaluated have been listed.

REFERENCES
[1] R. K. Brouwer, "Fuzzy rule extraction from feed forward neural network by training a representative fuzzy neural network using gradient descent," International Journal of Uncertainty, pp. 673-698, December 2005.
[2] J. Johnson and G. Johnson, "Infobright for analyzing social sciences data," Comm. Computer and Information Science: Database Theory and Applications, pp. 90-98, March 2009.
[3] P. Vaillancourt and J. Johnson, "Monitoring network aware sensors using BACnet," IJCNS Int. Journal of Computer Science and Network Security, pp. 15-23, November 2006.
[4] T. Yalcin, "Advanced mineral processing," LU-ENGR5207, pp. 144-161, September 2007.
[5] EUNITE roadmap, http://www.eunite.org/eunite/index.htm.
[6] Netezza, "Analytic appliance," http://www.netezza.com.
[7] Kickfire, "Kickfire's SQL chip," http://www.kickfire.com.
[8] XtremeData, "Sql in silicon," http://www.xtremedata.com.
[9] Infobright, "Open source data warehousing," http://www.infobright.com.
[10] D. Slezak, J. Wroblewski, V. Eastwood, and P. Synak, "Brighthouse: An analytic data warehouse for ad-hoc queries," PVLDB, pp. 1337-1345, June 2008.
[11] D. Slezak and M. Kowalski, "Intelligent data granulation on load: Improving Infobright's knowledge grid," Lecture Notes in Computer Science, vol. 5899, pp. 12-25, 2009.
[12] Jedox, "Plan Analyse Report," http://www.jedox.com, 2010.
[13] OPCconnect, http://www.opcconnect.com/freesrv.php.
[14] COMMserver, "Opc servers powered by commserve," http://www.commsvr.com/Products/OPCServer.aspx.
[15] OSIsoft, "PI system," http://www.osisoft.com, April 2010.
[16] J. B. Schafer, "The application of data mining to recommender systems," Encyclopedia of Data Warehousing and Mining, pp. 44-48, Mar. 2006.
[17] CLIPS, "A tool for building expert systems," http://www.clipsrules.sourceforge.net, April 2010.
[18] D3Web Knowledge Systems, http://www.d3web.sourceforge.net, April 2010.

