
Spatial Database Management

SDBMS
An SDBMS is a software module that:
• can work with an underlying DBMS
• supports spatial data models, spatial abstract data types (ADTs), and a query language from which these ADTs are callable
• supports spatial indexing, efficient algorithms for processing spatial operations, and domain-specific rules for query optimization
• Example: Oracle Spatial data cartridge, ESRI SDE
– can work with the Oracle 8i DBMS
– has spatial data types (e.g., polygon) and operations (e.g., overlap) callable from the SQL3 query language
– has spatial indices, e.g., R-trees
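Spatial indices such as R-trees group objects by their minimum bounding rectangles (MBRs) and answer an overlap query in two steps: a cheap filter on MBRs, then an exact refinement on the surviving candidates. The sketch below shows only the MBR filter predicate, applied by a linear scan. It is not an R-tree (a real index would avoid scanning every object), and the names `MBR`, `mbr_of`, and `overlap_candidates` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "MBR") -> bool:
        # Rectangles overlap unless one lies entirely left/right/above/below the other.
        return not (self.xmax < other.xmin or other.xmax < self.xmin or
                    self.ymax < other.ymin or other.ymax < self.ymin)

def mbr_of(polygon: Sequence[Point]) -> MBR:
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return MBR(min(xs), min(ys), max(xs), max(ys))

def overlap_candidates(window: MBR, objects: Dict[str, Sequence[Point]]) -> List[str]:
    """Filter step: ids whose MBR intersects the query window.
    A refinement step (not shown) would run an exact polygon test on these only."""
    return sorted(oid for oid, poly in objects.items()
                  if mbr_of(poly).intersects(window))

if __name__ == "__main__":
    parcels = {
        "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
        "B": [(5, 5), (6, 5), (6, 6)],
        "C": [(1, 1), (4, 1), (4, 3)],
    }
    print(overlap_candidates(MBR(1.5, 1.5, 3, 3), parcels))  # ['A', 'C']
```

Filtering on rectangles is cheap because it needs only four comparisons per object, whereas an exact polygon-overlap test grows with the number of vertices.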

Spatial Database Applications

• GIS applications (maps):
– Urban planning, route optimization, fire or pollution monitoring, utility networks, etc.
• Other applications:
– VLSI design, CAD/CAM, models of the human brain, etc.
• Traditional applications:
– Multidimensional records

SDBMS Example

Consider a spatial dataset with:
• County boundary (dashed white line)
• Census blocks: name, area, population, boundary (dark line)
• Water bodies (dark polygons)
• Satellite imagery (gray-scale pixels)
Storage in an SDBMS table:
create table census_blocks (
name varchar(30),
area float,
population integer,
boundary polygon );

Modeling Spatial Data in Traditional DBMS

• A row in the table census_blocks


• Question: Is a polyline data type supported in a traditional DBMS?
Spatial Data Types and Traditional Databases

• Traditional relational DBMS
– Supports simple data types, e.g., numbers, strings, dates
– Modeling spatial data types is tedious
• Example: Figure 1.4 shows the modeling of a polygon using numbers
– Three new tables: polygon, edge, points
– Note: a polygon is a polyline whose last point and first point are the same
– A simple unit square is represented as 16 rows across 3 tables
– Simple spatial operators, e.g., area(), require joining tables
– Tedious and computationally inefficient
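To make the 16-row count concrete, here is a sketch that stores a unit square across three tables in SQLite, which stands in for a traditional relational DBMS. It then computes the square's area with a multi-way join using the shoelace formula. The table layout is an assumption for illustration and may differ from Figure 1.4. In particular, the `seq` column marking each edge's start point (1) and end point (2) is hypothetical.

```python
import sqlite3

def build_unit_square(conn: sqlite3.Connection) -> None:
    """Store a unit square as polygon -> edge -> point rows (hypothetical layout)."""
    conn.executescript("""
        CREATE TABLE polygon (pid TEXT, edge TEXT);              -- 4 rows
        CREATE TABLE edge    (edge TEXT, seq INTEGER, pt TEXT);  -- 8 rows
        CREATE TABLE point   (pt TEXT, x REAL, y REAL);          -- 4 rows
    """)
    conn.executemany("INSERT INTO point VALUES (?, ?, ?)",
                     [("p1", 0, 0), ("p2", 1, 0), ("p3", 1, 1), ("p4", 0, 1)])
    # Edges run counter-clockwise; seq 1 = start point, seq 2 = end point.
    for name, start, end in [("e1", "p1", "p2"), ("e2", "p2", "p3"),
                             ("e3", "p3", "p4"), ("e4", "p4", "p1")]:
        conn.execute("INSERT INTO polygon VALUES ('square', ?)", (name,))
        conn.execute("INSERT INTO edge VALUES (?, 1, ?)", (name, start))
        conn.execute("INSERT INTO edge VALUES (?, 2, ?)", (name, end))
    conn.commit()

# Shoelace formula summed over directed edges; even area() needs a five-way join.
AREA_SQL = """
    SELECT ABS(SUM(s.x * t.y - t.x * s.y)) / 2.0
    FROM polygon p
    JOIN edge  es ON es.edge = p.edge AND es.seq = 1
    JOIN point s  ON s.pt = es.pt
    JOIN edge  et ON et.edge = p.edge AND et.seq = 2
    JOIN point t  ON t.pt = et.pt
    WHERE p.pid = ?
"""

def row_count(conn: sqlite3.Connection) -> int:
    return sum(conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
               for t in ("polygon", "edge", "point"))

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_unit_square(conn)
    print(row_count(conn))                                    # 16
    print(conn.execute(AREA_SQL, ("square",)).fetchone()[0])  # 1.0
```

A DBMS with a native polygon type would store the same square in one row and evaluate `area()` without any join.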
Question. Name post-relational database management systems which facilitate modeling
of spatial data types, e.g. polygon.

Mapping “census_table” into a Relational Database


Spatial Data Types and Post-relational Databases

• Post-relational DBMS
– Supports user-defined abstract data types
– Spatial data types (e.g., polygon) can be added
• Choice of post-relational DBMS
– Object-oriented (OO) DBMS
– Object-relational (OR) DBMS
• A spatial database is a collection of spatial data types, operators, indices, processing strategies, etc., and can work with many post-relational DBMSs as well as programming languages such as Java, Visual Basic, etc.

Spatial Taxonomy, Data Models


• Spatial taxonomy:
– A multitude of descriptions is available to organize space
– Topology models homeomorphic relationships, e.g., overlap
– Euclidean space models distance and direction in a plane
– Graphs model connectivity, e.g., shortest path
• Spatial data models:
– Rules to identify identifiable objects and properties of space
– The object model helps manage identifiable things, e.g., mountains, cities, land parcels, etc.
– The field model helps manage continuous and amorphous phenomena, e.g., wetlands, satellite imagery, snowfall, etc.
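The Euclidean and graph views of space often work together. The sketch below places a few hypothetical road junctions in a plane and weights each road by the Euclidean distance between its endpoints. It answers a shortest-path query with Dijkstra's algorithm, a standard choice that the notes do not prescribe. The junction names, coordinates, and roads are made up.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical junction coordinates (Euclidean plane) and two-way roads (graph edges).
COORDS: Dict[str, Tuple[float, float]] = {
    "A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 4),
}
ROADS: List[Tuple[str, str]] = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

def euclidean(p, q) -> float:
    return math.hypot(q[0] - p[0], q[1] - p[1])

def direction_deg(p, q) -> float:
    """Direction of q as seen from p, in degrees counter-clockwise from the +x axis."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def build_graph(coords, roads):
    graph = {n: [] for n in coords}
    for u, v in roads:
        w = euclidean(coords[u], coords[v])
        graph[u].append((v, w))
        graph[v].append((u, w))
    return graph

def shortest_path(graph, src, dst):
    """Dijkstra's algorithm; returns (distance, list of nodes) or (inf, [])."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return math.inf, []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]

if __name__ == "__main__":
    g = build_graph(COORDS, ROADS)
    print(shortest_path(g, "A", "D"))  # (8.0, ['A', 'C', 'D'])
```

The route A-C-D (5 + 3) beats A-B-C-D (3 + 4 + 3). Connectivity comes from the graph, and the costs come from Euclidean geometry.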

Spatial Query Language

• Spatial query language
– Spatial data types, e.g., point, linestring, polygon, …
– Spatial operations, e.g., overlap, distance, nearest neighbor, …
– Callable from a query language (e.g., SQL3) of the underlying DBMS
SELECT S.name
FROM Senator S
WHERE S.district.Area() > 300
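The `S.district.Area()` call relies on an object-relational DBMS with a polygon ADT. As a rough stand-in, the sketch below uses `sqlite3.Connection.create_function` to register a Python function as a SQL function in SQLite, with each district boundary stored as JSON text. The `Senator` rows, the JSON encoding, and the SQL function name `area` are assumptions. Plain function-call syntax replaces SQL3's method-call syntax.

```python
import json
import sqlite3

def polygon_area(boundary_json: str) -> float:
    """Shoelace area of a simple polygon given as JSON [[x, y], ...]."""
    pts = json.loads(boundary_json)
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Expose the spatial operation to SQL as area(<column>).
    conn.create_function("area", 1, polygon_area)
    conn.execute("CREATE TABLE Senator (name TEXT, district TEXT)")
    conn.executemany("INSERT INTO Senator VALUES (?, ?)", [
        ("Smith", json.dumps([[0, 0], [20, 0], [20, 20], [0, 20]])),  # area 400
        ("Jones", json.dumps([[0, 0], [10, 0], [10, 10], [0, 10]])),  # area 100
    ])
    return conn

if __name__ == "__main__":
    conn = make_db()
    rows = conn.execute("SELECT S.name FROM Senator S WHERE area(S.district) > 300")
    print([r[0] for r in rows])  # ['Smith']
```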

MULTIMEDIA DBMS
A multimedia database management system (MM-DBMS) is a framework that manages different types of
data potentially represented in a wide diversity of formats on a wide array of media sources.
Like a traditional DBMS, an MM-DBMS should address the following requirements:
Integration
• Data items do not need to be duplicated for different programs
Data independence
• Separate the database and the management from the application programs
Concurrency control
• allows concurrent transactions
Requirements of Multimedia DBMS
Persistence
• Data objects can be saved and re-used by different transactions and program
invocations
Privacy
• Access and authorization control
Integrity control
• Ensures database consistency between transactions
Recovery
• Failures of transactions should not affect the persistent data storage
Query support
• Allows easy querying of multimedia data

Multimedia Document Management


Quality of presentation requirements
• Resolution, reliability, rate
Synchronization requirements
• Temporal, spatial, and logical structure specification
Media processing requirements
• Coloring, enhancements, dubbing, etc.
Security attributes

Software Architecture of a Multimedia Database Management System


Research Issues in Distributed Multimedia Database Management

• Distributed object management
• Efficient distributed query processing techniques
• Network caching techniques for composition of distributed objects
• Adaptive data filtering techniques to satisfy diverse user QoS requirements

MOBILE DATABASE
A mobile database is either a stationary database that a mobile computing device (e.g., a smartphone or PDA) connects to over a mobile network, or a database that is actually stored on the mobile device. The data could be a list of contacts, price information, distance travelled, or any other information.

• Functionality required of mobile DBMSs includes the ability to:

– communicate with a centralized database server through modes such as wireless or Internet access;

– replicate data on the centralized database server and the mobile device;

– synchronize data between the centralized database server and the mobile device;

– capture data from various sources such as the Internet;

– manage/analyze data on the mobile device;

– create customized mobile applications.

• Smart client applications have emerged as the architecture of choice over browser-based wireless
Internet applications, as they enable access to data while the mobile user is disconnected from the
network—wireless or otherwise. This capability is best implemented by incorporating persistent
data storage using a mobile database in your application.

• The main advantage of using a mobile database in your application is offline access to data—in
other words, the ability to read and update data without a network connection. This helps avoid
problems such as dropped connections, low bandwidth, and high latency that are typical on
wireless networks today.
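The offline-then-synchronize pattern can be sketched as follows. While disconnected, the device applies updates to a local replica and logs them. On reconnection, it replays the log against the server and then refreshes the replica. Conflicts are resolved here by a last-writer-wins rule on timestamps, which is only one possible policy. The `Server` and `Device` classes and their fields are hypothetical and do not model the actual protocol of any product (e.g., Mobilink).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (value, timestamp of last write)

@dataclass
class Server:
    rows: Dict[str, Row] = field(default_factory=dict)

    def apply(self, key: str, value: str, ts: int) -> None:
        # Last-writer-wins: keep whichever write carries the newer timestamp.
        current = self.rows.get(key)
        if current is None or ts > current[1]:
            self.rows[key] = (value, ts)

@dataclass
class Device:
    local: Dict[str, Row] = field(default_factory=dict)
    pending: List[Tuple[str, str, int]] = field(default_factory=list)

    def update(self, key: str, value: str, ts: int) -> None:
        """Works offline: write to the local replica and log the change."""
        self.local[key] = (value, ts)
        self.pending.append((key, value, ts))

    def synchronize(self, server: Server) -> None:
        # Upload: replay logged changes on the server.
        for key, value, ts in self.pending:
            server.apply(key, value, ts)
        self.pending.clear()
        # Download: refresh the local replica from the server's current state.
        self.local = dict(server.rows)

if __name__ == "__main__":
    server = Server({"price:widget": ("9.99", 1)})
    phone = Device(local=dict(server.rows))
    phone.update("price:widget", "8.49", ts=5)  # made while offline
    server.apply("price:widget", "9.49", ts=3)  # concurrent change at the server
    phone.synchronize(server)
    print(server.rows["price:widget"])          # ('8.49', 5)
```

Last-writer-wins is simple but silently discards the losing update. Replication products typically also offer other resolution policies.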

PALM (PALM OS)

• Most popular PDA (earlier, 1992)
• 13,000 software applications
• Focused on calendar and scheduling
• Popular with Linux users

POCKET PC (Windows CE)

• Microsoft-backed
• Growing rapidly
• Integrates MS-Windows applications
• Broader focus (data warehouse, etc.)
• Popular with MS-Windows users

Mobile DB Environments

Database Front-End

• C, C++
• Java
• Visual Studio (C++, VB, C#, J#)
• Appforge – Mobile VB

Database Backend

• Sybase’s Ultralite
• Oracle Lite
• MS-Pocket Access
• MS-SQL Server CE
• Pointbase
Sybase

• Market leader (over 60% of the mobile market)
• Mobilink (synchronization server) and Ultralite (mobile database)
• Can synchronize data to databases from different vendors (not tied to Sybase’s Enterprise Edition)
• Ultralite comes with Appforge’s Mobile VB

Oracle Lite – Basics

• Very powerful
• Supports 100% Java development (through JDBC drivers and the database's native support for embedded SQLJ and Java stored procedures)
• Supports programming from any development tool that supports ODBC (Visual Basic, C++, Delphi, and so on)
• Runs on Windows CE (Pocket PC) and Palm OS (Palm)
• Includes Mobile SQL, the mobile equivalent of Oracle's SQL*Plus tool
• Only Oracle DBMS significantly different.
Oracle Lite Architecture
Microsoft’s Mobile Databases

Pocket Access 2002

• ADOCE database access classes
• For smaller database applications that need to operate on a small number of tables
• Pocket Access files are stored with the .cdb extension and are populated from one or more tables of a desktop Access database
• Replication/synchronization is very simple, through ActiveSync
SQL Server 2000 Windows CE Edition (SQL Server CE)

• ADOCE database access classes or OLE DB/CE
• Replication with an enterprise SQL Server data store, as well as advanced database capabilities
• Synchronized through RDA or merge replication (both through IIS)

Choosing a Mobile DB

• Using an MS Access server? -> Pocket Access
• Using MS SQL Server? -> MS SQL Server CE
• Using an Oracle server? -> Oracle Lite
• Multiple platforms -> Sybase
• Multiple platforms, Java development -> Pointbase
• Oracle Lite: most powerful
• Sybase’s Ultralite: small footprint, very flexible
Web-Based DBMS
• In just over 10 years, the WWW has grown
– from nothing to the world's most important and powerful information system,
– with hundreds of millions of users and billions of online documents and
– doubling every few years...
• Many businesses now use web-based information systems (intranets)
• The architecture of the Web was designed to be platform-independent,
– which can significantly lower deployment and training costs
• E-commerce on the web is growing rapidly
– Frequently changing data about products, prices, etc. is better stored in databases than in files
• The web is the primary interface to DBMSs
– Web applications make data available globally

Basics of WWW

• The Web is a very large client-server system
– Connected through routers and switches
– Communicating with the TCP/IP protocol
– With no centralised control
• Servers publish pages at URLs
• Clients request pages by specifying the URLs
• Pages are transferred on the web using the HTTP protocol
• Each HTTP interaction is independent
– No concept of a session
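Because each HTTP request stands alone, web database applications usually layer sessions on top of HTTP. The server issues a random token in a cookie and keeps per-user state keyed by that token. The minimal sketch below uses Python's standard `http.cookies` and `secrets` modules. The cookie name `sid` and the in-memory `SESSIONS` store are assumptions. A real application would persist sessions (e.g., in the database) and set expiry and security attributes.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional, Tuple

SESSIONS: Dict[str, dict] = {}  # server-side state, keyed by session id

def handle_request(cookie_header: Optional[str]) -> Tuple[dict, str]:
    """Return (session state, Set-Cookie header value) for one HTTP request."""
    cookie = SimpleCookie(cookie_header or "")
    sid = cookie["sid"].value if "sid" in cookie else None
    if sid not in SESSIONS:
        # First visit or unknown token: start a new session.
        sid = secrets.token_hex(16)
        SESSIONS[sid] = {"visits": 0}
    SESSIONS[sid]["visits"] += 1
    out = SimpleCookie()
    out["sid"] = sid
    out["sid"]["httponly"] = True  # hide the token from page scripts
    return SESSIONS[sid], out["sid"].OutputString()

if __name__ == "__main__":
    state, set_cookie = handle_request(None)       # first request: no cookie
    browser_cookie = set_cookie.split(";")[0]      # browser echoes "sid=..."
    state2, _ = handle_request(browser_cookie)
    print(state2["visits"])                        # 2
```

The session exists only because the browser sends the token back each time. The HTTP protocol itself still remembers nothing between requests.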

Static and Dynamic Content
• HTML content stored entirely in files is static
• Most web content is dynamic
– needs to vary with time and users
– E.g. Amazon.co.uk
• Dynamic HTML pages need to be generated for every transfer/access
• Dynamic content may come from
– user inputs
– database tables
• Linking databases to the web involves creating HTML pages on the fly using database query results
• We will learn some techniques to generate dynamic HTML pages

Web Server Software: Old Technologies

• Existing Java programs that connect to a DBMS can be extended to generate dynamic HTML using CGI
• CGI = Common Gateway Interface
• CGI is generic and can be used with
– Java, C and other programming languages
– Unix scripts and other scripting languages
• Low-level DB access exploits DB interface libraries such as JDBC

Web Server Software: New Technologies

• There are many new technologies that allow access to DBMSs
• Naturally, each has its own advantages and disadvantages
• Examples
– Microsoft IIS, ASP: JScript / VBScript
– Sun Microsystems Java: JSP, servlets
– Netscape LiveWire: JavaScript
• In this course we use
– PHP code embedded in HTML
– To access MySQL databases
PHP & MySQL

• PHP stands for PHP: Hypertext Preprocessor
– A recursive acronym
• PHP is a scripting language
– Interpreted, not compiled
– Open-source software
• Embedded directly into HTML pages
– Pages are published with the .php extension
• The server executes the embedded PHP code every time the page is requested
• PHP+MySQL is a very popular combination for producing dynamic web pages
• MySQL: an open-source RDBMS

Web Database Architecture with PHP and MySQL

1. The browser issues an HTTP request for a particular web page
2. The web server receives the request, retrieves the file, and passes it to the PHP engine for processing
3. The PHP engine connects to the MySQL server and sends the query
4. The MySQL server receives the query, processes it, and sends the results back to the PHP engine
5. The PHP engine receives the results, prepares the HTML page, and sends it to the web server
6. The web server sends the HTML page to the browser, and the browser displays the page to the user
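The course pairs PHP with MySQL. To keep all examples here in one language, the sketch below walks through the same six steps with a Python WSGI application, with SQLite standing in for MySQL. The `products` table, the `category` query parameter, and the sample rows are hypothetical. The parameterized query and the HTML escaping both bear on the security of web database applications.

```python
import html
import sqlite3
from urllib.parse import parse_qs

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
        ("Widget", "tools", 9.99), ("Gadget", "tools", 19.5), ("Novel", "books", 7.25),
    ])
    return conn

DB = make_db()

def app(environ, start_response):
    # Steps 1-2: the web server hands the HTTP request to the application.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    category = params.get("category", [""])[0]
    # Steps 3-4: send a parameterized query to the database and fetch results.
    rows = DB.execute(
        "SELECT name, price FROM products WHERE category = ? ORDER BY name",
        (category,)).fetchall()
    # Step 5: build the HTML page from the results, escaping data values.
    items = "".join(f"<li>{html.escape(n)}: {p:.2f}</li>" for n, p in rows)
    body = f"<html><body><ul>{items}</ul></body></html>".encode("utf-8")
    # Step 6: return the page; the web server forwards it to the browser.
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it in a browser, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library serves the page at http://localhost:8000/?category=tools.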

Building Web database Applications

• Apply an appropriate software engineering life cycle
– Requirements analysis
– Design
– Implementation
– Testing
• Security of data is very important in web database applications
• Use the MySQL privilege system to control access to data
• User identification and personalization are necessary in web database applications
Data Warehouse
• “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
• Data warehousing: the process of constructing and using data warehouses
• Organized around major subjects, such as customer, product, sales
• Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
• Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process
Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data sources
– relational databases, flat files, online transaction records
• Data cleaning and data integration techniques are applied
– Ensure consistency in naming conventions, encoding structures, attribute measures, etc., among different data sources
– E.g., hotel price: currency, tax, whether breakfast is covered, etc.
– When data is moved to the warehouse, it is converted

Data Warehouse—Time Variant

• The time horizon for the data warehouse is significantly longer than that of operational systems
– Operational database: current-value data
– Data warehouse data: provides information from a historical perspective (e.g., the past 5-10 years)
• Every key structure in the data warehouse
– contains an element of time, explicitly or implicitly
– but the key of operational data may or may not contain a “time element”
Data Warehouse—Non-Volatile

• A physically separate store of data transformed from the operational environment
• Operational update of data does not occur in the data warehouse environment
– Does not require transaction processing, recovery, and concurrency control mechanisms
– Requires only two operations in data accessing: initial loading of data and access of data

Conceptual Modeling of Data Warehouses

• Modeling data warehouses: dimensions and measures
– Star schema: a fact table in the middle connected to a set of dimension tables
– Snowflake schema: a refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
– Fact constellation: multiple fact tables share dimension tables, viewed as a collection of stars, and therefore also called a galaxy schema
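A minimal star schema in SQLite follows: one `sales` fact table whose foreign keys point at the `date_dim`, `product_dim`, and `store_dim` dimension tables. A query then joins the star and aggregates a measure by month and product category. All table and column names and the sample rows are hypothetical. Note that each fact row carries a time key, matching the time-variant property of warehouse data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- Fact table: one row per sale (the grain), a foreign key to every
-- dimension, plus numeric measures.
CREATE TABLE sales (
    date_key    INTEGER REFERENCES date_dim(date_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key   INTEGER REFERENCES store_dim(store_key),
    units       INTEGER,
    revenue     REAL
);
"""

def build_star(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?, ?)", [
        (1, "2024-01-15", "2024-01", 2024), (2, "2024-02-03", "2024-02", 2024)])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", [
        (10, "Laptop", "Computers"), (11, "Mouse", "Accessories")])
    conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?)", [
        (100, "Lyon", "France"), (101, "Oslo", "Norway")])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
        (1, 10, 100, 2, 1800.0), (1, 11, 101, 5, 100.0),
        (2, 10, 101, 1, 950.0), (2, 11, 100, 3, 60.0)])

# Join the fact table to two of its dimensions and aggregate the revenue measure.
STAR_QUERY = """
SELECT d.month, p.category, SUM(f.revenue)
FROM sales f
JOIN date_dim d    ON f.date_key = d.date_key
JOIN product_dim p ON f.product_key = p.product_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    for row in conn.execute(STAR_QUERY):
        print(row)
```

A snowflake variant would split `product_dim` further, for example moving `category` into its own table referenced by a key.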
Typical OLAP Operations

• Roll up (drill-up): summarize data
– by climbing up a hierarchy or by dimension reduction
• Drill down (roll down): reverse of roll-up
– from a higher-level summary to a lower-level summary or detailed data, or introducing new dimensions
• Slice and dice:
– project and select
• Pivot (rotate):
– reorient the cube for visualization, e.g., turn a 3D cube into a series of 2D planes
• Other operations
– drill across: involving (across) more than one fact table
– drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
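The basic OLAP operations can be mimicked on a tiny in-memory cube. In this sketch, each fact is a hypothetical `(city, quarter, product, sales)` tuple. Roll-up aggregates away the dimensions not kept. Slice fixes one dimension to a single value. Dice selects a sub-cube on several dimensions. Pivot swaps the axes of a 2D view. Drill-down is the reverse of roll-up, i.e., a roll-up that keeps more dimensions. Real OLAP servers precompute and index such aggregates; this only illustrates the semantics.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str, int]  # (city, quarter, product, sales)
DIMS = ("city", "quarter", "product")

FACTS: List[Fact] = [
    ("Lyon", "Q1", "PC", 10), ("Lyon", "Q2", "PC", 12),
    ("Lyon", "Q1", "TV", 4),  ("Oslo", "Q1", "PC", 7),
    ("Oslo", "Q2", "TV", 9),
]

def roll_up(facts, keep):
    """Sum sales over every dimension not listed in `keep` (dimension reduction)."""
    idx = [DIMS.index(d) for d in keep]
    totals: Dict[tuple, int] = defaultdict(int)
    for f in facts:
        totals[tuple(f[i] for i in idx)] += f[3]
    return dict(totals)

def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [f for f in facts if f[i] == value]

def dice(facts, **allowed):
    """Dice: keep facts whose values on the named dimensions fall in the given sets."""
    return [f for f in facts
            if all(f[DIMS.index(d)] in vals for d, vals in allowed.items())]

def pivot(view):
    """Pivot: swap rows and columns of a 2D {(row, col): value} view."""
    return {(c, r): v for (r, c), v in view.items()}

if __name__ == "__main__":
    print(roll_up(FACTS, ["city"]))             # roll up to city level
    by_city_q = roll_up(FACTS, ["city", "quarter"])  # drill down to city x quarter
    print(pivot(by_city_q))                     # quarters become the rows
    print(dice(FACTS, city={"Lyon"}, product={"PC"}))
```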

Data Warehouse Design Process

• Top-down, bottom-up approaches or a combination of both
– Top-down: starts with overall design and planning (mature)
– Bottom-up: starts with experiments and prototypes (rapid)
• From a software engineering point of view
– Waterfall: structured and systematic analysis at each step before proceeding to the next
– Spiral: rapid generation of increasingly functional systems, with short turnaround time
• Typical data warehouse design process
– Choose a business process to model, e.g., orders, invoices, etc.
– Choose the grain (atomic level of data) of the business process
– Choose the dimensions that will apply to each fact table record
– Choose the measures that will populate each fact table record

Three Data Warehouse Models

• Enterprise warehouse
– collects all of the information about subjects spanning the entire organization
• Data mart
– a subset of corporate-wide data that is of value to a specific group of users; its scope is confined to specific, selected groups, such as a marketing data mart
– Independent vs. dependent (sourced directly from the warehouse) data marts
• Virtual warehouse
– A set of views over operational databases
– Only some of the possible summary views may be materialized

Data Warehouse Back-End Tools and Utilities

• Data extraction:
– get data from multiple, heterogeneous, and external sources
• Data cleaning:
– detect errors in the data and rectify them when possible
• Data transformation:
– convert data from legacy or host format to warehouse format
• Load:
– sort, summarize, consolidate, compute views, check integrity, and build indices and partitions
• Refresh:
– propagate the updates from the data sources to the warehouse
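The back-end steps can be sketched as a tiny pipeline built on the hotel-price example given under integration. Two hypothetical sources report prices with different field names and currencies. The pipeline extracts their rows, cleans them, transforms them to one warehouse format, and loads them. A refresh would simply rerun the pipeline on changed source rows. The exchange rate, field names, and cleaning rules are made-up assumptions.

```python
from typing import Dict, Iterable, List

# Hypothetical conversion rates into the warehouse currency (EUR); deliberately round, not real.
RATES_TO_EUR = {"EUR": 1.0, "USD": 0.5}

# Two heterogeneous sources with different field names.
SOURCE_A = [{"hotel": " Grand Hotel ", "price_eur": "120"},
            {"hotel": "Sea View", "price_eur": ""}]              # missing price
SOURCE_B = [{"name": "GRAND HOTEL", "amount": 200, "currency": "USD"}]

def extract() -> List[dict]:
    # Map each source's fields onto a common record shape.
    rows = [{"hotel": r["hotel"], "price": r["price_eur"], "currency": "EUR"}
            for r in SOURCE_A]
    rows += [{"hotel": r["name"], "price": r["amount"], "currency": r["currency"]}
             for r in SOURCE_B]
    return rows

def clean(rows: Iterable[dict]) -> List[dict]:
    # Drop rows with a missing price; normalize hotel names so sources agree.
    out = []
    for r in rows:
        if r["price"] in ("", None):
            continue
        out.append({**r, "hotel": r["hotel"].strip().title()})
    return out

def transform(rows: Iterable[dict]) -> List[dict]:
    # Convert every price to the warehouse currency.
    return [{"hotel": r["hotel"],
             "price_eur": round(float(r["price"]) * RATES_TO_EUR[r["currency"]], 2)}
            for r in rows]

def load(warehouse: Dict[str, List[float]], rows: Iterable[dict]) -> None:
    # Consolidate: group converted prices per hotel.
    for r in rows:
        warehouse.setdefault(r["hotel"], []).append(r["price_eur"])

if __name__ == "__main__":
    wh: Dict[str, List[float]] = {}
    load(wh, transform(clean(extract())))
    print(wh)  # {'Grand Hotel': [120.0, 100.0]}
```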
Data Warehouse Usage

• Three kinds of data warehouse applications
– Information processing: supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts, and graphs
– Analytical processing: multidimensional analysis of data warehouse data; supports basic OLAP operations (slice-dice, drilling, pivoting)
– Data mining: knowledge discovery from hidden patterns; supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools
• Differences among the three tasks

Data Mining
• Data mining (knowledge discovery in databases):
– Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases
• Alternative names and their “inside stories”:
– Data mining: a misnomer?
– Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Data Mining — Potential Applications
• Database analysis and decision support
– Market analysis and management
• target marketing, customer relationship management, market basket analysis, cross-selling, market segmentation
– Risk analysis and management
• Forecasting, customer retention, improved underwriting, quality control,
competitive analysis
– Fraud detection and management
• Other Applications
– Text mining (news group, email, documents) and Web analysis.
– Intelligent query answering
Steps of a KDD Process

• Learning the application domain:
– relevant prior knowledge and goals of the application
• Creating a target data set: data selection
• Data cleaning and preprocessing: (may take 60% of effort!)
• Data reduction and transformation:
– Find useful features, dimensionality/variable reduction, invariant representation.
• Choosing functions of data mining
– summarization, classification, regression, association, clustering.
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Pattern evaluation and knowledge presentation
– visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge
Data Mining Functionalities

• Concept description: characterization and discrimination
– Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
• Association (correlation and causality)
– Multi-dimensional vs. single-dimensional association
– age(X, “20..29”) ∧ income(X, “20..29K”) ⇒ buys(X, “PC”) [support = 2%, confidence = 60%]
– contains(T, “computer”) ⇒ contains(T, “software”) [1%, 75%]
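The two bracketed numbers in these rules can be computed directly from transaction data. Support is the fraction of all transactions that contain both sides of the rule. Confidence is the number of transactions containing both sides, divided by the number containing the left-hand side. The baskets below are hypothetical.

```python
from typing import List, Set, Tuple

def support_confidence(transactions: List[Set[str]],
                       lhs: Set[str], rhs: Set[str]) -> Tuple[float, float]:
    """Return (support, confidence) of the rule lhs => rhs."""
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)           # t contains lhs
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)  # t contains lhs and rhs
    support = n_both / n if n else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence

BASKETS = [
    {"computer", "software", "mouse"},
    {"computer", "software"},
    {"computer", "printer"},
    {"mouse", "printer"},
]

if __name__ == "__main__":
    s, c = support_confidence(BASKETS, {"computer"}, {"software"})
    print(f"support = {s:.0%}, confidence = {c:.0%}")  # support = 50%, confidence = 67%
```

Mining algorithms such as Apriori search for all rules whose support and confidence exceed user-given minimums. This function only scores one given rule.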

• Classification and prediction
– Finding models (functions) that describe and distinguish classes or concepts for future prediction
– E.g., classify countries based on climate, or classify cars based on gas mileage
– Presentation: decision tree, classification rule, neural network
– Prediction: predict some unknown or missing numerical values
• Cluster analysis
– Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
– Clustering is based on the principle of maximizing intra-class similarity and minimizing inter-class similarity
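Many clustering algorithms exist. k-means (not named in the notes) is a common one that follows the stated principle. It assigns each point to its nearest centroid, minimizing within-cluster squared distance. For fixed data this is equivalent to maximizing the between-cluster spread, because the two always add up to the total spread. The sketch below is plain k-means on 2D points with an iteration cap. The house coordinates and the choice of k = 2 are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # k distinct input points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append((sum(x for x, _ in members) / len(members),
                            sum(y for _, y in members) / len(members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid in place
        if new == centroids:
            break  # converged: assignments can no longer change
        centroids = new
    return centroids, labels

if __name__ == "__main__":
    houses = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
    centroids, labels = kmeans(houses, k=2)
    print(labels)
```

The result depends on the initial centroids, so practical implementations run several random starts and keep the tightest clustering.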

• Outlier analysis
– Outlier: a data object that does not comply with the general behavior of the data
– It is often treated as noise or an exception, but is quite useful in fraud detection and rare-event analysis
• Trend and evolution analysis
– Trend and deviation: regression analysis
– Sequential pattern mining, periodicity analysis
– Similarity-based analysis
• Other pattern-directed or statistical analyses
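One simple statistical way to flag outliers is the z-score, which measures how many standard deviations a value lies from the mean. The sketch below flags values beyond a threshold. The threshold of 2 and the transaction amounts are assumptions. A single large outlier also inflates the standard deviation, and with n values no |z| can exceed √(n − 1). For these reasons, robust alternatives (e.g., median-based scores) are often preferred for small samples.

```python
import statistics
from typing import List, Sequence

def zscore_outliers(values: Sequence[float], threshold: float = 2.0) -> List[float]:
    """Return values whose |z-score| exceeds `threshold` (population std. dev.)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / sd > threshold]

if __name__ == "__main__":
    card_amounts = [20, 25, 22, 19, 24, 21, 23, 500]  # hypothetical card transactions
    print(zscore_outliers(card_amounts))  # [500]
```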
Data Mining: Classification Schemes

• General functionality
– Descriptive data mining
– Predictive data mining
• Different views, different classifications
– Kinds of databases to be mined
– Kinds of knowledge to be discovered
– Kinds of techniques utilized
– Kinds of applications adapted

A Multi-Dimensional View of Data Mining Classification

• Databases to be mined
– Relational, transactional, object-oriented, object-relational, active, spatial, time-series,
text, multi-media, heterogeneous, legacy, WWW, etc.
• Knowledge to be mined
– Characterization, discrimination, association, classification, clustering, trend, deviation
and outlier analysis, etc.
– Multiple/integrated functions and mining at multiple levels
• Techniques utilized
– Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization,
neural network, etc.
• Applications adapted
– Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis,
Web mining, Weblog analysis, etc.

Data Mart

• A subset of a data warehouse that supports the requirements of a particular department or business function.
• Characteristics include:
– Unlike data warehouses, data marts do not normally contain detailed operational data.
– May contain certain levels of aggregation.
Reasons for Creating a Data Mart
• To give users more flexible access to the data they need to analyse most often.
• To provide data in a form that matches the collective view of a group of users
• To improve end-user response time.
• Potential users of a data mart are clearly defined and can be targeted for support
• To provide appropriately structured data as dictated by the requirements of the end-user access
tools.
• Building a data mart is simpler compared with establishing a corporate data warehouse.
• The cost of implementing data marts is far less than that required to establish a data warehouse.

Shaping Data for Data Marts


• Need to maximize flexibility
• Cater for common purposes between marts and basic commonality (sorted out when handling requirements for the warehouse)
• If it is difficult to cater for both flexibility and common purpose, opt for flexibility
• The rule: Maximize Flexibility, Minimize Anticipation
Data Mart Issues

• Data mart functionality
• Data mart size
• Data mart load performance
• User access to data in multiple data marts
• Data mart Internet/intranet access
• Data mart administration
• Data mart installation
