
Microsoft® Official Academic Course

Microsoft® SQL Server® Database
Design and Optimization, Exam
70-443 and 70-450

J. Steven Jones
David W. Tschanz
Dave Owen
Wayne R. Boyer
Credits
EXECUTIVE EDITOR John Kane
DIRECTOR OF SALES Mitchell Beaton
EXECUTIVE MARKETING MANAGER Chris Ruel
MICROSOFT SENIOR PRODUCT MANAGER Merrick Van Dongen of Microsoft Learning
EDITORIAL PROGRAM ASSISTANT Jennifer Lartz
PRODUCTION MANAGER Micheline Frederick
PRODUCTION EDITOR Kerry Weinstein
CREATIVE DIRECTOR Harry Nolan
COVER DESIGNER Jim O’Shea
TECHNOLOGY AND MEDIA Tom Kulesa/Wendy Ashenberg

This book was set in Garamond by Aptara, Inc. and printed and bound by Bind Rite Graphics.
The cover was printed by Phoenix Color.

Copyright © 2010 by John Wiley & Sons, Inc. All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying,
recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States
Copyright Act, without either the prior written permission of the Publisher, or authorization through payment
of the appropriate per-copy fee to the Copyright Clearance Center, Inc. 222 Rosewood Drive, Danvers, MA
01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the
Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, (201) 748-6011,
fax (201) 748-6008. To order books or for customer service, please call 1-800-CALL WILEY (225-5945).

Microsoft, ActiveX, Excel, InfoPath, Microsoft Press, MSDN, OneNote, Outlook, PivotChart, PivotTable,
PowerPoint, SharePoint, SQL Server, Visio, Windows, Windows Mobile, Windows Server, and Windows Vista are
either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
Other product and company names mentioned herein may be the trademarks of their respective owners.

The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events
depicted herein are fictitious. No association with any real company, organization, product, domain name, e-mail
address, logo, person, place, or event is intended or should be inferred.

The book expresses the author’s views and opinions. The information contained in this book is provided without
any express, statutory, or implied warranties. Neither the authors, John Wiley & Sons, Inc., Microsoft Corporation,
nor their resellers or distributors will be held liable for any damages caused or alleged to be caused either directly or
indirectly by this book.

Evaluation copies are provided to qualified academics and professionals for review purposes only, for use in their
courses during the next academic year. These copies are licensed and may not be sold or transferred to a third party.
Upon completion of the review period, please return the evaluation copy to Wiley. Return instructions and a free
of charge return shipping label are available at www.wiley.com/go/returnlabel. Outside of the United States, please
contact your local representative.

ISBN 978-0-470-18365-6

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

Foreword from the Publisher

Wiley’s publishing vision for the Microsoft Official Academic Course series is to provide
students and instructors with the skills and knowledge they need to use Microsoft technology
effectively in all aspects of their personal and professional lives. Quality instruction is required
to help both educators and students get the most from Microsoft’s software tools and to become
more productive. Thus our mission is to make our instructional programs trusted educational
companions for life.
To accomplish this mission, Wiley and Microsoft have partnered to develop the highest
quality educational programs for Information Workers, IT Professionals, and Developers.
Materials created by this partnership carry the brand name “Microsoft Official Academic
Course,” assuring instructors and students alike that the content of these textbooks is fully
endorsed by Microsoft, and that they provide the highest quality information and instruction
on Microsoft products. The Microsoft Official Academic Course textbooks are “Official” in
still one more way—they are the officially sanctioned courseware for Microsoft IT Academy
members.
The Microsoft Official Academic Course series focuses on workforce development. These
programs are aimed at those students seeking to enter the workforce, change jobs, or embark
on new careers as information workers, IT professionals, and developers. Microsoft Official
Academic Course programs address their needs by emphasizing authentic workplace scenarios
with an abundance of projects, exercises, cases, and assessments.
The Microsoft Official Academic Courses are mapped to Microsoft’s extensive research
and job-task analysis, the same research and analysis used to create the Microsoft Certified
Information Technology Professional (MCITP) exam. The textbooks focus on real skills for
real jobs. As students work through the projects and exercises in the textbooks they enhance
their level of knowledge and their ability to apply the latest Microsoft technology to everyday
tasks. These students also gain resume-building credentials that can assist them in finding a
job, keeping their current job, or in furthering their education.
The concept of life-long learning is today an absolute necessity. Job roles, and even whole job
categories, are changing so quickly that none of us can stay competitive and productive without
continuously updating our skills and capabilities. The Microsoft Official Academic Course
offerings, and their focus on Microsoft certification exam preparation, provide a means for
people to acquire and effectively update their skills and knowledge. Wiley supports students
in this endeavor through the development and distribution of these courses as Microsoft’s
official academic publisher.
Today educational publishing requires attention to providing quality print and robust
electronic content. By integrating Microsoft Official Academic Course products, WileyPLUS,
and Microsoft certifications, we are better able to deliver efficient learning solutions for
students and teachers alike.

Bonnie Lieberman
General Manager and Senior Vice President

Preface

Welcome to the Microsoft Official Academic Course (MOAC) program for Microsoft SQL
Server Database Design and Optimization. MOAC represents the collaboration between
Microsoft Learning and John Wiley & Sons, Inc. publishing company. Microsoft and Wiley
teamed up to produce a series of textbooks that deliver compelling and innovative teaching
solutions to instructors and superior learning experiences for students. Infused and informed
by in-depth knowledge from the creators of SQL Server, and crafted by a publisher known
worldwide for the pedagogical quality of its products, these textbooks maximize skills transfer
in minimum time. Students are challenged to reach their potential by using their new techni-
cal skills as highly productive members of the workforce.
Because this knowledge base comes directly from Microsoft, creator of SQL Server and of the
Microsoft Certified IT Professional exams (www.microsoft.com/learning/mcp/mcitp), you are
sure to receive the topical coverage that is most relevant to
students’ personal and professional success. Microsoft’s direct participation not only assures
you that MOAC textbook content is accurate and current; it also means that students will
receive the best instruction possible to enable their success on certification exams and in the
workplace.

■ The Microsoft Official Academic Course Program


The Microsoft Official Academic Course series is a complete program for instructors and institu-
tions to prepare and deliver great courses on Microsoft software technologies. With MOAC,
we recognize that, because of the rapid pace of change in Microsoft's technology and curriculum,
instructors have an ongoing set of needs beyond classroom instruction tools in order to be ready
to teach the course. The MOAC program endeavors to provide solutions
for all these needs in a systematic manner in order to ensure a successful and rewarding course
experience for both instructor and student—technical and curriculum training for instructor
readiness with new software releases; the software itself for student use at home for building
hands-on skills, assessment, and validation of skill development; and a great set of tools for
delivering instruction in the classroom and lab. All are important to the smooth delivery of an
interesting course on Microsoft software, and all are provided with the MOAC program. We
think about the model below as a gauge for ensuring that we completely support you in your
goal of teaching a great course. As you evaluate your instructional materials options, you may
wish to use the model for comparison purposes with available products.

Illustrated Book Tour

■ Pedagogical Features
The MOAC textbook for SQL Server Database Design and Optimization is designed to cover
all the learning objectives for MCITP exams 70-443 and 70-450; each exam's set of objectives is
referred to as its "objective domain."
The Microsoft Certified Information Technology Professional (MCITP) exam objectives are
highlighted throughout the textbook. Many pedagogical features have been developed specifi-
cally for Microsoft Official Academic Course programs.
Presenting the extensive procedural information and technical concepts woven throughout
the textbook raises challenges for the student and instructor alike. The Illustrated Book Tour
that follows provides a guide to the rich features contributing to the Microsoft Official Academic
Course program's pedagogical plan. Following is a list of key features in each lesson designed
to prepare students for success on the certification exams and in the workplace:
• Each lesson begins with a Lesson Skill Matrix. More than a standard list of learning
objectives, the Skill Matrix correlates each software skill covered in the lesson to the
specific exam objective domain.
• A Lab Manual accompanies this textbook package. The Lab Manual contains hands-on
lab work corresponding to each of the lessons within the textbook. Numbered steps
give detailed, step-by-step instructions to help students learn workplace skills associ-
ated with database design and optimization. The labs are constructed using
real-world scenarios to mimic the tasks students will see in the workplace.
• Illustrations: Screen images provide visual feedback as students work through the
exercises. The images reinforce key concepts, provide visual clues about the steps, and
allow students to check their progress.
• Key Terms: Important technical vocabulary is listed at the beginning of the lesson.
When these terms are used later in the lesson, they appear in bold italic type and are
defined. The Glossary contains all of the key terms and their definitions.
• Engaging point-of-use Reader aids, located throughout the lessons, tell students why
this topic is relevant (The Bottom Line), provide students with helpful hints (Take Note),
or show alternate ways to accomplish tasks (Another Way). Reader aids also provide
additional relevant or background information that adds value to the lesson.
• Certification Ready? features throughout the text signal students where a specific
certification objective is covered. They provide students with a chance to check their
understanding of that particular exam objective and, if necessary, review the section
of the lesson where it is covered.
• Knowledge Assessments provide lesson-ending activities.


■ Lesson Features

Sample pages from Lesson 4, "Analyzing and Designing Security," and Lesson 1 are reproduced here, with callouts identifying the Lesson Skill Matrix, Certification Ready alerts, the Key Terms list, and the Warning, Take Note, Another Way, and X Ref reader aids as they appear on actual lesson pages.

Additional sample pages, drawn from Lessons 2, 4, and 5, highlight the in-lesson Case Studies, X Ref reader aids, color-highlighted SQL Server 2008 information, and the end-of-lesson Case Studies that accompany the Knowledge Assessments.

Further sample pages, from Lessons 4, 5, and 10, point out The Bottom Line and Warning reader aids, informative diagrams, easy-to-read tables, Certification Ready alerts, and the Skill Summary that closes each lesson.

A final set of sample pages, from Lessons 2, 7, and 9, illustrates Take Note reader aids, an in-lesson Case Study, the end-of-lesson Knowledge Assessment questions, screen images, and the Lab Exercise callouts that point to hands-on exercises in the companion Lab Manual.

Conventions and Features
Used in This Book

This book uses particular fonts, symbols, and heading conventions to highlight important
information or to call your attention to special steps. For more information about the features
in each lesson, refer to the Illustrated Book Tour section.

• THE BOTTOM LINE: This feature provides a brief summary of the material to be covered in the
section that follows.
• CERTIFICATION READY?: This feature signals the point in the text where a specific certification
objective is covered. It provides you with a chance to check your understanding of that particular
MCITP objective and, if necessary, review the section of the lesson where it is covered.
• TAKE NOTE: Reader aids appear in shaded boxes found in your text. Take Note provides helpful
hints related to particular tasks or topics.
• ANOTHER WAY: Another Way provides an alternative procedure for accomplishing a particular task.
• X REF: These notes provide pointers to information discussed elsewhere in the textbook or describe
interesting features of SQL Server that are not directly addressed in the current topic or exercise.
• Key terms appear in bold italic, as in "A shared printer can be used by many individuals on a network."
• Key My Name is. Any text you are asked to key appears in color.
• Click OK. Any button on the screen you are supposed to click on or select will also appear in color.
• SQL Server 2008. Information that is particular to the 2008 version of SQL Server is shown in color.
• LAB EXERCISE: The Lab Exercise feature shows where a corresponding hands-on exercise is
available in the companion Lab Manual.

Instructor Support Program

The Microsoft Official Academic Course programs are accompanied by a rich array of resources
that incorporate the extensive textbook visuals to form a pedagogically cohesive package.
These resources provide all the materials instructors need to deploy and deliver their courses.
Resources available online for download include:
• The MSDN Academic Alliance is designed to provide the easiest and most inexpensive way
to make developer tools, products, and technologies available to faculty and students in labs,
classrooms, and on student PCs. A free 3-year membership is available to qualified
MOAC adopters.
Note: Microsoft SQL Server can be downloaded from MSDN AA for use by students in
this course.
• The Instructor’s Guide contains solutions to all the textbook exercises as well as chapter
summaries and lecture notes. The Instructor’s Guide and Syllabi for various term lengths
are available from the Book Companion site (www.wiley.com/college/microsoft).
• The Test Bank contains hundreds of questions in multiple-choice, true-false, short answer,
and essay formats and is available to download from the Instructor’s Book Companion site
(www.wiley.com/college/microsoft). A complete answer key is provided.
• PowerPoint Presentations and Images. A complete set of PowerPoint presentations is
available on the Instructor’s Book Companion site (www.wiley.com/college/microsoft) to
enhance classroom presentations. Tailored to the text’s topical coverage and Skills Matrix,
these presentations are designed to convey key Microsoft SQL Server concepts addressed
in the text.
All figures from the text are on the Instructor’s Book Companion site (www.wiley.com/
college/microsoft). You can incorporate them into your PowerPoint presentations, or create
your own overhead transparencies and handouts.
By using these visuals in class discussions, you can help focus students’ attention on key
elements of SQL Server and help them understand how to use it effectively in the
workplace.
• When it comes to improving the classroom experience, there is no better source of ideas
and inspiration than your fellow colleagues. The Wiley Faculty Network connects teachers
with technology, facilitates the exchange of best practices, and helps to enhance instructional
efficiency and effectiveness. Faculty Network activities include technology training and
tutorials, virtual seminars, peer-to-peer exchanges of experiences and ideas, personal
consulting, and sharing of resources. For details visit www.WhereFacultyConnect.com.
• Microsoft SQL Server Books Online. This set of online documentation helps you under-
stand SQL Server and how to implement data management and business intelligence projects.
SQL Server Books Online is referred to throughout this text as a valuable supplement to your
work with SQL Server. You can find SQL Server Books Online at http://msdn.microsoft.
com/en-us/library/ms130214(SQL.90).aspx and http://msdn.microsoft.com/en-us/library/
ms130214.aspx.


MSDN ACADEMIC ALLIANCE—FREE 3-YEAR MEMBERSHIP


AVAILABLE TO QUALIFIED ADOPTERS!
The Microsoft Developer Network Academic Alliance (MSDN AA) is designed to provide
the easiest and most inexpensive way for universities to make the latest Microsoft developer
tools, products, and technologies available in labs, classrooms, and on student PCs. MSDN
AA is an annual membership program for departments teaching Science, Technology,
Engineering, and Mathematics (STEM) courses. The membership provides a complete
solution to keep academic labs, faculty, and students on the leading edge of technology.
Software available in the MSDN AA program is provided at no charge to adopting
departments through the Wiley and Microsoft publishing partnership.
As a bonus to this free offer, faculty will be introduced to Microsoft’s Faculty
Connection and Academic Resource Center. It takes time and preparation to keep
students engaged while giving them a fundamental understanding of theory, and the
Microsoft Faculty Connection is designed to help STEM professors with this prepara-
tion by providing articles, curriculum, and tools that professors can use to engage and
inspire today’s technology students.
Contact your Wiley rep for details.
For more information about the MSDN Academic Alliance program, go to:
msdn.microsoft.com/academic/
Note: Microsoft SQL Server can be downloaded from MSDN AA for use by students in
this course.

Important Web Addresses and Phone Numbers


To locate the Wiley Higher Education Rep in your area, go to the following Web address
and click on the "Who's My Rep?" link at the top of the page.
www.wiley.com/college
Or call the MOAC Toll-Free Number: 1+(888) 764-7001 (U.S. & Canada only).
To learn more about becoming a Microsoft Certified Professional and exam availability, visit
www.microsoft.com/learning/mcp.

Student Support Program

Book Companion Web Site (www.wiley.com/college/microsoft)


The students’ book companion site for the MOAC series includes any resources, exercise files,
and Web links that will be used in conjunction with this course.

Wiley Desktop Editions


Wiley MOAC Desktop Editions are innovative, electronic versions of printed textbooks.
Students buy the desktop version for 50% off the U.S. price of the printed text, and get the
added value of permanence and portability. Wiley Desktop Editions provide students with
numerous additional benefits that are not available with other e-text solutions.
Wiley Desktop Editions are NOT subscriptions; students download the Wiley Desktop Edition
to their computer desktops. Students own the content they buy to keep for as long as they want.
Once a Wiley Desktop Edition is downloaded to the computer desktop, students have instant
access to all of the content without being online. Students can also print out the sections they
prefer to read in hard copy. Students also have access to fully integrated resources within their
Wiley Desktop Edition. From highlighting their e-text to taking and sharing notes, students can
easily personalize their Wiley Desktop Edition as they are reading or following along in class.

Microsoft SQL Server Software


As an adopter of a MOAC textbook, your school’s department is eligible for a free three-year
membership to the MSDN Academic Alliance (MSDN AA). Through MSDN AA, full versions
of Microsoft SQL Server are available for your use with this course. See your Wiley rep for details.

Preparing to Take the Microsoft Certified Information


Technology Professional (MCITP) Exam
The Microsoft Certified Information Technology Professional (MCITP) certifications enable
professionals to target specific technologies and to distinguish themselves by demonstrating
in-depth knowledge and expertise in their specialized technologies. Microsoft Certified
Information Technology Professionals are consistently capable of implementing, building,
troubleshooting, and debugging a particular Microsoft technology.
For organizations, the new generation of Microsoft certifications provides better skills
verification tools that help with assessing not only in-demand skills on Microsoft SQL Server,
but also the ability to quickly complete on-the-job tasks. Individuals will find it easier to identify and
work toward the certification credential that meets their personal and professional goals.
To learn more about becoming a Microsoft Certified Professional and exam availability, visit
www.microsoft.com/learning/mcp.


Microsoft Certifications for IT Professionals


The new Microsoft Certified Technology Specialist (MCTS) and Microsoft Certified IT
Professional (MCITP) credentials provide IT professionals with a simpler and more targeted
framework to showcase their technical skills in addition to the skills that are required for spe-
cific developer job roles.
The Microsoft Certified Professional (MCP), Microsoft Certified System Administrator
(MCSA), and Microsoft Certified Systems Engineer (MCSE) credentials continue to provide
IT professionals who use Microsoft SQL Server, Windows XP, and Windows Server 2003 with
industry recognition and validation of their IT skills and experience.

Microsoft Certified Technology Specialist


The new Microsoft Certified Technology Specialist (MCTS) credential highlights your skills
using a specific Microsoft technology. You can demonstrate your abilities as an IT professional
or developer with in-depth knowledge of the Microsoft technology that you use today or are
planning to deploy.
The MCTS certifications enable professionals to target specific technologies and to distinguish
themselves by demonstrating in-depth knowledge and expertise in their specialized technolo-
gies. Microsoft Certified Technology Specialists are consistently capable of implementing,
building, troubleshooting, and debugging a particular Microsoft technology.
You can learn more about the MCTS program at www.microsoft.com/learning/mcp/mcts.

Microsoft Certified IT Professional


The new Microsoft Certified IT Professional (MCITP) credential lets you highlight your
specific area of expertise. Now, you can easily distinguish yourself as an expert in engineering,
designing, and deploying database solutions with Microsoft SQL Server.
By becoming certified, you demonstrate to employers that you have achieved a predictable
level of skill in the use of Microsoft technologies. Employers often require certification either
as a condition of employment or as a condition of advancement within the company or other
organization.
You can learn more about the MCITP program at www.microsoft.com/learning/mcp/mcitp.
The certification examinations are sponsored by Microsoft and administered through Microsoft’s
exam delivery partner Prometric.

Preparing to Take an Exam


Unless you are a very experienced user, you will need a test preparation course to prepare
you to complete the test correctly and within the time allowed. The Microsoft Official
Academic Course series is designed to prepare you with a strong knowledge of all exam topics,
and with some additional review and practice on your own, you should feel confident in your
ability to pass the appropriate exam.
After you decide which exam to take, review the list of objectives for the exam. You can easily
identify tasks that are included in the objective list by locating the Lesson Skill Matrix at the
start of each lesson and the Certification Ready sidebars in the margin of the lessons in this
book.
To take the MCITP test, visit www.microsoft.com/learning/mcp to locate your nearest testing
center. Then call the testing center directly to schedule your test. The amount of advance notice
you should provide will vary for different testing centers, and it typically depends on the number


of computers available at the testing center, the number of other testers who have already been
scheduled for the day on which you want to take the test, and the number of times per week
that the testing center offers MCITP testing. In general, you should call to schedule your test at
least two weeks prior to the date on which you want to take the test.
When you arrive at the testing center, you might be asked for proof of identity. A driver’s
license or passport is an acceptable form of identification. If you do not have either of these
items of documentation, call your testing center and ask what alternative forms of identifica-
tion will be accepted. If you are retaking a test, bring your MCITP identification number,
which will have been given to you when you previously took the test. If you have not prepaid
or if your organization has not already arranged to make payment for you, you will need to
pay the test-taking fee when you arrive.

Student CD
The CD-ROM included with this book contains practice exams that will help you hone
your knowledge before you take the MCITP Microsoft SQL Server Database Administrator
70–443/450 certification examination. The exams are meant to provide practice for your
certification exam and are also good reinforcement of the material covered in the course.
The enclosed Student CD will run automatically. Upon accepting the license agreement, you
will proceed directly to the exams. The exams also can be accessed through the Assets folder
located within the CD files.

Microsoft SQL Server Books Online


This set of online documentation helps you understand SQL Server and how to implement
data management and business intelligence projects. SQL Server Books Online is referred to
throughout this text as a valuable supplement to your work with SQL Server. You can find
SQL Server Books Online at http://msdn.microsoft.com/en-us/library/ms130214(SQL.90).
aspx and http://msdn.microsoft.com/en-us/library/ms130214.aspx.

About the Authors

Dave Owen graduated from California State Polytechnic College as an Electronic Engineer
with an emphasis on communications theory. He did, however, have to take a programming
course: Fortran 4 with data entry on a punch card machine and submitted as a batch file for
overnight processing.
He was the seventh employee hired in 1971 at the Naval Civil Engineering Laboratory for
the then brand new initiative to bring naval facilities in line with environmental compliance
regulations. He ended up as the data management guy. He was a programmer (it was Rocky
Mountain BASIC then); he was the network guy; he was the database guy (you have proba-
bly never heard of Speed for the Wang Computer System); he was the enterprise planner (he
led an effort to update the Navy’s data tracking system, which was approved and budgeted
at $33 million); he ran the help desk (his team was the only Naval Facilities Engineering
Command group to receive an Outstanding rating by the Inspector General); he was having
a good time.
After retirement, he visited the County of Ventura Workforce Development Division who
offered him training to become certified in Microsoft and Novell technologies. He was too
long entrenched. He needed to become current. He became a CNA, MCSE, and MCDBA.
In the fall of 1998 he started teaching at Moorpark College. He taught every certificated
computer and networking topic desired by the Department Chair and earned his MCT
two years later. In addition he started teaching at Microsoft Certified Partners for Learning
Solutions such as New Horizons. A lot of other certifications followed.
He likes teaching. He learns more from students than from self-study or trying to fix problems.
Students approach situations in ways he can never imagine. Understanding their perspective
provides him with infinitely more insight than he could ever glean alone. Now he's preparing
college textbooks and other publications.

Wayne Boyer is a consultant, systems analyst, programmer, network engineer, and infor-
mation systems manager who started working with relational database systems just a short
while ago in 1978. Wayne has extensive application systems experience with manufacturing
and financial systems, commonly referred to as MRP and ERP systems. Most of
Wayne’s experience in years past was with HP-3000 minicomputer systems running a rela-
tional database system known as Image. This experience with database systems led to Wayne's
current expertise and experience with modern relational database systems such as SQL Server
and Oracle. With over 30 years of Information Technology experience, Wayne brings a depth
of real-world experience to current technology topics. Currently Wayne is teaching Microsoft
curriculum topics at Moorpark College while also consulting and providing support for a
wide variety of clients on a range of IT related subjects. He has also acquired a number of
industry certifications: MCSE, MCDBA, MCITP for SQL Server 2005, and MCITP for
Enterprise Support. He is now working toward Cisco networking certification as
well as an upgrade to MCITP for SQL Server 2008.


Steve Jones has been working with SQL Server for more than a decade, starting with v4.2 on
OS/2 and enjoying the new features and capabilities of every version since. After working as a
DBA and developer for a variety of companies, Steve founded SQLServerCentral.com along
with Brian Knight and Andy Warren in 2001. SQLServerCentral.com has grown into a won-
derful SQL Server community that provides daily articles and questions on all aspects of SQL
Server to over 300,000 members. Starting in 2004, Steve became the full-time editor of the
community and ensures it continues to evolve into the best resource possible for SQL Server
professionals. Over the last decade, Steve has written more than 200 articles about SQL
Server for SQLServerCentral.com, the SQL Server Standard magazine, SQL Server Magazine,
and Database Journal. Steve has spoken at the PASS Summits, where SQLServerCentral.com
sponsors an opening reception every year, and he has written a prior book for Sybex on SQL
Server 2000.

David W. Tschanz is the coauthor of the recent Sybex book Mastering SQL Server 2005. He
has been working with and managing large datasets for four decades. His work has included
analysis of population dynamics, voting behavior, and epidemiological data. He has been
writing on computer topics for the past several years, including four books and about 100
articles in the area. He is also a regular contributor to Redmond magazine. Dave currently lives
outside the United States, where his eclectic nature allows him to pursue projects involving
databases, IT infrastructure, web development, archaeology, the ancient Nabataean capital
of Petra, medical history, military science, and demography. He can be reached by e-mail at
desertwriter1121@yahoo.com, or look for him in Connecticut, Saudi Arabia, or Tasmania, his
three favorite haunts.

Acknowledgments

MOAC Instructor Advisory Board


We thank our Instructor Advisory Board, an elite group of educators who have assisted us every
step of the way in building these products. Advisory Board members have acted as our sounding
board on key pedagogical and design decisions leading to the development of these compelling
and innovative textbooks for future Information Workers. Their dedication to technology
education is truly appreciated.

Charles DeSassure, Tarrant County College


Charles DeSassure is Department Chair and Instructor of Computer Science & Information
Technology at Tarrant County College Southeast Campus, Arlington, Texas. He has had
experience as a MIS Manager, system analyst, field technology analyst, LAN Administrator,
microcomputer specialist, and public school teacher in South Carolina. DeSassure has worked
in higher education for more than ten years and received the Excellence Award in Teaching
from the National Institute for Staff and Organizational Development (NISOD). He currently
serves on the Educational Testing Service (ETS) iSkills National Advisory Committee and
chaired the Tarrant County College District Student Assessment Committee. He has written
proposals and makes presentations at major educational conferences nationwide. DeSassure has
served as a textbook reviewer for John Wiley & Sons and Prentice Hall. He teaches courses in
information security, networking, distance learning, and computer literacy. DeSassure holds a
master’s degree in Computer Resources & Information Management from Webster University.

Kim Ehlert, Waukesha County Technical College


Kim Ehlert is the Microsoft Program Coordinator and a Network Specialist instructor at
Waukesha County Technical College, teaching the full range of MCSE and networking courses
for the past nine years. Prior to joining WCTC, Kim was a professor at the Milwaukee School of
Engineering for five years where she oversaw the Novell Academic Education and the Microsoft
IT Academy programs. She has a wide variety of industry experience including network design
and management for Johnson Controls, local city fire departments, police departments, large
church congregations, health departments, and accounting firms. Kim holds many industry certi-
fications including MCDST, MCSE, Security+, Network+, Server+, MCT, and CNE.
Kim has a bachelor’s degree in Information Systems and a master’s degree in Business
Administration from the University of Wisconsin Milwaukee. When she is not busy teach-
ing, she enjoys spending time with her husband Gregg and their two children—Alex, 14, and
Courtney, 17.

Penny Gudgeon, Corinthian Colleges, Inc.


Penny Gudgeon is the Program Manager for IT curriculum at Corinthian Colleges, Inc.
Previously, she was responsible for computer programming and web curriculum for twenty-
seven campuses in Corinthian’s Canadian division, CDI College of Business, Technology and
Health Care. Penny joined CDI College in 1997 as a computer programming instructor at
one of the campuses outside of Toronto. Prior to joining CDI College, Penny taught produc-
tivity software at another Canadian college, the Academy of Learning, for four years. Penny
has experience in helping students achieve their goals through various learning models from
instructor-led to self-directed to online.


Before embarking on a career in education, Penny worked in the fields of advertising, market-
ing/sales, mechanical and electronic engineering technology, and computer programming. When
not working from her home office or indulging her passion for lifelong learning, Penny likes to
read mysteries, garden, and relax at home in Hamilton, Ontario, with her Shih-Tzu, Gracie.

Margaret Leary, Northern Virginia Community College


Margaret Leary is Professor of IST at Northern Virginia Community College, teaching
Networking and Network Security Courses for the past ten years. She is the co-Principal
Investigator on the CyberWATCH initiative, an NSF-funded regional consortium of higher
education institutions and businesses working together to increase the number of network
security personnel in the workforce. She also serves as a Senior Security Policy Manager and
Research Analyst at Nortel Government Solutions and holds a CISSP certification.
Margaret holds a B.S.B.A. and MBA/Technology Management from the University
of Phoenix, and is pursuing her Ph.D. in Organization and Management with an IT
Specialization at Capella University. Her dissertation is titled “Quantifying the Discoverability
of Identity Attributes in Internet-Based Public Records: Impact on Identity Theft and
Knowledge-based Authentication.” She has several other published articles in various govern-
ment and industry magazines, notably on identity management and network security.

Wen Liu, ITT Educational Services, Inc.


Wen Liu is Director of Corporate Curriculum Development at ITT Educational Services,
Inc. He joined the ITT corporate headquarters in 1998 as a Senior Network Analyst to
plan and deploy the corporate WAN infrastructure. A year later he assumed the position
of Corporate Curriculum Manager supervising the curriculum development of all IT pro-
grams. After he was promoted to the current position three years ago, he continued to man-
age the curriculum research and development for all the programs offered in the School of
Information Technology in addition to supervising the curriculum development in other areas
(such as Schools of Drafting and Design and Schools of Electronics Technology). Prior to his
employment with ITT Educational Services, Liu was a Telecommunications Analyst at the
state government of Indiana working on the state backbone project that provided Internet
and telecommunications services to public users such as K-12 and higher education
institutions, government agencies, libraries, and healthcare facilities.
Wen Liu has an M.A. in Student Personnel Administration in Higher Education and an
M.S. in Information and Communications Sciences from Ball State University, Indiana.
He formerly served as director of special projects on the board of directors of the Indiana
Telecommunications User Association and on Course Technology's IT Advisory Board.
He is currently a member of the IEEE and its Computer Society.

Jared Spencer, Westwood College Online


Jared Spencer has been the Lead Faculty for Networking at Westwood College Online since
2006. He began teaching in 2001 and has taught both on-ground and online for a variety of
institutions, including Robert Morris University and Point Park University. In addition to his
academic background, he has more than fifteen years of industry experience working for com-
panies including the Thomson Corporation and IBM.
Jared has a master’s degree in Internet Information Systems and is currently ABD and
pursuing his doctorate in Information Systems at Nova Southeastern University. He has
authored several papers that have been presented at conferences and appeared in publica-
tions such as the Journal of Internet Commerce and the Journal of Information Privacy
and Security (JIPC). He holds a number of industry certifications, including AIX (UNIX),
A+, Network+, Security+, MCSA on Windows 2000, and MCSA on Windows 2003
Server.


We thank Steve Strom from Butler Community College for his diligent review, providing
invaluable feedback in the service of quality instructional materials.

Focus Group and Survey Participants


Finally, we thank the hundreds of instructors who participated in our focus groups and surveys
to ensure that the Microsoft Official Academic Courses best met the needs of our customers.

Jean Aguilar, Mt. Hood Community Catherine Binder, Strayer University Greg Clements, Midland Lutheran
College & Katharine Gibbs School– College
Konrad Akens, Zane State College Philadelphia Dayna Coker, Southwestern Oklahoma
Michael Albers, University of Terrel Blair, El Centro College State University–Sayre Campus
Memphis Ruth Blalock, Alamance Community Tamra Collins, Otero Junior College
Diana Anderson, Big Sandy College Janet Conrey, Gavilan Community
Community & Technical College Beverly Bohner, Reading Area College
Phyllis Anderson, Delaware County Community College Carol Cornforth, West Virginia
Community College Henry Bojack, Farmingdale State Northern Community College
Judith Andrews, Feather River College University Gary Cotton, American River College
Damon Antos, American River Matthew Bowie, Luna Community Edie Cox, Chattahoochee Technical
College College College
Bridget Archer, Oakton Community Julie Boyles, Portland Community Rollie Cox, Madison Area Technical
College College College
Linda Arnold, Harrisburg Area Karen Brandt, College of the Albemarle David Crawford, Northwestern
Community College–Lebanon Stephen Brown, College of San Mateo Michigan College
Campus Jared Bruckner, Southern Adventist J.K. Crowley, Victor Valley College
Neha Arya, Fullerton College University Rosalyn Culver, Washtenaw
Mohammad Bajwa, Katharine Gibbs Pam Brune, Chattanooga State Community College
School–New York Technical Community College Sharon Custer, Huntington University
Virginia Baker, University of Alaska Sue Buchholz, Georgia Perimeter College Sandra Daniels, New River Community
Fairbanks Roberta Buczyna, Edison College College
Carla Bannick, Pima Community Angela Butler, Mississippi Gulf Coast Anila Das, Cedar Valley College
College Community College Brad Davis, Santa Rosa Junior College
Rita Barkley, Northeast Alabama Rebecca Byrd, Augusta Technical College Susan Davis, Green River Community
Community College Kristen Callahan, Mercer County College
Elsa Barr, Central Community College– Community College Mark Dawdy, Lincoln Land
Hastings Judy Cameron, Spokane Community Community College
Ronald W. Barry, Ventura County College Jennifer Day, Sinclair Community
Community College District Dianne Campbell, Athens Technical College
Elizabeth Bastedo, Central Carolina College Carol Deane, Eastern Idaho Technical
Technical College Gena Casas, Florida Community College
Karen Baston, Waubonsee Community College at Jacksonville Julie DeBuhr, Lewis-Clark State College
College Jesus Castrejon, Latin Technologies Janis DeHaven, Central Community
Karen Bean, Blinn College Gail Chambers, Southwest Tennessee College
Scott Beckstrand, Community College Community College Drew Dekreon, University of Alaska–
of Southern Nevada Jacques Chansavang, Indiana Anchorage
Paulette Bell, Santa Rosa Junior College University–Purdue University Fort Joy DePover, Central Lakes College
Liz Bennett, Southeast Technical Wayne Salli DiBartolo, Brevard Community
Institute Nancy Chapko, Milwaukee Area College
Nancy Bermea, Olympic College Technical College Melissa Diegnau, Riverland
Lucy Betz, Milwaukee Area Technical Rebecca Chavez, Yavapai College Community College
College Sanjiv Chopra, Thomas Nelson Al Dillard, Lansdale School of Business
Meral Binbasioglu, Hofstra University Community College Marjorie Duffy, Cosumnes River College


Sarah Dunn, Southwest Tennessee Mike Grabill, Katharine Gibbs Pashia Hogan, Northeast
Community College School–Philadelphia State Technical Community
Shahla Durany, Tarrant County Francis Green, Penn State University College
College–South Campus Walter Griffin, Blinn College Susan Hoggard, Tulsa Community
Kay Durden, University of Tennessee at Fillmore Guinn, Odessa College College
Martin Helen Haasch, Milwaukee Area Kathleen Holliman, Wallace
Dineen Ebert, St. Louis Community Technical College Community College Selma
College–Meramec John Habal, Ventura College Chastity Honchul, Brown Mackie
Donna Ehrhart, State University of Joy Haerens, Chaffey College College/Wright State University
New York–Brockport Norman Hahn, Thomas Nelson Christie Hovey, Lincoln Land
Larry Elias, Montgomery County Community College Community College
Community College Kathy Hall, Alamance Community Peggy Hughes, Allegany College of
Glenda Elser, New Mexico State College Maryland
University at Alamogordo Teri Harbacheck, Boise State University Sandra Hume, Chippewa Valley
Angela Evangelinos, Monroe County Linda Harper, Richland Community Technical College
Community College College John Hutson, Aims Community
Angie Evans, Ivy Tech Community Maureen Harper, Indian Hills College
College of Indiana Community College Celia Ing, Sacramento City College
Linda Farrington, Indian Hills Steve Harris, Katharine Gibbs School– Joan Ivey, Lanier Technical College
Community College New York Barbara Jaffari, College of the
Dana Fladhammer, Phoenix College Robyn Hart, Fresno City College Redwoods
Richard Flores, Citrus College Darien Hartman, Boise State Penny Jakes, University of Montana
Connie Fox, Community and University College of Technology
Technical College at Institute Gina Hatcher, Tacoma Community Eduardo Jaramillo, Peninsula College
of Technology West Virginia College Barbara Jauken, Southeast Community
University Winona T. Hatcher, Aiken Technical College
Wanda Freeman, Okefenokee College Susan Jennings, Stephen F. Austin
Technical College BJ Hathaway, Northeast Wisconsin Tech State University
Brenda Freeman, Augusta Technical College Leslie Jernberg, Eastern Idaho Technical
College Cynthia Hauki, West Hills College– College
Susan Fry, Boise State University Coalinga Linda Johns, Georgia Perimeter
Roger Fulk, Wright State University– Mary L. Haynes, Wayne County College
Lake Campus Community College Brent Johnson, Okefenokee Technical
Sue Furnas, Collin County Marcie Hawkins, Zane State College College
Community College District Steve Hebrock, Ohio State Mary Johnson, Mt. San Antonio College
Sandy Gabel, Vernon College University Agricultural Technical Shirley Johnson, Trinidad State Junior
Laura Galvan, Fayetteville Technical Institute College–Valley Campus
Community College Sue Heistand, Iowa Central Community Sandra M. Jolley, Tarrant County
Candace Garrod, Red Rocks College College
Community College Heith Hennel, Valencia Community Teresa Jolly, South Georgia Technical
Sherrie Geitgey, Northwest State College College
Community College Donna Hendricks, South Arkansas Dr. Deborah Jones, South Georgia
Chris Gerig, Chattahoochee Technical Community College Technical College
College Judy Hendrix, Dyersburg State Margie Jones, Central Virginia
Barb Gillespie, Cuyamaca College Community College Community College
Jessica Gilmore, Highline Community Gloria Hensel, Matanuska-Susitna Randall Jones, Marshall Community
College College University of Alaska and Technical College
Pamela Gilmore, Reedley College Anchorage Diane Karlsbraaten, Lake Region State
Debbie Glinert, Queensborough Gwendolyn Hester, Richland College College
Community College Tammarra Holmes, Laramie County Teresa Keller, Ivy Tech Community
Steven Goldman, Polk Community Community College College of Indiana
College Dee Hobson, Richland College Charles Kemnitz, Pennsylvania College
Bettie Goodman, C.S. Mott Keith Hoell, Katharine Gibbs School– of Technology
Community College New York Sandra Kinghorn, Ventura College


Bill Klein, Katharine Gibbs School– Pat R. Lyon, Tomball College Darrelyn Miller, Grays Harbor College
Philadelphia Bill Madden, Bergen Community Sue Mitchell, Calhoun Community
Bea Knaapen, Fresno City College College College
Kit Kofoed, Western Wyoming Heather Madden, Delaware Technical Jacquie Moldenhauer, Front Range
Community College & Community College Community College
Maria Kolatis, County College of Morris Donna Madsen, Kirkwood Community Linda Motonaga, Los Angeles City
Barry Kolb, Ocean County College College College
Karen Kuralt, University of Arkansas Jane Maringer-Cantu, Gavilan College Sam Mryyan, Allen County
at Little Rock Suzanne Marks, Bellevue Community Community College
Belva-Carole Lamb, Rogue Community College Cindy Murphy, Southeastern
College Carol Martin, Louisiana State Community College
Betty Lambert, Des Moines Area University–Alexandria Ryan Murphy, Sinclair Community
Community College Cheryl Martucci, Diablo Valley College College
Anita Lande, Cabrillo College Roberta Marvel, Eastern Wyoming Sharon E. Nastav, Johnson County
Junnae Landry, Pratt Community College Community College
College Tom Mason, Brookdale Community Christine Naylor, Kent State
Karen Lankisch, UC Clermont College University Ashtabula
David Lanzilla, Central Florida Mindy Mass, Santa Barbara City College Haji Nazarian, Seattle Central
Community College Dixie Massaro, Irvine Valley College Community College
Nora Laredo, Cerritos Community Rebekah May, Ashland Community Nancy Noe, Linn-Benton Community
College & Technical College College
Jennifer Larrabee, Chippewa Valley Emma Mays-Reynolds, Dyersburg Jennie Noriega, San Joaquin Delta
Technical College State Community College College
Debra Larson, Idaho State University Timothy Mayes, Metropolitan State Linda Nutter, Peninsula College
Barb Lave, Portland Community College College of Denver Thomas Omerza, Middle Bucks
Audrey Lawrence, Tidewater Reggie McCarthy, Central Lakes College Institute of Technology
Community College Matt McCaskill, Brevard Community Edith Orozco, St. Philip’s College
Deborah Layton, Eastern Oklahoma College Dona Orr, Boise State University
State College Kevin McFarlane, Front Range Joanne Osgood, Chaffey College
Larry LeBlanc, Owen Graduate Community College Janice Owens, Kishwaukee College
School–Vanderbilt University Donna McGill, Yuba Community Tatyana Pashnyak, Bainbridge College
Philip Lee, Nashville State Community College John Partacz, College of DuPage
College Terri McKeever, Ozarks Technical Tim Paul, Montana State University–
Michael Lehrfeld, Brevard Community Community College Great Falls
College Patricia McMahon, South Suburban Joseph Perez, South Texas College
Vasant Limaye, Southwest Collegiate College Mike Peterson, Chemeketa
Institute for the Deaf – Howard Sally McMillin, Katharine Gibbs Community College
College School–Philadelphia Dr. Karen R. Petitto, West Virginia
Anne C. Lewis, Edgecombe Charles McNerney, Bergen Community Wesleyan College
Community College College Terry Pierce, Onandaga Community
Stephen Linkin, Houston Community Lisa Mears, Palm Beach Community College
College College Ashlee Pieris, Raritan Valley Community
Peggy Linston, Athens Technical College Imran Mehmood, ITT Technical College
Hugh Lofton, Moultrie Technical Institute–King of Prussia Campus Jamie Pinchot, Thiel College
College Virginia Melvin, Southwest Tennessee Michelle Poertner, Northwestern
Donna Lohn, Lakeland Community Community College Michigan College
College Jeanne Mercer, Texas State Technical Betty Posta, University of Toledo
Jackie Lou, Lake Tahoe Community College Deborah Powell, West Central Technical
College Denise Merrell, Jefferson Community College
Donna Love, Gaston College & Technical College Mark Pranger, Rogers State University
Curt Lynch, Ozarks Technical Catherine Merrikin, Pearl River Carolyn Rainey, Southeast Missouri
Community College Community College State University
Sheilah Lynn, Florida Community Diane D. Mickey, Northern Virginia Linda Raskovich, Hibbing Community
College–Jacksonville Community College College


Leslie Ratliff, Griffin Technical College Beth Sindt, Hawkeye Community Brad Vogt, Northeast Community
Mar-Sue Ratzke, Rio Hondo College College
Community College Andrew Smith, Marian College Cozell Wagner, Southeastern
Roxy Reissen, Southeastern Community Brenda Smith, Southwest Tennessee Community College
College Community College Carolyn Walker, Tri-County Technical
Silvio Reyes, Technical Career Institutes Lynne Smith, State University of New College
Patricia Rishavy, Anoka Technical York–Delhi Sherry Walker, Tulsa Community College
College Rob Smith, Katharine Gibbs School– Qi Wang, Tacoma Community College
Jean Robbins, Southeast Technical Philadelphia Betty Wanielista, Valencia Community
Institute Tonya Smith, Arkansas State College
Carol Roberts, Eastern Maine University–Mountain Home Marge Warber, Lanier Technical
Community College and University Del Spencer – Trinity Valley College–Forsyth Campus
of Maine Community College Marjorie Webster, Bergen Community
Teresa Roberts, Wilson Technical Jeri Spinner, Idaho State University College
Community College Eric Stadnik, Santa Rosa Junior College Linda Wenn, Central Community
Vicki Robertson, Southwest Tennessee Karen Stanton, Los Medanos College College
Community College Meg Stoner, Santa Rosa Junior College Mark Westlund, Olympic College
Betty Rogge, Ohio State Agricultural Beverly Stowers, Ivy Tech Community Carolyn Whited, Roane State
Technical Institute College of Indiana Community College
Lynne Rusley, Missouri Southern State Marcia Stranix, Yuba College Winona Whited, Richland College
University Kim Styles, Tri-County Technical College Jerry Wilkerson, Scott Community
Claude Russo, Brevard Community Sylvia Summers, Tacoma Community College
College College Joel Willenbring, Fullerton College
Ginger Sabine, Northwestern Technical Beverly Swann, Delaware Technical & Barbara Williams, WITC Superior
College Community College Charlotte Williams, Jones County
Steven Sachs, Los Angeles Valley College Ann Taff, Tulsa Community College Junior College
Joanne Salas, Olympic College Mike Theiss, University of Wisconsin– Bonnie Willy, Ivy Tech Community
Lloyd Sandmann, Pima Community Marathon Campus College of Indiana
College–Desert Vista Campus Romy Thiele, Cañada College Diane Wilson, J. Sargeant Reynolds
Beverly Santillo, Georgia Perimeter Sharron Thompson, Portland Community College
College Community College James Wolfe, Metropolitan
Theresa Savarese, San Diego City College Ingrid Thompson-Sellers, Georgia Community College
Sharolyn Sayers, Milwaukee Area Perimeter College Marjory Wooten, Lanier Technical
Technical College Barbara Tietsort, University of College
Judith Scheeren, Westmoreland Cincinnati–Raymond Walters College Mark Yanko, Hocking College
County Community College Janine Tiffany, Reading Area Alexis Yusov, Pace University
Adolph Scheiwe, Joliet Junior College Community College Naeem Zaman, San Joaquin Delta
Marilyn Schmid, Asheville-Buncombe Denise Tillery, University of Nevada College
Technical Community College Las Vegas Kathleen Zimmerman, Des Moines
Janet Sebesy, Cuyahoga Community Susan Trebelhorn, Normandale Area Community College
College Community College
Phyllis T. Shafer, Brookdale Noel Trout, Santiago Canyon College We also thank Lutz Ziob, Merrick Van
Community College Cheryl Turgeon, Asnuntuck Dongen, Jim LeValley, Bruce Curling,
Ralph Shafer, Truckee Meadows Community College Joe Wilson, Rob Linsky, Jim Clark,
Community College Steve Turner, Ventura College Jim Palmeri, Scott Serna, Ben Watson,
Anne Marie Shanley, County College Sylvia Unwin, Bellevue Community and David Bramble at Microsoft for
of Morris College their encouragement and support
Shelia Shelton, Surry Community Lilly Vigil, Colorado Mountain College in making the Microsoft Official
College Sabrina Vincent, College of the Academic Course programs the finest
Merilyn Shepherd, Danville Area Mainland instructional materials for mastering
Community College Mary Vitrano, Palm Beach the newest Microsoft technologies for
Susan Sinele, Aims Community College Community College both students and instructors.

Brief Contents

Preface iv

1 Designing the Hardware and Software Infrastructure 1


2 Designing Physical Storage 30
3 Designing a Consolidation Strategy 58
4 Analyzing and Designing Security 87
5 Designing Windows Server-Level Security 107
6 Designing SQL Server Service-Level and Database-Level Security 129
7 Designing SQL Server Object-Level Security 150
8 Designing a Physical Database 168
9 Creating Database Conventions and Standards 194
10 Designing a SQL Server Solution for High Availability 209
11 Designing a Data Recovery Solution for a Database 242
12 Designing a Data-Archiving Solution 265

Glossary 285
Index 287

Contents

Lesson 1: Designing the Hardware and Software Infrastructure 1
Lesson Skill Matrix 1
Key Terms 1
Analyzing the Current Configuration 2
Thinking Holistically 3
Assessing the Current Configuration 3
Accommodating Changing Capacity Requirements 4
Designing for Capacity Requirements 6
Analyzing Storage Requirements 6
Forecasting and Planning Storage Requirements 7
Analyzing Network Requirements 9
Analyzing CPU Requirements 11
Analyzing Memory Requirements 12
Specifying Software Versions And Hardware Configurations 13
Following Best Practices 14
Choosing a Version and Edition of the Operating System 14
Choosing an Edition of SQL Server 20
Choosing a CPU Type 22
Choosing Memory Options 23
Determining Storage Requirements 24
Planning for Hot Add CPUs and RAM 24
Skill Summary 25
Knowledge Assessment 26
Case Study 28

Lesson 2: Designing Physical Storage 30
Lesson Skill Matrix 30
Key Terms 30
Understanding SQL Server Storage Concepts 31
Understanding Data Files and Transaction Log Files 31
Understanding Pages 32
Understanding Extents 33
Estimating Database Size 33
Planning for Capacity 34
Data Compression 34
Sparse Columns 35
Understanding RAID 36
Designing Transaction Log Storage 37
Managing Transaction Log File Size 38
Designing Backup-File Storage 41
Managing Your Backups 41
Maintaining Transaction Log Backups 42
Backup Compression 42
Deciding Where to Install the Operating System 43
Deciding Where to Place SQL Server Service Executables 43
Specifying the Number and Placement of Files for Each Database 44
Setting Up Database Files 44
Setting Up Filenames 45
Setting Up File Size 45
Setting Up Database Filegroups 45
Designing Instances 45
Deciding on the Number of Instances 46
Deciding How to Name Instances 47
Deciding How Many Physical Servers Are Needed 49
Deciding Where to Place System Databases for each Instance 49
Deciding on the Tempdb Database Physical Storage 50
Establishing Service Requirements 52
Specifying Instance Configurations 52
Skill Summary 54
Knowledge Assessment 55
Case Study 55

Lesson 3: Designing a Consolidation Strategy 58
Lesson Skill Matrix 58
Key Terms 58
Phase 1: Envisioning 59
Forming a Team 59
Making the Decision to Consolidate 60
Developing Guidelines for the Consolidation Project 65
Examining Your Environment 66
Phase 2: Planning 73
Evaluating Your Data 74
Making Initial Decisions about the Plan 75
Case Study: Consolidating and Clustering 77
Planning to Migrate Applications 78
Case Study: Avoiding Scope Creep 79
Phase 3: Developing 80
Acquiring Your Hardware 80
Creating the Proof of Concept 81
Creating the Pilot 81
Phase 4: Deploying 82
Skill Summary 83
Knowledge Assessment 83
Case Study 83

Lesson 4: Analyzing and Designing Security 87
Lesson Skill Matrix 87
Key Terms 87
Gathering Your Security Requirements 88
Case Study: Gathering Requirements 89
Understanding Security Scope 89
Analyzing Your Security Requirements 90
Dealing with Conflicting Requirements 91
Analyzing the Cost of Requirements 92
Integrating with the Enterprise 93
Choosing an Authentication Method 94
Setting Up Using Groups and Roles 94
Assessing the Impact of Network Policies 96
Achieving High Availability in a Secure Way 97
Mitigating Server Attacks 99
Protecting Backups 101
Auditing Access 101
Making Security Recommendations 102
Performing Ongoing Reviews 102
Skill Summary 103
Knowledge Assessment 103
Case Study 103

Lesson 5: Designing Windows Server-Level Security 107
Lesson Skill Matrix 107
Key Terms 107
Understanding Password Rules 108
Enforcing the Password Policy 109
Enforcing Password Expiration 109
Enforcing a Password Change at the Next Login 110
Following Password Best Practices 110
Setting Up the Encryption Policy 110
Understanding the Encryption Hierarchy 111
Using Symmetric and Asymmetric Keys 111
Using Certificates 112
Considering Performance Issues 112
Developing an Encryption Policy 113
Managing Keys 114
Choosing Keys 114
Extensible Key Management 114
Introducing SQL Server Service Accounts 115
Understanding the SQL Server Services 115
Choosing a Service Account 117
Choosing a Domain User 118
Choosing a Local Service 118
Choosing a Network Service 119
Choosing a Local System 119
Case Study: Planning for Services 119
Changing Service Accounts 119
Setting Up Antivirus Software 122
Working with Services 123
Configuring Server Firewalls 124
Physically Securing Your Servers 125
Skill Summary 125
Knowledge Assessment 126
Case Study 126

Lesson 6: Designing SQL Server Service-Level and Database-Level Security 129
Lesson Skill Matrix 129
Key Terms 129
Creating Logins 130
Granting Server Roles 131
Mapping Database Users to Roles 132
Securing Schemas 133
Granting Database Roles 134
Working with Fixed Database Roles 134
Working with User-Defined Roles 135
Using Application Roles 136
Introducing DDL Triggers 137
Understanding DDL Trigger Scope 137
Specifying DDL Trigger Events 138
Defining a DDL Trigger Policy 139
Defining a Database-Level Encryption Policy 140
Transparent Data Encryption 140
Securing Endpoints 141
Introducing TDS Endpoints 142
Using SOAP/Web Service Endpoints 142
Working with Service Broker and Database Mirroring Endpoints 143
Defining an Endpoint Policy 143
Granting SQL Server Agent Job Roles 144
Case Study: Specifying Proxies 144
Designing .NET Assembly Security 145
Setting SAFE 145
Setting EXTERNAL_ACCESS 146
Setting UNSAFE 146
Skill Summary 146
Knowledge Assessment 147
Case Study 147

Lesson 7: Designing SQL Server Object-Level Security 150
Lesson Skill Matrix 150
Key Terms 150
Developing a Permissions Strategy 151
Understanding Permissions 152
Applying Specific Permissions 153
Analyzing Existing Permissions 154
Specifying the Execution Context 155
Implementing EXECUTE AS for an Object 155
Case Study: Developing an EXECUTE AS Policy for an Object 156
Implementing EXECUTE AS in Batches 157
Auditing 158
Developing an EXECUTE AS Policy for Batches 159
Specifying Column-Level Encryption 159
Choosing Keys 160
Deploying Encryption 160
Using CLR Security 161
Creating Assemblies 161
Accessing External Resources 162
Developing a CLR Policy 163
Skill Summary 164
Knowledge Assessment 165
Case Study 165

Lesson 8: Designing a Physical Database 168
Lesson Skill Matrix 168
Key Terms 169
Modifying a Database Design Based on Performance and Business Requirements 170
Planning a Database 170
Ensuring That a Database Is Normalized 171
Allowing Selected Denormalization for Performance Purposes 171
Ensuring That the Database Is Documented and Diagrammed 172
Designing Tables 173
Deciding Whether Partitioning Is Appropriate 174
Specifying Primary and Foreign Keys 175
Choosing a Primary Key 176
Using Constraints 180
Deciding Whether to Persist Computed Columns 182
Specifying Physical Location of Tables, Including Filegroups and a Partitioning Scheme 182
Designing Filegroups 182
Designing Filegroups for Performance 183
Designing Filegroups for Recoverability 184
Designing Filegroups for Partitioning 184
Designing Index Usage 184
Designing Indexes to Make Data Access Faster and to Improve Data Modification 185
Creating Indexes with the Database Tuning Advisor 186
Specifying Physical Placement of Indexes 186
Designing Views 187
Analyzing Business Requirements 187
Choosing the Type of View 188
Specifying Row and Column Filtering 189
Skill Summary 189
Knowledge Assessment 190
Case Study 190

Lesson 9: Creating Database Conventions 194
Lesson Skill Matrix 194
Key Terms 194
Understanding the Benefits of Database Naming Conventions 195
Establishing and Disseminating Naming Conventions 196
Defining Database Standards 200
Transact-SQL Coding Standards 200
Defining Database Access Standards 201
Deployment Process Standards 203
Database Security Standards 205
Skill Summary 205
Knowledge Assessment 206

Lesson 10: Designing a SQL Server Solution for High Availability 209
Lesson Skill Matrix 209
Key Terms 210
Examining High-Availability Technologies 211
Identifying Single Points of Failure 211
Setting High-Availability System Goals 212
Recognizing High-Availability System Limitations 213
Understanding Clustering 214
Understanding Clustering Requirements 215
Designing a Clustering Solution 216
Clustering Enhancements 218
Considering Geographic Design 219
Making Hardware Decisions 219
Addressing Licensing Costs 220
Understanding Database Mirroring 220
Designing Server Roles for Database Mirroring 221
Understanding Protection Levels 222
Designing a Database-Mirroring Solution 223
Configuring a Database-Mirroring Solution 224
Testing Database Mirroring 224
Mirroring Enhancements 225
Understanding Log Shipping 226
Choosing Log-Shipping Roles 226
Switching Log-Shipping Roles 227
Reconnecting Client Applications 227
Understanding Replication 228
Implementing High Availability with Transactional Replication 229
Case Study: Handling Conflicts 229
Implementing High Availability with Merge Replication 230
Designing Highly Available Storage 230
Designing a High-Availability Solution 233
Developing a Migration Strategy 235
Testing Your Migration 236
Minimizing Downtime 236
Implementing Address Abstraction 237
Training Your Staff 237
Skill Summary 237
Knowledge Assessment 238
Case Study 238

Lesson 11: Designing a Data Recovery Solution for a Database 242
Lesson Skill Matrix 242
Key Terms 242
Backing Up Data 243
Restoring Databases 246
Devising a Backup Strategy 249
Designing a Backup and Restore Strategy: The Process 251
Choosing a Recovery Model 253
Developing Database Mitigation Plans 257
Skill Summary 261
Knowledge Assessment 262
Case Study 262

Lesson 12: Designing a Data-Archiving Solution 265
Lesson Skill Matrix 265
Key Terms 265
Deciding to Archive Data? 266
Determining Business and Regulatory Requirements 267
Case Study: Presenting a Data-Archiving Scenario 267
Determining What Data Will Be Archived 269
Selecting a Storage Media Type and Format 271
Developing a Data-Movement Strategy 273
Designing a Replication Topology 274
Introducing Transactional Replication 276
Skill Summary 281
Knowledge Assessment 281
Case Study 281

Glossary 285
Index 287

LESSON 1
Designing the Hardware and Software Infrastructure

LESSON SKILL MATRIX

TECHNOLOGY SKILL 70-443 EXAM OBJECTIVE


Design for capacity requirements. Foundational
Analyze storage requirements. Foundational
Analyze network requirements. Foundational
Analyze CPU requirements. Foundational
Analyze the current configuration. Foundational
Analyze memory requirements. Foundational
Forecast and incorporate anticipated growth requirements into the capacity requirements. Foundational
Specify software versions and hardware configurations. Foundational
Choose a version and edition of the operating system. Foundational
Choose a version of SQL Server. Foundational
Choose a CPU type. Foundational
Choose memory options. Foundational
Choose a type of storage. Foundational

KEY TERMS
budgetary constraint: Limits placed on your ability to invest as much as you might wish in an
infrastructure improvement project.
capacity: A measure of the ability to store, manipulate, and report information collected for the
enterprise. Excess capacity suggests a declining business need or too much investment in
infrastructure.
horizon: A forecasting target. A horizon too far distant may result in capacity or other changes
that don't prove needed; a horizon too near may result in investments that don't meet
tomorrow's needs.
policies: A set of written guidelines providing direction on how to process any number of issues
(e.g., a corporate password policy).
regulatory requirements: A set of compliance directions from an external organization. This
could be a governmental agency (e.g., the regulator of the Sarbanes-Oxley Act or HIPAA) or
your corporate headquarters.
security measures: The steps taken to assure data integrity.


There is an old saying among carpenters and woodworkers when preparing to work with
a good or particularly special piece of hardwood: “Measure it twice, and cut it once.” In
other words, make sure of what you’re doing and then get it right the first time, because it
might be the only chance you have.

Added to that old saying are others: “No one plans to fail, they fail to plan”; “Failure to plan
on your part does not constitute an emergency on mine”; and “A house built on sand will fall
down.” The point is to emphasize the role that careful planning and design of the underlying
support structure, the infrastructure, plays in the successful completion of any project—from a
child’s dollhouse to a family vacation to a career as a database administrator.
If you were going to build a house, either for yourself or someone else, the first thing you’d
want to know is how it will be used and how big it needs to be. To find out, you’d ask your-
self (or a client) some key questions: How much land is available on which to build? How
many people will live there? Does the couple have plans for additional children? Will it be
only a house, or will it serve as a home office? Is a separate section with a separate entrance
required? How much money and resources are available? With that information, you’d then
design and build accordingly. When it comes to a database server infrastructure, you need to
do the same.
To reemphasize: It’s very important that you grasp the underlying premise of this lesson and the
rest of the book. If you understand how to plan and design a database infrastructure and how to
successfully implement those plans, you will reap enormous benefits in terms of time saved and
resources properly allocated while increasing the probability that your activities will succeed.
The process of designing infrastructure often depends more on your understanding of the
underlying premises than on a single set of rules. Every infrastructure you design or work on
will need to meet unique requirements—there is no “one size fits all.” Unless you understand
the “hows” and “whys” of your process, the end result will be far from satisfactory.
In this lesson, you’ll take the first steps toward designing a database server infrastructure. Like
anything you build, whether a birdhouse or the Great Pyramid, the foundation is the key. First
you must review strategies for assessing your current configuration and gathering data about the
current capacity of key resources such as storage, CPU, memory, and network bandwidth. You
will then see how to use this data, along with the business requirements of the organization,
to estimate future capacity needs. The second part of the lesson looks at how to specify
software versions and hardware configurations that meet the organization's requirements.

■ Analyzing the Current Configuration

THE BOTTOM LINE: You must first understand—completely—that which exists—the "as-is." The second step
involves understanding what is desired—the "to be." And finally, you must understand the
plans for bridging the "gap"—the implementation strategy.

As you almost certainly know, you’ll rarely, if ever, be involved in designing a completely
new database server infrastructure. You’ll nearly always be working with an organization that
has an existing infrastructure that needs to change to meet enterprise growth and to enhance
performance.
In that case, the first step is not to reach for a piece of paper to draw your dream infrastructure.
Instead, the first step is to evaluate the various subsystems of the existing infrastructure and
figure out what you have to work with. This initial evaluation process will aid you in assessing
how well the different subsystems interact and will also highlight potential trouble spots.
Next, you should gather the requirements you need to have in place for the modified
infrastructure. These requirements may be technical or business related (they’re usually both),
and they need to be prioritized in that context. Once you’ve established the requirements, set the
priorities, and determined the funding levels, you can design modifications to the infrastructure.

TAKE NOTE: When designing modifications, a good practice is to standardize the hardware and software
configuration of database servers as much as possible. Doing so simplifies the design of the
infrastructure and reduces the maintenance overhead. In addition, standardization results
in significant cost savings.

Thinking Holistically

Whether before or after you’ve analyzed the capacity needs of the enterprise’s individual
database servers, you must at some point—the earlier the better—evaluate the existing
database server infrastructure as a whole. This view can give you a quick assessment of the
overall health of the infrastructure and help you determine any recurring trouble spots.

You should also think in terms of the ideal. Are the databases optimally designed? Are
disk-storage systems being used effectively? Are CPU and memory types and allocations
appropriate? Is the network properly designed and prepared for the new infrastructure?
You should use your evaluations to determine what modifications should be made to the
infrastructure to support business growth. And you should be able to make the business case
for your recommendations.

Assessing the Current Configuration

You should take a number of steps at the outset to assess the condition of the current
database server configurations:

DOWNLOAD: You can download the latest SQL Server service packs and apply hotfixes at the
SQL Server Web site at http://www.microsoft.com/sqlserver.

X REF: For more information about the service accounts for SQL Server services, refer to the
topic "Setting Up Windows Service Accounts" in SQL Server 2005 Books Online.

• Check that licenses conform to your actual implementation.
• Inventory the operating system versions, service packs, and hotfixes running on each
database server.
• Verify whether the necessary OS service packs and hotfixes have been applied.
• Identify any compatibility issues between the operating environment and the
applications running on the database server.
• Inventory the SQL Server versions, service packs, and hotfixes.
• Verify whether the latest service packs or hotfixes have been applied.
• Inventory what SQL Server services are running on each database server and what
service accounts have been assigned to each. To do so, you can use SQL Server 2005
Configuration Manager. A short list of important services includes:
° SQL Server Database Engine
° SQL Server Agent
° SQL Server Full-Text Search
° SQL Server Reporting Services (SSRS)
° SQL Server Analysis Services (SSAS)
° SQL Server Browser
° SQL Server Integration Services (SSIS)
• Inventory hardware configurations, including disk subsystems, CPUs, memory, network
cards, and power supplies on database servers. Make note of RAID and/or SCSI use.
Identify all servers in a cluster, if a clustering environment is being used.
ANOTHER WAY
Use the stored procedure sp_configure with “show advanced options” to display the current settings. SQL Server Configuration Manager can help you collect network configuration data, such as the libraries, protocols, and ports for each instance. A sample query that gathers several of these settings appears after this list.

X REF
Lesson 2 discusses disk subsystems and physical storage design considerations in more detail.

• Record SQL Server configuration settings. Record the minimum and maximum memory settings, the CPUs used by the instance, and the default connection properties for each SQL Server instance.
• Review the configuration management process for proposing, approving, and implementing configuration changes, and identify opportunities to make the process more efficient. What tools are used?
• Assess the quality of the database server documentation.
• Verify the capabilities of disk subsystems and physical storage. Determine whether the RAID levels of existing disk subsystems support data availability and performance requirements.
• Determine the locations of transaction log files and data files.
• Examine the use of filegroups.
• Are adequate data-file sizes allocated to databases, including the tempdb database?
• Verify that the AutoShrink property is set to False so that SQL Server does not repeatedly shrink and then regrow the operating system files that hold table and index data.
• Determine whether disk-maintenance operations, such as defragmentation, are performed periodically.
• Assess Event Viewer errors to identify disk storage-related problems.
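If you prefer to script part of this inventory, the following minimal T-SQL sketch pulls the version, service-pack level, edition, and memory settings for one instance, and checks the AutoShrink property for a single database. The database name AdventureWorks is only a placeholder; substitute one of your own.

-- Version, service-pack level, and edition of this instance.
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
       SERVERPROPERTY('ProductLevel')   AS ServicePackLevel,
       SERVERPROPERTY('Edition')        AS Edition;

-- Expose advanced options, then record the instance memory settings.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'min server memory (MB)';
EXEC sp_configure 'max server memory (MB)';

-- AutoShrink should normally be disabled (0) for production databases.
SELECT DATABASEPROPERTYEX('AdventureWorks', 'IsAutoShrink') AS IsAutoShrink;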

Accommodating Changing Capacity Requirements

Requirements analysis is key to the process of designing modifications to a database server


infrastructure. Just as you need to know the purpose of a house in order to build one that
meets your needs, you must properly identify the business requirements in order to design
your infrastructure. Otherwise, your design won’t meet the needs of the organization; and
not only can you forget professional pride, you’ll be lucky if you still have a job.

It’s essential that you always work in a collaborative way with company management, IT staff,
and end users to identify both the technical and business requirements that the new database
infrastructure is expected to support.
There is an intricate dance between the technical aspects and the nontechnical aspects of a
project, and weaving them together seamlessly is one of your most important, if never really
specified, jobs. Technical aspects and requirements typically focus on tasks such as capacity,
archiving, distribution, and other needs. These are affected by business requirements that
include budgetary and legal restrictions, company IT policies, and so on. Successful compre-
hension of both sets of requirements allows you to know precisely the scope of modifications
to the infrastructure and establishes a valuable foundation on which to base your design and
modification decisions.
When designing modifications to a database server infrastructure, you must consider its
future capacity needs based on the projected business growth of the organization. In addition,
you must consider requirements pertaining to data archiving, database server consolidation,
and data distribution.

CONSIDERING TECHNICAL REQUIREMENTS


The rest of this lesson introduces specific capacity needs, usually in the context of a single server. It’s crucial to a successful design, however, that you also analyze and identify the capacity requirements of the database server infrastructure as a whole.
Because it’s difficult to extrapolate the capacity needs of the entire infrastructure, you may not
always be able to project growth except in qualitative and general terms. You should, nonethe-
less, answer these questions for your future planning estimates and projections:
X REF
Lesson 12 covers data archiving in detail.

X REF
Lesson 3 covers database consolidation.

• Is the enterprise planning, or likely, to experience growth through increases in business operations, customer base, or increased demand for databases?
• Are there plans to utilize applications that require additional databases?
• Are there plans to improve the database server hardware?
• What cost variations, such as a decrease in the cost of servers and storage devices, will you see because of market forces?
• What data archiving requirements exist? Will these change? Do they differ by department?
• What regulatory requirements are in place or are being contemplated? What security components do they involve?
• Are any database servers potential candidates for consolidation?
• Are there opportunities to optimize the data-distribution process through simplification and/or standardization?

CONSIDERING BUSINESS REQUIREMENTS


The nontechnical, business aspects of an organization or enterprise play a major role in deter-
mining the shape and scope of any infrastructure system you design. When considering busi-
ness requirements, you should be aware of any and all budgetary constraints, IT policies,
and industry-specific regulations. Additionally, you should consider the organization’s data
security measures and availability needs and requirements:
• Budgetary constraints. The amount of money an organization is willing to spend will
obviously affect the sort of database server infrastructure you can design. You can budget
funds for a project in a number of ways, but one of your key roles in the process is to
design within the constraints imposed by the bottom line. An ancillary role you play is
to make business cases when the budget is unrealistic or when spending money now may
produce a better return on investment (ROI) later.
• Existing IT policies. Any modifications to a database infrastructure must comply with the existing IT policies of the organization. Normally, these policies cover the following:
° Remote access procedures and rules
° Security policies including encryption
° Service-level agreements (SLAs)
° Standard hardware and software configurations
• Regulatory requirements. With the collection of data come twin demands for greater
privacy and security; at the same time, there are demands for data retention. All these
demands, as you’ll see throughout this lesson and the rest of the text, translate into
infrastructure-related requirements. For example, the health care and banking industries
have strict privacy requirements that translate into requirements for data security, auditing, and retention of certain data for specific time periods. These requirements primarily affect disk-space, storage, and archiving needs and the related design considerations.
X REF
Lesson 4 covers database security. Lessons 5 through 7 cover other security-related issues.

• Data security. An organization’s data security requirements include the following:
° Confidentiality agreements with customers
° Privacy restrictions
° Data encryption needs
° External regulations
• Data availability. Typically, the overall infrastructure’s needs are a reflection of the data-
availability requirements applied to individual database servers and then generalized for
the entire infrastructure.
One other availability issue to consider applies only if the planned modification of
the database infrastructure results in a planned or unplanned loss of data availability.
Regardless of whether you anticipate a potential loss of availability across the infrastructure,
it’s important that you have solutions in place to prevent or minimize the loss.
This may include placing mission-critical data on a redundant site that can be used as an
emergency backup during infrastructure changes.
Now that you’ve examined some infrastructure-wide and general considerations for assessing,
planning, and modifying a database server infrastructure, you need to look at the process in
more detail, especially as it relates to specific capacity requirements.

■ Designing for Capacity Requirements

THE BOTTOM LINE
Your design must meet current and projected data storage requirements. You also must decide on a horizon when anticipating changes. You can anticipate near-term changes far better than long-term requirements. Make everyone aware of your forecasting periods.

There are two sources of capacity requirements: the business and technical requirements of
the organization. The technical requirements are dictated by need and availability. You should
also determine the business goals of the organization for which you’re developing the database
infrastructure. Without knowing those, you can’t analyze or forecast its capacity needs, any
more than you can build the best possible house without knowing what it will be used for.
With those two points in mind, you have two other key tasks to perform: assessing the cur-
rent capacity of system resources; and identifying any information, such as growth trends,
that you can use to forecast future needs. Most of the time, you can correlate the trends with
a variable that can be measured, such as the database transaction growth rate (the rate at
which the read/write activity on the database server grows).
In the following sections, you’ll learn how to gather data about the current capacity of key
system resources such as storage, CPU, memory, and network bandwidth. Then, you’ll learn
how to use the data to estimate future capacity needs, using that information to design
(or redesign when one exists) a database infrastructure.

Analyzing Storage Requirements

A lot of considerations go into analyzing the storage requirements of a database server. In


addition to the physical size of the database, you need to consider the transaction growth rate
and data-distribution requirements. Some industries, particularly financial and healthcare
institutions, are subject to requirements regarding data retention, storage, and security
that must be taken into account in determining storage capacity. You’ll now learn how to
determine the current storage capacity of a database server and identify factors that affect
its growth. We’ll also look at how to forecast future disk-storage requirements, taking into
account any relevant regulatory requirements that may apply to your business or enterprise.

ASSESSING CURRENT STORAGE CAPACITY


Typically, you won’t be starting from scratch when designing a database infrastructure; you’ll
be reviewing and upgrading an existing system. Even if the recommended upgrade calls for a
complete overhaul of the system, you need to be fully aware of the database server’s current
storage capacity. In making your survey and determination of current storage capacity,
consider the following factors:
• Disk-space capacity. Establishing disk-space capacity requires several steps. First,
determine how much disk space is used by the database data files. Then, add the
space required for the database’s transaction log files, the portion of tempdb that
supports database activity, and the space being used by full-text indexes. Look for any
maintenance operations that may require extra disk space for the database files, such as
index reorganization.
If you’re examining an existing system, make sure you base your measurement of the
current disk usage on a properly sized database and that adequate disk space is already
allocated for data and log files. If adequate disk space is allocated for these files, SQL
Server doesn’t need to dynamically grow the database and request extra disk space from
the operating system. The process of allocating extra disk space for a file uses significant
disk resources. In addition, the process can cause physical file fragmentation because disk
segments are added to an existing file.
• Disk throughput capacity. Next, assess the disk I/O rate that the database requires. You
can use System Monitor’s PhysicalDisk:Disk Read Bytes/sec and the PhysicalDisk:Disk
Write Bytes/sec counters to measure disk I/O. If the database receives mostly reads, also
check the Avg. Disk Read Queue Length counter value, which should be low for opti-
mal performance. If the database primarily receives writes, check the Avg. Disk Write
Queue Length counter value, which should also be low.
• Locations and roles of database servers. When you’re working with a distributed envi-
ronment, you should establish where the database servers are (and should be) and their
different roles, because that may require a different disk-capacity assessment for each site
and each server. For example, the servers at an organization’s branch offices may store
only a subset of the data that is stored on the main server at headquarters. Based on the
roles of the servers, you may be able to identify which databases are most likely to experience growth in disk-space usage or have particularly high or low disk-space requirements.

LAB EXERCISE
Perform the exercise in your lab manual.

In Exercise 1.1, you’ll learn how to use System Monitor to assess current disk throughput.
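If you want to cross-check the System Monitor figures from inside the database engine, the following sketch reports the allocated size of each data and log file and the cumulative I/O against it since the instance last started. It returns running totals rather than per-second rates, so sample it twice and compare the results if you need throughput.

-- Allocated size and cumulative I/O for every data and log file on the instance.
SELECT DB_NAME(vfs.database_id)   AS database_name,
       mf.physical_name,
       mf.size * 8 / 1024         AS allocated_mb,    -- size is stored in 8 KB pages
       vfs.num_of_bytes_read,
       vfs.num_of_bytes_written,
       vfs.io_stall               AS total_io_wait_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY vfs.num_of_bytes_read + vfs.num_of_bytes_written DESC;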

Forecasting and Planning Storage Requirements

When you’re building a house, you want to design it with an eye on future needs and
plans. The same applies to storage requirements. First, you need an idea of your future
needs. Armed with this information you can begin planning storage requirements. You
need to consider several key factors in planning for the future.

ESTABLISHING THE ESTIMATION PERIOD


Establish at the outset your planning period (in other words, the horizon or the length of
time for which the planning is valid). Are you establishing a plan that should be valid for one
year, two years, five years, or more? Review your assumptions regarding database needs for
the period. Are they valid? Do your estimates of the enterprise’s future, such as its anticipated
growth, match those of the enterprise? You will need to work with management and other
key stakeholders in the enterprise to determine how long the estimation period is and what
form it will take. Often, they will need to reconcile conflicting ideas to come to a consensus
on what the period should be. Don’t be surprised if they turn to you for expert advice and
mediation of internal disputes.

PROJECTING THE GROWTH RATE OF REQUIRED DISK SPACE


Obviously, you must estimate the amount of future disk space required. There are two ways
to do this effectively: You can either base your projection on an existing source of data that
correlates well with growth or you can follow a rule-of-thumb estimate when you don’t have a
correlating variable.
Ideally, you should correlate the growth rate with a measurable variable such as increases in
the transaction rate or user load. If you can identify such a variable, you can effectively esti-
mate future growth in disk space. For example, a clear correlation may already exist between
the growth in disk space for key tables and the number of new orders in a day. If so, and if
other variables are insignificant, then you can use the rate of growth in orders to estimate the
rate of growth in disk space required. If you don’t have a correlating variable, you can use past
growth trends to estimate future trends. In some cases, historical trends may be the only data
you have for estimation.
You can make an estimate of future trends using any of a number of formulas where
F = Future disk space
C = Current disk space
T = Number of growth periods
A = Growth amount per period
R = Rate of growth per period
Linear growth: If you expect disk space to grow by a constant amount in a specific period, the growth is linear. In that case, you can apply the following formula:
F = C + (A × T)
For example, if you have an 800 GB database that’s expected to grow 100 GB per year, in four years the database is expected to be 1200 GB: 800 GB + (100 GB × 4) = 1200 GB.
Compound growth: If you expect disk space to grow at a constant rate during a specific period (for example, at a certain percentage per month or per quarter), that growth is described as compound. In that instance, use the following formula:
F = C × (1 + R)^T
For example, if an 800 GB database is expected to grow by 3 percent per quarter for two years (eight quarters), the projected size after those eight quarters is about 1013 GB: 800 × (1 + 0.03)^8 ≈ 1013 GB. In other words, the database will need roughly 213 GB of additional disk space over the two years.

TAKE NOTE*
In this type of formula, you should express the growth rate as a decimal translation of the percentage value. For example, if the growth rate is 3 percent per quarter, use the value 0.03 in the formula. In this example, the number of periods is specified in quarters so that it’s consistent with the growth-rate unit.

Geometric growth: If the disk space is expected to grow periodically by some increment, but the increment itself is also growing at a constant rate, the disk space requirement grows geometrically. In this case, use the sum-of-a-series formula (also called a geometric series) to determine the projected size, where A is the initial increment and R is the rate at which the increment itself grows:
F = C + (A × (1 − (1 + R)^(T + 1))) / (1 − (1 + R))
For example, if a 1000 GB database grows by an increment that starts at 3 GB per month and increases at 2 percent per month, then in 24 months the total disk space required will grow to about 1096 GB: 1000 + (3 × (1 − 1.02^(24 + 1))) / (1 − 1.02) ≈ 1096 GB.

LAB EXERCISE
Perform the exercise in your lab manual.

In Exercise 1.2, you’ll try your hand at forecasting future disk-storage requirements.
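If you’d rather not work these projections out by hand, the same arithmetic can be run as T-SQL; the constants below simply reuse the numbers from the three examples above.

-- Linear: 800 GB growing 100 GB per year for 4 years.
SELECT 800 + (100 * 4) AS linear_gb;                                             -- 1200

-- Compound: 800 GB growing 3 percent per quarter for 8 quarters.
SELECT 800 * POWER(1.03e0, 8) AS compound_gb;                                    -- about 1013

-- Geometric: 1000 GB plus an increment starting at 3 GB/month, growing 2 percent/month, over 24 months.
SELECT 1000 + (3 * (1 - POWER(1.02e0, 24 + 1))) / (1 - 1.02e0) AS geometric_gb;  -- about 1096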
UNDERSTANDING THE IMPACT OF REGULATORY REQUIREMENTS
Legal considerations and/or governmental and financial regulations, such as those for banks
and the health care industry, can affect how long you need to retain data and how secure it
must be. Both of these factors in turn affect infrastructure design not only in terms of security
but also in terms of storage requirements:
• Longevity. Regulations may specify the length of time for which data must be
maintained. Something as simple as immunization records, for example, needs to be
retained for 20 years. Banks may also be required to store certain types of customer data
for a specific number of years. Before estimating disk-space capacity, determine what
data must be available online. For any data you consider storing offline, assess how
quickly the data must be available for online access. Also consider the type of offline
media that will be used. Technology changes quickly and being able to read 20-year-old
media could be an issue. Tapes deteriorate with time and even if backup tapes are in
perfect condition, a functional tape drive of the correct type would be needed in order
to read these tapes.
• Privacy/Security. Regulations, industry guidelines, or legislation may mandate that
security measures, including encryption, be undertaken to protect consumer data. For
example, health insurers may be required to ensure the privacy of data. Such regulations
affect the data distribution strategy and, consequently, the disk-space capacity of local
and remote servers.
Privacy-related regulations may require data to be stored in an encrypted format. In SQL
Server, you can store data in an encrypted format by using several encryption algorithms.
However, encrypted data requires more disk space than nonencrypted data. In addition,
encryption increases CPU usage.
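As a rough illustration of that point, the sketch below encrypts one column value with a symmetric key. All of the names here (the key, the certificate, and the dbo.Customer table with its varbinary NationalIdEncrypted column) are invented for the example; the detail to notice is that the varbinary ciphertext column must be wider than the plaintext it protects, which is where the extra disk space goes.

-- Hypothetical table holding one encrypted column.
CREATE TABLE dbo.Customer
(
    CustomerId          int IDENTITY(1,1) PRIMARY KEY,
    CustomerName        nvarchar(100),
    NationalIdEncrypted varbinary(128)   -- wider than the plaintext it protects
);

-- One-time key setup for the database (names and password are illustrative only).
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Us3AStr0ngP@ssw0rd!';
CREATE CERTIFICATE CustomerCert WITH SUBJECT = 'Customer data protection';
CREATE SYMMETRIC KEY CustomerKey
    WITH ALGORITHM = AES_256
    ENCRYPTION BY CERTIFICATE CustomerCert;

-- Encrypt a value on the way into the table.
OPEN SYMMETRIC KEY CustomerKey DECRYPTION BY CERTIFICATE CustomerCert;
INSERT INTO dbo.Customer (CustomerName, NationalIdEncrypted)
VALUES (N'Contoso', EncryptByKey(Key_GUID('CustomerKey'), N'123-45-6789'));
CLOSE SYMMETRIC KEY CustomerKey;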

Analyzing Network Requirements

All database administrators and infrastructure designers should have a nuts-and-bolts


understanding of the topology and capacity of the network supporting the database
servers because this impacts infrastructural decisions. Available bandwidth, for example,
plays a large part in determining the backup strategies you use and the types of database
services you implement. In the following sections, you’ll learn how to identify the data-
base components of the network topology. You’ll also look at factors you should consider
when analyzing the current network traffic. Finally, you’ll learn how to estimate future
network-bandwidth requirements.

IDENTIFYING THE PARTS OF THE NETWORK THAT MAY AFFECT


DATABASE TRAFFIC
Obtain or create a network diagram to identify the parts of the network that
• Deliver replicated data to other servers
• Back up files to network devices
• Provide data to client applications
Identify and assess the location of the following items to determine weak points and potential
bottlenecks in the topology, such as low-bandwidth wide-area network (WAN) links. Also be
aware of the security aspects of the network and the impact they have on traffic:
• Local and remote connections between database servers
• Firewalls
• Antivirus applications
Assess capabilities of the database servers on the network by gathering the following
information about each:
• Number of SQL Server instances
• Instance names
• Installed SQL services
• Network protocols
LAB EXERCISE
Perform the exercise in your lab manual.

In Exercise 1.3, you’ll use SQL Server Configuration Manager to gather information about database servers.

ANALYZING CURRENT DATABASE NETWORK TRAFFIC


You should analyze current database network traffic to estimate whether, and how long, the
existing network can support your database server infrastructure. If the network can’t effec-
tively handle current or future increases in traffic as a result of business growth, the perfor-
mance of the database servers on the network will be adversely affected by traffic bottlenecks.
Analyze the traffic between servers and between clients and servers. Then, use the data you
gather to identify potential bottlenecks. Key areas to review include:
• Traffic between servers. Use System Monitor counters to analyze the traffic caused by backup processes, database mirroring, and replication (a query that surfaces these counters from inside SQL Server appears after this list):
° Backup processes. The SQLServer: Backup Device:Device Throughput Bytes/sec
counter specifies the number of bytes per second that the backup device currently
supports. You should also review the backup strategy. If the amount of data is large and
available network bandwidth is low, frequent backups to network devices can saturate
the network.
° Database mirroring. The SQLServer:Database Mirroring:Bytes Sent/sec and Bytes
Received/sec counters indicate the number of bytes transferred from the principal
server to the mirror server.
° Replication. No single set of counters in System Monitor helps you analyze all
replication traffic; hence, what you need to use will depend on the type of replication
being used. In the case of subscribers, for example, you can monitor commands
delivered per second or transactions received per second.
• Traffic between clients and servers. Among other things, you must determine the client traffic on the network, assess how well the current network supports the user load, and identify the additional traffic that will be caused by an increase in user load or changes in the application. A useful technique is to use the System Monitor Network Interface: Bytes Total/sec counter to establish the number of bytes transferred across the database server’s network interface. You need to do this for each network interface on the server. Check whether a correlation exists between the Network Interface: Bytes Total/sec counter and the SQLServer:General Statistics:User Connections counter. By doing this, you can determine the network traffic caused by users.
• Potential bottlenecks. Running Network Monitor on the database server lets you determine the number of bytes used, the percentage of the total bandwidth used, and the number of bytes transmitted in a specific period. You can also filter specific patterns and protocols for a more granular approach. Analyze this data to identify bottlenecks, and work with the network administrator on strategies to eliminate the bottlenecks.

TAKE NOTE*
To use Network Monitor for monitoring database traffic across servers, use the Network Monitor version included in Systems Management Server (SMS).
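Many of the SQL Server counters named above, including the database mirroring counters, are also visible from inside the engine through the sys.dm_os_performance_counters view, which can be handy when you don’t have System Monitor access to the server. A minimal sketch follows; backup-device rows appear only while a backup is actually running, and the mirroring counters report zero unless database mirroring is configured.

-- Database mirroring and backup-device throughput counters, as seen from inside SQL Server.
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Bytes Sent/sec', 'Bytes Received/sec',
                       'Device Throughput Bytes/sec')
ORDER BY object_name, counter_name;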

FORECASTING AND PLANNING NETWORK REQUIREMENTS


Now that you have begun to grasp the present network situation, you should take time to
forecast network traffic growth for database servers. As with all forecasting, there are no hard
and fast rules (nor guarantees of being 100 percent accurate—just ask a weatherman), but
you should do the following:
• Make growth estimates for each network type. Because different network types
support varying data flows and volumes, it’s essential that you assess each network type
individually when determining growth estimates. A good tool is the network diagram
mentioned earlier in this section. Use it to help determine expected network traffic that
each segment may need to support.
• Establish a baseline, and study the trends. Before you can even think about estimating
future network traffic growth, you need to establish a baseline of network usage. As your
network grows, you should use that baseline, and others collected at intervals, to deter-
mine changes in usage from the baseline(s) and identify growth trends.
• Understand specific business needs and the expected workload for the estimation
period. Although understanding the technical aspects is key to understanding the design
of the network, it’s important to keep in mind the activity trends and growth projections
for the enterprise. Review these business plans, and then estimate future usage and deter-
mine the network configuration that is required to support the plans. You should also
make sure you confirm the plans and gather statistics periodically so that you can detect
new trends early and adjust your estimates.
Analyzing CPU Requirements

The CPU is the heart of a computer and the heart of your database server infrastructure.
You’ll now learn about what you need to take into account when analyzing the CPU
performance of a database server and when choosing a processor type. We’ll also look at
what you should do to make meaningful estimates of future CPU requirements.

You’ll review considerations for choosing a CPU, such as the performance-versus-cost benefit
of using processors with 32-bit and 64-bit architectures, and of using processors with multi-
core and hyperthreading technologies, later in this lesson.

ASSESSING CURRENT CPU PERFORMANCE


When you’re analyzing the current CPU performance of a database server, consider the following factors:

TAKE NOTE*
Dynamic affinity is tightly constrained by per-CPU licensing. When you set the affinity mask, SQL Server verifies that the settings don’t violate the licensing policy.

• Type of CPUs. Identify the database servers in the system. For each, make a list of its current CPU, speed, architecture (32-bit or 64-bit), and whether the processor is multicore or capable of hyperthreading.
• Affinity mask settings. By default, each thread allocated by a SQL Server instance is scheduled to use the next available CPU. However, you can set the affinity mask to restrict an instance to a specific subset of CPUs. Additionally, setting the affinity mask ensures that each thread always uses the same CPU between interrupts. This reduces the swapping of a thread among multiple CPUs and increases the cache-hit ratio on the second-level cache. A sketch of viewing and setting the mask appears after this list.
• Current CPU usage. To identify any CPU performance issues, you should set a base-
line of CPU usage in the current environment. To do so, first collect basic operations
data, such as the number of user connections and the amount of application data. Next,
establish the current CPU usage using monitoring tools such as System Monitor. Finally,
correlate the operations data with the CPU usage.
Hardware bottlenecks, recompilation of stored procedures, and the use of cursors
are some of the main causes of a decrease in CPU performance. To identify CPU
performance problems, use the counters that are included in System Monitor’s
SQLServer: Plan Cache and SQLServer:SQL Statistics objects.
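The affinity mask mentioned above is an ordinary advanced sp_configure option. The following sketch records the current value and then, purely as an illustration, binds the instance to CPUs 0 through 3 (bitmask 15); choose a mask that matches your own hardware and licensing before running anything like it.

-- Affinity mask is an advanced option, so expose advanced options first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Record the current affinity mask, then bind the instance to CPUs 0-3 (0x0F = 15).
EXEC sp_configure 'affinity mask';
EXEC sp_configure 'affinity mask', 15;
RECONFIGURE;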

FORECASTING AND PLANNING CPU REQUIREMENTS


Once you have a good understanding of the current CPU situation in your environment, you
should next assess and forecast future CPU needs in order to ensure the efficient and effective
operation of your database server infrastructure.
There are always a variety of indicators and factors to review, but the following are critical:
• Determine the estimation period. By now you may be tired of looking at this
consideration, but you must keep this in mind in order to limit the scope of your
forecast activities to a realistic level.
• Establish a baseline of CPU usage by using historical data. When analyzing data on
CPU usage, consider the type of work the SQL Server instance performs.
• Identify factors that affect CPU usage. These factors obviously include the number
of users and the amount of application data on the servers. You should also observe
variations over time and try to find a correlation with measurable variables.
• Confirm your estimates by performing load tests and by using sizing tools. Keep in
mind that adding CPUs to a server doesn’t necessarily increase the overall CPU power
in a linear proportion.

TAKE NOTE*
Some operating system licenses restrict the number of CPUs that may be used. Planning to use more CPUs must take any such limitations into account.
Analyzing Memory Requirements

If the CPU is the heart of the computer, then memory is a combination of muscle,
sinew, and brainpower. As a general rule, when assessing the memory requirements of a
database server, you need to determine the amount of memory being used by the OS,
other processes on the server, and SQL Server. It’s also important to bear in mind that
memory usage is affected by the type of CPU. In the following sections, you’ll learn about
determining the current memory usage of a database server, the interaction between
memory usage and CPU type, and how to estimate future memory requirements.

ASSESSING CURRENT MEMORY USAGE


Assessing current memory usage isn’t that difficult thanks to a number of tools, the most
important of which is System Monitor. To determine current memory usage and assess the
ability to satisfactorily meet the needs of the current environment, do the following:
• Establish how much physical memory is installed on the database server.
• In addition to the OS and SQL Server, determine what other processes will be making
use of the available memory.
• Use these System Monitor counters to determine how much memory is available and
used on a server:
° Memory:Available Bytes indicates how many bytes of memory are available.
° Memory:Pages/sec specifies how many pages must be read from the disk or written
to the disk to resolve page faults.
° SQLServer:Memory Manager:Total Server Memory determines the amount of
physical memory used by each instance of SQL Server.
° Process:Working Set describes the set of memory pages that have been recently
accessed by the threads running in the process and can be used to determine how
much memory SQL Server is using.
° SQLServer:Buffer Manager:Buffer Cache Hit Ratio identifies the percentage of pages that were found in the buffer pool without reading from disk. The value of this counter should be over 90 percent. High values indicate good cache usage and minimal disk access when searching for data.
° SQLServer:Buffer Manager:Page Life Expectancy specifies the average time spent
by a data page in the data cache. A value of less than 300 seconds indicates that SQL
Server needs more memory.
• In addition to the System Monitor tool, you can use the following dynamic management views to collect data about SQL Server memory (a sample query appears later in this section):
° sys.dm_exec_query_stats provides statistics on memory and CPU usage for a
specific query.
° sys.dm_exec_cached_plans returns a list of the query plans that are currently cached
in memory.
° sys.dm_os_memory_objects provides information about object types in memory,
such as MEMOBJ_COMPILE_ADHOC and MEMOBJ_STATEMENT.
° sys.dm_os_memory_clerks returns the set of all memory clerks (memory clerks
access memory node interfaces to allocate memory) that are currently active in the
instance of SQL Server.

TAKE NOTE*
When you’re trying to establish actual memory usage and peak usage, make sure you collect information over a complete business cycle in order to obtain the most accurate data. For example, if an organization generates a large number of reports the first week of each month, collect the peak usage data when those reports are generated.
• Determine whether the database size and the available memory are well matched. If the database, including its indexes, fits completely into the available memory, there will be no page faults. When a large database can’t fit in memory, some data must be retrieved from the disk when required. Page faults can be minimized by using the buffer cache efficiently.
• Determine the amount of memory being used by SQL Server connections.

LAB EXERCISE
Perform the exercise in your lab manual.

In Exercise 1.4, you’ll use System Monitor to assess memory requirements.
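As a starting point for that assessment, the following sketch reads two of the memory indicators discussed above directly from SQL Server and then lists the largest memory consumers by clerk type. The single_pages_kb and multi_pages_kb columns apply to SQL Server 2005 and 2008; later versions rename them.

-- Page life expectancy and total server memory, read from the performance-counter DMV.
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Page life expectancy', 'Total Server Memory (KB)');

-- The five largest memory consumers by clerk type (buffer pool, plan cache, and so on).
SELECT TOP (5) type,
       SUM(single_pages_kb + multi_pages_kb) AS allocated_kb
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY allocated_kb DESC;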
The point of all this collecting and reviewing is to identify trouble spots that need to be addressed in the current configuration or that will play a role in modification and future growth of the infrastructure.

TAKE NOTE*
Microsoft has replaced the Microsoft Operations Manager (MOM) product with System Center Operations Manager (SCOM).

Consequently, you should track memory usage values regularly and establish a baseline. To gather data for establishing the baseline, you can use the System Monitor counters for SQL Server memory usage. When they’re present, you can also use management tools such as Microsoft’s System Center Operations Manager (SCOM) to gather data on memory usage across a set of enterprise database servers.

You should also establish minimum and maximum usage values. Using this data ensures that the memory usage for the current period doesn’t exceed the established limits. If you compare current memory usage values with the baseline, you can assess whether SQL Server has sufficient memory for normal operations. If memory is insufficient, the database server is said to be under memory pressure, a circumstance that needs to be addressed sooner rather than later.

FORECASTING AND PLANNING MEMORY REQUIREMENTS


Once you have a good understanding of the current memory requirements, you should next
estimate future memory needs. In order to do so, establish the following information:
• Determine the number of SQL Server instances. Ensure that the server has the
optimal number of SQL Server instances. (For more information on instances, see
Lesson 3.) Running too many instances can cause memory pressure. In some cases, such
as in a multiple-instance cluster during a failover, multiple instances may be required.
• Estimate database growth. Because the memory needed by a database may grow if the
size of the database and the volume of data that is queried increases, adding new data-
bases to the same SQL Server instance may also create memory pressure.
• Specify the number of concurrent users. An increase in the number of user connec-
tions may result in a wider range of queries with varied parameters. This increases the
pressure on memory because more data is cached in response to the queries.
• Use baseline data. Once you have the baseline data, you can collect current usage infor-
mation and then compare it to historical data to identify growth trends.
• Determine the rate of growth in memory usage. By correlating the rate of growth in
memory usage with a measurable variable, such as user connections, you can estimate
future memory requirements. For example, if you determine that the data cache increases by 50 percent for every 500 users, you can estimate the additional memory the server will require when the number of users increases by another 50.

■ Specifying Software Versions and Hardware Configurations

THE BOTTOM LINE
Microsoft employs lots of programmers working diligently to change operating systems and applications. Intel, AMD, and Nvidia work long hours to make “improvements.” Knowledge changes. Truth is time dependent. Your challenge is to keep up. This is no mean feat.

In this section of the lesson, you won’t learn every single aspect of hardware and software
selection that you should apply. Given the rate of hardware and software change, any specific
recommendations would likely be out of date by the time this textbook reaches the shelves.
However, you’re going to need to make these decisions every step of the way. There are no
hard-and-fast rules, but there are best practices. Apply them, and you can’t go wrong.

Following Best Practices

You should apply the following best practices when selecting database server hardware
and software:

• Meet or exceed the design requirements. Based on your assessment, the safest and
most effective course is to at least meet design requirements not only for the present but
also for the future. Be wary of technology obsolescence. Often, the best way to do that is
to deliberately select hardware and software that exceed the requirements.
• Perform cost-benefit analyses. When choosing hardware or software, always perform
a cost-benefit analysis. The purpose is to ensure that the benefit you’ll obtain from the
new item is justifiable in terms of increased throughput that offsets the cost. Spending
money on a new high-speed memory bus may be warranted on a server with a heavy
workload because the new hardware will increase the server’s performance; but it
wouldn’t be appropriate for a small server with little workload, such as a server used to
store archival data.
• Choose from approved hardware and software configurations. Most organizations
have a list of approved hardware and software products and configurations that
restricts your choices. One benefit is that the standardization of hardware and
software reduces the complexity of the database server infrastructure and thereby
simplifies maintenance and reduces implementation costs. Therefore, you should
try to make selections that are within the framework of the hardware and software
standards established by your organization. Keep in mind that it’s better to choose
database-server hardware and software products that have already been successfully
used in a given environment.
Be prepared to justify variations from standards. Although it’s better to stay with a
standardized setup, it isn’t uncommon for standards to lag behind improvements and
upgrades in database server hardware and software products. Consequently, you may need
to make hardware and software choices that vary from the current standards.
Design requirements should supersede standards, but these variations need to be approved by
management. The best ways to justify a variation are to clearly demonstrate that the existing
standards don’t allow you to meet specific design requirements and to describe how the
hardware and software products you’ve proposed meet those requirements.
Bear in mind that just because the variation may be justified doesn’t always mean a
variation or standards change will be approved. Variations from existing standards are
frequently rejected, usually for budgetary reasons. If that occurs, you need to identify
alternative hardware and software products that come closest to fulfilling the requirements.

Choosing a Version and Edition of the Operating System

The version and edition of the operating system you use, if not predetermined by
organizational standards, depends on the version of SQL Server you select. Table 1-1
specifies the minimum versions and editions of the operating system required for each
edition of SQL Server 2005. Table 1-2 specifies the equivalent information for each
edition of SQL Server 2008.
Table 1-1
SQL Server 2005 editions and minimum operating system versions and editions

EDITION                              OPERATING SYSTEM VERSION AND EDITION
SQL 2005 Enterprise (64-bit) x64     Windows Server 2003/2008: Standard x64, Enterprise x64, or Datacenter x64 edition with Service Pack 1 or later
SQL 2005 Standard (64-bit) x64 XP Professional 64 or later; Windows Server 2003/2008:
Standard x64, Enterprise x64, or Datacenter x64 edition
with Service Pack 1 or later
SQL 2005 Developer (64-bit) x64 Windows XP Professional 64 or later; Windows Server
2003/2008: Standard x64, Enterprise x64, or Datacenter
x64 edition with Service Pack 1 or later
SQL 2005 Enterprise (64-bit) IA64 Microsoft Windows Server 2003/2008 Enterprise
Edition or Datacenter edition for Itanium-based
systems with SP 1 or later
SQL 2005 Standard (64-bit) IA64 Microsoft Windows Server 2003/2008 Enterprise
Edition or Datacenter edition for Itanium-based
systems with SP 1 or later
SQL 2005 Developer (64-bit) IA64 Microsoft Windows Server 2003/2008 Enterprise
Edition or Datacenter edition for Itanium-based
systems with SP 1 or later
SQL 2005 Enterprise (32-bit) Windows 2000 Server with Service Pack 4 or later;
Windows Server 2003/2008: Standard, Enterprise,
or Datacenter edition with Service Pack 1 or later;
Windows Small Business Server 2003 with Service
Pack 1 or later; Windows 2000 Professional with
Service Pack 4 or later
SQL 2005 Standard (32-bit) XP with Service Pack 2 or later; Windows 2000
Server with Service Pack 4 or later; Windows Server
2003/2008: Standard, Enterprise, or Datacenter
edition with Service Pack 1 or later; Windows Small
Business Server 2003 with Service Pack 1 or later;
Windows 2000 Professional with Service Pack 4 or
later
SQL 2005 Workgroup XP with Service Pack 2 or later; Windows 2000
Server with Service Pack 4 or later; Windows Server
2003/2008: Standard, Enterprise, or Datacenter
edition with Service Pack 1 or later; Windows Small
Business Server 2003 with Service Pack 1 or later;
Windows 2000 Professional with Service Pack 4
or later
SQL 2005 Developer (32-bit) Windows XP with Service Pack 2 or later; Windows
2000 Server with Service Pack 4 or later; Windows
Server 2003/2008: Standard, Enterprise, or
Datacenter edition with Service Pack 1 or later;
Windows Small Business Server 2003 with Service
Pack 1 or later; Windows 2000 Professional with
Service Pack 4 or later
SQL 2005 Express                     Windows XP with Service Pack 2 or later; Windows 2000 Server with Service Pack 4 or later; Windows Server 2003/2008: Standard, Enterprise, or Datacenter edition with Service Pack 1 or later; Windows Small Business Server 2003 with Service Pack 1 or later

TAKE NOTE*
Notice the lack of Vista. Check http://technet.microsoft.com/en-us/library/aa905868.aspx.
Table 1-2
SQL Server 2008 editions and minimum operating system versions and editions

EDITION                              OPERATING SYSTEM VERSION AND EDITION


SQL 2008 Enterprise (64-bit) x64 Windows Server 2003 SP2 64-bit x64 Standard
Windows Server 2003 SP2 64-bit x64 Datacenter
Windows Server 2003 SP2 64-bit x64 Enterprise
Windows Server 2008 64-bit x64 Standard
Windows Server 2008 64-bit x64 Datacenter
Windows Server 2008 64-bit x64 Enterprise
SQL 2008 Standard (64-bit) x64 Windows XP Professional x64
Windows Server 2003 SP2 64-bit x64 Standard
Windows Server 2003 SP2 64-bit x64 Datacenter
Windows Server 2003 SP2 64-bit x64 Enterprise
Windows Vista Ultimate x64
Windows Vista Enterprise x64
Windows Vista Business x64
Windows Server 2008 x64 Web
Windows Small Business Server 2008
Windows Server 2008 for Windows Essential Server Solutions
SQL 2008 Developer (64-bit) x64 Windows XP x64 Professional
Windows Server 2003 SP2 64-bit x64 Standard
Windows Server 2003 SP2 64-bit x64 Datacenter
Windows Server 2003 SP2 64-bit x64 Enterprise
Windows Vista Ultimate x64
Windows Vista Home Premium x64
Windows Vista Home Basic x64
Windows Vista Enterprise x64
Windows Vista Business x64
Windows Server 2008 x64 Web
SQL 2008 Workgroup (64-bit) x64 Windows XP x64 Professional
Windows Server 2003 SP2 64-bit x64 Standard
Windows Server 2003 SP2 64-bit x64 Datacenter
Windows Server 2003 SP2 64-bit x64 Enterprise
Windows Vista Ultimate x64
Windows Vista Home Premium x64
Windows Vista Home Basic x64
Windows Vista Enterprise x64
Windows Vista Business x64
Windows Server 2008 x64 Web
SQL 2008 Web (64-bit) x64 Windows XP x64 Professional
Windows Server 2003 SP2 64-bit x64 Standard
Windows Server 2003 SP2 64-bit x64 Datacenter
Windows Server 2003 SP2 64-bit x64 Enterprise
Windows Vista Ultimate x64
Windows Vista Enterprise x64
Windows Vista Business x64
Windows Server 2008 x64 Web
SQL 2008 Developer (64-bit) IA64 Windows Server 2003 SP2 64-bit Itanium Datacenter
Windows Server 2003 SP2 64-bit Itanium Enterprise
Windows Server 2008 64-bit Itanium Edition
SQL 2008 Enterprise (32-bit) Windows Server 2003 SP2 Standard
Windows Server 2003 SP2 Enterprise
Windows Server 2003 SP2 Datacenter
Windows Server 2003 Small Business Server SP2 Standard
Windows Server 2003 Small Business Server SP2 Premium
Windows Server 2003 SP2 64-bit x64 Standard
Windows Server 2003 SP2 64-bit x64 Datacenter
Windows Server 2003 SP2 64-bit x64 Enterprise
Windows Server 2008 Standard
Windows Server 2008 Web
Windows Server 2008 Datacenter
Windows Server 2008 Enterprise
Windows Server 2008 x64 Standard
Windows Server 2008 x64 Datacenter
Windows Server 2008 x64 Enterprise
SQL 2008 Standard (32-bit) Windows XP Professional SP2
Windows XP SP2 Tablet
Windows XP x64 Professional
Windows XP Media Center
Windows XP Professional Reduced Media
Windows Server 2003 SP2 Small Business Server R2 Standard
Windows Server 2003 SP2 Small Business Server R2 Premium
Windows Server 2003 SP2 Standard
Windows Server 2003 SP2 Enterprise
Windows Server 2003 SP2 Datacenter
Windows Server 2003 SP2 Small Business Server Standard
Windows Server 2003 SP2 Small Business Server Premium
Windows Server 2003 SP2 64-bit x64 Standard
Windows Server 2003 SP2 64-bit x64 Datacenter
Windows Server 2003 SP2 64-bit x64 Enterprise
Windows Vista Ultimate
Windows Vista Enterprise
Windows Vista Business
Windows Vista Ultimate x64
Windows Vista Enterprise x64
Windows Vista Business x64
Windows Server 2008 Web
Windows Server 2008 Standard Server
Windows Server 2008 Datacenter
Windows Server 2008 Enterprise
Windows Server 2008 x64 Standard
Windows Server 2008 x64 Datacenter
Windows Server 2008 x64 Enterprise
Windows Small Business Server 2008
Windows Server 2008 for Windows Essential Server Solutions
SQL 2008 Express x64 (64-bit) Windows Server 2003 x64
Windows Server 2003 SP2 64-bit x64 Standard
Windows Server 2003 SP2 64-bit x64 Datacenter
Windows Server 2003 SP2 64-bit x64 Enterprise
Windows Vista Ultimate x64
Windows Vista Home Premium x64
Windows Vista Home Basic x64
Windows Vista Enterprise x64
Windows Vista Business x64
Windows Server 2008 64-bit x64 Web
Windows Server 2008 64-bit x64 Standard
Windows Server 2008 64-bit x64 Datacenter
Windows Server 2008 64-bit x64 Enterprise
SQL 2008 Developer (32-bit) Windows XP Home Edition SP2
Windows XP Professional SP2
Windows XP Tablet SP2
Windows XP Professional x64 SP2
Windows XP Media Center
Windows XP Professional Reduced Media
Windows XP Home Edition Reduced Media
Windows Server 2003 SP2 Standard
Windows Server 2003 SP2 Enterprise
Windows Server 2003 SP2 Datacenter
Windows Server 2003 SP2 Small Business Server Standard
Windows Server 2003 SP2 Small Business Server Premium
Windows Server 2003 SP2 64-bit x64 Standard
Windows Server 2003 SP2 64-bit x64 Datacenter
Windows Server 2003 SP2 64-bit x64 Enterprise
Windows Vista Ultimate
Windows Vista Home Premium
Windows Vista Home Basic
Windows Vista Starter Edition
Windows Vista Enterprise
Windows Vista Business
Windows Vista Ultimate 64-bit x64
Windows Vista Home Premium 64-bit x64
Windows Vista Home Basic 64-bit x64
Windows Vista Enterprise 64-bit x64
Windows Vista Business 64-bit x64
Windows Server 2008 Web
Windows Server 2008 Standard
Windows Server 2008 Enterprise
Windows Server 2008 Datacenter
Windows Server 2008 64-bit x64 Standard
Windows Server 2008 Datacenter
Windows Server 2008 Enterprise
SQL 2008 Workgroup (32-bit) Windows XP Professional SP2
Windows XP SP2 Tablet
Windows XP Professional 64-bit x64
Windows XP SP2 Media Center 2002
Windows XP SP2 Media Center 2004
Windows XP Media Center 2005
Windows XP Professional Reduced Media
Windows Server 2003 SP2 Standard
Windows Server 2003 SP2 Enterprise
Windows Server 2003 SP2 Datacenter
Windows Server 2003 SP2 Small Business Server Standard
Windows Server 2003 SP2 Small Business Server Premium
Windows Server 2003 64-bit x64 Standard
Windows Server 2003 64-bit x64 Datacenter
Windows Server 2003 64-bit x64 Enterprise
Windows Vista Ultimate
Windows Vista Enterprise
Windows Vista Business
Windows Vista 64-bit x64 Ultimate
Windows Vista 64-bit x64 Enterprise
Windows Vista 64-bit x64 Business
Windows Server 2008 Web
Windows Server 2008 Standard
Windows Server 2008 Datacenter
Windows Server 2008 Enterprise
Windows Server 2008 64-bit x64 Standard
Windows Server 2008 64-bit x64 Datacenter
Windows Server 2008 64-bit x64 Enterprise
SQL 2008 Web (32-bit) Windows XP Professional x64
Windows XP Media Center
Windows XP Professional Reduced Media
Windows Server 2003 SP2 Standard
Windows Server 2003 SP2 Enterprise
Windows Server 2003 SP2 Datacenter
Windows Server 2003 SP2 Small Business Server Standard
Windows Server 2003 SP2 Small Business Server Premium
Windows Server 2003 SP2 64-bit x64 Standard
Windows Server 2003 SP2 64-bit x64 Datacenter
Windows Server 2003 SP2 64-bit x64 Enterprise
Windows Vista Ultimate
Windows Vista Enterprise
Windows Vista Business
Windows Vista Ultimate x64
Windows Vista Enterprise x64
Windows Vista Business x64
Windows Server 2008 Web
Windows Server 2008 Standard Server
Windows Server 2008 Datacenter
Windows Server 2008 Enterprise
Windows Server 2008 x64 Standard
Windows Server 2008 x64 Datacenter
Windows Server 2008 x64 Enterprise
SQL 2008 Express (32-bit), Express with Tools, and Express with Advanced Services (32-bit)    Windows XP SP2 Home
Windows XP SP2 Professional
Windows XP SP2 Tablet
Windows XP Media Center
Windows Server 2003 Reduced Media
Windows XP Home Edition Reduced Media
Windows Server 2003 SP2 Standard
Windows Server 2003 SP2 Enterprise
Windows Server 2003 SP2 Datacenter
Windows Server 2003 SP2 Web Edition
Windows Server 2003 SP2 Small Business Server Standard
Windows Server 2003 SP2 Small Business Server Premium
Windows Server 2003 SP2 64-bit x64 Standard
Windows Server 2003 SP2 64-bit x64 Datacenter
Windows Server 2003 SP2 64-bit x64 Enterprise
Windows Vista Ultimate
Windows Vista Home Premium
Windows Vista Home Basic
Windows Vista Enterprise
Windows Vista Business
Windows Vista Ultimate 64-bit x64
Windows Vista Home Premium 64-bit x64
Windows Vista Home Basic 64-bit x64
Windows Vista Enterprise 64-bit x64
Windows Vista Business 64-bit x64
Windows Server 2008 Standard Server
Windows Server 2008 Enterprise
Windows Server 2008 Datacenter
Windows Server 2008 Web Edition
Windows Server 2008 64-bit x64 Web Edition
Windows Server 2008 64-bit x64 Standard
Windows Server 2008 64-bit x64 Datacenter
Windows Server 2008 64-bit x64 Enterprise
Windows XP Embedded SP2 feature pack 2007
Windows Embedded for Point of Service SP2

Choosing an Edition of SQL Server

CERTIFICATION READY?
Know the different editions and their requirements and limitations.

Because SQL Server is used by a vast audience of different people, businesses, schools, government agencies, and so on, each of which has different needs as well as diverse requirements, different editions of SQL Server have been provided by Microsoft. SQL
Server 2005 and 2008 each come in different editions. Each edition targets a group of
people based on creating a good match to the unique performance, runtime, and price
requirements of organizations and individuals.

TAKE NOTE*
Both SQL Server 2005 and SQL Server 2008 have a specialized edition for embedded applications (such as handheld devices). This embedded edition is the Compact edition, and because of its very specialized nature, we will not discuss it further in this text.
SQL SERVER 2005


There are five different editions of SQL Server 2005: Microsoft SQL Server 2005 Enterprise/
Developer/Evaluation edition, Microsoft SQL Server 2005 Standard edition, Microsoft
SQL Server 2005 Workgroup edition, Microsoft SQL Server 2005 Developer edition, and
Microsoft SQL Server 2005 Express edition/Express edition with Advanced Services. The
most common editions used are Enterprise, Standard, and Express, because these editions fit
the requirements and product pricing needed in production server environments:
• SQL Server 2005 Enterprise edition (32-bit and 64-bit). This edition comes in both
32-bit and 64-bit varieties. This is the ideal choice if you need a SQL Server 2005
edition that can scale to near limitless size while supporting enterprise-sized On-Line
Transaction Processing (OLTP), highly complex data analysis, data-warehousing systems,
and Web sites.
Enterprise edition has all the bells and whistles and is suited to provide comprehensive
business intelligence and analytics capabilities. It includes high-availability features such
as failover clustering and database mirroring. It’s ideal for large organizations or situations
with the need for a version of SQL Server 2005 that can handle complex situations.
• SQL Server 2005 Standard edition (32-bit and 64-bit). Standard includes the
essential functionality needed for e-commerce, data warehousing, and line-of-business
solutions but does not include some advanced features such as Advanced Data
Transforms, Data-Driven Subscriptions, and DataFlow Integration using Integration
Services. The Standard edition is best suited for the small- to medium-sized organization
that needs a complete data-management and analysis platform without many of the
advanced features found in the Enterprise edition.
• SQL Server 2005 Workgroup edition (32-bit only). Workgroup edition is the data
management solution for small organizations that need a database with no limits on size
or number of users. It includes only the core database features of the product line (it
doesn’t include Analysis Services or Integration Services, for example). It’s intended as
an entry-level, easy-to-manage database.
• SQL Server 2005 Developer edition (32-bit and 64-bit). Developer edition has all
the features of Enterprise edition, but it’s licensed only for use as a development and test
system, not as a production server. This edition is a good choice for persons or organiza-
tions that build and test applications but don’t want to pay for Enterprise edition.
• SQL Server 2005 Express edition (32-bit only). SQL Server Express is a free,
easy-to-use, simple-to-manage database without many of the features of other editions
(such as Notification Services, Analysis Services, Integration Services, and Report
Builder). SQL Server Express is free and can function as the client database as well as a
basic server database. It’s a good option if all that’s needed is a stripped-down version of
SQL Server 2005. Express is used typically among low-end server users, nonprofessional
developers building web applications, and hobbyists building client applications.

SQL SERVER 2008


With SQL Server 2008, Microsoft now bundles both the 32-bit and 64-bit software together with one license for all editions of the product. This in itself is a significant change from SQL Server 2005, where you could only purchase or obtain 32-bit software for the Workgroup and Express editions. In addition, a new Web edition has been created for Internet web server usage. The 2008 editions of SQL Server are:
• SQL Server 2008 Enterprise edition. The Enterprise edition includes all features in
SQL Server 2008. The following features are only available in this edition of SQL Server
2008 (plus Developer and Evaluation editions as they are simply restricted license ver-
sions of Enterprise):
° Data Compression
° Extensible Key Management
° Hot Add CPUs, RAM
° Resource Governor
° IA-64 Processor Hardware Support
° Table and Index Partitioning
° Sparse Columns
° Indexed Views
• SQL Server 2008 Standard edition. This is the second most capable edition, below only the Enterprise edition. There is no limit on database size, no RAM limit other than the operating system maximum, and up to four processors are supported.
• SQL Server 2008 Developer edition. Developer edition has all the features of
Enterprise edition, but it’s licensed only for use as a development and test system, not as
a production server. This edition is a good choice for persons or organizations that build
and test applications but don’t want to pay for Enterprise edition.
• SQL Server 2008 Workgroup edition. RAM is limited to 4 GB.
• SQL Server 2008 Web edition. This edition is designed for Internet-oriented databases
provided by organizations such as hosting companies. There is no license limit to the size
of any database and up to four CPUs may be utilized. Also, there is no maximum limit
on RAM other than the maximum allowed by the operating system on which this edi-
tion is installed.
• SQL Server 2008 Express edition. The Express edition is free and is available in three
versions. Each of these might be considered subeditions. These versions or subeditions
are regular (runtime only) Express, Express with Tools, and Express with Advanced
Services. Management Studio is only included with the latter two subeditions. In all of
these versions, the core database engine is the same and all of the following limits apply:
all databases are limited to a maximum of 4 GB, only one CPU is permitted, and RAM
is limited to 1 GB.

Choosing a CPU Type

SQL Server supports both 32-bit and 64-bit CPUs. In addition, it supports multicore
CPUs and CPUs that use hyperthreading. When estimating CPU requirements, you
should consider the benefits of using different processor types. The benefits of using a
64-bit CPU instead of a 32-bit CPU include:

• Larger direct-addressable memory. A database server running Microsoft Windows
Server 2003/2008 on a 64-bit architecture can support up to the operating system maxi-
mum (up to 2 terabytes of memory). In contrast, a Windows server with a 32-bit archi-
tecture can directly address a maximum of 3.25 GB of physical memory. The server can
indirectly address memory beyond this limit only if you enable the Address Windowing
Extensions (AWE) switch.
• Better on-chip cache management. A 64-bit CPU allows SQL Server memory struc-
tures such as the query cache, connection pool, and lock manager to use all available
memory. A 32-bit CPU doesn’t.
• Enhanced on-processor parallelism. The 64-bit architecture can support 64 processors,
allowing SQL Server to potentially support more concurrent processes, applications, and
users on a single server. A 32-bit architecture can support only 32 processors.
A multicore CPU includes two or more complete execution cores that run at the same
frequency and share the same packaging and interface with the chipset and the memory. In
addition, the cores contain two or more physical processors and two or more L2 cache blocks.
On a Windows system running SQL Server, each core can be used as an independent proces-
sor to increase the multithreaded throughput.
Hyperthreading lets a CPU execute multiple threads simultaneously. Consequently, the CPU
throughput increases. A CPU that supports hyperthreading contains two architectural states
on a single physical core. Each state acts as a logical CPU for the operating system. However,
the two logical CPUs use the same execution resources, so you don’t get the performance ben-
efits of using two physical CPUs.
Until fairly recently, 64-bit, multicore, and hyperthreading CPUs were more expensive than 32-bit CPUs. SQL Server 2005 has different minimum CPU requirements depending on the edition, as summarized in Table 1-3. Table 1-4 summarizes the same information for the editions of SQL Server 2008.

Table 1-3: SQL Server 2005 editions and minimum CPU type and speed

EDITION | OS TYPE | MINIMUM CPU TYPE AND SPEED
Enterprise, Standard, and Developer | 64-bit | 1.0 GHz AMD Opteron, AMD Athlon 64, Intel Xeon with Intel EM64T support, or Intel Pentium IV with EM64T support processor
Enterprise, Standard, and Developer | IA64 | Itanium CPU, 1.0 GHz or faster processor
Enterprise, Standard, Workgroup, Developer, and Express | 32-bit | 600 MHz Pentium III-compatible CPU; 1.0 GHz or faster processor recommended

Table 1-4: SQL Server 2008 editions and minimum CPU type and speed

EDITION | OS TYPE | MINIMUM CPU TYPE AND SPEED | RECOMMENDED CPU SPEED
Enterprise, Standard, Developer, Workgroup, Web, and Express | 64-bit | 64-bit CPU, 1.4 GHz | 2.0 GHz or faster
Enterprise | IA64 | Itanium CPU, 1.0 GHz | 1.0 GHz or faster
Developer | IA64 | Itanium CPU, 1.0 GHz | Not specified
Enterprise, Standard, Developer, Workgroup, Web, and Express | 32-bit | Pentium III-compatible CPU, 1.0 GHz | 2.0 GHz or faster

Choosing Memory Options

This may sound simplistic, but the best memory option still follows the oldest rule of thumb: buy as much of the fastest RAM you can afford that is appropriate for the system you're installing it in.

Increasing memory often solves what may initially appear to be a CPU bottleneck. Minimum
and recommended RAM requirements for the different editions of SQL Server 2005 are
presented in Table 1-5. Minimum and recommended RAM requirements for SQL Server
2008 editions are presented in Table 1-6.

Table 1-5: SQL Server 2005 editions and minimum RAM

EDITION | OS TYPE | MINIMUM RAM | RECOMMENDED RAM
Enterprise, Standard, Workgroup, and Developer | 32-bit, 64-bit, and Itanium | 512 MB | 1 GB or more
Express | 32-bit | 192 MB | 512 MB or more

Table 1-6: SQL Server 2008 editions and minimum RAM

EDITION | OS TYPE | MINIMUM RAM | RECOMMENDED RAM
Enterprise, Standard, Developer, Workgroup, and Web | 64-bit | 512 MB | 2 GB
Express | 64-bit | 512 MB | 1 GB
Enterprise and Developer | IA64 | 512 MB | 2 GB
Enterprise, Standard, Developer, Workgroup, and Web | 32-bit | 512 MB | 2 GB
Express | 32-bit | 256 MB | 1 GB

Determining Storage Requirements

As with memory options, the best rule of thumb is to buy as much hard disk space as you can afford. You'll learn about physical storage in more detail in the next lesson. All editions of SQL Server 2005 have effectively the same minimum disk space requirements, and all editions of SQL Server 2008 share a single, higher set of requirements. These requirements are:

SQL Server 2005


350 MB of available hard disk space for the recommended installation with approximately
425 MB of additional space for SQL Server Books Online, SQL Server Mobile Books
Online, and sample databases.
SQL Server 2008
Microsoft has published disk space requirements for the installable software modules within SQL Server 2008. These are the space requirements:

Database Engine, Replication, and Full-Text Search | 280 MB
Analysis Services | 90 MB
Reporting Services | 120 MB
Integration Services | 120 MB
Client Services | 850 MB
Books Online | 240 MB

To determine SQL Server 2008 disk space requirements, first determine the modules to be installed and then add up the requirements for those modules. For example, installing only the Database Engine, Client Services, and Books Online would require approximately 280 + 850 + 240 = 1,370 MB.

CERTIFICATION READY?
When presented with a scenario where several solutions seem plausible, try to determine the one with the best return on investment. Database administration is always about trade-offs; that is, choosing the best option from competing alternatives. See Lesson 3 for an introduction to ROI.

Planning for Hot Add CPUs and RAM

Microsoft has been developing methods to enable the dynamic reconfiguration of hardware
so that servers continue operating while the physical hardware is changed. Underlying the
ability of SQL Server to dynamically handle hardware changes is Windows Server support for
dynamic hardware reconfiguration and of course, the physical server hardware platform must
also support physical hardware changes while the server is running. Most hardware does not
yet support this ability. When designing the hardware infrastructure, a decision should be made
about whether to provide for dynamic hardware changes. If the decision is made to provide for
this possibility, then specific models of server hardware must be chosen to facilitate this potential
situation. Further, specific Windows Server operating system versions must be selected.

Once the hardware and the operating system have been selected to allow for dynamic
hardware changes, then the edition of SQL Server to be used can be selected. Support for
additional RAM is available in SQL Server 2005 and SQL Server 2008. SQL Server 2008
also provides additional support for dynamic additional CPUs or processors. These features
are known as Hot Add RAM and Hot Add CPUs. Despite these names, support also exists
for hardware removal. The RECONFIGURE command must be executed in SQL Server to
implement any such hardware change for SQL Server.
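The exact procedure depends on the server platform, but at the SQL Server level the change is applied with a single command. The following is a minimal sketch, assuming CPUs or memory have just been hot added and recognized by Windows:

-- After Windows has recognized the hot-added hardware, instruct
-- SQL Server to begin using it.
RECONFIGURE;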
Requirements for Hot Add CPU
• Requires special hardware that supports Hot Add CPU.
• Requires the 64-bit edition of Windows Server 2008 Datacenter or the Windows Server
2008 Enterprise edition for Itanium-Based Systems operating system.
• Requires SQL Server 2008 Enterprise edition.
• SQL Server cannot be configured to use soft NUMA.
Requirements for Hot Add RAM
• Requires special hardware that supports Hot Add RAM.
• Requires SQL Server Enterprise and is only available for 64-bit SQL Server and for 32-bit SQL Server when AWE is enabled. Hot Add Memory is not available for 32-bit SQL Server when AWE is not enabled (a sketch of enabling AWE follows this list).
• Requires Windows Server 2003/2008, Enterprise and Datacenter editions.
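For the 32-bit case, AWE is enabled through sp_configure. This is a minimal sketch rather than a full procedure; note that the awe enabled setting takes effect only after the instance is restarted:

-- Enable AWE on a 32-bit instance (advanced option; requires an instance restart to take effect).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'awe enabled', 1;
RECONFIGURE;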

SKILL SUMMARY

In this lesson, you reviewed the factors you need to consider when you’re assessing the
capacity requirements of a database server. You studied a variety of methods for collecting
information about current capacity and how to forecast future needs. In addition, you
familiarized yourself with the techniques and skills you’ll need to achieve a balance between
business and technical requirements. Finally, you learned about the various hardware and
software considerations you should factor into your design plans, including hardware,
operating system, and software versions.
For the certification examination:
• Be familiar with System Monitor counters. Know how to use the System Monitor tool and
which counters provide relevant information about system status. Know the techniques of
collecting a baseline, and when and how to use it.
• Understand business requirements. Know different business requirements and the
subsystems they impact. Make sure you understand the effect of regulatory requirements
on storage needs.
• Know the prerequisites. Know the requirements and limitations for installing the various
editions of SQL Server 2005, including what operating system, how much memory, and
the speed of CPU you need.
• Understand the cost-benefit relationship between 32-bit and 64-bit processors. Be familiar
with the advantages and disadvantages of the two processor types.

■ Knowledge Assessment
Multiple Choice
Circle the letter or letters that correspond to the best answer or answers.
1. Which of the following factors should be considered when projecting disk-storage
requirements?
a. Forecasted business growth
b. Historical trends
c. Index maintenance space requirements
d. All of the above
2. Which of the following file types should not be considered when determining the
amount of disk space used by the database files?
a. Database files
b. Database paging file
c. Database transaction logs
d. Full-text indexes
3. What can result if improper disk-space allocation causes SQL Server 2005 to
dynamically grow the database by requesting extra disk space from the operating system?
(Choose all that apply.)
a. Truncating of log files
b. Reduced network bandwidth
c. Disk/file fragmentation
d. Processor bottleneck
4. Which of the following are System Monitor counters that can be used to assess disk I/O
rates? (Choose all that apply.)
a. PhysicalDisk:Disk Read Bytes/sec
b. PhysicalDisk:Disk Write Bytes/sec
c. PhysicalDisk:Avg. Disk Queue Length
d. PhysicalDisk:Disk Modify Bytes/sec
e. All of the above
5. The length of time data must be retained is also referred to as what?
a. Lifetime of data
b. Data Retention Period (DRP)
c. Data estimation period
d. Longevity of data
6. In order to start the System Monitor tool, you should type which of the following
commands in the Run text box?
a. perfmon
b. sysmon
c. sysinfo
d. mssysmon
e. mmc
7. If regulatory requirements or internal procedures require the encryption of data, which
subsystems are directly impacted? (Choose all that apply.)
a. Physical storage
b. Memory
c. CPU
d. Network
e. SQL Server version

8. If you calculate future disk-space requirements based on a constant amount in a specified
period, you are calculating what?
a. Linear growth
b. Compound growth
c. Trigonometric growth
d. Geometric growth
e. Incremental growth
9. Which of the following will not affect the CPU performance of a database server?
a. Affinity mask settings
b. Number of connections
c. Available memory
d. Network bandwidth
e. Processor type
10. Affinity masks can be used to do what? (Choose all that apply.)
a. Change the bit speed of a CPU.
b. Restrict a SQL Server instance to a specific subset of CPUs.
c. Ensure that each thread always uses the same CPU between interrupts.
d. Free up RAM that was locked earlier.
e. Restrict CPU operation to specific file types.
11. Which of the following is not a benefit of using a 64-bit CPU?
a. Larger direct-addressable memory
b. Better on-chip cache management
c. Lower cost per chip
d. Enhanced on-processor parallelism
e. None of the above
12. Characteristics of a multicore CPU include which of the following? (Choose all that
apply.)
a. Executes multiple cores simultaneously
b. Includes two or more complete execution cores
c. All cores run at different frequencies
d. Can increase multithreaded throughput in SQL Server 2005
e. Contains two architectural states on each core
13. Which of the following are the most important factors in estimating CPU requirements?
(Choose all that apply.)
a. Establishing a baseline
b. Business plans
c. Historical trends
d. Correlation between CPU usage and a measurable variable
e. Longevity of data
14. Which of the following counters indicates how many bytes of memory are available?
a. Memory:Available Bytes
b. Memory:Pages/sec
c. Memory:Available RAM
d. Memory:Available Pages
e. Memory:Bytes/sec
15. Which of the following dynamic management views can be used to gather data about
memory usage by SQL Server? (Choose all that apply.)
a. sys.dm_exec_query_stats
b. sys.dm_exec_cached_plans
c. sys.dm_os_memory_pages
d. sys.dm_os_memory_objects
e. sys.dm_exec_query_calls

16. Which one of the following counters is used to determine the amount of memory used
by SQL Server connections?
a. SQLServer:MemoryManager:Total Server Memory
b. SQLServer:MemoryManager:Working Set
c. SQLServer:Buffer Manager:Connection Memory (KB)
d. SQLServer:Page Manager:Connection Memory (KB)
e. SQLServer:Memory Manager:Connection Memory (KB)
17. Which of the following may affect network traffic?
a. Backup schedules
b. Firewalls
c. Antivirus applications
d. Enabled network protocols
e. All of the above
18. Which of the following business requirements should be considered when modifying or
designing a database infrastructure?
a. Budgetary constraints
b. IT policies
c. Data security
d. Data availability
e. All of the above
19. During your survey, you determine that one of the existing database servers has an
800 MHz Pentium III processor with 256 MB of RAM and a 400 GB hard drive, running
Windows XP. Which version of SQL Server can you install on this machine?
a. Workgroup
b. Standard
c. Enterprise
d. Developer
e. None of the above
20. You want to install SQL Server 2005 Enterprise Edition on a 32-bit CPU machine.
Which operating systems can this machine use? (Choose all that apply.)
a. Windows 2003, Service Pack 1
b. Windows 2000 Professional, Service Pack 3
c. Windows XP, Service Pack 1
d. Windows 2000 Server, Service Pack 5
e. All of the above

Case Study
Examining the Infrastructure
Thylacine Savings & Loan Association is a large financial institution serving
approximately 1.6 million customers over a broad geographic area. The company is
headquartered in the city of Trevallyn, which also serves as northern headquarters,
with 407 employees. Three branch offices are located in Stratford (Eastern operations),
Belleville (Western), and Rock Hill (Southern).
The company currently has a 3 terabyte OLTP database that tracks more than 2 billion
transactions each year. The main database for all transactions and operations is located
in Trevallyn. Regional databases contain deposit/withdrawal information only, and the
headquarters database is updated daily from the regional offices.
Thylacine’s departmental database servers are dispersed throughout the headquarters
location.

The company is currently experiencing 4 percent annual growth and plans to expand
into four new markets at the rate of one new market every two years. The database is
growing at a rate of 6 percent per year and will exceed available hard disk space in the
future. Additionally, server capacity is overloaded, resulting in poor performance and
long delays. A large portion of the database data is historical information.
After lengthy consideration, Thylacine Savings & Loan has decided to upgrade its
database system to SQL Server 2005 and has hired you as a consultant database project
architect to address the company’s current and future needs.
Use the information in the previous case study to answer the following questions.
1. Briefly summarize the initial steps you should take before beginning capacity
planning.
2. Do you need to consider regulatory factors? If so, describe the impact they’re likely
to have on the various components of the infrastructure’s capacity.
3. Which would you give greater weight: the observed growth rate of a database or the
projected business growth rate of Thylacine Savings & Loan? Why?
LESSON 2
Designing Physical Storage
LESSON SKILL MATRIX

TECHNOLOGY SKILL 70-443 EXAM OBJECTIVE


Design physical storage. Foundational
Design transaction log storage. Foundational
Design backup file storage. Foundational
Decide where to install the operating system. Foundational
Decide where to place SQL Server service executables. Foundational
Specify the number and placement of files to create for each database. Foundational
Design instances. Foundational
Decide how many databases to create. Foundational
Decide on the placement of system databases for each instance. Foundational
Decide on the physical storage for the tempdb database for each instance. Foundational
Decide on the number of instances. Foundational
Decide on the naming of instances. Foundational
Decide how many physical servers are needed for instances. Foundational
Establish service requirements. Foundational
Specify instance configurations. Foundational

KEY TERMS
extent: A unit of space allocated to an object. A unit of data input and output; data is stored or retrieved from the disk as an extent (64 kilobytes).
filegroup: A named collection of one or more data files that forms a single unit of data allocation or for administration of a database.
instance: A separate and isolated copy of SQL Server running on a server. Application service providers can support multiple businesses and their database needs while guaranteeing that one business cannot see the other's data.
page: A unit of data storage. Eight pages comprise an extent.

In the previous lesson, you looked at the various physical requirements and considerations
associated with creating a hardware infrastructure for SQL Server database servers. It’s im-
portant to remember that although you may tend to look at aspects of the hardware envi-
ronment as separate—grouping them in memory, disk space, network requirements, and so
on—all these components are intrinsically tied to one another and interact together.

In this lesson, the focus is how to best design and organize physical storage. As you might guess,
the first thing you’ll learn is that there is no correct answer or magical formula that will work
every time in addressing this issue, any more than there was for assessing hardware requirements
in Lesson 1. One of the most elegant (and maddening) features of SQL Server is that it requires
the infrastructure designer to consider the interaction of all the components in designing the
optimal solution. Inadequate memory, for example, can have a profound influence on the
effectiveness of even the fastest hard disk.
With that in mind, you must examine how to design physical storage for your databases and
instances. To efficiently manage storage for your databases, you need to understand what
objects take up disk space and how SQL Server stores those objects. In SQL Server 2000, for
example, one simple system table tracks space usage, only two objects consume disk space, and
only three types of pages exist to store user data. This structure is relatively easy to manage, but
it also has its limitations, especially regarding how SQL Server stores and retrieves large object
(LOB) data.
SQL Server 2005 and SQL Server 2008 have an enhanced storage model that expands the
number and types of objects that consume space, gives you more flexible options for storing
variable-length LOB data, and adds functionality to store partitioned data in multiple,
different locations.

■ Understanding SQL Server Storage Concepts

THE BOTTOM LINE
Disk input/output is measured in extent units (64 kilobytes). Every database has at least two physical files: one for data entries and one for log entries. In both cases the size on disk is allocated when you define a database. For data files, empty space is stored until an extent is written. The files are Indexed Sequential Access Method (ISAM) structures. The Global Allocation Map, Index Allocation Map, and Page Free Space are referenced dynamically to find and recover the correct extent. The ISAM technique permits files to grow indefinitely and scale to as large as you might need, because no more than 64 KB is ever retrieved from or written to disk at a time.

If you’ve ever worked with data without a computer, you’ve almost certainly noticed that data
takes up a lot of physical space. In a nontechnical environment, that means boxes of paper,
lots of file folders, and a plan to keep them organized. Or it may entail miles of shelving filled
with vast numbers of files. It was this seemingly endless parade of paperwork (usually official
documents) and the practice of tying thick legal documents together with red cloth tape that
led to the phrase “cutting through the red tape.” The point is that paper data storage con-
sumes physical space.
If you’re using SQL Server 2005 to collect and store data for your enterprise, you don’t have
quite the same space problem, and it’s highly unlikely that you’ll need to rent a warehouse to
hold your files and records. However, you’ll still have to address the issue of physical storage
and how to design it. That’s what you will learn in the following sections after we cover some
basic concepts.

Understanding Data Files and Transaction Log Files

Just like any data saved on a computer, the databases you create with SQL Server must
be stored on the hard disk. SQL Server uses three types of files to store databases on disk:
primary data files, secondary data files, and transaction log files.

Primary data files with an .mdf extension are the first files created in a database and can con-
tain user-defined objects, such as tables and views, as well as system tables that SQL Server
requires for keeping track of the database. If the database gets too big and you run out of
room on your first hard disk, you can create secondary data files with an .ndf extension on
separate physical hard disks to give your database more room.
Secondary files can be grouped together into filegroups. Filegroups are logical groupings of
files, meaning that the files can be on any disk in the system and SQL Server will still see
them as belonging together. This grouping capability comes in handy for very large databases
(VLDBs), which are many gigabytes or even terabytes in size.

TAKE NOTE*
Primary data files and any other files not specifically assigned to another filegroup belong to the primary filegroup.

For the purpose of illustration, suppose you have a database that is several hundred gigabytes
in size and contains several tables. Suppose that users read from half of these tables quite a bit
and write to the other half quite a bit. Assuming that you have multiple hard disks, you can
create secondary files on two of your hard disks and put them in a filegroup called READ.
Next, create two more secondary files on different hard disks, and place them in a filegroup
called WRITE. Now, when you want to create a new table that is primarily for reading, you
can specifically instruct SQL Server to place it on the READ filegroup. The WRITE group
will never be touched. You have, to a small degree, load-balanced the system, because some
hard disks are dedicated to reading and others to writing. Using filegroups is more complex
than this in the real world, but you get the picture.
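A minimal T-SQL sketch of this arrangement follows. The database name, file names, and paths are hypothetical; the point is only the general pattern of adding filegroups and files and then placing a table on a specific filegroup:

-- Create the two filegroups in a hypothetical database named SalesDB.
ALTER DATABASE SalesDB ADD FILEGROUP [READ];
ALTER DATABASE SalesDB ADD FILEGROUP [WRITE];

-- Add secondary (.ndf) files, ideally on separate physical disks.
ALTER DATABASE SalesDB
ADD FILE (NAME = SalesRead1, FILENAME = 'E:\Data\SalesRead1.ndf', SIZE = 100MB)
TO FILEGROUP [READ];

ALTER DATABASE SalesDB
ADD FILE (NAME = SalesWrite1, FILENAME = 'F:\Data\SalesWrite1.ndf', SIZE = 100MB)
TO FILEGROUP [WRITE];

-- Create a table that is read from heavily on the READ filegroup.
CREATE TABLE dbo.ProductCatalog (
    ProductID INT PRIMARY KEY,
    ProductName NVARCHAR(100) NOT NULL
) ON [READ];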
The third type of file is a transaction log file. Transaction log files use an .ldf extension and
don’t contain any objects such as tables or views. To understand these files, it’s best to know a
little about how SQL Server writes data to disk.
Internally when a user initiates a change to some data (either an INSERT, UPDATE, or
DELETE statement), SQL Server first writes the transaction information out to the transac-
tion log file. Following that action, SQL Server then caches the changed data in memory for
a short period of time. This process of updating the log file first is called a write-ahead log. As
changes to data accumulate, and at frequent intervals, SQL Server flushes the cached data by
performing actual writes to the database data file on disk.
“Why should I want to do this?” you may ask. There are two reasons. The first is speed.
Memory is about 100 times faster than hard disk, so if you pull the data off the disk and
make all the changes in memory, the changes occur about 100 times faster than they would
if you wrote directly to disk. The second reason to use transaction logs is for recoverability.
Suppose you backed up your data last night around 10:00 p.m. and your hard disk contain-
ing the data crashed at 11:00 a.m. the next day. You would lose all your changes since last
night at 10:00 p.m. if you wrote to only the data file. Because you’ve recorded the changes to
the data in the transaction log file (which should be on a separate disk), you can recover all
your data right up to the second of the crash. The transaction log stores transactions in real
time and acts as a sort of incremental backup.
Now, try to imagine the inside of these database files. Imagine what would happen if they had
no order or organization—if SQL Server wrote data wherever it found the space. It would
take forever for SQL Server to find your data when you asked for it, and the entire server
would be slow as a result. To keep this from happening, SQL Server has even smaller levels of
data storage inside your data files that you don’t see, called pages and extents.

Understanding Pages

Pages are the smallest unit of storage in a SQL Server data file. Pages are 8,192 bytes
each and start with a 96-byte header. This means that each page can hold 8,096 bytes of
data. There are several different types of pages (not all listed here), each one holding a
different type of data:

• Data. This type of page contains most of the data you enter into your tables. The only
kinds of data entered by users that aren’t stored in a data page are text and image data
because text and image data are usually large and warrant their own pages.
• Text/image. The text, ntext, and image datatypes are designed to hold large objects,
up to 2 GB. Large objects such as pictures and large documents are difficult to retrieve
when they’re stored in a field in one of your tables because SQL Server returns the entire
object when queried for it. To break the large, unwieldy objects into smaller, more man-
ageable chunks, text, ntext, and image datatypes are stored in their own pages. This way,
when you request SQL Server to return an image or a large document, it can return
small chunks of the document at a time rather than the whole thing all at once.
• Index. Indexes are used to accelerate data access by keeping a list of all the values in a
single field (or a combination of multiple fields) in the table and associating those values
with a record number. Indexes are stored separately from data in their own page type.
• Global Allocation Map. When a table requires more space inside the data file where it
resides, SQL Server doesn’t just allocate one page at a time. It allocates eight contiguous
pages, called an extent. The Global Allocation Map (GAM) page type is used to keep
track of which extents are allocated and which are still available.
• Index Allocation Map. Although the GAM pages keep track of which extents are in
use, they don’t monitor the purpose for which the extents are being used. The Index
Allocation Map (IAM) pages are used to keep track of what an extent is being used
for—specifically, to which table or index the extent has been allocated.
• Page Free Space. This isn’t an empty page, as the name may suggest. It’s a special type
used to keep track of free space on all the other pages in the database. Each Page Free
Space page can monitor the free space on up to 8,000 other pages. That way, SQL
Server knows which pages have free space when new data needs to be inserted.

Understanding Extents

An extent is a collection of eight contiguous pages used to keep the database from
becoming fragmented. Fragmentation means that pages that belong together, usually
belonging to the same table or index, are scattered throughout the database file. To avoid
fragmentation, SQL Server assigns space to tables and indexes in extents. That way, at
least eight of the pages should be physically next to one another, making them easier for
SQL Server to locate. SQL Server uses two types of extents to organize pages: uniform
and mixed.

Uniform extents are those entirely owned by a single object. For example, if a single table
owns all eight pages of an extent, it’s considered uniform. Mixed extents are used for objects
that are too small to fill eight pages by themselves. In that instance, SQL Server divvies up
the pages in the extent to multiple objects.

TAKE NOTE*
Transaction logs aren't organized into pages or extents. They contain a list of transactions that have modified your data, organized on a first-come, first-served basis.

■ Estimating Database Size

THE BOTTOM LINE
Dynamically adjusting file size is an expensive operation in terms of overhead. Try to set file allocations correctly through prior planning.

If there weren’t electronic databases or computers and your job was still to design physical
storage for your data, you’d have to know how much data there was, its growth rate, and how
much more there will be. Armed with this information, you’d estimate how many shelf feet it
would require (or convert to miles, if appropriate). Then, you’d estimate the storage space you
needed and select a warehouse (or more than one) that met your requirements.
When you design a database, you’ll likely have to estimate how large the database will be
when filled with data. This makes sense when you consider that the old adage waste not, want
not rings true regarding hard-disk space on your SQL Server. Because databases are files that
are stored on your hard disk, you can waste hard-disk space if you make them too big. If you
make your database files too small, SQL Server will have to expand the database file, or you
may need to create a secondary data file to accommodate the extra data—a process that can
slow the system and users.
As you’ll see in more detail in Lesson 8, estimating the size of a database can also help deter-
mine whether the database design needs refining. For example, you may determine that the
estimated size of the database is too large to implement in your organization and that more
normalization is required. Conversely, the estimated size may be smaller than expected. This
would allow you to denormalize the database to improve query performance.

Planning for Capacity

Some complex formulae let you precisely estimate the size of a database, but you can
get a good ballpark estimate of the capacity you need to plan for by asking yourself a
few questions and applying simple arithmetic. The easiest way to estimate the size of a
database is to estimate the size of each table individually and then add those values. The
size of a table depends on whether the table has indexes and, if so, what type of indexes.
Here are the general steps to estimate the size of your database:

1. Calculate the record size of the table in question. This may not be easy to do as some
datatypes have variable lengths. For such columns estimate the average size and then sum
the actual or estimated size of each column in the table.
2. Divide 8,096 by the row size from step 1, and round down to the nearest number. The
figure 8,096 is the amount of data a single data page can hold, and you round down
because a row can’t be split across pages.
3. Divide the number of rows you expect to have by the result from step 2. This tells you
how many data pages will be used for your table.
4. Multiply the result from step 3 by 8,192—the size of a data page in bytes. This tells you approximately how many bytes your table will take on the disk. A worked example appears below.

LAB EXERCISE
Perform the exercise in your lab manual. You'll try this process in Exercise 2.1.
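As a quick illustration of the arithmetic, the following sketch uses hypothetical numbers—a 250-byte row and one million expected rows—and simply automates the four steps:

-- Hypothetical inputs: 250-byte rows and 1,000,000 expected rows.
DECLARE @RowBytes INT, @Rows BIGINT, @RowsPerPage INT, @Pages BIGINT;
SET @RowBytes = 250;
SET @Rows = 1000000;
SET @RowsPerPage = 8096 / @RowBytes;               -- step 2: 32 rows fit on one 8,096-byte page
SET @Pages = CEILING(1.0 * @Rows / @RowsPerPage);  -- step 3: 31,250 data pages needed
SELECT @Pages * 8192 AS EstimatedTableBytes;       -- step 4: 256,000,000 bytes (about 244 MB)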

Data Compression

SQL Server 2008 includes two new data compression features for reducing the disk-
space requirements—row compression and page compression. Only one type of
compression can be specified at a time on the same object. Compression can be used
on both regular tables and nonclustered indexes. Clustered indexes and views can be
considered as compressed if the data in the table is compressed since views and clustered
indexes are representations of the regular table data. The space savings with these
compression methods will, as with all forms of data compression, depend on the nature
of the data being compressed.

A new stored procedure named sp_estimate_data_compression_savings has been provided with SQL Server 2008 to provide estimated space savings without having to actually compress a table first. This stored procedure needs a table or index name and either 'ROW' or 'PAGE' as a compression method.

TAKE NOTE*
Data Compression is only available in the Enterprise, Developer, and Evaluation editions of SQL Server 2008.

TAKE NOTE*
Data compressed using row or page compression should result in faster backup and restore times. If Backup Compression is also enabled, the potentially redundant or duplicative compression may not show a significant time savings.

Both row-based and page-based compression are enabled via the CREATE TABLE, CREATE INDEX, ALTER TABLE, or ALTER INDEX commands. Examples of both row and page compression are shown:

• Row Compression. Compresses all columns in all rows in a table (or index). This type of compression involves compressing each row individually. Row compression is preferred over page compression when the data to be compressed has a higher percentage of unique data as compared to repetitive data:

ALTER TABLE mytable REBUILD
WITH (DATA_COMPRESSION = ROW);

• Page Compression. Also compresses all columns in all rows in a table; however, the method of compression spans multiple rows, thus involving an entire page of data. Page compression can be thought of as a higher or further level of compression because when page compression is specified, row compression is done first, then the additional page-level compression is applied. The purpose of page compression is to reduce the amount of redundant data stored in a page regardless of which row it is in. Thus page compression is preferred over row compression when the data on a page to be compressed has a higher percentage of repetitive data as compared to unique data:

ALTER TABLE mytable REBUILD
WITH (DATA_COMPRESSION = PAGE);
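To gauge the likely benefit before rebuilding anything, the estimation procedure described above can be called as shown in this sketch. The schema and table names are hypothetical:

-- Estimate the space savings of PAGE compression for a hypothetical table dbo.mytable.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'mytable',
    @index_id         = NULL,   -- NULL means all indexes (and the heap, if any)
    @partition_number = NULL,   -- NULL means all partitions
    @data_compression = 'PAGE';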

Sparse Columns

SQL Server 2008 includes a new storage space savings feature known as Sparse
Columns. Even if a column often has NULL data, space must be allocated for the
column. The algorithm used in assigning space to columns of data can be complex
depending on the datatypes involved. SQL Server disregards the order in which columns
are specified in the CREATE TABLE command and reorganizes the columns that are
defined for the table into a group for fixed-size columns and a group for variable-length
columns. Using a sparse-column option for a fixed-length column potentially alters this
fixed-space allocation. When the majority of the rows in a table have null data for a
particular column, then that column is a probable candidate for use as a sparse column.
Defining a column as sparse can actually increase the space used if the majority of the
rows have data in the column. Sparse columns also require some additional processing
overhead so like most things, using sparse columns is a trade-off and you should use
your best judgment depending on the data.

A considerable number of rules must be followed when using sparse columns. Here is a list to
remember:
• Every sparse column must be nullable.
• No default value constraint or rule can be applied to a sparse column.
• The column options IDENTITY and ROWGUIDCOL cannot be used on a sparse
column.
• All datatypes and attributes of datatypes are supported except for GEOGRAPHY,
GEOMETRY, TEXT, NTEXT, IMAGE, TIMESTAMP, VARBINARY (MAX), and
FILESTREAM.
• User-defined datatypes cannot be sparse.
• A table with one or more sparse columns cannot be compressed.
• A computed column cannot be sparse; however, a sparse column could be used to calculate a computed column.
• Sparse columns cannot be used where merge replication is used.
• Sparse columns cannot be part of a clustered index nor part of a primary key.

TAKE NOTE*
Sparse columns are only available in the Enterprise, Developer, and Evaluation editions of SQL Server 2008.

X REF
Data compression features are also new in SQL Server 2008; see the Data Compression section earlier in this lesson.

A sparse column can be implemented simply by adding the keyword SPARSE to the column definition. The following example shows how this can be done:

CREATE TABLE address(
    addressid INT PRIMARY KEY,
    streetinfo1 CHAR(50) NULL,
    streetinfo2 CHAR(50) NULL SPARSE,
    city_name CHAR(20) NULL,
    statecd CHAR(2),
    zipcode CHAR(9)
);

TAKE NOTE*
Because sparse columns normally should have a high percentage of null-valued rows, filtered indexes are appropriate for these columns. A filtered index on a sparse column can index only those rows that have actual values. This results in a smaller and more efficient index on the column.
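A minimal sketch of such a filtered index, reusing the hypothetical address table from the previous example:

-- Index only those rows of the sparse column that actually contain data.
CREATE NONCLUSTERED INDEX ix_address_streetinfo2
ON address (streetinfo2)
WHERE streetinfo2 IS NOT NULL;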

Understanding RAID

Placing database files in the appropriate location is highly dependent on the available
hardware and software. There are few hard-and-fast rules when it comes to databases. In
fact, the only definite rule is that of design. The more thoroughly you plan and design
your system, the less work it will be later, which is why it’s so important to develop a
good capacity plan.

You must keep several issues in mind when you’re deciding where to place your database files.
They include planning for growth, communication, fault tolerance, reliability, and speed.
When disks are arranged in certain patterns and use a specific controller, they can be a
Redundant Array of Inexpensive Disks (RAID) set. RAID is one of the best measures you can
take to ensure the reliability and consistency of your database.
X REF
Lesson 10 discusses RAID and its role in assuring high availability.

For speed and reliability, it's better to have more disks, and the faster the better. SCSI-type disks are typically faster than IDE, although they're slightly more difficult to configure. SATA and SAS disks may make excellent choices for both speed and reliability. RAID controllers typically support only one type of drive interface, so your choices for a controller and drives must match each other. Note further that for non-SCSI drives, different RAID controllers support different quantities of drives based on the number of connectors. This potential limitation must be taken into account, as well as the physical number of drives that can be installed in a cabinet.
Several numbers are associated with RAID, but the most common are 0, 1, 5, and 10 as
shown in Table 2-1. Each has its own features and drawbacks.
RAID 0 uses disk striping—it writes data across multiple hard-disk partitions in what is called
a stripe set. This can greatly improve speed because multiple hard disks are working at the
same time. Although RAID 0 gives you the best speed, it doesn’t provide any fault tolerance.
If any one of the hard disks in the stripe set is damaged, you lose all your data.
RAID 1, called disk mirroring, writes your information to disk twice—once to the primary drive
and once to the mirror drive. This gives you excellent fault tolerance, but it’s fairly slow because
you must write to disk twice to two different hard drives. RAID 1 is optimized for fast reads.

Table 2-1: RAID Levels

RAID LEVEL | DESCRIPTION
RAID 0 | Supports striping across any number of disks, but doesn't support mirroring. Doesn't provide fault tolerance.
RAID 1 | Supports mirroring only with two disks.
RAID 5 | Combines striping with parity information to protect data. The parity information can be used to reconstruct up to one failed drive.
RAID 10 | Supports striping across mirrored pairs of disks. Because it provides the fault tolerance of RAID 1 and the performance advantages of RAID 0, it's also known as RAID 1+0.

CERTIFICATION READY?
Know the different types of RAID and what types are preferred for different situations.

RAID 5 requires at least three physical drives and works by writing parts of data across all
drives in the set. Parity checksums are also written across all disks in the stripe set, giving you
excellent fault tolerance because the parity checksums can be used to re-create information
lost if a single disk in the stripe set fails. To understand this, think of a math problem like
3 + 2 = 5. Now think of one drive storing the number 3 and the other storing 2, with 5 as the parity checksum. If you remove one of the drives, you can re-create the lost data by referring back to the other two: for example, x + 2 = 5 means that x = 3. However,
if more than one disk in the stripe set fails, you’ll lose all your data. RAID 5 is often called
stripe set with parity.
RAID 10 is a combination of both RAID 1 and RAID 0. This level of RAID should be used
in mission-critical systems that require uptime 24 hours a day, 7 days a week, and the fastest
possible access. RAID 10 mirrors drives in pairs and then stripes data across the mirrored pairs. You still have excellent speed and excellent fault tolerance, but you also have the added expense of using twice the disk space of RAID 0. Because it doesn't store a parity bit, it's fast, but it duplicates the data on four or more drives to be safe. This type of RAID is best for a situation that can afford no SQL Server downtime.

TAKE NOTE*
This discussion covers hardware RAID. Windows Server operating systems also provide RAID implemented in software. Generally hardware RAID is faster. With software RAID, all functions that would be handled by a hardware RAID controller must be handled in software, which introduces an extra load on the server.

■ Designing Transaction Log Storage


THE BOTTOM LINE
The write-ahead log ensures data integrity through mishaps such as a power failure. Always train your data-entry people to check their last submission following a disaster to ensure the transaction actually survived the catastrophe.

Every SQL Server database has a transaction log that records all transactions and the database
modifications made by each transaction. Think of it as an ongoing collection of everything
that has happened to your database—a diary of database doings.
The transaction log is a critical component of the database, and if there is a system failure,
the transaction log may be required to bring your database back to a consistent state. For that
reason, the transaction log should never be deleted or moved unless you fully understand the
ramifications of doing so.

The transaction log supports a number of operations:


• Recovering individual transactions. If an application issues a ROLLBACK statement,
or if the Database Engine detects an error such as the loss of communication with a
client, the log records are used to roll back the modifications made by an incomplete
transaction.
• Recovering all incomplete transactions when SQL Server is started. When an
instance of SQL Server is started, it runs a recovery of each database. Every modification
recorded in the log that may not have been written to the data files is rolled forward.
Every incomplete transaction found in the transaction log is then rolled back to make
sure the integrity of the database is preserved.
• Rolling a restored database, file, filegroup, or page forward to the point of failure.
If a hardware or disk failure affecting the database files occurs, you can restore the
database to the point of failure using the transaction log. You first restore the last full
database backup and the last differential database backup, and then you restore the
subsequent sequence of the transaction log backups to the point of failure. When you
restore each log backup, all the modifications recorded in the log roll forward all the
transactions. When the last log backup is restored, SQL Server uses the log information
to roll back all transactions that were not complete at that point.
• Supporting transactional replication. The Log Reader Agent monitors the transaction
log of each database configured for transactional replication and copies the transactions
marked for replication from the transaction log into the distribution database.
• Supporting standby server solutions. The standby-server solutions, database mirror-
ing, and log shipping, rely heavily on the transaction log. In a log-shipping scenario, the
primary server sends the active transaction log of the primary database to one or more
destinations. Each secondary server restores the log to its local secondary database. In the
case of database mirroring, every update to the principal database is immediately repro-
duced in a separate, full copy of the database: the mirror database. The principal server
instance sends each log record immediately to the mirror server instance, which applies
the incoming log records to the mirror database, continually rolling it forward.

Managing Transaction Log File Size

The best way to think of a transaction log is as a string of log records. Physically, the
sequence of log records is stored in the set of physical files that implements the transac-
tion log, meaning that the transaction log maps over one or more physical files.

SQL Server divides each physical log file internally into a number of virtual log files. Virtual
log files have no fixed size, and there is no fixed number of virtual log files for a physical log
file. The Database Engine chooses the size of the virtual log files dynamically while it’s creat-
ing or extending log files, and it tries to keep the number of virtual files small. The size of the
virtual files after a log file has been extended is the sum of the size of the existing log and the
size of the new file increment. The size or number of virtual log files can’t be configured or set
by administrators.
The transaction log is a wraparound file. To understand what that means, assume you have a
database with one physical log file divided into five virtual log files. When the database was
created, the logical log file began making entries from the start of the physical log file. New
log records have been added at the end of the logical log, and they expand toward the end of
the physical log. When they reach the end of the physical log file, the new log records wrap
around to the start of the physical log file. This cycle repeats endlessly, as long as the end of
the logical log never reaches the beginning of the logical log.

If the end of the logical log reaches the start of the logical log, then one of two things occurs.
If the FILEGROWTH setting is enabled for the log and space is available on the disk, the file
is increased by the amount specified in the GROWTH_INCREMENT setting and the new
log records are added to the extension. If the FILEGROWTH setting isn’t enabled, or the
disk that is holding the log file has less free space than the amount specified in GROWTH_
INCREMENT, a 9002 error is generated.

TAKE NOTE*
If the log contains multiple physical log files, the logical log will move through all the physical log files before it wraps back to the start of the first physical log file.

TRUNCATING THE TRANSACTION LOG


If log records were never deleted from the transaction log, the logical log would grow until it
filled all the available space on the disks holding the physical log files. To reduce the size of
the logical log and free disk space for reuse by the transaction log file, you must truncate the
inactive log periodically.
Transaction logs are divided internally into sections called virtual log files. Virtual log files
are the unit of space that can be reused. Only virtual log files that contain just inactive log
records can be truncated. The active portion of the transaction log—the active log—can’t be
truncated because the active log is required to recover the database. The most recent check-
point defines the active log. The log can be truncated up to that checkpoint, and the inactive
portion is marked as reusable.

TAKE NOTE*
Truncation doesn't reduce the size of a physical log file. Reducing the physical size of a log file requires shrinking the file.

UNDERSTANDING THE TRUNCATION AND RECOVERY MODEL


X REF
Lesson 11 discusses recovery models.

The recovery model of a database determines when transaction log truncation occurs (you'll learn more about recovery models in Lesson 11):
• Simple recovery model. Transaction log backups aren't supported under this model, but log truncation is automatic. The simple recovery model logs only the minimal informa-
tion required to ensure database consistency after a system crash or after restoring a data
backup. This minimizes the space requirements of the transaction log space compared
to the other recovery models. To prevent the log from filling, the database requires suf-
ficient log space to cover the possibility of log truncation being delayed.
• Full and bulk-logged recovery models. In the full or bulk-logged recovery model, all
log records must be backed up to maintain the log chain—a series of log records having
an unbroken sequence of log sequence numbers (LSNs). The inactive portion of the log
can’t be truncated until all its log records have been captured in a log backup.
The log will always be truncated when you back up the transaction log, as long as at least one of the following conditions exists (a brief example follows this list):
• The BACKUP LOG statement doesn’t specify WITH NO_TRUNCATE or WITH
COPY_ONLY.
• A checkpoint has occurred since the log was last backed up.
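For example (the database name and backup path here are hypothetical), the first statement below is a routine log backup that permits truncation of the inactive log, while the second uses COPY_ONLY and therefore does not:

-- Routine log backup: the inactive portion of the log can be truncated afterward.
BACKUP LOG SalesDB TO DISK = 'G:\Backups\SalesDB_log.trn';

-- COPY_ONLY log backup: does not disturb the log chain and does not allow truncation.
BACKUP LOG SalesDB TO DISK = 'G:\Backups\SalesDB_log_copy.trn' WITH COPY_ONLY;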

MONITORING LOG SPACE USE


You can monitor log space use by using DBCC SQLPERF (LOGSPACE) as shown in
Figure 2-1. This command returns information about the amount of log space currently used
and indicates when the transaction log is in need of truncation.

Figure 2-1: Typical results from DBCC SQLPERF (LOGSPACE)

To get information about the current size of a log file, its maximum size, and the autogrow
option for the file, you can also use the size, max_size, and growth columns for that log file in
sys.database_files.
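Both checks can be run as shown in this brief sketch:

-- Report the log size and the percentage of log space used for every database.
DBCC SQLPERF (LOGSPACE);

-- Report current size, maximum size, and growth settings for the log files
-- of the current database (size and max_size are expressed in 8 KB pages).
SELECT name, size, max_size, growth
FROM sys.database_files
WHERE type_desc = 'LOG';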

SHRINKING THE SIZE OF THE LOG FILE


As you’ve seen, truncating the transaction log is essential because doing so frees disk space
for reuse. However, truncation doesn’t reduce the physical size of the log file. To do that, you
need to shrink the log file to remove one or more virtual log files that don’t hold any part of
the logical log (that is, inactive virtual log files). When a transaction log file is shrunk, enough
inactive virtual log files are removed from the end of the log file to reduce the log to approximately the target size.

LAB EXERCISE
Perform the exercise in your lab manual. In Exercise 2.2, you'll shrink a transaction log file.
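A minimal sketch of the operation, using a hypothetical database, a hypothetical logical log file name, and a target size of 100 MB:

USE SalesDB;
-- Shrink the log file (logical name SalesDB_log) to approximately 100 MB.
DBCC SHRINKFILE (SalesDB_log, 100);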
ADDING OR ENLARGING A LOG FILE
If you don’t want to (or can’t) shrink the log file, another way to gain space is to enlarge
the existing log file (if disk space permits) or add a log file to the database, typically on a
different disk.
To add a log file to the database, use the ADD LOG FILE clause of the ALTER DATABASE
statement. Adding a log file allows the log to grow.
To enlarge the log file, use the MODIFY FILE clause of the ALTER DATABASE statement,
specifying the SIZE and MAXSIZE syntax.
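Hedged sketches of both approaches follow; the database name, file names, paths, and sizes are hypothetical:

-- Add a second log file, typically on a different disk.
ALTER DATABASE SalesDB
ADD LOG FILE (NAME = SalesDB_log2, FILENAME = 'H:\Logs\SalesDB_log2.ldf',
              SIZE = 500MB, FILEGROWTH = 100MB);

-- Enlarge the existing log file and cap its maximum size.
ALTER DATABASE SalesDB
MODIFY FILE (NAME = SalesDB_log, SIZE = 2GB, MAXSIZE = 10GB);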

UNDERSTANDING TRANSACTION LOG STORAGE


As you’ve seen, transaction logs and their storage depend on a number of factors, and for the
most part there are no hard-and-fast rules; but you should follow some basic principles when
designing storage.
Normally you should store transaction log files and data files on separate disk volumes. Note
that when using RAID, it is common to have one large RAID array that is logically separated
into multiple volumes. If this is the case, then you may wish to consider implementing
multiple RAID arrays.

You should reduce the risk of damage to your transaction log by locating it on fault-tolerant
storage. A prudent precaution is to also make multiple copies of log backups by backing up the
log to disk and then copying the disk file to another device, such as a separate disk or tape.

■ Designing Backup-File Storage

THE BOTTOM LINE
The first line of defense against a disaster is your backup. Do it regularly. Store the media in a location where, if you have a fire, they won't burn along with your server room.

In SQL Server, you’re limited to placing database files on what SQL Server deems to be a
local hard disk. Your local hard disks can be on your local machine or on a hardware device
that is connected directly to the SQL Server machine (such as a hardware RAID array).
Although you have this limitation with your active database files, this rule doesn’t apply to
your backups. Backups can be placed anywhere in your enterprise, using named pipe, shared
memory, TCP/IP and VIA protocols on local hard disks, networked hard disks, and tape.

Managing Your Backups

To be able to restore your system when needed, you must manage your backups carefully.
Each backup contains any descriptive text provided when the backup was created, as well
as the backup’s expiration information. You can use this information to:

• Identify a backup.
• Determine when you can safely overwrite the backup.
• Identify all the backups on a backup medium—tape or disk.
A complete history of all backup and restore operations on the server is stored in the msdb
database. Management Studio uses this history to identify the database backups and any
transaction log backups on the specified backup medium, as well as to create a restore plan.
The restore plan recommends a specific database backup and any subsequent transaction log
backups related to this database backup.
If msdb is restored, any backup history information saved since the last backup of msdb was
created is lost. Hence, you should back up msdb frequently enough to reduce the risk of
losing recent updates.
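If you want to examine that history yourself, a simple query against msdb such as the following sketch will list recent backups (the column list is deliberately minimal):

-- List the most recent backups recorded in msdb for every database.
-- Type codes: D = full database, I = differential, L = transaction log.
SELECT bs.database_name,
       bs.type,
       bs.backup_start_date,
       bs.backup_finish_date,
       bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
    ON bs.media_set_id = bmf.media_set_id
ORDER BY bs.backup_finish_date DESC;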
When you're designing database backup-file storage, there are a number of steps and procedures
you should include in your design. The first, and one of the most critical, is to store your back-
ups in a secure place—preferably a secure site removed from the location where the data exists.
Label backup media to avoid accidentally overwriting critical backups. You should also write
expiration dates on backups as protection against inadvertent overwriting. Labeling also allows
for easy identification of the data stored on the backup media or the specific backup set.
You should keep older backups for a designated amount of time, in case the most recent
backup is damaged, destroyed, or lost. When creating backups, consider using RAID 10.
As you recall, with RAID 10, you have a mirrored set of data that you can stripe across sev-
eral mirrored pairs for additional I/O throughput. Because backups primarily read from the
database to write out the backup file, the write advantage will be noticeable on disks storing
backup files.
Another worthwhile step is to write to disks that are locally attached instead of writing to
network-attached storage. If the data is being written to direct-attached storage, you can
eliminate factors outside the server that may increase the backup time.

Maintaining Transaction Log Backups

As you saw in the previous section, if you use the full or bulk-logged recovery models,
making regular backups of your transaction logs is essential to recovering data.

Transaction log backups generally use fewer resources than database backups. As a result, you
can create them more frequently than database backups, reducing your risk of losing data.

TAKE NOTE
*
A transaction log backup can be larger than a database backup. Suppose, for example, that
you have a database with a high transaction rate. In that case, the transaction log will grow
quickly. The best approach will be to create transaction log backups more frequently.

There are three types of transaction log backups. A pure log backup contains only transac-
tion log records for an interval, without any bulk changes. A bulk log backup includes log
and data pages changed by bulk operations. In this type of backup, point-in-time recovery
isn’t allowed. A tail-log backup is taken from a possibly damaged database to capture the log
records that haven’t yet been backed up. A tail-log backup is taken after a failure in order to
prevent work loss and can contain either pure log or bulk log data.
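A sketch of a tail-log backup is shown below; the database name and path are assumptions. NO_TRUNCATE lets the backup succeed even when the data files are damaged, and NORECOVERY leaves the database in the restoring state, ready for the log chain to be applied:

-- Capture the tail of the log from a possibly damaged database
-- before beginning the restore sequence.
BACKUP LOG AdventureWorks
TO DISK = 'C:\SQLServerBackups\AdventureWorks_TailLog.trn'
WITH NO_TRUNCATE, NORECOVERY;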
Making regular transaction log backups is an essential step in your database design, as you'll
see in Lesson 11. As you already learned, in addition to permitting you to restore the backed-up
transactions, a log backup truncates the log to remove the backed-up log records from
the log file. If you don't back up the log frequently enough, the log files can fill up. If you
lose a log backup, you may not be able to restore the database past the preceding backup.
Therefore, you should store the chain of log backups for a series of database backups.

CERTIFICATION READY?
You remembered to back up the current log (tail log) after a catastrophic data disk failure. Congratulations! When you restore this log, do you restore it WITH RECOVERY or WITH NORECOVERY?

If your most recent full database backup is unusable, you can restore an earlier full database backup
and then restore all the transaction log backups created since that earlier full database backup.

Because of the crucial role transaction log backups play in restoring a damaged database, you
should make multiple copies of log backups by backing up the log to disk and then copying
the disk file to another device, such as a separate disk or tape.

Backup Compression

SQL Server 2008 includes an easy-to-implement method of incorporating compression


when conducting database backups. Backups can be set to automatically use compression
via a new database option. This new option is set on the Database Settings node of the
Server Properties. This option setting can be overridden by specifying in the BACKUP
command whether or not compression should be performed. When restoring from a
compressed backup, no additional command syntax is necessary as SQL Server 2008
handles the decompression automatically. Be aware however that a compressed backup
would not be readable by an earlier version of SQL Server.

An example of the command syntax for using compression in a backup is shown next:
BACKUP DATABASE AdventureWorks TO DISK =
'C:\SQLServerBackups\AdventureWorks.Bak'
WITH COMPRESSION;

TAKE NOTE
*
Note that the ability to create backups with compression is only available with the
Enterprise, Developer, and Evaluation editions of SQL Server 2008. All editions of SQL
Server 2008 can restore a compressed backup.

■ Deciding Where to Install the Operating System

THE BOTTOM LINE
A database guideline has evolved. Use three spindles: one for the operating system, one for
the data, and one for the log. With SQL Server you may also want one for TempDB. Add
additional drives (spindles) while maintaining these three (or four) categories.

To ensure the maximum utilization of resources while enhancing security for your database
server, you should install the operating system files on a spindle separate from data and
applications. In the case of SQL Server, you should install the Windows operating system
on a separate drive (with or without the SQL Server executables), which is also where the page file will reside. NTFS
5.0, introduced with Windows 2000, supports both file encryption and compression. By
default, these two features are turned off on a newly installed Windows 2000, 2003, or 2008
Server. Although these features do provide some benefits under limited circumstances, they
don’t provide any benefits for SQL Server. SQL Server is very I/O intensive, and anything
that increases disk I/O hurts SQL Server’s performance, so using either of these features can
greatly hurt performance.
Both file encryption and compression significantly increase disk I/O because data files have
to be manipulated on the fly as they're used. If either of these settings has been activated by
accident, you should turn it off.

TAKE NOTE
*
SQL Server 2008 introduces compressible rows and pages—slightly different concepts
than the operating system solution—that may result in faster I/O. Study BOL topics on
"Compression Implementation" to see if these methods will work for you.

■ Deciding Where to Place SQL Server Service Executables

THE BOTTOM LINE
Generally, place SQL Server files on the operating system spindle. Change the
owner of services to a domain user and provide that service owner with only the rights and
permissions needed.

Each service in SQL Server represents a process or set of processes. Depending on the
Microsoft SQL Server components you choose to install, SQL Server 2005 Setup installs the
following 10 services:
• SQL Server Database Services
• SQL Server Agent
• Analysis Services
• Reporting Services
• Notification Services
• Integration Services
• Full-Text Search
• SQL Server Browser
• SQL Server Active Directory Helper
• SQL Writer
You should install only the services you’ll be using with SQL Server 2005.

On all supported operating systems, SQL Server and SQL Server Agent run as Microsoft
Windows services. For SQL Server and SQL Server Agent to run as services in Windows,
SQL Server and SQL Server Agent must each be assigned a Windows user account. Typically,
both SQL Server and SQL Server Agent are assigned the same user account—either the local
system or a domain user account. However, you can customize the settings for each service
during the installation process.

TAKE NOTE
*
Program files and data files can't be installed on a removable disk drive, on a file system that
uses compression, or on shared drives on a failover cluster instance.

■ Specifying the Number and Placement of Files for Each Database

THE BOTTOM LINE
Dynamic tables require frequent backups. Static tables need to be backed up just once.
Analyze your system. Create filegroups to manage your database objects efficiently.

SQL Server maps a database over a set of operating-system files. Data and log information are
never mixed in the same file, and individual files are used by only one database. As explained
earlier, filegroups are named collections of files and are used to help with data placement and
administrative tasks such as backup and restore operations.
SQL Server data and log files can be put on either FAT or NTFS partitions, with NTFS
highly recommended because of its security aspects. Read/write data filegroups and log files
can’t be placed on a compressed NTFS file system. Only read-only databases and read-only
secondary filegroups can be put on a compressed NTFS file system.

Setting Up Database Files

All SQL Server databases are composed of three file types:

• Primary data files. The primary data file is the starting point of the database and points
to the other files in the database. Every database has one primary data file, typically with
an .mdf extension.
• Secondary data files. Secondary data files make up all the data files other than the pri-
mary data file. Some databases may not have any secondary data files, whereas others
have several secondary data files. The usual extension for secondary data files is .ndf.
• Log files. Log files hold all the log information that is used to recover the database.
There must be at least one log file for each database, although there can be more than
one. The recommended filename extension for log files is .ldf.

TAKE NOTE
*
Although the .mdf, .ndf, and .ldf filename extensions aren't required, it's a good idea to use
them because they help you identify the different kinds of files and their use.

The locations of all the files in a database are recorded in the primary file of the database and
in the master database. SQL Server uses the file location information from the master data-
base most of the time. In the following situations, it uses the file location information in the
primary file to initialize the file location entries in the master database:
• When attaching a database using the CREATE DATABASE statement with either the
FOR ATTACH or FOR ATTACH_REBUILD_LOG option
• When upgrading from SQL Server version 2000 or version 7.0 to SQL Server 2005
• When restoring the master database

Setting Up Filenames

Each SQL Server file has two different names. The logical_file_name is used to refer to
the physical file in all Transact-SQL statements. The logical filename must comply with
the rules for SQL Server identifiers and must be unique among logical filenames in the
database. The OS_file_name is the name of the physical file, including the directory
path. It must follow the rules for operating system filenames.

Setting Up File Size

SQL Server files can grow automatically from their originally specified size, using a growth
increment you define when you create the file. Say, for example, that you have a 100 MB file
with a growth increment of 10 MB. When the file fills, it automatically grows to 110 MB;
when that fills, it grows to 120 MB, and so on. If there are multiple files in a filegroup, none
of them autogrows until all the files are full; growth then occurs in a round-robin fashion.
Therefore, you need to make sure the placement of files allows them sufficient room to expand.

Each file can also have a maximum size specified. If you don’t specify a maximum size, the file
can continue to grow until it has used all available space on the disk. This feature is especially
useful when SQL Server is used as a database embedded in an application where the user
doesn’t have convenient access to a system administrator.
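Both the growth increment and the maximum size are specified when a file is defined. The following CREATE DATABASE statement is a sketch of the 100 MB/10 MB example above; the database name, logical file names, and paths are illustrative assumptions:

-- Data file starts at 100 MB, grows in 10 MB steps, and is capped at 1 GB.
CREATE DATABASE SalesDemo
ON PRIMARY
(
    NAME = SalesDemo_Data,
    FILENAME = 'D:\SQLData\SalesDemo_Data.mdf',
    SIZE = 100MB,
    FILEGROWTH = 10MB,
    MAXSIZE = 1GB
)
LOG ON
(
    NAME = SalesDemo_Log,
    FILENAME = 'E:\SQLLogs\SalesDemo_Log.ldf',
    SIZE = 20MB,
    FILEGROWTH = 5MB,
    MAXSIZE = UNLIMITED
);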

Setting Up Database Filegroups


Database objects and files can be grouped together in filegroups for administration purposes
as well as for allocation. There are two types of filegroups:

TAKE NOTE
*
Log files are never part of a filegroup. Log space is managed separately from data space.

• Primary filegroup. Contains the primary data file and any other files not specifically
assigned to another filegroup. All pages for the system tables are allocated in the primary
filegroup.
• User-defined filegroups. Any filegroups that are specified by using the FILEGROUP
keyword in a CREATE DATABASE or ALTER DATABASE statement.
No file can be a member of more than one filegroup. Tables, indexes, and large object data can
be associated with a specified filegroup. In this case, all their pages are allocated in that file-
group, or the tables and indexes can be partitioned. The data of partitioned tables and indexes
is divided into units, each of which can be placed in a separate filegroup in a database.
One filegroup in each database is designated the default filegroup. When a table or index is
created without specifying a filegroup, it’s assumed that all pages will be allocated from the
default filegroup. Only one filegroup at a time can be the default filegroup. If no default file-
group is specified, the primary filegroup is the default filegroup.
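As a sketch of these statements (the database, filegroup, file name, and path are assumptions), a user-defined filegroup can be created, populated with a secondary data file, and then made the default:

-- Create a user-defined filegroup.
ALTER DATABASE SalesDemo ADD FILEGROUP FG_Archive;

-- Place a new secondary data file in that filegroup.
ALTER DATABASE SalesDemo
ADD FILE
(
    NAME = SalesDemo_Archive1,
    FILENAME = 'F:\SQLData\SalesDemo_Archive1.ndf',
    SIZE = 100MB
)
TO FILEGROUP FG_Archive;

-- Make it the default filegroup for new tables and indexes.
ALTER DATABASE SalesDemo MODIFY FILEGROUP FG_Archive DEFAULT;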

■ Designing Instances

THE BOTTOM LINE
Instances are isolated from each other. Someone with permission to access one instance
(normally) cannot access another instance.

An instance is a single installation of SQL Server. There are two types of SQL Server instances:
• Default instance. The first installation of SQL Server on a machine is the default
instance. It doesn’t have a special network name; it works by using the name of the
computer, just like always. The names of the default services remain MSSQLServer and
SQLServerAgent. If you have older SQL client applications that use only the computer
name, then you can still use those against the default instance. You can have only a
single default instance running at any given time.
• Named instance. SQL Server can be installed multiple times (in different directories)
on the same computer. In order to run multiple copies at the same time, a named
instance is installed. With a named instance, the computer name and the name of the
instance are used for the full name of the SQL Server instance. For example, if the
server GARAK has an instance called SECOND, the instance is known by GARAK\
SECOND, and GARAK\SECOND is used to connect to the instance, as shown in
Figure 2-2.

Figure 2-2
Connecting to a named
instance
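Once connected to a named instance, you can confirm where you are from Transact-SQL; the GARAK\SECOND values in the comments simply follow the example above:

-- Verify which instance the connection is using.
SELECT @@SERVERNAME                   AS full_instance_name,  -- e.g., GARAK\SECOND
       SERVERPROPERTY('InstanceName') AS instance_name;       -- e.g., SECOND (NULL on a default instance)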

One of the first decisions you have to make when installing SQL Server 2005 is whether to
use a default or named instance. Use the following guidelines in making your decision:
• If you’re upgrading from SQL Server 7.0, the upgraded instance must be created as a
default instance.
• If you only plan to install a single instance of SQL Server on a database server, it should
be a default instance.
• If you must support client connections from SQL Server 7.0 or earlier, it’s easier to use a
default instance.
SQL Server allows you to install a named instance without installing a default instance. This is
useful when you plan to have multiple instances on the same computer because the server can
host only one default instance. You can have a default instance and multiple named instances
on the same server, but it's simpler if every instance has the same naming convention.

TAKE NOTE
*
You can install instances any time, even after a default or another named instance of SQL Server is installed.

Any application that installs SQL Server Express edition should install it as a named instance.
Doing so minimizes conflict in situations where multiple applications are installed on the
same computer.

■ Deciding on the Number of Instances

THE BOTTOM LINE
A single server has only so many CPUs and a certain amount of RAM. Since each instance
requires its fair share of resources, you must continue the balancing act you have been
learning. At what point does performance of the other instances suffer?

As you’ve seen, SQL Server supports multiple SQL Server instances on a single server or
processor. Only one instance can be the default instance; all others must be named instances.
A computer can run multiple instances of SQL Server concurrently, and each instance runs
independently of other instances. All instances on a single server or processor must be the
same localized version of SQL Server 2005.
Table 2-2 shows the number of instances supported for each instance-aware component in the
different editions of SQL Server 2005.

Table 2-2
Number of instances per SQL Server 2005 edition and component

SQL SERVER 2005 EDITION            DATABASE ENGINE INSTANCES   ANALYSIS SERVICES INSTANCES   REPORTING SERVICES INSTANCES
Enterprise or Developer            50                          50                            50
Standard, Workgroup, or Express    16                          16                            16
TAKE NOTE
*
The maximum number of instances is reduced by at least 50 percent when servers are clustered. Further reduction may result from other clustering restrictions.

Based on business requirements, you need to determine the number of instances that must
be installed on a database server. You can use several instances to isolate databases on a single
server. Doing so safeguards the databases from inadvertent configuration changes. However,
each instance has certain resource requirements, and running too many instances increases the
management overhead of the operating system.

The number of instances you can install always depends on the resources available on your
server and the resources that each instance requires. Sometimes it's possible to sum the individual
resource requirements for CPU, memory, and I/O, and get a reasonably good idea of
how many instances can fit.

X REF
For additional information, see Lesson 3.

Usually, if you have enough memory and disk space with SQL Server, you can get about four
instances comfortably—maybe one or two more, if they're low-power-consumption instances.
Add many more than that, and you can run into disk trouble.
Generally speaking, one SQL Server instance will outperform two or more instances on the
same hardware, because there is some overhead for the instances themselves. If your first
instance isn’t hitting a performance bottleneck, having a second instance always reduces the
resources available to both instances because the second instance maintains both the second
copy of SQL Server and its own copies of the query plans for its data.
Your obvious goal is to find a way to achieve a balance between isolation, manageability, and
resources.

■ Deciding How to Name Instances

THE BOTTOM LINE
As with all objects, establish a naming convention that makes sense in your environment.

Because of the large number of instances you can potentially have across an enterprise,
establishing a naming convention at the outset is good practice. Each instance must have a
unique name. The names should be short but descriptive. Pay careful attention to creating
naming conventions, avoiding cryptic names as much as possible. If the instances aren’t
named clearly, you may make mistakes when accessing them. Remember that an instance
can’t be renamed—once a name is assigned, that’s it.

Keep in mind the following caveats and requirements when you’re creating a SQL Server
instance name:
• Instance names are limited to 16 characters.
• Instance names are case insensitive.
• An instance can’t be renamed. If you change the name of the computer, that portion of
the name changes, but not the instance name.
• Instance names can’t contain Default, MSSQLServer, or other reserved keywords.
• The first character in the instance name must be a letter or an underscore ( _ ).
Subsequent characters can be letters, decimal numbers from Basic Latin or other national
scripts, the dollar sign ( $ ), or an underscore ( _ ).
• Embedded spaces or other special characters aren’t allowed in instance names, nor are
the backslash ( \ ), comma ( , ), colon ( : ), semicolon ( ; ), single quote ( ’ ), ampersand
( & ), or at sign ( @ ).

TAKE NOTE
*
Only characters that are valid in the current Microsoft Windows code page can be used
in SQL Server instance names. It is also good practice to avoid overly complicated
instance names.

The name you give an instance is a “virtual” name. When creating directories and files, SQL
Server Setup uses the instance ID it generates for each server component. The server compo-
nents in SQL Server 2005 are the Database Engine, Analysis Services, and Reporting Services.
The instance ID is in the format MSSQL.n, where n is the ordinal number of the compo-
nent being installed. The instance ID is used in the file directory and the registry root. For
instance, if you install SQL Server and include Analysis and Reporting Services, the instance
ID will be three different numbers, and each server component will have its own instance ID.
The first instance ID generated is MSSQL.1; ID numbers are incremented for additional
instances as MSSQL.2, MSSQL.3, and so on. To confuse things a little further, if gaps occur
in the ID sequence because you’ve uninstalled a component or an entire instance, subsequent
installs result in SQL Server generating ID numbers to fill the gaps first. Hence, the most
recently installed instance may not always have the highest instance ID number.

TAKE NOTE
*
SQL Server 2008 identifies instances slightly differently. The default path name is still
Program Files\Microsoft SQL Server but then deviates from SQL Server 2005 in that
the component is identified (e.g., MSAS10 for Analysis Services, MSRS10 for Reporting
Services, and MSSQL10 for the OLTP Database Engine).

Server components are installed in directories with the format <instanceID>\<component
name>. For example, a default or named instance with the Database Engine, Analysis
Services, and Reporting Services has the following default directories:
• <Program Files>\Microsoft SQL Server\MSSQL.1\MSSQL\ for the Database Engine
• <Program Files>\Microsoft SQL Server\MSSQL.2\OLAP\ for Analysis Services
• <Program Files>\Microsoft SQL Server\MSSQL.3\RS\ for Reporting Services
Instead of <Program Files>\Microsoft SQL Server, a <custom path> is used if the user chooses
to change the default installation directory.

CERTIFICATION READY?
Expect some exam questions combining RAID, filegroups, and possibly multiple instances. Can a new instance be created using a RAID 5 array already in use by another instance? If so, where should the data files and log files be located if multiple drive letters and arrays are available?

SQL Server 2005 Integration Services, Notification Services, and client components aren't
instance aware and, therefore, aren't assigned an instance ID. Non-instance-aware components
are installed to the same directory by default: <system drive>:\Program Files\Microsoft SQL
Server\90\. Changing the installation path for one shared component also changes it for the
other shared components. Subsequent installations install non-instance-aware components to
the same directory as the original installation.

■ Deciding How Many Physical Servers Are Needed

THE BOTTOM LINE
Only so many instances can be supported on a single server. If you need more instances,
you must procure more servers. This involves hardware, software, and licenses.

Determining how many physical servers you’ll need and determining how many databases
you should create both depend on the same factors. The total number of databases or
instances on a particular server isn’t all that relevant. What is important is how busy each of
the databases is (and, to a certain degree, the size of the databases in relation to the size of the
available disk space). You can have servers with only one very busy database, and other servers
with many, many databases (all little used). The same logic applies to instances.
You must consider the total overall load on each physical SQL Server, not the total number
of databases on each server (unless database size is an issue). As you saw in Lesson 1, System
Monitor can be used to help you determine whether a particular SQL Server currently is
experiencing bottlenecks.
If you’ll be setting up one or more new SQL Servers, determining how many databases
should be on each server isn’t an easy task, because you probably don’t know what the load on
each database will be. In this case, you must make educated guesses about database usage to
best distribute databases among multiple SQL Servers and get the biggest performance ben-
efits. And once you get some experience with the databases in production, then you can move
them around as appropriate to balance the load.

Deciding Where to Place System Databases for Each Instance


When an instance of SQL Server 2005 is installed, Setup creates the database and log files
shown in Table 2-3.
During SQL Server 2005 installation, Setup automatically creates an independent set of
system databases for each instance. Each instance receives a different default directory to hold
the files for the databases created in the instance. The default location of the database and log
files is Program Files\Microsoft SQL Server\Mssql.n\MSSQL\Data, where n is the ordinal
number of the SQL Server instance.

Table 2-3
Database and log files

DATABASE   DATABASE FILE    LOG FILE
master     Master.mdf       Mastlog.ldf
model      Model.mdf        Modellog.ldf
msdb       Msdbdata.mdf     Msdblog.ldf
tempdb     Tempdb.mdf       Templog.ldf

The SQL Server installation process prompts you to select the physical location of the files
belonging to the system databases if you want to use a location rather than the default.

TAKE NOTE
*
As pointed out in the Deciding How to Name Instances section, SQL Server 2008 has a slightly different path structure.

System databases contain information used by SQL Server to operate. You then create user
databases, which can contain any information you need to collect. You can use the query editor
in Management Studio to query any of your SQL databases, including the system and sample databases.

Table 2-4 describes the type of information stored in each of the default databases.

Table 2-4
System database contents

DATABASE       CONTENTS
distribution   History information about replication. SQL Server creates this database on your server only if you configure replication.
master         Information about the operation of SQL Server, including user accounts, other SQL servers, environment variables, error messages, databases, storage space allocated to databases, and the tapes and disk drives on the SQL Server.
model          A template for creating new databases. SQL Server automatically copies the objects in this database to each new database you create.
msdb           Information about all scheduled jobs, alerts, and operators on your server. This information is used by the SQL Server Agent service.
tempdb         Temporary information and intermediate result sets. This database is like a scratchpad for SQL Server.

TAKE NOTE
*
An additional system database, the Resource database, is a read-only database that contains
all the system objects included with SQL Server. It's usually hidden in Management
Studio. The only supported user action is to move the Resource database to the same
location as the master database.

Normally, you’ll leave the system databases in the default installation directory. However, you
may have to move a system database in the following situations:
• Failure recovery (For example, the database is in suspect mode or has shut down because
of a hardware failure)
• Planned relocation
• Relocation for scheduled disk maintenance

TAKE NOTE
*
Common files used by all instances on a single computer are installed in the folder
systemdrive:\Program Files\Microsoft SQL Server\90, where systemdrive is the drive letter where
components are installed. Normally this is drive C:.

LAB EXERCISE
Perform the exercise in your lab manual.

In Exercise 2.3, you'll see where the system database files are located.

■ Deciding on the Tempdb Database Physical Storage

THE BOTTOM LINE
The faster the drive assigned to tempdb, the faster your database will be. A separate spindle
may be justified.

The tempdb system database is a global resource available to all users connected to the instance
of SQL Server. The tempdb system database is like a scratchpad for SQL Server and a place
where temporary information and intermediate result sets are stored. Earlier versions of SQL
Server made some use of the tempdb database, but SQL Server 2005 takes that a step further.

The new version uses the tempdb database heavily to support features such as row versioning
for triggers, online indexing, Multiple Active Result Sets (MARS), and snapshot isolation.
Consequently, you must be careful when determining the size and location of tempdb. In
addition, you should ensure that each instance has adequate throughput to the disk volumes
on which tempdb is stored.
Because it serves the same role as the reams of notepaper on which a writer outlines ideas, the
tempdb database is volatile, and no effort is made to save it from session to session. Instead,
tempdb is re-created each time the instance of SQL Server is started, and the system always
starts with a clean copy of the database. Temporary tables and stored procedures are dropped
automatically on disconnect, and no connections are active when the system is shut down.
For that reason, SQL Server doesn’t allow backup and restore operations on the tempdb
system database.

TAKE NOTE
*
Because tempdb is re-created each time the instance of SQL Server is started, you don't
have to physically move the data and log files. The files are created in the new location
when the service is restarted. Until then, tempdb continues to use the data and log files in
the existing location.
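A sketch of such a relocation script follows; the target path is an assumption, while tempdev and templog are the default logical file names:

-- Point tempdb's files at a new location; the change takes effect
-- the next time the SQL Server service is restarted.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');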

The size and physical placement of the tempdb system database can affect the performance of
a system. For example, if the size that is defined for tempdb is too small, part of the system-
processing load may be taken up with autogrowing tempdb to the size required to support the
workload every time you restart the instance of SQL Server. You can avoid this overhead by
increasing the size of the tempdb database and log file.
Determining the appropriate size and location for tempdb in a production environment
depends on many factors. As described previously, these factors include the existing workload
and the SQL Server components and other features that are used.
Whenever possible, place the tempdb system database on a fast I/O subsystem. Use disk strip-
ing if there are many directly attached disks. You should also put the tempdb database on
disks other than those being used for the user databases.
For optimal tempdb performance, you can make some critical settings to the configuration
of tempdb. (SQL Server Books Online contains excellent information on how to optimize
tempdb usage that is beyond the scope of this book.)
• Set the recovery model of tempdb to Simple to automatically reclaim log space. This
keeps space requirements small.
• Set files to automatically grow when they need additional space. This allows the file to
grow until the disk is full.
• Set the file growth increment to a reasonable level. You want to keep the tempdb data-
base files from growing by too small a value, causing tempdb to constantly use resources
to expand, which adversely impacts performance. Microsoft recommends the following
general guidelines for setting the file-growth increment for tempdb files.

TEMPDB FILE SIZE    FILE-GROWTH INCREMENT
0 to 100 MB         10 MB
100 to 200 MB       20 MB
200 MB or more      10%

• Preallocate space for all tempdb files by setting the file size to a value large enough to
accommodate the typical workload in the environment.

• Create as many files as needed to maximize disk bandwidth. As a general guideline,


create one data file for each CPU on the server and then adjust the number of files up or
down as necessary. Note that a dual-core CPU is considered to be two CPUs.
• When using multiple data files, make each file the same size to allow for optimal
proportional-fill performance.

LAB EXERCISE
Perform the exercise in your lab manual.

In Exercise 2.4, you'll modify the tempdb database's size and growth parameters.
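As a hedged illustration of the presizing and growth recommendations above (the size and growth values are assumptions to be adjusted for your workload, and tempdev and templog are the default logical file names):

-- Preallocate tempdb space and set reasonable growth increments.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 500MB, FILEGROWTH = 50MB);

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, SIZE = 100MB, FILEGROWTH = 20MB);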

Establishing Service Requirements

Every SQL Server instance is made up of a distinct set of services with specific settings
for collations and other options. The directory structure, registry structure, and service
names all reflect the specific instance ID of the SQL Server instance created during SQL
Server Setup.

You can determine which services are required by an instance based on the role played by the
instance. You can use the SQL Server Configuration Manager and the SQL Server Surface
Area Configuration tool to configure services.
SQL Server provides a large number of services, but a database server typically doesn't require
all of them. Therefore, you should establish policies for enabling and disabling SQL Server
services.
To do so, first group database servers according to their roles in the infrastructure. Then,
identify which SQL Server services need to be enabled for each group, and disable the
remaining services for that group. For example, a reporting server requires SQL Server
Reporting Services (SSRS) but may not require SQL Server Integration Services (SSIS) or the
SQL Browser service.
Isolating services reduces the risk that one compromised service could be used to compromise
others. To isolate services, use the following guidelines:
• Don’t install SQL Server on a domain controller.
• Run separate SQL Server services under separate Windows accounts.
• In a multitier environment, run Web logic and business logic on separate computers.

Specifying Instance Configurations

So far in this lesson, you’ve looked at a number of overlapping topics that focus on the best
ways to configure your database server’s physical storage and other resource utilization.

In addition to planning where to place files and databases, designing storage and determining
instances, there are a few other aspects of instance configuration to address. SQL Server
2005 contains a new tool, Configuration Manager, which you can use to manage the services
associated with a SQL Server instance, configure the network protocols used by SQL Server,
and manage the network connectivity configuration from SQL Server client computers.
SQL Server Configuration Manager is a Microsoft Management Console snap-in that is
available from the Start menu, or you can add it to any other Microsoft Management Console
display. Microsoft Management Console (mmc.exe) uses the SQLServerManager.msc file in
the Windows System32 folder to open SQL Server Configuration Manager.

TAKE NOTE
*
SQL Server Configuration Manager combines the functionality of the following SQL
Server 2000 tools: Server Network Utility, Client Network Utility, and Service Manager.

You can use Configuration Manager to perform the following tasks:


Manage services. You can use Configuration Manager to start, pause, resume, or stop ser-
vices, to view service properties, or to change service properties. As you can see in Figure 2-3,
Configuration Manager gives you easy access to SQL Server Services.
Change the accounts used by services. You should always use SQL Server tools, such
as SQL Server Configuration Manager, to change the account used by the SQL Server or
SQL Server Agent services, or to change the password for the account. You can also use
Configuration Manager to set permissions in the Windows Registry so that the new account
can read the SQL Server settings.
Manage server network and client protocols. SQL Server 2005 supports Shared Memory,
TCP/IP, Named Pipes, and VIA protocols. You can use Configuration Manager to config-
ure server and client network protocols and connectivity options. After the correct protocols
are enabled using the Surface Area Configuration tool, you usually don’t need to change the
server network connections. However, you can use SQL Server Configuration Manager if you
need to reconfigure the server connections so that SQL Server listens on a particular network
protocol, port, or pipe.
Assign TCP ports to instances. If instances must listen through TCP ports, you should
explicitly assign private port numbers. Otherwise, the port numbers are dynamically assigned.
You can use the SQL Server Configuration Manager to assign port numbers. Although you
can change port numbers that are dynamically assigned, client applications that are set up to
use these port numbers may be adversely affected.

TAKE NOTE
*
When you're assigning ports, make sure they don't conflict with port numbers that are
already reserved by software vendors. To determine which port numbers are available,
visit the Internet Assigned Numbers Authority (IANA) Web site at the following URL:
www.iana.org/assignments/port-numbers.

Figure 2-3
SQL Server Configuration
Manager is the preferred tool
to manage many aspects of
SQL Server instance configura-
tions, including services.

LAB EXERCISE
Perform the exercise in your lab manual.

In Exercise 2.5, you'll learn how to use the Configuration Manager.

SKILL SUMMARY

Physical storage is a prime consideration when you’re planning a SQL Server database
infrastructure. In this lesson, we’ve reviewed best practices and parameters for storing
the transaction log and backup file. You’ve had a brief introduction to RAID and how it can
be used to both assure fault tolerance and optimize your storage system.
Placement of files also plays an important role on performance, and you’ve learned the
information you’ll need to use in deciding where to install the operating system, SQL Server
service executables, files created for databases, and system databases, especially tempdb.
You’ve also learned that there is no magic answer about where to place files, but that your
decision in this regard will have a ripple effect across your database server.
You learned about default and named instances and how they both expand your flexibility
and ability to customize your database infrastructure while bringing along their own set of
considerations. You’ve learned basic functions such as deciding on the number of instances
and naming conventions. You’ve also learned when and where to use default and named
instances.
You learned how to set service requirements specific to your database server needs. Finally, you
learned how to use SQL Server Configuration Manager to administer services and instances, as
well as network protocols.
In the next lesson, you’ll learn how to combine the material you’ve learned here and in
Lesson 1 to develop a database-consolidation strategy.
For the certification examination:
• Be familiar with transaction logs and their storage needs. It’s important that you know the
growth characteristics of transaction log files and how they impact your physical storage
design. Make sure you understand the effect truncation and shrinking have on transaction logs.
• Know the different types of RAID. It’s important that you be able to differentiate between the
different types of RAID and understand which type should be applied in what circumstances.
You should be aware of the relative impact each RAID type has on read-and-write operations.
• Be familiar with system databases. What is their role? What do they do, and how do you dif-
ferentiate among them? Make sure you have a clear idea of the effect placement of the system
databases has on overall performance and the circumstances in which they should be moved.
• Understand the impact of the tempdb system database. Be certain you know the role of
the tempdb system database in an instance and the design considerations surrounding its
physical storage. Be familiar with the recommendations for the initial size of tempdb in
different situations as well as the growth increment.
• Understand default and named instances. You should know the basic difference between
them and the circumstances under which a named instance is more appropriate than a
default instance, and vice versa.
• Know how to administer instances. You should understand the basics of choosing the
proper naming and number of instances for your infrastructure.

■ Knowledge Assessment

Case Study
Mullen Enterprises
Mullen Enterprises provides database-hosting services for companies in the health
care industry. The company is now offering a new hosting service based on SQL
Server 2005.
Mullen Enterprises has a single office. Customers connect to the company network
through private WAN connections and via the Internet.

Planned Changes
The company plans to implement new SQL Server 2005 computers named Dublin,
Shannon, and Cork to host customer databases.

Existing Data Environment


Currently, Mullen Enterprises has its own database named Customers, which is used
to track customers. This database exists on a SQL Server computer named Dublin.
Internal users access this database through a web services application that allows users
to provide details that are used to build ad hoc queries that are then sent to the SQL
Server computer.

Business Requirements
Each customer can host up to five databases. Databases for a given customer are always
hosted on the same server. Each customer uses his or her own naming schema. Because
all customers are in the health care industry, most customers give their databases similar
names such as Patients, Doctors, Medications, and so on.

Performance
The company wants to maintain a minimal number of SQL Server 2005 instances and
servers.

Multiple Choice
Circle the letter or letters that correspond to the best answer or answers.
Use the information in the previous case study to answer the following questions.
1. You need to design a strategy for identifying the number of instances that any one SQL
Server computer will support. What should you do?
a. Specify that each server must have one service for each customer.
b. Specify that each server must have only one instance.
c. Specify that each server must have one instance for each database that is hosted on
the server.
d. Specify that each server must have one instance for each customer who has one or
more databases that are hosted on the server.
2. You plan to have the Cork server contain three customers: Yanni HealthCare Services,
Kelly Hospitals, Inc., and The Curtin Clinic. Following your guidelines that each server
must have one instance for each customer who has one or more databases hosted on the
server, which of the following should you do?
a. Create a default and two named instances. Place the customers with the largest
performance need on the default instances, and place each of the other customers
on their own named instance.
b. Create three named instances—Yanni_HealthCare_Services, Kelly_Hospitals, and
Curtin_Clinic—and place each customer's databases on the specific instance.
c. Create three named instances—YHCS, KH, and CC—and place each customer’s
databases on the specific instance.
d. Create three named instances—YanniHealth, KellyHosp, and CurtinClinic—and
place each customer’s databases on the specific instance.
3. You are planning the configuration of the SQL Server 2005 instance where the
Customer database will be stored. As a security precaution, you need to ensure that
Windows services that are not essential are disabled. Which Windows service or services
should be disabled? (Choose all that apply.)
a. SQL Browser
b. SQL Server
c. SQL Server Analysis Services
d. SQL Writer
e. SQL Server Integration Services
f. SQL Server Agent
4. You are planning the configuration of the CurtinClinic instance and trying to determine
how much space you need for the database. You want to estimate the size of the Patients
database when it is completely full. The Patients database consists of the following tables:
Name, Billing, and Orders. The total field size for the Name table is 184 bytes; the
Billing table is 313 bytes, and the Orders table is 439 bytes. You need to plan for 15,000
records. How much space should you plan for the Patients database?
a. 15.36 MB
b. 14.60 MB
c. 14.28 MB
d. 17.23 MB
5. You want to know the amount of space the transaction log for the Customer database is
using. Which T-SQL command would you use?
a. DBCC SQLPERF (LOGSPACE)
b. DBCC CALCULATE (LOGSPACE)
c. DBCC SQLPERF (TRANSACT)
d. DBCC CALL (TRANSLOGSPACE)
6. Yanni HealthCare Services advises you that their database, Orders, is a mission-critical
database. Because the database, which is related to patient care, contains all the phar-
macy, laboratory, and other orders, none of which can be acted on unless confirmed by
the database, it must be available 24/7 and have the fastest possible access. Which step
should you take?
a. Place the Orders database on a SCSI disk with the fastest controller.
b. Implement RAID 0.
c. Implement RAID 10.
d. Implement RAID 15.
7. You configure the Kelly Hospitals transaction log so that it starts with a size of 20 MB.
What settings do you need to configure in order for it to grow automatically by a pre-
specified amount of 5 MB until it fills the disk? (Choose all that apply.)
a. FILESIZE
b. FILEGROWTH
c. GROWTH_INCREMENT
d. GROWTH_SPACE
8. The transaction log for the Yanni HealthCare Services database has reached the maxi-
mum size possible and consumed all available disk space. You want to reduce the physi-
cal size of the log in order to allow other disk operations. Which of the following is the
correct procedure?
a. Use the command DBCC SHRINKFILE.
b. Truncate the database.
c. Shrink the database from Object Explorer.


d. Force a checkpoint.
9. You have been instructed to optimize tempdb performance for all databases as part of your
optimization plan. Which of the following settings would you make? (Choose all that apply.)
a. Set the recovery model of tempdb to Full.
b. Preallocate space for all tempdb files.
c. Set the tempdb file-growth increments to at least 10 percent.
d. Always confine tempdb to a single file.
10. Which of the following tasks can you use the Configuration Manager to perform?
(Choose all that apply.)
a. Manage services
b. Change accounts used by services
c. Manage server network and client protocols
d. Assign TCP ports to instances
LESSON 3
Designing a Consolidation Strategy

LESSON SKILL MATRIX

TECHNOLOGY SKILL                                                      EXAM OBJECTIVE
Design a database consolidation strategy.                             Foundational
Gather information to analyze the dispersed environment.             Foundational
Identify potential consolidation problems.                            Foundational
Create a specification to consolidate SQL Server databases.          Foundational
Design a database migration plan for the consolidated environment.   Foundational
Test existing applications against the consolidated environment.     Foundational

KEY TERMS
deploying: Migrating and stabilizing your database servers in the consolidated environment.
developing: Designing a database migration plan for the consolidated environment, creating a solution, and testing the pilot.
envisioning: Gathering information to analyze a dispersed environment and identifying potential consolidation problems.
planning: Evaluating the data you gathered in the previous phase and creating a specification to consolidate SQL Server instances.

SQL Server is a powerful data platform capable of handling many different applications
at once. However, in most cases, each application has its own dedicated SQL Server that
is underutilized. This results in greater costs than necessary to many businesses, usually
for two reasons. First, often the DBAs don’t want to chance decreased performance with
multiple applications using the same database server, so they separate each one on its
own SQL Server. Second, application developers or network administrators don’t realize
that the database isn’t synonymous with the server, so they require a new server for each
application.
Building on the capabilities of SQL Server 2000 and greatly extending its limits, SQL
Server 2005 is a platform geared to consolidate many SQL Server 2000 instances into
fewer SQL Server 2005 servers. This Lesson looks at the terminology and pros and cons
of consolidations, and how you may want to proceed when developing a strategy for your
environment.
This Lesson looks at a consolidation strategy in four phases based on the Microsoft Solutions
Framework (MSF):


• Envisioning your strategy. Gathering information to analyze a dispersed environment


and identifying potential consolidation problems.
• Planning your strategy. Evaluating the data you gathered in the previous phase and
creating a specification to consolidate SQL Server instances.
• Developing your plan. Designing a database migration plan for the consolidated
environment, creating a solution, and testing the pilot.
• Deploying your plan. Migrating and stabilizing your database servers in the
consolidated environment.

TAKE NOTE
*
The full MSF Process Model consists of five distinct phases:
• Envisioning
• Planning
• Developing
• Stabilizing
• Deploying
The stabilizing phase has been omitted in this discussion because it is largely an action
rather than a planning phase. During the stabilizing phase, the team performs integration,
load, and beta testing on the solution. In addition, the team tests the deployment scenarios
for the solution. The team focuses on identifying, prioritizing, and resolving issues so that
the solution can be prepared for release. During this phase, the solution progresses from
the state of all features being complete as defined in the functional specification for this
version to the state of meeting the defined quality levels. In addition, the solution is ready
for deployment to the business.

One note before going further: Your consolidation plan will be unique because of the wide
variety of variables that occur with each set of servers. This Lesson’s discussion assumes you
understand your environment and can extrapolate the advice and details given to make the
most informed decisions for your company.

■ Phase 1: Envisioning

When you’re creating a consolidation plan, the first step is to consider its value by examining
THE BOTTOM LINE
the current SQL Server environment and gathering the information about the infrastructure.

The following sections highlight the main steps of envisioning the consolidation plan.

Forming a Team

Before you decide whether consolidation is a good idea for your organization, you need
to form a team to plan, create, and test your consolidation strategy. This team needs to
be involved from the start, so members have input on the benefits and drawbacks of
consolidation for the company.

The consolidation effort won’t be quick or easy, and it’s too difficult for one person to
manage. Even in a small company where a single IT employee may be in charge of the
consolidation effort, this person will still need input and assistance from other aspects of the
business. A team of two is required at a minimum, consisting of a technical IT representative
and a business end user, although often other people will be involved.
The technical portion of the team should have representatives from the operational side of the
company as well as the development side. Again, in a small company these may be the same per-
son, but both perspectives should be represented. From the business side, for example, it might
be helpful to have a financial representative as well as an end user. This may be the same person,
but they should present the reasons for proceeding as well as the potential effect on end users.

Making the Decision to Consolidate

The first task when beginning a consolidation plan is to decide if you should con-
solidate your servers. Once this decision is made for two or more instances, then you
can proceed with your plan. The first step is to examine the reasons for and against
consolidation.

It isn’t always a clear-cut case that you’ll want to consolidate your SQL Server instances.
However, some good reasons exist for going through a server consolidation. The following
sections cover the main reasons you should consider consolidation.

CONSIDERING COSTS
The first reason companies look at server consolidation is cost. You should consider a few
types of costs, but first look at hard costs—those you must pay for immediately, such as the
SQL Server license. Each SQL Server you install costs money for the server software. There
may be a simple fee for the software or there may be per-processor costs and possibly long-
term maintenance contracts. Microsoft offers discounts for certain business categories (government,
education, and many others) for which you may qualify, but licensing remains a significant
cost for many businesses.
Hard costs are easy to quantify and calculate because they represent actual money being spent
by the company. Other costs, called soft costs, are harder to list because they consist of missed
chances for revenue or savings.
IT people are often paid salaries; or, if your company outsources your IT support, there may
be a fixed cost for the service. But every server you add requires time to set up, install, admin-
ister, patch, and so on. Soft costs can be hidden because companies don’t raise the IT admin-
istrator’s salary each time a new server is added. Instead, the greater workload takes away from
the administrator’s ability to perform other work because of a lack of time. Or it may cause a
lack of desire to improve other areas because administrators feel overloaded and taken advan-
tage of as the number of servers they must support grows.
A great example is the database administrator’s (DBA) position. If a company’s DBA has two
SQL Server instances to administer, time should be available after monitoring logs, patching,
and so on, to tune these servers, proactively rewrite queries, and perform other tasks that are
important to a smooth running SQL Server. However, if the same DBA is required to handle
five servers, then there is less time to devote to each server. For a small business, the DBA is
probably also the system administrator. Overload is much more likely because that person
may also be responsible for file servers, mail servers, web servers, and other systems. Salary
costs are the biggest component of IT, so it behooves a company to minimize staffing require-
ments. Consolidation helps by allowing a smaller number of employees to administer a larger
number of applications using fewer servers.
The cost strategy can also be extended to your infrastructure. Although it’s unlikely that
you’ll run out of IP addresses, you may have other issues. These days, as servers get smaller
and smaller (e.g., blade servers), electrical power becomes a concern. Ensuring that you have
enough electrical power to supply your servers has become more of an issue in many data
centers. Even when you’re part of a small company with a single rack at a co-location facility
or in the back closet, if you continue to add servers, at some point you’ll start to run short of
electrical power. Adding power lines can be a small expense or a large one, depending on your
situation, but it's never a request you want to make.
ers as you upgrade existing servers or look to install new applications, you can dramatically
lower your power requirements.
An even more critical component than power in many environments is the cooling capac-
ity in your data center. Most data centers designed in the past 10 years were created with the
expectation that 20 amps of power and 5 to 10 servers would be placed in each rack. Today,
with smaller servers, a single rack can draw more than 40 amps and contain dozens of servers,
throwing off more heat than can be removed by existing cooling systems. Large installations
are turning to liquid cooling in some cases, and rack vendors are even building liquid cooling
into their rack enclosures.
The addition of cooling capacity is a double demand on a company's finances. Not only must addi-
tional cooling equipment be purchased and installed (an expensive proposition), but this equip-
ment also requires power, adding strain to your power infrastructure. And if you’re forced to move
to liquid cooling from air cooling, then a major capital investment is required. Again, consolidat-
ing servers can eliminate the need for your organization to invest in additional cooling systems.
These last two expenses, power and cooling, are soft costs. It’s unlikely that adding one addi-
tional SQL Server will force you to spend money, but at some point the company will need
to expend hard cash on one of these projects. Consolidation allows you to delay, or even
eliminate, the need for any of these soft costs to become hard costs.

CONSIDERING SECURITY
One very good reason for consolidation is security. Setting up and maintaining a secure enter-
prise is difficult, and the fewer systems you have to secure, the more secure the enterprise
should be. Security experts talk about the surface area of attack, or the number of points at
which your security can be breached. Each new system means another chance for an attacker
to exploit a forgotten configuration, an unpatched vulnerability, or extra accounts that were
created and forgotten.
If you have one SQL Server, then you have one sa login account to worry about. When you
change the password, as you often should, then you’ve increased your level of security. If you
have five SQL Servers, there is a chance that one password will be forgotten or not changed
when it should be. You also have five potential sa login accounts that an attacker can look to
exploit. Just as having five doors to your building is less secure than having one or two, more
servers lower your overall security by increasing the surface area for attack. They also increase
the chances that one will remain unpatched or be incorrectly configured.

CONSIDERING CENTRALIZED MANAGEMENT


Another good reason to consolidate servers is to move to a centralized server management,
which can reduce costs and provide greater security. By centralizing management, you more
efficiently use the staff and software resources you’ve devoted to this task. Each server that
must be managed has its own quirks that must be known to keep it running in peak condi-
tion; this places an increased load on the staff and decreases the number of server instances
that a particular DBA can effectively manage. By consolidating on fewer servers, your staff
can more effectively manage the systems.
Also, some software resources are devoted to the management of the systems, whether these
are log readers, performance-monitoring applications, or others that read information from
each server instance. The licensing cost savings of these applications are easy to quantify in a
consolidation plan, but the load is also a driver. Network loads can be greatly reduced because
fewer servers being queried results in less information that must transit the network. In addi-
tion, the overall effectiveness of the applications can increase as fewer stresses are placed on the
applications. As loads grow, the ability of software monitoring to keep up with the number
of nodes can be compromised, and alerts or information may be lost.
Finally, one important benefit of centralized management is standardization. By reducing the
staff and resources that must be involved, it’s much more likely that your standards in every
area, including setup, installation, maintenance plans, naming, and more, will be carried out
consistently. Each new server and person to be handled by the enterprise increases the chance
that standards won’t be met properly. The deviations may be deliberate or accidental, but each
one is a potential problem area. Moving to fewer consolidated environments allows fewer
resources to be devoted to managing these systems and increases the likelihood that each
resource better understands the systems.

CONSIDERING INCREASED RETURN ON INVESTMENT


Most of the previous reasons factor into a return on investment (ROI) calculation that your
company’s finance people can determine when looking at a consolidation project. Because
consolidation can initially result in an investment in hardware as well as potential downtime,
the total ROI needs to be examined over time to determine whether it makes business sense.
In many companies, hardware isn’t the largest systems cost, but each purchase is highly visible.
Hard dollars (or whichever currency you use) are the easiest costs to include in an ROI
study, and they’re also the first hurdle you must overcome when gaining approval for a con-
solidation effort. If your plan can’t account for the investment in new hardware, it’s unlikely
to be approved. Once you can prove this is a worthwhile investment, you can begin to exam-
ine other, less budgetary, reasons for proceeding.
There are a few non-cost-related reasons to consolidate your systems where possible. The
first one to consider is the reduction in salary cost if you have fewer systems to administer.
Maintaining fewer Windows systems also means the IT staff must do less patching and less
monitoring work. This reduces the effort required to run your enterprise, which means your
people are less stressed. This translates into less downtime for your applications and more
availability for your business users to generate revenue.
Assuming you don’t overwork your employees in some other way, then consolidating your
systems, if done correctly, should result in happier employees, which translates to less turnover
and better morale. Each person working in your company has some knowledge about your
company that is hard to replace. The less often you’re forced to hire replacements, the more
efficiently your company will run, and the lower costs will be over time.

CONSIDERING OPTIMIZED USE OF RESOURCES


In many organizations, the servers in use are underutilized. This is partially a result of the fear
of systems being overloaded and partially because not enough time has been spent planning
system deployment. These two reasons are often intertwined, although they’re unavoidable in
some cases. Less frequently, the applications are underutilized, at least compared to the expec-
tations at the time they were deployed.
The fear of overloaded systems comes from historical deployments of client-server and web-
based applications whose usage was incorrectly forecast. Sometimes this happened because
there was not enough data to correctly forecast usage; in other cases, the impact of the appli-
cations was underestimated. Once an application is deployed and proves to be valuable, usage
often increases dramatically, overloading systems.
As a result, many applications are built with the supporting servers sized for a worst-case sce-
nario, resulting in servers that trundle along at 5 to 10 percent utilization for most of their
major components (CPU, RAM, disk) and occasionally spike in response to some event. There
often is a good case for consolidating some of these servers to more efficiently use resources at
a greater level. Moving to 50 or 60 percent usage on one system by consolidating five or six
others is a more efficient use of resources, because the remaining four or five can be redeployed
or retired. As long as the end users understand that it’s acceptable for application performance
to suffer when resource utilization spikes, this is a good reason to consolidate the servers.
Resources are also underutilized when time isn’t spent planning the deployments and new sys-
tem sizes and when existing systems aren’t examined for possible consolidation at deployment
time. The most cost-effective time to handle SQL Server consolidations is prior to applica-
tions being deployed, because much of the work being examined in this Lesson is performed
on development systems and is duplicated in a later consolidation effort.

TAKE NOTE*
You can always look to separate your SQL Server databases on separate servers if performance
problems become severe. For some reason, this is an easier decision to make in a business
than the later consolidation decision.

DECIDING NOT TO CONSOLIDATE


You’ve seen a number of reasons why consolidation makes sense. Everything comes down to a
cost of some sort, but the reasons not to consolidate are more intangible. From a strictly finan-
cial point of view, wherever possible, a company should seek to consolidate its servers. However,
here are some reasons why consolidation may not make sense, few of which are cost related.
Working backward from the reasons to consolidate, first examine security. Although the previous
arguments assume consolidation increases your security, consolidation is a double-edged sword.
If your SQL Server security is breached, then all of your databases are vulnerable. Although
overall security is higher, you’ve greatly increased the potential losses if a breach occurs.
Consider a company with five SQL Server instances, four used for internal applications and
one supporting a public Web site. Suppose you consolidate all five applications onto one
database server. Now you’re supporting the four internal applications along with the public
web application on one SQL Server. This web application presents a larger attack surface,
because more people have the ability to launch attacks through the Web. If the security of this
server was to be compromised and an attacker gained control, they would have access to all
the financial and sales information in addition to the Web site data. In this example, consider
excluding the web SQL Server in the consolidation plan.
It isn’t only outside attackers you have to worry about, however, because internal users can
inadvertently cause problems. Suppose a company develops its own inventory application. It
uses a database called Inventory_TDB to test its code and make changes while the produc-
tion system runs against the Inventory_PDB database. There are two possible problems if
you consolidate these databases on the same server: A developer could accidentally test code
against Inventory_PDB, resulting in data-integrity problems or data losses; or a worker could
accidentally update data in the Inventory_TDB database, resulting in incorrect values for the
business. As you review consolidation strategies, consider how to mitigate your own issues.
Another reason consolidation may not be a good idea is the effect on performance. Each
application running on instances of SQL Server uses a certain amount of resources, memory,
CPU cycles, disk space, and more in performing its function. If you move all your applica-
tions onto one server, then the applications may compete for limited resources in a way that
negatively affects the applications. You can mitigate some of this impact by properly sizing a
consolidated server, but this may be a reason that you don’t want to consolidate servers.
The performance impact can be within the SQL Server instances as well. One of the single
points of contention on a SQL Server is tempdb. Some applications make heavy use of
tempdb, and some use it relatively little, but on a single SQL Server instance, all the databases
share one tempdb. As you add databases to a SQL Server through consolidation, this can
become a huge bottleneck.
One last issue that may give you pause in considering consolidation is the impact on employ-
ees if you experience performance problems. Happy IT staff members can be quickly pushed
to their limits by performance problems resulting from consolidation. On-call pages and long
hours trying to rewrite applications, tune queries, and so on, can devastate morale and lead to
employee turnover.
Business workers can quickly become frustrated by application performance problems and
decide that spending a large portion of their time at the computer, as opposed to performing
their job, isn’t worth the aggravation. Losing one of your talented, senior employees could be
devastating to your business and overwhelm any cost savings from a consolidation of servers.
Although you may not become aware of employee issues until a consolidation plan is
complete and it may be too late to undo the process, it’s something to consider before you
embark on a plan. Employees are often a company’s most valuable resource, especially in the
line of business.
Cost is usually a driving factor in deciding to consolidate servers, but sometimes the cost
savings doesn’t outweigh the cost outlay. Because a consolidation effort usually requires new
equipment to be purchased, the cost of a new server may not be worth the investment.
Suppose a company sized a new eight-CPU server for its consolidation effort at a cost of
$50,000. That outlay alone might make the strategy not worth pursuing. This is
often apparent when a single server becomes disk constrained. Although today’s disk capaci-
ties are growing, there is still a limit to how much space a single server can support through
direct attached storage. If you exceed this capacity and are forced to consider Storage Area
Network (SAN) based solutions, the initial investment can be high. It may be high enough
that you determine a consolidation strategy isn’t worth pursuing.
Another area that works against consolidation is the risk factor. This factor encompasses cost,
performance, and staff. If you consolidate your servers onto one Windows machine, you’re
in essence putting all your eggs in one basket. If a problem occurs with that one machine—
overheated CPU, power supply failure, and so on—then all your applications fail. For some
companies, this is a huge problem. Suppose a company does a brisk business on its Web site
for one of its products. If a problem occurs in the Accounting application, the Web is cur-
rently unaffected and the financial department works from paper until the system is fixed.
However, if you put both of these databases on the same server, then an issue that occurs
from an Accounting system upgrade could potentially take down the Web site. Some compa-
nies consider this an unacceptable risk, so consolidation wouldn’t be a possibility.
If you recognize these risks, you can mitigate many of them by implementing high-availability
features like clustering and redundant hardware. However, these features usually dramatically
raise the cost of the solution in two ways. First are the hardware and software costs from your
vendors for the resources to support these solutions. The other cost is in staffing, because you
may need to pay for training existing employees or hire others to handle these more complex
solutions. Just as the cost of purchasing a larger server can outweigh the benefits, so the cost
to mitigate risks can work against a consolidation strategy.
The last factor you should consider in reasons not to consolidate is the “sunk cost” factor.
Your existing servers, whether purchased or leased, have a cost already associated with them
that you may not be able to recoup. In some cases, you can trade in old servers on a lease or
sell them back to the vendor. However, if you can’t recoup any costs, chances are that these
servers are older and may not be suitable for redeployment in your enterprise. In that case,
the accountants may not see any cost benefits in moving to new servers when the old ones are
paid for and unused. Be aware of this potential roadblock when designing your strategy.

DEVELOPING THE GOALS


A consolidation effort should have clear-cut goals prior to beginning any detailed planning.
The team members should initially have a list of goals they seek to accomplish through this
process both from the business and technical viewpoints. Some of these reasons will be dis-
missed in this phase, and others may become apparent; but without an overall philosophy and
direction, this effort won’t proceed smoothly. Table 3-1 lists some sample goals.

Table 3-1
Goals for a consolidation project

TYPE OF REASON    GOAL
Technical         Reduce the number of systems that must be monitored by DBAs.
Technical         Increase security by combining logins on multiple servers through consolidation.
Business          Reduce the licensing costs for SQL Server.
Business          Adhere to single-source vendor limitations for the project.

The goals you develop should include the type of consolidation you’ll undertake. You could
consolidate the resources of your SQL Server instances in a variety of ways, and you could
choose to implement any or all of the following types of consolidation:
• Instance consolidation. In this case, you look to reduce the number of instances by
moving databases from disparate instances to a single instance of SQL Server.
• Physical server consolidation. In this type of consolidation, separate physical SQL
Server instances on different Windows servers are consolidated to one Windows server.
This could involve moving to multiple instances of SQL Server on one Windows server,
or possibly keeping the same number of instances. This could also mean running sepa-
rate installations on one server by creating multiple virtual servers. The goal is to
reduce both the number of physical servers and Windows servers that must be managed.
• Geographic consolidation. Although this type of consolidation involves moving servers
from one physical location to another, it often includes one of the two previous types of
consolidation. More than any other type of consolidation, this has a larger impact on the
network, so carefully consider that in your plans.
• Storage consolidation. Less a SQL Server move than a Windows consolidation, this
involves moving multiple SQL Server instances (and their corresponding Windows
servers) to the same storage device, such as a SAN device. It could also be a consolidation
of your SQL Server database files from multiple drives to fewer. Either option would
require less work by itself than the other methods, but it could be a part of a larger
consolidation project.
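The first of these, instance consolidation, usually comes down to moving user databases from
one instance to another with backup and restore (or detach/attach). The following is a minimal
sketch only; the database name Sales, the UNC path, and the logical file names are hypothetical
and would come from your own inventory spreadsheet:

-- On the source instance: take a full backup to a location both instances can reach
BACKUP DATABASE Sales
TO DISK = N'\\fileserver\sqlmoves\Sales.bak'
WITH INIT;

-- On the target (consolidated) instance: restore, relocating the files to its drives
RESTORE DATABASE Sales
FROM DISK = N'\\fileserver\sqlmoves\Sales.bak'
WITH MOVE 'Sales'     TO N'D:\SQLData\Sales.mdf',
     MOVE 'Sales_log' TO N'L:\SQLLogs\Sales_log.ldf';

Remember that logins, jobs, and other objects stored outside the user database must be scripted
and moved separately.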

In developing the goals for the project, consider the existing environments that the applica-
tions run under. For example, many applications have a service-level agreement (SLA) with
end users that can’t be easily altered. These should be listed as goals for those systems to
which they apply.
Many other items apply to your systems, such as changes to headcount, budget restrictions
for new hardware, technology changes, and more. As planning is not edition specific, sample
issues to consider are available in the Planning for Consolidation with Microsoft SQL Server
2000 white paper, available at
http://www.microsoft.com/technet/prodtechnol/sql/2000/plan/sql2kcon.mspx.

Developing Guidelines for the Consolidation Project

Each project needs guidelines that determine how a consolidation effort will proceed.
Some of these guidelines will be technical and others nontechnical, but they will affect
the way the plan is developed. These “rules” govern how you’ll move through the proj-
ect. Although you’ll begin to develop them in this phase, they will carry through all the
phases as lessons are learned in the process for your organization. Some sample guide-
lines you may wish to use are as follows:

• On-Line Transaction Processing (OLTP) and On-Line Analytical Processing (OLAP)
workloads won't be placed on the same server.
• Multiple instances can be used where database naming conflicts or login conflicts occur.
• Consolidated servers will run only SQL Server, not other services such as file serving,
Exchange, and so on.

These guidelines will be unique to your environment and need to meet the needs of your
company. The specific guidelines you develop will often be driven by business goals or
requirements.

Examining Your Environment

In developing the overall parameters for consolidation, one important set of information
is the structure of the current environment. Initially, every system should be considered;
only after you have valid reasons for dismissing it should the system be removed from the
master list. The next few sections examine how the current instances can be analyzed.

SQL Server 2005 offers substantially better database performance than
SQL Server 2000. You’ll most likely want to acquire new hardware for your consolidated
server, but not necessarily. In either case, SQL Server 2005 outperforms SQL Server 2000 on
the same hardware, assuming the minimum hardware and software requirements are met.
At each stage, as you gather information about your current environment, note any potential
problem areas. These could be organizational issues, such as the scheduling of downtime, or
they could be technical issues, such as object name collisions. This phase is concerned with
the accumulation of documentation on your environment, not the decisions of whether each
individual change is possible.

CERTIFICATION READY?
You set a performance goal of user response to data requests at 8 seconds or less. What
factors lengthen or shorten this user experience?

ANALYZING APPLICATIONS
The first step in considering consolidation is to examine your applications and determine
which ones will run on either SQL Server 2005 or SQL Server 2008. If a software vendor
won't support your application on SQL Server 2005 or SQL Server 2008, then you should elimi-
nate that application from your plan. SQL Server 2005 should be completely backward com-
patible with SQL Server 2000 databases, and you can set the compatibility level to 80 (for SQL
Server 2000, as opposed to 100 for SQL Server 2008, 90 for SQL Server 2005, or 70 for SQL
Server 7.0). However, some vendors won't provide support for this configuration. Make sure
you check, because running applications without support is a good way to irreparably damage
your reputation.
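As a sketch of how the compatibility level is checked and changed (the database name Sales is
hypothetical):

-- Check the current level (80 = SQL Server 2000, 90 = 2005, 100 = 2008)
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'Sales';

-- SQL Server 2005 syntax
EXEC sp_dbcmptlevel 'Sales', 90;

-- SQL Server 2008 syntax (sp_dbcmptlevel is deprecated)
ALTER DATABASE Sales SET COMPATIBILITY_LEVEL = 100;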
While you’re checking this, note that there is no longer a SQL Server 6.5 compatibility
level. If you’re still running version 6.5, which is no longer supported by Microsoft, then
you should find out if you can upgrade the application. Substantial keyword and structural
changes occurred between versions 6.5 and 7 and your application may not even upgrade
without a rewrite.
After you’re sure all your applications will be supported, you should list the potential applica-
tions and their database names and servers. Doing so will give you a master list from which to
start considering consolidations. Take this list, and look for any database name collisions: two
applications that use the same database name. Consider development and Quality Assurance
(QA) databases. Because each SQL Server instance requires unique database names, if you
find two applications that share the same database name, you need to note that fact. They
can’t run in the same SQL Server instance, although they could inhabit two separate instances
on the same Windows server. You should also determine whether either application’s database
name could be changed.
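One way to build that master list is to run a short query against each candidate instance and
paste the results into your spreadsheet; the following sketch assumes SQL Server 2005 or later
so that sys.databases is available.

-- Run on each instance being considered for consolidation
SELECT SERVERPROPERTY('ServerName') AS instance_name,
       name                         AS database_name,
       create_date
FROM sys.databases
WHERE database_id > 4        -- skip master, tempdb, model, and msdb
ORDER BY name;

Sorting the combined results by database name makes duplicate names, and therefore potential
collisions, easy to spot.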
Don’t cross off any names at this point, but knowing how many collisions exist will give you a
minimum number of servers or instances. Suppose your company had three Sales databases—
one each for development, QA, and production systems—and the names couldn’t be changed.
You would need at least three servers or three instances in a consolidated environment.
The next step is to examine each application and determine whether any of them are mis-
sion critical or too risky to combine with other systems. Doing so will further increase your
minimum server number because you may not be able to combine some databases with others.
Be careful, and ensure that you solicit feedback from the business owners of systems. You
may not think the Sales system needs to be separate from the Accounting application, but the
Finance department may feel differently. You can provide them with the benefits and the reasons
you think this combination will work, but make sure you consider the viewpoints of the
various departments.
While you do this, don’t mention the multi-instance nature of SQL Server as a way of com-
bining different SQL Server instances onto one Windows server. This is a technical distinc-
tion that few business users will understand. Treat instances as if they were separate servers,
and find out what the various stakeholders think about their application being combined with
others. You can make the multi-instance decision later.
Also weigh the security risks and get your security department involved (if you have one).
Many view the development systems as less critical than, say, the Accounting system. However,
that doesn’t mean you can put the Development database on the same server as the Web site
backend. Security is a constant process, not a single event, and this is a good place to consider
the implications of combining databases on one server.
Save examining performance implications or hardware requirements for later; the previous
items can quickly lower the number of applications that you must examine in detail. Because
time is precious and the performance and hardware analysis will require more time, you don’t
want to examine any more systems than you have to.
Also consider the Service Pack/Patch implications of combining servers. Because many patch-
es apply to an entire SQL Server, or to all instances on a server, you need to be aware that if
you patch one application, you’re potentially patching them all—or breaking them all, if the
patch changes functionality. Include the third-party vendors’ past patch response times and
the Service Pack certifications on their applications. Also note whether servers are currently at
different patch levels, and be sure each application is tested at the highest patch level that will
exist on a consolidated server.

MONITORING APPLICATIONS
Once you have a list of applications and their databases that are potentially available for con-
solidation, you need to begin looking at their performance requirements as well as detailing
information. Because applications that are combined on one server or one instance may affect
each other, you need to examine a few performance points with each application/database
combination.

TAKE NOTE*
You should have current baselines for all your servers that you can use in this section. If you
don't, consider gathering these on a regular basis.

You should have up-to-date documentation on the configuration of each server regarding
memory, security, server and database settings, collations, and any other changes made to
a default installation. The disk usage for the server, local or SAN based, as well as network
requirements should be included.

When gathering disk requirements, account for disk space used by files outside SQL
Server. This includes backup files, data-transfer or bulk load files, Data Transformation
Services (DTS) files, and more, that take up space but aren’t usually associated with
SQL Server.
TAKE NOTE
* When you’re gathering security requirements, it’s important to consider whether the
SQL Server instances are all in the same domain. Large enterprises sometimes have more
than one Active Directory (AD) domain. The security changes can be challenging if you
attempt to consolidate SQL Server instances from two different domains.

You may want to start by gathering quick averages of performance times for queries on the
different servers. You should examine a representative sample of queries using Profiler or
another monitoring tool and gather data on the time and frequency of queries. This is less for
planning than for a contingency plan in case issues arise. Having this data will help you deter-
mine in more detail where issues are occurring. When you gather this data, use a few different
dates and times.

Before you begin, start a spreadsheet on which to record the data. Doing so will help you
tabulate and compare your data. List the applications down the left side; next to each, include
the current server, database name, and database size. Use a consistent notation for size (prob-
ably gigabytes). You should also note the CPU type and speed, the RAM, and the total disk
space for each server available for the SQL Server instances. At the top of each column, record
a header that notes the values you're placing below it. You may want to check whether SQL
dynamically manages memory or if there is a limit. Record the average that SQL uses as well
as the Windows machine total.

TAKE NOTE*
Each version of SQL Server has utilized the server resources more efficiently. This means
that for a given hardware platform, a SQL Server 2008 instance will run more quickly than
a SQL Server 2005 instance, and a SQL Server 2005 instance will run more quickly than a
SQL Server 2000 instance. Note the SQL Server version for each consolidation candidate.

On each SQL Server, you need to examine some counters and determine how much load
each application places on its SQL Server. Gather the following counters in Performance
Monitor to get an idea of each server's usage. Each is discussed in the appropriate section,
and Lesson 1 contains more information about how to gather this information.
MONITORING THE CPU
The CPU is the heart of the system, and a faster CPU often means better performance for
your server. To prepare for a consolidation effort, you need an idea of the load from each
server to understand whether you can combine two servers. Use the following counters:
• Processor: % Processor Time. This counter helps you understand how busy the overall
server is on a Windows server dedicated to SQL Server.
• Process: SQL Server process: % Processor Time. This counter is necessary when you
examine servers that have other applications, including other SQL Server instances
running on one Windows server.
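If you'd rather sample CPU use from inside the instance than from Performance Monitor,
newer builds (roughly SQL Server 2005 SP2 and later) record scheduler-monitor snapshots in
the ring buffers. The following query is a rough sketch of that technique, not a replacement
for the counters above.

-- Recent one-minute samples of SQL Server CPU vs. overall system idle time
SELECT TOP (30)
       x.record.value('(Record/@time)[1]', 'bigint') AS tick_ms,
       x.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]',
                      'int') AS sql_process_cpu_pct,
       x.record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]',
                      'int') AS system_idle_pct
FROM (SELECT CAST(record AS xml) AS record
      FROM sys.dm_os_ring_buffers
      WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR') AS x
ORDER BY tick_ms DESC;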

MONITORING MEMORY
Memory is critical in SQL Server’s processing of requests, and more is always better. Note the
amount of physical memory on the server; then, to properly size a server holding more than
one application, monitor the following counters:
• Memory: Available Bytes. This is a good general counter that gives you an idea of how
much memory pressure the server is experiencing. If this is less than 100 MB, the server
is starting to feel pressure.
• Process: Private Bytes: sqlservr process. This should be close to the size of Process:
Working Set: sqlservr process if there is no memory pressure from the Windows server.
• Memory: Paging File: %Usage. If this value is high, then you may need to increase the
size of the paging file or account for a larger file on the new server.
• SQL Server: Buffer Manager: Buffer Cache Hit Ratio. If this is less than 80 percent
on a regular basis, not enough memory has been allocated to this instance.
• SQL Server: Buffer Manager: Stolen Pages and Reserved Pages. Because each page is 8 KB,
the sum of these two values divided by 128 gives an approximate figure, in MB, for how much
memory to set.
• SQL Server: Memory Manager: Total Server Memory (KB) and Target Server Memory.
The first counter shows how much memory the server is consuming, and the second is the
amount it would like to consume. These values should be close to one another.

These metrics tell you how much memory is being used by the processes inside SQL Server.
Because SQL Server is fairly memory hungry, these will be padded if each application has its
own SQL Server, but you can still use the information to make some guesses about a consoli-
dated server.
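The SQL Server-side counters above are also exposed inside the instance through the
sys.dm_os_performance_counters view (SQL Server 2005 and later), which can be convenient
when you're collecting numbers from many servers. A minimal sketch:

-- Total vs. target memory, plus the raw values behind the buffer cache hit ratio
SELECT RTRIM(object_name)  AS object_name,
       RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Total Server Memory (KB)',
                       N'Target Server Memory (KB)',
                       N'Buffer cache hit ratio',
                       N'Buffer cache hit ratio base');
-- Note: the hit ratio must be computed as ratio / base * 100;
-- the raw cntr_value on its own is not a percentage.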

It’s important to check whether Address Windowing Extensions (AWE) or Physical Address
Extensions (PAE) is being used. If these switches are being used, then you should be especially
careful when consolidating other applications using the same switches onto a server without
large amounts of memory. These extension settings only apply to 32-bit versions of Windows
Server. Large memory usage with these two extensions may push you to investigate 64-bit
versions of Windows Server and SQL Server. Note that the counters for performance moni-
tor don’t include AWE values. You’ll need to investigate inside SQL Server for more accurate
information. You can get detailed instructions for checking memory in the “Troubleshooting
Performance Problems in SQL Server 2005” white paper available at
www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx.
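Whether AWE is turned on for an instance is quick to check with sp_configure; a sketch (the
option applies only to 32-bit SQL Server 2005/2008):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- With no value supplied, sp_configure simply reports the current setting
EXEC sp_configure 'awe enabled';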

MONITORING DISK PERFORMANCE


In gathering disk performance information, consider all the disks your application uses.
Because disk subsystems vary tremendously in their makeup, underlying structures, and indi-
vidual component performance, consider these general guidelines. A server that gets reported
as overloaded based on these guidelines may be performing well at the SQL Server level and
vice versa. Consider the perceived performance at the SQL Server level, meaning application
and query responsiveness, along with the data from performance counters when you deter-
mine whether additional databases can be added or to design a new storage subsystem.
Examine the following counters. Also examine all logical disks in use by your SQL Server
instances, whether they’re local storage or SAN based:
• % disk time. Examine this value for each disk. If it’s regularly greater than 50 percent,
then you’re probably experiencing bottlenecks from the disk subsystem. New databases
shouldn’t be added to disks that are experiencing high disk-time usage. Instead, note
the structure of the logical drive, and factor that information into designing a new
subsystem.
• Average Disk Queue Length. This value should be 0 ideally; if it’s sustained at 2 or
more on any of your logical drives, then you may have an issue and should consider a
new subsystem. This is a rule of thumb; newer subsystems may be capable of adequately
handling this number of requests, but it’s a point of concern.
• Avg. Disk Read/sec and Avg. Disk Write/sec. These counters give you an idea of how
quickly you’re moving data to and from the disk. Logs should be reading low, such as
2 to 4 ms, and your data files should be less than 20 ms on average. Again, these values
combined with the performance of the SQL Server should tell you what the load is like
on this particular subsystem.
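Alongside the Performance Monitor counters, SQL Server 2005 and later can report per-file
I/O latency directly through sys.dm_io_virtual_file_stats; a rough sketch for spotting slow files
(the figures are cumulative since the instance last started):

SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.name                  AS logical_file_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       CASE WHEN vfs.num_of_reads = 0 THEN 0
            ELSE vfs.io_stall_read_ms / vfs.num_of_reads END   AS avg_read_ms,
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_read_ms DESC;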

If you’re measuring the performance of a RAID set of disks, you need to perform additional
TAKE NOTE
* calculations on the values from Performance Monitor. Check one of the white papers on
performance for more details.

All your SQL Server databases should use disk subsystems that include some sort of RAID
protection. But it’s something you need to note when gathering information about your
servers. Different RAID levels, as well as different numbers of disks in a RAID array, will
affect the performance of your new consolidated drive. Note how many spindles are being
used in addition to the RAID level of each drive being used for database files. This will
help you determine whether your existing subsystem or new design is adequate for the
expected load.
X REF: Determining RAID levels is discussed in Lesson 10.

One interesting note on disk performance is that the percentage of disk usage can affect the
performance of your systems. Studies done on disk access times show that performance is
highest when the utilization of the disk is less than 50 percent. After this point, the heads
must wait longer to access data, because more of the data is written toward the outside of
the disk.

In addition to checking the performance of the subsystem, ensure that you have adequate
space. In consolidating servers, be sure to account for the expected growth of the underlying
systems. Running out of space a few weeks or months after consolidation will likely upset a
great many people, not the least of which is the group that must fund additional space. Make
sure that any new disks on which databases are to be moved can handle the expected growth
for at least six months and preferably a year.
System databases also can experience growth, and that should figure into your calculations.
The master database is fairly small and should remain so in almost all cases, but msdb stores a
few pieces of information that can add to storage requirements. With multiserver administra-
tion features enabled, msdb could grow unexpectedly—be sure to note any issues at this stage
in your documentation.
Of more concern than msdb is tempdb because it’s more likely to have larger storage require-
ments. Intermediate worktables and other structures are stored in tempdb and can cause
growth in this database that exceeds the size of user databases in some cases. Factor the
tempdb usage on all servers being consolidated, and expect that you may wish to size a new
tempdb using the sum of the other maximum usages. The tempdb database is often placed
on its own disk array, so your calculations for this disk subsystem should ensure a fast and
responsive array for this database as well as adequate space. Expect that the combined load of
multiple systems will be larger on a consolidated tempdb, because this is a shared resource on
each instance.
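To see where tempdb space is actually going on an existing instance, SQL Server 2005 and later
expose sys.dm_db_file_space_usage; a small sketch you can run on each server being consolidated:

USE tempdb;
SELECT SUM(user_object_reserved_page_count)     * 8 / 1024 AS user_objects_mb,
       SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,
       SUM(version_store_reserved_page_count)   * 8 / 1024 AS version_store_mb,
       SUM(unallocated_extent_page_count)       * 8 / 1024 AS free_space_mb
FROM sys.dm_db_file_space_usage;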

TAKE NOTE*
One way to mitigate tempdb contention in consolidation is to use multiple instances of SQL
Server on one Windows server. Each will have its own tempdb, which means less contention
on one database. The downside is that you may need more disk subsystems to ensure separa-
tion. Also balance the gains in tempdb separation with the fixed-memory setups common with
multiple instances.

Consider the space requirements of your data and log backups, as well. Many SQL Server
instances store these files on a separate drive from the data and log files, so make sure that
they’re noted in your documentation.

MONITORING OTHER SQL SERVER–SPECIFIC METRICS


In addition to the gross metrics for the server, you should monitor some specific SQL Server
metrics:
• SQL Server: General Statistics: User Connections. In a per-server-based licensing
environment, knowing the number of users can affect the costs. This is also valuable
because each connection uses a small amount of memory.
• SQL Server: Cache Manager: Cache Hit Ratio. This value provides a metric of how
often the server finds the data it needs in cache. In general, the server should find data
more than 90 percent of the time. If you have two instances with numbers below 80
percent, you may not want to consolidate them unless you retune the applications or
add substantially more RAM.
• SQL Server: Databases: Transactions/sec. The cost of a transaction varies widely
depending on the amount of work done inside the transaction. However, over time,
this metric will tell you how “busy” your SQL Server is. As with the User Connections,
if you have two servers with high values, you may not wish to consolidate them.
Alternatively, you may find multiple servers with very low rates and want to combine
them on one server.
• SQL Server: Databases: Database Size: Tempdb. This will aid you in sizing a consoli-
dated tempdb for multiple servers.
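These counters can also be captured from inside each instance via sys.dm_os_performance_counters,
which is convenient for dropping the values into your spreadsheet; a sketch:

SELECT RTRIM(object_name)   AS object_name,
       RTRIM(counter_name)  AS counter_name,
       RTRIM(instance_name) AS counter_instance,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = N'User Connections'
   OR (counter_name = N'Transactions/sec'       AND instance_name = N'_Total')
   OR (counter_name = N'Data File(s) Size (KB)' AND instance_name = N'tempdb');
-- Transactions/sec is a cumulative counter: sample it twice and divide the
-- difference by the elapsed seconds to get a true per-second rate.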

LAB EXERCISE
Perform Exercise 3.1 in your lab manual.

In Exercise 3.1, you'll gather a number of performance-related metrics from one of your SQL
Server instances. Although the exercise will walk you through setting up a single monitoring
session, it's recommended that you perform this multiple times at different times on different
days to get a picture of your SQL Server over time.

MONITORING GENERAL ISSUES


From your monitoring history, you should be able to generate average values as well as some
maximum values that give you a rough picture of the performance of your servers over time.
Record these next to each database in your spreadsheet. Use two columns for each value: one
for the average and one for the maximum. This will help you to easily see whether two of
your databases have potential problems with a maximum value. Don’t just take the maximum
value from Performance Monitor because that can be misleading. One recording of 100 per-
cent CPU can throw off your calculations when 99.9999 percent of the time the value is 10
percent. Most large ISPs use the 95th percentile method for recording maximum usage of
bandwidth: They throw out the top 5 percent of values and take the next max. If you record-
ed eighteen values of the CPU between 5 and 20, one value at 25 percent, and one at 100
percent, you would throw out the 100 percent recording and note the max as 25 percent.
This is a lot of work, but it isn’t necessarily continuous. Set up your monitoring (you should
already be doing some of this), and gather data over a period of days or weeks. Monitor dif-
ferent days and times and include times when your application is generating a heavy load.
You’re recording data that will factor into your decision about which servers are consolidated
as well as the details of the planning process in the next phase.

CREATING SERVICE-LEVEL AGREEMENTS


All the performance data you’ve gathered is of a technical nature. However, other performance
requirements are less obvious. One of these is a SLA that an application may have with its
end users. Such agreements spell out uptime/downtime limitations, performance metrics to be
met, response times, and more. They’re important business-level considerations that may drive
the decision to consolidate SQL Server instances.

SLAs aren’t always easy to find. Usually, these agreements are worked out between departments
in large companies, and the documentation may not be stored with the technical documenta-
TAKE NOTE
* tion on the server. With staff turnover, it’s possible that they may even be lost or unavailable,
existing only in the memory of long-term employees.

TAKE NOTE*
A good place to start looking for SLA documentation is with a business liaison for a particular
application or with an assistant close to your CIO/CTO. They often maintain this type of
business documentation.

CONSIDERING GEOGRAPHICAL ISSUES


As a company grows, it often acquires space in diverse locations. This can be a building next
door to the headquarters, in the next city as a remote office, or halfway around the world.
You may find that your SQL Server instances become similarly dispersed throughout the
enterprise; as you work on a consolidation, be conscious of which servers you’re considering
moving to a new location. Just as servers in separate domains need to be handled differently,
servers that move locations need special consideration.
In most situations involving multiple physical locations, each location will probably be on
its own network subnet. Relocating a server from one subnet to another would likely involve
network addressing issues as well as potential client connection issues such as DNS and
connection string values.

Your network is a dynamic topology that responds and reacts to changes constantly. Although
switches have helped to smooth out local traffic, routers can seriously affect the performance
of an application if introduced into the flow of traffic. Moving a server to a new physical
location can cause stress on the network if the link back to the users is less capable or reliable
than the previous one. Consider the impact of any move on network traffic and addressing
and consult your network engineers.

You don’t have to move a server to a new building to have network issues in a consolidation
effort. Even moving your database to another server in the same rack could result in a subnet
TAKE NOTE
* change. Because the use of two or more subnets causes the introduction of a router, you may
have traffic problems, security problems, and so on. Consider the need to involve network per-
sonnel in any consolidation effort.

MONITORING ASSOCIATED SYSTEMS


Do you know what other servers are required for a particular application to function? Do
you know every other server that connects to each database? Chances are, no matter how well
you know your environment, there are some connectivity items you’re unaware of. Or maybe
there are connections that only you know exist. These are places where your consolidation
effort can have problems—and you may not find out until you’ve moved your production
environment.
With web farms, load balancers, and other scale-out technologies, it can be difficult to keep
track of every system interaction, but you should document as much detail as possible. As an
example, suppose you have a reporting SQL Server that receives information from the web
SQL Server database using an SSIS package every night. During the consolidation effort, all
systems are tested and appear to work on a test version of the reporting SQL Server. When
this server is consolidated onto another, however, the job running from the web SQL Server
doesn’t have rights to connect to the new server at the firewall level. This issue can be difficult
to track down if not documented, and many processes like this are poorly documented.

TAKE NOTE*
As you document interactions and connections between servers, applications, and processes,
don't forget to document a connection on both servers. A connection between Server A and
Server B should be documented in two places: the Server A documentation and the Server B
documentation.

Included in the discussion of linked systems are two SQL Server topics: replication and linked
servers. These two technologies are often implemented for very different reasons, and combin-
ing two servers connected with either of these deserves consideration. Replication is often used
to copy data to another system for two reasons: offline access, which isn’t a consideration here;
and reducing the load on a primary system and letting another server have this data available
to an application. If you combine two SQL Server instances that replicate data between them-
selves, you will probably be defeating the purpose of replication. Carefully examine the impli-
cation of such a move, and factor that into your decision to consolidate servers.
Linked servers, on the other hand, often are used to combine information from two servers
in an ad hoc manner. Linked-server queries are often slower than cross-database queries, and
you may be able to improve performance by eliminating the linked server if the two databases
using the link are combined. The downside is that programming changes will be necessary to
rewrite views, stored procedures, user-defined functions (UDFs), assemblies, and so on that
use the linked server. Consider all of this as a postconsolidation project to complete later.
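As a simple illustration of the kind of rewrite involved (all object and server names here are
hypothetical), a four-part linked-server reference becomes a three-part cross-database reference
once the two databases share an instance:

-- Before consolidation: query through a linked server named REPORTSRV
SELECT o.OrderID, c.CustomerName
FROM Sales.dbo.Orders AS o
JOIN REPORTSRV.CRM.dbo.Customers AS c
  ON c.CustomerID = o.CustomerID;

-- After consolidation: both databases live on the same instance
SELECT o.OrderID, c.CustomerName
FROM Sales.dbo.Orders AS o
JOIN CRM.dbo.Customers AS c
  ON c.CustomerID = o.CustomerID;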
One last interconnected set of systems you should consider is your administrative system—
especially the backup system. If you have multiple backup systems (a tape drive on each server
or even two larger consolidated systems), make sure your consolidation plan won’t overload
one of them. Although the total amount of backup data won’t change, the distribution across
systems could overwhelm one of them, especially if you’re using local tape systems for your
SQL Server instances. The same is true of monitoring or other administrative software sys-
tems if you have multiple installations.

CONSIDERING OTHER ISSUES


Deciding in the first phase which servers to consolidate is a difficult process, and there are no
hard-and-fast rules that can simplify this process. You need to use your judgment and experi-
ence as a DBA to balance the trade-offs and make the best decision you can with your appli-
cations. All the topics in the first phase as well as the following items are designed to help
you understand what you need to consider and the effect each may have on your SQL Server
instances:
• Shared resources. Many of the shared subsystems between instances have been
eliminated in SQL Server 2005. Full-Text Search is a big one, but there can still be other
dependencies between instances. Be sure these don’t conflict with any of your decisions
made about which servers to consolidate.
• Extended stored procedures (XPs). Because these procedures operate outside of SQL
Server, they can cause instability, especially if they aren’t extremely well written. Memory
leaks are a big cause of concern with custom extended stored procedures. Factor the use
of these by separate applications on a combined server. Be especially careful if you have
custom XPs that are upgraded or different versions on different servers.

By default, the ability to use some XPs is disabled in SQL Server 2005. You can use the
Surface Area Configuration Manager tool to configure these XPs and enable the use of
xp_cmdshell.

TAKE NOTE*
SQL Server 2008 no longer includes the Surface Area Configuration tool. The same settings
are now exposed through the Surface Area Configuration facets of Policy-Based Management
in SQL Server Management Studio, or directly through sp_configure; the functionality has
not changed, just the way you reach it.
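Whichever tool you use, the underlying change is an sp_configure option, so the setting can
also be checked or scripted directly; a sketch:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- 1 enables xp_cmdshell; 0 (the default) keeps it disabled
EXEC sp_configure 'xp_cmdshell', 0;
RECONFIGURE;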

• Collation conflicts. This is a potential point of conflict if defaults are expected in an
application and a new server uses different defaults. With the granularity of collation in
SQL Server 2005 extending down to the individual column, this shouldn't be a problem,
but make sure to note potential conflicts and put a mitigation strategy in place.
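Recording the server default and per-database collations for each candidate instance is a quick
check; a sketch:

-- Server default collation
SELECT SERVERPROPERTY('Collation') AS server_collation;

-- Per-database collations on this instance
SELECT name, collation_name
FROM sys.databases
ORDER BY name;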

At this point, you should have gathered a great deal of information about the structure
and configuration of the current environment. Your plan should identify candidates for
consolidation as well as list problem areas. The next phase will provide guidelines for
developing your plan.

■ Phase 2: Planning

The second phase of this process is the planning stage where the servers are designed and the
THE BOTTOM LINE
processes are initially built to move forward.

At this point, a team is working on this project, and the members have overall goals and
guidelines as well as a list of systems to consolidate. In this phase, you do the more detailed
work of determining the makeup of the consolidated environments by determining which
hardware will be used to run the consolidated servers. This is also when basic procedures and
processes are tested and developed for your environment. This is in contrast to the Envisioning
phase, where you analyze the benefits and costs of consolidation to the entire enterprise.
You begin with analysis, looking at the information from phase 1 and beginning to consider
how to design the new consolidated system. Note that this doesn’t necessarily mean buy-
ing new hardware. Existing hardware can be used if it meets the requirements developed in
the design of the new server. The initial testing of procedures and processes occurs prior to
detailed development and final testing in the Developing phase.

Evaluating Your Data

As you’re gathering all this data about performance, you should be simultaneously deter-
mining which databases can and can’t coexist. If someone is concerned about Database A
being on the same server as Database B, you’ll have to make a judgment about whether
they can be in separate instances or must be on separate servers.

You should start to have general ideas of which applications can be combined on one server
based mostly on CPU load and disk space. Without adequate disk space available, either
direct attached or available on a SAN, you can’t combine the applications. Many decisions
and trade-offs regarding hardware are affected by other factors discussed later. Check your
plan against all these sections and go back through it each time you make a change. This is a
complex process involving many intertwined factors, so a single pass won’t be sufficient.
Consider the hardware systems separately and then in the context of the design possibilities.

EVALUATING YOUR PROCESSOR DATA


Try to keep CPU utilization below 70 percent. This gives room for spikes, although less room
than on separate servers. This decision results in a trade-off of performance capabilities, so
make sure the benefits outweigh the potential performance limitations. If you’re moving to
like processors, then you can total the average loads and stop when you hit 70 percent. If
you’re moving to more powerful processors with new hardware, then you’ll have to use bench-
marks to make some guesses about the load on a new CPU.
In either case, ensure that you understand how much less CPU headroom you have, and
examine your CPU spikes carefully. If there are particular times when multiple applications
make heavy use of computing resources, such as end-of-month or end-of-year processing, you
may wish to have those applications use separate SQL Server instances.

EVALUATING YOUR MEMORY DATA


Decisions about RAM sizing can be more difficult. RAM is used heavily by SQL Server, even
on lightly loaded systems, so you should determine as much information as you can about
how your SQL Server instances are using RAM.
RAM usage is managed dynamically by the server. However, as your servers grow, you may
implement AWE and/or PAE and set memory usage. Running multiple instances usually
means you should set the memory usage as well, and your decisions change if you choose to
move to 64-bit hardware.
This means you need to consider larger memory requirements for combined servers, although
the extrapolation isn’t linear. If you have three servers with 2 GB each, then consolidating all
three to one server doesn’t mean you need 6 GB; 4 GB or even less may be feasible. However,
in general, you should determine how much memory each server is using and develop a mini-
mum memory requirement for a combined server using this number along with the expected
workload of the combined server. For very lightly loaded servers, you may not need to add
memory; for larger workloads, you may wish to set a large minimum memory requirement.
RAM is now quite inexpensive so there is little cost to specifying more RAM than the mini-
mum necessary.

Different OS versions have different memory maximum limits. These maximums can affect
your decisions to implement new servers because there will be a software cost as well as a
hardware cost to change versions.
WARNING: One important thing to note on your spreadsheet is whether any of your servers
have memory specifically set to some value. Most SQL Server instances are set to dynamically
manage memory, but some are limited by configuration to a maximum amount. If this is the
case, it's possible that the SQL Server database for this application uses additional memory
and may try to use more when consolidated to another server. Reconsider the load of any
servers whose memory is limited, and allow additional padding in your calculations.

Your design for a new SQL Server that exceeds 2 GB of RAM should also include the
proper settings for AWE and PAE. You can read about these settings in the Windows Server
2003/2008 documentation or SQL Server Books Online.

PLANNING YOUR DISK SUBSYSTEM

The disk subsystem is critical to a smooth-running SQL Server because the data is stored on
disk, the log is written to disk, backup-and-restore speed is limited by disk response, and so
on. In addition, running out of RAM means a slower-running SQL Server; running out of
disk space means a stopped SQL Server.

In examining your subsystems, first make sure you're looking at the logical versus physi-
cal disk setup. They aren't necessarily the same; you need to understand that performance is
driven by the physical capabilities, but it can be masked by the logical setup. In other words,
is limited, and allow additional driven by the physical capabilities, but it can be masked by the logical setup. In other words,
padding in your calculations. you may design a fast physical subsystem but then place two, three, or more logical drives on
this subsystem that contend for the physical access. A logical drive containing tempdb may
appear to be responding slowly, but if you determine that a heavily used transaction log on
another logical drive shares the same physical array, removing the bottleneck may be as simple
as separating one logical drive to another physical array.
Good practice suggests that each physical drive array contain its own logical drive and no
others. These days, with larger file systems capable of handling hundreds of gigabytes of
space, there's no reason to combine multiple drive letters onto one physical array. This will
keep confusion to a minimum as well as make it easy for you to monitor performance.

WARNING: Beware of SAN-based storage if you haven't designed the setup. Sometimes a
Logical Unit Number (LUN) presented to a server is made up of physical drives shared by
other LUNs. Consult your SAN vendor to see whether this is a potential problem.

You need to consider three additional factors in your plan for disk subsystems: space for
backups, the RAID level of the underlying storage, and the number of disks that are used.
This is of less concern on a SAN, but with direct-attached storage, moving from RAID-1 to
RAID-5 can have a big impact on performance. In general, you should move to like storage.
You should also keep an equivalent or increase the number of disks (spindles) in use on a
particular logical drive.
The design of your new subsystems should take all these factors into consideration separate
from any existing hardware. Once you’ve made some decisions on your requirements, you
can examine the capabilities of existing hardware and determine whether any systems can be
reused or reworked to meet your requirements.

Making Initial Decisions about the Plan

At this point, you can start to cut and paste rows in your spreadsheet into groups, sepa-
rating each group by a few rows. Each group represents a potential consolidated server,
so you can total up the CPU loads, disk space, and so on for that group. These are still
gross decisions, but this is a way to start formulating a plan that can be picked apart by
yourself and others and then refined until it becomes viable.

Once you’ve set up these groups, examine the memory usage and tempdb usage to refine
your plan. These two areas are hard to examine, and this is where the process becomes more
of an art than a science. One example is the Memory: Pages/sec counter, whose meaning is relative to each server's configuration. It's hard to compare directly across servers, but by looking at how much memory is available to SQL Server on machines with similar amounts of RAM, you can get an idea of how memory hungry each database is. Raw numbers can be deceiving, because SQL Server will cache as much as it can and take up more memory than it needs for small applications. Try to limit the number of high-memory-usage databases on each server. Also keep in mind that SQL Server 2005 memory usage is very different from SQL Server 2000.

The tempdb database is also a concern with consolidating servers. Because each instance
shares one tempdb, if you have two to three applications that make heavy use of tempdb,
then you can overwhelm a server and swamp the disks on which tempdb resides with requests
(or even cause tempdb to grow out of control). SQL Server 2005 offers some additional uses
for tempdb, such as row versioning and online index operations. Sizing the tempdb database
is an art, and familiarity with the behavior of tempdb over time on your SQL Server instances
helps tremendously. You can look at the sizes of tempdb over time and make some extrapola-
tions regarding the needs of multiple databases on a consolidated SQL Server. Allow for some
padding, and size the new tempdb appropriately to handle the needs of all the applications
that will use it. This often means adding together the usages of all tempdb instances that are
being consolidated.
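As a starting point for that extrapolation, you can record the current size of each instance's tempdb files over time. A minimal sketch follows; the logical file names will vary by installation:

USE tempdb;
GO
-- Current size of each tempdb file, in megabytes (size is stored in 8 KB pages)
SELECT  name AS logical_file,
        physical_name,
        size * 8 / 1024 AS size_mb
FROM    sys.database_files;

Capturing this periodically on each instance being consolidated gives you the raw numbers to add together, plus padding, when sizing the new tempdb.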
The following sections show a few of the potential options you can consider when designing a
new consolidated server environment.

RUNNING THE UPGRADE ADVISOR


You may determine that you can consolidate SQL Server 2000 instances using existing
hardware. Before you consider upgrading any server to SQL Server 2005, you should run the
Upgrade Advisor to ensure that the hardware can adequately handle the upgrade.
If you’re purchasing new hardware, then make sure your hardware not only exceeds the recom-
mended levels from Microsoft but also will handle the additional load of multiple databases
or instances.

CHOOSING TO USE MULTIPLE INSTANCES


One remedy for overwhelming tempdb is to use multiple instances of SQL Server on one
Windows server. If you decide to choose this option, then make sure you understand that
running multiple instances requires that you manually set the memory allocated to each
instance. Because your base Windows machine may be limited, make sure that you have
enough RAM to support both instances. If you’re running with AWE or PAE support, you’ll
suffer some performance issues as memory is swapped in and out of the addressing window.
Moving to 64-bit Windows and SQL Server can alleviate this issue, and this is one of the big
reasons to move to 64-bit SQL Server 2005. However, you need to make sure your applica-
tions will run on 64-bit platforms.
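A minimal sketch of capping memory for an instance is shown below; the 4096 MB figure is purely illustrative, and you would run this against each instance with a value appropriate to its workload and the total RAM in the machine:

-- Run against each instance; the value shown is an example only
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 4096;
RECONFIGURE;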
The other consideration with multiple instances is whether the application will support multi-instance connections. As mentioned earlier, connecting to a named instance of SQL Server requires that you address the server as servername\instancename (the Windows server name followed by the instance name). If you're considering multiple instances, you should check that your applications support this naming.
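For example, a quick connectivity check from a client might look like the following; the server, instance, and database names (ConsolidatedSrv, Sales, and SalesDB) are hypothetical:

sqlcmd -S ConsolidatedSrv\Sales -E -d SalesDB -Q "SELECT @@SERVERNAME;"

A typical application connection string would reference the same pair, for example "Server=ConsolidatedSrv\Sales;Database=SalesDB;Integrated Security=SSPI;".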

ADDRESSING 64-BIT SQL SERVER


Although 64-bit versions were available for SQL Server 2000, 64-bit computing wasn’t con-
sidered mainstream and the installations were few and far between. With SQL Server 2005
and SQL Server 2008, new Intel 64-bit CPUs, and other 64-bit advances, 64-bit computing
is a much more viable option. The big push to 64-bit computing is to take advantage of the
huge memory space—no more 4 GB limit. This is worth considering in a consolidated envi-
ronment, but because 64-bit computing has a cost (new hardware, new Windows version,
training, and so on), you need to do a cost analysis before making this decision.

CONSIDERING HIGH AVAILABILITY


Another concern in consolidating a larger number of servers to a smaller number is the avail-
ability of each application. A short example will help to illustrate the issue.
Suppose you’re working on a consolidation and have chosen to consolidate the QA server
along with the Inventory server because performance metrics show both of them to be a good
match for consolidation onto the existing Inventory server. You perform all the planning and
other phase work as outlined in this Lesson and then complete the deployment of both appli-
cations onto the Inventory server with no issues.

A few months later, the QA team is testing a new version of one application and finds a bug.
The resolution from the vendor is a patch for SQL Server. When this patch is applied to
the QA server, it causes a problem and results in the server being rebuilt over the next day.
During this time, the Inventory application is unavailable, and the IT team must deal with
unhappy end users.

It’s unlikely that a nonproduction server would be consolidated with a production server,
but work performed on one application can affect another. In this case, a clustered situ-
TAKE NOTE
* ation might have allowed Inventory to run on the failover server while the primary was
being rebuilt.

This example illustrates how a consolidated server can pose a higher risk of instability than separate servers. If you have two applications, each with a 20 percent chance of bringing down a server, then combining them gives the new server roughly a 36 percent chance (1 - 0.8 x 0.8, assuming the risks are independent) of going down due to one of these applications, nearly double the risk run by either application on its own server.
Another example is the consolidation of the Accounting and SalesCRM systems onto one
server. In the event that a hardware failure occurs, two groups are unable to work, and a
greater portion of the business is affected than when the two applications were separate.
WARNING When looking at high-availability solutions, consider the training costs for your staff. Clustering with SQL Server 2005 and Windows 2003 is much easier than with past versions, but your staff must have or acquire additional skills.

This brings us to the need for additional high-availability mitigation strategies for a consolidated server. Clustering, log shipping, database mirroring, and other high-availability technologies become more important (and possibly essential) in a consolidated environment. The resulting cost of implementing them may outweigh the benefits of consolidating onto fewer servers.

GOING THROUGH MULTIPLE ITERATIONS
Now that you’ve made some decisions about which databases can be combined and the details
of your hardware, check your plan to be sure your decisions are sound. Multiple individuals
should validate the plan, and you should be able to articulate the technical and business rea-
sons behind your decisions to others.

Case Study: Consolidating and Clustering

You’re considering moving dkRanch Cabinets from its existing five servers down to three
servers: one for Accounting, Inventory, and SalesCRM; a second for WebPresence; and a
third for Development. However, the chief financial officer decides that having the three
major business applications on one server is a large risk to the business. You counter
with the idea that you can create a four-node clustered setup with three active nodes and
one passive node to mitigate the risk.
It’s estimated that the five-year cost of keeping the existing environment and going
through scheduled hardware upgrades will be $22,000. Moving to new hardware now
for each server is $4,000, but the additional software for clustering will cost $16,000
under your business deal with Microsoft. Is this worthwhile?
Solution: The total cost of the clustered solution will require four servers at $4,000,
which is $16,000 plus the cost of the software. This would cost the company $32,000
in total, which is substantially more than the $22,000 that is expected to be spent on
the current environment.
There could be other considerations—monitoring software costs, data-center costs, staff-
ing costs, and more—that would provide $10,000 in savings and make this a more cost-
effective solution. But as the problem was outlined, with the additional risk of having
mission-critical business applications on the same server, it probably is not worthwhile.

This may require that you modify your plan and move databases around. As you move data-
bases, make comments in your spreadsheet that gives reasons or restrictions for your deci-
sions. You may, for example, note that the WebPresence database must remain on its own
server for security reasons. That way, as you move things, it’s easier to keep all the information
and rationales for your decisions in one place.

SIZING HARDWARE
At some point, you may be moving databases and realize that you’re exceeding 70 percent
average CPU or discover that you’ll need more resources in some area. This is when you begin
sizing new hardware, as is usually the case with consolidation. You might consider consolida-
tion when getting ready to purchase new servers, so this is a logical spot to make decisions.
For your CPUs, you’ll have to depend on benchmarks and educated guesses if you’re choosing
CPUs of a type that you don’t have in your environment. Consider the relative strengths in
benchmarks of the integer values, because databases primarily move data around (an integer
operation). You may wish to subtract 20 percent as a pad when sizing the CPUs. You may
consider dual-core CPUs as 1.8 or so single-core CPUs. Multiple CPUs and symmetric multi-
processing (SMP) systems should use a benchmark of each additional CPU counting for 75
percent of a full one, due to the overhead of running the SMP system. Consider the relative
performance of different SQL Server versions in your decision.
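As a worked illustration of these rules of thumb (the server counts here are hypothetical): a four-CPU single-core server would be treated as roughly 1 + (3 x 0.75) = 3.25 CPU-equivalents, and a two-socket dual-core server as roughly 1.8 + (1.8 x 0.75) = 3.15 CPU-equivalents. These are planning figures for comparing candidate configurations, not measured values.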
RAM requirements are almost always easy to decide on: Get as much as you can and then
add more. You can’t have too much RAM in a SQL Server, so size as much as you can install
or afford on a new system. If you’re moving to a clustered environment, especially an active-
active node system, allow enough RAM for a failover scenario.
Your disk space requirements will require that you go back to your current applications and
get some numbers for your database growth over time. Factor the space required for backups
into this growth number, and add a pad (such as 10 percent). Try to ensure that you have
enough space for a year when sizing disks for your databases. Because budgets are usually
annual, this works well if you need to purchase more disks later.

Planning to Migrate Applications

Once you’ve developed the hardware design and made some decisions about which
databases will be consolidated, you’ll begin developing more detailed plans for the
migration of the applications, as well as user accounts and databases.

This section discusses the processes involved in moving applications to the new server. This
includes client-level changes, possibly firewall or other network changes, monitoring software
changes, possible upgrades of application code, security changes, and so on. This is the time
when you may perform a migration on a smaller scale, note a problem, perform another
migration with a possible solution, note other problems, and then continue until you’re com-
fortable with your process. You’ll resolve issues with names and locations, connection issues,
and so on. You’ll also determine the order in which you need to perform the various steps.
A few of the issues to be aware of and ideas for mitigating problems are as follows:
• Migrating logins and users. Database users are often easily moved with the database
itself, especially if you use the backup-and-restore or detach/attach method, but logins
are more difficult. First, make sure you need to move logins, because you may have
duplicate logins already set up on servers being consolidated. If you need to move logins
and retain their passwords, use the sp_help_revlogin procedure as described in KB article
246133: “HOW TO: Transfer Logins and Passwords Between Instances of SQL Server.”
If you have collisions where the same user exists on different systems with different
passwords, note this fact and devise a mediation strategy. Be aware that Active Directory
users and passwords exist throughout a domain. There is no requirement to change such
users if a consolidation is within a single domain.

• Security issues. Your jobs, DTS packages, Integration Services packages, or CLR assem-
blies may require security permissions not set up on the consolidated server. This is the
time to resolve those issues through detailed examination of the logs created during your
testing. Hard-coded or preset passwords may also cause issues. Check connectivity from
all clients and other servers to resolve any issues.
• Domain issues. If you’re consolidating two servers from different Active Directory
domains, create procedures to either grant cross-domain permissions or otherwise handle
domain related issues. This also applies to different Active Directory forest situations.
Forests consist of domains and a multiple forest situation could be complex.
• Migrating DTS packages to Integration Services. This is a more difficult task, although you can download the DTS runtime, which enables you to run DTS packages on a SQL Server 2005 server. Look up "Upgrading or Migrating Data Transformation Services" in Books Online.

CERTIFICATION READY? Can two servers, each with a named instance of Sales and each using a database named SalesDB, be combined onto one server with two instances of Sales and SalesDB? If this can be done, how would users connect?

Moving the data is also something you test at this stage, but in limited amounts. You shouldn't transfer a 100 GB database over and over to resolve issues. Instead, create a new database and transfer a small subset of data to it. Then, perform your testing using this smaller database. In some cases, this may be as simple as performing a database transfer using Integration Services, backup and restore, or the detach/attach methods. In other cases, the process may be much more complex, involving the development of customized Integration Services scripts and packages.
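For the simple backup-and-restore case, a minimal sketch follows. The database name, logical file names, and paths are placeholders; the logical names in the MOVE clauses must match what RESTORE FILELISTONLY reports for the backup:

-- On the source server
BACKUP DATABASE SalesTest TO DISK = N'\\FileServer\Backups\SalesTest.bak';

-- On the consolidated server, relocating the files to its drive layout
RESTORE DATABASE SalesTest
FROM DISK = N'\\FileServer\Backups\SalesTest.bak'
WITH MOVE N'SalesTest_Data' TO N'D:\SQLData\SalesTest.mdf',
     MOVE N'SalesTest_Log'  TO N'E:\SQLLogs\SalesTest.ldf';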

TAKE NOTE* You use actual data or a subset for this testing, but don't focus on specific strategies for one application. You should be developing general procedures to combine multiple SQL Server instances onto one instance or server. If you learn specific items for one application, make a note of them separate from your general procedures.

As you’re refining your process and procedure, you’ll note some issues that can affect the con-
solidation. These may include extended downtime, additional resources needed to complete
the consolidation, application changes, and so on. These risks should be noted in your plan
and used to determine whether you proceed to the next stage of developing the consolidated
solution.
Beware of scope creep at this point. There is always work going on with many SQL Server
databases, and you’ll be tempted to include some deployments of functionality along with the
consolidation because the application will be down. Avoid this temptation because there will
be issues associated with the consolidation and it will be difficult if not impossible to deter-
mine the source of the problems. Is it the consolidation or the enhancements? One example
is given in the following hands-on exercise, but there are many others. For example, pruning
logins, users, and other SQL Server objects should be tackled as a separate project, either
before or after this one.

Case Study: Avoiding Scope Creep

You’re developing a consolidation plan for dkRanch Cabinets and are looking to move
the Accounting database to the same server that currently houses the SalesCRM data-
base. Both of these applications were purchased from a third party and set up by the
vendors. The standard setup from the vendor uses the sysadmin account to connect to
the SQL Server instance using a password stored in a configuration file on each client.
After contacting the vendor, you learn that there is no technical reason why the appli-
cations need to use the sysadmin account specifically, and you receive instructions for
using another account. In what order should you proceed with the consolidation of the
Accounting application? (Not all steps are required.)

Change the client configuration files to reflect the new account and password.
Change the client configuration files to reflect the consolidated server name.
Create the new account, and assign permissions on the existing server.
Create the new account, and assign permissions on the consolidated server.
Add the steps for changing the application account to your consolidation plan.
Add the Accounting application migration steps to your consolidation plan, ignoring
the account changes.
Test the application using the new account.
Solution: This is a bit of a trick question because the idea in a consolidation effort is
to avoid scope creep. The consolidation should focus strictly on moving the application
to a new server without changing its functionality. Moving the connectivity to a new
account is a major change of functionality and should be completed prior to the consol-
idation testing. This prevents scope creep by changing the application as a project prior
to considering it for consolidation. The steps should be as follows:
1. Create the new account, and assign permissions on the existing server.
2. Change the client configuration files to reflect the new account and password.
3. Test the application using the new account.
4. Add the Accounting application migration steps to your consolidation plan, ignoring
the account changes.
The steps specific to the account changes on the consolidated server are ignored because
once the application is changed to the new account, the steps involved in consolidation
would be the same as with any other accounts that currently exist on the Accounting
SQL Server.

■ Phase 3: Developing

THE BOTTOM LINE The plan is complete. Begin implementing, prototyping, and testing.

Now that you’re developing a plan, it’s time to begin real development of the consolidated
solution. This phase of your plan requires the actual hardware or its equivalent, so that you
can begin testing and refining the plans using full-scale prototypes of the databases and
servers. This also involves validation of the decisions, piloting the consolidation, and a reex-
amination of the plan to ensure that it works as expected.

Acquiring Your Hardware

In this phase, you need to acquire at least some of the hardware that you designed in
phase 2. A full-scale mock-up of at least one of the consolidated servers is necessary for
a full test of the application movement as well as a simulated load that this server will
undergo. It’s understandable if you can’t get exactly the same hardware due to financial
constraints, but you should strive to get a system as close as possible in order to accu-
rately assess the performance of the system under real-world loads. If you’ve purchased
new hardware for the project at this time, you should be able to set it up as designed
and test your consolidation procedures as well as a full-scale load.

There may be budget restrictions, but this is the last testing phase where you can alter the
hardware design before going live. This is why it’s important to get as close as possible to the
expected live setup and load to validate your decisions. Test various memory and disk con-
figurations to determine whether there are ways to improve performance through reconfigura-
tions. Once the system goes live, these changes will be difficult to make.

Creating the Proof of Concept

At this stage in the process, you should have acquired some of the hardware needed
and documented your plans and procedures for proceeding with the consolidation.
Prior to testing the process in a production environment, you should first implement a
proof-of-concept consolidation.

This stage should work exactly like the full consolidation, just on a smaller scale. Take one set
of databases and migrate them exactly as you plan to do with all your servers. This set should
result in a consolidated server that looks similar to one from your final design. You may com-
bine two databases onto one server—or more, if that is what your plan calls for—but the final
server needs to look like its production design.
You should then use the consolidated server under the same load that it will experience in
a production environment. This can be accomplished through replaying traces or simulated
loads, but whatever method you use needs to be as close as possible to the real-world result
in order to verify that your processes will work as expected. You’re trying to ensure that the
extrapolations you made for the performance of the consolidated server are accurate in terms
of what the final server will experience.
In implementing this proof of concept, you must test all the scripts and steps as well as any
connections from other systems, monitoring and administrative changes, applications and tools,
and so on. This isn’t just a SQL Server test, but an entire application environment evaluation,
with special attention paid to metrics (without neglecting the other parts of the system).
This is essentially a dry run of the processes to find and eliminate any problems. You may
require a second proof of concept if you have a large number of issues, or if you decide that
some things can’t be fixed and you must mitigate the problem with other solutions. The
examination of this step is crucial to ensuring a successful production deployment.

Creating the Pilot


Once you’re comfortable with the hardware and your proof of concept, it’s a sound
idea to set up a pilot of some less mission-critical applications and consolidate them. If
you have a large number of servers being consolidated onto a smaller number, you may
choose one of the less critical ones to work out any issues in your procedures. If you’re
consolidating all your servers onto one large server, perhaps you’ll consider consolidat-
ing two applications as a pilot and add the others later based on the success of your pilot
consolidation.

This pilot should be a full-scale, live movement of the applications to the consolidated server.
In essence, it’s the first step of your final consolidation, but with the intention of learning
from this exercise before continuing on to the next server. This follows the same scope as the
proof of concept, with one notable exception: This isn’t a test. This is a live deployment in
your production environment, and unless the pilot proceeds so badly that you must roll back
the servers to their original environment, the databases moved during this stage will remain
consolidated for the foreseeable future.

Even though this is a live deployment, you need to perform as much testing as you can and
pay increased attention to the consolidated server. This is the first production change you’ll
make, and the success or failure of this part of the project will affect your company’s business.
Everything prior to this step was a test and didn’t directly affect the end users. This time, any
mistakes will have a direct impact on the application as it’s used by the company.

REEXAMINING YOUR DESIGN


No matter how well you’ve planned and how accurate your design and procedures, it’s
extremely likely that some unforeseen event will occur or some hole will be found in your
procedures. After the pilot, reexamine your plan based on the results and any knowledge
you’ve gained in the process.
This is especially helpful if the pilot was a failure. Having to roll back your consolidation
effort in the pilot to the original servers doesn’t mean you should not consolidate servers;
rather, you should determine why the pilot failed and work to correct the problems.
Procedures and processes are the most likely places of a problem, usually due to a setting
or configuration not being completed or correct. Examining issues here as compared to the
proof of concept will tell you a great deal about how well your testing procedures mimic the
production environments.
It’s possible that you’ll learn your hardware design was insufficient to support the consolidated
server. This is a very bad situation if you’ve purchased hardware. Allow some pad in your
design so the system can handle more stress than you expect. You might not go as far as buying a 16-CPU server instead of an 8-CPU server as a pad, but you very well might design a server that has 16 GB of RAM instead of 12 GB to allow for some headroom if you've underestimated the resource needs of a consolidated server.
If you find that you’re rewriting whole sections of your plan following the pilot, you didn’t
spend enough time on phase 1 or 2 (or both) and should go back and rework those phases
using the knowledge you’ve gained. At this time, you should be tweaking your procedures
and plans only slightly: reordering steps, adding in a forgotten step, or perhaps deleting a
single item.

■ Phase 4: Deploying

THE BOTTOM LINE The final phase of your consolidation effort is the deployment stage, where you move the servers onto the consolidated environment in line with your plan and stabilize the applications in a live, production environment.

You now start deploying! Your plan from the planning phase should have involved scheduling
that determines the order in which applications and databases are consolidated into their new
environment. There may be requirements to complete one consolidation so that hardware can
be freed up and reused in a later consolidation, or it’s possible that all your changes can be
done in parallel. Based on the experiences in phase 3 of your proof of concept and the pilot,
however, you may choose to reorder the moves to ensure as little disruption as possible to the
business.
Regardless of how well your testing has gone up to this stage, it’s highly recommended that
you don’t perform all your consolidations at once, or even within a short period of time.
Other issues will arise in your organization that you must deal with in addition to any glitch-
es in the consolidation. Allow time to work on issues without forcing large sections of your
schedule to be reworked.

TAKE NOTE* Staffing can be an issue during a consolidation deployment. Unless you can afford additional consulting help, you'll be asking the existing staff to work longer, usually late-night, hours. Scheduling the consolidations too close together will run the risk of overworking your staff and increasing the likelihood of mistakes. Allow time between the consolidations to let your staff recover from the additional work.

Any change to a production environment runs the risk of destabilizing the applications on
which end users depend. This not only leads to less efficient use of the applications for the
business, but also frustrates the end users. There will be a point after which you’ve decided
the consolidation can’t be rolled back to the original environment. At this point, you must
develop workarounds or reconfigurations, or implement contingency plans to stabilize the
environment. Some things you may have planned for and some may be completely unex-
pected, but in either case you must work to ensure that the nontechnical aspects of the con-
solidation aren’t forgotten. Let your end users know of your plans, show them you’re working
quickly to stabilize things, and apologize for the inconveniences.

SKILL SUMMARY

Consolidation is a huge trade-off process between any number of conflicting requirements. It


may involve trading peak performance for efficiency, costs now for costs later, or some other
set of metrics. Whether this is the right decision for your company or situation is something
that must be examined on a case-by-case basis.
The focus of this Lesson was to provide an outline of how to proceed with a consolidation
analysis and deployment if you decide this is the right decision for you. Remember that the
decision to proceed as well as the process of planning, testing, and deployment is often an
iterative process that requires you to examine your decisions and reexamine them again and
again, considering all your options and their implications.
For the certification examination:
• Know how to analyze a dispersed environment. Understand the steps involved in
analyzing an environment of SQL Server instances with an eye toward consolidation.
• Understand why you consolidate in a single instance versus multiple instances.
Understand the reasons why multiple instances may be preferred over consolidation to a
single instance.
• Know the issues to be aware of in a consolidation. Know a number of issues that impact a
consolidation project, both technical and nontechnical.

■ Knowledge Assessment

Case Study
DkRanch Cabinets
dkRanch Cabinets is a small company with 120 employees that builds custom kitchen
cabinets. The company currently has five applications that it uses to run the business,
each requiring a SQL Server.

Planned Changes
The company would like to consolidate down to two SQL Server database servers: one
for the internal applications and development and one for WebPresence. The company
wants to be sure it can perform the consolidation without purchasing new hardware
and still have a well-performing system.
The new consolidated servers will run SQL Server 2005.

Existing Data Environment


The applications are SalesCRM, Accounting, Inventory, WebPresence, and
Development. SalesCRM is used by the sales application to record sales of products.
The Accounting database handles all the financial applications for the company, and
Inventory is a less secure application used by the woodworkers to record the raw mate-
rials they receive along with the finished products that are produced. Development is
the place where the internal IT staff builds and tests the Inventory application before
deploying it to Inventory. WebPresence is the back end for the company Web site. Each
application requires a single database to support it.
The existing servers have baseline measurements as follows:

SERVER          CPU BASELINE     MEMORY BASELINE
SalesCRM 16% 1.6 GB
Accounting 18% 800 MB
Inventory 29% 1.2 GB
WebPresence 32% 1.2 GB
Development 24% 1.6 GB

Existing Infrastructure
The current server setup is on Windows Server 2000, and the individual hardware is set
up as shown here:

APPLICATION    SQL EDITION            CPUS   RAM    HARD-DISK SPACE              DATABASE SIZE   EMPLOYEE USERS
SalesCRM       2000 Standard          1      4 GB   80 GB (RAID 1, 2 drives)     20 GB           8
Accounting     2000 Standard          4      2 GB   120 GB (RAID 5, 4 drives)    60 GB           3
Inventory      2000 Standard          2      2 GB   240 GB (RAID 5, 7 drives)    40 GB           12
WebPresence    7 Standard (per CPU)   2      8 GB   80 GB (RAID 1, 2 drives)     20 GB           2 + anonymous users
Development    2005 Standard          2      8 GB   240 GB (RAID 5, 7 drives)    12 GB           4

All the servers listed are of the same hardware model and have interchangeable CPUs,
RAM, and disks.
Each of these servers meets the recommended requirements for SQL Server 2005.

Business Requirements
The servers can be reconfigured to meet the necessary needs, and any leftover hardware
can be redeployed in other areas as other servers are needed elsewhere. If the consolida-
tion isn’t performed, other projects will be placed on hold due to budgetary reasons.

Management would like to consolidate servers, but they want a valid business reason for
undertaking this project.
An existing project to upgrade the database servers to SQL Server 2005 was already
approved, with Enterprise Edition upgrades slated for the multiprocessor servers.
The company’s current DBA is overloaded with tuning and managing the five servers,
and a second employee is being considered. If the consolidation is performed, this will
be unnecessary.
If a pilot application is to be included with Inventory, the Accounting application
should be used.
Management is behind the consolidation, but disruptions to the WebPresence and
Inventory applications must be minimized during the workweek.

Technical Requirements
The new servers should use no more than 70 percent CPU based on previous baselines.
Any servers that will be redeployed must have at least one CPU, 2 GB of RAM, and 40
GB of disk space. All current servers run RAID, and any redeployed servers must still have
at least two drives to run RAID 1. All the RAID cards can support either RAID 1 or 5.
For the purposes of estimating, each additional CPU above the first counts as one CPU
when calculating loads.
One spare eight-way server with 8 GB of RAM is being returned to the vendor, but it’s
available for the next two months for testing if needed.
All the applications are developed in-house and can be configured to connect to default
or named instances. The development procedures, however, call for the development
databases to be named the same as their production counterparts.

Multiple Choice
Circle the letter or letters that correspond to the best answer or answers.
Use the information in the previous case study to answer the following questions:
1. The current server baselines are listed in the case study. Do the CPU measurements
allow for consolidation to two servers?
a. Yes
b. No
2. The current server baselines are listed in the case study. Do the memory measurements
allow for consolidation to two servers?
a. Yes
b. No
3. Which of the following are valid reasons to proceed with the consolidation? (Choose all
that apply.)
a. Lower salary costs
b. Reduced power consumption
c. Standardized hardware
d. Lower upgrade costs
4. The development group has been planning to add an upgrade to the Inventory applica-
tion to support new products. Because the application is rarely taken offline, they ask to
include this upgrade in the project plan for the consolidation. What should you do?
a. Include the change in your project plan.
b. Include the change in your pilot plan.
c. Do not include the change.

5. In what order should the following steps be performed for a successful consolidation?
a. Pilot the consolidation using Accounting.
b. Develop a process for migrating the logins from one server to another.
c. Examine the business ROI for performing a consolidation.
d. Form a team for the project.
e. Migrate the remaining applications and stabilize them on the new servers.
6. Testing of your processes on a full-scale load should be performed in which phase?
a. Planning
b. Envisioning
c. Development
d. Deploying
7. The development group has been planning to add an upgrade to the WebPresence appli-
cation to support new products. Because the application is rarely taken offline, they ask
to include this upgrade in the project plan for the consolidation. What should you do?
a. Include the change in your project plan.
b. Include the change in your pilot plan.
c. Do not include the change.
8. You are proceeding with the consolidation project and need to determine how to set up
the new server. Which configuration should you use to minimize instances?
a. Four named instances, one for each application
b. Three named instances and one default instance, one for each application
c. One default instance and one named instance
d. One default instance
9. You have almost completed your consolidation plan. The Inventory, Accounting, and
SalesCRM applications have been migrated onto the new server. However, when you
add the Development instance, you experience some severe CPU load problems. What
should you do if this is performed late on a Sunday night?
a. Continue into the week and work out any problems.
b. Roll back the Development consolidation and retest your configuration.
10. You are developing an aggressive consolidation plan to complete all server moves in the
current quarter. July 4 falls on the Tuesday after your first planned server move. Four of
your six IT employees have scheduled vacation on that weekend. What should you do?
a. Move the project plan for the first server migration one week ahead to the end of June.
b. Proceed with the plan as scheduled, and ensure the other two employees are well
versed in the process.
c. Move the project plan for the first server migration back one week to the middle of July.
LESSON 4
Analyzing and Designing Security

LESSON SKILL MATRIX

TECHNOLOGY SKILL 70-443 EXAM OBJECTIVE


Analyze business requirements. Foundational
Gather business and regulatory requirements. Foundational
Decide how requirements will impact choices at various security levels. Foundational
Evaluate costs and benefits of security choices. Foundational
Decide on appropriate security recommendations. Foundational
Inform business decision makers about security recommendations and their impact. Foundational
Incorporate feedback from business decision makers into a design. Foundational
Integrate database security with enterprise-level authentication systems. Foundational
Decide which authentication system to use. Foundational
Design Active Directory organizational units (OUs) to implement server- Foundational
level security policies.
Ascertain the impact of authentication on a high-availability solution. Foundational
Establish the consumption of enterprise authentication. Foundational
Ascertain the impact of enterprise authentication on service uptime requirements. Foundational
Modify the security design based on the impact of network security policies. Foundational
Analyze the risk of attacks to the server environment and specify mitigations. Foundational

KEY TERMS
active directory (AD): The operating system's directory service that contains references to all objects on the network. Examples include printers, fax machines, user names, user passwords, domains, organizational units, computers, etc.
audit: An independent verification of truth.
organizational unit: An object within Active Directory that may contain other objects such as other organizational units (OUs), users, groups, computers, etc.
security policy: The written guidelines to be followed by all employees of the enterprise to protect data and resources from unintended consequences. A security policy, for example, should exist guiding all users on how to protect their network password.


In SQL Server 2000, some key security templates made security cumbersome and resulted
in workarounds that often didn’t meet users’ requirements. As a result, one of the key
design considerations with SQL Server 2005 was an increased level of security for the
server. SQL Server now not only includes more control and capabilities but also makes it
easier for the DBA to administer the security policies for the server.

This Lesson will examine the methods and reasoning behind designing an effective database-
level security policy for your SQL Server instances.

■ Gathering Your Security Requirements

THE BOTTOM LINE Before you can develop an effective security policy, you must understand the requirements that your plan must meet. These include requirements dictated by your business as well as any regulatory requirements imposed on your business by governmental or regulatory agencies. Your plan must cover both of these types, and you must resolve any conflicts between the two based on your situation.

The requirements imposed on your SQL Servers by the business will in all likelihood be easier
to meet (in other words, they will be less restrictive) but will probably be harder to ascertain.
When someone in business decides on a requirement for an application, that requirement
may or may not be documented thoroughly, which can cause you difficulties during planning.
You’ll spend much of this part of the design process interviewing executives, business liaisons,
stakeholders in each application, developers, administrators, and anyone else who may know
why an application has a security need.
The regulatory requirements, conversely, should be easy to determine. A business IT liaison
should be able to let you know which governmental regulations apply. Once you know the
applicable laws or codes, you can look them up from the appropriate agency’s offices or Web
site and incorporate them into your documentation.

TAKE NOTE* As you gather this information, document it carefully. You may want to segregate the data by server instance and database for ease of locating it later. You'll use the various requirements to design the security policy for your SQL Server.

WARNING Make sure you know the exact details of the requirements; don't rely on a summary from a source other than the regulatory agency. A digest or guideline from another source can help you understand the rules, but your security decisions must satisfy the original requirements.

In addition to regulatory or governmental requirements, you may be subject to requirements from industry groups, standards bodies, or even insurers. Each certifying, regulating, or industry-related company that interacts with your organization may have its own set of rules and regulations.

Often these governmental rules require different consideration than the rules that are established for the rest of your enterprise. Regulatory rules exist to meet governmental standards or rules, while your enterprise will have developed rules to meet its own goals. If possible, it helps to conform all your servers to the same set of rules. This makes it easier for everyone to both administer the servers and understand the way each server works. This may not be possible for some applications that have conflicting requirements. For example, your accounting systems may be bound by requirements for auditing that are mutually exclusive from other systems that require a high degree of privacy for the data. The following are a few example requirements:

• All logins must be mapped to Active Directory accounts.


• Customer Social Security numbers must be encrypted as per government regulations.
• All data access to the medical database must be audited.
• Only bonded individuals can be assigned system administrator privileges as per insur-
ance guidelines.
After you gather the requirements from all sources, be sure to document any existing security
settings on your SQL Servers. These may or may not be in conflict with the requirements,
but in designing a security plan, you should consider the current environment. Have
mitigation plans handy for any changes to be sure that the databases remain available and
functional to users.
Before examining how you’ll use these requirements, you must understand the security scope
in SQL Server.

Case Study: Gathering Requirements

You’ve been assigned the task of architecting a new infrastructure for the SQL Server
2005 upgrade at a U.S. pharmaceutical company. To ensure that your design complies
with all applicable requirements, you schedule interviews with the chief operations officer
and his staff as well as the senior researchers.
You’re informed that you must adhere to a number of requirements: 10CFR15 as mandated
by the U.S. government, Sarbanes-Oxley requirements for the company as a publicly held
entity, and various insurance requirements to ensure worker and customer safety.
The process of complying with these regulations means you must validate every security
decision against all the different requirements. An internal committee of employees will
check your plan’s compliance when you’ve completed it.
Once you’ve made the necessary decisions, you need to ensure that a representative from
each body whose requirements you’re meeting audits the plan and documents compliance
with or deviation from each of their requirements.

Understanding Security Scope

X REF External Windows server–level security will be dealt with in Lesson 5 and internal server instance and database security in Lesson 6.

In SQL Server, security is applied at various levels, each encompassing a different scope on which it applies. Security can be applied at the server level, the database level, and the schema level. This Lesson will examine overall security system design for the entire enterprise.

Figure 4-1 shows the hierarchy of a SQL Server. The highest level is the server instance, which contains one or more databases. Each database has its own users, which are mapped to server instance–level logins. Database security applies to the database container as well as all objects within that database. Outside of the SQL Server are the Windows server and enterprise-level security structures.
SQL Server has a four-part set of security levels: server, database, schema, and object. The
schema level was introduced with SQL Server 2005. A schema is essentially a container of
objects within a database; a single database can include multiple schemas. SQL Server 2000
blended the object’s owner and a schema to form a multipart naming system for objects. Thus
dbo.TestTable and Steve.TestTable were two different objects. However, the owner, Steve in
this case, was bound to the objects, and it was cumbersome to remove the user Steve.

Figure 4-1: SQL Server hierarchy

X REF Lesson 6 discusses the permissions for separate schemas.

SQL Server now separates the schema from the owner in the database. As you'll see later in this textbook, this difference allows you to meet the security needs of the application without imposing a large burden on the database administrator. SQL Server also has a number of encryption capabilities along with a more granular permissions structure that enables you to meet most any security requirements for your enterprise. You'll learn about these encryption capabilities in later sections as you develop a database security plan.
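A minimal sketch of the schema/owner separation follows; the schema, table, and user names are hypothetical:

-- The schema is a container owned here by dbo, not by an individual user
CREATE SCHEMA Sales AUTHORIZATION dbo;
GO
CREATE TABLE Sales.TestTable
(
    TestID   int         NOT NULL PRIMARY KEY,
    TestName varchar(50) NOT NULL
);
GO
-- Objects are referenced by schema, so a user such as Steve can be granted
-- rights on the schema and later dropped without renaming any objects.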

Analyzing Your Security Requirements

Once you’ve gathered all the security-related requirements for your database(s), you must
begin to analyze how they affect your SQL Server. You can meet most requirements in a
variety of ways, and by examining your applications’ various needs, you can choose the
appropriate SQL Server vehicle.

The first step in examining security requirements is to determine the scope of each
requirement. This Lesson looks at database-level security; subsequent Lessons will examine
other scopes. A requirement should fall into one of the scopes described in Table 4-1.

Table 4-1
Security requirements scope criteria

SCOPE            CRITERIA
Server level Anything that references the login to the SQL Server instance or involves
the configuration of the instance. Authentication of an individual or
service is addressed at this level.
Database level Requirements that address the storage of data in a database, encryption
of data, or the security of all schemas contained in a database.
Schema level Application-specific requirements that deal with access to specific SQL
Server objects (tables, views, stored procedures, and so on) that will be
stored in the same schema and accessed separately from other objects in
another schema.
Service level Requirements that address the security of a service, HTTP endpoint, or
Service Broker queue.

You should classify each requirement as needing attention at one of these levels. The specifics
of these levels are addressed in later Lessons.
Because requirements for security can be general and encompass many different areas, it’s dif-
ficult to provide a comprehensive list that specifies where requirements fall. Table 4-2 gives
a few examples of requirements at the various levels to show how your analysis can classify
sample requirements.

Table 4-2
Sample security requirements classification

REQUIREMENT                                                    CLASSIFICATION
Login security must be integrated with Server level
Active Directory.
It must be possible to deny a particular login Server level
access to the server if necessary.
Developers must have read-only access to Database level or schema level, depending
production database systems. on the design of the database
Web services used for reporting to clients Schema level if these tables are separated
must only have access to the invoice portion from others by schema; otherwise,
of the Sales database. database level
Service accounts must be unique for each Service level
instance/service combination.
No user should own any tables. Schema level
Developers should be able to manage all Schema level
objects in their development databases.

Items that fall at the database or schema level need to be addressed and considered when any
database design changes are made. Your security architecture must be followed during the
fundamental development of objects.
X REF Lesson 7 deals with object-level security issues.

Dealing with Conflicting Requirements
Many companies have only one set of security-based requirements; some have none. In
such companies, it’s unlikely that you’ll have conflicting requirements. However, in com-
panies that must follow governmental regulations or guidelines from standards bodies, it’s
possible that security needs will conflict.

For example, suppose you work for a medical firm that manages records for a series of hos-
pitals. Privacy regulatory requirements dictate that patients’ Social Security information
be encrypted, but your application relies on this data when searching patient information.
Because encrypting these columns would prevent them from being indexed, you determine
that this approach is unacceptable; instead, you decide to encrypt the patient names, which
aren’t used for searching. Doing so is in conflict with the explicit requirements.
As a designer of the enterprise’s security infrastructure, you most likely need to seek guidance
from your superiors or executive management about how to proceed. Making a decision on your
own may not lead to a result that meshes with the desires of your company’s leaders and may
end up costing the company financially. It can also be hazardous to your career! In this example,
you should approach your firm’s executives with your reasons for making the security decision to
encrypt the patient names. One reason to make this decision is that patients’ privacy is still pro-
tected, because their names are encrypted. However, because this action could be construed as a
violation of the regulatory requirement, the company has three choices: agree that this possibility
is acceptable, seek approval from the regulatory agency, or change the application.

CERTIFICATION READY? Be prepared for exam questions giving you choices on conflicting requirements. Pay attention to stated objectives and their importance.

WARNING If you aren't a corporate officer, then you are somewhat shielded from legal responsibilities, but you aren't completely absolved of responsibility if you don't meet regulations. Losing your job is one thing; going to jail is something else entirely.

You should make decisions yourself as much as possible; but when you're faced with mandates or directives that conflict with one another, you need to seek resolution from those in charge of the company, especially if the decision is made to stray from regulatory guidelines. Company leaders often have a working relationship with standards bodies or governmental offices and can adapt the requirements to meet your company's needs.

If you're forced to choose between conflicting requirements yourself, understand the implications of ignoring any particular set of rules. In making your decision, you should meet all requirements to the greatest extent possible, but understand that governmental regulations usually are more important than corporate or certification ones. Penalties for ignoring requirements that have been written into law or codified by a governmental office can be financial woe for your company and may result in incarceration.

If you're choosing between your corporate mandates and the guidelines of a standards body or certification (such as ISO 9000), you should follow your corporate mandates. This is a general guideline; make sure you have the permission of your company's executives to proceed in this manner.
Analyzing the Cost of Requirements

Not all requirements you gather will be implemented on your SQL servers. Regulatory
and mandatory requirements will be adhered to, but there may be requirements that the
business would like to impose but chooses not to for cost reasons.

Every security decision you make has a cost. It isn’t necessarily a monetary cost, such as the
purchase of a piece of auditing software. It can be a cost in terms of time (RSA 2048-bit
encryption takes too long to complete with current technology), in terms of effort (requiring
two-factor authentication will result in too many errors from users), or in terms of another
resource. As the designer for your SQL Server infrastructure, you need to weigh the costs and
benefits of each decision to determine whether it’s worth pursuing.
Financial costs are simple to determine via price quotes from vendors and suppliers, licensing
costs based on existing installations or user counts, and so on. You can generally gather this
information easily and use it to determine the amount of money that your company must
spend for security items. Make sure to assign these direct dollar costs to each particular item.
Nonfinancial costs are difficult to establish, and you’ll have to decide how your company will
assign the value of those costs. You need to allocate a value in dollars (or some other currency)
so that you have a method of measuring these expenses along with other costs. You can do this
in a number of ways, almost all of which require that you consult with the people and depart-
ments that will be affected to gather information about the impact from a particular decision.
Time is an easy cost to determine. Often, the time an event takes can be translated into an
expense based on the cost of the resources involved. Each employee has a cost that can be
divided out to determine the per-minute value of his or her time. Security decisions often
impose a burden on people that equates to time spent on some activity, so it’s relatively simple
to determine the security cost of a particular decision.

TAKE NOTE* When you examine the cost of time, include all the people involved. For example, a password change resulting from a security decision to expire passwords results in the use of the time of at least two people: the person deciding whose password must be changed and the person making the change.

Other costs, such as increased time for customers or clients to use your system, their desire
or ability to work with your system, or even potential costs for others to integrate with you,
must be estimated by someone in your organization. The sales department may need to examine your requirements and determine the opportunity cost of a decision on the company's overall ability to generate revenue.

LAB EXERCISE Perform the exercise in your lab manual.

In Exercise 4.1, you'll determine the time cost of resetting passwords.

If need be, you can extrapolate this number to other numbers of employees based on the
expected growth or shrinkage of your workforce. For example, what is the cost for 10 people if
the average salary is $40,000? What is the cost for 20 people if the average salary is $40,000?
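A rough worked example, with the reset time as an assumption: at $40,000 per year and roughly 2,080 working hours, an employee's time costs about $19.23 per hour, or about $0.32 per minute. If each password reset consumes an assumed 10 minutes of combined time across the two people involved, one reset costs about $3.20. Ten employees each resetting a password monthly would then cost roughly $32 per month, or about $385 per year; twenty employees would be about double that.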
This cost analysis section of your design is purely subjective, based on the business in which
you’re working. You’ll need to solicit feedback from others in the business when you make
your calculations and also review your results with them to be sure you’re correctly accounting
for the costs of your changes.

BENEFITS
The cost analysis of your security design also has another aspect: the benefits analysis. Each design
decision, from password policy to encryption to the use of roles in your SQL Server, brings a
security benefit to your enterprise. The results may include lower risk of data loss, better market-
ing material to help sell your product or service, or a time savings that affects an employee’s job.
When conducting the cost analysis, make sure to consider the benefits and point them out in
your security plan. Too often, security is seen strictly as a cost, without including the benefits
that result from implementing a particular technology or process.

TAKE NOTE*
The benefits of a security policy can be hard to quantify. Extra attention paid to security is frequently used as a marketing tool to showcase companies. Be sure you communicate the positive aspects of your security plan to the marketing or sales department.

RISK FACTORS
Closely tied to the cost analysis for many items is the risk that some event will occur. For
example, suppose you determine that a SQL Injection attack on your SQL Server will result
in an average loss of $5,000 in time, product, investigation, and so on. However, using
industry data and past experience, you conclude that there is only a 1 percent chance of such
an attack each month.
The analysis for this event needs to calculate $5,000 at a 1 percent risk level, or a $50 per
month average loss. Any benefits or costs associated with preventing this event should be
compared against the $50 per month value rather than $5,000 per month.
Such risk factors can be hard to determine, but your insurance company can most likely help
you. The insurance industry is built on statistical analyses of various events and the probabil-
ity that they will occur. In most cases, security deals with an event that causes a breach and
results in a loss of money; your enterprise’s insurance company can help to quantify the actual
risk levels and the cost or benefit you should assign to any given design decision.

■ Integrating with the Enterprise

THE BOTTOM LINE
SQL Server uses two methods for authenticating logins to the server: Windows authentication of users using Active Directory (AD) or local Windows users, and SQL Server authentication using a name and password. Your company may use other methods of authentication, such as RADIUS, Novell's Identity Manager, or other enterprise identity-management software. These two, however, are the only ones available to SQL Server, and you'll need to choose one or both of them for your integration efforts.

Windows authentication in a domain environment uses Active Directory and works with the
users and groups you’ve already set up in your Active Directory database. You can add users and
groups as logins for your SQL servers, and the users’ credentials will automatically be checked
against Active Directory when they attempt to log on; they won’t need to reenter their password.
In contrast to Windows authentication, SQL Server authentication stores the login name and pass-
word in the server’s master database. To log on to the server, users supply the name and password,
which are matched against the values stored in SQL Server. Each time a user logs on to the server,
he or she must supply a name and password for the connection.

Choosing an Authentication Method

In deciding which authentication method you’ll use in your SQL Server infrastructure,
you should consider your enterprise’s existing environment. If AD is already present and
widely deployed to the clients that will connect to SQL Server, then this is the preferred
method of attaching to SQL Server. In this case, you should disable SQL Server authenti-
cation on your servers to reduce the surface area of attacks. Without SQL Server authen-
tication, there are fewer ways to connect for all users, including intruders.
TAKE NOTE*
Password policy enforcement is available only on Windows Server 2003 and 2008.

This type of authentication offers the advantage of tying access to SQL Server directly to an individual who has accessed other resources on the network. It also simplifies access because the user doesn't need to remember a separate account and password combination. The underlying AD infrastructure and the client can automatically authenticate the user. This approach also ensures that Windows password policies are enforced and the user password is periodically changed for all resources.
CERTIFICATION READY?
Know when to use the different types of authentication. For example, can a Vista Home Premium computer user log in using an Active Directory user ID?

If you have clients that can't use AD and must authenticate with a name and password, then you'll need to enable SQL Server–authenticated connections. This method isn't enabled by default and must be changed for each server during or after installation.
Non-Windows clients or applications using a technology such as Java or Perl that doesn't support the Windows authentication technology will require you to enable SQL Server authentication. This approach adds an administrative overhead of managing a second set of users and passwords that is separate from your enterprise list of users.
Choosing SQL Server authentication doesn't disable Windows authentication. Your choices are Windows authentication only or both SQL Server and Windows authentication.
LAB EXERCISE
Perform the exercise in your lab manual. In Exercise 4.2, you'll learn how to change an authentication mode.
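If you want to verify the current mode from a query window rather than from the Server Properties dialog box, the following hedged check simply reports the setting; it does not change it.

-- Returns 1 when only Windows authentication is allowed, 0 when mixed mode is enabled
SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS WindowsAuthenticationOnly

Changing the mode itself is done through the server's security properties and requires a restart of the SQL Server service to take effect.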
Setting Up Using Groups and Roles

Regardless of whether you choose to use AD as an authentication mechanism, you should follow its model of users being assigned to groups and rights being granted to those groups. This is a fundamental principle of Windows and SQL Server security and one that your design should adhere to in philosophy.

Because most enterprises have multiple applications, the specific policies or rights granted for a
particular application should be documented with that application. However, your overall design
should ensure that roles are created for tasks and that the groups of tasks common to a particular
job are bundled together into a role. This is possible with the use of AD Organizational Units
(OUs) and groups, which let you assign groups as members of higher-level groups.
This doesn’t work, however, with internal SQL Server security. A database role can’t contain
other roles—only users. This means your security must be at a less granular level when creat-
ing roles and assigning permissions.
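As a sketch of the group-and-role model just described, the following T-SQL grants rights to a role and places a Windows group into that role; the domain group, database, role, and table names are all hypothetical examples.

-- All names below are examples only.
CREATE LOGIN [CORP\AccountsPayableUsers] FROM WINDOWS
GO
USE SalesDB
GO
CREATE USER APUsers FOR LOGIN [CORP\AccountsPayableUsers]
GO
CREATE ROLE InvoiceEntry
GO
-- Rights are granted to the role, not to individual users
GRANT SELECT, INSERT ON dbo.Invoices TO InvoiceEntry
GO
EXEC sp_addrolemember 'InvoiceEntry', 'APUsers'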

UNDERSTANDING KERBEROS
Kerberos is an enterprise network authentication technology that uses tickets passed between
servers and clients to authenticate users. It is part of Windows 2000 and newer Active Directory
domains, which use the TCP/IP protocols for network communication. Your SQL Server can use
Kerberos for its users as well if they are authenticated via Windows authentication. However, the
decision to use Kerberos means that all your clients must communicate with SQL Server using
the TCP/IP protocols. Other network protocols exist, as do other directory services such as Novell NetWare; these other environments obviously cannot use Windows authentication.

WARNING
There are a few issues with configuring Kerberos, so be sure you consult the documentation for SQL Server. Use sys.dm_exec_connections to determine if it is enabled. Check BOL "Using Kerberos Authentication" for more details.

To use Kerberos, SQL Server must be registered with a Service Principal Name (SPN) in Active Directory. This ensures that it can be managed within the Active Directory schema. When the SQL Server service account is configured to use the local system account, the server will automatically publish the SPN in AD for you. However, a SQL Server best practice is to change the startup account from local system to a domain user account to better secure the SQL Server instance. If you're using a domain user account to run the SQL Server service, then you have to manually create the SPN for the account in Active Directory. This can be done with the setspn utility program.
If you choose to use Kerberos as an authentication mechanism, coordinate with your network
administrators to be sure your clients can support the protocol and your infrastructure is set
up to implement it.
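As a hedged example, an administrator might register the SPN from a command prompt and then confirm that connections are actually using Kerberos; the server, domain, and account names below are placeholders.

-- Run from a command prompt by an account with rights to modify the service account in AD:
-- setspn -A MSSQLSvc/sqlprod01.corp.example.com:1433 CORP\sqlservice

-- From a query window, check how the current connection was authenticated:
SELECT auth_scheme
FROM sys.dm_exec_connections
WHERE session_id = @@SPID
-- Returns KERBEROS, NTLM, or SQL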

IMPLEMENTING ADMINISTRATIVE SECURITY


No matter which authentication method you choose, you can always use Windows authen-
tication for your administrators—and you should choose to do so. Because SQL Server is a
Windows platform technology, the DBAs and other administrators can and should be able to
authenticate using this method.
X REF
Lesson 6 discusses additional server-level roles.

All administrators for SQL Server should be configured using server-level roles and Active Directory groups or OUs to group users together by their particular job function. Toward this end, you should determine the different functions for which your administrators will be responsible and then create the OUs or groups necessary for those roles, adding your Windows users into those roles.
SQL Server lets you set the permissions for serverwide administrative functions in a more
granular fashion. Your security design should incorporate the idea of the least privileges neces-
sary for a particular group to perform a particular function. If specific people are responsible
for adding users and logins to your SQL Server, then don’t add them to the sysadmin role.
Instead, assign them the securityadmin role, and allow them to perform that function.
Your decisions about administrative security should not impose a large burden on the system
administrators. If one DBA is responsible for the server, then it doesn’t make sense to create
four or five Windows groups for different functions. Just assign this person to a group in the
sysadmin role and let them manage the server.
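A minimal sketch of this approach follows; the Windows group name is an assumption, and the securityadmin fixed server role is used as described above.

-- Example only: map a Windows group to the securityadmin fixed server role
CREATE LOGIN [CORP\SQLSecurityAdmins] FROM WINDOWS
GO
EXEC sp_addsrvrolemember 'CORP\SQLSecurityAdmins', 'securityadmin'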

TAKE NOTE*
The recommendation is that the Windows administrators group be removed from the SQL Server sysadmin role to ensure a separation of duties and limit the ability of non-DBA administrators to work inside SQL Server. Consider this even in small companies where one person performs both functions; the person performing each job may not always be the same individual, and he or she can easily be added to one group without automatically being a member of both.

IMPLEMENTING APPLICATION ROLES SECURELY


X REF
Lesson 6 discusses application roles.

Application roles are a database-level security tool, but one that you should consider in an overall security plan for your SQL Server infrastructure. These roles allow a connection to receive a set of rights different from the ones they gained after connecting to SQL Server. If you need to secure data in applications from access outside of those applications, you should consider implementing application roles.
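A brief sketch of how an application role might be created and activated is shown below; the role name, password, and table are placeholders, and Lesson 6 covers the feature in detail.

-- Example only: create the role and grant it rights
CREATE APPLICATION ROLE OrderEntryApp WITH PASSWORD = 'Str0ng!AppR0lePwd'
GO
GRANT SELECT, INSERT ON dbo.Orders TO OrderEntryApp
GO
-- The application activates the role after it connects; the connection then
-- holds the role's permissions instead of the user's own database rights.
EXEC sp_setapprole 'OrderEntryApp', 'Str0ng!AppR0lePwd'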

ALLOWING IMPERSONATION AND DELEGATION SECURELY


As with application roles, impersonation using the EXECUTE AS keywords is more of a
database-level security feature. However, your overall SQL Server security design should
address whether this technique will be allowed on your servers. In some instances, regulatory
or other requirements for auditing may prevent the use of this feature. For example, in some
financial applications, a user initiates an action, such as trading a security. If this action is set
up in the application to be performed as another user, using the impersonation capabilities
of the EXECUTE AS clause, then the impersonated user will appear to have performed the
action. Because other users could share this capability, it will appear in audit records that the
same SQL Server user performed all trades, and this may violate the requirement of auditing
who actually performed the trade.
Delegation occurs when the SQL Server uses the credentials from the connection to access
other SQL Servers for a distributed query. Again, your security policy should address whether
this is allowed and what type of configuration should be used in implementing this feature.
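The following minimal sketch shows the impersonation behavior your policy needs to address; the user name is hypothetical and must already exist in the database.

EXECUTE AS USER = 'TradingService'
SELECT USER_NAME() AS CurrentUser   -- statements here run with TradingService's permissions
REVERT
SELECT USER_NAME() AS CurrentUser   -- back to the original security context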

Assessing the Impact of Network Policies

WARNING
Make sure your development servers, especially those that contain copies of production server data, are protected in the same way as the production servers.

Network policies and infrastructure can have a substantial impact on the design of your SQL Server security policies and procedures. You should follow a few general policies from a security standpoint, but many network infrastructure decisions can have a substantial impact on the security of your database servers. Because SQL Servers tend to contain important enterprise data, they should be protected in some basic ways. First, each SQL Server—along with other important network servers—should be physically protected in a locked, controlled-access room. Your network policy should include this type of mandate. Make sure this is the case.

X REF
Lesson 5 talks more about physical security.

This protection of the servers should also extend to the backup systems, whether disk or tape. A number of security breaches involving database systems have occurred when backup tapes were compromised. Both encryption technology and physical protection need to cover any removable media used for backup of your SQL Server data.
In addition to being physically protected, all SQL Servers should be logically protected at
the network level by firewalls. All connections to the server occur through network access,
even those from the local server console, so a firewall helps to ensure that only legitimate
clients are allowed to access the SQL Server. The network infrastructure team should be
aware of all SQL Servers and the access requirements of clients to configure the appropriate
firewall rules.
As you deploy SQL Servers or work with the existing environment to better secure your
instances, you’ll work closely with the network team in the placement of your instances
within the network. In many cases, you’ll want to place your SQL Servers in a central
location to ensure quick response times for clients while protecting them to some extent
from unauthorized access. For SQL Servers that provide data to Internet-accessible sys-
tems, this often means placing them in a demilitarized zone (DMZ) on the network.
However, you may also place them near other servers that are segregated from desktop
clients on the network.
CERTIFICATION READY?
Suppose that users at a new remote corporate location need to access the central corporate SQL Server that is behind the corporate firewall. How could this work? What might need to be changed or specified?

This configuration on the network may also extend to internal routers as your network grows. Because large networks usually contain a number of subnets, the appropriate traffic should be blocked or allowed through to SQL Servers based on the need to access that information. SQL Servers that are used for storing data that isn't accessed by the enterprise, such as those used for auditing systems, monitoring systems, and so on, should be behind routers or firewalls configured to block random access from clients.
The specific traffic requirements of SQL Server, TCP/IP connections versus named pipes, Secure Socket Layer (SSL) connections, encrypted traffic, and so on, will require that your DBAs and the network administrators work closely together to ensure that inappropriate access doesn't take place and appropriate access is granted.

In addition to the protection of SQL Server, you need to work with the network team on tak-
ing advantage of SQL Server’s various features. Many of the capabilities of SQL Server require
certain configurations of the network and access beyond port 1433 for clients. Table 4-3 lists
some of the capabilities requiring network interactions.

Table 4-3
SQL Server features requiring network configuration

Feature: Named instances
Reason: Named instances work with ports other than 1433. To ensure proper security, each should be assigned a port that is configured in a router or firewall.

Feature: SSL encrypted connections
Reason: SSL connections should use a certificate procured for and assigned to your enterprise by a trusted authority. The network team usually acquires and installs these.

Feature: SQL Server Integration Services (SSIS)
Reason: Many of the connections available in SSIS require network access, such as web service connections. These should be appropriately configured in a firewall.

Feature: Common language runtime (CLR) programming
Reason: By default, CLR assemblies are disabled on the server. If they are installed on the server with explicit permissions, by default they can't access resources outside the server. If permissions are required to access objects outside the server, the requirement and implementation should be documented and the network and routers appropriately configured. This is especially critical for the UNSAFE permission set, because there are virtually no restrictions on what an assembly can do with this level of permissions assigned.

WARNING
Microsoft has implemented many ways of accessing the network both into and out of SQL Server. However, each method you implement greatly increases the surface area of attack. You should use them cautiously.

■ Achieving High Availability in a Secure Way

THE BOTTOM LINE
Many businesses want their SQL Servers to be available 24 hours a day, 7 days a week, every week of the year. SQL Server includes a number of new and enhanced features to help businesses achieve a highly reliable database server. However, many high-availability (HA) solutions can impact your security design, because you're essentially spreading the security of a single system across multiple servers.

X REF
Lesson 10 discusses the design of high-availability systems.

You can design a highly available system in a number of ways, and each requires different security considerations in SQL Server. Table 4-4 lists these technologies and some of the security ramifications.

Table 4-4
High-availability security considerations

HA Technology: Clustering
Security Impact: A clustered solution has two or more installations of SQL Server that are set up separately but that present a single logical instance of SQL Server. The security access set up on these underlying instances must be consistent so that DBAs and other administrators can work with either database instance. The accounts for the instance itself also have restrictions.

HA Technology: Database mirroring
Security Impact: Setting up mirroring usually entails two or three SQL Server servers, which need to communicate with each other. This means authenticating the connections between them. In addition, logins for users must be available on both servers. Security policy should ensure that changes on one server are propagated to the other. The servers can be set up to use Windows authentication or certificate-based authentication, so the capability exists to work outside the domain structure. You should consult the database mirroring documentation if implementing this feature. Note that mirroring was not available as a supported technology in the initial release of SQL Server 2005. Database mirroring is available with the release of Service Pack 1 for SQL Server 2005 and with SQL Server 2008.

HA Technology: Replication
Security Impact: This isn't specifically an HA technology, but many companies use replication to move data between servers as protection against a system failure. Logins and security should be set up the same way between replicated databases if they are used as an HA tool. The replication agents also need security set up in keeping with the principle of least privilege. These agents can run under separate security accounts and should be configured as such with the minimum necessary permissions.

HA Technology: Log shipping
Security Impact: Log shipping uses the SQL Server Agent service to perform backups, move files, and perform restores. The SQL Server Agent service account is the default, but you can configure a separate proxy for this purpose. A separate proxy is preferred, with only the permissions necessary to perform the log shipping functions. Only sysadmins can implement log shipping.

HA Technology: Manual backup and restore
Security Impact: Administrators need administrative permissions and rights to access both the backup servers and the backup media (disk or tape).

No matter which HA technology you choose to implement (if any), you need to make sure
your security policy covers the permissions and policies for each of them. Each will have dif-
ferent security requirements, and it’s easy to forget to properly secure them. Not assigning
tight security to your backup systems can result in vulnerabilities to your business, and assign-
ing very restrictive security policies can result in the failover systems not being available when
they are needed.
Implementing any of these HA technologies means your security policy must account for the
corresponding needs and requirements. Most of these technologies require the existence of an
Active Directory domain in order to achieve the authentication required between the servers.
Your policy should specifically address which ones and how the authentication mechanisms
will be implemented in your enterprise.

Mitigating Server Attacks

Most of your security policies will be developed to handle internal separation of duties
and prevent accidental integrity problems from untrained internal users. For most of your
users, security should be a transparent entity that they don’t consciously interact with.

X REF
Lesson 7 gives more details about data security design.

However, many threats to your data are malicious in nature. The mainstream press tends to portray hackers on the Internet as a great threat to your servers, but there are also corporate hackers who can compromise your specific servers. These may be consultants, competitors, disgruntled employees, or any other individuals who can physically enter your business or interact with your employees. Generally, internal threats are greater than external threats.
In either case, security should be a barrier that prevents these individuals from gaining access. Your security should prevent them from changing or accessing data and, properly designed, should prevent them from even knowing the data is there.
You need to implement two types of security to mitigate attacks on your server and design at
least one process into your overall security plan. You must design detailed technical security
in setting permissions, assigning roles and rights, and integrating with enterprise and network
systems. You also need to ensure that administrative security policies are in place to prevent
social engineering practices from being successful. Finally, you should ensure that the Surface
Area Configuration Manager tool is run on every installed SQL Server and any vulnerabilities
or warnings that crop up are addressed in your policy.
Technical security is the easy part of this exercise. Network access to the server should be
controlled by properly configured firewalls and routers as well as integration with enterprise
authentication. Password policies, the use of roles for rights, and proper data security designs
will address these technical requirements. Each of these items is addressed in later Lessons in
this book. Proper application design to prevent SQL injection is also important.
Administrative policies for security are much more difficult to enforce and train people to
use. Many of the security compromises that occur in business do so because of socially engi-
neered access. With social engineering, employees are often tricked into trusting an outsider
and granting access or disclosing the details of their account. Social engineering of passwords,
access rights, or any other circumvention of security policy is difficult to guard against.
WARNING
It isn't just outsiders who gain access by social engineering. Employees sometimes engineer additional access for malicious reasons. Your policy shouldn't have exceptions, but it should include auditing in the event that an exception is made.

Employees are naturally trusting of each other, and they have a tendency to shortcut rules to help other employees. Skilled hackers can use this natural tendency to trick an employee into giving them access they shouldn't have.
Constant testing of employees' adherence to policy, and penalties for bending rules, are required to prevent social engineering attacks. Because this is an administrative security policy rather than one implemented using computer tools, you should consult with your Human Resources department about what is and isn't allowed as a policy.
You need to be aware of a few different types of attacks and plan for them.

PREVENTING SQL INJECTION ATTACKS


The number-one external attack on databases is through SQL Injection, especially as more
databases are used to power Web sites that anyone with an Internet connection can access.
SQL Injection occurs when someone submits additional SQL code into a field where an
application is expecting only data and then uses a statement terminator to execute a second
statement. For example, suppose you have a Web site that asks for a name and password.
After a user submits the name Bob and the password 1jdrk4, the following SQL is executed:
SELECT Name FROM Users WHERE Name = 'Bob' AND pwd = '1jdrk4'

This SQL code verifies the name and password: if a row is returned, the submitted credentials are correct. Thousands of applications have been built using this technique. The problem occurs if an attacker submits something like the following in place of the password:
Mypassword'; SELECT * FROM users;

In this case, when the variables are substituted into the previous SQL statement, this happens:
SELECT Name FROM Users WHERE Name = 'Bob' AND pwd = 'Mypassword';

SELECT * FROM Users;'

Note that now two statements are executed: the one that is expected and a select to return all
data from the Users table. Depending on the structure of the application, the attacker could
conceivably gain information about all users on the system. Additional code has been “injected”
into the SQL statement.
This vulnerability mainly comes from building a SQL statement as a string in an application and then executing it. The recommendation to use only functions and stored procedures goes a long way toward preventing SQL Injection attacks, because the code is encapsulated inside another structure and values are passed as parameters rather than concatenated into the statement text.
You should develop a policy for all your application development work that seeks to minimize
SQL Injection vulnerabilities by using only precompiled modules and requiring all code to con-
form to best practices. If possible, you should specify that dynamic or ad hoc SQL not be allowed.
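As a hedged illustration of the precompiled-module approach, the login check above could be wrapped in a stored procedure like the following; the object names match the earlier example, and the point is that the values arrive as parameters instead of being built into the statement.

CREATE PROCEDURE dbo.CheckLogin
    @Name varchar(50),
    @Pwd  varchar(50)
AS
BEGIN
    SET NOCOUNT ON
    -- A statement terminator embedded in @Pwd is treated as data, not as a second statement.
    SELECT Name
    FROM dbo.Users
    WHERE Name = @Name
      AND pwd = @Pwd
END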

MANAGING SOURCE CODE


With the addition of modules like .NET assemblies as a method of writing functions, user-
defined data types, and other SQL Server objects, there is a new surface through which attacks
can be made. Developers often use a module that works for them without necessarily requiring
the source code. Because source code often raises the price of an assembly, many companies
purchase components for a specific piece of functionality without obtaining the source code.
This policy opens you up to potential backdoors written into the code as well as potential
SQL Injection attacks, overflow buffers, and more. A good security policy requires code
reviews of all source code by someone other than the developer to ensure that it conforms to
your standards as well as best practices. This includes purchased assemblies as well as inter-
nally developed code.

GUARDING AGAINST HTTP ATTACKS


SQL Server includes the ability to generate web services as well as send and receive HTTP
traffic from outside sources. This means attackers can directly seek to cause buffer overflows
or take advantage of any vulnerabilities that may be discovered in the web services protocols.
To ensure that the security of your database server isn’t compromised, access through this pro-
tocol should be limited to those machines or subnets that require this information. In addi-
tion, you should ensure that administrators are watchful for any issues with the http.sys driver
in Windows or any web service attacks.
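One simple check administrators can run, shown here as a sketch, is to list any HTTP endpoints defined on the instance and confirm that each one is expected and still required.

-- List HTTP endpoints defined on this instance
SELECT name, protocol_desc, type_desc, state_desc
FROM sys.endpoints
WHERE protocol_desc = 'HTTP'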

THWARTING PASSWORD CRACKING


Prior to SQL Server 2005, some tools could perform a brute-force dictionary attack on SQL
Server logins with access to the syslogins table. These tools worked extremely fast, cracking
almost any password within a few hours.

Although such tools haven't been shown to work against SQL Server 2005, it's possible that one will be developed. Using a strong password policy that forces changes on a
regular basis can help to thwart this type of attack. You should also set access controls to limit
the ability of regular users to access the syslogins table.

Protecting Backups

In addition to the database server, you need to secure the backup data extracted from
SQL Server in the event of a disaster. Most companies use offsite storage for their back-
ups, whether tapes in a temperature-controlled environment or real-time disk backups in
another location. This data must be protected just as strongly as the production data,
because it contains a copy of your databases at a point in time. Many news stories in
recent years have described how backup tapes containing sensitive data have been pilfered.

Your policies concerning backups should ensure that access to them, whether on physical
media or on a file system, is limited to those individuals who require this access (usually
system administrators). In addition, with privacy and data security laws being enacted, data
given to developers for testing purposes should be obfuscated in some manner.
The password features for SQL Server backups may prevent their restoration by an attacker;
however, the files themselves have been shown to contain the data in clear text. A third-party
encryption product or the Windows native encryption features for the file system should be
used to prevent anyone from accessing this data if they manage to obtain the files.
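For example, a backup can be given a password as sketched below, but as noted above this only blocks a casual RESTORE; it does not encrypt the file, so file-system or third-party encryption is still needed. The database name, path, and password are placeholders.

BACKUP DATABASE SalesDB
TO DISK = N'E:\Backups\SalesDB.bak'
WITH PASSWORD = N'B@ckupPwd!2010', INIT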

■ Auditing Access

One of the main ways in which you can measure the effectiveness of your security policy
is by examining its effectiveness over time. This requires that you implement auditing that
THE BOTTOM LINE
both tracks changes made to the individual SQL Servers and catches any attempts at inap-
propriate access.

Just as a database can track changes to data made over time, your security design should
CERTIFICATION READY?
include provisions that track configuration changes; the addition of users, roles, or other
Your Windows OS and
SQL Server maintain
objects; and security changes made on the server. Preferably, you should use automated
multiple log files. Which methods to track changes, such as data definition language (DDL) triggers, which can provide
log maintains auditing a record of security related alterations.
data? Can you identify System errors and alerts should be noted as well, because they often indicate when an attack
other logs and their
has taken place or the database server is inappropriately configured. SQL Server Agent can
purposes?
notify administrators and should be configured to do so when errors are trapped or alerts are
fired. A good policy ensures that important items in your environment are monitored.
Your design also needs to provide some method of tracking unauthorized attempts to access
your SQL Servers. These access attempts can help gauge how much effort is being put into
testing your security. A network-level intrusion detection system (IDS), automated Profiler
traces, or internal SQL Server alerts can be helpful in providing an audit trail.

If you have automated processes or applications that connect without a live user, expired
passwords can cause failed attempts that look like an attack on your server. Auditing will
TAKE NOTE
* help you track down these applications and determine whether there is a configuration
issue or a real attack.

As with all other security design considerations, the costs and benefits should be factored into
your final design and presented to business leaders and executives.
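As one concrete example of the automated tracking discussed in this section, a server-level DDL trigger can record login-related changes; the audit database and table names below are assumptions for the sketch.

-- Example only: log CREATE/ALTER/DROP LOGIN events to an audit table
USE DBAAudit
GO
CREATE TABLE dbo.SecurityChangeLog (
    EventTime datetime NOT NULL DEFAULT GETDATE(),
    EventData xml NOT NULL
)
GO
CREATE TRIGGER trg_AuditLoginChanges
ON ALL SERVER
FOR DDL_LOGIN_EVENTS
AS
BEGIN
    INSERT INTO DBAAudit.dbo.SecurityChangeLog (EventData)
    VALUES (EVENTDATA())
END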

■ Making Security Recommendations

THE BOTTOM LINE
Once you've performed your cost analysis, you need to make recommendations for a security policy at some level. It may be for the entire enterprise or for a single instance of SQL Server. In either case, your decisions need to be based on your sound judgment that they meet the necessary requirements (whether regulatory or mandated by the business). Your recommendations should also balance the desire to meet other requirements with the cost to the business of implementing them. Choosing to require password changes every day costs too much for most businesses and wouldn't be a recommendation for a retail chain. But for a very high security situation, it might be acceptable.

Although you’re gathering information from many people throughout this process and may
be sharing your decisions or thoughts about security with them, you need to create a formal
document that lists the specific security recommendations you’ve decided on. This should be
a complete and final recommendation of how your SQL Server security infrastructure will be
implemented, and it should include the touchpoints with other groups (such as the network
group) along with your requirements for them.
Your document should describe the rationale for your decisions as well as the impact of those
decisions on the various parts of the business. As much of the cost analysis as possible should
be included in this document. The data you supply and the costs, benefits, and risks should
be used by business leaders to determine how to proceed.

TAKE NOTE*
Make every effort to present a complete and final document. Business executives and affected leaders may overrule some sections and want them changed, and you should incorporate their feedback into your document and resubmit the plan. However, when you make your submission, it should be complete, not a draft.

■ Performing Ongoing Reviews

THE BOTTOM LINE
Security is a constant process, unlike many other computer functions that can be configured and left alone. Your design should be encompassed in a living document that changes and flexes according to the changing environments of SQL Server and your enterprise.

In securing your SQL Servers, you may make choices that other departments don’t find
acceptable, given the impact on their business area. Have alternatives for them along with
the risks and costs associated with changing your plan. Security is always a balance between
protecting the integrity of and access to your data and enabling the business to function
smoothly.
As a part of your design, you should include a time frame in which the design itself is
reviewed. Doing so ensures that the design evolves over time to meet the changing environ-
ment it governs. Typically this review happens every year, but the time frame can be longer or
shorter depending on your particular requirements.

SKILL SUMMARY

Developing a comprehensive security plan for SQL Server isn’t a quick or easy process. It
depends on the requirements of your particular business, both internal and external. This
Lesson has guided you in the areas that are important and given you ideas for the items you
should consider in a security policy. However, because the requirements of each organization
are unique, it’s difficult to specifically determine which policies you should implement.
One guiding principle in all your security design is that each role should have the least
amount of privileges required to complete an appropriate task. It’s always possible to assign
multiple roles to a specific individual, but by separating these roles into different groups,
you can more easily ensure that your policies are followed and that rights are granted and
revoked as users move in and out of roles. Windows authentication is
preferred for this reason.
Security is an ongoing process. It requires review as well as auditing of both actions and com-
pliance. Your design should specify methods that let you audit the security of your SQL Servers
to be sure they are properly configured and that inappropriate access isn’t taking place. Your
design should also specify a time frame in which the plan is reviewed so that it continues to
meet your requirements.
For the certification examination:
• Know how to gather security requirements. Understand the different types of require-
ments and their order of importance.
• Understand the various ways network policies impact SQL Server. There are a number of
places where network policies impact the security plan for SQL Server.
• Know the implications and benefits of choosing an authentication mechanism. There are
two choices for SQL Server authentication, and you should understand the differences
between them.
• Know how to analyze the risk of attack to your SQL Server and mitigate any issues.
Understand the types of attacks, both computer and social engineering based, and
strategies for mitigating them.
• Understand how to examine the true costs of your decisions. Each decision has an associ-
ated cost and a benefit as well as a risk factor. Know how to include these in your security
design.

■ Knowledge Assessment

Case Study
Delaney’s Simulations
Delaney’s Simulations is a company that provides event scenarios to police and military
organizations for training purposes. The events are mocked up at their facility. The
actions of trainees are recorded and evaluated, and the data is stored in a SQL Server
2005 database.

Planned Changes
The company is growing and wants to make the results of the simulations available to
clients on the Internet, but there is some concern over security. The privacy of the train-
ees as well as the clients must be maintained, and the results must not be disclosed to
any unauthorized individuals. A new security policy must be designed. The senior DBA,
Dean, at Delaney’s Simulations is tasked with developing policies for the database servers.

Existing Data Environment


The company currently uses two SQL Server 2000 servers. SQLTest is used to test the
simulations in conjunction with the SimTest server that hosts the ASP.NET application.
Matching SQLLive and SimLive servers are used in the actual simulations.
Two new SQL Server 2005 servers will be set up as SQLWebTest and SQLWebLive for
the web-based environment that will provide data to the company’s clients. These servers
will receive data from the simulation servers using SSIS.

Existing Infrastructure
Currently, Delaney’s Simulations uses SQL Server authenticated logins for the develop-
ers as well as the simulation applications. These applications have been in use for some
time, and a former developer’s account is used for all the connections.
An Active Directory domain is used for employees and servers as a central point of
authentication. There is only one OU currently, but a second one is planned for the
servers that will be exposed to the Internet.
Additional firewalls are planned for the Internet connection and as a way to segregate
the servers used for the Internet from the other internal servers and clients.
All IT personnel have complete access to the development servers. Only a few people
have access to the production servers.

Business Requirements
Although the company expects that the results of all simulations will be available to
clients on the Internet, there is an understanding that privacy and security requirements
may prevent that. Developing a strong security policy is critical to the company’s con-
tinued success.
Additional time has been allocated to the development team to make changes that will
provide better security. One possible consideration is the deployment of a .NET client
application to all the company’s clients and forcing all data access through this client
application. If the security requirements dictate this approach, the project will proceed.
Because the company has a number of government clients, some regulations relating to
industries in the Code of Federal Regulations (CFR) apply to Delaney’s Simulations.
These regulations require the auditing of all access to any person’s result data as well as
controls placed on who inside Delaney’s Simulations can access this data.

Technical Requirements
The new servers will be placed in a separate OU, and specific accounts will be required
for the SSIS transfer of data.
Firewalls will be installed, and only access through specific routes to specific machines
will be allowed. The only exception is the Internet web server.
It has been decided that as many SQL Server–authenticated accounts as possible need
to be replaced with Windows Integrated logins, because corporate policy dictates that
all access be granted by the AD domain.
Access through the Internet should entail as few permissions as possible and should
provide the best security that the IT group thinks it can provide.

Multiple Choice
Circle the letter or letters that correspond to the best answer or answers.
Use the information in the previous case study to answer the following questions.
1. The internal development team members insist that they need to have a copy of the
production server’s data in their test environment. However, the CFR regulations seem
to prohibit this, because the developers are not a group that needs access to individuals’
data. How should the security policy address this request?
a. Set a corporate policy to override the CFR regulation and allow developer access.
b. Prohibit developer access as required by the CFR regulation, and force the developers
to build their own test data.
c. Allow the developers to receive copies of the data from the production server, but
require obfuscation of the individual information to satisfy the CFR regulation.
d. Allow each client to determine whether they see the need to comply with the CFR
regulation.
2. To comply with the company’s security requirements, what should be done about the
developers’ access to their test server?
a. Their Windows logins should be added to the test servers with the same rights as
their old SQL logins, and their SQL logins should be deleted.
b. Their Windows logins should be added to the test servers with the same rights as
their SQL logins.
c. A central application role should be created for all developers.
d. One Windows login should be created for all developers to share.
3. It is decided that the risk of data compromise should be reduced by limiting the rights of
the Internet application to access data. Which of the following would be the best policy
to choose?
a. Set up a Windows account for IIS to use, and grant this account login rights to the
SQL Server and the tables that it needs.
b. Set up a Windows account for IIS to use that can only log on to the SQL Server. Use
an application role that the ASP.NET application can invoke to get rights to indi-
vidual tables.
c. Hard-code the information for a SQL Server login into the application to use.
4. Company management prefers that only the DBA or the DBA’s backup be allowed to
deploy changes to the production environment. What two policy changes should be used
to enforce this?
a. Limit access to the production servers to only those individuals’ machines using the
firewall.
b. Distribute a memo to all employees outlining who can access the production servers.
c. Disable all network access, and distribute keys to the data center to the individuals
who will access the production servers.
d. Only grant access to the production servers to the appropriate individuals’
Active Directory accounts.
5. Because the CFR requirements can be amended every year, what policy should be set in
place?
a. All past implementations are expected to be grandfathered, so nothing should be done.
b. One of the IT employees should be designated to review the CFR requirements each
year and determine whether application changes are needed.
c. The application should be rewritten every year to ensure it complies with the current
regulations.
6. To ensure that the auditing requirements are met, what type of policy should be set up?
a. Develop corporate guidelines that outline how auditing should be built into any
application that accesses the data.
b. Because only one application currently accesses the data, build auditing into the
ASP.NET application.
c. Require that the database log all accesses, and force queries to use stored procedures.
d. Disable all client accounts by default, and enable them only after clients phone and
confirm they require access that day.
7. To ensure that only the appropriate individuals receive access, what policies should be in
place for the support personnel? (Choose as many as needed.)
a. Passwords may not be given out over the phone. Password resets must be sent to a
registered e-mail address.
b. Firewall changes to allow access from new IPs should be performed only after the
requester’s identity and authority are verified.
c. No account information is to be sent in e-mail or given over the phone in response
to a request. An e-mail must be composed from scratch and sent to registered clients’
addresses only.
d. Before troubleshooting problems with a client, verify their identity by sending them
an e-mail using the e-mail address on file with Delaney’s Simulations.
8. The disaster-recovery plan includes switching the production and development servers if
necessary. How should security policy be handled for these servers?
a. The accounts are preset on the development machine, but they are disabled. Auditing
is set up to ensure they are not enabled without authorization.
b. No accounts are set up. They can be re-created in the event of an emergency.
c. The accounts are set up and ready in case a disaster occurs. Employees are instructed
not to use them.
d. There does not need to be any policy regarding this matter.
9. What type of account should be used to transfer data between the simulation SQL
Server and the web SQL Server for SSIS?
a. The developer’s account that builds the packages
b. The SQL Server database server service account
c. A dedicated domain account with rights to only those machines and the appropriate
tables
d. The local system account
10. After SQL Server has been installed on all servers, what should be done before allowing
users to access them?
a. Disable the SQL Server Agent.
b. Set up Profiler to run in C2 mode.
c. Turn off SQL Server login access.
d. Run the Surface Area Configuration Manager tool.
LESSON 5
Designing Windows Server-Level Security
LESSON SKILL MATRIX

TECHNOLOGY SKILL 70-443 EXAM OBJECTIVE


Develop Microsoft Windows server-level security policies. Foundational
Develop a password policy. Foundational
Develop an encryption policy. Foundational
Specify server accounts and server account rights. Foundational
Specify the interaction of the database server with antivirus software. Foundational
Specify the set of running services and disable unused services. Foundational
Specify the interaction of the database server with server-level firewalls. Foundational
Specify a physically secure environment for the database server. Foundational

KEY TERMS
asymmetric key: In cryptology, one key, mathematically related to a second key, is used to encrypt data while the other decrypts the data.
certificate: A digital document (electronic file) provided by a trusted authority to give assurance of a person's identity; certificates verify a given public key belongs to a stipulated individual or organization.
cryptology: The study or practice of both cryptography (enciphering and deciphering) and cryptanalysis (breaking or cracking a code system or individual messages).
encryption key: A seed value used in an algorithm to keep sensitive information confidential by changing data into an unreadable form.
services: Processes that run in the background of the operating system; analogous to daemons in Unix.
symmetric key: In cryptology, a single key is used to both encrypt and decrypt data.

SQL Server is a software application that runs on top of a Windows operating system
server. This means the security of a SQL Server installation depends to some extent on
the security processes that exist for the Windows server. If the Windows server is com-
promised, then there is a good chance that some or all of the SQL Server security can be
circumvented.

The previous Lesson looked at an overall security policy for your SQL Server infrastructure.
This Lesson moves to the more granular level of the individual Windows server. You’ll learn
how many of the serverwide security parameters of SQL Server are determined and set
globally for each instance.


■ Understanding Password Rules

THE BOTTOM LINE
With SQL Server authentication, the database server can apply password policies to SQL Server logins. This greatly reduces the effectiveness of a brute force attack on a particular login, because the password doesn't have to remain the same indefinitely.

TAKE NOTE*
Password policies for SQL Server logins can be enforced only when the instance is installed on Windows Server 2003 or 2008.

With SQL Server, you have two choices for authentication, as discussed in Lesson 4. With Windows authentication, the password policies set on the Active Directory (AD) domain govern the individual login; SQL Server is completely removed from managing this part of security.
The options available for SQL logins, shown in Figure 5-1, can be set when a login is created or changed later if a login is edited. In this dialog box a SQL Server login is selected, which requires that a name and password be specified.

Figure 5-1
Password policy options

SQL Server Management Studio lists three options for this login, as described in Table 5-1.
Although one is called Enforce Password Policy, all three are part of the password policy or
rules for server security.

Table 5-1
Password policy options

Option: Enforce Password Policy
Description: Causes the password to be checked against stipulated policies
Default: Checked by default

Option: Enforce Password Expiration
Description: Causes the RDBMS to respect the password expiration policy set on the host server operating system
Default: Checked by default

Option: User Must Change Password at Next Login
Description: Requires a new password to be set the next time the user logs in, before any batches are processed
Default: Checked if a new password is entered or by default

CERTIFICATION READY?
These checks have dependencies: If Enforce Password Policy is not set, SQL Server does not enforce the other two; if Enforce Password Expiration is not set, SQL Server does not enforce the User Must Change Password at Next Login. Or, to put it another way, you cannot select Change Password unless you also select the other two.

The first two options correspond to the same settings on Windows Server 2003 (and newer) for user accounts. By default, a Windows server in a domain respects the domain policies set in AD; but in either case, SQL Server follows the policy of the host Windows server.

TAKE NOTE*
The domain host server has only one policy. This means that although you set these options individually for each instance, the amount of time before password expiration is the same for all instances on a server.

Enforcing the Password Policy

The Enforce Password Policy check box requires that any password meet the following
requirements by default (available on the Windows operating system on Windows Server
2003 and newer):

• The password must be at least eight characters long.


• The password can’t contain all or part of the username. Specifically, no three or more
consecutive alphanumeric characters delimited by white space, comma, period, or hyphen
can match the username.
• The password must contain characters from three of the four following areas:
° Uppercase letters (A through Z)
° Lowercase letters (a through z)
° Base 10 numbers (0–9)
° Nonalphanumeric characters such as the exclamation point (!), at symbol (@), pound
sign (#), dollar sign ($), and so on
If a password doesn’t meet these requirements, then it isn’t accepted in the new login dialog
box or in the Properties dialog box for an existing login. If this option is set for the login,
then these requirements are also enforced when a login changes its password using SQLCMD
or another application.
These requirements are checked by comparing the entered password with the selected domain
or local policies.

Enforcing Password Expiration

The password expiration follows the same setting for the host Windows server logins, as
shown in Figure 5-2 on the Windows 2003 platform.

Figure 5-2
Windows 2003 password
expiration setting

When a password is set or changed, the date is noted in the master.sys.server_principals sys-
tem view. This view is checked on each login to determine whether the password has expired.
If the password has expired, then users are prompted to enter a new password before they can
continue with their session.
The three password check boxes are related. You can select Enforce Password Policy
only; Enforce Password Policy and Enforce Password Expiration; or Enforce Password
Policy, Enforce Password Expiration, and User Must Change Password at Next Login.
Be aware though that these settings involve password settings on the Windows operating
system.

Enforcing a Password Change at the Next Login


This option is selected by default whenever a new password is entered. It forces a pass-
word change by the login as soon as the next session is established but before any batches
can be processed. This prevents the administrator who set the password from knowing
the user’s password indefinitely. It’s a good security policy that prevents unauthorized
access by the administrator. It also ensures that the user chooses their own password,
which increases the likelihood they will remember it.

Following Password Best Practices

The best practice for any password mechanism includes all three of these elements, which
is why they’re selected by default. Your enterprise may have its own requirements, but in
general all three options should be selected for most logins.

TAKE NOTE*
An exception to this policy may exist for automated services that require their own accounts. You should still enforce the policy; but because of issues if these services fail, and their inability to change their own passwords, the expiration and change requirements may not be selected. This doesn't mean you should never change those passwords—but they must be manually changed.

LAB EXERCISE
Perform the exercise in your lab manual. In Exercise 5.1, you'll walk through how to add a login.
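If you prefer to script the login rather than use Management Studio, the following sketch shows the three policy options discussed above; the login name and password are placeholders.

-- Example only
CREATE LOGIN WebAppUser
WITH PASSWORD = 'Str0ng!Passw0rd' MUST_CHANGE,   -- User Must Change Password at Next Login
     CHECK_POLICY = ON,                          -- Enforce Password Policy
     CHECK_EXPIRATION = ON                       -- Enforce Password Expiration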

■ Setting Up the Encryption Policy

THE BOTTOM LINE
In SQL Server 2005, the encryption capabilities have been greatly expanded, and cryptographic functions and features have been introduced throughout the platform. These features allow the encryption of data using a variety of techniques and algorithms, which enables the administrator to meet most security needs.

SQL Server 2000 had only one option for native encryption in your database: the one-way
ENCRYPT() function, which generated a one-way hash of a string. This enabled you to
encrypt a value for comparison with another hashed value later, in the same way Windows
encrypts your password.
This encryption’s flexibility was limited, however, and data stored in this form couldn’t be
decrypted back to the original text. With today’s increased functionality, however, deciding
how to deploy encryption and use it in your enterprise requires careful examination of the
level of security you need versus the effort required to meet those needs.
You need to examine a high-level view of how encryption works before delving into the details and implications of using encryption. Data is encrypted by the server using a key or certificate; when a user needs to read this data, they must supply some sort of password or certificate that the server uses to decrypt the data back into readable values. This process is fairly straightforward, but its administration is complex and should be considered with caution before implementation.

Understanding the Encryption Hierarchy

WARNING
If you change the service account using SQL Server Configuration Manager, then the service master key is decrypted with the old password and re-encrypted with the new password automatically. If you change the service account manually, then you must decrypt and re-encrypt the service master key manually as well, or your encryption hierarchy will be broken and data may be lost.

Before you can make decisions about how and where to deploy encryption, you need to examine how the encryption technologies work in SQL Server. Begin by looking at the hierarchy of encryption and how it's structured in the product.

When SQL Server is first installed, the service account password is used to encrypt a service master key. This is done using a 128-bit Triple DES algorithm and the Windows Data Protection API (DPAPI). The service master key is the root of the encryption hierarchy in SQL Server, and it's used to encrypt the master key for each database.
Although the service master key is automatically created, the individual database master keys require manual preparation. The administrator creates this key for each database when encryption is needed. The database master key, in turn, is used to encrypt asymmetric keys and certificates that are used for encrypting specific data.
and certificates that are used for encrypting specific data.
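
As a quick sketch (the database name and password are placeholders), the database master key is created in each database that will hold encrypted data:

-- Created once per database that needs encryption; protected by the supplied password
-- and, by default, also by the service master key
USE SalesDB;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Use a str0ng, unique passphrase!';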
LAB EXERCISE: Perform the exercise in your lab manual. In Exercise 5.2, you’ll set up a database
master key before you begin looking at the keys available in SQL Server.

Using Symmetric and Asymmetric Keys

Two types of keys are used in cryptology operations. A full discussion of the details of
these key types is beyond the scope of this textbook, but you’ll review an introduction to
enable you to make some decisions regarding your encryption policy. It’s assumed that
you have a basic understanding of how cryptography works and the meanings of some
common cryptographic terms.

Symmetric keys are the simpler of the two key types and pose much less of a load on the
server during encryption and decryption operations. They’re called symmetric because the
same key is used for both encryption of plaintext and the decryption of ciphertext. This poses
some security risks, because only one key is needed for the operations and there is no way to
authenticate the other side of the cryptographic transaction. However, this is still fairly strong
encryption and is often used to encrypt the data in a SQL Server column.
SQL Server lets you use the following algorithms in symmetric key encryption:
• Data Encryption Standard (DES)
• Triple DES (3DES)
• RC2
• RC4
• RC4 (128 bit)
• DESX
• Advanced Encryption Standard (AES; 128-, 192-, or 256-bit version)
When you create a symmetric key, you specify the algorithm to be used as well as a
mechanism that secures the key itself. If the key is protected through the database master key
hierarchy (for example, by a certificate whose private key is secured by the database master key),
SQL Server can automatically decrypt and open the key for use in encryption or decryption
operations. If a key is secured by a password instead, Triple DES is used to protect it.
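
The following sketch shows a common pattern with hypothetical object names: a certificate (protected by the database master key) secures an AES symmetric key, which is then opened and used to encrypt a column:

CREATE CERTIFICATE SalesCert WITH SUBJECT = 'Protects the sales symmetric key';

CREATE SYMMETRIC KEY SalesKey
    WITH ALGORITHM = AES_256
    ENCRYPTION BY CERTIFICATE SalesCert;

-- The key must be opened before it can be used
OPEN SYMMETRIC KEY SalesKey DECRYPTION BY CERTIFICATE SalesCert;

UPDATE dbo.Customers   -- hypothetical table and columns
SET CardNumberEnc = EncryptByKey(Key_GUID('SalesKey'), CardNumber);

CLOSE SYMMETRIC KEY SalesKey;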

TAKE NOTE: SQL Server encryption relies on the Windows operating system to implement individual
types of encryption. Older versions of Windows may not support newer encryption algorithms
that are supported by SQL Server.

WARNING: If you don’t use the database master key to secure the symmetric key, then the key is
potentially secured by a weaker algorithm (Triple DES) than the data (if you choose RC4_128 or AES).

Asymmetric keys differ from symmetric keys because a different key is used for encryption
operations than for decryption operations. Asymmetric keys come in pairs, and each key in
the pair is designated as either public or private. These keys are more complex and require a
larger amount of resources to perform either encryption operation, placing more of a load on
your SQL Server.
The public key can be distributed to users or applications, while the private key is held by the
server. Data encrypted with one key of the pair can be decrypted only with the other key of
that pair. In addition, because the keys are matched as a pair, a successful decryption confirms
which key performed the encryption, providing the cryptographic transaction with a level of
authenticity.
All asymmetric keys are created using the RSA algorithm with 512, 1,024, or 2,048 bits in
the keys. Certificates are a form of asymmetric key and are discussed in the next section.
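
For instance, an RSA key pair can be created directly in the database and used to protect a symmetric key rather than the data itself (names are hypothetical):

CREATE ASYMMETRIC KEY OrdersAsymKey WITH ALGORITHM = RSA_2048;

CREATE SYMMETRIC KEY OrdersKey
    WITH ALGORITHM = AES_128
    ENCRYPTION BY ASYMMETRIC KEY OrdersAsymKey;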

Using Certificates

SQL Server allows certificates to be used in encryption operations in addition to keys.


SQL Server conforms to the X.509 standard for certificates that is widely used around
the world. Certificates can be organized in a hierarchy of trust, which ensures that a
particular certificate can be used in encryption operations and also that certificates can
be traced to ensure that the holder of the certificate is a particular entity. In SQL Server,
you can generate your own self-signed certificates or use a certificate signed by a Certificate
Authority (CA) that is trusted by your enterprise.

Certificates are useful when you need to certify that your data is in fact your data. By
encrypting data using the private key for your certificate, a user can use the public certificate
not only to decrypt the data, but also to trace back the certificate to authenticate the server as
belonging to your enterprise. In essence, the certificate provides a digital signature.
Certificates can also be revoked and expired. This can be useful if you need to limit access
to a certain period of time or if you want to ensure periodic reissue of the certificate and
validation of its identity or authorization.
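
A self-signed certificate can be given an explicit expiration date when it is created, and it should be backed up so it can be restored during disaster recovery; the name, date, and path here are illustrative:

CREATE CERTIFICATE ClientAccessCert
    WITH SUBJECT = 'Client access certificate',
    EXPIRY_DATE = '20121231';

BACKUP CERTIFICATE ClientAccessCert TO FILE = 'D:\KeyBackups\ClientAccessCert.cer';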

Considering Performance Issues

Encryption is a great tool for protecting your company’s data from theft and unauthor-
ized use. However, it isn’t a tool that you can widely deploy to all the data in your data-
bases. Encrypting data involves a number of performance issues that limit how widely
you can deploy this feature.
We have already mentioned the first consideration in the earlier discussion of symmetric and
asymmetric keys: Asymmetric keys provide more security but at the cost of slower performance
for both encryption and decryption operations than symmetric keys. For this reason,
asymmetric keys generally aren’t used to encrypt data, but rather are used to encrypt symmetric
keys. The symmetric keys are then used to encrypt the data.
This hybrid method of using cryptographic techniques can be confusing, but it’s widely
deployed. The speed difference can be substantial, so symmetric keys are commonly used to
encrypt all types of messages and other data; then, the symmetric key is secured by a longer,
stronger, asymmetric key. This technique reduces the performance penalty for the stronger
encryption to a minimum.

TAKE NOTE: If you use certificates, SQL Server makes this process easy by including the
DecryptByKeyAutoCert() function. This function automatically decrypts data encrypted by a
symmetric key that is itself encrypted using a certificate.

WARNING: Don’t forget the effect of encryption on write operations. Even with symmetric keys,
the overhead of adding encryption can substantially reduce a busy On-Line Transaction
Processing (OLTP) system’s ability to keep up with inserts and updates.

Another consideration for encrypted data is that once the data in a column is encrypted, it
isn’t available to the query processor for use in indexes, joins, sorting operations, grouping
operations, and filtering. This can seriously impact your database’s ability to perform efficient
queries. A database that was completely encrypted would be equivalent to a database with no
indexes; it would incur a substantial performance penalty in reading data due to the decryption
operations.
If the overhead of the encryption and decryption operations isn’t enough of a penalty,
encrypted data can also affect performance in another way: Encrypted data grows in size (often
substantially), which causes fewer rows to be returned with each page in addition to requiring
more disk space for storage and backups. The growth in size is given by the following formula:
Size = (FLOOR((8 + D) / BLOCK) + 1) * (BLOCK + BLOCK + 16)
D is the original data size, and BLOCK is the block size of the cipher in bytes. RC2, DES, Triple
DES, and DESX use 8-byte blocks, and the rest use 16-byte blocks.

LAB EXERCISE: Perform the exercise in your lab manual. In Exercise 5.3, you’ll see how
dramatically a piece of data can grow with encryption applied.

A 20-character data element would require 35 bytes—57% more space—if encrypted. For
larger columns, that difference is less, but for small columns, this may require substantial
schema changes.
Because the addition of encryption can affect your server in a number of ways, you should
consider each of these implications when deciding how widely you’ll deploy this feature in
your applications. In addition to the performance implications, don’t forget that the addi-
tional overhead for data size can also affect your schema, forcing changes not only in tables,
but also in related code in stored procedures, functions, assemblies, and other parts of your
application.
The next section will look at a few ways you need to use this information in deciding what
amount of encryption to use and how to deploy it.

Developing an Encryption Policy

Now that you know the details of encryption, you must determine what type of policy
makes sense for your enterprise. The security of financial or medical information may
necessitate encryption because of regulatory requirements. The entries in a Web site
guestbook for your local photography club may not need any type of encryption—or
much security of any kind.

X REF: Overall security policy is discussed in Lesson 4.

The policies you develop for encryption will partly be driven by your enterprise’s overall
security policy. The need for encryption to be deployed will be dictated by that policy, whereas
many of the technical details must be determined separately.
When you determine that data from a particular table requires encryption, be sure you limit
the encryption to only those columns that really need it. The time required to encrypt and
decrypt each column of data affects the performance of your server by slowing it down, and
encryption also requires additional disk space. Avoid including primary keys and foreign keys
(columns used for sorting or grouping operations) as encrypted columns, because you pay
a severe performance penalty for any queries that need to perform these operations on
encrypted columns.

TAKE NOTE: If you need to encrypt a column that is functioning as a primary or foreign key, you
should derive a surrogate key and use that instead as the primary or foreign key.

Managing Keys

You shouldn’t choose to implement encryption without thinking about how doing so will
affect your enterprise. Often, DBAs don’t have access to the keys, which limits their ability
to help with queries and check data integrity. The key management scheme is often a complex
and difficult undertaking because the security of these keys is critical to ensuring that
unauthorized individuals can’t access your data. Deciding which keys are available to which
individuals and how to store these keys is critical to a well-designed encryption scheme.

If your key management scheme is compromised, then you may need to change your keys.
If you’re changing the asymmetric keys used to encrypt symmetric keys, this is a simple
process—especially with certificates, which can be revoked. However, if you need to change
the symmetric keys that encrypt your data, you must decrypt the data and then re-encrypt
it with a new key. This endeavor can be resource intensive and time consuming if you have
a large amount of data. This is another reason to limit the amount of encryption that you
choose to deploy in your database.
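
A minimal sketch of such a re-keying operation, reusing the hypothetical objects from the earlier examples and assuming a replacement key named NewSalesKey has been created, shows why it is expensive: every encrypted value must be rewritten.

OPEN SYMMETRIC KEY SalesKey    DECRYPTION BY CERTIFICATE SalesCert;  -- old key
OPEN SYMMETRIC KEY NewSalesKey DECRYPTION BY CERTIFICATE SalesCert;  -- replacement key

UPDATE dbo.Customers
SET CardNumberEnc = EncryptByKey(Key_GUID('NewSalesKey'),
                    CONVERT(varchar(50), DecryptByKey(CardNumberEnc)));

CLOSE ALL SYMMETRIC KEYS;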

Choosing Keys

It’s recommended that you use a hybrid key scheme, with asymmetric encryption used
to secure the symmetric keys that encrypt the data. The symmetric keys should use the
strongest algorithm you can afford to deploy. This is determined by testing and weigh-
ing the performance implications of each algorithm along with any specific requirements
you have from regulatory bodies. For example, your company may be required by law to
implement DES encryption even though RC4_128 might be more secure.

The asymmetric encryption that you choose should follow the same guidelines. Longer keys
containing more bits are preferable to shorter ones, but there can be a severe performance
penalty. Test the keys under load to ensure that your server can handle the additional process-
ing requirements. Certificates can be used instead of plain asymmetric keys, but be sure you
have a mechanism for ensuring their update, revocation, and replacement as necessary. The
details of developing such a policy are beyond the scope of this textbook, but more details are
available in the Windows Server Resource kits and from certificate vendors.
The overall encryption of the server also requires that the service master key and the database
master keys be protected. SQL Server handles the service master key internally as long as
you use the Configuration Manager to change service accounts. The database master keys are
needed whenever you restore a database to a new server. This often occurs in disaster-recovery
scenarios and development areas, so your policy should address how to protect the keys in
these situations as well as make them available when needed.
Finally, SQL Server lets you implement user-level encryption using a passphrase instead of a
key. This mechanism can be tempting, but be careful—forgetting the passphrase means that
data is lost. Avoid this encryption mechanism in your policy if possible.
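
For completeness, passphrase encryption looks like the following; if the passphrase supplied as the first argument is ever lost, the data cannot be recovered:

DECLARE @ciphertext varbinary(8000);

SET @ciphertext = EncryptByPassPhrase('A phrase only the application knows',
                                      N'4111-1111-1111-1111');

SELECT CONVERT(nvarchar(50),
       DecryptByPassPhrase('A phrase only the application knows', @ciphertext));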

Extensible Key Management

SQL Server 2008 includes a new feature known as Extensible Key Management (EKM).
This is a method of providing encryption through software and, usually, hardware such
as smart cards or USB devices supplied by third-party vendors. With EKM, encryp-
tion can be established using physical hardware known as a Hardware Security Module
(HSM). This can be a more secure solution because the encryption keys do not reside
with encrypted data in the database. Instead, the keys are stored on the hardware device.
Extensible Key Management with an external hardware device can provide the following
benefits depending on your security design:

• Additional authorization check providing further separation of duties


• Faster performance using hardware-based encryption/decryption
• External encryption key generation
• External encryption key storage via physical separation of data and keys
• Encryption key retrieval
• External encryption key retention
• Easier encryption key recovery
• Manageable and securable encryption key distribution
• Secure encryption key disposal
X REF: Transparent Data Encryption is explained further in Lesson 6.

EKM is implemented by registering the service in SQL Server 2008. This involves creating a
cryptographic provider object at the server instance level and specifying a DLL file containing
the software to implement the encryption. The DLL file should be supplied by the EKM
product vendor. The cryptographic provider object can then be used in creating both
symmetric and asymmetric keys. The keys are then used like other keys to encrypt or decrypt
objects in SQL Server. EKM can also be used with Transparent Data Encryption (TDE) to
encrypt entire databases, although only asymmetric keys can be used for TDE.
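
Registration looks roughly like the following sketch; the provider name, key name, and DLL path are placeholders that would come from the EKM vendor's documentation:

-- Requires the 'EKM provider enabled' option to be turned on with sp_configure
CREATE CRYPTOGRAPHIC PROVIDER HsmProvider
    FROM FILE = 'C:\Program Files\HsmVendor\HsmEkmProvider.dll';

CREATE SYMMETRIC KEY HsmProtectedKey
    FROM PROVIDER HsmProvider
    WITH PROVIDER_KEY_NAME = 'HsmKey1',
         CREATION_DISPOSITION = CREATE_NEW;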

TAKE NOTE: Extensible Key Management is only available in the Enterprise, Developer, and
Evaluation editions of SQL Server 2008.

■ Introducing SQL Server Service Accounts

THE BOTTOM LINE: SQL Server services process database actions. Each service must be started by an
authorized user. Each service must have an owner with appropriate rights and permissions to
perform the tasks assigned. Use SQL Server Configuration Manager to make any changes to
SQL Server services.

SQL Server and its associated parts are software programs that are similar to any other programs
on your computer. A user must log in and start each of these programs in order for them to run.
For a server program such as SQL Server, you don’t want to have to log on and start it manually
each time the server is started—especially if it’s restarted in the middle of the night!
In order for SQL Server to start automatically, a user account must log on to the host Windows
server and then start running the application. In Windows, this overall scheme represents a service,
and the service account is the account that logs on to the Windows server and starts the program.

Understanding the SQL Server Services

SQL Server offers a number of services that are part of the database server, each of
which can have its own service account. In all, 10 services can be installed with SQL
Server, as listed in Table 5-2. They aren’t all installed by default; you must select them
for installation.

TAKE NOTE: The instance name includes the name of the local computer.

Each of these services can have a service account that logs on to start the service and whose
context is used when performing actions for the service. For the service account to have
enough rights for the service to run under its context, each is added to a group created by
the installation program. The groups for each service are shown in Table 5-2 as well. The
“InstanceName” part of the group name is replaced by the name of that particular instance.
When the SQL Configuration Manager changes a service account, the new account is placed
in the appropriate group for the rights needed to run that service. The former user is also
removed from the group. This ensures that the appropriate rights are granted for each account
as required in accordance with the principle of granting the least rights needed.

Table 5-2
SQL Server services

SERVICE (INSTANCE AWARE?) AND USER GROUPS

SQL Server (Yes)
    Default, 2005: SQLServer2005MSSQLUser$ComputerName$MSSQLSERVER
    Default, 2008: SQLServerMSSQLUser$ComputerName$MSSQLSERVER
    Named, 2005: SQLServer2005MSSQLUser$ComputerName$InstanceName
    Named, 2008: SQLServerMSSQLUser$ComputerName$InstanceName

SQL Server Agent (Yes)
    Default, 2005: SQLServer2005SQLAgentUser$ComputerName$MSSQLSERVER
    Default, 2008: SQLServerSQLAgentUser$ComputerName$MSSQLSERVER
    Named, 2005: SQLServer2005SQLAgentUser$ComputerName$InstanceName
    Named, 2008: SQLServerSQLAgentUser$ComputerName$InstanceName

Analysis Services (Yes)
    Default, 2005: SQLServer2005MSOLAPUser$ComputerName$MSSQLSERVER
    Default, 2008: SQLServerMSOLAPUser$ComputerName$MSSQLSERVER
    Named, 2005: SQLServer2005MSOLAPUser$ComputerName$InstanceName
    Named, 2008: SQLServerMSOLAPUser$ComputerName$InstanceName

Reporting Services (Yes)
    Default, 2005: SQLServer2005ReportServerUser$ComputerName$MSSQLSERVER and
                   SQLServer2005ReportingServicesWebServiceUser$ComputerName$MSSQLSERVER
    Default, 2008: SQLServerReportServerUser$ComputerName$MSRS10.MSSQLSERVER
    Named, 2005: SQLServer2005ReportServerUser$ComputerName$InstanceName and
                 SQLServer2005ReportingServicesWebServiceUser$ComputerName$InstanceName
    Named, 2008: SQLServerReportServerUser$ComputerName$MSRS10.InstanceName

Notification Services (No)
    Default or Named, 2005: SQLServer2005NotificationServicesUser$ComputerName

Integration Services (No)
    Default or Named, 2005: SQLServer2005DTSUser$ComputerName
    Default or Named, 2008: SQLServerDTSUser$ComputerName

FullText Search (Yes)
    Default, 2005: SQLServer2005MSFTEUser$ComputerName$MSSQLSERVER
    Default, 2008: SQLServerFDHostUser$ComputerName$MSSQL10.MSSQLSERVER
    Named, 2005: SQLServer2005MSFTEUser$ComputerName$InstanceName
    Named, 2008: SQLServerFDHostUser$ComputerName$MSSQL10.InstanceName

SQL Server Browser (No)
    Default or Named, 2005: SQLServer2005SQLBrowserUser$ComputerName
    Default or Named, 2008: SQLServerSQLBrowserUser$ComputerName

SQL Server Active Directory Helper (No)
    Default or Named, 2005: SQLServer2005MSSQLServerADHelperUser$ComputerName
    Default or Named, 2008: SQLServerMSSQLServerADHelperUser$ComputerName

SQL Writer (No)
    N/A

TAKE NOTE: If you choose to manually change the service account using the Services applet in the
Control Panel or the Manage Computer MMC snap-in, make sure the new account is placed in the
appropriate group. It isn’t recommended that you assign individual rights to each user account.

Each of these services is classified as either instance-aware or instance-unaware. If a service is


instance-aware, then separate copies of its executables and supporting files are installed with
each new instance, and it’s able to run independently of other instances. Services that are
instance-unaware are installed only once on each Windows host and serve all instances on
that host. (Table 5-2 gives the classification of each service.)

LAB EXERCISE: Perform the exercise in your lab manual. In Exercise 5.4 Part A, you’ll explore
service account groups.

Choosing a Service Account


Each particular service must have an account under which it can run, but you have a
number of choices when choosing an account. When you install SQL Server, you’re
given the option to allow all services to run under the same account. Alternatively, you
can specify each service to use separate accounts. This section examines the default and
optional accounts before looking at why you should choose a particular account.

The default choice is to specify a Domain User account under which the SQL Server service
will run. This approach is recommended to ensure that the least amount of privileges is granted
and that you control which permissions this account has. The account you choose should be an
account created specifically for this service and not an account that an actual person will use.
It’s also not recommended that you share accounts for different services or servers.
Most Windows operating systems include three built-in accounts under which you can run
the SQL Server services: the Local System, Network Service, and Local Service accounts.
They differ in the following ways:
• Local System. This is a highly privileged account that can access most resources on the
local computer. It isn’t recommended that you use this account.
• Network Service. This account has the same level of access as the Users group on the
local computer. When it accesses network resources, it does so under the context of the
local computer account.
• Local Service. This is a built-in account that has the privileges of the local Users group
on the computer. This is the best choice for SQL Server services if you must use a
built-in account. Network resources are accessed with no credentials and a null session.
All other services have their own default and optional accounts, as shown in Table 5-3.

Table 5-3
Service account defaults for SQL Server services

SERVICE                              DEFAULT ACCOUNT                      OPTIONAL ACCOUNTS

SQL Server                           Domain User                          Local System, Network Service
SQL Server Agent                     Domain User                          Local System, Network Service
Analysis Server                      Domain User                          Local System, Network Service, Local Service
Report Server                        Domain User                          Local System, Network Service, Local Service
Notification Services                N/A                                  N/A
Integration Services                 Windows Server 2003 and 2008:        Domain User, Local System, Local Service
                                     Network Service;
                                     Windows 2000 Server: Local System
FullText Search                      Same as SQL Server                   Domain User, Local System, Network Service, Local Service
SQL Server Browser                   Domain User                          Domain User, Local System, Network Service, Local Service
SQL Server Active Directory Helper   Network Service                      Local System
SQL Writer                           Local System                         N/A

When deciding on a service account, you must consider how the service will be used and
under what security parameters your server will run. The recommendations are general
guidelines and should be followed unless your environment dictates particular reasons to
deviate from them.
The following sections address the types of account along with the types of situations when
you should use them and reasons for using each.

TAKE NOTE: SQL Server doesn’t configure Notification Services. An administrator must do this
using an XML file.

Choosing a Domain User

WARNING: The Express editions of SQL Server use different accounts on some platforms. Consult
Books Online for the variances for SQL Server 2005 Express edition.

The general guideline for most services is to use a Domain User account. If your domain
administrators follow the guidelines for granting privileges to domain accounts, the
EVERYONE and other global groups won’t have any privileges. The Domain User account
that is used will be granted the appropriate privileges by the SQL Server setup program on
the local machine, both to run as a service and to access files on the local machine.

Additional privileges may be required in the following types of situations and can be granted
to a Domain User account as needed:
• The need to access a drive (local or network) to read and/or write files.
• The need for heterogeneous queries accessing another data source.
• The need to work with replication.
• The need to use mail services. For Microsoft Exchange, this is necessary, but other mail
systems (like those used by Database Mail or SQL Agent) may require a Domain User
account as well.
If you need to grant additional rights to this account, it’s recommended that you create new
groups and grant the rights to those groups with this Domain User included in those groups.
The groups can be local groups for this Windows host or domain-level groups that provide
access to remote machines.

TAKE NOTE: Even though this section specifies Domain User, it could be a local user account on
the local Windows host for a standalone server.

Choosing a Local Service

The Local Service built-in account is used for running services on the local machine
with a limited set of privileges. This account has the same rights and privileges as any
authenticated user, which it receives as part of the Users group. As with any well-secured
machine, you should grant few, if any, additional rights to this group.
TAKE NOTE: A full list of permissions granted is in Books Online under “Setting Up Windows
Service Accounts.”

This account is a good choice for services that don’t access any resources outside of their own
services. Because SQL Server setup adds this account to the appropriate group (as shown
earlier in Table 5-2), it receives access to certain folders in order to run properly, but it has no
rights outside of those minimal permissions.
If you have a service that requires access to additional folders beyond those permissions
granted by SQL Server, it’s recommended that you choose a Domain User account.

Choosing a Network Service


CERTIFICATION READY? Ensure that you understand the effects of using an Active Directory user
versus a local user account.

The Network Service built-in account has the same rights as members of the Users group
on the local Windows machine, but it also has the ability to access network resources
using the computer’s domain account. This account provides limited credentials, but it
allows some level of network access.

WARNING: Microsoft highly recommends that you don’t use this account for SQL Server or SQL
Server Agent.

If you’re performing anonymous file transfers, this account works for Integration Services; but
if you must copy files to or from secured folders on your domain, a Domain User account is
recommended. This account works well for the Active Directory Helper, and it’s recommended
that you leave this account set for that service.

Choosing a Local System

The Local System account is the most powerful account on the local machine. It’s
equivalent to an Administrator, but it has additional system-level access that even the
Administrator account doesn’t have. Because this is considered a privileged account, it
isn’t recommended that SQL Server or any of its services run under this account.

Case Study: Planning for Services

Prior to SQL Server 2005, most computers had just two services running: SQL Server
and SQL Server Agent. For many servers that didn’t require access to network resources
(such as mail), most users ran their servers under either the Local System or the
Administrator account. This usually occurred because nobody planned the SQL Server
installation prior to installing the software, and, when faced with the need to choose a
service account, they made one of the two previous choices.
Both are poor choices because they grant rights to the server that aren’t needed, and any
security breach could result in more problems than just a loss of data.
With SQL Server 2005 and its 10 potential services, it’s strongly recommended that you
plan your installation and create user accounts for each of your services prior to installing
the SQL Server 2005 software.

The general guideline is to create a user account for all instance-aware services and
Integration Services. In the next section, you’ll examine how you change to the user account
you’ve created.

Changing Service Accounts

As mentioned, if you need to change the service account, you should use the SQL Server
Configuration Manager to ensure that the proper rights and settings are granted to the
new account. This program is installed with SQL Server and is located in the SQL Server
2005 group under Programs, in the Configuration Tools section.

When you start this tool, it has three main sections, as shown in Figure 5-3: SQL Server 2005
Services, SQL Server 2005 Network Configuration, and SQL Native Client Configuration.
This section will examine only the first section.

Figure 5-3
Using the SQL Server
Configuration Manager

As shown in Figure 5-3, the SQL Server Services section shows the services that are running
on one particular server. In this case, five services are running on this server, with one named
instance:
• SQL Server Integration Services. As mentioned, there is only one copy of this service
for each Windows host. Additional instances share this one service.
• SQL Server FullText Search. Each instance that has this enabled will have one service
listed here.
• SQL Server. This is the main database engine, shown here as an instance named SS2K5.
• SQL Server Agent. This is another instance-aware service. In this case, the service is
running for instance SS2K5.
• SQL Server Browser. This is one of the instance-unaware services, meaning this is the
only copy of this service regardless of how many instances are installed.
To the right of the service name, a number of pieces of information appear, as shown in
Figure 5-4. These include the status, start mode, service account, process ID, and type. The
status and start mode will be discussed later in this Lesson in the “Working with Services”
section. The main item you’re concerned with here is the service account. In this example,
three different service accounts are in use.

Figure 5-4
Viewing SQL Server services
in Configuration Manager

The default Network Service account is running Integration Services, and SQL Server Agent
for this instance has its own account. SQL Server, FullText Search, and SQL Server Browser
all share the same service account, which isn’t recommended.
You can change the service accounts using the Services Control Manager in Control Panel
or Manage Computer, but it isn’t recommended that you do this for SQL Server. The
Configuration Manager is specifically designed to ensure that the proper permissions are set
up for any service accounts. This includes the file-level permissions for accessing files and
folders as well as the necessary service-level rights. Table 5-4 shows the service rights needed
for each service.

Table 5-4
Service account service rights needed

SERVICE                              SERVICE RIGHTS

SQL Server                           Log on as a service, Act as part of the operating system (Windows
                                     2000), Log on as a batch job, Replace a process-level token,
                                     Bypass traverse checking, Adjust memory quotas for a process,
                                     Permission to start SQL Server Active Directory Helper, Permission
                                     to start SQL Writer
SQL Server Agent                     Log on as a service, Act as part of the operating system
                                     (Windows 2003), Log on as a batch job, Replace a process-level
                                     token, Bypass traverse checking, Adjust memory quotas for a
                                     process
Analysis Server                      Log on as a service
Report Server                        Log on as a service
Integration Services                 Log on as a service, Permission to write to application event log,
                                     Bypass traverse checking, Create global objects, Impersonate a
                                     client after authentication
Notification Services                N/A
FullText Search                      Log on as a service
SQL Server Browser                   Log on as a service
SQL Server Active Directory Helper   None
SQL Writer                           None

WARNING: Choose a strong password for this user account. It can always be changed and the
service restarted with a new password. For services, you don’t want a weak password that
someone can guess. This is a user account, so it can be used to log on to your network.

If you manually change the service account, you can easily forget to grant a permission, which
may result in the service not running or not running properly. You may also inadvertently
grant too many rights to the service, which can result in a poorly secured environment.
Because the Configuration Manager is as easy to use as any other tool, you should only use
this tool to change service accounts.
Before you use this tool, however, be sure you’ve already set up the appropriate user accounts
on your local computer or on the domain. This tool doesn’t allow you to create a new
account, only select an existing account.

TAKE NOTE: Be sure you don’t select the User Must Change Password at Next Login check box on
the new account. The service has no way of doing this and won’t start. The recommendation
is that you also select the Password Never Expires check box to ensure that this service
doesn’t stop unexpectedly. This doesn’t mean the password should never be changed, only
that it should be manually changed, not forced.

LAB EXERCISE: Perform the exercise in your lab manual. In Exercise 5.4 Part B, you’ll change the
service account for the FullText Search service.

■ Setting Up Antivirus Software

THE BOTTOM LINE: Many antivirus software vendors supply two licenses per user so the home office
computer can also be protected against contaminating the enterprise server. In both cases, though,
antivirus software is “reactionary”—it can’t be updated until after an attack has been detected
somewhere in the world. If you happen to have the first server attacked, even your diligent
efforts won’t help—your systems can still be infected. Plan a control strategy should the
unwanted occur.

Many SQL Server instances run unattended and provide a network service to clients, with the
Windows operating system providing a host for SQL Server. In these cases, antivirus software
shouldn’t be necessary. However, in some cases Windows provides other software services such
as file serving, e-mail, or some other process, and antivirus software is warranted.
There is no reason SQL Server and an antivirus software application can’t exist together, but
you must appropriately configure the antivirus software. For most applications, the default
configuration will cause the SQL Server to perform poorly.

TAKE NOTE: Some companies require antivirus software on all machines. It isn’t worth arguing
about this necessity on a dedicated SQL Server. Instead, work with the network administrators to
properly configure the software.

An antivirus program works by hooking into the disk access drivers and validating every
attempt to write to a file. In this way, it prevents a malicious program from altering a file and
writing a virus into the file that will execute or propagate when the file is accessed.
SQL Server requires file accesses whenever it performs an insert, update, delete, or other opera-
tion that changes data, which means the antivirus program by default scans the data and log
files for each operation. Because data and log files are often megabytes, gigabytes, or even larger
in size, this can cause the server to “halt” while the antivirus software completes its scan.
For this reason, it’s highly recommended that you configure your antivirus software to exclude
the following files:
• Database data files. .mdf and .ndf files.
• Database log files. .ldf files.
• Backup files. Usually .trn, .dif, and .bak files, but whatever backup extensions you use
should be excluded.
In addition, you may want to exclude files in the following scenarios:
• Quorum drives. In a clustered situation, you should exclude the quorum drives completely.
• Replication. You may want to tightly secure access to any folders where temporary
replication files are written, and exclude them from scans.
• SQL Server log files. Only SQL Server should write to these files. They can grow quite
large, so exclusion prevents any slowdowns on your server.
• Log shipping files. You should especially exclude these files on the standby server.

TAKE NOTE: Some environments should exclude specific files only and not whole directories.
Although this is possible, it may cause issues with backup files, which often have a unique name
for each backup. Try working with your network group for an exception in this case.
It’s possible that other files will cause issues with your SQL Server installation. After you’ve
installed both SQL Server and your antivirus software, examine the logs of the antivirus
software and be sure you aren’t scanning files that conflict with the operation of your
database server.

■ Working with Services

THE BOTTOM LINE: Minimize the number of enabled services to minimize unneeded overhead.

Earlier in this Lesson, you learned how to manage service accounts for SQL Server to control
different parts of the instance. However, you can enable many more services on the instance sep-
arate from those examined earlier. Each of these services has a security implication and should
be disabled unless needed. As part of adherence to the Trustworthy Computing Initiative, SQL
Server is installed in a “secure by default” mode, and most of these services are disabled. This
section will explain how to enable these services as well as the security impact of each.
When you install SQL Server and whichever options are required for your installation, only
certain services are set to automatically start up when the computer boots. Table 5-5 shows
the components of SQL Server and the startup state of each after setup completes.

Table 5-5
Services default mode

SERVICE                                  DEFAULT MODE AFTER SETUP

SQL Server                               Started
SQL Server Agent                         Stopped
Analysis Services                        Started
Integration Services                     Started
Report Server                            Started
Notification Services                    N/A
FullText Search                          Stopped
SQL Server Browser                       Stopped
SQL Server Active Directory Helper       Stopped
SQL Writer                               Stopped

Not all of these components are installed by default, and they may not be present on your
systems. If you choose to install them, however, the mode listed in Table 5-5 is the mode they
will be in unless you selected the autostart options in the setup program.
Each service’s mode should remain stopped unless you’re using the service on this server. For
example, the FullText Search service may be installed, and you may plan on using it, but until
you create a full-text index and require its update, don’t start this service. If you’ve installed
services but aren’t using them, set them to disabled until such time as you have an application
that requires them.
Although you can use the Service Control Manager in the Control Panel for the Windows
host to change modes, it’s recommended that you use the SQL Server Configuration Manager
for all service changes relating to SQL Server. Regardless of the mode a service is set to, the
administrator can change this mode to one of the following three modes:
• Automatic. In this mode, the service account attempts to log on and start the service
when the Windows server boots.
• Manual. In this mode, the service account doesn’t log on and start the service when
Windows starts; but a start message can be issued to the service, and it will attempt to start.
• Disabled. In this mode, the service can’t be started with a start message. The administrator
of the Windows host needs to change the service to Automatic or Manual.

WARNING: Don’t install all components on servers by default. Make sure you need Integration
Services, Report Server, or any other service before installing it. It’s a tenet of the Trustworthy
Computing Initiative that installations should be secure by default; installing all components as
policy violates this.

Most services that are being actively used, such as the database server, analysis server, and so
on, should be set to Automatic so they’re available any time the server is running. Services
that you use rarely can be set to Manual or Disabled to prevent them from starting when the
server boots. If you no longer need a particular service on one of your servers, you should set
it to Manual or Disabled and stop it.

LAB EXERCISE: Perform the exercise in your lab manual. In Exercise 5.5, you’ll disable a service.

■ Configuring Server Firewalls

THE BOTTOM LINE: Most firewalls in corporate environments are dedicated hardware devices that
function as highly configurable routers, with rules specifying the security policy for traffic passing
through them. They must be configured according to your business needs.

However, as the number and variety of threats have proliferated, many operating systems have
started to integrate and run software firewalls alongside other services. Most of the platforms
on which SQL Server runs include a software firewall that needs to be configured to allow
SQL Server to access and be accessed from clients and other servers.
If you have a firewall enabled, then you must make sure the ports used for SQL Server are
open for communication with those clients that need it. For the default instance, this usually
means that ports 1433 and 1434 are open, but named instances choose a port on startup by
default. In order to secure these instances with your firewall, you need to use the SQL Server
Configuration Manager to assign a specific port to these instances for communication with
their clients.
Table 5-6 lists the various services and the ports that they require for communication. For
named instances of these services, you can specify specific ports to be used.

Table 5-6
TCP port numbers used by services

SERVICE                                  PORT NUMBER

SQL Server default instance              1433, 1434 (UDP)
SQL Server named instance                Chosen at startup
Integration Services                     DCOM ports (consult Windows OS documentation), 135
Analysis Services default instance       2383
Analysis Services named instance         Chosen at startup
Analysis Services Browser                2382
Named pipes connections                  445
Report Server (through IIS)              80
Endpoints                                Specified endpoint TCP port used in endpoint setup

TAKE NOTE: Be sure you also close ports if services are no longer being used.

Like the services installed on your server, the ports used by the services shouldn’t be opened
on a firewall unless they’re being used. Keep open the minimum number of ports required
for the server to meet your needs. For example, if Integration Services is accessed only on the
local server and not across the network, then don’t open these ports.

Physically Securing Your Servers

Every server that you have running in your enterprise should be physically secured from
unauthorized access. There are many ways of enforcing security and protecting your
server through software, but most of these can be circumvented if the server can be
physically accessed or attacked. The local file system security can be bypassed if someone
can boot a server from another source, and this can lead to security-related files or data
files being copied and the data compromised.
SQL Servers are no exception. But because they can be easily set up on many platforms and
are used in testing new solutions, sometimes the servers’ physical security isn’t maintained as
they’re moved to an employee’s office or cubicle.
If you’re storing enterprise data on a SQL Server, the server should be stored in a physically
secure manner. This means behind a locked door with a limited number of people able to
access the machine. Access controls that log and control which individuals can access the
room are preferred; they’re even mandated in some environments.
SQL Servers often have large disk subsystems, so be sure the disks are secured to prevent their
physical theft. Due to the large data sets, tape backup systems are often used. Make sure
physical control over these tapes is maintained and they aren’t allowed to sit on a desk or other
unsecured area where unauthorized people have access to them.

CERTIFICATION READY? When examining security, be sure you grasp the breadth and depth of
this topic. Do you understand how authentications, physical barriers, firewalls, disaster
recovery plans, business recovery plans, risk analyses, policies, enforcement, incident
response plans, and forensic investigations all interact?

SKILL SUMMARY

This Lesson has investigated how to design Windows server-level security. The server-level
policies provide the highest level of security for SQL Server. Your password and encryption
policies should provide the level of security you need, balanced with the performance required
on your server. The services, service account, and firewall policies should be set to the
absolute minimums required for each server. Enabling all services or opening all possible ports
increases the surface area available for attack on your server unnecessarily. Configure and
make available those items only when you need them, and disable them when they’re no
longer needed.
Security is an ongoing process and should evolve as your server changes. Developing policies
and procedures that make the least amount of resources available from a security perspective
will help to ensure that you’re protected and that your server functions in an optimum
manner at all times.
For the certification examination:
• Understand the SQL Server password policy. You should know the options for password
policies in SQL Server and the impact of each one.
• Understand the different SQL Server encryption options. You should know how encryption
is configured at the server level in SQL Server.
• Know how to properly configure a service account. SQL Server has different sections that
require service accounts, and you need to know how they should be configured.
• Understand how antivirus software interacts with SQL Server. You should be able to
configure antivirus software to coexist with a SQL Server instance.
• Know how to enable and disable services. SQL Server consists of multiple services, and
you should understand how and why to enable or disable them.
• Understand how server-level firewalls interact with SQL Server. A server-level firewall is a
software service that runs alongside a SQL Server instance. Understand how these interact
and how they should be configured.

■ Knowledge Assessment

Case Study
The Ever-Growing Wealth Company
The Ever-Growing Wealth Company manages retirement funds for many people
and is concerned about the security of its data. To ensure that its database servers are
adequately protected, the company decides to review and revamp its security policies.

Planned Changes
The company’s management thinks the security policies for its applications must be
strengthened and that encryption needs to be deployed. However, these changes can’t
cause problems in the event that disaster-recovery plans must be implemented.

Existing Data Environment


The company currently has two SQL Servers that separately support two different
applications. A third SQL Server receives copies of all backups immediately after they’re
completed and is available in the event of a disaster. One of these, called SQLWeb,
supports the company Web site on the Internet. The other, SQLTrading, supports the
portfolio management and trading application.
SSIS is expected to be used to move some data between these two servers.

Existing Infrastructure
All these servers are stored in the company’s data center, which is a climate-controlled,
converted office in the company’s current location. The company would like to move all
its servers to a co-location facility with a dedicated network connection back to the office.
Currently, a tegwc.com domain contains two main organizational units (OU), one for
the internal employees and one for any client accounts.
The two SQL Servers are named instances that use dynamic ports. A firewall protects
the entire network, but all servers exist in a flat Ethernet topology as shown in the Case
Exhibit of this case study.

Business Requirements
The clients of Ever-Growing Wealth expect to be able to access their data at any time
of the day or night. The existing disaster-recovery plan allows system administrators a
five-minute response time to failover the SQL Servers, and this is deemed acceptable.
However, it can’t take more time than this to get the application running.
The company expects that regulatory requirements will be enacted soon for all financial
companies, so the strongest encryption possible is preferred, balancing the performance
of the servers. Newer hardware is available to make up for any issues from the imple-
mentation of encryption.

Technical Requirements
For the new servers, the company purchased the next generation of hardware to allow
for the additional load of encrypting data. However, complete encryption of all data
using asymmetric keys will likely overload these servers; therefore, the security policy
must work within these hardware constraints.
Each instance has a SQL Server Agent service that performs various functions,
including copying backup files to another server and running business maintenance
jobs that access the mail server.

The existing named instance configuration can’t be changed because it’s mandated by
the disaster-recovery plan.
Network firewalls are set up to protect the internal network, but it has been decided to
also use the built-in Windows firewalls.
The existing applications use SQL Server logins from clients to access data. This
structure can’t be changed, but better security can be built into the application to take
advantage of SQL Server’s capabilities.
Case Exhibit: A network diagram showing the Internet connecting through a single firewall to a
flat Ethernet segment that also connects the internal PCs, the internal file server, the Web server,
and the WebSQL and TradingSQL servers.

Multiple Choice
Circle the letter or letters that correspond to the best answer or answers.
Use the information in the previous case study to answer the following questions.
1. The default Windows 2003 password policy has not been changed. Which of the
following passwords would be acceptable for a SQL Server login named BillyBob?
a. Kendall01 b. KityK@t
c. BillyBob2$ d. Barnyard
2. The company will continue to use SQL Server logins for its applications, but it will
reissue passwords to all its clients. Which of the following password options should you
check to meet the business requirements? (Choose as many as needed.)
a. Enforce Password Policy
b. Enforce Password Expiration
c. User Must Change Password at Next Login
d. Require Complex Password
3. You are planning to change the service accounts for the SQL Server 2005 database
instances. To ensure that you meet the business requirements for disaster recovery, which
account should you choose for each SQL Server Agent named instance?
a. Local System b. Local Service
c. A Domain User d. Network Service

4. After installing your SQL Server 2005 server, it appears to be running very slowly.
Investigation reveals that the mandatory antivirus software is scanning your database
files. What should you do?
a. Remove the antivirus software.
b. Disable the antivirus software.
c. Stop the software from scanning the drive where the SQL Server executables are
located.
d. Stop the software from scanning the data and log files.
5. Because the clients for the Ever-Growing Wealth Company renew their contracts for
services annually, you want those clients who do not renew their contracts to have their
access revoked automatically. What type of encryption supports this?
a. Use a DES key to encrypt data, and require it to change every year.
b. Use certificates issued to each client that the application will use to authenticate
users.
c. Use an asymmetric key that you generate and send to clients to install on their
computer with the application.
d. Set a password age of 365 days, and force clients to change their password through
the application when it expires.
6. Based on the information in the case study, how many services will be running on each
active SQL Server instance?
a. 1 b. 3
c. 5 d. 10
7. Which type of encryption is recommended for sensitive data on each server?
a. Shared DES symmetric keys used to secure DES keys that encrypt data
b. Shared Triple DES symmetric keys used to secure DES keys that encrypt data
c. RSA 1024 keys used to secure AES_256 keys that encrypt data
d. Certificates used to secure AES_256 keys that encrypt data
8. How many instances of Integration Services need to be installed on the spare SQL Server
2005 server for disaster recovery if both the WebSQL and TradingSQL servers could
failover at the same time?
a. Zero, and another SQL Server 2005 server is needed
b. One for both instances
c. Two—one for each instance
9. A certificate is which type of security mechanism?
a. Symmetric
b. Asymmetric
10. In developing a strong security infrastructure, you decide to install firewalls to protect
the internal network from the servers as well as an additional firewall that segregates the
external web server and WebSQL SQL Server 2005 server from the other servers. What
other actions should you take? (Choose as many options as needed.)
a. Configure each instance to use a specific TCP/IP port for communicating with
clients.
b. Configure each server’s firewall to allow port 1433 (TCP) through.
c. Configure each SQL Server server’s firewall to allow a specific TCP port through to
each SQL Server server depending on each database server’s TCP/IP configuration.
d. Configure each named instance to use port 1433 (TCP).
LESSON 6
Designing SQL Server Service-Level and Database-Level Security
LESSON SKILL MATRIX

TECHNOLOGY SKILL 70-443 EXAM OBJECTIVE


Design SQL Server service-level security. Foundational
Specify logins. Foundational
Select SQL Server server roles for logins. Foundational
Specify a SQL Server service authentication mode. Foundational
Design a secure HTTP endpoint strategy. Foundational
Design a secure job role strategy for the SQL Server Agent Service. Foundational
Specify a policy for .NET assemblies. Foundational
Design database-level security. Foundational
Specify database users. Foundational
Design schema containers for database objects. Foundational
Specify database roles. Foundational
Define encryption policies. Foundational
Design DDL triggers. Foundational

KEY TERMS
data definition language (DDL): A subset of T-SQL commands that create, alter, and delete
structural objects such as tables, users, and indexes in SQL Server.
data manipulation language (DML): A subset of T-SQL commands that manipulate data within
objects in SQL Server. These are the regular T-SQL commands such as INSERT, UPDATE, and DELETE.
role: A SQL Server security account that is a collection of other security accounts that can be
treated as a single unit when managing permissions. A role can contain SQL Server logins,
other roles, and Windows logins or groups.
schema: Each schema is a distinct namespace that exists independently of the database user who
created it; a schema is a container of objects. A schema can be owned by any user and its
ownership is transferable.
scope: A division of SQL Server’s security architecture (principals, permissions, and securables)
that places securables into server-scope, database-scope, and schema-scope divisions.


SQL Server runs on top of a Windows operating system host, but it is a full system
inside itself. There are a number of server-level security features that can be configured
and must be properly set up to ensure the entire database service is secure.

In the previous Lesson, you looked at securing SQL Server from the Windows level, the highest
level of security assignment. This Lesson examines many of the server-level SQL Server security
items that affect the entire database server, such as logins, server roles, endpoints, SQL Server
Agent, and .NET assemblies, as well as the high-level principals in the database. The next
Lesson will delve further into the server and examine the security of individual objects.
All the entities that can request resources in SQL Server—in other words, those logins, users, or processes that can perform queries and make changes to the server—are known as principals. The securables are the resources that the principals can access. In SQL Server, there are three levels of principals: the Windows level, the SQL Server level, and the database level. This Lesson looks at logins (the first two levels) as well as users and roles (the third level).

X REF: Lesson 7 covers more aspects of securables.

■ Creating Logins

THE BOTTOM LINE: In order for a user of any sort, including some of the SQL Server components, to access data or perform a job in the SQL Server, the user must log in to the server.

A login is required to gain access to resources, although the login itself doesn’t grant the user
any rights other than the ability to connect to the server. A login is one of the principals in
SQL Server, an entity that can request resources from the server.
Just as a user must log in to Windows, either a local machine or a domain, a user must also log in to SQL Server. Each SQL Server is separate, and they don't share logins, although SQL Server has the provision to trust another entity with authentication and enable a user to carry that authentication through to SQL Server. SQL Server uses two types of logins:
• Windows-authenticated logins
• SQL Server–authenticated logins

X REF: Lesson 4 examined Windows authentication versus SQL Server authentication.

There are two main differences between these logins from the SQL Server administrator point of view, but the server treats them the same once the user logs in.
The first difference is that SQL Server "trusts" Windows-authenticated logins in that it assumes the local machine security system or the Active Directory (AD) domain has authenticated the user. SQL Server accepts the token presented by Windows, and, if this user has a matching login, no further authentication is performed. Windows authentication allows a user or a group to be added to SQL Server, which lets an administrator take advantage of existing Windows groups for security assignment in SQL Server.
The second difference is that SQL Server logins are individual users only; no groups are available using this type of login. However, SQL Server can gain some of the benefits of Windows logins by enforcing password policy in some cases.

X REF: Lesson 5 examined how passwords are treated for SQL Server–authenticated logins.

You can add both types of logins the same way; the only difference is that you must specify a password for SQL Server–authenticated logins. Creating a login, however, doesn't grant rights to a particular database. That requires a user to be mapped to this login.

X REF: See the "Mapping Database Users to Roles" section in this Lesson.
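For example, one login of each type might be created with T-SQL similar to the following; the domain group, login name, password, and default database shown here are only illustrative:

-- Windows-authenticated login based on an existing domain group
CREATE LOGIN [MYDOMAIN\SQLDevelopers] FROM WINDOWS;
GO
-- SQL Server-authenticated login; CHECK_POLICY applies the Windows password policy
CREATE LOGIN Delaney WITH PASSWORD = 'Str0ng!Passw0rd',
    CHECK_POLICY = ON,
    DEFAULT_DATABASE = Sales;
GO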
One special login can’t be removed from the server: the SQL Server system administrator,
or sa, login. This login is built into SQL Server and is similar to the Administrator login on
Windows. The sa login is the highest-privileged login, is a member of the sysadmin server role,
and can perform any operation on the server. You can rename this user, but you can’t drop
sa; you also can’t remove sa from the sysadmin role. The sa login is disabled if SQL Server
Authentication isn’t enabled; however, it can also be disabled manually by an administrator.
TAKE NOTE: The sa user is always assigned SID 0x01, regardless of the name. You can see this by querying master.sys.syslogins.
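For example, a query along these lines should return sa (or its renamed equivalent):

SELECT name, sid
FROM master.sys.syslogins
WHERE sid = 0x01;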

LAB EXERCISE: Perform the exercise in your lab manual. Exercise 6.1 shows how to add a login; additional options for this login are detailed throughout this Lesson.
As mentioned previously, creating a login doesn’t grant any rights to the actual client other
than the ability to connect to the server. In fact, without a user mapping to the default data-
base, the login will fail. This happens because the login process sets the initial context for the
user to the default database if one isn’t specified, and after verifying the login, a user mapping
is required to establish the session.
Before examining the user mapping, the next section looks at the server roles available to
the login.

■ Granting Server Roles

THE BOTTOM LINE: A role in SQL Server is analogous to a group in Windows. You can grant certain rights to a role and then add one or more users to the role to receive those rights. SQL Server has three types of roles: server roles, application roles, and database roles.

This section examines server roles, and subsequent sections will discuss the other roles.
The server roles in SQL Server are fixed in that their rights are predetermined, and you can’t
add, change, or delete the roles. Table 6-1 describes the available server roles.

Table 6-1
Server roles in SQL Server
ROLE             DESCRIPTION
sysadmin         This role grants its members rights to all functions on the server and defaults to dbo as a user in each database.
serveradmin      This role can change the serverwide configurations of SQL Server and initiate a shutdown.
setupadmin       This role can work with linked servers (add, configure, and remove).
securityadmin    This role works with logins (add, edit, and drop), grants server-level permissions (GRANT, REVOKE, and DENY), and works with database-level permissions. This role can change SQL Server login passwords.
processadmin     This role can terminate processes running on SQL Server.
dbcreator        This role can create, alter, or drop any database.
diskadmin        This role can manage the disk files on SQL Server.
bulkadmin        This role allows its users to execute the bulk-insert functions of SQL Server.

CERTIFICATION READY? Ensure that you know the fixed server roles. Expect at least one exam question on this topic.

You can grant the fixed server roles to any login on your server. This includes any groups that
are added as Windows-authenticated logins. Using Windows groups allows you to manage
your security at the Windows level and ensures that you don’t have a mismatch between the
Active Directory mappings for your employees and their capabilities in SQL Server.
Granting a server role to a login should follow the same principles discussed in Lesson 4 of using the least privileges required for a particular function.
By default, the local Windows Administrators group (BUILTIN\Administrators) is added as a login and placed in the sysadmin role. This usually means all your Active Directory domain administrators are SQL Server system administrators by default. This violates the separation of duties as a best practice and should be changed as soon as you've created a separate system administrator login.

LAB EXERCISE: Perform the exercise in your lab manual. In Exercise 6.2, you'll add two server roles to the Delaney login you created in Exercise 6.1.
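In T-SQL, fixed server role membership is granted with the sp_addsrvrolemember system stored procedure; the login names below are illustrative:

-- Add a SQL Server-authenticated login to the dbcreator role
EXEC sp_addsrvrolemember @loginame = 'Delaney', @rolename = 'dbcreator';
-- A Windows group login can be added the same way
EXEC sp_addsrvrolemember @loginame = 'MYDOMAIN\SQLDevelopers', @rolename = 'processadmin';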
Mapping Database Users to Roles
A database user is a principal that is authorized to access a particular database and exists in the database public role.

X REF: See the "Granting Database Roles" section in this Lesson.
Every login must be mapped to a user to allow it to access a database, including the default database.

TAKE NOTE: The exception is if the guest user exists. A login can access a database other than its default using guest.

One user, the guest user, is created by default in each database, and it's assigned to any login that doesn't already have a user mapped to it in a database and has the rights of the public role. This means that by default a user can access any database on your server if the guest user exists in that database. As a security precaution, you should remove the guest user from all your databases.
You grant users rights to individual objects, and you can include users in roles. The recommendation is that you shouldn't grant any rights to a user and should instead include users in one or more roles to receive their permissions. This is similar to the recommendation for rights not being granted to Windows users, only Windows groups, with those users included in groups for permissions.

MORE INFORMATION: You learn of other roles later in this Lesson in the "Granting Database Roles" section.

You can create users when a login is created or add users later and map them to existing logins. Because a login requires a user mapping in the default database for the login to succeed, at least one user is usually created when a login is created. Additional users can be mapped at a later time. Exercise 6.3 shows how a new user can be created and mapped to the Delaney login from Exercise 6.1.

LAB EXERCISE: Perform the exercise in your lab manual.
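A sketch of creating a user and mapping it to an existing login might look like this; the database and names are illustrative:

USE Sales;
GO
-- Map a database user to the Delaney login created earlier
CREATE USER Delaney FOR LOGIN Delaney;
GO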
Although logins must be mapped to users, the reverse isn’t true: Users don’t necessarily have
to be mapped to logins. When this is the case, the user may be in one of two states: orphaned
or mapped to an asymmetric key mechanism.
Orphaned users usually occur when a database is restored on a different server from the one on
which it was created. The mapped login may not exist or may have a different SID on the new
server. These users need to be remapped using the sp_change_users_login stored procedure,
which will remap the user to another login or create a corresponding login that is mapped to
the user. You can find more information about sp_change_users_login in Books Online.
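As a rough illustration, the procedure can first report orphaned users and then remap one of them to an existing login; the names are illustrative:

-- List users in the current database that have no matching login
EXEC sp_change_users_login @Action = 'Report';

-- Remap a specific orphaned user to an existing login
EXEC sp_change_users_login @Action = 'Update_One',
    @UserNamePattern = 'Delaney',
    @LoginName = 'Delaney';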
When a user is mapped to an asymmetric key mechanism, it can be either an asymmetric key or
a certificate that exists in the database. This mapping takes place when the user is created and a
specific key that already exists is mapped to the user with the CREATE USER statement.
When a user is mapped to a key mechanism, it means that a connection is made using Service
Broker or another service that supports certificates or asymmetric keys. The login is made using
the certificate or key and then mapped through to the user who is mapped to that certificate. All
other security checks are then made on this mapped user just as with any other database user.

TAKE NOTE: Certificate-based authentication and key-based authentication aren't available with client connections from tools such as Management Studio and SQLCMD.

The next two sections will examine how you handle additional database security with schemas
and database roles.
■ Securing Schemas

THE BOTTOM LINE: A schema is a way of dividing the objects in a database into a namespace, which is a domain where each object has a unique name.

A schema is in some ways tightly bound to database users but in other ways shares no implicit
connection; however, it’s included as part of the SQL-92 standard specification. A database
may contain many schemas, or only the default namespaces created by the server: dbo, sys,
and INFORMATION_SCHEMA, along with the schemas for each fixed database role.
A little history on this topic will help to explain how it works. Prior to the 2005 version of
SQL Server, there was the concept of an owner for each object. This was the third part of the
four-part naming structure. Each object in a database followed this form:
server_name.database_name.owner_name.object_name
For example, say the user Tia created a table called Horses. The full name of the table would
be Tia.Horses in the database, and no other object could have that name. There could be
another table created by the database owner, dbo, as in dbo.Horses, but it would be a separate
object from Tia.Horses, with its own data, permissions, and storage.

TAKE NOTE: The four-part naming structure in SQL Server 2005 and 2008 is no longer server.database.owner.object as it was in SQL Server 2000. It's now server.database.schema.object.

A user seeking to query the Horses table would need to know whether they wanted to query
Tia.Horses or dbo.Horses. A simple select * from Horses by the user Brian would default to
querying the Brian.Horses table and then dbo.Horses if the first one didn’t exist. In other
words, each database user had their own implicit schema based on their username.
Although this was a workable method of separating objects into their own namespace, it cre-
ated problems when users needed to be removed from the database. Because a user can’t be
dropped when they own objects, a user who owned a large number of objects would need all
those objects moved to a new owner—essentially, a new namespace. This created a tremen-
dous amount of work in a database of any size and required application changes where code
explicitly referenced a particular namespace.
Starting with SQL Server 2005, the schema has been separated from the owner of an object.
Now a schema creates the namespace, and although it’s owned by a database user, removing
the user merely requires reassigning the schema to a new owner. All the namespaces remain
the same because the schema name doesn’t change and the owner has no effect on the security
of the namespace. This is known as user-schema separation.
A schema is essentially a grouping mechanism in a database; allowing a number of objects
to be classified in the schema’s namespace makes it easy to assign higher-level permissions to
roles or users by granting them permissions to the schema or making the schema their default
schema. Just as a role lets you group users, the schema lets you group objects.
Every object in the database must belong to a schema. If one isn’t specified when the object
is created, the object falls under the default schema of the user creating the object. A schema
can have any valid SQL Server name, but the schema names in a database constitute their
own namespace and must be unique. Often, a schema is used to group the objects belonging to a client-side application (or a portion of one), such as tables, views, and stored procedures, into a single namespace, with permissions assigned to that namespace to simplify maintenance. For example, if the application deals with human resources data managed by a client application developed in Access Projects, the schema for that software program might universally be personnel.
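A minimal sketch of such a schema, using illustrative object, user, and role names, might be:

-- Create the schema and make an existing user its owner
CREATE SCHEMA personnel AUTHORIZATION Delaney;
GO
-- Objects are then created inside the schema's namespace
CREATE TABLE personnel.EmployeeReview (
    ReviewID   int      NOT NULL PRIMARY KEY,
    EmployeeID int      NOT NULL,
    ReviewDate datetime NOT NULL
);
GO
-- One statement grants read access to everything in the schema
-- (HRReaders is assumed to be an existing database role)
GRANT SELECT ON SCHEMA::personnel TO HRReaders;
GO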
TAKE NOTE: The default schema for users is the dbo schema.

In large applications, it is common for the application code to have its own authentication method. In such cases, the use of users and schemas in SQL Server is much less important, and the use of Application Roles could be important. Application Roles are discussed later in this lesson.
The default namespace for a user appears on the user Properties page (shown in Figure 6-1). You
can change this to any schema in the database using this dialog box or the T-SQL ALTER
USER command.
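For example, assuming the personnel schema sketched earlier exists, the default schema for a user could be changed like this:

ALTER USER Delaney WITH DEFAULT_SCHEMA = personnel;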

Figure 6-1
User default schema

Each schema also has an owner who owns it like any other object. Figure 6-1 shows the check boxes just below the default schema that you can use to specify ownership of schemas by this user. A role can also own a schema, as shown in Figure 6-1. Because a role can own a schema, multiple users can own a schema, through role and/or group membership or specific inclusion, simplifying permissions for that group of objects. Exercise 6.4 walks you through creating a schema and assigning an owner.

LAB EXERCISE: Perform the exercise in your lab manual.

■ Granting Database Roles

THE BOTTOM LINE: Use roles to define access categories. Then control the permissions using the role container. Add or remove users, as needed, to accommodate changing employee assignments.

Database roles are the last of our internal security mechanisms for assigning rights to principals and allowing access to securables. You are introduced to the three types of database roles in this section: fixed database roles, user-defined database roles, and application roles.
You use these roles to group users so you can easily assign permissions to them. It's recommended that permissions not be assigned to individual users, only to roles, just as it's recommended in AD to assign ACL permissions to groups instead of individual users. Then, you can add users to the role to receive the appropriate rights. This approach greatly reduces the administrative burden when objects and users are added to and removed from the database. The three types of roles are all used differently.

WARNING: The exception to the previous paragraph is the public role. Because every user is a member of this role, any rights to this role are extended to every database user. It's recommended that no right be granted to this role and that you instead use a user-defined role.

Working with Fixed Database Roles

Fixed database roles are analogous to the server roles discussed earlier. These are roles cre-
ated by SQL Server with specific rights that can’t be changed; these roles can’t be deleted.
Table 6-2 describes the fixed database roles.
Table 6-2
Fixed database roles

ROLE                  DESCRIPTION
db_accessadmin This role allows members to grant or revoke database access to logins.
db_backupoperator This role allows its members to back up the database.
db_datareader This role allows members to read data (SELECT) from all user tables
in the database.
db_datawriter This role allows members to change data in all user tables (INSERT,
UPDATE, and DELETE). It doesn’t imply that you can read data.
db_ddladmin This role can carry out any data definition language (DDL) statement
in the database.
db_denydatareader This role is prevented from reading data from all user tables.
db_denydatawriter This role is prevented from adding, changing, or deleting information
from any user table in the database.
db_owner This role is the highest-level role in the database and can perform
all configuration or management operations in the database. This
includes dropping the database.
db_securityadmin This role can modify the permissions and roles in the database.
public This role initially has limited rights to objects in the database, but
it’s assigned to every user. This role can’t be removed. It’s a user-
defined role in its permissions, but it’s mentioned here because the
server creates this role.

Because these are fixed roles, they aren’t suited to securing your individual objects. Instead,
most of these roles are used for administrative functions or widespread access to objects.
The db_datareader and db_datawriter roles are usually granted to developers because they
allow access to all tables. Granting these rights to individual users means they can access all
data in all tables. If you create a new table that you only want a limited number of users to
access, anyone in this role will still have access. For this reason, limit the use of these roles to
nonproduction databases.
Similarly, the db_denydatareader and db_denydatawriter roles have far-reaching effects. These are
good roles in specialized situations—specifically, auditing and read-only situations, respectively.
The public role is unique in that, although it’s created by SQL Server, it has no explicit rights
to your objects. This role has limited rights to read system views by default and is assigned
to all users. This assignment can’t be removed from any role. However, the administrator can
change the permissions of this role, as discussed in the next section.
Like server roles, these should be assigned only in accordance with the least privileges needed for a job function. For example, assigning the db_owner role to a developer who only needs to run DDL and back up the database is a poor practice.

LAB EXERCISE: Perform the exercise in your lab manual. Exercise 6.5 shows how to add a user to a role.
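In T-SQL, fixed database role membership is assigned with sp_addrolemember; the database and user names here are illustrative:

USE Sales;
GO
-- Allow the Delaney user to read all user tables in this database
EXEC sp_addrolemember @rolename = 'db_datareader', @membername = 'Delaney';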
Working with User-Defined Roles

CERTIFICATION READY? Ensure that you know the fixed database roles. Expect at least one exam question on this topic.

User-defined roles are similar to fixed database roles in that they allow multiple users to be added. User-defined roles can own schemas, and they can have permissions assigned to them. However, these roles aren't added when SQL Server is installed; rather, the administrator creates them as needed. These roles are analogous to Windows Active Directory groups that are used to combine a series of users for easy management of permissions.
It’s recommended as a best practice in SQL Server that rights to securables not be granted to
individual database users but rather be granted to roles. Each role should be granular enough
to provide security to a set of functionality, but not so granular as to require a cumbersome
amount of administration. Most databases have two to five roles, one for each major section
of functionality or group of users that will access the database.
As mentioned in the previous section, the public role is available in every database and is automatically assigned to every user in the database. Even though this role is created by SQL Server, the administrator can modify the permissions for this role, just like any user-defined role. Any permission assigned is granted to every user in the database, just as any specific denial of access prevents every user from accessing that object. Just as it's recommended that you not grant explicit rights to the Everyone group in your Active Directory domain, it's recommended that you not grant rights to this group in a database. Instead, create another role, and assign the specific rights you require to that role.

WARNING: Granting permissions to public means you can't easily revoke those permissions from a specific user later if need be.
The administrator can create any other role he or she chooses in order to assign varying per-
missions to users. These roles you create must have a unique name in the database that con-
forms to the SQL Server object-naming rules. You can create thousands of roles, but doing
so isn’t practical. Instead, you should seek to create roles that mimic the major job functions
along which you typically divide the users’ access.
Once you create a role, you can assign permissions to it using the GRANT, REVOKE, and
DENY statements in the same manner that you assign permissions to any user. These state-
ments are discussed in detail in Books Online with examples and syntax elements defined.
The permissions you assign should be the necessary permissions to enable the role to work
with whatever data it needs—and no more.

You shouldn’t perform a blanket assignment of permissions. An auditing role doesn’t need
INSERT, UPDATE, and DELETE permissions to tables, and they shouldn’t be granted
TAKE NOTE
* along with SELECT permissions. Grant a role the specific permissions needed to an object
rather than granting all rights.

As with a user, this role can be assigned explicit permissions to perform functions in the
database, such as creating tables, performing a backup, and so on. If you must assign these
permissions and a fixed database role doesn’t meet your needs, then you should assign them to
a specific role, with users added to the role. The role can also own schemas, which gives the
members the right to work with objects under those schemas.
LAB EXERCISE: Perform Exercise 6.6 in your laboratory manual. Exercise 6.6 walks through creating a role, assigning it to a user, and assigning explicit rights to two tables.
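The general pattern, with illustrative table, role, and user names, looks something like this:

USE Sales;
GO
-- Create a role for the order-entry job function
CREATE ROLE SalesEntry AUTHORIZATION dbo;
GO
-- Grant only the permissions the role needs on specific objects
GRANT SELECT, INSERT, UPDATE ON dbo.Orders TO SalesEntry;
GRANT SELECT ON dbo.Customers TO SalesEntry;
GO
-- Add an existing database user to the role
EXEC sp_addrolemember @rolename = 'SalesEntry', @membername = 'Delaney';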

TAKE NOTE: Typically you'll assign rights to the appropriate role(s) when an object is created.

Using Application Roles

CERTIFICATION READY? A user has a logon with table permission in a database. The user also connects to the same tables through an application role. In this ambiguous situation, what permissions does the user have?

Application roles are a unique kind of role in SQL Server. As in previous versions, the application role isn't assigned to a user; rather, it's invoked by a user who's already connected. Once set, users assume the rights granted to the application role and execute everything in the context of this role, rather than their previous context. Users also lose any rights they had before invoking the application role.
The application role is created with the CREATE APPLICATION ROLE command, which requires a role name and a password. This password is how the application role is secured. A user can't invoke this role without the password, which is usually secured in an application. This ensures that only that particular application can access these objects.
As with user-defined database roles, the application role can be granted permissions to schemas and individual objects. You do this in the same ways shown for user-defined database roles. These
rights aren’t assigned to users, and users aren’t granted the application role; instead, the password
is used to move a user into the application role. These rights remain in effect until the user
disconnects from SQL Server or “unsets” the role using the stored procedure sp_unsetapprole.

TAKE NOTE: To revert to your original permissions without disconnecting and reconnecting to SQL Server, you must save a cookie of information when you execute sp_setapprole.
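A sketch of the full round trip, with an illustrative role name and password, might look like this:

-- Create the application role (normally done once, by the administrator)
CREATE APPLICATION ROLE SalesApp WITH PASSWORD = 'Appr0le!Pwd';
GO
-- At run time the application activates the role and keeps the cookie
DECLARE @cookie varbinary(8000);
EXEC sp_setapprole 'SalesApp', 'Appr0le!Pwd',
     @fCreateCookie = true, @cookie = @cookie OUTPUT;

-- ... statements here run with the application role's permissions ...

-- Restore the original security context using the saved cookie
EXEC sp_unsetapprole @cookie;
GO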

Application roles aren’t widely known, but they’re valuable resources. If you allow your users
to connect to SQL Server with their Windows account or SQL Server–authenticated account
and assign permissions to that user, then they can connect with any application. This means
that in addition to using your ERP application to work with data, a user could also connect
with Microsoft Excel or another ODBC-compliant program and manipulate data outside the
expected business application. If necessary business rules are embedded in the application and
not the database, they may be bypassed. In cases where you want to ensure that only a particular application is used to access certain data, an application role can enforce this limitation, because only an application that knows the role's password can invoke it.

■ Introducing DDL Triggers

THE BOTTOM LINE: SQL Server supports the concept of a DDL (data definition language) trigger, which responds to events that define objects in SQL Server. The primary events are the CREATE, ALTER, and DROP statements and their variants. Because these statements fundamentally alter the way in which the server can work with the addition or deletion of objects, a trigger allows auditing or greater control over these types of changes. The objects affected by DDL statements can be data objects (tables, views, procedures, and so on) or principal objects (logins, users, roles, and so on).

Triggers have been a part of SQL Server since its inception. Triggers are sections of code that
execute in response to some event. Prior to SQL Server 2005, triggers were limited to data
manipulation language (DML) events. These are INSERT, UPDATE, and DELETE state-
ments that modify or manipulate data.
Just like a DML trigger, which fires when a particular event occurs, a DDL trigger fires when a
DDL event for which it’s set up occurs. These types of triggers are more complex and use differ-
ent internal structures to determine what data is available from the event. DDL triggers execute
after the T-SQL statement is complete. These triggers can’t be used as INSTEAD OF triggers.
The following sections discuss the scope, events, and recommended policy for DDL triggers.

Understanding DDL Trigger Scope

A regular DML trigger is scoped to the table against which it’s created. When the par-
ticular event for that table, and only that table, is executed, the trigger fires and performs
its actions. A DDL trigger, however, is scoped differently because the events for which
it fires aren’t tables in a particular database/schema combination. Instead, a DDL trigger
has two scopes: server-wide and database-wide.

A server-wide scope means that any time the particular event executes anywhere on the server
instance, in any database, this trigger fires. An example is the CREATE ENDPOINT
command. If a server-wide trigger is set for this event, then it will execute if an endpoint is
created in any database that exists on the instance.
A database-wide scope, in contrast, means that only those events executing in a particular data-
base cause the trigger to fire. If a CREATE USER DDL trigger is created in the Sales_Prod
database and scoped to that database, then if a CREATE USER statement is executed in the
Sales_Dev database, the trigger doesn’t fire. This limits the ability and necessity of the trigger
to perform any action across the server. A separate DDL trigger for the CREATE USER event
would need to be created in the Sales_Dev database to track events in that database.
Using scope can limit the execution of these triggers to only those events that need to be
acted upon. Often, a database administrator is concerned about events in one database that
aren’t important in another.

Specifying DDL Trigger Events

When a DML trigger is created, you specify the particular event (or events) in the code. For
example, a trigger to copy information from an inserted sales invoice might look like this:

CREATE TRIGGER sales_insert ON Sales FOR INSERT
AS
INSERT sales_audit (invoice, customer, salesdate)
SELECT invoice, customer, salesdate
FROM inserted
GO
This trigger notes the scope (the Sales table) and the event (INSERT). Multiple events can be
included if necessary.
You can also set a DDL trigger to respond to events, as shown here:
CREATE TRIGGER NoDrop
ON DATABASE
FOR DROP_TABLE
AS
PRINT 'Disable Trigger "NoDrop" to drop tables'
ROLLBACK
GO
This trigger is scoped for the current database and is fired in response to a DROP TABLE command, which would fire the DROP_TABLE event. See Books Online for the events that occur on a SQL Server.

TAKE NOTE: Events that occur at a database level, such as CREATE USER, can be captured in a server instance scoped DDL trigger.

In addition to events, a DDL trigger can also respond to event groups. These are broader classifications of events, such as the DDL_LOGIN_EVENTS group, which covers the CREATE LOGIN, ALTER LOGIN, and DROP LOGIN events. These groups are linked in a tree, a portion of which is shown in Figure 6-2. The complete tree is available in Books Online and is extensive, covering server-level and database-level events.

Figure 6-2
DDL trigger events
Using an event group instead of the event class ensures that you don’t forget an event related to
that class. It’s easier to administer and work with one trigger that tracks all LOGIN events than
it is to create three separate triggers for each of the CREATE, ALTER, and DROP events. Be
aware of the tree structure, however, because lower-level events will cause the trigger to fire. The
code in your trigger must be able to handle all the events in your group to work properly.
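For instance, a server-scoped trigger that watches the entire DDL_LOGIN_EVENTS group might be sketched as follows; in practice you would likely store the event data in an audit table rather than print it:

CREATE TRIGGER audit_login_changes
ON ALL SERVER
FOR DDL_LOGIN_EVENTS
AS
    -- EVENTDATA() returns an XML description of the statement that fired the trigger
    DECLARE @data xml;
    SET @data = EVENTDATA();
    PRINT CONVERT(nvarchar(4000), @data);
GO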

Defining a DDL Trigger Policy

With the addition of DDL triggers, the administrator has a great deal more admin-
istrative and auditing capability than existed before, without requiring the overhead
of running traces and analyzing their output. However, this means you need to judi-
ciously use this capability to avoid placing an undue burden on either the server or
the administrator.

Before discussing how to define your trigger policy, please examine a few more features of
DDL triggers that may impact how you use them. First is the capability of having multiple
triggers fire for an event. You can set up two separate triggers for the CREATE USER event
and have them both fire when this event occurs. This may be necessary for any number of
business reasons, but you must set two factors in your policy.
The first is the firing order. This can be important, because one trigger may depend on the other having already executed in order to function as designed. You can change the firing order of your triggers, but this must be done explicitly; and you should ensure that all instances of multiple triggers—DML and DDL—have the firing order set and known.

WARNING: Beware of encrypting triggers. If the code is encrypted, the trigger can't be replicated. Also note that DDL triggers aren't fired in response to temporary table operations. This is an inherent limitation and means you can't prevent or audit these events with these triggers. Make sure all developers and administrators know that as part of your policy.

Second, each trigger consumes resources. Having multiple triggers means more work for the server to set up the execution environment as well as more potential work in the same transaction. Don't overload the server with unnecessary triggers. If possible, ensure that triggers are consolidated, perhaps requiring code reviews for multiple-trigger situations.
Another feature of triggers to be aware of is the capability for code encryption. This is discussed in the next section, "Defining a Database-Level Encryption Policy."
DDL triggers also require different coding structures to access the data about events. Unlike
DML triggers, which use the inserted and deleted tables, a DDL trigger requires you to work
with event data. Make sure anyone using these triggers has been properly trained to gather
the data.
X REF: Lesson 4 covers designing security infrastructure.

As has been shown, these triggers are powerful but much more complex than DML triggers. Therefore, you need a more extensive policy to deal with their use in your servers. The features mentioned should be noted in your policy, but you should also have guidelines indicating where and why these triggers are created. As discussed in Lesson 4, when you look at
designing a security infrastructure, there may be regulatory or industry guidelines about con-
trolling or auditing various types of events, most notably security changes. DDL triggers can
provide a way to do this, but don’t overuse the triggers and cause a large burden on the server
or company. For example, preventing new logins may provide some level of security, but if
the administrator can’t disable the trigger when needed in a timely fashion, the business may
be negatively affected.
These triggers also need to be monitored for the information they may return or store. This
generally means an auditing environment, so you should have a policy indicating how this
data is secured and made available when needed. Security for the DDL trigger data is as
important as the security of the data used by the company, because it can show whether the
company data is being properly accessed or compromised.
■ Defining a Database-Level Encryption Policy

THE BOTTOM LINE: SQL Server lets you use the WITH ENCRYPTION keywords in the CREATE or ALTER statement to encrypt the data that makes up this object. This prevents anyone who gets a copy of the database from reading the T-SQL code that forms the object.

In Lesson 5, you studied an overall encryption scheme and policy for SQL Server using the
encryption mechanisms that turn plain-text data into ciphertext. The keys exist at a server
or database level, and the encryption occurs at a table or column level. As mentioned in that
Lesson, the minimum amount of data should be encrypted. This ensures good performance
of your SQL Server. Your policy should be specified at a corporate level, with exceptions
documented as needed.
TAKE NOTE: The encryption option can't be used with CLR assemblies.

Another type of encryption is available in a database: encryption of code. A number of code objects—stored procedures, functions, triggers, and so on—store the plain-text code of the object in the database. This means any user with rights to read the sys.syscomments table can read the code for the object and potentially misuse that information. This code may also be proprietary and a part of your company's intellectual property.
When you encrypt the code for an object, it can't be read from the server, which provides a degree of protection for how the object works. This security feature can help prevent malicious users from examining your code for vulnerabilities to SQL Injection or other hacks as well as protect your intellectual property. However, encrypted code requires that your developers be careful with the original source.

WARNING: It is very important to keep a copy of the original code. The encrypted object code can't be recovered.
Your policy regarding encryption should specify whether this feature of SQL Server will be
used. If you choose to use it, it’s recommended that it be applied to each object type and not
to each object. In other words, specify that all triggers be encrypted rather than some being
encrypted and some not.
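For example, a procedure whose definition should not be readable from the server might be created along these lines; the object names are illustrative:

CREATE PROCEDURE dbo.usp_GetHorseCare
WITH ENCRYPTION
AS
    SELECT HorseID, FeedingInstructions
    FROM dbo.HorseCare;
GO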

Transparent Data Encryption

SQL Server 2008 includes a new feature known as transparent data encryption (TDE).
This is a method of encrypting an entire database without requiring the modification
of any application code. This encryption is transparent to the application code as SQL
Server 2008 automatically handles the encryption and decryption of all data going in
and out of the database. The primary purpose of this TDE feature is to have the entire
database encrypted so that any unauthorized person having direct access to copies of the
database files and/or transaction log files cannot decrypt and read the data. This ensures
that if for example, backup files on tapes fall into the wrong hands, the data is still secure
from unauthorized access.

While TDE is implemented for individual databases, it is actually partially an instance level feature
of SQL Server 2008 as the instance’s tempdb is also encrypted when TDE is active. Different data-
bases within a single instance can be encrypted with TDE using different encryption algorithms.
Tempdb is encrypted as soon as any one database in the instance is encrypted, and when it is encrypted, tempdb always uses the AES_256 algorithm.
As TDE is a transparent database-level encryption methodology, individual columns of data can still
be encrypted. This allows existing code and encryption designs to continue to function.
TDE is established by using a certificate stored in SQL Server. This certificate is based on
the master key also created and stored in SQL Server. The certificate must be in the master
database. An example of encrypting an entire database with TDE is shown below.
X REF: Lesson 5 examined keys, certificates, and encryption algorithms.

USE mydatabase
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128
ENCRYPTION BY SERVER CERTIFICATE mycert;
GO
ALTER DATABASE mydatabase
SET ENCRYPTION ON;
GO
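The example above assumes that a database master key and the certificate mycert already exist in the master database. Under that assumption, they might have been created, and backed up, with statements similar to these; the passwords and file paths are illustrative:

USE master;
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'M@sterKey!Pwd';
GO
CREATE CERTIFICATE mycert WITH SUBJECT = 'TDE certificate';
GO
-- Back up the certificate and its private key to a secure, separate location
BACKUP CERTIFICATE mycert TO FILE = 'C:\secure\mycert.cer'
    WITH PRIVATE KEY (FILE = 'C:\secure\mycert.pvk',
                      ENCRYPTION BY PASSWORD = 'Pr1vateKey!Pwd');
GO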
It is critically important to understand that the database master key and the encryption cer-
tificate need to be backed up to a secure location. This location also needs to be separate from
regular backups or other copies of the database files. The encryption security provided by TDE
is meaningless if database files and the certificate both fall into the hands of the wrong person.
Further, for disaster recovery or other restore operations to a different server, the certificate will
be required for restoring a TDE encrypted database. You can think of the certificate as the key
to unlocking your database. Certificates are very rarely changed so securing a backup copy of
critical certificates should be an easy activity.

■ Securing Endpoints

THE BOTTOM LINE: Every network communication with SQL Server takes place through an endpoint, which is a communication point for SQL Server. Endpoints exist for the protocols clients use to communicate with SQL Server as well as for database mirroring, the Simple Object Access Protocol (SOAP) and Web Service requests, and the Service Broker.

X REF: Lesson 10 discusses database mirroring.

When the server is installed, a Tabular Data Stream (TDS) endpoint is created for each protocol that is enabled. Table 6-3 shows the protocol endpoints and the default names as set up by SQL Server. Each protocol requires an endpoint with a unique name.

Table 6-3
Default protocol endpoints

PROTOCOL                                      DEFAULT ENDPOINT NAME
Dedicated Administrator Connection (DAC)      Dedicated Admin Connection
Named Pipes                                   TSQL Named Pipes
Shared Memory                                 TSQL LocalMachine
TCP/IP                                        TSQL Default TCP
VIA                                           TSQL Default VIA

The Shared Memory, Named Pipes, and DAC protocols have only one endpoint per instance.
The VIA and TCP/IP protocols have a default, but the administrator can create additional
endpoints for various services.
Each endpoint exists as an object in SQL Server, like many other objects, and permissions
are granted to allow its use. You can apply the typical GRANT, REVOKE, and DENY per-
mission statements to the endpoint using the ALTER, CONNECT, CONTROL, TAKE
OWNERSHIP, and VIEW DEFINITION permissions. Each of these permissions is similar
to the same object permissions discussed in Lesson 7. The exception here is the CONNECT
permission, which isn’t associated with most other types of objects.
Each of these endpoints enables a communication path into SQL Server, but they all function
slightly differently with different options and potential security issues. The following sections
discuss each type of endpoint.

Introducing TDS Endpoints

A TDS endpoint is one that is built to accept TDS communications. These are the
standard protocol communications used by Management Studio, ActiveX Data Objects
(ADO), Open DataBase Connectivity (ODBC), and most other SQL Server clients. The
TDS protocol is encapsulated in the underlying transport protocol (TCP/IP, Named Pipes,
and so on) and contains the T-SQL batches that are submitted for most applications.

TAKE NOTE: The Dedicated Admin Connection endpoint is an exception. Only a sysadmin login can connect using this endpoint.

The default permissions to an endpoint allow all users to connect. This permission is implicitly granted to a login when it's created. You can change this for the TDS endpoints by DENYing access to Everyone and then granting explicit permissions to each login that will be allowed to connect.
Each endpoint can be associated with any IP or one particular IP on the server. It's also associated with a port. When you create a new endpoint, you can specify both of these parameters to configure how clients will be allowed to connect to the SQL Server. For dynamic ports, the default TCP endpoint is used. You can create additional TCP/IP listeners in the SQL Server Configuration Manager, under the Network Configuration section, and then associate them with an additional endpoint. VIA connections are treated the same as TCP/IP connections.
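For a TDS endpoint, this could be sketched as follows; the endpoint name comes from Table 6-3, and the Windows group login is illustrative:

-- Remove the implicit right that every login receives
DENY CONNECT ON ENDPOINT::[TSQL Default TCP] TO public;
-- Grant the right back only to the logins that should connect
GRANT CONNECT ON ENDPOINT::[TSQL Default TCP] TO [MYDOMAIN\SQLUsers];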

Using SOAP/Web Service Endpoints

SQL Server 2000 could create a web service and respond to queries using HTTP, but it
required integration with the Internet Information Server (IIS) on the Windows host.
Starting with SQL Server 2005, the database server can natively respond to HTTP
requests with an endpoint using SOAP.

SOAP allows method calls to be mapped to stored procedures or ad hoc batches to be sent
to the server using the HTTP or HTTPS protocol. It’s often used in web services as a way to
programmatically access methods on a remote server.
When an endpoint is created for use with SOAP calls, it must be specified with not only the
port to be used, but also the protocol (HTTP or HTTPS) along with the type of authentica-
tion that will be allowed. Five types of authentication are available; they’re listed here from
least secure to most:
• Basic. One of two mechanisms in the HTTP 1.1 specification. The username and password are encoded in the header using base64 encoding. Requires HTTPS communications.
• Digest. Second mechanism in the HTTP 1.1 specification. The username and password are hashed using MD5 and compared on the server. Only domain accounts can be used.
• NTLM. Authentication method used in Windows 95, 98, and NT4.
• Kerberos. Supported in Windows 2000 and later. A standard authentication mechanism used by Active Directory and requiring that a service principal name (SPN) be registered for SQL Server.
• Integrated. Allows NTLM or Kerberos methods to be used.

TAKE NOTE: An important facet: Anonymous authentication isn't supported with endpoints. The user must be a valid Windows user.
Each of these types has pros and cons, although only one at a time is associated with an end-
point. You can change the authentication type using the ALTER ENDPOINT statement.
Kerberos is preferred, and Basic is the last choice in terms of security.
Each SOAP endpoint also requires a unique path on the Windows server that equates to a
virtual directory in IIS. This isn’t necessarily a security issue, although setting standard paths
on all SQL Servers gives potential attackers information they can exploit.
One specific security recommendation for SOAP endpoints is that you should specify only
those methods actively being used as available with this endpoint. If methods are retired, you
should remove them from the endpoint. SOAP endpoints can also support ad hoc batches,
but this capability is disabled by default.
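A SOAP endpoint definition exposing a single stored procedure, with illustrative names, paths, and namespace, might be sketched as follows:

CREATE ENDPOINT sales_reports
    STATE = STARTED
    AS HTTP (
        PATH = '/sales/reports',
        AUTHENTICATION = (KERBEROS),
        PORTS = (SSL),
        SITE = 'MyWebSQL'
    )
    FOR SOAP (
        -- Expose only the methods actually in use
        WEBMETHOD 'GetMonthlySales' (NAME = 'Sales.dbo.GetMonthlySales'),
        BATCHES = DISABLED,      -- ad hoc batches stay disabled
        WSDL = DEFAULT,
        DATABASE = 'Sales',
        NAMESPACE = 'http://tempuri.org/sales'
    );
GO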

Working with Service Broker and Database Mirroring Endpoints


Service Broker and database mirroring endpoints share many of the same options and
security issues. The Service Broker provides a queuing mechanism in SQL Server. Lesson
10 discusses database mirroring with other high-availability technologies.

For both of these endpoints, you have the option of using a certificate for authentication by
the endpoint. This means the private key of a certificate is used on this server, and the client
must have the matching public key for authentication. Because certificates expire and there
are sometimes issues with transfer and renewal between servers, you also have the option of
using Windows authentication methods (NTLM or Kerberos).
Unlike the other endpoints, however, these endpoints provide a fallback option. The connection can be attempted with either a certificate or Windows method and then fall back to the other. You specify this along with which type of authentication to use first in the CREATE ENDPOINT or ALTER ENDPOINT statements.

TAKE NOTE: If both sides of the endpoint specify different algorithms, the one specified by the receiving end is chosen.

It's recommended that you not allow the fallback. If you truly need certificate authentication, then you should specify it and not let it fall back to Windows authentication. Often a backup connection method becomes permanent because it works and employees won't seek to fix the primary method. If you allow this to stop communications when it doesn't work, it will force people to fix the primary method of communication.
One other security mechanism is available for these endpoints: encryption. The connection used for this endpoint can be set to not use encryption, to allow it if the client is capable, or to require it in all communications. The default is to require encryption using the RC4 algorithm. The endpoint specifies whether the AES algorithm will be used instead, or if either algorithm can fall back to the other.

WARNING: RC4 is weaker than AES, but it's considerably faster. Choose based on your primary need: performance or security.
In general, you should require encryption if all clients can support it. If not, then specifying
the SUPPORTED option will allow encryption to be used when possible. Avoid disabling
encryption for the critical services if at all possible, because it means data is being communi-
cated in clear text.
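A database mirroring endpoint that requires encrypted, Kerberos-authenticated connections might be sketched like this; the endpoint name and port are illustrative:

CREATE ENDPOINT MirroringEndpoint
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (
        AUTHENTICATION = WINDOWS KERBEROS,
        ENCRYPTION = REQUIRED ALGORITHM AES,
        ROLE = PARTNER
    );
GO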

Defining an Endpoint Policy

The security policy for endpoints is similar to that for most any other SQL Server com-
ponent: Grant only the minimum rights needed for this object. In addition, because an
endpoint is an active listener for communications, it can be started, stopped, or disabled.
It’s recommended that only the endpoints needed for T-SQL communications be started
and active on any server. If you no longer need an endpoint, you should disable it if you
may need it again or drop it if not.

This applies to protocols because there is sometimes a temptation to enable all protocols on a
server. If there are no named pipes or VIA clients, then these protocols and their corresponding
endpoints should be removed. Doing so reduces the surface area available for attack and increases
the security of your server.
The state of each endpoint is also a security item that should be addressed in your policy. The
three states differ in how they affect the security of your server. As mentioned previously, if an
endpoint isn’t being used, it shouldn’t be started; but should it be stopped or disabled? The server
responds differently to these two states, and this is important in deciding on security policy. In
the disabled state, the endpoint doesn’t respond to client requests, which is the same as if the
endpoint hadn’t been created. If an endpoint isn’t being used currently, it should be in this state.
In contrast to the disabled state, the stopped state doesn’t respond to client requests but returns an
error. This is similar to the 404 errors returned by web servers when a page doesn’t exist. This state
doesn’t allow clients to connect to the server, but it does tell a client that an endpoint exists on this
port that may be started again. The stopped state is useful for temporary service interruptions, like
maintenance activities being performed on the endpoint; however, you shouldn’t use it as anything
more than this. If the endpoint won’t be used for any length of time, you should disable it.
Finally, it’s recommended that you use secure communications—HTTPS for SOAP end-
points and encryption for Service Broker and database mirroring—to make sure your trans-
missions aren’t intercepted while in transit. The defaults for the various types of endpoints are
set in accordance with the Trustworthy Computing Principles in mind, requiring Windows
authentication and encryption by default.

■ Granting SQL Server Agent Job Roles

THE BOTTOM LINE: The msdb database has three fixed database roles that allow you to assign permissions relating to SQL Server Agent to users who aren't sysadmins.

SQL Server Agent is an extremely useful component in the SQL Server platform, letting you
schedule tasks that need to be performed both in SQL Server and on the domain. In the past
with SQL Server 2000, a broad scope of permissions was required to use the Agent, but start-
ing with SQL Server 2005, a number of roles have been introduced allowing finer-grained
control over the security for this subsystem. You can also specify proxies for jobs instead of a
central proxy for all non-sysadmins as in SQL Server 2000.
As with other security decisions, your policy should grant the least privileges required to
logins in order to allow them to perform their jobs. The three roles are:
SQLAgentUserRole. This is the least privileged role for using the SQL Server Agent. It
allows the user to work with single-server jobs for that instance only and to enumerate opera-
tors but not change them. Only jobs owned by the user can be examined and changed. The
job history can’t be deleted.
SQLAgentReaderRole. This role includes all the privileges of the SQLAgentUserRole, but
it can also work with multiserver jobs and view their history, properties, and schedules. This
role can’t edit those multiserver jobs nor delete job history.
SQLAgentOperatorRole. This role includes all the privileges of the other two roles as well as the ability to list alerts and operators and to delete the job history from the local server. This role can also enable or disable jobs.
None of these three roles is as powerful as an administrator, and none can create or edit alerts,
operators, or proxies. However, if you’re delegating the ability to start jobs, you should con-
sider using one of these roles to give limited privileges to certain users.
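Because these are fixed database roles in msdb, membership is granted there; for example (the login and user names are illustrative):

USE msdb;
GO
-- The login needs a user in msdb before it can be added to an Agent role
CREATE USER Delaney FOR LOGIN Delaney;
GO
EXEC sp_addrolemember @rolename = 'SQLAgentUserRole', @membername = 'Delaney';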

Case Study: Specifying Proxies


SQL Server Agent lets you specify a proxy for individual job steps. In this case, the job
will run under the credentials of the proxy instead of the owner of the job or the SQL
Server Agent service.
Only sysadmins can add, edit, or delete proxy accounts, so there is no security to delegate
here. However, the proxy is restricted to a particular subsystem of SQL Server, so the
sysadmin should decide to which subsystem(s) the proxy has access. These are the
subsystems:
• ActiveX Scripts
• Operating System Commands
• Replication Distributor
• Replication Merge
• Replication Queue
• Replication Snapshot
• Replication Transaction-Log Reader
• Analysis Services Command
• Analysis Services Query
• SSIS Package Execution
• Unassigned
As with roles, you should create proxies for different types of jobs or actions that they need
to perform. However, they shouldn’t have more permissions than necessary to complete a
particular job step. If a series of steps requires access to Integration Services packages only,
and another series of steps accesses the operating system only, you shouldn’t create a proxy
with rights to both systems and use it in both cases. Instead, create two separate proxies.
Your policy for creating proxies should also seek to minimize the rights required for
each job and to share proxies in job steps only insofar as they’re doing the same work as
another job step.

■ Designing .NET Assembly Security

THE BOTTOM LINE: Starting with SQL Server 2005, there has been a huge increase in the capabilities of stored procedures and functions. This involves the integration of the .NET Common Language Runtime (CLR) with the database server. This lets you use any .NET language, from C# to VB.NET to Perl.NET, to write a series of methods that can be wrapped in a function or stored procedure and called in any T-SQL batch.

The assemblies that can be called from within SQL Server must first be registered in the SQL Server using the CREATE ASSEMBLY command, similarly to the way you must register DLL code with Windows. This registration command allows the user to specify how the security for these assemblies is controlled by the server. There are three levels of security for assemblies: SAFE, EXTERNAL_ACCESS, and UNSAFE.
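Registration and the permission set are specified together; a minimal sketch, with an illustrative assembly name and file path, is:

CREATE ASSEMBLY SalesCalculations
FROM 'C:\Assemblies\SalesCalculations.dll'
WITH PERMISSION_SET = SAFE;
GO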

Setting SAFE

SAFE assemblies are completely written in managed .NET code and are intended to
access resources only within SQL Server. Computations and business logic can be per-
formed with data in tables, but there is no access outside the SQL Server, including the
Windows host file system and API calls.

This is the most restrictive level of security for a .NET assembly. If requirements dictate that
a .NET assembly perform the computation on only SQL Server data, this is the level of secu-
rity you should set. There are some restrictions on the assembly, such as the fact that it must
be type-safe and a few other limitations on the programming capabilities allowed.
If the CREATE ASSEMBLY permission is granted to a user, then that user can create assem-
blies with this level of security.

Setting EXTERNAL_ACCESS

Assemblies that must access resources outside of SQL Server, such as the Windows host
file system, the network, the local registry, or web services, are usually secured with
EXTERNAL_ACCESS security level. These assemblies are still written as managed code
and must be typesafe, but they can make limited access outside of the SQL Server. These
assemblies can access memory buffers owned by the assembly in the server.

If an assembly requires this level of access, then this is the preferred level of security because
there are still restrictions on the programming of the code. The login creating this assembly,
however, requires the EXTERNAL_ACCESS ASSEMBLY permission in addition to the
CREATE ASSEMBLY permission.

Setting UNSAFE

UNSAFE assemblies are completely unrestricted by SQL Server and can access any resource
on the local machine or the network. These assemblies can access memory buffers in the
SQL Server process space and call unmanaged code, such as legacy COM components.

There are virtually no restrictions on the type of code that can be called from an UNSAFE
assembly, which can severely affect the stability of SQL Server. Because of the potential issues,
only a member of the sysadmin server role can create an UNSAFE assembly.
It’s recommended that you require extensive code reviews by very experienced developers
before allowing UNSAFE assemblies on your server.

SKILL SUMMARY

Developing security policies at the SQL Server service level is much more complicated in SQL
Server 2005 and 2008 than in previous versions. With a change in the paradigm of how the
server security is structured, the administrator must better understand the new capabilities
and the ramifications of using them.
The login and role structure is similar to previous versions, but changes allow more granular
control of permissions; administrators should study these areas. This is
especially true for the SQL Server Agent permissions and roles.
Some of the new structures, such as DDL triggers, endpoints, and .NET assemblies, mean that
the designer of a security policy must address new areas. Doing so requires extensive work to
understand how these new features work from a security standpoint.
This lesson has broken down SQL Server security to those areas that affect the overall server.
This completes two thirds of the security structure for SQL Server. The next Lesson will discuss
the most granular security, at the object level.
For the certification examination:
• Understand the different types of SQL Server logins. You should know different types
of logins available in SQL Server, Windows-authenticated logins, and SQL Server–
authenticated logins, as well as their differences.
• Understand the server roles available in SQL Server. Know the different server roles,
including their capabilities and their restrictions.

• Understand database users. You should understand what a database user is and how it’s
mapped to other principals or securables.
• Understand schema concepts. Know what a schema is and its role in security management.
• Understand database roles. Know the three types of roles in SQL Server, understand the
differences, and know when to use them in your security design.
• Know how to work with endpoints. Understand what they are and how they impact security.
• Understand what DDL triggers are. These triggers are different from regular triggers and
you need to understand what they are and how they work.
• Know how to secure SQL Server Agent jobs. The SQL Server Agent subsystem runs along-
side the other SQL Server services, and you must understand how this job system impacts
the overall security of the database platform.
• Understand how to secure .NET assemblies. You should understand how security works in
regard to .NET assemblies and the implications of the various settings.

■ Knowledge Assessment

Case Study
Herd of Two
Herd of Two stables is a large facility that boards and trains horses. It exists on 400 acres
in Colorado and employs 12 administrative and technical people to handle its comput-
ing infrastructure. In addition, 32 other stable hands have requirements to interact with
terminals to use the time-card system and to track horse care.

Planned Changes
Herd of Two stables is currently running one SQL Server 2000 instance on Windows
2000 for its three applications. It would like to add two additional instances on separate
servers to improve performance and upgrade to SQL Server 2005 at the same time.
All the servers need to be secured properly. The stable hands use SQL Server logins
because they don’t have accounts set up in the AD; however, they need to conform to
the password policy present on the domain.
There is also the need to use encryption to protect the personal information of clients
along with their credit-card billing information.

Existing Data Environment


Currently, three applications are used on SQL Server 2000. One is the financial system
to handle all billing and accounting functions, which is accessed by only a few employees.
The second application is the horse-care system, which contains the instructions for
feeding, medicating, and training. This application is accessed by almost all employees.
There is also an application for tracking time worked by employees, which is connected
to hardware devices that print time cards. These devices are programmable and run a
small client application that logs in to SQL Server to record employee IDs and time
notations.

Existing Infrastructure
The current servers are running Windows 2000. There are enough licenses to cover new
servers, so it’s assumed Windows 2000 will be installed on the new servers.

The hardware for the new servers as well as the existing server meets the requirements
for SQL Server 2005.

Business Requirements
All employees using SQL Server logins must abide by the password policy, which is set
on each machine using AD Group Policy.
The barn foreman has requested that he be notified whenever a new stable hand’s login
is added to the SQL Server to be sure they have been trained properly before receiving
access.
It’s requested that a high-availability system be set up between two of the servers for the
horse-care system, but no money is available for a clustering solution.
The barn foreman has requested the ability to perform backups of the horse-care data-
base during the afternoon when all horses have received their medication.

Technical Requirements
The applications can be altered to handle the encryption needs; however, the hardware
can’t handle large key lengths.
The SQL Servers must run various jobs on demand. The IT staff isn’t always available,
so a secure solution is desired that will allow the barn foreman to execute a few jobs on
the horse-care system server.
Some enhancements to the three systems are planned using the CLR integration.
Various assemblies will need to be installed on the servers in a secure manner.

Multiple Choice
Circle the letter or letters that correspond to the best answer or answers.
Use the information in the previous case study to answer the following questions.
1. The new servers will be installed with Windows 2000 and SQL Server 2005. What type
of password requirements will be enforced?
a. Password expiration
b. Password policy (length, content)
c. Both of the above
d. None of the above
2. Which of the following algorithms would be best suited to provide encryption without
taxing the server hardware?
a. RSA 1024
b. AES_256
c. DES
d. Triple DES
3. To allow the barn foreman to run certain jobs on one SQL Server 2005 server, which
role should they be assigned?
a. sysadmin
b. SQLAgentOperatorRole
c. SQLAgentReaderRole
d. SQLAgentUserRole
4. One of the assemblies that will be used to enhance the horse-care system requires access
to read an RSS feed from an Internet Web site. What level of permissions should it be
installed with?
a. SAFE
b. UNSAFE
c. EXTERNAL_ACCESS
d. UNLIMITED
5. When the new servers are installed, database mirroring will not be enabled. Should the
database mirroring endpoint be created to prepare for the future activation?
a. Yes
b. No
6. What would be the preferred method of ensuring that new logins are sent to the barn
foreman?
a. Build an auditing routine into the application, and use it to add all logins.
b. Load the Windows security log into a table at the end of each day, and search for the
creation of a new login.
c. Use a paper form for new logins that requires the foreman’s signature before a login is
created.
d. Create a DDL trigger that responds to the server CREATE LOGIN event. Use it to
send e-mail to the foreman.
7. To allow the barn foreman to back up the horse-care database, what role should he be
assigned?
a. sysadmin
b. backupadmin
c. db_backupoperator
d. db_owner
8. Enhancements to the horse-care application will require tables in the financial database.
However, the users of the financial database should not access the horse-care tables
directly. What security measure should be used to easily assign permissions?
a. Server roles
b. Schema separation
c. Fixed database roles
d. Application roles
9. Currently, all users of the time-card application can access all tables and stored proce-
dures in that database. However, future enhancements are planned to limit access to
stored procedures only. What role should be assigned to the users of this application?
a. db_owner
b. A user-defined role with specific permissions
c. db_datareader
d. db_datawriter
10. The president of the company is concerned about being able to connect to the serv-
ers at any time. Application security is handled in the applications, and the president is
assigned to the user-defined roles in SQL Server 2005 for access. She has no server roles
assigned. She would like to use the Dedicated Admin Connection to be sure she can
always connect. Can you grant rights to this endpoint to her login?
a. Yes, with GRANT CONNECT ON DAC To <login>.
b. Yes, by assigning the login to the sysadmin role.
c. No, there is no way to do this.
LESSON 7
Designing SQL Server Object-Level Security
L E S S O N S K I L L M AT R I X

TECHNOLOGY SKILL EXAM OBJECTIVE


Develop object-level security. Foundational
Design a permissions strategy. Foundational
Analyze existing permissions. Foundational
Design an execution context. Foundational
Design column-level encryption. Foundational

KEY TERMS
assembly: A managed application module that contains class metadata and managed code
as an object in SQL Server. By referencing an assembly, common language runtime (CLR)
functions, CLR stored procedures, CLR triggers, user-defined aggregates, and user-defined
types can be created in SQL Server.
common language runtime (CLR): A key component of the .NET technology provided by
Microsoft that handles the actual execution of program code written in any one of many
.NET languages.
data control language (DCL): A set of SQL commands that manipulate the permissions that
may or may not be set for one or more objects.
execution context: Execution context is represented by a login token and one or more user
tokens (one user token for each database assigned). Authenticators and permissions control
ultimate access.
permission: An access right to an object controlled by GRANT, REVOKE, and DENY data
control language commands.

The purpose of SQL Server, or any relational database system, is to store data and make
it available for use by applications. All the security that is built into SQL Server is for the
purpose of protecting and ensuring only authorized access to your data. This Lesson looks
at the lowest level of security—the objects. If you have to name it, it’s an object; if it has a
name, it’s an object.
Object is a general term for all the entities inside a database that store or interact with
your data. These include tables, views, stored procedures, functions, assemblies, and more.
The users who have access inside a database require permissions in order to use these
objects to read and write data.


■ Developing a Permissions Strategy

THE BOTTOM LINE
Before you can assign permissions to the objects in your database(s), you must have some
type of strategy for securing the objects and the data they contain. This strategy should
define at a high level how you'll handle the requirements for your applications.

The first recommendation that every SQL Server administrator should implement is to
adhere to the basic tenet of security and not assign any more access than necessary to perform
a job. This ensures that no one, whether accidentally or maliciously, accesses data or functions
they aren’t supposed to access. You control access with permissions.
The use of roles for assigning permissions is another part of a good permissions strategy.
Whether these are fixed roles or user-defined roles, managing permissions is greatly eased by
using roles for all permission assignments and not assigning any permissions to individual
users. Security is a hard process, and by making it easier to administer, it’s more likely that
your security decisions will be enforced and maintained.
This use of roles includes administrative functions. SQL Server allows you to divide the
administrative tasks among a larger number of people without compromising the server’s
overall security by granting everyone sysadmin permissions. It’s recommended that if your
company uses multiple people to handle various tasks, you should ensure they aren't granted
more permissions than necessary.

X REF
In Lesson 6, you examined the administrative roles for the server, databases, and SQL
Server Agent.

The next part of your strategy should address whether any permissions are assigned to the guest
user or the public account. It's highly recommended that you don't assign rights to either of
these objects and that you instead explicitly set up other roles to meet your needs. Because these
are shared objects among all users, it's harder to remove permissions from them if necessary.
Some administrators have avoided using application roles because doing so requires that an
application be able to execute a stored procedure and switch permissions. However, this is a
great way to ensure that a specific application is used to access data. If you can control the
way the application executes stored procedures on login, this is a good choice. If not, then
you may want to design a policy that forbids the use of this type of role.
Last, you need to determine the degree to which you’ll allow the object-control permis-
sions such as Alter, Create, and Drop for different objects or schemas. Most users don’t need
these permissions, nor are they warranted, because most users won’t change the structure of
database objects. However, you may have specific groups of developers or application admin-
istrators who need the ability to use these permissions. Your strategy should specify the cases
in which these permissions will be granted or which objects or schemas you’ll let specific users
change. However, as with individual object permissions, you should use roles to easily assign
permissions to and remove them from users by moving the users in and out of roles.

TAKE NOTE*
In assigning data-access permissions and control permissions, you should use separate roles
for each.

Your policy should strike a balance between being as granular as necessary to meet the
business needs of your company and making the role assignment simple enough to ensure it
can be administered. The extreme level would create a role for each user, which would be an
unnecessarily complex administrative burden. Instead, you should assign a role to each major
job function and grant the appropriate permissions to the role. If there are exceptions for an
individual or a subset of the job’s members, create a role for the exceptions and then use a
DENY approach to remove the permissions they shouldn’t be assigned. This way, a single role
retains the permissions to a large class of objects.
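A hedged sketch of that pattern follows; the role names are illustrative, and the schema and
table come from an AdventureWorks-style database:

-- One role per job function; permissions go to the role, never to individuals
CREATE ROLE SalesManager;
GRANT SELECT, INSERT, UPDATE ON SCHEMA::Sales TO SalesManager;
EXEC sp_addrolemember 'SalesManager', 'Bob';

-- Exceptions get their own role, which carries only the DENY
CREATE ROLE SalesManagerJunior;
DENY UPDATE ON Production.ProductInventory TO SalesManagerJunior;
EXEC sp_addrolemember 'SalesManagerJunior', 'Bob';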

LAB EXERCISE
Perform Exercise 7.1 in your lab manual.
In Exercise 7.1, you'll assign permissions to a user-defined database role.
Understanding Permissions

A number of permissions can be assigned to different objects, each with different mean-
ings that affect how your security plan applies to the database users. In this section,
you’ll briefly look at the various permissions and the objects to which they apply.

Before moving to permissions, you need a basic understanding of SQL Server terminology
related to security. The permissions assignment involves two entities. A principal is a user,
group, or process that requests some service. These are the individuals who receive permission
to perform some action or request a service. A securable is an object on which some
permission is granted. For example, in Exercise 7.1, the SalesManager group is a principal
that received the Select permission on the HumanResources.JobCandidate securable.
To apply a permission to an object, or to remove it, the administrator uses the Data Control
Language (DCL) commands. There are many DCL commands that are used to manage
access to SQL Server. The three DCL commands related to permissions are the following
commands:
• GRANT. The GRANT command adds the permissions listed to the permission list
of the affected user or role. This command is used when you wish to let a principal
receive new permissions. In Exercise 7.1, you added the Select permission on the
HumanResources.JobCandidate table to the SalesManager role.
You can use the WITH GRANT option with this command. Doing so allows the
principal that receives the permission to in turn assign it to others.
CERTIFICATION READY?
You elect to apply the REVOKE permissions to all objects in your database to user Nancy.
What are her effective permissions for these objects?

• REVOKE. REVOKE is the opposite of GRANT. It removes a permission on an object
from a principal. You use this command when a particular principal no longer needs the
specified permission on the securable. The lack of a permission leaves the object in an
indeterminate state: access has been neither granted nor denied. The combination of the
user's and their roles' permissions resolves this ambiguity; if no permission has been
granted through any of them, SQL Server denies access.
objects?
as any users to whom they have subsequently granted permissions.
• DENY. The last permission command is DENY, which prevents the principal from
having the permission specified on the securable. Unlike REVOKE, this command doesn’t
require permission on the object to have been previously granted. Instead, this command
prevents access whether the principal currently has the permission or is assigned it in the
future. The DENY permission overrides any other permission assignments.
This command is often used when overall permissions are granted to a role, but specific
securables contained in that role must not be accessed by a subset of the role members.
For example, suppose that all the sales managers should be allowed to SELECT from the
Production.ProductInventory table, but the junior managers (Bob and Steve) shouldn’t
be allowed to UPDATE this table. Rather than creating a separate role for one group of
sales managers, you can use the DENY command to remove the ability to update data
from those two users only.
CERTIFICATION READY?
Know the difference between GRANT, REVOKE, and DENY.

Using GRANT and DENY lets you easily implement the permission policy that fits your
business needs. As discussed earlier, using roles at the highest level ensures that your policy is
easy to administer. Using GRANT to assign permissions at that level and DENY to selectively
exclude individuals is the most efficient way to manage permissions. As your schema and
business needs change, you can use REVOKE to remove permissions that no longer apply.
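For example, the three commands and their options might be combined along these lines; the
object names echo the examples used in this Lesson, and the junior-manager role is hypothetical:

-- Grant access to the role, optionally letting it re-grant the permission
GRANT SELECT ON HumanResources.JobCandidate TO SalesManager WITH GRANT OPTION;

-- Selectively exclude a subset of members through a second role
DENY UPDATE ON Production.ProductInventory TO SalesManagerJunior;

-- Remove a permission that no longer applies, along with any re-grants made from it
REVOKE SELECT ON HumanResources.JobCandidate FROM SalesManager CASCADE;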

Applying Specific Permissions


The particular permissions you apply will depend on the business needs of your
applications and database. The permissions that are applicable to objects are listed next,
along with a brief description to assist you in deciding which permissions should apply
to different types of objects.

You need to understand that the permission sets are hierarchical in nature. Permissions on a
container object, such as a database or schema, imply permissions on the objects contained
inside, such as the schemas inside the database or the objects inside a schema:
• Alter. The Alter permission applied to individual objects includes all permissions except
the ability to change ownership. These permissions include the ability to create and drop
objects. If granted on a scope, such as schema, the principal can create or drop objects
within the scope.
• Alter Any. This form of the Alter permission applies to either server-level or
database-level securables, such as logins or users, respectively. The principal receiving this
permission can create, alter, or drop any securable in the scope.
• Backup. The Backup permission supersedes the Dump permission and allows the
principal to perform a backup on the database.
• Control. The Control permission is equivalent to assigning ownership of the securables.
All available permissions are granted to the principal, and the principal in turn can grant
those permissions to others.
• Create. The Create permission allows the principal to create new objects of the type
specified in the assigned scope, server, or database.
• Delete. The Delete permission lets the principal remove data from tables, views, or
synonyms.
• Execute. The Execute permission allows the principal to invoke stored procedures,
functions, or synonyms.
• Impersonate. The Impersonate permission can be granted at the login or user level and
allows the principal to change their security context to that of the assigned user or login.
• Insert. The Insert permission applies to tables, views, and synonyms and allows the
addition of data to those objects.
• Receive. The Receive permission allows (or denies) the receipt of messages from a Service
Broker queue.
• References. The References permission is required to access another object for the pur-
pose of verifying a primary- or foreign-key relationship. This applies to scalar and aggre-
gate functions, Service Broker queues, tables, views, synonyms, and table-valued functions.
• Restore. The Restore permission supersedes the Load permission and allows a backup to
be applied to a database in a restore operation.
• Select. The Select permission lets a principal query a particular object and return the
data from a table, view, table-valued function, CLR function, or synonym.
• Take Ownership. This permission is similar to the Windows ACL permission and allows
the principal to change ownership to themselves for the objects on which it’s granted.
• Update. The Update permission confers the ability to change the individual values of
data in tables, views, and synonyms.
• View Definition. This permission is new to SQL Server 2005 and provides more
granular control by allowing access to the metadata about a particular class of object.
Without this permission, the metadata definition of an object isn’t available.
The granularity for most of these permissions is the individual object level with the exception
of the Select, Insert, Update, Delete, and References permissions. These permissions can be
assigned to individual columns if your business needs dictate this capability.
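A brief illustration of column-level assignment, assuming AdventureWorks-style names:

-- Members of SalesManager may read only these two columns
GRANT SELECT (NationalIDNumber, BirthDate) ON HumanResources.Employee TO SalesManager;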

■ Analyzing Existing Permissions

Before you implement the policy you’ve set up and make changes, you should examine
the existing structure of permissions to ensure that any changes you require won’t result in
THE BOTTOM LINE
application issues. You also need to determine if there are conflicts with your policy and the
existing permissions and mitigate these issues.

If you aren’t setting up a new application or database, then you probably have existing
permissions in your database(s). You must examine two different types of object permissions:
the specific object-access permissions such as Select, Insert, Update, Delete, Execute, and
others that are used for accessing data; and the controlling permissions, such as Alter, Create,
Drop, and similar commands that convey control over an object.
The most common types of permissions are the data-access permissions granted on individual
objects. These permissions are shown in Table 7-1 along with the objects to which they can be
granted. The best practice is to assign these permissions to roles only, not to individual users.

Table 7-1
Object permissions
PERMISSION          OBJECTS TO WHICH IT APPLIES
Select              Tables, views, table-valued functions, CLR functions, and synonyms
Insert              Tables, views, and synonyms
Update              Tables, views, and synonyms
Delete              Tables, views, and synonyms
References          Scalar and aggregate functions, table-valued functions, Service
                    Broker queues, tables, views, and synonyms
Execute             Procedures, scalar and aggregate functions, synonyms
Receive             Service Broker queues
View Definition     Scalar and aggregate functions, Service Broker queues, tables,
                    views, and synonyms

When you’re assigning object permissions, include the security choices with the CREATE
TAKE NOTE
* or ALTER scripts used to develop the objects.

Unfortunately, SQL Server Management Studio doesn’t include any easy tools to see all the
object permissions for a role. However, a great deal of information is available using functions
and views. You can determine the permissions for a role or a user by executing the
fn_my_permissions function under the context of the user or role. Changing the execution context of
the current user or login is discussed in the next section.
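For example, the following sketch lists the object permissions held by the current principal and
then repeats the check in the context of a hypothetical database user (EXECUTE AS and
REVERT are covered in the next section):

-- Permissions held by the current user on one securable
SELECT * FROM fn_my_permissions('HumanResources.JobCandidate', 'OBJECT');

-- Repeat the check as another database user
EXECUTE AS USER = 'SalesManagerUser';
SELECT * FROM fn_my_permissions('HumanResources.JobCandidate', 'OBJECT');
REVERT;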
CERTIFICATION READY?
Know the different object permissions and the object types to which they can apply. Can
Execute be applied to a Service Broker queue? How about the View Definition permission?

The process of reconciling the existing permission set against the policy that has been
determined is tedious. It's best tackled on a role-by-role basis, working from the new policy
and checking the permissions for the objects and securables against what is currently assigned.
As you identify missing permissions, you can use the GRANT command to add them to either
new or existing roles as specified in the policy.
If you encounter permissions that are currently granted on securables but shouldn't be in
the new security policy, you have a choice of how to proceed. If the permissions are at a
gross level, meaning an entire role currently has permissions it shouldn't, then you can use
REVOKE to remove these permissions. If this situation is at a lower level, such as a user or
users who should have permissions separate from the larger role, then you can create a role for
this subset that contains the DENY permission at the appropriate level.

■ Specifying the Execution Context

THE BOTTOM LINE
Execution context specifies the way in which permissions are checked on statements and
objects.

By default, when a login connects to a SQL Server database, that login is used to determine
which permissions the user is assigned and which objects that user can access. However, SQL
Server includes the ability to change your context to that of another user, enabling you to
receive additional—or, potentially, reduced—permissions for the batches you execute.
Previous versions of SQL Server provided a limited ability to change your execution context
with the SETUSER command. This was a handy tool for system administrators, but it wasn’t
useful for the average login because it was limited to sysadmin or db_owner roles. Because some-
times you need to escalate permissions for a single function, SQL Server has the EXECUTE AS
statement that allows for permission escalation by temporarily changing the user context.

TAKE NOTE*
Microsoft has deprecated SETUSER, so your policy should recommend that it be replaced
in code wherever possible.

This command can be used in two cases, and you should address both with decisions
governing its use. In the first case, a particular function or stored procedure executes in a
specific login’s context. The second case is for longer batches or sessions where a series of
commands are executed as another user. These two situations are discussed next.

Implementing EXECUTE AS for an Object

The first case for switching your execution context occurs when an object contains the
EXECUTE AS statement as part of its definition. The object can be a stored procedure,
a function, a Service Broker queue, or a trigger. Each of these objects can have the execu-
tion context specified in one of four ways:

• EXECUTE AS CALLER. The behavior of an object is just as it is in previous versions


of SQL Server when this is specified. The execution context of the module being called
is set to that of the caller or login invoking the module.
In this situation, the permissions are checked on the module and its referenced objects
using the security token of the login or user executing the module. No additional
permissions are added to or removed from the session.
• EXECUTE AS <user_name>. In this case, <user_name> should be replaced with the
name of a database user. When the module is invoked, the permissions for the caller are
checked only to ascertain whether the caller can execute the stored procedure. After that,
permissions for any objects inside the module, whether in the same ownership chain as
the module or not, are checked against the username specified, not the caller. This allows
you to specify permissions for the module that may be different than those of the caller
or any other user in the database.

• EXECUTE AS SELF. This context is similar to the EXECUTE AS <user_name> context,


but it uses the context of the user creating or altering the module. SELF, in this case,
applies to the user that is executing the CREATE or ALTER statement on the module.
As an example, Steve is creating the NewSchema.MyProcedure stored procedure. The
code is as follows:
CREATE PROCEDURE NewSchema.MyProcedure
WITH EXECUTE AS SELF
AS
SELECT * FROM Steve.MyTable
Steve then grants Dean permission to execute this stored procedure. When Dean executes
it, permissions are checked to be sure he can execute the module, but the permissions check
on Steve.MyTable uses Steve's permission set.

CERTIFICATION READY?
Know the forms of the EXECUTE AS command and be prepared to identify how the use of
this command would alter the execution context.

• EXECUTE AS OWNER. This context uses the permission set of the module owner for
all objects referenced in the module. If the module doesn't have an owner, the owner of
the module's schema is used instead.
This is similar to EXECUTE AS SELF if the person creating the module is the same as
the owner at execution time. However, because object ownership can be changed, this
context allows the permission check to move to the new owner of the object.

TAKE NOTE*
In the three cases where execution context is set to a particular username, that user can't be
dropped until the execution context is changed.

Case Study: Developing an EXECUTE AS Policy for an Object

These are all powerful features that allow you to temporarily assign a different set of
permissions to a user by allowing them to execute a module. These permissions don’t
carry through—for example, executing a permission on a module calling the Sales table
doesn’t grant permission to access the Sales table.
This limitation is useful when you want to let users access cross-schema objects, but you
don’t want to grant them explicit rights. Just as with schemas, implications exist that
can cause issues in administering security.
Because users tend to change more often than permissions or objects, you use tech-
niques that allow for this flexibility. In assigning permissions, you use groups and roles
to collect users together for easy administration. Starting with the 2005 version of SQL
Server the concept of a schema has been available. The schema separates object owner-
ship from individual users for the same reason. And this should caution you against
using a particular user or SELF to change execution context: Because a one-to-one
mapping exists between the user and a module, if the user needs to be dropped, every
module must be altered to change the execution context. This is the same administrative
issue that arises when a user directly owns an object and effectively acts as its schema.
Instead, if you need to grant temporary permissions, the EXECUTE AS OWNER
statement is the best choice if the permissions for the owner are set appropriately for the
referenced objects. However, this can still cause issues if the administrator doesn’t want
an object’s owner to have the extended permissions.
The best policy you can implement is to create specific users that are in a role expressly
created to meet your permissions needs. These users shouldn’t map to a user login, but
rather should exist only to execute the modules requiring special permissions.
If you think your environment is static enough to use individual users, then EXECUTE
AS is a good way to change permissions in only one module.

Implementing EXECUTE AS in Batches

You can also use the EXECUTE AS statement as a stand-alone command that changes
the entire context of the session to that of the user specified. In this case, the use of the
statement is as follows:

USE MyDB
GO
EXECUTE AS USER = 'Kendall'
SELECT * FROM dbo.MyTable
. . . (more statements)
If Steve logged on to the SQL Server, the first statement sets the database context. The
next line changes the security context to that of Kendall. From that point forward, all the
statements that are executed have their permissions checked as if Kendall had logged on and
were executing them.
To use this statement, the calling login or user must have Impersonate permissions on the login
or user named in the command. This entails assigning that permission to those users or logins
that will need to change their context. As with other permissions, a database role is the best vehi-
cle for assigning permissions for user-level context switches. For login-level changes, you must
use individual permissions, although you can use Windows groups if you have Windows logins.
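A sketch of those grants, using illustrative principal names alongside the Kendall user from the
preceding example, might look like this:

-- User-level impersonation: grant to a database role where possible
GRANT IMPERSONATE ON USER::Kendall TO DevTesters;

-- Login-level impersonation: granted to individual logins in master
USE master;
GO
GRANT IMPERSONATE ON LOGIN::[CORP\Kendall] TO [CORP\SteveJ];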
All statements are executed with this new security context until one of the following events
occurs:
• The session ends.
• Another EXECUTE AS statement is run.
• The REVERT command is issued.
The behavior of this command is similar to that of a trigger in that the calls to EXECUTE
AS nest themselves on a stack. The REVERT command returns you to the previous execution
context by default. You’ll look at the REVERT command in more detail after examining the
options for EXECUTE AS.
When you change execution context with EXECUTE AS, a few options are available that can
give you more control over the security of your data:
Scope of the EXECUTE AS Statements. You can change context in one of two ways: at the
user level or at the login level. By choosing one of these, you define the scope of the
impersonation that takes place. A login is a security object that exists at the server level and,
like the fixed server roles, spans all databases. If you change your context to that of a login,
then it's as if you logged on to the server as that login; the settings follow even if you change
databases.
The syntax for this option is as follows:
EXECUTE AS LOGIN = <login name>
The user scope is only within the database in which it’s invoked and the user exists. This
prevents the inadvertent granting of permissions in other databases that the impersonated user
may have rights to access. This extends to USE <database> statements and linked server or
other distributed queries as well. Any of these cross-database statements will fail.
The syntax for this option is as follows:
EXECUTE AS USER = <user name>
NO REVERT. The NO REVERT option prevents the return of execution context to the
previous user or login. This is similar to an application role in that the session remains in the
context of the new user until the session is dropped. In essence, this command clears the stack
of execution contexts and prevents the return to any prior context.
If the REVERT command is run after this option has been specified, it has no effect. To
invoke this option, use the following syntax:
EXECUTE AS USER = <user name> WITH NO REVERT

NO REVERT COOKIE. As with many things in SQL Server, there is an exception to the
NO REVERT option. The execution context can be stored in a varbinary variable and used
to return to the previous execution context. This option allows the client to maintain the data
needed to restore the previous context.
The syntax to invoke this option is as follows:
DECLARE @cookie varbinary(100);
EXECUTE AS USER = <user name> WITH COOKIE INTO @cookie;
To restore the context, you execute the following, assuming the @cookie variable contains the
correct cookie from the previous statement:
REVERT WITH COOKIE = @cookie
This type of statement ensures that the execution context can be reversed only by a client that
knows the correct cookie value. In connection-pooling statements, this can prevent another
client from changing its context without knowing this value.

Auditing

It’s important to consider the auditing aspects of changing context. Once users change
their context, many of the functions associated with auditing return the name of the
new context, not the original login. The debate over which login or username should be
recorded may be philosophical, but the requirements of many enterprises dictate that the
auditing should be traceable back to an actual physical user, not an account.

Fortunately, there are a few ways in which you can access the underlying information about the
original authenticated user. The ORIGINAL_LOGIN() function returns the name of original
login (either Windows or SQL Server authenticated). This can be used in place of the USER_
NAME() or SUSER_SNAME() functions, which return the current context, not the original user.
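For instance, the following query returns all three values; the switch to the Kendall user mirrors
the earlier example:

EXECUTE AS USER = 'Kendall';
SELECT ORIGINAL_LOGIN() AS OriginalLogin,   -- the authenticated login, suitable for auditing
       SUSER_SNAME()    AS CurrentLogin,    -- reflects the switched context
       USER_NAME()      AS CurrentUser;     -- also reflects the switched context
REVERT;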
If you’re using Profiler to trace the execution of an application, Profiler includes a column
named SessionLoginName, which isn’t visible or selected by default. This column contains
the value of the original login to the server; to see it, check the Show all columns check box
on the Events Selection tab, as shown in Figure 7-1.

Figure 7-1
Events Selection tab of the
Trace Properties dialog box

The default Profiler selections of NTLoginName and LoginName will change depending on
the context switch. This can lead to problems in providing a well-defined auditing trail.

LAB EXERCISE
Perform Exercise 7.2 in your lab manual.
In Exercise 7.2, you'll change the execution context.

Developing an EXECUTE AS Policy for Batches

Using this command is much different from using it as part of a module. A user who
can change context can execute any command that the new context has the appropriate
permissions to run. Similar to the use of ad hoc SQL or dynamic SQL with EXEC(),
this has the potential to be a large security risk. An open-ended list of possible com-
mands that can be run invites the possibility of a vulnerability or misconfigured permis-
sion set being left available for malicious or inadvertent use.

In general, the use of EXECUTE AS for batch queries is best suited to testing and simulation
by administrators and developers. By changing context to that of a regular user, a developer
can easily and quickly determine whether the application will perform properly for a real
user. The permissions are checked, procedures executed, and data queries return just as if the
developer had logged on as that user.
Because the developer can quickly change back to their own context to make a change with
REVERT, this greatly speeds the development process without requiring tedious switching of
windows or applications, logging off and back on, or any of the previous techniques. Because
many companies use Windows Authentication, this also allows the developer to simulate
other logins on a single machine as different users.

TAKE NOTE*
The EXECUTE AS statement works only with individual logins and users. It doesn't work
with groups, roles, certificates, or any built-in accounts such as Local System, Local Service,
or Network Service.

Another way to use this feature in your security policy is to enable users to change context
with the database scope and create your own type of application role. By using the WITH
NO REVERT option, you can duplicate the functionality of the application role. You can
allow individual users to log on to SQL Server for auditing purposes; but by preventing them
from accessing any objects, you can force them to use the EXECUTE AS statement to obtain
permissions to query data.

X REF
You examined application roles in Lesson 6.
Unlike with an application role, you can allow users to switch back to their original context.
This can be useful if you have more than one application and wish to let users switch between
them with different contexts without dropping their sessions.
The final place you should use the EXECUTE AS statement is in checking your permission
policy. A system administrator can use this command to impersonate any other users and
check which objects they can access and which they can’t. This is the best way to ensure that
your permissions policy is correctly designed and implemented.

■ Specifying Column-Level Encryption

THE BOTTOM LINE
The use of encryption creates a large load on your database server's processor because complex
calculations are required to both encrypt and decrypt data. You have two ways to limit the
processing load on the server: Choose keys wisely, and limit the deployment of encryption.

Choosing Keys

X REF
Lesson 5 discussed the overall encryption policy.

The key choice has two parts: the type of key and the use of the key.

As mentioned in Lesson 5, a variety of algorithms can be chosen for keys. These are divided
into two types, symmetric and asymmetric, as well as different lengths, commonly specified as
the number of bits in the key. Longer keys are more secure, but they require more processing
power. The general policy regarding key length is to choose the longest length you can, given
the processing capability your server can handle.
The algorithms have advantages and disadvantages that are beyond the scope of this text. If
you don’t have a regulatory requirement for a particular algorithm, you should research them
to determine which is best suited for your application.
However, the use of each algorithm can greatly affect the performance of your server. The
recommendation is that you use an asymmetric key to secure the symmetric keys, which
in turn encrypt the data. This means a symmetric key is used to perform the encryption
and decryption of your data, because it’s faster with less of a load on the server’s processor.
This key is in turn encrypted by an asymmetric key, which is more secure but requires
greater processing power to perform the encryption operations. The use of certificates
or server-created asymmetric keys is a decision you should make based on your existing
infrastructure. Certificates are more complex and require more administration because they
have expiration dates. Some applications, however, require certificates.
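A minimal sketch of that layered approach follows; the key, certificate, table, and column names
are hypothetical, and the algorithm and password should come from your own policy:

-- The database master key protects the certificate's private key
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Asymmetric layer: a certificate that protects the symmetric key
CREATE CERTIFICATE SalaryCert WITH SUBJECT = 'Protects the salary symmetric key';

-- Symmetric key that performs the actual data encryption
CREATE SYMMETRIC KEY SalaryKey
    WITH ALGORITHM = AES_256
    ENCRYPTION BY CERTIFICATE SalaryCert;

-- Encrypt a column value
OPEN SYMMETRIC KEY SalaryKey DECRYPTION BY CERTIFICATE SalaryCert;
UPDATE dbo.EmployeePay
   SET SalaryEncrypted = ENCRYPTBYKEY(KEY_GUID('SalaryKey'), CONVERT(nvarchar(20), Salary));
CLOSE SYMMETRIC KEY SalaryKey;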
One other method of encryption is available, but the keys aren’t maintained by SQL Server.
If you use the ENCRYPTBYPASSPHRASE() function, then you supply a password that will
be used to encrypt or decrypt the data. In this case, the user or application must supply this
passphrase or password every time an encryption or decryption operation takes place. Because
the keys aren’t secured or recoverable, this method of encryption isn’t recommended.

Deploying Encryption

The second aspect of encryption that you can control is the overall scale of how widely
it will be deployed in your database. The more columns you choose to encrypt, the more
processing power will be required both to store the data in a cipher format and to decrypt
the data each time that column is used in a query.

In addition to the processing requirements, encrypting data involves a few other performance
issues. Columns that are encrypted can’t be efficiently indexed or searched using an index.
This means that primary keys, foreign keys, and columns that will be indexes for heavily used
queries shouldn’t be encrypted. This requires careful consideration because many times the
column that you want to encrypt contains things like SSNs, credit card numbers, and so on
that you want to use as keys.
Both of these reasons should limit the amount of data you encrypt in your database. As you
seek to design an effective encryption scheme for your application, the meaning that can be
gleaned from unencrypted information should be carefully analyzed. Sometimes, encrypting
just a few sensitive columns can provide enough security without adversely affecting
performance.
For example, if you have a table of information that stores employees’ names, titles, and
annual salaries, it doesn’t make sense to encrypt the entire table. If just the salary is encrypted,
then someone reading the table can’t determine an individual’s salary, and the table can still be
easily indexed and searched by name or title.
However, if you choose to encrypt the name, then assuming this isn’t a key field, all the salary
values can be mapped to titles. This approach will disclose to anyone who can read the table a
great deal of information about other employees based on their title. Similarly, if you encrypt
the titles, each person's name can be easily matched with a salary.

CERTIFICATION READY?
Is a certificate an example of an asymmetric key or a symmetric key?

Determining which fields to encrypt is a difficult decision that each administrator must make
based on the actual tables and the data contained in each column. Only by analyzing your
situation will you be able to decide which fields need to be encrypted and balance that against
the performance penalties of using encryption.

LAB EXERCISE
Perform Exercise 7.3 in your lab manual.
In Exercise 7.3, you'll encrypt a column of data two different ways.

■ Using CLR Security

THE BOTTOM LINE
Common language runtime (CLR) is Microsoft's runtime environment technology for
executing .NET program language code. CLR code can be used inside SQL Server, and you
can create a wide variety of functions, stored procedures, and triggers that can meet virtually
any need in a T-SQL query.

One of the most interesting features for developers in SQL Server is the ability to code
modules in any .NET language and execute them in SQL Server. However, managing the
security of these objects is more important than ever, because the impact of these objects can
be seen in queries that may access millions of rows of data at a time.
By default, the ability to execute CLR objects in SQL Server is turned off. An administrator
must make a conscious decision to enable this functionality so that .NET assemblies
registered on the server can be executed.

LAB EXERCISE
Perform Exercise 7.4 in your lab manual.
Exercise 7.4 will walk you through enabling the CLR environment in SQL Server.
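Enabling it is a single configuration change, shown here as a sketch; it takes effect as soon as
RECONFIGURE runs:

-- Allow registered .NET assemblies to execute in this instance
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;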

Creating Assemblies

Once CLR usage is enabled, then you can deploy or install .NET assemblies on the SQL
Server instance and create objects to use the methods inside these assemblies.

These assemblies can be deployed automatically using Visual Studio .NET 2005/2008
or copied and manually added to SQL Server by an administrator. The ability to add an
assembly to SQL Server, however, requires the Create Assembly permission in either case.
This is similar to the ability to create a stored procedure or function. As previously noted,
only sysadmins can create assemblies with an UNSAFE permission set due to the lack of
restrictions on what code is called.

X REF
In Lesson 6, you examined the security levels for .NET assemblies, which can be set at
SAFE, EXTERNAL_ACCESS, or UNSAFE.

Integrating assemblies into SQL Server is slightly more complex than doing so with
procedures or functions because of the nature of a .NET assembly. Each assembly can call
other assemblies; as a result, SQL Server must also load those referenced assemblies. If they
don’t exist, then SQL Server will load them as well.
If the assemblies already exist, then the same user or role must own the assembly, and the
referenced assembly must have been created in the same database. If not, then the creation of
the assembly will fail.

One other note about security when creating assemblies: Because they're loaded from the file
system, the SQL Server service account must be able to access the files on the file system.

Accessing External Resources

As previously mentioned, a .NET assembly’s ability to access a resource outside of SQL


Server is governed by the permission set assigned when the assembly is created. If a per-
mission set isn’t given, it defaults to the SAFE level. Therefore, an assembly must explicitly
be assigned the EXTERNAL_ACCESS or UNSAFE permission set to access resources
outside SQL Server.

When this access occurs, it uses the permission set of the SQL Server service account unless
you use special programming techniques to enable impersonation of another account. These
advanced techniques require calls to the SqlContext.WindowsIdentity API. Consult the .NET
SDK for more information on this topic.
If you use impersonation, the calls must be out of process if they require data access. This
ensures the stability of the process and enables secure data access.
There are other restrictions on EXTERNAL_ACCESS. If an assembly attempts to access
external resources, and it was called by a SQL Server login, the access is blocked and an
exception is thrown. This also occurs if the caller isn’t the original caller. If the caller is a
Windows login and the original caller of the module, then the security context of the SQL
Server service is used, as mentioned at the beginning of this section.

ENABLING TRUSTED ASSEMBLIES


As your applications integrate .NET assemblies, it's likely that some assemblies will need
to reference each other. To do this, the assemblies must be fully trusted if they are signed with
a strong name—that is, unless they've been marked with the AllowPartiallyTrustedCallers
attribute, which lets assemblies call each other without requiring them to be fully trusted.

USING APPLICATION DOMAINS


An application domain inside the SQL Server CLR environment is a bounded environment
inside of which a .NET module executes. This provides isolation between assemblies and their
internal structures.
However, if two or more assemblies are loaded that belong to the same owner, they’re loaded
in the same application domain. This enables the assemblies to discover each other at runtime
through reflection and to call each other in a late-bound fashion. Permissions aren’t checked
when assemblies call each other this way.

USING MODULE SIGNING


Modules in SQL Server, such as stored procedures, functions, triggers, and assemblies, can be
signed with a digital signature from a certificate. The signature associates the module with a
principal mapped to that certificate, whose permissions can then augment those of the callers
who execute the module.
This signing provides a way to temporarily grant greater privileges to a user or role without
switching the execution context. When a module that is signed is executed, the permissions
of the signer are added to the permissions of the caller only for the duration of the module’s
execution. Thus a module with a specific function can perform an action without granting
additional rights to the user, ensuring that the code isn’t changed.
An example is a module that accesses the list of SPIDs running on the server and computes
the blocking chain to determine the root blocker and the affected SPIDs. This is a complex
query in T-SQL, but a .NET assembly can easily compute the result. However, because it
requires access to a system resource under a different ownership chain than the module,
unnecessary rights must be granted for this approach to work. Using module signing, you
can create a module that accesses the system resources under a login assigned to a certificate.
The certificate login will have the permissions to access the system resources. When users or
roles receive permission to execute the module, they will receive the permissions to access the
system resources only as long as the module is being invoked. Once it completes, they will no
longer be able to access the system resources.
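For a database-scoped permission, the pattern looks roughly like the following sketch; all names
are hypothetical, and a server-scoped right such as VIEW SERVER STATE additionally requires
re-creating the certificate in master and mapping a login to it there:

-- Certificate and a user mapped to it; the user holds the extra permission
CREATE CERTIFICATE SigningCert
    ENCRYPTION BY PASSWORD = '<strong password>'
    WITH SUBJECT = 'Signs modules that need extra rights';
CREATE USER SigningCertUser FROM CERTIFICATE SigningCert;
GRANT SELECT ON dbo.AuditLog TO SigningCertUser;

-- Signing the procedure adds those rights to its callers only while it executes
ADD SIGNATURE TO dbo.usp_ReadAuditLog
    BY CERTIFICATE SigningCert WITH PASSWORD = '<strong password>';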

Developing a CLR Policy

The CLR is a complicated environment that allows access to resources inside and outside
SQL Server in many ways. It greatly increases SQL Server’s capabilities and allows new
ways of working with data that were never possible before. It also allows unprecedented
access to the file system, host system resources, and network applications.

As such, it’s important that you develop a strong policy to guide how CLR objects are
integrated into your SQL Server environment to guarantee a secure system. Your policy
should consist of three parts to ensure that you maintain control over the assemblies you
integrate into your database environment. (This assumes you’ll allow the CLR to be enabled
on your server. If not, then that is the policy you should set.)
The first part of your policy uses GRANT, REVOKE, and DENY permissions you apply
to the assemblies to allow objects to be created using them. Because this policy affects the
functions and modules when they’re executed, be sure you limit the rights to execute these
functions to those who need to do so. If one group uses a CLR module to load data from a
web service, don’t grant rights to execute the module to all your users. Create a role, and limit
the execution rights to that group.
The second part of your policy deals with the assembly’s permission set. You should assign
the minimum set of permissions necessary when the module is created. Unless an assembly
truly needs to access resources outside the server, it should have the SAFE permission set
applied. Stringent requirements should be met before you create UNSAFE-level assemblies.
Your policy for developing these types of assemblies should have guidelines for what types
of functions will be granted permissions other than the SAFE level; that level should be the
default for all assemblies unless the need for the other levels is proven.
For those modules that access outside resources, you’ll want to make sure they will work
correctly in a multirow result set. For modules that are designed to work on a single row or a
few fields of data, you may wish to set a policy to prevent their use in queries or updates that
will affect large sets of data.
In addition to setting the permission-set level, you should also be sure you’re aware of the
interactions between assemblies. Utility assemblies contain code required by other assemblies,
but these need to be owned by the same user or schema and in the same database for the
assemblies to reference each other. If they aren’t, the code must be restructured to exist inside
a single assembly. The policy you develop should specify these restrictions to ensure the
resulting application functions as expected when deployed.
In setting this policy, you should also limit who has the right to add assemblies. Ideally, only
administrators should be allowed to create assemblies; if you assign rights to other users, be
sure that it’s a limited group and that each assembly added is carefully documented as to its
purpose and use (especially if it doesn’t use the SAFE permission level).
The last part of your policy that you need to design is the code policy for the assembly.
Although it’s likely that code standards for .NET development exist in most environments,
these assemblies will be called from within SQL Server and potentially called many times for
a single query. The assembly must be built to handle the stresses of executing inside the server
environment and should be coded to ensure that it doesn’t affect the stability of SQL Server.
Your environment may specify requirements for performance or load to ensure an assembly
doesn’t slow the server or adversely affect performance. Because these assemblies will act on
large sets of data that will be used for updates, reports, and business decisions, the modules
need to be thoroughly examined for accuracy in addition to performance.
For assemblies that access resources outside SQL Server, make sure the work being performed
can complete in a timely manner and not affect the performance of other queries or cause
instability with the SQL Server instance. This is especially important if any modules work
directly with memory or the server’s configuration.

S K I L L S U M M A RY

The security in your database is the final level of protection for your data. After logging in,
mapping to a user, and receiving the security tokens for any role memberships, the Select,
Insert, Update, Delete, Execute, and other T-SQL permissions applied to objects must be set to
meet your business requirements, but in as limited a way as possible.
Developing a strong policy is important for maintaining the security of your data, but it must
be applied to be effective. Your policy should be analyzed against the existing permissions and
any deficiencies brought into compliance. A limited number of exceptions may be required
because of preexisting applications or requirements, but they should be kept to a minimum.
SQL Server provides the ability for users to impersonate others and execute commands in a
security context other than their own. A secure SQL Server instance ensures that this is
controlled and limited to those cases where it’s truly needed. The impersonation capabilities
also can affect auditing systems and capabilities, so these functions should be examined to
guarantee that they still perform the functions that are required.
Encryption is a wonderful way to secure your data, but it brings with it a number of perfor-
mance trade-offs. A good policy will find a way to balance the secure control of your data
with the need to meet performance goals.
The CLR integration into SQL Server is an incredible capability that provides almost unlimited
ways to manipulate and analyze your data. However, it can drastically reduce the security
of your instance if controls aren’t developed around the assemblies you allow on your
server. This is especially true of any CLR access outside of the SQL Server instance. A strong
policy should be developed early on to eliminate the introduction of new points of attack or
instability on the database server.
For the certification examination:
• Understand how to design a permissions strategy. You should know the different parts of
the permissions strategy and how to structure your policy in order to meet your security
needs.
• Know the permission-assignment commands. Be sure that you understand the meanings
and use of GRANT, REVOKE, and DENY.
• Understand how to analyze existing permissions. You should be able to analyze existing
permissions and reconcile these against your security permission policy.
• Understand execution context. Be sure that you understand how to determine and change
your execution context and the implications of doing so.
• Know the implications of using the CLR. The CLR environment changes the capabilities of
SQL Server. You should understand the potential impact of these capabilities.

■ Knowledge Assessment

Case Study
Jack’s Steamed Shrimp
Jack’s Steamed Shrimp is a small food chain that specializes in steamed seafood and has
a number of locations in seaside resort towns. Each location has outside tables for diners
to enjoy and a thriving delivery business to the surrounding areas. The company has
grown to more than 20 locations and 250 employees.
Jack’s has developed two applications internally that are used to run the day-to-day
operations of the business. One is designed to handle the point of sale (POS) for the
food sales, both in-store and telephone orders for delivery. The other maintains the
inventory and food orders required to ensure that none of the locations run out of
supplies.

Planned Changes
New applications are being rolled out to replace existing applications that currently run
on the same database. Some schema changes will be included to extend functionality,
but a number of the old objects in both the Inventory and Sales databases will be
maintained.
Some new users and roles will be required, but the existing users will be reassigned new
permissions based on a policy developed for the new applications. Each database will
have two new roles: AppUsers and AppAdmins, with permissions assigned based on the
capabilities of the users.

Existing Data Environment


There are two databases on a single SQL Server 2005 instance: Sales and Inventory.
Each database supports an application that is currently being replaced with an upgraded
version.
Currently, there is a single role in each database that includes each employee’s login
account. Because the terminals running the point-of-sale application are shared, each
employee has a SQL Server login that they use to verify their identity to the application
and server.
The point-of-sale database contains two schemas: the POS schema for the data-entry
tables used in orders and the Supervisor schema for data that is aggregated from the
orders for sales planning and forecasting.
The Inventory database contains a single schema, FoodStore, that contains all objects in
this database.

Existing Infrastructure
The SQL Server 2005 server runs on a Windows 2003 server. All employees have a
domain account for logging in to the applications and various terminals.
The SQL Server 2005 instance was installed with the default options.
Each clerk carries a keycard on which a digital certificate is stored; this certificate
uniquely identifies that employee.

Business Requirements
A number of enhancements have been written in a CLR language to handle a few
complex business requirements.

A number of regular customers keep their credit card numbers on file, and these
must be encrypted.
The developers need to have rights to the tables used for data lookups on the
Inventory system. However, the junior developers shouldn’t have rights to the
detailed inventory tables.

Technical Requirements
The inventory application uses a web service to gather data from business partners
and must access an Internet web server for this data.
Individual clerks need to be able to input the credit card numbers for clients, but
they shouldn’t have access to the tables where this information is stored. It’s decided
that the execution context for the stored procedures that insert the data should be
changed to sales manager.
A development user-defined role is set up in each database. This role is a member of
the db_datareader role.
The POS schema contains two CLR modules: GetSpecials, which builds customized
coupons for returning clients; and CalcRoute, which determines the driving
directions for deliveries.
The Inventory database contains a CLR module called OrderPredictor that
determines whether a product needs to be reordered when it’s called.

Multiple Choice
Circle the letter or letters that correspond to the best answer or answers.
Use the information in the previous case study to answer the following questions:
1. You create an assembly on the server for one of the new .NET modules given to you by
a developer. You create a function for this assembly and assign security rights, but it does
not seem to work. What is wrong?
a. .NET assemblies should be called directly, not through a function.
b. A stored procedure is used to access .NET assemblies.
c. The CLR subsystem is not enabled.
d. The module is not trusted.
2. Which level of permissions should be assigned when creating the assemblies that call the
web service?
a. SAFE
b. UNSAFE
c. EXTERNAL_ACCESS
d. REMOTE
3. You do not want to change the overall rights for the developers’ role because the senior
developers should be allowed to access the Inventory table. What rights should you
assign to the junior developers?
a. REVOKE SELECT ON INVENTORY
b. DENY SELECT ON INVENTORY
c. REMOVE SELECT ON INVENTORY
d. EXCEPTION SELECT ON INVENTORY
4. If developers build a module that will handle the proper casing of customer names
when they are queried for printing on the delivery labels, what permission set should be
assigned this module?
a. UNSAFE
b. EXTERNAL_ACCESS
c. SAFE
d. LOW

5. One of the developers wants to use the CalcRoute module in the POS database from the
OrderPredictor module in the Inventory database to help give suppliers directions to the
locations. Can these two modules call each other?
a. Yes
b. No
6. There is a customer defaults table that contains three fields: the customer code, the
last order, and a credit card for automatic ordering. Which of these columns should be
encrypted for security?
a. All three
b. The last order and credit card number
c. The customer code and credit card number
d. The credit card number
7. One of your developers is building a stored procedure on his test system that will
require elevated privileges for its execution. It is decided to elevate privileges for the
module to that of the owner of the procedure at execution time. What clause should
the developer use when creating the module on his test system to ensure it is deployed
correctly on the production system?
a. EXECUTE AS SELF
b. EXECUTE AS <developer user name>
c. EXECUTE AS OWNER
d. EXECUTE AS CALLER
8. The auditing company was granted temporary access to the POS.CustomerDefaults
table during tax season by issuing GRANT SELECT ON POS.CustomerDefaults TO
Auditors. To remove these permissions, what should you execute for the Auditors role?
a. REVOKE SELECT ON POS.CustomerDefaults
b. DENY SELECT ON POS.CustomerDefaults
c. REMOVE SELECT ON POS.CustomerDefaults
d. GRANT NONE ON POS.CustomerDefaults
9. You wish to allow the senior developer access to the POS.Products table and want to let
him give this permission to other developers without granting him any fixed database
roles. What clause should you use with the GRANT command to achieve this?
a. WITH CASCADE
b. WITH GRANT
c. WITH ALLOW
d. WITH OWNERSHIP
10. After a trial period, you realize that the developers should not have permissions on the
production POS database. So, you want to revoke permissions from the senior developer
along with any permissions he has granted to others. What clause should you use with
the REVOKE clause?
a. WITH REMOVE
b. WITH REVOKE
c. CASCADE
d. FROM ALL
LESSON 8
Designing a Physical Database

LESSON SKILL MATRIX

TECHNOLOGY SKILL EXAM OBJECTIVE

Modify an existing database design based on performance and business requirements. Foundational
Ensure that a database is normalized. Foundational
Allow selected denormalization for performance purposes. Foundational
Ensure that the database is documented and diagrammed. Foundational
Design tables. Foundational
Decide if partitioning is appropriate. Foundational
Specify primary and foreign keys. Foundational
Specify column data types and constraints. Foundational
Decide whether to persist computed columns. Foundational
Specify physical location of tables, including filegroups and a partitioning scheme. Foundational
Design filegroups. Foundational
Design filegroups for performance. Foundational
Design filegroups for recoverability. Foundational
Design filegroups for partitioning. Foundational
Design index usage. Foundational
Design indexes for faster data access. Foundational
Design indexes to improve data modification. Foundational
Specify physical placement of indexes. Foundational
Design views. Foundational
Analyze business requirements. Foundational
Choose the type of view. Foundational
Specify row and column filtering. Foundational


KEY TERMS
constraint: A property assigned to a table column that prevents certain types of invalid data values from being placed in the column. For example, a UNIQUE or PRIMARY KEY constraint prevents you from inserting a value that is a duplicate of an existing value, a CHECK constraint prevents you from inserting a value that does not match a specified condition, and NOT NULL prevents you from leaving the column empty (NULL) and requires the insertion of some value.

database: A collection of information, tables, and other objects organized and presented to serve a specific purpose, such as searching, sorting, and recombining data. Databases are stored in files.

index: In a relational database, a database object that provides fast access to data in the rows of a table, based on key values. Indexes can also enforce uniqueness on the rows in a table. SQL Server supports clustered and nonclustered indexes. The primary key of a table is automatically indexed. In full-text search, a full-text index stores information about significant words and their location within a given column.

object: An object is an allocated region of storage; an object is named; if the database structure has a name, it's an object. Examples include database, table, attribute, index, view, stored procedure, trigger, etc.

table: A two-dimensional object, which consists of rows and columns, that stores data about an entity modeled in a relational database.

view: An object defined by a SELECT statement that permits seeing one or more columns from one or more base tables. With the exception of instantiated views (indexed views), views themselves do not store data.

Think about your house for a moment, then your office, classroom, gym locker, car, and
any other place you habitually haunt. These locations are full of objects you own—such
as clothes, food, DVDs, your copy of this textbook, tools, and so on. Most of your stuff
is probably at your home, but unless you’re severely messy, it’s unlikely that you randomly
toss your stuff into your house and hope you can find it again later.

What you do is try to store your various objects in containers (such as cabinets, dressers, or
bookshelves). More than likely, you also keep similar objects together; for example, your dress
shirts are hung next to one another in the closet, your Star Trek videos are all neatly lined up
on a shelf in some sort of order, and so on.
Why do you organize your objects? Because if you didn’t, you couldn’t find them later, and
if you couldn’t find them, you couldn’t use them. If you can’t use them, what’s the point of
having them? If you don’t know where an object is when you want it, you’ll spend a great deal
of unproductive time trying to find it. These principles also hold true with SQL Server.
SQL Server is full of tables, views, stored procedures, and other objects. When it comes to
your clothes, food, tools, and so on, you need containers to store them—with SQL Server,
those containers are databases.
It makes sense that before you begin creating objects, such as tables and views, you must
create the database that will contain those objects. In this Lesson, you’ll learn what you need
to do while creating, configuring, and administering databases in order to maximize their
performance. As with most tasks in the book, planning is the hard part—but the rewards of a
well-constructed database plan are well worth it.
Databases consist of up to three types of files: primary data files, secondary data files, and
transaction log files. The primary data files store user data and system objects that SQL Server
needs to access your database. The secondary data files store only user information and are
used to expand your database across multiple physical hard disks. The transaction log files
allow up-to-the-minute recoverability by keeping track of all data modifications made on the
system before they’re written to the data files.

■ Modifying a Database Design Based on Performance and Business Requirements

THE BOTTOM LINE

You don't make decisions on how the database should be designed in a vacuum or based on personal whimsy. Because you're in the process of building a database infrastructure, you need to consider some critical issues: performance, and the user's or organization's business requirements.

A SQL Server database consists of a collection of tables that stores a specific set of structured
data. A table contains a collection of rows, also referred to as records, and columns, also
referred to as attributes. Each column in the table is designed to store a certain type of infor-
mation; for example, dates, names, dollar amounts, and numbers.
Tables have several types of controls, such as constraints, triggers, defaults, and customized
user-data types, which are used to protect and guarantee the validity of the data. As you’ll
see later, tables can have indexes similar to those in books that help you find rows quickly. A
database can also contain procedures that use Transact-SQL or .NET Framework program-
ming code to perform operations with the data. These operations include creating views that
provide customized access to table data or running user-defined functions that perform com-
plex calculations on a subset of rows.

Planning a Database

The first step in designing and creating a database is to develop a plan. A plan serves two
purposes: It provides a guide to follow when implementing the database, and it serves as
a functional specification for the database after it has been implemented.

The nature and complexity of a database, and the process of planning it, can vary signifi-
cantly. A database can be relatively simple and designed for use by a single person, or it can
be large and complex and designed, for example, to handle all the banking transactions for
thousands of clients. In the first case, the database design may be little more than a few notes
on some scratch paper. In the latter case, the design may be a formal document hundreds of
pages long that contains every possible detail about the database.
Regardless of the database's size and complexity, there are some basic principles you should always
follow:
Gather information. Before creating a database, you need a good understanding of what
the database is for and what it’s expected to do. Is it a new database? Is it a modification of
an existing electronic one? Or is it intended to replace a paper-based or manually performed
information system?
By reviewing the background and any existing systems, paper or electronic, you’ll get most
of the information you need. Collect copies of customer statements, inventory lists, manage-
ment reports, and any other documents that are part of the existing system, because these will
be useful to you in designing the database and the interfaces.
You should also review the business requirements of the database and organization and
make sure they coincide. Another key task is to interview the stakeholders and everyone else
involved in the system to determine what they do and what they need from the database.
It’s also important to identify what they want the new system to do, and also to identify the
problems, limitations, and bottlenecks of any existing system. Your design should take advan-
tage of every optimal opportunity and, at the same time, minimize the physical shortcomings
and bottlenecks that may exist in the system, at least until you can correct them.

Inventory the objects. As part of your plan, you need to review the planned objects (and inven-
tory those that exist, if you’re modifying an existing database). You should do the following:
• Identify the objects.
• Model the objects.
• Identify the types of information for each object.
• Identify the relationships between objects.

Ensuring That a Database Is Normalized

Normalization is the process of taking all the data that will be stored in a database and
separating it into tables according to rigorous rules. Unless you’re going to keep all your
data in a single table (not the best idea in most organizations), this is a decision-making
process. By defining ways in which tables can be structured, normalization helps you
come up with an efficient storage structure.

Efficient in this case doesn’t mean minimum size. Efficiency refers to structuring the database
so that data stays organized and changes are easy to make without side effects. Minimizing
storage size is sometimes a product of normalization, but it’s not the main goal.
Normalization primarily acts to preserve the integrity of your data. No matter what opera-
tions are performed in your database, it should be as difficult as possible to insert meaningless
data. Normalization recognizes four types of integrity:
• Entity integrity. Maintaining data consistency for each row (or instance) in the table.
This is often enforced with a unique identifier which can, but need not, be a primary key.
• Domain integrity. Maintaining data consistency within a column (or attribute) in a table.
This is often enforced with validity checking (null or not null, range or value).
• Referential integrity. Maintaining data consistency between columns in a table or
between tables in the database. This is often enforced with a foreign key.
• User-defined integrity. Maintaining data consistency by defining specific rules that
do not fall into one of the other integrity categories. This is often enforced with stored
procedures and triggers.
Normalizing a logical database design involves using formal methods to separate the data into
multiple related tables. Several narrow tables with fewer columns are characteristic of a normalized
database. A few wide tables with more columns are characteristic of a non-normalized database.
Some of the benefits of normalization include the following:
• Faster sorting and index creation.
• A larger number of clustered indexes.
• Narrower and more compact indexes.
• Fewer indexes per table. This improves the performance of the INSERT, UPDATE, and
DELETE statements.
• Fewer null values and less opportunity for inconsistency, which increases database
compactness.
• Fewer situations in which a single piece of data is stored in multiple and thus redundant
locations.

Allowing Selected Denormalization for Performance Purposes


Even though you’ll usually normalize your database, there will be occasions when you
need to denormalize it. However, you should start with the idea that you should never
denormalize your data without a specific business reason for doing so. Careless denormalization can ruin the integrity of your data and lead to slower performance—if you
denormalize too far, you end up including many extra fields in each table, and it takes
time to move that extra data from one place in your application to another.

The principal goal of normalization is to remove redundancy from the data. By contrast,
denormalization deliberately introduces redundancy into your data. Theoretically, you should
never denormalize data. However, in the real world, things aren’t always that simple, and you
may need to denormalize data to improve performance. For example, if you have an overnor-
malized database, it can slow down the database server because of the number of joins that
must be performed to retrieve data from multiple tables.

TAKE NOTE: When you're forced to denormalize data for performance, make sure you document your decision so that another developer doesn't think you made a mistake.

No hard and fast rules tell you exactly how (or whether) to denormalize tables in all circum-
stances, but you can follow these basic guidelines:
• If your normalized data model produces tables with multipart primary keys, particularly
if those keys include four or more columns and are used in joins with other tables, you
should consider denormalizing the data.
• If producing calculated values such as maximum historic prices involves complex que-
ries with many joins, you should consider denormalizing the data by adding calculated
columns to your tables to hold these values. SQL Server supports defining calculated
columns as part of a table, as you’ll see shortly.
• If your database contains extremely large tables, you should consider denormalizing the
data by creating multiple redundant tables instead. You may do this either by column
or by row. For example, if the Medications table contains many columns, and some of
these (such as patent date) are infrequently used, it will help performance to move the
less frequently used columns to a separate table. With the volume of the main table
reduced, access to this data will be faster. If the Medication table is worldwide, but most
queries require information about medications from only one region, you can speed up
the queries by creating separate tables for each region.
• If data is no longer live and is being used for archiving, or is otherwise read-only, denor-
malizing by storing calculated values in columns can make certain queries run faster.
• If queries on a single table frequently use only one column from a second table, consider
including a copy of that single column in the first table.

Ensuring That the Database Is Documented and Diagrammed


If you have any experience with IT or know anyone in the field, you know that the most common complaint is the lack of documentation. With respect to the databases in SQL Server,
documentation takes two forms. The first is traditional documentation, which is basi-
cally a written record of the code to supplement your plans. Ideally, the code is well
annotated. The second way is to document a database visually through a diagram that
shows the relationships between the parts of the database. Most DBAs and developers
find creating either kind of documentation tedious and are exceptionally creative when it
comes to finding excuses to move those tasks to the bottom of the priority pile.

As you’ll see, Microsoft has made the documentation and diagramming processes less painful
and designed them so that you can use some of the processes to modify your database.

DOCUMENTING THE SCHEMA


Starting with SQL Server 2005, you can document an existing database structure, called a schema, by generating one or more SQL scripts. You can view a SQL script by using the Management Studio Query Editor or any text editor. In Exercise 8.1, you'll document the AdventureWorks database schema.

LAB EXERCISE: Perform Exercises 8.1 and 8.2 in your lab manual.
You can use the database schema generated as an SQL script for the following tasks:
• Maintaining a backup script that lets the user re-create all users, groups, logins, and
permissions
• Creating or updating database development code
• Creating a test or development environment from an existing schema
• Training newly hired employees

DIAGRAMMING A DATABASE STRUCTURE

Back when electronic databases were young, the only way you could diagram your database was by sketching it on a piece of paper or drawing it on a blackboard. Thankfully, those days are gone. SQL Server ships with a tool called Database Diagram Designer that allows you to design and visualize a database to which you're connected. To help you visualize a database, it can create one or more diagrams illustrating some or all of the database's tables, columns, keys, and relationships.

TAKE NOTE: Be aware that you can also use Database Diagram Designer when you're designing (or modifying) a database to create, edit, or delete tables, columns, keys, indexes, relationships, and constraints.
You’ll use database diagrams in Exercise 8.2.

■ Designing Tables

THE BOTTOM LINE

Tables are database objects that contain all the data in a database. Each database has at least one table, and frequently more. Data in tables is organized in a row-and-column format from which is derived the "relational" part of your relational database management system. Each row represents a unique record, and each column represents an attribute within the record.

Figure 8-1 shows the Person.Contact table from the AdventureWorks database. It contains a
row for each contact and columns representing contact information such as IDs, titles, names,
and e-mail addresses, among others.

Figure 8-1
Tables consist of columns and rows, also called attributes and records.

When you design a database, you should first determine what tables it needs, the type of data
that goes in each table, and, as you saw earlier in the book, which users can access each table.
The recommended way to create a table is to first define everything you need in the table,
including data restrictions and other components. Key decisions you need to make about the
table include the following:
• The types of data the table will contain
• The number of columns in the table and, for each column, the datatype and length, if
it’s required
• Which columns will accept null values
• Whether and where to use constraints or defaults and rules
• The types of indexes that will be needed, where required, and which columns are
primary keys and which are foreign keys
Another good method for designing a table, especially in a complex or critical environment,
is to create a basic table, add some data to it, and then experiment with it for a while. This
method is useful because it gives you an opportunity to get an idea of what transactions are
common and what types of data are entered most frequently before you commit to a firm
design by adding constraints, indexes, defaults, rules, and other objects.
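To make these decisions concrete, here is a minimal sketch of a table definition; the table name, columns, and datatypes are hypothetical and would come from your own requirements:

CREATE TABLE dbo.Customer
(
    CustomerID  int         IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
    FirstName   varchar(50) NOT NULL,
    LastName    varchar(50) NOT NULL,
    Country     char(3)     NOT NULL DEFAULT 'USA',              -- default constraint
    CreditLimit money       NULL,                                -- nulls allowed here
    CreatedDate datetime    NOT NULL DEFAULT GETDATE()
);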

Deciding Whether Partitioning Is Appropriate

Tables in SQL Server can range from very small, having only a single record, to extremely
huge, with millions of records. These large tables can be difficult for users to work with
because of their sheer size. To make them smaller without losing any data, you can
partition your tables.

Partitioning tables works just like it sounds: You cut tables into multiple sections that can be
stored and accessed independently without the users’ knowledge. Suppose you have a table that
contains order information, and the table has about 50 million rows. That may seem like a big
table, but such a size isn’t uncommon. To partition this table, you first need to decide on a par-
tition column and a range of values for the column. In a table of order data, you probably have
an order date column, which is an excellent candidate. The range can be any value you like;
but because you want to make the most current orders easily accessible, you may want to set
the range at anything older than a year. Now you can use the partition column and range to
create a partition function, which SQL Server will use to spread the data across the partitions.
Next, you need to decide where to keep the partitioned data physically; this is called the partition scheme. You can keep archived data on one hard disk and current data on another disk
by storing the partitions in separate filegroups, which can be assigned to different disks.
Once you've planned your partitions, you can create partitioned tables using the CREATE TABLE statement.
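As a sketch of those steps, the following statements create a hypothetical partition function and partition scheme that separate orders older than a fixed date from newer ones, and then build the table on that scheme; the boundary date, filegroup names, and table definition are assumptions:

-- One boundary value produces two partitions: rows with an OrderDate before
-- 2009-01-01 and rows on or after it (RANGE RIGHT places the boundary value
-- in the right-hand partition).
CREATE PARTITION FUNCTION pfOrderDate (datetime)
AS RANGE RIGHT FOR VALUES ('20090101');

-- Map the two partitions to separate filegroups (which must already exist),
-- so archived and current data can sit on different disks.
CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate TO (ArchiveFG, CurrentFG);

-- Build the table on the scheme, naming OrderDate as the partitioning column.
CREATE TABLE dbo.Orders
(
    OrderID    int      NOT NULL,
    OrderDate  datetime NOT NULL,
    CustomerID int      NOT NULL
) ON psOrderDate (OrderDate);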

TAKE NOTE: To see what a partitioned table looks like, examine the AdventureWorks database: the TransactionHistory and TransactionHistoryArchive tables are partitioned on the ModifiedDate field.

Partitioning a table improves performance and simplifies maintenance. When you split a large
table into smaller, individual tables, queries that access only a fraction of the data can run
faster because there is less data to scan. Maintenance tasks, such as rebuilding indexes or back-
ing up a table, can also run more quickly.
You can partition a database without splitting tables by physically putting tables on individual
disk drives. Putting a table on one drive and related tables on another drive can improve

query performance because when queries that involve joins between the tables are run, multiple
disk heads read data at the same time. SQL Server filegroups can be used to specify the disks
on which to put the tables.
There are three types of partitioning:
Hardware partitioning. Hardware partitioning designs the database to take advantage of
the available hardware architecture, including multiprocessors and Redundant Array of
Inexpensive Disks (RAID) configurations.
Horizontal partitioning. Horizontal partitioning divides a table into multiple tables with the
same number of columns but fewer rows. For example, suppose a hospital has a table with a
billion rows of patient billing data. The table could be partitioned horizontally into 12 tables,
with each smaller table containing a month’s worth of data. If a user issues a query requiring
data for a specific month, it references only the appropriate table.
If you opt to partition tables horizontally, you should partition the tables so that queries refer-
ence as few tables as possible. Otherwise, excessive UNION queries, used to merge the tables
logically at query time, can affect performance.
Horizontal partitioning is typically used when data can be divided based on age or use. For
example, a table may contain data for the last five years, but only data from the current year
is regularly accessed. In this case, it makes performance sense to partition the data into five
tables, with each table containing data from only one year.
Vertical partitioning. Whereas horizontal partitioning divides tables based on rows, vertical
partitioning divides a table into multiple tables containing fewer columns. There are two
types of vertical partitioning:
• Normalization: the process of removing redundant columns from a table and putting them
in secondary tables that are linked to the primary table by primary key and foreign key relation-
ships.
• Row splitting: divides the original table vertically into tables with fewer columns. Each
logical row in a split table matches the same logical row in the others. For example, joining
the tenth row from each of the split tables re-creates the original row.
Like horizontal partitioning, vertical partitioning lets queries scan less data, thus improving
query performance.
Vertical partitioning can also have an adverse impact on performance because analyzing data
from multiple partitions requires queries that join the tables, slowing the process. Vertical
partitioning can also negatively affect performance if partitions are very large.

Specifying Primary and Foreign Keys

You can use a primary key to ensure that each record in your table is unique in some
way. The primary key does this by creating a special type of index called a unique index.
An index is ordinarily used to speed up access to data by reading all the values in a
column and keeping an organized list of where the record that contains that value is
located in the table. A unique index not only generates that list, but also doesn’t allow
duplicate values to be stored in the index. If a user tries to enter a duplicate value in the
indexed field, the unique index returns an error, and the data modification fails.

Take another look at Figure 8-1. In this case, assume the ContactID field is defined as a primary key. As you can see, you already have a contact with a ContactID of 1 in the table. If one of your users tries to create another contact with a ContactID of 1, they will receive an error and the update will be rejected, because ContactID 1 is already listed in the primary key's unique index. (This is just an example—the ContactID field has the identity property set, which automatically assigns a number with each new record inserted and won't allow you to enter a number of your own design.)

TAKE NOTE: When a column can be used as a unique identifier for a row (such as an identity column), it's referred to as a surrogate or candidate key.

Choosing a Primary Key

The primary key must consist of a column (or columns) that contains unique values.
This makes an identity column a good candidate for becoming a primary key, because
the values contained therein are unique by definition. If you don’t have an identity col-
umn, make sure you choose a column, or combination of columns, in which each value
is unique. Regardless of whether you use an identity column, when deciding which field
to use as a primary key, you should consider these factors:

Stability. If the value in the column is likely to change, it won’t make a good primary key.
When you relate tables together, you’re making the assumption that you can always track the
relation later by looking at the primary key values.
Minimality. The fewer columns in the primary key, the better. A primary key of customer_id
and order_id is superior to one of customer_id, order_id, and order_date. Adding the extra
column doesn’t make the key more unique; it merely makes operations involving the primary
key slower.
Familiarity. If the users of your database are accustomed to a particular identifier for a type
of entity, it makes a good primary key. For example, you might use a part number to identify
rows in a table of parts.

TAKE NOTE: When a column has mostly unique values, it's said to have high selectivity. When a column has several duplicate values, it's said to have low selectivity. The primary key field must have high selectivity (entirely unique values).

In Exercise 8.3, you'll examine the Person.Contact table and modify a primary key.

LAB EXERCISE: Perform Exercise 8.3 in your lab manual.
USING FOREIGN KEYS
A foreign key is the column (or combination of columns) whose values match the primary
key in the same or another table. It’s most commonly used in combination with a primary
key to relate two tables on a common column. It can also be defined to reference the columns
of a UNIQUE constraint in another table.
For example, assume you have two tables, Medications and Physicians, with the following col-
umns, where PK means the primary key and FK means the foreign key:

MEDICATIONS          PHYSICIANS
MedicationID (PK)    PhysicianID (PK)
PhysicianID (FK)     LastName
Class                FirstName
Number               Specialty
Frequency            DateofHire

You can relate the two tables on the PhysicianID column that they have in common. If you
use the PhysicianID field in the Physicians table as the primary key (which you already have),
you can use the PhysicianID field in the Medications table as the foreign key that relates the
two tables. You won't be able to add a record to the Medications table if there is no matching record in the Physicians table. Not only that—you can't delete a record in the Physicians table if there are matching records in the Medications table.
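A minimal sketch of these two tables shows how the relationship is declared; the column datatypes are assumptions:

CREATE TABLE dbo.Physicians
(
    PhysicianID int         NOT NULL PRIMARY KEY,
    LastName    varchar(50) NOT NULL,
    FirstName   varchar(50) NOT NULL,
    Specialty   varchar(50) NULL,
    DateofHire  datetime    NULL
);

CREATE TABLE dbo.Medications
(
    MedicationID int         NOT NULL PRIMARY KEY,
    PhysicianID  int         NOT NULL
        CONSTRAINT FK_Medications_Physicians
        REFERENCES dbo.Physicians (PhysicianID),   -- the foreign key
    Class        varchar(30) NULL,
    Number       int         NULL,
    Frequency    varchar(30) NULL
);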
With a foreign key in place, you can protect records not only in one table but also in associated tables from improper updates. Users can't add a record to a foreign-key table without
a corresponding record in the primary-key table, and primary-key records can’t be deleted if
they have matching foreign-key records.
The relationship between a primary key and a foreign key can take one of several forms. It
can be one-to-many. It can be one-to-one, where precisely one row in each table matches
one row in the other. Or it can be many-to-many, where multiple matches are possible
(imagine a table of physicians and a table of patients, each of whom might see many
physicians).

TAKE NOTE: To implement a many-to-many relation in SQL Server, you need to use an intermediate joining table to break the relation into two separate one-to-many relations.

SPECIFYING COLUMN DATATYPES AND CONSTRAINTS


Each field in a table has a specific datatype, which restricts the type of data that can be
inserted. For example, if you create a field with a datatype of int (short for integer, which is
a whole number [a number with no decimal point]), you won’t be able to store characters
(A–Z) or symbols (such as %, *, #) in that field because SQL Server allows only numbers
to be stored in int type fields. In Figure 8-2, you can see the datatypes listed in the second
column.
You'll notice that some of the fields in this table are either char or varchar (short for character
and variable character, respectively), which means you can store characters in these fields as
well as symbols and numbers. However, if numbers are stored in these fields, you won’t be able
to perform mathematical functions on them because SQL Server sees them as characters, not
numbers.

Figure 8-2
Table field names and datatypes

SPECIFYING SQL SERVER BUILT-IN DATATYPES


The following is a list of all the SQL Server datatypes, their uses, and their limitations:
Bigint. This datatype includes integer data from –2^63 (–9,223,372,036,854,775,808) through 2^63 – 1 (9,223,372,036,854,775,807). It takes 8 bytes of hard-disk space to store and is useful
for extremely large numbers that won’t fit in an int type field.

Binary. This datatype includes fixed-length, binary data with a maximum length of 8,000 bytes.
It’s interpreted as a string of bits (for example, 11011001011) and is useful for storing anything
that looks better in binary or hexadecimal shorthand, such as a security identifier.
Bit. This datatype can contain only a 1 or a 0 as a value (or null, which is no value).
Char. This datatype includes fixed-length, non-Unicode character data with a maximum
length of 8000 characters. It’s useful for character data that will always be the same length,
such as a State field, which will contain only two characters in every record. This uses the same
amount of space on disk no matter how many characters are stored in the field. For example,
char(8) always uses 8 bytes of space, even if only four characters are stored in the field.
Datetime. This datatype includes date and time data from January 1, 1753, to December
31, 9999, with time values rounded to increments of .000, .003, or .007 seconds. This takes
8 bytes of space on the hard disk and should be used when you need to track very specific
dates and times.
Decimal. This datatype includes fixed-precision and scale-numeric data from –10^38 + 1 through 10^38 – 1 (for comparison, this is a 1 with 38 zeros following it). It uses two
parameters: precision and scale. Precision is the total count of digits that can be stored in the
field, and scale is the number of digits that can be stored to the right of the decimal point. If
you have a precision of 5 and a scale of 2, your field has the format 111.22. This type should
be used when you’re storing partial numbers (numbers with a decimal point).
Float. This datatype includes floating-precision number data from –1.79E+308 through 1.79E+308. Some numbers don't end after the decimal point—pi is a fine example. For such numbers, float stores an approximation. The optional parameter, as in float(n), specifies the number of bits used for the mantissa, so the value is held with limited precision rather than exactly.
Identity. This isn’t a datatype, but a property, typically used in conjunction with the int
datatype. It’s used to increment the value of the column each time a new record is inserted.
Int. This datatype can contain integer (or whole number) data from –2^31 (–2,147,483,648) through 2^31 – 1 (2,147,483,647). It takes 4 bytes of hard-disk space to store and is useful for
storing large numbers that you’ll use in mathematical functions.
Money. This datatype includes monetary data values from –2^63 (–922,337,203,685,477.5808) through 2^63 – 1 (922,337,203,685,477.5807), with accuracy to a ten-thousandth of a mon-
etary unit. It takes 8 bytes of hard-disk space to store and is useful for storing sums of money
larger than 214,748.3647.
Nchar. This datatype includes fixed-length, Unicode data with a maximum length of 4,000
characters. Like all Unicode datatypes, it’s useful for storing small amounts of text that will be
read by clients who use different languages.
Numeric. This is a synonym for decimal.
Nvarchar: This datatype includes variable-length, Unicode data with a maximum length of
4,000 characters. It’s the same as nchar, except that nvarchar uses less disk space when there
are fewer characters.
Nvarchar(max). This datatype is just like nvarchar; but when the (max) size is specified, the
datatype holds 2^31 – 1 (2,147,483,647) bytes of data.
Real. This datatype includes floating-precision number data from –3.40E+38 through 3.40E+38. It's a quick way of saying float(24)—a floating-point type that stores the mantissa in 24 bits, giving roughly seven digits of precision.

Smalldatetime. This datatype includes date and time data from January 1, 1900, through
June 6, 2079, with an accuracy of 1 minute. It takes only 4 bytes of disk space and should be
used for less specific dates and times than you’d store in datetime datatype.
Smallint. This datatype includes integer data from –2^15 (–32,768) through 2^15 – 1 (32,767).
It takes 2 bytes of hard-disk space to store and is useful for slightly smaller numbers than you
would store in an int type field, because smallint takes less space than int.
Smallmoney. This datatype includes monetary data values from –214,748.3648 through
214,748.3647, with accuracy to a ten-thousandth of a monetary unit. It takes 4 bytes of space
and is useful for storing smaller sums of money than would be stored in a money type field.
Sql_variant. This isn’t a datatype either; it lets you store values of different datatypes.
The only values it can’t store are varchar(max), nvarchar(max), text, image, sql_variant,
varbinary(max), xml, ntext, timestamp, or user-defined datatypes.
Timestamp. This datatype is used to stamp a record with an incrementing counter when
the record is inserted and every time it’s updated thereafter. It’s useful for tracking changes to
your data.
Tinyint. This datatype includes integer data from 0 through 255. It takes 1 byte of space on
the disk and is limited in usefulness because it stores values only up to 255. Tinyint may be
useful for something like a product-type code when you have fewer than 255 product codes.
Uniqueidentifier. The NEWID() function is used to create globally unique identifiers that
look like the following example: 6F9619FF-8B86-D011-B42D-00C04FC964FF. These
unique numbers can be stored in the uniqueidentifier type field; they’re useful for creating
tracking numbers or serial numbers that have no possible way of being duplicated.
Varbinary. This datatype includes variable-length, binary data with a maximum length of
8,000 bytes. It’s just like binary, except that varbinary uses less hard-disk space when fewer
bits are stored in the field.
Varbinary(max). This datatype has the same attributes as the varbinary datatype; but when
the (max) size is declared, the datatype can hold 2^31 – 1 (2,147,483,647) bytes of data. This is
very useful for storing binary objects like JPEG image files or Word documents.
Varchar. This datatype includes variable-length, non-Unicode data with a maximum of 8,000
characters. It’s useful when the data won’t always be the same length, such as in a first-name
field where each name has a different number of characters. This uses less disk space when
there are fewer characters in the field. For example, if you have a field of varchar(20), but
you’re storing a name with only 10 characters, the field will take up only 10 bytes of space,
not 20. The field will accept a maximum of 20 characters.
Varchar(max). This is just like the varchar datatype; but you specify a size of (max). The
datatype can hold 2^31 – 1 (2,147,483,647) bytes of data.
Xml. This datatype is used to store entire XML documents or fragments (a document that is
missing the top-level element).

TAKE NOTE: The text, ntext, and image datatypes have been deprecated as of the 2005 version of SQL Server. You should replace them with varchar(max), nvarchar(max), and varbinary(max) when you design tables and replace them in existing tables.

SQL Server 2008 introduces eight new datatypes. Four new date and time related datatypes
provide a greater degree of control and precision of chronological data. Two new spatial
datatypes provide specific methods of storing positional data. The other two new datatypes
provide abilities to better handle large data objects and hierarchically structured data. The
following is a list of the new SQL Server 2008 datatypes, their uses, and limitations:
Date: This datatype uses the format of YYYY-MM-DD and is compliant with ANSI standards.
It uses 3 bytes of storage and can contain values from 0001-01-01 through 9999-12-31.
Datetime2: This datatype provides for a higher degree of time precision. This datatype can
store values from 0001-01-01 00:00:00.0000000 through 9999-12-31 23:59:59.9999999 and
uses from 6 to 8 bytes of storage depending upon the time precision.
Datetimeoffset: This datatype is different from the other date and time datatypes in that it con-
tains a timezone offset value. The format and range is the same as the new Datetime2 datatype
except that an offset in hours and minutes follows the time value. This offset value can be either
positive or negative. Storage space ranges from 8 to 10 bytes depending upon the time precision.
Filestream: Filestream is an extension property applied to the existing varbinary(max)
datatype. The use of this property allows BLOB (Binary Large OBject) data such as photo-
graphs, audio, etc. to be stored outside of the SQL database in the Windows NTFS file system
but still provide for control and management using SQL Server. Special configuration steps
need to be taken to enable Filestream usage.
Hierarchyid: This datatype provides the ability to store a complex nested parent-child hier-
archy of data in a single table column. The size of this datatype increases with the depth and
quantity of data to be stored in the rows of the table.
Geography: This datatype stores positional data using geometric shapes (such as Point, Polygon,
LineString, etc) and latitude and longitude coordinates using a geodetic (round Earth) model.
Geometry: This datatype is similar to the Geography datatype. The difference is that Geometry
uses a planar (flat Earth) model.
Time: This datatype only stores time values and is compliant with ANSI standards. Data is
stored in hh:mm:ss[.nnnnnn] format and uses from 3 to 5 bytes of storage depending upon the
time precision.

SPECIFYING USER-DEFINED DATATYPES


SQL Server allows you to create your own datatypes based on your needs. For example, you
might want to create a State datatype based on the char datatype with all the parameters (length,
capitalization rules, and so on) prespecified, including any necessary constraints and defaults.

Using Constraints

As you've seen from the beginning of this Lesson, tables are wide open to just about any kind of data when they're first created. The only starting restriction is that users can't violate the datatype of a field; other than that, the tables are fairly insecure from whatever your users want to put in them.

CERTIFICATION READY? Rules (constraint objects defined once and used on multiple objects) have been available in previous editions of SQL Server and remain available in both SQL Server 2005 and SQL Server 2008. Know when to use them if only to eliminate a possible answer on your certification test.

So far you've seen that getting control of the table isn't difficult, but it does require work. You've learned how to use primary and foreign keys to control what happens to data and limit what can be entered and what can't.

Now, delve into the issue of how to restrict what data your users can enter in a field and how to maintain data integrity. You can enforce three types of integrity:

Entity integrity. Entity integrity is the process of making sure each record in the table is unique in some way. Primary keys are the main way of accomplishing this; they can be used with foreign keys in enforcing referential integrity. Unique constraints are used when the table includes a field that isn't part of the primary key but that needs to be protected against duplicate values.
Referential integrity. Referential integrity is the process of protecting related data that is stored
in separate tables. A foreign key is related to a primary key. The data in the primary key table
can’t be deleted if there are matching records in the foreign-key table, and records can’t be
entered in the foreign-key table if there is no corresponding record in the primary-key table. The
only way around this behavior is to enable cascading referential integrity, which lets you delete or
change records in the primary-key table and have those changes cascade to the foreign-key table.
Domain integrity. Domain integrity is the process of restricting what data your users can enter
in a field. Check constraints and rules can be used to validate the data the users try to enter
against a list of acceptable data, and defaults can be used to enter data for the user as defaults.

USING CHECK CONSTRAINTS


Check constraints enforce domain integrity by limiting the values that are accepted by a
column. They’re similar to foreign-key constraints in that they control the values that are put
in a column; but whereas foreign-key constraints get their list of valid values from another
table, check constraints get their valid values from a logical expression that isn’t based on
data in another column. For example, you can limit the range of values for a PayIncrease
column by creating a check constraint that allows only data that ranges from 3–6 percent.
This prevents salary increases that exceed or fall below the organization's established range from being entered in the table. As you can imagine, this is
helpful when you want to prevent a rogue accountant or disgruntled employee from finding
less-than ethical ways to get money from the system (or reduce your pay increase for not
helping them with their computer problem).
You create a check constraint using any Boolean expression that returns either true or false
based on the logical operators. For the previous example, the logical expression is PayIncrease
>= 0.03 AND PayIncrease <= 0.06.
You can apply multiple check constraints to a single column. You can also apply a single
check constraint to multiple columns by creating it at the table level. For example, a multiple
column check constraint can be used to confirm that any row with a country/region column
value of USA also has to have a two-character value in the state column. This allows for
multiple conditions to be checked in one location.
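A sketch of both forms, against hypothetical tables, might look like the following:

-- Column-level check constraint: pay increases must fall between 3 and 6 percent.
ALTER TABLE dbo.EmployeePay
ADD CONSTRAINT CK_EmployeePay_PayIncrease
    CHECK (PayIncrease >= 0.03 AND PayIncrease <= 0.06);

-- Table-level check constraint spanning two columns: rows whose country/region
-- is USA must carry a two-character state value.
ALTER TABLE dbo.CustomerAddress
ADD CONSTRAINT CK_CustomerAddress_USAState
    CHECK (CountryRegion <> 'USA' OR LEN(State) = 2);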

USING DEFAULT CONSTRAINTS


Check constraints serve no purpose if your users forget to enter data in a column—that is
where default constraints come in. If users leave fields blank by not including them in the
INSERT or UPDATE statement they use to add or modify a record, default constraints are
used to fill in those fields. There are two types of defaults: object and definition.
Object defaults are defined when you create your table and affect only the column on which
they’re defined. Definition defaults are created separately from tables and are designed to be
bound to a user-defined datatype.
In addition to making sure there is an entry, both types can, when used properly, save data
entry time. For example, suppose that most of your customers live in the United States and
that your data-entry people must type USA in the country field for every new customer. That
may not seem like much work, but if you have a sizable customer base, those three characters
can add up to a lot of typing. By using a default constraint, your users can leave the country
field intentionally blank if the customer is from the USA, and SQL Server will fill it in.
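For the country example, a default constraint might be sketched like this (the table and column names are assumptions):

-- When an INSERT omits the Country column, SQL Server fills in 'USA'.
ALTER TABLE dbo.Customers
ADD CONSTRAINT DF_Customers_Country DEFAULT 'USA' FOR Country;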

USING UNIQUE CONSTRAINTS


There are two major differences between primary key constraints and unique constraints.
First, primary keys are used with foreign keys to enforce referential integrity, and unique keys
aren’t. Second, unique constraints allow null (blank) values to be inserted in the field, whereas
primary keys don’t allow null values. However, as with any value participating in a unique
constraint, only one null value is allowed per column. Aside from that, they serve the same
purpose—to ensure that nonrepeating data is inserted in a field.

You should use a unique constraint when you need to ensure that no duplicate values can be added to a field that isn't part of your primary key. A good example of a field that might require a unique constraint is a Social Security Number field, because all the values contained therein need to be unique; yet there would most likely be a separate employee ID field that would be used as the primary key.

TAKE NOTE: A unique constraint can be referenced by a foreign-key constraint.
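A sketch of such a constraint on a hypothetical Employees table:

-- SSN values must be unique, even though the column is not the primary key.
ALTER TABLE dbo.Employees
ADD CONSTRAINT UQ_Employees_SSN UNIQUE (SSN);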

Deciding Whether to Persist Computed Columns

Earlier, you saw that you can create user-defined datatypes. You can also create computed
columns. These are special columns that don’t contain any data of their own, but display
the output of an expression performed on data in other columns of the table. For
example, in the AdventureWorks sample database, the TotalDue column of the Sales.
SalesOrderHeader table is a computed column. It contains no data of its own but displays
the sum of the Subtotal, TaxAmt, and Freight columns as a single value.

Normally, computed columns are treated as virtual columns that aren’t physically stored in the
table, and their values are recalculated every time they’re referenced in a query. However, you
can use the PERSISTED keyword in the CREATE TABLE and ALTER TABLE statements
to require SQL Server 2005 to physically store computed columns in the table. When that
happens, the computed column values are updated when any columns that are part of their
calculation change.
Computed columns can be used in select lists, WHERE clauses, ORDER BY clauses, or any other location in which an ordinary expression can be used.
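The following minimal sketch (a hypothetical table, not the AdventureWorks definition) shows a computed column declared with the PERSISTED keyword:

CREATE TABLE dbo.OrderTotals
(
    SalesOrderID int   NOT NULL PRIMARY KEY,
    SubTotal     money NOT NULL,
    TaxAmt       money NOT NULL,
    Freight      money NOT NULL,
    -- Stored physically and kept current whenever the source columns change.
    TotalDue AS (SubTotal + TaxAmt + Freight) PERSISTED
);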
You must always persist computed columns in the following cases:
• The computed column is used as a partitioning column of a partitioned table.
• The computed column references a Common Language Runtime (CLR) function. In
this case, the computed column must be persisted so that indexes can be created on it.
• The computed column is used in a check, foreign-key, or not-null constraint.

Specifying Physical Location of Tables, Including


Filegroups and a Partitioning Scheme

There is no magical formula for the placement of tables or other components of a SQL
Server configuration. As always, your primary considerations should be performance and
recoverability. When placing tables or filegroups, or determining whether to partition
across multiple disks, consider all elements.

■ Designing Filegroups

THE BOTTOM LINE: Database objects, such as tables, indexes, views, and files, can be grouped
together in filegroups for allocation and administration purposes. There are two types of filegroups:
primary and user-defined.

The primary filegroup contains the primary data file and any other files not specifically
assigned to another filegroup. All pages for the system tables are allocated in the primary
filegroup. User-defined filegroups are any filegroups specified by using the FILEGROUP
keyword in a CREATE DATABASE or ALTER DATABASE statement.

TAKE NOTE: Log files are never part of a filegroup. Log space is managed separately from data space.

No file can be a member of more than one filegroup. Tables, indexes, and large object data
can be associated with a specified filegroup, and all their pages are allocated in that filegroup.
Alternatively, the tables and indexes can be partitioned. In that case, the data of partitioned
tables and indexes is divided into units, each of which can be placed in a separate filegroup.
One filegroup in each database is designated the default filegroup. When a table or index is
created without specifying a filegroup, it’s assumed that all pages will be allocated from the
default filegroup. Only one filegroup at a time can be the default filegroup. Members of the
db_owner fixed database role can switch the default filegroup from one filegroup to another.
If no default filegroup is specified, the primary filegroup is the default filegroup.
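As a hedged sketch (the database name, filegroup name, and file path below are all hypothetical), a user-defined filegroup might be added and then made the default like this:

ALTER DATABASE Sales ADD FILEGROUP SalesHistoryFG;

ALTER DATABASE Sales
ADD FILE
(
    NAME = SalesHistory1,
    FILENAME = 'D:\SQLData\SalesHistory1.ndf',
    SIZE = 100MB
)
TO FILEGROUP SalesHistoryFG;

-- Objects created without an explicit filegroup will now be allocated in SalesHistoryFG.
ALTER DATABASE Sales MODIFY FILEGROUP SalesHistoryFG DEFAULT;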

Designing Filegroups for Performance

Now that you’ve started designing your database and created some secondary data
files, you can logically group them together into a filegroup to help manage disk-space
allocation. By default, all the data files you create are placed in the primary filegroup;
when you create an object (for example, a table or a view), that object can be created on
any one of the files in the primary filegroup. If you create different filegroups, though,
you can specifically tell SQL Server where to place your new objects. Doing so can help
with performance.

For example, suppose you have a sales database with several tables. Some of the tables are
mostly static, whereas others are volatile and frequently written to. If all these tables are
placed in the same filegroup, you have no control over the file in which they’re placed.
However, if you place a secondary data file on a separate physical hard disk (for example, disk D)
and place another secondary data file on another physical hard disk (disk E, perhaps), you can
place each data file in its own filegroup. This gives you control over where objects are created.
In this case, the best option is to place the first secondary data file by itself in a filegroup
named READ and to place the second secondary data file in its own filegroup named
WRITE. Now, when you create a table that is meant to be primarily read from, you can tell
SQL Server to create it on the file in the READ group, and you can place tables that are
meant to be written to in the WRITE filegroup.
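A brief sketch of this placement technique follows; it assumes the READ and WRITE filegroups described above already exist, and the table names are invented for illustration:

CREATE TABLE dbo.ProductCatalog
(
    ProductID   int          NOT NULL PRIMARY KEY,
    ProductName nvarchar(50) NOT NULL
)
ON [READ];    -- mostly static lookup data

CREATE TABLE dbo.OrderAudit
(
    AuditID   int      IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderID   int      NOT NULL,
    ChangedAt datetime NOT NULL
)
ON [WRITE];   -- volatile, frequently written data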

TAKE NOTE: As you learned in Lesson 2, secondary data files make up all the data files other than
the primary data file. Some databases may not have any secondary data files, whereas others have
several secondary data files.

Using files and filegroups improves database performance, because it lets a database be created
across multiple disks, multiple disk controllers, or RAID systems. For example, if your computer
has four disks, you can create a database that is made up of three data files and one log file, with
one file on each disk. As data is accessed, four read/write heads can access the data in parallel.
This speeds up database operations.
Additionally, files and filegroups enable data placement, because a table can be created in
a specific filegroup. This improves performance because all I/O for a specific table can be
directed at a specific disk. For example, a heavily used table can be put on one file in one
filegroup, located on one disk, and the other less heavily accessed tables in the database can be
put on the other files in another filegroup, located on a second disk.

Designing Filegroups for Recoverability


X REF: Lesson 11 covers piecemeal restores in more detail.

In SQL Server 2005 and 2008, databases made up of multiple filegroups can be restored in stages
by a process known as piecemeal restore.

When multiple filegroups are used, the files in a database can be backed up and restored indi-
vidually. Under the simple recovery model, file backups are allowed only for read-only files.
Using file backups can increase the speed of recovery by letting you restore only damaged files
without restoring the rest of the database. For example, if a database is made up of several
files physically located on different disks, and one disk fails, then only the file on the failed
disk has to be restored.

Designing Filegroups for Partitioning

You can achieve performance gains and better I/O balancing by using filegroups to place
a partitioned table on multiple files. As you know, filegroups can consist of one or more
files, and each partition must map to a filegroup. A single filegroup can be used for mul-
tiple partitions; but for better data management, including more granular backup control,
you should design your partitioned tables wisely so that only related or logically grouped
data resides on the same filegroup.
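The following sketch shows the general pattern of mapping a partitioned table to filegroups; the filegroup, function, and table names are assumptions for illustration, and the filegroups are presumed to exist already:

-- Partition orders by year; two boundary values yield three partitions.
CREATE PARTITION FUNCTION pfOrderYear (datetime)
AS RANGE RIGHT FOR VALUES ('2007-01-01', '2008-01-01');

-- Map each partition to its own filegroup.
CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear TO (FG2006, FG2007, FG2008);

CREATE TABLE dbo.CustomerOrder
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL
)
ON psOrderYear (OrderDate);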

■ Designing Index Usage

THE BOTTOM LINE: In a database design, an index is an on-disk structure associated with a table
or view that speeds retrieval of rows from the table or view. An index contains keys built from one
or more columns in the table or view. These keys are stored in a structure (B-Tree) that enables
SQL Server to find the row or rows associated with the key values quickly and efficiently.

If you wanted to find the topic filegroup in this book, how would you go about it? You could
flip through the pages one at a time looking for the word filegroups, or you might examine the
table of contents at the front of the book. Both these methods work, but they aren't efficient.
Instead, you'd probably flip to the back of the book and review the index for the word
filegroup. If the index is well constructed, it will contain several entries and probably some
subheadings to help you differentiate the topic.

X REF: The next section of this Lesson contains more details on views.

Two types of indexes are associated with a table or a view: clustered and nonclustered.
Clustered indexes sort and store the data rows in the table or view based on their key values.
These are the columns included in the index definition. There can be only one clustered
index per table because the data rows can be sorted in only one order.
The data rows in a table are stored in sorted order only when the table contains a clustered
index. When a table has a clustered index, the table is called a clustered table. If a table has no
clustered index, its data rows are stored in an unordered structure called a heap.
Nonclustered indexes have a structure separate from the data rows. A nonclustered index
contains the nonclustered index key values, and each key value entry has a pointer to the data
row that contains the key value.
Table 8-1 shows the differences between clustered and nonclustered indexes.

Table 8-1: Differences between Clustered and Nonclustered Indexes

CLUSTERED                                        NONCLUSTERED
Only 1 allowed per table                         Up to 249 allowed per table
Physically rearranges the data in the table      Creates a separate list of key values with
to conform to the index constraints              pointers to the location of the data in the
                                                 data pages
For use on columns that are frequently           For use on columns that are searched
searched for ranges of data                      for single values
For use on columns with low selectivity          For use on columns with high selectivity

TAKE NOTE: You can have only one clustered index per table because clustered indexes physically
rearrange the data in the indexed table.

Which type of index should you use, and where? In a few moments, you’ll look at how you
can design indexes for faster data access and how to perform data modification. First, examine
some basic guidelines and strategies you should employ when designing indexes.
The first consideration is making sure you understand the characteristics of the database.
For example, is it an On-Line Transaction Processing (OLTP) database with frequent data
modifications, or a Decision Support System (DSS) or data-warehousing (On-Line Analytical
Processing [OLAP]) database that contains primarily read-only data?
Next, what are the characteristics of the most frequently used queries? For example, knowing
that a frequently used query joins two or more tables will help you determine the best type of
indexes to use.
You should also have a clear idea of the characteristics of the columns used in the queries.
For example, an index is ideal for columns that have an integer datatype and are also unique
or non-null.
Determine which index options may enhance performance when the index is created or
maintained. For example, creating a clustered index on an existing large table will benefit
from the ONLINE index option. The ONLINE option allows for concurrent activity on the
underlying data to continue while the index is being created or rebuilt.
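A minimal sketch of that option follows; the table and column names are hypothetical, and online index operations are available only in certain SQL Server editions:

CREATE CLUSTERED INDEX CIX_CustomerOrder_OrderDate
ON dbo.CustomerOrder (OrderDate)
WITH (ONLINE = ON);   -- the table stays available for queries and updates during the build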
Finally, make sure you give thought to the optimal storage location for your indexes.

Designing Indexes to Make Data Access Faster and to Improve Data Modification

When you design an index, you should always follow these guidelines to maximize data
access and make it easier to modify data:

• Large numbers of indexes on a table negatively affect the performance of INSERT,
UPDATE, and DELETE statements because all indexes must be adjusted appropriately
as data in the table changes. Note that UPDATE statements are affected only if the
indexed column data is changed.
• Avoid over-indexing heavily updated tables. You should keep indexes narrow—that is,
with as few columns as possible.
• Use many indexes to improve query performance on tables with low update require-
ments but large volumes of data. Large numbers of indexes can help the performance
of queries that don’t modify data, such as SELECT statements, because the Query
Optimizer has more indexes to choose from to determine the fastest access method.

• Indexing small tables may not be worthwhile, especially if it takes the Query Optimizer
longer to traverse the index searching for data than performing a simple table scan
would. Although the indexes on small tables may never be used, they must still be main-
tained as data in the table changes, thus slowing performance and retarding data modifi-
cation with unnecessary resource usage.
• Indexes on views can provide significant performance gains when the view contains
aggregations, table joins, or a combination of aggregations and joins. The view doesn’t
have to be explicitly referenced in the query for the Query Optimizer to use it.
• Use the Database Tuning Advisor to analyze your database and make index
recommendations.

Creating Indexes with the Database Tuning Advisor

Ironically, the best way to plan and place indexes is to let SQL Server do it itself. SQL
Server comes with an extremely powerful tool called SQL Server Profiler, whose pri-
mary function is to monitor SQL Server. This tool provides an interesting fringe benefit
when it comes to indexing. Profiler specifically monitors everything that happens to the
MSSQLServer service, which includes all the INSERT, UPDATE, DELETE, and SELECT
statements that get executed against your database. Because Profiler can monitor what your
users are doing, it makes sense that Profiler can figure out what columns can be indexed to
make these actions faster. Enter the Database Tuning Advisor.

When you use Profiler, you generally save all the monitored events to a file on disk. This file
is called a workload, without which the Database Tuning Advisor can’t function. To create the
workload, you need to run a trace (which is the process of monitoring) to capture standard
user traffic throughout the busy part of the day.

LAB EXERCISE: Perform Exercise 8.4 in your lab manual.

In Exercise 8.4, you'll walk through the process of using Database Tuning Advisor to create an
index.
Once your indexes have been created, they should be maintained on a regular basis to make
certain they’re working properly.

Specifying Physical Placement of Indexes

Part of your design plan should be to determine the storage location for the indexes you
design. Use the following guidelines and recommendations as part of your determination:

• Storing a nonclustered index on a filegroup that is on a different disk than the table file-
group improves performance because multiple disks can be read at the same time.
• Clustered and nonclustered indexes can use a partition scheme across multiple file-
groups. When you consider partitioning, determine whether the index should be
aligned—that is, partitioned in essentially the same manner as the table—or partitioned
independently.
• Create nonclustered indexes on a filegroup other than the filegroup of the base table.
This will result in performance gains if the filegroups are using different physical drives
with their own controllers.
• Partition clustered and nonclustered indexes to span multiple filegroups.
• Because you can’t predict what type of access will occur and when it will occur, it may be
better to spread your tables and indexes across all filegroups. Doing so guarantees that all
disks are being accessed, because all data and indexes are spread evenly across all disks,
regardless of which way the data is accessed. This is also a simpler approach for system
administrators.
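For example, the first guideline above might be implemented with something like the following sketch; the index, table, and filegroup names are assumptions, and the filegroup is presumed to exist on its own physical disk:

CREATE NONCLUSTERED INDEX IX_CustomerOrder_CustomerID
ON dbo.CustomerOrder (CustomerID)
ON [INDEXFG];   -- a filegroup on a different disk than the table's filegroup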

■ Designing Views

THE BOTTOM LINE: A view is nothing more than a virtual table whose contents are defined by a
query. A view is the filter through which you look at one or more columns from one or more base
tables.

In the real world, many companies have extremely large tables that contain hundreds of
thousands, if not millions, of records. When your users query such large tables, they usually
don’t want to see all of these millions of records; they want to see only a small portion, or
subset, of the available data. You have two ways to return a small subset of data: You can use a
SELECT query with the WHERE clause specified, or you can use a view.
The SELECT query approach works well for queries that are executed infrequently, but this
approach can be confusing for users who don’t understand T-SQL code. For example, to
query the AdventureWorks database to see only the first-name, last-name, and phone fields
for contacts in Connecticut’s 203 area code, you can execute the following query:
USE AdventureWorks;
GO
SELECT LastName, FirstName, Phone
FROM Person.Contact
WHERE Phone LIKE '203%';

That query returns a small subset of the data; but how many of your end users understand
the code required to get this information? Probably very few. You can write the query into
your front-end code, which is the display that your users see (usually in C# or a similar
language); but then the query will be sent over the network to the server every time it’s
accessed, and that eats up network bandwidth.
The best approach in this sort of a situation is to create a view for the users. Like a real table,
a view consists of a set of named columns and rows of data. The only difference between the
view and the table is that your view doesn’t contain any data—it shows the data, much like
the television set doesn’t contain any people, but just shows you pictures of the people in the
studio.
Unless it’s indexed, a view doesn’t exist as a stored set of data values in a database. The rows
and columns of data come from tables referenced in the query defining the view and are
produced dynamically when the view is referenced.
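To make the earlier query reusable, you might wrap it in a view along these lines (the view name is an invented example):

CREATE VIEW dbo.vw_ConnecticutContacts
AS
SELECT LastName, FirstName, Phone
FROM Person.Contact
WHERE Phone LIKE '203%';
GO

-- End users now need only a simple query against the view:
SELECT * FROM dbo.vw_ConnecticutContacts;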
Now that you have a basic understanding of views, look at how to integrate them into your
physical database design.

Analyzing Business Requirements

As you’ve just learned, views are generally used to focus, simplify, and customize the per-
ception each user has of the database. Views can be used as security mechanisms by letting
users access data through the view without granting the users permissions to directly access
the view’s underlying base tables. Views can be used to provide a backward-compatible
interface to emulate a table that used to exist but whose schema has changed. Views can
also be used when you copy data to and from Microsoft SQL Server to improve perfor-
mance and to partition data. How views are used depends heavily on your assessment of
your organization’s business requirements.

Views serve a number of functions and can have a number of roles:


Focusing data for the user. Views let users focus on specific data that interests them and on
the specific tasks for which they’re responsible. Unnecessary or sensitive data can be left out
of the view.
Simplifying data manipulation. You can define frequently used joins, projections, UNION
queries, and SELECT queries as views so that users don’t have to specify all the conditions
and qualifications every time an additional operation is performed on that data.
Providing backward compatibility. Views enable you to create a backward-compatible inter-
face for a table when its schema changes.
Customizing data. Views let different users see data in different ways, even when they’re
using the same data at the same time. This is especially useful when users who have many
different interests and skill levels share the same database.
Exporting and importing data. Views can be used to export data to other applications.
For example, you may want to use the Customer and SalesOrderHeader tables in the
AdventureWorks database to analyze sales data using Microsoft Excel. To do this, you can
create a view based on the Customer and SalesOrderHeader tables. You can then export the
data defined by the view.
Combining partitioned data across servers. The T-SQL UNION set operator can be used
within a view to combine the results of two or more queries from separate tables into a single
result set. This appears to the user as a single table called a partitioned view. In a partitioned
view, the data still appears as a single table and can be queried without having to manually
reference the correct underlying table.

Choosing the Type of View

SQL Server uses three types of views: standard, indexed, and partitioned. Each has its own
strengths and weaknesses.

Standard views. Combining data from one or more tables through a standard view lets you
satisfy most of the benefits of using views. These include focusing on specific data and simpli-
fying data manipulation.
Indexed views. An indexed view is a view that has been materialized. This means it has been
computed and stored. You index a view by creating a unique clustered index on it. Indexed
views dramatically improve the performance of some types of queries. Such views work best
for queries that aggregate many rows. They aren’t well-suited for underlying data tables that
are frequently updated.
Indexed views typically don’t improve the performance of the following types of queries:
• OLTP systems that have many writes
• Databases that have many updates
• Queries that don’t involve aggregations or joins
• Aggregations of data that have a lot of different values for the GROUP BY key
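As a rough sketch of how a view is materialized, the following example uses invented table and column names and assumes the indexed-view requirements (SCHEMABINDING, two-part table names, a non-nullable column under SUM, and so on) are met:

CREATE VIEW dbo.vw_SalesByProduct
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(LineTotal) AS TotalSales,   -- LineTotal assumed NOT NULL
       COUNT_BIG(*)   AS RowCnt        -- required when the view uses GROUP BY
FROM dbo.OrderDetail
GROUP BY ProductID;
GO

-- Creating a unique clustered index materializes (indexes) the view.
CREATE UNIQUE CLUSTERED INDEX IX_vw_SalesByProduct
ON dbo.vw_SalesByProduct (ProductID);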
Partitioned views. A partitioned view joins horizontally partitioned data from a set of
member tables across one or more servers. As you learned, this has the effect of making the
data appear to the user as if they’re one table. A view that joins member tables on the same
instance of SQL Server is a local partitioned view.
A view that joins data from tables across servers is called a distributed partitioned view.
Distributed partitioned views are used to implement a federation of database servers (which
is not part of the Star Trek universe). A federation of database servers (FDS) is a group of
independently administered servers that cooperate to share the processing load of a system. By

partitioning data, you can create an FDS, which lets you scale out a set of servers to support
the processing requirements of large, multitiered Web sites.
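A local partitioned view follows this general shape; the member table names are illustrative, and each member table would normally carry a CHECK constraint on the partitioning column:

CREATE VIEW dbo.vw_AllOrders
AS
SELECT OrderID, OrderDate, CustomerID FROM dbo.Orders2007
UNION ALL
SELECT OrderID, OrderDate, CustomerID FROM dbo.Orders2008;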

Specifying Row and Column Filtering

Views function to help focus data, but the result of a view on a database can still be
millions of records long, and your users may still be overwhelmed. Or, you may only
want to call for a specific part of the view output.
LAB EXERCISE: Perform Exercise 8.5 in your lab manual.

Filtering views is simple, as you'll see in Exercise 8.5.

SKILL SUMMARY

The principal building block of any database infrastructure design, the physical database, has
been the focus of this Lesson, and its mastery has required us to go through quite a bit of
material. First you learned that a database is a container for other objects, such as tables
and views, and that without databases to contain all these objects, your data would be a
hopeless mess.
You learned that a database consists of up to three kinds of files: primary data files, secondary
data files, and transaction log files. The primary data files are used to store user data and
system objects that SQL Server needs to access your database. The secondary data files store
only user information and are used to expand your database across multiple physical hard
disks. The transaction log files are used for up-to-the-minute recoverability by keeping track of
all data modifications made on the system before they’re written to the data files.
You were also introduced to the value of normalization and when to selectively allow
denormalization for performance purposes. You learned how to use SQL Server scripts to
document a database and how to use the Database Diagram Designer to diagram it.
You have learned that you should sit down with a pencil and paper and think about how the
tables will be laid out before you create them, and that you need to decide what the tables
will contain, making the tables as specific as possible. You also learned that tables are made
up of fields or columns (which contain a specific type of data) and rows (an entity in the
table that spans all fields). Each column in the table has a specific datatype that restricts the
type of data it can hold—a field with an int datatype can’t hold character data, for example.
Then you found you can create your own datatypes, which are system datatypes with all the
required parameters presupplied.
Tables are open to just about any kind of data when they’re first created. The only restriction
is that users can’t violate the datatype of a column. To restrict the data your users can enter
in a column, you learned how to enforce three types of integrity— domain, entity, and
referential—through check, default, and unique constraints, as well as primary and
foreign keys.
You’ve learned how using files and filegroups improves database performance, because it lets
a database be created across multiple disks, multiple disk controllers, or RAID systems.
Data access can be accelerated by using indexes at the expense of slowing data entry.
You first looked at the clustered index. This type of index physically rearranges the data
in the database file. This property makes the clustered index ideal for columns that are
constantly being searched for ranges of data and that have low selectivity, meaning
several duplicate values.
Nonclustered indexes don’t physically rearrange the data in the database; rather, they create
pointers to the actual data. This type of index is best suited to high-selectivity tables (with few
duplicate values) where single records are desired rather than ranges.

You also learned to design indexes by using the Database Tuning Advisor, a tool designed to
take the stress of planning the index off you and place it on SQL Server. This knowledge of
indexing will make it easier for you to plan indexes so that you can speed up data access for
your users.
Views don’t contain any data—it’s just another means of seeing the data in the underlying,
base table. You also learned about the three types of views and when to use them, and how
to filter data in a view.
For the Certification Examination:
• Be familiar with normalization and denormalization. It’s important that you know why
databases are normalized and when you should opt to selectively denormalize a database.
Pay particular attention to the performance parameters that dictate when these need to
be done.
• Know how to document and diagram a database. Make sure you understand the uses of
the Script As process for creating a SQL script of a database object and how to use the
Database Diagram Designer.
• Be familiar with partitioning. Partitioning is a feature introduced in SQL Server 2005. Its
role in performance enhancement, especially over multiple databases, is critical.
• Understand constraints and keys. Make sure you know how primary and foreign keys
function, as well as check, default, and unique constraints. You should be familiar with the
best situation in which to use each.
• Understand filegroups. Filegroups are a key method for maximizing SQL Server database
performance. You should be familiar with the performance enhancement they offer and
their restrictions and limitations.
• Understand indexes. You should know the basic differences between the types of indexes.
• Understand views. You should know that the purpose of a view is to focus data for users.
• Know the three different types of views and when to use them.

■ Knowledge Assessment

Case Study
Trevallyn Travel
Trevallyn Travel provides a variety of travel services. It has nine storefront agencies in
six North American cities, with its main office in New York. The company also serves
worldwide customers through an online travel agency.

Planned Changes
Trevallyn Travel plans to upgrade all existing SQL Server computers to SQL Server
2005. The management of the company wants a complete review of the existing
physical database design infrastructure to ensure that it’s aligned with business
requirements and optimizes performance.

Existing Data Environment


All SQL Server computers are located in the main office in New York. Currently, all
SQL Server computers are installed with a single default instance.
Existing databases are described in the following table:

SERVER NAME    DATABASE NAME     SIZE      DESCRIPTION
Launceston     HR                500 MB    Employee information, benefits, commission data
Devonport      Storefront        4 GB      Reservation tracking and completed travel forms
                                           for storefront travel agencies
Hobart         OnLineReadOnly    6 GB      Read-only subscriber to the TravelOnLine database;
                                           provides information on existing reservations to
                                           Internet customers
Ravenwood      TravelOnLine      10 GB     Reservation tracking and completed travel forms
                                           for the online travel agency

The Storefront database is accessed through a Visual Basic application. The TravelOnLine
and OnLineReadOnly databases are accessed through a web services application.

Existing Infrastructure
The TravelOnLine and Storefront databases are mission critical. The current backup
strategy includes nightly full backups, hourly transaction-log backups, and the bulk-
logged recovery model.
System databases are maintained on a hard disk set that is separate from the user
databases.

Business Requirements
The TravelOnLine database is the busiest and should be optimized accordingly.
In the Reservation table in the Storefront database, reservations that were made in
the last six months should be retrieved the fastest.
The distribution server has a large amount of free disk space. The distribution
database must be able to be restored from the most recent backup and then receive
changes from the publication database, allowing replication to continue.
A single drive failure should not cause a server to fail.
The TravelOnLine database has a table named Pax, which holds passenger information.
(Pax is travel-agent jargon for passenger.) Any optimization that occurs on the table
should not affect current indexes. The table contains the following columns:
• PaxID
• PaxName
• Address
• City
• Region
• PostalCode
• Phone
• PreferredAirline
The most common query to this table looks up the passenger’s name.
Reservation records in the TravelOnLine database have a status field that can have one
of three settings: 1 (received), 2 (in process), or 3 (completed). Users can retrieve and
update incomplete reservations through a view, but they must not be able to complete
orders through the view.

Multiple Choice
Circle the letter or letters that correspond to the best answer or answers.
Use the information in the previous case study to answer the following questions:
1. You need to define the datatype for a new column named MeritScore in the HR
database. Which option should you select?
a. Use the text datatype.
b. Use the nvarchar(max) datatype.
c. Use the varchar(max) datatype.
d. Set the large value.
2. You need to make recommendations for maximizing the performance of queries based
on passenger names from the Pax table. What do you recommend?
a. Create an index on the passenger name and ID columns. Set the index fill factors at
10 percent.
b. Create a nonclustered index on only the passenger name column.
c. Create a clustered index on the passenger name column.
d. Create a nonclustered index, using the INCLUDE clause for all columns, on the
passenger name column.
3. Query performance on the Reservation table of the TravelOnLine database is less than
optimal. As a solution, you decide to partition the table so that queries on the current
and future reservations are quickly returned. Which of the following is the best choice
for the partition column?
a. Reservation date column
b. Reservation status column
c. Reservation airline column
d. Reservation agent column
4. You have two tables in the HR database, HR.EmployeeName and HR.EmployeeAddress,
with columns as follows:

HR.EMPLOYEENAME            HR.EMPLOYEEADDRESS
EmployeeID                 AddressID
LastName                   EmployeeID
FirstName                  Street
Title                      City
Social Security Number     ZipCode
City                       State

Based on the previous information, which is the best choice to be a foreign key?
a. City column in HR.EmployeeName
b. City column in HR.EmployeeAddress
c. EmployeeID in HR.EmployeeName
d. EmployeeID in HR.EmployeeAddress
5. You have been told that the MeritIncrease column should be configured so that no
employee receives less than a 2 percent merit increase and no one receives more than an
8 percent increase. What do you do?
a. Create a default constraint set to 2 percent.
b. Create a check constraint that allows for data ranging from 2–8 percent.
c. Create a foreign key relationship between MeritIncrease and Salary.
d. Create a unique constraint.

6. Under what circumstances must a computed column be persisted? (Choose all that
apply.)
a. The computed column is used as a partitioning column of a partitioned table.
b. The column references a CLR function.
c. The computed column is used as a primary key.
d. The computed column is used in a check constraint.
7. Which of the following are true about the differences between clustered and
nonclustered indexes? (Choose all that apply.)
a. Up to 249 clustered indexes are allowed per table.
b. Nonclustered indexes are designed for columns that are searched for single values.
c. Clustered indexes are best used on columns with low selectivity.
d. Both physically rearrange the data in the table to conform to their constraints.
8. You decide to create a view of the OnLineReadOnly database to show the current reser-
vation status based on passenger name. The view joins tables from across servers. This is
an example of what kind of view?
a. Partitioned view
b. Standard view
c. Indexed view
d. Constrained view
9. Which of the following effects does normalizing a database object, such as a database or
table, have on indexing?
a. Faster sorting and index creation
b. Large number of clustered indexes
c. Narrower and more compact indexes
d. All of the above
10. You have two tables in the HR database, HR.EmployeeName and HR.EmployeeAddress,
with columns as follows:

HR.EMPLOYEENAME            HR.EMPLOYEEADDRESS
EmployeeID                 AddressID
LastName                   EmployeeID
FirstName                  Street
Title                      City
Social Security Number     ZipCode
City                       State

You need to ensure that there are no duplicate values in the Social Security Number
field. How should you do that?
a. Add a default constraint to the field.
b. Add a unique constraint to the field.
c. Make the Social Security Number field a primary key.
d. Make the Social Security Number field a foreign key.
LESSON 9
Creating Database Conventions and Standards
LESSON SKILL MATRIX

TECHNOLOGY SKILL                                                        EXAM OBJECTIVE
Create database conventions and standards.                             Foundational
Define database object-naming conventions.                             Foundational
Define consistent synonyms.                                            Foundational
Define database coding standards.                                      Foundational
Document database conventions and standards.                           Foundational
Create database change control procedures.                             Foundational
Establish where to store database source code.                         Foundational
Isolate development and test environments from the production
environment.                                                            Foundational
Define procedures for moving from development to test.                 Foundational
Define procedures for promoting from test to production.               Foundational
Define procedures for rolling back a deployment.                       Foundational
Document the database change control procedures.                       Foundational

KEY TERMS

camelCase: A method or standard for naming objects. With camelCase, all characters are lowercased except the first letter of component words other than the first word. An example of camelCase would be: customerAddress.
convention: A convention is a set of agreed, stipulated, or generally accepted norms or criteria, often taking the form of a custom.
method: A specific means of action to accomplish a stipulated goal or objective.
PascalCase: A method or standard for naming objects. With PascalCase, all characters are lowercased except the first letter of each component word. An example of PascalCase would be: CustomerAddress.
standard: A standard establishes uniform engineering or technical criteria, processes, and practices usually in a formal, written manner.


If you have any experience with databases, the need for and value of naming conventions,
particularly in an enterprise setting, should be both self-evident and axiomatic. In fact,
you may wonder why this book needs to have a Lesson on the obvious. If you have little
or no background, then you may consider this Lesson a primer in becoming a punctilious
nitpicker with a tendency toward anal retentiveness and rigidity of thought. You may also
want to know, “What’s the big deal about how things are named and what standards are
applied? The results are all that’s important.”

The answer is simple. Having database conventions and standards offers a method of
organizing the server infrastructure as well as increasing productivity and the effectiveness
of the database administrator and development teams. Good standards that are consistently
applied grow in usefulness over time because they help make even unfamiliar databases easier
to understand. Because it’s unlikely that you’ll be working alone, devising and creating data-
base conventions and standards should be a team effort. The standards should be good, work-
able, and something your team members agree with.
Finally, although it’s easy to think up naming conventions and coding standards, they must
be durable enough to survive changing circumstances. It’s difficult to modify conventions and
standards and apply them retrospectively to existing databases because of the impact doing
so can have on applications and security. Flexibility and the ability to adapt to changing (and
unforeseen) circumstances for standards and conventions are crucial to how successful they
are. A good example in the non-IT world is the U.S. Constitution. A mere four pages long,
it’s both the shortest and longest-lasting constitution in the world. The genius to its longevity
and effectiveness is its flexibility and ability to adapt to circumstances not even dreamt of by
its original authors.

■ Understanding the Benefits of Database Naming Conventions

THE BOTTOM LINE: When designed correctly, a database naming convention lets database developers
and administrators easily identify the type and purpose of any object in a database system.

It’s important to create a consistent and meaningful naming convention for a database server
infrastructure. Applying a single, consistent standard for the entire infrastructure, even if you
have to implement it in steps, will reduce the time and associated costs when developers start
using a new database. It will also simplify the task of managing a larger number of databases.

TAKE NOTE: Database naming conventions are typically product specific. What constitutes a valid
name or good practice in one database management system may be invalid or bad practice in
another. If you're using SQL Server with other database management systems, you'll probably need
to create a naming convention that spans each system. Similarly, if you're migrating from a different
database management system to SQL Server, you'll likely need to adapt the names used by migrated
objects to conform to SQL Server best practices.

Some benefits of establishing a database naming convention include the following:


• Personnel who use or maintain the database can easily identify an object’s purpose, type,
and function.
• Database naming conventions let you integrate new developers into the development
team quickly and easily. The learning curve can be shortened because good naming
conventions can make database code easier to read and understand.

Despite the tangible benefits of naming conventions, there are still those who think that the
need to establish them doesn’t apply to their circumstances. The arguments tend to fall into a
couple of categories:
• “Our team (or the company) is small, so adopting and enforcing a naming convention
is unnecessary administrative overhead.” The problem with this argument is that the
smallness of the organization is what calls for a naming convention. Without such a
convention, dependencies on particular team members are likely to develop. Similarly,
depending on an individual’s memory means you’ll inevitably lose some critical knowl-
edge if a team member moves on. Naming conventions and standards can minimize
that loss.
• “There isn’t time for new team members to learn current conventions.” This is a false
economy usually argued for by a shortsighted manager. There is an old proverb, “Give a
man a fish, and he eats for a day; teach him to fish, and he eats for a lifetime.” By apply-
ing the proverb here, the time spent understanding how a naming convention works can
save considerable time later.

Establishing and Disseminating Naming Conventions

You should establish a convention for naming all the major types of database objects and
provide documentation for all staff responsible for creating or maintaining databases and
database applications. You should learn how to avoid common pitfalls and dangers with
naming conventions. The following sections cover all these topics.

PROVIDING NAMING CONVENTIONS FOR DATABASE OBJECTS


There’s no “correct” way to establish naming conventions for database objects. There are
many approaches. You can, for example, use standard prefixes or suffixes based on the
type of objects. Or, you can adopt a set of conventions for naming the bodies of database
objects.
It’s common and useful to prefix constraints to identify the object type. For example, many
database designers use PK_ for primary keys, CK_ for check constraints, and FK_ for
foreign key constraints. Similar conventions for other database objects include usp for stored
procedures, ufn for user-defined functions, vw_ or v_ for views, and so on.
Another common practice is to give the body of a stored procedure a name that reflects the
stored procedure’s function. For example, a stored procedure that gets a list of salesmen from
the Personnel table might be called uspGetSalesmen.
With other objects—for example, indexes—it’s common to name the object by using the
name of the table followed by the name of the columns in the index. For example, a non-
clustered index over the CustomerID column of the CustomerOrder table might be called
IX_CustomerOrder_CustomerID.
For tables, the convention frequently used is the singular name of the entity that the table
represents, such as Employee, Product, or CustomerOrder.
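Pulled together, these conventions might look something like the following sketch; every name here is an invented example, and the referenced Customer table is assumed to exist with a primary key on CustomerID:

CREATE TABLE dbo.CustomerOrder
(
    OrderID    int      NOT NULL
        CONSTRAINT PK_CustomerOrder PRIMARY KEY,
    CustomerID int      NOT NULL
        CONSTRAINT FK_CustomerOrder_Customer
        REFERENCES dbo.Customer (CustomerID),
    OrderDate  datetime NOT NULL
        CONSTRAINT CK_CustomerOrder_OrderDate
        CHECK (OrderDate <= GETDATE())
);

CREATE NONCLUSTERED INDEX IX_CustomerOrder_CustomerID
ON dbo.CustomerOrder (CustomerID);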
Table 9-1 describes some common naming conventions for database objects.

TAKE NOTE: When faced with existing database objects that are poorly named but can't be renamed,
you can use synonyms as alternate names that are more descriptive. For example, if you have a
table named OrdNm that holds order name data, you should consider defining a synonym called
OrderName. You can then reference the table through this synonym until you can rename the table.
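A sketch of that approach, reusing the hypothetical OrdNm table from the note above:

CREATE SYNONYM dbo.OrderName FOR dbo.OrdNm;

-- Queries can now use the descriptive name until the table itself is renamed.
SELECT * FROM dbo.OrderName;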

Table 9-1: Summary of database objects and typical naming conventions

Tables: Tables typically represent entities such as Customer or Order. It's best to use the name of the entity that the table represents for the name of the table because the name should be both accurate and descriptive. Use singular names whenever possible.

Columns: Columns describe attribute data values, and you should try to retain the same meaningful name for each column in the database. For example, use LastName for a column holding the last name of an employee in the Employee table. Using descriptive names makes your SQL code more readable.

Views: Views typically join several tables or other views together to generate or summarize information. Use names that indicate the purpose of the information they return. It's common to use a standard prefix such as vw_ for view names to distinguish them from tables. For example, vw_YearlySalesPerSalesRegion could be the name of a view returning yearly sales grouped by sales region.

Stored procedures: Stored procedures express actions. You should use a meaningful name combining verbs and objects that describe their action. To avoid confusion with system-stored procedures, don't use the sp_ prefix; consider using usp instead.

User-defined functions: User-defined functions calculate values. As with stored procedures, use meaningful names that describe the calculations the functions perform. A common convention is to prefix the name with ufn to distinguish user-defined functions from columns or views in SQL statements. For example, ufnCalculateSalesTaxDue could be the name of a user-defined function that calculates the sales tax due for a transaction.

Triggers: Triggers perform an automatic action when an event occurs on a table. You should combine the name of the table and the trigger event type. For example, a trigger called dOrder might handle the delete event on the Order table, and the uOrder trigger might handle the update event on the Order table. You can also indicate whether the trigger is an AFTER or INSTEAD OF trigger by including After or InsteadOf in the name—for example, dAfterOrder.

Indexes: Index names commonly combine the name of the table and the names of the columns, and they frequently include a prefix such as IX_. For example, the index IX_Employee_ManagerID might span the ManagerID column in the Employee table. You can augment the prefix to indicate whether the index is clustered or nonclustered, a unique index, and so on. An advantage is that the index names become self-documenting. However, this approach can result in lengthy names. Normally this isn't a problem because you're unlikely to refer directly to the name of an index in your applications or SQL commands.

Constraints: Constraints specify rules to which data in a column or set of columns in a table must conform. It's best to name a constraint after the rule it enforces or the column it operates on. You can also add a prefix indicating the type of constraint (check, primary key, foreign key, unique constraint, and so on). For example, the check constraint CK_Employee_MaritalStatus might validate the data in the MaritalStatus column in the Employee table as it's entered.

Schemas: Use schemas to group database objects by functionality and to partition objects into protected domains. One danger is that it can be easy to confuse schema names with tables. For example, in the AdventureWorks database provided with SQL Server 2005, Sales is the name of a schema. However, many databases also have a table called Sales. It's more effective and less confusing to add a prefix that identifies a name as a schema. For example, you could use schSales to represent a schema and Sales to represent a table.

AVOIDING PITFALLS AND DANGERS WITH NAMING CONVENTIONS


You should exercise caution when developing your infrastructure-wide naming conventions
because you may get only one shot at it. Similarly, because it's difficult to modify production
systems, any bad naming habits that an organization follows can be very long-lived. This can
become a classic case of a self-perpetuating error that only a computer can excel at, and you'll
be responsible for it. Furthermore, following bad practices can slow development and can
result in unexpected behavior by a database or its contents.

CERTIFICATION READY? Make sure you know where SQL Server looks for stored procedures and
the search order.

RECOGNIZING BAD PRACTICES


If you’re not careful, database naming conventions can lead to problems. While there are no
hard-and-fast rules beyond consistency and using conventions that make sense in your con-
text and are easily transferable to other situations, there are some mistakes that you should
work to avoid.
Microsoft has identified a number of conventions and practices that it considers “bad
practice” and you should make sure that you avoid them.
Using the sp_ prefix in user-defined stored procedure names
If the sp_ prefix is used for a user-defined stored procedure, it causes SQL Server to search
the master database first and then the local database. SQL Server will stop searching when it
finds the first stored procedure that matches the name it is looking for. As a result, the master
database stored procedure will be executed if it’s marked as a system-stored procedure, and the
local database stored procedure will not be executed.
Another problem is identification. Using the sp_ or sp prefix makes it difficult to tell the dif-
ference between your own stored procedures and the system-stored procedures that come with
SQL Server.
The best practice, according to Microsoft, is to label user-defined stored procedures with the
usp_ prefix.
Using uppercase and lowercase inconsistently
It really doesn’t make a difference how you use upper- and lowercase. You can use them alone,
separately, or mixed. The latter can be useful because it gives visual cues about where key
parts of the object name begin and end, especially with compound names. Two examples of
common capitalization conventions are PascalCase and camelCase. Examples of PascalCase
include such names as OrderDetails or CustomerAddresses. Examples of camelCase include
names like myAddress and vendorTerms.
Again the choice is really up to you, but the worst thing you can do is be inconsistent. This
problem becomes a disaster if you install a database or an application on a case-sensitive
server, causing operations to fail that don’t exactly match the case usage of an identifier.
By commonly accepted convention, SQL key or reserved words are usually expressed in all
uppercase text while object names are primarily expressed in some form of lowercase text.
Using spaces or nonalphanumeric characters in object names
In a word, don’t, unless of course you like to overcomplicate things and use extra keystrokes.
Using spaces complicates code and forces you to use delimiters around identifiers, or double-
quote marks around table and column names. Microsoft recommends the use of the under-
score (_), as a word separator. Mixed cases can also help.

Naming tables with the tbl prefix


In a word, don’t. While Microsoft Access database developers commonly use the tbl prefix,
the presence of table names in the FROM clause of a SELECT statement in SQL Server
makes the table names unambiguous.
Including a datatype abbreviation in a column name
The biggest problem with following this convention is the maintenance cost. For example, when
you change a column's datatype, you have to change the column name or else invalidate the
convention. Keeping up with this purely arbitrary naming convention adds no value—a clear case
of the juice not being worth the squeeze.

CERTIFICATION READY? Suppose a query references a table named Orders and a stored procedure
references a table named ORDERS in the same database. Are these the same object? What if,
instead, there were columns in three different tables named OrderNumb, OrderID, and OrderNo.
What do you think of those different names?

Using short or abbreviated object names
There is no reason to stick to obscure and cryptic short names any longer. The point of naming
something is to identify it, not cause you to play a guessing game.

Using reserved words as object names
This is not only bad practice, but also rife with possible disaster. Using reserved words for object
names means that you constantly have to delimit identifiers with square brackets or double quote
marks. This makes your code difficult to read, and again, for no good reason. The possibility also
increases that the now-difficult-to-maintain SQL commands may fail.

ENCOUNTERING VENDOR NAMING CONVENTIONS


Vendors and contractors may be an unexpected problem and potential pitfall. Naming conventions
defined by a vendor may conflict with your organization’s naming conventions. In that instance,
staff members from your organization will need to learn the vendor’s naming standards if they’re
responsible for maintaining a vendor-supplied system. Although this isn’t necessarily a bad thing, it
isn’t uncommon for some of the vendor’s standards to be accidentally applied to your system and to
start coexisting with your naming standards. This is something you need to guard against.
If your organization is outsourcing database development work, it’s recommended that
you devise good naming standards and conventions for the contractor to apply. Otherwise,
you may find yourself with a contractor-supplied database design that is at odds with your
practices and that is difficult to maintain.

DOCUMENTING AND COMMUNICATING DATABASE NAMING CONVENTIONS


If you don’t already document work on and about the database infrastructure as a matter of
course, you should. A critical task in the creation and implementation of naming conventions
and standards is to document those you’ve adopted. Conventions can evolve over time, so it’s
important to keep such a document concise, clear, and up to date. In some cases, you may
need to customize the document for a specific project.
Another obvious need is to distribute database conventions and standards to all staff members
who need that information. Establish mechanisms to ensure that all database developers, admin-
istrators, and testers in the organization can access the latest version of the document. You can
use tools such as Microsoft SharePoint Portal Services to share these documents and keep con-
trol of document versions. Or, you can post them on your organization's intranet. Either way, the
key is to make sure that conventions and standards are disseminated and enforced.
If you’re engaging external vendors, contractors, or even another department or branch of
your organization in a database project, make sure the people involved know about and are
required to follow naming conventions. Establish mechanisms to check for naming conven-
tion compliance by partners. These can include random reviews or having staff dedicate time
to the task as part of a quality assurance process. The time you spend double-checking can
save you much more time later.

LAB EXERCISE: Perform Exercise 9.1 in your lab manual.

In Exercise 9.1, you'll examine and evaluate the object names in the AdventureWorks database.
In this exercise, you'll look at the names of objects in the AdventureWorks database that ships
with SQL Server 2005. You'll provide examples of good (and in some cases bad) naming practices
that have been followed.

■ Defining Database Standards

THE BOTTOM LINE: Just as you need to establish naming conventions, you need clearly defined
database standards. These standards cover T-SQL coding, database access, and change deployment.
In this section, you’ll examine the why and how of database standards and learn some basic
ways of creating and managing standards.

Ironically, the need for standards is an inevitable result of the flexible and freewheeling way in
which development has grown. As software developers created more and better database programs,
with maximum flexibility and an open invitation to innovate, they sowed the seeds of confusion.
Developers can now use many different techniques for accessing databases. They can docu-
ment their code in a number of different ways (assuming they document it at all). At the same
time, different teams can deploy databases and applications to the production environment in
a variety of ways. The problem with all this creativity and inventiveness is that it has unleashed
a form of documentation anarchy. When different developers and teams follow their own
individual practices, they can end up creating code and databases that are difficult to maintain.
Similarly, letting different groups deploy applications and databases in uncontrolled ways can
lead to chaos, possibly resulting in security failure, if not complete system breakdown.
Having no infrastructure standards—or, worse, having them and not enforcing them—is an
invitation to inconsistent behavior in the database and its application as well as development
of old-fashioned, poor-quality applications.
A key activity in designing your infrastructure must include database standards that are
clear, sensible, and enforced. Defining and using standards will alleviate many problems and
provide a number of benefits. For example, if you require developers to follow a standard
technique for accessing and manipulating databases, the result should be code that a differ-
ent developer can maintain with a minimal learning curve. At the same time, you’ll be more
confident of the quality of the applications being built.
In addition, defining database standards can help your team, department, or organization oper-
ate more systematically and can reduce the time it takes to learn new systems or move from one
system to another. Defining a standard process for deploying databases and database applications
reduces the scope for errors, minimizing the likelihood of system failure and security breaches.
Any list of database infrastructure standards is necessarily incomplete because every organiza-
tion has its own unique needs. As with naming conventions, database infrastructure standards
tend to be developed or enhanced by the organization that uses them, so there is no such
thing as an exhaustive list.
However, there are general standards that are nearly universal. The following sections describe
the types of standards you should consider defining.

Transact-SQL Coding Standards

The first step you should take is adjusting any preconceived notions you have about
Transact-SQL (T-SQL) code and thinking of it as true source code. Database T-SQL
code such as stored procedures, triggers, and scripts is the most common means of
implementing critical portions of database applications. When dealing with T-SQL, you
should use source-code control and enforce standards for good coding practices. You
should also ensure that all developers apply the appropriate coding standards when
performing code reviews.

T-SQL coding standards should cover a wide range of functional areas, including transaction
and error handling, stored procedure unit testing, and debugging mechanisms. Standards
should stipulate good commenting and stylistic practices, making stored procedures, func-
tions, views, T-SQL statements, and any T-SQL code items easy to understand and maintain.

DEFINING T-SQL STANDARDS


When defining T-SQL coding standards, you might want to follow these common
recommendations:
• Use templates for each type of object, such as stored procedures, user-defined functions,
views, and triggers. The templates usually contain predefined code that guides developers
through the items they should implement. Templates can also contain boilerplate areas
for descriptions, the author, the date of creation, and a log of changes and reasons for
the changes.
• Adopt the following stylistic standards:
° Prefix every reference to a database object with the name of the schema it belongs to.
° Indent every block of code appropriately.
° Use uppercase letters for all SQL and SQL Server keywords.
• Apply the following functional standards to database code objects, whether based on T-
SQL or managed code:
° Ensure that code in triggers can handle multiple inserts, updates, or deletes, not just a
single row.
° Never use T-SQL user-defined functions (UDFs) to perform searches on other tables by
executing a lookup for some value based on a key. This use of UDFs can result in poor
performance if a UDF is used as part of a SELECT query that returns many records.
° Avoid using cursors inside stored procedures. Cursors are exceptionally poor replace-
ments for set-based queries and should be used only when absolutely required.
° Require that stored procedures avoid creating and using temporary tables unless they
improve performance.
° Employ TRY . . . CATCH constructs to perform error handling. This helps simplify the
logic of a T-SQL block and avoids repeated tests of the @@ERROR function.

LAB EXERCISE
Perform Exercise 9.2 in your lab manual.
In Exercise 9.2, you'll use Template Explorer to use an existing template for T-SQL code.
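To make these recommendations concrete, the following is a minimal sketch of what a procedure
created from such a template might look like. The schema, object, and parameter names
(Sales.usp_GetOrdersByCustomer, Sales.Orders, @CustomerID) are illustrative assumptions only
and aren't part of any particular database:

-- =============================================
-- Author:       <author name>
-- Create date:  <date>
-- Description:  Returns all orders for one customer.
-- Change log:   <date> <author> <reason for change>
-- =============================================
CREATE PROCEDURE Sales.usp_GetOrdersByCustomer
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        -- Object references are schema-prefixed and keywords are uppercase.
        SELECT OrderID, OrderDate, TotalDue
        FROM Sales.Orders
        WHERE CustomerID = @CustomerID;
    END TRY
    BEGIN CATCH
        -- Centralized error handling instead of repeated @@ERROR checks.
        DECLARE @ErrorMessage NVARCHAR(4000);
        SET @ErrorMessage = ERROR_MESSAGE();
        RAISERROR(@ErrorMessage, 16, 1);
    END CATCH
END;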
DOCUMENTING, DISSEMINATING, AND REVIEWING CODING STANDARDS
As with naming conventions, you should make proper and detailed documentation of T-SQL
coding standards one of your most critical job functions. Good documentation makes it easier
for new database developers and database administrators to adapt to the practices adopted by
your organization.
Having a coding standard makes performing code reviews much easier. Code that fol-
lows a good standard will be correctly aligned, will have the same degree of comments and
documentation, and will be easier to read than code formatted using a helter-skelter style.
Reviewers can concentrate on issues such as the suitability of the algorithms used and veri-
fying that code solves the problem for which it’s designed, rather than trying to follow it.
Include standards related to what developers should and should not do with regard to T-SQL
code. Often there are multiple ways to accomplish an action. If you specify the desired way
as part of the standards documentation, you have a much better chance that developers will
adhere to this standard.
To disseminate the standards, you can use a communication portal tool such as SharePoint
Portal Server. Similarly, you can use a shared drive or a company intranet site.

Defining Database Access Standards


Databases can be accessed in a number of ways, so defining a standard mechanism for
accessing a database makes it simpler to enforce best practices and maintain security. For
example, you can specify that all data access be done using stored procedure calls from
a client application or middle-tier components. Doing so allows you to modify the
database schema or tune queries without needing to modify client or middle-tier code.

Another reason to develop database access standards is that when applications access databases
in a wide variety of nonstandard ways, it becomes much more difficult to optimize systems,
trace connections when identifying performance problems, and enforce security best practices.
The lack of a data access standard can also needlessly increase the complexity of the deploy-
ment process, adding an unnecessary level of fragility to an application (and the database).
Prudence dictates that, as with most infrastructure activities, you should develop a set of stan-
dards or rules for accessing databases that you can apply to your entire infrastructure.
The first question you should consider is whether you want to allow users and applications to
access the data in a database directly or only indirectly.

DIRECTLY ACCESSING THE DATABASE


Permitting direct access to database data results in a tight coupling between the SQL com-
mands that an application uses and the database schema. If you modify the tables or views in
the database, you’ll probably need to modify the application as well. Similarly, if you want to
tune the queries used by an application, you’ll probably have to change the application source
code and then redeploy the application. As a result, it isn’t normally advisable to allow direct
access to the data in a database.
You can use at least two mechanisms to implement indirect access to the data in a database:
You can specify that applications must use stored procedures, or you can restrict all data
access to use views.

INDIRECTLY ACCESSING THE DATABASE THROUGH STORED PROCEDURES


Using stored procedures to access the database has many advantages:
• Applications aren’t tightly coupled to the database schema and don’t rely on a fixed
structure of tables and columns. You can modify the structure of tables without affect-
ing the application; all you need to do is update the stored procedure to use the new
schema, and it will then return the same results and take the same parameters.
• Stored procedures can be used to shield operations that may expose sensitive data that
should be hidden from the user or application.
• It’s easier to optimize and tune queries without affecting or needing to modify applica-
tions that use the stored procedure.
• This method can reduce network traffic by encapsulating logic in the server rather than
in the client applications. Note that SQL Server can generate, optimize, and reuse the
same query execution plan when the same stored procedure is executed repeatedly.
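As a minimal sketch of this pattern (the HR schema, table, procedure, and role names here are
hypothetical, not objects from any shipped database), an application role can be granted
permission to execute a procedure without receiving any rights on the underlying table:

CREATE PROCEDURE HR.usp_GetEmployeeByID
    @EmployeeID INT
AS
BEGIN
    SET NOCOUNT ON;
    -- The application calls this procedure instead of querying the table directly.
    SELECT EmployeeID, FirstName, LastName, HireDate
    FROM HR.Employee
    WHERE EmployeeID = @EmployeeID;
END;
GO
-- The application's database role may run the procedure
-- but has no direct access to the HR.Employee table.
GRANT EXECUTE ON HR.usp_GetEmployeeByID TO AppUserRole;
DENY SELECT ON HR.Employee TO AppUserRole;

If HR.Employee is later restructured, only the procedure body changes; callers continue to pass
the same parameters and receive the same result set.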

INDIRECTLY ACCESSING THE DATABASE THROUGH VIEWS


You can create a view for each table and provide access to the views rather than to the under-
lying tables. You can also design views that join tables or generate summary data. Some of the
advantages of using views to access data include the following:
• Views can hide complex SQL logic from applications and reduce the coupling between
an application and a database, enabling you to modify the underlying tables without
requiring that you change the application.
• Views can be configured to be selective about the information they make available to
end users and applications, based on the identity of the end user or application using
the views.
• Applications can be selective in the data that they retrieve. This differs from using
a stored procedure, because when using a view, you can configure an application to
retrieve only the columns required to implement functionality. For example, an applica-
tion that displays the hire and termination dates of employees doesn’t have to retrieve the
salaries of employees, even if this data is present in the view.
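A minimal sketch of this approach, again using hypothetical object names, might look like the
following. The view exposes hire and termination dates but not salaries, and it is the only
object the application role is allowed to query:

CREATE VIEW HR.vw_EmployeeDates
AS
-- Salary columns are deliberately omitted from the view.
SELECT EmployeeID, FirstName, LastName, HireDate, TerminationDate
FROM HR.Employee;
GO
GRANT SELECT ON HR.vw_EmployeeDates TO AppUserRole;
GO
-- The application retrieves only the columns it needs.
SELECT EmployeeID, HireDate, TerminationDate
FROM HR.vw_EmployeeDates
WHERE EmployeeID = 42;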
One place where stored procedures are a better option than views is in reducing network
traffic. The amount of logic you can encapsulate in a view is limited when compared to that
available in a stored procedure. To compensate, you would need to place more of the logic in
client applications—not necessarily the best approach.

DOCUMENTING AND COMMUNICATING DATABASE ACCESS STANDARDS


As before, and with all standards, documentation and dissemination are crucial tasks. You
should ensure that all developers and designers responsible for building database applications
are aware of the data access standards, and, of course, enforce them.
One way of ensuring enforcement, if you’re responsible for assuring database access standards
are followed, is to validate all applications using a database and ensure that developers have
followed the appropriate data-access standards before allowing the application to be deployed
in a production environment. To disseminate the standards, you can use a communication
portal tool such as SharePoint Portal Server. Similarly, you can use a shared drive or a
company intranet site.

Deployment Process Standards

This section discusses standards for deploying and coordinating changes to database
structures and the matching application code. Database structure and application code are
normally mutually dependent, and deploying even small changes can require careful
coordination of the steps in the deployment process.

By its nature, no activity is more complex than application deployment. Logistically, it’s a
complicated process, often involving several teams—a breeding ground for snafus. Having
deployment standards helps reduce the complexity of deployment and clarifies procedures
when the unexpected happens.
Similarly, an application development life cycle can be a long and complicated process, with
the goal of deploying the application to a production environment. Databases are often han-
dled independently. For example, a single database may serve multiple applications and have
its own development life cycle.
Consequently, you can’t create standards for database deployment in a vacuum. You must
define them in conjunction with standards for application deployment. Just as the application
deployment process requires a good, well-documented, and well-tested deployment plan, so
does the database deployment process.
The process of deploying a database has some unique features that require special attention to
ensure that the database will meet the necessary quality and reliability standards after deploy-
ment to the production environment. It’s important to remember that deploying a modified
database is considerably different from deploying a modified application. Unlike an applica-
tion, you can’t replace a database with a newer version. Instead, as part of the deployment pro-
cess, you must ensure that the contents (such as data and database objects) are transformed and
transferred as well. You’ll likely have to create scripts that update the structure of a database
and make the appropriate changes to the data. Finally, you must provide a way to roll back the
changes and revert to the previous version of the database if the deployment fails.
The following sections discuss some guidelines for developing database deployment standards.

DEFINING THE ROLE OF DEVELOPMENT, TESTING, AND PRODUCTION DATABASES
As the first step, make sure you clearly distinguish the role and location of development, test,
and production databases. Normally, these are stored on different servers.
Developers should develop only using the development database. When a developer creates
scripts that build or modify the development database, the scripts should initially be tested
on the development server. When development is complete, then and only then should the
scripts be transferred to the test server and used to construct or update the test database.
If testing fails at any point, developers update the scripts in the development environment
before sending them back for testing. When testing has been completed, the same scripts are
then used to build or update the production environment.
To ensure the validity of the development and test environments, it’s most efficient to build
the development and test databases from backups of the production database (making sure
to protect or remove any sensitive data if applicable). You should utilize a source-control sys-
tem to maintain the latest versions of table schemas, stored procedures, and all database code
objects. The database source code should be versioned and labeled following the style adopted
for the overall application development project. Save all deployment scripts, including those
implementing schema changes and data modifications, in the same source-control system.

DEFINING METHODS FOR PROTECTING PRODUCTION DATA DURING THE DEPLOYMENT PROCESS
Your highest priority is to ensure the integrity of the production database. (Consider having
that sentence engraved on your keyboard!) Make no mistake; losing a production database
because of sloppy handling can be a career ender.
How do you do that? There are many methods, but using the following tips as appropriate
should keep you safe from an egregious error:
• Allow only production database administrators to access the production database.
Doing so creates responsibility and ownership. And using only database administrators
familiar with the production environment will probably reduce the number and type
of errors.
• Make changes to the database only by using T-SQL scripts. You can use the SQLCMD
tool to run T-SQL scripts from the command line or from command scripts. You can also
parameterize T-SQL scripts. Doing this lets you make sure you're running the same
commands in the development, test, and production environments and makes it easy to
repeat the changes if necessary. You should perform thorough unit and integration testing
of these scripts. (A sketch of a parameterized deployment script appears after this list.)
X REF
Lesson 11 covers backup in detail.
• Back up all affected databases before deployment. It's easy to forget a step as obvious
as this under the stress of a deployment operation, so make sure you explicitly document
it as a step in the deployment process. (Engrave this one on the keyboard as well.) Doing
so can save you; it's almost a given that databases crash only when there's no backup.
• Have a rollback plan. If the deployment fails, you don’t want to be improvising on the
fly. Believe it or not, acting without a plan usually makes the problem worse. Rollback
may be a simple matter of restoring the database from a backup. However, if new data
has been added that must not be lost, you need to have developed and tested scripts to
perform the rollback operation that also preserve the data.
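The following is a rough sketch of such a parameterized script. The database name, backup
path, file name, and schema change are all illustrative assumptions; the point is that the
identical script runs against development, test, and production, with only the sqlcmd
variable values changing:

-- deploy_changes.sql
-- Example invocation (one line):
--   sqlcmd -S MyServer -i deploy_changes.sql -v DatabaseName="SalesDB" BackupPath="D:\Backups\SalesDB_predeploy.bak"

-- Back up the affected database before making any changes.
BACKUP DATABASE [$(DatabaseName)]
TO DISK = '$(BackupPath)'
WITH INIT;
GO

USE [$(DatabaseName)];
GO

-- The schema change being deployed (illustrative only).
ALTER TABLE dbo.Customer ADD PreferredContactMethod VARCHAR(20) NULL;
GO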

DEFINING THE ROLES AND RESPONSIBILITIES OF STAFF


Your deployment plan should clearly indicate what staff member or role is responsible for execut-
ing each step. The documentation should also clearly specify the sequence of steps and include
a decision tree detailing the options available for each step depending on whether it succeeds or
fails. The plan must also specify how long the deployment process will take and any dependencies
that other production systems have on the database. If the deployment requires a period when
the database is unavailable to production, make sure the deployment plan document identifies
the staff members who should be notified as well as how long the system will be unavailable. It’s
best to schedule deployments that include service unavailability during off-peak hours.

RECORDING CHANGES IN A RUN BOOK


A run book logs all actions taken by a database administrator that affect production databases.
This includes anything that modifies the database or server configuration and any specific
changes, based on user requests, made to data.
The run book gives you a precise record of all the changes you’ve made to the database and
the date and time of each change. This will help you to reproduce or undo these changes if
necessary. You can also use the run book to ascertain whether an error was caused by your
actions or an external event.

It’s best to keep your run book as a document in your source-control tool, so that you have
access to all versions.
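Some teams supplement the run-book document with a simple logging table that every
administrator updates immediately after making a change. The sketch below is illustrative
only; the dbo.RunBook table and its columns are hypothetical, not a built-in SQL Server
feature:

CREATE TABLE dbo.RunBook (
    EntryID        INT IDENTITY(1,1) PRIMARY KEY,
    ChangeDate     DATETIME     NOT NULL DEFAULT (GETDATE()),
    PerformedBy    SYSNAME      NOT NULL DEFAULT (SUSER_SNAME()),
    TargetDatabase SYSNAME      NOT NULL,
    ChangeRequest  VARCHAR(50)  NULL,      -- ticket or user request number
    Description    VARCHAR(MAX) NOT NULL
);
GO
-- Record each change as soon as it is performed.
INSERT INTO dbo.RunBook (TargetDatabase, ChangeRequest, Description)
VALUES ('SalesDB', 'CR-1042',
        'Added PreferredContactMethod column to dbo.Customer per deployment plan v1.3.');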

Database Security Standards

X REF
For more information about designing and implementing security policies for SQL Server,
see Lessons 4 through 7.
Database security is an extensive and important topic. Setting standards for database
security can help reduce that complexity across your entire infrastructure. For example,
you may decide to require that all users log in to SQL Server by using Microsoft
Windows authentication, thereby enforcing the same level of security for database users
as at the network level. Although you should be aware of the need to set database security
standards, detailed discussion is beyond the scope of this lesson.

SKILL SUMMARY

In this lesson, you learned about the importance of naming conventions and database control
standards as part of an effective infrastructure. You learned how to design a flexible naming
convention system that maximizes effectiveness. You learned naming convention best practices.
You also read about bad naming practices and how to avoid them. If you’ve ever questioned
the value of a naming convention system, that doubt has been laid to rest.
You also learned that a naming convention system that is inflexible is as valueless as one that
doesn’t exist. You reviewed some methods, such as synonyms, for dealing with existing databases
that have no or poor naming conventions.
You examined how defining database standards can help your team, department, or organization
operate more systematically and reduce the time it takes to learn new systems or to move from
one system to another. You learned techniques for defining standard processes to deploy data-
bases and database applications and how doing so minimizes the chance for errors and potential
of system failure and security breaches. You learned about run books and how to use them when
deploying and maintaining your databases and applications.
You also read about best practices for coding and database access standards and how they
integrate with the security standards that you learned about in other Lessons.
For the certification examination:
• Understand the benefits of naming conventions. It’s important to know how naming
conventions work and how to develop a flexible set of naming standards.
• Understand the common bad naming practices. Just as it’s important to know what a
good naming convention is, you should also know the most typical errors and how to
avoid them.
• Be familiar with T-SQL coding standards. Understand what makes a good T-SQL code
standard, what common errors to avoid, and how to use Template Explorer to minimize
the risk of error.
• Be familiar with database access standards. Understand the best practices for database
access and how to define them as standards.
• Be familiar with database deployment standards. Understand the best practices for database
deployment and how to define them as standards. Understand how to plan deployment.
Make sure you understand the value of assigning roles and how to do so effectively. You
should also be aware of the various functions of development, test, and production servers
and how and when to use them. Be aware of how to protect data during deployment or
changes, as well as when and how to plan rollbacks in the event of a deployment failure.

■ Knowledge Assessment

Multiple Choice
Circle the letter or letters that correspond to the best answer or answers.

1. Which of the following are benefits of having database naming conventions? (Choose all
that apply. )
a. Provides a method to organize infrastructure
b. Reduces the learning curve for new database administrators
c. Makes coding easier
d. All of the above
2. Which of the following are the most important attributes of a naming convention?
(Choose all that apply.)
a. Flexibility
b. Regulatory requirements
c. Consistency
d. Size of the organization
3. Which of the following database objects should have a naming convention? (Choose all
that apply.)
a. Database
b. Table
c. Trigger
d. Index
4. Which of the following practices should not be followed?
a. Prefixing a view with vw_
b. Prefixing a stored procedure with sp_
c. Using prefixes with schema
d. Using the prefix ufn to define a user-defined function
5. Which of the following are good naming practices for indexes? (Choose all that apply.)
a. Combine the name of the table and the names of the columns.
b. Specify whether the index is clustered or nonclustered.
c. Include a prefix such as IX_.
d. Use spaces to separate key elements.
6. When you have an existing database with poorly named objects that cannot be renamed,
what is the best way to improve the clarity of the naming conventions?
a. Use a lookup table.
b. Create a new column.
c. Note in your standards documentation what the poorly named object actually
represents.
d. Use a synonym.
7. Which of the following is not a bad practice for naming conventions?
a. Using the sp_ prefix in user-defined stored procedure names
b. Inconsistent use of uppercase and lowercase letters
c. Using numbers in the name
d. Using reserved words for object names
8. Which of the following are not recommended names for tables in a SQL Server
database? (Choose all that apply.)
a. Person.Address
b. Person.Address Type
c. tbl_Person.AddressType
d. dbo.MSmerge_history

9. Consider the following trigger name found in the AdventureWorks database:
ddlDatabaseTriggerLog. Which of the following characteristics does it have? (Choose
all that apply.)
a. Proper use of uppercase and lowercase letters
b. Proper use of a prefix to indicate the type of operation performed
c. Proper use of the word Trigger in the name
d. Proper use of alphanumeric characters
10. Consider the following index name found in the AdventureWorks database: AK_
BillOfMaterials_ProductAssemblyID_ComponentID_StartDate. Which of the following
good naming convention practices are followed in this name? (Choose all that apply.)
a. The index name includes a prefix indicating its type.
b. The index name includes the name of the original table.
c. Names of objects are separated by an underscore.
d. All of the above.
11. Which of the following is a useful tool available in SQL Management Studio for devel-
oping T-SQL code for database objects?
a. Business Intelligence Development Studio
b. Template Explorer
c. Object Explorer
d. Solution Explorer
12. In T-SQL code, you should adopt which of the following stylistic standards? (Choose all
that apply.)
a. Prefix every reference to a database object with the name of the schema it belongs to.
b. Indent every block of code appropriately.
c. Use lowercase for all SQL and SQL Server keywords.
d. None of the above.
13. You should apply which of the following functional standards to database code objects
whether based on T-SQL or managed code? (Choose all that apply.)
a. Ensure that code in triggers can handle multiple inserts, updates, or deletes, not just a
single row.
b. Use T-SQL UDFs to perform searches on other tables by executing a lookup for
some value based on a key.
c. Use cursors inside stored procedures.
d. Require that stored procedures avoid creating and using temporary tables unless they
improve performance.
14. Which of the following are true regarding allowing users and applications direct access to
data in a database? (Choose all that apply.)
a. It results in a tight coupling between the SQL commands that an application uses
and the database schema.
b. It improves database security.
c. Modifying tables or views in the database will likely require modification of the
application.
d. It streamlines troubleshooting.
15. Which of the following mechanisms can be used to implement indirect access to the data
in a database? (Choose all that apply.)
a. Triggers
b. Indexes
c. Stored procedures
d. Assemblies

16. Which of the following are good deployment practices? (Choose all that apply.)
a. Require developers to use the test rather than the production database.
b. Do a complete backup of the production database before applying changes.
c. Utilize a source-control system to maintain the latest versions of table schemas, stored
procedures, and all database code objects.
d. Allow only production database administrators to access the production database.
17. What must be in place prior to initiating a deployment from test to production?
(Choose all that apply.)
a. Backup of the development database
b. Definition of roles and responsibilities of staff involved
c. Sequence of steps to follow
d. Rollback plan
18. What should you use to log all actions taken by a database administrator that affect a
production database?
a. Transaction log
b. Shipping log
c. Desk calendar
d. Run book
19. The word NewYorkYankees is an example of what style of casing?
a. Hungarian case
b. Reverse Polish case
c. camelCase
d. PascalCase
20. Which of the following methods and tools can you use to ensure proper dissemination of
documentation regarding naming conventions, coding standards, rollback plans, deploy-
ment sequence, and other control procedures and standards? (Choose all that apply.)
a. Network share
b. Intranet site
c. SharePoint Portal Service
d. All of the above
LESSON 10
Designing a SQL Server Solution for High Availability
LESSON SKILL MATRIX

TECHNOLOGY SKILL 70-443 EXAM OBJECTIVE


Develop a strategy for migration to a highly available environment. Foundational
Analyze the current environment. Foundational
Ascertain migration options. Foundational
Choose a migration option. Foundational
Design a highly available database storage solution. Foundational
Design the RAID solutions for your environment. Foundational
Design a SAN solution. Foundational
Design a database-clustering solution. Foundational
Design a Microsoft Cluster Service (MSCS) implementation. Foundational
Design the cluster configuration of the SQL Server service. Foundational
Design database mirroring. Foundational
Design server roles for database mirroring. Foundational
Design the initialization of database mirroring. Foundational
Design a test strategy for planned and unplanned role changes. Foundational
Design a high-availability solution that is based on replication. Foundational
Specify an appropriate replication solution. Foundational
Choose servers for peer-to-peer replication. Foundational
Establish a strategy for resolving data conflicts. Foundational
Design an application failover strategy. Foundational
Design a strategy to reconnect client applications. Foundational
Design log shipping. Foundational
Specify the principal server and secondary server. Foundational
Switch server roles. Foundational
Design an application failover strategy. Foundational
Design a strategy to reconnect client applications. Foundational
(continued )


LESSON SKILL MATRIX (continued)

TECHNOLOGY SKILL 70-443 EXAM OBJECTIVE


Select high-availability technologies based on business requirements. Foundational
Analyze availability requirements. Foundational
Analyze potential availability barriers. Foundational
Analyze environmental issues. Foundational
Analyze potential problems related to processes and staff. Foundational
Identify potential single points of failure. Foundational
Decide how quickly the database solution must failover. Foundational
Choose automatic or manual failover. Foundational
Analyze costs versus benefits of various solutions. Foundational
Combine high-availability technologies to improve availability. Foundational

KEY TERMS
database mirroring: A technology for continuously copying all data in a database from one
server to another so that in the event that the principal server fails, the secondary server
can take over the processing of transactions using its copy of the database.
failover: A switch between the active and standby duplicated systems that occurs
automatically without manual intervention. Sometimes known as switchover.
high availability: The continuous operation of systems. For a system to be available, all
components including application and database servers, storage devices, and the end-to-end
network need to provide uninterrupted service.
log shipping: A technology for high availability that is based on the normal backup and
restore procedures that exist with SQL Server. In this environment, transaction-log backups
are made on the principal server and then copied to the secondary server.
merge replication: A method of replication that transfers data from one database to one or
more other databases. Data can be changed in more than one location. This may cause
conflicts to arise.
mirror database: The passive or secondary database in a mirroring configuration. Also
known as the secondary database.
principal database: The active database in a mirroring configuration.
principal server: A machine that during normal operating conditions provides the services
that a service such as SQL Server offers.
quorum: The majority of servers in a mirroring configuration. A quorum of two servers
determines which database is the principal server. In a normal situation, the principal
database and the witness form a quorum that keeps this primary server functioning as the
primary database in a mirroring configuration.
secondary database: The passive or secondary database in a mirroring configuration. Also
known as the mirror database.
single point of failure: A vulnerability whose failure leads to a collapse of the whole.
snapshot replication: A method of replication that involves database snapshots. This form
of replication is not a high-availability solution.
transaction replication: A method of replication that transfers transactions from one
database to one or more other databases. Changes to data are not allowed on the receiving
database(s).
witness server: An optional third server used in some mirroring configurations to initiate
the automatic failover within seconds of the principal server failing.

A highly tuned, efficiently designed, well-configured database is of no use if it isn't
available. The past few years have seen some huge disasters across the world, from the tsunami
in the Indian Ocean to Hurricane Katrina in the United States. Each of these has taken
its toll in many ways, many more severe than the continued availability of your database
server. However, these disasters have brought to the forefront the need to ensure that
your computer systems can survive and continue to function in the face of issues with
the primary server.

As SQL Server has matured as a product, increasing numbers of people have called for better
solutions for ensuring their databases are highly available. Microsoft has responded, expanding
the capabilities of SQL Server in this area with each version. With SQL Server 2005, there are
not only more solutions but also solutions that are easier to implement and administer.
The holy grail of availability measurements is the five nines, which corresponds to an uptime
or availability of 99.999 percent. This equates to a yearly downtime of five minutes—barely
enough time for a reboot on most servers.
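As a quick check of that figure: a year contains roughly 365.25 x 24 x 60, or about 525,960
minutes, and the 0.001 percent that remains after 99.999 percent uptime is approximately
5.3 minutes of allowable downtime per year.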
Although a single server probably can’t achieve this level of availability for any appreciable length
of time, using two or more servers with a technology to move data, connections, and the other
parts of a SQL Server application to another server can help you get to this level of reliability.
This Lesson looks at the four main technologies used in SQL Server solutions to achieve a
highly available database server.

■ Examining High-Availability Technologies

THE BOTTOM LINE
SQL Server incorporates four technologies to enable you to build a highly available solution:
clustering, database mirroring, log shipping, and replication.

Before looking at any particular high-availability solution, you should first examine the goals
of a highly available system. There are some common misperceptions as to what benefits and
capabilities a high-availability (HA) designed system brings to a particular company. As with
any technical solution, the choice of which HA technology to choose should ensure that the
business requirements for availability and cost are met.

Identifying Single Points of Failure

A single point of failure is a person, component, or process that brings down the system
when it stops working. This can be a DBA who forgets to run a critical process or a
memory chip that fails and crashes a server. The goal of high-availability systems is to
withstand a single failure and continue to function.

Your database server contains multiple points of failure, some of which can be mitigated, and
some of which can’t. Suppose you install Windows 2003 and some edition of SQL Server on
your laptop computer and begin responding to client requests for a web application you’ve
built. Your single points of failure are as follows:
• CPU. A CPU failure will crash your server.
• Power supply. Most laptops have a single power supply, so its failure will crash the
system.
• Disk drive. Most laptops don’t contain any type of RAID technology, so a single drive
failure will crash the system.

• Network connection. Most laptops have a single network interface card (NIC) and
a single path to connect to the network, so the NIC, cable, or switch can crash your
system.
• Windows 2003. Until you implement some type of HA technology, the Windows
operating system host is a single point of failure.
• SQL Server and the application. The software components of your system, subject to
patching and changes, can fail, resulting in a system crash.
You could have a single point of failure in other places, but these are the primary ones. Some
of these can be mitigated—arguably, all of them, with a technology such as clustering. Some
components (for example, the built-in laptop mouse) might cause problems if they failed but
probably wouldn’t crash the system. If however you were unable to make take critical actions
due to the simple mouse failure, you have the risk that a small minor failure could facilitate
more significant problems.
The key goal of your HA system design is to eliminate as many single points of failure as
possible. This usually means designing redundant parts into the system, such as RAID drive
arrays, spare power supplies, and so on; but it can also include developing a plan for alternate
ways of running the system in the event of a disaster.
In designing your HA system, you must examine all the components, down to the cables that
connect the systems, and assess the impact of any particular piece of equipment failing. In
building the HA system, you should have a way to mitigate any of these failures—preferably,
an automated response.
You should also think creatively about related parts of your system. Consider patches and
upgrades, staff, vendor resources (such as your Internet connection), and more to ensure that
every part of the system, from server to client, has as few single points of failure as possible.

Setting High-Availability System Goals

Although each of these technologies works in a slightly different way, the goal for all of
them is to ensure that your data can be accessed almost all the time. This implies that the
components of SQL Server that are likely to fail won’t affect an application’s ability to
query and change the data. The HA technologies built into SQL Server don’t necessarily
guarantee that any particular hardware component or even Windows host will continue
to function, but rather that the services provided by SQL Server—the ability to access
data—will continue to be available to clients. This goal should be accomplished in
tandem with preventing the loss of any data. Usually, this requires synchronization of
the data between various copies that exist on different systems.

TAKE NOTE
An HA system is often referred to as having no single point of failure, meaning that any
one component that fails won't affect the ability of the database server to function.

In setting your goals, be sure you're meeting the needs of your organization and not just
building an HA system focused on uptime. The cost of the solution, whether automatic or
manual failover is required, and the impact on the finances of the company based on ROI are
all factors that should be incorporated into your design goals.
The machine that provides the services that SQL Server offers—access to data, messaging
queues, and so on—is generally referred to as the principal server. This is the Windows host
that is running SQL Server and to which the clients connect. Any servers that are set up and
ready to take over the services in the event of a disaster are called secondary servers.
A disaster in this context is any event that causes an interruption of service by the primary SQL
Server machine. This could be something as minor as a power cord that becomes unplugged,
as major as a hurricane that destroys the data center, or anything in between. Whatever event
occurs, it’s classified as a disaster for the primary SQL Server, and the HA solution chosen is
used to bring a secondary server online and allow clients to access the data on this server.
The event of moving the service from the primary server to a secondary server is called
a failover. This can be automatic or manual and doesn’t necessarily imply a disaster has
occurred. Often, a failover is forced to occur in some situations, such as when patches are
applied, to minimize the downtime of the database.
Some of the technologies provide for an automatic failover of the SQL Server service to
another machine in the event of a disaster occurring on the primary machine. Others require
a manual intervention, with an administrator performing an action to bring the secondary
database online. No matter which solution you choose, there will be a delay as the secondary
server comes online, during which the database will be inaccessible. This delay and its
frequency affects the amount of uptime you’ll be able to achieve.
Each HA technology has advantages and disadvantages. Table 10-1 lists a few of the charac-
teristics of each technology. These characteristics will affect your choice of an HA solution in
your environment.

Table 10-1
High-availability comparison

TECHNOLOGY            FAILOVER            SPECIAL HARDWARE REQUIRED    HA SCOPE
Clustering            Automatic/Manual    Yes                          Server
Database mirroring    Automatic/Manual    No                           Database
Log shipping          Manual              No                           Database
Replication           Manual              No                           Database

As shown in Table 10-1 some of the technologies support an automatic failover, which
implies a minimal delay during which the database is unavailable during a disaster. Others
require a manual intervention, which can involve substantial delays if administrators aren’t
readily available to complete the failover.
Only one technology requires special hardware: a clustering solution. More details are given in
the section on clustering regarding the implications of choosing this technology. This require-
ment can substantially affect your ability to choose this technology for budgetary reasons.
The last column in Table 10-1 shows the scope of each technology as related to its HA
capabilities. Clustering operates at the server level, which means that all databases, logins,
jobs, and so on are covered in the HA solution and will failover to the secondary server.
Notification Services and Reporting Services can be configured to run under a clustered
solution and failover along with SQL Server in the event of a disaster.
The other three technologies are designed to operate at the database level, which means that
server-level items, jobs, logins, endpoints, and so on must be synchronized on the secondary
server and then enabled on that server manually if appropriate. These technologies only
ensure that the database data itself is available in the event of a disaster.

TAKE NOTE
Some technologies, such as the Service Broker and Notification Services, are contained
completely within a database. These services don't failover automatically to the
secondary server. Manual intervention is required to ensure that these services continue to
function in the event of a disaster.

HA technologies can greatly assist you in providing a stable data environment to your applica-
tions and clients, but they aren’t without limitations. Those limitations, along with some
misconceptions, are discussed in the next section.

Recognizing High-Availability System Limitations

Each HA technology has specific limitations that will be discussed in individual sections.
This section will examine some of the general limitations of HA technologies along with
some of the problems that these technologies don’t solve.

The primary goal of any HA database system is to ensure that the database is always available,
any time of day or night, no matter what happens to any particular server. Although this is
the goal, there will always be a minimal amount of downtime as services move from the pri-
mary server to the secondary server. This can range from seconds to minutes in an automatic
failover to (potentially) hours for manual failovers. In choosing an HA technology and justify-
ing the choice to management, you should explicitly state the downtime potentials even if the
technologies function exactly as designed.
Data-loss prevention is a goal of any HA solution in addition to ensuring access to the data. This
is usually accomplished by keeping the server accessible, and also by preventing hardware or soft-
ware failures from causing any information stored in your database to be lost. The various tech-
nologies do this to varying degrees, some allowing no loss at all and others allowing you to specify
how much data you’re willing to lose. This is expressed in terms of time, because the synchroniza-
tion of data from the primary to the secondary servers takes place at a user-determined interval.
The administrator usually balances this goal of preventing data loss against the performance
or monetary costs of configuring a particular HA solution. This is often a difficult point
to explain to a nontechnical person, particularly a person in a management position.
Management never wants to hear that data could be lost and assumes that high availability
guarantees no data will be lost. An HA solution can be configured this way, but implementing
an HA solution isn’t an absolute guarantee that no data will be lost.
An HA solution provides for database services to be available on one of two or more machines
in the event of a disaster. It does not, however, provide additional performance potential or
load balancing across the multiple machines. In most cases, the secondary server machine isn’t
providing any database services for the application being protected by the HA solution. The
machine could be performing other functions, including supporting other SQL Server 2005
instances or databases, but it isn’t providing additional performance to the database or appli-
cation covered by the HA solution.
There are a few exceptions with database mirroring and log shipping, but the possible perfor-
mance gains may not continue in a failover situation.

TAKE NOTE
Often, nontechnical individuals think that a cluster of two machines implies that half of
the requests are serviced from each machine, thereby providing a performance gain. HA
solutions are strictly for availability increases, not performance increases for your databases.

In addition to not providing additional performance for the application, the HA solution
doesn’t load-balance clients for the database services. At any particular time, one or more of
the secondary servers has resources that aren’t being used and that are available for use only in
the event of a disaster.

■ Understanding Clustering

THE BOTTOM LINE
Clustering is a technology that uses Windows Cluster Services to provide multiple server
nodes, each able to provide SQL Server services using a central shared database on shared
disk drives, typically set up in a SAN.

Clustering technology is based on Microsoft Cluster Services (MSCS) and has been available
since Windows NT 4.0 and SQL Server 6.5. This is often the first choice for administrators
who desire a highly available database server.
As shown earlier in Table 10-1, clustering operates at the SQL Server instance level, meaning
that all the instance services are protected from a hardware failure. In the event of a disaster,
all databases, logins, jobs, and other server-level services move to the secondary server.

A failover cluster works by having various resources—in this case, including SQL Server—
installed on the cluster’s nodes. A node is any Windows server participating in the cluster. At
any given time, only one node can own a particular resource and use it to provide services to
clients. In the event of a disaster, the service fails over to another node that activates its copy
of that service and begins responding to clients.
Clustering in SQL Server 2005 has been expanded from SQL Server 2000. SQL Server
Agent, Analysis Services, Notification Services, and replication are included in failover clusters
with SQL Server 2005; SQL Server 2000 only included failover of the database services.
Disk resources are shared among all nodes, eliminating the need to keep a separate copy of
any data for the resource synchronized on multiple nodes.
Abstraction for the client is provided by presenting a virtual instance of the service—in this
case, a SQL Server 2005 service—to clients. Clients connect to this virtual instance rather
than to the actual instance running on the Windows server node. When the failover occurs,
this virtual instance moves to the secondary node, but its presentation on the network
remains the same so clients don’t need to be reconfigured.
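As a small illustrative check (these system functions and the sys.dm_os_cluster_nodes view
exist in SQL Server 2005 and later), you can query the virtual instance to confirm that it is
clustered and to see which physical node is currently hosting it:

-- Is this instance clustered, and which physical node currently owns it?
SELECT SERVERPROPERTY('IsClustered')                 AS IsClustered,
       SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentHostNode;

-- List all nodes that can host this clustered instance.
SELECT NodeName FROM sys.dm_os_cluster_nodes;

After a failover, the CurrentHostNode value changes even though clients still connect to the
same virtual instance name.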
Clustering is also the most complex technology of those presented in Table 10-1. Clustering
imposes additional demands on the database administrator and equipment to provide this
level of HA capability.

Understanding Clustering Requirements


As mentioned, failover clustering is built on the MSCS offered by the Windows operating
system. Before you can implement a SQL Server 2005 cluster, you need to have a
Windows cluster built on the host operating system. The following are some requirements
to implement a cluster:

• WSC-certified hardware. The hardware used for your cluster solution must be on the
Windows Server Catalog (WSC) as a cluster-certified system. Each server node is the same
type and size of system. It’s important to choose hardware from the cluster section, because
not all WSC resources are certified for clusters. If your solution will include a Storage Area
Network (SAN) device, then make sure the total solution is included on the WSC.
• Shared disk resources. A special shared disk subsystem must be set up to allow all cluster
nodes to connect to the same physical disks. This usually requires specialty hardware.
• Geographic limitations. Because a shared disk subsystem is involved, there are limi-
tations as to how far the clustered nodes can be from each other. This is due to the
requirements for low network and disk latency. Although this distance increases as net-
work speeds increase, the limit can prevent your solution from continuing to function
in some disasters.
• Additional network configuration. A cluster requires a network link between the
nodes—recommended to be a private network link—that allows the nodes to exchange
a heartbeat. This lets each node ensure the others are still functioning. Additional hard-
ware may be required on each node.
• Additional costs. In addition to ensuring that the cluster hardware is on the WSC,
often you must purchase additional resources, memory, disks, CPUs, or whole servers to
provide HA capabilities with a cluster solution. This can substantially increase the cost
of implementing this HA technology over other choices.
Software licensing is also a consideration, because all nodes participating in the cluster must
have the same version of SQL Server and hardware. This can add substantially to the cost of
the solution if you must license per processor, especially if you use an active/active solution
(defined in the next section). At the time of this writing, passive nodes don’t require their own
SQL Server 2005 license.

Designing a Clustering Solution


A failover cluster solution is relatively expensive with SQL Server 2005 because of the
hardware requirements. This option is listed first because all your design decisions will
likely be limited by the budget for your solution; carefully consider your financial limita-
tions when going through the design.

The first part of your design involves determining the type of cluster scenario to implement.
With SQL Server, you must make two intertwined decisions. The first is the number of nodes
that will be a part of your cluster. SQL Server is limited by the underlying MSCS cluster and
OS limitations as well as SQL Server itself. With Windows Server 2003 or 2008 Datacenter
edition and SQL Server 2005 or 2008 Enterprise edition, eight node clusters or sixteen node
clusters, respectively, are possible. This means up to eight (or sixteen) Windows nodes can
be connected in a cluster, but because each Windows node can have multiple SQL Server
instances, you can actually cluster more than eight SQL Server instances. There are issues
with resource requirements, so in a practical configuration, it’s unlikely you’d have more than
eight virtual SQL Server nodes present.
The standard edition of SQL Server 2005 or 2008 is limited to two nodes, and the
Workgroup edition doesn’t support failover clustering. Windows 2000 supports only two
nodes unless you use the Datacenter edition, in which case four nodes are supported.
Related to the number of nodes is the configuration of each node. Any individual node can
be set to be an active node, meaning that it’s the primary server for a virtual SQL Server and
responds to client requests, or a passive node, meaning that its SQL Server service isn’t actively
responding to requests and is awaiting failover from another node. These configurations are
referred to as active/active clusters or active/passive clusters.
This can be confusing, so consider a few examples. The simplest cluster is an active/passive
two-node cluster. In this configuration, shown in Figure 10-1, SQLProd01 is the primary
server and responds to client requests sent to SQLProd, the virtual instance. SQLProd02 is
the passive node, running idly and not responding to any client requests.

Figure 10-1
Two-node active/passive cluster (diagram: a client connects to the virtual instance SQLProd,
which is hosted by the nodes SQLProd01 and SQLProd02 attached to a shared disk)

If SQLProd01 fails for some reason, SQLProd02 will become the primary server after failover
and start responding to client requests. Only one server’s resources are used at a time, mean-
ing that half your server hardware (excluding disk drives) isn’t being used at any given time.
In this case, only one SQL Server license is needed for the one virtual server.
A second example, illustrated in Figure 10-2, shows a three-node, active/active cluster with three
physical servers and three virtual servers. In this case, each server is actively used at all times to
do work, and three SQL Server licenses are required for the three active server instances.
The failover strategy is more complex in this example, with each server having a designated
failover server in a round-robin fashion. Table 10-2 shows the virtual servers, primary physical
instance, and the failover physical instance.

Figure 10-2
Three-node active/active clustering (diagram: clients connect to the virtual instances
SQLProdA, SQLProdB, and SQLProdC, hosted on the physical nodes SQLProd01, SQLProd02, and
SQLProd03, all attached to a shared disk)

Table 10-2
Three-node failover

VIRTUAL SERVER    PRIMARY SERVER    SECONDARY SERVER
SQLProdA          SQLProd01         SQLProd02
SQLProdB          SQLProd02         SQLProd03
SQLProdC          SQLProd03         SQLProd01

If any node fails, then the virtual server moves to another instance. However, when this
occurs, one physical server will be spreading its resources to serve two virtual instances. In
this example, if SQLProd02 fails, then SQLProd03 must serve clients connecting to both
SQLProdC and SQLProdB.
In order for the applications to function at a similar performance level, each server must have
enough spare processor cycles and memory to handle the additional load of a second instance
in the event of a failover.

The last example, shown in Figure 10-3, has a four-node cluster with three virtual nodes. In
this configuration, the cluster is set up in an N+1 configuration with three active nodes. One
passive node acts as the failover node for any of the three active nodes. This type of cluster
requires three licenses for software; in addition, the passive node must have enough hardware
resources to handle the load for any one of the other three nodes.

Figure 10-3
Four-node cluster in an N+1 configuration (diagram: clients connect to the virtual instances
SQLProdA, SQLProdB, and SQLProdC, hosted on the physical nodes SQLProd01 through SQLProd04,
all attached to a shared disk)

In all of these examples, the cluster solution should be designed with a specific performance
goal in mind. Because the secondary node in any of these cluster examples will receive an
increased load in the event of a failover, its hardware should be designed to handle the desired
level of performance. If the same level of performance is desired, as it often is, the secondary
server should have the same hardware configuration as the primary.
If the cluster is in an active/active configuration, then each server should have enough hard-
ware to handle its own load as well as the additional load from the node that would fail to
it. When the same performance is expected from the secondary server, a level of hardware
equivalent to that on the primary server will sit idle until a disaster event occurs. This idle
hardware is essentially an insurance cost that must be weighed against the cost of downtime
if the SQL Server instance fails.

Clustering Enhancements

SQL Server 2008 includes a set of enhancements for improved clustering. These
enhancements generally rely on enhancements included with Windows Server 2008. The
SQL Server enhancements include the following features:

• Cluster Validation Tool. This is a tool provided with Windows Server 2008. SQL Server
2008 requires a successful result from this tool in order for clustering to proceed.

• Improved installation and set up of cluster nodes.


• Expanded maximum cluster nodes (OS dependent).
• Support for various other Windows Server 2008 OS clustering enhancements.
• Rolling Upgrades and Patches. SQL Server instances on a cluster can now be upgraded
one node at a time. This reduces the amount of downtime needed for upgrades because
only the database portion of an upgrade requires that the entire cluster be unavailable
for client access.

Considering Geographic Design

A clustering solution is generally contained within a single data-center facility, but advances
in fiber channel and iSCSI technologies make it possible to geographically disperse such
a solution to multiple sites. Doing so usually increases the cost of the clustered solution
greatly, but it provides for fault tolerance beyond a single site. Keep in mind, however, that
any clustered solution depends on a shared disk array, which is a single point of failure
for the cluster. Network latencies must be tightly controlled with any cluster, but espe-
cially with a geographically dispersed cluster. Make sure your budget allows for the proper
equipment to ensure high performance between the nodes and the disk subsystem.

If you choose to disperse your cluster across multiple sites, work closely with your hardware
vendors to ensure that the hardware chosen and the network design meet the requirements.
There is a separate cluster section for geographically dispersed clusters.
The disk subsystem is an important part of any clustered solution. The disks are shared,
although access is arbitrated to ensure that only one node controls any particular disk mount
point at a time. This disk subsystem can be a single point of failure and should be designed to
be highly available itself. HA disk subsystems are discussed later in this Lesson.

Making Hardware Decisions

The hardware choices for your cluster come from the WSC list, but you should carefully
consider expandability in your decisions. Because you’re essentially requiring the purchase
of two or more matched servers, if you outgrow your hardware the need to upgrade will
require purchasing two or more new solutions instead of just one. It’s recommended that
the hardware you purchase have the capability to add more memory or CPUs later if
required. If you determine that you need four CPUs per node, you may wish to purchase
eight CPU servers with just four CPUs installed. That way, you can add four CPUs later
to each node if necessary.

When designing your cluster hardware, keep in mind that you need to design for the perfor-
mance goal of each node when another node has failed. This usually means that the hardware
chosen for processor and memory needs should be able to handle the load of the primary and
secondary virtual servers that may potentially be running together.
For example, looking again at the three-node cluster example shown previously in Figure 10-2,
assume that each virtual SQL Server instance requires two CPUs and 4 GB of RAM to
meet its performance goals. The failover design requires that each node have four CPUs and
8 GB of RAM. If SQLProd01 fails, then SQLProd02 will be running both SQLProdA and
SQLProdB. To meet the performance goals, four CPUs must be dedicated to each instance,
for a total of eight CPUs. The RAM must be similarly configured.
Because your servers may not have equal CPU and memory requirements between the
multiple applications, you should add the needs of the instances that will run together and
purchase the amount of resources required for that node.

TAKE NOTE*
If your requirements dictate an odd number of processors or memory that doesn't fit into
your hardware choices, it's better to round both up to the next number of processors or
RAM. If you determine that a server needs three CPUs to meet performance goals, purchase
four CPUs rather than two. This will be a minor additional expense and some hardware
choices may require an even number of processors anyway.

Your hardware design also needs to specify how the hardware will be configured with the
different instances that are running. If you have a passive node that will support only one
instance in a failover situation, then you can dedicate all the resources to this instance.
However, if you’re running active/active clusters, you should specifically dedicate an amount
of memory to each instance. Doing so prevents problems when the second instance starts up
during a failover event and the two instances compete for RAM. You should also set an affinity
for CPUs between the instances to ensure that enough processor resources are set aside in case
of a failover event.
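As a rough illustration only (the memory cap, CPU mask, and instance split shown here are hypothetical and must be sized from your own baselines), each instance in an active/active cluster can be capped with sp_configure so that two instances can coexist on one node after a failover:

    -- Run on each SQL Server instance; the values shown are examples, not recommendations.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- Cap this instance's memory so a failed-over partner instance can also obtain RAM.
    EXEC sp_configure 'max server memory (MB)', 4096;
    RECONFIGURE;

    -- Bind this instance to CPUs 0 through 3 (bitmask 15 = binary 00001111);
    -- the partner instance would use a complementary mask such as 240 (11110000).
    EXEC sp_configure 'affinity mask', 15;
    RECONFIGURE;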

Addressing Licensing Costs

The last part of designing a clustered solution is being aware that the versions of Windows
and SQL Server must be the same on all nodes. You can't mix editions or 32-bit and 64-bit
versions in a cluster. This can have implications when you combine different SQL Servers
onto a clustered solution. If you have applications that only require 32-bit SQL Server 2005
Standard Edition and others that require 64-bit Enterprise Edition, then combining them on
a cluster may mean spending more money on hardware and licensing for the Standard Edition
applications than is justifiable. You should perform a careful ROI analysis when combining
different applications to be sure the cost is worth the benefits.

CERTIFICATION READY?
Understand how clustering differs from other technologies such as mirroring or replication.

■ Understanding Database Mirroring

THE BOTTOM LINE
Database mirroring is a technology in SQL Server that uses two copies of the database and
provides for automatic failover in the event that one database experiences a disaster event.

Database mirroring technology was designed to provide a very high level of database avail-
ability using lower-cost hardware than clustering. There are a number of differences between
clustering and database mirroring, and the best choice for your environment depends on the
particular needs of your organization.

Database mirroring wasn’t supported in the initial Release To Manufacturing (RTM)


version of SQL Server 2005 distributed in November 2005. However, with the release of
TAKE NOTE
* Service Pack 1 and subsequent Service Packs for SQL Server 2005, database mirroring is
a fully supported technology.

Some of the differences between clustering and database mirroring are as follows:
• Hardware. Clustering requires matching and supported hardware from the cluster sec-
tion of the WSC, including a shared disk subsystem. Database mirroring can work with
any hardware supported by Windows 2003; in addition, the two servers can use com-
pletely different hardware, resulting in a much lower hardware cost.
• Disk failure. Clustering doesn’t protect against a disk failure because the database used by
both the principal and secondary nodes resides on the same disks. Database mirroring pro-
tects against disk failure by having a copy of the database on each server’s separate disks.

• Failover delay. Clusters fail over in 30 seconds to a few minutes, depending on the time
required to start the secondary instance and fail over the resources. Database mirror-
ing can fail to the mirror database in a few seconds. Both technologies allow manual or
automatic failover.
• Scope. Clustering operates at the server level, including SQL Server Agent, Notification
Services, other services, and all databases. Database mirroring only works at the database
level and requires that logins and any server-level resources that are required be synchro-
nized across both servers.
From this list, it may appear that database mirroring addresses most of the shortcomings of
clustering, fails over more quickly, and should be used everywhere clustering was previously
used. Although database mirroring does provide many benefits, it isn’t always the best choice.
The limited scope of database mirroring to a single database means that more administrative
work is required to ensure that the application will continue to function correctly in the event
of a failover.
TAKE NOTE*
The master, model, and msdb databases can't be protected by mirroring.

Database mirroring is a robust technology that is usually easier to set up and administer
than clustering, at a much lower cost. With its ability to provide for limited reporting using
database snapshots, fast failover times, and zero-data-loss protection, it's a great alternative for
many organizations' user databases.
This technology works by applying all log records—essentially, every change that occurs—
from a principal database to a secondary database. The exact timing of this application
depends to some extent on how the database mirroring is configured. The application ensures
that all changes made to the principal database are reflected on the mirrored copy.
The next section will examine the configuration of a database-mirroring environment.

Designing Server Roles for Database Mirroring

A typical database mirroring setup includes either two or three servers, each providing
one of the three roles involved in database mirroring. The use of a third server is optional
to implement database mirroring but required in certain circumstances to enable auto-
matic failover.

The principal database is the live database being protected with database mirroring. Its role is
referred to as principal, whether noting the actual database or the server instance on which it’s
running. All changes made to the data from users or client applications occur on this database.
The secondary database, which receives the changes in the form of log records and has them
applied, is the mirror database. This role is the partner of the principal database and exists
perpetually in a loading state as log records from the principal are applied to this database.
The third role is that of the witness. This is an optional server used in some circumstances to
initiate the automatic failover within seconds of the principal server failing. Any edition of
SQL Server, from Express to Enterprise, can act as a witness in database mirroring.
The witness works with the principal or mirror to form a quorum of servers. A quorum of
two servers determines which server hosts the principal database. In a normal situation, the
principal database and the witness form a quorum that keeps this database functioning as the
primary database. If communication with the principal fails, the mirror and the witness can
form a quorum to switch the mirror database’s role to that of the new principal. If the mirror
database can’t communicate with the principal, the witness and principal can still form a
quorum to prevent failover. A single server instance can function as a witness for multiple
database-mirroring sessions.
When a failover event occurs, whether automatic or manual, the principal and mirror switch
roles.

Understanding Protection Levels

Database mirroring can operate in one of three different modes, each offering a different
level of protection for the principal database. Each is described next along with the situa-
tions in which you may choose to employ that particular level.

UNDERSTANDING HIGH-PERFORMANCE MODE


The level that offers the least data protection but the best performance is high-performance mode.
In this mode, log records are sent from the principal to the mirror, but the principal doesn’t wait
for confirmation that the mirror has written those log records to disk before moving on to other
transactions. The two servers operate asynchronously, which allows for the best performance of the
principal database but may potentially result in some data loss if a failover is forced.
In this mode, automatic failover isn’t allowed, and an administrator must manually force the
switch of roles with a forced service failover. This causes an immediate recovery of the mirror
database, which can involve data loss if not all the transaction log records have been received
by the mirror database. A witness isn’t recommended for this mode, but if one is configured,
then it’s required to maintain a quorum. If a witness is present and the mirror goes down, then
the principal database must maintain a connection to the witness or it will take itself offline.
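For example, a forced service failover is issued on the mirror server with a single command; the AdventureWorks database name here is only a placeholder:

    -- Run on the mirror server when the principal is unavailable (high-performance mode only).
    -- The mirror recovers and becomes the principal; any log records not yet received are lost.
    ALTER DATABASE AdventureWorks
    SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;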
This mode is most useful in an environment where you can tolerate some data loss, but you
can’t tolerate the delays for all log records to be acknowledged. This may be the case if the
two servers are separated by large distances or many hops. You can choose this mode for
applications that stream data, such as price quotes, if the nature of the data is volatile but not
necessarily critical if some of it is lost in a disaster.

UNDERSTANDING HIGH-PROTECTION MODE


The intermediate level of data protection is called high-protection mode. In this mode, there
is no witness server, but the principal and mirror databases function synchronously. When a
log record is sent from the principal to the mirror, the mirror sends back to the principal an
acknowledgement that the log record has been written to disk. Once this has occurred, both
databases can then update the data with the change.
In this mode, only manual failover is supported.
This is a good mode to use if you don’t have a witness server or if you want to manually initi-
ate a failover in the event it’s necessary. Some applications may require configuration changes
to move to a new server, so an automatic failover of the database doesn’t allow the application
to keep running. This mode also may be desired if you want to ensure that two servers keep
their data synchronized, but not necessarily for disaster-recovery purposes. You may use the
mirror server for reporting or some other purpose and not require the automated failover. If
the mirror server goes down, the principal continues to operate unaffected although the data
isn’t mirrored any longer. Once the mirror comes back up, the transactions must be applied to
the mirror before it’s synchronized.

UNDERSTANDING HIGH-AVAILABILITY MODE


The third mode of operation for database mirroring is high-availability mode, which is similar
to high-protection mode but requires a witness server to form a quorum and determine the
principal server. This mode also operates synchronously, with the mirror database acknowledging
all log records from the principal database. Because every transferred log record is acknowledged
by the mirror before the transaction commits on the principal, the two databases remain
synchronized at all times.
In this mode, a quorum is used to determine which server is the new principal if a server fails.
Either the old principal and the witness or the old principal and the mirror can form a quo-
rum and maintain the status quo of the principal database. If the old principal is unreachable,
however, the mirror and witness can form a quorum and switch roles to make the mirror
database the new principal database. If the mirror server goes down, the principal and witness
continue to operate.

This mode is most appropriate for situations requiring automatic failover and zero data loss.
In conjunction with new ADO.NET 2.0 or SQL Native Client features, clients can automati-
cally redirect to the mirror server when a failover occurs.

Designing a Database-Mirroring Solution

Just as in clustering, the performance requirements will affect the type of database mirror-
ing setup you choose. If you can afford the hardware to meet your performance goals while
running in high-availability mode, this is the best choice for an HA system. If you have
hardware or network limitations, then you may opt for high-performance mode instead.

Your design should consider which databases need to be protected and then choose a separate
SQL Server instance to handle the mirror role. It’s possible to mirror to another instance of
SQL Server on the same Windows host as the principal database, but this approach provides
availability only in the event that the principal instance of SQL Server is unavailable; it can't
protect against failure of the host itself. To achieve a higher level of protection, you should
specify another instance of SQL Server on a physically separate Windows host.
In building your HA solution using database mirroring, distance isn’t a limiting factor.
Provided you have the network bandwidth, the principal and mirror databases can reside on
opposite sides of the earth.

TAKE NOTE*
Because database mirroring operates at the database level, you must set up a separate
mirroring session for each database on an instance that you wish to protect. These separate
databases don't all have to mirror to the same mirror server. You can use different physical
servers for each mirror database.

Your client applications, however, may dictate the feasibility of using database mirroring.
If you’re using Open Database Connectivity (ODBC), Object Linking and Embedding
Database (OLEDB), or an older database connectivity technology, then you’ll need to code
custom connection logic or set up some sort of load-balancing solution to allow clients to
redirect to the principal server. Using a load-balancing appliance and DNS names for connec-
tivity can seamlessly allow clients to connect to the proper server automatically.
If you don’t have this type of solution, you may have to manually alter the connection strings
for your client application in the event of a failover. Although the database may be avail-
able almost instantly after a failover, your clients won’t realize this until their application can
reconnect. The ability to redirect to the failover server is critical in designing your database-
mirroring environment.
If you’re using the SQL Native Client or ADO.NET 2.0 technologies, the connection strings
for connectivity can contain a primary and mirror server. This lets the client seamlessly find
the appropriate server.
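As a sketch, such a connection string might look like the following; the server and database names are placeholders, and the exact keyword spelling varies slightly between providers:

    Data Source=SQLProd01;Failover Partner=SQLProd02;Initial Catalog=AdventureWorks;Integrated Security=True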
In designing the mirror solution for your environment, make sure to account for the fact that
mirroring protects at a database level, not a server level. This means that as you add logins,
they should be added manually on the mirror database as well. Any server-level jobs that you
have running must be set up on the mirror database as well.
One last consideration is that the mirror database isn’t accessible or available to clients. It sits
idle, accepting transactions until it switches roles and becomes the principal in a disaster. One
way around this is to configure a snapshot based on the mirror database. Doing so gives you
a point-in-time view of the data. However, this snapshot must be continually dropped and
rebuilt to see the data changes occurring in the mirror database.
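A minimal sketch of such a snapshot follows; the snapshot name, file path, and logical file name (AdventureWorks_Data) are assumptions that must match your own mirror database:

    -- Run on the mirror server. The logical file name must match the mirrored database's data file.
    CREATE DATABASE AdventureWorks_Reporting
    ON ( NAME = AdventureWorks_Data,
         FILENAME = 'C:\SQLData\AdventureWorks_Reporting.ss' )
    AS SNAPSHOT OF AdventureWorks;

    -- Later, drop and re-create the snapshot to expose more recent data to report users.
    DROP DATABASE AdventureWorks_Reporting;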

Configuring a Database-Mirroring Solution


Once you’ve designated a particular database as the principal and determined which server
will host the mirror database, you can begin to configure your database-mirroring environ-
ment. You should have determined that the mirror server has sufficient resources to host
a copy of the principal database as well as to handle the client load in the event of a failover.

The first step in enabling database mirroring is to ensure that the security for the database-
mirroring session is set up. You have the choice to use either Windows authentication or
certificates for the log-record transfer. This choice depends on your situation. If the servers are
in the same domain or in a trusted domain, then you can use Windows authentication. If you're
coming from an untrusted domain, you can use certificate authentication. In either case, a
login must be set up on the mirror server to allow the transfers.

The log-record transfers take place through the use of a special endpoint called a database-
mirroring endpoint. This endpoint must be established on each server, principal and mirror,
and the network configured to allow traffic on the specific port chosen for communications.
Each instance uses a single database-mirroring endpoint for all of its mirrored databases, so
if more than one instance is installed on the same Windows host, each instance needs its own
port. These endpoints are created using the CREATE ENDPOINT command.

TAKE NOTE*
The endpoint must be created on both the principal and mirror servers, with matching ports.
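A minimal sketch of such an endpoint follows; the endpoint name and port number are placeholders, and a witness instance would instead specify ROLE = WITNESS (or ROLE = ALL):

    -- Run on both the principal and mirror instances.
    CREATE ENDPOINT MirroringEndpoint
        STATE = STARTED
        AS TCP ( LISTENER_PORT = 5022 )
        FOR DATABASE_MIRRORING ( ROLE = PARTNER );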
Once you have the communication channel set up, you must initialize the mirror database.
You do so by taking a full backup on the principal server and restoring it on the mirror server
using the same database name as the principal. All log backups taken since the full backup on
the principal must also be restored on the mirror server. This ensures that a full copy of the
principal database's data is on the mirror database.

TAKE NOTE*
The restores need to use the WITH NORECOVERY option on the mirror server so that the
database remains in a restoring state and can continue to accept log records from the principal.
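A minimal sketch of this initialization follows; the backup paths are placeholders, and WITH MOVE clauses may be needed if the mirror server stores its files in different locations:

    -- On the principal server:
    BACKUP DATABASE AdventureWorks TO DISK = 'C:\Backups\AdventureWorks.bak';
    BACKUP LOG AdventureWorks TO DISK = 'C:\Backups\AdventureWorks_log.trn';

    -- On the mirror server (WITH NORECOVERY keeps the database in a restoring state):
    RESTORE DATABASE AdventureWorks FROM DISK = 'C:\Backups\AdventureWorks.bak'
        WITH NORECOVERY;
    RESTORE LOG AdventureWorks FROM DISK = 'C:\Backups\AdventureWorks_log.trn'
        WITH NORECOVERY;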
At this point, both servers are configured for mirroring, and the session can be enabled.
Starting with the mirror server and then on the principal server, run the ALTER DATABASE
command with the SET PARTNER option to specify the opposite server and designated
TCP port for mirroring. Doing so enables database mirroring and begins the transfer of log
records from the principal to the mirror.
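A minimal sketch of this step follows; the fully qualified server names and port are placeholders for your own environment:

    -- Step 1: on the mirror server, point at the principal.
    ALTER DATABASE AdventureWorks
        SET PARTNER = 'TCP://SQLProd01.contoso.com:5022';

    -- Step 2: on the principal server, point at the mirror. Mirroring begins when this succeeds.
    ALTER DATABASE AdventureWorks
        SET PARTNER = 'TCP://SQLProd02.contoso.com:5022';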
LAB EXERCISE
Perform Exercise 10.1 in your lab manual.

In Exercise 10.1, you'll set up mirroring on the AdventureWorks database. This exercise
assumes that the database server was installed as the first named instance on the C: drive. If
you've installed your server in a different place, modify the paths to match your system. Be
sure the recovery model for your AdventureWorks database is set to full. In addition, a second
instance of SQL Server 2005 is required. It can be on the same server or a different server.
This exercise has SSC10\SS2K5 as the primary instance and SSC10\Sales as the secondary
instance. Again, adjust for your circumstances.

Testing Database Mirroring

Once you’ve enabled a database-mirroring setup, it’s important to test the setup to ensure
that mirroring is working and that your failover database can pick up the load. This also
ensures that your client applications can connect to the failover server.

You should test for two types of failover events: planned failovers, such as for maintenance
activities; and unplanned failovers, which are any failovers that haven’t been scheduled and
communicated to the appropriate people.
For planned failovers—typically, maintenance activities such as hardware or software
upgrades—you can develop a testing strategy using the manual failover commands. This
entails running the ALTER DATABASE command with the SET PARTNER FAILOVER
option on the principal server, which forces a failover from the principal database to the mirror
database. Because this command will be run during a planned failover, you can ensure that all
clients connect to the mirror server, that all data changes have been synchronized on the mirror
database, and that all logins are available. Your testing strategy should ensure that a new
login is added on the primary as well as some particular piece of data changed prior to the
failover. If your configuration and procedures are correctly set up, you'll be able to test that
those changes have been copied to the mirror server.

TAKE NOTE*
If this test is conducted on a production server, make sure all affected clients are aware that
the test is coming.

Unplanned events are slightly harder to test, but they can still be simulated. As with planned
events, you should explicitly create marked data, logins, and possibly other server-level items
on the principal. You can simulate an unplanned failover by pulling the network cable out of
the principal server. Doing so simulates a hardware or software abend (abnormal end) on the
principal server as well as a network failure, any of which could cause the failover.
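For the planned-failover test described above, a minimal sketch looks like this (the database name is a placeholder); the catalog-view query can be run on either partner, before and after either type of test, to confirm that the roles actually switched:

    -- Run on the current principal server; the session must be synchronized.
    ALTER DATABASE AdventureWorks SET PARTNER FAILOVER;

    -- Verify the new roles from either partner.
    SELECT DB_NAME(database_id) AS database_name,
           mirroring_role_desc,
           mirroring_state_desc
    FROM sys.database_mirroring
    WHERE mirroring_guid IS NOT NULL;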
If your servers are configured for automatic failover, you can check whether the marked data, log-
ins, or other specific events have been copied to the mirror server correctly. Because some of these
objects require manual synchronization by a database administrator, you should have procedures
in place to handle the case where the objects haven’t yet been moved to the mirror server.

TAKE NOTE*
Because the principal database may not be available in a real disaster situation, you can't
refer to that SQL Server instance for the details of the object. You should make a paper
record or offline notation of the objects as part of the procedure for creating them on the
principal database.

In either of these test cases, make sure you test connectivity to the mirror database from all
the locations that require connectivity. This is especially important if you have geographically
dispersed mirrored servers. Your SQL Servers may failover quickly, but if clients can’t access
the remote SQL Server, then the application won’t be seen as available.

Mirroring Enhancements

SQL Server 2008 includes a series of changes designed to improve mirroring perfor-
mance. Prior to SQL Server 2008, mirroring was a one-way activity in that the principal
server sent data to the mirror server. Now, with SQL Server 2008 and its automatic page
repair feature, each server can attempt to recover page data from the other server
participating in the mirror. If page repair is successful, all of the data is preserved. This
is because the second server in the mirror should have a perfectly good copy of the data
with which to perform the repair. In contrast, correcting errors by using the DBCC CHECKDB
REPAIR_ALLOW_DATA_LOSS option might require that some pages, and therefore
data, be deleted. Note however that if a corrupted page has been caused by some form of
drive hardware failure, recovery may not be possible and immediate attention should be
given to the situation.

Other enhancements to mirroring with SQL Server 2008 include:


• Compression of mirroring data. The log data being transmitted from server to server
is now compressed. The result should be less latency between a change on the primary
server and the corresponding change on the mirror server.
• Write-ahead log. Writing log data to disk before all data has arrived on the mirror server
also improves the speed of completing transactions.
• Improved efficiency of log send buffers. SQL Server 2005 reserves an entire log send
buffer for any log flush operation. SQL Server 2008 now appends log records to the
current buffer if enough space is available.
• Read-ahead during undo. During a planned mirror failover, the new mirror server (the
former principal server) must undo all transactions that are not completed on the new
principal server (the former mirror server). Page read-ahead improves the efficiency of
this operation.

CERTIFICATION READY?
Be prepared to answer questions involving a witness server. Is a witness server required for
automatic failover? What are the hardware and software requirements for a witness server?

■ Understanding Log Shipping

THE BOTTOM LINE
Log shipping is a technology for high availability that is based on the normal log-file backup
and restore procedures that exist with SQL Server.

In a log-shipping environment, transaction-log backups are made on the primary server and
then copied to the secondary server, where they're restored. Prior to SQL Server 2005, the
Enterprise edition of SQL Server was required for this process to be automated, but many
people developed their own scripts to simulate log shipping with the Standard edition.

In the event of a disaster situation, the final transaction logs are restored on the secondary
server, and then the status of that server is changed from a loading database to an active one.
These final steps must be performed manually or with custom scripts. SQL Server provides
no automatic way to do this.

CERTIFICATION READY?
Logs are usually moved to the standby server on a schedule—perhaps every 15 minutes. If the
file transfer takes 20 minutes to complete, log shipping may not be a suitable option. Watch
for these details in the certification test's lengthy scenario.
advantage over clustering or database mirroring, although you can combine database mirror-
ing with clustering to copy the log records to other servers from the mirror server.
Log shipping also has another advantage over database mirroring and clustering: You can use
the secondary database for reporting and other read-only queries. If the secondary is a sepa-
rate server, then the HA resources are put to use instead of standing by idly.
The disadvantage of using log shipping is that the application and server roles don’t fail over
automatically. An administrator must manually bring the secondary database online, and you
must develop a method for ensuring that the application will use the secondary server. Because
manual intervention is required, the delay between when a disaster event occurs and when the
secondary server comes online will be greater than with either clustering or database mirroring.
Another issue with this technology and failover is that the names of the servers on the net-
work must be different to comply with the Windows networking requirements. You must
develop a method to ensure that the clients can find and connect to the secondary server.
As with database mirroring, this is a database-level protection mechanism. Any server-level
logins, jobs, or other objects must be manually kept in synchronization by an administrator
on both servers.

Choosing Log-Shipping Roles

A log-shipping configuration includes three possible roles: the primary server, the secondary
server, and the monitor server. As with the other technologies, the primary server is the
production server that clients normally connect to for queries. The secondary server is
the server to which the database fails over if a disaster event occurs. The monitor server,
which is optional, should be a separate server that stores tracking information about the
backups and restores.

Similar to database mirroring, the hardware required is the regular hardware required for SQL
Server. The primary and secondary servers don't need to be the same or even similar hardware.

Similar to clustering, you can have multiple primary databases, from separate instances, all
configured to fail over to a single SQL Server instance. This is similar to the N+1
configuration used in clustering. It's a common configuration because it's unlikely that more
than one primary server will fail at the same time, so resources are conserved.

TAKE NOTE*
In planning for more than one possible failure, or even for the reporting load of multiple
applications, it's likely that you will have even more powerful hardware on the secondary
server than on the primary servers.

The processor and RAM requirements for your secondary servers should be based on the
performance goals that must be met and baselines from your existing servers.

Switching Log-Shipping Roles

When there is a need to fail over to the secondary node, an administrator must perform
the process of bringing the secondary node online as a read-write database. If this data-
base is configured for read-only access, that will continue to work; however, any connec-
tions will be dropped when the final restores take place.

The steps for bringing the secondary server online are as follows:
1. Restore all remaining transaction-log backups from the principal server on the secondary
with the NORECOVERY or STANDBY option.
2. If the principal server is still accessible, back up the tail of the transaction log with the
NORECOVERY option.
3. Restore the tail of the transaction log, if available, on the secondary server.
4. Bring the database online by changing the state using the WITH RECOVERY option
of the RESTORE DATABASE command.
At this point, the secondary database is ready to begin handling read/write traffic. If connec-
tion changes must be made for any clients, they can occur now.
If you have multiple secondary servers, perform all these steps on all servers, with the excep-
tion of bringing the database online. Only the new primary server should be brought out of
the NORECOVERY or STANDBY state.
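A minimal sketch of the four steps listed above follows; the database name, file names, and paths are placeholders for your own log-shipping configuration:

    -- 1. On the secondary, restore any remaining log backups without recovering.
    RESTORE LOG Sales FROM DISK = 'D:\LogShip\Sales_202301010830.trn' WITH NORECOVERY;

    -- 2. If the primary is still accessible, back up the tail of its transaction log.
    BACKUP LOG Sales TO DISK = 'D:\LogShip\Sales_tail.trn' WITH NORECOVERY;

    -- 3. Restore the tail-log backup on the secondary.
    RESTORE LOG Sales FROM DISK = 'D:\LogShip\Sales_tail.trn' WITH NORECOVERY;

    -- 4. Bring the secondary database online for read/write traffic.
    RESTORE DATABASE Sales WITH RECOVERY;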
This secondary server, however, is still configured to be the secondary server. If this server will
begin responding to client requests and updating data, then its role should be changed from
secondary to principal. To switch roles, do the following:
1. Disable the backup job on the principal server.
2. Disable the copy and restore jobs on the secondary server.
3. Enable log shipping on the secondary server by using the wizard in Management
Studio or by manually executing the stored procedures:
a. When choosing the secondary database, enter the name of the old primary
database server.
b. Select the option No, the Secondary Database Is Initialized below the name.

CERTIFICATION READY?
Remember that log shipping requires manual intervention in order to switch roles between servers.
Once you’ve completed these steps, the roles have been reversed. You can perform these steps
as often as needed, although after the first time, you won’t need to configure log shipping
again on the secondary—just enable the proper jobs.

Reconnecting Client Applications

When you fail over to a secondary log-shipping server, no mechanism is built into this
technology to enable clients to automatically fail over their connections. In clustering, the
virtual IP and server name remain the same. Database mirroring has failover connections
built into ADO.NET 2.0 and the SQL Native Client technologies. With log shipping,
however, the server name of the secondary server is different from that of the primary server,
as is its IP address. You must develop a method of reconnecting your clients to the second-
ary server so they can continue to access and update the database.

You can achieve this connection change three ways: manually update connection strings,
rename the secondary server, or use network abstraction techniques to direct client
connections.

The first of these is the most straightforward. In the connection string used by the clients,
whether this is ADO.NET, ActiveX Data Objects (ADO), OLEDB, or another mechanism,
change the name of the server to that of the secondary server. Depending on how central-
ized your connection strings storage is, this may or may not work well. If you’re supporting a
single web server with the connection string stored in a global variable, this is easy to deploy
because only one file is changed. Similarly, if your clients read the connection string from a
central location, then you can easily deploy this to a large number of clients. If the string is
coded into the registry on every client machine, this may not be the best choice for your envi-
ronment. Your decision to use this method will largely depend on how your application and
its connection strategy are architected.
The second choice is also fairly straightforward, but it’s a little more tedious. In this scenario,
you rename the Windows host of the secondary server to the name of the primary server.
Doing so requires that the primary server has already been renamed to something else or that
it’s offline. You may or may not elect to also change the network addressing, but that again
depends on your application. You must also rename the SQL Server 2005 instance to match
the Windows host, to ensure that clients can reconnect.
As an example, if your primary server is named SQLProd01 and your secondary server is
named SQLProd02, you take the primary (SQLProd01) offline or rename it to SQLProd03
(or some other unique name). Then, rename the secondary (SQLProd02) to SQLProd01
and also rename the secondary SQL Server instance on SQLProd02 to SQLProd01. If the
old primary is repaired and ready to come back online, you need to rename the secondary
(the original SQLProd02) back to SQLProd02 or another name before bringing the original
SQLProd01 back on the network with that name. Note that renaming also involves name
changes in any Active Directory domain as well as the name entries in your DNS server.
This is confusing, and if the failover isn’t permanent, you may not wish to choose this strategy.
Depending on the network configuration of your Active Directory domain controllers and
DNS servers as well as the lifetimes of cached DNS client entries, there may be a substantial
delay while the clients update their cached name lookup entries and the naming converges
onto the IP address of the renamed secondary server.
The final method is preferred by most companies and involves using your network
infrastructure to abstract the SQL Server address from the actual machine. You can
choose to use DNS or a load-balancing scheme to route traffic at the network level to the
appropriate server. For example, if you have all clients connect to a hostname in DNS such as
sql.sqlservercentral.com, instead of SQLProd01, then if SQLProd01 fails, you can change the
DNS entry for sql.sqlservercentral.com to resolve to SQLProd02.
This way, none of the clients must change, and the Windows names used by the clients
remain the same. The convergence of the name to the new address may still take some time as
clients flush their DNS caches.
A load-balancing device, either hardware or software, can be even simpler to use. If you have
all clients address the load-balancing device, it can instantly direct clients to the new server.
This is the preferred method if your network infrastructure supports it.

■ Understanding Replication

THE BOTTOM LINE
Replication is a term for multiple different types of processes that copy transaction data from
one or more database servers to one or more other servers.

The replication technology available in SQL Server isn’t specifically developed for high avail-
ability. Instead, replication is designed to enable data to be moved from one or more servers
to another in a publisher-subscriber model. It can be adapted for high availability because it
automates the movement of data to remote servers.

Three types of replication are available in SQL Server: snapshot replication, transactional repli-
cation, and merge replication. Because snapshot replication operates on the entire set of data
at one time, this type of replication isn’t suitable for an HA solution.
Transactional or merge replication, however, can operate at the transaction level. As quickly as
the transactions can be copied to the distributor and sent to the subscriber, they’re executed on
the secondary server, making both of these solutions good for implementing an HA solution.

Implementing High Availability with Transactional Replication

Transactional replication is designed to move data on a batch basis. It can be configured


to send batches of a single transaction, potentially keeping the data on the secondary server more up to
date than it might be with log shipping. Log shipping moves the transaction log containing
all transactions over a configured time period. For an HA system, the log is usually moved
every one to five minutes. The log being moved could contain dozens of transactions.

Transactional replication, however, can move data to the secondary server one transaction at a
time, resulting in very low latency for the changes being applied to the secondary server.
In building an HA system based on transactional replication, one of the advantages is that the
secondary system is a fully live system that is available for queries and even updates. If you
can separate the updates between two systems so there are no conflicts, you can implement
a bidirectional transactional replication scheme that sends updates from the primary to the
secondary and the secondary to the primary.
As with database mirroring and log shipping, you can use disparate hardware for the primary
and secondary servers. There are no restrictions beyond the fact that hardware must be on the
WSC. However, you should appropriately size the hardware for the load that will be placed
on the servers as well as the performance goals required.
A special parameter, @loopback_detection, is used with the subscription stored procedure to
prevent changes from being sent back to the originating node. You can have clients connect
to either the primary or secondary node, resulting in load-balanced performance, improved
capacity for transaction throughput, or geographically aware clients that connect to the closest
node. In any of these cases, the hardware requirements to meet a specific performance goal
can be reduced on each node because the full client load is never attached to a single server.
In the case of a disaster event, however, the surviving node must respond to all client requests,
resulting in much lower performance.
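A hedged sketch of such a subscription call is shown below; the publication, subscriber, and database names are placeholders, and the surrounding publication and article setup is omitted:

    -- Run at the Publisher, in the publication database.
    EXEC sp_addsubscription
        @publication        = N'SalesPub',
        @subscriber         = N'SQLProd02',
        @destination_db     = N'Sales',
        @subscription_type  = N'push',
        @loopback_detection = N'true';  -- prevents changes from echoing back to the originating node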
One of the downsides of replication is that it works on an article-by-article basis, where an
article is a set of data encompassing part of a table, a whole table, or a join of data between
tables. This results in an administrative effort for configuration that is directly proportional to
the number of tables in your database for an HA system. Each table must be added as an
article to a publication; this isn't difficult when you're setting up replication, but any change
to the schema results in the need to add or update the affected articles. If this constant administrative requirement
can’t be performed, then some of your data may not be available if a disaster occurs.

Case Study: Handling Conflicts


If the possibility exists that the same data will be updated on separate nodes, then you
should consider merge replication instead of bidirectional transactional replication. Merge
replication is specifically designed to handle conflicts in updates on separate nodes.
With transactional replication, the last update won't necessarily be the update that is made
on both nodes. For example, suppose NodeA receives an update to a row, and it takes
one minute for this update to be moved to NodeB. Twenty seconds after the update on
NodeA, the same row is updated to a different value on NodeB. Because of the time
delays, 40 seconds after NodeB is updated, the update is overwritten by the value from
NodeA. Twenty seconds later, NodeA’s value is overwritten by the replicated value from
NodeB. In this case, you’ll have different data on each system.
If you’re allowing this to occur on your systems, and they’re intended to be used for high
availability, you should ensure that code checks are run on a regular basis to look for this
disparate data. The way in which you deal with the data will be specific to your business
requirements. You must decide how the potential issues with data-update conflicts can be
handled. You’ll need to perform manual updates to the nodes with the incorrect data based
on what you decide for your business.
In merge replication, you can specify whether the first update, the last update, or cus-
tom resolution code is used to determine which value is written to all nodes. In any case,
where data can be updated on multiple nodes, you specify how conflicts will be resolved.
However, as with bidirectional transactional replication, the business rules for deciding
how conflicts are resolved will be specific to your business requirements.

Implementing High Availability with Merge Replication

Merge replication is similar to bidirectional replication in that changes made on either


the primary or secondary server are moved to the opposite server. It can be set to function
in a manner similar to transactional replication, operating on a transaction-by-transaction
basis. The HA features of merge replication are similar to those of transactional replication,
with the same hardware and scale requirements.

One of the advantages of merge replication over transactional replication is that updates can
easily be made to both the primary and secondary nodes. Any conflicting changes on vari-
ous nodes can be resolved in a variety of manners by the SQL Server replication agents. This
can provide additional scalability as well as availability by allowing a portion of clients to be
served by the primary node and a portion to be served by a secondary.
Unlike the other HA technologies, merge replication lets you split client connections among both
nodes. A load-balancing technology used to direct clients to both nodes can immediately send
clients to the surviving node in the event that the other node fails. This can provide a seamless
transition between nodes in addition to balancing the load across multiple nodes for scalability.

If you choose to share the load with multiple servers, make sure you're aware of the performance
reduction that will occur if one node fails. If a reduced performance capability is acceptable in
the event of a disaster, then this can be a good technology for a highly available system.

CERTIFICATION READY?
Know the difference between a push subscription and a pull subscription and when to use each type.

Designing Highly Available Storage

One of the most important aspects of any HA solution is ensuring that your application
and the database services continue to function in the event of a disaster. All the technologies
discussed are designed to ensure that this happens. However, the disk subsystem is partic-
ularly important because your data is stored on it; the disk subsystem must be protected
differently than the server instance.

Disk drives are mechanical devices with moving parts, unlike all the other critical parts of a
database server, which are electronic. It’s far more likely that a disk drive, with its spinning
platters and moving heads, will fail than any other part of your database server. The disk drive
is also where the data of record, meaning the authoritative source, is stored. As changes are
made, they aren’t considered permanent until the log record is written on disk; and changes
aren’t necessarily recoverable until the data record is stored on a disk.
No matter which technology you choose to build an HA solution with—or which combination
of the four technologies previously discussed—you must be sure your storage solution
is well protected from any disaster. Clustering, the solution chosen most often before SQL
Server 2005, requires even more protection, because only one set of disks is shared between
the nodes on which the data is stored. The other three technologies have separate disk
subsystems for the primary and secondary nodes, providing some degree of fault tolerance.

In designing a highly available storage solution, the method used for disk drives is the
Redundant Array of Inexpensive Disks (RAID). This technology is available in many forms
and possible combinations, each of which has different advantages and disadvantages.
Storage Area Network (SAN) technology is another way of building on the benefits of
RAID arrays to provide even more fault tolerance and better performance. Both of these are
discussed next.

TAKE NOTE*
No matter which type of disk subsystem you choose, it is not a replacement for a tape-backup
system that provides archived records of your data as well as off-site storage.

UNDERSTANDING RAID ARRAYS


RAID technology dates back to 1988, when it was introduced in a paper by David Patterson,
Garth Gibson, and Randy Katz. This paper described using a group of inexpensive disk drives
to achieve greater reliability and performance than a single disk drive.
The original paper defined five levels of RAID; over the years, more have been added by other
groups seeking to improve on the concept.
The basic idea of RAID arrays is that multiple disks are used to store data, usually along with
mirrored copies or parity information calculated from the data. Thus one disk, and potentially
more, can fail, and the data can still be recovered. Modern RAID controllers often include the ability to have spare drives
standing by that are added into the array in the event of a drive failure. The remaining drives
in the array can then be used to rebuild the data from the failed drive on the new drive.
Of the various RAID levels, four are suited to SQL Server database servers. (More levels are
defined, but most are either rarely implemented or not suited for database servers.) Each of
these is briefly discussed here:
• RAID 0. Also known as striping, this level involves a series of disks sharing the data load
across them. A portion of each stripe, or set of data, is written to each disk. Because each
disk operates independently and handles only a portion of the data set, performance
improves dramatically over a single disk. The downside to RAID 0 is that there is no
fault tolerance. If a single drive fails, all data is lost. A RAID 0 array can be formed from
two or more disks, and all the space on all the disks is used.
This level isn’t usually recommended for production systems, although it can be a great
file system for intermediate operations such as extraction, transformation, and load
(ETL) temporary storage.
• RAID 1. Also known as mirroring because the same information is written to each one
of a set of disks. Because the information is the same on each set of disks, there are two
benefits: read performance and reliability. A read can come from either disk, so which-
ever one responds first allows the SQL Server instance to receive the data and continue
processing. Having a complete copy of data on another disk means that there is fault
tolerance: A drive can fail, and the data will still be retrievable.
The disadvantages to this level are the disk space requirements and the write perfor-
mance. Because the data must be written to both disks, write performance may be
decreased. Also, because there are two disks, one for each side of the mirror, the cost for
disk storage doubles for SQL Server instances. You can form this type of array from pairs
of disks (two or more) joined together by the controller.
This level has great read performance and fault tolerance and is often used for SQL
Server transaction logs and tempdb database files.
• RAID 5. Also known as striping with parity, this is the most common level of RAID
used in SQL Server database servers. In this type of array, you sacrifice one disk for
parity information that is calculated from the data striped across the other disks. The
parity information is shared across all disks, as is the data. This level provides a balance
between read-and-write performance because the data is striped, but all disks must be
read to retrieve the data. The cost isn’t as high as RAID 1, because only one disk is used
for parity information. You can form a RAID 5 array from three or more disks.

RAID 5 provides a good balance between the cost of multiple disks and the performance
of RAID 1 for reads. Unless you need to build a very high level of performance into
your database files, this is a great choice for most database data files.
• RAID 10. Also known as RAID 1+0 because it combines the RAID 1 and RAID 0 lev-
els to get the benefits of both. This is one of the highest-performing RAID levels, but it’s
also one of the most expensive options. A minimum of four disks is required to imple-
ment RAID 10.
RAID 10 is the best choice in SQL Server instances where high performance is required
and the expense of this level can be justified.

DESIGNING A RAID ARRAY


Every SQL Server production instance on a server should have its disks protected with RAID
technology. The disks are the most fragile part of the database server and also the most critical
because they hold the data. A server can be rebooted to solve many problems, but the disks
are required to reload the data after SQL Server starts up.
In choosing to design your disk arrays, you’ll be forced to balance the performance you need
with the cost of the arrays. If cost isn’t an issue, then you should implement RAID 10 every-
where to ensure high performance along with fault tolerance. Because this isn’t usually the
case, you must first determine what performance requirements you need to meet and then
choose the highest level of fault tolerance you can afford.
The first decision you must make is whether the system will be primarily write oriented or read
oriented. Often, this comes down to the role of the server: that of a transactional, or write-
based system; or that of a decision-support, or read-based, system. SQL Server database servers
are frequently On-Line Transaction Processing (OLTP) based even though there may be more
reads than writes. SQL Server Analysis Services servers usually involve many more reads than
writes. You’ll need to do some benchmarking to determine the ratio of reads to writes.
If your system is primarily writes, then you should probably choose to implement RAID 5, or
RAID 10 if you can afford it. This gives you good write performance along with fault toler-
ance for disk-drive failures.
If your system contains many more reads, then you should choose RAID 1 if the disk cost
isn’t unreasonable. Otherwise, RAID 5 with one or two extra disks (based on capacity)
balances this cost with performance.
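As a rough illustration of the cost side of this trade-off (raw capacities only, ignoring formatting overhead and hot spares, and using the 70 GB drive size mentioned in Exercise 10.2): eight 70 GB drives yield roughly (8 − 1) × 70 = 490 GB of usable space in RAID 5, but only (8 / 2) × 70 = 280 GB in RAID 10, while RAID 1 pairs yield 70 GB of usable space for every two drives purchased.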
However you choose to design your arrays, have extra disks available in case of failures,
preferably operating in standby mode if your RAID controller supports hot spares. You also
need to choose how to separate your data files, as discussed in Lesson 3.

LAB EXERCISE
Perform Exercise 10.2 in your lab manual.

In Exercise 10.2, you'll walk through designing a series of RAID arrays for an instance of
SQL Server. The decision has been made to build one 16 GB array for the OS and pagefiles,
one 70 GB array for the log files, one 500 GB array for the data files, and one 50 GB array
for the tempdb database. You have 35 GB and 70 GB drives available for the arrays.

CERTIFICATION READY?
RAID is a technology for providing redundancy at the drive level. Watch for exam questions
that discuss HA requirements that are focused beyond disk drives.

DESIGNING A SAN STORAGE ARRAY
SAN technology is similar to, but distinct from, a related technology known as Network Attached
Storage (NAS). Both technologies involve one or more arrays of disks, configured in possibly
multiple RAID arrays, providing a large amount of centralized storage to possibly multiple
servers. SAN technology differs from NAS in that it operates over a private network
as opposed to the normal network that most servers and clients use to communicate.
A SAN array is essentially a large set of disks managed by a high-performance controller,
which presents a mount point to various servers across a private network. Usually, the net-
work is fiber-channel based with Host Bus Adapters (HBA) in each server connected to a
switch that in turn connects to the SAN device. The SAN device presents a logical disk drive
to the server, which appears to be a single disk to the Windows host but could be two, three,
or dozens of disks on the SAN device.

SANs utilize a complex technology and require specialized training to set up and manage.
In many large organizations, a single person is often dedicated to managing the equipment.
A complete discussion of SAN technology is beyond the scope of this text; however, a DBA
should be aware of some basic principles and ensure that they are implemented if the DBA
will be managing SQL Server instances whose data is stored on a SAN device.

TAKE NOTE*
Most larger SAN solutions require extensive vendor support and aren't normally available as
off-the-shelf products. Be sure you take advantage of the design and testing resources your
vendor offers.

CHOOSING RAID LEVELS
Most, if not all, SAN disks are arranged into multiple RAID arrays, which are then presented to
the servers either whole or carved up with a portion presented as a single disk to each server. The
RAID recommendations presented earlier in this Lesson apply to setting RAID levels on a SAN.
Because a single RAID array can be presented to multiple servers with a SAN, the DBA
should be aware of this situation if it exists. Although many SAN vendors tout the high per-
formance of their arrays, they often build a single large RAID 5 or RAID 10 array. Portions
of this large array are presented to each server for individual use. This can be a potential per-
formance problem and should be avoided or tested thoroughly to be sure SQL Server won’t
experience performance issues.

DESIGNING FAULT TOLERANCE


The SAN device will provide fault tolerance for the disk drives, but a few places can affect
SQL Server instances if they aren’t specifically addressed. The first of these is the HBAs in
your SQL Server. These rarely fail, but they’re potential points of failure; if possible, your
database server should have two HBAs with separate paths to the SAN device. This should
include separate fiber paths to separate switches for each HBA. This setup helps to ensure that
a single hardware failure on a cable or hardware device doesn’t cause SQL Server to fail.
Some SAN implementations have captive disks inside the Windows host that are used to boot
the operating system, leaving the SAN disks for data storage. Others boot the Windows server
directly from the SAN disks. If your database server uses the former design, make sure those
captive disks are protected by RAID. If they fail, the Windows operating system will fail even
though the SQL Server data will be protected and available on the SAN.

CERTIFICATION READY?
Remember that a SAN is still a single storage mechanism. High-availability requirements may
necessitate storage of data at multiple physical locations.

The last part of designing a fault-tolerant SAN solution is ensuring that the SAN device has a
backup solution designed into it. Because SAN devices often implement multiple terabytes of
data, it's crucial that this data, or at least your SQL Server data, is protected. You should use
either a second SAN—preferably, geographically remote from the primary SAN—and/or a
tape backup solution to ensure that the data is available in the event the SAN device fails.

Designing a High-Availability Solution


Building an HA solution is usually a balance between the likelihood of a disaster event
occurring and your enterprise’s tolerance for downtime. Many companies can function if
their database server is down for a day, so using a development or other spare server and
rebuilding the SQL Server installation is a valid HA solution. For many other companies,
however, having their database server unavailable for an hour results in substantial costs
to the enterprise.

In either case, the decision to implement an HA solution for your SQL Server requires you to
analyze the risks of disaster and the cost of downtime in order to build a solution. The spe-
cific solution you choose will depend on your needs.
Your design should weigh four basic considerations: failover time, automatic or manual
failover, the application requirements, and cost. Each of these will provide input into your
design and help determine what type of solution you implement. Keep in mind that all these
factors must be balanced against one another, because more stringent requirements in one
area usually lead to additional costs in another.

The failover time is one of the main factors that influences the type of solution you’ll imple-
ment. A failover time measured in hours means that you can choose almost any solution,
including building a server from scratch in a motel. However, a failover time in minutes or
seconds may mean that you’re limited to clustering or mirroring. Because log shipping or rep-
lication requires manual intervention, the time required for an administrator to respond will
determine if you can use these technologies.

TAKE NOTE*
If you require an administrator to respond to a disaster situation, make sure you test the
response time and enact rules to guarantee the response times can be met. For example,
you may want to ensure that the on-call administrator is never more than 30 minutes from
a computer.

Closely related to the failover time is whether automatic failover is required. If so, then you’re
limited to clustering or mirroring unless you have and can spare programming resources to
build an automated solution.
Application issues also play a part in the HA solution you can build. Server instance–level
protection often mandates failover clustering, although you can conceivably use database mir-
roring, log shipping, or replication on all databases that need protecting. If you require SQL
Server Agent, Notification Services, or Reporting Services to be fault tolerant, then you may
be limited to clustering unless you can build creative solutions that can handle your needs.
The application’s ability to handle disasters also will influence your choice of technology. If an
application can’t be modified to work with server-name changes or other addressing consid-
erations of some technologies, clustering or database mirroring may be your only choice for a
solution.
Finally, you should weigh the cost of the technology against the benefits of the HA solution.
An application that will cost you $100 per hour of downtime may not justify the cost of a
$50,000 cluster. Each solution you design will potentially have additional hardware costs,
vendor support costs, licensing costs, employee costs for on-call or after-hour on-site support,
and more. The total cost for each solution should take all these items into account. The cost
of downtime and the risk of downtime occurring should be compared to the solution cost to
determine if the solution is worth implementing.
No matter which type of technology you choose, the SQL Server hardware should be built
with fault tolerance in mind. This usually means spare parts for the various components of
the database server, but it could also include RAID technology, spare network paths, and ven-
dor SLAs to ensure that your SQL Server instance will continue to function in the event of a
disaster.
The main thrust of an HA plan is to limit the single points of failure as much as possible.
All the technologies discussed earlier are aimed at preventing a single database, server, or disk
from being a single point of failure. There are a few other items to consider in designing your
solution, discussed next.

PLANNING FOR NONTECHNICAL ISSUES


Building a technical solution to an HA problem is the easy part. Deciding on a technology,
contracting for remote locations and services, and configuring software are all relatively
straightforward processes to complete. Other issues that can arise in a disaster situation,
however, are more difficult to plan for and may not be easily mitigated.
The biggest issues usually involve staffing in a disaster situation. This can be a small-scale
disaster where the DBA is hurt in a fire, is injured by tripping over the server power cord, or
for some other reason is left unable to respond to the issue. Or it may be a large-scale prob-
lem like those experienced during Hurricane Katrina in 2005, where companies found that
large portions of their staff were unable to report for work because of evacuations or personal
issues from the storm.

Recognizing that your staff is critical to the successful continuation of operations in the event
of a disaster involves two phases. First, you need to ensure that processes and procedures are
documented and employees are cross-trained. Doing so helps prevent any one person from
being a single point of failure.

TAKE NOTE*
It's often hard to ensure that technical employees don't make themselves a single point of
failure. Sometimes they're averse to documenting too much of their job for fear of working
themselves out of a position. Show your employees that they're valuable in spite of the fact
that you have someone else who can perform their job.

The second part of mitigating staffing issues is planning for the problems people may experi-
ence and helping them work through those issues. Rotating shifts, providing help for families,
or other means of ensuring that your staff is able and willing to help the company through a
disaster situation can mean the difference between your database continuing to function or
never coming online again.

CONSIDERING REPORTING ISSUES


A common request from clients and managers is that the secondary server in an HA solution
be made available for reporting purposes. The logic is that because a separate copy of the data
exists and a server is sitting idle, it should be used if at all possible for another function.
Table 10-3 lists the possibilities for using the secondary server for reporting with each of the four
HA technologies. Keep in mind—and caution your clients or coworkers about—the impact of a
failover event on the reporting server. You must determine whether reporting will still be allowed
(or possible) on the reporting server if there is a failover from the primary server.

Table 10-3: Reporting options for the secondary HA server

HA TECHNOLOGY        REPORTING SERVER OPTIONS
Failover clustering  Not available for reporting.
Database mirroring   Not directly available, but database snapshots can be scheduled on the
                     mirror server and used for reporting.
Log shipping         Secondary database(s) can be restored with the STANDBY option and
                     used for reporting. Reporting is unavailable during restores.
Replication          Secondary database is available for reporting and, potentially, updates
                     if merge or bidirectional replication is used. Secondary is always
                     available.

Your choice for a reporting solution should take advantage of the potentials of each technology
for meeting this need. However, it should be a secondary criterion for choosing a solution—
meeting your HA needs should be the primary objective.

TAKE NOTE*
You can combine the HA technologies to achieve your needs, especially for reporting. Log
shipping or replication can be combined with database mirroring or clustering to build a
reporting server.

■ Developing a Migration Strategy

THE BOTTOM LINE
A migration strategy is needed in order to transition from a single server configuration to
one of the HA configurations.

The last part of building an HA system is moving your current environment to a highly
available one. This section assumes that you have a system running that isn’t fault tolerant,
and you wish to move the system to an environment that is designed to function in the event
of a disaster.
Because the system you design can be as simple or complex as your resources allow, it’s impos-
sible to discuss all possibilities, but some general guidelines can help you ensure a smooth
transition to the new environment.

Testing Your Migration

The migration to a new solution can take any number of paths, depending on how your
old and new systems are architected. The method you choose to perform this migration
also depends on the skills of your staff at quickly moving the application and other
factors discussed.

However you choose to perform the migration, it’s critical that you test the plan. Your HA
solution will probably be with new servers, so set up a development or spare server as close as
possible to the existing SQL Server database server and test the steps for moving the database,
jobs, logins, and so on. Ideally, you should capture a real-world load using Profiler on the
production server and replay that during the migration test to be sure the workload can be
executed.
The migration may involve a few steps or hundreds of them; you should document the order
in which they occur as well as who should perform them. Doing so will help ensure that the
process proceeds smoothly.
Finally, you should test the failover and failback after the migration. Failback is the reverse of
failover in that failback refers to transferring the role of the principal server or database back to
the original server or database. Moving to an HA system makes sense only if you’re sure that
the failover in the event of a disaster, and the failback later, can be performed when needed.

Minimizing Downtime

Depending on which technology you choose and the structure of your current environ-
ment, it’s possible to eliminate any downtime for the application. If you’re adding to your
existing environment, such as implementing log shipping, mirroring, or replication on
an existing SQL Server instance, you can add these items to your database and initialize
them without interrupting your application’s access to the database.

Even if you’re choosing to move to new hardware with one of these technologies, you could
conceivably add the technology as you’re moving from the old server to the new one and then
“fail over” to the new server. This should be done only after some testing of the solution, but
it can move you onto a new server transparently to the application. You can then reconfigure
the old server or even replace it with another server and reconfigure the failover.
If you’re implementing a cluster from a previously unclustered solution, you’ll require down-
time to move the data and bring it up on the cluster. On a SAN, this process can be as simple
as presenting the same disks to the new server, thereby minimizing downtime. If not, then
you should perform extensive testing and documentation of the process for migrating the
application to ensure that the actual migration occurs as smoothly as possible.

Implementing Address Abstraction

Starting with Active Directory in Windows 2000, Microsoft moved away from the older
NetBIOS naming scheme, resolved by WINS, to the more widely used
Domain Name System (DNS) for name mapping. This abstraction enables the underly-
ing server IP address to change without affecting the ability of clients to access the server.
SQL Server clients typically address the server by its Windows name, which is unique on
the network and mapped or associated by DNS with an IP address.

One way to ensure a highly available system is to prevent a dependency on a particular server
name. This eases migrations to new hardware, implementation of clustering, failovers with log
shipping, and more. You can easily do this two ways in most Microsoft environments: using
DNS or using a load-balancing technology. Both of these function in a similar way to abstract
the server addressing from the actual server.
If you use DNS to abstract your server address, you should create a specific host name separate
from the server name that your clients will use to connect to the server. For example, if
dkranch.com is your domain in Active Directory and your SQL Server instance is hosted on the
SQLProd01 Windows server, the typical AD name of this server is SQLProd01.dkranch.com.
If you create a sqlprod.dkranch.com host name and link this to the IP address of SQLProd01,
then if you need to migrate to SQLProd02.dkranch.com, you can edit the DNS entry for
sqlprod.dkranch.com, and your clients will automatically connect to the new server.
The other method is to use a load-balancing scheme that routes clients to one or many serv-
ers taking part in the load balancing. Microsoft offers Network Load Balancing as a feature of
Windows Server 2003, and many hardware devices perform the same function. In choosing one
of these for a database server, be sure you configure all clients to connect to the primary server
only by default. The secondary server should receive client requests only if the primary has failed.
If you have an abstraction solution in place, your migration to an HA system should be a
simple matter of editing the abstracted name.

Training Your Staff

One item that’s often forgotten in planning for a new technology implementation is the
training of your staff to support the technology. If you purchase a vendor solution, or
a single employee designs and tests the solution, then others are often only peripherally
involved and unable to support the solution on their own.

A highly available system requires that the single points of failure be minimized—and this
includes your employees. Be sure you include time and budget to have at least two (and
preferably all) on-call employees trained.

SKILL SUMMARY

A highly available system is unique for many companies, involving those technologies and
processes that guarantee the system functions at the necessary level for the enterprise. The
scale and capabilities of the secondary system that is used if the primary system is unavailable
depend on the requirements of your organization.
Each of the four technologies available in SQL Server to implement highly available database
servers has its own features, disadvantages, and costs. The technology that is appropriate for
your application depends on the business’s needs for that application. No single technology is
right for all applications.

As you design high availability into your database servers, make sure you consider all
technologies equally in determining which one is best suited for your application. Examine the
entire system, from hardware to network to staff outside of the four technologies, to ensure
that the entire application has as few single points of failure as possible.
For the certification examination:
• Understand SQL Server failover clustering. You should know the capabilities and
limitations of failover clustering as an HA technology in SQL Server. Also know the
requirements of this technology over and above those of non-clustered SQL Server.
• Understand SQL Server database mirroring. You should understand how database
mirroring works in SQL Server and when it’s appropriate to use as an HA technology.
• Know when to use log shipping. You should understand where and when log shipping can
be used to build a highly available system.
• Understand how replication can be used in an HA system. Know which types of replication
can be used to build an HA system as well as the limitations of choosing this technology.
• Identify single points of failure. You should understand what a single point of failure is
and how to identify the single points of failure in your system.
• Know how to migrate your application to an HA environment. You should understand how
to develop a migration plan for your application to move to an HA environment.

■ Knowledge Assessment

Case Study
Ed’s Heavy Equipment
Ed Harvey started an equipment-rental business in southeastern Virginia that special-
izes in home garden and tractor equipment. The company has equipment in a number
of home stores that customers can rent to use in their gardens, farms, or yards. Remote
terminals in each store communicate with a central office where the database server
tracks all rentals.

Planned Changes
The business has grown substantially, and now Ed wants to enable customers to reserve
equipment over the Internet as well as at stores. He wants to be sure his business con-
tinues to function even if a disaster occurs at one store or the central office.
The IT staff wants to be sure they choose the best combination of technologies to build
an HA system while being careful of the overall cost of the system.

Existing Data Environment


Currently, a single SQL Server 2005 server named SQLRentals is located at the central
office, and all clients connect to this server by its Windows name. This server contains a
central database that stores client and rental information and is backed up nightly.
A number of jobs run under SQL Server Agent and send emails to drivers to notify them
to move equipment between stores. Because there can be delays in getting equipment
transferred, these jobs usually send emails three days before the equipment is needed
and continue to send emails until the destination marks the equipment as in stock.

Existing Infrastructure
There is a single Active Directory domain to which all employees authenticate.
The current SQL Server 2005 server hardware is adequate, but a new server is expected
to be purchased this year to increase performance.
Each store has its own client machines, at least two per store, to connect to the central
office across a high-speed private network. Each store can also communicate with all
other stores via instant messaging, so clerks can send messages to each other.
Every store has room for its own server. The business considered this option initially,
but decided against it.

Business Requirements
Ed wants to be sure that if something happens to the server in the central office, all
clients can continue to connect to this server without interruption.
There is a remote possibility that the central office could go offline because of construc-
tion work in the area. Ed has arranged for another web server and separate connection
to the Internet at the Chesapeake store. This web server currently connects through the
private network to the central office. If the central office loses its Internet connection,
the Web site should continue to accept reservations. A delay of an hour or two to get
this running is acceptable.
Clients can be reconfigured to connect to another server if a long outage for the central
office is expected, but this isn’t acceptable for short-term problems.

Technical Requirements
The solution designed should take advantage of SQL Server 2005 HA technologies to
meet the business requirements.
A few new servers can be purchased, but separate servers can’t be placed in every store.
The private network provides adequate connections between stores for all client traffic
in the event of a disaster at the central office. It can’t support disk-access traffic.
The ISP for the company provides load balancing from the Internet for both web
servers using the two separate connections from the central office and the Chesapeake
store. However, the internal connection from the web servers to the database server
is managed by the internal IT staff. If the database services move to a new server, the
connections should transition easily to the new server.

Multiple Choice
Circle the letter or letters that correspond to the best answer or answers.
Use the information in the previous case study to answer the following questions.
1. To ensure that the central office is adequately protected, new server hardware is being
purchased. Which technology would you choose to protect the database and ensure that
all SQL Server Agent jobs continue to run if the primary server has problems?
a. Failover clustering
b. Database mirroring
c. Log shipping
d. Replication

2. You decide to implement automatic failover from the central office for the application
in case that office goes offline. Which technology is best suited for this?
a. Failover clustering
b. Database mirroring
c. Log shipping
d. Replication
3. To ensure that all clients can redirect to a new server in the event of a disaster, how
should you set up the new servers?
a. Set the VIP to SQLRental01 on the cluster, and name the mirror server
SQLRental02.
b. Set up a DNS host as SQLProd, and direct it to the cluster. In the event of disaster, it
can be moved to the secondary server.
c. Change the application to try the primary cluster node Windows name first and then
the secondary cluster node Windows name.
d. This cannot be done with SQL Server 2005.
4. One of the senior managers wants to consider the possibility of having multiple failover
databases to ensure that two failures do not stop the business. Which technologies can
you use to meet this objective? (Choose all that apply.)
a. Failover clustering
b. Database mirroring
c. Log shipping
d. Replication
5. The application supports mostly OLTP traffic, with a good mix of reads and writes. Your
server has five drives in it, and you want to ensure that you protect the data as well as
have as much storage as possible. Which type of RAID should you choose?
a. RAID 0
b. RAID 1
c. RAID 5
d. RAID 10
6. Which of the following can you leave out of your HA design, given the fact that the
company has never used clustering?
a. SAN devices
b. Staff training on clustering
c. RAID technology
d. A secondary server
7. You are considering separating the store rentals from the Internet customer reservations
in two databases. To do this while ensuring that your system is still fault tolerant and
with minimal application changes, which technology should you choose?
a. Failover clustering
b. Database mirroring
c. Log shipping
d. Replication
8. You decide to implement database mirroring between two servers after upgrading all
clients to use the SQL Native Client that ships with SQL Server 2005. What do you
need to do to support automatic failover?
a. Build retry code into the application to try the primary server and then connect to
the secondary server.
b. Use two connection strings, one for each server, and have the application try both
each time it runs a query.
c. Add the secondary server into the connection string as the secondary database-
mirroring server.
d. Put both servers behind a load-balancing device to handle this.

9. The application has been modified to have customer reservations connect to one server
and store reservations connect to a second server with merge replication moving data
between the servers. Customer orders from the Internet should take precedence over
store orders if the customer has an account. How can you ensure that this happens?
a. Set up the replication to always start with the Internet server and send data to the
store server.
b. Use custom code to resolve replication conflicts.
c. Set the priority of the Internet server lower than that of the store server.
d. Have the Internet application write to both servers.
10. As a secondary plan to your clustering solution, you decide to have log shipping send
copies of the transaction logs to the Chesapeake store. To enable managers to query
this database and not load the primary database, what option should you use with the
restores?
a. WITH STANDBY
b. WITH RECOVERY
c. WITH REPORTING
d. WITH ONLINE
LESSON 11
Designing a Data Recovery Solution for a Database
LESSON SKILL MATRIX

TECHNOLOGY SKILL                                                              EXAM OBJECTIVE
Specify data recovery technologies based on business requirements. Foundational
Analyze how much data the organization can afford to lose. Foundational
Analyze alternative techniques to save redundant copies of critical business data. Foundational
Analyze how long the database system or database can be unavailable. Foundational
Design backup strategies. Foundational
Specify the number and location of devices to be used for backup. Foundational
Specify what data to back up. Foundational
Specify the frequency of backup. Foundational
Choose a backup technique. Foundational
Specify the type of backup. Foundational
Choose a recovery model. Foundational
Create a disaster recovery plan. Foundational
Document the sequence of possible events. Foundational
Create a disaster decision tree that includes restore strategies. Foundational
Establish recovery success criteria. Foundational
Validate restore strategies. Foundational

KEY TERMS
business continuity plan (BCP): A policy that defines how an enterprise will maintain normal
day-to-day operations in the event of business disruption or crisis.
decision tree: A technique for determining the overall risk associated with a series of related
risks; that is, it's possible that certain risks will only appear as a result of actions taken in
managing other risks.
disaster recovery plan (DRP): A policy that defines how people and resources will be protected
in the case of a natural or man-made disaster and how the organization will recover from the
calamity.
media retention: A period of time such as a year, a month, or a week for which any backup
media is not altered and kept in that state in which it was created. After this retention period
the media is allowed to be reused for another new backup.
recovery model: A database option that specifies how the write ahead transaction log records
events; the options are simple, bulk logged, and full. These settings influence your protection
against data loss.

So far in this textbook you’ve learned several key aspects about designing your SQL Server
database infrastructure. These have included considerations of physical design, hardware
needs, security issues, and so on. But no matter how well you design, plan, anticipate, and
prepare, things inevitably go wrong, and disaster strikes. The permanent loss of data is a
catastrophic event that can cripple an organization.

Given those considerations, it isn’t surprising to discover that one of the primary responsibili-
ties of a database administrator is to secure the information contained in the user databases.
This responsibility consists of several tasks, including designing for fault tolerance, developing
a data restoration strategy that anticipates disaster, and securing the data.
Not having a reliable, carefully thought out disaster plan is an open invitation to catastrophe
and borders on inexcusable criminal negligence. It’s as if you decided to jump out of an
airplane without a parachute, expecting the trees below to catch you. It might work, but
it’s not a viable plan. In this Lesson, you’ll examine how to plan a data recovery strategy for
databases, including a backup and restore plan.
Although you’ll primarily focus on best practices, you should make a habit of establishing and
applying general principles when planning and using data-recovery techniques across your
infrastructure rather than trying to design a different plan for each database. Establishing and
using general principles can save you both time and money.
For example, assume you decided to design an individualized data-recovery strategy for
each database in your system. Doing so would probably result in having procedures that
vary from server to server. The subsequent plan would be unnecessarily complex. Avoiding
this is a simple matter of generalizing the principles involved and taking into account the
requirements of all your databases and applications. Armed with these principles, you can
easily design a single data-restoration strategy that can apply to the entire infrastructure.
A good data-recovery strategy starts with the premise that no matter what is done, every data-
base will need data restoration at some point in its life cycle. The role of a database admin-
istrator is to create an infrastructure plan that allows you to minimize how frequently data
recovery is needed, keep an eye out for developing problems before they develop into disasters,
and have a contingency for every possibility. The plan should also let you proceed quickly to
restoration when disasters do occur and promptly verify that the restoration was successful.
The next section reviews some basics about backup and restoration, as well as the different
types.

■ Backing Up Data

THE BOTTOM LINE
Redundancy is key to surviving a loss: a backup power supply, a second NIC, a standby or
failover server in another physical building, a hot site in another state, personnel trained in each
other's duties, and so on. Here the emphasis focuses on a second, or even a third, copy of your
data and copies of data-entry worksheets so that you can recover to the millisecond of that loss.

A backup is a copy of data that is stored somewhere other than the hard drive of your com-
puter, usually on tape or another device, such as a hard drive on another computer connected
over a local area network (LAN), somewhere that won’t suffer the same consequences as the
primary site. There are three basic reasons to back up data:
• The possibility of hardware failure. Despite significant advances in reliability, hardware
fails, often spectacularly and more often than not with an uncanny knack for happening
at exactly the wrong time. If you don’t want to come to work one day and discover
that everything is missing because a hard drive went bad, you should always perform
regularly scheduled backups.

• The chance of external disasters, whether natural or man made. No matter how
much redundant hardware you have in place, it’s not likely to survive a tornado, a
hurricane, an earthquake, a flood, or a fire. Although the possibility is slight, man-made
disasters, such as a terrorist attack or an act of war, can be catastrophic.
• Human malevolence. A large number of data disasters can be traced back to insiders.
Far too often, employees who are angry with their boss or their company seek revenge by
destroying or maliciously changing sensitive data. This is the worst kind of data loss, and
the only way to recover from it is by having a viable backup.
Now that you know why you should back up your data, you need to learn how to do it.

TAKE NOTE*
You can use the Database Maintenance Plan Wizard to schedule backups to run automatically.
To access the wizard in Management Studio, expand Management, right click the
Maintenance Plans Folder, and then select Maintenance Plan Wizard.

CREATING A BACKUP DEVICE


To do any kind of backup, you may choose to create a backup device, which is a place to put
the backed up data. For example, the tape drive or disk drive you use in a backup or restore
operation is a backup device. Microsoft SQL Server can back up databases, transaction logs,
and files to disk (local or over a network connection) and tape devices.
SQL Server isn’t automatically aware of the various forms of media attached to your server,
so you have to tell it where to store the backups. You can create two types of backup devices:
permanent and temporary.
Temporary backup devices are created on the fly when you perform the backup. They’re
useful for making a copy of a database to send to another office so that they have a complete
copy of your data. Temporary backup devices can also be used to make a copy of your
database for permanent offsite storage (usually for archiving).
Permanent backup devices can be used over and over again; you can also append data
to them. These attributes make them the perfect device for regularly scheduled backups.
Permanent backup devices are created before the backup is performed and, like temporary
devices, can be created over the network or a locally accessible hard disk or to a local tape
drive.

LAB EXERCISE
Perform Exercise 11.1 in your lab manual.

In Exercise 11.1, you'll create a permanent backup device.
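If you prefer to script the device rather than create it in Management Studio, a permanent backup device can be added with the sp_addumpdevice system stored procedure. The following is a minimal sketch; the device name and file path are placeholders, not values from the exercise:

-- Create a permanent disk backup device (name and path are examples only)
EXEC sp_addumpdevice 'disk', 'AdventureWorksDev', 'D:\Backups\AdventureWorksDev.bak';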

TAKE NOTE*
If you use a tape drive as a backup medium, it must be physically attached to the SQL
Server machine.

PERFORMING FULL BACKUPS


As you might guess from the name, a full backup is a backup of the entire database that
includes the database files, the locations of those files, and portions of the transaction log
(from the log sequence number [LSN] recorded at the start of the backup to the LSN at
the end of the backup). This is the first type of backup you need to perform in any backup
strategy because all the other backup types depend on the existence of a full backup. In other
words, you can’t do a differential or transaction log backup until you’ve done a full backup. A
full backup is sometimes called a baseline in a backup strategy.

LAB EXERCISE
Perform Exercises 11.2 and 11.3 in your lab manual.

To create a sample baseline, in Exercise 11.2 you'll back up the AdventureWorks database
to the permanent backup device you created in the previous section of this Lesson. Then, in
Exercise 11.3, you'll back up the database using T-SQL commands.
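As a rough sketch of what such a T-SQL full backup looks like, the BACKUP DATABASE statement can write to either a file or a previously created backup device. The device name below carries over from the earlier illustrative example and is not a value defined by the exercises:

-- Full (baseline) backup of AdventureWorks to a permanent backup device
-- WITH INIT overwrites any existing backup sets on the device; omit it to append
BACKUP DATABASE AdventureWorks
TO AdventureWorksDev
WITH INIT, NAME = 'AdventureWorks full backup';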
As you saw previously, once you have a full backup, you can perform other backup types.

PERFORMING DIFFERENTIAL DATABASE BACKUPS


A differential backup is a copy of all changes made to a database since the last full backup
was performed. This includes all changes to data and database objects. A differential data-
base backup records only the most recent change to a data record if a particular data record
has changed more than once since the last full backup (unlike a transaction log backup that
records each change). A differential backup takes less time and less space than a full backup
and is used to reduce database restoration times.
SQL Server figures out which pages in the backup have changed by reading the last LSN of
the last full backup and comparing it with the data pages in the database. If SQL Server finds
any updated data pages, it backs up the entire extent (eight contiguous pages) of data, rather
than just the page that changed.

TAKE NOTE*
Because each differential backup records all changes since the last full database backup, only
the most recent differential backup is required for restoration of data.

Differential database backups are best used with medium to large databases in between sched-
uled full database backups. As the length of time required to perform a full database backup
increases, performing differential backups obviously becomes more useful. Differential back-
ups are particularly useful in speeding up data restoration times in databases where a subset of
data changes frequently and results in large transaction logs.
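In T-SQL, a differential backup uses the same BACKUP DATABASE statement with the DIFFERENTIAL option added. The destination path in this sketch is illustrative only:

-- Back up only the extents changed since the last full backup
BACKUP DATABASE AdventureWorks
TO DISK = N'D:\Backups\AdventureWorks_Diff.bak'
WITH DIFFERENTIAL, NAME = 'AdventureWorks differential backup';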
Performing only full and differential backups isn’t enough. If you don’t perform transaction
log backups, your database could stop functioning.

PERFORMING TRANSACTION LOG BACKUPS


A transaction log backup is a sequential record of all transactions recorded in the transaction
log since the last transaction log backup. Transaction log backups enable you to recover the
database to a specific point in time. This can be useful if, say, you want to restore the database
to just before the entry of incorrect data.
Even though they’re completely dependent on the existence of a full backup, transaction log
backups don’t back up the database itself. They only record sections of the transaction log,
specifically since the last transaction log backup. The best way to think of a transaction log
backup is as a separate object. Then it makes sense that SQL Server requires a backup of the
database as well as the log.
The length of time required to back up the transaction log will vary significantly depending
on the rate of database transactions, the recovery model used, and the volume of bulk-logged
operations. On databases with very high transaction rates and fully logged bulk operations, the
transaction log backup can be bigger than a full database backup, and frequent transaction log
backups may be required to regularly truncate the inactive portion of the transaction log.
Because a transaction log backup records only changes since the previous transaction log
backup, all transaction log backups are required for restoration of data.
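A transaction log backup is written with BACKUP LOG rather than BACKUP DATABASE. The sketch below assumes the database uses the full or bulk-logged recovery model; the file name is an example:

-- Back up the log records written since the last log backup
-- (this also truncates the inactive portion of the log)
BACKUP LOG AdventureWorks
TO DISK = N'D:\Backups\AdventureWorks_Log.trn'
WITH NAME = 'AdventureWorks transaction log backup';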
In addition to the fact that a transaction log is an entity unto itself, there is another important
reason to back it up. When a database is configured to use the full or bulk-logged recovery
model, a transaction log backup is the only type of backup that clears old transactions out of
the transaction log; full and differential backups can only clear the log when the database being
backed up is configured to use the simple recovery model. Therefore, if you were to perform
only full and differential backups on most production databases, the transaction log would
eventually fill to 100 percent capacity, and your users would be locked out of the database.

TAKE NOTE*
When a transaction log becomes 100 percent full, users are denied access to the database
until an administrator clears the transaction log. The best way around this is to perform
regular transaction log backups.

LAB EXERCISE
Perform Exercise 11.4 in your lab manual.

In Exercise 11.4, you'll perform a transaction log backup.
Although full, differential, and transaction log backups work well for most databases, another
type of backup is specifically designed for very large databases that are terabytes in size:
filegroup backups.

PERFORMING FILEGROUP BACKUPS


A growing number of companies have databases that are reaching the terabyte (TB) range
and beyond. These databases are called, logically enough, very large databases (VLDBs). If
you tried to back up a TB-sized VLDB on a nightly, or even weekly, basis, you'd probably
quickly become frustrated—even with the fastest state-of-the-art equipment, the backups
would take a long time. To get around that issue, Microsoft has provided a method to back
up small sections of the database: the filegroup backup.

CERTIFICATION READY?
The FULL, DIFFERENTIAL, and TRANSACTION LOG backup types are the fundamental
types of backups. Know how they relate to each other.
A filegroup is a way of storing a database on more than one file. It also gives you the ability
to control in which of those files your objects (such as tables or indexes) are stored. Hence, a
database doesn’t have to be on only one physical disk; it can be spread out across many disks,
with nearly unlimited growth potential.
A filegroup backup is a copy of each data file in a single filegroup. It also includes all database
activity that occurred while the file or filegroup backup was in process. A filegroup backup can
be used to back up one or more of those files at a time rather than the entire database at once.
This type of backup takes less time and space than a full database backup. It’s used for
VLDBs that are too large to be backed up in a reasonable amount of time (such as in a
24-hour period). In a VLDB, you can design the database so that certain filegroups contain
data that changes frequently and other filegroups contain data that changes infrequently
(or perhaps is read only). Using this design, you can use a filegroup backup to perform
frequent backups of the data that changes frequently and to perform occasional backups of
the infrequently changing data. By splitting the backup into segments, you can perform the
necessary backups in the available backup window and achieve acceptable restoration times.
With VLDBs, a filegroup can be restored much faster than an entire database. One nice
thing about filegroup backups is that multiple backups can be made in parallel to multiple
physical devices, which significantly increases backup performance.
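To sketch the syntax, a filegroup backup names the filegroup (or filegroups) in the BACKUP DATABASE statement. The database name, filegroup name, and path below are hypothetical:

-- Back up a single filegroup of a hypothetical VLDB named Sales
BACKUP DATABASE Sales
FILEGROUP = 'CurrentOrders'
TO DISK = N'E:\Backups\Sales_CurrentOrders.bak'
WITH NAME = 'Sales CurrentOrders filegroup backup';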
A caveat: Filegroup backups require careful planning because you should back up and restore all
the related data and indexes together. In addition, a full set of transaction log backups is required
to restore filegroup backups to a state that is logically consistent with the rest of the database.

TAKE NOTE*
If the tables and indexes are stored on separate files, the files must be backed up as a single
unit. You can't back up the tables and the associated indexes separately.

■ Restoring Databases

THE BOTTOM LINE
The purpose of all backups is to provide backup copies of data that can be used by restore
processes. It is essential that backups be designed with restoration in mind. It is equally essential
for database administrators to be familiar with the various types and methods of restoring
data in databases.

One of the most anxiety-provoking sights is a database that’s graphically displayed in


Management Studio with the word Shutdown in parentheses next to it. This means something
bad, probably a corrupt disk, has happened to the database. It also means you’re going to have
to perform a restore of your last backup.

Suspect or corrupt databases aren’t the only reasons to perform restores, though. You may, for
example, need to send a copy of one of your databases to the main office or to a branch office
for synchronization. You may need to recover from mistaken or malicious updates to the data.
These reasons, and many others, make it important for you to know how to perform restores.

UNDERSTANDING THE GENERAL RESTORE STEPS


Although every data restoration scenario is different, there are several common steps you
should take when you need to restore data because of a database failure:
• Attempt to back up the transaction log. Always try to create a transaction log backup
after a database failure so that you can capture all the transactions up to the time of the
failure. You should use the NO_TRUNCATE option, which backs up the log when the
database is unusable. If you successfully back up transactions to the point of the failure,
restore this new transaction log backup set after you restore the other transaction log
backups.
• Find and fix the cause of the failure. To do this, you need to follow both SQL Server’s
and the operating system’s troubleshooting procedures. You obviously want to find the
source of the problem so you can correct the problem (if possible) and take the necessary
steps to prevent it from happening again.
• Drop the affected databases. Before the database can be re-created, it should first be
dropped so that any references to bad hardware are deleted. You can delete it using either
Management Studio or the T-SQL command DROP DATABASE <database>. If a hard-
ware problem isn’t the reason you’re restoring, you don’t need to drop the database.
• Restore the database. You can use Management Studio to restore databases quickly.
Highlight the database to be restored, right click it, choose Tasks, then select Restore,
then select Database, select the backup to restore, and click OK. If a database doesn’t
exist but you have a backup of it before it was deleted, you can re-create it by restoring
the backup.
Using T-SQL makes sense when you want to restore a database that doesn’t already exist. If a
database by the same name as the database in the backup set already exists, it will be overwritten.
To restore a backup set to a differently named database, use the REPLACE switch.
Although the syntax to do a restoration starts out simply, you can use many options to control
exactly what is restored from which backup set.
The syntax to do a restoration is as follows:
RESTORE DATABASE <database> FROM <device> <options>

These are the most common options:


• RESTRICTED_USER. Only members of db_owner, dbcreator, or sysadmin roles can
access the newly restored database.
• RECOVERY. Recovers any transactions and allows the database to be used. This is the
default if no options are specified.
• NORECOVERY. Allows additional transaction logs to be restored, and also doesn’t
allow the database to be used until the RECOVERY option is used. Basically, the
NORECOVERY switch lets you restore multiple backups onto the same database prior
to bringing the database online.
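Putting the steps and options together, a typical restore after a failure might look like the following sketch. The file names are placeholders, and the tail-log backup assumes the log file itself is still readable:

-- 1. Capture the tail of the log (the database is damaged, so don't truncate)
BACKUP LOG AdventureWorks
TO DISK = N'D:\Backups\AdventureWorks_Tail.trn' WITH NO_TRUNCATE;

-- 2. Restore the full backup, leaving the database able to accept more restores
RESTORE DATABASE AdventureWorks
FROM DISK = N'D:\Backups\AdventureWorks_Full.bak' WITH NORECOVERY;

-- 3. Restore the log backups in sequence; RECOVERY on the last one brings the database online
RESTORE LOG AdventureWorks
FROM DISK = N'D:\Backups\AdventureWorks_Log.trn' WITH NORECOVERY;
RESTORE LOG AdventureWorks
FROM DISK = N'D:\Backups\AdventureWorks_Tail.trn' WITH RECOVERY;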

PERFORMING STANDARD RESTORES


Restoring a database doesn’t involve very many steps, but there is one very important set-
ting you need to understand before undertaking the task. The RECOVERY option, when
set incorrectly, can thwart all your efforts to restore a database. The RECOVERY option tells
SQL Server that you’re finished restoring the database and that users should be allowed back
in. This option should be used only on the last file of the restore process.

For example, if you perform a full backup, then a differential backup, and then a transaction
log backup, you need to restore all three of those to bring the database back to a consistent
state. If you specify the RECOVERY option when restoring the differential backup, SQL
Server won’t allow you to restore any other backups; you have told SQL Server in effect that
you’re done restoring and that it should let everyone start using the database again. If you
have more than one file to restore, you need to specify NORECOVERY on all restores except
the last one.
SQL Server also remembers where the original files were located when you backed them up.
If you back up files from the D: drive, SQL Server restores them to the D: drive. This is great
unless your D: drive has failed and you need to move your database to the E: drive, or if you
need to change the location for any reason. In this instance, you need to use the MOVE . . .
TO option. MOVE . . . TO lets you back up a database in one location and move it to
another location.
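A hedged example of MOVE . . . TO follows. The logical file names (Accounting_Data and Accounting_Log) are assumptions; you can confirm the actual names recorded in a backup with RESTORE FILELISTONLY:

-- List the logical file names stored in the backup
RESTORE FILELISTONLY FROM DISK = N'D:\Backups\Accounting_Full.bak';

-- Restore the database to new physical locations on the E: drive
RESTORE DATABASE Accounting
FROM DISK = N'D:\Backups\Accounting_Full.bak'
WITH MOVE 'Accounting_Data' TO 'E:\Data\Accounting.mdf',
     MOVE 'Accounting_Log' TO 'E:\Logs\Accounting_log.ldf',
     RECOVERY;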
Finally, before SQL Server will let you restore a database, SQL Server performs a safety check
to make sure you aren’t accidentally restoring the wrong database. The first thing SQL Server
does is compare the database name that is being restored with the name of the database
recorded in the backup device. If the two are different, SQL Server won’t perform the restore.
For example, if you have a database named Accounting on the server, and you’re trying to
restore from a backup device that has a backup of a database named Acctg, SQL Server won’t
perform the restore. This is a lifesaver, unless you’re trying to overwrite the existing database
with the database from the backup. If that is the case, you need to specify the REPLACE
option, which is designed to override the safety check.
LAB EXERCISE
Perform Exercises 11.5 and 11.6 in your lab manual.

In Exercise 11.5, you'll disable a database, and in Exercise 11.6, you'll perform a
simple restore.
This type of restore is useful if the entire database becomes corrupt and you need to restore
the whole thing. However, what if only a few records are bad, and you need to get back to the
state the database was in just a few hours ago?

PERFORMING POINT-IN-TIME RESTORES


It’s not uncommon to be asked to reset the data back to a previous state, such as at the end
of the month, when accounting closes out the monthly books. This is possible if you’re doing
transaction log backups, in which case you can perform a point-in-time restore.
In addition to stamping each transaction in the transaction log with a log sequence number
(LSN), SQL Server stamps them all with a time. That time, combined with the STOPAT
clause of the RESTORE statement, makes it possible for you to bring the data back to a
previous state.
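As a sketch, a point-in-time restore applies the STOPAT clause to the final transaction log restore. The timestamp and file names here are examples only:

RESTORE DATABASE AdventureWorks
FROM DISK = N'D:\Backups\AdventureWorks_Full.bak' WITH NORECOVERY;

-- Replay the log only up to 2:30 p.m.; later transactions are discarded
RESTORE LOG AdventureWorks
FROM DISK = N'D:\Backups\AdventureWorks_Log.trn'
WITH STOPAT = '2010-06-28 14:30:00', RECOVERY;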

TAKE NOTE*
This process only works with transaction log backups, not full or differential backups.
In addition, you'll lose any changes that were made to your entire database after the
STOPAT time.

Another type of restore comes in handy for VLDBs: piecemeal restores. Piecemeal restores
were implemented with SQL Server 2005 and augment the concept of partial restores
introduced in SQL Server 2000.

TAKE NOTE*
Another option available is to do a restore up to a specific log sequence number (LSN).
This could allow you to restore right up to and including a specific transaction without
knowing exactly when it occurred.

PERFORMING PIECEMEAL RESTORES


Piecemeal restores are used to restore the primary filegroup and (optionally) some secondary
filegroups and make them accessible to users. Remaining secondary filegroups can be restored
later if needed.
Every piecemeal restore starts with an initial restore sequence called the partial-restore
sequence. Minimally, the partial-restore sequence restores and recovers the primary filegroup
and, under the simple recovery model, all read/write filegroups. In editions other than
Enterprise Edition, the whole database must be offline during the partial-restore sequence.
Thereafter, the database is online and restored filegroups are available. However, any
filegroups that have not yet been restored remain offline.
Regardless of the recovery model that is used by the database, the partial-restore sequence starts
with a RESTORE DATABASE statement that restores from a full backup and specifies the
PARTIAL option. The PARTIAL option always starts a new piecemeal restore; therefore, you
must specify PARTIAL only one time in the initial statement of the partial-restore sequence.
When the partial restore sequence finishes and the database is brought online, the state of the
remaining files becomes “recovery pending” because their recovery has been postponed.
Subsequently, a piecemeal restore typically includes one or more restore sequences, which
are called filegroup-restore sequences. You can wait to perform a specific filegroup-restore
sequence for as long as you want. Each filegroup-restore sequence restores and recovers one
or more offline filegroups to a point consistent with the database. The timing and number of
filegroup-restore sequences depends on your recovery goal, the number of offline filegroups
you want to restore, and on how many of them you restore per filegroup-restore sequence.
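The following sketch shows the shape of a piecemeal restore under the simple recovery model for a hypothetical database named Sales with a read-only Archive filegroup. The names and paths are placeholders, and under the full or bulk-logged recovery model the sequences would also include log restores:

-- Partial-restore sequence: bring the primary (and read/write) filegroups online
RESTORE DATABASE Sales
FILEGROUP = 'PRIMARY'
FROM DISK = N'E:\Backups\Sales_Full.bak'
WITH PARTIAL, RECOVERY;

-- Later, a filegroup-restore sequence recovers the remaining offline filegroup
RESTORE DATABASE Sales
FILEGROUP = 'Archive'
FROM DISK = N'E:\Backups\Sales_Archive.bak'
WITH RECOVERY;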
With the mechanics of backing up and restoring under your belt, you’re ready for a discussion
of theory. You need to know not only how but also when to use each of these types of back-
ups. You need to devise a viable backup strategy.

■ Devising a Backup Strategy

THE BOTTOM LINE
A backup strategy is a plan that details when to use which type of backup. For example,
you can use only full backups, full with differential, full with transaction log backups, or
any other valid combination. Your challenge is to figure out which one is right for your
environment. Examine the pros and cons of each type of strategy.

PERFORMING FULL BACKUPS ONLY


If you have a relatively small database, all you really need to do is perform full backups,
and you’re done. What is a relatively small database? There is no hard-and-fast rule; what’s
important is the size of a database relative to the speed of the backup medium. For example,
a 500 MB database is fairly small, but if you have an older tape drive that isn’t capable of
backing up a 500 MB database overnight, you won’t want to perform full backups on the
tape drive every night. In that case, effectively it’s not a relatively small database, and you
need to think of a different strategy. On the other hand, if you have hardware capable of a
10 GB backup in an hour, you can consider a full-backups-only strategy, even though the
database is twenty times larger than the database in the other example.
The major advantage of a full-only strategy is that the restore process is faster than with other
strategies, because it uses only one backup set. For instance, if you perform a full backup
every night and the database fails on Thursday, all you need to restore is the full backup from
Wednesday night. With other strategies, the restore takes more time because you have more
backup sets from which to restore.
A disadvantage of a full-only strategy is that each backup is comparatively slower and larger
than with the other strategies. For example, if you perform a full backup every night on
a 500 MB database, you’re backing up 500 MB every night. If you’re using differential with

full, you aren’t backing up the entire 500 MB every night, which is faster and requires less
disk space. A differential backup will often be a small percentage of the full backup size. This
could be only perhaps 10 MB of the 500 MB example database.
Another disadvantage of the full-backups-only strategy involves the transaction log. As you
saw earlier, the transaction log is cleared only when a transaction log backup is performed.
With a full-only strategy, your transaction log is in danger of filling up and locking your users
out of the database. To avoid this problem, you can set the recovery model to simple.
Another option is to perform the full backup and, immediately afterward, perform a trans-
action log backup with the TRUNCATE_ONLY clause. With this clause, the log won’t be
backed up, just emptied. Then, if your database crashes, you can perform a transaction log
backup with the NO_TRUNCATE option. The NO_TRUNCATE option tells SQL Server
not to erase what’s in the log already so that its contents can be used in the restore process.
This approach gives you up-to-the-minute recoverability as well as a clean transaction log.
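A sketch of these housekeeping options follows. The TRUNCATE_ONLY clause is specific to SQL Server 2005 and earlier (it was removed in later releases), and the file names are examples only:

-- Option 1: let SQL Server truncate the log automatically
ALTER DATABASE AdventureWorks SET RECOVERY SIMPLE;

-- Option 2: full backup followed by emptying the log without backing it up
BACKUP DATABASE AdventureWorks
TO DISK = N'D:\Backups\AdventureWorks_Full.bak' WITH INIT;
BACKUP LOG AdventureWorks WITH TRUNCATE_ONLY;

-- After a failure, save the orphaned log so it can be used in the restore
BACKUP LOG AdventureWorks
TO DISK = N'D:\Backups\AdventureWorks_Tail.trn' WITH NO_TRUNCATE;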

TAKE NOTE*
The first thing you should do in the event of any database failure is use the
NO_TRUNCATE option with the transaction log backup to save the orphaned log.

PERFORMING FULL WITH DIFFERENTIAL BACKUPS


If your database is too large to perform a full backup every night, the best plan is to add
differentials to the strategy. A full/differential strategy provides a faster backup than full
alone. With a full-only backup strategy, you’re backing up the entire database every time you
perform a backup. With a full/differential strategy, you’re backing up only the changes made
to the database since the last full backup.
The major disadvantage of the full/differential strategy is that the restore process is slower
than with full-only, because full/differential requires you to restore more backups. Suppose
you perform a full backup on Monday night and differentials the rest of the week. Your data-
base crashes on Thursday. To restore the database, you’ll need to restore both the full backup
from Monday and the differential from Wednesday. If it fails on Friday, you’ll restore the full
backup from Monday and the differential from Thursday.
Be aware that differential backups don’t clear the transaction log. If you opt for this method,
you should clear the transaction log manually by backing up the transaction log with the
TRUNCATE_ONLY clause.

PERFORMING FULL WITH TRANSACTION LOG BACKUPS


Another method to consider, regardless of the size of your database, is full/transaction. This is
the best method to keep your transaction logs clean, because it’s the only type of backup that
purges old transactions from your transaction logs.
This method also makes for a very fast backup process. For example, you can perform a full
backup on Monday and transaction log backups three or four times a day during the week.
This is possible because SQL Server performs online backups, and transaction log backups are
usually small and quick.
Another point to remember is that transaction log backups are the only type that gives you
point-in-time restore capability.
A disadvantage is that the restore process is a little slower than with full-only or full/differential
because there are more backups to restore.

PERFORMING FULL, DIFFERENTIAL, AND TRANSACTION LOG BACKUPS


If you combine all three types of backups, you get the best of all possible worlds. The backup
and restore processes are still relatively fast, and you have the advantage of point-in-time
restore as well. Suppose you perform a full backup on Monday, transaction log backups every

four hours (10:00 a.m., 2:00 p.m., and 6:00 p.m.) throughout the day during the week, and
differential backups every night. If your database crashes at any time during the week, all you
need to restore is the full backup from Monday, the differential backup from the night before,
and the transaction log backups, sometimes called incremental backups, up to the point of
the crash. This approach is fast and simple. However, none of these combinations work well
for a monstrous VLDB; for that, you need a filegroup backup.

PERFORMING FILEGROUP BACKUPS


We discussed the mechanics of the filegroup backup earlier in this Lesson; recall that file-
group backups are designed to back up small chunks of the database at a time, not the whole
database at once. This may come in handy, for example, with an 800 GB database contained
in four files in four separate filegroups. You can perform a full backup once per month and
then back up one filegroup per week during the week. Every day, you perform transaction log
backups to maximize recoverability.

TAKE NOTE*
SQL Server can determine which transactions belong to each filegroup. When you restore
the transaction log, SQL Server applies only the transactions that belong to the failed group.

PERFORMING PARTIAL AND PARTIAL DIFFERENTIAL BACKUPS


A partial backup contains all the data in the primary filegroup, every read-write filegroup, and
any optionally specified files. Partial backups are useful when a database contains one or more
read-only filegroups that have been read-only since the last full backup. A partial backup of a
read-only database contains only the primary filegroup. A partial differential backup records
only the data that has changed in the filegroups since the preceding partial backup; such a
partial backup is called the base for the differential. Therefore, partial differential backups are
smaller and faster than partial backups, which facilitates making frequent backups to decrease
your risk of data loss.
Although these types of backups, which are new to SQL Server 2005, are easy to use
and provide more flexibility for backing up under the simple recovery model, they aren’t
supported by all recovery models.
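Syntactically, a partial backup is an ordinary BACKUP DATABASE statement with the READ_WRITE_FILEGROUPS keyword added. The following sketch again assumes a hypothetical Sales database:

    -- Partial backup: the primary filegroup plus every read-write filegroup
    BACKUP DATABASE Sales READ_WRITE_FILEGROUPS
        TO DISK = 'D:\Backups\Sales_partial.bak';

    -- Partial differential backup: only the changes made since the partial backup above
    BACKUP DATABASE Sales READ_WRITE_FILEGROUPS
        TO DISK = 'D:\Backups\Sales_partial_diff.bak' WITH DIFFERENTIAL;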
Now that you understand how to do backups and restores and the types open to you, you
need to consider designing a backup and restore strategy.

Designing a Backup and Restore Strategy: The Process

Performing database backups is one of the most important daily tasks of a database
administrator. SQL Server makes performing backup operations simple. However, you
must remember that the reason for making backups is so you can recover some or all
of your data and databases in the event of a failure. You shouldn’t be figuring out how
to do that while your users, boss, and senior management are demanding to know
where the data they need is. You must have a well-rehearsed, fast, and secure means of
performing recovery.

Managing reliable and secure backups across an enterprise can quickly become a complex
task. Therefore, it’s vital that you develop and test the backup and recovery strategy for your
database server infrastructure.
The recommended approach you should take when defining the backup and restore strategy
is to start with commonly accepted best practices for performing and documenting SQL
Server database backup and restore operations and then adapt them for the specific needs
of your organization.

When you’re designing an organization-wide backup and restore strategy, follow these eight
key steps:
1. Analyze business requirements.
2. Categorize databases based on recovery criteria.
3. Assign a recovery model for each category.
4. Specify the backups required to support each category.
5. Specify a backup frequency policy for each category.
6. Determine the backup security policy for each category.
7. Document the backup strategy.
8. Create a backup validation and testing policy.
Here’s what to do and how to successfully complete each of these steps.

ANALYZING BUSINESS REQUIREMENTS


Your data recovery strategy should allow you to recover missing data as quickly as possible.
You should also be aware of which databases are the most critical and the order in which they
should be recovered in the event of a major failure. Review some guidelines for determining
realistic database recovery requirements as well as how to justify decisions to management
concerning the strategy for recovering databases.
First, examine each database in your organization and determine the volume of data loss
that the organization can tolerate and the amount of downtime that the organization can
withstand when recovering the lost data.
For each database or set of databases, consult with the stakeholders, and use their input to help
determine the value of the data for the organization’s operations. If you’re not sure how, you
can use two key elements to help determine the real business requirements for data recovery:
• Quantify the acceptable cost of potential data loss. The value of data and the impact
of its loss will vary from database to database. For example, last week’s sales orders are
more important than the maintenance records of the company’s fleet of executive cars.
Establish how critical the loss of data would be to the business. Also look to see if loss
of data could result in legal or regulatory problems. You need to review this information
with the stakeholders and confirm with management.
Quantify the cost of data loss in monetary terms whenever possible. Although these are
usually estimates, they can give you a measure by which you can compare databases and
prioritize their recovery needs.
• Determine the time and cost of database recovery. For each database, determine the
maximum period of time it can be unavailable before the impact becomes so great that
the organization can no longer function effectively. For example, critical data must be
recoverable immediately, whereas less important data can often be recovered after a delay.
You need this week’s sales data now. The date of the last oil change of the CFO’s car can
wait a week if need be. You should also be prepared to quantify the recovery time as a
monetary cost, and you can then compare this against the value of the data.

CATEGORIZING DATABASES BASED ON RECOVERY CRITERIA


To manage and develop a global data-recovery strategy, you need to categorize the importance
of data based on some criteria. This in turn is based on the criticality of data, but you should
also assess the size and rate of change of the data the database contains.

Among the many criteria you can use to categorize databases, you should include at least the
value, volatility, and size of the data, as described here:
• Value of the data. Determine the significance of the data held in a database, and
identify those databases that are most critical. Take into consideration the role that
the data plays in the organization, the estimated cost of data loss, and the cost of data
unavailability during recovery. The end result may be categories such as “mission-critical
databases,” “department-critical databases,” and “noncritical databases.” Mission-critical
and department-critical databases may both require backups of transaction logs, but
noncritical databases may only require database backups.
• Rate of change of the data. Very active databases may require a different backup-and-restore
strategy than that for relatively inactive databases. If the data changes frequently, you’ll find
that differential backups will be less useful than regular full and transaction log backups.
• Size of the data. The size of a database can impact the options available for performing
backup and restore operations. You can use the size to determine the proper combina-
tions of backups that you should take as part of your backup strategy. A large database
might be backed up only once per week or might have different filegroups backed up
each night. A smaller database might be fully backed up nightly.

Choosing a Recovery Model

Backup and restore operations occur within the context of recovery models. A recovery
model is a database property that controls the basic behavior of backup and restore
operations for the database. For instance, a recovery model controls how transactions are
logged, whether the transaction log requires backing up, and what kinds of restore operations
are available. A new database inherits its recovery model from the model database.

Recovery models simplify recovery planning, simplify backup and recovery procedures, clarify
trade-offs among system operational requirements, and clarify trade-offs among availability
and recovery requirements. Three recovery models are available: simple, full, and bulk logged.

CHOOSING THE SIMPLE RECOVERY MODEL


This model minimally logs most transactions, logging only the information required to ensure
database consistency after a system crash or after restoring a data backup.
As old transactions are committed and the log isn’t needed anymore, the log is truncated.
This truncation of the log eliminates backing up and restoring transaction logs. However, this
simplification comes at the expense of potential data loss in the event of a disaster. Without
log backups, the database is recoverable only up to the time of its most recent data backup.
The simple recovery model is generally useful only for test and development databases or
for databases containing mostly read-only data. Simple recovery requires the least adminis-
tration. Data is recoverable only to the point of the most recent full backup or differential
backup. Transaction logs aren’t backed up, and minimal transaction log space is used. After
the log space is no longer needed for recovery from possible server failure, the space is reused.
Additionally, restoring individual data pages is unsupported.
The simple recovery model is inappropriate for production systems where loss of recent
changes is unacceptable. In such cases, Microsoft recommends using the full recovery model.
When you’re using simple recovery, the backup interval should be long enough to keep the
backup overhead from affecting production work, yet short enough to prevent the loss of
significant amounts of data.

CHOOSING THE FULL RECOVERY MODEL


This model fully logs all transactions and retains all the transaction log records until after
they’re backed up. In the Enterprise Edition of SQL Server 2005, the full recovery model
allows a database to be recovered to the point of failure, assuming that the tail of the log has
been backed up after the failure.

The full recovery model covers the broadest range of failure scenarios and includes both data-
base backups and transaction log backups. It also provides the most flexibility for recovering
databases to an earlier point in time.
If one or more data files are damaged, recovery can restore all committed transactions. In-
process transactions are rolled back. In Microsoft SQL Server, you can back up the log while
a data or differential backup is running. In the Enterprise Edition of SQL Server, you can also
restore your database without taking all of it offline if your database is in full or bulk-logged
recovery mode.
The full recovery model supports all restore scenarios.
By logging all operations, including bulk operations such as SELECT INTO, CREATE
INDEX, and bulk-loading data, the full recovery model allows you to recover a database to the
point of failure or to an earlier point in time if you’re using the Enterprise edition of SQL Server.
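Recovering to the point of failure depends on capturing the tail of the log after the failure. As a hedged example, assuming the log file is still intact and using hypothetical names:

    -- Back up the tail of the log even though the data files are damaged
    BACKUP LOG Sales TO DISK = 'D:\Backups\Sales_tail.trn' WITH NO_TRUNCATE;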

CHOOSING THE BULK-LOGGED RECOVERY MODEL


The bulk-logged recovery model is intended as a supplement to the full recovery model.
Before bulk operations such as bulk loading or index creation, you may want to switch a full
model database temporarily to the bulk-logged recovery model. If you do, you should switch
back to the full recovery model immediately afterward.
This model minimally logs most bulk operations, such as index creation and bulk loads, while
fully logging other transactions. Bulk-logged recovery increases performance for bulk opera-
tions and is intended to be used as a supplement to the full recovery model. The bulk-logged
recovery model supports all forms of recovery, although with some restrictions.
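The switch itself is a single ALTER DATABASE statement in each direction. A minimal sketch, assuming a database named Sales:

    ALTER DATABASE Sales SET RECOVERY BULK_LOGGED;

    -- ... perform the bulk load or index creation here ...

    ALTER DATABASE Sales SET RECOVERY FULL;

    -- Take a log backup immediately so point-in-time recovery is available again
    BACKUP LOG Sales TO DISK = 'D:\Backups\Sales_log.trn';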

TAKE NOTE: A database needs to be running in full or bulk-logged recovery mode in order to support log shipping.

Table 11-1
Recovery model comparison

Recovery Model: Simple
  Benefits: Permits high-performance bulk copy operations. Reclaims log space to keep space requirements small.
  Data loss exposure: Changes since the most recent database or differential backup must be redone.
  Recover to point in time? Can recover to the end of any backup. Then, changes must be redone.

Recovery Model: Full
  Benefits: No work is lost due to a lost or damaged data file. Can recover to an arbitrary point in time.
  Data loss exposure: Normally none. If the log is damaged, changes since the most recent log backup must be redone.
  Recover to point in time? Can recover to any point in time.

Recovery Model: Bulk-logged
  Benefits: Permits high-performance bulk copy operations. Minimal log space is used by bulk operations.
  Data loss exposure: If the log is damaged, or bulk operations occurred since the most recent log backup, changes since that last backup must be redone. Otherwise, no work is lost.
  Recover to point in time? Can recover to the end of any backup. Then changes must be redone.

You should assign a recovery model for each category of database that you’ve identified. A good
rule of thumb for assigning a recovery model is to evaluate the requirements for each category:
• Use the full recovery model for the most critical databases. This enables you to recover
data quickly and efficiently as long as you have the appropriate database and transaction
log backups available.
• Use the bulk-logged recovery model if the database makes extensive use of bulk
operations to maximize performance.
• Use the simple recovery model for databases with less critical recovery and performance
requirements. This lets you minimize the administrative overhead associated with these
databases, at the risk of losing changes made since the last backup was taken.

TAKE NOTE: When a database is created, it has the same recovery model as the model database. You can change the recovery model using either T-SQL or Management Studio. The T-SQL command is ALTER DATABASE <database> SET RECOVERY { FULL | BULK_LOGGED | SIMPLE }.
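As a quick sketch, you can confirm the model currently assigned to each database and then apply the one chosen for its category; the Sales database name here is only an example:

    -- List every database and its current recovery model
    SELECT name, recovery_model_desc FROM sys.databases;

    -- Assign the model chosen for this database's category
    ALTER DATABASE Sales SET RECOVERY FULL;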

SPECIFYING WHAT BACKUPS ARE NEEDED TO SUPPORT EACH CATEGORY

The different categories of databases that you've defined might require taking different combinations of backups using varying backup schedules to ensure that you keep recovery times within the business constraints of the organization. Remember that all recovery models require that you do a full database backup at regular intervals and factor it into your plan. You can then take combinations of transaction log and differential database backups according to the requirements of each category.

CERTIFICATION READY? Expect exam questions requiring an understanding of the different backup options and how the Recovery Model setting relates to your backup options. If the Recovery Model is set to Simple, how often are transaction log backups going to be done?

SPECIFYING THE BACKUP FREQUENCY

While designing a backup frequency strategy, you should do the following:
• Specify a periodically repeating backup schedule for each category. For example, “mission
critical databases with low activity” might require you to take a full database backup
once a week, a differential backup daily, and transaction log backups every 30 minutes.
• Specify a media retention or rotation strategy for stored backups that meets business
needs and complies with relevant laws and regulations.

TAKE NOTE: You should always back up the master database before adding a new database or deleting an existing database, as well as prior to making any global configuration changes to SQL Server.

SETTING THE BACKUP SECURITY POLICY


A database backup contains a copy of the information held in the database. Therefore, you
should apply the same level of security to it as to the original.
For each category of database, establish a policy for ensuring backup security. The policy
should cover all parts of the backup process, including creating the backups, temporarily stor-
ing backups on site, and transporting and storing backups in an offsite location, a process
known as maintaining a chain of custody. Typically, auditors will audit the chain-of-custody
records for financial and other key data. In the event of an audit, you must also be able to
demonstrate that your data has not been altered or accessed by unauthorized personnel.
You should integrate these key points into your strategy for securing backups:
• Ensure that the offsite storage location is physically secure and available whenever
required. The storage facility should be far enough away to not be subject to the same
type of disaster that might affect your production location but close enough to enable
you to obtain your storage media in a reasonable time.
• Use a secure method of delivering backup media to their storage destination.
• Protect your backups by using strong passwords.
• Back up your encrypted data to files and tape media.

TAKE NOTE: With the introduction of transparent data encryption (TDE) in SQL Server 2008, you now have the choice of cell-level encryption as in SQL Server 2005, full database-level encryption by using TDE, or the file-level encryption (EFS) options provided by Windows. TDE is the optimal choice for bulk encryption to meet regulatory compliance or corporate data security standards. TDE works at the file level, which is similar to two Windows® features: the Encrypting File System (EFS) and BitLocker™ Drive Encryption, the new volume-level encryption introduced in Windows Vista®, both of which also encrypt data on the hard drive. TDE does not replace cell-level encryption, EFS, or BitLocker.

DOCUMENTING THE BACKUP STRATEGY


You must carefully document the backup strategies that you’ve created. The documents should
be made available to any administrators who will be called on to execute them. For each category
of database, include scheduled backup frequency, the database recovery model, references to
external storage locations that keep the backup copies, and any other information required for
performing restore operations. These documents are also of interest to financial auditors who
will assess the adequacy of the procedures for safeguarding a company’s financial records.

TAKE NOTE: Don't include passwords in your documentation, because you then make your system only as secure as the document in which the passwords can be found.

CREATING A BACKUP VALIDATION AND TESTING POLICY


For each category of database, include a validation and verification policy to ensure that your
backup copies are actually useful. These policies must include frequent checks of database
restore procedures (at least once a week) and verification that you can perform a full server
restore at least once a year.

TAKE NOTE: When backing up and restoring databases, SQL Server provides online page checksum verification and backup and restore checksums. You should consider always using the CHECKSUM option with the BACKUP command. You can use the RESTORE VERIFYONLY T-SQL command to verify the validity of a backup without performing an entire restore operation.
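A minimal sketch of both options, using a hypothetical backup file:

    -- Write page checksums into the backup and verify pages as they are read
    BACKUP DATABASE Sales TO DISK = 'D:\Backups\Sales_full.bak' WITH CHECKSUM;

    -- Confirm that the backup is complete and readable without restoring it
    RESTORE VERIFYONLY FROM DISK = 'D:\Backups\Sales_full.bak' WITH CHECKSUM;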

An excellent method of ensuring that the databases stored in a backup don't contain any allocation, structural, or logical integrity problems is to use T-SQL to run the Database Console Command DBCC CHECKDB before initiating BACKUP DATABASE. Running DBCC CHECKDB performs the following database operations:
• Runs DBCC CHECKALLOC to check the consistency of disk-space allocation structures for the specified database.
• Runs DBCC CHECKTABLE to check the integrity of all the pages and structures that
make up every table or indexed view in the database.
• Validates the Service Broker data in the database.
• Runs DBCC CHECKCATALOG to check for catalog consistency within the specified
database. Note that the database must be online.
• Validates the contents of every indexed view in the database.
This means the DBCC CHECKALLOC, DBCC CHECKTABLE, and DBCC
CHECKCATALOG commands don’t have to be run separately from DBCC CHECKDB.
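Put together, a simple pre-backup consistency check might look like the following sketch; the database name and path are assumptions:

    -- Check allocation, table, and catalog consistency in one pass
    DBCC CHECKDB ('Sales') WITH NO_INFOMSGS;

    -- Back up only after the consistency check comes back clean
    BACKUP DATABASE Sales TO DISK = 'D:\Backups\Sales_full.bak' WITH CHECKSUM, INIT;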

■ Developing Database Mitigation Plans

THE BOTTOM LINE: Planning how to handle future unpredictable problems can greatly mitigate the negative effects of these potential problems.

A database disaster recovery plan should be part of a broader disaster recovery plan for
your department or organization’s entire IT infrastructure. Developing a good database
disaster recovery plan requires coordination with other departments within the organization,
including management, network administrative staff, and offsite storage operators.
Similarly, a business continuity plan (BCP) deals with the steps needed should a local
disruption or process change occur. As with a disaster recovery plan, the BCP has to be
coordinated with all participants to recover from the loss of key employees (implement
cross-training programs, perhaps), new acquisitions and mergers (start a business process
reengineering effort, perhaps), and/or a constant flood of new Microsoft updates and
products (analyzing the values of SQL Server 2008, perhaps).

CATEGORIZING THE INFORMATION


What you put in your database recovery or maintenance plan depends on the size of your
organization and the complexity of your database server infrastructure. If you have a really
large organization or a complex plan, it’s a good idea to subdivide the plan and assign the
different parts to modules corresponding to the functionally independent parts of your
organization or the stages of the plan.
Detail is important in any mitigation plan. There are a number of ways you can categorize
the types of information the plan should contain. The following sections show some examples
of the information you should include.

CONTACT LIST
When an interruption occurs, you should have a list of key contacts readily available. These
are the people you’ll notify about the event and the current status of the system. You can
group contacts based on their role or need for information:
• Key managerial personnel in your own department. These contacts are typically
responsible for notifying other members in the organization. In some cases, you may be
the one who has to notify others, so you’ll need a more comprehensive contact list.
• Technicians and other staff responsible for helping recover the system. These
contacts include hardware service technicians, enterprise application service staff, and
possibly spare-part suppliers and transport agencies, for plans that require quickly
obtaining replacement machinery. You should also include backup personnel whom you
can notify in case you can’t contact the first level of key individuals.

TAKE NOTE: You must also have a mechanism for providing access to the database administrator's password, should the database administrator (or the designated backup member of staff) be unavailable. You should treat this password with all the sensitivity accorded to critical financial data and protect it accordingly.

• Those who need an alternative mode of operation while the problem is resolved.
This includes both personnel and departments affected by the disaster. For example,
if you detect that the database supporting a shipment-tracking system of the freight-
forwarding department has failed, you should notify the designated contact for that
department, who can in turn notify the customer service staff.
Just as important, make sure all contact information is up to date. Finding out it isn’t should
happen before you’re in the middle of an actual calamity. Don’t get sloppy!

DECISION TREE
A comprehensive recovery plan must include key details for performing recovery in a wide
variety of circumstances. You should document the various circumstances that can arise and
the detailed steps to take to resolve each situation. For a particular type of disaster, you can
often consolidate many different recovery paths into a decision tree. A well-developed and
tested decision tree can help reduce errors and speed up the recovery process. You’ll review
how to create a decision tree later in this Lesson.

RECOVERY SUCCESS CRITERIA


Make sure you have a set of criteria that verifies that a particular recovery process is complete
and successful. Don’t think you’re done after you’ve restored the databases for which you are
responsible—that’s only the first step of the recovery. Make sure peripheral connectivity and
supporting items that applications require to access the databases are also available. The recov-
ery process is finished only after the applications and anything else using the databases are
running as expected.

LOCATION OF BACKUPS, SOFTWARE, HARDWARE, KEYS, AND ACTIVATION PROCESSES
You should document the locations of backups, software, hardware, serial numbers, software-
activation keys or activation and configuration processes, and documentation describing how
to rebuild servers and reinstall software. Be sure to include information about software ver-
sion numbers and any service packs required. If you’re responsible for maintaining database
server hardware, you should also include locations of spare parts, such as replacement disks,
memory, and processors.

INFRASTRUCTURE DOCUMENTATION
The database disaster plan can contain server hardware specifications and configuration
information. However, there may also be related documents that it doesn’t contain, such as
infrastructure diagrams and application process documents. You should make sure the disaster
plan specifies the location of the most recent versions of these documents.

CREATING A DISASTER RECOVERY DECISION TREE


The key to a good disaster recovery plan is anticipation and contingency planning. You should
assess the different types of disasters that can occur, analyze recovery needs, and provide
detailed steps for recovering from each situation. As you can imagine, the recovery processes
involved can be complex, and the data involved may be irreplaceable as well as critical. Taking
the wrong step during the recovery process can lead to an inordinate amount of lost time.
Adding to this area of concern is the simple fact that when disaster strikes, stress levels go up.
In most environments, stress is managed and activities are kept on the correct path by con-
trolling the decision flow. If you’ve ever watched a disaster movie, you remember the scene
where the professionals trying to manage the event methodically go through their checklists.
One of the goals of disaster planning is to preempt decisions and ensure that they stay on the
path that will achieve the optimal results by providing a detailed set of steps to be performed
according to the circumstances.

These steps are usually captured in decision trees that can vary based on specific conditions.
At a high level, you should attempt to classify the types of potential database disasters, how
they can be detected, and what impact they can have on the availability of your databases.
Then, for each type of disaster, determine the proper order of recovery and the methods for
verifying that recovery is successful. The following sections provide some guidelines for devel-
oping a decision tree for a database disaster recovery plan.

CLASSIFYING DATABASE LOSS SCENARIOS


As a first step, you should classify possible disasters into relevant groups that have similar
recovery paths. Each group can become a scenario that you then associate with a decision tree.
Some scenarios that you can use to help classify database service loss include the following:
• Wide-ranging natural or man-made disaster. Examples include an earthquake, a volcanic
eruption, a flood, a war, or a terrorist attack. This type of disaster could affect more than
one location in a single geographic area, and the corresponding recovery plan must not rely
on being able to recover by using locally held backups, software, or hardware.
• Loss of a single server or location. This could be the result of a fire or a serious power
outage that affects a single office, building, or immediate locality. The recovery plan
must enable another site to replace the lost server or location quickly. The replacement
site could be located close to the original site.
• Data corruption or loss in a single database. This could be the result of a disk failure,
a user error, or even an application error. The recovery plan must enable the missing data
to be recovered quickly, possibly keeping unaffected parts of the database available.
• Loss of performance or service. The database may be healthy but inaccessible or running
slowly due to the failure of one or more database services providing access to the database.
The recovery plan must provide steps for identifying and restarting the failing service.
• Security failure. The database may be operating healthily but with compromised secu-
rity. This could be due to malfunctioning or malicious software, a virus attack, or a
configuration error. The recovery plan must provide steps for rectifying security breaches
and protecting data.

PRIORITIZING DATABASE RECOVERY STEPS


In the event of a reinstall effort that impacts several databases, you need to recover the most
important databases first. The following list provides some guidelines you can follow for estab-
lishing the relative importance of databases and the order in which you should recover them:
• Identify the most critical databases. Classify the importance of databases in terms of
the losses that will be incurred if the databases are unavailable or the data they contain
becomes insecure. Consider any dependencies between databases. For example, if your
client database depends on a small configuration database, you’ll most likely have to
restore the configuration database first, even though the information that this database
contains isn’t economically important.
• Identify critical processes. Related processes can be just as important to restore as the
databases. Often, business applications involve more than one database, even more than
one service, such as SQL Server Integration Services, SQL Server Agent, and Message
Queuing. Some of these processes will be instrumental to the core business functionality
of your applications, whereas others may be less important.
• List recovery steps in the correct order based on business needs. Identify the core business processes that are most important and which must be recovered first. You should consider recovering the databases that support these processes as the first priority. In a large organization, you can also identify processes and their corresponding databases that have secondary or even tertiary importance.
• Establish recovery success criteria. It's important that you have a means to verify the success or failure of each step in the recovery process. Make sure that personnel performing each recovery operation have a written procedure and are required to test the results of the step. The decision tree must include the expected results and the possible actions to take if these results don't occur. You can document these further actions as branches in the tree or references to other parts of the decision tree.

CERTIFICATION READY? Know the sequence of the restore process and when to use the RECOVERY or NORECOVERY options.

DOCUMENTING A RECOVERY DECISION TREE


To be effective, a recovery plan must have sufficient detail and clarity to be used in the event
of data loss. Ensure that the logical flow of the decision trees provides enough detail that the
database administrator can easily understand the steps and execute them under stressful con-
ditions. Because of the elevated risk of error, you don’t want a database administrator to have
to make critical decisions during a time of high stress.
Documenting the recovery strategy for each catastrophic scenario. Each type of disaster
calls for a tailored set of recovery steps. The most effective means of documenting these steps
is to use a decision tree based on a flowchart or matrix. Although you can reference one deci-
sion tree from another, you should document the recovery strategy for each scenario separately.
The decision tree should specify the category of disaster, the likely symptoms (to enable
the administrator to verify the possible cause if it isn’t obvious), and a series of commands
and operations that the administrator should perform. Each command or operation should
include information enabling the administrator to verify the success of the step and specify
what to do as each step succeeds or fails.
If a particular decision tree becomes too complex, consider turning parts of it into subtrees. If
that isn’t possible, the scenario itself may be too complex and need to be broken into smaller
subscenarios.
Practicing and recording recovery times for each step. You should document the expected
time required to execute every major step in the recovery plan. You can usually obtain esti-
mates by rehearsing each step in the plan using realistic volumes of data, extrapolating if nec-
essary. With reasonable estimates about how long the major steps will take to perform, you
can reliably predict how long the overall recovery process will take.
Recovery procedures can include replacement of hardware, so you must factor in the time to
replace hardware. If you don’t have spare hardware, refer to your hardware support service-level
agreement (SLA), which specifies the maximum downtime that your service contract stipulates.
The most effective method for ensuring that your plan works is to rehearse it periodically and
validate it.

BEST PRACTICES FOR MAINTAINING A RECOVERY PLAN


Perhaps the most important challenge with a database disaster recovery and business continuity
plan is keeping it up to date as a live document and making sure all relevant staff members
understand how to use it:
• Disseminate the recovery plan. Ensure that the database disaster recovery plan is
available to everyone who needs it. When multiple copies of the plan are in circulation,
make sure you have one place, such as a source-code control system or network share,
where the original and the latest copy of the plan are kept.
• Periodically rehearse the recovery plan. The best way to ensure a smooth recovery pro-
cess is to properly rehearse the recovery plan. You should make this a high-priority task
and schedule it regularly. Your plan will work much better if staff are thoroughly familiar
with the steps involved.
When rehearsing the plan, you should vary the disaster scenarios to determine how well
your plan works in different situations. Use the rehearsals as an opportunity to revise
and update the plan.
Your rehearsal policy should ensure that you rehearse the plan enough to keep it fresh
and current but not so often that it interferes with normal work requirements.

• Periodically validate the recovery plan. The time to specify the criteria for recovery
success is before a disaster occurs, not after. Steps you should take include the following:
° Verify that a database restore was successful. It isn’t enough to verify that you can
query the database. You must also verify that appropriate users can log on to the server
and access the correct data.
° Verify that the correct (most recent) data was restored. Consider including T-SQL
scripts in the plan that can verify the timeliness of the restored data. This means you
need to verify that the most recent or up-to-date data is present.
° Determine and communicate the extent of any data loss. No matter what you do,
you still may lose some data that was entered or changed during a certain period of
time. This can happen, for example, if you’re using log shipping and you weren’t able
to save the active part of the transaction log before the disaster.
° Validate recovery-success criteria. Have clear criteria that you can apply that test the
resulting databases for correctness. Make sure the recovery plan supports the required
applications. In the event of a catastrophic failure that extends beyond the database—
say, a building fire—you’ll need to integrate your activities with the organization’s
recovery plan.
° Validate the contact list, software locations, and hardware locations. Often the
simplest parts of a recovery plan are the elements that you overlook. Apart from the
technical steps involved in recovering the data, you must ensure that the supporting
information is up to date.
° Conduct periodic checks to validate that people in the contact list still have the same phone or extension number and hold the same position in the organization. You must also ensure that backup hardware and software are located where you expect to find them.
• Revise the recovery plan based on periodic rehearsals and validations. Database recovery plan rehearsals, along with actual disaster recoveries, give you essential feedback about the usefulness and relevance of your recovery processes and documentation. Incorporating that feedback into the recovery plan document is the most effective way of keeping your plan up to date and keeping you prepared.
• Rehearse and revise the recovery plan when the infrastructure changes. An organization's infrastructure doesn't stand still. When you add new databases, applications, hardware, or software, or when you upgrade to new releases of software, it's important to verify that the recovery plan still works and to update it as necessary.

CERTIFICATION READY? Disaster Recovery can take many forms. Read any exam question carefully to identify what situations a Disaster Recovery Plan should address.

In Exercise 11.7, you'll develop a disaster recovery decision tree.

LAB EXERCISE: Perform Exercise 11.7 in your lab manual.

SKILL SUMMARY

In this Lesson, you’ve examined the topic of designing a data-recovery solution for
databases. Throughout, you’ve learned how to go about deciding which steps you should
take to determine what data-recovery technologies to use based on business require-
ments. You've learned how to analyze business requirements and assess alternative techniques and
models to save copies of critical business data for archiving and how to plan for data
archival access.
You’ve learned how to select from different backup formats and determine the number of
devices to be used for backups; how to specify what data to back up; and the frequency,
techniques, types, and recovery models to employ.
Key to the Lesson has been developing an understanding of how to create recovery plans. You
learned the questions to ask, the methods to utilize, and the procedures to follow. You’ve
discovered that in the midst of a seeming catastrophic failure, something as simple as a
well-thought-out decision tree can save you and your organization countless hours of time
and ensure your ability to recover from all but the most egregious of problems.

For the certification examination:


• Know how to perform full, differential, transactional, and filegroup backups. You need to
know the T-SQL syntax and SQL Server Management Studio methods of performing the
various backups. You should also focus on the advantages and disadvantages of the types
of backups.
• Know about the various database recovery models. You need to know when to use the
simple, bulk-logged, or full recovery model and the options, advantages, and disadvan-
tages of each.
• Know how to restore a database. You need to know the T-SQL syntax and Management
Studio methods for restoring databases.
• Know how to recover from a complex crash scenario. You need to know how to recover
from complete crashes of SQL Server, as well as from a crashed or suspect database.
• Know how to design a decision tree. You need to know how to design a disaster recovery
tree and what elements to include.

■ Knowledge Assessment

Case Study
Waves Styles on George
Waves Styles on George is a large fashion and apparel service agency serving as a
wholesaler for approximately 14,000 subagencies and outlets over a broad geographic
area. The company is headquartered in the city of Trevallyn, which also serves as
northern headquarters, with 407 employees. Three branch offices are located in
Devonport (eastern operations), Ravenwood (western), and Meriwether (southern).

Planned Changes
The company wants a complete disaster recovery plan overhaul and a reevaluation of its
backup strategy.

Existing Data Environment


The company currently has six databases: Customer, Contractor, Accounting, Orders,
HumanResources, and Parts. The Contractor database is not written to very often.

Existing Infrastructure
The company has three existing SQL Server 2005 computers running with default
instances, which contain the following databases:
• WGServer1: Accounting and HumanResources
• WGServer2: Customer
• WGServer3: Contractor, Orders, and Parts

Business Requirements
Users need to be able to access their data at any time of the day or night. The Customer
database must not fail when a single hard disk on the server fails. The Customer data-
base is very volatile, with numerous changes daily during business hours of 09:00 to
18:00. Most of the changes occur during the afternoon hours. Very few changes are
made during nonbusiness hours. Business requirements allow for up to one hour of data
loss. Requirements state that no more than six backups should be required to recover
any given database. Following tests of different backup scenarios, it was determined that
full backups were not to be done during business hours and differential backups should
be performed only once during business hours.

Technical Requirements
The existing named instance configuration can’t be changed because it’s mandated by
the disaster recovery plan.
The recovery model for the Orders database must be full recovery.

Multiple Choice

Circle the letter or letters that correspond to the best answer or answers.
Use the information in the previous case study to answer the following questions:
1. You are asked to design the backup schedule for the Customer database. Fill in the
blanks of the following table using these backup types (each selection may be used once,
more than once, or not at all): full, differential, copy, transaction log, incremental.

SCHEDULE                                 BACKUP TYPE

Once per day at 23:00
Twice per day at 12:00 and 19:00
Hourly, during business hours

2. What does the NORECOVERY switch do?


a. There is no such switch.
b. It cleans out (truncates) the log without making a backup.
c. It makes a backup of the log without cleaning it.
d. It loads a backup of a database but leaves the database offline so you can continue
restoring transaction logs.
3. You need to enable more frequent backups of only the volatile data that is stored in the
Orders database. What should you do?
a. Add database log backups of the Orders database.
b. Add full database backups of the Orders database.
c. Add differential backups of the Orders database.
d. Add differential backups created by the Windows Backup Utility.
4. When do you need to use the REPLACE switch?
a. There is no such switch.
b. When you are restoring into a database with data.
c. When you are restoring into a database that is marked read only.
d. When the database you are restoring into has a different name than the originally
backed-up database.
5. Which program would you use to create backup jobs?
a. Transfer Manager
b. Backup Manager
c. Security Manager
d. Management Studio

6. You need to design a method for testing and verifying that future backups of
WGServer2 can be restored and that the databases stored in the backups do not
contain any allocation, structural, or logical integrity problems. Which two actions
should you perform?
a. Restore the backups to another SQL Server computer.
b. Run DBCC CHECKDB on the original databases.
c. Run DBCC CHECKDB on the restored backups.
d. Use the RESTORE VERIFYONLY command.
7. You need to be able to restore the Parts database at any given time, but it is a very large
database with many inserts and updates; a full backup takes nine hours. You implement
the following strategy: You schedule a full backup every week, with differential backups
every night. You set the recovery option to simple to keep the log small, and you sched-
ule transaction log backups every hour. Will this solution work?
a. This solution works very well.
b. This solution will not work because you cannot combine differential backups with
transaction log backups.
c. This solution will not work because you cannot schedule transaction log backups
with full database backups.
d. This solution will not work because you cannot schedule transaction log backups
when you have selected the simple recovery model for a database.
8. You have three filegroups (FilesA, FilesB, FilesC) in the HumanResources database. You
are rotating your filegroup backups so that each filegroup is backed up every third night.
You are also doing transaction log backups. The files in FilesB get corrupted. Which
of these steps would you take to restore the files? (Choose all that apply, and list your
answers in the order the steps should be taken.)
a. Restore the transaction log files that were created after the FilesB backup.
b. Restore the FilesB filegroup.
c. Back up the log with the NO_TRUNCATE switch.
d. Restore the entire HumanResources database.
9. As discussed previously, Waves Styles on George has a database called Customers. You
are performing full backups every night at 23:00 and differential backups at 12:00 and
19:00. On Tuesday at 17:45, a user deletes all the rows of the table. You discover the
error at 21:00. What is the correct way to restore the Customers database?
a. Restore the full backup from Monday night. Restore the 19:00 differential backup
until 17:44.
b. Restore the full backup from Monday night. Restore the differential from Tuesday
until 12:00.
c. Restore the full backup from Monday night. Restore the differential from 12:00.
d. Restore the full backup from Monday night. Restore the differential from 19:00.
10. You need to configure the Orders database to meet the technical requirements. Which
T-SQL statement should you use?
a. ALTER DATABASE Orders SET RECOVERY SIMPLE
b. DBCC CONFIGDB BACKUP TYPE Orders SIMPLE
c. ALTER DATABASE Orders SET RECOVERY FULL
d. ALTER DATABASE Orders SET RECOVERY MODE TO FULL
LESSON 12
Designing a Data-Archiving Solution

LESSON SKILL MATRIX

TECHNOLOGY SKILL EXAM OBJECTIVE


Select archiving techniques based on business requirements. Foundational
Gather requirements that affect archiving. Foundational
Ascertain data movement requirements for archiving. Foundational
Design the format and media for archival data. Foundational
Specify what data to archive. Foundational
Specify the level of granularity of an archive. Foundational
Specify how long to keep the archives. Foundational
Plan for data archival and access. Foundational
Specify the destination for archival data. Foundational
Specify the frequency of archiving. Foundational
Decide if replication is appropriate. Foundational
Establish how to access archived data. Foundational
Design the topology of replication for archiving data. Foundational
Specify the publications and articles to be published. Foundational
Specify the distributor of the publication. Foundational
Specify the subscriber of the publication. Foundational
Design the type of replication for archiving data. Foundational

KEY TERMS
archive: A repository containing historical records that are intended for long-term preservation.
format: The organization of data stored on some form of media. This could be a SQL Server backup format, a simple TXT file containing comma-separated value (CSV) data, or some other form of organization of the data.
media: The physical item used to store data. Tapes are a common form of media, as are individual optical storage items such as CDs and DVDs. The type of media used must match the physical hardware device used for reading from and writing to the media. As an example, an AIT tape cartridge (the media) must only be used in an AIT type tape drive.
replication: A set of technologies for copying and distributing data and database objects from one database to another and then synchronizing between databases to maintain consistency.
topology: The manner in which the components of a system are arranged or interrelated, including adjacency and connectivity.


One of the results of a well-designed and expertly crafted database is that it fills with
data. Over time, the sheer amount of data may begin to overwhelm the database system.
At the same time, not all of the data needs to be instantly available. Trying to keep it all
where it can be obtained immediately may serve the opposite goal and begin to degrade
your database, making accessing records more difficult than necessary.

You’ve probably seen a similar phenomenon in your day-to-day activities. You may have
started by keeping all your financial records, receipts, bank statements, cancelled checks, and
the like readily available in a file drawer or box. After a few years, the volume of material
probably convinced you to toss out some of the material you no longer need. The remainder
of the older stuff—such as five-year-old tax returns—you may have removed from the current
file drawer, inventoried, and placed in a cardboard box in your attic for future reference.
In doing so, you went through all the steps that describe a data-archiving strategy and
solution. And you thought this Lesson was going to be difficult!
In this Lesson, you’ll first review the whys and wherefores of data archiving and make sure
you’re clear on why such a system needs to be part of any database infrastructure design.
Then, you’ll be introduced to the fictional company Yanni HealthCare Network, which you’ll
use throughout this Lesson to illustrate how to visualize data-archiving concepts. Finally,
you’ll go through the basic process of designing a data-archiving solution, including determin-
ing business and regulatory requirements and what data will be archived, selecting a storage
format, developing a data-movement strategy, and designing a replication topology if replica-
tion is used.

■ Deciding to Archive Data?

THE BOTTOM LINE: The principal reason to archive data is performance.

Storing historical data—data that doesn't need to be immediately accessible—online reduces the performance of a database server. Conversely, archiving yesterday's data improves query performance for today's data, decreases the use of disk space, and reduces the maintenance window:
• Improved query performance. If a production database contains historical data that
is never or rarely used, queries take longer to execute because they have to scan the
historical data. Moving the historical data from the production database to another serv-
er makes queries more efficient, and you can still query the archive server if necessary.
• Decreased disk space use. Because historical data frequently uses more disk space than
the active data, one obvious advantage to archiving it is that you can free up disk space
for other purposes. Think back to the earlier example of your financial records: By
archiving your records off-site (in this case, in a cardboard box), you opened space in
your file drawer.
Financially, archiving can save you money. For example, if the historical data is stored on
an expensive disk system, such as a Storage Area Network (SAN), archiving the data will
significantly reduce your storage costs.
• Reduced maintenance time. Removing historical data makes basic maintenance tasks,
such as backup, defragmentation, and reindexing on tables more efficient, and reduces
the time required for these operations. In particular, archiving reduces the time required
for database backup and restore operations.
• Reduced costs. Removing historical data may solve performance problems on some
systems, especially where hardware is barely adequate.

Although there are many advantages to archiving historical data, archiving isn’t a cure-all for
whatever ails your system. Archiving can’t provide a solution for issues such as poorly chosen
indexes, design flaws, improper file placement, and inadequate maintenance and hardware.
Typically, data archiving improves the performance of database servers. However, the results
may sometimes be disappointing. For example, if the amount of data to be archived is rela-
tively small, it may not be worthwhile to archive it, and archiving it may not produce the
desired results in terms of improved performance.
Although archiving data can be a complex process, designing an archiving solution is relatively
simple if you approach the problem in a systematic manner. At a minimum, a data-archive
plan defines both the scope of archiving and the architecture of the archived data. Once you
have the process down, you’ll be able to apply the same steps and procedures to nearly any
situation and come up with a plan that meets your needs.
In creating a data-archive plan, you should take the following steps:
1. Determine business and regulatory requirements.
2. Determine what data will be archived.
3. Select a storage format and media type.
4. Develop a data-movement strategy.
5. Design a topology if replication is used.

Determining Business and Regulatory Requirements

As you’ve seen throughout this textbook, your most important initial consideration in
any design aspect should be to determine any business and regulatory requirements that
will impact your design.

The amount of online data that is required by users depends on an organization’s business.
For example, enterprises in the health care industry have different requirements than organi-
zations in the banking industry. To identify the online data requirements of an organization,
consult with key stakeholders. By working with the stakeholders, you can identify what data
needs to be available and what doesn’t need to be accessible immediately. You can then make
plans to move the latter to backup copy devices or less expensive alternatives.

Case Study: Presenting a Data-Archiving Scenario

The Yanni HealthCare Network serves a total current population of 500,000 patients.
During its 30-year history, it has registered approximately two million persons for whom
it has provided care. Currently, all medical laboratory diagnostic test results have been
digitized and are maintained in an On-Line Transaction Processing (OLTP) database.
The laboratory results database has been growing at a rate of 1.5 percent per month
and contains a large amount of data that is almost never updated and rarely queried.
This historical data has slowed server-maintenance operations such as reindexing and
defragmentation. Because of the large size of the database, running queries is becom-
ing difficult. The final straw was reached when the chief of medicine requested a simple
cross-tab query on three years’ worth of data: The query began on a Friday and wasn’t
completed until Tuesday morning because of the volume of records.
Governmental health regulations require that all laboratory test results be maintained
for 25 years. Clinical caregivers state that they require immediate access to the past five
years’ worth of data for queries and reports. Archived data must be available by the next
day following the submission of the request. All data, as is the case with any medical
data, must be secure and confidentiality maintained. There must be at least two copies
of the archived data stored in different locales. In addition, risk-management personnel,
accountants, and the research staff are insistent that all information be both retained
online and archived. Finally, the organization has sufficient budget to purchase one or
more new servers for storing the archived data.
Throughout this Lesson, you’ll use this scenario as a tool to show how to design a data-
archiving solution.

Business regulations may stipulate the length of time that data must be accessible online. For
example, in many countries, banks are required by law to maintain certain customer data
online for a specific number of years. Health care providers are also subject to laws and regu-
lations related to maintaining confidential information, often for long periods of time. Other
businesses, such as manufacturing and retail sales organizations and government agencies, may
also need to comply with certain regulations. These regulations may influence data-archiving
requirements in varied ways and lead to interesting and creative archive solutions. You must
consider the impact of regulatory requirements when determining what data can be stored
offline and how quickly it must become available online when requested.
Another consideration is how much of the data you need. Users may not need detailed data
after a certain period. In such cases, you can maintain summary tables online and archive the
detailed data to offline storage.
Applying these concepts to the Yanni HealthCare Network scenario, you can see that combined
business and regulatory requirements require you to keep all the laboratory data in some
manner. The level of detail required isn’t clear, and this is the sort of question you should
discuss with the key stakeholders and management at the hospital. In these discussions, you’ll
be expected to listen and provide solutions. You’ll also be expected to explain the impact of
these requirements on the database system and any performance issues you foresee.
You need to consider one other type of requirement in your design: the accessibility of data.
Some data needs to remain online and immediately available. Other data can be removed
from immediate access but may still need to be retrievable within a short or a longer period
of time. In reviewing and creating your archival data plan, you need to consider these factors
as well as the acceptable turnaround time. Once you’ve done that, you can stratify the data
based on relevant time frames. Accessibility requirements and turnaround time also determine
the storage formats and media you can use for archiving data.
Finally, you need to accurately assess the requirements against what the stakeholders want and
what the stakeholders need. Users often demand that historical data be maintained online
because they don’t want to risk losing access to it. However, after the data is archived, they
rarely access the data. This perception of risk to accessibility can lead key stakeholders to
be reluctant to move data offline. One of your tasks as a database administrator must be to
accurately scope what data needs to be archived and to determine the impact on accessibility.
When proposing a data-archive strategy, you should communicate the benefits of archiving
the data and share a plan for ensuring the security and accessibility of the archived data.
Based on the requirements spelled out in the Yanni HealthCare Network scenario, you
need to keep all existing and future data. The users want to be able to access five years of
data online. Governmental regulations require maintenance of data for 25 years, and your
management team wants to retain all data forever.
You review the data and determine that maintaining five years of data is an expensive and
resource-intensive use of assets. You do a study that indicates only 1 percent of all queries
specify data greater than three years old. You bring this information to the attention of the
stakeholders; and after examining the balance between cost, performance, and need, all sides
agree that data three to five years old can be maintained elsewhere, provided the turnaround
time is less than three hours. The result is a structured data-archive plan for accessibility based
on age of data, as shown in Table 12-1.
As you’ll see later in this Lesson, accessibility requirements also influence the structure of
archival data and affect the planning for data-storage formats and media.

Table 12-1
Accessibility requirements
ACCESSIBILITY REQUIREMENT    AGE OF DATA
Immediate access             Data less than 3 years old
3-hour access                Data more than 3 years old but less than 6 years old
24-hour access               Data between 6 and 25 years old
48-hour access               Data more than 25 years old

Determining What Data Will Be Archived

CERTIFICATION READY?
Imagine various business scenarios and how to meet unique needs. For example, you have four
distant warehouses and corporate headquarters needs to query the archived data once a week.
Change the scenario to storing the data at corporate headquarters but analysts only query it
once a year. Keep tweaking the parameters and redesign appropriately until you’re comfortable
with any situation.

Now that you’ve established the relevant business and regulatory requirements that need to be
considered, as well as defined the accessibility requirements, you can turn your attention to
determining what data can be archived. This is also known as identifying the historical data.
As you develop your data-archival plan, clearly identify the data that has been selected for
archiving and the justifications for that selection. You should also describe the criteria you’ve
used to select the data and show how the selection derives from business, regulatory, and
accessibility requirements as well as any other factors that may influence your design.
Several basic tasks are involved in deciding what should be archived:
• Identify historical data. In the Yanni HealthCare Network scenario, the definition of
what is historical was determined by the business and regulatory requirements. All data
must be maintained, but data more than three years old doesn’t need to be maintained
online for updates and queries. That data can be archived.
But what if there aren’t any specific guidelines, or you think it may be possible to tweak
the existing requirements to meet them while changing the expected archiving paradigm?
To do that, you should analyze tables that belong to the core application and identify
data that is never updated and rarely queried. You should then present this assessment
as the justification for your design, and delineate between online and archived data. In
the previous section, you used this approach to convince management to reduce online
data from the past five years to the past three years. A corollary is to establish a sliding
window in time that delimits online data from archival data. In our example, you can
archive data that is more than three years old.
Another way to determine whether data is a good candidate for archiving is to use tools
such as SQL Trace and SQL Profiler to determine whether users have accessed a table or
a set of rows in a table during a given period.
• Determine whether there is a savings in disk-space cost. You should archive data only
if it is beneficial to do so. If a sizable amount of disk space will be recovered by archiving
data, making the savings in disk-space cost significant, then data archiving is justified.
Conversely, it may not be worthwhile to archive data that uses a small amount of disk
space. When estimating the savings in disk-space cost, remember that archiving data results
in smaller backup files, further reducing the use of disk space and other storage media. In
the Yanni HealthCare Network scenario, it’s clearly beneficial to move 25 years’ worth of
data out of the database as initially proposed. The suggestion to move data older than three
years rather than five years to archive should be made only if there will be a genuine sav-
ings; otherwise, accepting the initial requirements would be a reasonable course of action.
• Determine the performance benefits. As you learned earlier, archiving data helps
reduce disk, memory, and CPU usage. You can use the System Monitor tool to deter-
mine how the performance of system resources will improve with archiving. You should
also consider the impact of archiving data on maintenance tasks, such as reindexing,
defragmentation, and backup.
• Establish the archiving interval. You can determine the archiving interval based on
your business needs and the nature of data. For example, if you need to maintain the last
two years’ worth of data online, you can archive data at either monthly or weekly intervals.
If you archive monthly, then 25 months of data (2 years plus 1 month) would exist online
just prior to the monthly archival process.
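
The first bullet above mentions using SQL Trace and SQL Profiler to find data that users no
longer touch. As an alternative, hedged sketch (not part of the scenario itself), the
sys.dm_db_index_usage_stats dynamic management view, available since SQL Server 2005, records
the last read activity against each index. Keep in mind that the view is cleared every time the
instance restarts, so its contents are meaningful only after a long period of uptime.

-- Rough sketch: list tables in the current database with no recorded reads
-- (seeks, scans, or lookups) since the instance last started.
SELECT  t.name AS TableName
FROM    sys.tables AS t
WHERE   NOT EXISTS (SELECT 1
                    FROM   sys.dm_db_index_usage_stats AS u
                    WHERE  u.database_id = DB_ID()
                      AND  u.object_id   = t.object_id
                      AND (u.last_user_seek   IS NOT NULL
                        OR u.last_user_scan   IS NOT NULL
                        OR u.last_user_lookup IS NOT NULL));

Tables that never appear with read activity over a representative period are strong candidates
for the archive.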
CERTIFICATION READY?
Imagine various business scenarios and answer the questions: How long must data be stored?
What table attributes must be maintained? Is it better to denormalize the data? What answers
will users seek? Should you create a cube instead?

STRUCTURING ARCHIVAL DATA

To ensure the smooth movement of data from a production database to the archival media,
you need to structure the data properly. SQL Server supports the use of partitioned,
normalized, denormalized, and summary tables to structure archival data. You’ll examine the
various ways in which you can structure archival data. After that, you’ll cover the factors
you should consider when choosing the structure of archival data.
You can structure archival data by using the following types of tables:
• Partitioned tables. You can use fully partitioned tables to structure archival data.
Partitioned tables were introduced with SQL Server 2005 and are more effective than
the older union-partitioned views for managing very large tables and indexes. Partitioned
tables are also easier to maintain than union-partitioned views. It can be difficult to find an
appropriate check constraint on which the partition can be based with union-partitioned
views, and queries across the views don’t always select the appropriate partition correctly.
You can place partitioned tables and their indexes in separate filegroups. In addition, you
can automatically repartition data among tables. You can also switch tables in and out
of a partition. After a table is switched out of a partition, you can move the table and its
index to the archival destination.

X REF
For more information about the table-partitioning features in SQL Server, see the article on
the MSDN Web site at http://msdn.microsoft.com/en-us/library/ms345146.aspx.

• Normalized tables. Archiving related data together keeps the historical context of the data.
Normalized tables can be used to structure archival data and maintain historical content. If
you use normalized tables, a key consideration is to make sure the tables can accommodate
changes in lookup values or related tables. One way to accomplish this is to add date-range
validity to the normalized tables. You can then specify the date ranges for valid lookup
values. Note that archiving relational data often requires the archiving of additional data
involved with foreign keys. Normalized data requires these key relationships.
• Denormalized tables. If you’re unable to archive all related data together, you can use
denormalized tables to preserve the historical context of the data. Denormalized tables
store actual values rather than references to the current data. Therefore, these tables are
most useful for optimizing queries that involve complex joins.
In addition to denormalized tables, you can use indexed views to denormalize data.
Because denormalized tables persist data physically, you can retrieve data from them
more quickly than from indexed views. However, denormalized tables require additional
disk space. Denormalized tables also must be periodically rebuilt and aren’t automatically
updated like indexed views.
• Summary tables. You may not need to maintain detailed data after a certain period. In
such cases, you can keep summary tables online and archive the detailed data and store
it offline. For example, you may have a database that stores monthly sales revenue by
product. It may be possible to remove and archive the detailed data while maintaining
only the monthly summaries.
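
The partitioned-tables bullet above notes that a table can be switched out of a partition and
then moved to the archive destination. The following is a minimal, hedged sketch of that
technique; the table, columns, boundary dates, and filegroup are illustrative examples, not
part of the Yanni HealthCare design.

-- Illustrative only: partition a results table by year, then switch the oldest
-- year out to an empty staging table that can be moved to the archive server.
CREATE PARTITION FUNCTION pfResultYear (datetime)
    AS RANGE RIGHT FOR VALUES ('2004-01-01', '2005-01-01', '2006-01-01');

CREATE PARTITION SCHEME psResultYear
    AS PARTITION pfResultYear ALL TO ([PRIMARY]);

CREATE TABLE dbo.TestResult
(
    ResultID   bigint       NOT NULL,
    ResultDate datetime     NOT NULL,
    ResultData varchar(500) NULL,
    CONSTRAINT PK_TestResult PRIMARY KEY (ResultID, ResultDate)
) ON psResultYear (ResultDate);

-- The staging table must have the same structure and sit on the same filegroup
-- as the partition being switched out.
CREATE TABLE dbo.TestResult_Staging
(
    ResultID   bigint       NOT NULL,
    ResultDate datetime     NOT NULL,
    ResultData varchar(500) NULL,
    CONSTRAINT PK_TestResult_Staging PRIMARY KEY (ResultID, ResultDate)
) ON [PRIMARY];

-- Metadata-only operation: all rows dated before 2004-01-01 (partition 1)
-- move to the staging table instantly, without physically copying data.
ALTER TABLE dbo.TestResult
    SWITCH PARTITION 1 TO dbo.TestResult_Staging;

From the staging table, the rows can then be bulk-copied, backed up, or shipped to the archive
destination and dropped from the production database.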

CHOOSING WHICH STRUCTURE TO USE


When choosing the structure of archival data, consider the following factors:
• Data accessibility. If a new application will be developed to access the archived data,
denormalized tables are a good choice. Alternatively, you can maintain only some of the
detailed information and discard the remaining data. If the current application must be
able to use the same mechanism to access both online and archived data, the two types
of data must be structured identically.
In addition, accessibility requirements influence the structure of archival data because
they determine the following:
° The constraints that limit the ability to update archived data.
° The amount of space that can be used for storing archived data.
° The time frame for accepting updates to archived data. This, in turn, may depend on
regulatory requirements.
° The rules for archiving.
• Storage costs. When developing a structure for archival data, you must be mindful of the
hardware, media, and often, software costs for storing the data. As a rule of thumb, stor-
ing archival data online has been more expensive than storing the data offline. This may be
changing. If you choose to use denormalized tables for your archived data, additional disk
space will be needed with concomitant increases in storage costs. You can reduce hardware
costs by keeping only the summary data online and storing detailed data offline. The hardware
used for creating offline storage media then becomes a factor along with the per unit cost of the
offline media as well as any special software for writing to and reading from the media.
CERTIFICATION READY?
Imagine various business scenarios and list the considerations a plan must include. Is there a
networking element? When are data no longer needed?

Offline storage can involve hidden costs such as transportation or retrieval costs charged by
offsite couriers. In addition, you need to ensure that the security of the data that is stored
offline isn’t compromised. It’s also possible that in the event of a major disaster, access to
archived data may not be as smooth as hoped. In the aftermath of the 9/11 attacks on New York
and Washington, DC, as well as Hurricane Katrina in New Orleans and the Gulf of Mexico, offsite
data-storage centers were overwhelmed by demands for archival and replicated current data.
If the structure or format of archival data differs from the source online data, additional
expenses may be incurred for developing applications and reports to access the archived data.

■ Selecting a Storage Media Type and Format

THE BOTTOM LINE
Backups and archiving of historical data require the selection and use of storage media and
storage format types. This section provides information on how to make appropriate selections
of these topics.

Storage format refers to the logical structure of data on a type of physical media that is used
to store the archived data.
Examples of storage format could be SQL backup files, CSV data extracts, or TXT files from
BCP. The physical media can be disk, tape, optical storage such as CDs and DVDs, or other less
frequently used types of media. Depending on your requirements, you can store archived data on
tape or on low-cost magnetic or optical media. With disk costs falling, there is an increasing move-
ment toward storage on disk. You can also store archived data in a separate database on the server
that hosts the production database. This would be an example of storing archival data in SQL
Server database format. Alternatively, you can use a dedicated server to store archived data. The
choices for storage media and format are influenced by the structure and accessibility requirements
of the archival data. Each type of media and format has different characteristics with respect to
cost, accessibility, shelf life, reliability/durability, security, and changing technology:
• Cost. If you need to archive a large volume of data, the cost of storage can be significant.
As a rule of thumb, for large amounts of data, tapes are cheaper than disks or optical
media; but disk prices keep falling, and in some instances disks may make better sense.
• Accessibility. If the archived data must be quickly accessible, you should normally
use some form of online storage. Traditional offline media such as tape or CDs can be
accessed quickly if you invest in some form of robotic media library. This could be a
considerable extra-hardware expense.
• Shelf life. Shelf life refers to the lifetime of the storage media. Many types of digital stor-
age media, such as DVDs or LTO tapes, are relatively new, and their shelf life may not be
easily determined. If you opt to keep the archive media in your control, make sure you
follow vendor recommendations for storing your archived data in proper environmental
conditions (for example, store tapes in a cool, dry place). If you use an external vendor for
storage, check from time to time to make sure the data is held properly. Also consider the
shelf life of any required hardware. If data was archived to reel-to-reel tape 20 years ago,
can you get a working reel-to-reel 9-track tape drive that can read the archival tapes?
• Reliability and durability. You must take into account the relative durability and
reliability of the media used. Some types of media are more sensitive to handling and usage
than others and may degrade faster. For example, tapes tend to deteriorate more easily than
disks or optical media. It’s worth assessing the differences in order to ensure that archived
data is readable from archival media. If the ability to retrieve data from old media is criti-
cally important, you should have multiple redundant copies of the data and periodically
perform read testing to ensure that the media can be read. Having a redundant copy then
allows you to make a fresh copy should it be determined that any one copy is unreadable.
• Security. There are many ways to provide for encryption. However, the administrative
overhead and third-party products involved vary. For example, there are third-party
products for encrypting data on both tapes and disks. In addition to encrypting archival
data, you should ensure that the data is stored in a secured location.
• Changing technology. Once you have your plan in place, you need to be ready to adapt
it to changes in technology, because there may be shifts in relative costs of items. For
example, the authors of this book have gone from storage of archive material on floppy
disk to hard drive or zip drive to CD to DVD or SANs. It’s likely that such technological
innovation will continue unabated, and you’ll need to be prepared to revise your plans.
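
Earlier in this section, BCP extract files were mentioned as one possible storage format. As a
rough, hedged illustration (the server, database, table, and file path are hypothetical), a
native-format extract of pre-2004 rows could be written to a file that is then copied to the
archive server or burned to optical media:

bcp "SELECT * FROM LabResults.dbo.TestResult WHERE ResultDate < '20040101'" ^
    queryout D:\Archive\TestResult_pre2004.dat -n -S PRODSRV -T

The -n switch writes SQL Server native format, -S names the source server, and -T uses a
trusted (Windows) connection; the same file can later be reloaded with bcp in if the data must
be restored to a database.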
Now, apply the previous discussion to the Yanni HealthCare Network scenario and design a
table that shows the archival design as well as summarizes the storage formats and media (see
Table 12-2). Note that the business and regulatory requirements have led you to conclude
that tape isn’t the best approach for data greater than six years old. Although tape may be an
appropriate option in most cases, the need to conduct data studies using the historical data
leads to a trade-off between the more expensive but more manageable archive server with data
on disk, as opposed to tape.

Table 12-2
Data-archiving strategy: Storage format/media accessibility

REQUIREMENT        AGE OF DATA                   STORAGE FORMAT/MEDIA
Immediate access   Data less than 3 years old    OLTP server (the production server)

3-hour access      Data more than 3 years old,   For access to archived data within 3 hours, you can use an
                   but less than 6 years old     archive server. The storage capacity of the archive server
                                                 should typically be the same as or higher than that of the
                                                 main server. Archival servers usually need fewer system
                                                 resources and may have lower processing capabilities than
                                                 the main server hosting the production database.

24-hour access     Data between 6 and            For access to archived data within 24 hours, you can use
                   25 years old                  storage media such as tapes. However, in this case, because
                                                 you want to be able to access this data to meet other
                                                 business and regulatory requirements, you must use an
                                                 archive server.

48-hour access     Data more than                For access within 48 hours to archived data that is likely
                   25 years old                  to be only rarely accessed, you can use storage media such
                                                 as tapes. Although tapes are slower than hard disks and
                                                 optical media, they’re relatively inexpensive, though
                                                 potentially less reliable. Be aware that older tapes and
                                                 older tape drives can be very problematic: can you read a
                                                 tape that is 25 years old or older today?

Developing a Data-Movement Strategy

CERTIFICATION READY?
Imagine various business scenarios and justify a storage solution best meeting the trade-offs
between cost, longevity, access speed, reliability, security, and building requirements. Do you
need a remote hot site?

A data-movement strategy describes how archival data is moved from the server that hosts the
production database to the destination storage format. When developing the strategy, you
should consider the frequency of data movement and its effect on network traffic. If data is
to be moved to an archive server, you must determine whether to use direct or indirect data
transfer based on the type of connection between the production and the archive servers.
Finally, you must consider the security risks involved in moving data and define measures to
safeguard the data during movement:

• Specify the frequency of data movement. You can move archival data from the server
that hosts the production database to the destination storage format periodically or on
an ad hoc basis. Best practice is to move data on a specified schedule rather than ad hoc
because then it can be easily automated and tested, resulting in fewer errors.
• Minimize the impact of data movement on production activities. Schedule data
movement as you would other routine maintenance activities, for times when the user
load is low. Sometimes it’s better to have a schedule of moving small datasets frequently
rather than a large dataset infrequently.
Also consider how data movement will affect reporting. For example, suppose our
fictional hospital ran its summary of laboratory reports for billing purposes on a
monthly basis. You would need to schedule data-movement activities so they didn’t affect
the generation of reports.
Also make sure archival data is moved from the production server to the destination
storage location in an optimal manner. For example, you can decide to transfer archival
data to another server with good disk performance, and then move it to an archive server,
rather than moving it directly from the production server. This technique minimizes the
amount of time and resources devoted to archiving by the production server.
• Choose direct or indirect transfer. If data is to be moved to an archive server, the type
of connection between the production and archive servers can affect the way you trans-
fer the data. When there is a direct connection, tools such as SQL Server Integration
Services (SSIS) and replication can be used for directly transferring data. You can also
use queries to transfer data between linked servers. If the connection between the two
servers isn’t direct and doesn’t allow you to use SQL Server tools, you need to devise
a different method of moving the data to the archive server. For indirect data transfer,
you can use tools such as SSIS and the Bulk Copy Program (BCP) and create extract
files that can then be copied over a network connection using any one of many possible
tools. As you saw in the previous Lesson, you can also use the backup command packaged
with SQL Server.
• Ensure the security of data during movement. All storage formats and network
connections involved in data movement must be secure. Data stored on portable media,
such as a tape or flash drive, is more vulnerable to theft and security attacks than data
stored on an archive server in a secure data center. You should always use encrypted data
transfer and encrypted files when dealing with sensitive data, which is virtually all data
in an enterprise.
• Prescribe steps for data verification. You don’t want to move archival data to the
destination storage location, delete it from the source, and then discover that it wasn’t
successfully copied. The data-movement strategy must include steps for data verification.
For example, if tapes are used to store archival data, you must retrieve the data to verify
that it has been correctly copied. Similarly, you can verify data that has been copied to
disks or optical media by viewing the data.
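
To make the direct-transfer and verification points above concrete, the following is a hedged
sketch rather than the scenario’s actual design. It assumes a linked server named ARCHIVESRV,
an archive database named LabArchive, and a dbo.TestResult table — all invented names — and it
deletes rows from production only after the copied row count has been verified.

-- Illustrative sketch: copy rows older than three years to a linked archive
-- server, verify the row count, and only then delete them from production.
DECLARE @cutoff datetime = DATEADD(year, -3, GETDATE());
DECLARE @toMove int, @copied int;

SELECT @toMove = COUNT(*)
FROM   dbo.TestResult
WHERE  ResultDate < @cutoff;

INSERT INTO ARCHIVESRV.LabArchive.dbo.TestResult (ResultID, ResultDate, ResultData)
SELECT ResultID, ResultDate, ResultData
FROM   dbo.TestResult
WHERE  ResultDate < @cutoff;

SET @copied = @@ROWCOUNT;

-- Remove the rows from the production table only after the copy is verified.
IF @copied = @toMove
    DELETE FROM dbo.TestResult
    WHERE ResultDate < @cutoff;

In practice you would run this during a low-load maintenance window and, for very large date
ranges, break the work into smaller batches so that locking and transaction-log growth stay
manageable.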

■ Designing a Replication Topology

THE BOTTOM LINE
All forms of data replication involve multiple database servers. A fundamental step in
developing a replication process is to design the topology of the involved servers and how the
data will be replicated across this topology.

The sole purpose of replication is to copy data between servers. Several good reasons exist for
having a system that does this:
• If your company has multiple locations, you may need to move the data closer to the
people who are using it.
• If multiple people want to work on the same data at the same time, replication is a good
way of giving them that access.
• Replication can separate the functions of reading from writing data. This is especially
true in OLTP environments where reading data can place a load on the system.
• Some sites may have different methods and rules for handling data (perhaps the site is a
sister or child company). Replication can be used to give these sites the freedom of set-
ting their own rules for dealing with data.
• Mobile sales users can install SQL Server on a laptop, where they may keep a copy of
an inventory database. These users can keep their local copy of the database current by
connecting to the network and replicating.
You’ll probably be able to come up with even more reasons to use replication in your enterprise.
Another application of replication involves archival data. At first, this may seem counterintui-
tive. Replication is normally associated with synchronizing live (active) databases. Archival
data could be viewed as historical and static and not a good candidate for replication, but that
isn’t always the case.
There may be business reasons or regulatory requirements that mandate the existence of more
than one copy of the same archival data. Securities and Exchange Commission regulations
17a-3 and 17a-4, for example, stipulate that an exact duplicate of archived electronic records
must be stored separately from the original. Because archival data must be updated with
newly archived records, even if infrequently, replication is an easy way of ensuring that the
data remains synchronized between different locales and copies of archived databases.
Replication also addresses both regulatory compliance and disaster-recovery needs. A key
question you’ll have to assess is whether replication of the archival data is correct for your situation.
You can also use replication to provide higher availability of archival data. With replication,
if one archival site isn’t available for any reason, then another site can be used to service the
request. When a dataset is archived, several options are available for replicating this data for
high availability, as you’ll see shortly.
Replication is most appropriate for distributing copies of data among databases. It’s also the tool
of choice for supporting multisite updates and mobile users who are occasionally connected.
Although replication provides limited support for data transformation, it’s best suited for circum-
stances where the structure of the data on the publisher and on the subscribers is the same.

UNDERSTANDING AND ADMINISTERING REPLICATION TOPOLOGIES


A replication topology defines the relationship between servers and copies of data as well as
clarifies the logic that determines how data flows between servers. In general, the replication
topology you design depends on many factors, including the following:
• Whether replicated data needs to be updated, and by whom
• Your data distribution needs regarding consistency, autonomy, and latency
• The replication environment, including business users, technical infrastructure, network
and security, and data characteristics
• The types of replication and replication options
• The replication topologies and how they align with the types of replication

Figure 12-1
SQL Server can publish, distribute, or subscribe to publications in replication. (The figure
shows a publication, made up of articles, flowing from a Publisher, which contains the
original copy of the data, through a Distributor, which collects changes from publishers, to a
Subscriber, which receives a copy of the data.)

UNDERSTANDING THE PUBLISHER/SUBSCRIBER METAPHOR


Microsoft uses the publisher/subscriber metaphor to make replication easier to understand
and implement. It works a lot like a newspaper or magazine publisher. The newspaper has
information that people around the city want to read; the newspaper publishes this data and
has news carriers distribute it to the people who have subscribed. As shown in Figure 12-1,
SQL Server replication works much the same in that it too has a publisher, distributor, and
subscriber:
• Publisher. In SQL Server terminology, a publisher is a server with the original copy of
the data that others need—much like the newspaper publisher has the original data that
needs to be printed and distributed. The data is organized into publications, which con-
sist of smaller datasets called articles.
• Distributor. Newspapers need carriers to distribute the newspaper to the people who
have subscribed, and SQL Server needs special servers called distributors to store and
forward initial snapshots of publications and distribute them to subscribers. Distributors
can also store transactions that need to be sent to subscribers.

TAKE NOTE*
A SQL Server can be any combination of these three roles.

• Subscriber. A subscriber is a server with a database that receives copies of publications
from a publisher. A subscriber is akin to the people who need to read the news and
therefore subscribe to the newspaper.
The analogy goes even further: All the information isn’t lumped together in a giant scroll and
dropped on the doorstep—it’s broken into publications and articles so that it’s easier to find
the information you want to read. SQL Server replication follows suit:
• Article. An article is data from a table that needs to be replicated. You probably don’t need
to replicate all the data from the table, so you don’t have to. Articles can be horizontally
partitioned, which means not all records in the table are published; and they can be
vertically partitioned, which means that not all columns need to be published.
• Publication. A publication is a collection of articles and is the basis for subscriptions. A
subscription can consist of a single article or multiple articles, but you must subscribe to a
publication as opposed to a single article.

CERTIFICATION READY?
Imagine various business scenarios. What replication strategy works best for each? When can
other technologies (mirroring, copying to another location, distributed transactions, backup
and restore) be better solutions than replication?

Now that you know the three roles that SQL Servers can play in replication and that data is
replicated as articles that are stored in publications, you need to learn the types of
replication.

UNDERSTANDING REPLICATION TYPES

It’s important to control how publications are distributed to subscribers. If the newspaper
company doesn’t control distribution, for example, many people may not get the paper when
they need it, or other people may get the paper for free. In SQL Server, you need to control
distribution of publications for similar reasons, so that the data gets to the subscribers when
it’s needed.
There are three basic types of replication: transactional, snapshot, and merge (all of which are
discussed in the following sections). Consider the following key factors when choosing a rep-
lication type:
• Autonomy. The amount of independence your subscribers have over the data they
receive. Some servers may need a read-only copy of the data, whereas others may need to
be able to make changes to the data they receive.
• Latency. How long a subscriber can go without getting a fresh copy of data from the
server. Some servers may be able to go for weeks without getting new data from the pub-
lisher, whereas other instances may require a very short wait time.
• Consistency. The most popular form of replication may be transactional replication,
where transactions are read from the transaction log of the publisher, moved through
the distributor, and applied to the database on the subscriber. This is where transactional
consistency comes in. Some subscribers may need all the transactions in the same order
they were applied to the server, whereas other subscribers may need only some of the
transactions.
Once you’ve considered these factors, you’re ready to choose the replication type that will
work best for you.

Introducing Transactional Replication

In transactional replication, individual transactions are replicated. Transactional replica-


tion is preferable when data modifications are to be replicated immediately, or when
transactions have to be atomic (either all or none applied). A primary key is required,
because each transaction is replicated individually. As described next, there are three key
types of transactional replication: standard, with updating subscribers, and peer-to-peer.

USING STANDARD TRANSACTIONAL REPLICATION


All data modifications made to a SQL Server database are considered transactions, regardless of
whether they have an explicit BEGIN TRAN command and corresponding COMMIT TRAN
(if the BEGIN . . . COMMIT isn’t there, SQL Server assumes it). All of these transactions are
stored in a transaction log that is associated with the database. With transactional replication,
each of the transactions in the transaction log can be replicated. The transactions are marked
for replication in the log (because not all transactions may be replicated), and then they’re
copied to the distributor, where they’re stored in the distribution database until they’re copied
to the subscribers via the Distribution Agent.
The only drawback is that subscribers to a transactional publication must treat the data as read
only, meaning that users can’t make changes to the data they receive. Think of it as being like
a subscription to a newspaper—if you see a typo in an ad in the paper, you can’t change it
with a pen and expect the change to do any good. No one else can see your change, and you’ll
get the same typo in the paper the next day. Transactional replication has high consistency,
low autonomy, and middle-of-the-road latency. For these reasons, transactional replication is
usually used in server-to-server environments.

USING TRANSACTIONAL WITH UPDATING SUBSCRIBERS


This type of replication is almost exactly like transactional replication, with one major difference:
The subscribers can modify the data they receive.
The two types of updatable subscriptions are immediate and queued. Immediate updating
means what it says—the data is updated immediately. For this sort of update to occur at the
subscriber, the publisher and subscriber must be connected. In queued updating, the publisher
and subscriber don’t have to be connected to update data at the subscribers, and updates can
be made while either is offline.

When data is updated at a subscriber, the update is sent to the publisher when next connected.
The publisher then sends the data to other subscribers as they become available.
Because the updates are sent asynchronously to the publisher, the publisher or another
subscriber may have updated the same data at the same time, resulting in conflicts when
updates are applied. All conflicts are detected and resolved through a conflict-resolution
policy defined when the publication is created.

USING PEER-TO-PEER TRANSACTIONAL REPLICATION


Transactional replication also uses peer-to-peer replication to support updating data by
subscribers. This method is designed for applications (as opposed to another SQL Server)
that may modify the data at any of the databases participating in replication. An example is
an online shopping application that modifies the contents of a database with each order or
purchase (for example, updating the mailing lists, changing the inventory, and so on).
A key difference between standard (read-only) transactional replication or transactional repli-
cation with updating subscriptions and peer-to-peer transactional replication is that the latter
isn’t hierarchical. Instead, all the nodes are peers, and each node publishes and subscribes to
the same schema and data. Hence, each node contains identical schema and data.

INTRODUCING SNAPSHOT REPLICATION


Whereas transactional replication copies only data changes to subscribers, snapshot replication
copies entire publications to subscribers every time it replicates. In essence, it takes a snapshot
of the data and sends it to the subscriber every time it replicates. This is useful for servers that
need a read-only copy of the data and don’t require updates very often—they could wait for
days or even weeks for updated data.
A good example of where to use this type of replication is in a department store chain that
has a catalog database. The headquarters keeps and publishes the master copy of the database
in which changes are made. The subscribers can wait for updates to this catalog for a few days
if necessary.

TAKE NOTE*
Snapshot replication is principally used to establish the initial set of data and database
objects for merge and transactional publications.

The data on the subscriber should be treated as read only here as well because all the data
will be overwritten each time replication occurs. This type of replication is said to have high
latency, high autonomy, and high consistency.
Snapshots are created using the Snapshot Agent and stored in a snapshot folder on the publisher.
Snapshot Agent runs under SQL Server Agent at the distributor and can be administered
through Management Studio.

INTRODUCING MERGE REPLICATION


This is by far the most complex type of replication to work with, but it’s also the most flexible.
Merge replication allows changes to be made to the data at the publisher as well as at all the
subscribers. These changes are then replicated to all other subscribers until your systems reach
convergence, the blessed state at which all your servers have the same data. Because of its
flexibility, merge replication is typically used in server-to-client environments.
The biggest problem with merge replication is known as a conflict. This problem occurs
when more than one user modifies the same record on their copy of the database at the same
time. For example, suppose a user in Florida modifies record 25 in a table at the same time
that a user in New York modifies record 25 in their copy of the table. A conflict will occur
on record 25 when replication takes place, because the same record has been modified in two
different places; SQL Server has two values from which to choose. Conflict-resolution priority
is specified through the New Subscription Wizard or in Management Studio. You can also
use Management Studio’s Replication Conflict Viewer tool to examine and resolve conflicts.
Careful attention must be given to how conflicts are resolved.
Merge replication works by adding triggers and system tables to the databases on all the servers
involved in the replication process. When a change is made at any of the servers, the trigger fires
and stores the modified data in one of the new system tables, where it resides until replication
occurs. This type of replication has the highest autonomy, highest latency, and lowest transactional
consistency.

MANAGING A REPLICATION TOPOLOGY


After you’ve configured replication for your archival data, you should establish a replication
topology and include the following activities in your design:
Develop and test a backup-and-restore strategy. As discussed in Lesson 11, all databases
should be backed up on a regular basis, and the ability to restore those backups should be
tested periodically. Replicated databases are no different. The following databases should be
backed up regularly:
• Publication database
• Distribution database
• Subscription databases
• msdb database and master database at the publisher, distributor, and all subscribers
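
As a quick, hedged illustration of the list above (paths are placeholders, and Lesson 11 covers
backup strategy in detail), the replication-related and system databases can be backed up with
ordinary BACKUP DATABASE statements on each server in the topology:

-- Illustrative only: back up the distribution database and the system databases
-- on a distributor; the same pattern applies on publishers and subscribers.
BACKUP DATABASE distribution TO DISK = N'E:\Backups\distribution.bak' WITH INIT;
BACKUP DATABASE msdb         TO DISK = N'E:\Backups\msdb.bak'         WITH INIT;
BACKUP DATABASE master       TO DISK = N'E:\Backups\master.bak'       WITH INIT;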
Script the replication topology. All replication components in a topology should be scripted
as part of a disaster-recovery plan. The scripts can also be used to automate repetitive tasks.
A script contains the T-SQL system stored procedures necessary to implement the replication
component(s), such as a publication or subscription. Scripts can be stored with backup files
to be used in case a replication topology must be reconfigured.

TAKE NOTE*
Scripts can be created in a wizard (such as the New Publication Wizard) or in SQL Server
Management Studio after you create a component. You can view, modify, and run the script
using SQL Server Management Studio or sqlcmd.

A component should be rescripted if any property changes are made. If you use custom stored
procedures with transactional replication, a copy of each procedure should be stored with the
scripts and the copy should be updated if the procedure changes.
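
To give a rough sense of what such a script contains, here is a minimal, hedged sketch of the
core system stored procedure calls for a transactional publication with a push subscription.
The database, publication, article, and server names are invented for illustration, and a
wizard-generated script includes many more options (snapshot agent setup, agent schedules,
security settings, and so on).

-- Illustrative sketch: core calls behind a scripted transactional publication.
USE LabArchive;
GO
-- Enable the database for publishing.
EXEC sp_replicationdboption
     @dbname = N'LabArchive', @optname = N'publish', @value = N'true';

-- Create the publication and add one table as an article.
EXEC sp_addpublication
     @publication = N'LabArchivePub', @status = N'active';
EXEC sp_addarticle
     @publication   = N'LabArchivePub',
     @article       = N'TestResult',
     @source_owner  = N'dbo',
     @source_object = N'TestResult';

-- Subscribe a second archive server with a push subscription.
EXEC sp_addsubscription
     @publication       = N'LabArchivePub',
     @subscriber        = N'ARCHIVESRV2',
     @destination_db    = N'LabArchive',
     @subscription_type = N'Push';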
Understand replication performance. Before replication is configured, you need to review
and understand the factors that affect replication performance and how to manage them:
• Server and network hardware
• Database design
• Distributor configuration
• Publication design and options
• Filter design and use
• Subscription options
• Snapshot options
• Agent parameters
• Maintenance
Establish a performance baseline. After replication is configured, you should use Replication
Monitor and System Monitor to determine how replication behaves with your typical work-
load and topology. Determine typical values for the following five dimensions of replication
performance:
• Latency. The amount of time it takes for a data change to be propagated between nodes
in a replication topology
• Throughput. The amount of replication activity (measured in commands delivered over
a period of time) a system can sustain over time
• Concurrency. The number of replication processes that can operate on a system
simultaneously

• Duration of synchronization. How long it takes a given synchronization to complete


• Resource consumption. Hardware and network resources used as a result of replication
processing

TAKE NOTE*
Latency and throughput are most important in transactional replication because transactional
replication generally requires low latency and high throughput. Concurrency and duration of
synchronization are most relevant to merge replication, because systems built on merge
replication often have a large number of subscribers, and a publisher can have a significant
number of concurrent synchronizations with these subscribers.

Create thresholds and alerts. Replication Monitor allows you to set a number of thresholds
related to status and performance. It’s recommended that you set the appropriate thresholds for
your topology; if a threshold is reached, a warning is displayed, and, optionally, an alert can be
sent to an e-mail account, a pager, or another device. Note that SQL Server replication pro-
vides a number of predefined alerts that respond to replication agent actions. Administrators
can use these alerts to stay informed about the state of the replication topology.
Monitor the replication topology. Monitoring a replication topology is an important part of
deploying replication. Because replication activity is distributed, it’s essential to track activity
and status across all computers involved in replication. Replication Monitor is the most impor-
tant tool for monitoring replication, allowing you to monitor the overall health of a replication
topology. T-SQL and Replication Management Objects (RMO) provide interfaces for moni-
toring replication. System Monitor can also be useful for monitoring replication performance.
Validate data periodically. Validation isn’t required by replication, but it’s recommended that
you run validation periodically for transactional replication and merge replication. Validation
lets you verify that data at the subscriber matches data at the publisher. Successful validation
indicates that at a point in time, all changes from the publisher have been replicated to the
subscriber (and from the subscriber to the publisher, if updates are supported at the subscriber),
and the two databases are in sync.

CERTIFICATION READY?
Imagine various business scenarios. Under what conditions do merge, snapshot, and transactional
replication make sense? For example, what if you have unreliable cross-country network
connectivity? What if local host users keep changing the archived data? What if the host
servers keep crashing?

Adjust publication and distribution retention periods if necessary. Transactional replication
and merge replication use retention periods to determine, respectively, how long transactions
are stored in the distribution database, and how frequently a subscription must synchronize.
You should monitor your topology to determine whether the current settings require adjustment.
For example, in the case of merge replication, the publication retention period (which defaults
to 14 days) determines how long metadata is stored in system tables. If subscriptions always
synchronize within five days, consider adjusting the setting to a lower number to reduce the
amount of metadata and possibly provide better performance.
Understand how to modify publications if application requirements change. After you’ve
created a publication, it may be necessary to add or drop articles or change publication and
article properties. Most changes are allowed after a publication is created, but in some cases,
it’s necessary to generate a new snapshot for a publication and/or reinitialize subscriptions to
the publication.

DESIGNING A REPLICATION STRATEGY


In addition to deciding whether replication is appropriate to your archival plan, you should
also perform the following tasks:
• Determine the requirements for replication with archival data. The archival data
to be replicated may be centralized at one location or dispersed at multiple locations.
There may be requirements for multiple copies of the archival data, or specific types of
access may be required. You should also estimate the maximum allowable latency for
distributing an update that is made on a publisher to its subscribers. Keep in mind that
the network and communication infrastructure must be able to support the maximum
latency. You may be able to reduce network usage by placing the distributor closer to the
subscribers. For example, if you place the publisher in one location and all the subscribers in
another location, you can place the distributor near the subscribers to reduce the long-
distance network traffic across slower speed WAN links.
• Select a replication type. The degree of consistency between the database on a publisher and
the replicated database on a subscriber depends on whether you use snapshot, transactional,
or merge replication. For example, if you use snapshot replication, the subscriber receives
periodic snapshots of the publication stored on the publisher. As a result, the database on
the subscriber is inconsistent with the database on the publisher. Conversely, transactional
replication distributes the publication to the subscriber with low latency. Therefore, the
database on the subscriber is more consistent with the database on the publisher.
• Create a replication topology diagram. A replication topology diagram helps you
understand the flow of data between replication partners. If a diagram of the existing
replication topology isn’t available, you must create a diagram and update it each time you
change the topology. The diagram should depict all the servers involved in replication and
the databases they host, the role of the servers, and the direction of data flow between
the servers.
For each database, identify the direction of replication. If a table exists in several databases,
ensure that the table isn’t replicated multiple times through different paths. To identify
the fastest path between replication partners, consider the speed of the network connection
and its usage level.
• Determine the distributors. Determine the placement of a distributor with respect to
the corresponding publisher. You can place the distributor and the publisher on separate
servers, or you can place them on the same server.
• Determine subscribers. Based on the data requirements of each subscriber, you can
identify the publications it requires. In addition, you must determine whether the
subscriber should be allowed to modify the published data and return the modified data
to the publisher. In general, this isn’t a concern with archival data, because the subscriber
is the archive server, which should never modify the original data on its own.
• Choose either push or pull subscription. If you configure the distribution agent on
the distributor, the replication process is called push subscription. Conversely, if you
configure the distribution agent on a subscriber, the replication process is called pull
subscription. When determining whether to use push or pull subscription, you should
consider the number of subscribers and the memory and CPU requirements of each
subscriber. Typically, push subscription is used to minimize the resource utilization on the
subscribers. If you want to offload the distribution agent overhead from the distributor to
the subscribers, pull subscription is a better choice.
• Determine the security requirements. When determining the security requirements of
your replication topology, make sure to identify security accounts for replication agents
and the File Transfer Protocol (FTP) access rights for replication.

LAB EXERCISE
Perform Exercise 12.1 in your lab manual.

In Exercise 12.1, you’ll learn how to apply the replication principles you’ve learned about in
this Lesson to the Yanni HealthCare Network scenario.

ACCESSING ARCHIVED DATA


Now that you’ve designed your archive plan, decided what data will be archived, as well as
where and how the data will be archived, how do you access that data when you need it?
Typically, you make an application archive-aware by inserting a small row in the active table
keyed to the archived row. When a user requests this archive row, a process is set in place to
deal with the request. This may involve a simple query to another table, a DVD library, a
message sent to a computer-room attendant asking her to mount a tape, or a message sent
back to the original user asking if they really need these data and explaining any time delays.
If the archived data is still in the database but has just been moved to an alternate location,
then getting to the data will be almost transparent to the users. On the other extreme, when
the archived data is stored on tapes that have to be retrieved from storage and mounted man-
ually, the end user may decide that the delay is such that they don’t need the data after all.
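
When the archive sits in another database or on a linked archive server, one common way to keep
that access nearly transparent is a view that unions the current and archived rows. The sketch
below is hypothetical: ARCHIVESRV, LabArchive, and dbo.TestResult are invented names, and it
assumes the online and archived tables share the same structure.

-- Illustrative sketch: expose online and archived lab results as one object.
CREATE VIEW dbo.TestResultAll
AS
SELECT ResultID, ResultDate, ResultData
FROM   dbo.TestResult                           -- online rows on the production server
UNION ALL
SELECT ResultID, ResultDate, ResultData
FROM   ARCHIVESRV.LabArchive.dbo.TestResult;    -- archived rows on the archive server
GO

For queries on recent dates to avoid touching the archive server at all, each member table
needs a CHECK constraint on the date column, which turns this into a distributed partitioned
view that the optimizer can prune.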

SKILL SUMMARY

In this Lesson, you’ve examined the topic of archiving data as it relates to database infrastructure.
You learned the role of business, regulation, and accessibility requirements and how they
shape the ultimate design of the data-archiving plan. You’ve learned how to identify data for
archiving and how to plan for data-archive access.
Throughout the Lesson, you’ve seen how to select from different storage formats and develop
a data-movement strategy. Additionally, you’ve become familiar with the role of replication in
archiving data and how to design a replication topology.
Most important, in this Lesson, as in the rest of the book, you’ve learned that database infrastruc-
ture design is a complex series of interactions involving myriad variables. You’ve also learned that
this seemingly overwhelming and difficult process is manageable. All it requires is a basic
understanding of SQL Server, an ability to grasp the elements, and a systematic approach to the
challenges they present. In the final analysis, designing a database infrastructure (or any element
of it) requires that you understand your constraints and requirements and then follow a careful
process to achieve your goal: a well-designed SQL Server database server infrastructure.
For the certification examination:
• Understand the different types of replication. Know how replication works and when the
different types are best used. Be sure to understand latency, autonomy, and consistency.
• Understand business and regulatory requirements. Know what the different business and
regulatory requirements are and how they affect your archival storage needs.
• Understand the different components of a replication topology. Be sure to understand
what publishers, distributors, subscribers, articles, and publications are. You also need to
know how they interact in a replication topology.
• Understand the advantages and disadvantages of different storage media and formats. To
effectively select from a number of options, you need to know the different storage types,
the pros and cons of each, and when they’re most appropriate. For example, tape and optical
media are low-cost, long-term storage choices for data that is never queried, but they are
not a good option for data that needs to be accessed in a very short period of time.

■ Knowledge Assessment

Case Study
Developing an Archive Plan
Thylacine Savings & Loan Association is a large financial institution serving
approximately 1.6 million customers over a broad geographic area. The company is
headquartered in the city of Trevallyn, which also serves as northern headquarters,
with 407 employees. Three branch offices are located in Stratford (Eastern operations),
Belleville (Western), and Rock Hill (Southern).
The company currently has a 3 TB OLTP database that tracks more than 2 billion
transactions each year. The main database for all transactions and operations is located
in Trevallyn. Regional databases contain deposit/withdrawal information only and the
headquarters database is updated daily from the regional offices.
Thylacine’s departmental database servers are dispersed throughout the headquarters
location. The company is currently experiencing a 4 percent annual growth and plans
to expand into four new markets at the rate of one new market every two years. The
database is growing at a rate of 6 percent per year and will exceed available hard-disk space
in the future. Additionally server capacity is overloaded, resulting in poor performance
and long delays. A large portion of the database data is historical information.
You’ve been asked to develop a data-archiving plan for their ATM transactions. Once an ATM
transaction has been recorded, it becomes read only. In the event of an error, a correcting
transaction is entered at a later date. The company wants to maintain only the current
month’s data in the online database and to archive the remainder to read-only media.
Government regulations require that the company maintain the previous seven years’ worth
of records of all ATM transactions and that the data be available within 24 hours.

Multiple Choice
Circle the letter or letters that correspond to the best answer or answers.
Use the information in the previous case study to answer the following questions:
1. Fill in the following table (you may need to modify it) to show the online and archived
data-accessibility requirements. Create time divisions, and classify the data based on
those divisions. Ensure that the data classification meets the query requirements.

DATA SOURCE    ACCESSIBILITY REQUIREMENT    STORAGE FORMAT


Online
Archived
Offline

2. Fill in the following table, summarizing your proposed data-movement schedule.

DATA MOVEMENT    FREQUENCY

3. Which of the following business requirements should be considered when designing an
archival data strategy as called for by Thylacine Savings & Loan Association? (Choose
all that apply.)
a. Cost
b. Government/Industry regulations
c. Accessibility requirements
d. Granularity
4. Which data structure is appropriate if you wish to maintain the historical context of the
archival data, but you cannot archive all the related data together?
a. Partitioned tables
b. Normalized tables
c. Denormalized tables
d. Summary tables

5. If the requirements for the case study were to 1) maintain 36 months of data online
for immediate access for queries and updates and 2) maintain a total of 7 years of data
to meet accounting and reporting requirements, which of the following is the most
appropriate storage format plan?
a. Place the current 36 months’ worth of data on an OLTP database server and the
previous 4 years’ worth of data on an archive server.
b. Place all the data on the OLTP server, and use partitioning to separate the data
between the current 36 months and the remaining 48 months.
c. Place the current 36 months of data on an OLTP server and the remainder on tape.
d. Use summary tables to reduce the load on the OLTP server, and store all detailed
data on an archive server.
6. The data-movement strategy should contain which of the following steps? (Choose all
that apply.)
a. Verification that data has been copied to the destination storage format
b. Means to ensure the security of data during movement
c. Specification of the frequency of data movement
d. Scheduling of data movement to minimize impact on the production server
7. Which of the following roles can a single server have in a replication topology?
a. Distributor
b. Publisher
c. Subscriber
8. Enhancements to SQL Server 2005 that simplify administration of a replication
topology include which of the following? (Choose all that apply.)
a. Schema changes can be automatically sent to subscribers without using special stored
procedures.
b. Specific tables of a database can be replicated, not necessarily the whole database.
c. A number of wizards in Management Studio make it much simpler to set up the
replication topology, once it’s designed.
d. All of the above.
9. You are a database administrator for a small company in Tasmania. You maintain a
Sales database that contains a SalesTransactions table. Requirements are that 36 months
of this table data must be stored online in the Sales database and that older data must
be sent to an archive database. Which of the following is the best way to structure the
SalesTransactions table?
a. Partitioned view
b. Table partitioning
c. Denormalization
d. Summary tables
10. Refer to the previous question. Which archival frequency would you use?
a. Daily
b. Monthly
c. Quarterly
d. Annually
Glossary

active directory (AD): The operating system's directory service that contains references to all objects on the network. Examples include printers, fax machines, user names, user passwords, domains, organizational units, computers, etc.
archive: A repository containing historical records that are intended for long-term preservation.
assembly: A managed application module that contains class metadata and managed code as an object in SQL Server. By referencing an assembly, common language runtime (CLR) functions, CLR stored procedures, CLR triggers, user-defined aggregates, and user-defined types can be created in SQL Server.
asymmetric key: In cryptology, one key, mathematically related to a second key, is used to encrypt data while the other decrypts the data.
audit: An independent verification of truth.
budgetary constraint: Limits placed on your ability to invest as much as you might wish in an infrastructure improvement project.
business continuity plan (BCP): A policy that defines how an enterprise will maintain normal day-to-day operations in the event of business disruption or crisis.
camelCase: A method or standard for naming objects. With camelCase, all characters are lowercased except the first letter of component words other than the first word. An example of camelCase would be: customerAddress.
capacity: A measure of the ability to store, manipulate, and report information collected for the enterprise. Excess capacity suggests too much investment in infrastructure or a declining business need.
certificate: A digital document (electronic file) provided by a trusted authority to give assurance of a person's identity; certificates verify that a given public key belongs to a stipulated individual or organization.
common language runtime (CLR): A key component of the .NET technology provided by Microsoft that handles the actual execution of program code written in any one of many .NET languages.
constraint: A property assigned to a table column that prevents certain types of invalid data values from being placed in the column. For example, a UNIQUE or PRIMARY KEY constraint prevents you from inserting a value that is a duplicate of an existing value, a CHECK constraint prevents you from inserting a value that does not match a specified condition, and NOT NULL prevents you from leaving the column empty (NULL) and requires the insertion of some value.
convention: A set of agreed, stipulated, or generally accepted norms or criteria, often taking the form of a custom.
cryptology: The study or practice of both cryptography (enciphering and deciphering) and cryptanalysis (breaking or cracking a code system or individual messages).
data control language (DCL): A set of SQL commands that manipulate the permissions that may or may not be set for one or more objects.
data definition language (DDL): A subset of T-SQL commands which create, alter, and delete structural objects such as tables, users, and indexes in SQL Server.
data manipulation language (DML): A subset of T-SQL commands which manipulate data within objects in SQL Server. These are the regular T-SQL commands such as INSERT, UPDATE, and DELETE.
database: A collection of information, tables, and other objects organized and presented to serve a specific purpose, such as searching, sorting, and recombining data. Databases are stored in files.
database mirroring: A technology for continuously copying all data in a database from one server to another so that, in the event that the principal server fails, the secondary server can take over the processing of transactions using its copy of the database.
decision tree: A technique for determining the overall risk associated with a series of related risks; that is, it's possible that certain risks will only appear as a result of actions taken in managing other risks.
deploying: Migrating and stabilizing your database servers in the consolidated environment.
developing: Designing a database migration plan for the consolidated environment, creating a solution, and testing the pilot.
disaster recovery plan (DRP): A policy that defines how people and resources will be protected in the case of a natural or man-made disaster and how the organization will recover from the calamity.
encryption key: A seed value used in an algorithm to keep sensitive information confidential by changing data into an unreadable form.
envisioning: Gathering information to analyze a dispersed environment and identifying potential consolidation problems.
execution context: Execution context is represented by a login token and one or more user tokens (one user token for each database assigned). Authenticators and permissions control ultimate access.
extent: A unit of space allocated to an object. A unit of data input and output; data is stored or retrieved from disk as an extent (64 kilobytes).
failover: A switch between the active and standby duplicated systems that occurs automatically without manual intervention. Sometimes known as switchover.
filegroup: A named collection of one or more data files that forms a single unit of data allocation or for administration of a database.
format: The organization of data stored on some form of media. This could be a SQL Server backup format, a simple TXT file containing comma-separated value (CSV) data, or some other form of organization of the data.
high availability: The continuous operation of systems. For a system to be available, all components, including application and database servers, storage devices, and the end-to-end network, need to provide uninterrupted service.
horizon: A forecasting target. A horizon too far distant may result in capacity or other changes that don't prove needed; a horizon too near may result in investments that don't meet tomorrow's needs.
index: In a relational database, a database object that provides fast access to data in the rows of a table, based on key values. Indexes can also enforce uniqueness on the rows in a table. SQL Server supports clustered and nonclustered indexes. The primary key of a table is automatically indexed. In full-text search, a full-text index stores information about significant words and their location within a given column.
instance: A separate and isolated copy of SQL Server running on a server. Application service providers can support multiple businesses and their database needs while guaranteeing one business cannot see the other's data.
log shipping: A technology for high availability that is based on the normal backup and restore procedures that exist with SQL Server. In this environment, transaction-log backups are made on the principal server and then copied to the secondary server.
media: The physical item used to store data. Tapes are a common form of media, as are individual optical storage items such as CDs and DVDs. The type of media used must match the physical hardware device used for reading from and writing to the media. As an example, an AIT tape cartridge (the media) must only be used in an AIT-type tape drive.
media retention: A period of time, such as a year, a month, or a week, for which any backup media is not altered and is kept in the state in which it was created. After this retention period the media is allowed to be reused for another new backup.
merge replication: A method of replication which transfers data from one database to one or more other databases. Data can be changed in more than one location. This may cause conflicts to arise.
method: A specific means of action to accomplish a stipulated goal or objective.
mirror database: The passive or secondary database in a mirroring configuration. Also known as the secondary database.
object: An allocated region of storage; an object is named. If the database structure has a name, it's an object. Examples include database, table, attribute, index, view, stored procedure, trigger, and so on.
organizational unit (OU): An object within Active Directory that may contain other objects such as other organizational units (OUs), users, groups, and computers.
page: A unit of data storage. Eight pages comprise an extent.
PascalCase: A method or standard for naming objects. With PascalCase, all characters are lowercased except the first letter of each component word. An example of PascalCase would be: CustomerAddress.
permission: An access right to an object controlled by GRANT, REVOKE, and DENY data control language commands.
planning: Evaluating the data you gathered in the previous phase and creating a specification to consolidate SQL Server instances.
policies: A set of written guidelines providing direction on how to process any number of issues; e.g., a corporate password policy.
principal database: The active database in a mirroring configuration.
principal server: A machine that, during normal operating conditions, provides the services that a service such as SQL Server offers.
quorum: The majority of servers in a mirroring configuration. A quorum of two servers determines which database is the principal server. In a normal situation, the principal database and the witness form a quorum that keeps this primary server functioning as the primary database in a mirroring configuration.
recovery model: A database option that specifies how the write-ahead transaction log records events; the options are simple, bulk logged, and full. These settings influence your protection against data loss.
regulatory requirements: A set of compliance directions from an external organization. This could be a governmental agency (e.g., the regulator of the Sarbanes-Oxley Act or HIPAA) or your corporate headquarters.
replication: A set of technologies for copying and distributing data and database objects from one database to another and then synchronizing between databases to maintain consistency.
role: A SQL Server security account that is a collection of other security accounts that can be treated as a single unit when managing permissions. A role can contain SQL Server logins, other roles, and Windows logins or groups.
schema: Each schema is a distinct namespace that exists independently of the database user who created it; a schema is a container of objects. A schema can be owned by any user, and its ownership is transferable.
scope: A division of SQL Server's security architecture (principals, permissions, and securables) that places securables into server-scope, database-scope, and schema-scope divisions.
secondary database: The passive or secondary database in a mirroring configuration. Also known as the mirror database.
security measures: The steps taken to assure data integrity.
security policy: The written guidelines to be followed by all employees of the enterprise to protect data and resources from unintended consequences. A security policy, for example, should exist guiding all users on how to protect their network password.
services: Processes that run in the background of the operating system; analogous to daemons in Unix.
single point-of-failure: A vulnerability whose failure leads to a collapse of the whole.
snapshot replication: A method of replication that involves database snapshots. This form of replication is not a high availability solution.
standard: A standard establishes uniform engineering or technical criteria, processes, and practices, usually in a formal, written manner.
symmetric key: In cryptology, a single key is used to both encrypt and decrypt data.
table: A two-dimensional object, which consists of rows and columns, that stores data about an entity modeled in a relational database.
topology: The manner in which the components of a system are arranged or interrelated, including adjacency and connectivity.
transaction replication: A method of replication which transfers transactions from one database to one or more other databases. Changes to data are not allowed on the receiving database(s).
view: An object defined by a SELECT statement that permits seeing one or more columns from one or more base tables. With the exception of instantiated views (indexed views), views themselves do not store data.
witness server: An optional third server used in some mirroring configurations to initiate the automatic failover within seconds of the principal server failing.
