
technical opinion

DOI: 10.1145/1400181.1400213


Which Data Warehouse Architecture is Best?

BY THILINI ARIYACHANDRA AND HUGH J. WATSON

Over the past 15 years, companies have spent billions of dollars on data marts and warehouses. Despite this experience, there is an important design decision that still causes heated discussion: Which data warehouse architecture is best?

While many consultants and vendors claim that a particular architecture is best, there has been surprisingly little rigorous, empirical research on the topic. The literature tends to describe the architectures, provide case study examples, or present survey data about the popularity of the various architectures. The lack of empirical research on the topic motivated our study.

For the research, in addition to reviewing the data warehousing literature, we formed a group of 20 experts to help identify the architectures to study and the success metrics to use. Bill Inmon and Ralph Kimball, leading authorities in the field and advocates of the two major competing architectures (i.e., hub and spoke and bus, respectively), were among the experts who participated.

We ultimately investigated five architectures: independent data marts, bus, hub and spoke, centralized (no dependent data marts), and federated. While other architectures (e.g., hybrid) are mentioned in the literature, they tend to be variations on these five.

For many organizations, independent data marts are the initial efforts to provide a repository of decision support data. These marts are typically independent of other data stores and serve specific, localized needs, such as providing data for a particular application or business unit. The data is stored in a data model that best supports how the data is used (e.g., an OLAP cube).

The bus architecture has data marts that support various business processes, such as orders, deliveries, or customer calls. The first mart is built for a single business process using dimensions and measures that are used with other marts (i.e., conformed dimensions). Additional marts are developed using these conformed dimensions, which results in logically integrated marts and an enterprise view of the data. There is no normalized, relational data in this architecture; it is entirely dimensional.

The hub and spoke architecture begins with an enterprise-level analysis of data requirements. Attention is focused on building a scalable and maintainable infrastructure. Using the enterprise view of the data, the architecture is developed in an iterative manner, subject area by subject area. In this architecture, atomic-level data is maintained in the warehouse in third normal form. Dependent data marts are created that source data from the warehouse, thus maintaining a “single version of the truth.” The dependent data marts may be developed for departmental, functional area, or specialized purposes (e.g., data mining) and may have normalized, denormalized, or summarized dimensional data structures depending on user needs.

The centralized architecture is similar to the hub and spoke except that there are no dependent data marts. The warehouse contains atomic-level data, some summarized data, and logical dimensional views of the data. This architecture is a logical rather than a physical implementation of the hub and spoke architecture.

The federated architecture is advocated when there is a fragmented decision support data environment and there is a need to integrate at least some of the data. This is often the case when there are mergers, acquisitions, and company reorganizations. The federated architecture leaves existing decision support structures (e.g., operational systems, data marts) in place. The data is either logically or physically integrated using shared keys, global metadata, distributed queries, or other methods.

The literature and expert interviews identified two major categories of success metrics. Product measures are associated with information and system quality, impacts on individual users, and impacts on the organization. Project measures relate to the time and cost of implementing the architecture.

The Survey and Findings

We developed a Web-based survey that asked about the data warehouse in the respondent’s company, the architecture that was implemented, the success of the architecture, the respondent’s company, and the respondent; 454 respondents provided completed questionnaires.[a]

The respondents were relatively evenly distributed over data warehouse managers, data warehouse staff members, IS managers, and independent consultants/system integrators. The latter were asked to complete the survey with a particular client in mind.

The companies participating in the survey ranged from small (i.e., less than $10M in revenues) to large (i.e., in excess of $10B). Most of the companies are located in the U.S. (60%) and represent a variety of industries, with financial services (15%) providing the most responses.

The hub and spoke is the most prevalent architecture (39%), followed by the bus architecture (26%), centralized (17%), independent data marts (12%), and federated (4%). The most common platform for hosting the data warehouses is Oracle (41%), followed by Microsoft (19%) and IBM (18%).

Most of the data warehouses support either several business units (38%) or the entire company (36%). Fewer than 12% of the warehouses support a single functional area or subunit. However, the domain or scope of the warehouse varies with the architecture. The hub and spoke and centralized architectures have the broadest domain and are company-wide in over 40% of the organizations. The bus architecture is enterprise-wide in about 30% of the companies, followed by the federated (26%) and independent data marts (18%) architectures.

We computed mean product success measures for the various architectures. The independent data mart architecture scored lowest on all measures. Next lowest on all measures was the federated architecture. What was most interesting was the similarity of the success scores for the bus, hub and spoke, and centralized architectures. No statistically significant differences (MANOVA was used) were found among these three architectures on any of the product success metrics. All three provide similar, consistently high scores on all of the product success metrics (generally in the mid 5s on the 1-7 scale).

The survey instrument asked respondents to indicate the average amount of time required to implement the first subject area or business process in the warehouse. It took just under nine months for the independent data marts, bus, and centralized architectures. The next longest time was required by the federated architecture, with the hub and spoke architecture taking the most time, at 11.5 months.

The hub and spoke had the highest average initial rollout cost of all the architectures, at close to $2.5M. It was also the most costly architecture to maintain, at an average cost of $1.24M.

Conclusion

Our findings help explain why there is both agreement and disagreement over which architecture is best. The study findings show conclusively that independent data marts are the weakest solution in terms of information quality, system quality, individual impacts, and organizational impacts. This is consistent with conventional wisdom. Though not as weak, the federated architecture tended to score relatively low on the success metrics. This is also not surprising. A federated architecture must “make do” with an existing decision support infrastructure and to some extent has to live with its weaknesses.

The most important finding is how similar the bus, hub and spoke, and centralized architectures scored on the product success metrics. It also helps explain why these competing architectures have survived over time: they can be equally successful.

The similarity of the product success of the bus, hub and spoke, and centralized architectures may not be too surprising. Over time, each approach has incorporated strengths from the others. For example, the hub and spoke architecture typically includes dimensional data marts, which are fundamental to the bus architecture. Advocates of all architectures now recognize the importance of rolling out an initial version quickly in order to realize early “wins” or financial “lift” and maintain management support.

There were differences in terms of development time and cost. Because of the up-front planning, large organizational domain, and additional components (e.g., dependent data marts), the hub and spoke architecture takes the longest time and is the most costly to initially develop. The other architectures tend to be similar in terms of development time and cost.

Overall, we found that the major data warehouse architectures can deliver good information quality, system quality, individual impacts, and organizational impacts. The study did not find a clear “winner” in the “data warehouse architecture wars” because there is not one. The product success metrics are very similar for the bus, hub and spoke, and centralized architectures. Companies can select an architecture based on other relevant factors, such as the availability of resources, the urgency of the need for the warehouse, management’s strategic view of the warehouse, the organizational domain served, compatibility with existing systems and technologies, the recommendations of consultants, and others.

[a] The full research report is available at http://www.terry.uga.edu/~hwatson/DW_Architecture_Report.pdf

Thilini Ariyachandra (tariyacha@yahoo.com) is an assistant professor of MIS in the Williams College of Business at Xavier University, Cincinnati, OH.

Hugh J. Watson (hwatson@terry.uga.edu) is a professor of MIS in the Terry College of Business at the University of Georgia, Athens.

© 2008 ACM 0001-0782/08/1000 $5.00

COMMUNICATIONS OF THE ACM | OCTOBER 2008 | VOL. 51 | NO. 10
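
As an illustration of the conformed dimensions that the article describes for the bus architecture, the following minimal Python sketch shows two marts (orders and deliveries) that share one customer dimension, so their measures can be combined by a common attribute. All table names, keys, and values are hypothetical and chosen only for illustration; none of them come from the study.

```python
from collections import defaultdict

# Hypothetical conformed dimension: one customer table shared by every mart.
CUSTOMER_DIM = {
    1: {"name": "Acme", "region": "East"},
    2: {"name": "Globex", "region": "West"},
    3: {"name": "Initech", "region": "East"},
}

# Two hypothetical marts, each built for a single business process.
ORDERS_FACT = [
    {"customer_key": 1, "order_amount": 100.0},
    {"customer_key": 2, "order_amount": 250.0},
    {"customer_key": 3, "order_amount": 50.0},
]
DELIVERIES_FACT = [
    {"customer_key": 1, "units_delivered": 4},
    {"customer_key": 2, "units_delivered": 10},
    {"customer_key": 3, "units_delivered": 1},
]


def total_by_region(fact_rows, measure):
    """Sum one measure of a fact table, grouped by the shared region attribute."""
    totals = defaultdict(float)
    for row in fact_rows:
        region = CUSTOMER_DIM[row["customer_key"]]["region"]
        totals[region] += row[measure]
    return dict(totals)


def drill_across():
    """Combine measures from both marts; possible only because they share the dimension."""
    orders = total_by_region(ORDERS_FACT, "order_amount")
    deliveries = total_by_region(DELIVERIES_FACT, "units_delivered")
    regions = sorted(set(orders) | set(deliveries))
    return {
        region: {
            "order_amount": orders.get(region, 0.0),
            "units_delivered": deliveries.get(region, 0.0),
        }
        for region in regions
    }
```

Because both fact tables reference the same customer keys and region attribute, `drill_across()` can line up order amounts and delivered units by region without any mapping step. Independent data marts with separately defined customer tables would not have this property.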

