Data Warehouse Terminology
Bitmapped Indexing
A family of advanced indexing algorithms that optimize RDBMS
query performance by maximizing the search capability of the
index per unit of memory and per CPU instruction. Properly
implemented, bitmapped indices can eliminate many of the table
scans otherwise needed in query and join processing.
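As a rough illustration of the idea, the sketch below builds one bit
vector per distinct column value and answers a two-condition filter
with a single bitwise AND instead of scanning the rows. The helper
names (build_bitmap_index, matching_rows) and the sample rows are
hypothetical; real RDBMS implementations also compress the bitmaps and
differ in many details.

```python
from collections import defaultdict

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bit vector."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id  # set the bit for this row
    return dict(index)

def matching_rows(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical sample table.
rows = [
    {"region": "east", "status": "open"},
    {"region": "west", "status": "closed"},
    {"region": "east", "status": "closed"},
    {"region": "east", "status": "open"},
]
region_idx = build_bitmap_index(rows, "region")
status_idx = build_bitmap_index(rows, "status")

# WHERE region = 'east' AND status = 'open' becomes one bitwise AND.
hits = region_idx["east"] & status_idx["open"]
print(matching_rows(hits))  # [0, 3]
```

Only the two bitmaps are touched to find the matching row ids; the
table itself is read only for the rows that qualify.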
Business Model
An object-oriented model that captures the kinds of things in
a business or a business area and the relationships associated
with those things (and sometimes associated business rules,
too). Note that a business model exists independently of any
data or database. A data warehouse should be designed to match
the underlying business models or else no tools will fully
unlock the data in the warehouse.
Corporate Data
All the databases of the company. This includes legacy
systems, old and new transaction systems, general business
systems, client/server databases, data warehouses and data
marts.
Data Dictionary
A collection of Metadata. Many kinds of products in the data
warehouse arena use a data dictionary, including database
management systems, modeling tools, middleware, and query
tools.
Data Mart
A subset of a data warehouse that focuses on one or more
specific subject areas. The data usually is extracted from the
data warehouse and further denormalized and indexed to support
intense usage by targeted customers.
Data Mining
Techniques for finding patterns and trends in large data sets.
See also Data Visualization.
Data Model
The road map to the data in a database. This includes the
source of tables and columns, the meanings of the keys, and
the relationships between the tables.
Data Visualization
Techniques for turning data into information by using the high
capacity of the human brain to visually recognize patterns and
trends. There are many specialized techniques designed to make
particular kinds of visualization easy.
Data Warehouse
A database built to support information access. Typically a
data warehouse is fed from one or more transaction databases.
The data needs to be cleaned and restructured to support
queries, summaries, and analyses.
Decision Support
Data access targeted to provide the information needed by
business decision makers. Example areas include pricing,
purchasing, human resources, management, and manufacturing.
Decision Support System (DSS)
Database(s), warehouse(s), and/or mart(s) in conjunction with
reporting and analysis software optimized to support timely
business decision making.
Joint Application Development (JAD)
JAD is a process originally developed for designing a
computer-based system. It brings together business area people
(users) and IT (Information Technology) professionals in a
highly focused workshop. The advantages of JAD include a
dramatic shortening of the time it takes to complete a
project. It also improves the quality of the final product by
focusing on the up-front portion of the development lifecycle,
thus reducing the likelihood of errors that are expensive to
correct later on.
Metadata
Literally, "data about data." More usefully,
descriptions of what kind of information is stored where, how
it is encoded, how it is related to other information, where
it comes from, and how it is related to your business. A hot
topic right now is standardizing metadata across products from
different vendors.
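A minimal sketch of what one metadata entry might record, following
the items listed above (where the data is stored, how it is encoded,
what it is related to, where it comes from, and what it means to the
business). The ColumnMetadata class, its field names, and the sample
entry are all hypothetical and do not follow any vendor's format.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMetadata:
    name: str
    stored_in: str         # what kind of information is stored where
    encoding: str          # how it is encoded
    source: str            # where it comes from
    business_meaning: str  # how it relates to the business
    related_to: list = field(default_factory=list)  # related columns

# Hypothetical data dictionary with a single entry.
data_dictionary = {
    "order_total": ColumnMetadata(
        name="order_total",
        stored_in="warehouse.fact_orders.order_total",
        encoding="integer, US cents",
        source="order-entry transaction system",
        business_meaning="Invoiced amount of one order",
        related_to=["customer_id"],
    ),
}

def describe(column):
    """Return a one-line, human-readable description of a column."""
    meta = data_dictionary[column]
    return (f"{meta.name}: {meta.business_meaning} "
            f"({meta.encoding}), from {meta.source}")

summary = describe("order_total")
print(summary)
```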
Methodology
The steps followed to guarantee repeatability of success. A
good methodology is built on top of real-world experience.
Middleware
Hardware and software used to connect clients and servers, to
move and structure data, and/or to pre-summarize data for use
by queries and reports.
Multidimensional Database (MDD)
A DBMS optimized to support multidimensional data. The best
systems support standard RDBMS functionality and add
high-bandwidth support for multidimensional data and queries.
Users who need to slice and dice their data heavily may
appreciate a multidimensional database.
Object Oriented Analysis (OOA)
A process of abstracting a problem by identifying the kinds of
entities in the problem domain, the is-a relationships between
the kinds (kinds are known as classes, is-a relationships as
subtype/supertype, subclass/superclass, or less commonly,
specialization/generalization), and the has-a relationships
between the classes. Also identified for each class are its
attributes (e.g., class Person has attribute Hair Color) and
its conventional relationships to other classes (e.g., class
Order has a relationship Customer to class Customer).
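The Person and Order examples can be sketched as classes. This is
only an illustration: making Customer a subclass of Person is an
assumption added here to show an is-a relationship, and the field
names other than hair color are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    hair_color: str  # attribute "Hair Color" from the glossary example

@dataclass
class Customer(Person):  # is-a: Customer as a subtype of Person (assumed)
    customer_id: int = 0

@dataclass
class Order:
    order_id: int
    customer: Customer  # relationship "Customer" to class Customer

alice = Customer(name="Alice", hair_color="brown", customer_id=7)
order = Order(order_id=1001, customer=alice)

print(isinstance(order.customer, Person))  # True
print(order.customer.hair_color)           # brown
```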
Object Oriented Design (OOD)
A design methodology that uses Object Oriented Analysis to
promote object reusability and interface clarity.
OLAP
An acronym for On Line Analytical Processing.
On Line Analytical Processing (OLAP)
A common use of a data warehouse that involves real-time
access and analysis of multidimensional data such as
order information.
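To make "multidimensional" concrete, the sketch below treats each
order as a fact with three dimensions and shows two typical analytical
operations: rolling up to fewer dimensions and slicing on one
dimension value. The data, the dimension names, and the helpers
roll_up and slice_by are hypothetical; OLAP products provide these
operations over far larger data sets.

```python
from collections import defaultdict

# Hypothetical order facts: (region, product, quarter, amount)
orders = [
    ("east", "widget", "Q1", 100),
    ("east", "gadget", "Q1", 150),
    ("west", "widget", "Q1", 200),
    ("east", "widget", "Q2", 120),
    ("west", "gadget", "Q2", 80),
]
DIMENSIONS = ("region", "product", "quarter")

def roll_up(facts, keep):
    """Sum amounts, grouping by the dimensions named in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[p] for p in positions)
        totals[key] += fact[-1]  # the amount is the last element
    return dict(totals)

def slice_by(facts, dimension, value):
    """Keep only the facts whose `dimension` equals `value`."""
    pos = DIMENSIONS.index(dimension)
    return [f for f in facts if f[pos] == value]

by_region = roll_up(orders, ["region"])
q1_by_product = roll_up(slice_by(orders, "quarter", "Q1"), ["product"])

print(by_region)      # {('east',): 370, ('west',): 280}
print(q1_by_product)  # {('widget',): 300, ('gadget',): 150}
```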
Performance
Data, summaries, and analyses need to be delivered in a timely
fashion. Performance is often a key issue with data
warehouses: the right answer isn't worth much if it shows up
after the decisions have been made.
Query
A specific atomic request for information from a database.
Rapid Application Development (RAD)
Part of a methodology that specifies incremental development
with constant feedback from the customers. The point is to
keep projects focused on delivering value and to keep clear
and open lines of communication. English is not adequate for
specification of computer systems, even small ones. RAD
overcomes the limitations of language by minimizing the time
between concept and implementation.
Relational On-Line Analytic Processing (ROLAP)
OLAP based on conventional relational databases rather than
specialized multidimensional databases.
Replication
A standard technique in data warehousing. For performance and
reliability, several independent copies of each data warehouse
are often created. Even data marts can require replication
on multiple servers to meet performance and reliability
standards.
Replicator
Any of a class of products that support replication. Often
these tools use special load and unload database APIs and have
scripting languages that support automation.
Report
A repeatable, formatted, nonatomic request for information
from a database. Usually a report formats and combines several
related queries.
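The difference between a query and a report can be sketched with
Python's built-in sqlite3 module: each SELECT below is one atomic
query, and the report function combines and formats several of them
so it can be rerun at any time. The orders table, its contents, and
the order_report function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 100), ("west", 200), ("east", 50)],
)

def order_report(conn):
    """Combine several atomic queries into one formatted report."""
    count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
    by_region = conn.execute(
        "SELECT region, SUM(amount) FROM orders "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    lines = [f"Orders: {count}", f"Total: {total}"]
    lines += [f"  {region}: {amount}" for region, amount in by_region]
    return "\n".join(lines)

report_text = order_report(conn)
print(report_text)
```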
Reporting Strategy
A top-down collection of methodology, products, plans, and
teams that ensures business people can get information
reliably, accurately, and understandably. It includes choosing
tools matched to the organization's particular needs and
existing infrastructure, capturing the business models used by
the business people, finding source data, and integrating all
of the above into a data warehouse and/or data marts as needed.
Security
The right data for the right person. Note that a business
analyst may need access to summaries of data whose details
s/he should not see. Security systems need to make this easy
to implement while making sure outsiders or rogue employees do
not see data they should not see.
Snowflake Schema
An extension of the Star Schema in which dimension tables are
further normalized into layers of related tables, scaling that
technique to handle an entire warehouse.
Star Schema
A standard technique for designing the summary tables of a
data warehouse. "Fact" tables each join to a larger
number of independent "dimension" tables. The tables
may be partially denormalized for performance, but most
queries will still need to join in one or more of the star
tables.
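A minimal star layout can be sketched with Python's built-in sqlite3
module: one fact table holds the measures and foreign keys, and each
dimension table is joined in only when a query needs its descriptive
columns. The table and column names (fact_sales, dim_product,
dim_region) and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    amount     INTEGER
);
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO dim_region  VALUES (1, 'east'), (2, 'west');
INSERT INTO fact_sales  VALUES (1, 1, 100), (1, 2, 200), (2, 1, 150);
""")

# Join the fact table to both dimension tables and summarize.
star_rows = conn.execute("""
    SELECT p.name, r.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_region  AS r ON r.region_id  = f.region_id
    GROUP BY p.name, r.name
    ORDER BY p.name, r.name
""").fetchall()

for row in star_rows:
    print(row)
```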