Normalized vs. Denormalized
Normalization:
Normalization
is the process of efficiently organizing data in a database. There are
two goals of the normalization process: eliminating redundant data (for
example, storing the same data in more than one table) and ensuring data
dependencies make sense (only storing related data in a table). Both of
these are worthy goals as they reduce the amount of space a database
consumes and ensure that data is logically stored.
Denormalization:
Normalization: The first three forms
First Normal Form:
No repeating groups. As an example, it might be tempting to make an invoice table with columns for the first, second, and third line item (see above). This violates the first normal form, and would result in large rows, wasted space (where an invoice had less than the maximum number of line items), and *horrible* SQL statements with a separate join for each repetition of the column. First form normalization requires you make a separate line item table, with it's own key (in this case the combination of invoice number and line number) (See below).
Second Normal Form:
Each column must depend on the *entire* primary key. As an example, the customer information could be put in the line item table (see above). The trouble with that is that the customer goes with the invoice, not with each line on the invoice. Putting customer information in the line item table will cause redundant data, with it's inherant overhead and difficult modifications. Second form normalization requires you place the customer information in the invoice table (see below).
Third Normal Form:
Each column must depend on *directly* on the primary key. As an example, the customer address could go in the invoice table (see above), but this would cause data redundancy if several invoices were for the same customer. It would also cause an update nightmare when the customer changes his address, and would require extensive programming to insert the address every time an existing customer gets a new invoice. Third form normalization requires the customer address go in a separate customer table with its own key (customer), with only the customer identifier in the invoice table (see below).
What is the difference between Normalized & denormalized, and when do you use which?
When most
of us design our databases we tend to think of them as related (and
un-related) tables that contain data. It is common to normalize our
tables in order to create a cleaner, more manageable, and intuative
Database design. Normalized means separate tables via foreign key
relationship, there are however things that we should consider before we
normalize our system:
Historical Data:
One of
the most important is historical data. When we normalize our system we
open the door for historically inaccuracy. What's that mean? It means
that we can change the data in the related table, causing us to lose the
accuracy of the data thats related to it.
What do
you mean "lose the accuracy"? Lets look at an example: We have a Member
table which holds a foreign key referrence to the States table. If we
change the data in States table, which is referrenced by the Member
table, we have no way of knowing what that the Member record originally
contained (like change Republic of Congo to Republic of China).
So why
should we care, isn't this the whole reason why we normalize our
database? Yes and No. If we know that we will never have the need to
keep an audit trail or report on the data then surely normalizing it is
perfectly suitable. However we all know that software requirment change,
and there may come a time when the Marketing Department wants to know
the history of how its members data has changed over time, normalization
would destroy this historical record (unless we some how dig it out of a
backup, which would be a major pain in the rear), denormalized does
not.
Reporting:
If you
think that you will ever have to do any kind of Data Mining or Reporting
on your data then avoiding over-normalization is important. Reporting
on, or performing calculations on, large amounts of data requires that
you denormalize your database. Anticipating this during the design
phases can save you a lot of time and grief later on.
No comments:
Post a Comment