Architecture of Database Systems
12.1 Introduction
Software systems generally have an architecture, ie. possessing of a structure (form) and organisation (function). The former describes identifiable components and how they relate to one another structurally; the latter describes how the functions of the various structural components interact to provide the overall functionality of the system as a whole. Since a database system is basically a software system (albeit complex), it too possesses an architecture. A typical architecture must define a particular configuration of and interaction between data, software modules, meta-data, interfaces and languages (see Figure 12-1).
The architecture of a database system determines its capability, reliability, effectiveness and efficiency in meeting user requirements. But besides the visible functions seen through some data manipulation language, a good database architecture should provide:
-
a) Independence of data and programs
b) Ease of system design
c) Ease of programming
d) Powerful query facilities
e) Protection of data
Figure 12-1: General database system architecture
12.2 Data Abstraction
To meet the requirements above, a more sophisticated architecture is in fact used, providing a number of levels of data abstraction or data definition. The database schema, also known as Conceptual Schema, mentioned above represents an information model at the logical level of data definition. At this level, we abstract out details like computer storage structures, their restrictions, or their operational efficiencies. The view of a database as a collection of relations or tables, each with fixed attributes and primary keys ranging over given domains, is an example of a logical level of data definition.
The details of efficiently organising and storing objects of the conceptual schema in computers with particular hardware configurations are dealt with at the internal (storage) level of data definition. This level is also referred to as the Internal Schema. It maps the contents of the conceptual schema onto structures representing tuples, associated key organisations and indexes, etc, taking into account application characteristics and restrictions of a given computer system. That is, the DBA describes at this level how objects of the conceptual schema are actually organised in a computer. Figure 12-2 illustrates these two levels of data definition.
Figure 12-2: The logical and internal levels of data abstraction
Thus, modern database systems support three levels of data abstraction: External Schemas (User Views), Conceptual Schema, and Internal (Storage) Schema.
The DDL we discussed in earlier chapters is basically a tool only for conceptual schema definition. The DBA will therefore usually need special languages to handle the external and internal schema definitions. The internal schema definition, however, varies widely over different implementation platforms, ie. there are few common principles for such definition. We will therefore say little more about them in this book.
Figure 12-3: User views (external schema)
For example, suppose the following relations are defined:
-
Customer( C#, Cname, Ccity, Cphone )
Product( P#, Pname, Price )
Transaction( C#, P#, Date, Qnt )
- Define View My_Transaction_1 As
- Select Cname, Ccity, Date, Total_Sum=Price*Qnt
From Customer, Transaction, Product
Where Customer.C# = Transaction.C#
& Transaction.P# = Product.P#
- My_Transaction_1( Cname, Ccity, Date, Total_Sum )
Figure 12-4: External view definition
12.3 Data Administration
Functions of a DBA include:
- Creation of the database
To create a database, a DBA has to analyse and assess the requirements of the users and from these determine its logical structure. In other words, the DBA has to design a conceptual schema and a first variant of an internal schema. When the internal schema is ready, the DBA must load the database with actual data.
- Acting as intermediary between users and the database
A DBA is responsible for all user facilities determined by external schemas, ie. the DBA is responsible for defining all external schemas or user views.
- Ensuring data privacy, integrity and security
In analysing user requirements, a DBA must determine who should have access to which data and subsequently arrange for appropriate privacy locks (passwords) for identified individuals and/or groups. The DBA must also determine integrity constraints and arrange for appropriate data validation to ensure that such constraints are never violated. Last, but not least, the DBA must make arrangements for data to be regularly backed up and stored in a safe place as a measure against unrecoverable data losses for one reason or another.
- improve database performance as data usage patterns changes or becomes clearer
- add new applications to meet new processing requirements
- modify the conceptual schema as understanding of the enterprise’s perception of data improves
- performance monitoring
- database reorganisation
- database restructuring
The restructuring of the conceptual schema implies changing its contents, such as:
- adding/removing data items (ie. columns of a relation)
- adding/removing entire relations
- splitting/recombining relations
- changing a relation’s primary keys
- etc
-
Transaction( C#, P#, R#
, Date, Qnt )
: new attribute added
- Representative( R#, Rname, Rcity, Rphone)
Data independence refers to the independence of one user view (external schema) with respect to others. A high degree of independence is desirable as it will allow a DBA to change one view, to meet new requirements and/or to optimise performance, without affecting other views. Relational databases with appropriate relational sub-languages have a high degree of data independence.
For example, suppose that the view
- My_Transaction_1( Cname, Ccity, Date, Total_Sum )
Define View My_Transaction_1 As Select Cname, Pname , Qnt, Total_Sum=Price*Qnt From Customer, Transaction, Product Where Customer.C# = Transaction.C# & Transaction.P# = Product.P# | Pname replaces the original specification of Ccity and Date items |
Data independence is also used to refer to the independence of user views relative to the conceptual schema. For example, the reader can verify that the change in the conceptual schema in the last section (adding the attribute R# to Transaction and adding the new relation Representative), does not affect My_Transaction_1 - neither the original nor the changed view!. In general, if the relations and attributes referred to in a view definition are not removed in a restructuring, the view will not be affected. Thus we can accommodate new (additive) requirements without affecting existing applications.
Lastly, data independence may also refer to the extent to which we may change the storage schema without affecting the conceptual or external schemas. We will not elaborate on this as we have pointed out earlier that the storage level is too diverse for meaningful treatment here.
12.5 Data Protection
There are generally three types of data protection that any serious DBMS must provide. These were briefly described in Chapter 1 and we summarise them here:
- 1. Authorisational Security
This refers to protection against unauthorised access and includes measures such as user identification and password control, privacy keys, etc.
2. Operational Security
This refers to maintaining the integrity of data, ie. protecting the database from the introduction of data that would violate identified integrity constraints.
3. Physical Security
This refers to procedures to protect the physical data against accidental loss or damage of storage equipment, theft, natural disaster, etc. It will typically involve making periodic backup copies of the database, transaction journalling, error recovery techniques, etc.
- Range Product X ALL;
(X.Price > 100 & X.Price < 99999 )
Suppose, for example, that the last backup was done at time t0, and subsequent to that, a number of update transactions were applied one after another. Suppose further that the first n transactions were successfully completed, but during the (n+1)th transaction a system failure occurred (eg. disk malfunction, operating system crash, power failure, etc) leaving some pages in a corrupted state.
In general, it is not possible to just reapply the failed transaction - the failure could have corrupted the updates performed by previous transactions as well, or worse, it could have damaged the integrity of the storage model as to make some pages of the database unreadable! We have no recourse at this point but to go back to the last known consistent state of the database at time t0, ie. the entire contents of the last backup is reinstated as the current database. Of course, in doing so, all the transactions applied after t0 are lost.
At this point it may seem reasonable that, to guard against losing too much work, backups should perhaps be done after each transaction - then at most only the work of one transaction is lost in case of failure. However, many database applications today are transaction intensive typically involving many online users generating many transactions frequently (eg. online airline reservation system). Many databases, on the other hand, are very large and an entire backup could take hours to complete. While backup is being performed the database must be inactive. Thus, it should be clear that this proposition is impractical.
As it is clearly desirable that transactions since the last backup are also somehow saved in the event of crashes, an additional mechanism is needed. Essentially, such mechanisms are based on journalling successful transactions applied to a database. This simply means that a copy of each transaction (or affected pages) is recorded in a sequential file as they are applied to the database.
The simplest type of journalling is the Forward System Journal. In this, whenever a page is modified, a copy of the modified page is also simultaneously recorded into the forward journal.
To illustrate this mechanism, let the set of pages in a database be P = {p1, p2, ... pn}. If the application of an update transaction T on the database changes PT, where PT Í P, then T(PT) will be recorded in the forward journal. We use the notation T(PT) to denote the set of pages PT after the transaction T has changed each page in PT. Likewise, we write T(pi) to denote a page pi after it has been changed by transaction T. Furthermore, if T was applied successfully (ie. no crash during its processing), a separator mark, say ‘;’, would be written to the journal. Thus, after a number of successful transactions, the journal would look as follows:
< T(PT1) ; T(PT2) ; ... T(PTk) ; >
As a more concrete example, suppose transaction
- T1 changed {p1, p2, p3},
- T2 changed {p2, p3, p4}, and
- T3 changed {p3, p4, p5},
Now suppose a crash occurred just after T3 has been applied. The recovery procedure consists of two steps:
- replace the database with the latest backup
- read the system journal in the forward direction (hence the term ‘forward’ journal) and, for each set of journal pages that precedes the separator ‘;’, use it to replace the corresponding pages in the database.
-
< T1( {p1, p2, p3} ) ; T2(
{T1(p2), T1(p3), p4} ) ;
T3( {T2(T1(p3)), T2(p4), ...} ) >
In these cases, the Backward System Journal will be the more appropriate journalling and recovery technique. With this technique, whenever a transaction changes a page, the page contents before the update is saved. As before, if the transaction succesfully completes, a separator is written. Thus the backward journal for the same example as above would be:
-
<{ p1, p2, p3} ;
{ T1(p2), T1(p3),
p4} ; { T2(T1(p3)),
T2(p4), ...} >
12.6 Further Reading
Those of you wanting a more detailed introduction to the relational data model, are strongly recommended to read [Date 1995]. The book is written in a tutorial style with many of the issues, such as the database languages, presented through a progressive series of examples. Another book by the same author [Date 1993] is a readable account of the SQL standard. The book contains many examples and a reader can learn enough to be able to write simple SQL programs or to understand an existing program.
The books [Kroenke 1995], [Benyon 1997] and [Ullmann & Widom1997] are written in a similar tutorial style and describe the Relational Data Model in greater detail providing useful recommendations on actual database design.
Database theory is well presented in [Elmaseri and Navathe 1994] and [Ullmann 1988].
The book [Smith & Barnes 1987] bridges the gap between physical data structures and logical database models. Assuming knowledge of elementary data structures, it describes file processing techniques and introduces fundamentals of database systems as you become aware of design and implementation issues.
The book [Silverston, Inmon, and Graziano 1997] can be useful for those who are interested in the actual application of database systems. It provides a common set of database structures for specific functions common to most businesses, such as sales, marketing, order processing, budgeting, and accounting. The book presents and discusses in greater details different design and implementation techniques. Readers can apply one of such data structures to their own company to meet specific data needs.
References:
[Benyon 1997] Benyon D. Information and Data-Modelling (2nd. Edition), Mc Graw Hill (1997).
[Date 1995] Date, C.J. Introduction to Database Systems (6th Edition), Addison- Wesley Publ. company (1995).
[Date 1993] Date, C.J. A Guide to the SQL Standard. (3rd Edition), Addison-Wesley Publ. company (1993).
[Elmaseri and Navathe 1994] Elmaseri, K. and Navathe, D. Fundamentals of Database Systems (2nd. Edition), Benjamin / Cumming (1994).
[Kroenke 1995] Kroenke D.M. Database Processing. (5th. Edition), Prentice Hall (1995).
[Silverston, Inmon, and Graziano 1997] Silverston L., Inmon W. H. and Graziano K. The Data Model Resource Book: A Library of Logical Data Models and Data Warehouse Designs, John Willey & Sons (1997).
[Smith & Barnes 1987] Smith, P.D. and Barnes, M.G. Files and Databases: An Introduction, Addison-Wesley Publ. company (1987).
[Ullmann 1988] Ullmann, J.D. Principles Of Database and Knowledge-Base Systems, Volume I: Classical Database Systems, Computer Science Press (1988).
[Ullmann & Widom1997] Ullmann, J.D. and Widom, S. A First Course in Database Systems, Prentice Hall (1997).
No comments:
Post a Comment