A distributed database is a database under the control of a central database
management system (DBMS) in which the storage devices are not all attached
to a common CPU. It may be stored on multiple computers in the same
physical location, or it may be dispersed over a network of interconnected
computers.
Collections of data (e.g. in a database) can be distributed across multiple
physical locations. A distributed database can reside on network servers on
the Internet, on corporate intranets or extranets, or on other company
networks. Replicating and distributing databases improves database
performance at end-user worksites.
To keep distributed databases up to date, there are two processes:
replication and duplication. Replication uses specialized software that looks
for changes in the distributed databases. Once the changes have been
identified, the replication process makes all the databases look the same.
Depending on the size and number of the distributed databases, replication
can be very complex and can require a lot of time and computer resources.
Duplication, on the other hand, is not as complicated. It identifies one
database as a master and then copies that database to the other locations.
Duplication is normally done at a set time after hours, so that each
distributed location has the same data. During duplication, changes are
allowed only to the master database, so that local data will not be
overwritten. Both processes can keep the data current in all distributed
locations.
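The difference between the two processes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: each site is modelled as a dictionary, every record carries a hypothetical version number so that changes can be detected, and the function names `duplicate` and `replicate` are made up for this example.

```python
import copy


def duplicate(master, sites):
    """Duplication: overwrite every site with a full copy of the master."""
    for name in list(sites):
        sites[name] = copy.deepcopy(master)


def replicate(sites):
    """Replication: find the newest version of each record at any site,
    then apply it to all sites so that they end up looking the same."""
    newest = {}
    for db in sites.values():
        for key, (version, value) in db.items():
            # Assumption: a higher version number means a more recent change.
            if key not in newest or version > newest[key][0]:
                newest[key] = (version, value)
    for db in sites.values():
        db.update(newest)


# Two sites that have drifted apart.
sites = {
    "branch_a": {"acct-1": (1, 100)},
    "branch_b": {"acct-1": (2, 80), "acct-2": (1, 500)},
}
replicate(sites)
print(sites["branch_a"] == sites["branch_b"])
```

Replication has to compare records across all sites, which is why it becomes more expensive as the number and size of the databases grow. Duplication only copies the master, which is simpler but discards any change made directly at a local site.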
Besides replication and fragmentation, there are many other distributed
database design technologies, for example local autonomy and synchronous and
asynchronous distributed database technologies. How these technologies are
implemented depends on the needs of the business and on the sensitivity and
confidentiality of the data to be stored in the database, and hence on the
price the business is willing to pay to ensure data security, consistency
and integrity.
Basic architecture
A database user accesses the distributed database through:
Local applications: applications which do not require data from other sites.
Global applications: applications which do require data from other sites.
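The distinction between the two kinds of application can be expressed as a simple check. The following is a minimal sketch, assuming a hypothetical `placement` catalog that records which sites store each data item; the function name `classify` and the site names are invented for illustration.

```python
def classify(local_site, needed_items, placement):
    """Return 'local' if every needed item is stored at local_site,
    otherwise 'global' (at least one item must come from another site)."""
    remote = {item for item in needed_items
              if local_site not in placement[item]}
    return "global" if remote else "local"


# Hypothetical catalog: data item -> set of sites holding a copy.
placement = {
    "accounts_north": {"north"},
    "accounts_south": {"south"},
    "branch_list": {"north", "south"},
}

print(classify("north", ["accounts_north", "branch_list"], placement))
print(classify("north", ["accounts_north", "accounts_south"], placement))
```

The first call needs only data held at the north site, so it is a local application; the second needs data stored only at the south site, so it is a global one.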
A distributed database does not share main memory or
disks.
A centralized database has all its data in one place. This makes it quite
different from a distributed database, which has its data in different
places. Because all the data in a centralized database resides in one place,
bottlenecks can occur, and data availability is not as good as in a
distributed database. Let me list some advantages of distributed databases;
they will make the difference between centralized and distributed databases
clear.
Advantages of Data Distribution
The primary advantage of distributed database systems
is the ability to share and access data in a reliable and efficient manner.
Data Sharing and Distributed Control:
If a number of different sites are connected to each
other, then a user at one site may be able to access data that is available at
another site. For example, in a distributed banking system, it is possible
for a user in one branch to access data in another branch. Without this
capability, a user wishing to transfer funds from one branch to another would
have to resort to some external mechanism for such a transfer. This external
mechanism would, in effect, be a single centralized database.
The primary advantage to accomplishing data sharing by
means of data distribution is that each site is able to retain a degree of
control over data stored locally. In a centralized system, the database
administrator of the central site controls the database. In a distributed
system, there is a global database administrator responsible for the entire
system. A part of these responsibilities is delegated to the local database
administrator for each site. Depending upon the design of the distributed
database system, each local administrator may have a different degree of
autonomy, which is often a major advantage of distributed databases.
Reliability and Availability:
If one site fails in a distributed system, the remaining sites may be able
to continue operating. In particular, if data are replicated at several
sites, a transaction needing a particular data item may find it at any of
those sites. Thus, the failure of a site does not necessarily imply the
shutdown of the system.
The failure of one site must be detected by the
system, and appropriate action may be needed to recover from the failure. The
system must no longer use the service of the failed site. Finally, when the
failed site recovers or is repaired, mechanisms must be available to integrate
it smoothly back into the system.
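The steps described above (stop using a failed site, keep serving from the remaining sites, and bring the site back in once it recovers) can be sketched as follows. This is a simplified illustration, assuming every site holds a full copy of the data and that some external, hypothetical monitor detects the failure and calls `mark_failed`; the `Cluster` class and its method names are not taken from any real system.

```python
class Cluster:
    """Toy model of fully replicated sites with failover and reintegration."""

    def __init__(self, site_names):
        self.data = {name: {} for name in site_names}
        self.up = set(site_names)  # sites currently in service

    def write(self, key, value):
        # Only sites that are up receive the update.
        for name in self.up:
            self.data[name][key] = value

    def read(self, key):
        # Any site that is up and holds the item can answer.
        for name in sorted(self.up):
            if key in self.data[name]:
                return self.data[name][key]
        raise KeyError(key)

    def mark_failed(self, name):
        # Assumption: called by a failure detector that is not shown here.
        self.up.discard(name)

    def recover(self, name):
        # Bring the site up to date from a healthy peer before reusing it.
        if self.up:
            source = next(iter(self.up))
            self.data[name] = dict(self.data[source])
        self.up.add(name)


cluster = Cluster(["site1", "site2", "site3"])
cluster.write("seats_left", 12)
cluster.mark_failed("site1")       # site1 goes down
cluster.write("seats_left", 11)    # the system keeps operating
print(cluster.read("seats_left"))  # served by a remaining site
cluster.recover("site1")           # site1 catches up and rejoins
```

Copying the whole data set from a peer on recovery keeps the sketch short; a real system would more likely replay only the updates the site missed while it was down.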
Although recovery from failure is more complex in distributed systems than
in a centralized system, the ability of most of the system to continue to
operate despite the failure of one site results in increased availability.
Availability is crucial for database systems used for real-time
applications. Loss of access to data at an airline, for example, may result
in the loss of potential ticket buyers to competitors.
Speedup Query Processing: