Revision: Distributed Databases
Distributed Databases
Definition
-
A single logical database whose data is physically spread across computers in multiple locations connected by data communication links (that is, the database is stored in more than one place)
Why have distributed databases?
-
Allows local business units to have control over data
-
Allows data in local databases to be used together for decision making based upon the entire dataset
-
Reduces telecommunications costs by using the local database rather than a distant one
-
Reduces the risk of telecommunications failures having a major impact, as less telecommunications hardware is used
The database is spread across different sites
Each remote site has the data that is relevant to itself only
There are 3 core types:
-
Partitioned between sites
-
Entire databases duplicated at each site
-
Central database with remote local databases
Partitioned between sites
-
Not every location (node) needs to hold all the data, so the partitioned approach gives each node only the data that is relevant to itself.
-
If a node requires data that is not held in its local database, it can send a request for that data through the central computer (which holds a copy of all the data or can link to it).
-
The central copy is updated during periods when the load on the database is lower; in general this is overnight.
-
The data is split between sites either:
……
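The request routing and overnight update described above can be sketched in Python. This is a minimal illustration only, assuming in-memory dictionaries in place of real databases; the class names (CentralComputer, Node), the site names and the overnight_sync method are hypothetical and do not come from the notes.

```python
class CentralComputer:
    """Holds a copy of all data; refreshed in batches, not on every write."""

    def __init__(self):
        self.records = {}

    def fetch(self, key):
        # Returns None when the central copy does not (yet) hold the record.
        return self.records.get(key)

    def apply_batch(self, changes):
        # Called during a quiet period (e.g. overnight) to refresh the central copy.
        self.records.update(changes)


class Node:
    """A site that stores only the records relevant to itself."""

    def __init__(self, name, central):
        self.name = name
        self.central = central
        self.local = {}      # this site's partition of the data
        self.pending = {}    # local changes not yet sent to the central copy

    def write(self, key, value):
        self.local[key] = value
        self.pending[key] = value

    def read(self, key):
        # Prefer the local partition; otherwise ask the central computer.
        if key in self.local:
            return self.local[key], "local"
        return self.central.fetch(key), "central"

    def overnight_sync(self):
        self.central.apply_batch(self.pending)
        self.pending = {}


central = CentralComputer()
london = Node("london", central)   # hypothetical site names
leeds = Node("leeds", central)

london.write("cust-1", "Alice")
leeds.write("cust-2", "Bob")

before_sync = leeds.read("cust-1")   # central copy has not been refreshed yet
london.overnight_sync()
after_sync = leeds.read("cust-1")    # now served through the central computer
```

Because the central copy is only refreshed in batches, a node asking for another site's data may see nothing (or an older value) until the next sync, which is one source of the inconsistency listed under the disadvantages.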
Advantages
-
Data is stored close to where it is used, which increases efficiency
-
Local access optimization leading to better performance
-
Only relevant data is available at each site, which leads to better security
Disadvantages
-
Accessing data across partitions (different sites) leads to inconsistent access speed
-
No data replication makes backups essential
-
Potential exists for inconsistency in the data stored
-
Additional disadvantage for vertical:
……
Entire databases duplicated at each site
-
Instead of holding only the data that is relevant at each node, copies of the entire database are held at each node.
-
There is a problem with data integrity. Assume that node B updates record 1 locally, and node C then also updates record 1 locally. Node C's change is the later one, so we can assume it is more up to date and therefore more correct, but until that change reaches the other nodes they still hold older, conflicting copies of the record.
-
This is solved by effective record locking and by database management software that controls access to the data.
-
Hardware requirements are heavy, as each node needs enough equipment to handle the entire database.
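The record locking idea can be illustrated with a small Python sketch. It assumes a single shared lock table and immediate propagation of each committed update to every replica; the names LockManager and ReplicaNode are hypothetical, and a real DBMS would handle this internally and far more robustly.

```python
class LockManager:
    """Grants at most one lock per record id across all nodes."""

    def __init__(self):
        self.owners = {}  # record id -> name of the node holding the lock

    def acquire(self, record_id, node_name):
        if record_id in self.owners:
            return False  # another node is already updating this record
        self.owners[record_id] = node_name
        return True

    def release(self, record_id, node_name):
        if self.owners.get(record_id) == node_name:
            del self.owners[record_id]


class ReplicaNode:
    """A site holding a full copy of the database."""

    def __init__(self, name, locks, peers):
        self.name = name
        self.locks = locks
        self.peers = peers  # shared list of every node, including this one
        self.data = {}
        peers.append(self)

    def begin_update(self, record_id):
        return self.locks.acquire(record_id, self.name)

    def commit_update(self, record_id, value):
        if self.locks.owners.get(record_id) != self.name:
            raise PermissionError(f"{self.name} does not hold the lock")
        # Write the new value to every copy before releasing the lock.
        for node in self.peers:
            node.data[record_id] = value
        self.locks.release(record_id, self.name)


locks = LockManager()
nodes = []
b = ReplicaNode("B", locks, nodes)
c = ReplicaNode("C", locks, nodes)

b_got_lock = b.begin_update(1)   # B locks record 1
c_got_lock = c.begin_update(1)   # C is refused while B holds the lock
b.commit_update(1, "from B")     # all copies updated, lock released
c_retry = c.begin_update(1)      # C can now lock the record
c.commit_update(1, "from C")
```

With the lock in place, the two updates are applied one after the other and every node ends up holding the same final value, rather than B and C each changing their own copy independently.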
Central database with remote local databases
-
No data is held at the local node; instead, an index is held locally and is used to find and then access the data in the central database.
-
Indexes are the key data used to search the main database. Re-sorting an index when data is changed takes time, but a sorted index allows fast searching of the data.
-
Very little hardware is required at the nodes, but the indexes need updating and this method generates a lot of network traffic.
-
A ‘light’ alternative is to store the database relevant to each site at that site, with every site being given an index that covers all the databases
-
When data is required, the index gives the location of the data – this is not a central location but the location of the site that holds the required database
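The ‘light’ alternative can be sketched as a sorted index that maps each record key to the site holding it. The sketch below is an assumption-laden illustration: SiteIndex, the site names and the invoice records are all hypothetical, and plain dictionaries stand in for the site databases. It uses Python's standard bisect module so that lookups in the sorted index are a binary search.

```python
import bisect


class SiteIndex:
    """Sorted index mapping a record key to the site that holds the record."""

    def __init__(self):
        self.keys = []    # kept in sorted order
        self.sites = []   # sites[i] is the site holding keys[i]

    def add(self, key, site):
        # Insert at the sorted position; this is the cost of keeping the index ordered.
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            self.sites[pos] = site
        else:
            self.keys.insert(pos, key)
            self.sites.insert(pos, site)

    def locate(self, key):
        # Binary search: fast because the index is sorted.
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.sites[pos]
        return None


# Hypothetical site databases, each holding only its own records.
site_databases = {
    "leeds": {1003: "invoice 1003"},
    "london": {1001: "invoice 1001", 1007: "invoice 1007"},
}

index = SiteIndex()
for site, records in site_databases.items():
    for key in records:
        index.add(key, site)


def fetch(key):
    """Use the index to find the owning site, then read from that site."""
    site = index.locate(key)
    if site is None:
        return None
    return site_databases[site][key]
```

For example, fetch(1007) looks up key 1007 in the index, finds that the London site holds it, and reads the record from that site rather than from a central location.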
Advantages and Disadvantages of Methods 2 and 3
-
Centralized database is useful for statistical analysis (e.g. sales figures) and backup
-
A distributed database may be less secure with more points of access for hackers
-
Decentralizing increases complexity but reduces network traffic
-
Poor record locking or a poor DBMS causes data reliability and integrity problems
Implementing Distributed Database Systems
Advantages
-
Organizational structure
-
Breaks the network down with greater control over local access
-
Security
-
Can readily limit read/write access to different areas of the database
-
Local autonomy
-
Each local area is responsible for maintenance of its database/access (could be a disadvantage if one site is not up to scratch)
-
Errors are simpler to correct at a local level than a national level
-
Improved availability (not all or nothing)
-
Gives access to parts of the database as required (some parts just take longer to access)
-
Improved reliability (replication)
-
Faster performance when held locally
……
Disadvantages
-
Complexity
-
Maintaining indexes, data locations, updates, etc. is complex
-
Cost
-
Often requires processing and storage at each site
……
-
Security
-
Many locations and entry points to the system that need to be accounted for
-
Integrity control
-
Data integrity must be maintained: it must not be possible for one record to be updated at two sites at the same time