Discussion of Physical Design



During physical design, our objective is to minimize the time required by users to interact with the information system. So, we concentrate on how to make processing of physical files and the database efficient. While we try to achieve the objective at this phase, we have to make trade-off decisions to find a balance between performance and normalization. Some key points to the implementation of our database design are discussed in following aspects (please also see our SQL source code):

Super-type / Sub-type relationship

In order to model super-type/ sub-type relationships, subtype discriminators were employed during conceptual design since the relational data model does not yet directly support to model this relationship. But after stepping into physical design, subtype discriminators are no longer needed because we can directly identify the subtype by its primary key. For example, if we want to find which member is a moderator, simply scan the moderator table, if the member_id is in the moderator table, then it means this member is a moderator.

Indexing

Here we take advantage of the power of indexing to enhance the performance of the database. Basically those tables with more than 1000 records and most foreign keys are indexed. However, refer to the Process-to-Function Matrix, if the CUD ratio of the table is higher than 50%, we avoid indexing it. Please refer to the SQL source code of our DDL to see the indexing details.

De-normalization

The nature of our client is receiving and answering questions, however, many processes associating with questions, such as searching questions also access answers. Particularly that QUESTION and ANSWER (accepted) are the two tables with the most hits (Please see Process-to-Function Matrix). In order to reduce its access time, our team decided to add an additional / duplicate column to store the final accepted answer. An alternative solution is to cluster QUESTION and ACCEPT, so they will be physically close to each other, and there is no way to process data between two tables faster than this.