Relational and non-relational data model in the context of business intelligence
Business Intelligence provides valuable insight and allows for better interpretation and presentation and at the same time it can help in better decision-making, which can ultimately mean a competitive advantage. What does it mean relational and non-relational model in this context?
In previous article we have gone through the development of database systems. The history has shown us how this sector has responded to the increasingly demanding requirements both of professional and general public. With the advent of computers and the Internet the amount of analyzable data has increased enormously. By now, these data have had the increasing importance and value. A suitable analysis provides valuable insight and allows for better interpretation and presentation and at the same time it can help in better decision-making, which can ultimately mean a competitive advantage. Our view of the advantages and problems of the already mentioned technologies will be in the end given in the context with the concepts of Web system integration for decision making support (business intelligence). This context should serve for better understanding the issue. Let’s start with the description of several important terms.
Vertical and horizontal scaling
Scaling is the answer to the question of how to improve performance while maintaining efficiency in response to the development of the needs of the system. Vertical scaling means a qualitative change in the existing elements of the system (upgrade/degrade of the computer hardware, change in throughput of the network connectivity, and so on.). The horizontal scaling means the quantitative change of elements of the system (at high load another computer may help with data processing).
The transaction is indivisible summary of the operations, which are performed either all at once or none of the operations is performed and the entities remain unchanged (i.e. money transfer at a bank). This approach ensures consistency and integrity of data in time.
Relational database model
The term “relational” database model relates to the theory of sets. The basic constructor of the relational databases is relations (tables) that contain records (rows). Most relational databases can be scaled horizontally, i.e. spread over multiple nodes (machines), which, however, brings certain difficulties.
To create relationships (relations) between the data so called foreign keys are used. Individual tables can be, based on a defined relationship, combined into a single logical unit (by means of JOIN operations). Joining tables of large horizontally scalable database may be expensive and slow, as the final connection of relations must be made on one node and also a large data transfer between various nodes of the database system can occur. Relational databases support transactions. Properties of transactions in the relational model include atomicity, consistency, isolation and durability (ACID) (explained below). Safety, consistency and integrity of data in time are in many cases redeemed by lower performance of the database.
Non-relational database model
It has been referred to as NoSQL, presented in its beginning as No-SQL (not SQL) and later rather as a Not-only SQL. Even the author Carlo Strozzi in 2010 argued that the indication NoSQL was highly imprecise, and suggested a more accurate indication NoREL as non-relational. NoSQL is a new database concept that allows for “fast and efficient processing of the data with a focus on performance, reliability and agility” (source 1).
In the beginning NoSQL databases were an antithesis of relational databases. They were focused on the availability with the shortest possible response time, with the ability to run on commodity hardware, without the possibility to define a data structure, without the support of the relations of individual records and without the possibility of performing connecting operations (JOIN operations). The relational database uses transactions with an emphasis on ACID; NoSQL uses the BASE approach with transactions (as explained below). This approach is not as strict as ACID. It allows for a temporary data inconsistency for increased availability and performance. As this does not block recording (unlike transactions in ACID), it manifests as a rapid response even at the cost of temporary inconsistencies in the stored data.
The descriptions of the basic types of NoSQL databases (document-oriented, the key-value type, BigTable type, graph databases and time series database type), as well as the advantages and problems will be discussed in the next section focused on BigData.
The high availability and scalability of data with NoSQL databases is closely related to the so-called. CAP theorem.
ACID vs. BASE
In the previous section we learned that each the relational and non-relational database uses a different approach to transactions. However, the distribution of approaches to ACID and BASE is not strictly dependent on the type of database model. Today, the approaches merge with some databases (with some relational databases level of transaction "isolation" can be set whereas some NoSQL databases allow for transactions with the ACID properties and definition of relations).
The acronym ACID:
Atomicity - The transaction will be done either completely, or not at all, i.e. it is atomic.
Consistency – The transaction will convert the database from one consistent condition to another one. The registration only is guaranteed with valid data without commanding integrity constraints.
Isolation - Changes within one transaction before its completion are not visible to other transactions.
Durability - Changes to a successfully completed transaction are stored in the database permanently.
The acronym BASE (source 2):
Basically Available – It operates basically all the time, ...
Soft-State - ... may not be consistent throughout the period, ...
Eventual consistency - ... but ultimately it will be in a condition known to us.
Nice and well arranged comparison of the two approaches is outlined here (source 2)
orientation to commit
complex evolution (e.g. .: diagram)
|weak consistency (old data)
approximate answers are OK
supply of data “as soon as possible”
CAP, also Brewer's Theorem (source 3)
The CAP theorem divides databases into three groups. Brewer says that a distributed system can meet up to two of the three properties:
Consistency - at one moment there are the same available data on all nodes (not to be confused with the consistency in the description of ACID database properties)
Availability - each requirement is served, either successfully or unsuccessfully within a negligibly short time
Partition tolerance - functional, despite network failures or outages / outage of individual nodes
This indicates that most relational databases are of the CA type, whereas NoSQL databases are more or less of the AP type.
Relational vs. non-relational model
The mentioned properties indicate that the relational database can be used wherever we work with the data with the structure with minimal changes. Distribution of relational databases into multiple parts is possible, but very inefficient because of the very characteristics of relational databases (data interrelation, JOINs and ACID transactions, blocking at recording). Due to poorer properties at horizontal scaling it is recommended to work with an acceptable amount of data due to hardware performance and the consistency of the data should be the main requirement. With horizontal scaling, it is good to share data so that the nodes have amongst themselves as little as possible of shared links, and so that they are the most autonomous and independent. Logically interrelated data should be stored on a single node.
In contrast, NoSQL databases can access data very quickly, but at the cost of an inconsistency between nodes. Their almost unlimited horizontal scaling and maintaining linear performance growth is possible. Since there is a sharp increase in data that is needed to be stored, analyzed and processed, this feature is very important.
We presented relational and NoSQL database models with descriptions of their advantages and disadvantages. The question whether to choose relational or NoSQL database is a misleading question. The purpose of this description of properties only is to show the advantages and disadvantages of individual approaches and outline how to get the most out of both approaches. NoSQL databases cannot fully replace relational databases yet, but they offer a lot of new directions for working with data.
Database in BI
In the beginning of the article we mentioned the decision support system. This system can include a large number of tools and systems. It can include enterprise resource planning (ERP), accounting systems, customer relationship management (CRM), records of users' movements on the Web or otherwise for the company analytically important and interesting data. Data from these systems can be appropriately transformed for subsequent analysis and if necessary complemented by the analyzed data with the added value.
It should be noted that we should never perform analysis over the production data. Production data are intended for the fast access to detailed information (purchase orders, invoices, goods, ...) and for the required operations (update, inserting a new record, ...). The difference is in the approach. The production database can be accessed by several thousand users at the same time, whereas the analytical data only can be accessed by a limited amount. Furthermore, these data are not generally changed and can be supplemented appropriately. During the analysis we want to have the flexibility to change the criteria and have even difficult results available almost immediately.
The way the production data are dealt with is called OLPT (online transaction processing), whereas on the other hand the analytical approach is called OLAP (online analytical processing). The basic principle of this technology is the OLAP cube (multidimensional table) enabling fast and flexible changes in each dimension.
The online analysis can be appropriately used for two basic needs: reporting and viewing analytic data about the company; or in applications responding to the current requirements of the user. Let’s demonstrate it on two specific cases.
Analytical process in a transport enterprise
Imagine a company providing transportation, which has in every car a built-in instrument for its monitoring. Basic data will be completed with the geo-location data. In addition to the consumption and mileage we can also monitor the traffic situations in individual locations depending on the time or worse technical condition of the vehicle in case of higher consumption of automotive fuel per mileage. In the ad hoc reports we can respond flexibly and capture sudden fluctuations compared to normal condition. We can implement a system of notifications and other processes of the automatic response.
Product recommendations by analyzing social networks
Product data of an online store can be expanded with a data of user behavior on social networks, which is an increasingly popular practice (description of data mining from social networks is not the subject of this article). Based on user-defined relationships we are able to offer to our clients products of close friends. In addition, a suitable analysis can be used to gain insight into the behavior and actions of our competitors; we can target advertising campaigns more appropriately and in a more personalized way and so on.
In the first example where we work with geolocation information it is appropriate to use a database with implemented geolocation functions (e.g.: non-relational MongoDB document-oriented database). For processing of data from social networks non-relational graph databases are perfect. For both presented examples both product relational databases and the use of non-relational database for the needs of advanced analysis and some internal processes is offered.
We can invent a great number of examples like this regarding the implementation of a combination of relational and NoSQL databases. However, this should be sufficient just to give you an idea. We will discuss the above mentioned kinds of non-relational databases, as well as other issues focusing on the concept of BigData next time.
Daniel G. McCreary and Ann M. Kelly Making sense of NoSQL: a guide for managers and the rest of us. (Shelter Island: Manning, 2013. ISBN 978-161-7291-074).
Christof Strauch, NoSQL Databases, Hochschule der Medien, (Stuttgart, http://www.christof-strauch.de/nosqldbs.pdf)
Dr. Eric A. Brewer, Towards Robust. Distributed Systems (July 19, 2000, https://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf)