History and Trends in the Development of Databases
In this series about databases, I will trace the development of this branch of the computer industry, from the past through the present to the latest trends. This first part serves as an introduction to the topic.
Since the topic of databases is very broad, I have decided to divide it into three thematically independent parts. In this part, we will move together from the early need to sort data into the computer age, touching on the development of database models, database management systems and query languages. The next part will cover the different types of database models, as well as database management systems and their query languages, from a technical perspective. We will also recap their advantages and problems and look at less common types of databases, such as time-series databases, and their possible uses. In the third part, you can expect an excursion into the increasingly discussed concept of Big Data; we will take a look at specialized database file systems and sorting algorithms, get acquainted with the open-source Apache technologies (HBase, Hadoop, etc.), and there will be a little surprise, too.
The establishment of databases as we know them today dates back to the 1950s. The need to store and sort data, however, is much older.
Leaving aside historic cave paintings and rare clay tablets found in isolation, the library of Ugarit (a city in what is now Syria) can be considered the first documented comprehensive effort to store data. A large number of clay tablets were found there, including diplomatic texts and literary works originating from the twelfth century BC. This documents an effort to collect data, but not yet to sort it. The effort to sort data is objectively confirmed in the library of the Roman Forum. However, this deep history is only a grain of sand in time.
It is the index card that is considered the predecessor of computer databases, and its history does not reach back nearly so far. It is associated with the name of the naturalist Carl Linnaeus, who in the 18th century introduced a system for sorting his records in which each species was put on a separate sheet of paper. Thanks to this, he could easily organize his records and add to them.
Involvement of Electromechanical Machines
Index cards had one big drawback: they had to be handled by hand, which was very restrictive. Therefore, in 1890, the American statistician and inventor Herman Hollerith created a tabulating machine for the needs of the public authorities in the census. This machine used punched cards to store information, making it the first instance of electromechanical data processing. In 1911, four companies merged, one of which was Hollerith's firm, and the Computing-Tabulating-Recording Company was established. The company later changed its name to International Business Machines Corporation; it still exists and operates under its acronym, IBM. Electromechanical data processing remained at the technological forefront for the next half century.
State authorities contributed to a shift in technology and digitization once again. Before World War II (in 1935), the obligation to record the Social Security numbers of employees was enacted in the USA, and IBM created a new machine at the request of the authorities. It was launched in 1951, again for the census. Known as the UNIVAC I, it holds a privileged position in computer history as the first mass-produced digital computer for commercial use. Eventually, however, writing custom program code for database tasks proved inefficient.
Arrival of Programming Languages
On 28 and 29 May 1959, a conference was held to discuss and consider the possibility of introducing a common computer programming language. A consortium, the Conference on Data Systems Languages (known by the acronym CODASYL), was established for this purpose, and in 1960 the language was described and named the Common Business-Oriented Language, abbreviated COBOL (version COBOL-60). I would like to mention the historic contribution of Grace Hopper, an officer in the US Navy, who came up with the idea of a programming language close to English. This idea subsequently influenced most programming languages and has survived to this day. The development of programming languages went hand in hand with the transition from magnetic tapes to magnetic disks, which allowed direct access to data. This laid the foundation of modern database systems as we know them today.
The Database Management System
In 1961, Charles Bachman of General Electric introduced the Integrated Data Store (IDS), widely considered the first database management system. At the CODASYL conference in 1965, elements of Bachman's data store led to the concept of database systems. The Database Task Group committee was established to develop this concept, with Bachman among its members. In 1971, the committee published a report that defined the database management system. It introduced concepts such as data integrity, the data model, the database schema, atomicity and the entity, and described the architecture of the network database system.
During this period (1965-1970), database systems were divided into two basic models: the network model (most CODASYL database systems) and the hierarchical/tree model (IBM's implementation). The hierarchical concept was restrictive when modelling reality, and the network concept had its drawbacks as well. Edgar F. Codd, an IBM employee, realized this and, in 1970, came up with the idea of the relational database model. His relational theory included basic operations for working with data (selection, projection, join, union, Cartesian product and difference), which together make it possible to carry out all the necessary operations. Codd was also inspired by Grace Hopper's idea and suggested basing the query language on understandable terms drawn from common English.
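Codd's basic relational operations map directly onto constructs of later SQL implementations. The following minimal sketch uses Python's built-in sqlite3 module; the table and column names are hypothetical examples for illustration, not taken from any historical system:

```python
import sqlite3

# In-memory database with two small example relations
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE species (id INTEGER, name TEXT, genus_id INTEGER);
CREATE TABLE genus   (id INTEGER, name TEXT);
INSERT INTO species VALUES (1, 'lion', 10), (2, 'tiger', 10), (3, 'wolf', 20);
INSERT INTO genus   VALUES (10, 'Panthera'), (20, 'Canis');
""")

# Selection: choose rows matching a predicate
cur.execute("SELECT * FROM species WHERE genus_id = 10")
panthera = cur.fetchall()  # the rows for lion and tiger

# Projection: choose columns
cur.execute("SELECT name FROM species")
names = [row[0] for row in cur.fetchall()]

# Join: combine two relations on a common attribute
cur.execute("""SELECT species.name, genus.name
               FROM species JOIN genus ON species.genus_id = genus.id""")
joined = cur.fetchall()
```

Union, difference and the Cartesian product have equally direct counterparts (`UNION`, `EXCEPT` and `CROSS JOIN`), which is exactly the completeness property the relational theory promised.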
SQL and the Golden Age of Databases
The main feature of these database systems was the storage of structured data. The introduction of relational algebra, relational calculus and understandable terms led to the creation of the SEQUEL language (Structured English Query Language). Its promotion to version 2 resulted in the Structured Query Language (SQL), which was adopted as an ANSI standard in 1986 and as an ISO standard in 1987. Since the 1970s, thanks to the success of SEQUEL, query languages have been inspired by the ideas and theories of E. F. Codd. The Ingres project at the University of California, Berkeley, and its QUEL language resulted in the development of another type of database model: the object-relational model, known as Postgres. Postgres and SQL were joined in 1995, when the QUEL query interpreter was replaced by an SQL interpreter, and in 1996 the system was renamed PostgreSQL.
Everything to Objects
The development of the object-relational model was closely associated with both previously implemented models. The structured object model underwent massive development mainly in the 1990s, although its origin dates back to the golden age of databases. The push toward object-oriented programming in the 1990s carried over into the database world, and this was the boom period for purely object-oriented databases. Communication uses the Object Query Language, which is built on top of the SQL-92 standard, and the Object Definition Language, which is based on the programming languages in use. The ODMG-93 standard was created for the definition languages as a superset of a more general model, the Common Object Model (COM). The same language used for programming is thus also used to communicate with the database and to acquire and present its data. Using one language reduces the errors that can arise when migrating from the relational model to objects. The advantage of object-oriented database management systems is the ability to express the complexity of the modelled reality directly in the database, but more on that next time.
Migration to Unstructured "NoSQL" Databases
At the turn of the millennium, there was a certain shift in the perception of data. The limits of the approach built on a structured data model (the relational/object-oriented model) on one side and object-oriented applications on the other were becoming apparent. This led to dusting off the idea of the unstructured database, an idea that originated in the late 1960s, although its use was very rare at that time. Today's unstructured databases, called NoSQL (a term first coined in 1998 by Carlo Strozzi), share only this basic idea. The boom of unstructured databases is associated with Google, which presented its database design called BigTable, intended for large amounts of data. This design inspired Amazon, which presented its own unstructured database project called Dynamo. These designs later became the basis for the NoSQL databases used today. Their biggest difference was that these databases were not row-oriented like SQL databases, but column-oriented. More about the difference between row and column data storage next time.
The issue of properly chosen data storage has been discussed a great deal recently. An appropriately chosen data store is a necessary condition for the smooth and efficient running of applications, as is the appropriate use of technologies for working with data. Next time, we will say more about the technical solutions of the various database models, the advantages and disadvantages of relational database management systems, and query languages. We will also define the terms referred to here, as well as terms such as database integrity, OLAP (online analytical processing) and database scalability.