Premium Partner

Databases and Data Warehousing

This is an introduction to databases with emphasis on data warehousing.

This is an introduction to databases with emphasis on data warehousing.


Set of flashcards Details

Flashcards 26
Language English
Category Computer Science
Level Other
Created / Updated 04.03.2021 / 04.03.2021
Licencing Not defined
Weblink
https://card2brain.ch/box/20210304_databases_and_data_warehousing
Embed
<iframe src="https://card2brain.ch/box/20210304_databases_and_data_warehousing/embed" width="780" height="150" scrolling="no" frameborder="0"></iframe>

What are the two main architectures of databases?

  1. Shared Everything (SMP, Symmetric Multiprocessing Architecture)
  2. Shared Disk (SMP multi-node)
    1. Two problems with this system: Distributed shared manager:
      • Applies locks and shares data blocks.
      • Everything comes from one shared storage. Pipeline can be a bottleneck.
    2. examples: Oracle RAC, DB2 Pure Scale
  3. Shared Nothing (MPP, Massively Parallel Processing)
    1. Exasol, IBM Netica, Teradata, SAP HANA, Greenplum

Data distribution of a MPP system.

The goal is that all nodes work equally.

For a MPP system one would rather not distribute the data by date. The reason is that data is often requested in form of timeintervals, meaning that some nodes are idle.

Aggregations on a MPP system can be often done without any data exchange between the nodes.

Joins are more complex. Local joins are possible if the tables to be joined have the same distribution key. Otherwise one get a global join, which needs redistribution of data between the nodes.

The resdistribution overhead is critical to make the system valueable.

 

What means ACID?

Stands for Atomicity, Consistency, Isolation, Durability;

ACID gives guiding principles that can ensure database transactions can be processed reliably.

  • Atomicity: Essentially, an atomic transaction ensures that any commit you make finishes the entire operation successfully or, in cases of the connection being lost in the middle of an operation, the database is rolled back to its state prior to the commit being initiated.
  • Consistency: refers to maintaining data integrity constraints. If a databse enters into an illegal state the process will be aborted and changes roll back to their previous, legal state.
  • Isolation: Isolated transactions are considered to be "serializeable". Multiple transaction at the same time can only occur as long as the have no possibility to influence each other.
  • Durability: ensures that changes made to the database (transactions) that are successfully commited will survive permanently.

MPP System basically avoid Overhead necessary for Online Transaction Processing (OLTP). What does this mean?

 

  1. minimal locking: no row locks but table locks only
    • MPP is not approbriate for e-business systems, where constantly new entries need to be implemented on a table.
    • we have only bulk loads and do then queries
  2. column oriented design and compression
  3. small table redundancy replication instead of broadcasting
  4. no Redo-/Archive Logs: no point-in -time restore as we are used from a Oracle database. We actually don't need that because we can roll back.
  5. update = delete+insert

What is a database?

A databse system is a data management system that follows the ACID standard.

It consists of two important components. The database management system (DBSM) and the actuual database.

What is a relational database?

enable a structured storage of data.

relational databases use SQL

it is designed with the ER Diagram (Entity Relationship Diagram), which helps to explain the logical structure of databases.

 

What is a NoSQL database?

Relatively young technology that stores data in a unstructured form.

What are the advantages of database systems?

  1. Avoidance of redundancy.
  2. DBSM systems allow for complex data analysis methods.
  3. DBSM systems are request-optimized. data can be read very fast.