For a MPP system one would rather not distribute the data by date. The reason is that data is often requested in form of timeintervals, meaning that some nodes are idle.
Aggregations on a MPP system can be often done without any data exchange between the nodes.
Joins are more complex. Local joins are possible if the tables to be joined have the same distribution key. Otherwise one get a global join, which needs redistribution of data between the nodes.
The resdistribution overhead is critical to make the system valueable.
What means ACID?
Stands for Atomicity, Consistency, Isolation, Durability;
ACID gives guiding principles that can ensure database transactions can be processed reliably.
Atomicity: Essentially, an atomic transaction ensures that any commit you make finishes the entire operation successfully or, in cases of the connection being lost in the middle of an operation, the database is rolled back to its state prior to the commit being initiated.
Consistency: refers to maintaining data integrity constraints. If a databse enters into an illegal state the process will be aborted and changes roll back to their previous, legal state.
Isolation: Isolated transactions are considered to be "serializeable". Multiple transaction at the same time can only occur as long as the have no possibility to influence each other.
Durability: ensures that changes made to the database (transactions) that are successfully commited will survive permanently.
MPP System basically avoid Overhead necessary for Online Transaction Processing (OLTP). What does this mean?
minimal locking: no row locks but table locks only
MPP is not approbriate for e-business systems, where constantly new entries need to be implemented on a table.
we have only bulk loads and do then queries
column oriented design and compression
small table redundancy replication instead of broadcasting
no Redo-/Archive Logs: no point-in -time restore as we are used from a Oracle database. We actually don't need that because we can roll back.
update = delete+insert
What is a database?
A databse system is a data management system that follows the ACID standard.
It consists of two important components. The database management system (DBSM) and the actuual database.
What is a relational database?
enable a structured storage of data.
relational databases use SQL
it is designed with the ER Diagram (Entity Relationship Diagram), which helps to explain the logical structure of databases.
What is a NoSQL database?
Relatively young technology that stores data in a unstructured form.
What are the advantages of database systems?
Avoidance of redundancy.
DBSM systems allow for complex data analysis methods.
DBSM systems are request-optimized. data can be read very fast.