Data Mgmt Chärtli
Set of flashcards Details

Flashcards | 81 |
---|---|
Language | English |
Category | Computer Science |
Level | University |
Created / Updated | 31.05.2023 / 31.05.2023 |
Weblink | https://card2brain.ch/box/20230531_datamgmt |
Common window functions in SQL include ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, FIRST_VALUE, LAST_VALUE, and SUM/AVG/MIN/MAX with an OVER clause.
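A minimal sketch of a few of these, run through Python's sqlite3 module (SQLite supports window functions from version 3.25; the sales table and its values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, month TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('East', '2023-01', 100), ('East', '2023-02', 150),
        ('West', '2023-01', 200), ('West', '2023-02', 120);
""")

# Number rows per region, compare each month with the previous one,
# and attach the per-region total -- all without collapsing rows.
rows = conn.execute("""
    SELECT region, month, amount,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY month) AS rn,
           LAG(amount)  OVER (PARTITION BY region ORDER BY month) AS prev_amount,
           SUM(amount)  OVER (PARTITION BY region)                AS region_total
    FROM sales
""").fetchall()
for row in rows:
    print(row)
```

Unlike GROUP BY, the OVER clause keeps every input row and attaches the aggregate result to it.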
A stored procedure in SQL is a precompiled set of SQL statements that can be executed repeatedly by calling the procedure name. It allows you to encapsulate complex logic and business rules into a single unit that can be reused across multiple applications.
To create a stored procedure in SQL, you use the CREATE PROCEDURE statement followed by the name of the procedure and its parameter list (if any). For example, in T-SQL (SQL Server): `CREATE PROCEDURE my_proc @param1 INT, @param2 VARCHAR(50) AS BEGIN SELECT * FROM my_table WHERE col1 = @param1 AND col2 = @param2 END;` This creates a stored procedure called "my_proc" that takes two parameters (@param1 and @param2) and selects all rows from "my_table" where col1 matches @param1 and col2 matches @param2. It is then invoked by name, e.g. `EXEC my_proc 1, 'foo';`.
Some benefits of using stored procedures in SQL include improved performance, increased security, reduced network traffic, improved code organization and maintainability, and easier application development.
Normalization in database design is the process of organizing data into tables to reduce redundancy and improve data integrity. It involves breaking down larger tables into smaller ones based on their functional dependencies.
Common normalization forms in database design include first normal form (1NF), second normal form (2NF), third normal form (3NF), Boyce-Codd normal form (BCNF), fourth normal form (4NF), and fifth normal form (5NF). Each successive normal form builds on the previous one to further reduce redundancy and improve data integrity.
Some benefits of normalization in database design include improved data consistency, reduced data redundancy, easier maintenance and updates, and better scalability; because each fact is stored once, updates touch less duplicated data, though heavily normalized schemas may need more joins at read time.
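As a sketch of what the decomposition looks like in practice (table and column names invented for illustration): a flat orders table that repeats customer data on every row is split so each customer fact is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Before: orders_flat(order_id, customer_name, customer_city, product, qty)
    -- repeats the customer's name and city on every order (redundancy).

    -- After: customer attributes depend only on customer_id and live in one place.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product     TEXT NOT NULL,
        qty         INTEGER NOT NULL
    );
""")
```

Changing a customer's city now updates exactly one row instead of every order that customer ever placed.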
Common Table Expressions (CTEs) in SQL are temporary named result sets that can be defined within the execution scope of a SELECT, INSERT, UPDATE, or DELETE statement. They are derived from a simple query and can be used as an alternative to derived tables (subqueries), views, and inline user-defined functions.
The purpose of using CTEs in SQL is to simplify complex queries by breaking them down into smaller, more manageable parts. They can also handle hierarchical or tree-structured data via recursive CTEs (WITH RECURSIVE), as sketched below.
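A minimal recursive-CTE sketch through sqlite3 (SQLite supports WITH RECURSIVE); the employee hierarchy is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Carol', 1), (4, 'Dave', 2);
""")

# Anchor member: the root (no manager); recursive member: direct reports.
rows = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM employees e JOIN chain c ON e.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth
""").fetchall()
print(rows)  # e.g. [('Alice', 0), ('Bob', 1), ('Carol', 1), ('Dave', 2)]
```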
CTEs were introduced as part of the SQL:1999 standard and have long been supported by MS SQL Server, DB2, Firebird, and other databases; Oracle offered comparable hierarchical queries even earlier through its proprietary CONNECT BY syntax before adopting standard CTEs.
To define a CTE in SQL, you use the WITH keyword followed by the name of the CTE and its definition. For example: `WITH my_cte AS (SELECT col1, col2 FROM my_table WHERE col3 = 'value') SELECT * FROM my_cte WHERE col1 > 10;` This query defines a CTE called "my_cte" that selects columns "col1" and "col2" from "my_table" where col3 equals 'value'; the outer query then returns the CTE's rows where col1 is greater than 10.
A key-value store is a type of NoSQL database that stores data as a collection of key-value pairs. Each value can be any type of data, including strings, numbers, or even other key-value pairs.
In a distributed key-value store, data is partitioned across multiple nodes in the system. Each node is responsible for storing a portion of the total dataset. When a client requests data, it sends the request to one of the nodes, which retrieves the data and returns it to the client.
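A toy sketch of that routing step in Python: hash the key, pick the responsible node. The three node names are invented for illustration; real stores typically use consistent hashing so nodes can join and leave without remapping most keys.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]   # hypothetical nodes
store = {node: {} for node in NODES}     # each node's local data

def node_for(key: str) -> str:
    # Hash the key and map it onto one of the nodes.
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

def put(key, value):
    store[node_for(key)][key] = value

def get(key):
    return store[node_for(key)].get(key)

put("user:42", {"name": "Alice"})
print(node_for("user:42"), get("user:42"))
```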
Key-value stores are often used for caching frequently accessed data, storing session information for web applications, and managing user profiles or preferences.
A graph database is a type of NoSQL database that stores data as nodes and edges in a graph structure. Nodes represent entities such as people or objects, while edges represent relationships between them.
Graph databases use traversal-based queries to navigate through the graph structure and retrieve related nodes and edges. This allows for complex queries that can find patterns or connections between different entities.
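A minimal in-memory sketch of what such a traversal does (adjacency list and names invented; a real graph database indexes and optimizes this):

```python
from collections import deque

# Who is reachable from a start node by following "knows" edges?
edges = {
    "alice": ["bob", "carol"],
    "bob":   ["dave"],
    "carol": ["dave"],
    "dave":  [],
}

def reachable(start):
    # Breadth-first traversal over the edge list.
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}

print(reachable("alice"))  # {'bob', 'carol', 'dave'}
```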
Graph databases excel at handling complex relationships between entities, making them well-suited for use cases such as social networks, recommendation engines, and fraud detection systems.
A traditional RDBMS stores data in tables with a fixed schema, while NoSQL databases store data in various formats, including documents, key-value pairs, column families, and graphs.
NoSQL databases can handle large amounts of unstructured or semi-structured data more efficiently than traditional RDBMS. They also offer greater scalability and flexibility.
Polyglot Persistence refers to the practice of using multiple types of databases to store different types of data within an application. This allows developers to choose the best database for each specific use case.
A document-based database stores data as documents, typically in JSON or BSON format. Each document can have its own unique structure and schema.
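A tiny illustration of that flexibility (documents invented; real document stores such as MongoDB add indexing and a query language on top):

```python
import json

# Two documents in one "collection" with different structures.
users = [
    {"_id": 1, "name": "Alice", "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Bern"}},
]

# A minimal "query": find documents matching a field value.
matches = [doc for doc in users if doc.get("name") == "Bob"]
print(json.dumps(matches, indent=2))
```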
Sharding involves partitioning data across multiple servers in order to improve performance and scalability. Each server only holds a portion of the total dataset, allowing for faster queries and easier scaling.
Eventual consistency means that updates made to the database will eventually propagate to all nodes in the system, but there may be some delay or inconsistency during this process. This is often used as a trade-off for improved performance and availability.
A column-family database is a type of NoSQL database that stores data as columns rather than rows. Each column can contain multiple values, and each row can have a different set of columns.
In a column-family database, data is organized into column families, which are similar to tables in a traditional RDBMS. Each column family can have its own schema and indexing strategy.
Column-family databases are often used for storing large amounts of time-series or event-based data, such as log files or sensor readings.
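A toy sketch of the data model using plain dictionaries (names invented; systems like Cassandra or HBase implement this on disk with per-family storage and indexing):

```python
# row_key -> {column_family: {column: value}}
table = {}

def put(row_key, family, column, value):
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value

# Time-series usage: one row per sensor, one column per timestamp.
put("sensor-1", "readings", "2023-05-31T10:00", 21.5)
put("sensor-1", "readings", "2023-05-31T10:05", 21.7)
put("sensor-2", "readings", "2023-05-31T10:00", 19.2)
put("sensor-2", "meta", "location", "roof")  # rows need not share columns

print(table["sensor-1"]["readings"])
```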
The CAP theorem states that it is impossible for a distributed system to simultaneously provide all three of the following guarantees: consistency, availability, and partition tolerance.
Different types of databases make different trade-offs between consistency, availability, and partition tolerance. For example, traditional RDBMS prioritize consistency over availability and partition tolerance, while NoSQL databases often prioritize availability and partition tolerance over strong consistency.
Sharding can improve partition tolerance by allowing data to be distributed across multiple nodes in the system. However, it can also make it more difficult to maintain strong consistency across all nodes.
Eventual consistency is a property of distributed systems in which updates to data are propagated asynchronously and may take some time to be fully replicated across all nodes in the system. As a result, different nodes may have slightly different views of the data at any given time.
NoSQL databases often use techniques such as vector clocks or conflict resolution algorithms to reconcile conflicting updates and ensure that all nodes eventually converge on a consistent view of the data.
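A toy vector-clock sketch in Python (one of the techniques mentioned above; the node names are invented): each node keeps a counter per node, merging takes the element-wise maximum, and two clocks that are mutually incomparable signal a conflict.

```python
def increment(clock, node):
    # Node 'node' performs a local event: bump its own counter.
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def merge(a, b):
    # Element-wise maximum: the clock after reconciling two replicas.
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def happened_before(a, b):
    # a causally precedes b iff a <= b component-wise and a != b.
    return a != b and all(a.get(n, 0) <= b.get(n, 0) for n in set(a) | set(b))

a = increment({}, "node1")    # {'node1': 1}
b = increment(a, "node2")     # b builds on a
c = increment(a, "node1")     # c also builds on a -- concurrent with b
print(happened_before(a, b))                         # True
print(happened_before(b, c), happened_before(c, b))  # False False -> conflict
print(merge(b, c))            # {'node1': 2, 'node2': 1}
```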
Eventual consistency can make it more difficult to reason about the state of the system at any given time, since different nodes may have different views of the data. It can also make it more difficult to enforce constraints or perform transactions that span multiple nodes.
Polyglot persistence refers to the practice of using multiple types of databases within a single application or system, each optimized for a specific type of data or workload.
Polyglot persistence allows developers to choose the best tool for each job, rather than trying to fit all data into a single database model. This can lead to better performance, scalability, and flexibility.
Polyglot persistence can add complexity to an application or system, since developers must manage multiple types of databases and ensure that they work together seamlessly. It can also make it more difficult to maintain consistency across different types of data.
A data warehouse is a large, centralized repository of data that is used for reporting and analysis. It typically contains historical data from multiple sources, organized in a way that makes it easy to query and analyze.
A data warehouse is optimized for read-heavy workloads and complex queries, while a transactional database is optimized for write-heavy workloads and simple queries. Data warehouses also typically contain denormalized or aggregated data, rather than raw transactional data.
Data warehouses are often used for business intelligence, reporting, and analytics. They can be used to answer questions such as "What were our sales by region last quarter?" or "Which products are most frequently purchased together?"
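A minimal star-schema sketch through sqlite3, answering the "sales by region" style of question; the fact and dimension tables and their contents are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (store_id INTEGER, sale_date TEXT, amount REAL);
    INSERT INTO dim_store VALUES (1, 'East'), (2, 'West');
    INSERT INTO fact_sales VALUES
        (1, '2023-01-15', 100), (1, '2023-02-03', 250),
        (2, '2023-01-20', 300), (2, '2023-03-11', 50);
""")

# "What were our sales by region in Q1 2023?"
rows = conn.execute("""
    SELECT s.region, SUM(f.amount) AS total_sales
    FROM fact_sales f JOIN dim_store s USING (store_id)
    WHERE f.sale_date BETWEEN '2023-01-01' AND '2023-03-31'
    GROUP BY s.region
""").fetchall()
print(rows)  # [('East', 350.0), ('West', 350.0)]
```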
Data integration refers to the process of combining data from multiple sources into a single, unified view. This can involve tasks such as cleaning and transforming the data, resolving conflicts between different sources, and ensuring that the resulting dataset is consistent and accurate.
Data integration can be challenging due to differences in format, structure, and semantics between different sources of data. It can also be difficult to ensure that the resulting dataset is complete and accurate.
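A toy sketch of one such integration step (field names and records invented): two sources name the same attribute differently, so each record is mapped to a common schema before merging.

```python
# Source A and source B disagree on field names and formatting.
source_a = [{"cust_id": 1, "Name": "alice meier", "city": "Bern"}]
source_b = [{"id": 1, "full_name": "Alice Meier", "town": "Bern"}]

def from_a(rec):
    # Map source A's fields onto the common schema, normalizing case.
    return {"id": rec["cust_id"], "name": rec["Name"].title(), "city": rec["city"]}

def from_b(rec):
    return {"id": rec["id"], "name": rec["full_name"], "city": rec["town"]}

# Merge on id; here, later sources win on conflicting fields.
merged = {}
for rec in [from_a(r) for r in source_a] + [from_b(r) for r in source_b]:
    merged.setdefault(rec["id"], {}).update(rec)

print(list(merged.values()))  # [{'id': 1, 'name': 'Alice Meier', 'city': 'Bern'}]
```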