DataMgmt
Data Mgmt Chärtli
Card Set Details

| Cards | 81 |
|---|---|
| Language | English |
| Category | Computer Science |
| Level | University |
| Created / Updated | 31.05.2023 / 31.05.2023 |
| Licensing | Not specified |
| Weblink | https://card2brain.ch/box/20230531_datamgmt |
Traditional databases are designed to handle structured data, while Big Data databases are designed to handle unstructured or semi-structured data. Big Data databases also typically use distributed computing to process large amounts of data.
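To make the structured vs. semi-structured distinction concrete, here is a minimal sketch in Python; the records and field names are invented for illustration:

```python
import json

# A row in a traditional relational table: a fixed set of typed columns
# that every record must share.
structured_row = {"id": 1, "name": "Alice", "age": 30}

# A semi-structured JSON document: optional fields, lists, and nesting can
# vary from record to record, which is why Big Data stores accept such data
# without requiring a rigid upfront schema.
semi_structured = json.loads(
    '{"id": 2, "name": "Bob", "tags": ["iot", "sensor"], "meta": {"source": "web"}}'
)
```

A relational schema would reject the `tags` and `meta` fields unless columns were defined for them in advance; a document-oriented store keeps them as-is.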
Some challenges associated with Big Data include storing and processing large amounts of data, ensuring data quality, and dealing with unstructured or semi-structured data.
Hadoop is an open-source software framework that allows for distributed storage and processing of large datasets across clusters of computers. It uses Hadoop Distributed File System (HDFS) for storage and MapReduce for processing.
HDFS is a distributed file system that provides high-throughput access to application data. It is designed to store very large files across multiple machines in a cluster.
HDFS stores data by breaking it into blocks and replicating those blocks across multiple machines in a cluster. This allows for fault tolerance and high availability of the data.
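The block-and-replication scheme can be sketched with a little arithmetic; the defaults below (128 MB blocks, replication factor 3) are HDFS's standard defaults, and the helper function is purely illustrative:

```python
def split_into_blocks(file_size_mb, block_size_mb=128, replication=3):
    """Sketch of how HDFS lays out a file across a cluster.

    The file is cut into fixed-size blocks and each block is stored on
    `replication` different machines, so a single machine failure never
    loses data.
    """
    num_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    total_stored_mb = file_size_mb * replication    # raw space consumed
    return num_blocks, total_stored_mb

# A 500 MB file becomes 4 blocks and occupies 1500 MB of raw cluster storage.
blocks, stored = split_into_blocks(500)
```

The trade-off is visible in the numbers: fault tolerance costs roughly 3x the raw storage, which is one reason cheap commodity disks suit this design.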
Using commodity hardware in a Hadoop cluster can be more cost-effective than using specialized hardware, as it allows for scaling out by adding more commodity machines as needed.
MapReduce is a programming model used for processing large datasets in parallel across clusters of computers. It consists of two phases: map phase and reduce phase.
In the map phase, data is divided into smaller chunks and processed in parallel across multiple machines, each emitting intermediate key-value pairs. These pairs are then grouped by key, and in the reduce phase the values for each key are combined to produce the final output.