BDMA_final
BDMA@LMU
BDMA@LMU
18
0.0 (0)
Set of flashcards Details
Flashcards | 18 |
---|---|
Language | Deutsch |
Category | Computer Science |
Level | University |
Created / Updated | 11.08.2019 / 10.10.2019 |
Licencing | Not defined |
Weblink |
https://card2brain.ch/box/20190811_bdmafinal
|
Embed |
<iframe src="https://card2brain.ch/box/20190811_bdmafinal/embed" width="780" height="150" scrolling="no" frameborder="0"></iframe>
|
What is the main idea of LossyCounting algorithm? (4)
- frequent pattern mining on stream data
- error below user-specific threshold (epsilon)
- Stream divided into buckets w = 1/epsilon
- runtime inversely proportional to epsilon
- count items in bucket, when bucket full remove infrequent items
What is special about clustering of data streams? (2)
- maintain continuously consistent good clustering of observed sequence
- memory and time efficient
What are requirements of stream clustering algorithms? (3)
- fast, incremental processing
- tracking changes
- fast identification of outliers
Explain LEADER algorithm.(3). What does it depend on? (2)
- next object:
- find closest cluster
- if d(cluser, object) < delta >> assign to object cluster
- else create new cluster
- depends on
- threshold delta (good choice)
- order of incoming objects
Explain stream k-means (3)
- stream into chunks
- apply k-means on each chunk
- optional: k-means on cluster centers to get overall k-means clustering
What microclusters consists of (3)? How is the centroid defined each? How is the radius of each micro-cluster defined?
- microcluster (CF) consists of (3)
- N: # points
- LS: sum of points
- SS: squared sum of points
- centroid: LS/N
- radius: sqrt(SS/N - (LS/N)^2)
What is the main idea of BIRCH? (2)
- online component
- average microclusters in tree structure
- offline component
- apply global clustering on all leaf entries
What is the main idea of CluStream? (2: 3, 2)
- online step
- if new point x is close (below threshold) to microcluster clu: assign x to clu
- else: create new microcluster
- if too many microclusters: merge closest two
- offline macro-clustering (on demand)
- find k macro-clusters in time horizon h
- apply k-means on micro-clusters >> k macro clusters