BDMA_final

BDMA@LMU


Set of flashcards Details

Flashcards	18
Language	German
Category	Computer Science
Level	University
Created / Updated	11.08.2019 / 10.10.2019
Licencing	Not defined
Weblink	https://card2brain.ch/box/20190811_bdmafinal


What is the main idea of the Lossy Counting algorithm? (4)

frequent pattern mining on stream data
counting error stays below a user-specified threshold (epsilon)
stream divided into buckets of width w = ceil(1/epsilon)
- space grows as epsilon shrinks, bounded by O((1/epsilon) * log(epsilon * N))
count items; when a bucket is full, remove infrequent items (entries with f + delta <= current bucket id)
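A minimal Python sketch of the counting and pruning steps, following the usual formulation. Each entry stores (f, delta): f is the count since the entry was created, and delta is the maximum possible undercount. The function names `lossy_count` and `frequent_items` are illustrative only. The `support` parameter in `frequent_items` is an assumed user-given support threshold.

```python
import math
from typing import Dict, Iterable, List, Tuple


def lossy_count(stream: Iterable[str], epsilon: float) -> Dict[str, Tuple[int, int]]:
    """Return item -> (f, delta) after one pass over the stream."""
    width = math.ceil(1 / epsilon)  # bucket width w = ceil(1/epsilon)
    entries: Dict[str, Tuple[int, int]] = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in entries:
            f, delta = entries[item]
            entries[item] = (f + 1, delta)
        else:
            # new entry: it may have been seen (and pruned) in earlier buckets
            entries[item] = (1, bucket - 1)
        if n % width == 0:
            # bucket full: drop entries with f + delta <= current bucket id
            entries = {k: (f, d) for k, (f, d) in entries.items() if f + d > bucket}
    return entries


def frequent_items(entries: Dict[str, Tuple[int, int]], n: int,
                   support: float, epsilon: float) -> List[str]:
    # report items whose counted frequency is at least (support - epsilon) * n
    return [k for k, (f, _) in entries.items() if f >= (support - epsilon) * n]


counts = lossy_count(["a"] * 6 + ["b", "c", "a", "d"], epsilon=0.2)
print(counts)  # {'a': (7, 0)}
```

With epsilon = 0.2 the buckets hold 5 items. The rare items "b", "c" and "d" are pruned at the end of the second bucket.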

What is special about clustering of data streams? (2)

What are the requirements for stream clustering algorithms? (3)

Explain the LEADER algorithm. (3) What does it depend on? (2)

for each incoming object:
- find the closest cluster
- if d(cluster, object) < delta: assign the object to that cluster
- else: create a new cluster with the object as its leader
depends on:
- the threshold delta (must be chosen well)
- the order of the incoming objects
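A minimal single-pass sketch of LEADER in Python. It assumes Euclidean distance and represents each cluster by its leader, the first object that opened it. The function name `leader` is illustrative. The usage lines show the dependence on input order: the same three points give different clusterings in two different orders.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def leader(stream: Sequence[Point], delta: float) -> Tuple[List[Point], List[int]]:
    """Return the cluster leaders and, for each object, its cluster index."""
    leaders: List[Point] = []
    labels: List[int] = []
    for obj in stream:
        # find the closest existing cluster (distance to its leader)
        best, best_dist = -1, math.inf
        for i, ld in enumerate(leaders):
            d = math.dist(obj, ld)
            if d < best_dist:
                best, best_dist = i, d
        if best_dist < delta:
            labels.append(best)             # close enough: join that cluster
        else:
            leaders.append(obj)             # otherwise the object leads a new cluster
            labels.append(len(leaders) - 1)
    return leaders, labels


# same points, different order -> different result
print(leader([(0.0, 0.0), (1.5, 0.0), (3.0, 0.0)], delta=2.0)[1])  # [0, 0, 1]
print(leader([(1.5, 0.0), (0.0, 0.0), (3.0, 0.0)], delta=2.0)[1])  # [0, 0, 0]
```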

Explain stream k-means (3)

What does a micro-cluster consist of? (3) How is the centroid of each micro-cluster defined? How is its radius defined?

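a micro-cluster (cluster feature, CF) consists of (3):
- N: number of points
- LS: linear sum of the points
- SS: sum of the squared points
centroid: LS/N
radius: sqrt(SS/N - (LS/N)^2), i.e. the root-mean-square distance of the points from the centroid (for vectors, (LS/N)^2 is the squared norm)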
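A small Python sketch of a cluster feature that computes centroid and radius from (N, LS, SS) only, without storing the points. The class name `CF` and its methods are illustrative. As an assumption, SS is stored as the scalar sum of squared norms, so the radius uses the squared norm of LS/N. `merge` shows that CFs are additive: two micro-clusters combine by adding their components.

```python
import math
from typing import List, Sequence


class CF:
    """Cluster feature (N, LS, SS) of a micro-cluster."""

    def __init__(self, dim: int) -> None:
        self.n = 0                 # N: number of points
        self.ls = [0.0] * dim      # LS: per-dimension linear sum
        self.ss = 0.0              # SS: sum of squared norms of the points

    def add(self, x: Sequence[float]) -> None:
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def merge(self, other: "CF") -> None:
        # additivity: merged CF = component-wise sum
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]                    # LS / N

    def radius(self) -> float:
        c = self.centroid()
        # sqrt(SS/N - ||LS/N||^2); max() guards against tiny negative rounding
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))


cf = CF(2)
for p in [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0), (3.0, 3.0)]:
    cf.add(p)
print(cf.centroid())  # [2.0, 2.0]
```

Here every point lies at distance sqrt(2) from the centroid (2, 2), so the radius is sqrt(2).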

What is the main idea of BIRCH? (2)

What is the main idea of CluStream? (2: 3, 2)

online step (micro-clustering):
- if a new point x is close (below a threshold) to micro-cluster clu: assign x to clu
- else: create a new micro-cluster
- if there are too many micro-clusters: merge the two closest
offline macro-clustering (on demand):
- find k macro-clusters over a time horizon h
- apply k-means to the micro-clusters to obtain k macro-clusters
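A simplified Python sketch of the two CluStream phases. It includes its own minimal micro-cluster (N, LS, SS) so it runs on its own. Several simplifications are assumptions, not the original algorithm:

- The online step uses a fixed distance `threshold`, and `max_mcs` caps the number of micro-clusters.
- The timestamp statistics and the snapshot mechanism that restricts the offline step to horizon h are left out. The micro-clusters passed to `offline_kmeans` are assumed to already summarize horizon h.
- The offline step is a plain weighted k-means over micro-cluster centroids, with weight N.

All function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Sequence


class MicroCluster:
    """Minimal CF (N, LS, SS); CluStream's timestamp sums are omitted."""

    def __init__(self, x: Sequence[float]) -> None:
        self.n = 1
        self.ls = list(x)
        self.ss = sum(v * v for v in x)

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def absorb(self, x: Sequence[float]) -> None:
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def merge(self, other: "MicroCluster") -> None:
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss


def online_step(mcs: List[MicroCluster], x: Sequence[float],
                threshold: float, max_mcs: int) -> None:
    """Absorb x into the closest micro-cluster, or open a new one."""
    if mcs:
        closest = min(mcs, key=lambda m: math.dist(m.centroid(), x))
        if math.dist(closest.centroid(), x) < threshold:
            closest.absorb(x)
            return
    mcs.append(MicroCluster(x))
    if len(mcs) > max_mcs:
        # too many micro-clusters: merge the pair with the closest centroids
        i, j = min(((i, j) for i in range(len(mcs)) for j in range(i + 1, len(mcs))),
                   key=lambda p: math.dist(mcs[p[0]].centroid(), mcs[p[1]].centroid()))
        mcs[i].merge(mcs[j])
        del mcs[j]  # j > i, so index i stays valid


def offline_kmeans(mcs: List[MicroCluster], k: int, iters: int = 20,
                   seed: int = 0) -> List[List[float]]:
    """Weighted k-means on micro-cluster centroids -> k macro-cluster centers."""
    rng = random.Random(seed)
    centers = [m.centroid() for m in rng.sample(mcs, k)]
    for _ in range(iters):
        groups: List[List[MicroCluster]] = [[] for _ in range(k)]
        for m in mcs:
            c = m.centroid()
            groups[min(range(k), key=lambda i: math.dist(c, centers[i]))].append(m)
        for i, g in enumerate(groups):
            if g:  # weighted mean of centroids = sum(LS) / sum(N)
                total = sum(m.n for m in g)
                centers[i] = [sum(m.ls[d] for m in g) / total for d in range(len(centers[i]))]
    return centers


mcs: List[MicroCluster] = []
for p in [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (0.0, 0.5), (10.0, 10.5)]:
    online_step(mcs, p, threshold=2.0, max_mcs=10)
print(len(mcs))  # 2
print(sorted(offline_kmeans(mcs, k=2)))
```

Keeping only (N, LS, SS) per micro-cluster is enough for the offline step, because the weighted mean of the centroids equals sum(LS) / sum(N).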