BDMA_final

BDMA@LMU

0.0 (0)

Kartei Details

Karten	18
Sprache	Deutsch
Kategorie	Informatik
Stufe	Universität
Erstellt / Aktualisiert	11.08.2019 / 10.10.2019
Lizenzierung	Keine Angabe
Weblink	https://card2brain.ch/box/20190811_bdmafinal
Einbinden	<iframe src="https://card2brain.ch/box/20190811_bdmafinal/embed" width="780" height="150" scrolling="no" frameborder="0"></iframe>

Flip-Modus

What is the main idea of LossyCounting algorithm? (4)

frequent pattern mining on stream data
error below user-specific threshold (epsilon)
Stream divided into buckets w = 1/epsilon
- runtime inversely proportional to epsilon
count items in bucket, when bucket full remove infrequent items

What is special about clustering of data streams? (2)

What are requirements of stream clustering algorithms? (3)

Explain LEADER algorithm.(3). What does it depend on? (2)

next object:
- find closest cluster
- if d(cluser, object) < delta >> assign to object cluster
- else create new cluster
depends on
- threshold delta (good choice)
- order of incoming objects

Explain stream k-means (3)

What microclusters consists of (3)? How is the centroid defined each? How is the radius of each micro-cluster defined?

microcluster (CF) consists of (3)
- N: # points
- LS: sum of points
- SS: squared sum of points
centroid: LS/N
radius: sqrt(SS/N - (LS/N)^2)

What is the main idea of BIRCH? (2)

What is the main idea of CluStream? (2: 3, 2)

online step
- if new point x is close (below threshold) to microcluster clu: assign x to clu
- else: create new microcluster
- if too many microclusters: merge closest two
offline macro-clustering (on demand)
- find k macro-clusters in time horizon h
- apply k-means on micro-clusters >> k macro clusters