# Lernkarten

Karten 18 Karten 1 Lernende Deutsch Universität 11.08.2019 / 11.08.2019 Keine Angabe
0 Exakte Antworten 18 Text Antworten 0 Multiple Choice Antworten

What is the main idea of LossyCounting algorithm? (4)

• frequent pattern mining on stream data
• error below user-specific threshold (epsilon)
• Stream divided into buckets w = 1/epsilon
• runtime inversely proportional to epsilon
• count items in bucket, when bucket full remove infrequent items

What is special about clustering of data streams? (2)

• maintain continuously consistent good clustering of observed sequence
• memory and time efficient

What are requirements of stream clustering algorithms? (3)

• fast, incremental processing
• tracking changes
• fast identification of outliers

Explain LEADER algorithm.(3). What does it depend on? (2)

• next object:
• find closest cluster
• if d(cluser, object) < delta >> assign to object cluster
• else create new cluster
• depends on
• threshold delta (good choice)
• order of incoming objects

Explain stream k-means (3)

• stream into chunks
• apply k-means on each chunk
• optional: k-means on cluster centers to get overall k-means clustering

What microclusters consists of (3)? How is the centroid defined each? How is the radius of each micro-cluster defined?

• microcluster (CF) consists of (3)
• N: # points
• LS: sum of points
• SS: squared sum of points
• centroid: LS/N
• radius: sqrt(SS/N - (LS/N)^2)

What is the main idea of BIRCH? (2)

• online component
• average microclusters in tree structure
• offline component
• apply global clustering on all leaf entries

What is the main idea of CluStream? (2: 3, 2)

• online step
• if new point x is close (below threshold) to microcluster clu: assign x to clu
• else: create new microcluster
• if too many microclusters: merge closest two
• offline macro-clustering (on demand)
• find k macro-clusters in time horizon h
• apply k-means on micro-clusters >> k macro clusters