Cards: 18
Learners: 2
Language: German
Level: University
Created / Updated: 11.08.2019 / 10.10.2019
Licensing: Not specified
Answer types: 0 exact answers, 18 text answers, 0 multiple-choice answers

What is the main idea of the Lossy Counting algorithm? (4)

  • frequent pattern mining on stream data
  • error stays below a user-specified threshold (epsilon)
  • stream is divided into buckets of width w = ceil(1/epsilon)
    • runtime inversely proportional to epsilon
  • count items in the current bucket; when the bucket is full, remove infrequent items (count + maximum error <= current bucket ID)
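
The bucket-and-prune idea above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `lossy_counting` and the dictionary layout are assumptions made for illustration. Each entry stores the observed count and the maximum possible undercount for that item.

```python
from math import ceil

def lossy_counting(stream, epsilon):
    """One pass over the stream; returns {item: (count, max_error)}."""
    width = ceil(1 / epsilon)           # bucket width w
    entries = {}                        # item -> [count, max_error]
    n = 0                               # number of items seen so far
    for item in stream:
        n += 1
        bucket_id = ceil(n / width)     # ID of the current bucket
        if item in entries:
            entries[item][0] += 1
        else:
            # A new item may have been missed in all earlier buckets.
            entries[item] = [1, bucket_id - 1]
        if n % width == 0:              # bucket is full: prune
            stale = [k for k, (f, d) in entries.items() if f + d <= bucket_id]
            for k in stale:
                del entries[k]
    return {k: tuple(v) for k, v in entries.items()}

result = lossy_counting("abacadaeaa", epsilon=0.25)
```

With `epsilon=0.25` the bucket width is 4, so the rare items `b`, `c`, `d` and `e` are pruned at the bucket boundaries and only `a` survives.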

What is special about clustering of data streams? (2)

  • continuously maintain a consistently good clustering of the sequence observed so far
  • memory and time efficient

What are the requirements of stream clustering algorithms? (3)

  • fast, incremental processing
  • tracking changes
  • fast identification of outliers

Explain the LEADER algorithm (3). What does it depend on? (2)

  • for each incoming object:
    • find the closest cluster
    • if d(cluster, object) < delta: assign the object to that cluster
    • else: create a new cluster
  • depends on
    • threshold delta (good choice)
    • order of incoming objects
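
The steps above can be sketched as follows. This is a minimal sketch under one assumption: each cluster is represented by its first object (its "leader"), and d is the Euclidean distance. The function name `leader` is illustrative.

```python
from math import dist

def leader(points, delta):
    """Single pass; returns (leaders, labels) with one label per point."""
    leaders = []   # first object of each cluster acts as its representative
    labels = []
    for p in points:
        if leaders:
            # Find the closest existing cluster.
            j = min(range(len(leaders)), key=lambda i: dist(leaders[i], p))
            if dist(leaders[j], p) < delta:
                labels.append(j)
                continue
        # No cluster is close enough: p starts a new cluster.
        leaders.append(p)
        labels.append(len(leaders) - 1)
    return leaders, labels

# The same three objects in two different arrival orders.
_, labels_a = leader([(0.0,), (0.9,), (1.8,)], delta=1.0)
_, labels_b = leader([(0.9,), (0.0,), (1.8,)], delta=1.0)
```

The two calls illustrate the order dependence: the first order yields two clusters, the second only one, although the objects and delta are identical.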

Explain stream k-means (3)

  • split the stream into chunks
  • apply k-means on each chunk 
  • optional: k-means on cluster centers to get overall k-means clustering
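
A rough sketch of the chunk-wise procedure, in pure Python. The helper `kmeans` is a plain Lloyd iteration with a naive initialisation (the first k points), which is an assumption made to keep the example deterministic. The final step here treats all chunk centers as equally weighted, which is a simplification.

```python
from math import dist

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; naive init with the first k points (assumption)."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: dist(centers[i], p))
            groups[j].append(p)
        # New center = mean of the group; keep the old center if the group is empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def stream_kmeans(stream, k, chunk_size):
    chunk, chunk_centers = [], []
    for p in stream:
        chunk.append(p)
        if len(chunk) == chunk_size:       # chunk is full: cluster it
            chunk_centers.extend(kmeans(chunk, k))
            chunk = []
    if len(chunk) >= k:                    # leftover partial chunk
        chunk_centers.extend(kmeans(chunk, k))
    # Optional final step: k-means on the collected chunk centers.
    return kmeans(chunk_centers, k)

stream = [(0,), (10,), (2,), (12,), (1,), (9,), (3,), (11,)]
centers = stream_kmeans(stream, k=2, chunk_size=4)
```

Only one chunk and the list of chunk centers are held in memory at any time, which is the point of processing the stream in pieces.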

What does a micro-cluster consist of (3)? How is its centroid defined? How is its radius defined?

  • a micro-cluster (cluster feature, CF) consists of (3)
    • N: number of points
    • LS: linear sum of the points
    • SS: sum of the squared points
  • centroid: LS/N
  • radius: sqrt(SS/N - (LS/N)^2)
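
The definitions above translate directly into a small class. This is a minimal sketch: the class name `ClusterFeature` and its methods are illustrative, LS and SS are kept per dimension, and the radius sums the per-dimension variances before taking the square root.

```python
from math import sqrt

class ClusterFeature:
    """CF = (N, LS, SS); all three components are additive."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # linear sum per dimension
        self.ss = [0.0] * dim   # sum of squares per dimension

    def add(self, point):
        self.n += 1
        for d, v in enumerate(point):
            self.ls[d] += v
            self.ss[d] += v * v

    def merge(self, other):
        # Additivity: the CF of a union is the component-wise sum.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        var = sum(ss / self.n - (ls / self.n) ** 2
                  for ls, ss in zip(self.ls, self.ss))
        return sqrt(max(var, 0.0))   # guard against tiny negative rounding errors

cf = ClusterFeature(dim=2)
cf.add((0.0, 0.0))
cf.add((2.0, 0.0))
```

For the two points (0, 0) and (2, 0) this gives N = 2, LS = (2, 0), SS = (4, 0), so the centroid is (1, 0) and the radius is sqrt(4/2 - 1) = 1. Because the components are additive, points can be absorbed and micro-clusters merged without storing the raw points.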

What is the main idea of BIRCH? (2)

  • online component
    • maintain micro-clusters (CFs) in a tree structure (CF-tree)
  • offline component
    • apply global clustering on all leaf entries

What is the main idea of CluStream? (2: 3, 2)

  • online step
    • if new point x is close (below threshold) to microcluster clu: assign x to clu
    • else: create new microcluster
    • if too many microclusters: merge closest two
  • offline macro-clustering (on demand)
    • find k macro-clusters in time horizon h
    • apply k-means on micro-clusters >> k macro clusters
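
The two phases above can be sketched as follows. This is a heavily simplified sketch, not the full algorithm: micro-clusters keep only N and LS, a fixed distance `threshold` is assumed, and the time horizon h is ignored (the offline step simply uses all current micro-clusters). All names (`MicroCluster`, `online_step`, `macro_clusters`) are illustrative.

```python
from itertools import combinations
from math import dist

class MicroCluster:
    def __init__(self, point):
        self.n = 1
        self.ls = list(point)

    def centroid(self):
        return tuple(s / self.n for s in self.ls)

    def absorb(self, other):
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]

def online_step(mcs, x, threshold, max_mcs):
    """Assign x to a close micro-cluster, else open a new one."""
    if mcs:
        nearest = min(mcs, key=lambda m: dist(m.centroid(), x))
        if dist(nearest.centroid(), x) < threshold:
            nearest.absorb(MicroCluster(x))
            return
    mcs.append(MicroCluster(x))
    if len(mcs) > max_mcs:
        # Too many micro-clusters: merge the two closest ones.
        a, b = min(combinations(mcs, 2),
                   key=lambda pair: dist(pair[0].centroid(), pair[1].centroid()))
        a.absorb(b)
        mcs.remove(b)

def macro_clusters(mcs, k, iters=20):
    """Offline step: weighted k-means on the micro-cluster centroids."""
    data = [(m.centroid(), m.n) for m in mcs]
    centers = [c for c, _ in data[:k]]      # naive init (assumption)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for c, w in data:
            j = min(range(k), key=lambda i: dist(centers[i], c))
            groups[j].append((c, w))
        new_centers = []
        for i, g in enumerate(groups):
            if not g:
                new_centers.append(centers[i])
                continue
            total = sum(w for _, w in g)
            dim = len(g[0][0])
            new_centers.append(tuple(sum(c[d] * w for c, w in g) / total
                                     for d in range(dim)))
        centers = new_centers
    return centers

mcs = []
for x in [(0,), (0.5,), (10,), (10.5,), (5,), (0.2,)]:
    online_step(mcs, x, threshold=1.0, max_mcs=2)
macros = macro_clusters(mcs, k=2)
```

In this run the points 5 and 0.2 each open a new micro-cluster that immediately exceeds the limit of two, so each is merged into its closest neighbour. Weighting the offline k-means by N means that a micro-cluster standing for many points pulls the macro-cluster center more strongly than one standing for few.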