KDDM@LMU

Explain the KDD process (4)

  1. data cleaning and integration (~60% of the effort)
  2. transformation and selection (feature selection, dimensionality reduction)
  3. data mining (finding patterns)
  4. visualisation / evaluation (presentation of results)

  >> the whole process is iterative

Name and briefly explain basic mining tasks (5)

  • frequent pattern mining
    • find co-occurring items, association rules
  • clustering
    • find a (dis-)similarity function between objects, given no labels
  • classification
    • find a function that distinguishes classes to predict labels for new objects, given labeled data
  • regression
    • find a function that describes the output to predict values for new objects
  • outlier detection
    • find objects which do not comply with the general behavior of the data
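For orientation (my own addition, not on the original card): a minimal sketch of how four of these tasks map onto scikit-learn calls; the library choice and all data are assumptions. Frequent pattern mining is sketched on the following cards.

  from sklearn.cluster import KMeans                  # clustering: no labels
  from sklearn.tree import DecisionTreeClassifier     # classification: discrete labels
  from sklearn.linear_model import LinearRegression   # regression: continuous output
  from sklearn.ensemble import IsolationForest        # outlier detection

  X = [[1.0, 2.0], [1.1, 1.9], [8.0, 8.2], [7.9, 8.1]]

  clusters = KMeans(n_clusters=2).fit_predict(X)
  label = DecisionTreeClassifier().fit(X, [0, 0, 1, 1]).predict([[8.0, 8.0]])
  value = LinearRegression().fit(X, [3.0, 3.0, 16.2, 16.0]).predict([[2.0, 2.0]])
  inliers = IsolationForest(random_state=0).fit_predict(X)   # -1 marks outliers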

Name the general setting of frequent pattern mining concerning the data, the goal, and how frequency is defined. Name a naive approach.

  • database D of transactions (each transaction T is a set of items)
  • goal: finding patterns, associations, correlations
  • frequent if: \(supp(X) \ge minSup\)
    • \(supp(X) = |\{T \in D \mid X \subseteq T\}| \,/\, |D|\)
    • the fraction of transactions in the database D that contain X
  • naive approach: count the frequency of all possible subsets (\(2^m\) itemsets for m items)
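A minimal sketch (my own illustration; D, supp and min_sup are assumed names) of the support computation and the naive enumeration over all \(2^m\) candidate itemsets:

  from itertools import combinations

  # toy transaction database: each transaction is a set of items
  D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]

  def supp(X, D):
      # relative support: fraction of transactions T that contain itemset X
      return sum(1 for T in D if X <= T) / len(D)

  # naive approach: enumerate all 2^m - 1 non-empty subsets of the m items
  items = sorted(set().union(*D))
  min_sup = 0.6
  frequent = [set(c)
              for k in range(1, len(items) + 1)
              for c in combinations(items, k)
              if supp(set(c), D) >= min_sup]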

What is the apriori principle?

  • every non-empty subset of a frequent itemset is frequent
    • \(A' \subset A: supp(A') \ge supp(A) \ge minSup\)
  • every superset of a non-frequent itemset is non-frequent
    • \(A' \supset A: supp(A') \le supp(A) < minSup\)
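To make the anti-monotonicity concrete, a quick check reusing supp and D from the sketch above (illustrative only):

  # adding items can never increase support ...
  assert supp({"a"}, D) >= supp({"a", "b"}, D) >= supp({"a", "b", "c"}, D)
  # ... so once an itemset falls below minSup, none of its supersets
  # has to be counted at all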

Explain the apriori-algorithm: general idea and steps.

  • count 1-itemsets, then 2-itemsets, then 3-itemsets, ...
  • count (k+1)-itemsets only where all of their subsets of length k were frequent

  1. produce candidates: join \(L_k\) with itself, discard candidates that contain a non-frequent k-subset (see the sketch after this list)
    1. if items are ordered: p and q are joined if they agree in the first k-1 items
  2. test candidates: increment the count of a candidate if it is contained in transaction T
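A compact, self-contained sketch of both steps (my own illustration; function and variable names are assumptions, not lecture code):

  from itertools import combinations

  def apriori(D, min_sup):
      # D: list of transactions (sets); returns frequent itemsets as sorted tuples
      n = len(D)

      def is_frequent(X):
          return sum(1 for T in D if X <= T) / n >= min_sup

      items = sorted(set().union(*D))
      L = [(i,) for i in items if is_frequent({i})]      # L_1
      result, k = list(L), 1
      while L:
          # step 1a: join L_k with itself on the first k-1 items
          cands = [p + (q[-1],) for p in L for q in L
                   if p[:-1] == q[:-1] and p[-1] < q[-1]]
          # step 1b: prune candidates with a non-frequent k-subset (apriori principle)
          L_set = set(L)
          cands = [c for c in cands
                   if all(s in L_set for s in combinations(c, k))]
          # step 2: test candidates by counting their support in the database
          L = [c for c in cands if is_frequent(set(c))]
          result.extend(L)
          k += 1
      return result

On the toy database from above, apriori(D, 0.6) returns the frequent 1- and 2-itemsets and rejects ("a", "b", "c") during counting (support 0.4 < 0.6).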

What are the requirements for the candidates of the apriori algorithm?

  • completeness (superset property): \(C_{k+1} \supseteq L_{k+1}\)
  • selectiveness: significantly smaller than the set of all possible (k+1)-itemsets

What is meant by pruning in the case of the apriori algorithm?

  • applying the apriori principle: remove those candidates which contain a non-frequent k-subset \(s \notin L_k\) (the prune step in the sketch above)

 

Explain how the counting of candidate support can be supported by hash trees. Go into detail on how to search, insert and count.

  • useful when the support of a huge number of candidates has to be counted
  • search: apply the hash function at each inner node until a leaf is reached
  • insert: search for the leaf node
    • insert the itemset into the leaf node
    • in case of overflow
      • the leaf becomes an internal node; distribute its entries according to the hash function
  • counting:
    • root: compute the hash value for all items of transaction T
    • internal node: determine the hash value, continue the search
    • leaf: check whether the itemset is contained in T (+1 if so)
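A self-contained sketch of such a hash tree (my own construction; bucket count, leaf capacity and hashing via hash(item) % n_buckets are assumptions):

  class HashTree:
      # stores candidate k-itemsets (sorted tuples) and counts their support
      def __init__(self, n_buckets=4, max_leaf=3, depth=0):
          self.n_buckets, self.max_leaf, self.depth = n_buckets, max_leaf, depth
          self.children = None    # bucket -> HashTree, once this node is internal
          self.itemsets = {}      # leaf storage: itemset -> support count

      def _bucket(self, item):
          return hash(item) % self.n_buckets

      def insert(self, itemset):
          if self.children is not None:                 # search: hash the item at
              b = self._bucket(itemset[self.depth])     # the current depth, descend
              self.children.setdefault(
                  b, HashTree(self.n_buckets, self.max_leaf, self.depth + 1)
              ).insert(itemset)
              return
          self.itemsets[itemset] = 0                    # leaf reached: store itemset
          if len(self.itemsets) > self.max_leaf and self.depth < len(itemset):
              # overflow: the leaf becomes an internal node and its entries
              # are redistributed according to the hash function
              old, self.itemsets, self.children = self.itemsets, {}, {}
              for s in old:
                  self.insert(s)

      def count(self, T, start=0, seen=None):
          # increment the counter of every stored itemset contained in transaction T
          seen = set() if seen is None else seen        # guards double counting when
          if self.children is None:                     # several paths reach one leaf
              for s in self.itemsets:
                  if s not in seen and set(s) <= set(T):
                      self.itemsets[s] += 1
                      seen.add(s)
          else:
              for i in range(start, len(T)):            # hash every remaining item
                  child = self.children.get(self._bucket(T[i]))
                  if child is not None:                 # and continue the search
                      child.count(T, i + 1, seen)

  tree = HashTree()
  for c in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]:
      tree.insert(c)
  for T in [("a", "b", "c"), ("b", "c", "d")]:
      tree.count(T)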