KDDM
KDDM@LMU
Card Details

| Cards | 203 |
|---|---|
| Language | German |
| Category | Computer Science |
| Level | University |
| Created / Updated | 01.08.2019 / 03.08.2019 |
| Licensing | Not specified |
| Weblink | https://card2brain.ch/box/20190801_kddm |
Explain the KDD-process (4)
- data cleaning and integration (ca. 60% of the effort)
- transformation and selection (feature selection, dimensionality reduction)
- data mining (pattern extraction)
- visualisation / evaluation (presentation of results)
- >> the whole process is iterative
Name and briefly explain basic mining tasks (5)
- frequent pattern mining
- find co-occurring items, association rules
- clustering
- find a (dis-)similarity function between objects, given no labels
- classification
- find a function that distinguishes classes to predict the class of new objects, given labels
- regression
- find a function that describes the output to predict values for new objects
- outlier detection
- find objects which do not comply with the general behavior of the data
Name the general setting of frequent pattern mining concerning the data, the goal, and how frequency is defined. Name a naive approach.
- database with transactions (each transaction is a set of items)
- goal: finding patterns, associations, correlations
- frequent if: supp(X) >= minSup
- \(supp(X) = | \{T \in D \mid X \subseteq T\} | / |D|\)
- the fraction of transactions in the database D that contain X
- naive approach: count the frequency of all possible subsets (\(2^m\) itemsets for m items)
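The support definition above can be sketched in a few lines of Python (the toy transaction database and function name are illustrative):

```python
# Toy transaction database: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"bread", "butter"},
]

def supp(itemset, db):
    """Fraction of transactions T in db with itemset ⊆ T."""
    return sum(1 for t in db if itemset <= t) / len(db)

print(supp({"bread"}, transactions))            # 3 of 4 transactions -> 0.75
print(supp({"bread", "butter"}, transactions))  # 2 of 4 transactions -> 0.5
```

With minSup = 0.6, {bread} would be frequent while {bread, butter} would not.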
What is the apriori principle?
- every non-empty subset of a frequent itemset is frequent
- \(A' \subset A: supp(A') \ge supp(A) \ge minSup\)
- every superset of a non-frequent itemset is non-frequent
- \(A' \supset A: supp(A') \le supp(A) < minSup\)
Explain the apriori-algorithm: general idea and steps.
- count 1-itemsets, then 2-itemsets, 3-itemsets, ...
- count (k+1)-itemsets only for candidates whose k-subsets were all frequent
- generate candidates: join \(L_k\) with itself, discard candidates that include non-frequent itemsets
- if items are ordered: p and q are joined if they agree on the first k-1 items
- check candidates: increment the count of a candidate if it is contained in transaction T
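The steps above can be condensed into a minimal Python sketch (toy database and absolute minSup are illustrative; the join here merges any two frequent k-itemsets that union to size k+1):

```python
from itertools import combinations

def apriori(db, min_sup):
    """Minimal Apriori sketch: db is a list of item sets, min_sup an absolute count."""
    def count(c):
        return sum(1 for t in db if c <= t)

    # L1: frequent 1-itemsets
    items = sorted({i for t in db for i in t})
    frequent = {frozenset([i]) for i in items if count(frozenset([i])) >= min_sup}
    result = set(frequent)
    k = 1
    while frequent:
        # join step: merge frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k + 1}
        # prune step (apriori principle): all k-subsets must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        # check step: count candidate support in the database
        frequent = {c for c in candidates if count(c) >= min_sup}
        result |= frequent
        k += 1
    return result
```

For example, with db = [{a,b,c}, {a,b}, {a,c}, {b,c}, {a,b,c}] and min_sup = 3, all singletons and all pairs are frequent, but {a,b,c} (support 2) is not.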
What are the requirements for the candidates of the apriori algorithm?
- completeness (superset property): \(C_{k+1} \supseteq L_{k+1}\)
- selectiveness: significantly smaller than the set of all (k+1)-subsets
What is meant by pruning in case of apriori-algorithm?
- applying the apriori principle: remove those candidates which contain a non-frequent k-subset \(s \notin L_k\)
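Pruning in isolation can be sketched as follows (the sets L2 and C3 are a made-up toy example):

```python
from itertools import combinations

def prune(candidates, frequent_k, k):
    """Apriori pruning: drop any (k+1)-candidate with a non-frequent k-subset."""
    return {c for c in candidates
            if all(frozenset(s) in frequent_k for s in combinations(c, k))}

# Toy example: {a,d} is not in L2, so the candidate {a,b,d} is pruned.
L2 = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]}
C3 = {frozenset(("a", "b", "c")), frozenset(("a", "b", "d"))}
print(prune(C3, L2, 2))  # only {a, b, c} survives
```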
Explain how the counting of candidate support can be supported by hash trees. Go into detail on how to search, insert and count.
- useful when the support of a huge number of candidates has to be counted
- search: apply the hash function to successive items to descend the tree
- insert: search for the leaf node
- insert the itemset into the leaf node
- in case of overflow:
- the leaf becomes an internal node; its entries are distributed according to the hash function
- counting:
- root: compute the hash value for every item of the transaction
- internal node: determine the hash value of the following items, continue the search
- leaf: check whether each stored itemset is contained in T (increment its count if so)
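A minimal hash-tree sketch along these lines (bucket count, leaf capacity and all helper names are illustrative choices, not from the lecture; candidates and transactions are assumed to be sorted tuples):

```python
NUM_BUCKETS = 3   # fan-out of internal nodes
MAX_LEAF = 2      # leaf capacity before an overflow split

class Node:
    def __init__(self):
        self.is_leaf = True
        self.children = {}   # bucket -> child Node (internal nodes)
        self.itemsets = []   # stored candidates (leaf nodes)

def bucket(item):
    return hash(item) % NUM_BUCKETS

def insert(node, itemset, depth=0):
    """Search for the leaf along the hash path, insert, split on overflow."""
    if node.is_leaf:
        node.itemsets.append(itemset)
        if len(node.itemsets) > MAX_LEAF and depth < len(itemset):
            # overflow: the leaf becomes an internal node and its entries
            # are redistributed according to the hash function
            node.is_leaf = False
            entries, node.itemsets = node.itemsets, []
            for e in entries:
                insert(node, e, depth)
        return
    child = node.children.setdefault(bucket(itemset[depth]), Node())
    insert(child, itemset, depth + 1)

def matches(root, transaction):
    """Return the candidates contained in the (sorted) transaction tuple."""
    found = set()
    def walk(node, start):
        if node.is_leaf:
            for c in node.itemsets:
                if set(c) <= set(transaction):
                    found.add(c)
            return
        # hash every remaining item and continue the search in that subtree
        for i in range(start, len(transaction)):
            child = node.children.get(bucket(transaction[i]))
            if child is not None:
                walk(child, i + 1)
    walk(root, 0)
    return found   # a set, so a leaf reached via several paths counts once

def count_support(candidates, db):
    root = Node()
    for c in candidates:
        insert(root, c)
    counts = {c: 0 for c in candidates}
    for t in db:
        for c in matches(root, t):
            counts[c] += 1
    return counts
```

For example, count_support([("a","b"), ("a","c"), ("b","c"), ("c","d")], [("a","b","c"), ("a","c","d"), ("b","c","d")]) counts ("a","b") once and each of the other candidates twice.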