KDDM
KDDM@LMU
Deck details
Cards | 203 |
---|---|
Language | German |
Category | Computer science |
Level | University |
Created / Updated | 01.08.2019 / 03.08.2019 |
Weblink | https://card2brain.ch/box/20190801_kddm |
What is an epsilon-representative sample?
- for all hypotheses h: |L_S(h) - L_D(h)| <= epsilon
What is the main aspect of uniform convergence?
What is the idea of concentration measures?
- empirical loss converges to true loss (law of large numbers)
- quantify how fast empirical risk converges
- Markov's inequality (for non-negative x): P(x > epsilon) <= E(x) / epsilon
- Chebyshev's inequality: P(|z - E(z)| > epsilon) <= Var(z) / epsilon^2
- applied to the empirical risk, the probability of a bad hypothesis: P(|L_S(h) - L_D(h)| > epsilon) <= 1/(epsilon^2 * m)
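The last bound is Chebyshev's inequality applied to the empirical mean; a short sketch, assuming the per-sample loss \(\ell(h, z)\) lies in [0, 1] so that its variance is at most 1:

```latex
% L_S(h) is the mean of m i.i.d. loss terms \ell(h, z_i), with \ell \in [0, 1]
\mathrm{Var}\bigl(L_S(h)\bigr) = \frac{\mathrm{Var}\bigl(\ell(h, z)\bigr)}{m} \le \frac{1}{m}
\;\Rightarrow\;
P\bigl(|L_S(h) - L_D(h)| > \epsilon\bigr)
  \le \frac{\mathrm{Var}\bigl(L_S(h)\bigr)}{\epsilon^2}
  \le \frac{1}{\epsilon^2 m}
```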
What is PAC-learnable?
- finite hypothesis classes: sample complexity m_H(epsilon, delta) <= |H| / (epsilon^2 * delta) (via Chebyshev's inequality)
- infinite hypothesis classes (e.g. threshold functions) IF the hypothesis class is restricted (VC(H) < inf)
What is shattering?
- H shatters a finite set C (subset of X) if \(|H_C| = 2^{|C|}\)
Define VC-dimension (1). How to show this (2)?
- maximal size of a set C that can be shattered by H
- to show:
- there exists a set C of size d that is shattered by H >> VC(H) >= d
- no set C of size d+1 can be shattered by H >> VC(H) <= d (together: VC(H) = d)
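A brute-force illustration of the two proof obligations for the 1D threshold class \(h_a(x) = 1\) iff \(x \ge a\) (the class mentioned in the PAC card above); the points and candidate thresholds are made up for illustration:

```python
from itertools import product

def threshold_labels(points, a):
    # labeling produced by h_a(x) = 1 if x >= a else 0
    return tuple(1 if x >= a else 0 for x in points)

def is_shattered(points, thresholds):
    """Does the threshold class realise all 2^|C| labelings of `points`?"""
    realised = {threshold_labels(points, a) for a in thresholds}
    return all(lab in realised for lab in product([0, 1], repeat=len(points)))

# only thresholds below, between and above the points matter for this class
C1, C2 = [0.5], [0.3, 0.7]
thresholds = [-1.0, 0.5, 0.6, 1.0]
print(is_shattered(C1, thresholds))  # True  -> a 1-point set is shattered, so VC(H) >= 1
print(is_shattered(C2, thresholds))  # False -> labeling (1, 0) is never realised, so VC(H) = 1
```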
What is the VC-dim of a linear classifier (with b = 0 and b != 0)?
- d
- d+1
Why is it useful to use a surrogate loss function? (2)
- easier to solve (finding the global optimum of the 0-1 loss is infeasible for large m, d >> NP-hard)
- e.g. analytically solvable
- else: iterative optimization (e.g. gradient descent)
- approximates original loss
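A small made-up example contrasting the non-convex 0-1 loss with the hinge loss as a convex surrogate (labels and scores are arbitrary):

```python
import numpy as np

def zero_one_loss(y, scores):
    # piecewise constant and non-convex -> hard to optimise directly
    return np.mean(np.sign(scores) != y)

def hinge_loss(y, scores):
    # convex surrogate that upper-bounds the 0-1 loss; (sub)gradient descent applies
    return np.mean(np.maximum(0.0, 1.0 - y * scores))

y = np.array([1, -1, 1, -1])
scores = np.array([0.8, -0.3, -0.2, 0.5])   # f(x_i), e.g. <w, x_i>
print(zero_one_loss(y, scores), hinge_loss(y, scores))   # 0.5 0.9
```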
What is the main idea of SVMs? (3)
What does the solution depend on?
- transform features into a high-dimensional space
- assumes separable data
- maximizing margin from hyperplane
- constrained optimization problem: maximize the minimal distance to the hyperplane
- = minimize length of w
- solution only depends on support vectors (y_i * f(x_i) = 1)
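A quick check of the last point using scikit-learn's SVC as one off-the-shelf SVM (toy blobs, very large C to approximate a hard margin):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# two linearly separable blobs in 2D
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)      # very large C ~ hard margin
print(len(clf.support_vectors_), "of", len(X))   # only a few points define the solution
# support vectors lie on the margin: y_i * f(x_i) ~ 1
print(y[clf.support_] * clf.decision_function(X[clf.support_]))
```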
Why is the dual form useful? (3) How to derive it?
- primal form not analytically solvable
- get rid of constraint: y_i * f(x_i) - 1 >= 0
- inner product <x_j, x_i> appears in the solution = similarity (distance) of two samples >> a kernel can be introduced
- derived with Lagrange multipliers (gradients of objective and constraint point in the same direction)
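A compact sketch of the derivation in standard textbook form (hard margin, not lecture-specific):

```latex
% primal: \min_{w,b} \tfrac{1}{2}\|w\|^2 \ \text{s.t.}\ y_i(\langle w, x_i\rangle + b) - 1 \ge 0
L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2
  - \sum_i \alpha_i \bigl( y_i (\langle w, x_i\rangle + b) - 1 \bigr), \qquad \alpha_i \ge 0
% setting the gradients w.r.t. w and b to zero:
w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0
% substituting back yields the dual; only inner products remain -> kernel trick
\max_{\alpha \ge 0} \; \sum_i \alpha_i
  - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle
```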
What is the difference between hard and soft SVMs?
- hard: no classification error allowed (linearly separable data)
- soft: slack-variables introduced >> hinge loss to handle margin-violations
What is a Hilbert space? (1) Characterise it. (2)
- generalisation of Euclidean space
- characteristics:
- complete: all Cauchy sequences converge (needed for minimization)
- inner product exists
What is the Gram matrix? How does it scale? What does it represent?
- matrix of pairwise kernel values K(x_i, x_j)
- = similarity of x_i, x_j
- scales with # of samples (m)
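A minimal sketch of building the Gram matrix for m samples; the helper name and the choice of the plain inner product as kernel are illustrative:

```python
import numpy as np

def gram_matrix(X, kernel):
    """m x m matrix of pairwise kernel values K[i, j] = k(x_i, x_j)."""
    m = X.shape[0]
    K = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = kernel(X[i], X[j])
    return K

X = np.random.rand(5, 3)              # m = 5 samples, d = 3 features
K = gram_matrix(X, kernel=np.dot)     # linear kernel as the simplest choice
print(K.shape)                        # (5, 5): size grows with m, not with d
```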
Name two famous kernels.
- polynomial kernel
- includes correlations = cross-terms
- radial basis function (RBF) / Gaussian kernel
- non-linear weighting of the feature distance
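Possible forms of both kernels as a short sketch (parameter values are arbitrary):

```python
import numpy as np

def polynomial_kernel(x, z, degree=2, c=1.0):
    # implicitly includes all cross-terms (feature correlations) up to `degree`
    return (np.dot(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # non-linear weighting of the squared feature distance ||x - z||^2
    return np.exp(-gamma * np.sum((x - z) ** 2))

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.0])
print(polynomial_kernel(x, z), rbf_kernel(x, z))   # 9.0 ~0.082
```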
What are sources for errors? (2) How to tackle them? (2, 3)
- overfitting = large difference between test-error (large) and train-error (small)
- reduce VC(H)
- increase |S| = m
- underfitting = large training-error
- increase VC(H)
- increase d (number of features)
- or: problem not learnable
What are learning curves good for? How to act in both cases?
- see if train- and test-error are likely to converge
- if not: reduce complexity of learner
- if true: increase m for convergence
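One way to compute such curves is scikit-learn's learning_curve helper on synthetic data (all parameters here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
sizes, train_scores, val_scores = learning_curve(
    SVC(kernel="rbf"), X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5
)
# large persistent gap -> overfitting (reduce complexity); both curves low -> underfitting
print(sizes)
print(train_scores.mean(axis=1), val_scores.mean(axis=1))
```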
Explain the KDD-process (4)
- data cleaning and integration (60%)
- transformation and selection (features, reduction of dimension)
- data mining (pattern)
- visualisation / evaluation (presentation of results)
- >> iterative
Name and briefly explain basic mining tasks (5)
- frequent pattern mining
- find co-occurring items, association rules
- clustering
- find a (dis-)similarity function between objects, given no labels
- classification
- find function to distinguish classes to predict new objects, given labels
- regression
- find function to describe output to predict new objects
- outlier detection
- find objects which do not comply with the general behavior of the data
Name the general setting of frequent pattern mining concerning the data, the goal, and how frequency is defined. Name a naive approach.
- database with transactions (set of items)
- goal: finding patterns, associations, correlations
- frequent if: supp(X) >= minSup
- \(supp(X) = |\{T \in D : X \subseteq T\}| / |D|\)
- how often X occurs in the transactions of the database (fraction of transactions containing X)
- naive approach: count the frequency of all possible subsets (\(2^m\) itemsets, where m is the number of items)
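A toy version of the definitions above; the four transactions are made up:

```python
from itertools import combinations

D = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]   # toy transaction database
items = sorted(set().union(*D))
min_sup = 0.5

def supp(X, D):
    """Relative support: fraction of transactions T that contain itemset X."""
    return sum(1 for T in D if X <= T) / len(D)

# naive approach: test every one of the 2^|items| - 1 non-empty candidate itemsets
frequent = [set(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k) if supp(set(c), D) >= min_sup]
print(frequent)   # [{'a'}, {'b'}, {'c'}, {'a', 'c'}, {'b', 'c'}]
```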
What is the apriori principle?
- every non-empty subset of a frequent itemset is frequent
- \(A' \subset A: supp(A') \ge supp(A) \ge minSup\)
- every superset of a non-frequent itemset is non-frequent
- \(A' \supset A: supp(A') \le supp(A) < minSup\)
Explain the apriori-algorithm: general idea and steps.
- count 1-itemsets, 2-itemsets, 3-itemsets, ...
- count (k+1)-itemsets only if all of their k-subsets were frequent
- produce candidates: join \(L_k\) with itself, discard candidates that contain non-frequent itemsets
- if items are ordered: p and q are joined if they share the same first k-1 items
- check candidates: increment the count of a candidate if it is contained in transaction T
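A compact sketch of the join / prune / check loop on the same kind of toy transactions, with an absolute minSup count; the join here simply unites k-itemsets whose union has k+1 items instead of the ordered prefix join:

```python
from itertools import combinations

def apriori(D, min_count):
    """Sketch of the Apriori loop over a list of transactions (sets)."""
    D = [frozenset(T) for T in D]
    # L1: frequent 1-itemsets
    counts = {}
    for T in D:
        for item in T:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    L = {X for X, c in counts.items() if c >= min_count}
    frequent, k = set(L), 1
    while L:
        # join step (simplified): unite itemsets whose union has k+1 items
        candidates = {a | b for a in L for b in L if len(a | b) == k + 1}
        # prune step (apriori principle): all k-subsets must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k))}
        # check step: count how often each candidate is contained in a transaction
        counts = {c: sum(1 for T in D if c <= T) for c in candidates}
        L = {c for c, n in counts.items() if n >= min_count}
        frequent |= L
        k += 1
    return frequent

print(apriori([{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}], min_count=2))
```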
What are the requirements for the candidates of the apriori algorithm?
- completeness (superset property): \(C_{k+1} \supseteq L_{k+1}\)
- selectiveness: significantly smaller than set of all k+1 subsets
What is meant by pruning in case of apriori-algorithm?
- applying apriori principle: remove those candidates which contain non-frequent k-subsets: \(s \notin L_k\)
Explain how the counting of candidate support can be supported by hash trees. Go into detail on how to search, insert and count.
- useful if the support of a huge number of candidates has to be counted
- search: apply hash-function
- insert: search for leaf-node
- insert itemset into the leaf node
- in case of overflow
- the leaf becomes an internal node, entries are redistributed according to the hash function
- counting:
- root: compute hash for all items
- internal node: determine hash value, continue search
- leaf: check whether the itemset is contained in T (increment the count if so)
Name the performance bottlenecks of the apriori algorithm (2). What could be a solution?
- huge candidate sets (when many frequent itemsets exist)
- multiple DB-scans (n = length of longest pattern)
- solution: avoid candidate generation >> FP-trees
What is the general idea of FP-tree?
Explain the steps.
- idea: creating conditional databases associated with one frequent item to avoid costly DB-scans
- scan DB once, order frequent items by descending frequency
- scan again
- keep only frequent items, in descending order
- does a common prefix exist?
- yes: increment counter, append suffix in tree
- no: create new branch with prefix
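A minimal construction sketch following these steps; the transactions are toy data and the class/field names are made up, not from the lecture:

```python
from collections import Counter, defaultdict

class FPNode:
    def __init__(self, item, parent=None):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}                        # item -> child FPNode

def build_fp_tree(D, min_count):
    """Build the FP-tree plus a header table of node-links."""
    # 1st scan: frequent items, ordered by descending frequency
    freq = Counter(item for T in D for item in T)
    order = [item for item, c in freq.most_common() if c >= min_count]
    rank = {item: r for r, item in enumerate(order)}

    root, header = FPNode(None), defaultdict(list)
    # 2nd scan: keep only frequent items (descending), insert along shared prefixes
    for T in D:
        node = root
        for item in sorted((i for i in T if i in rank), key=rank.get):
            if item not in node.children:             # no common prefix -> new branch
                node.children[item] = FPNode(item, parent=node)
                header[item].append(node.children[item])   # node-link
            node = node.children[item]
            node.count += 1                           # common prefix -> increment counter
    return root, header

root, header = build_fp_tree([{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}], 2)
print({item: sum(n.count for n in header[item]) for item in header})  # item frequencies
```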
What are the benefits of FP-tree? (2)
- completeness: long pattern unbroken (complete information)
- compactness:
- reduce irrelevant information
- frequency in descending order
- compression (same prefix)
Explain how to mine in a FP-tree.
- recursively grow frequent patterns (3 steps)
- create conditional pattern base (3)
- start from the header table
- visit all nodes of an item via its node-links (node-link property: all occurrences of an item are linked from its header entry)
- accumulate all prefix paths with their frequencies
- construct the conditional FP-tree
- mine the tree, grow frequent patterns (2)
- base case: if the conditional FP-tree contains only a single path: enumerate all pattern combinations
- else: create a conditional pattern base >> recursive call
What is the pattern growth property?
- "abcdef" is frequent iff "abcde" is frequent AND "f" is frequent in transactions containing "abcde" (conditional frequency)
What are the advantages of FP-growth? (4 + 1)
- no candidate generation
- compact data structure
- no repeated DB-scans
- operations: counting and tree construction
- >> faster than apriori
What is a closed / maximal frequent itemset?
- closed: for all \(Y \supset X: supp(Y) < supp(X)\)
- complete information regarding frequent itemsets
- maximal: for all \(Y \supset X: supp(Y) < minSup\)
- implies closed
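A brute-force illustration of both definitions on a toy database (frequent itemsets are enumerated naively, which is fine at this size):

```python
from itertools import combinations

D = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
min_count = 2

def supp(X):
    return sum(1 for T in D if X <= T)

items = sorted(set().union(*D))
frequent = [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k) if supp(frozenset(c)) >= min_count]

# closed: no proper superset has the same support (such a superset would itself be frequent)
closed = [X for X in frequent if not any(X < Y and supp(Y) == supp(X) for Y in frequent)]
# maximal: no frequent proper superset at all
maximal = [X for X in frequent if not any(X < Y for Y in frequent)]
print([set(X) for X in closed])    # [{'a'}, {'c'}, {'a', 'c'}, {'b', 'c'}]
print([set(X) for X in maximal])   # [{'a', 'c'}, {'b', 'c'}]
```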