KDDM@LMU
File details

| Flashcards | 203 |
| --- | --- |
| Language | German |
| Category | Computer Science |
| Level | University |
| Created / Updated | 01.08.2019 / 03.08.2019 |
| Web link | https://card2brain.ch/box/20190801_kddm |
What is an epsilon representative sample?
- for all hypotheses \(h \in H\): \(|L_S(h) - L_D(h)| \le \epsilon\)
What is the main aspect of uniform convergence?
What is the idea of concentration measures?
- empirical loss converges to true loss (law of large numbers)
- quantify how fast empirical risk converges
- Markov's inequality (for non-negative \(x\)): \(P(x > \epsilon) \le E(x) / \epsilon\)
- Chebyshev's inequality: \(P(|z - E(z)| > \epsilon) \le Var(z) / \epsilon^2\)
- e.g. probability of a bad hypothesis: \(P(|L_S(h) - L_D(h)| > \epsilon) \le 1/(\epsilon^2 m)\)
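A hedged sketch of where the last bound comes from, assuming a loss bounded in [0, 1] and m i.i.d. samples, so that \(L_S(h)\) is the mean of m independent losses \(\ell_i\) with \(E[\ell_i] = L_D(h)\):

\[
Var(L_S(h)) = \frac{1}{m^2}\sum_{i=1}^{m} Var(\ell_i) \le \frac{1}{m}
\quad\Rightarrow\quad
P\big(|L_S(h) - L_D(h)| > \epsilon\big) \le \frac{Var(L_S(h))}{\epsilon^2} \le \frac{1}{\epsilon^2 m}
\]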
What is PAC-learnable?
- finite hypothesis classes: \(m \le |H| / (\epsilon^2 \delta)\) (Chebyshev's inequality; derivation sketched below)
- infinite hypothesis classes, e.g. threshold functions, IF the hypothesis class is restricted (\(VC(H) < \infty\))
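A derivation sketch for the finite-class bound, assuming the per-hypothesis Chebyshev bound above plus a union bound over H:

\[
P\big(\exists h \in H: |L_S(h) - L_D(h)| > \epsilon\big)
\le \sum_{h \in H} \frac{1}{\epsilon^2 m}
= \frac{|H|}{\epsilon^2 m}
\le \delta
\quad\Rightarrow\quad
m \ge \frac{|H|}{\epsilon^2 \delta}
\]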
What is shattering?
- H shatters a finite set \(C \subseteq X\) if \(|H_C| = 2^{|C|}\) (\(H_C\) = restriction of H to C)
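To make the definition concrete, a minimal sketch that checks shattering by brute force for the 1-D threshold class \(h_t(x) = 1\) iff \(x \ge t\); the class and the point sets are illustrative choices, not from the lecture:

```python
from itertools import product

def labelings_by_thresholds(points):
    """All labelings of `points` realizable by h_t(x) = 1 if x >= t else 0."""
    xs = sorted(points)
    # Candidate thresholds: below all points, between neighbours, above all points.
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple(1 if x >= t else 0 for x in points) for t in thresholds}

def is_shattered(points):
    """C is shattered iff |H_C| = 2^|C|, i.e. every labeling is realizable."""
    realizable = labelings_by_thresholds(points)
    return all(lab in realizable for lab in product((0, 1), repeat=len(points)))

print(is_shattered([0.5]))       # True  -> VC-dim of thresholds is at least 1
print(is_shattered([0.5, 1.5]))  # False -> the labeling (1, 0) is not realizable
```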
Define VC-dimension (1). How to show this (2)?
- maximal size of a set C that can be shattered by H
- to show:
- there exists a set C of size d that is shattered by H >> \(VC(H) \ge d\)
- no set C of size d+1 can be shattered by H >> \(VC(H) \le d\), hence \(VC(H) = d\)
What is the VC-dimension of a linear classifier (with b = 0 and with b != 0)?
- d
- d+1
Why is it useful to use a surrogate loss function? (2)
- easier to solve (finding the global optimum is infeasible for large m, d >> NP-hard)
- e.g. analytically solvable
- else: iterative optimization (e.g. gradient descent)
- approximates original loss
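A small sketch contrasting the original 0-1 loss with a convex surrogate (the hinge loss used later for soft SVMs); the labels and scores are made up for illustration:

```python
import numpy as np

y = np.array([1, -1, 1, 1])            # true labels (toy example)
f = np.array([2.0, 0.5, -0.3, 0.8])    # raw classifier scores f(x_i)

zero_one = (np.sign(f) != y).mean()        # original loss: non-convex, piecewise constant
hinge = np.maximum(0, 1 - y * f).mean()    # surrogate: convex upper bound on the 0-1 loss

print(zero_one, hinge)   # the surrogate can be minimized by gradient descent
```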
What is the main idea of SVMs? (3)
What does the solution depend on?
- transform features into a high-dimensional space
- assumes separable data
- maximizing margin from hyperplane
- constrained optimization problem: maximize the minimal distance to the hyperplane
- = minimize the length of w (see the primal form below)
- solution only depends on support vectors (y_i * f(x_i) = 1)
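A hedged sketch of the constrained problem described above (hard-margin primal form, assuming \(f(x) = \langle w, x \rangle + b\)); maximizing the margin \(1/\|w\|\) is the same as minimizing \(\|w\|\):

\[
\min_{w, b} \ \frac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i \big(\langle w, x_i \rangle + b\big) \ge 1, \quad i = 1, \dots, m
\]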
Why is the dual form useful? (3) How to obtain it?
- primal form not analytically solvable
- get rid of constraint: y_i * f(x_i) - 1 >= 0
- inner product \(\langle x_i, x_j \rangle\) appears in the solution = similarity (distance) of two samples >> a kernel can be introduced
- done with Lagrange multipliers (gradient of objective and gradient of constraint point in the same direction)
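A sketch of the resulting dual (the standard Lagrangian dual of the hard-margin problem, notation as above); only the inner product \(\langle x_i, x_j \rangle\) enters, which is why a kernel can be substituted, and the support vectors are the samples with \(\alpha_i > 0\):

\[
\max_{\alpha \ge 0} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle
\quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0
\]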
What is the difference between hard and soft SVMs?
- hard: no classification error allowed (linearly separable data)
- soft: slack variables introduced >> hinge loss to handle margin violations
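A sketch of the soft-margin objective with slack variables and its equivalent hinge-loss form (the trade-off parameter C is an addition here, not on the card):

\[
\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad y_i f(x_i) \ge 1 - \xi_i, \ \xi_i \ge 0
\qquad\Longleftrightarrow\qquad
\min_{w, b} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \max\big(0, 1 - y_i f(x_i)\big)
\]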
What is a Hilbert space? (1) Characterise it. (2)
- generalisation of Euclidean space
- characteristics:
- complete: all Cauchy sequences converge (needed for minimisation)
- an inner product exists
What is the Gram matrix? How does it scale? What does it represent?
- entries \(K_{ij} = k(x_i, x_j)\)
- = similarity of \(x_i\) and \(x_j\)
- scales with the number of samples (\(m \times m\))
Name two famous kernels.
- polynomial kernel
- includes correlations = cross-terms
- radial basis function (RBF) / Gaussian kernel
- distance between features is weighted non-linearly
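A minimal numpy sketch covering the last two cards: the \(m \times m\) Gram matrix and the two kernels (the degree, offset c and gamma values are arbitrary examples, not from the lecture):

```python
import numpy as np

def polynomial_kernel(X, degree=2, c=1.0):
    """k(x_i, x_j) = (<x_i, x_j> + c)^degree -- expands into cross-terms."""
    return (X @ X.T + c) ** degree

def rbf_kernel(X, gamma=0.5):
    """k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) -- non-linear in the distance."""
    sq = np.sum(X**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])   # m = 3 samples, d = 2 features
K = rbf_kernel(X)   # Gram matrix: shape (m, m), K[i, j] = similarity of x_i and x_j
print(K.shape)
print(K)
```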
What are sources for errors? (2) How to tackle them? (2, 3)
- overfitting = large difference between test-error (large) and train-error (small)
- reduce VC(H)
- increase |S| = m
- underfitting = large training-error
- increase VC(H)
- increase d (number of features)
- or: problem not learnable
What are learning curves good for? How to act in both cases?
- see if train- and test-error are likely to converge
- if not: reduce complexity of learner
- if so: increase m for convergence
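A sketch of how such curves are typically produced, assuming scikit-learn is available; the dataset and the estimator are placeholders, not from the lecture:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

# Placeholder data set; in practice use the task's own sample S.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

sizes, train_scores, test_scores = learning_curve(
    SVC(kernel="rbf"), X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5
)
# If the gap between the two curves stays large, reduce the learner's complexity;
# if the curves are converging, more samples m should help.
print(sizes, train_scores.mean(axis=1), test_scores.mean(axis=1))
```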
Explain the KDD-process (4)
- data cleaning and integration (60%)
- transformation and selection (features, reduction of dimension)
- data mining (pattern)
- visualisation / evaluation (presentation of results)
- >> iterative
Name and briefly explain basic mining tasks (5)
- frequent pattern mining
- find co-occurring items, association rules
- clustering
- find a (dis-)similarity function between objects, given no labels
- classification
- find function to distinguish classes to predict new objects, given labels
- regression
- find function to describe output to predict new objects
- outlier detection
- find objects which do not comply with the general behaviour of the data
Name the general setting of frequent pattern mining concerning the data, the goal, and how frequency is defined. Name a naive approach.
- database with transactions (set of items)
- goal: finding patterns, associations, correlations
- frequent if: supp(X) >= minSup
- \(supp(X) = |\{T \in D : X \subseteq T\}| / |D|\)
- fraction of transactions in the database that contain X
- naive approach: count the frequency of all possible itemsets (\(2^m\) itemsets for m items)
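A small sketch of the support definition and the naive enumeration of all itemsets; the toy transaction database is made up for illustration:

```python
from itertools import combinations

# Toy transaction database (each transaction T is a set of items).
D = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
min_sup = 0.5

def supp(X, D):
    """supp(X) = fraction of transactions T in D that contain X."""
    return sum(1 for T in D if X <= T) / len(D)

# Naive approach: enumerate every non-empty subset of the m items and test it.
items = sorted(set().union(*D))
frequent = [set(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k) if supp(set(c), D) >= min_sup]
print(supp({"a", "c"}, D), frequent)
```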
What is the apriori principle?
- every non-empty subset of a frequent itemset is frequent
- \(A' \subset A: supp(A') \ge supp(A) \ge minSup\)
- superset of non-frequent itemset is non-frequent
- \(A' \supset A: supp(A') \le supp(A) < minSup\)
Explain the apriori-algorithm: general idea and steps.
- count 1-itemsets, 2-itemsets, 3-itemsets, ...
- count (k+1)-itemsets only where all their subsets of length k were frequent
- produce candidates: join \(L_k\) with itself, discard candidates that contain non-frequent itemsets
- if items are ordered: p and q are joined if they agree on the first k-1 items
- test candidates: increment the count of a candidate if it is contained in transaction T (see the sketch below)
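A compact sketch of the join, prune and count steps; the join is simplified (union of two k-itemsets sharing k-1 items instead of the ordered-prefix join), and the toy database and minimum count are made up:

```python
from itertools import combinations

D = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]   # toy transactions
min_count = 2

def apriori(D, min_count):
    items = sorted(set().union(*D))
    # L1: frequent 1-itemsets.
    L = [{frozenset([i]) for i in items if sum(i in T for T in D) >= min_count}]
    while L[-1]:
        Lk = L[-1]
        # Join step (simplified): union of two frequent k-itemsets sharing k-1 items.
        candidates = {p | q for p in Lk for q in Lk if len(p | q) == len(p) + 1}
        # Prune step (apriori principle): drop candidates with a non-frequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, len(c) - 1))}
        # Count step: one DB scan; a candidate counts if it is contained in T.
        counts = {c: sum(c <= T for T in D) for c in candidates}
        L.append({c for c, n in counts.items() if n >= min_count})
    return [itemset for level in L for itemset in level]

print(apriori(D, min_count))   # e.g. {a}, {b}, {c}, {a, c}, {b, c}
```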
What are the requirements for the candidates of the apriori algorithm?
- completeness (superset property): \(C_{k+1} \supseteq L_{k+1}\)
- selectiveness: significantly smaller than the set of all (k+1)-itemsets
What is meant by pruning in case of apriori-algorithm?
- applying the apriori principle: remove those candidates which contain a non-frequent k-subset s (\(s \notin L_k\))
Explain how the counting of candidate support can be supported by hash trees. Go into detail on how to search, insert, and count.
- useful when the support of a huge number of candidates has to be counted
- search: apply hash function
- insert: search for leaf node
- insert itemset into leaf node
- in case of overflow
- leaf becomes an internal node, distribute entries according to the hash function
- counting:
- root: compute hash for all items
- internal node: determine hash value, continue search
- leaf: check whether each stored itemset is contained in T (+1 if so)
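A simplified sketch of such a hash tree; the bucket count, the maximum leaf size and the per-transaction leaf de-duplication are implementation choices here, not from the card, and the candidates/transactions are made up:

```python
class HashTreeNode:
    def __init__(self, depth=0, max_leaf_size=3, n_buckets=3):
        self.depth, self.max_leaf_size, self.n_buckets = depth, max_leaf_size, n_buckets
        self.children = None    # None -> leaf
        self.entries = []       # leaf content: [itemset (sorted tuple), count] pairs

    def _child(self, itemset):
        bucket = hash(itemset[self.depth]) % self.n_buckets
        if bucket not in self.children:
            self.children[bucket] = HashTreeNode(self.depth + 1, self.max_leaf_size, self.n_buckets)
        return self.children[bucket]

    def insert(self, itemset):
        """Insert a candidate k-itemset; search for the leaf by hashing item by item."""
        self._insert([tuple(sorted(itemset)), 0])

    def _insert(self, entry):
        if self.children is not None:                    # internal node: hash and descend
            self._child(entry[0])._insert(entry)
            return
        self.entries.append(entry)                       # leaf: store the itemset
        if len(self.entries) > self.max_leaf_size and self.depth < len(entry[0]):
            # Overflow: the leaf becomes an internal node; redistribute by hash value.
            old, self.entries, self.children = self.entries, [], {}
            for e in old:
                self._child(e[0])._insert(e)

    def count(self, transaction, start=0, seen=None):
        """Add +1 to every stored candidate contained in `transaction` (sorted tuple)."""
        seen = set() if seen is None else seen
        if self.children is None:                        # leaf: explicit containment test
            if id(self) not in seen:                     # visit each leaf once per transaction
                seen.add(id(self))
                t = set(transaction)
                for entry in self.entries:
                    if set(entry[0]) <= t:
                        entry[1] += 1
            return
        for i in range(start, len(transaction)):         # internal: hash every remaining item
            child = self.children.get(hash(transaction[i]) % self.n_buckets)
            if child is not None:
                child.count(transaction, i + 1, seen)

    def results(self):
        if self.children is None:
            return [(e[0], e[1]) for e in self.entries]
        return [r for c in self.children.values() for r in c.results()]

tree = HashTreeNode(max_leaf_size=2)
for cand in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]:
    tree.insert(cand)
for T in [("a", "b", "c"), ("a", "c", "d"), ("b", "c", "d")]:
    tree.count(T)
print(tree.results())   # e.g. ('a', 'c') -> 2, ('b', 'c') -> 2, ('c', 'd') -> 2
```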
Name the performance bottlenecks of the apriori algorithm (2). What could be a solution?
- huge candidate sets (many frequent itemsets exist)
- multiple DB scans (n = length of the longest pattern)
- solution: avoid candidate generation >> FP-trees
What is the general idea of FP-tree?
Explain the steps.
- idea: create conditional databases associated with one frequent item to avoid costly DB scans
- scan DB once, order frequent items in descending order of frequency
- scan again
- keep only frequent items, in descending order
- does a common prefix exist?
- yes: increment the counters along the prefix, append the remaining suffix to the tree
- no: create a new branch for the transaction
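A compact sketch of the two scans and the prefix-sharing insertion; the node structure and header table are one possible implementation, and the toy database is made up:

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}          # item -> FPNode

def build_fptree(D, min_count):
    # 1st scan: count item frequencies, keep frequent items in descending order.
    freq = defaultdict(int)
    for T in D:
        for item in T:
            freq[item] += 1
    order = {i: r for r, (i, c) in enumerate(
        sorted(freq.items(), key=lambda x: -x[1])) if c >= min_count}

    root = FPNode(None, None)
    header = defaultdict(list)      # item -> list of its nodes (node-links)
    # 2nd scan: insert each transaction, reusing shared prefixes.
    for T in D:
        node = root
        for item in sorted((i for i in T if i in order), key=order.get):
            if item not in node.children:               # no common prefix: new branch
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]                  # common prefix: reuse the node
            node.count += 1
    return root, header

D = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]   # toy transactions
root, header = build_fptree(D, min_count=2)
print({item: [n.count for n in nodes] for item, nodes in header.items()})
```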
What are the benefits of FP-tree? (2)
- completeness: long pattern unbroken (complete information)
- compactness:
- reduce irrelevant information
- frequency in descending order
- compression (same prefix)
Explain how to mine in a FP-tree.
- recursively grow frequent pattern path (3)
- create conditional pattern base (3)
- start from header
- visit all nodes via links (node-link property = ...)
- accumulate all prefix paths with frequency
- construct the conditional FP-tree
- mine tree, grow frequent pattern (2)
- base case: if conditional FP-tree only contains single path: enumerate all pattern/combinations
- else: create conditional pattern base >> recursive call
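A small sketch of the base case only: when a conditional FP-tree consists of a single path, every combination of the path items joined with the current suffix is a frequent pattern, with support equal to the minimum count of the chosen items. The path and suffix below are made-up examples:

```python
from itertools import combinations

def patterns_from_single_path(path, suffix, min_count):
    """path: list of (item, count) along a single-path conditional FP-tree.
    Enumerate all combinations of path items, each extended by the suffix."""
    patterns = {}
    for k in range(1, len(path) + 1):
        for combo in combinations(path, k):
            items = frozenset(i for i, _ in combo) | suffix
            count = min(c for _, c in combo)
            if count >= min_count:
                patterns[items] = count
    return patterns

# Example: the conditional FP-tree for suffix {"c"} is the single path a(3) -> b(2).
print(patterns_from_single_path([("a", 3), ("b", 2)], frozenset({"c"}), min_count=2))
```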
What is the pattern growth property?
- "abcdef" is frequent iff "abcde" is frequent AND "f" is frequent in transactions containing "abcde" (conditional frequency)
What are the advantages of FP-growth? (4 + 1)
- no candidate generation
- compact data structure
- no repeated DB-scans
- operations: counting and tree construction
- >> faster than apriori
What is a closed / maximal frequent itemset?
- closed: for all \(Y \supset X: supp(Y) < supp(X)\)
- complete information regarding frequent itemsets
- maximal: for all \(Y \supset X: supp(Y) < minSup\)
- implies closed
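A small sketch checking the two definitions, assuming the frequent itemsets and their supports are already known; the values are made up:

```python
# Frequent itemsets with their support counts -- made-up example, minSup = 2.
supp = {frozenset("a"): 4, frozenset("b"): 3, frozenset("ab"): 3, frozenset("abc"): 2}
min_sup = 2

def is_closed(X):
    """No proper superset has the same support."""
    return all(s < supp[X] for Y, s in supp.items() if Y > X)

def is_maximal(X):
    """No proper superset is frequent (implies closed)."""
    return all(s < min_sup for Y, s in supp.items() if Y > X)

for X in supp:
    print(set(X), supp[X], "closed:", is_closed(X), "maximal:", is_maximal(X))
```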