KDDM@LMU
File details

| Flashcards | 203 |
| --- | --- |
| Language | German |
| Category | Computer Science |
| Level | University |
| Created / Updated | 01.08.2019 / 03.08.2019 |
| Web link | https://card2brain.ch/box/20190801_kddm |
What is an epsilon representative sample?
- for all hypotheses \(h \in H\): \(|L_S(h) - L_D(h)| \le \epsilon\)
What is the main aspect of uniform convergence?
What is the idea of concentration measures?
- empirical loss converges to true loss (law of large numbers)
- quantify how fast empirical risk converges
- Markov's inequality (for non-negative \(x\)): \(P(x > \epsilon) \le E(x) / \epsilon\)
- Chebyshev's inequality: \(P(|z - E(z)| > \epsilon) \le Var(z) / \epsilon^2\)
- e.g. probability of a bad hypothesis: \(P(|L_S(h) - L_D(h)| > \epsilon) \le 1/(\epsilon^2 m)\)
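A hedged sketch of where the last bound comes from, assuming a loss bounded in [0, 1] and m i.i.d. samples, so that \(L_S(h)\) is the mean of m independent losses \(\ell_i\) with \(E[\ell_i] = L_D(h)\):

\[
Var(L_S(h)) = \frac{1}{m^2}\sum_{i=1}^{m} Var(\ell_i) \le \frac{1}{m}
\quad\Rightarrow\quad
P\big(|L_S(h) - L_D(h)| > \epsilon\big) \le \frac{Var(L_S(h))}{\epsilon^2} \le \frac{1}{\epsilon^2 m}
\]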
What is PAC-learnable?
- finite hypothesis classes: \(m \le |H| / (\epsilon^2 \delta)\) (Chebyshev's inequality; derivation sketched below)
- infinite hypothesis classes, e.g. threshold functions, IF the hypothesis class is restricted (\(VC(H) < \infty\))
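A derivation sketch for the finite-class bound, assuming the per-hypothesis Chebyshev bound above plus a union bound over H:

\[
P\big(\exists h \in H: |L_S(h) - L_D(h)| > \epsilon\big)
\le \sum_{h \in H} \frac{1}{\epsilon^2 m}
= \frac{|H|}{\epsilon^2 m}
\le \delta
\quad\Rightarrow\quad
m \ge \frac{|H|}{\epsilon^2 \delta}
\]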
What is shattering?
- H shatters a finite set \(C \subseteq X\) if \(|H_C| = 2^{|C|}\) (\(H_C\) = restriction of H to C)
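To make the definition concrete, a minimal sketch that checks shattering by brute force for the 1-D threshold class \(h_t(x) = 1\) iff \(x \ge t\); the class and the point sets are illustrative choices, not from the lecture:

```python
from itertools import product

def labelings_by_thresholds(points):
    """All labelings of `points` realizable by h_t(x) = 1 if x >= t else 0."""
    xs = sorted(points)
    # Candidate thresholds: below all points, between neighbours, above all points.
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple(1 if x >= t else 0 for x in points) for t in thresholds}

def is_shattered(points):
    """C is shattered iff |H_C| = 2^|C|, i.e. every labeling is realizable."""
    realizable = labelings_by_thresholds(points)
    return all(lab in realizable for lab in product((0, 1), repeat=len(points)))

print(is_shattered([0.5]))       # True  -> VC-dim of thresholds is at least 1
print(is_shattered([0.5, 1.5]))  # False -> the labeling (1, 0) is not realizable
```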
Define VC-dimension (1). How to show this (2)?
- maximal size of a set C that can be shattered by H
- to show:
- there exists a set C of size d that is shattered by H >> \(VC(H) \ge d\)
- no set C of size d+1 can be shattered by H >> \(VC(H) \le d\), hence \(VC(H) = d\)
What is the VC-dimension of a linear classifier (with b = 0 and with b != 0)?
- d
- d+1
Why is it useful to use a surrogate loss function? (2)
- easier to solve (finding the global optimum is infeasible for large m, d >> NP-hard)
- e.g. analytically solvable
- else: iterative optimization (e.g. gradient descent)
- approximates original loss
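A small sketch contrasting the original 0-1 loss with a convex surrogate (the hinge loss used later for soft SVMs); the labels and scores are made up for illustration:

```python
import numpy as np

y = np.array([1, -1, 1, 1])            # true labels (toy example)
f = np.array([2.0, 0.5, -0.3, 0.8])    # raw classifier scores f(x_i)

zero_one = (np.sign(f) != y).mean()        # original loss: non-convex, piecewise constant
hinge = np.maximum(0, 1 - y * f).mean()    # surrogate: convex upper bound on the 0-1 loss

print(zero_one, hinge)   # the surrogate can be minimized by gradient descent
```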
What is the main idea of SVMs? (3)
What does the solution depend on?
- transform features into a high-dimensional space
- assumes separable data
- maximizing margin from hyperplane
- constrained optimization problem: maximize the minimal distance to the hyperplane
- = minimize the length of w (see the primal form below)
- solution only depends on support vectors (y_i * f(x_i) = 1)
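A hedged sketch of the constrained problem described above (hard-margin primal form, assuming \(f(x) = \langle w, x \rangle + b\)); maximizing the margin \(1/\|w\|\) is the same as minimizing \(\|w\|\):

\[
\min_{w, b} \ \frac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i \big(\langle w, x_i \rangle + b\big) \ge 1, \quad i = 1, \dots, m
\]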
Why is the dual form useful? (3) How to obtain it?
- primal form not analytically solvable
- get rid of constraint: y_i * f(x_i) - 1 >= 0
- inner product \(\langle x_i, x_j \rangle\) appears in the solution = similarity (distance) of two samples >> a kernel can be introduced
- done with Lagrange multipliers (gradient of objective and gradient of constraint point in the same direction)
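A sketch of the resulting dual (the standard Lagrangian dual of the hard-margin problem, notation as above); only the inner product \(\langle x_i, x_j \rangle\) enters, which is why a kernel can be substituted, and the support vectors are the samples with \(\alpha_i > 0\):

\[
\max_{\alpha \ge 0} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle
\quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0
\]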
What is the difference between hard and soft SVMs?
- hard: no classification error allowed (linearly separable data)
- soft: slack variables introduced >> hinge loss to handle margin violations
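A sketch of the soft-margin objective with slack variables and its equivalent hinge-loss form (the trade-off parameter C is an addition here, not on the card):

\[
\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad y_i f(x_i) \ge 1 - \xi_i, \ \xi_i \ge 0
\qquad\Longleftrightarrow\qquad
\min_{w, b} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \max\big(0, 1 - y_i f(x_i)\big)
\]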
What is a Hilbert space? (1) Characterise it. (2)
- generalisation of Euclidean space
- characteristics:
- complete: all Cauchy sequences converge (needed for minimisation)
- an inner product exists
What is the Gram matrix? How does it scale? What does it represent?
- entries \(K_{ij} = k(x_i, x_j)\)
- = similarity of \(x_i\) and \(x_j\)
- scales with the number of samples (\(m \times m\))
Name two famous kernels.
- polynomial kernel
- includes correlations = cross-terms
- radial basis function (RBF) / Gaussian kernel
- distance between features is weighted non-linearly
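A minimal numpy sketch covering the last two cards: the \(m \times m\) Gram matrix and the two kernels (the degree, offset c and gamma values are arbitrary examples, not from the lecture):

```python
import numpy as np

def polynomial_kernel(X, degree=2, c=1.0):
    """k(x_i, x_j) = (<x_i, x_j> + c)^degree -- expands into cross-terms."""
    return (X @ X.T + c) ** degree

def rbf_kernel(X, gamma=0.5):
    """k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) -- non-linear in the distance."""
    sq = np.sum(X**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])   # m = 3 samples, d = 2 features
K = rbf_kernel(X)   # Gram matrix: shape (m, m), K[i, j] = similarity of x_i and x_j
print(K.shape)
print(K)
```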
What are sources for errors? (2) How to tackle them? (2, 3)
- overfitting = large difference between test-error (large) and train-error (small)
- reduce VC(H)
- increase |S| = m
- underfitting = large training-error
- increase VC(H)
- increase d (number of features)
- or: problem not learnable
What are learning curves good for? How to act in both cases?
- see if train- and test-error are likely to converge
- if not: reduce complexity of learner
- if so: increase m for convergence
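A sketch of how such curves are typically produced, assuming scikit-learn is available; the dataset and the estimator are placeholders, not from the lecture:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

# Placeholder data set; in practice use the task's own sample S.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

sizes, train_scores, test_scores = learning_curve(
    SVC(kernel="rbf"), X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5
)
# If the gap between the two curves stays large, reduce the learner's complexity;
# if the curves are converging, more samples m should help.
print(sizes, train_scores.mean(axis=1), test_scores.mean(axis=1))
```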
Explain the KDD-process (4)
- data cleaning and integration (60%)
- transformation and selection (features, reduction of dimension)
- data mining (pattern)
- visualisation / evaluation (presentation of results)
- >> iterative
Name and briefly explain basic mining tasks (5)
- frequent pattern mining
- find co-occurring items, association rules
- clustering
- find a (dis-)similarity function between objects, given no labels
- classification
- find function to distinguish classes to predict new objects, given labels
- regression
- find function to describe output to predict new objects
- outlier detection
- find objects which do not comply with the general behaviour of the data
Name the general setting of frequent pattern mining concerning the data, the goal, and how frequency is defined. Name a naive approach.
- database with transactions (set of items)
- goal: finding patterns, associations, correlations
- frequent if: supp(X) >= minSup
- \(supp(X) = |\{T \in D : X \subseteq T\}| / |D|\)
- fraction of transactions in the database that contain X
- naive approach: count the frequency of all possible itemsets (\(2^m\) itemsets for m items)
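A small sketch of the support definition and the naive enumeration of all itemsets; the toy transaction database is made up for illustration:

```python
from itertools import combinations

# Toy transaction database (each transaction T is a set of items).
D = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
min_sup = 0.5

def supp(X, D):
    """supp(X) = fraction of transactions T in D that contain X."""
    return sum(1 for T in D if X <= T) / len(D)

# Naive approach: enumerate every non-empty subset of the m items and test it.
items = sorted(set().union(*D))
frequent = [set(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k) if supp(set(c), D) >= min_sup]
print(supp({"a", "c"}, D), frequent)
```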
What is the apriori principle?
- every non-empty subset of a frequent itemset is frequent
- \(A' \subset A: supp(A') \ge supp(A) \ge minSup\)
- superset of non-frequent itemset is non-frequent
- \(A' \supset A: supp(A') \le supp(A) < minSup\)
Explain the apriori-algorithm: general idea and steps.
- count 1-itemsets, 2-itemsets, 3-itemsets, ...
- count (k+1)-itemsets only where all their subsets of length k were frequent
- produce candidates: join \(L_k\) with itself, discard candidates that contain non-frequent itemsets
- if items are ordered: p and q are joined if they agree on the first k-1 items
- test candidates: increment the count of a candidate if it is contained in transaction T (see the sketch below)
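A compact sketch of the join, prune and count steps; the join is simplified (union of two k-itemsets sharing k-1 items instead of the ordered-prefix join), and the toy database and minimum count are made up:

```python
from itertools import combinations

D = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]   # toy transactions
min_count = 2

def apriori(D, min_count):
    items = sorted(set().union(*D))
    # L1: frequent 1-itemsets.
    L = [{frozenset([i]) for i in items if sum(i in T for T in D) >= min_count}]
    while L[-1]:
        Lk = L[-1]
        # Join step (simplified): union of two frequent k-itemsets sharing k-1 items.
        candidates = {p | q for p in Lk for q in Lk if len(p | q) == len(p) + 1}
        # Prune step (apriori principle): drop candidates with a non-frequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, len(c) - 1))}
        # Count step: one DB scan; a candidate counts if it is contained in T.
        counts = {c: sum(c <= T for T in D) for c in candidates}
        L.append({c for c, n in counts.items() if n >= min_count})
    return [itemset for level in L for itemset in level]

print(apriori(D, min_count))   # e.g. {a}, {b}, {c}, {a, c}, {b, c}
```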
What are the requirements for the candidates of the apriori algorithm?
- completeness (superset property): \(C_{k+1} \supseteq L_{k+1}\)
- selectiveness: significantly smaller than the set of all (k+1)-itemsets
What is meant by pruning in case of apriori-algorithm?
- applying the apriori principle: remove those candidates which contain a non-frequent k-subset s (\(s \notin L_k\))
Explain how the counting of candidate support can be supported by hash trees. Go into detail on how to search, insert, and count.
- useful when the support of a huge number of candidates has to be counted
- search: apply hash function
- insert: search for leaf node
- insert itemset into leaf node
- in case of overflow
- leaf becomes an internal node, distribute entries according to the hash function
- counting:
- root: compute hash for all items
- internal node: determine hash value, continue search
- leaf: check whether each stored itemset is contained in T (+1 if so)
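A simplified sketch of such a hash tree; the bucket count, the maximum leaf size and the per-transaction leaf de-duplication are implementation choices here, not from the card, and the candidates/transactions are made up:

```python
class HashTreeNode:
    def __init__(self, depth=0, max_leaf_size=3, n_buckets=3):
        self.depth, self.max_leaf_size, self.n_buckets = depth, max_leaf_size, n_buckets
        self.children = None    # None -> leaf
        self.entries = []       # leaf content: [itemset (sorted tuple), count] pairs

    def _child(self, itemset):
        bucket = hash(itemset[self.depth]) % self.n_buckets
        if bucket not in self.children:
            self.children[bucket] = HashTreeNode(self.depth + 1, self.max_leaf_size, self.n_buckets)
        return self.children[bucket]

    def insert(self, itemset):
        """Insert a candidate k-itemset; search for the leaf by hashing item by item."""
        self._insert([tuple(sorted(itemset)), 0])

    def _insert(self, entry):
        if self.children is not None:                    # internal node: hash and descend
            self._child(entry[0])._insert(entry)
            return
        self.entries.append(entry)                       # leaf: store the itemset
        if len(self.entries) > self.max_leaf_size and self.depth < len(entry[0]):
            # Overflow: the leaf becomes an internal node; redistribute by hash value.
            old, self.entries, self.children = self.entries, [], {}
            for e in old:
                self._child(e[0])._insert(e)

    def count(self, transaction, start=0, seen=None):
        """Add +1 to every stored candidate contained in `transaction` (sorted tuple)."""
        seen = set() if seen is None else seen
        if self.children is None:                        # leaf: explicit containment test
            if id(self) not in seen:                     # visit each leaf once per transaction
                seen.add(id(self))
                t = set(transaction)
                for entry in self.entries:
                    if set(entry[0]) <= t:
                        entry[1] += 1
            return
        for i in range(start, len(transaction)):         # internal: hash every remaining item
            child = self.children.get(hash(transaction[i]) % self.n_buckets)
            if child is not None:
                child.count(transaction, i + 1, seen)

    def results(self):
        if self.children is None:
            return [(e[0], e[1]) for e in self.entries]
        return [r for c in self.children.values() for r in c.results()]

tree = HashTreeNode(max_leaf_size=2)
for cand in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]:
    tree.insert(cand)
for T in [("a", "b", "c"), ("a", "c", "d"), ("b", "c", "d")]:
    tree.count(T)
print(tree.results())   # e.g. ('a', 'c') -> 2, ('b', 'c') -> 2, ('c', 'd') -> 2
```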
Name the performance bottlenecks of the apriori algorithm (2). What could be a solution?
- huge candidate sets (many frequent itemsets exist)
- multiple DB scans (n = length of the longest pattern)
- solution: avoid candidate generation >> FP-trees
What is the general idea of FP-tree?
Explain the steps.
- idea: create conditional databases associated with one frequent item to avoid costly DB scans
- scan DB once, order frequent items in descending order of frequency
- scan again
- keep only frequent items, in descending order
- does a common prefix exist?
- yes: increment the counters along the prefix, append the remaining suffix to the tree
- no: create a new branch for the transaction
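A compact sketch of the two scans and the prefix-sharing insertion; the node structure and header table are one possible implementation, and the toy database is made up:

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}          # item -> FPNode

def build_fptree(D, min_count):
    # 1st scan: count item frequencies, keep frequent items in descending order.
    freq = defaultdict(int)
    for T in D:
        for item in T:
            freq[item] += 1
    order = {i: r for r, (i, c) in enumerate(
        sorted(freq.items(), key=lambda x: -x[1])) if c >= min_count}

    root = FPNode(None, None)
    header = defaultdict(list)      # item -> list of its nodes (node-links)
    # 2nd scan: insert each transaction, reusing shared prefixes.
    for T in D:
        node = root
        for item in sorted((i for i in T if i in order), key=order.get):
            if item not in node.children:               # no common prefix: new branch
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]                  # common prefix: reuse the node
            node.count += 1
    return root, header

D = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]   # toy transactions
root, header = build_fptree(D, min_count=2)
print({item: [n.count for n in nodes] for item, nodes in header.items()})
```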
What are the benefits of FP-tree? (2)
- completeness: long pattern unbroken (complete information)
- compactness:
- reduce irrelevant information
- frequency in descending order
- compression (same prefix)
Explain how to mine in a FP-tree.
- recursively grow frequent pattern path (3)
- create conditional pattern base (3)
- start from header
- visit all nodes via links (node-link property = ...)
- accumulate all prefix paths with frequency
- construct the conditional FP-tree
- mine tree, grow frequent pattern (2)
- base case: if conditional FP-tree only contains single path: enumerate all pattern/combinations
- else: create conditional pattern base >> recursive call
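A small sketch of the base case only: when a conditional FP-tree consists of a single path, every combination of the path items joined with the current suffix is a frequent pattern, with support equal to the minimum count of the chosen items. The path and suffix below are made-up examples:

```python
from itertools import combinations

def patterns_from_single_path(path, suffix, min_count):
    """path: list of (item, count) along a single-path conditional FP-tree.
    Enumerate all combinations of path items, each extended by the suffix."""
    patterns = {}
    for k in range(1, len(path) + 1):
        for combo in combinations(path, k):
            items = frozenset(i for i, _ in combo) | suffix
            count = min(c for _, c in combo)
            if count >= min_count:
                patterns[items] = count
    return patterns

# Example: the conditional FP-tree for suffix {"c"} is the single path a(3) -> b(2).
print(patterns_from_single_path([("a", 3), ("b", 2)], frozenset({"c"}), min_count=2))
```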
What is the pattern growth property?
- "abcdef" is frequent iff "abcde" is frequent AND "f" is frequent in transactions containing "abcde" (conditional frequency)
What are the advantages of FP-growth? (4 + 1)
- no candidate generation
- compact data structure
- no repeated DB-scans
- operations: counting and tree construction
- >> faster than apriori
What is a closed / maximal frequent itemset?
- closed: for all \(Y \supset X: supp(Y) < supp(X)\)
- complete information regarding frequent itemsets
- maximal: for all \(Y \supset X: supp(Y) < minSup\)
- implies closed
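A small sketch checking the two definitions, assuming the frequent itemsets and their supports are already known; the values are made up:

```python
# Frequent itemsets with their support counts -- made-up example, minSup = 2.
supp = {frozenset("a"): 4, frozenset("b"): 3, frozenset("ab"): 3, frozenset("abc"): 2}
min_sup = 2

def is_closed(X):
    """No proper superset has the same support."""
    return all(s < supp[X] for Y, s in supp.items() if Y > X)

def is_maximal(X):
    """No proper superset is frequent (implies closed)."""
    return all(s < min_sup for Y, s in supp.items() if Y > X)

for X in supp:
    print(set(X), supp[X], "closed:", is_closed(X), "maximal:", is_maximal(X))
```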