Machine Learning

Vincenzo Serratore

realizable

if the hypothesis space contains the true function

Trade off Accuracy vs. Generalization

Tradeoff between a complex hypothesis that fits the training data well and a simpler hypothesis that generalizes well

Supervised learning (x,y)? y=f(x)? and h?

(x,y) input-output pair

y = f(x) true function to be approximated

h hypothesis

inductive learning

learning a general function or rule from specific observed examples (generalization)

deductive learning

going from a known general rule or function to a new rule that follows from it

reinforcement learning

The agent decides at the end on its own whether it acted correctly (example with a taxi driver: tip or no tip);

in other words, learning how to act or behave when given occasional rewards.

No free lunch theorem

No universally best model. A set of assumptions that works well for problem A does not necessarily work well for problem B. Different models lead to different algorithms.

Types of Machine Learning

Supervised: learning from input-output value pairs

Unsupervised: only inputs, finding structure in the data (e.g. clustering)

Reinforcement: learning how to act or behave from occasional rewards

What is Machine Learning

A set of methods that can automatically detect patterns in data and use them to perform predictions or other types of decision making.

Unsupervised Learning

Only input data (no output labels)

Detect patterns in the data (e.g. clustering)

There exists no obvious error metric

Generalization Error

Expected value of the misclassification rate when averaged over future data; it is estimated on a held-out test set.

Hypothesis Space

It contains all possible hypotheses that can be built with the chosen representation.

- Representation

- Evaluation

- Optimization

Representation: the classifier must be represented in some formal language; this defines the hypothesis space of the learner.

Evaluation: an objective function / scoring function is needed to distinguish good classifiers from bad ones.

Optimization: a method to search among the classifiers in the language for the highest-scoring one. The choice of optimization algorithm is key to performance.

Bias Variance Trade Off

If our model is too simple and has very few parameters, it may have high bias and low variance. On the other hand, if our model has a large number of parameters, it is going to have high variance and low bias. So we need to find the right balance without overfitting or underfitting the data.

This tradeoff in complexity is why there is a tradeoff between bias and variance. An algorithm can’t be more complex and less complex at the same time.
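
For reference, the standard bias-variance decomposition of the expected squared prediction error (not part of the original card; regression setting with noise variance \sigma^2) makes this trade-off explicit:

\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
\]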

-True Error 

-Empirical Error

True Error: the probability that h misclassifies an example drawn at random from the underlying distribution D. The true error is not observable directly.

Empirical Error: the proportion of examples from a sample S drawn from D that are misclassified by h. It approximates the true error better with more data.
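
Written out (standard definitions, not from the original card; f is the true function, h the hypothesis, and S a sample of m examples drawn from the distribution D):

\[
\text{error}_D(h) = \Pr_{x \sim D}\big[h(x) \neq f(x)\big], \qquad
\text{error}_S(h) = \frac{1}{m} \sum_{x \in S} \mathbf{1}\big[h(x) \neq f(x)\big]
\]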

Overfitting

Overfitting becomes more likely as the size of the hypothesis space and the number of input attributes grow, and less likely as we increase the number of training examples.

- The distributions of the training set and the test set are not the same.

AnTeDe: This is often an indication of overfitting (or that the data in the validation set does not match the underlying data of the training set). In this lab, however, the initial parameters are badly chosen and the model itself also leaves room for improvement.

Decision Tree pruning

Pruning combats overfitting by eliminating nodes that are not clearly relevant.
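
As a sketch only (not from the original card), cost-complexity pruning in scikit-learn is one concrete form of this idea; the dataset, split, and selection on the test set are illustrative placeholders (normally a validation set would be used):

```python
# Minimal sketch of cost-complexity pruning with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compute the effective alphas at which whole subtrees get pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Fit one tree per alpha and keep the one that generalizes best.
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_test, y_test)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"best ccp_alpha={best_alpha:.4f}, held-out accuracy={best_score:.3f}")
```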

Wrapper (model selection)

The wrapper enumerates models according to a parameter, e.g. size. For each size, it uses cross-validation on the learner to compute the average error rate on the training and test sets. The cross-validation procedure then selects the hypothesis with the lowest validation-set error.
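
A minimal sketch of this idea, assuming scikit-learn and using polynomial degree as the "size" parameter; the toy data and degree range are placeholders:

```python
# Wrapper-style model selection: enumerate models by size, score each by cross-validation.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

best_degree, best_score = None, -np.inf
for degree in range(1, 11):                       # enumerate models by "size"
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # 5-fold cross-validation estimates the validation performance for this size.
    score = cross_val_score(model, X, y, cv=5).mean()
    if score > best_score:
        best_degree, best_score = degree, score

print(f"selected degree={best_degree} (mean CV score={best_score:.3f})")
```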

Regularization

Explicitly penalize complex hypotheses: look for a function that is more regular / less complex. The total cost combines a loss function and a complexity (regularization) function.
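
In formula form (standard regularized cost, not from the original card; \lambda controls the strength of the penalty):

\[
\text{Cost}(h) = \text{Loss}(h) + \lambda \cdot \text{Complexity}(h), \qquad
h^{*} = \operatorname*{arg\,min}_{h \in \mathcal{H}} \text{Cost}(h)
\]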

PAC: probably approximately correct

The underlying principle is that any hypothesis that is seriously wrong will almost certainly be "found out" with high probability after a small number of examples, because it will make an incorrect prediction. Thus, any hypothesis that is consistent with a sufficiently large set of training examples is unlikely to be seriously wrong: that is, it must be probably approximately correct.
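
The corresponding standard sample-complexity bound for a finite hypothesis space (not in the original card) states that any hypothesis consistent with

\[
N \;\geq\; \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln\lvert\mathcal{H}\rvert\right)
\]

training examples has true error at most \epsilon with probability at least 1 - \delta.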

Batch / Batch Size

When all training samples are used to create one batch, the learning algorithm is called batch gradient descent. When the batch is the size of one sample, the learning algorithm is called stochastic gradient descent. When the batch size is more than one sample and less than the size of the training dataset, the learning algorithm is called mini-batch gradient descent.

  • Batch Gradient Descent. Batch Size = Size of Training Set
  • Stochastic Gradient Descent. Batch Size = 1
  • Mini-Batch Gradient Descent. 1 < Batch Size < Size of Training Set
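
As a sketch (not from the original cards), the batch size simply controls how the training set is sliced before each parameter update; assuming NumPy, the toy data below is illustrative:

```python
# Batch size determines how many parameter updates happen per pass over the data.
import numpy as np

def iterate_batches(X, y, batch_size):
    """Yield (X_batch, y_batch) slices of the training data."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

X = np.arange(12).reshape(6, 2)   # toy training set with 6 samples
y = np.arange(6)

# Batch GD:       batch_size == len(X)  -> 1 update per epoch
# Stochastic GD:  batch_size == 1       -> len(X) updates per epoch
# Mini-batch GD:  1 < batch_size < len(X), e.g. 2 -> 3 updates per epoch
for batch_size in (len(X), 1, 2):
    n_updates = sum(1 for _ in iterate_batches(X, y, batch_size))
    print(f"batch_size={batch_size}: {n_updates} parameter update(s) per epoch")
```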

Epoch

The number of epochs is a hyperparameter that defines the number times that the learning algorithm will work through the entire training dataset.

One epoch means that each sample in the training dataset has had an opportunity to update the internal model parameters. An epoch is comprised of one or more batches. For example, as above, an epoch that has one batch is called the batch gradient descent learning algorithm.

You can think of a for-loop over the number of epochs where each loop proceeds over the training dataset. Within this for-loop is another nested for-loop that iterates over each batch of samples, where one batch has the specified “batch size” number of samples.

The number of epochs is traditionally large, often hundreds or thousands, allowing the learning algorithm to run until the error from the model has been sufficiently minimized. You may see examples of the number of epochs in the literature and in tutorials set to 10, 100, 500, 1000, and larger.
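
The nested-loop structure described above, as a minimal sketch assuming NumPy; the linear model, squared-error gradient, and hyperparameter values are illustrative placeholders, not taken from the cards:

```python
# Outer loop over epochs, inner loop over mini-batches, one update per batch.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

w = np.zeros(3)            # model parameters
learning_rate = 0.05
batch_size = 10
n_epochs = 200             # each epoch is one full pass over the training set

for epoch in range(n_epochs):
    # inner loop: one parameter update per mini-batch
    for start in range(0, len(X), batch_size):
        Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)   # gradient of the mean squared error
        w -= learning_rate * grad

print("learned weights:", np.round(w, 3))
```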