Jonas Wagner machine_learning_and_malware.pdf


Card set details

Cards 15
Language English
Category Technology
Level University
Created / Updated 19.06.2019 / 19.06.2019
Licensing Not specified
Web link
https://card2brain.ch/box/20190619_jonas_wagner_machinelearningandmalware_pdf

What is unsupervised learning?

  • Learn the inherent structure of data without labels
  • We are given features only
  • Used for clustering and dimensionality reduction
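
As an illustration of clustering on unlabeled data, here is a minimal k-means sketch in pure Python. It is not taken from the cards: the toy points, the function name `kmeans`, and the choice to use the first k points as starting centroids are all assumptions made for this example.

```python
import math

def kmeans(points, k, iterations=10):
    """Minimal k-means: group unlabeled points into k clusters."""
    # Assumption: the first k points serve as initial centroids, which keeps
    # the example deterministic (real implementations usually pick randomly).
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
              for p in points]
    return centroids, labels

# Features only, no labels: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (0.9, 1.1), (9.1, 8.9), (8.8, 9.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The algorithm never sees a "correct" answer; the cluster indices it returns are arbitrary group names discovered from the structure of the data.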

What are the 4 stages of a machine learning pipeline?

  1. Collect examples (e.g. SPAM and not-SPAM mails). These examples are used to train the machine learning system.
  2. Extract features from each training example to represent the example as an array of numbers. This step also includes research to design good features that will help a machine learning system make accurate inferences.
  3. Train the machine learning system using the features we have extracted, e.g. to recognize SPAM or not-SPAM mails.
  4. Test the system on some examples not included in the training examples.
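
The four stages above can be sketched end to end with a toy spam filter. Everything here is an assumption for illustration: the mails are invented, `SPAM_WORDS` is a hypothetical hand-designed feature, and the nearest-centroid model is just one simple choice of learning algorithm.

```python
import math

# Stage 1: collect labeled examples (toy data invented for this sketch).
train_mails = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
# Held-out examples that are NOT part of the training set (used in stage 4).
test_mails = [
    ("claim your free prize", "spam"),
    ("agenda for lunch meeting", "ham"),
]

# Stage 2: extract features, turning each mail into an array of numbers.
SPAM_WORDS = ("free", "win", "prize", "money", "click")  # hypothetical feature

def extract_features(text):
    words = text.lower().split()
    return [sum(w in SPAM_WORDS for w in words), len(words)]

# Stage 3: train. Here the "model" is the mean feature vector per label.
def train(examples):
    sums, counts = {}, {}
    for text, label in examples:
        feats = extract_features(text)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(model, text):
    feats = extract_features(text)
    # Pick the label whose mean feature vector is closest to this mail.
    return min(model, key=lambda label: math.dist(feats, model[label]))

# Stage 4: test on examples the model has not seen during training.
model = train(train_mails)
correct = sum(predict(model, text) == label for text, label in test_mails)
accuracy = correct / len(test_mails)
print(f"accuracy on held-out mails: {accuracy:.2f}")
```

With only a handful of mails the accuracy figure means little; the point is the separation between training data and test data.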

What is done in the collecting phase of the machine learning pipeline and what is important?

  • Quality of data is very important
    • Data scientists spend a lot of time collecting and cleaning up data
    • The better and more diverse the data, the better the model
  • If the data does not contain certain cases, the model won't recognize them, e.g. it can't detect cats in pictures if the training data doesn't contain pictures of cats.
  • Needs to have labels
    • This is a huge problem in IT security: e.g. given 1 million binaries to train a good/bad classifier, how do you know which ones are actually good or bad?

What are the characteristics of the extract phase of the machine learning pipeline?

  • A machine learning algorithm needs features of examples to work with. For a good/bad classifier on files this might be:
    • Is it digitally signed?
    • Are the file headers malformed?
    • Is the entropy of the file high? (could indicate encryption)
  • Challenge: Needs a lot of domain knowledge
    • What are good features?
    • Is every feature of the same importance?
    • How do I handle a large number of features?
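
To make the file features above concrete, here is a small sketch that turns raw bytes into a numeric feature vector. It covers the entropy feature and a very crude header check. The 7.0 bits-per-byte threshold and the `MZ` magic-byte test (the first two bytes of Windows executables) are assumptions chosen for illustration, not values from the cards, and signature checking is left out because it needs a format-specific parser.

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy in bits per byte: 0.0 (uniform content) to 8.0 (random-looking)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def extract_features(data):
    """Represent a file as an array of numbers for a good/bad classifier."""
    entropy = shannon_entropy(data)
    return [
        entropy,
        # Assumed threshold: values near 8 suggest encryption or compression.
        float(entropy > 7.0),
        # Crude stand-in for "are the file headers malformed?":
        # only checks whether the file starts with the bytes "MZ".
        float(data[:2] == b"MZ"),
    ]

low = extract_features(b"MZ" + b"\x00" * 1000)   # mostly padding
high = extract_features(bytes(range(256)) * 4)   # every byte value equally often
print(low, high)
```

High entropy alone is not proof of malice, since legitimately compressed files also score high; it is one signal among several that the classifier has to weigh.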