Jonas Wagner machine_learning_and_malware.pdf


Card set details

Cards 15
Language English
Category Technology
Level University
Created / Updated 19.06.2019 / 19.06.2019
Licensing Not specified
Web link
https://card2brain.ch/box/20190619_jonas_wagner_machinelearningandmalware_pdf

What is machine learning?

Machine learning is the science of getting computers to learn and act like humans do, and to improve their learning over time in an autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.

What are the benefits of machine learning?

  • Autonomy and scale: models scale easily and are better than humans at specific learned tasks.
  • Adaptability and flexibility
    • Attackers change their behavior, so should defenders
    • Autonomous readjustment based on new data
  • Makes certain techniques possible that would not be feasible without machine learning.

What are the drawbacks of machine learning?

  • High data dependency
    • The system is only as good as the data it was trained on
    • Easy to overfit
  • Good models are not necessarily good for production
    • Models can be too slow and complex
  • Blackbox results are difficult to interpret
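
The "easy to overfit" point can be illustrated with a small sketch. It assumes a hypothetical one-dimensional toy data set in which the true rule is "label 1 if x >= 5" and two training labels are deliberately wrong (noise). The function names and all numbers are made up for illustration. A model that memorizes the training set scores perfectly on it but does worse on unseen data than a much simpler model.

```python
def one_nearest_neighbor(train, x):
    """Memorizing model: return the label of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def fit_threshold(train):
    """Simpler model: a single cut halfway between the two class means."""
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def accuracy(predict, data):
    """Fraction of (x, label) pairs that predict() gets right."""
    return sum(predict(x) == y for x, y in data) / len(data)

# Hypothetical data. The labels of x=4 and x=9 are noise (they contradict the true rule).
train = [(1, 0), (2, 0), (3, 0), (4, 1), (6, 1), (7, 1), (8, 1), (9, 0)]
test = [(3.8, 0), (4.3, 0), (8.8, 1), (9.4, 1), (1.5, 0), (6.5, 1)]

threshold = fit_threshold(train)

def memorizer(x):
    return one_nearest_neighbor(train, x)

def simple_model(x):
    return 1 if x >= threshold else 0

memorizer_train_acc = accuracy(memorizer, train)   # fits the noise perfectly
memorizer_test_acc = accuracy(memorizer, test)     # but generalizes poorly
simple_train_acc = accuracy(simple_model, train)   # ignores the noisy points
simple_test_acc = accuracy(simple_model, test)

print(memorizer_train_acc, memorizer_test_acc)
print(simple_train_acc, simple_test_acc)
```

On this toy data the memorizing model reproduces the two noisy labels near x=4 and x=9, which is exactly what hurts it on the unseen points.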

What is supervised learning?

  • Learn a function that maps input to output
  • Learning this function requires labeled training data
  • We are given features and labels
  • Used for classification and regression
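
As a minimal sketch of "learning a function from features and labels", the following uses a nearest-centroid classifier. This is only one of many possible supervised methods and was chosen here for brevity. The feature meaning (number of links, number of exclamation marks) and the toy data are hypothetical.

```python
from collections import defaultdict

def train_nearest_centroid(features, labels):
    """Learn one centroid (mean feature vector) per label."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Map an unseen input to the label of the closest centroid."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda y: sq_dist(centroids[y], x))

# Hypothetical labeled data: [number of links, number of exclamation marks]
features = [[8, 5], [7, 6], [9, 4], [1, 0], [0, 1], [2, 0]]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

centroids = train_nearest_centroid(features, labels)
prediction_a = predict(centroids, [6, 5])
prediction_b = predict(centroids, [1, 1])
print(prediction_a, prediction_b)
```

The labels are only needed while training. Afterwards the learned centroids alone map new feature vectors to an output.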

What is unsupervised learning?

  • Learn the inherent structure of data without labels
  • We are given features only
  • Used for clustering and dimensionality reduction
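
Clustering can be sketched with a plain k-means loop. This is a simplified illustration, not a production implementation: the first k points serve as initial centroids (an assumption made to keep the example deterministic), and the points are hypothetical. Note that no labels are passed in anywhere.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points are used as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every point to its closest centroid.
        assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move every centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment

# Hypothetical unlabeled data with two visually obvious groups.
points = [[1.0, 1.0], [9.0, 9.0], [1.5, 2.0], [8.0, 8.5], [1.0, 0.5], [9.5, 8.0]]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```

The cluster indices carry no meaning by themselves. Interpreting what each group represents is left to the analyst.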

What are the 4 stages of a machine learning pipeline?

  1. Collect examples (e.g. SPAM and not-SPAM mails). These examples are used to train the machine learning system.
  2. Extract features from each training example to represent the example as an array of numbers. This step also includes research to design good features that will help a machine learning system make accurate inferences.
  3. Train the machine learning system using the features we have extracted, e.g. to recognize SPAM or not-SPAM mails.
  4. Test the system on some examples not included in the training examples.
  4. Test the system on some examples not included in the training examples.
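
The four stages can be sketched end to end on a tiny SPAM example. Everything here is an illustrative assumption: the mails are invented, the features are simple word counts, and the "training" is a deliberately naive scoring scheme (spam counts minus not-SPAM counts per word) rather than any particular established algorithm.

```python
def extract_features(text, vocabulary):
    """Stage 2: represent a mail as an array of word counts."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

def train(feature_rows, labels):
    """Stage 3: one weight per feature, SPAM counts minus not-SPAM counts."""
    weights = [0] * len(feature_rows[0])
    for row, label in zip(feature_rows, labels):
        sign = 1 if label == 1 else -1
        weights = [w + sign * value for w, value in zip(weights, row)]
    return weights

def predict(weights, row):
    """Score a feature array; a positive score means SPAM (1)."""
    score = sum(w * value for w, value in zip(weights, row))
    return 1 if score > 0 else 0

# Stage 1: collect labeled examples (1 = SPAM, 0 = not-SPAM). Hypothetical mails.
training_mails = [
    ("win free money now", 1),
    ("free prize click now", 1),
    ("meeting agenda for monday", 0),
    ("lunch on monday", 0),
]
# Examples that are NOT part of the training set, kept back for stage 4.
held_out_mails = [("free money prize", 1), ("agenda for lunch", 0)]

vocabulary = sorted({w for text, _ in training_mails for w in text.lower().split()})
train_rows = [extract_features(text, vocabulary) for text, _ in training_mails]
train_labels = [label for _, label in training_mails]
weights = train(train_rows, train_labels)

# Stage 4: test on the held-out examples.
predictions = [predict(weights, extract_features(text, vocabulary))
               for text, _ in held_out_mails]
accuracy = sum(p == label for p, (_, label) in zip(predictions, held_out_mails))
accuracy /= len(held_out_mails)
print(predictions, accuracy)
```

Keeping the held-out mails strictly separate from the training mails is the point of stage 4: accuracy measured on training data would say little about unseen mails.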

What is done in the collecting phase of the machine learning pipeline and what is important?

  • Quality of data is very important
    • Data scientists spend a lot of time collecting and cleaning up data
    • The better and more diverse the data, the better the model
  • If the data does not contain certain cases, the model won't recognize them, e.g. it can't detect cats in pictures if the training data doesn't contain pictures of cats.
  • Needs to have labels
    • This is a huge problem in IT security: e.g. given 1 million binaries to train a good/bad classifier, how do you know which ones are actually good or bad?

What are the characteristics of the extract phase of the machine learning pipeline?

  • A machine learning algorithm needs features of the examples to work with. For a good/bad classifier on files these might be:
    • Is it digitally signed?
    • Are the file headers malformed?
    • Is the entropy of the file high? (could indicate encryption)
  • Challenge: Needs a lot of domain knowledge
    • What are good features?
    • Is every feature of the same importance?
    • How do I handle a large number of features?
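
The entropy feature mentioned above can be sketched as follows. The code computes the Shannon entropy of a byte string in bits per byte and turns it into a small feature array. The function names and the threshold of 7.0 are assumptions made for illustration. The other example features (digital signature, malformed headers) need file-format parsing and are not covered here.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

def extract_features(data: bytes, entropy_threshold: float = 7.0) -> list:
    """Return [entropy, high-entropy flag]; the threshold is an illustrative guess."""
    entropy = shannon_entropy(data)
    return [entropy, 1.0 if entropy > entropy_threshold else 0.0]

low = shannon_entropy(b"A" * 1024)               # one repeated byte value
high = shannon_entropy(bytes(range(256)) * 4)    # every byte value equally often
features = extract_features(bytes(range(256)) * 4)
print(low, high, features)
```

High entropy alone is only a hint, since compressed files also score high, which is one reason why choosing and weighting features needs domain knowledge.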