Jonas Wagner machine_learning_and_malware.pdf


Deck details

Cards	15
Language	English
Category	Technology
Level	University
Created / Updated	19.06.2019 / 19.06.2019
Weblink	https://card2brain.ch/box/20190619_jonas_wagner_machinelearningandmalware_pdf

Card list

What is machine learning?

Machine learning is the science of getting computers to learn and act like humans do, and to improve their learning over time in an autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.

What are the benefits of machine learning?

Autonomy and scale: systems scale easily and are better than humans at specific learned tasks.
Adaptability and flexibility
- Attackers change their behavior, so should defenders
- Autonomous readjustment based on new data
Makes certain techniques possible that would not be feasible without machine learning.

What are the drawbacks of machine learning?

High data dependency
- The system is only as good as the data it was trained on
- Easy to overfit
Good models are not necessarily good for production
- Models can be too slow and complex
Black-box results are difficult to interpret

What is supervised learning?

Learn a function that maps inputs to outputs
Learning (inferring) this function requires labeled data
We are given features and labels
Used for classification and regression
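As a minimal sketch of the supervised setting, assume hypothetical feature vectors `[is_signed, header_malformed, entropy / 8]` with the labels "good" and "bad". A nearest-centroid classifier is used here only because it is short; it is not presented as the method from the lecture. The labels are what let the mapping from input to output be learned:

```python
from math import dist


def train_nearest_centroid(features, labels):
    """Learn one centroid (mean feature vector) per label from labeled examples."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Map an input to the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical labeled training data: features and labels are both given
X = [[1, 0, 0.55], [1, 0, 0.60], [0, 1, 0.95], [0, 1, 0.90]]
y = ["good", "good", "bad", "bad"]

model = train_nearest_centroid(X, y)
print(predict(model, [0, 1, 0.92]))  # bad
```

Predicting a category like this is classification; regression would instead predict a number.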

What is unsupervised learning?

Learn the inherent structure of data without labels
We are given features only
Used for clustering and dimensionality reduction
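For contrast, a minimal unsupervised sketch: plain k-means (Lloyd's algorithm) receives only hypothetical feature vectors, no labels, and groups them into k clusters. Initialising the centroids with the first k points is an assumption made to keep the example deterministic; real implementations usually use smarter or randomised initialisation.

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Group unlabeled points into k clusters with Lloyd's k-means."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centroids[i], p))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, clusters


# Hypothetical unlabeled feature vectors: only features are given
points = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.1], [0.85, 0.95], [0.2, 0.15], [0.95, 0.9]]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

The algorithm discovers the two groups on its own, but it cannot say which group is "good" or "bad"; that interpretation still needs a human or labels.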

What are the 4 stages of a machine learning pipeline?

Collect examples (e.g. SPAM and not-SPAM mails). These examples are used to train the machine learning system.
Extract features from each training example to represent the example as an array of numbers. This step also includes research to design good features that will help a machine learning system make accurate inferences.
Train the machine learning system using the extracted features, e.g. to recognize SPAM or not-SPAM mails.
Test the system on some examples not included in the training examples.
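The four stages can be sketched end to end on toy data. Everything here is an assumption for illustration: the example mails, the hypothetical `SPAM_WORDS` list, the three features, and the choice of a perceptron as the training algorithm.

```python
import re

# 1. Collect: hypothetical labeled mails (1 = SPAM, 0 = not SPAM)
train_mails = [
    ("WIN a FREE prize now", 1),
    ("free money, click now", 1),
    ("Meeting moved to 3pm", 0),
    ("Lunch tomorrow?", 0),
]
test_mails = [("Click now to win free cash", 1), ("Agenda for the meeting", 0)]

# Assumed word list; designing good features is part of the extract step
SPAM_WORDS = {"free", "win", "prize", "click", "now", "money", "cash"}


# 2. Extract: represent each mail as an array of numbers
def extract(text):
    words = re.findall(r"[a-z]+", text.lower())
    spam_hits = sum(w in SPAM_WORDS for w in words)
    return [spam_hits, len(words), 1.0]  # last entry acts as a bias term


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


# 3. Train: perceptron rule, nudging the weights after every mistake
def train(examples, epochs=20):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in examples:
            error = y - predict(w, x)  # 0 if correct, +1 or -1 if wrong
            w = [wi + error * xi for wi, xi in zip(w, x)]
    return w


w = train([(extract(m), y) for m, y in train_mails])

# 4. Test: accuracy on mails that were not used for training
correct = sum(predict(w, extract(m)) == y for m, y in test_mails)
print(f"weights={w}, test accuracy={correct / len(test_mails)}")
```

With real data, the test step is where a model that only memorised its training mails would be exposed.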

What is done in the collecting phase of the machine learning pipeline and what is important?

Quality of data is very important
- Data scientists spend a lot of time collecting and cleaning up data
- The better and more diverse the data, the better the model
If the data does not contain certain cases, the model won't recognize them, e.g. it can't detect cats in pictures if the training data doesn't contain pictures of cats.
Needs to have labels
- This is a huge problem in IT security: e.g. given 1 million binaries to train a good/bad classifier, how do you know which ones are actually good or bad?

What are the characteristics of the extract phase of the machine learning pipeline?

A machine learning algorithm needs features of examples to work with; for a good/bad classifier on files these might be:
- Is it digitally signed?
- Are the file headers malformed?
- Is the entropy of the file high? (could indicate encryption)
Challenge: needs a lot of domain knowledge
- What are good features?
- Is every feature of the same importance?
- How do I handle a large number of features?
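The entropy feature from the list above can be computed as Shannon entropy over the file's byte values. This is a minimal sketch; any cut-off for what counts as "high" entropy would be an assumption to tune on real data. Values close to 8 bits per byte are typical of compressed or encrypted content.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # iterating bytes yields ints 0..255
    # sum of p * log2(1/p) over all byte values that occur
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(byte_entropy(b"\x00" * 1024))     # 0.0 (one repeated byte)
print(byte_entropy(bytes(range(256))))  # 8.0 (every byte value once)
```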

What are the characteristics of the training phase in the machine learning pipeline?

Based on the extracted features (and optionally labels), a model can be trained with an algorithm.
The choice of algorithm always depends on what you're trying to achieve.

What are the characteristics of the testing phase in the machine learning pipeline?

After training, the quality of the model is tested with test examples.
This is usually done by splitting the collected data into a training set (80%) and a test set (20%). Without a held-out test set, over- or underfitting goes undetected.
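A minimal sketch of the 80/20 split, written as a hypothetical standalone helper (scikit-learn provides `sklearn.model_selection.train_test_split` for the same purpose). Shuffling with a fixed seed is an assumption so that the split is random but reproducible.

```python
import random


def train_test_split(examples, test_ratio=0.2, seed=42):
    """Shuffle a copy of the examples and hold out test_ratio of them as the test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train_set, test_set = train_test_split(range(100))
print(len(train_set), len(test_set))  # 80 20
```

Shuffling matters when the collected data is ordered (e.g. all malicious samples first); otherwise the test set might contain only one class.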

What is the difference between overfitting and underfitting?

Underfitting: the model does not fit the data set well, which is bad for detection.

Overfitting: the model matches the training data too exactly and will not detect other, similar samples.
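Both failure modes can be shown on hypothetical two-feature samples. The "underfit" model ignores its input, the "overfit" model memorises exact training vectors (much like an exact-match signature), and a 1-nearest-neighbour model stands in for one that generalises; all three are illustrative choices, not methods taken from the lecture.

```python
from math import dist

# Hypothetical training data: (feature vector, label), 1 = malicious, 0 = benign
train = [([0.9, 0.8], 1), ([0.85, 0.95], 1),
         ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.15, 0.15], 0)]
variant = [0.88, 0.82]  # slightly modified copy of a known malicious sample


def underfit(x):
    """Too simple: ignores the input and always predicts the majority label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)


memory = {tuple(x): y for x, y in train}


def overfit(x):
    """Too exact: only recognises vectors seen verbatim in training, else 'benign'."""
    return memory.get(tuple(x), 0)


def nearest_neighbour(x):
    """Generalises: returns the label of the closest training example."""
    return min(train, key=lambda ex: dist(ex[0], x))[1]


for model in (underfit, overfit, nearest_neighbour):
    train_acc = sum(model(x) == y for x, y in train) / len(train)
    print(f"{model.__name__}: training accuracy {train_acc}, variant -> {model(variant)}")
```

The underfitted model is already poor on the training data (0.6), while the overfitted one scores 1.0 on training data yet misses the similar variant.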

What is cross-validation?

Split the dataset into k parts and run k tests, each time holding out a different part as the test (or validation) set and training on the rest; then average the scores of all tests.
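A minimal sketch of k-fold cross-validation. The contiguous folds, the single hypothetical feature, and the threshold "model" (halfway between the two class means) are assumptions chosen to keep the example small; any `fit`/`score` pair could be plugged in.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of (almost) equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(X, y, k, fit, score):
    """Train k times, each time holding out a different fold, and average the scores."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in fold], [y[i] for i in fold]))
    return mean(scores)


def fit_threshold(xs, ys):
    """Toy model: a threshold halfway between the two class means."""
    mean0 = mean(x for x, c in zip(xs, ys) if c == 0)
    mean1 = mean(x for x, c in zip(xs, ys) if c == 1)
    return (mean0 + mean1) / 2


def accuracy(threshold, xs, ys):
    """Fraction of examples where 'x > threshold' matches the label."""
    return sum((x > threshold) == bool(c) for x, c in zip(xs, ys)) / len(xs)


# Hypothetical single feature with labels, interleaved so every fold holds both classes
X = [1, 9, 2, 8, 3, 7, 1.5, 8.5, 2.5, 7.5]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(cross_validate(X, y, k=5, fit=fit_threshold, score=accuracy))  # 1.0
```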

What is done in a machine learning cycle?

This pipeline usually gets repeated many times until a model with good accuracy is found.
The world keeps changing, e.g. new attacker data appears frequently, which means the model needs to be retrained with new data.

What is the difference between machine learning and deep learning?

Machine learning: feature extraction is done by humans.

Deep learning: feature extraction is also done by the machine; the model learns the features itself.

What are the applications of machine learning on malware?

Classification of PE files
Detecting malicious code loading in processes
Detecting code similarities
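One common way to detect code similarities, shown here as a hedged sketch rather than the lecture's method, is to compare the sets of byte n-grams of two code samples with the Jaccard index. The byte strings and n = 4 are hypothetical choices.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """All overlapping n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard index |A & B| / |A | B| of the two n-gram sets (1.0 = identical sets)."""
    A, B = ngrams(a, n), ngrams(b, n)
    if not A and not B:
        return 1.0  # two inputs too short to have any n-grams count as identical
    return len(A & B) / len(A | B)


original = b"abcdefgh"
variant = b"abcdXfgh"  # one byte changed
print(jaccard(original, original))  # 1.0
print(jaccard(original, variant))   # 0.111... (1 shared 4-gram out of 9)
```

A single changed byte breaks every n-gram that covers it, so the choice of n trades robustness to small edits against specificity.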
