Jonas Wagner machine_learning_and_malware.pdf

Set of flashcards Details

Flashcards	15
Language	English
Category	Technology
Level	University
Created / Updated	19.06.2019 / 19.06.2019
Weblink	https://card2brain.ch/box/20190619_jonas_wagner_machinelearningandmalware_pdf

What is unsupervised learning?

Learn the inherent structure of data without labels
We are given features only
Used for clustering and dimensionality reduction
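
As a minimal sketch of clustering without labels, the following pure-Python k-means groups one-dimensional feature values into k clusters. The function name `kmeans_1d`, the sample file sizes, and the way the initial centers are picked are assumptions made for illustration; they are not taken from the flashcards.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster numbers into k groups and return the sorted cluster centers."""
    # Assumption: start with k centers spread evenly over the sorted values.
    ordered = sorted(values)
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centers = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)


# Hypothetical feature: file sizes in KB, with no labels attached.
file_sizes_kb = [10, 12, 11, 500, 510, 495]
centers = kmeans_1d(file_sizes_kb, k=2)
print(centers)
```

No label is ever passed in; the two groups emerge only from the feature values.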

What are the 4 stages of a machine learning pipeline?

Collect examples (e.g. SPAM and not-SPAM mails). These examples are used to train the machine learning system.
Extract features from each training example to represent the example as an array of numbers. This step also includes research to design good features that will help a machine learning system make accurate inferences.
Train the machine learning system on the extracted features, e.g. to recognize SPAM and not-SPAM mails.
Test the system on some examples not included in the training examples.
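
The four stages can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the mails are made up, the two features (exclamation marks and occurrences of "free") are hypothetical, and the nearest-centroid "model" is a stand-in chosen for brevity, not a method named in the flashcards.

```python
# Stage 1: collect labeled examples (made-up mails; 1 = SPAM, 0 = not-SPAM).
train_mails = [
    ("WIN money now!!! free free", 1),
    ("free prize!!! click now", 1),
    ("Meeting moved to 3pm", 0),
    ("Please review the attached report", 0),
]
test_mails = [
    ("free money!!!", 1),
    ("Lunch tomorrow?", 0),
]


# Stage 2: represent each example as an array of numbers.
def extract_features(text):
    return [text.count("!"), text.lower().count("free")]


# Stage 3: train. Here the model is one mean feature vector per label.
def train(examples):
    sums, counts = {}, {}
    for text, label in examples:
        features = extract_features(text)
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(model, text):
    features = extract_features(text)

    def distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))

    # Pick the label whose centroid is closest to the mail's features.
    return min(model, key=lambda label: distance(model[label]))


# Stage 4: test on examples that were not part of the training set.
model = train(train_mails)
correct = sum(predict(model, text) == label for text, label in test_mails)
accuracy = correct / len(test_mails)
print(accuracy)
```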

What is done in the collecting phase of the machine learning pipeline and what is important?

Quality of data is very important
- Data scientists spend a lot of time collecting and cleaning up data
- The better and more diverse the data, the better the model
If the data does not contain certain cases, the model won't recognize them, e.g. it can't detect cats in pictures if the training data doesn't contain pictures of cats.
The data needs to have labels
- This is a huge problem in IT security, e.g. given 1 million binaries to train a good/bad classifier, how do you know which ones are actually good or bad?

What are the characteristics of the extract phase of the machine learning pipeline?

A machine learning algorithm needs features of the examples to work with. For a good/bad classifier on files these might be:
- Is it digitally signed?
- Are the file headers malformed?
- Is the entropy of the file high? (could indicate encryption)
Challenge: Needs a lot of domain knowledge
- What are good features?
- Is every feature of the same importance?
- How do I handle a large number of features?
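
The entropy feature mentioned above can be computed directly from a file's bytes. The following is a minimal sketch of Shannon entropy in bits per byte; the function name `shannon_entropy` and the two sample inputs are assumptions for illustration, and no particular cut-off for "high" entropy is implied.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the relative frequency p of each byte value.
    return sum((n / total) * math.log2(total / n)
               for n in Counter(data).values())


# A single repeated byte carries no information per byte.
low = shannon_entropy(b"A" * 1024)
# Every byte value equally often: the maximum of 8 bits per byte.
high = shannon_entropy(bytes(range(256)) * 4)
print(low, high)
```

Encrypted data tends toward the upper end of this range, which is why the value is useful as one feature among several rather than as a verdict on its own.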

What are the characteristics of the training phase in the machine learning pipeline?

Based on the extracted features (and optionally labels), a model can be trained with an algorithm.
The choice of the algorithm is always based on what you're trying to achieve.

What are the characteristics of the testing phase in the machine learning pipeline?

After training, the quality of the model is tested with test examples.
Usually done by splitting the collected data into a training set (80%) and a test set (20%). Without a separate test set, overfitting or underfitting goes unnoticed.
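
A minimal sketch of such an 80/20 split, using only the standard library. The function name `train_test_split`, the fixed seed, and the placeholder samples are assumptions for illustration.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of examples and return (training, test) lists."""
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(100))  # placeholder for 100 collected examples
training, test = train_test_split(samples)
print(len(training), len(test))
```

Shuffling before splitting matters when the collected data is ordered, for example all malicious samples first, because otherwise the test set would contain only one class.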

What is the difference between overfitting and underfitting?

Underfitted -> the model doesn't match the data set well, which is bad for detection.

Overfitted -> the model matches the training data too exactly and will not detect other samples that are similar.
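
The two failure modes can be made concrete with three toy models on one made-up feature (say, file entropy, with 1 = malicious and 0 = benign). All values and model definitions below are assumptions for illustration only.

```python
train = [(7.8, 1), (7.5, 1), (7.9, 1), (3.1, 0), (4.0, 0), (2.5, 0)]
test = [(7.7, 1), (7.6, 1), (3.5, 0), (4.2, 0)]


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


# Underfitted: ignores the feature and always answers "benign".
def underfit(x):
    return 0


# Overfitted: memorizes the exact malicious training values, flags nothing else.
memorized = {x for x, y in train if y == 1}


def overfit(x):
    return 1 if x in memorized else 0


# Balanced: one threshold halfway between the two class means.
mean_bad = sum(x for x, y in train if y == 1) / 3
mean_good = sum(x for x, y in train if y == 0) / 3
threshold = (mean_bad + mean_good) / 2


def balanced(x):
    return 1 if x > threshold else 0


for name, model in [("underfit", underfit), ("overfit", overfit),
                    ("balanced", balanced)]:
    print(name, accuracy(model, train), accuracy(model, test))
```

The memorizing model is perfect on the training data but misses the similar, unseen malicious samples in the test set, which is the behavior described for overfitting.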

What is cross validation?

Split the dataset into k parts and run k tests, each using a different part as the test (or validation) set and the remaining parts as the training set. Then average the scores of all tests.
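
A minimal sketch of k-fold cross validation in plain Python. The helper names (`k_fold_splits`, `cross_validate`), the threshold model, and the sample data are assumptions for illustration; any training and scoring function with the same shape could be plugged in.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        yield indices[:start] + indices[start + size:], indices[start:start + size]
        start += size


def cross_validate(data, k, train_fn, score_fn):
    """Train and score k times, return the average score."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(data), k):
        model = train_fn([data[i] for i in train_idx])
        scores.append(score_fn(model, [data[i] for i in test_idx]))
    return sum(scores) / len(scores)


# Hypothetical model: a threshold halfway between the two class means.
def train_threshold(examples):
    bad = [x for x, y in examples if y == 1]
    good = [x for x, y in examples if y == 0]
    return (sum(bad) / len(bad) + sum(good) / len(good)) / 2


def score_threshold(threshold, examples):
    return sum((x > threshold) == bool(y) for x, y in examples) / len(examples)


data = [(7.8, 1), (3.1, 0), (7.5, 1), (4.0, 0), (7.9, 1), (2.5, 0)]
mean_score = cross_validate(data, k=3, train_fn=train_threshold,
                            score_fn=score_threshold)
print(mean_score)
```

Every example ends up in the test set exactly once, so the averaged score depends less on one lucky or unlucky split.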

What is done in a machine learning cycle?

This pipeline usually gets repeated many times until a model with good accuracy is found.
The world keeps changing, e.g. new attacker data appears frequently, which means the model needs to be retrained with new data.

What is the difference between machine learning and deep learning?

Machine learning: feature extraction is done by humans.

Deep learning: everything is done by machines, including feature extraction.

What are the applications of machine learning on malware?

Classification of PE files
Detecting malicious code loading in processes
Detect code similarities
