Jonas Wagner machine_learning_and_malware.pdf
Card set details
Cards | 15 |
---|---|
Language | English |
Category | Technology |
Level | University |
Created / Updated | 19.06.2019 / 19.06.2019 |
License | Not specified |
Web link | https://card2brain.ch/box/20190619_jonas_wagner_machinelearningandmalware_pdf |
What is unsupervised learning?
- Learn the inherent structure of data without labels
- We are given features only
- Used for clustering and dimensionality reduction
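As an illustration of clustering without labels, here is a minimal k-means sketch in plain Python. The data points, the choice of `k=2`, and the function name `kmeans` are assumptions made for this example and do not come from the card set.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters; no labels are used at any point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # an empty cluster keeps its previous centroid
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return centroids, clusters

# Toy data (invented): two visibly separate groups, features only.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

The algorithm only ever sees the feature values; the grouping emerges from the distances between points.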
What are the 4 stages of a machine learning pipeline?
- Collect examples (e.g. SPAM and not-SPAM mails). These examples are used to train the machine learning system.
- Extract features from each training example to represent the example as an array of numbers. This step also includes research to design good features that will help a machine learning system make accurate inferences.
- Train the machine learning system on the extracted features, e.g. to recognize SPAM and not-SPAM mails.
- Test the system on some examples not included in the training examples.
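The four stages can be sketched end to end on a toy spam example. Everything here is an assumption for illustration: the mails are invented, the word list `SPAM_WORDS` is a hypothetical feature choice, and the nearest-centroid classifier is just one simple stand-in for "the machine learning system".

```python
# Stage 1: collect labeled examples (toy data invented for illustration).
train_mails = [
    ("win money now!!!", "spam"),
    ("free prize, click now!", "spam"),
    ("meeting moved to friday", "not-spam"),
    ("please review the attached report", "not-spam"),
]
test_mails = [
    ("click to win a free prize!!", "spam"),
    ("report for the friday meeting", "not-spam"),
]

SPAM_WORDS = {"win", "free", "prize", "money", "click"}  # hypothetical feature choice

# Stage 2: represent each example as an array of numbers.
def extract_features(text):
    words = text.lower().replace("!", " ").replace(",", " ").split()
    return [
        sum(w in SPAM_WORDS for w in words),  # number of spam-like words
        text.count("!"),                      # number of exclamation marks
    ]

# Stage 3: train by averaging the feature vectors of each class.
def train(examples):
    sums, counts = {}, {}
    for text, label in examples:
        feats = extract_features(text)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(model, text):
    feats = extract_features(text)
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(feats, model[label]))
    return min(model, key=distance)  # label of the closest class centroid

# Stage 4: test on examples that were not used for training.
model = train(train_mails)
correct = sum(predict(model, text) == label for text, label in test_mails)
accuracy = correct / len(test_mails)
print(accuracy)
```

Keeping `test_mails` separate from `train_mails` is the point of stage 4: accuracy on data the model has already seen says little about how it handles new mails.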
What is done in the collecting phase of the machine learning pipeline and what is important?
- Quality of data is very important
- Data scientists spend a lot of time collecting and cleaning up data
- The better and more diverse the data, the better the model
- If the data does not contain certain cases, the model won't recognize them, e.g. it can't detect cats in pictures if the training data doesn't contain pictures of cats.
- The data needs to have labels
- This is a huge problem in IT security: e.g. with 1 million binaries to train a good/bad classifier, how do you know which ones are actually good or bad?
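One small part of the cleaning work can be sketched as follows: removing exact duplicates and flagging files that appear with contradictory labels. This is only an assumed example of a cleaning step, not a method described in the card set. The sample bytes, the labels, and the function name `clean_dataset` are invented for illustration.

```python
import hashlib
from collections import Counter

def clean_dataset(samples):
    """samples: list of (file_bytes, label) pairs.

    Returns (unique, conflicts, label_counts), where unique maps a SHA-256
    digest to its single label and conflicts lists digests labeled both ways.
    """
    labels_by_hash = {}
    for data, label in samples:
        digest = hashlib.sha256(data).hexdigest()
        labels_by_hash.setdefault(digest, set()).add(label)
    # Keep files with exactly one label; set aside contradictory ones for review.
    unique = {h: next(iter(ls)) for h, ls in labels_by_hash.items() if len(ls) == 1}
    conflicts = sorted(h for h, ls in labels_by_hash.items() if len(ls) > 1)
    return unique, conflicts, Counter(unique.values())

# Toy samples (invented byte strings standing in for real binaries).
samples = [
    (b"MZ\x90\x00 benign tool", "good"),
    (b"MZ\x90\x00 benign tool", "good"),       # exact duplicate
    (b"MZ\x90\x00 dropper", "bad"),
    (b"MZ\x90\x00 packed installer", "good"),
    (b"MZ\x90\x00 packed installer", "bad"),   # same file, contradictory labels
]
unique, conflicts, label_counts = clean_dataset(samples)
print(len(unique), len(conflicts), dict(label_counts))
```

A check like this does not tell you which label is correct; it only surfaces the samples where the labeling question has to be answered by other means.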
What are the characteristics of the extract phase of the machine learning pipeline?
- A machine learning algorithm needs features of the examples to work with. For a good/bad classifier on files these might be:
  - Is it digitally signed?
  - Are the file headers malformed?
  - Is the entropy of the file high? (could indicate encryption)
- Challenge: Needs a lot of domain knowledge
  - What are good features?
  - Is every feature of the same importance?
  - How do I handle a large number of features?
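The entropy feature mentioned above can be computed as Shannon entropy over byte frequencies. The following is a minimal sketch. The feature vector in `extract_features` (entropy, size, and a check for the "MZ" magic bytes that begin DOS/PE executables) is a hypothetical choice for illustration, and random bytes from `os.urandom` merely stand in for encrypted content.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over all byte values that occur in the data.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def extract_features(data: bytes) -> list:
    # Hypothetical feature vector: [entropy, size in bytes, starts with "MZ"].
    return [shannon_entropy(data), len(data), int(data[:2] == b"MZ")]

low = shannon_entropy(b"A" * 1024)               # a single repeated byte
uniform = shannon_entropy(bytes(range(256)))     # every byte value exactly once
random_like = shannon_entropy(os.urandom(4096))  # stands in for encrypted data
print(low, uniform, random_like)
```

Values close to 8 bits per byte are typical of encrypted or compressed content, which is why high entropy is only a hint and not proof that a file is malicious.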