Jonas Wagner machine_learning_and_malware.pdf
What is machine learning?
Machine learning is the science of getting computers to learn and act like humans do, and to improve their learning over time autonomously, by feeding them data and information in the form of observations and real-world interactions.
What are the benefits of machine learning?
Autonomy and scalability: models scale easily and are better than humans at specific learned tasks.
Adaptability and flexibility
Attackers change their behavior, so defenders should too
Autonomous readjustment based on new data
Makes certain techniques possible that would not be feasible without machine learning.
What are the drawbacks of machine learning?
High data dependency
The system is only as good as the data it was trained on
Easy to overfit
Good models are not necessarily good for production
Models can be too slow and complex
Blackbox results are difficult to interpret
What is supervised learning?
Learn a function that maps input to output
The training process requires labeled data
We are given features and labels
Used for classification and regression
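To make "features and labels" concrete, here is a minimal sketch of supervised classification using a 1-nearest-neighbor rule in plain Python. The feature choice (file entropy, signed flag), the data values, and the function name `nearest_neighbor_predict` are assumptions made up for illustration, not part of the original notes.

```python
import math

def nearest_neighbor_predict(train_features, train_labels, query):
    """Return the label of the training example closest to `query`."""
    distances = [math.dist(x, query) for x in train_features]
    best_index = distances.index(min(distances))
    return train_labels[best_index]

# Hypothetical labeled training data: (file entropy, is digitally signed) -> label
train_features = [(7.9, 0), (7.5, 0), (4.2, 1), (5.0, 1)]
train_labels = ["malicious", "malicious", "benign", "benign"]

# Classify a new, unlabeled example by looking at its nearest labeled neighbor
prediction = nearest_neighbor_predict(train_features, train_labels, (7.7, 0))
print(prediction)
```

The labels are what make this supervised: without `train_labels` there would be nothing to map the input to.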
What is unsupervised learning?
Learn the inherent structure of data without labels
We are given features only
Used for clustering and dimensionality reduction
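As a rough illustration of clustering with features only, the sketch below runs a very small one-dimensional k-means (Lloyd's algorithm). The data (file sizes in KB), the starting centroids, and the function name `k_means_1d` are hypothetical and chosen only to keep the example short.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given initial centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each value into the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (a centroid with an empty cluster stays where it is)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled feature values: file sizes in KB
file_sizes_kb = [10, 12, 11, 200, 210, 190]
centroids, clusters = k_means_1d(file_sizes_kb, centroids=[0, 100])
print(centroids, clusters)
```

No labels are given anywhere; the two groups emerge from the structure of the data alone.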
What are the 4 stages of a machine learning pipeline?
Collect examples (e.g. SPAM and not-SPAM mails). These examples are used to train the machine learning system.
Extract features from each training example to represent the example as an array of numbers. This step also includes research to design good features that will help a machine learning system make accurate inferences.
Train the machine learning system on the extracted features, e.g. to recognize SPAM or not-SPAM mails.
Test the system on some examples not included in the training examples.
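The four stages can be sketched end to end with a toy SPAM filter. Everything here is an assumption for illustration: the example mails, the hand-picked `SPAM_WORDS` list, and the very simple "model" (a threshold on the spam-word count) stand in for real data, real feature research, and a real learning algorithm.

```python
# 1. Collect: hypothetical labeled examples (1 = SPAM, 0 = not-SPAM)
training_mails = [
    ("win free money now", 1),
    ("free prize claim now", 1),
    ("meeting agenda for monday", 0),
    ("lunch on monday", 0),
]
# Held-out examples that are NOT part of the training set
test_mails = [
    ("claim your free money", 1),
    ("agenda for lunch meeting", 0),
]

# 2. Extract: represent each mail as an array of numbers
SPAM_WORDS = ("free", "money", "prize", "claim", "win")  # hypothetical feature design

def extract_features(text):
    words = text.lower().split()
    spam_word_count = sum(w in SPAM_WORDS for w in words)
    return [spam_word_count, len(words)]

# 3. Train: learn a threshold halfway between the class means of feature 0
def train(examples):
    spam = [extract_features(t)[0] for t, label in examples if label == 1]
    ham = [extract_features(t)[0] for t, label in examples if label == 0]
    return (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2

def predict(threshold, text):
    return 1 if extract_features(text)[0] > threshold else 0

# 4. Test: measure accuracy on examples the model has not seen
threshold = train(training_mails)
correct = sum(predict(threshold, t) == label for t, label in test_mails)
accuracy = correct / len(test_mails)
print(threshold, accuracy)
```

Keeping `test_mails` separate from `training_mails` is the point of stage 4: accuracy on the training examples alone would say little about unseen mail.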
What is done in the collecting phase of the machine learning pipeline and what is important?
Quality of data is very important
Data scientists spend a lot of time collecting and cleaning up data
The better and more diverse the data, the better the model
If the data does not contain certain cases, the model won't recognize them, e.g. it can't detect cats in pictures if the training data doesn't contain pictures of cats.
Needs to have labels
This is a huge problem in IT security: e.g. given 1 million binaries to train a good/bad classifier, how do you know which ones are actually good or bad?
What are the characteristics of the extract phase of the machine learning pipeline?
A machine learning algorithm needs features of examples to work with. For a good/bad classifier on files, these might be:
Is it digitally signed?
Are the file headers malformed?
Is the entropy of the file high? (could indicate encryption)
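The entropy feature can be computed as the Shannon entropy of the file's byte distribution, which ranges from 0 bits per byte (one repeated byte) to 8 bits per byte (all 256 byte values equally likely). The sketch below covers only this one feature; the function name `shannon_entropy` and the sample inputs are assumptions for illustration, and what counts as "high" would have to be decided from real data.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # Sum of p * log2(1/p) over all byte values that occur
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Highly repetitive content: a single repeated byte
low = shannon_entropy(b"A" * 1024)
# Uniform content: every byte value appears equally often
high = shannon_entropy(bytes(range(256)) * 4)
print(low, high)
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function; compressed or encrypted content tends to score close to the upper end.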