Data Analytics Wissen
Klausur
File details
Flashcards | 64 |
---|---|
Language | English |
Category | Finance |
Level | University |
Created / Updated | 08.02.2025 / 08.02.2025 |
Web link | https://card2brain.ch/box/20250208_data_analytics_wissen |
Solution 1: Discard the affected records - only practical if the number of missing records is small. Solution 2: Imputation - replace the missing values with meaningful substitute values (e.g. mean, median) -> advantage: we keep the observation's non-missing information.
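The two solutions can be sketched with pandas; the toy DataFrame and its column names are illustrative, not from the course material:

```python
import pandas as pd

# Hypothetical toy data with missing values (NaN).
df = pd.DataFrame({"income": [40_000, None, 55_000, 48_000],
                   "age": [25, 31, None, 52]})

# Solution 1: discard rows that contain any missing value.
dropped = df.dropna()

# Solution 2: impute with a meaningful substitute (here the column mean),
# keeping each observation's non-missing information.
imputed = df.fillna(df.mean())

print(imputed)
```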
A missing value occurs when no data value is stored for a variable in an observation; commonly denoted NaN.
Graphically (e.g. with box plots or scatter plots), by sorting the variable's values, or by inspecting the minimum/maximum values.
We need expertise in the data to determine whether it is an error or a true extreme. Sometimes it is possible to correct the error. If the number of outliers is small and we recognize it as an error: treat it as a missing value.
Observation that is extreme (far away) compared to the rest of the data.
Number of categories - 1 dummy variables -> including a dummy for every category adds redundant (perfectly collinear) information, which leads to the failure of some algorithms.
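A minimal sketch of the "number of categories - 1" dummy-coding rule with pandas `get_dummies`; the color data is illustrative:

```python
import pandas as pd

# Illustrative nominal variable with 3 categories.
colors = pd.DataFrame({"color": ["blue", "yellow", "red", "blue"]})

# drop_first=True keeps (number of categories - 1) = 2 dummies; the dropped
# category is implied when all dummies are 0, so keeping all 3 dummies
# would add redundant (perfectly collinear) information.
dummies = pd.get_dummies(colors["color"], drop_first=True)
print(dummies.columns.tolist())
```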
The goal is the prediction of a categorical outcome variable.
The goal is the prediction of a numerical outcome variable.
Supervised learning: classification and prediction; unsupervised learning: association rules, data reduction, data exploration, visualisation.
In supervised learning, the target value is known in the training data (the data on which the algorithm is trained); in unsupervised learning there is no target variable for prediction or classification.
Supervised learning's goal is to predict a target or outcome variable; the goal of unsupervised learning is to identify patterns and divide data into meaningful groups.
Name the 9 steps of the Data Mining Process
1. Define/understand the purpose of the analysis;
2. Obtain the data (possibly including sampling);
3. Explore, clean, and prepare the data;
4. Reduce the data dimension (for supervised data mining, partition it);
5. Specify the analysis goal (classification, prediction, etc.);
6. Select the techniques (e.g. regression, logit);
7. Implement and tune iteratively;
8. Evaluate the results;
9. Roll out and widely use the best model
What are the core ideas of Data Mining?
Data Analysis, Visualisation, Prediction, Classification, Data Reduction, Association Rules, Recommendation Systems
What are categorical variables?
Ordinal -> values can be ordered logically (e.g. good - ok - bad) and
Nominal -> values cannot be ordered logically (e.g. blue, yellow, red)
What are numeric variables?
Continuous -> can take any value within a range (e.g. size, time, age) and
Integer -> whole numbers only (e.g. number of cars, number of cities)
What is the 10-fold cross validation?
The data are split into 10 equally sized blocks (folds). The model is trained on 9 blocks and evaluated on the held-out block; this is repeated so that each block serves once as test data, and the 10 performance values are averaged. For tuning, compare the mean performance across candidate tuning-parameter values.
Try different tuning parameters, evaluate their performance, and compare. Performance is commonly checked using 10-fold cross-validation.
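A sketch of tuning via 10-fold cross-validation with scikit-learn; the synthetic data, the ridge model, and the alpha grid are illustrative assumptions, not from the course material:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data stands in for a real data set.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Try different tuning parameters and compare their mean 10-fold CV score.
results = {}
for alpha in [0.01, 1.0, 100.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=10)  # 10 folds
    results[alpha] = scores.mean()

best_alpha = max(results, key=results.get)
print(results)
```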
Ridge: If most variables in the dataset are useful. Lasso: If most variables in the dataset are useless. If uncertain: Use Elastic Net Regression.
Because the ridge constraint region forms a circle (it has no corners), the RSS ellipse touches it at points where the coefficients are small but almost never exactly zero; lasso's diamond-shaped constraint has corners on the axes, so its coefficients can become exactly zero.
A combination of RIDGE and LASSO.
We minimize the sum of squared residuals + a shrinkage penalty λ Σ|βj| (absolute value term, power of one) to find the βs.
We minimize the sum of squared residuals + a shrinkage penalty λ Σ βj² (squared term, power of two) to find the βs.
We minimize the sum of squared residuals to find the βs.
Ridge has a larger bias than OLS, but a lower variance. This reflects the bias-variance trade-off.
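The ridge/lasso contrast above can be illustrated with scikit-learn; the synthetic data (only 3 of 20 predictors are informative) and the alpha values are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data where most predictors are useless.
X, y = make_regression(n_samples=100, n_features=20, n_informative=3,
                       noise=5.0, random_state=1)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Ridge shrinks coefficients toward zero but does not zero them out;
# lasso sets many of the useless coefficients exactly to zero.
print("ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
print("lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
```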
They can be used for prediction or classification when we have large data sets.
False Negative
False Positive
When the target variable y is categorical (e.g. color). Here we only deal with binary outcomes: yes (1) or no (0).
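A minimal binary-classification sketch with logistic regression in scikit-learn; the simulated data and decision rule are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated binary outcome: y in {0, 1} driven by two predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Fit the classifier and predict class labels (0 or 1).
clf = LogisticRegression().fit(X, y)
pred = clf.predict(X)
accuracy = (pred == y).mean()
print(accuracy)
```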
The RMSE of the training data is lower in the multiple regression compared to the simple one.
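A quick numeric check of this fact, using an illustrative simulation and an OLS fit via NumPy least squares (adding a predictor can never increase training RMSE):

```python
import numpy as np

# Illustrative data: y depends on two predictors x1 and x2.
rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

def train_rmse(X, y):
    # OLS coefficients via least squares; RMSE on the same training data.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return np.sqrt(np.mean(resid ** 2))

ones = np.ones((n, 1))
simple = train_rmse(np.column_stack([ones, x1]), y)        # y ~ x1
multiple = train_rmse(np.column_stack([ones, x1, x2]), y)  # y ~ x1 + x2

print(simple, multiple)
```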
log y = β log x + ε. If x increases by 1 percent, then y changes by β percent.
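A numeric check of the elasticity interpretation, assuming an illustrative β = 0.8 and omitting the error term:

```python
import numpy as np

beta = 0.8                        # illustrative elasticity
x = 100.0
y = x ** beta                     # equivalent to: log y = beta * log x

x_new = x * 1.01                  # x increases by 1 percent
y_new = x_new ** beta
pct_change_y = (y_new / y - 1) * 100

print(round(pct_change_y, 3))     # approximately beta percent
```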