Data Analytics

Exam

File details

Flashcards	129
Language	English
Category	Finance
Level	University
Created / Updated	24.11.2024 / 08.02.2025
Web link	https://card2brain.ch/box/20241124_data_analytics

What is the Python code to calculate the arithmetic mean of the variable Price for each category of another variable?

toyota_df.groupby('Fuel_Type').Price.mean()
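As a minimal sketch of what this groupby call returns, the following uses a tiny made-up DataFrame. The values are invented for illustration and are not the Toyota data set used on the cards.

```python
import pandas as pd

# Hypothetical toy data; not the actual Toyota data set
toy_df = pd.DataFrame({
    "Fuel_Type": ["Diesel", "Petrol", "Petrol", "Diesel", "CNG"],
    "Price": [12000, 9000, 11000, 14000, 8000],
})

# One mean per category of Fuel_Type, returned as a Series indexed by category
mean_by_fuel = toy_df.groupby("Fuel_Type")["Price"].mean()
print(mean_by_fuel)
```

The bracket form `["Price"]` is equivalent to the attribute form `.Price` on the card and also works for column names that contain spaces.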

What is the Python code to visualize the relationship between the selling price and the type of fuel in a boxplot?

sns.boxplot(x="Fuel_Type", y="Price", data=toyota_df, whis=100)

What is the Python code to visualize the relationship between the selling price and the type of fuel in a swarmplot?

with pd.option_context('mode.use_inf_as_na', True):
    sns.set(rc={'figure.figsize': (13, 5), "figure.dpi": 300})
    sns.set_theme(style="whitegrid")
    sns.swarmplot(x="Fuel_Type", y="Price", data=toyota_df, size=4)

What is the Python code to visualize the relationship between the selling price and the type of fuel in a stripplot?

with pd.option_context('mode.use_inf_as_na', True):
    sns.set(rc={'figure.figsize': (10, 8), "figure.dpi": 300})
    sns.set_theme(style="whitegrid")
    sns.stripplot(x="Fuel_Type", y="Price", data=toyota_df)

What is the Python code for an OLS regression to estimate the influence of one variable on another variable?

modg_X = toyota_df[['Fuel_Type']]
modg_X = pd.get_dummies(modg_X, drop_first=True)
modg_X = sm.add_constant(modg_X)
modg_X = modg_X.astype(float)  # Make sure that all columns have numerical values
# Model estimation and results
modg = sm.OLS(toyota_df['Price'], modg_X)
res = modg.fit()
print(res.summary())
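To make the output of such a dummy-only regression easier to read, here is a hedged sketch on invented data. It uses NumPy's least-squares solver instead of statsmodels (a swapped-in technique, chosen only to keep the example small) and illustrates that the constant equals the mean price of the dropped baseline category, while each dummy coefficient equals the difference between that category's mean and the baseline mean.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the actual Toyota data set
toy_df = pd.DataFrame({
    "Fuel_Type": ["CNG", "CNG", "Diesel", "Diesel", "Petrol", "Petrol"],
    "Price": [8000.0, 9000.0, 12000.0, 14000.0, 10000.0, 11000.0],
})

# Same design as on the card: k-1 dummies plus a constant column
X = pd.get_dummies(toy_df[["Fuel_Type"]], drop_first=True).astype(float)
X.insert(0, "const", 1.0)

# Ordinary least squares via NumPy (stand-in for sm.OLS(...).fit())
coefs, *_ = np.linalg.lstsq(X.to_numpy(), toy_df["Price"].to_numpy(), rcond=None)
ols = pd.Series(coefs, index=X.columns)

# Group means for comparison with the coefficients
group_means = toy_df.groupby("Fuel_Type")["Price"].mean()
print(ols)
print(group_means)
```

Here `CNG` is the baseline because `drop_first=True` drops the first category in sorted order.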

What is the Python code for regression statistics?

# Select predictors and outcome
X = toyota_df[['Fuel_Type', 'HP']]
y = toyota_df[['Price']]
# Transform Fuel_Type into dummies
X = pd.get_dummies(X, drop_first=True)
# Split the data
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.4)
# Model fitting
toyota_ml = LinearRegression()
toyota_ml.fit(train_X, train_y)

What is the Python code to show the regression statistics of training data?

print('Performance Measures (Training data)')
regressionSummary(train_y, toyota_ml.predict(train_X))

What is the Python code to show the regression statistics of validation data?

print('Performance Measures (Validation data)')
regressionSummary(valid_y, toyota_ml.predict(valid_X))
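The cards do not say where `regressionSummary` comes from; it is presumably the helper of that name in the `dmba` package, which is an assumption. The sketch below defines a hypothetical helper, `regression_metrics`, that computes the kind of error measures such a summary typically reports, so that the numbers can be checked by hand on a tiny invented example.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Hypothetical helper: common error measures for a regression model."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residuals = y_true - y_pred
    return {
        "ME": residuals.mean(),                           # mean error
        "RMSE": np.sqrt((residuals ** 2).mean()),         # root mean squared error
        "MAE": np.abs(residuals).mean(),                  # mean absolute error
        "MPE": 100 * (residuals / y_true).mean(),         # mean percentage error
        "MAPE": 100 * np.abs(residuals / y_true).mean(),  # mean absolute percentage error
    }

# Invented actual and predicted values
metrics = regression_metrics([100, 200, 400], [110, 190, 400])
print(metrics)
```

Comparing these measures between training and validation data indicates overfitting when the validation errors are clearly larger.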

What is the Python code to replace the spaces in all variable names with underscores _?

banking_df.columns = [s.strip().replace(" ", "_") for s in banking_df.columns]
banking_df.head()
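A small sketch of the renaming step on invented column names (the names below are assumptions, not necessarily those of the banking data set):

```python
import pandas as pd

# Hypothetical column names with leading, trailing, and inner spaces
toy_df = pd.DataFrame(columns=[" Personal Loan", "ZIP Code ", "Age"])

# strip() removes outer whitespace, replace() swaps inner spaces for underscores
toy_df.columns = [s.strip().replace(" ", "_") for s in toy_df.columns]
new_names = list(toy_df.columns)
print(new_names)
```

Stripping first matters: without it, a leading or trailing space would become a stray underscore.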

What is the Python code to convert a variable into a categorical variable?

banking_df["Education"].value_counts().sort_index()

banking_df["Education"] = banking_df["Education"].map({1: "Undergrad", 2: "Graduate", 3: "Advanced/Professional"})

banking_df.head()

What is the Python code to generate a new variable that takes the value 0 when Mortgage has the value 0 and takes the value 1 in all other cases?

banking_df["has_mortgage"] = [0 if x == 0 else 1 for x in banking_df["Mortgage"]]

banking_df.head()
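An equivalent vectorized form, shown as a sketch on invented mortgage values, avoids the Python-level loop of the list comprehension:

```python
import pandas as pd

# Hypothetical toy data; not the actual banking data set
toy_df = pd.DataFrame({"Mortgage": [0, 155, 0, 104, 0]})

# The comparison yields booleans; astype(int) turns them into 0/1
toy_df["has_mortgage"] = (toy_df["Mortgage"] != 0).astype(int)

# Same result as the list comprehension on the card
expected = [0 if x == 0 else 1 for x in toy_df["Mortgage"]]
print(toy_df)
```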

What is the Python code to estimate the logit model log(odds(has_mortgage = 1 | Income)) = β0 + β1 * Income?

X_simple = banking_df["Income"]
Y_simple = banking_df["has_mortgage"]
X_simple = sm.add_constant(X_simple)
logit_simple_mod = sm.Logit(Y_simple, X_simple)
logit_simple_mod_res = logit_simple_mod.fit()
print(logit_simple_mod_res.summary())
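The estimated coefficients are on the log-odds scale. As a sketch of how to read them, the following converts log-odds into a probability and a coefficient into an odds ratio. The coefficient values are hypothetical and are not estimates from the banking data.

```python
import math

def logit_probability(beta0, beta1, income):
    """Hypothetical helper: P(has_mortgage = 1 | income) under the logit model."""
    log_odds = beta0 + beta1 * income
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients, NOT estimates from the banking data
beta0, beta1 = -1.0, 0.01

# At income = 100 the log-odds are 0, so odds are 1 and the probability is 0.5
p_at_100 = logit_probability(beta0, beta1, 100)

# exp(beta1) is the factor by which the odds change per one-unit increase in income
odds_ratio = math.exp(beta1)
print(p_at_100, odds_ratio)
```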

What is the Python code to add further explanatory variables and estimate the logit model again?

X_full = banking_df[["Income", "Family", "CCAvg", "Education", "Age"]]
X_full = pd.get_dummies(X_full, prefix_sep="_", drop_first=True)
X_full = X_full.astype(float)  # Make sure that all columns have numerical data types
Y_full = banking_df["has_mortgage"]
X_full = sm.add_constant(X_full)
logit_full_mod = sm.Logit(Y_full, X_full)
logit_full_mod_res = logit_full_mod.fit()
print(logit_full_mod_res.summary())

What is the Python code to make a confusion matrix?

predict_valid = logit_reg.predict(valid_X)
cm2 = confusion_matrix(valid_y, predict_valid)

ConfusionMatrixDisplay(cm2).plot()
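The cards do not define `logit_reg`, `valid_X`, or `valid_y`. If the model's `predict` returns class labels (as a scikit-learn classifier does), the call above works directly; if it returns probabilities (as a statsmodels result does), they first need a cutoff. The sketch below assumes the second case and builds the 2x2 matrix by hand with a hypothetical helper and an assumed cutoff of 0.5.

```python
import numpy as np

def confusion_counts(y_true, y_prob, cutoff=0.5):
    """Hypothetical helper: 2x2 matrix, rows = actual class, columns = predicted class."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= cutoff).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual, predicted] += 1
    return cm

# Invented labels and predicted probabilities
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.7, 0.8, 0.4, 0.9, 0.2]

cm = confusion_counts(y_true, y_prob)
accuracy = np.trace(cm) / cm.sum()  # share of correctly classified cases
print(cm)
print(accuracy)
```

Changing the cutoff trades false positives against false negatives, so the matrix is always specific to the chosen cutoff.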

What is the Python code to generate a lift chart?

import kds

kds.metrics.plot_lift(valid_y, predict_valid)
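To show what a lift value means, here is a sketch that ranks cases by predicted probability, splits them into equally sized bins, and divides each bin's response rate by the overall response rate. It is not a reimplementation of the `kds` plot; the helper name, the data, and the choice of four bins are assumptions for illustration. Lift is usually computed from predicted probabilities rather than hard class labels, since only probabilities allow a meaningful ranking.

```python
import numpy as np
import pandas as pd

def bin_lift(y_true, y_prob, n_bins=10):
    """Hypothetical helper: response rate per score bin relative to the overall rate."""
    df = pd.DataFrame({"y": y_true, "p": y_prob}).sort_values("p", ascending=False)
    # Bin 0 holds the highest predicted probabilities
    df["bin"] = np.arange(len(df)) * n_bins // len(df)
    return df.groupby("bin")["y"].mean() / df["y"].mean()

# Invented data: 20 cases, the 5 with the highest scores are the positives
y_prob = np.arange(1, 21) / 20
y_true = (y_prob > 0.75).astype(int)

lift = bin_lift(y_true, y_prob, n_bins=4)
print(lift)
```

A lift of 4 in the top bin means that targeting only those cases finds positives four times as often as picking cases at random.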

What are numeric variables?

Continuous -> can take any value within a range (e.g. size, time, age) and

Integer -> whole numbers only (e.g. number of cars, number of cities)

What are categorical variables?

Ordinal -> values can be ordered logically (e.g. good - ok - bad) and

Nominal -> values cannot be ordered logically (e.g. blue, yellow, red)
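As a sketch of how this distinction can be expressed in pandas, an ordinal variable can be stored as an ordered categorical and a nominal variable as an unordered one. The category labels below are taken from the examples on the card; the variable names are assumptions.

```python
import pandas as pd

# Ordinal: the categories carry a logical order (bad < ok < good)
rating = pd.Categorical(
    ["ok", "good", "bad", "good"],
    categories=["bad", "ok", "good"],
    ordered=True,
)

# Nominal: categories without any order
colour = pd.Categorical(["blue", "red", "yellow", "blue"])

# Order-based operations such as min/max are only meaningful for ordinal data
lowest, highest = rating.min(), rating.max()
print(lowest, highest, colour.ordered)
```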

What are the core ideas of Data Mining?

Data Analysis, Visualisation, Prediction, Classification, Data Reduction, Association Rules, Recommendation Systems

Name the 9 steps of the Data Mining Process

1. Define and understand the purpose of the analysis
2. Obtain the data (possibly including sampling)
3. Data analysis, cleaning, and preparation
4. Reduce the data (dimension); for supervised data mining, partition it
5. Specify the analysis goal (classification, prediction, etc.)
6. Select the techniques (e.g. regression, logit)
7. Iterative implementation and tuning
8. Evaluate the results
9. Roll out the best model for widespread use

What is the goal of supervised vs. unsupervised learning?

Supervised learning's goal is to predict a target or outcome variable; the goal of unsupervised learning is to identify patterns and divide data into meaningful groups.

Is there a target value in supervised vs. unsupervised learning?

In supervised learning, the target value is known in the training data, which is the data on which the algorithm is trained; in unsupervised learning, there is no target variable for prediction or classification.

What methods are used in supervised vs. unsupervised learning?

Supervised learning: classification and prediction; unsupervised learning: association rules, data reduction, data exploration, visualisation.

What is the goal of prediction as a method?

The goal is the prediction of a numerical outcome variable.

What is the goal of classification as a method?

The goal is the prediction of a categorical outcome variable.

What is the code for a descriptive analysis of the housing dataset?

housing_df.describe()

What is the code to show the dimension of the dataset?

housing_df.shape

What is the code to show the first 5 lines of the dataset?

housing_df.head()

How many dummy variables are needed for a categorical variable?

Number of categories - 1 -> a dummy for every category would add redundant information, which can cause algorithms to fail.
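The redundancy can be made visible with a short sketch on invented data: with a constant plus one dummy per category, the dummy columns sum to the constant column, so the design matrix has more columns than its rank. Dropping one dummy restores full column rank. The helper name and data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with three categories
fuel = pd.DataFrame({"Fuel_Type": ["CNG", "Diesel", "Petrol", "Diesel", "Petrol"]})

def design_matrix(drop_first):
    """Hypothetical helper: constant column plus dummy columns as a NumPy array."""
    dummies = pd.get_dummies(fuel, drop_first=drop_first).astype(float)
    dummies.insert(0, "const", 1.0)
    return dummies.to_numpy()

full = design_matrix(drop_first=False)    # constant + 3 dummies
reduced = design_matrix(drop_first=True)  # constant + 2 dummies

full_rank = np.linalg.matrix_rank(full)
reduced_rank = np.linalg.matrix_rank(reduced)
print(full.shape[1], full_rank)        # more columns than rank: redundant
print(reduced.shape[1], reduced_rank)  # columns equal rank: no redundancy
```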

What is an outlier?

Observation that is extreme (far away) compared to the rest of the data.

What happens when we detect outliers?

We need domain expertise to determine whether an outlier is an error or a true extreme value. Sometimes the error can be corrected. If the number of outliers is small and we recognize them as errors, we treat them as missing values.
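A sketch of this workflow on invented prices: flag candidate outliers, then set them to missing once they have been judged to be errors. The 1.5 * IQR rule used for flagging is one common heuristic and an assumption here; the cards do not prescribe a detection rule, and the flag does not replace the domain judgment described above.

```python
import numpy as np
import pandas as pd

# Invented prices; the last value is an implausible entry
prices = pd.Series([9500, 10200, 9900, 10100, 9800, 250000], dtype=float)

# Flag values further than 1.5 * IQR outside the quartiles (assumed heuristic)
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (prices < q1 - 1.5 * iqr) | (prices > q3 + 1.5 * iqr)

# After confirming the flagged values are errors, treat them as missing
cleaned = prices.mask(is_outlier)
print(cleaned)
```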
