Klausur


File Details

Flashcards 129
Language English
Category Finance
Level University
Created / Updated 24.11.2024 / 08.02.2025
Web link
https://card2brain.ch/box/20241124_data_analytics
What is the histogram useful for?

Visualization of the distribution of a continuous variable.

What is the Python code for a histogram using seaborn?

sns.histplot(data=housing_df, x="MEDV")

What does a Bar Chart show?

In the simplest version, a bar chart shows only the frequency in each category.

What is the Python code for a bar chart using seaborn?

sns.countplot(x="CHAS", data=housing_df)

What is the Python code for a bar chart representing the arithmetic mean using seaborn?

from numpy import mean
sns.barplot(x="CHAS", y="MEDV", data=housing_df, estimator=mean)

What is the Python code for a bar chart representing the median using seaborn?

from numpy import median
sns.barplot(x="CHAS", y="MEDV", data=housing_df, estimator=median)

What are line charts important for?

Important for visualization of time series data.

What is the Python code for a line chart using seaborn?

sns.lineplot(x="Date", y="Ridership", data=trains_df)

What is the Python code for frequency tables?

pd.crosstab(housing_df["CHAS"], housing_df["CAT_MEDV"])
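A minimal sketch of what the frequency table looks like, assuming a tiny made-up dataset (the values are hypothetical; only the column names CHAS and CAT_MEDV come from the card):

```python
import pandas as pd

# Made-up values for illustration; only the column names come from the card.
demo_df = pd.DataFrame({
    "CHAS": [0, 0, 1, 1, 0],
    "CAT_MEDV": [0, 1, 1, 1, 0],
})

# Rows are CHAS values, columns are CAT_MEDV values, cells are counts.
freq_table = pd.crosstab(demo_df["CHAS"], demo_df["CAT_MEDV"])
print(freq_table)
```

Each cell counts the observations with that combination of values, for example two observations with CHAS = 0 and CAT_MEDV = 0.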

What is the Python code to load a dataset and assign a name for the dataframe in pandas?

dataframename_df = pd.read_csv("dataset.csv", keep_default_na=False)

Rename all column names in Python so that the spaces contained are replaced by _.

housing_df.columns = [s.strip().replace(" ", "_") for s in housing_df.columns]
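To see what the renaming does, here is a small sketch on a hypothetical DataFrame (the column names below are invented for illustration):

```python
import pandas as pd

# Hypothetical frame whose column names contain spaces.
demo_df = pd.DataFrame({" CRIM RATE": [0.1, 0.2], "CAT. MEDV ": [0, 1]})

# strip() drops leading/trailing whitespace, replace() swaps inner spaces for "_".
demo_df.columns = [s.strip().replace(" ", "_") for s in demo_df.columns]
print(list(demo_df.columns))  # ['CRIM_RATE', 'CAT._MEDV']
```

Names without spaces can then be used with attribute access, as in demo_df.CRIM_RATE.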

What is the Python code to calculate the arithmetic mean?

housing_df.VARIABLE.mean()

What is the Python code to calculate the median?

housing_df.VARIABLE.median()
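As a short illustration of how the two measures can differ, assuming a hypothetical series with one extreme value:

```python
import pandas as pd

# Hypothetical values with one extreme observation.
prices = pd.Series([1, 2, 3, 4, 100])

print(prices.mean())    # 22.0
print(prices.median())  # 3.0
```

The single large value pulls the mean upward, while the median stays at the middle observation.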

What is the Python code to draw a random sample of 10 houses from the dataset?

Sample_1 = housing_df.sample(10, random_state=1)
Sample_1

What is the Python code to oversample houses with more than 12 rooms?

weights = [0.9 if rooms > 12 else 0.01 for rooms in housing_df.ROOMS]

Sample_2 = housing_df.sample(10, weights=weights, random_state=1)

Sample_2
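The effect of the weights can be checked on a small made-up dataset. This is only a sketch: the data and the names demo_df and sample are assumptions, not part of the original exercise.

```python
import pandas as pd

# Hypothetical data: 40 houses with 5 rooms and 10 houses with 14 rooms.
demo_df = pd.DataFrame({"ROOMS": [5] * 40 + [14] * 10})

# Large houses get a much higher selection weight; pandas rescales the
# weights so that they sum to 1.
weights = [0.9 if rooms > 12 else 0.01 for rooms in demo_df.ROOMS]

sample = demo_df.sample(10, weights=weights, random_state=1)

# Share of large houses in the full data versus in the weighted sample.
print((demo_df.ROOMS > 12).mean())
print((sample.ROOMS > 12).mean())
```

Large houses make up 20% of the made-up data, but the weighted sample should contain a much higher share of them.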

What is the Python code to enter the data from a small table into a DataFrame?

Data_a = pd.DataFrame({"Age": [10,20,30], "Income": [30,40,50]})

What is the Python code to standardize two variables?

Data_a = pd.DataFrame({"Age": [10,20,30], "Income": [30,40,50]})
norm_df = (Data_a - Data_a.mean()) / Data_a.std()
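Worked through for the values on this card: Age has mean 20 and Income has mean 40, and both have a standard deviation of 10 (pandas uses the sample standard deviation, dividing by n - 1, by default). A runnable sketch:

```python
import pandas as pd

Data_a = pd.DataFrame({"Age": [10, 20, 30], "Income": [30, 40, 50]})

# Subtract each column's mean and divide by its (sample) standard deviation.
norm_df = (Data_a - Data_a.mean()) / Data_a.std()
print(norm_df)
```

Both columns become -1, 0, 1, so the variables are on the same scale afterwards.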

What is the Python code to convert two categorical variables into dummy variables?

Housing_df = pd.get_dummies(Housing_df, columns=["Variable1", "Variable2"], prefix_sep="_", drop_first=True)
print(list(Housing_df.columns))
Housing_df.head()
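A small sketch of the resulting columns, assuming a hypothetical dataset in which "Fuel" and "Color" stand in for Variable1 and Variable2:

```python
import pandas as pd

# Hypothetical data; "Fuel" and "Color" stand in for Variable1 and Variable2.
cars_df = pd.DataFrame({
    "Price": [10, 12, 9],
    "Fuel": ["Gas", "Diesel", "Gas"],
    "Color": ["red", "blue", "green"],
})

# drop_first=True drops one category per variable, which becomes the
# reference category.
cars_df = pd.get_dummies(cars_df, columns=["Fuel", "Color"],
                         prefix_sep="_", drop_first=True)
print(list(cars_df.columns))
```

A variable with k categories yields k - 1 dummy columns here: one for Fuel (two categories) and two for Color (three categories).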

What is the Python code to prepare the dataset for supervised learning methods? Split the data set into three parts (50% training, 30% validation, 20% test)

from sklearn.model_selection import train_test_split
trainData, temp = train_test_split(Housing_df, test_size=0.5, random_state=1)
validData, testData = train_test_split(temp, test_size=0.4, random_state=1)
print("Training: ", trainData.shape)
print("Validation: ", validData.shape)
print("Test: ", testData.shape)
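The second split uses test_size=0.4 because 40% of the remaining 50% equals 20% of the full dataset, which leaves 30% for validation. A sketch that checks the sizes on a hypothetical 100-row DataFrame (demo_df and its column x are made up):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical dataset with 100 rows.
demo_df = pd.DataFrame({"x": range(100)})

# First split: 50% training, 50% held back.
train, temp = train_test_split(demo_df, test_size=0.5, random_state=1)
# Second split: 60% of the held-back half for validation, 40% for test.
valid, test = train_test_split(temp, test_size=0.4, random_state=1)

print(len(train), len(valid), len(test))  # 50 30 20
```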

What is the Python code to rename one variable?

housing_df = housing_df.rename(columns={"VARIABLE": "VARIABLENEW"})

What is the Python code to plot a histogram?

sns.histplot(data=housing_df, x="VARIABLE", binwidth=10) # 'binwidth' can be adjusted or omitted depending on the task

What is the Python code to plot two boxplots with whiskers?

sns.boxplot(y=housing_df["VARIABLE1"], x=housing_df["VARIABLE2"])

What is the Python code to plot two boxplots whose whiskers span the full data range (no points shown as outliers)?

sns.boxplot(y=housing_df["VARIABLE1"], x=housing_df["VARIABLE2"], whis=[0, 100])

What is the Python code to plot two barcharts showing the mean?

from numpy import mean
sns.barplot(data=housing_df, y="VARIABLE1", x="VARIABLE2", estimator=mean)

What is the code for the scatterplot in Python using seaborn and making the dots a bit transparent?

sns.scatterplot(data=housing_df, y="VARIABLE1", x="VARIABLE2", alpha=0.8)

What is the code for a scatterplot with a regression line in it?

sns.regplot(data=housing_df, y="VARIABLE1", x="VARIABLE2")

What is the code for a color-coded scatterplot according to VARIABLE3 (binary)?

sns.scatterplot(data=housing_df, y="VARIABLE1", x="VARIABLE2", hue="VARIABLE3", alpha=0.8)

What is the code for the minimum of a variable?

housing_df.VARIABLE.min()

What is the code for the maximum of a variable?

housing_df.VARIABLE.max()

What is the code for the standard deviation of a variable?

housing_df.VARIABLE.std()

What is the code to show a frequency table of a binary variable?

housing_df.VARIABLE.value_counts()

What is the code to map the binary outcome (0, 1) to descriptive labels?

housing_df["VARIABLE_lb"] = housing_df["VARIABLE"].map({0: "Under", 1: "Over"})

What is the code for a color-coded scatterplot according to VARIABLE3 (binary) with descriptive labels as outcomes?

sns.scatterplot(data=housing_df, y="VARIABLE1", x="VARIABLE2", hue="VARIABLE3_lb", alpha=0.8)

What is the code to cross-tabulate two binary variables?

pd.crosstab(housing_df["VARIABLE1"], housing_df["VARIABLE2"])

What is an Error?

Classification of an observation as belonging to one class, although it belongs to another.

What is an Error Rate?

Proportion of misclassified observations among all observations in the validation data.

What is the Naive rule?

Classify all observations as belonging to the most frequent class (benchmark).

How does the separation of observations affect the error?

High separation: Predictor variables lead to a low error. Low separation: Predictor variables do not significantly improve on the naive rule.

How do you decide who gets into which class of interest?

For example: Class 1 (acceptance of credit) vs. Class 0 (rejection of credit). Calculate the probability of belonging to Class 1. If it is lower than 0.5 (threshold), classify as Class 0; otherwise, classify as Class 1.
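The threshold rule can be sketched in a few lines, assuming a hypothetical array of predicted probabilities (the values and variable names are made up):

```python
import numpy as np

# Hypothetical predicted probabilities of belonging to Class 1.
prob_class1 = np.array([0.10, 0.49, 0.50, 0.85])
threshold = 0.5

# Below the threshold -> Class 0, otherwise -> Class 1.
predicted_class = np.where(prob_class1 < threshold, 0, 1)
print(predicted_class)  # [0 0 1 1]
```

An observation exactly at the threshold (0.50) is assigned to Class 1, matching the rule on the card.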

What is the Accuracy?

1 - Error Rate
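Error rate, accuracy, and the naive-rule benchmark can be computed together. A minimal sketch, assuming ten hypothetical validation observations (the actual and predicted values are made up):

```python
import pandas as pd

# Hypothetical actual and predicted classes for 10 validation observations.
actual = pd.Series([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
predicted = pd.Series([1, 0, 0, 0, 0, 1, 0, 1, 0, 0])

# Error rate: share of observations where prediction and actual class differ.
error_rate = (actual != predicted).mean()
accuracy = 1 - error_rate

# Naive rule: classify every observation as the most frequent class.
most_frequent = actual.value_counts().idxmax()
naive_error_rate = (actual != most_frequent).mean()

print(error_rate, accuracy, naive_error_rate)
```

Two of the ten observations are misclassified, so the error rate is 0.2 and the accuracy is 0.8. The naive rule assigns everything to Class 0 and misclassifies the three Class 1 observations (error rate 0.3), so the classifier beats the benchmark in this made-up example.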