Data Analytics
Exam
File Details
Flashcards | 129 |
---|---|
Language | English |
Category | Finance |
Level | University |
Created / Updated | 24.11.2024 / 08.02.2025 |
Web link | https://card2brain.ch/box/20241124_data_analytics |
Visualization of the distribution of a continuous variable.
What is the Python code for a histogram using seaborn?
sns.histplot(data=housing_df, x="MEDV")
In the simplest version, a bar chart shows only the frequency in each category.
What is the Python code for a bar chart using seaborn?
sns.countplot(x="CHAS", data=housing_df)
What is the Python code for a bar chart representing the arithmetic mean using seaborn?
from numpy import mean
sns.barplot(x="CHAS", y="MEDV", data=housing_df, estimator=mean)
What is the Python code for a bar chart representing the median using seaborn?
from numpy import median
sns.barplot(x="CHAS", y="MEDV", data=housing_df, estimator=median)
Important for visualization of time series data.
What is the Python code for a line chart using seaborn?
sns.lineplot(x="Date", y="Ridership", data=trains_df)
What is the Python code for frequency tables?
pd.crosstab(housing_df["CHAS"], housing_df["CAT_MEDV"])
What is the Python code to load a dataset and assign a name for the dataframe in pandas?
dataframename_df = pd.read_csv("dataset.csv", keep_default_na=False)
What is the Python code to rename all columns so that the spaces they contain are replaced by underscores (_)?
housing_df.columns = [s.strip().replace(" ", "_") for s in housing_df.columns]
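As a minimal sketch of what the list comprehension above does, the following applies it to a small made-up DataFrame. The name `toy_df` and its column names are assumptions chosen for illustration and do not come from the housing dataset.

```python
import pandas as pd

# Hypothetical toy frame; the column names are invented for illustration.
toy_df = pd.DataFrame({" CRIME RATE ": [0.1, 0.2], "MEDIAN VALUE": [24.0, 21.6]})

# strip() removes leading/trailing whitespace, replace() swaps inner spaces for "_".
toy_df.columns = [s.strip().replace(" ", "_") for s in toy_df.columns]

print(list(toy_df.columns))  # ['CRIME_RATE', 'MEDIAN_VALUE']
```

Note that `strip()` runs first, so padding around a name is dropped rather than turned into underscores.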
What is the Python code to calculate the arithmetic mean?
housing_df.VARIABLE.mean()
What is the Python code to calculate the median?
housing_df.VARIABLE.median()
What is the Python code to draw a random sample of 10 houses from the dataset?
Sample_1 = housing_df.sample(10, random_state=1)
Sample_1
What is the Python code to oversample houses with more than 12 rooms?
weights = [0.9 if rooms > 12 else 0.01 for rooms in housing_df.ROOMS]
Sample_2 = housing_df.sample(10, weights=weights, random_state=1)
Sample_2
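To see the effect of the weights, the sketch below repeats the same idea on made-up data in which only 3 of 100 houses have more than 12 rooms. The frame `toy_df` and its values are assumptions for illustration. pandas rescales the weights so that they sum to 1, so only their relative size matters.

```python
import pandas as pd

# Hypothetical toy data: 3 large houses (more than 12 rooms) among 100.
toy_df = pd.DataFrame({"ROOMS": [13, 14, 15] + [6] * 97})

# Large houses get a much higher sampling weight than all others.
weights = [0.9 if rooms > 12 else 0.01 for rooms in toy_df.ROOMS]

# Draw 10 rows without replacement, using the weights as relative probabilities.
sample = toy_df.sample(10, weights=weights, random_state=1)

# Compare the share of large houses in the sample and in the full data.
share_sample = (sample.ROOMS > 12).mean()
share_population = (toy_df.ROOMS > 12).mean()
print(share_sample, share_population)
```

Large houses make up 3% of the toy data, and with these weights they should appear far more often than that in the sample.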
What is the Python code to enter the data from a small table as a DataFrame?
Data_a = pd.DataFrame({"Age": [10,20,30], "Income": [30,40,50]})
What is the Python code to standardize two variables?
Data_a = pd.DataFrame({"Age": [10,20,30], "Income": [30,40,50]})
norm_df = (Data_a - Data_a.mean()) / Data_a.std()
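Worked through on the card's own numbers, the standardization looks as follows. This is only a sketch; the one point worth noting is that pandas `std()` uses the sample standard deviation (divisor n - 1) by default.

```python
import pandas as pd

Data_a = pd.DataFrame({"Age": [10, 20, 30], "Income": [30, 40, 50]})

# Column means are 20 and 40; sample standard deviations (ddof=1) are both 10.
means = Data_a.mean()
stds = Data_a.std()

# z-score: subtract the column mean, divide by the column standard deviation.
norm_df = (Data_a - means) / stds
print(norm_df)
```

Both columns become -1, 0, 1, so after standardization the two variables are on the same scale.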
What is the Python code to convert two categorical variables into dummy variables?
Housing_df = pd.get_dummies(Housing_df, columns=["Variable1", "Variable2"], prefix_sep="_", drop_first=True)
print(list(Housing_df.columns))
Housing_df.head()
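The following sketch shows what `get_dummies` with `drop_first=True` produces on a tiny made-up frame. The column name `ZONE` and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy frame with one categorical column that has three levels.
toy_df = pd.DataFrame({"ZONE": ["A", "B", "C", "A"], "MEDV": [24.0, 21.6, 34.7, 33.4]})

# One dummy column per level except the first ("A"), which becomes the reference level.
dummies_df = pd.get_dummies(toy_df, columns=["ZONE"], prefix_sep="_", drop_first=True)

print(list(dummies_df.columns))  # ['MEDV', 'ZONE_B', 'ZONE_C']
```

A row with `ZONE == "A"` has both dummies equal to 0 (or False, depending on the pandas version), which is why the first level can be dropped without losing information.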
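What is the Python code to partition the data into training (50%), validation (30%), and test (20%) sets?
from sklearn.model_selection import train_test_split
trainData, temp = train_test_split(Housing_df, test_size=0.5, random_state=1)
validData, testData = train_test_split(temp, test_size=0.4, random_state=1)
print("Training: ", trainData.shape)
print("Validation: ", validData.shape)
print("Test: ", testData.shape)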
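The two-step split above can be checked on a made-up frame of 100 rows. The sketch assumes scikit-learn is installed, and `toy_df` is a hypothetical stand-in for the housing data. The first call keeps 50% for training; the second splits the remaining 50% into 60% validation and 40% test, which is 30% and 20% of the full data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame with 100 rows.
toy_df = pd.DataFrame({"x": range(100)})

# Step 1: 50% training, 50% held back.
trainData, temp = train_test_split(toy_df, test_size=0.5, random_state=1)

# Step 2: split the held-back half into 60% validation and 40% test.
validData, testData = train_test_split(temp, test_size=0.4, random_state=1)

print(trainData.shape, validData.shape, testData.shape)  # (50, 1) (30, 1) (20, 1)
```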
What is the Python code to rename one variable?
housing_df = housing_df.rename(columns={"VARIABLE": "VARIABLENEW"})
What is the Python code to plot a histogram?
sns.histplot(data=housing_df, x="VARIABLE", binwidth=10) # 'binwidth' can be adjusted or omitted depending on the task
What is the Python code to plot two boxplots with whiskers?
sns.boxplot(y=housing_df["VARIABLE1"], x=housing_df["VARIABLE2"])
What is the Python code to plot two boxplots whose whiskers span the full range from minimum to maximum (no separate outlier points)?
sns.boxplot(y=housing_df["VARIABLE1"], x=housing_df["VARIABLE2"], whis=[0, 100])
What is the Python code to plot a bar chart with two bars showing the mean?
from numpy import mean
sns.barplot(data=housing_df, y="VARIABLE1", x="VARIABLE2", estimator=mean)
What is the Python code for a seaborn scatterplot with slightly transparent dots?
sns.scatterplot(data=housing_df, y="VARIABLE1", x="VARIABLE2", alpha=0.8)
What is the code for a scatterplot with a regression line in it?
sns.regplot(data=housing_df, y="VARIABLE1", x="VARIABLE2")
What is the code for a color-coded scatterplot according to VARIABLE3 (binary)?
sns.scatterplot(data=housing_df, y="VARIABLE1", x="VARIABLE2", hue="VARIABLE3", alpha=0.8)
What is the code for the minimum of a variable?
housing_df.VARIABLE.min()
What is the code for the maximum of a variable?
housing_df.VARIABLE.max()
What is the code for the standard deviation of a variable?
housing_df.VARIABLE.std()
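The summary statistics from the cards (mean, median, minimum, maximum, standard deviation) can be tried together on a small made-up column. The values below are assumptions chosen so that a single large value pulls the mean above the median.

```python
import pandas as pd

# Hypothetical toy column with one unusually large value.
toy_df = pd.DataFrame({"VARIABLE": [1, 2, 3, 4, 10]})

summary = {
    "mean": toy_df.VARIABLE.mean(),      # (1 + 2 + 3 + 4 + 10) / 5 = 4.0
    "median": toy_df.VARIABLE.median(),  # middle value of the sorted data = 3.0
    "min": toy_df.VARIABLE.min(),        # 1
    "max": toy_df.VARIABLE.max(),        # 10
    "std": toy_df.VARIABLE.std(),        # sample std: sqrt(50 / 4), about 3.54
}
print(summary)
```

The standard deviation here is the sample version (divisor n - 1), which is the pandas default.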
What is the code to show a frequency table of a binary variable?
housing_df.VARIABLE.value_counts()
What is the code to label the binary outcome (0, 1) with text?
housing_df["VARIABLE_lb"] = housing_df["VARIABLE"].map({0: "Under", 1: "Over"})
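A short sketch of the labelling step combined with a frequency table, on made-up data. The column name `CAT_MEDV` is borrowed from the earlier cards, while the values and label texts are assumptions for illustration.

```python
import pandas as pd

# Hypothetical binary column.
toy_df = pd.DataFrame({"CAT_MEDV": [0, 1, 1, 0, 1]})

# map() looks up each value in the dictionary; values missing from it would become NaN.
toy_df["CAT_MEDV_lb"] = toy_df["CAT_MEDV"].map({0: "Under", 1: "Over"})

# Frequency table of the labelled variable, most frequent label first.
counts = toy_df["CAT_MEDV_lb"].value_counts()
print(counts)
```

Keeping the labels in a separate `_lb` column leaves the numeric 0/1 coding available for modelling.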
What is the code for a scatterplot color-coded by VARIABLE3 (binary) with text labels as outcomes?
sns.scatterplot(data=housing_df, y="VARIABLE1", x="VARIABLE2", hue="VARIABLE3_lb", alpha=0.8)
What is the code to cross-tabulate two binary variables?
pd.crosstab(housing_df["VARIABLE1"], housing_df["VARIABLE2"])
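As an illustration of the cross-tabulation, the sketch below builds the table for two made-up binary columns and also shows row shares via the `normalize="index"` option of `pd.crosstab`. The data values are assumptions.

```python
import pandas as pd

# Hypothetical data for two binary variables.
toy_df = pd.DataFrame({
    "CHAS": [0, 0, 1, 1, 0, 1],
    "CAT_MEDV": [0, 1, 1, 1, 0, 0],
})

# Absolute frequencies: rows are CHAS, columns are CAT_MEDV.
table = pd.crosstab(toy_df["CHAS"], toy_df["CAT_MEDV"])

# Row shares: each row sums to 1.
shares = pd.crosstab(toy_df["CHAS"], toy_df["CAT_MEDV"], normalize="index")

print(table)
print(shares)
```

Row shares make the two groups comparable even when they contain different numbers of observations.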
Classification of an observation as belonging to one class, although it belongs to another.
Proportion of misclassified observations among all observations in the validation data.
Classify all observations as belonging to the most frequent class (benchmark).
High separation: Predictor variables lead to a low error. Low separation: Predictor variables do not significantly improve the naive rule.
For example: Class 1 (acceptance of credit) vs. Class 0 (rejection of credit). Calculate the probability of belonging to Class 1. If it is lower than 0.5 (threshold), classify as Class 0; otherwise, classify as Class 1.
1 - Error Rate
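The classification concepts above (threshold of 0.5, share of misclassified validation observations, the most-frequent-class benchmark, and 1 - error rate) can be sketched on a few made-up validation records. The column names, probabilities and actual classes are assumptions for illustration. As a simplification, the benchmark here takes the most frequent class from the same toy data rather than from separate training data.

```python
import pandas as pd

# Hypothetical validation data: actual class and predicted probability of Class 1.
valid_df = pd.DataFrame({
    "actual": [1, 0, 0, 1, 0, 0, 1, 0],
    "prob_1": [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.8, 0.45],
})

# Threshold rule: probability below 0.5 gives Class 0, otherwise Class 1.
threshold = 0.5
valid_df["predicted"] = (valid_df["prob_1"] >= threshold).astype(int)

# Error rate: share of misclassified observations (2 of 8 here).
error_rate = (valid_df["predicted"] != valid_df["actual"]).mean()

# Accuracy is the complement of the error rate.
accuracy = 1 - error_rate

# Naive rule: classify every observation as the most frequent class (Class 0 here).
most_frequent = valid_df["actual"].mode()[0]
naive_error = (valid_df["actual"] != most_frequent).mean()

print(error_rate, accuracy, naive_error)  # 0.25 0.75 0.375
```

In this toy example the classifier's error rate (0.25) is lower than the naive rule's (0.375), which is what the cards describe as the predictors improving on the benchmark.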