IAI | HSLU | Magdalena Picariello

Introduction to AI | HSLU

Set of flashcards Details

Flashcards	92
Language	English
Category	Computer Science
Level	University
Created / Updated	17.10.2023 / 02.11.2023
Weblink	https://card2brain.ch/box/20231017_iai_%7C_hslu_%7C_magdalena_picariello

What is the strategy of "trying random values" when tuning hyperparameters?

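It means sampling hyperparameter values at random from a chosen range instead of stepping through a fixed grid. This avoids bias toward hand-picked values and tries more distinct values of each hyperparameter for the same number of runs.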
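
A minimal sketch of random search, assuming two hyperparameters (a learning rate and a number of hidden units) whose ranges are chosen here purely for illustration. The `validation_score` function is a hypothetical stand-in for training a model and scoring it on a dev set.

```python
import math
import random

def sample_hyperparameters(rng):
    """Draw one random configuration (ranges are illustrative assumptions)."""
    # Learning rate: sample the exponent uniformly so that each order of
    # magnitude between 1e-4 and 1e-1 is equally likely.
    learning_rate = 10 ** rng.uniform(-4, -1)
    # Hidden units: uniform over an integer range.
    hidden_units = rng.randint(16, 256)
    return {"learning_rate": learning_rate, "hidden_units": hidden_units}

def validation_score(config):
    """Hypothetical stand-in for training a model and scoring it on a dev set."""
    # Peaks at learning_rate = 1e-2 and hidden_units = 128; higher is better.
    lr_penalty = (math.log10(config["learning_rate"]) + 2) ** 2
    size_penalty = ((config["hidden_units"] - 128) / 128) ** 2
    return -(lr_penalty + size_penalty)

def random_search(n_trials, seed=0):
    """Sample n_trials random configurations and return the best-scoring one."""
    rng = random.Random(seed)
    trials = [sample_hyperparameters(rng) for _ in range(n_trials)]
    return max(trials, key=validation_score)

best = random_search(n_trials=50)
print(best)
```

Sampling the learning rate on a log scale is a design choice of this sketch, not something the card states; it keeps small values from being crowded out by large ones.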

What does the concept of "coarse to fine" mean in hyperparameter tuning?

It refers to starting with a wide exploration of hyperparameter values (coarse) and gradually narrowing down the search to find the optimal values (fine).
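
The coarse-to-fine idea can be sketched in two passes, assuming a single hyperparameter (the exponent of the learning rate). The `dev_error` function and the window width of 0.5 are hypothetical choices made only for this example.

```python
import random

def dev_error(learning_rate_exponent):
    """Hypothetical stand-in for a real train-and-evaluate run (lower is better)."""
    return (learning_rate_exponent + 2.3) ** 2

def search(low, high, n_trials, rng):
    """Randomly sample exponents in [low, high] and return the best one."""
    candidates = [rng.uniform(low, high) for _ in range(n_trials)]
    return min(candidates, key=dev_error)

rng = random.Random(0)

# Coarse pass: explore the whole plausible range of exponents.
coarse_best = search(-5.0, 0.0, n_trials=20, rng=rng)

# Fine pass: zoom into a narrow window around the coarse winner.
fine_best = search(coarse_best - 0.5, coarse_best + 0.5, n_trials=20, rng=rng)

# Keep whichever of the two passes scored better.
best = min([coarse_best, fine_best], key=dev_error)
print(f"learning rate ~ {10 ** best:.5f}")
```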

What is the primary goal of data augmentation in machine learning?

Create realistic examples that (1) the algorithm does poorly on, but (2) humans do well on.

Data augmentation aims to generate additional training examples to enhance the performance of a machine learning model.

Checklist for data augmentation:

● Does it sound realistic?

● Is the x->y mapping clear (e.g., can humans recognize the speech)?

● Is the algorithm currently doing poorly on it?
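
For speech, one common form of augmentation is to overlay background noise on a clean clip. The sketch below assumes clips are plain lists of samples in [-1.0, 1.0]; the tone, the random noise, and the `noise_gain` value are synthetic stand-ins for real recordings, not data from the course.

```python
import math
import random

def mix_with_noise(speech, noise, noise_gain=0.3):
    """Overlay background noise on a speech clip, sample by sample.

    Both clips are lists of floats in [-1.0, 1.0]. The noise is looped if it
    is shorter than the speech. noise_gain is an illustrative assumption.
    """
    mixed = []
    for i, sample in enumerate(speech):
        value = sample + noise_gain * noise[i % len(noise)]
        # Clip so the result stays a valid waveform.
        mixed.append(max(-1.0, min(1.0, value)))
    return mixed

# Synthetic stand-ins for real recordings: a 440 Hz tone as "speech" and
# random samples as "cafe noise", both at a 16 kHz sample rate.
sample_rate = 16_000
speech = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate)
          for t in range(sample_rate)]
rng = random.Random(0)
cafe_noise = [rng.uniform(-1.0, 1.0) for _ in range(4_000)]

augmented = mix_with_noise(speech, cafe_noise)
# The label of the augmented clip is the label of the original speech clip.
```

Whether a clip produced this way passes the checklist still has to be judged by listening to it.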

How can we perform artificial data synthesis?

With unstructured data, we can perform artificial data synthesis

What is feature engineering in machine learning?

With structured data, we can do feature engineering.

Example: Restaurant recommender
Vegetarians are frequently recommended restaurants with only meat options.
Possible features to add:

● Is the person vegetarian (based on past orders)?
● Does the restaurant have vegetarian options (based on menu)?
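
The two features above could be derived roughly as follows. This is a sketch under assumptions: the record layout (a `contains_meat` flag per dish) and the rule "vegetarian means no past order contained meat" are hypothetical, chosen only to make the example concrete.

```python
def is_vegetarian(past_orders):
    """True if the person has ordered before and every past order was meat-free."""
    return len(past_orders) > 0 and all(
        not order["contains_meat"] for order in past_orders
    )

def has_vegetarian_options(menu):
    """True if at least one dish on the menu is meat-free."""
    return any(not dish["contains_meat"] for dish in menu)

def build_features(past_orders, menu):
    """Return the two engineered features as 0/1 values for a model input row."""
    return {
        "person_is_vegetarian": int(is_vegetarian(past_orders)),
        "restaurant_has_vegetarian_options": int(has_vegetarian_options(menu)),
    }

# Hypothetical data, for illustration only.
orders = [{"dish": "lentil curry", "contains_meat": False},
          {"dish": "margherita pizza", "contains_meat": False}]
steakhouse_menu = [{"dish": "ribeye", "contains_meat": True},
                   {"dish": "beef burger", "contains_meat": True}]

print(build_features(orders, steakhouse_menu))
```

With these features in the input, a recommender has the information it needs to stop pairing a vegetarian with a meat-only restaurant.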

What is the role of data iteration in the feature engineering process?

Data iteration involves continuously revising and improving features based on the results of error analysis, user feedback, and benchmarking. It helps enhance the quality and relevance of features.

Why can error analysis be more challenging without a good baseline (HLP)?

Error analysis becomes challenging without a good baseline (Human-Level Performance or HLP) because you lack a reliable reference point for comparison. A baseline provides insight into how well a human can perform the task, which is essential for identifying areas where the model falls short.
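
One way to use HLP as a reference point is to compare model accuracy with human accuracy per category and look at the gap. The categories and all numbers below are hypothetical, and `gap_to_hlp` is a name introduced only for this sketch.

```python
def gap_to_hlp(model_accuracy, hlp_accuracy):
    """Room for improvement per category: HLP accuracy minus model accuracy."""
    return {tag: hlp_accuracy[tag] - model_accuracy[tag] for tag in model_accuracy}

# Hypothetical accuracies, for illustration only.
model_accuracy = {"clear speech": 0.94, "car noise": 0.89, "people noise": 0.87}
hlp_accuracy = {"clear speech": 0.95, "car noise": 0.93, "people noise": 0.89}

gaps = gap_to_hlp(model_accuracy, hlp_accuracy)

# Categories with the largest gap to HLP come first.
worst_first = sorted(gaps, key=gaps.get, reverse=True)
for tag in worst_first:
    print(f"{tag}: gap to HLP = {gaps[tag]:.2f}")
```

In this made-up data, "people noise" has the lowest model accuracy, but humans also struggle with it, so "car noise" shows the larger gap. Without the HLP column that difference would not be visible.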

Q1: How can user feedback contribute to the data iteration process in feature engineering?

Q2: Why is benchmarking against competitors a useful source of inspiration for feature engineering?

A1: User feedback is valuable for identifying which features users find relevant or missing. It provides insights that guide the selection and engineering of features to enhance user satisfaction and model performance.

A2: Benchmarking against competitors helps identify areas where your model can gain a competitive edge. By analyzing competitor performance, you can inspire the development of unique features that set your model apart in the field.

Build your system quickly, then iterate

What directions can you take to improve a speech recognition system?

● Noisy background:

○ Cafe noise

○ Car noise

● Accented speech

● Far from microphone

● Young children

● Stuttering

Data or modeling?

AI system = Code (model) + data

Data or modeling?

Model-centric view vs. data-centric view

What is the typical mindset in Data or Modeling?

Traditional machine learning research is driven by improving performance on benchmark datasets.
Researchers often work on a fixed dataset they download.
This approach has led to significant progress in machine learning.
In production systems, the dataset doesn't need to remain fixed.
It's common to edit the training and test sets to enhance data quality for better system performance.

Questions:

In what cases can focusing on optimizing the data and hyperparameters be more effective than code optimization?
What does a machine learning system comprise in terms of components?
How does taking a non-model-centric approach differ in terms of optimization?
What is a crucial step during the modeling phase of machine learning?
How can error analysis be utilized in the context of data improvement?
Why is collecting more data not always the most efficient solution?
How can error analysis contribute to a high-accuracy model?

How long should you spend obtaining data?

What can AI do, and what can it not do?

Due diligence on the project

Brainstorming framework

Think about automating tasks rather than automating jobs, e.g. call center routing, radiologists.
What are the main drivers of business value?
What are the main pain points in the business?

Key Questions in Technical diligence:

● Can we meet desired performance?
● Can we use pre-existing components?
● How much data is needed?
● What resources are needed?
● What are the dependencies?
● Are there any legal constraints?

Key Questions in Business diligence:

● Does it lower costs?
● Does it generate revenue?
● Does it enable launching a new product?
● Does it generate ENOUGH value?

Is the project technically feasible?

Questions to ask yourself:

Do other people solve similar problems?
What performance do they achieve?
With my skills, what performance can I achieve?
Do I need additional resources?
Can I do it in a reasonable time?

Can I make an AI project without big data?

Yes, you can make progress without big data.

Is having more data beneficial for AI projects?

Having more data never hurts and can often enhance AI performance.

What is the downside of relying on big data?

Gathering large volumes of data can be very expensive and resource-intensive.

Can limited data be valuable for AI projects?

Yes, you may be able to bring value to your AI project even with the limited data that you have. It depends on the specific project's goals and requirements.

Why are ML models from scientific publications often irreproducible?

● There is no obligation to publish code, model, and data

● If the model (inference code) is published, it doesn’t mean that the training code is published too

● The exact training datasets are rarely available

● Even if they are, the preprocessing code may be missing

● Code quality is often poor (not compiling, missing dependencies etc.)

● Described models are tuned to a specific metric

Risk assessment matrix

Risk assessment

Machine Learning Workflow

Echo / Alexa

Collect data
○ Collect audio clips of people saying “Alexa”
○ Collect audio clips of people saying other stuff
Train model
○ Classify audio clips (Alexa/Not Alexa)
○ Iterate many times till good enough
Deploy model
○ Put ML software in the smart speaker
○ Get data back for failing cases
○ Maintain/update model
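
The three workflow steps can be sketched as a toy loop. Everything here is an assumption made for illustration: each audio clip is reduced to a single made-up score, the "model" is just a threshold on that score, and the function names are hypothetical. A real wake-word detector would train a classifier on audio features.

```python
def collect_data():
    """Step 1: labeled clips, each reduced to one made-up score (1 = "Alexa")."""
    alexa = [(0.9, 1), (0.8, 1), (0.75, 1), (0.85, 1)]
    other = [(0.2, 0), (0.35, 0), (0.1, 0), (0.4, 0)]
    return alexa + other

def accuracy(threshold, data):
    """Fraction of clips where 'score >= threshold' matches the label."""
    correct = sum(1 for score, label in data if int(score >= threshold) == label)
    return correct / len(data)

def train_model(data, target_accuracy=0.95):
    """Step 2: try thresholds until one is good enough (or candidates run out)."""
    best_threshold, best_accuracy = 0.0, accuracy(0.0, data)
    for step in range(1, 100):
        threshold = step / 100
        acc = accuracy(threshold, data)
        if acc > best_accuracy:
            best_threshold, best_accuracy = threshold, acc
        if best_accuracy >= target_accuracy:
            break
    return best_threshold

def deploy(threshold):
    """Step 3: return the function that would run on the device."""
    def is_wake_word(score):
        return score >= threshold
    return is_wake_word

data = collect_data()
threshold = train_model(data)
is_wake_word = deploy(threshold)

# Clips the deployed model gets wrong are collected for the next training round.
field_clips = [(0.95, 1), (0.3, 1), (0.05, 0)]
failing_cases = [(s, y) for s, y in field_clips if int(is_wake_word(s)) != y]
data.extend(failing_cases)
```

Retraining on the extended `data` corresponds to the "Maintain/update model" step.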
