IAI | HSLU | Magdalena Picariello

Introduction to AI | HSLU

File details

Flashcards: 92
Language: English
Category: Computer science
Level: University
Created / Updated: 17.10.2023 / 02.11.2023
Web link:
https://card2brain.ch/box/20231017_iai_%7C_hslu_%7C_magdalena_picariello

What AI can and cannot do?

Due diligence on the project

Brainstorming framework

  • Think about automating tasks rather than automating whole jobs, e.g. call-center call routing, individual radiologist tasks
  • What are the main drivers of business value?
  • What are the main pain points in the business?

Key questions in technical due diligence:

  1. Can we meet the desired performance?
  2. Can we use pre-existing components?
  3. How much data is needed?
  4. What resources are needed?
  5. What are the dependencies?
  6. Are there any legal constraints?

Key questions in business due diligence:

● Does it lower costs?
● Does it generate revenue?
● Does it enable launching a new product?
● Does it generate ENOUGH value?

Is the project technically feasible?

Questions to ask yourself:

  1. Do other people solve similar problems?

  2. What performance do they achieve?

  3. With my skills, what performance can I achieve?

  4. Do I need additional resources?

  5. Can I do it in a reasonable time?


Can I make an AI project without big data?

Yes, you can make progress without big data.

Is having more data beneficial for AI projects?

Having more data never hurts and can often enhance AI performance.

 

What is the downside of relying on big data?

Gathering large volumes of data can be very expensive and resource-intensive.

Can limited data be valuable for AI projects?

Yes, you may be able to bring value to your AI project even with the limited data that you have. It depends on the specific project's goals and requirements.

Why are ML models from scientific publications often irreproducible?

●  There is no obligation to publish the code, model, and data

●  Even if the model (inference code) is published, the training code often is not

●  The exact training datasets are rarely available

●  Even if they are, the preprocessing code may be missing

●  Code quality is often poor (code that does not compile, missing dependencies, etc.)

●  Described models are often tuned to a specific metric

Risk assessment matrix

Risk assessment

Machine Learning Workflow

Echo / Alexa

  1. Collect data
     ○ Collect audio clips of people saying "Alexa"
     ○ Collect audio clips of people saying other stuff

  2. Train model
     ○ Classify audio clips (Alexa / not Alexa)
     ○ Iterate many times until good enough (see the sketch after this list)

  3. Deploy model
     ○ Put the ML software in the smart speaker
     ○ Get data back for failing cases
     ○ Maintain/update the model
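A minimal sketch of the training step in Python, assuming each clip has already been turned into a fixed-length feature vector; the random placeholder data and the logistic-regression model are illustrative only, not the actual Echo pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data: 1000 clips, each represented by a 40-dimensional feature vector
# (e.g. averaged spectrogram frames); label 1 = "Alexa", 0 = "not Alexa".
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic labels for the sketch

X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.2, random_state=0)

# Train the classifier (step 2); in practice you iterate here until it is good enough.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

dev_acc = accuracy_score(y_dev, model.predict(X_dev))
print(f"dev accuracy: {dev_acc:.3f}")
```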

Data Science Workflow

Optimizing a sales funnel

1. Collect data
   ○ History of users going to web pages (time / country / website)

2. Analyze data
   ○ Iterate many times to get good insights (see the sketch after this list)

3. Suggest hypotheses/actions
   ○ Deploy changes
   ○ Re-analyze the data periodically
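A small illustrative analysis of such page-visit history in Python; the table, column names, and conversion definition are invented for the example:

```python
import pandas as pd

# Hypothetical visit log: one row per page view, with a flag for completed purchases.
visits = pd.DataFrame({
    "country":   ["CH", "CH", "DE", "DE", "CH", "DE"],
    "page":      ["landing", "checkout", "landing", "landing", "checkout", "checkout"],
    "converted": [0, 1, 0, 0, 1, 0],
})

# Conversion rate per country and page: a starting point for hypotheses such as
# "the checkout page underperforms for DE users".
funnel = visits.groupby(["country", "page"])["converted"].mean()
print(funnel.sort_values())
```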

ML + DS Workflow

1. Collect data

2. Train model: iterate
   ● Analyze data (actual vs. expected)
   ● Formulate a hypothesis
   ● Suggest actions
   ● Implement, test, and repeat

3. Deploy the model

What is the initial phase in the machine learning project lifecycle?

The initial phase in the machine learning project lifecycle is Scoping, where the project's objectives and requirements are defined.

 

What activities are involved in the "Data" phase of the ML project lifecycle?

In the "Data" phase, the activities include defining data, establishing a baseline, labeling and organizing data.

What does the "Modeling" phase entail?

The "Modeling" phase involves selecting and training a model for the machine learning projec as well as Perform error analysis.

What takes place during the "Deployment" phase?

  • Productize the model
  • Monitor and maintain the system. 

What is the goal of Scoping - student version VS. Scoping: speech recognition

  - Defined for you
  - Goal: due diligence on it
    - Is it feasible?
    - What resources do I need?
  - What metric do I use?

 

VS:

Data - student version VS. Data: speech recognition

  - Provided
  - You need to do due diligence
    - Do I have enough features?
    - Do I need additional data sources?
    - Is the data quality good enough?

VS:

Model development is an iterative process

ML system = code + data

Deployment: ML in production

Deployment

  - Out of scope
  - Data drift / concept drift (see the monitoring sketch after this list)
  - Data and deployment are where most of the work happens
  - Very few jobs focus only on modeling
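As a rough illustration of monitoring for data drift in production: one common approach is to compare a feature's distribution at training time against its live distribution, for example with a two-sample Kolmogorov-Smirnov test; the data and the alert threshold below are invented for the sketch:

```python
import numpy as np
from scipy.stats import ks_2samp

# Feature values seen during training vs. values arriving in production (synthetic).
train_feature = np.random.default_rng(0).normal(loc=0.0, size=5000)
live_feature = np.random.default_rng(1).normal(loc=0.3, size=5000)  # shifted: drift

# Two-sample KS test: a small p-value suggests the distributions differ.
stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:  # example threshold, tune per use case
    print(f"possible data drift (KS statistic={stat:.3f}, p={p_value:.1e})")
```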

ML Project Lifecycle - student version

How do I start?

  1. Establish the baseline level of performance
  2. Implement a quick-and-dirty model
  3. Perform sanity checks
  4. Perform error analysis

Baseline level of performance

Ways to establish a baseline?

  ● Human-level performance (HLP)
  ● Literature search for state-of-the-art / open source
  ● Quick-and-dirty implementation
  ● Performance of an older system

A baseline helps us understand what may be possible.
In some cases (e.g. HLP) it also gives us an understanding of the irreducible error.

Quick and dirty model

What is advisable for first model?

●  Do a literature search to see what is possible (Kaggle forums, courses, blogs, open-source projects)

●  Find open-source implementations if available

A reasonable algorithm with good data will often outperform a great algorithm with not so good data.

Why is it important to perform sanity checks for code and algorithms?

We are only human, and our code is error-prone.

Sanity checks - How to perform Sanity checks to find errors early on?

● Get a small subset of the data
● Train a model that overfits it (see the sketch below)
● Does your model deliver the expected results?

● Are your results uploadable to Kaggle?
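A minimal sketch of the overfit-a-small-subset check, assuming a scikit-learn style model; the data here is synthetic, and if the model cannot reach near-perfect accuracy on a handful of examples, there is likely a bug in the features, labels, or training code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic training data standing in for the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] > 0).astype(int)

# Sanity check: take a tiny subset and deliberately overfit it.
X_small, y_small = X[:50], y[:50]
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_small, y_small)

train_acc = model.score(X_small, y_small)
print(f"accuracy on the small subset: {train_acc:.2f}")  # should be close to 1.0
assert train_acc > 0.95, "cannot even overfit 50 examples - check the pipeline"
```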

Error analysis - What is a single-number evaluation metric?

●  One number
●  Can capture several aspects at once
●  Evaluated at different stages of the process
●  Allows you to track progress

An evaluation metric allows you to quickly tell whether model A or model B is better.
Having a dev set plus a single-number evaluation metric speeds up iteration.
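A tiny illustration of how a single-number metric on a fixed dev set makes the model A vs. model B decision immediate; F1 is just one example of such a metric, and the labels and predictions are invented:

```python
from sklearn.metrics import f1_score

# Same dev-set labels, predictions from two candidate models (illustrative values).
y_dev  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
pred_a = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
pred_b = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0]

f1_a = f1_score(y_dev, pred_a)
f1_b = f1_score(y_dev, pred_b)
print(f"model A: {f1_a:.3f}, model B: {f1_b:.3f}")  # one number each, easy to pick
```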

Error analysis - Comparing to human-level performance

Why human level performance?

As long as ML is worse than humans, you can:

●  Get labeled data from humans

●  Gain insights from manual error analysis: why did a person get it right?

●  Do better analysis of bias/variance
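A worked example of that bias/variance analysis with made-up error rates: comparing training and dev error against human-level performance shows which problem to attack first.

```python
# Illustrative error rates (fractions), not real project numbers.
hlp_error = 0.01    # human-level performance, a proxy for the irreducible error
train_error = 0.08
dev_error = 0.10

avoidable_bias = train_error - hlp_error  # 0.07: the model underfits relative to humans
variance = dev_error - train_error        # 0.02: the generalization gap is smaller
print(f"avoidable bias: {avoidable_bias:.2f}, variance: {variance:.2f}")
# Here bias dominates, so a bigger model or longer training is the more promising fix.
```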

Prioritizing what to work on

How to decide on the most important categories to work on, based on:

●  How much room for improvement there is

●  How frequently that category appears

●  How easy it is to improve accuracy in that category

●  How important it is to improve this category
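A small illustrative calculation combining the first two factors: weighting each category's room for improvement (gap to human-level performance) by how often it appears gives a rough upper bound on the overall gain; the categories and numbers are made up.

```python
# Hypothetical error-analysis categories from a speech-recognition project.
categories = [
    # (name, model accuracy, human-level accuracy, share of the data)
    ("car noise",     0.80, 0.95, 0.20),
    ("accents",       0.88, 0.93, 0.30),
    ("low bandwidth", 0.70, 0.72, 0.05),
]

for name, acc, hlp, share in categories:
    potential_gain = (hlp - acc) * share  # upper bound on overall accuracy improvement
    print(f"{name:14s} potential overall gain: {potential_gain:.3f}")
# "car noise" offers the largest potential gain here (0.15 * 0.20 = 0.030).
```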

Data pipeline

Experiment tracking
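A minimal, hand-rolled sketch of what experiment tracking records per training run; dedicated tools (e.g. MLflow, Weights & Biases) do this more robustly, and the fields shown are typical rather than a fixed schema.

```python
import json
import time

def log_experiment(params, metrics, path="experiments.jsonl"):
    """Append one experiment record (hyperparameters + results) to a JSON-lines file."""
    record = {
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "params": params,    # e.g. model type, learning rate, dataset version
        "metrics": metrics,  # e.g. dev-set accuracy, F1
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example usage after one training run (illustrative values).
log_experiment({"model": "logreg", "lr": 0.01, "data_version": "v2"},
               {"dev_accuracy": 0.91})
```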