IAI | HSLU | Magdalena Picariello
Introduction to AI | HSLU
File Details
Flashcards | 92 |
---|---|
Language | English |
Category | Computer Science |
Level | University |
Created / Updated | 17.10.2023 / 02.11.2023 |
Web link | https://card2brain.ch/box/20231017_iai_%7C_hslu_%7C_magdalena_picariello |
Brainstorming framework
- Think about automating tasks rather than automating entire jobs, e.g. routing calls in a call center, or individual tasks of radiologists
- What are the main drivers of the business value?
- What are the main pain points in the business?
Key questions in technical due diligence:
● Can we meet desired performance?
● Can we use pre-existing components?
● How much data is needed?
● What resources are needed?
● What are the dependencies?
● Are there any legal constraints?
Key questions in business due diligence:
● Does it lower costs?
● Does it generate revenue?
● Does it enable launching a new product?
● Does it generate ENOUGH value?
Is the project technically feasible?
Questions to ask yourself:
Do other people solve similar problems?
What performance do they achieve?
With my skills, what performance can I achieve?
Do I need additional resources?
Can I do it in a reasonable time?
Can I make an AI project without big data?
Yes, you can make progress without big data.
Is having more data beneficial for AI projects?
Having more data never hurts and can often enhance AI performance.
What is the downside of relying on big data?
Gathering large volumes of data can be very expensive and resource-intensive.
Can limited data be valuable for AI projects?
Yes, you may be able to bring value to your AI project even with the limited data that you have. It depends on the specific project's goals and requirements.
Why are ML models from scientific publications often irreproducible?
● There is no obligation to publish code, model, and data
● If model (inference code) is published, it doesn’t mean that training code is published too
● The exact training datasets are rarely available
● Even if they are, the preprocessing code may be missing
● Code quality is often poor (does not compile, missing dependencies, etc.)
● The described models are often tuned to a specific metric
Machine Learning Workflow
Echo / Alexa
Collect data
○ Collect audio clips of people saying “Alexa”
○ Collect audio clips of people saying other stuff
Train model
○ Classify audio clips (Alexa/Not Alexa)
○ Iterate many times until it is good enough
Deploy model
○ Put ML software in the smart speaker
○ Get data back for failing cases
○ Maintain/update model
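The "train model" step above can be sketched in a few lines. Real wake-word systems use audio features such as MFCCs; the synthetic feature vectors and the nearest-centroid classifier here are purely illustrative assumptions, not Amazon's actual method.

```python
import numpy as np

# Stand-in for audio features extracted from clips (real systems would
# use e.g. MFCCs); the two classes are drawn from well-separated
# distributions so the toy example is learnable.
rng = np.random.default_rng(0)
alexa = rng.normal(loc=1.0, scale=0.5, size=(100, 13))   # clips saying "Alexa"
other = rng.normal(loc=-1.0, scale=0.5, size=(100, 13))  # clips saying other stuff

X = np.vstack([alexa, other])
y = np.array([1] * 100 + [0] * 100)  # 1 = Alexa, 0 = Not Alexa

# "Train model": the simplest possible classifier, nearest class centroid.
centroid_alexa = X[y == 1].mean(axis=0)
centroid_other = X[y == 0].mean(axis=0)

def predict(x):
    """Classify a clip's feature vector by its nearest class centroid."""
    d_alexa = np.linalg.norm(x - centroid_alexa)
    d_other = np.linalg.norm(x - centroid_other)
    return 1 if d_alexa < d_other else 0

preds = np.array([predict(x) for x in X])
train_accuracy = (preds == y).mean()
```

The deploy step would then ship `predict` to the device and log the clips it gets wrong, feeding the "get data back for failing cases" loop.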
Data Science Workflow
Optimizing a sales funnel
1. Collect data
○ History of users going to web pages (time/country/website)
2. Analyze data
○ Iterate many times to get good insights
3. Suggest hypotheses/actions
○ Deploy changes
○ Re-analyze data periodically
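The "analyze data" step above can be sketched as computing per-country conversion rates from the visit history. The log fields (`user`, `country`, `purchased`) and the numbers are illustrative assumptions.

```python
from collections import defaultdict

# Toy page-visit log; in practice this would come from web analytics.
visits = [
    {"user": "u1", "country": "CH", "purchased": True},
    {"user": "u2", "country": "CH", "purchased": False},
    {"user": "u3", "country": "DE", "purchased": False},
    {"user": "u4", "country": "DE", "purchased": False},
    {"user": "u5", "country": "CH", "purchased": True},
]

counts = defaultdict(lambda: [0, 0])  # country -> [conversions, total visits]
for v in visits:
    counts[v["country"]][1] += 1
    if v["purchased"]:
        counts[v["country"]][0] += 1

conversion_rate = {c: conv / total for c, (conv, total) in counts.items()}
# CH converts at 2/3 while DE converts at 0/2: one hypothesis to test
# would be that something in the DE checkout flow is broken.
```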
ML + DS Workflow
1. Collect data
2. Train model: iterate
● Analyze data (actual vs expected)
● Formulate hypothesis
● Suggest actions
● Implement, test, and repeat
3. Deploy the model
What is the initial phase in the machine learning project lifecycle?
The initial phase in the machine learning project lifecycle is Scoping, where the project's objectives and requirements are defined.
What activities are involved in the "Data" phase of the ML project lifecycle?
In the "Data" phase, the activities include defining the data, establishing a baseline, and labeling and organizing the data.
What does the "Modeling" phase entail?
The "Modeling" phase involves selecting and training a model for the machine learning project, as well as performing error analysis.
What takes place during the "Deployment" phase?
- Productize the model
- Monitor and maintain the system.
Deployment
- Out-of-scope inputs
- Data drift / concept drift
- Data and deployment are where most of the work happens
- Relatively little of the work goes into modeling
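A minimal sketch of monitoring for data drift: compare a feature's distribution in production against the training data and flag large shifts. The threshold and the synthetic data are arbitrary assumptions; real systems use proper statistical tests.

```python
import numpy as np

# Simulated feature values: training data vs. drifted production data.
rng = np.random.default_rng(2)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)
prod_feature = rng.normal(loc=0.8, scale=1.0, size=1000)  # inputs have drifted

# Flag the model for review when the mean shifts by more than half a
# (pooled) standard deviation -- the 0.5 factor is an arbitrary choice.
shift = abs(prod_feature.mean() - train_feature.mean())
pooled_std = np.sqrt((train_feature.std() ** 2 + prod_feature.std() ** 2) / 2)
drift_detected = shift > 0.5 * pooled_std
```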
How do I start?
Establish the baseline level of performance
Implement a quick-and-dirty model
Perform sanity checks
Error analysis
Ways to establish baseline?
● Human level performance
● Literature search for the state of the art / open source
● Quick-and-dirty implementation
● Performance of older system
A baseline helps us understand what may be possible.
In some cases (e.g. human-level performance, HLP) it also gives us an estimate of the irreducible error.
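Comparing against an HLP baseline can be sketched as a simple bias/variance calculation. All the error rates below are made up for the example.

```python
# Illustrative error rates; in practice these come from measurement.
hlp_error = 0.01    # humans misclassify 1% of examples (the baseline)
train_error = 0.08  # model error on the training set
dev_error = 0.10    # model error on the dev set

avoidable_bias = train_error - hlp_error  # gap to the baseline (~0.07)
variance = dev_error - train_error        # train-to-dev gap (~0.02)

# Here avoidable bias dominates variance, suggesting a bigger model or
# longer training is more promising than collecting more data.
```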
Quick and dirty model
What is advisable for first model?
● Do literature search to see what is possible (kaggle forum, courses, blogs, open-source-project)
● Find open source implementations if available
A reasonable algorithm with good data will often outperform a great algorithm with not so good data.
Why is it important to perform sanity checks for code and algorithms?
We are only human, and our code is error-prone.
Sanity checks - How to perform Sanity checks to find errors early on?
● Get a small subset of the data
● Train an overfitting model
● Does your model deliver expected results?
● Are your results uploadable to Kaggle?
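The "overfit a small subset" check above can be sketched with a model that can memorize, such as 1-nearest-neighbour: on a tiny subset it should reach ~100% training accuracy, even on random labels. If it does not, something in the pipeline (labels, preprocessing, loss) is likely broken. The data here is synthetic.

```python
import numpy as np

# A small subset of the data with random labels -- a memorizing model
# must still fit these perfectly.
rng = np.random.default_rng(1)
X_small = rng.normal(size=(20, 5))
y_small = rng.integers(0, 2, size=20)

def predict_1nn(x, X_train, y_train):
    """Return the label of the closest training example."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

# Evaluating on the training points themselves: each point is its own
# nearest neighbour (distance 0), so accuracy should be exactly 1.0.
preds = np.array([predict_1nn(x, X_small, y_small) for x in X_small])
train_acc = (preds == y_small).mean()
```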
Error analysis - What is the single-number evaluation metric?
A single-number evaluation metric combines several quality measures into one score (e.g. F1 combines precision and recall), so that comparing models during iteration becomes a one-number decision.
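A common single-number metric is F1, the harmonic mean of precision and recall; the sketch below computes it from raw counts. The two models and their counts are invented for illustration.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Model A: high precision (~0.86) but low recall (0.30).
f1_a = f1_score(tp=30, fp=5, fn=70)
# Model B: balanced precision (~0.67) and recall (0.60).
f1_b = f1_score(tp=60, fp=30, fn=40)
# f1_b > f1_a, so Model B wins on the single-number metric.
```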
Why human level performance?
As long as ML is worse than humans, you can:
● Get labeled data from humans
● Gain insights from manual error analysis: why did a person get it right?
● Do better analysis of bias/variance