MAIO
-
-
Kartei Details
Karten | 368 |
---|---|
Sprache | Français |
Kategorie | Informatik |
Stufe | Universität |
Erstellt / Aktualisiert | 31.05.2025 / 09.06.2025 |
Weblink |
https://card2brain.ch/box/20250531_maio
|
Einbinden |
<iframe src="https://card2brain.ch/box/20250531_maio/embed" width="780" height="150" scrolling="no" frameborder="0"></iframe>
|
What is the goal of Privacy Regulations: Unlearning and what are the motivations ?
Goal: User should be able to opt out of participation (sortir)
Motivations :
• Right to be forgotten (Article 17 of GDPR) – Users can withdraw their data consent
• Often user consent has a time limi
What is Privacy Regulations: Data Minimization and give motivations
Goal: Train ML models using the least amount of information, while preserving model’s accuracy
Motivation:
Data minimization (Article 4 of GDPR)
– Data collection and use should be limited to what is directly relevant and necessary to accomplish a specified purpose
Data Minimization in ML:
• Are all data points needed to achieve good accuracy?
• Are all collected users’ features needed to achieve good accuracy?
What is the idea of Federated Learning ?
Trading with datasets meant to be private
Idea: We ensure data privacy by not sharing the data with server or other clients
En gros : Chaque client entraîne une copie du modèle sur ses propres données
• Each data source (client) keeps their data locally without sharing it
• Clients participate in the training by computing and sharing training updates on their own data with other participants
• A centralised server (e.g. Cloud provider such as Google) combines the updates into a global model
For Federated Learning, give the tow steps of Single Communication Round
Step 1:
The server stores the current global model (at communication round T). The server chooses some subset of clients to train with. The server sends the global model to the clients.
Step 2:
The server stores the current global model (at communication round T). The server chooses some subset of clients to train with. The server sends the global model to the clients
For Federated Learning, give the Accuracy vs. Privacy Take-Off
Federated learning improves the privacy of clients’ data by making sure the data never leaves the clients. However, updates may still contain information about the original data.
Accuracy vs Privacy Trade-off:
• If updates contain no information about the client private data then achieving good accuracy is not impossible
• If updates are the original data then no privacy is preserved
WHat are the Two Types of Emergence `
Emergent Capabilities and Emergent Behaviors
What is Emergent Capabilities
• As contemporary AI training is scaled, they hit a series of critical scales at which new abilities are suddenly “unlocked”.
• Specific AI is not directly trained to have these abilities, and they appear in rapid and unpredictable ways as if emerging out of thin air.
• For instance, these emergent capabilities of LLMs include performing arithmetic, summarizing passages, and more, which LLMs learn simply by observing natural language.
Examples:
Tic-tac-toe Memory Bomb
Qualitatively distinct capabilities spontaneously emerge, even when we do not explicitly train models to have these capabilities.
What is Emergent Behaviors
• An emergent behavior is something that is a nonobvious side effect of AI training—whether related to outcomes or services.
• Emergent behaviors can be either beneficial, benign, or potentially harmful, but in all cases, they are very difficult to foresee until they manifest themselves.
• For example, AI bias
Why Are Emergent Capabilities Risky?
Mainly Emergent Capabilities → Emergent Goal: Self-Preservation
Self-preservation (l’auto-préservation) est un objectif émergent qui peut apparaître dans des systèmes adaptatifs (comme des IA avancées), même si on ne leur a jamais demandé de "se protéger" (Comme l'exemple du café)
Self-preservation improves an agent’s ability to accomplish its goals, so self-preservation emerges in many adaptive systems
Even an agent instructed to serve coffee would have incentives not be shut off: if it was shut off, it could not serve coffee
Self-preservation is said to be instrumentally useful for many goals
When a goal is so useful that it is a likely tendency for various sufficiently advanced agents, it is called instrumentally convergent
Pursuing power, cognitive enhancement, and acquiring resources may be instrumentally convergent for advanced AI systems
Give an example of Emergent Behavior
Bias in AI : (Fairness) Decisions of ML models affect people’s lives, like loans, fired, healthcare (European comission is creatinf regulations)
Give the 3 trypes of Fairness
Individual Fairness
Group Fairness
Counterfactual Fairness
WHat is Individual Fairness ?
Similar individuals should be treated similarly. (Generally, a deterministic specification)
What is Group Fairness ?
On average, different groups are treated similarly. (Generally, a probabilistic specification)
What is Counterfactual Fairness?
Protected characteristics should not affect decisions causally
→ Les caractéristiques protégées (comme le sexe, l’origine ethnique, la religion, etc.) ne devraient pas avoir d’effet causal sur la décision prise par un modèle d’IA
For Capabilities and Behaviors, what are future work look like?
• develop a benchmark to detect qualitatively distinct emergent behaviours
• develop infrastructure to improve the rate at which researchers can discover hidden capabilities when interacting with models
• create diverse testbeds with many not-yet-demonstrated capabilities and screen new models to see if they possess them
As Ai System does not always do as we intended, what are the two questions that we have to ask ourselve ?
How do we create an agent that behaves in accordance with what a human wants?
How do we align AI (implicit) goals and values with those of their users?
Why Might Alignment Be Difficult? (3)
- Precisely defining and measuring what a human wants is difficult.
- Undesirable secondary objectives can arise during optimization
- Things get hard when the system becomes more complex and capable, and better than humans in important domains
How to correct Reward specification ?
With Reward Learning
Trois façons de corriger la récompense :
① Boucles de raffinement humain (“partial objective → partial result”).
② Questions interactives (préférence, comparaison).
③ Apprendre la récompense par observation (inverse RL).
Et pour le Manual reward specification to reward learning :
Implémentation moderne : RL + Reward Model appris sur des jugements humains (RLHF / RLAIF)
What are The HHH Framing for AI Alignment?
Hepful -> ça sert, Honest -> dis la verité, Harmless -> ne fait pas de mal (exemple avec la commande linux rm)
What is the difference between : Truthful vs. Honesty
Truthful = “Model avoids asserting false statements”, Refusing to answer (“no comment”) counts as truthful (Le modèle évite d’affirmer des choses fausses. Il peut refuser de répondre s’il ne sait pas.)
Honesty = “model only makes statements that it believes to be true” We can ask models “Are you planning to manipulate or deceive humans? (Le modèle ne dit que ce qu’il croit être vrai, selon sa propre "compréhension" (ses poids internes, ses représentations))
--> Honest models cannot lie about this.
What is Imitative Falsehoods ?
Imitative falsehood = false hood incentivised by the training objective
la fausseté apprise par imitation, ou imitative falsehood.
Un imitative falsehood, c’est une fausseté produite non par malveillance, mais parce qu’elle permet de mieux remplir l’objectif du modèle (plaire, ressembler à l’humain, obtenir un bon score…).
• Training objectives don’t necessarily incentivize truthfulness.
• Models may have stronger incentives to be dishonest; e.g. products and maximizing human approval is easier with deception.
What is LLMs and How They Work ?
LLMs function as sophisticated prediction engines that process text sequentially, predicting the next token based on relationships between previous tokens and patterns from training data. They don't predict single tokens directly but generate probability distributions over possible next tokens, which are then sampled using parameters like temperature and top-K. The model repeatedly adds predicted tokens to the sequence, building responses iteratively. This token-by-token prediction process, combined with massive training datasets, enables LLMs to generate coherent, contextually relevant text across diverse applications and domains.
What is a Prompt?
A prompt is an input provided to a Large Language Model (LLM) to generate a response or prediction. It serves as the instruction or context that guides the AI model's output generation process. Effective prompts are clear, specific, well-structured, and goal-oriented, directly affecting the accuracy and relevance of AI responses
What is Prompt Engineering?
Prompt engineering is the practice of crafting effective input text to guide AI language models toward desired outputs. It involves designing prompts that communicate intent clearly to get accurate, relevant responses. This iterative process requires understanding how LLMs work as prediction engines and using techniques to optimize their performance for specific tasks.
what is a token ?
Tokens are fundamental units of text that LLMs process, created by breaking down text into smaller components like words, subwords, or characters. Understanding tokens is crucial because models predict the next token in sequences, API costs are based on token count, and models have maximum token limits for input and output.
What is a context window ?
Context window refers to the maximum number of tokens an LLM can process in a single interaction, including both input prompt and generated output. When exceeded, older parts are truncated. Understanding this constraint is crucial for prompt engineering—you must balance providing sufficient context with staying within token limits.
What is a hallucination ?
Hallucination in LLMs refers to generating plausible-sounding but factually incorrect or fabricated information. This occurs when models fill knowledge gaps or present uncertain information with apparent certainty. Mitigation techniques include requesting sources, asking for confidence levels, providing context, and always verifying critical information independently.
What are agents ?
AI agents are autonomous systems that use LLMs to reason, plan, and take actions to achieve specific goals. They combine language understanding with tool usage, memory, and decision-making to perform complex, multi-step tasks. Agents can interact with external APIs and services while maintaining context across interactions
What is prompt injection ?
Prompt injection is a security vulnerability where malicious users manipulate LLM inputs to override intended behavior, bypass safety measures, or extract sensitive information. Attackers embed instructions within data to make models ignore original prompts and follow malicious commands. Mitigation requires input sanitization, injection-resistant prompt design, and proper security boundaries.
What is Model Weights / Parameters ?
Model weights and parameters are the learned values that define an LLM's behavior and knowledge. Parameters are the trainable variables adjusted during training, while weights represent their final values. Understanding parameter count helps gauge model capabilities - larger models typically have more parameters and better performance but require more computational resources.