SMA

SMA Unimib

File details

Flashcards 287
Language German
Category Technology
Level University
Created / Updated 06.12.2023 / 15.01.2024
Web link
https://card2brain.ch/box/20231206_sma

How does the Walktrap algorithm work?

How does the Kernighan-Lin Algorithm work?

The Kernighan-Lin algorithm divides networks into two communities starting from some initial division, such as a random division into equally sized groups. The algorithm then considers each vertex in the network in turn and calculates how much the modularity would change if that vertex were moved to the other group.

It then chooses among the vertices the one whose movement would most increase, or least decrease, the modularity and moves it.

Then it repeats the process, but with the important constraint that a vertex once moved cannot be moved again, at least on this round of the algorithm. And so the algorithm proceeds, repeatedly moving the vertices that most increase or least decrease the modularity.

When all vertices have been moved exactly once, we go back over the states through which the network has passed and select the one with the highest modularity.

We then use that state as the starting condition for another round of the same algorithm, and we keep repeating the whole process until the modularity no longer improves.
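The rounds described above can be sketched in a few lines of Python. This is a minimal illustration and not a reference implementation: the function names, the edge-list representation, and the two-triangle test graph are assumptions made for this example, and modularity is recomputed from scratch for every candidate move, which is simple but slow.

```python
def modularity(edges, group):
    """Modularity Q of a two-way division of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Fraction of edges that fall inside a group.
    inside = sum(1 for u, v in edges if group[u] == group[v]) / m
    # Expected fraction of such edges if edges were placed at random.
    expected = 0.0
    for g in (0, 1):
        d = sum(deg for node, deg in degree.items() if group[node] == g)
        expected += (d / (2 * m)) ** 2
    return inside - expected


def kernighan_lin_round(edges, group):
    """Move every vertex exactly once, then return the best state seen."""
    group = dict(group)
    best_q, best_group = modularity(edges, group), dict(group)
    unmoved = set(group)

    def q_after_move(v):
        trial = dict(group)
        trial[v] = 1 - trial[v]
        return modularity(edges, trial)

    while unmoved:
        # Vertex whose move most increases (or least decreases) modularity.
        v = max(sorted(unmoved), key=q_after_move)
        group[v] = 1 - group[v]
        unmoved.remove(v)  # a moved vertex stays fixed for the rest of the round
        q = modularity(edges, group)
        if q > best_q:
            best_q, best_group = q, dict(group)
    return best_group, best_q


def kernighan_lin(edges, group):
    """Repeat rounds until the modularity no longer improves."""
    q = modularity(edges, group)
    while True:
        new_group, new_q = kernighan_lin_round(edges, group)
        if new_q <= q + 1e-12:
            return group, q
        group, q = new_group, new_q


# Two triangles joined by one edge, starting from an arbitrary equal split.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
start = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
groups, q = kernighan_lin(edges, start)
print(groups, round(q, 3))
```

On this toy graph the first round already reaches the division that separates the two triangles; the second round finds no improvement, so the loop stops.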

What is METIS?

METIS is aimed at partitioning undirected graphs, according to the topological characteristics of the network.

Partitioning is based on a so-called multilevel graph bisection. It involves a progressive reduction (coarsening) of the graph, followed by a “regrowth” (uncoarsening) to its original size.
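To make the "reduction" step more concrete, here is a small sketch of a single coarsening level. It is not the METIS API: the function name `coarsen`, the dictionary-based graph, and the greedy "merge each vertex with its heaviest free neighbour" rule are assumptions chosen for illustration.

```python
def coarsen(edges):
    """One coarsening level.

    edges: dict mapping (u, v) -> weight for an undirected graph without self-loops.
    Returns the smaller graph and a map from each vertex to its super-vertex.
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    merged_into = {}
    for u in sorted(adj):
        if u in merged_into:
            continue  # already absorbed by an earlier vertex
        free = [v for v in adj[u] if v not in merged_into]
        merged_into[u] = u
        if free:
            # Collapse u with the free neighbour joined by the heaviest edge.
            partner = max(sorted(free), key=lambda v: adj[u][v])
            merged_into[partner] = u

    coarse = {}
    for (u, v), w in edges.items():
        a, b = merged_into[u], merged_into[v]
        if a != b:  # edges inside a super-vertex disappear
            key = (min(a, b), max(a, b))
            coarse[key] = coarse.get(key, 0) + w
    return coarse, merged_into


# Path a-b-c-d: the heavy edges a-b and c-d are collapsed first.
graph = {("a", "b"): 5, ("b", "c"): 1, ("c", "d"): 4}
coarse, merged_into = coarsen(graph)
print(coarse, merged_into)
```

Repeating this step yields a graph small enough to bisect directly. For the "regrowth", a partition of the small graph can be projected back with `{v: part[merged_into[v]] for v in merged_into}`, where `part` is a hypothetical assignment of super-vertices to sides, and then refined at each level.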

What is the difference between the following?

  1. Named Entity Recognition (NER)
  2. Named Entity Linking (NEL)
  3. Named Entity Disambiguation (NED)

NER: Identifies and classifies entities, such as names of people, organizations, locations, dates, and other predefined categories, within a given text.

NEL: Whereas NER only identifies that a word or phrase is a named entity and assigns it a category (e.g., person, organization), NEL goes further and links each recognized entity to a specific entry in a knowledge base, which provides additional information about that entity.

NED: Resolves the ambiguities that arise when a named entity could refer to several entities with similar names or descriptions.
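The three tasks can be told apart in a toy pipeline. Everything below is an assumption made for illustration: the gazetteer, the tiny knowledge base, and its `kb:` identifiers are hypothetical, and real systems use trained models rather than string lookup.

```python
import re

# Hypothetical gazetteer: surface form -> entity category (used for NER).
GAZETTEER = {"Barack Obama": "PERSON", "Georgia": "LOCATION"}

# Hypothetical knowledge base: identifier -> name and typical context words.
KB = {
    "kb:obama": {"name": "Barack Obama", "context": {"president", "usa"}},
    "kb:georgia_state": {"name": "Georgia", "context": {"atlanta", "usa", "state"}},
    "kb:georgia_country": {"name": "Georgia", "context": {"tbilisi", "caucasus", "country"}},
}


def recognise(text):
    """NER: find mentions in the text and assign each a category."""
    return [(name, label) for name, label in GAZETTEER.items() if name in text]


def candidates(mention):
    """NEL, candidate step: all knowledge-base entries the mention could link to."""
    return [kb_id for kb_id, entry in KB.items() if entry["name"] == mention]


def disambiguate(mention, text):
    """NED: choose the candidate whose context words overlap most with the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return max(candidates(mention), key=lambda kb_id: len(KB[kb_id]["context"] & words))


text = "Barack Obama gave a speech in Atlanta, Georgia."
mentions = recognise(text)
links = {mention: disambiguate(mention, text) for mention, _ in mentions}
print(mentions)
print(links)
```

Here "Georgia" has two candidates in the knowledge base, and the word "Atlanta" in the sentence tips the choice towards the US state.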

What is the challenge of NER, NEL and NED in social media text?

  • Short and noisy text: typographic errors, abbreviated words, ambiguity, and polysemy make the text difficult for algorithms to process
  • Out of vocabulary: the concept is in the database, but it is not recognised because a synonym is used
  • Out of knowledge base: a new word that has not yet been entered in the database
  • Named entity overlap: a word with two meanings in the same sentence or close to each other (e.g. "apple" as a fruit and "Apple" as a company)
  • User-generated entities: users on social media platforms often create new terms, hashtags, or nicknames for entities
  • Context variability: entities can change roles and relationships rapidly on social media

What is the sequence prediction problem?

What are the main characteristics of SPP?

What is an example of the use of SPP?

What are conditional random fields (CRFs)?

CRFs help in situations where the prediction for one element in a sequence depends on the context of the other elements. They are particularly useful for structured prediction problems where the output has some inherent structure or relationships, and the goal is to model these dependencies to improve prediction accuracy.

 

What are feature functions in CRF?

They are used to model the dependencies between input features and output labels. They are the building blocks that enable CRFs to capture complex patterns and relationships in structured prediction tasks. The combination of feature functions and their associated weights forms the basis for making predictions in a CRF.
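As a rough illustration of how weighted feature functions combine into a prediction, the sketch below scores label sequences for a linear-chain CRF by brute force. The three feature functions, their hand-set weights, and the two-label tag set are assumptions invented for this example; in practice the weights are learned from data and the best sequence is found with dynamic programming rather than enumeration.

```python
import math
from itertools import product

LABELS = ["O", "PER"]


# Each feature function looks at the previous label, the current label,
# the whole word sequence, and the current position.
def capitalised_is_person(prev_label, label, words, i):
    return 1.0 if words[i][0].isupper() and label == "PER" else 0.0


def person_follows_person(prev_label, label, words, i):
    return 1.0 if prev_label == "PER" and label == "PER" else 0.0


def lowercase_is_other(prev_label, label, words, i):
    return 1.0 if words[i].islower() and label == "O" else 0.0


# Hypothetical weights, set by hand instead of learned.
FEATURES = [
    (capitalised_is_person, 2.0),
    (person_follows_person, 1.0),
    (lowercase_is_other, 2.0),
]


def score(words, labels):
    """Sum of weight * feature value over all positions of the sequence."""
    total = 0.0
    for i, label in enumerate(labels):
        prev_label = labels[i - 1] if i > 0 else "START"
        total += sum(w * f(prev_label, label, words, i) for f, w in FEATURES)
    return total


def probability(words, labels):
    """exp(score) normalised over every possible label sequence."""
    z = sum(math.exp(score(words, c)) for c in product(LABELS, repeat=len(words)))
    return math.exp(score(words, labels)) / z


def best_labels(words):
    """Highest-scoring label sequence, found by enumeration."""
    return max(product(LABELS, repeat=len(words)), key=lambda c: score(words, c))


words = ["Barack", "Obama", "spoke"]
print(best_labels(words), round(probability(words, best_labels(words)), 3))
```

The second feature is what makes this a sequence model: the label chosen for "Obama" is rewarded for agreeing with the label chosen for "Barack".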

Based on which metrics can NER be evaluated?

How does NER look graphically?

What are some resources for knowledge bases?

What are NEL identifiers?

"Barack Obama was born in Hawaii."

In this sentence, "Barack Obama" is a named entity, and Named Entity Linking would involve finding the unique identifier associated with the entry for "Barack Obama" in a knowledge base.

What characteristics should NEL identifiers have?

What are two examples of NEL identifiers?

How is RDF used for NEL?

Which two categories of NEL do we differentiate?

What is the difference between NED and Word Sense Disambiguation (WSD)?

What is a sense and what is its goal?

How does the knowledge-based disambiguation for WSD work?

Disambiguation works through the use of external lexical resources such as dictionaries and thesauri.

Machine Readable Dictionaries (MRD): For each word in the language vocabulary, an MRD provides:

  • A list of meanings
  • Definitions (for all word meanings)
  • Typical usage examples (for most word meanings)

Thesaurus: adds explicit synonymy relation between word meanings. E.g.:

  • plant, works, industrial plant
  • plant, flora, plant life

Semantic networks: adds more semantic relations:

  • {plant, flora, plant life}
    • hypernym: {organism, being}
    • hyponym: {house plant}, {fungus}, ...
    • meronym: {plant tissue}, {plant part}
    • holonym: {Plantae, kingdom Plantae, plant kingdom} 

What is one approach for NED?

The Lesk algorithm

Goal: compare the contexts of ambiguous words (or entities) with the definitions or contexts of candidate senses from a lexical resource (like a dictionary or a knowledge base). The measure used is the definition overlap.

1. Retrieve from MRD all sense definitions of the words to be disambiguated.

2. Determine the definition overlap for all possible sense combinations.

3. Choose the senses that lead to the highest overlap.

Example:

We want to understand what is meant by "pine cone" (which is part of a sentence).

We collect all the definitions for both words and see where there is the most overlap (see picture).
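The three Lesk steps and the "pine cone" example can be sketched as follows. The glosses are short dictionary-style definitions written for this illustration (they are assumptions, not the content of the picture), and the stop-word list and helper names are likewise hypothetical. No stemming is applied, so "tree" and "trees" do not count as a match here.

```python
import re
from itertools import product

STOPWORDS = {"a", "of", "or", "to", "which", "with", "this", "through",
             "whether", "something"}

# Step 1: sense definitions for each word to be disambiguated.
GLOSSES = {
    "pine": [
        "kinds of evergreen tree with needle-shaped leaves",
        "waste away through sorrow or illness",
    ],
    "cone": [
        "solid body which narrows to a point",
        "something of this shape whether solid or hollow",
        "fruit of certain evergreen trees",
    ],
}


def content_words(text):
    """Lower-cased words of a definition, without stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def lesk(glosses):
    """Return, for each word, the index of the sense in the best combination."""
    words = list(glosses)
    bags = {w: [content_words(g) for g in glosses[w]] for w in words}

    def overlap(combo):
        # Step 2: shared words between the definitions of every pair of words.
        total = 0
        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                total += len(bags[words[i]][combo[i]] & bags[words[j]][combo[j]])
        return total

    # Step 3: the sense combination with the highest overlap wins.
    combos = product(*(range(len(glosses[w])) for w in words))
    return dict(zip(words, max(combos, key=overlap)))


senses = lesk(GLOSSES)
print({w: GLOSSES[w][i] for w, i in senses.items()})
```

With these glosses the only overlapping pair is the tree sense of "pine" and the fruit sense of "cone", which share the word "evergreen". Because every sense combination is enumerated, the cost grows quickly with the number of words.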

 

If we want to know what a word means but we have a longer sentence with multiple words that have multiple senses/definitions, what algorithm can we use?

The simplified Lesk algorithm

Instead of looking for the highest overlap between sense combinations, we look for the definition that overlaps most with the words of the sentence we are looking at (see image).
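A minimal sketch of the simplified variant, assuming two illustrative glosses for "bank" and a small hand-picked stop-word list (both invented for this example):

```python
import re

STOPWORDS = {"a", "an", "the", "of", "and", "on", "that", "such", "as", "she"}


def content_words(text):
    """Lower-cased words of a text, without stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence, glosses):
    """Index of the gloss that shares the most words with the sentence."""
    context = content_words(sentence) - {word}
    return max(range(len(glosses)),
               key=lambda i: len(content_words(glosses[i]) & context))


BANK_GLOSSES = [
    "financial institution that accepts deposits and lends money",
    "sloping land beside a body of water such as a river",
]

sentence = "She sat on the bank of the river and watched the water"
sense = simplified_lesk("bank", sentence, BANK_GLOSSES)
print(BANK_GLOSSES[sense])
```

Each ambiguous word is handled on its own, so the number of comparisons grows with the number of senses of that word rather than with the number of sense combinations.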

What are unsupervised methods of WSD?

What is Word Sense Induction?

What is Chinese Whispers?

It belongs to the co-occurrence graph algorithms for WSI.
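For orientation, the core label-propagation idea of Chinese Whispers can be sketched as below: every node starts in its own class and repeatedly adopts the class with the largest total edge weight among its neighbours. The co-occurrence graph for "bank", the function signature, and the deterministic tie-breaking rule are assumptions made for this example.

```python
import random


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster a weighted, undirected graph given as (u, v, weight) triples."""
    rng = random.Random(seed)
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    label = {node: node for node in adj}  # each node starts in its own class
    nodes = sorted(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)  # nodes are visited in random order
        for node in nodes:
            votes = {}
            for neighbour, w in adj[node].items():
                votes[label[neighbour]] = votes.get(label[neighbour], 0) + w
            # Strongest class wins; ties go to the smallest label (an assumption
            # made here so that the example is reproducible).
            label[node] = min(votes, key=lambda c: (-votes[c], c))
    return label


# Hypothetical co-occurrence graph of words seen near "bank".
edges = [
    ("money", "loan", 3), ("money", "deposit", 3), ("loan", "deposit", 3),
    ("river", "water", 3), ("river", "shore", 3), ("water", "shore", 3),
    ("loan", "river", 1),  # weak link between the two contexts
]
clusters = chinese_whispers(edges)
print(clusters)
```

Each resulting class of context words can then be read as one induced sense of the target word; the number of classes is not fixed in advance.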

 

What are privacy issues with our data?

What are advantages and disadvantages of collecting data about people (e.g. electronic patient data)?

Which law applies in Italy regarding data privacy?

GDPR

What is

  1. data controller
  2. data processor
  3. data subject
  4. personal data
  5. sensitive personal data

GDPR

What is the difference between anonymous and pseudonymous data?

What requirements does the GDPR impose?

What is the potential privacy issue with log files?

How can we enhance our privacy in regard to log files?

What is the disadvantage of a proxy?

The proxy sees every page we request, since all of our requests go through the same proxy.

What is a cookie?

How can cookies be classified?

What are third-party cookies?

Third-party cookies are generated and placed on the user's device by a website other than the one the user is visiting. They are created when a user visits a website that includes elements from other sites, such as third-party images or ads.
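The distinction can be expressed as a simple domain comparison. The sketch below is a simplification under stated assumptions: the function name and the example hosts are hypothetical, and plain suffix matching is used, whereas browsers compare registrable domains with the help of a public suffix list.

```python
from urllib.parse import urlparse


def is_third_party(page_url, cookie_domain):
    """True if the cookie's domain does not cover the host of the visited page."""
    page_host = (urlparse(page_url).hostname or "").lower()
    domain = cookie_domain.lstrip(".").lower()
    first_party = page_host == domain or page_host.endswith("." + domain)
    return not first_party


page = "https://news.example.com/article"
print(is_third_party(page, ".example.com"))    # cookie set by the visited site
print(is_third_party(page, "adnetwork.test"))  # cookie set by an embedded ad
```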

What can be said about disabling cookies?

What are seal programs?