SMA
SMA Unimib
File details
Flashcards | 287 |
---|---|
Language | German |
Category | Technology |
Level | University |
Created / Updated | 06.12.2023 / 15.01.2024 |
Web link | https://card2brain.ch/box/20231206_sma |
How does the Kernighan-Lin Algorithm work?
The Kernighan-Lin algorithm divides networks into two communities starting from some initial division, such as a random division into equally sized groups. The algorithm then considers each vertex in the network in turn and calculates how much the modularity would change if that vertex were moved to the other group.
It then chooses among the vertices the one whose movement would most increase, or least decrease, the modularity and moves it.
Then it repeats the process, but with the important constraint that a vertex once moved cannot be moved again, at least on this round of the algorithm. And so the algorithm proceeds, repeatedly moving the vertices that most increase or least decrease the modularity.
When all vertices have been moved exactly once, we go back over the states through which the network has passed and select the one with the highest modularity.
We then use that state as the starting condition for another round of the same algorithm, and we keep repeating the whole process until the modularity no longer improves.
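The rounds described above can be sketched in a few lines of Python. This is a minimal, unoptimised illustration rather than a reference implementation: the function names (`modularity`, `flipped`, `kl_round`, `kernighan_lin`), the adjacency-dict representation, and the six-vertex example graph are all assumptions made for this sketch, and modularity is recomputed from scratch for every candidate move, which is only practical for tiny graphs.

```python
def modularity(adj, side):
    """Modularity Q of a two-group division.

    adj  : dict mapping each vertex to the set of its neighbours (undirected)
    side : dict mapping each vertex to group 0 or 1
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the number of edges
    q = 0.0
    for u in adj:
        for v in adj:
            if side[u] == side[v]:
                a_uv = 1.0 if v in adj[u] else 0.0
                q += a_uv - len(adj[u]) * len(adj[v]) / two_m
    return q / two_m


def flipped(side, v):
    """Copy of the division with vertex v moved to the other group."""
    trial = dict(side)
    trial[v] = 1 - trial[v]
    return trial


def kl_round(adj, side):
    """Move every vertex exactly once; return the best state passed through."""
    best_side, best_q = dict(side), modularity(adj, side)
    unmoved = set(adj)
    while unmoved:
        # Vertex whose move most increases (or least decreases) modularity.
        v = max(sorted(unmoved), key=lambda x: modularity(adj, flipped(side, x)))
        side = flipped(side, v)
        unmoved.remove(v)  # a moved vertex is frozen for the rest of the round
        q = modularity(adj, side)
        if q > best_q:
            best_side, best_q = dict(side), q
    return best_side, best_q


def kernighan_lin(adj, side):
    """Repeat rounds until the modularity no longer improves."""
    best_q = modularity(adj, side)
    while True:
        new_side, new_q = kl_round(adj, side)
        if new_q <= best_q + 1e-12:
            return side, best_q
        side, best_q = new_side, new_q


# Hypothetical example: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

start = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}  # deliberately poor initial division
final_side, final_q = kernighan_lin(adj, start)
```

On this toy graph the procedure should end with the two triangles as the two groups. Practical implementations update the modularity change incrementally instead of recomputing it for every candidate move.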
What is METIS?
METIS is aimed at partitioning undirected graphs, according to the topological characteristics of the network.
Partitioning is based on a so-called multilevel graph bisection. The graph is progressively coarsened into a much smaller graph, that small graph is bisected, and the partition is then projected back (“regrown”) step by step to the original size.
What is the difference between
- Named Entity Recognition (NER)
- Named Entity Linking (NEL)
- Named Entity Disambiguation (NED)
NER: Involves identifying and classifying entities, such as names of people, organizations, locations, dates, and other predefined categories, within a given text.
NEL: In the NER process, the system identifies that a word or phrase is a named entity and assigns it a specific category (e.g., person, organization). In the NEL process, the system goes further by trying to link these recognized entities to specific entries in a knowledge base, which provides additional information about those entities.
NED: Resolves the ambiguity that arises when a mention could refer to several entities with similar names or descriptions.
What is the challenge of NER, NEL and NED in social media text?
- Short and noisy text: typographic errors, shortened words, ambiguity, polysemy --> hard for the algorithms to read
- Out of vocabulary --> the concept is in the database, but it is not recognised because a synonym is used
- Out of knowledge base --> a new word that has not yet been entered in the database
- Named entity overlap --> a word with two meanings in the same sentence or close to each other (apple as a fruit and Apple as a company)
- User-generated entities --> Users on social media platforms often create new terms, hashtags, or nicknames for entities
- Context variability --> Entities can change roles and relationships rapidly on social media.
What are conditional random fields (CRFs)?
CRFs help in situations where the prediction for one element in a sequence depends on the context of the other elements. They are particularly useful for structured prediction problems where the output has some inherent structure or relationships, and the goal is to model these dependencies to improve prediction accuracy.
What are feature functions in CRF?
They are used to model the dependencies between input features and output labels. They are the building blocks that enable CRFs to capture complex patterns and relationships in structured prediction tasks. The combination of feature functions and their associated weights forms the basis for making predictions in a CRF.
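As a rough illustration of how feature functions and weights combine, the sketch below scores label sequences for a tiny linear-chain CRF. Everything in it is assumed for the example: the label set, the three feature functions, and the hand-picked weights (in a real CRF the weights are learned from labelled data). The normaliser is computed by brute force over all label sequences, which only works for very short inputs.

```python
import math
from itertools import product

LABELS = ["O", "PER"]  # hypothetical label set: "other" and "person"


# Each feature function sees (previous label, current label, sentence, position)
# and returns 1.0 when its pattern fires, else 0.0.
def f_capitalised_is_person(prev, cur, words, t):
    return 1.0 if words[t][0].isupper() and cur == "PER" else 0.0


def f_person_follows_person(prev, cur, words, t):
    return 1.0 if prev == "PER" and cur == "PER" else 0.0


def f_lowercase_is_other(prev, cur, words, t):
    return 1.0 if words[t].islower() and cur == "O" else 0.0


# (feature function, weight) pairs; the weights are made up for illustration.
FEATURES = [
    (f_capitalised_is_person, 2.0),
    (f_person_follows_person, 1.0),
    (f_lowercase_is_other, 2.0),
]


def score(labels, words):
    """Weighted sum of all feature functions over all positions."""
    total = 0.0
    for t in range(len(words)):
        prev = labels[t - 1] if t > 0 else "START"
        for feature, weight in FEATURES:
            total += weight * feature(prev, labels[t], words, t)
    return total


def predict(words):
    """Best label sequence and its probability exp(score) / Z."""
    candidates = list(product(LABELS, repeat=len(words)))
    z = sum(math.exp(score(y, words)) for y in candidates)
    best = max(candidates, key=lambda y: score(y, words))
    return list(best), math.exp(score(best, words)) / z


sentence = ["yesterday", "Ada", "Lovelace", "spoke"]
labels, prob = predict(sentence)
```

The transition feature `f_person_follows_person` is what makes the prediction for one token depend on its neighbour, which is the dependency a CRF is meant to capture.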
How does the knowledge-based disambiguation for WSD work?
Disambiguation works through the use of external lexical resources such as dictionaries and thesauri.
Machine-Readable Dictionaries (MRD): For each word in the language vocabulary, an MRD provides:
- A list of meanings
- Definitions (for all word meanings)
- Typical usage examples (for most word meanings)
Thesaurus: adds explicit synonymy relation between word meanings. E.g.:
- plant, works, industrial plant
- plant, flora, plant life
Semantic networks: adds more semantic relations:
- {plant, flora, plant life}
- hypernym: {organism, being}
- hyponym: {house plant}, {fungus}, ...
- meronym: {plant tissue}, {plant part}
- holonym: {Plantae, kingdom Plantae, plant kingdom}
What is one approach to NED?
The Lesk algorithm
Goal: compare the contexts of ambiguous words (or entities) with the definitions or contexts of candidate senses from a lexical resource (like a dictionary or a knowledge base) --> definition overlap
1. Retrieve from MRD all sense definitions of the words to be disambiguated.
2. Determine the definition overlap for all possible sense combinations.
3. Choose senses that lead to highest overlap.
Example:
We want to understand what is meant by "pine cone" (which is part of a sentence).
We collect all the definitions of both words and see which pair of senses has the most overlap.
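The three steps can be sketched as follows. The tiny dictionary `MRD`, the sense identifiers such as `pine#1`, the stopword list, and the function names are assumptions made for this illustration; the glosses are short paraphrases written for the example, not quotations from a real dictionary, and no stemming is applied.

```python
from itertools import product

# Hypothetical stopword list: function words ignored when counting overlap.
STOPWORDS = {"a", "of", "or", "to", "the", "this", "which", "with",
             "through", "whether"}

# Toy machine-readable dictionary with paraphrased glosses (illustrative only).
MRD = {
    "pine": {
        "pine#1": "kind of evergreen tree with needle-shaped leaves",
        "pine#2": "waste away through sorrow or illness",
    },
    "cone": {
        "cone#1": "solid body which narrows to a point",
        "cone#2": "something of this shape whether solid or hollow",
        "cone#3": "fruit of certain evergreen tree",
    },
}


def content_words(gloss):
    """Set of lower-cased gloss words without stopwords."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def lesk_overlap(words, mrd):
    """Pick the sense combination whose definitions share the most words."""
    # Step 1: retrieve all sense definitions of the words to disambiguate.
    sense_lists = [list(mrd[w]) for w in words]
    best_combo, best_overlap = None, -1
    # Step 2: compute the definition overlap for every sense combination.
    for combo in product(*sense_lists):
        bags = [content_words(mrd[w][s]) for w, s in zip(words, combo)]
        overlap = sum(len(a & b)
                      for i, a in enumerate(bags)
                      for b in bags[i + 1:])
        # Step 3: keep the combination with the highest overlap.
        if overlap > best_overlap:
            best_combo, best_overlap = combo, overlap
    return best_combo, best_overlap


senses, overlap = lesk_overlap(["pine", "cone"], MRD)
```

With these toy glosses the tree sense of "pine" and the fruit sense of "cone" win because both definitions contain "evergreen" and "tree". Because every sense combination is enumerated, the cost grows quickly with the number of words and senses.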
If we want to know what a word means, but it appears in a longer sentence containing several words that each have multiple senses/definitions, what algorithm can we use?
What is the disadvantage of a proxy?
The proxy sees every page we request, since all our requests go through the same proxy.
What are third-party cookies?
Third-party cookies are generated and placed on the user's device by a website other than the one the user is visiting. They are created when a user visits a website that includes elements from other sites, such as third-party images or ads.