06_Information_Retrieval
Deck details
Cards | 26 |
---|---|
Language | English |
Category | Computer science |
Level | University |
Created / Updated | 07.02.2018 / 21.01.2024 |
Licensing | Not specified |
Weblink | https://card2brain.ch/box/20180207_6informationretrieval |
Basic concepts in IR
- Information Need:
- State (of a person) of requiring information for solving an actual problem (e.g.: get rid of mice without hurting them). Also called intent
- Query:
- Textual representation of the information need, e.g. as entered into a search engine
- The information need is abstract or fuzzy and therefore does not easily translate into a query
- In IR, queries are normally treated as short documents themselves (see the sketch after this card)
- Relevance:
- Property of a document with respect to a particular information need. The task of IR is to retrieve relevant documents
- Information needs (intent) and document contents are often subject to interpretation
- What one user finds relevant, another might find off-topic!
- Relevance Feedback:
- Information about the actual relevance of retrieved documents that the user of an IR system gives back to the system
- IR Evaluation:
- Measuring the quality of an IR system’s performance
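The idea of treating the query as a short document can be made concrete by representing both query and documents as term vectors and comparing them. The following is only a minimal sketch under assumed simplifications (plain term frequencies, whitespace tokenization, cosine similarity); it is illustrative and not tied to any particular engine's weighting scheme.

```python
# Minimal sketch: query as a short document, compared to documents
# via cosine similarity over plain term-frequency vectors (assumption:
# whitespace tokenization, no idf weighting).
from collections import Counter
import math

def tf_vector(text):
    """Bag-of-words term-frequency vector for a text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = ["humane mouse traps for the garden",
        "buy strong mouse poison online"]
query = tf_vector("get rid of mice without hurting them humane traps")

# Rank documents by similarity to the query vector (most relevant first).
ranked = sorted(docs, key=lambda d: cosine(query, tf_vector(d)), reverse=True)
print(ranked)
```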
Other sources for term weights
- Fields:
- For structured collections. E.g.: term in book title more important than in description
- Zones:
- For semi-structured collections, such as HTML documents. E.g. term in <h1>-Tag more important than in text; terms in certain frames irrelevant
- Static document weights:
- Some documents are more trusted than others, independent of query. Leads to tiered indices: separate indices for documents with high, medium and low ranks
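One simple way to picture how fields/zones and static document weights combine is a linear scoring rule: each zone contributes its matches scaled by a zone weight, and a query-independent static weight is added on top. The sketch below uses made-up zone weights and a hypothetical static trust weight; it is only illustrative.

```python
# Minimal sketch of weighted zone scoring (assumed, made-up weights):
# a term match in the title/h1 zone counts more than one in the body,
# and a query-independent static weight (e.g. trust) is added on top.
ZONE_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}  # assumed weights

def zone_score(query_terms, doc_zones, static_weight=0.0):
    """Score = sum over zones of (zone weight * matched query terms) + static weight."""
    score = static_weight
    for zone, text in doc_zones.items():
        terms = set(text.lower().split())
        hits = sum(1 for t in query_terms if t in terms)
        score += ZONE_WEIGHTS.get(zone, 1.0) * hits
    return score

doc = {"title": "humane mouse traps", "body": "catch mice alive and release them"}
print(zone_score({"mouse", "traps"}, doc, static_weight=0.5))
```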
The Concept of Relevance for Evaluation
- In light of the subjectivity of relevance:
- have n>1 annotators independently perform relevance judgments for many queries
- compare the agreement of annotators (see the sketch after this card)
- High agreement means: annotation results are reliable
- assign relevance labels by majority vote
- use the average of judgments to define a ranking
- Annotation Schemata
- hard for annotators:
- judge the (graded) absolute relevance of a document, given a query
- easier for annotators:
- judge relatively, which of two documents is more relevant, given a query.
- Disadvantage: more judgments because of pairings
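"Compare the agreement of annotators" can be operationalized in several ways; Cohen's kappa is one common agreement measure for two annotators, and a majority vote turns several judgments into one label. The sketch below uses made-up binary judgments and is only meant to illustrate these two steps.

```python
# Minimal sketch: Cohen's kappa for two annotators with binary
# relevant/not-relevant labels, plus a majority vote over judgments.
# All judgments here are made up.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    dist_a, dist_b = Counter(labels_a), Counter(labels_b)
    expected = sum((dist_a[l] / n) * (dist_b[l] / n)
                   for l in set(labels_a) | set(labels_b))
    return (observed - expected) / (1 - expected)

def majority_vote(judgments):
    """Final relevance label = most frequent judgment."""
    return Counter(judgments).most_common(1)[0][0]

ann1 = [1, 1, 0, 1, 0, 0]
ann2 = [1, 0, 0, 1, 0, 1]
print(cohens_kappa(ann1, ann2))   # agreement above chance level
print(majority_vote([1, 1, 0]))   # -> 1
```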
Types of IR evaluation sets
- Result sets (unordered):
- Returned documents = sub-set of all documents
- Binary relevance feature: in/out
- Result lists (ordered):
- Returned documents = sub-set of all documents, ordered/ranked according to their relevance for the topic
- Relevance judgments
- 5-point-scale judgments for top-n results
- specific to a (commercial) search engine
Problems with Result Sets
- Precision (P), recall (R) and the F-measure (F) measure the quality of the result set accurately.
- But in a real user setting:
- Results are looked at sequentially in a ranked list
- Users look at top k documents only
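To make the contrast concrete, here is a small sketch (document ids and relevance judgments are made up) of set-based precision, recall and F1 next to precision@k, which only considers the top k results a user actually inspects.

```python
# Minimal sketch contrasting set-based evaluation (precision, recall, F1)
# with a rank-based view (precision at k). Ids and judgments are made up.
def precision_recall_f1(retrieved, relevant):
    """Set-based P, R and F1 for an unordered result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def precision_at_k(ranked, relevant, k):
    """Precision over only the top-k results of a ranked list."""
    relevant = set(relevant)
    return sum(1 for d in ranked[:k] if d in relevant) / k

ranked = ["d3", "d7", "d1", "d9", "d2"]   # result list, best first
relevant = {"d1", "d3", "d4"}

print(precision_recall_f1(ranked, relevant))  # quality of the whole set
print(precision_at_k(ranked, relevant, 3))    # what a user scanning top 3 sees
```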