
08_Summarizing

Card deck details

Cards: 13
Language: English
Category: Computer science
Level: University
Created / Updated: 07.02.2018 / 28.05.2020
Licensing: not specified
Weblink:
https://card2brain.ch/box/20180207_8summarizing

Main components and parameters of summarization systems

  • Components
    • Content selection:
      • selection of information from the document(s) to be summarized
    • Ordering of the extracted units
    • Sentence realization:
      • improve the output to obtain fluent text
  • Main Parameter
    • Compression rate:
      • length of the summary or proportion of text to be kept

Main use of summarization

Reduce information overload by extracting relevant information

Types of Summaries

  • Main Type Dimensions:
    • Single document vs. Multi-document summarization
    • Generic summarization vs. Query-focused summarization:
      • Generic summarization does not take into account the particular user or her information need, as opposed to query-focused summarization
    • Abstractive vs. Extractive summarization
      • Abstractive summarization results in a text that differs from the original
      • Extractive summarization consists of the original phrases and sentences
        • Most state-of-the-art systems are extractive, since extracting sentences is easier than generating new text

 

  • Other types:
    • Contrastive multiple-document summaries:
      • highlight the topics common to all the documents as well as the topics unique to each document
    • Update summaries:
      • only new information that has not been covered before

Single-document summarization

Steps:

 

  1. Content selection:
    • choose sentences to extract from the document, either with an unsupervised or supervised method
    • Baseline: use first k sentences
  2. Information ordering:
    • choose an order for the sentences
    • Baseline: keep the order of the original text
  3. Sentence realization:
    • clean up sentences, e.g. sentence simplification, sentence fusion, etc.
    • Baseline: do not perform any combination or clean-up
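Taken together, the three baselines above amount to a "lead-k" summarizer. A minimal sketch in Python (the period-based sentence splitting is a simplification; real systems use a proper sentence tokenizer):

```python
def lead_k_summary(text, k=2):
    """Baseline summarizer: content selection = first k sentences,
    information ordering = original order, realization = no clean-up."""
    # Naive sentence splitting on '. ' -- a simplification for illustration.
    sentences = [s.strip() for s in text.split('. ') if s.strip()]
    return '. '.join(sentences[:k]).rstrip('.') + '.'

doc = ("Summarization reduces information overload. "
       "Extractive systems select original sentences. "
       "Abstractive systems generate new text. "
       "Most deployed systems are extractive.")
print(lead_k_summary(doc, k=2))
```

Despite its simplicity, the lead-k baseline is known to be hard to beat on news text, where the most important content usually comes first.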

Centroid-based content selection

  • Simplest approach:
    • select sentences that have more informative words
      • e.g. measured with the
        • maximum likelihood estimate
        • TF-IDF
    • Sentences are scored based on the score of the informative words they contain
    • All sentences are ranked by their score and the top-ranked sentences are kept for the summary
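The centroid-based ranking above can be sketched as follows, using the maximum likelihood estimate p(w) = count(w)/N as the word informativeness score (TF-IDF weights could be substituted; the tiny stopword list is an assumption for illustration):

```python
from collections import Counter

STOPWORDS = frozenset({"the", "a", "of", "is", "in", "to", "and"})

def centroid_select(sentences, top_n=1):
    """Score each sentence by the average informativeness of its content
    words, where informativeness is the MLE p(w) = count(w)/N over the
    whole input; return the top-ranked sentences."""
    words = [w for s in sentences for w in s.lower().split()
             if w not in STOPWORDS]
    counts = Counter(words)
    total = sum(counts.values())

    def score(sentence):
        content = [w for w in sentence.lower().split() if w not in STOPWORDS]
        if not content:
            return 0.0
        return sum(counts[w] / total for w in content) / len(content)

    # Rank all sentences by score and keep the top-ranked ones.
    return sorted(sentences, key=score, reverse=True)[:top_n]
```

Averaging over the sentence length (rather than summing) keeps long sentences from winning merely by containing more words.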

Further Approaches

The SumBasic system

  • Compute a probability for each word: p(w) = n/N, where n is the number of times the word appears in the input and N is the total number of words in the input
  • Score each sentence by the average probability of the words it contains
  • Pick the best-scoring sentence for the summary
  • Update the probability of every word in the chosen sentence: p_new(w) = p_old(w)^2, which penalizes content the summary already covers
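A compact implementation of the SumBasic loop described above (whitespace tokenization is a simplification):

```python
from collections import Counter

def sumbasic(sentences, summary_len=2):
    """SumBasic: greedily pick the sentence with the highest average word
    probability, then square the probabilities of the chosen words so the
    next pick favours not-yet-covered content."""
    words = [w for s in sentences for w in s.lower().split()]
    n_total = len(words)
    p = {w: c / n_total for w, c in Counter(words).items()}  # p(w) = n/N

    summary, pool = [], list(sentences)
    while pool and len(summary) < summary_len:
        def avg_prob(s):
            ws = s.lower().split()
            return sum(p[w] for w in ws) / len(ws)
        best = max(pool, key=avg_prob)
        summary.append(best)
        pool.remove(best)
        for w in set(best.lower().split()):
            p[w] = p[w] ** 2  # update: p_new(w) = p_old(w)^2
    return summary
```

The squaring step is what distinguishes SumBasic from plain frequency ranking: after a sentence about one topic is chosen, sentences repeating that topic drop in score.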

Log-Likelihood ratio (LLR)

  • quite common approach to identifying informative words
  • the word distribution in the document(s) to summarize is compared to a large background corpus; words that occur significantly more often in the input than in the background are treated as informative
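The LLR statistic itself can be computed from four counts: the word's count and the total word count in the input, plus the same pair for the background corpus. A sketch under the usual binomial-likelihood formulation (the constant binomial coefficients cancel in the ratio):

```python
import math

def _log_binom(k, n, p):
    """Log of the binomial likelihood p^k * (1-p)^(n-k), dropping the
    binomial coefficient, which cancels in the likelihood ratio."""
    if p <= 0.0 or p >= 1.0:
        return 0.0 if (k == 0 or k == n) else float("-inf")
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def llr(k1, n1, k2, n2):
    """Log-likelihood ratio test for a word:
    k1/n1 = count / total words in the document(s) to summarize,
    k2/n2 = count / total words in the background corpus."""
    p = (k1 + k2) / (n1 + n2)   # H1: one shared occurrence probability
    p1, p2 = k1 / n1, k2 / n2   # H2: separate probabilities
    return 2.0 * (_log_binom(k1, n1, p1) + _log_binom(k2, n2, p2)
                  - _log_binom(k1, n1, p) - _log_binom(k2, n2, p))
```

In the topic-signature literature, words with an LLR above roughly 10 are commonly treated as significant; a word with identical relative frequency in input and background scores 0.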

LexRank

  • graph-based content selection
  • represents a cluster of documents as a network of related sentences
  • sentences that are similar to a lot of others are more central
  • sentence similarity is measured as cosine similarity over TF-IDF vectors
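A simplified degree-centrality variant of this idea: build a TF-IDF vector per sentence, connect sentences whose cosine similarity exceeds a threshold, and score each sentence by its degree in the resulting graph (full LexRank instead runs a PageRank-style iteration on this graph):

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Treat each sentence as a document: tf = count in the sentence,
    idf = log(N / df) over the sentence collection."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    return [{w: c * math.log(n / df[w]) for w, c in Counter(toks).items()}
            for toks in tokenized]

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def degree_centrality(sentences, threshold=0.1):
    """Score each sentence by how many other sentences it resembles:
    count similarity-graph edges above the threshold."""
    vecs = tfidf_vectors(sentences)
    return [sum(1 for j, vj in enumerate(vecs)
                if i != j and cosine(vi, vj) > threshold)
            for i, vi in enumerate(vecs)]
```

Sentences similar to many others get high degree and are considered central; off-topic sentences end up isolated in the graph.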

Lexical chains

  • group sets of words, esp. nouns, which are semantically related (same word - same sense, synonyms, hypernyms/hyponyms, co-hyponyms, collocations)
  • Lexical chains can be used to identify important concepts from a document
  • Each noun instance usually belongs to exactly one lexical chain
    • → it is necessary to perform word sense disambiguation

 

  • For Summarization:
    • All terms representing the same concept occur in the same chain
      • → avoids repetition
    • The chain combines the weight (frequency) of its members, so that low frequency terms may still help identifying important concepts
    • Build lexical chains:
      • Extract nouns and noun phrases
      • Use a lexical resource or statistics over large corpora to determine word relatedness
    • Identify strong chains based on their length
    • Extract significant sentences
      • one sentence for each chain (or more for larger chains)
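A toy version of the chain-building procedure sketched above; the hardcoded relatedness table stands in for a real lexical resource such as WordNet, and word sense disambiguation is skipped entirely:

```python
# Stand-in relatedness table (synonyms/hypernyms) -- an assumption for
# illustration; a real system would query a lexical resource.
RELATED = [
    frozenset({"car", "vehicle", "automobile"}),
    frozenset({"wheel", "tire"}),
]

def related(a, b):
    return a == b or any(a in s and b in s for s in RELATED)

def build_chains(nouns):
    """Greedily attach each noun to the first chain containing a related
    word; otherwise start a new chain. Each noun instance thus belongs
    to exactly one chain."""
    chains = []
    for noun in nouns:
        for chain in chains:
            if any(related(noun, w) for w in chain):
                chain.append(noun)
                break
        else:
            chains.append([noun])
    return chains

def strong_chains(chains, min_len=2):
    # Chain strength approximated by length (member frequency).
    return [c for c in chains if len(c) >= min_len]
```

Running `build_chains` on the nouns of a document groups co-referring concept mentions; the longest chains then point at the document's central concepts.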

Supervised Content Selection

  • Define features to assess sentence saliency
  • Training data: corpus where sentences are annotated as part of the extract summary (1) or not (0)
  • Example Features:
    • Fixed-phrase feature:
      • “in conclusion” indicates summary
    • Position feature:
      • first/last paragraph and initial/final sentences are more likely to be important
    • Thematic word feature:
      • Repetition as indicator of importance
    • Important words:
      • sentences containing several words with high TF-IDF weight
    • Uppercase word feature:
      • Often indicates named entities
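The features above can be sketched as a feature extractor plus a logistic scoring function. The weights here are illustrative stand-ins; in a real system they would be learned from the annotated corpus of sentences labeled 1 (in the extract) or 0 (not):

```python
import math
from collections import Counter

FIXED_PHRASES = ("in conclusion", "in summary", "to sum up")

def features(sentence, position, n_sentences, word_freq):
    """Map a sentence to the feature vector described above."""
    words = sentence.lower().split()
    return {
        # Fixed-phrase feature: cue phrases such as "in conclusion".
        "fixed_phrase": float(any(p in sentence.lower() for p in FIXED_PHRASES)),
        # Position feature: initial or final sentence of the document.
        "position": float(position in (0, n_sentences - 1)),
        # Thematic word feature: average input frequency of the words.
        "thematic": sum(word_freq.get(w, 0) for w in words) / max(len(words), 1),
        # Uppercase word feature: capitalized non-initial words,
        # which often indicate named entities.
        "uppercase": float(any(w[0].isupper() for w in sentence.split()[1:])),
    }

# Illustrative weights; a trained classifier would supply these.
WEIGHTS = {"fixed_phrase": 2.0, "position": 1.0, "thematic": 0.5, "uppercase": 0.5}

def saliency(sentence, position, n_sentences, word_freq):
    f = features(sentence, position, n_sentences, word_freq)
    score = sum(WEIGHTS[k] * f[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))  # logistic output in (0, 1)
```

Sentences are then ranked by their saliency score and the top-ranked ones extracted, exactly as in the unsupervised methods.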