10_Question_Answering

Question Answering vs. Information Retrieval

  •  INPUT:
    •  Natural-language questions, not keyword-based queries:
      •  QA:
        • How long do polar bears live?
      •  IR:
        • polar bears life span
  •  OUTPUT:
    • Precise and concise answers, not whole documents
      •  QA:
        • In the wild, polar bears live an average of 15 to 18 years, although biologists have tagged a few bears in their early 30s. In captivity, they may live until their mid- to late 30s. One zoo bear in London lived to be 41.
      • IR:
        • www.gotpetsonline.com/polar-bear/bear-habitat-polar/polar-bear-life-span.html
        • www.starbus.com/polarbear/aboutpb.htm
        • www.polarbearsinternational.org/faq

Generic QA System Architecture

  • Pipeline: Question Processing → Document Retrieval → Passage Retrieval and Scoring → Answer Identification

Question Processing (including Question Type)

  • Goal: extract clues from the question (a clue-extraction sketch follows this list)
    • Question type
      • Factual answers → Factoid questions
        • How long do polar bears live?
      • Definitional answers → Definition questions
        • Who is Britney Spears?
      • Opinionated answers → Opinion questions
        • What do you think of Britney Spears' last album?
      • Different taxonomies for question types exist, e.g. Example, Comparison, Quantification, etc.
        • Li & Roth: a two-layered taxonomy with 6 coarse and 50 fine classes
          • Abbreviation:expression, Entity:animal, Description:def, Human:individual, Location:country, Numeric:date
      • More difficult types like How and Why require complex answers
        • TREC-QA: main question types
          • Factoid
          • List
          • Definition
          • Other
    • Expected answer type(s)
      • The answer type is the semantic category of the expected answer
        • Country, author etc.
    • Named Entities
    • Interesting terms used to query the search engine
    • Focus
    • Topic
    • Prediction of the question difficulty
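
Clue extraction can be sketched with an off-the-shelf NLP pipeline. A minimal sketch in Python, assuming spaCy and its small English model en_core_web_sm are installed; the function name process_question and the returned fields are illustrative, not from the original slides:

# Minimal question-processing sketch; assumes spaCy is installed and
# the en_core_web_sm model has been downloaded.
import spacy

nlp = spacy.load("en_core_web_sm")

def process_question(question):
    doc = nlp(question)
    return {
        # question word (who/what/where/when/why/how) hints at the question type
        "question_word": next((t.lower_ for t in doc if t.tag_ in ("WP", "WRB", "WDT")), None),
        # named entities mentioned in the question
        "named_entities": [(ent.text, ent.label_) for ent in doc.ents],
        # interesting terms to send to the search engine
        "keywords": [t.lemma_ for t in doc if t.is_alpha and not t.is_stop],
    }

print(process_question("How long do polar bears live?"))
# e.g. question word 'how', no entities, keywords like ['long', 'polar', 'bear', 'live']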

Question Classification

  • Rule-based
    • Biographical questions
      • Who {is | was | are | were} <person name(s)>?
    • Definition questions etc.
    • Pros & Cons?
      • Very powerful
      • Cumbersome to create
      • Do not generalise well
  • Machine Learning
    • Trained on hand-labeled questions, such as the Li & Roth corpus (see the sketch after this list)
    • Question features
      • Tokens
      • Named Entities
      • POS tags
      • Chunks
      • N-grams
      • Question word
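
Both routes can be sketched compactly. A minimal example, assuming scikit-learn is available; the biographical pattern follows the rule above, and the tiny training sample with coarse labels is an illustrative stand-in for the Li & Roth corpus:

import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Rule-based route: the biographical question pattern from above as a regex
BIOGRAPHICAL = re.compile(r"^who (is|was|are|were) .+\?$", re.IGNORECASE)

# Machine-learning route: one coarse class per training question
questions = [
    "How long do polar bears live?",       # NUMERIC
    "Who is Britney Spears?",              # HUMAN
    "What is a prime number?",             # DESCRIPTION
    "Where is the Louvre located?",        # LOCATION
    "What does NLP stand for?",            # ABBREVIATION
    "Which animals hibernate in winter?",  # ENTITY
]
labels = ["NUMERIC", "HUMAN", "DESCRIPTION", "LOCATION", "ABBREVIATION", "ENTITY"]

# word 1- and 2-grams as features; real systems add POS tags, chunks and NEs
classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
classifier.fit(questions, labels)

print(bool(BIOGRAPHICAL.match("Who was Albert Einstein?")))  # True
print(classifier.predict(["Who was Albert Einstein?"]))      # likely ['HUMAN']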

Document Retrieval

  • Identify the N most relevant documents given an input question
  • For this, the question has to be reformulated as a query (see the sketch after this list):
    • Removal of stop words
    • Stemming or lemmatization
    • Query expansion
    • Apply query reformulation rules
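
A minimal sketch of the reformulation step, with a hand-coded stop-word list and a crude suffix stripper standing in for a real stemmer such as Porter's; all names here are illustrative:

# Turn a natural-language question into a keyword query.
STOP_WORDS = {"how", "do", "does", "the", "a", "an", "is", "are", "of", "to", "in"}

def stem(word):
    # crude suffix stripping; a real system would use a proper stemmer or lemmatizer
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def question_to_query(question):
    tokens = question.lower().rstrip("?").split()
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(question_to_query("How long do polar bears live?"))
# -> ['long', 'polar', 'bear', 'live']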

Passage Retrieval and Scoring

  • Aim: return the N most relevant passages from the top-ranked documents
  • Passages: sentences, paragraphs, sections / topical segments
  • The candidate passages are ranked based on (a scoring sketch follows this list):
    • the number of Named Entities of the right type
    • the number of question keywords in the passage
    • the rank of the document from which the passage was extracted
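
A minimal scoring sketch combining the three signals; the weights and the Passage fields are illustrative assumptions, not values from the slides:

from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    entities_of_right_type: int  # NEs matching the expected answer type
    keyword_matches: int         # question keywords found in the passage
    doc_rank: int                # rank of the source document (1 = best)

def score(p):
    # higher is better; the document rank contributes inversely
    return 2.0 * p.entities_of_right_type + 1.0 * p.keyword_matches + 1.0 / p.doc_rank

passages = [
    Passage("In the wild, polar bears live 15 to 18 years.", 2, 3, 1),
    Passage("Polar bears are found in the Arctic.", 0, 2, 2),
]
best = max(passages, key=score)
print(best.text)  # the first passage wins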

Answer Identification

  • Find the best answer to the question
  • Two types of methods:
    • Pattern extraction, using regular expression patterns corresponding to the expected answer type (see the sketch after this list)
    • Redundancy-based approach
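
A minimal sketch of the pattern-extraction method; the answer-type patterns below are illustrative. (A redundancy-based approach would instead count how often a candidate answer recurs across many retrieved snippets.)

import re

# answer-type-specific extraction patterns applied to retrieved passages
ANSWER_PATTERNS = {
    "BIRTHYEAR": re.compile(r"(?:was born in|b\.)\s+(\d{4})"),
    "LIFESPAN": re.compile(r"live[sd]?\s+(?:an average of\s+)?(\d+\s+to\s+\d+ years)"),
}

passage = "In the wild, polar bears live an average of 15 to 18 years."
match = ANSWER_PATTERNS["LIFESPAN"].search(passage)
if match:
    print(match.group(1))  # -> 15 to 18 years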

Query reformulation 

  • Aim: automatically generate answer paraphrases for a given question
  • Formulate multiple queries for each question and retrieve the 100 best matching pages for each
    • Rewrite rules are simple string-based manipulations (see the sketch after this list)
      • Question: “Where is the Louvre located?”
      • Rewrite Query:
        •  “+the Louvre +is located”
        • “+the Louvre +is +in”
        • “+the Louvre +is near”
    • Use of a search engine (Google) to find answers on the Web
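
A minimal sketch of such string-based rewrite rules; the rule table is illustrative. Each rewrite would then be sent to the search engine to retrieve the best-matching pages, as described above:

import re

# map question patterns to declarative answer-phrase templates
REWRITE_RULES = [
    (re.compile(r"^Where is (.+) located\?$", re.IGNORECASE),
     ['"+{0} +is located"', '"+{0} +is +in"', '"+{0} +is near"']),
    (re.compile(r"^Who (?:is|was) (.+)\?$", re.IGNORECASE),
     ['"+{0} +is"', '"+{0} +was"']),
]

def rewrite(question):
    for pattern, templates in REWRITE_RULES:
        m = pattern.match(question)
        if m:
            return [t.format(m.group(1)) for t in templates]
    return [question]  # fall back to the question itself

print(rewrite("Where is the Louvre located?"))
# -> ['"+the Louvre +is located"', '"+the Louvre +is +in"', '"+the Louvre +is near"']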