10_Question_Answering

Question Answering vs. Information Retrieval

  •  INPUT:
    •  Natural-language questions, not keyword-based queries:
      •  QA:
        • How long do polar bears live?
      •  IR:
        • polar bears life span
  •  OUTPUT:
    • Precise and concise answers, not whole documents
      •  QA:
        • In the wild, polar bears live an average of 15 to 18 years, although biologists have tagged a few bears in their early 30s. In captivity, they may live until their mid- to late 30s. One zoo bear in London lived to be 41.
      • IR:
        • www.gotpetsonline.com/polar-bear/bear-habitat-polar/polar-bear-life-span.html
        • www.starbus.com/polarbear/aboutpb.htm
        • www.polarbearsinternational.org/faq

Generic QA System Architecture

  • Pipeline: Question Processing → Document Retrieval → Passage Retrieval and Scoring → Answer Identification

Question Processing (including Question Type)

  • Goal: extract clues from the question (a clue-extraction sketch follows this list)
    • Question type
      • Factual answers → Factoid questions
        • How long do polar bears live?
      • Definitional answers → Definition questions
        • Who is Britney Spears?
      • Opinionated answers → Opinion questions
        • What do you think of Britney Spears' last album?
      • Different taxonomies for question types exist, e.g. Example, Comparison, Quantification, etc.
        • Li & Roth: a two-layered taxonomy with 6 coarse and 50 fine classes
          • Abbreviation:expression, Entity:animal, Description:def, Human:individual, Location:country, Numeric:date
      • More difficult types like How and Why require complex answers
        • TREC-QA: main question types
          • Factoid
          • List
          • Definition
          • Other
    • Expected answer type(s)
      • The answer type is the semantic category of the expected answer
        • Country, author etc.
    • Named Entities
    • Interesting terms used to query the search engine
    • Focus
    • Topic
    • Prediction of the question difficulty
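
Clue extraction can be sketched with an off-the-shelf NLP pipeline. A minimal sketch in Python, assuming spaCy and its small English model en_core_web_sm are installed; the function name process_question and the returned fields are illustrative, not from the original slides:

# Minimal question-processing sketch; assumes spaCy is installed and
# the en_core_web_sm model has been downloaded.
import spacy

nlp = spacy.load("en_core_web_sm")

def process_question(question):
    doc = nlp(question)
    return {
        # question word (who/what/where/when/why/how) hints at the question type
        "question_word": next((t.lower_ for t in doc if t.tag_ in ("WP", "WRB", "WDT")), None),
        # named entities mentioned in the question
        "named_entities": [(ent.text, ent.label_) for ent in doc.ents],
        # interesting terms to send to the search engine
        "keywords": [t.lemma_ for t in doc if t.is_alpha and not t.is_stop],
    }

print(process_question("How long do polar bears live?"))
# e.g. question word 'how', no entities, keywords like ['long', 'polar', 'bear', 'live']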

Question Classification

  • Rule-based
    • Biographical questions
      • Who {is | was | are | were} <person name(s)>?
    • Definition questions etc.
    • Pros & Cons?
      • Very powerful
      • Cumbersome to create
      • Do not generalise well
  • Machine Learning
    • Trained on hand-labeled questions, such as the Li & Roth corpus (see the sketch after this list)
    • Question features
      • Tokens
      • Named Entities
      • POS tags
      • Chunks
      • N-grams
      • Question word
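
Both routes can be sketched compactly. A minimal example, assuming scikit-learn is available; the biographical pattern follows the rule above, and the tiny training sample with coarse labels is an illustrative stand-in for the Li & Roth corpus:

import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Rule-based route: the biographical question pattern from above as a regex
BIOGRAPHICAL = re.compile(r"^who (is|was|are|were) .+\?$", re.IGNORECASE)

# Machine-learning route: one coarse class per training question
questions = [
    "How long do polar bears live?",       # NUMERIC
    "Who is Britney Spears?",              # HUMAN
    "What is a prime number?",             # DESCRIPTION
    "Where is the Louvre located?",        # LOCATION
    "What does NLP stand for?",            # ABBREVIATION
    "Which animals hibernate in winter?",  # ENTITY
]
labels = ["NUMERIC", "HUMAN", "DESCRIPTION", "LOCATION", "ABBREVIATION", "ENTITY"]

# word 1- and 2-grams as features; real systems add POS tags, chunks and NEs
classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
classifier.fit(questions, labels)

print(bool(BIOGRAPHICAL.match("Who was Albert Einstein?")))  # True
print(classifier.predict(["Who was Albert Einstein?"]))      # likely ['HUMAN']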

Document Retrieval

  • Identify the N most relevant documents given an input question
  • For this, the question has to be reformulated as a query (see the sketch after this list):
    • Removal of stop words
    • Stemming or lemmatization
    • Query expansion
    • Apply query reformulation rules
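
A minimal sketch of the reformulation step, with a hand-coded stop-word list and a crude suffix stripper standing in for a real stemmer such as Porter's; all names here are illustrative:

# Turn a natural-language question into a keyword query.
STOP_WORDS = {"how", "do", "does", "the", "a", "an", "is", "are", "of", "to", "in"}

def stem(word):
    # crude suffix stripping; a real system would use a proper stemmer or lemmatizer
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def question_to_query(question):
    tokens = question.lower().rstrip("?").split()
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(question_to_query("How long do polar bears live?"))
# -> ['long', 'polar', 'bear', 'live']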

Passage Retrieval and Scoring

  • Aim: return the N most relevant passages from the top-ranked documents
  • Passages: sentences, paragraphs, sections / topical segments
  • The candidate passages are ranked based on (a scoring sketch follows this list):
    • the number of Named Entities of the right type
    • the number of question keywords in the passage
    • the rank of the document from which the passage was extracted
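
A minimal scoring sketch combining the three signals; the weights and the Passage fields are illustrative assumptions, not values from the slides:

from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    entities_of_right_type: int  # NEs matching the expected answer type
    keyword_matches: int         # question keywords found in the passage
    doc_rank: int                # rank of the source document (1 = best)

def score(p):
    # higher is better; the document rank contributes inversely
    return 2.0 * p.entities_of_right_type + 1.0 * p.keyword_matches + 1.0 / p.doc_rank

passages = [
    Passage("In the wild, polar bears live 15 to 18 years.", 2, 3, 1),
    Passage("Polar bears are found in the Arctic.", 0, 2, 2),
]
best = max(passages, key=score)
print(best.text)  # the first passage wins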

Answer Identification

  • Find the best answer to the question
  • Two types of methods:
    • Pattern extraction, using regular expression patterns corresponding to the expected answer type (see the sketch after this list)
    • Redundancy-based approach
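
A minimal sketch of the pattern-extraction method; the answer-type patterns below are illustrative. (A redundancy-based approach would instead count how often a candidate answer recurs across many retrieved snippets.)

import re

# answer-type-specific extraction patterns applied to retrieved passages
ANSWER_PATTERNS = {
    "BIRTHYEAR": re.compile(r"(?:was born in|b\.)\s+(\d{4})"),
    "LIFESPAN": re.compile(r"live[sd]?\s+(?:an average of\s+)?(\d+\s+to\s+\d+ years)"),
}

passage = "In the wild, polar bears live an average of 15 to 18 years."
match = ANSWER_PATTERNS["LIFESPAN"].search(passage)
if match:
    print(match.group(1))  # -> 15 to 18 years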

Query reformulation 

  • Aim: automatically generate answer paraphrases for a given question
  • Formulate multiple queries for each question and retrieve the 100 best matching pages for each
    • Rewrite rules are simple string-based manipulations (see the sketch after this list)
      • Question: “Where is the Louvre located?”
      • Rewrite Query:
        •  “+the Louvre +is located”
        • “+the Louvre +is +in”
        • “+the Louvre +is near”
    • Use of a search engine (Google) to find answers on the Web
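
A minimal sketch of such string-based rewrite rules; the rule table is illustrative. Each rewrite would then be sent to the search engine to retrieve the best-matching pages, as described above:

import re

# map question patterns to declarative answer-phrase templates
REWRITE_RULES = [
    (re.compile(r"^Where is (.+) located\?$", re.IGNORECASE),
     ['"+{0} +is located"', '"+{0} +is +in"', '"+{0} +is near"']),
    (re.compile(r"^Who (?:is|was) (.+)\?$", re.IGNORECASE),
     ['"+{0} +is"', '"+{0} +was"']),
]

def rewrite(question):
    for pattern, templates in REWRITE_RULES:
        m = pattern.match(question)
        if m:
            return [t.format(m.group(1)) for t in templates]
    return [question]  # fall back to the question itself

print(rewrite("Where is the Louvre located?"))
# -> ['"+the Louvre +is located"', '"+the Louvre +is +in"', '"+the Louvre +is near"']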