CS404 Artificial Intelligence (NUIM)

Artificial Intelligence & Language Processing CS404 at National University of Ireland, Maynooth

Card Set Details

Cards 43
Language English
Category Computer Science
Level University
Created / Updated 20.05.2015 / 20.05.2015
Licensing Not specified
Web link
https://card2brain.ch/box/cs404_artifical_intelligence_nuim

Document Retrieval (= querying)

Queries.. ?!

Identify the top quality documents in a database that relate to some query:

  • Query Dependent: Evaluate document quality in relation to some given query
  • Query Independent (PageRank): Evaluate document quality before (and independent of) any query terms. All ranking is done before any query arrives

Measuring Retrieval Accuracy

Precision: the number of relevant documents retrieved, divided by the total number of documents retrieved. (= are as many as possible of the documents I get back relevant?)

\(Precision = \frac{\#RelevantRetrieved}{\#TotalRetrieved}\)

Recall: the number of relevant documents retrieved, divided by the total number of relevant documents in the database (= do I get back as many of the relevant documents as possible?)

\(Recall = \frac{\#RelevantRetrieved}{\#TotalRelevant}\)
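
As a small illustration of the two formulas, the following sketch computes both measures from sets of document IDs. The function name `precision_recall` and the document IDs are made up for this example, and returning 0.0 for an empty set is an assumption, not part of the definitions above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved and relevant document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    relevant_retrieved = retrieved & relevant
    # Assumption: define both measures as 0.0 when the denominator is empty.
    precision = len(relevant_retrieved) / len(retrieved) if retrieved else 0.0
    recall = len(relevant_retrieved) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 documents retrieved, 5 relevant in total,
# 2 documents (d2, d4) appear in both sets.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7", "d8", "d9"}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.5 0.4
```

Here half of the retrieved documents are relevant (precision 2/4), but only two of the five relevant documents were found (recall 2/5).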

Document Ranking & Retrieval (What is it for? Citations, Examples)

  • For a database of Hyperlinked documents
  • Citations confer quality
  • Citations from highly referenced documents are even better
  • "Citation Ranking" of academic documents:
    • Google Scholar
    • Citeseer
    • Microsoft Academic Search

H - Index

  • The h-index is a metric for academic productivity and quality
  • h-index = h if h is the largest number such that you have published h papers, each of which has at least h citations
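
The definition can be sketched in a few lines: sort the citation counts in descending order and find the last rank at which the paper still has at least as many citations as its rank. The function name `h_index` and the citation counts are invented for illustration.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


# Hypothetical author with five papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

The fifth paper has only 3 citations, so five papers with at least five citations each do not exist and the h-index stays at 4.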

In-link vs. Out-link

  • in-link = backlink
  • Hyperlinks on a page are out-links pointing to other pages on the web
  • An in-link to one page is an out-link from another page
  • A link from A to B is an out-link from A and an in-link to B
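
To make the two views of the same link concrete, this sketch builds out-link and in-link tables from a list of (source, target) pairs. The function name `link_tables` and the three-page graph are assumptions chosen for the example.

```python
from collections import defaultdict


def link_tables(edges):
    """Build out-link and in-link tables from (source, target) pairs."""
    out_links = defaultdict(list)
    in_links = defaultdict(list)
    for source, target in edges:
        out_links[source].append(target)  # out-link from the source page
        in_links[target].append(source)   # the same link is an in-link to the target
    return dict(out_links), dict(in_links)


# Hypothetical web of three pages.
edges = [("A", "B"), ("A", "C"), ("C", "B")]
out_links, in_links = link_tables(edges)
print(out_links)  # {'A': ['B', 'C'], 'C': ['B']}
print(in_links)   # {'B': ['A', 'C'], 'C': ['A']}
```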

Random Walk

  • Consider somebody randomly following links across the web (occasionally restarting at a random page)
  • This surfer will frequently visit pages with many in-coming links (pages with no in-link only get visited if the surfer actually starts at that page)
  • The summation of many random walks gives a likelihood of visiting each page \(\rightarrow\) This is the Google PageRank score!
  • Rank is based on theoretical visitors - not the number of people who actually visit a webpage
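
The random surfer can be simulated directly. The sketch below is only an illustration under several assumptions: the function name `random_walk_visits`, the four-page graph, the restart probability of 0.15 and the step count are all chosen for this example, and every link target is assumed to appear as a key in the graph.

```python
import random


def random_walk_visits(out_links, steps=100_000, restart_prob=0.15, seed=0):
    """Estimate how often a random surfer visits each page."""
    rng = random.Random(seed)
    pages = sorted(out_links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        links = out_links[current]
        # Occasionally (or when stuck on a page without links) restart at random.
        if not links or rng.random() < restart_prob:
            current = rng.choice(pages)
        else:
            current = rng.choice(links)
    return {page: count / steps for page, count in visits.items()}


# Hypothetical graph: C has three in-links, D has none.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
frequencies = random_walk_visits(graph)
for page, share in sorted(frequencies.items()):
    print(page, round(share, 3))
```

Page D has no in-links, so it is reached only through restarts and should come out with the lowest visit share, while the heavily linked page C should be visited far more often.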

PageRank (formula!)

The PageRank of a page is the chance that a random surfer will land on that page

\(PR(p) = (1-d) + d \cdot \sum_{(q,p) \in E} \frac{PR(q)}{outdegree(q)}\)

where PR = PageRank, d = damping factor (here 0.85), E = the set of links, so the sum runs over all pages q that link to p, and outdegree(q) = number of out-links contained on page q
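
A single application of this rule to one page might look as follows. The helper name `pagerank_update`, the three-page graph and the starting ranks of 1.0 are assumptions made for the example only.

```python
def pagerank_update(page, ranks, out_links, d=0.85):
    """Apply the PageRank rule once to `page`, given the current ranks."""
    total = 0.0
    for q, targets in out_links.items():
        if page in targets:                    # (q, page) is a link in E
            total += ranks[q] / len(targets)   # PR(q) / outdegree(q)
    return (1 - d) + d * total


# Hypothetical graph and current rank estimates.
out_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = {"A": 1.0, "B": 1.0, "C": 1.0}
new_rank_c = pagerank_update("C", ranks, out_links)
print(round(new_rank_c, 3))  # 1.425
```

Page C receives half of A's rank (A has two out-links) and all of B's rank, giving 0.15 + 0.85 · (0.5 + 1.0) = 1.425.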

Iterative Calculation of PageRank

  • Iterative calculation of the true PageRank values: begin by assuming PR(x) = 0.15 for all pages
  • Repeatedly applying the PageRank rule, even to pages in random order, gradually "converges" to the final solution
  • The average PageRank value is 1: there are a few highly ranked pages and vast numbers of poorly linked pages near 0.15
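
The iteration can be sketched as below. This is a minimal illustration, not a reference implementation: the function name `pagerank`, the stopping tolerance and the four-page graph are assumptions, all pages are updated together in each round rather than in random order, and every page is assumed to have at least one out-link.

```python
def pagerank(out_links, d=0.85, initial=0.15, tolerance=1e-9, max_iterations=1000):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / outdegree(q)) until stable."""
    ranks = {page: initial for page in out_links}
    for _ in range(max_iterations):
        new_ranks = {page: 1 - d for page in out_links}
        for q, targets in out_links.items():
            for p in targets:
                new_ranks[p] += d * ranks[q] / len(targets)
        change = max(abs(new_ranks[p] - ranks[p]) for p in out_links)
        ranks = new_ranks
        if change < tolerance:  # assumed stopping rule: ranks no longer move
            break
    return ranks


# Hypothetical graph: D has no in-links, C has three.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page, rank in sorted(ranks.items()):
    print(page, round(rank, 3))
print("average:", round(sum(ranks.values()) / len(ranks), 3))
```

In this small graph the page without in-links (D) stays at the minimum value 0.15, the most linked-to page (C) ends up with the highest rank, and the average rank comes out at 1.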