CS404 Artificial Intelligence (NUIM)
Artificial Intelligence & Language Processing CS404 at National University of Ireland, Maynooth
Card Set Details
Cards | 43 |
---|---|
Language | English |
Category | Computer Science |
Level | University |
Created / Updated | 20.05.2015 / 20.05.2015 |
Licensing | Not specified |
Weblink | https://card2brain.ch/box/cs404_artifical_intelligence_nuim |
Document Retrieval (= querying)
Queries?
Identify the top quality documents in a database that relate to some query:
- Query Dependent: Evaluate document quality in relation to some given query
- Query Independent (PageRank): Evaluate document quality before (and independent of) any query terms. All ranking is done before any query arrives.
Measuring Retrieval Accuracy
Precision: the number of relevant documents retrieved, divided by the total number of documents retrieved. (= Are as many as possible of the documents I get relevant?)
\(Precision = \frac{\#RelevantRetrieved}{\#TotalRetrieved}\)
Recall: the number of relevant documents retrieved, divided by the total number of relevant documents in the database. (= Do I get as many of the relevant documents as possible?)
\(Recall = \frac{\#RelevantRetrieved}{\#TotalRelevant}\)
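The two measures can be checked with a small sketch. The function name `precision_recall` and the document ids are hypothetical, chosen only for illustration; returning 0.0 for an empty set is an assumption of this sketch, not part of the definitions above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the system
    relevant:  document ids that are actually relevant in the database
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    relevant_retrieved = retrieved & relevant

    # Assumption: report 0.0 instead of dividing by zero on empty sets.
    precision = len(relevant_retrieved) / len(retrieved) if retrieved else 0.0
    recall = len(relevant_retrieved) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: 4 documents retrieved, 3 of them relevant,
# and 6 relevant documents exist in the whole database.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d3", "d7", "d8", "d9"],
)
print(p, r)  # 0.75 0.5
```

Here precision is high (most of what came back is relevant) while recall is only 0.5 (half of the relevant documents were missed).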
Document Ranking & Retrieval (What is it for? Citations, Examples)
- For a database of Hyperlinked documents
- Citations confer quality
- Citations from highly referenced documents are even better
- "Citation Ranking" of academic documents:
- Google Scholar
- CiteSeer
- Microsoft Academic Search
Random Walk
- Consider somebody randomly following links across the web (occasionally restarting at a random page)
- He will frequently visit pages with many in-coming links (pages with no in-links only get visited if he actually starts at that page)
- The summation of many random walks gives a likelihood of visiting that page \(\rightarrow\) This is the Google PageRank score!
- Rank is based on theoretical visitors - not the number of people who actually visit a webpage
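The random-surfer idea can be simulated directly. The following is a minimal sketch under stated assumptions: the four-page graph, the function name `random_walk_visits`, the restart probability of 0.15, and the rule that a page without out-links forces a restart are all illustrative choices, not taken from the cards.

```python
import random


def random_walk_visits(links, steps=20_000, restart=0.15, seed=0):
    """Simulate one long random walk and return the visit fraction per page.

    links:   dict mapping each page to the list of pages it links to
    restart: assumed probability of jumping to a random page at each step
    """
    rng = random.Random(seed)
    pages = list(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)

    for _ in range(steps):
        visits[current] += 1
        out_links = links[current]
        # Restart occasionally, or when the page has no out-links to follow.
        if not out_links or rng.random() < restart:
            current = rng.choice(pages)
        else:
            current = rng.choice(out_links)

    return {page: count / steps for page, count in visits.items()}


# Hypothetical graph: C has three in-links, D has none.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, share in sorted(random_walk_visits(links).items()):
    print(page, round(share, 3))
```

In this graph page D is reached only through restarts, so its share of visits stays small, while the heavily linked page C is visited far more often. The fractions sum to 1, so they are on a different scale from scores whose average is 1.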
PageRank (formula!)
The PageRank of a page is the chance that a random surfer will land on that page.
\(PR(p) = (1-d) + d \cdot \sum_{(q,p) \in E} \frac{PR(q)}{outdegree(q)}\)
where PR = PageRank, d = damping factor (0.85), E = set of links (q, p) from page q to page p, outdegree(q) = number of out-links contained on page q
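As an illustrative check of the formula, assume a hypothetical page p with exactly two in-links: one from q1 (PR = 1.0, 2 out-links) and one from q2 (PR = 0.5, 1 out-link). These numbers are made up for the example, with d = 0.85:
\(PR(p) = (1 - 0.85) + 0.85 \cdot \left(\frac{1.0}{2} + \frac{0.5}{1}\right) = 0.15 + 0.85 \cdot 1.0 = 1.0\)
A page with no in-links has an empty sum and therefore keeps the minimum value \(1 - d = 0.15\).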
Iterative Calculation of PageRank
- Iterative calculation of the true PageRank value: begin by assuming PR(x) = 0.15 for all pages
- Random iterative application of PageRank rule gradually "converges" to the final solution
- The average PageRank value is 1: a few pages are highly ranked, while vast numbers of poorly linked pages sit near 0.15
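The iteration can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the graph and the function name `pagerank` are hypothetical, and the sketch updates all pages together in each sweep (an assumption; the rule could also be applied page by page in another order). It also assumes every page has at least one out-link.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iteratively apply PR(p) = (1 - d) + d * sum(PR(q) / outdegree(q)).

    links: dict mapping each page to the list of pages it links to
           (assumed: every page has at least one out-link)
    """
    pages = list(links)
    pr = {page: 1 - d for page in pages}  # start every page at 0.15

    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            # Sum the contributions of every page q that links to p.
            incoming = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new_pr[p] = (1 - d) + d * incoming
        pr = new_pr  # all pages are updated together in each sweep

    return pr


# Hypothetical graph: C has three in-links, D has none.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
print("average:", round(sum(ranks.values()) / len(ranks), 3))
```

On this small graph the average settles at 1, page D (no in-links) stays at 0.15, and the heavily linked page C ends up with the highest score.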