CS404 Artifical Intelligence (NUIM)
Artificial Intelligence & Language Processing CS404 at National University of Ireland, Maynooth
Artificial Intelligence & Language Processing CS404 at National University of Ireland, Maynooth
Create or copy sets of flashcards
With an upgrade you can create or copy an unlimited number of sets and use many more additional features.
Set of flashcards Details
Flashcards | 43 |
---|---|
Language | English |
Category | Computer Science |
Level | University |
Created / Updated | 20.05.2015 / 20.05.2015 |
Weblink |
https://card2brain.ch/box/cs404_artifical_intelligence_nuim
|
Embed |
<iframe src="https://card2brain.ch/box/cs404_artifical_intelligence_nuim/embed" width="780" height="150" scrolling="no" frameborder="0"></iframe>
|
Create or copy sets of flashcards
With an upgrade you can create or copy an unlimited number of sets and use many more additional features.
Log in to see all the cards.
When to use nearest Neighbour?
- Instances map to points in Rn
- Less than 20 attributes per instance
- When you've lots of training data
Advantages of Nearest Neighbour
- "Training" is very fast
- Learn complex target functions
- Don't lose information
- Constantly changing/updated database
Disadvantages of Nearest Neighbour
- Slow at query time
- Easily fooled by irrelevant attributes
Aris Process
- Representation
- Retrieval
- Graph Mapping
- Transfer spec# statements
- validate original code + new spec#
Compare and contrast mutation and crossover. Describe in detail the effects of one specific operator of each type. Use a diagram to illustrate your answer.
Compare and contrast mutation and crossover. Describe in detail the effects of one specific operator of each type. Use a diagram to illustrate your answer.
What is a GA (Genetic Algorithm)? What is it used for?
- model of machine learning which derives it's behavior from a metaphor of some of the mechanisms of Evolution, in nature
- used for a number of different application areas, like multidimensional optimisation and search problems
- parallel, non-comprehensive search for the global maximum of the graph
- GAs represent a generate-and-test beam-search through hypothesis space
- search is not precise (no guarantee that the global maximum will be found)
- however, the result should be a good approximation of the maximum value
Steps of GA (Genetic Algorithm)
- Create a population of individuals randomly
- Evaluate fitness of each individual in the population
- Select Individuals
- Reproduction (Rank, Tournament)
- Crossover
- Mutation
- Repeat until a suitable fit individual is found
GA: Chromosome
- Individuals within our universe represented by chromosomes - a set of character strings analogous to DNA
- These chromosomes are used to represent the different parameters being optimised
- in practice, we represent chromosomes by having arrays of bits
GA: Population
- We begin by creating an Initial Population of random individuals, whose chromosomes are randomly selected
- Typically we create many individuals (10000 > x > 50) depending on resources and complexity
GA: Selection
- By using a carefully selected fitness function we can determine the relative goodness of the solution represented by each individual (as represented by its chromosomes)
- Only selected entities take place in breeding, and get to be involved in the breeding process
- Fitter entities have a greater likelihood of engaging in breeding
GA: Crossover
- The production of progeny is frequently based on taking the first N chromosomes from parent one, and the remainder from parent two
- This produces a new individual with some genetic material from each parent
- This follows local (hopefully global) maxima
Document Retrieval (= Abfrage)
Queries.. ?!
Identify the top quality documents in a database that relate to some query:
- Query Dependent: Evaluate document quality in relation to some given query
- Query Independent (Page Rank): Evaluate document quality before (& independent of) any query terms. All ranking is done before any query arrives
Measuring Retrieval Accuracy
Precision: the number of relevant documents retrieved, divided by the total number of documents retrieved. (=sind möglichst viele die ich bekomme relevant?)
\(Precision = \frac{\#RelevantRetrieved}{\#TotalDocuments}\)
Recall: the number of relevant documents retrieved divided by the total number of relevant documents in the long-term memory (=bekomme ich möglichst alle relevanten?)
\(Recall = \frac{\#RelevantRetrieved}{\#TotalRelevant}\)
Document Ranking & Retrieval (What is it for? Citations, Examples)
- For a database of Hyperlinked documents
- Citations confer quality
- Citations from highly referenced documents are even better
- "Citation Ranking" of academic documents:
- Google Scholar
- Citeseer
- Academic research microsoft
Random Walk
- Consider sombody randomly following links across the web (occasionally restarting at a random page)
- He will frequently visit pages with many in-coming links (Pages with no in-link only get visited if hey actually starts at that page)
- The summation of many random walks gives a likelihood of visiting that page \(\rightarrow\) This is the Google PageRank score!
- Rank is based on theoretical visitors - not the number of people who actually visit a webpage
PageRank (formula!)
The PageRank of a page is the chance that a random surfer will lond on that page
\(PR(p) = (1-d) + d \cdot \sum_{(q,p) \in E} \frac{PR(q)}{outdegree(q)}\)
where PR = PageRank, d = 0.85, outdegree(q) = number of out-links contained on page q
Iterative Calculation of PageRank
- Iterative Calculation of the true PageRank value: Begin by assuming all pages PR(x) = 0.15
- Random iterative application of PageRank rule gradually "converges" to the final solution
- Average Page Rank alue is 1, a few highly ranked pages, and vast numbers of poorly linked pages near 0.15
Document Indexing
- Documents are pre-ranked and pre-indexed by words
- Google contains a document-word matrix, listing all webpages that use each word
- When a query is supplied, Google finds the highest ranked page(s) that contain that term
tf-idf Term-Document Relevance
- tf-idf = term frequency - inverse document frequency
- The document-term index contains a relevance weight
- indicating the relevance of a term to that document (0<r<1)
- r large if term in: Title, bold, appears twice
- Query Score = PageRank-value * relevance (tf-idf)
- This allows 2 queries to return 2 pages, with "opposite" ranking
- without relevance factor, documents could only be returned in a pre-determined order
- The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.
What is a Cellular Automata?
- Simple computational machines
- "Bottom up" approach provided
- Behaviour "emerges" at higher leves: when looking at large clusters of cells
Components of a CA
1. states: e.g. 0 or 1
2. neighbourhood: the two adjacent (angrenzend) cells N C N
3. rules: depicts the new state of a cell for each (possible) local "configuration" of the cell and its two neighbours
CA - Game of Life - Rules (0 - 4)
0. Grid filled with "organisms", alive or dead
1. live cell < two live neighbours -> dies (loneliness)
2. live cell > three live neighbours -> dies (overcrowding)
3. live cell = two or three live neighbours -> lives
4. dead cell = three live neighbours -> comes to live
-
- 1 / 43
-