CS404 Artifical Intelligence (NUIM)

Artificial Intelligence & Language Processing CS404 at National University of Ireland, Maynooth

0.0 (0)

Set of flashcards Details

Flashcards	43
Language	English
Category	Computer Science
Level	University
Created / Updated	20.05.2015 / 20.05.2015
Weblink	https://card2brain.ch/box/cs404_artifical_intelligence_nuim
Embed	<iframe src="https://card2brain.ch/box/cs404_artifical_intelligence_nuim/embed" width="780" height="150" scrolling="no" frameborder="0"></iframe>

Card list

Study

1 - Nearest Neighbour

Given a query instance x_q, find the nearest neighbour x_n and estimate f(x_q) as f(x_n)
Find the most similar recorded problem
- Assume they have the same solutions
- i.e. generalise across one point

KNN - Problem

Predicting f(x_q)

Take "vote" among it's k nearest neighbours
- if discrete-valued target function
Take a mean (Mittelwert) of f(x) values for the k nearest neighbours
- if real-valued target function

When to use nearest Neighbour?

Instances map to points in Rⁿ
Less than 20 attributes per instance
When you've lots of training data

Advantages of Nearest Neighbour

"Training" is very fast
Learn complex target functions
Don't lose information
Constantly changing/updated database

Disadvantages of Nearest Neighbour

Slow at query time
Easily fooled by irrelevant attributes

Aris Process

Representation
Retrieval
Graph Mapping
Transfer spec# statements
validate original code + new spec#

Compare and contrast mutation and crossover. Describe in detail the effects of one specific operator of each type. Use a diagram to illustrate your answer.

Crossover: The production of offspring is frequently based on taking the first N chroosomes from parent one, and the remainder from parent two. This produces a new individual with some genetic material from each parent.

Two-Point Crossover: You have two crossover points, so 3 strings of bits

Compare and contrast mutation and crossover. Describe in detail the effects of one specific operator of each type. Use a diagram to illustrate your answer.

Mutation: The rate of mutation is determined by a global constant/variable which determines the probability of a random alteration of one bit of genetic material of an individual (This helps avoid local minima)

Single-point mutation: flipping one bit

What is a GA (Genetic Algorithm)? What is it used for?

model of machine learning which derives it's behavior from a metaphor of some of the mechanisms of Evolution, in nature
used for a number of different application areas, like multidimensional optimisation and search problems
parallel, non-comprehensive search for the global maximum of the graph
GAs represent a generate-and-test beam-search through hypothesis space
search is not precise (no guarantee that the global maximum will be found)
however, the result should be a good approximation of the maximum value

Steps of GA (Genetic Algorithm)

Create a population of individuals randomly
Evaluate fitness of each individual in the population
Select Individuals
1. Reproduction (Rank, Tournament)
2. Crossover
3. Mutation
Repeat until a suitable fit individual is found

GA: Chromosome

Individuals within our universe represented by chromosomes - a set of character strings analogous to DNA
These chromosomes are used to represent the different parameters being optimised
in practice, we represent chromosomes by having arrays of bits

GA: Population

We begin by creating an Initial Population of random individuals, whose chromosomes are randomly selected
Typically we create many individuals (10000 > x > 50) depending on resources and complexity

GA: Selection

By using a carefully selected fitness function we can determine the relative goodness of the solution represented by each individual (as represented by its chromosomes)
Only selected entities take place in breeding, and get to be involved in the breeding process
Fitter entities have a greater likelihood of engaging in breeding

GA: Crossover

The production of progeny is frequently based on taking the first N chromosomes from parent one, and the remainder from parent two
This produces a new individual with some genetic material from each parent
This follows local (hopefully global) maxima

Document Retrieval (= Abfrage)

Queries.. ?!

Keyboard commands:

= turn,

= for-/backward,

= scroll

Identify the top quality documents in a database that relate to some query:

Query Dependent: Evaluate document quality in relation to some given query
Query Independent (Page Rank): Evaluate document quality before (& independent of) any query terms. All ranking is done before any query arrives

Measuring Retrieval Accuracy

Keyboard commands:

= turn,

= for-/backward,

= scroll

Precision: the number of relevant documents retrieved, divided by the total number of documents retrieved. (=sind möglichst viele die ich bekomme relevant?)

\(Precision = \frac{\#RelevantRetrieved}{\#TotalDocuments}\)

Recall: the number of relevant documents retrieved divided by the total number of relevant documents in the long-term memory (=bekomme ich möglichst alle relevanten?)

\(Recall = \frac{\#RelevantRetrieved}{\#TotalRelevant}\)

Document Ranking & Retrieval (What is it for? Citations, Examples)

Keyboard commands:

= turn,

= for-/backward,

= scroll

For a database of Hyperlinked documents
Citations confer quality
Citations from highly referenced documents are even better
"Citation Ranking" of academic documents:
- Google Scholar
- Citeseer
- Academic research microsoft

H - Index

Keyboard commands:

= turn,

= for-/backward,

= scroll

H-index metric for Academic productivity and quality
h-index = h if you have published h papers, each of which has at least h citations

In-link vs. Out-link

Keyboard commands:

= turn,

= for-/backward,

= scroll

out-link = backlink
Hyperlinks on a page are outlinks pointing to other pages on the web
An inlink to one page is an outlink from another page
Link from A to B is an out-link from A but an in-link from B

Random Walk

Keyboard commands:

= turn,

= for-/backward,

= scroll

Consider sombody randomly following links across the web (occasionally restarting at a random page)
He will frequently visit pages with many in-coming links (Pages with no in-link only get visited if hey actually starts at that page)
The summation of many random walks gives a likelihood of visiting that page \(\rightarrow\) This is the Google PageRank score!
Rank is based on theoretical visitors - not the number of people who actually visit a webpage

PageRank (formula!)

Keyboard commands:

= turn,

= for-/backward,

= scroll

The PageRank of a page is the chance that a random surfer will lond on that page

\(PR(p) = (1-d) + d \cdot \sum_{(q,p) \in E} \frac{PR(q)}{outdegree(q)}\)

where PR = PageRank, d = 0.85, outdegree(q) = number of out-links contained on page q

Iterative Calculation of PageRank

Keyboard commands:

= turn,

= for-/backward,

= scroll

Iterative Calculation of the true PageRank value: Begin by assuming all pages PR(x) = 0.15
Random iterative application of PageRank rule gradually "converges" to the final solution
Average Page Rank alue is 1, a few highly ranked pages, and vast numbers of poorly linked pages near 0.15

Document Indexing

Keyboard commands:

= turn,

= for-/backward,

= scroll

Documents are pre-ranked and pre-indexed by words
Google contains a document-word matrix, listing all webpages that use each word
When a query is supplied, Google finds the highest ranked page(s) that contain that term

tf-idf Term-Document Relevance

Keyboard commands:

= turn,

= for-/backward,

= scroll

tf-idf = term frequency - inverse document frequency
The document-term index contains a relevance weight
- indicating the relevance of a term to that document (0<r<1)
- r large if term in: Title, bold, appears twice
- Query Score = PageRank-value * relevance (tf-idf)
This allows 2 queries to return 2 pages, with "opposite" ranking
- without relevance factor, documents could only be returned in a pre-determined order
The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.

What is a Cellular Automata?

Simple computational machines
"Bottom up" approach provided
Behaviour "emerges" at higher leves: when looking at large clusters of cells

Components of a CA

1. states: e.g. 0 or 1

2. neighbourhood: the two adjacent (angrenzend) cells N C N

3. rules: depicts the new state of a cell for each (possible) local "configuration" of the cell and its two neighbours

1D Cellular Automata

If there are two possible states (0 or 1) for each of 3 cells, there are 2^3 = 8 rules required. And 2^8 = 256 different CA.

CA - Game of Life - Rules (0 - 4)

0. Grid filled with "organisms", alive or dead

1. live cell < two live neighbours -> dies (loneliness)

2. live cell > three live neighbours -> dies (overcrowding)

3. live cell = two or three live neighbours -> lives

4. dead cell = three live neighbours -> comes to live

CA - Game of Life - Structures

Some structures die off
stable
- block
- boat
oscillate
- blinker
move continually
- gliders (move across the grid)
more complex behaviour
- glider guns (make gliders every n updates)

1 / 43

Card list

Study

CS404 Artifical Intelligence (NUIM)

Create or copy sets of flashcards

Create or copy sets of flashcards

Log in to see all the cards.

SWITCHaai

Office 365

Edulog

Apple ID

Google