Lernkartei CS404 Artifical Intelligence (NUIM) (Seite 1 von 2)

Karten	43
Sprache	English
Kategorie	Informatik
Stufe	Universität
Erstellt / Aktualisiert	20.05.2015 / 20.05.2015
Weblink	https://card2brain.ch/box/cs404_artifical_intelligence_nuim
Einbinden	<iframe src="https://card2brain.ch/box/cs404_artifical_intelligence_nuim/embed" width="780" height="150" scrolling="no" frameborder="0"></iframe>

Document Retrieval (= Abfrage)

Queries.. ?!

Identify the top quality documents in a database that relate to some query:

Query Dependent: Evaluate document quality in relation to some given query
Query Independent (Page Rank): Evaluate document quality before (& independent of) any query terms. All ranking is done before any query arrives

Measuring Retrieval Accuracy

Precision: the number of relevant documents retrieved, divided by the total number of documents retrieved. (=sind möglichst viele die ich bekomme relevant?)

\(Precision = \frac{\#RelevantRetrieved}{\#TotalDocuments}\)

Recall: the number of relevant documents retrieved divided by the total number of relevant documents in the long-term memory (=bekomme ich möglichst alle relevanten?)

\(Recall = \frac{\#RelevantRetrieved}{\#TotalRelevant}\)

Document Ranking & Retrieval (What is it for? Citations, Examples)

For a database of Hyperlinked documents
Citations confer quality
Citations from highly referenced documents are even better
"Citation Ranking" of academic documents:
- Google Scholar
- Citeseer
- Academic research microsoft

H - Index

H-index metric for Academic productivity and quality
h-index = h if you have published h papers, each of which has at least h citations

In-link vs. Out-link

out-link = backlink
Hyperlinks on a page are outlinks pointing to other pages on the web
An inlink to one page is an outlink from another page
Link from A to B is an out-link from A but an in-link from B

Random Walk

Consider sombody randomly following links across the web (occasionally restarting at a random page)
He will frequently visit pages with many in-coming links (Pages with no in-link only get visited if hey actually starts at that page)
The summation of many random walks gives a likelihood of visiting that page \(\rightarrow\) This is the Google PageRank score!
Rank is based on theoretical visitors - not the number of people who actually visit a webpage

PageRank (formula!)

The PageRank of a page is the chance that a random surfer will lond on that page

\(PR(p) = (1-d) + d \cdot \sum_{(q,p) \in E} \frac{PR(q)}{outdegree(q)}\)

where PR = PageRank, d = 0.85, outdegree(q) = number of out-links contained on page q

Iterative Calculation of PageRank

Iterative Calculation of the true PageRank value: Begin by assuming all pages PR(x) = 0.15
Random iterative application of PageRank rule gradually "converges" to the final solution
Average Page Rank alue is 1, a few highly ranked pages, and vast numbers of poorly linked pages near 0.15

Document Indexing

Documents are pre-ranked and pre-indexed by words
Google contains a document-word matrix, listing all webpages that use each word
When a query is supplied, Google finds the highest ranked page(s) that contain that term

tf-idf Term-Document Relevance

tf-idf = term frequency - inverse document frequency
The document-term index contains a relevance weight
- indicating the relevance of a term to that document (0<r<1)
- r large if term in: Title, bold, appears twice
- Query Score = PageRank-value * relevance (tf-idf)
This allows 2 queries to return 2 pages, with "opposite" ranking
- without relevance factor, documents could only be returned in a pre-determined order
The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.

What is a Cellular Automata?

Simple computational machines
"Bottom up" approach provided
Behaviour "emerges" at higher leves: when looking at large clusters of cells

Components of a CA

1. states: e.g. 0 or 1

2. neighbourhood: the two adjacent (angrenzend) cells N C N

3. rules: depicts the new state of a cell for each (possible) local "configuration" of the cell and its two neighbours

1D Cellular Automata

If there are two possible states (0 or 1) for each of 3 cells, there are 2^3 = 8 rules required. And 2^8 = 256 different CA.

CA - Game of Life - Rules (0 - 4)

0. Grid filled with "organisms", alive or dead

1. live cell < two live neighbours -> dies (loneliness)

2. live cell > three live neighbours -> dies (overcrowding)

3. live cell = two or three live neighbours -> lives

4. dead cell = three live neighbours -> comes to live

CA - Game of Life - Structures

Some structures die off
stable
- block
- boat
oscillate
- blinker
move continually
- gliders (move across the grid)
more complex behaviour
- glider guns (make gliders every n updates)

2D cellular automata

The most popular CA format
2D grid
usually a 3x3 neighbourhood grid (in/excluding the corner squares)
Binary states (more states allowed in more complex CA)

Rabbits & Foxes

Simulate competitive populations
- Simplification of producer/consumer
If there's few foxes, the rabbits thrive (=gedeihen) and breed rapidly
But then, the fox population soars (=steigt), consuming all rabbits
- Until the foxes start to starve
Cells: Fox, rabbit, neutral

4 Types of CA - Name all types

homogeneous state
simple stable or periodic structures
chaotic pattern
long-lived, complx structures

4 Types of CA - Explain Class 1

Class 1: Homogeneous state:

Cells die back completely after few steps of evolution. No initial configuration can produces patterns of any length using these rules, it will always evolve to a homogeneous state. These are non-reversible, because once the fild is blank, no previous state information can be derived.

4 Typles of CA - Explain Class 2

Class 2: Simple stable or periodic structures.

Self-contained structures can exist that will not terminate in any number of time steps. Another way of putting this is that the information of a cell can only propagate a finite distance, and thus cannot affect distant structures - regardless of time. This acts almost as a "filter" on the initial configuration, choosing some features to propagate over infinite steps, and some to remove completely. For a 1dCA this may look like solid bars over time on the automaton, or train tracks, stair steps, etc.
Non-reversible just like class 1, since the fixed structures give no information about the initial state which was lost during evolution.

4 Types of CA - Explain Class 3

Class 3: Chaotic pattern

This is where cellular automata start to get interesting. The amount of current knowledge needed grows as one tries to predict further into the future - hence it is a chaotic system. As with all well-defined chaotic systems, a given state is completely reversible; the previous state (and all the ones before) can be predicted by examining the current state. non-deterministic - to find the value of a cell after an infinite amount of time, an infinite number of initial conditions must be considered. Behavior is chaotic, but not random.

4 Types of CA - Explain Class 4

Class 4: Long-lived, complex structures

These automata propagate cell information out at variable speeds, depending on the structures themselves. These are the only non-trivial, non-reversible automata; since the current cell values could have arisen from more than one previous configuration, current cell values cannot be extrapolated backwards to know previous configurations.
Finding the value of a given cell a certain numer of steps in the future is an intractable problem; except for special cases, it cannot be found without actually doing all the drawn-out computation. Conway's Game of Life is a class 4 cellular automaton.

Other Properties of CA

Binary vs. continuous CA
- Binary, integer or Real cell states
Synchronous vs. asynchronous CA
- Are all cells updated instantaneously, or one at a time (in a random order)
Totalistic CA
- State depends only on total (average) of previous states (i.e. no left-to-right directionality)
Reversible CA
- It's possible to return to previous states, runs forward and backward (in time)
Regular vs. irregular CA
- Regular 2D grid vs. irregular tessellation of 2D plane.

Firing Squad Problem

N riflemen (Schützen) standing next to each other in a line
Each can speak to, or hear, his immediate neighbours
Riflemaster issues command to first rifleman
Can they agree a signal system so that all riflemen fire simultaneously?

One solution to the Firing Squad Problem using 15 states and 3n units of time.

Intelligent Agents (CA)

Complex problems solved by interaction amongst (relatively) simple components (agents)
- society of mind
Each agent performs some specific task, relatively autonomously
Problem solved in "bottom up" manner

Q: How does an individual loaf of bread get from the baker to your table?
A: Individual agents (drivers) deliver quantities to shops. The shop-keeper (agent) stores bread for individual sale. You (agent) then buy a single loaf to deliver to your table!

Conclusion CA

Cellular Automata are simple computational devices
- Cells, states, update rules
"Intelligence" is an emergent property
- like "agent" systems
CA can be seen as simple inteligent agents

Lazy vs. Eager Learning

Lazy: wait for query before "generalizing"
- k-Neirest Neighbour, Case based Reasoning
Eager: generalize before seeing query
- Radial basis function networks, ID3, Backpropagation, NaiveBayes,...
Eager creates global approximation - Lazy create many local approximations
If they use same H(ypothesis), lazy can represent more complex functions
- e.g. consider H = linear functions

KNN - Basics

KNN is basically a two-part algorithm:
- Retrieve the K most similar recorded cases
- "Average" across these K cases
Simplest case is where k=1

1 - Nearest Neighbour

Given a query instance x_q, find the nearest neighbour x_n and estimate f(x_q) as f(x_n)
Find the most similar recorded problem
- Assume they have the same solutions
- i.e. generalise across one point

KNN - Problem

Predicting f(x_q)

Take "vote" among it's k nearest neighbours
- if discrete-valued target function
Take a mean (Mittelwert) of f(x) values for the k nearest neighbours
- if real-valued target function

When to use nearest Neighbour?

Instances map to points in Rⁿ
Less than 20 attributes per instance
When you've lots of training data

Advantages of Nearest Neighbour

"Training" is very fast
Learn complex target functions
Don't lose information
Constantly changing/updated database

Disadvantages of Nearest Neighbour

Slow at query time
Easily fooled by irrelevant attributes

Aris Process

Representation
Retrieval
Graph Mapping
Transfer spec# statements
validate original code + new spec#

Compare and contrast mutation and crossover. Describe in detail the effects of one specific operator of each type. Use a diagram to illustrate your answer.

Crossover: The production of offspring is frequently based on taking the first N chroosomes from parent one, and the remainder from parent two. This produces a new individual with some genetic material from each parent.

Two-Point Crossover: You have two crossover points, so 3 strings of bits

Compare and contrast mutation and crossover. Describe in detail the effects of one specific operator of each type. Use a diagram to illustrate your answer.

Mutation: The rate of mutation is determined by a global constant/variable which determines the probability of a random alteration of one bit of genetic material of an individual (This helps avoid local minima)

Single-point mutation: flipping one bit

What is a GA (Genetic Algorithm)? What is it used for?

model of machine learning which derives it's behavior from a metaphor of some of the mechanisms of Evolution, in nature
used for a number of different application areas, like multidimensional optimisation and search problems
parallel, non-comprehensive search for the global maximum of the graph
GAs represent a generate-and-test beam-search through hypothesis space
search is not precise (no guarantee that the global maximum will be found)
however, the result should be a good approximation of the maximum value

Steps of GA (Genetic Algorithm)

Create a population of individuals randomly
Evaluate fitness of each individual in the population
Select Individuals
1. Reproduction (Rank, Tournament)
2. Crossover
3. Mutation
Repeat until a suitable fit individual is found

GA: Chromosome

Individuals within our universe represented by chromosomes - a set of character strings analogous to DNA
These chromosomes are used to represent the different parameters being optimised
in practice, we represent chromosomes by having arrays of bits

CS404 Artifical Intelligence (NUIM)

Lernkarteien erstellen oder kopieren

Lernkarteien erstellen oder kopieren

Melde dich an, um alle Karten zu sehen.

SWITCHaai

Office 365

Edulog

Apple ID

Google