CS404 Artifical Intelligence (NUIM)
Artificial Intelligence & Language Processing CS404 at National University of Ireland, Maynooth
Artificial Intelligence & Language Processing CS404 at National University of Ireland, Maynooth
Kartei Details
Karten | 43 |
---|---|
Sprache | English |
Kategorie | Informatik |
Stufe | Universität |
Erstellt / Aktualisiert | 20.05.2015 / 20.05.2015 |
Weblink |
https://card2brain.ch/box/cs404_artifical_intelligence_nuim
|
Einbinden |
<iframe src="https://card2brain.ch/box/cs404_artifical_intelligence_nuim/embed" width="780" height="150" scrolling="no" frameborder="0"></iframe>
|
Document Retrieval (= Abfrage)
Queries.. ?!
Identify the top quality documents in a database that relate to some query:
- Query Dependent: Evaluate document quality in relation to some given query
- Query Independent (Page Rank): Evaluate document quality before (& independent of) any query terms. All ranking is done before any query arrives
Measuring Retrieval Accuracy
Precision: the number of relevant documents retrieved, divided by the total number of documents retrieved. (=sind möglichst viele die ich bekomme relevant?)
\(Precision = \frac{\#RelevantRetrieved}{\#TotalDocuments}\)
Recall: the number of relevant documents retrieved divided by the total number of relevant documents in the long-term memory (=bekomme ich möglichst alle relevanten?)
\(Recall = \frac{\#RelevantRetrieved}{\#TotalRelevant}\)
Document Ranking & Retrieval (What is it for? Citations, Examples)
- For a database of Hyperlinked documents
- Citations confer quality
- Citations from highly referenced documents are even better
- "Citation Ranking" of academic documents:
- Google Scholar
- Citeseer
- Academic research microsoft
Random Walk
- Consider sombody randomly following links across the web (occasionally restarting at a random page)
- He will frequently visit pages with many in-coming links (Pages with no in-link only get visited if hey actually starts at that page)
- The summation of many random walks gives a likelihood of visiting that page \(\rightarrow\) This is the Google PageRank score!
- Rank is based on theoretical visitors - not the number of people who actually visit a webpage
PageRank (formula!)
The PageRank of a page is the chance that a random surfer will lond on that page
\(PR(p) = (1-d) + d \cdot \sum_{(q,p) \in E} \frac{PR(q)}{outdegree(q)}\)
where PR = PageRank, d = 0.85, outdegree(q) = number of out-links contained on page q
Iterative Calculation of PageRank
- Iterative Calculation of the true PageRank value: Begin by assuming all pages PR(x) = 0.15
- Random iterative application of PageRank rule gradually "converges" to the final solution
- Average Page Rank alue is 1, a few highly ranked pages, and vast numbers of poorly linked pages near 0.15
Document Indexing
- Documents are pre-ranked and pre-indexed by words
- Google contains a document-word matrix, listing all webpages that use each word
- When a query is supplied, Google finds the highest ranked page(s) that contain that term
tf-idf Term-Document Relevance
- tf-idf = term frequency - inverse document frequency
- The document-term index contains a relevance weight
- indicating the relevance of a term to that document (0<r<1)
- r large if term in: Title, bold, appears twice
- Query Score = PageRank-value * relevance (tf-idf)
- This allows 2 queries to return 2 pages, with "opposite" ranking
- without relevance factor, documents could only be returned in a pre-determined order
- The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.
What is a Cellular Automata?
- Simple computational machines
- "Bottom up" approach provided
- Behaviour "emerges" at higher leves: when looking at large clusters of cells
Components of a CA
1. states: e.g. 0 or 1
2. neighbourhood: the two adjacent (angrenzend) cells N C N
3. rules: depicts the new state of a cell for each (possible) local "configuration" of the cell and its two neighbours
CA - Game of Life - Rules (0 - 4)
0. Grid filled with "organisms", alive or dead
1. live cell < two live neighbours -> dies (loneliness)
2. live cell > three live neighbours -> dies (overcrowding)
3. live cell = two or three live neighbours -> lives
4. dead cell = three live neighbours -> comes to live
2D cellular automata
- The most popular CA format
- 2D grid
- usually a 3x3 neighbourhood grid (in/excluding the corner squares)
- Binary states (more states allowed in more complex CA)
Rabbits & Foxes
- Simulate competitive populations
- Simplification of producer/consumer
- If there's few foxes, the rabbits thrive (=gedeihen) and breed rapidly
- But then, the fox population soars (=steigt), consuming all rabbits
- Until the foxes start to starve
- Cells: Fox, rabbit, neutral
4 Types of CA - Name all types
- homogeneous state
- simple stable or periodic structures
- chaotic pattern
- long-lived, complx structures
4 Types of CA - Explain Class 1
Class 1: Homogeneous state:
Cells die back completely after few steps of evolution. No initial configuration can produces patterns of any length using these rules, it will always evolve to a homogeneous state. These are non-reversible, because once the fild is blank, no previous state information can be derived.
4 Typles of CA - Explain Class 2
Class 2: Simple stable or periodic structures.
Self-contained structures can exist that will not terminate in any number of time steps. Another way of putting this is that the information of a cell can only propagate a finite distance, and thus cannot affect distant structures - regardless of time. This acts almost as a "filter" on the initial configuration, choosing some features to propagate over infinite steps, and some to remove completely. For a 1dCA this may look like solid bars over time on the automaton, or train tracks, stair steps, etc.
Non-reversible just like class 1, since the fixed structures give no information about the initial state which was lost during evolution.
4 Types of CA - Explain Class 3
Class 3: Chaotic pattern
This is where cellular automata start to get interesting. The amount of current knowledge needed grows as one tries to predict further into the future - hence it is a chaotic system. As with all well-defined chaotic systems, a given state is completely reversible; the previous state (and all the ones before) can be predicted by examining the current state. non-deterministic - to find the value of a cell after an infinite amount of time, an infinite number of initial conditions must be considered. Behavior is chaotic, but not random.
4 Types of CA - Explain Class 4
Class 4: Long-lived, complex structures
These automata propagate cell information out at variable speeds, depending on the structures themselves. These are the only non-trivial, non-reversible automata; since the current cell values could have arisen from more than one previous configuration, current cell values cannot be extrapolated backwards to know previous configurations.
Finding the value of a given cell a certain numer of steps in the future is an intractable problem; except for special cases, it cannot be found without actually doing all the drawn-out computation. Conway's Game of Life is a class 4 cellular automaton.
Other Properties of CA
- Binary vs. continuous CA
- Binary, integer or Real cell states
- Synchronous vs. asynchronous CA
- Are all cells updated instantaneously, or one at a time (in a random order)
- Totalistic CA
- State depends only on total (average) of previous states (i.e. no left-to-right directionality)
- Reversible CA
- It's possible to return to previous states, runs forward and backward (in time)
- Regular vs. irregular CA
- Regular 2D grid vs. irregular tessellation of 2D plane.
Firing Squad Problem
- N riflemen (Schützen) standing next to each other in a line
- Each can speak to, or hear, his immediate neighbours
- Riflemaster issues command to first rifleman
- Can they agree a signal system so that all riflemen fire simultaneously?
One solution to the Firing Squad Problem using 15 states and 3n units of time.
Intelligent Agents (CA)
- Complex problems solved by interaction amongst (relatively) simple components (agents)
- society of mind
- Each agent performs some specific task, relatively autonomously
- Problem solved in "bottom up" manner
Q: How does an individual loaf of bread get from the baker to your table?
A: Individual agents (drivers) deliver quantities to shops. The shop-keeper (agent) stores bread for individual sale. You (agent) then buy a single loaf to deliver to your table!
Conclusion CA
- Cellular Automata are simple computational devices
- Cells, states, update rules
- "Intelligence" is an emergent property
- like "agent" systems
- CA can be seen as simple inteligent agents
Lazy vs. Eager Learning
- Lazy: wait for query before "generalizing"
- k-Neirest Neighbour, Case based Reasoning
- Eager: generalize before seeing query
- Radial basis function networks, ID3, Backpropagation, NaiveBayes,...
- Eager creates global approximation - Lazy create many local approximations
- If they use same H(ypothesis), lazy can represent more complex functions
- e.g. consider H = linear functions
KNN - Basics
- KNN is basically a two-part algorithm:
- Retrieve the K most similar recorded cases
- "Average" across these K cases
- Simplest case is where k=1
When to use nearest Neighbour?
- Instances map to points in Rn
- Less than 20 attributes per instance
- When you've lots of training data
Advantages of Nearest Neighbour
- "Training" is very fast
- Learn complex target functions
- Don't lose information
- Constantly changing/updated database
Disadvantages of Nearest Neighbour
- Slow at query time
- Easily fooled by irrelevant attributes
Aris Process
- Representation
- Retrieval
- Graph Mapping
- Transfer spec# statements
- validate original code + new spec#
Compare and contrast mutation and crossover. Describe in detail the effects of one specific operator of each type. Use a diagram to illustrate your answer.
Compare and contrast mutation and crossover. Describe in detail the effects of one specific operator of each type. Use a diagram to illustrate your answer.
What is a GA (Genetic Algorithm)? What is it used for?
- model of machine learning which derives it's behavior from a metaphor of some of the mechanisms of Evolution, in nature
- used for a number of different application areas, like multidimensional optimisation and search problems
- parallel, non-comprehensive search for the global maximum of the graph
- GAs represent a generate-and-test beam-search through hypothesis space
- search is not precise (no guarantee that the global maximum will be found)
- however, the result should be a good approximation of the maximum value
Steps of GA (Genetic Algorithm)
- Create a population of individuals randomly
- Evaluate fitness of each individual in the population
- Select Individuals
- Reproduction (Rank, Tournament)
- Crossover
- Mutation
- Repeat until a suitable fit individual is found
GA: Chromosome
- Individuals within our universe represented by chromosomes - a set of character strings analogous to DNA
- These chromosomes are used to represent the different parameters being optimised
- in practice, we represent chromosomes by having arrays of bits