InfoSpiders

Adaptive Distributed Information Agents

InfoSpider algorithm

Motivation

Goal: search the WWW for relevant documents in a short time t (t << # documents)
Problems with standard approaches:
- WWW is a large, dynamic, distributed database
- Off-line index (no relevance feedback)
- Exhaustive (abstract away neighborhood semantics)
InfoSpiders for automating intelligent navigation
- Distributed algorithm
- On-line (query drives search)
- Adaptative (evolution, learning)

Back to overview

Model

WWW as an ALife environment
- Random directed graph
- Document <=> node
- Hyperlink <=> edge
Assumptions about local information: neighborhood semantics
- Node-node precision correlation (conditional probability)
- Edge-node estimation reliability (best-first heuristics)
Bandwidth is cost
Caching

Back to overview

InfoSpider algorithm

Endogenous fitness
- Query defines fitness function
- Precision => relevance estimate => energy
- Interaction with environment determines survival
- Fixed death and reproduction thresholds
- Shared finite resources, carrying capacity
- Minimal communication (no normalization)
For each alive spider do:
1. Follow random link i from current document with Prob(i)~exp(beta*L(i))
2. If new document d is not in cache:
  1. Pay energy cost for incurred server access bandwith
  2. Receive energy payoff E~Prec(d)
3. If new energy E < 0 destroy spider
4. Else if E > Reproduction_Threshold
  1. Clone spider
  2. Split parent's energy with offspring
  3. Mutate offspring
Back to overview

Conclusions
- Improvement over exhaustive search if local information is available, using any of the following performance measures:
  - Recall vs. (normalized) bandwidth
  - Recall vs. time
  - Precision vs. recall
- Advantages of InfoSpiders for IR on the WWW:
  - Competition for resources intrinsically enforces balanced network load
  - Exploiting information topology
  - Minimal assumptions about search space
  - Robustness of biological model
Back to overview

Credits
- Filippo Menczer, Rik Belew: Artificial life, adaptive information retrieval
- Alvaro Monge, Wolfram Willuhn: Agent interfaces
- Christos Papadimitriou: Online algorithm performance evaluation
Back to overview
Back to InfoSpiders