InfoSpiders
Adaptive Distributed Information Agents
Motivation
- Goal: search the WWW for relevant documents
in a short time t (t << # documents)
- Problems with standard approaches:
- WWW is a large, dynamic, distributed database
- Off-line index (no relevance feedback)
- Exhaustive (abstract away neighborhood semantics)
- InfoSpiders for automating intelligent navigation
- Distributed algorithm
- On-line (query drives search)
- Adaptative (evolution, learning)
Back to overview
Model
- WWW as an ALife environment
- Random directed graph
- Document <=> node
- Hyperlink <=> edge
- Assumptions about local information: neighborhood semantics
- Node-node precision correlation (conditional probability)
- Edge-node estimation reliability (best-first heuristics)
- Bandwidth is cost
- Caching
Back to overview
InfoSpider algorithm
- Endogenous fitness
- Query defines fitness function
- Precision => relevance estimate => energy
- Interaction with environment determines survival
- Fixed death and reproduction thresholds
- Shared finite resources, carrying capacity
- Minimal communication (no normalization)
- For each alive spider do:
- Follow random link i from current document
with Prob(i)~exp(beta*L(i))
- If new document d is not in cache:
- Pay energy cost for incurred server access bandwith
- Receive energy payoff E~Prec(d)
- If new energy E < 0 destroy spider
- Else if E > Reproduction_Threshold
- Clone spider
- Split parent's energy with offspring
- Mutate offspring
Back to overview
Conclusions
- Improvement over exhaustive search if local information is available, using any of the following performance measures:
- Recall vs. (normalized) bandwidth
- Recall vs. time
- Precision vs. recall
- Advantages of InfoSpiders for IR on the WWW:
- Competition for resources intrinsically enforces balanced network load
- Exploiting information topology
- Minimal assumptions about search space
- Robustness of biological model
Back to overview
Credits
- Filippo Menczer, Rik Belew: Artificial life, adaptive
information retrieval
- Alvaro Monge, Wolfram Willuhn: Agent interfaces
- Christos Papadimitriou: Online algorithm performance
evaluation
Back to overview
Back to InfoSpiders