PhD Seminar Course on

Adaptive and Versatile Document Understanding

Cagliari, July 15-17, 2008



Instructor: Prof. Henry Baird
CSE Dept., Lehigh University, Bethlehem, Pennsylvania
Co-director of the Lehigh Pattern Recognition Research Laboratory. Co-founder of the Document Analysis Systems, Document Image Analysis for Libraries, and Human Interactive Proofs workshop series. Fellow of the IEEE and the IAPR.
Duration: 8 hours
Schedule: Tuesday, July 15, 15:00-18:00
Wednesday, July 16, 10:00-13:00
Thursday, July 17, 10:00-12:00
Venue: Aula Y
Topics:
  1. Document Image Understanding for Digital Libraries:
    Web-based digital libraries: rapid expansion worldwide
    Automatic document image understanding state of the art
  2. Document Content Extraction:
    Highly versatile document processing
    Pixel-accurate segmentation and iterated classification
  3. Combination of Classifiers:
    There is no dominant classifier
    Combinations can be better; stochastic discrimination
  4. Contextual Analysis as an Aid to Recognition:
    Geometric, linguistic, and semantic context
    The promise of computable semantics: reading Chess
  5. Adaptive Recognition Systems I:
    Self-retraining classifiers
    Autonomously adaptive classifiers
  6. Adaptive Recognition Systems II:
    Exploiting isogeny (internal consistency) of documents
    Towards whole-book recognition: anytime classifiers
  7. Human Interactive Proofs I:
    Survey of CAPTCHAs and other HIPs
    The arms race: segmentation, occlusion, and linguistics
  8. Human Interactive Proofs II:
    Research frontier: exploiting style consistency and linguistic familiarity
    Future directions: implicit CAPTCHAs, non-textual challenges
Assessment:
  1. Read journal article(s) in the document understanding area, write a 1-page summary, and give a 5-minute oral description; or
  2. Produce a conceptual design of a CAPTCHA.
Organizer: Fabio Roli
Dept. of Electrical and Electronic Engineering
University of Cagliari, Italy
Email: roli@diee.unica.it