PhD Seminar Course on

Visual Content-Based and Semantic Concept-Based
Multimedia Indexing and Search

Cagliari, Sept. 20 -- Sept. 22, 2010

This activity was made possible by the "Visiting Professors 2010" program of the University of Cagliari, sponsored by the Autonomous Region of Sardinia.


Dr. Apostol (Paul) Natsev     
IBM Watson Research Center, Hawthorne, NY


8 hours, Sept. 20 -- Sept. 22, 2010


Lecture 1 (4 hours):    Monday,         09:00 --- 13:00,     Sept. 20

Lecture 2 (4 hours):    Wednesday,    09:00 --- 13:00,     Sept. 22


Classroom I


Social media creation and use have skyrocketed in recent years and have become an indelible part of our lives -- from the way we entertain and inform ourselves to the way we communicate, socialize, or learn. More than 24 hours of new video content are uploaded to YouTube each minute, and over 100M new photos are uploaded to Facebook every day. With the tremendous growth of online multimedia content come great opportunities but also great expectations and challenges. Users expect images and video to be searchable as easily as text, but technology has unfortunately not kept pace.

In this short course, I will present the current state of the art in multimedia search and review current approaches to visual content-based as well as semantic indexing and search. Emphasis will be placed on a promising new direction of research, semantic concept-based retrieval, which aims to boost both the effectiveness and usability of multimedia search. I will describe techniques that leverage the computer's ability to analyze visual features of images and video effectively, and that apply statistical machine learning to classify and label visual scenes, objects, people, and activities automatically. I will also describe methods that leverage such automatically generated labels to improve the quality of multimedia indexing and search, as well as to enable new applications and content monetization models. These approaches will be presented and demonstrated in the context of a state-of-the-art multimedia analysis and retrieval system developed at IBM Research (

Here is a preliminary synopsis and list of topics to be covered (still subject to change, but not by much):

  1. Introduction and motivation
    1. Opportunities of multimedia analysis and retrieval
    2. Challenges and basic problems
    3. Example applications and demos
  2. Content-based retrieval
    1. Global visual features
      • Color spaces and color features
      • Color feature representations (color histograms, correlograms, moments)
      • Texture features (structural, statistical, spectral)
      • Edge and shape features
    2. Local visual features
      • Interest point detection
      • Local descriptor representation
      • Local point matching and spatial registration
    3. Similarity measures and evaluation metrics
    4. Video segmentation and matching
    5. Advanced techniques of content-based retrieval
    6. Video fingerprinting and near-duplicate detection
  3. Semantic concept-based retrieval
    1. Definitions and motivation
    2. Semantic concept vocabulary design
    3. Semantic concept modeling and extraction
    4. Multi-modal fusion and semantic context exploitation
    5. Retrieval by semantic concepts
    6. Concept-based query expansion
  4. Multi-modal video retrieval -- a case study
    1. Speech-based retrieval
    2. Visual content-based retrieval
    3. Semantic concept-based retrieval
    4. Query-dependent multi-modal fusion
    5. Performance evaluation (TRECVID)
    6. Demos and other applications


Prof. Giorgio Giacinto
Dep. of Electrical and Electronic Engineering
University of Cagliari, Italy
Email: giacinto[at]