
Presentation

The digital library has become an increasingly common tool for everyone, a trend accentuated by the success of the Web and the easy access to every kind of information. Since not all documents are created specifically for inclusion in a digital library, we need cost-effective tools such as OCR and reverse engineering techniques. When the data is handwritten, a language-dependent recognition process is needed. Here again, the content determines the approach. Depending on the expected quality, these techniques need more or less adaptation and depth.

In the READ team, we tackle several problems related to one of the laboratory's major challenges, the Automatic Processing of Language and Knowledge (Traitement Automatique de la Langue et des Connaissances). Our approach emphasizes the digital library and pattern recognition aspects. In particular, it addresses writing capture, handwriting recognition, and reverse engineering of document content and structure.

Research activities

  1. Natural and bio-inspired systems
  2. Reasoning models for document structure extraction
  3. Document restoration and segmentation

Collaborations

  1. Afef Kacem of UTIC, Tunis
  2. The companies ITESOFT and SAGECOM
  3. Dave Doermann of the University of Maryland

Keywords

Machine learning, Incremental classifiers, Bayesian networks, Handwriting modeling, Character recognition, Metadata extraction