Tamir Hassan

I am a researcher, developer and consultant in the field of Document Engineering and have over 10 years of experience working with PDF and HTML documents on topics including table recognition, automatic tagging, layout optimization and conversion between the two formats.

I was previously employed at HP Labs in the field of Automated Publishing, working on delivering documents for screen (desktop and mobile devices) and print. Prior to this, I worked in academia, where I co-organized the ICDAR 2013 Table Competition. I am located in Vienna, Austria.

This page contains a summary of my research activities and links to some of my open-source contributions in the field of document engineering.

You can email me at: web (at) tamirhassan.com


I am interested in several topics related to document engineering, such as automatic layout, document authoring, document analysis, information extraction and digital typography. Previously, I worked on semi-flexible layouts at the Zukunftskolleg, University of Konstanz, and on the Decapod project at IUPR, TU Kaiserslautern. Before that, I was at PRIP and DBAI, TU Wien. In Spring 2010, I spent three months working with Prof. Roger D. Hersch at EPF Lausanne on parametric representations of fonts.

I wrote my doctoral thesis at the Database and Artificial Intelligence Group at TU Wien under the supervision of Prof. Georg Gottlob. I have worked on methods for wrapping, i.e. supervised semi-automatic data extraction, from PDF files. Because PDF documents are not structured in the same way as HTML, this work draws on a number of techniques from the field of document analysis and understanding. I have recently worked on page segmentation, converting PDF to HTML (as an input filter to the Lixto Visual Wrapper) and table recognition in PDF files.

In 2009, I worked on a novel approach for wrapping documents using visual extraction patterns. This approach represents the document as an attributed relational graph and uses error-tolerant graph matching techniques to locate the desired wrapping instances. A prototype of this system was presented at CeBIT at the stand of the Austrian Computer Society, and a trial version is now available for download. For more information, please see the page on GraphWrap.

My first degree is an M.Eng. (Hons) in computer science, obtained from the University of Warwick in 2004.


Here is a selection of recent publications which I have authored or co-authored:

A list of my publications on DBLP is available here.

More about me

In addition to my current field of research, I am also interested in a number of other areas in applied computer science. I have written about two of these areas below:

I have long had a love for typography and am fascinated by the multi-disciplinary aspect of computer science, particularly its application to the arts, as well as human-oriented issues such as HCI. More generally, I love work which requires great attention to detail.

In my free time, I enjoy photography (in particular architecture) and sing in a choir.

Previous work

My PDF-to-HTML converter, created as part of my studies at the University of Warwick, can be found here.