Chapter Proposals Submission Deadline: 20/07/2010

Full Chapters Due: 30/10/2010


Learning structure and schemas from documents



A book edited by:

Dr Marenglen Biba, University of New York Tirana, Albania

Dr Fatos Xhafa, University of London (Birkbeck), United Kingdom

To be published in the “Studies in Computational Intelligence” book series, Springer (2011)




The rapidly growing volume of digital documents in various formats, together with the possibility of accessing them through internet-based technologies, has made it necessary to develop solid methods for properly organizing and structuring documents in large digital libraries and repositories. Because such volumes make manual organization impossible, and because most documents exist in unstructured form and follow no schema, most efforts in this direction are dedicated to automatically inferring structure and schemas that can help to better organize huge collections. This is essential if these documents are to be retrieved effectively and efficiently.


Dealing with unstructured information is a hot research topic. A growing body of work addresses the problem of recognizing structure and schemas in documents of various types. Some areas are mainly concerned with the visual representation of documents, and steady improvements are being made in pattern recognition and document layout analysis to classify documents according to the structure found in their layout. On the other hand, extensive research is being done in machine learning to exploit attributes of documents and relationships among documents in order to infer structure in large collections. Important work is also being performed in the data mining and knowledge discovery community, which has traditionally dealt with raw data but has recently turned its attention to learning structure from unstructured information. In addition, Semantic Web researchers are devoting significant effort to identifying structure and schemas in order to achieve ontology matching or alignment. Another related area is the database community, which has long worked on integration problems but has only recently started to consider automatic structure and schema learning as a potential approach to schema and database integration. Finally, information retrieval and extraction seek to infer structure and schemas from free text in order to build efficient information seeking models from large corpora.



The Overall Objective of the Book

The goal of this book is to present state-of-the-art methods for structure learning and schema inference. Most of the existing fields and technologies have long worked largely in isolation, even though the tasks they solve have much in common. This has stalled overall progress on the problem, even though separate fields independently improve their performance on specific datasets. The automatic inference of structure is central to all approaches to organizing documents; it has therefore become important to bring together researchers from different fields and identify common challenges in order to advance the state of the art in structure learning from documents. This will allow methods developed in one field to be exploited by researchers in related fields, who may benefit from novelties introduced elsewhere while working on the same problem of learning structure in documents.


The book recognizes that understanding the interactions between the various approaches is essential for developing synergies among research areas and, in turn, more robust methods that can attack the problem in a multi-strategic fashion. Thus the focus of this book is on:

  • Presenting state-of-the-art approaches to learning and inferring structure and schemas from documents
  • Assessing whether methods and techniques of one approach can be extended or exploited by the other approaches in a multi-strategic effort to solve the problem
  • Assisting in the identification of new and future challenges and opportunities that will raise the bar in the state of the art of several research areas attacking the same problem
  • Presenting case studies and best practices from real large-scale digital libraries, repositories and corpora


The Target Audience

Although contributions are welcome from both academia and industry practitioners and researchers, the audience of this book is those working in, or interested in joining, interdisciplinary and transdisciplinary work in the areas of data mining, machine learning, pattern recognition, document analysis and understanding, semantic web, databases, artificial intelligence and digital libraries, whose main focus is learning structure and schemas from unstructured information. The application areas are also very broad, and contributions are welcome from applied work in bioinformatics, web mining, text mining, information retrieval, real-world digital libraries, data warehouses and ontology building. Specifically, the intended audiences, broadly involved in the domains of computer science, web technologies, applied informatics, business or management information systems, are: (1) researchers or senior graduates working in academia; (2) academics, instructors and senior students in colleges and universities; and (3) business analysts from industries interested in data integration, information retrieval and enterprise search.



Chapters should be written in a manner readable for both specialists and non-specialists. Chapters may address issues related to past, present and future theories, methods, and practices of learning structure from documents. They should focus on next-generation paradigms, with particular (but not exclusive) attention to Structure Learning, Schema Integration, Schema Inference, Document Analysis and Recognition, Document Layout Analysis, Document Image Understanding, Data Mining, Data Annotation, Data Integration, Mining Unstructured Data, Learning Structure from Text, Web Mining, Text Mining, Document Databases and Digital Libraries, Database Integration, Data Warehouse Integration, Ontology Mapping, Ontology Merging, Ontology Alignment, Ontology Searching, Ontology Ranking, Ontology Evaluation, Information Retrieval, Information Extraction.


Recommended topic areas include, but are not limited to:

  • Critical Reviews on:
    • Theory and Strategies Fundamentals in Document Structure and Schema Learning
    • Next Generation Technologies for Document Structure and Schema Learning
  • Theory and Strategies Fundamentals in Document Structure and Schema Learning
    • Developing a unifying theory of structure and schema learning from documents
    • Document Image Analysis and Understanding, Document Analysis Systems  
    • Document Classification in Digital Libraries
    • Ontology modeling, reuse, extraction, evolution, mapping, merging, and alignment
    • Searching and ranking ontologies. Ontology evaluation
    • Learning Structure and Schemas from Textual Documents for Information Retrieval and Extraction
    • Retrieval Models and Ranking based on Structure and Schema Learning
    • Queries and Query Analysis based on Structure Learning
    • Recommender systems based on Structure and Schema Learning
    • Integrating Data Warehouses Through Schema Discovery
    • Discovering Structure and Schemas from Biological Data and Documents
    • Merging Biological Databases Through Structure and Schema Learning
    • Learning Structure and Schemas from Heterogeneous Domains
    • Discovering Structure and Schemas in Social and Biological Networks
    • Learning Structure in Graphs
    • Web document analysis (including wikis and blogs)
    • Community discovery and analysis in large-scale online or offline social networks
    • Structure learning for analysis of evolution of patterns and communities in the Web
  • Next Generation Technologies for Document Structure and Schema Learning
    • Artificial Intelligence, Pattern Recognition and Machine Learning for Document Analysis and Understanding
    • Symbolic, Statistical and Hybrid Learning Approaches for Structure and Schema Discovery
    • Machine Learning for the Semantic Web
    • Data Mining for document structure and schema learning
    • Distributed Computing for Learning Web Structure
    • Combinatorial Optimization in Structure and Schema Learning from Documents
    • Meta-heuristics for Structure and Schema Learning from Documents
    • Parallel Computing for Structure and Schema Learning from Documents
    • Learning structure and schemas from Stream, Spatial and Temporal Data
    • Capturing Drift in Time-Changing Documents and Data
    • High performance implementations of document analysis algorithms
    • Multi-agent systems for Document Processing and Structure Learning
    • Novel Document Representation Formalisms and Content Analysis
    • Multimedia analysis, indexing, retrieval by exploiting structure in images, videos, speech/audio, etc.
    • Large scale document structure and schema learning for real-world Web scenarios
    • Large scale digital libraries by exploiting structure and schema learning
    • Large scale social network analysis based on structure and schema learning
  • Applications and best practices in Document Structure and Schema Learning
    • Languages, Components, Programs, Knowledge Portals and/or Applications
    • User Community/Organizational Needs Response Developments in various settings including (but not limited to) real-world digital libraries, historical archives, document management systems, enterprise search, multimedia libraries, web search, business intelligence, economics, marketing, advertising, bioinformatics and biological networks, database and data warehouse systems, social networks, etc.
    • Performance, Scalability, Robustness, Verification, Validation, Benchmarking




Submission Information

Submission is possible only through invitation. Academics, researchers and practitioners are invited to submit, by 20 July 2010, a 2-page manuscript proposal detailing the background, motivations and structure of their proposed chapter. Authors of accepted proposals will be notified by 1 August 2010 and will be given instructions and guidelines for chapter preparation. Full chapters are due on 30 October 2010 and should be 8,000 words in length and/or between 25 and 30 pages long. The book is scheduled to be published in the “Studies in Computational Intelligence” book series, Springer. For information about the publisher and the book series, visit http://www.springer.com/series/7092. This publication is anticipated to be released in 2011.


Important Dates

20 July 2010:                      2-page Proposal Submission Deadline

1 August 2010:                    Notification of Proposal Acceptance

30 October 2010:               Full Chapter Submission (in Word or PDF)

15 December 2010:             Notification of Full Chapter Acceptance

30 January 2011:               Revised Chapter Submission

30 February 2011:             Final Notification of Acceptance

15 March 2011:                  Final Material Submission



Inquiries and submissions can be forwarded electronically (in Word or PDF) to:


Dr Marenglen Biba

University of New York Tirana, Albania

E-Mail: marenglenbiba@unyt.edu.al

URL: http://www.marenglenbiba.net



Prof. Fatos Xhafa

University of London (Birkbeck), United Kingdom

E-Mail: fatos@lsi.upc.edu

URL: http://www.lsi.upc.edu/~fatos/