WordMapper: Multilingual Text Alignment and Annotation

WordMapper is a multilingual text annotation tool developed in 2016 by the Academic Technology for FAS team. The tool was originally designed to support the creation of annotations within side-by-side Greek/Arabic text, but can be adapted to other types of web content. Details on the application and its development are below. For more information, or to pilot this new browser-based alignment and annotation tool, contact Academic Technology for FAS at atg@fas.harvard.edu.

Partnership and Development

Faculty Partner: Professor Mark Schiefsky, Chair, Department of the Classics

Developer: Artie Barrett, Senior Software Engineer, Academic Technology for FAS

Product Owner: Rebecca Miller, Instructional Technologist, Academic Technology for FAS

Synopsis

WordMapper is a web-accessible tool that enables students, faculty, and other collaborators to create non-contiguous word alignments between two or more texts, and then to build and share glossaries that aid in learning and research.

Motivations

The development of WordMapper by AT-FAS was sparked when Prof. Mark Schiefsky, Chair of the Harvard Department of the Classics, joined a meeting initially organized to demonstrate digital annotation tools to other members of the Classics Department. Prof. Schiefsky brought to the meeting a tool he had built himself to aid in his research. Although the tool was effective for his own work, Prof. Schiefsky envisioned a more accessible and collaborative version that could be reached with some modest development, and AT-FAS had the development expertise to help him realize that goal. Shortly thereafter, AT-FAS decided to take on WordMapper as a development project based on three criteria:

  1. Broad functionality: While initially designed with a specific use case in mind, namely integration with the Digital Corpus for Graeco-Arabic Studies, the tool could ultimately be used across a range of disciplines, such as language courses, comparative literature, or linguistics.

  2. Unique functionality: Chief among Prof. Schiefsky’s criteria was the ability to select, link, and annotate non-contiguous areas of text across multiple documents. We found no other annotation tool that could support these use cases, and WordMapper was designed to address these needs specifically.

  3. Effort: Prof. Schiefsky had already created a prototype of WordMapper, which was in active use by him and his students. This meant the tool would not need to be built from scratch, significantly decreasing the effort and resources required to begin work on the project.

The WordMapper project had two primary markers of success. The first was the implementation of a database server, which would enhance and streamline Prof. Schiefsky’s use of the tool for research by allowing him to save his data seamlessly and collaborate more easily with students and fellow researchers. The second was the potential for WordMapper to serve as a case study of a generalizable tool with broad application for faculty across disciplines, in both their research and their teaching.

Technical Development

The design and implementation of the tool were inspired by the Hypothes.is project, a web-based annotation tool that can overlay annotation functionality on websites and store annotations in a database. At the time of development, Hypothes.is did not support non-contiguous annotations across parallel texts in the way that Prof. Schiefsky’s tool did. The technical contribution of AT-FAS was to take Prof. Schiefsky’s prototype and generalize it as a bookmarklet with a client/server storage model.
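
To make the idea of a non-contiguous alignment concrete, the following is a minimal sketch of how such an alignment might be represented and passed between a client and a server as JSON. It is not taken from the WordMapper source: the names createAlignment, glossaryEntry, and sampleTexts, the record shape, and the English/French sample sentence are all assumptions made for illustration.

```javascript
// Illustrative data model only; all names here are hypothetical and are not
// taken from the WordMapper source.

// Sample parallel texts, tokenized into words (invented for this example).
const sampleTexts = {
  en: ["the", "cat", "does", "not", "sleep"],
  fr: ["le", "chat", "ne", "dort", "pas"],
};

// An alignment links word positions across two or more texts. The positions
// within one text need not be adjacent.
function createAlignment(words, comment = "") {
  const textIds = new Set(words.map((w) => w.textId));
  if (textIds.size < 2) {
    throw new Error("an alignment must link words from at least two texts");
  }
  return {
    words: words.map((w) => ({ textId: w.textId, index: w.index })),
    comment,
  };
}

// Resolve an alignment to the words it links, grouped by text.
// The result is one glossary entry.
function glossaryEntry(texts, alignment) {
  const entry = {};
  for (const { textId, index } of alignment.words) {
    if (!entry[textId]) entry[textId] = [];
    entry[textId].push(texts[textId][index]);
  }
  return entry;
}

// "does not" maps to "ne ... pas", which is split by the verb "dort".
const negation = createAlignment(
  [
    { textId: "en", index: 2 },
    { textId: "en", index: 3 },
    { textId: "fr", index: 2 },
    { textId: "fr", index: 4 },
  ],
  "negation"
);

// A client could send this JSON to a server, which stores and returns it.
const payload = JSON.stringify(negation);
const restored = JSON.parse(payload);

console.log(glossaryEntry(sampleTexts, restored));
// { en: [ 'does', 'not' ], fr: [ 'ne', 'pas' ] }
```

Storing word positions rather than character ranges is one possible design choice: it keeps each alignment a flat list of references, so a single record can link any number of separated words in any number of texts.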

The technical team followed an agile methodology to develop the tool, which meant short iterations with frequent opportunities for the developer, product owner, and faculty partner to demo and discuss progress. Prof. Schiefsky’s active involvement in the project’s development, together with the agreement by all team members on the definition of done, was a key factor in the success of the project. Early on, the project encountered some challenging issues with how to uniquely identify texts and words in a robust way across websites, and it was helpful to discuss the possibilities and tradeoffs with Prof. Schiefsky so that we could find an appropriate solution. The result of the development effort was an open source release of a client/server storage solution written entirely in JavaScript (Node.js + Express + Webpack + PostgreSQL) and hosted at wordmapper.fas.harvard.edu.
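
The identification problem mentioned above can be illustrated with a short sketch. It shows one possible approach, which is to derive a text key from the normalized content of the text and to identify each word by that key plus its position. This is an assumption made for illustration, not a description of the solution the team adopted, and the function names fnv1a, tokenize, textKey, and wordIds are hypothetical.

```javascript
// Hypothetical sketch: content-based identifiers for texts and words.
// This is not the scheme used by WordMapper; it only illustrates the tradeoff.

// Small non-cryptographic hash (an FNV-1a variant over UTF-16 code units).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  // Convert to an unsigned 32-bit value and print as 8 hex digits.
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Normalize Unicode and split on whitespace, so that layout differences
// between websites do not change the word list.
function tokenize(text) {
  return text.normalize("NFC").split(/\s+/).filter(Boolean);
}

// Identify a text by a hash of its normalized words rather than by its URL.
function textKey(text) {
  return fnv1a(tokenize(text).join(" "));
}

// Identify each word by the key of its text plus its position in that text.
function wordIds(text) {
  const key = textKey(text);
  return tokenize(text).map((word, index) => ({ id: `${key}:${index}`, word }));
}

// The same passage with different line breaks and spacing gets the same IDs.
const fromSiteA = wordIds("le chat ne dort pas");
const fromSiteB = wordIds("  le chat\n ne   dort pas ");
console.log(fromSiteA[1].id === fromSiteB[1].id); // true
```

The tradeoff in this sketch is that content-based keys survive a change of URL or page layout but change whenever the text itself is edited, whereas URL-based keys behave the opposite way.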