We address the difficulty of creating a digitised corpus of comic books through crowdsourced annotation. The resulting XML-based encodings serve researchers, publishers and collection curators alike. To collect this data, we develop an online crowdsourcing engine for annotating comics. The tasks are designed to mirror the page-reading experience: participants are asked to identify and annotate structural elements (panel layout, splash pages, meta-panels) and content elements (characters, places, events, onomatopoeia) of comic books.
Our approach provides Digital Humanities (DH) scholars with a structured, annotated corpus, which is currently missing; this enables and accelerates research on comics and sequential art theory.
Curators and collectors of physical or online comics collections are provided with structured content that could enable the creation of artefacts such as comic book dictionaries, search indices and dictionaries of onomatopoeia.
From a publishing perspective, current standards for digital comics address only the presentation layer (i.e. rendering a publication on the screen of a device). However, the artistic nature of comics and the potential that digital comics have already showcased allow us to go beyond simple content presentation. In this respect, we contribute enhancements to current open standards for semantics (CBML) and presentation (EPUB) that will allow publishers and digital comics authors to create an improved reading experience.