| 2022 | A cascaded approach for page-object detection in scientific papers. Erika Spiteri Bailey, Alexandra Bonnici, Stefania Cristina |
| 2022 | Academic writing and publishing beyond documents. Cerstin Mahlow, Michael Piotrowski |
| 2022 | Anonymizing and obfuscating PDF content while preserving document structure. Charlotte Curtis |
| 2022 | Binarization of photographed documents image quality, processing time and size assessment. Rafael Dueire Lins, Rodrigo Barros Bernardino, Ricardo da Silva Barboza, Steven J. Simske |
| 2022 | Chinese public procurement document harvesting pipeline. Danrun Cao, Oussama Ahmia, Nicolas Béchet, Pierre-François Marteau |
| 2022 | Detecting malware using text documents extracted from spam email through machine learning. Luis Ángel Redondo-Gutierrez, Francisco Jáñez-Martino, Eduardo Fidalgo, Enrique Alegre, Víctor González-Castro, Rocío Alaíz-Rodríguez |
| 2022 | Downstream transformer generation of question-answer pairs with preprocessing and postprocessing pipelines. Cheng Zhang, Hao Zhang, Yicheng Sun, Jie Wang |
| 2022 | From print to online newspapers on small displays: a layout generation approach aimed at preserving entry points. Sebastián Gallardo Díaz, Dorian Mazauric, Pierre Kornprobst |
| 2022 | Graphical document representation for French newsletters analysis. Alexis Blandin, Farida Saïd, Jeanne Villaneau, Pierre-François Marteau |
| 2022 | How did Dennis Ritchie produce his PhD thesis?: a typographical mystery. David F. Brailsford, Brian W. Kernighan, William A. Ritchie |
| 2022 | Long-term lifecycle-related management of digital building documents: towards a holistic and standard-based concept for a technical and organizational solution in building authorities. Uwe M. Borghoff, Eberhard Pfeiffer, Peter Rödig |
| 2022 | Modifying PDF sewing patterns for use with projectors. Charlotte Curtis |
| 2022 | Optical character recognition guided image super resolution. Philipp Hildebrandt, Maximilian Schulze, Sarel Cohen, Vanja Doskoc, Raid Saabni, Tobias Friedrich |
| 2022 | Optical character recognition with transformers and CTC. Israel Campiotti, Roberto A. Lotufo |
| 2022 | Proceedings of the 22nd ACM Symposium on Document Engineering, DocEng 2022, San Jose, California, USA, September 20-23, 2022. Curtis Wigington, Matthew Hardy, Steven R. Bagley, Steven J. Simske |
| 2022 | Scholarly big data quality assessment: a case study of document linking and conflation with S2ORC. Jian Wu, Ryan Hiltabrand, Dominik Soós, C. Lee Giles |
| 2022 | SeNMFk-SPLIT: large corpora topic modeling by semantic non-negative matrix factorization with automatic model selection. Maksim Ekin Eren, Nick Solovyev, Manish Bhattarai, Kim Ø. Rasmussen, Charles Nicholas, Boian S. Alexandrov |
| 2022 | Tab this folder of documents: page stream segmentation of business documents. Thisanaporn Mungmeeprued, Yuxin Ma, Nisarg Mehta, Aldo Lipani |
| 2022 | Theory entity extraction for social and behavioral sciences papers using distant supervision. Xin Wei, Lamia Salsabil, Jian Wu |
| 2022 | Triplet transformer network for multi-label document classification. Johannes Werner Melsbach, Sven Stahlmann, Stefan Hirschmeier, Detlef Schoder |