backend scanner
The scanner kicks off OcrMyPDF on documents placed into the corresponding directory.
flow description
The program roughly executes as follows.
- A database connection is established.
- The base class is instantiated to provide output and other supporting functionality.
- The program then goes into /data/scan (a local folder needs to be mapped via the docker run command) and looks for *.pdf files.
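The directory lookup in the last step could look roughly like this in Python. This is only a sketch under assumptions: find_pdfs is a hypothetical helper name, not a function taken from the scanner, and the default path is the /data/scan folder mentioned above.

```python
from pathlib import Path


def find_pdfs(scan_dir="/data/scan"):
    """Return the *.pdf files directly inside scan_dir, sorted by name.

    Hypothetical helper for illustration; not the scanner's actual code.
    """
    root = Path(scan_dir)
    if not root.is_dir():
        # The folder is missing, e.g. because it was not mapped into the container.
        return []
    # glob("*.pdf") is not recursive, so files in subfolders are not returned.
    return sorted(p for p in root.glob("*.pdf") if p.is_file())
```

Because the pattern is not recursive, files inside subfolders of the scan directory are not picked up again. On a case-sensitive file system, a file named with an upper-case .PDF extension would not match this pattern.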
For each PDF found, the following flow is executed:
- A lock is checked (/tmp/ppyrdOcrMyPdf.txt) and, if it does not exist, established to ensure that we don't have concurrent OcrMyPDF processes working on the same large file.
  - Tesseract is started. The output is written to /data/inbox
  - We check if the output has been written.
    - If not, the original (no OCR) is moved to /data/scan/error
    - If found, the original is moved to /data/scan/archive
- If a lock exists, a simple output is generated stating that we wait ...
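The per-file flow can be sketched in Python as follows. This is an illustration under stated assumptions, not the scanner's implementation: process_pdf and run_ocrmypdf are hypothetical names, the output is assumed to keep the original file name, the lock is assumed to be removed once a file is done, and the lock is created atomically with O_EXCL instead of a separate check followed by a create. The only external call is the ocrmypdf command line tool in its basic form (input file, output file).

```python
import os
import shutil
import subprocess
from pathlib import Path

LOCK_FILE = Path("/tmp/ppyrdOcrMyPdf.txt")  # lock path from the flow description


def run_ocrmypdf(src, dst):
    """Call the ocrmypdf CLI (assumed to be on PATH), which drives Tesseract."""
    subprocess.run(["ocrmypdf", str(src), str(dst)], check=False)


def process_pdf(pdf, scan_dir="/data/scan", inbox="/data/inbox",
                lock_file=LOCK_FILE, ocr=run_ocrmypdf):
    """Process one PDF and return 'archived', 'error', or 'waiting'."""
    pdf, scan_dir, inbox = Path(pdf), Path(scan_dir), Path(inbox)
    lock_file = Path(lock_file)
    try:
        # O_CREAT | O_EXCL fails if the file exists, so checking and
        # establishing the lock happen in one atomic step (an assumption).
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        print(f"Lock {lock_file} exists, waiting ...")
        return "waiting"
    os.close(fd)
    try:
        inbox.mkdir(parents=True, exist_ok=True)
        output = inbox / pdf.name  # assumed: output keeps the original name
        ocr(pdf, output)
        # Success is judged by whether the output file was actually written.
        status = "archived" if output.is_file() else "error"
        subfolder = "archive" if status == "archived" else "error"
        target_dir = scan_dir / subfolder
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(pdf), str(target_dir / pdf.name))
        return status
    finally:
        # Assumed: the lock is released after each file, even on failure.
        lock_file.unlink()
```

The ocr parameter exists only so the flow can be exercised without OcrMyPDF installed, for example by passing a function that writes a dummy output file. A caller that receives 'waiting' would typically retry the same file later.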