backend scanner
The scanner kicks off OcrMyPDF on documents placed into the corresponding directory.
flow description
The program roughly executes as follows:
- A database connection is established.
- The base class is instantiated to provide output and other supporting functionality.
- The program then goes into /data/scan (a local folder needs to be mapped to it via the docker run command) and looks for *.pdf files.
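
The directory scan in the last step could look roughly like the sketch below. The function name `find_pdfs` is hypothetical and not taken from the project; the sketch only assumes that PDFs sit directly inside the scan folder.

```python
from __future__ import annotations

from pathlib import Path


def find_pdfs(scan_dir: Path) -> list[Path]:
    """Return the PDF files directly inside scan_dir, sorted by name.

    Hypothetical helper for illustration; the real scanner may differ.
    """
    # glob("*.pdf") is non-recursive, so files that already sit in
    # subfolders such as error/ or archive/ are not picked up again.
    return sorted(p for p in scan_dir.glob("*.pdf") if p.is_file())
```

Called as `find_pdfs(Path("/data/scan"))`, this returns only the files waiting in the top level of the mapped folder.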
For each PDF found, the following flow is executed:
- A lock file (/tmp/ppyrdOcrMyPdf.txt) is checked and, if it does not exist, created, to ensure that we don't have concurrent OcrMyPDF processes working on the same large file.
- Tesseract is started. The output is written to /data/inbox
- We check if the output has been written.
- if not, the original (without OCR) is moved to /data/scan/error
- if found, the original is moved to /data/scan/archive
- If a lock exists, a simple message is printed stating that we wait ...
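
The per-file flow above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `process_pdf` and `run_ocr` are hypothetical, the lock is assumed to be removed once a file has been handled (the description does not say when it is released), and the OCR step is assumed to be the basic `ocrmypdf input.pdf output.pdf` command line call.

```python
import shutil
import subprocess
from pathlib import Path

LOCK_FILE = Path("/tmp/ppyrdOcrMyPdf.txt")


def run_ocr(src: Path, dst: Path) -> None:
    """Run OCRmyPDF on src and write the result to dst (assumed invocation)."""
    # check=False: a failed run is detected later by the missing output file.
    subprocess.run(["ocrmypdf", str(src), str(dst)], check=False)


def process_pdf(pdf: Path, inbox: Path, error_dir: Path, archive_dir: Path,
                lock_file: Path = LOCK_FILE, ocr=run_ocr) -> str:
    """Process one PDF; return "locked", "archived" or "error"."""
    try:
        # Creating the lock with exist_ok=False checks and sets it in one step.
        lock_file.touch(exist_ok=False)
    except FileExistsError:
        print(f"lock {lock_file} exists, waiting ...")
        return "locked"
    try:
        inbox.mkdir(parents=True, exist_ok=True)
        output = inbox / pdf.name
        ocr(pdf, output)
        # Output present: archive the original. Otherwise move it to error.
        succeeded = output.exists()
        target_dir = archive_dir if succeeded else error_dir
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(pdf), str(target_dir / pdf.name))
        return "archived" if succeeded else "error"
    finally:
        # Assumption: the lock is released after each file.
        lock_file.unlink()
```

Passing the OCR step in as the `ocr` parameter is a choice made for this sketch so that the flow can be exercised without OCRmyPDF installed. Creating the lock with `touch(exist_ok=False)` avoids the gap between a separate existence check and the creation of the file.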