Backend scanner

The scanner kicks off OcrMyPDF for documents placed in the corresponding directory.

The program roughly executes as follows:

A database connection is established.
The base class is instantiated to provide output and other supporting functionality.
The program then goes into /data/scan (a local folder needs to be mapped to it via the docker run command) and looks for *.pdf files.
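
The lookup step could be sketched as follows. This is a minimal illustration, not the scanner's actual code: the function name find_pdfs is hypothetical, and the non-recursive search is an assumption made so that files already moved to subfolders of /data/scan are not picked up again.

```python
from pathlib import Path


def find_pdfs(scan_dir="/data/scan"):
    """Return the PDF files directly inside scan_dir, sorted by name.

    Assumption: only the top level is searched, so anything in
    subfolders such as archive/ or error/ is ignored.
    """
    return sorted(p for p in Path(scan_dir).glob("*.pdf") if p.is_file())
```

Note that the *.pdf pattern is matched case-sensitively on Linux, so a file named SCAN.PDF would not be found by this sketch.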

For each PDF found, the following flow is executed:

A lock (/tmp/ppyrdOcrMyPdf.txt) is checked and, if it does not exist, created to ensure that no concurrent OcrMyPDF processes work on the same large file.
Tesseract is started. The output is written to /data/inbox.
We check whether the output has been written.
If not, the original (without OCR) is moved to /data/scan/error.
If it is found, the original is moved to /data/scan/archive.
If a lock already exists, a simple message is printed stating that we wait ...
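
The per-file flow above might look roughly like the sketch below. It is an illustration under several assumptions, not the scanner's actual implementation: the names process_pdf and run_ocrmypdf are hypothetical, the output file is assumed to keep the name of the input file, the lock is assumed to be removed once a file has been handled, and the lock is created atomically with O_EXCL rather than with a separate check followed by a create. The OCR step is passed in as a function so that it can be replaced in tests.

```python
import os
import shutil
import subprocess
from pathlib import Path

LOCK_FILE = Path("/tmp/ppyrdOcrMyPdf.txt")
SCAN_DIR = Path("/data/scan")
INBOX_DIR = Path("/data/inbox")


def run_ocrmypdf(src, dst):
    """Call the ocrmypdf command line tool (input file, then output file)."""
    # check=False: a failed run is detected later by the missing output file.
    subprocess.run(["ocrmypdf", str(src), str(dst)], check=False)


def process_pdf(src, scan_dir=SCAN_DIR, inbox_dir=INBOX_DIR,
                lock_file=LOCK_FILE, run_ocr=run_ocrmypdf):
    """Process one PDF and return 'archived', 'error' or 'waiting'."""
    src = Path(src)
    scan_dir, inbox_dir, lock_file = Path(scan_dir), Path(inbox_dir), Path(lock_file)

    try:
        # O_CREAT | O_EXCL fails if the file exists, so checking and
        # creating the lock happen in one atomic step.
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        print(f"lock {lock_file} exists, waiting ...")
        return "waiting"
    os.close(fd)

    try:
        inbox_dir.mkdir(parents=True, exist_ok=True)
        output = inbox_dir / src.name  # assumption: same file name as the input
        run_ocr(src, output)

        # Success is judged only by whether the output file was written.
        outcome = "archived" if output.exists() else "error"
        target_dir = scan_dir / ("archive" if outcome == "archived" else "error")
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(target_dir / src.name))
        return outcome
    finally:
        # Assumption: the lock is released after each file, even on failure.
        lock_file.unlink()
```

A caller would invoke process_pdf once per file found in /data/scan and could retry later for any file that returned 'waiting'.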