Cognitive Data Capture on Statutes

The Challenge

The client is a team of experts in compliance, technology, risk and data management whose goal is to reduce financial crime and to help its own clients protect themselves from fraudsters, criminals, terrorists and money launderers.

To that end, the client offers a company participation reporting service. Given the RUT (the Chilean tax identification number) of a natural or legal person, it prepares a report listing every company that person is (or was) a part of, along with the person's percentage of participation in each.
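
As a rough illustration of the report described above, the following Python sketch filters participation records by RUT. The record fields (`rut`, `company`, `percentage`, `active`), the sample data and the function name `participation_report` are all hypothetical; the actual data model is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participation:
    """One hypothetical record: a person's stake in a company."""
    rut: str           # RUT of the natural or legal person
    company: str       # company name
    percentage: float  # share of participation, 0-100
    active: bool       # False if the person is no longer a partner


def participation_report(rut: str, records: list[Participation]) -> list[Participation]:
    """Return every company the given RUT is, or was, a part of, sorted by name."""
    matches = [r for r in records if r.rut == rut]
    return sorted(matches, key=lambda r: r.company)


# Made-up sample data, for illustration only.
RECORDS = [
    Participation("11.111.111-1", "Comercial Uno SpA", 60.0, True),
    Participation("22.222.222-2", "Comercial Uno SpA", 40.0, True),
    Participation("11.111.111-1", "Inversiones Dos Ltda.", 25.0, False),
]

if __name__ == "__main__":
    for row in participation_report("11.111.111-1", RECORDS):
        status = "current" if row.active else "former"
        print(f"{row.company}: {row.percentage:.1f}% ({status})")
```

Keeping former participations in the result (flagged through `active`) mirrors the "is (or was)" wording of the service description.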

The company has a team of 30 people who review the history of commercial statutes published in the Official Gazette of Chile and enter them manually into a web form designed for this purpose. The work is highly repetitive, error-prone because of the lexical complexity with which notaries write these documents, and extremely time-consuming, since it requires processing a history of 3 million commercial statutes.

Our Solution

The solution proposed by the Mootech team starts with the training of multiple machine learning models. This required manually labeling every entity in a set of 1,000 corporate bylaws.
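
To make the labeling step more concrete, here is a minimal Python sketch of what one labeled example could look like, using character-offset spans, a format commonly used for NER training data. The sentence, the label names (`PARTNER`, `RUT`, `PERCENTAGE`, `COMPANY`) and the helper `label_spans` are assumptions for illustration, not the annotation scheme actually used in the project.

```python
def label_spans(text: str, labelled_phrases: list[tuple[str, str]]) -> list[tuple[int, int, str]]:
    """Convert (phrase, label) pairs into sorted (start, end, label) character spans.

    Only the first occurrence of each phrase is labeled, which is enough
    for this illustration.
    """
    spans = []
    for phrase, label in labelled_phrases:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found in text: {phrase!r}")
        spans.append((start, start + len(phrase), label))
    return sorted(spans)


# Made-up fragment of a statute, for illustration only.
TEXT = "Juan Perez, RUT 11.111.111-1, contributes 60% of the capital of Comercial Uno SpA."

# Hypothetical label set.
ANNOTATIONS = [
    ("Juan Perez", "PARTNER"),
    ("11.111.111-1", "RUT"),
    ("60%", "PERCENTAGE"),
    ("Comercial Uno SpA", "COMPANY"),
]

if __name__ == "__main__":
    for start, end, label in label_spans(TEXT, ANNOTATIONS):
        print(f"{label:<10} {TEXT[start:end]!r}")
```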

The project was divided into 3 stages, one per type of document (statutes of creation, statutes of modification and statutes of dissolution), so that our data science team could concentrate on specific models.

The development team implemented a web application that presents users with the corporate bylaws whose detected entities did not pass the automatic validations. The user then reviews those entities (updating them if necessary), which ensures that erroneous data is not inserted into the database.
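
The specific validation rules are not listed here, so the following Python sketch only illustrates the idea of routing a document to manual review. It assumes two hypothetical rules: every extracted RUT must carry a correct check digit (the standard modulo-11 calculation), and the partners' percentages must add up to 100. The function names and the dictionary layout are illustrative assumptions.

```python
def rut_check_digit(number: int) -> str:
    """Compute the modulo-11 check digit for the numeric part of a RUT."""
    total = 0
    factor = 2
    while number > 0:
        total += (number % 10) * factor  # weight digits right to left
        number //= 10
        factor = 2 if factor == 7 else factor + 1  # weights cycle 2..7
    result = 11 - (total % 11)
    if result == 11:
        return "0"
    if result == 10:
        return "K"
    return str(result)


def is_valid_rut(rut: str) -> bool:
    """Check a RUT written like '12.345.678-5' (dots and hyphen optional)."""
    cleaned = rut.replace(".", "").replace("-", "").upper()
    body, digit = cleaned[:-1], cleaned[-1:]
    return body.isdigit() and rut_check_digit(int(body)) == digit


def needs_manual_review(partners: list[dict]) -> bool:
    """Return True if any hypothetical validation rule fails.

    Each partner is a dict with 'rut' (str) and 'percentage' (float).
    """
    if not partners:
        return True
    if any(not is_valid_rut(p["rut"]) for p in partners):
        return True
    total = sum(p["percentage"] for p in partners)
    return abs(total - 100.0) > 0.01  # small tolerance for rounding
```

For example, a document whose extracted partners hold 60% and 30% would be sent to manual review under these assumed rules, because the percentages do not add up to 100.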

Results

Thanks to the model implemented by Mototech, the client was able to process the history of 3 million corporate statutes in 3 months of execution. 70% of the documents were approved automatically, without requiring manual intervention.

The remaining 30% of the documents went through the manual process, where validating or updating a document, with its recognized entities displayed, took an average of 1 minute.
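
The total manual effort implied by these figures can be worked out with a short calculation. The Python sketch below uses only the numbers quoted in this section (3 million documents, 30% reviewed manually, 1 minute each); the function name is illustrative.

```python
TOTAL_DOCUMENTS = 3_000_000   # history of corporate statutes
MANUAL_SHARE_PERCENT = 30     # share that needed manual review
MINUTES_PER_DOCUMENT = 1      # average validation or update time


def manual_review_hours(total_documents: int, manual_share_percent: int,
                        minutes_per_document: float) -> float:
    """Return the total hours of manual review implied by the inputs."""
    manual_documents = total_documents * manual_share_percent // 100
    return manual_documents * minutes_per_document / 60


if __name__ == "__main__":
    hours = manual_review_hours(TOTAL_DOCUMENTS, MANUAL_SHARE_PERCENT, MINUTES_PER_DOCUMENT)
    print(f"{hours:,.0f} hours of manual review")
```

With these figures, about 900,000 documents needed review, for roughly 15,000 hours of manual work in total. How that effort was distributed across reviewers is not stated.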

Technologies

Algorithms: Transformers, Embeddings, TL-GAN, GAN-based noise, YOLO, Faster R-CNN, NER (Named entity recognition), LSTM
Libraries: TensorFlow, AsanteOCR, Camelot, QR detection, OpenCV, spaCy
Development: MEAN stack (MongoDB, Express, Angular, Node.js), FastAPI