Brazilian Court
1 Introduction
Quantitative methods are used for problem-solving in several areas, such as
Economics, which applies econometrics to evaluate theoretical methods. In the
law field, [1] coined the term "jurimetrics", which represents the union of
legal theory, computational methods and statistics, with the goal of exploring
jurisprudence and producing descriptive analyses and predictive studies [2].
Jurimetrics has also become the focus of rising interest among Brazilian Law
researchers. The authors of [3] manually analyzed 1044 sentences, looking for
real data on legal certainty in how judicial authorities establish values for
moral damages. In addition, it was very difficult to access the immense
quantity of decisions that deal with the estimation of moral damages.
Jurists therefore face, on top of their daily problems, an even bigger issue:
extracting valuable information effectively from the decisions of Brazilian
judicial bodies. This is partly because the courts' computer systems were
developed in the 1990s, on an infrastructure that predates the concepts of
"Big Data" and "Cloud Computing". The needs of that time were different and
did not require large storage or processing capacity, since most court
documents were still written on paper.
In the meantime, new technological solutions are being implemented with the
aim of increasing judicial efficiency. In 2011 the National Justice Council
began implementing the Electronic Judicial Process, which allows legal cases
to be conducted digitally in a computer system [4]. Currently, more than 8
million cases are pending in this system, and more than 100 million cases,
partially digital, are in progress in the courts [4]. The bodies of digital
documents already kept therefore demand a significant amount of storage, on
the order of terabytes, not counting all the new documents and judicial
decisions published daily.
However, little has been invested in new tools for the exploration,
visualization, and analysis of this publicly accessible corpus. In fact, the
tools remain the same, and effort goes only into sustaining legacy systems.
These computer systems were developed by each Court to allow searches in its
own base of judgments. They usually offer several keyword search fields and a
selection of the judging section, and they return a list with thousands of
pages. Users must repeat their query several times, making small changes to
the query criteria and searching for answers according to new items of
interest [5]. Even with these solutions, the issue of visualization remains,
since each result is presented as a text block of approximately 6 lines
containing the decision summary. Law firms must read a large number of results
in order to find the relevant items.
In this context, the business problem is how to know with confidence the
judging tendency of the Court that will judge a given case. It is not
advisable to make a decision based only on a subjective opinion formed by
work experience. In this situation, the legal director seeks as much evidence
as possible to support the decision to accept or reject, for example, a
million-dollar agreement. To conduct this evidence-based search, lawyers spend
hours on case-by-case research to substantiate their claims, using the online
search tools provided by the courts, which are outdated, disorganized,
confusing and superficial.
To address this, a methodology was applied to process thousands of judicial
decisions of the Regional Labor Court of the 3rd Region, located in the State
of Minas Gerais, Brazil. We used text mining technologies for the extraction
and processing of documents, as well as natural language processing, in order
to build a representative model for automatic classification of documents.
This model was trained by supervised learning, using annotated judicial
decisions to infer the most important features for document classification.
The classifier achieved more than 86% accuracy.
This paper presents a study that aims to contribute to the development of
software that allows law firms to obtain information more efficiently and in
less time, so that they can focus their efforts on tasks of greater
intellectual demand, such as strategically evaluating the best way to bring a
claim to the Court. Such software can also help higher courts to maintain
greater control over the application of jurisprudence by lower courts, in
addition to reducing the number of searches made by law firms in their
databases.
This paper is organized as follows: the next section describes the background
needed to better understand the context of the paper.
2 Background
Big Data is a term used to describe large volumes of content, usually
measured in terabytes or petabytes, that companies want to track and analyze
[6]. Unstructured data is the largest component of this set, and it is only
partially archived [7].
[13]. Thus, a labor lawsuit begins at the 1st Degree, where it is decided by
a single judge. The parties that do not agree with the decision can appeal to
the Courts, also called the 2nd Degree of jurisdiction.
The Regional Labor Courts are composed of several judges, who are organized
in Appeal Teams of 3 judges. When an appeal is forwarded to the Court, it is
randomly distributed to one of the Appeal Teams to be judged collegially. The
Regional Labor Court of the 3rd Region, for instance, consists of 10 Appeal
Teams.
In addition, a judgment consists of the votes of the members of the Class
that received the case and of the ruling containing the collegiate decision.
It also includes a summary of the matter and the conclusion of the judgment
[14]. The summary is composed of keywords that facilitate jurisprudential
research, as well as terms contained in the legal thesaurus.
3 Related work
Several studies agree that keyword search has several drawbacks, cannot cope
with the explosion of legal documents in the digital age, and is
technologically insufficient [17], [5], [15] and [26].
Another area of study with increasing focus is legal question answering [19]
and [20]. The goal is to train a search robot, embedded in a conversational
system and backed by a legal database, so that it can answer questions
formulated in natural language.
In the Brazilian context, [21] apud [5] presented a proposal for indexing
jurisprudential documents using Case-based Reasoning (CBR). The author coined
the term Intelligent Jurisprudence Research (IJR) to name the case retrieval
process, and the reported results were superior to those obtained by
traditional methods.
Another interesting work was presented in [22], where the author compared
the quality of automatic classification with that of the manual
classification already in use, based on an existing ontology.
Currently, law firms and jurists perform jurisprudential searches through
Google and on the websites of each court. In addition, in the last few years
other sites have appeared that aggregate judicial decisions and make them
available through their own search tools.
All specialized sites provide keyword searching, enable the use of logical
operators, allow the selection of judging bodies, and order results by
relevance and by date. Regarding the ordering of results, little information
is given about the algorithm, that is, the criterion used to rank the
documents. In addition, none of them presents the data through graphs and
views, nor do they provide visual aids to facilitate understanding of the
documents or resources for data mining.
The search tools provided by public agencies are technologically outdated.
Besides, they present disorganized and confusing outcomes, since queries with
identical keywords issued from different computers produce different results.
Table 1 presents the systems comparison.
Website                 Keyword  Logical    Word      Ontology  Presentation  Exploration  Protection against
                                 operators  semantic            of the        with visual  search robot
                                            root                winning       aids         (Captcha)
                                                                party
Google                  Yes      Partial    Yes       No        No            No           Not detected
Digesto                 Yes      Yes        Partial   No        No            No           Not detected
JusBrasil               Yes      Yes        No        No        No            No           Not detected
Court of the 3rd Region Yes      No         Yes       No        No            No           Yes, after a few attempts
Court of the 4th Region Yes      No         Yes       No        No            No           Yes, after a few attempts
These tasks were needed in order to reduce the insertion of errors in the
Machine Learning classifier. Since the algorithm builds its prediction model
from the words found in the documents, it is important to guarantee that the
training dataset adequately represents the classes to be predicted.
Moreover, the extracted documents contain the decision and its grounds, as
well as the names of the applicants and defendants. However, the proposed
classifier must report the winning party in terms of company or employee.
Thus, the mining result must show whether the applicant is a company or an
employee and whether the appeal was granted. We removed the documents in which
both employee and company appealed. The experts then manually annotated 600
documents from three different Classes of Appeal with the type of applicant
and with whether the judgment was granted to the applicant.
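As a minimal sketch of the annotation scheme described above, the following Python fragment models one annotated decision and the removal of documents in which both parties appealed. The class name, field names and label strings are hypothetical and are not taken from the original tooling.

```python
from dataclasses import dataclass


# Hypothetical record for one manually annotated decision (names are illustrative).
@dataclass(frozen=True)
class AnnotatedDecision:
    text: str
    appellants: frozenset  # subset of {"employee", "company"}
    granted: bool          # True if the appeal was granted to the applicant


def single_appellant_only(decisions):
    """Drop decisions in which both the employee and the company appealed."""
    return [d for d in decisions if len(d.appellants) == 1]


def to_labels(decision):
    """Return the two annotated labels: applicant type and outcome."""
    (appellant,) = decision.appellants  # exactly one appellant is expected here
    return appellant, "granted" if decision.granted else "denied"
```

Keeping the two labels separate mirrors the two classification steps of the text: one model for the type of applicant and one for the outcome of the appeal.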
Secondly, a model was trained on the resulting dataset to classify the
applicant as employee or employer, with features extracted through the TF-IDF
index and classified with Bayesian Networks. This model reached 92% accuracy.
In this phase, the dataset was preprocessed by removing Portuguese stop-words,
and all training runs reserved 1/3 of the dataset for testing.
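The pipeline just described (stop-word removal, TF-IDF features, a Bayesian classifier, 1/3 of the data held out) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: a multinomial naive Bayes model stands in for the Bayesian Networks of the paper, the stop-word list is a tiny illustrative subset, tokenization is a plain whitespace split, and all function names and toy documents are invented.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption; a real Portuguese list is much longer).
STOP_WORDS = {"a", "o", "de", "da", "do", "e", "que", "em", "para"}


def tokenize(text):
    """Lowercase, split on whitespace and drop stop-words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def split_train_test(items):
    """Hold out every third item for testing (1/3 test, 2/3 training)."""
    train = [x for i, x in enumerate(items) if i % 3 != 2]
    return train, items[2::3]


def fit_idf(docs):
    """Smoothed inverse document frequency of every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights of one document; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def train_nb(docs, labels, idf):
    """Multinomial naive Bayes over TF-IDF weights with Laplace smoothing."""
    weight = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        weight[label].update(tfidf(doc, idf))
    prior = Counter(labels)
    model = {}
    for label, w in weight.items():
        total = sum(w.values()) + len(idf)
        log_like = {t: math.log((w[t] + 1) / total) for t in idf}
        model[label] = (math.log(prior[label] / len(labels)), log_like)
    return model


def predict(doc, model, idf):
    """Return the label with the highest log-posterior score."""
    feats = tfidf(doc, idf)

    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(v * log_like[t] for t, v in feats.items())

    return max(model, key=score)


# Toy, invented training data (not from the paper's corpus).
train_docs = [
    "recurso do empregado sobre horas extras",
    "empregado pede verbas rescisorias",
    "recurso da empresa sobre multa",
    "empresa contesta condenacao",
]
train_labels = ["employee", "employee", "company", "company"]
idf = fit_idf(train_docs)
model = train_nb(train_docs, train_labels, idf)
print(predict("empregado pede horas extras", model, idf))
```

In practice a library implementation of TF-IDF and of the Bayesian classifier would replace these hand-written functions; the sketch only makes the data flow explicit.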
Finally, the decisions were processed with features extracted through the
TF-IDF index and Bayesian Networks to identify whether the decision was
granted to the applicant. This model reached 90% accuracy. In addition, the
model was trained individually on each Appeal Class and on all of them
together, in order to check for possible overfitting. None was identified,
since the cross-tests among classes presented practically the same results,
with a variation of approximately 3%.
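The cross-tests among classes can be sketched as a loop that trains on the documents of one Class and evaluates on every other Class. The function names and the `fit`/`predict` callables below are assumptions made for illustration; any classifier with that shape could be plugged in.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the annotated labels."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


def cross_class_accuracy(data_by_class, fit, predict):
    """Train on each Class and test on every other Class.

    data_by_class maps a Class name to a (documents, labels) pair.
    fit(documents, labels) returns a model; predict(model, document) a label.
    """
    scores = {}
    for train_name, (train_docs, train_labels) in data_by_class.items():
        model = fit(train_docs, train_labels)
        for test_name, (test_docs, test_labels) in data_by_class.items():
            if test_name != train_name:
                predicted = [predict(model, d) for d in test_docs]
                scores[(train_name, test_name)] = accuracy(predicted, test_labels)
    return scores


def accuracy_spread(scores):
    """Gap between the best and the worst cross-test accuracy."""
    return max(scores.values()) - min(scores.values())
```

A small spread across the train/test pairs is the kind of evidence the text relies on to argue that the model did not overfit to a single Class.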
6 Results
More than 10,000 judgments published in 2017 were collected from the ten
Classes of Judges that compose the Regional Labor Court of the 3rd Region and
processed with the proposed Machine Learning models, which classified the type
of claimant as a company or employee, and each decision as granting or denying
the appeal. With this dataset it was possible to determine the court's current
view of the legal issues brought to its judgment. This view presents the
proportion of granted requests in relation to the number of appeals filed by
each party.
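The proportion described above is a simple ratio per party: appeals granted divided by appeals filed. A minimal sketch, assuming each classified decision is reduced to a hypothetical (appellant, granted) pair:

```python
from collections import Counter


def grant_rates(decisions):
    """Share of granted appeals per appellant type.

    decisions is an iterable of (appellant, granted) pairs, where granted is
    True when the appeal was totally or partially granted.
    """
    filed = Counter()
    granted = Counter()
    for appellant, was_granted in decisions:
        filed[appellant] += 1
        granted[appellant] += int(was_granted)
    return {appellant: granted[appellant] / filed[appellant] for appellant in filed}
```

For example, a party with 61 granted appeals out of 100 filed would get a rate of 0.61.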
As shown in figure 2, 61% of the appeals filed by the claimants were totally
or partially granted, and 58% of the appeals filed by the defendants were
totally or partially granted. We observed that the Court treated the causes of
both parties in approximately the same way, contrary to the popular belief
that "The Labor Court has always leaned more towards the worker side" [24] and
[25].
We also observed that the judging tendency of some individual Classes differs
from the Court's general average. For example, figure 3 shows the results from
two different classes of judges. We can see that the 9th Class (left) granted
30% more appeals to employees than to companies, while the 1st Class (right)
granted 18% more appeals to companies than to employees. These results
corroborate the idea that Brazilian courts differ in whether they tend to
favor employees or employers.
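The text does not state whether figures such as "30% more" are differences in percentage points or relative increases. The hedged sketch below, with a hypothetical function name, computes both readings from the two grant rates of a single Class so that the comparison is explicit.

```python
def tendency_gap(employee_rate, company_rate):
    """Compare the grant rates of one judging Class.

    Returns (points, relative): the gap in percentage points and the relative
    increase in percent of the employee rate over the company rate.
    """
    if company_rate == 0:
        raise ValueError("company_rate must be non-zero for a relative comparison")
    points = (employee_rate - company_rate) * 100
    relative = (employee_rate / company_rate - 1) * 100
    return points, relative
```

A positive result indicates a tendency towards employees and a negative one a tendency towards companies, under either reading.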
We developed a case law analysis project that uses a Machine Learning
technique to present relevant information about the winners of the causes
without the need to manually read the entire documents. Through supervised
learning it was possible to obtain high accuracy in the classification of
documents.
Fig. 2. Comparison of the results of the Regional Labor Court of the 3rd Region
Fig. 3. Comparison of the results of the Regional Labor Courts of the 3rd Region
Acknowledgement
References
1. Loevinger, Lee: Jurimetrics: The Next Step Forward. Minn. L. Rev. 33. HeinOnline:
455. 1948.
2. Jaeger Zabala, Filipe; Silveira, Fabiano Feijo: Jurimetria: Estatística Aplicada ao
Direito / Jurimetrics: Statistics Applied in the Law. Revista Direito e Liberdade,
v. 16, n. 1, p. 87-103, 2014.
3. Hirata, Alessandro et al.: Dano moral no Brasil. Pensando o Direito, volume 37,
Secretaria de Assuntos Legislativos do Ministerio da Justica, 2010
4. Manuel Carlos Montenegro: PJe atinge a marca de 7,4 mi de processos judiciais
- Portal CNJ. 2016. Available in: http://www.cnj.jus.br/noticias/cnj/81864-pje-
atinge-a-marca-de-7-4-mi-de-processos-judiciais. Access in: 09/12/2017.
5. Constancio, Alex Sebastiao: Ontologia para um Motor de Busca Semantica para
Recuperacao Jurisprudencial no Brasil. Master's thesis. 2017.
6. Akerkar, Rajendra: Big Data Computing. 2013.
7. Gandomi, Amir, and Murtaza Haider: 2015. Beyond the Hype: Big Data Concepts,
Methods, and Analytics. International Journal of Information Management 35 (2).
Elsevier: 137-144.
8. Erl, Thomas, Wajid Khattak, and Paul Buhler: 2016. Big Data Fundamentals: Con-
cepts, Drivers & Techniques. Prentice Hall Press.
9. Pierson, Lillian: 2015. Data Science for Dummies. John Wiley & Sons.
10. Chapman P., Clinton J: 2000. CRISP-DM 1.0: Step-by-Step Data Mining Guide.
CRISP.
11. Becker, Karin: 2017. Slides de aula Processo de KDD. UFRGS.
12. Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze: 2008.
Introduction to Information Retrieval. An Introduction To Information Retrieval 151:
177.
13. TSE - Tribunal Superior do Trabalho. Perguntas frequentes - Pesquisa de jurispru-
dencia. Available in: http://www.tse.jus.br. Access in: 09/12/2017.
14. STF - Supremo Tribunal Federal: Glossario Juridico :: STF - Supremo Tri-
bunal Federal. [s.d.]. Available in: http://www.stf.jus.br/portal/glossario/. Access
in: 09/12/2017
15. Borden, Bennett B, and Jason R Baron: 2014. Finding the Signal in the Noise:
Information Governance, Analytics, and the Future of Legal Practice. Rich. JL &
Tech. 20. TC Williams School of Law University of Richmond. Richmond Journal
of Law & Technology: 714.
16. Koniaris, Marios, Ioannis Anagnostopoulos, and Yannis Vassiliou: 2017. Evaluation
of Diversification Techniques for Legal Information Retrieval. Algorithms 10 (1).
Multidisciplinary Digital Publishing Institute: 22.
17. Zhang, Ni, Yi-Fei Pu, and Ping Wang: 2015. An Ontology-Based Approach for
Chinese Legal Information Retrieval.
18. Jo, Dae Woong, and Myung Ho Kim: 2015. A Framework for Legal Information
Retrieval Based on Ontology. Journal of The Korea Society of Computer and
Information 20 (9): 87-96.
19. Adebayo, Kolawole John, Luigi Di Caro, Guido Boella, and Cesare Bartolini: 2016.
An Approach to Information Retrieval and Question Answering in the Legal Do-
main.
20. IBM. 2017. ROSS Intelligence Artificial Intelligence in Legal Research. Blue Hill
Research. Available in: http://www.rossintelligence.com. Access in: 09/12/2017.
21. Weber, Rosina: 1999. Intelligent Jurisprudence Research: a New Concept. In
Proceedings of the 7th International Conference on Artificial Intelligence and Law,
164-172. ACM.
22. Ferauche, Thiago, and Mauricio Amaral de Almeida: 2011. Aprendizado de
Classificadores de Ementas da Jurisprudencia do Tribunal Regional do Trabalho da 2ª
Região - SP. In VI Workshop de pesquisa do Centro Estadual de Educação Tecnológica
Paula Souza - SP - Brasil.
23. Chen, Yen-Liang, Yi-Hung Liu, and Wu-Liang Ho. 2013. A Text Mining Approach
to Assist the General Public in the Retrieval of Legal Documents. Journal of the
Association for Information Science and Technology 64 (2). Wiley Online Library:
280-290.
24. Divisao de Comunicacao do CSJT: CSJT divulga dados sobre arrecadação e despesa
em resposta a noticia sobre custo da JT - Noticias Lançamento - CSJT. 2017.
25. Rogerio Barbosa: ConJur - Justica do Trabalho deixa de privilegiar empregado em
acoes trabalhistas. 2012.
26. Ashley, Kevin D.: Artificial Intelligence and Legal Analytics: New Tools for Law
Practice in the Digital Age. Cambridge University Press, 2017.
27. Kubat, Miroslav: An Introduction to Machine Learning. Springer, 2016.