In the recent past, India has made considerable progress in building tools, methods and technologies that create legal knowledge and apply it to legal and justice delivery systems. Use cases include legal text summarisation, fact search and identification, court administration, precedent retrieval, contract analysis and argument mining. For instance, after the Supreme Court of India established an Artificial Intelligence committee, three AI tools were launched: SUPACE, SUVAS and SCI-Interact. However, while these tools will significantly impact justice delivery in India, not enough information is available on any of them. It is not publicly known which specific datasets these AI tools rely on, who is building these datasets, who is part of the AI committee, or how the datasets are being annotated.
These questions become all the more important given the complex, noisy and unstructured nature of legal documents in India.
For this article, we define legal documents as court judgments, civil documents (for example, property documents), criminal documents (such as FIRs, chargesheets and medical reports) and legal reports (for example, AIR, Indian Law Reports - ILR, State-specific ILRs, Supreme Court cases and Supreme Court Almanac - SCALE).
The inherent challenges with legal documents are fourfold:
Contextual meaning of each word,
Legal vocabulary that is distinct from regular language,
Dissimilar style and structure of judgments, and,
Inherent bias in the legal setup.
It is essential to understand these challenges individually: legal documents are a data source for AI tools, so the predictions of those tools are likely to reflect the same challenges.
Each Word Has a Contextual Meaning
Legal documents are made up of distinct units. Statutes are divided into abbreviations, sections and sub-sections, which are further divided into paragraphs and subparagraphs. Each word has a contextual meaning and can be interpreted in multiple ways. There are formal methods of interpretation: the literal rule, the golden rule, the mischief rule and purposive interpretation. It remains a moot question whether any AI-powered tool can comprehend the context and interpret accordingly. Thus, dealing with unstructured linguistic information is one of the most challenging aspects of legal information processing.
Legal Vocabulary Isn't Regular Language
Another challenge is that legal documents have specific attributes and require explanation. The legal vocabulary in India has retained its colonial flavour. It is therefore distinct from the language we ordinarily read or speak and can seem highly technical to an ordinary person. It is also essential to note that the language differs across court judgments, legal books and bare acts.
Thus, it is crucial to be conscious of where in the legal sector the data used to build these tools is gathered. Indian judges have not shied away from incorporating Shayari, songs, Shakespearean literature, poems and the like in the context of a particular case, which might be lucid to a person familiar with legal language but incomprehensible to others.
Our Judgments Don't Follow Any Structure
The next challenge is the structure or form of judgments, which differs drastically from court to court and depends entirely on the individual judge in India. Compare this to China, where the Supreme People’s Court, the apex judicial body, issued a protocol to structure a judgment in a particular order: a) brief procedural history, b) parties’ arguments and evidence, c) court’s finding of facts, d) court’s reasoning regarding disputed issues and e) final ruling.
In India, by contrast, the official language of the Supreme Court and the High Courts is English, while the courts of first instance, namely the district courts and tribunals, operate in regional languages for the majority of people.
Can Tech Counter Our Biases?
An analysis of data shows that by the time the 50th judge of the Supreme Court is appointed, 32% of Chief Justices will have been male Brahmins. Another set of data shows that there have been only 11 women judges in the Supreme Court in 71 years. Such an imbalance in appointments points to the inadequacy and biases of judgments on gender, caste and religion.
Other countries also recognise such bias. Renowned judge Lady Hale of the UK Supreme Court started the UK Feminist Judgments Project, which rewrote a wide range of existing critical decisions delivered by male benches to show that they could have been decided differently. Since AI tools learn from existing judgments, it cannot be said with conviction that they would have the capacity to provide progressive case recommendations.
Apart from the above-stated challenges associated with legal documents, there are also technology-specific concerns.
AI Is a 'Black Box'
Artificial Intelligence technology is inherently unexplainable, which is what Frank Pasquale famously termed a ‘Black Box’. However, legal decisions need to be reasoned and transparent, as they contribute to a civilisational value. That is not technically possible with Artificial Neural Networks, which are widely used in Natural Language Processing (NLP), a branch of AI concerned with giving computers the ability to understand text and spoken words. Such networks learn to perform tasks from data without being programmed with task-specific rules, so humans are rarely involved in how a given output is reached. This makes the entire system opaque.
Justice BS Srikrishna, who was at the helm of the committee that prepared the Personal Data Protection Bill, states that though AI should not be used in judicial work, it can be used for non-judicial work such as the random allocation of cases to benches, in order to reduce pendency and increase transparency in allocation.
However, research has shown that in countries like Poland, Serbia, Georgia and Slovakia, AI-based random allocation can create biases. For example, in Poland, the allocation system is fully controlled by the Minister of Justice, who is a party or a potential party in several cases. It has forced one judge to work excessively and has favoured particular judges for government cases. Thus, system ownership, data storage, how and where systems are trained and tested, and the explainability of these systems gain particular importance.
The Challenge of Annotation
The challenge with annotation is the time, patience and expertise it requires, given the unstructured and chaotic nature of legal documents explained above.
There should be a standard annotation manual that would contain a set of guidelines for all annotators in the legal realm.
For example, each annotator can be asked to annotate some documents independently, followed by a joint discussion among all the annotators to resolve any issues. If two annotators mark precisely the same span of text with the same label, this is considered a correct match. Each label can be put to a vote, and if it wins a majority, ie, a clear verdict among the annotators, it should be used for developing a system. This will primarily work in cases where the court appears to agree on some of the arguments and disagree with the rest.
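The matching and voting steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of how any existing tool works: the span format (start character, end character, label), the label names, the annotator names and the functions `exact_matches` and `adjudicate` are all hypothetical, and "majority" is read here as more than half of all annotators.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# (start_char, end_char, label); the tuple layout and label names are assumptions.
Span = Tuple[int, int, str]


def exact_matches(a: List[Span], b: List[Span]) -> List[Span]:
    """Spans two annotators marked with identical boundaries and the same label."""
    return sorted(set(a) & set(b))


def adjudicate(
    annotations: Dict[str, List[Span]],
) -> Dict[Tuple[int, int], Optional[str]]:
    """Vote on a label for every span; None means no majority was reached."""
    n_annotators = len(annotations)
    votes: Dict[Tuple[int, int], List[str]] = {}
    for spans in annotations.values():
        for start, end, label in spans:
            votes.setdefault((start, end), []).append(label)

    decided: Dict[Tuple[int, int], Optional[str]] = {}
    for span, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        # Strict majority of ALL annotators, not only of those who marked the span.
        decided[span] = label if count > n_annotators / 2 else None
    return decided


# Hypothetical annotations of one judgment by three annotators working independently.
annotations = {
    "annotator_1": [(0, 42, "FACT"), (43, 90, "ARGUMENT")],
    "annotator_2": [(0, 42, "FACT"), (43, 90, "RULING")],
    "annotator_3": [(0, 42, "FACT"), (43, 90, "ARGUMENT"), (91, 120, "RULING")],
}

agreed = exact_matches(annotations["annotator_1"], annotations["annotator_2"])
decided = adjudicate(annotations)
# Spans without a majority label go back to the joint discussion.
unresolved = [span for span, label in decided.items() if label is None]
```

In this sketch, spans that fail to win a majority are not discarded; they are collected in `unresolved` so that the annotators can settle them in the joint discussion.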
(The author is a doctoral candidate & tutor at Durham University, where he is researching the socio-ethical implications of Artificial Intelligence. This is an opinion piece and the views expressed above are the author’s own. The Quint neither endorses nor is responsible for the same.)