Cost of Not Sharing Legal Datasets? Truth Is, We Don’t Know Yet
It is ironic that we are trying to answer a question about datasets without data.
In 2014, Rukmini Shrinivasan reported that 30 percent of all sexual assault cases filed before Delhi District Courts in 2013 dealt with consenting couples whose parents had accused the boy of rape, and another 20 percent dealt with “breach of promise to marry”.
In 2017, when Stayzilla’s founder was arrested on criminal charges, senior advocate Alok Prasanna from the Vidhi Centre for Legal Policy shed light on the increasing use of criminal cases to try to resolve civil disputes: the number of cheating cases doubled between 2006 and 2015.
In 2018, Susan Thomas and Ajay Shah sifted through data and showed that the probability of a case closing in the 180 days prescribed by the Indian Bankruptcy Code is 0.9. And in the same year, Apar Gupta and Abhinav Sekhri brought to our attention that despite Section 66A of the Information Technology Act, 2000 being struck down by the Supreme Court, it continued to be used across the country to arrest citizens.
What Innovative Data-Driven Work Can Help Us Achieve
These examples give us a glimpse of what legal datasets can tell us. Datasets on cases and other matters in our systems of law and justice, drawn from public and private sources, can help us understand how our courts function and what impact laws and judgments have, offer insights into crime, litigants and cases, and help hold our legal institutions accountable.
Innovative data-driven work can play a critical role in improving our systems of law.
Rapid developments in technology can enable new applications for the most basic source of research, datasets, and give major impetus to insights and innovation.
But this future can be fuelled only if we make legal datasets openly accessible. The international Human Genome Project is but one good example of a large-scale endeavour in which openly accessible information is being used successfully by many different users, all over the world, for a great variety of purposes.
Gold Standard For Credibility of Data-Driven Efforts We Must Aim For
Open legal datasets have a similar potential to become an essential part of the infrastructure of legal education, legal practice, judicial functioning and accountability.
This makes obvious sense given that sourcing and cleaning datasets is incredibly time- and resource-intensive. Conversations with authors of datasets tell us that it takes researchers 8 to 24 months on average to source and clean datasets. Each research project can cost anything from Rs 20 lakh to Rs 50 lakh. Thus, only those who are highly motivated and have the resources and skills take up such data-driven efforts in law.
The bigger, more monumental loss is that these datasets, which are so painfully created and most often draw on public sources aided by philanthropic or public capital, are used only once: by the teams creating them to publish their one story, report or paper.
Now, this would change radically if we viewed legal datasets as an important community resource. Given that legal datasets are ‘public facts’, authors, especially those who are publicly funded, must share their datasets online, permitting others to read, verify, download, analyse and build upon the datasets easily and respectfully.
We should be striving for a future in which the gold standard for credibility of data-driven efforts is to make the datasets open for others to verify and replicate results.
We Are Trying To Answer A Question About Datasets Sans Data
Wider access to datasets will increase the return on investment in each initiative and in the field as a whole. It will attract and fuel new actors, including students, who today do not have the resources to work with legal datasets. More importantly, it will encourage diversity of studies and opinion, avoid costly repetition of work, promote new areas of work, and enable the exploration of topics not envisioned by the initial authors, leading to innovations needed by a desperate law and justice system.
We, in the legal sector, can draw significantly from other movements such as the Open Science Movement to evolve communities, principles, incentives, resources and processes to further open legal data.
Authors need to be more open among themselves; greater recognition needs to be given to the value of data gathering; common standards for sharing information need to be evolved; new technology tools must be developed to enable the sharing, analysis and co-creation of datasets; investors must mandate that datasets be opened; and more experts are needed to manage and support the use of legal datasets.
So then, what really is the cost of not sharing legal datasets? The truth is, we do not know.
It would take researchers time and resources to estimate the economic and social costs. It is ironic that we are trying to answer a question about datasets without data. But in this instance, we need some imagination, not data, to tell us what the opportunity here really is.
(Supriya Sankaran is the co-founder of Agami, an initiative that recently launched the Data for Justice Challenge as a potential solution to this problem. She tweets at @supriyasankaran. This is an opinion piece and the views expressed above are the author’s own. The Quint neither endorses nor is responsible for them.)