As registration for COVID-19 vaccination began on Wednesday, 28 April, for those above 18 years, the government's dedicated website CoWin seemed to malfunction.
The registration process was replete with glitches: the app and website crashed repeatedly, users did not receive an OTP, and many failed to register even after multiple attempts.
In addition, none of the registered users were able to schedule an appointment, with the Aarogya Setu team announcing that appointments would be the responsibility of the state governments and vaccination centres.
Nearly 1.33 crore people signed up for COVID-19 vaccination on the first day of Phase-3 vaccination. The Quint spoke to industry experts to understand the probable cause of this technical failure.
'Improper Planning, Load-Sharing Issues'
The entire first-hour fiasco and the resultant crash may have been the net result of improper capacity planning, load-sharing issues, and an inaccurate assessment of the number of hits anticipated.
Commenting on the technical failure of CoWin, Biju George, Chief Technology Officer of Instasafe, a cyber security company that simplifies security for government and corporates, told The Quint that even though the government and the technical team responsible may have anticipated a heavy rush of traffic, they may not have planned simulated performance/scalability (parallel) testing.
Why Did the Site Crash Exactly at the Launch Time?
George explained that server crashes are commonplace during heavy-traffic events. Websites of tech giants like Amazon crash during mega sale days, despite all the manpower they have dedicated to preventing this.
"Our data shows that the website was hosted on Amazon Web Servers with ELB (Elastic Load Balancing) in front," he said.
This means that with an ELB, the government was prepared to "automatically distribute the incoming traffic across multiple targets, and route the traffic only to relevant healthy targets". George claims that the government was prepared on that front.
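The routing behaviour George describes can be illustrated with a minimal sketch. It is not how AWS ELB is implemented; it only shows the general idea of spreading requests across several targets while skipping unhealthy ones. The class names, server names, and the round-robin policy are assumptions made for illustration.

```python
class Target:
    """A hypothetical backend server with a health flag."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy


class RoundRobinBalancer:
    """Illustrative balancer: rotates through targets, skipping unhealthy ones."""

    def __init__(self, targets):
        self.targets = list(targets)
        self._next = 0  # index of the next target to try

    def route(self):
        """Return the next healthy target, or raise if none is healthy."""
        n = len(self.targets)
        for _ in range(n):
            target = self.targets[self._next]
            self._next = (self._next + 1) % n
            if target.healthy:
                return target
        raise RuntimeError("no healthy targets available")


def demo():
    # server-b is marked unhealthy, so it should never receive traffic.
    targets = [
        Target("server-a"),
        Target("server-b", healthy=False),
        Target("server-c"),
    ]
    balancer = RoundRobinBalancer(targets)
    return [balancer.route().name for _ in range(4)]
```

In this sketch the balancer itself is a single point of failure: if it is overwhelmed, healthy servers behind it receive nothing, which is the kind of failure George goes on to describe.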
Explaining the probable causes of the technical failure, he said that in some instances the ELB itself crashes, which may have been due to much more traffic than anticipated.
“A part of the problem may have been due to bot traffic. Malicious actors often send across bot traffic to such sites to cause server overloads. The technical team could definitely have done with web application firewall implementation and taken extra caution with regard to bot traffic. The fact that the server issues were not visible after the first hour may have been due to the autoscaling feature associated with AWS Servers,” he said.
'Better Planning Could Have Avoided the Failure'
Since scheduling of appointments wasn’t open for 18 to 44 year olds, this could have been communicated to the public in advance. George suggested four measures that could have been taken to avoid this failure:
- Vaccine registrations on CoWin could have been replaced with decentralised and distributed platforms, handled by state IT teams, that could have easily handled lower amounts of traffic.
- Websites and apps, especially critical ones such as CoWin or Aarogya Setu, should have been tested monthly for glitches of any form that can be exploited.
- On the technical front, capacity planning should have been taken seriously.
- Automatic, dynamic IT resource allocation is key to managing such a sudden load from users. Cloud-based infrastructure can help, and it is important to plan and architect it ahead of time.
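To give a sense of what capacity planning involves, here is a rough back-of-the-envelope sketch. Only the 1.33 crore (13.3 million) sign-up figure comes from the reporting above. The share of users arriving in the first hour, the requests per user, the per-server throughput, and the safety headroom are all hypothetical values chosen purely for illustration, not figures from CoWin.

```python
import math


def servers_needed(users, peak_fraction, requests_per_user,
                   peak_window_s, per_server_rps, headroom=0.5):
    """Estimate peak requests per second and the servers needed to absorb it.

    All inputs except `users` are assumptions a planning team would have to
    measure or estimate through load testing.
    """
    # Requests arriving during the peak window, averaged per second.
    peak_rps = users * peak_fraction * requests_per_user / peak_window_s
    # Add headroom for bursts, then divide by what one server can handle.
    servers = math.ceil(peak_rps * (1 + headroom) / per_server_rps)
    return peak_rps, servers


peak_rps, servers = servers_needed(
    users=13_300_000,        # 1.33 crore sign-ups reported for day one
    peak_fraction=0.3,       # assumption: 30% of users arrive in the first hour
    requests_per_user=20,    # assumption: page loads, OTP and API calls per user
    peak_window_s=3600,      # the first hour after launch
    per_server_rps=500,      # assumption: sustained throughput of one server
)
print(f"Estimated peak load: {peak_rps:,.0f} requests/s, servers needed: {servers}")
```

The estimate is highly sensitive to the assumed peak fraction and per-user request count, which is why simulated load testing before launch matters more than any single number.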
“The CoWin website failed us at two levels yesterday. We’ve witnessed this before with the IRCTC website crashing during the tatkal booking period. These can be avoided by better funding and appropriate testing.”
Kazim Rizvi, Founding Director, The Dialogue
'Bandwidth Not an Issue'
There was also speculation that poor bandwidth was a cause of the outage of the CoWin servers.
George told The Quint that almost 60 percent of the websites in the world are hosted by Amazon Web Services. Therefore, poor hosting was definitely not a reason to be considered in this case.
Data Privacy Still a Major Issue
The CoWin platform has previously had complaints of glitches coming in from health professionals. In addition, a critical question arises with respect to the security infrastructure on the backend.
The more pertinent questions here concern how secure the data is. Is data privacy being maintained? Have security audits been done? The bottom line is, given the somewhat patchy record that government agencies have with regard to security audits, how safe is the vaccination data?
“Government agencies have been on the backfoot when it comes to replacing their security infrastructure with new tech, and adopting upgrades. Most servers still rely on VPNs for their security, when govt agencies in the US are adopting disruptive technology like Zero Trust. The security of vaccination data is paramount in this case as the same could be exploited to the detriment of millions.”
Biju George, Chief Technology Officer of Instasafe