Data Analysis of rape victims in India: Effect of literacy and sex ratio — Part 1

Suryasis Paul
Jun 19, 2021

Background: The rising prevalence of crime against women has been a cause of concern for policymakers in India. It is a platitude that education is the one-stop solution for all social ills. Another plausible hypothesis that has almost become a platitude is that a higher sex ratio leads to fewer cases of crime against women. In this article, I test these maxims against data from the sources listed below.

Data sources:

  • Crimes in India | Kaggle: It contains the number of rape victims (children and women), classified into age groups, for all the states and union territories of India. I am using the dataset for the year 2016.
  • Education in India | Kaggle: This dataset contains district-wise and state-wise Indian primary and secondary school education data for 2015–16. The Ministry of Human Resource Development (DISE) has shared the dataset.

Assumption:

  • One assumption I have made is that the education statistics did not change considerably by 2016, since the education data covers 2015–16 while the crime data is for 2016.
  • Please feel free to point out any others 🙏

Code:

The complete code for exploratory data analysis can be found here.

Observations:

  • First, we plot the number of victims for the three states with the highest counts and the three states with the lowest counts.
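
As a rough illustration of this step, the sketch below picks the three highest and three lowest rows with pandas. The column names (`state`, `victims`), the state labels, and every number are placeholders invented for the example; they are not values from the Kaggle dataset, and this is not necessarily how the linked notebook does it.

```python
import pandas as pd

# Hypothetical data: labels and counts are invented for illustration only.
df = pd.DataFrame({
    "state": ["A", "B", "C", "D", "E", "F", "G"],
    "victims": [4800, 120, 3500, 15, 2900, 60, 900],
})

# Three states with the most victims and three with the fewest.
top3 = df.nlargest(3, "victims")
bottom3 = df.nsmallest(3, "victims")

# Combine both groups into one table that can be passed to a bar chart.
extremes = pd.concat([top3, bottom3])
print(extremes)
```

With matplotlib installed, `extremes.plot.bar(x="state", y="victims")` would draw the corresponding bar chart.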

A common objection is that the states with the largest number of cases also have the largest populations. So the number of victims per unit population may provide a better metric.
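
A minimal sketch of this normalisation, assuming a table with one row per state. The scaling to "per 100,000 people" is my own choice of unit, and the column names and all figures are made up for illustration.

```python
import pandas as pd

# Hypothetical data: labels, counts, and populations are invented.
df = pd.DataFrame({
    "state": ["A", "B", "C", "D"],
    "victims": [4000, 300, 1500, 40],
    "population": [80_000_000, 2_000_000, 60_000_000, 1_000_000],
})

# Victims per 100,000 inhabitants (the unit is an assumption of this sketch).
df["victims_per_100k"] = df["victims"] / df["population"] * 100_000

# Rank states by the normalised rate instead of the raw count.
ranked = df.sort_values("victims_per_100k", ascending=False)
print(ranked[["state", "victims", "victims_per_100k"]])
```

In this toy table, state A has the most victims in absolute terms, but the much smaller state B ranks first once population is taken into account, which is the kind of shift the per-population view is meant to reveal.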

  • When we plot the victims per unit population, we get a slightly different picture. Though Madhya Pradesh is still among the top three, there is an outlier in Telangana.
  • Next, we get to our main objective: the correlation between overall literacy and the number of rape victims per unit population. After eliminating the outlier Telangana, we see a negative correlation, though it is weak (coefficient of -0.21).
Plot with Telangana.
Plot without Telangana.
  • Next, we narrow the focus to the effect of female literacy on the number of rape victims per unit population. We see a similar result. This is not surprising, since overall literacy has a nearly perfect positive correlation with female literacy.
Plot with Telangana.
Plot without Telangana.
  • Next, we plot the sex ratio against the victims per unit population. We see little direct correlation between the two, as indicated by a low coefficient.
  • Finally, we look at the correlation matrix and the correlation heat map.
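
The correlation steps above can be sketched roughly as follows, assuming the Pearson coefficient (the pandas default) and a single state treated as an outlier. All column names and numbers are invented for the example, with "H" standing in for the outlier; none of them come from the actual datasets.

```python
import pandas as pd

# Hypothetical data: every value below is invented for illustration.
df = pd.DataFrame({
    "state": list("ABCDEFGH"),
    "literacy": [62.0, 67.0, 71.0, 74.0, 78.0, 82.0, 88.0, 66.0],
    "female_literacy": [52.0, 58.0, 63.0, 66.0, 71.0, 76.0, 84.0, 57.0],
    "sex_ratio": [930, 945, 920, 960, 975, 950, 990, 940],
    "victims_per_100k": [6.0, 5.5, 5.8, 4.9, 5.1, 4.2, 4.5, 30.0],
})

# Pearson correlation between literacy and the victim rate, all states.
r_all = df["literacy"].corr(df["victims_per_100k"])

# Same coefficient after dropping the outlier state "H".
no_outlier = df[df["state"] != "H"]
r_trimmed = no_outlier["literacy"].corr(no_outlier["victims_per_100k"])
print(f"with outlier: {r_all:.2f}, without outlier: {r_trimmed:.2f}")

# Pairwise correlation matrix of all numeric columns.
corr_matrix = df.drop(columns="state").corr()
print(corr_matrix.round(2))
```

If seaborn is available, `seaborn.heatmap(corr_matrix, annot=True)` is one way to render the matrix as a heat map. Comparing the coefficient with and without the outlier, as done here, makes it easy to see how much a single state drives the result.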

Next steps:

  • Since literacy only involves knowledge of the 3Rs (“reading, writing, and arithmetic”), the criterion may be too rudimentary to capture the overall trends. We should also investigate how the same phenomenon correlates with secondary and higher education.
  • Also, we can look for insights in more granular data, probably at the district level. This would avoid trends being masked by aggregation.
  • Suggestions for any further steps are welcome.
