Bible Translation Sentiment Analysis
A collection of Python code analyzing the sentiment and word usage in different translations of the Bible.
There are many translations of the Bible. Some are literal word-for-word translations, some care more about the literal meaning, and some are paraphrases. My question was, "Do these translations convey a considerable difference in tone from one another?" Applying a sentiment analysis to each verse lets us answer this question. The source data is each Bible as a raw text file that I got from https://openbible.com/texts.htm. At first I wrote this to analyze one translation at a time, but once that worked I decided to make it iterate over the whole list. It must have been fairly intensive, because it took my 1260P, 16 GB RAM machine over 24 minutes to run through it. I'm not sure if that's good or bad, considering it was roughly 342,144 individual verses.
The comments in the code should be enough to get the gist of what I was doing.
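The cleaning steps in the script strip digits, colons, and tabs and then drop the first word of each line, which suggests that every line holds a verse reference, a tab, and the verse text. That layout is an inference from the code, not something confirmed here. Under that assumption, a minimal sketch of splitting a line on the tab instead (`parse_verse_line` and the sample lines are hypothetical and not part of the original script):

```python
def parse_verse_line(line):
    """Split one raw line into (reference, verse_text).

    Assumes the layout 'Book chapter:verse<TAB>text'. Returns None for
    lines without a tab (for example a header or a blank line).
    """
    reference, tab, text = line.strip().partition("\t")
    if not tab or not text:
        return None
    return reference.strip(), text.strip()


# Illustrative lines only; they are not copied from the source files.
sample_lines = [
    "Genesis 1:1\tIn the beginning God created the heaven and the earth.\n",
    "Song of Solomon 1:1\tThe song of songs, which is Solomon's.\n",
    "some header line without a tab\n",
]

for raw in sample_lines:
    print(parse_verse_line(raw))
```

Splitting on the tab keeps a multi-word book name such as Song of Solomon together in the reference, whereas dropping only the first word would leave the rest of the name in the verse text.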
import re
import pandas as pd
import matplotlib.pyplot as plt
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')
import numpy as np
# List file names
filenamelist = ("wbt.txt", "web.txt", "ylt.txt", "akjv.txt", "asv.txt", "cpdv.txt", "dbt.txt", "drb.txt", "erv.txt", "jps.txt", "KJV.txt", "slt.txt")
# Define column names
columns = ["Translation","Positive", "Negative", "Neutral"]
totals_df = pd.DataFrame(columns=columns)
# Function to analyze the sentiment of the text and return its compound score
def analyze_sentiment(text):
    sia = SentimentIntensityAnalyzer()
    sentiment = sia.polarity_scores(text)
    return sentiment['compound']

# Classify the score of each verse as Positive, Negative, or Neutral
def classify_sentiment(score):
    if score >= 0.05:
        return 'Positive'
    elif score <= -0.05:
        return 'Negative'
    else:
        return 'Neutral'

# Iterate through each file
for filename in filenamelist:
    # Construct the full file path
    path = "C:/Users/jpapi/OneDrive/Documents/Jasons/Projects/BibleSentimentAnalysis/" + filename
    # Get Bible abbreviation
    bible_name = filename.split(".")[0]
    print(bible_name)
    # Read text
    with open(path) as f:
        lines = f.readlines()
    # Put text into a DataFrame, one row per verse
    df = pd.DataFrame(lines)
    df["row_id"] = df.index + 1
    df.columns = ["TextColumn", "RowID"]
    # Clean numbers, colons, tabs, and surrounding whitespace
    df["TextColumn"] = df["TextColumn"].str.strip()
    df["TextColumn"] = df["TextColumn"].str.replace(r'\d+', '', regex=True)
    df["TextColumn"] = df["TextColumn"].str.replace(':', '', regex=False)
    df["TextColumn"] = df["TextColumn"].str.replace('\t', '', regex=False)
    # Drop the first whitespace-delimited word (the book name when it is a single word)
    df["TextColumn"] = df["TextColumn"].str.split(n=1).str[1]
    # Apply the sentiment function to each verse
    df['TextColumn'] = df['TextColumn'].astype(str)
    df['TextColumnOutput'] = df['TextColumn'].apply(analyze_sentiment)
    # Put the classification into an adjacent column
    df['SentimentClassify'] = df['TextColumnOutput'].apply(classify_sentiment)
    # Count the number of verses with each classification
    CountOfPositives = (df['SentimentClassify'] == "Positive").sum()
    CountOfNegatives = (df['SentimentClassify'] == "Negative").sum()
    CountOfNeutrals = (df['SentimentClassify'] == "Neutral").sum()
    # Put counts into a single-row DataFrame
    single_bible_df = pd.DataFrame({
        'Translation': [bible_name],
        'Positive': [CountOfPositives],
        'Negative': [CountOfNegatives],
        'Neutral': [CountOfNeutrals]
    })
    # Add the single-row DataFrame onto the larger DataFrame
    totals_df = pd.concat([totals_df, single_bible_df], ignore_index=True)
# View results
print(totals_df)
# Set width of bars
bar_width = 0.2
# Set positions of bars on X-axis
r1 = np.arange(len(totals_df['Translation']))
r2 = [x + bar_width for x in r1]
r3 = [x + bar_width for x in r2]
# Create a bar chart
plt.figure(figsize=(12, 6))
plt.bar(r1, totals_df['Positive'], color='yellow', width=bar_width, edgecolor='grey', label='Positive')
plt.bar(r2, totals_df['Negative'], color='red', width=bar_width, edgecolor='grey', label='Negative')
plt.bar(r3, totals_df['Neutral'], color='blue', width=bar_width, edgecolor='grey', label='Neutral')
# Add labels and title
plt.xlabel('Translation', fontweight='bold')
plt.xticks([r + bar_width for r in range(len(totals_df['Translation']))], totals_df['Translation'])
plt.ylabel('Count', fontweight='bold')
plt.title('Sentiment Analysis of Bible Translations', fontweight='bold')
plt.legend()
# Show the plot
plt.show()
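One possible contributor to the long run time (not measured here) is that `analyze_sentiment` creates a new `SentimentIntensityAnalyzer` for every verse, so the lexicon is presumably loaded again on each call. A minimal sketch of scoring with one shared analyzer instead; `score_verses` is a hypothetical helper, not part of the original script, and it only assumes the analyzer exposes a VADER-style `polarity_scores` method:

```python
def score_verses(verses, analyzer):
    """Return the compound score for each verse using one shared analyzer.

    `analyzer` is any object with a VADER-style polarity_scores(text)
    method that returns a dict containing a 'compound' key.
    """
    return [analyzer.polarity_scores(verse)["compound"] for verse in verses]


# Intended use with NLTK (needs the vader_lexicon download from the script):
#
#     from nltk.sentiment.vader import SentimentIntensityAnalyzer
#     sia = SentimentIntensityAnalyzer()      # built once, not once per verse
#     df["TextColumnOutput"] = score_verses(df["TextColumn"], sia)
```

The scores should be the same as before, since the same analyzer class does the scoring; only the number of times it is constructed changes.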
Results
After viewing these two visuals, we can see that the translations do vary slightly in their overall sentiment. The slight variation is likely due to each translator's word choices being graded more kindly or harshly than the others by the sentiment function. I was hoping to see some stark differences that I could investigate, but since we didn't get that, I won't dive further into it at this point.
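Raw counts are only directly comparable if every file has the same number of verses, so small differences may be easier to judge as percentages. A minimal sketch of that conversion; `sentiment_shares` is a hypothetical helper, and the counts below are made-up placeholders, not results from the run:

```python
def sentiment_shares(counts):
    """Convert {'Positive': n, 'Negative': n, 'Neutral': n} to percentages."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Placeholder counts for two made-up translations, NOT results from the run.
example_totals = {
    "aaa": {"Positive": 50, "Negative": 30, "Neutral": 20},
    "bbb": {"Positive": 55, "Negative": 30, "Neutral": 25},
}

for name, counts in example_totals.items():
    print(name, sentiment_shares(counts))
```

In this placeholder data the second translation has more positive verses by count but the same positive share, which is the kind of difference that raw counts alone can hide.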
Fun fact: there are around 31,102 verses in the Bible, give or take a few depending on the translation.