Bible Translation Sentiment Analysis

A collection of Python code analyzing the sentiment and word usage in different translations of the Bible.

There are many translations of the Bible. Some are literal, word-for-word translations, some care more about conveying the meaning than the exact wording, and some are paraphrases. My question was, "Do these translations convey a considerable difference in tone from one another?" Applying sentiment analysis to each verse lets us answer that question. The source data is each Bible as a raw text file, downloaded from https://openbible.com/texts.htm. At first I wrote the script to analyze one translation at a time, but once that worked I made it iterate over the whole list. It must be fairly intensive, because it took my machine (1260P CPU, 16 GB of RAM) over 24 minutes to run through everything. I'm not sure whether that's good or bad, considering it was roughly 342,144 individual verses.
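Each line in these files appears to pair a verse reference with the verse text, and the script's cleaning step (stripping digits, colons, and tabs, then dropping the first word) depends on that. Below is a minimal sketch of an alternative parser, assuming each line looks like "Book chapter:verse<TAB>text". That layout is an assumption about the openbible.com files, not something confirmed for every translation. A regular expression separates the reference from the text, so multi-word book names such as "Song of Solomon" stay out of the verse text and any numbers inside a verse are left alone. `Verse` and `parse_verse_line` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed line layout: "<Book> <chapter>:<verse>\t<text>"
# e.g. "1 Samuel 3:10\tAnd the LORD came..." (an assumption, not verified per file)
VERSE_LINE = re.compile(r"^(?P<book>.+?)\s+(?P<chapter>\d+):(?P<verse>\d+)\t(?P<text>.*)$")


class Verse(NamedTuple):
    book: str
    chapter: int
    verse: int
    text: str


def parse_verse_line(line: str) -> Optional[Verse]:
    """Split one raw line into reference and text; return None for lines that don't match."""
    match = VERSE_LINE.match(line.strip())
    if match is None:
        return None  # e.g. a title/header line or a blank line
    return Verse(
        book=match["book"],
        chapter=int(match["chapter"]),
        verse=int(match["verse"]),
        text=match["text"].strip(),
    )


if __name__ == "__main__":
    verse = parse_verse_line("Song of Solomon 1:1\tThe song of songs, which is Solomon's.\n")
    print(verse.book, verse.chapter, verse.verse)  # Song of Solomon 1 1
    print(verse.text)  # The song of songs, which is Solomon's.
```

Lines that don't match the assumed layout (a header or a blank line) come back as None, so they can be skipped instead of being scored as Neutral.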

The comments in the code should be enough to get the gist of what I was doing.

import matplotlib.pyplot as plt
import nltk
import numpy as np
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon used by SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

# List file names (one raw text file per translation)
filenamelist = ("wbt.txt", "web.txt", "ylt.txt", "akjv.txt", "asv.txt", "cpdv.txt",
                "dbt.txt", "drb.txt", "erv.txt", "jps.txt", "KJV.txt", "slt.txt")

# Define column names for the summary table
columns = ["Translation", "Positive", "Negative", "Neutral"]
totals_df = pd.DataFrame(columns=columns)

# Iterate through each file
for file_name in filenamelist:
    # Construct the full file path
    path = "C:/Users/jpapi/OneDrive/Documents/Jasons/Projects/BibleSentimentAnalysis/" + file_name

    # Get the Bible abbreviation from the file name (e.g. "KJV" from "KJV.txt")
    bible_name = file_name.split(".")[0]
    print(bible_name)

    # Read the text, one line per verse
    with open(path) as f:
        lines = f.readlines()

    # Put the text into a DataFrame, one verse per row, with a 1-based row ID
    df = pd.DataFrame({"TextColumn": lines})
    df["RowID"] = df.index + 1

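    # Clean out numbers, colons, tabs, and leading/trailing white space
    # (note: this also removes any digits that appear inside the verse text)
    df["TextColumn"] = df["TextColumn"].str.strip()
    df["TextColumn"] = df["TextColumn"].str.replace(r'\d+', '', regex=True)
    df["TextColumn"] = df["TextColumn"].str.replace(':', '', regex=False)
    df["TextColumn"] = df["TextColumn"].str.replace('\t', '', regex=False)

    # Drop the first remaining word (the book name), keeping only the verse text;
    # multi-word book names such as "Song of Solomon" leave their extra words behind
    df["TextColumn"] = df["TextColumn"].str.split(n=1).str[1]
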
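    # Create the VADER analyzer once per file instead of once per verse,
    # so the lexicon isn't reloaded for every call
    sia = SentimentIntensityAnalyzer()

    # Function to analyze the sentiment of the text; returns VADER's compound score (-1 to 1)
    def analyze_sentiment(text):
        sentiment = sia.polarity_scores(text)
        return sentiment['compound']
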
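    # Drop rows with no verse text (e.g. blank lines); otherwise they would become
    # the string "nan" and be counted as Neutral
    df = df.dropna(subset=["TextColumn"])

    # Apply the function to each verse
    df['TextColumn'] = df['TextColumn'].astype(str)
    df['TextColumnOutput'] = df['TextColumn'].apply(analyze_sentiment)
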
    # Classify the score of each verse as Positive, Negative, or Neutral
    # using VADER's usual ±0.05 cut-offs
    def classify_sentiment(score):
        if score >= 0.05:
            return 'Positive'
        elif score <= -0.05:
            return 'Negative'
        else:
            return 'Neutral'

    # Put the classification into an adjacent column
    df['SentimentClassify'] = df['TextColumnOutput'].apply(classify_sentiment)

    # Count the number of verses with each classification
    CountOfPositives = (df['SentimentClassify'] == "Positive").sum()
    CountOfNegatives = (df['SentimentClassify'] == "Negative").sum()
    CountOfNeutrals = (df['SentimentClassify'] == "Neutral").sum()

    # Put the counts into a single-row DataFrame
    single_bible_df = pd.DataFrame({
        'Translation': [bible_name],
        'Positive': [CountOfPositives],
        'Negative': [CountOfNegatives],
        'Neutral': [CountOfNeutrals]
    })

    # Append the single-row DataFrame to the running totals
    totals_df = pd.concat([totals_df, single_bible_df], ignore_index=True)

# View results
print(totals_df)

# Set the width of each bar
bar_width = 0.2

# Set the positions of the bars on the x-axis (one group of three bars per translation)
r1 = np.arange(len(totals_df['Translation']))
r2 = r1 + bar_width
r3 = r2 + bar_width

# Create a grouped bar chart
plt.figure(figsize=(12, 6))
plt.bar(r1, totals_df['Positive'], color='yellow', width=bar_width, edgecolor='grey', label='Positive')
plt.bar(r2, totals_df['Negative'], color='red', width=bar_width, edgecolor='grey', label='Negative')
plt.bar(r3, totals_df['Neutral'], color='blue', width=bar_width, edgecolor='grey', label='Neutral')

# Add labels and title; each tick sits under the middle bar of its group
plt.xlabel('Translation', fontweight='bold')
plt.xticks(r1 + bar_width, totals_df['Translation'])
plt.ylabel('Count', fontweight='bold')
plt.title('Sentiment Analysis of Bible Translations', fontweight='bold')
plt.legend()

# Show the plot
plt.show()
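The classify-and-count part of the loop can also be sketched without adding extra DataFrame columns. Below is a minimal, standard-library version, assuming the VADER compound scores for one translation have already been computed. It applies the same ±0.05 cut-offs as `classify_sentiment` and tallies the labels with `collections.Counter`. `count_labels` is a hypothetical helper, and the scores in the example are made-up inputs, not results from the Bible texts.

```python
from collections import Counter
from typing import Dict, Iterable

# Same cut-offs as classify_sentiment in the script
POSITIVE_CUTOFF = 0.05
NEGATIVE_CUTOFF = -0.05
LABELS = ("Positive", "Negative", "Neutral")


def classify_sentiment(score: float) -> str:
    """Map a VADER compound score to Positive, Negative, or Neutral."""
    if score >= POSITIVE_CUTOFF:
        return "Positive"
    if score <= NEGATIVE_CUTOFF:
        return "Negative"
    return "Neutral"


def count_labels(scores: Iterable[float]) -> Dict[str, int]:
    """Tally the labels, always reporting all three keys (even when a count is zero)."""
    counts = Counter(classify_sentiment(score) for score in scores)
    return {label: counts[label] for label in LABELS}


# Made-up scores, only to show the shape of the output
print(count_labels([0.8, -0.3, 0.0, 0.05, -0.05, 0.01]))
# → {'Positive': 2, 'Negative': 2, 'Neutral': 2}
```

Returning one dict per translation also suggests a design choice: collecting those dicts in a list and calling `pd.DataFrame(rows)` once after the loop avoids growing `totals_df` with `pd.concat` on every iteration. The pandas documentation recommends concatenating once rather than repeatedly, because each `concat` call copies the data.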


Results

After viewing these two visuals, we can see that the translations do vary slightly in their overall sentiment. The slight variation is likely due to some translators' word choices being graded more kindly or harshly than others' by the sentiment function. I was hoping to see some stark differences that I could investigate, but since we didn't get that, I won't dive any further into it at this point.
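Since the files don't all contain the same number of verses, raw counts are not directly comparable. Converting each translation's counts into percentages of its own total puts them on the same scale before judging how slight the variation really is. Below is a minimal sketch. `to_percentages` is a hypothetical helper, and the counts are placeholders rather than results from this analysis. With the script's `totals_df`, a pandas version should give the same kind of table: `counts = totals_df.set_index("Translation").astype(float)` followed by `counts.div(counts.sum(axis=1), axis=0) * 100`.

```python
from typing import Dict


def to_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Express each label's count as a percentage of the translation's total, to 1 decimal place."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Placeholder counts for two hypothetical translations (not results from this analysis)
example_counts = {
    "translation_a": {"Positive": 500, "Negative": 300, "Neutral": 200},
    "translation_b": {"Positive": 450, "Negative": 300, "Neutral": 150},
}
for name, counts in example_counts.items():
    print(name, to_percentages(counts))
# → translation_a {'Positive': 50.0, 'Negative': 30.0, 'Neutral': 20.0}
# → translation_b {'Positive': 50.0, 'Negative': 33.3, 'Neutral': 16.7}
```

Both placeholder translations have the same Negative count, but translation_b's Negative share is higher (33.3% vs 30.0%) because it has fewer verses overall. Raw counts alone would hide that difference.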

Fun fact: there are about 31,102 verses in the Bible, give or take a few depending on the translation.