Making Simple Sentiment Analysis on Laravel


February 10, 2019 | 4 min read

I was finally given a challenging project that wasn’t a CRUD task: gathering news data from Indonesian mainstream media and running sentiment analysis on it. Since there are numerous open source scraping tools, such as Puppeteer, Beautiful Soup, and Selenium, I believe the news-gathering part of the problem has plenty of ready-made solutions. So I focused instead on finding a way to do sentiment analysis in PHP rather than Python (because I’m using Laravel as my main framework) that also supports the Indonesian language. Fortunately, I found a library called php-sentianalysis-id that meets all of my requirements.

Diving into the library

This library was built by Muhammad Nur Yasir Utomo as a modification of James Hennessey’s project, phpInsight. The main change is in the dataset (lib/PHPInsight/dictionaries and lib/PHPInsight/data), which was converted from English to Indonesian. To be specific, the positive and negative word lists in those directories were generated from a modified version of Devid Haryalesmana’s word list from his project, ID-OpinionWords. The ignore, neutral, and prefix word lists are the original phpInsight lists, modified and translated into Indonesian. The classifier uses a dictionary of words categorized as positive, neutral, or negative, and computes the most likely sentiment with the Naive Bayes algorithm. Accuracy can be improved by modifying the dictionary and the algorithm.
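To make the idea of a dictionary-driven Naive Bayes classifier concrete, here is a minimal sketch in plain PHP. It is not the library's actual implementation: the function name, the tiny word lists, the equal class priors, and the add-one smoothing are all assumptions chosen for illustration.

```php
<?php
// Illustrative sketch only, NOT the code used by php-sentianalysis-id.
// The word lists, equal priors, and function name are hypothetical.

function scoreSentiment(string $text, array $dictionary, array $ignore = []): array
{
    // Vocabulary size across all classes, used for add-one (Laplace) smoothing.
    $vocabulary = [];
    foreach ($dictionary as $words) {
        foreach ($words as $word) {
            $vocabulary[$word] = true;
        }
    }
    $vocabSize = count($vocabulary);

    // Lowercase, split on anything that is not a-z, then drop ignored tokens.
    $tokens = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $tokens = array_diff($tokens, $ignore);

    $logScores = [];
    foreach ($dictionary as $class => $words) {
        $counts = array_count_values($words);
        $total  = count($words);
        // Equal priors: every class starts from the same log-probability.
        $logScores[$class] = log(1 / count($dictionary));
        foreach ($tokens as $token) {
            $count = $counts[$token] ?? 0;
            $logScores[$class] += log(($count + 1) / ($total + $vocabSize));
        }
    }

    // Turn the log scores into probabilities that sum to 1.
    $max = max($logScores);
    $exp = array_map(fn ($score) => exp($score - $max), $logScores);
    $sum = array_sum($exp);

    return array_map(fn ($value) => $value / $sum, $exp);
}

$dictionary = [
    'pos' => ['bagus', 'senang', 'hebat'],
    'neg' => ['buruk', 'sedih', 'jelek'],
    'neu' => ['biasa', 'cukup', 'lumayan'],
];

$scores = scoreSentiment('Pelayanan bagus dan hebat', $dictionary, ['dan']);
arsort($scores);
echo array_key_first($scores), PHP_EOL; // prints "pos"
```

Working in log space avoids numeric underflow on long articles, and the smoothing keeps a word that is missing from one class from zeroing out that class entirely.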

If we look at the repository, the core logic of the library lives in the lib/PHPInsight directory, which contains the following files and directories:

  1. data: This directory contains the data sources used in the sentiment analysis process. It includes several PHP files, each holding a list of words categorized by sentiment (positive, negative, neutral, or ignored).

  2. dictionaries: This directory presumably contains the lists of categorized words used by the classifier.

  3. Sentiment.php: This file is a crucial component of the php-sentianalysis-id library. It is part of the PHPInsight namespace and defines the Sentiment class, which is central to the sentiment analysis process.

  4. Autoloader.php: This script automatically loads PHP classes when needed.

Implementing on my project

My Project

Because the library is written in PHP and ships with an autoloader (autoload.php), I didn’t face any blockers integrating it into my project. Once the autoloader is included, the library’s classes load on demand without having to require each file explicitly. I just needed to copy or clone the lib folder into my project’s root directory and add a require_once for autoload.php in my index.php or controller (alternatively, you can register it in composer.json through the autoload property, see the configuration here). The next step was integrating it with the collected news data. I’m using an ordinary SQL database, MySQL, to store and load the collected news. As an illustration, I created table columns for the article title, categories, date, and author, plus two columns for the article body: the first holds the body with its HTML tags, and the second holds the body as plain text. FYI, you can easily strip HTML tags from text in PHP with the strip_tags function.

<?php
$text = '<p>This is some <strong>bold</strong> text.</p>';
$clean_text = strip_tags($text);
echo $clean_text; // Output: This is some bold text.
?>
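On real article bodies, strip_tags alone can glue words from adjacent blocks together and leaves HTML entities such as &nbsp; in place. The sketch below shows one way to get cleaner text for the plain-text column; the helper name cleanArticleBody is hypothetical and not part of the library or of Laravel.

```php
<?php
// Hypothetical helper: turn an HTML article body into plain text for analysis.
function cleanArticleBody(string $html): string
{
    // Put a space before every tag so words in adjacent blocks
    // are not glued together once the tags are removed.
    $spaced = str_replace('<', ' <', $html);
    $text = strip_tags($spaced);

    // Convert entities such as &amp; or &nbsp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace, including non-breaking spaces.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? '';

    return trim($text);
}

echo cleanArticleBody('<h1>Judul</h1><p>Isi&nbsp;berita &amp; opini.</p>');
// Output: Judul Isi berita & opini.
```

One trade-off of this approach: the extra space is also added around inline tags, so `<strong>bold</strong>,` becomes `bold ,`. For word-based sentiment scoring that is usually harmless.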

Here is an example of how I integrated the library with the article data in a controller:

<?php

namespace App\Http\Controllers;

use App\Models\Article;
use Illuminate\Http\Request;

class ArticleController extends Controller
{
    /**
     * Display a listing of the resource.
     *
     * @return \Illuminate\Http\Response
     */
    public function index()
    {
        // Include the SentimentAnalysis library
        require_once app_path('Path/To/php-sentianalysis-id/autoload.php');

        // Get all articles
        $articles = Article::all();

        // Initialize the SentimentAnalysis class
        $sentiment = new \PHPInsight\Sentiment();

        // Iterate over the articles
        foreach ($articles as $article) {
            // Get the clean_body text
            $text = $article->clean_body;

            // Calculate the sentiment scores
            $scores = $sentiment->score($text);

            // Categorize the sentiment
            $category = $sentiment->categorise($text);

            // Re-label the category (neutral is grouped with positive)
            if ($category === 'pos' || $category === 'neu') {
                $categoryLabel = 'Positif';
            } else {
                $categoryLabel = 'Negatif';
            }

            // Print the scores and category
            echo 'Article: ' . $article->title . '<br>';
            echo 'Sentiment scores: ' . json_encode($scores) . '<br>';
            echo 'Sentiment category: ' . $categoryLabel . '<br><br>';
        }
    }
}
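If you want a summary rather than per-article output, the re-labelling step from the controller above can be pulled into small plain functions. This is only a sketch: sentimentLabel and summariseSentiment are hypothetical names, and the stand-in classifier exists solely so the example runs without the library installed.

```php
<?php
// Hypothetical helper: map the classifier's category code to a display label.
// Neutral is grouped with positive, mirroring the controller above.
function sentimentLabel(string $category): string
{
    return in_array($category, ['pos', 'neu'], true) ? 'Positif' : 'Negatif';
}

// Tally labels for a list of texts. $classify is any callable that returns
// 'pos', 'neg' or 'neu'; in the controller it could be [$sentiment, 'categorise'].
function summariseSentiment(array $texts, callable $classify): array
{
    $summary = ['Positif' => 0, 'Negatif' => 0];
    foreach ($texts as $text) {
        $summary[sentimentLabel($classify($text))]++;
    }
    return $summary;
}

// Stand-in classifier (an assumption for this demo, not the library's logic).
$stub = fn (string $text): string => strpos($text, 'buruk') !== false ? 'neg' : 'neu';

$summary = summariseSentiment(
    ['Layanan buruk sekali', 'Harga cukup', 'Cuaca biasa saja'],
    $stub
);
echo json_encode($summary), PHP_EOL; // prints {"Positif":2,"Negatif":1}
```

Keeping the mapping in one function also makes it easy to change later, for example to report neutral articles separately.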

The output of the code above looks like this (after some improvements to the controller response and the view files):

Sentiment Analysis results

Demo Code

Of course, I can’t share or demo my whole code because it is a private project. Instead, I have deployed the php-sentianalysis-id library on phpsandbox.io, so you can see a live demo at https://3qeek.ciroue.com/.

References

https://github.com/yasirutomo/php-sentianalysis-id/

https://github.com/JWHennessey/phpInsight

https://github.com/masdevid/ID-OpinionWords