
Sentiment Analysis for Drugs/Medicines

Sentiment analysis is a form of text analysis. It applies a mix of statistics, natural
language processing (NLP), and machine learning to identify and extract subjective
information from text, for instance a reviewer’s feelings, thoughts, judgments, or
assessments about a particular topic, event, or company.

Nowadays the narrative of a brand is no longer built and controlled only by the company
that owns it. For this reason, companies constantly monitor blogs, forums, and other
social media platforms to gauge sentiment toward their own products and competitors'
products, and to learn how their brand resonates in the market. This kind of analysis
supports their post-launch market research. It is relevant to many industries,
including pharma and its drugs.

Business Need

The client is a manufacturer of drugs and medicines. We have to create a drug review
model that predicts whether or not a consumer is satisfied with a drug.

Dataset Provided

The dataset has three main columns: text pertaining to drugs, drug name, and
sentiment.
Sentiment falls into three classes: positive (0), negative (1), and neutral (2).

Challenges

• The dataset provided was highly imbalanced. A predictive model trained on
such a dataset tends to give predictions biased toward the majority class.

• The language used in this type of content often includes sarcasm. Also, in some
texts the consumer expresses positive sentiment for one drug along with
neutral or negative sentiment for another drug.
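
The imbalance described in the first challenge can be quantified before any modeling. The snippet below is a minimal sketch using only the Python standard library. The label encoding follows the dataset description, while the sample labels and the helper names `class_distribution` and `imbalance_ratio` are hypothetical and chosen for illustration only.

```python
from collections import Counter

# Label encoding as stated in the dataset description.
LABEL_NAMES = {0: "positive", 1: "negative", 2: "neutral"}


def class_distribution(labels):
    """Return {label_name: (count, share_of_total)} for integer labels."""
    counts = Counter(labels)
    total = len(labels)
    return {LABEL_NAMES[k]: (n, n / total) for k, n in counts.items()}


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical labels, invented for illustration (not the client's data).
sample_labels = [2] * 8 + [0] * 1 + [1] * 1
dist = class_distribution(sample_labels)
ratio = imbalance_ratio(sample_labels)
```

A ratio far above 1 suggests that plain accuracy may be misleading, since a model could score well by predicting the majority class alone.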

Solution

• Preprocessing- Data was preprocessed to find the most appropriate features
for predictive modeling.

• Synthesizing data- More data was created to reduce the imbalance of the
multiclass dataset by applying SMOTE (Synthetic Minority Over-sampling
Technique).

• Text Cleaning- Stop words were removed and sentences were tokenized.
Stemming and lemmatization were performed on the tokenized words. N-grams
were created to capture meaningful phrases in the data.

• Visualization- Bar plots and histograms were created to analyze the
distribution of words for particular sentiments. A bag-of-words (BOW) approach
was applied to review the frequency of positive or negative words in the text.
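
The text cleaning, n-gram, bag-of-words, and oversampling steps above can be sketched roughly as follows. This is not the project's actual code: it uses only the Python standard library instead of NLTK and imblearn, skips stemming and lemmatization, and relies on a tiny stop-word list that is an assumption. The function `smote_like` only illustrates the interpolation idea behind SMOTE and is not imblearn's implementation. All function names and sample reviews are hypothetical.

```python
import math
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; NLTK ships a fuller one).
STOP_WORDS = {"the", "a", "an", "is", "was", "it", "and", "my", "this",
              "for", "of", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all n-grams over a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, n_max=2):
    """Build a sorted vocabulary and count vectors over 1..n_max-grams."""
    feats = []
    for doc in docs:
        tokens = tokenize(doc)
        terms = []
        for n in range(1, n_max + 1):
            terms += ngrams(tokens, n)
        feats.append(Counter(terms))
    vocab = sorted(set().union(*feats))
    vectors = [[counts[term] for term in vocab] for counts in feats]
    return vocab, vectors


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical reviews, invented for illustration only.
docs = [
    "This drug was not helpful",
    "The drug is helpful for my pain",
    "Pain relief was quick",
]
vocab, vectors = bag_of_words(docs)
# Treat these vectors as one minority class and synthesize two more samples.
extra = smote_like(vectors, n_new=2)
```

Keeping bigrams alongside unigrams preserves phrases such as "not helpful", which a unigram-only bag of words would split into two unrelated counts. Note that a stop-word list containing "not" would remove that signal before the n-grams are built, so the list is worth reviewing for sentiment tasks.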

Results

• Increased effectiveness of clinical trial programs and drug promotion (increased
awareness of drug benefits)
• Reduction in human effort and errors
• Increased ROI on drug promotion
• Real-time insight analysis

Technologies Used

Python, BeautifulSoup, NLTK, gensim, Pandas, imblearn, Scikit-learn, MSSQL Server
