Beer Samples
Beer Recommendation System

The project analyzes user-generated beer reviews to identify the attributes that influence customer preferences and perceptions. Using natural language processing techniques, it looks for words and phrases commonly associated with positive or negative reviews, as a basis for building a beer recommendation system.

Project Type

Tools

The beer review analysis identified essential beer characteristics through word frequency analysis, associating attributes such as 'chocolate', 'coffee', and 'bourbon' with positive reviews. The main challenges were handling unstructured text in which meaningful content was sparse, compounded by limitations of the scraping tools that restricted access to data. Using Python's NLTK library, BeautifulSoup, and Pandas, the project combined web scraping with natural language processing (NLP) and a range of visualization techniques to present the findings.

Looking ahead, the project could incorporate more sophisticated NLP models such as BERT for better contextual understanding, expand the dataset, and refine the recommendation system to tailor suggestions to individual user preferences, with the aim of improving the accuracy and relevance of its beer recommendations.

Summary
Screenshot 2024-07-30 at 6.47.31 AM.png
Screenshot 2024-07-30 at 6.48.04 AM.png

The graph shows that positive terms are more prevalent in higher-rated reviews, while negative terms appear more frequently in lower-rated reviews. It illustrates how specific descriptors align with overall satisfaction levels.

This visualization shows strong correlations between certain pairs of terms (such as "rich" and "creamy", or "bitter" and "sour"), meaning these terms often appear together in the same review. Such pairings suggest common flavor profiles that users tend to prefer or dislike.
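The page does not say how the term correlations were computed. One common choice, assumed here, is the Pearson correlation between binary "term appears in this review" indicators, which is known as the phi coefficient. Values near 1 mean two terms tend to appear in the same reviews, and values near -1 mean they rarely co-occur. The function names and the toy reviews below are hypothetical and only illustrate the technique.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[int], y: list[int]) -> float:
    """Pearson correlation; on 0/1 vectors this equals the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A term present in all or in no reviews has no variance; the
        # correlation is undefined, so this sketch reports 0.0 by convention.
        return 0.0
    return cov / sqrt(vx * vy)


def term_pair_correlations(
    reviews: list[list[str]], terms: list[str]
) -> dict[tuple[str, str], float]:
    """Phi coefficient for every pair of terms, strongest positive first."""
    token_sets = [set(tokens) for tokens in reviews]
    # One 0/1 presence vector per term, aligned across reviews.
    presence = {t: [int(t in s) for s in token_sets] for t in terms}
    pairs = {
        (a, b): pearson(presence[a], presence[b])
        for a, b in combinations(terms, 2)
    }
    return dict(sorted(pairs.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical, already-preprocessed reviews.
reviews = [
    ["rich", "creamy", "stout"],
    ["rich", "creamy"],
    ["bitter", "sour"],
    ["bitter", "sour", "thin"],
]
corr = term_pair_correlations(reviews, ["rich", "creamy", "bitter", "sour"])
print(corr[("rich", "creamy")])  # 1.0
print(corr[("rich", "bitter")])  # -1.0
```

A matrix of these pairwise values is the kind of input a correlation heatmap is usually drawn from. With real data, rare terms would need a minimum-frequency filter first, because the phi coefficient for terms seen in only a few reviews is unstable.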

The Process

Data Acquisition and Preparation

Data was scraped from the BeerAdvocate website, yielding roughly 5,000 to 6,000 reviews for analysis. Initial exploratory data analysis focused on identifying the most common words and their frequencies, which helped reveal the prevalent themes in the reviews.

The raw data was cleaned to remove irrelevant sections and normalized to a consistent text format, which included converting all review text to lowercase. Stopword removal and stemming were then applied using NLTK to reduce the text to its most informative components.
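The normalization and stopword-removal steps described above can be sketched as follows, assuming each review is already a plain string. To keep the sketch dependency-free, a small hand-picked stopword set stands in for NLTK's English stopword list, and a simple regular expression replaces an NLTK tokenizer. The `preprocess` name and its parameters are illustrative, not the project's code. Stemming is passed in as an optional callable so that NLTK's stemmer can be plugged in where NLTK is installed.

```python
import re
from typing import Callable, Optional

# Illustrative stopword set (assumption). The project used NLTK's list,
# e.g. set(nltk.corpus.stopwords.words("english")), which is far larger.
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "is", "it", "this",
    "of", "with", "to", "in", "i", "was", "very",
}

# Lowercase alphabetic tokens, keeping simple contractions such as "don't".
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def preprocess(text: str, stem: Optional[Callable[[str], str]] = None) -> list[str]:
    """Lowercase a review, tokenize it, drop stopwords, and optionally stem."""
    tokens = TOKEN_RE.findall(text.lower())
    kept = [t for t in tokens if t not in STOPWORDS]
    # Apply the stemmer only if one was supplied.
    return [stem(t) for t in kept] if stem else kept


print(preprocess("This stout is RICH and creamy, with a hint of coffee."))
# ['stout', 'rich', 'creamy', 'hint', 'coffee']
```

With NLTK available, `from nltk.stem import PorterStemmer` followed by `preprocess(text, stem=PorterStemmer().stem)` would add Porter stemming, so that variants such as "hints" and "hint" collapse to one token.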

Data Analysis

  • Keyword Extraction: Used frequency analysis to identify the most commonly mentioned words in the reviews. Term counts were computed with NLTK's FreqDist class, which helped pinpoint key terms associated with positive and negative sentiment.

  • Sentiment Analysis: Applied VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon- and rule-based sentiment analysis tool specifically attuned to sentiment expressed in social media text. Each review was scored and categorized as positive, neutral, or negative using thresholds on its compound score.

  • Trend Analysis: Analyzed the relationship between specific keywords and overall review ratings to determine which beer attributes (like flavor profiles) correlated strongly with higher ratings.
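The keyword-extraction and sentiment-analysis steps above could fit together as shown in this sketch. Reviews are bucketed by their VADER compound score, and the most frequent terms are then counted within each bucket. The ±0.05 cut-offs are an assumption: they are the thresholds commonly suggested in VADER's documentation, but the project's actual thresholds are not stated. Compound scores are passed in precomputed so the sketch runs without NLTK. With NLTK, they would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in `nltk.sentiment.vader`, after downloading the `vader_lexicon` resource. Here `collections.Counter` stands in for `FreqDist`, which is a subclass of it. Function names and sample data are hypothetical.

```python
from collections import Counter

# Assumed thresholds on the VADER compound score, which ranges from -1 to 1.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def sentiment_label(compound: float) -> str:
    """Map a VADER compound score to positive / neutral / negative."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def top_terms_by_sentiment(
    reviews: list[tuple[list[str], float]], n: int = 10
) -> dict[str, list[tuple[str, int]]]:
    """Count tokens separately for each sentiment bucket.

    Each review is (tokens, compound_score). Tokens are assumed to be the
    output of the cleaning step; the score is assumed to be computed on the
    raw review text.
    """
    counts = {label: Counter() for label in ("positive", "neutral", "negative")}
    for tokens, compound in reviews:
        counts[sentiment_label(compound)].update(tokens)
    return {label: c.most_common(n) for label, c in counts.items()}


sample = [
    (["rich", "smooth", "coffee"], 0.81),
    (["rich", "chocolate"], 0.64),
    (["bitter", "sour"], -0.42),
]
print(top_terms_by_sentiment(sample, n=1)["positive"])  # [('rich', 2)]
```

The sketch scores raw text but counts cleaned tokens. This split matters because VADER uses capitalization, punctuation, and negation words (several of which are stopwords) as cues, and these would be lost if sentiment were scored after cleaning.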

The analysis showed that terms such as "rich", "smooth", and "balanced" were strongly associated with higher-rated beers, suggesting that users prefer beers with these characteristics. Negative reviews frequently contained terms such as "bitter" and "sour", indicating that these traits were less favored. The sentiment analysis found that positive reviews substantially outnumbered negative ones, indicating overall satisfaction with the beers reviewed on the site.
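The page does not say exactly how the relationship between keywords and ratings was measured. This sketch assumes a simple measure: the gap between the mean rating of reviews that mention a keyword and the mean rating of reviews that do not. It uses only the standard library. The project used Pandas, where a boolean mask plus `groupby` would do the same job. The function name and the toy ratings are illustrative only.

```python
from statistics import mean


def keyword_rating_gap(
    reviews: list[tuple[list[str], float]], keywords: list[str]
) -> dict[str, float]:
    """Mean rating with the keyword minus mean rating without it.

    Keywords that appear in every review, or in none, are skipped because
    one of the two groups would be empty. Results are sorted from the most
    positive gap to the most negative.
    """
    token_sets = [(set(tokens), rating) for tokens, rating in reviews]
    gaps: dict[str, float] = {}
    for kw in keywords:
        with_kw = [r for toks, r in token_sets if kw in toks]
        without_kw = [r for toks, r in token_sets if kw not in toks]
        if with_kw and without_kw:
            gaps[kw] = mean(with_kw) - mean(without_kw)
    return dict(sorted(gaps.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical (tokens, rating) pairs.
toy = [
    (["rich", "smooth"], 4.5),
    (["rich", "balanced"], 4.25),
    (["bitter", "sour"], 2.5),
    (["sour", "thin"], 3.0),
]
print(keyword_rating_gap(toy, ["rich", "smooth", "sour"]))
# {'rich': 1.625, 'smooth': 1.25, 'sour': -1.625}
```

On real data, a minimum-count filter would be needed. A gap computed from only two or three mentions is mostly noise.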
