Refining Beats products with data-driven insights

May 2024
Role:
Data Analyst Extern

Overview

During my internship with Extern, I worked on a consumer insights project for Beats by Dr. Dre. The aim was to understand how consumers perceive Beats headphones and those of key competitors, including Apple, Bose, and Sony, through an in-depth analysis of customer review data.


My role covered the entire data lifecycle, from gathering and cleaning raw review data to conducting advanced analyses, including natural language processing. The objective was to identify product strengths and critical weaknesses and to highlight actionable areas for improvement. This work was designed to give Beats by Dr. Dre direct, data-driven intelligence to guide their future product initiatives.

The Goal

How can Beats by Dr. Dre leverage data-driven insights and sentiment analysis from consumer reviews to inform strategic product improvements?

Step 1: Collecting Data

Scraping Consumer Reviews

I started by trying to scrape Amazon reviews directly, but quickly ran into inconsistent data and frequent blocks. To get around this, I switched to the Oxylabs API, which was much more reliable for gathering large volumes of detailed customer reviews as JSON files. From there, I used Python scripts to parse the raw data into a clean CSV for each product, then combined those CSVs into one large dataset.
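A minimal sketch of what the parse-and-combine step could look like, using only the Python standard library. The JSON layout (a top-level "reviews" list), the use of the file name as the product label, and the helper names parse_review_file and combine_to_csv are all assumptions made for illustration; the real API response may be structured differently. For brevity, the sketch writes the combined CSV directly rather than one CSV per product first.

```python
import csv
import json
from pathlib import Path

# Column names follow the ones mentioned in this write-up; treating them as
# the JSON keys too is an assumption.
FIELDS = ["product", "profile-id", "rating", "timestamp", "review-content"]


def parse_review_file(json_path):
    """Flatten one product's raw JSON file into a list of row dicts."""
    path = Path(json_path)
    data = json.loads(path.read_text(encoding="utf-8"))
    rows = []
    # Assumes the file holds {"reviews": [...]}; adjust to the real layout.
    for review in data.get("reviews", []):
        row = {field: review.get(field) for field in FIELDS}
        row["product"] = path.stem  # file name stands in for the product
        rows.append(row)
    return rows


def combine_to_csv(json_paths, out_csv):
    """Parse every JSON file and write all rows into one combined CSV."""
    all_rows = []
    for json_path in json_paths:
        all_rows.extend(parse_review_file(json_path))
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(all_rows)
    return len(all_rows)
```

Calling combine_to_csv with a list of per-product JSON paths returns the number of rows written, which is a quick sanity check against the expected review count.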

Cleaning Data

I checked thoroughly for missing values, particularly in columns like rating and review-content, and removed any incomplete or null entries. Next, I identified and eliminated duplicate entries based on unique identifiers such as profile-id, so that each review appeared only once. Finally, I standardized formats across the dataset by converting the timestamp column to a uniform datetime format and confirming that the rating column was consistently numeric.
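The three cleaning passes can be sketched roughly as follows, in plain Python over a list of row dicts. The column names come from the description above; the function name clean_reviews, the assumption that timestamps are ISO 8601 strings, and the choice to drop rows with unparseable values are illustrative, not a record of the original scripts.

```python
from datetime import datetime

REQUIRED = ("rating", "review-content")


def clean_reviews(rows, key="profile-id"):
    """Drop incomplete rows and duplicates, then standardize types."""
    cleaned, seen = [], set()
    for row in rows:
        # 1. Skip rows where a required value is missing or blank.
        if any(row.get(col) in (None, "") for col in REQUIRED):
            continue
        # 2. Keep only the first row seen for each identifier.
        if row.get(key) in seen:
            continue
        # 3. Standardize formats: numeric rating, datetime timestamp.
        #    Assumes ISO 8601 timestamps such as "2024-05-01".
        try:
            rating = float(row["rating"])
            timestamp = datetime.fromisoformat(row["timestamp"])
        except (KeyError, TypeError, ValueError):
            continue  # unparseable values are treated as incomplete
        seen.add(row.get(key))
        cleaned.append({**row, "rating": rating, "timestamp": timestamp})
    return cleaned
```

In a combined multi-product dataset, a composite key (for example product plus profile-id) may be the safer choice, since one reviewer can review several products.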


Step 2: Exploratory Data Analysis

Data Calculations

I began by calculating key descriptive statistics such as mean, median, mode, variance, and standard deviation for product ratings across all Beats models and their competitors. This helped establish an understanding of central tendencies and data spread.
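As a rough illustration, the per-product statistics listed above can be computed with Python's built-in statistics module. The row layout (a "product" and a "rating" key), the function name rating_stats, and the use of sample rather than population variance are assumptions; the write-up does not say which variance definition was used.

```python
import statistics
from collections import defaultdict


def rating_stats(rows):
    """Return descriptive statistics of ratings, grouped by product."""
    by_product = defaultdict(list)
    for row in rows:
        by_product[row["product"]].append(float(row["rating"]))

    summary = {}
    for product, ratings in by_product.items():
        # Sample variance/std dev need at least two ratings.
        enough = len(ratings) > 1
        summary[product] = {
            "mean": statistics.mean(ratings),
            "median": statistics.median(ratings),
            "mode": statistics.mode(ratings),
            "variance": statistics.variance(ratings) if enough else 0.0,
            "std_dev": statistics.stdev(ratings) if enough else 0.0,
        }
    return summary
```

Comparing the mean and std_dev entries side by side across products is one simple way to spot a model whose ratings are both lower and more spread out than the rest.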


Visualizations

I developed comparative visualizations, including bar charts of average ratings and distribution plots of ratings. These visuals were crucial for quickly identifying high-level performance trends and, notably, pinpointing outlier products.
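A sketch of how the two chart types mentioned here might be produced with matplotlib. The plotting library, the row layout ("product" and "rating" keys), the 1 to 5 star bins, and the function names are assumptions for illustration only.

```python
from collections import defaultdict


def average_ratings(rows):
    """Return {product: mean rating} for a list of row dicts."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        entry = totals[row["product"]]
        entry[0] += float(row["rating"])
        entry[1] += 1
    return {product: s / n for product, (s, n) in totals.items()}


def plot_ratings(rows, out_path="ratings.png"):
    """Save a bar chart of averages next to per-product rating histograms."""
    # Imported here so the averaging helper works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    averages = average_ratings(rows)
    fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

    ax_bar.bar(list(averages), list(averages.values()))
    ax_bar.set_ylabel("Average rating")
    ax_bar.tick_params(axis="x", rotation=45)

    # One translucent histogram per product; bins centered on 1..5 stars.
    bins = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
    for product in averages:
        ratings = [float(r["rating"]) for r in rows if r["product"] == product]
        ax_hist.hist(ratings, bins=bins, alpha=0.5, label=product)
    ax_hist.set_xlabel("Rating")
    ax_hist.legend()

    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```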

Step 3: Sentiment Analysis & AI Integration

Natural Language Toolkit

I applied the SentimentIntensityAnalyzer from the Natural Language Toolkit (NLTK) to assign sentiment scores (positive, negative, neutral, compound) to each customer review, providing a quantitative measure of overall sentiment per product and feature. I also performed topic modeling and keyword extraction, analyzing frequently occurring terms within positive and negative reviews to pinpoint the specific aspects driving sentiment, such as "sound quality," "noise cancellation," "comfort," "battery life," and "durability."
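A simplified sketch of the scoring and keyword steps. SentimentIntensityAnalyzer and its polarity_scores method are part of NLTK and need the vader_lexicon resource (nltk.download("vader_lexicon")); the ±0.05 cutoff on the compound score is the commonly used VADER convention, not a threshold stated in this write-up. The helper names and the tiny stopword list are hypothetical, and the keyword step shown is plain term counting rather than full topic modeling.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "and", "for", "are", "but", "this", "that", "with",
             "was", "they", "have", "not", "very"}


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_reviews(texts):
    """Return VADER scores (neg, neu, pos, compound) for each review."""
    # Imported lazily; requires nltk plus the 'vader_lexicon' resource.
    from nltk.sentiment import SentimentIntensityAnalyzer
    analyzer = SentimentIntensityAnalyzer()
    return [analyzer.polarity_scores(text) for text in texts]


def top_terms(texts, n=10):
    """Count the most frequent non-stopword terms across reviews."""
    words = re.findall(r"[a-z']+", " ".join(texts).lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return counts.most_common(n)
```

Running top_terms separately on the reviews labeled positive and on those labeled negative gives two term lists that can be compared to see which product aspects show up on each side.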



Gemini AI

The reviews were fed into the Gemini AI model, which was configured using the GenerativeModel API. The model generated detailed summaries and highlighted key points from the reviews. This provided deeper insights into customer feedback, including common themes and specific areas of praise or concern.
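A hedged sketch of how reviews might be passed to Gemini, assuming the google-generativeai Python package (genai.configure, genai.GenerativeModel, and generate_content). The prompt wording, the max_reviews cap, and the function names are illustrative; the API key and model name are left as required parameters because the write-up does not specify them.

```python
def build_summary_prompt(product, reviews, max_reviews=50):
    """Assemble a plain-text prompt from a capped sample of reviews."""
    sample = reviews[:max_reviews]
    bullet_list = "\n".join(f"- {text.strip()}" for text in sample)
    return (
        f"Summarize the customer reviews below for {product}. "
        "List the main themes, what customers praise, and what they "
        "complain about.\n\n"
        f"{bullet_list}"
    )


def summarize_reviews(product, reviews, api_key, model_name):
    """Send the prompt to a Gemini model and return the response text."""
    # Imported lazily; assumes the google-generativeai package is installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)  # model name chosen by caller
    response = model.generate_content(build_summary_prompt(product, reviews))
    return response.text
```

Keeping prompt construction separate from the API call makes it easy to inspect exactly what text the model receives before spending any quota.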

Step 4: Findings

My analysis revealed that most Beats products, such as the Studio 3 and Studio Pro, maintained high average ratings, consistent with their premium positioning.

However, the Beats Solo 4 was a significant outlier, showing a notably lower average rating (3.85) and higher variability (1.43), signaling a critical area for improvement. Core strengths identified across the Beats line included sound quality, noise cancellation, and battery life. Key weaknesses, especially for the Solo 4, were concentrated around fit, durability, and comfort.

View Final Report

What I Learned

I gained practical experience in selecting effective data collection methods, recognizing the limitations of web scraping and the advantages of using APIs. I learned how to clean and preprocess data, and came to understand the critical role these steps play in ensuring data integrity. I also strengthened my skills in exploratory data analysis, data visualization, and sentiment analysis. Most importantly, I gained valuable experience in translating complex data findings into clear, actionable insights and summarizing them in a coherent report for business stakeholders.


Harini Avula - July 2025
