Building trust and credibility with our diners

GrubHub is the nation's leading online and mobile food ordering company dedicated to connecting hungry diners with local takeout restaurants. The company’s online and mobile ordering platforms allow diners to order directly from approximately 35,000 takeout restaurants in more than 900 U.S. cities and London. In 2014/15, I led the redesign of GrubHub's ratings & reviews system for both GrubHub and Seamless brands.

Goal

Increase diner engagement, perceived credibility, and usefulness of Ratings and Reviews on GrubHub and Seamless.

Challenges

When I joined GrubHub in 2014, the ratings systems on both GrubHub and Seamless were severely lacking. In order to identify challenges and set goals for this project, I conducted stakeholder interviews, analyzed existing data, and reviewed user research and customer feedback.

The following patterns emerged: (1) Diners didn’t feel like there was a high enough volume of ratings and reviews on GrubHub or Seamless, which caused them to look for review content on Yelp and Google before ordering. (2) Diners did not find our ratings and reviews credible or trustworthy (and we experienced a number of issues with rating fraud). (3) An extremely low percentage of our diners were rating and reviewing restaurants. (4) Diners were struggling to find information relevant to them in our ratings and reviews.

Previous versions of the site showed the Yelp rating next to the GrubHub rating. This wasn't ideal, since it caused diners to click to Yelp to read reviews, breaking the order flow. Additionally, many Yelp reviews focused on ambiance and service - characteristics that our diners didn't care about for take out or delivery.

"...the ratings on seamless are absolutely useless and probably the most important part of your selection process. Whether they're doctored or "cleaned" for a fee by the restaurants themselves or seamless is just not interested in preserving the integrity of the rating system, I definitely see 4 star overall rated restaurants with an abundance of 1 star reviews when I click into the reviews detail.”- AppStore Review

"Reviews could also use tweaking, such as separate options for delivery time and food review. I think this would be a better system, instead of people who post negative reviews for orders being late as opposed to the food being good.” - AppStore Review

“...One minus - inaccurate restaurant ratings. Have to open the restaurant’s profile, click on their rating, only then to see the actual rating history (sometimes not matching the posted rating, or not having any ratings at all.” - AppStore Review

"GrubHub Seamless needs to fix its awful ratings system, because there's no way this Chinese delivery joint has the best food in NYC" - Business Insider

User Research

I worked with one of our fabulous user researchers to put together a more in-depth research study that ran in parallel to a lot of the other exploratory work that we were doing. The primary goal of this study was to learn more about what makes content (like ratings and reviews) feel trustworthy to users.

We ran a three-day study with 30 participants, where we had them log onto a discussion board twice per day to respond to our prompts about the role of online ratings and reviews, why they have/haven't rated or reviewed anything before, trust and transparency, and ratings and reviews in the context of online delivery.

The discussion board format worked really well for us - all 30 participants took part throughout, and together they generated over 1,000 posts. The participants wrote candidly about their own experiences, posted screenshots and photos, and replied to each other's posts. Not only did this study help us get our heads around how people feel about ratings and reviews, it also helped us generate a lot of ideas about how to solve the trust problem.

Ideation

To help generate concepts and hypotheses, I facilitated sketching workshops with members of the Design and Discovery teams. The resulting sketches were a good jumping-off point for me to start creating some early prototypes for usability testing.

As our ideas began to take form (and after we gained some insights from usability testing), I mapped the proposed user flow and created storyboards to help convey our ideas to team members and stakeholders.

Prototyping & Usability Testing

To better understand how users would interpret alternate approaches to restaurant ratings and reviews, we conducted a series of 45-minute usability sessions with existing NYC Seamless and GrubHub diners. I created clickable prototypes so that we could test several design concepts, including (1) faceted ratings, (2) an improved review layout and “expert” designations, and (3) a rating/review input flow based on binary and multiple-choice questions.

We also ran a series of SMS tests with several thousand current diners, with the goal of determining what types of questions diners would be most likely to answer.

Testing Our Hypotheses

Concept 1: Faceted Ratings

Our first hypothesis was that displaying ratings in a more meaningful way by faceting (i.e., breaking out an overall rating by Food Quality, Delivery Speed, and Order Accuracy) would make diners’ decision-making process easier by surfacing the content they care about.

The faceted approach was well-received in usability testing because it broke down key aspects that the participants cared about (Is the food good? Did it arrive on time? Was my order correct?); they felt this approach would help them make a decision faster and more easily.

"Delivery speed would jump up in priority for me at lunch time or if I was really hungry or pressed for time.” - Participant

Concept 2: Improved review layout and "expert" designations

Our second hypothesis was that diners would feel that reviews were more trustworthy if we more clearly attributed them to a person (name and avatar). Additionally, we thought that some diners might trust a review more if they saw that it came from an “expert” (a reviewer who has ordered from and/or reviewed a lot of restaurants of a particular cuisine). In testing, assigning a color-coded thumbs up or down to each review allowed participants to quickly scan the reviews and understand the sentiment at a glance. The binary approach felt easier than stars for both consuming and submitting ratings. The impact of Expert reviews was mixed; however, some participants gauged trustworthiness by the number of reviews by a reviewer.

"I give more credit to reviewers who've been around longer...trusted, long-standing reviewers are the most helpful.” - Participant

Concept 3: Rating/Review Input Flow

Our third hypothesis was that diners would be more likely to rate or review an order if we asked them direct (possibly binary) questions about their experience (as opposed to asking them to rate it out of 5 stars). In testing, the input prompts felt “easy” and “quick” for participants to cycle through and answer. Many whizzed right through the prompts and felt this approach would make them rate more.

“A quick one word or reply is easiest. I could see myself clicking through and rating a few restaurants I’ve ordered from while waiting for my food to be delivered.” - Participant

“I wouldn't say I 40% liked a restaurant. This makes it easy to make snap-judgements. It’s easier than rating everything on a scale from 1 to 5.” - Participant

SMS Tests

We sent approximately twenty SMS message variants to diners 40 minutes after the end of their order window. Each message variant was sent to 1,000 diners selected from restaurants that received more than 60 orders in the past 30 days. We focused on specific aspects of the dining experience and tested three types of questions: binary, multiple-choice, and open-ended.
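As a rough illustration only (not the production logic), the targeting rules for these SMS tests could be sketched in Python as follows. The function and constant names are hypothetical; only the thresholds (more than 60 orders in 30 days, a 40-minute delay, 1,000 diners per variant) come from the test setup described above.

```python
from datetime import datetime, timedelta

# Thresholds taken from the test description; names are hypothetical.
MIN_ORDERS_30_DAYS = 60             # restaurant needs MORE than this many orders
SEND_DELAY = timedelta(minutes=40)  # delay after the end of the order window
DINERS_PER_VARIANT = 1000           # sample size for each message variant


def is_eligible_restaurant(orders_last_30_days: int) -> bool:
    """True if the restaurant had more than 60 orders in the past 30 days."""
    return orders_last_30_days > MIN_ORDERS_30_DAYS


def sms_send_time(order_window_end: datetime) -> datetime:
    """Time to send the SMS: 40 minutes after the order window ends."""
    return order_window_end + SEND_DELAY
```

For example, an order window ending at 12:00 would trigger the message at 12:40, and a restaurant with exactly 60 orders in the period would not qualify.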

We discovered that the more straightforward the question, the more responses we received. Our best response rates (over 50%!) came from binary questions that didn't require much interpretation from the diner, such as "Was your delivery on time?", "Was the food good?", and "Was your order correct?".

Results

GrubHub unveiled the new ratings & reviews system in July 2016. Each day, the new system collects, on average, 70,000 data points from its customers. Between December and July, while it was in a limited beta test, the new system collected more data than GrubHub's previous system had collected in 10 years. The data from these responses is then aggregated into an overall star rating as well as a score for each facet of the ordering experience, providing benefits for diners and restaurant owners.
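The exact aggregation formula isn't described here, so the following is only a minimal sketch of one way binary answers could roll up into facet scores and an overall star rating. It assumes each facet score is the share of positive answers and that the overall rating is an unweighted average mapped linearly onto 1 to 5 stars; the function names and the mapping are assumptions, not the production method.

```python
from collections import defaultdict

# Facets named in the faceted-ratings concept.
FACETS = ("food_quality", "delivery_speed", "order_accuracy")


def facet_scores(responses):
    """Return facet -> share of positive answers.

    `responses` is an iterable of (facet, is_positive) pairs, one per
    answered binary question. Facets with no answers are omitted.
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    for facet, is_positive in responses:
        total[facet] += 1
        if is_positive:
            positive[facet] += 1
    return {f: positive[f] / total[f] for f in FACETS if total[f]}


def overall_stars(scores):
    """Map the unweighted mean facet score onto 1-5 stars (assumed mapping)."""
    if not scores:
        return None  # no data yet, so no rating to show
    mean = sum(scores.values()) / len(scores)
    return round(1 + 4 * mean, 1)
```

With this sketch, a restaurant whose diners all answered "yes" on food and accuracy but split evenly on delivery speed would show a 50% delivery score alongside a lower overall rating, which is the kind of breakdown the faceted display is meant to surface.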

Read about the new ratings & reviews system in Motherboard and the Chicago Tribune.

"At Chopt we're focused on providing our diners with great local ingredients and a top of the line experience whether they order in one of our many locations or online. The new ratings and reviews process provides us with feedback on each level of the delivery experience -- time, accuracy and taste -- so that we are able to maintain our high quality standards for new and existing customers."

- Tom Kelleher, SVP of Operations for Chopt Salad Company.

Bringing Ratings to the Apple Watch

The Apple Watch was released in April 2015, and our new ratings input system seemed like the perfect use case for the watch. It was simple, binary, and required minimal user interaction. Additionally, we knew from usability testing that rating past orders was something fun that users might like to do when they had some downtime - like waiting for their order to arrive.

I designed the ratings flow for Version 1.0 of the Seamless Apple Watch app. I also designed the Nearby feature, the Home screen, and assisted with the overall visual design and user experience.

Facebook Messenger Concept

We were really excited about the release of custom layouts for business on Facebook Messenger, so I mocked up a concept of how we might like to use the platform to feed our new ratings system.

This concept would be similar to the ratings input flow that we are planning on using on the site, in the apps, and via SMS, but Messenger would allow us to collect this information from our diners in a conversational and rich way, which we think could help us gather more honest responses about their delivery or takeout experience.