Application: Amazon Reviews

(Section 12.5)


Make sure these are attached:


Amazon Reviews



Each row of the data frame contains:

  • the 1-5 rating that the reviewer assigned to the book;
  • a URL fragment that locates the review online;
  • the summary-title of the review;
  • the content of the review itself.

The Hunger Games

hunger <- subset(reviews, book == "hunger")
[1] 24027
[1] "\"<span class=\"\"a-size-base review-text\"\">Clearly <a class=\"\"a-link-normal\"\" href=\"\"/Gregor/dp/0439678137\"\">Gregor</a> was merely the prelude. Suzanne Collins, you've been holding out on us, missy. As an author we were accustomed to your fun adventures involving a boy, his sister, and a world beneath our world. I think it's fair to say that we weren't really expecting something like The Hunger Games. At least I wasn't. But reading it gave me a horribly familiar feeling. There is a certain strain of book that can hypnotize you into believing that you are in another time and place roughly 2.3 seconds after you put that book down. <a class=\"\"a-link-normal\"\" href=\"\"/Life-As-We-Knew-It/dp/0152061541\"\">Life As We Knew It</a> by Susan Beth Pfeffer could convince me that there were simply not enough canned goods in my home. And The Hunger Games? Well as I walked down the street I was under the disctinc impression that there were hidden cameras everywhere, charting my progress home. Collins has written a book that is exciting, poignant, thoughtful, and breathtaking by turns. It ascends to the highest forms of the science fiction genre and will create all new fans for the writer. One of the best books of the 2008 year.<br/><br/>Life in District 12 isn't easy for Katniss and her family. Ever since her father died the girl has spent her time saving her mother and little sister Prim from starvation by hunting on forbidden land. But worst of all is reaping day. Once a year the government chooses two children from each of the twelve districts to compete against one another in a live and televised reality show. Twenty-four kids and teens enter, and only one survives. When Prim's name is called, Katniss exchanges herself without hesitation to compete alongside the baker's boy Peeta. To survive in this game you need to win the heart of your audience, and so District 12's trainers come up with a plan. 
Why not make it as if Peeta and Katniss were in love with one another? But in a game where only one person can live, Katniss will have to use all her brains, wits, and instincts to determine who to trust and how to outwit the game's creators.<br/><br/>I described the plot of this book to my husband, particularly the part where Katniss and Peeta fake being in love to gain the audience's approval and the very first thing he said was, \"\"Oh! That's the plot of <a class=\"\"a-link-normal\"\" href=\"\"/They-Shoot-Horses-Don-t-They/dp/185242401X\"\">They Shoot Horses, Don't They?</a>\"\" Then I mentioned that it took place in the future and that government leaders set up teenagers to fight one another to the death and he said, \"\"<a class=\"\"a-link-normal\"\" href=\"\"/Battle-Royale/dp/1427807531\"\">Battle Royale</a>\"\". So sure, there are parts of this plot that have been done before. You could say it's <a class=\"\"a-link-normal\"\" href=\"\"/The-Game/dp/B000069HZP\"\">The Game</a> meets <a class=\"\"a-link-normal\"\" href=\"\"/Spartacus/dp/0783226039\"\">Spartacus</a> with some <a class=\"\"a-link-normal\"\" href=\"\"/Survivor/dp/B0001ZDKXI\"\">Survivor</a> thrown in for spice. But that's not what makes a book good or bad, is it? Some of the greatest works of literature out there, regardless of the readerships' age, comes about when an author takes overdone or familiar themes and then makes them entirely new through the brilliance of their own writing. Harry Potter wouldn't have been any great shakes if it weren't for Rowling's storytelling. Similarly, Collins takes ideas that have certainly seen the light of day before and concocts an amazingly addictive text. About the time you get to the fifth chapter that ends with a sentence that forces you to read on, you're scratching your head wondering how the heck she DOES that.<br/><br/>Your story often rests on the shoulders of the protagonist. Is this a believable character? Do you root for him or her? 
Because basically it is a very hard thing to create a \"\"good\"\" person on the page that your reader is going to fall in love with. Because we readers know that we are flawed, we are often inclined to side with the similarly flawed people we meet between a book's covers. Katniss, on the other hand, is so good in so many ways. She sacrifices herself for her sister. She tries to save people in the game. But there's almost a jock mentality to her too. Katniss can figure out the puzzles and problems in the game, but when it comes to emotional complexity she's sometimes up a tree. Most remarkable to me was the fact that Katniss could walk around, oblivious to romance, and not bug me. Seriously, nothing gets under my skin faster than heroines who can't see that their fellow fellas are jonesing for them. You just want to bonk the ladies upside the head with a brick or something. The different here is maybe the fact that since Katniss knows that Peeta has to play a part, she uses that excuse (however unconsciously) to justify his seeming affection for her. Thems smart writing.<br/><br/>Oh! And did I mention the dialogue at all? The humor? Yep, there's humor. We're talking about a story where adolescents hunger for blood, and Katniss is getting in lines about her trainers like, \"\"And then, because it's Effie and she's apparently required by law to say something awful...\"\" Good stuff. The words pop off the page. And then there's the fact that we're dealing with a dystopian novel where the author has somehow managed to create a believable future. No faux slang here, or casual references to extinct dolphins. There are some animals that were scientifically altered, but you can't have a future without a couple cool details like that, right?<br/><br/>In general, this book throws a big fat wrench into the boy book/girl book view of child/teen literature. People love to characterize books by gender. It stars a boy? Boy book. A girl? Girl book. 
Now take a long lengthy look at the first book in the Hunger Games Trilogy. It stars a girl... and a boy too. There's a lot of hunting, fighting, and survival... and a lot of romance, kisses, and cool outfits. There's strategy, the world's most fabulous fashion designer, weapons and a girl who knows how to fight. This is not a book that quietly slots into our preconceived stereotypes. And you know what happens to books that span genders? They sell very well indeed. That is, if you can get both boys and girls to read them.<br/><br/>The age range? Well, for most of this story I would have said ten and up. I mean, yeah the basic premise is that a lot of teenagers go around killing one another, and sure there's some romance to deal with, but none of it really seems inappropriate... until a final death scene appears in the book. I won't give any details, but suffice it to say it is gruesome. There are definite horror elements to it as well, so with that in mind I am upping my recommendation to 12 and up. I'm sure that there are 10-year-olds out there who've seen much worse stuff on cable, just as there are 12-year-olds who'll freak out ten pages in. Still, I'm more comfortable recommending it for the older kids rather than the younger. You'll see why.<br/><br/>It occurs to me that there has never been a quintessential futuristic gladiator book for kids. That is undoubtedly the roughest term you can give this book. Now I'm not a person who cries easily when she reads something, particularly something for kids. Yet as I was taking a train to Long Island I found myself tearing up over significant parts of this story. It's good. And it's so ridiculous that a work of science fiction like this could even be so good. You think of futuristic arena tales and your mind instantly sinks to the lowest common denominator. What Collins has done here is set up a series that will sink its teeth into readers. The future of this book will go one of two ways. 
Either it will remain an unappreciated cult classic for years to come or it will be fully appreciated right from the start and lauded. My money lies with the latter. A contender in its own right.</span>\""

Part of the Review

There is a certain strain of book that can hypnotize you into believing that you are in another time and place roughly 2.3 seconds after you put that book down. <a class=\"\"a-link-normal\"\" href=\"\"/Life-As-We-Knew-It/dp/0152061541\"\">Life As We Knew It</a> by Susan Beth Pfeffer could convince me that there were simply not enough canned goods in my home.

The author has linked to another book sold on Amazon:


What other books do Hunger Games reviewers link to?

The key to the analysis is that links to Amazon books follow a template. The following regex pulls out the part of the link that is specific to the book:

(?<=<a class=\"\"a-link-normal\"\" href=\"\"/)(.+?)(?=\"\">)

A Regex String for R

Hence we store the regex as an R string:

linkPattern <- "(?<=<a class=\"\"a-link-normal\"\" href=\"\"/)(.+?)(?=\"\">)"
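As a quick check of how the lookbehind and lookahead behave, here is a minimal sketch that applies the pattern to a single link copied from the review shown earlier. The name `snippet` is introduced only for illustration, and the sketch assumes the stringr package is installed.

```r
library(stringr)

# Same pattern as in the text:
#   (?<=...)  lookbehind: the match must be preceded by the start of the anchor tag
#   (.+?)     lazy match: as few characters as possible
#   (?=...)   lookahead: the match must be followed by the doubled quotes and ">"
linkPattern <- "(?<=<a class=\"\"a-link-normal\"\" href=\"\"/)(.+?)(?=\"\">)"

# One link from the review, written with the same doubled quotes as the data
# (illustrative variable, not part of the original analysis).
snippet <- "he said, <a class=\"\"a-link-normal\"\" href=\"\"/Battle-Royale/dp/1427807531\"\">Battle Royale</a>."

# Only the book-specific fragment is returned, because the lookarounds
# are checked but not included in the match.
str_extract(snippet, linkPattern)
# [1] "Battle-Royale/dp/1427807531"
```

The lazy quantifier matters here: with a greedy `.+`, a review containing several links could yield one long match running from the first link to the last.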

Now we can count the number of books linked to in each review:

hungerLinks <-
  hunger %>% 
  mutate(linkCount = str_count(content, linkPattern))
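To see what `str_count()` contributes inside `mutate()`, the following sketch runs it on a small character vector. The vector `toy` is made up for illustration from links that appear in the review above; it is not part of the `reviews` data.

```r
library(stringr)

linkPattern <- "(?<=<a class=\"\"a-link-normal\"\" href=\"\"/)(.+?)(?=\"\">)"

# Three illustrative strings containing zero, one and two links.
toy <- c(
  "No links in this one.",
  "<a class=\"\"a-link-normal\"\" href=\"\"/Gregor/dp/0439678137\"\">Gregor</a> was the prelude.",
  paste0(
    "<a class=\"\"a-link-normal\"\" href=\"\"/The-Game/dp/B000069HZP\"\">The Game</a> meets ",
    "<a class=\"\"a-link-normal\"\" href=\"\"/Spartacus/dp/0783226039\"\">Spartacus</a>"
  )
)

# str_count() is vectorized: it returns one count per element,
# which is why it can fill a new column row by row.
toyCounts <- str_count(toy, linkPattern)
toyCounts
# [1] 0 1 2
```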

A Table

hungerLinks %>% 
  group_by(linkCount) %>% 
  summarise(n = n())
# A tibble: 10 × 2
   linkCount     n
       <int> <int>
 1         0 23854
 2         1   110
 3         2    34
 4         3    15
 5         4     7
 6         5     3
 7         6     1
 8         7     1
 9         8     1
10         9     1
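The same kind of tally can also be written with dplyr's `count()`, which is shorthand for `group_by()` followed by `summarise(n = n())`. The sketch below compares the two on a small made-up data frame (`toyLinks` is hypothetical, not the real `hungerLinks`).

```r
library(dplyr)

# Hypothetical link counts for five reviews.
toyLinks <- data.frame(linkCount = c(0L, 0L, 1L, 2L, 0L))

# The form used in the text.
viaSummarise <- toyLinks %>%
  group_by(linkCount) %>%
  summarise(n = n())

# The shorthand form.
viaCount <- toyLinks %>%
  count(linkCount)

# Both give one row per distinct linkCount, with the number of reviews in n.
viaCount
```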

Finding Titles

One reviewer linked to nine books! Let’s find them and prepend the base URL to each link:

hungerLinks %>% 
  filter(linkCount == max(linkCount)) %>%  # get the case having most links
  .$content %>%    # get just the content of the review,
                   # a character vector (of length 1 since there
                   # is only one review with the max number of links)
  str_extract_all(pattern = linkPattern) %>% # get the matches,
                                             # but this is a list of
                                             # length 1 ...
  unlist() %>%     # ... so unlist it into a character vector
  str_c("", .)  # prepend the base URL to each link
[1] ""     
[2] ""           
[3] ""                                
[4] ""                      
[5] ""
[6] ""           
[7] ""     
[8] ""                                       
[9] ""
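The whole pipeline can be tried end to end on a small made-up data frame. In this sketch `toyReviews` is hypothetical, `pull(content)` stands in for `.$content`, and `baseURL` is a placeholder value chosen only so that the example runs; substitute the real base URL when working with the actual data.

```r
library(dplyr)
library(stringr)

linkPattern <- "(?<=<a class=\"\"a-link-normal\"\" href=\"\"/)(.+?)(?=\"\">)"

# Placeholder base URL (an assumption for illustration only).
baseURL <- "https://example.com/"

# Hypothetical reviews: the second one has the most links.
toyReviews <- data.frame(
  content = c(
    "<a class=\"\"a-link-normal\"\" href=\"\"/Gregor/dp/0439678137\"\">Gregor</a> was the prelude.",
    paste0(
      "<a class=\"\"a-link-normal\"\" href=\"\"/The-Game/dp/B000069HZP\"\">The Game</a> meets ",
      "<a class=\"\"a-link-normal\"\" href=\"\"/Spartacus/dp/0783226039\"\">Spartacus</a>"
    )
  ),
  stringsAsFactors = FALSE
)

toyUrls <- toyReviews %>%
  mutate(linkCount = str_count(content, linkPattern)) %>%
  filter(linkCount == max(linkCount)) %>%      # keep the review with the most links
  pull(content) %>%                            # character vector of length 1
  str_extract_all(pattern = linkPattern) %>%   # list of length 1
  unlist() %>%                                 # flatten to a character vector
  str_c(baseURL, .)                            # prepend the base URL

toyUrls
# [1] "https://example.com/The-Game/dp/B000069HZP"
# [2] "https://example.com/Spartacus/dp/0783226039"
```

Note that `str_extract_all()` returns a list with one element per input string, which is why the `unlist()` step is needed before the base URL is pasted on.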