Reading in an age of change:
a collaborative project by Meanjin and Overland.

The pirate code

Posted on Wednesday 28 July by Emmett Stinson.

Right now publishers are abuzz with discussions of ‘book futures’ and the digital revolution, but there is still almost complete uncertainty surrounding even the most basic issues. What, if any, devices will become standard for electronic reading? How will books be distributed? What formats will be used? A recent stand-off over pricing resulted in all of Pan Macmillan’s e-book titles being temporarily unavailable on Amazon’s website, demonstrating that the industry can’t even agree on what digital books should cost.

This uncertainty has necessarily been accompanied by a great deal of anxiety, especially given the difficulties experienced in other industries – such as music, television and film – that have entered the digital domain. As these sectors discovered, digital distribution facilitates piracy: that is, the unlawful duplication and distribution of intellectual property without due compensation to its owner.

Debates over piracy usually focus on the validity of current law, the ethics of accessing copyrighted goods illegally, or the virtues of open source, copyleft, creative commons or similar ‘free’ notions of intellectual property.1 At the heart of the argument, however, lies a purely technological issue: the nature of digital data. The chief benefit of digitised information is that it can be easily and virtually instantaneously duplicated. Ethical and legal condemnations of copyright pirates often miss this point: the pirates are simply utilising the inherent qualities of digital technology. As McKenzie Wark has aptly noted, ‘In its reproducibility, the digital is always neither theft nor property, unless the artifice of the law makes it so.’2

In January 2010, the American corporation Attributor released a study on book piracy in the US that tracked illegal downloads of 913 titles listed on Amazon. More than nine million copies of these books were, it concluded, illegally downloaded, resulting in potential losses to the publishing industry of US$2.75 to $3 billion.

There are good reasons to be somewhat sceptical of these conclusions. Firstly, an illegal download doesn’t necessarily correspond to a lost sale (that is, just because someone downloads something for free doesn’t mean they would have bought it). Secondly, download statistics can be inflated by ‘bots’ – programs that are designed to perform tasks on the internet automatically. Most importantly, the study was conducted to advertise the company’s services: for a fee, Attributor provides ‘the monitoring and enforcement’ of copyright by identifying illegal distribution of intellectual property on the web.3 Its services are already employed by such publishers as the Associated Press and John Wiley & Sons.

Nonetheless, whatever the limitations of the Attributor study, it is probably true that book piracy is practised far more widely than even those in the industry suspect. A quick look at two of the better-developed sites that provide links to illegally digitised books – Gigapedia and AvaxHome – will give a user access to upwards of 500 000 books at no cost.4 This figure is more than double the 200 000 titles that were available for Amazon’s Kindle e-book reader when it launched in Australia in October 2009.5 To offer a physical comparison, both the Barr Smith Library at the University of Adelaide and the Matheson Library at Monash University have physical holdings of just over two million books. The pirated holdings of these two sites alone therefore already amount to roughly a quarter of a major research library’s collection, and so, with a major transition of the industry into e-publishing, the availability of pirated books could greatly exceed the holdings of most research libraries.

These figures are even more surprising given that book piracy is often very labour-intensive. To convert a physical book into a digital format, pirates must scan every page. If this is done with a standard home scanner, the process can take longer than photocopying an entire book. The scans must then be run through Optical Character Recognition (OCR) software that converts the images into text. OCR technology is not 100 per cent accurate, so diligent pirates also need to review the entire text for errors before converting it into one of the many available publishing formats and placing it onto the internet for download.

There are ways in which this process can be sped up, using cheap materials and a bit of ingenuity. Last year, Daniel Reetz described building a DIY book scanner (which strangely resembles both Johnny Five, the robot from the 1986 film Short Circuit, and the eponymous main character of 2008’s WALL·E) at home for under $300, using timber, acrylic, two digital cameras and a few lights, as illustrated.6 Reetz estimated that while scanning a book on a standard flatbed scanner takes about three and a half hours, his DIY set-up could scan a 400-page book in about twenty minutes.7 An impressive increase in efficiency, to be sure, but not exactly rip-and-burn technology either.

Such DIY projects testify to both the resourcefulness and determination of would-be pirates. Those determined to violate copyrights will, it seems, inevitably find ways to elude legislation and lawsuits. But why do people pirate books in the first place?

Portrait of a pirate as a young man

Given the time commitment required to digitise books illegally, it’s difficult to know exactly what compels people to do so. Even the book pirate known as The Real Caterpillar speculates that the motivations are multiple and that pirates comprise various groups of people, including ‘bibliophiles who want to share their favorite books with others … habitual pirates who want to be the first to upload a new release, and people with some other weird agenda that only they understand.’8 Perhaps other pirates are motivated by anti-capitalist political beliefs or by the desire to participate in sharing knowledge in online communities.

In order to examine those who download book files, I’ll look at two particular sites: the aforementioned Gigapedia and AvaxHome.9 These examples can only be considered emblematic; they do not provide comprehensive data on internet piracy. That said, they are two of the largest illegal book sites and therefore have much to suggest about e-book piracy practices.

Neither site actually hosts any copyrighted material. Rather, both sites post links to file-sharing services such as RapidShare, Megaupload and iFile.it, where the pirated e-books are actually stored. Gigapedia is better organised and has roughly double the content of AvaxHome. AvaxHome is freely searchable by anyone on the web; to access Gigapedia, you must register and create a username and password. In this way, Gigapedia acts like an online community, with discussion boards and FAQ sections that allow users to interact and ask questions about the site. Users are even given rankings from ‘newbie’ to ‘VIP’, with each level offering more extensive privileges; users move up the ranks by contributing more e-books.

Book piracy is a global phenomenon. Australia accounts for only 1.1 per cent of traffic on AvaxHome and less than 0.5 per cent on Gigapedia. As the tables below suggest, the largest national groups of users come from developing nations, where limited economic resources might make buying expensive textbooks virtually impossible. Some of these nations also have various levels of censorship in place, whether officially sanctioned or socially reinforced. Piracy may, in fact, enable people to access information that is otherwise unavailable, and so can serve an educational and democratising function that is usually ignored by ethical and legal condemnations.

The primary users of both sites are overwhelmingly males between the ages of 18 and 34, without children.10 This is perhaps not surprising given that e-book readers seem to be, by a slim majority, male (in contrast to the generally female book-buying market).11

What’s perhaps most telling for the industry is that most users are currently enrolled as graduate students in university programs and are accessing the internet from educational institutions. Graduate students are (and I’m speaking from experience here) typically time-rich and money-poor, and need a wide array of books and resources. This convergence of limited resources, free time, and a need for accurate and extensive information makes them ideal candidates for book piracy.

Percentage of AvaxHome Users by Country

Country Share of users (%)
1 India 8.2
2 Russia 7.3
3 United States 6.9
4 Iran 6.3
5 China 6.0
6 Germany 5.7
7 Italy 4.5
8 Indonesia 3.5
9 Pakistan 2.1
10 Turkey 2.1

Percentage of Gigapedia Users by Country

Country Share of users (%)
1 Iran 18.5
2 India 12.6
3 China 10.6
4 United States 8.4
5 Indonesia 7.3
6 Pakistan 3.0
7 Egypt 2.4
8 Germany 2.2
9 Bangladesh 1.7
10 Russia 1.7

Although I have not been able to collect comprehensive data on the frequency of downloads, I have been able to get a sense of the material housed on these sites by looking at the number of entries per category (see tables below). On Gigapedia, the second-largest section is fiction, which seems to comprise just over 10 per cent of the site’s content.12 But the size of the category is misleading. Fiction is an enormously broad term on Gigapedia, including both literary and genre fiction – and Gigapedia users appear to upload an enormous amount of genre fiction. Literary fiction is considerably patchier, even for current novels: you can download Roberto Bolaño’s 2666 but not Thomas Pynchon’s Inherent Vice. Australian literature is virtually unrepresented, with searches for Helen Garner, Frank Moorhouse, David Malouf and Patrick White turning up nothing.

Gigapedia’s Top 25 Categories by Number of Entries

Category Number of Entries
1 History 39,819
2 Literature / Fiction 39,797
3 Computers / IT 33,474
4 Social Sciences 32,622
5 Medicine 31,482
6 Business / Commerce 25,047
7 Politics 23,989
8 Science General 23,942
9 Engineering 22,429
10 Mathematics 19,071
11 Philosophy / Ethics 18,49
12 Religion / Spirituality 17,591
13 Economics 17,350
14 Biology / Zoology / Life Sciences 16,973
15 Education / Exams / Teaching / Lectures 14,592
16 Psychology 13,531
17 Physics 12,797
18 Health / Mind / Body 12,115
19 Environmental / Agricultural Science 11,996
20 Management / Logistics 11,739
21 Chemistry 11,490
22 Language / Linguistics / Grammar 11,423
23 eMagazines / eJournals 10,129
24 Law 9,502
25 Finance / Investing 8,901

AvaxHome’s Top 25 Categories by Number of Entries

Category Number of Entries
1 Science 55,365
2 History / Military 23,235
3 eLearning 21,375
4 Engineering and Technology 20,175
5 Business, Job 17,445
6 Economics and finances 13,200
7 Development / Programming 12,015
8 Personality 9,525
9 Politics, Sociology 9,420
10 Cultures / Languages 8,085
11 Others 6,975
12 Software Related 5,865
13 Theology and Occultism 5,175
14 Audiobooks 5,085
15 Novels 4,350
16 Encyclopedias, Dictionaries 4,260
17 Cooking and Diets 2,715
18 Drawing, Painting and Design 2,490
19 Artbooks 2,355
20 Travel Guides 2,340
21 Biographies 2,265
22 Architecture 2,115
23 Music 1,755
24 Sports 1,365
25 Hardware 1,035

Examining the categories with the greatest number of entries is revealing. History, Computers/IT, Social Sciences, Medicine, Business, Politics, Engineering, Mathematics: the common denominator is that these books are used for learning, not for leisure. Informative materials are, of course, not limited to the sciences, with scholarly books in the humanities also widely available in pirated versions. Many of these files appear to be professionally produced PDFs, which leads me to suspect that some academics may even intentionally ‘leak’ their books to such sites in the hopes of generating more readers and thereby more citations.

Often, when we think of books, we think of novels, but fiction accounts for only a fraction of the much larger publishing industry.13 The books that you see when you walk into a bookstore – so-called ‘trade’ publications – constitute only 60 per cent of the overall industry.14 The remainder is made up of forms of publishing (predominantly educational) that exist to provide information. Indeed, even the vast majority of trade publishing can be understood as broadly didactic if we consider books on cooking or gardening, dictionaries and encyclopedias, computer instruction, science, and even much of history and travel.

The key distinction between the piracy of books and the piracy of other media (music, film, television) is that book piracy is first and foremost motivated by a practical need for information on a given subject. Amongst groups of people lacking adequate resources to purchase such information, pirated copies circulate out of utilitarian need.15 Thus the all-too-common comparison between the publishing industry and the recent history of the music industry reveals itself as fatally flawed.16 Everyone loves music, but music won’t teach you to make coq au vin or grow azaleas. Millions of songs have been written about love, but few, if any, explain the fundamentals of differential calculus or the meaning of Kant’s distinction between noumena and phenomena.

If book piracy is motivated by, on one hand, a need for information and, on the other, a lack of resources, then it is educational publishing that is most likely to be affected by it. Students, particularly tertiary students, have traditionally been a captive audience for publishers: they are required to buy expensive textbooks in order to pass university courses. But students are notoriously cash-strapped and textbooks are notoriously expensive. Textbook piracy already appears to have caused significant losses in the US, with sales dropping 14.5 per cent in 2008 from the previous year.17

Many educational publishers are already rethinking their strategy. In February, Macmillan announced the launch of its DynamicBooks software, which will enable university lecturers to edit, rewrite and add material to their textbooks in order to tailor them specifically to their courses. While the New York Times likened this move to Wikipedia’s user-generated content, it’s clear that the real motivation for developing this technology was piracy: if textbooks are customised for individual courses, it is much less likely that pirate copies will circulate.18 Students will still have to pay for learning materials, but textbooks will be cheaper in digital form and completely relevant to the course at hand, dispensing with the need for course readers and other supplementary materials. Similar initiatives have been launched by McGraw-Hill and John Wiley & Sons.19

Textbooks are, of course, neither the first nor only books to radically transform their delivery models. The Oxford English Dictionary, for example, first went digital in 1988 in the form of a CD-ROM, and has been available as an online subscriber service since 2000. The print version is, from a bibliophile’s perspective, an absolutely beautiful publication, but its twenty heavy volumes hardly make for easy access. Even the so-called ‘compact’ edition consists of one large book with print so small that you literally need to use a magnifying glass (supplied with the text!) to read it. There is no question that, from a reader’s perspective, the electronic version is both more efficient and easier to use. In Australia, the Macquarie Dictionary has also gone online; not only are its digital products cheaper (the unabridged electronic dictionary and thesaurus cost the same as the print version of the much shorter Macquarie Concise Dictionary), they also enable users to import the dictionary into the spellcheckers of word processing programs.

If publishers want to be successful in the transition to digital publishing, they will have to think carefully about how their products can offer advantages that go above and beyond simply digitising content. Even the much-discussed notion of adding multimedia content to e-books isn’t enough,20 since such material can (and will) be easily pirated. This means that many objects we have thought of as ‘publications’, as textual artefacts, may become far more interactive. Digital publishing means much more than e-books: digital books may resemble something closer to software applications, web services that provide constantly updated content or websites in which users buy a subscription to access password-protected content. Alternatively, content may need to be in some way customisable (as with Macmillan’s DynamicBooks), enabling users to tailor texts to their own needs or generate their own content. If digital publishing does take on these myriad forms, then the development of a so-called ‘iPod for books’ may be a chimera.

The challenge is to conceive of new ways to deliver knowledge that will motivate users to pay for products. The industry can attempt to educate consumers about pricing all it wants, but, if we are to accept the existence of that dread entity labelled homo economicus – a figure that, at this moment, seems to lie at the heart of virtually all nations’ contemporary governance – then we also need to admit that, rationally speaking, people are not going to pay for something they can get for free.

Book pirates should thus be understood not as thieves or copyright infringers but as a potential audience. More importantly, publishers need to think reflexively about their audiences in general and how digitisation can enhance the way that readers will interact with a book’s content.

Of course, academic and scholarly titles, which are already extensively pirated, rely on authoritative, definitive research, and publishers of such material seem much less likely to embrace flexible delivery and user-generated content, although McKenzie Wark’s Gamer Theory offers one model for a ‘networked book.’21 It should, however, be remembered that university presses (which comprise roughly 10 per cent of Australian book publishing22) mostly already have a significant side business in trade publishing, and purely scholarly texts are often effectively subsidised, partially funded by publishing grants from universities and marketed almost exclusively (often at outrageous prices) to research libraries. As long as this model continues, piracy is unlikely to have a significant effect.

In any case, other technological changes may make such issues irrelevant. Melbourne’s re.press, for example, gives most of its books away as PDFs on its website under a Creative Commons licence but has also been able to build a global distribution network for print books by using print-on-demand facilities and online booksellers and wholesalers. Since most academics publish books as part of their employment at universities, they are not dependent on royalties for income, and giving books away for free doesn’t present the same economic hardship as it would in other areas.

For literary publishing, the digital transition presents both significant advantages and significant problems. Literary publishing has become the domain of small and independent presses in Australia.23 Among the biggest advantages for small presses are new printing technologies and e-books, which make smaller runs of books affordable, if not always profitable,24 while electronic distribution could theoretically help solve the problem that has plagued small presses in Australia for decades: their exclusion from mainstream channels of distribution.25 But publishing fiction, especially new fiction, is always a high-risk endeavour,26 and it remains an open question whether electronic distribution will, in fact, increase the actual readership of literary fiction. While small presses have attempted to expand their readership through alternative means of distribution, such efforts have historically had mixed results.27 The larger problem remains that literary publishing has increasingly become a marginal aspect of the larger publishing industry,28 and it is unlikely that the technological benefits of digital publishing alone will change that fact.

For all of the benefits of the digital, it is ultimately the real – and not the virtual – world that will determine publishing trends. Whether online or in print, publishing will still be determined by cultural and economic forces shaped by the larger social and governmental networks in which we live. This is not to say that electronic publishing is irrelevant. The new forms of distribution will likely cause huge changes in the industry (unless anyone believes that Google digitised 18 million books just for the fun of it) and will very likely affect the way that the average person interacts with books. But digitisation will not remove the vicissitudes of the world or usher in a utopia of (monetarily) free and open exchange of either ideas or content. For all of the changes brought about by technology, the essential issues will remain the same: how publishers can produce materials that consumers are willing to pay for, ideally at a profit.

  1. For an overview of publishers’ approaches to copyright in Australia over the last decade, see Leanne Wiseman, ‘Copyright and the Regulation of Australian Publishing’, in David Carter and Anne Galligan (eds), Making Books: Contemporary Australian Publishing, University of Queensland Press, St Lucia, 2007, pp. 177–97. Lawrence Lessig’s Free Culture (Penguin, New York, 2004) remains the best-known legal argument in favour of loosening copyright restrictions. Creative Commons Australia (www.creativecommons.org.au) provides extensive information on notions of copyleft and open copyrights.
  2. McKenzie Wark, A Hacker Manifesto, Harvard University Press, Cambridge, Mass., 2004, p. 197.
  3. ‘About Us’, Attributor website, http://www.attributor.com/about_us.php.
  4. When accessed on 22 February 2010, Gigapedia included links to 360,600 items. On 25 February 2010, AvaxHome had 186,720 entries for e-books. Inevitably, some of the material on these two sites would be redundant and some would also likely be in the public domain, so an exact figure remains impossible to reach.
  5. Dan Kaufman, ‘Kindle Opens a New Chapter in Publishing’, Age, 13 October 2009.
  6. Reetz’s scanner was devised for the purposes of digitising government documents already in the public domain. His instruction manual on how to construct a DIY scanner is available at http://diybookscanner.org/PDF/DIY-High-Speed-Book-Scanner-from-Trash-and-Cheap-C.pdf.
  7. Priya Ganapati, ‘DIY Scanners Turn Your Books into Bytes’, Wired, 11 December 2009, http://www.wired.com/gadgetlab/2009/12/diy-book-scanner/. Reetz’s estimated time corresponds to that suggested by the book pirate The Real Caterpillar, who approximated scanning time at a rate of one hour per 100 scans. See C. Max Magee, ‘Confessions of a Book Pirate’, The Millions, 25 January 2010, http://www.themillions.com/2010/01/confessions-of-a-book-pirate.html.
  8. See Magee, op. cit.
  9. I’ve chosen these sites over the considerably better-known Scribd for two reasons. Firstly, they primarily offer copyrighted content rather than content in the public domain. Secondly, Scribd in fact sells e-books and is therefore invested in protecting content.
  10. These data are based on audience demographics compiled by the web information company Alexa, a subsidiary of Amazon. Information for AvaxHome can be found at http://www.alexa.com/siteinfo/AvaxHome.ws and for Gigapedia at http://www.alexa.com/siteinfo/gigapedia.info#. There are serious questions about the reliability of Alexa’s statistics, which, according to their own website, are based on extrapolation from traffic patterns of those internet users who download the Alexa toolbar, as well as unspecified ‘other, diverse sources’. Still, measuring internet demographics is notoriously difficult, and Alexa provides rough metrics for determining usage of these sites.
  11. According to recent data collected by the US Book Industry Study Group, men represent 51 per cent of e-book users, as compared to print books, in which readership is 58 per cent female. See ‘More Men Read E-Books, and Other Fun Facts from the BISG Study’, Moby Lives, 25 February 2010, http://mhpbooks.com/mobylives/?p=13045.
  12. I have not been able to get a completely accurate snapshot of Gigapedia’s holdings. According to the website, there are 360,622 entries on the site, but tabulating the totals of all the individual categories adds up to 622,195 entries, with many books obviously listed in multiple categories.
  13. Based on data from 2003–04, David Carter determined that fiction publishing accounted for 16.4 per cent of all new titles and roughly 10 per cent of overall sales in Australia. See David Carter, ‘Boom, Bust or Business as Usual: Literary Fiction Publishing’, in Carter and Galligan, op. cit., p. 234.
  14. Jenny Lee, Mark Davis and Leslyn Thompson, University of Melbourne Book Industry Study 2009, Thorpe-Bowker, Melbourne, 2009, p. 9.
  15. Jenny Lee has already noted that scholarly publishing has largely adapted to electronic publishing for the simple reason that it is a utilitarian solution to the need for the dissemination of academic knowledge; see Lee, ‘The Trouble with Books’, Overland, no. 190, 2008, pp. 17–21.
  16. Mass media comparisons between publishing and music are too numerous to catalogue. For one example see Emmy Hennings, ‘Shares and Share Alike’, Overland, no. 190, 2008, pp. 12–16. Hennings’ comparisons, however, relate primarily to fiction, and when speaking about fiction, as opposed to the industry as a whole, such comparisons are much more reasonable.
  17. Jennifer Howard, ‘Textbook Sales Drop, and University Presses Search for Reasons Why’, Chronicle of Higher Education, 4 September 2008, http://www.chronicle.com/free/2008/09/4480n.htm.
  18. Motoko Rich, ‘Textbooks That Professors Can Rewrite Digitally’, New York Times, 21 February 2010, http://www.nytimes.com/2010/02/22/business/media/22textbook.html?emc=eta1.
  19. Ibid.
  20. Robert Andrews, ‘First Look: How Penguin Will Reinvent Books with iPad’, Guardian, 2 March 2010, http://paidcontent.co.uk/article/419-first-look-how-penguin-will-reinvent-books-with-ipad/.
  21. Originally published by Harvard University Press, an online version of Wark’s book can be retrieved at http://www.futureofthebook.org/gamertheory/.
  22. Lee, Davis and Thompson, op. cit., p. 22.
  23. Mark Davis, ‘Literature, Small Publishers and the Market in Culture’, Overland, no. 190, 2008, pp. 4–11.
  24. See Lee, Davis and Thompson, op. cit., pp. 21–2: ‘More recently, though, there are signs that smaller publishers’ share of activity is rising as digital technologies make short-run publishing affordable, if not profitable.’
  25. Ibid., p. 85.
  26. Ibid., p. 30.
  27. Ibid., p. 46. See also Kate Freeth, A Lovely Kind of Madness: Small and Independent Publishing in Australia, Small Press Underground Networking Community, November 2007, http://spunc.com.au/static/files/assets/ae9c26cd/FreethSPUNCReport.pdf.
  28. Mark Davis, ‘The Decline of the Literary Paradigm in Australian Publishing’, in Carter and Galligan, op. cit., pp. 116–31.

Emmett Stinson is a Lecturer in Publishing and Communications at the University of Melbourne, President of the Small Press Underground Networking Community (SPUNC), and a fiction editor at Wet Ink. His collection of short stories, Known Unknowns, will be published by Affirm Press in June 2010.
© Emmett Stinson. Overland 199, Winter 2010, pp. 63–70.

