Bookshare Developer Blog

Bookshare API Delivers DAISY Files Optimized for Large Books and Mobile Devices

The introduction of the Digital Accessible Information System, or DAISY, was a transformative event. Prior to DAISY, most print-disabled users relied on Braille or recorded audio to consume books. Print readers take for granted the ability to identify structure and meaning from visual formatting and layout, as well as the ease with which they can navigate to specific sections of a book by using chapter headings and page numbers.

It's a different story when using the fast forward button while playing back a cassette tape, or skimming past a linear stream of Braille. DAISY leveraged the multimedia capabilities of computers in the late 1990s, bringing together digital audio, structured text, and support for fine-grained navigation. DAISY books can be adapted for a variety of needs: Braille output for blind readers, large type and high contrast display for users with low vision, even word-by-word text highlighting with synchronized text-to-speech for individuals with learning disabilities. DAISY truly is a better way to read, not just for print-disabled readers, but for everyone!

Bookshare members have a wide spectrum of needs, and DAISY has allowed us to provide accessible resources to each and every one of them. In recent years Bookshare's particular focus has been on making the resources of the National Instructional Materials Access Center more readily available to students in the United States, and that has brought with it certain challenges.

As anyone who remembers lugging a backpack to school can attest, textbooks are quite large. There is a tremendous amount of content in them, and it is very carefully organized in a deep and detailed structure.

The foundation for DAISY's structured text storage is the eXtensible Markup Language (XML). Flexible, adaptable, and supported by a wide variety of existing tools, XML's expressiveness lends itself well to the rich content in textbooks.

However, that flexibility comes at a cost. Loading XML can be very memory-intensive, especially with tree-based models such as the Document Object Model (DOM), which let developers locate and manipulate XML data at a very fine level. If the loading process isn't optimized, even desktop computers can struggle to load large textbooks.

This is even more true of portable devices like Braille notetakers, smartphones, and tablets, which have less RAM and processing power available. A portable device may easily handle a National Federation of the Blind edition of the Los Angeles Times or the latest New York Times bestseller, but parsing the XML for even a relatively small textbook can result in crashes or unacceptably slow load times.

Portable devices are getting more capable every year, and mobile operating systems like Android and iOS are making great strides in accessibility support. This is the transformative wave of the moment, as mobile computing becomes an essential part of people's lives, including the print-disabled community.

Anyone who develops software for portable devices knows that care must be taken to stay within their memory and processing constraints. Bookshare must take the same care to make sure that our books provide the best possible reading experience on these devices, so we are rolling out updates to how we package our largest books.

Most DAISY books contain only a single structured text file, which is consistent with the guidance in the DAISY specification. XML is typically not loaded progressively. Instead, the whole file is loaded and parsed at once, so a single very large file causes DAISY readers to suffer from long load times, reduced responsiveness, and, in the worst cases, crashes.

Fortunately, the specification allows the text content to be split across multiple DTBook files:

"A DTB that includes textual content will, in most cases, contain only one textual content file. However, when necessary (with a very large book, for example), a DTB can contain multiple textual content files, each of which must be valid to the DTBook DTD."
(ANSI/NISO Z39.86-2005, http://www.daisy.org/z3986/2005/Z3986-2005.html#TextIntro)

So a DAISY package that once looked like this:

[Diagram: DAISY package file structure with one DTBook file]

can be repackaged like this:

[Diagram: DAISY package file structure with multiple DTBook files]
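A reader that supports segmented books has to find every textual content file rather than assume there is only one. Here is a minimal Python sketch. It assumes that the Z39.86-2005 package (OPF) file lists each textual content file as a manifest `<item>` with the media type `application/x-dtbook+xml`. The function name and the file names in the sample manifest are hypothetical, and the sample is trimmed to the relevant items. The sketch only locates the files; reading order still comes from the spine and SMIL files.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Media type used for DTBook textual content files in the OPF manifest.
DTBOOK_MEDIA_TYPE = "application/x-dtbook+xml"

# Hypothetical, trimmed package file: a real OPF also carries <metadata> and <spine>.
SAMPLE_OPF = """<package xmlns="http://openebook.org/namespaces/oeb-package/1.0/" unique-identifier="uid">
  <manifest>
    <item id="opf" href="book.opf" media-type="text/xml"/>
    <item id="ncx" href="book.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="dtb1" href="book-01.xml" media-type="application/x-dtbook+xml"/>
    <item id="dtb2" href="book-02.xml" media-type="application/x-dtbook+xml"/>
    <item id="mo0" href="mo0.smil" media-type="application/smil"/>
  </manifest>
</package>"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def dtbook_files(opf_xml: str) -> list[str]:
    """Return the href of every DTBook item in the manifest, in manifest order."""
    root = ET.fromstring(opf_xml)
    return [
        elem.get("href")
        for elem in root.iter()
        if local_name(elem.tag) == "item" and elem.get("media-type") == DTBOOK_MEDIA_TYPE
    ]


if __name__ == "__main__":
    print(dtbook_files(SAMPLE_OPF))
```

If the result has more than one entry, the book is segmented. A reader that opens only the first entry would present just the beginning of the text.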

The playback, synchronization, and navigation components of a DAISY book remain largely the same. The only change is that the text content comes from multiple files rather than just one. Because the text is split, a DAISY reader only needs to load a small segment of the entire text at a time. The content being viewed can then fit within the limited RAM available, which improves load times and responsiveness.
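SMIL `<text>` elements point into the text with references of the form `file.xml#id`. A reader can therefore parse a DTBook segment only when playback or navigation first reaches it. The following sketch shows one way to do that with the Python standard library. It keeps at most `max_segments` parsed segments in memory by using `functools.lru_cache`. The class and method names are hypothetical, and this is not how any particular reader is implemented.

```python
from __future__ import annotations

import os
import xml.etree.ElementTree as ET
from functools import lru_cache


class SegmentedText:
    """Parses DTBook segments on demand; only recently used segments stay in memory."""

    def __init__(self, book_dir: str, max_segments: int = 2):
        self.book_dir = book_dir
        # Cache parsed segments by file name; the least recently used one is
        # evicted once more than max_segments distinct files have been loaded.
        self._load = lru_cache(maxsize=max_segments)(self._parse_segment)

    def _parse_segment(self, filename: str) -> dict[str, ET.Element]:
        tree = ET.parse(os.path.join(self.book_dir, filename))
        # Index every element that has an id attribute so fragment lookups are cheap.
        return {e.get("id"): e for e in tree.iter() if e.get("id") is not None}

    def resolve(self, text_src: str) -> ET.Element | None:
        """Resolve a SMIL <text src="file.xml#id"> reference to its DTBook element."""
        filename, _, fragment = text_src.partition("#")
        return self._load(filename).get(fragment)
```

With `max_segments=2`, a reader moving from one segment to the next keeps the current and previous segments parsed, and the rest of the book stays on disk.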

For books in our collection that have a textual content file in excess of 2 MB, we create a second, segmented version of the DAISY book with individual DTBook files no larger than 1 MB. For example, a textbook that weighs in at 25 MB would be split into at least 25 separate files of 1 MB or less. Our testing has shown that loading and navigating through such segmented books is at least twice as fast, and in some cases 6 to 8 times faster.
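This post does not describe how Bookshare chooses split points. The following is therefore only a hypothetical sketch of one plausible approach. It greedily packs consecutive top-level sections (for example, DTBook `level1` elements) into segments of at most 1 MB, and it only splits books whose text exceeds 2 MB. The sketch makes three assumptions:

- Section sizes in bytes have already been measured.
- 1 MB means 1,048,576 bytes.
- Each segment's overhead as a complete, valid DTBook document can be ignored.

```python
from __future__ import annotations

MB = 1024 * 1024            # assumption: binary megabytes
SPLIT_THRESHOLD = 2 * MB    # only segment books whose text exceeds 2 MB
MAX_SEGMENT = 1 * MB        # target maximum size of each DTBook segment


def plan_segments(section_sizes: list[int], limit: int = MAX_SEGMENT) -> list[list[int]]:
    """Greedily group consecutive top-level sections into segments of at most `limit` bytes.

    Returns lists of section indices. A single section larger than `limit` gets a
    segment of its own; a real splitter would have to descend into its subsections.
    """
    if sum(section_sizes) <= SPLIT_THRESHOLD:
        return [list(range(len(section_sizes)))]  # small book: keep a single file

    segments: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for i, size in enumerate(section_sizes):
        # Close the current segment if adding this section would exceed the limit.
        if current and current_size + size > limit:
            segments.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        segments.append(current)
    return segments
```

No segment may exceed 1 MB, so a 25 MB text needs at least ceil(25 / 1) = 25 segments. Section boundaries rarely line up exactly with the limit, so a greedy split like this one can produce more.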

Outside of Bookshare's textbook collection, most DAISY books are smaller works or audio-only, and the DAISY convention of one textual content file per book has held up very well. But this is only a convention; the DAISY specification gives us clear guidance for handling larger books.

Because this convention has prevailed for so long, some DAISY readers simply assume that there is only one textual content file and cannot read these books properly. However, other readers handle these files just fine, most notably the DAISY Consortium's AMIS, the American Printing House for the Blind's (APH) Book Wizard software and Book Port Plus portable player, and Bookshare's own Read2Go app for iOS.

Some makers of assistive technology products have addressed large books by investing a lot of work into highly optimized XML parsing. But for those who have not, and indeed even for those who have, our segmented DAISY books can free you from some of that under-the-hood optimization work, so you can spend more time on the kinds of features that delight and engage your users.
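For comparison, here is what one common XML parsing optimization looks like. Instead of building the whole tree, it streams the document with the standard library's `xml.etree.ElementTree.iterparse` and clears each element once it has been handled. This is a generic technique, not a description of any vendor's implementation. The function name is hypothetical, and counting headings is just a stand-in for real work such as building a navigation index.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def count_headings(source) -> Counter[str]:
    """Stream a DTBook file (path or binary file object) and count headings by level."""
    counts: Counter[str] = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = elem.tag.rsplit("}", 1)[-1]  # drop the DTBook namespace, if present
        if name in HEADINGS:
            counts[name] += 1
        # Free this element's text, attributes, and children now that it is processed;
        # only empty placeholders under still-open ancestors remain in memory.
        elem.clear()
    return counts
```

Streaming like this keeps memory use low, but every feature that needs random access (jumping to a page, searching, going back) has to be designed around it. That is the kind of work segmented books can reduce.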

Segmented books are currently available only via the Bookshare API. If you have not signed up for an API key, you can request one here.

To get a segmented book:

  • Perform a book search.
  • Check the <result> elements for the <download-format> elements, which indicate the formats each book is available in.
  • Large books will include a <download-format> element with the value "DAISY with multiple DTBooks".
  • Download the segmented version of a large book by using format id 2 instead of the usual 0 (for BRF) or 1 (for DAISY).
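The steps above can be sketched in Python, leaving out the HTTP requests themselves. Only these details come from this post:

- The `<result>` and `<download-format>` elements.
- The "DAISY with multiple DTBooks" value.
- Format ids 0, 1, and 2.

Everything else is an assumption for illustration and should be checked against the API documentation. That includes the wrapper elements in the sample response, the `<id>` child, the "DAISY" and "BRF" values, and the function name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Format ids from this post: 0 = BRF, 1 = DAISY, 2 = DAISY with multiple DTBooks.
FORMAT_DAISY = 1
FORMAT_SEGMENTED_DAISY = 2
SEGMENTED_LABEL = "DAISY with multiple DTBooks"

# Hypothetical, simplified search response; only <result> and <download-format> are from the post.
SAMPLE_RESPONSE = """<bookshare><book><list>
  <result><id>101</id>
    <download-format>DAISY</download-format>
    <download-format>DAISY with multiple DTBooks</download-format>
  </result>
  <result><id>202</id>
    <download-format>DAISY</download-format>
    <download-format>BRF</download-format>
  </result>
</list></book></bookshare>"""


def choose_formats(search_response_xml: str, supports_segmented: bool = True) -> dict[str, int]:
    """Map each result's (assumed) <id> to the DAISY format id a client should request."""
    root = ET.fromstring(search_response_xml)
    choices: dict[str, int] = {}
    for result in root.iter("result"):
        book_id = result.findtext("id")
        formats = {f.text.strip() for f in result.findall("download-format") if f.text}
        if supports_segmented and SEGMENTED_LABEL in formats:
            choices[book_id] = FORMAT_SEGMENTED_DAISY
        else:
            choices[book_id] = FORMAT_DAISY
    return choices


if __name__ == "__main__":
    print(choose_formats(SAMPLE_RESPONSE))  # {'101': 2, '202': 1}
```

Passing `supports_segmented=False` keeps a reader that assumes a single textual content file on the standard DAISY download.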

We welcome your feedback and hope we can continue collaborating on improvements like segmented DAISY that can enhance the experiences of all our users.

2 Comments

  1. Mattias_Dolphin, 5 years ago

    Hi, First, Dolphin has been part of this test and our tools do not have any issues supporting large DTBook files and our tools have supported DAISY with multiple content files for many, many years. It would be nice if you could add Dolphin EasyReader to that list above =) Second, we have seen this new value "DAISY with multiple DTBooks" in the responses now and I wonder if this is limited to certain accounts, or if this is "live" on all user accounts? Thanks, Mattias

  2. Gerardo, 5 years ago

    Mattias, because not all AT supports DAISY with multiple content files like you do, we're only offering this format via the Bookshare API and not via our web application. This is why you're seeing a value of "DAISY with multiple DTBooks". This is available for everyone, but the format is limited to books that are large enough to warrant splitting them across multiple files. When it's available you use format id 2 instead of 1 in your download API request. Now that you're integrating with the Bookshare API, you can take advantage of the format with EasyReader. Email me if you have further questions.
