“Wow, it’s WordStar!” Exploring a Beloved Early Word Processor and its Many Formats

Dan Hockstein and Mari Allison are 2022 Junior Fellows in the Digital Collections Management and Services Division (DCMS) working under the mentorship of Kate Murray.

Over the course of our Junior Fellowship this summer, we have focused on a variety of streams of work around the Library of Congress’ Sustainability of Digital Formats website. The site contains an extensive list of commonly used file formats, wrappers, and encodings. There are thousands of these created by legacy equipment and software that present challenges in identification, preservation, and use. Among these is the file format produced by the now-defunct word processing platform WordStar.

WordStar’s History

The WordStar format is the default proprietary plain-text format for a word processing platform of the same name. The initial version of WordStar was published by MicroPro International for Digital Research, Inc.’s CP/M operating system. Subsequent releases of WordStar’s early versions were ports for microcomputers and their operating systems – for example, Tandy’s LDOS-5, the Epson PX-8, the Osborne 1, and the Apple ][.

A Wyse 100 Microcomputer running WordStar. JL_pics, CC BY 3.0 <https://creativecommons.org/licenses/by/3.0>, via Wikimedia Commons.

Microsoft’s MS-DOS operating system became a platform for WordStar’s wide adoption, beginning with version 3.0 of the program. At this point, MicroPro International also began to splinter into various companies through staffing changes, some of which competed directly with WordStar. Through our research, it became clear that the program changed hands several times, intersected with and borrowed from other pieces of software, and followed a complicated path that produced several kinds of output files that could all be called “WordStar” files. As a result, the structure of WordStar files did not follow a streamlined, linear trajectory of updates and versioning – changes were fairly drastic. In the first few versions of WordStar, the 8th bit of each character byte, usually reserved to extend the character set, was instead used to store print and formatting information. This limited cross-compatibility with other word processors and was later changed with the release of WordStar 5.0.
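As a minimal sketch of what the 8th-bit convention means in practice, the Python snippet below clears the high bit of every byte to recover the underlying 7-bit ASCII character. The function name `strip_high_bit` is hypothetical, and the approach assumes the file comes from one of the early versions described above. It deliberately discards whatever print and formatting information the high bit carried, so it is only a way to get at readable text, not a faithful conversion.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the 8th (most significant) bit of every byte.

    Assumption: the input is an early WordStar file in which the
    high bit carries formatting information rather than extended
    characters. That information is lost by this mask.
    """
    return bytes(b & 0x7F for b in data)


def count_high_bit_bytes(data: bytes) -> int:
    """Count bytes that have the 8th bit set (a rough hint, not proof,
    that the high bit is being used for something other than ASCII)."""
    return sum(1 for b in data if b & 0x80)
```

For example, the bytes `48 65 6C 6C EF` become the ASCII text `Hello` once the high bit of the final byte is cleared. A large count from `count_high_bit_bytes` in a file that otherwise reads as English text may suggest an early WordStar file, but other legacy encodings also set the high bit, so the count alone is not an identifier.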

The introduction of additional word processing programs, such as Microsoft Word and Apache OpenOffice, reduced WordStar’s market share. The software is now hosted and available for paid download, but it is no longer developed or maintained by its original owners.

Quite a few WordStar files exist within the Library’s collections, and their unique properties make them more complex than similar or more modern text formats. To document these files and assist with their preservation and access, we have been tasked with creating a Format Description Document, or FDD, for the WordStar file format. This research is still in progress, but will be published at WordStar File Format Family.

The WordStar Community

One of the most fun aspects of doing research on WordStar was seeing the passionate community of writers who still swear by the software. For example, George R.R. Martin still exclusively uses WordStar 4.0 for DOS to write A Song of Ice and Fire, the book series that inspired the Game of Thrones TV series. Hobbyists keep discussion alive in online forums, post guides on how to set up a DOS machine or emulator to run WordStar, and modify Microsoft Word to include the same key command shortcuts as WordStar. Through reading these posts, we came to understand what people love about the WordStar application. It was the first word processor that was able to render the document on screen, formatted almost exactly as it would appear when printed. The efficient command keys, used to navigate menus and perform operations such as Print or Save, are another favorite application feature, vividly explained in this post about a man teaching his 9-year-old daughter how to use them. The command keys are distinct from dot commands, which aren’t just application features but are present in the actual text file. These are visible on the screen during the editing process but become formatting information when rendered into a printed page.

While it was wonderful to explore WordStar’s online community, the unofficial nature of the information, often taking the form of blog posts on someone’s personal website, somewhat complicated our research. When writing FDDs, the Library prefers to use primary sources, and at times, we were hesitant about linking out to personal sites. To verify information, we tried to cross-reference several personal websites and news articles.

Identifying WordStar Files

Using a modern graphical user interface to access, organize, and name files is quite different from creating content on the early microcomputers that many WordStar files were created on. While these systems did have a directory structure, it was not always represented visually by folders, and operating systems did not necessarily associate filetypes with applications. Generally, much more was left to the end user.

Because of this, identifying the WordStar files in our collection was an unexpected challenge. By modern conventions, most people use file extensions, or the 2-4 characters following the “.” in a file name, as a high-level but imprecise way to identify a format at a glance: “.docx” for Microsoft Word documents, “.mp3” for MP3 files, or “.csv” for Comma Separated Values. Some WordStar files may follow similar standards with .ws and optionally .ws2, .ws3, etc. depending on the version of WordStar, but other WordStar files may have very different extensions. The WordStar Reference Manual from 1983 p.1-12 states: “The most useful file name is one that helps you remember the file contents… For example, you might add .LET after each letter file name, .REP after each report or .912 to indicate that September 12 was the last editing session.” Because individual creators followed different conventions, it is difficult to verify at a glance that any single file was created with WordStar. At an institution with many word processing documents dating back to the ’70s, this poses an issue!

Examining a file for signature information is a more consistent way to identify formats. A signature is a piece of embedded metadata used to identify a filetype, often found in the header or footer of the file.

The header of a WordStar sample file, viewed in a hex editor. A 128-byte header beginning with 1D 7D 00 and ending with 7D 00 1D is indicative of WordStar versions 5.0 and above. 50 in the 5th byte position signifies that this is WordStar 5.0.

Many of WordStar’s different versions have their own unique file signature, which greatly increases the complexity of trying to identify any given file. Signature information for WordStar 5.0, 6.0, 7.0 and 2000 is currently available in the National Archives UK’s PRONOM registry, but signatures for other versions have not yet been identified.
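The header check described in the hex editor caption above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PRONOM signature itself: the function names are hypothetical, the “5th byte position” is read as a 1-based position (index 4 in Python), and only the value 50 (hex) for WordStar 5.0 comes from the sample file shown above. What later versions store in that byte is not stated here, so the sketch returns the raw byte rather than guessing a version name.

```python
from typing import Optional

HEADER_LENGTH = 128                        # header size from the sample file
HEADER_START = bytes([0x1D, 0x7D, 0x00])   # first three bytes
HEADER_END = bytes([0x7D, 0x00, 0x1D])     # last three bytes of the header


def has_ws5_style_header(data: bytes) -> bool:
    """True if the first 128 bytes start and end with the expected bytes."""
    if len(data) < HEADER_LENGTH:
        return False
    header = data[:HEADER_LENGTH]
    return header.startswith(HEADER_START) and header.endswith(HEADER_END)


def header_version_byte(data: bytes) -> Optional[int]:
    """Return the byte in the 5th position (index 4), or None if the
    header pattern is absent. 0x50 corresponds to WordStar 5.0 in the
    sample file; other values are not interpreted here."""
    if not has_ws5_style_header(data):
        return None
    return data[4]
```

To try it on a real file, read the bytes with `open(path, "rb").read()` and pass them to `header_version_byte`. A result of `0x50` matches the WordStar 5.0 sample, while `None` means only that this particular header pattern is absent, which is expected for earlier versions.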

We are also looking into unique features of the format that could serve as identifiers. One example is symmetrical sequences, which first appeared in WordStar 5.0. Symmetrical sequences serve as tags that enclose extra information such as font color and footnotes. They follow a defined byte structure, and the opening and closing tags are each marked by a dedicated control character, 1DH. Theoretically, this distinguishing feature can be a way to identify files from WordStar 5.0 and above, but it’s not as consistent as a file signature, nor is it as easy to check automatically.
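A rough scan for candidate symmetrical sequences might look like the Python sketch below. It rests on an assumption inferred from the 128-byte header shown earlier, not on a statement in this post: that the two bytes after the opening 1D are a little-endian count of the bytes that follow (7D 00 is 125, and 3 + 125 = 128), and that the same two bytes are repeated just before the closing 1D. The function name is hypothetical, and a match is only a hint, since the same byte pattern could occur by coincidence.

```python
from typing import List, Tuple

SYMMETRY_MARKER = 0x1D  # control character that opens and closes a sequence


def find_symmetrical_sequences(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, total_length) for spans that look like
    symmetrical sequences under the assumed layout:
    1D, count (2 bytes, little-endian), ..., same count, 1D.
    """
    found = []
    i = 0
    while i + 3 <= len(data):
        if data[i] != SYMMETRY_MARKER:
            i += 1
            continue
        count_bytes = data[i + 1:i + 3]
        count = int.from_bytes(count_bytes, "little")
        end = i + 3 + count  # index one past the expected closing 1D
        if (
            count >= 3                                  # room for count + 1D
            and end <= len(data)
            and data[end - 1] == SYMMETRY_MARKER
            and data[end - 3:end - 1] == count_bytes    # mirrored count
        ):
            found.append((i, end - i))
            i = end      # skip past the matched span
        else:
            i += 1       # not a match; keep scanning
    return found
```

Requiring the mirrored count as well as the closing 1D cuts down on false positives compared with searching for the 1D byte alone, but the result should still be read alongside other evidence such as the header signature.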

When researching WordStar, we had to maintain a balance between technical and contextual information. To create a holistic format description, it’s important to describe the history and adoption of WordStar’s many versions while also providing data about byte sequences and ASCII encodings. Combining our research skills and technical knowledge to uncover more about WordStar was an extremely rewarding process. We uncovered and better understood a history of early word processing documentation, unearthed some great graphic design, and created new resources to sustain digital formats into the future!

Final Reflections

Mari: I’ve learned a lot about conducting file format research through this whole process. It’s been really fun to dive down some of the rabbit holes, from studying the full format specification and picking out useful identification information to reading blog posts and interviews from science fiction and fantasy authors. Sometimes it’s difficult to know if I’m going too deep on esoteric information, but I’ve learned through interactions with the greater file format and digital preservation community that every detail is valued. It feels incredible to contribute to a resource that will be used both within the Library, and also by outside researchers.

Dan: It’s been a great experience to discover the inner workings of what makes a file, all while contributing to an important body of knowledge for the digital preservation community. Knowing that our research may make identification and access of WordStar files easier for future researchers, and being able to contribute to Library resources, has allowed me to build new skills while also making a lasting impact as a Fellow. It also made me even more interested in early computing and technology.

6 Comments

  1. steve olson
    July 21, 2022 at 2:30 pm

    i was an early user of Word*Star and the famous cursor diamond using my K-PRO dual-floppy luggable computer. I even earned some extra cash typing papers in college (1980s) using that setup and my 300baud modem. Great memories!

  2. Gerald
    July 22, 2022 at 11:34 am

    You may want to look at WordTsar http://wordtsar.ca

  3. Carl Fleischhauer
    July 22, 2022 at 11:36 am

    Terrific overview of the topic and the issues faced by students (and preservers!) of content in digital form. Thank you!! for the helpful discussion. As an old duffer, it is reassuring to see the next generation engage these important matters. It was not the subject for this blog but many readers will wonder, “gee, WordStar files are carriers/containers for texts — and the text is the important asset, not the formatting — so, for future researchers, ought an archive migrate that text forward into a carrier/container format that is judged to be good for, um, the next 25 years?” (Of course, an archive might want to “freeze” and retain the WordStar file just in case the migration was imperfect.) Alas, that question is a bit of a puzzle too.

  4. Mari Allison
    July 27, 2022 at 1:11 pm

    Dan and I really appreciate all the comments so far! We love hearing about personal experiences with WordStar and community projects.

    We also welcome discussion about digital preservation challenges! There are a lot of different considerations that will vary between individual archives.

  5. S. Andrews
    July 31, 2022 at 9:55 am

    I loved WordStar. I am not a computer buff. I used WordStar when I first started using word processing software. I was writing text that included translations from other languages and was able to format the documents myself using diacritics and other special characters that aided in translation and transposition.

    I could write small lines of code to support these efforts and see what it did in the text. I held out with WordStar as long as I could until Microsoft’s Word programs made it almost impossible to continue with WordStar.

    Now, not being a computer techie, I must adapt my writing and need for innovation to Microsoft’s increasing overriding of individual innovations in the software. I do not have the time or expertise to try to override Microsoft’s interpretation of a user’s need to create formatting that meets my needs.

    I am happy to see your efforts and wish you much success.

  6. Monique
    August 17, 2022 at 9:19 pm

    Thank you for this post! This is such a timely one for me as I am working with a collection full of these files. Unfortunately the files don’t have typical WS magic numbers and all of the file extensions have been changed (and many of them are sequences of numbers) but some of the header information has retained the version of WS. Typical conversion scripts for WS to Word haven’t worked, which I assume is because these files lack the typical features of a WordStar file. Trying to emulate these to see if that works better. Will update with my result!
