
Legislative Data Challenges, One Year Later

The following is a guest post by Jim Mangiafico. Jim is the winner of our Legislative Data Challenges and has since been working with the National Archives of the United Kingdom, our partner for the second challenge, to further the work he began during the challenges. He has graciously agreed to provide an update on his exciting progress with Akoma Ntoso and Legislative XML.

It has been a year since the Library’s Legislative Data Challenges, and we have learned much from the comparative study of legal markup. The Data Challenges asked participants to develop tools to translate legislative documents from their native XML formats into Akoma Ntoso, a newer XML schema currently in the process of standardization by the Organization for the Advancement of Structured Information Standards (OASIS). In the past year, I have continued the work begun during the Challenges, writing code for the National Archives of the United Kingdom to generate Akoma Ntoso versions of the laws available at legislation.gov.uk. An Application Programming Interface (API) for UK legislation in Akoma Ntoso will soon be made public. Along the way, we had to confront fundamental design decisions about the structure of legislative markup, and we developed some new tools that we hope will improve access to legislation.

The biggest difficulty I encountered when translating UK legislation into Akoma Ntoso stems from the differing paragraph models in the two XML formats. The native XML schema governing UK legislation, called the Crown Legislative Markup Language (CLML), follows what it calls a “true” paragraph model, according to which all elements associated with a paragraph are represented as children of the paragraph element. (This is in contrast to, say, HTML, in which lists and other elements are frequently represented as siblings of the <p> elements with which readers naturally associate them.) Consequently, it is possible in CLML to have a section of an act with multiple paragraphs of text, only one of which is grouped with the section’s subsections. For example, the following pattern is not uncommon in CLML:

CLML (<P1> denotes a section, <P2> a subsection)
<P1>
    <Pnumber>1</Pnumber>
    <P1para>
        <Text>some text</Text>
    </P1para>
    <P1para>
        <Text>some more text</Text>
        <P2></P2>
        <P2></P2>
    </P1para>
</P1>

Markup such as this is not easily translated into Akoma Ntoso, which does not contemplate an association between a subsection and any one textual component of its parent section. Akoma Ntoso permits introductory paragraphs before a section’s first subsection and concluding paragraphs after its last, but all subsections must be direct children of their parent section, and there can be nothing between them that is not their sibling. Consequently, we have chosen to translate CLML like the above as follows:

Akoma Ntoso
<section>
    <num>1</num>
    <intro>
        <p>some text</p>
        <p>some more text</p>
    </intro>
    <subsection></subsection>
    <subsection></subsection>
</section>

As you can see, the semantics of these two fragments differ: the association between the second paragraph of text and the subsections has been lost. We take some comfort in the fact that both will likely be displayed identically to readers, but for us it remains an open question how much legislative markup benefits from the ability to group subsections within a section.
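To make the translation concrete, here is a minimal Python sketch of the mapping between the two fragments above. It is not the code used for legislation.gov.uk. The function name `clml_section_to_akn` is hypothetical, namespaces are omitted, and it handles only the pattern shown here: text paragraphs that precede the subsections.

```python
import xml.etree.ElementTree as ET

# The CLML fragment from above (namespaces omitted for brevity).
CLML = """<P1>
    <Pnumber>1</Pnumber>
    <P1para>
        <Text>some text</Text>
    </P1para>
    <P1para>
        <Text>some more text</Text>
        <P2></P2>
        <P2></P2>
    </P1para>
</P1>"""


def clml_section_to_akn(p1):
    """Flatten a CLML <P1> into an Akoma Ntoso-style <section>.

    Hypothetical helper: all <Text> elements are gathered into a single
    <intro>, and every <P2> becomes a <subsection> that is a direct child
    of the section. The grouping of subsections with one particular
    paragraph is deliberately discarded.
    """
    section = ET.Element("section")
    ET.SubElement(section, "num").text = p1.findtext("Pnumber")
    intro = ET.SubElement(section, "intro")
    subsection_count = 0
    for para in p1.findall("P1para"):
        for child in para:
            if child.tag == "Text":
                ET.SubElement(intro, "p").text = child.text
            elif child.tag == "P2":
                subsection_count += 1
    # Subsections are emitted only after the intro is complete.
    for _ in range(subsection_count):
        ET.SubElement(section, "subsection")
    return section


akn = clml_section_to_akn(ET.fromstring(CLML))
print(ET.tostring(akn, encoding="unicode"))
```

Because the subsections are collected first and attached to the section afterwards, nothing in the output records which paragraph they originally belonged to. That is exactly the loss of information described above.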

Another challenge we faced relates to the need in the UK to specify the territorial applicability of individual parts of legislation. Legislation in the United Kingdom often contains alternative versions of individual sections, each with a geographical restriction. For example, an act may have two versions of Section 1, the first applying to England and Wales and the second to Scotland. CLML has a dedicated attribute for such cases. Akoma Ntoso allows authors to define jurisdictional restrictions in the metadata and to link them to sections of the document body, but to my mind this mechanism is not as elegant as Akoma Ntoso’s vocabulary for capturing temporal restrictions.

On the whole, however, we have grown quite fond of the simplicity of the Akoma Ntoso data model, and we have borrowed ideas from it for other projects. For example, The National Archives is very interested in supporting HTML5. We have been experimenting with a near one-to-one serialization of Akoma Ntoso in HTML5 and have produced HTML5 versions of all legislation available on legislation.gov.uk. The goal has been to follow the structure of Akoma Ntoso as closely as possible while using all of the native semantics of HTML5. The core nodes of the document tree (parts, chapters, sections and other high-level “hierarchical containers” in Akoma Ntoso) are represented as nested HTML <section> elements, allowing the document outline to be parsed faithfully by HTML5 validators. We had a lively debate about the best use of HTML’s <section> element in legislative documents, ultimately deciding not to use it to represent hierarchical levels beneath the subsection, such as clauses. We also mirror the rich Akoma Ntoso metadata structure with native HTML elements using RDFa Lite.
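As a rough illustration of the serialization idea, the Python sketch below maps a small Akoma Ntoso-style tree onto nested HTML <section> elements. Everything specific in it is an assumption rather than the actual legislation.gov.uk output: the `class` attribute convention, the use of <div> for levels beneath the subsection, the <span> for numbers, and the function name `akn_to_html`.

```python
import xml.etree.ElementTree as ET

# Levels rendered as HTML <section>. Anything deeper falls back to <div>
# (an assumption; the post says only that <section> is not used there).
SECTION_LEVELS = {"part", "chapter", "section", "subsection"}

# Compact sample input, written without whitespace between elements.
AKN = (
    "<part><num>1</num>"
    "<section><num>1</num>"
    "<subsection><num>(1)</num>"
    "<paragraph><num>(a)</num><content><p>some text</p></content></paragraph>"
    "</subsection></section></part>"
)


def akn_to_html(node):
    """Recursively copy an Akoma Ntoso-style element into an HTML-style tree."""
    if node.tag in SECTION_LEVELS:
        html_node = ET.Element("section", {"class": node.tag})
    elif node.tag == "p":
        html_node = ET.Element("p")
    elif node.tag == "num":
        html_node = ET.Element("span", {"class": "num"})
    else:
        html_node = ET.Element("div", {"class": node.tag})
    html_node.text = node.text
    html_node.tail = node.tail
    for child in node:
        html_node.append(akn_to_html(child))
    return html_node


html = akn_to_html(ET.fromstring(AKN))
print(ET.tostring(html, encoding="unicode"))
```

Keeping the original element name in an attribute is one way to stay close to a one-to-one mapping, since the source structure can still be recovered from the HTML.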

Lastly, in the course of developing testing procedures for our document conversions we began thinking about ways to count elements within legislation and the relationships between them. Now, as part of The National Archives’ Big Data for Law project, we will conduct a “census” of the UK statute book and release data about the frequencies of structural patterns within legislative documents and the changes in those frequencies over time. We’re also using natural language processing to trace changes in statutory language. Look for this soon on legislation.gov.uk.
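The counting idea can be sketched in a few lines of Python. This is only an illustration of tallying elements and parent-child relationships in a single document, not the Big Data for Law methodology. The function name `census` is hypothetical, and the sample input is the Akoma Ntoso fragment shown earlier.

```python
from collections import Counter
import xml.etree.ElementTree as ET

SAMPLE = """<section>
    <num>1</num>
    <intro>
        <p>some text</p>
        <p>some more text</p>
    </intro>
    <subsection></subsection>
    <subsection></subsection>
</section>"""


def census(root):
    """Count element names and (parent, child) name pairs in one document."""
    tags = Counter()
    pairs = Counter()
    for parent in root.iter():  # iter() visits the root and all descendants
        tags[parent.tag] += 1
        for child in parent:
            pairs[(parent.tag, child.tag)] += 1
    return tags, pairs


tags, pairs = census(ET.fromstring(SAMPLE))
for (parent, child), count in sorted(pairs.items()):
    print(f"{parent} > {child}: {count}")
```

Running the same tally over many documents, or over versions of one document from different dates, would give the kind of frequency data over time that a census calls for.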

I would like to thank John Sheridan, Head of Legislation Services at The National Archives, for giving me the opportunity to do the kind of work I find so rewarding. I hope it proves to be useful.

Making Legislative Information Accessible, Discoverable and Usable

The following is a guest post by Noriko Ohtaki, who was a research fellow at the Law Library of Congress.  She previously blogged about Searching for Current Japanese Laws and Regulations. G8 leaders signed the Open Data Charter on June 18, 2013.  Open Data is intended to make information resources accessible, discoverable, and usable electronically to the public, increase […]

Five Questions with Pamela Barnes Craig, Retiring Instruction/Reference Librarian for the Law Library of Congress

The following is a guest post from Pamela Barnes Craig, retiring Instruction/Reference Librarian in the Law Library of Congress.   It is cross posted on Teaching with the Library of Congress.   Describe what you do at the Library of Congress and the materials you work with. Pam Craig talks with teachers at the 2013 Summer Teacher […]

A New Akoma Ntoso Tool: the LIME Editor

Monica Palmirani, one of the judges of our Legislative Data Challenges, recently alerted us to a new tool developed by the University of Bologna: the LIME Editor. This open source, web-based editor allows for the quick conversion of non-structured legal documents into XML, including Akoma Ntoso XML. The tool combines a component-based JavaScript framework and […]

Jim Mangiafico and Garrett Schure Announced as Winners of the Second Library of Congress Legislative Data Challenge

After months of hard work, we are pleased to announce Jim Mangiafico and Garrett Schure as the winners of the Library of Congress Second Legislative Data Challenge, Legislative XML Data Mapping. As you may remember, we launched this challenge last fall with the goal of advancing the development of international exchange standards for legislative data and […]

The United States Code Online – Downloadable XML Files and More

The following is a guest post by Rob Sukol, Deputy Law Revision Counsel, U.S. House of Representatives. Since 1927, the United States Code has been the official codification of Federal statutory law. The Code contains the general and permanent laws of the United States, organized into titles based on subject matter. The printed and online […]

Second Library of Congress Legislative Data Challenge Launched

In July, the Library announced its first legislative data challenge. We are delighted to tell you about another Library of Congress legislative data challenge, Legislative XML Data Mapping. Like the first data challenge, this challenge incorporates the Akoma Ntoso legislative schema, but instead of asking competitors to apply the schema to bill text, we are […]

Library of Congress Announces First Legislative Data Challenge

Andrew and I have both mentioned the Akoma Ntoso schema for representing law and legislation in XML and enabling easier exchange of this information on In Custodia Legis in the past. Today we have more exciting news for you. To help advance the development of international exchange standards for legislative data, the Library of Congress is […]