Document management and retention are on the agenda for every business, large or small. Because these documents are increasingly digital (born digital or digitized later), companies have had to confront new tools and concepts that are not always easy to grasp. Above all, the importance and centrality of certain elements is not always obvious, which makes it harder to use them effectively and productively, or to see why they deserve attention.
The topic we’re going to talk about today falls right into this description: metadata. We produce and use metadata every day, sometimes even without knowing it, and it is fundamental for supporting the activities we carry out as part of our work that involves the production, use, and consultation of documents of all kinds.
Although the word “metadata” may call to mind something vague and abstract, it is something extremely concrete: without it, we could not produce or search for an invoice, nor could we retrieve that contract we signed a few years ago and now need to consult. Without metadata, we couldn’t even retrieve the records that the tax authority may ask us to show.
So, let’s try to get some clarity to learn more about these issues.
What is metadata?
To explain what metadata is, we’ll refer to one of the most used and probably most effective definitions: metadata is data that describes other data.
This means that metadata provides data and information. If we talk about electronic documents, metadata provides information about these documents, helps describe and contextualize them, clarifying their function and, if necessary, their relationship with other documents.
Often, metadata can be derived from the content of the document itself. Other times it is derived from the production and use of that document and can be associated with the document at a later time. In still other cases, it is used to indicate how a document should be handled or processed. Metadata has several functions and is therefore generally divided into categories such as:
- descriptive metadata;
- structural metadata;
- administrative and management metadata.
Some metadata is easy to identify and is already present in the document. Some can be found in any electronic document, while other types are specific to certain kinds of documents.
Information that we find in every document includes, for example, the date affixed to the document, or information about who created it, whether a natural person, a company, or an institution. The subject of the document, that is, a brief description of its content, is a cross-cutting piece of metadata used in the description of every type of document.
On the other hand, if we’re talking about invoices or other fiscal documents, a fundamental type of metadata is the VAT number.
Similarly, for a healthcare document such as a medical report, it could be useful to include among the metadata a reference to the facility where a diagnostic test was performed, so that the document can be retrieved more easily using this search criterion.
In short, any information contained in or associated with the document can become metadata, as long as it is able to play a role in the life cycle of this document: in its creation, in its management and preservation, and in its retrieval and subsequent use.
Why metadata is critical
Now that we have given a few examples and identified the object of our interest, let’s try to understand why metadata is essential.
Leaving aside the legal provisions, which we will return to in a moment, the first function of metadata is to allow us to search and retrieve a document from our computer archives when we need it.
In fact, when we need to retrieve a document from our archives, we need to use search keys if we don’t want to spend our days scrolling through endless lists of records. These search keys are the metadata associated with the document when it was entered in the management system or in the preservation system. Metadata such as date, subject, document number, VAT number, and so on is the tool we use to retrieve the right documents at the right time.
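As a minimal sketch of how metadata works as a set of search keys, the Python snippet below filters a small in-memory archive by any combination of metadata values. The field names (`date`, `subject`, `document_number`, `vat_number`), the sample records, and the `search` function are illustrative assumptions, not the interface of any particular document management or preservation system.

```python
from datetime import date

# Hypothetical archive: each record pairs a file name with its metadata.
ARCHIVE = [
    {"file": "invoice_001.pdf", "date": date(2021, 3, 15),
     "subject": "Invoice for consulting services",
     "document_number": "001", "vat_number": "IT00000000001"},
    {"file": "invoice_002.pdf", "date": date(2021, 4, 2),
     "subject": "Invoice for hardware supply",
     "document_number": "002", "vat_number": "IT00000000002"},
    {"file": "contract_2019.pdf", "date": date(2019, 11, 20),
     "subject": "Supply contract",
     "document_number": "C-17", "vat_number": "IT00000000001"},
]


def search(archive, **criteria):
    """Return the records whose metadata match every given key/value pair."""
    return [
        record for record in archive
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Retrieve every document associated with one VAT number.
matches = search(ARCHIVE, vat_number="IT00000000001")
print([record["file"] for record in matches])
```

Combining several keys, for example `search(ARCHIVE, vat_number="IT00000000001", document_number="C-17")`, narrows the result in the same way a combined search would in a real archive.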
Metadata for document management and preservation
In practice, the set of metadata that accompanies a document, first in the document management system and then in the preservation system, is structured according to precise criteria and standards and associated with the document itself. It’s like the information on the back of a document folder that allows us to understand whether we will find the documents we are looking for inside that folder or not.
Even though a set of metadata can potentially be composed of a wide variety of information, when the goal is to ensure effective management and preservation in line with current regulations, it is necessary to adhere to precise criteria and standards.
As we have mentioned, metadata can be subdivided according to its function.
Descriptive metadata, as it is easy to guess, aims to describe the document or digital object to which it is associated, in order to facilitate its search and retrieval. Information that specifies the subject of a document is descriptive metadata.
Administrative and management metadata, on the other hand, provides information about the treatments to which the document has been or must be subjected, in order to ensure its long-term preservation, integrity, and authenticity. Examples include metadata that provides information about access rights to the document, or metadata that indicates how long the document should be preserved.
Finally, structural metadata provides information that makes it possible to locate the document within the preservation system, to define the internal structure of the document, or to identify its relationships with other documents. For example, structural metadata allows a document to be linked to the electronic file to which it belongs, so as to establish a stable relationship between the two objects.
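To make the three categories more concrete, here is a minimal Python sketch that groups a document's metadata by function. All class names, field names, and sample values are hypothetical and chosen only for illustration; they do not reproduce any regulatory schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DescriptiveMetadata:
    subject: str        # brief description of the content
    author: str         # person, company, or institution that created it
    document_date: str  # date affixed to the document (ISO format here)


@dataclass
class AdministrativeMetadata:
    access_rights: List[str]  # who may consult the document
    retention_years: int      # how long the document should be preserved


@dataclass
class StructuralMetadata:
    file_id: str  # the electronic file the document belongs to
    related_documents: List[str] = field(default_factory=list)


@dataclass
class DocumentRecord:
    document_id: str
    descriptive: DescriptiveMetadata
    administrative: AdministrativeMetadata
    structural: StructuralMetadata


record = DocumentRecord(
    document_id="DOC-0001",
    descriptive=DescriptiveMetadata(
        subject="Supply contract",
        author="Example Supplier S.r.l.",
        document_date="2019-11-20",
    ),
    administrative=AdministrativeMetadata(
        access_rights=["legal", "purchasing"],
        retention_years=10,
    ),
    structural=StructuralMetadata(file_id="FILE-2019-SUPPLIERS"),
)
print(record.structural.file_id)
```

Keeping the three groups separate is only one possible design; it simply mirrors the functional split described above and makes it clear which fields support search, which govern handling, and which tie the document to its file.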
The role of sector regulations
Some metadata, therefore, must always be present because it is indispensable for retrieving documents; some must be present because it serves as a guarantee of the integrity of the document; and some because it contextualizes the document and helps place it correctly in relation to other documents.
In addition, other metadata is required by regulations, precisely because of its very important role.
The AgID guidelines on the creation, management, and preservation of electronic documents identify a precise set of mandatory minimum metadata that must be associated with electronic documents, electronic administrative documents, and electronic document aggregations (such as electronic files, for example). The minimum mandatory metadata must be present at all times in order to ensure legally compliant storage.
In some cases, there are specific industry regulations in addition to the Agency for Digital Italy guidelines.
This is the case for documents with fiscal and tax relevance, such as invoices, transport documents, orders, and accounting registers. Here, some regulations, such as DPR 600/72 and the Ministerial Decree of 17 June 2014, give precise indications on how these document categories must be preserved, and also on their metadata. Article 3 of the Ministerial Decree of 17 June 2014 establishes that the metadata associated with these documents must include first name and surname, denomination (company name), tax code, VAT number, and date. This is because this information, and the associations between its elements, is considered indispensable for retrieving the documents so that they can be produced for the authorities.
This is a simple example to show how industry regulations often intervene, providing precise indications that must be kept in mind when discussing digital preservation.
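A check of this kind can be sketched in a few lines of Python. The key names below are hypothetical labels for the search keys listed above, and the `missing_keys` function is an illustrative assumption; which keys are strictly required in a given case (for example, for a company rather than a natural person) should be verified against the regulation itself.

```python
# Hypothetical labels for the search keys mentioned for fiscal documents.
FISCAL_KEYS = ("name", "surname", "denomination", "tax_code", "vat_number", "date")


def missing_keys(metadata, required=FISCAL_KEYS):
    """Return the required keys that are absent or empty in the metadata."""
    return [key for key in required if not metadata.get(key)]


# Sample metadata for an invoice issued by a company (invented values).
invoice_metadata = {
    "denomination": "Example Supplier S.r.l.",
    "tax_code": "00000000001",
    "vat_number": "IT00000000001",
    "date": "2021-03-15",
}

print(missing_keys(invoice_metadata))  # ['name', 'surname']
```

In this sample the report flags `name` and `surname`, which may be perfectly acceptable for a company; the sketch only surfaces gaps and leaves the decision to whoever configures the metadata set.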
The impact of the new AgID Guidelines
The recent entry into force of the new AgID Guidelines on electronic documents has put the spotlight on the issue of metadata.
The new guidelines have changed the set of minimum mandatory metadata to be associated with an electronic document, which until now was limited to six mandatory items. The new standard doubles the number of minimum metadata items, requiring new information to be filled in, and this information is not always easy for every organization to identify.
This massive change has forced companies and specialized preservation providers to make major upgrades to their systems in order to be able to handle the new structure.
Beyond the minimum mandatory metadata
Of course, in addition to the minimum metadata required by the various regulations, it is always possible to include additional information in our metadata set, which can be useful within a specific business context and can support additional search, management, or structural functions.
At the same time, when we want to enrich a metadata set with additional elements, it is good to follow at least some basic criteria in order to avoid the proliferation of fields beyond what is needed and any unnecessary complication.
First of all, it’s good to identify the functions we need to ensure, for example the search keys we use most often to retrieve a specific type of document, and from these work out which metadata we need. Some of this information is probably already part of the minimum metadata set, such as date or subject; other information, perhaps related to a specific context or to established business practices, may be missing and may be worth including.
However, the “less is more” rule often applies: there is no point in adding a lot of information that we will never or hardly ever use to search for that particular document; it’s better to focus on a few but meaningful pieces of information. A structure that is too rich and complex, in the long run, can become difficult to manage and can prove counterproductive, unnecessarily complicating the work of operators and increasing the risk of errors.
Finally, it is good to assign “labels” to the metadata we intend to add, i.e., designations that make the subject matter immediately clear and that identify the information we should expect from that metadata. This is essential for several reasons.
First, it facilitates the work of operators and reduces the risk of configuration or compilation errors.
Second, as we move into the world of digital preservation, clarity is crucial. If we need to move our archives from one preservation service provider to another, perhaps after a few years, having used unclear metadata could cause a number of problems and undermine the degree of interoperability and comprehensibility of the indexes in preservation packages, making it more difficult to perform tasks.