Model for Data Content - Files

This is a model for publishing a dataset in a file or files. Use the model to guide your own data content publishing.

This model demonstrates how MA government organizations should design pages that include downloadable files. It shows the typical structure of such pages and provides guidance on best practices for each page element. If you have questions, please contact the Massachusetts Data Office by scheduling office hours.

Title: The title should summarize the purpose of the page as succinctly as possible using words your audience will know. It should be unique among Mass.gov pages. Keep in mind that most people find your content from search engines. They might see this page’s title without visiting your other pages. 

Subheader (or “Short Description” in the CMS): Short descriptions should complement titles. If a title summarizes what’s on the page, a short description provides more description of what the page is for and what you can use it to do. Keep them to 1-2 sentences.

Table of contents: A table of contents will automatically appear if there are 3 or more sections on your page.


Overview

The overview should briefly introduce the data and the files that contain it. Here you can explain the purpose of the data, what it represents, and anything else users need to know before they open the files.

It should provide critical context that readers need to understand the data, including:

  • A description of your data 
  • Publishing organization(s) 
  • When the data was last updated and the general cadence of updates you publish 
  • Why the data was collected 

There may be other important context to provide, too. This model dedicates space below your data downloads to an "About the data" section, where you can provide more information for users.

Please note, although there is a dedicated section within the CMS for an overview, you need to create your own (the way this model does) by adding a new heading and section content. This will allow you to organize the page so the overview of the data appears after the table of contents.

Adding contact information to your data content

Use the contact information template to provide an email for users to get in touch with your data owners. Note that on this Mass.gov content type, you'll find contact information both at the bottom and in the top right.

You may need to create a new contact information template that differs from the normal one you provide on non-data pages. Once created, you can insert it onto any page that contains data. It is possible to include both sets of contact information if you need to. 

We strongly recommend that questions about data go to a shared mailbox (e.g. AgencyData@mass.gov) rather than a specific person’s email. That way, if your data owner moves to a different role or organization, you don’t need to update your web content. Alternatively, your organization needs to be vigilant about updating contact information when your staffing changes. Either way, you or someone on your team should be accountable for receiving and responding to user feedback and questions.

Data downloads

Use this section to post your downloads. You should include a brief description of the file along with the download. If you have multiple files (for example, a dataset, a summary report, and instruments you used to gather the data), you can create a bulleted list. Here are a few examples:

A single file:

2019-2021 Massachusetts Animal Species (Mock Data)

Data from a study of animal species in Massachusetts, 2019-2021. File format is CSV. See About the data section for data dictionary. 

A single dataset published in multiple formats:

2019-2021 Massachusetts Animal Species (Mock Data) (CSV) | (XLSX)

Data from a study of animal species in Massachusetts, 2019-2021. Available in CSV and XLSX formats. See About the data section for data dictionary.

Multiple files:

 

Best practices for selecting file types & formatting

  • Choose file types that make it easy to access and manipulate data. For example, tabular data should be published in CSV or XLSX format, not in PDF or DOCX.
  • Don't break datasets into many small files. If a dataset is regularly updated, replace the currently published file with a new one that combines the old data and the updated data.
  • Don't post your files in the "additional resources" section. Create a section like this one focused on the download.
  • Choose formats that are commonly used by the researchers who are likely to look for your data.
  • Choose file names that explain what the data is about. Avoid using acronyms, jargon, and brand names that users might not be familiar with.
  • The first row of a tabular dataset should be a header, and the header names should contain only numbers, letters, hyphens, and underscores.
  • Be consistent in how you format categorical and null values. For example, a column should not alternate between using NULL and NA to represent blank values.
  • Make sure file separators are not used in data values. For example, if a CSV contains numbers that use commas, such as 1,245, computers may mistakenly treat one value as two separate numbers: 1 and 245. Choose a separator that doesn’t appear in your data (tab, colon, vertical pipe, etc.).

  • Don’t combine reports and data in the same file. Data files should only contain data. They should not contain summary statistics or metadata.
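Several of the formatting practices above can be checked automatically before a file is published. The following is a minimal Python sketch that uses only the standard library. The function name check_tabular_text, the set of null markers, and the sample rows are assumptions made for illustration; they are not part of any Mass.gov tool.

```python
import csv
import io
import re

# Header names may contain only letters, numbers, hyphens, and underscores.
HEADER_PATTERN = re.compile(r"[A-Za-z0-9_-]+")

# Assumed list of markers that publishers commonly use for missing values.
NULL_TOKENS = {"", "NULL", "NA", "N/A", "null", "None"}


def check_tabular_text(text, delimiter=","):
    """Return a list of plain-language problems found in delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["File is empty."]

    problems = []
    header, data = rows[0], rows[1:]

    # Check 1: every header name uses only the allowed characters.
    for name in header:
        if not HEADER_PATTERN.fullmatch(name):
            problems.append(
                f"Header '{name}' contains characters other than "
                "letters, numbers, hyphens, and underscores."
            )

    # Check 2: every row has as many values as the header. A mismatch
    # often means an unquoted separator appears inside a data value.
    for line_number, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(
                f"Line {line_number} has {len(row)} values "
                f"but the header has {len(header)}."
            )

    # Check 3: each column uses at most one marker for missing values.
    for index, name in enumerate(header):
        seen = {
            row[index]
            for row in data
            if index < len(row) and row[index] in NULL_TOKENS
        }
        if len(seen) > 1:
            problems.append(f"Column '{name}' mixes null markers: {sorted(seen)}.")

    return problems


# Hypothetical sample: the value 1,245 is split in two by the comma separator.
sample = "species,count\nfox,1,245\nowl,NA\n"
for problem in check_tabular_text(sample):
    print(problem)
```

Because the sketch relies on Python's csv module, a value enclosed in double quotes (for example "1,245") is read as a single value and does not trigger the row-length check.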

It's often very useful to post files in multiple formats. To do this, you can use a vertical pipe divider:

 

About the data

Include a section like this one to provide more details about your data. Describing key takeaways, data collection methods, and limitations and gaps in the data helps users understand the data and how it can be used.

Data dictionary

Your data content should include a data dictionary that lists the meaning and types of data in each field. You can do this in HTML, in a downloadable file, or both.

A good data dictionary includes plain language definitions of each variable, dimension or metric. Here's an excerpt from a data story on youth arrests.

Definitions:

Calendar year: Always refers to a period of January 1 through December 31 of the same year.

Diversion: Any program that allows youth who commit an offense to be directed away from more formal juvenile justice system involvement. In the context of youth arrests, diversion means giving a youth a warning or referring them to a program rather than making an arrest or issuing a court summons. 

Felony: Any serious crime which, if committed by an adult, could be punished by incarceration in state prison.

Fiscal year: Always refers to a period of July 1 through June 30 of the following calendar year.

Gender: The National Incident Based Reporting System (NIBRS) reports with the following options: Male or Female. Data collection for gender is officer-observed.
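One possible way to start a downloadable data dictionary is to draft a skeleton from the dataset itself and then write the plain-language definitions by hand. The Python sketch below illustrates that idea using only the standard library. The function names, the three output headings (field, type, definition), and the sample field names are hypothetical. The type guess is deliberately crude and should be reviewed by the data owner.

```python
import csv
import io


def infer_type(values):
    """Guess a simple type label for a column from its non-blank values."""
    non_blank = [v for v in values if v.strip() != ""]
    if not non_blank:
        return "unknown"
    # Try the strictest type first; fall back to "text" if nothing fits.
    for label, caster in (("integer", int), ("decimal", float)):
        try:
            for value in non_blank:
                caster(value)
            return label
        except ValueError:
            continue
    return "text"


def draft_data_dictionary(text):
    """Return CSV text with one row per field and an empty definition."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    fields = reader.fieldnames or []

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["field", "type", "definition"])
    for name in fields:
        # Missing trailing values come back as None, so treat them as blank.
        values = [row[name] or "" for row in rows]
        writer.writerow([name, infer_type(values), ""])
    return out.getvalue()


# Hypothetical mock data; the definition column is left for a person to fill in.
mock = "species,count,avg_weight_kg\nfox,12,5.4\nowl,7,1.2\n"
print(draft_data_dictionary(mock))
```

The definition column is intentionally left blank: a script can list field names and guess types, but only the people who know the data can explain what each field means.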

Additional information to include about your data

Consider doing any of the following where relevant:

  • Provide a detailed summary of the data (that is, elaborate on your overview).
  • State major takeaways.
  • State any limitations of the data if applicable.

Contact for Model for Data Content - Files

Address

1 Ashburton Place, Boston, MA 02108
