Publishing Public Sector Information

The release of public sector information (PSI) creates opportunities for innovative use and reuse of data, and allows the commercial, research and community sectors to add value. Releasing PSI also assists government in making evidence-based policy and service delivery decisions, and supports increased citizen participation in government.

Data is electronically stored information or recordings. Examples (adapted from Public.Resource.Org’s Eight Principles of Open Government Data) include documents, statistical tables, databases of contracts, transcripts of hearings, and audio/visual recordings of events. When published openly, PSI is often referred to as public data, open government data or open data.

Government data can be released to the public as PSI. In its 2008 Recommendation for Enhanced Access and More Effective Use of Public Sector Information, the Council of the Organisation for Economic Co-operation and Development (OECD) defined PSI as “information, including information products and services, generated, created, collected, processed, preserved, maintained, disseminated, or funded by or for the government or public institutions, taking into account [relevant] legal requirements and restrictions”.

Why is it important?

In July 2010 the Australian Government made a Declaration of Open Government that committed to strengthening citizens’ rights of access to information, establishing a pro-disclosure culture across Australian Government agencies, including through online innovation, and making government information more accessible and usable.

This Declaration follows the Government’s acceptance in May 2010 of recommendations by the Government 2.0 Taskforce to encourage increased availability of government information. The acceptance recognises that:

  • using technology to increase citizen engagement and collaboration on policy making and providing service will help achieve a more consultative, participatory and transparent government
  • public sector information is a national resource and that releasing as much of it on as permissive terms as possible will maximise its economic and social value to Australians and reinforce its contribution to a healthy democracy.

These initiatives are supported by the reforms to the Freedom of Information Act 1982 (FOI Act). The FOI Act reforms promote disclosure of government information, where appropriate, to:

  • give the Australian community access to information by requiring agencies to publish the information, and provide a right of access
  • contribute to increased citizen participation in government processes and increased scrutiny, discussion and review of government activities
  • increase recognition that information held by government is a national resource
  • promote public access to information promptly and at the lowest reasonable cost.

Principles on open public sector information

The Office of the Australian Information Commissioner (OAIC) has released the Principles on open public sector information to help agencies proactively publish information and improve their information management practices.

The OAIC encourages agencies to embed the principles in their internal policies and procedures on information management.

  • Principle 1: Open access to information – a default position
  • Principle 2: Engaging the community
  • Principle 3: Effective information governance
  • Principle 4: Robust information asset management
  • Principle 5: Discoverable and useable information
  • Principle 6: Clear reuse rights
  • Principle 7: Appropriate charging for access
  • Principle 8: Transparent enquiry and complaints processes

Publishing PSI

The recommended process for publishing and managing PSI is:

[Diagram: the continuous cycle of Discover, Process, License, Publish and Refine involved in publishing PSI]

1. Discover

Agencies should examine their existing data and information holdings to identify those appropriate for release. This could include material that is:

  • in the public domain but could be more useful if published as a dataset
  • collected by the agency but not previously published
  • collected by the agency and sold on a commercial basis (where a case for open access can be made)
  • currently in a non-electronic form but can be made available electronically where reasonable and feasible.

Agencies should also consider datasets that have been requested either through direct community engagement or FOI requests.

Agencies are encouraged to prioritise high-value datasets to which the commercial, research and community sectors can add value. These datasets often relate to the agency’s key strategic initiatives. For example, the Department of Finance could consider grants program data.

Agencies should further prioritise datasets that will assist the decision-making processes of other government agencies. For example, the Australian Bureau of Statistics prioritises Census data as it underpins many government policy and investment decisions.

Freedom of Information

Information made available under an FOI request is now published in a disclosure log (subject to some exceptions). When identifying documents for publication, agencies must be aware of their publication obligations under the Information Publication Scheme, which commenced on 1 May 2011.

For more information about FOI and agency publication obligations under the Information Publication Scheme, please visit the Office of the Australian Information Commissioner website.

Dataset considerations

Before making data available as PSI, agencies must consider:

  • completeness and sensitivity
  • timeliness
  • usage costs
  • quality.

Completeness and Sensitivity

Datasets released by the government should be as complete as possible, reflecting the entirety of what is recorded about the relevant subject. Consider releasing raw data as PSI, provided there are no reasons for keeping the information confidential, such as privacy, security, cabinet confidentiality or other legal requirements.

Metadata that defines and explains the data should also be included, along with formulas and explanations of how data was derived and calculated. This helps users understand the scope of information available and examine each data item at the greatest level of detail available. If possible, including details on the data collection process is recommended. This allows users to verify how the information was collected and recorded.

Timeliness

Datasets released by the government should be available to the public in a timely fashion. In some cases, real-time updates help to maximise the use of information.

Data quality

The quality of data must be assessed before it is released. The National Statistical Service recommends that in analysing the quality of data, agencies should:

  • assist data users, data producers, data custodians, and data owners by communicating the key data quality issues
  • provide information about data quality to enable information use and reuse by the wider community
  • assist people using data to understand appropriate use and assess whether it can be used for the purpose they have in mind.

The Australian Bureau of Statistics’ Data Quality Online tool assists in drafting data quality statements that can be provided with the data.
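One input to a data quality statement is a simple measure of how complete each field is. The following is a minimal sketch only, not part of the Data Quality Online tool; the sample data, column names and the `completeness_by_column` function are hypothetical and chosen for illustration.

```python
import csv
import io

# Hypothetical sample data; column names and values are illustrative only.
SAMPLE_CSV = """program,recipient_postcode,amount
Community Grants,2600,15000
Community Grants,,8000
Research Grants,3000,
"""


def completeness_by_column(csv_text):
    """Return the share of non-empty values for each column (0.0 to 1.0)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    summary = {}
    for column in rows[0].keys():
        # Treat missing or whitespace-only cells as empty.
        filled = sum(1 for row in rows if (row[column] or "").strip())
        summary[column] = filled / len(rows)
    return summary


report = completeness_by_column(SAMPLE_CSV)
for column, share in report.items():
    print(f"{column}: {share:.0%} complete")
```

Figures such as these could be quoted in the quality statement so that users can judge whether the dataset suits their purpose.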

2. Process

Before making data available as PSI, agencies must address privacy, security and other relevant concerns.

Privacy

Under the Privacy Act 1988, information is ‘personal information’ if it is reasonably identifiable, that is, if the identity of the individual to whom it relates can reasonably be ascertained. As more information becomes available, and particularly as datasets are made available for data-matching, it becomes easier to link information back to an individual. Agencies must take steps to anonymise the data to protect the confidentiality of individuals, while maintaining the integrity of the data and its potential value to users.
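As an illustration only, the sketch below shows two common anonymisation steps: removing direct identifiers and coarsening fields that could identify a person in combination. It then reports the smallest group of records sharing the same coarsened values, since a group of one suggests a record may still be re-identifiable. The records, field names and functions are hypothetical, and this is not a complete or endorsed de-identification method.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative only.
records = [
    {"name": "A. Citizen", "age": 34, "postcode": "2601", "service": "Grant"},
    {"name": "B. Citizen", "age": 37, "postcode": "2602", "service": "Grant"},
    {"name": "C. Citizen", "age": 52, "postcode": "3000", "service": "Advice"},
]

# Assumption: fields that identify a person outright.
DIRECT_IDENTIFIERS = {"name"}


def generalise(record):
    """Drop direct identifiers and coarsen age and postcode."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    decade = (out["age"] // 10) * 10
    out["age"] = f"{decade}-{decade + 9}"          # e.g. 34 becomes "30-39"
    out["postcode"] = out["postcode"][:2] + "xx"   # e.g. "2601" becomes "26xx"
    return out


def smallest_group(rows, keys):
    """Size of the smallest group of rows sharing the same values for keys."""
    groups = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(groups.values())


released = [generalise(r) for r in records]
k = smallest_group(released, ["age", "postcode"])
print(released)
print("smallest group size:", k)
```

In this sample the third record remains unique after coarsening, so further generalisation or suppression would need to be considered before release.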

Confidentiality

Before data is released as PSI, agencies should assess whether there are reasons it should not be released, such as national security or other classification restrictions (for example, cabinet-in-confidence). Material that is published (or could be published) on a public website is usually suitable as PSI.

Legal, Policy and Contractual Requirements

Agencies should consider legislative and contractual requirements applicable to data.

3. License

Open licences, such as Creative Commons, enable content to be shared and reused in new and useful ways without requiring permission to be sought from the copyright owner.

Under the Statement of Intellectual Property Principles for Australian Government Agencies, released by the Attorney-General’s Department (AGD) in October 2010, agencies must consider a Creative Commons Attribution Australia licence as the starting point when releasing PSI and may only use more restrictive licences when there is a compelling reason to do so. The Australian Governments Open Access and Licensing Framework (AusGOAL) is a useful tool to assist in licensing decisions.

Third party material/conditions of use restrictions

To issue any sort of material under an open licence, an agency should either:

  • own copyright in the material, or
  • have permission from the copyright owner to reuse the material (through specifically granted permission or through a licensing arrangement such as Creative Commons).

4. Publish

Where to publish

Options for publishing datasets include:

  • agency websites
  • data.gov.au
  • data collections or catalogues
  • third party sites.

Agency websites

An agency can publish a dataset directly on its website. The dataset can be posted as a supplement to a publication or report, but it often justifies a stand-alone ‘landing page’, complete with its own metadata and documentation. An agency’s website should be capable of using metadata (discussed further below) to allow the landing page to be easily discoverable as a dataset.

Data.gov.au

Data.gov.au is the central access and discovery point for Australian Government data. It is capable of hosting agency data for public release or linking to existing sources of government data (e.g. on an agency website) and allows agencies to manage the associated metadata.

Collections or catalogues

An agency can make its dataset discoverable (and optionally hosted) by using an existing collection or catalogue repository containing datasets related to its domain of interest.

Third party site

Third party sites or external hosting services can be used by agencies after considering privacy, security and legal implications.

How to publish

In considering how to publish data as PSI, an agency must consider issues of:

  • accessibility
  • non-discrimination
  • open standards
  • metadata/documentation.

Accessibility

Datasets released by the Australian Government should be machine-readable. For more information and guidance on accessibility, refer to Web Guide – Accessibility.

Non-discrimination

Published datasets should be non-discriminatory. Non-discriminatory access means that any person can access the data at any time without having to identify themselves or provide a justification for access. Barriers to use of data, which could include registration or membership requirements or limited access to data, should be removed.

Open standards

An open standard is a form of technology that has been documented and is available for reuse on different platforms without proprietary restrictions.

Agencies should avoid using data formats that are dependent on proprietary software to open and interpret their datasets. Open, platform-independent and machine-consumable standards are recommended wherever possible. Human-readable documents (e.g. Microsoft Word, RTF and PDF files) are unlikely to constitute a downloadable dataset. Large datasets can be provided in a compressed form (e.g. zip file) to minimise download time. Linked Data or an Application Programming Interface (API) may also facilitate access to data.
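As a minimal sketch of the approach described above, the following writes a small table to CSV (an open, machine-readable format) and wraps it in a zip archive to reduce download size. The table contents, the `grants.csv` filename and the helper functions are hypothetical; in practice the rows would be exported from the agency's own source system.

```python
import csv
import io
import zipfile

# Hypothetical table; values are illustrative only.
header = ["program", "year", "amount"]
rows = [
    ["Community Grants", 2010, 15000],
    ["Research Grants", 2010, 42000],
]


def to_csv_text(header, rows):
    """Serialise a table to CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def to_zip_bytes(filename, text):
    """Wrap a single text file in a compressed zip archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(filename, text.encode("utf-8"))
    return buffer.getvalue()


csv_text = to_csv_text(header, rows)
zip_bytes = to_zip_bytes("grants.csv", csv_text)
print(csv_text)
print(len(zip_bytes), "bytes in archive")
```

Writing to memory keeps the sketch self-contained; the same functions could write to files on disk before the archive is uploaded.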

Examples of possible substitute formats for commonly-used, proprietary formats:

Internal Proprietary Format | Example Use | Possible Alternative Open Format(s)
Access Database | Series of related tables | XML, CSV, RDF
Excel Spreadsheet | Series of related tables | XML, CSV, RDF
ESRI file | Large quantity of geospatial data | KML/KMZ, SHP, TAB, MID/MIF

Metadata standards and documentation

Datasets must be sufficiently described using applicable metadata. Although domain-specific or whole-of-government data catalogues will have different metadata requirements, they will describe datasets using a core set of metadata values. For example:

  • Title
  • Description / Abstract
  • Date Published
  • Authoring Agency
  • Subject / category
  • Licence
  • Temporal Coverage
  • Spatial Coverage
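The core values listed above could be captured in a simple record and checked for gaps before publication. This is a sketch under stated assumptions: the `DatasetMetadata` class, its field names and the sample values are hypothetical and do not represent the schema of any particular catalogue.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    """Core metadata values; field names are an illustrative mapping only."""
    title: str = ""
    description: str = ""
    date_published: str = ""
    authoring_agency: str = ""
    subject: str = ""
    licence: str = ""
    temporal_coverage: str = ""
    spatial_coverage: str = ""


def missing_fields(metadata):
    """List the core fields that have been left empty."""
    return [name for name, value in asdict(metadata).items() if not value.strip()]


# Hypothetical, partly completed record.
record = DatasetMetadata(
    title="Example dataset",
    authoring_agency="Example Agency",
    licence="Creative Commons Attribution",
    spatial_coverage="Australia",
)
print(json.dumps(asdict(record), indent=2))
print("missing:", missing_fields(record))
```

A check of this kind could run before each release so that incomplete records are caught early.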

Additional metadata that could improve the usability of data includes:

  • Low-level keywords / tags
  • Granularity
  • Update frequency
  • Date of last update
  • Agency program
  • Agency jurisdiction
  • Collection mode

Metadata specific to the agency’s domain of interest should also be incorporated, for example as set out in the AGLS metadata standard and by ANZLIC.

Metadata is essential in improving the discoverability of data – the more information provided in describing a dataset, the more discoverable the data is. For example, the title for a dataset might be “Characteristics of Recent Migrants, Australia (Nov 2010)”. This offers limited information for search engines to utilise as it provides inadequate information on the contents of the dataset. Adding a detailed description, such as “Characteristics of Recent Migrants presents data on migration category, country of birth, proficiency in spoken English, educational attainment on arrival and since arrival, employment prior to arrival and since arrival, and sources of household income”, provides many more searchable terms and greatly improves the discoverability of the data.
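The effect of a fuller description can be illustrated by counting the distinct terms available for matching. The sketch below uses the title and description quoted above; the tokenising rule and the stop-word list are simplifying assumptions, and real search engines index text differently.

```python
import re

TITLE = "Characteristics of Recent Migrants, Australia (Nov 2010)"
DESCRIPTION = (
    "Characteristics of Recent Migrants presents data on migration category, "
    "country of birth, proficiency in spoken English, educational attainment "
    "on arrival and since arrival, employment prior to arrival and since "
    "arrival, and sources of household income"
)

# Assumption: a small, illustrative stop-word list.
STOP_WORDS = {"of", "on", "and", "to", "in", "since", "prior", "presents"}


def searchable_terms(text):
    """Return the distinct lower-case words in text, ignoring stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {word for word in words if word not in STOP_WORDS}


title_terms = searchable_terms(TITLE)
all_terms = title_terms | searchable_terms(DESCRIPTION)
print(len(title_terms), "terms from the title alone")
print(len(all_terms), "terms with the description added")
print(sorted(all_terms - title_terms))
```

The final line lists the terms contributed only by the description, such as those relating to employment and income.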

For more information, refer to the Web Guide’s advice on metadata.

5. Refine

Once an agency has released its PSI in a useful format and under a permissive licence, consideration should then turn to refining the dataset to ensure its future usefulness. Refining a dataset may include revisiting many of the considerations discussed above. It also includes elements such as the permanence of the data, engagement with dataset users and improvement of the data.

Permanence

Agencies must consider the permanence of any datasets they release. Permanence refers to the capability of finding information over time. Agencies are required to ensure older information is archived but still accessible and available to the public for reuse.

Permanence may also include providing updated versions of datasets over time, for example as the agency collects new data. Providing data across a period of time allows analyses of trends and can demonstrate improvements in Government services.

Engagement

Members of the public or organisations making use of Australian Government datasets may be able to provide useful feedback and suggestions for improvement. For example, data.gov.au allows users to comment on and rate datasets, providing the ability for agencies to engage with users of their datasets.

Engaging with dataset users could also provide benefits in allowing an agency to see where and how their dataset is being used. This in turn could demonstrate the benefits of allowing open access to the data, such as the development of new applications.

Improvement

Building on the permanence and engagement phases above, agencies should consider any potential improvements they could make to their datasets over time. For example, should the data be released in a different or additional format? Is it adequately described through metadata? Would there be value in allowing access to the data through an API? Does the agency have any other data holdings which could be incorporated into or released separately from the dataset to add additional value?

Last Reviewed: 2011-12-23