The release of public sector information (PSI) creates opportunities for innovative use and reuse of data, and allows the commercial, research and community sectors to add value. Releasing PSI also assists government in making evidence-based policy and service delivery decisions, and supports increased citizen participation in government.
Data is electronically stored information or recordings. Examples (adapted from Public.Resource.Org’s Eight Principles of Open Government Data) include documents, statistical tables, databases of contracts, transcripts of hearings, and audio/visual recordings of events. When published openly, PSI is often referred to as public data, open government data or open data.
Government data can be released to the public as PSI. In a 2008 recommendation, the Organisation for Economic Co-Operation and Development (OECD) Council for Enhanced Access and More Effective Use of Public Sector Information defined PSI as “information, including information products and services, generated, created, collected, processed, preserved, maintained, disseminated, or funded by or for the government or public institutions, taking into account [relevant] legal requirements and restrictions”.
In July 2010 the Australian Government made a Declaration of Open Government that committed to “strengthening citizen’s rights of access to information, establishing a pro-disclosure culture across Australian Government agencies including through online innovation, and making government information more accessible and usable”.
This Declaration follows the Government’s acceptance in May 2010 of recommendations by the Government 2.0 Taskforce to encourage increased availability of government information. The acceptance recognises that:
The Office of the Australian Information Commissioner (OAIC) has released the Principles on open public sector information to help agencies proactively publish information and improve their information management practices.
The OAIC encourages agencies to embed the principles in their internal policies and procedures on information management.
The recommended process for publishing and managing PSI is:
Agencies should examine their existing data and information holdings to identify those appropriate for release. This could include material that is:
Agencies should also consider datasets that have been requested either through direct community engagement or FOI requests.
Agencies are encouraged to prioritise high-value datasets that the commercial, research and community sectors can add value to. These datasets often relate to the agency’s key strategic initiatives. For example, the Department of Finance and Deregulation could consider grants program data.
Agencies should further prioritise datasets that will assist the decision-making processes of other government agencies. For example, the Australian Bureau of Statistics prioritises Census data as it underpins many government policy and investment decisions.
Information made available under an FOI request is now published in a disclosure log (subject to some exceptions). When identifying documents for publication, agencies must be aware of their publication obligations under the Information Publication Scheme, which commenced on 1 May 2011.
For more information about FOI and agency publication obligations under the Information Publication Scheme, please visit the Office of the Australian Information Commissioner website.
Before making data available as PSI, agencies must consider:
Datasets released by the government should be as complete as possible, reflecting the entirety of what is recorded about the relevant subject. Consider releasing raw data as PSI, provided there are no reasons for maintaining confidentiality in the information, such as privacy, security, cabinet confidentiality or other legal requirements.
Metadata that defines and explains the data should also be included, along with formulas and explanations of how data was derived and calculated. This helps users understand the scope of information available and examine each data item at the greatest level of detail available. If possible, including details on the data collection process is recommended. This allows users to verify how the information was collected and recorded.
Datasets released by the government should be available to the public in a timely fashion. In some cases, real-time updates help to maximise the use of information.
The quality of data must be assessed before it is released. The National Statistical Service recommends that in analysing the quality of data, agencies should:
The Australian Bureau of Statistics’ Data Quality Online tool assists in drafting data quality statements that can be provided with the data.
Before making data available as PSI, agencies must address privacy, security and other relevant concerns.
Under the Privacy Act 1988, information becomes ‘personal information’ if it is reasonably identifiable, i.e. if the identity of the individual to whom it relates can reasonably be ascertained. As more information becomes available, and particularly as datasets are made available for data-matching, it becomes easier to link information back to an individual. Steps must be taken to anonymise the data so that the confidentiality of individuals is protected while the integrity of the data and its potential value to users are maintained.
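One common anonymisation step is to release aggregated counts rather than unit records and to suppress any cell that describes only a few people. The sketch below illustrates that idea only. The function name, the field names and the threshold of 5 are hypothetical choices made for illustration; they are not taken from the Privacy Act or from any agency standard, and suppression alone does not guarantee that data cannot be re-identified.

```python
from collections import Counter


def suppress_small_cells(records, keys, threshold=5):
    """Count records per combination of `keys` and hide small counts.

    Counts below `threshold` are replaced with None so that a published
    table does not reveal groups containing only a few individuals.
    The threshold is an illustrative assumption, not a legal requirement.
    """
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return {
        group: (count if count >= threshold else None)
        for group, count in counts.items()
    }


# Made-up sample records: six people in one group, one person in another.
records = (
    [{"postcode": "2600", "age_band": "30-39"}] * 6
    + [{"postcode": "2600", "age_band": "80-89"}] * 1
)

table = suppress_small_cells(records, keys=("postcode", "age_band"))
for group, count in sorted(table.items()):
    print(group, "suppressed" if count is None else count)
```

Aggregating before release keeps the dataset useful for analysis, while the suppressed cell avoids publishing a count that could point to a single person. An agency would still need to assess whether other released datasets could be matched against the table.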
Before data is released as PSI, agencies should assess whether there are reasons it should not be released, such as national security or other classification restrictions (for example, cabinet-in-confidence). Material that is published (or could be published) on a public website is usually suitable as PSI.
Agencies should consider legislative and contractual requirements applicable to data, for example:
Open licences, such as Creative Commons, enable content to be shared and reused in new and useful ways without requiring permission to be sought from the copyright owner.
Under the Statement of Intellectual Property Principles for Australian Government Agencies, released by the Attorney-General’s Department (AGD) in October 2010, agencies must consider a Creative Commons Attribution Australia licence as the starting point when releasing PSI and may only use more restrictive licences when there is a compelling reason to do so. The Australian Governments Open Access and Licensing Framework (AusGOAL) is a useful tool to assist in licensing decisions.
To issue any sort of material under an open licence, an agency should either:
Options for publishing datasets include:
An agency can publish a dataset directly on its website. The dataset can be posted as a supplement to a publication or report, but it often justifies a stand-alone ‘landing page’, complete with its own metadata and documentation. An agency’s website should be capable of using metadata (discussed further below) to allow the landing page to be easily discoverable as a dataset.
Data.gov.au is the central access and discovery point for Australian Government data. It is capable of hosting agency data for public release or linking to existing sources of government data (e.g. on an agency website) and allows agencies to manage the associated metadata.
An agency can make its dataset discoverable (and optionally hosted) by using an existing collection or catalogue repository containing datasets related to its domain of interest. Examples include:
Third party sites or external hosting services can be used by agencies after considering privacy, security and legal implications.
In considering how to publish data as PSI, an agency must consider issues of:
Datasets released by the Australian Government should be machine-readable. For more information and guidance on accessibility, refer to Web Guide – Accessibility.
Published datasets should be non-discriminatory. Non-discriminatory access means that any person can access the data at any time without having to identify themselves or provide a justification for access. Barriers to use of data, which could include registration or membership requirements or limited access to data, should be removed.
An open standard is a form of technology that has been documented and is available for reuse on different platforms without proprietary restrictions.
Agencies should avoid using data formats that are dependent on proprietary software to open and interpret their datasets. Open, platform-independent and machine-consumable standards are recommended wherever possible. Human-readable documents (e.g. Microsoft Word, RTF and PDF files) are unlikely to constitute a downloadable dataset. Large datasets can be provided in a compressed form (e.g. zip file) to minimise download time. Linked Data or an Application Programming Interface (API) may also facilitate access to data.
Examples of possible substitute formats for commonly-used, proprietary formats:
|Internal Proprietary Format|Example Use|Possible Alternative Open Format(s)|
|Access Database|Series of related tables|XML, CSV, RDF|
|Excel Spreadsheet|Series of related tables|XML, CSV, RDF|
|ESRI file|Large quantity of geospatial data|KML/KMZ, SHP, TAB, MID/MIF|
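As an illustration of the open-format and compression advice above, the following sketch writes tabular data to CSV and packages it in a zip archive using only the Python standard library. The function name, file name, column names and sample rows are all hypothetical; a real export would read rows from the agency's own database or spreadsheet.

```python
import csv
import io
import zipfile


def rows_to_zipped_csv(rows, fieldnames, csv_name):
    """Serialise `rows` (a list of dicts) to CSV and return zip archive bytes."""
    text = io.StringIO(newline="")
    writer = csv.DictWriter(text, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the CSV as UTF-8 so it opens consistently across platforms.
        zf.writestr(csv_name, text.getvalue().encode("utf-8"))
    return archive.getvalue()


# Made-up sample rows standing in for an exported table.
grants = [
    {"program": "Example Program A", "year": 2011, "amount_aud": 10000},
    {"program": "Example Program B", "year": 2011, "amount_aud": 25000},
]

zip_bytes = rows_to_zipped_csv(
    grants, ["program", "year", "amount_aud"], "grants.csv"
)
with open("grants.zip", "wb") as out:
    out.write(zip_bytes)
```

CSV suits a single table. A series of related tables, as in the Access or Excel rows above, would typically need one CSV file per table, which the same archive can hold.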
Datasets must be sufficiently described using applicable metadata. Although domain-specific or whole-of-government data catalogues will have different metadata requirements, they will describe datasets using a core set of metadata values. For example:
Additional metadata which should be considered to improve usability of data could include:
Metadata is essential in improving the discoverability of data – the more information provided in describing a dataset, the more discoverable the data is. For example, the title for a dataset might be “Characteristics of Recent Migrants, Australia (Nov 2010)”. This offers limited information for search engines to utilise as it provides inadequate information on the contents of the dataset. Adding a detailed description, such as “Characteristics of Recent Migrants presents data on migration category, country of birth, proficiency in spoken English, educational attainment on arrival and since arrival, employment prior to arrival and since arrival, and sources of household income”, provides many more searchable terms and greatly improves the discoverability of the data.
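The effect described above can be made concrete by comparing the searchable words in the title alone with those in the title plus description. The sketch below is a rough illustration only: the `title` and `description` field names are hypothetical, and the word-splitting is far simpler than what a real search engine or data catalogue does.

```python
import re


def search_terms(text):
    """Return the set of lower-cased alphabetic words in `text`."""
    return set(re.findall(r"[a-z]+", text.lower()))


# Title and description taken from the example in the paragraph above.
record = {
    "title": "Characteristics of Recent Migrants, Australia (Nov 2010)",
    "description": (
        "Characteristics of Recent Migrants presents data on migration "
        "category, country of birth, proficiency in spoken English, "
        "educational attainment on arrival and since arrival, employment "
        "prior to arrival and since arrival, and sources of household income"
    ),
}

title_terms = search_terms(record["title"])
all_terms = title_terms | search_terms(record["description"])
added_by_description = all_terms - title_terms

print(len(title_terms), "terms from the title alone")
print(len(all_terms), "terms with the description included")
print(sorted(added_by_description))
```

Words such as "employment", "income" and "birth" appear only in the description, so a search on those topics could match the dataset only when the description is published with it.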
For more information, refer to the Web Guide’s advice on metadata.
Once an agency has released its PSI in a useful format and under a permissive licence, consideration should then turn to refining the dataset to ensure its future usefulness. Refining a dataset may include revisiting many of the considerations discussed above. It also includes elements such as the permanence of the data, engagement with dataset users and improvement of the data.
Agencies must consider the permanence of any datasets they release. Permanence refers to the capability of finding information over time. Agencies are required to ensure older information is archived but still accessible and available to the public for reuse.
Permanence may also include providing updated versions of datasets over time, for example as the agency collects new data. Providing data across a period of time allows analyses of trends and can demonstrate improvements in Government services.
Members of the public or organisations making use of Australian Government datasets may be able to provide useful feedback and suggestions for improvement. For example, data.gov.au allows users to comment on and rate datasets, providing the ability for agencies to engage with users of their datasets.
Engaging with dataset users could also provide benefits in allowing an agency to see where and how their dataset is being used. This in turn could demonstrate the benefits of allowing open access to the data, such as the development of new applications.
Building on the permanence and engagement phases above, agencies should consider any potential improvements they could make to their datasets over time. For example, should the data be released in a different or additional format? Is it adequately described through metadata? Would there be value in allowing access to the data through an API? Does the agency have any other data holdings which could be incorporated into or released separately from the dataset to add additional value?
Last Reviewed: 2011-12-23