Government of Canada Web Archive Launches the Vancouver 2010 Olympic and Paralympic Collection

Image of fingers on a keyboard

By Tom J. Smyth

As we mark the fifteenth anniversary of the Vancouver 2010 Olympic and Paralympic Games, LAC is proud to launch a web archival collection documenting this important event in Canada’s history.

A poster for the Vancouver Olympics titled “With glowing hearts.”

Image from the web archive homepage of the Vancouver Olympics.

What is web archiving and why do we do it?

“Web archiving” is a specialized digital curation and preservation-based discipline that guarantees future access to unique resources from the Internet. It uses specialized hardware and software to target, download, arrange, describe, preserve and replay the original published and interactive context of web resources via emulation in a specialized public discovery and access portal.

Web archiving is practised by national libraries and archives all over the world to capture and preserve web resources that are usually unique and expressed in no other medium. Preserving our digital documentary heritage from our national Internet domain is, therefore, of vital importance to the nation’s history.

Acquiring web resources became a formal part of LAC’s mandate in 2004 under the Library and Archives of Canada Act, subsection 8(2). LAC’s means of realizing this part of its mandate is the Web and Social Media Preservation Program (WSMPP) within the Digital Services Sector, which has operated as a daily activity since mid-2005.

The program curates data and research collections of unique web resources documenting Canadian historical and cultural themes and events. Curating these collections aligns with LAC’s priorities and policy frameworks, requirements of computational use (e.g. in textual and data mining, AI, Machine Learning [ML], and Large Language Models [LLMs]) and modern digital humanities scholarship. We then make these resources publicly available via the Government of Canada Web Archive (GCWA), for generations to come and to support future international research on Canada.

The discipline is advanced by the 50-plus members of the International Internet Preservation Consortium, of which LAC is a founding member and currently holds a Steering Committee chair.

Web archival collections curation for the Olympics

From the inception of the Web and Social Media Preservation Program, LAC has collected resources on the Olympic games as they were running, beginning with the Torino 2006 Winter Games (Turin, Italy).

In the beginning, our effort was modest and consisted of collecting the official Olympic site and the Canadian Olympic Committee site. We then progressed into collecting information on federal support programs (“Own the Podium”), individual Olympic sport organizations and the athlete blogs.

LAC’s extensive holdings in web archival Olympic and Paralympic collections now include:

  • 2006 Winter, Turin, February 10–26, 2006
  • 2008 Summer, Beijing, August 8–24, 2008
  • 2010 Winter, Vancouver, February 12–28, 2010
  • 2012 Summer, London, July 27–August 12, 2012
  • 2014 Winter, Sochi, February 7–23, 2014
  • 2016 Summer, Rio de Janeiro, August 5–21, 2016
  • 2018 Winter, Pyeongchang, February 9–25, 2018
  • 2020 Summer, Tokyo, July 23–August 8, 2021 (postponed from 2020)
  • 2022 Winter, Beijing, February 4–20, 2022
  • 2024 Summer, Paris, July 26–August 11, 2024

Canada has hosted the Olympic Games on three occasions: the 1976 Summer Games in Montreal, the 1988 Winter Games in Calgary, and most recently, the 2010 Winter Games in Vancouver.

The Vancouver 2010 Winter Olympic and Paralympic Games ran February 12–28, 2010 (1). Canada sent some 209 athletes to the Olympic Games, our fourth-greatest contribution historically, where they placed third in the overall medal standings with 14 gold, 7 silver and 5 bronze medals (Canada, however, placed first in total gold medals) (2).

Women’s hockey team celebrating their victory on the ice. Goalie’s net is displaced, helmets, gloves and hockey sticks are on the ice around the players as they are celebrating.

Team Canada celebrates after winning the women’s hockey gold medal game at the Vancouver Olympics in February 2010. Credit: Jason Ransom. (MIKAN 5570828)

The 2010 Games were special for Canada and involved “unprecedented partnerships” with some Indigenous communities (although these partnerships do not speak for or reflect the opinions of all Indigenous groups). They were also Canada’s most recent and greatest Olympic hosting effort, and they marked an important milestone for the Web and Social Media Preservation Program in developing both the program itself and its thematic collection and curation methodologies.

Evolving collection development and web archival digital curation

Beginning with Vancouver 2010, we have continuously elaborated our methodologies and curated extensive web archival collections documenting Canada’s performance and perspectives, as well as the experiences of Canadian Olympians at the Winter, Summer and Paralympic Games.

Curation for Vancouver 2010 began in June 2009. At that time, we were approached by an academic researcher who was interested in web archiving, particularly in the promotion of tourism and related sports activities. How was tourism in British Columbia being promoted while it hosted the games?

We had to admit that our answer to the question of tourism in British Columbia was… “no idea!” Starting the curation process early, however, gave us plenty of lead time to collect news media and web resources documenting preparations and developments leading up to the formal games. It also allowed us to consider new and uniquely Canadian perspectives in our curation, such as Indigenous viewpoints.

Data and web resources on “tourism” as a parallel topic to the Olympic and Paralympic Games weren’t something we had deliberately targeted and collected previously (again, we hadn’t hosted a Games event since Calgary 1988). This raised the question: what other resources or themes would researchers be looking for in our web archive that we hadn’t anticipated?

This question began something of a renaissance in our curation thinking and our alignment with broader principles of national legal deposit. Since client research needs can never be fully anticipated, it is important that we collect resources as broadly as possible. To take it a step further: how could we curate and arrange our data in such a way that it would support future computational and digital humanities research use of web archival collections as “big data”?

We then began considering new themes and sub-themes for curation, such as infrastructural and venue development, environmental and “green” impact, economic impact of hosting the Games and even anti-Olympic sentiment. Expanding our focus in this way required additional research but resulted in a much richer and more comprehensive web archive for future generations.

This effort paid off. Before the end of 2009, the work came to the attention of our host organization. The Federal Secretariat for the Olympic and Paralympic Games at Heritage Canada learned of our project and expressed interest in promoting the work. The project was then showcased in the 2009-10 Government of Canada Performance Report (3) as part of LAC’s and the Secretariat’s deliverables for the Vancouver 2010 Games.

Our current collections methodology has matured to the point where many topics, such as the Olympics (also the federal government domain presence, change of government or cabinet, the federal elections and so on), now have a refined “core seedlist.” A core seedlist is a set of web URLs that are unlikely to change and that can be quickly, efficiently and frequently collected as the key resources for those topics. This frees web archiving specialists to concentrate on curating and including extra resources that are generated as a direct result of, and are attuned more specifically to, unique events. A pertinent example is the Paris 2024 games.
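To make the idea of a core seedlist more concrete, here is a minimal sketch in Python of how a stable core list might be combined with event-specific additions before a crawl. It is an illustration only, not LAC’s actual tooling: the function names, the normalization rules and the example.org URLs are all assumptions made for this example.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Lowercase the scheme and host and drop a trailing slash,
    so that trivially different spellings of a URL compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def build_seedlist(core_seeds, event_seeds):
    """Return core seeds first, then event-specific seeds, without duplicates."""
    seen = set()
    merged = []
    for url in list(core_seeds) + list(event_seeds):
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            merged.append(key)
    return merged


# Hypothetical placeholder URLs, not real collection seeds.
CORE_OLYMPIC_SEEDS = [
    "https://example.org/olympic-committee/",
    "https://example.org/official-games-site",
]
PARIS_2024_SEEDS = [
    "https://example.org/breaking",
    "https://EXAMPLE.org/official-games-site/",  # duplicate of a core seed
]

seedlist = build_seedlist(CORE_OLYMPIC_SEEDS, PARIS_2024_SEEDS)
for seed in seedlist:
    print(seed)
```

Keeping the core list separate means it can be reused unchanged for each Games, while the event-specific list carries only what is new for that event.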

Paris 2024 and announcing public access for the Vancouver 2010 collection

For the Paris 2024 games, there would clearly be some new issues and topics that perhaps weren’t as relevant or that didn’t exist in 2010. For example, eSports first became a serious consideration for the formal Olympics, and we also witnessed the introduction of “breaking” as an Olympic sport. Security was also a major concern, which was curated as a major topic for the first time.

While our initial intention was to publish the Paris 2024 collection to kick off our Olympics curation, we discovered that the most extensive work on this had already been done while preparing the web archival metadata and controlled vocabularies for the Vancouver 2010 collection. It therefore made sense for the Vancouver collection to kick off our Olympics publishing, as it could serve as the most complex example and the “template” model for arranging our historical Olympics collections via the Government of Canada Web Archive.

Wouldn’t it be grand(er), if we could lead our Olympics collections with the publication of one dear to our hearts, which was pivotal to the development of the program?

On that note, we are pleased to launch our Vancouver 2010 Olympic and Paralympic collection—within days of the fifteenth anniversary of the Games!

To facilitate browsing and discovery, the collection has been arranged into sub-topics including the following:

  • Blogs
  • Own the Podium
  • Sponsors
  • Tourism
  • Government – municipal
  • Government – provincial
  • Government – federal
  • Environment
  • Indigenous perspectives
  • Sports organizations
  • Non-profit organizations
  • Education
  • Canada Post
  • Official Olympics websites
  • Community
  • News media
  • Alternative perspectives and protests
  • Venues
  • Athletes
  • Paralympics
  • Corporate
  • Commemoration
  • Looking back

In establishing these topics and facets, controlled vocabularies and metadata architecture necessary to support, arrange and publish the Vancouver 2010 Olympic and Paralympic collection, we have set the groundwork on which to build, expand, augment, and publish all our other historical Olympics collections, which can now follow in due course.

We hope you enjoy the Vancouver 2010 collection!

References

  1. Vancouver 2010 – Team Canada – Official Olympic Team Website
  2. Team Canada’s Team Size by Olympic Winter Games – Team Canada – Official Olympic Team Website
  3. Report of the President of the Treasury Board of Canada. Canada’s Performance: The Government of Canada’s Contribution. Annual Report to Parliament 2009-10, p. 77.

Tom J. Smyth is the Manager of the Web and Social Media Preservation Program (WSMPP) and the Government of Canada Web Archive (GCWA) at Library and Archives Canada. The WSMPP team includes Elizabeth Doyle, Jason Meng, Kevin Palendat and Russell White.

Improving your online experience: Launch of the new Government of Canada Web Archive

Image of fingers on a keyboard

By Tom J. Smyth

Introduction and program history

Library and Archives Canada (LAC) is the nation’s designated national memory institution, with a legislated mandate to acquire, describe, preserve and provide long-term access to Canada’s documentary heritage.

This includes the Canadian Web! Resources in formats for the Web are recognized internationally as an important facet of a nation’s modern digital heritage. These irreplaceable web resources are important evidence of Canadian history and culture in the 21st century, but they are volatile and prone to disappearing without warning.

What can be done about this? How do we “rescue” resources generated in real time, which exist outside the normal production streams of archival records or traditional publications? How do we safeguard web resources that can contain information found in no other medium, and that may document national historic events or important aspects of culture as they are unfolding?

Owing to their precarious nature, immediate and managed action is required to select, arrange, make available and ensure the digital preservation and data continuity of web resources that constitute Canadian digital documentary heritage. This action is referred to internationally as “web archiving,” which is a discipline based on digital preservation and curation that is practised and advanced by, for example, the 50-plus members of the International Internet Preservation Consortium (of which LAC is a founding member).

Acquiring web resources became a formal part of LAC’s mandate in 2004 under the Library and Archives of Canada Act, subsection 8(2). LAC’s means of realizing this part of its mandate is the Web and Social Media Preservation Program within the Digital Services Sector. The program curates data and research collections of unique web resources documenting Canadian historical and cultural themes and events, in alignment with the requirements of modern digital scholars. It also makes these resources available to the public for posterity and to support future international research on Canada.

The web resources acquired by the program are made available through the Government of Canada Web Archive (GCWA). While the program and the GCWA are well known in Canada, their scale may not be.

How big is the GCWA? How much data does the GCWA contain?

In 2022–23, the Web and Social Media Preservation Program at LAC reached an important milestone.

As of February 2023, we are pleased that the GCWA has exceeded 120 terabytes of total data and surpassed 3.1 billion assets or documents.

This is about the same amount of data as 4,600 Blu-ray movie discs (1,150 in 4K, or 384 copies of your favourite movie trilogy in 4K). If the GCWA were printed out on paper, it would take up some 57.5 billion sheets; stacking this up, it would reach the same height as 12,263 CN Towers!

Some program clients may be surprised to hear this because, since 2005, LAC has provided public access only to portions of its federal web archival collections. This means that fully 50 percent of the total collections have never been available to the public until now.

Screenshot of a Government of Canada Web Archive page.

New functionalities and features of the relaunched Government of Canada Web Archive (GCWA)

New collections

We are delighted to announce that, with the relaunch of the GCWA in 2023, LAC will begin providing access to all non-federal collections curated since 2005. At the time of launch, the following collections will be available:

  • The Truth and Reconciliation Commission Collection (curated in partnership with the Centre for Truth and Reconciliation, the University of Manitoba and the University of Winnipeg)
  • The LAC collection on COVID-19 and its impacts on Canada (20+ terabytes of data)
  • All federal government data collected since 2005 (55+ terabytes of data)
  • Additional curated collections (to be arranged and published in the upcoming fiscal year)

The GCWA is one of the most comprehensive sources in existence for the following:

  • Canadian cultural and historical events as documented on the Web (2005–)
  • Official publications of the Government of Canada (GC) (2005–)
  • The federal and historical GC web presence (gc.ca domain, 2005–)
    • Historical GC financial and departmental plans and performance reports (2005–)
    • Historical GC policy frameworks (2005–)
    • Historical GC proactive disclosure (2005–)
    • Data and statistics from the federal web (2005–)
    • Material removed from the federal web under Common Look and Feel 2.0 (2005–08)
    • Material removed from the federal web under “CLF 3.0” (2008–13)
    • Material removed from the federal web under the Web Renewal Initiative (2013–)

Overall, the GCWA is the definitive source for any historical study of the Government of Canada web domain over time.

New portal design

From 2005 to 2019, the GCWA arranged its data around, and provided access only to, federal government web resources under Crown copyright (at most, approximately 15 terabytes of data were available). With the launch of the new GCWA in 2023, we have expanded our search tools and filters to help users explore our non-federal data and thematic web collections.

Clients will now be able to engage non-federal collections in a specialized portal and user interface. The relevant interface (government versus non-federal collections) will be presented automatically based on the collection being accessed.

Full text search of the web archive, individual collections or collection themes

Since 2011, LAC has not provided a full-text search capability or service to the public for navigating the GCWA. This situation was very problematic, and it limited client access to discovery and browsing. For the launch in 2023, a complex and powerful full-text search will be made available:

  • Clients will be able to search at multiple hierarchical levels, from the entire archive down to individual files.
  • An advanced search will also be available, including the ability to search by collection, keywords, exclusions, exact phrase, URL/domain, web resource type and date range.
  • An ability to quickly search by exact URL will also be available.
  • Further, clients will be able to discover and access the content of non-federal collections by sub-theme (for example: show all resources collected having to do with the “economic impact on Canada of COVID-19”).

Specialized reference services

LAC provides reference services and support for the GCWA. If you have difficulty locating a known resource within the GCWA, we would be pleased to assist you with the following:

  • Locating obscure Government of Canada official publications or decommissioned websites
  • Locating obscure historical reports, policies, financial data or proactive disclosure
  • Locating genres of Government of Canada content where exact titles or dates are not known
  • History and development of the Government of Canada domain (gc.ca)
  • Use of the web archives as a historical source or as computational data
  • Copyright or privacy concerns
  • Questions on how to have your web resource digitally preserved at LAC

Do you have ideas on what should be collected? Please let us know!

Ask us a question. We can help with all reference questions dealing with the web archive, nominations of Canadian web resources for acquisition, or requests for computational access to our web archival collections data.


Tom J. Smyth is the manager of the Web and Social Media Preservation Program at Library and Archives Canada.

Digital preservation at the crossroads

By Faye Lemay

Did you know that Library and Archives Canada (LAC) holds not only photos, books, paintings and manuscripts, but also a collection of digital material? Since we are the stewards of Canada’s documentary heritage, we need to make digital and analogue content available and usable.

Imagine creating a WordPerfect file in 1996, saving it to a floppy disk and then trying to open it today. Three things could occur: 1) you might not have a floppy drive, 2) the floppy disk might not work anymore, or 3) you might not have the software to open the WordPerfect file. Now imagine this on a scale that includes thousands of different types of files created by federal government workers, private Canadian citizens, publishers, etc., and stored on many different kinds of systems, diskettes and computers.

A colour photograph of an envelope containing different types of floppy disks.

Floppy disks in the Published Heritage collection.

Digital collections are inherently vulnerable to degradation and decay at a speed much faster than paper. To ensure the material lasts hundreds of years, digital preservation specialists must monitor and take action to prevent digital loss. These specialists monitor what types of file formats people are using (e.g., PDF, WPD), plan for changes in technology and create multiple copies, which are stored in climate-controlled vaults. We also make sure that the content of the files has not changed over time. Given how fast technology changes, we are always thinking ahead to prevent losing these treasured collections.
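Checking that the content of a file has not changed over time is commonly done with checksums, a practice often called fixity checking. The Python sketch below shows one way it could work, using SHA-256 from the standard library: record a checksum for each file, then recompute it later and report any mismatch. The function names and the sample file are hypothetical, and this is not a description of the tools LAC actually uses.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """Build a manifest mapping each file path to its current checksum."""
    return {str(path): sha256_of(path) for path in paths}


def verify_fixity(manifest):
    """Return the paths that are missing or whose checksum no longer matches."""
    changed = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            changed.append(path)
    return changed


# Demonstration with a temporary, hypothetical file.
with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "report.wpd"
    sample.write_bytes(b"original content")
    manifest = record_fixity([sample])

    unchanged_result = verify_fixity(manifest)  # nothing has changed yet
    sample.write_bytes(b"altered content")      # simulate silent corruption
    changed_result = verify_fixity(manifest)    # the altered file is reported

print("Before change:", unchanged_result)
print("After change:", changed_result)
```

A mismatch only signals that something changed; recovering the file still depends on having the multiple stored copies described above.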

A colour photograph of a cabinet drawer containing hundreds of CD cases.

A small sample of the music CD collection, encompassing over 70,000 titles.

For LAC, our digital crossroads is now. We are in an era where digital collections are surpassing analogue collections in size. A recent inventory of our digital material revealed a vast and varied collection, both online and in physical media such as floppy disks, CDs and DVDs. This inventory also revealed that the volume of digital copies of university theses held at LAC is approaching that of analogue copies—and we only began acquiring theses in PDF digital formats in 1998. Since 2014, LAC has been acquiring theses in digital formats only. Official federal publications are also now primarily in digital format, since the government publishing regulations switched in 2013 to allowing online formats only. In addition, for the first time in its history, LAC received a private donation with 90 per cent of the collection in digital file format.

The LAC Digital Archive in the Preservation Centre serves as the central repository for LAC’s digital collections. Currently we preserve over five (5) petabytes of digital material, comprising primarily audiovisual material, the Government of Canada Web Archive, and digitized copies of paper records. Five petabytes of data would be equivalent to 1,338 metres (4,390 feet) of DVDs stacked on top of one another!
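The DVD comparison can be reproduced with a short calculation. The sketch below rests on assumptions that are not stated in the article: that a petabyte is counted in binary units (1,024 × 1,024 gigabytes), that each disc is a single-layer DVD holding 4.7 gigabytes, and that a disc is 1.2 millimetres thick. Under those assumptions the result lands within a metre of the figure quoted above.

```python
# All constants below are assumptions chosen for this illustration.
PETABYTES = 5
GB_PER_PB = 1024 * 1024      # binary units: 1 PB = 1,024 TB = 1,048,576 GB
DVD_CAPACITY_GB = 4.7        # single-layer DVD
DVD_THICKNESS_MM = 1.2       # standard disc thickness
METRES_PER_FOOT = 0.3048

dvd_count = PETABYTES * GB_PER_PB / DVD_CAPACITY_GB
stack_metres = dvd_count * DVD_THICKNESS_MM / 1000
stack_feet = stack_metres / METRES_PER_FOOT

print(f"Discs needed: {dvd_count:,.0f}")
print(f"Stack height: {stack_metres:,.1f} m ({stack_feet:,.0f} ft)")
```

Using decimal units instead (1 PB = 1,000,000 GB) would give a shorter stack of roughly 1,277 metres, so the choice of unit convention matters for this kind of comparison.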

Despite the considerable effort to preserve digital content today, we recognize that there is much more to be done to ensure all digital collections at LAC are protected.

November 30, 2017, marks the first annual International Digital Preservation Day. As a member of the Digital Preservation Coalition, we celebrate this day by launching the Strategy for a Digital Preservation Program. This strategy describes the additional steps needed to further preserve LAC’s digital treasures for the future and ensure that we are on the right path to success.

A colour photograph of a long white shelf on the left and high-density storage on the right.

Linear Tape-Open (LTO) tape library holding the digital documentary heritage preserved in the LAC Digital Archive at the Preservation Centre.


Faye Lemay is a manager of digital preservation in the Digital Operations and Preservation Branch of Library and Archives Canada.

Web Archiving the Truth and Reconciliation Commission

By Russell White

The World Wide Web is the defining communications medium of our era, and a vital source of Canadian documentary heritage. At the same time, websites lack the durability of analogue materials and have a limited lifetime online.

As the Truth and Reconciliation Commission of Canada (TRC) was coming to a close in late 2015, there was concern in the archival community that historically valuable information created on the web since the TRC’s 2008 inception could be lost. To meet this challenge, Library and Archives Canada (LAC) archivist Emily Monks-Leeson and LAC’s web archiving team began preserving websites related to the TRC that were national in scope. We collaborated on the project with archivists at The University of Winnipeg and the University of Manitoba, who were at that time working on preserving TRC-related websites focused on the province of Manitoba.

Making It Public

The result of this collaboration is the Truth and Reconciliation Commission Web Archive. Launched jointly with the National Centre for Truth and Reconciliation (NCTR), The University of Winnipeg and the University of Manitoba in July 2017, the TRC Web Archive provides public access to a spectrum of voices from the web related to the commission itself and, more broadly, to the theme of reconciliation. These include official TRC and NCTR websites and related documents, blogs and personal sites on the residential school system, media articles, and sites with a community focus on survivors, commemoration, healing and reconciliation.

The websites in the collaborative TRC Web Archive were captured, described and made accessible through the Internet Archive’s Archive-It platform. To date, LAC has collected approximately 260 resources that, we believe, will be invaluable to researchers, students, survivors and their families, and anyone wanting to learn more about the TRC, its effects and legacy, and the responses to it from individuals, organizations, and media.

Here are a few examples of archived websites in the collection:

  • âpihtawikosisân: Meaning “half-son,” this is the personal blog of Métis writer and educator Chelsea Vowel, who writes about education, Aboriginal law, and the Cree language. The archived blog includes observations on the legacy and public perception of residential schools.
  • We Were So Far Away – The Inuit Experience of Residential Schools: A virtual exhibit that presents the stories of Inuit survivors of residential schools, providing moving examples of what life was like for students.
  • “The Indian Residential Schools Truth and Reconciliation Commission” (Parliament of Canada): This paper by the Parliamentary Information and Research Service reviews the TRC’s historical context, provides an overview of its terms of reference and its purpose, and discusses certain themes drawn from past truth commissions and other transitional justice initiatives conducted internationally.

About the Commission

The TRC, which began its work in 2008, spent six years collecting testimony from over 7,000 former students of Canada’s residential schools, in order to reveal the harmful legacy of the residential school system. The Commission concluded in December 2015 with the creation of the NCTR at the University of Manitoba and the release of the TRC final report, which included 94 calls to action for reconciliation and healing across Canada.

View the archived TRC reports and calls to action from the NCTR website.

Students in uniform standing in front of the Battleford Indian Industrial School in Battleford, Saskatchewan, 1895.

Battleford Indian Industrial School, Saskatchewan, 1895 (MIKAN 3354528)

What’s Next?

The TRC Web Archive is an ongoing project, and we continue to add resources to it. In the course of our work, we were also inspired by TRC Call to Action #88—in support of Indigenous sport—to create a separate online archival collection focused on the 2017 North American Indigenous Games, held in Toronto with more than 5,000 participants from across North America.

We welcome nominations from the public. If you know of a site related to the TRC, reconciliation, or Indigenous issues more broadly that would enhance our collections, please send an email to LAC’s web archiving team at bac.archivesweb-webarchives.lac@canada.ca, and we’ll assess it for preservation.

Library and Archives Canada sincerely hopes that the TRC Web Archive adequately preserves the history and legacy of the Truth and Reconciliation Commission as a respectful and sensitive documentary and research resource.

 


Russell White is a Senior Project Officer in Digital Integration at Library and Archives Canada.