Bilingual census data: a better search experience for all Canadians

Web banner for The 1931 Census series. On the right, typed text: "The 1931 Census". On the left, moving train going by a train station.

By Julia Barkhouse

This article contains historical language and content that some may consider offensive, such as language used to refer to gender, racial, ethnic and cultural groups. Please see our historical language advisory for more information.

Library and Archives Canada (LAC) is the guardian of Canada’s distant past and recent history. It holds the historical census returns for Canada, including some dating back to New France and some for Newfoundland. We have indexed some dating from 1825 to 1926, and these are available online through Census Search.

Before Confederation, censuses were generally collected in either English or French, depending on the location. The Dominion Bureau of Statistics (now Statistics Canada) phased in bilingual forms after Confederation in 1867.

Example of a bilingual Census 1921 form enumerated in English and French:

A census-taking sheet from Census 1921. This particular image is page 6 for the sub-district of Scots Bay in Kings District, Nova Scotia.

Census 1921 form enumerated in English (e002910991).

A census-taking sheet from Census 1921. This particular image is page 19 for the sub-district of Wolfestown (Township) in Richmond-Wolfe District, Quebec.

Census 1921 form enumerated in French (e003096782).

The language used to record answers to census questions may reflect the language preference of the enumerator or the language in which the answers were provided. The historical census data that we have reflects our linguistic duality as a nation. Census returns from Quebec and some parts of New Brunswick and Manitoba are written (or enumerated) in French, while the rest of Canada was enumerated in English.

When our partners, including Ancestry and FamilySearch, indexed the censuses from 1825 to 1926, we produced a wealth of data with names of individuals, their gender, marital status, etc. However, we were faced with a serious challenge: census data could be collected in either English or French depending on the personal preference of the enumerator. So how did we handle this?

Life as an Enumerator

Let us detour for a moment and describe the journey of the enumerator. Enumerators were Canadians hired by the Dominion Bureau of Statistics to collect census data in one or more sub-districts. They received a book of instructions (such as this one for Census 1921) that detailed what they were supposed to write on the form depending on what people answered. They were given a booklet of census return forms and instructions on which sub-districts to enumerate. Then this person had a timeframe to enumerate a number of sub-districts and mail these forms back to the government department. You can imagine this person going from door to door in a horse-drawn carriage or perhaps an early automobile (maybe a Ford Model T) by 1921.

The enumerator knocked on the door and asked to speak to the head of the household (typically the father and/or husband). They might be invited in to sit at the kitchen table as they asked questions. If the family was not home, there might be a notice or calling card left on the door with contact details to follow up and meet the enumerator by a given date to be counted in the census.

Depending on the province in Canada, the enumerator either wrote down information in the language of the person speaking or in their personal language preference. Therefore, it is possible that French regions of Canada around Quebec, New Brunswick and Manitoba were enumerated in one or both languages depending on the enumerator’s personal preference.

Fast forward: the data captured on the forms was transcribed by our partners around 92 years after the census taking and put online.

Language Barriers

When LAC put these databases online, we noticed that we had data in both languages. If you wish to search for your ancestor, you have to search in the language of the enumerator. Did the enumerator write your grandmother’s information in English or French? Does your name have an accent (é, è …) that might have been misheard (or not captured) by the enumerator? Does your uncle’s name have a silent “h” that might have been omitted? This creates a language barrier for our researchers, who want to find people but do not speak the language used at the time. Some of our Francophone researchers have to search in English to find their French-speaking ancestors. This is an unbalanced search experience for Canadians who access our Census Search interface in French.

Creation of Census Search

When the Digital Access Agile Team reorganized and consolidated the 17 census databases into Census Search last November, we wanted to deliver a better search experience for all Canadians. Our aim was to provide the same search experience for Francophones as for Anglophones, so that any of our clients who use the French Census Search interface can search and get the same results as if they were searching in English.

So how do you do this? How do you translate information like gender, marital status, ethnic origin and occupation for over 44 million individuals to offer an equal experience for all Canadians? It’s actually very simple. The solution? Data cleanup.

A Peek Under the Hood

Let’s go behind the scenes for a moment to look at how census data is saved. Census Search is the public interface that LAC clients can use to search. The census data for each individual in Census Search is saved in one master table called EnumAll.

Census data saved in a table in SQL Management Server.

Screenshot of Census.EnumAll from SQL Management Server (Library and Archives Canada).

In this table, each line represents an individual person. The data captured about that person is separated into columns. If we do not have data in a particular column, it says NULL.

Creation of Common Data Pools

Census.EnumAll acts as the master data table. From this, we created common data groups (or pools). What do I mean by this? We copied all of the data for one of the columns (Gender, Marital Status, Ethnic Origin, Religion, etc.) into a separate table. The only information in this separate table is a list of Genders or options for Marital Status, etc. We call this a common data pool, meaning that all the data in this table (or pool) relates to one piece of information.

The common data table separates the data (e.g., “Male” or “Female”) from the individual person. If you look at 44 million individuals, you see the same data repeated, such as the number of times the enumerator wrote “Male” or “Married.” In a common table, you see “Male” only once, with a value count for the number of people with this information (which we call an attribute).
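To make the idea concrete, here is a minimal sketch of building a common data pool in Python. It is an illustration only, not LAC's actual process: the sample rows, the column name `gender` and the helper `build_pool` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical sample rows standing in for Census.EnumAll.
# None plays the role of NULL (no data captured in that column).
enum_all = [
    {"name": "Person A", "gender": "Male"},
    {"name": "Person B", "gender": "Female"},
    {"name": "Person C", "gender": "M"},
    {"name": "Person D", "gender": "Male"},
    {"name": "Person E", "gender": None},
]


def build_pool(rows, column):
    """Return each distinct value found in `column`, with a count of the
    people who have it. Rows with no data (None) are left out."""
    return Counter(row[column] for row in rows if row[column] is not None)


gender_pool = build_pool(enum_all, "gender")

# Each value now appears once, with its value count, most common first.
for value, count in gender_pool.most_common():
    print(value, count)
```

Note that "Male" and "M" still appear as separate entries at this stage. The pool only gathers the values in one place; it does not yet reconcile them.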

This is where the magic happens.

The Gender table in the back end of Census Search. Of note, there are variants (Male and M, Female and F) and two columns titled TextLongEn (English display) and TextLongFr (French display).

Screenshot of T_Gender from SQL Management Server (Library and Archives Canada).

Keeping this data in a separate common data table lets us do more with it. Using codes, we establish one way to write each value (each Gender, in this example). This is called an authority. We then perform cleanup so that all the variants point to this one authority. In the screenshot above, you’ll notice two variants for the same value: “Male” and “M.”
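The cleanup step can be sketched as a simple lookup from each variant to its authority code. This is a hedged illustration: the codes "M", "F" and "U", the mapping table and the function `to_authority` are hypothetical, and treating missing or unrecognized values as Unknown is an assumption rather than a documented LAC rule.

```python
# Hypothetical authority codes; the real codes in T_Gender are not
# given in the article.
UNKNOWN = "U"

# Every known variant spelling points to exactly one authority code.
VARIANT_TO_CODE = {
    "male": "M",
    "m": "M",
    "female": "F",
    "f": "F",
}


def to_authority(raw):
    """Map a transcribed value to its authority code.

    Assumption: missing (None) or unrecognized values become Unknown.
    """
    if raw is None:
        return UNKNOWN
    return VARIANT_TO_CODE.get(raw.strip().lower(), UNKNOWN)


raw_values = ["Male", "M", "Female", " f ", None]
cleaned = [to_authority(value) for value in raw_values]
print(cleaned)
```

Because the variants live in one small table instead of in millions of individual records, adding a newly discovered variant is a one-line change.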

Once we have this authority, we create columns for how we want to display the information in Census Search. We create an English (TextLongEn) and a French (TextLongFr) display. We then add the bilingual translation once and it applies to everything. In this case, we translated “Male” to “Homme,” and it applies to all 20,163,488 people who identified as “Male” across 17 censuses.

We then put all the tables back together and index the records to display in Census Search. So depending on the language of your choice, the interface and the data itself will now translate for you.
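As a rough sketch of how putting the tables back together might work, the example below uses Python's built-in sqlite3 module with an in-memory database. The table name T_Gender and the columns TextLongEn and TextLongFr come from the screenshot described above. Everything else (the `Code` and `GenderCode` columns, the simplified EnumAll layout, the sample people and the function `genders_for_display`) is assumed for illustration and may not match the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Common data pool: one row per authority, translated once.
    CREATE TABLE T_Gender (
        Code TEXT PRIMARY KEY,   -- hypothetical code column
        TextLongEn TEXT,
        TextLongFr TEXT
    );
    -- Simplified stand-in for the master table.
    CREATE TABLE EnumAll (
        Name TEXT,
        GenderCode TEXT REFERENCES T_Gender(Code)
    );
    INSERT INTO T_Gender VALUES
        ('M', 'Male', 'Homme'),
        ('F', 'Female', 'Femme'),
        ('U', 'Unknown', 'Inconnu');
    INSERT INTO EnumAll VALUES
        ('Person A', 'M'),
        ('Person B', 'F'),
        ('Person C', 'M');
""")

# Only these two column names can ever be placed in the query text.
DISPLAY_COLUMN = {"en": "TextLongEn", "fr": "TextLongFr"}


def genders_for_display(lang):
    """Join each person to the Gender pool and return the display text
    for the chosen interface language ('en' or 'fr')."""
    column = DISPLAY_COLUMN[lang]
    query = f"""
        SELECT e.Name, g.{column}
        FROM EnumAll AS e
        JOIN T_Gender AS g ON g.Code = e.GenderCode
        ORDER BY e.Name
    """
    return conn.execute(query).fetchall()


print(genders_for_display("en"))
print(genders_for_display("fr"))
```

The translation is stored once per authority, so a single row in the pool determines what every matching person record displays in each language.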

English Census Search interface showing Gender drop-down with values for Female, Male and Unknown alongside the French Census Search interface showing the same drop-down with Gender values for Femme, Homme and Inconnu.

Screenshots of Census Search in English and French (Library and Archives Canada).

Now, when I search for my great-grandfather, Henry D. Barkhouse, and display any of his census entries, the data translates as well.

Two screenshots, one in English and one in French, of a Census 1911 record for Henry D. Barkhouse, with arrows pointing out where the data translates.

English and French display of Census 1911 record for Henry D. Barkhouse (e001973146).

Progress Check-in

As you can imagine, this work takes time as we diligently clean up and translate our data. Our first priority was to create drop-down menus on Census Search for Gender and Marital Status. Now, if you wish to search by either of these fields, you will see a short list of terms that are translated and available in both official languages. As we continue this work, our next priorities are Ethnic Origin and Place of Birth. We are about 60–70% finished with these two, and our clients should see new options coming to Census Search in 2024. After these two priority fields, we will continue to translate other fields like Religion, Relationship to head of household, Occupation, etc.

Conclusion

Consolidating all 17 censuses into one platform, Census Search, gave us the opportunity to create a bilingual display for our census data by cleaning up the data. Since its launch, our platform has delivered a more equal search experience in the language of your choice. I encourage you to try it out and tell us if your search experience has improved.

As always, we love to read your feedback and ideas via our email or you can sign up for a 10-minute feedback session with us.


Julia Barkhouse has worked at Library and Archives Canada in data quality, database management and administration for the last 14 years. She is currently the Collections Data Analyst on the Digital Access Agile Team.

Why are the 1931 census returns organized geographically?

At Library and Archives Canada (LAC), we receive questions about why materials in our collections are organized the way they are.

When it comes to the census returns, typically we explain that, as an archive, we acquire the census returns as they are—even when the writing is blurry or unreadable—as historical records. We also strive to maintain the records’ original order and context.

For the 1931 census returns, maintaining the records’ original order is not difficult. Rather than receiving 234,678 pieces of paper, we received 187 microfilm reels. On the microfilm, the imaged census returns are organized, overall, by province (east-to-west) and then by northern area (west-to-east). This is because the Dominion Bureau of Statistics imaged the census returns in order of census district number, and, within each census district, in order of census sub-district number. When we digitized the archival records here at LAC, we worked to ensure that the digital access copies would reflect the archival records’ original order and context to the extent possible. For instance, we grouped digitized images according to the title cards used in the microfilmed images. To each group of digitized images, we added additional metadata, extracted from the Dominion Bureau of Statistics’ listings accompanying the census returns.

A handwritten index card.

Example of a title card used to organize 1931 census returns on microfilm. This title card is for the eight pages of returns from Prince Edward Island, census district 3 (Queens), sub-district 10 (MIKAN 5744023)

But this explanation does not answer the original question: why are the returns from the 1931 census of population organized geographically—by census district and sub-district—in the first place?

For answers to this original question, we turn to the Administrative Report included in the Dominion Bureau of Statistics’ 1936 publication Seventh Census of Canada, 1931, Volume 1, Summary.

Answer 1: Because of the original purpose of the census of Canada. The census returns are organized geographically because of the decennial census’ role in shaping representation in the House of Commons.

“In Canada the immediate, legal raison d’être of the census is to determine representation in the […] House of Commons. Under the [Constitution], the province of Quebec is given a fixed number of seats[…] while the number assigned to the other provinces is pro rata on a population basis as determined by the census[….] The Canadian Census is thus taken primarily to enable a redistribution bill to be passed through Parliament” [Page 32; emphasis added]

It is worth getting a bit technical here. In the early part of the 20th century, redistribution bills updated the number and boundaries of federal electoral districts based on changes in population—as established by the previous decennial census—among other considerations established by law.

A map depicting federal electoral districts in the prairie provinces—the boundaries are indicated by thick blue lines, and the names are indicated in blue type. The map was prepared on a base map featuring rivers and lakes, railways, cities and grid lines.

“Map of Federal electoral districts of Manitoba, Saskatchewan & Alberta” from an atlas created in 1924, prepared by the Department of the Interior (e011315903)

The updated boundaries of federal electoral districts informed the census districts used for enumeration purposes in the subsequent decennial census. In other words, the relationship between federal electoral district and census district had elements of a chicken-and-egg scenario.

Egg: Federal electoral district boundaries, as established in the Representation Act of 1924.

The Representation Act of 1924, sometimes referred to as the Redistribution Act, established the electoral districts to be used in subsequent federal elections. This redistribution of electoral districts was based, in part, on the population count and distribution established by the previous decennial census, namely, the Sixth Census of Canada in 1921.

The Representation Act established the official boundaries of federal electoral districts as written descriptions, not as maps.

  • To view transcriptions of these descriptions of electoral districts, consult the Library of Parliament’s online resource “Elections and Candidates” for the 17th parliament; the description of each electoral district can be viewed by clicking on a constituency title listed for the general election of July 28, 1930, and then scrolling below the header “Information” to the description under the subheader “S.C. 1924, c.63,” which refers to the Representation Act.
  • Although the Representation Act did not include maps, the Department of the Interior prepared an atlas with maps depicting the updated federal electoral district boundaries. To view digitized images of the maps in the 12-volume atlas, there are two options. The catalogue record for the Federal electoral district maps, 1924 displays thumbnail images. A 1931 Census Maps research tool supplies links to individual, higher-resolution maps.

Chicken: Census districts used for 1931 census.

The census districts used for enumeration in 1931 mostly corresponded to the federal electoral districts established in the Representation Act of 1924, because

“[f]or the purposes of the census, the Statistics Act requires that the country be first divided into “census districts” corresponding as nearly as possible with the federal electoral divisions or constituencies for the time being—this in view of the association of the census with parliamentary representation.” [Page 51]

Nevertheless, at least eight electoral districts were “too large or too varied in physical or economic character” for the purposes of census work, and so each was split into two or three census districts (in Quebec, Charlevoix–Saguenay, Gaspé, Labelle and Pontiac; in Ontario, Port Arthur–Thunder Bay; in Alberta, Peace River; and in British Columbia, Cariboo and Comox–Alberni). Other areas to be enumerated fell outside a federal electoral district (e.g., Northwest Territories, Royal Canadian Navy ships).

The population established by the Seventh Census of Canada in 1931 was then used to inform the next redistribution of federal electoral districts.

Egg: Federal electoral districts, as established in The Representation Act, 1933.

This chicken-and-egg scenario is why, at LAC, we often use federal electoral district maps as working stand-ins for census district maps to help us navigate the census returns from the early part of the 20th century. To navigate the 1931 census returns, we use maps depicting the federal electoral districts as established in the Representation Act of 1924. In the year 2028, we will likely be using the maps depicting federal electoral districts according to The Representation Act of 1933 to help us navigate the 1936 census returns from the prairie provinces after those census returns are transferred to LAC.

Answer 2: Logistics was another reason why the census returns were organized geographically.

The 1931 Census of Canada, like most population censuses, was meant to enumerate each person within the boundaries of the Dominion of Canada once, and only once. For this purpose, land within the borders of Canada, and navy ships, were divided into 15,167 units of enumeration. Each geographical unit of enumeration—called a census sub-district—was assigned to a single enumerator (more or less), tasked with recording each person residing within the census sub-district.

Four people interacting in a sub-zero-degree landscape.

Enumeration proceeding in 1961: An R.C.M.P. member talks to three people from an Inuit community to collect census information (e011177562)

“The census enumerator is the only census official coming into direct contact with the general public; [s/he] is who makes the house-to-house and farm-to-farm canvass and who is primarily responsible for the details collected on the census schedules. The necessity of providing that no more or no less than a suitable amount of work should be assigned to each enumerator (experience has demonstrated this to be a population of 600–800 in ordinary rural districts, and of 1,200–1,800 in urban) […] renders departure necessary in many cases from the electoral boundaries; […] and the polling subdivisions are not always convenient as census sub-districts. In all such cases, however, the division is effected in a way that permits compilation of the results in the form required for the purposes had in mind by the Act.” [Page 51]

Establishing the geographical boundaries to be used in the census was no minor feat:

“The drawing up of the scheme of census districts and sub-districts is a task of considerable magnitude; it is put in hand about two years in advance of the census date, and is carried out not only in the light of conditions revealed in the preceding census, but in consultation with local officials, so that no inhabited area may be overlooked or left unprovided with the organization best suited.” [Page 51]

Ninety-two years later, the task of finding a particular census return among the 1931 census returns, which are organized geographically by census district and sub-district, may seem a little overwhelming. The blog posts Puzzling through 1931 Census sub-districts – Part 1 and Part 2 describe the approaches we use at LAC to navigate the 15,167 sub-districts used for enumeration in the 1931 census.