How catalogues work: algorithms

The last part of how catalogues work is looking at algorithms.

I was not a computer science major. This is going to be the non-technical discussion.

Also, the two links I mention here are from 2016, and technology has moved on a bit, as technology does, but these are good illustrations of my specific points.

Catalogues: Wooden chest of old-fashioned catalogue cards

What is an algorithm?

A good definition is that an algorithm is a step-by-step way of doing something. This video from the University of Washington notes that sorting your laundry is an algorithm. (Is it a white shirt? It goes in this pile, and these things get washed together. Is it a red shirt? That goes in a different pile. Does it need special treatment? Follow these steps.) The video’s a great overview of the topic in a couple of minutes.
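For anyone who likes to see it written down, here is a minimal sketch of the laundry example as code. The pile names, the `special_care` flag, and the order of the questions are my own assumptions for illustration; they are not taken from the video.

```python
def sort_laundry(item):
    """Decide which pile one item of clothing belongs in (illustrative rules only)."""
    # Step 1: anything needing special treatment is handled on its own.
    if item["special_care"]:
        return "special treatment"
    # Step 2: whites get washed together.
    if item["colour"] == "white":
        return "whites"
    # Step 3: reds go in a different pile.
    if item["colour"] == "red":
        return "reds"
    # Step 4: anything the rules above did not cover.
    return "everything else"


# A hypothetical laundry basket.
basket = [
    {"name": "shirt", "colour": "white", "special_care": False},
    {"name": "shirt", "colour": "red", "special_care": False},
    {"name": "wool jumper", "colour": "grey", "special_care": True},
]

# Apply the same steps to every item, one at a time.
piles = {}
for item in basket:
    piles.setdefault(sort_laundry(item), []).append(item["name"])

print(piles)
```

The computer asks the same questions in the same order every time. If the person writing the steps forgot a case, the computer will not notice.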

Computers are extremely fast at doing this kind of step, but how successful the algorithm is depends on what the people programming the algorithm have told it to do.

An important digression

The fact that human beings design these algorithms is a particularly business-centered reason why diversity in technology (and in companies in general) is such a big deal: people who have different backgrounds, life experiences, or ways of looking at the world are going to think about different things in the design process. When that’s managed well, a diverse group will likely come up with algorithms and other programming that work much better for a wider range of people.

(An example here, though it is legitimately a fairly complicated kind of record keeping, is that the Apple Health app didn’t include any menstrual cycle tracking for a long time, and it’s still much more rudimentary than some other apps. If your body does things outside of the expected timeframe, you have fewer options.)

What does this mean for catalogues?

Some of the things a catalogue uses an algorithm for are pretty straightforward. Sorting a list by the last name of the author, or the title of the work, or the year it was published is pretty simple, so long as the data is consistent.
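As a rough sketch of how simple these sorts are when the data is consistent, here is one way to do them in Python. The field names (`author_last`, `title`, `year`) and the three sample records are assumptions for illustration, not the layout of any particular catalogue.

```python
# Three sample records with consistent fields.
records = [
    {"author_last": "Le Guin", "title": "The Dispossessed", "year": 1974},
    {"author_last": "Butler", "title": "Kindred", "year": 1979},
    {"author_last": "Atwood", "title": "The Handmaid's Tale", "year": 1985},
]

# Lower-casing the text means capital letters do not change the order.
by_author = sorted(records, key=lambda r: r["author_last"].lower())
by_title = sorted(records, key=lambda r: r["title"].lower())
by_year = sorted(records, key=lambda r: r["year"])

print([r["author_last"] for r in by_author])
print([r["title"] for r in by_title])
print([r["year"] for r in by_year])
```

Each sort is a single line because every record stores the same kind of thing in the same place.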

What data might be inconsistent? One example is dates: if the formats swap between United States standard dating (Month-Day-Year) and the Day-Month-Year common in parts of Europe, your results are going to be confusing. Good data is essential to sorting and organising your catalogue.
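To make the date problem concrete, here is a small sketch using Python's standard `datetime` module. The sample strings and the helper name `parses_as` are made up for illustration.

```python
from datetime import datetime

raw = "03-04-2016"

# The same text means two different days depending on which convention you assume.
as_us = datetime.strptime(raw, "%m-%d-%Y").date()  # Month-Day-Year: 4 March 2016
as_eu = datetime.strptime(raw, "%d-%m-%Y").date()  # Day-Month-Year: 3 April 2016


def parses_as(text, fmt):
    """Return True if the text is a valid date under the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


# Some dates only make sense one way round, which can help during clean-up.
only_valid_as_eu = parses_as("25-12-2016", "%d-%m-%Y") and not parses_as("25-12-2016", "%m-%d-%Y")

print(as_us, as_eu, only_valid_as_eu)
```

A date like 25-12-2016 gives itself away, because there is no 25th month. A date like 03-04-2016 does not, so someone has to know which convention was used when it was entered.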

(This is why I am spending my summer cleaning up a lot of data in our catalogue at work. This week, this has meant hours of moving identifying file numbers from the format area, where they shouldn’t be anyway, to a different area, and making sure the correct format is actually entered.

We can automate some kinds of data changes, but this one requires moving data into a different field, and we don’t have an easy way to do that automatically.)

Where it gets complicated

However, it gets harder once we get into things where there is a bit more of a value judgement.

What kinds of images should we get if we search on “beautiful”?

One of the examples that has stuck with me the most came from the keynote Dr. Safiya Noble gave at the LibTech conference in 2016. (LibTech is my favourite library conference for a reason.) It’s worth noting this was given in March of 2016, and she talks about the manipulation of algorithms and the effect on elections.

In her keynote, she showed a search on “beautiful”, and at that time the algorithm turned up a lot of landscapes (that were really gorgeous). But if you searched on “beautiful woman”, you turned up white women (and white women of a particular kind when it came to facial structure, hair, body size, and a bunch of other characteristics). That’s what happens when human programming goes awry, or is not sufficiently questioned.

And if you tried searches like “black girls”, you got a whole different set of results, and much more mixed ones in terms of positive and negative.

So, when your library catalogue tells you you can sort by ‘relevance’ or gives you options for ‘similar topics’, there are probably a lot of different things at play. Usually, there are software decisions in there somewhere. Some of these may be accessible to the library staff; others may be decided by the software programmers, and the librarians may have no idea how they work.

(In our new catalogue, we can choose which things to weight more. For example, we could choose to weight phrases in the abstract (where we put a summary of the content) more than the title, or less than the title, depending on what we decided. We haven’t played around with this much yet, but it’s a way to help refine options for people.)
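Here is a toy sketch of what weighting one field over another can look like. This is not how our catalogue software works internally (I don’t know the details of that). The function names, the weights, and the two sample records are all invented for illustration.

```python
def relevance(record, phrase, weights):
    """Add up the weight of every field that contains the search phrase."""
    phrase = phrase.lower()
    score = 0.0
    for field, weight in weights.items():
        if phrase in record.get(field, "").lower():
            score += weight
    return score


def rank(records, phrase, weights):
    """Return the records with the highest-scoring matches first."""
    return sorted(records, key=lambda r: relevance(r, phrase, weights), reverse=True)


# Two made-up records: one matches in the title, the other in the abstract.
records = [
    {"title": "Garden birds", "abstract": "A field guide to common species."},
    {"title": "A year outdoors", "abstract": "Essays, including one on garden birds."},
]

title_heavy = {"title": 2.0, "abstract": 1.0}
abstract_heavy = {"title": 1.0, "abstract": 2.0}

top_when_title_heavy = rank(records, "garden birds", title_heavy)[0]["title"]
top_when_abstract_heavy = rank(records, "garden birds", abstract_heavy)[0]["title"]

print(top_when_title_heavy, "|", top_when_abstract_heavy)
```

The same search on the same two records puts a different item at the top depending on the weights. That choice is exactly the kind of decision a person makes behind the scenes.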

Even more complicated

Large companies (Google, Amazon, Facebook, any of the big ones) also look at how you react: what you click on, where you spend time, what you click away from and when (and where you go next). All of this helps them create vast maps of data they can use. Sometimes this is really handy (like when Amazon’s list of also-boughts shows you a book you love and had no idea existed, or Spotify’s algorithm suggests music you really like).

Sometimes it’s a lot creepier and more awful. There’s a famous story, when it comes to algorithms, of Target figuring out a teenager was pregnant before her father found out about it. The prediction was based entirely on purchases that were not specifically intended for a baby, but rather things like body lotion, a larger purse, two common supplements, and a bright blue rug.

And of course, it gets even scarier if we start talking about government agencies making decisions about who can get visas, fly, or do many other kinds of things, based on algorithms and data management decisions that are obscured to the end user of the information.

What you should take away from this

Trusting a computer on fairly simple sorts (like title or author or date) is fine. But if the computer is suggesting related items, and you care about getting a wider range of options, or you are concerned about implicit bias in how a system designed by unknown people might work, that’s a good time to do some more digging. Try a variety of searches with specific parameters so that you get a sense of what is there, what’s recommended, and maybe what isn’t.

Simply knowing more about algorithms will also give you a lot more choices and awareness.
