How catalogues work: algorithms

The last part of how catalogues work is looking at algorithms.

(I was not a computer science major. This is going to be the non-technical discussion.)

Also, the two links I mention here are from 2016, and technology has moved on a bit, as technology does, but these are good illustrations of my specific points.

Catalogues: Wooden chest of old-fashioned catalogue cards

What is an algorithm?

A good definition is that an algorithm is a step-by-step way of doing something. This video from the University of Washington notes that sorting your laundry is an algorithm (is it a white shirt? This pile, these things get washed together. Is it a red shirt? That goes in a different pile. Does it need special treatment? Follow these steps.) The video’s a great overview of the topic in a couple of minutes.

Computers are extremely fast at doing this kind of step, but how successful the algorithm is depends on what the people programming the algorithm have told it to do.
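To make the laundry example concrete, here is a minimal sketch in Python. The pile names and the `needs_special_care` flag are made up for illustration (they are not from the video), and the point is only that the computer follows whatever rules the programmer wrote, in the order they wrote them.

```python
def sort_laundry(item):
    """Decide which pile one item of laundry goes in, one question at a time."""
    # Hypothetical rules, checked in order; the first one that matches wins.
    if item.get("needs_special_care"):
        return "special treatment"
    if item["colour"] == "white":
        return "whites"
    if item["colour"] == "red":
        return "reds"
    return "everything else"


basket = [
    {"name": "shirt", "colour": "white"},
    {"name": "shirt", "colour": "red"},
    {"name": "wool jumper", "colour": "white", "needs_special_care": True},
]

# Build the piles by asking the same questions about every item.
piles = {}
for item in basket:
    piles.setdefault(sort_laundry(item), []).append(item["name"])

print(piles)
```

If the person writing the rules never thought about, say, dark colours, everything dark ends up in "everything else", which is the kind of gap the next section is about.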

An important digression

The fact that human beings design these algorithms is a particularly business-centered reason why diversity in technology (and in companies in general) is such a big deal – people who have different backgrounds, life experiences, or ways of looking at the world are going to think about different things in the design process. When that’s managed well, a diverse group will likely come up with algorithms and other programming that work much better for a wider range of people.

(An example here – though it is a legitimately sort of complicated record keeping – is that the Apple Health app didn’t include any menstrual cycle tracking for a long time, and it’s still much more rudimentary than some other apps. If your body does things outside of the expected timeframe, you have fewer options.)

What does this mean for catalogues?

Some of the things a catalogue uses an algorithm for are pretty straightforward. Sorting a list by the last name of the author, or the title of the work, or the year it was published is pretty simple, so long as the data is consistent.
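As a rough illustration of how simple these sorts are, here is a small Python sketch. The records and field names (`author_last`, `title`, `year`) are invented for the example and do not come from any real catalogue system.

```python
# Three made-up catalogue records.
records = [
    {"author_last": "Nguyen", "title": "Cats at Home", "year": 2011},
    {"author_last": "Abbott", "title": "A History of Dogs", "year": 1998},
    {"author_last": "Moreno", "title": "Birds of the Coast", "year": 2004},
]

# Sort by the author's last name, ignoring upper/lower case.
by_author = sorted(records, key=lambda r: r["author_last"].lower())

# Sort by publication year, oldest first.
by_year = sorted(records, key=lambda r: r["year"])

print([r["author_last"] for r in by_author])
print([r["year"] for r in by_year])
```

This only works because every record stores the last name and the year in the same way.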

What data might be inconsistent? For example, if the date formats swap between United States standard dating (Month-Day-Year) and the Day-Month-Year common in parts of Europe, your results are going to be confusing. Good data is essential to sorting and organising your catalogue.
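Here is a short Python sketch of why mixed date formats cause trouble. The date string is an invented example; the same text turns into two different dates depending on which convention the computer is told to assume.

```python
from datetime import datetime

raw = "03-04-2016"  # made-up value: is this March 4 or 3 April?

# Read the same text under each convention.
as_month_day_year = datetime.strptime(raw, "%m-%d-%Y").date()
as_day_month_year = datetime.strptime(raw, "%d-%m-%Y").date()

print(as_month_day_year.isoformat())  # the United States reading
print(as_day_month_year.isoformat())  # the Day-Month-Year reading
```

The computer has no way to tell which reading was intended, so a catalogue with both styles mixed together will sort some items a month or more out of place. Agreeing on one format for every record (year-month-day is a common choice) removes the guesswork.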

(This is why I am spending my summer cleaning up a lot of data in our catalogue at work. This week, this has meant hours of moving identifying file numbers from the format area, where they shouldn’t be anyway, to a different area, and making sure the correct format is actually entered.

We can automate some kinds of data changes, but this one requires moving data into a different field, and we don’t have an easy way to do that automatically.)

Where it gets complicated

However, things get more complicated once we get into areas where there is a bit more of a value judgement.

What kinds of images should we get if we search on “beautiful”?

One of the examples that has stuck with me the most was something Dr. Safiya Noble illustrated in the keynote she gave at the LibTech conference in 2016. (LibTech is my favourite library conference for a reason.) It’s worth noting this was given in March of 2016, and she talks about the manipulation of algorithms and the effect on elections….

In her keynote, she showed a search on “Beautiful”, and at that time the algorithm turned up a lot of landscapes (that were really gorgeous). But if you searched on ‘beautiful woman’, you turned up white women (and white women of a particular kind when it came to facial structure, hair, body size, and a bunch of other characteristics). That’s what happens when human programming goes awry, or is not sufficiently questioned.

And if you tried searches like “black girls”, you got a whole different set of results, and much more mixed ones in terms of positive and negative.

So, when your library catalogue tells you you can sort by ‘relevance’ or gives you options for ‘similar topics’, there are probably a lot of different things at play. Usually, there are software decisions in there somewhere. Some of these may be accessible to the library staff, others may be decided by the software programmers, and the librarians may have no idea how it works.

(In our new catalogue, we can choose which things to weight more – so for example, we could choose to weight phrases in the abstract (where we put a summary of the content) more than the title, or less than the title (depending on what we decided). We haven’t played around with this much yet, but it’s a way to help refine options for people.)
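For the curious, this is roughly what "weighting" can look like, as a minimal Python sketch. The weights, field names, and scoring rule (count the matching words, multiply by the field's weight) are all assumptions made up for this example; real catalogue software may calculate relevance quite differently.

```python
# Hypothetical weights: a match in the title counts three times as much
# as a match in the abstract.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 1.0}


def relevance(record, term, weights=FIELD_WEIGHTS):
    """Add up weighted counts of how often the term appears in each field."""
    term = term.lower()
    score = 0.0
    for field_name, weight in weights.items():
        words = record.get(field_name, "").lower().split()
        score += weight * words.count(term)
    return score


records = [
    {"title": "Caring for cats", "abstract": "a guide to pets"},
    {"title": "Pets at home", "abstract": "covers cats and more cats besides"},
]

# Highest score first.
ranked = sorted(records, key=lambda r: relevance(r, "cats"), reverse=True)
print([r["title"] for r in ranked])
```

With these weights the record with "cats" in the title comes first. Swap the weights so the abstract counts for more, and the second record, which mentions cats twice in its abstract, moves to the top. Same data, different "relevance", purely because of a choice someone made.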

Even more complicated

Large companies – Google, Amazon, Facebook, any of the big ones – also look at your reactions to what you click on, where you spend time, what you click away from and when (and where you go to) – because it helps them create vast maps of data they can use. Sometimes this is really handy (like when Amazon’s list of also-boughts shows you a book you love and you had no idea it existed, or Spotify’s algorithm suggests music you really like.)

Sometimes it’s a lot creepier and more awful. There’s a famous story, when it comes to algorithms, of Target figuring out a teenager was pregnant before her father found out about it, based entirely on purchases that were not specifically intended for a baby, but rather things like body lotion, a larger purse, two common supplements, and a bright blue rug.

And of course, it gets even scarier if we start talking about government agencies making decisions about who can get visas, fly, or do many other kinds of things, based on algorithms and data management decisions that are obscured to the end user of the information.

What you should take away from this

Trusting a computer on fairly simple sorts (like title or author or date) is fine – but if the computer is suggesting related items, and you care about getting a wider range of options, or you are concerned about implicit bias in how a system designed by unknown people might work, that’s a good time to do some more digging, or to try a variety of searches with specific parameters so that you get a sense of what is there, what’s recommended, and maybe what isn’t.

Simply knowing more about algorithms will also give you a lot more choices and awareness.

How catalogues work: editorial influence

There are several places in a catalogue where there’s a degree of what might best be called editorial influence. More bluntly put, it’s people (to some degree) making decisions about these things, and those decisions come with biases, both good and bad.

We also use algorithms and those algorithms have biases, and that’s a different topic (and one for next week.)

Words mean things

Those words we use as a controlled vocabulary come from somewhere. Humans came up with them – humans with all their virtues and all their biases.

Sometimes, terms were recommended by experts in the field, or people who knew a topic intimately. (Those aren’t always the same thing!) Both these perspectives bring history and assumptions with them that may or may not fit in with the larger collection or way terms are arranged.

Sometimes those terms were the current thing at a particular time, but we have come to new understanding (this is true for a lot of terms about gender identity and sexual orientation, and also for terms around neurodiversity, and around topics like disability.)

Sometimes topics are entirely new – as technology changes, we need to come up with words to help us find things about it. Do we catalogue it by the current tech device, or do we use a more general term, because the iPhone X of today is going to be barely in service in five years, and mostly forgotten in fifteen?

Sometimes we have to pick one – like my example in earlier posts, you sometimes need to pick an option so that you have one main subject heading, rather than making people search through:

  • Cat
  • Cats
  • Felines
  • House cats
  • Kitties
  • Pussy cats
  • Fuzzballs who take over the bed

(Ok, that last one isn’t very likely.)

Some of these terms are more clinical than others. Some are questions of ‘do we make a standard of singular or plural for groups of things’? Some are ‘do we include a common nickname or slang term’. Some terms might be more historically dated than others.

Why does this matter, anyway?

This might not seem like a big deal with cats – but it can be a bigger deal if you’re talking about health information, or topics where there’s often a difference between experience of a thing and professional knowledge and training about a thing.

(Dealing with the legal system as a person dealing with a crime versus lawyers and judges. Dealing with a health issue as a person experiencing a problem versus being a doctor or nurse or health care professional.)

Sometimes terms can bias our assumptions about results. I mentioned the issue with the Library of Congress wanting to drop ‘illegal alien’ and use other terms, and being blocked by Congress (because of the role the Library of Congress plays with the actual work of Congress and the need to reflect the terms used in the laws.)

Individual library systems may decide to change their terms for these kinds of topics, to create a more welcoming and diverse environment in various ways and to reflect the needs of their particular communities.

That part, of course, is where it can get complicated. Libraries are aware that they’re serving the people who come into their building (many of whom do so fairly anonymously: librarians don’t know what you look at on the shelf, and many libraries deliberately do as little tracking of activities, loans, and other user-specific details as they can get away with, to preserve patron privacy.)

But libraries also serve people who never come into them. Not just the people who use online resources (libraries can see what’s getting used), but libraries should also be thinking about all the people who don’t use their services but could.

This is most easily illustrated by public libraries, since they serve a particular location. A library might notice that they’re seeing some types of people use the library regularly, and may be able to tell from demographic information about their area that they’re not seeing some groups as often as they should be.

Sometimes that’s about the words we choose. Whether people can see themselves reflected in the library and the catalogue and the displays.

Who decides subject headings for a work?

There is also a degree of editorial influence on who sets the subject headings.

Large publishers often suggest them – you may see this in the front of the book, on the copyright page. Below the legal information, there will be some suggested subject headings and call numbers. Libraries don’t have to listen to that, but in practice they often do unless there’s a specific reason to overrule them.

In other cases, it may be a central cataloger (in a large library system) or an individual librarian. It’s hard to tell!

Generally, no one in this process (except maybe someone on the publisher’s end) has read the whole book, and the subject headings will reflect the large topics in the book, not specific ones.

People will also pick how specific the subject headings are. For example, do you pick United States – History or Massachusetts – History? Or maybe Women – United States – History – 20th Century. (Here’s a page explaining some of the options from New York University.)

Next time, a brief look at algorithms and how they affect searches. (It’s a huge topic.)

How catalogues work: figuring out search terms

One key step in using catalogues is figuring out search terms.

What kinds of searches can you do?

In most electronic catalogues you can search by all sorts of things.

Many libraries have gone to the single search box (popularised by Google). Technically, this is called a keyword search, and it usually searches all the text in the record.

Pro: You don’t need to guess which field a given thing might be in, and searching on things that aren’t subject headings but show up in the title or blurb will still come up.

Con: You can get a lot of false results that don’t actually have what you want, especially if you’re searching for commonly used words.

If you end up with all sorts of results that don’t help you, two things can help. First, there’s probably an option somewhere on that first search screen that says something like ‘advanced search’. Second, once you do a search, you may be presented with some options to help you filter the results.

Advanced search

Depending on the catalog, you will usually see a variety of options that let you limit your search in different ways. Common ones include:

  • Searching just the author, subject, or title fields.
  • Searching a range of years.
  • Limiting the results to a particular format, location (for systems with multiple locations), or sometimes specific collections (like juvenile books), or languages.

You may need to do a little digging in the help information (likely also linked from the search form) to understand your options in detail.

Limiting results

It’s sometimes (okay, often) a lot easier to start with a keyword search and then limit your results in different ways.

In my library’s catalog, I can limit by the following, to give you an example:

  • Location (so I can find books in my local library)
  • Availability (books I can get right now, either in a library or online)
  • Whether the search term is found in the title or subject
  • Format (book, ebook, audiobook, etc.)
  • What collections it is in (this distinguishes by library and by children’s or adult collections)
  • Where the book takes place

And then it shows me related searches, including established subject terms, and some additional suggestions.
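Behind the scenes, limiting results is a matter of keeping only the records that match every option you ticked. Here is a minimal Python sketch; the records, the field names, and the `limit` helper are all hypothetical and not taken from my library's actual system.

```python
# Made-up search results for a keyword search on "cats".
results = [
    {"title": "Cats at Home", "location": "Main", "format": "book", "available": True},
    {"title": "Cats at Home", "location": "Main", "format": "ebook", "available": True},
    {"title": "A History of Cats", "location": "Branch", "format": "book", "available": False},
]


def limit(results, **wanted):
    """Keep only the results where every requested field has the requested value."""
    return [
        r for r in results
        if all(r.get(field_name) == value for field_name, value in wanted.items())
    ]


# Books (not ebooks) held at the Main location.
print(limit(results, location="Main", format="book"))
```

Each extra limit you add can only shrink the list, which is why starting broad and then narrowing tends to be easier than guessing the perfect search up front.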

Understanding subject headings

In practical terms, you are probably not going to do what librarians do to learn about subject headings.

(For the curious: most library schools require a class in cataloging that includes a lot of the specifics. Then you go out into the world and spend a lot of time staring at instructions and hoping you’re doing it right, punctuated by asking other people if you are.)

Individual libraries also have their own policies – the library I work at has set up a list of keywords instead of official subject headings, because a lot of our needs aren’t represented in them (or are using terms that aren’t a great fit for us – they’re dated, they draw from specialities that aren’t the terms the people who use us will use, or both!)

As a library catalog user, my best tip is for you to look for hints about what kinds of terms will work. Fortunately, these are pretty straightforward:

1) Try searches

One of the best tips for getting your bearings in a new catalogue (by which I mean one that’s new to you) is to try some searches for items you’re pretty sure are in there, and that are reasonably similar to the other items you want to look for.

Ideally, these will be the same subject (generally speaking) as the items you want, but if you’re not sure about that, at least try for the same topic area – if you want to do searches about religious information, try other religious titles or topics. If you’re looking for history, try other historical things. And so on.

The goal here is to do a few searches and see what comes up and how the search terms work.

2) Linked subjects

In many library catalogs, you have the option to click on the subject headings to find other items with that subject heading. This can be tremendously helpful once you find one book that’s what you want. (Of course, it’s finding that first thing that can be tricky!)

You may want to add several books to a wish list or cart (whatever the catalog uses) or bookmark them before you go too far astray in your searches, so you can get back to your starting point again easily.

If you’re having trouble with searches, try simpler ones – for example, if you’re trying to search an entire title, try

3) Look for known books or topics that should be in the collection.

For example, for modern Pagan materials, I often suggest people try Scott Cunningham’s Wicca for the Solitary Practitioner, or Starhawk’s Spiral Dance. Both are commonly held by most moderate to large library systems, and they’ll give you a starting place for what terms are being used.

In my local library system, Cunningham’s book comes up with the subject headings “witchcraft”, “magic”, and “ritual”.

That’s a hint that I probably want to check ‘witchcraft’ as well as ‘Wicca’ as subject headings.

(This is because older books were cataloged before Wicca became an official Library of Congress subject heading around 2006 or 2007 – libraries don’t generally go back and recatalog subject headings unless there’s a very significant reason to, because it’s a big cost of staff time.

Something like ‘witchcraft’ and ‘Wicca’, where it can be tricky to figure out exactly which heading applies to some books, and where ‘witchcraft’ is still accurate, if a bit more general, is less likely to get edited than, say, subject headings a library is fixing or updating to reflect current understanding of gender identity or sexual orientation or legal issues.)

4) Check the ‘about’ for information or ask a librarian.

Still stuck? Check the library’s help information or ask a librarian for help – you can ask general questions, and they can help you navigate.

If you don’t want to go to (or can’t easily get to) the physical library, most libraries have an option for email or chat help these days, at least some of the time.

How catalogues work: Controlled vocabulary

Today’s discussion of catalogues is about how you find things by topic. I talked about some of this in my post from March about personal libraries, but I want to talk more here about how libraries select subject terms.

It’s mysterious

Let’s be honest. A lot of the process librarians use to select subject terms is pretty mysterious. That’s because we’re trying to label quite complex things in a very complex world, and we’re using a variety of tools to do it, because outside of very very small collections (relatively speaking – in practice, this is probably a couple of thousand books at the smallest), it’s too big for anyone to keep in their head.

On the good side, this means people have to write things down, which makes long-term consistency easier, and which can help us see patterns.

On the bad side, it means things can feel (and be) very rigid, or slow to change, or complicated to navigate. All of which can make things a lot less accessible or useful. And the speed of change often means terms don’t reflect current understanding of things like identity, culture, or communities.

So where do these terms come from?

Libraries usually pick a set of subject headings to use. The subject headings act as a controlled vocabulary (which basically means ‘we have a fixed set of terms we choose from’). Like I explained in the post last March, this is what helps us avoid using all of these terms for the same thing:

  • felines
  • cats
  • cat
  • domesticated cat

Sometimes we might want to make distinctions (domestic cats as compared to lions or tigers or snow leopards), but if we don’t, we want to pick one term and settle on it.
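In computer terms, a controlled vocabulary can be pictured as a lookup table that points every variant at the one term the library settled on. This Python sketch is only an illustration: the choice of "Cats" as the preferred term and the `preferred_term` helper are my own made-up example, not an official list.

```python
# Every variant we want to avoid points at the single agreed heading.
PREFERRED = {
    "felines": "Cats",
    "cats": "Cats",
    "cat": "Cats",
    "domesticated cat": "Cats",
}


def preferred_term(term):
    """Return the agreed heading for a term, or None if it isn't in the list."""
    return PREFERRED.get(term.strip().lower())


print(preferred_term("Felines"))
print(preferred_term("snow leopards"))
```

A term that isn't in the table (like "snow leopards" here) comes back empty, which is the cue for a human to decide whether it needs its own heading or should point at an existing one.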

Libraries use one of a couple of common lists for subject headings. The most common, probably, are the Library of Congress Subject Headings. These are very extensive (they take up about 20 volumes as print books on a shelf) but the fact the Library of Congress deals with so many different topics means that it’s often quite slow to make adjustments.

For example, the addition of the word “Wicca” as a subject heading only took place in about 2004, and only after a petition from a librarian. (This is often the way changes get made: one or more librarians notice that a term needs adding or improving or changing, and they provide evidence.) The term ‘Wicca’ had been in broad general use since the 1950s and 60s, so that’s about 50 years.

This isn’t always simple – here’s a story of attempts in 2016 to get the terms ‘aliens’ and ‘illegal aliens’ changed, and how the support from librarians and library associations for a student-led project ran smack into issues of law.

(Why does Congress get a say in this, you might be wondering? The Library of Congress’s first job is to provide resources for Congress and members of Congress. Makes sense if you think about the name.)

One other important note is that many libraries don’t have the resources to go back and catalogue older items to the new subject headings – so you may see pointers from new terms to check older terms as well. (This depends a lot on the library and the priority of the topic.)

Who assigns the terms?

Good question. In many cases, the subject headings are primarily assigned by whoever at the Library of Congress assigns the headings for that particular item. These are likely people who have some experience in the general field or area of the books, but you can usually assume they’re not experts or specialists in all the nuances of the field or topic.

(In other words, they’re not going to get really nuanced about choosing between, say, magic and ritual as terms in a Pagan setting. They may assign them both.)

Usually terms are based on the few most obvious and relevant topics. If something is mentioned for less than a chapter or two, it almost certainly won’t get a subject heading unless it’s something really unusual. For a full length nonfiction book, you can usually expect 3-5 subject headings.

You can also assume the person doing the cataloguing probably hasn’t read the book. Cataloguers don’t have time for that! They’re relying on the blurb on the back and things like skimming the table of contents. Publishers can also suggest subject headings or terms to include.

Some libraries do have their own cataloguers evaluate materials and add or edit terms. This is particularly true for things like local history or other items of particular local interest.

Or a school library might assign a heading for particular regular class assignments or projects, to make it easier to find those items. (There are other ways to group things, too.) Some libraries do a “Best resource” subject heading to make it easy to find the best resources in a topic. (Mine does this.)

Next week, more about working with search terms in practice.

How catalogues work: Search

Welcome to part 2 of my series on catalogues. In this part, I’m going to talk a bit about different kinds of searches you might want to be able to do.

Skills and tools : Glasses and pen resting on sheets of printed music

Keyword

The kind of search many of us are most familiar with today is a keyword search. You know, the kind where you get presented with a search box, and you type words in, and sometimes the thing you want comes out the other end?

Keyword searches basically search anywhere in the searchable text for a word or phrase (depending on how things are set up). This can be really helpful, or really horrible.

Helpful

Keyword searches can be great because you don’t have to remember what kind of information the thing you’re searching for is. And you don’t have to figure out how it might be organised in the thing you’re searching. The word matches or it doesn’t.

They’re also fantastic for something we’re doing at work – instead of having hundreds of subject headings that get used for just one thing (a person, a device or software program, a tool), we’re making sure those are in the abstract, and then assigning more general subject headings.

That way, people can both look at groups of things (handy in a rapidly changing setting like technology) and also find specific tools or people if they need to.

Horrible

The downside to keyword searching is that if the word is only in one place in the record, and there’s a typo or something else that affects how the word is entered, you won’t ever find that record.

That’s also true if someone uses a similar word to the one you’re searching on – but not the exact one. (Remember what I said in part one, about libraries not having cutting-edge computing power?)

For example, in some systems, “cat” and “cats” may be treated as different words, and typos or alternate spellings definitely will be. There’s a word that is all over our work catalog, but it is sometimes spelled with a hyphen and sometimes no hyphen (the two parts of the word together) and our catalogue searches these as different things.

Depending on the system, the catalogue may adapt some things for you, but it probably won’t be as wide-ranging as your favourite search engines.
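Here is a small Python sketch of why this happens. It assumes a very literal search that only matches whole words exactly; the sample text and the "e-mail"/"email" pair are invented stand-ins (not the actual word from our catalogue), and real systems vary in how much tidying they do.

```python
def keyword_match(record_text, term):
    """Literal search: the term must equal one of the words exactly."""
    return term.lower() in record_text.lower().split()


def normalise(word):
    """One possible bit of tidying: ignore case and drop hyphens."""
    return word.lower().replace("-", "")


def forgiving_match(record_text, term):
    """Compare the tidied-up versions of the words instead."""
    return normalise(term) in [normalise(w) for w in record_text.split()]


print(keyword_match("a book about cats", "cats"))   # exact word is there
print(keyword_match("a book about cats", "cat"))    # singular is a different word
print(keyword_match("e-mail archive", "email"))     # hyphen makes it a different word
print(forgiving_match("e-mail archive", "email"))   # tidying hides the hyphen
```

The forgiving version only fixes the one thing it was told about (hyphens). Plurals, typos, and alternate spellings would each need their own rule, which is part of why smaller systems don't handle all of them.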

Somewhere in between

One of the challenges of keyword searching is that you need to have terms that are unique enough to make a search find what you want – and sometimes that’s going to be really challenging.

I was part of a long-term Harry Potter project – it ran for 7 years, and over those years, we averaged 100 emails many days. As you can imagine, a lot of them had very similar terms and names in them, so we had to learn to figure out other ways to search email to find specific details (and since it was such a long project, sometimes those details were a year or two back, and relied on someone’s memory of what term we were using.)

This eventually drove me to create a wiki for the project that ended up with 9000+ pages, but that’s a whole other story…. (And set of posts.)

This is where learning to think about your keyword searches in more complex ways (such as using multiple terms, using boolean searches, or using ways to filter or limit the results) can be a big help.

Boolean?

You may remember hearing about this in library classes back in your education somewhere. Boolean is the term for doing searches that are joined by AND, OR, or NOT.

(You don’t usually actually have to capitalise the terms, and some systems may use symbols instead of words, but people often do when explaining them because it makes it a lot easier to figure out what’s going on. On some systems, you can select them from drop-down menus.)

AND means you want results that match all the items you list with AND. For example, “cats AND dogs” will return only those items that talk about both cats and dogs.

OR means you want anything that mentions either of them (or any of them, if you have more than two terms.) In this case, “cats OR dogs” would return any page that has “cats” on it, any page that has “dogs”, and also any pages that have both terms.

NOT means that it won’t return pages that have the term indicated by the NOT. So, “cats NOT dogs” would give you all the pages about cats, but not any that mention dogs. This one can be tricky because it would also leave out things like “Cats are not like dogs at all!” or “This is the page for people who love cats, no dogs here.”

In many search tools for catalogues, you can do different combinations – for example, you could say you wanted to search all items mentioning cats or dogs (keyword search), and then say you didn’t want a particular format (NOT book, in the format search).
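If it helps to see AND, OR, and NOT in action, here is a minimal Python sketch using sets of page numbers. The four "pages" are made up, and the matching is deliberately simple (whole words only); it is an illustration of the idea rather than how any particular catalogue does it.

```python
# Four made-up pages, numbered 1 to 4.
pages = {
    1: "all about cats",
    2: "cats and dogs living together",
    3: "training dogs",
    4: "cats are not like dogs at all",
}


def pages_with(word):
    """Return the set of page numbers whose text contains the word."""
    return {number for number, text in pages.items() if word in text.split()}


cats = pages_with("cats")
dogs = pages_with("dogs")

print("cats AND dogs:", sorted(cats & dogs))  # pages with both words
print("cats OR dogs: ", sorted(cats | dogs))  # pages with either word
print("cats NOT dogs:", sorted(cats - dogs))  # pages with cats, minus any with dogs
```

Notice that page 4 is really about cats, but "cats NOT dogs" throws it out because the word "dogs" appears on it. That is the tricky side of NOT described above.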

Other tips

In some search tools, you can also do more complex searches. Usually, you can find out about the options by looking for the search help information, or sometimes an advanced search tool.

Some common options include:

1) Searching on a phrase.

Usually, this is done by putting quotes around the phrase. “Sun and moon”, for example. Normally this will search for the exact words in that order.
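As a rough Python sketch of the difference a phrase search makes: the words have to appear next to each other, in the same order. The `phrase_match` helper and the sample sentences are made up for this example, and real search tools may treat punctuation and small words differently.

```python
def phrase_match(text, phrase):
    """True if the words of the phrase appear together, in order, in the text."""
    words = text.lower().split()
    target = phrase.lower().split()
    size = len(target)
    # Slide a window of the phrase's length across the text.
    return any(
        words[start:start + size] == target
        for start in range(len(words) - size + 1)
    )


print(phrase_match("the sun and moon rise", "sun and moon"))  # words in order
print(phrase_match("the moon and sun rise", "sun and moon"))  # same words, wrong order
```

A plain keyword search for sun, and, moon would find both sentences; the phrase search only finds the first.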

2) Limiting results in different ways.

These can include by date (usually you can specify a range, with some common ones being pre-set, like ‘last month’ or ‘last year’.) It can include things like ‘this email has an attachment’. It can include multiple search fields.

A lot of this depends on context and your particular technology.

3) Type of resource

In some tools, you can search by different types of resource. (For example, on Google, you can search on a word or phrase, and then also look for images, news stories, videos, etc., each of which has some additional tools.)

4) Not finding expected results?

Sometimes you’ll do a search, and you’ll get very different results than you expected – you know there are things about that in the thing you’re searching. If that’s the case, try a simpler search (just the title or just a phrase from the subtitle, for example). Sometimes a symbol will do something you didn’t expect (in our catalogue at work, a colon will tell the computer to do a ‘from X to Y’ search, so you get really weird results when you put a colon between a title and subtitle if you don’t put the whole thing in quotation marks.)

Usually the help information or the library staff can help you sort this out.

Next up, talking about controlled vocabulary and why it is both handy and complicated.

How catalogues work: An introduction

We’re working on a major catalogue update at work, which has me thinking a lot about how people use catalogues, databases, and other collections of information.

In talking about our new catalogue, I’ve also been reminded that most people don’t know how these things work, or what might be useful to them – so it seems like a great time for a short series of posts about that.

The basics

So, the first thing we should start with is what’s a catalogue?

For libraries, a catalogue is a highly specialised database that holds information about books in the collection. Often these are parts of an Integrated Library System, or ILS, that tracks a whole bunch of things. Sometimes the catalogue only does pieces of it.

Commonly included things:

  • Information about works in the collection (such as title, author, publisher, publication information, call number, subject headings). This is sometimes referred to as the bibliographic record.
  • Information about particular items in the collection, i.e. each actual thing that’s on the shelf (or however it’s stored or accessed). This is sometimes called a ‘holding’ record (because it describes the holdings of the library).
  • Loan information about specific items in the collection and who has them.
  • Information about electronic resources (sometimes this might be a link to them, sometimes systems pull in all the things in a database so you can search for them all in one place.)
  • Additional resources the library has chosen to add (documents, files, etc.)

These records may have public notes (things to help library users) or staff-only notes (to help staff manage resources and answer questions.)

Again, not every library will have all these things.
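To picture how the first few pieces in that list fit together, here is a minimal Python sketch. The class names, field names, and sample data are all invented for illustration; a real ILS stores far more than this and is organised in its own way.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Holding:
    """One physical (or electronic) copy the library actually has."""
    barcode: str
    location: str
    on_loan_to: Optional[str] = None  # loan information; None means on the shelf


@dataclass
class BibliographicRecord:
    """Information about the work itself, shared by every copy."""
    title: str
    author: str
    subject_headings: List[str] = field(default_factory=list)
    holdings: List[Holding] = field(default_factory=list)
    staff_note: str = ""  # a staff-only note, not shown to library users


record = BibliographicRecord(
    title="Cats at Home",
    author="Nguyen, A.",
    subject_headings=["Cats"],
    holdings=[
        Holding(barcode="0001", location="Main"),
        Holding(barcode="0002", location="Branch", on_loan_to="patron-42"),
    ],
)

# Which copies could someone pick up right now?
available = [h.barcode for h in record.holdings if h.on_loan_to is None]
print(available)
```

One description of the work, several copies hanging off it, and loan details attached to each copy: that is the basic shape, even when a system squashes some of it into a single record.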

Our collection at work has bibliographic records, but doesn’t have separate holdings records (all the information about all our copies is in a single record: this is sometimes a bit clunky, but it works okay for us because we don’t check a lot of items out.)

Likewise, we don’t have a separate circulation (or loan) module – all the loan information is in the record. Library users can’t see it, because it doesn’t display to them, just to the tools staff uses.

(In some libraries, this would be a problem, but in our library, there’s just one and a half staff members, and we both need to have access to it. The library assistant usually deals with loans and circulation, but if she’s on vacation or something comes up, I need to be able to see what’s going on and make changes too.)

Metadata

When you put information into a catalogue, you are collecting metadata – that’s the term for ‘information about a thing’.

My favourite explanation of metadata comes from a Scientific American piece from 2012 that used Santa Claus and Christmas lists as examples. Go read it, if you’re not sure how metadata works, I’ll wait for you.

So, metadata about books includes the title, author, publication information. It might also include things like if a book is considered a particularly good resource, or is on a recommendation list. It might include if it was donated (and if so, by who). All kinds of things can be metadata.

Libraries have some commonly used systems for formatting it. A lot of libraries still use MARC (which stands for Machine-Readable Cataloging). Here’s a longish explanation from the Library of Congress about the details. This provides the structure for the data.

Besides the structure, there needs to be consistency in how you write things down. For a long time, libraries used the Anglo-American Cataloging Rules (AACR or AACR2 for the 2nd edition, etc.) but now a lot of libraries use RDA or Resource Description and Access.

Why all the rules?

Computers are still fairly stupid – they’re really quick at matching up things we tell them with things that they have stored, but they need a lot of help to match up things like typos or alternate ways of phrasing things.

(Google, Amazon, and the other tech giant companies have huge amounts of resources and lots of cutting edge design capabilities to make that work. Your average library just doesn’t. Your average library is probably pretty excited if their staff computers are less than four years old.)

So, in order for the computer to match things up, the library needs to be using consistent words (what’s called a “Controlled vocabulary” for things like subject headings and formats) and an underlying structure.
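As a rough illustration of why the consistent words matter, here is a small sketch of a controlled vocabulary for formats. The terms and variants are invented for the example, and the matching is deliberately as literal as a simple catalogue search would be: a variant the list knows about gets mapped to the preferred term, and anything else (including a typo) finds nothing.

```python
from typing import Optional

# Hypothetical controlled vocabulary: preferred term -> variants seen in old records.
CONTROLLED_FORMATS = {
    "Book": ["book", "books", "bk", "hardcover", "paperback"],
    "DVD": ["dvd", "dvds", "video disc"],
    "Audiobook": ["audiobook", "audio book", "book on cd"],
}

# Flip it around so any known variant points at its preferred term.
VARIANT_TO_PREFERRED = {
    variant: preferred
    for preferred, variants in CONTROLLED_FORMATS.items()
    for variant in variants
}


def normalise_format(raw: str) -> Optional[str]:
    """Return the preferred term for a known variant, or None if it is unknown."""
    key = " ".join(raw.lower().split())  # ignore case and stray spaces
    return VARIANT_TO_PREFERRED.get(key)


print(normalise_format("  Audio  Book "))  # a known variant
print(normalise_format("Audoibook"))       # a typo the list does not know about
```

The second lookup comes back empty, which is the whole problem: the computer only knows the words it has been given.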

What does this mean in practice?

A lot of what I’m doing in our new catalogue right now is setting up that structure and arranging the different screens so they do what we want.

For example, we have a lot of options on the screen to add things to the catalogue, but when we edit things, we’re usually only editing a couple of specific pieces. So I set it up so those are at the top of the screen, and then we can get to everything else if we need to, but don’t have to scroll to get to it.

(You have no idea how exciting this is, when you’ve been spending years having to scroll down a very similar-appearing form to look for one specific field.)

But another big part of what I’m working on is fixing things so we’re using a smaller list of terms for things like format and location. That means people will be able to filter usefully by them, which will be amazing.

(This is going to take months and months. Fixing the formats and locations is pretty quick, but we have 14,000 subject headings, and a lot of them are tiny variants or typos of the ones we actually want.)
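For the curious, this is the sort of thing a script can do to help with that cleanup. It is a sketch only, not what our catalogue software actually does: it uses Python's standard difflib module to suggest which preferred heading a stray variant probably belongs to. The headings and the 0.85 cutoff are assumptions for the example, and a person still needs to check every suggestion.

```python
import difflib


def suggest_merges(headings, preferred, cutoff=0.85):
    """Map each non-preferred heading to its closest preferred heading, if any."""
    lowered = {p.lower(): p for p in preferred}
    suggestions = {}
    for heading in headings:
        if heading in preferred:
            continue  # already one of the headings we want
        matches = difflib.get_close_matches(
            heading.lower(), list(lowered), n=1, cutoff=cutoff
        )
        if matches:
            suggestions[heading] = lowered[matches[0]]
    return suggestions


# Made-up sample data.
preferred = ["Mental health", "Psychology"]
in_catalogue = ["Mental health", "Mental helth", "mental health.", "Gardening"]

for variant, target in suggest_merges(in_catalogue, preferred).items():
    print(f"{variant!r} looks like {target!r}")
```

Anything that does not come close to a preferred heading (like "Gardening" here) is left alone for a human to decide about.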

How research has changed : digital work flow

Penultimate in the current series on how research has changed, I want to talk about digital-only workflows.

Massive pendulum clock (from the Warner Brothers Harry Potter studios) with the text "Times change"

Electronic workflow

I don’t know about you, but a whole lot of how I get information starts digitally these days. Having a workflow that works for you is critical if you’re doing larger projects.

There are a fair number of resources out there to help you get a grip on tools that work for you (I’m going to talk about my current setup here, but there are lots of other ways to do this.)

I find the Prof. Hacker blog, a collective blog focusing on tech tools and resources, a helpful read. A lot of the tools aren’t things I need, but they highlight things I want to know about fairly regularly, and I find it interesting to know about other tools. The already mentioned Productivity Alchemy podcast also brings up interesting tools regularly, on a less academic front.

Basically, though, you want a way to collect things, and then a way to organize the things. If you’re like me, many of your things may be webpages or sites.

My basic workflow

This is what I use for all online content I want to save – it works for me, but it’s not the most elegant option. What I like about it is:

1) I can use it from any device

I use a Mac at home, a Windows machine where I can usually add browser extensions but not apps at work, and an iPad when travelling. Because this relies on extensions (or the iOS ‘send to this app’ option) it’s pretty easy to use anywhere I happen to be.

2) The management can be sporadic

Obviously, there are benefits to keeping on top of it, but the way my system works, it’s okay if I get behind on moving from the collection point to the organisation part.

3) I can usually find the thing I’m looking for.

This is key. If I couldn’t find things, it’d be a bad system. But I usually know which place to look for it, and the search tools work well enough.

Steps

I rely on two tools, Instapaper and Pinboard. Instapaper is currently free (but is owned by Pinterest, so changes are possible in the future). Pinboard has a small yearly fee ($11 currently) but is run by someone independent, Maciej Cegłowski, who designed and runs the site. There’s also a full page archival option for another $25 a year.

(There are plenty of other tools out there for saving things as you read them, but I really do recommend Pinboard for organizing them once you’ve got them.)

My actual steps look like this.

  • Read or find a thing I want to save.
  • Use extension to save it to Instapaper.
  • Periodically, go through Instapaper and move new items to about 8 folders in Instapaper for later sorting.
  • When I’ve got time and feel like it, put things into Pinboard with much more useful tags.

Right now, I go through Instapaper every two weeks, a few days before I start doing my newsletter for the fortnight. I have a folder where I put the links I want to share in the newsletter, so I can work my way through writing them up efficiently.

My other folders include recipes, links related to my day job, writing, Pagan topics, and business things. I have a catchall folder (cleverly called ‘links’) for anything else I want to save. I also have folders for things to read (which is where I save books I want to read), and things to watch or listen to.

Every so often, I make a point of churning through links and tagging them in Pinboard – it’s a great project for when I don’t have a lot of focus to write and have a thing I want to watch.

I usually can remember if I’ve moved something to Pinboard yet, so I also usually can figure out where to look for something.

Having a two step process also helps for saving things to read later (especially when I’m travelling and have less time or internet access), or weeding out highly aspirational recipes I’m never going to actually consider making.

I use this process for all my links, but it’s pretty easy to see how to adapt it for research work. You could have a folder for each big project, or make a point of moving those to a bookmarking service more frequently.

Or you could use a citation manager, which will be the topic of my final post in this series, coming next week.

How research has changed : online databases

Today’s installment of what’s changed in research comes back to a topic I’ve talked about before – the relative wonders of online databases.

(Relative, because they’re not entirely perfect, but they’re still a big improvement in many ways over the previous options.)

What’s an online database?

An online database collects articles or materials from various (relevant) sources, and provides a way to find things in different ways (by topic, by author, by publication, by whether it’s a peer-reviewed publication, and often all sorts of other options).

A database can collect material from one source (like an archive of a particular newspaper) or it can collect material from dozens or hundreds of possible sources.

Often, when a database is pulling from many sources, they’ll be about roughly similar topics. For example, the ERIC database gathers educational journals and materials, and JSTOR has a number of different modules, many of which focus on different collections of journals in the humanities.

As you might guess from these descriptions, some of what’s in a database can be rather obscure at first glance.

Getting access to databases

Database access is expensive – in academic libraries, there’s a pretty good chance more than 50-60% of the library’s collection budget goes for database access these days. Worse, the costs go up all the time, sometimes by double digit increases.

That means that libraries make choices every year about what databases they have, and which they continue to have – and how to manage access to them.

The actual details get incredibly messy and complicated, because publishers often bundle access. You can only get access to things A, C, and G, which you really want, if you also get access to B, D, E, and F. Those are sort of useful for your library’s users, but if you had the choice, you probably wouldn’t get them, and you’d do something else with your limited funds.

Also complicating the details is the fact that sometimes groups of libraries arrange access to databases jointly – sometimes through a library consortium, sometimes through state or regional contracts.

Probably obviously, there are lots of different kinds of databases out there, and different kinds of libraries will make different choices. A public library doesn’t really need access to a specialist chemistry database, and an academic library maybe doesn’t need one about crafts or genealogy.

What that means for you is that it’s usually best to look at a combo of what you’re doing, and what your local or area libraries offer, as a first step of figuring out access to materials.

How do you find out what databases a library has access to?

Usually there will be lists on their website – it might be under “Electronic resources” or “Online databases” or “A-Z database list” or other phrases like that. Sometimes it’ll be along with other kinds of resources, like ebook access or music downloads.

This should give you links and information about what you need to access them. Sometimes you may need to be on site, and often you may need a library barcode or other login method.

Tips later in this series will help you find out about other kinds of articles and resources, which you can usually request through interlibrary loan, even if your library doesn’t directly offer access.

How do you get access to a library?

Most libraries, even very small ones, offer a little access to databases – but they may not be very useful ones for your research.

In some places, you can get access to databases at very large public libraries if you live or work or go to school in the state. In some places, you can get access as an alum (though licensing costs make this a bit less common). In some cases, you can get access if you’re physically in the building, but not otherwise.

It’s worth checking the policies of any library you can reasonably get to – even if that chance is once every few months or every year, you can store up things that need database access and do it then.

Especially in more rural areas, many campuses have more generous access options for people who live in the area. And in the United States, state colleges and universities often have fairly generous guest access.

Figuring out what’s out there

Once you find out what databases you have access to, I advise doing a little exploring. Figure out which databases deal with the topics you’re particularly interested in, and explore. There will often be a list of topics covered, or you can find a list of specific journal titles through links about the resource. (Often this will say something like “Publications”.)

You can also search for topics outside the library database ecosystem. Google Scholar, Academia.edu, and other sites gather information about articles and resources from various sources, and make it accessible in different ways. In many cases, you won’t be able to get direct access to an article this way, but you can read the title and abstract and other information, and figure out how much you want to track it down.

One of the problems with database searches is that computers are often stupid. While Google and Amazon have a lot of data to do predictive searching, the academic journal databases aren’t usually quite so wide-reaching. If you search on a different term or a different way of wording something, you might not find what you’re hoping for.

These things might help:

  • Take a quick look at Wikipedia, Google Scholar, and other public resources to see what kinds of terms or phrasing show up. Sometimes these tools will help you make better searches.
  • If you find an article that looks promising, check to see if there are subject headings assigned by the database. You can often click on these to find other similar articles.
  • If you find an article you like, check out more about the author. Often they’ve written other things on similar topics.
  • Check out the articles they reference – it’s a great way to find more similar items.

As you go, it’s worth paying attention to terms people use. Many academic fields have preferred ways to phrase things (at least at the moment) so figuring out what those are will help you narrow down your research much more effectively. The same thing goes if you’re researching something where the name has changed: dig a little and figure out alternate possible names, and you’ll likely find more articles.

How research has changed : online catalogues

Welcome to another post in how research has changed (well, for those of us who are more than 5 or so years out of school.)

Today’s installment is about online catalogues.

The state of the map

These days, most libraries (even very small ones) are likely to have some method of accessing their collection online.

If you’re responsible for a small library (like the ones many religious communities, community centres, or hobby groups have), there are some great tools out there to manage your collection.

My personal recommendation is LibraryThing, which has an option called TinyCat that provides circulation and other tools to small libraries. TinyCat is free for personal use (which covers ‘I am lending things from my personal collection to friends’) and very affordable otherwise.

Bigger and established libraries obviously have more elaborate systems – which can be a good thing or a completely overwhelming thing, depending. Sometimes it can be really hard to figure out how to do a search, or what works. That’s what this article is for – to give you some tips.

WorldCat

WorldCat is what is referred to as a union catalogue. Thousands of libraries around the world share records, so that you can try searching on a title (or author, or subject) and see books and other items.

You can enter a zip code to figure out what libraries near you might have a copy (very useful for figuring out if you can get a copy easily, or need to look at interlibrary loan). And if you need to look at interlibrary loan, knowing where there are copies can help you with the request.

WorldCat is also great for helping you figure out things like the most recent edition of regularly revised books, or tracking down older books that may not be in bookstores or in print anymore.

Library of Congress

In the United States, many books end up in the Library of Congress, which is the library of record for the country. (Other countries have similar things). This covers books published in that country, and also selections from other places.

The Library of Congress catalogue is a good way to find out more about topics, titles, and authors – and it will also help you find the most widely used subject headings for many topics.

Information in entries

Many online catalogues have some additional nifty tools that can help you. For example, you are often able to click on the subject headings for a particular title, and it will help you find other books with that subject. (You can do the same thing for the author, and sometimes for other aspects.)

Some catalogues have an option to ‘browse nearby on shelf’ which will show you titles that are near the one you’re currently looking at. This is really handy if you want to see other items that are closely related but may have different subject headings assigned.
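Under the hood, a ‘browse nearby’ feature can be as simple as putting everything in shelf order and showing the neighbours. Here is a minimal sketch of that idea. It assumes plain decimal classification numbers and placeholder titles; real call numbers have extra parts and need a more careful sort than this, and I am not describing how any particular catalogue does it.

```python
import bisect


def browse_nearby(shelf, call_number, window=2):
    """Return up to `window` items on each side of the given call number."""
    ordered = sorted(shelf)  # tuples sort by call number first
    keys = [number for number, _ in ordered]
    i = bisect.bisect_left(keys, call_number)  # position in shelf order
    start = max(0, i - window)
    return ordered[start:i + window + 1]


# Made-up shelf: (classification number, placeholder title).
shelf = [
    (133.43, "Title A"),
    (292.13, "Title B"),
    (133.4, "Title C"),
    (200.0, "Title D"),
    (133.5, "Title E"),
    (641.5, "Title F"),
]

for number, title in browse_nearby(shelf, 133.5, window=1):
    print(number, title)
```

The neighbours turn up because of where they sit on the shelf, not because of their subject headings, which is why this kind of browsing can surface things a subject search misses.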

Limitations

Of course, not everything works in an ideal way. So, as well as talking about the awesomeness of online catalogues, we have to talk about some of the limitations.

Not all books are in libraries

The biggest one is that not all books end up in libraries.

Many libraries don’t collect widely in the popular Pagan and magical title areas – they’ll get a few every year, but not everything that’s published. The same is true for other topic areas, especially those that rely on self-publishing, small niche publishers, or other areas of publishing.

For these, you’ll have to go to places that focus on that topic, to commercial sellers (at least to get a sense of what’s out there) or to resources like bibliographies and publisher websites as you can find them.

Not all libraries are part of WorldCat

Being part of a union catalogue system comes with obligations for the libraries – and those don’t make sense for most smaller specialised libraries. These can involve things like how records are shared (small libraries may be using software or formats that don’t make this at all easy), staff time they just don’t have, or other factors.

(The library I work in doesn’t share our catalogue with anyone, though it’s available online. We use a less common back-end, and we also use highly specialised subject headings that would mesh badly with other systems.)

You still have to figure out access

Just because you know a book exists doesn’t mean it’s easy to get your hands on it, unfortunately!

You may still have to figure out how to get a book through interlibrary loan, track down a used copy you can afford (if one exists), or get yourself to a library where you can access it. But at least, with modern tools, you can figure out what your options are, mostly from the comfort of your computer (or even a mobile device.)

Next time

Next time, I’ll be talking about databases and options for access.

How research has changed : digital access

I was talking to a friend online a couple of weeks ago, who was marvelling at how easy it is to do some kinds of research now – and how much has changed. So let’s take a few posts to talk about how – and what you might want to know.

A little background

When I was in college, in the late 90s (I graduated in 98), we had computer catalogues, but they were often a bit limited. They’d tell you what that library had, but finding out what other libraries nearby had was complicated. Systems often didn’t talk to each other well. You could often do searches, but not find similar books, or books that were nearby on the shelves.

And while there were some computer databases out there, for a lot of journals you still had to go look up topics in the index – printed volumes that came out with additional volumes every so often. Once you figured out what issues you wanted, you’d then have to go walk down rows and rows of shelving and find the actual physical copy. If it was actually on the shelf, and someone else wasn’t using it (or it hadn’t been misplaced.)

As you can imagine, this all took rather a lot of time to even figure out if a thing you were interested in was available, never mind looking at it to see if it was useful for what you needed.

By the time I finished graduate school in library science almost a decade later, in 2007, a lot of things had changed. Catalogues talked to each other much more. And a lot of academic journals had at least an online index, and often online access to articles. You could do a search, find articles (or at least the basics and an abstract) and then go find the article if you had to.

What’s out there?

There are three huge things that have changed in the past decade or two.

  • Figuring out what books (or other resources) might be available.
  • Much more rapid access to them in many cases.
  • Easier to find specialists, archives, and museum collection items.

All of these combine to change how research works. (I’m mostly talking in the humanities here, obviously the sciences are different!) We spend a lot less time just getting access to materials or figuring out what materials we might eventually get access to, and can spend a lot more time actually studying those materials, or reading more about them, or accessing detailed research materials.

I’ll be talking more about online catalogue resources, online database resources, and citation managers in future posts, but I want to talk a little about finding experts here.

Experts and specialist knowledge

One of the things that the Internet has made much better is that we have a lot more access to historical material than we used to. Many libraries and archives have been able to digitize at least some of their material.

Why not all of it? Most archives have some things that are confidential or restricted for various reasons. But also, there’s a lot of material that may not be a big priority for researchers, or is difficult to digitize. In the library I work in, the archives have papers of the institution’s directors, and the more recent ones are restricted since they may have discussions about students or staff who are still alive.

We have huge collections of incoming and outgoing correspondence, but the outgoing letters are on very thin carbon paper, and difficult and time-consuming to scan well (and also, mostly not a high focus of research interest) so they’re not as high a priority as other materials that are more commonly asked about, or that are easier to scan.

But I digress.

A thing that the Internet makes a lot more possible is figuring out if there’s someone out there who is an expert in the thing you’re doing, or is a librarian or an archivist or a museum curator whose collection has more about a topic you’re interested in.

Obviously, you want to do research in other ways, too, but there are a lot of solutions, now, for those questions that books and academic articles don’t answer (yet!)

A lot of what I do at work is help point people at resources and materials – because I work with those materials all the time, and they don’t, and I can say “Oh, yes, this will help.”

We have resource guides that deal with some of the more common questions and issues so that we can pull them out – they took me a while to write up, but now they’re done, and they’re helpful to people.

Anyway, these are now easier to find than they used to be. Most collections of any meaningful size will have a website, and if you can hit on the right search terms, or do a little digging (like looking for institutions or organizations associated with the thing you’re interested in) then you can find more resources. Often the websites themselves will have a lot – but even better, you can find other ways to connect or communicate. Even if the first place you try doesn’t have something, maybe they’ll know someone else who does and can point you there.

(Some institutions are really competitive with each other. But there are others out there that are just delighted to connect people with information, however that happens.)

Next time

I’ll be tackling online catalogues in my next post, with a few great resources for figuring out what materials might be out there, and how to get hold of them.