Mechanics of a project

My new project seems an excellent time to walk through a way to think about a large long-term research project (it’s always nice to have a handy example!)

Tea ball with a mix of herbs and dried flowers, cracked slightly open.

Step one: What am I trying to do here?

I described a lot of this in my previous post, but I want to turn out articles that take on the core theories behind what a given plant was used for, and provide more information about them, in a way that allows people to figure out where the information comes from.

(As I said, there’s nothing wrong with intuitive response – but it helps a lot to know what’s underlying it. A lot of our intuition is built on our experiences and the connections we’ve made between our experiences, so knowing what’s influencing that is pretty helpful.)

It’s also really helpful to know where a particular idea comes from if you need to make an adjustment. For example, if you need to substitute a herb you’d use for a spell (you can’t get it, or you’re allergic to it, or some other reason), wouldn’t you rather have a detailed idea of what people think about the alternatives? Too often, people put a whole lot of things together, while forgetting that ‘herb that does passionate lust’ is possibly a different flavour (magically speaking, and possibly also physically) from one that is known for helping build a long-lasting committed relationship.

Why do I want to know about the source?

I don’t believe that older sources are better. A quick look at medical history suggests why learning things is so powerful and important! But I do think knowing where our ideas and information come from is very helpful in figuring out what they mean to us.

I want to look at the sources to figure out where they came from, and to begin to understand the other associations. One common one from the Middle Ages is the question of why certain figures (Mary is a common one here) are so commonly shown in certain colours.

In Mary’s case, it’s a particular shade of blue. She’s painted in that shade, because it was an incredibly rare and expensive colour to make at the time. So the same way you might put gold leaf on the most important pieces of art, you painted that shade of blue to demonstrate how something in the picture was important or central or most honoured.

In this day of other options for colours, maybe that reason for choosing blue is less relevant than the fact that we know psychologically it is calming, or that it echoes water, or some other reason. (Probably, given colour symbology, multiple different reasons.)

Step two: What do I need?

I need some sources. My idea, to start with, is to pick a number of well-known plants and culinary herbs, things that are widely used and pretty widely documented. (There are whole books devoted to roses, for example!)

And then check out those items in a selection of well-known sources.

This means I need to collect those sources, which will fall roughly into three groups:

Early sources

By which I mean mostly Classical sources – Greeks, Romans, and maybe some Medieval and Renaissance texts. These are things that largely predate our understanding of modern medicinal uses. (In other words, some of their medicinal uses worked, but they might have the wrong idea about why. In other cases, the suggested things and their uses are just bizarre. Want more about this? Listen to just about any episode of the podcast Sawbones…)

Early modern sources

In the 17th to 19th centuries, you start getting a more systematic review of medicinal uses – but of course, you see less discussion of magical uses. However, many of the herbals of this time included folklore and stories. A great example here is Nicholas Culpeper, whose Complete Herbal is a classic in the field. This exists as an ebook on Project Gutenberg, which has the advantage of being searchable.

Modern sources

There are of course dozens, hundreds, of modern resources out there about these topics, from a variety of different perspectives (medicinal herbalism, magical herbalism, religious and magical sources, folklore collections, and many many more.) Making sense of them is baffling.

Obviously, I’m not going to read every modern source – my time, my library, my available ability to hold things in my head won’t allow it. But I can look at a few of the most widely referenced ones, and look at what they talk about, and try and track down stories. For example, Scott Cunningham’s Complete Book of Magical Herbs doesn’t have citations, but I can use it to help me look at stories to trace backwards. I won’t be able to figure out sources for all of them, but I can do some.

Step three: Make a plan

Identify some widely referenced sources, and see what a handful say about each plant, and then follow stories from there. For example, Culpeper, talking about saffron, says: “It is an herb of the Sun, and under the Lion, and therefore you need not demand a reason why it strengthens the heart so exceedingly.” From there, one can follow some notes about where it comes from, how one recognises a plant suitable for use, and so on.

I fully expect that sometimes later research will turn up new information or new sources to explore. But one of the things I want to do with this project is model how a long-term project can go, how there often is a spiralling pattern to the work, where you come back to things over time as you’ve learned more or have a new way to connect them.

Mixed with the plants and stones, I also want to highlight useful books, or at least books that are relevant to the project, putting them into context of why they were written, and what kinds of information they’re interesting and useful for. High on my list is Victoria Finlay’s Color: A Natural History of the Palette, which is a fabulous look at stories and information about colour.

Challenges of research

Wrapping up this post, I want to talk about a few challenges of research.

The biggest one, I suspect, is going to be variations in names. In books written before we had species names for plants, they can get referred to in a wide range of ways. Sometimes the names are consistent through the centuries, but often they’re not. I’ll have to do some digging and hunting to figure out what the references might be in many cases.

The other big challenge I’ve discussed above: so many sources, so little time. The only way to do this is to do things in a manageable chunk, and remind myself (and everyone reading) that I can and will come back.

New research project

So, I’d planned for this post to be an introduction to Zotero, as an example of how to use a citation management tool.

But then I was talking to a visiting friend about a project I’ve been kicking around for a while, and got some great ideas on how to approach it. So instead, we’re going to have a post explaining the project, and then I’ll be using it as part of an example of setting up Zotero, since it’s a great fit for that.


What’s the project?

If you’re familiar with the Pagan or magical communities, you’ve probably come across the term ‘correspondences’ before. Basically, it’s a concept that relates to certain theories about how magic works (or might work), that different plants, animals, stones, foods, and other materials have properties that are associated with particular energies, deities, other non-physical beings.

The idea is that you can use these items in your magic, to help you with a specific goal. You might use a particular herb for love, a given stone for clarity of communication, a specific colour for getting other people to take you seriously. (Some of these are magic. Some of them are psychology. Often the line between the two is surprisingly narrow.)

Where do correspondences come from?

Here’s the thing. A lot of times, people will quote things in books about what is associated with what – and there’s no source and no explanation.

This drives me up a wall. I have absolutely no objection to someone saying “This is my intuition and experience, it’s not based on anything else.” I am all for people doing what works for them. But that’s different than saying “Traditionally, this is used for X.” Who says? Where did they say it? Was it just them, or is that a pretty common thing, not all based on the same source?

This is a huge problem – and a huge project. The possible sources for this kind of information, even if we’re just focusing on Western Europe, and on about the last millennium, span at least a dozen languages, many different places, and a huge range of possible names for things. (Since the nomenclature we use in modern science is pretty modern, all things considered.)

I’ve been thinking about the problem for a while, and I want to try tackling it. Whatever I do won’t be comprehensive – there’s no way I can promise that. I don’t read enough languages, I have other things in my life besides this project, and I want to try and keep both my research and the actual output somewhat manageable.

The suggestion I got

My friend suggested it’d be a great project for a Patreon. I haven’t set one up yet! I want to work on a couple of example essays, first, so I can get a sense of how many I am likely to produce in a given month and how to structure the articles usefully. It is a project I want to tackle for my own reasons, and sharing it with others would be wonderful.

How might this work?

As you might remember, one of my pieces of advice for any significant research is figuring out how you know when you’ve found what you’re looking for or gotten as much as you can for your goal. Here’s the time to do that.

What do I need for this project?

I’d like to create a summary article for a lot of different items (herbs, stones, plus probably some on different animals, colours, and other things) that talks about the basic physical reality of the thing (where does it come from, does it have any medicinal properties or is it used for art or food?) and then that discusses some of the major stories and associations, plus where they may come from.

Realistically speaking, I’m quite sure this will be relatively easy for some items, and terribly difficult for others. I may start on one and get stuck pretty fast! Or I may find pieces that I can’t sort out, given my available time and resources. I expect I’ll be able to track down sources for some of the stories and correspondences, and not for others.

I also expect there will be revisions over time – because as I look for other items, I’ll come across different things, or stories, or places where someone translated the name of the plant a different way, and a connection becomes apparent. Or perhaps I’ll be able to make special trips to look at rare manuscripts (I do live in the Boston area, and we have lots of libraries with great rare book collections!) Or I’ll find more books and articles that give me more details.

How big a project is this?

Well, it’s huge. But there are two ways to limit it. One is about how I go about the project, and the other is by scope of what I’m going to research.

It would be tempting to go about this project by finding a given source that has information – the writings of Pliny the Elder, or various naturalists who collected folklore about the plants they described, or early grimoires – and index what they have to say. While that might be useful, it is also exactly the kind of project that can bog down very easily, or feel overwhelming very quickly. Also, it’s a kind of research I can do, but often don’t find particularly fun.

So let’s go with the other approach, which is to start with a particular item – a given herb or stone or other thing that has correspondences – and look at a reasonable selection of material that might have things about it, do a range of searches on the open web and in some databases like JSTOR, and rummage for what I can find, tracing sources back where I can to something that is close to the starting point. Pull them together, when I’ve got a satisfying amount, and then move on to the next one.

The other way to limit the scope is to look at what items I’m taking on. Realistically, my language skills are a much better fit for things derived from Western Europe than other places. (I read and write modern English, but can also take a reasonable stab at Middle English, French, Attic Greek, and a bit of German and Latin. Certainly enough to poke at things with translation tools and get a sense.) I have a lot less of a chance for things that started out in Arabic, or in Chinese or Japanese, or dozens of other languages.

The other part of this, of course, is in what I use. I live in a global community, and have access to herbs and spices from around the globe. But at the same time, my actual magical practice is rooted partly in Western Europe, and partly in where I live (New England), and my research is going to focus first on the things used in those places. Again, I may very well expand in some cases, but I know there are huge swaths of magical and ritual practice that use things I’ve never explored. I’d rather leave research on them to people who have those skills and experiences.

What about the practical aspects of this?

My idea is that I’d post one or two articles every month (probably: see above about testing how long it takes me to produce something useful first.) I suspect I’ll focus heavily on stones and herbs, but I may include other kinds of things from time to time (colours, animals, other kinds of plants, foods).

I want to produce something that is useful to me, to my covenmates. But I also want to produce something that’s useful to other people, because if I’m going to do the work, it makes sense to share it.

My current plan is to make Patreon posts, and periodically (every six months or year, depending on how many I’m turning out and how long they are) collect the articles into something available in other formats. Some of the research will involve getting more books: the Patreon money will go toward buying those books, and also (if there’s money for this) throwing money at some things that save me time so I can spend more of it on the project.

What does this mean for you?

If you’re interested in this, I will put the most frequent updates in my newsletter – sign up, and get a complete list of what I’ve written recently, plus other links I’ve found intriguing in the past fortnight. But I’ll post an announcement here and a few other places when I get things up and running.

If you have thoughts or ideas (or suggestions for a particular set of correspondences for me to tackle first), I’d love to hear about that, too. The easiest way is through my contact form.

Research tools: how to choose

Last week, I talked about different kinds of tools you might use for research. Today, I’m going to talk about how to choose tools.

Research tools: an astronomical device opens up like a pocket watch with many tools

Where will you use it?

This is one I think about a lot.

Some people do all of their research work or personal computer work on a single machine. I am not one of those people.

I work on a Mac at home (I’ve been a Mac user since … well, before there were Macs, technically, I started on an Apple IIC.) I have specific software I use on my home machine – Scrivener for long-form writing. I use Aeon Timeline mostly for fiction projects, but I know people use it for historical research as well. (It is an excellent and detailed timeline application.)

But some things I need access to on multiple machines. I want to be able to pop a note into my to-do list if I think of it at work, so I can follow up later. I want to be able to dip into my personal email (that’s fine at my workplace, within reason).

My work machine is a Windows machine.

I could bring a personal device with me. In my case, that’s an iPad with a Bluetooth keyboard: it has the iOS app version of Scrivener on it among other things. But the wifi is unreliable in my office, so it’s often not usable. Even for writing on breaks, that’s not ideal, because I need to be able to sync files, and keeping up with the syncing at home would take a fair bit of time and attention. I may want to check something that’s sitting in my email, and not scroll through a lot of text due to the small screen size of the phone, or have easier access to search tools.

So for me, I want tools that have at least some option for web access, even if I also use a specific application on my computer or mobile device most of the time.

You may well make different choices, if you only need to access your work in one location, or you have (and regularly carry) a laptop. Or if you’re regularly doing work in places without reliable internet.

Make the choices that work for you, and reevaluate as you change jobs or technology, to make sure they’re choices that still work for you.

Backing up

One of the very first questions you should ask yourself about a tool is how the information is backed up, and how you can get information out of it in a format you can refer to or (ideally) transfer to a different tool. You don’t want your critical material to be held hostage by a company going out of business, or lose material because you have a computer failure.

The actual issues are slightly different – you may have no warning of a computer failure (or someone stealing a laptop, or any of a number of other things): your ideal is continuous backups. That means a copy on your computer itself, a copy on a separate medium (external hard drive, USB drive, etc.) and probably a copy in the cloud (among other reasons, this means that if something happens to your physical location – fire, flood, tornado – you have a copy somewhere else.)

If you don’t want to trust the cloud for some reason, is there a friend who lives in a different region of the country who you can mail a copy periodically? (Cheap USB drive, files burned to CD-ROM, etc.)

How specific and exacting you are about your backup plans will probably depend a bit on your technology setup, a bit on how critical the files are, and a bit on how good you are about manual process things like sticking something in the mail.

Me, I have a copy on my computer, the critical files sync to Dropbox, and I periodically pull copies onto a separate drive (I usually leave mine at work, for a backup in a sufficiently different physical location.)

If you are working on something that you absolutely can’t recreate in a timely manner (like a dissertation or all of your research notes for multiple years) you want to be more attentive to your backups than writing you do solely for fun or emails to a friend. (Those are great to back up, and they can hurt a lot to lose, but they probably won’t derail a significant part of your life for months or years if you lose them.)

How do you get information out of it?

The other side of this question is making sure you can keep control over your information and research, no matter what happens. So long as you’re regularly using software or a tool, you should have at least a bit of warning before a site or tool disappears (though sometimes it can happen nearly overnight!) It’s good to get in the habit of pulling an export regularly.

There are a couple of different considerations with exports.

It’s often easiest to pull a copy that has all your information, but not in a format you can stick into another program easily. For example, it may be easiest to pull a copy of your material as a PDF, but you’d need to do some wrangling (possibly with some specific software) to get the text out easily. VoodooPad, an application I use for keeping personal wiki-type information (where I can link to other pages in the document) will let me export in a number of formats, but I may lose formatting and some connections between files.

Knowing what your options are in advance, and picking the best options for your current needs is usually a good way to go.

What format should you save things into?

Good question. The formats that will absolutely save the core of your material (but may lose formatting, connections between files, or ‘about this work’ type information) are plain text and csv files for spreadsheets. (CSV stands for ‘comma separated values’ which means that each column is separated by commas. You can often set a different character, if your actual data may have commas, and then tell the program you load it into what you picked.)
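If it helps to see the delimiter point in practice, here is a minimal sketch using Python’s built-in csv module. The sample rows are made up for illustration, and the function names (write_delimited, read_delimited) are my own, not part of any particular tool’s export feature.

```python
import csv
import io

# Made-up sample data: note that one value contains a comma.
rows = [
    ["plant", "note"],
    ["rose", "love, beauty"],
    ["saffron", "strengthens the heart"],
]

def write_delimited(rows, delimiter=","):
    """Return the rows as delimited text, using the chosen separator."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

def read_delimited(text, delimiter=","):
    """Parse delimited text back into rows; the separator must match."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

comma_text = write_delimited(rows)                  # comma as separator
tab_text = write_delimited(rows, delimiter="\t")    # tab as separator

print(comma_text)
print(tab_text)
```

With commas as the separator, the value that contains a comma is wrapped in quotation marks so it survives the round trip. With tabs, no quoting is needed, but you have to tell whatever reads the file that you picked tabs.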

A slightly more complex option for text is RTF or rich text format. This will save much of the formatting for you, but it may add glitches or not include some specialised formatting.

Saving files in widely used formats – such as Microsoft’s .docx or .xls formats – will often work too, but again may add some additional material or leave some things out. (Microsoft formats are sort of notorious for bloating files with a lot of additional formatting data that can cause problems on import.)

Sometimes you may have the option to export as an HTML or XML format – usually this is an option for linked pages, like a website or wiki. These formats should preserve the links between pages, and you can access them by opening the file on your computer as if it’s in a web browser. (And from there you can save the material into other formats if you need.)

Thinking about how you might want to use the information if you need to resurrect it is usually a good indicator for your best format.

Research tools: what I use

Time for a new series – this one on keeping track of reference materials. In this post, I’m going to talk about a couple of different aspects. Then, in future posts, I’ll be looking at some specific tools to keep track of references, like Zotero (one of the citation management programs.)


Why have a system to keep track of things?

If you’re only managing a few references, or a few sites (for values of ‘few’ that go up to about 50), you probably don’t need a big system – you can keep track of a few dozen things in a word processing or text file pretty easily.

But once you get over a few dozen, it gets harder to keep things organised. Our brains have a harder time processing a long list: it’s easier to miss something, or duplicate entries, or otherwise have housekeeping errors. Different people will have different length limits, but somewhere between 20 and 40, you’ll probably hit your personal ‘this is too long’ limit.

The same thing goes if the items you’re keeping track of fit into multiple categories. It’s one thing to have a list of items you need to read – but what happens when you want to list things as “to read” and then by the type of content. How do you file things? Do you list it every possible place? That makes for a much longer list.
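One common answer to the “do you list it every possible place?” question is tagging: each item is stored once, and carries as many labels as it needs. Here is a rough sketch of that idea in Python. The entries, tags, and function names are all invented for illustration; real tools handle this in their own ways.

```python
from collections import defaultdict

# Invented sample entries: each item appears once, with several tags.
items = {
    "Complete Herbal": {"to read", "herbs", "early modern"},
    "Color: A Natural History of the Palette": {"colour", "history"},
    "Pliny, Natural History": {"to read", "classical"},
}

def build_tag_index(items):
    """Map each tag to the set of titles that carry it."""
    index = defaultdict(set)
    for title, tags in items.items():
        for tag in tags:
            index[tag].add(title)
    return index

def find(index, *tags):
    """Return the titles that carry every one of the given tags, sorted."""
    matches = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*matches)) if matches else []

index = build_tag_index(items)
print(find(index, "to read"))
print(find(index, "to read", "herbs"))
```

The list itself never gets longer, no matter how many categories an item belongs to. You just ask a narrower question.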

Either way, if you want to keep track of lots of things, you need a system.

What kinds of options are out there?

For many people, the system that works best will depend on what you’re trying to keep track of. You may need a different approach for websites than for print books, or a different way to handle ebooks.

I suggest that you think about the difference between what you own, and what you use as reference material. You may want to own a book (and keep track of the fact you own it), but not care about it as reference material. You might have a system for keeping track of books, and a different one for tracking reference material. Having multiple systems can be annoying (and potentially confusing) but not if you’re clear about why you’re using a specific tool.

Here’s what I use:

What books I have copies of: LibraryThing

I use LibraryThing to keep track of everything I own – print and ebooks. Items get entered in the catalog. I have a collection of print books (so I can just search things I have in print), or ebooks (just things that live on my phone.)

Everything also gets content-specific tags like genre, or when it’s set if it’s historical, or topic. I keep my tags edited, so that I can search them easily, and I refine them regularly so that I don’t have tags with only a couple of items unless it’s really necessary. (I am not a fan of lists that are overwhelming for me.)

You can add books with a simple form or by importing a spreadsheet if you’ve been using a different tool, or by scanning the back of books in many cases with a tool in the mobile app.

If they’re print books, they also get assigned a tag that indicates where they’re shelved (so I can find them again.) The shelving tags are really simple – I have the IKEA cube bookshelves, so I do A1, A2, A3, A4, B1, B2, B3, B4, etc. Each cube only has about 12-15 books, so it’s easy to spot once I’m looking in the right place.

The actual ebook files are managed through Calibre. This means I can search or tag and manage files much more easily, or save files to a different place if needed.

Costs: LibraryThing has a fee for over 200 books in an account. The fee is $10 for a year, or $25 for a lifetime account. (Obviously, one of these is a much better buy if you expect to keep using the site.) Calibre is free, but they appreciate donations!

Websites I want to share: Pinboard

Pinboard describes itself as “a bookmarking website for introverted people in a hurry.” (It’s also been described as anti-social bookmarking, in contrast to social bookmarking.)

I have a personal account, and one for coven links. My personal account is private and where I put things I want to find later; the coven site is public. You can set tags, group tags, and do some additional things.

I use Instapaper as an interim tool to keep track of things I want to save, read later, or think about reading.

Costs: There’s a yearly fee for new Pinboard accounts ($11 a year right now) and it’s well worth it if you want to share bookmarks, keep track of more items than your web browser’s bookmark tools will handle easily, or access bookmarks from multiple browsers or devices.

Instapaper is free, and there are other similar services (Pocket is the other big one).

References (books, websites, PDFs): Zotero

Zotero is one of a handful of widely used citation management programs, and the one I’d recommend for most people – it’s free, has an add-on for Chrome, and has other benefits. It will help you keep track of references, and you can produce a formatted bibliography with a few clicks (though you probably still need some human review. Citation styles are tricky!)

If you’re in academia, you may have access to other options through your school. Your library (or the library website) probably has more information. (There are certain advantages to using the same system other people in your institution are, and if you’re working in a research lab or closely with a professor or researcher, you may not have a lot of choice about which tool is used.)

Costs: Depends on the tool, but Zotero is free. If you want to store PDFs on their site, you’ll likely need to pay for additional storage if you have more than a few.

Notes and writing: Ulysses, 4theWords, Scrivener

My briefer writing lives in Ulysses, which is sometimes a little tricky, because Ulysses is a Mac-only app, and I can’t access it at work. You can have lots of folders, tag items, create smart folders, and much more. There’s even a publishing option for putting things into WordPress (and a few other tools).

I’m also using a site called 4thewords which is exploring a gamification approach to writing. You battle various monsters and win by writing a certain number of words in a span of time. As you do quests, you can earn items for your avatar or other game objects. As I wrote this sentence, I won a battle of 500 words.

It also keeps some stats I’m finding more useful than I expected about how long I was actively writing a given piece. (And I’ve found the battles a certain incentive for doing just another hundred or two hundred words, several times.)

Scrivener is where my long-form writing lives, and it is amazing for being able to move things around, save a piece you cut but want to keep just in case, and has a lot of tagging and drafting tools to help.

Costs: Ulysses is a subscription (it’s also currently part of the SetApp subscription option, if you’re interested in other apps they offer, which includes Aeon Timeline, a popular timeline app) and 4theWords has a one-month free trial and then is $4 a month.

There are obviously lots of free options in this space: I use Google Docs for sharing word processing with other people, and especially for editing that we’re looking at together. It’s also my go-to for things I may want to add to during lunch at work.

Come back next week.

Join me next week for part 2, things to think about when choosing tools.

If you use other tools you think I should look at, I’d love to hear from you – the contact form is probably the best way.

How catalogues work: algorithms

The last part of how catalogues work is looking at algorithms.

(I was not a computer science major. This is going to be the non-technical discussion.)

Also, the two links I mention here are from 2016, and technology has moved on a bit, as technology does, but these are good illustrations of my specific points.

Catalogues: Wooden chest of old-fashioned catalogue cards

What is an algorithm?

A good definition is that an algorithm is a step-by-step way of doing something. This video from the University of Washington notes that sorting your laundry is an algorithm (is it a white shirt? This pile, these things get washed together. Is it a red shirt? That goes in a different pile. Does it need special treatment? Follow these steps.) The video’s a great overview of the topic in a couple of minutes.

Computers are extremely fast at doing this kind of step, but how successful the algorithm is depends on what the people programming the algorithm have told it to do.
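For anyone who wants to see the laundry example written down as steps a computer could follow, here is a small sketch in Python. The specific rules and pile names are my own assumptions (the video does not give code), and sort_laundry is a made-up function name.

```python
def sort_laundry(item):
    """Decide which pile one item goes in, asking one question at a time."""
    if item.get("special_care"):
        return "special treatment"
    if item["colour"] == "white":
        return "whites"
    if item["colour"] == "red":
        return "reds"
    return "everything else"

# A made-up basket of laundry.
basket = [
    {"name": "white shirt", "colour": "white"},
    {"name": "red shirt", "colour": "red"},
    {"name": "silk scarf", "colour": "red", "special_care": True},
    {"name": "jeans", "colour": "blue"},
]

# Apply the same steps to every item and collect the piles.
piles = {}
for item in basket:
    piles.setdefault(sort_laundry(item), []).append(item["name"])

print(piles)
```

Notice that the order of the questions matters: the red silk scarf only ends up in the special treatment pile because that question is asked first. That ordering is a decision a person made, not something the computer worked out.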

An important digression

The fact that human beings design these lists is a particularly business-centered reason why diversity in technology (and in companies in general) is such a big deal – people who have different backgrounds, life experiences, or ways of looking at the world are going to think about different things in the design process. When that’s managed well, a diverse group will likely come up with algorithms and other programming that work much better for a wider range of people.

(An example here – though it is a legitimately sort of complicated record keeping – is that the Apple Health app didn’t include any menstrual cycle tracking for a long time, and it’s still much more rudimentary than some other apps. If your body does things outside of the expected timeframe, you have fewer options.)

What does this mean for catalogues?

Some of the things a catalogue uses an algorithm for are pretty straightforward. Sorting a list by the last name of the author, or the title of the work, or the year it was published is pretty simple, so long as the data is consistent.
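Here is a minimal sketch of that kind of sorting in Python, including what happens when the data is not consistent. The records are invented, and the field names (author_last, title, year) are assumptions for the example rather than anything from a real catalogue system.

```python
from operator import itemgetter

# Invented records; one author name was entered in lowercase.
records = [
    {"author_last": "Smith", "title": "Herbs of the Hedgerow", "year": 1987},
    {"author_last": "adams", "title": "A Garden Almanac", "year": 2004},
    {"author_last": "Baker", "title": "Stones and Stories", "year": 1962},
]

# Sorting by year is straightforward because every year is a number.
by_year = sorted(records, key=itemgetter("year"))

# A plain sort on the name puts capital letters before lowercase ones,
# so "adams" lands at the end instead of the start.
naive_by_author = sorted(records, key=itemgetter("author_last"))

# Tidying each name before comparing gives the order a person expects.
cleaned_by_author = sorted(
    records, key=lambda record: record["author_last"].strip().casefold()
)

print([r["author_last"] for r in naive_by_author])
print([r["author_last"] for r in cleaned_by_author])
```

The sorting step itself is one line. Most of the real work is making sure the data going into it was entered the same way every time.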

What data might be inconsistent? For example, if the date formats swap between United States standard dating (Month-Day-Year) and the Day-Month-Year common in parts of Europe, your results are going to be confusing. Good data is essential to sorting and organising your catalogue.
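To make the date problem concrete, here is a short Python sketch. The date string is an invented example, and the “US” and “EU” labels and function names are just my shorthand for the two conventions.

```python
from datetime import datetime

# The two conventions described above, as strptime patterns.
FORMATS = {
    "US": "%m-%d-%Y",  # Month-Day-Year
    "EU": "%d-%m-%Y",  # Day-Month-Year
}

def parse(text, style):
    """Read a date string according to the named convention."""
    return datetime.strptime(text, FORMATS[style]).date()

def to_iso(text, style):
    """Rewrite a date as Year-Month-Day, which reads only one way."""
    return parse(text, style).isoformat()

ambiguous = "03-04-2016"
print(to_iso(ambiguous, "US"))  # read as March 4
print(to_iso(ambiguous, "EU"))  # read as April 3
```

The same string gives two different dates depending on which convention you assume, and nothing in the string itself tells you which one was meant. That is why this kind of cleanup needs a person who knows where the data came from.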

(This is why I am spending my summer cleaning up a lot of data in our catalogue at work. This week, this has meant hours of moving identifying file numbers from the format area, where they shouldn’t be anyway, to a different area, and making sure the correct format is actually entered.

We can automate some kinds of data changes, but this one requires moving data into a different field, and we don’t have an easy way to do that automatically.)

Where it gets complicated

However, it gets more complicated once we get into things where there is a bit more of a value judgement.

What kinds of images should we get if we search on “beautiful”?

One of the examples that has stuck with me the most was something Dr. Safiya Noble illustrated in the keynote she gave at the LibTech conference in 2016. (LibTech is my favourite library conference for a reason.) It’s worth noting this was given in March of 2016, and she talks about the manipulation of algorithms and the effect on elections….

In her keynote, she did an illustration where she did a search on “Beautiful” and at that time the algorithm turned up a lot of landscapes (that were really gorgeous). But if you searched on ‘beautiful woman’, you turned up white women (and white women of a particular kind when it came to facial structure, hair, body size, and a bunch of other characteristics’. That’s what happens when human programming goes awry, or is not sufficiently questioned.

And if you tried searches like “black girls”, you got a whole different set of results, and much more mixed ones in terms of positive and negative.

So, when your library catalogue tells you you can sort by ‘relevance’ or gives you options for ‘similar topics’, there are probably a lot of different things at play. Usually, there are software decisions in there somewhere. Some of these may be accessible to the library staff, others may be decided by the software programmers, and the librarians may have no idea how it works.

(In our new catalogue, we can choose which things to weight more – so for example, we could choose to weight phrases in the abstract (where we put a summary of the content) more than the title, or less than the title (depending on what we decided). We haven’t played around with this much yet, but it’s a way to help refine options for people.)
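Here is a rough sketch of what field weighting can look like. This is not how any particular catalogue product does it: the records, the weights, and the function names are all invented for illustration, and real relevance ranking is considerably more involved.

```python
def relevance(record, term, weights):
    """Add up the weight of every field whose text contains the search term."""
    term = term.lower()
    return sum(
        weight
        for field, weight in weights.items()
        if term in record.get(field, "").lower()
    )

# Two made-up records: one mentions cats in the title, one in the abstract.
records = [
    {"title": "Cats of the World", "abstract": "A photographic survey."},
    {"title": "Household Pets", "abstract": "Covers cats, dogs, and rabbits."},
]

def ranked_titles(weights):
    """Return titles ordered from most to least relevant for the term 'cats'."""
    ordered = sorted(records, key=lambda r: relevance(r, "cats", weights), reverse=True)
    return [r["title"] for r in ordered]

title_first = ranked_titles({"title": 3, "abstract": 1})
abstract_first = ranked_titles({"title": 1, "abstract": 3})
```

The same two records come back in opposite orders depending on the weights. Someone chose those numbers, which is the point: 'relevance' is a human decision with software wrapped around it.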

Even more complicated

Large companies – Google, Amazon, Facebook, any of the big ones – also look at your reactions to what you click on, where you spend time, what you click away from and when (and where you go to) – because it helps them create vast maps of data they can use. Sometimes this is really handy (like when Amazon’s list of also-boughts shows you a book you love and you had no idea it existed, or Spotify’s algorithm suggests music you really like.)

Sometimes it’s a lot creepier and more awful. There’s a famous story, when it comes to algorithms, of Target figuring out a teenager was pregnant before her father found out about it. The prediction was based entirely on purchases that were not specifically intended for a baby, but rather things like body lotion, a larger purse, two common supplements, and a bright blue rug.

And of course, it gets even scarier if we start talking about government agencies making decisions about who can get visas, fly, or do many other kinds of things, based on algorithms and data management decisions that are obscured to the end user of the information.

What you should take away from this

Trusting a computer on fairly simple sorts (like title or author or date) is fine – but if the computer is suggesting related items, and you care about getting a wider range of options, or you are concerned about implicit bias in how a system designed by unknown people might work, that’s a good time to do some more digging, or to try a variety of searches with specific parameters so that you get a sense of what is there, what’s recommended, and maybe what isn’t.

Simply knowing more about algorithms will also give you a lot more choices and awareness.

How catalogues work: editorial influence

There are several places in a catalogue where there’s a degree of what might best be called editorial influence. More bluntly put, it’s people (to some degree) making decisions about these things, and those decisions come with biases, both good and bad.

We also use algorithms and those algorithms have biases, and that’s a different topic (and one for next week.)

Words mean things

Those words we use as a controlled vocabulary come from somewhere. Humans came up with them – humans with all their virtues and all their biases.

Sometimes, terms were recommended by experts in the field, or people who knew a topic intimately. (Those aren’t always the same thing!) Both these perspectives bring history and assumptions with them that may or may not fit in with the larger collection or way terms are arranged.

Sometimes those terms were the current thing at a particular time, but we have come to a new understanding (this is true for a lot of terms about gender identity and sexual orientation, and also for terms around neurodiversity, and around topics like disability.)

Sometimes topics are entirely new – as technology changes, we need to come up with words to help us find things about it. Do we catalogue it by the current tech device, or do we use a more general term, because the iPhone X of today is going to be barely in service in five years, and mostly forgotten in fifteen?

Sometimes we have to pick one – like my example in earlier posts, you sometimes need to pick an option so that you have one main subject heading, rather than making people search through

  • Cat
  • Cats
  • Felines
  • House cats
  • Kitties
  • Pussy cats
  • Fuzzballs who take over the bed

(Ok, that last one isn’t very likely.)

Some of these terms are more clinical than others. Some are questions of ‘do we make a standard of singular or plural for groups of things’? Some are ‘do we include a common nickname or slang term’. Some terms might be more historically dated than others.

Why does this matter, anyway?

This might not seem like a big deal with cats – but it can be a bigger deal if you’re talking about health information, or topics where there’s often a difference between experience of a thing and professional knowledge and training about a thing.

(Dealing with the legal system as a person dealing with a crime versus lawyers and judges. Dealing with a health issue as a person experiencing a problem versus being a doctor or nurse or health care professional.)

Sometimes terms can bias our assumptions about results. I mentioned the issue with the Library of Congress wanting to drop ‘illegal alien’ and use other terms, and being blocked by Congress (because of the role the Library of Congress plays with the actual work of Congress and the need to reflect the terms used in the laws.)

Individual library systems may decide to change their terms for these kinds of topics, to create a more welcoming and diverse environment in various ways and to reflect the needs of their particular communities.

That part, of course, is where it can get complicated. Libraries are aware that they’re serving the people who come into their building (many of whom do so fairly anonymously: librarians don’t know what you look at on the shelf, and many libraries deliberately do as little tracking of activities, loans, and other user-specific details as they can get away with, to preserve patron privacy.)

But libraries also serve people who never come into them. That includes the people who use online resources (libraries can see what’s getting used), but libraries should also be thinking about all the people who don’t use their services but could.

This is most easily illustrated by public libraries, since they serve a particular location. A library might notice that they’re seeing some types of people use the library regularly, and may be able to tell from demographic information about their area that they’re not seeing some groups as often as they should be.

Sometimes that’s about the words we choose. Whether people can see themselves reflected in the library and the catalogue and the displays.

Who decides subject headings for a work?

There is also a degree of editorial influence on who sets the subject headings.

Large publishers often suggest them – you may see this in the front of the book, on the copyright page. Below the legal information, there will be some suggested subject headings and call numbers. Libraries don’t have to listen to that, but in practice they often do unless there’s a specific reason to overrule them.

In other cases, it may be a central cataloger (in a large library system) or an individual librarian. It’s hard to tell!

Generally, no one in this process (except maybe someone on the publisher’s end) has read the whole book, and the subject headings will reflect the large topics in the book, not specific ones.

People will also pick how specific the subject headings are. For example, do you pick United States – History or Massachusetts – History? Or maybe Women – United States – History – 20th Century. (Here’s a page explaining some of the options from New York University.)

Next time, a brief look at algorithms and how they affect searches. (It’s a huge topic)

How catalogues work: figuring out search terms

One key step in using catalogues is figuring out search terms.

Catalogues: Wooden chest of old-fashioned catalogue cards

What kinds of searches can you do?

In most electronic catalogues you can search by all sorts of things.

Many libraries have gone to the single search box (popularised by Google). Technically, this is called a keyword search, and it usually searches all the text in the record.

Pro: You don’t need to guess which field a given thing might be in, and searching on things that aren’t subject headings but show up in the title or blurb will still come up.

Con: You can get a lot of false results that don’t actually have what you want, especially if you’re searching for commonly used words.

If you end up with all sorts of results that don’t help you, two things can help. First, there’s probably an option somewhere on that first search screen that says something like ‘advanced search’. Second, once you do a search, you may be presented with some options to help you filter the results.

Advanced search

Depending on the catalog, you will usually see a variety of options that let you limit your search in different ways. Common ones include:

  • Searching just the author, subject, or title fields.
  • Searching a range of years.
  • Limiting the results to a particular format, location (for systems with multiple locations), or sometimes specific collections (like juvenile books), or languages.

You may need to do a little digging in the help information (likely also linked from the search form) to understand your options in detail.
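The difference between a keyword search and a search limited to one field can be sketched in a few lines of Python. The records and function names here are invented for illustration, and a real catalogue does a great deal more than simple substring matching.

```python
# Two made-up records; only the second one is actually about dogs.
records = [
    {"title": "A Cat Called Dog", "author": "A. Example", "subject": "cats"},
    {"title": "Training Your Dog", "author": "B. Example", "subject": "dogs"},
]

def keyword_search(term):
    """Match the term anywhere in the record, whatever field it is in."""
    term = term.lower()
    return [r["title"] for r in records if any(term in v.lower() for v in r.values())]

def field_search(field, term):
    """Match the term only within one named field."""
    term = term.lower()
    return [r["title"] for r in records if term in r[field].lower()]

anywhere = keyword_search("dog")
subject_only = field_search("subject", "dog")
```

The keyword search returns both books, including the one about a cat, because "dog" appears in its title. Limiting the search to the subject field drops that false result.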

Limiting results

It’s sometimes (okay, often) a lot easier to start with a keyword search and then limit your results in different ways.

In my library’s catalog, I can limit by the following, to give you an example:

  • Location (so I can find books in my local library)
  • Availability (books I can get right now, either in a library or online)
  • Whether the search term is found in the title or subject
  • Format (book, ebook, audiobook, etc.)
  • What collections it is in (this distinguishes library and children or adult)
  • Places where the book is set

And then it shows me related searches, including established subject terms, and some additional suggestions.

Understanding subject headings

In practical terms, you are probably not going to do what librarians do to learn about subject headings.

(For the curious: most library schools require a class in cataloging that includes a lot of the specifics. Then you go out into the world and spend a lot of time staring at instructions and hoping you’re doing it right, punctuated by asking other people if you are.)

Individual libraries also have their own policies – the library I work at has set up a list of keywords instead of official subject headings, because a lot of our needs aren’t represented in them (or are using terms that aren’t a great fit for us – they’re dated, they draw from specialities that aren’t the terms the people who use us will use, or both!)

As a library catalog user, my best tip is for you to look for hints about what kinds of terms will work. Fortunately, these are pretty straightforward.

1) Try searches

One of the best tips for getting your bearings in a new catalogue (by which I mean one that’s new to you) is to try some searches for items you’re pretty sure are in there, and that are reasonably similar to other items you want to look for.

Ideally, these will be the same subject (generally speaking) as the items you want, but if you’re not sure about that, at least try for the same topic area – if you want to do searches about religious information, try other religious titles or topics. If you’re looking for history, try other historical things. And so on.

The goal here is to do a few searches and see what comes up and how the search terms work.

2) Linked subjects

In many library catalogs, you have the option to click on the subject headings to find other items with that subject heading. This can be tremendously helpful once you find one book that’s what you want. (Of course, it’s finding that first thing that can be tricky!)

You may want to add several books to a wish list or cart (whatever the catalog uses) or bookmark them before you go too far astray in your searches, so you can get back to your starting point again easily.

If you’re having trouble with searches, try simpler ones – for example, if you’re trying to search an entire title, try

3) Look for known books or topics that should be in the collection.

For example, for modern Pagan materials, I often suggest people try Scott Cunningham’s Wicca for the Solitary Practitioner, or Starhawk’s Spiral Dance. Both are commonly held by most moderate to large library systems, and they’ll give you a starting place for what terms are being used.

In my local library system, Cunningham’s book comes up with the subject headings “witchcraft”, “magic”, and “ritual”.

That’s a hint that I probably want to check ‘witchcraft’ as well as ‘Wicca’ as subject headings.

(This is because older books were cataloged before Wicca became an official Library of Congress subject heading around 2006 or 2007 – libraries don’t generally go back and recatalog subject headings unless there’s a very significant reason to, because it’s a big cost of staff time.

Something like ‘witchcraft’ and ‘Wicca’, where it can be tricky to figure out exactly which heading applies to some books, and where ‘witchcraft’ is still accurate, if a bit more general, is less likely to get edited than, say, subject headings a library is fixing or updating to reflect current understanding of gender identity or sexual orientation or legal issues.)

4) Check the ‘about’ for information or ask a librarian.

Still stuck? Check the library’s help information or ask a librarian for help – you can ask general questions, and they can help you navigate.

If you don’t want to go to (or can’t easily get to) the physical library, most libraries have an option for email or chat help these days, at least some of the time.

How catalogues work: Controlled vocabulary

Today’s discussion of catalogues is about how you find things by topic. I talked about some of this in my post from March about personal libraries, but I want to talk more here about how libraries select subject terms.

Catalogues: Wooden chest of old-fashioned catalogue cards

It’s mysterious

Let’s be honest. A lot of the process librarians use to select subject terms is pretty mysterious. That’s because we’re trying to label quite complex things in a very complex world, and we’re using a variety of tools to do it, because outside of very very small collections (relatively speaking – in practice, this is probably a couple of thousand books at the smallest), it’s too big for anyone to keep in their head.

On the good side, this means people have to write things down, which makes long-term consistency easier, and which can help us see patterns.

On the bad side, it means things can feel (and be) very rigid, or slow to change, or complicated to navigate. All of which can make things a lot less accessible or useful. And the speed of change often means terms don’t reflect current understanding of things like identity, culture, or communities.

So where do these terms come from?

Libraries usually pick a set of subject headings to use. The subject headings act as a controlled vocabulary (which basically means ‘we have a fixed set of terms we choose from’). Like I explained in the post last March, this is what helps us avoid using all of these terms for the same thing:

  • felines
  • cats
  • cat
  • domesticated cat

Sometimes we might want to make distinctions (domestic cats as compared to lions or tigers or snow leopards), but if we don’t, we want to pick one term and settle on it.
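In code terms, a controlled vocabulary is a lookup from every variant someone might type to the one term that was chosen. Here is a small Python sketch. Picking "cats" as the preferred term is just an assumption for the example, and the names `PREFERRED` and `preferred_term` are made up.

```python
# Every known variant points at the single heading we settled on.
# Choosing "cats" as that heading is an assumption for this example.
PREFERRED = {
    "cat": "cats",
    "cats": "cats",
    "felines": "cats",
    "domesticated cat": "cats",
}

def preferred_term(term):
    """Return the approved heading for a term, or None if it isn't in the vocabulary."""
    return PREFERRED.get(term.strip().lower())

headings = [preferred_term(t) for t in ["Felines", "cat", "Domesticated cat"]]
unknown = preferred_term("snow leopards")
```

All three variants collapse to one heading, so one search finds everything. A term that isn't in the list returns nothing, which is the signal that a person needs to decide where it belongs.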

Libraries use one of a couple of common lists for subject headings. The most common, probably, are the Library of Congress Subject Headings. These are very extensive (they take up about 20 volumes as print books on a shelf), but the fact that the Library of Congress deals with so many different topics means that it’s often quite slow to make adjustments.

For example, the addition of the word “Wicca” as a subject heading only took place in about 2004, and only after a petition from a librarian. (This is often the way changes get made: one or more librarians notice that a term needs adding or improving or changing, and they provide evidence.) The term ‘Wicca’ had been in broad general use since the 1950s and 60s, so that’s about 50 years.

This isn’t always simple – here’s a story of attempts in 2016 to get the terms ‘aliens’ and ‘illegal aliens’ changed, and how the support from librarians and library associations for a student-led project ran smack into issues of law.

(Why does Congress get a say in this, you might be wondering? The Library of Congress’s first job is to provide resources for Congress and members of Congress. Makes sense if you think about the name.)

One other important note is that many libraries don’t have the resources to go back and catalogue older items to the new subject headings – so you may see pointers from new terms to check older terms as well. (This depends a lot on the library and the priority of the topic.)

Who assigns the terms?

Good question. In many cases, the subject headings are primarily assigned by whoever at the Library of Congress handles that particular item. These are likely people who have some experience in the general field or area of the books, but you can usually assume they’re not experts or specialists in all the nuances of the field or topic.

(In other words, they’re not going to get really nuanced about choosing between, say, the terms ‘magic’ and ‘ritual’ in a Pagan setting. They may assign them both.)

Usually terms are based on the few most obvious and relevant topics. If something is mentioned for less than a chapter or two, it almost certainly won’t get a subject heading unless it’s something really unusual. For a full length nonfiction book, you can usually expect 3-5 subject headings.

You can also assume the person doing the cataloguing probably hasn’t read the book. Cataloguers don’t have time for that! They’re relying on the blurb on the back and things like skimming the table of contents. Publishers can also suggest subject headings or terms to include.

Some libraries do have their own cataloguers evaluate materials and add or edit terms. This is particularly true for things like local history or other items of particular local interest.

Or a school library might assign a heading for particular regular class assignments or projects, to make it easier to find those items. (There are other ways to group things, too.) Some libraries do a “Best resource” subject heading to make it easy to find the best resources in a topic. (Mine does this.)

Next week, more about working with search terms in practice.

How catalogues work: Search

Welcome to part 2 of my series on catalogues. In this part, I’m going to talk a bit about different kinds of searches you might want to be able to do.

Skills and tools : Glasses and pen resting on sheets of printed music

Keyword

The kind of search many of us are most familiar with today is a keyword search. You know, the kind where you get presented with a search box, and you type words in, and sometimes the thing you want comes out the other end?

Keyword searches basically search anywhere in the searchable text for a word or phrase (depending on how things are set up). This can be really helpful, or really horrible.

Helpful

Keyword searches can be great because you don’t have to remember what kind of information the thing you’re searching for is. And you don’t have to figure out how it might be organised in the thing you’re searching. The word matches or it doesn’t.

They’re also fantastic for something we’re doing at work – instead of having hundreds of subject headings that get used for just one thing (a person, a device or software program, a tool), we’re making sure those are in the abstract, and then assigning more general subject headings.

That way, people can both look at groups of things (handy in a rapidly changing setting like technology) and also find specific tools or people if they need to.

Horrible

The downside to keyword searching is that if the word is only in one place in the record, and there’s a typo or something else that affects how the word is entered, you won’t ever find that record.

That’s also true if someone uses a similar word to the one you’re searching on – but not the exact one. (Remember what I said in part one, about libraries not having cutting-edge computing power?)

For example, in some systems, “cat” and “cats” may be treated as different words, and typos or alternate spellings definitely will be. There’s a word that is all over our work catalog, but it is sometimes spelled with a hyphen and sometimes no hyphen (the two parts of the word together) and our catalogue searches these as different things.

Depending on the system, the catalogue may adapt some things for you, but it probably won’t be as wide-ranging as your favourite search engines.
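Here is a small Python sketch of the difference between a strict match and a slightly more forgiving one. The hyphenated word ("e-mail") is a stand-in I picked for the example, and the plural handling is deliberately crude. Real search engines use much more careful techniques.

```python
def exact_match(term, text):
    """Strict matching: the term must equal one of the words exactly."""
    return term.lower() in text.lower().split()

def normalise(word):
    """Lower-case, drop hyphens, and strip a trailing 's' (a very crude plural rule)."""
    word = word.lower().replace("-", "")
    if word.endswith("s") and len(word) > 3:
        word = word[:-1]
    return word

def forgiving_match(term, text):
    """Compare the normalised forms of the term and of each word in the text."""
    return normalise(term) in {normalise(w) for w in text.split()}

strict_plural = exact_match("cat", "All about cats")
forgiving_plural = forgiving_match("cat", "All about cats")
strict_hyphen = exact_match("email", "Setting up e-mail")
forgiving_hyphen = forgiving_match("email", "Setting up e-mail")
```

The strict version misses both records, and the forgiving version finds both. The cost is that the crude rule will also mangle words that merely end in "s", which is part of why this is harder than it looks.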

Somewhere in between

One of the challenges of keyword searching is that you need to have terms that are unique enough to make a search find what you want – and sometimes that’s going to be really challenging.

I was part of a long-term Harry Potter project – it ran for 7 years, and over those years, we averaged 100 emails many days. As you can imagine, a lot of them had very similar terms and names in them, so we had to learn to figure out other ways to search email to find specific details (and since it was such a long project, sometimes those details were a year or two back, and relied on someone’s memory of what term we were using.)

This eventually drove me to create a wiki for the project that ended up with 9000+ pages, but that’s a whole other story…. (And set of posts.)

This is where learning to think about your keyword searches in more complex ways (such as using multiple terms, using boolean searches, or using ways to filter or limit the results) can be a big help.

Boolean?

You may remember hearing about this in library classes back in your education somewhere. Boolean is the term for doing searches that are joined by AND, OR, or NOT.

(You don’t usually actually have to capitalise the terms, and some systems may use symbols instead of words, but people often do when explaining them because it makes it a lot easier to figure out what’s going on. On some systems, you can select them from drop-down menus.)

AND means you want results that match all the items you list with AND. For example, “cats AND dogs” will return only those items that talk about both cats and dogs.

OR means you want anything that mentions either of them (or any of them, if you have more than two terms.) In this case, “cats OR dogs” would return any page that has “cats” on it, any page that has “dogs”, and also any pages that have both terms.

NOT means that it won’t return pages that have the term indicated by the NOT. So, “cats NOT dogs” would give you all the pages about cats, but not any that mention dogs. This one can be tricky because it would also leave out things like “Cats are not like dogs at all!” or “This is the page for people who love cats, no dogs here.”

In many search tools for catalogues, you can do different combinations – for example, you could say you wanted to search all items mentioning cats or dogs (keyword search), and then say you didn’t want a particular format (NOT book, in the format search).
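If it helps to see it, AND, OR, and NOT map neatly onto set operations. This Python sketch uses four made-up pages and a made-up helper called `pages_mentioning`. It is an illustration of the idea, not how any specific catalogue implements it.

```python
# Four made-up pages, numbered so we can talk about them.
pages = {
    1: "Caring for cats",
    2: "Cats and dogs living together",
    3: "Walking dogs in winter",
    4: "Cats are not like dogs at all!",
}

def pages_mentioning(word):
    """Return the set of page numbers whose text contains the word."""
    return {number for number, text in pages.items() if word in text.lower()}

cats = pages_mentioning("cats")
dogs = pages_mentioning("dogs")

cats_and_dogs = cats & dogs  # AND: pages that mention both
cats_or_dogs = cats | dogs   # OR: pages that mention either one
cats_not_dogs = cats - dogs  # NOT: pages that mention cats but never dogs
```

Notice that page 4 drops out of the NOT search even though it is mostly about cats, simply because the word "dogs" appears in it. That is the trap described above.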

Other tips

In some search tools, you can also do more complex searches. Usually, you can find out about the options by looking for the search help information, or sometimes an advanced search tool.

Some common options include:

1) Searching on a phrase.

Usually, this is done by putting quotes around the phrase. “Sun and moon”, for example. Normally this will search for the exact words in that order.

2) Limiting results in different ways.

These can include by date (usually you can specify a range, with some common ones being pre-set, like ‘last month’ or ‘last year’.) It can include things like ‘this email has an attachment’. It can include multiple search fields.

A lot of this depends on context and your particular technology.

3) Type of resource

In some tools, you can search by different types of resource (for example, on Google, you can search on a word or phrase, and then also look for images, news stories, videos, etc., each of which has some additional tools).

Next up, talking about controlled vocabulary and why it is both handy and complicated.

4) Not finding expected results?

Sometimes you’ll do a search, and you’ll get very different results than you expected – you know there are things about that in the thing you’re searching. If that’s the case, try a simpler search (just the title or just a phrase from the subtitle, for example). Sometimes a symbol will do something you didn’t expect (in our catalogue at work, a colon will tell the computer to do a ‘from X to Y’ search, so you get really weird results when you put a colon between a title and subtitle if you don’t put the whole thing in quotation marks.)

Usually the help information or the library staff can help you sort this out.

How catalogues work: An introduction

We’re working on a major catalogue update at work, which has me thinking a lot about how people use catalogues, databases, and other collections of information.

In talking about our new catalogue, I’ve also been reminded that most people don’t know how these things work, or what might be useful to them – so it seems like a great time for a short series of posts about that.

Catalogues: Wooden chest of old-fashioned catalogue cards

The basics

So, the first thing we should start with is what’s a catalogue?

For libraries, a catalogue is a highly specialised database that holds information about books in the collection. Often this is part of an Integrated Library System, or ILS, that tracks a whole bunch of things. Sometimes the catalogue only does pieces of it.

Common things included:

  • Information about works in the collection (such as title, author, publisher, publication information, call number, subject headings). This is sometimes referred to as the bibliographic record.
  • Information about particular items in the collection, i.e. each actual thing that’s on the shelf (or however it’s stored or accessed). This is sometimes called a ‘holding’ record (because it describes the holdings of the library).
  • Loan information about specific items in the collection and who has them.
  • Information about electronic resources (sometimes this might be a link to them, sometimes systems pull in all the things in a database so you can search for them all in one place.)
  • Additional resources the library has chosen to add (documents, files, etc.)

These records may have public notes (things to help library users) or staff-only notes (to help staff manage resources and answer questions.)

Again, not every library will have all these things.
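One way to picture the split between a bibliographic record and its holdings is as two linked kinds of record. This Python sketch is only an illustration: the class names, field names, and sample values are all made up, and it has nothing to do with the actual MARC format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Holding:
    """One physical copy on the shelf (hypothetical fields)."""
    barcode: str
    location: str
    on_loan_to: Optional[str] = None  # loan information, if the copy is checked out

@dataclass
class BibliographicRecord:
    """The work itself, described once however many copies exist."""
    title: str
    author: str
    subject_headings: List[str]
    holdings: List[Holding] = field(default_factory=list)

    def available_copies(self):
        """Copies that nobody currently has on loan."""
        return [h for h in self.holdings if h.on_loan_to is None]

book = BibliographicRecord(
    title="An Example Book",
    author="A. Example",
    subject_headings=["Cats"],
    holdings=[
        Holding(barcode="0001", location="Main", on_loan_to="patron-42"),
        Holding(barcode="0002", location="Branch"),
    ],
)
available = [h.barcode for h in book.available_copies()]
```

The title and subject headings are written down once, and each copy only carries what is specific to that copy.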

Our collection at work has bibliographic records, but doesn’t have separate holdings records (all the information about all our copies is in a single record: this is sometimes a bit clunky, but it works okay for us because we don’t check a lot of items out.)

Likewise, we don’t have a separate circulation (or loan) module – all the loan information is in the record. Library users can’t see it, because it doesn’t display to them, just to the tools staff uses.

(In some libraries, this would be a problem, but in our library, there’s just one and a half staff members, and we both need to have access to it. The library assistant usually deals with loans and circulation, but if she’s on vacation or something comes up, I need to be able to see what’s going on and make changes too.)

Metadata

When you put information into a catalogue, you are collecting metadata – that’s the term for ‘information about a thing’.

My favourite explanation of metadata comes from a Scientific American piece from 2012 that used Santa Claus and Christmas lists as examples. Go read it, if you’re not sure how metadata works, I’ll wait for you.

So, metadata about books includes the title, author, publication information. It might also include things like if a book is considered a particularly good resource, or is on a recommendation list. It might include if it was donated (and if so, by who). All kinds of things can be metadata.

Libraries have some commonly used systems for formatting it. A lot of libraries still use MARC (which stands for Machine Readable Cataloging record). Here’s a longish explanation from the Library of Congress about the details. This provides the structure for the data.

Besides the structure, there needs to be consistency in how you write things down. For a long time, libraries used the Anglo-American Cataloging Rules (AACR or AACR2 for the 2nd edition, etc.) but now a lot of libraries use RDA or Resource Description and Access.

Why all the rules?

Computers are still fairly stupid – they’re really quick at matching up things we tell them with things that they have stored, but they need a lot of help to match up things like typos or alternate ways of phrasing things.

(Google, Amazon, and the other tech giant companies have huge amounts of resources and lots of cutting edge design capabilities to make that work. Your average library just doesn’t. Your average library is probably pretty excited if their staff computers are less than four years old.)

So, in order for the computer to match things up, the library needs to be using consistent words (what’s called a “Controlled vocabulary” for things like subject headings and formats) and an underlying structure.

What does this mean in practice?

A lot of what I’m doing in our new catalogue right now is setting up that structure and arranging the different screens so they do what we want.

For example, we have a lot of options on the screen to add things to the catalogue, but when we edit things, we’re usually only editing a couple of specific pieces. So I set it up so those are at the top of the screen, and then we can get to everything else if we need to, but don’t have to scroll to get to it.

(You have no idea how exciting this is, when you’ve been spending years having to scroll down a very similar-appearing form to look for one specific field.)

But another big part of what I’m working on is fixing things so we’re using a smaller list of terms for things like format and location. That means people will be able to filter usefully by them, which will be amazing.

(This is going to take months and months. Fixing the formats and locations is pretty quick, but we have 14,000 subject headings, and a lot of them are tiny variants or typos of the ones we actually want.)
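For a sense of how a computer could help with that kind of clean-up, here is a minimal sketch using `difflib` from Python's standard library. To be clear, this is not the tool our catalogue uses. The headings are invented, the `suggest_fixes` function is made up, and the 0.85 cutoff is an arbitrary choice for the example. A person would still need to review every suggestion.

```python
from difflib import get_close_matches

# Made-up lists: the terms we want, and what is actually in the records.
approved = ["Cataloguing", "Metadata", "Digital preservation"]
in_use = ["Cataloging", "Metadata", "Meta data", "Digital preservaton", "Oral history"]

def suggest_fixes(headings, approved_terms, cutoff=0.85):
    """Map each unapproved heading to its closest approved term, or None if nothing is close."""
    suggestions = {}
    for heading in headings:
        if heading in approved_terms:
            continue  # already one of the terms we want
        match = get_close_matches(heading, approved_terms, n=1, cutoff=cutoff)
        suggestions[heading] = match[0] if match else None
    return suggestions

fixes = suggest_fixes(in_use, approved)
```

Near-misses like a dropped letter or a stray space get paired with the heading they were probably meant to be. Something like "Oral history" gets no suggestion at all, because it is a different topic and not a typo.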