New research project

So, I’d planned for this post to be an introduction to Zotero, as an example of how to use a citation management tool.

But then I was talking to a visiting friend about a project I’ve been kicking around for a while, and got some great ideas on how to approach it. So instead, we’re going to have a post explaining the project, and then I’ll be using it as part of an example of setting up Zotero, since it’s a great fit for that.

Tea ball with a mix of herbs and dried flowers, cracked slightly open.

What’s the project?

If you’re familiar with the Pagan or magical communities, you’ve probably come across the term ‘correspondences’ before. Basically, it’s a concept that relates to certain theories about how magic works (or might work): that different plants, animals, stones, foods, and other materials have properties that are associated with particular energies, deities, or other non-physical beings.

The idea is that you can use these items in your magic, to help you with a specific goal. You might use a particular herb for love, a given stone for clarity of communication, a specific colour for getting other people to take you seriously. (Some of these are magic. Some of them are psychology. Often the line between the two is surprisingly narrow.)

Where do correspondences come from?

Here’s the thing. A lot of times, people will quote things in books about what is associated with what – and there’s no source and no explanation.

This drives me up a wall. I have absolutely no objection to someone saying “This is my intuition and experience, it’s not based on anything else.” I am all for people doing what works for them. But that’s different than saying “Traditionally, this is used for X.” Who says? Where did they say it? Was it just them, or is that a pretty common thing, not all based on the same source?

This is a huge problem – and a huge project. The possible sources for this kind of information, even if we’re just focusing on Western Europe, and in about the last millennium, span at least a dozen languages, many different places, and a huge range of possible names for things. (Since the nomenclature we use in modern science is pretty modern, all things considered.)

I’ve been thinking about the problem for a while, and I want to try tackling it. Whatever I do won’t be comprehensive – there’s no way I can promise that. I don’t read enough languages, I have other things in my life besides this project, and I want to try and keep both my research and the actual output somewhat manageable.

The suggestion I got

My friend suggested it’d be a great project for a Patreon. I haven’t set one up yet! I want to work on a couple of example essays, first, so I can get a sense of how many I am likely to produce in a given month and how to structure the articles usefully. It is a project I want to tackle for my own reasons, and sharing it with others would be wonderful.

How might this work?

As you might remember, one of my pieces of advice for any significant research is figuring out how you know when you’ve found what you’re looking for or gotten as much as you can for your goal. Here’s the time to do that.

What do I need for this project?

I’d like to create a summary article for a lot of different items (herbs, stones, plus probably some on different animals, colours, and other things) that talks about the basic physical reality of the thing (where does it come from, does it have any medicinal properties or is it used for art or food?) and then that discusses some of the major stories and associations, plus where they may come from.

Realistically speaking, I’m quite sure this will be relatively easy for some items, and terribly difficult for others. I may start on one and get stuck pretty fast! Or I may find pieces that I can’t sort out, given my available time and resources. I expect I’ll be able to track down sources for some of the stories and correspondences, and not for others.

I also expect there will be revisions over time – because as I look for other items, I’ll come across different things, or stories, or places where someone translated the name of the plant a different way, and a connection becomes apparent. Or perhaps I’ll be able to make special trips to look at rare manuscripts (I do live in the Boston area, and we have lots of libraries with great rare book collections!) Or I’ll find more books and articles that give me more details.

How big a project is this?

Well, it’s huge. But there are two ways to limit it. One is about how I go about the project, and the other is by scope of what I’m going to research.

It would be tempting to go about this project by finding a given source that has information – the writings of Pliny the Elder, or various naturalists who collected folklore about the plants they described, or early grimoires – and index what they have to say. While that might be useful, it is also exactly the kind of project that can bog down very easily, or feel overwhelming very quickly. Also, it’s a kind of research I can do, but often don’t find particularly fun.

So let’s go with the other approach, which is to start with a particular item – a given herb or stone or other thing that has correspondences – and look at a reasonable selection of material that might have things about it, do a range of searches on the open web and in some databases like JSTOR, and rummage for what I can find, tracing sources back where I can to something that is close to the starting point. Pull them together, when I’ve got a satisfying amount, and then move on to the next one.

The other way to limit the scope is to look at what items I’m taking on. Realistically, my language skills are a much better fit for things derived from Western Europe than other places. (I read and write modern English, but can also take a reasonable stab at Middle English, French, Attic Greek, and a bit of German and Latin. Certainly enough to poke at things with translation tools and get a sense.) I have a lot less of a chance for things that started out in Arabic, or in Chinese or Japanese, or dozens of other languages.

The other part of this, of course, is in what I use. I live in a global community, and have access to herbs and spices from around the globe. But at the same time, my actual magical practice is rooted partly in Western Europe, and partly in where I live (New England), and my research is going to focus first on the things used in those places. Again, I may very well expand in some cases, but I know there are huge swaths of magical and ritual practice that use things I’ve never explored. I’d rather leave research on them to people who have those skills and experiences.

What about the practical aspects of this?

My idea is that I’d post one or two articles every month (probably: see above about testing how long it takes me to produce something useful first.) I suspect I’ll focus heavily on stones and herbs, but I may include other kinds of things from time to time (colours, animals, other kinds of plants, foods).

I want to produce something that is useful to me and to my covenmates. But I also want to produce something that’s useful to other people, because if I’m going to do the work, it makes sense to share it.

My current plan is to make Patreon posts, and periodically (every six months or year, depending on how many I’m turning out and how long they are) collect the articles into something available in other formats. Some of the research will involve getting more books: the Patreon money will go toward buying those books, and also (if there’s money for this) throwing money at some things that save me time so I can spend more of it on the project.

What does this mean for you?

If you’re interested in this, I will put the most frequent updates in my newsletter – sign up, and get a complete list of what I’ve written recently, plus other links I’ve found intriguing in the past fortnight. But I’ll post an announcement here and a few other places when I get things up and running.

If you have thoughts or ideas (or suggestions for a particular set of correspondences for me to tackle first), I’d love to hear about that, too. The easiest way is through my contact form.

Research tools: how to choose

Last week, I talked about different kinds of tools you might use for research. Today, I’m going to talk about how to choose tools.

Research tools: an astronomical device opens up like a pocket watch with many tools

Where will you use it?

This is one I think about a lot.

Some people do all of their research work or personal computer work on a single machine. I am not one of those people.

I work on a Mac at home (I’ve been a Mac user since … well, before there were Macs, technically, I started on an Apple IIC.) I have specific software I use on my home machine – Scrivener for long-form writing. I use Aeon Timeline mostly for fiction projects, but I know people use it for historical research as well. (It is an excellent and detailed timeline application.)

But some things I need access to on multiple machines. I want to be able to pop a note into my to-do list if I think of it at work, so I can follow up later. I want to be able to dip into my personal email (that’s fine at my workplace, within reason).

My work machine is a Windows machine.

I could bring a personal device with me. In my case, that’s an iPad with a Bluetooth keyboard: it has the iOS app version of Scrivener on it among other things. But the wifi is unreliable in my office, so it’s often not usable. Even for writing on breaks, that’s not ideal, because I need to be able to sync files, and keeping up with the syncing at home would take a fair bit of time and attention. I may want to check something that’s sitting in my email, and not scroll through a lot of text due to the small screen size of the phone, or have easier access to search tools.

So for me, I want tools that have at least some option for web access, even if I also use a specific application on my computer or mobile device most of the time.

You may well make different choices, if you only need to access your work in one location, or you have (and regularly carry) a laptop. Or if you’re regularly doing work in places without reliable internet.

Make the choices that work for you, and reevaluate as you change jobs or technology, to make sure they’re choices that still work for you.

Backing up

One of the very first questions you should ask yourself about a tool is how the information is backed up, and how you can get information out of it in a format you can refer to or (ideally) transfer to a different tool. You don’t want your critical material to be held hostage by a company going out of business, or lose material because you have a computer failure.

The actual issues are slightly different – you may have no warning of a computer failure (or someone stealing a laptop, or any of a number of other things): your ideal is continuous backups. That means a copy on your computer itself, a copy on a separate medium (external hard drive, USB drive, etc.) and probably a copy in the cloud (among other reasons, this means that if something happens to your physical location – fire, flood, tornado – you have a copy somewhere else.)

If you don’t want to trust the cloud for some reason, is there a friend who lives in a different region of the country who you can mail a copy periodically? (Cheap USB drive, files burned to CD-ROM, etc.)

How specific and exacting you are about your backup plans will probably depend a bit on your technology setup, a bit on how critical the files are, and a bit on how good you are about manual process things like sticking something in the mail.

Me, I have a copy on my computer, the critical files sync to Dropbox, and I periodically pull copies onto a separate drive (I usually leave mine at work, for a backup in a sufficiently different physical location.)

If you are working on something that you absolutely can’t recreate in a timely manner (like a dissertation or all of your research notes for multiple years) you want to be more attentive to your backups than writing you do solely for fun or emails to a friend. (Those are great to back up, and they can hurt a lot to lose, but they probably won’t derail a significant part of your life for months or years if you lose them.)

How do you get information out of it?

The other side of this question is making sure you can keep control over your information and research, no matter what happens. So long as you’re regularly using software or a tool, you should have at least a bit of warning before a site or tool disappears (though sometimes it can happen nearly overnight!) It’s good to get in the habit of pulling an export regularly.

There are a couple of different considerations with exports.

It’s often easiest to pull a copy that has all your information, but not in a format you can stick into another program easily. For example, it may be easiest to pull a copy of your material as a PDF, but you’d need to do some wrangling (possibly with some specific software) to get the text out easily. VoodooPad, an application I use for keeping personal wiki-type information (where I can link to other pages in the document) will let me export in a number of formats, but I may lose formatting and some connections between files.

Knowing what your options are in advance, and picking the best options for your current needs is usually a good way to go.

What format should you save things into?

Good question. The formats that will absolutely save the core of your material (but may lose formatting, connections between files, or ‘about this work’ type information) are plain text and csv files for spreadsheets. (CSV stands for ‘comma separated values’ which means that each column is separated by commas. You can often set a different character, if your actual data may have commas, and then tell the program you load it into what you picked.)
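If you’re comfortable with a little scripting, here’s a minimal sketch of the ‘pick a different character’ idea, using Python’s built-in csv module. The sample rows are made up for illustration, and a tab is just one possible choice of separator.

```python
import csv
import io

# Made-up sample data: the title contains a comma, which is the kind of
# thing that makes a different separator handy.
rows = [
    ["title", "author", "year"],
    ["Herbs, Stones, and Stories", "A. Writer", "2018"],
]

# Export: write the rows using a tab as the separator instead of a comma.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t").writerows(rows)
exported = buffer.getvalue()

# Import: tell the reader which separator was picked when saving.
loaded = list(csv.reader(io.StringIO(exported), delimiter="\t"))

print(loaded[1][0])
```

The important part is that the same separator is named on both sides: once when saving, and once when loading.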

A slightly more complex option for text is RTF or rich text format. This will save much of the formatting for you, but it may add glitches or not include some specialised formatting.

Saving files in widely used formats – such as Microsoft’s .docx or .xls formats – will often work too, but again may add some additional material or leave some things out. (Microsoft formats are sort of notorious for bloating files with a lot of additional formatting data that can cause problems on import.)

Sometimes you may have the option to export as an HTML or XML format – usually this is an option for linked pages, like a website or wiki. These formats should preserve the links between pages, and you can access them by opening the file on your computer as if it’s in a web browser. (And from there you can save the material into other formats if you need.)

Thinking about how you might want to use the information if you need to resurrect it is usually a good indicator for your best format.

Research tools: what I use

Time for a new series – this one on keeping track of reference materials. In this post, I’m going to talk about a couple of different aspects. Then, in future posts, I’ll be looking at some specific tools to keep track of references, like Zotero (one of the citation management programs.)

Research tools: an astronomical device opens up like a pocket watch with many tools

Why have a system to keep track of things?

If you’re only managing a few references, or a few sites (for values of ‘few’ that go up to about 50), you probably don’t need a big system – you can keep track of a few dozen things in a word processing or text file pretty easily.

But once you get over a few dozen, it gets harder to keep things organised. Our brains have a harder time processing a long list: it’s easier to miss something, or duplicate entries, or otherwise have housekeeping errors. Different people will have different length limits, but somewhere between 20 and 40, you’ll probably hit your personal ‘this is too long’ point.

The same thing goes if the items you’re keeping track of fit into multiple categories. It’s one thing to have a list of items you need to read – but what happens when you want to list things as “to read” and then by the type of content. How do you file things? Do you list it every possible place? That makes for a much longer list.

Either way, if you want to keep track of lots of things, you need a system.

What kinds of options are out there?

For many people, the system that works best will depend on what you’re trying to keep track of. You may need a different approach for websites than for print books, or a different way to handle ebooks.

I suggest that you think about the difference between what you own, and what you use as reference material. You may want to own a book (and keep track of the fact you own it), but not care about it as reference material. You might have a system for keeping track of books, and a different one for tracking reference material. Having multiple systems can be annoying (and potentially confusing) but not if you’re clear about why you’re using a specific tool.

Here’s what I use:

What books I have copies of: LibraryThing

I use LibraryThing to keep track of everything I own – print and ebooks. Items get entered in the catalog. I have a collection of print books (so I can just search things I have in print), or ebooks (just things that live on my phone.)

Everything also gets content-specific tags like genre, or when it’s set if it’s historical, or topic. I keep my tags edited, so that I can search them easily, and I refine them regularly so that I don’t have tags with only a couple of items unless it’s really necessary. (I am not a fan of lists that are overwhelming for me.)

You can add books with a simple form or by importing a spreadsheet if you’ve been using a different tool, or by scanning the back of books in many cases with a tool in the mobile app.

If they’re print books, they also get assigned a tag that indicates where they’re shelved (so I can find them again.) The shelving tags are really simple – I have the IKEA cube bookshelves, so I do A1, A2, A3, A4, B1, B2, B3, B4, etc. Each cube only has about 12-15 books, so it’s easy to spot once I’m looking in the right place.

The actual ebook files are managed through Calibre. This means I can search or tag and manage files much more easily, or save files to a different place if needed.

Costs: LibraryThing has a fee for over 200 books in an account. The fee is $10 for a year, or $25 for a lifetime account. (Obviously, one of these is a much better buy if you expect to keep using the site.) Calibre is free, but they appreciate donations!

Websites I want to share: Pinboard

Pinboard describes itself as “a bookmarking website for introverted people in a hurry.” (It’s also been described as anti-social bookmarking, in contrast to social bookmarking.)

I have a personal account, and one for coven links. My personal account is private and where I put things I want to find later, the coven site is public. You can set tags, group tags, and do some additional things.

I use Instapaper as an interim tool to keep track of things I want to save, read later, or think about reading.

Costs: There’s a yearly fee for new Pinboard accounts ($11 a year right now) and it’s well worth it if you want to share bookmarks, keep track of more items than your web browser’s bookmark tools will handle easily, or access bookmarks from multiple browsers or devices.

Instapaper is free, and there are other similar services (Pocket is the other big one).

References (books, websites, PDFs): Zotero

Zotero is one of a handful of widely used citation management programs, and the one I’d recommend for most people – it’s free, has an add-on for Chrome, and has other benefits. It will help you keep track of references, and you can produce a formatted bibliography with a few clicks (though you probably still need some human review. Citation styles are tricky!)

If you’re in academia, you may have access to other options through your school. Your library (or the library website) probably has more information. (There are certain advantages to using the same system other people in your institution are, and if you’re working in a research lab or closely with a professor or researcher, you may not have a lot of choice about which tool is used.)

Costs: Depends on the tool, but Zotero is free. If you want to store PDFs on their site, you’ll likely need to pay for additional storage if you have more than a few.

Notes and writing: Ulysses, 4theWords, Scrivener

My briefer writing lives in Ulysses, which is sometimes a little tricky, because Ulysses is a Mac-only app, and I can’t access it at work. You can have lots of folders, tag items, create smart folders, and much more. There’s even a publishing option for putting things into WordPress (and a few other tools).

I’m also using a site called 4thewords which is exploring a gamification approach to writing. You battle various monsters and win by writing a certain number of words in a span of time. As you do quests, you can earn items for your avatar or other game objects. As I wrote this sentence, I won a battle of 500 words.

It also keeps some stats I’m finding more useful than I expected about how long I was actively writing a given piece. (And I’ve found the battles a certain incentive for doing just another hundred or two hundred words, several times.)

Scrivener is where my long-form writing lives, and it is amazing for being able to move things around, save a piece you cut but want to keep just in case, and has a lot of tagging and drafting tools to help.

Costs: Ulysses is a subscription (it’s also currently part of the SetApp subscription option, if you’re interested in other apps they offer, which includes Aeon Timeline, a popular timeline app) and 4theWords has a month free trial and then is $4 a month.

There are obviously lots of free options in this space: I use Google Docs for sharing word processing with other people, and especially for editing that we’re looking at together. It’s also my go-to for things I may want to add to during lunch at work.

Come back next week.

Join me next week for part 2, things to think about when choosing tools.

If you use other tools you think I should look at, I’d love to hear from you – the contact form is probably the best way.

How catalogues work: algorithms

The last part of how catalogues work is looking at algorithms.

(I was not a computer science major. This is going to be the non-technical discussion.)

Also, the two links I mention here are from 2016, and technology has moved on a bit, as technology does, but these are good illustrations of my specific points.

Catalogues: Wooden chest of old-fashioned catalogue cards

What is an algorithm?

A good definition is that an algorithm is a step by step way of doing things. This video from the University of Washington notes that sorting your laundry is an algorithm (is it a white shirt? This pile, these things get washed together. Is it a red shirt? That goes in a different pile. Does it need special treatment? Follow these steps.) The video’s a great overview of the topic in a couple of minutes.

Computers are extremely fast at doing this kind of step, but how successful the algorithm is depends on what the people programming the algorithm have told it to do.
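For the curious, here’s roughly what the laundry example might look like written out as code. This is just a sketch in Python: the pile names, the order the questions get asked in, and the sample basket are all my own assumptions for illustration, not something from the video.

```python
def sort_laundry(items):
    """Put each item in a pile by asking the same questions in the same order."""
    piles = {"special": [], "whites": [], "reds": [], "other": []}
    for item in items:
        if item["special_care"]:
            # Anything needing special treatment is handled first.
            piles["special"].append(item["name"])
        elif item["colour"] == "white":
            piles["whites"].append(item["name"])
        elif item["colour"] == "red":
            piles["reds"].append(item["name"])
        else:
            # The rules never mentioned this colour, so it lands here.
            piles["other"].append(item["name"])
    return piles


# A made-up basket of laundry.
basket = [
    {"name": "white shirt", "colour": "white", "special_care": False},
    {"name": "red shirt", "colour": "red", "special_care": False},
    {"name": "silk scarf", "colour": "red", "special_care": True},
    {"name": "green socks", "colour": "green", "special_care": False},
]

piles = sort_laundry(basket)
print(piles)
```

Notice that the green socks end up in ‘other’ only because whoever wrote the rules didn’t think about green. The computer does exactly what it was told, and nothing more.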

An important digression

The fact that human beings design these lists is a particularly business-centered reason why diversity in technology (and in companies in general) is such a big deal – people who have different backgrounds, life experiences, or ways of looking at the world are going to think about different things in the design process. When that’s managed well, a diverse group will likely come up with algorithms and other programming that work much better for a wider range of people.

(An example here – though it is a legitimately sort of complicated record keeping – is that the Apple Health app didn’t include any menstrual cycle tracking for a long time, and it’s still much more rudimentary than some other apps. If your body does things outside of the expected timeframe, you have fewer options.)

What does this mean for catalogues?

Some of the things a catalogue uses an algorithm for are pretty straightforward. Sorting a list by the last name of the author, or the title of the work, or the year it was published is pretty simple, so long as the data is consistent.

What data might be inconsistent? An example would be if the date formats swap between United States standard dating (Month-Day-Year) and the Day-Month-Year common in parts of Europe, your results are going to be confusing. Good data is essential to sorting and organising your catalogue.
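Here’s a small sketch of the date problem in Python, using the standard datetime module. The sample dates are invented, and the clean-up step assumes we already know every date in the list was entered as Day-Month-Year, which is exactly the part a computer can’t work out for itself.

```python
from datetime import datetime

# The same text means two different days depending on which convention
# the person entering it had in mind.
raw = "03-04-2016"
as_us = datetime.strptime(raw, "%m-%d-%Y")  # read as Month-Day-Year
as_eu = datetime.strptime(raw, "%d-%m-%Y")  # read as Day-Month-Year

# Invented sample data, ASSUMED to all be Day-Month-Year.
mixed = ["03-04-2016", "12-01-2015", "01-02-2017"]

# Rewriting each date as Year-Month-Day means that a plain alphabetical
# sort also puts them in date order.
cleaned = sorted(
    datetime.strptime(d, "%d-%m-%Y").strftime("%Y-%m-%d") for d in mixed
)

print(as_us.date(), as_eu.date())
print(cleaned)
```

Once everything is in one consistent format, sorting is the easy part. Deciding which format each original entry was meant to be in is the slow, human part.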

(This is why I am spending my summer cleaning up a lot of data in our catalogue at work. This week, this has meant hours of moving identifying file numbers from the format area, where they shouldn’t be anyway, to a different area, and making sure the correct format is actually entered.

We can automate some kinds of data changes, but this one requires moving data into a different field, and we don’t have an easy way to do that automatically.)

Where it gets complicated

However, it gets more complicated once we get into things where there is a bit more of a value judgement.

What kinds of images should we get if we search on “beautiful”?

One of the examples that has stuck with me the most was something Dr. Safiya Noble illustrated in the keynote she gave at the LibTech conference in 2016. (LibTech is my favourite library conference for a reason.) It’s worth noting this was given in March of 2016, and she talks about the manipulation of algorithms and the effect on elections….

In her keynote, she did an illustration where she did a search on “Beautiful” and at that time the algorithm turned up a lot of landscapes (that were really gorgeous). But if you searched on ‘beautiful woman’, you turned up white women (and white women of a particular kind when it came to facial structure, hair, body size, and a bunch of other characteristics). That’s what happens when human programming goes awry, or is not sufficiently questioned.

And if you tried searches like “black girls”, you got a whole different set of results, and much more mixed ones in terms of positive and negative.

So, when your library catalogue tells you you can sort by ‘relevance’ or gives you options for ‘similar topics’, there are probably a lot of different things at play. Usually, there are software decisions in there somewhere. Some of these may be accessible to the library staff, others may be decided by the software programmers, and the librarians may have no idea how it works.

(In our new catalogue, we can choose which things to weight more – so for example, we could choose to weight phrases in the abstract (where we put a summary of the content) more than the title, or less than the title (depending on what we decided). We haven’t played around with this much yet, but it’s a way to help refine options for people.)
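To make the weighting idea a bit more concrete, here’s a toy sketch in Python. To be clear, this is not how our catalogue (or any real catalogue software I know of) calculates relevance: the scoring rule, the field names, the weights, and the two sample records are all invented for illustration.

```python
def relevance(record, term, weights):
    """Toy score: count how often the term appears in each field,
    multiplied by that field's weight."""
    term = term.lower()
    return sum(
        weight * record.get(field, "").lower().count(term)
        for field, weight in weights.items()
    )


def rank(records, term, weights):
    """Return record titles, highest score first."""
    ordered = sorted(
        records, key=lambda r: relevance(r, term, weights), reverse=True
    )
    return [r["title"] for r in ordered]


# Two invented catalogue records.
records = [
    {"title": "Cats in folklore",
     "abstract": "Stories about household animals."},
    {"title": "Household animals",
     "abstract": "Cats, cats, and more cats in stories and folklore."},
]

# The same search, with two different decisions about what matters more.
title_first = rank(records, "cats", {"title": 5, "abstract": 1})
abstract_first = rank(records, "cats", {"title": 1, "abstract": 3})

print(title_first)
print(abstract_first)
```

Same records, same search term, different order of results. The only thing that changed was a decision someone made about the weights.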

Even more complicated

Large companies – Google, Amazon, Facebook, any of the big ones – also look at your reactions to what you click on, where you spend time, what you click away from and when (and where you go to) – because it helps them create vast maps of data they can use. Sometimes this is really handy (like when Amazon’s list of also-boughts shows you a book you love and you had no idea it existed, or Spotify’s algorithm suggests music you really like.)

Sometimes it’s a lot creepier and more awful. There’s a famous story, when it comes to algorithms, of Target figuring out a teenager was pregnant based on other purchases before her father found out about it, based entirely on purchases that were not specifically intended for a baby, but rather things like body lotion, a larger purse, two common supplements, and a bright blue rug.

And of course, it gets even scarier if we start talking about government agencies making decisions about who can get visas, fly, or do many other kinds of things, based on algorithms and data management decisions that are obscured to the end user of the information.

What you should take away from this

Trusting a computer on fairly simple sorts (like title or author or date) is fine – but if the computer is suggesting related items, and you care about getting a wider range of options, or you are concerned about implicit bias in how a system designed by unknown people might work, that’s a good time to do some more digging, or to try a variety of searches with specific parameters so that you get a sense of what is there, what’s recommended, and maybe what isn’t.

Simply knowing more about algorithms will also give you a lot more choices and awareness.

How catalogues work: editorial influence

There are several places in a catalogue where there’s a degree of what might best be called editorial influence. More bluntly put, it’s people (to some degree) making decisions about these things, and those decisions come with biases, both good and bad.

We also use algorithms and those algorithms have biases, and that’s a different topic (and one for next week.)

Words mean things

Those words we use as a controlled vocabulary come from somewhere. Humans came up with them – humans with all their virtues and all their biases.

Sometimes, terms were recommended by experts in the field, or people who knew a topic intimately. (Those aren’t always the same thing!) Both these perspectives bring history and assumptions with them that may or may not fit in with the larger collection or way terms are arranged.

Sometimes those terms were the current thing at a particular time, but we have come to new understanding (this is true for a lot of terms about gender identity and sexual orientation, and also for terms around neurodiversity, and around topics like disability.)

Sometimes topics are entirely new – as technology changes, we need to come up with words to help us find things about it. Do we catalogue it by the current tech device, or do we use a more general term, because the iPhone X of today is going to be barely in service in five years, and mostly forgotten in fifteen?

Sometimes we have to pick one – like my example in earlier posts, you sometimes need to pick an option so that you have one main subject heading, rather than making people search through:

  • Cat
  • Cats
  • Felines
  • House cats
  • Kitties
  • Pussy cats
  • Fuzzballs who take over the bed

(Ok, that last one isn’t very likely.)

Some of these terms are more clinical than others. Some are questions of ‘do we make a standard of singular or plural for groups of things’? Some are ‘do we include a common nickname or slang term’. Some terms might be more historically dated than others.

Why does this matter, anyway?

This might not seem like a big deal with cats – but it can be a bigger deal if you’re talking about health information, or topics where there’s often a difference between experience of a thing and professional knowledge and training about a thing.

(Dealing with the legal system as a person dealing with a crime versus lawyers and judges. Dealing with a health issue as a person experiencing a problem versus being a doctor or nurse or health care professional.)

Sometimes terms can bias our assumptions about results. I mentioned the issue with the Library of Congress wanting to drop ‘illegal alien’ and use other terms, and being blocked by Congress (because of the role the Library of Congress plays with the actual work of Congress and the need to reflect the terms used in the laws.)

Individual library systems may decide to change their terms for these kinds of topics, to create a more welcoming and diverse environment in various ways and to reflect the needs of their particular communities.

That part, of course, is where it can get complicated. Libraries are aware that they’re serving the people who come into their building (many of whom do so fairly anonymously: librarians don’t know what you look at on the shelf, and many libraries deliberately do as little tracking of activities, loans, and other user-specific details as they can get away with, to preserve patron privacy.)

But libraries also serve people who never come into them. Not just the people who use online resources (libraries can see what’s getting used), but libraries should also be thinking about all the people who don’t use their services but could.

This is most easily illustrated by public libraries, since they serve a particular location. A library might notice that they’re seeing some types of people use the library regularly, and may be able to tell from demographic information about their area that they’re not seeing some groups as often as they should be.

Sometimes that’s about the words we choose. Whether people can see themselves reflected in the library and the catalogue and the displays.

Who decides subject headings for a work?

There is also a degree of editorial influence on who sets the subject headings.

Large publishers often suggest them – you may see this in the front of the book, on the copyright page. Below the legal information, there will be some suggested subject headings and call numbers. Libraries don’t have to listen to that, but in practice they often do unless there’s a specific reason to overrule them.

In other cases, it may be a central cataloger (in a large library system) or an individual librarian. It’s hard to tell!

Generally, no one in this process (except maybe someone on the publisher’s end) has read the whole book, and the subject headings will reflect the large topics in the book, not specific ones.

People will also pick how specific the subject headings are. For example, do you pick United States – History or Massachusetts – History? Or maybe Women – United States – History – 20th Century. (Here’s a page explaining some of the options from New York University.)

Next time, a brief look at algorithms and how they affect searches. (It’s a huge topic.)