by Joe Miller,
Director of Development
Last week I got a little—okay, fine, a lot—nerdy about metadata.
Several people have reached out since then asking questions about where all that metadata comes from. Where are we getting all the metadata that goes into our catalog? Do we type it all in ourselves? Are we writing the book descriptions you see in our catalog or on Libby? Are they written by other librarians across the state?
When I was first learning to teach, an advisor warned that if a couple of people in a class had a question about something, the odds were that most everyone else did, too. So, I thought it might be worth saying a little more about how our library systems work.
I think it’s helpful to understand how computers use metadata. We’re going to do that by way of what philosophers call a thought experiment. Don’t worry. It’ll be fun.
Imagine that you’re inside a room whose only access to the outside is a mail slot. The walls are lined with instruction manuals.
Every once in a while, a piece of paper with a bunch of markings on it comes in through the mail slot. You look up those markings on the left-hand page of your instruction manuals. You then copy down the markings on the right-hand page and push the paper back through the mail slot.
Now as it turns out, the markings are actually Chinese characters and each paper coming through the mail slot asks a question. The markings you put back down form intelligible responses to those questions.
To a Chinese speaker outside the room, it would appear that there’s an ongoing conversation in Chinese. But you don’t actually speak Chinese!
In philosophy-speak, the conclusion of this thought experiment is that syntax is not semantics.
Syntax refers to the set of rules we use for constructing sentences. Semantics refers to the meaning of words inside the sentence.
The thought experiment shows that—given a sufficiently detailed set of rules—it’s possible to create meaningful sentences (syntax) without understanding the meaning of any of the words in the sentence (semantics).
Computers are stuck inside the Chinese room.
When you put a question into a search box or a ChatGPT query, the computer sees a bunch of markings. It can apply a set of rules (called algorithms) to manipulate markings into meaningful sentences. But it doesn’t understand the meaning of the words any more than someone inside the Chinese room understands Chinese.
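For the curious, here is a minimal sketch of the room written as a tiny Python program. The rule book entries and the function name are made up for illustration; the point is only that the program matches markings to markings and never needs to know what they mean.

```python
# Hypothetical rule book: markings on the left-hand page map to
# markings on the right-hand page. The entries are invented examples.
RULE_BOOK = {
    "你好吗？": "我很好。",
    "你叫什么名字？": "我叫小明。",
}


def answer(markings: str) -> str:
    """Copy out whatever the rule book lists for these markings.

    Nothing here translates or interprets the text. It is a pure
    lookup, and unknown markings simply get an empty reply.
    """
    return RULE_BOOK.get(markings, "")


# A slip of paper comes in through the mail slot; a reply goes back out.
print(answer("你好吗？"))
```

Real search engines and chatbots use far more elaborate rules than a two-entry table, but the sketch shows the basic idea of following rules without understanding.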
Metadata lets computer scientists write better rules for manipulating markings.
Underneath each webpage or mobile app screen is a set of instructions telling the computer what to do with all the words and images that make up the content you read.
At a very basic level, the instructions describe how things appear on the page or screen—this thing is a paragraph, that thing is a heading, etc. But we can also add metadata to those instructions, specifying things like this page is about a book; the title of the book is “Moby Dick”; the author of the book is “Herman Melville”.
Once you’ve got metadata encoded on the web, there are all sorts of cool things you can do.
One of those is making search engines that answer questions rather than simply providing links. If you go to Google right now and type in who wrote Moby Dick, “Herman Melville” will appear in large type at the top of your search results page.
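Here is a rough sketch of what that labeling can look like. It assumes the schema.org vocabulary, one widely used way of describing books on the web; this column doesn't say which format any particular site uses, and the function name is hypothetical.

```python
import json

# The same words, first as plain page content. A computer sees a heading
# and a paragraph, but nothing says which one is the author.
plain_html = "<h1>Moby Dick</h1><p>Herman Melville</p>"

# The same words again, this time with labels attached.
# Assumption: schema.org's "Book" type with its "name" and "author" fields.
book_metadata = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Moby Dick",
    "author": {"@type": "Person", "name": "Herman Melville"},
}


def who_wrote(metadata: dict) -> str:
    """Answer "who wrote this?" by reading the labeled author field."""
    return metadata["author"]["name"]


# What the labeled version looks like when it is embedded in a page.
print(json.dumps(book_metadata, indent=2))
print(who_wrote(book_metadata))
```

With the labels in place, answering the question is a matter of looking up the field called "author". Without them, the program would have to guess which line of the page is a name.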
More relevantly for our initial question, we can also use metadata to help computers share information. For example, when we add a new book to our local library catalog, we don’t usually type in all the metadata. If the book isn’t already in our library network’s systems, we search other catalogs until we find a record for the exact edition we have, then simply import the data.
That works because we’ve added metadata to tell the computer which sets of markings are the author, which are the title, and so forth. Without metadata, webpages are just meaningless markings to a computer.
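The import step can be sketched in a few lines. Everything here is hypothetical: the catalog names, the field names, the placeholder ISBN, and the choice to match editions by ISBN are assumptions made for illustration, not a description of our actual catalog software.

```python
from typing import Optional

# Hypothetical outside catalogs. Each maps an ISBN to a labeled record.
OTHER_CATALOGS = [
    {"name": "Catalog A", "records": {}},
    {
        "name": "Catalog B",
        "records": {
            # Placeholder ISBN, not a real one.
            "0000000000000": {"title": "Moby Dick", "author": "Herman Melville"},
        },
    },
]


def find_record(isbn: str) -> Optional[dict]:
    """Check each outside catalog in turn for a record of this edition."""
    for catalog in OTHER_CATALOGS:
        record = catalog["records"].get(isbn)
        if record is not None:
            return dict(record)  # copy it so the source catalog is untouched
    return None


# Import the record into a (hypothetical) local catalog instead of typing it in.
local_catalog = {}
isbn = "0000000000000"
record = find_record(isbn)
if record is not None:
    local_catalog[isbn] = record

print(local_catalog)
```

Because every record labels its title and author the same way, the copy works no matter which catalog the record came from.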
The original source for metadata is the book’s publisher. Everyone else either pulls the information directly from the publisher or pulls it from a catalog that pulled it from the publisher.
That’s why, when you look up a specific book on Amazon or Barnes & Noble or Libby, you’ll usually find the same description at each place. Those descriptions were typically written by the publisher and imported automatically.
And that’s how it all happens.
Thanks to those who wrote in with questions this week. I always love getting feedback—and especially love follow-up questions! (Providing information is, after all, one of the main functions of a library!)
If there’s a topic you’d like to see covered or if you have follow-up questions about a column, drop me a line at joe@pocahontaslibrary.org