Thinking about Sources and Citations…
Let’s start with a confession: part of me doesn’t care about any of this Source/Citation nonsense. Now that might be heresy in genealogical circles, but hear me out. I don’t think I’m as “radical” as that might sound. Honestly, I’m just trying to come to grips with an aspect of genealogy that I, at least, had thought was settled.
As a hobbyist living in the same house as a real genealogist (my wife), and trying to codify this hobby into a software package, I’ve been forced to delve into many aspects of genealogy that I had only lightly thought about before. Sources and Citations are just one of those aspects.
Prior to beginning Origins, I was under the impression that the gold standard of genealogy citations was Elizabeth Shown Mills’ Evidence Explained (EE). We’ve got a copy of it around the house somewhere, but I could never get myself to really follow it. My biggest problem was the breadth and strictness the book seemed to promote. No offense to Ms. Mills, but it took on the feel of “Understanding Poetry” by Dr. J. Evans Pritchard. The write-up on Amazon proudly trumpets “More than a thousand citation models”. Now, I’m reasonably intelligent, but I can’t deal with more than a thousand examples of anything. It’s just more than I could ever hope to grok. Largely for this reason, I didn’t follow EE, but I always felt like that was somehow wrong.
Rebel Without a Clue?
Imagine my surprise, then, when a few people on our Facebook feedback group for Origins expressed a desire to not use EE! Wait. There were other “rebels” in the world?! Hallelujah!
That’s what has led me to this point. I now find myself questioning a fundamental aspect of genealogy and how we’re going to implement it in Origins. When I sit down to write the code for this, it will actually be the third time I’ve done it. The first was a simple prototype, so not too detailed. The second went much further, but we just weren’t happy with the way it evolved as it went along. This post is an attempt to collect my thoughts and gather some feedback before diving into the third, and hopefully final, pass at this code.
Start With the End in Mind
The goal of Sources and Citations in genealogy is to provide traceability. In other words, to allow you, or anyone else, to backtrack through your research to validate the conclusions you’ve drawn. Simply stating John Smith was born April 5th 1924 is not a traceable conclusion. It may very well be true, but there is no way to discern that, unless you personally happen to have been an eyewitness to the event; and even that doesn’t help anyone but you.
So what additional information do we need to provide to support this statement, allowing others to draw the same conclusion we have – that John Smith was actually born on the fifth of April, 1924? Before we can get to that we need to have a somewhat existential discussion about the nature of truth. I’ll apologize for this in advance.
But What is Truth?
Notice that, so far, I have avoided using the term fact in this discussion. That’s been intentional. Before we can really dig in and discuss traceability, we need to understand what it is we’re tracing. I contend (and I’m not the first to do so) that we do not work with facts when we do genealogy research. Bear with me for a minute before you dismiss me as a quack; there is logic in my madness.
While the distinction may be largely semantic, I’m a believer that words have power and that to have a precise discussion on any topic we must use the correct terms. So if not facts, what do we work with?
As genealogical researchers we draw conclusions; we do not state facts. The research we do is not empirical; it does not lead to observable results, unless you have a time machine, and even then you have the problem of alternate time streams and multiple realities, so let’s not go there. Instead, we collect evidence in the form of historical documents and records and use that evidence to support the conclusions we draw from them. Ideally, our research is thorough enough, and well-documented, so that others can look at it and not only verify the research by following our documentation, but also determine that the work we’ve done is reasonably exhaustive and then draw the same conclusions we have.
That is, I think, the goal for all people who do genealogy – whether professionally (i.e., for paying clients) or as a hobby (realizing full well that some hobbyists are better and more thorough at this than some professionals).
In the end, this is the goal toward which we should all strive:
- Every conclusion we draw in our research (e.g. a person’s date of birth, location of death, spouse name, parents’ names, etc. – those things often called facts in genealogy software) is supported by one or more claims (more on claims in a minute)
- Every claim supports the end conclusion by either proving one or more aspects of it or disproving a conflicting claim
- Every claim is traceable. In other words, as the person making that claim, we provide sufficient documentation on the research we have done that others can:
  - Locate and review the same source material
  - Identify all of the material we examined to make that claim, so they can determine that our research was thorough (the term used for this is typically “reasonably exhaustive”).
  - Come to the same conclusion we have.
So now that we know what we’re trying to achieve, how do we get there?
I don’t know if anyone else has used this term to describe the approach I’m advocating. I can’t imagine that I’m the first, but a casual online search doesn’t turn up anything with more than a passing similarity. I’m probably restricting my search to too narrow a set of words and ideas. Anyway, here goes…
For me, Claims-based Research is what we do in genealogy (and probably other disciplines, though I can speak even less authoritatively on them than I can for genealogy, so we won’t try to apply this too broadly). But what does this mean?
A claim is pretty easy. It’s what we state to be true; what others might call a fact. So why not call it a fact? Because we don’t know it to be, and can’t observe it to be, true. Also, because fact is a loaded term. It carries certain connotations that I don’t care for. Claim works much better, as you’ll see.
In the course of my genealogical research, I come across certain pieces of information which lead me to make a claim, such as my previous example of John Smith being born April 5, 1924.
That is my claim: John Smith, D.o.B: 5 April 1924.
The “certain pieces of information” I found which led me to make that claim are the evidence for that claim. Other systems or software programs might call these sources, master sources, citations, or just fall back on the catch-all term of facts.
So far, things aren’t that different. I’ve just swapped in some new words:
- Claim instead of fact
- Evidence instead of one of the terms listed above
But now things are going to heat up a little… 🙂
The reason I like claim instead of fact is that there can be only one fact, but I can make any number of claims. Using my John Smith example, there is only one date on which he was born. That is an undeniable truth (leaving out the impact of different calendar systems, etc., as they’re not really relevant). All people are born on one and only one day. However, different records (or other evidence) might, for any number of reasons, list different dates, or incomplete dates. For example:
- A birth record, or birth certificate, would list the full and complete legal birth date of 5 April 1924. This is almost certainly the best evidence for a birth that you will find.
- A newspaper article may mention that John was celebrating his first birthday on April 7, 1925. But is that his birth date or just the date of the celebration?
- A military service record may list John’s date of birth as 19 November 1923
- What if the birth record were smudged and you couldn’t tell if the date was the 5th or the 3rd?
- A death record may provide the information that John died on 10 July, 1995, at the age of 71 years, 6 months and 5 days. But is it really accurate? (If you do the math, you’ll see that it’s not. According to that information he was born January 5, 1924, so something is amiss.)
Those are just a few examples of the types of situations we have to deal with. Incomplete and conflicting information is a fact of life for genealogists. Without solid evidence, we have to be able to piece together the “truth” from scraps.
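For anyone who wants to check the death-record math above, here is a minimal sketch in Python. The function name is made up for illustration, and it assumes the simplest convention: step back the years and months first, then the days. Clerks did not always count ages the same way, so treat the result as a claim, not a fact.

```python
from datetime import date, timedelta

def birth_from_age_at_death(death: date, years: int, months: int, days: int) -> date:
    """Work backwards from a death date and a stated age (hypothetical helper)."""
    # Step back whole years and months, carrying across year boundaries.
    total_months = death.year * 12 + (death.month - 1) - (years * 12 + months)
    year, month_index = divmod(total_months, 12)
    # Assumes the day of the month exists in the target month (true for the 10th).
    anchor = date(year, month_index + 1, death.day)
    # Then step back the remaining days.
    return anchor - timedelta(days=days)

implied_birth = birth_from_age_at_death(date(1995, 7, 10), 71, 6, 5)
print(implied_birth)  # 1924-01-05
```

The implied date, 5 January 1924, is exactly three months away from the birth record’s 5 April 1924, which is why something is amiss.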
As a quick aside, I have to acknowledge that not even what most genealogists consider “solid” evidence is necessarily “true”. I’m a perfect example. My birth certificate says I was born on one date, when in reality I was born a day later. It’s a long story, and a birth date off by one day probably doesn’t matter, but it’s an example of how things can be wrong, even when they are “official”. I think this is another reason to make the mental shift that switching terminology from “fact” to “claim” allows. “Fact” implies “truth” when that may not be the case. Especially as you go further back in time, and/or need to deal with potential social/religious/royal/whatever impacts of “truth” we have to concede that what a primary source document says may have been “adjusted” to meet the needs of the person or their family.
Anyway, back to the task at hand…
Jumping back to the situations listed as examples above, each of those bullets would be a claim against John Smith’s birth. Depending on the validity I, as the person doing this research, assign to each claim, I make a determination as to when John’s birth actually occurred.
For the sake of argument, assume I didn’t have the birth record. That leaves three pieces of evidence:
- Newspaper article about his first birthday
- Military service record
- Death record
Two of those (1 & 3) put his birth in the winter or spring of 1924, but they aren’t necessarily solid evidence. One (#2) places it in the late autumn of 1923, and appears to be pretty solid evidence, except for one thing. There’s a good chance that the military record lists John’s date of birth as he provided it when he entered the service (whether drafted or voluntarily enlisted). Assuming for a moment that the 1924 date is correct, that would have made John about 17 years and 6+ months old when Japan attacked Pearl Harbor in December of 1941. It is easy to imagine that John lied about his birth date in order to enlist. Lots of young men did at that time.
So now, looking at those three claims, we can downplay the accuracy of what initially appeared to be a pretty solid piece of evidence, the official military record, and place more weight on the other two. Enough to say that John was most likely born in the spring of 1924 (again, pretending that we don’t have the birth record, and adding a little more weight to the newspaper article as it is a contemporary record, as opposed to one looking back 70+ years and involving some somewhat difficult date math involving leap years and such).
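To make the enlistment reasoning concrete, here is a small sketch that computes John’s age on 7 December 1941 under two of the competing birth dates. The helper name is hypothetical; it only counts completed years and months.

```python
from datetime import date

def age_years_months(born: date, on: date) -> tuple[int, int]:
    """Completed years and months between two dates (illustrative helper)."""
    months = (on.year - born.year) * 12 + (on.month - born.month)
    if on.day < born.day:
        months -= 1  # the current month is not yet complete
    return divmod(months, 12)

pearl_harbor = date(1941, 12, 7)
print(age_years_months(date(1924, 4, 5), pearl_harbor))    # (17, 8)
print(age_years_months(date(1923, 11, 19), pearl_harbor))  # (18, 0)
```

Under the birth record’s date John is still 17; under the date in the military record he has just turned 18. That is the kind of convenient coincidence that makes the military record look less solid than it first appears.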
So where does that leave us? We have claims instead of facts because there can be any number of claims, including contradictory ones, but one and only one fact (leaving aside the current fascination for the logically impossible “alternative facts”). This also allows us to include what we’ve determined to be inaccurate claims (such as the military record) in our research but not give it the same weight as other claims upon which we set more credence. We certainly don’t want to not include the military record in our research, but we need to somehow account for and document the fact that we’re not using it as evidence for John’s date of birth and the reasons why.
Claims Give Us History
If all claims gave us was the ability to think about things differently and manage contradictory information, while good, it would probably not be enough to upset the apple cart. But claims give us more than that. Claims give us the ability to evolve our research as more information is uncovered, without losing the history of how we got to where we are. This allows us to easily revert to a previous state if we discover that we need to.
An example is probably the best way to explain what I mean. Using our existing John Smith date of birth example, assume that we had all of the information listed above, including the birth record, but that instead of the date being smudged, it was the mother’s name that was hard to read.
Now we have what we think is pretty solid evidence as to John’s birth date, after all, we have the official record of his birth. We record the date listed (5 April 1924) as his Birth Date, discount the other claims as incorrect or otherwise less applicable than the birth record and continue merrily along our way.
Now imagine that we revisit John’s family for some reason and in the course of our research discover that there was another John Smith born in the same county in the first half of 1924. (Think that’s unlikely? My wife has 3 “Thomas Wallace” births in the same county within 4 months of each other, all to fathers also named Thomas and to mothers whose names all begin with an “S”. Anything is possible.)
So, back to our scenario. Now there is some doubt that the John Smith whose birth record we have is the correct one. What if, in the course of our research, we discover that it is not the right one and we need to discard it? We don’t want to actually delete it, because it is still evidence. It would be nice to keep it so we can show ourselves and others that we knew about the two John Smiths and that we discarded this birth record because it wasn’t the right one (along with some notes about how we determined that). However, we don’t want it showing up as a “birth fact” in our reports or as a part of our John Smith’s life.
We could add a note to the “birth fact” but notes are sometimes hard to work with and search and it is likely that it would get lost in the crowd if we’ve got more than just a few notes for our John Smith and/or his birth date.
Instead, we keep it as a claim, set its validity to “none” or something equivalent and then add a note to just that claim entry recording why it was discounted. Now the note is only relevant to that claim and won’t show up anywhere else. We’ll only have to deal with it if we go looking for it, but it’s there and available if we or anyone else are validating our research and come across the invalid birth record again.
This ability to see the progress of our research, record and retain incorrect paths and not clutter up our primary research line is yet another reason that claims work better than facts.
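Here is a rough sketch of what “keep the claim, set its validity to none, attach a note” could look like in code. Every name here (Validity, Claim, for_report) is invented for illustration and is not how Origins actually stores anything.

```python
from dataclasses import dataclass
from enum import Enum

class Validity(Enum):
    NONE = "none"          # discounted, but kept for the record
    ACCEPTED = "accepted"  # still in play

@dataclass
class Claim:
    value: str
    basis: str                       # where the claim came from
    validity: Validity = Validity.ACCEPTED
    note: str = ""                   # why the claim was discounted, if it was

birth_claims = [
    Claim("5 April 1924", "birth record", Validity.NONE,
          note="Belongs to the other John Smith born in the same county."),
    Claim("About April 1924", "newspaper article"),
    Claim("5 January 1924", "death record"),
]

def for_report(claims: list[Claim]) -> list[Claim]:
    """Claims that should appear in reports: everything not discounted."""
    return [c for c in claims if c.validity is not Validity.NONE]

def discounted(claims: list[Claim]) -> list[Claim]:
    """The discarded paths, still available when someone goes looking."""
    return [c for c in claims if c.validity is Validity.NONE]

print([c.basis for c in for_report(birth_claims)])
print([(c.basis, c.note) for c in discounted(birth_claims)])
```

The discounted birth record never reaches a report, yet the note explaining why stays attached to that one claim and nowhere else.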
Moving Beyond Claims
We also have evidence which thus far I’ve said is somewhat analogous to sources but now I’m going to tweak that a bit. Here are a couple of new definitions:
- Source: Going back to something closer to a true dictionary definition of the word: a place, person, or thing from which something comes or can be obtained (https://en.oxforddictionaries.com/definition/source). For our purposes, we’re going to tweak this slightly to say that source is just the person, place, or thing from which we obtained our evidence. In concrete examples, a source could be any of the following:
- A website
- A book
- The transcription of an interview
- The family bible
- A census record
- Land deeds
- Court records
What a source is not is the actual information we use to support our claims.
- Evidence: Evidence can be cleaned up a little now that we have split off sources. Evidence is the specific information, contained within a source, that is used to support a claim.
- Citation: Besides the title of this post, we haven’t even talked about citations yet. Citations are the glue that connects evidence to a source, that’s it. BUT…this is likely the most important part of everything we have to talk about. Citations are what allow us to show that we’ve done our research and drawn valid conclusions. A citation lists the information which allows us to locate the source as well as the evidence for a claim within that source.
- Fact: Here’s the word that sparked a lot of this discussion: what is a fact? To fit in with the rest of what I’ve covered, something called a fact is still necessary, but it’s slightly different from how it’s been used in the past. For this model to work, a fact is that thing against which you make claims. For example, a person could have the following facts, among many others:
- and many, many others – we’ve got 80+ different types of facts in Origins already
A location can have facts (latitude/longitude, address, name, etc.). An event can have facts (date, purpose, etc.). All of these are things against which we make claims, supported by evidence, coming from sources. Often these are called “fact types.”
The way this all ties together is like this:
- We make claims against facts to build our path to a conclusion
- Each claim has one or more pieces of evidence
- Each piece of evidence has one or more citations
- Each citation connects one claim to one source. A citation cannot have more than one claim or one source.
- Each source has one or more citations
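The relationships in the list above can be sketched as a handful of data classes. This is only one possible reading, not the Origins design: the class and field names are assumptions, and I’ve hung each citation off a piece of evidence (which in turn belongs to exactly one claim) so that a citation always points at a single source.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Source:
    title: str                 # the person, place, or thing the evidence came from

@dataclass
class Citation:
    source: Source             # exactly one source per citation
    locator: str               # where in the source the evidence can be found

@dataclass
class Evidence:
    detail: str                # the specific information used to support a claim
    citations: list[Citation] = field(default_factory=list)

@dataclass
class Claim:
    value: str
    evidence: list[Evidence] = field(default_factory=list)

@dataclass
class Fact:
    fact_type: str             # e.g. "Birth"
    claims: list[Claim] = field(default_factory=list)
    conclusion: Optional[str] = None

def sources_cited(fact: Fact) -> list[str]:
    """Walk fact -> claims -> evidence -> citations -> source."""
    return [cit.source.title
            for claim in fact.claims
            for ev in claim.evidence
            for cit in ev.citations]

# Locator text below is placeholder wording, not real record detail.
death_record = Source("Death record")
tradition = Source("Family tradition")
claim = Claim("5 January 1924", evidence=[
    Evidence("Age at death implies this date",
             [Citation(death_record, "age-at-death entry")]),
    Evidence("Birthday always celebrated on 5 January",
             [Citation(tradition, "interview notes")]),
])
birth = Fact("Birth", claims=[claim])
print(sources_cited(birth))  # ['Death record', 'Family tradition']
```

Walking the structure from the fact down to its sources is the traceability path a reader would follow to check the work.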
Again, an example to help clarify: In our John Smith example, we have four claims to John’s date of birth, but let’s add a fifth to help illustrate. Let’s say that the family always celebrated John’s birthday on January 5th. Here are the claims we have for John’s Birth Fact:
- 5 April 1924 (birth record, now considered incorrect)
- About April 1924 (newspaper article)
- 19 November 1923 (military record)
- 5 January 1924 (death record)
- 5 January (family tradition)
The first three claims are for unique values and they each have one citation to their relevant sources (birth record, newspaper article, military record) which contain the evidence for that claim. The final two dates are similar, the last just missing a year. We can either merge them into one claim (5 January, 1924) with multiple pieces of evidence, represented by distinct citations connecting to distinct sources, or else make them separate claims, each with one piece of evidence, each with one citation, to their respective sources. The choice here is really one of preference.
The end result is that we place more credence upon the last two claims (or one merged claim) and assign a value of 5 January, 1924 to John’s “Birth Fact.”
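As a toy illustration of weighing those five claims, the sketch below merges the two January claims and picks the value with the most combined support. The confidence numbers are entirely invented, and real judgement is not a simple sum, so read this as a picture of the bookkeeping, not a method for deciding.

```python
from collections import defaultdict

# (claimed value, where the evidence came from, confidence from 0 to 5).
# The confidence numbers are invented for illustration only.
claims = [
    ("5 April 1924", "birth record", 0),          # wrong John Smith, so no weight
    ("About April 1924", "newspaper article", 2),
    ("19 November 1923", "military record", 1),
    ("5 January 1924", "death record", 3),
    ("5 January", "family tradition", 2),
]

def merge_key(value: str) -> str:
    """Treat the year-less family tradition as support for 5 January 1924."""
    return "5 January 1924" if value == "5 January" else value

totals: dict[str, int] = defaultdict(int)
for value, _source, confidence in claims:
    totals[merge_key(value)] += confidence

conclusion = max(totals, key=totals.get)
print(conclusion)  # 5 January 1924
```

Keeping the two January claims separate would work just as well; the merge step is only there to show the “one claim, multiple pieces of evidence” option.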
Applying this to Origins
If you’ve stuck with me this far, thanks. Now we can start to move all of this from theoretical ramblings to a real application in the real world. The original purpose behind writing this article was to help me gather my thoughts on how we were going to implement “facts,” “sources,” and “citations” in Origins. Everything prior to this point has been me thinking out loud; now it’s time to make it real.
First things first. Our goal with Origins is obviously not to throw out the entire body of genealogical process and wisdom that has evolved over the years/decades/centuries. That would be the epitome of throwing the baby out with the bathwater. Instead, what we’d like to do is see if it makes sense to tweak things just a little to fit in better with how things work now in the age of computers, Google/Bing and the Internet. All 3000+ words of this article (so far, yikes!) have led up to deciding if there’s reason to make some changes in the way Origins works.
Another key design goal behind Origins is to make things easier. Two famous quotes come to mind and drive a lot of what we do:
- Everything should be made as simple as possible, but not simpler (Albert Einstein)
- Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius — and a lot of courage — to move in the opposite direction (E.F. Schumacher)
Now, I’m claiming neither genius nor courage, but I think that if all we did was churn out another genealogy program that worked the same as the products already on the market then we’re just wasting our time. At the very least, we need to examine the assumptions and processes used in genealogy and see if there isn’t something we can make just a little simpler, just a little easier, just a little more powerful, just a little more user-friendly.
I’m as curious as you are to see how this works out, so let’s finish up.
Assuming for a moment that all of the above makes sense and that we were going to implement something to support it, what would that look like?
Starting from the user interface (UI – the screens with which users interact), we’d need the following capabilities:
- The ability to see the conclusion made for each type of fact.
- The ability to rate our level of surety or confidence in each conclusion
- The ability to create, edit and delete conclusions
Claims and Evidence
- The ability to display all of the claims for a fact (or a filtered list of claims if desired), even disproven or “negative” claims.
- The ability to see the information gleaned from each piece of evidence supporting each claim
- The ability to create, edit and delete claims and evidence
- The ability to rate our level of confidence in the validity of a claim or piece of evidence, including negative claims (i.e., I’m highly confident that this birth record is not for my John Smith)
- The ability to see the citation for each piece of evidence, showing where it came from (both the source as well as, for certain types of sources, a location within that source)
- The ability to identify similar citations
- The ability to see details on existing sources, add new sources, edit existing sources and delete existing sources (with a warning if we edit or delete an existing source that has citations already connected to it)
- The ability to clone an existing source, without the citations, and then edit that to make a “new” source
- The ability to identify sources which have no connected citations
- The ability to relate sources that are similar (e.g., “all Census sources”)
- Templates for predefined conclusions, claims, citations and sources
- The ability to clone and edit the above templates
- The ability for users to create their own templates, including everything from a totally free-form, empty text box into which they can type whatever they wish for everything, to their own structured input form, and (just about) anything in between.
A quick note here: So far, believe it or not, just about everything we’ve covered has been about input, or collecting information – entering it into the software. Nothing we’ve covered has really touched upon the output – viewing it on the screen, in reports, etc. To a large extent, this has been intentional as I believe that these are two entirely separate things and should remain distinct. The way in which you input information should have as little bearing as possible on how you output it. Ideally, your input allows you to generate different output depending on the current need. So here are my requirements for output:
- Be able to produce different outputs for different needs. Often, a live, interactive output, such as a webpage or other computer application has very different output needs than a static report. Software should be able to support that.
- Provide sensible defaults, but allow the user to override those to control the output to meet their needs. For example, a report for consumption just by a user or their family probably doesn’t need the same level of detail as one to be included in a scholarly journal. The input and output systems of an application should support this.
And this brings me full circle to the beginning of the article, where I stated that part of me doesn’t care about Sources and Citations. Now I can explain that a little bit more.
I don’t care about Sources and Citations because what I really care about is the Inputs and the Outputs. The Inputs need to be simple enough to allow me to do it quickly and easily. I don’t want to have to think too much about exactly what information I need to provide and in what order so that I can produce an output that meets some canned set of criteria. I, like I assume most of you, am busy. I’m not the world’s fastest typist, either. I don’t “do genealogy” to spend all of my time typing in information and thinking about “sentences”. I do genealogy because I want to research and learn about my ancestors and their lives.
Along the same lines, I don’t need to see scholarly journal level details every time I’m looking at something. Simply seeing “1920 Census” is sufficient the vast majority of the time for me to know where a piece of information came from. Anything more is just a distraction. But there are times when I need those details, so I need to make sure they’re available.
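A tiny sketch of that idea, separating what is stored from how it is shown: the same citation renders as a short label by default and with full detail on request. The class, its fields, and the placeholder values are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Citation:
    short_title: str   # what I want to see most of the time
    locator: str       # where in the source the evidence sits
    repository: str    # where the source itself can be found

    def render(self, detail: str = "short") -> str:
        """Return a short label by default, or the full detail on request."""
        if detail == "short":
            return self.short_title
        return f"{self.short_title}, {self.locator}, {self.repository}"

# Placeholder values only; a real entry would carry the actual locator and repository.
census = Citation("1920 Census", "sheet and line (placeholder)", "repository (placeholder)")
print(census.render())        # 1920 Census
print(census.render("full"))  # 1920 Census, sheet and line (placeholder), repository (placeholder)
```

The input is entered once; which rendering gets used is a decision made at output time, whether for a quick on-screen view or a scholarly report.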
I think that software can play a huge role in making this easy. If you’re researching on your computer, it is possible for the software to know what website you pulled information from. If you’re simply doing data-entry for information from offline sources, it is possible for the software to help with that, too. Once you’ve entered or collected information, software can easily analyze it and look for similarities, differences, potential contradictions, etc. That’s the level to which we want to take Origins. It should be a true partner in your research, helping you as much as possible.
That just about wraps this up. I honestly don’t know whether any of this makes any sense whatsoever, but it’s the culmination of thoughts that have been rattling around my head for months. I still need to digest a lot of what I’ve written here. I’ve been at this on and off for most of the day, and have coalesced a lot of vague ideas and thoughts together in the course of writing this. None of this is going to have any impact on Origins until I think it through some more.
I also want feedback. I realize that this has been a very long read, and so if you’ve stuck with it, THANK YOU. If I could ask you to spend just a minute or two more to leave a comment with any thoughts you might have, I would be quite grateful. As I’ve said repeatedly, we couldn’t do this (Origins) without the incredible support we’ve gotten from genealogists all over the world. If any of this seems worthwhile or even interesting, I’d love to hear it. I’d also love to hear if you think this is all poppycock and I need to see my doctor about increasing my medication. All feedback is good feedback, so let’s see where this goes.