This post continues a look at the effect metadata has on the number of views an object receives. Part 1 can be found at: http://scande3.com/2015/04/effect-of-metadata-subjects-on-a-digital-objects-discoverability/. The same criteria from part 1 still apply to these stats, and those overall global rules are:
- A six-month time period from October 1st, 2014 until March 31st, 2015 for objects that existed in the repository before December 31st, 2014.
- The listed view counts come from the Google Analytics API and reflect views on the object’s main result page only.
The first exciting match puts “LCSH pre-coordinated” subject topics against those that lack the “–” concatenation.
LCSH Topic Subject Comparisons
Metric | LCSH Style Topic Subject Objects | Non-LCSH Style Topic Subject Objects | Mixture of Both Styles Topic Subject Objects |
Total Records | 8,119 | 122,353 | 8,111 |
Average Views | 0.938 | 1.570 | 1.243 |
Percent with 1+ Views | 30.8% | 42.9% | 40.5% |
It would appear the hypothesis in the previous analysis post is correct: normalized non-LCSH style subjects soundly defeat those items that use the concatenation. But there is a notable asterisk to this victory: the number of objects using LCSH style subjects is significantly smaller. In the Digital Commonwealth system, our “best practice” is generally not to pre-coordinate LCSH subjects, and that policy decision has an effect on how metadata was created for the vast majority of items. That doesn’t mean we never use “complex subjects” in LCSH that represent a complete topic. For example, we do have “best practice” objects that use “United States–History–Civil War, 1861-1865” as that is the single Library of Congress topic entry that defines that war. The majority of these cases are in the “Mixture” category in the above table. For example, the item I looked at with that string also had the subjects of “Monuments & memorials” and “Churches”.
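To make the three categories and the two summary rows concrete, here is a minimal Python sketch of how an object could be bucketed and summarized. It is not the code used to produce the tables. The function names, the separator list, and the sample view counts are all assumptions for illustration; in particular I am assuming a pre-coordinated string can be recognized by "--" (the usual LCSH subdivision separator) or by the dash as it is displayed in this post.

```python
from statistics import mean

# Assumption: a pre-coordinated LCSH string contains "--" or the en dash
# ("\u2013") that this post displays between subdivisions.
SEPARATORS = ("--", "\u2013")


def subject_style(topics):
    """Bucket an object's topic subjects as 'lcsh', 'non-lcsh', or 'mixture'.

    Returns None when the object has no topic subjects at all.
    """
    if not topics:
        return None
    flags = [any(sep in topic for sep in SEPARATORS) for topic in topics]
    if all(flags):
        return "lcsh"
    if not any(flags):
        return "non-lcsh"
    return "mixture"


def summarize(view_counts):
    """Compute the three table rows from a list of per-object view counts."""
    viewed = sum(1 for views in view_counts if views > 0)
    return {
        "total_records": len(view_counts),
        "average_views": round(mean(view_counts), 3),
        "percent_viewed": round(100 * viewed / len(view_counts), 1),
    }


# The Civil War example from the text lands in the "mixture" bucket.
example = [
    "United States--History--Civil War, 1861-1865",
    "Monuments & memorials",
    "Churches",
]
print(subject_style(example))

# Hypothetical view counts, only to show the shape of the summary.
print(summarize([0, 2, 1, 0]))
```

Note that the plain hyphen in “1861-1865” is deliberately not treated as a separator, so a date range alone does not push a subject into the LCSH style bucket.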
Back on topic, this means the vast majority of “LCSH Style” topic subjects came to us from metadata sources we do not control, namely OAI feeds from institutions that use “pre-coordinated LCSH Subjects” as their metadata practice and that we were unable to break up on our end. This is an important note: because these records come from a small, uniform set of sources in the system, other factors could be at play in these numbers. Taking the numerous potential factors into account (such as the quality of source metadata in other areas of the record or how interesting the items are) is mostly beyond the scope of this blog post. I will, however, provide a breakdown comparing objects that have a topic subject but no geographic subject to those that also have a geographic subject. I must add a caveat: this breakdown makes the numbers much more volatile as the number of records in a category becomes quite limited, and thus I’d avoid drawing conclusions from these:
Metric | LCSH Style Topic Subject Objects (no Geographic) | Non-LCSH Style Topic Subject Objects (no Geographic) | Mixture of Both Styles Topic Subject Objects (no Geographic) |
Total Records | 549 | 59,328 | 1,224 |
Average Views | 1.741 | 0.829 | 1.622 |
Percent with 1+ Views | 52.6% | 28.9% | 53.1% |
Metric | LCSH Style Topic Subject Objects (with Geographic) | Non-LCSH Style Topic Subject Objects (with Geographic) | Mixture of Both Styles Topic Subject Objects (with Geographic) |
Total Records | 7,570 | 63,025 | 6,887 |
Average Views | 0.880 | 2.267 | 1.176 |
Percent with 1+ Views | 29.2% | 56.1% | 38.2% |
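One way to read the two breakdown tables is as a split of the first table: weighting each subgroup by its record count should recover the original totals, averages, and percentages. The short Python sketch below checks that using only the numbers printed in this post. The function and variable names are mine, and the tolerances are an assumption to allow for the published figures being rounded.

```python
def recombine(groups):
    """Merge (total_records, average_views, percent_viewed) subgroups.

    Averages and percentages are weighted by each subgroup's record count.
    """
    total = sum(n for n, _, _ in groups)
    average = sum(n * avg for n, avg, _ in groups) / total
    percent = sum(n * pct for n, _, pct in groups) / total
    return total, average, percent


# (no geographic, with geographic) rows copied from the two tables above.
breakdown = {
    "LCSH": [(549, 1.741, 52.6), (7_570, 0.880, 29.2)],
    "Non-LCSH": [(59_328, 0.829, 28.9), (63_025, 2.267, 56.1)],
    "Mixture": [(1_224, 1.622, 53.1), (6_887, 1.176, 38.2)],
}

# Rows copied from the first comparison table.
published = {
    "LCSH": (8_119, 0.938, 30.8),
    "Non-LCSH": (122_353, 1.570, 42.9),
    "Mixture": (8_111, 1.243, 40.5),
}

for style, groups in breakdown.items():
    total, average, percent = recombine(groups)
    pub_total, pub_average, pub_percent = published[style]
    # The inputs are already rounded, so compare within a small tolerance.
    assert total == pub_total
    assert abs(average - pub_average) < 0.001
    assert abs(percent - pub_percent) < 0.1
    print(f"{style}: {total:,} records, {average:.3f} views, {percent:.1f}%")
```

The record counts match exactly and the weighted figures agree with the first table to within rounding, which also shows why the combined LCSH numbers look so low: almost all of those records (7,570 of 8,119) sit in the “with Geographic” group.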
OAI Harvested (Metadata Only Records) vs Hosted Native Records
As a DPLA Hub, we offer both hosted and harvesting options for our member institutions. The majority of our content is hosted directly in the system, and those items that are ingested almost always go through our Digitization department, where the metadata is often cleaned up or created by our Metadata Mob, or created with their advice. This theoretically produces much more uniform metadata for our system that will play well with other objects when searching or faceting. By contrast, while we do enrich metadata from OAI feeds, we often have much less control over the policies those institutions implement (such as the previous topic subject differences). As such, the variation in the standards used for those objects is likely much greater. The table below attempts to quantify that difference, but it has one huge flaw. For an OAI Harvested metadata item, we provide DPLA with a direct link to that object in its native system rather than forcing the user to go through Digital Commonwealth first. As such, the OAI Harvested objects below are missing statistics on those views, and the DPLA is one of our top sources of referral traffic. (As an aside, the site that directs the most traffic to us is Facebook.)
Metric | Hosted Object | OAI Harvested Object |
Total Records | 108,282 | 37,801 |
Average Views | 1.921 | 0.364 |
Percent with 1+ Views | 51.7% | 15.0% |
The results are as I would expect. I wish I could tell how much the missing DPLA traffic for the OAI Harvested records affected these results. Still, it does seem highly likely that the uniformity of the metadata has an effect on how often an object is discovered in our shared system.
The Fourth Dimension!
While knowing where a record is from and what it is about is quite important, I haven’t talked about the “when” aspect. I decided I’d run some quick stats that look at how having a date on a particular item might increase its findability.
Metric | Objects with a Date | Objects with No Date |
Total Records | 143,097 | 2,986 |
Average Views | 1.500 | 2.405 |
Percent with 1+ Views | 41.9% | 55.6% |
The good news: 98% of our records have a date associated with them! That is actually higher than I would have expected. More objects in our system have a date than have a subject!
The bad news? This means I don’t have a large enough group of “no date” items to figure out what effect a date might have on the views of an object. From the stats above, it would seem that objects without a date have a significantly higher view average than those with a date, which does not make logical sense. So while the above table shows the actual stats, the only sense I can make of it is that the individuals creating these records are doing an awesome job of adding a date.
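For anyone who wants to check the arithmetic behind the 98% figure, here is a small Python sketch using only the counts and averages from the tables in this post. The variable and function names are my own. It also confirms that the date split and the hosted/harvested split describe the same pool of objects, so both should give roughly the same overall view average.

```python
# Counts and averages copied from the tables in this post.
dated, undated = 143_097, 2_986
hosted, harvested = 108_282, 37_801

total = dated + undated
dated_share = 100 * dated / total


def overall_average(groups):
    """Record-weighted mean of (total_records, average_views) pairs."""
    return sum(n * avg for n, avg in groups) / sum(n for n, _ in groups)


by_date = overall_average([(dated, 1.500), (undated, 2.405)])
by_source = overall_average([(hosted, 1.921), (harvested, 0.364)])

print(f"objects: {total:,} (hosted + harvested: {hosted + harvested:,})")
print(f"objects with a date: {dated_share:.1f}%")
print(f"overall average views: {by_date:.3f} by date, {by_source:.3f} by source")
```

Both splits cover the same number of objects, and the two overall averages agree to within the rounding of the published figures.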
Conclusion
It would appear the “exploded LCSH” or “non pre-coordinated LCSH” topic subject items are more discoverable in our system. However, it also appears likely that uniformity of metadata increases the odds of an object being discovered, so that result could stem from this being the primary policy we implemented for topic subjects. It would be interesting to see the same subject analyses that have been run here run over all of the DPLA data to see if the same patterns hold up in an even larger pool of objects.
Thanks for reading once again! Next time will likely be a move away from stats and on to another aspect of the Digital Commonwealth system. Take care!