Recap, Caveats, and Decisions
Last time I broke down some top-level statistics comparing the traffic through the DPLA pipeline. As the notice above that post reads, there may be an issue with the DPLA numbers. It is worth noting that this isn’t directly caused by the HTTP to HTTPS conversion: after all, none of the DPLA traffic for those statistics ever hits our site, and the numbers come from DPLA’s Google Analytics. However, Michael Bitta of DPLA still feels that some of their recent changes caused broader statistical issues that could have affected their internal numbers as well.
With the above in mind, it is hard to do deep analysis of the dataset. It has become apparent that I don’t have it in me to complete a real “part 2” of this series… but that doesn’t mean that you can’t dig into the numbers as they exist! I’ve decided that I’ll provide the source data for anyone who might be curious.
All the qualifications from my part 1 blog post apply to these numbers. Additionally:
- Not all items have a DPLA ID listed because those items had no clickthroughs or views in the report from DPLA. I could have still scripted a way to look them up but never implemented that piece. You should be able to get the DPLA ID from the DPLA API using the Digital Commonwealth PID though.
- Some DPLA items show a clickthrough but no item view. This is not a bug. On the DPLA site, it is possible to click through to the place hosting an item without visiting the detailed item page on https://dp.la. Essentially, the lack of a view means the user did a search on DPLA and clicked through to the item at its source location directly from the search results view.
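If you want to pull those two cases out of the data before analyzing it, something like the following minimal sketch could work. It assumes you have exported the spreadsheet to rows of dictionaries (for example with Python's `csv.DictReader` after saving a sheet as CSV). The column names `dpla_id`, `dpla_clickthroughs`, and `dpla_item_views` are placeholders I made up for illustration, not the spreadsheet's actual headers, so adjust them to match what you find in the file.

```python
from typing import Dict, List

# Hypothetical column names: replace with the real spreadsheet headers.
COL_DPLA_ID = "dpla_id"
COL_CLICKS = "dpla_clickthroughs"
COL_VIEWS = "dpla_item_views"


def to_count(value: object) -> int:
    """Convert a cell to an integer count, treating blank cells as zero."""
    if value is None or str(value).strip() == "":
        return 0
    return int(float(str(value)))


def split_rows(rows: List[Dict[str, object]]) -> Dict[str, List[Dict[str, object]]]:
    """Group rows into the two special cases described in the caveats."""
    missing_id = []
    click_no_view = []
    for row in rows:
        if not str(row.get(COL_DPLA_ID) or "").strip():
            # No DPLA ID: the item had no clickthroughs or views in the report.
            missing_id.append(row)
        elif to_count(row.get(COL_CLICKS)) > 0 and to_count(row.get(COL_VIEWS)) == 0:
            # Clicked straight from DPLA search results, skipping the item page.
            click_no_view.append(row)
    return {
        "missing_dpla_id": missing_id,
        "clickthrough_without_view": click_no_view,
    }


# Made-up sample rows, purely to show the shape of the input.
sample = [
    {"pid": "example:1", "dpla_id": "abc", "dpla_clickthroughs": "2", "dpla_item_views": "0"},
    {"pid": "example:2", "dpla_id": "", "dpla_clickthroughs": "", "dpla_item_views": ""},
    {"pid": "example:3", "dpla_id": "def", "dpla_clickthroughs": "1", "dpla_item_views": "4"},
]
groups = split_rows(sample)
for name, members in groups.items():
    print(name, [row["pid"] for row in members])
```

Rows in the `missing_dpla_id` group are the ones you would need to look up through the DPLA API if you want their IDs. Rows in `clickthrough_without_view` are expected and should not be treated as errors.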
The Dataset Download
Download the dataset here: dpla_stats_2017_01_14-2017_03_14.xlsx