Getting external data into Herodotus is an important part of the project. Data can be encoded in a wide array of formats, and they all have to be parsed into ActionScript Value Objects. I've yet to define the metadata for these VOs, so in an effort to find some commonality I spent much of the last few months digging through a mountain of temporal specifications, projects and APIs. It's been hard work and a journey peppered with surprises and revelations. Over this and the next two posts I hope to cover some of what I've learned and how it has shaped my semantic model of time.
The semantics of time
The properties of time are elusive. It's a fundamental concept with a myriad of definitions, each dependent upon the experiences and requirements of the individual or group who defines it. Disciplines as diverse as physics, chemistry, biology, astronomy, theology, philosophy, history, music and politics have all had a hand in shaping our definitions of time.
It's fascinating to watch different people approach the same problem. Some attempts make you want to scream "oi, no!", while others have you immediately reaching for your notebook, throwing out old ideas in favour of new ones as each revelation informs your overall understanding of the problem.
The fundamentals and architecture of time demand some deep thought, and on the whole this is evident in the specifications I looked at. The introduction to the ISO8601 specification is a good primer on the basic temporal concepts of a Timeline, an Instant and an Interval. The Time ontology in OWL describes them in a little more detail. These three concepts seem to be applied, or at least implied, in all of the specs I looked at.
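To make the three concepts concrete, here is a minimal Python sketch of how an Instant and an Interval might be modelled as value objects on a named timeline. The class and field names (`Instant`, `Interval`, `timeline`, `position`, `beginning`, `end`) are my own illustrative assumptions, loosely echoing OWL-Time's `hasBeginning` and `hasEnd` properties; they are not the Herodotus VO metadata, which is still undefined. Positions are assumed to be seconds from the timeline's origin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    """A zero-duration position on a named timeline (hypothetical value object)."""
    timeline: str    # which timeline the position belongs to, e.g. "UTC"
    position: float  # assumed unit: seconds from the timeline's origin


@dataclass(frozen=True)
class Interval:
    """An extent of time bounded by a beginning and an end Instant."""
    beginning: Instant
    end: Instant

    def __post_init__(self) -> None:
        # Both bounds must sit on the same timeline, in the right order.
        if self.beginning.timeline != self.end.timeline:
            raise ValueError("interval bounds must lie on the same timeline")
        if self.end.position < self.beginning.position:
            raise ValueError("an interval cannot end before it begins")

    @property
    def duration(self) -> float:
        return self.end.position - self.beginning.position

    def contains(self, instant: Instant) -> bool:
        # An instant on a different timeline is never inside the interval.
        return (instant.timeline == self.beginning.timeline
                and self.beginning.position <= instant.position <= self.end.position)


minute = Interval(Instant("UTC", 0.0), Instant("UTC", 60.0))
print(minute.duration)                        # 60.0
print(minute.contains(Instant("UTC", 30.0)))  # True
```

Keeping the timeline explicit on every Instant is a deliberate choice here: it leaves room for timelines that are not calendars at all.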
Old Father Time: ISO8601
In a review of Temporal Web Standards, I showed that almost all web standards for DateTime have their roots in the ISO8601 specification. This is great, because it gives me the commonality I was looking for.
Unfortunately this common ancestry throws up another problem, but before I get into that it's worth noting the main innovations to take from ISO8601:
- Dates are human readable.
- Dates and times can be expressed in UTC, or with an explicit offset from UTC, and converted to local time as required.
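A minimal sketch of that UTC round trip, using Python's standard `datetime` module: a string carrying an explicit offset is normalised to UTC for storage and converted to a local offset only for display. The function names `to_utc` and `to_local` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc(iso_string: str) -> datetime:
    """Parse an ISO 8601 date-time with a UTC offset and normalise it to UTC."""
    parsed = datetime.fromisoformat(iso_string)
    if parsed.tzinfo is None:
        # Without an offset we can't tell which local time was meant.
        raise ValueError("no UTC offset given: " + iso_string)
    return parsed.astimezone(timezone.utc)


def to_local(instant: datetime, offset_hours: float) -> datetime:
    """Convert a stored UTC datetime to a fixed local offset for display."""
    return instant.astimezone(timezone(timedelta(hours=offset_hours)))


stored = to_utc("2009-06-01T14:30:00+02:00")
print(stored.isoformat())                # 2009-06-01T12:30:00+00:00
print(to_local(stored, -5).isoformat())  # 2009-06-01T07:30:00-05:00
```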
When date and time are represented as strings they are formatted in order of granularity, from largest to smallest. This offers the following advantages:
- Dates are language neutral.
- Dates are unambiguous.
- Dates are machine sortable.
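Largest-to-smallest ordering is what makes the strings machine sortable: given a fixed format, zero padding and a common UTC designator, plain text sorting yields chronological order. The Python sketch below (with made-up timestamps) checks that assumption. It stops holding once those conditions break; signed expanded years such as `-0044` and `-0100`, for example, sort the wrong way round as text.

```python
from datetime import datetime

# Fixed-width ISO 8601 strings, all in UTC ("Z"), listed out of order.
stamps = [
    "2009-11-05T09:00:00Z",
    "1999-12-31T23:59:59Z",
    "2009-01-15T18:30:00Z",
    "2009-01-15T08:05:00Z",
]

# Sorting the raw strings as text...
by_text = sorted(stamps)

# ...gives the same order as sorting the parsed instants.
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ"))

print(by_text == by_time)  # True
```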
Julian 1.1
Now for the bad news. ISO8601 models our modern civil "Gregorian" calendar. Introduced by Pope Gregory XIII in the papal bull Inter gravissimas, this system of measuring time reformed the earlier Julian calendar. The difference between the two calendars is small: a slight change to the leap year rule (century years are leap years only if they are divisible by 400), plus a one-off skip of ten days, to correct the drifting date of Easter. In software terms the Gregorian calendar would be considered a patch: it's Julian 1.1.
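The whole Julian 1.1 patch fits in a couple of lines. A Python sketch comparing the two leap year rules (the function names are my own):

```python
def is_julian_leap(year: int) -> bool:
    # Julian rule: every fourth year is a leap year.
    return year % 4 == 0


def is_gregorian_leap(year: int) -> bool:
    # Gregorian patch: century years are leap years only if divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# The rules only disagree on century years that aren't divisible by 400.
disagreements = [y for y in range(1600, 2101)
                 if is_julian_leap(y) != is_gregorian_leap(y)]
print(disagreements)  # [1700, 1800, 1900, 2100]
```

That works out to three fewer leap days every 400 years, which is the drift the reform was meant to stop.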
The problem of dating historical events with ISO8601 is really one of practicality. The reform came into effect, in Catholic countries at least, in 1582. The thing is (rather obviously) a lot of history happened before then. These sorts of dating problems aren't new to historians, but the conversion to Gregorian, and often back again, can be non-trivial.
For example, meet Tom. Tom is a historian studying the French Revolution. He has found a date on a document that uses the short-lived but completely wonderful French Republican calendar. The FRC was a metric calendar (10 days in a week, 10 hours in a day, etc.) adopted very briefly alongside the rest of the metric system. If Tom wanted to store information on the web about this document, he would have to convert the date from French Republican to Gregorian.
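A minimal Python sketch of the conversion Tom faces. It assumes the leap years that were actually observed while the calendar was in use (years III, VII and XI had a sixth complementary day), treats the complementary days at the end of the year as a hypothetical 13th month, and covers only years I to XIV. `frc_to_gregorian` is an illustrative name, not an existing library function.

```python
from datetime import date, timedelta

FRC_EPOCH = date(1792, 9, 22)  # 1 Vendemiaire, Year I
SEXTILE_YEARS = {3, 7, 11}     # assumption: leap years as observed in practice
LAST_YEAR = 14                 # the calendar was abolished at the start of 1806


def year_length(year: int) -> int:
    return 366 if year in SEXTILE_YEARS else 365


def frc_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a French Republican date (years I-XIV) to Gregorian.

    Months 1-12 (Vendemiaire..Fructidor) have 30 days each; month 13 stands
    for the 5 (or 6) complementary days at the end of the year.
    """
    if not 1 <= year <= LAST_YEAR:
        raise ValueError("only years I-XIV are supported in this sketch")
    day_of_year = (month - 1) * 30 + (day - 1)  # zero-based
    if not (1 <= month <= 13 and 1 <= day <= 30) or day_of_year >= year_length(year):
        raise ValueError("invalid French Republican date")
    # Days elapsed before 1 Vendemiaire of the requested year.
    days_before = sum(year_length(y) for y in range(1, year))
    return FRC_EPOCH + timedelta(days=days_before + day_of_year)


print(frc_to_gregorian(2, 11, 9))  # 9 Thermidor II -> 1794-07-27
print(frc_to_gregorian(8, 2, 18))  # 18 Brumaire VIII -> 1799-11-09
```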
The Proleptic Gregorian Calendar
The common response of spec writers to the problem of historical dating is to use the proleptic Gregorian calendar. A proleptic calendar is one whose rules are extended backwards to dates before the calendar was introduced. Now this sounds like a great idea until you actually try to use it.
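Python's own `date` type is a proleptic Gregorian calendar, so it shows the idea directly. A sketch of converting an older Julian date into it needs only the standard integer formula for a Julian-calendar Julian Day Number (JDN) and the fixed offset between JDN and Python's ordinal day count. The function names are my own, and the sketch only works for results on or after Python's minimum date of 1 January AD 1.

```python
from datetime import date, timedelta

# JDN minus Python's proleptic Gregorian ordinal (0001-01-01 has ordinal 1).
JDN_OFFSET = 1721425


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a Julian-calendar date (astronomical years)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_proleptic_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian-calendar date to Python's proleptic Gregorian date."""
    return date.fromordinal(julian_to_jdn(year, month, day) - JDN_OFFSET)


# Historically, Julian 4 October 1582 was followed by Gregorian 15 October 1582,
# but the proleptic calendar happily supplies the ten "missing" days.
print(julian_to_proleptic_gregorian(1582, 10, 4))  # 1582-10-14
print(date(1582, 10, 4) + timedelta(days=1))       # 1582-10-05
```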
Meet Tom again. This time he is studying the life of Julius Caesar. Caesar was born when one of many variants of the Roman calendar was in effect, and he died in only the second year of his eponymous Julian calendar. Dates in his life require conversion from at least two different calendar systems into the proleptic Gregorian. To add to the complexity, all of those dates are BC in the Julian/Gregorian calendars. While most implementations of proleptic Gregorian (including ECMAScript) happily count through zero and on into negative numbers, the proper BC/AD (or BCE/CE if you prefer) system has no year zero: 1 BC is followed directly by AD 1. Conversions like this are complex and error-prone. I really don't think the proleptic Gregorian is a very practical storage solution.
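The year-zero mismatch is easy to get wrong in code. A small Python sketch of the mapping between historical BC/AD years and the astronomical numbering that proleptic implementations use (1 BC = 0, 2 BC = -1); the helper names are hypothetical.

```python
def to_astronomical(year: int, era: str) -> int:
    """Historical year (no year zero) -> astronomical year (1 BC = 0, 2 BC = -1)."""
    if year < 1:
        raise ValueError("historical years start at 1")
    if era == "AD":
        return year
    if era == "BC":
        return 1 - year
    raise ValueError("era must be 'AD' or 'BC'")


def from_astronomical(year: int) -> tuple[int, str]:
    """Astronomical year -> (historical year, era)."""
    return (year, "AD") if year > 0 else (1 - year, "BC")


def year_difference(start: tuple[int, str], end: tuple[int, str]) -> int:
    """Difference in year numbers, computed safely across the BC/AD boundary."""
    return to_astronomical(*end) - to_astronomical(*start)


print(to_astronomical(44, "BC"))                # -43
print(year_difference((1, "BC"), (1, "AD")))    # 1
print(year_difference((44, "BC"), (14, "AD")))  # 57
```

Treating 44 BC naively as -44 would give 58 for the last case, an off-by-one that silently creeps into every calculation crossing the boundary.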
In case I haven't rammed the point home enough, consider one more example. Meet Dick. He's a geologist studying the Cretaceous-Tertiary boundary. What is 65.5 million years ago in Gregorian? It's all a bit fuzzy. Don't even bother asking Harry the cosmologist.
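Part of the fuzziness is that common date types can't even hold the value. A quick Python check against two limits: ECMAScript `Date` values are restricted to 100,000,000 days either side of 1 January 1970, and Python's `datetime` stops at year 1. The 365.2425-day mean Gregorian year is used to turn days into years; whether ActionScript's `Date` shares the ECMAScript limit is not something this sketch assumes.

```python
import datetime

MS_PER_DAY = 86_400_000
MEAN_GREGORIAN_YEAR_DAYS = 365.2425  # average over the 400-year Gregorian cycle

# ECMAScript Date values may lie at most 8.64e15 ms (100,000,000 days)
# either side of 1970-01-01T00:00:00Z.
ECMASCRIPT_LIMIT_YEARS = 8.64e15 / MS_PER_DAY / MEAN_GREGORIAN_YEAR_DAYS

K_T_BOUNDARY_YEARS_AGO = 65.5e6

print(f"ECMAScript Date reaches back about {ECMASCRIPT_LIMIT_YEARS:,.0f} years")
print(f"65.5 Ma is {K_T_BOUNDARY_YEARS_AGO / ECMASCRIPT_LIMIT_YEARS:.0f}x further back")
print(f"Python's datetime starts at year {datetime.MINYEAR}")
```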
The revelation
Once you realise there is no practical support for historical events prior to 1582 it quickly becomes obvious why we don't yet have a deep-zoom for history. No online support = no online data. To pull this off I need a model of time (and space, but I'll get into that later) that goes back beyond the common Gregorian calendar and in fact beyond any calendar into the timescales of geology and cosmology.