Replacing High Date with database function returning current timestamp.

silverkiwi Thu Apr 26, 2012 10:02 pm

I'd like thoughts on replacing the high date (9999-12-31) with a database function returning the current date at the time of query (e.g. date(now) or current(date)) for usage in SCD tables

A high date has been traditionally used to signify something that is still current, quite often in dimension tables, however falls flat when users query with dates in the future.

As we only know at the time of running the query information up until right now - at any point in time the high date may be replaced.

There a number of situations where the high date when returned to the user is non-sense - as we can't tell the future.

By replacing the highday with a current timestamp function :

1) Dimension tables would become real-time in relation to the time dimension.
2) Ensures a user never gets to see rubbish data like 9999-12-31 - instead would recieve information stamped at the time of running the query.
3) Ensures users can't return information about the future which we don't know yet.

What are your thoughts on this?

ngalemmo Fri Apr 27, 2012 12:58 am

1) Dimension tables would become real-time in relation to the time dimension.

Huh? Its a table. What does that have to do with real time? Besides, you can't store a function. And, if you are using a function (virtual column) in lieu of an expiration date (physical column), how would you ever know the row expired?

2) Ensures a user never gets to see rubbish data like 9999-12-31 - instead would recieve information stamped at the time of running the query.

Why is it rubbish? If you show expiration date on a report, how is a user going to know what the current timestamp means when they look at the report a couple of days later? Did everything expire? Why not just print nothing in that column if it hasn't expired? It's easy enough to do in the report logic.

3) Ensures users can't return information about the future which we don't know yet.

If we are talking about an expiration date, nobody's foretelling the future. All it means is it hasn't expired yet. But, as I said, you can display anything you want... but current timestamp would not be one of my choices. As far as the table goes, the reason a high date is used is so that you can do point in time analysis using BETWEEN in your query without the need for complex predicate expressions.

silverkiwi Fri Apr 27, 2012 6:54 am

To clarify - I was thinking the expiration date would be coalesce(expiration_date, date(now)). So the column would hold an expiration date if populated - however the current date (at time of running the query) if not.

This would prevent reporting things in the future. E.g. How many customers have gold status in June of 2013? Which is impossible to tell - however using 9999-12-31 would return a result.

I think the idea the dimension tables using 9999-12-31 for the expiration date are not accurate. The expiration date is updating itself at the same pace as time moves forward - it's not static until it is actually updated.

How to tell what's the most current record? I'd suggest using another flag (current_flag) - not double loading the expiration date with two meanings.

Replacing the expiration date with nothing would make time analysis more difficult as your pointed out (using BETWEEN).

I think the expiration date is constantly updating as time moves forward - (hence the reference to real time) until it does get updated with a permanent expiration date which replaces the current date function.

umutiscan Fri Apr 27, 2012 7:46 am

If you refer to "valid_to" columns used in SCDs by expiration date, these columns are not generally visible to reporting user. This column is only used for findig the picture of a dimension in a given time.

If we are talking about an attribute in a dimension (like birth date), your solution would have much more impact on the reports. If you don't know the birth date, you have to use null or a constant value which is known by the users.

I think "How many customers have gold status in June of 2013?" is not a so meaningfull question in a data warehouse if you don't have any prediction data. But if I don't know the expiration date, I can say "Yes, this customer will have gold status if nothing changes."

Jeff Smith Fri Apr 27, 2012 10:57 am

That would be something for a report but not stored in the Dimension Table. You might be able to add such a query object in the query tool, but it's not something that belongs in the dimension table.

Sponsored content

Replacing High Date with database function returning current timestamp.

Replacing High Date with database function returning current timestamp.

Re: Replacing High Date with database function returning current timestamp.

Re: Replacing High Date with database function returning current timestamp.

Re: Replacing High Date with database function returning current timestamp.

Re: Replacing High Date with database function returning current timestamp.

Re: Replacing High Date with database function returning current timestamp.