Multivariate Time Series Database Management
Much of our research centers on studying structure within many time series. Unfortunately, most common statistical models operate on univariate time series data.
A multivariate time series consists of multiple time series, all on the same scale and all observed over the same time period. The data we receive often does not arrive in that form. For example, energy usage in San Francisco may not be on the same scale as energy usage in Palo Alto. The periods can differ too: one time series spans 2009-2011, and another spans 2010-2011. Perhaps a third series is introduced which spans 2008-2010. Only the year 2010 has data for all three series. As the number of time series grows, it becomes difficult to tell how much data is available and over what period.
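The overlap described above can be checked mechanically. The following is a minimal sketch, not part of our script: the series names and both helper functions are hypothetical, and only the year spans come from the example. It finds the window covered by every series and counts how many series have data in each year.

```python
from collections import Counter

# Hypothetical series names; the year spans are the ones from the example above.
spans = {
    "series_a": (2009, 2011),
    "series_b": (2010, 2011),
    "series_c": (2008, 2010),
}


def common_window(spans):
    """Return (start, end) of the years covered by every series, or None."""
    start = max(first for first, _ in spans.values())
    end = min(last for _, last in spans.values())
    return (start, end) if start <= end else None


def coverage_by_year(spans):
    """Count how many series have data in each year."""
    counts = Counter()
    for first, last in spans.values():
        counts.update(range(first, last + 1))
    return dict(sorted(counts.items()))


print(common_window(spans))     # (2010, 2010)
print(coverage_by_year(spans))  # {2008: 1, 2009: 2, 2010: 3, 2011: 2}
```

The per-year count is the more useful of the two when there are many series, since it shows how much is lost by insisting on complete coverage.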
Frequently, each time series is stored in its own SQL table. We could join the tables on the time index to view all the series together, but with many series this is slow and memory intensive. Instead, we have implemented a Python script that reads the data from a MySQL database and builds a 2D NumPy array representing the multivariate time series matrix. You can download the Python script here.
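To illustrate the idea, here is a rough sketch of the approach, not the script itself. It assumes each table has a time column `t` and a value column `value`, which are hypothetical names, and it uses an in-memory SQLite database from the Python standard library as a stand-in for MySQL so that the example runs on its own. The table names and numbers are made up. Each table is read with one simple query and the series are aligned in memory, with NaN marking missing observations.

```python
import sqlite3

import numpy as np


def build_matrix(conn, tables):
    """Return (times, matrix): one row per time stamp, one column per table.

    Observations that a series lacks are left as NaN.
    """
    series = []
    for table in tables:
        # Table names cannot be bound as query parameters, so they must
        # come from a trusted list, never from user input.
        rows = conn.execute(f"SELECT t, value FROM {table}").fetchall()
        series.append(dict(rows))

    # Union of all time stamps, sorted, defines the row index.
    times = sorted(set().union(*series))
    row_of = {t: i for i, t in enumerate(times)}

    matrix = np.full((len(times), len(tables)), np.nan)
    for col, data in enumerate(series):
        for t, value in data.items():
            matrix[row_of[t], col] = value
    return times, matrix


# Demo with made-up values; SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
demo = {
    "usage_sf": [(2009, 1.0), (2010, 2.0), (2011, 3.0)],
    "usage_pa": [(2010, 20.0), (2011, 30.0)],
}
for table, rows in demo.items():
    conn.execute(f"CREATE TABLE {table} (t INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

times, matrix = build_matrix(conn, list(demo))

# Rows where every series has an observation.
complete = ~np.isnan(matrix).any(axis=1)
print(times)
print(matrix)
print(complete)
```

Keeping the NaN entries rather than dropping incomplete rows leaves the choice of window to the analysis step, where the `complete` mask can select the fully observed period.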