Data Analyst Project 2

Analyzing the NYC Subway Dataset for Dependencies Between Weather Data and Subway Ridership

by Benjamin Söllner, benjamin.soellner@gmail.com

based on the Udacity.com Intro To Data Science Course

Illustration of a Weather Turnstile

Preface

This report is provided as a Python Notebook which is part of a GitHub repository containing this course's code completed with my personal solutions. The (runnable) code segments in this notebook generate output which is discussed in this report. For certain discussions, it might be beneficial to the reader's understanding to look at the code side-by-side with this report.

How to find the relevant code in the GitHub repository?: The import statements at the top of each code fragment guide you to the corresponding solution, i.e., the following line

from project_a.topic.file import f

would reference the file project_a/topic/file.py, and more precisely, function f(...), in the GitHub repository. Note how this also ties nicely into the course's structure and guides you directly to the respective topic in the project_a part of the Intro To Data Science Course.

References (0.)

There were two forum posts which helped me out very much during this project:

The first forum post / problem could only be solved after a coach appointment with Carl, who swiftly helped me with this Gist code snippet.

Additionally, here are a few more useful references:

Visualization (3.)

Visualizations were created during Problem Set 4 of the course. The following sections reference the code which was created to produce a few visualizations and then describe them in detail according to the questions from the project rubric:

Visualization 1: Histogram of Hourly Riders on Rainy vs. Not-Rainy Days

The following visualization is drawn from Problem 3.1 and shows a histogram of hourly rides on rainy (green) vs. non-rainy (blue) days. Depicted is the number of days (y-axis) on which the hourly average number of subway rides falls within a specific range (x-axis).

In [1]:
%matplotlib inline
from project_3.plot_histogram.plot_histogram import entries_histogram
import pandas as pd
from ggplot import * 

turnstile_weather = pd.read_csv("project_3/plot_histogram/turnstile_data_master_with_weather.csv")
plt = entries_histogram(turnstile_weather)
print plt
<module 'matplotlib.pyplot' from 'C:\Unmanaged Programs\Anaconda\lib\site-packages\matplotlib\pyplot.pyc'>

Interpretation

We can see a non-normal distribution of the data. For non-rainy days, there are more days with a lower hourly average than for rainy days.

Visualization 2: Subway Entries by Day of Week

The next visualization is taken from Problem 4.1 and shows the average number of subway entries per hour (y-axis) for each weekday (x-axis).

In [1]:
%matplotlib inline
from project_4.exercise_1.data_visualization import plot_weather_data
import pandas as pd
from ggplot import * 

turnstile_weather = pd.read_csv("project_4/exercise_1/turnstile_data_master_with_weather.csv")
turnstile_weather['datetime'] = turnstile_weather['DATEn'] + ' ' + turnstile_weather['TIMEn']
gg =  plot_weather_data(turnstile_weather)
print gg
<ggplot: (32947260)>

Interpretation

There are far fewer subway rides (entries) on weekends (Saturday and Sunday) than on working days, with Monday being the working day on which the fewest people enter the subway.
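The per-weekday averaging behind this plot can be sketched without any plotting library. This is only an illustration: the rows, the date format "%Y-%m-%d" and the helper name average_by_weekday are assumptions and are not taken from the repository code.

```python
from datetime import datetime

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def average_by_weekday(rows, date_format="%Y-%m-%d"):
    """Average hourly entries per weekday; rows are (date string, entries) pairs."""
    totals = {}
    for date_string, entries in rows:
        # datetime.weekday() returns 0 for Monday up to 6 for Sunday
        name = WEEKDAY_NAMES[datetime.strptime(date_string, date_format).weekday()]
        total, count = totals.get(name, (0.0, 0))
        totals[name] = (total + entries, count + 1)
    return {name: total / count for name, (total, count) in totals.items()}

# Hypothetical rows: two readings on a Sunday, one on a Monday
rows = [("2011-05-01", 100.0), ("2011-05-01", 300.0), ("2011-05-02", 1000.0)]
by_weekday = average_by_weekday(rows)
```

Plotting by_weekday with the weekday on the x-axis gives the kind of bar chart described in this section.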

Visualization 3: 10 Stations With the Lowest & Highest Number of Entries vs. Exits

The last visualization is taken from Problem 4.2 and shows the ratio between entries & exits on various stations. Depicted are only the first 10 and the last 10 stations where that ratio is smallest and biggest, respectively. The y-axis is cut at 100 since unit "R070" has a much higher ratio than all the other units. The output of the program code below also shows the raw data as print-out.

In [1]:
%matplotlib inline
from project_4.exercise_2.data_visualization import plot_weather_data
import pandas as pd
from ggplot import * 

turnstile_weather = pd.read_csv("project_4/exercise_2/turnstile_data_master_with_weather.csv")
turnstile_weather['datetime'] = turnstile_weather['DATEn'] + ' ' + turnstile_weather['TIMEn']
gg =  plot_weather_data(turnstile_weather)
print gg
    index  UNIT  EXITSn_hourly  ENTRIESn_hourly         ratio
0     325  R338     112.908571        25.508571      0.225922
1     439  R454     710.897590       178.054217      0.250464
2     440  R455     437.702381       141.119048      0.322409
3     323  R336     323.753086       114.061728      0.352311
4      40  R042    1760.792350       682.508197      0.387614
5     441  R456     598.080214       267.582888      0.447403
6     251  R263     300.174129       176.049751      0.586492
7      12  R013    3871.911458      2491.994792      0.643608
8      30  R032    6020.570681      4144.047120      0.688315
9     403  R418      49.386364        37.068182      0.750575
10    304  R317     142.268156       753.363128      5.295374
11    391  R405     130.713483       750.196629      5.739244
12    293  R306      39.723757       327.276243      8.238804
13    328  R341      58.861878       531.270718      9.025718
14    331  R344      24.304878       510.274390     20.994732
15    442  R459       3.373494        87.819277     26.032143
16    174  R185      28.191011      1046.129213     37.108609
17    449  R469      11.338889       518.588889     45.735424
18    448  R468       9.664835       512.609890     53.038658
19     68  R070       0.025641      1647.912821  64268.600000
<ggplot: (29660315)>

Interpretation

The 3 stations with the most exits (vs. entries) are:

  1. R338 - Beach 36 St - A beachy station close to Rockaway Park on Long Island
  2. R454 - Prospect Ave in Brooklyn - A major junction on the Brooklyn riverside
  3. R455 - 25 St in Brooklyn - A station just south of Prospect Ave

The 3 stations with the most entries (vs. exits) are:

  1. R070 - St Georges on Staten Island - the station on the banks of Staten Island (the "overwhelming" winner)
  2. R468 - Roosevelt Island Tram - Manhattan Side
  3. R469 - Roosevelt Island Tram - Roosevelt Island Side

The last two items hint at a data error: since the Roosevelt Island Tram consists only of two stations, the number of people entering the tram summed up over those two stations must be similar to those who exit the tram. We can rule out a systematic difference in data collection like tram-exits not being counted at all, since the number of exits is non-zero. There seems to be something else at play here.
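As a rough sketch of the reasoning above, the ratio column and the tram plausibility check can be recomputed from a few rows of the print-out. The dictionary layout and variable names are assumptions made for this illustration; the numbers are copied from the table.

```python
# (EXITSn_hourly, ENTRIESn_hourly) per unit, copied from the print-out above
hourly = {
    "R338": (112.908571, 25.508571),
    "R454": (710.897590, 178.054217),
    "R469": (11.338889, 518.588889),
    "R468": (9.664835, 512.609890),
    "R070": (0.025641, 1647.912821),
}

# Ratio of entries to exits per unit, sorted from smallest to largest
ratios = {unit: entries / exits for unit, (exits, entries) in hourly.items()}
ranked = sorted(ratios.items(), key=lambda item: item[1])

# Plausibility check: summed over both tram stations, entries and exits
# should be of similar size if every rider is counted on the way out
tram_units = ["R468", "R469"]
tram_exits = sum(hourly[unit][0] for unit in tram_units)
tram_entries = sum(hourly[unit][1] for unit in tram_units)
tram_ratio = tram_entries / tram_exits
```

Summed over both tram stations, entries outnumber exits by a factor of roughly 49, which is what makes a data error plausible.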

Statistical Test (1.)

As a statistical test, a Mann-Whitney-U-Test has been performed in Problem 3.3 of this course to find out whether more people ride the subway when it is raining vs. when it is not raining. The following code returns the results of this test in the following format:

(mean_with_rain, mean_without_rain, U, p)
In [4]:
from project_3.mann_whitney_u_test.mann_whitney_u import mann_whitney_plus_means
import pandas as pd

input_filename = "project_3/mann_whitney_u_test/turnstile_data_master_with_weather.csv"
turnstile_master = pd.read_csv(input_filename)
print mann_whitney_plus_means(turnstile_master)
(1105.4463767458733, 1090.278780151855, 1924409167.0, 0.019309634413792565)

Methodology / Applicability (1.1, 1.2)

The Mann-Whitney-U-Test does not assume normally distributed data. Based on the histograms shown in Visualization 1, we can assume that the average number of subway riders is non-normally distributed (both for the subset of the data for rainy days and for the subset for non-rainy days), which makes this test a suitable choice. The p-Value returned by the test function used here is one-tailed, but we can perform a two-tailed test by multiplying the resulting p-Value by 2.

Our Null-Hypothesis is: "The distributions of ridership on rainy vs. non-rainy days do not differ". We choose $\alpha = p_\text{critical} = 0.05$ and a two-tailed test in order to test for both lower and higher ridership on rainy days.
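To make the mechanics of the test more tangible, here is a library-free sketch of a Mann-Whitney U computation. It is not the course's mann_whitney_plus_means function: the function name, the sample values and the use of a plain normal approximation (no tie or continuity correction) are assumptions of this illustration.

```python
import math

def mann_whitney_u(sample_a, sample_b):
    """Return (U, one-tailed p, two-tailed p) using a normal approximation."""
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each of them the average rank
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2.0 + 1.0
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    u = min(u_a, n_a * n_b - u_a)
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Standard normal CDF at z (z <= 0 because u is the smaller U)
    p_one_tailed = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return u, p_one_tailed, min(1.0, 2.0 * p_one_tailed)

# Hypothetical hourly entries, not taken from the dataset
rainy = [1200.0, 950.0, 1430.0, 1100.0]
dry = [900.0, 1010.0, 870.0, 1050.0]
u, p_one, p_two = mann_whitney_u(rainy, dry)
```

The last returned value is the doubled one-tailed p-value, which is the two-tailed figure used for the decision in this report.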

Results / Significance / Interpretation (1.3, 1.4)

The test results above give us a $p_\text{one-tailed} = 0.0193$ and a $p_\text{two-tailed} = 0.0386$ with the sample means $\bar{x}_\text{rain} = 1105.45 \frac{\text{rides}}{\text{hour}}$ and $\bar{x}_\text{norain} = 1090.28 \frac{\text{rides}}{\text{hour}}$. Since $p_\text{two-tailed}$ is below our previously chosen $p_\text{critical}$, we can reject the Null-Hypothesis.
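The decision step is plain arithmetic and can be checked in a few lines. The variable names are chosen for this sketch only; the one-tailed p-value is the one printed by the test cell above.

```python
p_one_tailed = 0.019309634413792565  # last element of the tuple printed above
alpha = 0.05

# Doubling the one-tailed value gives the two-tailed p-value
p_two_tailed = 2 * p_one_tailed
reject_null = p_two_tailed < alpha
```

With these values p_two_tailed is about 0.0386, so reject_null is True.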

Therefore, there is a significant difference between the distribution of daily average rides per hour on rainy days vs. on non-rainy days: the number of people riding the subway is significantly larger ($\alpha = 0.05$) on rainy than on non-rainy days, and the distribution for rainy days is therefore shifted towards higher values.

Regression (2.)

The most promising Regression Model could be calculated using the library statsmodels (see References) and the improved dataset which was provided for this project. Therefore, the code of the regression function is not part of the website submission but of the GitHub repository. Because the output of statsmodels is excessively long, the listing of the output has been moved to the Appendix.

Approach (2.1)

I chose to use a model that is non-linear in some of its features (polynomial terms) but is still fitted with Ordinary Least Squares, implemented using statsmodels. In particular, in preparing my feature matrix (features) I use the same method I would use for a simple linear model. The only difference - marked below by (*) - is that I add additional columns to the matrix for those features which are not described by one coefficient ($a X$) but by a polynomial ($a_1 X + a_2 X^2 + ... + a_g X^g$) instead.

The Data Frame features is computed as follows:

  1. Start with the Data Frame of the CSV data reduced to the columns containing only the chosen features.
  2. Handle the numerical features:
    • Normalize them
    • (*) For all features that need to be treated as a non-linear / polynomial term: With $g$ being the chosen degree (= highest exponent) of the polynomial term, add $g-1$ Pandas Series to the Data Frame, each Series representing one of the higher powers of the feature, starting from $2$ and leading up to $g$.
  3. Handle the categorical features: For each categorical feature, create a Pandas Data Frame of dummy variables, drop the last Pandas Series to eliminate collinearity, then join the Data Frame of dummy variables with the original Data Frame.
  4. Add a constant term as an additional column to the Data Frame.
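The four steps above can be sketched without pandas as follows. This is a simplified illustration under stated assumptions, not the repository's implementation: build_features, its parameters, the toy values and the column naming (name_exp2, name_level) are made up for this sketch, and the sample standard deviation is assumed for normalization.

```python
import math

def normalize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / float(n)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]

def build_features(numeric, degrees, categorical):
    """numeric / categorical map column names to lists; degrees maps names to g."""
    features = {}
    for name, values in numeric.items():
        z = normalize(values)                                 # step 2: normalize
        features[name] = z
        for power in range(2, degrees.get(name, 1) + 1):      # step 2 (*): powers 2..g
            features["%s_exp%d" % (name, power)] = [v ** power for v in z]
    for name, labels in categorical.items():                  # step 3: dummy variables
        levels = sorted(set(labels))[:-1]                     # drop the last level
        for level in levels:
            features["%s_%s" % (name, level)] = [
                1.0 if label == level else 0.0 for label in labels]
    n_rows = len(next(iter(numeric.values())))
    features["const"] = [1.0] * n_rows                        # step 4: constant term
    return features

# Toy input, not taken from the dataset
toy = build_features(
    numeric={"hour": [0, 4, 8, 12, 16, 20],
             "meanprecipi": [0.0, 0.0, 0.1, 0.0, 0.2, 0.0]},
    degrees={"hour": 4},
    categorical={"conds": ["Clear", "Clear", "Rain", "Clear", "Heavy Rain", "Clear"]},
)
```

Note that the higher powers are taken from the already normalized column, which matches the order of the steps in the list.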

We then prepare another Data Frame values containing only the one column with the response variable. With both variables in place, we can easily use the statsmodels.api (sm) to calculate the model, fit it to the dataset and calculate the predicted values for the features (prediction).

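import statsmodels.api as sm

result = sm.OLS(values, features).fit()  # fit the response (values) on the feature matrix
prediction = result.predict(features)    # in-sample predictions for the fitted rows
print result.summary()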
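To show what "fit" and "predict" amount to, here is a library-free sketch of Ordinary Least Squares for the simplest case of one feature plus a constant. It is only meant as an illustration: fit_simple_ols and the toy data are assumptions, and the actual model above is fitted by statsmodels on the full feature matrix.

```python
def fit_simple_ols(x, y):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = float(len(x))
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy data generated from y = 1 + 2 * x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_ols(x, y)
toy_prediction = [intercept + slope * xi for xi in x]
```

For this toy data the fit recovers the intercept 1 and the slope 2, so the predictions coincide with y.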

Chosen Features / Coefficients (2.2, 2.3, 2.4)

The following features were systematically chosen for the regression model:

  • Hour of day (hour) was chosen as a polynomial term of degree 4 with 4 coefficients ($a_1 X + a_2 X^2 + a_3 X^3 + a_4 X^4$). My intuition was that there would be three extrema in the ridership of subways: a climb to a maximum in the morning rush hour, a minimum during the day and another maximum in the evening rush hour.
  • Mean Temperature (meantempi) was chosen as a polynomial term of degree 2 ($a_1 X + a_2 X^2$) because of the assumption that there is a minimum where fewer people take the subway - when the temperature is just right to be outside. If it's too cold or too hot, people might use the subway more often. This model yielded a higher $R^2$ value than a simple linear term.
  • Mean Precipitation (meanprecipi) as a linear term with the assumption that people take the subway more often when it rains.
  • Mean Pressure (meanpressurei) as a linear term, guessing that high pressure weather conditions (generally nicer weather) mean that people might decide to stay out more.
  • Mean Windspeed (meanwspdi) as a linear term assuming that storms make people take the subway more often.
  • Weekend or not (is_weekend) as a linear / quasi-categorical term assuming that there is a significant difference between the number of people taking the subway on weekends vs. on other days.
  • Weather Conditions (conds) as a (linear) categorical term using dummy variables, assuming that there is a significant difference in the number of people taking the subway during different weather conditions
  • Turnstile Unit (UNIT) as a (linear) categorical term using dummy variables, assuming that there is a more complex significant difference between the riderships on different subway stations. This might overfit the model and leaves little possibility to add additional subway stations (see Conclusion). I decided to use this term anyway since using "latitude & longitude" instead, even as a polynomial term, I only got to about $R^2 = 0.28$.
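The intuition behind the two polynomial degrees can be made explicit with a short derivation (a sketch that uses the same symbols as the list above). For the hour term,

$$f(X) = a_1 X + a_2 X^2 + a_3 X^3 + a_4 X^4, \qquad f'(X) = a_1 + 2 a_2 X + 3 a_3 X^2 + 4 a_4 X^3$$

Since $f'$ is a cubic, it has at most three real roots, so a polynomial of degree 4 can model at most three extrema (for example maximum, minimum, maximum). For the temperature term,

$$f(X) = a_1 X + a_2 X^2, \qquad f'(X) = a_1 + 2 a_2 X = 0 \;\Rightarrow\; X^* = -\frac{a_1}{2 a_2}$$

which gives exactly one extremum; it is a minimum only if $a_2 > 0$.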

The Coefficients (or Parameters, Weights) can be found in the regression output in the Appendix, in the coef column.

$R^2$ & Interpretation of $R^2$ (2.5 & 2.6)

With $R^2 = 0.528$, we have a rather high value compared to $R^2 = 0.40$ from Problem Set 3. $R^2$ indicates how much of the total variability in the dataset is explained by our model, which is about 53%. For a model representing human behaviour based solely on time of day, weekend/non-weekend and weather, this is a reasonable value for $R^2$.

It should be noted that $R^2$ does not signify whether our model is overfitted / biased (and might therefore not represent data outside the dataset). Therefore, additional tests would be required with a training set and a test set.

$R^2$ alone also does not show where the model fits well and where it does not. To assess this, we would have to compare individual predicted values with values from the dataset to look for potential systematic over- or underpredictions.
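Both ideas of this section, $R^2$ as the share of explained variability and the inspection of residuals for systematic over- or underprediction, can be sketched in a few lines. The helper names, the toy values and the unit labels are assumptions of this illustration and are not taken from the repository or the dataset.

```python
def r_squared(observed, predicted):
    """1 - SS_res / SS_tot, the share of variability explained by the predictions."""
    mean_obs = sum(observed) / float(len(observed))
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def mean_residual_by_group(observed, predicted, groups):
    """Average of (observed - predicted) per group; far from 0 hints at bias."""
    totals = {}
    for o, p, g in zip(observed, predicted, groups):
        total, count = totals.get(g, (0.0, 0))
        totals[g] = (total + (o - p), count + 1)
    return {g: total / count for g, (total, count) in totals.items()}

# Toy values with hypothetical unit labels
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [120.0, 220.0, 280.0, 380.0]
units = ["unit_a", "unit_a", "unit_b", "unit_b"]

r2 = r_squared(observed, predicted)
bias = mean_residual_by_group(observed, predicted, units)
```

In this toy example $R^2$ is high although unit_a is overpredicted and unit_b is underpredicted by 20 each, which is the kind of pattern a single $R^2$ value hides.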

Conclusion (4.)

A Mann-Whitney-U-Test was conducted (see Statistical Test) to compare the number of subway rides in rainy and non-rainy conditions. There was a significant difference ($\alpha = 0.05$) in the average number of subway rides per hour during rainy ($\bar{x}_\text{rain} = 1105.45 \frac{\text{rides}}{\text{hour}}$) vs. during non-rainy ($\bar{x}_\text{norain} = 1090.28 \frac{\text{rides}}{\text{hour}}$) conditions; $U = 1924409167.0$, $p_\text{one-tailed} = 0.0193$, $p_\text{two-tailed} = 0.0386$.

After controlling for other factors (type of rain conditions, subway unit etc.) using a regression model (see Regression), we could furthermore conclude that the rain-related coefficients predicting the number of subway rides are statistically significant in our regression model. Although we cannot expect to eliminate 100% of the collinearity between, e.g., weather conditions (conds parameter) and numerical weather data like precipitation (meanprecipi), the results of this model give us a good idea that the coefficients taking rain into account have a significant impact. As the tabulated results below show, the respective p-Value (P>|t|) is very low (reported as 0.000) for all rainy conditions except "Light Rain", suggesting that those conditions do indeed have significance.

                             coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------------
[...]
meanprecipi               63.6329     13.523      4.706      0.000        37.128    90.138
[...]
conds_Heavy Rain        -760.9784    124.282     -6.123      0.000     -1004.574  -517.383
conds_Light Drizzle     -728.2438    118.055     -6.169      0.000      -959.634  -496.853
conds_Light Rain          58.2079     55.368      1.051      0.293       -50.315   166.731
[...]
conds_Rain              -521.3172     84.996     -6.133      0.000      -687.912  -354.723

The coef value describes the impact of rainy conditions on subway ridership.

  • "Heavy rain" (conds_Heavy Rain == 1.0) reduces the number of riders by 760.9784, but the value of meanprecipi, presumably positive for heavy rain, will increase the number of predicted riders again by 63.6329 per (normalized) precipitation unit
  • "Rain" (conds_Rain == 1.0) and "Light Drizzle" (conds_Light Drizzle == 1.0) likewise reduce the number of riders by 521.3172 and 728.2438 respectively, mitigated, again, by an increased value of meanprecipi

In order to gauge which of the two factors (conds_* and meanprecipi) has a higher effect, we need to look at the distribution of meanprecipi after normalization for the subset of data for each value of the conds_* feature:

# Summarize the normalized meanprecipi values for the rows of each rainy condition
rainy_conds = ["conds_Rain", "conds_Heavy Rain", "conds_Light Drizzle"]
analysis = pd.DataFrame({cond: features[features[cond] == 1.0]["meanprecipi"].describe()
                         for cond in rainy_conds})
print analysis
========= Analyzing mean precipitation data: ===========
       conds_Heavy Rain  conds_Light Drizzle  conds_Rain
count        288.000000           335.000000  961.000000
mean           1.311504            -0.223961    3.339870
std            0.373432             0.052187    2.731260
min            0.533254            -0.282527   -0.282527
25%            1.043118            -0.282527    0.839172
50%            1.552981            -0.180554    1.349036
75%            1.552981            -0.180554    6.141753
max            2.470736            -0.129568    9.353893

Based on this data:

  • For "heavy rain", the decrease by 760.9784 based on the conds_Heavy Rain term is offset on average by $1.311504 * 63.6329 = 83.45480$ from the meanprecipi term. Assuming normally distributed meanprecipi, with roughly 95% of data points falling within two standard deviations of the mean, we can bound the effect of the meanprecipi term by $83.45480 \pm 2 * 0.373432 * 63.6329$, i.e. an increase in subway ridership of $[35.93, 130.98]$, and, in conjunction with the conds_Heavy Rain feature, an overall decrease of subway ridership of $[630.00, 725.05]$ passengers
  • For "light drizzle", the decrease by 728.2438 based on the conds_Light Drizzle term is changed on average by $-0.223961 * 63.6329 = -14.25128$ by the meanprecipi term (i.e., in fact, further decreased by 14.25128). Taking two standard deviations around the mean (which covers roughly 95% of normally distributed data), we find a range for the effect of this coefficient of $[-20.89291, -7.60967]$ and, in conjunction with conds_Light Drizzle, conclude an overall decrease of subway ridership of $[735.8535, 749.1367]$ passengers.
  • For "rain", the decrease by 521.3172 based on the conds_Rain term is offset on average by $3.33987 * 63.6329 = 212.52561$ from the meanprecipi term. The two-standard-deviation range (roughly 95% of data points, assuming normally distributed meanprecipi data) for this value is $[-135.07, 560.12]$, which, in conjunction with the conds_Rain term, yields an overall change of subway ridership between a decrease of 656.39 and an increase of 38.80 passengers, with an average decrease of 308.79.
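The calculation pattern used in these bullets (mean plus/minus two standard deviations of the normalized meanprecipi, scaled by its coefficient and then combined with the conds_* coefficient) can be written down as a small helper. The function name and its parameters are assumptions of this sketch; the inputs in the example are the "light drizzle" values from the tables above.

```python
def precip_effect_range(mean_z, std_z, precip_coef, cond_coef):
    """Range of the meanprecipi effect and of the combined effect with a condition."""
    center = mean_z * precip_coef
    half_width = 2.0 * std_z * precip_coef   # two standard deviations
    precip_low, precip_high = center - half_width, center + half_width
    # Adding the condition coefficient gives the combined change in ridership
    return (precip_low, precip_high), (cond_coef + precip_low, cond_coef + precip_high)

# "Light drizzle": mean and std of normalized meanprecipi, coefficient of
# meanprecipi, coefficient of conds_Light Drizzle
precip_range, net_range = precip_effect_range(-0.223961, 0.052187, 63.6329, -728.2438)
```

Negative values in net_range denote a decrease in predicted hourly entries; the same function can be applied to the other rainy conditions by swapping in their values.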

Now, Do more People ride the Subway in Rainy Conditions?

No, in fact fewer people do. This came as a surprise to me - maybe more people generally prefer to stay indoors.

The previous analysis showed that rain generally leads to a decrease in the number of people using the subway. Depending on the type of rain, this effect seems to be more or less prominent, with "heavy rain" and "light drizzle" leading to even fewer subway riders than normal "rain".

Coming to this conclusion was not an easy process since there are cross-dependencies in the coefficients. How this potentially affected the results of this analysis is explained in the following chapter.

Reflection (5.): Shortcomings of Dataset / Analysis & Other Insights (5.1, 5.2)

There are a few potential shortcomings like:

  • Potential remaining collinearity of parameters: Parameters like meanprecipi are correlated with values like conds_Rain etc. In our example, however, I analyzed the correlation matrix and found the correlation between those values to be not all that high (except for conds_Rain). This might be due to poor data (subclassifications such as Heavy Rain being used inconsistently).

                           meanprecipi conds_Heavy Rain conds_Light Drizzle conds_Light Rain conds_Rain
      meanprecipi              1.00000          0.10814           -0.019928          0.06396    0.50710
      conds_Heavy Rain         0.10814          1.00000           -0.007337         -0.01771   -0.01252
      conds_Light Drizzle     -0.01993         -0.00734            1.000000         -0.01911   -0.01351
      conds_Light Rain         0.06396         -0.01771           -0.019107          1.00000   -0.03260
      conds_Rain               0.50710         -0.01252           -0.013509         -0.03260    1.00000
  • Potential overfitting of model: I did not separate the data into a training set and a test set to gauge potential overfitting of the model. One particular area where the model might be overfitted is the subway stations. One interesting experiment would be to remove one subway station entirely and test how well the model would predict the ridership of that new subway station. Although a model based on latitude and longitude coordinates (see Chosen Features) has a much lower $R^2$ value, based on such a test, this model might turn out to be better for a use case where the model should predict ridership of new subway stations.

  • Missing Data: Of course, additional data could be used to predict subway ridership even more accurately: what comes to mind is data about city events, tourism, financial data to try to model the impact of shopping trips etc. or, if attainable, other traffic data, e.g., from cars or tunnels.
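The station hold-out experiment mentioned in the list above could start from a split like the following. This is only a sketch: leave_one_unit_out, the row layout and the toy rows are assumptions, and the actual fitting and scoring of a model on the two parts is left out.

```python
def leave_one_unit_out(rows, held_out_unit, unit_key="UNIT"):
    """Split rows into a training part without the unit and a test part with only it."""
    train = [row for row in rows if row[unit_key] != held_out_unit]
    test = [row for row in rows if row[unit_key] == held_out_unit]
    return train, test

# Toy rows with made-up values
rows = [
    {"UNIT": "R011", "hour": 8, "ENTRIESn_hourly": 7000.0},
    {"UNIT": "R011", "hour": 20, "ENTRIESn_hourly": 6500.0},
    {"UNIT": "R012", "hour": 8, "ENTRIESn_hourly": 8400.0},
    {"UNIT": "R038", "hour": 8, "ENTRIESn_hourly": 90.0},
]
train_rows, test_rows = leave_one_unit_out(rows, "R012")
```

A model fitted on train_rows would then be scored on test_rows only. With UNIT dummy variables there is no coefficient for the held-out station, which is why a location-based model may be the better candidate for this use case.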

Appendix

Output of Regression

In [2]:
from project_3.other_linear_regressions.advanced_linear_regressions import predictions
import pandas as pd

input_filename = "project_3/other_linear_regressions/turnstile_weather_v2.csv"
turnstile_master = pd.read_csv(input_filename)
predictions(turnstile_master)
========= Feature vector before normalization: ===========
               hour     meantempi  meanpressurei   meanprecipi     meanwspdi  \
count  42649.000000  42649.000000   42649.000000  42649.000000  42649.000000   
mean      10.046754     63.103780      29.971096      0.004618      6.927872   
std        6.938928      6.939011       0.131158      0.016344      3.179832   
min        0.000000     49.400000      29.590000      0.000000      0.000000   
25%        4.000000     58.283333      29.913333      0.000000      4.816667   
50%       12.000000     60.950000      29.958000      0.000000      6.166667   
75%       16.000000     67.466667      30.060000      0.000000      8.850000   
max       20.000000     79.800000      30.293333      0.157500     17.083333   

      is_weekend  
count      42649  
mean   0.2855636  
std    0.4516877  
min        False  
25%            0  
50%            0  
75%            1  
max         True  
========= Feature vector after normalization: ===========
               hour     meantempi  meanpressurei   meanprecipi     meanwspdi  \
count  4.264900e+04  4.264900e+04   4.264900e+04  4.264900e+04  4.264900e+04   
mean   2.920228e-17 -6.118135e-13  -4.717182e-12  1.005557e-14 -3.489246e-14   
std    1.000000e+00  1.000000e+00   1.000000e+00  1.000000e+00  1.000000e+00   
min   -1.447883e+00 -1.974889e+00  -2.905620e+00 -2.825272e-01 -2.178691e+00   
25%   -8.714248e-01 -6.946878e-01  -4.404070e-01 -2.825272e-01 -6.639360e-01   
50%    2.814911e-01 -3.103871e-01  -9.985174e-02 -2.825272e-01 -2.393853e-01   
75%    8.579490e-01  6.287476e-01   6.778341e-01 -2.825272e-01  6.044748e-01   
max    1.434407e+00  2.406138e+00   2.456854e+00  9.353893e+00  3.193710e+00   

         is_weekend     hour_exp2     hour_exp3     hour_exp4  meantempi_exp2  
count  4.264900e+04  42649.000000  42649.000000  42649.000000    42649.000000  
mean   1.143934e-16      0.999977     -0.023130      1.683780        0.999977  
std    1.000000e+00      0.826948      1.799226      1.923422        1.229088  
min   -6.322146e-01      0.079237     -3.035289      0.006279        0.000066  
25%   -6.322146e-01      0.087005     -0.661743      0.007570        0.108321  
50%   -6.322146e-01      0.759381      0.022305      0.576660        0.482591  
75%    1.581704e+00      2.057523      0.631516      4.233401        1.426302  
max    1.581704e+00      2.096364      2.951325      4.394743        5.789501  
                            OLS Regression Results                            
==============================================================================
Dep. Variable:        ENTRIESn_hourly   R-squared:                       0.528
Model:                            OLS   Adj. R-squared:                  0.525
Method:                 Least Squares   F-statistic:                     182.2
Date:                Sat, 01 Aug 2015   Prob (F-statistic):               0.00
Time:                        17:26:07   Log-Likelihood:            -3.8530e+05
No. Observations:               42649   AIC:                         7.711e+05
Df Residuals:                   42388   BIC:                         7.734e+05
Df Model:                         260                                         
Covariance Type:            nonrobust                                         
==========================================================================================
                             coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------------
const                    681.7243    161.518      4.221      0.000       365.147   998.302
hour                    1967.5236     28.451     69.156      0.000      1911.760  2023.287
meantempi                -24.5567     18.238     -1.346      0.178       -60.303    11.190
meanpressurei            -20.4526     12.228     -1.673      0.094       -44.420     3.515
meanprecipi               63.6329     13.523      4.706      0.000        37.128    90.138
meanwspdi                -32.0245     15.414     -2.078      0.038       -62.237    -1.812
is_weekend              -452.0921     10.436    -43.320      0.000      -472.547  -431.637
hour_exp2              -1550.2715     56.817    -27.285      0.000     -1661.634 -1438.909
hour_exp3               -647.0255     15.768    -41.035      0.000      -677.930  -616.121
hour_exp4                809.1889     24.360     33.218      0.000       761.443   856.935
meantempi_exp2          -113.4626     13.790     -8.228      0.000      -140.491   -86.434
conds_Fog                677.0078    300.765      2.251      0.024        87.503  1266.513
conds_Haze               -91.0131     60.541     -1.503      0.133      -209.676    27.649
conds_Heavy Rain        -760.9784    124.282     -6.123      0.000     -1004.574  -517.383
conds_Light Drizzle     -728.2438    118.055     -6.169      0.000      -959.634  -496.853
conds_Light Rain          58.2079     55.368      1.051      0.293       -50.315   166.731
conds_Mist              1643.8253    418.743      3.926      0.000       823.080  2464.571
conds_Mostly Cloudy     -377.7322     35.110    -10.758      0.000      -446.549  -308.915
conds_Overcast          -169.8078     30.563     -5.556      0.000      -229.712  -109.904
conds_Partly Cloudy       -4.8383     48.697     -0.099      0.921      -100.286    90.610
conds_Rain              -521.3172     84.996     -6.133      0.000      -687.912  -354.723
conds_Scattered Clouds   -21.7502     42.806     -0.508      0.611      -105.651    62.151
UNIT_R004                322.3011    219.965      1.465      0.143      -108.834   753.436
UNIT_R005                329.4671    220.890      1.492      0.136      -103.481   762.416
UNIT_R006                458.6957    218.488      2.099      0.036        30.455   886.937
UNIT_R007                147.2483    221.553      0.665      0.506      -287.000   581.497
UNIT_R008                153.5040    221.859      0.692      0.489      -281.343   588.351
UNIT_R009                119.2868    219.960      0.542      0.588      -311.840   550.413
UNIT_R011               7065.5363    218.567     32.327      0.000      6637.140  7493.933
UNIT_R012               8410.9003    218.013     38.580      0.000      7983.590  8838.211
UNIT_R013               2309.3412    218.013     10.593      0.000      1882.031  2736.652
UNIT_R016                496.1102    218.570      2.270      0.023        67.708   924.512
UNIT_R017               3924.3358    218.013     18.000      0.000      3497.025  4351.646
UNIT_R018               7633.7119    216.906     35.194      0.000      7208.573  8058.851
UNIT_R019               3081.9226    216.774     14.217      0.000      2657.041  3506.804
UNIT_R020               6100.4326    218.013     27.982      0.000      5673.122  6527.743
UNIT_R021               4418.7476    218.575     20.216      0.000      3990.336  4847.159
UNIT_R022               9244.8250    218.013     42.405      0.000      8817.515  9672.136
UNIT_R023               5880.0025    218.013     26.971      0.000      5452.692  6307.313
UNIT_R024               3046.3988    217.047     14.036      0.000      2620.981  3471.816
UNIT_R025               5178.8151    216.774     23.890      0.000      4753.934  5603.696
UNIT_R027               2694.4379    218.013     12.359      0.000      2267.127  3121.748
UNIT_R029               6956.4111    218.013     31.908      0.000      6529.101  7383.722
UNIT_R030               2826.6046    218.013     12.965      0.000      2399.294  3253.915
UNIT_R031               4078.3734    218.013     18.707      0.000      3651.063  4505.684
UNIT_R032               4177.0510    218.280     19.136      0.000      3749.218  4604.884
UNIT_R033               7961.3842    218.013     36.518      0.000      7534.074  8388.695
UNIT_R034                881.5682    222.710      3.958      0.000       445.052  1318.084
UNIT_R035               2531.7212    218.551     11.584      0.000      2103.356  2960.086
UNIT_R036                611.5150    219.221      2.789      0.005       181.837  1041.193
UNIT_R037                703.9129    217.471      3.237      0.001       277.666  1130.160
UNIT_R038                 73.1697    219.834      0.333      0.739      -357.709   504.048
UNIT_R039                611.7135    222.734      2.746      0.006       175.150  1048.277
UNIT_R040               1107.5742    217.186      5.100      0.000       681.886  1533.263
UNIT_R041               2826.4164    218.013     12.964      0.000      2399.106  3253.727
UNIT_R042                333.6930    219.679      1.519      0.129       -96.881   764.268
UNIT_R043               2613.5508    218.013     11.988      0.000      2186.240  3040.861
UNIT_R044               4404.9595    218.013     20.205      0.000      3977.649  4832.270
UNIT_R046               8071.4111    218.013     37.023      0.000      7644.101  8498.722
UNIT_R049               2499.2229    218.013     11.464      0.000      2071.912  2926.533
UNIT_R050               3753.8488    218.570     17.175      0.000      3325.447  4182.251
UNIT_R051               4860.9648    218.013     22.297      0.000      4433.654  5288.275
UNIT_R052               1103.8626    220.770      5.000      0.000       671.148  1536.577
UNIT_R053               3033.4014    217.185     13.967      0.000      2607.714  3459.089
UNIT_R054               1192.7083    218.571      5.457      0.000       764.305  1621.112
UNIT_R055               8189.3920    216.629     37.804      0.000      7764.794  8613.990
UNIT_R056               1178.6015    218.567      5.392      0.000       750.205  1606.998
UNIT_R057               4609.3519    218.013     21.143      0.000      4182.041  5036.662
UNIT_R058                372.7363    218.269      1.708      0.088       -55.076   800.548
UNIT_R059                933.8524    220.582      4.234      0.000       501.508  1366.197
UNIT_R060                538.7767    219.672      2.453      0.014       108.215   969.338
UNIT_R061                364.2453    223.716      1.628      0.103       -74.243   802.733
UNIT_R062               2463.2121    218.013     11.298      0.000      2035.902  2890.523
UNIT_R063                919.4435    223.387      4.116      0.000       481.600  1357.287
UNIT_R064                587.8751    220.884      2.661      0.008       154.938  1020.812
UNIT_R065                599.4163    222.052      2.699      0.007       164.191  1034.642
UNIT_R066                 17.6752    222.709      0.079      0.937      -418.840   454.190
UNIT_R067                756.6749    222.396      3.402      0.001       320.774  1192.576
UNIT_R068                351.1897    222.398      1.579      0.114       -84.714   787.094
UNIT_R069                815.6401    220.453      3.700      0.000       383.548  1247.732
UNIT_R070               1514.1906    218.013      6.945      0.000      1086.880  1941.501
UNIT_R080               3338.5186    218.013     15.313      0.000      2911.208  3765.829
UNIT_R081               3288.2409    218.557     15.045      0.000      2859.866  3716.616
UNIT_R082               1236.3030    218.573      5.656      0.000       807.895  1664.711
UNIT_R083               2852.9272    218.013     13.086      0.000      2425.617  3280.238
UNIT_R084               9757.2014    218.013     44.755      0.000      9329.891  1.02e+04
UNIT_R085               2338.6799    218.551     10.701      0.000      1910.315  2767.045
UNIT_R086               2326.5186    218.013     10.671      0.000      1899.208  2753.829
UNIT_R087                967.2519    219.137      4.414      0.000       537.738  1396.765
UNIT_R089                249.8982    218.570      1.143      0.253      -178.503   678.300
UNIT_R090                321.9616    222.397      1.448      0.148      -113.941   757.864
UNIT_R091               1014.6858    221.409      4.583      0.000       580.720  1448.651
UNIT_R092               1871.1428    219.530      8.523      0.000      1440.860  2301.426
UNIT_R093               1908.9978    220.143      8.672      0.000      1477.512  2340.483
UNIT_R094               1631.2878    217.186      7.511      0.000      1205.599  2056.977
UNIT_R095               2049.9760    218.042      9.402      0.000      1622.609  2477.342
UNIT_R096               2218.9656    216.907     10.230      0.000      1793.824  2644.107
UNIT_R097               2842.7498    216.906     13.106      0.000      2417.611  3267.889
UNIT_R098               1561.0777    218.013      7.160      0.000      1133.767  1988.388
UNIT_R099               2118.1046    218.013      9.715      0.000      1690.794  2545.415
UNIT_R100                432.9019    217.756      1.988      0.047         6.095   859.709
UNIT_R101               2552.4971    218.013     11.708      0.000      2125.187  2979.808
UNIT_R102               3441.3358    218.013     15.785      0.000      3014.025  3868.646
UNIT_R103               1295.3176    220.770      5.867      0.000       862.605  1728.030
UNIT_R104               1218.3243    217.187      5.610      0.000       792.634  1644.015
UNIT_R105               3091.0347    218.013     14.178      0.000      2663.724  3518.345
UNIT_R106               1001.4545    222.395      4.503      0.000       565.557  1437.352
UNIT_R107                417.5365    222.731      1.875      0.061       -19.020   854.093
UNIT_R108               4980.2014    218.013     22.844      0.000      4552.891  5407.512
UNIT_R111               2977.3250    218.013     13.657      0.000      2550.015  3404.636
UNIT_R112               1560.6520    216.907      7.195      0.000      1135.510  1985.794
UNIT_R114                754.0904    217.047      3.474      0.001       328.674  1179.507
UNIT_R115               1129.4118    216.774      5.210      0.000       704.531  1554.293
UNIT_R116               2956.4702    218.013     13.561      0.000      2529.160  3383.781
UNIT_R117                795.3551    222.064      3.582      0.000       360.106  1230.604
UNIT_R119               1749.8065    218.625      8.004      0.000      1321.298  2178.315
UNIT_R120               1410.3913    220.141      6.407      0.000       978.911  1841.872
UNIT_R121               1248.8447    220.849      5.655      0.000       815.975  1681.714
UNIT_R122               2464.0394    218.046     11.301      0.000      2036.665  2891.414
UNIT_R123               1400.7440    219.729      6.375      0.000       970.072  1831.416
UNIT_R124                450.3554    222.404      2.025      0.043        14.438   886.273
UNIT_R126               1618.3788    218.013      7.423      0.000      1191.068  2045.689
UNIT_R127               4559.0885    218.013     20.912      0.000      4131.778  4986.399
UNIT_R137               2335.3006    216.629     10.780      0.000      1910.703  2759.898
UNIT_R139               2297.6964    218.296     10.526      0.000      1869.832  2725.561
UNIT_R163               3094.8250    218.013     14.196      0.000      2667.515  3522.136
UNIT_R172               1660.8627    218.013      7.618      0.000      1233.552  2088.173
UNIT_R179               6541.6369    218.013     30.006      0.000      6114.326  6968.947
UNIT_R181               1547.2284    219.988      7.033      0.000      1116.048  1978.409
UNIT_R183                703.7566    222.732      3.160      0.002       267.197  1140.316
UNIT_R184                804.6280    222.701      3.613      0.000       368.129  1241.127
UNIT_R186                857.1858    219.438      3.906      0.000       427.083  1287.288
UNIT_R188               2098.7537    218.280      9.615      0.000      1670.921  2526.587
UNIT_R189               1185.0919    220.564      5.373      0.000       752.782  1617.402
UNIT_R194               1777.8782    220.023      8.080      0.000      1346.629  2209.128
UNIT_R196               1102.4778    218.856      5.037      0.000       673.516  1531.439
UNIT_R198               1883.4943    218.574      8.617      0.000      1455.086  2311.903
UNIT_R199                503.3321    219.987      2.288      0.022        72.152   934.512
UNIT_R200                939.6611    217.893      4.312      0.000       512.587  1366.735
UNIT_R202               2123.5941    217.187      9.778      0.000      1697.903  2549.286
UNIT_R203               1556.8419    221.167      7.039      0.000      1123.350  1990.333
UNIT_R204               1222.3627    218.013      5.607      0.000       795.052  1649.673
UNIT_R205               1412.9121    217.753      6.489      0.000       986.111  1839.713
UNIT_R207               1773.5218    218.309      8.124      0.000      1345.632  2201.411
UNIT_R208               2419.9129    217.755     11.113      0.000      1993.108  2846.717
UNIT_R209                606.8343    223.370      2.717      0.007       169.024  1044.644
UNIT_R210                328.6627    220.610      1.490      0.136      -103.737   761.062
UNIT_R211               2179.7928    218.013      9.998      0.000      1752.482  2607.103
UNIT_R212               1465.5076    218.285      6.714      0.000      1037.665  1893.350
UNIT_R213                951.4599    220.288      4.319      0.000       519.692  1383.228
UNIT_R214                482.6848    223.745      2.157      0.031        44.140   921.230
UNIT_R215               1370.6144    218.561      6.271      0.000       942.230  1798.999
UNIT_R216                546.2133    218.560      2.499      0.012       117.831   974.596
UNIT_R217                810.3915    223.377      3.628      0.000       372.569  1248.214
UNIT_R218               1885.3542    216.907      8.692      0.000      1460.213  2310.496
UNIT_R219               1165.8431    217.186      5.368      0.000       740.153  1591.533
UNIT_R220               1254.5186    218.013      5.754      0.000       827.208  1681.829
UNIT_R221               1217.1496    223.720      5.441      0.000       778.655  1655.644
UNIT_R223               2022.9115    216.907      9.326      0.000      1597.770  2448.053
UNIT_R224                530.1011    220.833      2.400      0.016        97.263   962.939
UNIT_R225                360.3930    219.439      1.642      0.101       -69.712   790.498
UNIT_R226                670.9567    221.410      3.030      0.002       236.989  1104.924
UNIT_R227                873.7820    218.013      4.008      0.000       446.471  1301.093
UNIT_R228                920.0493    223.017      4.125      0.000       482.932  1357.167
UNIT_R229                378.4393    222.411      1.702      0.089       -57.490   814.369
UNIT_R230                321.8995    221.201      1.455      0.146      -111.659   755.458
UNIT_R231                754.8780    219.400      3.441      0.001       324.849  1184.907
UNIT_R232                828.8376    222.057      3.733      0.000       393.602  1264.073
UNIT_R233                913.9123    224.389      4.073      0.000       474.105  1353.719
UNIT_R234                124.4378    223.427      0.557      0.578      -313.484   562.360
UNIT_R235               2350.8390    218.269     10.770      0.000      1923.027  2778.651
UNIT_R236               1435.4395    217.752      6.592      0.000      1008.641  1862.238
UNIT_R237                591.8134    221.736      2.669      0.008       157.207  1026.420
UNIT_R238               2008.1222    216.907      9.258      0.000      1582.981  2433.264
UNIT_R239                670.9433    218.013      3.078      0.002       243.633  1098.254
UNIT_R240               2442.1360    218.841     11.159      0.000      2013.202  2871.070
UNIT_R242                339.5787    220.260      1.542      0.123       -92.135   771.292
UNIT_R243               1302.0154    220.453      5.906      0.000       869.924  1734.107
UNIT_R244               1518.3745    220.460      6.887      0.000      1086.268  1950.481
UNIT_R246                480.1166    223.139      2.152      0.031        42.760   917.473
UNIT_R247                -39.2485    225.063     -0.174      0.862      -480.376   401.879
UNIT_R248               2880.0261    218.296     13.193      0.000      2452.162  3307.891
UNIT_R249               1150.4911    219.697      5.237      0.000       719.880  1581.103
UNIT_R250                766.2618    220.287      3.478      0.001       334.495  1198.029
UNIT_R251               1057.7657    218.802      4.834      0.000       628.910  1486.621
UNIT_R252                737.6586    218.559      3.375      0.001       309.278  1166.039
UNIT_R253                632.2834    220.776      2.864      0.004       199.558  1065.009
UNIT_R254               2484.7331    216.907     11.455      0.000      2059.591  2909.875
UNIT_R255                658.2741    217.892      3.021      0.003       231.201  1085.347
UNIT_R256                834.6368    218.568      3.819      0.000       406.239  1263.035
UNIT_R257               1658.1154    218.013      7.606      0.000      1230.805  2085.426
UNIT_R258               1296.1897    218.847      5.923      0.000       867.246  1725.134
UNIT_R259                716.6516    219.687      3.262      0.001       286.060  1147.243
UNIT_R260                877.3599    227.546      3.856      0.000       431.365  1323.355
UNIT_R261               1483.2068    219.223      6.766      0.000      1053.526  1912.888
UNIT_R262                275.8648    225.100      1.226      0.220      -165.335   717.064
UNIT_R263                -89.9463    220.292     -0.408      0.683      -521.724   341.831
UNIT_R264                243.0262    218.304      1.113      0.266      -184.854   670.907
UNIT_R265                679.5368    222.423      3.055      0.002       243.582  1115.491
UNIT_R266                805.3420    216.907      3.713      0.000       380.200  1230.483
UNIT_R269                656.1340    218.568      3.002      0.003       227.737  1084.531
UNIT_R270                275.6711    223.115      1.236      0.217      -161.638   712.980
UNIT_R271                162.2130    222.721      0.728      0.466      -274.325   598.751
UNIT_R273               1120.2123    223.370      5.015      0.000       682.403  1558.022
UNIT_R274                745.7277    222.133      3.357      0.001       310.343  1181.113
UNIT_R275                813.5804    219.224      3.711      0.000       383.897  1243.263
UNIT_R276               1180.6530    218.013      5.416      0.000       753.342  1607.964
UNIT_R277                246.3959    226.174      1.089      0.276      -196.910   689.702
UNIT_R278                165.8579    222.400      0.746      0.456      -270.051   601.767
UNIT_R279                674.7570    218.624      3.086      0.002       246.249  1103.265
UNIT_R280                428.5448    225.355      1.902      0.057       -13.155   870.244
UNIT_R281               1062.6201    219.661      4.838      0.000       632.080  1493.160
UNIT_R282               1356.1560    218.586      6.204      0.000       927.724  1784.588
UNIT_R284                546.0553    218.559      2.498      0.012       117.675   974.436
UNIT_R285                490.0211    222.532      2.202      0.028        53.854   926.188
UNIT_R287                361.1686    224.097      1.612      0.107       -78.066   800.403
UNIT_R291               1573.8842    218.013      7.219      0.000      1146.574  2001.195
UNIT_R294                721.1593    220.606      3.269      0.001       288.766  1153.552
UNIT_R295                517.4105    233.841      2.213      0.027        59.077   975.744
UNIT_R300               1982.9487    218.013      9.096      0.000      1555.638  2410.259
UNIT_R303               1011.0042    219.698      4.602      0.000       580.391  1441.618
UNIT_R304                848.6679    218.571      3.883      0.000       420.264  1277.072
UNIT_R307                262.7445    222.732      1.180      0.238      -173.814   699.303
UNIT_R308                699.5203    220.453      3.173      0.002       267.428  1131.613
UNIT_R309                705.9845    220.142      3.207      0.001       274.502  1137.467
UNIT_R310               1207.9182    221.410      5.456      0.000       773.950  1641.886
UNIT_R311                166.2306    221.215      0.751      0.452      -267.355   599.816
UNIT_R312                101.9214    218.845      0.466      0.641      -327.019   530.862
UNIT_R313                -36.3407    223.068     -0.163      0.871      -473.559   400.878
UNIT_R318                311.0221    219.120      1.419      0.156      -118.458   740.502
UNIT_R319               1124.8596    219.969      5.114      0.000       693.716  1556.003
UNIT_R321                845.5455    218.013      3.878      0.000       418.235  1272.856
UNIT_R322               1544.0401    219.998      7.018      0.000      1112.840  1975.241
UNIT_R323               1045.9769    222.061      4.710      0.000       610.733  1481.221
UNIT_R325                245.0520    219.833      1.115      0.265      -185.824   675.928
UNIT_R330                755.9121    220.603      3.427      0.001       323.526  1188.298
UNIT_R335                268.2792    223.543      1.200      0.230      -169.870   706.429
UNIT_R336                -98.1496    223.524     -0.439      0.661      -536.261   339.962
UNIT_R337                -60.4869    222.213     -0.272      0.785      -496.029   375.055
UNIT_R338               -195.5844    220.277     -0.888      0.375      -627.331   236.163
UNIT_R341                368.5297    217.611      1.694      0.090       -57.992   795.052
UNIT_R344                239.7185    224.725      1.067      0.286      -200.746   680.184
UNIT_R345                235.9263    221.225      1.066      0.286      -197.680   669.532
UNIT_R346               1011.1078    221.506      4.565      0.000       576.951  1445.264
UNIT_R348                 19.3938    220.139      0.088      0.930      -412.083   450.871
UNIT_R354                 64.2897    222.539      0.289      0.773      -371.891   500.470
UNIT_R356                944.3392    219.693      4.298      0.000       513.737  1374.942
UNIT_R358                 98.1688    222.517      0.441      0.659      -337.968   534.306
UNIT_R370                242.7384    220.840      1.099      0.272      -190.112   675.589
UNIT_R371                464.7694    222.063      2.093      0.036        29.522   900.017
UNIT_R372                467.0272    223.047      2.094      0.036        29.850   904.205
UNIT_R373                402.0316    223.377      1.800      0.072       -35.791   839.854
UNIT_R382                768.4437    219.356      3.503      0.000       338.501  1198.387
UNIT_R424                127.9361    224.405      0.570      0.569      -311.901   567.773
UNIT_R429                874.0374    217.469      4.019      0.000       447.793  1300.281
UNIT_R453               1537.7553    225.374      6.823      0.000      1096.018  1979.493
UNIT_R454               -134.2191    223.691     -0.600      0.548      -572.658   304.219
UNIT_R455               -187.0811    223.491     -0.837      0.403      -625.128   250.966
UNIT_R456                -38.6170    220.611     -0.175      0.861      -471.018   393.784
UNIT_R459                -61.2943    267.969     -0.229      0.819      -586.519   463.930
UNIT_R464               -250.8199    221.853     -1.131      0.258      -685.657   184.017
==============================================================================
Omnibus:                    29943.708   Durbin-Watson:                   1.569
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          1035900.421
Skew:                           2.953   Prob(JB):                         0.00
Kurtosis:                      26.411   Cond. No.                         762.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Out[2]:
array([ -778.42172231, -2201.67405866,  -155.91380219, ...,   690.14905763,
         845.2340505 ,  1417.04774007])
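
The diagnostics at the bottom of the summary above (Skew, Kurtosis, Jarque-Bera) are tied together by a simple formula, and the following sketch illustrates it. It assumes the usual definition JB = n/6 * (S^2 + (K - 3)^2 / 4) with biased (population) moments. The function names `jarque_bera` and `implied_sample_size` are hypothetical helpers written for this illustration, not part of the course code. Because the summary prints rounded values, the sample size recovered from them is only approximate.

```python
import numpy as np


def jarque_bera(residuals):
    """Return (skew, kurtosis, JB) of a 1-D sample using biased moments."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    centered = r - r.mean()
    m2 = np.mean(centered ** 2)                 # second central moment
    skew = np.mean(centered ** 3) / m2 ** 1.5   # 0 for a symmetric sample
    kurt = np.mean(centered ** 4) / m2 ** 2     # 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb


def implied_sample_size(jb, skew, kurt):
    """Invert the JB formula to estimate n from the printed statistics."""
    return 6.0 * jb / (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


# Values copied from the summary above (rounded as printed there).
n_est = implied_sample_size(jb=1035900.421, skew=2.953, kurt=26.411)
print(round(n_est))
```

A skew near 3 and a kurtosis above 26 are far from the values 0 and 3 expected for normally distributed residuals, which is why the Jarque-Bera statistic is so large and Prob(JB) is reported as 0.00.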