02 August 2014

I have roughly 3 million records to join against a stations table, so I decided to set up a local database before doing anything on the web; that way I can explore the data more conveniently, run aggregations, and so on.

I’ve installed Postgres+PostGIS and created a climate database. Then I imported the stations and stations_2008 shapefiles, as well as the Admin 1 - States, provinces shapefile from Natural Earth, so I can aggregate by region or country if needed.

To import the Natural Earth shapefile I ran:

$ ogr2ogr -f PostgreSQL PG:"user=myUser password=myPassword dbname=climate" \
	ne_10m_admin_1_states_provinces_resaved.shp -overwrite \
	-nlt MULTIPOLYGON -nln provinces -lco GEOMETRY_NAME=geom
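
The stations and stations_2008 shapefiles went in the same way; a command along these lines does the trick (the shapefile path and target table name here are just placeholders, adjust to taste):

$ ogr2ogr -f PostgreSQL PG:"user=myUser password=myPassword dbname=climate" \
	stations.shp -overwrite -nln stations -lco GEOMETRY_NAME=geom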

So now I have my regions and stations in the database, and I can start doing basic things like counting the number of stations per country using the geometry (a spatial join) rather than the alphanumeric country field provided by the dataset:

select
	pr.admin as country, count(*) c
from stations st join provinces pr
	on ST_Within(st.geom, pr.geom)
group by pr.admin
order by c desc;

Or count the number of stations by region in a given country:

select
	sp.name as region, count(*) c
from stations st join provinces sp
	on ST_Within(st.geom, sp.geom)
where st.country = 'SP'
group by sp.name
order by c desc;

I’ve also written a Python class that parses a line of the observation data, so for now I’m ready to create the observation table and parse all the observations to get everything into the database.
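
Just to sketch what that load could look like (the file name, table name and column layout below are placeholders, not the real observation format), something like this with psycopg2 should do:

import csv
import psycopg2

# Sketch only: the real parser class handles the actual observation format;
# here each line is assumed to be "station_id,YYYY-MM-DD,value".
conn = psycopg2.connect("dbname=climate user=myUser password=myPassword")
cur = conn.cursor()

# Postgres will cast the date and numeric strings on insert.
cur.execute("""
    create table if not exists observations (
        station_id varchar(20),
        obs_date   date,
        value      numeric
    )
""")

with open('observations.csv') as f:
    rows = [tuple(r) for r in csv.reader(f)]

cur.executemany(
    "insert into observations (station_id, obs_date, value) values (%s, %s, %s)",
    rows)

conn.commit()
conn.close()

With 3 million rows executemany would be painfully slow, so for the real load COPY (psycopg2’s copy_from) or at least batched inserts would be a better fit.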