I have to handle like 3 million records to join with a stations table so I decided to setup a local database before doing anything on the web so I can explore my data in a more convenient way to do aggregations and so on.
I’ve installed Postgres+PostGIS and created a
climate database. Then I’ve imported the
stations and the
stations_2008 shapefiles as well as the Admin 1 - States, provinces shapefile from Natural Earth so I can do a region or country aggregation if needed.
To import the Natural Earth I’ve executed:
$ ogr2ogr -f PostgreSQL PG:"user=myUser password=myPassword dbname=climate" \ ne_10m_admin_1_states_provinces_resaved.shp -overwrite \ -nlt MULTIPOLYGON -nln provinces -lco GEOMETRY_NAME=geom
So now, I have my regions and stations on my database and I can start doing basic things like this to count the number of stations by country on the geodata (not the alphanumeric data provided by the dataset).
select pr.admin as country, count(*) c from stations st join provinces pr on ST_Within(st.geom,pr.geom) group by pr.admin order by c desc
Or count the number of stations by region on a given country:
select sp.name as region, count(*) c from stations st join provinces sp on ST_Within(st.geom,sp.geom) where st.country = 'SP' group by sp.name order by c desc;
I’ve also set up a class in Python that parses a line of the observation data, so as for now, I’m ready to create the observation table and parse all the observations to have everything on my database.