02 August 2014

Time to write a post while the database is busy. Basically I’ve set up a table on the db with the proper fields for weather data, using a combined station_id that is created from both identifiers on the text files, USAF and the WBAN. Then I’ve created a class in Python to map all the data coming from the files and a small piece of code to process all that data into the database.

# [... set up imports, folders and helper class ...]
conn = psycopg2.connect("dbname=climate user=myUser password=myPass")
cur = conn.cursor()
for op_file in os.listdir(obs_dir): #walk the directory
    with open(os.path.join(obs_dir,op_file)) as f: #open the file
        for line in f: #walk the file
                if len(line)>0:
                        o = observation(line) #instantiate the helper class
                        cur.execute(o.getSQL()) #get the insert SQL
                    except Exception, an_error:
                        print an_error

Then, once the data is stored on the database we can create an aggregation by month, something more approachable for the map I want to build.

with data as (select * from observations)
	extract(month from obs_date) m,
	count(*) c ,
	avg(tmp) tmp,
	avg(dewp) dewp,
	avg(sea_pr) sea_pr,
	avg(st_pr) st_pr,
	avg(vis) vis,
	avg(wind) wind,
	max(max_wind) max_wind,
	max(max_gust) max_gust,
	max(max_tmp) max_tmp,
	min(min_tmp) min_tmp,
	sum(precip) precip,
	avg(snow_depth) snow_depth,
	sum(fog::int) fog_count,
	sum(rain::int) rain_count,
	sum(snow::int) snow_count,
	sum(hail::int) hail_count,
	sum(thunder::int) thunder_count,
	sum(tornado::int) tornado_count
into temp stations_month_tmp
from data o
group by o.station_id, m;

Depending on the variable I’ll use avg, max, min or sum to get the appropriate data for the statistics. This gives a 12 times 11000 rows table, that’s still more than required for a free account but well, at least I have an API to query for month level.

But I want also all the raw data by station so I’ll generate small files to be requested by the visuazilation later if I want the details of one particular station. That is, I’ll translate the fixed width files into CSVs that I’ll be able to request on demand directly from the web server, without any service involved.

To summarize I want:

  • Stations and monthly aggregated data on CartoDB for dinamic queries and map visualizations.
  • Raw observations on a web server for station details, out of the map.