But don’t expect to see that Canadian weather alert site anytime this century, if the past few days have been any indication. See, I was supposed to have launched a working prototype of my geography quiz thing but nooooo….

So I thought I’d try the same trick as last night of switching to another project to get a mental boost from a quick win. But just like last night, I picked something that wasn’t quick at all. Thus defeating the point.

one weather scraper to rule them all

Despite deciding to learn Elm before deciding that I also needed a backend language to go with it (Elixir beat out Haskell), the first thing I actually completed was in Elixir: a weather data scraper.

It’s eventually supposed to include everything that my weather sites need, since the data is often useful in more than one place. For example, if a hurricane happens to be making landfall, both my hurricane tracker and my not-yet-released US severe weather alerts site are going to need access to the same data.

Officially, though, the Elixir project only does one thing: deliver the raw data for US severe weather watches to my Ruby/Opal-powered site, which parses and displays it.

After completing the watches page, I started work on scraping the Canadian Public Weather Alerts, but I ran into timezone issues that sidetracked me enough to send me back to Elm, where I eventually released my first Elm-powered thing: tertremo, the earthquake tracker.

But tonight, I picked it back up to try to sort out the timezone issues.

I am a terrible programmer

Dealing with a lot of US-centric data, I’ve been spoiled. Abbreviations like EDT and CDT are just handled magically: throw a time string containing one of them into a parser and, no worries, it works. So imagine my shock when it didn’t magically work for a timezone in Canada.
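For a taste of why this can’t “just work”: plenty of abbreviations don’t map to a single zone at all, so a parser handed a bare abbreviation is guessing. A tiny illustrative snippet (the mapping data here is my own illustration, not anything from the scraper):

```elixir
# Illustrative only: a few timezone abbreviations and some of the distinct
# IANA zones each can legitimately stand for. A parser given just "CST"
# has no way of knowing which one you meant.
ambiguous = %{
  "CST" => ["America/Chicago", "Asia/Shanghai", "America/Havana"],
  "AST" => ["America/Halifax", "Asia/Riyadh"],
  "IST" => ["Asia/Kolkata", "Asia/Jerusalem", "Europe/Dublin"]
}

for {abbrev, zones} <- ambiguous do
  IO.puts("#{abbrev} could mean any of: #{Enum.join(zones, ", ")}")
end
```

Parsers that appear to handle US abbreviations “magically” are really just hardcoding one interpretation and hoping for the best.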

Oh, timezones, how I hate you.

I’m not particularly thrilled with how I handled things, but for now anyway, it does actually work.

defp parse_time(html) do
  html
  |> get_raw_time()
  |> HexStorm.Util.sanitize_and_parse_time("%l:%M %p %Z %A %e %B %Y")
  |> Timezone.convert("Etc/UTC")
end

I think it does a good job of hiding the absolute insanity that goes on behind the scenes in sanitize_and_parse_time. And to be clear, I don’t think anyone should do this, because it just looks bloody awful.

defmodule HexStorm.Util do
  # https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

  def sanitize_timezone("NST"), do: "America/St_Johns"
  def sanitize_timezone("NDT"), do: sanitize_timezone("NST")

  def sanitize_timezone("AST"), do: "America/Halifax"
  def sanitize_timezone("ADT"), do: sanitize_timezone("AST")

  def sanitize_timezone(timezone_string), do: timezone_string

  def sanitize_and_parse_time(time_string, time_format) do
    # so basically instead of trying to do string parsing to figure out what %Z
    # maps to, we play dumb and throw it at Timex.parse and see what happens
    # and then let it tell us what the bad value is and fix it

    # time_string is your raw string like "10:20 PM NDT Sunday 06 August 2017"
    # time_format is your strftime style thing like "%l:%M %p %Z %A %e %B %Y"

    case Timex.parse(time_string, time_format, :strftime) do
      {:ok, timex_value} -> timex_value
      {:error, reason} ->
        {{:invalid_timezone, bad_timezone_string}, []} = Code.eval_string(reason)
        sanitized_timezone_string = sanitize_timezone(bad_timezone_string)
        time_string
        |> String.replace(bad_timezone_string, sanitized_timezone_string)
        |> Timex.parse!(time_format, :strftime)
    end
  end
end

So basically, it takes a string and a strftime-style format and throws it at Timex.parse in the hopes that it works. If it doesn’t, it uses Timex’s error, which for some reason is a string representation of a tuple rather than an actual tuple (hence the Code.eval_string), to figure out what the bad timezone abbreviation was, then runs that abbreviation through my translator, turning things like “NDT” (Newfoundland Daylight Time) into “America/St_Johns”.
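Timex aside, the underlying “try, see what broke, patch the string, retry” shape is simple. Here’s a dependency-free toy sketch of just that pattern — parse_zone/1 and TzRetrySketch are stand-ins I made up for illustration, not real scraper code:

```elixir
defmodule TzRetrySketch do
  # Toy stand-in for Timex.parse/3: it only understands full IANA names
  # and rejects everything else with an :invalid_timezone error.
  @known ["America/St_Johns", "America/Halifax", "Etc/UTC"]

  # Abbreviation fixes, same idea as sanitize_timezone/1.
  @fixes %{
    "NDT" => "America/St_Johns", "NST" => "America/St_Johns",
    "ADT" => "America/Halifax",  "AST" => "America/Halifax"
  }

  def parse_zone(zone) when zone in @known, do: {:ok, zone}
  def parse_zone(zone), do: {:error, {:invalid_timezone, zone}}

  # Same shape as sanitize_and_parse_time: attempt the parse, inspect the
  # failure to learn which abbreviation broke it, rewrite it, retry.
  def parse_with_fix(zone) do
    case parse_zone(zone) do
      {:ok, z} ->
        z

      {:error, {:invalid_timezone, bad}} ->
        {:ok, z} = parse_zone(Map.fetch!(@fixes, bad))
        z
    end
  end
end

IO.puts(TzRetrySketch.parse_with_fix("NDT"))
```

The real version differs mainly in that Timex hands back the error tuple as a string, which is why the scraper needs the Code.eval_string detour before it can pattern match on it.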

It’s not pretty, but it works well enough. So far. And that’s not saying much, because this part of the scraper doesn’t run automatically yet: the future site that will use it is nowhere near going live (cough, haven’t even started).

But if you’re considering ever getting into any sort of programming involving timezones, just don’t. AVOID AT ALL COSTS.

Watch this video, and you’ll have some insight into why.