Tuesday, June 9, 2020

[AWS] Athena aka Hive/Presto - renaming/mapping a JSON attribute named TIMESTAMP to be called TS

(More Athena stuff coming, probably)

Since I couldn't get Amazon's documentation example with  ColumnToJsonKeyMappings to work... (fortunately, the Hive documentation is better, and Athena uses Hive for DDL)

My JSON looks like this (which I'm not formatting, because Athena demands every row be on one line, with no fancy "line feeds" or "carriage returns")

{"event":"eventififer","globalid":"3fd6dce3-6650-4e3f-8d8d-a46d1baaca02","timeStamp":"2020-06-08T19:29:31.114Z"}

In this case, "timestamp" is a reserved word, so you have to query it like:
and "timestamp" >= '2020-06-08T18:30:55Z' 

So, in order to rename the field, you could either create a view on top of it, or rename it during the creation.

CREATE EXTERNAL TABLE myawsdatacatalog.mytablename(
event string ,
globalid string,
ts string)
partitioned by (dt string)

ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties ("mapping.ts"= "timestamp")
LOCATION 's3://mybucket/my/folder/path/';

Yes, the ts is a string, because it has to meet their particular format EXACTLY. If I try TIMESTAMP for the data type for ts, it creates fine! But when I try to query it, I get the following. 

HIVE_BAD_DATA: Error parsing field value '2020-06-04T20:01:47.829Z' for field 3: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]

If you need a workaround, I'd say to create it as string, then toss a view on top with: 
select from_iso8601_timestamp(ts) as ts

No comments: