Thursday, May 12, 2022

Read your Microsoft Teams messages using Microsoft Graph and Powershell

 I suck at time cards. Part of it is that I'm jumping between 30 tasks at once - help this team, help that team, training, documentation, and my sprint work. At the end of the day I have a bare-ass notebook with a few cryptic lines about what I did today - if I'm lucky.  

But a lot of it is available elsewhere. We're all virtual now, so all my interactions are either voice calls, meetings, or chats.  I want to list out what I did in a given day so I can reconstruct it - how much time I spent helping whom.

Enter Microsoft Graph. It's an API that selects from it. There's a set of cmdlets that let you do quite a bit.  I really do wish the get-mgchatmessagedelta properly handled "-filter" to make this faster, but I'll cope.

This will rip through ALL your chats, then for each one get the most recent 200 messages, filter it down to the past day or two, then toss it out on a format-table. 

My pain, your gain.  It's always fun when there's 2 google results for something. 

Install-Module Microsoft.Graph
Import-Module Microsoft.Graph.Teams
$RequiredScopes = @("Chat.ReadBasic", "Chat.ReadWrite")
Connect-MgGraph -Scopes $RequiredScopes
#at this point a browser window should appear - allow it.
#I had to find my user id in graph explorer UI by running GET   V1.0
#unsure how to get it otherwise - but you don't need it with get-mgchat
#take one of those IDs
#let's look at this chat:
get-mgchat  -ChatId 19:deadbeefb2d949d88d4455f5279e5d8b@thread.v2
get-mgchatmessage -ChatId 19:deadbeefb2d949d88d4455f5279e5d8b@thread.v2 

#this nests and walks through properly, strips HTML, but lord this will be slow.  And shows more than one line per message, even when I try now to.
#By default you can get all your chats by running get-mgchat.  -all and -pagesize 50 is required for the module to paginate the request and get you everything. But in my case it grabbed all 2000 chats. The -first 5 is for testing.
$tzone = Get-TimeZone  # conversion from GMT to local time.
$mychats = get-mgchat -all -PageSize 50  |select -first 5
$all_chat_info = @() #force-setting to an array
$all_chat_info = foreach ($chat in $mychats) { 
    $chatinfo = get-mgchat -ChatId $ #get base details about the CHAT itself
    #set some details about the chat itself for the later query
    $chatname = $chat.Topic
    $members = $chat.Members
    $chattype = $chat.chattype
    #now get every message from that chat since midnight yesterday.  Note LastModifiedDateTime is GMT.    The jdhit page says -($tzone...), but all I had to do was .tolocaltime() ... I think.
    #the -top 200 -pagesize 50 is to get the most recent 200 messages, and again you have to paginate.   
    $recentchatmessages = get-mgchatmessage -ChatId $ -top 200 -pagesize 50 |where {$_.LastModifiedDateTime.tolocaltime() -gt (get-date).date.AddDays(-1)} # all from after midnight yesterday |select -first 5
    #and now use select expression to add the above fields and parse the below fields, stripping out HTML (but I can't seem to only get the first line in OGV)
    $recentchatmessages | select @{Label='LastModified';Expression={($_.LastModifiedDateTime.tolocaltime())}}, @{Label='ChatName';Expression={($chatname)}}, @{Label='members';Expression={($members)}}, @{Label='ChatType';Expression={($chattype)}}, 
    @{Label='From';Expression={($_.from.user.displayname)}}, @{Label='Body';Expression={ ($_.Body.content -split '\n')[0] -replace '<[^>]+>',''}}
    #@{Label='From';Expression={($_.from.user.displayname)}}, @{Label='Body';Expression={( ($_.Body.content -replace '<[^>]+>','').split([Environment]::NewLine)|select -first 1)}}
#and now close/disconnect

Wednesday, April 20, 2022

Xquery - a surprising difference between Azure SQL Database and Azure Managed Instance with Quotes

I'm moving processes into Azure.  Given that it's me, probably poorly. But I'm using this as an opportunity to improve things where possible, specifically looking at moving some processes to Azure SQL Database. 
Why? I love them for several reasons: 

  • Simpler
  • Cheaper
  • Quick to create
  • Quick to modify the instance (seconds, not hours)

Yes, there are still gotchas (Elastic Jobs, no cross-db queries, no CLR, can't replicate from it), but overall? Soooo nice.


I was trying to get some Xquery parsing working, and ran into a weird error:
"CONDITIONAL failed because the following SET options have incorrect settings: 'QUOTED_IDENTIFIER'"

Weird.  Doubly so because this code has worked on-prem for a decade, and the Managed Instance is running it now.  And because the SP specifically has a SET QUOTED_IDENTIFIER ON; 

Okay, let's take alook online.

Yeah, that's the problem.  And indeed, my code has similar xquery to the example:
set @myval = @xml.[value]('(*/A[B="C"]/D[E="F" or G="H"]/I)[1]', 'varchar (35)');

What's the fix? Use two single-quotes instead of the double quote.
set @myval = @xml.[value]('(*/A[B=''C'']/D[E=''F'' or G=''H'']/I)[1]', 'varchar (35)');

Saturday, March 5, 2022

We stand with Ukraine

growing up, my Baba always pronounced it weirdly to me - you-craw-een. Unsurprisingly, this turns out to be the right way. watching the footage,i cant help but feel for them. I don't know how this is going to shake out, but they're in the right and russia is wrong.

Friday, January 21, 2022

SSRS - getting the ACTUAL connection strings for ALL your datasources in SSRS

File under "and this is why I'm not an MVP"...

Yeah, it's been a few months. Azure, Powershell, Python, Parquet, and the entire team turning over. 

However, now I get to deal with... SSRS?  Really?  Yeah, until we figure out how to convert them to PowerBI or something Azure-centric that requires no effort.  Seriously, I can't believe Microsoft doesn't have this.  It's DUM. D-U-M-B, dum.

But I digress.

Today: comparing your old SSRS Datasource connection strings to the new server, since they don't match and you don't think you can just backup/restore over ReportServer.

<script src=""></script>

and , in case THAT doesn't work.


#to install the MS module you need:

#Invoke-Expression (Invoke-WebRequest

$SSRS_Servername = "yourservernamehere"

#query cribbed from

$SSRS_list_datasources = @"


Let's say you want to move a database to an other SQL Server, but which of the SSRS Shared Datasources uses this database and must be changed afterwards?

With this Transact-SQL query for ReportServer database you get the connection string of all Shared Datasources,

to document the usage or to search for a specific server/database.

Please remark: Querying the ReportServer database directly is not a supported way.

Works with SSRS 2005 and higher version ReportServer databases.

Requieres select rights on the "Catalog" table in ReportServer database.


-- Connection strings of all SSRS Shared Datasources

;WITH XMLNAMESPACES  -- XML namespace def must be the first in with clause.

    (DEFAULT ''


     AS rd)


    (SELECT AS SharedDsName


           ,CONVERT(xml, CONVERT(varbinary(max), content)) AS DEF

     FROM dbo.[Catalog] AS SDS

     WHERE SDS.Type = 5)     -- 5 = Shared Datasource





    (SELECT SDS.[Path]


           ,DSN.value('ConnectString[1]', 'varchar(150)') AS ConnString


          CROSS APPLY 

          SDS.DEF.nodes('/DataSourceDefinition') AS R(DSN)

     ) AS CON

-- Optional filter:

-- WHERE CON.ConnString LIKE '%Initial Catalog%=%TFS%'




. C:\Powershell_Scripts\invoke-sqlcmd2a.ps1

$datasource_list = invoke-sqlcmd2 -query $SSRS_list_datasources -ServerInstance $SSRS_Servername -Database ReportServer

$DataSource_Details = @()

$DataSource_Details += foreach ($datasource in $datasource_list) {

$ConnectString = Get-RsDataSource -ReportServerUri "http://$SSRS_Servername/ReportServer" -Path $datasource.Path

$connectstring|select @{Label='SharedDsName';Expression={$datasource.SharedDsName}} ,@{Label='Path';Expression={$datasource.path}}, @{Label='ConnectionString';Expression={$_.ConnectString}} , @{Label='Username';Expression={$_.UserName}} 



Wednesday, June 9, 2021

SQL Workbench /J & Aurora & MariaDB driver getting "could not load system variables" - FIXED!

 Using the 2.7.3 MariaDB java connector with SQL Workbench /J, I was trying to reach my aurora instance but kept getting this error:

could not load system variables

Finally found an answer here:
As per Diego Dupin:

Aurora proxy has a known race condition issue that results in skipping other queries in proxy buffer. connection might be in hang waiting for query response. During authentication, socket has a timeout, that will result throwing the error you describe.

So if `usePipelineAuth` or `useBatchMultiSend` is enable, it might work ... or not. All pipeline options (`usePipelineAuth` and `useBatchMultiSend`) must be disabled when using aurora. Problem has been reported to aurora a few times without correction.

What worked for us was adding this after the DB name 


like this:


Friday, June 4, 2021

Powershell - using ODBC and the Simba Driver to query AWS Athena

 I was running into issues with the Linked Server lopping off long JSON that I'm having to pull out from the raw files.  I can't explain it - doesn't appear to be SSMS.  See previous post

But I needed to automate this, rather than use SQL Workbench, save to "Excel" (it was XML), then opening it again and saving it so that instead of 250mb, it's 30mb.  Runs against the previous month, one day at a time (walking the partitions), and then saves to a file.  You got your Athena, your ODBC, your Export-Excel...

#This has two requirements:

# 1 - simba installed as an ODBC driver, called "your_simba_athena_dsn_name_here".  Free from Amazon for this use.

# 2 - ImportExcel.  Requires Powershell V4 (V5?), but for sure v3 won't work.

#Get-ODBC-Data2 is just a slight modification of Anders' code that includes the UID/PWD. Thanks!


#modified because we have to pass in user/password to Athena.

function Get-ODBC-Data2{




    [string]$query=$(throw 'query is required.')


    $conn = New-Object System.Data.Odbc.OdbcConnection

    $conn.ConnectionString = "DSN=$dsn;$param"



    $cmd = New-object System.Data.Odbc.OdbcCommand($query,$conn)

    $ds = New-Object system.Data.DataSet

    (New-Object system.Data.odbc.odbcDataAdapter($cmd)).fill($ds) | out-null




 #Set start and end.  Those can be whatever, but I made it one month for simplicity

 [datetime]$start_date_yyyymmdd = (get-date).addmonths(-1).tostring("yyyy-MM-01")

 [datetime]$end_date_yyyymmdd = $start_date_yyyymmdd.addmonths(1).tostring("yyyy-MM-01")

 #[datetime]$end_date_yyyymmdd = (get-date).addmonths(0).tostring("yyyy-MM-01")

 $filename = "Myfile_" + $start_date_yyyymmdd.tostring("yyyyMM")

 #set a variable so that as we walk through, we have an array to save everything to.



 #loop through the days

 while ($start_date_yyyymmdd -lt $end_date_yyyymmdd) {

 $startdate = $start_date_yyyymmdd.tostring('yyyy-MM-dd')

 "starting $startdate"

 $query = @"

 select json_extract_scalar(myfield, '$.fieldjsonparsed') as fielda, dt,

 fieldb, fieldc

  from myschema.mytablename

  where dt='$startdate'



 $results = get-odbc-data2 -query $query -dsn "your_simba_athena_dsn_name_here" -param "uid=youraccesskeyhere;PWD=yoursecretkeyhere" 

 #save non-systemdatarow version to our main array

 $formattedresults +=$results|select-object fielda,fieldb,fieldc 

  $start_date_yyyymmdd = $start_date_yyyymmdd.adddays(1)



 #export using ImportExcel module, which unfortunately needs at least higher than V3

 $formattedresults| export-excel -AutoSize -BoldTopRow -FreezeTopRow -TableName Test -TableStyle "Light13" -WorkSheetname Test -Path c:\temp\$filename.xlsx

EXECUTE AT against Athena via Linked Server and "does not support the required transaction interface"

 TLDR - using EXECUTE AT with Athena via linked server.  I used to get it set up and it works!  But there are some issues.

1) Partition your tables in Athena.  Better performance if you have a decent amount of data

2) This means you can't just do "select from <linkedserver>...tablename", as that doesn't tell Athena to do partition elimination

3) Once you're using OPENQUERY, if you run into issues because it doesn't know what to expect, you can use CAST or CONVERT to force metadata (aka put "cast(fielda as varchar(500))" if it thinks the field is only 255 characters.

4) That doesn't help with over varchar(8000).  You need to move to EXECUTE AT instead, but that seems better then OPENQUERY as it appears to just pass the data back - I got rid of my CASTs in the athena query. Plus, dynamic SQL is easier! You will have to enable RPC OUT for the Linked Server.

5) I couldn't do INSERT INTO EXECUTE AT until I disabled Distributed Transactions in that Linked Server.

Update 2021/06/04 - and there's something screwy with EXECUTE AT.  It runs without the CAST statements and doesn't complain , but doesn't appear to save everything - I have a handful of rows (out of 100k, mind you) that don't match up - one of the fields is lopped off. It's the most annoying thing, since my other option is SQL Workbench which WORKS but isn't easily automated.  After futzing with it for a bit, I wound up moving over to a pure Powershell solution - I'll post that in a few.

Thursday, March 25, 2021

System_health - parsing out queryProcessing node

Needed this today, my pain = your gain.

System_health has a TON of useful info.  However, it's buried in XML, and my shredder (see other posts with the system_health tag) doesn't go this deep.  

sp_server_diagnostics runs every 5 minutes, and there are other posts on getting some of the data out.  But I needed more, and I needed more than just "right now".  System_health saves 32 or so, in 5 minute increments.

This is not complete.  But you do get max workers, max in use, max idle, memory allocations, and non-preemptive wait types, both by count and duration.


if object_id('tempdb..#systemhealthsessiondata') is not null

    DROP TABLE #systemhealthsessiondata


INTO #SystemHealthSessionData

FROM sys.dm_xe_session_targets xet

JOIN sys.dm_xe_sessions xe

ON (xe.address = xet.event_session_address)

WHERE = 'system_health'

SELECT C.query('.') EventXML

, T.C.value('@timestamp', 'datetime')

, T.C.value('(data/value/queryProcessing/@maxWorkers)[1]','varchar(255)') as maxWorkers

, T.C.value('(data/value/queryProcessing/@workersCreated)[1]','varchar(255)') as workersCreated

, T.C.value('(data/value/queryProcessing/@workersIdle)[1]','varchar(255)') as workersIdle

, T.C.value('(data/value/queryProcessing/@tasksCompletedWithinInterval)[1]','varchar(255)') as tasksCompletedWithinInterval

, T.C.value('(data/value/queryProcessing/@pendingTasks)[1]','varchar(255)') as pendingTasks

, T.C.value('(data/value/queryProcessing/@oldestPendingTaskWaitingTime)[1]','varchar(255)') as oldestPendingTaskWaitingTime

, T.C.value('(data/value/queryProcessing/@hasUnresolvableDeadlockOccurred)[1]','varchar(255)') as hasUnresolvableDeadlockOccurred

, T.C.value('(data/value/queryProcessing/@hasDeadlockedSchedulersOccurred)[1]','varchar(255)') as hasDeadlockedSchedulersOccurred

, T.C.value('(data/value/queryProcessing/@trackingNonYieldingScheduler)[1]','varchar(255)') as trackingNonYieldingScheduler

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="MEMORY_ALLOCATION_EXT"]/@waits)[1]','varchar(255)') as MEMORY_ALLOCATION_EXT_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="RESERVED_MEMORY_ALLOCATION_EXT"]/@waits)[1]','varchar(255)') as RESERVED_MEMORY_ALLOCATION_EXT_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="OLEDB"]/@waits)[1]','varchar(255)') as OLEDB_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="WRITELOG"]/@waits)[1]','varchar(255)') as WRITELOG_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="PAGEIOLATCH_SH"]/@waits)[1]','varchar(255)') as PAGEIOLATCH_SH_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="ASYNC_NETWORK_IO"]/@waits)[1]','varchar(255)') as ASYNC_NETWORK_IO_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="IO_COMPLETION"]/@waits)[1]','varchar(255)') as IO_COMPLETION_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="PAGELATCH_EX"]/@waits)[1]','varchar(255)') as PAGELATCH_EX_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="PAGEIOLATCH_EX"]/@waits)[1]','varchar(255)') as PAGEIOLATCH_EX_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="CXPACKET"]/@waits)[1]','varchar(255)') as CXPACKET_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="SQLTRACE_WAIT_ENTRIES"]/@waits)[1]','varchar(255)') as SQLTRACE_WAIT_ENTRIES_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="CLR_AUTO_EVENT"]/@waits)[1]','varchar(255)') as CLR_AUTO_EVENT_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="HADR_FILESTREAM_IOMGR_IOCOMPLETION"]/@waits)[1]','varchar(255)') as HADR_FILESTREAM_IOMGR_IOCOMPLETION_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="TRACEWRITE"]/@waits)[1]','varchar(255)') as TRACEWRITE_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="ASYNC_NETWORK_IO"]/@waits)[1]','varchar(255)') as ASYNC_NETWORK_IO_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="WRITELOG"]/@waits)[1]','varchar(255)') as WRITELOG_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="PAGEIOLATCH_SH"]/@waits)[1]','varchar(255)') as PAGEIOLATCH_SH_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="OLEDB"]/@waits)[1]','varchar(255)') as OLEDB_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="PAGEIOLATCH_EX"]/@waits)[1]','varchar(255)') as PAGEIOLATCH_EX_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="LCK_M_SCH_S"]/@waits)[1]','varchar(255)') as LCK_M_SCH_S_ms



              <wait waitType="SQLTRACE_WAIT_ENTRIES" waits="2559734" averageWaitTime="5898" maxWaitTime="304986" />

              <wait waitType="CLR_AUTO_EVENT" waits="42" averageWaitTime="148452579" maxWaitTime="1161135755" />

              <wait waitType="HADR_FILESTREAM_IOMGR_IOCOMPLETION" waits="6283026" averageWaitTime="500" maxWaitTime="38514" />

              <wait waitType="TRACEWRITE" waits="1656275" averageWaitTime="1892" maxWaitTime="15678" />

              <wait waitType="ASYNC_NETWORK_IO" waits="170646385" averageWaitTime="4" maxWaitTime="34341" />

              <wait waitType="WRITELOG" waits="322871416" averageWaitTime="1" maxWaitTime="166927" />

              <wait waitType="PAGEIOLATCH_SH" waits="204943649" averageWaitTime="1" maxWaitTime="157813" />

              <wait waitType="OLEDB" waits="996781932" averageWaitTime="0" maxWaitTime="158157" />

              <wait waitType="PAGEIOLATCH_EX" waits="32693084" averageWaitTime="1" maxWaitTime="121494" />

              <wait waitType="LCK_M_SCH_S" waits="10554" averageWaitTime="4387" maxWaitTime="94983" />





              <wait waitType="PREEMPTIVE_OS_AUTHENTICATIONOPS" waits="1935312" averageWaitTime="0" maxWaitTime="206" />

              <wait waitType="PREEMPTIVE_OS_CRYPTOPS" waits="1391671" averageWaitTime="0" maxWaitTime="143" />

              <wait waitType="PREEMPTIVE_OS_QUERYREGISTRY" waits="912266" averageWaitTime="0" maxWaitTime="188" />

              <wait waitType="PREEMPTIVE_OS_WRITEFILE" waits="817138" averageWaitTime="30" maxWaitTime="145310" />

              <wait waitType="PREEMPTIVE_OS_GETPROCADDRESS" waits="637470" averageWaitTime="0" maxWaitTime="15" />

              <wait waitType="PREEMPTIVE_XE_CALLBACKEXECUTE" waits="629446" averageWaitTime="0" maxWaitTime="9" />

              <wait waitType="PREEMPTIVE_OS_AUTHORIZATIONOPS" waits="560759" averageWaitTime="0" maxWaitTime="59" />

              <wait waitType="PREEMPTIVE_OS_CLOSEHANDLE" waits="494332" averageWaitTime="1" maxWaitTime="124284" />

              <wait waitType="PREEMPTIVE_OS_CRYPTACQUIRECONTEXT" waits="464831" averageWaitTime="2" maxWaitTime="129" />

              <wait waitType="PREEMPTIVE_OS_REVERTTOSELF" waits="96931" averageWaitTime="0" maxWaitTime="36" />



              <wait waitType="PREEMPTIVE_OS_WRITEFILE" waits="817138" averageWaitTime="30" maxWaitTime="145310" />

              <wait waitType="PREEMPTIVE_OS_CRYPTACQUIRECONTEXT" waits="464831" averageWaitTime="2" maxWaitTime="129" />

              <wait waitType="PREEMPTIVE_OS_CLOSEHANDLE" waits="494332" averageWaitTime="1" maxWaitTime="124284" />

              <wait waitType="PREEMPTIVE_OS_AUTHENTICATIONOPS" waits="1935312" averageWaitTime="0" maxWaitTime="206" />

              <wait waitType="PREEMPTIVE_OS_QUERYREGISTRY" waits="912266" averageWaitTime="0" maxWaitTime="188" />

              <wait waitType="PREEMPTIVE_OS_FILEOPS" waits="18032" averageWaitTime="16" maxWaitTime="3488" />

              <wait waitType="PREEMPTIVE_OS_REPORTEVENT" waits="2032" averageWaitTime="135" maxWaitTime="102489" />

              <wait waitType="PREEMPTIVE_OS_CRYPTOPS" waits="1391671" averageWaitTime="0" maxWaitTime="143" />

              <wait waitType="PREEMPTIVE_OS_FLUSHFILEBUFFERS" waits="1924" averageWaitTime="60" maxWaitTime="3880" />

              <wait waitType="PREEMPTIVE_OS_CREATEFILE" waits="3048" averageWaitTime="26" maxWaitTime="6397" />





FROM #SystemHealthSessionData a

CROSS APPLY a.XMLDATA.nodes('/RingBufferTarget/event') as T(C)

--cross apply T.C.nodes('/event/data') as E(D)


 T.C.query('.').value('(/event/@name)[1]', 'varchar(255)') in ('sp_server_diagnostics_component_result')

and  C.value('(data/text)[1]','varchar(255)')= 'QUERY_PROCESSING'

Thursday, October 8, 2020

UNION ALL returns "conversion... overflowed an int column"

I had a stored procedure throwing the following error:

The conversion of the varchar value '9999999999' overflowed an int column.  

There's two queries joined, so I split them up to see which one was throwing the error.  

Neither.  Of course.  : - \

If you break the query apart, both halves work.  

Okay, so let's figure this out:

exec sp_describe_first_result_set @tsql = N'myqueryhere'

exec sp_describe_first_result_set @tsql = N'myotherqueryhere'

Oh, hey look, varchar in one, int in the other.

Cause? My own stupidness from 7 years ago:

case when fielda is null or fielda = 0 then 0

else replace(replace(convert(varchar(14),fielda(,'.',''))


On one set of data, there are fields that have values, specifically the 9999999999; the second set returns nothing.  

So it tries to convert to the "least" datatype, the INT, and the 9,999,999,999 is too large for INT, so it fails.

Tuesday, June 9, 2020

[AWS] Athena aka Hive/Presto - renaming/mapping a JSON attribute named TIMESTAMP to be called TS

(More Athena stuff coming, probably)

Since I couldn't get Amazon's documentation example with  ColumnToJsonKeyMappings to work... (fortunately, the Hive documentation is better, and Athena uses Hive for DDL)

My JSON looks like this (which I'm not formatting, because Athena demands every row be on one line, with no fancy "line feeds" or "carriage returns")


In this case, "timestamp" is a reserved word, so you have to query it like:
and "timestamp" >= '2020-06-08T18:30:55Z' 

So, in order to rename the field, you could either create a view on top of it, or rename it during the creation.

CREATE EXTERNAL TABLE myawsdatacatalog.mytablename(
event string ,
globalid string,
ts string)
partitioned by (dt string)

with serdeproperties ("mapping.ts"= "timestamp")
LOCATION 's3://mybucket/my/folder/path/';

Yes, the ts is a string, because it has to meet their particular format EXACTLY. If I try TIMESTAMP for the data type for ts, it creates fine! But when I try to query it, I get the following. 

HIVE_BAD_DATA: Error parsing field value '2020-06-04T20:01:47.829Z' for field 3: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]

If you need a workaround, I'd say to create it as string, then toss a view on top with: 
select from_iso8601_timestamp(ts) as ts