Friday, June 4, 2021

Powershell - using ODBC and the Simba Driver to query AWS Athena

 I was running into issues with the Linked Server lopping off long JSON that I'm having to pull out from the raw files.  I can't explain it - doesn't appear to be SSMS.  See previous post


But I needed to automate this, rather than use SQL Workbench, save to "Excel" (it was XML), then opening it again and saving it so that instead of 250mb, it's 30mb.  Runs against the previous month, one day at a time (walking the partitions), and then saves to a file.  You got your Athena, your ODBC, your Export-Excel...



#This has two requirements:

# 1 - simba installed as an ODBC driver, called "your_simba_athena_dsn_name_here".  Free from Amazon for this use.

# 2 - ImportExcel.  Requires Powershell V4 (V5?), but for sure v3 won't work.

#Get-ODBC-Data2 is just a slight modification of Anders' code that includes the UID/PWD. Thanks!

#https://www.andersrodland.com/working-with-odbc-connections-in-powershell/

#modified because we have to pass in user/password to Athena.

function Get-ODBC-Data2{

    param(

    [string]$dsn,

    [string]$param,

    [string]$query=$(throw 'query is required.')

    )

    $conn = New-Object System.Data.Odbc.OdbcConnection

    $conn.ConnectionString = "DSN=$dsn;$param"

    $query

    $conn.open()

    $cmd = New-object System.Data.Odbc.OdbcCommand($query,$conn)

    $ds = New-Object system.Data.DataSet

    (New-Object system.Data.odbc.odbcDataAdapter($cmd)).fill($ds) | out-null

    $conn.close()

    $ds.Tables[0]

 }


 #Set start and end.  Those can be whatever, but I made it one month for simplicity

 [datetime]$start_date_yyyymmdd = (get-date).addmonths(-1).tostring("yyyy-MM-01")

 [datetime]$end_date_yyyymmdd = $start_date_yyyymmdd.addmonths(1).tostring("yyyy-MM-01")

 #[datetime]$end_date_yyyymmdd = (get-date).addmonths(0).tostring("yyyy-MM-01")

 $filename = "Myfile_" + $start_date_yyyymmdd.tostring("yyyyMM")

 #set a variable so that as we walk through, we have an array to save everything to.

 $formattedresults=@()

 

 #loop through the days

 while ($start_date_yyyymmdd -lt $end_date_yyyymmdd) {

 $startdate = $start_date_yyyymmdd.tostring('yyyy-MM-dd')

 "starting $startdate"

 $query = @"

 select json_extract_scalar(myfield, '$.fieldjsonparsed') as fielda, dt,

 fieldb, fieldc

  from myschema.mytablename

  where dt='$startdate'

"@

 

 $results = get-odbc-data2 -query $query -dsn "your_simba_athena_dsn_name_here" -param "uid=youraccesskeyhere;PWD=yoursecretkeyhere" 

 #save non-systemdatarow version to our main array

 $formattedresults +=$results|select-object fielda,fieldb,fieldc 

  $start_date_yyyymmdd = $start_date_yyyymmdd.adddays(1)

 }

 

 #export using ImportExcel module, which unfortunately needs at least higher than V3

 $formattedresults| export-excel -AutoSize -BoldTopRow -FreezeTopRow -TableName Test -TableStyle "Light13" -WorkSheetname Test -Path c:\temp\$filename.xlsx


No comments: