Load a database from CSV files, with columns and tables read from the files.

The task at hand is to load a database from CSV files in a generic way, without having to write a separate script for each table.

This short Groovy script works based on two conventions:

  • The name of the CSV file matches the name of the table in the database.
  • Each header field in the CSV file matches the name of a column in the table.

The script expects the CSV files to be placed in the folder “to_digest”.
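As an illustration of the two conventions, for a hypothetical table persons with the columns id and name, a file to_digest/persons.csv could look like this (note the ';' separator the script uses):

```
id;name
1;Ada
2;Linus
```

The script would then generate statements of the form INSERT INTO persons (id,name) VALUES ('1','Ada').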

The connection details are placed in a properties file, which can be added to the .gitignore file if the script is kept under Git version control. In this example the configuration variables are stored in the file ‘configuration.properties’, which must be located in the folder the script is executed from.

The properties file contains the following lines:

db_url=jdbc:oracle:thin:@yourdbserver:1521/schema
db_user=yourusername
db_password=yourpassword

Please also note that this script is written for Oracle, and that the Oracle JDBC drivers are fetched from a local Nexus repository, as they are not available on the open Internet. The script can easily be changed to another database vendor. If you choose an open-source database such as MySQL or PostgreSQL, the drivers are available in public Maven repositories and you will not need the reference to a local repository through the @GrabResolver annotation.
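As a sketch of that change, assuming PostgreSQL (the driver coordinates and version below are illustrative, not taken from the original setup), the dependency section could be reduced to:

```groovy
// No @GrabResolver needed: the PostgreSQL driver is available on Maven Central
@GrabConfig(systemClassLoader = true)
@Grapes([
        @Grab(group = 'org.postgresql', module = 'postgresql', version = '42.2.5'),
        @Grab(group = 'com.opencsv', module = 'opencsv', version = '3.9')
])
```

The db_url in configuration.properties would then change to a JDBC URL of the form jdbc:postgresql://yourdbserver:5432/yourdatabase.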


@GrabResolver(name='nexus', root='http://yourorganizationsnexusinstance.you', m2Compatible = 'true')
@GrabConfig(systemClassLoader=true)
@Grapes([
        @Grab(group='com.oracle', module='ojdbc8', version='12.2.0.1'),
        @Grab(group='com.opencsv', module='opencsv', version='3.9')
])


import groovy.sql.*
import com.opencsv.*

import static groovy.io.FileType.FILES


def path = new File("to_digest")
path.traverse(type: FILES, nameFilter: ~/\w*\.csv/) { file ->
    List entries = digestFile(file)
    // The first row is the header, the rest is data;
    // the table name is the file name without its extension
    insertFile(entries.head(), entries.tail(), file.name.take(file.name.lastIndexOf('.')))
}


private List digestFile(def path) {
    // Read the whole CSV file into memory, using ';' as the separator
    CSVReader reader = new CSVReader(new FileReader(path), (char) ';')
    List myEntries = reader.readAll()
    reader.close()
    return myEntries
}


private void insertFile(def header, def data, def name) {
    Properties properties = new Properties()
    File propertiesFile = new File('configuration.properties')
    propertiesFile.withInputStream {
        properties.load(it)
    }
    println name
    Sql conn = Sql.newInstance(properties.db_url, properties.db_user, properties.db_password)
    try {
        data.each { row ->
            String columns = String.join(",", header)
            // Escape single quotes in the values so they do not break the SQL statement
            String values = row.collect { "'${it.replace("'", "''")}'" }.join(",")
            String query = "INSERT INTO ${name} (${columns}) VALUES (${values})"
            conn.execute(query)
        }
    } finally {
        conn.close()
    }
}

Safe functions for casting numbers and timestamps in Oracle SQL

When importing irregular data from a flat file, it is convenient to load all values as character strings, in this case Oracle’s varchar2, and then cast them into the proper format using the built-in datatype converters such as to_number. The problem occurs when a value that cannot be converted is stored in the column, either because of column overflow during the import or because the value is optional and may be null or otherwise unset. In those cases the conversion raises an error and halts the import.

Here are two methods I find useful for solving these problems: two PL/SQL functions that wrap the converters built into Oracle, but with controlled exception handling, so that a null value is stored and the error is logged instead. Working on tables with several million rows, I find this approach more useful than halting the process.

Safe number converter

CREATE OR REPLACE FUNCTION safe_to_number (p varchar2)
RETURN number
IS
  retval number;
BEGIN
  IF p IS NULL THEN
    retval := null;
  ELSE
    retval := to_number(p);
  END IF;
  RETURN retval;
EXCEPTION
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE('Could not change value to number');
    retval := null;
    RETURN retval;
END;

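A quick way to sanity-check the function from SQL*Plus or SQL Developer (with SERVEROUTPUT enabled so the logged message is visible) is a couple of illustrative queries:

```sql
-- A convertible string returns the number
SELECT safe_to_number('42') FROM dual;

-- A non-convertible string returns NULL instead of raising ORA-01722,
-- and the message is written via DBMS_OUTPUT
SELECT safe_to_number('N/A') FROM dual;
```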
Safe timestamp converter

create or replace function safe_to_timestamp
( p_str  in varchar2
, p_mask in varchar2)
return timestamp
is
  -- declared as timestamp, not date, so fractional seconds are not lost
  d timestamp;
begin
  d := to_timestamp(p_str, p_mask);
  return d;
exception
  when others then
    DBMS_OUTPUT.PUT_LINE('Could not change value to timestamp');
    return null;
end;

I use these after the tables have been imported as plain string tables, to create a copy with the correct data types.

CREATE TABLE formatted_table AS
SELECT safe_to_timestamp(playdatetime, 'YYYY-MM-DD HH24.MI.SS.FF') playdatetime,
       safe_to_timestamp(dateval, 'YYYY-MM-DD HH24.MI.SS.FF') dateval,
       safe_to_number(id_table) id_table
FROM unformatted_table;