ETL with Bulk Insert Using ODP.Net

As part of an ETL process, you will likely need to load data from flat files. In some situations the data in a file may not be well-formed, so it needs to be cleaned and transformed before loading into the database. To achieve this goal, developing your own tool can be much more practical. Let's say you decide to write your code in C#; in that case, you need the ODP.Net library to connect to Oracle. I'd like to show you three ways of loading data using ODP.Net.

The first one works on a row-by-row basis. Pseudocode representing this case is below:

For each row in the file
   Read row;
   Do transformations of the row;
   Insert row;
loop;

To execute the insert statements, I will use the method below. It takes advantage of bind variables to avoid hard parsing.

private void ExecuteNonQuery(string commandText, Dictionary<string, object> parameters)
{
   using (OracleCommand comm = new OracleCommand(commandText))
   {
      comm.BindByName = true; // bind parameters by name rather than by position
      if (orclConn.State != System.Data.ConnectionState.Open)
         orclConn.Open();
      comm.Connection = this.orclConn;
      foreach (KeyValuePair<string, object> kvp in parameters)
         comm.Parameters.Add(kvp.Key, kvp.Value);
      comm.ExecuteNonQuery();
   }
}

When I tested the code above with a file of 46,000 rows and 24 columns on a test database, it took 641.26 seconds to complete.

The second method is almost the same as the first, except that it corrects one of its mistakes: this method commits only once. The code above doesn't contain a commit statement because OracleCommand objects commit automatically; in other words, auto-commit is on by default. This means that in the first method I committed 46,000 times. To correct the situation by turning auto-commit off, the code becomes:

Begin a transaction: OracleCommand comm = GetTransaction();
For each row in the file
   Read row;
   Do transformations of the row;
   Insert row: ExecuteNonQuery(comm, "<insert statement>", parameters);
loop
Commit transaction: comm.Transaction.Commit();

The methods referenced above are:

private OracleCommand GetTransaction()
{
   OracleTransaction dbTran = this.orclConn.BeginTransaction();
   OracleCommand comm = new OracleCommand();
   comm.Connection = this.orclConn;
   comm.Transaction = dbTran;
   return comm;
}

private void ExecuteNonQuery(OracleCommand comm, string commandText, Dictionary<string, object> parameters)
{
   comm.CommandText = commandText;
   comm.Parameters.Clear(); // the command is reused for every row, so drop the previous row's parameters
   foreach (KeyValuePair<string, object> kvp in parameters)
      comm.Parameters.Add(kvp.Key, kvp.Value);
   comm.ExecuteNonQuery();
}

With this method, the test took 604.04 seconds, only a little faster than the first one. Why is the difference so minor? Because in Oracle, a commit takes almost the same time whether one row or a million rows were inserted. The redo buffer has probably already been flushed many times, since the LGWR process flushes redo:
  • Every three seconds,
  • When someone commits,
  • When switching log files,
  • When the redo buffer is one-third full or holds 1 MB of cached redo data.
So all that is left for the commit is to trigger one more flush. Actually, the expensive part of a transaction is rollback rather than commit.

The third and final method is using arrays as bind variables (array binding). To implement it, we'll store the rows in a list and extract values from that list as arrays:

For each row in the file
   Read row;
   Do transformations of the row;
   Add row to a list;
loop
Insert rows;

The code required to insert rows using array binding is:

OracleCommand comm = GetTransaction();

comm.ArrayBindCount = rows.Count;
comm.CommandText = "insert into t values (:p1,:p2)";
comm.Parameters.Add("p1", OracleDbType.Varchar2, rows.Select(t => t.Attribute1).ToArray(), ParameterDirection.Input);
comm.Parameters.Add("p2", OracleDbType.Date, rows.Select(t => t.Attribute2).ToArray(), ParameterDirection.Input);
comm.ExecuteNonQuery();
// If another DML is needed, clear the parameters and the array bind count
comm.Parameters.Clear();
comm.ArrayBindCount = 0;
//
comm.Transaction.Commit();

This time the test finished in 9.81 seconds, a more than 98% improvement (roughly 0.2 ms per row instead of about 14 ms per row in the first method). A single insert statement sweeps away all the round-trip overhead caused by thousands of individual inserts.

These are the good, the bad, and the ugly methods of bulk insert (and if we skipped bind variables altogether, it could get even uglier).

Changing IP Addresses of an 11g RAC

There are three phases for changing the IP addresses of an Oracle cluster, corresponding to the three groups of networks used: public, VIP, and SCAN. Below you may find the steps to alter the IP addresses of an 11g RAC installation running on Oracle Linux 5. Be sure to follow the steps in order, because the public network changes must be done before the VIP network changes.

Changing Public IPs

1. Check current network information:
$GRID_HOME/bin/oifcfg getif
eth0  10.10.11.0   global   public
eth1  192.168.0.0   global   cluster_interconnect

2. Delete the existing interface information from OCR
$GRID_HOME/bin/oifcfg delif -global eth0/10.10.11.0

3. Add it back with the correct information
$GRID_HOME/bin/oifcfg setif -global eth0/10.10.12.0:public

4. Shut down the cluster
$ crsctl stop cluster -all

5. Modify the IP address at the network layer, in DNS, and in the /etc/hosts file to reflect the change. Files to modify/check are:
   - /etc/sysconfig/network-scripts/ifcfg-eth0
   - /etc/sysconfig/network
   - /etc/hosts

Restart the network interface to activate the changes
$ ifdown eth0
$ ifup eth0

6. Restart the cluster
$ crsctl start cluster -all

Changing VIPs

1. Check current configuration
$ srvctl config nodeapps -a
Network exists: 1/10.10.11.0/255.255.255.0/eth0, type static
VIP exists: /racnode1-vip/10.10.11.11/10.10.11.1/255.255.255.0/eth0, hosting node racnode1
VIP exists: /racnode2-vip/10.10.11.12/10.10.11.1/255.255.255.0/eth0, hosting node racnode2

2. Stop the database instance and VIP:
$ srvctl stop instance -d <db_name> -n racnode1
$ srvctl stop vip -n racnode1 -f

3. Ensure the VIP is offline and not bound to the network interface
$ crsctl stat res -t
$ ifconfig -a

4. Modify the IP address at the network layer, in DNS, and in the /etc/hosts file to reflect the change

5. Alter the VIP
$ srvctl modify nodeapps -n racnode1 -A 10.10.12.11/255.255.255.0/eth0

6. Verify the change
$ srvctl config nodeapps -n racnode1 -a
VIP exists: /racnode1-vip/10.10.12.11/255.255.255.0/eth0, hosting node racnode1

7. Start the database instance and VIP
$ srvctl start vip -n racnode1
$ srvctl start instance -d <db_name> -n racnode1

8. Ensure the VIP is online and bound to the network interface
$ crsctl stat res -t
$ ifconfig -a

9. Repeat the steps above for the other nodes in the cluster

Changing SCAN IPs

1. Update DNS with the new IP addresses. If a hosts file is used instead, change the IPs in the hosts file.

2. Stop the SCAN listener and the SCAN
$ $GRID_HOME/bin/srvctl stop scan_listener
$ $GRID_HOME/bin/srvctl stop scan
$ $GRID_HOME/bin/srvctl status scan

3. Check the current IP address(es) of the SCAN
$ $GRID_HOME/bin/srvctl config scan
SCAN name: <scan_name>, Network: 1/10.10.12.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /<scan_name>/10.10.11.15

4. Refresh the SCAN with the new IP addresses from the DNS entry:
$ $GRID_HOME/bin/srvctl modify scan -n <scan_name>

5. Check whether the SCAN has been changed
$ $GRID_HOME/bin/srvctl config scan
SCAN name: <scan_name>, Network: 1/10.10.12.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /<scan_name>/10.10.12.15

For detailed information, you can refer to My Oracle Support documents [ID 276434.1] for public and VIP network changes and [ID 952903.1] for SCAN IP changes.

Oracle R Enterprise Output Error: ORA-22621

Embedding R into the Oracle database is a terrific idea. I believe anyone practicing data analysis and/or data mining should take a look at it. It's not a mature environment yet, though, so some workarounds may be needed from time to time, such as the one I came up against recently. To be clear beforehand, I'm running Oracle R Enterprise 1.1 on an 11.2.0.3 database. To illustrate the problem, here is a demonstration:

Create an embedded R function which returns a data frame/table with three columns of type integer, decimal, and character, respectively:
begin
  sys.rqScriptCreate('Test',
     'function(){
        data.frame(a=1:10, b=seq(0.1, 1, by=0.1), c=letters[1:10])
  }');
end;
/

And call it within a query:
select a, b, c
from table(rqsys.rqEval(null,
         'select 1 a, 1 b, cast(''c'' as varchar2(30)) c from dual',
         'Test'));

Which will result with error:
ORA-22621: error transfering an object from the agent
ORA-06512: at "RQSYS.RQTABLEEVALIMPL",  line 60
22621. 00000 -  "error transfering an object from the agent"
*Cause:    Any error returned from pickler routines on the agent side.
*Action:   Contact Oracle Support.

The actual problem is, in fact, the output format. The rq*Eval() functions take the output table definition as a parameter, and this works smoothly with numeric columns, yet the same cannot be said for character-based columns. The workaround I came up with takes advantage of good old XML: if you call the function with the XML option instead of giving a table format definition, you get the result set without an error. So keep going this way and process the output, which is a CLOB containing XML:

select x.*
from table(rqsys.rqEval(null, 'XML', 'Test')) t,
     xmltable('//root/frame_obj/ROW-frame_obj' passing xmltype.createxml(t.value)
              columns a varchar2(2) path 'a/text()',
                      b varchar2(3) path 'b/text()',
                      c varchar2(1) path 'c/text()')(+) x;


Database Triggers

Database triggers are a well-suited way of monitoring your database and logging the events you'd like to know about. It's possible to capture the errors that occur and even notify yourself of exceptional situations. An example of these benefits is what I'll briefly present here: below you'll find a demonstration of a trigger which logs errors and sends a mail when an out-of-space error occurs.

First of all, start with creating the table to log into:

CREATE TABLE SERVERERROR_LOG
(
ERROR_TIME TIMESTAMP,
DB_USER VARCHAR2(30),
OS_USER VARCHAR2(30),
USER_HOST VARCHAR2(64),
CLIENT_PROGRAM VARCHAR2(48),
SQLID VARCHAR2(13),
ERROR_STACK VARCHAR2(2000)
);


The table keeps a record of the running SQL (actually the ID of the SQL, to save space, since the full SQL text can be queried separately) and the error it caused, besides client information. Here is the trigger:

CREATE OR REPLACE TRIGGER log_server_errors
AFTER SERVERERROR ON DATABASE
DECLARE
   PRAGMA AUTONOMOUS_TRANSACTION; -- needed so the COMMIT below doesn't raise ORA-04092
   osuser   VARCHAR2(30);
   machine  VARCHAR2(64);
   prog     VARCHAR2(48);
   sqlid    VARCHAR2(13);
   err_type VARCHAR2(19);
   obj_type VARCHAR2(19);
   owner    VARCHAR2(30);
   ts       VARCHAR2(30);
   obj      VARCHAR2(30);
   sub_obj  VARCHAR2(19);
BEGIN
   SELECT osuser, machine, program, sql_id
     INTO osuser, machine, prog, sqlid
     FROM gv$session
    WHERE audsid = userenv('sessionid')
      AND inst_id = userenv('instance')
      AND status = 'ACTIVE';

   INSERT INTO servererror_log VALUES
      (systimestamp, sys.login_user, osuser, machine, prog, sqlid,
       dbms_utility.format_error_stack);
   COMMIT;

   IF (SPACE_ERROR_INFO(err_type, obj_type, owner, ts, obj, sub_obj) = TRUE) THEN
      UTL_MAIL.SEND(sender     => 'oradb@xyz.com',
                    recipients => 'dba@xyz.com',
                    subject    => 'Insufficient Tablespace Size',
                    message    => err_type || ' at tablespace ' || ts);
   END IF;
END log_server_errors;


The trigger queries the gv$session view to gather client information and inserts into the log table. Then, if the error is an out-of-space error, the trigger sends an e-mail using the UTL_MAIL package. This is a pretty straightforward trigger that you can use and improve according to your needs.
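
Since the log table stores only the SQL ID, the full statement can be pulled back while the cursor is still in the shared pool. A minimal sketch (the bind value stands for an ID recorded in SERVERERROR_LOG):

```sql
select sql_fulltext
  from gv$sql
 where sql_id = :sqlid;  -- an ID taken from SERVERERROR_LOG
```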

Upgrading Oracle Enterprise Manager 12c R1 to 12c R2

As you might know, release 2 of Oracle Cloud Control 12c has been released. If you have already installed 12c R1, upgrading to R2 is a simple process. The 1-system upgrade approach is the only way to upgrade, whether you have installed Bundle Patch 1 or not. Basically, the upgrade consists of two steps: upgrading the Oracle Management Service and its repository (the installer handles both upgrades) and upgrading the agents. Here are the steps on Linux:

1. Download installation files from OEM Downloads page and unzip them. A disk space of 5.3 GB is required for decompressed installation files.

2. Stop the OMS. You can also stop the service later, when the installer prompts you to say it is still running.

$ emctl stop oms

3. Run the installer

$ ./runInstaller

As a prerequisite, the emkey should be configured properly. Run the following command:

$ emctl config emkey -copy_to_repos_from_file -repos_host <host_fqdn> -repos_port <port_number> -repos_sid <database_sid> -repos_user sysman -emkey_file /u01/app/Oracle/Middleware/wlserver_10.3.5/oms/sysman/config/emkey.ora

During the installation process, the repository database is also upgraded.

4. After the upgrade process, as the second phase, you need to upgrade the agents. Log on to OEM and go to Setup -> Manage Cloud Control -> Upgrade Agents. Click "Add", select the agents to upgrade, then submit the job.


Linux-based agents are marked as upgradable right away. However, when you check the non-upgradable agents on the Upgrade Agents page, you will see the Windows-based agents listed there, because they need an extra step: the agent software must first be updated from Setup -> Extensibility -> Self Update. Find the new versions of the Microsoft Windows agents under Agent Software and download them. After the download (you may select the "Notify me" option for the download job), apply the update. Following these steps, you should find your Windows-based agents in the upgradable agents list.

Finally, if you have upgraded Linux-based agents with a non-root user, you need to run the root.sh script located in <agent_home>/core/12.1.0.2.0/.

NFS Sharing for Linux

Sharing a directory with other Linux systems over NFS is common and handy. To explain briefly, let me separate the tasks into two parts: server side and client side. I use Oracle Linux 6 as the server and Oracle Linux 5 as the client, though the opposite combination goes through the same steps. These commands should work on any Red Hat based system.

On the server where you will create the share, edit the exports file:

$ vi /etc/exports

Add line(s) to file such as:
/share_dir *(rw,sync,fsid=0) 
and/or
/shares/nfs 192.168.101.11(ro,sync,fsid=0)

Pay attention not to use extra spaces on a line; they may cause problems. Here, instead of "*" - which means the share is open to everyone - you can use the client's IP or FQDN as shown above. Also, if it's going to be a read-only share, write "ro" instead of "rw". The "fsid=0" parameter is there to avoid the "mount.nfs4: Operation not permitted" error.

The permissions of the directory, "/share_dir" and/or "/shares/nfs", should grant read and execute to others for a read-only share, or read, write, and execute to others for a writable one. Otherwise you will face a "mount.nfs4: Permission denied" error when mounting the share. So alter the permissions of the directory:

$ chmod 757 /share_dir
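
If you want to double-check what those mode bits mean, you can reproduce the same mode on a scratch directory and read it back (a sketch; /tmp/share_demo is a made-up path):

```shell
# Apply the same mode to a scratch directory and read it back
mkdir -p /tmp/share_demo
chmod 757 /tmp/share_demo
# 757 = rwxr-xrwx: owner full access, group read/execute, others full access
stat -c '%a' /tmp/share_demo   # prints 757
```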

Restart the related services (if your server is Linux 5, use portmap instead of portreserve):

$ /etc/init.d/portreserve restart
$ /etc/init.d/nfs restart

You can check shares by:

$ showmount -e <server_ip> 
$ exportfs -avr

On the client simply mount the share by:

$ mount -t nfs4 <server_ip_or_host_name>:/ /mnt

Notice that even though the shared directory is "/share_dir" or "/shares/nfs", we mount "/" only; with NFSv4, "fsid=0" marks the export as the root of the server's pseudo file system.
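
To make the mount persistent across reboots on the client, the same mount can also go into /etc/fstab (a sketch; "server01" and "/mnt" are placeholders for your server name and mount point):

```
server01:/   /mnt   nfs4   defaults   0 0
```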

Retrieving Index Usage Statistics

Indexes are old friends of every DBA. To check out what your friends are up to in Oracle, the basic way is to enable monitoring for the index:

alter index IX_FOO monitoring usage;

Then you can query V$OBJECT_USAGE to see whether it is being used. After a period of time meaningful to you, you can stop monitoring until the next time:

alter index IX_FOO nomonitoring usage;
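
A quick way to check the usage flag mentioned above (a sketch; note that V$OBJECT_USAGE lists only indexes owned by the current user):

```sql
select index_name, monitoring, used, start_monitoring, end_monitoring
  from v$object_usage
 where index_name = 'IX_FOO';
```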

However, what is lacking in this way of monitoring is statistics. You can see whether an index has been used but not how many times or how it is used. Fortunately, Oracle provides many V$ views. The query below returns basic statistics about your indexes: how they are used (range scan, full scan, etc.), how many times, and when they were last used:

select sp.object_name as index_name,
       sp.options,
       sum(sa.executions) as executions#,
       max(sp.timestamp) as last_used
from v$sql_plan sp, v$sqlarea sa
where sa.address = sp.address
  and sa.hash_value = sp.hash_value
  and sp.operation = 'INDEX'
  and sp.object_owner in ('SCOTT')
group by sp.object_name, sp.options
order by 3 desc;

With Exadata, we've started to use fewer indexes and have even dropped some existing ones, because we want to use Exadata features like storage indexes and smart scans instead and let the cells do their job. On Exadata we prefer to see fast full scans instead of range scans. To see Exadata-specific statistics related to indexes, you can add the columns prefixed with "io_cell" from the V$SQLSTATS or V$SQLAREA views, such as:

select sp.object_name as index_name,
       sp.options,
       sum(sa.executions) as executions#,
       max(sp.timestamp) as last_used,
       sum(sa.disk_reads) as disk_reads,
       sum(sa.direct_writes) as direct_writes,
       sum(sa.io_cell_offload_eligible_bytes)/1024/1024 as offloadable_mb,
       sum(sa.io_cell_offload_returned_bytes)/1024/1024 as offloaded_mb,
       sum(sa.rows_processed) as rows_processed
from v$sql_plan sp, v$sqlarea sa
where sa.address = sp.address
  and sa.hash_value = sp.hash_value
  and sp.operation = 'INDEX'
  and sp.object_owner in ('SCOTT')
group by sp.object_name, sp.options
order by 3 desc;

There are many other statistics you can find in V$ views, so you can improve the query according to your needs.