ETL with Bulk Insert Using ODP.Net
12/12/2012 01:48:00 AM
Gönderen Mete Karar
As a part of an ETL process it is likely to load data from some flat files. And in some situations, data in the file may not be well-formed. So data need to be cleaned/transformed before loading into database. To achieve this goal, developing your own tool could be much more practical. Let's say, you decided to write your code in C#. In that case, you need to use ODP.Net library to connect Oracle. I'd like to show you three ways of loading data using ODP.Net.
First one is row-by-row basis. Pseudo code representing the case is as below:
For each row in the file
Read row;
Do transformations of the row;
Insert row;
loop;
To execute insert statements, I will use the method below. It takes advantage of bind variable to avoid hard parsing.
private void ExecuteNonQuery(string commandText, Dictionary<string, object> parameters)
{
using (OracleCommand comm = new OracleCommand(commandText))
{
if (orclConn.State != System.Data.ConnectionState.Open)
orclConn.Open();
comm.Connection = this.orclConn;
foreach (KeyValuePair<string, object> kvp in parameters)
comm.Parameters.Add(kvp.Key, kvp.Value);
comm.ExecuteNonQuery();
}
}
When I tested the code above for a file with 46000 rows and 24 columns on a test database. It took 641.26 seconds to complete.
Second method is almost same with the first one accept it corrects a mistake of it. This method will commit only once. The code above doesn't contain a commit statement because OracleCommand objects commits automatically, in other words auto commit is on by default. Which means in the first method I've committed for 46000 times. To correct the situation by turning auto commit down, my coding will be:
Begin a transaction: OracleCommand comm = GetTransaction();
For each row in the file
Read row;
Do transformations of the row;
Insert row: ExecuteNonQuery(comm, "<insert statement>", parameters);
loop
Commit transaction: comm.Transaction.Commit();
Methods referenced above are:
private OracleCommand GetTransaction()
{
OracleTransaction dbTran = this.BeginTransaction;
OracleCommand comm = new OracleCommand();
comm.Connection = this.orclConn;
comm.Transaction = dbTran;
return comm;
}
private void ExecuteNonQuery(OracleCommand comm, string commandText, Dictionary<string, object> parameters)
{
comm.CommandText = commandText;
foreach (KeyValuePair<string, object> kvp in parameters)
comm.Parameters.Add(kvp.Key, kvp.Value);
comm.ExecuteNonQuery();
}
With this method, test took 604.04 seconds, faster than the first one a little bit. Why the difference is so minor? Because in Oracle, commits take almost the same time for a row or a million rows inserted. Redo buffer is already flushed probably many times since LGWR process flushes redo:
- Every three seconds,
- When someone commits,
- When switching log file,
- When redo buffer gets full 1/3 of it or has 1 MB of cached redo log data
So what is left for commit is triggering one more flush. Actually what is expensive in a transaction is rollback rather than commit.
Third and the final method is using arrays as bind variables as described in the article. To implement array binding, we'll store the rows in a list and extract values from that list as arrays:
For each row in the file
Read row;
Do transformations of the row;
Add row to a list;
loop
Insert rows;
To insert rows using array binding, required code:
OracleCommand comm = GetTransantionCommand;
comm.ArrayBindCount = rows.Count;
comm.CommandText = "insert into t value(:p1,:p2)";
comm.Parameters.Add("p1", OracleDbType.Varchar2, rows.Select(t => t.Attribute1).ToArray(), ParameterDirection.Input);
comm.Parameters.Add("p2", OracleDbType.Date, rows.Select(t => t.Attribute2).ToArray(), ParameterDirection.Input);
comm.ExecuteNonQuery();
// If another DML if needed, clear parameters and array bind
comm.Parameters.Clear();
comm.ArrayBindCount = 0;
//
comm.Transaction.Commit();
These are the good, the bad and the ugly (if we skip using bind variable, it could be uglier) methods of bulk insert.
This entry was posted on October 4, 2009 at 12:14 pm, and is filed under
Oracle
. Follow any responses to this post through RSS. You can leave a response, or trackback from your own site.
Subscribe to:
Post Comments (Atom)
13/12/12 15:53
Thanks and this was very helpful at right time! SAP ABAP Online Training
13/12/12 22:47
I'm glad if I could help:)