상세 컨텐츠

본문 제목

Bulk Insert into SQL from C# App

C#

by 탑~! 2008. 6. 18. 16:39

본문

One of the common problems I have seen is to bulk upload data to a SQL Server database. If you have the flexibility to directly run your code in SQL, you have a ton of options. But let's say that you have to massage the data before you throw it in to the database, then you have to really know your SQL (well to do it in SQL).

Let's say you have to read data from an RSS feed, parse it and then load it into SQL. Let's assume further that this feed updates every 2 hours. It would be a trivial task to write a C# app that reads and parses the feed. One crude way to upload this data would be to do a single row insert for each data element. This would be terribly inefficient. The other option would be to use .Net framework's SqlBulkCopy class.

The basic template would be something like

private void WriteToDatabase()
{
    // get your connection string
    string connString = "";
    // connect to SQL
    using (SqlConnection connection = 
            new SqlConnection(connString))
    {
        // make sure to enable triggers
        // more on triggers in next post
        SqlBulkCopy bulkCopy = 
            new SqlBulkCopy
            (
            connection, 
            SqlBulkCopyOptions.TableLock | 
            SqlBulkCopyOptions.FireTriggers | 
            SqlBulkCopyOptions.UseInternalTransaction,
            null
            );

        // set the destination table name
        bulkCopy.DestinationTableName = this.tableName;
        connection.Open();

        // write the data in the "dataTable"
        bulkCopy.WriteToServer(dataTable);
        connection.Close();
    }
    // reset
    this.dataTable.Clear();
    this.recordCount = 0;
}

The above code snippet shows you the API usage. But before you actually do that, you need to follow a couple of steps to setup your data table.

First, let's look at a simple record structure (as reflected in C# class):

using System;
using System.Data;
using System.Configuration;

/// <summary>
/// Summary description for MyRecord
/// </summary>
public class MyRecord
{
    public int TestInt;
    public string TestString;

	public MyRecord()
	{
	}

    public MyRecord(int myInt, string myString)
    {
        this.TestInt = myInt;
        this.TestString = myString;
    }
}

Now, let's start dissecting the class that we will use to upload the data:

using System;
using System.Data;
using System.Collections.Generic;

using System.Data.SqlClient;
using System.Configuration;
using System.IO;

namespace SqlExamples.FileLoader
{
/// <summary>
/// Summary description for BulkUploadToSql
/// </summary>
public class BulkUploadToSql
{
    private List<MyRecord> internalStore;

    protected string tableName;
    protected DataTable dataTable = new DataTable();
    protected int recordCount;
    protected int commitBatchSize;

Note that we have an internal List data structure as well as the DataTable. This is redundant and you can avoid using the internalStore if your application does not need to massage the data before it's sent to SQL.

I then define 2 private constructors. The reason is that we want to use the factory pattern to return our object to the caller.

    private BulkUploadToSql(
        string tableName, 
        int commitBatchSize)
    {
        internalStore = new List<MyRecord>();

        this.tableName = tableName;
        this.dataTable = new DataTable(tableName);
        this.recordCount = 0;
        this.commitBatchSize = commitBatchSize;

        // add columns to this data table
        InitializeStructures();
    }

    private BulkUploadToSql() : 
        this("MyTableName", 1000) {}

Note that we set the commit batch size. This is a very important factor that needs to be fine tuned for your database. What this defines is the number of records that we would send in one shot to the database.

The next step is to Initialize the data table with columns that reflect the actual table structure.

private void InitializeStructures()
{
    this.dataTable.Columns.Add("TI", typeof(Int32));
    this.dataTable.Columns.Add("TS", typeof(string));
}

I then provided a factory method to load data into my internal structure from a data source. In the example code below, I use a Stream, but this can be any data source from where you wish to populate your data.

public static BulkUploadToSql Load(Stream dataSource)
{
    // create a new object to return
    BulkUploadToSql o = new BulkUploadToSql();
    
    // replace the code below
    // with your custom logic 
    for (int cnt = 0; cnt < 10000; cnt++)
    {
        MyRecord rec = 
            new MyRecord
            (
            cnt, 
            string.Format("string{0}", cnt)
            );
        o.internalStore.Add(rec);
    }

    return o;
}

This would make sure that our class is properly initialized and loaded with data. Once the caller has a valid object, they can now "Flush" the data as shown below:

public void Flush()
{
    // transfer data to the datatable
    foreach (MyRecord rec in this.internalStore)
    {
        this.PopulateDataTable(rec);
        if (this.recordCount >= this.commitBatchSize)
            this.WriteToDatabase();
    }
    // write remaining records to the DB
    if (this.recordCount > 0)
        this.WriteToDatabase();
}

private void PopulateDataTable(MyRecord record)
{
    DataRow row;
    // populate the values
    // using your custom logic
    row = this.dataTable.NewRow();

    row[0] = record.TestInt;
    row[1] = record.TestString;

    // add it to the base for final addition to the DB
    this.dataTable.Rows.Add(row);
    this.recordCount++;
}

In the example above, the call to Flush() actually massages the data (and at the same time loads it into the actual data table). As I mentioned before, you can actually skip this step if your application does not require massaging.

As a example of an app that uses this class:

using System;
using System.Collections.Generic;
using System.Text;
using SqlExamples.FileLoader;
using System.IO;

namespace DemoApp
{
class Program
{
    static void Main(string[] args)
    {
        using (Stream s = 
            new StreamReader(@"C:\TestData.txt"))
        {
            BulkUploadToSql myData = 
                BulkUploadToSql.Load(s);
            myData.Flush();
        }
    }
}
}

As always, this is JUST demo code to explain a concept. This is NOT production quality code and please make sure to follow the coding guidelines in your team.

출처 : http://blog.theple.com/starhunt/folder/4.html?cmode=View

'C#' 카테고리의 다른 글

이벤트 동시 등록하는 방법  (0) 2008.06.18
ADO.NET DB연결 문자열  (0) 2008.06.18
.Net 2.0 원격 DB 디버깅  (0) 2008.04.04
C#으로 Transactional Object에 Object Pooling을 사용하도록 구성하기  (0) 2008.04.01
진법변환  (0) 2008.04.01

관련글 더보기