NPOI 2.0 - Automatic Format/Type Inference for Excel 2003/2007

This is the first post in a series about NPOI 2.0.

NPOI is jumping from version 1.2.5 directly to 2.0. A big reason for this is that it now supports the Office 2007 formats (.xlsx and .docx). As a result, NPOI 2.0 can seem a little more complicated than the previous version, because it now exposes several namespaces, including HSSF (Excel 2003), XSSF (Excel 2007) and XWPF (Word 2007).

To identify the Excel format automatically when reading a file, so that you don't have to work out the file type yourself, NPOI provides a very convenient class: NPOI.SS.UserModel.WorkbookFactory.

public class WorkbookFactory
{
    public static IWorkbook Create(POIFSFileSystem fs){ ... }
    public static IWorkbook Create(OPCPackage pkg){ ... }
    public static IWorkbook Create(Stream inputStream){ ... }
    public static IFormulaEvaluator CreateFormulaEvaluator(IWorkbook workbook){ ... }
}

The difference between the first two Create overloads, which take a POIFSFileSystem and an OPCPackage, is the container format each one reads: POIFSFileSystem reads the OLE2 compound document format, while OPCPackage reads the OOXML package format (a ZIP-based container defined by the Open Packaging Conventions). They are used to read Excel 2003 (.xls) and Excel 2007 (.xlsx) files respectively, and because they are the underlying container libraries, they are not limited to .xls and .xlsx. For example, you can also read Thumbs.db with POIFSFileSystem. You can download a sample project here.

Now that you know the difference between POIFSFileSystem and OPCPackage, it should be clear what these two methods do. You need to know what kind of document you are opening, create the matching file system or package, and pass it to WorkbookFactory, which returns an HSSFWorkbook or an XSSFWorkbook accordingly. Because both classes implement the IWorkbook interface, in most cases you do not need to care which class you get back unless you use an advanced or Excel 2007-specific feature.
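As a rough sketch of the first two overloads, the helper below picks the container type from the file extension and hands the matching object to WorkbookFactory. The class name ExplicitFormatDemo and the method CountSheets are made up for illustration, and deciding by extension is an assumption of this example rather than something NPOI requires.

```csharp
using System;
using System.IO;
using NPOI.OpenXml4Net.OPC;
using NPOI.POIFS.FileSystem;
using NPOI.SS.UserModel;

static class ExplicitFormatDemo
{
    // Hypothetical helper: the caller decides the container type up front.
    public static int CountSheets(string path)
    {
        bool isXlsx = string.Equals(Path.GetExtension(path), ".xlsx",
                                    StringComparison.OrdinalIgnoreCase);

        using (FileStream fs = File.OpenRead(path))
        {
            IWorkbook wb = isXlsx
                ? WorkbookFactory.Create(OPCPackage.Open(fs))       // OOXML package
                : WorkbookFactory.Create(new POIFSFileSystem(fs));  // OLE2 file system

            // Work with the workbook while the stream is still open.
            return wb.NumberOfSheets;
        }
    }
}
```

Calling ExplicitFormatDemo.CountSheets("demo2003.xls") or ExplicitFormatDemo.CountSheets("demo2007.xlsx") goes through a different overload each time, yet the caller only ever sees an IWorkbook.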

The third overload is the most important one to introduce today: it does automatic inference. As long as you pass it a Stream, it works out whether the content is .xls or .xlsx and returns the appropriate workbook instance.
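To give a feel for how this kind of inference can work, here is a small sketch that looks at the first bytes of a stream. OLE2 compound files and ZIP archives (the OOXML container) each start with a well-known signature. This illustrates the general technique only and is not presented as NPOI's internal code; FormatSniffer and ContainerKind are names invented for this example, and the stream is assumed to be seekable.

```csharp
using System;
using System.IO;

enum ContainerKind { Unknown, Ole2, Zip }

static class FormatSniffer
{
    // OLE2 compound files start with these eight bytes.
    static readonly byte[] Ole2Magic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    // ZIP archives start with "PK" followed by 0x03 0x04.
    static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    public static ContainerKind Detect(Stream seekable)
    {
        long start = seekable.Position;
        byte[] header = new byte[8];
        int read = 0, n;
        // Read may return fewer bytes than requested, so loop until done.
        while (read < header.Length &&
               (n = seekable.Read(header, read, header.Length - read)) > 0)
        {
            read += n;
        }
        seekable.Position = start; // rewind so a real parser sees the whole file

        if (StartsWith(header, read, Ole2Magic)) return ContainerKind.Ole2;
        if (StartsWith(header, read, ZipMagic)) return ContainerKind.Zip;
        return ContainerKind.Unknown;
    }

    static bool StartsWith(byte[] data, int length, byte[] prefix)
    {
        if (length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Note that the signature only identifies the container: an OLE2 file could just as well be a .doc, and a ZIP file could be a .docx, so a real reader still has to look inside before deciding it is a workbook.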

The last method returns an IFormulaEvaluator, the object that evaluates formulas. A similar inference principle applies here: both HSSF and XSSF have their own formula evaluation classes, namely HSSFFormulaEvaluator and XSSFFormulaEvaluator, and the factory picks the one that matches the workbook.
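A minimal sketch of how the evaluator might be used follows. The class FormulaDemo and its EvaluateAt method are hypothetical names for this example; the code assumes that IFormulaEvaluator.Evaluate returns a CellValue, as it does in the NPOI builds I have worked with.

```csharp
using System;
using NPOI.SS.UserModel;

static class FormulaDemo
{
    // Hypothetical helper: evaluates one cell and returns the computed value,
    // or null when the row or cell does not exist.
    public static CellValue EvaluateAt(IWorkbook wb, int sheetIndex, int rowIndex, int columnIndex)
    {
        // The factory chooses the HSSF or XSSF evaluator for us.
        IFormulaEvaluator evaluator = WorkbookFactory.CreateFormulaEvaluator(wb);

        IRow row = wb.GetSheetAt(sheetIndex).GetRow(rowIndex);
        ICell cell = row == null ? null : row.GetCell(columnIndex);
        if (cell == null)
        {
            return null;
        }
        return evaluator.Evaluate(cell);
    }
}
```

For a cell holding a numeric formula, the result can then be read from the returned object's NumberValue property, regardless of whether the workbook came from an .xls or an .xlsx file.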

Here is a complete program that puts WorkbookFactory to work:

using System;
using System.IO;
using NPOI.SS.UserModel;

class Program
{
    static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.WriteLine("missing argument: Excel file path (both 2003 and 2007 are supported)");
            return;
        }
        // WorkbookFactory inspects the stream and returns the matching workbook type.
        using (FileStream fs = File.OpenRead(args[0]))
        {
            IWorkbook wb = WorkbookFactory.Create(fs);
            Console.WriteLine("Value of Cell B2: " + wb.GetSheetAt(0).GetRow(1).GetCell(1));
        }
        Console.Read();
    }
}

This program assumes that the workbook file exists on the file system and that its first sheet has a value in cell B2; otherwise an error will be raised. The following is a screenshot of the workbook in question.

Figure 1 - Excel Workbook with a single Sheet and value 5 in Cell B2
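If you would rather not rely on cell B2 being present, the lookup can be guarded. GetRow and GetCell return null for rows and cells that were never created, so a defensive reader might look like the sketch below. SafeCellReader and ReadCell are illustrative names, not part of NPOI.

```csharp
using NPOI.SS.UserModel;

static class SafeCellReader
{
    // Returns the cell's text form, or null when the sheet, row or cell is absent.
    public static string ReadCell(IWorkbook wb, int sheetIndex, int rowIndex, int columnIndex)
    {
        if (sheetIndex < 0 || sheetIndex >= wb.NumberOfSheets)
        {
            return null; // no such sheet
        }

        IRow row = wb.GetSheetAt(sheetIndex).GetRow(rowIndex);
        if (row == null)
        {
            return null; // the row was never created
        }

        ICell cell = row.GetCell(columnIndex);
        return cell == null ? null : cell.ToString();
    }
}
```

With this helper, SafeCellReader.ReadCell(wb, 0, 1, 1) reads cell B2 of the first sheet and simply yields null instead of failing when the sheet is empty.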

The command lines used for testing were as follows:

Excel 2007: WorkbookFactoryDemo.exe demo2007.xlsx

Excel 2003: WorkbookFactoryDemo.exe demo2003.xls

In both cases the program should print the value of cell B2, which is 5.

If we analyze the above program, it seems we are not accessing anything from the XSSF and HSSF namespaces, but wb is in fact an instance of HSSFWorkbook or XSSFWorkbook. In other words, the real instance type is transparent to the user. This is where the IWorkbook interface shines.
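When you do need to know which implementation you were given, for example before using an Excel 2007-only feature, a plain type check is enough. The following is a small sketch; WorkbookKind and Describe are names chosen for this example.

```csharp
using NPOI.HSSF.UserModel;
using NPOI.SS.UserModel;
using NPOI.XSSF.UserModel;

static class WorkbookKind
{
    // Reports which concrete workbook class sits behind the IWorkbook reference.
    public static string Describe(IWorkbook wb)
    {
        if (wb is HSSFWorkbook)
        {
            return "Excel 2003 (.xls)";
        }
        if (wb is XSSFWorkbook)
        {
            return "Excel 2007 (.xlsx)";
        }
        return "unknown";
    }
}
```

Everything else in the program can keep talking to IWorkbook; only the code that needs a format-specific feature has to cast.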

For now you can study the source code here. Many new NPOI 2.0 features are not documented yet, so keep an eye on this blog for the next installment in this series.