Recently, we learned how to read Excel workbooks using the Microsoft Office COM APIs. As you may already know, the COM APIs are slow when performing such operations, so in this post we will look at a faster way to read the content.
In this article, we will learn how to read Excel 2007 workbooks using the NPOI library, which is freely available for use in your applications.
Basic concepts of the NPOI library
Before starting with the code, you should have some basic knowledge of the NPOI library. NPOI is the .NET version of the Apache POI Java project, which is hosted at http://poi.apache.org. It is a free, open source project that helps you read and write Word, Excel, and PowerPoint document files. You can find the source code of the NPOI project at https://github.com/tonyqus/npoi. The library can be downloaded from NuGet at this URL: https://www.nuget.org/packages/NPOI.
Reading Excel 2007 document format using NPOI
To read the Excel 2007 file format, i.e. files with the .xlsx extension, you will need the NPOI.XSSF.Extractor.XSSFExcelExtractor class. It extends the base POIXMLTextExtractor class and implements the IExcelExtractor interface. Its Text property returns the document content, covering all the sheets.
To read the content of such an Excel file, create an instance of XSSFExcelExtractor by passing the file path to the constructor. Optionally, you can include or exclude cell comments, header and footer information, and sheet names in the output. Then read the Text property of the instance to get the file text. The code is shared below for easy reference. You may have to handle any exceptions that occur while accessing or reading the content.
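Here is a minimal sketch of those steps in C#, assuming the NPOI NuGet package is referenced. The class name ExcelTextReader and the method name ReadXlsxText are made up for illustration. Instead of passing the path directly, this sketch opens the file as an XSSFWorkbook and hands that to the extractor, because the set of constructor overloads can differ between NPOI versions. The optional output switches are left commented out, since their member names are assumptions that you should verify against the version you install.

```csharp
using System;
using System.IO;
using NPOI.XSSF.Extractor;
using NPOI.XSSF.UserModel;

public static class ExcelTextReader
{
    // Hypothetical helper: returns the text of all sheets in an .xlsx file.
    public static string ReadXlsxText(string filePath)
    {
        // Open the file read-only; the stream is closed when the block exits.
        using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            // Load the workbook, then wrap it in the text extractor.
            var workbook = new XSSFWorkbook(stream);
            var extractor = new XSSFExcelExtractor(workbook);

            // Optional output switches described in the text. The member names
            // below are assumptions; check your NPOI version before enabling them.
            // extractor.IncludeSheetNames = true;
            // extractor.IncludeCellComments = false;

            // Text holds the extracted content of every sheet in the workbook.
            return extractor.Text;
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.Error.WriteLine("Usage: pass the path of an .xlsx file.");
            return;
        }

        try
        {
            Console.WriteLine(ReadXlsxText(args[0]));
        }
        catch (IOException ex)
        {
            // Covers a missing file, a locked file, and similar I/O failures.
            Console.Error.WriteLine("Could not read the file: " + ex.Message);
        }
        catch (Exception ex)
        {
            // Anything else, e.g. the file is not a valid .xlsx workbook.
            Console.Error.WriteLine("Could not parse the workbook: " + ex.Message);
        }
    }
}
```

Reading Text inside the using block keeps the example simple, and the two catch blocks only hint at the exception handling you may need in a real application.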
I hope the above code helps you read the Excel 2007 file format. In the next post, we will learn how to read the Excel 97-2003 format using the free, open source NPOI library. Don't forget to read and share the posts that I publish. Have a great day!