Windows 7 - Speed of using OLE for automation

Asked By Christian on 25-Jan-07 06:28 PM
Hi,

I need some information on the speed of OLE with MFC. I have created some
functions to parse an MS Word document. These functions extract all the
text in the document body, including tables, headers and footers.
Additionally, style, font size, page and line number are extracted.
What I am doing is to get all the paragraphs of the document and handle
every one of them (getting the text, getting style and font
information...). I have to do it this way because I need to find some
specific parts of the text characterized by style, font or parts of the
text.
The challenge is that I need to parse large documents with my
functions. The documents are about 1.5MB/80 pages. At the moment, it
takes around 20-30 minutes to check these documents. This is definitely
too much. Is there any hint on how the speed of OLE may be increased?
I used a profiler to see where the time is lost.
Most of the time is used for calls to
COleDispatchDriver::InvokeHelper
COleDispatchDriver::CreateDispatch
COleDispatchDriver::~COleDispatchDriver

These three functions take around 90% of the overall time.

Is there a way to improve the speed?

Or do you have any experience of how long it should take to parse such
documents?
Any help is appreciated.
Thanks.

Christian




Cindy M. replied on 30-Jan-07 11:44 AM
Hi Christian,

If you automate Word, it's going to be slow. If you have to "walk" the
document, it's going to be slow.

You might get an increase in speed if you pack the automation code that
works closely with the Word object model into VBA procedures. Most likely
in a template that your app loads as an Addin object, then unloads when
it's finished. "Native" code runs significantly faster when dealing with
an Office application because it doesn't have to cross the OLE
boundaries.

If you didn't need the page and line numbers, I'd say save the documents
as RTF or (Word 2003 and later) XML, then extract the information from
them without opening Word. But if you need line and page numbers, you
have no choice because Word has to lay out the document dynamically for
this information.
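
For the case where page and line numbers are not needed, the offline
approach described above could look roughly like the following. This is
only a minimal sketch in Python, assuming the document was saved in the
Word 2003 XML format; the function name extract_paragraphs and the SAMPLE
document are made up for illustration. It reads the paragraph style, the
first explicitly set font size and the text of each paragraph, and it does
not resolve sizes inherited from styles.

```python
import xml.etree.ElementTree as ET

# Namespace of the Word 2003 XML format (WordprocessingML 2003).
W = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_paragraphs(xml_text):
    """Return one (style, font_size_pt, text) tuple per w:p element.

    Hypothetical helper: style and size are None when the paragraph does
    not set them explicitly (inheritance from styles is not resolved).
    """
    root = ET.fromstring(xml_text)
    result = []
    # iter() also reaches paragraphs nested in tables or section wrappers.
    for p in root.iter(f"{{{W}}}p"):
        style_el = p.find(f"{{{W}}}pPr/{{{W}}}pStyle")
        style = style_el.get(f"{{{W}}}val") if style_el is not None else None

        # w:sz is stored in half-points; take the first one in the paragraph.
        size_el = p.find(f".//{{{W}}}rPr/{{{W}}}sz")
        size = int(size_el.get(f"{{{W}}}val")) / 2 if size_el is not None else None

        # A paragraph's text is spread over one or more w:t elements.
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        result.append((style, size, text))
    return result


# Tiny made-up document, just enough to exercise the function.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:rPr><w:sz w:val="32"/></w:rPr><w:t>Overview</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t>Plain </w:t></w:r>
      <w:r><w:t>text.</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""

for style, size, text in extract_paragraphs(SAMPLE):
    print(style, size, text)
```

Because nothing here talks to Word, no OLE call is made per paragraph.
The same idea should carry over to C++ with any XML parser.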

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 17 2005)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow-up question or
reply in the newsgroup and not by e-mail :-)



Christian replied on 04-Feb-07 06:22 AM
Hi Cindy,

Thanks for your reply. Unfortunately, I have to walk through the
whole document to check if there is "interesting" text in it. And I
also need the page and line numbers. I realized that the functions to get
this information are extremely slow. I have no idea why, but it seems that
they take half of the time.
So it seems that there is no more room for optimization.
Thank you.

Christian