What is screen scraping?

From Screen Scraper Studio


Definition

According to Wikipedia, screen scraping is "a technique in which a computer program extracts data from the display output of another program. The key element that distinguishes screen scraping from regular parsing is that the output being scraped was intended for final display to a human user, rather than as input to another program, and is therefore usually neither documented nor structured for convenient parsing."

The challenge

Wikipedia goes on to explain why scraping is difficult from a technological standpoint: "Screen scraping is generally considered an ad-hoc, inelegant technique, often used only as a 'last resort' when no other mechanism is available. Aside from the higher programming and processing overhead, output displays intended for human consumption often change structure frequently. Humans can cope with this easily, but computer programs will often crash or produce incorrect results."

Prior to ScreenScraper SDK, screen scraping solutions relied on running OCR on screenshots. OCR is traditionally slow and error-prone (the best accuracy rates are 95% for typewritten documents). It is also poorly suited to application screens, because many UI elements interfere with the OCR algorithms and produce undesired results, and it is extremely expensive.

Another screen scraping challenge is coping with windows or controls that change their size and position. Say you need to get the invoice number from your CRM app, where it is displayed as a label in the Customer Invoice screen. You cannot use fixed screen coordinates, because users might move the CRM main window. You cannot use coordinates relative to the window's client area, because the label might reflow when users resize the invoicing window. And you cannot use the label's window handle (hwnd), because it changes between runs of the app.

The solution

The original idea behind Screen Scraper was to intercept and analyze Windows GDI API calls in order to detect the text that an application writes to a given region of the screen. While the idea is clean and simple, the implementation is not trivial. It took us 5 years of continuous development to achieve the solid performance of today's versions. Our stress test regularly performs 1 million consecutive screen scrapings without any crash or degradation in system performance.
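The idea of matching intercepted text-drawing calls against a screen region can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not hook any real GDI function, and the `TextCall` and `Rect` records and the `text_in_region` helper are hypothetical names invented for this example, not part of Screen Scraper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TextCall:
    """One recorded text-drawing call (hypothetical record, not a real GDI structure)."""
    x: int
    y: int
    text: str


@dataclass(frozen=True)
class Rect:
    """Screen region of interest; right and bottom edges are exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def text_in_region(calls: Iterable[TextCall], region: Rect) -> str:
    """Return the text whose drawing origin falls inside the region, in reading order."""
    hits = [c for c in calls if region.contains(c.x, c.y)]
    hits.sort(key=lambda c: (c.y, c.x))  # top to bottom, then left to right
    return " ".join(c.text for c in hits)


# Simulated log of intercepted calls; a real interceptor would fill this in.
recorded = [
    TextCall(200, 40, "INV-0042"),
    TextCall(120, 40, "Invoice number:"),
    TextCall(120, 80, "Customer: ACME"),
]
invoice_row = Rect(left=100, top=30, right=400, bottom=60)
print(text_in_region(recorded, invoice_row))
```

Because the text arrives as strings with coordinates rather than as pixels, no character recognition is needed in this model; the hard part in practice is the interception itself, which the sketch leaves out.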

What about other technologies that do not use GDI to render text?

We support PDF, Flash, Flex, Java, WPF, FoxPro, and HTML. For each of these technologies we had to create a connector that understands its internal document object model. This approach is more limited in scope because text layout information is not exposed, but it has the advantage of extracting the entire text from most UI controls, and it works even with scrolling or hidden windows.

Does it cover all scenarios?

As a last resort, we use OCR technology to scrape screens that display text as images. We took Google's Tesseract engine and tweaked it to work with screen fonts. This is not a trivial task in itself, as OCR engines work best on scanned paper. We were able to reach 99.5% accuracy at the character level.
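To put character-level accuracy in perspective, the following sketch estimates how often a whole field is read without a single wrong character. It assumes that errors on individual characters are independent, which is a simplification made for this example and not a claim about how the figures above were measured; `field_success_rate` is a hypothetical helper.

```python
def field_success_rate(char_accuracy: float, field_length: int) -> float:
    """Probability that every character of a field is read correctly.

    Assumes independent per-character errors (a simplifying assumption).
    """
    if not 0.0 <= char_accuracy <= 1.0:
        raise ValueError("char_accuracy must be between 0 and 1")
    if field_length < 0:
        raise ValueError("field_length must not be negative")
    return char_accuracy ** field_length


# Compare the two character-level accuracy figures quoted in this article
# for a 10-character field such as an invoice number.
for accuracy in (0.95, 0.995):
    rate = field_success_rate(accuracy, 10)
    print(f"{accuracy:.1%} per character: {rate:.1%} of 10-character fields fully correct")
```

Under this assumption, 95% character accuracy yields a fully correct 10-character field only about 60% of the time, while 99.5% raises that to about 95%.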

What about text position changing?

This is where the UIElement SDK comes in. It identifies windows or controls by a text ID calculated from immutable attributes of the window or control, such as its title and class. You can think of it as a moniker used to recognize a running instance of a UI object.
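The general idea of an ID derived from immutable attributes can be sketched as follows. The hashing scheme, the `Window` record, and the `stable_id` and `find_by_id` helpers are assumptions made for illustration; the actual ID format used by UIElement is not described here.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Window:
    """Hypothetical snapshot of a window; hwnd differs on every run."""
    hwnd: int
    window_class: str
    title: str


def stable_id(window_class: str, title: str) -> str:
    """Derive a text ID from attributes that do not change between runs."""
    key = f"{window_class}\x00{title}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]


def find_by_id(windows: Iterable[Window], target_id: str) -> Optional[Window]:
    """Return the first window whose attributes produce target_id, or None."""
    for window in windows:
        if stable_id(window.window_class, window.title) == target_id:
            return window
    return None


# Design time: compute and save the ID once.
design_time = Window(hwnd=0x1A2B, window_class="InvoiceForm", title="Customer Invoice")
saved_id = stable_id(design_time.window_class, design_time.title)

# A later run: handles have changed, but class and title have not.
next_run = [
    Window(hwnd=0x77F0, window_class="MainFrame", title="CRM"),
    Window(hwnd=0x9C44, window_class="InvoiceForm", title="Customer Invoice"),
]
match = find_by_id(next_run, saved_id)
print(match is not None and match.hwnd != design_time.hwnd)
```

The saved ID keeps working although the handle is different, which is the property that screen coordinates and raw handles lack.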

Best of all, UIElement provides a single interface that works the same way across different technologies and different controls. You program against an MFC app the same way as against a Java app. All you have to do is use ScreenScraper Studio to calculate an ID when you design your scraping process; you can then use that ID to “attach” a UIElement object to the particular UI object you need to work with.
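One way to picture a single interface over several technologies is a set of per-technology connectors behind one attach step. The sketch below is a toy model only: `Connector`, `DictConnector`, `AttachedElement`, and `attach` are hypothetical names and do not reproduce the UIElement API.

```python
from typing import Dict, Iterable, Optional, Protocol


class Connector(Protocol):
    """What each technology-specific connector must offer in this sketch."""
    def find_text(self, element_id: str) -> Optional[str]: ...


class DictConnector:
    """Stand-in for a real connector (MFC, Java, ...), backed by a dictionary."""
    def __init__(self, technology: str, elements: Dict[str, str]) -> None:
        self.technology = technology
        self._elements = elements

    def find_text(self, element_id: str) -> Optional[str]:
        return self._elements.get(element_id)


class AttachedElement:
    """Uniform handle: callers never need to know which connector serves it."""
    def __init__(self, element_id: str, connector: Connector) -> None:
        self._element_id = element_id
        self._connector = connector

    def get_text(self) -> str:
        text = self._connector.find_text(self._element_id)
        if text is None:
            raise LookupError(f"element {self._element_id!r} is no longer available")
        return text


def attach(element_id: str, connectors: Iterable[Connector]) -> AttachedElement:
    """Bind an ID to whichever connector currently recognizes it."""
    for connector in connectors:
        if connector.find_text(element_id) is not None:
            return AttachedElement(element_id, connector)
    raise LookupError(f"no connector recognizes {element_id!r}")


connectors = [
    DictConnector("MFC", {"invoice-label": "INV-0042"}),
    DictConnector("Java", {"status-bar": "Ready"}),
]
# The calling code is identical for both technologies.
print(attach("invoice-label", connectors).get_text())
print(attach("status-bar", connectors).get_text())
```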
