From Screen Scraping to Raw HTML: A Powerful Application Development Platform for the Desktop Too



In response to our recent article "Using Alpha Five Web Development Tools to Crawl and Scrape Other Websites", we received the following blog post from independent Alpha developer Jay Talbott, showing how "screen scraping" can add powerful functionality to desktop applications as well as web and mobile apps. As readers of this blog know, here at Alpha Software we are committed to creating an application development platform that allows you to build applications once and run them on the desktop, web and mobile devices. As we move swiftly into the mobile-plus era, we understand that, for business, mobile devices are not going to replace desktops and laptops.

From Screen Scraping to Raw HTML

by Jay Talbott

I have a client whose business requires them to enter scores of customers' addresses each day. This task can introduce errors at many levels: the customer can write the address illegibly, the data entry person can fail to read the address properly, or the data can be mistyped as it is entered. Wrong addresses cause many problems and become costly, so the client asked about address verification. Who better to ask than the US Postal Service? The first inquiry to the USPS uncovered an expensive DVD with limited address verification capability. This software contains valid ranges of addresses for specific streets in their respective cities, but does not validate any particular address. In other words, the software will confirm that addresses on N. Jackson Street in a particular city may range from 50 to 1890, but it will not tell you that 102 N. Jackson Street does not exist.

We needed more specific software. The USPS website has a page which will verify ZIP Codes here. This page allows the user to input a street address, the city and state, and the page returns a validated ZIP Code+4. The input page looks like this:



and the resulting page after clicking on the Submit button looks like this:



Notice the section below entitled: Full Address in Standard Format. This gives the street address, city, state, and ZIP+4.

A nonexistent address (102 N. Jackson St., Arlington, VA) returns this:



This page states that the address “may be Non-Deliverable”.

So, the challenge was two-fold: input the address automatically, and extract the results automatically.

I first decided to "screen scrape" the web site by opening the USPS web page in an ActiveX browser and filling in the address programmatically. The first working iteration looked something like this:
dim global vc_address1 as c = "100 N. Jackson St."
dim global vc_address2 as c = ""
dim global vc_city as c = "Arlington"
dim global vc_state as c = "VA"
dim global vc_zip as c = "22201"
dim edit as P
dim edit.object as P
dim edit.class as C
dim edit.events as C
dim shared varC_result as C
dim edit_url as C = "www.msn.com"
edit_url="http://zip4.usps.com/zip4/zcl_0_results.jsp?visited=1&pagenumber=0&firmname=&address2="+\
vc_address2+"&address1="+vc_address1+"&city="+vc_city+"&state="+vc_state+"&urbanization=&zip5="+vc_zip
edit.class = "shell.explorer"
ok_button_label = "&OK"
cancel_button_label = "&Cancel"
varC_result = ui_dlg_box("Address Validation for "+vc_address1+", "+vc_city+","+vc_state+" "+vc_zip,<<%dlg%
{startup=init}
{region}
{activex=200,50edit?.f.};
{endregion};
{line=1,0};
{region}
<*15=ok_button_label!OK> {endregion};
%dlg%,<<%code%
if a_dlg_button = "init" then
a_dlg_button = ""
hourglass_cursor(.t.)
if edit_url <> "" then
on error goto edit_error
edit.object.navigate(edit_url)
on error goto 0
end if
hourglass_cursor(.f.)
end if
if a_dlg_button = "edit_go" then
a_dlg_button = ""
hourglass_cursor(.t.)
if edit_url <> "" then
on error goto edit_error
edit.object.navigate(edit_url)
on error goto 0
end if
hourglass_cursor(.f.)
end if
end
edit_error:
ui_msg_box("Error","Invalid URL.",UI_STOP_SYMBOL)
end
%code%)

This script looks like this in an Xdialog box:



This script opens a web page in an Xdialog box. The address is hard-coded into the variables, but can easily be transferred from a form. This feature can be extremely useful in many cases, but here it had two problems. First, it popped up a web page even if the address was valid, which only served to slow data entry down. Second, it would not return the ZIP+4 into the database, so an alternative script was needed. The alternative script doesn't use an ActiveX browser control at all, but uses Alpha Five's http_get_page2() function to get the text of a web page.
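
The scripts in this article build the request URL by concatenating the address fields directly into the query string. Street addresses contain spaces and punctuation, so it can be worth encoding each field before sending it. The following is a minimal sketch of that idea in Python rather than Xbasic, shown only as an illustration. The endpoint and parameter names are copied from the scripts in this article, and the function name build_lookup_url is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint copied from the article's scripts; it may no longer be served by USPS.
BASE_URL = "http://zip4.usps.com/zip4/zcl_0_results.jsp"


def build_lookup_url(address1, address2, city, state, zip5):
    """Return the lookup URL with every field encoded for use in a query string.

    build_lookup_url is a hypothetical helper; the parameter names and their
    order mirror the query string used in the article's scripts.
    """
    params = [
        ("visited", "1"),
        ("pagenumber", "0"),
        ("firmname", ""),
        ("address2", address2),
        ("address1", address1),
        ("city", city),
        ("state", state),
        ("urbanization", ""),
        ("zip5", zip5),
    ]
    # urlencode replaces spaces with "+" and percent-encodes unsafe characters.
    return BASE_URL + "?" + urlencode(params)


url = build_lookup_url("100 N. Jackson St.", "", "Arlington", "VA", "22201")
print(url)
```

With the sample address from the first script, the spaces in the street address are replaced with "+" signs, so the query string no longer contains raw spaces.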

What we finally ended up with were the following scripts:

Script 1 plays on the OnDepart event of the ZIP field on the form:
dim global vc_address1 as c
dim global vc_city as c
dim global vc_state as c
dim global vc_zip as c
dim t as p
t=table.current(1)
vc_address1 = t.cstreet
vc_city = t.ccity
vc_state = t.cstate
vc_zip = t.czip
script_play("address_validation")
if t.mode_get() <> 1 then
t.change_begin()
t.czip = vc_zip
else
t.czip = vc_zip
end if
parentform.refresh_fields()

Script 2 is the Address_Validation script referred to in Script 1 (above):
dim global vc_address1 as c
dim global vc_address2 as c
dim global vc_city as c
dim global vc_state as c
dim global vc_zip as c
url="http://zip4.usps.com/zip4/zcl_0_results.jsp?visited=1&pagenumber=0&firmname=&address2="+\
vc_address2+"&address1="+vc_address1+"&city="+vc_city+"&state="+vc_state+"&urbanization=&zip5="+vc_zip
result=http_get_page2(url,.f.)
select
case "td headers=\"full\""$result
msg="Valid address!"
scanner=stringscanner.create(result)
scanner.SkipToString("td headers=\"full\"")
scanner.SkipToString(">")
scanner.SkipOverString(">")
scanner.SkipToString("  ")
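scanner.SkipOverString("  ")
' NOTE: the delimiter on the next line was lost when this page was converted to text
' (it rendered as a line break). It is assumed here to be an HTML line-break tag;
' check the source of the USPS results page for the exact form.
vc_zip=scanner.ScanToString("<br />")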
vc_zip=alltrim(vc_zip)
msg=msg+crlf()+"Zip is "+vc_zip
case "td headers=\"non\""$result
msg="Non-deliverable address!"+crlf()+"Please correct !"
ui_msg_box("Note:",msg,ui_stop_symbol)
end
case else
msg="Program error. Cannot tell if address is deliverable."
ui_msg_box("Note:",msg,ui_stop_symbol)
end select

This script relies on the http_get_page2 function to read the contents of a web page without opening a browser window. The http_get_page2 function returns the contents of the web page into a text variable (in this case we are calling the variable “result”). The stringscanner methods are then used to perform the operations needed on that text. The stringscanner object “provides high performance string manipulation functions” according to the Alpha help file, and you can see from the script above that it allows us to get to the exact location in the “result” variable and pick out the ZIP+4 very easily and quickly.
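
To make the scanning steps concrete, here is a rough Python analogue of the skip-and-scan sequence in Script 2. It is a sketch, not the Alpha Five implementation. The sample markup, the ZIP+4 value in it, the "&nbsp;&nbsp;" separator and the "<br" terminator are all assumptions made for illustration, and the real USPS page may differ. The function name extract_zip is hypothetical.

```python
def extract_zip(page_text):
    """Mimic the scanner steps from Script 2; return the ZIP text or None."""
    # Skip to the cell that holds the standardized address.
    pos = page_text.find('td headers="full"')
    if pos == -1:
        return None
    # Skip to, then over, the ">" that closes the opening <td ...> tag.
    pos = page_text.find(">", pos)
    if pos == -1:
        return None
    pos += 1
    # Skip to, then over, the separator in front of the ZIP (assumed markup).
    separator = "&nbsp;&nbsp;"
    pos = page_text.find(separator, pos)
    if pos == -1:
        return None
    pos += len(separator)
    # Scan up to the next line-break tag (assumed terminator).
    end = page_text.find("<br", pos)
    if end == -1:
        return None
    return page_text[pos:end].strip()


# Made-up sample markup and ZIP+4, for illustration only.
SAMPLE = (
    '<td headers="full">100 N JACKSON ST<br />'
    "ARLINGTON VA&nbsp;&nbsp;22201-1234<br /></td>"
)
print(extract_zip(SAMPLE))
```

Each step returns None as soon as a marker is missing, so a change in the page layout shows up as a missing value rather than as a wrong ZIP written to the record.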

The select ... case statement determines whether the text comes from a web page that validates the address (the td headers attribute contains the word “full”) or from a web page that rejects the address (the td headers attribute contains the word “non”). The second and third screen shots (above) show why we are using those criteria.
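
The three-way decision described above can be sketched as a small function. This is again Python rather than Xbasic. The two marker strings come from Script 2, while the function name classify_response and the returned labels are hypothetical.

```python
def classify_response(page_text):
    """Return "valid", "non-deliverable" or "unknown" for a results page."""
    # The first matching test wins, as in the select ... case block in Script 2.
    if 'td headers="full"' in page_text:
        return "valid"
    if 'td headers="non"' in page_text:
        return "non-deliverable"
    # Neither marker was found, for example because the page layout changed.
    return "unknown"


print(classify_response('<td headers="non">may be Non-Deliverable</td>'))
```

Keeping an explicit "unknown" branch matters for a scraper like this one: if the site changes its markup, the script reports a program error instead of silently treating the address as valid.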

The above scripts are used in conjunction with a ZIP code lookup table, which automatically puts in the city and state when the ZIP code is put into the field.

This helped save my client the $900 software fee from the USPS, and ensured that the client would never have to update the software. It also made data entry seamless and accurate, only flagging the user if the address was invalid.

A short video of the scripts in action is available below:



Jay is an independent software developer. You can reach him at jay@jamestalbott.com






About Author

Chris Conroy

Chris Conroy runs digital programs for Alpha Software.
