Thread: scraper data too large for variable how to cycle past unwanted data

  1. #1
    Member
    Real Name
    Steven Greer
    Join Date
    Jun 2008
    Location
    Tampa, Florida
    Posts
    352

    I built a white pages (wp) scraper that ran well for quite a while. Now, however, the HTML page carries so much metadata and unneeded text that it no longer seems to fit into a variable, so the data gets cut off and lost and the scraper has stopped working.

    It has two parts: a script, and a function I wrote that the script runs.
    Code:
    'Date Created: 29-Sep-2012 08:15:44 PM
    'Last Updated: 28-Mar-2017 11:46:41 AM
    'Created By  : Steven
    'Updated By  : steve
    dim shared tbl as P
    dim shared p3 as waitdialog
    'dim count as N
    'a5.command("VIEW_TRACE")
    
    tbl = table.open("catdata")
    
    
    tbl.fetch_first()
    
    WHILE .NOT. tbl.fetch_eof()
        DIM shared street as C
        DIM shared zipcode as C
        dim shared city as c
        dim shared state as c
        p3.create(1,"repeating")
        p3.set_message("Automated Scrape Currently Scraping ")
        p3.Pause()
        p3.Set_Color("red")
        p3.resume()
        street = tbl.Street
        zipcode = tbl.Zipcode
        city = tbl.City
        state = tbl.State
        sdate = tbl.Saledate
        autowp()
        tbl.fetch_next()
    END WHILE
    
    
    
    tbl.close()
    p3.Close()
    
    
    DIM Shared varP_leadsbr as P
    DIM layout_name as c 
    layout_name = "leadsbr@c:\srgypgrabber\whiteleads.ddd"
    DIM tempP as p
    'Get pointer to existing window. In case layout_name is qualified with a dictionary name, extract up to first @. In case formname has spaces, normalize it
    tempP=obj(":"+object_Name_normalize(word(layout_name,1,"@")))
    'Test if pointer is valid
    if is_object(tempP) then 
    	'Test if pointer refers to a form or browse
    	if tempP.class() = "form" .or. tempP.class() = "browse" then 
    		'If so, then activate the already open window
    		tempP.activate()
    		
    	else
    		'Window is not already open, so open it
    		varP_leadsbr = :Browse.view(layout_name)
    		
    
    	end if
    else 
    	varP_leadsbr = :Browse.view(layout_name)
    	
    
    end if
    That is the script that gets the data to search with.

    This is the function I wrote for it to run:

    Code:
    'Date Created: 14-Mar-2015 01:57:58 AM
    'Last Updated: 31-Aug-2017 10:21:30 PM
    'Created By  : Steven
    'Updated By  : steve
    FUNCTION autowp AS C ( )
        'tbl is the shared table pointer opened by the calling script
        dim shared tbl as p

        dim dmain as c
        dim cc as c
        dim street as c
        dim zipcode as c
        dim shared city as c
        dim shared state as c
        dim shared saledate as c
        'dim shared casenum as c
        'casenum=tbl.casenum
        saledate=tbl.saledate
        street=tbl.Street
        zipcode=tbl.Zipcode
        city=tbl.city
        state=tbl.state

        'Build the search URL from the current record and download the page
        'dmain="https://people.yellowpages.com/whitepages/address?street=1000+park+avenue&qloc=fairmont+nc+28340"
        dmain = "https://people.yellowpages.com/whitepages/address?street=" +alltrim(street)+"+" +"&qloc=" +alltrim(city)+"+" +alltrim(state)+"+"+alltrim(zipcode)
        cc = http_get_page2(dmain)

        dim srgstring as c
        dim reslts as c
        dim co as c
        co = ""
        co = EXTRACT_STRING( cc,"<div class=\"result-top-left-detail\">","</strong>",1)
        dim coe as c
        coe= extract_string(co,"<strong>"," ")

        'Continue only when the first word after <strong> in the result header is "page"
        if coe = "page" then
            'reslts="a"
            dim reslts1 as C
            dim phtrim as c
            dim nmtrim as c
            dim zptrim as c
            dim zipdone as c
            dim st1 as c
            dim en1 as c
            st1="class=\"\""
            en1="<div class=\"address-map\">"
            dim en2 as c
            en2="</a>"
            'Cut out the result section, then the name and phone number within it
            reslts1 = EXTRACT_STRING( cc,st1,en1 )
            nmtrim = EXTRACT_STRING(reslts1,">",en2)
            phtrim =EXTRACT_STRING(reslts1,"(","<")
            'zptrim =EXTRACT_STRING(reslts1,"<div class=\"address\">","</div> ")
            'zipdone = right(zptrim,5)

            'Save the match as a new record in the whiteleads table
            dim ltbl as p
            ltbl=table.open("whiteleads")
            ltbl.enter_begin()
            ltbl.Listed_Name = nmtrim
            ltbl.Street = street
            ltbl.City = city
            ltbl.State = state
            ltbl.Zipcode = zipcode
            ltbl.Phone = phtrim
            ltbl.Saledate = saledate
            'ltbl.Casenum = casenum
            ltbl.enter_end(.t.)
            ltbl.close()
        else
        end if
    END FUNCTION
    I tested it in the Interactive Window and concluded that the data was getting cut off because of the variable size.
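
    A quick way to tell truncation apart from a delimiter that simply no longer matches is to look at the size of the downloaded text and at whether the expected marker is in it at all. A minimal Interactive Window sketch, reusing the sample URL from the commented-out line in the function; the marker text address-map is only an assumption about what the page contains:

    Code:
    'Download the sample page into a character variable
    dim cc as c
    cc = http_get_page2("https://people.yellowpages.com/whitepages/address?street=1000+park+avenue&qloc=fairmont+nc+28340")
    'Number of characters actually held in the variable
    ? len(cc)
    'Position of the marker in the text
    ? at("address-map", cc)

    If AT() reports a position greater than zero, the marker made it into the variable, which would point at the delimiters rather than at the variable size.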

  2. #2
    "Certified" Alphaholic
    Real Name
    Stan Mathews
    Join Date
    Apr 2000
    Location
    Bowling Green, KY
    Posts
    24,698

    You could use SAVE_TO_FILE() to capture the page.

    save_to_file(http_get_page2(dmain),drive_path_name_ext)

    Then loop through the text in the file.
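
    A rough sketch of that approach, assuming a hypothetical output file in the c:\srgypgrabber folder used earlier in the thread, and assuming the lines of interest contain the text address-map. dmain is the URL variable from the autowp function; page_file, fp and txtline are names made up for this example:

    Code:
    'Hypothetical file name; any writable path would do
    dim page_file as c
    page_file = "c:\srgypgrabber\wp_page.txt"
    save_to_file(http_get_page2(dmain), page_file)

    'Read the saved page back one line at a time
    dim fp as p
    dim txtline as c
    fp = file.open(page_file, FILE_RO_SHARED)
    while .not. fp.eof()
        txtline = fp.read_line()
        'Write out only the lines that mention the section of interest
        if at("address-map", txtline) > 0 then
            trace.writeln(txtline)
        end if
    end while
    fp.close()

    Reading line by line keeps each string small, so the unwanted parts of the page can be skipped without holding the whole page in one variable.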
    There can be only one.

  3. #3
    Steven Greer

    I think that is what I did the first time I wrote the code, and then I got lazy while speeding it up. I'll have to look through my old code and see if I can find it.

    I like the Highlander siggy, Stan. Thanks, I'm going to try running it that way.

  4. #4
    Stan Mathews

    I don't think the problem is with the variable. You are checking for the section between

    class="" and <div class="address-map"

    but the page is now using <div class="address-map full-profile--present">

    as the end of that section.

  5. #5
    Steven Greer

    I looked at that too, but I thought the string scanner would match the first part of the delimiter and stop at the space after map.

  6. #6
    Stan Mathews

    I misstated.

    Your ending delimiter is

    <div class="address-map">

    The closing > is what is causing the extract to fail. The space after map would be sufficient if not for the >.

  7. #7
    Steven Greer

    It's still not grabbing the results. There is a bunch of metadata I have to figure out how to get past.

  8. #8
    Stan Mathews

    I was wrong that <div class="address-map" would work because the ending delimiter contains the second quote mark which is not present in the variable contents. In other words, extract_string is trying to find <div class="address-map", not <div class="address-map. You'll need to change to use

    <div class="address-map full-profile--present">
    or
    <div class="address-map full-profile--present"

    Code:
    en1="<div class=\"address-map full-profile--present\">"
    ? EXTRACT_STRING( cc,st1,en1 )
    = Aisa Hajdarevic</a>
                      <div class="address">
                   1229 Shannon WAY #WA, Bowling Green, KY 42101            </div>
                      <div class="phone">
                   (270) 999-9999            </div>
    Phone number changed to protect the innocent.
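
    To compare the two ending delimiters without fetching a live page, the same calls can be tried on a short made-up string in the Interactive Window. This is only a sketch: the sample markup, the name and the phone number in it are invented for illustration and merely mimic the shape of the real page, and old_en1 and new_en1 are names chosen for this example:

    Code:
    dim sample as c
    dim st1 as c
    dim old_en1 as c
    dim new_en1 as c
    sample = "<a class=\"\">Jane Doe</a><div class=\"phone\">(555) 555-0100</div><div class=\"address-map full-profile--present\">"
    st1 = "class=\"\""
    old_en1 = "<div class=\"address-map\">"
    new_en1 = "<div class=\"address-map full-profile--present\">"
    'Old delimiter: the text <div class="address-map"> never occurs in sample
    ? extract_string(sample, st1, old_en1)
    'New delimiter: occurs exactly as written at the end of sample
    ? extract_string(sample, st1, new_en1)

    Only the second call uses an ending delimiter that actually appears in the string, which is the situation on the changed page.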

  9. #9
    Steven Greer

    Winner, winner, chicken dinner. It's working again. Thanks, Stan, for all the help.
