Categories
Mastering Development

Social Media and email extraction code needs updating

I am a bit stuck on this code that I put together. I took two codes and put them together, although this code does work for the most part there are still some issues that I can not fix.

WHAT THE CODE DOES

Opens a urls in Sheet3 column A, extracts the emails from the site and any social media links. Then puts the results from column B onwards

enter image description here

Problems

1) TWO code have been put together, but can it be written smarter as currently it looks for emails first and then it looks for social media urls, so if there was 100 urls, it would look for emails in them first and then GO BACK and look for social media urls again, hence the 100 are processed 200 times. When it should find emails and social media urls AT THE SAME TIME

2) Browser has been set to false but still shows in TASK BAR

enter image description here

3) When the process is finished a userform called, “Complete” should show. but it does NOT.

4) As the browser would show, I had to put some code at the end to close the browser, but it is not closing the browser. idealy I would not want the browser to show.

5) Both IE and “MSXML2.ServerXMLHTTP.6.0” are used. As I stated this was TWO code that I put together and therefore not the best. I think “MSXML2.ServerXMLHTTP.6.0” would be much faster. However I could not change the first half of the code to only use “MSXML2.ServerXMLHTTP.6.0” as I have always used IE in the past. Please could someone advise of what to do

What I have tried so far.

I have tried to place the code in several variations, non worked. I was able to fix the part where the active sheet did not have to be sheet3. So it does not matter what sheet I am on it will GET and Paste the results into sheet3.

Private Sub SocialEmailStartBut_Click()
''Extract emails only from urls
Dim ie As InternetExplorer
Dim url As String
Dim x As Long
Dim HTML As HTMLDocument
Dim ElementCol As Object
Dim Worksheet As Sheet3
Set HTML = CreateObject("htmlfile")

Set ie = CreateObject("internetexplorer.application")
    ie.Visible = False '###### set to false BUT shows in task bar #####

x = 2 '''start row
Do While Sheet3.Cells(x, 1) <> ""
    url = Sheet3.Cells(x, 1)

ie.navigate url
    Do While ie.readyState <> READYSTATE_COMPLETE
    DoEvents
Loop

Set HTML = ie.document
    Set ElementCol = HTML.getElementsByTagName("a")

For Each link In ElementCol
    If InStr(link, "mailto:") Then
        Sheet3.Cells(x, 2).Value = link
        Sheet3.Cells(x, 2) = Right(link, Len(link) - InStr(link, ":"))
        Sheet3.Cells(x, 2).Columns.AutoFit
    End If
Next
x = x + 1
Loop

'#################################################################
'###################Social URL Extractor##########################
'#################################################################
Dim counter As Long
Dim website As Range
Dim row As Long
Dim continue As Boolean
Dim respHead As String

''''The row where website addresses start
row = 2
    continue = True

Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    Do While continue
'''Could set this to first cell with URL then OFFSET columns to get next web site
Set website = ThisWorkbook.Worksheets("Sheet3").Range("A" & row)
        If Len(website.Value) < 1 Then
            continue = False
        Exit Sub
        End If

        If website Is Nothing Then
            continue = False
        End If

'''Debug.Print website
    With http
        On Error Resume Next
        .Open "GET", website.Value, False
        .send

'''If the website sent a valid response to our request, URLS ARE IN COLUMN A
    If Err.Number = 0 Then
        If .Status = 200 Then
            HTML.body.innerHTML = http.responseText
                Set links = HTML.getElementsByTagName("a")
'''COLUMN C = FACEBOOK
   For Each link In links
      If InStr(UCase(link.outerHTML), "FACEBOOK") Then
      website.Offset(0, 2).Value = link.href
   End If
'''COLUMN D = INSTAGRAM
    If InStr(UCase(link.outerHTML), "INSTAGRAM") Then
        website.Offset(0, 3).Value = link.href
    End If
'''COLUMN E = TWITTER
    If InStr(UCase(link.outerHTML), "TWITTER") Then
        website.Offset(0, 4).Value = link.href
    End If
'''COLUMN F = YOUTUBE
    If InStr(UCase(link.outerHTML), "YOUTUBE") Then
        website.Offset(0, 5).Value = link.href
    End If
'''COLUMN G = LinkedIn
    If InStr(UCase(link.outerHTML), "LINKEDIN") Then
        website.Offset(0, 6).Value = link.href
    End If
Next
    End If
    Set website = Nothing
Else
'''Debug.Print "Error loading page IN COLUMN H"
    website.Offset(0, 8).Value = "Error with website address"
    End If
On Error GoTo 0
 End With
row = row + 1
Loop

Complete.Show '#### THIS FORM DOES NOT SHOW AT THE END ####
''' CLOSE BROWSER
ie.Quit
Set ie = Nothing
Set ElementCol = Nothing
End Sub

Leave a Reply

Your email address will not be published. Required fields are marked *