Title: Grab images from a Web page in Visual Basic .NET
Description: This example shows how to grab images from a Web page in Visual Basic .NET. It uses a WebBrowser control to go to a Web page. It reads that control's Document property to get an HtmlDocument. That object's Images property returns information about the page's images. Finally, the program uses a WebClient to download the images. The program also provides some other handy features, such as the ability to view the images and select those that should be saved into files.
Keywords: grab images, Web, HTML, Visual Basic .NET, VB.NET, WebBrowser, WebClient, download, download images, screen scraping, HtmlDocument
Categories: VB.NET, Utilities, Internet
 
This description only touches on the most interesting parts of the program. Download it to see the details.

You can click the links on the WebBrowser to navigate to a Web page, or enter a URL and click the Go button to navigate there. The following code shows how the program navigates.

 
' Navigate to the entered URL.
Private Sub btnGo_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnGo.Click
    Try
        wbrWebSite.Navigate(txtUrl.Text)
    Catch ex As Exception
        MessageBox.Show("Error navigating to web site " & _
            txtUrl.Text & vbCrLf & ex.Message, _
            "Navigation Error", _
            MessageBoxButtons.OK, _
            MessageBoxIcon.Error)
    End Try
End Sub
 
After you have navigated to the desired Web page, click the List Images button to execute the following code. The program first removes all controls from the flpPictures FlowLayoutPanel control by setting their Parent properties to Nothing. This detaches the controls from the panel. Once nothing else refers to them, they become eligible for garbage collection, although their window handles and images are not released until they are finalized or explicitly disposed.

Next the code gets the WebBrowser's Document property, which returns an HtmlDocument object representing the Web page, and loops through the HtmlDocument's Images collection. For each image, it casts the element's DomElement to an mshtml.HTMLImg (which requires a reference to the Microsoft.mshtml assembly) and reads its src property, which contains the image's URL.

The code makes a new PictureBox, calls the GetPicture function to download the image into the PictureBox, and places the PictureBox in the flpPictures FlowLayoutPanel. That control automatically arranges its children in rows, wrapping when necessary and displaying scroll bars if the pictures don't all fit. Notice that the code saves the image's URL in the PictureBox's Tag property.

Finally the code registers the pic_Click event handler to catch the PictureBox's Click events.

This routine also contains code to let you see new PictureBoxes as they are created and to stop the loop before it finishes. See the code for details.

 
' Show the images from the URL.
Private m_Running As Boolean = False
Private Sub btnListImages_Click(ByVal sender As _
    System.Object, ByVal e As System.EventArgs) Handles _
    btnListImages.Click
    If btnListImages.Text = "List Images" Then
        Me.Cursor = Cursors.WaitCursor
        btnListImages.Text = "Stop"
        btnGo.Enabled = False
        btnSaveImages.Enabled = False
        Application.DoEvents()

        ' Remove old images.
        For i As Integer = flpPictures.Controls.Count - 1 _
            To 0 Step -1
            flpPictures.Controls(i).Parent = Nothing
        Next i

        ' List the images on this page.
        Dim doc As System.Windows.Forms.HtmlDocument = _
            wbrWebSite.Document
        m_Running = True
        For Each element As HtmlElement In doc.Images
            Dim dom_element As mshtml.HTMLImg = _
                element.DomElement
            Dim src As String = dom_element.src

            Dim pic As New PictureBox()
            pic.BorderStyle = BorderStyle.Fixed3D
            pic.SizeMode = PictureBoxSizeMode.AutoSize
            pic.Image = GetPicture(src)
            pic.Parent = flpPictures
            pic.Tag = src
            tipFileName.SetToolTip(pic, src)

            AddHandler pic.Click, AddressOf pic_Click

            Application.DoEvents()

            If Not m_Running Then Exit For
        Next element
        m_Running = False

        btnListImages.Text = "List Images"
        btnGo.Enabled = True
        btnSaveImages.Enabled = True
        Me.Cursor = Cursors.Default
    Else
        m_Running = False
    End If
End Sub
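
The listing above clears the panel by setting each child's Parent to Nothing, which leaves cleanup to the garbage collector. As an alternative, the following is a minimal sketch of a helper that disposes each removed control and its image right away. The module and method names (PanelCleanup, ClearAndDispose) are hypothetical and are not part of the original program.

```vbnet
Imports System.Drawing
Imports System.Windows.Forms

Module PanelCleanup
    ' Hypothetical helper: remove and dispose every child of a container.
    Public Sub ClearAndDispose(ByVal container As Control)
        For i As Integer = container.Controls.Count - 1 To 0 Step -1
            Dim child As Control = container.Controls(i)
            container.Controls.RemoveAt(i)

            ' A PictureBox does not dispose the Image assigned to it,
            ' so release the image explicitly before the control.
            Dim pic As PictureBox = TryCast(child, PictureBox)
            If pic IsNot Nothing AndAlso pic.Image IsNot Nothing Then
                Dim old_image As Image = pic.Image
                pic.Image = Nothing
                old_image.Dispose()
            End If

            child.Dispose()
        Next i
    End Sub
End Module
```

Under these assumptions, the "Remove old images" loop could be replaced by a call such as PanelCleanup.ClearAndDispose(flpPictures).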
 
The GetPicture function uses a WebClient to download a picture. It calls the WebClient's DownloadData method to fetch the image as a byte array and wraps those bytes in a MemoryStream. It then uses the Image class's FromStream method to convert the stream into an image.
 
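' Get the picture at a given URL.
' Requires Imports System.IO and Imports System.Net.
Private Function GetPicture(ByVal url As String) As Image
    Try
        url = Trim(url)
        ' Add a scheme only if the URL has neither http:// nor https://.
        Dim lower_url As String = url.ToLower()
        If Not (lower_url.StartsWith("http://") OrElse _
            lower_url.StartsWith("https://")) Then
            url = "http://" & url
        End If
        Using web_client As New WebClient()
            ' The stream must stay open for the life of the Image,
            ' so the MemoryStream is deliberately not closed here.
            Dim image_stream As New _
                MemoryStream(web_client.DownloadData(url))
            Return Image.FromStream(image_stream)
        End Using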
    Catch ex As Exception
        MessageBox.Show("Error downloading picture " & _
            url & vbCrLf & ex.Message, _
            "Download Error", _
            MessageBoxButtons.OK, _
            MessageBoxIcon.Error)
    End Try
    Return Nothing
End Function
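
GetPicture adds a scheme to the URL by string concatenation, which only helps when the value is a host name without "http://". If a src value were ever relative to the page, it would need to be resolved against the page's address instead. The following is a minimal sketch of that idea using the Uri class. The names UrlHelper and ResolveImageUrl are hypothetical, and the sketch assumes the page's address is available (for example, from the WebBrowser's Url property).

```vbnet
Imports System

Module UrlHelper
    ' Hypothetical helper: turn an image's src value into an absolute URL.
    ' page_url is the address of the page that contains the image.
    ' Returns Nothing if the two values cannot be combined.
    Public Function ResolveImageUrl(ByVal page_url As String, _
        ByVal src As String) As String
        Dim base_uri As New Uri(page_url)
        Dim result As Uri = Nothing
        If Uri.TryCreate(base_uri, src.Trim(), result) Then
            Return result.AbsoluteUri
        End If
        Return Nothing
    End Function
End Module
```

With this sketch, a src that is already absolute (including one that starts with "https://") comes back unchanged, while a value such as "images/pic.png" is combined with the page's directory.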
 
After you display the images, click on any that you don't want to save. When you click on a PictureBox, the following code sets that control's Parent property to Nothing. That removes it from the FlowLayoutPanel, which automatically rearranges its remaining children.
 
' Remove the clicked PictureBox.
Private Sub pic_Click(ByVal sender As System.Object, ByVal _
    e As System.EventArgs)
    Dim pic As PictureBox = DirectCast(sender, PictureBox)
    pic.Parent = Nothing
End Sub
 
When you click the Save Images button, the following code loops through the PictureBoxes that remain in the FlowLayoutPanel. It takes each image's URL from the PictureBox's Tag property, uses the part after the last slash as the file name, and saves the control's image in a file with that name, picking the image format from the file's extension.
 
' Save the images that have not been removed.
Private Sub btnSaveImages_Click(ByVal sender As _
    System.Object, ByVal e As System.EventArgs) Handles _
    btnSaveImages.Click
    Dim dir_name As String = txtDirectory.Text
    If Not dir_name.EndsWith("\") Then dir_name &= "\"

    For Each pic As PictureBox In flpPictures.Controls
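        ' Skip pictures that failed to download.
        Dim bm As Bitmap = pic.Image
        If bm Is Nothing Then Continue For

        ' Use the part of the URL after the last slash as the file name.
        Dim filename As String = pic.Tag
        filename = _
            filename.Substring(filename.LastIndexOf("/") + _
            1)

        ' Get the extension in lowercase, or "" if there is none.
        Dim ext As String = ""
        Dim dot_pos As Integer = filename.LastIndexOf(".")
        If dot_pos >= 0 Then ext = _
            filename.Substring(dot_pos).ToLower()
        Dim full_name As String = dir_name & filename

        Select Case ext
            Case ".bmp"
                bm.Save(full_name, Imaging.ImageFormat.Bmp)
            Case ".gif"
                bm.Save(full_name, Imaging.ImageFormat.Gif)
            Case ".jpg", ".jpeg"
                bm.Save(full_name, Imaging.ImageFormat.Jpeg)
            Case ".png"
                bm.Save(full_name, Imaging.ImageFormat.Png)
            Case ".tif", ".tiff"
                bm.Save(full_name, Imaging.ImageFormat.Tiff)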
            Case Else
                MessageBox.Show( _
                    "Unknown file type " & ext & _
                    " in file " & filename, _
                    "Unknown File Type", _
                    MessageBoxButtons.OK, _
                    MessageBoxIcon.Error)
        End Select
    Next pic

    Beep()
End Sub
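
The save routine treats everything after the last "/" in the URL as the file name, so a URL that ends with a query string (such as "logo.png?size=2") would produce an awkward name and an unrecognized extension. The following is a minimal sketch of two helpers that separate the file name from the rest of the URL and map its extension to an image format. The names ImageFileHelper, FileNameFromUrl, and FormatFromFileName are hypothetical and are not part of the original program.

```vbnet
Imports System
Imports System.Drawing.Imaging
Imports System.IO

Module ImageFileHelper
    ' Hypothetical helper: get the file name part of an absolute image
    ' URL, ignoring any query string or fragment.
    Public Function FileNameFromUrl(ByVal url As String) As String
        Dim image_uri As New Uri(url)
        Return Path.GetFileName(image_uri.LocalPath)
    End Function

    ' Hypothetical helper: map a file name's extension to an ImageFormat.
    ' Returns Nothing for extensions this sketch does not recognize.
    Public Function FormatFromFileName(ByVal filename As String) _
        As ImageFormat
        Select Case Path.GetExtension(filename).ToLowerInvariant()
            Case ".bmp"
                Return ImageFormat.Bmp
            Case ".gif"
                Return ImageFormat.Gif
            Case ".jpg", ".jpeg"
                Return ImageFormat.Jpeg
            Case ".png"
                Return ImageFormat.Png
            Case ".tif", ".tiff"
                Return ImageFormat.Tiff
            Case Else
                Return Nothing
        End Select
    End Function
End Module
```

A caller could then save with bm.Save(full_name, format) when the returned format is not Nothing, and show the "Unknown file type" message otherwise.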
 
This program still has a few weak spots. The error handling isn't perfect. For example, you can click the Save Images button even if you haven't listed any images. The program simply saves zero files, so it does no harm, but it would make sense to disable that button until some images are displayed.

The program also downloads images when it needs them rather than pulling them from cache so it isn't as fast as it might be. It also probably cannot save images that are generated on the fly by the Web server.
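
One possible way to reduce repeated downloads is to ask the WebClient to prefer cached copies. The following is a minimal sketch that sets the WebClient's CachePolicy property. The names CachedDownload and MakeCachingClient are hypothetical. Whether this actually reuses the copy that the WebBrowser already fetched is an assumption: it depends on the runtime's cache support and on the cache headers the Web server sends.

```vbnet
Imports System.Net
Imports System.Net.Cache

Module CachedDownload
    ' Hypothetical helper: make a WebClient that prefers cached copies.
    Public Function MakeCachingClient() As WebClient
        Dim client As New WebClient()
        ' CacheIfAvailable asks for a cached copy when one exists and
        ' goes to the server otherwise.
        client.CachePolicy = _
            New RequestCachePolicy(RequestCacheLevel.CacheIfAvailable)
        Return client
    End Function
End Module
```

Under these assumptions, GetPicture could create its WebClient with CachedDownload.MakeCachingClient() instead of New WebClient().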

 
 
Copyright © 1997-2010 Rocky Mountain Computer Consulting, Inc.   All rights reserved.