
Generating Facebook Like Preview using Regular Expression

First of all, we need to understand what a web request is.

A web request follows a request/response model for accessing data over the internet. The client sends a request to the server, and the server returns a response based on that request. In .NET, WebRequest throws a WebException if an error occurs while accessing the resource.

For more information, refer to [https://msdn.microsoft.com/en-us/library/system.net.webrequest(v=vs.110).aspx].

What is a Regular Expression?

A regular expression is a sequence of special characters and alphanumeric characters that describes a search pattern for a particular scenario.

For example, suppose we need to check whether an entered value is in a valid amount format.
We know that a valid amount is a number followed by two decimal places, e.g. 99.99.

For that, the regular expression below can be useful. [It’s just an example, not an accurate regex: as written, it requires exactly seven digits before the decimal point.]
([0-9]{7}\.[0-9]{2})

image credit : https://regexper.com/ (licence – no modification)
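
The pattern above only matches seven digits followed by two decimals, so 99.99 itself would not pass. As a minimal sketch of a stricter check, assuming "valid amount" means one or more digits, a dot, and exactly two digits, something like the following could be used. The class and method names (AmountFormat, IsValidAmount) are hypothetical and chosen only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class AmountFormat
{
    // Assumed rule: one or more digits, a literal dot, exactly two digits.
    // ^ anchors at the start of the input and \z at its very end,
    // so nothing may come before or after the amount.
    private static readonly Regex AmountPattern =
        new Regex(@"^[0-9]+\.[0-9]{2}\z");

    public static bool IsValidAmount(string input)
    {
        return input != null && AmountPattern.IsMatch(input);
    }
}
```

With this sketch, AmountFormat.IsValidAmount("99.99") is accepted, while inputs such as "99.9" or "abc" are rejected.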

Now we have a basic understanding of WebRequest and regex (regular expressions).

Recall what Facebook does when we paste a link into the status box: it creates a preview, which consists of one or more images, the title of the website, the valid URL, and a description.

Sometimes we don’t get a description or any images. So the question is: why are they sometimes missing?

To answer this, we need to understand the approach Facebook may be using. From my study, I learned that it uses the website’s meta tags to fetch the details needed to generate the preview. If a website has no meta tags for the image or description, they are not shown in the preview.

We are going to use the same approach. In addition, we let the user enter a URL without the http:// or https:// prefix.

In the first step, we take the user’s input. When the user enters a URL, we first need to format it so that we can make a WebRequest.

Suppose the user enters “google.com”. We then have four candidate URLs, and we expect to get a WebResponse from at least one of them.

Possible URLs:
1) http://google.com
2) https://google.com
3) http://www.google.com
4) https://www.google.com
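
The list above can be built from the user’s input with a small helper. This is only a sketch: the names UrlCandidates and Build are hypothetical, and stripping an already typed scheme or “www.” prefix is an assumption added here so the prefixes are not duplicated.

```csharp
using System;
using System.Collections.Generic;

public static class UrlCandidates
{
    // Builds the four candidate URLs for a bare host such as "google.com".
    public static List<string> Build(string input)
    {
        string host = input.Trim();

        // Assumption: drop a scheme the user may have typed anyway.
        foreach (string scheme in new[] { "http://", "https://" })
        {
            if (host.StartsWith(scheme, StringComparison.OrdinalIgnoreCase))
            {
                host = host.Substring(scheme.Length);
                break;
            }
        }

        // Assumption: drop a leading "www." so it is not added twice.
        if (host.StartsWith("www.", StringComparison.OrdinalIgnoreCase))
        {
            host = host.Substring("www.".Length);
        }

        // Same order as the list in the text.
        return new List<string>
        {
            "http://" + host,
            "https://" + host,
            "http://www." + host,
            "https://www." + host
        };
    }
}
```

Calling UrlCandidates.Build("google.com") returns the four URLs in the order listed above.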

How do we make a WebRequest, and how do we check its response? Refer to the code below.

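 // making a web request (requires: using System.Net;)
 HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);

 // web response; GetResponse throws a WebException on network errors
 // and on error status codes such as 404 or 500
 using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
 {
    if (response.StatusCode == HttpStatusCode.OK)
    {
       //valid url
    }
 }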
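
Because a failed request raises a WebException rather than returning a status code, the check has to sit inside a try/catch when several candidate URLs are tried in turn. The following is a rough sketch of that idea, not a definitive implementation. The names UrlProbe, RespondsOk and FirstValidUrl are hypothetical, and passing the probe in as a delegate is a design choice made here so the selection logic can be exercised without network access.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class UrlProbe
{
    // Returns true when the server answers the request with 200 OK.
    public static bool RespondsOk(string url)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch (WebException)
        {
            // Network failure or an error status such as 404.
            return false;
        }
        catch (UriFormatException)
        {
            // The string is not a well-formed URI.
            return false;
        }
    }

    // Returns the first candidate accepted by the probe, or null if none is.
    public static string FirstValidUrl(IEnumerable<string> candidates,
                                       Func<string, bool> probe)
    {
        foreach (string url in candidates)
        {
            if (probe(url))
            {
                return url;
            }
        }
        return null;
    }
}
```

In real use the probe would be UrlProbe.RespondsOk, for example UrlProbe.FirstValidUrl(candidateUrls, UrlProbe.RespondsOk), where candidateUrls is the list of four URLs.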

Once we get a valid URL, we need to download the HTML page as a string, on which we can apply regex and extract the information we need to generate the preview.

To download the HTML page as a string, refer to the code below, taken from [ http://www.mikesdotnetting.com/article/49/how-to-read-a-remote-web-page-with-asp-net-2-0 ].

 // requires: using System.IO; using System.Net;
 public static string GetHtmlPage(string strURL)
        {
            HttpWebRequest objRequest = (HttpWebRequest)WebRequest.Create(strURL);
            // send a browser-like User-Agent with the request
            objRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
            // the using blocks dispose the response and the reader, so no explicit Close() is needed
            using (WebResponse objResponse = objRequest.GetResponse())
            using (var sr = new StreamReader(objResponse.GetResponseStream()))
            {
                return sr.ReadToEnd();
            }
        }

Once we have downloaded the HTML page as a string, we need to extract the information we require.
First of all, we’ll extract the meta tags. To do that, we need to build a regex that matches all the meta tags in the page.

regex : <meta[\\s]+[^>]*?content[\\s]?=[\\s\"\']+(.*?)[\"\']+.*?>
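
As a minimal sketch of how this pattern might be applied to the downloaded page, the helper below collects the content value of every meta tag it matches. The names MetaTags and ExtractContents are hypothetical. The pattern is the one above written as a C# verbatim string, and the IgnoreCase option is an assumption added so that upper-case META tags also match.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MetaTags
{
    // Group 1 captures the value of the content attribute.
    private static readonly Regex MetaContent = new Regex(
        @"<meta[\s]+[^>]*?content[\s]?=[\s""']+(.*?)[""']+.*?>",
        RegexOptions.IgnoreCase);

    // Returns the content value of each meta tag that has a content attribute.
    public static List<string> ExtractContents(string html)
    {
        var contents = new List<string>();
        foreach (Match match in MetaContent.Matches(html))
        {
            contents.Add(match.Groups[1].Value);
        }
        return contents;
    }
}
```

Passing the string returned by GetHtmlPage to MetaTags.ExtractContents yields one entry per matched meta tag, in document order.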

image credit :