HttpWebResponse encoding

I have an encoding problem when trying to get HTML from google.com. Please give me some advice on how to resolve it. Thanks a lot.

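// Requires: using System; using System.IO; using System.Net; using System.Text;
public string Html
    {
        get
        {
            try
            {
                var request = WebRequest.Create(Url) as HttpWebRequest;
                // Check for null before using the request (UserAgent was previously set first).
                if (request == null)
                {
                    return string.Format("Could not create object HttpWebRequest for '{0}'", Url);
                }
                request.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.83 Safari/537.1,gzip(gfe)";
                // Dispose the response and its stream once the body has been read.
                using (var response = (HttpWebResponse)request.GetResponse())
                using (var stream = response.GetResponseStream())
                {
                    // CharacterSet is taken from the Content-Type response header.
                    string charset = response.CharacterSet;
                    // Falling back to UTF-8 when no charset is reported is an assumption.
                    Encoding encoding = string.IsNullOrEmpty(charset) ? Encoding.UTF8 : Encoding.GetEncoding(charset);
                    using (var sr = new StreamReader(stream, encoding))
                    {
                        return sr.ReadToEnd();
                    }
                }
            }
            catch (Exception e)
            {
                return e.Message;
            }
        }
    }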

Here is an image as well:

Answers


The problem is that, for some reason, Google doesn't send any encoding information in the response headers for that page. If you inspect the headers using the links below (specifically the Content-Type header) and compare the first one (the page from your image) with the second, you will see that the first is missing the charset information.

http://web-sniffer.net/?url=http://www.google.com.ua/intl/ils/ads/

http://web-sniffer.net/?url=http://www.google.de/
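
The same comparison can be made in code. The following is a minimal sketch, assuming you already have the raw Content-Type header value as a string; the class and method names (ContentTypeInspector, DeclaresCharset) are hypothetical and not part of the original question.

```csharp
using System;
using System.Net.Mime;

// Hypothetical helper for illustration only.
public static class ContentTypeInspector
{
    // Returns true when the Content-Type header value carries an explicit charset parameter.
    public static bool DeclaresCharset(string contentTypeHeader)
    {
        if (string.IsNullOrWhiteSpace(contentTypeHeader))
        {
            return false; // no header value at all
        }
        try
        {
            var contentType = new ContentType(contentTypeHeader);
            return !string.IsNullOrEmpty(contentType.CharSet);
        }
        catch (FormatException)
        {
            return false; // malformed header value
        }
    }
}
```

A value such as "text/html; charset=UTF-8" declares a charset, while a bare "text/html" does not, which is the situation described above.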

What you need to do here is first parse the returned HTML and look for a <meta> element that specifies the encoding, then decode the response again using that encoding. Depending on what you do with the HTML afterwards, you might want to look into http://htmlagilitypack.codeplex.com/, a great library for working with HTML, or just write a regular expression to extract the encoding (though I would really recommend the first alternative).
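
As a rough illustration of the regular-expression alternative (chosen here only because it needs no external library), the sketch below scans the start of the raw response bytes for a <meta> charset declaration and then decodes the whole body with it. The names CharsetSniffer, DetectCharset and Decode are hypothetical, the 4096-byte scan window and the UTF-8 fallback are assumptions, and the pattern covers only the common forms of the tag.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class CharsetSniffer
{
    // Matches charset=... in <meta charset="x"> and in
    // <meta http-equiv="Content-Type" content="text/html; charset=x">.
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-:.]+)",
        RegexOptions.IgnoreCase);

    // Returns the charset name declared in a <meta> element, or null if none is found.
    public static string DetectCharset(byte[] body)
    {
        // Only the start of the document is scanned (assumed window of 4096 bytes);
        // ASCII is enough to read the tag itself.
        int length = Math.Min(body.Length, 4096);
        string head = Encoding.ASCII.GetString(body, 0, length);
        Match match = MetaCharset.Match(head);
        return match.Success ? match.Groups[1].Value : null;
    }

    // Decodes the body with the declared charset, falling back to UTF-8 (an assumption).
    public static string Decode(byte[] body)
    {
        string name = DetectCharset(body);
        Encoding encoding = Encoding.UTF8;
        if (name != null)
        {
            try { encoding = Encoding.GetEncoding(name); }
            catch (ArgumentException) { /* unknown charset name: keep UTF-8 */ }
        }
        return encoding.GetString(body);
    }
}
```

To use it, read the response into a byte array first (for example by copying the response stream into a MemoryStream) instead of wrapping it in a StreamReader, then call CharsetSniffer.Decode on those bytes. This works because the bytes are kept and decoded only once the encoding is known.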

