I have a URL which can be any of the following formats:
http://example.com
https://example.com
http://example.com/foo
http://example.com/foo/bar
www.example.com
example.com
foo.example.com
www.foo.example.com
foo.bar.example.com
http://foo.bar.example.com/foo/bar
example.net/foo/bar
Essentially, I need to be able to match any normal URL. How can I extract example.com (or .net, or whatever the TLD happens to be; I need this to work with any TLD) from all of these via a single regex?
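Well, you can use parse_url to get the host. Note that parse_url only fills in 'host' when the URL has a scheme, so prepend http:// to inputs like www.example.com first: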
$info = parse_url($url);
$host = $info['host'];
Then you can do a little array work to keep only the domain name and the TLD:
$host_names = explode(".", $host);
$bottom_host_name = $host_names[count($host_names)-2] . "." . $host_names[count($host_names)-1];
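Not very elegant, but it should work for single-label TLDs such as .com or .net. It will not handle multi-part suffixes like .co.uk, where the last two labels are both part of the suffix.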
If you want an explanation, here it goes:
First we grab the host, which is everything between the scheme (http://, etc.) and the path, by using parse_url's ability to... well... parse URLs. 🙂
Then we take the host name and separate it into an array based on where the periods fall, so test.world.hello.myname would become:
array("test", "world", "hello", "myname");
After that, we take the number of elements in the array (4).
Then we subtract 2 from that count to get the index of the second-to-last string (the domain name, or example, in your example).
Then we subtract 1 from the count to get the index of the last string (because array keys start at 0), also known as the TLD.
Then we combine those two parts with a period, and you have your base host name.
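Putting those steps together, here is a minimal sketch of how they might be wrapped in one function. The name base_host_name is hypothetical and not part of the original answer. The sketch assumes that a URL without a scheme should be treated as http://, and it shares the same limitation as the steps above: it only keeps the last two labels, so it is only right for single-label TLDs.

```php
<?php
// Hypothetical helper (not from the original answer) that wraps the steps above.
function base_host_name(string $url): ?string
{
    // Assumption: a missing scheme means http://. parse_url() only reports
    // a 'host' component when the URL has a scheme, so add one if needed.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    // Ask parse_url() for just the host component.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // could not find a host in the input
    }

    // Split the host on periods and count the labels.
    $host_names = explode('.', $host);
    $count = count($host_names);
    if ($count < 2) {
        return $host; // a single label such as "localhost": nothing to trim
    }

    // Second-to-last label + "." + last label (single-label TLDs only).
    return $host_names[$count - 2] . '.' . $host_names[$count - 1];
}
```

For example, base_host_name('http://foo.bar.example.com/foo/bar') returns "example.com", and base_host_name('example.net/foo/bar') returns "example.net".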
Answered By – Tyler Carter