How do I remove a substring from the end of a string in Python?

I have the following code:

url = 'abcdc.com'
print(url.strip('.com'))

I expected: abcdc

I got: abcd

Now I do

url.rsplit('.com', 1)

Is there a better way?

Answers


strip doesn't mean "remove this substring". x.strip(y) treats y as a set of characters and strips any characters in that set from the ends of x.
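To make the character-set behaviour concrete, here is a small sketch (the variable names are only illustrative):

```python
url = 'abcdc.com'

# strip('.com') removes any of the characters '.', 'c', 'o', 'm'
# from both ends until it reaches a character that is not in the set.
stripped = url.strip('.com')
print(stripped)  # abcd

# The order of the characters in the argument does not matter.
same = url.strip('moc.')
print(same)  # abcd
```

The trailing 'c' of 'abcdc' is removed as well because 'c' is one of the characters in the set.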

Instead, you could use endswith and slicing:

url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]

Or using regular expressions:

import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)

If you are sure that the substring appears only at the end, then the simplest way would be to use replace:

url = 'abcdc.com'
print(url.replace('.com',''))
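The "only appears at the end" condition matters. A quick sketch of what happens otherwise, using a made-up URL:

```python
url = 'www.company.com'  # made-up example

# replace() removes every occurrence, including the '.com' inside '.company'.
replaced = url.replace('.com', '')
print(replaced)  # wwwpany

# Checking the end explicitly only touches the trailing suffix.
trimmed = url[:-len('.com')] if url.endswith('.com') else url
print(trimmed)  # www.company
```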

def strip_end(text, suffix):
    if not text.endswith(suffix):
        return text
    return text[:len(text)-len(suffix)]
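
On Python 3.9 and later, the built-in str.removesuffix does the same job as a helper like this. A minimal sketch with a fallback for older interpreters; the name remove_suffix is made up for illustration:

```python
def remove_suffix(text, suffix):
    # str.removesuffix exists on Python 3.9+; fall back to slicing otherwise.
    if hasattr(text, 'removesuffix'):
        return text.removesuffix(suffix)
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

print(remove_suffix('abcdc.com', '.com'))  # abcdc
print(remove_suffix('abcdc.org', '.com'))  # abcdc.org
```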

Since it seems like nobody has pointed this one out yet:

url = "www.example.com"
new_url = url[:url.rfind(".")]

This should be more efficient than the methods using split() as no new list object is created, and this solution works for strings with several dots.
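
One thing to watch with the rfind approach: when the string contains no dot, rfind returns -1 and the slice drops the last character. A guarded sketch (the function name cut_at_last_dot is hypothetical):

```python
def cut_at_last_dot(text):
    # rfind returns -1 when there is no '.', and text[:-1] would then
    # silently drop the last character, so handle that case explicitly.
    index = text.rfind('.')
    return text if index == -1 else text[:index]

print(cut_at_last_dot('www.example.com'))    # www.example
print(cut_at_last_dot('localhost'))          # localhost
print('localhost'[:'localhost'.rfind('.')])  # localhos
```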


Depends on what you know about your url and exactly what you're trying to do. If you know that it will always end in '.com' (or '.net' or '.org') then

 url=url[:-4]

is the quickest solution. If it's a more general URL, then you're probably better off looking into the urlparse library that comes with Python (urllib.parse in Python 3).
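
For the general-URL case, a rough sketch with the standard library parser might look like this. It assumes Python 3, and the URL is a made-up example:

```python
from urllib.parse import urlparse

url = 'http://www.example.com/some/path'  # made-up example URL

# urlparse splits the URL into components; hostname is the host part.
host = urlparse(url).hostname
print(host)  # www.example.com

# Drop the last dot-separated label of the host name only,
# leaving the scheme and path out of the picture.
host_without_tld = host.rsplit('.', 1)[0]
print(host_without_tld)  # www.example
```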

If, on the other hand, you simply want to remove everything after the final '.' in a string, then

url.rsplit('.',1)[0]

will work. Or if you just want everything up to the first '.', then try

url.split('.',1)[0]

In one line:

text if not text.endswith(suffix) or len(suffix) == 0 else text[:-len(suffix)]
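
The length check in that expression is there for a reason. A small sketch, wrapping it in a function whose name (strip_suffix) is only illustrative:

```python
def strip_suffix(text, suffix):
    # Same expression as the one-liner, wrapped in a function.
    return text if not text.endswith(suffix) or len(suffix) == 0 else text[:-len(suffix)]

print(strip_suffix('abcdc.com', '.com'))  # abcdc
print(strip_suffix('abcdc.com', ''))      # abcdc.com

# Without the length check, an empty suffix would wipe the string,
# because text[:-0] is the same as text[:0].
print(repr('abcdc.com'[:-len('')]))  # ''
```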

How about url[:-4]?


If you know it's an extension, then

url = 'abcdc.com'
...
url.rsplit('.', 1)[0]  # split at '.', starting from the right, maximum 1 split

This works equally well with abcdc.com or www.abcdc.com or abcdc.[anything] and is more extensible.


For URLs (which seem to be part of the topic, given the example), one can do something like this:

import os
url = 'http://www.stackoverflow.com'
name,ext = os.path.splitext(url)
print (name, ext)

#Or:
ext = '.'+url.split('.')[-1]
name = url[:-len(ext)]
print (name, ext)

In both cases name is 'http://www.stackoverflow' and ext is '.com'.

This can also be combined with str.endswith(suffix) if you need to just split ".com", or anything specific.
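
A sketch of that combination, assuming a single-dot suffix such as '.com'; the helper name drop_extension is hypothetical:

```python
import os

def drop_extension(url, suffix='.com'):
    # Only touch the string when it really ends with the expected suffix.
    if not url.endswith(suffix):
        return url
    # splitext splits at the last '.', so this fits single-dot suffixes only.
    name, _ext = os.path.splitext(url)
    return name

print(drop_extension('http://www.stackoverflow.com'))  # http://www.stackoverflow
print(drop_extension('http://www.stackoverflow.org'))  # http://www.stackoverflow.org
```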


url.rsplit('.com', 1)

is not quite right.

What you actually would need to write is

url.rsplit('.com', 1)[0]

which looks pretty succinct, IMHO.

However, my personal preference is this option because it uses only one parameter:

url.rpartition('.com')[0]
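
The two calls differ when '.com' is not present at all, which may matter depending on the input. A short sketch with made-up values:

```python
url = 'abcdc.org'

# rpartition returns ('', '', url) when the separator is missing,
# so taking index 0 yields an empty string.
via_rpartition = url.rpartition('.com')[0]
print(repr(via_rpartition))  # ''

# rsplit returns [url] in that case, so index 0 is the unchanged string.
via_rsplit = url.rsplit('.com', 1)[0]
print(via_rsplit)  # abcdc.org

# Neither call checks that '.com' sits at the very end of the string.
print('abcdc.com.au'.rsplit('.com', 1)[0])  # abcdc
```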

import re

def rm_suffix(url='abcdc.com', suffix=r'\.com'):
    return re.sub(suffix + '$', '', url)

I want to repeat this answer as the most expressive way to do it. Of course, the following would take less CPU time

def rm_dotcom(url = 'abcdc.com'):
    return(url[:-4] if url.endswith('.com') else url)

However, if CPU is the bottleneck, why write in Python?

When is CPU a bottleneck anyway? In drivers, maybe.

The advantage of using a regular expression is code reusability. What if you next want to remove '.me', which has only three characters?

The same code would do the trick.

>>> rm_suffix('abcdc.me', r'\.me')
'abcdc'
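
One caveat when the suffix is passed as a pattern: an unescaped '.' matches any character. If the suffix should be treated as literal text, re.escape can build the pattern. A minimal sketch; rm_literal_suffix is a hypothetical name:

```python
import re

def rm_literal_suffix(text, suffix):
    # re.escape makes characters such as '.' match literally.
    return re.sub(re.escape(suffix) + '$', '', text)

print(rm_literal_suffix('abcdc.me', '.me'))  # abcdc
print(rm_literal_suffix('abcdcome', '.me'))  # abcdcome

# With an unescaped pattern the dot matches any character.
print(re.sub('.me$', '', 'abcdcome'))  # abcdc
```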

This is a perfect use for regular expressions:

>>> import re
>>> re.match(r"(.*)\.com$", "hello.com").group(1)
'hello'

Or you can use split:

a = 'abccomputer.com'
res = a.split('.com',1)[0]

import re

def remove_file_type(infile):
    # Remove the last '.' and everything after it.
    return re.sub(r'\.[^.]*$', '', infile)

remove_file_type('abc.efg')  # 'abc'

In my case I needed to raise an exception so I did:

class UnableToStripEnd(Exception):
    """An exception type to indicate that the suffix cannot be removed from the text."""

    @staticmethod
    def get_exception(text, suffix):
        return UnableToStripEnd("Could not find suffix ({0}) on text: {1}."
                                .format(suffix, text))


def strip_end(text, suffix):
    """Removes the end of a string. Otherwise fails."""
    if not text.endswith(suffix):
        raise UnableToStripEnd.get_exception(text, suffix)
    return text[:len(text)-len(suffix)]

If you mean to strip only the extension:

url = 'abcdc.com'
print('.'.join(url.split('.')[:-1]))

It works with any extension, even when the filename contains other dots. It simply splits the string into a list on dots and joins it back without the last element.

Probably not the fastest, but for me it's more readable than other methods.

