SuperString

by Wael Boutglay

Introduction

SuperString is an efficient string library for C++ that aims to reduce both memory usage and CPU time.

SuperString is built on the rope data structure, combined with other optimization techniques.
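
To give an idea of why a rope helps, here is a minimal sketch of the general technique. It is not SuperString's actual implementation: RopeNode, makeLeaf, concat and charAt are hypothetical names chosen for illustration. The point it shows is that concatenation only creates one small node that refers to both inputs, so no characters are copied.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical, simplified rope node; SuperString's real internals may differ.
struct RopeNode {
    std::string leaf;                       // text, used only by leaf nodes
    std::shared_ptr<const RopeNode> left;   // children, used only by concat nodes
    std::shared_ptr<const RopeNode> right;
    std::size_t length = 0;                 // cached total length of this subtree
};

using Rope = std::shared_ptr<const RopeNode>;

// Builds a leaf node that owns its text.
Rope makeLeaf(const std::string &text) {
    auto node = std::make_shared<RopeNode>();
    node->leaf = text;
    node->length = text.size();
    return node;
}

// Concatenates two ropes in constant time: both inputs are shared, not copied.
Rope concat(const Rope &a, const Rope &b) {
    auto node = std::make_shared<RopeNode>();
    node->left = a;
    node->right = b;
    node->length = a->length + b->length;
    return node;
}

// Returns the character at index by walking down from the root to a leaf.
char charAt(const Rope &rope, std::size_t index) {
    assert(index < rope->length);
    const RopeNode *node = rope.get();
    while (node->left) {                    // concat node: pick the side holding index
        if (index < node->left->length) {
            node = node->left.get();
        } else {
            index -= node->left->length;
            node = node->right.get();
        }
    }
    return node->leaf[index];
}
```

With this sketch, concat(concat(bla, kla), bla) reuses the bla leaf twice, which is the kind of sharing that makes repeated concatenation cheap.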

Features

Table of contents

  1. Introduction
  2. Features
  3. Table of contents
  4. Contribute and support
  5. Install and use
    1. Using CMake
    2. Without CMake
  6. API
    1. Construct a new string
    2. Static methods
      1. SuperString::Const
      2. SuperString::Copy
    3. Methods
      1. codeUnitAt(index)
      2. indexOf(pattern)
      3. isEmpty()
      4. isNotEmpty()
      5. lastIndexOf(pattern)
      6. length()
      7. print(stream)
      8. print(stream, startIndex, endIndex)
      9. substring(startIndex, endIndex)
      10. trim()
      11. trimLeft()
      12. trimRight()
    4. Operators
      1. operator *
      2. operator +
    5. Nested types
      1. SuperString::Encoding
      2. SuperString::Error
      3. SuperString::Byte
      4. SuperString::Result<T, E>

Contribute and support

If you have a feature idea, a bug to fix, or an improvement to suggest, feel free to open an issue or send a pull request.

Install and use

Using CMake

In your project, clone SuperString into the directory where third-party libraries live (let's call it ext).

mkdir ext && cd ext
git clone https://github.com/btwael/SuperString.git

Now add these lines to your CMakeLists.txt:

# include SuperString
add_subdirectory(ext/SuperString)

# add SuperString headers to include directory
include_directories(ext/SuperString/include)

# link your executable against SuperString 
target_link_libraries(myexecutable SuperString)

Without CMake

The header file that contains the SuperString declarations is SuperString/include/SuperString.hh, and the source file that contains the definitions is SuperString/src/SuperString.cc. Use them as you prefer.

API

Construct a new string

SuperString is automatically garbage collected, so you don't have to think about how and when to free a SuperString instance. To allow this, there are two ways to create a SuperString: the static methods SuperString::Const and SuperString::Copy.

#include <iostream>
#include "SuperString.hh"

SuperString myFunc() {
    char chars[] = "I'm using SuperString!";
    SuperString string = SuperString::Copy(chars);
    return string;
}

char seq[] = "SuperString is cool!";

int main(int argc, char const *argv[]) {
    SuperString s1 = myFunc();
    SuperString s2 = SuperString::Const(seq);
    // equivalent to SuperString::Const("SuperString is cool!");
    std::cout << s1 << "\n" << s2;
    return 0;
}

In myFunc, we used SuperString::Copy because the sequence we build the string from has a limited lifetime and will be destroyed once the function returns. Using ::Copy tells SuperString to copy the data and keep it for later use.

On the other hand, we used ::Const in main because the sequence lives as long as the executable does. That matters because SuperString does not copy the sequence, which avoids duplicating it in memory.
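
The difference between borrowing and copying can be sketched with a small standalone class. This is an illustration under assumptions, not SuperString's real code: Text, ownsData and the members below are hypothetical names. The Const variant only stores the caller's pointer, while the Copy variant allocates its own storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical illustration only; not SuperString's real implementation.
class Text {
public:
    // Borrows the caller's buffer: no allocation, the caller must keep it alive.
    static Text Const(const char *chars) {
        assert(chars != nullptr);
        Text t;
        t.data_ = chars;
        t.length_ = std::strlen(chars);
        return t;
    }

    // Copies the caller's buffer into storage that the Text keeps alive itself.
    static Text Copy(const char *chars) {
        assert(chars != nullptr);
        Text t;
        t.owned_ = std::make_shared<const std::string>(chars);
        t.data_ = t.owned_->c_str();
        t.length_ = t.owned_->size();
        return t;
    }

    const char *data() const { return data_; }
    std::size_t length() const { return length_; }
    bool ownsData() const { return owned_ != nullptr; }

private:
    Text() = default;

    std::shared_ptr<const std::string> owned_; // empty when the data is borrowed
    const char *data_ = nullptr;
    std::size_t length_ = 0;
};
```

A Text made with Copy stays valid after the original buffer is modified or goes out of scope. One made with Const is only valid while the original buffer is, which is why it suits string literals and globals.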

Static methods

SuperString::Const

static SuperString SuperString::Const(const char *chars, SuperString::Encoding encoding);
static SuperString SuperString::Const(const SuperString::Byte *bytes, SuperString::Encoding encoding);

This static method creates a new SuperString from a const sequence of characters in a given supported encoding. The encoding parameter defaults to SuperString::Encoding::UTF8. This method does not replicate the string data in memory.

#include <iostream>
#include "SuperString.hh"

char asciiseq[] = "SuperString is cool!";
SuperString::Byte utf8seq[] = {0xe2, 0x82, 0xac, 0x00};
SuperString::Byte utf16beseq[] = {0x00, 0x24, 0x00, 0x00};
int utf32seq[] = {0x10437, 0x0000};

int main(int argc, char const *argv[]) {
    SuperString s1 = SuperString::Const(asciiseq, SuperString::Encoding::ASCII);
    SuperString s2 = SuperString::Const(utf8seq, SuperString::Encoding::UTF8);
    SuperString s3 = SuperString::Const(utf16beseq, SuperString::Encoding::UTF16BE);
    SuperString s4 = SuperString::Const(utf32seq, SuperString::Encoding::UTF32);
    std::cout << s1 << "\n"; // SuperString is cool!
    std::cout << s2 << "\n"; // €
    std::cout << s3 << "\n"; // $
    std::cout << s4 << "\n"; // 𐐷
    return 0;
}

SuperString::Copy

static SuperString SuperString::Copy(const char *chars, SuperString::Encoding encoding);
static SuperString SuperString::Copy(const SuperString::Byte *bytes, SuperString::Encoding encoding);

This static method creates a new SuperString from a const sequence of characters in a given supported encoding. The encoding parameter defaults to SuperString::Encoding::UTF8. This method copies the given sequence into newly allocated memory.

#include <iostream>
#include "SuperString.hh"

SuperString getString(int i) {
    char asciiseq[] = "SuperString is cool!";
    SuperString::Byte utf8seq[] = {0xe2, 0x82, 0xac, 0x00};
    SuperString::Byte utf16beseq[] = {0x00, 0x24, 0x00, 0x00};
    int utf32seq[] = {0x10437, 0x0000};
    switch(i) {
        case 0:
            return SuperString::Copy(asciiseq, SuperString::Encoding::ASCII);
        case 1:
            return SuperString::Copy(utf8seq, SuperString::Encoding::UTF8);
        case 2:
            return SuperString::Copy(utf16beseq, SuperString::Encoding::UTF16BE);
        case 3:
            return SuperString::Copy(utf32seq, SuperString::Encoding::UTF32);
        default:
            return SuperString::Copy("Nothing"); // by default this is UTF8
    }
}

int main(int argc, char const *argv[]) {
    std::cout << getString(0) << "\n"; // SuperString is cool!
    std::cout << getString(1) << "\n"; // €
    std::cout << getString(2) << "\n"; // $
    std::cout << getString(3) << "\n"; // 𐐷
    return 0;
}

Methods

codeUnitAt(index)

SuperString::Result<int, SuperString::Error> codeUnitAt(std::size_t index) const;

Returns the code unit at the given index if index is less than the length of the string; otherwise, it returns SuperString::Error::RangeError.

#include <iostream>
#include <cstddef>
#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString::Result<int, SuperString::Error> res;
    SuperString s = SuperString::Const("SuperString is fast!");

    res = s.codeUnitAt(1);

    if(res.isOk()) {
        std::cout << "Valid index, code unit: " << res.ok() << "\n";
    }

    res = s.codeUnitAt(100);

    if(res.isErr()) { // && res.err() == SuperString::Error::RangeError
        std::cout << "Range error\n";
    }

    for(std::size_t i = 0; i < s.length(); i++) {
        std::cout << s.codeUnitAt(i).ok() << "\n"; // safe here: i is always within range
    }
    return 0;
}

indexOf(pattern)

SuperString::Result<std::size_t, SuperString::Error> indexOf(SuperString other) const;

Returns the position of the first occurrence of other in this string. If other is not found, it returns SuperString::Error::NotFound.

#include <iostream>
#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString::Result<std::size_t, SuperString::Error> res;
    SuperString s = SuperString::Const("SuperString is fast and fast!");

    res = s.indexOf(SuperString::Const("fast"));

    if(res.isOk()) {
        std::cout << res.ok() << "\n"; // 15
    } else {
        std::cout << "Not found" << "\n";
    }
    return 0;
}

isEmpty()

bool isEmpty() const;

Returns true if this string is empty.

#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString s1;
    SuperString s2 = SuperString::Const("");
    SuperString s3 = SuperString::Const("SuperString");

    s1.isEmpty(); // true
    s2.isEmpty(); // true
    s3.isEmpty(); // false

    return 0;
}

isNotEmpty()

bool isNotEmpty() const;

Returns true if this string is not empty.

#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString s1;
    SuperString s2 = SuperString::Const("");
    SuperString s3 = SuperString::Const("SuperString");

    s1.isNotEmpty(); // false
    s2.isNotEmpty(); // false
    s3.isNotEmpty(); // true

    return 0;
}

lastIndexOf(pattern)

SuperString::Result<std::size_t, SuperString::Error> lastIndexOf(SuperString other) const;

Returns the position of the last occurrence of other in this string. If other is not found, it returns SuperString::Error::NotFound.

#include <iostream>
#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString::Result<std::size_t, SuperString::Error> res;
    SuperString s = SuperString::Const("SuperString is fast and fast!");

    res = s.lastIndexOf(SuperString::Const("fast"));

    if(res.isOk()) {
        std::cout << res.ok() << "\n"; // 24
    } else {
        std::cout << "Not found" << "\n";
    }
    return 0;
}

length()

std::size_t length() const;

Returns the length of this string.

#include <iostream>
#include "SuperString.hh"

char asciiseq[] = "SuperString is cool!";
SuperString::Byte utf8seq[] = {0xe2, 0x82, 0xac, 0x00};
SuperString::Byte utf16beseq[] = {0x00, 0x24, 0x00, 0x00};
int utf32seq[] = {0x10437, 0x0000};

int main(int argc, char const *argv[]) {
    SuperString s1 = SuperString::Const(asciiseq, SuperString::Encoding::ASCII);
    SuperString s2 = SuperString::Const(utf8seq, SuperString::Encoding::UTF8);
    SuperString s3 = SuperString::Const(utf16beseq, SuperString::Encoding::UTF16BE);
    SuperString s4 = SuperString::Const(utf32seq, SuperString::Encoding::UTF32);
    std::cout << s1.length() << "\n"; // 20
    std::cout << s2.length() << "\n"; // 1
    std::cout << s3.length() << "\n"; // 1
    std::cout << s4.length() << "\n"; // 1
    return 0;
}
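
In the example above, the three-byte UTF-8 sequence for € reports a length of 1, so the count is in characters rather than bytes. A standalone sketch of how such a count can be derived for UTF-8 follows. It assumes well-formed, NUL-terminated input, and countUtf8CodePoints is a hypothetical helper, not part of SuperString.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts code points in a NUL-terminated,
// well-formed UTF-8 byte sequence.
std::size_t countUtf8CodePoints(const unsigned char *bytes) {
    assert(bytes != nullptr);
    std::size_t count = 0;
    for (; *bytes != 0x00; ++bytes) {
        // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
        if ((*bytes & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Called on the bytes {0xe2, 0x82, 0xac, 0x00}, this helper counts one code point, and on plain ASCII text it counts one per byte.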

print(stream)

void print(std::ostream &stream) const;

Prints this string to the given stream.

#include <iostream>
#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString s = SuperString::Const("SuperString is fast and fast!");

    s.print(std::cout); // equivalent to: std::cout << s;
    return 0;
}

print(stream, startIndex, endIndex)

void print(std::ostream &stream, std::size_t startIndex, std::size_t endIndex) const;

Prints the substring of this string that starts at startIndex, inclusive, and ends at endIndex, exclusive.

#include <iostream>
#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString s = SuperString::Const("SuperString is fast and fast!");

    s.print(std::cout, 0, 11); // Will print: SuperString 
    return 0;
}

substring(startIndex, endIndex)

SuperString::Result<SuperString, SuperString::Error> substring(std::size_t startIndex, std::size_t endIndex) const;

Returns the substring of this string that extends from startIndex, inclusive, to endIndex, exclusive.

#include <iostream>
#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString::Result<SuperString, SuperString::Error> res;
    SuperString s = SuperString::Const("SuperString is fast and fast!");

    res = s.substring(0, 11);

    if(res.isOk()) {
        std::cout << res.ok() << "\n"; // Will print: SuperString
    } else {
        std::cout << "Range error" << "\n";
    }
    return 0;
}

trim()

SuperString trim() const;

Returns the string without any leading and trailing whitespace.

#include <iostream>
#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString s = SuperString::Const("  \tSuperString  ");

    std::cout << s.trim(); // Will print "SuperString" not "  \tSuperString  "

    return 0;
}

trimLeft()

SuperString trimLeft() const;

Returns the string without any leading whitespace.

#include <iostream>
#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString s = SuperString::Const("  \tSuperString  ");

    std::cout << s.trimLeft(); // Will print "SuperString  " not "  \tSuperString  "

    return 0;
}

trimRight()

SuperString trimRight() const;

Returns the string without any trailing whitespace.

#include <iostream>
#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString s = SuperString::Const("  \tSuperString  ");

    std::cout << s.trimRight(); // Will print "  \tSuperString" not "  \tSuperString  "

    return 0;
}

Operators

operator *

SuperString operator*(std::size_t times) const;

Creates a new string by concatenating this string with itself the given number of times.

#include <iostream>
#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString s = SuperString::Const("bla");
    s = s * 3;

    std::cout << s << "\n"; // blablabla
    std::cout << s.substring(2, 7).ok() << "\n"; // ablab

    return 0;
}

operator +

SuperString operator+(const SuperString &other) const;

Creates a new string by concatenating this string with other.

#include <iostream>
#include "SuperString.hh"

int main(int argc, char const *argv[]) {
    SuperString s1 = SuperString::Const("bla");
    SuperString s2 = SuperString::Const("kla");
    SuperString s = s1 + s2 + s1;

    std::cout << s << "\n"; // blaklabla
    std::cout << s.substring(2, 9).ok() << "\n"; // aklabla

    return 0;
}

Nested types

SuperString::Encoding

This type is defined as follows:

class SuperString {
...
    enum class Encoding {
        ASCII,
        UTF8,
        UTF16BE,
        UTF32
    };
...
};

SuperString::Error

This type is defined as follows:

class SuperString {
...
    enum class Error {
        Unimplemented,
        Unexpected, // Something that never happens, Unreachable code
        RangeError,
        InvalidByteSequence,
        NotFound
    };
...
};

SuperString::Byte

This type is defined as follows:

class SuperString {
...
    typedef unsigned char Byte;
...
};

SuperString::Result<T, E>

This type is inspired by Rust's std::result::Result<T, E> type, and is defined as:

class SuperString {
...
    template<class T, class E>
    class Result {
    private:
        char *_ok;
        char *_err;

    public:
        Result(T ok);

        Result(E err);

        Result(const SuperString::Result<T, E> &other) /*copy*/;

        ~Result();

        /**
         * Returns the error value.
         */
        E err() const;

        /**
         * Returns true if the result is Err.
         */
        bool isErr() const;

        /**
         * Returns true if the result is Ok.
         */
        bool isOk() const;

        /**
         * Returns the success value.
         */
        T ok() const;

        /**
         * Sets this to Err with given [err] value.
         */
        void err(E err);

        /**
         * Sets this to Ok with given [ok] value.
         */
        void ok(T ok);

        SuperString::Result<T, E> &operator=(const SuperString::Result<T, E> &other);
    };
...
};
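
To make the pattern concrete, here is a stripped-down, standalone sketch of a result type and of a function that returns one. It is an assumption-laden illustration rather than SuperString's implementation: this Result stores both values plus a flag instead of the char pointers declared above, requires T and E to be default-constructible, and the free function codeUnitAt over std::string is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical error type for this sketch only.
enum class Error { RangeError, NotFound };

// Simplified result type: holds either a success value or an error value.
template<class T, class E>
class Result {
public:
    Result(T ok) : ok_(ok), err_(), isOk_(true) {}
    Result(E err) : ok_(), err_(err), isOk_(false) {}

    bool isOk() const { return isOk_; }
    bool isErr() const { return !isOk_; }

    // Callers are expected to check isOk() / isErr() first.
    T ok() const { assert(isOk_); return ok_; }
    E err() const { assert(!isOk_); return err_; }

private:
    T ok_;
    E err_;
    bool isOk_;
};

// Hypothetical free function that mirrors the documented range check.
Result<int, Error> codeUnitAt(const std::string &text, std::size_t index) {
    if (index >= text.size()) {
        return Error::RangeError;           // converts to the Err variant
    }
    return static_cast<int>(static_cast<unsigned char>(text[index]));
}
```

The caller checks isOk() or isErr() before reading the value, so failures are handled through the return value instead of exceptions or sentinel values.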