
The Strinx Library

Version 0.66
shablool@users.sf.net

   
Strinx Motivation
Need to solve real-life problems
Insufficient and error-prone functionality of
C/C++ strings
Avoid heap exhaustion due to extensive
usage of std::basic_string
Avoid performance degradation of STL
containers in multi-threaded environment
Open source
Lightweight

C++ Design Aims
(Stroustrup, An Overview of the C++ Programming Language, 1999)

C++ makes programming more enjoyable for serious programmers
Better C
Supports data abstraction
Supports object-oriented programming
Supports generic programming

   
STL History
1979: Alexander Stepanov, generic programming
1987: Stepanov & Musser, list processing in Ada
1992: Stepanov & Lee, initial C++ version (HP)
1994: Approved by the ANSI/ISO C++ standards committee

   
The Power of Generic Programming

#include <algorithm>
#include <iostream>
#include <iterator>

// Print the items of a list to std::cout, one per line.
template <typename ListT>
void print_list(const ListT& ls)
{
    typedef typename ListT::value_type        value_type;
    typedef std::ostream_iterator<value_type> output_iterator;

    std::copy(ls.begin(), ls.end(),
              output_iterator(std::cout, "\n"));
}

   
Typical Usage of STL Containers
#include <list>
#include <string>

// Split s into two substrings, using the space character as delimiter.
// WARN: THIS CODE HAS A BUG!
std::list<std::string> 
stl_naive_split(const std::string& s)
{
    size_t i, j;
    std::list<std::string> toks_list;
 
    i = s.find(' ');
    j = s.find_first_not_of(' ', i);
    toks_list.push_back(s.substr(0, i));
    if (j < s.size())
    {
        toks_list.push_back(s.substr(j, s.size() - j));
    }
    
    return toks_list;
}
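The deliberate bug shows up on simple inputs: for `" ab cd"` the function returns `{"", "ab cd"}` (an empty first token), and for an empty string it returns a list holding one empty string. A minimal sketch of a variant that avoids both, assuming the goal is every space-separated token rather than exactly two; `split_on_spaces` is a hypothetical name, not part of the slides:

```cpp
#include <list>
#include <string>

// Hypothetical helper: split s into all space-separated tokens.
// Runs of spaces count as one delimiter; leading and trailing spaces
// never produce empty tokens.
std::list<std::string> split_on_spaces(const std::string& s)
{
    std::list<std::string> toks;
    std::string::size_type pos = s.find_first_not_of(' ');
    while (pos != std::string::npos)
    {
        std::string::size_type end = s.find(' ', pos);
        // If end is npos, end - pos is still at least the remaining length,
        // so substr simply takes the rest of the string.
        toks.push_back(s.substr(pos, end - pos));
        // find_first_not_of(' ', npos) returns npos, ending the loop.
        pos = s.find_first_not_of(' ', end);
    }
    return toks;
}
```

Passed to `print_list`, `split_on_spaces("  ab   cd ")` prints `ab` and `cd` on separate lines.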

   
Example: Split & Print

// Split argv[1] and print.
int main(int argc, char* argv[])
{   
    if (argc > 1)
    {
        std::string s(argv[1]);
        print_list(stl_naive_split(s));
    }

    return 0;
}

   
What is a String?

“A string is an ordered sequence of symbols, chosen from a predetermined set or alphabet”

   
Overview of std::basic_string
template<typename _CharT, typename _Traits, 
         typename _Alloc> 
class basic_string;

std::string s1("hello, world"), s2;
std::string s3 = s1;
s2 = s3;
if (s2[0] == 'H') { ... }

[Figure: s1, s2 and s3 shown pointing to one heap buffer holding "hello, world\0" followed by unused capacity]

   
What is Wrong with std::basic_string?
Always uses heap allocation
“Behind the scenes” allocations
Monoliths “Unstrung” (GotW #84)
Inelegant interface
Error-prone (e.g., tokenizing)

   
Strinx substring
A semantically rich wrapper over a const char
pointer and a size.
References existing strings: no memory
allocations!

[Figure: substring SS (size=5) pointing into the buffer of S1, "hello, world\0", covering "hello"]
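The substring idea can be sketched as a non-owning pointer-and-size pair. The class below (`substring_sketch`, a hypothetical name; this is not the Strinx `SubString` interface) shows why `substr` needs no allocation: it only adjusts the pointer and the size, clamping both to the viewed range.

```cpp
#include <cstddef>
#include <cstring>
#include <ostream>

// Hypothetical minimal substring: a non-owning (pointer, size) pair.
// Copying it copies two words; the characters stay in the original buffer.
class substring_sketch {
public:
    substring_sketch(const char* s) : ptr_(s), size_(std::strlen(s)) {}
    substring_sketch(const char* p, std::size_t n) : ptr_(p), size_(n) {}

    const char* data() const { return ptr_; }
    std::size_t size() const { return size_; }

    // Like std::string::substr, but returns a view: no allocation, no copy.
    substring_sketch substr(std::size_t pos, std::size_t n) const
    {
        if (pos > size_) pos = size_;
        if (n > size_ - pos) n = size_ - pos;
        return substring_sketch(ptr_ + pos, n);
    }

    // Compare the viewed characters with a NUL-terminated string.
    bool equals(const char* s) const
    {
        return std::strlen(s) == size_ && std::memcmp(ptr_, s, size_) == 0;
    }

private:
    const char* ptr_;
    std::size_t size_;
};

// Write exactly size() characters; the view need not be NUL-terminated.
std::ostream& operator<<(std::ostream& os, const substring_sketch& ss)
{
    return os.write(ss.data(), static_cast<std::streamsize>(ss.size()));
}
```

Because the view does not own its characters, it must not outlive the buffer it points into. C++17 later standardized the same idea as `std::string_view`.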

   
Operations with substrings
All immutable operations of
std::basic_string
substr, chop, split, trim
Token parsing
Generics (test_if, count_if)
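A sketch of `strip` and `split` as pure view operations, written against C++17 `std::string_view` rather than the Strinx API. The names `strip_sv` and `split_sv`, the whitespace set (space and tab) and the "split at the first separator" rule are assumptions made for illustration:

```cpp
#include <string_view>
#include <utility>

// Remove leading and trailing spaces and tabs. Returns a narrower view of
// the same characters; nothing is copied or allocated.
std::string_view strip_sv(std::string_view s)
{
    const char* ws = " \t";
    std::string_view::size_type first = s.find_first_not_of(ws);
    if (first == std::string_view::npos)
        return std::string_view();          // all whitespace (or empty)
    std::string_view::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Split at the first occurrence of sep. If sep is absent, the whole input
// is the first part and the second part is empty.
std::pair<std::string_view, std::string_view>
split_sv(std::string_view s, char sep)
{
    std::string_view::size_type pos = s.find(sep);
    if (pos == std::string_view::npos)
        return {s, std::string_view()};
    return {s.substr(0, pos), s.substr(pos + 1)};
}
```

Both results still point into the caller's buffer, so `strip_sv(split_sv(line, '=').first)` costs only a few index computations.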

   
Example: Strip & Split
int main(int argc, char* argv[])
{
    using namespace strinx;
    SubString ss, key, val;
    for (int i = 1; i < argc; ++i)
    {
        // Strip leading and trailing whitespace
        ss = strip(SubString(argv[i]));

        // Expecting key=value.
        tie(key, val) = split(ss, '=');
        key = strip(key); val = strip(val);

        // Output
        if (key.size() && val.size())
            std::cout << "KEY: " << key 
                      << " VALUE: " << val << std::endl;
    }        
    return 0;
}
   
Example: Token Parsing
int main(int argc, char* argv[])
{
    using namespace strinx;
    
    // Parse 'line' into tokens, delimited by ':', ';' or '/'.
    // Output is: Ant Bee Cat Dog Eel Frog Giraffe
    SString line("/Ant:::Bee;:Cat:Dog;Eel/Frog:/Giraffe///");
    const char seps[] = ":;/";
    
    SubString tok = find_token(line, seps);
    while (tok.size() > 0)
    {
        std::cout << tok << ' ';        
        tok = find_next_token(line, tok, seps);        
    }

    std::cout << std::endl;

    return 0;
}
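The token loop above can be sketched with the standard `find_first_not_of`/`find_first_of` pair. `find_token_from` and `all_tokens` are hypothetical helpers standing in for `find_token`/`find_next_token`; they assume a token is a maximal run of characters not in `seps`, and they copy tokens into `std::string` only to keep the sketch short:

```cpp
#include <string>
#include <vector>

// A token is described by its start position and length within the line.
struct token_pos { std::string::size_type pos, len; };

// Find the first token at or after 'from'; len == 0 means "no more tokens".
token_pos find_token_from(const std::string& line, std::string::size_type from,
                          const char* seps)
{
    std::string::size_type b = line.find_first_not_of(seps, from);
    if (b == std::string::npos)
        return {line.size(), 0};
    std::string::size_type e = line.find_first_of(seps, b);
    if (e == std::string::npos)
        e = line.size();                    // token runs to the end of line
    return {b, e - b};
}

// Collect every token, resuming each search right after the previous token.
std::vector<std::string> all_tokens(const std::string& line, const char* seps)
{
    std::vector<std::string> out;
    token_pos t = find_token_from(line, 0, seps);
    while (t.len > 0)
    {
        out.push_back(line.substr(t.pos, t.len));
        t = find_token_from(line, t.pos + t.len, seps);
    }
    return out;
}
```

Applied to the slide's input with `":;/"`, this yields Ant, Bee, Cat, Dog, Eel, Frog and Giraffe; consecutive separators are skipped rather than producing empty tokens.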
Char-Container Alternatives to std::basic_string
Class xstring: dynamically allocated string.
Max-size is set upon construction. Uses an
internal buffer for short strings.

Class sstring: a wrapper over a char array.
Max-size is set as a template parameter at
compile time. No dynamic allocations:
template <size_type Size, typename CharT, typename Traits>
class sstring;
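A minimal sketch of the `sstring` idea, assuming truncating appends (the slides do not say how Strinx handles overflow). `fixed_string` is a hypothetical name; the capacity is fixed at compile time and the characters are stored inside the object, so it never touches the heap:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity string: MaxSize characters plus a '\0'
// live inside the object itself.
template <std::size_t MaxSize>
class fixed_string {
public:
    fixed_string() : len_(0) { buf_[0] = '\0'; }

    // Append as much of s as fits; returns false if anything was dropped.
    bool append(const char* s)
    {
        std::size_t n = std::strlen(s);
        std::size_t room = MaxSize - len_;
        std::size_t take = n < room ? n : room;
        std::memcpy(buf_ + len_, s, take);
        len_ += take;
        buf_[len_] = '\0';
        return take == n;
    }

    const char* c_str() const { return buf_; }
    std::size_t size() const { return len_; }
    static constexpr std::size_t max_size() { return MaxSize; }

private:
    char buf_[MaxSize + 1];     // +1 for the terminating '\0'
    std::size_t len_;
};
```

A `fixed_string<30>` declared as a local variable lives entirely on the stack, which is the property the `TString` typedef in the following example relies on.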

   
Example: Stringify Types
#include <complex>
#include <iostream>

typedef strinx::sstring<30, char>   TString;
typedef std::complex<int>           Complex;

// Convert a complex number to its string representation.
TString str(const Complex& c)
{
    TString s;
    strinx::format(s, "(%d,%d)", c.real(), c.imag());
    return s;    
}

// Stringify and print complex number
int main(void)
{
    Complex c(5, 12);
    std::cout << str(c) << std::endl;

    return 0;
}

   
Logging

Complex c(5, 12);    

printf("Complex: (%d,%d)\n", c.real(), c.imag());
    
std::cout << "Complex: (" << c.real() << ',' 
                          << c.imag() << ")" << std::endl;
   

printf("Complex: %s\n", str(c).c_str());
    
std::cout << "Complex: " << str(c) << std::endl;

   
Format Types “on the Stack”

// "C"-style string formatting
void to_string(strinx::SString& s, 
               const std::complex<int>& c, 
               const basic_fmt* fmt = 0)
{
    format(s, "(%d,%d)", c.real(), c.imag());    
}

using namespace strinx;
std::complex<int> c(3, 4);
SString s;
format(s, "<%20r>", c);
std::cout << s << std::endl;
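The same "format on the stack" idea can be approximated with plain `std::snprintf` into a caller-owned array. This is a swapped-in standard-library technique, not the Strinx `format`; `format_complex` is a hypothetical helper whose array size is deduced at compile time:

```cpp
#include <complex>
#include <cstddef>
#include <cstdio>

// Write "(re,im)" into buf without any heap allocation.
// Returns false if the output had to be truncated to fit.
template <std::size_t N>
bool format_complex(char (&buf)[N], const std::complex<int>& c)
{
    int n = std::snprintf(buf, N, "(%d,%d)", c.real(), c.imag());
    // n is the length the full output would have had (excluding '\0');
    // a negative value signals an encoding error.
    return n >= 0 && static_cast<std::size_t>(n) < N;
}
```

Unlike `sprintf`, `snprintf` never writes more than `N` bytes and always NUL-terminates when `N > 0`, so truncation is detected by comparing its return value with `N`.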

   
Locale
Relatively new feature in C++ standard
Not well integrated with std::basic_string
Strinx solution: a set of functions over
substrings:
SubString abc_lower("abcdef"), abc_upper("ABCDEF");
SubString xdigs_lower("0123456789abcdef");
// Case-insensitive comparison.
int n = icompare(abc_lower, abc_upper, std::locale("C"));
// Remove any leading and trailing alphabetic characters.
SubString ss = strip_alphas(xdigs_lower, std::locale("C"));
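Helpers of this kind can be sketched with the `<locale>` convenience functions `std::tolower(c, loc)` and `std::isalpha(c, loc)`. The names `icompare_sketch` and `strip_alphas_sketch` are hypothetical and, for brevity, they take and return `std::string` instead of a non-owning substring:

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Case-insensitive three-way compare under loc: returns <0, 0 or >0.
int icompare_sketch(const std::string& a, const std::string& b,
                    const std::locale& loc)
{
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i)
    {
        char ca = std::tolower(a[i], loc);
        char cb = std::tolower(b[i], loc);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    if (a.size() == b.size())
        return 0;
    return a.size() < b.size() ? -1 : 1;    // a shorter prefix sorts first
}

// Remove leading and trailing characters that loc classifies as alphabetic.
std::string strip_alphas_sketch(const std::string& s, const std::locale& loc)
{
    std::size_t b = 0, e = s.size();
    while (b < e && std::isalpha(s[b], loc)) ++b;
    while (e > b && std::isalpha(s[e - 1], loc)) --e;
    return s.substr(b, e - b);
}
```

With `std::locale("C")`, the two calls on the slide correspond to `icompare_sketch("abcdef", "ABCDEF", loc) == 0` and `strip_alphas_sketch("0123456789abcdef", loc) == "0123456789"`.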
   
Memory Spaces of a Linux Process

   
Memory Allocator

   
Dynamic Memory Allocations in Multithreaded Environments
Doug Lea allocator (glibc) does not
distinguish between threads
Modern allocators are better designed for
multithreaded environments running on
multiprocessor architectures (Hoard,
Solaris mtmalloc, Google TCMalloc)

   
Some Storage Management Techniques for Container Classes
(Doug Lea, C++ Report 1989)

“Nearly all non-trivial storage management strategies
are ‘lazy’ in the sense that individual allocation and
deallocation calls do not necessarily always individually
obtain or reclaim system storage space, but that
storage is ultimately managed in a safe and correct
manner. In other words, these techniques attempt to
have good “amortized” performance, meaning that
while any particular allocation or deallocation call may
not be especially fast, sequences of them, considered
together, are.”

   
The State of the Language (Concurrency)
(Stroustrup, August 2008)

Memory model supporting modern machine architectures
Threading ABI
Atomic types
Mutexes and locks
Thread local storage
Asynchronous message exchange

   
Memory Allocation in STL Containers
// A standard container with linear-time access to elements and
// constant-time insertion and deletion anywhere in the sequence.
template<typename _Tp, typename _Alloc = std::allocator<_Tp> >
class list;

_List_node<_Tp>*
_M_get_node()
{ return _M_impl._Node_alloc_type::allocate(1); }
      
void
_M_put_node(_List_node<_Tp>* __p)
{ _M_impl._Node_alloc_type::deallocate(__p, 1); }
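The `_M_get_node`/`_M_put_node` pair means a `std::list` calls its allocator once per node. A minimal C++11 allocator that counts those calls makes this visible; `counting_allocator` and `alloc_stats` are hypothetical names for this sketch:

```cpp
#include <cstddef>
#include <list>
#include <new>

// Call counters shared by all rebound copies of the allocator
// (std::list rebinds counting_allocator<int> to its internal node type).
struct alloc_stats {
    static std::size_t allocations;
    static std::size_t deallocations;
};
std::size_t alloc_stats::allocations = 0;
std::size_t alloc_stats::deallocations = 0;

// Minimal C++11 allocator: forwards to ::operator new/delete, counting calls.
template <typename T>
struct counting_allocator {
    typedef T value_type;

    counting_allocator() {}
    template <typename U>
    counting_allocator(const counting_allocator<U>&) {}

    T* allocate(std::size_t n)
    {
        ++alloc_stats::allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t)
    {
        ++alloc_stats::deallocations;
        ::operator delete(p);
    }
};

// Stateless, so any two instances can free each other's memory.
template <typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) { return true; }

template <typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) { return false; }
```

With libstdc++ the list head is embedded in the list object, so ten `push_back` calls should show ten allocations; other implementations may also allocate a sentinel node.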
      

   
Memory Pools

   
GNU libstdc++ Custom Allocators
new_allocator, malloc_allocator:
Simply wraps ::operator new/malloc and
::operator delete/free.
__pool_alloc: A high-performance, single
pool allocator. The reusable memory is
shared among identical instantiations of
this type.
__mt_alloc: A high-performance fixed-size
allocator with exponentially-increasing
allocations.
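With GCC, the allocators listed above are selected by naming them as the container's second template argument. A sketch, assuming libstdc++ (the `<ext/...>` headers are GCC extensions and will not compile with other standard libraries); `fill_and_sum` is a hypothetical test helper:

```cpp
#include <ext/malloc_allocator.h>   // __gnu_cxx::malloc_allocator
#include <ext/mt_allocator.h>       // __gnu_cxx::__mt_alloc
#include <ext/pool_allocator.h>     // __gnu_cxx::__pool_alloc
#include <list>
#include <memory>                   // std::allocator
#include <numeric>                  // std::accumulate

// Build a list of 1..n using Alloc for its nodes and return the sum.
// Only the allocator type differs between instantiations.
template <typename Alloc>
long fill_and_sum(int n)
{
    std::list<int, Alloc> ls;
    for (int i = 1; i <= n; ++i)
        ls.push_back(i);
    return std::accumulate(ls.begin(), ls.end(), 0L);
}
```

For example, `std::list<int, __gnu_cxx::__pool_alloc<int> >` behaves like `std::list<int>` for the caller; only where node memory comes from changes.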
   
Associative Containers
Set, Map, MultiSet, MultiMap
Underlying data structure: BST
AVL, Red-Black, TREAP (Skip-List)
Boost.Intrusive, Boost.Pool

   
Example: Random Numbers
#include <cstdlib>   // rand

// Fill the range [first, last) with random numbers.
template <typename FwdIterator>
void stl_make_randoms(FwdIterator first, FwdIterator last)
{
    while (first != last)
    {
        *first = rand();
        ++first;
    }
}

// Append n random numbers to container c.
template <typename ContainerT>
void strinx_make_randoms(ContainerT& c, int n)
{
    while (n-- > 0)
    {
        c.push_back(rand());
    }
}
   
Example: Random Numbers

int main(int argc, char* argv[])
{
    int n_elems = 10;

    // Calls T's constructor 10 times before any value is stored!
    std::list<int>              stl_list(n_elems);
    stl_make_randoms(stl_list.begin(), stl_list.end());
    print_list(stl_list);

    strinx::list<int>           strinx_list(n_elems);
    strinx_make_randoms(strinx_list, n_elems);
    print_list(strinx_list);

    strinx::bounded_list<int>   strinx_list2(n_elems);
    strinx_make_randoms(strinx_list2, n_elems);
    print_list(strinx_list2);

    return 0;
}
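The comment about constructor calls can be checked with an element type that counts its default constructions. In C++11 and later, `std::list<T>(n)` value-initializes `n` elements; the sketch compares that with appending values one by one. `counted`, `presized` and `appended` are hypothetical names:

```cpp
#include <cstddef>
#include <list>

// Element type that counts how often its default constructor runs.
struct counted {
    static std::size_t default_ctors;
    int value;
    counted() : value(0) { ++default_ctors; }
    counted(int v) : value(v) {}
};
std::size_t counted::default_ctors = 0;

// Pre-sized list: n default constructions, then n assignments.
std::list<counted> presized(std::size_t n)
{
    std::list<counted> ls(n);
    int i = 0;
    for (std::list<counted>::iterator it = ls.begin(); it != ls.end(); ++it)
        it->value = i++;
    return ls;
}

// Grow-as-you-go list: each element is built once, directly from its value.
std::list<counted> appended(std::size_t n)
{
    std::list<counted> ls;
    for (std::size_t i = 0; i < n; ++i)
        ls.push_back(counted(static_cast<int>(i)));
    return ls;
}
```

Both lists end up with the same contents; only the pre-sized one pays for ten default constructions that are immediately overwritten.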
   
Abnormal Allocations (Multithreaded)
What if n_elems = -1?
What happens in case of a run-time allocation
failure?
How can we know which thread failed to allocate
memory?
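Passing `n_elems = -1` where a `size_t` count is expected silently converts it to the largest `std::size_t` value. A minimal sketch of one defensive pattern, assuming the caller prefers a return code to an exception; `make_list_checked` is a hypothetical helper, and it does not address the question of which thread failed:

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <new>

// A negative int converted to std::size_t wraps around to the maximum value,
// so std::list<int>(n_elems) with n_elems = -1 would ask for SIZE_MAX nodes.
static_assert(static_cast<std::size_t>(-1) == SIZE_MAX, "-1 wraps to SIZE_MAX");

// Reject negative counts before constructing, and turn an allocation
// failure into a return value the calling thread can report.
bool make_list_checked(int n, std::list<int>& out)
{
    if (n < 0)
        return false;                       // e.g. n_elems = -1
    try {
        std::list<int> tmp(static_cast<std::size_t>(n));
        out.swap(tmp);                      // commit only on success
        return true;
    } catch (const std::bad_alloc&) {
        return false;                       // run-time allocation failure
    }
}
```

Building into a temporary and swapping keeps `out` unchanged when construction fails part-way.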

   
Strinx Containers
Normal: Allocates memory as needed via the
allocator and retains an internal pool (free list)
of unused memory, associated with the
object instance
Bounded: Allocates a single contiguous
memory region upon the object's construction
and manages it internally.
Static Bounded: Manages an internal
memory region whose max-size is set as
a template parameter.
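The bounded variant can be sketched as a pool that carves one contiguous region into fixed-size slots at construction and threads the free slots onto an intrusive free list. `bounded_pool` is a hypothetical class, not the Strinx implementation; when the region is exhausted it returns `nullptr` instead of growing:

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical bounded pool: `capacity` slots allocated once, up front.
template <typename T>
class bounded_pool {
public:
    explicit bounded_pool(std::size_t capacity)
        : storage_(capacity), free_(nullptr)
    {
        // Push every slot onto the free list.
        for (std::size_t i = 0; i < capacity; ++i) {
            storage_[i].next = free_;
            free_ = &storage_[i];
        }
    }

    bounded_pool(const bounded_pool&) = delete;   // free_ points into storage_
    bounded_pool& operator=(const bounded_pool&) = delete;

    // Raw storage for one T, or nullptr once every slot is in use.
    void* allocate()
    {
        if (free_ == nullptr)
            return nullptr;
        slot* s = free_;
        free_ = s->next;
        return static_cast<void*>(s);
    }

    // Give back a slot from allocate(); any T in it must already be destroyed.
    void deallocate(void* p)
    {
        slot* s = static_cast<slot*>(p);
        s->next = free_;
        free_ = s;
    }

private:
    union slot {
        slot* next;                                  // while the slot is free
        alignas(T) unsigned char bytes[sizeof(T)];   // while it holds a T
    };

    std::vector<slot> storage_;   // the single contiguous region
    slot* free_;                  // head of the intrusive free list
};
```

Allocation and deallocation are a couple of pointer moves, and the free list is LIFO, so a just-released slot is the next one handed out.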
   
Buffer & List
Strinx buffer is a fixed-size container with
double-ended queue semantics.
Designed to work with atomic_t (single-reader,
single-writer threads)
bounded_list is designed for small
containers with list semantics
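A single-reader/single-writer buffer can be sketched as a Lamport-style ring buffer. This sketch swaps in C++11 `std::atomic` for `atomic_t`, and it offers only FIFO `push`/`pop`, not the double-ended operations of the Strinx buffer; `spsc_ring` is a hypothetical name:

```cpp
#include <atomic>
#include <cstddef>

// Fixed-capacity ring buffer for exactly one producer and one consumer.
// One of the N + 1 slots stays empty to distinguish "full" from "empty".
template <typename T, std::size_t N>
class spsc_ring {
public:
    // Producer side. Returns false when the buffer is full.
    bool push(const T& v)
    {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t next = (tail + 1) % (N + 1);
        if (next == head_.load(std::memory_order_acquire))
            return false;
        buf_[tail] = v;
        tail_.store(next, std::memory_order_release);   // publish the slot
        return true;
    }

    // Consumer side. Returns false when the buffer is empty.
    bool pop(T& out)
    {
        std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return false;
        out = buf_[head];
        head_.store((head + 1) % (N + 1), std::memory_order_release);  // free it
        return true;
    }

private:
    T buf_[N + 1];
    std::atomic<std::size_t> head_{0};   // written only by the consumer
    std::atomic<std::size_t> tail_{0};   // written only by the producer
};
```

The release store on one index paired with the acquire load on the other makes the slot contents visible across the two threads without a lock. The design is not safe with more than one producer or more than one consumer.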

   
Example: Retokenize
#define MAX_TOKENS      (10)

// Tokenize string (using ':', '/' or ';' as delimiters).
// Put result in list and dump to std::cout.
int main(int argc, char* argv[])
{   
    using namespace strinx;

    list< SubString > tok_list(MAX_TOKENS);
    SString line("  /Ant:::Bee;:Cat:Dog;Eel/Frog:/Giraffe///  ");
        
    tokens(tok_list, strip(line), ":/;");
        
    std::copy(tok_list.begin(), tok_list.end(),
     std::ostream_iterator< xstring<char> >(std::cout, " "));
    std::cout << std::endl;

    return 0;
}

   
Performance Benchmarks
Intel Xeon, 8 cores, 2992.536 MHz, 6144 KB cache; Linux kernel
2.6.18; GCC 4.1.2 with -O3

CONTAINER   SUBTYPE   MAX-SIZE   THREADS   STL/STRINX
buffer      bounded     200000         4         1.23
list        normal      500000         4         1.59
list        bounded     500000         4         1.56
set         normal      100000         4         2.63
set         bounded     100000         4         3.08
map         normal      100000         4         2.46
map         bounded     100000         4         3.82
multiset    normal      100000         4         2.48
multiset    bounded     100000         4         2.66
multimap    normal      200000         4         1.39
multimap    bounded     200000         4         2.22
stack       bounded     100000         4         1.75

   
Questions?

   
