June 26, 2013

Comparison of data structures in C++ and Haskell

Comparison of Data Structures in C++ and Haskell

Once again, just to satisfy my curiosity, I want to do a tiny comparison of a basic feature in both languages, trying to see how data structures are implemented. Ok, I'm fairly certain how it works for in C++, I'm really just curious about Haskell.

C++ version


#include <iostream>

#include <memory>

#include <vector>



struct BinaryNode {

  int val;

  BinaryNode *leftchild, *rightchild;

  BinaryNode(int v, BinaryNode* lc = nullptr, BinaryNode* rc = nullptr)

    : val(v), leftchild(lc), rightchild(rc) {}

};



struct NaryNode {

  int val;

  std::vector<NaryNode*> children;

  NaryNode(int v, std::vector<NaryNode*>& c) : val(v), children(c) {}

};



int binarySum(BinaryNode* node) {

  if (node == nullptr) {

    return 0;

  }

  return node->val + binarySum(node->leftchild) + binarySum(node->rightchild);

}



int narySum(NaryNode* node) {

  if (node == nullptr) {

    return 0;

  }

  int sum = node->val;

  for (const auto it : node->children) {

    sum += narySum(it);

  }

  return sum;

}



// Create dummy binary and n-ary trees and print out their sum

int main() {

  std::unique_ptr<BinaryNode> bnode1(new BinaryNode(4));

  std::unique_ptr<BinaryNode> bnode2(new BinaryNode(5));

  std::unique_ptr<BinaryNode> broot(new BinaryNode(0, bnode1.get(), bnode2.get()));

  std::cout << "BinarySum = " << binarySum(broot.get()) << std::endl;



  std::vector<NaryNode*> children;

  std::unique_ptr<NaryNode> nnode1(new NaryNode(4, children));

  std::unique_ptr<NaryNode> nnode2(new NaryNode(5, children));

  std::unique_ptr<NaryNode> nnode3(new NaryNode(6, children));

  children.push_back(nnode1.get());

  children.push_back(nnode2.get());

  children.push_back(nnode3.get());

  std::unique_ptr<NaryNode> nroot(new NaryNode(0, children));

  std::cout << "NarySum = " << narySum(nroot.get()) << std::endl;

}

Assembly highlights:

Uploaded the full file as a Gist (it's 6437 lines long, we aren't in Hello World territory any more; on the other hand, this program would have been much longer in C.

The function _Z9binarySumP10BinaryNode is defined near the top, and the function _Z7narySumP8NaryNode right under it.

The assembly code follows along almost exactly as the C++ code above (of course all the struct offsets are "just added" and you don't see the members being accessed directly).

The recursion is more obviously visible for the first function, where you see a pattern of "call the function", "call the function again" and then add the node value to it. For the second function, you can see calls to vector<NaryNode>::begin() and vector<NaryNode>::end() at the beginning of the loop, and the test 'n' jump and the recursive call in the middle of the body.

It is useful also to see how main() works here, you can see three calls to the BinaryNode constructor (_ZN10BinaryNodeC1EiPS_S0_), and then similar stuff for the NaryNode objects. At the end are multiple calls to the destructors (similar to callq _ZNSt10unique_ptrI8NaryNodeSt14default_deleteIS0_EED1Ev).

Most of the rest of the code is the definition of these functions (vector begin/end and unique_ptr delete etc.)

Note: I found a better way to show the assembly <-> source correspondence, in an interactive way: GCC Explorer and I'll use that next time, it's just way more convenient. Also, I've been showing unoptimized assembly, which is just noise, will refer to -O2 next time.

Haskell version


data BinaryTree a = BinaryLeaf a | BinaryBranch (BinaryTree a) (BinaryTree a)

data NaryTree a = NaryLeaf a | NaryBranch [NaryTree a]



binarySum :: (Num a) => (BinaryTree a) -> a

binarySum (BinaryLeaf x) = x

binarySum (BinaryBranch a b) =  (binarySum a) + (binarySum b)



narySum :: (Num a) => NaryTree a -> a

narySum (NaryLeaf x) = x

narySum (NaryBranch b) = foldl (\acc x -> acc + (narySum x)) 0 b



--Create dummy binary and n-ary trees and print out their sum

main = let bt = BinaryBranch (BinaryLeaf 4) (BinaryLeaf 5)

           nt = NaryBranch ((NaryLeaf 4) : (NaryLeaf 5) : (NaryLeaf 6) : [])

       in do

          putStrLn ("BinarySum = " ++ show(binarySum bt))

          putStrLn ("NarySum = " ++ show(narySum nt))

Assembly highlights:

Uploaded the full file (core) as a Gist

main is present as main1 near the bottom, which has two calls to Handle.Text.hPutStr2

The binary sum is unpacked quite straightforwardly :-


main_$sbinarySum =

  \ (ds_dBk :: BinaryTree Type.Integer) ->

    case ds_dBk of _ {

      BinaryLeaf x_aaA -> x_aaA;

      BinaryBranch a_aaB b_aaC ->

        Type.plusInteger

          (main_$sbinarySum a_aaB) (main_$sbinarySum b_aaC)

    }

whereas the nary sum is less so :-


main_$snarySum =

  \ (ds_dtA :: NaryTree Type.Integer) ->

    case ds_dtA of _ {

      NaryLeaf x_aad -> x_aad;

      NaryBranch b_aae -> lgo_r1MR lvl_r1MQ b_aae

    }



lgo_r1MR =

  \ (z_atM :: Type.Integer)

    (ds_atN :: [NaryTree Type.Integer]) ->

    case ds_atN of _ {

      [] -> z_atM;

      : x_atS xs_atT ->

        lgo_r1MR

          (Type.plusInteger z_atM (main_$snarySum x_atS))

          xs_atT

    }

with the second function being defined for iteration. I uploaded a gist with the assembly output, which is of course tedious and quite spaghetti-like.

Here is an example of the nary leaf:


.globl Main_NaryLeaf_closure

.type Main_NaryLeaf_closure, @object

Main_NaryLeaf_closure:

	.quad	Main_NaryLeaf_info

.text

	.align 8

	.quad	4294967301

	.quad	0

	.quad	15

Main_NaryLeaf_info:

.LcBb:

	addq $16,%r12

	cmpq 144(%r13),%r12

	ja .LcBg

	movq $Main_NaryLeaf_con_info,-8(%r12)

	movq %r14,0(%r12)

	leaq -7(%r12),%rbx

	jmp *0(%rbp)

.LcBg:

	movq $16,192(%r13)

.LcBe:

	movl $Main_NaryLeaf_closure,%ebx

	jmp *-8(%r13)

	.size Main_NaryLeaf_info, .-Main_NaryLeaf_info

Running time

Pretty much on par here:


$ time ./cpptree

BinarySum = 9

NarySum = 15



real	0m0.005s

user	0m0.000s



$ time ./haskelltree 

BinarySum = 9

NarySum = 15



real	0m0.004s

user	0m0.000s

sys	0m0.000s

Fast compilation, comparable run-time, shorter code ... we're talking serious competition here :)

Tags: old-post

« Rudimentary note taking on the command line Clang vs Gcc, which would you use? »