March 13, 2013

Comparing basic iteration in C++ and Haskell

Iteration in C++ and Haskell: Part 1

For loop iteration: C

First, basic iteration over an array.

C++ Source :


#include <iostream>

#include <vector>



int main(int argc, char**argv) { 

  std::vector<int> v { 12, 23, 34 ,45 ,56 ,67 ,78, 89 };

  for (auto i : v) { 

    std::cout << i << std::endl;

  } 

}

Compile/Assemble:


$ g++ -std=c++0x -S -o simpleloopcpp simpleloop.cpp

I've stored the result in [this

gist](https://gist.github.com/agam/5194257). As you can see, this is

enormous !! And all because we introduced ```STL``` into the picture !!!

So I obviously won't include all 1742 lines here, but let's see if we can

scan some high level features and go a bit deeper into the parts of the

code we'll compare with the haskell version of this.

(Skipping some boiler plate and the initializer list creater )


	.text

	.globl	main

	.type	main, @function

main:

.LFB1490:

	.cfi_startproc

	.cfi_personality 0x3,__gxx_personality_v0

	.cfi_lsda 0x3,.LLSDA1490

	pushq	%rbp

	.cfi_def_cfa_offset 16

	.cfi_offset 6, -16

	movq	%rsp, %rbp

	.cfi_def_cfa_register 6

	pushq	%rbx

	subq	$120, %rsp

	movl	%edi, -116(%rbp)

	movq	%rsi, -128(%rbp)

	leaq	-17(%rbp), %rax

	movq	%rax, %rdi

	.cfi_offset 3, -24

Here is the

```std::initializer_list``` being initialized


	call	_ZNSaIiEC1Ev

	leaq	-80(%rbp), %rax

	movl	$8, %edx

	movl	$._89, %esi

	movq	%rax, %rdi

	call	_ZNSt16initializer_listIiEC1EPKim

(It is declared elsewhere)


	.size	._89, 32

._89:

	.long	12

	.long	23

	.long	34

	.long	45

	.long	56

	.long	67

	.long	78

	.long	89

... and then the ```vector``` initialized from it ...`


	leaq	-17(%rbp), %rcx

	movq	-80(%rbp), %rsi

	movq	-72(%rbp), %rdx

	leaq	-112(%rbp), %rax

	movq	%rax, %rdi

.LEHB0:

	call	_ZNSt6vectorIiSaIiEEC1ESt16initializer_listIiERKS0_

Ignore this, it's just freeing up the ```std::allocator``` object that is

passed to the vector (the actual function signature for the constructor is

std::allocator const&)```)




.LEHE0:

	leaq	-17(%rbp), %rax

	movq	%rax, %rdi

	call	_ZNSaIiED1Ev

	leaq	-112(%rbp), %rax

	movq	%rax, -32(%rbp)

	movq	-32(%rbp), %rax

	movq	%rax, %rdi

Aha ... finally, the iteration, from beginning

.LEHB1:

	call	_ZSt5beginISt6vectorIiSaIiEEEDTcldtfp_5beginEERT_

	movq	%rax, -64(%rbp)

	movq	-32(%rbp), %rax

	movq	%rax, %rdi

... to end

	call	_ZSt3endISt6vectorIiSaIiEEEDTcldtfp_3endEERT_

	movq	%rax, -48(%rbp)

	jmp	.L3

Beginning of the loop: we take the value of the iterator ...

.L4:

	leaq	-64(%rbp), %rax

	movq	%rax, %rdi

	call	_ZNK9__gnu_cxx17__normal_iteratorIPiSt6vectorIiSaIiEEEdeEv

	movl	(%rax), %eax

	movl	%eax, -24(%rbp)

	movl	-24(%rbp), %eax

	movl	%eax, %esi

... and print it out.
(This is the ```operator <<```, being called on the contents of ```edi```,
which is ```std::cout```)

	movl	$_ZSt4cout, %edi

	call	_ZNSolsEi

And this big mangled symbol is ```std::endl```.

	movl

$_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, %esi

	movq	%rax, %rdi

	call	_ZNSolsEPFRSoS_E

BTW a useful tool to find
unmangled symbols is ```c++filt```, use it as

$ c++filt _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_



std::basic_ostream<char, std::char_traits<char> >& std::endl<char,

std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char>

>&)



I googled and found an [online version of the
same](http://slush.warosu.org/c++filtjs/) but it wasn't always accurate, so
YMMV.
Moving on ... then the iterator advances along the vector ...

	leaq	-64(%rbp), %rax

	movq	%rax, %rdi

	call	_ZN9__gnu_cxx17__normal_iteratorIPiSt6vectorIiSaIiEEEppEv

.L3:

	leaq	-48(%rbp), %rdx

	leaq	-64(%rbp), %rax

	movq	%rdx, %rsi

	movq	%rax, %rdi

... and if we haven't reached the end yet, jump back to the beginning of
the loop (the ```.L4``` label above)

	call

_ZN9__gnu_cxxneIPiSt6vectorIiSaIiEEEEbRKNS_17__normal_iteratorIT_T0_EESA_

.LEHE1:

	testb	%al, %al

	jne	.L4

	leaq	-112(%rbp), %rax

	movq	%rax, %rdi

When done, destroy the vector (with some error handling, so




.LEHB2:

	call	_ZNSt6vectorIiSaIiEED1Ev

	movl	$0, %eax

	addq	$120, %rsp

	popq	%rbx

	popq	%rbp

	.cfi_remember_state

	.cfi_def_cfa 7, 8

	ret

.L7:

	.cfi_restore_state

	movq	%rax, %rbx

	leaq	-17(%rbp), %rax

	movq	%rax, %rdi

	call	_ZNSaIiED1Ev

	movq	%rbx, %rax

	movq	%rax, %rdi

	call	_Unwind_Resume

.LEHE2:

.L8:

	movq	%rax, %rbx

	leaq	-112(%rbp), %rax

	movq	%rax, %rdi

	call	_ZNSt6vectorIiSaIiEED1Ev

	movq	%rbx, %rax

	movq	%rax, %rdi

.LEHB3:

	call	_Unwind_Resume

.LEHE3:

	.cfi_endproc

Now I've skipped quite a lot of the generated assembly, and you should
look at the gist I mentioned above for the code for each of the function
calls mentioned above, e.g. the code for
the 'beginning' of the iterator:

_ZSt5beginISt6vectorIiSaIiEEEDTcldtfp_5beginEERT_:

.LFB1564:

	.cfi_startproc

	pushq	%rbp

	.cfi_def_cfa_offset 16

	.cfi_offset 6, -16

	movq	%rsp, %rbp

	.cfi_def_cfa_register 6

	subq	$16, %rsp

	movq	%rdi, -8(%rbp)

	movq	-8(%rbp), %rax

	movq	%rax, %rdi

	call	_ZNSt6vectorIiSaIiEE5beginEv

	leave

	.cfi_def_cfa 7, 8

	ret

	.cfi_endproc

(which wraps around a call to ```std::vector::begin()```)
For loop iteration: Haskell (Basic lists)
Here's a simple program that uses the standard ```Prelude``` list type.

x = [ 12, 23, 34, 45, 56, 67, 78 ,89 ]



main = mapM_ print x

Full assembly is here ... and would
you believe it, it's one-fifth the size of the C++ assembly.
Based on our experience in the [Hello World
example](/posts/hello-world-comparison.html), the Haskell assembly won't
have a linear one-to-one mapping with the source, so let's start with the
function.




.globl Main_main_info

.type Main_main_info, @object

Main_main_info:

.Lcuj:

	leaq -16(%rbp),%rax

	cmpq %r15,%rax

	jb .Lcul

	addq $16,%r12

	cmpq 144(%r13),%r12

	ja .Lcun

	movq $stg_CAF_BLACKHOLE_info,-8(%r12)

	movq 160(%r13),%rax

	movq %rax,0(%r12)

	movq %r13,%rdi

	movq %rbx,%rsi

	leaq -8(%r12),%rdx

Once again, we see the ```CAF``` allocation

	subq $8,%rsp

	movl $0,%eax

	call newCAF

	addq $8,%rsp

	testq %rax,%rax

	je .Lcuo

.Lcup:

	movq $stg_bh_upd_frame_info,-16(%rbp)

	leaq -8(%r12),%rax

	movq %rax,-8(%rbp)

Somewhat boring ... it uses ```Control::MapM``` and ```
GHC::Base::IO```

	movl $base_ControlziMonad_mapMzu_closure,%ebx

	movl $base_GHCziBase_zdfMonadIO_closure,%r14d

	movl $stc_closure,%esi

	movl $stb_closure+2,%edi

	addq $-16,%rbp

	jmp stg_ap_ppp_fast

.Lcun:

	movq $16,192(%r13)

.Lcul:

	jmp *-16(%r13)

.Lcuo:

	jmp *(%rbx)

... which in turn is defined as (skipping the usual paperwork of
and ```stg_CAF_BLACKHOLE_info```)




.Lcu2:

	movq $stg_bh_upd_frame_info,-16(%rbp)

	leaq -8(%r12),%rax

	movq %rax,-8(%rbp)

	movl $base_SystemziIO_print_closure,%ebx

	movl $base_GHCziShow_zdfShowInteger_closure,%r14d

	addq $-16,%rbp

	jmp stg_ap_p_fast

.Lcu0:

	movq $16,192(%r13)

.LctY:

	jmp *-16(%r13)

.Lcu1:

	jmp *(%rbx)

(using ```System::IO::print``` and ```GHC::Show::ShowInteger```)
So nothing much to directly glean here – iteration isn't anything special
in haskell, it's another function call.
The assembly for the data is incredibly verbose, on the lines of 

.data

	.align 8

.align 1

ssX_closure:

	.quad	integerzmgmp_GHCziIntegerziType_Szh_static_info

	.quad	56

.data

	.align 8

.align 1

ssZ_closure:

	.quad	ghczmprim_GHCziTypes_ZC_static_info

	.quad	ssX_closure+1

	.quad	ssW_closure+2

	.quad	0



...

...

...



.data

	.align 8

.align 1

st6_closure:

	.quad	integerzmgmp_GHCziIntegerziType_Szh_static_info

	.quad	23

.data

	.align 8

.align 1

st8_closure:

	.quad	ghczmprim_GHCziTypes_ZC_static_info

	.quad	st6_closure+1

	.quad	st5_closure+2

	.quad	0

For loop iteration: Haskell (Other list structures)
Let's look at a different kind of list in haskell

import Prelude hiding (mapM_);

import Data.Sequence;

import Data.Foldable (mapM_);



x = Data.Sequence.singleton 12 |> 23 |> 34 |> 45 |> 56 |> 67 |> 78  |> 89

main = mapM_  print  x

(Assembly here)
The cells of this sequence are still constructed in what seems to me a bit
heavyweight manner:

sEH_info:

.LcGi:

	leaq -16(%rbp),%rax

	cmpq %r15,%rax

	jb .LcGk

	addq $32,%r12

	cmpq 144(%r13),%r12

	ja .LcGm

	movq $stg_upd_frame_info,-16(%rbp)

	movq %rbx,-8(%rbp)

	movq $integerzmgmp_GHCziIntegerziType_Szh_con_info,-24(%r12)

	movq $67,-16(%r12)

	movq $sED_info,-8(%r12)

	movl $containerszm0zi4zi2zi1_DataziSequence_zbzg_closure,%ebx

	leaq -8(%r12),%r14

	leaq -23(%r12),%rsi

	addq $-16,%rbp

	jmp stg_ap_pp_fast

.LcGm:

	movq $32,192(%r13)

(This extract shows the closure that adds ```67``` in the list)
It's possible to construct this directly from a finite list, and the same
can be done for the ```Data.Vector``` type too :

import Prelude hiding (mapM_);

import Data.Vector;



x = Data.Vector.fromList [ 12, 23, 34, 45, 56, 67, 78, 89 ]

main = mapM_  print  x

(Assembly here)
The vector is still constructed piece by piece though:

	movq $23,-64(%r12)

	movq $ghczmprim_GHCziTypes_ZC_con_info,-56(%r12)

	leaq -71(%r12),%rax

	movq %rax,-48(%r12)

	leaq -94(%r12),%rax

	movq %rax,-40(%r12)

	movq $integerzmgmp_GHCziIntegerziType_Szh_con_info,-32(%r12)

	movq $12,-24(%r12)

	movq $ghczmprim_GHCziTypes_ZC_con_info,-16(%r12)

	leaq -31(%r12),%rax

	movq %rax,-8(%r12)

	leaq -54(%r12),%rax

	movq %rax,0(%r12)

	movl $vectorzm0zi9zi1_DataziVector_fromList_closure,%ebx

	leaq -14(%r12),%r14

	addq $-16,%rbp

	jmp stg_ap_p_fast



(This extract shows the steps for picking up 23 and 12 (skipping similar
steps for the other numbers)
I was curious if there was a way to avoid constructing the list

import Data.Vector.Unboxed as U;



x :: U.Vector Double

x = U.fromList [ 12, 23, 34, 45, 56, 67, 78, 89 ]

main = U.foldr ((>>) . print) (return ())  x

(Assembly here)

...

... (skipping numbers after 23)

...

	movq $23,-48(%r12)

	movq $ghczmprim_GHCziTypes_ZC_con_info,-40(%r12)

	leaq -55(%r12),%rax

	movq %rax,-32(%r12)

	leaq -78(%r12),%rax

	movq %rax,-24(%r12)

	movq $ghczmprim_GHCziTypes_ZC_con_info,-16(%r12)

	movq $stg_INTLIKE_closure+449,-8(%r12)

	leaq -38(%r12),%rax

	movq %rax,0(%r12)

	movl $vectorzm0zi9zi1_DataziVectorziUnboxed_fromList_closure,%ebx

	movl

$vectorzm0zi9zi1_DataziVectorziUnboxedziBase_zdfUnboxInt_closure,%r14d

	leaq -14(%r12),%rsi

	addq $-16,%rbp

	jmp stg_ap_pp_fast





In this case, looking at ```Core``` might help.



Main.main :: GHC.Types.IO ()

[LclIdX]

Main.main =

  Data.Vector.Unboxed.foldr

    @ GHC.Types.Int

    @ (GHC.Types.IO ())

    Data.Vector.Unboxed.Base.$fUnboxInt

    (GHC.Base..

       @ (GHC.Types.IO ())

       @ (GHC.Types.IO () -> GHC.Types.IO ())

       @ GHC.Types.Int

       (GHC.Base.>> @ GHC.Types.IO GHC.Base.$fMonadIO @ () @ ())

       (System.IO.print @ GHC.Types.Int GHC.Show.$fShowInt))

    (GHC.Base.return

       @ GHC.Types.IO GHC.Base.$fMonadIO @ () GHC.Tuple.())

    (Data.Vector.Unboxed.fromList

       @ GHC.Types.Int

       Data.Vector.Unboxed.Base.$fUnboxInt

       (GHC.Types.:

          @ GHC.Types.Int

          (GHC.Types.I# 12)

          (GHC.Types.:

             @ GHC.Types.Int

             (GHC.Types.I# 23)

             (GHC.Types.:

                @ GHC.Types.Int

                (GHC.Types.I# 34)

                (GHC.Types.:

                   @ GHC.Types.Int

                   (GHC.Types.I# 45)

                   (GHC.Types.:

                      @ GHC.Types.Int

                      (GHC.Types.I# 56)

                      (GHC.Types.:

                         @ GHC.Types.Int

                         (GHC.Types.I# 67)

                         (GHC.Types.:

                            @ GHC.Types.Int

                            (GHC.Types.I# 78)

                            (GHC.Types.:

                               @ GHC.Types.Int

                               (GHC.Types.I# 89)

                               (GHC.Types.[] @ GHC.Types.Int))))))))))



:Main.main :: GHC.Types.IO ()

[LclIdX]

:Main.main = GHC.TopHandler.runMainIO @ () Main.main

But no, even in the case of a ```Data.Vector.Unboxed```, the representation
for the array data is never as compact as the C++ version.
In terms of running speed though (again, printing out tiny lists is a poor,
poor benchmark, but still) it seems close enough:

$ time (while ((n++ < 100)); do ./simpleloopcpp; done)



real	0m0.312s

user	0m0.016s

sys	0m0.044s



$ time (while ((n++ < 100)); do ./simpleloophaskell; done)



real	0m0.355s

user	0m0.024s

sys	0m0.060s




    
 
    Tags: 
    
    old-post
    





    
        
        « Hakyll broken ?
        
        
        Comparing `Hello World` in Go »




Copyright © 2020 Agam Brahma
    Powered by Cryogen