<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>I See Dead Code &#187; lang:en</title>
	<atom:link href="http://shlomme.diotavelli.net/category/english/feed/" rel="self" type="application/rss+xml" />
	<link>http://shlomme.diotavelli.net</link>
	<description>… as sounding brass, or a tinkling cymbal.</description>
	<lastBuildDate>Sun, 11 Dec 2011 00:53:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.5</generator>
		<item>
		<title>Infrequently Asked Linguistics Questions (IALQ), No. 1</title>
		<link>http://shlomme.diotavelli.net/2011/12/11/infrequently-asked-linguistics-questions-ialq-no-1/</link>
		<comments>http://shlomme.diotavelli.net/2011/12/11/infrequently-asked-linguistics-questions-ialq-no-1/#comments</comments>
		<pubDate>Sun, 11 Dec 2011 00:53:12 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[coli]]></category>
		<category><![CDATA[lang:en]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[treealigner]]></category>
		<category><![CDATA[uni]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/?p=257</guid>
		<description><![CDATA[Q: “How many constituent trees fit on a single A0 poster in style?” A: “886” Disclaimer Just to be clear, this is really old work. I just happened to find the old draft in my WordPress instance and regenerated the poster. Genesis Back in 2009, I had entertained the idea of pretty-printing parts of the [...]]]></description>
			<content:encoded><![CDATA[<p>Q: “How many constituent trees fit on a single A0 poster in style?”<br />
A: “886”<br />
<span id="more-257"></span></p>
<div style="text-align: center">
<a href="http://diotavelli.net/files/smultron_poster.pdf"><br />
<img src="http://diotavelli.net/files/img/smultron_poster.png" alt="SMULTRON" /><br />
</a>
</div>
<h3>Disclaimer</h3>
<p>Just to be clear, this is really old work. I just happened to find the old draft in my WordPress instance and regenerated the poster.</p>
<h3>Genesis</h3>
<p>By 2009, I had been entertaining the idea of pretty-printing parts of the <a href="http://www.cl.uzh.ch/research/paralleltreebanks/smultron_en.html">SMULTRON treebank</a> onto a large poster for quite some time. While there had always been PDF export in the TreeAligner (well-demonstrated in <a href="http://www.zora.uzh.ch/8816/">Marek, T; Lundborg, J; Volk, M (2008)</a>), the code base had technical restrictions that made rendering large numbers of graphs infeasible. With the new Qt backend that was eventually used for the work in <a href="http://www.zora.uzh.ch/24418/">Marek, T; Schneider, G; Volk, M (2009)</a>, rendering large scenes became feasible; and after wasting a lot of paper, the first version of this poster was presented during the Research Day of the CompSci department of the University of Zürich in September 2009 (the original might still exist, but it is not in my possession), to a largely confused, but well-meaning audience.</p>
<h3>Mass Production</h3>
<p>The original had neither the nice, dynamically spaced trees nor colors, logos, legends or URLs. After more iterations, we ended up with the final layout as shown in the PDF, of which ten A0 posters were printed. One poster was given to the Linguistics Department of the University of Konstanz, one to the Department of Corpus Linguistics and Morphology at the Humboldt University in Berlin, one to the Theoretical Computational Linguistics Group in Tübingen, and one poster was presented<sup><a href="http://shlomme.diotavelli.net/2011/12/11/infrequently-asked-linguistics-questions-ialq-no-1/#footnote_0_257" id="identifier_0_257" class="footnote-link footnote-identifier-link" title="I still feel bad about this since I cheated my way into the poster session, not having handed it in before, but presenting it anyway at the insistence of the organizers.">1</a></sup> at TLT8 at the Catholic University of Milano and left there as a gift. Two others were sent to SMULTRON collaborators in Sweden. When I left UZH in May 2010, the three leftover posters were still in my old office; I do not know what has become of them.</p>
<h3>Tech</h3>
<p>The underlying code doing the rendering is still available as part of <a href="http://kitt.cl.uzh.ch/kitt/hg/treequest/master/">TreeQuest</a>, which itself never saw a public release and may warrant a post of its own at some point. The rendering is done using Qt4, whose graphics facilities I have in fonder memory than the Cairo API. The actual code for creating the layout has not been publicly released; any requests for it should go to the <a href="http://www.cl.uzh.ch/">Department of Computational Linguistics at UZH</a>. The number of graphs being rendered, the sizes of graphical elements and the line lengths have all been hand-tweaked extensively. There is also an A3 variant available, which has been used as an advertisement for <a href="http://www.mlta.uzh.ch/aboutus.html">MLTA</a> as far as I know.</p>
<p>The treebank data used is version 2.0.1 of <a href="http://www.cl.uzh.ch/research/paralleltreebanks/smultron_en.html">SMULTRON</a>. I do not know of any significant changes in version 3.0 that would make it unusable for the script.</p>
<ol class="footnotes"><li id="footnote_0_257" class="footnote">I still feel bad about this since I cheated my way into the poster session, not having handed it in before, but presenting it anyway at the insistence of the organizers.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2011/12/11/infrequently-asked-linguistics-questions-ialq-no-1/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Oh my god—it&#8217;s full of cores!</title>
		<link>http://shlomme.diotavelli.net/2009/05/24/gpu-computing/</link>
		<comments>http://shlomme.diotavelli.net/2009/05/24/gpu-computing/#comments</comments>
		<pubDate>Sun, 24 May 2009 16:17:48 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[hardware]]></category>
		<category><![CDATA[lang:en]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/?p=204</guid>
		<description><![CDATA[Beginnings After the bleak, joyless work of releasing software, plugging holes of preventing users from clicking buttons they should not have clicked in the first place, writing unctuous documentation and release notes and wrapping code into neat little installers with tiny bows and bells that gently tingle when touched, there are few things as profoundly [...]]]></description>
			<content:encoded><![CDATA[<h5>Beginnings</h5>
<p>After the bleak, joyless work of releasing software, plugging holes, preventing users from clicking buttons they should not have clicked in the first place, writing unctuous documentation and release notes and wrapping code into neat little installers with tiny bows and bells that gently tingle when touched, there are few things as profoundly satisfying as taking code that looked like line noise in the first place, applying some non-trivial transformation to it and still having it look like line noise, only now it&#8217;s twice as fast, or half as long or of some other inaccessible quality that can, on some technical level, be referred to as &#8220;cool&#8221;.</p>
<p>Quite some time ago, I decided that this time, &#8220;cool&#8221; will mean &#8220;runs on graphics hardware, possibly faster&#8221;. To this end, I took my trusty T61 to work, downloaded the latest <a href="http://www.nvidia.com/object/cuda_home.html">CUDA toolkit</a>, found out that nearly all examples crashed when compiled, installed the latest beta drivers from NVIDIA, noticed that everything worked now and started playing around with the examples.</p>
<p>After some time marveling at ball pit simulations and similarly telling examples, I printed out the <a href="http://developer.download.nvidia.com/compute/cuda/2_2/toolkit/docs/NVIDIA_CUDA_Programming_Guide_2.2.pdf">CUDA programming guide</a> and spent a considerable amount of time marveling at how well the NVIDIA green looked when printed with our institute&#8217;s color laser printer, before reading its more important parts.<br />
<span id="more-204"></span></p>
<h5>Technical Details</h5>
<p>Roughly speaking, modern graphics hardware from NVIDIA consists of an array of multiprocessors. Each multiprocessor has eight cores, some shared memory, a controller unit and units for transcendental functions. Kernels (methods that run on the device, as opposed to host code) are automatically distributed to the available multiprocessors (their number depending on the hardware, with as many as 120 on the Tesla cards).</p>
<p>Kernel invocations are organized in 3-dimensional thread blocks and 2-dimensional grids. A single thread block can have at most 512 threads (i.e. x × y × z ⩽ 512) and is always executed by a single multiprocessor. Thread blocks are organized in a grid (with at most 65,535 blocks in each dimension); the blocks of a grid are distributed across the available multiprocessors, which makes the model inherently scalable.</p>
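<p>To make these limits concrete, here is a small host-side sketch in plain Python (not CUDA) that checks a launch configuration against them. The function name <tt>check_launch</tt> and the constants are made up for illustration; the per-dimension grid limit of 65,535 is the value given in the CUDA 2.2 programming guide for this hardware generation.</p>

```python
from functools import reduce
from operator import mul

# Limits for the hardware generation described in this post (assumed constants).
MAX_THREADS_PER_BLOCK = 512
MAX_GRID_DIM = 65535


def check_launch(grid, block):
    """Return the total thread count of a launch, or raise ValueError.

    grid  -- tuple of up to two grid dimensions (in blocks)
    block -- tuple of up to three block dimensions (in threads)
    """
    if len(grid) > 2 or len(block) > 3:
        raise ValueError("grids are 2-dimensional, blocks 3-dimensional")
    threads_per_block = reduce(mul, block, 1)
    if threads_per_block > MAX_THREADS_PER_BLOCK:
        raise ValueError("too many threads in one block")
    if any(dim > MAX_GRID_DIM for dim in grid):
        raise ValueError("grid dimension too large")
    return reduce(mul, grid, 1) * threads_per_block
```

<p>For example, <tt>check_launch((3, 5625), (1, 16))</tt> describes three rows of 5,625 blocks with 16 threads each, i.e. one thread for each of 90,000 points and three Gaussians.</p>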
<p>A multiprocessor divides the threads of a block into <em>warps</em> and all threads of a single warp are executed in parallel. Optimal performance is reached when the control flow in threads of the same warp does not diverge based on the input data, since the cores are optimized to execute the same code in several threads on different data (hence the name of this architecture: SIMT, for single instruction, multiple thread).</p>
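<p>The effect of divergence can be illustrated with a toy model in plain Python. It assumes a warp size of 32 threads and that consecutive thread indices end up in the same warp; the helper names are invented for this sketch, and it only counts warps whose threads disagree on a branch, it does not model actual timing.</p>

```python
WARP_SIZE = 32  # warp size on NVIDIA hardware (assumed constant for this sketch)


def split_into_warps(num_threads, warp_size=WARP_SIZE):
    """Group linear thread indices 0..num_threads-1 into consecutive warps."""
    return [range(start, min(start + warp_size, num_threads))
            for start in range(0, num_threads, warp_size)]


def count_divergent_warps(branch_taken, warp_size=WARP_SIZE):
    """Count warps in which not all threads take the same side of a branch.

    branch_taken -- list of booleans, one per thread of the block
    """
    divergent = 0
    for warp in split_into_warps(len(branch_taken), warp_size):
        outcomes = {branch_taken[t] for t in warp}
        if len(outcomes) > 1:  # both sides of the branch occur in this warp
            divergent += 1
    return divergent
```

<p>A branch on <tt>thread % 2</tt> makes every warp diverge, while a branch on <tt>thread // 32</tt> keeps all threads of a warp on the same path.</p>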
<p>There is also a memory hierarchy, from registers to on-die shared memory to global device memory. Data has to be explicitly copied from the system RAM to device memory, which is costly and should be minimized.</p>
<p>In other words, if you have a computation that has to be executed many, many times on different input data, move it to the GPU instead and use the freed CPU cycles for shuffling around the data. In other words, ZOOOOOOM.</p>
<h5>Some Experiments</h5>
<p>Back in the day, we had to write code for training <a href="http://shlomme.diotavelli.net/2008/03/13/gmm-code/">Gaussian Mixture Models</a>. At the core of the algorithm, an auxiliary vector containing each point in the training set has to be created for each Gaussian. The creation of the auxiliary vector includes exponentiation, which proved to be quite costly, since it accounted for 50% of the running time of the program<sup><a href="http://shlomme.diotavelli.net/2009/05/24/gpu-computing/#footnote_0_204" id="identifier_0_204" class="footnote-link footnote-identifier-link" title="There might be a smart way of getting rid of the exp() call, though I haven&amp;#8217;t found it yet">1</a></sup>. </p>
<p>Instead of doing comparatively boring optimization work for running that algorithm on a single processor, and even more boring work to have it run on two processors, I partially ported it to have the auxiliary vectors be computed on the GPU, with the result that moving only this one part of the computation leads to a three-fold speed increase for the whole program. </p>
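<p>As a point of reference, the auxiliary vectors can be written down in a few lines of plain Python. This is only a CPU sketch that mirrors the arithmetic of the <tt>computeAuxVectors</tt> kernel shown further down, not the original C++ code; the <tt>Gaussian</tt> tuple is a stand-in for <tt>KernelGaussian</tt>, and <tt>factor</tt> is taken as given (presumably the precomputed weight and normalization term).</p>

```python
import math
from collections import namedtuple

# Stand-in for the kernel's KernelGaussian; "factor" is assumed to be precomputed.
Gaussian = namedtuple("Gaussian", ["mean", "stddev", "factor"])


def aux_vectors(points, gaussians):
    """Return one auxiliary vector per Gaussian, with one value per point."""
    result = []
    for g in gaussians:
        row = []
        for x in points:
            d = (x - g.mean) / g.stddev
            # This exp() call is the expensive part mentioned above.
            row.append(g.factor * math.exp(-0.5 * d * d))
        result.append(row)
    return result
```

<p>Every entry depends only on one point and one Gaussian, which is what makes the computation a good fit for running many independent threads.</p>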
<p>On my machine (2 GiB RAM, 2.2 GHz Core 2 Duo, NVS 140M with 2 multiprocessors), running the C++-only version<sup><a href="http://shlomme.diotavelli.net/2009/05/24/gpu-computing/#footnote_1_204" id="identifier_1_204" class="footnote-link footnote-identifier-link" title="compiled using g++ 4.4.0 -O3">2</a></sup> takes 33s for 90,000 points and three Gaussians. Running the computation with the same data set on the GPU takes 11s. </p>
<h5>How is it done</h5>
<p>In order to invoke a kernel running on the GPU, the code has to be compiled with <tt>nvcc</tt>, NVIDIA&#8217;s compiler for CUDA. It separates device code, which is compiled to some binary code, from host code, which is transformed and handed over to the platform compiler. Therefore, it is necessary to write some intermediate methods which can be called from &#8220;normal&#8221; C++ and that take care of shuffling data to the device and invoking the kernel methods<sup><a href="http://shlomme.diotavelli.net/2009/05/24/gpu-computing/#footnote_2_204" id="identifier_2_204" class="footnote-link footnote-identifier-link" title="looking at the SDK examples was quite helpful here">3</a></sup>:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">typedef</span> <span style="color: #993333;">float</span> PROB<span style="color: #339933;">;</span>
&nbsp;
<span style="color: #339933;">#define BLOCK_SIZE 16</span>
&nbsp;
Matrix d_in<span style="color: #339933;">;</span>
Matrix d_out<span style="color: #339933;">;</span>
&nbsp;
__device__ __constant__ KernelGaussian d_params<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">20</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">extern</span> <span style="color: #ff0000;">&quot;C&quot;</span>
<span style="color: #993333;">void</span> LoadPoints<span style="color: #009900;">&#40;</span>Matrix in<span style="color: #339933;">,</span> size_t params<span style="color: #009900;">&#41;</span> 
<span style="color: #009900;">&#123;</span>
    d_in.<span style="color: #202020;">height</span> <span style="color: #339933;">=</span> in.<span style="color: #202020;">height</span><span style="color: #339933;">;</span>
    d_in.<span style="color: #202020;">width</span> <span style="color: #339933;">=</span> in.<span style="color: #202020;">width</span><span style="color: #339933;">;</span>    
    size_t size_in <span style="color: #339933;">=</span> in.<span style="color: #202020;">width</span> <span style="color: #339933;">*</span> in.<span style="color: #202020;">height</span> <span style="color: #339933;">*</span> <span style="color: #993333;">sizeof</span><span style="color: #009900;">&#40;</span>PROB<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    cutilSafeCall<span style="color: #009900;">&#40;</span>cudaMalloc<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #993333;">void</span><span style="color: #339933;">**</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span>d_in.<span style="color: #202020;">elements</span><span style="color: #339933;">,</span> size_in<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    cutilSafeCall<span style="color: #009900;">&#40;</span>cudaMemcpy<span style="color: #009900;">&#40;</span>d_in.<span style="color: #202020;">elements</span><span style="color: #339933;">,</span> in.<span style="color: #202020;">elements</span><span style="color: #339933;">,</span> size_in<span style="color: #339933;">,</span> cudaMemcpyHostToDevice<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    d_out.<span style="color: #202020;">width</span> <span style="color: #339933;">=</span> in.<span style="color: #202020;">width</span><span style="color: #339933;">;</span>
    d_out.<span style="color: #202020;">height</span> <span style="color: #339933;">=</span> params<span style="color: #339933;">;</span>
&nbsp;
    size_t size_out <span style="color: #339933;">=</span> d_out.<span style="color: #202020;">width</span> <span style="color: #339933;">*</span> d_out.<span style="color: #202020;">height</span> <span style="color: #339933;">*</span> <span style="color: #993333;">sizeof</span><span style="color: #009900;">&#40;</span>PROB<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    cutilSafeCall<span style="color: #009900;">&#40;</span>cudaMalloc<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #993333;">void</span><span style="color: #339933;">**</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span>d_out.<span style="color: #202020;">elements</span><span style="color: #339933;">,</span> size_out<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
__global__ <span style="color: #993333;">void</span> computeAuxVectors<span style="color: #009900;">&#40;</span>Matrix in<span style="color: #339933;">,</span> Matrix out<span style="color: #009900;">&#41;</span> 
<span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">int</span> p <span style="color: #339933;">=</span> blockIdx.<span style="color: #202020;">y</span> <span style="color: #339933;">*</span> BLOCK_SIZE <span style="color: #339933;">+</span> threadIdx.<span style="color: #202020;">y</span><span style="color: #339933;">;</span>
    KernelGaussian g <span style="color: #339933;">=</span> d_params<span style="color: #009900;">&#91;</span>blockIdx.<span style="color: #202020;">x</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    PROB d <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>in.<span style="color: #202020;">elements</span><span style="color: #009900;">&#91;</span>p<span style="color: #009900;">&#93;</span> <span style="color: #339933;">-</span> g.<span style="color: #202020;">mean</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">/</span> g.<span style="color: #202020;">stddev</span><span style="color: #339933;">;</span>
    out.<span style="color: #202020;">elements</span><span style="color: #009900;">&#91;</span>blockIdx.<span style="color: #202020;">x</span> <span style="color: #339933;">*</span> out.<span style="color: #202020;">width</span> <span style="color: #339933;">+</span> p<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> g.<span style="color: #202020;">factor</span> <span style="color: #339933;">*</span> expf<span style="color: #009900;">&#40;</span><span style="color: #339933;">-</span><span style="color:#800080;">0.5</span><span style="color: #339933;">*</span>d<span style="color: #339933;">*</span>d<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #000000; font-weight: bold;">extern</span> <span style="color: #ff0000;">&quot;C&quot;</span>
<span style="color: #993333;">void</span> RunComputeAuxVectors<span style="color: #009900;">&#40;</span>
        KernelGaussian<span style="color: #339933;">*</span> g<span style="color: #339933;">,</span> 
        std<span style="color: #339933;">::</span><span style="color: #202020;">vector</span><span style="color: #339933;">&lt;</span>std<span style="color: #339933;">::</span><span style="color: #202020;">vector</span><span style="color: #339933;">&lt;</span>PROB<span style="color: #339933;">&gt;</span> <span style="color: #339933;">&gt;&amp;</span> out<span style="color: #339933;">,</span> 
        size_t points<span style="color: #339933;">,</span> size_t params<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    cutilSafeCall<span style="color: #009900;">&#40;</span>
        cudaMemcpyToSymbol<span style="color: #009900;">&#40;</span>
          d_params<span style="color: #339933;">,</span> g<span style="color: #339933;">,</span> 
          params<span style="color: #339933;">*</span><span style="color: #993333;">sizeof</span><span style="color: #009900;">&#40;</span>KernelGaussian<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> 
          <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> cudaMemcpyHostToDevice<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    dim3 dimGrid<span style="color: #009900;">&#40;</span>params<span style="color: #339933;">,</span> points <span style="color: #339933;">/</span> BLOCK_SIZE<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    dim3 dimBlock<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> BLOCK_SIZE<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    computeAuxVectors<span style="color: #339933;">&lt;&lt;&lt;</span>dimGrid<span style="color: #339933;">,</span> dimBlock<span style="color: #339933;">&gt;&gt;&gt;</span><span style="color: #009900;">&#40;</span>d_in<span style="color: #339933;">,</span> d_out<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    cutilCheckMsg<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;Kernel execution failed&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span>size_t i <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> params<span style="color: #339933;">;</span> <span style="color: #339933;">++</span>i<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        cutilSafeCall<span style="color: #009900;">&#40;</span>
            cudaMemcpy<span style="color: #009900;">&#40;</span>
              <span style="color: #339933;">&amp;</span>out<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #339933;">&amp;</span>d_out.<span style="color: #202020;">elements</span><span style="color: #009900;">&#91;</span>i <span style="color: #339933;">*</span> points<span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span> 
              <span style="color: #993333;">sizeof</span><span style="color: #009900;">&#40;</span>PROB<span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> points<span style="color: #339933;">,</span> 
              cudaMemcpyDeviceToHost<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>The matrix <tt>d_in</tt> holds the points, which are constant over the whole computation and therefore copied over to the device exactly once. <tt>d_out</tt> is the matrix that holds the auxiliary values computed for each point and each Gaussian. For faster retrieval, the Gaussians are stored in constant device memory (<tt>d_params</tt>), which is set to contain 20 elements at most, i.e. the algorithm is limited to learning mixture models with at most 20 Gaussians. </p>
<p>In the method <tt>LoadPoints</tt>, device memory is allocated for the points and the output vectors and the points are copied from host to device memory using <tt>cudaMemcpy</tt>. </p>
<p>The actual invocation of the kernel is done in the host method <tt>RunComputeAuxVectors</tt>, which first copies the parameters into the constant storage and invokes the kernel <tt>computeAuxVectors</tt> using the new <tt>&lt;&lt;&lt;grid, block&gt;&gt;&gt;</tt> syntax. <tt>grid</tt> holds the dimensions of the grid, which is <em>number of gaussians</em> × <em>points / BLOCK_SIZE</em>. The block size is the number of threads in a thread block and somewhat arbitrarily chosen to be 16<sup><a href="http://shlomme.diotavelli.net/2009/05/24/gpu-computing/#footnote_3_204" id="identifier_3_204" class="footnote-link footnote-identifier-link" title="I didn&amp;#8217;t try out other block sizes">4</a></sup>. Currently, this also restricts the number of points to a multiple of 16<sup><a href="http://shlomme.diotavelli.net/2009/05/24/gpu-computing/#footnote_4_204" id="identifier_4_204" class="footnote-link footnote-identifier-link" title="this is a restriction of the implementation, one could simply pad the points with however many zeros as needed and compute a little more">5</a></sup>. After the kernel has been launched, the results are copied back into host memory (the <tt>cudaMemcpy</tt> calls wait for the kernel to finish).</p>
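<p>The mapping from block and thread indices to points and output slots can be emulated on the CPU. The following plain Python sketch walks over the same grid as the launch above and also shows the zero-padding idea from the footnote; <tt>pad_points</tt> and <tt>launch_indices</tt> are hypothetical helpers that are not part of the original code, and the sketch assumes that <tt>out.width</tt> equals the number of points.</p>

```python
BLOCK_SIZE = 16


def pad_points(points, block_size=BLOCK_SIZE, fill=0.0):
    """Pad the point list with zeros up to the next multiple of block_size."""
    remainder = len(points) % block_size
    if remainder == 0:
        return list(points)
    return list(points) + [fill] * (block_size - remainder)


def launch_indices(num_gaussians, num_points, block_size=BLOCK_SIZE):
    """Yield (gaussian, point, flat output index) for every emulated thread."""
    if num_points % block_size != 0:
        raise ValueError("number of points must be a multiple of the block size")
    for block_x in range(num_gaussians):                 # blockIdx.x
        for block_y in range(num_points // block_size):  # blockIdx.y
            for thread_y in range(block_size):           # threadIdx.y
                p = block_y * block_size + thread_y
                yield block_x, p, block_x * num_points + p
```

<p>With 90 points padded to 96 and three Gaussians, this visits 288 index triples, and every slot of the output matrix is written exactly once.</p>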
<p>Important:</p>
<ul>
<li>My card only supports single-precision floats; newer ones also support double precision</li>
<li><tt>blockIdx</tt> and <tt>threadIdx</tt> are built-in variables provided by CUDA</li>
<li><tt>__global__</tt> methods are run on the device and can only be called from host code</li>
<li>Currently, the device memory for <tt>d_in</tt> and <tt>d_out</tt> is never freed; the program simply exits</li>
</ul>
<h5>Results</h5>
<p>Working with CUDA is definitely fun (and rewarding), but the system itself is quite unforgiving. Wrapping each invocation of a CUDA method into <tt>cutilSafeCall</tt> is therefore strongly recommended; in case of an error it will print a (not always helpful) error message and exit. Though in this case the code running on the GPU itself is quite trivial, it can be hard to figure out what is going wrong, because the GPU itself is a black box, and it&#8217;s not possible to simply print out a message<sup><a href="http://shlomme.diotavelli.net/2009/05/24/gpu-computing/#footnote_5_204" id="identifier_5_204" class="footnote-link footnote-identifier-link" title="or maybe it is, and I don&amp;#8217;t know how to do it&amp;#8230;">6</a></sup>. It&#8217;s possible to emulate the code on the CPU, and there are other tools available which I haven&#8217;t tried out. </p>
<p>I&#8217;ve also started toying around with my original idea, having the constraint checks for the TIGER query evaluation run on the graphics hardware, and initial results are mixed. Still, if nothing else, this gives me justification to spend a lot of money on expensive graphics hardware for my next computer. It&#8217;s all for science, this harsh and demanding mistress.</p>
<ol class="footnotes"><li id="footnote_0_204" class="footnote">There might be a smart way of getting rid of the <tt>exp()</tt> call, though I haven&#8217;t found it yet</li><li id="footnote_1_204" class="footnote">compiled using g++ 4.4.0 -O3</li><li id="footnote_2_204" class="footnote">looking at the SDK examples was quite helpful here</li><li id="footnote_3_204" class="footnote">I didn&#8217;t try out other block sizes</li><li id="footnote_4_204" class="footnote">this is a restriction of the implementation, one could simply pad the points with however many zeros as needed and compute a little more</li><li id="footnote_5_204" class="footnote">or maybe it is, and I don&#8217;t know how to do it&#8230;</li></ol>]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2009/05/24/gpu-computing/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Scrollable Widgets with PyGTK</title>
		<link>http://shlomme.diotavelli.net/2009/05/17/scrollable-widgets-with-pygtk/</link>
		<comments>http://shlomme.diotavelli.net/2009/05/17/scrollable-widgets-with-pygtk/#comments</comments>
		<pubDate>Sat, 16 May 2009 23:49:17 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[lang:en]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/?p=178</guid>
		<description><![CDATA[It is possible to write custom GTK widgets that have &#8220;native&#8221; scrolling support, as opposed to just shoving them into a GtkViewport and forgetting about them. Apart from having mastered a small coding challenge, as it turned out to be, this also gives you greater control over the scrolling itself, like making sure that certain [...]]]></description>
			<content:encoded><![CDATA[<p>It is possible to write custom GTK widgets that have &#8220;native&#8221; scrolling support, as opposed to just shoving them into a <a href="http://library.gnome.org/devel/gtk/unstable/GtkViewport.html">GtkViewport</a> and forgetting about them. </p>
<p>Apart from having mastered a small coding challenge, as it turned out to be, this also gives you greater control over the scrolling itself, like making sure that certain elements are visible, viewport panning etc.</p>
<p>Anyway, especially when using PyGTK, it&#8217;s a bit unclear how to proceed. From the documentation, it somehow becomes clear that it has to do with the signal <code>set_scroll_adjustments_signal</code>:</p>
<blockquote><p>
This signal is emitted when a widget of this class is added to a scrolling aware parent, gtk_widget_set_scroll_adjustments() handles the emission. Implementation of this signal is optional.
</p></blockquote>
<p>This is not a signal name, but a signal ID<sup><a href="http://shlomme.diotavelli.net/2009/05/17/scrollable-widgets-with-pygtk/#footnote_0_178" id="identifier_0_178" class="footnote-link footnote-identifier-link" title="which you usually don&amp;#8217;t see when coding with PyGTK">1</a></sup> that has to be set in <a href="http://developer.gnome.org/doc/GGAD/z144.html">GtkWidgetClass</a><sup><a href="http://shlomme.diotavelli.net/2009/05/17/scrollable-widgets-with-pygtk/#footnote_1_178" id="identifier_1_178" class="footnote-link footnote-identifier-link" title="ditto">2</a></sup>.</p>
<p>Some more documentation reading reveals that you can set this signal by using the <code>set_set_scroll_adjustments_signal</code> method on a widget:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> ScrollableWidget<span style="color: black;">&#40;</span>gtk.<span style="color: black;">DrawingArea</span><span style="color: black;">&#41;</span>:
    __gsignals__ = <span style="color: black;">&#123;</span>
        <span style="color: #483d8b;">&quot;set-scroll-adjustments&quot;</span>: <span style="color: black;">&#40;</span>
            gobject.<span style="color: black;">SIGNAL_RUN_LAST</span>,
            gobject.<span style="color: black;">TYPE_NONE</span>, <span style="color: black;">&#40;</span>gtk.<span style="color: black;">Adjustment</span>, gtk.<span style="color: black;">Adjustment</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>,
    <span style="color: black;">&#125;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        gtk.<span style="color: black;">DrawingArea</span>.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">set_set_scroll_adjustments_signal</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;set-scroll-adjustments&quot;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>It doesn&#8217;t really matter <em>what</em> you call the signal as long as it takes two arguments (the horizontal and vertical adjustment). This will make the method <a href="http://library.gnome.org/devel/pygtk/stable/class-gtkwidget.html#method-gtkwidget--set-scroll-adjustments"><code>set_scroll_adjustments</code></a> (which you can&#8217;t override from within Python) return <em>True</em> when it is called, thereby signaling that the widget supports scrolling.</p>
<p>This, however, is only half of the story, because the scrollable widget still needs the adjustments handed in via said method. It&#8217;s of course possible to connect to the signal explicitly, but there&#8217;s an even more direct way by using action signals. </p>
<p>Action signals are the C programmer&#8217;s idea of &#8220;generic methods&#8221;. Such a signal has to carry the flag <code>gobject.SIGNAL_ACTION</code>; it is directly connected to a function which is then called on each signal emission. While in C, you have to provide a function pointer, in Python you can just implement a function with a compounded magic name<sup><a href="http://shlomme.diotavelli.net/2009/05/17/scrollable-widgets-with-pygtk/#footnote_2_178" id="identifier_2_178" class="footnote-link footnote-identifier-link" title="a technique which I thoroughly dislike and should be converted to be used with decorators">3</a></sup> and have it called automatically. I haven&#8217;t found any documentation on that in the PyGObject or PyGTK docs, only some examples on the web:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> ScrollableWidget<span style="color: black;">&#40;</span>gtk.<span style="color: black;">DrawingArea</span><span style="color: black;">&#41;</span>:
    __gsignals__ = <span style="color: black;">&#123;</span>
        <span style="color: #483d8b;">&quot;set-scroll-adjustments&quot;</span>: <span style="color: black;">&#40;</span>
            gobject.<span style="color: black;">SIGNAL_RUN_LAST</span> | gobject.<span style="color: black;">SIGNAL_ACTION</span>, 
            gobject.<span style="color: black;">TYPE_NONE</span>, <span style="color: black;">&#40;</span>gtk.<span style="color: black;">Adjustment</span>, gtk.<span style="color: black;">Adjustment</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>,
    <span style="color: black;">&#125;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        gtk.<span style="color: black;">DrawingArea</span>.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">set_set_scroll_adjustments_signal</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;set-scroll-adjustments&quot;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> do_set_scroll_adjustments<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, h_adjustment, v_adjustment<span style="color: black;">&#41;</span>:
        <span style="color: #808080; font-style: italic;"># do some useful stuff here, like saving them</span>
        ...</pre></div></div>

<p>The name of the method called on emission has to start with <code>do_</code>, followed by the signal name with hyphens replaced by underscores.</p>
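<p>The naming rule can be written down in a couple of lines. This is only an illustration of the convention described above; <code>handler_name</code> is a made-up helper and not part of PyGTK or PyGObject:</p>

```python
def handler_name(signal_name):
    """Return the method name that is looked up for a signal name.

    Hypothetical helper: it only mirrors the naming convention,
    it does not touch GObject at all.
    """
    return "do_" + signal_name.replace("-", "_")


print(handler_name("set-scroll-adjustments"))  # do_set_scroll_adjustments
```

<p>So a signal called <code>value-changed</code> would be served by a method called <code>do_value_changed</code>.</p>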
<p>The adjustment objects can then be configured to one&#8217;s own liking to have scroll bars show up or not. However, to know when the user did some scrolling, it&#8217;s necessary to listen on some signals:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">    <span style="color: #ff7700;font-weight:bold;">def</span> do_set_scroll_adjustments<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, h_adjustment, v_adjustment<span style="color: black;">&#41;</span>:
        h_adjustment.<span style="color: black;">connect</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;value-changed&quot;</span>, <span style="color: #008000;">self</span>._scroll_value_changed<span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>._hadj = h_adjustment
        v_adjustment.<span style="color: black;">connect</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;value-changed&quot;</span>, <span style="color: #008000;">self</span>._scroll_value_changed<span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>._vadj = v_adjustment</pre></div></div>

<p>To make the scroll bar show up, modify <code>upper</code>, <code>lower</code> and <code>page_size</code> on the adjustments.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #008000;">self</span>._hadj.<span style="color: black;">lower</span> = <span style="color: #ff4500;">0</span>
<span style="color: #008000;">self</span>._hadj.<span style="color: black;">upper</span> = <span style="color: #ff4500;">50</span>
<span style="color: #008000;">self</span>._hadj.<span style="color: black;">page_size</span> = <span style="color: #ff4500;">10</span></pre></div></div>

<p>This tells the scrollbar that the size of the underlying picture (<code>upper - lower</code>) is 50, while the visible size (<code>page_size</code>) is 10. </p>
<p>The page size obviously depends on the current size of the widget, which can be retrieved from the underlying <code>gtk.gdk.Window</code>:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">width, height = <span style="color: #008000;">self</span>.<span style="color: black;">window</span>.<span style="color: black;">get_size</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>The current position of the scroll bar is controlled by the property <code>value</code> of the adjustment object and should be in the range <code>[lower .. upper - page_size]</code>. Whenever the property is changed, the <code>value-changed</code> signal is emitted, which we&#8217;ve connected to previously, and the widget can be repainted.</p>
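<p>Keeping <code>value</code> inside that range is plain arithmetic. The following sketch works on bare numbers rather than on a real <code>gtk.Adjustment</code>; the function name <code>clamp_scroll_value</code> is my own invention:</p>

```python
def clamp_scroll_value(value, lower, upper, page_size):
    """Clamp a scroll position into [lower, upper - page_size].

    Hypothetical helper operating on plain numbers, not on an
    adjustment object.
    """
    # If the page is at least as large as the content, the only valid
    # position is the lower bound.
    highest = max(lower, upper - page_size)
    return min(max(value, lower), highest)


print(clamp_scroll_value(45, 0, 50, 10))  # 40
print(clamp_scroll_value(-3, 0, 50, 10))  # 0
```

<p>With the numbers from above (<code>upper = 50</code>, <code>page_size = 10</code>), the largest sensible value is 40.</p>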
<p>If you&#8217;re curious, you can also see the whole gloriousness in <a href="http://www.cl.uzh.ch/kitt/hg/sta/torsten/file/dc2c113ef300/STA/app/ui/gtktreeview.py#l199">actual working code</a>.</p>
<ol class="footnotes"><li id="footnote_0_178" class="footnote">which you usually don&#8217;t see when coding with PyGTK</li><li id="footnote_1_178" class="footnote">ditto</li><li id="footnote_2_178" class="footnote">a technique which I thoroughly dislike and should be converted to be used with decorators</li></ol>]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2009/05/17/scrollable-widgets-with-pygtk/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Hardware</title>
		<link>http://shlomme.diotavelli.net/2009/05/08/new-hardware/</link>
		<comments>http://shlomme.diotavelli.net/2009/05/08/new-hardware/#comments</comments>
		<pubDate>Fri, 08 May 2009 00:07:48 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[hardware]]></category>
		<category><![CDATA[lang:en]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/?p=176</guid>
		<description><![CDATA[Even before I started working at UZH, I got my new hardware. Back in February, when I visited Zürich to look for rooms, I was offered to choose a new notebook for myself. Since I already have a large notebook (at least that&#8217;s what I consider my 14.1&#8243; T61) and toyed around with the idea [...]]]></description>
			<content:encoded><![CDATA[<p>Even before I started working at UZH, I got my new hardware. Back in February, when I visited Zürich to look for rooms, I was offered my choice of a new notebook. Since I already have a large notebook (at least that&#8217;s what I consider my 14.1&#8243; T61) and toyed around with the idea of getting a desktop again (after 4 years of exclusive notebook use!), I chose an <a href="http://www.thinkwiki.org/wiki/Category:X301">X301</a>. I added some hardware upgrades that weren&#8217;t included in the basic offer:</p>
<ul>
<li>+2 GB RAM (4 GB overall)</li>
<li>3G card</li>
<li>USB Port Replicator</li>
<li>DisplayPort->DVI converter</li>
</ul>
<p>Being in Switzerland, I had the choice between the CH and US keyboard layouts. Since CH is physically the same as the German layout (105 keys), I took it and haven&#8217;t really noticed that most of the special-character keys have symbols printed on them that differ from what appears on the screen when I hit them. I should really get <a href="http://www.daskeyboard.com/">Das Keyboard</a> after all.</p>
<p>The X301 is not quite the workhorse the T61 is, which is especially noticeable in graphics speed, although that might be due to the Intel driver going through several transitions right now. The SSD compensates for that: booting and starting up is a breeze. Fortunately, I don&#8217;t have to do much booting these days, because unlike other major graphics hardware creators, Intel&#8217;s developers are able to support power-saving modes on Linux, both Suspend-to-RAM and Suspend-to-disk.</p>
<p>Everything else works more or less, even the DisplayPort, dutifully serving my 24&#8243; screen at the university. The pièce de résistance is definitely the 3G card. I had to install it myself, which was quite a bit more involved than I would have liked, but it worked on the second try. I got myself a prepaid data contract which allows me to surf the net for 3 CHF/h, which is a good complement to the free wireless access at most public hotspots in Switzerland, courtesy of <a href="http://www.switch.ch/">SWITCH</a>. The wireless became especially handy today, when I was shopping for a new router without having a clue about what&#8217;s good. Standing in the Media Markt (yes, this very pillar of the German retail market also metastasised into Switzerland) network hardware aisles browsing the web with a notebook probably didn&#8217;t look as superior as doing the same thing with a smartphone, but it helped me find a router.</p>
<p>What really sets the X301 apart is the weight. I&#8217;m quite used to it by now, since I&#8217;ve had it for over a month, but it&#8217;s usually the first response I get when I hand it over to somebody else. It is, however, very noticeable in direct comparison, and the T61 feels clunky and hugely oversized when I carry it around or use it on my lap. The battery life is okay; I usually get 3h under Linux. There&#8217;s probably some more running time to be gotten, but I&#8217;m more or less done tweaking the system for now and am using it for actual work; or for blogging, emailing and chatting on my nice, new, comfy chair.</p>
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2009/05/08/new-hardware/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stack Overflow Statistics</title>
		<link>http://shlomme.diotavelli.net/2009/02/08/stack-overflow-statistics/</link>
		<comments>http://shlomme.diotavelli.net/2009/02/08/stack-overflow-statistics/#comments</comments>
		<pubDate>Sun, 08 Feb 2009 14:38:23 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[lang:en]]></category>
		<category><![CDATA[other]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/?p=129</guid>
		<description><![CDATA[Going through a computational linguistics program will bring you in touch with Zipf&#8217;s Law. Its core claim: In a corpus, the frequency of any word is inversely proportional to its rank. Translated into less-wordy terms, it means that some words (events) occur very often and many words only occur a few times, or only once. [...]]]></description>
			<content:encoded><![CDATA[<p>Going through a computational linguistics program will bring you in touch with <a href="http://en.wikipedia.org/wiki/Zipf%27s_law">Zipf&#8217;s Law</a>. Its core claim:</p>
<blockquote><p>In a corpus, the frequency of any word is inversely proportional to its rank.</p></blockquote>
<p>Translated into less-wordy terms, it means that some words (events) occur very often and many words only occur a few times, or only once.</p>
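<p>To make the claim a bit more tangible, here is a minimal sketch that ranks words by frequency. Under Zipf&#8217;s Law, rank times frequency should stay roughly constant in a large corpus; the toy sentence below is far too small to show that and only demonstrates the bookkeeping. The function name <code>rank_frequency</code> is made up for this example:</p>

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by descending frequency; ties are broken alphabetically so the
    # output is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]


tokens = "the cat and the dog and the bird".split()
for rank, word, freq in rank_frequency(tokens):
    # The last column is rank * frequency, the quantity Zipf's Law
    # predicts to be roughly constant.
    print("%d %s %d %d" % (rank, word, freq, rank * freq))
```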
<p>Zipf&#8217;s Law also holds for similar structures like DNA, but the distribution can also be observed in the user reputation of <a href="http://stackoverflow.com">Stack Overflow</a>. The following three graphs contain the reputation (X) and the frequency of this particular reputation value (Y) on log-scaled axes. With increasing normalization, the plot gets more Zipf-like, with the typical long &#8220;tails&#8221; at the lower end.</p>
<p><img width="500" src="http://diotavelli.net/files/img/zipf-overflow.png" alt="Distribution of Reputation, no normalization" /></p>
<p><img width="500" src="http://diotavelli.net/files/img/zipf-overflow-10.png" alt="Distribution of Reputation, 10r normalization" /></p>
<p><img width="500" src="http://diotavelli.net/files/img/zipf-overflow-100.png" alt="Distribution of Reputation, 100r normalization" /></p>
<p>If we plot the mass distribution of reputation ordered by decreasing reputation on log-log axes, we get something that looks like the cumulative of an exponential distribution:</p>
<p><img width="500" src="http://diotavelli.net/files/img/rmd.png" alt="Reputation Mass Distribution" /></p>
<p>On 2009-02-05, the total amount of reputation on Stack Overflow was 8,491,989, and around 15% of the users make up 85% of the reputation (not completely <a href="http://en.wikipedia.org/wiki/Pareto_distribution">Pareto&#8217;s 80-20</a>), with the top user (of 41,082) owning 0.39% of the overall reputation.</p>
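<p>A figure like &#8220;15% of the users make up 85% of the reputation&#8221; can be computed along the following lines. This is a sketch, not the script I used back then; <code>top_share</code> and the sample values are made up:</p>

```python
def top_share(reputations, user_fraction):
    """Share of the total reputation held by the top `user_fraction` of users."""
    ordered = sorted(reputations, reverse=True)
    top_n = int(round(len(ordered) * user_fraction))
    total = sum(ordered)
    if not total:
        return 0.0
    return sum(ordered[:top_n]) / float(total)


# Invented reputation values, not scraped data.
sample = [1000, 500, 100, 50, 10, 10, 5, 1, 1, 1]
print(top_share(sample, 0.2))
```

<p>For the invented sample, the top two of ten users hold 1500 of 1678 points, i.e. a bit under 90%.</p>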
<p>For these graphs, I&#8217;ve scraped the user overview pages. Scraping every single user page would allow for more interesting (and more accurate, since inactive users could be removed) statistics, but I&#8217;d rather wait for a proper API.</p>
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2009/02/08/stack-overflow-statistics/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The big wheel of commits</title>
		<link>http://shlomme.diotavelli.net/2009/01/31/the-big-wheel-of-commits/</link>
		<comments>http://shlomme.diotavelli.net/2009/01/31/the-big-wheel-of-commits/#comments</comments>
		<pubDate>Sat, 31 Jan 2009 22:28:51 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[lang:en]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[treealigner]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/?p=99</guid>
		<description><![CDATA[Yesterday, I merged the frame semantics branch, which I have been working on for my MSc thesis, into my personal repository. Since such a grave step always demands introspection, I looked at all 513 commits I ever made to the TreeAligner repository and created a little statistic on commit times. The picture contains a 24h [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday, I merged the frame semantics branch, which I have been working on for my MSc thesis, into my <a href="http://hg.diotavelli.net/sta/shlomme">personal repository</a>. Since such a grave step always demands introspection, I looked at all 513 commits I ever made to the TreeAligner repository and created a little statistic on commit times.</p>
<p>The picture contains a 24h clock and shows the number of commits made in each hour of the day, scaled by the highest commit count (that being 2 in the afternoon).</p>
<div style="text-align: center">
<img src="http://diotavelli.net/files/img/commitwheel.png" alt="Commit statistics" /></div>
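<p>The numbers behind such a wheel boil down to a histogram over hours, divided by its largest bin. A minimal sketch, assuming the commit hours have already been extracted from the repository log (the extraction and the drawing are left out, and <code>scaled_hour_counts</code> is a made-up name):</p>

```python
from collections import Counter


def scaled_hour_counts(commit_hours):
    """Return 24 values in [0, 1]: commits per hour relative to the busiest hour."""
    counts = Counter(commit_hours)
    peak = max(counts.values()) if counts else 1
    return [counts.get(hour, 0) / float(peak) for hour in range(24)]


# Invented sample: hour of day (0-23) for eight commits.
hours = [14, 14, 14, 14, 15, 22, 22, 9]
wheel = scaled_hour_counts(hours)
print(wheel[14], wheel[22], wheel[5])  # 1.0 0.5 0.0
```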
<p>What does all this tell us, apart from the fact that I had a free hour today? Well, I never code at 5 or 6 in the morning. For the rest, it&#8217;s more worthwhile to split the statistics into two parts:</p>
<div style="text-align: center">
<img src="http://diotavelli.net/files/img/before_master.png" alt="Commit statistics, before 10/2008" /><br />
08/2007 – 10/2008, 234 commits
</div>
<div style="text-align: center">
<img src="http://diotavelli.net/files/img/during_master.png" alt="Commit statistics, starting 10/2008" /><br />
10/2008 – now, 279 commits
</div>
<p>Wow, I used to be cool. Hacking late in the evening. Now, it&#8217;s just a day job.</p>
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2009/01/31/the-big-wheel-of-commits/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>I have no life and I must program</title>
		<link>http://shlomme.diotavelli.net/2008/05/31/i-have-no-life-and-i-must-program/</link>
		<comments>http://shlomme.diotavelli.net/2008/05/31/i-have-no-life-and-i-must-program/#comments</comments>
		<pubDate>Sat, 31 May 2008 10:44:53 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[coli]]></category>
		<category><![CDATA[lang:en]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/?p=91</guid>
		<description><![CDATA[Some participants of this semester&#8217;s „Computational Linguistics” course (which is a code word for „10 different lecturers guide you through the wonderful world of algorithms and theoretical foundations of CoLi”) obviously lack a life and willfully extended their own homework assignment, writing small toolkits for finite state automata. Surprisingly, all those toolkits were written in [...]]]></description>
			<content:encoded><![CDATA[<p>Some participants of this semester&#8217;s „Computational Linguistics” course (which is a code word for „10 different lecturers guide you through the wonderful world of algorithms and theoretical foundations of CoLi”) obviously lack a life and willfully extended their own homework assignment, writing small toolkits for finite state automata.</p>
<p>Surprisingly, all those toolkits were written in Python and made our C++-affine lecturer wonder if he should perhaps look into this language some more, apart from suggesting that we „maybe […] should get a life.”</p>
<p>So what does my toolkit do?</p>
<ul>
<li>Determinization of NFSAs to DFSAs</li>
<li>Creation of DFSAs from simple regular expressions</li>
<li>Application of FSTs</li>
<li>dot graph output for *FSAs</li>
</ul>
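<p>For the curious, determinization is usually done with the subset construction. The following is a minimal sketch of that textbook technique, not the TinyFST code; it assumes an NFSA without epsilon transitions, and all names in it are made up for the example:</p>

```python
def determinize(start, finals, transitions):
    """Subset construction for an NFSA without epsilon transitions.

    `transitions` maps (state, symbol) to a set of successor states.
    Returns (dfsa_start, dfsa_finals, dfsa_transitions); the states of
    the resulting DFSA are frozensets of NFSA states.
    """
    alphabet = set(symbol for (_, symbol) in transitions)
    finals = set(finals)
    dfsa_start = frozenset([start])
    dfsa_transitions = {}
    dfsa_finals = set()
    seen = set([dfsa_start])
    agenda = [dfsa_start]
    while agenda:
        current = agenda.pop()
        # A subset is final if it contains at least one final NFSA state.
        if current & finals:
            dfsa_finals.add(current)
        for symbol in alphabet:
            target = frozenset(
                nxt for state in current
                for nxt in transitions.get((state, symbol), ()))
            if not target:
                continue  # no move on this symbol: leave it undefined
            dfsa_transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                agenda.append(target)
    return dfsa_start, dfsa_finals, dfsa_transitions


def accepts(dfsa, word):
    """Run the DFSA returned by determinize() on a sequence of symbols."""
    state, finals, transitions = dfsa
    for symbol in word:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in finals


# NFSA for "strings over {a, b} that end in ab".
nfsa = {(0, "a"): set([0, 1]), (0, "b"): set([0]), (1, "b"): set([2])}
dfsa = determinize(0, [2], nfsa)
print(accepts(dfsa, "aab"))  # True
print(accepts(dfsa, "aba"))  # False
```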
<p>All in all, not very impressive, and not very hard. Still, if you like <a href="http://pyparsing.wikispaces.com/">PyParsing</a> and generator-expression-prone Python code, you might want to have a look at <a href="http://diotavelli.net/files/tinyfst.tar.bz2">the TinyFST code</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2008/05/31/i-have-no-life-and-i-must-program/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Look at me, I know Functional Programming!</title>
		<link>http://shlomme.diotavelli.net/2008/05/05/look-at-me-i-know-functional-programming/</link>
		<comments>http://shlomme.diotavelli.net/2008/05/05/look-at-me-i-know-functional-programming/#comments</comments>
		<pubDate>Mon, 05 May 2008 20:22:14 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[lang:en]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/2008/05/05/look-at-me-i-know-functional-programming/</guid>
		<description><![CDATA[Sometimes I wonder what the hell I might have been thinking while writing code like this: def uncurry&#40;f&#41;: return lambda t: f&#40;*t&#41; &#160; def longest_common_prefix&#40;Sa, Sb&#41;: return len&#40;list&#40; takewhile&#40;uncurry&#40;eq&#41;, izip&#40;Sa, Sb&#41;&#41;&#41;&#41; I bet the tenses are all wrong again.]]></description>
			<content:encoded><![CDATA[<p>Sometimes I wonder what the hell I might have been thinking while writing code like this<sup><a href="http://shlomme.diotavelli.net/2008/05/05/look-at-me-i-know-functional-programming/#footnote_0_89" id="identifier_0_89" class="footnote-link footnote-identifier-link" title="I bet the tenses are all wrong again.">1</a></sup> :</p>

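<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;"># Python 2: izip lives in itertools</span>
<span style="color: #ff7700;font-weight:bold;">from</span> itertools <span style="color: #ff7700;font-weight:bold;">import</span> izip, takewhile
<span style="color: #ff7700;font-weight:bold;">from</span> operator <span style="color: #ff7700;font-weight:bold;">import</span> eq
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> uncurry<span style="color: black;">&#40;</span>f<span style="color: black;">&#41;</span>: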
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff7700;font-weight:bold;">lambda</span> t: f<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>t<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> longest_common_prefix<span style="color: black;">&#40;</span>Sa, Sb<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #008000;">list</span><span style="color: black;">&#40;</span>
        takewhile<span style="color: black;">&#40;</span>uncurry<span style="color: black;">&#40;</span>eq<span style="color: black;">&#41;</span>,
                  izip<span style="color: black;">&#40;</span>Sa, Sb<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<ol class="footnotes"><li id="footnote_0_89" class="footnote">I bet the tenses are all wrong again.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2008/05/05/look-at-me-i-know-functional-programming/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>GMM Code</title>
		<link>http://shlomme.diotavelli.net/2008/03/13/gmm-code/</link>
		<comments>http://shlomme.diotavelli.net/2008/03/13/gmm-code/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 21:35:34 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[coli]]></category>
		<category><![CDATA[lang:en]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/2008/03/13/gmm-code/</guid>
		<description><![CDATA[For one of the exercises, we had to implement the EM algorithm for Gaussian Mixture Models. I&#8217;ve spent a considerable amount of time on my solutions, either because I wanted to learn a new language (Scala version) or I wanted to not forget an old one (C++ version), so I don&#8217;t want the code simply [...]]]></description>
			<content:encoded><![CDATA[<p>For one of the exercises, we had to implement the EM algorithm for <a href="http://en.wikipedia.org/wiki/Mixture_model">Gaussian Mixture Models</a>. I&#8217;ve spent a considerable amount of time on my solutions, either because I wanted to learn a new language (<a href="http://diotavelli.net/files/code/gmm.scala">Scala version</a>) or I wanted to not forget an old one (<a href="http://diotavelli.net/files/code/gmm.cpp">C++ version</a>), so I don&#8217;t want the code simply rotting on my hard drive.</p>
<p>The C++ version isn&#8217;t that much faster than the Scala version if I remember my experiments correctly (about 4x). Judging from the call graph of the C++ version, most of the time is spent in the <i>exp</i> function anyway, which is as fast as it gets.</p>
<p><img style="margin-left:auto; margin-right:auto; display:block" src="http://diotavelli.net/files/img/gmm-callgraph.png" alt="Callgraph of the C++ version" /></p>
<p>The input file simply lists one float value per line, and the initial parameters for the Gaussians can be specified in the source files. Be advised that the number of Gaussians used to approximate the data needs to be known before the algorithm is run.</p>
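<p>For readers who don&#8217;t want to dig through the Scala or C++ files, here is a minimal sketch of EM for a one-dimensional Gaussian mixture in Python. It is not a port of either version; the function names, the variance floor and the fixed iteration count are assumptions made for this example:</p>

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional normal distribution at x."""
    return (math.exp(-(x - mean) ** 2 / (2.0 * variance))
            / math.sqrt(2.0 * math.pi * variance))


def em_gmm(data, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture with a fixed number of components by EM."""
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in joint])
        # M-step: re-estimate the parameters from the weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # component owns no data; keep its old parameters
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # assumed floor against collapse
            weights[j] = nj / len(data)
    return means, variances, weights


# Invented data: two clumps, around 0 and around 10.
data = [0.0, 0.2, -0.1, 0.1, 9.8, 10.1, 10.0, 10.2]
means, variances, weights = em_gmm(data, [1.0, 8.0], [1.0, 1.0], [0.5, 0.5])
print(means)
```

<p>As in my versions, the number of components is fixed by the length of the initial parameter lists.</p>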
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2008/03/13/gmm-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pattern &amp; Speech Recognition Leftovers</title>
		<link>http://shlomme.diotavelli.net/2008/03/13/pattern-speech-recognition-leftovers/</link>
		<comments>http://shlomme.diotavelli.net/2008/03/13/pattern-speech-recognition-leftovers/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 21:07:14 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[lang:en]]></category>
		<category><![CDATA[uni]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/2008/03/13/pattern-speech-recognition-leftovers/</guid>
		<description><![CDATA[While studying for the exam of this semester&#8217;s Pattern &#038; Speech Recognition course by Prof. Klakow (highly, highly recommended), we (a couple of people, look for the names in the document itself) put together a summary with a couple of topics from the course. Topics Feature Extraction from Sound Bayesian Decision Theory Maximum Likelihood Estimation [...]]]></description>
			<content:encoded><![CDATA[<p>While studying for the exam of this semester&#8217;s <a href="http://www.lsv.uni-saarland.de/e_speech_ws0708.htm">Pattern &#038; Speech Recognition</a> course by Prof. Klakow (highly, highly recommended), we (a couple of people, look for the names in the document itself) put together a summary with a couple of topics from the course.</p>
<h5>Topics</h5>
<ol>
<li>Feature Extraction from Sound</li>
<li>Bayesian Decision Theory</li>
<li>Maximum Likelihood Estimation</li>
<li>Nonparametric Techniques</li>
<li>Gaussian Mixture Models</li>
<li>Decision Trees</li>
</ol>
<p>The text is put together from various sources, but mostly based on slides and notes from the lectures. Some sections are pending (Hidden Markov Models, Bayesian Networks, Markov Random Fields), other topics from the lecture are plainly missing (HMMs in Speech Recognition, Acoustic Modeling, Speaker Adaptation, Normal Distributions). However, to my knowledge, nobody has been examined on any of the missing speech-recognition-related sections.</p>
<p>Get the <a href="http://diotavelli.net/files/psr0708-summary/latest.pdf">PDF version</a> of the summary.</p>
<h5>LaTeX Sources</h5>
<p>The LaTeX source file, along with pictures, is kept in a <a href="http://www.selenic.com/mercurial/">Mercurial</a> repository. To get the source files, do:<br />
<tt>$&nbsp;hg&nbsp;clone&nbsp;static-http://diotavelli.net/files/psr0708-summary/repository&nbsp;psr</tt>
</p>
<p>The source file is named <i>summ.tex</i> and should build on most LaTeX installations without requiring additional packages.</p>
<p><b>Please notify me of any bugs or errors!</b></p>
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2008/03/13/pattern-speech-recognition-leftovers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

