<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>I See Dead Code &#187; coli</title>
	<atom:link href="http://shlomme.diotavelli.net/category/coli/feed/" rel="self" type="application/rss+xml" />
	<link>http://shlomme.diotavelli.net</link>
	<description>… as sounding brass, or a tinkling cymbal.</description>
	<lastBuildDate>Sun, 12 Jul 2009 19:38:04 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Yes, Master.</title>
		<link>http://shlomme.diotavelli.net/2009/05/24/yes-master/</link>
		<comments>http://shlomme.diotavelli.net/2009/05/24/yes-master/#comments</comments>
		<pubDate>Sun, 24 May 2009 13:50:29 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[coli]]></category>
		<category><![CDATA[studies]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/?p=191</guid>
		<description><![CDATA[My Master&#8217;s thesis &#8220;Integration of Light-weight Semantics into a Syntax Query Formalism&#8221; is available from the SALSA project pages.
Abstract
In the Computational Linguistics community, much work is put into the creation of large, high-quality linguistic resources, often with complex annotation. In order to make these resources accessible to nontechnical audiences, formalisms for searching and filtering are [...]]]></description>
			<content:encoded><![CDATA[<p>My Master&#8217;s thesis &#8220;Integration of Light-weight Semantics into a Syntax Query Formalism&#8221; is available from the <a href="http://www.coli.uni-saarland.de/projects/salsa/page.php?id=theses">SALSA project pages</a>.</p>
<h3>Abstract</h3>
<blockquote><p>In the Computational Linguistics community, much work is put into the creation of large, high-quality linguistic resources, often with complex annotation. In order to make these resources accessible to nontechnical audiences, formalisms for searching and filtering are needed. </p>
<p>The TIGER query language can, by describing partial structures, be used to search treebanks with syntactic annotation. Recently, augmented treebanks have been published, including the SALSA corpus which features frame semantic annotation on top of syntactic structure. Query languages, however, need to keep up with newly introduced annotation, allowing it to be searchable and easy to access.</p>
<p>We design an extension for the TIGER language which allows searching for frame structures along with syntactic annotation. To achieve this, the TIGER object model is expanded to include frame semantics, while remaining fully backwards-compatible.</p>
<p>Finally, these extensions have been added to our own implementation of TIGER, which includes novel indexing features not found in the original work of Lezius (2002a).</p></blockquote>
<p><span id="more-191"></span></p>
<h3>What does it all mean?</h3>
<p>At its most basic, the TIGER query language lets you specify nodes (which are flat feature structures) and relations between these nodes. Until now, the query language supported only syntactic nodes (words and phrases) and syntactic relations (dominance, precedence and structure sharing), even though the underlying annotation formalism had already been extended to include frame semantics. My conservative extension of the query language introduces types and relations for frame semantics. This makes it possible to express linguistic queries such as <em>Find all sentences where the role TOPIC in the frame STATEMENT is realized by a PP with the preposition &#8220;über&#8221;</em>, which was not possible previously:</p>
<pre>
{frame="Statement"} > #r:{role="Topic"} &#038;
#pp:[cat="PP"] >AC [word="über"] &#038;
#r > #pp &#038; arity(#r, 1)
</pre>
<p>The &#8220;novel indexing features&#8221; that the abstract cryptically refers to are improvements to candidate selection for relation checks. Candidate selection now exploits some graph-theoretic notions as rough filters before the actual relation checks, which can be quite expensive. All in all, the implementation is generally faster than TIGERSearch (the original implementation by Lezius) for complex queries. For simple queries it is slower, because our node index is slower.</p>
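<p>To illustrate the general idea of a rough filter in front of an expensive relation check, here is a minimal Python sketch. It is not taken from the thesis implementation: the <code>Node</code> class, the depth-based filter and all function names are assumptions made up for this example, and the filters actually used may be entirely different.</p>

```python
class Node:
    """Hypothetical tree node that records its depth when it is created."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.depth = 0 if parent is None else parent.depth + 1


def dominates(a, b):
    """Exact check: walk up from b and look for a among its ancestors."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def may_dominate(a, b):
    """Rough filter: an ancestor always sits strictly higher in the tree.

    This is a necessary condition only, so it can reject pairs cheaply
    but never confirms a pair on its own.
    """
    return b.depth > a.depth


def dominance_pairs(candidates_a, candidates_b):
    """Run the exact check only on pairs that survive the rough filter."""
    return [
        (a, b)
        for a in candidates_a
        for b in candidates_b
        if may_dominate(a, b) and dominates(a, b)
    ]


# Tiny example tree: ROOT > S > (NP, VP), VP > PP
root = Node("ROOT")
s = Node("S", root)
np_node = Node("NP", s)
vp = Node("VP", s)
pp = Node("PP", vp)

pairs = dominance_pairs([s, vp, np_node], [pp, np_node])
pair_names = [(a.name, b.name) for a, b in pairs]
```

<p>In this toy tree, the pair (VP, NP) is rejected by the depth comparison alone, so the parent walk never runs for it. The filter only pays off when it is much cheaper than the exact check.</p>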
<h3>Can I try it out?</h3>
<p>A demo system for the original and extended query language is online on the <a href="http://fnps.coli.uni-saarland.de:8080/">CoLi webservers</a> in Saarbrücken. Feature-wise, this is the latest version; since then, I have committed only one bugfix to the query evaluator.</p>
<h3>Will you continue work on it?</h3>
<p>Hopefully, yes. Current directions of ongoing development include:</p>
<ul>
<li>In Progress:
<ul>
<li>Client-side rendering of trees in the query front-end (using the <a href="https://developer.mozilla.org/en/Canvas_tutorial">HTML5 canvas</a>)</li>
</ul>
</li>
<li>Planned:
<ul>
<li>Custom-written node index</li>
<li>Relations between graphs and nodes in different graphs</li>
</ul>
</li>
</ul>
<p>I&#8217;m also running some experiments on massively parallel constraint evaluation using GPUs, but that might not lead anywhere and depends on the availability of special hardware.</p>
<h3>Thanks</h3>
<p>Again, special thanks go to Martin Lazarov and Armin Schmidt, who both read the full draft version and provided many comments and corrections.</p>
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2009/05/24/yes-master/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>I have no life and I must program</title>
		<link>http://shlomme.diotavelli.net/2008/05/31/i-have-no-life-and-i-must-program/</link>
		<comments>http://shlomme.diotavelli.net/2008/05/31/i-have-no-life-and-i-must-program/#comments</comments>
		<pubDate>Sat, 31 May 2008 10:44:53 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[coli]]></category>
		<category><![CDATA[lang:en]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/?p=91</guid>
		<description><![CDATA[Some participants of this semester&#8217;s „Computational Linguistics” course (which is a code word for „10 different lecturers guide you through the wonderful world of algorithms and theoretical foundations of CoLi”) obviously lack a life and willfully extended their own homework assignment, writing small toolkits for finite state automata.
Surprisingly, all those toolkits were written in Python [...]]]></description>
			<content:encoded><![CDATA[<p>Some participants of this semester&#8217;s „Computational Linguistics” course (which is a code word for „10 different lecturers guide you through the wonderful world of algorithms and theoretical foundations of CoLi”) obviously lack a life and willfully extended their own homework assignment, writing small toolkits for finite state automata.</p>
<p>Surprisingly, all those toolkits were written in Python, which made our C++-affine lecturer wonder whether he should perhaps look into this language some more, apart from suggesting that we „maybe […] should get a life.”</p>
<p>So what does my toolkit do?</p>
<ul>
<li>Determinization of NFSAs to DFSAs</li>
<li>Creation of DFSAs from simple regular expressions</li>
<li>Application of FSTs</li>
<li>dot graph output for *FSAs</li>
</ul>
<p>All in all, not very impressive, and not very hard. Still, if you like <a href="http://pyparsing.wikispaces.com/">PyParsing</a> and generator-expression-prone Python code, you might want to have a look at <a href="http://diotavelli.net/files/tinyfst.tar.bz2">the TinyFST code</a>.</p>
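<p>For readers who have not seen determinization before, the first item in the list can be sketched with the textbook subset construction. This is not the TinyFST code: the function names, the dictionary-based transition table and the assumption that the automaton has no epsilon transitions are all choices made for this example.</p>

```python
from collections import deque


def determinize(start, accepting, transitions):
    """Subset construction for an NFSA without epsilon transitions.

    transitions maps (state, symbol) to a set of successor states and
    accepting is a set of states. Every state of the resulting DFSA is
    a frozenset of NFSA states.
    """
    alphabet = {symbol for (_, symbol) in transitions}
    dfa_start = frozenset([start])
    dfa_transitions = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # Union of all NFSA successors of the states in this subset.
            target = frozenset(
                successor
                for state in current
                for successor in transitions.get((state, symbol), ())
            )
            if not target:
                continue  # no successor: leave the transition undefined
            dfa_transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A subset accepts if it contains at least one accepting NFSA state.
    dfa_accepting = {subset for subset in seen if not subset.isdisjoint(accepting)}
    return dfa_start, dfa_accepting, dfa_transitions


def accepts(dfa, word):
    """Run the DFSA on word; an undefined transition rejects."""
    state, accepting, transitions = dfa
    for symbol in word:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting


# NFSA over {a, b} accepting every string that ends in "ab".
nfsa_transitions = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
dfsa = determinize(0, {2}, nfsa_transitions)
```

<p>Calling <code>accepts(dfsa, "aab")</code> then walks a single deterministic path instead of tracking several NFSA states at once. The price is that the number of subsets can grow exponentially in the worst case.</p>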
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2008/05/31/i-have-no-life-and-i-must-program/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Gems of Machine Translation</title>
		<link>http://shlomme.diotavelli.net/2008/04/22/gemmen-der-maschinellen-ubersetzung/</link>
		<comments>http://shlomme.diotavelli.net/2008/04/22/gemmen-der-maschinellen-ubersetzung/#comments</comments>
		<pubDate>Tue, 22 Apr 2008 21:20:48 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[coli]]></category>
		<category><![CDATA[lang:de]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/2008/04/22/gemmen-der-maschinellen-ubersetzung/</guid>
		<description><![CDATA[Evaluating the translation of sentence fragments, given: „Sporty Spice (Melanie Chisholm)”

Sportliches Gewürz (Melanie Chisholm)
sportliches Spice (Melanie Chisholm)

The errors above may be funny, but they are to be expected. What is actually surprising is that one of the systems gives the correct translation. Then again, that system was probably just trained on texts from gossip sites.
]]></description>
			<content:encoded><![CDATA[<p>Evaluating the translation of sentence fragments, given: „Sporty Spice (Melanie Chisholm)”</p>
<ul>
<li>Sportliches Gewürz (Melanie Chisholm)</li>
<li>sportliches Spice (Melanie Chisholm)</li>
</ul>
<p>The errors above may be funny, but they are to be expected. What is actually surprising is that one of the systems gives the correct translation. Then again, that system was probably just trained on texts from gossip sites.</p>
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2008/04/22/gemmen-der-maschinellen-ubersetzung/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>GMM Code</title>
		<link>http://shlomme.diotavelli.net/2008/03/13/gmm-code/</link>
		<comments>http://shlomme.diotavelli.net/2008/03/13/gmm-code/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 21:35:34 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[coli]]></category>
		<category><![CDATA[lang:en]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/2008/03/13/gmm-code/</guid>
		<description><![CDATA[For one of the exercises, we had to implement the EM algorithm for Gaussian Mixture Models. I&#8217;ve spent a considerable amount of time on my solutions, either because I wanted to learn a new language (Scala version) or because I didn&#8217;t want to forget an old one (C++ version), so I don&#8217;t want the code simply [...]]]></description>
			<content:encoded><![CDATA[<p>For one of the exercises, we had to implement the EM algorithm for <a href="http://en.wikipedia.org/wiki/Mixture_model">Gaussian Mixture Models</a>. I&#8217;ve spent a considerable amount of time on my solutions, either because I wanted to learn a new language (<a href="http://diotavelli.net/files/code/gmm.scala">Scala version</a>) or because I didn&#8217;t want to forget an old one (<a href="http://diotavelli.net/files/code/gmm.cpp">C++ version</a>), so I don&#8217;t want the code simply rotting on my hard drive.</p>
<p>The C++ version isn&#8217;t that much faster than the Scala version if I remember my experiments correctly (about 4x). Judging from the call graph of the C++ version, most of the time is spent in the <i>exp</i> function anyway, which is as fast as it gets.</p>
<p><img style="margin-left:auto; margin-right:auto; display:block" src="http://diotavelli.net/files/img/gmm-callgraph.png" alt="Callgraph of the C++ version" /></p>
<p>The input file simply lists one float value per line, and the initial parameters for the Gaussians can be specified in the source files. Be advised that the number of Gaussians used to approximate the data needs to be known before the algorithm is run.</p>
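<p>For orientation, here is a minimal Python sketch of EM for a one-dimensional Gaussian mixture. It is neither the Scala nor the C++ version linked above: the function names, the fixed iteration count and the small variance floor are assumptions made for this example. As in the original programs, the number of Gaussians is fixed by the initial parameters you pass in.</p>

```python
import math


def read_values(path):
    """Read one float per line, skipping blank lines."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm(data, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture; returns (means, variances, weights)."""
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iterations):
        # E-step: responsibility of every component for every data point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            if total > 0.0:
                resp.append([p / total for p in joint])
            else:
                # All densities underflowed: fall back to equal shares.
                resp.append([1.0 / k] * k)
        # M-step: re-estimate the parameters from the weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # component owns no data: keep its old parameters
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, 1e-6)  # floor avoids a zero variance
            weights[j] = nj / len(data)
    return means, variances, weights


# Two well-separated clusters and deliberately rough initial guesses.
data = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]
means, variances, weights = em_gmm(data, [1.0, 9.0], [1.0, 1.0], [0.5, 0.5])
```

<p>With a real input file, <code>read_values</code> would supply <code>data</code>. Nearly all of the arithmetic happens inside <code>gaussian_pdf</code>, which is where the call to <code>exp</code> sits.</p>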
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2008/03/13/gmm-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Need Some Music for Your Romantic Valentine Dinner?</title>
		<link>http://shlomme.diotavelli.net/2008/02/14/need-some-music-for-your-romantic-valentine-dinner/</link>
		<comments>http://shlomme.diotavelli.net/2008/02/14/need-some-music-for-your-romantic-valentine-dinner/#comments</comments>
		<pubDate>Thu, 14 Feb 2008 17:16:46 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[coli]]></category>
		<category><![CDATA[lang:en]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/2008/02/14/need-some-music-for-your-romantic-valentine-dinner/</guid>
		<description><![CDATA[Go to Jason Eisner&#8217;s homepage, but please don&#8217;t come back to complain.
Just to make it more geeky by nitpicking: if you are OOM, or have a segfault, you most likely won&#8217;t be able to finish your song.
]]></description>
			<content:encoded><![CDATA[<p>Go to <a href="http://cs.jhu.edu/~jason/fun/grammar-and-the-sentence/">Jason Eisner&#8217;s homepage</a>, but please don&#8217;t come back to complain.</p>
<p>Just to make it more geeky by nitpicking: if you are OOM, or have a segfault, you most likely <b>won&#8217;t</b> be able to finish your song.</p>
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2008/02/14/need-some-music-for-your-romantic-valentine-dinner/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stockholm TreeAligner 0.8 „Gamla Stan” released</title>
		<link>http://shlomme.diotavelli.net/2007/12/13/stockholm-treealigner-08-%e2%80%9egamla-stan%e2%80%9d-released/</link>
		<comments>http://shlomme.diotavelli.net/2007/12/13/stockholm-treealigner-08-%e2%80%9egamla-stan%e2%80%9d-released/#comments</comments>
		<pubDate>Thu, 13 Dec 2007 00:14:10 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[coli]]></category>
		<category><![CDATA[lang:en]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[treealigner]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/2007/12/13/stockholm-treealigner-08-%e2%80%9egamla-stan%e2%80%9d-released/</guid>
		<description><![CDATA[It&#8217;s only a couple of months late, but we&#8217;ve just released a new version of the Stockholm TreeAligner to an awed audience. This release features the prototype implementations of TIGERSearch and alignment queries, which will be perfected in the next release, due in March 2008.
For those who are wondering what kind of code name Gamla [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s only a couple of months late, but we&#8217;ve just released a new version of the <a href="http://dev.ling.su.se/treealigner">Stockholm TreeAligner</a> to an awed audience. This release features the prototype implementations of TIGERSearch and alignment queries, which will be perfected in the next release, due in March 2008.</p>
<p>For those who are wondering what kind of code name <a href="http://en.wikipedia.org/wiki/Gamla_Stan">Gamla Stan</a> is: STA releases are named after Stockholm&#8217;s subway stations.</p>
<p>Align your trees while the release is still hot!</p>
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2007/12/13/stockholm-treealigner-08-%e2%80%9egamla-stan%e2%80%9d-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TLT 2007</title>
		<link>http://shlomme.diotavelli.net/2007/12/06/tlt-2007/</link>
		<comments>http://shlomme.diotavelli.net/2007/12/06/tlt-2007/#comments</comments>
		<pubDate>Thu, 06 Dec 2007 10:46:56 +0000</pubDate>
		<dc:creator>shlomme</dc:creator>
				<category><![CDATA[coli]]></category>
		<category><![CDATA[lang:de]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://shlomme.diotavelli.net/2007/12/06/tlt-2007/</guid>
		<description><![CDATA[The time has come, and the whole treebank world is getting a taste of the Stockholm TreeAligner. Tomorrow, Joakim will present a poster on the „Gamla Stan” release of STA (paper).
With the release only a couple of months late, we are now also setting about straightening out the technical foundation with a few targeted changes. The main attractions for the next [...]]]></description>
			<content:encoded><![CDATA[<p>The time has come, and the whole treebank world is getting a taste of the <a href="http://dev.ling.su.se/treealigner">Stockholm TreeAligner</a>. Tomorrow, Joakim will present a poster on the „Gamla Stan” release of STA (<a href="http://tlt07.uib.no/papers/18.pdf">paper</a>).</p>
<p>With the release only a couple of months late, we are now also setting about straightening out the technical foundation with a few targeted changes. The main attractions for the next release (code name „Solna”, planned for mid-March) are meant to be diff views for different versions of a corpus and an improved &amp; extended search via TSQL (TigerSearch, for single treebanks) and AQL (Alignment Query Language, for parallel treebanks).</p>
<p>Helpers are of course always needed: if you have suitable programming skills or are simply interested, an email is enough; there is plenty of work!</p>
]]></content:encoded>
			<wfw:commentRss>http://shlomme.diotavelli.net/2007/12/06/tlt-2007/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
