Welcome to SIGPWNED!

SIGPWNED is Andy Boothe’s personal website and blog. Welcome to it.

I’ll be using it to write about lots of things: food, Austin, San Francisco, software development, personal projects, entrepreneurship. And probably some funny pictures of cats, too. Because the internet just doesn’t have enough.

Feel free to send any thoughts on the site my way via Twitter at @sigpwned.

Nutch 2 and ElasticSearch

Today I set up my first instance of Apache Nutch. It was much easier than I expected, actually.

I think I should have given Nutch a try long ago.

It took some jimmying to get Nutch 2 to work with ElasticSearch, though. Here’s how I got it done.

A Year of Mashable - 2012

At work I finally got around to doing a project I’ve been wanting to do for a long time: analyze the sharing behavior of a year’s worth of content at Mashable.

It was no small project. First, a year’s worth of Mashable content had to be collected, which ended up being 13,979 articles in total. Next, the author, publish date, headline, and full text of each post had to be extracted from each page, which required building a (fortunately simple) custom scraper. Next came the social resonance data for each article: I collected share counts for Twitter, Facebook, StumbleUpon, LinkedIn, Google+, and Pinterest, plus clicks from Bitly and per-article submissions from Reddit. Once all that data was collected and structured, it had to be analyzed. I found good ol’ Excel pivot tables to be perfect for most of the analysis. Tracking mentions of key topics from the past year, like Gangnam Style, required full-text searches against headlines and article content, so I indexed all of the data from above into ElasticSearch. It performed brilliantly.
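The pivot-table step is easy to sketch in plain Python, if you'd rather not reach for Excel. The records and field names below are made up for illustration; the real analysis ran over the full 13,979-article dataset.

```python
from collections import defaultdict

# Hypothetical records: one per article, with per-network share counts.
articles = [
    {"author": "A", "month": "2012-01", "twitter": 120, "facebook": 300},
    {"author": "B", "month": "2012-01", "twitter": 80,  "facebook": 150},
    {"author": "A", "month": "2012-02", "twitter": 200, "facebook": 100},
]

# Pivot: total shares across all networks, grouped by author.
totals = defaultdict(int)
for article in articles:
    totals[article["author"]] += article["twitter"] + article["facebook"]

print(dict(totals))  # {'A': 720, 'B': 230}
```

The same grouping trick works for any pivot dimension — swap `"author"` for `"month"` to get shares over time.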

I ended up making an infographic out of the data, and I posted it to the corporate blog of the company I work for. I’m proud of it, so I’m posting it here too.

It’s by no means the finest infographic ever made, but hopefully not the worst one, either. Enjoy!

Stork, An Example Programming Language, Lesson 4: Variables

Welcome back!

For those of you just joining, Stork is an example programming language “course” designed to demonstrate the principles of programming language implementation in 10 “lessons.” This is Lesson 4 in a series of 10, so if you’re just joining now, you may want to take a peek at lessons 1, 2, and 3 to gear up a bit for this post.

Lesson 4 adds variables to Stork, which involves adding statements in addition to the expressions already in the language. The addition of variables provides fodder for some additional (and more interesting) static analysis as well. At the end of this lesson, Stork will be a working interpreter for simple numerical expressions with support for variables. (The variables will become much more interesting over the course of the next couple of lessons, which will add support for functions and control structures.)

The code for this lesson is available on github under the tag lesson4, and you can follow the discussion about this lesson on reddit. As a quick preview, though, Stork is getting cool:

$ java com.sigpwned.stork.Stork
>>> var x:Int
>>> x
ERROR: Variable may not have been initialized: x
>>> x = 1+2*(3+4)
15
>>> var y:Float
>>> y = x
15.0
>>> x = y
ERROR: Will not coerce to less precise type: Float -> Int
>>> x = (cast Int) y
15
>>> x+y
30.0
>>> ^D
$
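The type rule behind those ERROR lines can be modeled in a few lines of Python. This is a simplified sketch of the widening rule only, not the actual Stork implementation; the `assign` helper and its signature are invented for illustration.

```python
# Simplified model of Stork's numeric rule: Int widens to Float
# implicitly, but narrowing Float to Int requires an explicit cast.
WIDENS_TO = {("Int", "Float")}

def assign(var_type, value_type, explicit_cast=False):
    """Return True if a value of value_type may be stored in var_type."""
    if value_type == var_type:
        return True
    if (value_type, var_type) in WIDENS_TO:
        return True   # e.g. y = x, where y: Float and x: Int
    if explicit_cast:
        return True   # e.g. x = (cast Int) y
    raise TypeError(
        "Will not coerce to less precise type: %s -> %s"
        % (value_type, var_type))

assert assign("Float", "Int")                       # widening is implicit
assert assign("Int", "Float", explicit_cast=True)   # narrowing needs a cast
```

Without `explicit_cast`, assigning a `Float` value to an `Int` variable raises the same complaint the interpreter prints above.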

The "Spirit Quest" to San Francisco: Day 2

Day two of Kelly’s and my #spiritquest from Austin to San Francisco was long, but (mostly) picturesque, with some fun destinations along the way. All in all, a pretty great drive.

The day started ugly, but got prettier from there. We started the day in North Texas, where we were greeted with the following sights and associated smells:

oil pump in North Texas

The "Spirit Quest" to San Francisco

As part of my move to San Francisco, Kelly and I are driving my car from Austin back to California this weekend. Over brunch with some friends, we decided this trip was actually a spirit quest. Surprisingly, no alcohol was involved in this decision.

We started the trip today, and I’m writing this post from our hotel room in Lubbock after 7 hours of driving. Today’s trip included many harrowing adventures — massive windmills, trippy dream catchers, and (of course) wrong turns — but we got some great pictures too. So far, this one is our favorite:

Stork, An Example Programming Language, Lesson 3: Expression Evaluation

Welcome back!

For those of you just joining, Stork is an example programming language “course” designed to demonstrate the principles of programming language implementation in 10 “lessons.” This is Lesson 3 in a series of 10, so if you’re just joining now, you may want to check out lessons 1 and 2 to gear up a bit for this post.

Lesson 3 covers the basics of compiler design (front end versus back end) and types, plus a very brief preview of static analysis. At the end of this lesson, Stork will be a working interpreter for simple numerical expressions.

The code for this lesson is available on github under the tag lesson3, and you can follow the discussion about this lesson on reddit.

(Basic) Compiler Theory

Most developers are familiar with the use of compilers like javac and gcc — instant program, just add source code — but aren’t familiar with their inner workings. Stork is intended to dispel some of the mystery around compilers, and it’s far enough along now to start discussing it in the greater context of compiler design.

In the most general sense, compilers are simply translators that turn program source code into executable instructions. There are many compilers: javac, the Java compiler, turns Java code into Java bytecode; gcc, the GNU C compiler, turns C code into native instructions; and so on. There are also similar programs called interpreters — like ruby, or the subject of this course, Stork — that execute program source code directly without first compiling it down to instructions. While interpreters are technically different from compilers, the same design principles apply, so the Stork interpreter will serve nicely as a platform for exploring simple compiler design.

Compiler Design: Front End, Middle End, and Back End

At a high level, compilers look like this:

Basic Compiler Design

jMonkeyEngine Tips, Tricks, and Gotchas

I’ve been having great fun hobbying around on a 3D rendering project in between lessons of Stork lately. I’m not terribly familiar with 3D libraries and such, so to get myself started I’ve adopted jMonkeyEngine3 for some momentum. I’m not talking much about the project yet, but I’ll drop a hint here:

Anyway, I’ve noticed some… undocumented features of the engine, and I thought I’d scribble them down in case anyone happens to find them useful.

Stork, An Example Programming Language, Lesson 2: Expression Parsing

Welcome back!

For those of you just joining, Stork is an example programming language “course” designed to demonstrate the principles of programming language implementation in 10 “lessons.” This is Lesson 2 in a series of 10, so if you’re just joining now, you may want to check out lesson 1 to gear up a bit for this story.

Lesson 2 covers the basics of parsing numerical expressions using the tokenizer implemented in Lesson 1. Evaluation of these expressions will be handled in Lesson 3.

The code for this lesson is available on github under the tag lesson2, and you can follow the discussion about this lesson on reddit.

What is Parsing?

If a programming language is a (more) convenient language humans can use to describe tasks to computers, then parsing is the process of turning a program’s tokens into sentences, or “abstract syntax trees” (ASTs), that the computer can understand. For example, consider this simple mathematical expression for the area of a circle with radius 5:

3.14159*5*5

For this program text, the tokens would be 3.14159, *, 5, *, 5, and a parser would build the following AST for it:

In this sense, parsing is essentially “sentence diagramming” for a programming language.

This lesson covers how to transform a token stream into a parse tree like the above example. Looking at parse trees — the syntactic relationships among tokens — instead of the tokens themselves will make evaluating those expressions much easier in the next lesson.
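The transformation can be sketched with a small recursive-descent parser that builds the AST as nested tuples. This is a simplified illustration, not the actual Stork parser (which is written in Java); the token and node representations here are invented for brevity.

```python
import re

def tokenize(text):
    # Numbers (with optional decimal part), operators, and parentheses.
    return re.findall(r"\d+(?:\.\d+)?|[-+*/()]", text)

def parse(tokens):
    """Parse a token list into a nested-tuple AST, honoring precedence."""
    pos = [0]

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def advance():
        tok = tokens[pos[0]]
        pos[0] += 1
        return tok

    def factor():                    # numbers and parenthesized groups
        if peek() == "(":
            advance()
            node = expr()
            advance()                # consume ")"
            return node
        return float(advance())

    def term():                      # * and / bind tighter than + and -
        node = factor()
        while peek() in ("*", "/"):
            node = (advance(), node, factor())
        return node

    def expr():                      # + and - bind loosest
        node = term()
        while peek() in ("+", "-"):
            node = (advance(), node, term())
        return node

    return expr()

ast = parse(tokenize("3.14159*5*5"))
print(ast)  # ('*', ('*', 3.14159, 5.0), 5.0)
```

Note how the left-nested tuple mirrors the tree: the inner `('*', 3.14159, 5.0)` is evaluated first, just as the AST diagram would show.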

Stork, An Example Programming Language, Lesson 1: Tokenization

Welcome back!

For those of you just joining, Stork is an example programming language “course” designed to demonstrate the principles of programming language implementation in 10 “lessons.” This is Lesson 1 in the series, so if you’re just joining now, you haven’t missed much!

The code for this lesson is available on github under the tag lesson1, and you can follow the discussion about this lesson on reddit.

What is Tokenization?

If a programming language is a (more) convenient language humans can use to describe tasks and processes to computers, then tokenization is the process of turning a program’s raw program text into words, or “tokens,” that the computer can understand. For example, consider this simple Python program for the factorial function:

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

For this program text, the tokens would be: def, factorial, (, n, ), :, if, n, ==, and so on. Looking at tokens — atomic units of program semantics — as opposed to characters makes the next lesson’s topic of “parsing,” or discovering the semantic relationships among the different parts of the program text, much easier.

In a very real sense, the tokenizer defines the vocabulary of the programming language.
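A toy tokenizer for a small language like this can be written with a single regular expression. This is a simplified illustration, not the actual Stork tokenizer; the token classes chosen here are an assumption for the example.

```python
import re

# Each named group recognizes one class of token. Order matters within
# the OP alternation: "==" must be tried before "=".
TOKEN_PATTERN = re.compile(r"""
    (?P<NUMBER>\d+(?:\.\d+)?)   # integer or floating-point literal
  | (?P<NAME>[A-Za-z_]\w*)      # identifier or keyword
  | (?P<OP>==|[-+*/=():])       # operators and punctuation
  | (?P<SKIP>\s+)               # whitespace, discarded
""", re.VERBOSE)

def tokenize(text):
    tokens = []
    for m in TOKEN_PATTERN.finditer(text):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("x = 1+2*(3+4)"))
# [('NAME', 'x'), ('OP', '='), ('NUMBER', '1'), ('OP', '+'),
#  ('NUMBER', '2'), ('OP', '*'), ('OP', '('), ('NUMBER', '3'),
#  ('OP', '+'), ('NUMBER', '4'), ('OP', ')')]
```

Each `(class, text)` pair is one word of the program — exactly the vocabulary the parser in the next lesson will consume.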