About Guido Tapia

Over the last 2 years Guido has been involved in building 2 predictive analytics libraries for both the .Net platform and the Python language. These libraries and Guido's machine learning experience have placed PicNet at the forefront of predictive analytics services in Australia.

For the last 10 years Guido has been the Software and Data Manager at PicNet, and in that time he has delivered hundreds of successful software and data projects. An experienced architect and all-round 'software guy', Guido has been responsible for giving PicNet its 'quality software provider' reputation.

Prior to PicNet, Guido was in the gaming space working on advanced graphics engines, sound (digital signal processing) engines, AI players and other great technologies.


Fluent python interface for Machine Learning

I often say that Machine Learning is like programming in the 60s: you prepare your program, double-check everything, hand your punch cards to the IBM operator, go home and wait. And just like back then, a bug in your code means a huge amount of wasted time. Sometimes these things cannot be helped; for instance, it is not uncommon to leave a feature-selection wrapper running over the weekend only to find on Monday morning that it hit an out-of-memory error sometime during the weekend. This article explains one way to reduce these errors and make your code less buggy.

Less code = less bugs

This is the only truth in software development. A bug-free system is only possible if it contains no code at all. So we should always aim to reduce the amount of code needed. How?

  • Use tried and tested libraries
  • Write reusable code and test this code enough to have confidence that it works
  • Only use this reusable code
  • Whenever possible test your new code
  • Write expressive code.  Make logical bugs obvious.


All libraries are full of bugs; again, code = bugs, so this is no fault of the library. However, if a library has lots of users you can be fairly certain that most bugs you will encounter have already been found and, hopefully, fixed. If you are pushing the boundaries of the library you will inevitably find new bugs, but this is not the general case. Usually, a well-respected library should be reasonably safe to use and to trust.

Reusable Code

Most libraries you use are generic, meaning they can be used in many contexts. Depending on your job you will need something more specific. So write it: wrap your libraries in an abstraction that is specific to what you do. Do this and then TEST IT!!! Every time you find a use case that your abstraction does not support, write it and test it. Use scikit-learn's dummy datasets to create reproducible test cases that guarantee a given feature works for your use case.
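A sketch of such a reproducible test case, assuming numpy and scikit-learn are available (`fill_missing` here is a hypothetical stand-in for your own wrapper's imputation helper, not a real library function):

```python
import numpy as np
from sklearn.datasets import make_classification

def fill_missing(X, fill_value=0):
    # hypothetical wrapper helper: replace NaNs with a constant
    X = X.copy()
    X[np.isnan(X)] = fill_value
    return X

# reproducible dummy dataset for the test case
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
X[0, 0] = np.nan

X2 = fill_missing(X)
assert not np.isnan(X2).any()
assert X2[0, 0] == 0
```

Because the dataset is generated with a fixed random seed, this test behaves identically on every run and every machine.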

Try to always maintain this abstraction separate from any specific predictive project and ensure that it is project agnostic.

Fluent interfaces for ML

This article focuses on using your reusable code wisely, aiming to minimize bugs and enhance the expressiveness of the code.

Expressiveness is key to writing logically correct code.  If you want all rows with a date greater than the start of this year, it is much easier to catch a logical bug in this code:

filtered = filter(data, greater_than_start_of_this_year)

than in this code:

filtered = filter(data, lambda row: row.date_created >=
  date(date.today().year, 1, 1))

Whilst the ‘greater_than_start_of_this_year’ function has the same functionality as the lambda expression in the second example, it differs in several important ways:

  • It is easily tested: it is a separate function, totally isolated from the context it runs in, which makes it much easier to test.
  • It is much, MUCH easier to read and review (it is more expressive).
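For illustration, the named predicate might be defined and tested like this (the `date_created` attribute matches the lambda above; the `Row` namedtuple is just a test fixture):

```python
from datetime import date
from collections import namedtuple

def greater_than_start_of_this_year(row):
    # True when the row was created on or after 1 January of the current year
    return row.date_created >= date(date.today().year, 1, 1)

# totally isolated from any calling context, so trivially testable
Row = namedtuple('Row', 'date_created')
assert greater_than_start_of_this_year(Row(date.today()))
assert not greater_than_start_of_this_year(Row(date(1999, 1, 1)))
```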

This expressiveness is sometimes described as ‘declarative’ where the non-expressive form is sometimes called ‘imperative’.  You should always strive to write declarative code as it is easier to read.

One of the best ways I have found to write declarative code is to use fluent interfaces. These interfaces were popularized by jQuery and later by .Net Linq expressions, among others. A sample fluent jQuery snippet is:

$("#some-div")
    .css("color", "blue")
    .append("Some new text");

It’s funny, but this ‘fluent’ style of programming was slammed in the late 90s as error prone: Martin Fowler identified ‘Message Chains’ as a code smell that should be remedied. However, I have found totally the opposite: fluent programming interfaces are easier to read, and that means fewer bugs.
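The mechanics of a fluent interface are simple: each method does its work and returns `self`, so calls chain. A minimal, illustrative sketch in Python (the `Dataset` class is hypothetical):

```python
class Dataset:
    """Minimal fluent wrapper around a list of rows (illustrative only)."""
    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, pred):
        self.rows = [r for r in self.rows if pred(r)]
        return self  # returning self is what enables chaining

    def take(self, n):
        self.rows = self.rows[:n]
        return self

# reads top-to-bottom like a description of intent
evens = Dataset(range(10)).filter(lambda r: r % 2 == 0).take(3).rows
# evens == [0, 2, 4]
```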

How can this be applied to machine learning?  Easy, have a look at the following code:

# load training data
classifier = linear_model.LogisticRegression()
X, y = load_train_X_and_y(6e6)

# replace missing values with the mode for
#   categorical and 0 for continuous features
X = X.missing('mode', 0)

# split the training set into categorical and
# numerical features
X_categoricals = X[X.categoricals()]
X_numericals = X[X.numericals()]

# do some feature engineering (add log and linear combinations
# for all numerical features). Scale the numerical dataset and
# append one hot encoded categorical features to this dataset.
# Then cross validate using the LogisticRegression classifier and
# 1 million samples.
X_numericals.engineer('(lg)').engineer('(*)').\
  scale().\
  append(X_categoricals.one_hot_encode()).\
  cross_validate(classifier, 1e6)

The comments in the above code are totally redundant; the code pretty much documents itself:

classifier = linear_model.LogisticRegression()
X, y = load_train_X_and_y(6e6)
X = X.missing('mode', 0)

X_categoricals = X[X.categoricals()]
X_numericals = X[X.numericals()]

X_numericals.engineer('(lg)').engineer('(*)').\
  scale().\
  append(X_categoricals.one_hot_encode()).\
  cross_validate(classifier, 1e6)

I would then add a comment at the end of this code block, something like:

# 0.98 +/- 0.001 – took 2.5 minutes

Then commit this experiment to git.

The fact that I can trust my reusable code means I only have to review the code written here, and given the expressiveness of the code, finding bugs is usually very straightforward.

After several experiments this is what a source file will look like.  See how easy the code is to read, and how simple it is to review past experiments and think about what works and what does not.

classifier = linear_model.LogisticRegression()
X, y = load_train_X_and_y(6e6)
X = X.missing('mode', 0)

X.cross_validate(classifier, 1e6)
# 0.92 +/- 0.0002

X.cross_validate(classifier, 1e6)
# 0.90 +/- 0.001

X.cross_validate(classifier, 1e6)
# 0.86 +/- 0.003

My wrapper for pandas and scikit-learn is available here and depends on the naming conventions described here.  But I encourage you to write your own: you need confidence in your code, and the only way to achieve that is to write and test it yourself to your own level of comfort.

Naming Conventions in Predictive Analytics and Machine Learning

In this article I am going to discuss the importance of naming conventions in ML projects. What do I mean by naming conventions?  I mainly mean using descriptive ways of labelling features in a data set.  What is the reason for this?  Speed of experimentation.

Naming Conventions

  • Categorical columns start with ‘c_’
  • Continuous (numerical) columns start with ‘n_’
  • Binary columns start with ‘b_’
  • Date columns start with ‘d_’

Examples of Benefits

Once your dataset is labelled clearly with these conventions, experimenting with features becomes very fast.
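The helpers used below (`one_hot_encode`, `engineer`, etc.) come from the author's own wrapper, but prefix-based selection along these lines might look as follows, assuming pandas (a sketch, not the actual wrapper):

```python
import pandas as pd

def categoricals(X):
    # the 'c_' prefix marks categorical columns
    return [c for c in X.columns if c.startswith('c_')]

def numericals(X):
    # the 'n_' prefix marks continuous (numerical) columns
    return [c for c in X.columns if c.startswith('n_')]

def one_hot_encode(X):
    # one-hot encode every categorical column; other columns pass through
    return pd.get_dummies(X, columns=categoricals(X))

X = pd.DataFrame({'c_colour': ['red', 'blue', 'red'], 'n_age': [10, 20, 30]})
encoded = one_hot_encode(X)
# encoded now has n_age plus the dummy columns c_colour_blue / c_colour_red
```

Because the column type is encoded in the name, no separate schema or metadata file needs to be maintained alongside the dataset.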

cv = functools.partial(do_cv, LogisticRegression(), n_folds=10, n_samples=10000)
cv(one_hot_encode(X), y)  # One hot encode all categorical features
cv(contrasts(X), y)       # Do simple contrast coding on all categorical features
cv(bin(X, n_bins=100), y) # Split all continuous features into 100 bins
X = engineer(X, 'c_1(:)c_2') # Create a new categorical feature combining two others
X = engineer(X, 'n_1(*)n_2') # Create a combination of 2 numericals (by multiplication)
X = engineer(X, 'n_1(lg)')   # Create a log of feature 'n_1'
X = engineer(X, '(^2)')      # Create a square feature for each numerical feature
X = engineer(X, '(lg)')      # Create a log feature for each numerical feature

In a real world example this would look something like:

X = remove(X, dates=True)
for n1, n2 in combinations(X, group_size=2, numericals=True):
    X = engineer(X, n1 + '(*)' + n2)
for c1, c2 in combinations(X, group_size=2, categoricals=True):
    X = engineer(X, c1 + '(:)' + c2)
X = engineer(X, '(^2)')
X = engineer(X, '(lg)')
cv(X, y)
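For illustration, here is a hypothetical sketch of how an `engineer`-style mini-DSL could be parsed, assuming pandas and numpy (the real wrapper's behaviour may well differ):

```python
import numpy as np
import pandas as pd

def engineer(X, spec):
    # hypothetical parser covering three of the spec forms shown above;
    # the all-column forms like '(^2)' are omitted for brevity
    X = X.copy()
    if '(*)' in spec:            # product of two numerical features
        a, b = spec.split('(*)')
        X[spec] = X[a] * X[b]
    elif '(:)' in spec:          # combination of two categorical features
        a, b = spec.split('(:)')
        X[spec] = X[a].astype(str) + ':' + X[b].astype(str)
    elif spec.endswith('(lg)'):  # log of one numerical feature
        a = spec[:-4]
        X[spec] = np.log1p(X[a])
    return X

X = pd.DataFrame({'n_1': [1.0, 2.0], 'n_2': [3.0, 4.0]})
X = engineer(X, 'n_1(*)n_2')
# the new column 'n_1(*)n_2' holds [3.0, 8.0]
```

Note that the spec string itself becomes the new column name, so engineered features remain self-describing under the same naming conventions.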


The resulting DSL from using good naming conventions leads to very clear code that relates directly to the data-munging operations being performed.  Another benefit is that once your ‘one_hot_encode’ method is written and tested you can trust it for future projects (as long as they use the same naming conventions).

Using private partial classes to hide implementation details of an interface. Workaround for package level protection in C#

I miss very few things from the Java language; one gem I really miss is the package-private accessibility modifier. It was so useful: your IDE colour-coded package-private classes so you knew they were not part of the public API. You could skim the files in a package (namespace) and see exactly what you needed to look at, ignoring all low-level implementation details.

This unfortunately is not in C#; the closest C# gets is the internal modifier. I personally really dislike this modifier, as I think it has contributed to the nightmare of 100-200 project solutions that are so common amongst some .Net shops.

This pattern is an alternative. I think it's a very common alternative, but recently during a code review I explained it to someone who appreciated it, so I thought I'd write it up.

Often, C# developers will do this kind of encapsulation using nested private classes. I have a big problem with this as it leads to those 2-3k line files which are unintelligible. So why not just make those nested classes private partials? Let’s see how this would work.

Let’s assume we have a namespace Clown whose responsibility is to create clowns for customers (i.e. like a clown booking service for kids parties). The customer basically fills in the details that their clown should have and then books a clown for their party.

The details are specified using an instance of ClownSpecifications:

public class ClownSpecifications {
  public bool KidFriendly { get; set; }
  public bool Fun { get; set; }
  public bool Scary { get; set; }
  public bool Creepy { get; set; }
}
The clown itself is simply an implementation of the IClown interface. This interface is the only thing the user ever sees.

public interface IClown {
  void DoYourThing();
}
And then we need a clown factory that builds clowns based on the provided specifications:

public partial class ClownFactory {
  public IClown CreateClown(ClownSpecifications specs) {
    if (specs.Creepy && specs.Scary) { return new ThatClownFromStephenKingsBook(); }
    if (specs.Creepy) { return new TheJoker(); }
    if (specs.KidFriendly && specs.Fun) { return new Bobo(); }
    if (specs.Fun) { return new RudeClown(); }
    return new GenericBoringClown();
  }

  private partial class ThatClownFromStephenKingsBook {}
  private partial class TheJoker {}
  private partial class Bobo {}
  private partial class RudeClown {}
  private partial class GenericBoringClown {}
}

A few things to notice here. The first is that the ClownFactory itself needs to be marked partial:

public partial class ClownFactory

This is required simply because there is no way to create top level private partial classes.

Secondly, the implementation classes are defined in a super minimalistic fashion:

private partial class ThatClownFromStephenKingsBook {}

They don’t even implement the IClown interface in this definition.

So now an implementation of IClown looks like this:

public partial class ClownFactory {
  private partial class ThatClownFromStephenKingsBook : IClown {
    public void DoYourThing() {
      // ...
    }
  }
}
That’s it; this is actually working code. And the great thing about it is that your namespace is now much cleaner.

You can now easily tell that the public API of the namespace is IClown, ClownSpecifications and ClownFactory. To clean this up even more you could create a new directory called impl and hide the implementations there; I personally do not do this, as Resharper then starts yelling at me about mismatched namespaces.

Solid Principles: Part Two – Open Closed Principle

‘Software entities (classes, modules, functions, etc.) should be open for extension but closed for modification.’ [R. Martin]

The OCP is a set of strategies based on inheritance and polymorphism that aims to make code more extensible with fewer side effects when extending. The way side effects are controlled is by adding functionality to the system without modifying any existing code.
The key to the OCP is programming to abstractions, i.e. interfaces and base classes. But not just programming to abstractions: doing it well, without falling into the common pitfalls that violate the OCP.
An example is in order. Let’s assume we have a Library stock management system. The system has several types of Stock items. These could be: Magazines, Periodicals, Books, DVDs, CDs, etc. Each of these items may have their own checkout rules and we may program this as follows:

class OrderProcess:
  def checkout_item(item):
    var due_date
    switch (item.type)
      case ITEM_TYPE_MAGAZINE:
        due_date = now.AddWeeks(2)
      case ITEM_TYPE_DVD:
        due_date = now.AddWeeks(1)
      case ITEM_TYPE_BOOK:
        due_date = now.AddWeeks(6)

This is a naive implementation, because every time a new stock item type is added we will need to add a case to this statement (and to any other switch statement that switches on item.type). A better implementation, one that respects the OCP, would be:

interface ItemType:
  def get_due_date(from)
  def checkout()

And we could have implementations like this:

class MagazineItemType implements ItemType:
  def get_due_date(from):
    return from.AddWeeks(2)

  def checkout():
    var due = this.get_due_date(now)
    // Any other checkout processes applicable to Magazines

class DVDItemType implements ItemType:
  def get_due_date(from):
    return from.AddWeeks(1)

  def checkout():
    var due = this.get_due_date(now)
    // Any other checkout processes applicable to DVDs

class OrderProcess:
  def checkout_item(ItemType item):
    item.checkout()

So now, if for any reason we need to add a new item type, say Blu-ray, we can just create a new class and the system will handle it without modifying any existing code.
The reason the system now ‘magically’ works with a new item type (Blu-ray) without modifying any code is that the high-level functions of the system do not know anything about Magazines, DVDs, etc. They simply know about the abstraction, the ItemType interface.
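The same design can be sketched as runnable Python (the week counts follow the pseudocode above; `BluRayItemType` shows the extension point):

```python
from abc import ABC, abstractmethod
from datetime import date, timedelta

class ItemType(ABC):
    @abstractmethod
    def get_due_date(self, start):
        ...

class MagazineItemType(ItemType):
    def get_due_date(self, start):
        return start + timedelta(weeks=2)

class DVDItemType(ItemType):
    def get_due_date(self, start):
        return start + timedelta(weeks=1)

# Added later: no existing code had to change
class BluRayItemType(ItemType):
    def get_due_date(self, start):
        return start + timedelta(weeks=1)

class OrderProcess:
    def checkout_item(self, item):
        # knows only the ItemType abstraction, never the concrete types
        return item.get_due_date(date.today())

due = OrderProcess().checkout_item(MagazineItemType())
```

`OrderProcess.checkout_item` compiles and runs unchanged no matter how many item types are added; that is the 'closed for modification' half of the principle.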
As we’ve seen, switch statements or long if chains can be a smell that you’re violating the OCP. Other signs include having code like this in your system:

  def checkout_item(item):
    if (item.type is ITEM_TYPE_BLU_RAY)
      throw error ('Blu-ray cannot be checked out')

Having high-level functions that are even remotely aware of concrete types is an indication of future heartache. It is also important to note that even low-level types should not know about each other; for instance, a Magazine should not know about DVDs. These relationships are also violations of the OCP.
It is important to note that all of these techniques are ways to manage the complexity of source code. Inheritance and polymorphism are themselves complex tools, so as always, you need to be judicious in your use of inheritance hierarchies. Adhere to the OCP when you think part of a system is likely to change. For instance, in the example above it is perfectly reasonable to expect new stock item types in the future, so creating a hierarchy of these types is a good idea. Other areas of the system that are not likely to change can violate the OCP. The point is not to be religious about any technique, but if you are going to ignore an OCP violation (or any other guideline) do so consciously.

Solid Principles: Part One

Over the coming weeks I plan to do a bit of a study on the SOLID principles.  SOLID stands for:

  • Single Responsibility
  • Open-Closed
  • Liskov Substitution
  • Interface Segregation
  • Dependency Inversion

The term was coined by Robert Martin [http://cleancoder.posterous.com/].

The five principles, if used judiciously, should result in code that is easier to maintain: highly decoupled, and allowing specific implementation details to change without (or with less) friction.

Like every principle/guideline in software development, the SOLID principles need to be understood, not used blindly.  It is very easy to over-architect a solution by being too dogmatic about any guideline.  You do, however, need to be aware when a violation of a SOLID principle occurs, and make your decision based on its context and merits.

Single Responsibility Principle – SOLID Principles

Robert Martin describes the Single Responsibility Principle (SRP) as: “A class should have only one reason to change” (1).  I think the best way to get our heads around this concept is to view some code.  So let’s consider the following example, which is a business-rules object that defines how jobs are handled in an issue-tracking system.

class JobHandler(db, query_engine, email_sender):
  this.db = db
  this.query_engine = query_engine
  this.email_sender = email_sender

  def add_job(job):

  def delete_job(job):

  def update_job(job):

  def email_user_about_job(job):
    this.email_sender.send(job.get_html_details(), job.user.email)

  def find_all_jobs_assigned_to(user):
    return this.query_engine.run("select all jobs assigned to: ", user)

  def find_all_completed_jobs(user):
    return this.query_engine.run("select all jobs with status: ", "completed")

So, what is the jobs handler doing?

  • Doing basic CRUD operations on the jobs (add/delete/update).  We could also assume that validation is done in these methods.
  • Doing queries on jobs.  These could potentially get very complex if we add pagination support, etc.
  • Doing workflow functions, such as email users.

Let’s critically review this code.  What can we see?

  • There are 3 dependencies (db, query_engine and email_sender)
  • There is low cohesion (http://en.wikipedia.org/wiki/Cohesion_(computer_science)), which is the ‘smell’ Robert Martin was trying to address with this principle.  Low cohesion basically means that some dependencies are only used by part of a class, and it is usually an indication that the class is doing too much (i.e. violates the Single Responsibility Principle).
  • Names like Handler, Controller, Manager, Oracle and Deity are all indications of a class that may be too loosely defined, and which in turn may have too many responsibilities.
  • If we wanted a unit test to test the workflow of the system we would also need to instantiate a db and a query_engine dependency.  This adds friction to our tests and usually results in poor test coverage.

I think it’s clear that the above object has 3 obvious responsibilities:

  • Performing validation and CRUD like operations on a job
  • Performing complex queries on jobs
  • Managing workflows as they relate to jobs

So perhaps a better design would be something like:

class JobRepository(db):
  this.db = db

  def add_job(job):

  def update_job(job):

  def delete_job(job):

class JobFinder(query_engine):
  this.query_engine = query_engine

  def find_all_jobs_assigned_to(user):
    return this.query_engine.run("select all jobs assigned to: ", user)

  def find_all_completed_jobs(user):
    return this.query_engine.run("select all jobs with status: ", "completed")

class JobWorkFlow(email_sender):
  this.email_sender = email_sender

  def email_user_about_job(job):
    this.email_sender.send(job.get_html_details(), job.user.email)

So let’s critically analyse this code.

  • We can see we have increased the number of classes to 3.  This arguably increases complexity of the system as it adds modules that need to be understood.
  • We can see that each class is highly cohesive and very small and focused.  This is a good thing.
  • We can see that any unit test only has a single dependency to initialise or mock to test a class.  This will encourage developers to keep the test quality up to a good standard.
  • If we place these 3 classes in a well named namespace such as ‘jobs’ it could in fact ease the complexity of the system (contradicting the first item in this list).  As we could just browse the file names without even opening them to know exactly what functions are done by each class.
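To make the testability point concrete, here is a sketch in Python of the refactored JobWorkFlow with a single fake dependency (the fake classes are illustrative test fixtures only):

```python
class JobWorkFlow:
    def __init__(self, email_sender):
        self.email_sender = email_sender

    def email_user_about_job(self, job):
        self.email_sender.send(job.get_html_details(), job.user.email)

# a fake sender is the ONLY dependency the test needs to set up
class FakeEmailSender:
    def __init__(self):
        self.sent = []

    def send(self, body, to):
        self.sent.append((body, to))

class FakeUser:
    email = 'a@b.com'

class FakeJob:
    user = FakeUser()
    def get_html_details(self):
        return '<p>details</p>'

sender = FakeEmailSender()
JobWorkFlow(sender).email_user_about_job(FakeJob())
# sender.sent == [('<p>details</p>', 'a@b.com')]
```

Compare this with the original JobHandler, where the same test would have required a db and a query_engine even though neither is used by the workflow code.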


Conclusion? Well there really is no conclusion.  It is important to realise that this is a trivial example whose responsibilities were obvious.  Many times separating concerns is not as easy and decoupling these concerns may be very difficult.

In the example above I would comfortably say that the refactored code is better than the original, but this may not be the case in a real-world example. Now, when you see a class that has low cohesion, too much responsibility, too many reasons to change, or too many dependencies, you can recognise this as a smell and a violation of the SRP.  You can then make an educated decision as to whether refactoring the code will result in better, cleaner, more maintainable code.

On the other hand, refactoring is a hard process, and the more you do it the easier it becomes; so do not be scared to take a little bit of time to refactor something like this.  You will find that the case for not fixing SRP violations becomes less compelling.

A faster, better sql server and sql azure log appender for log4net

After much effort trying to get the default DatabaseAppender working in log4net, I decided to write my own. So, with the help of one of my alpha geeks (thanks Chinsu), we created this database appender for log4net (it's awesome because it uses batch inserts and actually works on Azure).


Use at your own risk: I did have to modify the code slightly to remove an internal dependency and I did not test it in prod after the modification.  However, the modification was very minor and should not cause any issues.

Also remember to schedule a service that deletes your old log files.

Guido Tapia

Understanding Nancy – Sinatra for .Net

For a project we are currently working on here at PicNet we decided to forgo the bloat of ASP.Net and Mvc and go for a super lightweight web platform. We tried Kayak, but it was a little too ‘bare’, so we shifted our attention to Nancy, which is a Sinatra clone (well, “inspired” project) for .Net.

We’ve now been working with Nancy for a few weeks and have thoroughly enjoyed the experience. One of the tougher parts of getting started with Nancy is the lack of documentation and tutorials, so I thought I would put what I’ve learned over the last week or so down on paper in the hopes that others may benefit.

This post is an introduction to the internal components of Nancy and how the nuts and bolts work (from a high level). In upcoming posts I hope to delve deeper into some of these areas and do a tutorial to bring all this into context.

Capturing http requests and sending responses back to the client is left to the ‘Host’. The host is really a web server, and Nancy comes pre-bundled with a couple of options: you can use IIS or WCF, and Nancy also ships its own “Self” implementation of the low-level http hosting functionality. Both the WCF and Self hosts can be used in windows services and in command-line apps.

The host is also responsible for initialising and delegating request processing to the NancyEngine. Internally (and hence not really important unless you are developing Nancy itself), the host usually uses a NancyBootstrapper to initialise the engine. The bootstrapper is responsible for initialising all modules of the system; we will iterate through most of those modules in this document.

Example: Setting up a WCF host with http and windows authentication binding:

WebHttpBinding whb =
  new WebHttpBinding(WebHttpSecurityMode.TransportCredentialOnly);
whb.Security.Transport.ClientCredentialType = HttpClientCredentialType.Windows;

var host = new WebServiceHost(new NancyWcfGenericService(), new Uri(serverUrl));
host.AddServiceEndpoint(typeof (NancyWcfGenericService), whb, "");
ServiceMetadataBehavior smb = new ServiceMetadataBehavior {
  HttpGetEnabled = true
};
host.Description.Behaviors.Add(smb);
host.Open();

Internally, Nancy uses the bootstrapper to initialise all required modules and the DI container that will be used by other core modules. The default Nancy project comes with its own IoC implementation but if you are a fan of other IoC containers then Nancy comes pre-bundled with support for Ninject, StructureMap, Unity and Windsor.

The NancyEngine is the brains that the Host delegates requests to. The engine manages the request lifecycle. When the Engine receives a request from the Host, the engine:

  • Creates the request context
  • Invokes the pre-request hook (if specified). If the pre-request hook returns a Response then no routing is done, as Nancy assumes the request has been processed.
  • If the pre-request hook does not return a Response then Nancy looks for a valid Route to handle this Request. Routes are managed by Modules. (Note: Modules also have pre/post request hooks described below).
  • The post-request hook is then called with the current context.

If you are not writing your own bootstrapper I would leave the pre/post hooks alone and use the NancyModule’s Before/After pipeline support.

The NancyContext has a reference to the Request, the Response (once it has been resolved) and the context Items. The context Items is just an in-memory dictionary, ideal for storing request-wide data.

The Request is a pretty standard http request having a Body, Cookies, Session, Files, Form, Headers, the Method (get/post/etc), Protocol, Query and Uri details.

The module(s) register the routes handled by the system. Each module can have a root path (module path) meaning that it is only responsible for that path branch.

Modules can also define pre-request hooks by adding items to the Before member; post-request hooks are added using the After member. The rules for pre/post hooks are the same as for the NancyEngine hooks. Basically, if a pre-request hook returns a response then processing of that Request ends and the Response is passed to the post-request hook.

The module also has access to the current context.

Example: Setting up simple routes in a custom NancyModule

public class MyModule : NancyModule {
  public MyModule() {
    Get["/"] = x => GetMainWindow();
    Get["/resources/{name}"] = x => Response.AsFile(x.name);
    Get["/dl/{filePath}"] = x =>
      GetDownloadResponse(Response.AsImage(x.filePath), x.filePath);
    Post["/open/{filePath}"] = x =>
      GetDownloadResponse(Response.AsImage(x.filePath), x.filePath);
    Post["/ul/{dirid}/{fileName}"] = x => UploadFile(x.dirid, x.fileName);
  }

  private Response GetDownloadResponse(Response r, string filePath) {
    r.Headers.Add("Content-Disposition",
      "attachment;filename=" + new FileInfo(filePath).Name);
    return r;
  }
}

Each request has to return a Response to the client. There are implicit casts in place that let you return several other object types; all non-Response types are simply wrapped in a Response. The Response has the StatusCode, Headers, Cookies, ContentType and the Contents (as a stream).

The module also has access to helper methods that you can use to create Responses. These are:

  • Response.AsFile
  • Response.AsCss
  • Response.AsImage
  • Response.AsJs
  • Response.AsJson
  • Response.AsRedirect
  • Response.AsXml

Views / ViewEngines
Nancy comes pre-packed with support for NDjango, Razor and Spark. Of course, you could also just serve html.

Example: Delegating rendering of model data to the current View engine:

Get["/"] = x => {
  IEnumerable<Event> model = ...; // load the model data
  model = model.OrderBy(e => e.EventDate).Take(10);
  // Current view engine will render this model
  return View["views/index", model.ToArray()];
};

That is pretty much every important module in Nancy. Who would have thought that a fully functional web framework could be conceptually so simple? Next bat-time we will go through a step-by-step tutorial on getting started with Nancy.


Guido Tapia Software Development Manager PicNet

PicNet’s Closure Library Controls Now Public!!!

It gives me great pleasure to announce that PicNet’s closure control library is now public.  We have started with only 2 of our controls, but over the next few weeks we will port all of our control code to this project.

For a demo just visit http://picnet.github.com/picnet_closure_repo/demos/.

The google closure library is a project that I think is greatly under-represented, and I hope this ongoing contribution does something to fix that.

Currently the library includes a Date Range Picker and an ‘off the page’ slide-in/out panel.  See the demo for the full details.

This demo will also be updated every time a new Control/Component is added.


Guido Tapia

Google Closure Template Project – Getting Started Quickly with Google Closure Tools

One of the hardest things about getting started properly with Google Closure Tools is the huge amount of framework boilerplate that needs to be organised before you can write your first line of code. I have found that the effort is definitely worthwhile, but the process can definitely be simplified.  These views are shared by many in the community, and projects like plovr from Michael Bolin (author of the most awesome Closure: The Definitive Guide) offer alternatives to get started in a more efficient and intuitive way.

I created this sample project to get you going on your own closure project very quickly.  I did not use plovr as I’m not a fan of the additional server (hence additional deployment step, hence one more thing that can go wrong) design.

To get started with this sample project you need to do the following:

  1. Download this zip and extract somewhere on your dev box <installdir>.
  2. Download (if required) python 2.6 or 2.7.  Note: do not download 3+ as the calcdeps.py script does not work in 3 without modifications.
  3. SVN Checkout ‘http://closure-library.googlecode.com/svn/trunk’ to <installdir>\lib\closure-library (which is currently empty).  You may want to set up externals if you are using svn in your project.  Once this step is completed you should have the following directory tree in your project <installdir>.
    • <installdir>\src\…
    • <installdir>\lib\soy\…
    • <installdir>\lib\closure-library\
    • <installdir>\lib\closure-library\closure\…
    • <installdir>\lib\closure-library\third_party\…
  4. Edit <installdir>\build.bat, ensure python install path is correct
  5. Run <installdir>\build.bat
  6. If everything above worked, both <installdir>\index.html and <installdir>\index.compiled.html will show the following text: 'Sum. Expected Value: 15. Actual Value: 15'.

That's it, you have a running closure project.  You can now use goog.require / goog.provide in your own source and have the above scripts manage all your dependencies.  You can also use soy templates without any additional work.

Now just look around the code and familiarise yourself with the structure of the code.  I’ll go through the main items here:

build.bat: The build file.  I used a windows batch file as using Ant is too java specific, using NAnt or MSBuild is too .Net specific.  Apologies to non win people but changing this to any other build tool should be very easy.

externs.js: If your source requires access to external APIs then they need to be defined in this file.  See here for documentation on extern files (http://code.google.com/closure/compiler/docs/api-tutorial3.html)

index.html: A development html file.  This is used in development mode to debug your application.

index.compiled.html: Production html file.  This file includes the compiled javascript and soy files

requirements.js: This file defines the entry point into the application.  This is required as the closure compiler and calcdeps.py do not accept html files as valid input files.

deps.js: This file is created during the build process and its role is to define the dependencies of the application.  This dependency tree is used in development mode (index.html) to load all the necessary js files.

lib\: Contains the compiler, the closure library, soy library files, etc

lib\closure-library\: An svn extern link to http://closure-library.googlecode.com/svn/trunk (You need to set this up yourself as its 20+ MB)

src\: Your source code.  It is recommended you use a Java-like namespace directory structure.  Eg. class picnet.ui.tools.Grid should live in the src\picnet\ui\tools\Grid.js file.  Note: Google uses lowercase file names (i.e. grid.js) but this makes little sense to me so I choose to spit in the face of conventions.

Thanks All

Guido Tapia

IndexedDB Google Closure extern file

I’ve been doing some work with IndexedDB (on Mouse Eye Tracking) recently and created a closure extern file (download) to help with the process.

Hope this helps someone :)

Please let me know if someone would like me to create an open source project somewhere to continue improving this.

Note: The specs and FF implementation appear to be changing very quickly so this is just a guide as I’m sure it will be out of date soon.

Please assume MIT license.


Guido Tapia