Semantic Linguine

Saturday, January 26, 2013

There's a Bug in My Test!

Since becoming sold on TDD a while ago, I've noticed something bothersome. It's hard to write correct automated unit tests!

Unit test code is just as likely as any other program to contain bugs, logic errors, and defects. So code that passes the tests may still have bugs. Damn, all that work for nothing!

Not quite. There are plenty of benefits of writing unit tests prior to coding the actual deliverable product.

Writing unit tests first forces me to write the product code in a more testable, more modular style. This happens to be a more maintainable style also. So, it's easier to find and fix the inevitable bugs when they raise their dirty little heads.

Designing the unit tests also makes me think about the program more deeply. I give more thought to the use cases, the range of possible inputs, the required outputs, and the possible exceptions. In short, it makes me more thorough.

So I'm sticking with TDD.

Friday, August 13, 2010

Interview With Steve Vinoski

This is a fantastic interview explaining some of the awesomeness of the Erlang programming language as well as the REST architecture:

http://www.infoq.com/interviews/steve-vinoski-erlang-rest#

Friday, May 21, 2010

Self Documenting Code (Because I Hate Maintaining Comments)

Suppose you coded up some phone switch record parsing software a few months ago. Now you're modifying it, and you run across this method. What kind of records does it skip?


 /**
  * @return true if we should skip (NOT add to the database) the record.
  */
 public boolean skipEntity() {
   
     if ("119".equals(Call_Type_Code)
             &&  !Incoming_Trunk_Id.startsWith("10101")) {
         return true;
     }
   
     if ("005".equals(Call_Type_Code)
             &&  "00001".equals(Structure_Code)) {
         return true;
     }
   
     return false;
 }

You probably wouldn't know without looking it up in your data dictionary. So let's add some comments:


 /**
  * @return true if we should skip (NOT add to the database) the record.
  */
 public boolean skipEntity() {
   
     // Skip terminating access
   
     if ("119".equals(Call_Type_Code)
             &&  !Incoming_Trunk_Id.startsWith("10101")) {
         return true;
     }
   
     // Skip local calls
   
     if ("005".equals(Call_Type_Code)
             &&  "00001".equals(Structure_Code)) {
         return true;
     }
     return false;
 }

Oh ok, we skip terminating access and local call records. That was easy.

Problem: Now we have comments to maintain in addition to the code. Comments can be as hard to maintain as code. You have to make sure they stay with the correct block of code, that they still correctly describe the code, etc.

So instead, let's make the code self-documenting by splitting it up into more methods.


 /**
  * @return true if we should skip (NOT add to the database) the record.
  */
 public boolean skipEntity() {
   
     if (isTerminatingAccess()) {
         return true;
     }
   
     if (isLocalCall()) {
         return true;
     }
   
     return false;
 }

 public boolean isTerminatingAccess() {
     return ("119".equals(Call_Type_Code)
             &&  !Incoming_Trunk_Id.startsWith("10101"));
 }

 public boolean isLocalCall() {
     return ("005".equals(Call_Type_Code)
             &&  "00001".equals(Structure_Code));
 }

Is it a little more code? Yes. Is it more readable and maintainable code? Yes.
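Once the conditions have names, the top-level method can shrink even further. Here's a minimal sketch, assuming a hypothetical SwitchRecord class that holds the three fields used above. The class name and constructor are made up for illustration; only the field names and the two checks come from the code in this post.

```java
// Hypothetical record class: the field names follow the post,
// the class name and constructor are assumptions for this sketch.
class SwitchRecord {
    private final String Call_Type_Code;
    private final String Structure_Code;
    private final String Incoming_Trunk_Id;

    SwitchRecord(String callTypeCode, String structureCode, String incomingTrunkId) {
        this.Call_Type_Code = callTypeCode;
        this.Structure_Code = structureCode;
        this.Incoming_Trunk_Id = incomingTrunkId;
    }

    /**
     * @return true if we should skip (NOT add to the database) the record.
     */
    public boolean skipEntity() {
        // Reads like the comment it replaces.
        return isTerminatingAccess() || isLocalCall();
    }

    public boolean isTerminatingAccess() {
        return "119".equals(Call_Type_Code)
                && !Incoming_Trunk_Id.startsWith("10101");
    }

    public boolean isLocalCall() {
        return "005".equals(Call_Type_Code)
                && "00001".equals(Structure_Code);
    }
}
```

Whether the one-line skipEntity() is better than the two if blocks is a matter of taste. The if blocks leave more room for adding a third rule with its own logging later.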

Wednesday, September 16, 2009

Parameterize Your SQL

Just read this from Antonio Cangiano's blog:

Parameterized queries are therefore efficient and go a long way towards preventing SQL injection attacks in your applications. They have virtually no downside.

Newbie developers often ignore the existence of this feature and end up irritating seasoned DBAs who have to deal with the consequences of their incompetence.

And all this time I've been avoiding parameterized queries because I thought creating prepared statements was less efficient. Turns out they get cached though. Doh!

After reading these brief articles, here and here, I vow to change my ways.
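For anyone else still concatenating SQL strings, here is a minimal JDBC sketch of the difference. The customers table, the name and id columns, and the class and method names are all made up for illustration. findByName() assumes you already have an open java.sql.Connection from whatever driver you use, and whether the prepared statement actually gets cached depends on your driver and database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; table and column names are assumptions.
class CustomerQueries {

    // The habit to break: user input becomes part of the SQL text itself.
    static String concatenatedSql(String name) {
        return "SELECT id FROM customers WHERE name = '" + name + "'";
    }

    // The parameterized version: the SQL text never changes,
    // and the input is sent separately as data.
    static final String PARAMETERIZED_SQL =
            "SELECT id FROM customers WHERE name = ?";

    static List<Integer> findByName(Connection conn, String name) throws SQLException {
        List<Integer> ids = new ArrayList<Integer>();
        // try-with-resources closes the statement and result set for us.
        try (PreparedStatement ps = conn.prepareStatement(PARAMETERIZED_SQL)) {
            ps.setString(1, name); // parameter indexes start at 1
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(Integer.valueOf(rs.getInt("id")));
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // A hostile input rewrites the concatenated query...
        System.out.println(concatenatedSql("x' OR '1'='1"));
        // ...but has no way to alter the parameterized text.
        System.out.println(PARAMETERIZED_SQL);
    }
}
```

The first line printed is a query whose WHERE clause is now always true, which is exactly the injection problem the parameterized form avoids.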

Friday, April 10, 2009

Dynamic DOM Node Element Creation in JavaScript

Lately I've been maintaining and upgrading a browser-based web app that consists mostly of JavaScript files. There's a lot of dynamic creation of DOM elements and I was getting tired of looking at line after line like this:

var elem=document.createElement('div');
elem.id = "mydiv";
elem['className'] = "padded label";
elem.appendChild(document.createTextNode("Customer Information "));
contentDiv.appendChild(elem);

That's just to create a single text node inside a div.

Or this kind of thing:

div.innerHTML="<b class='smallBold'>Line Info:</b><br />" +
   "<table class='smallLeftTable'>" +
   "<tr><th>Property</th>" + headers + "</tr>" +
   "<tr><td>Site:</td>" + siteHTML + "</tr>" +
   "<tr><td>Route:</td>" + routeHTML + "</tr>" +
   "<tr><td>Pair #:</td>" + pairNumberHTML + "</tr>" +
   "<tr><td>Pair Type:</td>" + pairTypeHTML + "</tr>" +
   "<tr><td>Status:</td>" + statusHTML + "</tr>" +
   "</table>";

That's even worse, I think. There's something buried deep within my psyche that makes me loathe using innerHTML.

So I decided I needed a slicker way to create DOM elements in my code. I remember reading that someone somewhere had created a function named $E()--in the spirit of prototype.js--for creating elements. I couldn't find that one though, so I wrote my own version of $E().

The HTML file below contains the code for $E() and some example usage, but it boils down to specifying the type of element to create, all of its attributes, and all of its child nodes in a single call. $E() then recursively builds the whole tree. The div with text, above, would be created like this:


var elem = $E('div', {id:"mydiv", className:"padded label"}, "Customer Information");

Remember: when specifying a CSS class as an attribute, don't use the keyword "class". Use "className" instead.

Here's the HTML file containing $E(), its documentation, and examples. Go ahead and copy and paste it to a file on your system, pull it up in a browser, and start modifying and playing with it. I think you'll like how much smaller your .js files can be using $E().

<html>
<head><title>$E() Test</title>
   <style type="text/css">
     table.styled {
       border-collapse: collapse;
       background-color: yellow;
     }
   </style>
</head>
<body>

<!--
   Here's the example table, created the old-fashioned way.
-->
<table id="table1" border="1">
<caption>Cups of coffee consumed by each senator</caption>
<tr>
  <th>Name</th>
  <th>Cups</th>
  <th>Type of Coffee</th>
  <th>Sugar?</th>
<tr>
  <td>T. Sexton</td>
  <td>10</td>
  <td>Espresso</td>
  <td>No</td>
<tr>
  <td>J. Dinnen</td>
  <td>5</td>
  <td>Decaf</td>
  <td>Yes</td>
</table>
<br/>
<br/>


<!--
   This div is a root for the dynamically created tables.
-->
<div id="mydiv">
</div>


<!--
   The rest of this file is the script that runs our demonstration.
-->
<script language="javascript">

//
// Here it is: $E()!
//
// Returns a DOM Element or an entire tree of DOM Elements.
// Usage:
//        $E(text) -> Text node element
//        $E(Element) -> Element
//        $E(tag, attributes, children) -> Tree of DOM Elements
//    Where:
//         text = string
//         Element = DOM Element
//         tag = String -- The type of Element, e.g. "div"
//         attributes = object -- The attributes of the Element, e.g. {id:"mydiv", height:"50px"}
//         children = text | elem | list
//       Where:
//            text = String -- Appended as a Text node.
//            elem = Element -- Appended
//            list = Array -- A list of Strings or Elements (or both), appended
//
function $E() {
   if (arguments.length == 1) {
       if (typeof arguments[0] === 'string') {
           return document.createTextNode(arguments[0]);
       } else {
           return arguments[0];
       }
   }
   var elem = document.createElement(arguments[0]);
   if (arguments[1]) {
       for (var key in arguments[1]) {
           elem[key] = arguments[1][key];
       }
   }
   if (arguments[2]) {
       if (arguments[2] instanceof Array) {
           for (var i = 0; i < arguments[2].length; i++) {
               elem.appendChild($E(arguments[2][i]));
           }
       } else {
           elem.appendChild($E(arguments[2]));
       }
   }
   return elem;
}


//
// Begin the demonstration by getting a handle to the root
// div we created above.
//
var mydiv = document.getElementById("mydiv");


//
// This demonstrates coding the same table as above using $E()
// and nested calls to $E().
//
mydiv.appendChild(
 $E("table", {id:"table2", border:"1"},
       [$E("caption", null, "Same table using $E()"),
                      $E("tr", {}, [$E("th", null, "Name"),
                                  $E("th", {}, "Cups"),
                                  $E("th", {}, "Type of Coffee"),
                                  $E("th", {}, "Sugar?")
                                 ]
                      ),
                      $E("tr", {}, [$E("td", {}, "T. Sexton"),
                                  $E("td", {}, "10"),
                                  $E("td", {}, "Espresso"),
                                  $E("td", {}, "No")
                                 ]
                      ),
                      $E("tr", {}, [$E("td", {}, "J. Dinnen"),
                                  $E("td", {}, "5"),
                                  $E("td", {}, ["Decaf"]),
                                  $E("td", {}, "Yes")
                                 ]
                      )
       ]));

// A little white space for looks.
mydiv.appendChild($E("br", {}, null));
mydiv.appendChild($E("br", null, []));


//
// Now the good stuff.
// Dynamically generate a table filled with data from an array.
//

//
// The data
//
var data = [["T. Sexton", "10", "Espresso", "No"],
            ["J. Dinnen", "5", "Decaf", "Yes"],
            ["A. Johnson", "3", "Latte", "Yes"],
            ["B. Brown", "1", "Cappuccino", "No"],
            ["C. Jones", "7", "Mocha", "Yes"]];

//
// Initialize the table
// Notice we use "className" (not "class") to set css class.
//
var table = $E("table", {id:"table3", border:"1", className:"styled"},
              [$E("caption", null, "Same table using data from an array, "+
                                   "with a few more rows and some style."),
               $E("tr", {}, [$E("th", null, "Name"),
                           $E("th", {}, "Cups"),
                           $E("th", {}, "Type of Coffee"),
                           $E("th", {}, "Sugar?")
                          ]
               )]);

//
// Populate the table from our data array
//
for (var i = 0; i < data.length; i++) {
   var tr = $E("tr", null, null);
   for (var j = 0; j < data[i].length; j++) {
       tr.appendChild($E("td", null, data[i][j]));
   } 
   table.appendChild(tr);
}

//
// Append the newly created table to mydiv
//
mydiv.appendChild(table);

// A little more white space.
mydiv.appendChild($E("br", null, null));
mydiv.appendChild($E("br", null, null));

</script>
</body>
</html>

Friday, November 21, 2008

Java MapReduce Improved

My MapReduce class is much improved, now fitting into a single class. I find that I mostly use the pmap() method for parallel processing of list elements.

Here is an example, using the same word count problem as my previous Java MapReduce post:

import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Vector;
import java.util.Map.Entry;

// MapReduce is in the same (default) package, so it needs no import.

public class WordCount {

 static final String doc1 = "This is document 1";
 static final String doc2 = "This is another document";
 static final String doc3 = "Document 3";

 public Object map(Object data) {
     String doc = (String) data;
     String[] tokens = doc.trim().split("\\s+");
     HashMap results = new HashMap();
     for (int i = 0; i < tokens.length; i++) {
         accumulate(tokens[i], results);
     }
     return results;
 }

 void accumulate(String s, HashMap acc) {
     String key = s.toLowerCase();
     if (acc.containsKey(key)) {
         Integer I = (Integer) acc.get(key);
         int newval = I.intValue() + 1;
         acc.put(key, new Integer(newval));
     } else {
         acc.put(key, new Integer(1));
     }
 }

 public Object reduce(Object input, Object acc) {
     HashMap h = (HashMap) acc;
     Collection entries = ((HashMap) input).entrySet();
     for (Iterator j = entries.iterator(); j.hasNext();) {
         Entry e = (Entry) j.next();
         Object key = e.getKey();
         Integer val = (Integer) e.getValue();
         if (h.containsKey(key)) {
             Integer oldval = (Integer) h.get(key);
             h.put(key, new Integer(val.intValue() + oldval.intValue()));
         } else {
             h.put(key, val);
         }
     }
     return h;
 }

 public static void main(String[] args) {
     Vector docs = new Vector();
     docs.add(doc1);
     docs.add(doc2);
     docs.add(doc3);
  
     HashMap results = new HashMap();
     WordCount wc = new WordCount();
  
     try {
         results = (HashMap) MapReduce.mapReduce(docs, wc, "map", wc, "reduce", results);
     } catch (Exception e) {
         e.printStackTrace();
     }
  
     System.out.println(results.toString());
 }
}

This is pretty similar to the earlier usage, but now you can pass in an instance and have access to its data and methods from inside your map function. Just be careful about synchronization if you write to any shared variable inside the map function.
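To make the synchronization warning concrete, here is a small sketch that doesn't use MapReduce at all, just plain threads. The SharedCounterDemo and Counter classes are hypothetical. Counter stands in for the instance you would pass to pmap(), with a map method that writes to fields on the shared object.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: many threads call map() on one shared instance.
class SharedCounterDemo {

    static class Counter {
        int unsafeCalls = 0;                                 // plain field: updates can be lost
        final AtomicInteger safeCalls = new AtomicInteger(); // atomic: no update is lost

        public Object map(Object input) {
            unsafeCalls++;               // read-modify-write, not atomic
            safeCalls.incrementAndGet(); // atomic increment
            return input;
        }
    }

    static Counter runThreads(int threadCount, final int callsPerThread) {
        final Counter counter = new Counter();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int j = 0; j < callsPerThread; j++) {
                        counter.map("x");
                    }
                }
            });
            threads[i].start();
        }
        for (int i = 0; i < threadCount; i++) {
            try {
                threads[i].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        Counter c = runThreads(8, 100000);
        System.out.println("atomic count: " + c.safeCalls.get());
        System.out.println("plain count:  " + c.unsafeCalls + " (may come up short)");
    }
}
```

The atomic counter always ends up at threads times calls. The plain int may or may not, depending on timing, which is what makes this kind of bug so hard to catch.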

You can also use a static method to do the mapping or folding.
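As far as I can tell, this works because Class.getMethod() also finds public static methods, and Method.invoke() ignores its first argument when the method is static. Here's a standalone sketch of just that mechanism. The StaticInvokeDemo and Upper classes and the shout method are hypothetical and not part of MapReduce.

```java
import java.lang.reflect.Method;

// Hypothetical demo of calling a static method by name via reflection.
class StaticInvokeDemo {

    // A mapper whose method is static but has the required prototype:
    //    Object method_name(Object input)
    public static class Upper {
        public static Object shout(Object input) {
            return ((String) input).toUpperCase();
        }
    }

    // Looks the method up on the instance's class and invokes it,
    // in the same way the reflection calls in this post do.
    static Object callByName(Object obj, String meth, Object input) {
        try {
            Class[] types = {Object.class};
            Method m = obj.getClass().getMethod(meth, types);
            Object[] args = {input};
            return m.invoke(obj, args); // obj is ignored for a static method
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName(new Upper(), "shout", "hello"));
    }
}
```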

Here's the new MapReduce class:

import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/**
* The MapReduce class provides static methods
* for encapsulating parallel processing.
* This class cannot be instantiated.
*
* The pmap() (parallel map) method in particular makes concurrent
* processing simple by abstracting away all the threading and
* synchronization.
*
* @author mike
*/
public class MapReduce extends Thread {

  /**
   * Concurrently maps each object A in the List inputList to a new object B by applying
   *   the method meth to every element in the list. Equivalent to calling
   *   pmap(inputList, obj, meth, 0)  I.e. with no limit on the number of threads.
   * @return ArrayList
   * @param inputList - List of objects to be mapped
   * @param obj - An instance of the class that defines meth
   * @param meth - The name of the method to be run on each object in the data list.
   *        The method must have the prototype:
   *           Object method_name(Object input)
   */
  public static List pmap(List inputList, Object obj, String meth) throws Exception {
      return MapReduce.pmap(inputList, obj, meth, 0);
  }

  /**
   * Concurrently maps each object A in the List inputList to a new object B by applying
   *   the method meth to every element in the list. Returns the new B objects in a
   *   new list. Mappings in the new list are in the same order and correspond to the
   *   objects in the original list but since each mapping is done in parallel,
   *   the evaluation order is undefined.
   *
   * @return ArrayList
   * @param inputList - List of objects to be mapped
   * @param obj - An instance of the class that defines meth
   * @param meth - The name of the method to be run on each object in the data list.
   *        The method must have the prototype:
   *           Object method_name(Object input)
   * @param maxThreads - The maximum number of threads to run at once.
   *        If 0, no limit. Use this limit to prevent OutOfMemoryErrors when
   *        processing large lists.
   */
  public static List pmap(List inputList, Object obj, String meth, int maxThreads) throws Exception {
      int size = inputList.size();
      int inc = maxThreads <= 0 ? size : maxThreads;
      ArrayList retval = new ArrayList(size);
      for (int i = 0; i < size; i += inc) {
          int end = (i + inc < size ? i + inc : size);
          List threads = createThreads(inputList, i, end, obj, meth);
          waitForThreads(threads);
          for (int j = 0; j < threads.size(); j++) {
              retval.add(((MapReduce)threads.get(j)).output);
          }
      }
      return retval;
  }


  /**
   * Calls meth(elem, accIn) on successive elements of list, starting with accIn == acc0.
   *   meth must return an accumulator which is passed to the next call.
   *   The function returns the final value of the accumulator.
   *   acc0 is returned if the list is empty.
   * @param list - The list to be folded into a single object.
   * @param obj - The instance of the class that defines meth.
   * @param meth - The name of the accumulating method.
   *        The method must have the prototype
   *          Object method_name(Object input, Object accIn)
   * @param acc0 - Initial accumulator
   * @return Object
   * @throws Exception
   */
  public static Object fold(List list, Object obj, String meth, Object acc0) throws Exception {
      Class[] types = {Object.class, Object.class};
      Method m = obj.getClass().getMethod(meth, types);
      for (int i = 0; i < list.size(); i++) {
          Object[] args = {list.get(i), acc0};
          acc0 = m.invoke(obj, args);
      }
      return acc0;
  }

  /**
   * Combines the operations of pmap and fold with no limit on the number
   * of concurrent threads.
   */
  public static Object mapReduce(List list, Object mapObj, String mapMeth, Object foldObj, String foldMeth, Object foldAcc) throws Exception {
      return mapReduce(list, mapObj, mapMeth, foldObj, foldMeth, foldAcc, 0);
  }

  /**
   * Combines the operations of pmap and fold with a thread limit.
   */
  public static Object mapReduce(List list, Object mapObj, String mapMeth, Object foldObj, String foldMeth, Object foldAcc, int maxThreads) throws Exception {
      List mapResult = pmap(list, mapObj, mapMeth, maxThreads);
      return fold(mapResult, foldObj, foldMeth, foldAcc);
  }


  static List createThreads(List list, int begin, int end, String obj, String meth) throws Exception {
      return createThreads(list, begin, end, obj, meth, true);
  }

  static List createThreads(List list, int begin, int end, Object obj, String meth) throws Exception {
      return createThreads(list, begin, end, obj, meth, false);
  }

  static List createThreads(List list, int begin, int end, Object obj, String meth, boolean isStaticMethod) throws Exception {
      ArrayList threads = new ArrayList(end - begin);
      for (int i = begin; i < end; i++) {
          try {
              MapReduce p = isStaticMethod ? new MapReduce((String)obj, meth, list.get(i)) : new MapReduce(obj, meth, list.get(i));
              threads.add(p);
              p.start();
          } catch (java.lang.OutOfMemoryError e) {
              System.err.println("Error: thread " + i);
              throw e;
          }
      }
      return threads;
  }


  static void waitForThreads(List threads) {
      for (int i = 0; i < threads.size(); i++) {
          Thread thread = (Thread) threads.get(i);
          try {
              thread.join();
          } catch (InterruptedException e) {}
      }
  }

//
// Non-static instance methods and fields
//
  Object obj;
  Method meth;
  Object input;
  Object output;

  // This class should never be instantiated except by its own static methods.
  private MapReduce(String classname, String meth, Object in) throws Exception {
      Class[] types = {Object.class};
      this.meth = Class.forName(classname).getMethod(meth, types);
      this.input = in;
  }

  // This class should never be instantiated except by its own static methods.
  private MapReduce(Object obj, String meth, Object in) throws Exception {
      this.obj = obj;
      Class[] types = {Object.class};
      this.meth = obj.getClass().getMethod(meth, types);
      this.input = in;
  }

  public void run() {
      Object[] args = {this.input};
      try {
          this.output = this.meth.invoke(this.obj, args);
      } catch (Exception e) {
          throw new RuntimeException(e);
      }
  }
}

Enjoy!

Friday, November 7, 2008

Vector v. ArrayList

Vector has always been my favorite Java List implementation. With its auto-resizing and easy iteration features, it's always been my first choice for storing data in list form. (As an aside, the new version of Sun's Java, which warns unless Vector is parameterized, seems broken to me. But I digress.) Another feature of Vector that I always regarded as a bonus is its built-in synchronization. However, I've begun to rethink that.

I've only recently begun to do much concurrent programming, usually with my MapReduce class. (I have a new, much improved version of that, which I will blog about soon.) MapReduce uses Vector but not in a way that requires synchronization. In fact, I can't think of a single place where I'm using Vector that does.

Does this matter? I think so. Being synchronized, every addition to or removal from a Vector requires a lock to be obtained and then released. Not very noticeable overhead in a small program, but lately I've been trying to shave milliseconds off the running time in each module of a program that runs for over two hours.

Enter ArrayList. I never used ArrayList before but apparently it is basically equivalent to Vector except without synchronization. So I ran a quick test of adding and then removing a million Integer objects from both Vector and ArrayList. (Care had to be taken to avoid including garbage collection time in the results.) Guess what? ArrayList ran in about 3/4 of the time.
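A rough sketch of that kind of test is below. It's a reconstruction rather than the original test code: the class and method names are invented, the Integer objects are created before the clock starts, and elements are removed from the end so the cost of shifting elements doesn't dominate. A naive timing loop like this is sensitive to JIT warm-up and garbage collection, so treat the numbers as indicative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical micro-benchmark: add n Integers, then remove them all.
class ListTiming {

    // Build the Integer objects up front so allocation is not timed.
    static Integer[] makeValues(int n) {
        Integer[] values = new Integer[n];
        for (int i = 0; i < n; i++) {
            values[i] = Integer.valueOf(i);
        }
        return values;
    }

    // Returns elapsed nanoseconds for adding every value and then
    // removing them again, last element first.
    static long timeAddRemove(List<Integer> list, Integer[] values) {
        long start = System.nanoTime();
        for (int i = 0; i < values.length; i++) {
            list.add(values[i]);
        }
        for (int i = list.size() - 1; i >= 0; i--) {
            list.remove(i); // remove(int index), not remove(Object)
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Integer[] values = makeValues(1000000);

        // Warm-up runs so the JIT has compiled both code paths.
        timeAddRemove(new Vector<Integer>(), values);
        timeAddRemove(new ArrayList<Integer>(), values);

        System.gc(); // only a hint to the JVM, not a guarantee
        long vectorNanos = timeAddRemove(new Vector<Integer>(), values);
        System.gc();
        long arrayListNanos = timeAddRemove(new ArrayList<Integer>(), values);

        System.out.println("Vector:    " + (vectorNanos / 1000000) + " ms");
        System.out.println("ArrayList: " + (arrayListNanos / 1000000) + " ms");
    }
}
```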

I think I have a new favorite Java List implementation.