November 30, 2009

Simple HTML Parser in Objective C

For my current project I needed a way to fetch remote HTML and then parse it into a more accessible data form. So I took my Java XML Parser work, ported it over to Objective C, and extended it to work with HTML, which tends to be far more messy and broken... grr. To combat this, unlike a full HTML parser, this one converts the markup to a pseudo-XML form, where all character data between < and > or /> is appended to the tag string. The downside is that you need to parse out any needed tag attributes separately, but that is a price I am willing to pay in this case.
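
Pulling attributes back out of a tag string might look something like the sketch below. It is written in Java (the language the parser was ported from) rather than Objective C, and it assumes the tag string holds the raw text between < and >, for example a href="..." class="...". TagStringSketch, tagName, and attributes are hypothetical names, not part of HTMLNode, and only quoted attribute values are handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of HTMLNode.
class TagStringSketch {

    // Matches name="value" or name='value' pairs inside a raw tag string.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([A-Za-z_:][\\w:.-]*)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    /** Returns the element name, i.e. the text before the first whitespace. */
    static String tagName(String tagString) {
        return tagString.trim().split("\\s+", 2)[0];
    }

    /** Returns the quoted attributes found in the tag string, keyed by lower-cased name. */
    static Map<String, String> attributes(String tagString) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ATTRIBUTE.matcher(tagString);
        while (m.find()) {
            // Group 3 holds a double-quoted value, group 4 a single-quoted one.
            String value = m.group(3) != null ? m.group(3) : m.group(4);
            result.put(m.group(1).toLowerCase(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        String tag = "a href=\"http://www.google.com\" class='nav'";
        System.out.println(tagName(tag));
        System.out.println(attributes(tag));
    }
}
```

Unquoted values (such as width=100) are skipped by this pattern, which may or may not matter depending on the pages being read.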

Check out the files below for the code...
HTMLNode.h
HTMLNode.m


Using the HTMLNode class should be simple enough: just import the HTMLNode.h file and then use the example below to get started. Note that this parser expects clean and valid HTML/XHTML, but most sites have some issue or mistake. This may cause you a few headaches; it did for me. Still, the parser should get most if not all of the tags, so in this case use the search function "-(HTMLNode*) search:(HTMLNode*) root: (NSString*) term" to find a containing div tag and then use getChildN to traverse the rest.

// Setup and build html node tree in root...
NSString *url = @"http://www.google.com";
HTMLNode *root = [[HTMLNode alloc] init];
[root buildFromURL: url: root];

// Get the head tag which should be root child 0...
HTMLNode *headnode = [root getChildN:0];

// The tag of the head node should be "head"...
NSLog(@"%@", [headnode getTag]);


As usual the code is free to use, but please give me some credit if it is used in a large project, or at least leave a comment about what it was used in.

November 27, 2009

Creating an Array of NSDictionary Objects

I had created an interface for the NSTableView class in Interface Builder and needed a way to update the table with items. The easiest way seemed to be with an array of NSDictionary objects. But as I was not very familiar with the NSDictionary class, I first had to look up how to create and fill one. Below is a basic example I came up with.

// Alloc and init array
NSMutableArray *array = [[NSMutableArray alloc] initWithCapacity:1];

// Setup keys
NSArray *keys = [NSArray arrayWithObjects:@"Name", @"Job", nil];

// Setup values
NSArray *values = [NSArray arrayWithObjects: @"Epic Box", @"running", nil];

// Add new NSDictionary with keys and values
[array addObject:[NSDictionary dictionaryWithObjects: values forKeys: keys]];

To get a value from an NSDictionary object you can do as follows...

//Using the array from the example above
NSInteger index = 0;
NSString *key = @"Name";

// Get value
NSString *result = [[array objectAtIndex: index] objectForKey: key];

Remote File Request to NSString in Objective C

A current side project in Obj C that I'm working on required a way to fetch a remote HTML file and parse through it to get the URLs of links and images. The first step was to get a remote file and store it as an NSString. The code below is an example of how to do so.

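// Remote file to fetch
NSString *url = @"http://www.google.com";
// Build the request for that URL
NSURLRequest *urlrequest = [ [NSURLRequest alloc] initWithURL: [NSURL URLWithString:url] ];
// Fetch synchronously; this blocks until done, and the response and error are ignored here
NSData *returnData = [ NSURLConnection sendSynchronousRequest:urlrequest returningResponse: nil error: nil ];
// Decode the returned bytes as ASCII text
NSString *returnstring = [[NSString alloc] initWithData:returnData encoding:NSASCIIStringEncoding];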

Tested on OS X 10.5+ with GCC 4.2

November 14, 2009

Simple Java XML Parser

To continue my SN Project work I started to convert the simple object-to-data save format I had been using to save time during the semester into an XML-based file system. This way, when I update code it will not break saved game files due to class def not found exceptions. However, I quickly ran into an issue: Java did not have a "simple" built-in class to handle XML parsing, and other free libraries were a little more complex than I was looking for. So although I usually try not to reinvent the wheel while programming, this time I wanted to try my hand at writing a simple Java XML parser.

What this basic XML parser does: It looks for the tag start character < and then starts appending the tag characters to the node tag. When it hits >, the tag has ended and the data starts; it then looks for < to find the next node. If that < turns out to open another tag inside the data, a nested tag has been found. The older node is pushed on a stack and the parser moves into the child node and begins the tag and data read once more. This is repeated until no nodes are left on the stack, indicating the root closing tag has been reached.
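
The steps above can be sketched roughly as follows. This is not the original parser's code, just a minimal illustration of the stack idea, and XmlSketch, Node, and parse are hypothetical names. It assumes small, well-formed input with no attributes, comments, XML declarations, or self-closing tags, and unlike the description it pushes every open element on the stack rather than only the parent of a nested tag.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical names throughout; a sketch of the approach, not the original parser.
class XmlSketch {

    /** One element: its tag name, its character data, and its child elements. */
    static class Node {
        final String tag;
        final StringBuilder data = new StringBuilder();
        final List<Node> children = new ArrayList<>();

        Node(String tag) { this.tag = tag; }
    }

    /** Parses a well-formed, attribute-free XML string and returns the root node. */
    static Node parse(String xml) {
        Deque<Node> open = new ArrayDeque<>(); // elements still waiting for a closing tag
        Node root = null;
        int i = 0;
        while (i < xml.length()) {
            char c = xml.charAt(i);
            if (c != '<') {
                // Character data belongs to the innermost open element, if any.
                if (!open.isEmpty()) open.peek().data.append(c);
                i++;
                continue;
            }
            int end = xml.indexOf('>', i);
            if (end < 0) throw new IllegalArgumentException("unterminated tag at index " + i);
            String tag = xml.substring(i + 1, end).trim();
            if (tag.startsWith("/")) {
                // Closing tag: it must match the innermost open element, which is then popped.
                String name = tag.substring(1).trim();
                if (open.isEmpty() || !open.peek().tag.equals(name)) {
                    throw new IllegalArgumentException("unexpected closing tag: " + name);
                }
                open.pop();
            } else {
                // Opening tag: attach to the current parent (if any) and push on the stack.
                Node node = new Node(tag);
                if (!open.isEmpty()) {
                    open.peek().children.add(node);
                } else if (root == null) {
                    root = node;
                } else {
                    throw new IllegalArgumentException("more than one root element");
                }
                open.push(node);
            }
            i = end + 1;
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("missing closing tag for " + open.peek().tag);
        }
        return root;
    }

    public static void main(String[] args) {
        Node root = parse("<game><player>Epic Box</player><level>3</level></game>");
        System.out.println(root.tag + " has " + root.children.size() + " children");
        for (Node child : root.children) {
            System.out.println(child.tag + " = " + child.data);
        }
    }
}
```

An empty stack after a closing tag means the root has been closed, which matches the stopping condition described above.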

As a note: I have removed most of the advanced checking and my custom XML build methods for security reasons, so you are on your own to add try/catch as needed. Also, as I am not using any attributes in tags, this parser does not read them.

Overall the code works well for the XML documents I am reading in, though for one class I may still look into a faster C-based solution in the future, since I am working with a JNI library anyway.

You can check out the code HERE
As usual the code is free to use, but please give me some credit if it is used in a large project, or leave a comment about what it was used in.