/dev/null

Static is evil.

February 28, 2011

Communications of the ACM 01/2011

Tags: , — 18:29

While in Chicago, I found the time to read through the latest issue of Communications. Here’s a short summary:

  • “Where have all the workshops gone?” by Moshe Y. Vardi – he basically complains that today’s workshops (“second-rate conferences”) are not what they used to be (“informal gatherings of researchers”). My suggestion: go to an unconference, although these are informal gatherings of engineers rather than researchers. It probably depends a lot on the field you’re working in.
  • Letters to the editor: “Objects Always! Well, Almost Always” (Henry Baragar, Toronto) – I agree with him that “OOP provides more tools and techniques for building good models than any other programming paradigm” including closures, which are kind of hyped right now.
  • ACM’s annual report – Yes, they seem to make some money! Net assets increased by USD 4.8 million in 2010.
  • “Nonlinear Systems Made Easy” by Gary Anthes – This actually seems interesting. The article is about new algorithms devised by Pablo Parrilo to rewrite nonlinear polynomials as sums of squares of other functions. Sadly I usually don’t write software that could leverage that approach, but maybe in the future… who knows?
  • “India’s Elephantine Effort” by Marina Krakovsky – Wow, millions of people in India are being assigned a biometric ID. It seems this is mainly viewed as a technical challenge. In Germany, a project like this would surely involve endless political discussions.
  • “Don’t bring me a good idea” by Phillip G. Armour – The title is confusing, as you expect some advice on how to sell ideas to managers. Instead it is about how to model IT environments using virtual machines.
  • “Google AdWords and European Trademark Law” by Stefan Bechtold – As a software engineer, I don’t care much about ads and the law, but it seems like Google is the winner once more.
  • “Reflections on the Toyota Debacle” by Michael A. Cusumano – This was insightful and I like the conclusion: “What managers need to understand are the limitations of any best practice as well as the potential even for greater companies to lose their focus and attention to detail – at least temporarily. [...] The Toyota way used to be that one defect is too many. That is the kind of thinking that Toyota seems to be regaining.”
  • “Cloud Computing Privacy Concerns on our Doorstep” by Mark D. Ryan – I agree that “we are just at the beginning of the digital era, and many of the solutions we currently accept won’t be considered adequate in the long term”. Nothing new.
  • “An interview with Frances E. Allen” by Guy E. Steele Jr. – I didn’t read it all, but she still seems to be in love with Fortran.
  • “Collaboration in System Administration” by Haber, Kandogan and Maglio – Could be very interesting for IT managers and sysadmins. I skipped it.
  • “Virtualization: Blessing or Curse?” by Evangelos Kotsovinos – Nothing new here either: “Can virtualization deliver? It absolutely can, but not out of the box.”
  • “Using Simple Abstraction to Reinvent Computing for Parallelism” by Uzi Vishkin – an introduction to Explicit Multi-Threading (XMT) based on a PRAM computer. Right now, there is an FPGA implementation of such a computer with 64 cores and a programming language called XMTC, which is a superset of the C programming language. That said, the whole thing is mainly of academic interest for the moment and I personally reject everything that is related to C. But the idea seems good.
  • “A Firm Foundation for Private Data Analysis” by Cynthia Dwork (Microsoft) – Conclusion: 1) Large query sets don’t provide privacy, because you can subtract one result set from another, and query auditing does not work well. 2) Adding random noise to the result set has promise, if done correctly. 3) Token-based hashing of search logs can be compromised by statistical analysis. 4) Things that statistical databases are designed to teach can, sometimes indirectly, cause damage to an individual, even if this individual is not in the database. 5) That leads to the definition of “Differential Privacy”, which is further explained in the article. Excellent!

People search for many “obviously” disclosive things, such as their full names (vanity searches), their own social security numbers (to see if their numbers are publicly available on the Web, possibly with a goal of assessing the threat of identity theft), and even the combination of mother’s maiden name and social security number.
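
To make point 2 of the Dwork summary a bit more concrete, here is a minimal sketch of one common way to add random noise to a query result: Laplace noise on a counting query. This is my own illustration and not code from the article. The function names (laplaceNoise, noisyCount), the sample records and the choice of a counting query are all assumptions; the random source is passed in as a parameter only so that the behavior can be checked with fixed values.

```javascript
// Draw one sample from a Laplace distribution with mean 0 and the given scale,
// using the inverse of its cumulative distribution function.
// `uniform` must return numbers in [0, 1); it defaults to Math.random.
function laplaceNoise(scale, uniform = Math.random) {
    let r;
    do {
        r = uniform();          // skip an exact 0, which would lead to log(0)
    } while (r === 0);
    const u = r - 0.5;          // now strictly between -0.5 and 0.5
    return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

// Answer "how many records match?" with noise added to the true count.
// Adding or removing one person changes a count by at most 1, so the
// noise scale is chosen as 1 / epsilon (smaller epsilon = more noise).
function noisyCount(records, predicate, epsilon, uniform = Math.random) {
    const trueCount = records.filter(predicate).length;
    return trueCount + laplaceNoise(1 / epsilon, uniform);
}

// Hypothetical sample data, invented for this sketch.
const sampleRecords = [{ age: 30 }, { age: 41 }, { age: 17 }];
const isAdult = record => record.age >= 18;
console.log(noisyCount(sampleRecords, isAdult, 0.5));
```

The returned value differs from run to run, which is the point: a single answer reveals little about any one record. How much noise is appropriate for a given application is exactly the part that has to be “done correctly”.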

February 9, 2011

Dependency Management

Tags: — 10:47

Right now, I am trying to find out what works best for me – in all areas. For example:

  • Keep the Facebook account or delete it?
  • Teach people about SQL and JS or keep it to myself?
  • Get in touch with people from the past or ignore them?
  • Enjoy the nightlife in Berlin or sit at home every evening and live healthy?
  • Use git or Subversion?
  • Wear black clothes or more colorful stuff?
  • Get a new car or wait until my Civic dies?
  • Listen to music or enjoy the silence?

These are really tough questions and I won’t be able to answer them without breakfast. This is what the title refers to.

December 11, 2010

Why good code sometimes needs redundancy: The dynamic aspects

Tags: , — 16:15

Introduction

There is a general rule in software development which says that you should avoid duplicated code and data. It is extremely simple to remember and many developers love to quote it. Popular code quality assessment tools use it as a metric. Because this rule sounds so infallible, it is applied even in cases where it is not appropriate for one or more reasons.

Another well-known fact is that inexperienced developers like to duplicate everything using copy & paste, which causes a lot of maintenance overhead afterwards. The reasoning behind both diametrically opposed attitudes originates from a somewhat static view of code.

In general, if two functions do the same thing, they should be merged. If a function is too big and therefore very likely does more than one thing, it should be broken down into smaller functions. If a value is stored in one table of a relational database, it should not also be in a second one. If you can normalize your data, you should do so. While this advice is very good in many cases, the reality is more complex.

The reason is this: requirements, and therefore your application, normally change over time. Optimizing code and data structures can take considerable effort that yields no return on investment if you have to undo your optimizations frequently. Also, software must fulfill additional requirements concerning aspects like performance and long-term maintainability, besides just being “correct” at a given time.

Maintainability also includes the ability to update parts of the application without requiring a major refactoring in other parts that should not be affected. If such refactoring becomes necessary, applying changes not only takes more time, but every developer must also be an expert on the whole system. Most developers only know a fraction of the code, and that is how critical bugs are very often introduced. You can fight this with tests, but you can also try to prevent it from happening.

A non-technical example

Imagine a house and, let’s say, five cars standing in front of it. The first thought you will probably have is: Why does this guy in there need that many cars?

The obvious optimization is to get rid of four cars, because they only produce costs and he can only use one car at a time. But wait: There is a family living in that house. You can still optimize by observing where they drive and what the maximum number of cars they use in parallel is. Also, people can share a car if they drive to the same destination anyway.

Let’s say after one week, you find out that you can get rid of two cars that are superfluous. But wait: One family member gets a new job at a different location next month and cannot share a car anymore, and another one was on holiday during your observation.

You could still sell one or two cars for the period in which they are not used, but this creates transaction costs that far exceed what you save in maintenance. The lesson learned is that even if it looks like you can get rid of some cars at first, it doesn’t make sense once you look at it in detail.

The limits of code optimization

Speaking in code, you can imagine two functions that are used for two separate parts of an application that is still under development. Maybe one of these functions is not even in use yet, because the code that will use it has not been written yet. Both functions might contain the same six lines of code. This is the static picture we see.

Before you start optimizing that by thinking about the right solution (a third function that contains these six lines and that is used by both functions? Or maybe just merge the functions into one and use that everywhere?), you should consider this: That code is just an approximation for the start.

For example, they both store data in a file, while you can expect that there will be more and different storage methods later on. Or they both contain a standard algorithm that is likely to be optimized for the specific use case later. If you merge them and they belong to distant parts of the application, you have to keep those parts aligned from now on, because they depend on a single function. If you change the behavior of your single function, you need to adapt all code that uses it. Sure, there are certainly tools and ways around those problems, but those also need to be investigated, installed, applied, watched, considered, or whatever.

Therefore we save the maintenance costs of two functions that do something similar at a given point in time, but add the transaction costs of first optimizing the code and then refactoring the whole thing again once the two need to differ from each other. By then it might be too late, because lots of workarounds are in place as nobody dared to touch this central function that everything depends on. You are doomed.
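
A small sketch of the situation described above. All function names and the key=value format are made up for illustration; the only point is that two functions in distant parts of an application look identical today and diverge tomorrow.

```javascript
// Part A of a hypothetical application: user settings.
function serializeUserSettings(settings) {
    const keys = Object.keys(settings).sort();
    const lines = keys.map(key => key + '=' + settings[key]);
    return lines.join('\n');
}

// Part B, far away in the code base: order export.
// Today it contains exactly the same lines as part A.
function serializeOrderExport(order) {
    const keys = Object.keys(order).sort();
    const lines = keys.map(key => key + '=' + order[key]);
    return lines.join('\n');
}

// Later, a new requirement arrives for the export only: values must be quoted.
// Because the two functions were never merged, the change stays local and
// nobody has to check whether the settings code still works.
function serializeOrderExportQuoted(order) {
    const keys = Object.keys(order).sort();
    const lines = keys.map(key => key + '="' + order[key] + '"');
    return lines.join('\n');
}

console.log(serializeUserSettings({ theme: 'dark', lang: 'de' }));
console.log(serializeOrderExportQuoted({ id: 7, total: 19.99 }));
```

Had both callers been routed through one shared helper, the quoting requirement would have forced either a new option on that helper or a second refactoring to split it again. Whether that cost is higher than the cost of the duplicate depends on how likely the two parts are to drift apart.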

Function inlining

A special case exists in JavaScript, where you need to put a lot of work into performance optimization so that the application runs even on the oldest browsers. In fact, a popular technique is to copy & paste code from different functions into one function (especially when the code is executed in a loop), so that the interpreter saves the time needed to look them up. To somebody new to JavaScript this might seem strange, and you are immediately tempted to clean up the code with the very best intentions – thus maybe making the application unusable for certain users.
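
As an illustration of what such manual inlining looks like, here is a sketch with made-up function names. Both versions compute the same result; the second one repeats the body of the helper inside the loop so that no function lookup and call happens per iteration. Whether this is measurably faster depends entirely on the JavaScript engine, so it is an assumption that has to be verified by measuring on the browsers you care about.

```javascript
// Clean version: the loop calls a small helper for every element.
function square(x) {
    return x * x;
}

function sumOfSquares(values) {
    let total = 0;
    for (let i = 0; i < values.length; i++) {
        total += square(values[i]);
    }
    return total;
}

// Inlined version: the helper body is pasted into the loop and the
// array length is read only once. This is the "redundant" code that a
// well-meaning cleanup would merge back into the version above.
function sumOfSquaresInlined(values) {
    let total = 0;
    for (let i = 0, n = values.length; i < n; i++) {
        const v = values[i];
        total += v * v;
    }
    return total;
}

console.log(sumOfSquares([1, 2, 3]), sumOfSquaresInlined([1, 2, 3]));
```

A comment next to the inlined version that explains why the duplication exists is probably the cheapest protection against a later cleanup.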

Moving redundancy from the library to the config

Another example everybody is familiar with is configuration files. Especially if the application was written by the same team or company, they are more or less the same for every project. Still they are full-featured duplicates, just in case you want to apply changes. You cannot take for granted that everybody is happy with the default configuration, and it is way easier to change an existing setting in a file than to browse through a manual to find out which code is needed to modify the default setting. If a piece of code in the library requires a new option, you can see the developers adding the setting to all the projects on all the installations without complaining a lot. It turns bad when the code in the library becomes so general and non-redundant that you start needing lots of options and basically start programming in your configuration file. This just moves redundancy from the library (where it could help to actually understand what some code is doing) into the configuration of each and every project.

Redundancy in frameworks and libraries

If you plan to give away code that can be used separately from your existing library or framework, you are frequently forced to cut the ties at the cost of duplicating code – or you explain the dependencies to your users and accept that some of them might refuse to use your code.

Legal aspects

Many Open Source projects provide a good example of how important redundancy is for legal reasons. Just look at how much code essentially duplicates something that already existed and was (re-)published under a different license so that it is safe to use. Legal requirements, by the way, are also a driving force behind redundancy in real life: office and home, business and private mobile phone contracts, multiple black boxes in an aircraft and so on.

Redundant data

When it comes to data, the justification for redundancy is mainly performance, availability or the fact that it simply does not matter. There certainly are a few cases where you never have to change data (maybe you are not even allowed to), so why put effort into making it easily updatable? Imagine a use case where you have to throw away your data every week. A powerful data structure that is flexible and can be used for the next 100 years is of no help in that case.

Another reason for storing the same information twice is that you want to store it once normalized and once in a processed form (maybe joined together with other data), because you have a lot of read-only requests. That saves the time needed to process the normalized data, so less server hardware is needed to serve the same number of users. Another example is to store the data using different technologies, maybe once on the hard disk (for long-term availability) and once in memory (for short-term availability). Or you have a copy that can be modified and another copy that is read-only etc. It depends on the properties of real-world hardware and most importantly on how the data is expected to be used over time.
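
The “once normalized, once processed” idea can be sketched in a few lines. Everything here is hypothetical: the sample authors and posts, the function names, and the decision to rebuild the whole processed copy on every write (a real system would more likely update only the affected entries).

```javascript
// Normalized data: each fact is stored exactly once.
// The records are invented sample data for this sketch.
const authors = new Map([
    [1, { name: 'Ada' }],
    [2, { name: 'Linus' }]
]);
const posts = [
    { id: 10, authorId: 1, title: 'Redundancy' },
    { id: 11, authorId: 2, title: 'Submodules' }
];

// Processed (joined) copy: redundant, but cheap to read.
function buildPostListing(postRows, authorRows) {
    return postRows.map(post => ({
        id: post.id,
        title: post.title,
        authorName: authorRows.get(post.authorId).name
    }));
}

let postListing = buildPostListing(posts, authors);

// Read path: no join needed, just a lookup in the processed copy.
function findListingEntry(id) {
    return postListing.find(entry => entry.id === id);
}

// Write path: change the normalized data, then refresh the copy.
function renameAuthor(authorId, newName) {
    authors.get(authorId).name = newName;
    postListing = buildPostListing(posts, authors);
}

console.log(findListingEntry(10).authorName);
renameAuthor(1, 'Grace');
console.log(findListingEntry(10).authorName);
```

The trade-off is visible in the write path: every update now has to touch two places. That is acceptable as long as reads far outnumber writes, which is the assumption made in the paragraph above.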

Conclusion

If you only look at things at a given point in time, you can hardly understand why the application was developed the way it is or make any suggestion for how it should be. Good software design must consider the business use case, carefully evaluate what parts should be independent of each other (and therefore need some kind of redundancy) and cannot rely on general rules. This is a constant process and requires an experienced engineer.

November 16, 2010

Free math “book”

Tags: , — 09:54

Yesterday, I migrated my personal math reference to the new Wiki:

http://math.chaoticpattern.net/wiki/index

It is all in German and with my own clumsy comments. Those who read it will notice that math is a useful tool for me, rather than some kind of art – although I agree you can look at it as art, if you have the time. That’s why I tried to compress it as much as possible ;)

November 4, 2010

Tea Party

Tags: — 19:26

Sometimes it’s hard for me to understand what drives other people. Why would somebody join the Tea Party movement in the United States? Even after asking supporters of these “ideas” what they think, it’s not quite clear to me.

Is it just the ordinary entertainment that Americans like so much? Or some kind of witch hunt?

It would be sad if that were the case, because a) it does not help and b) some innocent people will suffer. In Germany we do have experience with such things. When the economy turned down, they were hunting the Jewish people, as if that were the most natural thing on earth and the best explanation for Germany’s problems.

Now these Tea Party people think Obama is the new Hitler and they must hunt the Communists. Also they want to disallow sex before marriage including masturbation. That can’t be natural, right? Killing people with another opinion is natural! Not sure what this has to do with limited government or the American constitution at the end of the day. This sounds more like they want to introduce fascism with Sarah Palin as the leader.

How hard is it to accept that they probably ruined their own country with exactly that narrow-minded attitude? Bush started wars in Iraq and Afghanistan (is that not wasting money?) and was in power when the financial crisis started – and I can’t remember anyone from the Tea Party complaining about his leadership. Now Obama introduced health insurance for everyone and they call him a Communist. If that were the case, Germany and the UK and many other countries with a conservative and market-oriented government should be called communist countries, just because they want to conserve the value of their work force by not letting them rot away. Wow, what a great but late success for the Soviets.

For my part, I want a strong United States. So what about producing stuff the world wants to buy instead of complaining that it’s all the fault of Islam, the Afro-Americans, the Mexicans and Communism? Sounds smart to me. Hollywood, Apple, Google and Facebook are not enough to feed 307,006,550 people. To blame others for your own failure is the attitude of a loser.

As so often, this whole movement is probably just about earning money, like most things on earth. It would be wrong to think that so many simple-minded (isn’t that what they call themselves? “Ms. Hockey Mom”) people are interested in real politics. The Tea Party leaders are very smart, because they know how to take money out of their supporters’ pockets. This is a success indeed, but not for many and not for the nation.

November 3, 2010

Notification free PHP coding (Part 2)

Tags: — 09:47

Thorsten Rinne just published the new version of phpMyFAQ. But he probably didn’t read the notifications during development, and now they pop up on the live site and break everything:

phpMyFAQ notice [8]: Object of class PMF_DB_Mysql could not be converted to int in Captcha.php on line 150

It’s not that I am perfect. My colleagues sometimes have to remind me to turn on notifications, because the automatic PHP updates turned them off or I switched computers and forgot to modify the local php.ini. This can happen to everyone.

Life is no picnic (“Das Leben ist kein Ponyhof”).

November 1, 2010

Chaotic Pattern: Wiki is online

Tags: , , — 22:06

I recently managed to publish my latest project:

chaoticpattern.net

While reading the book Sync, I felt that this is the right name – an explanation can be found on the Web site. There is already some blog with the same name, but it does not seem to be very active. The last post is from May 2009. Hope I don’t cause any confusion.

Chaotic Pattern will be used as an incubator/lab. The Wiki itself is the first in a series of experiments. It fits my personal needs as a publishing platform and is more convenient than any typical word processor out there. You can not only edit documents online, like in every other Wiki, but also produce wonderful PDFs and include content from Flickr, YouTube and Google Maps. If you like it, use it for free. There are no strings attached and I will not sell your documents or your Facebook profile :)

The Wiki is also there to demonstrate what you can do with JavaScriptMVC 3.0 and a little bit of PHP. Theoretically, it should scale to billions of users. But as long as I don’t get any venture capital, we will never find out. Anyhow, I want people to use it as a prototype for their own apps. By the end of this year, everything should be properly documented and you will be able to download some of the components.

The next step is to get the Ajax push server finally running, so that you can use the real time features. Ape is installed, but seems to cause connection problems under certain conditions. Either I can fix that with my limited resources or I have to switch to another server. It’s not like I don’t have anything else to do, so please be patient^^

October 23, 2010

Apple announces Mac App Store: Fool me once, shame on you… fool me twice, shame on me.

Tags: — 09:28

Well, it’s official. Apple has now announced it’s bringing the
App Store concept to the Mac and it looks like they’ll be
restricting apps with FairPlay DRM too for good measure. When we
first began talking about the problems with the App Store on the
iPhone and iPod Touch, people wanted us to drop it and stop
talking about the DRM tricks being pulled by Apple on the grounds
that the iPhone wasn’t a general purpose computer (it is, and the
iPad is too) but rather an appliance.

Is this a glimpse of what Apple has in store for the future of computing?

Learn more: http://www.defectivebydesign.org/macappstore

September 9, 2010

The new Google JS search interface

Tags: , — 09:45

The engineers at Google are facing the same (concurrency) problems as every other Web developer. Such problems are hard to find with ordinary testing that does not take small variations in timing into account. To produce this result, you just have to press the back button at the right time:

September 4, 2010

How to contribute to JavaScriptMVC 3.0

Tags: , — 14:25

If you are reading this, I can safely assume you know what JavaScriptMVC (JMVC) is, what its features are, and what components it consists of, namely:

  • FuncUnit (the test framework)
  • DocumentJS (the documentation engine)
  • jQuery (JavaScriptMVC uses a special fork with added bugfixes)
  • JavaScriptMVC (the core framework)
  • Steal (the code manager / script loader; sometimes referred to as “StealJS”)
  • Phui (component library for JavaScriptMVC, not part of the core)

These partly independent components (together with Selenium) are bundled using git submodules (http://book.git-scm.com/5_submodules.html) in a repository called “framework” for your convenience:

http://github.com/jupiterjs/framework

A detailed description of the components can be found on the Jupiter JS blog:

http://jupiterjs.com/#news/javascriptmvc-features

Release 2.0

You should be aware that release 2.0 is still out there. Don’t get confused. It is hosted on Google Code at http://code.google.com/p/javascriptmvc/ and uses Subversion instead of git. Don’t use the issue tracker there for bugs you find in 3.0.

The respective project Web sites are:

Ways to contribute

There are a number of ways to contribute:

  1. You can fork the project repositories on github (http://github.com/jupiterjs/) into your own repository and send a “pull request” every time you want to submit changes (pull requests replace patches and preserve your authorship)
  2. After contributing for a while, you can become a member of the JavaScriptMVC team and ask for direct write access to the project repositories
  3. You can just check out/download the latest framework version and report bugs to the developers (http://github.com/jupiterjs/javascriptmvc/issues) or answer questions on the mailing list (http://groups.google.com/group/javascriptmvc?hl=en)

A few words about git

In every case you should become more or less familiar with git (the distributed version control system) and github (the project hosting Web site). The Git Community Book is a great reference: http://book.git-scm.com/index.html. If you are already familiar with Subversion, you might want to read the “Git – SVN Crash Course” (http://git.or.cz/course/svn.html). The main idea of git is that everyone has their own repository (that’s why it’s called distributed). It allows everyone to contribute easily to Open Source projects and was initially designed and developed by Linus Torvalds for Linux kernel development.

In contrast to “normal” version control systems, a commit happens on your local computer only. If you want to actually send your code to the server, you have to push it to the remote repository. If this is not the main repository of the project but your own fork, you additionally have to send a “pull request” to the project owner (jupiterjs in this case). As a previous user of Subversion I found the whole process of checking out the source, committing my changes and pushing them back to the repository tremendously complicated, even though I understand the need for distributed version control. In part this is because git sometimes comes up with error messages that look pretty scary and that don’t help to understand the actual problem.

Set up your github account

After you have created your free github account at http://github.com/, you need to go to the Account Settings page, click on “SSH Public Keys” in the left-hand navigation and then enter your public key. If you don’t have one yet, open a text terminal on the local computer and enter:

ssh-keygen -t rsa -C "youremail@address.com"

There is a more detailed howto at http://help.github.com/msysgit-key-setup/.

If you run Windows, you are probably out of luck. Try to ask Google for assistance ;)

Next step is to fork the repositories you want to work on. This process is pretty convenient – you simply click the fork button on the main repository page. The complete list is visible on:

http://github.com/jupiterjs

Install Git and Java

If not yet done, you should also install Git and Java on the local computer by using the package management software that comes with your Linux distribution. On Ubuntu you can use the “Synaptic Package Manager” for example. At least in the past, I had bad experiences with OpenJDK and Rhino (the engine that executes JMVC’s command line scripts), so I recommend installing the “original” Sun/Oracle Java version. The Selenium server also needs Java.

Clone the repositories

I recommend cloning (that means checkout) the “framework” repository as the first step. The build script in there allows creating the downloadable packages you see on http://github.com/jupiterjs/framework/downloads. To do this, follow these 4 easy steps:

  1. Create an empty local directory that should contain all your github repositories and change into it
  2. git clone git://github.com/jupiterjs/framework.git
  3. cd framework && git submodule init
  4. git submodule update

That’s it!

The procedure to clone your forked project repositories is similar. First you need the URL that you have to provide to Git.

Go to your github home page (http://github.com/[username]/) and then click on the forked repository, e.g. javascriptmvc. At the top you’ll see 3 different URLs:

  • SSH with read and write access: This is what you want to use
  • HTTP: This is slower and offers read only access
  • Git Read-Only: Since we want to push (write) our changes to the server, this does not make sense

Now simply clone it into your local github directory by typing:

git clone git@github.com:[username]/[repository].git

Push changes

With Git, you can make as many commits as you want locally. Please note that the equivalent to “svn commit” is “git commit -a”. Please be nice and provide a commit message every time, for example:

git commit -a --message 'Changed github URLs from pinhook to jupiterjs'

To see what would be committed or to see your changes since the last commit you can use “git status” and “git diff”.

After testing all your changes locally (this is probably worth another article, but I’m running out of time now), you are free to push your changes to the remote repository:

git push origin master

You can also configure git to always push to the matching remote branch (there are other options as well, but I’m not that much into git to fully understand them, to be honest):

git config push.default matching

Next time you push something into that repository, it is enough to use “git push”.

As mentioned earlier, you have to send a “pull request” (there is a nice button on github to do just that) if you want to send your changes from your fork to the original project repository.

Pull changes for submodules

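In my naïve thinking, I assumed that Git would automatically fetch the latest changes into the submodule directories in the framework repository when you type “git submodule update”. This is not the case: the command only checks out the commit that the framework repository has recorded for each submodule.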

You have to manually pull the latest changes in each sub directory that contains a submodule like this:

cd funcunit/

git checkout master   # only required the first time

git pull

You will only get the changes in the submodules via “git pull” in the main framework directory if somebody with write access to framework does this “cd [submodule]; git pull” procedure locally and then pushes back to the repository on github:

http://github.com/jupiterjs/framework/commit/1affd96d632b34621e9e0d06707ea83d1d4c3b9d

Thanks

Thanks for reading this. You see, I’m not a Git expert yet and I really hope I didn’t give any bad advice concerning its usage. As I improve my knowledge and find best practices for doing things most efficiently, I will update this page or post a follow-up. Testing in particular is not covered in this article, which is a shame.
