Monday, December 28, 2009

Open Source

When someone makes a program, whether to make its source code publicly available can be a weighty decision. If someone goes to all the effort of making a huge, extensive program, why would they want to just give everyone the code for it? Let's take a look at reasons for and against releasing the source code.

Reasons against:
  • Perhaps the biggest reasons to keep source code private are control over what's public and keeping the money flowing. If your program has some non-obvious bugs, or algorithms you don't want everyone to have, you might think that keeping things from the public protects your interests. And if you want to make money with your program, giving away the code doesn't make that easy.
  • Maybe you have something to hide. You might be sending usage statistics back to company servers without wanting the user to be aware of it. Obviously, a user might get a bit freaked out to find out you are collecting information from them, so why let them know in the first place?
  • You may want to prevent competition. If you release your code, others are bound to see it and try to make something better. If your company depends on keeping things private to remain prosperous, the last thing on your mind is feeding the competition your code and thus driving business away.
Reasons for:
  • One of the biggest reasons open source programs do so well is that they have a community backing and developing them. The program doesn't just rely on a small set of somewhat knowledgeable people, but on many, many people, collectively far more knowledgeable than a subset of employees.
  • Another big reason is speedy development. If someone finds a bug or gets an idea for an improvement, they can fix or add code to their heart's content (moderated, of course, by those in charge of accepting changes). This gives the community a say in the direction a program goes and how secure it is. If a program has a vulnerability, it can be fixed soon after discovery, possibly even by the discoverer. This makes a program much more secure, and just cooler in general.
  • Everyone can see the code, so they can make their programs work with yours or take full advantage of your program's capabilities. Other programmers can also learn from your program. Rather than spending hours hassling with something, just trying to get a simple piece of their program to work, they can see how you got yours to work. It's a great way to learn something.
  • Have you ever made some quick, poorly thought-out piece of code that would embarrass you if everybody saw it? Well, the whole publicity factor can push you to write higher-quality, clean code that you can be proud to put your name to.
  • Open source programs can bring in revenue as well. If you make a high-quality product, your community of users is much more likely to donate money than if you make a low-quality one. This encourages you to make an awesome, well-liked product. Plenty of profit can be made simply by asking your users to donate a little money. If you make the users happy, they will gladly repay you.
  • Lastly, it's just cool and helps everyone. Sharing is better than selfishly hoarding, so why not apply that to your code?!
Ok, so in this comparison we can see that the reasons for open source outweigh the reasons against (at least I hope you can see that). Now, let's take a look at examples of proprietary vs. open source:

Microsoft has long kept its code private, and this has brought a plethora of problems. First off, its operating system has (as I have seen) utterly failed. Microsoft controls everything about it, and it progresses (if it doesn't regress) buggily and at a very slow pace. Windows Vista is basically a slower, prettier, much buggier version of XP. Windows 7, though an improvement over Vista, fails in many ways. Next is their browser, Internet Explorer. As the majority of web designers can attest, it has been one of the largest stumbling blocks to internet progress. It renders pages terribly, slowly, and buggily. Rather than supporting many standards, it impedes them. The community has had little to no say in the development of Microsoft products; it's all corporately managed.

Some of the best known open source projects are Linux and Mozilla Firefox. Linux has been slowly but surely crushing Microsoft's monopoly. Anyone can make their own distribution of Linux or contribute to existing distributions; it's entirely community based. Since its beginnings, Mozilla Firefox has been thriving. The browser quickly gained popularity and is currently one of the biggest forces crushing Microsoft's sad attempt at a browser. Recently, Google released a friendly competitor to Firefox: Google Chrome, built on the open source Chromium project. Chrome boasts an amazingly fast JavaScript engine (V8), which makes a notable speed difference. Rather than attempting a monopoly, these two browsers keep trying to one-up each other, and that just makes both of them better. The open source factor has made Linux, Firefox, and Chrome far better than any corporate-controlled product could ever become.

So, these are just a few examples of why open source is far superior to closed source. Next time you start a large project, consider open sourcing it. Help others and yourself!

Thursday, December 24, 2009

Learning How To Learn

A large part of the knowledge I have gained has come from my own endeavors to seek out truth. Since a very young age I have been a very curious person. I always had questions and rarely had answers. That is probably one of the biggest reasons I love computers, and especially the internet, so very much. Once I started looking for information in this massive network, filled with a large part of the world's collective knowledge, I finally began to discover answers to my many questions. Without this amazing resource, I would be quite clueless in many respects.

The things I have learned range from how many, many parts of human anatomy work (nervous system, immune system, adaptability, bone structure and composition, tissue composition, etc.), to how computers work on the lower levels, and even how operating systems and programming languages work. That isn't all; I have found answers to many random questions that come up. Some of my biggest resources have been the Google search engine and Wikipedia. It's amazing how you can just type a question or a well-phrased query and find all sorts of answers.

Maybe I am just weird, but I often found school to get in the way of my education. Sure, there are things I wouldn't have learned without school, and I am grateful I did learn them, but overall I found school less productive than the time I spent researching things myself. I can't say I am an expert in everything, but I have gained good underlying knowledge of a multitude of subjects. I have often found the things I learn to be extremely helpful and good to know. I prefer to know about things myself, rather than blindly trusting whatever other people say. If I am going to get an MRI, like I did just days ago, I will research the subject. If I am injured or ill in some way, I am going to look into it; that way I know about it and how best to treat myself. If I am going to gain mad security penetration skills, I am going to find good information resources (like HTS). For some, a teacher could even be a good, knowledgeable resource (but sometimes, in my experience, teachers are quite clueless about the things they are supposed to be experts in).

Sometimes even my family makes fun of me for my obsession with knowledge. I just laugh at myself right alongside them, rather than taking any offense whatsoever. I realize I am kind of a knowledge nerd, truth seeker, or dare I even say hacker, and I know that's just who I am. If someone doesn't like it, that's their problem.

Now, have you ever had an unanswered question? Have you ever had doubts about the truthfulness of information you have received? Have you ever just wanted to know more about something? Try researching it yourself. Here is a great tutorial for learning to learn. In fact, at the top you can find a link to Wikipedia, which has an article about autodidacticism (self-learning).

Ok, now that you know your resources, use them. Good luck!

Tuesday, December 22, 2009

PHP

Since I already wrote about my favorite language, Python, I figured I might as well write about other languages I like. This time it's PHP.

Ever wanted to write an interactive, dynamically generated web page or site? PHP was built mainly for that exact purpose, and it's very powerful. That is why it's one of the most popular server-side languages for web programming.

PHP is pretty easy to learn and use. Its main site, php.net, has awesome, pretty thorough documentation to guide the learning process. Unlike in Python, the built-in functions are all global, so you don't need to import anything to use them. As in Python, variable types don't have to be declared explicitly: a variable's type is determined by the value it holds, and PHP converts values automatically as the context requires. In PHP this is called 'type juggling'.

Ok, now for some of the basics. All PHP code must be enclosed between the opening (<?php) and closing (?>) tags, in order to separate PHP from HTML. The short opening tag <? also works, but only when the short_open_tag setting is enabled. Variable names begin with the dollar sign ($), followed by a letter or underscore (_), then any number of letters, numbers, and underscores; bytes 128 through 255 are also allowed. Statements end with a semicolon (;). Here's a very basic example:

<?php
$some_variable = "Hello World!";
echo $some_variable;
?>

This will simply store the text "Hello World!" in a variable, then output it. You can do a lot more with PHP, but these are just some of the basics. You can find more help with learning PHP at w3schools. PHP can also be used for command-line programs, and even graphical programs (via PHP-GTK).
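The variable-naming rule above is easy to check mechanically. Here is a minimal sketch in Python, assuming the pattern the PHP manual gives for variable names (a letter, underscore, or byte from 128 to 255, followed by any mix of those and digits). The function name is_valid_php_variable is just a hypothetical helper for illustration, and it checks the name's UTF-8 bytes because PHP treats source code as raw bytes.

```python
import re

# Assumption: mirrors the PHP manual's variable-name rule, applied to bytes:
# "$", then a letter, underscore or byte 0x80-0xff, then those plus digits.
PHP_VARIABLE = re.compile(rb"\$[A-Za-z_\x80-\xff][A-Za-z0-9_\x80-\xff]*")


def is_valid_php_variable(name):
    """Return True if `name` (including its leading $) is a legal PHP variable name."""
    # PHP sees raw bytes, so match against the UTF-8 encoding of the name
    return PHP_VARIABLE.fullmatch(name.encode("utf-8")) is not None


for candidate in ("$some_variable", "$_count2", "$2fast", "$my-var", "some_variable"):
    print(candidate, is_valid_php_variable(candidate))
```

The first two names pass; "$2fast" fails because it starts with a digit, "$my-var" because of the hyphen, and "some_variable" because it lacks the dollar sign.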

Best of luck!

Monday, December 21, 2009

Bot Troubles

Do you run a web server? Ever seen lines like the following in your logs?

95.31.11.173 - - [06/Dec/2009:07:29:14 -0700] localhost:80 "POST http://yuamin.blog95.fc2.com/?no=2127&ul=96fa963c0da02340 HTTP/1.1" 200 6699
88.80.10.1 - - [06/Dec/2009:07:29:26 -0700] localhost:80 "CONNECT auctions.godaddy.com:443 HTTP/1.1" 301 -
74.63.225.45 - - [06/Dec/2009:07:29:40 -0700] localhost:80 "GET http://aanserver88.com/js/banner/banner.js HTTP/1.0" 200 11922
74.63.225.45 - - [06/Dec/2009:07:29:41 -0700] localhost:80 "GET http://c5.zedo.com/jsc/c5/fo.js HTTP/1.0" 200 3599
95.31.11.173 - - [06/Dec/2009:07:29:41 -0700] localhost:80 "GET http://seo.fc2.com/spam/ HTTP/1.1" 200 55738
74.63.225.45 - - [06/Dec/2009:07:29:42 -0700] localhost:80 "GET http://c7.zedo.com/bar/v15-202/c5/jsc/fm.js?c=2404/1729&f=&n=735&r=5&d=0&q=&s=80&z=0.3461414148545188 HTTP/1.0" 200 3505
...
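To see which addresses are behind this kind of traffic, the log can be tallied with a short script. Below is a minimal Python sketch, assuming the log layout shown above (client IP, two dashes, [time], vhost:port, quoted request line, status, size). It treats any request target that doesn't start with "/" as a proxy attempt. The names parse_line and count_proxy_requests are hypothetical.

```python
import re
from collections import Counter

# Assumption: lines follow the layout in the excerpt above:
# IP ident user [time] vhost:port "METHOD TARGET PROTOCOL" status size
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] (?P<vhost>\S+) '
    r'"(?P<method>\S+) (?P<target>\S+)(?: [^"]*)?" (?P<status>\d{3}) (?P<size>\S+)$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_proxy_requests(lines):
    """Count, per IP, requests whose target is not a local path such as /index.html."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        # Proxy attempts name a full URL (http://...) or a host:port (CONNECT)
        if entry and not entry["target"].startswith("/"):
            counts[entry["ip"]] += 1
    return counts
```

Passing it an open log file, for example count_proxy_requests(open(path)) with whatever path your access log uses, and calling .most_common(10) on the result lists the worst offenders.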

I had been having this problem for a long time and couldn't think of a solution. These bot networks were almost constantly hammering my server with requests (one log file, rotated weekly, grew to over 12MB). I hadn't a clue how to stop them. After lots of experimentation with Apache and research into its capabilities, I found a fairly simple solution: the mod_ext_filter module. I added the following to my 'httpd.conf' file:

ExtFilterDefine bots mode=input cmd="/scripts/bot_filter.py"
SetInputFilter bots

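This filters all request input through a Python script I made. The environment variables Apache sets for the filter give me enough information to decide whether to block an IP address. These bots are trying to use the server as an open proxy, so their requests name a full URL (http://...) or a CONNECT target (host:443) instead of a local path starting with '/'. My script flags such requests, along with unusual methods, as suspicious and bans the IP address outright with my firewall of choice, iptables. Here's the script: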

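#!/usr/bin/env python

import os
import sys
from subprocess import Popen, PIPE, call

ALLOWED_METHODS = ("GET", "POST", "HEAD", "OPTIONS", "TRACE")
LOG_FILE = "/var/log/bot.log"


def log(message):
    f = open(LOG_FILE, "a")
    try:
        f.write(message + "\n")
    finally:
        f.close()


# Apache passes CGI-style request variables to the filter's environment
ip = os.environ.get("REMOTE_ADDR", "")
meth = os.environ.get("REQUEST_METHOD", "")
req = os.environ.get("REQUEST_URI", "")

# A normal request asks for a local path ("/..."); proxy attempts ask for
# a full URL ("http://...") or a CONNECT target ("host:443").
if ip and (not req.startswith("/") or meth not in ALLOWED_METHODS):
    try:
        listing = Popen(["sudo", "/sbin/iptables", "-n", "--list", "INPUT"],
                        stdout=PIPE, universal_newlines=True).communicate()[0]
        # Compare whole fields so 1.2.3.4 doesn't match 11.2.3.45
        if ip not in listing.split():
            # -I inserts the DROP rule at the top of the INPUT chain
            call(["sudo", "/sbin/iptables", "-I", "INPUT",
                  "-p", "all", "-s", ip, "-j", "DROP"])
    except Exception:
        log("failed")
        sys.exit(0)

    log(ip + " success " + meth + " " + req)

# Pass the request body (e.g. POST data) through unchanged
stdin = getattr(sys.stdin, "buffer", sys.stdin)
stdout = getattr(sys.stdout, "buffer", sys.stdout)
stdout.write(stdin.read())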

The -I INPUT option inserts a DROP rule for the offending IP address at the top of the iptables INPUT chain, so it takes effect before any other rule in that chain.
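Two details in a filter like this are easy to get wrong, so here they are pulled out as small, testable Python functions (the names are hypothetical): the suspicious-request test, and the check for whether an address already appears in the iptables listing. A plain substring search of the listing would treat 1.2.3.4 as blocked whenever 11.2.3.45 is, so this sketch compares whole whitespace-separated fields instead. It assumes `iptables -n --list` prints single-host sources as bare addresses. The listing text below is made up for illustration, not captured from a real server.

```python
ALLOWED_METHODS = ("GET", "POST", "HEAD", "OPTIONS", "TRACE")


def is_suspicious(method, uri):
    """True for proxy-style requests: a target that isn't a local path, or an odd method."""
    return not uri.startswith("/") or method not in ALLOWED_METHODS


def already_blocked(ip, listing):
    """Look for `ip` as a whole field in `iptables -n --list INPUT` output."""
    return ip in listing.split()


def drop_rule(ip):
    """Arguments that insert (-I) a DROP rule for `ip` at the top of INPUT."""
    return ["sudo", "/sbin/iptables", "-I", "INPUT", "-p", "all", "-s", ip, "-j", "DROP"]


# Illustrative listing text (assumed format), not real output
SAMPLE_LISTING = """Chain INPUT (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  11.2.3.45            0.0.0.0/0
"""

print("1.2.3.4" in SAMPLE_LISTING)                 # True: substring false positive
print(already_blocked("1.2.3.4", SAMPLE_LISTING))  # False: field match is exact
```

Keeping these as pure functions means they can be tested without sudo or a live firewall; the filter script would only need to call them.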

Now the first malicious request earns a well-deserved ban. Malformed requests from bot networks no longer hammer my server; only legitimate requests get through. It's working quite well so far.

Saturday, December 19, 2009

Python, The Good And The Bad

First off, I'm a bit of an obsessive programmer. I have been programming since about 7th grade. It began as just web design, and eventually became a well-developed underlying understanding of many types of programming. The languages I have come to know include (in no particular order): Python, PHP, C++, Bash scripting, C#, Java, ASP.NET, JavaScript, CSS, and (X)HTML. Of these, Python is the last one I learned; I only really focused on it within the last half-year. Since I have come to use it a lot, I have come to love it more than the others. It may not be as fast as a compiled language, but it is simpler and more powerful, yet easier to learn and use, than any other language I have come across.

Now for some of the pros:
  • Blocks are defined entirely by indentation. Rather than using curly braces to group statements, Python just uses the depth of indentation. Aside from saving the extra lines used just for curly braces, this teaches the programmer the importance of writing clean, readable code.
  • It is module based. Rather than having everything in the global scope, as in PHP, you just import what you need, and no more.
  • It can use libraries written in other languages, such as wxWidgets (via wxPython) and GStreamer, through bindings. This lets you write fast compiled code and call it directly, rather than writing everything in pure Python.
  • Everything's an object. Every class, string, list, function, etc is an object. They all have attributes and methods. They can all be assigned to a variable or passed as an argument.
  • Documentation. I have found an abundance of documentation for doing what I need to, even if only in the __doc__ attribute that objects carry.
  • Flexibility. It is easy to make a program that works on multiple platforms.
  • And on, and on, and on ...
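Several of these points fit in a few lines. Here's a small Python sketch (the function names are just examples) showing a block defined by indentation, importing a single name from a module, a docstring read at runtime through __doc__, a function passed as an argument, and a method called on an int literal.

```python
from math import sqrt  # import just the one name you need from a module


def hypotenuse(a, b):
    """Return the hypotenuse of a right triangle with legs a and b."""
    # The indented lines form the function body; no braces needed
    return sqrt(a * a + b * b)


def apply_twice(func, value):
    # Functions are objects, so one can be passed in as an argument
    return func(func(value))


print(hypotenuse(3, 4))              # 5.0
print(hypotenuse.__doc__)            # the docstring, available at runtime
print(apply_twice(str.upper, "hi"))  # HI
print((255).bit_length())            # even an int literal has methods: 8
```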
Ok, now let's look at the downsides:
  • Performance. Python is not compiled to native machine code, so it doesn't run as fast as a compiled language. C/C++ programs are compiled ahead of time to machine code, so they run as fast as the computer allows (depending, of course, on the quality of the code). CPython, the standard implementation, compiles source to bytecode and then interprets it, so it doesn't have that low-level benefit.
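Strictly speaking, CPython (the standard Python implementation) does compile your code, just not to machine code: it compiles the source into bytecode, which its interpreter then executes instruction by instruction. The built-in compile() function and the standard dis module make this visible, as in this minimal sketch:

```python
import dis
import types

source = "total = sum(n * n for n in range(10))"

# CPython first compiles source text into a bytecode "code object"...
code = compile(source, "<example>", "exec")
print(isinstance(code, types.CodeType))  # True

# ...then its interpreter loop executes those bytecode instructions.
dis.dis(code)  # lists the instructions (exact output varies by Python version)

namespace = {}
exec(code, namespace)
print(namespace["total"])  # 285
```

That interpretation step, rather than running native instructions directly, is where most of the speed difference from C/C++ comes from.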

Aside from the inherent performance penalty, Python is an amazing, powerful, simple language. I learned it using, obviously, Google (the best search engine ever), IPython (an interactive, introspecting, well-designed Python shell), and a lot of experimentation. Recently I took on the moderately sized project of writing a GStreamer-backed command-line media player, with Audioscrobbler support for Last.fm song tracking and D-Bus support for GNOME media-key control. I used Python because I couldn't get it to work the way I wanted in other languages. Also, it took days to write the base code, not weeks.

Now, some resources for learning python:

Google - Best search engine ever.
The Python Website - Good place to find extensive core documentation.
Dive Into Python - Perhaps the best step-by-step, noob-to-pro documentation ever written.
IPython - Well-written interactive, introspecting Python shell.
EXPERIMENTATION - The best way to learn anything.

I have come to highly advocate this language and recommend it to anyone, whether just starting into the world of programming, or just wanting to learn a new language. Good luck!

Friday, December 18, 2009

Why Re-Invent The Wheel?

Ok, I have never blogged before, but I figured I would start. I couldn't think of how to start, but I thought about this and figured it was as good as anything, so here goes.

On to the point of the post...

I am always hearing people complain about 're-inventing the wheel', or doing something themselves when someone else has done it before. In fact, my programming teacher often said this when teaching my class last year and the year before. This, to me, is very irritating. I prefer the statement 'Why not re-invent the wheel?'

Ok, what are the benefits of re-inventing the wheel?

Well, first off, if no one ever tried to re-invent the wheel, we would still have primitive bicycle wheels on our cars (perhaps even rounded chunks of rock). Obviously somebody came along and thought 'hey, I can make a better wheel.' Because of innovation, we have better wheels today. So that's a major benefit.

Next, what can you learn by re-inventing the wheel? You can learn how to make a wheel yourself, rather than relying on and trusting others' knowledge. Anyway, what's so bad about having a little extra knowledge?! This also comes with the added benefit that the designs and ideas behind the wheel aren't lost and forgotten.

If 'the wheel' was never 're-invented', society would be quite a boring, non-progressing place.

What are the downsides? A little extra time and effort. What's so bad about that?

Now for a programming example. In the first year of my programming class (and to a degree, the second) we focused on C# (not my favorite language). My teacher would always brag about how cool C# is because the .NET Framework already does everything for you. Sure, it abstracts things a lot, but this isn't always good. I have found the .NET Framework to be terribly slow and annoying to work with. When I try to do certain lower-level things, it is difficult or impossible. It also leaves little to learn: you never really learn the lower-level things, like socket communication and packet construction, because the framework does them for you. It is highly abstracted and terribly annoying to such a naturally curious person as me. Besides, if Microsoft is always controlling the low-level things, they are not likely to improve (not quickly, at least), and the end-programmer will never really know what's truly going on.
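For a taste of what building a packet by hand involves, here is a minimal Python sketch using only the standard struct module. It computes the Internet checksum described in RFC 1071 (the one's-complement sum used in IP, UDP, and TCP headers) and packs an 8-byte UDP header field by field. The function names are hypothetical. The header leaves its checksum at 0, which IPv4 allows to mean "no checksum"; a real UDP checksum would also cover a pseudo-header, which is left out here.

```python
import struct


def internet_checksum(data):
    """RFC 1071 checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_header(src_port, dst_port, payload):
    """Pack an 8-byte UDP header by hand ("!" means network byte order).

    The checksum field is left at 0, which over IPv4 means "no checksum".
    """
    return struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), 0)


# Example bytes from RFC 1071's numerical examples
print(hex(internet_checksum(bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7]))))  # 0x220d
print(udp_header(5000, 53, b"hello").hex())  # 13880035000d0000
```

A handy property for checking your work: appending a correct checksum to the data makes the checksum of the whole thing come out to zero.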

So, the question I pose to you is: 'Why not re-invent the wheel?'