Saturday, January 27, 2018

Software Vendors

Over the last few years, I have had the opportunity to deal with software vendors: product providers, service providers, or both. Here are some thoughts and learnings on how to manage them.
  • Procurement and Assessment. Here is the proposed model.
    • Business Owner identifies need/materiality
    • Business owner with help of IT owner develops a Vendor compatibility matrix
    • References- at least 3
    • Schedule site visits/demo based on your user scenarios to understand:
      • Challenges during implementation
      • How good is the vendor at:
        • Breaking down into smaller modules?
        • Pushing back on requirements, given their expertise. There are way too many vendors who will yes you to death and deliver nothing.
      • What performance metrics (time, quality) were used to evaluate deliverables?
      • Challenges post-implementation/support
      • Challenges with upgrades, updates, and patches
    • Test Drive in a Pilot environment for a few key User Scenarios.
    • Contracting: driven by Business Owner supported by IT owner.
      • Contracts should be performance-based, with milestone deliverables.
        • Penalties for missing timelines or quality targets.
        • Bonuses for delivering sooner and at high quality.
      • Understand Upgrade, Updates and Patches process.
      • Need APIs to get data in and out of the system if we choose to move to a different vendor.
      • Need complete transparency of development process.
      • Include Source Code handover process as part of the contract. 
        • Bad code === Bad Product.
        • Access to source control
        • Frequent code reviews.
      • Testing
        • Review Test Plans
        • Deliver executed test plans with each functionality deliverable.
  • Lessons Learned
    • When bringing contract resources onsite, need a timed plan with daily updates to adjust course as needed.
    • Requirements
      • Must include screenshots and flows (diagrammatic representations).
      • Any requirement that says “make x work like y” is unacceptable. Need to define both current and future state.
      • Break requirements into small deliverable modules. Get out of the business of 6 months or longer projects.
  • Contract Management
    • The payment schedule should coincide with deliverables. No guaranteed payments without a check on deliverables.
    • Review contracts annually for what we use versus don't.
    • Measure yourself by how many unused features you have cut from current contracts.
    • Pay attention to overage charges you have paid over the last 6 months.
    • Are the vendors pointing out to you when you are overpaying? If not, use it as leverage when negotiating.

Some of the points above, like code reviews and test plan reviews, are very hard to pull off. But in my experience, even the smallest attempt at them pays huge dividends.

What learnings do you have when dealing with vendors? Please share in comments below.

Thursday, January 18, 2018

Principles

I recently read the amazing book by Ray Dalio called Principles. I highly recommend it. I took the advice in that book and started putting together a list of my principles. I expect this to evolve as years pass. Some of them are cryptic. Get in touch with me and I will explain!

1. Focus. Time is the most scarce resource. Once spent, it will never come back. Say no. Do right things well.

2. Simplify. No shit is easy. But it need not be complicated either.

3. Better to keep moving than stand still.
    “I am moving forward with this plan until I hear otherwise.”

4. Barrels and Ammunition. Barrels take an idea from conception all the way to shipping and bring people with them. Be a Barrel.

5. Favor Action over perfection.

6. Invest in learning faster. Move fast and break things. Actively seek criticism to poke holes in your ideas.

7. Come from Data and Materiality. Remove emotions. In God I trust, all others bring data.

8. Work at the intersection of What you are Good At, Enjoy Doing, will Create Value for the World.

9. Be Effective
    Time most scarce resource
    Measure Results, Contributions not Effort
    Make Strengths Productive
    First things First
    Make decisions

10. Do not answer executive questions qualitatively. Answer with results, contributions.
Exec: “How is your hiring going?”
Bad Answer: “Good”
Good Answer: “Reviewed 10 resumes, phone screened 3, scheduled 1 onsite next week.”

11. Autonomy, Mastery, Purpose for the team

12. Push authority towards information

13. Control comes from Competency comes from Clarity

14. It is okay to not have all the answers. You never will.

15. Two biggest barriers to good decision making; ego and blind spots.

16. Build culture to take orders in the absence of orders.
Start Cultural Revolution:
    Keep what works
    Create shocking rules
    Incorporate ppl from other cultures and insert them at high levels within org
    Make decisions that demonstrate priorities

17. You don’t have to be liked in the short run, you DO have to be liked in the long run. Don’t shy away from being unpopular. Utilize throwing temper tantrums as a mode of communication.

18. Privilege, Luck, Skill, Hard work

19. Right Decisions come from Experience comes from Wrong Decisions

20. CEOs always act on leading indicators of good news, but only act on lagging indicators of bad news. Optimists most certainly do not listen to leading indicators of bad news. That is human nature.

21. There are only two ways in which a manager can impact an employee’s output: motivation and training. Providing both is the manager’s responsibility.

22. Manager’s output = Output of his org + Output of neighboring orgs under his influence

23. Design orgs using the communication channels as the skeleton. Design it for the people, not the managers.

24. Heighten the conflict where you see it. This will help bring up real issues and resolve instead of covering up.

25. The three areas in which all great leaders must excel: clarity of thought / communication, judgment about people, and personal integrity / commitment

26. Hire for world-class strengths, rather than lack of weaknesses.

27. Pain + Reflection = Progress.


Wednesday, October 1, 2014

My Car Buying Learnings

I recently went through a car buying ordeal and wanted to share some points that worked for me. Some of them were things that I learnt from reviewing the plethora of car buying advice on the internet and some of them were pure luck/thinking on my feet.

Which car to buy is another challenging decision to make. The points below make sense after you have narrowed down the car that you want to buy. 

1. Buy the Consumer Reports report on your vehicle(s). This is an under-$20 investment for a potential purchase of a few thousand dollars. So totally worth it. 

2. Get Pre-Approved on a loan, preferably from a Federal Credit Union (because they offer the lowest rates, per my research, at this time).

3. Ask for email quotes from multiple dealers. This way you can avoid visiting multiple dealerships and wasting time.

4. Focus on the out-the-door $$$. Ignore "free" throw-ins like tire locks, 2 years of free maintenance, etc. Recruit a buddy to constantly enforce this. I kept getting tempted to go down the 'throw-ins' path. Fortunately, my mother was my buddy who kept me from chasing the shiny object.

5. Pick a month when the month end is on a Monday or a Sunday. In hindsight, I believe that this double whammy really worked for me.

6. Reach the dealer to sign the deal on Sunday late afternoon, as they are in a hurry to pack up as well. 4 PM is ideal.

7. Be ready to walk away, if anything does not feel right.

8. Follow the Consumer Reports Car Buying Guide to the 'T'.

9. Negotiate on the Financing Rate based on #2 above.

10. Negotiate on the Extended Warranty $$$ if you are getting one. Pay for it outside the loan amount to lower the finance charge. I did not do this and, in hindsight, should have.

11. Ask to waive the Doc fees.

12. Pit one dealer against the other and ask each to provide a better offer. Do not feel bad about this. Have a buddy constantly remind you to not feel bad about this.

13. One learning that I found missing in all the research I did was that Buyers Remorse is for real and you should prepare for it. If you are, like me, financially conservative (my wife prefers the term "stingy"), then this will hit you after the deal is closed. Acknowledging it makes it easier to manage it.
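
The point in #10 about paying for the warranty outside the loan can be checked with a little arithmetic. Here is a minimal sketch using the standard amortized-loan payment formula. The price, warranty cost, rate and term below are made-up numbers for illustration only, not figures from my deal.

```python
def monthly_payment(principal, annual_rate, months):
    """Standard amortized-loan payment; annual_rate is a fraction, e.g. 0.03."""
    r = annual_rate / 12.0
    if r == 0:
        return principal / months
    return principal * r / (1 - (1 + r) ** -months)


def total_interest(principal, annual_rate, months):
    """Total finance charge paid over the life of the loan."""
    return monthly_payment(principal, annual_rate, months) * months - principal


# Hypothetical figures, for illustration only.
car_price = 25000.0
warranty = 1500.0
rate, term = 0.03, 60

interest_with = total_interest(car_price + warranty, rate, term)
interest_without = total_interest(car_price, rate, term)

# Extra interest paid just because the warranty was rolled into the loan.
extra_finance_charge = interest_with - interest_without
print(round(extra_finance_charge, 2))
```

Plug in your own numbers; the longer the term and the higher the rate, the more the rolled-in warranty costs.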

One litmus test I discovered for measuring how well you did is how annoyed the sales guy was when you were signing the deal. The value of the deal is directly proportional to the pissed-off factor!

Do you have any tips or experiences to share on car buying? How about on home buying? Feel free to share those in the comments.

Tuesday, September 16, 2014

Pycasa - Group duplicate files

A problem that I have been battling is how unorganized my photos and videos are. I have multiple backups taken from the various devices, which has resulted in multiple copies of images and videos. Recently, I started thinking about how to organize those with the following goals:


  • Remove duplicates
  • Organize by year and by event
  • Remove images that are out of focus or not relevant


After manually going through a few folders, I realized that it was going to be a very onerous task. So I started thinking about writing a program to identify duplicates. I recently watched a great video by David Beazley on Python generators and it was exactly what I needed for my purposes.

So I created this program that groups the duplicates together and prints a list for review.

import os
import time
import fnmatch
from collections import defaultdict

#http://szudzik.com/ElegantPairing.pdf
#This is used to combine size_of_file_in_bytes and last_modified into one number associated uniquely to both
#using a pairing function.
def elegant_pairing_fn(x,y):
    if x == max(x,y):
        return (x*x)+x+y
    else:
        return (y*y)+x

#generates a sequence of file names for the given search pattern starting from the top directory.
def gen_find(filepat, top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path,name)

#generates a sequence of dictionary objects with detailed file information.
def gen_stat(filelist):
    for name in filelist:
        dstat = (dict(zip(osstatcolnames,os.stat(name))))
        dstat['filename'] = name
        dstat['unique_id'] = elegant_pairing_fn(dstat['size_of_file_in_bytes'], dstat['last_modified'])
        yield dstat

osstatcolnames=('protection_bits', 'inode_number', 'device', 'number_of_hard_links'
          , 'user_id_of_owner', 'group_id_of_owner', 'size_of_file_in_bytes'
          , 'last_accessed', 'last_modified', 'created')

t0 = time.time()
jpgfiles = gen_find("*.jpg", "C:\\Users\\Shantanu\\Pictures")
jpgfilestats=gen_stat(jpgfiles)

#Now group the file names with the same unique_id value
#The algorithm assumes that the combination of file size and last modified date is unique per image file.
# Meaning it is almost impossible to have images that are different and yet have the same size and last modified date.
# This is a safe assumption as long as the files have not been modified using some photo editing software.
filecount = 0
jpgfilegroups = defaultdict(list)
for l in jpgfilestats:
    filecount += 1
    jpgfilegroups[l['unique_id']].append(l['filename'])

#Print groups with more than one file names in it.
#  We do not care to review files that appear only once in our stash.
count = 0
groupcount = 0
for k,v in jpgfilegroups.items():
    groupcount += 1
    if len(v) > 1:
        count += 1
        print(count, k, v)
print ("Total files processed: ", filecount)
print ("Total file groups processed: ", groupcount)
print ("Total time taken: ", time.time() - t0)


On my laptop, the program processed ~11K files in 2.8 seconds and came up with 2689 groups of possible duplicates. Manually reviewing those 2689 groups is a much simpler task.

The algorithm assumes that the combination of file size and last modified date is unique per image file. Meaning it is almost impossible to have images that are different and yet have the same size and last modified date. This is a safe assumption as long as the files have never been modified using some photo editing software (which is conveniently true for my case).
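
If you want to confirm a group before deleting anything, one option is to compare the actual file contents. The sketch below is not part of the program above. It is a hedged add-on that uses hashlib from the standard library, and the function names file_digest and confirm_duplicates are my own placeholders.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def confirm_duplicates(paths):
    """Split one candidate group into sub-groups of byte-identical files."""
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[file_digest(p)].append(p)
    # Keep only digests shared by two or more files.
    return [group for group in by_digest.values() if len(group) > 1]
```

It could be called on each list in jpgfilegroups that has more than one file name. Hashing reads every byte, so it is much slower than os.stat; that is why it makes sense only as a second pass over the candidate groups.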

I learnt some new tools as a result of this exercise:

  • A practical use of Python generators for scratching my own itch.
  • The Python os library and its functions
  • The mathematical concept of 'pairing functions' to uniquely represent two numbers as one.
  • EDIT 20141020: I just realized that jpgfilegroups is actually a hash table using chaining for collision resolution.
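
To convince myself that the pairing function really is unique, here is a small sketch that also inverts it. elegant_pair uses the same two branches as elegant_pairing_fn in the program; elegant_unpair is my own addition, assuming non-negative integers and Python 3.8+ for math.isqrt.

```python
import math


def elegant_pair(x, y):
    """Szudzik's pairing function for non-negative integers (same branches as the post)."""
    return x * x + x + y if x >= y else y * y + x


def elegant_unpair(z):
    """Invert elegant_pair: recover the unique (x, y) that produced z."""
    s = math.isqrt(z)          # integer square root, Python 3.8+
    rem = z - s * s
    return (rem, s) if rem < s else (s, rem - s)


# Every pair in a small grid maps to a distinct number and round-trips.
pairs = [(x, y) for x in range(50) for y in range(50)]
codes = {elegant_pair(x, y) for x, y in pairs}
print(len(codes) == len(pairs))
print(all(elegant_unpair(elegant_pair(x, y)) == (x, y) for x, y in pairs))
```

Both lines print True: no two (size, last modified) pairs in the grid share a unique_id, and each id can be turned back into its pair.
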
What problems have you solved using these tools? Feel free to use this code to organize your photos and videos.

Monday, September 15, 2014

My Latest Laptop

I wrote the first half of this post in 2011 with an update in February 2014.

After 7 years of superb performance from my IBM Thinkpad T40, it finally died of a power connection break inside the Motherboard. After doing much research, I decided to stick to Lenovo instead of changing brands.

I have noticed, in my own behavior and in that of some of my friends, that one doesn't switch laptop brands that easily.

So here is what I ended up with:
Lenovo Z570
Second Generation i7 (2.0 GHz)
8 GB RAM
240 GB OCZ Vertex 3 SATA 3 6GBPS SSD
700 GB 5400 RPM WD HDD
External Optical Drive

Here is how much it cost me (including taxes & shipping):
1. $10.59 - For 3 new Phillips-head screwdrivers
2. $32.85 - SATA - ESata connector cable for External Optical Drive
3. $51.65 - Optical Drive HDD Caddy
4. $742.69 - Lenovo Z570 Laptop
5. $49.99 - Acronis True Image Home
6. $459.99 - OCZ Vertex 3 SSD 240 GB
For a total of - $1347.76

That's the price I paid for a laptop with:
1. Almost 1 TB of total disk space
2. Super Fast SSD Performance (~20 Second Boot times)
3. Security of backups given the instability of SSDs!
4. External Optical Drive that is not taking up space in the laptop.

Update 20140205: Return of the IBM Thinkpad T40.
First, an update on the Z570. It has been working great. Like Jeff Atwood of Coding Horror fame says "A solid state hard drive is easily the best and most obvious performance upgrade you can make on any computer for a given amount of money. Unless your computer is absolute crap to start with.". It is well worth the price. Also, I have been fortunate that I have not had any catastrophic SSD failures yet.

Going back to the T40. I am glad that I did not throw it away. I was able to move most of the working parts from this T40 to another one that the IT department at work was trashing. I moved the HDD, Internal Wireless Card, Internal Bluetooth Card, RAM to the new chassis.

In turn, I learnt a lot about the inside of a laptop. Most of it is like a jigsaw puzzle. The parts are built in such a way that they fit in perfect slots, mostly. The biggest challenge I had was keeping track of all the screws. Every time I opened the laptop, I ended up with a few screws that I could not figure out where I took them out from!

One change that I did make was to move to Lubuntu 12.04. With Windows XP end of life in April 2014, and knowing the fact that the T40 hardware is too weak for Windows 7 and above, I decided to switch. And that was a smart move.

By switching to Lubuntu, I have extended the life of my T40 by at least another 2 years.

Now all I need is a new battery pack, 1 GB of RAM and a 60 GB SSD (SATA 2 will be fine). I will then put the current HDD into the optical drive bay and make the SSD the master. I should end up with the meanest T40 out there!

Do you have any insights to share on ways to alter laptops to make them more useful from a practical point of view? 

Saturday, August 9, 2014

Python GroupBy, Map & Reduce

I came across a really interesting data mangling technique while watching this presentation on advanced Python programming techniques.

Here is the example from this talk. Suppose you have a list of dictionary objects, sorted by the 'id' key, my_list as defined below.
>>> my_list = [
    {'id':1, 'name':'raymond'},
    {'id':1, 'email':'ray@spkrbar.com'},
    {'id':2, 'name':'sue'},
    {'id':2, 'email':'sue@sally.com'}] #sorted

Using dict, groupby, map and reduce, there is a very elegant way of grouping all of those dictionaries by the 'id' to get the following list:

[
  {'id': 1, 'email': 'ray@spkrbar.com', 'name': 'raymond'}, 
  {'id': 2, 'email': 'sue@sally.com', 'name': 'sue'}
]

And here is how you do it:
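>>> from functools import reduce  # built in on Python 2, must be imported on Python 3
>>> from itertools import groupby
>>> [dict (
    reduce(lambda y,z: y + z,
        map(lambda x: list(x.items()), v)  # list() so that '+' also works on Python 3
    )
)
for k, v in groupby(my_list, key=lambda x: x['id'])]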

Notice how much this is 'SQL' like. Let us break this statement down to its individual components to understand what is happening under the covers. Like any SQL statement, we have to start deciphering it inside out.

Let us look at the for loop with the groupby operation in it. The groupby operation will return an iterator grouping by the 'key' parameter. In this case, it is an anonymous function that returns the 'id' value. Essentially, we are asking for the grouping to happen using the 'id' values (1, 2 etc.).

>>>print({k:list(v) for k,v in groupby(my_list, key=lambda x: x['id'])})

{
 1: [{'id': 1, 'name': 'raymond'}, 
     {'id': 1, 'email': 'ray@spkrbar.com'}], 
 2: [{'id': 2, 'name': 'sue'}, 
     {'id': 2, 'email': 'sue@sally.com'}]
}

Note that the 'id' values are the keys and they are also repeated as part of the values. This will come in handy for the next step.
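
One thing worth spelling out is why my_list has to be sorted by 'id'. groupby only merges neighbouring items that share a key. Here is a small sketch, with a deliberately shuffled copy of the data (unsorted_list is my own name, not from the talk), showing what happens otherwise:

```python
from itertools import groupby
from operator import itemgetter

# Same records as my_list, but deliberately out of order.
unsorted_list = [
    {'id': 1, 'name': 'raymond'},
    {'id': 2, 'name': 'sue'},
    {'id': 1, 'email': 'ray@spkrbar.com'},
    {'id': 2, 'email': 'sue@sally.com'},
]

# groupby only merges *adjacent* items with equal keys, so id 1 shows up twice here.
keys_unsorted = [k for k, _ in groupby(unsorted_list, key=itemgetter('id'))]

# Sorting by the same key first restores one group per id.
by_id = sorted(unsorted_list, key=itemgetter('id'))
keys_sorted = [k for k, _ in groupby(by_id, key=itemgetter('id'))]

print(keys_unsorted)  # [1, 2, 1, 2]
print(keys_sorted)    # [1, 2]
```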

Next, we map the anonymous function, which calls the items() method for the parameter passed in, over each of the groups returned from the groupby operation.

>>> for k, v in groupby(my_list, key=lambda x: x['id']):
...     print(list(map(lambda x: list(x.items()), v)))
...
[[('id', 1), ('name', 'raymond')], [('id', 1), ('email', 'ray@spkrbar.com')]]
[[('id', 2), ('name', 'sue')], [('id', 2), ('email', 'sue@sally.com')]]

This gives us lists of tuples instead of dictionaries, which makes the reduction step very easy.

Now, we reduce the list that comes out of the mapping step by doing a simple addition of lists. Addition over two lists results in a list with elements from both. We would not have been able to use the '+' operator if these were dictionaries instead. Notice also the duplicate 'id' tuple.
>>>print(reduce(lambda y,z: y + z, [[('id', 1), ('name', 'raymond')], [('id', 1), ('email', 'ray@spkrbar.com')]]))
[('id', 1), ('name', 'raymond'), ('id', 1), ('email', 'ray@spkrbar.com')]

The reduce step will also happen for the list for 'id' 2 in this example.

We are almost at the end now. The last step is to make a dictionary from the list coming out of the reduce step to remove duplicates and conform back to the input which was a list of dictionaries.

>>>print(dict([('id', 1), ('name', 'raymond'), ('id', 1), ('email', 'ray@spkrbar.com')]))
{'id': 1, 'email': 'ray@spkrbar.com', 'name': 'raymond'}

Note that the duplicate 'id' tuple got removed. As before, this step will also happen for the list for 'id' 2.

That is it! Now, we have what we need: a list of dictionary objects, grouped by 'id'.

[
  {'id': 1, 'email': 'ray@spkrbar.com', 'name': 'raymond'}, 
  {'id': 2, 'email': 'sue@sally.com', 'name': 'sue'}
]
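
For anyone on Python 3, where reduce lives in functools and items() returns a view that does not support '+', here is a variant sketch of the same idea that merges the dictionaries directly. The merge helper is my own, not from the talk.

```python
from functools import reduce   # reduce is no longer a builtin in Python 3
from itertools import groupby
from operator import itemgetter

my_list = [
    {'id': 1, 'name': 'raymond'},
    {'id': 1, 'email': 'ray@spkrbar.com'},
    {'id': 2, 'name': 'sue'},
    {'id': 2, 'email': 'sue@sally.com'},
]  # sorted by 'id'


def merge(acc, d):
    """Return a new dict holding the keys of both arguments."""
    merged = dict(acc)
    merged.update(d)
    return merged


# One merged dictionary per run of records sharing the same 'id'.
grouped = [
    reduce(merge, group, {})
    for _, group in groupby(my_list, key=itemgetter('id'))
]
print(grouped)
```

The result compares equal to the list above; only the order in which the keys are printed may differ between Python versions.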

What are some of the practical uses you see for this technique? Do you have any other slick trick to share? Let me know in the comments below.

Friday, September 14, 2012

How a Sub Registrar Officer does corruption

I have been following the India Against Corruption movement for the past 6 months. This movement inspired me to blog about my experience with the Sub Registrar Office in Pune, India, where I witnessed corruption firsthand.

The Right To Information Act has made it a little difficult for corruption to go unnoticed. However, one of the biggest contributors to corruption is public ignorance. Let me spell out the details of how the events unfolded and explain this point.

I was trying to give my Father the Power Of Attorney so that he could do transactions on my behalf in Pune. In order to do that, one has to type up a 5-6 page document that states, in legal jargon, that you are giving the other person the right to act on your behalf. There are many templates floating around that are used for this. Since one of my distant relations happens to have a few lawyers on his staff, we approached them for some legal advice. Naturally, they got us the template, we filled in the blanks, and now we had a document ready.

This document really doesn't mean a lot until it has been registered by the State Department of Registration and Stamps.

Now, we knew that we had to get this document registered. So we asked the lawyers at our disposal how to proceed. And this is where our ignorance got the best of us. The lawyers immediately referred us to their "agent" who "knows" the process and will "get things done" for us. Obviously for a nominal fee!

The Department of Registration and Stamps has a website with instructions on what is required to get your documents registered. If the lawyers had pointed us to this link, we would have been saved.

Anyway, we contacted the agent and asked him what was a good time to meet him and set up the "terms" of our transaction. It was decided that we should come to the Sub Registrar Office where he "works" the next day and he would have our job done within a few hours. Ignorant fools that we were, we fell for it.

So, this is how it works. This "agent" is sometimes a lawyer by education. We are not talking about the cream of the crop here, definitely. This guy has managed to get the LLB degree and is using a little miscommunication and a little bribery to get an "edge". So what is the "edge" you ask? Let me elaborate.

The department website allows you to make an online appointment for your business at the office. Once you have a token, you are expected to appear, with all documents and personnel, 30 minutes before your time of appointment and the rest will take care of itself. The Sub Registrar Officer, however, has the authority to book time-slots from his terminal. And this is how he can provide an "edge" to the "agents". For some money in cash, he hands out time-slots to the "agents". The "agents" now sell those time-slots to their customers by telling them when they should come to the office and be gone once complete. The reason the "agent" has an LLB degree is that you can substitute one lawyer in lieu of two witnesses for the registration. Nowhere on the website does it say that it is okay to bring in one lawyer instead of two witnesses. This "one lawyer" clause is written on a board in the Sub Registrar Office.

So, by selling appointments, that are available online for free, the Sub Registrar Officer is able to make a few bucks on the side.

It is not often that people go to the Sub Registrar Office. So, the attitude they have is that they might as well pay a few extra rupees and be done with it. However, in my experience, I did not get any "edge", and it would have been much better if I had used the website.

The lessons I learnt from this experience are:
  • Check the government websites on what is needed for your work.
  • Don't follow the crowd by approaching an "agent".
  • Right To Information Act is a very powerful tool; use it to your advantage.
What experiences have you had that would help others avoid falling into such ignorance traps?

Tuesday, May 8, 2012

UMass Boston CS Alumni Speech 2012

I was invited as the speaker at the 2012 Alumni Party held by the CS department at UMass Boston.
It was an honor and a privilege to speak to the wonderful audience.
Here is the presentation.

Wednesday, March 3, 2010

Peter Norvig's Spelling Corrector in VB

Some Background: The idea behind this implementation was not to build the shortest or the fastest version of the spelling corrector. I looked at the list of languages that this was implemented in and found VB .NET missing. So I decided to fill that void. I also thought of it as a way to unite the cult of VB .NET programmers with the others.
(!!!Nobel Peace Prize nomination here please!!!)

More importantly, I wanted to spell out each of the steps to make it easier to understand the concept. I also wanted to use native libraries so that you can dive into the spelling corrector concepts quickly without first having to learn other technologies like LINQ etc.

Please feel free to post any comments, suggestions for improvement and any bugs you find.

Copy the text below into a VB class file and get the BIG.txt file (http://norvig.com/big.txt).

' VB .NET Implementation of Peter Norvig's Spelling Corrector.
' VERSION 1.0, last updated 03 Mar 10.
'
' Peter Norvig's original article located at
' http://norvig.com/spell-correct.html
'
' Copyright (c) 2010 Shantanu Inamdar
'
' Permission is hereby granted, free of charge, to any person obtaining a copy
' of this software and associated documentation files (the "Software"), to deal
' in the Software without restriction, including without limitation the rights
' to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
' copies of the Software, and to permit persons to whom the Software is
' furnished to do so, subject to the following conditions:

' The above copyright notice and this permission notice shall be included in
' all copies or substantial portions of the Software.

' THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
' EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
' MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
' IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR
' ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
' CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
' WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


' Some Background: The idea behind this implementation was not to build the shortest
' or the fastest version of the spelling corrector.
' I looked at the list of languages that this was implemented in and found VB .NET
' missing. So I decided to fill that void. I also thought of it as a way to unite
' the VB .NET programmers with the others.
' (!!!Nobel Peace Prize nomination here please!!!)
'
' More importantly, I wanted to spell out each of the steps to make it easier to
' understand the concept. I also wanted to use native libraries
' so that you can dive into the spelling corrector concepts quickly
' without first having to learn other technologies like LINQ etc.
'
' Please feel free to post any comments, suggestions for improvement and any bugs
' you find.

Imports System.Collections.Generic

Public Class SimpleSpellingCorrector

Private Const COMMAASCII As Integer = 44
Private Const ALPHABET As String = "abcdefghijklmnopqrstuvwxyz"
Private nWords As New Dictionary(Of String, Integer)


Public Function getCorrection(ByVal word As String) As String
Dim c As String = String.Empty
Dim maxc As Integer = -1
Dim wc As Integer = 0
Dim candidates As System.Collections.Generic.List(Of String)

'Train the model with word occurrences in our "dictionary"
nWords = getModel()

'Choose the most probable word with the shortest edit distance
For ed As Integer = 0 To 2
'If we have found a correction, exit loop
If String.Empty <> c Then
Exit For
End If

'Otherwise, start over
c = String.Empty
wc = 0
maxc = -1
candidates = getCandidates(word, ed)
For Each cd As String In candidates
wc = getWordCount(cd)
If wc > maxc Then
maxc = wc
c = cd
End If
Next cd
Next ed

'If no match is found, just send the same word back!
If String.Empty = c Then
c = word
End If

Return c
End Function

'Get the count of how often the word is found in our "dictionary"
'Return 1 for a "new" word
Private Function getWordCount(ByVal word As String) As Integer
If nWords.ContainsKey(word) Then
Return nWords.Item(word)
Else
Return 1
End If
End Function


'Get the big.txt file from http://norvig.com/big.txt
Private Function getModel() As Dictionary(Of String, Integer)
Dim model As New Dictionary(Of String, Integer)
For Each f As System.Text.RegularExpressions.Match In System.Text.RegularExpressions.Regex.Matches(System.IO.File.ReadAllText("big.txt").ToLower(), "[a-z]+", System.Text.RegularExpressions.RegexOptions.Compiled)
If model.ContainsKey(f.Value) Then
model.Item(f.Value) += 1
Else
model.Add(f.Value, 1)
End If
Next f

Return model
End Function

'Get Candidate words that are at the given edit distance
Private Function getCandidates(ByVal word As String, ByVal edits As Integer) As List(Of String)
Dim c As New List(Of String)
Select Case edits
Case 0
Dim wl As New List(Of String)
wl.Add(word)
c.AddRange(getKnown(wl))
Case 1
c.AddRange(getKnown(getEdits1(word)))
Case 2
c.AddRange(getKnownEdits2(word))
Case Else
c.Add(word)
End Select
Return c
End Function

'Get words that are at an edit distance of 1
Private Function getEdits1(ByVal word As String) As List(Of String)

Dim e1 As New List(Of String)
Dim splits As New List(Of String)

'Create a list of comma separated tuples of all possible ways to split the word
For i As Integer = 0 To word.Length
splits.Add(word.Substring(0, i) & "," & word.Substring(i))
Next i

For Each s As String In splits
'Deletes
If String.Empty <> s.Split(Chr(COMMAASCII))(1) Then
e1.Add(s.Split(Chr(COMMAASCII))(0) & s.Split(Chr(COMMAASCII))(1).Substring(1))
End If

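'Transposes: swap the first two characters of the right-hand part
If s.Split(Chr(COMMAASCII))(1).Length > 1 Then
e1.Add(s.Split(Chr(COMMAASCII))(0) & s.Split(Chr(COMMAASCII))(1).Substring(1, 1) & s.Split(Chr(COMMAASCII))(1).Substring(0, 1) & s.Split(Chr(COMMAASCII))(1).Substring(2))
End If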


'Replaces
For Each c As Char In ALPHABET.ToCharArray
If String.Empty <> s.Split(Chr(COMMAASCII))(1) Then
e1.Add(s.Split(Chr(COMMAASCII))(0) & c & s.Split(Chr(COMMAASCII))(1).Substring(1))

End If
Next

'Inserts
For Each c As Char In ALPHABET.ToCharArray
e1.Add(s.Split(Chr(COMMAASCII))(0) & c & s.Split(Chr(COMMAASCII))(1))
Next
Next s

Return e1
End Function

'Get Known words that have edit distance of 2
Private Function getKnownEdits2(ByVal word As String) As List(Of String)
Dim ke2 As New List(Of String)
For Each e1 As String In getEdits1(word)
For Each e2 As String In getEdits1(e1)
If nWords.ContainsKey(e2) Then
ke2.Add(e2)
End If
Next e2
Next e1
Return ke2
End Function

'Get Known words; get rid of unknown words
Private Function getKnown(ByRef words As List(Of String)) As List(Of String)
Dim k As New List(Of String)
For Each w As String In words
If nWords.ContainsKey(w) Then
k.Add(w)
End If
Next w
Return k
End Function

End Class
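
As a cross-check for getEdits1, here is a short Python sketch of the same four edit types (deletes, transposes, replaces, inserts), written in the spirit of the original article. It is a simplified illustration, not a copy of Peter Norvig's code, and it keeps duplicates in a list just as the VB version does.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one edit away from word (duplicates kept, like the VB list)."""
    # Every way to cut the word into a left part a and a right part b.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return deletes + transposes + replaces + inserts


candidates = edits1("teh")
print(len(candidates))        # 54 * 3 + 25 = 187
print("the" in candidates)    # True, via a transpose
```

For a word of length n there are n deletes, n-1 transposes, 26n replaces and 26(n+1) inserts, i.e. 54n+25 candidates, which is a handy number to compare against the size of the list returned by the VB function.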

Thursday, January 7, 2010

If it's not personal, it's just business!

As Sollozzo is explaining the attack on Vito Corleone to Michael, he says that it is not personal, it's business. What needs to be clear here is that it is in regards to the reaction. But the action that caused the reaction was very personal to Sollozzo.

But I find it often used in an incorrect context of the actual action.

What does this have to do with software development? We had a situation a few weeks back where a project had two people managing/leading it. And it didn't seem to make progress as was expected. I asked one of the leaders, who is also my colleague, if it was imperative for her to have this project done. And the reply was a reluctant no. That's when I knew that the project was headed nowhere.

A few years back, I was working on a project and discussing a complex scenario with my customer. After discussing possible solutions, she asked how in the world was this going to get done, given the resources and time. Then she said something that moved me to the guts. She said that she was so dependent on this project that she would be completely stranded if it didn't happen. I was so motivated to get the project done that I did everything that I could to take it to successful completion.

That's what I mean when I say it has to be personal. If a project is not personal for at least one of the involved parties, it is never going to reach completion. That is the difference in commitment between Founders and Employees.

So, if it is not personal, it is just business. And if it is not personal, good luck getting it to successful completion. (But of course, the definition of 'successful completion' is subjective!)

Wednesday, March 25, 2009

Simple Database Tuning for SQL Server

Simple Database Tuning for SQL Server 2000:
When you Google for database slowness issues, you get plenty of hits to give you grey hair reading through all of them. After going through a number of them, and reading through some great SQL Server books, here is another addition to that lock of grey hairs!
Most of the articles that I have read online assume that you have taken care of the basic stuff and are now hunting for ways to fine-tune your installation further. But I am not going to focus on “fine” tuning; rather on the basics.

I agree that SQL Profiler is a very powerful tool, and I am a big fan of it myself. But before you go near it, you should have some basics covered. So here goes nothing:

1. Primary Keys are a must: You will be surprised at how many “underground” installations are out there that don’t even follow this basic rule. Then these self-taught DBAs run the SQL Tuning Advisor tool, which asks you to build a non-clustered index on a column which is supposed to be the primary key! So, please, make sure ALL your tables have primary keys. (FYI: A table without a clustered index is technically called a heap; by default, defining a primary key creates a clustered index.)

2. 80-20 Rule: 80% of the database problems lie with 20% of the tables! SQL Server is a very robust piece of software and will take abuse from bad design, lack of indexing, etc. up to a limit. Then, it just grinds to an agonizing halt. To spare it from this fate, run the following set of commands:

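-- Correct any inaccurate row and page counts before measuring.
DBCC UPDATEUSAGE('{your database name here}')

-- Holds one row of sp_spaceused output per table.
CREATE TABLE #DbTableSizes
(DbTableSizesId INT IDENTITY (1,1)
, TableName nvarchar(128) -- wide enough for any table name
, [Rows] BIGINT
, Reserved varchar(50)
, Data varchar(50)
, IndexSize varchar(50)
, Unused varchar(50))

-- sp_MSforeachtable is undocumented; '?' is replaced with each table name.
INSERT INTO #DbTableSizes (TableName, [Rows], Reserved, Data, IndexSize, Unused)
EXEC sp_MSforeachtable @command1='EXEC sp_spaceused ''?'''

-- Largest tables first.
SELECT * FROM #DbTableSizes
ORDER BY [Rows] DESC

DROP TABLE #DbTableSizes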

This will give you a list of the tables that are most likely to be in the 20% that are causing problems. Those with the most rows are the ones to focus on. If you have tables with a large amount of data (meaning a large number of columns), but not really a huge number of rows, focus on those as well.

3. Now start targeting tables in this list, one by one. Make sure these have primary keys defined.

4. Using the following query, find out what stored procedures use these tables. You can also use the Action -> All tasks -> Display Dependencies menu option for this.
SELECT OBJECT_NAME(id) AS [Name]
FROM syscomments
WHERE [text] LIKE '%{table name here}%'
AND OBJECTPROPERTY(id, 'IsProcedure') = 1
GROUP BY OBJECT_NAME(id)

5. Now, from within each of the stored procedures and functions, get the queries that are using the table under review. You should create a list of all unique WHERE clauses used on this table.

6. For each of those WHERE clauses, look at the Query Plans to check that none of the following evils are present:
a. Table Scans
b. Clustered Index Scans
c. Parallelism, unless you know you have multiple CPUs on your server
d. Bookmark Lookups, unless you know exactly why this is happening

7. Modify/Build indexes on the table so that you rid the query plans of the above evils.

This sounds like a very tedious process, and it is. But the benefits are pretty big. You will see a significant improvement in database performance.
If you want to take it a step further, try to build a script that will do all the data collection (Steps 1 through 6) for you, so that you can review the report periodically. (Maybe I will work on that as a side project.)
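
To make Steps 6 and 7 concrete, here is a small runnable sketch of the before-and-after check. It is not SQL Server: it uses SQLite through Python's sqlite3 module as a stand-in because it runs anywhere, and SQLite's EXPLAIN QUERY PLAN output is far simpler than a SQL Server query plan. The Orders table, the IX_Orders_CustomerId index and the plan_details helper are all hypothetical names made up for the illustration.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the descriptive (last) column of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL)"
)
conn.executemany(
    "INSERT INTO Orders (CustomerId, Total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# One of the "unique WHERE clauses" collected in Step 5.
query = "SELECT OrderId, Total FROM Orders WHERE CustomerId = ?"

# Step 6: look at the plan; without an index this is a full scan of the table.
before = plan_details(conn, query, (42,))

# Step 7: build an index on the filtered column and look at the plan again.
conn.execute("CREATE INDEX IX_Orders_CustomerId ON Orders (CustomerId)")
after = plan_details(conn, query, (42,))

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies by SQLite version, but the first plan should report a scan of Orders and the second a search using the new index. The same compare-the-plan habit applies when reading SQL Server plans for table scans and clustered index scans.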

Advanced Database Tuning for SQL Server 2000:
Once you have taken care of the basics, let us suppose you still find that the database performance is slow. I will point you to a very interesting methodology by Itzik Ben-Gan in chapter 3 of the book Inside Microsoft SQL Server 2005: T-SQL Querying.
I have used most of the techniques in this chapter and they have helped me to dig much deeper into the slowness/high CPU usage issue that we experienced with one of our production installations.

Thursday, May 8, 2008

Stocks and Employees

A random thought just popped into my mind about how investing in stocks makes you a better employee. The Jim Cramer (http://www.thestreet.com) style of Buy and Homework instead of Buy and Hold really drives the point home.

When you have money invested in a stock, you should be spending at least one hour per week doing homework on it, which includes reading all articles published on that stock by management, analysts and media.

I think you should apply the same philosophy to the company that employs you. Spend at least an hour or so googling your company every week. Believe me, it will be worthwhile. Make sure you do it on your own time though.

So why does this make sense to me? Well, if you look at it, your money is invested in your employer in multiple forms: your future paychecks, benefits, 401(k), etc. So when your livelihood is "invested" in your employer, it only makes sense that you do your homework.

Look at your company in the same way you would look at any other that you own stock in.
- Who are the competitors?
- How is it doing?
- How is it doing against the competitors?
- What are the analysts saying?
- What is management saying?
- What is the media saying?

Here are a few additional questions to ask as an employee:
- Is your job aligned with the company strategy?
- Does your job have an impact on the stock price (or the bottom line for private firms)?
- Is management looking at your position/department as an important part of the company strategy?

Unless you are on top of your homework on this big position that you have in your life's portfolio, you leave yourself vulnerable to the bears!

Monday, April 21, 2008

Anticipation - Key to Success Part I

I am collecting examples of why Anticipation is the key to success. In this multi-part series, I analyze examples that I come across which emphasize the point that the better you anticipate, the more successful you get. Of course this sounds like a no-brainer, but it is interesting to observe how this appears in our "regular" life!

1. Star Wars Episode I - Over the past few weeks, the Star Wars movies have come back to life on TV and so I was catching up on them. In this classic movie, Qui-Gon Jinn says the following about Anakin Skywalker's Podracing skills "He seems to be able to see into the future. That is why he is the only human who can Podrace." And this statement got me thinking! What the Jedi Knights possess is strong anticipation.

Of course we know that nobody can "see" the future. But the closest you can get to it is by anticipating based on the present situation and past experiences.

2. Cricket - When you first start playing as a batsman, you tend to start off with reacting to what the bowler is bowling at you. So you are in the reaction mode, and hence one step behind.

As your experience grows, you start paying attention to the grip of the bowler to try and anticipate the movement of the ball. This increases the amount of time you have to react to the ball.

But the cricket greats like Sir Don Bradman, Sachin Tendulkar and Brian Lara take it to the next level. They have already moved past the stage of paying attention to the grips and the other visible cues. They reach the ultimate stage where they can anticipate what the bowler is "thinking" and get it right most of the time. That is what propels them to greatness!

Coming soon are some observations from Stock Picking and American Football.

Technical Interview Questions List

I know that there is a plethora of websites/blogs etc. that will list technical interview questions with answers. My focus here is to keep a running list of questions as they pop up in my mind.


General Logic/Programming.

1. How would you reverse a string most economically (Space and Time)? How would you test this?

2. How would you program a Fibonacci Series/Factorial? Recursively and Non-recursively? What are the pros and cons of both approaches?

3. What is the difference between passing parameters by value and by reference to a method?

4. Do you know pointers? What do they mean and represent?

5. What is an Interface in OO parlance?
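
For questions 1 and 2, this is roughly the kind of answer I would hope to see, sketched in Python. It is one possible answer, not the only acceptable one. The function names are made up for the illustration, the string is handled as a list of characters because Python strings are immutable, and the Fibonacci series is assumed to start at F(0) = 0 and F(1) = 1.

```python
def reverse_in_place(chars):
    """Reverse a list of characters in place: O(n) time, O(1) extra space."""
    left, right = 0, len(chars) - 1
    while left < right:
        chars[left], chars[right] = chars[right], chars[left]
        left += 1
        right -= 1
    return chars


def fib_recursive(n):
    """Naive recursion: mirrors the definition but takes exponential time."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_iterative(n):
    """Loop with two running values: O(n) time and O(1) extra space."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# The "how would you test this?" part: empty, single, odd and even lengths.
for text in ["", "a", "abc", "abcd"]:
    assert "".join(reverse_in_place(list(text))) == text[::-1]

print("".join(reverse_in_place(list("hello"))))
print([fib_iterative(i) for i in range(10)])
```

The recursive version is easier to read but repeats work and can hit the recursion limit; the iterative version avoids both at the cost of looking less like the definition.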



.NET Specific.

1. Difference between Overrides and Overloads.

2. Difference between Overrides and Shadows. (This is not easy)

3. What is the difference between a Function and Sub?



SQL SERVER 2000 Specific.

1. What is the difference between the following SQL Statements:
a. SELECT @MyVar = Column1 FROM MyTable
b. SET @MyVar = (SELECT Column1 FROM MyTable)


Java Specific.



Others.

1. What is the most technically challenging project you have ever worked on?

2. What is the work that you are most proud of? Why?

3. What is your favourite Dilbert Cartoon?

4. Why do you want to work here?

5. Why are you looking?

6. Which is your favourite Programming Language? Why? Second favourite?

7. Which is your favourite Database? Why? Second favourite?

8. What positive feedback did you receive in your last review? What were some of the areas to improve?