Sorting variable length strings in lexicographic order C++

Posted: December 18, 2014 in Programming
Tags: , , , , ,

On an occasion i had a file with words listed one by one and i wanted to sort the file in lexicographic order. Since i work in linux machines , its just 1 command . `cat filename | sort ` to sort it in lexicographic order. But i wanted to do it in c++.  I tried googling for suggestions and found out that radix and bucket sort would help me out. And then i tried implementing the concept of Radix MSD to arrive the result. Sharing the code. I have added inline comments in the program. Hope it helps  (Note: i just wanted to arrive with the solution.. Not much concerned about efficiency of the program :-))

#include <iostream>
#include <map>
#include <vector>
#include <fstream>

using namespace std;

int main()
{

	vector<string> words;

	map<char,vector<string> > bucket;
	map<char,vector<string> > ::iterator it;
	bucket.clear();
	vector<string> tempVec;

	// initializing map with null vector and empty for variable length string
	bucket[' '] = tempVec;
	char ch;
	//for(int i=97; i<123; i++)
	for(int i=65; i<91; i++)
	{
		ch = i;
		bucket[ch] = tempVec;
	}

	words.push_back("jump");
	words.push_back("sit");
	words.push_back("carry");
	words.push_back("run");
	words.push_back("site");
	words.push_back("runner");


	/*cout << "init" << endl;
	  for(int k=0; k<words.size(); k++)
	  cout << words[k] << "  ";
	  cout << endl;*/

	// finding the maximum character
	int max = 0;
	for(int i=0; i<words.size(); i++)
	{
		max = (max > words[i].length())?max:words[i].length();
	}

	// loop from max char till first char 
	for(int i=max-1; i>=0; i--)
	{
		//cout << "Running for Phase " << i << endl;
		// loop through the entire vector
		for(int j=0; j<words.size(); j++)
		{
			string word = words[j];
			int temp_len = word.length();
			// there may be possibility that word boundary is less than max
			if(temp_len < i)
			{
				// push into the "" bucket
				//cout << "Pushing " << word << " into bucket " << "[empty]" << endl;
				bucket[' '].push_back(word);
			}
			else
			{
				// index exists .. Hence check the char
				char temp_char;
				temp_char = word[i];

				// insert into map with temp_char, word
				bucket[temp_char].push_back(word);	
				//cout << "Pushing " << word << " into bucket " << temp_char << endl;

			}


		}
		// end of this loop, all the words are inserted into corresponding buckets
		// clear the input vector -- because we are going to change the order
		words.clear();

		// iterate through the keys and write the order into input vector
		for(it=bucket.begin(); it!=bucket.end(); it++)
		{
			tempVec = it->second;
			for(int k=0; k<tempVec.size(); k++)
				words.push_back(tempVec[k]);

			tempVec.clear();
		}

		/*cout << "After bucket insertion words are  " << endl;
		  for(int k=0; k<words.size(); k++)
		  cout << words[k] << "  ";
		  cout << endl;
		 */


		// now we have the correct order of the phase .. hence clear the bucket
		tempVec.clear();
		for(it=bucket.begin(); it!=bucket.end(); it++)
		{
			it->second.clear();
		}

		//cout << endl;


	}
	cout << "Answer is " << endl;
	for(int k=0; k<words.size(); k++)
		cout << words[k] << endl;


	return 0;
}

[ALGORITHM]
1. Read the words and push into vector
2. find the max value by iterating the vector and finding length of all words
3. Iterate from max to 0
4. take each words first letter and push into corresponding bucket
5. Iterate through the buckets and save it as input word list order
6. After 0th Iteration , input vector will have the result.

[EXPLANATION]

step – 1
words pushed into vector

jump, sit, carry, run, site , runner

step – 2

finding the maximum length string
it would be runner – 6

step – 3

Iterating from index 5 to 0

ITERATION – 5 :
Note : if the word length is less than index , we need to push in empty bucket(‘ ‘)

‘ ‘ – jump,sit,carry,run,site
r – runner

Hence, array order will be – jump, sit, carry, run, site

ITERATION – 4 :

4th index of the words carry and runner are [y] and [e]

‘ ‘ – jump, sit, run, site
e – runner
y – carry

Hence, array order will be – jump, sit, run, site, runner, carry

ITERATION – 3 :

3rd index for the words jump, carry, site, runner [p],[r],[e],[n]

‘ ‘ – sit, run
e – site
n – runner
p – jump
r – carry

Hence array would be sit, run, site, runner, jump, carry

ITERATION – 2 :

2nd index for the words sit, run , site, runner, jump, carry are [t],[n],[t],[n],[m], [r]

m – jump
n – run, runner
r – carry
t – sit, site

Hence array would be jump, run, runner, carry, sit, site

ITERATION – 1

1st index for the words jump, run, runner, carry, sit, site are [u],[u],[u],[a],[i],[i]
a – carry
i – sit , site
u – jump, run, runner

Hence array would be , carry, sit, site , jump, run, runner

ITERATION – 0

0th index of the words are [c], [s], [s], [j], [r], [r]

c – carry
s – sit, site
j – jump
r – run, runner

Hence OUPUT : carry, sit , site , jump, run, runner

Advertisements

Comments are closed.