Friday, April 19, 2013

What is MapReduce?

It's a framework for writing programs that process massive amounts of unstructured data in parallel across a distributed cluster of computers. It achieves this using two functions: 

1.  Map: This function splits the processing job into chunks and routes them to the various nodes in the cluster. 
2.  Reduce: This function collates the work from the various nodes and combines the results into a single output. 


This framework is said to be fault tolerant. It achieves this by expecting the various nodes to reply from time to time; if a node does not reply within a certain amount of time, it is considered dead, and the work that was assigned to it is reassigned to a different node. In this way, the detection and handling of node failures is done on the application side. 

MapReduce allows for distributed processing of the map and reduce operations. Provided each mapping operation is independent of the others, all maps can be performed in parallel, though in practice this is limited by the number of independent data sources and/or the number of CPUs near each source. MapReduce is important because it allows ordinary developers to use MapReduce library routines to create parallel programs without having to worry about intra-cluster communication, task monitoring, or failure handling. It is useful for tasks such as data mining, log file analysis, financial analysis and scientific simulations.
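As a concrete illustration, the classic word-count job can be run with Hadoop Streaming, where the mapper and reducer are plain shell commands. This is only a sketch: the streaming jar path and the HDFS input/output directories below are assumptions that depend on your installation.

# Mapper splits each line into one word per line; the framework sorts the words,
# and the reducer counts runs of identical, already-sorted words.
hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar \
    -input /data/input \
    -output /data/wordcount \
    -mapper 'tr -s " " "\n"' \
    -reducer 'uniq -c'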

There are various implementations of MapReduce in the market today, Hadoop being one of the leading ones. The Hadoop project provides end-to-end Big Data services.

Thursday, April 18, 2013

What is Apache Hadoop?

It's an open-source software framework for reliable, scalable, distributed computing.

This framework provides for distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.

Hadoop's own distributed file system (HDFS) allows for rapid data transfer among nodes and lets the system continue operating uninterrupted in case of a node failure. This approach lowers the risk of system failure, even if a significant number of nodes become inoperative.
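To give a feel for how HDFS is used, a few basic file operations are shown below; the file and directory names are only examples.

# Copy a local file into HDFS, list the directory, and read the file back
hadoop fs -put access.log /user/demo/access.log
hadoop fs -ls /user/demo
hadoop fs -cat /user/demo/access.log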

Hadoop was inspired by Google's MapReduce, a programming framework in which an application is broken down into small parts that run on different nodes individually (Map), and the results are then collected and compiled into one (Reduce). 


Wednesday, April 17, 2013

What’s next after Cloud Computing – Big Data?

Yes, Big Data is no longer just a buzzword. It's a reality now, and companies are understanding the real need for it.
Gartner defines Big Data as high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. Organizations are discovering that important predictions can be made by sorting through and analyzing Big Data. But analyzing this data is not straightforward, since much of it is unstructured and there are computational boundaries. This is where Cloud technology comes into play.

Tuesday, April 16, 2013

Unix File Permissions



Access to Unix files is controlled. There are three types of access (permissions):
  • read
  • write
  • execute
Each file belongs to a specific user and group (ownership).

Access to a file is controlled by the user, group, and other/everyone permission bits, and is usually set using a numeric value. For example, a permission value of 644 breaks down as:

Owner/User      Group      Other/Everyone
     6            4              4


Each number represents an access level and can range from 0 to 7; each digit is the sum of read (4), write (2) and execute (1). The access levels are as follows:

0 - no access to the file whatsoever
1 - execute permissions only
2 - write permissions only 
3 - write and execute permissions
4 - read permissions only
5 - read and execute permissions
6 - read and write permissions
7 - read, write and execute permissions (full permissions)

Thus the above 644 permissions example will look like this:

Owner / User - Read and Write 
Group - Read only 
Other/ Everyone - Read only

To allow a script to be read and executed by everyone, while only your user can write to it, you would set its permissions to 755, as in the example below.
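Permissions are set with chmod and verified with ls -l; the script name here is just an example.

# Owner: read/write/execute; group and everyone else: read/execute
chmod 755 myscript.sh

# The first column of ls -l shows the resulting permission bits
ls -l myscript.sh
-rwxr-xr-x 1 user group 512 Apr 16 10:00 myscript.sh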

Thursday, April 11, 2013

Resolving mysql error 111


MySQL error 111 means "connection refused", which can occur for several reasons. Some of them are: 

1.  Your MySQL server is configured to listen only for connections from localhost. Check whether you get this error only from other machines, or also from the localhost where your DB server is installed. 

If it's only from other machines, check your my.cnf file for the items below, comment them out, and restart the MySQL server. 


skip-networking
bind-address = 127.0.0.1
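
To restart MySQL after the change, on many systems the command is as follows (the service name can vary by distribution):

sudo service mysql restart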


2.  If you can log in to MySQL as root, you should grant the remote user the required privileges. Try the commands below: 


GRANT ALL PRIVILEGES ON *.* TO 'username'@'IP_ADDRESS' IDENTIFIED BY PASSWORD '*44612AC693E3B8F7AEA36B50168860122FE106A8';
FLUSH PRIVILEGES;

The string "*44612AC693E3B8F7AEA36B50168860122FE106A8" is the password hash as generated by MySQL (hence the IDENTIFIED BY PASSWORD form above). Use the command below to generate such hashes: 


mysql> select password('test123!');
+-------------------------------------------+
| password('test123!')                      |
+-------------------------------------------+
| *44612AC693E3B8F7AEA36B50168860122FE106A8 | 
+-------------------------------------------+
1 row in set (0.02 sec)


3.  Check firewalls to make sure that they are not blocking the connection. 
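
A quick way to check whether the port is reachable at all is to test it directly from the client machine. The host name below is a placeholder; 3306 is the default MySQL port.

# Test raw TCP connectivity to the MySQL port
nc -zv db.example.com 3306

# Or try connecting with the mysql client from the remote machine
mysql -h db.example.com -u username -p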

cURL Error (7): couldn't connect to host



"cURL Error (7): couldn't connect to host"

This error is fairly self-explanatory: it means that a connection to the host was not possible. This can be due to: 

1.   Your server has not started correctly. 
2.   The URL of the server you have entered is not correct. 
3.   There are firewall or other network settings which prevent you from reaching the URL. 

The best way to troubleshoot such a problem is to first test whether the URL can be opened in a browser. 
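If the browser test is inconclusive, running curl verbosely from the command line shows exactly where the connection fails (the URL is just an example):

# -v prints the resolved address and the point at which the connection fails
curl -v http://example.com:8080/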

Monday, April 8, 2013

DHCP configuration on an Ubuntu machine



DHCP configuration:

•       Find out whether the machine is detecting the network interface with the command:
ifconfig -a | grep eth
•       This will tell you the network interface names, for example eth0, eth1, ... eth6.

•       You can control the logical name assigned to a MAC address by making changes in the configuration file below:
vi /etc/udev/rules.d/70-persistent-net.rules

•       Configure the network configuration file, i.e. /etc/network/interfaces (vi /etc/network/interfaces), for DHCP; a minimal sample is shown after these steps.
•       Run sudo ifup eth0 (this will use dhclient to obtain a dynamic IP; if it does not work, restart the machine).
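
A minimal /etc/network/interfaces configuration for obtaining an address over DHCP looks like the sketch below; eth0 is assumed to be the interface name detected in the first step.

# Loopback interface
auto lo
iface lo inet loopback

# Primary interface, configured via DHCP
auto eth0
iface eth0 inet dhcp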



Wednesday, April 3, 2013

svn: Failed to add file : object of the same name is already scheduled for addition


This error is generally seen when you have made changes in the main SVN working area directly. It will come up every time you try to do svn up. 

To resolve it, revert your local changes using the following commands and then check out/update from the repository. 


svn revert .
svn cleanup
svn update

Tuesday, April 2, 2013

Change java Keystore Password and Private Key Password


Change Java Keystore Password

keytool -storepasswd -new new_storepass -keystore keystore.jks -storepass password

Change Private Key Password 

keytool -keypasswd -alias client -keypass old_password -new new_password -keystore client.jks -storepass password
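
To confirm that the new passwords work, list the keystore contents using the updated store password (the file names and passwords are the ones from the examples above):

keytool -list -keystore keystore.jks -storepass new_storepass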