In an earlier post we looked at the uses of big data, and in another post we examined its use in e-commerce. Today we look at how big data is used in forming government policy. The government collects all sorts of data from the public: it has access to birth data, death data, the area of the country and security data. It can use this data to revise policies, even in real time. For example, when the government has access to security data, it knows how many soldiers are present in the border areas, coastal areas and hilly regions, it knows the number of infiltrations by terrorists in different regions, and it can see which soldiers and equipment are currently assigned. From all this information the government can formulate a plan for research and development of particular weapons, and likewise for training soldiers in particular tactics.
Now let's see a working example. Here we have a dataset of forest area for all the districts of the Indian states. This data is from data.gov.in and can be downloaded from there. Now let's get into some practical action.
The objective is to find the districts that have a good amount of dense forest cover, and the districts that really need to increase their forest cover. This insight into the forest area of every district helps the government decide how much budget to give to the forest development department.
For this we use Hive and Pig to extract the results. Hadoop spreads the processing of this data across a cluster, so results can be produced quickly even when the files are large. For large volumes it can be more efficient and more reliable than a single-server SQL database.
First we loaded the data into a Hive table.
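The load step itself is not reproduced in the text, so here is only a minimal sketch of the idea in Python. It uses the standard-library sqlite3 module as a local stand-in for Hive, purely so that the example runs anywhere. The table name forest and the percentage column come from the query used later in this post; every other column name and all three sample rows are made up for illustration and are not taken from the data.gov.in file.

```python
import csv
import io
import sqlite3

# Hypothetical sample in CSV form; the real file's columns may differ.
SAMPLE_CSV = """state,district,area,forest_area
State X,District A,1000,800
State X,District B,2000,150
State Y,District C,1500,600
"""

def load_forest_table(csv_text):
    """Create a 'forest' table and load the CSV rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE forest ("
        "state TEXT, district TEXT, area REAL, forest_area REAL, percentage REAL)"
    )
    for row in csv.DictReader(io.StringIO(csv_text)):
        area = float(row["area"])
        forest_area = float(row["forest_area"])
        # Forest cover as a percentage of the district's land area.
        percentage = 100.0 * forest_area / area
        conn.execute(
            "INSERT INTO forest VALUES (?, ?, ?, ?, ?)",
            (row["state"], row["district"], area, forest_area, percentage),
        )
    conn.commit()
    return conn

conn = load_forest_table(SAMPLE_CSV)
row_count = conn.execute("SELECT COUNT(*) FROM forest").fetchone()[0]
print(row_count)
```

In Hive the equivalent would be a table definition followed by a load of the downloaded file; the percentage column is computed here only because the sample data does not include one.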
We were then able to find the districts with the highest and lowest forest cover as a percentage of their land area, using thresholds of more than 75% and less than 10% respectively.
The command "select * from forest where percentage > 75;" generates the same result when run on MySQL, but it takes a long time when the number of records is huge. As we can see in the screenshots, in Hive this query fires a MapReduce operation that divides the file into parts and then searches each part for matching records. This decreases the latency we generally experience with SQL databases when handling large volumes.
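To make the divide-and-search idea concrete, here is a toy, single-process sketch in Python. It is not how Hive or Hadoop is implemented: the function names (split, map_filter, merge) and the sample records are hypothetical, and on a real cluster the chunks would be processed on different machines in parallel.

```python
from functools import reduce

# Hypothetical (district, percentage) records, made up for illustration.
RECORDS = [
    ("District A", 80.0),
    ("District B", 7.5),
    ("District C", 40.0),
    ("District D", 91.2),
    ("District E", 76.0),
]

def split(records, parts):
    """Divide the records into roughly equal chunks (the 'input splits')."""
    size = -(-len(records) // parts)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

def map_filter(chunk, threshold):
    """Map step: each chunk is searched independently of the others."""
    return [rec for rec in chunk if rec[1] > threshold]

def merge(left, right):
    """Reduce step: combine the partial results into one list."""
    return left + right

chunks = split(RECORDS, parts=2)
partials = [map_filter(chunk, 75) for chunk in chunks]
dense = reduce(merge, partials, [])
print(dense)
```

Because each chunk is filtered without reference to any other chunk, the work scales by adding machines rather than by making one machine faster.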
Next we have to find how many districts of each kind exist, so that we can see how many districts need special attention to increase their forest cover, and which districts don't have any very dense forest, so that some parts of them can be declared green belts.
From the pictures above we can see that the count of all districts is 586, the count of districts with more than 75% cover is 43, and the count of districts with less than 10% cover is 25.
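The counting step can be sketched as follows. This is an illustration only: it again uses Python's sqlite3 module as a stand-in for Hive, and the five rows are hypothetical, so the printed numbers have nothing to do with the 586, 43 and 25 reported above. The SELECT COUNT(*) statements are plain SQL of the kind Hive also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forest (district TEXT, percentage REAL)")
# Hypothetical rows, made up for illustration only.
conn.executemany(
    "INSERT INTO forest VALUES (?, ?)",
    [
        ("District A", 80.0),
        ("District B", 7.5),
        ("District C", 40.0),
        ("District D", 91.2),
        ("District E", 3.0),
    ],
)

# Total number of districts in the table.
total = conn.execute("SELECT COUNT(*) FROM forest").fetchone()[0]
# Districts with more than 75% forest cover.
dense = conn.execute(
    "SELECT COUNT(*) FROM forest WHERE percentage > 75"
).fetchone()[0]
# Districts with less than 10% forest cover.
sparse = conn.execute(
    "SELECT COUNT(*) FROM forest WHERE percentage < 10"
).fetchone()[0]

print(total, dense, sparse)
```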
Using this information, the government can formulate plans, such as giving a special budget to these districts for forest development. Equipment grants for pipelines can be made and gardeners can be employed.
Districts with very little forest cover, including scrub and open forest, can be asked to declare green belts so as to increase greenery.
Similarly, we can use big data to find the changes in forest area between the current and past years. This forest case study was just an example; in the same way you can analyse market rates, stock market predictions, real-time pricing of products and a country's economic growth factors.
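As a rough sketch of the year-on-year comparison mentioned above, the following Python computes the absolute and percentage change in forest area per district. The two dictionaries, their values and the function name forest_change are all hypothetical; real figures would come from two releases of the dataset.

```python
# Hypothetical forest areas (sq km) for two survey years; made-up values.
PAST = {"District A": 800.0, "District B": 150.0, "District C": 600.0}
CURRENT = {"District A": 780.0, "District B": 165.0, "District C": 600.0}

def forest_change(past, current):
    """Return {district: (absolute change, percent change)} for shared districts."""
    changes = {}
    for district in past.keys() & current.keys():
        before, after = past[district], current[district]
        diff = after - before
        # Percent change is undefined when the earlier area is zero.
        pct = 100.0 * diff / before if before else None
        changes[district] = (diff, pct)
    return changes

changes = forest_change(PAST, CURRENT)
for district in sorted(changes):
    diff, pct = changes[district]
    print(district, diff, pct)
```

Districts with a negative change are the ones that would need attention first.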