Data Scientist: New Demand in the Global Market

Data Scientists are professionals who can analyze and explain complex digital data. They work with diverse data using techniques drawn from signal processing, mathematics, probability models, machine learning, statistical learning, computer programming, visualization, data warehousing and high-performance computing, with the goal of extracting meaning from data and creating data products.

To start your career as a Data Scientist, you need a clear understanding of data science. Dealing with big data is an important aspect of a Data Scientist's work. Because data science is a relatively new and fast-growing discipline, it is hard to learn it well without a structured training program.

Demand for Data Scientists

A lot of companies and industries have dramatically increased investment in their digital platforms over the past few years. It has been estimated that by 2020, IT departments will be monitoring 50 times more data than they are today. This tidal wave of data is driving heavy demand for Data Scientists who can turn these very large data sets into a competitive advantage.

A McKinsey Global Institute study warns that there is a significant shortage of qualified Data Scientists to analyze big data sets adequately. According to the report, a shortfall of about 140,000 to 190,000 people with analytical expertise is projected by 2018. The study also predicts a need for an additional 1.5 million managers and analysts by the same date to realize the full potential of the data currently available.

Thousands of data scientists are already working at both start-ups and well-established companies. Simply put, if an organization stores multiple petabytes of data, there is an opportunity for you there.

  • Yahoo, one of the firms that employed a group of data scientists early on, was instrumental in developing Hadoop.
  • Facebook’s data team created Hive, which provides a SQL-like language for programming Hadoop projects.
  • Many other data scientists, especially at data-driven companies such as Google, Amazon, Microsoft, Walmart, eBay, LinkedIn, and Twitter, have added to and refined the tool kit.
  • Even the largest firms, such as Accenture, Deloitte, and IBM Global Services, are in the early stages of leading big data projects for their clients.

What to learn to become a Data Scientist

A basic understanding of statistics is vital for a data scientist. Data science blends skills from three major areas:

  1. Mathematics expertise
  2. Technology and hacking skills
  3. Tactical business consulting

In practice, that means learning:

  • Tools such as Hadoop, RStudio, SAS, Excel and Tableau
  • A statistical programming language, such as R or Python, and a database querying language such as SQL
  • Statistical tests, distributions, maximum likelihood estimators, etc.
  • Machine learning methods such as k-nearest neighbors, random forests, etc.
  • Data wrangling: cleaning and reshaping raw data into a usable form
  • Data visualization: the principles of visually encoding data and communicating information
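To make one of the methods above concrete, here is a minimal k-nearest neighbors classifier in plain Python. It is only an illustrative sketch: the `knn_predict` helper and the toy visitor data are hypothetical, and in real projects you would more likely use a library implementation such as scikit-learn's.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (features, label) pairs, where features is a tuple of numbers.
    Uses math.dist (Python 3.8+) for Euclidean distance.
    """
    # Sort training points by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of those k points.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (minutes on site, pages viewed) -> did the visitor buy?
training = [
    ((1.0, 2.0), "no"),
    ((1.5, 1.0), "no"),
    ((2.0, 3.0), "no"),
    ((6.0, 8.0), "yes"),
    ((7.0, 7.5), "yes"),
    ((8.0, 9.0), "yes"),
]

print(knn_predict(training, (6.5, 7.0)))  # yes
print(knn_predict(training, (1.2, 1.8)))  # no
```

The choice of k and of the distance measure matters a great deal in practice; k=3 and Euclidean distance are just simple defaults for the sketch.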

Data Scientist job responsibilities:

After learning data science, you will find many opportunities and jobs across sectors and industries. Many resources out there may lead you to believe that becoming a data scientist requires comprehensive mastery of a number of fields, such as software development, data munging, databases, statistics, machine learning and data visualization. Don’t get bogged down by all the jargon. Once you learn it, it gets much easier.

Listed below are the job responsibilities of a Data Scientist:

  • Data scientists make meaningful contributions to production code and provide basic insights and analyses.
  • They are responsible for presenting information visually, making patterns clear and compelling.
  • They advise executives and product managers on the implications of the data for products, processes and decisions.
  • The job might consist of tasks such as pulling data out of MySQL databases, creating advanced Excel pivot tables and producing basic data visualizations.
  • Data scientists handle a number of activities, such as data mining and analysis, that can help businesses gain a competitive edge.
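As a small illustration of the "pull data out of a database" task, the sketch below queries a hypothetical `orders` table and aggregates it. Python's built-in `sqlite3` module stands in for MySQL here so the example runs without a database server; the table, columns and values are all made up, but the `GROUP BY` query is the same kind you would run against MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("North", "A", 120.0),
        ("North", "B", 80.0),
        ("South", "A", 200.0),
        ("South", "A", 50.0),
    ],
)

# Pull an aggregated view out of the database: total sales per region.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY region"
).fetchall()

for region, total in rows:
    print(f"{region}: {total:.2f}")  # North: 200.00, then South: 250.00
conn.close()
```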

Salary expectations for Data Scientists

If you become a good data scientist, many doors will open to you and your salary will grow accordingly. Several data scientists working at start-ups said they had demanded and received large stock option packages. Data Scientist salaries range from an average of $85K at the low end to an average of $195K at the high end.

According to PayScale, a Data Scientist earns an average salary of Rs 615,786 per year.

According to Glassdoor, the average Data Scientist salary is Rs 650,000 per year.

Conclusion:

For professionals looking to change jobs and pursue better career growth, LinkedIn revealed its top 10 most promising job roles of 2017. According to LinkedIn, Data Scientists and Data Engineers enjoy a career advancement score of 8 out of 10, and job openings are up 85 percent year over year. Courses and books are great for developing fundamental technical skills, but many data science skills can’t be properly developed without hands-on training in a data science course, where you learn how real data sets are cleaned and prepared.

If you want to start a career as a Data Scientist and learn data science, I recommend structured training in data science. With structured training, you will find it much easier to land your dream job as a Data Scientist.

I hope this post helps. If you have any questions, comment below and I will be glad to help you out.

6 Excel Functions that every SEO Expert must know

A good SEO professional knows their data well and also knows how to simplify it. When you are working on huge Excel sheets, you can’t do everything manually. If you try, you will waste a lot of time and make calculation mistakes. Fortunately, Excel has some very useful functions for SEO professionals. Here is the list:

CONCATENATE function in Excel

The CONCATENATE function combines text from different cells into one string. It is very useful for SEO professionals: you can easily join the pieces of URLs (see image), create sitemaps, build link-building queries and so on.

Using Excel’s CONCATENATE function in SEO
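If you need the same thing outside Excel, for example in a script that processes a crawl export, the idea maps directly onto string joining. A minimal Python sketch of what a formula like =CONCATENATE(A2, B2) does when filled down a column; the `concatenate` helper, the example.com domain, the paths and the search query are hypothetical placeholders:

```python
def concatenate(*parts):
    """Rough Python counterpart of Excel's CONCATENATE: join all pieces as text."""
    return "".join(str(p) for p in parts)


domain = "https://www.example.com"
paths = ["/blog/", "/services/", "/contact/"]

# Build full URLs from a shared domain plus per-row paths.
urls = [concatenate(domain, path) for path in paths]
for url in urls:
    print(url)

# Build a link-building search query around a keyword.
keyword = "data science"
query = concatenate('"', keyword, '" "write for us"')
print(query)  # "data science" "write for us"
```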

Find Duplicates and Remove Duplicates in Excel

When you are working with a long keyword list, you may accidentally add a keyword that already exists, and that is where Excel’s duplicate highlighting helps most. It highlights duplicate values in your data, so any repeated keywords stand out immediately. You can find this option under Conditional Formatting (Highlight Cells Rules > Duplicate Values).

Using find duplicate function for SEO

If there are too many duplicates to delete by hand, use Excel’s Remove Duplicates feature (on the Data tab), which removes them all in one go. In the example above there were 2 duplicate values; after removing them with Remove Duplicates, only 14 unique values remain.

Removing duplicates in Excel
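The same two steps, finding duplicates and then removing them while keeping the first occurrence, look like this in a short Python sketch. The keyword list is hypothetical. Note that Python compares strings case-sensitively, so lower-case the keywords first if "SEO Tools" and "seo tools" should count as the same.

```python
from collections import Counter

keywords = [
    "seo tools",
    "keyword research",
    "seo tools",
    "link building",
    "on-page seo",
    "link building",
]

# Find duplicates: any keyword that appears more than once.
counts = Counter(keywords)
duplicates = [kw for kw, n in counts.items() if n > 1]
print(duplicates)  # ['seo tools', 'link building']

# Remove duplicates, keeping the first occurrence and the original order.
unique = list(dict.fromkeys(keywords))
print(unique)  # ['seo tools', 'keyword research', 'link building', 'on-page seo']
```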

Pivot Table in Excel

Working in SEO, you will often come across Excel sheets so large they can be intimidating, and analyzing that much data at once can be confusing. A pivot table breaks the data down into smaller summaries that are easier to work with. It lets you see only the data that concerns you, and you can rearrange the data and include the metrics you want.
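Under the hood, the most common pivot table (one field in Rows, the sum of another in Values) is a group-and-sum. A minimal Python sketch of that idea, using a hypothetical keyword export of (landing page, keyword, monthly clicks) rows:

```python
from collections import defaultdict

# Hypothetical keyword export: (landing page, keyword, monthly clicks)
rows = [
    ("/blog/", "seo tips", 120),
    ("/blog/", "seo checklist", 80),
    ("/services/", "seo agency", 45),
    ("/services/", "seo consultant", 30),
    ("/blog/", "seo tips 2017", 15),
]

# Group by landing page and sum clicks, like a pivot table with
# "Page" in Rows and "Sum of Clicks" in Values.
clicks_by_page = defaultdict(int)
for page, _keyword, clicks in rows:
    clicks_by_page[page] += clicks

for page, total in sorted(clicks_by_page.items()):
    print(page, total)  # /blog/ 215, then /services/ 75
```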

PROPER function in Excel

The PROPER function in Excel capitalizes the first letter of each word in a string and converts the remaining letters to lower case. This is very helpful when you are creating page titles: you can type them without worrying about the Caps Lock key, then apply PROPER at the end to get consistently capitalized titles.

Using Excel’s PROPER function in SEO
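Python’s built-in `str.title()` behaves much like PROPER: it upper-cases the first letter after any non-letter character and lower-cases the rest. A minimal sketch; the `proper` wrapper and the sample titles are hypothetical:

```python
def proper(text):
    """Approximate Excel's PROPER using str.title(): capitalise each word, lower-case the rest."""
    return text.title()


titles = ["best running shoes for men", "HOW TO START A BLOG", "seo Tips for beginners"]
for t in titles:
    print(proper(t))
# Best Running Shoes For Men
# How To Start A Blog
# Seo Tips For Beginners
```

As the last line shows, acronyms such as "SEO" come out as "Seo" (PROPER does the same), so review titles containing acronyms after converting them.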

LEFT, RIGHT, MID function in Excel

In the CONCATENATE section above, we saw how to join the pieces of a URL, but what if you want to split a URL into pieces? The LEFT, MID and RIGHT functions do exactly that: they extract the leftmost, middle or rightmost characters of a string.

Using Excel’s LEFT, MID, RIGHT functions in SEO
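The three functions correspond to simple string slices. The sketch below mirrors their arguments: LEFT(text, n), MID(text, start, n) with a 1-based start position, and RIGHT(text, n). The helper names and the sample URL are hypothetical.

```python
def left(text, n):
    """Like Excel LEFT: the first n characters."""
    return text[:n]


def mid(text, start, n):
    """Like Excel MID: n characters starting at 1-based position `start`."""
    return text[start - 1:start - 1 + n]


def right(text, n):
    """Like Excel RIGHT: the last n characters (guard n == 0, since text[-0:] returns everything)."""
    return text[-n:] if n > 0 else ""


url = "https://www.example.com/blog/seo-tips"
print(left(url, 8))     # https://
print(mid(url, 9, 15))  # www.example.com
print(right(url, 8))    # seo-tips
```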

LEN function in Excel

Whether you are writing a meta description or a title, keeping it within the character limit is a chore, and you can’t keep counting characters by hand. That is why the LEN function in Excel is so handy: it returns the length of a string, i.e. the number of characters in it. With it, you can easily make sure your titles stay within 60 characters and your meta descriptions within 160.

Using Excel’s LEN function in SEO
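The same check is a one-liner with Python’s `len()`. This sketch pairs the length with a pass/fail flag, much like putting =LEN(A2) next to =LEN(A2)<=60 in a sheet; the `check_length` helper and the sample meta description are hypothetical, and the 60/160 limits are the rules of thumb from the paragraph above.

```python
TITLE_LIMIT = 60
META_LIMIT = 160


def check_length(text, limit):
    """Return (number of characters, whether it fits within the limit)."""
    length = len(text)
    return length, length <= limit


title = "6 Excel Functions that every SEO Expert must know"
meta = (
    "CONCATENATE, PROPER, LEFT, MID, RIGHT and LEN: "
    "the Excel text functions that save SEO professionals hours."
)

print(check_length(title, TITLE_LIMIT))  # (49, True)
print(check_length(meta, META_LIMIT))
```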


So that was our list of the functions we feel are most important for SEO experts. If you think we missed something, feel free to mention it in the comments below.

Watson Analytics: IBM’s Big Data vision for businesses


IBM announced Watson Analytics yesterday, a new product that it claims will bring sophisticated big data analysis to the average business user.

A cloud application, Watson Analytics does all of the heavy lifting involved in big data processing: retrieving, analyzing and cleaning the data, building sophisticated visualizations and offering an environment for communicating and collaborating around the data. Contrary to what many might think, the Watson label is not slapped on simply to take advantage of the brand’s fame. IBM explains that the name reflects the source of the technology underlying the product, including its ability to process natural language queries. Sall, IBM’s Vice President of worldwide marketing for Business Analytics, says the goal of the product is to put “powerful analytics in the hands of every business user.” As he says, “People understand they should be making better decisions to leverage data and analytics, but the reason they don’t is it’s too hard.”

For the most part, analyzing big data today requires access to vast infrastructure resources and a team of developers and data scientists, and there aren’t enough of the latter. Sall said that means getting answers can take days or weeks, which in today’s business climate is just not acceptable. What’s more, business users shouldn’t have to beg for access to the information they need to do their jobs.

Watson Analytics aims to remove these barriers. Since it runs in the cloud, you don’t have to worry about back-end infrastructure, and you don’t need the developers and data scientists because the software takes care of that work.

You can start with an existing data source such as Salesforce.com CRM data, or you can import your own data. Sall says the base product comes with connectors to many popular business tools. Once you have a data source, you can ask a question, explore the data to see what you find serendipitously, or use one of the story templates that come with the product, which, Sall says, take you down a path to explore the data in a standard way.

If you’re looking at sales data, for example, chances are there are some standard questions you want to explore and the template points you there, but you are free to ask questions as well and Watson will process those questions and deliver an answer. Often though, Sall says, business users are looking at data and they don’t really know what to do with it or where to start precisely because they lack the training and understanding. The templates provide a way to get users going when they don’t know what to do by providing a base set of information and visualizations.

What’s more, Sall says IBM is offering a version in the IBM Cloud Marketplace that will be free forever. He sees this as removing a barrier to access and says the free version is actually pretty sophisticated.

The paid upgrade will offer premium features such as additional storage and direct connections to enterprise repositories, which many companies will want for their data analysis.

Sall admits this is a big change in sales strategy from men and women in blue suits selling to the CIO or IT pros, but he says the company really wants to get this product to as many people as possible. IBM believes the freemium model is a way to get it out there and then upsell to departments and companies once individual business users or departments are comfortable with it.

This digital market approach is part of IBM’s overall cloud strategy. Watson Analytics itself is delivered on SoftLayer, the infrastructure provider IBM purchased in June 2013. It will also be offered as a service through Bluemix, IBM’s Platform-as-a-Service developer platform, so that developers and other interested parties can build Watson Analytics into third-party applications. Sall says data providers are a big focus of this effort, and IBM hopes to see them integrate with Watson Analytics in the future.

He says this new cloud approach reflects where the world is going and where IBM needs to be as a company if it wants to survive. “It’s where we have to go as company. We can’t pretend the world’s not changing. Of course it’s changing,” Sall said. Delivering a cloud product built on its own infrastructure platform, using a freemium model, shows that IBM is trying to do business in new ways.

The product goes into beta this month, and IBM is aiming for general release by the end of the year. As a cloud service, it will run on a variety of devices, including tablets, smartphones and PCs/laptops, but there are no dedicated apps yet.

Data Science should be used with caution

Data science has become a hot topic all across the globe. Reflecting this, US Attorney General Eric Holder devoted much of his recently published annual letter to the US Sentencing Commission to the subject. Holder’s letter discusses applications of data analytics in criminal justice and stresses efforts to use data for risk and needs assessments of offenders.

The relatively new term ‘Data Science’ generally refers to the process of extracting useful information from a set of data. Applying data science mainly involves using a variety of statistical methods to ‘learn’ a pattern in a data set and then apply that pattern for a specific purpose.

According to Eric Holder, there has been an increased effort at the state and local levels to use data in the criminal justice system. In his letter, the US Attorney General commended efforts to use data for risk and needs assessments of offenders before they re-enter society. Such assessments help ensure a smooth transition and reduce recidivism.

Several legal experts have also expressed optimism about applying data science, particularly to offenders leaving prison. They say it can help such people with the incredibly difficult task of resettling into a normal life after their sentences are over.

Despite these positives, Holder also pointed out some drawbacks of this new kind of science in criminal justice. He warned against using previously recorded data for sentencing itself, a practice recently adopted in a few states, including Tennessee and Pennsylvania.

Pointing out the risk of using static and historical data, such as the education level, employment history, family circumstances and demographic information of the accused, the US Attorney General said: “This is a dangerous concept because it bases the sentence for a crime on data points that are not integral to the crime itself.”

The 2012 data on New York City’s prison population illustrates the race and class disparities in the US criminal justice system. In 2012, over 57% of the prison population was black, 33% Hispanic, 7% white and 1% Asian, whereas the city’s population as a whole was 17.5% black, 70.9% white, 18.4% Hispanic and 8% Asian.

Figures like these suggest that some form of data has always been, at least subconsciously, part of a judge’s sentencing. They also show the role personal experience plays in what a judge considers a just sentence.

4 reasons not to be angry at OkCupid’s experiments

The wonders that Big Data can do have been out there for some time. We were all wowed by it, corporations have been using it to make big bucks, and we were OK with that. But now we are slowly realizing that the source of those big bucks is our privacy!

OkCupid’s experiments on its users have caused outrage in the industry. Both users and corporations are frowning at them (if companies aren’t right now, they should be), and people have been left deep in thought.

Here are 4 reasons why we shouldn’t hate OkCupid’s experiments and its decision to publish the results.

1. Reality Check

Truth is, we all live in a world where the line between private and public information grows thinner by the second. Today you cannot use a digital device without leaving some information about yourself behind. This so-called “digital footprint” is always there whether you realize it or not, and other people have access to it. You already knew that. Everybody knows that!

When OkCupid released the results of its experiments in a blog post, users were outraged and there was sudden talk of “ethics” in the industry. It also brought back memories of Facebook’s experiment, which Facebook tried to defend using its terms and conditions.

Truth is, whether you know it or not, your information is used against you, to manipulate you into buying things that you don’t need or want, to decide for you where you should go on the next holiday and thanks to OkCupid it is now proven that you can be tricked into choosing a life partner!

With technology that can create databases of individual genetic information, imagine what insurance companies could do if they got their hands on your genetic makeup. Whatever you are imagining right now, it won’t come close to what they would actually do.

While this has been out there for a while, OkCupid’s experiments have put it back on the table.

2. Knowledge is not a sin

One of the strongest objections to Facebook’s and OkCupid’s experiments is the “toying with human emotions without their knowledge” argument.

While people might find it offensive to learn that they have been manipulated emotionally, OkCupid’s research actually gives us huge insight into our thinking and behavioural patterns. Instead of focusing on what’s been done, we might do well to think about how to overcome our inherent irrationalities and make better choices. I mean, how shallow and gullible are we to ignore someone’s entire profile text and judge them by their picture? Isn’t that something we want to know about ourselves? We don’t need to be so offended, because OkCupid didn’t set out to manipulate John or Jane: the experiment was conducted on random samples using anonymized data.

Besides, OkCupid is neither the first company to conduct such experiments nor part of some elite group that does. They’re happening everywhere: where users gave their data directly (1st party data), where users agreed to have their data shared from somewhere else (2nd party data), and where the data is bought and sold without the user ever knowing (3rd party data). We don’t know what else is being done, and we cannot know what can be done. Which brings me to the third reason.

3. Government Role

While I believe that experiments like these are not entirely bad, this incident should raise concern about what else is happening that we don’t know about. OkCupid definitely made this mainstream, which is why I believe corporations should be hating it!

I mean, the only reason we know about Facebook’s or OkCupid’s experiments is that they chose to make the results public. I cannot help but wonder: what else are organizations doing with who knows what personal data, which we don’t know about because they chose to keep it off the record?

We need government intervention in the form of strict laws that regulate what kind of data can and cannot be collected in different domains (social networking, travel websites, e-commerce sites, dating services, etc.) and how it can be used, offline or online. The public should then be made aware of what type of information is safe to provide to these entities and through which channels. The data security issue is also intertwined in this whole mess.

To cite an example, a delivery guy from one of the major e-commerce sites came to deliver my T-shirt and wanted my SSN, PAN or passport number because he had been asked to get it. I couldn’t for the life of me fathom why I would share such personal information with an e-commerce site, and through a delivery guy at that!

But some people do hand it over, because they don’t know how valuable this information is or how dangerous it is to give it out like that! It is the government’s duty, as well as a moral responsibility, to create awareness among the general public, since this can in many ways also affect public safety.

4. Understanding Ourselves

Christian Rudder, co-founder of OkCupid, has made it pretty clear how much we can learn about human minds using the internet. But wait, why should we believe him? For all we know, he’s just trying to justify OkCupid’s experiments or, worse, promoting such experiments on a large scale. I graduated with a major in biology, with significant project work in behavioural and clinical psychology.

Believe me when I say that no matter how well a behavioural psychology experiment is designed, it can never create ideal conditions that make its results infallible (one reason why theories are always flying across the rooms of so many schools of thought!). Most of what we know about human behaviour is either disputed by other findings or will be disputed by findings yet to be made.

The internet offers a real solution to this problem, with some small problems of its own. Overall, Rudder is not kidding about how experiments on internet users can accelerate our understanding of human behaviour and our progress as a human race. Google search queries have been used to gauge how intolerant American society is towards homosexuality. I’d call that a win for all of us.

Technology is moving faster than we know, because a large portion of it is kept from us, and not every one of us is equipped to understand its ramifications.

Here’s the thing. Big Data is here to stay! It will evolve, but it’s not going to go extinct. Anyone with internet access can now do Big Data analytics. We should make peace with that and adapt to it while finding ways to protect our privacy and individuality.

5 types of Data Scientists

Data science is a very diverse field, and many specializations exist within it. Based on these specializations and the tools they use, data scientists themselves can be classified into various types.

A few types of data scientists:

  1. Quantitative data scientists rely on theory and exploration. They firmly believe that established theories are the best way to analyze data and that all practical implementation relies on them.
  2. Operational data scientists come next. These professionals rely on facts and figures: they want to express all data as numbers and then carry out further research. For example, a coach might rely on parameters such as pass time, running speed and dribbling count per minute to analyse the players in a basketball team. After collecting this raw data, the coach can use tools to sort and order it, so that special attention can be given to those who underperform. This is more or less what an operational data scientist does at work, applying the same approach to business principles and strategies based on the statistics.
  3. Product management data scientists are the people who try to enhance the product. In short, they tinker with existing modules to give users better interfaces. Improving the app store on Android or iOS so that users feel at home is a good example of this type of data science. A thorough understanding of what the product is meant to do is key to conquering the market for these people.
  4. Marketing data scientists take on the responsibility of understanding the market. This is one of the trickier jobs out there, because the market is always changing and depends on a variety of factors. For example, suppose you are analysing the telephone calls made in a particular location over one week to determine which plan would benefit both the user and the service provider. You would likely find the data haphazard, and making order out of chaos would be a daunting task. Hence these scientists have to work with dedication and adroit skill.
  5. Research data scientists need to think outside the box and thrive on innovation and inspection. Monitoring how a product performs day to day and upgrading it is best done by this type of data scientist.
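The coach example from the operational category can be sketched in a few lines of Python. The player names, the numbers and the "below the team average on either metric" rule are all hypothetical choices made for illustration, not a standard method.

```python
# Hypothetical per-player stats for the coach example.
players = [
    {"name": "Player A", "running_speed": 7.2, "dribbles_per_min": 14},
    {"name": "Player B", "running_speed": 6.1, "dribbles_per_min": 9},
    {"name": "Player C", "running_speed": 6.8, "dribbles_per_min": 12},
    {"name": "Player D", "running_speed": 5.9, "dribbles_per_min": 11},
]

# Sort and order: slowest runners first.
by_speed = sorted(players, key=lambda p: p["running_speed"])

# Flag anyone below the team average on either metric for special attention.
avg_speed = sum(p["running_speed"] for p in players) / len(players)
avg_dribbles = sum(p["dribbles_per_min"] for p in players) / len(players)
needs_attention = [
    p["name"]
    for p in players
    if p["running_speed"] < avg_speed or p["dribbles_per_min"] < avg_dribbles
]

print([p["name"] for p in by_speed])  # ['Player D', 'Player B', 'Player C', 'Player A']
print(needs_attention)  # ['Player B', 'Player D']
```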

All in all, it is the coming together of these various forms of data science that helps businesses achieve overall success.