
Unstructured Data

December 12, 2014

Along with Big Data, the term “unstructured data” is also gaining popularity. If you are wondering what unstructured data is, this blog post explains it in detail.

What is Unstructured Data?

Unstructured data is defined as data that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional computer programs as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents.
Simply put, data that traditional computer programs cannot readily interpret is called unstructured.
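To make the contrast concrete, here is a minimal Python sketch. The record, the sample sentence, and the `extract_facts` helper are all hypothetical, made up for illustration. It shows the same facts stored once in named fields and once in free text, where a program has to guess at them with patterns.

```python
import re

# Structured: each value lives in a named field, so a program can read it directly.
record = {"customer": "Asha", "order_date": "2014-12-12", "amount": 450}

# Unstructured: the same facts are buried in free text (hypothetical sample sentence).
note = "Asha called on 2014-12-12 and said she paid 450 rupees for the course."

DATE_PATTERN = r"\b\d{4}-\d{2}-\d{2}\b"

def extract_facts(text):
    """Pull ISO-style dates and standalone integers out of free text."""
    dates = re.findall(DATE_PATTERN, text)
    # Remove the dates first so their digits are not counted as numbers.
    remainder = re.sub(DATE_PATTERN, " ", text)
    numbers = [int(n) for n in re.findall(r"\b\d+\b", remainder)]
    return {"dates": dates, "numbers": numbers}

facts = extract_facts(note)
print(record["amount"])  # direct lookup by field name
print(facts)             # values recovered by pattern matching, with no field names
```

Note that the pattern-based version recovers the values but not their meaning: nothing in the result says that the number is an amount paid. That missing context is the kind of ambiguity described above.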

Text may not seem like a huge deal at this point, mostly because text mining has been around for a long time. However, a huge portion of the data generated by humans is auditory or visual, which is even harder for computer programs to read.

Dealing with Unstructured Data:

Software that generates machine-processable structure exploits the linguistic, auditory, and visual structure inherent in all forms of human communication. Algorithms can deduce this inherent structure from text, for instance, by probing word morphology, sentence syntax, and other small- and large-scale patterns. Unstructured information can then be enriched and tagged to resolve ambiguities, and relevancy-based techniques can then be used to facilitate search and discovery.
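As a rough illustration of the idea, the Python sketch below tokenizes text, strips a few common suffixes (a very crude stand-in for morphological analysis), and uses the most frequent stems as tags. The suffix list and the function names are assumptions chosen for this example, not a standard algorithm; real systems use far richer linguistic rules.

```python
import re
from collections import Counter

# Hypothetical suffix list for a crude stemmer; real systems use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word):
    """Strip one common suffix, keeping at least three letters of the stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tag_document(text, top_n=3):
    """Return the most frequent stems (from words longer than three letters) as tags."""
    stems = [crude_stem(t) for t in tokenize(text) if len(t) > 3]
    return [stem for stem, _ in Counter(stems).most_common(top_n)]

sample = "Searching documents. Search the document, searched documents."
tags = tag_document(sample, top_n=2)
print(tags)
```

Because “searching”, “searched”, and “search” all reduce to the same stem, a search for any one form can be matched against the same tag.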

Examples of “unstructured data” may include books, journals, documents, metadata, health records, audio, video, analog data, images, files, and unstructured text such as the body of an e-mail message, Web page, or word-processor document. While the main content being conveyed does not have a defined structure, it generally comes packaged in objects (e.g. in files or documents…) that themselves have structure and are thus a mix of structured and unstructured data, but collectively this is still referred to as “unstructured data”.
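An e-mail message shows this mix well. The sketch below uses Python's standard `email` module on a made-up message (the addresses and wording are hypothetical): the headers come out as named fields, while the body is just a block of free text.

```python
from email import message_from_string

# Hypothetical raw message: structured headers, then a blank line, then a free-text body.
raw = """From: student@example.com
To: counsellor@example.com
Subject: Course query
Date: Fri, 12 Dec 2014 10:30:00 +0530

Hi, I saw the analytics course and wanted to know when the next batch starts.
"""

msg = message_from_string(raw)

# Structured part: each header can be looked up by name.
headers = {name: msg[name] for name in ("From", "To", "Subject", "Date")}

# Unstructured part: the body is plain text with no fields to query.
body = msg.get_payload().strip()

print(headers["Subject"])
print(body)
```

A program can filter messages by sender or date directly, but answering “what is this person asking for?” requires analyzing the body text.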

For example, an HTML web page is tagged, but HTML mark-up typically serves solely for rendering. It does not capture the meaning or function of tagged elements in ways that support automated processing of the information content of the page. XHTML tagging does allow machine processing of elements, although it typically does not capture or convey the semantic meaning of tagged terms.
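The short Python sketch below illustrates the point with the standard `html.parser` module. The page fragment and the `TagCollector` class are hypothetical examples. The parser can report which tag encloses each piece of text, but the tags themselves say nothing about what the values mean.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record (enclosing tag, text) pairs for each piece of text in the page."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.pairs = []   # (tag, text) in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.pairs.append((self.stack[-1], text))

# Hypothetical fragment: the same number appears twice with different meanings.
page = "<p><b>Price:</b> 450</p><p><b>Batch:</b> 450</p>"

collector = TagCollector()
collector.feed(page)
collector.close()
print(collector.pairs)
```

Both occurrences of 450 are reported simply as text inside a paragraph tag. Only the nearby label, which a human reads but the markup does not formalize, distinguishes a price from a batch size.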

About Author

EduPristine

EduPristine is a member of Adtalem Global Education (NYSE: ATGE), a global education provider headquartered in the United States. Adtalem is a 3-billion-dollar (20,000 crore) company that has about 9 institutions and companies, with more than 16,000 employees spread across 145 locations. Adtalem takes pride in training 142,000 degree-seeking students all over the world. The organization's purpose is to empower students to achieve their goals, find success, and make inspiring contributions to our global community. EduPristine is one of India's leading training providers in Analytics, Accounting, Finance, Healthcare, and Marketing. Founded in 2008, EduPristine has a strong online platform and a network of classrooms across India, and caters to self-paced and online learning in addition to classroom learning.

