Semantic Web

Gaurav Kumar
3 min read · Jun 17, 2021

--

Most of us have turned to Wikipedia to read an article about some topic, and for many of us it answers the bulk of our searches, sitting at the top of almost every result page, right!! (P.S. Unless you are a developer, in which case Stack Overflow or GeeksforGeeks takes that spot.) You may also have noticed that Wikipedia articles contain a lot of links, placed over words that have articles of their own, so that if you get stuck somewhere you can always follow them. As a kid who just wanted his school project done ASAP, this seemed perfectly ordinary to me: it meant I could get answers to all my queries, and that was enough back then.

But this year I wanted to participate in Google Summer of Code, so I started looking for organizations and projects I could contribute to, and that is when I came across DBpedia for the first time. It is an organization that has been making Linked Data available to the open-source community since 2007, and I was astonished to learn that something like this really exists (as well as at my own ignorance). They extract data from Wikipedia and connect the articles using the links found across the entirety of Wikipedia's articles. So yeah!!, we have a Knowledge Graph now. Knowledge graphs are a great way to represent information, because a knowledge graph is simply a knowledge base with topology: it links various objects together. That explains the image added above, which is a pictorial representation of the Knowledge Graph from the lod-cloud, where you can find many such visualizations.

But linked data on its own is not the ultimate thing, right, because in the end it is only useful for looking up the meaning of some terms, phrases, events, etc. If, however, the linked data also encodes the relationship between two objects, it can power a Q&A system driven by some algorithm, and it can improve search results just as Google and many other search engines have done. And it is not just search engines that benefit: recommendation systems, voice assistants, voice search, troubleshooting guides, and more all improve too.

The extracted data can be represented in different formats, but the most common ones involve triples (subject-predicate-object), where the subject and object are the two entities between which we want to establish a relationship. RDF (Resource Description Framework) triples are a popular way to store this, and any database of RDF triples can be queried using SPARQL or a similar query language. One can use the SPARQL endpoint provided by DBpedia to query their RDF triple datasets.
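To make the triple idea concrete, here is a minimal sketch in plain Python. The data and property names (`dbo:capital`, etc.) are illustrative stand-ins modeled on DBpedia's naming style, not actual query results, and the `match` function only loosely mimics how a SPARQL basic graph pattern selects triples; real queries would go through a SPARQL engine or DBpedia's endpoint.

```python
# RDF-style facts as (subject, predicate, object) tuples.
# Hypothetical sample data in a DBpedia-like vocabulary.
triples = [
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Berlin", "dbo:populationTotal", "3644826"),
    ("dbr:Germany", "dbo:capital", "dbr:Berlin"),
]

def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None is a wildcard,
    similar in spirit to a variable in a SPARQL triple pattern."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]

# "What is the capital of Germany?" expressed as a triple pattern:
print(match(triples, s="dbr:Germany", p="dbo:capital"))
# [('dbr:Germany', 'dbo:capital', 'dbr:Berlin')]
```

The equivalent SPARQL pattern would look something like `SELECT ?c WHERE { dbr:Germany dbo:capital ?c }`, with the variable `?c` playing the role of the `None` wildcard above.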

But the problem is that linked data does not directly capture the relationship between two entities, so these relationships have to be extracted. They are extracted from infoboxes, which contain structured text and establish precise relationships between a large number of entities; both Google and DBpedia have used this approach, drawing on the infoboxes of Google's knowledge base and of Wikipedia respectively. The Wikipedia infobox is typically the box at the top right of an article in a desktop browser. Anyone interested in following up on how DBpedia's extraction framework currently works can visit their GitHub repository.
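As a toy illustration of the idea (this is a drastic simplification, not DBpedia's actual extraction framework), an infobox in Wikipedia's template syntax is essentially a list of `key = value` pairs, each of which can be read off as a triple with the article's subject:

```python
import re

# A simplified infobox in Wikipedia-style template syntax (made-up example).
infobox_wikitext = """{{Infobox country
| name     = Germany
| capital  = Berlin
| currency = Euro
}}"""

def infobox_to_triples(subject, wikitext):
    """Turn each '| key = value' line into a (subject, key, value) triple.
    Real infoboxes have nested templates, links, and references that this
    naive regex does not handle."""
    return [
        (subject, key.strip(), value.strip())
        for key, value in re.findall(r"\|\s*(\w+)\s*=\s*(.+)", wikitext)
    ]

for triple in infobox_to_triples("Germany", infobox_wikitext):
    print(triple)
# ('Germany', 'name', 'Germany')
# ('Germany', 'capital', 'Berlin')
# ('Germany', 'currency', 'Euro')
```

The real framework maps these raw keys onto a curated ontology (so `capital` becomes a well-defined property rather than a bare string), which is what makes the resulting triples queryable in a consistent way.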

Credits:

  1. How Google is using Linked Data Today and Vision For Tomorrow. [source]
  2. DBpedia and the Live Extraction of Structured Data from Wikipedia. [source]
  3. Image Credits: https://lod-cloud.net/
