Tag Content Extractor Problem in Java

by Online Tutorials Library July 22, 2022

This is an interesting problem that is frequently asked in interviews at IT companies such as Google, Amazon, TCS, and Accenture. Interviewers use it to assess the logical ability, critical thinking, and problem-solving skills of the candidate. In this section, we will see how to extract content from tags in Java using two regex-based approaches, and we will write a Java program for each.

Problem Statement

In this problem, we are given a string of text written in a tag-based language, and our goal is to parse the text and retrieve the content. We retrieve only the content enclosed within a sequence of well-organized tags that meets the following criteria:

The start and end tag names should be the same. For example, the HTML code <h1>Test</h3> is invalid because it starts with an h1 tag and ends with an h3 tag.
Tags can be nested, but content that sits directly between nested tags is considered invalid. For example, in <h1><a>contents</a>invalid</h1>, the text "contents" is valid but the text "invalid" is not.
A tag can contain any number of characters.
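
To make the criteria above concrete, here is a minimal sketch that checks them with a regular expression. The class name TagRuleSketch, the method extract(), and the exact pattern are assumptions made for illustration; they are not taken from the programs in this article. The backreference \1 is what forces the closing tag name to repeat the opening tag name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to illustrate the criteria.
class TagRuleSketch {
    // Group 1 is the tag name, group 2 is the content.
    // "\\1" must repeat exactly what group 1 captured, so <h1>...</h3> cannot match.
    // [^<>]+ keeps the content free of tags, so only innermost content is accepted.
    private static final Pattern TAG =
            Pattern.compile("<([^<>]+)>([^<>]+)</\\1>");

    // Returns every piece of valid content found in one line (empty list if none).
    static List<String> extract(String line) {
        List<String> contents = new ArrayList<>();
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            contents.add(m.group(2));
        }
        return contents;
    }

    public static void main(String[] args) {
        System.out.println(extract("<h1>Test</h1>"));                   // [Test]
        System.out.println(extract("<h1>Test</h3>"));                   // []
        System.out.println(extract("<h1><a>contents</a>invalid</h1>")); // [contents]
    }
}
```

The second call returns an empty list because the tag names differ, and the third returns only the innermost content because the text after the nested tag is not directly wrapped by a matching pair.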

Example

Suppose we are given the following text wrapped inside tags.

    3
    <h1>Hello, readers</h1>
    <h1><h1>welcome to TutorAspire </h1></h1><par>So wait for a while</par>
    <amee>You can learn any technology in an easy way.</amee>

We have to extract the text from the tags, as follows.

    Hello, readers
    welcome to TutorAspire
    So wait for a while
    You can learn any technology in an easy way.

There are several ways to solve the Tag Content Extractor problem in Java. Let’s go through two of them one by one:

TagContentExtractorExample1.java

    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // TagContentExtractorExample1 demonstrates the first solution to the Tag Content Extractor problem
    public class TagContentExtractorExample1 {
        // main() method start
        public static void main(String[] args) {
            // create Scanner class object
            Scanner sc = new Scanner(System.in);
            // take input from the user for the number of lines
            System.out.println("How many lines of code do you have?");
            int n = Integer.parseInt(sc.nextLine());
            // repeat for each line
            while (n-- > 0) {
                // take the line from which content is to be extracted
                String tagLine = sc.nextLine();
                // group 1 is the tag name, group 2 is the content;
                // </\\1> requires the closing tag to repeat the name captured in group 1
                Matcher matcher = Pattern.compile("<(.+)>([^<>]+)</\\1>").matcher(tagLine);
                // print None when the line has no valid content
                if (!matcher.find()) {
                    System.out.println("None");
                    continue;
                }
                // reset the matcher so that the first match is not skipped
                matcher.reset();
                // print the content of every match
                while (matcher.find()) {
                    System.out.println(matcher.group(2));
                }
            }
            // close scanner class object
            sc.close();
        }
    }
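
The first solution calls find() once to test for a match and then calls reset() before the printing loop. The sketch below shows why that reset() matters. The class MatcherResetSketch, the method collect(), and the pattern used here are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatcherResetSketch {
    // Assumed pattern: tag name in group 1, content in group 2.
    private static final Pattern TAG =
            Pattern.compile("<([^<>]+)>([^<>]+)</\\1>");

    // Probes the line with find(), optionally rewinds, then collects the content.
    static List<String> collect(String line, boolean rewind) {
        Matcher m = TAG.matcher(line);
        List<String> found = new ArrayList<>();
        if (!m.find()) {
            return found;   // nothing to extract
        }
        if (rewind) {
            m.reset();      // scanning starts again from index 0
        }
        // Without the reset, this loop continues after the probed match.
        while (m.find()) {
            found.add(m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "<a>x</a><b>y</b>";
        System.out.println(collect(line, true));   // [x, y]
        System.out.println(collect(line, false));  // [y]
    }
}
```

Each call to find() resumes where the previous match ended, so the probing call consumes the first match. Calling reset() puts the matcher back at the start of the input.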

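Let’s see another approach to the same problem. Here, the pattern is compiled once outside the loop, and a counter is used to detect lines that have no valid content.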

TagContentExtractorExample2.java

    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // TagContentExtractorExample2 demonstrates the second solution to the Tag Content Extractor problem
    public class TagContentExtractorExample2 {
        // main() method start
        public static void main(String[] args) {
            // create Scanner class object
            Scanner sc = new Scanner(System.in);
            // regex string: group 1 is the tag name, group 2 is the content;
            // </\\1> requires the closing tag to repeat the name captured in group 1
            String expression = "<([^<>]+)>([^<>]+)</\\1>";
            // compile the regex into a pattern once
            Pattern pattern = Pattern.compile(expression);
            // take input from the user for the number of lines
            System.out.println("How many lines of code do you have?");
            int n = Integer.parseInt(sc.nextLine());
            // repeat while n > 0
            while (n > 0) {
                // take the line from which content is to be extracted
                String tagLine = sc.nextLine();
                // use matcher
                Matcher matcher = pattern.matcher(tagLine);
                // set counter variable to 0
                int count = 0;
                // print the content of every match
                while (matcher.find()) {
                    System.out.println(matcher.group(2));
                    count++;
                }
                // count == 0 indicates no data or an invalid tag line
                if (count == 0) {
                    System.out.println("None");
                }
                // decrement value of n
                n--;
            }
            // close scanner class object
            sc.close();
        }
    }
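
Both programs read from System.in, which makes them awkward to try out quickly. The following sketch applies the same counting logic to an input string, so the result can be checked without typing anything. The class TagExtractorRunSketch, the method run(), and the sample lines are assumptions made for illustration.

```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; not part of the original programs.
class TagExtractorRunSketch {
    // Assumed pattern: tag name in group 1, content in group 2.
    private static final Pattern TAG =
            Pattern.compile("<([^<>]+)>([^<>]+)</\\1>");

    // Reads a line count followed by that many lines and returns the output text.
    static String run(String input) {
        StringBuilder out = new StringBuilder();
        try (Scanner sc = new Scanner(input)) {
            int n = Integer.parseInt(sc.nextLine().trim());
            while (n-- > 0 && sc.hasNextLine()) {
                Matcher m = TAG.matcher(sc.nextLine());
                int count = 0;
                while (m.find()) {
                    out.append(m.group(2)).append('\n');
                    count++;
                }
                // No match at all means the line has no valid content.
                if (count == 0) {
                    out.append("None\n");
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "3\n"
                + "<h1>Hello, readers</h1>\n"
                + "<h1><h1>welcome to TutorAspire </h1></h1>\n"
                + "<h1>Test</h3>\n";
        System.out.print(run(input));
    }
}
```

For this sample input, the sketch prints the content of the first two lines and None for the third line, because its start and end tag names differ.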
