Murali Mohana Krishna Dandu, Scholar, Texas Tech University, Vijayawada, Andhra Pradesh 520011, murali.dandu94@gmail.com
Swetha Singiri, JNTU University, Hyderabad, Andhra Pradesh, singiriswetha@gmail.com
Sivaprasad Nadukuru, Andhra University, Muniswara Layout, Attur, Yelahanka, Bangalore-560064, sivaprasad.nadukuru@gmail.com
Shalu Jain, Research Scholar, Maharaja Agrasen Himalayan Garhwal University, Pauri Garhwal, Uttarakhand, mrsbhawnagoel@gmail.com
Raghav Agarwal, Scholar, MIET, Meerut (U.P.), India
Dr S P Singh, Ex-Dean, Gurukul Kangri University, Haridwar, Uttarakhand
Abstract
Unsupervised information extraction (UIE) has gained significant attention in natural language processing (NLP) because it can extract structured information from unstructured text without labelled training data. This paper explores the application of Bidirectional Encoder Representations from Transformers (BERT) to UIE tasks, leveraging its deep contextual understanding to improve extraction accuracy. BERT’s architecture conditions each token's representation on both its left and right context, which captures nuanced word relationships and improves comprehension of complex sentences and of the context in which entities appear. We propose a novel framework that employs BERT to identify and extract relevant entities, relationships, and attributes from diverse datasets, and we demonstrate its effectiveness across several domains. Our experiments show that BERT-based models outperform traditional UIE techniques and generalize with minimal or no supervision. We also analyse the impact of different pre-training strategies on extraction performance, highlighting the advantages of domain-specific fine-tuning. The results indicate that integrating BERT into UIE not only improves extraction precision but also reduces reliance on extensive labelled datasets, paving the way for more efficient information retrieval. Overall, this study underscores the potential of BERT for advancing unsupervised methods and offers insights into future research directions and practical applications of information extraction across multiple languages and domains.
Keywords:
Unsupervised Information Extraction, BERT, Natural Language Processing, Entity Recognition, Relationship Extraction, Attribute Extraction, Deep Learning, Contextual Understanding, Pre-training Strategies, Domain-Specific Fine-Tuning.