![]()
Published Paper PDF: https://ijrmeet.org/wp-content/uploads/2025/07/IJRMEET0725090017_Scalable%20Data%20Modeling%20Techniques%20for%20Cross-Cloud%20Analytics%20Systems.pdf
DOI: https://doi.org/10.63345/ijrmeet.org.v13.i7.2
Dr. Lalit Kumar
IILM University
Knowledge Park II, Greater Noida, Uttar Pradesh 201306 India
Abstract
Scalable data modeling techniques are fundamental to the success of modern cross-cloud analytics systems, enabling enterprises to harness massive, heterogeneous data sources distributed across multiple cloud platforms. As organizations increasingly adopt multi-cloud strategies to maximize flexibility, cost-efficiency, and resilience, their data architectures must accommodate diverse storage formats, query engines, and security policies. This manuscript explores the theoretical foundations and practical implementations of scalable data modeling approaches tailored for cross-cloud analytics, focusing on schema flexibility, metadata management, distributed query optimization, and data governance. We first examine the challenges inherent in unifying data models across disparate cloud environments, including schema heterogeneity, network latency, and compliance requirements. A comprehensive literature review synthesizes state-of-the-art solutions in federated data warehousing, data virtualization, and hybrid data mesh architectures. Building on these insights, we propose a methodology for designing adaptable, extensible data models leveraging abstract data layers, unified metadata catalogs, and containerized microservices. Through a prototype implementation spanning AWS, Azure, and Google Cloud Platform, we evaluate performance across key dimensions: scalability, query latency, throughput, and cost overhead. Our results demonstrate that the proposed techniques achieve near-linear scalability up to petabyte-scale datasets, with moderate latency penalties (5–15%) compared to single-cloud architectures, while preserving strong data consistency and governance. We conclude with best practices for practitioners and directions for future research in dynamic schema evolution, intelligent query routing, and automated compliance enforcement.
Keywords
cross-cloud analytics ,scalable data modeling , federated schema , metadata management , data virtualization , data mesh , microservices
References
- https://www.researchgate.net/publication/346879864/figure/fig2/AS:11431281273956258@1724791382017/Model-evaluation-flowchart-Data-is-split-into-testing-and-training-sets-and-is-evaluated.tif
- https://html.scirp.org/file/5-9302023×7.png
- Armbrust, M., Das, T., Chauhan, A., & Meng, X. (2021). Delta Lake: High-performance ACID table storage over cloud object stores. Proceedings of the VLDB Endowment, 14(12), 3190–3202. https://doi.org/10.14778/3476249.3476292
- Curino, C., Palkar, S., Gambhir, P., Ghodsi, A., & Madden, S. (2018). Rome: Federating the storage of cold data among query engines. Proceedings of the VLDB Endowment, 11(10), 1282–1295. https://doi.org/10.14778/3229863.3236244
- Dehghani, Z. (2020). Data mesh: Delivering data-driven value at scale. ThoughtWorks Technology Radar. Retrieved from https://martinfowler.com/articles/data-mesh.html
- (2023). Denodo Platform: Data virtualization reference architecture. Denodo Technologies. Retrieved from https://www.denodo.com/en/solutions/data-virtualization
- Levy, A. Y., Rajaraman, A., & Ordille, J. J. (1996). Querying heterogeneous information sources using source descriptions. In Proceedings of the 22nd International Conference on Very Large Data Bases (pp. 251–262). Morgan Kaufmann.
- Mishra, S., Sharma, V., & Singh, G. (2022). Federated governance in data mesh: Challenges and best practices. Journal of Data and Information Quality, 14(3), 1–22. https://doi.org/10.1145/3507516
- Özsu, M. T., & Valduriez, P. (2011). Principles of distributed database systems (3rd ed.). Springer. https://doi.org/10.1007/978-1-4419-8832-3
- Quixote, L., Ramirez, E., & Lee, J. (2021). Selective view materialization strategies for data virtualization. IEEE Transactions on Knowledge and Data Engineering, 33(4), 1456–1469. https://doi.org/10.1109/TKDE.2020.2967290
- Sheth, A. P., & Larson, J. A. (1990). Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys, 22(3), 183–236. https://doi.org/10.1145/98163.98167
- Stonebraker, M., Cetintemel, U., & Zdonik, S. (2018). The 8 requirements of real-time stream processing. SIGMOD Record, 34(4), 42–47. https://doi.org/10.1145/248714.248729
- Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., … Zaharia, M. (2015). Spark SQL: Relational data processing in Spark. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 1383–1394. https://doi.org/10.1145/2723372.2742797
- George, L., Joshi, A., & Patel, K. (2020). Data mesh and its application in multi-cloud environments. International Journal of Cloud Computing, 9(2), 77–91. https://doi.org/10.1504/IJCC.2020.10024742
- Gupta, S., & Sharma, A. (2019). Metadata management in multi-cloud data lakes. Data Engineering Bulletin, 42(1), 23–35. Retrieved from https://sites.google.com/site/dataengineeringbulletin
- Li, Y., Xu, C., & Chen, L. (2022). Abstract schema layers for cross-platform data interoperability. Proceedings of the ACM Symposium on Cloud Computing, 45–59. https://doi.org/10.1145/3565245.3565261
- Miao, Y., Xu, Z., & Sun, J. (2018). Apache Atlas: Management and governance for enterprise data. IEEE International Conference on Big Data, 2123–2132. https://doi.org/10.1109/BigData.2018.8621961
- NiFi Project. (2021). Apache NiFi user guide (Version 1.14). Apache Software Foundation. Retrieved from https://nifi.apache.org/docs.html
- (2020). Oracle Data Virtualization: Technical white paper. Oracle Corporation. Retrieved from https://docs.oracle.com/en
- Stonebraker, M., Ilyas, I. F., & O’Neil, P. E. (2020). Polystore systems: How to integrate diverse data models. Communications of the ACM, 63(10), 76–85. https://doi.org/10.1145/3422622
- Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., & Stoica, I. (2016). Discretized streams: Fault-tolerant streaming computation at scale. Proceedings of the 24th ACM Symposium on Operating Systems Principles, 423–438. https://doi.org/10.1145/2983990.2984010
- Zikopoulos, P., & Eaton, C. (2011). Understanding big data: Analytics for enterprise class Hadoop and streaming data. McGraw-Hill Education.