Research Article

The Systematic Study of Scientific Workflows in Distributed and Cloud Environments: A Review of Privacy Issues in Workflow Provenance, Challenges and Opportunities

by Shridevi Erayya Hombal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 179 - Number 31
Year of Publication: 2018
Authors: Shridevi Erayya Hombal
10.5120/ijca2018916696

Shridevi Erayya Hombal. The Systematic Study of Scientific Workflows in Distributed and Cloud Environments: A Review of Privacy Issues in Workflow Provenance, Challenges and Opportunities. International Journal of Computer Applications. 179, 31 (Apr 2018), 32-38. DOI=10.5120/ijca2018916696

@article{ 10.5120/ijca2018916696,
author = { Shridevi Erayya Hombal },
title = { The Systematic Study of Scientific Workflows in Distributed and Cloud Environments: A Review of Privacy Issues in Workflow Provenance, Challenges and Opportunities },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2018 },
volume = { 179 },
number = { 31 },
month = { Apr },
year = { 2018 },
issn = { 0975-8887 },
pages = { 32-38 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume179/number31/29197-2018916696/ },
doi = { 10.5120/ijca2018916696 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:57:10.120199+05:30
%A Shridevi Erayya Hombal
%T The Systematic Study of Scientific Workflows in Distributed and Cloud Environments: A Review of Privacy Issues in Workflow Provenance, Challenges and Opportunities
%J International Journal of Computer Applications
%@ 0975-8887
%V 179
%N 31
%P 32-38
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Scientific workflow management systems provide the ability to manage and query the provenance of data products. Understanding the workflow lifecycle is essential because scientific workflows often deal with proprietary modules as well as private or confidential data, such as medical or health information. Comparing workflow runs and understanding the differences between them is therefore important. This paper discusses (i) the workflow lifecycle and its challenges; (ii) research issues in provenance for scientific workflows, with an overview of scientific and business workflows; and (iii) privacy issues in scientific workflows: provenance privacy, data privacy and module privacy. Data provenance is an overloaded term that has been defined differently by different people, and it can be examined from different perspectives, such as semantics. Executing two workflows with the same specification raises the problem of differencing the provenance of the two resulting data products. Finally, the paper discusses a challenge that arises from the growing number of workflows in cloud environments: managing the workflows, the virtual machines (VMs), and workflow execution on VM instances. Resolving workflow execution is important because workflows and clouds have become popular in cyberinfrastructure projects: workflows make it possible to build flexible applications, and clouds provide scalable and economical services. By clearly defining dependent services, the proposed architecture, Workflow as a Service (WFaaS), can manage large numbers of workflows and VMs. The paper concludes with a discussion of future work on the proposed cloud workflow architecture.

Index Terms

Computer Science
Information Sciences

Keywords

Data Provenance, Workflow, Scientific Workflow Management, Workflow as a Service (WFaaS), Cloud, Metadata