Implementation of a 4-tier Cloud-Based Architecture for Collaborative Health Care Delivery

*Corresponding author’s e-mail address: nurayhn1@gmail.com
doi: http://dx.doi.org/10.4314/njtd.v13i1.4

ABSTRACT: Cloud services permit healthcare providers to manage information and to use service resources such as Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) over the Internet, provided that security and information ownership concerns are addressed. Health Care Providers (HCPs) in Nigeria, however, have been confronted with various issues because of their method of operation. Among these are inappropriate methods of data storage and the unreliable nature of patient medical records, as well as difficulty in accessing quality healthcare services, the high cost of medical services, and incorrect diagnosis and treatment methodology. Cloud computing offers an efficient and reliable means of securing medical information, and data mining tools in this form of distributed system will go a long way toward achieving the objective set out for this project. The aim of this research is therefore to implement a cloud-based architecture suitable for integrating healthcare delivery into the cloud to provide a productive mode of operation. The proposed architecture consists of four phases (4-Tier). A User Authentication and Access Control Engine (UAACE) prevents unauthorized access to patient medical records and uses standard encryption/decryption techniques to ensure the privacy of such records. A Data Analysis and Pattern Prediction Unit (DAPPU) provides valuable information that guides decision making through standard data mining procedures. The remaining phases are the Cloud Service Provider (CSP) and the Health Care Providers (HCPs). The architecture, which has been implemented on CloudSim, proved to be efficient and reliable based on the results obtained when compared with previous work.


I. INTRODUCTION
Cloud computing is an Information Technology (IT) model enabling on-demand access to computing resources as a subscription service. Cloud service providers create and maintain large data centers to provide their clients with on-demand computing resources. Clients access and use external resources dynamically in a pay-as-you-go manner. This is very appealing to businesses because it provides greater flexibility and efficiency than maintaining local infrastructure that is underutilized most of the time yet insufficient at peak times (Nikolay & Rajkumar, 2015). The privacy requirements normally encountered in the traditional paper document world are increasingly expected in Internet transactions today. Secure digital communications are necessary for web-based applications, mandated privacy for medical information, etc. In general, secure connections between parties communicating over the Internet are now a requirement (Shabnam & Priyanka, 2012).
Interestingly, cloud computing is being considered by industries whose operations could have great implications for the well-being of society. Pharmaceutical and medical research organizations and healthcare establishments, engaged in finding cures for humanity's major illnesses and in helping patients, are among the latest customers to test and experience the potential advantages of this innovation (Sultan & Nabil, 2014). Healthcare organizations today are capable of generating and collecting a large amount of data. This increase in the volume of data requires an automatic way of extracting the data when needed. With data mining techniques it may be possible to extract interesting and useful knowledge and regularities. Knowledge acquired in this manner can be used in the appropriate area to improve work efficiency and enhance the quality of the decision-making process. The points above indicate that there is a great need for a new generation of computer theories and tools to help people extract useful information from the constantly growing volume of digital data (Boris & Milan, 2012).
The following are commonly used machine learning algorithms, which can be applied to almost any data problem: Linear Regression, Logistic Regression, Decision Tree, SVM, Naive Bayes, KNN (Yuliang et al., 2015), K-Means, Random Forest, Dimensionality Reduction Algorithms, and Gradient Boost and AdaBoost (Wu et al., 2008).
The authors decided to use KNN in this project because of the following notable advantages: (i) the naïve version of the algorithm is easy to implement by computing the distances from the test example to all stored examples; (ii) k-NN has some strong consistency results; (iii) k-NN is guaranteed to approach the Bayes error rate for some value of k (where k increases as a function of the number of data points). The K-nearest neighbor classification performance can often be significantly improved through (supervised) metric learning (Wu et al., 2008).
The architecture consists of a Cloud Service Provider (CSP), Health Care Provider (HCP), Data Authentication and Access Control Unit (DAACE), which protects medical records and prevents unapproved access, as well as a Data Analysis and Pattern Prediction Unit (DAPPU). The platform was implemented using a simulation tool for cloud computing to demonstrate the connection between the CSPs and HCPs, which serves as the IaaS, while a web-based application demonstrates the SaaS components, DAACE and DAPPU, using the RSA and KNN algorithms respectively.

II. CLOUD COMPUTING AND HEALTHCARE DELIVERY SYSTEM
Healthcare organizations are expected to provide new and improved patient care capabilities at a reduced cost. IT plays a strong role in the health and patient care arenas, with Cloud Computing (CC) gradually beginning to make its mark (Guo et al., 2010). Non-digitized information is often not portable, which inhibits its sharing amongst the various healthcare delivery actors (patients, physicians, pharmacists, clinics, etc.) (Aljabre et al., 2012).
The use of technology to facilitate collaboration and coordinate healthcare between patients and physicians and amongst the medical community is still limited in Nigeria (Capannini et al., 2008). The healthcare industry is shifting towards an information-centric care delivery model, enabled in part by open standards that support cooperation, collaborative workflows, and information sharing. CC provides an enabling environment that allows hospitals, medical practices, insurance companies, and research facilities to tap improved computing resources at lower initial capital outlay (Rolim et al., 2010).
CC also provides a platform that reduces the barriers to innovation and to the modernization of Healthcare Information Technology (HIT) systems and other healthcare-related applications (Justice et al., 2014). CC provides facilities that support big data sets for Electronic Health Records (EHRs) (Thomas-MacLean et al., 2014), radiology images and genomic data offloading (Kuo, 2011). It also facilitates the sharing of EHRs among authorized physicians and hospitals in various geographical areas, providing more timely access to life-saving information and thereby reducing the need for duplicate testing (Wu et al., 2010).

Justification for the Choice of CloudSim
The study and comparison of the various cloud simulators available (Rahul & Prince, 2013) concluded that CloudSim is the most sophisticated among them. The conclusion is based on the simulators' characteristics with respect to language platform, networking, simulation speed and availability. Mahdi et al. (2013) present a review of the most widely applied grid and cloud computing simulation tools as well as their capabilities in different aspects of applications, as shown in Table 1.

III. ARCHITECTURAL DESIGN
The model is structured in four phases: CSP, HCPs, DAACE and DAPPU. The phases are described below.

First Phase of the Architecture Cloud Service and Health Care Providers
Table 2 presents the matrix of three CSPs and the basic cloud computing service models. For instance, from Table 2, the category of services offered by the cloud service provider with the code CSP1 is given by the logical (database) operators/functions specified in (1).

$$CSP_1 = \{X_{CSP1}, Y_{CSP1}, Z_{CSP1}\} \qquad (1)$$

Assuming HCP1, HCP2 and HCP3 subscribed to the EHR application, all CSPs that offer such a service must provide a general (standard) format for capturing medical data via the healthcare application, as shown by the logical (database) operators/functions specified in (5). It is worth mentioning that the architecture can accommodate as many HCPs as possible.
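The relationship between CSPs and service models can be illustrated with a small lookup structure. The following is a minimal sketch, assuming each CSP is represented by the set of service models it offers; only the entry for CSP1 follows (1), while the entries for CSP2 and CSP3 and the helper name `providers_offering` are hypothetical.

```python
# Service models from Table 2: SaaS (X), IaaS (Y), PaaS (Z).
SERVICE_MODELS = ("SaaS", "IaaS", "PaaS")

# CSP1 offers all three models, as in (1).
# The entries for CSP2 and CSP3 are hypothetical placeholders.
csp_services = {
    "CSP1": {"SaaS", "IaaS", "PaaS"},
    "CSP2": {"SaaS", "IaaS"},
    "CSP3": {"PaaS"},
}


def providers_offering(model, matrix):
    """Return the sorted codes of all CSPs that offer the given service model."""
    if model not in SERVICE_MODELS:
        raise ValueError(f"unknown service model: {model}")
    return sorted(code for code, models in matrix.items() if model in models)


# An HCP subscribing to an EHR application (SaaS) could be matched like this.
saas_providers = providers_offering("SaaS", csp_services)
```

A set per provider keeps the membership test simple and mirrors the set notation used in (1).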

Second Phase of the Architecture Data Authentication and Access Control Engine (DAACE)
This module consists of a web application that enables users (patients or doctors) to log in to the secured platform with an ID, a password and a verification key. The platform offers users several options, which include upload, deletion and decryption of a data file. Once a file is uploaded, it is encrypted and assigned a key, which is needed to decrypt the file after it has been shared with the recipient. The file sharing process can be from patient to doctor. The process is implemented with the Rivest-Shamir-Adleman (RSA) cryptographic algorithm, which has been shown to be efficient in securing data. The procedure is as follows.

RSA Key Generation
There are several techniques, such as Advanced Encryption Standard (AES), Triple Data Encryption Standard (DES), Twofish and Blowfish, for assuring the confidentiality, integrity, authenticity and non-repudiation of electronic communications and data storage. However, RSA was adopted in this work because of the following advantages: (i) Increased security and convenience: private keys never need to be transmitted or revealed to anyone. (ii) Provision of digital signatures that cannot be repudiated. Authentication via secret-key systems requires the sharing of some secret and sometimes requires trust in a third party as well (Zhang & Gong, 2011).

Algorithm 1: RSA Algorithm
1. Pick two distinct primes p and q.
2. Compute n = p*q and Φ(n) = (p - 1)(q - 1),
where n is the modulus for the public key and the private key, and Φ(n) is the totient of the modulus n.
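Algorithm 1 as listed covers only the choice of primes and the computation of n and Φ(n). The sketch below completes key generation with the standard textbook steps (choose a public exponent e coprime with Φ(n), then derive d as its modular inverse); these remaining steps are an assumption about the procedure used and are not taken from the text. The function names and the tiny primes are illustrative only, and no padding is applied, so this is not suitable for protecting real records.

```python
import math


def generate_keypair(p, q, e=65537):
    """Textbook RSA key generation from two distinct primes p and q."""
    if p == q:
        raise ValueError("p and q must be distinct primes")
    n = p * q                      # modulus shared by both keys
    phi = (p - 1) * (q - 1)        # totient of n
    if math.gcd(e, phi) != 1:
        raise ValueError("e must be coprime with phi(n)")
    d = pow(e, -1, phi)            # modular inverse (Python 3.8+)
    return (e, n), (d, n)


def encrypt(message, public_key):
    """Encrypt an integer message smaller than n."""
    e, n = public_key
    return pow(message, e, n)


def decrypt(ciphertext, private_key):
    """Recover the integer message from the ciphertext."""
    d, n = private_key
    return pow(ciphertext, d, n)


# Toy primes chosen for readability; real keys use primes of 1024 bits or more.
public_key, private_key = generate_keypair(61, 53, e=17)
ciphertext = encrypt(65, public_key)
recovered = decrypt(ciphertext, private_key)
```

In practice a vetted cryptographic library with a padding scheme such as OAEP would replace these hand-written functions.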

Explanation of UAACE process flow
The application starts with a login page where existing users are required to input their login details (email address and password), while a new user is required to register. Only after a successful registration is a user able to log in to the platform. If the email address and password match the registered details, a unique verification key is sent to the user's email address and the user is directed to the verification page to input that key. Once the key matches the one sent by the system, the user gains access to the application. This process ensures that only the authorized user is admitted.
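The two-step login described above can be sketched as follows. This is a minimal illustration, assuming an in-memory user store and a single-use key; the names `registered_users`, `pending_keys`, `login` and `verify` are hypothetical, passwords are kept in plain text only for brevity, and the e-mail delivery step is replaced by returning the key.

```python
import hmac
import secrets

# Hypothetical in-memory stores; a real system would use a database
# and store salted password hashes rather than plain passwords.
registered_users = {"patient@example.com": "s3cret"}
pending_keys = {}


def login(email, password):
    """Step 1: check the credentials and issue a one-time verification key."""
    stored = registered_users.get(email)
    if stored is None or not hmac.compare_digest(stored.encode(), password.encode()):
        return None
    key = secrets.token_hex(4)      # 8 hexadecimal characters
    pending_keys[email] = key
    return key                      # in the described flow this is e-mailed to the user


def verify(email, submitted_key):
    """Step 2: grant access only if the submitted key matches; a key is single-use."""
    expected = pending_keys.pop(email, None)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), submitted_key.encode())
```

A typical session calls `login` first and passes the key received by e-mail to `verify`; a second call to `verify` with the same key returns `False` because the key has already been consumed.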
In the application, a user can upload a data file by clicking the Upload button. Once files are uploaded, they are encrypted using the RSA algorithm. The user can view uploaded details by navigating to the VIEW DETAILS menu, where an encrypted file can be either downloaded or deleted. A file key is assigned to each file and is required for a successful download.
An option to share medical details is also available through the SHARE MEDICAL DETAILS menu, where users can share an encrypted data file via email address. To view or delete shared data files, the user navigates to the SHARED DETAILS menu. Where there is a large volume of stored data files, the SEARCH menu helps the user find stored data either by email ID or by file name. To decrypt an encrypted file, the user runs the decryption Jar application, which can be downloaded from the OTPH menu. After running the decryption Jar file, the user inputs the encrypted file and the key associated with it and clicks the Decrypt button to decrypt the file.

Third Phase of the Architecture Data Analysis Pattern Prediction Unit (DAPPU)
This module extends the healthcare web application by allowing users to input health details and either predict immediately using the K-Nearest Neighbor (KNN) algorithm or update the data for later prediction. If the data is updated, the user's information is added to the training set. The prediction phase shows the predicted disease, derived from the training set/sample data, together with the nearest neighbors and the most likely occurrence. KNN is a supervised learning algorithm that is applied in the fields of data mining and statistical pattern recognition.

Algorithm 2: Algorithm to calculate KNN
i. Store the output values of the M nearest neighbors to query scenario q in vector r = {r_1, ..., r_M} by repeating the following loop M times:
   a. Go to the next scenario s_i in the data set, where i is the current iteration within the domain {1, ..., P}.
   b. If x is not set or x > d(q, s_i): x ← d(q, s_i), t ← o_i.
   c. Loop until we reach the end of the data set (i.e. i = P).
   d. Store x into vector c and t into vector r.
ii. Calculate the arithmetic mean output across r as follows:

$$\bar{r} = \frac{1}{M} \sum_{i=1}^{M} r_i$$

iii. Return $\bar{r}$ as the output value for the query scenario q.
Source: (Yuliang et al., 2015)

Calculating the Distance: A case is classified by a majority vote of its neighbors, with the case assigned to the class most common amongst its K nearest neighbors as measured by a distance function. If K = 1, the case is simply assigned to the class of its nearest neighbor. The Euclidean distance formula from (maeb4, 2015) is presented in eqn (6), where n is the number of features:

$$d(q, s_i) = \sqrt{\sum_{j=1}^{n} (q_j - s_{ij})^2} \qquad (6)$$
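Algorithm 2 can be sketched compactly as follows. Instead of scanning the data set M times, this illustration sorts all scenarios by Euclidean distance once and averages the outputs of the first M, which yields the same neighbors on small data sets. The function names and the sample data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_mean(query, scenarios, outputs, m):
    """Return the arithmetic mean output of the m scenarios nearest to query."""
    if not 1 <= m <= len(scenarios):
        raise ValueError("m must be between 1 and the number of scenarios")
    # Rank every (scenario, output) pair by its distance to the query.
    ranked = sorted(zip(scenarios, outputs),
                    key=lambda pair: euclidean(query, pair[0]))
    nearest_outputs = [output for _, output in ranked[:m]]
    return sum(nearest_outputs) / m


# Hypothetical data set of P = 4 scenarios with two features each.
scenarios = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
outputs = [1.0, 2.0, 3.0, 10.0]
estimate = knn_mean((0.0, 0.0), scenarios, outputs, m=3)
```

With M = 3 the distant scenario (5, 5) is excluded, so the estimate is the mean of the three nearby outputs.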

Explanation of DAPPU process flow
This phase starts with a section where a user inputs medical details, with the option of either saving them to update the database or predicting the outcome of a particular disease or illness. The application accepts inputs (which serve as features) from the user. These features are extracted and assigned weights, which are determined by how strongly each feature influences the outcome of each available disease or illness. For example, the feature "Stress" can be given a weight of 10 because stress is seen as a feature that causes illness, while the feature "No history of Family Disease" can be given a weight of -10 because the patient is less likely to be prone to an illness if there is no history of it in the family.
The next step is to calculate, using the Euclidean distance formula, the distance from the new data entry to the existing ones (training samples), that is, how close the data to be predicted is to the data stored in the database. Once these distances are computed, they are sorted, and the nearest neighbors are determined based on the k-th minimum distance. Since this is supervised learning, the stored outcomes of the training samples that fall within the K nearest are retrieved. The majority among these nearest neighbors is then used as the predicted value.
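The weighting, distance and voting steps above can be sketched as follows. This is an illustration under stated assumptions: only the weights for "Stress" (10) and "No history of Family Disease" (-10) come from the text; the "Smoker" feature, the disease labels, the training records and the function names are hypothetical.

```python
import math
from collections import Counter

# Weights for "Stress" and "No history of Family Disease" follow the text;
# the "Smoker" feature and its weight are hypothetical.
FEATURE_WEIGHTS = {
    "Stress": 10,
    "No history of Family Disease": -10,
    "Smoker": 8,
}
FEATURES = list(FEATURE_WEIGHTS)


def encode(present):
    """Map a set of reported features to a weighted vector (0 when absent)."""
    return [FEATURE_WEIGHTS[name] if name in present else 0 for name in FEATURES]


def predict(present, training, k):
    """Classify by majority vote among the k nearest training samples."""
    query = encode(present)
    # Sort (distance, label) pairs so the nearest samples come first.
    distances = sorted(
        (math.dist(query, encode(features)), label)
        for features, label in training
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training samples: (reported features, known outcome).
training = [
    ({"Stress", "Smoker"}, "hypertension"),
    ({"Stress"}, "hypertension"),
    ({"No history of Family Disease"}, "healthy"),
    (set(), "healthy"),
]
result = predict({"Stress", "Smoker"}, training, k=3)
```

An odd value of K is a common choice for two-class problems because it avoids tied votes.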

Fourth Phase of the Architecture Data Analysis and Prediction Outcome
This phase is solely responsible for analyzing the results mined in the DAPPU phase of the architecture. For example, we analyzed how prone an individual is to some illness or disease. From the results obtained, we can also provide health solutions to those in need, the percentage of each category of illness or disease, and the causes of those diseases.
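The percentage of each category of illness mentioned above can be derived directly from a list of predicted outcomes. The following is a small sketch; the function name and the sample predictions are hypothetical and do not represent results from this study.

```python
from collections import Counter


def category_percentages(predictions):
    """Return the percentage share of each predicted category."""
    counts = Counter(predictions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * count / total for label, count in counts.items()}


# Hypothetical predicted outcomes collected from the prediction phase.
predicted = ["hypertension", "healthy", "hypertension", "diabetes"]
shares = category_percentages(predicted)
```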

IV. IMPLEMENTATION
The CloudSim Toolkit (a framework for modeling and simulation of cloud computing infrastructures and services) is extended to simulate the cloud relationship between the CSPs and the HCPs, where the CSP provides a deployment environment for the HCPs, for example storage allocation, memory allocation, task scheduling, bandwidth allocation, CPU allocation and virtual machine provisioning.
Of the nearly six different versions of CloudSim currently available (Kumara & Saxenab, 2015), Network CloudSim was chosen due to its relevance to this work. The parameters considered for simulation include the number of networked systems at each HCP, users, resources, the time required to send and receive data, the cloud service provider, cloud services and storage capacity, among others. Other information regarding simulation parameters is detailed under section 3.2.
Only text data are allowed to be shared. Efforts are ongoing to include audio, video and multimedia data. Any information obtained or shared relating to patients should not be disclosed by anyone privileged to access it. Any authorized user may log in twice, after which subsequent attempts with the same login parameters are denied.
Under Define Internet Characteristics, the Internet characteristics can be configured: the Delay Matrix (the transmission delay between regions) and the Bandwidth Matrix (the available bandwidth between regions for the simulated application). The experiment yields results such as the overall response time and the data center processing time.
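To illustrate how a delay matrix and a bandwidth matrix can contribute to response time, the sketch below adds the region-to-region delay to the time needed to transfer a request of a given size. This is a deliberately simplified model and an assumption for illustration; it is not the simulator's internal calculation, and the two-region matrices, units and function name are hypothetical.

```python
# Hypothetical matrices for two regions (indices 0 and 1).
# DELAY_MS[a][b]: transmission delay in milliseconds from region a to region b.
DELAY_MS = [
    [25.0, 100.0],
    [100.0, 25.0],
]
# BANDWIDTH_MBPS[a][b]: available bandwidth in megabits per second.
BANDWIDTH_MBPS = [
    [2000.0, 1000.0],
    [1000.0, 800.0],
]


def network_time_ms(src, dst, data_size_bytes):
    """Estimate one-way network time: fixed delay plus transfer time."""
    delay = DELAY_MS[src][dst]
    bits = data_size_bytes * 8
    transfer_seconds = bits / (BANDWIDTH_MBPS[src][dst] * 1_000_000)
    return delay + transfer_seconds * 1000.0


# A 125,000-byte request sent from region 0 to region 1.
one_way = network_time_ms(0, 1, 125_000)
```

Under this model an overall response time would combine the network time in both directions with the data center processing time.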

V. CONCLUSION
This research provides a platform which embodies a UAACE that guarantees the security and confidentiality of patients' medical records and prevents unapproved access to such records. Through its DAPPU, it also provides a means of mining useful information, which can lead to effective decision making by the stakeholders concerned. Healthcare organizations in the developed nations of the world can also benefit immensely from the cloud-based platform implemented in this research, as it will provide financial relief to patients. The platform can also be adapted to suit the needs of business organizations outside the healthcare industry.

VI. FUTURE WORK
Now that this project has achieved the aim of implementing a reliable cloud-based healthcare delivery system, more work is needed to analyze security mechanisms stronger than the RSA algorithm and to use more advanced features with real data in the DAPPU. Lastly, testing various types of pattern prediction against the KNN algorithm will also be carried out in future work.

Figure 1: A 4-Tier Cloud-Based Architecture for Health Care Delivery System

Table 2: Cloud Service Provider versus Cloud Services. Columns: SaaS (X), IaaS (Y), PaaS (Z). Rows: CSP1, CSP2, CSP3.

Figure 3: User Authentication and Access Control Engine (UAACE) process flow.

Figure 5: Predicted Result of disease from training set.

Figure 6: Main Configuration menu, to set user bases and application deployment.

Figure 9: Display of Complete Simulation with Maximum, Average, Minimum response time for User bases.

Figure 10: Detailed Result of Simulation showing Overall Scheduling and Time Summary.

Figure 13: Prediction of disease from training set/sample data showing the nearest neighbors.

Table 1: Summarized comparison of cloud simulators. Columns: Cloud Simulator, Language of coding, Open Source availability, Simulation speed, Graphical environment.

In the Configuration Simulation menu, we set the configuration for the USER BASE (HCPs), which consists of the following options: add a new HCP, delete an existing HCP, set the HCP geographical region, requests per HCP per hour, Data Size per Request (bytes), Peak Hours Start (GMT), Peak Hours End (GMT), Average Peak Users and Average Off Peak Users. The Application Deployment Configuration includes the following options: add a new data center, number of Virtual Machines (VMs), image size, as well as memory and bandwidth size.