Data Collector Consolidator

From E-COMPASS_Info_Guide
Revision as of 14:51, 13 January 2016 by GarciaNieto

Overview

The data collected from the different sources is translated to RDF by means of mapping functions. Each data source considered in the scope of the SME E-COMPASS project has a different method for collecting, gathering, and providing access to the analytical data. Therefore, a different set of mapping functions is required to parse the information provided by each data source into RDF, according to the ontology. Each set of mappings is composed of functions that translate the attributes and their values into their corresponding RDF triples.
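As an illustration of what one such mapping function could look like, the following minimal sketch translates the attributes of a single source record into N-Triples lines. The namespace `http://example.org/ecompass#`, the attribute names, and the property names are hypothetical placeholders, since this guide does not list the actual ontology terms; the project's real mapping code may be organised differently.

```python
# Hypothetical namespace: the real E-COMPASS ontology IRI is not given in this guide.
ONTOLOGY = "http://example.org/ecompass#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value):
    """Serialise a Python value as an N-Triples literal."""
    if isinstance(value, bool):
        return '"%s"^^<%sboolean>' % (str(value).lower(), XSD)
    if isinstance(value, int):
        return '"%d"^^<%sinteger>' % (value, XSD)
    # Everything else is written as a plain string with basic escaping.
    escaped = (str(value).replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n"))
    return '"%s"' % escaped


def map_record(subject_iri, record, attribute_map):
    """Translate one source record into a list of N-Triples lines.

    attribute_map links source attribute names to ontology property names;
    attributes without a mapping are skipped.
    """
    triples = []
    for attribute, value in sorted(record.items()):
        prop = attribute_map.get(attribute)
        if prop is None:
            continue
        triples.append("<%s> <%s%s> %s ."
                       % (subject_iri, ONTOLOGY, prop, literal(value)))
    return triples


# Illustrative attribute names for a web-analytics style row (assumed, not from the project).
GA_MAP = {"sessions": "numberOfSessions", "pagePath": "pagePath"}
row = {"sessions": 42, "pagePath": "/home", "ignored": "x"}
for line in map_record("http://example.org/shop/1/visit/7", row, GA_MAP):
    print(line)
```

Under this design, each data source would get its own `attribute_map` (and, where needed, its own parsing step), while the triple serialisation is shared.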

All Data Collector and Consolidator services run on the same virtual machine, VM1. The only exception is the Virtuoso RDF repository, which runs on a separate machine (VM2). The only requirement for the latter is that the Virtuoso service be installed on a different machine, in order to balance the load. The mapping functions collect data from digital footprints (GA, PIWIK) and store the resulting RDF data in this Virtuoso service. The REST API M1 functions read the processed data from these services and return it in JSON format to the remaining modules of the E-Compass Data Mining Application. The following sections explain this in more detail.

Mappings.png
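The read path described above (the REST API querying the RDF repository and returning JSON) can be sketched as follows. This is only an outline under stated assumptions: the host name `vm2.example.org` is hypothetical, the `/sparql` path on port 8890 is Virtuoso's usual default and may differ in a given installation, and the helper names are invented for illustration. No network request is made here; the sample response follows the standard SPARQL JSON results layout.

```python
import json

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7, the version listed in this guide

# Assumed endpoint: hypothetical host, Virtuoso's usual default port and path.
ENDPOINT = "http://vm2.example.org:8890/sparql"


def build_query_url(sparql, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET URL. The request itself should carry an
    'Accept: application/sparql-results+json' header to obtain JSON."""
    return endpoint + "?" + urlencode({"query": sparql})


def flatten_bindings(sparql_json):
    """Reduce a SPARQL JSON result set to a list of plain dicts,
    keeping only the value of each bound variable."""
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        rows.append(dict((name, cell["value"]) for name, cell in binding.items()))
    return rows


# Hand-written sample response, not real project data.
sample = {
    "head": {"vars": ["page", "sessions"]},
    "results": {"bindings": [
        {"page": {"type": "literal", "value": "/home"},
         "sessions": {"type": "literal", "value": "42"}},
    ]},
}

print(build_query_url("SELECT ?page ?sessions WHERE { ?v ?p ?o } LIMIT 10"))
print(json.dumps(flatten_bindings(sample), sort_keys=True))
```

Keeping the flattening step separate from the HTTP call means the JSON returned to the other modules does not expose the SPARQL result structure.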

Physical Hardware Characteristics

Model:
Processor:
RAM:
Hard Drive Space:
Network Connection:
Hypervisor used:
Physical Load Balancing:

Virtual Machine Hardware Specifications and Operating System

Guest Operating System:
Processor: Dual 2-core Intel processor and Intel i7 with 5 cores
RAM: 16 GB
Hard Drive Space VM: 100 GB internal
Network Connection: 1 Gbit/s Ethernet
Minimum required Network Connection:

Service Environment and Set-up on VM1

The API of the system and the corresponding database are located on VM1. The API is based on the Flask microframework for Python and runs within an Apache web server. The database is a MySQL database. The interface is implemented in Python. To set up the API, please download and install the following software:

Required Software

Software: Download
Apache 2.4: http://httpd.apache.org/
Python 2.7: https://www.python.org/download/releases/2.7/
MySQL Server 5.6 (Community Edition): https://dev.mysql.com/downloads/mysql/
Mod_wsgi for Apache 2.4 and Python 2.7: http://www.lfd.uci.edu/~gohlke/pythonlibs/#mod_wsgi
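To show how the pieces listed above fit together, here is a minimal sketch of a WSGI entry point of the kind that mod_wsgi loads from Apache. By default mod_wsgi looks for a callable named `application`; a Flask application object provides the same WSGI interface, so in the actual service that name would typically be bound to the Flask app. The `/status` route and the payload below are hypothetical, and the sketch deliberately uses only the standard library so that it runs without Flask or Apache installed.

```python
import json


def application(environ, start_response):
    """WSGI entry point; mod_wsgi looks for this name by default."""
    if environ.get("PATH_INFO") == "/status":  # hypothetical route
        status = "200 OK"
        payload = {"service": "data-collector", "status": "up"}
    else:
        status = "404 Not Found"
        payload = {"error": "unknown resource"}
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(path):
    """Invoke the application in-process, without Apache, for a quick check."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = application({"PATH_INFO": path, "REQUEST_METHOD": "GET"},
                         start_response)
    return captured["status"], json.loads(b"".join(chunks).decode("utf-8"))


print(call("/status"))
```

Being able to call the application in-process like this is a convenient way to verify the Python side before involving the Apache configuration.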

Software Licenses

Please indicate if a commercial provider would need to buy commercial licenses for any software used to operate the service and, if so, the approximate cost.
OpenLink Software Virtuoso Universal Server (used as the RDF repository in the E-Compass Data Mining Services) requires a software license, which is free of charge for academic use only. Running this software in production requires a commercial license. The terms of licensing are available here

OS Environment Variables

Installation of Mappings Functions

Installation of REST API

Service Configuration

Configuration script

availability / location

README / User Manual

availability / location

Configuration steps

Configuration of REST endpoints at:

Operation

Service startup procedure

Restarting the service

Service Logs

Recurring Manual Actions / Maintenance Tasks

Other

Limitations of the service

With which parameters does the service scale?

How many concurrent E-Shops, how many concurrent products and how many users/E-Shop customers are possible without causing loss in quality/speed for the hardware described above?

We are currently managing 20 E-Shops with data for 2 months on average (~30 GB + 20 GB)
If higher scaling were desired, which of the hardware parameters would need to be increased?
RAM and HD storage
What else would be adjusted for higher scalability?
Which further configuration would be necessary?
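The storage figures given in the answers above allow a rough back-of-the-envelope estimate. The sketch below assumes that the two quoted volumes (30 GB and 20 GB) add up to the total stored data, that storage grows linearly with the number of shops and months, and that the whole 100 GB of VM disk is available for data. None of these assumptions is confirmed by this guide, so the result is indicative only.

```python
# Figures quoted in this guide; treating 30 GB + 20 GB as the total is an assumption.
shops = 20
months = 2
stored_gb = 30 + 20
disk_gb = 100  # VM hard drive space listed in the hardware section

# Assumed linear growth: storage per shop per month.
gb_per_shop_month = stored_gb / float(shops * months)


def months_until_full(n_shops, disk=disk_gb):
    """Months of data that fit on the disk for a given number of shops."""
    return disk / (gb_per_shop_month * n_shops)


print(gb_per_shop_month)      # 1.25
print(months_until_full(20))  # 4.0
print(months_until_full(40))  # 2.0
```

Under these assumptions, doubling the number of shops halves the retention period that fits on the same disk, which is consistent with HD storage being named as a parameter to increase.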

Contact Information Data Collector & Consolidator Service

José Manuel García Nieto, jnieto@lcc.uma.es, +34 951 952924