Data Collector Consolidator
Revision as of 10:20, 15 January 2016
Overview
Data collected from the different sources is translated to RDF for consolidation by means of mapping functions. Each data source considered in the scope of the SME E-COMPASS project (Google Analytics, Piwik and the Competitors' Data Collector) has a different method for collecting, gathering, and providing access to its analytical data. A separate set of mapping functions is therefore required to convert the information provided by each data source to RDF, according to the ontology. Each set of mappings is composed of functions that translate attributes and their values into the corresponding RDF triples.
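The mapping step described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the namespace `http://example.org/ecompass#`, the attribute names in `PIWIK_MAPPING`, and the helper names `literal` and `map_record` are all hypothetical, since this page does not give the ontology IRI or the real attribute list. The sketch takes one record from a data source and emits one N-Triples line per mapped attribute.

```python
from urllib.parse import quote

# Hypothetical namespace: the real E-COMPASS ontology IRI is not given on this page.
ONTOLOGY = "http://example.org/ecompass#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical mapping from source attribute names to ontology property names.
PIWIK_MAPPING = {
    "nb_visits": "numberOfVisits",
    "bounce_rate": "bounceRate",
}


def literal(value):
    """Serialize a Python value as an N-Triples literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def map_record(record_id, record, mapping):
    """Translate one source record into a list of N-Triples lines.

    Attributes without an entry in `mapping` are skipped.
    """
    subject = f"<{ONTOLOGY}record/{quote(str(record_id), safe='')}>"
    triples = []
    for attribute, value in record.items():
        prop = mapping.get(attribute)
        if prop is None:
            continue
        triples.append(f"{subject} <{ONTOLOGY}{prop}> {literal(value)} .")
    return triples
```

Under these assumptions, each data source would get its own mapping dictionary, while the serialization logic stays shared.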
The Data Collector and Consolidator consists of four main services:
- Mapping Functions
- REST API
- RDF Repository (Virtuoso OpenLink)
- Piwik Administrator Service (Piwik Open Analytics Platform)
Almost all of these services currently run on the same virtual machine, VM1. Only the Virtuoso RDF Repository runs on a separate machine (VM2). Nevertheless, each service can easily be installed and configured on a different machine to balance network traffic against resource requirements and achieve good load balancing.
Mapping functions collect data from digital footprints (Google Analytics, Piwik) and from the Competitors' Data Collector module, generate RDF data, and store it in the Virtuoso service. The REST API functions read processed information from the RDF Repository and return data in JSON format to the remaining modules of the E-Compass Data Mining Application.
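The read path described above (RDF Repository to JSON) can be sketched as follows. This is an illustration only, not the service's real REST code. The host name `vm2.example.org` is a placeholder, port 8890 and the `/sparql` path are Virtuoso's defaults and may differ in this deployment, and the function names are hypothetical. The sketch builds a SPARQL query URL and flattens a response in the W3C SPARQL 1.1 Query Results JSON format into a plain JSON array.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: Virtuoso serves SPARQL at /sparql on port 8890 by default.
# The host name is a placeholder, not the project's real VM2 address.
SPARQL_ENDPOINT = "http://vm2.example.org:8890/sparql"


def build_query_url(query, endpoint=SPARQL_ENDPOINT):
    """Build a GET URL that asks the endpoint for JSON-formatted results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{endpoint}?{urlencode(params)}"


def flatten_results(sparql_json):
    """Convert SPARQL JSON results into a JSON array of {variable: value} rows."""
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return json.dumps(rows)
```

In a real deployment, the URL would be fetched over HTTP and the parsed response passed to `flatten_results` before being returned to the calling module.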
Physical Hardware Characteristics
| Model: | HP ProLiant DL380 G5/HP ProLiant DL360 G5 |
| Processor: | 8 cores at 2 GHz / 8 cores at 2.5 GHz |
| RAM: | 32GB/40GB |
| Hard Drive Space: | 4TB Shared |
| Network Connection: | 10 Gbit Ethernet |
| Hypervisor used: | VMware ESX 6 with vSphere Center |
| Physical Load Balancing: | none |
Virtual Machine Hardware Specifications and Operating System
| Guest Operating System: | Linux CentOS 6 |
| Processor: | Dual 2-core Intel processor and Intel i7 with 5 cores |
| RAM: | 16 GB |
| Hard Drive Space VM: | 100 GB internal |
| Network Connection: | 10 Gbit/s Ethernet |
| Minimum required Network Connection: | no info available |
Service Environment and Set-up on VM1
To set up the API, download and install the following software:
| Software | Download |
|---|---|
| Apache HTTP Server 2.4 | http://httpd.apache.org/ |
| Apache Tomcat/7.0.32 | http://tomcat.apache.org/ |
| MySQL Server 5.6 (Community Edition) | https://dev.mysql.com/downloads/mysql/ |
Software Licenses
Please indicate whether a commercial provider would need to buy commercial licenses for any software used to operate the service and, if so, the approximate cost.
OpenLink Software Virtuoso Universal Server (used as the RDF repository in the E-Compass Data Mining Services) requires a software license, which is free of charge for academic use only. Running this software in production requires a commercial license. The licensing terms are available from OpenLink Software.
OS Environment Variables
…
Installation of Mappings Functions
…
Installation of REST API
…
Installation of RDF Repository
Download and install OpenLink Virtuoso: https://en.wikipedia.org/wiki/Virtuoso_Universal_Server
Installation of Piwik
Download and install Piwik on your own machine by following the 5-minute installation guide.
Service Configuration
Configuration script
availability / location
README / User Manual
availability / location
Configuration steps
…
Configuration of REST endpoints at:
Operation
Service startup procedure
…
Restarting the service
…
Service Logs
…
Recurring Manual Actions / Maintenance Tasks
…
Other
…
Limitations of the service
- With which parameters does the service scale? How many concurrent E-Shops, concurrent products, and users/E-Shop customers are possible without loss of quality or speed on the hardware described above?
  - We are currently managing 20 E-Shops with 2 months of data on average (~30 GB + 20 GB)
- If higher scalability were required, which hardware parameters would need to be increased?
  - RAM and HD storage
- What else would be adjusted for higher scalability?
  - …
- Which further configuration would be necessary?
  - …
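As a rough illustration of why HD storage is named as a scaling parameter, the figures above can be turned into a back-of-the-envelope estimate. This sketch rests on assumptions not stated in the source: that the two figures (~30 GB and 20 GB) add up to the total footprint of the 20 E-Shops, and that storage grows linearly with the number of shops and months of data. The function names are hypothetical.

```python
def storage_per_shop_month(total_gb, shops, months):
    """Average storage used by one E-Shop for one month of data (assumes linear growth)."""
    return total_gb / (shops * months)


def projected_storage_gb(shops, months, gb_per_shop_month):
    """Projected total storage for a given number of shops and months of data."""
    return shops * months * gb_per_shop_month


# Figures reported above: 20 E-Shops, 2 months of data on average, ~30 GB + 20 GB.
rate = storage_per_shop_month(total_gb=30 + 20, shops=20, months=2)

# Hypothetical target load: 40 E-Shops with 12 months of data each.
projection = projected_storage_gb(shops=40, months=12, gb_per_shop_month=rate)
```

Under these assumptions the rate is 1.25 GB per shop per month, and the hypothetical target load would need about 600 GB, well beyond the 100 GB allocated to each virtual machine.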
Contact Information Data Collector & Consolidator Service
José Manuel García Nieto, jnieto@lcc.uma.es, +34 951 952924