Cost estimate

From E-COMPASS_Info_Guide

For the future owner of the E-Compass Data Mining Services it is essential to be aware of both the operational and the migration cost. Migrating the services to a new data center essentially requires deploying a few virtual machine images, which are available in the OVA (Open Virtualization Archive) format, plus minor adjustments: editing the service location URIs and running the available service configuration scripts. The migration cost is therefore negligible compared to the operational cost, both financially and in terms of the time required. The complete migration of the services should take one or two days.

This leaves us with the estimation of the operational cost. In order to estimate this we need the following:

Virtual Machine Specification for all service modules

  • provision cost in data center (the cost estimate should include energy consumption and hardware maintenance, redundancy and backup as needed)
  • all-inclusive public cloud offerings give a fairly good estimate of the provision cost, provided that the requirements in terms of CPUs/cores, RAM, storage and network capacity are known

Maintenance efforts for each service module

  • estimate of the overall effort to maintain the system
  • staff required for service maintenance

Helpdesk / user support costs

  • Experience from user testing: how much effort is needed to get the services set up for each E-Shop?

Limitations

To dimension a production environment correctly, and therefore to get the cost estimate right, we need to derive the limitations of the current implementation of the different services. This requires knowing all parameters that influence the scaling of the different service modules. These include:

  • current load on VM
  • number of E-Shop users
  • number of products monitored for each user

This information then allows us to derive:

  • scaling limits for these VM specifications
  • VM cost depending on the parameters relevant for scaling
  • general provision cost per user, per product or per other parameter, such as the number of rules defined for the notification & actions engine

General Scaling Limits

For proper operation of the E-Compass Data Mining Services it is also helpful to be aware of some general scaling limits. The question here is how far the services can be scaled simply by adding hardware capacity. Depending on the software implementation, scaling beyond that point may also require further parallelization of the source code. All these issues need to be considered in order to arrive at an appropriate cost estimate for production operation of the services.

Hardware Characteristics in Test-Phase

The following table gives an overview of the hardware characteristics used in the current implementations of the services, and therefore of the total virtual hardware used for all E-Compass Data Mining Services:

| Service Module | ECC | DCC & DA | Virtuoso | CDC VM1 | CDC VM2 | NAE | Total |
|---|---|---|---|---|---|---|---|
| Guest Operating System | CentOS 6.6 - x86_64 | CentOS 6.6 - x86_64 | CentOS 6.6 - x86_64 | Windows 8 Enterprise, 64 Bit | Windows 8 Enterprise, 64 Bit | Windows 8 Enterprise, 64 Bit | 64 bit Linux and Windows |
| Processor | 2 Processors, 4 Cores 2.20GHz | Dual 2 core Intel processor | i7 intel 5 cores | 2 Processors, 2.27GHz, 4 cores | 2 Processors, 2.27GHz, 8 cores | 4 Cores | 10 Processors / 27 Cores |
| RAM | 8 GB | 16 GB | 16 GB | 8 GB | 32 GB | 3 GB | 83 GB |
| Hard Drive Space VM | 68,36 GB (SAN/NAS configuration) | 100 GB internal | 100 GB internal | 50 GB | 533 GB | 44 GB | ca. 900 GB |
| Network Connection | 1 Gbit/s | 1 Gbit/s Ethernet | 1 Gbit/s Ethernet | 10 Gbit/s | 10 Gbit/s | 10 Gbit/s | 1 Gbit/s |
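
The totals in the last column can be cross-checked with a few lines of Python. This is only a sketch: the per-VM figures are copied from the table, "Dual 2 core" (DCC & DA) is read as 2 processors with 2 cores in total, the Virtuoso and NAE entries are read as 1 processor each, and the decimal comma in 68,36 GB is read as 68.36 GB. These readings are assumptions; they are the ones that reproduce the stated 10 processors and 27 cores.

```python
# Per-VM figures copied from the hardware table.
# Assumptions (not stated in the source): "Dual 2 core" = 2 processors, 2 cores
# in total; Virtuoso and NAE each run on 1 processor; "68,36 GB" = 68.36 GB.
VMS = {
    "ECC":      {"processors": 2, "cores": 4, "ram_gb": 8,  "disk_gb": 68.36},
    "DCC & DA": {"processors": 2, "cores": 2, "ram_gb": 16, "disk_gb": 100},
    "Virtuoso": {"processors": 1, "cores": 5, "ram_gb": 16, "disk_gb": 100},
    "CDC VM1":  {"processors": 2, "cores": 4, "ram_gb": 8,  "disk_gb": 50},
    "CDC VM2":  {"processors": 2, "cores": 8, "ram_gb": 32, "disk_gb": 533},
    "NAE":      {"processors": 1, "cores": 4, "ram_gb": 3,  "disk_gb": 44},
}


def totals(vms):
    """Sum every resource column over all virtual machines."""
    keys = ("processors", "cores", "ram_gb", "disk_gb")
    return {key: sum(vm[key] for vm in vms.values()) for key in keys}


if __name__ == "__main__":
    for resource, value in totals(VMS).items():
        print(f"{resource}: {value:g}")
```

Under these readings the disk sum comes to just under 900 GB, consistent with the "ca. 900 GB" in the table.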

Scaling Limits during Test-Phase

The parameters relevant for scaling are described in the following table. The term "not in sight" means that during the test phase the available hardware was not even close to its limits, so this parameter should not be critical for scaling.

| Service Module | ECC | DCC & DA | Virtuoso | CDC VM1 | CDC VM2 | NAE |
|---|---|---|---|---|---|---|
| Scales with | E-Shops | Users | Users | Competitors monitored | Competitors monitored | Number of rules and total number of users |
| Limit Processors / Cores | not in sight | not in sight | not in sight | not in sight | 10 Competitors | 3000 rules / limit of users depends on user behaviour |
| Limit RAM | not in sight | ? | ? | not in sight | 40 Competitors | 3000 rules / limit of users depends on user behaviour |
| Limit HDD | not in sight | 60 E-Shops with data for 2 months | 100 E-Shops with data for 2 months | not in sight | not in sight | 3000 rules / limit of users depends on user behaviour |
| Limit Network | if there is a limit at all, it is the network, unless the communication is redesigned and all service modules write to a distributed database instead of communicating via REST | not in sight | not in sight | not in sight | not in sight | not in sight |
| Necessary Actions for Scaling | possibly changing the communication between ECC and the other service modules | more Storage | more Storage | none | more CPU, RAM & Parallelization | more CPU, RAM & Parallelization |
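
The HDD limits in the table can be turned into a rough per-E-Shop storage figure. The sketch below assumes (this is not stated in the source) that these limits correspond to the 100 GB internal disks of the DCC & DA and Virtuoso VMs filling up, and that the data volume grows linearly with the number of E-Shops and months. The function names are illustrative only.

```python
def storage_per_shop_month_gb(disk_gb, shop_limit, months):
    """Disk space consumed per E-Shop and month, assuming linear growth."""
    return disk_gb / (shop_limit * months)


def required_storage_gb(shops, months, per_shop_month_gb):
    """Extrapolate the disk space needed for a given number of E-Shops."""
    return shops * months * per_shop_month_gb


# Assumption: the HDD limit is reached when the 100 GB internal disk is full.
DCC_DA_RATE = storage_per_shop_month_gb(100, 60, 2)     # about 0.83 GB
VIRTUOSO_RATE = storage_per_shop_month_gb(100, 100, 2)  # 0.5 GB

if __name__ == "__main__":
    for name, rate in (("DCC & DA", DCC_DA_RATE), ("Virtuoso", VIRTUOSO_RATE)):
        need = required_storage_gb(shops=100, months=12, per_shop_month_gb=rate)
        print(f"{name}: {rate:.2f} GB per E-Shop and month, "
              f"{need:.0f} GB for 100 E-Shops over 12 months")
```

Such an extrapolation only indicates an order of magnitude; actual growth depends on how much data each E-Shop produces and on how long it is retained.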

Infrastructure Cost Estimate

During the test phase the total required storage was below 1 TB. The cost for 1 TB of storage in Amazon's S3 would be around 30€ per month. As the following paragraphs show, the storage cost is therefore negligible in comparison to the required server infrastructure.

Taking the infrastructure-as-a-service offerings and pricing of Amazon Web Services as an example of a possible data center provider, the optimal server package for the current dimensioning of the E-Compass Data Mining Services (including some scaling reserves) would be an M4 instance with a sufficient number of CPUs/cores and fairly large RAM. The m4.10xlarge instance, with 160 GB of memory and 40 vCPUs on a 64-bit platform, would cost approximately 1.000€ per month, including the complete data center operation with all additional costs such as energy consumption and maintenance.

Given that the system handled 10 concurrent E-Shops over the testing period without reaching the scaling limits, this amounts to a maximum of 100€ per E-Shop per month on the infrastructure side.

The third factor relevant for the infrastructure cost estimate is the network throughput. For high-bandwidth network connections the cost scales with the throughput, not with the bandwidth provided. For 10 E-Shops a realistic estimate of the network throughput for the most intensive module, the Competitors Data Collector, would be at most 2 TB per month, which would cost in the range of 180€. This adds an average of 18€ per E-Shop and month.
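
The per-E-Shop infrastructure figures follow from dividing the monthly costs by the 10 E-Shops of the test phase. A minimal sketch of that arithmetic is given below. The instance and network figures are taken from the text; the per-E-Shop storage share is not stated in the source and is derived here under the assumption that the roughly 1 TB of storage is shared by the same 10 E-Shops.

```python
# Monthly figures in EUR, taken from the text.
INSTANCE_EUR = 1000.0  # m4.10xlarge, approximate
NETWORK_EUR = 180.0    # up to 2 TB throughput for 10 E-Shops
STORAGE_EUR = 30.0     # about 1 TB in S3
E_SHOPS = 10           # concurrent E-Shops during the test phase


def per_shop(monthly_cost_eur, shops=E_SHOPS):
    """Spread a monthly cost evenly over all E-Shops."""
    return monthly_cost_eur / shops


if __name__ == "__main__":
    print(f"server:  {per_shop(INSTANCE_EUR):.0f} EUR per E-Shop and month")
    print(f"network: {per_shop(NETWORK_EUR):.0f} EUR per E-Shop and month")
    # Assumption: storage is shared by the same 10 E-Shops.
    print(f"storage: {per_shop(STORAGE_EUR):.0f} EUR per E-Shop and month")
```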

Helpdesk and Maintenance Cost Estimate

To arrive at the full cost of operation, additional personnel costs have to be considered: the maintenance cost on the software/service side and the cost of helpdesk operation (helping E-Shop owners with configuration and bug fixing). To keep these costs in a reasonable range, 9-to-5 support is proposed with 2 employees (redundancy for illness and vacation), each with an average monthly income of 3.000€. The cost to the employer is estimated at 4.500€ per month per employee, which comes to 9.000€ per month. Depending on the wage level in different countries, the actual costs may vary significantly.

Relation between Infrastructure and Helpdesk/Maintenance Cost

With this estimate, the costs for support and software maintenance are of a completely different order of magnitude than the infrastructure cost (including infrastructure maintenance). To bring them to a comparable level, the infrastructure could be scaled up to cope with 100 E-Shops instead of 10. Judging from the experience of the testing phase, the support personnel in the above estimate should be able to handle software maintenance and user support for 100 E-Shops.

Conclusion

In conclusion, this adds up to a total cost of approximately 220€ per E-Shop and month, including the complete infrastructure (CPU + RAM + storage + network throughput + energy + operations), infrastructure maintenance, software maintenance and user (i.e. E-Shop owner) support.
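
The figures from the previous sections can be added up as a rough cross-check. The sketch below assumes that the per-E-Shop server and network costs stay at their 10-E-Shop values (100€ and 18€) when scaling up, that storage adds about 3€ per E-Shop (30€ shared by 10 E-Shops), and that the two support employees are shared by 100 E-Shops. This breakdown is a reading of the text, not one given in the source. Under these assumptions the sum is about 210€, in the same range as the approximate figure quoted above.

```python
def total_cost_per_shop(
    server_eur=100.0,       # maximum per E-Shop, from the test phase
    network_eur=18.0,       # average per E-Shop
    storage_eur=3.0,        # assumption: 30 EUR shared by 10 E-Shops
    staff=2,                # support employees
    employer_cost_eur=4500.0,
    supported_shops=100,    # E-Shops served by the support team
):
    """Monthly cost per E-Shop in EUR: infrastructure plus support share."""
    support_eur = staff * employer_cost_eur / supported_shops
    return server_eur + network_eur + storage_eur + support_eur


if __name__ == "__main__":
    print(f"{total_cost_per_shop():.0f} EUR per E-Shop and month")
    # With only 10 E-Shops the support share dominates the total.
    print(f"{total_cost_per_shop(supported_shops=10):.0f} EUR per E-Shop and month")
```

The second call illustrates why the estimate scales the service to 100 E-Shops: with only 10 E-Shops, the support share alone would be 900€ per E-Shop and month.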

As mentioned at the beginning, the migration cost is negligible compared to the cost of operation, as the migration effort will be very small. Even if scaling to around 100 concurrent E-Shops requires further parallelisation of program code (for the Competitors Data Collector and the Notification & Actions Engine), the development cost should be below the cost of one month of operations. The figure of 220€ per E-Shop and month should therefore be the maximum cost to consider for future production operation.