How to make a statistical model public from your computer

We all have a calculator in our pockets these days. However, when you need statistical calculations that go well beyond the functions of a scientific calculator, for example to predict a business phenomenon or its probability, such as the probability of converting a potential customer into a customer, you need a sort of statistical calculator accessible from the Internet. Such a tool can be used, for example, to improve the performance of salespeople.

 

Some consider moving to the cloud (that is, to someone else’s computer, calculator, or server) a pillar of digital transformation for SMEs. Personally, I don’t fully agree where small and medium-sized companies are concerned: running certain services “in house” brings them advantages, even if it may come with extra risks and costs.

I will take it for granted that you have read about when you may need a statistical model. If you have also listened to the podcast episode on the topic, what follows will be easier to understand.

 

A third-party calculator accessible from the Internet costs no less than $2.50/month. Here are some examples, with prices as of the date I wrote this article:

Render: under Services, then Web Services (our web calculator), the minimum is $7/month. In the past I have seen it at $4.

DigitalOcean: under Droplets, prices start at $4/month. Some insiders might tell you to use AWS Lambda (Amazon Web Services), but there is a decidedly expensive catch there, which is one of the reasons why I don’t like many big companies.

 

Hugging Face: under Inference Endpoints, a term already discussed, the price is $0.032 per hour of computation. That is generous, considering that a micro or small business is unlikely to have statistical calculators that take more than a minute to produce a result. Of course, the cost is multiplied by the number of times you run the calculation.

 

GitHub lets you host the calculator for free, but only through static pages. This works in special cases, and it is more convenient with Python than with R.

The solutions presented so far deal with both the calculator and its accessibility via the internet.
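To compare these prices, a rough back-of-the-envelope calculation can help. The sketch below, in Python, uses the prices quoted above and assumes, as a simplification, that the pay-per-use option is billed exactly in proportion to computation time. The workload (1,000 runs of 60 seconds each) and the function names are hypothetical.

```python
# Prices quoted in the article (USD per month or per hour); they may have changed.
RENDER_FLAT_MONTHLY = 7.0
DIGITALOCEAN_FLAT_MONTHLY = 4.0
HF_PRICE_PER_HOUR = 0.032


def pay_per_use_monthly_cost(runs_per_month, seconds_per_run,
                             price_per_hour=HF_PRICE_PER_HOUR):
    """Monthly cost, ASSUMING billing is exactly proportional to compute time."""
    hours = runs_per_month * seconds_per_run / 3600
    return hours * price_per_hour


def break_even_hours(flat_monthly, price_per_hour=HF_PRICE_PER_HOUR):
    """Hours of compute per month at which pay-per-use equals a flat plan."""
    return flat_monthly / price_per_hour


if __name__ == "__main__":
    # Hypothetical workload: 1,000 predictions per month, 60 seconds each.
    cost = pay_per_use_monthly_cost(runs_per_month=1000, seconds_per_run=60)
    print(f"Pay-per-use: ${cost:.2f}/month")
    print(f"Break-even vs DigitalOcean: "
          f"{break_even_hours(DIGITALOCEAN_FLAT_MONTHLY):.0f} h/month")
    print(f"Break-even vs Render: "
          f"{break_even_hours(RENDER_FLAT_MONTHLY):.2f} h/month")
```

With these hypothetical numbers the pay-per-use option comes to about $0.53/month, and it only catches up with the $4 flat plan at about 125 hours of computation per month.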

 

Suppose you want to use a business PC as your calculator, a choice that makes more sense from two points of view. In this case you will be able to access the calculator only in a limited way, that is, internally. If you need to, you can expose it to the Internet, e.g., for employees working remotely, using ngrok. You can then reach the calculator through the static link (URL) that ngrok offers for free. Many other services (localtunnel, Pinggy, etc.) only give you a dynamic link, unless you pay a price that I think is excessive, which would tip the choice back to the services above, as they are more comprehensive. If you want a branded link, with your company name, you need to pay ngrok.

If, on the other hand, you host your company site on a provider that allows you to use Cloudflare, you only need a few steps, at no cost, to get a customized link, e.g., calcatorestatistico.nomeazienda.it. In other words, Cloudflare lets you create subdomains (in this case calcatorestatistico) for your domain, i.e., your company’s site name.

 

Let’s take a step back, however: how do you get the calculator onto the business computer? Take the simplest case, a statistical calculator with a graphical, rather than programmatic, interface based on R’s shiny library, which is used to build data dashboards. We assume that developing it in Python gives fewer problems. In R, the “more academic than business” statistical language, after creating the shiny app in RStudio (in a file named app.R), you need to run the file from the command line (cmd on Windows). If you run it through RStudio (the most widely used workbench for R) as a background job and in the Viewer pane, it consumes more computational resources.

 

Instead, if from the command line (cmd) we execute

"C:\Program Files\R\R-4.0.3\bin\Rscript.exe" "C:\Users\[UserName]\Documents\ShinyAppLocal\app.R"

 

the RAM consumption (second column) is lower, because we do not go through RStudio.

We will have a command-line window open with our microservice running, which is annoying, especially if that company PC is actively used by an employee. The hassle could be avoided by using a Linux distribution instead of doing everything on Windows.

 

In any case, the microservice is accessible only locally. You need Cloudflare to complete the work. You can see an example (not a calculator) at https://shinytest.staticalmo.com/, which I used to keep up for 8 hours per day: we can also schedule when to run that command from cmd, using the Windows Task Scheduler and/or R libraries.
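Keeping a service up only for a fixed window each day can also be done from a small wrapper script. The following is a minimal sketch in Python, not the setup used for the example above: it starts a command, lets it run for a maximum number of seconds, then stops it. The function name run_for is hypothetical, and the demo uses a short Python process as a stand-in for the Rscript command.

```python
import subprocess
import sys
from typing import List, Optional


def run_for(command: List[str], max_seconds: float) -> Optional[int]:
    """Start `command` and stop it once `max_seconds` have elapsed.

    Returns the exit code if the process ended by itself, or None if it
    had to be terminated because the time window closed.
    """
    process = subprocess.Popen(command)
    try:
        return process.wait(timeout=max_seconds)
    except subprocess.TimeoutExpired:
        process.terminate()  # ask the process to stop
        process.wait()       # wait until it has actually exited
        return None


if __name__ == "__main__":
    # Stand-in for the real command: a process that would run for 5 seconds.
    demo = [sys.executable, "-c", "import time; time.sleep(5)"]
    # A window of 8 hours would be max_seconds=8 * 3600; 1 second keeps the demo short.
    print(run_for(demo, max_seconds=1))
```

A scheduler would still be needed to launch the wrapper at the start of each day; the wrapper only takes care of closing the window.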

 

At the very most, a service like this costs you €0.15/month in energy.
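The order of magnitude of such a figure can be checked with a simple formula: extra power draw, times hours of use, times the price of electricity. In the sketch below, the 2 W of extra draw attributed to the app and the price of €0.30 per kWh are illustrative assumptions, not the numbers behind the estimate above; only the 8 hours per day comes from the article.

```python
def monthly_energy_cost(extra_watts, hours_per_day, price_per_kwh, days=30):
    """Energy cost per month of a service that adds `extra_watts` to the PC's draw."""
    kwh = extra_watts / 1000 * hours_per_day * days
    return kwh * price_per_kwh


if __name__ == "__main__":
    # ASSUMED inputs: 2 W extra draw, EUR 0.30 per kWh. 8 h/day as in the article.
    cost = monthly_energy_cost(extra_watts=2, hours_per_day=8, price_per_kwh=0.30)
    print(f"Estimated energy cost: EUR {cost:.2f}/month")
```

Under these assumptions the result is in the range of a few tenths of a euro at most, so the conclusion does not change much even if the real inputs differ by a factor of two.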

In the more classic case, where you want a calculator that is used programmatically rather than graphically, i.e., via an API, the energy cost would obviously be lower.
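To make the API case concrete, here is a minimal sketch in Python using only the standard library. It exposes a conversion probability, the example mentioned at the start of the article, at a /predict endpoint. Everything specific is an assumption for illustration: the logistic model, its coefficients, the feature names visits and emails_opened, the endpoint name, and the port.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# HYPOTHETICAL coefficients of a logistic regression; a real model would be
# fitted on company data and loaded from disk.
INTERCEPT = -2.0
COEFFICIENTS = {"visits": 0.30, "emails_opened": 0.15}


def conversion_probability(features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    score = INTERCEPT + sum(
        weight * float(features.get(name, 0.0))
        for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


class PredictHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/predict":
            self.send_error(404, "Unknown endpoint")
            return
        # Keep the first value of each query parameter, e.g. ?visits=4
        query = {key: values[0] for key, values in parse_qs(url.query).items()}
        try:
            probability = conversion_probability(query)
        except ValueError:
            self.send_error(400, "Feature values must be numeric")
            return
        body = json.dumps({"probability": round(probability, 4)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main(port=8000):
    # Binds to localhost only: reachable from this PC until a tunnel is added.
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling main() starts the server; a request to http://127.0.0.1:8000/predict?visits=4&emails_opened=2 then returns a small JSON document with the probability. Like the shiny app, this is only reachable locally until it is exposed through a tunnel or Cloudflare.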

There are other ways to run the microservice on a company PC that does not have R installed, for example by packaging the code in a container.

 

Warning: exposing a statistical model publicly does not carry the same potential consequences as exposing a data dashboard or a database. Especially in the latter case, you are at great risk unless you take the precautions that are part of good practice.

 

I repeat: for medium-sized companies and above, this approach is limiting and/or inconvenient from various points of view.

 

Do you know your company needs a statistical model? Let’s discuss how to operationalize it in a free call, because in some cases the knowledge that comes from having a model is enough and you don’t need to set up this whole circus.

 
