Thursday, 24 October 2013
We are in the era of big data – datasets so massive and complex that they defy simple analysis. However, the benefits of even small data appear to elude us still.
Though we spend considerable time and energy collecting data, often using taxpayers' money and State funds, what we end up doing is collecting rather than sharing, misusing rather than using, and hiding rather than opening the data we have. As a result, what we collect does not lead to better, more enlightened decision making.
We comfortably accept the data collected through our internal processes, yet a closer look may reveal cause for concern rather than comfort. Close scrutiny would bring to light the need for better verification and validation. That realisation may never come, however, if the collection itself is the main intended outcome.
A simple statistical compilation with a summary report often ends the story of data gathering. It is no secret that we engage in data collection fairly frequently, for a myriad of reasons; it is almost a favourite pastime, or a favourite way to pass the time.
I have heard, while sitting in a business school, of intelligence personnel who filled in questionnaires from home; lo and behold, the final result reasonably matched the data presented in advance. Extend that kind of behaviour to market research, drug assessments, fertiliser applications, yield assessments and evidence-based decision making in general, and your future position is anything but certain.
Where have all the tourists gone?
It is worth reading, perhaps for the second time, Srilal Miththapala's favourite question: Where have all the tourists gone? His analysis, carried out a couple of times, is instructive and shows how even a simple situation can get complicated when proper data is not available or the data used is based on poor definitions.
As he points out, the Immigration statistics define a tourist as a passenger who stays on the island for more than 24 hours, and the figures derived from them are treated as the final assessment. The shortcoming of this definition is obvious. The number of genuine foreign tourists has to be a subset of this figure, but how large is the difference? With some visitors opting for tourist visas simply because they involve less hassle, the totals alone will not tell us much.
We have planned for, and have high expectations of, tourism as a driver of economic growth, and counting tourists at every entry point has become a favourite activity of the day. We release month-on-month and year-on-year figures, and graphic artists produce overlaid curves demonstrating how well we are doing at attracting tourists to our country.
It is no secret that these data can have economic ripple effects: many have planned to expand rooms and space based on expected visitor growth, and others will queue up in anticipation of spill-over effects.
Expenditure and investment plans assume a particular type of tourist, pegged to a certain part of the price spectrum. You should not be planning 1,000-dollar-a-night rooms when, as Srilal notes, small unregistered 'mom and pop operations' are the order of the day. If the numbers are growing because of an influx from the lower economic segment, that is hardly encouraging news. What Srilal shows is that arrivals and typical room occupancies do not tally.
His analysis is perhaps not exactly in the Malcolm Gladwell style, but then there is no need to dig so deeply behind the data to study this situation. He presents a plausible scenario of what is happening on the ground. Still, more granularity in this picture would be quite useful, as it would allow important corrections to be applied.
One hopes that decision makers within the system have much more clarity than we, the public, appear to. With Srilal raising the same question twice, that seems to be wishful thinking. His write-up is a reminder to be more accurate with definitions, more analytical with data, and to pursue the data behind the data in order to get the picture right. In this day and age, that is not difficult.
Global Competitiveness Index
Turning again to country comparisons, the latest ranking of nations in the Global Competitiveness Index poses a new question about how data are generated and used. The Daily FT ran the story as a headline because Sri Lanka had moved up the ranking from last year. The paper headlined it 'Lanka nudges up in Global Competitive Index,' the country having climbed to 65th from 68th the previous year. It is not a significant jump but a rise nevertheless, although the overall score had not changed.
We accept such analyses and swim along with them; no counter view is usually offered. Nor do we usually look seriously behind the data, especially with 12 factors in play, to see why and how. However, a single data point for one sub-indicator prompted this writer to ask: can this be right?
On infrastructure, Sri Lanka has dropped to 73rd out of 148 countries from 62nd out of 144 countries last year. Judging by the investments made, and by the visual evidence, this should not be so: there is no question that infrastructure has received attention in Sri Lanka, and it is evident in several segments wherever you go, from north to south and from west to east.
Roads and highways, telecommunications and power have all received considerable attention. Sewage, wastewater and solid waste management, which are also important infrastructure needs for balanced development and decent living, have lagged behind. The question on this sub-indicator was prompted by what one can see: how can that be? A more detailed analytical examination would be interesting, and that is what one should be doing with all the other factors as well.
One way of ensuring a better assessment is to take charge of our data and ensure its validity, accuracy and transparency. Keeping data confined to mandatory annual reports, which rarely reach the public and often do not even reach the institutions' own members, will not help the cause. This was pointed out with respect to the Global Innovation Index in an earlier column. Hence, to derive benefit from GCI rankings, we ourselves should be much more aware of the data fields, work towards knowing our own situation well, and ensure that this knowledge is widely shared so that it can be captured from a distance.
Exploiting big data
We should also recognise the futility of closely guarding datasets generated by surveys. The era of big data draws on many different sources, and if access is allowed you can build profiles from the data contained within them that are far more detailed than anything a simple questionnaire could produce.
Consider the amount of data generated when one buys a SIM card for a mobile phone and then queues up to pay the monthly bill. Your name, birthday, profession and location are all waiting to be entered into data fields, and we happily share this information. The mobile company uses only some of these data, for billing and to direct the bill to the consumer, yet it is also aware of all our contacts and habits. It is similar with credit cards. These are examples of where big data currently resides, awaiting exploitation.
More understanding is needed to make use of the remaining data, but the potential is visible. The resource is valued in monetary terms, and these institutions may well have additional revenue-generating mechanisms at our expense. As the ability to dissect and use such datasets becomes available, we should move away from mundane data gathering towards smarter systems and, of course, better decision making. Note the Big Data Research and Development Initiative announced by the Obama administration, which explored how big data could be used to address important problems faced by the federal government.
Big or small, data is important. However, it is equally important that we use the right data. One should not be blinded by a single number; life has become a little too complex for any single number to do it justice. One must understand the process that generated the data point or dataset, and that understanding will benefit the subsequent analysis.
We have carried on using qualitative jargon instead of a more precise quantitative approach. Until now, this may have been a tacit acknowledgement of the lower reliability of our numbers. However, as some organisations become custodians of significant amounts of data that can be analysed, a whole new light is being thrown on many social and economic processes, and it is time to think big, with a big data mentality.
[The writer is Professor of Chemical and Process Engineering at the University of Moratuwa, Sri Lanka. After a BSc (Hons) in Chemical Engineering from Moratuwa, he went on to the University of Cambridge for his PhD. He is the Project Director of COSTI (Coordinating Secretariat for Science, Technology and Innovation), a newly established State entity with the mandate of coordinating and monitoring scientific affairs. He can be reached via email on [email protected]]