Jordi Inglada
11 Nov 2019

Several tonnes of reasons not to go to IGARSS 2020

As it does every 10 years, IGARSS will take place in Hawaii in 2020. This time it won’t be in Honolulu, as in 2000 and 2010, but in Waikoloa, on the “Big Island”.

I went to Honolulu for the 2 previous events, and it would be nice to go there again, visit another place and meet colleagues and friends whom I rarely see outside this kind of gathering.

But the issue is that, without falling victim to solastalgia, I find it difficult to justify flying for about 50 hours for a conference. Like most of my colleagues, I have done it plenty of times. Thanks to IGARSS, since 1998 I have been to a lot of interesting places and met brilliant people from the remote sensing community. But I find it ironic that people who observe our planet from space and measure how its climate and biodiversity are going astray would not change their behaviour and reduce their impact.

Every IGARSS has a particular theme. Here are the ones for the previous 6 editions:

In 2020, the theme is Remote Sensing: Global Perspectives for Local Solutions.

One can see that the environment, our living planet, energy, etc. are among the main concerns of the community that attends these events. This is why the choice of a venue that, for most attendees, requires between 12 and 50 hours of air travel is questionable. Some may try to get there by other means, but Hawaii is at least a 6-hour flight (one way) for everybody.

Let’s do the math. If we assume greenhouse gas emissions of 1/4 tonne of CO2 equivalent per hour of flying, this is between 3 and 12.5 tonnes per person (knowing that, in order to stop climate change, 0.6 tonnes is the maximum amount of CO2 that a single person can generate in a year). Let’s assume an average of 7 tonnes. IGARSS 2019 in Yokohama had 2600 attendees. We can imagine that at least the same number of people would want to go to Hawaii, although one could argue that Hawaii may attract even more. The calculator says that 18200 tonnes of CO2 would be emitted just by flying to IGARSS, that is, the maximum amount that about 30,000 people can produce in a year if we want to stop climate change.
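Since this is simple arithmetic, it is easy to check. Here is a minimal Python sketch using only the figures quoted above (the 1/4 tonne per flight hour and the 7-tonne average are the stated assumptions):

    # Back-of-the-envelope check of the figures quoted above.
    tonnes_per_flight_hour = 0.25      # assumed emission factor
    yearly_budget = 0.6                # tonnes of CO2 per person per year

    per_person = [h * tonnes_per_flight_hour for h in (12, 50)]  # [3.0, 12.5]
    total = 2600 * 7                   # attendees x assumed average: 18200 tonnes
    people_equivalent = total / yearly_budget  # about 30333 yearly budgets

    print(per_person, total, round(people_equivalent))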

Of course, this back-of-the-envelope calculation may not be very accurate, but I think the orders of magnitude are right.

I can only speak for myself, but I don’t think that my contribution to Earth observation that could potentially be used to mitigate climate change and biodiversity degradation is worth the emissions.

Meeting the remote sensing community is useful for advancing science and technology, but there are other ways to do it. The GRSS has started a new initiative, as announced by its president:

[…] in 2020 we are starting three regional conferences held in locations far from the IGARSS flagship conference. The idea is to help communities that cannot come to IGARSS because of distance, but also because of economic issues or other barriers, and organise dedicated events.

Let’s hope that these events replace the trips to distant venues and do not simply add to them!

Tags: eco politics remote-sensing
23 May 2016

Sympathy for the Evil: let's help improve Google Earth Engine

Please allow me to introduce a couple of ideas which should help improve the user experience on the GEE platform. I know that Google, a company of wealth and taste, has an impressive record of providing services with outstanding features. They have the best search engine, the best web mail application and the best web browser[1].

But these services and tools are targeted at non-expert users. With GEE, Google is addressing a completely different audience: scientists, or should I say Scientists. These are clever people with PhDs! Therefore, in order to keep them satisfied, Google will have to make an extra effort. One could think that scientists can easily be fooled because, for instance, they agree to give away to private companies the results of research funded with taxpayer money[2]. Or because they accept being evaluated by how many times their tweets are liked[3]. Seeing scientists like this would be a mistake. They are very demanding users who only want to use the best tools[4].

But Google has the technology needed to attract these smarter-than-the-average users. Here are some ideas which could make GEE the best platform for producing impactful research using remote sensing data.

Executable papers

I think that it would be nice to introduce some literate programming facilities in the code editor, similar to what can be done with Emacs org-mode's Babel or with Knitr for the R programming language. This would make it possible to write scientific papers directly in the GEE editor and keep notes, formulas, code and charts together. Of course, exporting to Google Docs would also be very useful, so that results could be integrated into slides or spreadsheets.

The possibility of citing bibliographic references should also be integrated into the editor. I suppose that a Google Scholar search function would not be difficult to add. Oh, yes, and Google Books too, by the way. Actually, using the same technology Google uses to insert advertisements in search results or in Gmail, it would be possible to automatically suggest references based on what the user is writing.

In these suggestions, papers produced using GEE could come first, since they are better. Papers written by people in the author's Google contacts list could also be promoted: good friends cite friends, and the content of e-mails should help the algorithms determine whether contacts are collaborators or competitors. But let's trust Google to find the algorithm that will make the best suggestions.

Many software development environments have code completion. In the case of GEE, the technology[5] would be much more powerful, since all the code written by all scientists could be used to make suggestions. The same technology could be used to suggest completions for the text of the papers. We all know how boring it is to write the same "introduction" and "materials and methods" sections again and again. Google's algorithms could introduce some randomness and even compute a plagiarism score to help us make sure that we comply with scientific literature standards. Of course, the "Conclusions" section could be produced automatically from the results using Google's AI technology.

It would also be nice to have some kind of warning if the user were designing an experiment or a processing chain that somebody else had already carried out. A message like "this has already been done", together with a link to the corresponding paper, would be great. Automatic checking for patent infringement would also be useful. Again, Google has everything we need. In this case, the warning message could be "I can't let you do that, Dave".

Massive peer review

The executable paper written as described above could be made available through Google Plus as a pre-print. Actually, nobody would call it a "pre-print", but rather a paper in beta. All the people in the author's circles would be able to comment on it and, most importantly, give it a +1 as a warrant of scientific quality. This approach could quickly be replaced by a more reliable one: using deep learning (of course, what else?) applied to the training data base freely generated by GEE early adopters, Google could propose an unbiased system for paper review which would be much faster than the traditional peer review approach. The h-index could then be abandoned and replaced by the paper-rank metric.

Funding research

Thanks to GEE, doing remote sensing based science will become much cheaper. Universities and research centres won't need to buy expensive computers anymore: just one Chromebook per person will be enough. Actually, not even offices will be needed, since WiFi is free at Starbucks. Lab meetings can be cheaply replaced by Google Hangouts.

However, scientists will still need some funding, since they can't live on alphabet soup and coffee is still not free at Starbucks. Google has a grant programme for scientists, but this is somewhat old school: real people have to review proposals and, even worse, scientists have to spend time writing them.

Again, Google has the technology to help here: "AdSense is a free, simple way to earn money by placing ads on your website." Scientists who allowed ads on their papers could make some revenue.

Conclusion

I know that in this post I have given away many ideas which could be used to raise venture capital for a start-up that could make lots of money, but that would be really unfair, because none of this would be possible without:

  • Google Earth Engine
  • Gmail
  • Google Chrome
  • Google Docs
  • Google Scholar
  • Google Books
  • Google Patents
  • Google Plus
  • +1
  • Chromebook
  • Google Starbucks
  • Google Hangouts
  • AdSense
  • Google's Youtube

Don't forget that the mission statement of GEE is "developing and sharing new digital mapping technology to save the world". And anyway, section 4.3 of GEE Terms of Service says6:

Customer Feedback. If Customer provides Google Feedback about the Services, then Google may use that information without obligation to Customer, and Customer hereby irrevocably assigns to Google all right, title, and interest in that Feedback.

Footnotes:

Tags: remote-sensing free-software
12 May 2016

Is Google Earth Engine Evil?

More than for any other post in this blog, the usual disclaimer applies here.

Let's face it: what Google has implemented with the Earth Engine is very appealing, since it is the first solution for Earth Observation data exploitation which brings together all the open access EO data, the computing resources and the processing algorithms. This is the Remote Sensing Scientist's dream. Or is it?

Talks and posters at the ESA Living Planet Symposium this week show that an increasing number of people are using GEE to do science. One of the reasons put forward is the possibility of sharing the scripts, so that other people can reproduce the results. This is, in my opinion, an incorrect statement. Let's have a look at a definition of reproducible research:

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. —D. Donoho

One important term here is complete. When you use GEE, or any other non-free software like Matlab, even if you share your scripts, the core of the algorithms you are using is just a black box which can't be inspected. Actually, the case of GEE is even worse than that of non-free software running locally: Google could change the implementation of the algorithms and your scripts would yield different results without your being able to identify why. Do you remember "Climategate"? One of the main conclusions was:

… the reports called on the scientists to avoid any such allegations in the future by taking steps to regain public confidence in their work, for example by opening up access to their supporting data, processing methods and software, and by promptly honouring freedom of information requests.

During one of my presentations at the Living Planet Symposium, I decided to warn my fellow remote sensers about the issues with GEE and put up a slide with a provocative title. The room was packed with more than 200 people and somebody tweeted this:

So it seems I was able to get some attention, but a 2-minute slide summarised in a 140-character tweet is not the best medium to start this discussion.

As I said during my presentation, I fully understand why scientists are migrating towards GEE and I don't blame them. Actually, there is nobody to blame here. Not even Google. But, after many years of scientists using non-free software and publishing in non open access journals, we should take a step back and reflect together on how we want to do Earth Observation Science in a sustainable (how perennial is GEE?) and really open way.

What I was suggesting in the last 3 bullet points of my slide (which don't appear in the tweeted picture[1]) is that we should ask ESA, the European Commission and our national agencies to join efforts to implement the infrastructure where:

And this is much cheaper than launching a satellite.

This is not to criticise what the agencies are doing. ESA's Thematic Exploitation Platforms are a good start. CNES is developing PEPS and Theia, which together are a very nice step forward. But I think that a joint effort driven by users' needs coming from the EO Science community would help. So let's speak up and proceed in a constructive way.

Footnotes:

[1] By the way, the slides are available here with a Creative Commons license.

Tags: remote-sensing free-software
03 Jan 2016

Why feature standardisation can improve classification results

In a previous post, we saw when and why feature normalisation may be needed before training a supervised classifier. The main point of that post was that distance-based classifiers need to operate on features which have similar dynamic ranges.

One thing we didn't discuss is why things often work better when the normalisation is done towards the [0,1] or [-1,1] interval rather than, for instance, towards [0,100].

If you have used the SVM classifier, even with a linear kernel, you may have noticed that standardisation yields faster learning times and improved classification accuracy. Why is this the case?

SVM training usually relies on optimisers (solvers) which are complex machines, so there may be several reasons for this behaviour, but one of them is the representation of floating point numbers in the computer. This representation is defined by the IEEE 754 standard. In a nutshell, it gives different precisions to different ranges of values: the closer numbers are to 0, the higher the absolute precision with which they are represented.

For a longer explanation, have a look at John Farrier's Demystifying Floating Point talk at CppCon 2015. Farrier uses a visual example borrowed from an MSDN blog post that I am reproducing here:

This plot shows that the absolute error with which a value is represented in the computer grows with the magnitude of the value itself (the gap between consecutive representable numbers doubles at each power of 2). It is therefore easy to understand that this representation error will make things difficult for optimisers, even when the cost function is quadratic, smooth and well conditioned.
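This is easy to verify numerically. NumPy's spacing function returns the gap between a value and the next representable double, i.e. the best absolute precision available at that magnitude (the sample values below are arbitrary):

    import numpy as np

    # Gap between x and the next representable float64, for growing magnitudes.
    for x in [1e0, 1e4, 1e8, 1e12, 1e16]:
        print(f"value={x:.0e}  spacing={np.spacing(x):.3e}")

    # value=1e+00  spacing=2.220e-16
    # value=1e+04  spacing=1.819e-12
    # value=1e+08  spacing=1.490e-08
    # value=1e+12  spacing=1.221e-04
    # value=1e+16  spacing=2.000e+00

With features in the thousands or millions, the dot products and gradients inside a solver are therefore computed with absolute errors many orders of magnitude larger than with features in [-1,1].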

Therefore, in order to make sure that the optimisation procedure benefits from accurate computations, it is useful to rescale feature values so that they lie close to zero.

Classification algorithms which do not optimise a cost function can also benefit from this rescaling: KNN classifiers, Self-Organising Maps and many other algorithms that use distances to select and sort will produce more accurate results.
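As an illustration, here is a small scikit-learn sketch (not from the original post; the synthetic data set and the classifier settings are arbitrary) comparing a linear SVM trained on raw features with one trained on standardised features:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Synthetic features, with dynamic ranges spread from 1 to 1e6.
    X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
    X *= np.logspace(0, 6, X.shape[1])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    raw = SVC(kernel="linear").fit(X_tr, y_tr)
    std = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X_tr, y_tr)

    print("raw features:         ", raw.score(X_te, y_te))
    print("standardised features:", std.score(X_te, y_te))

On this kind of data the standardised pipeline typically trains noticeably faster and scores better, which is exactly the behaviour described above.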

On the other hand, if you use a tree-based classifier with the Gini purity index (like the canonical Random Forest implementation), rescaling is not needed, since the purity is computed over fractions, which are already small numbers.

However, bear in mind that some tree-based classifiers use entropy (information gain) or other purity measures, like variance reduction, which involve computations that may benefit from the increased precision obtained by the rescaling.

As a rule of thumb, in case of doubt, rescaling data will do no harm. If you use the ORFEO Toolbox (and why wouldn't you?), the TrainImagesClassifier and ImageClassifier applications accept a statistics file containing the mean and standard deviation of the features, so that the samples can be standardised. This statistics file can be produced from your feature image by the ComputeImagesStatistics application. You can therefore easily compare the results with and without rescaling and decide what works best in your case.
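For reference, the whole chain can be scripted through OTB's Python bindings. This is only a sketch: the file names are placeholders and the parameter keys are those of recent OTB releases, so check them against your installed version:

    import otbApplication as otb

    # 1. Per-band mean and standard deviation of the feature image.
    stats = otb.Registry.CreateApplication("ComputeImagesStatistics")
    stats.SetParameterStringList("il", ["features.tif"])  # placeholder name
    stats.SetParameterString("out", "stats.xml")
    stats.ExecuteAndWriteOutput()

    # 2. Training on standardised samples, via the statistics file.
    train = otb.Registry.CreateApplication("TrainImagesClassifier")
    train.SetParameterStringList("io.il", ["features.tif"])
    train.SetParameterStringList("io.vd", ["training.shp"])  # labelled polygons
    train.SetParameterString("io.imstat", "stats.xml")  # enables standardisation
    train.SetParameterString("classifier", "libsvm")
    train.SetParameterString("io.out", "model.txt")
    train.ExecuteAndWriteOutput()

Omitting the io.imstat parameter gives the non-standardised run, so the comparison suggested above is a two-line change.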

Tags: remote-sensing research programming algorithms
03 Jun 2015

Management, elbow-deep in grease

Someone pointed out to me the other day that Crawford's remarks on "ghost jobs" were contemptuous, and that he turns his defence of manual trades into an attack on management and administrative work.

That is not what I felt when reading Crawford's book, and it may simply be a lack of context in my 2 posts on the subject. When Crawford criticises work that has no tangible output, he draws mainly on his experience as a writer of abstracts of scientific articles, a task he experienced as something with no meaning and no real usefulness. He is therefore very negative about this type of activity, but I understood his criticism as an explanation of the meaninglessness an individual can experience (the individual being its victim), not as contempt for the person performing the task.

I do not think Crawford's criticism should be read as an easy way of bashing bosses who profit from the work of their subordinates. In any case, it would be truly naive to think that all the positions in a company that are not directly tied to production are "ghostly". Management positions are often needed, not only for reasons of authority or decision making, but above all so that someone can keep an overall view of activities involving many different contributions and parties. It is often crucial to spot opportunities for collaboration, detect useless duplication, and so on. Or, to put it in the LQR, to develop synergies.

This may not be necessary in a motorcycle repair shop where 4 or 5 people work, but designing one of the motorcycles repaired there takes a few dozen, or even a few hundred, people, each one a specialist in very different technologies. Without individuals capable of weaving links between these different activities, it is impossible to get results efficiently.

I think Crawford's book has 2 main messages:

Tags: fr books ideas philo eco life
Other posts
Creative Commons License
jordiinglada.net by Jordi Inglada is licensed under a Creative Commons Attribution-ShareAlike 4.0 Unported License. RSS.