Tags: academic, philosophical
Many years back, when I had just moved to Montreal (a great city to meet some of the most fascinating minds in the world, a topic for another blog post), I met Dr. Costopoulos from the Department of Anthropology at McGill University. He mentioned some work he and his students had been doing simulating animal decision making using ice core data.
The premise is simple, particularly coming from a machine learning, optimization and genetic algorithms background: in an environment punctuated by slow, progressive changes followed by cataclysmic changes in the opposite direction, individuals that track the environment better will overfit (and die). The more inaccurate individuals will be the ones surviving long term.
I finally tracked down the paper behind this research: Xue JZ, Costopoulos A, Guichard F (2011) Choosing Fitness-Enhancing Innovations Can Be Detrimental under Fluctuating Environments. PLoS ONE 6(11): e26770. While it explicitly states that its assumptions are too simplistic for human populations, the theoretical ground is sound: the best decision makers among human populations should have been wiped out during the rapid deglaciation periods (think woolly mammoth), some of them as recent as 20,000 years ago. This of course has many interesting implications, some of which I will discuss next.
Tags: business, academic, philosophical
I got asked my thoughts about joining a startup as a technical co-founder straight out of an undergraduate degree in Computer Science. Even though I appreciate how little saying "do not do it" contributes, in this case I feel it is more of a "do not do it... yet". The key reason is that the interpersonal aspects of technical work have not truly been exercised in college, plus you would miss out on the simplified straight-out-of-college hiring process. This is a very opinionated post. If successful, any random reader should disagree with plenty of its content.
First, software development is an inherently communal task, a fact that is usually missed with the technical focus of academic instruction. Even though many courses offer team assignments and projects, that is a far cry from programming in a team, particularly with seasoned developers. There is plenty to be learned from these people that you will miss out on by jumping to create your own company right away: lessons that you can then apply when working with a team of your own employees.
In the same vein, a co-founder role will involve management duties at some point. Trying to manage people without having been exposed to any management whatsoever seems quite difficult. You might be able to do it, but management is something where following the example of a previous manager in your life can be very positive.
Regarding technical skills themselves, starting a project from scratch, choosing the full stack yourself and having absolute control of the technical decisions would be the most appealing reason to be the tech co-founder in a startup. No question there. But at the technical level you will become a jack-of-all-trades type of person who does not necessarily have very deep knowledge of any of the technologies involved. Now, most startups fail, so what is your plan B? Will the experience you gain doing this help you advance your professional career? From what I have seen, it pays off to specialize deeply in a technology (when that technology is of interest to the market, that is). But hey, I went on to do a PhD so that's what I know.
Now, doing something or not doing it revolves around the opportunity cost. Time is unidimensional. If you spend your time on the startup, you will not spend it doing other things. So what is it you would miss by going for the startup? The excellent straight-out-of-college hiring process. When you are being hired by a company, there is a process (that many people are trying to improve, I contributed to a now defunct startup in that space a few years ago) but it still boils down to keyword matching. They are looking for a person with knowledge of technology "Jabberwocky" and if your previous work experience does not include "Jabberwocky", you are out of luck. It is simpler to hire somebody who knows that technology than to let you train on the job. But the process when coming straight out of college is different and it is based on target schools and GPAs. Therefore, after one or two years in your startup, you will need to job hunt based on the technology stack you used in the startup. If you tried to inflate it (including "Jabberwocky" when you didn't really need it), you were doing your startup a disservice and that could partially be a reason for its failure. But if you use straightforward, less fancy technology, you might have a hard time marketing your skills.
Now, the main reason not to go is... because you are asking. There are things in life where hesitation is a big negative sign (going to graduate school to pursue a PhD and getting married come to mind). From what I have seen, entrepreneurship is a personality trait. A true entrepreneur would have started a couple of ventures during their undergraduate years, because if any of them had truly panned out, there would have been no reason to get a degree. If you are not an entrepreneur at heart, but the technical self-determination that comes with being a technical co-founder attracts you, gain more technical skills to maximize the chance of success and earn some money while you wait for the right business co-founder. With a stable job you can judge the feasibility of startup ideas "on a full stomach" (whether that is good or bad is debatable, but for technical co-founders, I do believe it is a good mindset). If you read all this and wholeheartedly disagree with it, good luck in your new venture! Courage goes a long way.
Tags: floss, debian
I have been looking for a self-hosted alternative to commercial "cloud" products for a while. I initially started using rsync but it had the problem that you need to remember the directionality of the updates: when a file is deleted in one copy, there is not enough information left to know whether the file is a new file in the other copy or whether the deletion was a purposeful act on behalf of the user. Therefore it is necessary to indicate the directionality of the deletions, which is error prone. And some updates might have happened on both sides.
Looking for alternatives I peered into OwnCloud, even though a PHP implementation was not really my cup of tea. I found the project has rejected being packaged by Debian (a major red flag for me, as I trust the Debian security team to keep old versions securely patched on my personal servers) and that the project has been forked. So no OwnCloud for me.
Given my wishlist of a tight Debian GNU/Linux and Android integration, I looked into a few other alternatives (Syncthing and dvcs-autosync come to mind) but decided to settle on sparkleshare as I thought it had a version on the F-Droid FOSS Android Apps Repository. I had heard from some friends using it that it has its glitches but that overall it is just git underneath, so you can use it or repair it as you see fit outside of sparkleshare. This has been the case so far and it ended up being a key feature for its interoperability with Android.
I thus set up a remote account for the share, with git installed on the remote server and nothing else (there is no need to install sparkleshare on the server, even though the instructions for using the sparkleshare App seem to indicate so). Now, the existing App only allows you to download files on demand and you cannot modify files. It is thus non-operational as a shared folder. As sparkleshare simply maintains a git repo automatically, I am accessing it through SGit, an incredibly resourceful git client for Android.
My workflow is then as follows: on my desktop and on my laptop, I use sparkleshare (one on Debian testing, the other on Debian stable, both interoperate just fine so far). These copies receive plenty of changes and are kept up-to-date just fine. Then on my phone and tablet I use SGit to pull the updates. In the few situations when I need to modify a file (usually a text file with notes on a paper I'm reading or writing), I use a text editor and then I have to go through the slightly clumsy steps of: adding the file to stage, committing the changes (making sure the checkbox "Auto stage modified files", which is checked by default, is unchecked, as that takes forever on a mobile device) and then pushing the changes (which requires clicking on "origin", the interface is slightly confusing there). This workflow makes sense for a regular user of git, which is my case.
Now for the bad news: the maximum file size that can be unpacked on the Android version I have seems to be capped at 64 MB (or slightly less). This is really a letdown, but it seems to be a limitation that goes beyond SGit internals and it might be related to the zlib version shipped with a particular Android version. That is not a show stopper but requires a little too much vigilance for my taste (if I drop a file on the sparkleshare folder, it will immediately pollute the git repo with a large blob if the file is over 64 MB and the repo will need to be recovered or restarted afresh). On the good news front, if that happens and the repo gets borked, setting up a new one is very simple. And as the repo gets bigger and bigger (it stores all files and their history), resetting it by changing to a fresh one from time to time is necessary anyway.
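The vigilance can be partly automated. Below is a minimal sketch in Python, assuming a check is run by hand over a folder before its contents are dropped into the share. The function name `oversized_files` and the exact threshold are hypothetical (the cap above is only known to be "64 MB or slightly less"), and nothing here is part of sparkleshare or SGit.

```python
import os

# Assumed threshold: the observed cap is "64 MB or slightly less",
# so a real check may want a somewhat lower value.
LIMIT_BYTES = 64 * 1024 * 1024

def oversized_files(root, limit=LIMIT_BYTES):
    """Return sorted (relative_path, size) pairs for files of at least `limit` bytes."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own storage; only the working tree matters here.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):  # e.g. a broken symlink
                continue
            size = os.path.getsize(path)
            if size >= limit:
                hits.append((os.path.relpath(path, root), size))
    return sorted(hits)
```

Calling `oversized_files("/path/to/folder")` on a staging folder (the path is a placeholder) returns an empty list when everything is safe to move into the share.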
I have been using this solution for a few months and I am quite happy, but would consider switching to something handling bigger files, limited history and the same binary on the Debian and Android sides. I would expect that will take some time to come around, though.
Tags: academic, philosophical
I have been invited to write a book chapter on lexical choice for translators (contact me if you want to see a preprint). To get acquainted with this audience, which is different from my usual computer science one, I read a few papers on professional translators' use of technology. Two of them are quite interesting and I recommend them not only because they make for a good read but also because they have implications outside translation: Translation Skill-sets in a Machine-translation Age by Anthony Pym (2013) and Is Machine Translation Post-editing Worth the Effort?: A Survey of Research into Post-editing and Effort by Maarit Koponen (2016). This search finished with reading a short ebook by researchers at the MIT Center for Digital Business titled Race Against the Machine: How the Digital Revolution Is Accelerating Innovation, Driving Productivity, and Irreversibly Transforming Employment and the Economy. In that book and in the papers there is a call for humans, if we want to remain employed, to hybridize our work and to seek out ways to work with the computer in some sort of partnership. That process is clear in human translation: checking previously translated similar sentences or the output of machine translation (instead of creating new translations from scratch).
The question is then, what about our trade? What does it mean to be working in a partnership with the computer rather than for the computer? Like other people, I have argued that machine learning (more specifically supervised learning) is akin to traditional programming (in the old soft computing style). It shares many of the pros and cons of the redefined labor of the human translators.
But that's not all. Other areas of programmer / computer partnership that are less deployed (but nonetheless quite explored in the scientific space) are declarative programming techniques for both program verification and program synthesis using automatic theorem provers. The idea here is that instead of writing test cases you write test case generators and property checkers for the output of your programs over those generated test cases. I have experience with the Haskell library QuickCheck2 and it's quite pleasant to use (thanks to http://www.cs.mcgill.ca/~fferre8/ for teaching me how to use that library, gracias che!). There are now similar libraries for other programming languages. How can this be described as a programmer / computer partnership, you might ask? In the end it is just another test framework. The difference is in the type of task the human is doing (enunciating properties) and the computer is doing (the grunt work of checking said properties). Traditional unit testing has much more grunt work on the side of the programmer.
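To make the division of labor concrete, here is a minimal sketch of the idea in plain Python. It is a hand-rolled illustration, not QuickCheck's API: the names `random_int_list`, `sort_property`, `check` and `dedup_sort` are all made up for this example. The human states one property ("sorting returns the same elements, in order") and the computer grinds through generated cases looking for a counterexample.

```python
import random
from collections import Counter

def random_int_list(rng, max_len=20, lo=-10, hi=10):
    """Test case generator: a random list of small integers."""
    return [rng.randint(lo, hi) for _ in range(rng.randint(0, max_len))]

def sort_property(sort_fn):
    """Build the property 'sort_fn returns the same elements, in order'."""
    def prop(xs):
        ys = sort_fn(list(xs))
        ordered = all(a <= b for a, b in zip(ys, ys[1:]))
        same_elements = Counter(xs) == Counter(ys)
        return ordered and same_elements
    return prop

def check(prop, gen, trials=200, seed=0):
    """Run prop on generated cases; return the first counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = gen(rng)
        if not prop(case):
            return case
    return None

def dedup_sort(xs):
    """Deliberately wrong sort: silently drops duplicates."""
    return sorted(set(xs))

print(check(sort_property(sorted), random_int_list))      # None: no counterexample found
print(check(sort_property(dedup_sort), random_int_list))  # a list with a repeated value
```

A unit test for `dedup_sort` written by hand could easily miss the duplicate case; the generator finds it without anyone having to think of it. Real libraries add features this sketch lacks, such as shrinking a counterexample to a minimal one.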
That focus on overall properties rather than the code behind them brings us to the hope of automatic programming using theorem provers. There has been some massive improvement in theorem prover capabilities using general SAT solvers in recent years. Maybe it's time this new technology starts finding its way onto the desktops of professional developers.
Now these skills are different from those of regular developers. The same can be said of machine learning. Many great practitioners in machine learning ("data scientists") are average / poor developers but come with backgrounds in engineering or science that make them thrive in an extended programming task, if we consider supervised machine learning as programming. It reminds me of the fact (as brought up by Race Against The Machine) that the best chess players in present times are neither humans nor computers but a thriving partnership of not necessarily the best humans nor the best computers.
Borrowing a page from the experience of human translators, there'll be a time when painstaking 100% human created programs will be deemed too expensive for all but a few mission critical situations. And the rest will be created by a redefined computer professional. At this stage this is a mental exercise but given the example from human translators, definitely an exercise worth engaging in.
Tags: academic, philosophical
Since I was born, the planet's population has increased by 50%. (I even heard that half the humans who ever existed are alive today; that's false, it is more like 6%.) This is all anecdotal but my recollections from childhood speak of a place with just fewer people, where shopkeepers knew you by name and expected you to buy certain items regularly. They would know what you liked and bring in products catering to their audience. Such an experience is for the most part lost (it might remain in small towns and such).
Interestingly, the human population was rather small for tens of thousands of years. Our human expectations about relating to each other are in line with small communities. This topic is outside my realm, but I had heard about it before; Google points to an Urban Ecology paper from 1978 that says:
the persistent human propensity to identify with small groups is a consequence of our evolution as a mammal
A machine learning / information retrieval technique that has grown immensely in popularity in the last two decades is the recommendation system. When teaching them before (see my class on the topic, in Spanish), I realized they bring that small village feeling to on-line transactions. When you enter Amazon or Netflix, they "know" you and recommend things based on what they know about you. It is paradoxical that we now need computers to bring a much needed human touch to our interactions.
Moreover, some of the techniques used in recommendation systems (such as user-based recommenders) build such small villages as part of their algorithms. In that sense, when Netflix recommends that you watch Anastasia for the fourth time, it actually has enough information to recommend other people to you who, you never know, might actually want to form a small village with you.
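A minimal sketch of how a user-based recommender builds that village, in Python. Everything here is an assumption made for illustration: the toy ratings are invented, cosine similarity is one of several similarity measures in common use, and the function names `cosine`, `village` and `recommend` are hypothetical. It is not how Netflix or Amazon actually work.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def village(ratings, user, k=2):
    """The (at most) k users most similar to `user`: their small village."""
    others = [(cosine(ratings[user], r), name)
              for name, r in ratings.items() if name != user]
    others.sort(reverse=True)
    return [name for sim, name in others[:k] if sim > 0]

def recommend(ratings, user, k=2):
    """Items the village rated and the user has not, best first."""
    scores = {}
    for name in village(ratings, user, k):
        sim = cosine(ratings[user], ratings[name])
        for item, rating in ratings[name].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=lambda item: (-scores[item], item))

# Invented toy data, for illustration only.
ratings = {
    "ana":   {"Anastasia": 5, "Mulan": 4},
    "ben":   {"Anastasia": 5, "Mulan": 5, "Shrek": 4},
    "carla": {"Anastasia": 4, "Shrek": 2, "Up": 5},
    "dan":   {"Alien": 5, "Heat": 4},
}

print(village(ratings, "ana"))    # ['ben', 'carla']
print(recommend(ratings, "ana"))  # ['Shrek', 'Up']
```

The output of `village` is exactly the by-product discussed above: before recommending any movie, the algorithm has already worked out who your neighbors are.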
Tags: academic, political
A popular method for learning from large data sets is Random Forests (see my class on the topic, in Spanish). I would like to draw a parallel between the way they work and our political decision structures and the so-called wisdom of the crowd.
Random Forests are what is called an ensemble method, as they perform better than individual methods by combining their results. The individual method used in Random Forests is the Decision Tree, each one trained on a subset of all the available data (and because of this property of operating on subsets of the data, they are a good method to apply to large datasets).
More interestingly, Random Forests (as discussed in the Machine Learning article by Leo Breiman in 2001) not only train each of their trees on a subset of the data but also use a subset of the available information (features) when training each decision node in the tree. That makes each of the trees that are part of the ensemble truly random! When creating each individual tree we only see a subset of the data and only a subset of its characteristics. To decide the outcome, each of these random trees is given a vote. The most voted decision wins.
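The two sources of randomness and the final vote can be sketched in a few dozen lines of Python. This is a simplified illustration under stated assumptions, not Breiman's reference implementation: splits are chosen by Gini impurity, class labels are assumed to be strings, a node becomes a leaf as soon as its randomly drawn features cannot split it, and the toy data and all function names are invented.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Best (score, feature, threshold) among `features`, or None if no split exists."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def grow(rows, labels, n_sub, rng):
    """Grow an unpruned tree; each node only sees n_sub randomly drawn features."""
    if len(set(labels)) == 1:
        return labels[0]
    features = rng.sample(range(len(rows[0])), n_sub)
    split = best_split(rows, labels, features)
    if split is None:  # simplification: give up and emit a majority leaf
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow([r for r, _ in left], [y for _, y in left], n_sub, rng),
            grow([r for r, _ in right], [y for _, y in right], n_sub, rng))

def predict_tree(tree, row):
    """Walk from the root to a leaf; inner nodes are tuples, leaves are labels."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def grow_forest(rows, labels, n_trees=25, n_sub=1, seed=0):
    """Each tree is trained on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow([rows[i] for i in idx], [labels[i] for i in idx], n_sub, rng))
    return forest

def predict_forest(forest, row):
    """One tree, one vote; the most voted label wins."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Invented toy data: two well separated groups of points.
rows = [(1.0, 2.0), (2.0, 1.5), (1.5, 3.0), (3.0, 2.5),
        (7.0, 8.0), (8.0, 7.5), (9.0, 9.0), (7.5, 8.5)]
labels = ["a"] * 4 + ["b"] * 4
forest = grow_forest(rows, labels)
print(predict_forest(forest, (0.0, 0.0)), predict_forest(forest, (10.0, 10.0)))
```

With `n_sub=1` every node decides based on a single randomly chosen feature, which is the extreme version of "only a subset of its characteristics".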
Now, the "magical" part is that they perform better than a decision tree trained on all available data, even if that tree were made "smart" by pruning poorly constructed branches (the trees that make up the ensemble are unpruned). And they perform so well that a recent comparative study of 179 different classifiers found them to be consistently top performing across a large set of problems.
Now, if you think about it for a second, this is the way direct democracy works: each voter has access to a subset of the information and only sees that subset from a particular perspective (their own unique perspective). By using a majority vote, we are actually implementing a Random Forest. And from the theory (Breiman's paper is quite delightful) we can see that we don't need more informed voters, just more of them. Food for thought.
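The "just more of them" effect can be checked with a little arithmetic. The sketch below is not Breiman's analysis (his bound depends on the strength of the individual trees and the correlation between them); it uses the simpler setting of the Condorcet jury theorem, assuming fully independent voters who are each right with the same probability. The function name `majority_accuracy` is made up for this example.

```python
from math import comb

def majority_accuracy(n_voters, p_correct):
    """Probability that a strict majority of n independent voters is right.

    Assumes n_voters is odd (no ties) and that every voter is right
    with the same probability p_correct, independently of the others.
    """
    need = n_voters // 2 + 1
    return sum(comb(n_voters, k) * p_correct ** k * (1 - p_correct) ** (n_voters - k)
               for k in range(need, n_voters + 1))

for n in (1, 3, 11, 101):
    print(n, round(majority_accuracy(n, 0.6), 3))
```

With voters who are right only 60% of the time, a single voter gives 0.6 and three voters give 0.648, and the figure keeps climbing as voters are added. Under these assumptions the catch is that each voter must be better than a coin flip: at exactly 0.5 adding voters changes nothing, and below it the crowd gets worse.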
Many moons ago, I did my PhD. Those were years of hardship, unknowns and anxiety. But also of self-discovery, with plenty of fun and exciting times.
Then I met a wonderful woman, we got married and she decided to go back to school and pursue a PhD of her own. I have to admit the process of accompanying a loved one through graduate school is far, far worse than going through it yourself. Even knowing full well the challenges ahead, it is much worse to have to see her suffer through them without being able to do anything about it.
The feeling of helplessness while watching a person you care for deeply being in distress is like nothing I had experienced before. We were lucky the decision to go back to graduate school was well thought out and discussed at length even before she applied. But even then there were years when the whole process took a terrible toll on our marriage.
It is thus that I want to extend my congratulations to Dr. Ying for successfully defending her PhD thesis entitled "Code Fragment Summarization". To all other PhD candidate spouses out there: I hear your pain. There is light at the end of the tunnel!