#071: Chris Armbruster on how (and why) to become a Data Scientist




Data Scientist is often called “the hottest job of the 21st century”, but what makes it so attractive and important for society? And how can a PhD graduate transition into this field?

Chris Armbruster, a PhD graduate in Sociology from Lancaster University, spent two years at the European University Institute in Florence, Italy, studying the emerging R&D and innovation landscape in Europe.

Later he moved “from innovation research to doing innovation”. He first worked on rolling out digital infrastructures for the Max Planck Society, and then dived into start-up life in a variety of roles encompassing digital technologies, customer-centric business models, and product development.

Today he is Director of Community Development at The Drivery – the mobility innovators’ club in Berlin, which aims to push innovation in the mobility sector, e.g. autonomous driving and electric kick scooters.

His key mission is to address the shortage of talent in Data Science & Artificial Intelligence in Europe, more specifically for roles in Data Analytics, Data Science, and Machine Learning.

He writes a blog on Medium about the Data & AI field and its professional opportunities, and drives the “10,000 Data Scientists for Europe” initiative, which can be found on Eventbrite, Meetup, and Facebook.

Most of those experts are not in Europe, and European experts often leave for the United States, Canada, China, and sometimes also India. Hence, the shortage of talent is severe and may even get worse. Doing online courses and attending a bootcamp are a good start, but they are a start only.

– Dr. Chris Armbruster

Transcript

Intro

Hello and welcome to PhD Career Stories! In today’s podcast, episode number 71, Chris Armbruster will tell us about his transition from Science to Data Science: what skills he gained from his academic experience and what options he had.

Welcome, and let’s listen to Chris’s story.

Hey there. Hello! I am Chris, Chris Armbruster. I am one of the PhDs who have transitioned from Science to Data Science. I live in Berlin, the new startup hub of Europe. I moved from a Postdoc in innovation research at the European University Institute in Florence to rolling out digital infrastructures for the Max Planck Society, and on to start-up life in a variety of roles encompassing digital technologies, customer-centric business models, and product development.

My mission is finding talent for Data Science & Artificial Intelligence, more specifically for roles in Data Analytics, Data Science, and Machine Learning. With colleagues, I am working on scaling the business case for Data Science across companies, and across Europe. The initiative is called 10,000 Data Scientists for Europe, and we can be found on Eventbrite, Meetup, and Facebook.

What helps me very much is that I was a travelling PhD with stops at Lancaster University in the UK, Jagiellonian University in Cracow, Humboldt University in Berlin, Eötvös Loránd University in Budapest, the University of Wroclaw in Poland, and Novosibirsk State University in Siberia. It empowers me to communicate across cultures and find solutions for internationally mobile talent.

I have a strong sense of mission: It is all about data-driven innovation. What guides my colleagues and me is the market and the customer. We are looking for data-centric products that scale on the market. We support talent on the road to becoming Machine Learning experts and entrepreneurs. What we want to achieve is a platform giving us the data-driven products of the 21st century.

Particularly rewarding is the social impact we have. Consider the energy industry. Traditionally you would want to predict the demand for energy and match it with a supply of coal, oil or gas. In a world of renewable energy, you must also predict the supply of sunshine or wind, and then match supply and demand while optimizing for storage. Data Science is key to a sustainable energy future.

Next, consider mobility. To satisfy the mobility requirements of billions of urban dwellers we need not just clean mobility, but also autonomous mobility for the optimal allocation of vehicles. Next, we require data-driven efforts at providing much more frictionless mass mobility. Machine learning is key to sustainable mobility.

As a final example, consider health. Most diagnostics are still way too expensive for most people in many countries, and sometimes they are not available at all. Data and images make a difference, and databases of images in conjunction with widespread smartphone usage will allow us to extend the reach of advanced medical diagnostics right around the world. Deep Learning is an exciting opportunity to improve the lot of humanity.

So, how did I get here?

My Postdoc was Plan B. Plan B meant spending two years at the European University Institute in Florence. I was a Jean Monnet Fellow and then received a second stipend from the Fondazione Antonio Ruberti. Antonio Ruberti was an engineer and scientist, and in the 1990s European Commissioner for education and culture. The stipend was awarded by a committee from the large European research facilities like CERN in Geneva, the European Molecular Biology Laboratory, the European Southern Observatory, and the Institut Laue-Langevin (ILL) in Grenoble.

The European University Institute sits on the hills above Florence, the city that gave us the Renaissance, the Banks, and Power Politics. Looking out over the old city, a few hundred PhDs, postdocs, and professors spend time speaking and thinking about where Europe came from, and where it is headed.

I arrived just after the enlargement of the European Union, with a plan to study the emerging innovation landscape in Europe. I was interested in questions like:

  1. With the rise in public and private funding for research & development, would the universities start contributing more significantly to innovation and economic growth?
  2. Would the emergence of the postdoc as principal investigator, as someone holding large multi-year grants from the European Research Council or a national funding body, make a difference to academic careers and change the system?
  3. What is the impact of the call for open science, e.g. open access to publications or open data, and what kind of business models would work for open science?

If the Postdoc was Plan B, what was Plan A?

Plan A was a startup collecting financial data on higher education and research in Europe. As Europe was coming together in the early 2000s (and some days a united Europe seems like a distant memory), there was an opportunity to collect data for the purpose of analyzing, comparing, and predicting performance in education and research. The European Commission was sponsoring contracts for data collection and – against the established competition and the odds – my team was offered the contracts and set off.

However, you guessed it: We were not successful in the end. It is a story for the Fuckup Nights format: We made one mistake in selecting the wrong partner for collaboration in data collection, from which a couple more project management mistakes resulted. It was product failure the first time round, and since we were self-funded and there really wasn’t venture capital available back then, I shifted to Plan B.

So, about that innovation research… There are a few interesting insights I gained.

  1. From the late 1980s up until 2010, policy makers and academics alike often assumed that research universities or large research organizations could and should be hotbeds of innovation and even make money by patenting. I checked the data, and specifically the data for the best-funded public research university system in the world, namely the University of California. In a nutshell: a) not even the University of California ever gained significant income from R&D activities; b) research institutions aren’t hotbeds of innovation; and c) boosting public spending for research or R&D may correlate with more innovative activity, but a causal link cannot be inferred.
  2. More money for research typically came with the introduction of national flagship research awards, also for early career researchers. Everyone became aware of this trend as the European Research Council was set up and the Starting Grant announced. Undoubtedly, it helped focus many minds and some researchers have benefitted very much. On the other hand, the flagship awards contributed to the further stratification of academia. You can see it this way: If you are a postdoc pursuing a research career you need some strong indicators that you have a fair chance at getting one of these flagship awards as a principal investigator. If not, it may well be wiser to leave the academic system earlier.
  3. Open ‘everything’ has become very trendy in academia. Occasionally it is still controversial, but typically openness is backed by all the relevant research funders. In the early days, the main issue was the acceptance of open access by researchers. Quite a few scholarly communities exchanged pre-prints, but would there be any acceptance for online open access to all publications? As it turned out, there was and is acceptance. And working on this issue was one way in which I moved from innovation research to doing innovation…

What does it mean ‘to do’ innovation? Consider the following: Sharing pre-prints on exchange servers is one thing, but establishing open access as a publishing model is quite another. In fact, who would have thought that authors would pay publication charges? Established practice was that authors were courted, and readers paid. Increasingly, the libraries paid on behalf of the readers for electronic access. This practice proved to be an obstacle to open access because any move quickly became a collective action problem requiring very time-consuming coordination among libraries. Hence, switching the customer was imperative. But would any author pay anything?

I had moved on to rolling out digital infrastructures for the Max Planck Society. The focus was on the electronic ingestion of publications and data, the internal upgrade to digital practices, and the online dissemination of results. In local as well as international teams we were working on the adoption of the new digital practices. This included finding models for open access publishing. Hence, we looked at the practices of scholarly communities, models of online publishing, and author and reader habits – all for the purpose of devising strategies to find early adopters for open access publishing.

The collaboration with scholarly publishers, international research organizations, and policy makers bore fruit in that two forms of open access publishing did take off: A) The publication fee, whereby the author or authors pay a fee for open access publishing; and B) Publication deposit in a repository, i.e. the final manuscript that has been accepted for publication is deposited in an open access repository parallel to publication. The share of open access publications passed 10%, then quickly 20%, only to move beyond 30% – a clear indication of widespread adoption.

Working on scholarly communication issues provided my first foray into Data Analytics and predictive modelling in the years from 2008 to 2012. In an effort to understand author and reader behavior, I designed the modeling of online user behavior, of scholarly attitudes, and of the economics of journal publishing. Data collection and analysis were carried out by three international teams.

One team obtained access to the logfiles from publishers and repositories. In fact, we had access to logfiles from the ten largest scholarly publishers as well as some of the largest research repositories. Logfiles are the web server logs that record what users do on an internet site. A second team had the opportunity to run large surveys on the attitudes of authors and readers, so that we could contrast the attitudes stated in surveys with actual online behavior. A third team modeled the economics of publishing.

Our specific focus was on publications that are available twice, that is, as a scholarly journal article and as a repository deposit. We were dealing with what is called Green open access, where the publication in a traditional subscription-based journal is supplemented with the deposit of the final manuscript in an open access repository.

The data showed that the availability of open access manuscripts in repositories boosts downloads of journal articles on publisher websites. In other words, making the final manuscript available is great marketing for the journal publication. In hindsight, this result is perhaps not surprising. However, the publisher hypothesis had been that Green open access would cannibalize journal article downloads. By contrast, my major finding was that increased dissemination and awareness of research results drives the usage of the journal article as the version of record. Moreover, the cost of the additional manuscript deposit is low, and only a fraction of the cost of journal publishing.

At the interface of rolling out digital infrastructure, driving innovation, and responding to policy concerns of conflicting stakeholders, I gained some additional skills such as

  • Leading international teams
  • Putting the customer first
  • Modeling user behavior, and
  • Embracing disruptive innovation

While working for the Max Planck Society in Berlin, the city around me matured into a startup hub. After a first wave of e-commerce startups and digital apps, the ecosystem started generating more startups in education, energy, health, and mobility – many of which are data-driven.

In 2013, I opted for a start-up deep dive. Typically, I was working in an operative role – such as Chief Operations Officer. I oversaw the day-to-day scaling of a startup, balancing the growth of revenue with that of operations.

The most interesting startup I participated in was an expensive early failure – one that did not progress beyond project status: The German Engineering Cloud University. Open online courses, more specifically Massive Open Online Courses (the so-called MOOCs), had begun transforming life-long learning. Inside Deutsche Telekom someone had looked at the trend and asked: If you are based in Germany, what can you contribute? What is the one thing the rest of the online world might want most? Well, German engineering, and German engineering universities, the Technical Universities.

German engineering has tradition, it has reputation, and it attracts students worldwide. What if you could bring that higher education into the digital age? What if every higher education student worldwide could enroll in a German engineering degree?

The idea was put to the leadership of the Technical Universities in Germany, and they all signed up. The next challenge was understanding who the early adopters of a ‘cloud university degree program’ would most likely be, and then creating an educational experience for them. Extensive research by a well-known strategic consultancy indicated that we would be building for undergraduate students in Eastern Europe or Eastern Asia. The case for Eastern Europe rested on proximity, and especially on easy entry to German higher education and the labor market. The case for Eastern Asia was interesting because of the high reputation of German engineering, and the potentially much larger number of cloud students.

The startup had ample funds, and progress with the user experience and course design was good. But the startup really was a corporate spin-off with a single source of funding, and this became a problem. I noticed that the leadership started focusing on management activities typical of large firms. There was not enough entrepreneurial spirit, the organization was not agile enough, and course production was not fast enough. What killed the startup, however, was that Deutsche Telekom appointed a new CEO, and the new CEO disbanded the startup.

Another really interesting opportunity emerged in late 2016 with Data Science Retreat. It is a boot camp focusing on education and product prototyping. Participants come for 3 months to upskill to Data Science and Machine Learning. Participants go on to work in startups or larger companies in Berlin, in Europe, and throughout the world.

About one third of the participants are STEM PhDs transitioning from academia to industry. The other participants typically are either

  1. Data professionals with a background for example in business intelligence, data warehousing or data analytics; or
  2. Software developers sensing an opportunity to get involved in data-driven product development.

I have spent two years supporting more than 100 talents moving into data-driven roles, mainly in Data Science and Machine Learning, but sometimes also in Data Analytics. In the process I reviewed several hundred applications, conducted more than 200 interviews, and spoke with many hiring managers. Much of what I do revolves around understanding where someone is coming from and where they want to go. A typical outcome is that we re-design the CV so that it emphasizes the aspiration to move into a data-driven role, demonstrates fit with that role, and motivates the hiring manager to offer an interview.

I strive to empower the data community by making connections, hosting events, finding talents, and troubleshooting business issues. Experience, data, and insight show that the key issues facing Europe are:

  1. The shortage of talent, particularly of talent maturing from beginner to expert level;
  2. That companies do not yet have enough Data Science business cases that scale successfully.

Enter 10,000 Data Scientists for Europe – a platform for scaling solutions by

  1. Talent training to the highest level
  2. Scaling the business case for Data Science inside existing companies
  3. Creating new AI-first companies

Journalists from the New York Times have estimated that worldwide there are fewer than 10,000 AI experts. Experts are people capable of executing a project or product based on Machine Learning or Robotics. On top of that, most of those experts are not in Europe, and European experts often leave for the United States, Canada, China, and sometimes also India.

Hence, the shortage of talent is severe and may even get worse. Doing online courses and attending a bootcamp are a good start, but they are a start only. We urgently need a platform that accompanies talent all the way to senior level.

Facts on business success with data and machine learning are sensitive and, of course, open to interpretation. Practitioners like to comment that they observe failure in the majority of instances. I can corroborate from the people I work with throughout Europe that success may be elusive and failure not uncommon. This drives talent away.

Building a data-centric product with machine learning inside is actually quite hard. It may also be expensive, especially compared to the app-building rush of digitizing marketplaces in which the buyers and sellers provided all the data, and often manually.

Hence, the second challenge is scaling the business case, inside existing companies but also by creating startups that are AI-first.

If you want to know about my current thinking and activities, you can find these under my name, Chris Armbruster, on Medium.com – the tech publishing website. You will find a longer piece on ‘Scaling versus failing the Data Science business case’ and also a piece on why the talent shortage is the key issue in Europe.

To you: Good luck with your PhD and Postdoc career. Please remember to assess early enough if you want to stay in academia or decide to leave. I have worked successfully with PhDs +5, +7, and even +10 years. That means I have supported PhDs who continued with their academic career for a very long time before deciding to leave. After 10-15 years of studying and research it is quite a challenge to switch careers, but we were successful in every case. Nevertheless, I noticed that directly after the PhD, or at PhD +2 or +3 years, changing careers is significantly more straightforward and faster.

If you want to explore the Data & AI field as a professional opportunity you can find some relevant guidance on Medium.com too. The 10,000 Data Scientists for Europe campaign is live on Eventbrite, Meetup, and Facebook. I look forward to welcoming you to an event or – if you like – to organizing an event where you are. Get in touch to find out more.

Outro

Thank you for listening to a new episode of PhD Career Stories. Don’t hesitate to contact us to share stories, tell us what you think, or comment on episodes on our webpage Phdcareerstories.com. You can follow us on Facebook, Twitter, Instagram and LinkedIn.
See you in two weeks for a new interesting and unique story!
