Partek Flow Deployment on Amazon Web Services

Participants:
Richard Uhlig, Chief Executive Officer, Quadrant Biosciences
Jeremy Williams, Vice President-Technology, Quadrant Biosciences
Frank Middleton, Scientific Collaborator at Quadrant Biosciences, Professor at SUNY Upstate Medical University, and Director of the university’s Molecular Analysis Core Facility

Paraphrased for clarity

With the exponential growth of genomics data, cloud-based analysis has become an appealing way to cope with the data deluge. Quadrant Biosciences, a translational science company in New York State, recently launched Partek Flow on the Amazon Web Services (AWS) cloud. We caught up with them to discuss their work and their experience analyzing genomics data using Partek Flow on AWS.

Can you describe the work Quadrant Biosciences is doing?

Uhlig: We are a life sciences technology company that fits into the space between basic science discovery and the commercialization of that science. In other words, we are a translational science company.

We form preferred relationships with universities. Currently, we have partnerships with SUNY Upstate Medical University, Penn State College of Medicine, Cornell University, and Syracuse University. That list is growing as we evaluate new technologies.

As opposed to a traditional venture capital firm that evaluates technologies and hires management teams to execute, we have our own very broad and diverse management team to partner with universities and develop the technology.

We have two main product engines. One is related to four-dimensional motion capture and the surrounding architecture and cloud computing. The first product to launch on that platform is called Clear Image. One of its principal uses is youth concussion diagnosis and management.

The second product engine is our epigenetic bioinformatics platform. Right now the focus is on Parkinson’s disease (PD), with the work led by Dr. Middleton and Dr. Steve Hicks of Penn State College of Medicine. Our group of collaborators has expanded dramatically over the last several months to include Boston Children’s Hospital, Yale School of Medicine, Mount Sinai, Cincinnati Children’s Hospital, Holland Bloorview in Toronto, Children’s Hospital of Philadelphia (CHOP), Rush University in Chicago, UC Irvine, Children’s Hospital Los Angeles, and Nationwide Children’s Hospital. That research is moving along very rapidly, and we are working feverishly on a phase two grant with the NIH that will likely be in the $3 million to $5 million range.

In addition to PD, we are looking at a number of other disorders, including ongoing work on traumatic brain injury (TBI) and its milder forms in firefighters, collegiate athletes, and active-duty service personnel. There will likely be a direct-to-consumer product spun out of that work.

We’re about 20 people located at the Institute for Human Performance at SUNY Upstate Medical University, co-located with the Upstate Concussion Center, Department of Neuroscience and the university’s Molecular Analysis Core Facility.

Note: Since this interview, Dr. Middleton and Dr. Steve Hicks of Penn State College of Medicine, sponsored in part by Quadrant Biosciences, published “Association of Salivary MicroRNA Changes With Prolonged Concussion Symptoms,” in which Partek Flow played a role in the data analysis.

How did Partek Flow fit with your data analysis needs, particularly in your epigenetic studies?

Middleton: We found Partek Flow to be the mainstay for the initial processing of our genomics data. Much of what we do is microRNA-based, and we process a lot of samples in collaboration with Quadrant Biosciences.

In the past, samples had to go through BaseSpace apps in order to extract the information we were interested in. Our focus is on non-coding RNA and some coding RNA present in biofluids, as well as some microbiome-related molecules. BaseSpace had many apps available to probe some of these things, but they also charged a lot.

Partek Flow has evolved to support microRNA, coding RNA, and general ncRNA needs. We were excited to see the recent addition of the KRAKEN workflow to Partek Flow. In our experience, microbiome analysis, which for us is really microtranscriptome-level analysis of the microbiome, requires drilling down deeper than just mapping taxon IDs or OTUs; we have to get to the transcript level. For that we use k-SLAM, in a workflow Jeremy fine-tuned based on feedback and on extensive experience gained over more than a year of trying every available option. We are really working on three things: coding RNA, non-coding RNA, and microbiome analysis. Partek Flow is really good for the first two and is potentially coming along for the third.

You originally installed Partek Flow on a server. Why did you switch to Amazon Web Services?

Williams: Our server only had four cores. We beefed it up with as much RAM as we could; I think we got up to 250 GB of RAM, so it was this really lopsided on-premises server. We wanted to see if we could speed up research. Since Partek Flow was doing a great job of producing microRNA results with different aligners and such, I wanted to see if we could cluster it and spread the work over a bunch of virtual machines in the cloud.

AWS has a great capacity to provision resources in a cost-optimized way. I was able to stand up a Partek Flow head node on a very small server just to host the web application. On demand, we can automatically spin up worker nodes, which is awesome. We decided to stay with two worker nodes.

The neat thing with AWS is that once you save a server configuration as an image, you can use that image to provision a machine of any size or spec that you want. That comes in really handy because you can set up the relationship between the worker nodes and the head node as soon as the workers come into existence. When you’re finished with your analysis, you can kill the workers and it’s like they never existed at all. You don’t have to pay to maintain them. The cloud has really sped up our sample analysis. Frank, do you remember the performance difference?

Middleton: We had about 420 microRNA samples we needed to align. We really like SHRiMP2 for its sensitivity, but on the on-premises server the projected finish time was more than four days. On AWS, the whole job took about four hours, and that included aligning everything two different ways: to mature and to precursor microRNA. We didn’t use the biggest virtual machine available, but it was one of the bigger machines. So we went from days to hours.

Did you say you scaled analysis from days to hours?

Williams: Yeah. The cool part is paying attention to the way the work is distributed. We think that two bigger nodes* will perform just as well as four smaller nodes, though we have yet to test that theory.

I’m personally very excited to see this, because you never want technology to stand in the way of research. Collecting samples is a huge, huge effort, so being able to turn around the analysis of our entire sample set this quickly really alleviates a lot of the obstacles you face in the normal research cycle. While I’m not a researcher, I am a research supporter, so I’m always excited to see what new things Frank and Steve find.

*Performance depends on the size and specs of nodes.

Using command-line tools, do you think you would have gotten the same performance improvements that you saw with Partek Flow?

Williams: Possibly. Eventually. But half the battle with software is the tradeoff between the time it takes to develop a tool and what you get out of that development. Building something like that would have taken a really long time and taken our fairly lean staff away from developing tools elsewhere. The short answer is that if it took no time to develop, we might have been able to build something comparable in performance. But for a start-up company doing the kind of research we do, it just wouldn’t be feasible.

Uhlig: Yeah, it made a big difference in being able to get up and go.

Williams: It [Partek Flow] is a polished tool. Down the road, we plan to take advantage of the Partek Flow API to really start to stream our batches so Frank can kick off our custom-configured workflow.

When you decided to move to AWS, what was your experience with installing Partek Flow?

Williams: It definitely required technical expertise. For the most part, it was very smooth. Where I hit obstacles was in the dependencies; like most software systems, Partek Flow has many dependencies. But your team was so, so fast to respond. The service was really amazing and super competent, so the few hurdles I encountered were taken care of really quickly. It was a very, very positive experience.

If asked by another company for advice about running Partek Flow on the cloud, what would you tell them?

Williams: It depends on their level of comfort with the things they would have to deal with. The stand-alone version is super easy to install. You just have to run a command. The distributed version takes a little bit more know-how because you have to work out all the networking and permissions. But that’s not a shortcoming of Partek Flow. Once it’s set up, it’s super easy to use.

Do you think you will continue using the AWS cloud in the future?

Williams: Oh, definitely. There really isn’t a reason to buy our own servers, especially at the size of our company. Even much bigger companies like Netflix run much of their data on AWS, and Netflix represents a fifth of internet traffic or something. I don’t foresee needing to buy our own servers for anything computational in nature.

Uhlig: Aside from the SUNY Upstate partnership with another campus, SUNY Oswego, I don’t know anyone who would want to build their own server farm now.

So you will continue to use Partek Flow on the cloud?

Williams: That’s definitely a safe bet. For ad hoc runs, Frank will likely continue using the user interface, but as we establish more and more standardized runs, my hope is to use Partek Flow’s command-line API to provision everything with one command or the click of a button.

Did you use the Partek Flow GUI to create your pipeline?

Middleton: Largely. It makes it much easier to remember what you did. Otherwise, you have to keep lists upon lists of the long command lines you’ve tried. The GUI is really a fantastic interface for working out recipes and workflows and remembering what you did.

How many samples do you foresee running on Partek Flow in the future?

Middleton: Hundreds of thousands.

Williams: Yes, we’re going for it!

Is there anything else to add about your experience running Partek Flow in the cloud?

Williams: I’d just like to reemphasize how impressed I am with the level of service that we’ve received from Partek, all the way from the sales rep to the support team. It’s not often you get guys with PhDs to help you out. We definitely feel like you guys are our partner, which makes a big difference.

—

Are you considering using the cloud for your data analysis and have questions? Contact our support team for a free consultation.
