July 28, 2022
5 min read

Data pipelines on GCP – why and how.

At Cobry, we live and breathe data. We’re always searching for new ways to make data as streamlined as possible - and to maximise its effectiveness within Google Cloud Platform. Step one is a good ol' data pipeline!

Today we’re talking about how your business can use data pipelining to ensure your data is functional, reliable and scalable. 

First of all, what is a data pipeline? 

A data pipeline is a set of tools and processes used to automate the movement of data between a source system and a target repository. So basically, it’s a system of moving data from one point to another - and sometimes that involves changing it into a form that’s more usable. 

For example, let’s say you have data from an application programming interface (API), and you want to put it into a database like BigQuery. There are a few steps to doing this, and some can be really difficult to do manually. Creating a data pipeline is an effective, structured way to get your data from the start to the end - and to ensure it’s in a form that you can actually use. 
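As a sketch, those steps - extract from an API, transform, load into BigQuery - might look like the following in Python. The endpoint, field names and table ID are illustrative placeholders, not a real Cobry pipeline; the BigQuery load call (from the real `google-cloud-bigquery` client) is shown in a comment so the sketch stays self-contained and runnable without credentials.

```python
# Minimal ETL sketch: pull records from an API, reshape them, load to BigQuery.
# All identifiers below (field names, table ID) are illustrative placeholders.
import json
from datetime import datetime, timezone

def transform(raw_records):
    """Reshape raw API records into rows BigQuery can ingest."""
    rows = []
    for rec in raw_records:
        rows.append({
            "ticket_id": rec["id"],
            "status": rec.get("status", "unknown"),
            "loaded_at": datetime.now(timezone.utc).isoformat(),
        })
    return rows

# Extract: in a real pipeline this might be requests.get(API_URL).json().
raw = [{"id": 101, "status": "open"}, {"id": 102}]

rows = transform(raw)

# Load: with credentials configured, the real client call would be roughly:
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   client.insert_rows_json("my-project.helpdesk.tickets", rows)
print(json.dumps(rows, indent=2))
```

The transform step is where a pipeline earns its keep: messy or incomplete API output is normalised into a predictable schema before it ever reaches the warehouse.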

Not sure if you generate data? Trust us, you do. 

These days, basically everything your business does generates data in some way. You may not think that you have much data to process, but even if you use a single SaaS tool, chances are you have plenty to start analysing. Do you run digital ads? That’s data. Are you selling things online? Data! What about your support desk? That will be generating data too!

And if you’re not finding a way to gather insights from the data you collect - you’re leaving yourself in the dark. Cobry builds systems that help companies make better data-driven decisions - which helps them increase profits and stand out from the competition.

For example, when Tesco introduced the Clubcard scheme, it learned more about its customers in three months than it had in the previous thirty years. Even on a smaller scale, businesses across the board are using data to inform decisions - from restaurants ordering stock more accurately to reduce wastage, to marketing agencies better understanding their clients’ needs.

What type of data pipeline do you need - streaming or batch?

Data comes in many forms, and the type of data impacts how it’s processed. The two general forms of data used to create pipelines are streaming and batch data. Streaming data is a constant flow of real-time data, such as data generated from social networks or financial trading floors. Batch data, on the other hand, is retrieved periodically in large amounts, such as looking at ‘every ticket opened in June’.

The mode of pipeline you need will depend on your business goals, and this is something Cobry can help you define. For example, if your goal is to create weekly or monthly reports for your board members, then batch data is a good choice because it updates your data all at once at a specified frequency. However, if you need fresh or live information, a streaming pipeline will update on an ongoing basis as soon as new data is generated.
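To make the distinction concrete, here is a small, self-contained Python illustration (plain Python, no GCP services involved): a streaming handler acts on each event the moment it arrives, while a batch job periodically retrieves everything matching a filter - such as ‘every ticket opened in June’ - and processes it in one pass.

```python
# Conceptual contrast between streaming and batch processing: the same
# events, handled two different ways. Plain Python, no cloud services.

events = [
    {"ticket": 1, "month": "June"},
    {"ticket": 2, "month": "June"},
    {"ticket": 3, "month": "July"},
]

# Streaming: act on each event as soon as it is generated.
def handle_stream(event):
    return f"ticket {event['ticket']} processed immediately"

streamed = [handle_stream(e) for e in events]

# Batch: periodically retrieve a large chunk at once, filtered by period,
# e.g. "every ticket opened in June".
def run_batch(all_events, month):
    return [e["ticket"] for e in all_events if e["month"] == month]

june_batch = run_batch(events, "June")
print(streamed)
print(june_batch)  # [1, 2]
```

On GCP the same split shows up in the tooling: streaming pipelines typically sit behind services like Pub/Sub, while batch jobs run on a schedule against a store like BigQuery.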

How Cobry designs a data pipeline with cloud principles in mind

Like everything we do in the cloud, our goal is to make sure every data pipeline is secure, reliable and scalable. We use GCP’s managed services, which automatically provision more resources as and when they’re required and release them afterwards, keeping costs to a minimum. This process is known as elastic scaling.

These managed services also offer multi-regional redundancy, to ensure the pipeline is reliable. We can also take advantage of other services on GCP - such as Secret Manager - to keep usernames and passwords, API keys and other sensitive information encrypted and out of code files.
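As a minimal sketch, fetching a credential from Secret Manager looks like this. The project and secret names are placeholders; the helper that builds the resource path is plain Python, and the real `google-cloud-secret-manager` client call is shown in a comment so the sketch runs without credentials.

```python
# Sketch: fetching a credential from Google Cloud Secret Manager so it never
# appears in code files. Project and secret names below are placeholders.

def secret_version_name(project_id, secret_id, version="latest"):
    """Build the full resource name Secret Manager expects."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"

name = secret_version_name("my-project", "api-key")

# With credentials configured, the real client call would be roughly:
#   from google.cloud import secretmanager
#   client = secretmanager.SecretManagerServiceClient()
#   response = client.access_secret_version(request={"name": name})
#   api_key = response.payload.data.decode("utf-8")
print(name)
```

The pipeline code then holds only the secret’s name, never its value - rotation and access control stay in Secret Manager.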

Why data pipelining is important

Data pipelines are valuable because they automate the movement of data from one place to another in a structured way. That makes them perfect for things like analytics and dashboards - and even for machine learning and other types of artificial intelligence.

Within Cobry we use our data pipelines for moving our data into BigQuery and then ultimately into Looker, where we can create powerful dashboards that can even update in real time. Our goal is to analyse our own support desk so as to better understand client needs and identify where we could go further to help them succeed. 

If we were trying to process this support desk data without a data pipeline, it could be messy, costly and time consuming. Without one, you can end up with scripts scattered in random places, each needing to be maintained separately. But when you’ve got a data pipeline, you know that everything is in one place - flowcharts, logs, metrics and resources - and it’s all structured and effective.

Curious to know more? Read about how our Developer Lewis built a helpdesk reporting system using Looker and BigQuery.

To find out how you could use data pipelining in your business, get in touch with Cobry. Our team of experts will blend their deep knowledge of GCP’s data tools with your organisation’s specific needs. We’ll get your data infrastructure up and running, and you’ll be able to start making data-driven decisions in no time.
