1. Anand Ganesh, Founding Software Architect at Hammerspace: Data-as-a-Service solves traditional security problems

Welcome to the Security Podcast of Silicon Valley. I am John McLaughlin, the host, and today I am joined by a very special guest, Anand Ganesh, the founding software architect at Hammerspace. Hammerspace, for those of you who don't know, is working on storageless data as a service. So Anand, great to have you on.

Thank you, John, for having me on the podcast. It's a pleasure to connect again and take this time to talk about storageless data and many other things.

I'm super excited to hear all about it. So let's start right at the beginning. What is storageless data?

Right. With the term storageless data, the intent is to evoke in the mind of the listener the same benefits they have probably experienced with serverless compute: at the end of the day, you don't care where on the planet the compute resource or the infrastructure resides, or what kind of server technology it runs on; all you care about is that you get compute for your job. With storageless data, we are building technology that allows you to serve data on any site and on any infrastructure, and then orchestrate your data just by expressing intent at a file-granular level.

So that's where storageless data comes into the picture. Now I want to orient your audience a little bit as to where Hammerspace sits, because you have storage, which is your NAS storage like NetApp or Isilon, or your object storage, which is AWS, Azure, GCP, but also some on-prem technologies like MinIO or Scality. Hammerspace is a platform that allows you to disaggregate metadata and data. As a platform, Hammerspace has two pieces that make it up.

One is what we call the Anvil and the other one is the DSX. The Anvil is our metadata server, and the DSX is the data services component. What happens is that Hammerspace sits on the side and routes your clients to their data. Once the clients know where the data is, they talk directly to the storage that hosts the data.

A lot of times people ask me, does that introduce a hop? It doesn't. There are technologies that sit in the I/O path, but we sit on the side, if you will, and just allow your clients to talk directly to the data server. It's built on an open protocol, the pNFS Flex Files protocol. There's an RFC for it, RFC 8435.

So you don't have to install anything proprietary on your Linux clients; they just talk pNFS. Your clients talk to the Anvil, get the location of the data, and then talk directly to your storage volume.

Wow, that sounds so simple.
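To make the "sit on the side" idea concrete, here is a minimal Python sketch under simplifying assumptions. It is not Hammerspace's implementation or the pNFS wire protocol, and the names `MetadataServer`, `DataServer`, `Layout`, and `read_file` are hypothetical. It assumes the client asks a metadata server only for a file's location and then reads the bytes directly from the storage that holds them.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Layout:
    """Where a file's bytes live: which data server, and which object on it."""
    data_server: str
    object_id: str


class MetadataServer:
    """Hypothetical stand-in for a metadata server such as the Anvil.

    It hands out layouts and never carries file contents.
    """

    def __init__(self) -> None:
        self._layouts: Dict[str, Layout] = {}

    def set_layout(self, path: str, layout: Layout) -> None:
        self._layouts[path] = layout

    def get_layout(self, path: str) -> Layout:
        return self._layouts[path]


class DataServer:
    """Hypothetical storage volume: a NAS export, an object bucket, and so on."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[str, bytes] = {}


def read_file(path: str, mds: MetadataServer, servers: Dict[str, DataServer]) -> bytes:
    # Control path: one small request to the metadata server for the layout.
    layout = mds.get_layout(path)
    # Data path: read the bytes straight from the storage that holds them,
    # so the metadata server is not a hop in the I/O itself.
    return servers[layout.data_server].objects[layout.object_id]


nas = DataServer("nas-1")
nas.objects["obj-42"] = b"hello"
mds = MetadataServer()
mds.set_layout("/projects/readme.txt", Layout("nas-1", "obj-42"))
print(read_file("/projects/readme.txt", mds, {"nas-1": nas}))  # b'hello'
```

The design point the sketch tries to capture is that the metadata server only answers "where is it?", so adding it does not put another system between the client and its bytes.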

I really like the simplicity, and I like the use of the open standard. In terms of security, though, what are the traditional security problems we face when we're thinking about data storage? And I guess we could put that into context: with an abstraction layer now, with someone else managing the service, how can we benefit from that?

Right, so maybe I should touch a little on the trappings of traditional storage and how Hammerspace solves them. When you manage your data through the storage layer, you're doing upside-down management. For folks who are familiar with storage-layer terminology, your directives are usually at the storage volume level.

You're not file-granular. A lot of times you're also very prescriptive: you're doing specific tasks, replicating data in groups at the volume level, and whatnot. And it's seldom self-service.

There is a clear divide between the person who is using the data and the person who's managing the storage infrastructure, what we call the storage administrator. Usually the administrator has to manually understand how each storage technology works. Even with an open spec like NFSv3, which is more than two decades old, every vendor implements the protocol slightly differently. So you have to learn and integrate with all these nuances, and all of that gets taken off your plate with Hammerspace, because we've done that hard work.

We've automated pretty much everything when it comes to managing your storage. At Hammerspace, you drive data through metadata, which is very important because metadata is the layer that can be augmented. The metadata layer contains not just your POSIX metadata, but also tags you add to files and metadata you derive by running some sort of analysis or processing on the data. So you can completely orchestrate the movement of data, or lock down access to that data, using metadata.

So that brings us to the traditional storage problems. First of all, with traditional storage you have your permissions model, and you have to make sure you manage and lock down your permissions. That's obviously prone to human error, like anything that is done manually.

When you copy data over, say because you're provisioning storage at a different location or in a different data center for a different project, you exacerbate that problem, because the copied files have to be locked down over there too, and you have to make sure you've set up the permissions there correctly. Now think about this at the multi-site, global level: you have data in, let's say, Australia, Europe, and the United States, and you're doing this across time zones, which becomes a complete nightmare.

Then, when you find data that needs to be remediated, like a file containing proprietary information that is out in the open or not encrypted, a lot of remediation workflows have you move the data to a quarantine server. Now you've made the problem even worse, because the file is sitting on a different system under a different export. You've lost lineage.

Yeah.

You've lost ownership; you don't know who owns the file anymore. All these problems are solved by Hammerspace, because in Hammerspace a file is a file is a file. There's only one file.

It's just rendered on different sites and created on the infrastructure at those sites so that you can access it locally. But all your management, the permissions and the objectives you set on the file, can be done in one place. So you've really simplified the management of data across all these sites to a single point, which is the file.

I love it.

So you just set up your security policies, your management policies, and your access policies in the single pane that Hammerspace provides. And no matter who is accessing that data, or what client they're accessing it from, they're governed by the same set of rules.

Yes.

Really great simplification.

And one thing I'll also mention, which you'll appreciate given your work with Peacemaker: you usually cannot do BYOK, bring your own keys, with traditional storage. With Hammerspace, because we've solved the problem of data mobility, we can non-disruptively move data from one infrastructure to another, whether that infrastructure is in the same data center or in a completely different geographical location, and we can encrypt your data using keys from your environment, from your own key management systems.

That gives you a completely different level of control and power. That's something that was not possible until Hammerspace showed up on the scene, right?

Where a key from a customer is able to encrypt data that's geographically dispersed. Maybe it's even dispersed across different data centers and different service providers. I don't know exactly what sort of service providers are under the hood in Hammerspace. Google, AWS?

Yes, any of them: AWS, Google, Azure, your own object storage in a certain data center.

Oh, my own on-prem?

Yes, even your own on-prem. It doesn't really matter, because we can move the data into any storage container or storage volume, and we can encrypt the data if you want it encrypted.

That's spectacular. So that provides very robust encryption at rest across that entire ecosystem of available storage.

Exactly.

Amazing. In the security community, we tend to have very distinct and specific terms, almost community-defined, and data protection in the storage community means something very specific too. What does it mean in the storage community?

So, it's a good point.

In the storage community, data protection is really about how you recover your data easily under different failure circumstances. We talk about snapshots, backups, disaster recovery, building out availability zones and making sure your data is replicated between them, and so on. That's really data protection for us. Of course, security comes into play because, like I just said, we can encrypt your data using your own keys, but we usually talk about that under the term security, not data protection.

Right. Data protection, I think of as more about availability, backups, recovery times, those sorts of things.

Right, right. And at Hammerspace, of course, we do all of these.

You can take snapshots of your share. At Hammerspace, you export your data to your clients through a share, which could serve SMB clients or NFS clients, and you can take a snapshot of that share.

Let me paint a picture so I can express this better. You have your share, which holds your data. Your share is really just a container for the metadata: the namespace, the permissions, any metadata you've tagged on the files, and so on.

But the data that makes up each of these files can be sitting on any and all infrastructure underneath that share. Some data could be on a NetApp volume, some on an Isilon, some on Pure Storage, some in AWS, some in NetApp StorageGRID. It doesn't really matter. So now you have a share whose data sits on multiple disparate storage containers.

And we have share-level snapshots. You can take a snapshot of the share, which means you're taking a point-in-time snapshot of all this data across all these storage systems. That becomes pretty powerful, and it goes back to something I was saying earlier: once you know Hammerspace works with a certain storage technology, you know that all these data services will work across all these storage technologies.

Right.
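One way to see why a single share-level snapshot can span several storage systems is that, in this model, each file's data location is just another piece of metadata. The Python sketch below is a hypothetical illustration, not Hammerspace's snapshot mechanism: `FileEntry`, `Share`, and the volume names are made up, and it assumes writes point a path at a new, immutable data object instead of overwriting the old one, so older snapshots keep referring to valid data.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FileEntry:
    permissions: str             # e.g. "rw-r-----"
    tags: Dict[str, str]         # user-assigned or derived metadata
    location: Tuple[str, str]    # (storage system, object id)


@dataclass
class Share:
    namespace: Dict[str, FileEntry] = field(default_factory=dict)
    snapshots: Dict[str, Dict[str, FileEntry]] = field(default_factory=dict)

    def write(self, path: str, entry: FileEntry) -> None:
        # Assumption: a write points the path at a new, immutable data object
        # instead of overwriting the old one, so snapshots stay valid.
        self.namespace[path] = entry

    def snapshot(self, name: str) -> None:
        # Copying the metadata captures every file, whichever storage system
        # its data lives on, because the location is part of the metadata.
        self.snapshots[name] = copy.deepcopy(self.namespace)

    def share_from_snapshot(self, name: str) -> "Share":
        # Instantiate a brand-new share from a point-in-time copy.
        return Share(namespace=copy.deepcopy(self.snapshots[name]))


share = Share()
share.write("/a.txt", FileEntry("rw-r-----", {}, ("netapp-vol1", "obj-1")))
share.write("/b.txt", FileEntry("rw-r-----", {}, ("aws-s3", "obj-2")))
share.snapshot("nightly")
share.write("/a.txt", FileEntry("rw-r-----", {}, ("isilon-vol3", "obj-3")))
restored = share.share_from_snapshot("nightly")
print(restored.namespace["/a.txt"].location)  # ('netapp-vol1', 'obj-1')
```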

And of course, we do backups too. Backup is an interesting challenge because metadata and data are disaggregated, so the question really is, what are you backing up?

Right.

For Hammerspace, the way we differentiate between backup and snapshot is this: a snapshot is something that lets you go back to a point in time for your metadata and data. And of course, as I was saying, you can instantiate a completely new share from your snapshot. So you can go back in time with a snapshot, you can instantiate a new copy of the share from a given snapshot, and so on.

When we talk about backups, we're really talking about reinstalling your Anvil and your DSX, your Hammerspace infrastructure. If you've lost your storage volumes, the backup is not going to bring them back, because what it stores is your metadata; your data still needs to be sitting on some storage volume.

So really, your snapshots and the replication of data across availability zones give you data protection. The Hammerspace backup technology then allows you to back up your entire estate and reinstall it on fresh hardware if you need to.

I got you. I got you.

So backup is more for just the metadata, all of the data about the data, so to speak. And your snapshots and high availability give you business continuity, the ability to restore your actual data from your snapshots.

Exactly. Exactly.

And we have features like Undelete: if you delete a file, it stays available for a certain amount of time, which is configurable in the product.

That's convenient.
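An undelete feature with a configurable retention window can be modeled roughly as below. This is a hypothetical Python sketch, not Hammerspace's Undelete implementation: `UndeleteArea` and its methods are made-up names, and the current time is passed in explicitly (`now`) so the behavior is easy to test.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UndeleteArea:
    """Hypothetical per-share Undelete area with a configurable retention window."""
    retention_seconds: float
    _deleted: Dict[str, Tuple[float, bytes]] = field(default_factory=dict)

    def delete(self, path: str, content: bytes, now: float) -> None:
        # Deleting moves the file aside instead of destroying it.
        self._deleted[path] = (now, content)

    def restore(self, path: str, now: float) -> bytes:
        deleted_at, content = self._deleted[path]
        if now - deleted_at > self.retention_seconds:
            raise KeyError(f"{path} is past the retention window")
        del self._deleted[path]
        return content

    def purge_expired(self, now: float) -> List[str]:
        # A periodic job permanently removes anything older than the window.
        expired = [p for p, (t, _) in self._deleted.items()
                   if now - t > self.retention_seconds]
        for p in expired:
            del self._deleted[p]
        return expired


area = UndeleteArea(retention_seconds=7 * 24 * 3600)  # e.g. one week
area.delete("/reports/q3.xlsx", b"...", now=0)
print(area.restore("/reports/q3.xlsx", now=3600))  # b'...'
```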

That would help with ransomware attacks.

Exactly. Ransomware attacks, or just users. We want to make access to your data as self-service as possible.

So if end users fat-finger something and delete a bunch of files, they can just go into the Undelete location under the share and bring back the files. The other challenge with backup, of course, is data mobility. You could take a backup of your metadata at a time when the data is laid out a certain way, with your data mapped to certain volumes. But as time goes by, your data could move based on the objectives you have, which means that when you restore your metadata, it could be pointing at data objects that aren't in the same place anymore.

So how do you solve that problem? We have something called a mobility replay log that records where the data moves, so that when you go back in time in the metadata, you can also go back in time in the data and figure out where the data is now compared to where it was when the backup was taken. Those are very interesting technical challenges we've had to solve to make backup and restore a viable option for a product that disaggregates metadata and data.
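The replay idea can be illustrated with a small Python sketch. It is not Hammerspace's log format; it assumes the log is append-only, that each hypothetical `Move` entry records one object's source and destination volume with a timestamp, and that restoring a backup means replaying every move recorded after the backup time to find where each object lives now.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Move:
    timestamp: float
    object_id: str
    src: str
    dst: str


def current_location(object_id: str, location_at_backup: str,
                     backup_time: float, replay_log: List[Move]) -> str:
    """Follow every move of object_id recorded after the backup was taken."""
    location = location_at_backup
    for move in sorted(replay_log, key=lambda m: m.timestamp):
        # Moves of other objects, and moves the backup already reflects, are skipped.
        if move.object_id != object_id or move.timestamp <= backup_time:
            continue
        # Each replayed move must start where the previous one ended.
        if move.src != location:
            raise ValueError(f"log gap for {object_id}: expected a move from {location}")
        location = move.dst
    return location


log = [
    Move(5, "obj-1", "netapp-vol1", "isilon-vol2"),   # before the backup
    Move(20, "obj-1", "isilon-vol2", "aws-s3"),
    Move(25, "obj-9", "pure-vol1", "aws-s3"),         # unrelated object
    Move(30, "obj-1", "aws-s3", "pure-vol1"),
]
print(current_location("obj-1", "isilon-vol2", backup_time=10, replay_log=log))  # pure-vol1
```

This is why the speaker's Git comparison that follows is apt: the backup is a past state, and the log is the history needed to reconcile it with the present.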

Yeah, it almost sounds like Git, but for the abstract representation of data.

Right. Right. The last thing I'll say in this area is that because you can instantiate Hammerspace on multiple sites and it's all active-active, not active-passive, your global file system becomes your DR solution. If you lose a site, you really aren't waiting. You're not converting one site from passive to active or changing the direction of replication, all the stuff we've learned to do with traditional storage. You're not doing that anymore. It's active-active, so if one site goes down, boom, you just mount from another site and continue.

And you're online.

And you're up and running.

I love that. It's only because of the architecture Hammerspace chose to solve those complex problems, and bring that abstraction layer up higher, that you're able to...

Precisely. Right. Precisely.

An excellent deployment of technology to make everyone's lives better and ultimately save our time and energy so that we can focus on our end business goals, right?

Completely. Yes.

Changing topics a little bit here, if we go back to data security, I tend to think of data confidentiality and data privacy. How does Hammerspace approach confidentiality? It sounds like we have a lot of great tooling here, where we can distribute our file systems across very wide regions and seamlessly have it all active-active.

Right. The cool thing about building Hammerspace this way is that just as you can orchestrate data to a storage container, you can orchestrate data to a processor. We have support on the Hammerspace platform for what the industry generally calls Lambda functions, where you can plug in a piece of compute that looks at your data and derives additional metadata from it.

And how does that work here? You can configure, let's say, a data loss prevention tool to scan your data and figure out whether any of it breaches confidentiality or privacy, say files with credit card numbers out in the open. We have support for that.

The first couple of technologies we've integrated with are antivirus scanning and image recognition. One is core to making sure your data is protected. The other is more about working with your data, identifying what kind of images you have in your environment. The cool thing is that when the processor runs on the data, we derive metadata from it.

In the image recognition case, you derive labels. In the AV scanning case, you derive whether the file is a threat or not, what kind of virus was found, and so on. And if you integrate with DLP tools, they can tell you whether the data contains anything that needs to be protected under HIPAA, SOX, or similar laws. The other thing about our platform, like we've talked about earlier, is that you can do all of this through objectives.

You just put an objective on your data that says, process my data through this Lambda function, and boom, that's all you have to do. We orchestrate everything on the back end: we run your data through the processor, derive the metadata, and make that metadata available through the namespace. Once you've done that, you can put further objectives on the derived metadata that say, if the file is found to be a threat, or if this image contains inappropriate material, prevent access to the content of that file.
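The flow just described (run a processor over the data, store what it derives as metadata, then gate access with an objective on that metadata) can be sketched in a few lines of Python. Everything here is hypothetical: `dlp_scan` is a toy detector that only flags 16-digit numbers passing the Luhn checksum, standing in for a real DLP product, and `Namespace.read` hard-codes one objective rather than using Hammerspace's actual objective system.

```python
import re
from typing import Callable, Dict, List

Metadata = Dict[str, object]
Processor = Callable[[bytes], Metadata]


def luhn_valid(digits: str) -> bool:
    # Standard Luhn checksum: double every second digit from the right.
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def dlp_scan(content: bytes) -> Metadata:
    """Toy DLP processor: counts Luhn-valid 16-digit numbers in the file."""
    text = content.decode("utf-8", errors="ignore")
    hits = [m for m in re.findall(r"\b\d{16}\b", text) if luhn_valid(m)]
    return {"dlp.card_numbers": len(hits)}


class Namespace:
    def __init__(self) -> None:
        self.content: Dict[str, bytes] = {}
        self.metadata: Dict[str, Metadata] = {}

    def write(self, path: str, data: bytes, processors: List[Processor]) -> None:
        self.content[path] = data
        derived: Metadata = {}
        for process in processors:      # "process my data through this Lambda function"
            derived.update(process(data))
        self.metadata[path] = derived   # derived metadata lands in the namespace

    def read(self, path: str) -> bytes:
        # Objective on derived metadata: block content access if DLP found card numbers.
        if self.metadata.get(path, {}).get("dlp.card_numbers", 0) > 0:
            raise PermissionError(f"{path}: blocked by DLP objective")
        return self.content[path]


ns = Namespace()
ns.write("/hr/notes.txt", b"card 4111111111111111 on file", [dlp_scan])
print(ns.metadata["/hr/notes.txt"])  # {'dlp.card_numbers': 1}
```

Keeping the processor and the access rule separate mirrors the two steps described in the conversation: derivation happens once, and any number of objectives can then act on the derived metadata.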

So we've built a way to integrate with any technology out there that can process data and provide value to the customer in terms of data security, and then, using our objective system, you can prevent access to data that shouldn't be out in the open. Beyond that, there's the bring-your-own-keys model we talked about, where Hammerspace can access your key management system or your on-premises hardware security module, get the keys, and encrypt the data so that it's protected while it's sitting at rest on your storage container.

Using metadata as the context for security and access, and making the platform flexible enough to plug in any Lambda function, just opens up the possibilities for how you can secure data with Hammerspace.

Excellent. So as a customer of Hammerspace, I could bring my own key and encrypt with, as you mentioned, an HSM integration. Wow. So I could have a key come from an HSM, encrypt all of my data using the Hammerspace layer, and still run a DLP, a data loss prevention tool, using the Lambdas?

Yes.

That's very powerful. So you have an internal key management system as part of Hammerspace. That's great; I know those are tricky to build.

Yes, we spent time making sure the key management layer is generic enough that you can plug in any key management system: KMSs from cloud vendors like AWS or Google, or your own HSM. And for folks who want some basic security and don't necessarily want to use these technologies, we are contemplating a password-based system.

For us, it's all the same, because at the end of the day it's about getting access to the key, which could be a password or could come from an HSM or a KMS, and then using it in the data orchestration layer, so that data passing through this encryption layer, if you will, goes in as plaintext on one side and comes out as ciphertext on the other before it sits on your storage layer.

Yeah. It's interesting in the data security space, because you see this evolution of security technologies. First, there's TLS, protecting our data as it moves across the network.
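The "it's all the same to us" point, where the key source is pluggable and the encryption layer only needs a key, can be sketched in Python. This is an illustration under loud assumptions, not Hammerspace's design: `KeySource`, `PasswordKeySource`, `InMemoryKms`, and the encrypt/decrypt helpers are hypothetical. The cipher is a stdlib-only toy (an HMAC-SHA256 counter-mode keystream with encrypt-then-MAC) chosen so the example runs without third-party packages; a real system should use a vetted AEAD such as AES-GCM from a maintained library, and a real KMS or HSM client instead of `InMemoryKms`.

```python
import hashlib
import hmac
import secrets
from typing import Protocol, Tuple


class KeySource(Protocol):
    """Anything that can hand the encryption layer a 256-bit key."""
    def get_key(self) -> bytes: ...


class PasswordKeySource:
    """The 'basic security' option: derive the key from a password with PBKDF2."""
    def __init__(self, password: str, salt: bytes, iterations: int = 200_000) -> None:
        self._key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

    def get_key(self) -> bytes:
        return self._key


class InMemoryKms:
    """Hypothetical stand-in for a cloud KMS or on-prem HSM client."""
    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def get_key(self) -> bytes:
        return self._key


def _subkeys(key: bytes) -> Tuple[bytes, bytes]:
    # Separate keys for encryption and for authentication.
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy counter-mode keystream built from HMAC-SHA256 (illustration only).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_for_storage(source: KeySource, plaintext: bytes) -> bytes:
    enc_key, mac_key = _subkeys(source.get_key())
    nonce = secrets.token_bytes(16)
    body = bytes(a ^ b for a, b in zip(plaintext, _keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag  # only this ciphertext reaches the storage volume


def decrypt_from_storage(source: KeySource, blob: bytes) -> bytes:
    enc_key, mac_key = _subkeys(source.get_key())
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: wrong key or tampered data")
    return bytes(a ^ b for a, b in zip(body, _keystream(enc_key, nonce, len(body))))


kms = InMemoryKms()
blob = encrypt_for_storage(kms, b"patient record")
print(decrypt_from_storage(kms, blob))  # b'patient record'
```

Because both key sources satisfy the same `get_key` interface, swapping a password for a KMS or HSM changes nothing in the encryption path, which is the property described above.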

And now there's all of this really great work on encryption at rest, protecting our data with encryption as it rests on a persistence layer. Hammerspace is a very interesting persistence layer: distributed, active-active across all of its clients, and very well protected, with some very cool data protection features, including BYOK and your integration with KMSs and HSMs. And if we continue that trend up the stack, eventually we'll see full end-to-end encryption, where applications themselves encrypt the data they work with.

Completely, completely. And this is why I love what you're building at Peacemaker. I think the holy grail is making sure the data is not present in plaintext anywhere between the app that is working with the data and where it's at rest on the storage layer. You protect those two places, and anywhere in between it needs to be protected too, right?

It's important to talk about this because we know about data breaches that have happened at public cloud vendors. And we know they happen because there are systems on top of your data at rest that need to be secured, need to be manually managed, and need to be correct at all times to keep somebody from gaining access to your data. That is, in my opinion, a losing battle, because how many of these systems will you protect? And how many will you have to get right before you can feel peace of mind?

I think the only viable approach is for data at rest to be protected, and not with some sort of server-side encryption where, if there's a breach in your permissions model, somebody can get access to your data and it's automatically decrypted for them. It should only be decryptable with keys that are securely in your pocket, which means making sure the app is the only thing that can reach into your pocket and get the key so it can look at the data. So this is definitely an important problem.

And I think what you're doing at Peacemaker is super cool.

No, I super appreciate that, Anand. We're going to make a small dent in a very large problem here. And you're spot on that the only things that should access that data are the apps themselves.

Though I do think of security in terms of layers, and I appreciate all of the effort and care you've taken at Hammerspace to ensure a solid at-rest encryption layer across a distributed system, including managing keys. At the end of the day, this is a journey, not a destination, and I think our next step in that data security journey may be to take one more step up the stack. You talked about the breaches that have happened; just for reference, the end treatment breach, right?

What was it? It was a technology company providing healthcare and patient records management services. They had a Microsoft Azure bucket that was open to the internet, with a hundred thousand or so files in it, just readable from the internet.

If an application referencing that data had encrypted everything, there's your protection right there. Yeah, you wouldn't have had the exposure. Totally, totally.

Exactly. There are other cases out there; that's just one example of a misconfiguration. But if you have an insider threat, do we trust the vendors that are providing us these services?

Do we trust AWS? Do we trust Hammerspace's infrastructure? I think that as consumers, we shouldn't be forced into a position where we have to trust other people's systems in order to build interesting services that are secure. Right.

And with BYOK and Hammerspace, I'm comfortable in a space where I have to trust it much less.

Yeah, exactly, because it's your key, and you know that if you block access to the HSM, nobody can access your data through Hammerspace. That's the peace of mind we want you to have.

That's excellent.

And that's actually a nice way to delete data too, right?

Yes. Yes.

So if I encrypt a whole bunch of data, the key reaches the end of its rotation period, and I throw the key away because that's what my key lifecycle management policy says to do, then even though the data is still out there in Hammerspace's distributed system, not even Hammerspace can access it anymore, right?

Correct. Yeah. You've logically deleted your data. And that's an interesting point you bring up, because some of our customers have asked us how they can... So you have these laws that mandate you keep certain data around for a certain amount of time. Beyond that time, it's really a liability to have that data around, right?

And if you get put on a legal hold, your data has to be around for a very long time, and so on. With Hammerspace, and we'll use a bit of security terminology here, if the IV or salt you're using for encryption sits in the metadata layer, then when you rm the file from the namespace, you've made it impossible for anybody to decrypt your content. So as soon as you delete it from our namespace, you've logically deleted the data. The reason I mention that is that with some of our storage-layer technology, we do deduplication, data sharing, and so on.

Our deletion of data on the storage layer is lazy, because we sometimes run a background job to go and whack things that are no longer referenced in the namespace.

Sure.

Which means some of your data could persist on storage after it has been deleted from the namespace. But if you do it this way, as soon as you remove it from the namespace, you've lost all ability to decrypt what's on your storage.
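The logical-deletion argument can be illustrated with a hypothetical Python sketch in which each file's key is derived from a master key plus a per-file salt stored only in the namespace. Removing the namespace entry removes the salt, so the ciphertext that lingers until lazy garbage collection can no longer be decrypted, even with the master key. `Store`, `Entry`, `derive_file_key`, and `xor_stream` are made-up names, and the HMAC-based keystream is a stdlib-only toy for illustration, not a production cipher or Hammerspace's scheme.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Dict


def derive_file_key(master_key: bytes, file_salt: bytes) -> bytes:
    # One key per file: without the per-file salt, the key cannot be re-derived.
    return hmac.new(master_key, b"file-key" + file_salt, hashlib.sha256).digest()


def xor_stream(key: bytes, data: bytes) -> bytes:
    # Toy HMAC-SHA256 keystream cipher (illustration only). Omitting a nonce is
    # acceptable here only because every write uses a fresh salt, hence a fresh key.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass(frozen=True)
class Entry:
    salt: bytes
    object_id: str


class Store:
    """Hypothetical namespace (metadata) plus storage layer (ciphertext objects)."""

    def __init__(self, master_key: bytes) -> None:
        self.master_key = master_key
        self.namespace: Dict[str, Entry] = {}   # path -> salt + object id
        self.objects: Dict[str, bytes] = {}     # object id -> ciphertext

    def write(self, path: str, data: bytes) -> None:
        salt, oid = secrets.token_bytes(16), secrets.token_hex(8)
        self.objects[oid] = xor_stream(derive_file_key(self.master_key, salt), data)
        self.namespace[path] = Entry(salt, oid)

    def read(self, path: str) -> bytes:
        entry = self.namespace[path]  # KeyError once the file has been removed
        key = derive_file_key(self.master_key, entry.salt)
        return xor_stream(key, self.objects[entry.object_id])

    def rm(self, path: str) -> None:
        # Only the metadata (and with it the salt) goes away now; the ciphertext
        # lingers on storage until lazy garbage collection runs.
        del self.namespace[path]

    def garbage_collect(self) -> int:
        live = {entry.object_id for entry in self.namespace.values()}
        dead = [oid for oid in self.objects if oid not in live]
        for oid in dead:
            del self.objects[oid]
        return len(dead)


store = Store(master_key=secrets.token_bytes(32))
store.write("/finance/ledger.csv", b"account,amount")
store.rm("/finance/ledger.csv")
print(len(store.objects))        # 1: ciphertext still on storage, but undecryptable
print(store.garbage_collect())   # 1
```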

That's some really awesome stuff. I like how easy it is to securely delete data in that system. It's an important feature that oftentimes gets overlooked, especially when we get excited about moving forward and charging ahead. But you're right.

That's absolutely an important use case, especially in the financial industry. They're required to keep all of that data around for a certain period of time. And then it turns into a liability. I suppose you could even make the case that it's a liability during the required period as well.

Right. But we won't get into that. So with all the great work that you, the founders, and everyone at Hammerspace have been pouring into the product, what do you think the next big piece is for Hammerspace? Don't share any of your secrets or anything like that, but what do you see for the future?

I think it's the need to have data available on any site. We're seeing more and more people working from home and people all over the planet trying to collaborate during the pandemic, and this trend is going to continue beyond it. It's super important that your data is available where you are, and that's where our global file system comes into play.

You have a namespace describing your files, and you don't want to spend any energy moving data around, copying it over, and so on. You just want to work with your data and have objectives automate its delivery. Maybe you want the new data you created today delivered tonight to colleagues working at a different site because you're doing a follow-the-sun workflow in your organization, whatever the need may be. That becomes super important.

We didn't talk about this much, but we built the global namespace to be decoupled. It's active-active and decoupled, which means if one of your sites goes down, all the other sites continue to work as is, as if they were not replicated. All these pieces together make Hammerspace a very compelling solution for companies that want a global presence without having to worry about delivering data to their sites.

So any site, any infrastructure is what we're aiming for, and we're looking to help anyone and everyone who is trying to solve that problem.

That's an amazing data-as-a-service solution. I love things that make our lives easier and that solve hard problems, and solve them the right way.

So Anand, thank you so much for coming on the show.

Thank you so much for having me. This was a pleasure.

And thanks to all of our listeners for tuning in this time, and stay tuned for more episodes of the Security Podcast of Silicon Valley.

Thank you again, Anand. Thank you.
