And What You Can Do About It
By: Marc Staimer, President & CDS of Dragon Slayer Consulting

Change. The word alone elicits anxiety in IT professionals responsible for their organization’s computing infrastructure. Storage administrators in particular avoid change because, in the minds of many, that is when things break, especially with SAN, NAS, and unified storage. And yet change has become inevitable in today’s rapidly evolving computing environment. Virtualization; stored data growing from petabytes to exabytes; an always-on, 24x7x365 global economy; rising user expectations; expanding power and cooling requirements; larger, more expensive data centers; and more are stretching, and repeatedly breaking, traditional storage infrastructure. This is compelling IT organizations to look at Cloud Services as a viable alternative.
Cloud Services provide IT organizations an alternative to traditional computing infrastructure. Organizations no longer have to be IT experts to have world-class IT operations. They can run their business efficiently, do their jobs, stop wasting precious human cycles on IT, and do so at an equal or lower total cost of ownership. Cloud Services have shown significant value in cutting application development time, reducing deployment costs at equivalent or better quality, increasing user control, enhancing governance, shortening time to market, increasing flexibility, and lowering costs. They are here to stay.
However, as service providers have delivered Cloud Services, they have quickly discovered they are not immune to the same storage problems affecting their customers. This has caused severe angst and consternation about the shortcomings of traditional DAS, NAS, and SAN storage systems. These shortcomings rear their ugly heads much sooner for service providers because of the large scale of their IT ecosystems. It quickly becomes apparent that traditional storage systems cannot meet Cloud Services needs and requirements without throwing large amounts of money and/or people at the problems, something few service providers have in abundance.
The root cause of these service provider headaches is the extremely rigid approach of traditional storage systems. They were designed for different market requirements and, candidly, a different era in computing. They are far more labor-intensive, requiring extensive storage administrator expertise. That has been acceptable in the corporate IT world, where experts manage system complexity, capacity, performance, operations, and data protection. Experts are required because these storage functions must be forecast and planned with remarkable accuracy well in advance of need. Unanticipated changes in storage capacity or performance create a mad scramble with potentially dire consequences, not to mention acute stress. Technology refresh is another extremely time-consuming and costly aspect of these rigid legacy storage systems, one that taxes the Cloud Services provider to the breaking point. The legacy storage model works poorly, if at all, for this new model, especially for the Cloud Services provider.
Storage vendors argue that their systems have evolved with advanced software functionality. That is true to a point, but the added functionality does not make their systems more alive, less rigid, or even less expensive.
This lack of dynamism, adaptability, flexibility, resilience, and built-in intelligence falls well short of Cloud Services needs. To top it off, the price/performance of non-living storage systems is completely out of line with Cloud Service provider requirements. Providers need SAN-class or better performance at significantly lower cost. Market conditions require Cloud Services providers to deliver their services at a compellingly lower cost than their customers can achieve themselves. With storage such a large part of the total cost of ownership, getting control of storage costs is essential. Traditional storage systems simply do not allow the Cloud Services provider to do this.
This white paper takes a deeper look at Cloud Services operational and financial requirements; the problems legacy non-living storage systems cause Cloud Service providers; how and why the workarounds fail; and the best way to solve these problems right now.
Deploying Cloud Services with “Non-Living” Storage Does Not Work
Cloud Service provider success comes from leveraging the following key
aspects of cloud technology:
• Multi-Tenant
• Highly and Continuously Scalable From Very Small to Very Large
• Pay-for-Use Paradigm
• Loosely Coupled Service
• Transparent Data Resilience and Security
• Simple to Implement, Operate, and Manage
These attributes define Cloud Services technologies, and they are where traditional storage systems come up short. Financially, traditional storage systems come up even shorter.
Common sense makes it clear that Cloud Service providers must deliver a level of service equal to or better than what IT organizations achieve by delivering the services themselves. But Cloud Service providers need considerable economies of scale to provide those services at a cost point that lets them make a profit. Storage is a huge part of a Cloud Service provider’s cost. Regrettably, traditional storage never reaches the necessary economies of scale, at least not to the point that makes those Cloud Services cost-effective and compelling. A deeper examination of the requirements shows why.
Multi-Tenant
Multi-tenant means that in a shared Cloud Services environment, a specific client is the only one who can ever see or access his or her own data. Neither service provider employees nor other clients can ever deliberately or accidentally access that data. This requires both application and storage multi-tenancy, and it is storage multi-tenancy that is a bit more complicated.
Encryption methods require key management by the application or user. Integrated charge-back billing for the storage resources actually used is anathema to most traditional storage systems. Virtual namespaces within the storage system are a must so that resellers can provide unique services without maintaining completely separate, costly storage infrastructures. Once again, this is rare in traditional storage systems.
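To make the storage multi-tenancy requirements concrete, here is a minimal Python sketch of what the storage layer has to provide: a separate virtual namespace per tenant, refusal of cross-tenant reads, and metering of bytes actually stored so charge-back billing reflects real use. The `TenantStore` class, `monthly_chargeback` function, and the per-GB price parameter are hypothetical illustrations, not any vendor’s API.

```python
from collections import defaultdict


class AccessDenied(Exception):
    """Raised when a caller tries to read another tenant's namespace."""


class TenantStore:
    """Hypothetical multi-tenant object store with one virtual namespace per tenant."""

    def __init__(self):
        # tenant -> {object name -> bytes}; tenants never share a dictionary
        self._namespaces = defaultdict(dict)

    def put(self, tenant: str, name: str, data: bytes) -> None:
        self._namespaces[tenant][name] = data

    def get(self, caller: str, owner: str, name: str) -> bytes:
        # Isolation check: only the owning tenant may read its own objects
        if caller != owner:
            raise AccessDenied(f"{caller} may not read {owner}/{name}")
        return self._namespaces[owner][name]

    def used_bytes(self, tenant: str) -> int:
        # Metered on bytes actually stored, not on provisioned or raw capacity
        return sum(len(data) for data in self._namespaces[tenant].values())


def monthly_chargeback(store: TenantStore, tenants: list[str], price_per_gb: float) -> dict[str, float]:
    """Bill each tenant only for the capacity it actually uses (1 GB = 10**9 bytes)."""
    return {t: store.used_bytes(t) / 10**9 * price_per_gb for t in tenants}
```

In a real system the isolation check would sit behind authentication and the metering would be sampled over the month, but the shape of the requirement is the same: isolation and usage accounting live inside the storage layer rather than in scripts bolted on beside it.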
This places the responsibility squarely on the service provider. Providers have to invest considerable time and effort developing, documenting, supporting, fixing, and patching a customized application to overcome the shortfall. Quality assurance is a challenge as the systems, or the software on those systems, change.
Legacy storage systems were simply not architected for multi-tenancy. With a lot of work, effort, and cost, multi-tenancy can be bolted on, but that work is not a one-time effort; it is ongoing. More importantly, the result tends not to adapt to the constantly changing customer base and requirements of the Cloud Service provider. In other words, this is not a viable long-term solution.
Highly Scalable From Very Small to Very Large
It is the ability to scale to extraordinary levels, from dozens of petabytes to exabytes, that makes the Cloud Services business model viable. That scale gives service providers the economies of scale that make their business case compelling. Storage is one of the key components of a Cloud service’s scalability.
Ask any storage vendor whether its storage is highly scalable and the answer will be a vigorous yes. But what does “highly scalable” mean? How is it defined? More often than not, it is a matter of degree shaped by experience and organizational requirements. What is highly scalable for an enterprise is not even close for a Cloud Services operation, and it is overkill for an SMB.
Pay For Use Paradigm
Cloud Services are marketed and purchased as an operating expense. Pricing is per use and/or per user. Cloud storage is priced the same way, per GB used per month. Customers pay only for the resources they use. This is a complete paradigm shift from the way legacy storage is marketed and purchased. The traditional storage model prices all of the storage required over a three- or four-year period, plus all of the software and even the maintenance, in one bundled up-front price. In addition, these storage systems calculate costs on raw capacity, not usable capacity. The two models are misaligned, like a square peg in a round hole.
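The mismatch between the two pricing models is easiest to see with simple arithmetic. This Python sketch compares an up-front purchase sized for forecast peak demand and priced on raw capacity against a per-GB-used-per-month model. Every number in it (the usable-to-raw ratio, both prices, and the demand ramp) is a hypothetical input chosen only to show the mechanics, not data from any vendor.

```python
def raw_gb_needed(peak_used_gb: float, usable_fraction: float) -> float:
    """Raw capacity to buy up front so that usable capacity covers forecast peak demand."""
    return peak_used_gb / usable_fraction


def upfront_total(price_per_raw_gb: float, raw_gb: float) -> float:
    """Traditional model: the whole term's capacity is priced on raw GB and paid on day one."""
    return price_per_raw_gb * raw_gb


def pay_per_use_total(price_per_used_gb_month: float, monthly_used_gb: list[float]) -> float:
    """Pay-per-use model: each month is billed only on the GB actually used."""
    return sum(price_per_used_gb_month * used for used in monthly_used_gb)


def cost_per_used_gb_month(total_paid: float, monthly_used_gb: list[float]) -> float:
    """What each GB-month the provider can actually resell effectively cost."""
    return total_paid / sum(monthly_used_gb)


# Hypothetical 36-month term with demand ramping from 10 TB to 360 TB
used = [10_000 * (month + 1) for month in range(36)]
raw = raw_gb_needed(max(used), usable_fraction=0.75)  # assumed usable/raw ratio
upfront = upfront_total(0.50, raw)                    # assumed $ per raw GB
metered = pay_per_use_total(0.03, used)               # assumed $ per used GB-month
print(cost_per_used_gb_month(upfront, used), cost_per_used_gb_month(metered, used))
```

Which model comes out cheaper depends entirely on the assumed prices. The structural point is that the up-front total is fixed by the forecast peak and the raw-to-usable ratio, so if demand arrives late or never, the up-front cost does not change, while the metered cost falls with actual use.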
That traditional purchasing model places all of the risk on the Cloud
Service providers. First the service provider must accurately forecast
how much storage they will need over a specific timeframe. This is
exceedingly difficult for service providers that have a “lumpy”
customer and revenue stream. Then they have to hope they can sell
adequate services to cover the costs of their upfront storage
investment. The storage vendors share none of the risk.
This legacy model is incompatible with the Cloud Service provider’s go-to-market strategy. It is not necessarily a death knell for the business; however, it places the business in significant jeopardy.
Loosely Coupled Service
Loosely coupled services are designed to leverage commodity components. By definition, a loosely coupled service has no dependencies on specific hardware, so components can be moved or changed transparently, easily, and without disruption. This concept is at the core of Cloud Services. It allows the service provider to use low-cost, highly reliable commodity components to provide a highly resilient, high-performance service. The ability to provide application continuity even as hardware is upgraded, replaced, or modified is one of the key selling points of Cloud Services. No user downtime is the key.
Legacy storage is the antithesis of a loosely coupled service. Data is tightly coupled to systems, volumes, file systems, drives, namespaces, and so on, which creates a highly deterministic, manually labor-intensive environment. Increased labor costs are not the recipe for a successful Cloud Services business. Even worse, the tight coupling is to high-cost proprietary hardware and software that locks in the customer unless it can tolerate the very high cost of moving away from these proprietary, rigid, inflexible architectures.
This is especially unfortunate during hardware and/or software refresh cycles. Replacing and refreshing legacy storage systems is terribly disruptive to applications. It is also a major, expensive time sink. Replacing and migrating a SAN storage system takes 43 distinct, manually labor-intensive steps, with multiple tasks within each step (39 steps for NAS). Many of these steps, such as server remediation, are intensely error-prone. Data can be, and often is, corrupted. And all of these steps require huge amounts of time, people, coordination, cooperation, and communication. That time is measured in months or even years, not hours or days. Add the cost of professional services, plus storage system overlap costs (both storage systems on the floor, powered, and paid for at the same time while only one is in use), and the fact that this refresh cycle recurs every 3 to 4 years, and the result is a cost model that is untenable for Cloud Services.
Transparent Data Resilience
The trade press and the Internet carry numerous headline-grabbing stories about Cloud Service outages or breaches. These headlines are always followed by speculative blogs about whether Cloud Services will survive. This is a common occurrence and standard operating procedure for all paradigm-shifting technologies. It happened with SAN storage, NAS, the World Wide Web, and now Cloud Services.
All IT systems fail. When they fail in a private data center with limited visibility to the outside world, there are no headlines. Private data center outages and security breaches occur far more frequently than is commonly known, yet they are reported far less often than those of public Cloud Services. Market perception historically lags technical reality. Nevertheless, Cloud Service providers strive to mitigate or eliminate any outages or security breaches, because in the market, perception is reality. A simple Internet search on the root causes of many of these very public failures shows human error as the primary cause. Examples include a misconfiguration, an inappropriate parameter setting, a false assumption, an incorrect policy, and more. These are manual, labor-intensive tasks, exactly the kinds of tasks that are so prevalent with legacy storage systems.
It is incredibly difficult for Cloud Service providers to minimize outages and security breaches when using legacy storage, because legacy storage increases the potential for human error.
Cloud Services require transparent data resiliency: customer data must remain available, with no noticeable decline in performance or access, even when data is lost, corrupted, or destroyed for whatever reason.
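What “transparent” means in practice can be sketched as a read path that verifies each copy and silently falls back to another replica when one is missing or corrupt, so nobody has to remount a file system or repoint a LUN. This is a minimal Python sketch assuming replicas modeled as plain dictionaries and a SHA-256 checksum recorded at write time. The `write` and `transparent_read` helpers are hypothetical and do not represent any particular product’s read path.

```python
import hashlib


class DataUnavailable(Exception):
    """Raised only when every replica is missing or corrupt."""


def write(replicas: list[dict], key: str, data: bytes) -> str:
    """Store the same bytes on every replica and return the checksum to keep with the key."""
    for replica in replicas:
        replica[key] = data
    return hashlib.sha256(data).hexdigest()


def transparent_read(replicas: list[dict], key: str, checksum: str) -> bytes:
    """Return the first replica copy whose checksum verifies.

    Missing or corrupted copies are skipped, so the caller sees no difference
    as long as at least one good copy survives.
    """
    for replica in replicas:
        data = replica.get(key)
        if data is not None and hashlib.sha256(data).hexdigest() == checksum:
            return data
    raise DataUnavailable(key)
```

A production system would also queue the bad copies for repair in the background; the sketch shows only the part the customer experiences, which is that the read succeeds.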
Simple to Implement, Operate, and Manage
Putting storage expertise into the storage system instead of the administrator reduces workloads, tasks, errors, time, expense, and people. Sensible as this may be in general, for Cloud Services it is a hard-and-fast requirement. Lean and flexible is the name of the game when it comes to Cloud Services.
Legacy storage systems have historically been anything but simple or intuitive. They are getting better. Driven by customers who lack the expertise that was expected in the past, vendors have overlaid multiple layers of management and functionality on these systems to simplify implementation, operations, and management. However, when it comes to making changes, these layers are like a Russian Matryoshka (nesting) doll. Staying simple means staying static. Dynamic changes, when they are possible at all, are often difficult, require extensive expertise and/or expensive professional services, and do not happen in real time.
Cloud Services are never static. Put bluntly, legacy storage systems cannot meet Cloud Services
requirements.
Real-World Work-Arounds and Why They’re Inadequate
Upon discovering a difficult problem, it is human nature to figure out a solution or workaround. Cloud Services storage admins commonly attempt a number of workarounds to the legacy storage problem.
For multi-tenant billing, they try third-party software or write their own scripts. But as pointed out previously, these efforts often founder in a sea of frustration caused by a lack of ongoing QA and documentation and by their inherent inflexibility.
Another work-around for multi-tenant security is drive encryption. Drive encryption does not prevent
different customers or services from having access to the drive. It only means data written to the drive is
encrypted at rest. If the drive or drives are accessible through the storage system, the data can be
accessed.
The usual workarounds for capacity scaling issues, and to a lesser extent object and performance scaling issues, are storage system sprawl or scale-out storage. As previously discussed, storage system sprawl is a non-starter for Cloud Service providers. Scale-out storage, especially scale-out NAS, has been gaining in popularity; however, scale-out storage has its own issues. First, each additional node in the cluster yields diminishing marginal returns: its capacity, performance, and object gains are smaller than those of the node before. Eventually, additional nodes reduce scalability. Capacity typically tops out in the low double-digit petabytes, a couple of orders of magnitude too low. Object counts are still an issue, typically topping out in the tens of millions, not billions. And the total cost of ownership is still too high for ultimate Cloud Service provider success.
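The diminishing-returns pattern described above, including the point where more nodes make things worse, is commonly modeled with Neil Gunther’s Universal Scalability Law (USL). The USL is used here as a stand-in; the paper does not name a model. Relative capacity for N nodes is C(N) = N / (1 + σ(N - 1) + κN(N - 1)), where σ captures contention and κ captures coherency (crosstalk) cost, and throughput peaks at N* = sqrt((1 - σ) / κ). The σ and κ values below are hypothetical and are not measurements of any scale-out product.

```python
def usl_capacity(n: int, sigma: float, kappa: float) -> float:
    """Relative throughput of n nodes under the Universal Scalability Law."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def marginal_gains(max_nodes: int, sigma: float, kappa: float) -> list[float]:
    """Gain contributed by node n, for n = 2..max_nodes."""
    return [usl_capacity(n, sigma, kappa) - usl_capacity(n - 1, sigma, kappa)
            for n in range(2, max_nodes + 1)]


def peak_nodes(sigma: float, kappa: float) -> float:
    """Node count where USL throughput peaks; past it, adding nodes lowers throughput."""
    return ((1 - sigma) / kappa) ** 0.5


# Hypothetical coefficients: 2% contention, 0.05% coherency cost
for n in (1, 10, 40, 80):
    print(n, round(usl_capacity(n, 0.02, 0.0005), 2))
print("peak near", round(peak_nodes(0.02, 0.0005)), "nodes")
```

With any nonzero κ, each added node eventually contributes less than the one before, and past N* the marginal gain turns negative. This is the behavior the paragraph describes.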
Workarounds for performance scaling also include using flash SSDs as cache or as a Tier 0 storage tier, and HDD short-stroking as a cheaper alternative. These workarounds do scale performance within the limits of the storage system, but at a very steep price, making the cost per unit of performance too high for Cloud Services providers to be cost-competitive.
A common workaround for the lack of a pay-per-use model is leasing. Leasing turns all that up-front CapEx into a monthly payment. It does not reduce risk, and it ultimately costs more. Leasing has multiple components: the residual value of the storage system at the end of the lease period; the amount financed (the difference between the sale price and the residual); the interest on the residual value during the lease period; leasing fees; and potential penalties for running past the end of the lease because of data migration delays. Leasing is not a good alternative to pay-per-use.
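The lease components listed above can be summed to show why leasing ultimately costs more than the system itself. This Python sketch follows the component list as written. It assumes simple (non-compounding) interest charged on the residual value over the term, fees as a lump sum, and overrun months billed at the regular monthly payment. All of these, and every input value, are hypothetical; real lease structures vary by lessor. The comparison also ignores the provider’s own cost of capital when buying outright.

```python
def lease_cost(price: float, residual: float, annual_rate: float, term_months: int,
               fees: float, overrun_months: int = 0) -> dict[str, float]:
    """Sum the lease components (all inputs hypothetical)."""
    financed = price - residual                           # amount paid down over the term
    interest = residual * annual_rate * term_months / 12  # assumed simple interest on the residual
    monthly = (financed + interest) / term_months         # regular monthly payment
    overrun = monthly * overrun_months                    # assumed: extra months at the same rate
    total = financed + interest + fees + overrun
    return {"monthly_payment": monthly, "total": total}


def lease_premium(price: float, residual: float, lease_total: float) -> float:
    """Extra paid versus buying outright and reselling at the residual value."""
    return lease_total - (price - residual)


# Hypothetical $1M system, 20% residual, 6% rate, 36 months, $5k fees, 3-month migration overrun
result = lease_cost(1_000_000, 200_000, 0.06, 36, 5_000, overrun_months=3)
print(result["monthly_payment"], lease_premium(1_000_000, 200_000, result["total"]))
```

Under these assumptions, the premium equals interest plus fees plus any overrun. It is always positive, which is the paper’s point: leasing spreads the payment out but adds cost rather than removing risk.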
To limit the negative impact of legacy storage’s tight coupling, many storage admins implement server virtualization with a storage virtualization layer. This allows a bit more flexibility by loosening the bonds between the application and the data. It does so at a very steep cost: duplicate, identical storage systems, multiple copies of the data on expensive tier 1 storage, and redundant supporting infrastructure.
Working around the transparent data resilience issues is more complicated and more expensive. It comes down to copying and replicating data, with multiple copies on multiple legacy storage systems. Making those copies transparent is harder. When data is lost, corrupted, misplaced, or deleted, an alternate copy must be mounted on the application’s file system (NAS), or the application must be pointed to the correct LUN (SAN or DAS). Both are manual, labor-intensive tasks and far from transparent: a costly exercise in time, people, infrastructure, and storage.
Managing around legacy storage system complexity usually means homegrown scripts that are rarely documented, QA’d, updated, or kept in step with ongoing system changes. They tend to be inflexible, with a limited shelf life. When personnel leave, the scripts have to be completely rewritten.
Organic Storage Meets or Exceeds Requirements
Organic Storage is living storage. It mimics the way life adapts to a constantly changing analog world. It is a loosely coupled grid of self-contained nodes, each with shared processing, I/O, and storage. Nodes are interconnected over a TCP/IP Ethernet network, allowing equal access to, and on-demand allocation of, resources. By leveraging object storage (blocks or files stored with their metadata and index as a single object), Organic Storage is able to scale to unprecedented capacity (hundreds of petabytes to exabytes), object counts (many billions), and performance (tens to hundreds of millions of IOPS). Organic Storage has numerous significant advantages for Cloud Service providers, including:
• Built-in multi-tenancy and security.
• Real-time adaptation to changing demands, loads, and performance.
• Distribution over many independent, commodity off-the-shelf components.
• Self-healing at extraordinary new levels of resiliency, handling failures of any component, or of multiple components, with no material impact on performance, functionality, or availability, because the Organic Storage software is loosely coupled with the hardware.
• Scaling of capacity, objects, and/or performance in small or large increments without limitations.
• Online addition of nodes that always increases capacity and performance, without ever stopping the system.
• A pay-per-use licensing paradigm: licensed on utilized storage on a per-month basis, sharing the risk with the Cloud Service provider.
• Storage refresh analogous to the way organic systems replace their cells: continuously, progressively, transparently, and without application disruption.
This makes Organic Storage ideal for Cloud Services providers. It is the only type of storage that meets or
exceeds all Cloud Service operational and financial requirements.
Scality RING: The Leading Organic Storage System
Scality RING was architected from the ground up as Organic Storage to exceed Cloud Service requirements. It is analogous to an organic autonomic nervous system, actively and adaptively managing the what, where, when, why, and how of storage and retrieval without human (conscious) intervention. Scality RING leverages its unique, industry-hardened peer-to-peer technology to provide carrier-grade service availability and data reliability.
RING Organic Storage is made up of standard off-the-shelf commodity server nodes. Each node on the RING is responsible for its own piece of the overall storage puzzle, and every node is functionally equivalent. A node can fail, be removed, or be upgraded, or new nodes can be added, and the system rebalances automatically without human intervention. This makes technology refresh a simple, online process with no application disruption, eliminating data migration, long nights, and sleepless weekends.
There is no requirement for a master database or directory. Instead, the RING uses a highly efficient Distributed Hash Table (DHT) algorithm that consistently maps a particular key to a set of storage nodes. A DHT provides a lookup service similar to a hash table: (key, value) pairs are stored in the DHT, and any participating node can retrieve the value associated with a given key. Keys embed information about class of service. Each node is autonomous and responsible for checking consistency and rebuilding replicas automatically for its keys. Responsibility for maintaining the mapping from keys to values is distributed among the nodes in such a way that a change in the set of participants causes minimal disruption. This allows DHTs to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
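The key-to-nodes mapping described above can be illustrated with a minimal consistent-hashing sketch in Python. It assumes a 32-bit ring derived from SHA-1, one ring position per node, and a hypothetical key format whose `#<cos>` suffix selects the replica count through an assumed `REPLICAS_FOR_COS` table. It shows the general DHT technique, not Scality’s actual key layout or placement algorithm.

```python
import bisect
import hashlib

# Hypothetical class-of-service codes mapped to the number of replicas kept
REPLICAS_FOR_COS = {0: 1, 1: 2, 2: 3}


def make_key(name: str, cos: int) -> str:
    """Hypothetical key format that embeds the class of service as a suffix."""
    return f"{name}#{cos}"


def cos_of(key: str) -> int:
    """Recover the class of service embedded in a key."""
    return int(key.rsplit("#", 1)[1])


def ring_position(label: str) -> int:
    """Map any string onto a 32-bit ring using the first 4 bytes of its SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(label.encode()).digest()[:4], "big")


class HashRing:
    """Every participant builds the same ring from the same membership list."""

    def __init__(self, nodes: list[str]):
        # Sorted (position, node) pairs; a node owns keys that fall just before it
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._ring]

    def nodes_for(self, key: str) -> list[str]:
        """Walk clockwise from the key's position and take the next N distinct nodes."""
        count = min(REPLICAS_FOR_COS[cos_of(key)], len(self._ring))
        start = bisect.bisect_right(self._positions, ring_position(key))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing([f"node{i}" for i in range(8)])
print(ring.nodes_for(make_key("photos/cat.jpg", cos=2)))  # three distinct nodes
```

Because placement is a pure function of the key and the membership list, any node can compute where a key lives without consulting a master directory.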
DHT decentralization is the key to consistent performance that scales linearly. The nodes collectively form the system without any central coordination, bottlenecks, or single points of failure. This provides performance that rivals the fastest SANs (without any SAN complexity or cost), even though applications connect to the RING through a very simple, standardized REST interface. Loads are always evenly distributed and balanced across nodes. DHT decentralization also enables unrivaled capacity and object scalability. The RING scales from dozens to thousands of nodes, tens of petabytes to exabytes, and billions of objects.
The RING’s DHT comes with unsurpassed built-in data resilience, similar to an organic immune system. Every node constantly monitors a limited number of its peers and automatically rebalances replicas and load, making the system fully self-healing without human intervention. Consistent hashing guarantees that only a small subset of keys is ever affected by a node failure or removal. The result is a very high level of fault tolerance, because the system stays reliable even with nodes continuously joining, leaving, or failing.
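The claim that consistent hashing limits the impact of a node failure can be checked empirically. This self-contained Python sketch assumes SHA-1-based ring positions and 100 virtual points per node, both illustrative choices. It assigns 10,000 keys on a 10-node ring, removes one node, and counts how many keys change owner. Only keys that the departed node owned move, roughly 1/N of the total. A naive `hash(key) mod N` placement, included for comparison, remaps the large majority of keys.

```python
import bisect
import hashlib


def pos(label: str) -> int:
    """64-bit ring position taken from a SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(label.encode()).digest()[:8], "big")


def make_lookup(nodes: list[str], vnodes: int = 100):
    """Consistent-hash owner lookup; each node is placed at `vnodes` ring points."""
    ring = sorted((pos(f"{n}/{v}"), n) for n in nodes for v in range(vnodes))
    positions = [p for p, _ in ring]

    def owner(key: str) -> str:
        i = bisect.bisect_right(positions, pos(key)) % len(ring)
        return ring[i][1]

    return owner


def modulo_lookup(nodes: list[str]):
    """Naive placement for comparison: hash(key) mod N."""
    return lambda key: nodes[pos(key) % len(nodes)]


def moved_fraction(keys: list[str], before, after) -> float:
    """Share of keys whose owner changes between two placements."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


nodes = [f"node{i}" for i in range(10)]
keys = [f"obj-{i}" for i in range(10_000)]
print("consistent hashing:", moved_fraction(keys, make_lookup(nodes), make_lookup(nodes[:-1])))
print("hash mod N:", moved_fraction(keys, modulo_lookup(nodes), modulo_lookup(nodes[:-1])))
```

Virtual points are a common way to even out how much of the ring each node owns. Removing a node deletes only its own points, so every other key keeps the same successor and does not move.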
Advanced key calculation algorithms allow modeling of any kind of geographically aware replication deployment. Data can be spread across racks or data centers to follow business policies, even in the case of server failures.
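Geographically aware placement can be layered on a consistent-hashing ring walk. Here is a minimal Python sketch, assuming each node is tagged with a hypothetical data-center label. It walks clockwise from the key and skips nodes whose site already holds a replica, so copies land in distinct data centers whenever enough sites exist. If there are fewer sites than copies, it falls back to any remaining node. The node names, site labels, and fallback rule are assumptions for illustration, not the RING’s key calculation algorithm.

```python
import bisect
import hashlib


def pos(label: str) -> int:
    """64-bit ring position taken from a SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(label.encode()).digest()[:8], "big")


def place_replicas(key: str, node_site: dict[str, str], copies: int) -> list[str]:
    """Choose `copies` nodes clockwise from the key, one per site while possible."""
    ring = sorted((pos(n), n) for n in node_site)
    copies = min(copies, len(ring))
    start = bisect.bisect_right([p for p, _ in ring], pos(key))
    walk = [ring[(start + i) % len(ring)][1] for i in range(len(ring))]

    chosen, used_sites = [], set()
    for node in walk:  # first pass: at most one replica per site
        if len(chosen) == copies:
            break
        if node_site[node] not in used_sites:
            chosen.append(node)
            used_sites.add(node_site[node])
    for node in walk:  # second pass: only needed when there are fewer sites than copies
        if len(chosen) == copies:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


# Hypothetical layout: nine nodes spread over three data centers
sites = {f"n{i}": f"dc{i % 3}" for i in range(9)}
print(place_replicas("invoice-42", sites, 3))  # one node in each data center
```

Racks can be handled the same way by using a rack label, or a combined site/rack label, in place of the data-center tag.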
In addition to meeting or exceeding Cloud Service provider operational requirements, Scality RING is software licensed on a pay-per-use model. Service providers pay on a used-capacity basis, not raw. And because Scality RING uses standard off-the-shelf commodity servers obtainable from the service provider’s vendor of choice, it always provides the lowest possible TCO.
Conclusion
The requirements of Cloud Services are unique. Legacy, rigid, deterministic storage does not and cannot meet these requirements. What is needed is Organic Storage that mimics the way living things adapt to ever-changing conditions while matching Cloud Service provider business models.