Category Archives: Cloud Storage
The best NAS drives in 2023 – Creative Bloq
Network Attached Storage (or NAS for short) at first glance seems a rather old-school way of backing up your files and creative work. Essentially, network attached storage is a mini server: a bunch of hard drives connected together within a single enclosure that is then connected to your network, usually your home internet setup, so that anyone else on that network, wired or otherwise, can access the files on those hard drives. Many NAS systems come with their own software, so managing, backing up and maintaining files is a very straightforward process.
As creatives, we have a lot of work floating around: different versions of files, different linked assets, and different catalogues and directories that all need backing up and storing somewhere that is quick to access. This is one area where NAS excels: not only can it be safer than cloud storage from a security perspective, with RAID redundancy guarding against drive failure, it also provides lightning-quick read and write speeds, so it can deal with large files with ease.
In this round-up of some of the best NAS drives for creatives on the market at the moment, we'll be looking at a range of specifications, prices and setups, as well as assessing which types of NAS drives suit which creative discipline. We'll also look at extra added features and ease of setup, which is an important aspect of recommending each product. It's worth bearing in mind that many of the NAS drives on the market don't come with hard drives (some we've looked at are diskless, while others ship with drives included). Factor the additional cost of NAS hard drives (although standard HDDs work too) into the budget when looking at the ones we've featured below.
01. Synology DiskStation DS1522+
The best NAS drive for creatives right now
Drives: 15
CPU: AMD Ryzen R1600
Max storage: 108TB
Weight: 4.7kg
Excellent speed of data access
Good expansion options
Expensive for the amateur user
Steep learning curve for casual operation
Many of the Synology DiskStation options offer exemplary performance for creatives, with the ability to work speedily on the fly, excellent transfer rates, a good amount of speed and memory and, in the case of the DS1522+, relatively decent value for money if you're a professional who needs to work quickly and has a lot of files to store.
Lots of the DiskStation systems can be upgraded, which also makes it a fairly decent failsafe option for the future. The DS1522+ can be expanded to 15 drives with DX517 expansion units, and there are also built-in M.2 NVMe slots for SSD caching, which means data can be accessed even more quickly during normal operation. As an all-rounder with plenty of room for expansion, if you have the budget the DiskStation range is hard to beat.
02. Western Digital My Cloud EX2 Ultra
The best NAS drive for those on a budget
Drives: 2
CPU: Marvell 1.2GHz
Max storage: 8TB
Weight: 2.3kg
Affordable
WD's My Cloud OS is easy to use
Good package deals available
Suited to casual use only
Not the fastest device around
For those on a budget, the Western Digital My Cloud EX2 Ultra provides an affordable yet very reliable NAS solution.
It features a user-friendly setup process, and with WD's My Cloud OS, managing files and accessing data remotely is fairly effortless for a casual user. Its dual-drive configuration and media streaming capabilities make it a solid choice for basic storage and sharing needs, whilst not breaking the bank. There are plenty of deals to be had with 8TB of hard drive space included for less than £400, which is less than the cost of many NAS units on their own.
03. Synology DS923+
The best NAS drive for pros
Drives: 4
CPU: AMD Ryzen R1600
Max storage: 120TB
Weight: 2.2kg
DSM interface works well
Easy to install and get going
Compact
Costly without the drives
Only works for 3.5in drives
This is a small diskless unit with a lot of flexibility for home offices or small businesses looking to scale up to more than a one-person operation. It's equipped with an AMD Ryzen R1600 dual-core processor offering performance up to 3.1GHz, as well as four bays for 2.5in or 3.5in drives. We liked the fact that five extra bays can be added with an expansion unit, and the device features two 1GbE Ethernet ports for simple integration into existing environments and network systems.
We found it offered impressive performance that ensured smooth multitasking and speedy data transfers, but at more than £500 RRP for the device alone, without hard drives, we felt it would probably be best suited to a larger work environment rather than a purely home backup system.
04. ASUSTOR AS1002T
The best NAS drive for beginners
Drives: 2
CPU: Realtek RTD1296 Quad-Core 1.4 GHz
Max storage: 18TB HDD x 2
Weight: 1.14kg
Easy to get set up
Attractive and quiet
Easy to fit into the home
Only 2 bays
Slower transfer speeds
36TB maximum size
If you're new to the world of NAS drives and you're simply looking for something to get you started that's going to be dependable and easy to use, we'd certainly recommend the ASUSTOR AS1002T.
The device boasts a straightforward setup process, an intuitive interface and entry-level pricing, which makes it a perfect starting point. Although it doesn't have the best expansion options and only provides dual drive bays, it can provide ample storage and basic multimedia features for a casual home user: for example, storing photos and movies to watch across devices. We like the stylish look too: it hides some of the more techy features away in a black diamond-plate cover design, and the drives themselves are tucked away behind a sliding cover.
05. Synology DiskStation DS3622xs+
The most powerful NAS drive you'll need
Drives: 12
CPU: Intel Xeon D-1531
Max storage: N/A
Weight: 9.8kg
Large scalability
Enormous storage capacity
Good for business environments
Bulkier than some other options
Expensive initial cost
Small and medium business environments are where the DiskStation DS3622xs+ shines. Featuring a powerful Xeon processor and up to 32GB of ECC RAM, it's the most powerful system on this list by a fairly long shot, and we also like the fact that it has a large number of connectivity options and supports up to 36 drives, if you so wish, with DX1222 expansion units.
We think this NAS drive would be ideal for resource-intensive tasks, large-scale storage and demanding enterprise deployments that need to scale up their capacity.
06. Buffalo LinkStation LS220D
The easiest NAS drive to use
Drives: 2
CPU: Marvell ARMADA 370 ARM 800MHz single core
Max storage: 2, 4, 6 or 8TB
Weight: 1.7kg
Easy to use
Plug-and-play with hard drive included
Good media server and torrent client built-in
USB 2.0 not 3.0
Not a massive amount of storage
Simplicity is key for some users, and the Buffalo LinkStation LS220D excels in this regard, while also offering some surprisingly good features, such as RAID drive mirroring, which gives you extra peace of mind in the event of one drive failing.
It doesn't necessarily have the most capacity, but what it does do well is its plug-and-play setup, which, combined with a user-friendly interface, makes it one of the easiest NAS drives to use. While it may lack advanced features, its straightforward approach to file storage and sharing suits those looking for hassle-free operation. There are a number of sizes available, but we'd suggest 8TB for most creatives with a decent amount of work to back up.
07. Terramaster F2-223
The best value NAS drive for your money
Drives: 2
CPU: Celeron J3355
Max storage: 40TB
Weight: 1.5kg
Good value without too much compromise
Fast enough to work on intensive files
4K video storage
Lower build quality than top models
Online support could be better
Call the security – Google's Nest cameras just got a massive … – TechRadar
Google Nest Cams and Nest Doorbells just got a lot more expensive if you want access to all of their security features, with their subscription pricing going up by as much as 33%.
As spotted by 9to5Google, Nest Aware and Nest Aware Plus subscriptions are both getting a price bump that could make you think twice about paying for the benefits they bring, including cloud video histories and intelligent alerts.
In the US, the basic Nest Aware subscription will now cost $8 a month or $80 annually, which is quite a leap from the previous $6 a month / $60 annual pricing (33% to be precise). If you want Nest Aware Plus, which gives you 60 days of video history rather than 30 days, that'll now cost you $15 a month or $150 annually (a 25% price increase).
While Google has only notified its US customers about the price changes so far, they're likely a sign of what's to come in other regions, too. We've asked Google to confirm if this will be the case in the UK and what the new pricing will be. (Update 4/9/23: Google has confirmed that the new UK pricing for Nest Aware will be £6 a month / £60 a year, while Nest Aware Plus will be £12 a month / £120 a year).
The increased pricing goes into effect immediately for new subscribers in the US, with current Nest Aware and Nest Aware Plus customers getting the unwelcome price increase in their next bill from November 6.
Those two subscription services effectively replaced the Nest 1st generation plan back in May 2020, which started at only $5 per month and gave you five days of 24/7 rolling cloud video storage (a feature that's now only available with Nest Aware Plus).
On the plus side, the current Nest Aware and Nest Aware Plus plans do cover all of your Google Nest devices (unlike its previous per-device plans). So if you have multiple cameras or doorbells then one (albeit pricey) subscription will cover all of them.
Update 4/9/23: Google has pointed us towards its Help Centre page for the Nest price increase, which confirms that the "price of existing Nest Aware subscriptions will increase in the US, UK, and Australia in Fall 2023".
Subscription price increases aren't exactly news to most tech fans this year: we've seen everything from Netflix and Disney Plus to Spotify and PlayStation Plus get a lot more expensive.
On the Nest Aware and Nest Aware Plus rises, Google has only vaguely explained that "subscription prices can change to keep up with market shifts, which can include inflation and local tax updates."
But the reality is that cloud storage, one of the main benefits of getting a subscription for your security camera, has broadly been getting pricier, with Google Cloud becoming a lot more expensive last year. So it was only a matter of time before cloud storage for security cameras was also given a price hike.
Nest Aware and Nest Aware Plus subscriptions were already among the priciest attached to the best home security cameras, with the likes of Ring, Arlo, Blink and now Philips Hue all offering cheaper basic plans alongside their own 'Plus' options. So that could be a factor for buyers to consider in the long-running Ring vs Nest debate.
Still, Google's plans also cover an unlimited number of devices, so they could still be worthwhile if you have or plan to get a large number of Google-made cameras, doorbells, speakers and displays that all support its smart alerts.
Nutanix’s looming profitability helped by Cisco deal, Broadcom … – Blocks and Files
Interview. Hyper-converged infrastructure supplier Nutanix has good prospects with a newly minted Cisco partnership, current Nvidia relationship, and looming profitability.
Update: Nutanix confirmed it doesn't support GPUDirect but is considering future support. 8 Sept 2023.
The firm's software-defined infrastructure software virtualizes on-prem servers and their storage and network connections, as well as running in the public clouds. It creates a hybrid and multi-cloud data platform on which to run applications and is the main alternative to VMware with its vSAN and VMware Cloud Foundation offerings.
Broadcom's pending acquisition of VMware has raised doubts about VMware's future development strategy and general situation. Nutanix is poised to capitalize on such customer concerns, has set up a new route to market with Cisco, and has a strong relationship with Nvidia that's relevant to customers looking to develop their AI/ML capabilities. These three factors are combining with Nutanix's improvement in its business operating efficiency and direction to bring profitability and maturity to Nutanix.
We asked Nutanix CEO Rajiv Ramswami some questions about these topics, and edited the overall conversation for readability.
Blocks & Files: I wondered whether you thought Nutanix and Dell were both capitalizing on, or benefiting from, doubts over Broadcom's acquisition of VMware?
Rajiv Ramswami: I would say it's still early days for us on that. We've certainly seen Dell change their positioning from how they used to lead with VxRail before. Not anymore. Now, it's much more PowerFlex. And we've certainly seen that happen over the last year.
Now, I think, we certainly are seeing a lot of interest from customers. There's no doubt about that. And we've seen some deals starting to close as well, and probably some large ones, you know, a seven-figure ACV deal with a Fortune 500 company this last quarter. But what remains to be seen here is how many of these engagements actually result in a significant transaction for us [versus] using us as leverage to just try to extract more from VMware when it comes to a price negotiation.
Also, many of these customers have signed up for multi-year deals with VMware to protect themselves prior to the Broadcom deal closing. I think long term this is definitely going to be in our favor. We will see more opportunities as a result of this.
Blocks & Files: How do you see Nutanix competing with external storage vendors? My suspicion is that customers decide whether they're going to use hyperconverged infrastructure, or not, and then acquire separate compute, separate networking, and separate storage. And you come in after they've made that decision. Is that right?
Rajiv Ramswami: So I would agree on the first part of what you said, but not on the second part. The customers have those two choices. They can stick with traditional three-tier systems (separate compute, storage, and network) or they can go with hyperconverged, but we're a big factor in helping them influence that decision.
It's not like they make that decision up front and then they just say, OK, I've decided to go HCI and now we'll look at Nutanix. We are an integral part of saying, look at the benefits of one versus the other. In fact, part of our selling motion is: hey, we can do this better than three-tier. And here's why: we can produce a better total cost of ownership, we can get comparable, if not better, performance than many of these storage arrays, and we can also provide a platform for hybrid cloud.
So we go into that motion as actually a core selling motion. We influence a customer's choice of whether they go with a traditional array, or they try and come to HCI.
Blocks & Files: Do subscription deals like HPE's GreenLake change this?
Rajiv Ramswami: We have actually had deals through HPE and GreenLake, where we are part of that solution as well with hyperconverged. Now, GreenLake to me doesn't change the dynamic of HCI [versus] separate management, compute, storage, network. It's just putting a subscription overlay on top of that. It doesn't really fundamentally change the dynamic of whether you go three-tier, or whether you go hyperconverged. In either scenario, you can put an overlay on top of it to consume it as a subscription, pay as you go, monthly or annually.
Blocks & Files: Would you be able to position Cisco and Nutanix versus Cisco's HyperFlex offering?
Rajiv Ramswami: We're the market leader when it comes to hyperconverged. And Cisco has tried with their own solutions for quite a while, many years, and the market share data clearly indicates that we are by far the market share leader.
The one thing about Cisco, and I spent many years of my life there, is that they understand what could have been in the market, and they want to be a market leader. They don't want to be a market follower.
So, I think that they made the right decision by saying, look, if we do this and partner with Nutanix, we can make a lot more in the market, with our customers. And it's the right thing for the customers because they are really likely to choose Nutanix. They're winning, so why not? Let's make that easier. Let's offer that as a solution. And that's what drove this relationship.
It's good for the customer, because now they get to buy a complete solution from Cisco.
Cisco is perfectly complementary to us. They don't have their own storage arrays and stuff like that, right? You can take your best-in-class from us and best-in-class from them, put it together and really get a winning solution in the market. So it makes sense for the customer, it makes sense for Cisco, it makes sense for us.
Blocks & Files: Could you give your view on Nvidia GPUs in Nutanix's GPT-in-a-Box offering?
Rajiv Ramswami: GPT-in-a-Box runs on top of our standard qualified hardware platforms, which are servers with GPUs, Nvidia GPUs. As part of this we are virtualizing the GPUs. We are making them accessible. We also support the full immediate GPU asset. And, having been certified as a partner by Nvidia, it's very much an integral part of the offering.
Blocks & Files:When do you see Nutanix becoming profitable in a GAAP sense? In the next 12 months?
Rajiv Ramswami: We are certainly profitable on a non-GAAP basis. And we're generating good free cash flow, ten times more free cash flow this year. The next milestone for us clearly is GAAP profitability. And if you look at the primary difference for us between non-GAAP and GAAP, it is stock-based compensation. And we have been working over the last several years to bring down stock-based compensation as a function of revenue.
We ask you to hold that question till our next Investor Day. We have one coming up next month. And that's when we plan to give our investors longer-term views in terms of what the outlook looks like, including GAAP profitability.
Nutanix is in a favorable market situation. The Cisco partnership should bring in a substantial amount of extra business. Broadcom-VMware worries should also increase Nutanix sales over the next year or so. Its Nvidia partnership should help it ride the AI interest wave and pick up deals from that.
We predict Nutanix will be profitable in the next quarter or two.
Nutanix supports Nvidia's vGPU (virtual GPU) software, which virtualizes a GPU. It creates virtual GPUs that can be shared across multiple virtual machines (VMs) and accessed by any device, anywhere. The vGPU software enables multiple VMs to have simultaneous, direct access to a single physical GPU, using the same Nvidia graphics drivers that are deployed on non-virtualized operating systems.
Nvidia's GPUDirect enables network adapters and storage drives to read and write directly to and from GPU memory, without passing through the server host's CPU and memory as is the case with traditional storage I/O. This speeds data transfer between a server's storage (DAS or external SAN/filer) and a GPU's memory.
GPUDirect's component technologies, GPUDirect Storage, GPUDirect Remote Direct Memory Access (RDMA), GPUDirect Peer to Peer (P2P) and GPUDirect Video, are accessed through a set of APIs. The GPUDirect Storage facility enables a direct data path between local or remote storage, such as NVMe or NVMe over Fabrics (NVMe-oF), and GPU memory. It avoids the extra, time-consuming copies made by the host CPU into a so-called bounce buffer in the CPU's memory. Instead, a direct memory access (DMA) engine near the NIC or storage moves data on a direct path into or out of GPU memory.
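As a rough illustration of the difference, here is a minimal Python sketch that contrasts the two paths, assuming NVIDIA's KvikIO bindings (kvikio) and CuPy are installed and the file sits on a GDS-capable filesystem; the file name and size are hypothetical, and this is a sketch of the concept rather than any vendor's implementation.

```python
# Illustrative contrast between the traditional bounce-buffer path and a
# GPUDirect Storage-style direct read (assumes kvikio + CuPy + GDS filesystem).
import numpy as np
import cupy as cp
import kvikio

path, nbytes = "segment.bin", 1 << 20  # hypothetical data file and size

# Traditional path: storage -> host memory (bounce buffer) -> GPU memory.
host_buf = np.fromfile(path, dtype=np.uint8, count=nbytes)  # copy into host RAM
gpu_buf = cp.asarray(host_buf)                              # second copy over PCIe

# GPUDirect Storage path: DMA straight into GPU memory, no host-side copy.
gpu_direct = cp.empty(nbytes, dtype=cp.uint8)
f = kvikio.CuFile(path, "r")
f.read(gpu_direct)   # reads directly into the GPU buffer
f.close()
```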
VMware's recently announced Private AI Foundation with Nvidia includes VMware's vSAN Express Storage Architecture, which will provide NVMe storage and supports GPUDirect storage over RDMA, allowing for direct, and therefore faster, I/O transfer from storage to GPUs without CPU involvement. Nvidia GPUDirect partners with systems in production are DDN, Dell EMC, HPE, Hitachi, IBM, Kioxia, Liqid, Micron, NetApp, Samsung, ScaleFlux, SuperMicro, VAST Data and Weka. Pure Storage is not listed and neither is Nutanix.
We think that GPUDirect support will be announced by Nutanix in the future. A company spokesperson said: "Nutanix is currently evaluating future support for Nvidia's GPUDirect storage access protocol."
MoodConnect Partners with Storj to Launch Innovative Mental … – AiThority
In a groundbreaking move towards promoting mental well-being, MoodConnect, an Artificial Intelligence-driven mental health tech company, announced its strategic partnership with Storj, the leader in enterprise-grade, globally distributed cloud object storage. This alliance is set to unveil a state-of-the-art mental health tracker designed to help individuals and corporations capture, store, and share sentiment data securely.
MoodConnect empowers users to track, store and share their own emotion data from conversations and to securely possess this data for personal introspection, to share with healthcare professionals, with friends and family, or to gauge organizational sentiment within a company.
MoodConnect's AI-backed system coupled with Storj's distributed storage ensures robust data protection. Storj's distributed architecture encrypts files end-to-end and distributes them across tens of thousands of points of presence. This zero-trust security model helps protect against outages, ransomware, and data compromise while boosting global performance and reliability. Given the sensitive nature of mental health data, this collaboration reinforces the commitment both companies have towards maintaining user privacy.
"We're thrilled to fuse our AI mental health tracker with Storj's distributed storage solutions," said MoodConnect CEO Amy Ouzoonian. "Our primary aim is to make mental health tracking as commonplace as any health tracking while ensuring users have utmost confidence in the safety of their data."
"In its pursuit of expanding mental health tracking, MoodConnect will be procuring large amounts of sensitive data," said Storj CEO Ben Golub. "Using our enterprise-grade distributed architecture, which encrypts and then splits files into pieces that are stored across tens of thousands of uncorrelated nodes around the globe, MoodConnect can ensure that its users' valuable data is secured at the highest level."
In addition to this partnership, MoodConnect is welcoming a new member to its team. Robert Scoble, a name synonymous with cutting-edge technology and innovation, joins as AI Advisor. He will be bringing his extensive knowledge of AI/AR/VR to further enhance MoodConnect's offerings.
"I'm excited to be a part of MoodConnect's journey," remarked Scoble. "Mental health is a crucial aspect of human well-being, and using AI to aid this cause is not just innovative but deeply impactful."
MoodConnect's collaboration with Storj and the addition of Robert Scoble to its team signals a paradigm shift in how mental health can be approached, tracked, and managed in the digital age.
Speed of Apache Pinot at the Cost of Cloud Object Storage with … – InfoQ.com
Transcript
Neha Pawar: My name is Neha Pawar. I'm here to tell you about how we added tiered storage for Apache Pinot, enabling the speed of Apache Pinot at the cost of Cloud Object Storage. I'm going to start off by spending some time explaining why we did this. We'll talk about the different kinds of analytics databases, the kinds of data and use cases that they can handle. We'll dive deep into some internals of Apache Pinot. Then we will discuss why it was crucial for us to decrease the cost of Pinot while keeping the speed of Pinot. Finally, we'll talk in depth about how we implemented this.
Let's begin by talking about time, and the value of data. Events are the most valuable when they have just happened. They tell us more about what is true in the world at the moment. The value of an event tends to decline over time, because the world changes and that one event tells us less and less about what is true as time goes on. It's also the case that the recent real-time data tends to be queried more than the historical data. For instance, with recent data, you would build real-time analytics, anomaly detection, user facing analytics.
These are often served directly to the end users of your company, for example, profile view analytics, article analytics, restaurant analytics for owners, or feed analytics. Now imagine, if you're building such applications, they will typically come with a concurrency of millions of users, have to serve thousands of queries per second, and the SLAs will be stringent, just a few milliseconds. This puts pressure on those queries to be faster. It also justifies more investment in the infrastructure to support those queries.
Since recent events are more valuable, we can in effect spend more to query them. Historical data is queried less often than real-time data. For instance, with historical data, you would typically build metrics, reporting, dashboards, and use it for ad hoc analysis. You may also use it for user facing analytics. In general, your query volume will be much lower and less concurrent than with recent data. What we know for sure about historical data is that it is large, and it keeps getting bigger all the time. None of this means that latency somehow becomes unimportant. We will always want our database to be fast. It's just that with historical data, the cost becomes the dominating factor. To summarize, recent data is more valuable and extremely latency sensitive. Historical data is large, and tends to be cost sensitive.
Given you have two such kinds of data to handle, and have to manage the use cases that come with them, if you are tasked with choosing an analytics infrastructure for your organization, the considerations at the top of your mind are going to be cost, performance, and flexibility. You need systems which will be able to service the different kinds of workloads, while maintaining the query and freshness SLAs needed by these use cases. The other aspect is cost. You'll need a solution where the cost to serve is reasonable and the business value extracted justifies this cost. Lastly, you want a solution that is easy to operate and to configure, and also one that will fulfill a lot of your requirements together.
Now let's apply this to two categories of analytics databases that exist today. Firstly, the real-time analytics or the OLAP databases. For serving real-time data and user facing analytics, you will typically pick a system like Apache Pinot. There's also some other open source as well as proprietary systems, which can help serve real-time data, such as ClickHouse and Druid. Let's dig a little deeper into what Apache Pinot is. Apache Pinot is a distributed OLAP datastore that can provide ultra-low latency even at extremely high throughput.
It can ingest data from batch sources such as Hadoop, S3, Azure. It can also ingest directly from streaming sources such as Kafka, Kinesis, and so on. Most importantly, it can make this data available for querying in real-time. At the heart of the system is a columnar store, along with a variety of smart indexing and pre-aggregation techniques for low latency. These optimizations make Pinot a great fit for user facing real-time analytics, and even for applications like anomaly detection, dashboarding, and ad hoc data exploration.
Pinot was originally built at LinkedIn, and it powers a wide variety of applications there. If you've ever been on the linkedin.com website, there's a high chance you've already interacted with Pinot. Pinot powers LinkedIn's iconic "Who Viewed My Profile" application, and many other such applications, such as feed analytics, employee analytics, talent insights, and so on. Across all of LinkedIn, there are over 80 user facing products backed by Pinot, and they're serving queries at 250,000 queries per second, while maintaining strict milliseconds and sub-seconds latency SLAs.
Another great example is Uber Eats Restaurant Manager. This is an application created by Uber to provide restaurant owners with their orders data. On this dashboard, you can see sales metrics, missed orders, inaccurate orders in a real-time fashion, along with other things such as top selling menu items, menu item feedback, and so on. As you can imagine, to load this dashboard, we need to execute multiple complex OLAP queries, all executing concurrently. Multiply this with all the restaurant owners across the globe. This leads to several thousands of queries per second for the underlying database.
Another great example of the adoption of Pinot for user facing real-time analytics is at Stripe. There, Pinot is ingesting hundreds of megabytes per second from Kafka, and petabytes of data from S3, and serving queries at 200k queries per second, while maintaining sub-second p99 latency. It's being used to service a variety of use cases: for financial analysts, there is ledger analytics; then there are user facing dashboards built for merchants; there are also internal dashboards for engineers and data scientists.
The Apache Pinot open-source community is very active. We have over 3,000 members now, almost 3,500. We've seen adoption from a wide variety of companies in different sectors such as retail, finance, social media, advertising, and logistics, and they're all together pushing the boundaries of Pinot in speed, scale, and features. These are the numbers from one of the largest Pinot clusters today, where we have a million-plus events per second, serving queries at 250k queries per second, while maintaining strict milliseconds query latency.
To set some more context for the rest of the talk, let's take a brief look at Pinot's high-level architecture. The first component is the Pinot servers. This is the component that hosts the data and serves queries on the data that it hosts. Data in Pinot is stored in the form of segments. A segment is a portion of the data, packed with metadata, dictionaries and indexes in a columnar fashion. Then we have the brokers.
Brokers are the component that gets queries from the clients. They scatter them to the servers. The servers execute these queries for the portion of data that they host. They send the results back to the brokers. Then the brokers do a final merge and return the results back to the client. Finally, we have the controllers that control all the interactions and state of the cluster with the help of Zookeeper as a persistent metadata store, and Helix for state management.
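As a rough illustration of that scatter-gather flow, here is a minimal Python sketch in which plain dictionaries stand in for servers and their segments; none of these names are Pinot's actual internal APIs.

```python
# Broker fans a query out to servers, each server aggregates over the data it
# hosts, and the broker merges the partial results into the final answer.
from concurrent.futures import ThreadPoolExecutor

SERVERS = {
    "server-1": [{"region": "US", "impressions": 10}, {"region": "EU", "impressions": 7}],
    "server-2": [{"region": "US", "impressions": 3}],
    "server-3": [{"region": "APAC", "impressions": 5}],
}

def execute_on_server(rows, region):
    """Server-side work: filter the locally hosted data, return a partial sum."""
    return sum(r["impressions"] for r in rows if r["region"] == region)

def broker_query(region):
    """Broker-side work: scatter the query to every server, then merge."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda rows: execute_on_server(rows, region), SERVERS.values())
    return sum(partials)

print(broker_query("US"))  # -> 13
```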
Why is it that Pinot is able to support such real-time, low-latency, milliseconds-level queries? One of the main reasons is that it has a tightly coupled storage and compute architecture. The compute nodes used typically have a disk or SSD attached to store the data. The disk or SSD could be local storage, or it could be remote attached, like an EBS volume. The reason they are so fast is that for both of these, the access method is POSIX APIs.
The data is right there, so you can use techniques like memory mapping. As a result, accessing this data is really fast. It can be microseconds if you're using instance storage and milliseconds if you're using remote attached storage, say, an EBS volume. One thing to note here, though, is that the storage that we attach in such a model tends to be available only to the single instance to which it is attached. Now, let's assume that this storage has a cost factor of a dollar. What's the problem, then?
Let's see what happens when the data volume starts increasing by a lot. Say you started with just one compute node, which has 2 terabytes of storage. Assume that the monthly cost is $200 for compute and $200 for storage, so $400 in total. Let's say that your data volume grows 5x. To accommodate that, you can't just add only storage; there are limits on how much storage a single instance can be given. Plus, if you're using instance storage, it often just comes pre-configured, and you don't have much control over scaling that storage up or down for that instance. As a result, you have to provision the compute along with it. Cost will be $1,000 for storage and $1,000 for compute.
If your data grows 100x, again, that's an increase in both storage and compute. More often than not, you won't need all the compute that you're forcibly adding just to support the storage, as the increasing data volume doesn't necessarily translate to a proportional increase in query workload. You will end up paying for all this extra compute which could remain underutilized. Plus, this type of storage tends to be very expensive compared to some other storage options available, such as cloud object stores. That's because the storage comes with a very high-performance characteristic.
To summarize, in tightly coupled systems, you will have amazing latencies, but as your data volume grows, you will end up with a really high cost to serve. We have lost out on the cost aspect of our triangle of considerations.
Let's look at modern data warehouses, Query Federation technologies like Spark, Presto, and Trino. These saw the problem of combining storage and compute, so they went with a decoupled architecture, wherein they put storage into a cloud object store such as Amazon S3. This is basically the cheapest way you will ever store data. This is going to be as much as one-fifth of the cost of disk or SSD storage.
On the flip side, what were POSIX file system API calls which completed in microseconds, now became network calls, which can take thousands or maybe 10,000 times longer to complete. Naturally, we cannot use this for real-time data. We cannot use this to serve use cases like real-time analytics and user facing analytics. With decoupled systems, we traded off a lot of latency to save on cost, and now we are looking not so good on the performance aspect of our triangle. We have real-time systems that are fast and expensive, and then batch systems that are slow and cheap. What we ideally want is one system that can do both, but without infrastructure that actually supports this, data teams end up adding both systems into their data ecosystem.
They will keep the recent data in the real-time system and set an aggressive retention period so that the costs stay manageable. As the data times out of the real-time database, they'll migrate it to a storage decoupled system to manage the historical archive. With this, we're doing everything twice. We're maintaining two systems, often duplicating data processing logic. With that, we've lost on the flexibility aspect of our triangle.
Could we somehow have a true best of both worlds, where we get the speed of a tightly coupled real-time analytics system, are able to use cost-effective storage like a traditionally decoupled analytics system, and at the same time have the flexibility and simplicity of being able to use just one system and configure it in many ways? With this motivation in mind, we at StarTree set out to build tiered storage for Apache Pinot. With tiered storage, your Pinot cluster is no longer limited to just disk or SSD storage. We are no longer strictly tightly coupled.
You can have multiple tiers of storage with support for using a cloud object storage such as S3 as one of the storage tiers. You can configure exactly which portion of your data you want to keep locally, and which is offloaded to the cloud tier storage. One popular way to split data across local versus cloud is by data age. You could configure in your table, something like, I want data less than 30 days to be on disk, and the rest of it, I want it to go on to S3. Users can then query this entire table across the local and remote data like any other Pinot dataset. With this decoupling, you can now store as much data as you want in Pinot, without worrying about the cost. This is super flexible and configurable.
The threshold is dynamic, and can be changed at any point in time, and Pinot will automatically reflect the changes. You can still operate Pinot in fully tightly coupled mode, if you want, or in completely decoupled mode. Or go for a hybrid approach, where some nodes are still dedicated for local data, some nodes for remote data, and so on. To summarize, we saw that with tiered storage in Pinot, we have the flexibility of using a single system for your real-time data, as well as historical data without worrying about the cost spiraling out of control.
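A minimal sketch of that age-based split is shown below, with illustrative field names rather than Pinot's real table-config schema; the point is that the threshold is plain configuration that can be re-evaluated against existing segments at any time.

```python
from datetime import datetime, timedelta, timezone

# Illustrative tier configuration: recent segments stay on local SSD,
# everything older is served from a cloud object store.
TIER_CONFIG = {
    "local_tier": {"storage": "local_ssd", "max_age_days": 30},
    "cloud_tier": {"storage": "s3://my-bucket/pinot-segments/"},  # hypothetical bucket
}

def tier_for_segment(segment_end_time: datetime, config=TIER_CONFIG) -> str:
    """Decide which tier a segment belongs to, based purely on its age."""
    age = datetime.now(timezone.utc) - segment_end_time
    if age <= timedelta(days=config["local_tier"]["max_age_days"]):
        return "local_tier"
    return "cloud_tier"

print(tier_for_segment(datetime.now(timezone.utc) - timedelta(days=5)))   # -> local_tier
print(tier_for_segment(datetime.now(timezone.utc) - timedelta(days=45)))  # -> cloud_tier
```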
We didn't talk much about the third aspect yet, which is performance. Now that we're using cloud object storage, will the query latencies take a hit and enter the range of other decoupled systems? In the next few sections, we're going to go over in great detail how we approached the performance aspect for queries accessing data on the cloud storage. Until tiered storage, Pinot has been assuming that segments stay on the local disk. It memory mapped the segments to access the data quickly. To make things work with remote segments on S3, we extended the query engine to make it agnostic to the segment location.
Under the hood, we plugged in our own buffer implementation, so that during query execution we can read data from the remote store instead of local storage as needed. Making the queries work is just part of the story. We want to get the best of both worlds using Pinot. From the table, you can see that the latency to access segments on cloud object storage is a lot higher, and hence we began our pursuit to ensure we can keep the performance of Pinot in an acceptable range, so that people can keep using Pinot for the real-time user facing analytics use cases that they have been used to.
We began thinking about a few questions. Firstly, what is the data that should be read? We certainly don't need to read all of the segments that are available for any query. We may not even need to read all of the data inside a given segment. What exactly should we be reading? Second question was, when and how to read the data during the query execution? Should we wait until the query has been executed and we're actually processing a segment, or should we do some caching? Should we do some prefetching? What smartness can we apply there? In the following slides, I'll try to answer these questions by explaining some of the design choices we made along the way.
The first idea that we explored was lazy loading. This is a popular technique used by some other systems to solve tiered storage. In lazy loading, all of the data segments would be on the remote store to begin with, and each server will have to have some attached storage. When the first query comes in, it will check if the local instance storage has the segments that it needs. If it does not find the segments on there, those will be downloaded from the remote store during the query execution. Your first query will be slow, of course, because it has to download a lot of segments.
The hope is that the next query will need the same segments or most of the segments that you already have, and hence reuse what's already downloaded, making the second query execute very fast. Here, for what to fetch, we have done the entire segment. For when to fetch, we have done during the query execution. In typical OLAP workloads, your data will rarely ever be reusable across queries.
OLAP workloads come with arbitrary slice-and-dice point lookups across multiple time ranges and multiple user attributes. This means that more often than not, you won't be able to reuse the downloaded segment for the next query, which means we have to remove segments to make space for the ones needed by the new query, because instance storage is going to be limited. This will cause a lot of churn and downloads. Plus, in this approach, you are fetching the whole segment.
Most of the time, your query will not need all of the columns in the segment, so you will end up fetching a lot of excess data, which is going to be wasteful. Also, using lazy loading, the p99 or p99.9 query latency would be very bad, since there will always be some query that needs to download remote segments. Because of this, the lazy loading method was considered a strict no-go for OLAP use cases where consistent low latency is important. Instead of using lazy loading, or similar ideas like caching segments on local disks, we started to think about how to solve the worst case, that is, when the query has to read data from remote segments. Our hope was that by solving this, we could potentially guarantee consistent and predictable low latency for all queries.
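For reference, here is a minimal sketch of the lazy-loading pattern described (and rejected) above: whole segments are downloaded into a bounded local cache on first use and evicted to make room for the next query's segments. The fetch function is a stand-in for a real object-store download.

```python
from collections import OrderedDict

class LazySegmentCache:
    """Toy LRU cache of whole segments, downloaded lazily on first access."""
    def __init__(self, capacity, fetch_remote_segment):
        self.capacity = capacity
        self.fetch = fetch_remote_segment
        self.cache = OrderedDict()           # segment name -> segment bytes

    def get(self, segment):
        if segment in self.cache:
            self.cache.move_to_end(segment)  # fast path: already downloaded
            return self.cache[segment]
        data = self.fetch(segment)           # slow path: download during the query
        self.cache[segment] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used segment
        return data

cache = LazySegmentCache(capacity=2, fetch_remote_segment=lambda name: f"<bytes of {name}>")
print(cache.get("seg_1"))  # first query pays the full download cost
print(cache.get("seg_1"))  # a repeat query reuses the local copy, if it survived eviction
```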
Then, to answer the question of what should we fetch, given that we know we don't want to fetch the whole segment? We decided to take a deeper look at the Pinot segment format, to see if we could use the columnar nature of this database to our advantage. Here's an example of a Pinot segment file. Let's say we have columns like browser, region, country, and then some metric columns like impression, cost, and then our timestamp as well.
In Pinot, the segments are packed in a columnar fashion. One after the other, you're going to see all these columns lined up in this segment file called columns.psf. For each column as well, you will see specific, relevant data buffers. For example, you could have forward indexes, you could have dictionaries, and then some specialized indexes like inverted index, range index, and so on.
This segment format allowed us to be a lot more selective and specific when deciding what we wanted to read from the Pinot segment. We decided we would do a selective columnar fetch, so bringing back this diagram where we have a server and we have some segments in a cloud object store. If you get a query like select sum of impressions, and a filter on the region column, we are only interested in the region and impressions. That's all we'll fetch.
Further, we also know from the query plan that region is only needed to evaluate a filter, so we probably just need the dictionary and inverted index for that. Once we have the matching rows, for impressions we only need the dictionary and forward index. All other columns can be skipped. We used the range GET API, which is an API provided by S3, to pull out just the portions of the segment that we need: the specific index buffers, the region dictionary, the region inverted index, the impressions forward index, and the impressions dictionary.
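A minimal sketch of that selective fetch using S3 ranged GETs via boto3 follows; the bucket, key and byte offsets are hypothetical, and in a real segment this offset map would come from the segment's own metadata.

```python
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-pinot-tier", "myTable/segment_0/columns.psf"  # hypothetical location

# Hypothetical (offset, length) map for buffers inside the segment file.
BUFFER_OFFSETS = {
    "region.dict": (0, 4096),
    "region.inv_idx": (4096, 16384),
    "impressions.dict": (20480, 4096),
    "impressions.fwd_idx": (24576, 65536),
}

def fetch_buffer(name):
    """Fetch only the byte range of one index buffer, not the whole segment."""
    start, length = BUFFER_OFFSETS[name]
    resp = s3.get_object(Bucket=BUCKET, Key=KEY,
                         Range=f"bytes={start}-{start + length - 1}")
    return resp["Body"].read()

needed = ["region.dict", "region.inv_idx", "impressions.dict", "impressions.fwd_idx"]
buffers = {name: fetch_buffer(name) for name in needed}
```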
This worked pretty well for us. We were happy at that point with, this is the what to read part. Now that we know what data to read, next, we began thinking about when to read the data. We already saw earlier, that when a Pinot broker gets a query, it scatters the request to the servers, and then each server executes the query. Now in this figure, we are going to see what happens within a Pinot server when it gets a query. First, the server makes a segment execution plan as part of the planning phase. This is where it decides, which are the segments that it needs to process.
Then those segments are processed by multiple threads in parallel. One of the ideas that we explored was to fetch the data from S3 just as we're about to execute each segment: for each of these segment executions, just before execution we would fetch the data from S3, and only then proceed to executing the query on that segment.
We quickly realized that this is not a great strategy. To demonstrate that, here's a quick example. Let's say you have 40 segments, and parallelism at our disposal on this particular server is 8. That means we would be processing these 40 segments in batches of 8, and that would mean that we are going to do 5 rounds to process all of them. Let's assume that the latency to download data from S3 is 200 milliseconds.
For each batch, we are going to need 200 milliseconds, because as soon as the segment batch begins to get processed, we will first make a round trip to S3 to get that data from that segment. This is quickly going to add up. For each batch, we will need 200 milliseconds, so your total query time is going to be 1000 milliseconds overhead right there. One thing that we observed was that if you check the CPU utilization during this time, most of the time the threads are waiting for the data to become available, and the CPU cores would just stay idle.
Could we somehow decide the segments needed by the query a lot earlier, and then prefetch them so that we can pipeline the IO and data processing as much as possible? That's exactly what we did. During the planning phase itself, we know on the server which segments are going to be needed by this query. In the planning phase itself, we began prefetching all of them. Then, just before the segment execution, the thread would wait for that data to be available, and the prefetch was already kick started. In the best-case scenario, we're already going to have that data prefetched and ready to go.
Let's come back to our example of the 40 segments with parallelism of 8. In this case, instead of fetching when each batch is about to be executed, we would have launched the prefetch for all the batches in the planning phase itself. That means that maybe the first batch still has to wait 200 milliseconds for its data to be available, but while that batch is being fetched and processed, the data for all the other batches is being fetched in parallel. For the later batches, you don't have to spend any time waiting, and this would potentially reduce the query latency down to a single round trip to S3. That's just 200 milliseconds of overhead.
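Here is a rough sketch of that pipelining idea using Python thread pools: every prefetch is launched during planning, and each worker only blocks until its own segment's buffers have arrived. The fetch and process functions are stand-ins for the real I/O and query execution.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_segment_buffers(segment):
    time.sleep(0.2)                      # simulate a ~200 ms object-store round trip
    return f"<buffers of {segment}>"

def process_segment(segment, buffers):
    return f"partial result of {segment}"

def execute_query(segments, parallelism=8):
    with ThreadPoolExecutor(max_workers=parallelism) as io_pool, \
         ThreadPoolExecutor(max_workers=parallelism) as cpu_pool:
        # Planning phase: kick off every prefetch immediately.
        futures = {seg: io_pool.submit(fetch_segment_buffers, seg) for seg in segments}
        # Execution phase: each worker waits only for its own segment's data,
        # so I/O for later batches overlaps with processing of earlier ones.
        results = cpu_pool.map(
            lambda seg: process_segment(seg, futures[seg].result()), segments)
        return list(results)

print(len(execute_query([f"seg_{i}" for i in range(40)])))  # -> 40 partial results
```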
Taking these two techniques, so far, which is selective columnar fetch and prefetching during data planning with pipelining the fetch and execution, we did a simple benchmark. The benchmark was conducted in a small setup with about 200 gigabytes of data, one Pinot server. The queries were mostly aggregation queries with filters, GROUP BY and ORDER BY. We also included a baseline number with the same data on Presto to reference this with a decoupled architecture. Let's see the numbers. Overall, Pinot with tiered storage was 5 times to 20 times faster than Presto.
How is it that Pinot is able to achieve such blazing fast query latencies compared to other decoupled systems like Presto, even when we change the underlying design to be decoupled storage and compute? Let's take a look at some of the core optimizations used in Pinot which help with that. Bringing back the relevant components of the architecture, we have broker, let's say we have 3 servers, and say that each server has 4 segments. That means we have total 12 segments in this cluster.
When a query is received by the broker, it finds the servers to scatter the query to. In each server, it finds the segments it should process. Within each segment, we process certain number of documents based on the filters, then we aggregate the results on the servers. A final aggregation is done on the broker. At each of these points, we have optimizations to reduce the amount of work done. Firstly, broker side pruning is done to reduce the number of servers that we fan out to. Brokers ensure that they select the smallest subset of servers needed for a query and optimize it further using techniques like smart segment assignment strategies, partitioning, and so on.
Once the query reaches the server, more pruning is done to reduce the number of segments that it has to process on each server. Then within each segment, we scan the segment to get the documents that we need. To reduce the amount of work done and the document scan, we apply filter optimizations like indexes. Finally, we have a bunch of aggregation optimizations to calculate fast aggregations.
Let's talk more about the pruning techniques available in Pinot, and how we're able to use them even when segments have been moved to the cloud tier. We have pruning based on min/max column values, or partition-based pruning using partition info. Both of these pieces of metadata are cached locally, even if the segment is on a remote cloud object store. Using that, we are quickly able to eliminate segments where we won't find matching data. Another popular technique used in Pinot is Bloom filter-based pruning. These are built per segment.
We can read it to know if a value is absent from a given segment. This one is a lot more effective than the min/max based or partition-based pruning. These techniques really help us a lot because they help us really narrow down the scope of the segments that we need to process. It helps us reduce the amount of data that we are fetching and processing from S3.
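A rough sketch of that metadata-driven pruning follows, with a toy Bloom filter standing in for Pinot's per-segment Bloom filters and a dictionary of cached min/max values standing in for the locally cached segment metadata; only the surviving segments would then be fetched from the cloud tier.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, value):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all(self.bits >> pos & 1 for pos in self._positions(value))

def prune(segments_meta, column, value):
    """Keep only segments that might contain `value` in `column`."""
    survivors = []
    for name, (lo, hi, bloom) in ((n, m[column]) for n, m in segments_meta.items()):
        if lo <= value <= hi and bloom.might_contain(value):
            survivors.append(name)
    return survivors

bloom = BloomFilter()
bloom.add("US")
segments_meta = {"seg_1": {"region": ("APAC", "US", bloom)},
                 "seg_2": {"region": ("EU", "FR", BloomFilter())}}
print(prune(segments_meta, "region", "US"))  # -> ['seg_1']
```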
Let's take a look at the filter optimizations available in Pinot. All of these are available for use, even if the segment moves to the remote tier. We have inverted indexes where for every unique value, we keep a bitmap of matching doc IDs. We also have classic techniques like sorted index, where the column in question is sorted within the segment, so we can simply keep start and end document ID for the value. We also have range index, which helps us with range predicates such as timestamp greater than, less than, in between.
This query pattern is quite commonly found in user facing dashboards and in real-time anomaly detection. Then we have a JSON index, which is a very powerful index structure. If your data is in semi-structured form, like complex objects or nested JSON, you don't need to invest in preprocessing your data into structured content; you can ingest it as-is, and Pinot will index every field inside your complex JSON, allowing you to query it extremely fast. Then we have the text index for free text search and regex-like queries, which helps with log analytics.
Then, geospatial index, so if you're storing geo coordinates, it lets you compute geospatial queries, which can be very useful in applications like orders near you, looking for things that are 10 miles from a given location, and so on. We also have aggregation optimizations such as theta sketches, and HyperLogLog for approximate aggregations. All of these techniques we can continue using, even if the segment is moved on to a cloud object store. This is one of the major reasons why the query latency for Pinot is so much faster than traditionally decoupled storage and compute systems.
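As a tiny illustration of the inverted index mentioned above, here is a sketch where plain Python sets stand in for the compressed bitmaps a real segment would use; a filter becomes dictionary lookups plus a set intersection rather than a scan.

```python
docs = [
    {"region": "US", "browser": "chrome"},
    {"region": "EU", "browser": "safari"},
    {"region": "US", "browser": "safari"},
]

def build_inverted_index(docs, column):
    """Map each unique column value to the set of doc IDs that contain it."""
    index = {}
    for doc_id, doc in enumerate(docs):
        index.setdefault(doc[column], set()).add(doc_id)
    return index

region_idx = build_inverted_index(docs, "region")
browser_idx = build_inverted_index(docs, "browser")

# WHERE region = 'US' AND browser = 'safari'  ->  intersect two bitmaps.
matches = region_idx["US"] & browser_idx["safari"]
print(sorted(matches))  # -> [2]
```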
While these techniques did help us get better performance than traditionally decoupled systems, when compared to tightly coupled Pinot, which is our true baseline, we could see a clear slowdown. This showed that the two techniques that we implemented in our first version are not enough, they are not effective enough to hide all the data access latency from S3. To learn more from our first version, we stress tested it with a much larger workload.
We put 10 terabytes of data into a Pinot cluster with 2 servers that had a network bandwidth on each server of 1.25 gigabytes per second. Our first finding from the stress test was that the network was saturated very easily and very often. The reason is that, although we tried to reduce the amount of data to read with segment pruning and columnar fetch, we still read a lot of data unnecessarily for those columns, because we fetch the full column in the segment.
Especially, if you have high selectivity filters where you're probably going to need just a few portions from the whole column, this entire columnar fetch is going to be wasteful. Then, this also puts pressure on the resources that we reserve for prefetching all this data. Also, once the network is saturated, all we can do from the system's perspective, is what the instance network bandwidth will allow us. No amount of further parallelism could help us here. On the other hand, we noticed that when network was not saturated, we could have been doing a lot more work in parallel and reducing the sequential round trips we made to S3. Our two main takeaways were, reduce the amount of unnecessary data read, and increase the parallelism even more.
One of the techniques we added for reading less was an advanced configuration to define how to split the data across local versus remote. It doesn't just have to be by data age, you can be super granular and say, I want this specific column to be local, or the specific index of this column to be local, and everything else on cloud storage. With this, you can pin lightweight data structures such as Bloom filters locally onto the instance storage, which is usually a very small fraction of the total storage, and it helps you do fast and effective pruning. Or you can also pin any other index structures that you know we'll be using often.
Another technique we implemented is, instead of doing a whole columnar fetch all the time, we decided that we will just read relevant chunks of the data from the column. For example, bringing back our example from a few slides ago, in this query, when we are applying the region filter, after reading the inverted index, we know that we only need these few documents from the whole impressions column. Maybe we don't need to fetch the full forward index, all we can do is just read small blocks of data during the post filter execution.
With that, our execution plan becomes: during prefetch, only fetch the region.inv_idx. For the data that we need to read from the impressions column, we read that on demand, and we read only a few blocks. We tested out these optimizations on the 10-terabyte data setup. We took three queries of varying selectivity. Then we ran these queries with the old design, which had only columnar fetch with prefetching and pipelining, and also with the new design, where we have more granular block-level fetches instead of full columnar fetches. We saw some amazing reductions in the amount of data fetched compared to our phase one, and this reduction directly improved the query latency.
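A minimal sketch of that chunk-level fetch: given the document IDs that survive the filter, work out which fixed-size forward-index blocks contain them and issue small ranged GETs for just those blocks. Block sizes and offsets here are illustrative, not Pinot's real on-disk layout.

```python
BLOCK_SIZE_DOCS = 1000    # docs per forward-index block (illustrative)
BLOCK_SIZE_BYTES = 4000   # bytes per block, assuming 4 bytes per encoded value

def blocks_for_docs(matching_doc_ids):
    """Map matching doc IDs to the forward-index blocks that hold them."""
    return sorted({doc_id // BLOCK_SIZE_DOCS for doc_id in matching_doc_ids})

def byte_ranges(block_ids, column_start_offset=0):
    """Turn block IDs into (start, end) byte ranges for ranged GETs."""
    return [(column_start_offset + b * BLOCK_SIZE_BYTES,
             column_start_offset + (b + 1) * BLOCK_SIZE_BYTES - 1) for b in block_ids]

matching = [17, 23, 4_500, 4_501, 98_004]      # doc IDs that passed the filter
print(blocks_for_docs(matching))               # -> [0, 4, 98]
print(byte_ranges(blocks_for_docs(matching)))  # three small ranged GETs, not the whole column
```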
One index that we did not talk about when we walked through the indexes in Pinot is the StarTree index. Unlike other indexes in Pinot, which are columnar, StarTree is a segment level index. It allows us to maintain pre-aggregated values for certain dimension combinations. You can choose exactly which dimensions you want to pre-aggregate, and also how many values you want to pre-aggregate at each level. For example, assume our data has columns, name, environment ID, type, and a metric column value along with a timestamp column.
We decided that we want to create a StarTree index, and only materialize the name and environment ID, and we only want to store the aggregation of sum of value. Also, that we will not keep more than 10 records unaggregated at any stage. This is how our StarTree will look. We will have a root node, which will split into all the values for the name column. In each name column, we will have again a split-up for all the values of environment ID. Finally, at every leaf node, we will store the aggregation value, sum of value.
Effectively, StarTree lets you choose between pre-aggregating everything, and doing everything on the fly. A query like this where we have a filter on name and environment ID and we are looking for sum of value, this is going to be super-fast because it's going to be a single lookup. We didn't even have to pre-aggregate everything for this, nor did we have to compute anything on the fly.
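Here is a toy sketch of that idea: a tree over (name, environment) whose leaves hold the pre-aggregated sum of value, plus star nodes that aggregate across a dimension, so the query above becomes a single lookup. The column names follow the example in the talk; everything else is illustrative, not Pinot's actual StarTree implementation.

```python
from collections import defaultdict

rows = [
    {"name": "svc-a", "env": "prod", "value": 10},
    {"name": "svc-a", "env": "prod", "value": 5},
    {"name": "svc-a", "env": "test", "value": 2},
    {"name": "svc-b", "env": "prod", "value": 7},
]

def build_star_tree(rows):
    """Pre-aggregate sum(value) for every (name, env) path, plus star nodes."""
    tree = defaultdict(lambda: defaultdict(int))
    for r in rows:
        tree[r["name"]][r["env"]] += r["value"]   # leaf: exact (name, env) combination
        tree[r["name"]]["*"] += r["value"]        # star node: all envs for this name
        tree["*"][r["env"]] += r["value"]         # star node: all names for this env
    return tree

tree = build_star_tree(rows)
# SELECT SUM(value) WHERE name = 'svc-a' AND env = 'prod'  ->  a single lookup
print(tree["svc-a"]["prod"])  # -> 15
print(tree["svc-a"]["*"])     # -> 17 (all environments for svc-a)
```

In the tiered setup described next, only the small tree structure needs to stay local; the leaf aggregates can live in S3 and be fetched with a few quick lookups.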
How did we effectively use this in tiered storage, because you can imagine that this index must be pretty big in size compared to other indexes like inverted or Bloom filter, so pinning it locally won't work as that would be space inefficient. Prefetching it on the fly will hurt our query latency a lot. This is where all the techniques that we talked about previously came together. We pinned only the tree structure locally, which is very small, and lightweight.
As for the data at each node and aggregations, we continue to keep them in S3. When we got a query that could use this index, it quickly traversed the locally pinned tree, pointing us to the exact location of the result in the tree, which we could then get with a few very quick lookups on S3. Then we took this for a spin with some very expensive queries, and we saw a dramatic reduction in latency because the amount of data fetched had reduced.
So far, we discussed techniques for reducing the amount of data read. Let's talk about one optimization we are currently playing with to increase parallelism. Bring back the example where we knew from the inverted index that we would only need certain rows, and so we only fetched the corresponding blocks during the post-filter evaluation phase. We build sparse indexes that tell us, in the planning phase itself, exactly which chunks of the forward index we will need.
Knowing this in the planning phase helps because we can identify those chunks and begin prefetching them right away. While the filter is being evaluated, the chunks are prefetched in parallel, so the post-filter phase becomes much faster.
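As a rough sketch of how that overlap can look, the "sparse index" below is just an object that maps docId ranges to forward-index chunk ids, consulted at planning time so the chunk downloads can start while the filter is still running. The names and structure are assumptions, not Pinot internals.

```python
# Sketch: kick off chunk prefetches during planning so they overlap with filter
# evaluation; the post-filter phase then waits only on chunks it actually needs.
from concurrent.futures import ThreadPoolExecutor

def plan_and_prefetch(sparse_index, candidate_doc_ranges, fetch_chunk, workers=8):
    """sparse_index.chunks_covering(lo, hi) is a hypothetical lookup that returns
    the forward-index chunk ids covering a docId range."""
    chunk_ids = sorted({c for lo, hi in candidate_doc_ranges
                          for c in sparse_index.chunks_covering(lo, hi)})
    pool = ThreadPoolExecutor(max_workers=workers)
    return {c: pool.submit(fetch_chunk, c) for c in chunk_ids}

# Later, in the post-filter phase:
# futures = plan_and_prefetch(sparse_index, [(100, 250), (9_000, 9_020)], fetch_chunk)
# data = futures[chunk_id].result()   # usually already downloaded by this point
```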
We saw a lot of the techniques we used to build tiered storage so that we could keep the speed of Pinot while reducing cost. The key takeaways: pin only small, lightweight structures locally, such as selected indexes and the StarTree tree structure; keep the bulk of the data in cloud object storage; and make the reads from object storage as targeted and as parallel as possible through prefetching, pipelining, and block-level fetches.
View post:
Speed of Apache Pinot at the Cost of Cloud Object Storage with ... - InfoQ.com
The clash of sustainability and AI is creating a challenge for Dell, IBM and others – CNBC
Mass adoption of generative artificial intelligence aside, data creation and replication are growing.
Researchers expect the current compound annual growth rate of 23% to continue through 2025. And because AI is both compute- and energy-intensive, its presence makes big data even bigger.
All of this could throw a wrench in business leaders' ability to meet their sustainability goals if left unaddressed. Fortunately, companies across the IT spectrum are looking to take on that challenge.
"From our perspective, AI looks like a really big data problem," said Ben Golub, CEO of decentralized cloud storage company Storj. Through a shared network, Storj helps organizations rent out their spare capacity. This startup is joined by legacy enterprises like Dell and IBM in tackling data storage efficiency for economic and environmental sustainability.
Arthur Lewis, president and COO of Dell Technologies' Infrastructure Solutions Group, is helping customers transition from a traditional three-tier data center model to a software-defined (or decentralized) architecture. This transition helps customers optimize their workload in terms of cost, performance, ease of use, and efficiency.
Lewis noted that with a software-defined architecture, "You have the ability to buy what you need as you need it and scale out with your capacity requirements."
The market is roughly split in half, Lewis says, with those adopting decentralized data storage versus those sticking with traditional, consolidated architecture. With a single data center consuming 50,000 homes' worth of electricity (and the cloud at large now exceeding the airline industry's carbon footprint), industry stalwarts and startups are laser focused on solutions.
For 42% of CEOs, environmental sustainability is the top challenge over the next three years, IBM's 2023 CEO study says. Meanwhile, CEOs report facing pressure to adopt generative AI while weighing data management needs to make that adoption successful. For many, cloud storage is part of that equation, influenced by explosive data growth and sustainability goals.
There are numerous solution pathways for the big data problem, but they all strive for the same destination: a more efficient and effective cloud.
Storj and other startups optimizing spare capacity offer one way for companies to tackle their data footprint, which can account for upwards of 40% of a data-intensive company's overall carbon footprint, Golub posits. "It can be really economical to use spare capacity," he said. "That turns out to also be the way you avoid most of the carbon impact."
Meanwhile, Dell seeks to guide customers to a decentralized, software-defined cloud architecture, but is also making the traditional model more efficient in the meantime. They have more EnergyStar certifications than any data storage vendor on the market, Lewis says. "We have built what we call 'design for sustainability' into our offer lifecycle process," he said.
For those that do go decentralized, Lewis says it's still important to maintain what he calls "centralized management of a decentralized architecture." Of course, Dell has the benefit of owning the entire stack, including the underlying compute, allowing them to drive significant efficiencies across the model. Even for companies without that advantage, prioritizing a holistic management model is key.
Another option centers around repurposing unused office buildings that have fallen out of style since the work-from-home revolution. Ermengarde Jabir, Moody's Analytics senior economist, specializes in commercial real estate. In today's growing data economy, that includes data centers.
Jabir has a front-row seat to the increased investment interest in repurposing unused office buildings as data centers. As co-location or shared space data centers are more accessible, these repurposed buildings not only provide more opportunities for decentralized data storage, but also improve physical building security. "Decentralization helps companies secure data because there isn't one specific location," said Jabir.
Whether a company seeks to optimize a traditional three-tier data center model or move towards a decentralized architecture, efficiency in the hardware used also plays a role in sustainability. To help with this, the Open Compute Project developed guidelines for data center circularity, keeping physical equipment in use for longer and reducing waste. The raw materials, assembly, transportation, distribution, product use, disposal and recycling involved in manufacturing storage drives are no small matter.
According to Alan Peacock, general manager of IBM Cloud, organizations face increasing pressure from investors, regulators and clients to reduce their carbon emissions. "As part of any AI transformation roadmap, businesses must consider how to manage the growth of data across cloud and on-premise environments," Peacock said.
IBM recently launched its Cloud Carbon Calculator, an AI-informed dashboard designed to give clients access to standards-based greenhouse gas emissions data and help manage their cloud carbon footprint. This is yet another way for leaders to redirect their data focus to inherently include sustainability goals.
Lewis predicts decentralized data storage will have its moment in the next few years, but the transition is already underway. AI, with its heavy data and computing power, is pushing it further.
"For customers that are looking to reduce their product carbon footprint, they really need to start thinking about how they are moving to more hybrid and connected environments and breaking down those walled gardens of technology," Lewis said.
Golub recommends performing a data audit to help address a company's carbon footprint as well as economic and performance efficiency. Some questions to ask include: What are your workloads? Which workloads are best suited for a decentralized data storage model?
Golub added, "People need to start by saying where am I creating data, what are my goals with that data, and now what is the carbon impact of that data?"
Big data sets like photo and video, medical images, scientific research, large language models and much more all make up a data-intensive environment. As Golub said, "The greenest drive out there is one that never has to be built and the greenest data center out there is one that never has to be created."
Follow this link:
The clash of sustainability and AI is creating a challenge for Dell, IBM and others - CNBC
Quantum Partners for Long-term Retention of Video Surveillance Data – 107.180.56.147
Quantum's award-winning Smart NVR Series, VS-HCI Series Appliances, and Unified Surveillance Platform (USP) solutions that capture and store video surveillance data are now certified with Tiger Technology's Surveillance Bridge software to easily tier and archive data to public and private storage clouds. These new offerings provide a simple solution to lower the overall cost of storing and managing growing amounts of retained video surveillance data to meet emerging data analytics needs and compliance requirements.
"Due to the predominance of high-resolution cameras at more and more locations, longer retention times imposed by government regulation and compliance policies, plus the growing use of video analytics, our surveillance customers' data continues to grow," explains Choon-Seng Tan, general manager, strategic markets for Quantum. "With our Tiger Surveillance partnership, we are extending our video surveillance solutions to include cloud-enabled workflows for greater agility, pay-as-you-grow economics, and easy scalability. In addition, we can more effectively deliver complete end-to-end video surveillance solutions, from ingest to analysis to archive."
Video surveillance data must increasingly be archived and retained for evidence, compliance, and business insights like loss prevention. Archiving that data to cloud storage resources can help customers reduce capital expenses, protect against disasters, and more effectively use high-performance storage within their video management systems.
Surveillance Bridge transparently and automatically moves less frequently accessed video from local storage to a public or private cloud. Supported public storage cloud providers include AWS, Microsoft, and Google Cloud.
(See a brief overview of how Surveillance Bridge, an NTFS/ReFS file system filter driver, enables organizations to seamlessly integrate cloud storage for disaster recovery and unlimited storage extension. Courtesy of Tiger Technology and YouTube.)
(Organizations are looking for ways to simplify their physical security environment while lowering costs, reducing security risks, and ensuring flexibility for future growth. The Quantum Unified Surveillance Platform (USP) provides a resilient, flexible, and secure platform for capturing, storing, and managing mission-critical video and hosting other physical security applications, such as access control, visitor management, or security dashboards. Courtesy of Quantum Corp and YouTube.)
See the article here:
Quantum Partners for Long-term Retention of Video Surveillance Data - 107.180.56.147
Citrix ShareFile vulnerability actively exploited (CVE-2023-24489) – Help Net Security
CVE-2023-24489, a critical Citrix ShareFile vulnerability that the company fixed in June 2023, is being exploited by attackers.
On Tuesday, GreyNoise flagged a sudden spike in IP addresses from which exploitation attempts are coming, and the Cybersecurity and Infrastructure Security Agency (CISA) has added the vulnerability to its Known Exploited Vulnerabilities Catalog.
Unearthed and reported by Assetnote researcher Dylan Pindur, CVE-2023-24489 affects the popular cloud-based file-sharing application Citrix ShareFile, more specifically its storage zones controller (a .NET web application running under IIS).
"You can use the ShareFile-managed cloud storage by itself or in combination with storage that you maintain, called storage zones for ShareFile Data. The storage zones that you maintain can reside in your on-premises single-tenant storage system or in supported third-party cloud storage," Citrix explains.
Storage zones controller allows users to securely access SharePoint sites and network file shares through storage zone connectors, which enable ShareFile client users to browse, upload, or download documents.
In essence, CVE-2023-24489 is a cryptographic bug that may allow unauthenticated attackers to upload files and (ultimately) execute code on and compromise a vulnerable customer-managed installation.
CVE-2023-24489 has been fixed in ShareFile storage zones controller v5.11.24 and later, and customers have been urged to upgrade ever since.
Vulnerabilities in enterprise-grade file-sharing applications are often exploited by attackers, especially the Cl0p cyber extortion gang, who previously targeted organizations using Accellion File Transfer Appliance (FTA) devices, the GoAnywhere MFT platform, and the MOVEit Transfer solution.
The existence of CVE-2023-24489 and of the fix was publicly revealed in June 2023, but it wasn't until July 4 that Assetnote published additional technical details and a proof-of-concept (PoC) exploit. Other PoCs have been released on GitHub since then, so it was just a matter of time until attackers used them to create working exploits and leverage them.
According to GreyNoise's online tracker of exploit activity related to this vulnerability, the first signs were registered on July 25.
There are still no public details about the attacks exploiting the flaw, but CISA has mandated that US Federal Civilian Executive Branch agencies apply patches for it by September 6th, 2023.
Organizations in the private sector should do the same (if they haven't already). If you're not sure which storage zones controller you're using, follow these instructions to find out.
Read the original post:
Citrix ShareFile vulnerability actively exploited (CVE-2023-24489) - Help Net Security
Rekordbox is now compatible with Google Drive’s Sync function – We Rave You
Pioneer DJ has unveiled the latest iteration of its Rekordbox DJ software, version 6.7.4, which adds expanded cloud library functionality through integration with Google Drive. This builds on the company's previous platform updates focused on streamlining and simplifying DJ workflow and music management.
Last year's 6.6.4 update introduced the ability to analyze complete track libraries directly from Dropbox cloud storage. Now, Rekordbox further expands cloud capabilities by enabling two-way sync with Google Drive. Users can upload music files and playlists to Google Drive for seamless access across devices and locations.
According to Pioneer DJ, this cloud integration makes Rekordbox an even more powerful tool for managing music libraries and preparing DJ sets. The company has progressively augmented Rekordbox from a basic utility software into a versatile platform for streamlining the modern DJ experience, whether in the club or on the move.
Notable recent enhancements include the launch of the Rekordbox iOS app, which enables editing and organizing a collection on mobile, and advanced library analysis features like automatic BPM detection. By partnering with Google Drive, Pioneer DJ aims to provide DJs with more flexibility and convenience to manage and access their music libraries on cloud storage.
More information can be found here.
Read this article:
Rekordbox is now compatible with Google Drive's Sync function - We Rave You
Hardware fails, but I’ve never lost data thanks to this backup plan – ZDNet
In May, several outlets reported that Reddit users were complaining about failing SanDisk Extreme SSDs. Subsequently, replacement drives provided by Western Digital, SanDisk's parent company, were also reported to be failing.
The issue affects SanDisk Extreme Portable SSD V2, SanDisk Extreme Pro Portable SSD V2, and WD My Passport SSD products, and appears to be limited to drives manufactured after November 2022.
Western Digital has also released a firmware update to address the issue.
Data loss is bad. Hardware can be replaced, but data is irreplaceable. Often, the data can still be retrieved by data recovery specialists, but that's a painfully expensive route to travel.
I handle, process, and store quite a lot of data in the form of photos and videos, and -- as a measure of caution -- I've pulled all affected SanDisk and WD drives, irrespective of manufacture date, out of use. (I only had two.)
Having worked as a pro-am photographer and videographer for many years, and having grown acutely aware of just how sudden and catastrophic data loss can be, I have developed a workflow that limits my exposure to this risk. This workflow relies on the fact that storage is relatively cheap.
When handling data, I work by the principle of "two is one, one is none, and three is best." What do I mean? If I have two copies of something (and remember, they need to be on separate devices, not two copies on the same laptop) and one fails, I still have one. If I have only one copy and that goes bye-bye, well, I have none. And just to be on the safe side, I prefer to have three copies of everything, spread across different storage devices.
When I'm capturing photos and video, I normally copy the data off the storage cards onto both a laptop and an external drive. (If there's not enough space on the laptop, I'll copy it onto two external drives.)
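The author does this with Carbon Copy Cloner; purely as an illustration of the same "copy to at least two separate devices, then verify" idea, here is a small Python sketch. The paths, the checksum choice, and the overall structure are assumptions for the example, not the author's actual setup.

```python
# Sketch: copy everything from a memory card to two destinations and verify
# each copy with a checksum before trusting it. Paths are hypothetical.
import hashlib
import shutil
from pathlib import Path

CARD = Path("/Volumes/SD_CARD")
DESTS = [Path("/Users/me/Capture/2023-08-20"),        # copy 1: laptop
         Path("/Volumes/ExternalSSD/2023-08-20")]     # copy 2: external drive

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

for src in CARD.rglob("*"):
    if not src.is_file():
        continue
    src_hash = sha256(src)
    for dest_root in DESTS:
        dest = dest_root / src.relative_to(CARD)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)                       # preserves timestamps
        if sha256(dest) != src_hash:                  # verify before the card is reused
            raise RuntimeError(f"checksum mismatch for {dest}")
```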
I also keep the original data on the storage cards for as long as possible before formatting them and reusing them. This is why I prefer having a lot of smaller SD and microSD cards (in the 64GB to 256GB range), rather than a couple of huge 1TB cards.
To move the data off of storage cards onto my laptop and external drives I use a program called Carbon Copy Cloner by Bombich Software. I've been using this software for many years, and it's absolutely packed with features that make copying data from one place to another as fast and reliable as possible.
I'll also move a copy of the raw data onto cloud storage as soon as possible. (Again, I'm minimizing the chances of total loss.)
Note that's just the raw capture data.
Once I start to edit, I like to do something similar. For editing, I make another copy of the data onto a storage drive I use for editing, and I have that backed up to a separate drive using Carbon Copy Cloner on a regular basis, while again also making cloud backups.
You're probably wondering what drives I use. I have a mix of external SSDs and HDDs from a variety of manufacturers. No, I don't buy several drives of one brand from one maker, because I know from past experience that issues like the one plaguing SanDisk can happen.
Currently, I'm using drives from Samsung, Crucial, and OWC. I also back up to Synology NAS boxes.
For cloud storage, I use Backblaze, Dropbox, and also have storage with Amazon.
Another thing that I do is rotate storage drives and storage media every few years. Because I don't want to be using storage media that are five years old, I generally pull them out of service after three years. (Tip: Write the date you started using a device on the device itself.)
I'll be the first to admit that this all adds up to a fair bit of extra workload, hassle, and expense, but following this regimen has kept my data safe. Yes, I've had drives fail, and that was very annoying. But making sure that there are always multiple copies of my data on multiple devices means I've never lost data as a result of those failures.
Two is one, one is none, and three is best!
Visit link:
Hardware fails, but I've never lost data thanks to this backup plan - ZDNet