Introducing DataHog: A Novel New App to Analyze Stored Files
The CyVerse Data Store allocates users 100 GB of free data storage space, enough to hold roughly 50,000 Stephen King-sized novels. With that much space, it’s easy to lose track of individual files, an old project, or that one script you know you put somewhere – and when you don’t know where your data are, it’s hard to manage them.
Thankfully, the one-of-a-kind, aptly named DataHog application helps you see just what you’re doing with all that storage. Designed by software engineer Chris Klimowski of Data7 – the University of Arizona’s Data Science Institute – the app allows users to locate files across different sources, from high-performance computing (HPC) systems to laptops, cloud storage, departmental servers, and more.
Users can see all their files across sources in one place, without having to search for them repeatedly – an important capability for groups collaborating on a project, whose members’ files may be stored in different places.
“When you're storing your data remotely, it's hard to get a big-picture view of how that storage is being used,” said Klimowski. “DataHog was developed to help solve that problem.”
- View iRODS, Amazon S3, and local files all in one place
- Discover statistics about your storage space usage
- Search and filter files by size, date, or pattern
- Find duplicate or redundant files across different locations
DataHog gives you a succinct summary of how much space you’re using, and the types of files that are taking up that space. You can view files all in one place, even if they’re stored in multiple locations, and you can tell how much space duplicated files occupy.
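To make the idea concrete, here is a minimal Python sketch of that kind of summary: total space per file type, plus the space taken up by duplicate copies. It is an illustration only, not DataHog's own code. The function names (`summarize_by_extension`, `find_duplicates`, `wasted_bytes`) are hypothetical, it reads local directories only, and it assumes two files are duplicates when their SHA-256 checksums match.

```python
import hashlib
import os
from collections import defaultdict


def summarize_by_extension(root):
    """Return {extension: total_bytes} for every regular file under root."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken links and other non-regular entries
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext] += os.path.getsize(path)
    return dict(totals)


def find_duplicates(roots):
    """Group files with identical content across several root folders.

    Assumption: identical SHA-256 digests mean identical content.
    Files are read whole, which is fine for a sketch but not for huge files.
    """
    by_hash = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path):
                    continue
                with open(path, "rb") as fh:
                    digest = hashlib.sha256(fh.read()).hexdigest()
                by_hash[digest].append(path)
    # Keep only the groups that contain more than one copy.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


def wasted_bytes(groups):
    """Bytes used by the extra copies in each duplicate group."""
    return sum(os.path.getsize(group[0]) * (len(group) - 1) for group in groups)
```

Calling `find_duplicates(["/path/one", "/path/two"])` on two placeholder folders returns one list of paths per set of identical files, and `wasted_bytes` reports how much space the redundant copies occupy.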
Users can browse through files and take advantage of advanced search options to find files in specific date or size ranges.
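For readers curious what such a search looks like in practice, the sketch below filters local files by name pattern, size range, and modification date using only the Python standard library. It is a hedged illustration rather than DataHog's actual search code: the `search_files` function and its parameters are hypothetical, and it covers a local folder only, not iRODS or Amazon S3.

```python
import fnmatch
import os
from datetime import datetime


def search_files(root, pattern="*", min_size=0, max_size=None,
                 modified_after=None, modified_before=None):
    """Return sorted paths under root that match every given filter.

    pattern          shell-style file name pattern, e.g. "*.csv"
    min_size         smallest size in bytes (inclusive)
    max_size         largest size in bytes (inclusive), or None for no limit
    modified_after   datetime lower bound on modification time, or None
    modified_before  datetime upper bound on modification time, or None
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not fnmatch.fnmatch(name, pattern):
                continue
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken links and other non-regular entries
            info = os.stat(path)
            mtime = datetime.fromtimestamp(info.st_mtime)
            if info.st_size < min_size:
                continue
            if max_size is not None and info.st_size > max_size:
                continue
            if modified_after is not None and mtime < modified_after:
                continue
            if modified_before is not None and mtime > modified_before:
                continue
            matches.append(path)
    return sorted(matches)
```

For example, `search_files("/some/folder", pattern="*.csv", min_size=1_000_000, modified_before=datetime(2020, 1, 1))` would list CSV files of at least one megabyte last modified before 2020; the folder path is a placeholder.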
In addition to supporting iRODS and Amazon S3, the app can be extended to work with any file system using a Python script that is included with the app.