
Raw notes from the Storage Developer Conference 2015 (SNIA SDC 2015)

Notes and disclaimers:

  • This blog post contains raw notes from one of the SNIA’s SDC 2015 presentations (SNIA’s Storage Developers Conference 2015)
  • These notes were typed during the talks and they may include typos and my own misinterpretations.
  • Text in the bullets under each talk are quotes from the speaker or text from the speaker slides, not my personal opinion.
  • If you are the presenter and you feel that I misquoted you or badly represented the content of a talk, please add a comment to the post.
  • I spent limited time fixing typos or correcting the text after the event. Just so many hours in a day...
  • I have not attended all sessions (since many are delivered at the same time, that would not actually be possible :-)…
  • SNIA usually posts the actual PDF decks a few weeks after the event. Attendees have access immediately.
  • You can find the event agenda at http://www.snia.org/events/storage-developer/agenda

Next Generation Data Centers: Hyperconverged Architectures Impact On Storage
Mark O'Connell, Distinguished Engineer, EMC

  • History: Client/Server → shared SANs → Scale-Out systems
  • >> Scale-Out systems: architecture, expansion, balancing
  • >> Evolution of the application platform: physical servers → virtualization → virtualized application farms
  • >> Virtualized application farms and storage: local storage → Shared Storage (SAN) → Scale-Out Storage → Hyper-converged
  • >> Early hyper-converged systems: HDFS (Hadoop) → JVM/Tasks/HDFS in every node
  • Effects of hyper-converged systems
  • >> Elasticity (compute/storage density varies)
  • >> App management, containers, app frameworks
  • >> Storage provisioning: frameworks (OpenStack Swift/Cinder/Manila), pure service architectures
  • >> Hybrid cloud enablement. Apps as self-describing bundles. Storage as a dynamically bound service. Enables movement off-prem.

Understanding the Intel/Micron 3D XPoint Memory
Jim Handy, General Director, Objective Analysis

  • Memory analyst, SSD analyst, blogs: http://thememoryguy.com, http://thessdguy.com
  • Not much information available since the announcement in July: http://newsroom.intel.com/docs/DOC-6713
  • Agenda: What? Why? Who? Is the world ready for it? Should I care? When?
  • What: Picture of the 3D XPoint concept (pronounced 3d-cross-point). Micron’s photograph of “the real thing”.
  • Intel has researched PCM for 45 years. It was mentioned in an Intel article in "Electronics" on Sep 28, 1970.
  • The many elements that have been tried were shown on a periodic table of elements.
  • NAND paved the way for more levels in the memory hierarchy. Showed prices of DRAM/NAND from 2001 to 2015. The gap is now 20x.
  • Comparing bandwidth to price per gigabyte for different storage technologies: Tape, HDD, SSD, 3D XPoint, DRAM, L3, L2, L1
  • Intel diagram mentions PCM-based DIMMs (far memory) and DDR DIMMs (near memory).
  • Chart with latency for HDD SAS/SATA, SSD SAS/SATA, SSD NVMe, 3D XPoint NVMe – how much of it is the media, how much is the software stack?
  • 3D XPoint's place in the memory/storage hierarchy. IOPS x access time. DRAM, 3D XPoint (Optane), NVMe SSD, SATA SSD
  • Great gains at low queue depth. 800GB SSD read IOPS using a 16GB die. IOPS x queue depth of NAND vs. 3D XPoint (see the sketch after this list).
  • Economic benefits: measuring $/write IOPS for SAS HDD, SATA SSD, PCIe SSD, 3D XPoint
  • Timing is good because: DRAM is running out of speed, NVDIMMs are catching on, some sysadmins understand how to use flash to reduce DRAM needs
  • Timing is bad because: nobody can make it economically yet, no software supports SCM (storage class memory), and new layers take time to establish.
  • Why should I care: better cost/perf ratio, lower power consumption (less DRAM, more perf/server, lower OpEx), in-memory DB starts to make sense.
  • When? Micron slide projects 3D XPoint at end of FY17 (two months ahead of CY). Same slide shows NAND production surpassing DRAM production in FY17.
  • Comparing average price per GB compared to the number of GB shipped over time. It takes a lot of shipments to lower price.
  • Looking at the impact in the DRAM industry if this actually happens. DRAM slows down dramatically starting in FY17, as 3D XPoint revenues increase (optimistic).
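
The "great gains at low queue depth" point above comes down to the relationship IOPS ≈ queue depth / per-I/O latency (Little's Law): a lower-latency medium needs far fewer outstanding I/Os to reach a given IOPS level. Here is a minimal sketch of that arithmetic; the latencies are round-number assumptions of mine, not figures from the talk.

```python
# Illustrative only: rough, assumed per-I/O latencies (not from the talk) and
# Little's Law (IOPS = queue depth / latency) to show why a low-latency medium
# such as 3D XPoint delivers high IOPS even at queue depth 1.
media_latency_s = {
    "SATA SSD (NAND)": 90e-6,    # assumed ~90 us per I/O
    "NVMe SSD (NAND)": 80e-6,    # assumed ~80 us per I/O
    "3D XPoint (NVMe)": 10e-6,   # assumed ~10 us per I/O
}

for queue_depth in (1, 4, 16):
    for name, latency in media_latency_s.items():
        iops = queue_depth / latency
        print(f"QD={queue_depth:>2}  {name:<17} ~{iops:,.0f} IOPS")
```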

Implications of Emerging Storage Technologies on Massive Scale Simulation Based Visual Effects
Yahya H. Mirza, CEO/CTO, Aclectic Systems Inc

  • Steve Jobs quote: "You've got to start with the customer experience and work back toward the technology".
  • Problem 1: Improve customer experience. Higher resolution, frame rate, throughput, etc.
  • Problem 2: Production cost continues to rise.
  • Problem 3: Time to render single frame remains constant.
  • Problem 4: Render farm power and cooling increasing. Coherent shared memory model.
  • How do you reduce customer CapEx/OpEx? Low efficiency: 30% CPU. The problem is memory access latency and I/O.
  • Production workflow: modeling, animation/simulation/shading, lighting, rendering, compositing. More and more simulation.
  • Concrete production experiment: 2005. Story boards. Attempt to create a short film. Putting himself in the customer’s shoes. Shot decomposition.
  • Real 3-minute short costs $2 million. Animatic to pitch the project.
  • Character modeling and development. Includes flesh and muscle simulation. A lot of it done procedurally.
  • Looking at Disney’s “Big Hero 6”, DreamWorks’ “Puss in Boots” and Weta’s “The Hobbit”, including simulation costs, frame rate, resolution, size of files, etc.
  • Physically based rendering: global illumination effects, reflection, shadows. Comes down to light transport simulation, physically based materials description.
  • Exemplary VFX shot pipeline. VFX Tool (Houdini/Maya), Voxelized Geometry (OpenVDB), Scene description (Alembic), Simulation Engine (PhysBam), Simulation Farm (RenderFarm), Simulation Output (OpenVDB), Rendering Engine (Mantra), Render Farm (RenderFarm), Output format (OpenEXR), Compositor (Flame), Long-term storage.
  • One example: smoke simulation – reference model smoke/fire VFX. Complicated physical model. Hotspot algorithms: Monte Carlo integration, ray-intersection test, linear algebra solver (multigrid). (A minimal Monte Carlo sketch follows this list.)
  • Storage implications. Compute storage (scene data, simulation data), Long term storage.
  • Is public cloud computing viable for high-end VFX?
  • Disney’s data center. 55K cores across 4 geos.
  • Vertically integrated systems are going to be more and more important. FPGAs, ARM-based servers.
  • Aclectic Colossus smoke demo. Showing 256x256x256.
  • We don’t want coherency; we don’t want sharing. Excited about Intel OmniPath.
  • http://www.intel.com/content/www/us/en/high-performance-computing-fabrics/omni-path-architecture-fabric-overview.html
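
The hotspot list above mentions Monte Carlo integration. As a point of reference only, here is a minimal, generic Monte Carlo integration sketch; it is my own illustration and not code from the talk, PhysBAM, or Mantra.

```python
# Generic Monte Carlo integration sketch (illustration only, not VFX production code):
# estimate the integral of f over [0, 1] by averaging random samples.
import math
import random

def mc_integrate(f, n_samples=100_000):
    # The estimate converges to the true integral roughly as 1/sqrt(n_samples).
    return sum(f(random.random()) for _ in range(n_samples)) / n_samples

# Example: the integral of sin(pi*x) over [0, 1] is 2/pi, about 0.6366.
print(mc_integrate(lambda x: math.sin(math.pi * x)))
```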

How Did Human Cells Build a Storage Engine?
Sanjay Joshi, CTO Life Sciences, EMC

  • Human cell, Nuclear DNA, Transcription and Translation, DNA Structure
  • The data structure: [char(3*10^9) human_genome] strand
  • 3 gigabases: (3*10^9 bases * 2 bits)/8 = ~750MB. With overlaps, ~1GB per cell. 15-70 trillion cells. (See the sketch after this list.)
  • Actual files used to store genome are bigger, between 10GB and 4TB (includes lots of redundancy).
  • Genome sequencing will surpass all other data types by 2040
  • Protein coding portion is just a small portion of it. There’s a lot we don’t understand.
  • Nuclear DNA: Is it a file? Flat file system, distributed, asynchronous. Search header, interpret, compile, execute.
  • Nuclear DNA properties: Large:~20K genes/cell, Dynamic: append/overwrite/truncate, Semantics: strict, Consistent: No, Metadata: fixed, View: one-to-many
  • Mitochondrial DNA: Object? Distributed hash table, a ring with 32 partitions. Constant across generations.
  • Mitochondrial DNA: Small: ~40 genes/cell, Static: constancy, energy functions, Semantics: single origin, Consistent: Yes, Metadata: system based, View: one-to-one
  • File versus object. Comparing Nuclear DNA and Mitochondrial DNA characteristics.
  • The human body: 7,500 named parts, 206 regularly occurring bones (newborns closer to 300), ~640 skeletal muscles (320 pairs), 60+ organs, 37 trillion cells. A distributed cluster.
  • Mapping the ISO 7 layers to this system. Picture.
  • Finite state machine: max 10^45 states at 4*10^53 state-changes/sec. 10^24 NOPS (nucleotide ops per second) across biosphere.
  • Consensus in cell biology: Safety: under all conditions: apoptosis. Availability: billions of replicate copies. Not timing dependent: asynchronous. Command completion: 10 base errors in every 10,000 protein translations (10 AA/sec).
  • Object vs. file. Object: Maternal, Static, Haploid. Small, Simple, Energy, Early. File: Maternal and paternal, Diploid. Scalable, Dynamic, Complex. All cells are female first.
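
The ~750MB figure above comes from packing each base (A, C, G, T) into 2 bits. Here is a tiny sketch reproducing that back-of-the-envelope arithmetic; the 2-bit packing is the usual convention and is assumed here only for illustration.

```python
# Reproduces the arithmetic from the notes: 3 gigabases, 2 bits per base, 8 bits per byte.
bases = 3 * 10**9        # ~3 gigabases in the human genome
bits_per_base = 2        # four possible bases (A/C/G/T) fit in 2 bits

total_bytes = bases * bits_per_base // 8
print(f"{total_bytes / 10**6:.0f} MB")   # -> 750 MB; with overlaps the notes say ~1GB per cell
```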

Move Objects to LTFS Tape Using HTTP Web Service Interface
Matt Starr, Chief Technical Officer, Spectra Logic
Jeff Braunstein, Developer Evangelist, Spectra Logic

  • Worldwide data growth: 2009 = 800 EB, 2015 = 6.5ZB, 2020 = 35ZB
  • Genomics. 6 cows = 1TB of data. They keep it forever.
  • Video data. SD to Full HD to 4K UHD (4.2TB per hour) to 8K UHD. Also kept forever.
  • Intel slide on the Internet minute. 90% of the people in the world have never taken a picture with anything but a camera phone.
  • IoT - total digital info created or replicated.
  • A $1000 genome scan takes 780MB fully compressed. A 2011 HiSeq-2000 scanner generates 20TB per month. A typical camera generates 105GB/day.
  • More and more examples.
  • Tape storage is the lowest cost. But it’s also complex to deploy. Comparing to Public and Private cloud…
  • Pitfalls of public cloud – chart of $/PB/day. OpEx per PB/day reaches very high for public cloud.
  • Risk of public cloud: Amazon has 1 trillion objects. If they lost 1%, that would be 10 billion objects.
  • Risk of public cloud: Nirvanix. VC pulled the plug in September 2013.
  • Cloud: Good: toolkits, naturally WAN friendly, user expectation: put it away.
  • What if: Combine S3/Object with tape. Spectra S3 – Front end is REST, backend is LTFS tape.
  • Cost: $0.09/GB at 7.2PB. Potentially a $0.20/GB two-copy archive.
  • Automated: App or user-built. Semi-Automated: NFI or scripting.
  • Information available at https://developer.spectralogic.com
  • All the tools you need to get started. Including simulator of the front end (BlackPearl) in a VM.
  • S3 commands, plus data to write sequentially in bulk fashion.
  • Configure user for access, buckets.
  • Deep storage browser (source code on GitHub) allows you to browse the simulated storage.
  • SDK available in Java, C#, many others. Includes integration with Visual Studio (demonstrated).
  • Showing sample application. 4 lines of code from the SDK to move a folder to tape storage. (A generic S3-style sketch follows this Q&A list.)
  • Q: Access times when not cached? Hours or minutes. Depends on if the tape is already in the drive. You can ask to pull those to cache, set priorities. By default GET has higher priority than PUT. 28TB or 56TB of cache.
  • Q: Can we use CIFS/NFS? Yes, there is an NFI (Network File Interface) using CIFS/NFS, which talks to the cache machine. Manages time-outs.
  • Q: Any protection against this being used as disk? System monitors health of the tape. Using an object-based interface helps.
  • Q: Can you stage a file for some time, like 24h? There is a large cache. But there are no guarantees on the latency. Keeping it on cache is more like Glacier. What’s the trigger to bring the data?
  • Q: Glacier? Considering support for it. Data policy to move to lower cost, move it back (takes time). Not a lot of product or customers demanding it. S3 has become the standard, not sure if Glacier will be that for archive.
  • Q: Drives are a precious resource. How do you handle overload? By default, reads have precedence over writes. Writes usually can wait.
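
To give a feel for "front end is REST, backend is LTFS tape" from the client side, here is a generic S3-style sketch using boto3 pointed at a hypothetical endpoint. This is not the Spectra ds3 SDK demonstrated in the talk; the endpoint, credentials, and names below are placeholders, and the real client also groups objects into bulk jobs so the appliance can write tape sequentially.

```python
# Generic S3-style sketch (NOT the Spectra ds3 SDK from the demo). Assumes the
# BlackPearl front end accepts plain S3 calls at a hypothetical endpoint; all
# values below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://blackpearl.example.local",   # hypothetical appliance address
    aws_access_key_id="ACCESS_KEY_PLACEHOLDER",
    aws_secret_access_key="SECRET_KEY_PLACEHOLDER",
)

s3.create_bucket(Bucket="archive-bucket")
s3.upload_file("frame_0001.exr", "archive-bucket", "project-x/frame_0001.exr")
```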

Taxonomy of Differential Compression
Liwei Ren, Scientific Adviser, Trend Micro

  • Mathematical model for describing file differences
  • Lossless data compression categories: data compression (one file), differential compression (two files), data deduplication (multiple files)
  • Purposes: network data transfer acceleration and storage space reduction
  • Areas for DC – mobile phones’ firmware over the air, incremental update of files for security software, file synchronization and transfer over WAN, executable files
  • Math model – Diff procedure: Delta = T – R, Merge procedure: T = R + Delta. Model for reduced network bandwidth, reduced storage cost.
  • Applications: backup, revision control system, patch management, firmware over the air, malware signature update, file sync and transfer, distributed file system, cloud data migration
  • Diff model. Two operations: COPY (source address, size [, destination address]), ADD (data block, size [, destination address]). (A toy sketch follows this list.)
  • How to create the delta? How to encode the delta into a file? How to create the right sequence of COPY/ADD operations?
  • Top task is an effective algorithm to identify common blocks. Not covering it here, since it would take more than half an hour…
  • Modeling a diff package. Example.
  • How do you measure the efficiency of an algorithm? You need a cost model.
  • Categorizing: Local DC - LDC (xdelta, zdelta, bsdiff), Remote DC - RDC (rsync, RDC protocol, tsync), Iterative – IDC (proposed)
  • Categorizing: Not-in-place merging: general files (xdelta, zdelta, bsdiff), executable files (bsdiff, courgette)
  • Categorizing: In place merging: firmware as general files (FOTA), firmware as executable files (FOTA)
  • Topics in depth: LDC vs RDC vs IDC for general files
  • Topics in depth: LDC for executable files
  • Topics in depth: LDC for in-place merging
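
To make the COPY/ADD model concrete, here is a toy diff/merge sketch in the spirit of the two operations above: the delta is a sequence of COPY (offset into the reference R, size) and ADD (literal data) operations, and merging replays them to rebuild the target T. It is deliberately naive (fixed-size blocks, linear search) and is not how xdelta, bsdiff, or rsync find common blocks.

```python
# Toy COPY/ADD differential compression sketch (illustration only, not xdelta/bsdiff/rsync).
def make_delta(reference: bytes, target: bytes, block: int = 4) -> list:
    delta, i = [], 0
    while i < len(target):
        chunk = target[i:i + block]
        pos = reference.find(chunk)
        if len(chunk) == block and pos != -1:
            delta.append(("COPY", pos, block))   # block exists in R: copy it from there
        else:
            delta.append(("ADD", chunk))         # block not in R: ship the literal bytes
        i += block
    return delta

def merge(reference: bytes, delta: list) -> bytes:
    out = bytearray()
    for op in delta:
        if op[0] == "COPY":
            _, pos, size = op
            out += reference[pos:pos + size]
        else:
            out += op[1]
    return bytes(out)

R = b"the quick brown fox jumps over the lazy dog"
T = b"the quick brown cat jumps over the lazy dog!"
delta = make_delta(R, T)
assert merge(R, delta) == T   # Merge procedure: T = R + Delta
```

A real LDC tool replaces the naive reference.find scan with hashing or suffix structures; that is the "effective algorithm to identify common blocks" the speaker said would take more than half an hour to cover.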
