CHIPP – CSCS F2F Meeting Zürich, August 19th 2014 Sadaf Alam George Brown Miguel Gila Gianni Ricciardi
Statistics – Swiss resources • Compute (8x nodes, ~3.5k HS06) – Deployed and fully operational. – Increasing the priority of Swiss users is still pending. •
Operations © CSCS 2014 - HPC operations 11
Operations • Deployed virtualization servers; evaluating oVirt and/or RHEV • Reserved 4 full nodes for ATLAS, 4 for CMS and 2 for LHCb – A manual f
Operations Next maintenance on Sept 3 or 17. Significant changes: • Swiss users mapping and priority – need to define specific mappings for CMSch a
Operations GPFS2 (https://wiki.chipp.ch/twiki/bin/view/LCGTier2/ServiceGPFS2) • First production service fully configured using Puppet • Observed pe
Plans
Phases of Phoenix • Timeline figure (2012, 2013, 2014, 2015) showing Phase F+G, Phase H, Phase J and Phase K, with a "Now" marker
Pledges • Table columns: Phase | Compute power, actual/pledged [HS06] | Storage, actual/pledged [TB] | Scratch, actual/desired [GB/s] | Ph
Decommissions & purchases Purchases • Storage for a total of 720 TiB – intended to replace 3x half-racks of IBM DS3500 • 20x compute nodes (~8.5
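The purchase is quoted in binary units (TiB), while the pledge table reports storage in decimal TB. A worked conversion, assuming 1 TiB = 2^40 bytes and 1 TB = 10^12 bytes, gives the figure to compare against the pledges:

```latex
720\,\mathrm{TiB} = 720 \times 2^{40}\,\mathrm{B}
                  = 720 \times 1.099511627776 \times 10^{12}\,\mathrm{B}
                  \approx 791.6\,\mathrm{TB}
```

The roughly 10% gap between the two units is worth keeping in mind when reading actual/pledged storage ratios.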
Thank you for your attention
Agenda • 9:45 - Coffee, presentation and agenda • 10:15 - Tier-2 status and plans – CSCS (40') – UNIBE-LHEP (20') • 11:15 - Tier-3 stat
Extra slides
NetApp problems (Swiss users storage) • Initial tests run on the storage were successful. • When the system was put in production under heavy I/O, p
GPFS issues • Metadata inode exhaustion – Due to several identified problems, inodes were exhausted on metadata servers. – This caused the whole cl
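A simple way to watch for this condition is to alert on the fraction of inodes in use. The sketch below is a minimal, hypothetical check built on the generic POSIX `os.statvfs` call; the 90% threshold and the helper names (`InodeUsage`, `inode_usage`, `above_threshold`) are illustrative assumptions, not the monitoring actually used at CSCS, and GPFS's own tools may report per-fileset inode limits more precisely.

```python
import os
from dataclasses import dataclass

# Hypothetical alert threshold; the slides do not state the value CSCS used.
INODE_ALERT_THRESHOLD = 0.90


@dataclass
class InodeUsage:
    total: int  # total inodes available on the file system
    free: int   # inodes still free

    @property
    def used_fraction(self) -> float:
        # Some file systems report zero inodes; treat that as "nothing used".
        if self.total == 0:
            return 0.0
        return (self.total - self.free) / self.total


def inode_usage(path: str) -> InodeUsage:
    """Read inode counters for the file system containing `path` (POSIX only)."""
    st = os.statvfs(path)
    return InodeUsage(total=st.f_files, free=st.f_ffree)


def above_threshold(usage: InodeUsage, threshold: float = INODE_ALERT_THRESHOLD) -> bool:
    """True when the used-inode fraction has reached the alert threshold."""
    return usage.used_fraction >= threshold
```

For example, `above_threshold(inode_usage("/gpfs"))` would check a file system mounted at the (hypothetical) path `/gpfs`.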
dCache issues • Information system not properly handling this • dCache provided an official fix for this in release 2.6.31. – We run 2.6.27
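Deciding whether a running dCache release already includes a fix comes down to comparing dotted version numbers. A minimal sketch, assuming purely numeric version components; the helper names `parse_release` and `has_fix` are hypothetical:

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Turn a dotted release string such as "2.6.27" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def has_fix(running: str, fixed_in: str) -> bool:
    """True if the running release is the same as, or newer than, the fixed release."""
    return parse_release(running) >= parse_release(fixed_in)
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.10.0" would sort before "2.6.31", whereas `has_fix("2.10.0", "2.6.31")` correctly returns `True`. With the versions from the slide, `has_fix("2.6.27", "2.6.31")` returns `False`.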
Swiss National Argus service • 3 KVM VMs on 3 different KVM hosts. • Load-balanced with a common DNS alias: argus.lcg.cscs.ch – Similar to current
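Because the service sits behind a single DNS alias, one way to see which VMs the alias currently points to is to resolve it and collect the distinct addresses. A minimal sketch using Python's standard `socket.getaddrinfo`; the helper names are hypothetical, and a single lookup may not show every host if the DNS server returns only a subset of the records or rotates them:

```python
import socket


def unique_addresses(infos) -> list[str]:
    """Extract the distinct IP addresses from socket.getaddrinfo() results."""
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address
    # for both IPv4 (2-tuple) and IPv6 (4-tuple) socket addresses.
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos})


def addresses_behind_alias(alias: str) -> list[str]:
    """Resolve a DNS alias and list every address the lookup returned."""
    # Restricting to SOCK_STREAM avoids one duplicate entry per socket type per address.
    infos = socket.getaddrinfo(alias, None, type=socket.SOCK_STREAM)
    return unique_addresses(infos)
```

For example, `addresses_behind_alias("argus.lcg.cscs.ch")` would list the addresses visible from the host running the lookup.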
Tier 2 status and plans CSCS
Status
Statistics – Availability & Reliability • Relatively stable operation with small hiccups: – GPFS: inode usage above threshold and an IB cable broke
Statistics – CPU Usage • CPU usage increased (especially during July)
Statistics – CPU Usage from EGI perspective • Computation hours returned to previous values over the past months: • There is still a mismatch between l
Statistics – CPU Usage from EGI perspective (extra) • Total computation hours (HS06) (SUM)
Statistics – Storage usage