author    Owen Jacobson <owen@grimoire.ca>    2020-04-20 22:35:02 -0400
committer Owen Jacobson <owen@grimoire.ca>    2020-04-20 22:35:02 -0400
commit    34bb538e1efd861d323f2ea25ca5e38a47587513 (patch)
tree      5c36f877afb6811f3b4b045ee78d326f5886246d /site/search/search_index.json
parent    a474c78bed6ba2c4005107454e7b97413c4c26ef (diff)
parent    0e886efee7a7b7b6a34f04d176aa319c8a1ec5b7 (diff)
Merge branch 'pull/2'
Diffstat (limited to 'site/search/search_index.json')
-rw-r--r--    site/search/search_index.json    2
1 file changed, 1 insertion, 1 deletion
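Assuming a local clone of this repository, the change shown below can be regenerated with plain git by diffing the merge commit against its first parent (presumably the parent cgit used for this view), restricted to this file. The hashes come from the commit metadata above; any unambiguous prefix of them would also work:

    # Show the search-index change introduced by the merge, relative to its first parent
    git diff a474c78bed6ba2c4005107454e7b97413c4c26ef 34bb538e1efd861d323f2ea25ca5e38a47587513 -- site/search/search_index.json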
diff --git a/site/search/search_index.json b/site/search/search_index.json
index 2b19d1a..7ac5672 100644
--- a/site/search/search_index.json
+++ b/site/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Owen Jacobson \u00b6 Hire Me . I've been a professional software developer since the early 2000s and an enthusiastic amateur even longer, and a manager of developers since 2019. I'm also deeply interested in organizational dynamics and group consensus: software, like ourselves, lives in a society, and both serves the needs of and serves to help shape that society. Code . I program computers. I have done so all of my adult life, and expect to do so as long as I can string concepts together. Like many lifelong programmers, I periodically write up interesting things I've developed, collaborated on, or run across. My larger projects are on Github . Papers of Note . Computer science and development-adjacent papers and academic works I encourage people to read. Gossamer . In 2014, long before Mastodon was in any kind of widespread use, I sketched out an idea for a fully-distributed status sharing network based on Twitter, but without the weakness of the Twitter, Inc. corporation. I've preserved the writeup here, as it's an excellent case study in how blindness to social violence can lead to dangerous software design. Gossamer should never be implemented , because it would put vulnerable users at extreme risk . In 2020, with Mastodon well established and the shape of distributed status networks much more widely understood, a friend pushed me to revisit the idea . The best way to contact me is by email , but I'm present in many places . If you prefer that your mail not be read by others, my GPG key fingerprint is 77BDC4F16EFD607E85AAB63950232991F10DFFD0.","title":"Owen Jacobson"},{"location":"#owen-jacobson","text":"Hire Me . I've been a professional software developer since the early 2000s and an enthusiastic amateur even longer, and a manager of developers since 2019. I'm also deeply interested in organizational dynamics and group consensus: software, like ourselves, lives in a society, and both serves the needs of and serves to help shape that society. Code . I program computers. I have done so all of my adult life, and expect to do so as long as I can string concepts together. Like many lifelong programmers, I periodically write up interesting things I've developed, collaborated on, or run across. My larger projects are on Github . Papers of Note . Computer science and development-adjacent papers and academic works I encourage people to read. Gossamer . In 2014, long before Mastodon was in any kind of widespread use, I sketched out an idea for a fully-distributed status sharing network based on Twitter, but without the weakness of the Twitter, Inc. corporation. I've preserved the writeup here, as it's an excellent case study in how blindness to social violence can lead to dangerous software design. Gossamer should never be implemented , because it would put vulnerable users at extreme risk . In 2020, with Mastodon well established and the shape of distributed status networks much more widely understood, a friend pushed me to revisit the idea . The best way to contact me is by email , but I'm present in many places . 
If you prefer that your mail not be read by others, my GPG key fingerprint is 77BDC4F16EFD607E85AAB63950232991F10DFFD0.","title":"Owen Jacobson"},{"location":"hire-me/","text":"Hire Me \u00b6 I'm always interested in hearing from people and organizations that I can help, whether that means coming in for a few days to talk about end-to-end testing or joining your organization full-time to help turn an idea into reality. I live in and around Toronto. I am more than happy to work remotely, and I can probably help your organization learn to integrate remote work if it doesn't already know how. For Fun \u00b6 I regularly mentor people new to programming, teaching them how to craft working systems. This is less about teaching people to write code and more about teaching them why we care about source control, how to think about configuration, how to and why to automate testing, and how to think about software systems and data flow at a higher level. I strongly believe that software development needs a formal apprenticeship program, and mentoring has done a lot to validate that belief. Heroku/Salesforce (2015-Present) \u00b6 In my time with Heroku (and with Salesforce, Heroku's parent organization), I've contributed to the operation of services that let developers bring their ideas to life on the internet, both as a developer and as a manager. I've been involved in maintaining and expanding existing features, exploring and developing new products, and in cultivating my peers and my team as people and as developers. As an engineering manager, I've been responsible for building and supporting an effective, unified team. Moving into management was motivated by a desire to act as a force multiplier, which I've brought to life through coaching, process management, facilitating ongoing discussions about the direction and health of the team, and through actively being involved in my reports' progress as developers. As a lead developer, I worked on the Heroku build system , which ingests code from end users and deploys that code to applications running on the Heroku platform. As part of that work, we implemented a number of features to control abuse, support language-specific features and needs, and to develop new ways to deploy code to Heroku. FreshBooks (2009-2014) \u00b6 During the five years I was with the company, it grew from a 20-person one-room organization to a healthy, growing two-hundred-person technology company. As an early employee, I had my hand in many, many projects and helped the development team absorb the massive cultural changes that come with growth, while also building a SaaS product that let others realize their dreams. Some highlights: As the lead database administrator-slash-developer, I worked with the entire development team to balance concerns about reliability and availability with ensuring new ideas and incremental improvements could be executed without massive bureaucracy and at low risk. This extended into diverse parts of the company: alongside the operations team, I handled capacity planning, reliability, outage planning, and performance monitoring, while with the development team, I was responsible for designing processes and deploying tools to ease testing of database changes and ensuring smooth, predictable, and low-effort deployment to production and for training developers to make the best use of MySQL for their projects. 
As a tools developer, I built the Sparkplug framework to standardize the tools and processes for building message-driven applications, allowing the team to move away from monolithic web applications towards a more event-driven suite of interal systems. Providing a standard framework paid off well; building and deploying completely novel event handlers for FreshBooks\u2019 core systems could be completed in as little as a week, including testing and production provisioning. As an ops-ish toolsmith, I worked extensively on configuration management for both applications and the underlying servers. I lead a number of projects to reduce the risk around deployments: creating a standard development VM to ensure developers had an environment consistent with reality, automating packaging and rollout to testing servers, automating the creation of testing servers, and more. As part of this work, I built training materials and ran sessions to teach other developers how to think like a sysadmin, covering Linux, Puppet, virtualization, and other topics. Riptown Media (2006-2009) \u00b6 Riptown Media was an software development company tasked with building and maintaining a suite of gambling systems for a single client. I was brought on board as a Java developer, and rapidly expanded my role to encompass other fields. As the primary developer for poker-room back office and anti-fraud tools, I worked with the customer support and business intelligence teams to better understand their daily needs and frustrations, so that I could turn those into meaningful improvements to their tools and processes. These improvements, in turn, lead to measurable changes in the frequency and length of customer support calls, in fraud rates, and in the percieved value of internal customer intelligence. As a lead developer, my team put together the server half of an in-house casino gaming platform. We worked in tight collaboration with the client team, in-house and third-party testers, and interaction designers, and delivered our first game in under six months. Our platform was meant to reduce our reliance on third-party \u201cwhite label\u201d games vendors; internally, it was a success. Our game received zero customer-reported defects during its initial run. OSI Geospatial (2004-2006) \u00b6 At OSI Geospatial, I lead the development of a target-tracking and battlespace awareness overlay as part of a suite of operational theatre tools. In 2004, the state of the art for web-based geomatics software was not up to the task; this ended up being a custom server written in C++ and making heavy use of PostgreSQL and PostGIS for its inner workings. Contact Me \u00b6 You can get in touch by email at owen@grimoire.ca. I'd love to hear from you.","title":"Hire Me"},{"location":"hire-me/#hire-me","text":"I'm always interested in hearing from people and organizations that I can help, whether that means coming in for a few days to talk about end-to-end testing or joining your organization full-time to help turn an idea into reality. I live in and around Toronto. I am more than happy to work remotely, and I can probably help your organization learn to integrate remote work if it doesn't already know how.","title":"Hire Me"},{"location":"hire-me/#for-fun","text":"I regularly mentor people new to programming, teaching them how to craft working systems. 
This is less about teaching people to write code and more about teaching them why we care about source control, how to think about configuration, how to and why to automate testing, and how to think about software systems and data flow at a higher level. I strongly believe that software development needs a formal apprenticeship program, and mentoring has done a lot to validate that belief.","title":"For Fun"},{"location":"hire-me/#herokusalesforce-2015-present","text":"In my time with Heroku (and with Salesforce, Heroku's parent organization), I've contributed to the operation of services that let developers bring their ideas to life on the internet, both as a developer and as a manager. I've been involved in maintaining and expanding existing features, exploring and developing new products, and in cultivating my peers and my team as people and as developers. As an engineering manager, I've been responsible for building and supporting an effective, unified team. Moving into management was motivated by a desire to act as a force multiplier, which I've brought to life through coaching, process management, facilitating ongoing discussions about the direction and health of the team, and through actively being involved in my reports' progress as developers. As a lead developer, I worked on the Heroku build system , which ingests code from end users and deploys that code to applications running on the Heroku platform. As part of that work, we implemented a number of features to control abuse, support language-specific features and needs, and to develop new ways to deploy code to Heroku.","title":"Heroku/Salesforce (2015-Present)"},{"location":"hire-me/#freshbooks-2009-2014","text":"During the five years I was with the company, it grew from a 20-person one-room organization to a healthy, growing two-hundred-person technology company. As an early employee, I had my hand in many, many projects and helped the development team absorb the massive cultural changes that come with growth, while also building a SaaS product that let others realize their dreams. Some highlights: As the lead database administrator-slash-developer, I worked with the entire development team to balance concerns about reliability and availability with ensuring new ideas and incremental improvements could be executed without massive bureaucracy and at low risk. This extended into diverse parts of the company: alongside the operations team, I handled capacity planning, reliability, outage planning, and performance monitoring, while with the development team, I was responsible for designing processes and deploying tools to ease testing of database changes and ensuring smooth, predictable, and low-effort deployment to production and for training developers to make the best use of MySQL for their projects. As a tools developer, I built the Sparkplug framework to standardize the tools and processes for building message-driven applications, allowing the team to move away from monolithic web applications towards a more event-driven suite of interal systems. Providing a standard framework paid off well; building and deploying completely novel event handlers for FreshBooks\u2019 core systems could be completed in as little as a week, including testing and production provisioning. As an ops-ish toolsmith, I worked extensively on configuration management for both applications and the underlying servers. 
I lead a number of projects to reduce the risk around deployments: creating a standard development VM to ensure developers had an environment consistent with reality, automating packaging and rollout to testing servers, automating the creation of testing servers, and more. As part of this work, I built training materials and ran sessions to teach other developers how to think like a sysadmin, covering Linux, Puppet, virtualization, and other topics.","title":"FreshBooks (2009-2014)"},{"location":"hire-me/#riptown-media-2006-2009","text":"Riptown Media was an software development company tasked with building and maintaining a suite of gambling systems for a single client. I was brought on board as a Java developer, and rapidly expanded my role to encompass other fields. As the primary developer for poker-room back office and anti-fraud tools, I worked with the customer support and business intelligence teams to better understand their daily needs and frustrations, so that I could turn those into meaningful improvements to their tools and processes. These improvements, in turn, lead to measurable changes in the frequency and length of customer support calls, in fraud rates, and in the percieved value of internal customer intelligence. As a lead developer, my team put together the server half of an in-house casino gaming platform. We worked in tight collaboration with the client team, in-house and third-party testers, and interaction designers, and delivered our first game in under six months. Our platform was meant to reduce our reliance on third-party \u201cwhite label\u201d games vendors; internally, it was a success. Our game received zero customer-reported defects during its initial run.","title":"Riptown Media (2006-2009)"},{"location":"hire-me/#osi-geospatial-2004-2006","text":"At OSI Geospatial, I lead the development of a target-tracking and battlespace awareness overlay as part of a suite of operational theatre tools. In 2004, the state of the art for web-based geomatics software was not up to the task; this ended up being a custom server written in C++ and making heavy use of PostgreSQL and PostGIS for its inner workings.","title":"OSI Geospatial (2004-2006)"},{"location":"hire-me/#contact-me","text":"You can get in touch by email at owen@grimoire.ca. I'd love to hear from you.","title":"Contact Me"},{"location":"papers/","text":"Papers of Note \u00b6 Perlman, Radia (1985). \u201c An Algorithm for Distributed Computation of a Spanning Tree in an Extended LAN \u201d. ACM SIGCOMM Computer Communication Review. 15 (4): 44\u201353. doi:10.1145/318951.319004. The related Algorhyme , also by Perlman. Guy Lewis Steele, Jr.. \u201c Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO \u201d. MIT AI Lab. AI Lab Memo AIM-443. October 1977. What Every Computer Scientist Should Know About Floating-Point Arithmetic , by David Goldberg, published in the March, 1991 issue of Computing Surveys. Copyright 1991, Association for Computing Machinery, Inc. RFC 1925 . Regular Expression Matching Can Be Simple And Fast , Russ Cox's empirical research into degenerate cases in common regular expression implementations and a proposed implementation based on Thomson's NFA construction. The above-cited Thomson NFA paper on regular expressions. The Eight Fallacies of Distributed Computing . HAKMEM is another good one. It's dense but rewarding. 
Kahan, William (January 1965), \u201c Further remarks on reducing truncation errors \u201d, Communications of the ACM, 8 (1): 40, doi:10.1145/363707.363723","title":"Papers of Note"},{"location":"papers/#papers-of-note","text":"Perlman, Radia (1985). \u201c An Algorithm for Distributed Computation of a Spanning Tree in an Extended LAN \u201d. ACM SIGCOMM Computer Communication Review. 15 (4): 44\u201353. doi:10.1145/318951.319004. The related Algorhyme , also by Perlman. Guy Lewis Steele, Jr.. \u201c Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO \u201d. MIT AI Lab. AI Lab Memo AIM-443. October 1977. What Every Computer Scientist Should Know About Floating-Point Arithmetic , by David Goldberg, published in the March, 1991 issue of Computing Surveys. Copyright 1991, Association for Computing Machinery, Inc. RFC 1925 . Regular Expression Matching Can Be Simple And Fast , Russ Cox's empirical research into degenerate cases in common regular expression implementations and a proposed implementation based on Thomson's NFA construction. The above-cited Thomson NFA paper on regular expressions. The Eight Fallacies of Distributed Computing . HAKMEM is another good one. It's dense but rewarding. Kahan, William (January 1965), \u201c Further remarks on reducing truncation errors \u201d, Communications of the ACM, 8 (1): 40, doi:10.1145/363707.363723","title":"Papers of Note"},{"location":"code/","text":"Code \u00b6 Pieces of code and code-adjacent work, with or without exposition, that don't quite fit into the library ecosystem, but which I enjoyed writing. A Users, Roles & Privileges Scheme Using Graphs \u2014 An SQL schema and associated queries for handling permissions when roles can nest arbitrarily. Configuring Browser Apps \u2014 Notes on the available techniques for delivering runtime configuration to code running in a user's browser, and the tradeoffs involved. Writing Good Commit Messages \u2014 A style guide. Some collected advice about Git \u2014 Not the source control tool we want, but definitely the source control tool we've got, and I think we should make the best of it. I also maintain a Github account for more substantial projects.","title":"Code"},{"location":"code/#code","text":"Pieces of code and code-adjacent work, with or without exposition, that don't quite fit into the library ecosystem, but which I enjoyed writing. A Users, Roles & Privileges Scheme Using Graphs \u2014 An SQL schema and associated queries for handling permissions when roles can nest arbitrarily. Configuring Browser Apps \u2014 Notes on the available techniques for delivering runtime configuration to code running in a user's browser, and the tradeoffs involved. Writing Good Commit Messages \u2014 A style guide. Some collected advice about Git \u2014 Not the source control tool we want, but definitely the source control tool we've got, and I think we should make the best of it. I also maintain a Github account for more substantial projects.","title":"Code"},{"location":"code/commit-messages/","text":"Writing Good Commit Messages \u00b6 Rule zero: \u201cgood\u201d is defined by the standards of the project you're on. Have a look at what the existing messages look like, and try to emulate that first before doing anything else. Having said that, here are some principles I've found helpful and broadly applicable. Treat the first line of the message as a one-sentence summary. 
Most SCM systems have an \u201coverview\u201d command that shows shortened commit messages in bulk, so making the very beginning of the message meaningful helps make those modes more useful for finding specific commits. It's okay for this to be a \u201cwhat\u201d description if the rest of the message is a \u201cwhy\u201d description. Fill out the rest of the message with prose outlining why you made the change. Don't reiterate the contents of the change in great detail if you can avoid it: anyone who needs that can read the diff themselves, or reach out to ask for help understanding the change. A good rationale sets context for the problem being solved and addresses the ways the proposed change alters that context. If you use an issue tracker (and you should), include whatever issue-linking notes it supports right at the start of the message, where it'll be visible even in summarized commit logs. If your tracker has absurdly long issue-linking syntax, or doesn't support issue links in commits at all, include a short issue identifier at the front of the message and put the long part somewhere out of the way, such as on a line of its own at the end of the message. If you need rich commit messages (links, lists, and so on), pick one markup language and stick with it. It'll be easier to write useful commit formatters if you only have to deal with one syntax, rather than four. Personally, I use Markdown when I can, or a reduced subset of Markdown, as it's something most developers I interact with will be at least passing familiar with.","title":"Writing Good Commit Messages"},{"location":"code/commit-messages/#writing-good-commit-messages","text":"Rule zero: \u201cgood\u201d is defined by the standards of the project you're on. Have a look at what the existing messages look like, and try to emulate that first before doing anything else. Having said that, here are some principles I've found helpful and broadly applicable. Treat the first line of the message as a one-sentence summary. Most SCM systems have an \u201coverview\u201d command that shows shortened commit messages in bulk, so making the very beginning of the message meaningful helps make those modes more useful for finding specific commits. It's okay for this to be a \u201cwhat\u201d description if the rest of the message is a \u201cwhy\u201d description. Fill out the rest of the message with prose outlining why you made the change. Don't reiterate the contents of the change in great detail if you can avoid it: anyone who needs that can read the diff themselves, or reach out to ask for help understanding the change. A good rationale sets context for the problem being solved and addresses the ways the proposed change alters that context. If you use an issue tracker (and you should), include whatever issue-linking notes it supports right at the start of the message, where it'll be visible even in summarized commit logs. If your tracker has absurdly long issue-linking syntax, or doesn't support issue links in commits at all, include a short issue identifier at the front of the message and put the long part somewhere out of the way, such as on a line of its own at the end of the message. If you need rich commit messages (links, lists, and so on), pick one markup language and stick with it. It'll be easier to write useful commit formatters if you only have to deal with one syntax, rather than four. 
Personally, I use Markdown when I can, or a reduced subset of Markdown, as it's something most developers I interact with will be at least passing familiar with.","title":"Writing Good Commit Messages"},{"location":"code/configuring-browser-apps/","text":"Configuring Browser Apps \u00b6 I've found myself in he unexpected situation of having to write a lot of browser apps/single page apps this year. I have some thoughts on configuration. Why Bother \u00b6 Centralize environment-dependent facts to simplify management & testing Make it easy to manage app secrets. @wlonk adds: \u201cSecrets\u201d? What this means in a browser app is a bit different. Which is unpleasantly true. In a freestanding browser app, a \u201csecret\u201d is only as secret as your users and their network connections choose to make it, i.e., not very secret at all. Maybe that should read \u201cmake it easy to manage app tokens and identities ,\u201d instead. Keep config data & API tokens out of app's source control Integration point for external config sources (Aerobatic, Heroku, etc) The forces described in 12 Factor App: Dependencies and, to a lesser extent, 12 Factor App: Configuration apply just as well to web client apps as they do to freestanding services. What Gets Configured \u00b6 Yes: Base URLs of backend services Tokens and client IDs for various APIs No: \u201cEnvironments\u201d (sorry, Ember folks - I know Ember thought this through carefully, but whole-env configs make it easy to miss settings in prod or test, and encourage patterns like \u201call devs use the same backends\u201d) Delivering Configuration \u00b6 There are a few ways to get configuration into the app. Globals \u00b6 <head> <script>window.appConfig = { \"FOO_URL\": \"https://foo.example.com/\", \"FOO_TOKEN\": \"my-super-secret-token\" };</script> <script src=\"/your/app.js\"></script> </head> Easy to consume: it's just globals, so window.appConfig.foo will read them. This requires some discipline to use well. Have to generate a script to set them. This can be a <script>window.appConfig = {some json}</script> tag or a standalone config script loaded with <script src=\"/config.js\"> Generating config scripts sets a minimum level of complexity for the deployment process: you either need a server to generate the script at request time, or a preprocessing step at deployment time. It's code generation, which is easy to do badly. I had originally proposed using JSON.stringify to generate a Javascript object literal, but this fails for any config values with </script> in them. That may be an unlikely edge case, but that only makes it a nastier trap for administrators. There are more edge cases . I strongly suspect that a hazard-free implementation requires a full-blown JS source generator. I had a look at building something out of escodegen and estemplate , but escodegen 's node version doesn't generate browser-safe code , so string literals with </script> or </head> in them still break the page, and converting javascript values into parse trees to feed to estemplate is some seriously tedious code. Data Attributes and Link Elements \u00b6 <head> <link rel=\"foo-url\" href=\"https://foo.example.com/\"> <script src=\"/your/app.js\" data-foo-token=\"my-super-secret-token\"></script> </head> Flat values only. 
This is probably a good thing in the grand, since flat configurations are easier to reason about and much easier to document, but it makes namespacing trickier than it needs to be for groups of related config values (URL + token for a single service, for example). Have to generate the DOM to set them. This is only practical given server-side templates or DOM rendering. You can't do this with bare nginx, unless you pre-generate pages at deployment time. Config API Endpoint \u00b6 fetch('/config') /* {\"FOO_URL\": \u2026, \"FOO_TOKEN\": \u2026} */ .then(response => response.json()) .then(json => someConfigurableService); Works even with \u201cdumb\u201d servers (nginx, CloudFront) as the endpoint can be a generated JSON file on disk. If you can generate files, you can generate a JSON endpoint. Requires an additional request to fetch the configuration, and logic for injecting config data into all the relevant configurable places in the code. This request can't happen until all the app code has loaded. It's very tempting to write the config to a global. This produces some hilarious race conditions. Cookies \u00b6 See for example clientconfig : var config = require('clientconfig'); Easy to consume given the right tools; tricky to do right from scratch. Requires server-side support to send the correct cookie. Some servers will allow you to generate the right cookie once and store it in a config file; others will need custom logic, which means (effectively) you need an app server. Cookies persist and get re-sent on subsequent requests, even if the server stops delivering config cookies. Client code has to manage the cookie lifecycle carefully (clientconfig does this automatically) Size limits constrain how much configuration you can do.","title":"Configuring Browser Apps"},{"location":"code/configuring-browser-apps/#configuring-browser-apps","text":"I've found myself in he unexpected situation of having to write a lot of browser apps/single page apps this year. I have some thoughts on configuration.","title":"Configuring Browser Apps"},{"location":"code/configuring-browser-apps/#why-bother","text":"Centralize environment-dependent facts to simplify management & testing Make it easy to manage app secrets. @wlonk adds: \u201cSecrets\u201d? What this means in a browser app is a bit different. Which is unpleasantly true. In a freestanding browser app, a \u201csecret\u201d is only as secret as your users and their network connections choose to make it, i.e., not very secret at all. Maybe that should read \u201cmake it easy to manage app tokens and identities ,\u201d instead. 
Keep config data & API tokens out of app's source control Integration point for external config sources (Aerobatic, Heroku, etc) The forces described in 12 Factor App: Dependencies and, to a lesser extent, 12 Factor App: Configuration apply just as well to web client apps as they do to freestanding services.","title":"Why Bother"},{"location":"code/configuring-browser-apps/#what-gets-configured","text":"Yes: Base URLs of backend services Tokens and client IDs for various APIs No: \u201cEnvironments\u201d (sorry, Ember folks - I know Ember thought this through carefully, but whole-env configs make it easy to miss settings in prod or test, and encourage patterns like \u201call devs use the same backends\u201d)","title":"What Gets Configured"},{"location":"code/configuring-browser-apps/#delivering-configuration","text":"There are a few ways to get configuration into the app.","title":"Delivering Configuration"},{"location":"code/configuring-browser-apps/#globals","text":"<head> <script>window.appConfig = { \"FOO_URL\": \"https://foo.example.com/\", \"FOO_TOKEN\": \"my-super-secret-token\" };</script> <script src=\"/your/app.js\"></script> </head> Easy to consume: it's just globals, so window.appConfig.foo will read them. This requires some discipline to use well. Have to generate a script to set them. This can be a <script>window.appConfig = {some json}</script> tag or a standalone config script loaded with <script src=\"/config.js\"> Generating config scripts sets a minimum level of complexity for the deployment process: you either need a server to generate the script at request time, or a preprocessing step at deployment time. It's code generation, which is easy to do badly. I had originally proposed using JSON.stringify to generate a Javascript object literal, but this fails for any config values with </script> in them. That may be an unlikely edge case, but that only makes it a nastier trap for administrators. There are more edge cases . I strongly suspect that a hazard-free implementation requires a full-blown JS source generator. I had a look at building something out of escodegen and estemplate , but escodegen 's node version doesn't generate browser-safe code , so string literals with </script> or </head> in them still break the page, and converting javascript values into parse trees to feed to estemplate is some seriously tedious code.","title":"Globals"},{"location":"code/configuring-browser-apps/#data-attributes-and-link-elements","text":"<head> <link rel=\"foo-url\" href=\"https://foo.example.com/\"> <script src=\"/your/app.js\" data-foo-token=\"my-super-secret-token\"></script> </head> Flat values only. This is probably a good thing in the grand, since flat configurations are easier to reason about and much easier to document, but it makes namespacing trickier than it needs to be for groups of related config values (URL + token for a single service, for example). Have to generate the DOM to set them. This is only practical given server-side templates or DOM rendering. You can't do this with bare nginx, unless you pre-generate pages at deployment time.","title":"Data Attributes and Link Elements"},{"location":"code/configuring-browser-apps/#config-api-endpoint","text":"fetch('/config') /* {\"FOO_URL\": \u2026, \"FOO_TOKEN\": \u2026} */ .then(response => response.json()) .then(json => someConfigurableService); Works even with \u201cdumb\u201d servers (nginx, CloudFront) as the endpoint can be a generated JSON file on disk. 
If you can generate files, you can generate a JSON endpoint. Requires an additional request to fetch the configuration, and logic for injecting config data into all the relevant configurable places in the code. This request can't happen until all the app code has loaded. It's very tempting to write the config to a global. This produces some hilarious race conditions.","title":"Config API Endpoint"},{"location":"code/configuring-browser-apps/#cookies","text":"See for example clientconfig : var config = require('clientconfig'); Easy to consume given the right tools; tricky to do right from scratch. Requires server-side support to send the correct cookie. Some servers will allow you to generate the right cookie once and store it in a config file; others will need custom logic, which means (effectively) you need an app server. Cookies persist and get re-sent on subsequent requests, even if the server stops delivering config cookies. Client code has to manage the cookie lifecycle carefully (clientconfig does this automatically) Size limits constrain how much configuration you can do.","title":"Cookies"},{"location":"code/users-rolegraph-privs/","text":"A Users, Roles & Privileges Scheme Using Graphs \u00b6 The basic elements: Every agent that can interact with a system is represented by a user . Every capability the system has is authorized by a distinct privilege . Each user has a list of zero or more roles . Roles can imply further roles. This relationship is transitive: if role A implies role B, then a member of role A is a member of role B; if role B also implies role C, then a member of role A is also a member of role C. It helps if the resulting role graph is acyclic, but it's not necessary. Roles can grant privileges. A user's privileges are the union of the privileges granted by the transitive closure of their roles. create table \"user\" ( username varchar primary key -- credentials &c ); create table role ( name varchar primary key ); create table role_member ( role varchar not null references role, member varchar not null references \"user\", primary key (role, member) ); create table role_implies ( role varchar not null references role, implied_role varchar not null ); create table privilege ( privilege varchar primary key ); create table role_grants ( role varchar not null references role, privilege varchar not null references privilege, primary key (role, privilege) ); If your database supports recursive CTEs, this schema can be queried in one shot, since we can have the database do all the graph-walking along roles: with recursive user_roles (role) AS ( select role from role_member where member = 'SOME USERNAME' union select implied_role as role from user_roles join role_implies on user_roles.role = role_implies.role ) select distinct role_grants.privilege as privilege from user_roles join role_grants on user_roles.role = role_grants.role order by privilege; If not, you'll need to pull the entire graph into memory and manipulate it there: this schema doesn't give you any easy handles to identify only the roles transitively included in the role of interest, and repeatedly querying for each step of the graph requires an IO roundtrip at each step, burning whole milliseconds along the way. Realistic use cases should have fairly simple graphs: elemental privileges are grouped into concrete roles, which are in turn grouped into abstracted roles (by department, for example), which are in turn granted to users. 
If the average user is in tens of roles and has hundreds of privileges, the entire dataset fits in memory, and PostgreSQL performs well. In PostgreSQL, the above schema handles ~10k privileges and ~10k roles with randomly-generated graph relationships in around 100ms on my laptop, which is pretty slow but not intolerable. Perverse cases (interconnected total subgraphs, deeply-nested linear graphs) can take absurd time but do not reflect any likely permissions scheme.","title":"A Users, Roles & Privileges Scheme Using Graphs"},{"location":"code/users-rolegraph-privs/#a-users-roles-privileges-scheme-using-graphs","text":"The basic elements: Every agent that can interact with a system is represented by a user . Every capability the system has is authorized by a distinct privilege . Each user has a list of zero or more roles . Roles can imply further roles. This relationship is transitive: if role A implies role B, then a member of role A is a member of role B; if role B also implies role C, then a member of role A is also a member of role C. It helps if the resulting role graph is acyclic, but it's not necessary. Roles can grant privileges. A user's privileges are the union of the privileges granted by the transitive closure of their roles. create table \"user\" ( username varchar primary key -- credentials &c ); create table role ( name varchar primary key ); create table role_member ( role varchar not null references role, member varchar not null references \"user\", primary key (role, member) ); create table role_implies ( role varchar not null references role, implied_role varchar not null ); create table privilege ( privilege varchar primary key ); create table role_grants ( role varchar not null references role, privilege varchar not null references privilege, primary key (role, privilege) ); If your database supports recursive CTEs, this schema can be queried in one shot, since we can have the database do all the graph-walking along roles: with recursive user_roles (role) AS ( select role from role_member where member = 'SOME USERNAME' union select implied_role as role from user_roles join role_implies on user_roles.role = role_implies.role ) select distinct role_grants.privilege as privilege from user_roles join role_grants on user_roles.role = role_grants.role order by privilege; If not, you'll need to pull the entire graph into memory and manipulate it there: this schema doesn't give you any easy handles to identify only the roles transitively included in the role of interest, and repeatedly querying for each step of the graph requires an IO roundtrip at each step, burning whole milliseconds along the way. Realistic use cases should have fairly simple graphs: elemental privileges are grouped into concrete roles, which are in turn grouped into abstracted roles (by department, for example), which are in turn granted to users. If the average user is in tens of roles and has hundreds of privileges, the entire dataset fits in memory, and PostgreSQL performs well. In PostgreSQL, the above schema handles ~10k privileges and ~10k roles with randomly-generated graph relationships in around 100ms on my laptop, which is pretty slow but not intolerable. 
Perverse cases (interconnected total subgraphs, deeply-nested linear graphs) can take absurd time but do not reflect any likely permissions scheme.","title":"A Users, Roles &amp; Privileges Scheme Using Graphs"},{"location":"git/","text":"Collected Advice about Git \u00b6 git-config Settings You Want \u2014 Git is highly configurable, and the defaults have gotten drastically better over the years, but there are still some non-default behaviours that I've found make life better. Notes Towards Detached Signatures in Git \u2014 An idea I had, but never fully developed, for implementing after-the-fact object signing on top of Git. This was based on a similar feature in Monotone, which I'd found very effective for annotating commits on the fly. Life With Pull Requests \u2014 Some notes I made while getting up to speed with pull requests to help my team come to grips with the workflows. Git Is Not Magic \u2014 An exploration of Git's on-disk data structures and the design choices taken very early in Git's existence. Stop using git pull for deployment! \u2014 Describing the least-painful way to use Git as a deployment tool I had worked out, circa 2014. Written in an aversarial style as a response to repeated \u201dwhy don't we just\u201ds that, while well-intentioned, came from an incomplete understanding of what git pull does. Git Survival Guide \u2014 Some words of caution about Git, git 's preferred workflows, and various recoverable mistakes.","title":"Collected Advice about Git"},{"location":"git/#collected-advice-about-git","text":"git-config Settings You Want \u2014 Git is highly configurable, and the defaults have gotten drastically better over the years, but there are still some non-default behaviours that I've found make life better. Notes Towards Detached Signatures in Git \u2014 An idea I had, but never fully developed, for implementing after-the-fact object signing on top of Git. This was based on a similar feature in Monotone, which I'd found very effective for annotating commits on the fly. Life With Pull Requests \u2014 Some notes I made while getting up to speed with pull requests to help my team come to grips with the workflows. Git Is Not Magic \u2014 An exploration of Git's on-disk data structures and the design choices taken very early in Git's existence. Stop using git pull for deployment! \u2014 Describing the least-painful way to use Git as a deployment tool I had worked out, circa 2014. Written in an aversarial style as a response to repeated \u201dwhy don't we just\u201ds that, while well-intentioned, came from an incomplete understanding of what git pull does. Git Survival Guide \u2014 Some words of caution about Git, git 's preferred workflows, and various recoverable mistakes.","title":"Collected Advice about Git"},{"location":"git/config/","text":"git-config Settings You Want \u00b6 Git comes with some fairly lkml -specific configuration defaults. You should fix this. All of the items below can be set either for your entire login account ( git config --global ) or for a specific repository ( git config ). Full documentation is under git help config , unless otherwise stated. git config user.name 'Your Full Name' and git config user.email 'your-email@example.com' , obviously. Git will remind you about this if you forget. git config merge.defaultToUpstream true - causes an unqualified git merge to merge the current branch's configured upstream branch, rather than being an error. 
This makes git merge much more consistent with git rebase , and as the two tools fill very similar workflow niches, it's nice to have them behave similarly. git config rebase.autosquash true - causes git rebase -i to parse magic comments created by git commit --squash=some-hash and git commit --fixup=some-hash and reorder the commit list before presenting it for further editing. See the descriptions of \u201csquash\u201d and \u201cfixup\u201d in git help rebase for details; autosquash makes amending commits other than the most recent easier and less error-prone. git config branch.autosetupmerge always - newly-created branches whose start point is a branch ( git checkout master -b some-feature , git branch some-feature origin/develop , and so on) will be configured to have the start point branch as their upstream. By default (with true rather than always ) this only happens when the start point is a remote-tracking branch. git config rerere.enabled true - enable \u201creuse recorded resolution.\u201d The git help rerere docs explain it pretty well, but the short version is that git can record how you resolve conflicts during a \u201ctest\u201d merge and reuse the same approach when resolving the same conflict later, in a \u201creal\u201d merge. For advanced users \u00b6 A few things are nice when you're getting started, but become annoying when you no longer need them. git config advice.detachedHead - if you already understand the difference between having a branch checked out and having a commit checked out, and already understand what \u201cdetatched head\u201d means, the warning on every git checkout ...some detatched thing... isn't helping anyone. This is also useful repositories used for deployment, where specific commits (from tags, for example) are regularly checked out.","title":"git-config Settings You Want"},{"location":"git/config/#git-config-settings-you-want","text":"Git comes with some fairly lkml -specific configuration defaults. You should fix this. All of the items below can be set either for your entire login account ( git config --global ) or for a specific repository ( git config ). Full documentation is under git help config , unless otherwise stated. git config user.name 'Your Full Name' and git config user.email 'your-email@example.com' , obviously. Git will remind you about this if you forget. git config merge.defaultToUpstream true - causes an unqualified git merge to merge the current branch's configured upstream branch, rather than being an error. This makes git merge much more consistent with git rebase , and as the two tools fill very similar workflow niches, it's nice to have them behave similarly. git config rebase.autosquash true - causes git rebase -i to parse magic comments created by git commit --squash=some-hash and git commit --fixup=some-hash and reorder the commit list before presenting it for further editing. See the descriptions of \u201csquash\u201d and \u201cfixup\u201d in git help rebase for details; autosquash makes amending commits other than the most recent easier and less error-prone. git config branch.autosetupmerge always - newly-created branches whose start point is a branch ( git checkout master -b some-feature , git branch some-feature origin/develop , and so on) will be configured to have the start point branch as their upstream. By default (with true rather than always ) this only happens when the start point is a remote-tracking branch. 
git config rerere.enabled true - enable \u201creuse recorded resolution.\u201d The git help rerere docs explain it pretty well, but the short version is that git can record how you resolve conflicts during a \u201ctest\u201d merge and reuse the same approach when resolving the same conflict later, in a \u201creal\u201d merge.","title":"git-config Settings You Want"},{"location":"git/config/#for-advanced-users","text":"A few things are nice when you're getting started, but become annoying when you no longer need them. git config advice.detachedHead - if you already understand the difference between having a branch checked out and having a commit checked out, and already understand what \u201cdetatched head\u201d means, the warning on every git checkout ...some detatched thing... isn't helping anyone. This is also useful repositories used for deployment, where specific commits (from tags, for example) are regularly checked out.","title":"For advanced users"},{"location":"git/detached-sigs/","text":"Notes Towards Detached Signatures in Git \u00b6 Git supports a limited form of object authentication: specific object categories in Git's internal model can have GPG signatures embedded in them, allowing the authorship of the objects to be verified using GPG's underlying trust model. Tag signatures can be used to verify the authenticity and integrity of the snapshot associated with a tag , and the authenticity of the tag itself, filling a niche broadly similar to code signing in binary distribution systems. Commit signatures can be used to verify the authenticity of the snapshot associated with the commit , and the authorship of the commit itself. (Conventionally, commit signatures are assumed to also authenticate either the entire line of history leading to a commit, or the diff between the commit and its first parent, or both.) Git's existing system has some tradeoffs. Signatures are embedded within the objects they sign. The signature is part of the object's identity; since Git is content-addressed, this means that an object can neither be retroactively signed nor retroactively stripped of its signature without modifying the object's identity. Git's distributed model means that these sorts of identity changes are both complicated and easily detected. Commit signatures are second-class citizens. They're a relatively recent addition to the Git suite, and both the implementation and the social conventions around them continue to evolve. Only some objects can be signed. While Git has relatively weak rules about workflow, the signature system assumes you're using one of Git's more widespread workflows by limiting your options to at most one signature, and by restricting signatures to tags and commits (leaving out blobs, trees, and refs). I believe it would be useful from an authentication standpoint to add \"detached\" signatures to Git, to allow users to make these tradeoffs differently if desired. These signatures would be stored as separate (blob) objects in a dedicated refs namespace, supporting retroactive signatures, multiple signatures for a given object, \"policy\" signatures, and authentication of arbitrary objects. The following notes are partially guided by Git's one existing \"detached metadata\" facility, git notes . Similarities are intentional; divergences will be noted where appropriate. Detached signatures are meant to interoperate with existing Git workflow as much as possible: in particular, they can be fetched and pushed like any other bit of Git metadata. 
A detached signature cryptographically binds three facts together into an assertion whose authenticity can be checked by anyone with access to the signatory's keys: An object (in the Git sense; a commit, tag, tree, or blob), A policy label, and A signatory (a person or agent making the assertion). These assertions can be published separately from or in tandem with the objects they apply to. Policies \u00b6 Taking a hint from Monotone, every signature includes a \"policy\" identifying how the signature is meant to be interpreted. Policies are arbitrary strings; their meaning is entirely defined by tooling and convention, not by this draft. This draft uses a single policy, author , for its examples. A signature under the author policy implies that the signatory had a hand in the authorship of the designated object. (This is compatible with existing interpretations of signed tags and commits.) (Authorship under this model is strictly self-attested: you can claim authorship of anything, and you cannot assert anyone else's authorship.) The Monotone documentation suggests a number of other useful policies related to testing and release status, automated build results, and numerous other factors. Use your imagination. What's In A Signature \u00b6 Detached signatures cover the disk representation of an object, as given by git cat-file <TYPE> <SHA1> For most of Git's object types, this means that the signed content is plain text. For tree objects, the signed content is the awful binary representation of the tree, not the pretty representation given by git ls-tree or git show . Detached signatures include the \"policy\" identifier in the signed content, to prevent others from tampering with policy choices via refs hackery. (This will make more sense momentarily.) The policy identifier is prepended to the signed content, terminated by a zero byte (as with Git's own type identifiers, but without a length field as length checks are performed by signing and again when the signature is stored in Git). To generate the complete signable version of an object, use something equivalent to the following shell snippet: # generate-signable POLICY TYPE SHA1 function generate-signable() { printf '%s\\0' \"$1\" git cat-file \"$2\" \"$3\" } (In the process of writing this, I discovered how hard it is to get Unix's C-derived shell tools to emit a zero byte.) Signature Storage and Naming \u00b6 We assume that a userid will sign an object at most once. Each signature is stored in an independent blob object in the repository it applies to. The signature object (described above) is stored in Git, and its hash recorded in refs/signatures/<POLICY>/<SUBJECT SHA1>/<SIGNER KEY FINGERPRINT> . # sign POLICY TYPE SHA1 FINGERPRINT function sign() { local SIG_HASH=$( generate-signable \"$@\" | gpg --batch --no-tty --sign -u \"$4\" | git hash-object --stdin -w -t blob ) git update-ref \"refs/signatures/$1/$3/$4\" } Stored signatures always use the complete fingerprint to identify keys, to minimize the risk of colliding key IDs while avoiding the need to store full keys in the refs naming hierarchy. The policy name can be reliably extracted from the ref, as the trailing part has a fixed length (in both path segments and bytes) and each ref begins with a fixed, constant prefix refs/signatures/ . 
Signature Verification \u00b6 Given a signature ref as described above, we can verify and authenticate the signature and bind it to the associated object and policy by performing the following check: Pick apart the ref into policy, SHA1, and key fingerprint parts. Reconstruct the signed body as above, using the policy name extracted from the ref. Retrieve the signature from the ref and combine it with the object itself. Verify that the policy in the stored signature matches the policy in the ref. Verify the signature with GPG: ```bash verify-gpg POLICY TYPE SHA1 FINGERPRINT \u00b6 verify-gpg() { { git cat-file \"$2\" \"$3\" git cat-file \"refs/signatures/$1/$3/$4\" } | gpg --batch --no-tty --verify } ``` Verify the key fingerprint of the signing key matches the key fingerprint in the ref itself. The specific rules for verifying the signature in GPG are left up to the user to define; for example, some sites may want to auto-retrieve keys and use a web of trust from some known roots to determine which keys are trusted, while others may wish to maintain a specific, known keyring containing all signing keys for each policy, and skip the web of trust entirely. This can be accomplished via git-config , given some work, and via gpg.conf . Distributing Signatures \u00b6 Since each signature is stored in a separate ref, and since signatures are not expected to be amended once published, the following refspec can be used with git fetch and git push to distribute signatures: refs/signatures/*:refs/signatures/* Note the lack of a + decoration; we explicitly do not want to auto-replace modified signatures, normally; explicit user action should be required. Workflow Notes \u00b6 There are two verification workflows for signatures: \"static\" verification, where the repository itself already contains all the refs and objects needed for signature verification, and \"pre-receive\" verification, where an object and its associated signature may be being uploaded at the same time. It is impractical to verify signatures on the fly from an update hook . Only pre-receive hooks can usefully accept or reject ref changes depending on whether the push contains a signature for the pushed objects. (Git does not provide a good mechanism for ensuring that signature objects are pushed before their subjects.) Correctly verifying object signatures during pre-receive regardless of ref order is far too complicated to summarize here. Attacks \u00b6 Lies of Omission \u00b6 It's trivial to hide signatures by deleting the signature refs. Similarly, anyone with access to a repository can delete any or all detached signatures from it without otherwise invalidating the signed objects. Since signatures are mostly static, sites following the recommended no-force policy for signature publication should only be affected if relatively recent signatures are deleted. Older signatures should be available in one or more of the repository users' loca repositories; once created, a signature can be legitimately obtained from anywhere, not only from the original signatory. The signature naming protocol is designed to resist most other forms of assertion tampering, but straight-up omission is hard to prevent. Unwarranted Certification \u00b6 The policy system allows any signatory to assert any policy. 
While centralized signature distribution points such as \"release\" repositories can make meaningful decisions about which signatures they choose to accept, publish, and propagate, there's no way to determine after the fact whether a policy assertion was obtained from a legitimate source or a malicious one with no grounds for asserting the policy. For example, I could, right now, sign an all-tests-pass policy assertion for the Linux kernel. While there's no chance on Earth that the LKML team would propagate that assertion, if I can convince you to fetch signatures from my repository, you will fetch my bogus assertion. If all-tests-pass is a meaningful policy assertion for the Linux kernel, then you will have very few options besides believing that I assert that all tests have passed. Ambigiuous Policy \u00b6 This is an ongoing problem with crypto policy systems and user interfaces generally, but this design does nothing to ensure that policies are interpreted uniformly by all participants in a repository. In particular, there's no mechanism described for distributing either prose or programmatic policy definitions and checks. All policy information is out of band. Git already has ambiguity problems around commit signing: there are multiple ways to interpret a signature on a commit: I assert that this snapshot and commit message were authored as described in this commit's metadata. (In this interpretation, the signature's authenticity guarantees do not transitively apply to parents.) I assert that this snapshot and commit message were authored as described in this commit's metadata, based on exactly the parent commits described. (In this interpretation, the signature's authenticity guarantees do transitively apply to parents. This is the interpretation favoured by XXX LINK HERE XXX.) I assert that this diff and commit message was authored as described in this commit's metadata. (No assertions about the snapshot are made whatsoever, and assertions about parentage are barely sensical at all. This meshes with widespread, diff-oriented policies.) Grafts and Replacements \u00b6 Git permits post-hoc replacement of arbitrary objects via both the grafts system (via an untracked, non-distributed file in .git , though some repositories distribute graft lists for end-users to manually apply) and the replacements system (via refs/replace/<SHA1> , which can optionally be fetched or pushed). The interaction between these two systems and signature verification needs to be very closely considered; I've not yet done so. Cases of note: Neither signature nor subject replaced - the \"normal\" case Signature not replaced, subject replaced (by graft, by replacement, by both) Signature replaced, subject not replaced Both signature and subject replaced It's tempting to outright disable git replace during signing and verification, but this will have surprising effects when signing a ref-ish instead of a bare hash. Since this is the normal case, I think this merits more thought. (I'm also not aware of a way to disable grafts without modifying .git , and having the two replacement mechanisms treated differently may be dangerous.) No Signed Refs \u00b6 I mentioned early in this draft that Git's existing signing system doesn't support signing refs themselves; since refs are an important piece of Git's workflow ecosystem, this may be a major omission. Unfortunately, this proposal doesn't address that. Possible Refinements \u00b6 Monotone's certificate system is key+value based, rather than label-based. 
This might be useful; while small pools of related values can be asserted using mutually exclusive policy labels (whose mutual exclusion is a matter of local interpretation), larger pools of related values rapidly become impractical under the proposed system. For example, this proposal would be inappropriate for directly asserting third-party authorship; the asserted author would have to appear in the policy name itself, exposing the user to a potentially very large number of similar policy labels. Ref signing via a manifest (a tree constellation whose paths are ref names and whose blobs sign the refs' values). Consider cribbing DNSSEC here for things like lightweight absence assertions, too. Describe how this should interact with commit-duplicating and commit-rewriting workflows.","title":"Notes Towards Detached Signatures in Git"},{"location":"git/detached-sigs/#notes-towards-detached-signatures-in-git","text":"Git supports a limited form of object authentication: specific object categories in Git's internal model can have GPG signatures embedded in them, allowing the authorship of the objects to be verified using GPG's underlying trust model. Tag signatures can be used to verify the authenticity and integrity of the snapshot associated with a tag , and the authenticity of the tag itself, filling a niche broadly similar to code signing in binary distribution systems. Commit signatures can be used to verify the authenticity of the snapshot associated with the commit , and the authorship of the commit itself. (Conventionally, commit signatures are assumed to also authenticate either the entire line of history leading to a commit, or the diff between the commit and its first parent, or both.) Git's existing system has some tradeoffs. Signatures are embedded within the objects they sign. The signature is part of the object's identity; since Git is content-addressed, this means that an object can neither be retroactively signed nor retroactively stripped of its signature without modifying the object's identity. Git's distributed model means that these sorts of identity changes are both complicated and easily detected. Commit signatures are second-class citizens. They're a relatively recent addition to the Git suite, and both the implementation and the social conventions around them continue to evolve. Only some objects can be signed. While Git has relatively weak rules about workflow, the signature system assumes you're using one of Git's more widespread workflows by limiting your options to at most one signature, and by restricting signatures to tags and commits (leaving out blobs, trees, and refs). I believe it would be useful from an authentication standpoint to add \"detached\" signatures to Git, to allow users to make these tradeoffs differently if desired. These signatures would be stored as separate (blob) objects in a dedicated refs namespace, supporting retroactive signatures, multiple signatures for a given object, \"policy\" signatures, and authentication of arbitrary objects. The following notes are partially guided by Git's one existing \"detached metadata\" facility, git notes . Similarities are intentional; divergences will be noted where appropriate. Detached signatures are meant to interoperate with existing Git workflow as much as possible: in particular, they can be fetched and pushed like any other bit of Git metadata. 
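For contrast, the whole of the built-in signing surface described above fits in a handful of standard commands, and everything it produces lives inside the signed object itself:

```bash
# Embedded tag signatures.
git tag -s v1.0 -m 'release 1.0'
git verify-tag v1.0

# Embedded commit signatures.
git commit -S -m 'signed change'
git verify-commit HEAD
git log --show-signature -1
```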
A detached signature cryptographically binds three facts together into an assertion whose authenticity can be checked by anyone with access to the signatory's keys: An object (in the Git sense; a commit, tag, tree, or blob), A policy label, and A signatory (a person or agent making the assertion). These assertions can be published separately from or in tandem with the objects they apply to.","title":"Notes Towards Detached Signatures in Git"},{"location":"git/detached-sigs/#policies","text":"Taking a hint from Monotone, every signature includes a \"policy\" identifying how the signature is meant to be interpreted. Policies are arbitrary strings; their meaning is entirely defined by tooling and convention, not by this draft. This draft uses a single policy, author , for its examples. A signature under the author policy implies that the signatory had a hand in the authorship of the designated object. (This is compatible with existing interpretations of signed tags and commits.) (Authorship under this model is strictly self-attested: you can claim authorship of anything, and you cannot assert anyone else's authorship.) The Monotone documentation suggests a number of other useful policies related to testing and release status, automated build results, and numerous other factors. Use your imagination.","title":"Policies"},{"location":"git/detached-sigs/#whats-in-a-signature","text":"Detached signatures cover the disk representation of an object, as given by git cat-file <TYPE> <SHA1> For most of Git's object types, this means that the signed content is plain text. For tree objects, the signed content is the awful binary representation of the tree, not the pretty representation given by git ls-tree or git show . Detached signatures include the \"policy\" identifier in the signed content, to prevent others from tampering with policy choices via refs hackery. (This will make more sense momentarily.) The policy identifier is prepended to the signed content, terminated by a zero byte (as with Git's own type identifiers, but without a length field as length checks are performed by signing and again when the signature is stored in Git). To generate the complete signable version of an object, use something equivalent to the following shell snippet: # generate-signable POLICY TYPE SHA1 function generate-signable() { printf '%s\\0' \"$1\" git cat-file \"$2\" \"$3\" } (In the process of writing this, I discovered how hard it is to get Unix's C-derived shell tools to emit a zero byte.)","title":"What's In A Signature"},{"location":"git/detached-sigs/#signature-storage-and-naming","text":"We assume that a userid will sign an object at most once. Each signature is stored in an independent blob object in the repository it applies to. The signature object (described above) is stored in Git, and its hash recorded in refs/signatures/<POLICY>/<SUBJECT SHA1>/<SIGNER KEY FINGERPRINT> . # sign POLICY TYPE SHA1 FINGERPRINT function sign() { local SIG_HASH=$( generate-signable \"$@\" | gpg --batch --no-tty --sign -u \"$4\" | git hash-object --stdin -w -t blob ) git update-ref \"refs/signatures/$1/$3/$4\" } Stored signatures always use the complete fingerprint to identify keys, to minimize the risk of colliding key IDs while avoiding the need to store full keys in the refs naming hierarchy. 
The policy name can be reliably extracted from the ref, as the trailing part has a fixed length (in both path segments and bytes) and each ref begins with a fixed, constant prefix refs/signatures/ .","title":"Signature Storage and Naming"},{"location":"git/detached-sigs/#signature-verification","text":"Given a signature ref as described above, we can verify and authenticate the signature and bind it to the associated object and policy by performing the following check: Pick apart the ref into policy, SHA1, and key fingerprint parts. Reconstruct the signed body as above, using the policy name extracted from the ref. Retrieve the signature from the ref and combine it with the object itself. Verify that the policy in the stored signature matches the policy in the ref. Verify the signature with GPG: ```bash","title":"Signature Verification"},{"location":"git/detached-sigs/#verify-gpg-policy-type-sha1-fingerprint","text":"verify-gpg() { { git cat-file \"$2\" \"$3\" git cat-file \"refs/signatures/$1/$3/$4\" } | gpg --batch --no-tty --verify } ``` Verify the key fingerprint of the signing key matches the key fingerprint in the ref itself. The specific rules for verifying the signature in GPG are left up to the user to define; for example, some sites may want to auto-retrieve keys and use a web of trust from some known roots to determine which keys are trusted, while others may wish to maintain a specific, known keyring containing all signing keys for each policy, and skip the web of trust entirely. This can be accomplished via git-config , given some work, and via gpg.conf .","title":"verify-gpg POLICY TYPE SHA1 FINGERPRINT"},{"location":"git/detached-sigs/#distributing-signatures","text":"Since each signature is stored in a separate ref, and since signatures are not expected to be amended once published, the following refspec can be used with git fetch and git push to distribute signatures: refs/signatures/*:refs/signatures/* Note the lack of a + decoration; we explicitly do not want to auto-replace modified signatures, normally; explicit user action should be required.","title":"Distributing Signatures"},{"location":"git/detached-sigs/#workflow-notes","text":"There are two verification workflows for signatures: \"static\" verification, where the repository itself already contains all the refs and objects needed for signature verification, and \"pre-receive\" verification, where an object and its associated signature may be being uploaded at the same time. It is impractical to verify signatures on the fly from an update hook . Only pre-receive hooks can usefully accept or reject ref changes depending on whether the push contains a signature for the pushed objects. (Git does not provide a good mechanism for ensuring that signature objects are pushed before their subjects.) Correctly verifying object signatures during pre-receive regardless of ref order is far too complicated to summarize here.","title":"Workflow Notes"},{"location":"git/detached-sigs/#attacks","text":"","title":"Attacks"},{"location":"git/detached-sigs/#lies-of-omission","text":"It's trivial to hide signatures by deleting the signature refs. Similarly, anyone with access to a repository can delete any or all detached signatures from it without otherwise invalidating the signed objects. Since signatures are mostly static, sites following the recommended no-force policy for signature publication should only be affected if relatively recent signatures are deleted. 
Older signatures should be available in one or more of the repository users' local repositories; once created, a signature can be legitimately obtained from anywhere, not only from the original signatory. The signature naming protocol is designed to resist most other forms of assertion tampering, but straight-up omission is hard to prevent.","title":"Lies of Omission"},{"location":"git/detached-sigs/#unwarranted-certification","text":"The policy system allows any signatory to assert any policy. While centralized signature distribution points such as \"release\" repositories can make meaningful decisions about which signatures they choose to accept, publish, and propagate, there's no way to determine after the fact whether a policy assertion was obtained from a legitimate source or a malicious one with no grounds for asserting the policy. For example, I could, right now, sign an all-tests-pass policy assertion for the Linux kernel. While there's no chance on Earth that the LKML team would propagate that assertion, if I can convince you to fetch signatures from my repository, you will fetch my bogus assertion. If all-tests-pass is a meaningful policy assertion for the Linux kernel, then you will have very few options besides believing that I assert that all tests have passed.","title":"Unwarranted Certification"},{"location":"git/detached-sigs/#ambigiuous-policy","text":"This is an ongoing problem with crypto policy systems and user interfaces generally, but this design does nothing to ensure that policies are interpreted uniformly by all participants in a repository. In particular, there's no mechanism described for distributing either prose or programmatic policy definitions and checks. All policy information is out of band. Git already has ambiguity problems around commit signing: there are multiple ways to interpret a signature on a commit: I assert that this snapshot and commit message were authored as described in this commit's metadata. (In this interpretation, the signature's authenticity guarantees do not transitively apply to parents.) I assert that this snapshot and commit message were authored as described in this commit's metadata, based on exactly the parent commits described. (In this interpretation, the signature's authenticity guarantees do transitively apply to parents. This is the interpretation favoured by XXX LINK HERE XXX.) I assert that this diff and commit message were authored as described in this commit's metadata. (No assertions about the snapshot are made whatsoever, and assertions about parentage are barely sensical at all. This meshes with widespread, diff-oriented policies.)","title":"Ambiguous Policy"},{"location":"git/detached-sigs/#grafts-and-replacements","text":"Git permits post-hoc replacement of arbitrary objects via both the grafts system (via an untracked, non-distributed file in .git , though some repositories distribute graft lists for end-users to manually apply) and the replacements system (via refs/replace/<SHA1> , which can optionally be fetched or pushed). The interaction between these two systems and signature verification needs to be very closely considered; I've not yet done so.
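For readers who haven't used the replacement machinery, its moving parts look roughly like this (standard Git commands; OLD and NEW stand in for real object names):

```bash
# Read NEW wherever OLD would otherwise be used.
git replace OLD NEW

# Replacements are ordinary refs, so they can be listed, shared, and removed.
git replace -l
git push origin 'refs/replace/*:refs/replace/*'
git replace -d OLD

# Any command can be told to ignore replacements entirely.
git --no-replace-objects log
```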
Cases of note: Neither signature nor subject replaced - the \"normal\" case Signature not replaced, subject replaced (by graft, by replacement, by both) Signature replaced, subject not replaced Both signature and subject replaced It's tempting to outright disable git replace during signing and verification, but this will have surprising effects when signing a ref-ish instead of a bare hash. Since this is the normal case, I think this merits more thought. (I'm also not aware of a way to disable grafts without modifying .git , and having the two replacement mechanisms treated differently may be dangerous.)","title":"Grafts and Replacements"},{"location":"git/detached-sigs/#no-signed-refs","text":"I mentioned early in this draft that Git's existing signing system doesn't support signing refs themselves; since refs are an important piece of Git's workflow ecosystem, this may be a major omission. Unfortunately, this proposal doesn't address that.","title":"No Signed Refs"},{"location":"git/detached-sigs/#possible-refinements","text":"Monotone's certificate system is key+value based, rather than label-based. This might be useful; while small pools of related values can be asserted using mutually exclusive policy labels (whose mutual exclusion is a matter of local interpretation), larger pools of related values rapidly become impractical under the proposed system. For example, this proposal would be inappropriate for directly asserting third-party authorship; the asserted author would have to appear in the policy name itself, exposing the user to a potentially very large number of similar policy labels. Ref signing via a manifest (a tree constellation whose paths are ref names and whose blobs sign the refs' values). Consider cribbing DNSSEC here for things like lightweight absence assertions, too. Describe how this should interact with commit-duplicating and commit-rewriting workflows.","title":"Possible Refinements"},{"location":"git/pull-request-workflow/","text":"Life With Pull Requests \u00b6 I've been party to a number of discussions with folks contributing to pull-request-based projects on Github (and other hosts, but mostly Github). Because of Git's innate flexibility, there are lots of ways to work with pull requests. Here's mine. I use a couple of naming conventions here that are not stock git : origin is the repository to which you publish proposed changes, and upstream is the repository from which you receive ongoing development, and which will receive your changes if they are accepted. One-time setup \u00b6 Do these things once, when starting out on a project. Keep the results around for later. I'll be referring to the original project repository as upstream and pretending its push URL is UPSTREAM-URL below. In real life, the URL will often be something like git@github.com:someguy/project.git . Fork the project \u00b6 Use the repo manager's forking tool to create a copy of the project in your own namespace. This generally creates your copy with a bunch of useless tat; feel free to ignore all of this, as the only purpose of this copy is to provide somewhere for you to publish your changes. We'll be calling this repository origin later. Assume it has a URL, which I'll abbreviate ORIGIN-URL , for git push to use. (You can leave this step for later, but if you know you're going to do it, why not get it out of the way?) Clone the project and configure it \u00b6 You'll need a clone locally to do work in. 
Create one from origin : git clone ORIGIN-URL some-local-name While you're here, cd into it and add the original project as a remote: cd some-local-name git remote add upstream UPSTREAM-URL Feature process \u00b6 Do these things for each feature you work on. To switch features, just use git checkout my-feature . Create a new feature branch locally \u00b6 We use upstream 's master branch here, so that your feature includes all of upstream 's state initially. We also need to make sure our local cache of upstream 's state is correct: git fetch upstream git checkout upstream/master -b my-feature Do work \u00b6 If you need my help here, stop now. Integrate upstream changes \u00b6 If you find yourself needing something that's been added upstream, use rebase to integrate it to avoid littering your feature branch with \u201cmeaningless\u201d merge commits. git checkout my-feature git fetch upstream git rebase upstream/master Publish your branch \u00b6 When you're \u201cdone,\u201d publish your branch to your personal repository: git push origin my-feature Then visit your copy in your repo manager's web UI and create a pull request for my-feature . Integrating feedback \u00b6 Very likely, your proposed changes will need work. If you use history-editing to integrate feedback, you will need to use --force when updating the branch: git push --force origin my-feature This is safe provided two things are true: The branch has not yet been merged to the upstream repo. You are only force-pushing to your fork, not to the upstream repo. Generally, no other users will have work based on your pull request, so force-pushing history won't cause problems.","title":"Life With Pull Requests"},{"location":"git/pull-request-workflow/#life-with-pull-requests","text":"I've been party to a number of discussions with folks contributing to pull-request-based projects on Github (and other hosts, but mostly Github). Because of Git's innate flexibility, there are lots of ways to work with pull requests. Here's mine. I use a couple of naming conventions here that are not stock git : origin is the repository to which you publish proposed changes, and upstream is the repository from which you receive ongoing development, and which will receive your changes if they are accepted.","title":"Life With Pull Requests"},{"location":"git/pull-request-workflow/#one-time-setup","text":"Do these things once, when starting out on a project. Keep the results around for later. I'll be referring to the original project repository as upstream and pretending its push URL is UPSTREAM-URL below. In real life, the URL will often be something like git@github.com:someguy/project.git .","title":"One-time setup"},{"location":"git/pull-request-workflow/#fork-the-project","text":"Use the repo manager's forking tool to create a copy of the project in your own namespace. This generally creates your copy with a bunch of useless tat; feel free to ignore all of this, as the only purpose of this copy is to provide somewhere for you to publish your changes. We'll be calling this repository origin later. Assume it has a URL, which I'll abbreviate ORIGIN-URL , for git push to use. (You can leave this step for later, but if you know you're going to do it, why not get it out of the way?)","title":"Fork the project"},{"location":"git/pull-request-workflow/#clone-the-project-and-configure-it","text":"You'll need a clone locally to do work in. 
Create one from origin : git clone ORIGIN-URL some-local-name While you're here, cd into it and add the original project as a remote: cd some-local-name git remote add upstream UPSTREAM-URL","title":"Clone the project and configure it"},{"location":"git/pull-request-workflow/#feature-process","text":"Do these things for each feature you work on. To switch features, just use git checkout my-feature .","title":"Feature process"},{"location":"git/pull-request-workflow/#create-a-new-feature-branch-locally","text":"We use upstream 's master branch here, so that your feature includes all of upstream 's state initially. We also need to make sure our local cache of upstream 's state is correct: git fetch upstream git checkout upstream/master -b my-feature","title":"Create a new feature branch locally"},{"location":"git/pull-request-workflow/#do-work","text":"If you need my help here, stop now.","title":"Do work"},{"location":"git/pull-request-workflow/#integrate-upstream-changes","text":"If you find yourself needing something that's been added upstream, use rebase to integrate it to avoid littering your feature branch with \u201cmeaningless\u201d merge commits. git checkout my-feature git fetch upstream git rebase upstream/master","title":"Integrate upstream changes"},{"location":"git/pull-request-workflow/#publish-your-branch","text":"When you're \u201cdone,\u201d publish your branch to your personal repository: git push origin my-feature Then visit your copy in your repo manager's web UI and create a pull request for my-feature .","title":"Publish your branch"},{"location":"git/pull-request-workflow/#integrating-feedback","text":"Very likely, your proposed changes will need work. If you use history-editing to integrate feedback, you will need to use --force when updating the branch: git push --force origin my-feature This is safe provided two things are true: The branch has not yet been merged to the upstream repo. You are only force-pushing to your fork, not to the upstream repo. Generally, no other users will have work based on your pull request, so force-pushing history won't cause problems.","title":"Integrating feedback"},{"location":"git/scratch/","text":"Git Is Not Magic \u00b6 I'm bored. Let's make a git repository out of whole cloth. Git repos are stored in .git: fakegit$ mkdir .git They have a \u201csymbolic ref\u201d (which are text files, see man git-symbolic-ref ) named HEAD , pointing to the currently checked-out branch. Let's use master . Branches are refs under refs/heads (see man git-branch ): fakegit ((unknown))$ echo 'ref: refs/heads/master' > .git/HEAD The have an object database and a refs database, both of which are simple directories (see man gitrepository-layout and man gitrevisions ). Let's also enable the reflog, because it's a great safety net if you use history-editing tools in git: fakegit ((ref: re...))$ mkdir .git/refs .git/objects .git/logs fakegit (master #)$ Now __git_ps1 , at least, is convinced that we have a working git repository. Does it work? fakegit (master #)$ echo 'Hello, world!' > hello.txt fakegit (master #)$ git add hello.txt fakegit (master #)$ git commit -m 'Initial commit' [master (root-commit) 975307b] Initial commit 1 file changed, 1 insertion(+) create mode 100644 hello.txt fakegit (master)$ git log commit 975307ba0485bff92e295e3379a952aff013c688 Author: Owen Jacobson <owen.jacobson@grimoire.ca> Date: Wed Feb 6 10:07:07 2013 -0500 Initial commit Eeyup . Should you do this? Of course not. 
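If you want to see the difference for yourself, compare the three-directory skeleton above with what git init actually lays down (the directory name here is arbitrary):

```bash
git init --quiet real-git
ls -A real-git/.git
# Typically: HEAD, config, description, hooks, info, objects, refs
# (the exact list varies with Git version and templates).
```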
Anywhere you could run these commands, you could instead run git init or git clone , which set up a number of other structures, including .git/config and any unusual permissions options. The key part here is that a directory's identity as \u201ca git repository\u201d is entirely a function of its contents, not of having been blessed into being by git itself. You can infer a lot from this: for example, you can infer that it's \u201csafe\u201d to move git repositories around using FS tools, or to back them up with the same tools, for example. This is not as obvious to everyone as you might hope; people","title":"Git Is Not Magic"},{"location":"git/scratch/#git-is-not-magic","text":"I'm bored. Let's make a git repository out of whole cloth. Git repos are stored in .git: fakegit$ mkdir .git They have a \u201csymbolic ref\u201d (which are text files, see man git-symbolic-ref ) named HEAD , pointing to the currently checked-out branch. Let's use master . Branches are refs under refs/heads (see man git-branch ): fakegit ((unknown))$ echo 'ref: refs/heads/master' > .git/HEAD The have an object database and a refs database, both of which are simple directories (see man gitrepository-layout and man gitrevisions ). Let's also enable the reflog, because it's a great safety net if you use history-editing tools in git: fakegit ((ref: re...))$ mkdir .git/refs .git/objects .git/logs fakegit (master #)$ Now __git_ps1 , at least, is convinced that we have a working git repository. Does it work? fakegit (master #)$ echo 'Hello, world!' > hello.txt fakegit (master #)$ git add hello.txt fakegit (master #)$ git commit -m 'Initial commit' [master (root-commit) 975307b] Initial commit 1 file changed, 1 insertion(+) create mode 100644 hello.txt fakegit (master)$ git log commit 975307ba0485bff92e295e3379a952aff013c688 Author: Owen Jacobson <owen.jacobson@grimoire.ca> Date: Wed Feb 6 10:07:07 2013 -0500 Initial commit Eeyup . Should you do this? Of course not. Anywhere you could run these commands, you could instead run git init or git clone , which set up a number of other structures, including .git/config and any unusual permissions options. The key part here is that a directory's identity as \u201ca git repository\u201d is entirely a function of its contents, not of having been blessed into being by git itself. You can infer a lot from this: for example, you can infer that it's \u201csafe\u201d to move git repositories around using FS tools, or to back them up with the same tools, for example. This is not as obvious to everyone as you might hope; people","title":"Git Is Not Magic"},{"location":"git/stop-using-git-pull-to-deploy/","text":"Stop using git pull for deployment! \u00b6 The problem \u00b6 You have a Git repository containing your project. You want to \u201cdeploy\u201d that code when it changes. You'd rather not download the entire project from scratch for each deployment. The antipattern \u00b6 \u201cI know, I'll use git pull in my deployment script!\u201d Stop doing this. Stop teaching other people to do this. It's wrong, and it will eventually lead to deploying something you didn't want. Deployment should be based on predictable, known versions of your code. Ideally, every deployable version has a tag (and you deploy exactly that tag), but even less formal processes, where you deploy a branch tip, should still be deploying exactly the code designated for release. git pull , however, can introduce new commits. 
git pull is a two-step process: Fetch the current branch's designated upstream remote, to obtain all of the remote's new commits. Merge the current branch's designated upstream branch into the current branch. The merge commit means the actual deployed tree might not be identical to the intended deployment tree. Local changes (intentional or otherwise) will be preserved (and merged) into the deployment, for example; once this happens, the actual deployed commit will never match the intended commit. git pull will approximate the right thing \u201cby accident\u201d: if the current local branch (generally master ) for people using git pull is always clean, and always tracks the desired deployment branch, then git pull will update to the intended commit exactly. This is pretty fragile, though; many git commands can cause the local branch to diverge from its upstream branch, and once that happens, git pull will always create new commits. You can patch around the fragility a bit using the --ff-only option, but that only tells you when your deployment environment has diverged and doesn't fix it. The right pattern \u00b6 Quoting Sitaram Chamarty : Here's what we expect from a deployment tool. Note the rule numbers -- we'll be referring to some of them simply by number later. All files in the branch being deployed should be copied to the deployment directory. Files that were deleted in the git repo since the last deployment should get deleted from the deployment directory. Any changes to tracked files in the deployment directory after the last deployment should be ignored when following rules 1 and 2. However, sometimes you might want to detect such changes and abort if you found any. Untracked files in the deploy directory should be left alone. Again, some people might want to detect this and abort the deployment. Sitaram's own documentation talks about how to accomplish these when \u201cdeploying\u201d straight out of a bare repository. That's unwise (not to mention impractical) in most cases; deployment should use a dedicated clone of the canonical repository. I also disagree with point 3, preferring to keep deployment-related changes outside of tracked files. This makes it much easier to argue that the changes introduced to configure the project for deployment do not introduce new bugs or other surprise features. My deployment process, given a dedicated clone at $DEPLOY_TREE , is as follows: cd \"${DEPLOY_TREE}\" git fetch --all git checkout --force \"${TARGET}\" # Following two lines only required if you use submodules git submodule sync git submodule update --init --recursive # Follow with actual deployment steps (run fabric/capistrano/make/etc) $TARGET is either a tag name ( v1.2.1 ) or a remote branch name ( origin/master ), but could also be a commit hash or anything else Git recognizes as a revision. This will detach the head of the $DEPLOY_TREE repository, which is fine as no new changes should be authored in this repository (so the local branches are irrelevant). The warning Git emits when HEAD becomes detached is unimportant in this case. The tracked contents of $DEPLOY_TREE will end up identical to the desired commit, discarding local changes. 
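One optional hardening step, not part of the recipe above: make the script fail loudly when $TARGET doesn't resolve, rather than deploying whatever happened to be checked out last.

```bash
# Abort unless $TARGET names a commit that is actually present locally.
if ! git rev-parse --verify --quiet "${TARGET}^{commit}" > /dev/null; then
    echo "error: ${TARGET} does not name a known commit" >&2
    exit 1
fi
```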
The pattern above is very similar to what most continuous integration servers use when building from Git repositories, for much the same reason.","title":"Stop using `git pull` for deployment!"},{"location":"git/stop-using-git-pull-to-deploy/#stop-using-git-pull-for-deployment","text":"","title":"Stop using git pull for deployment!"},{"location":"git/stop-using-git-pull-to-deploy/#the-problem","text":"You have a Git repository containing your project. You want to \u201cdeploy\u201d that code when it changes. You'd rather not download the entire project from scratch for each deployment.","title":"The problem"},{"location":"git/stop-using-git-pull-to-deploy/#the-antipattern","text":"\u201cI know, I'll use git pull in my deployment script!\u201d Stop doing this. Stop teaching other people to do this. It's wrong, and it will eventually lead to deploying something you didn't want. Deployment should be based on predictable, known versions of your code. Ideally, every deployable version has a tag (and you deploy exactly that tag), but even less formal processes, where you deploy a branch tip, should still be deploying exactly the code designated for release. git pull , however, can introduce new commits. git pull is a two-step process: Fetch the current branch's designated upstream remote, to obtain all of the remote's new commits. Merge the current branch's designated upstream branch into the current branch. The merge commit means the actual deployed tree might not be identical to the intended deployment tree. Local changes (intentional or otherwise) will be preserved (and merged) into the deployment, for example; once this happens, the actual deployed commit will never match the intended commit. git pull will approximate the right thing \u201cby accident\u201d: if the current local branch (generally master ) for people using git pull is always clean, and always tracks the desired deployment branch, then git pull will update to the intended commit exactly. This is pretty fragile, though; many git commands can cause the local branch to diverge from its upstream branch, and once that happens, git pull will always create new commits. You can patch around the fragility a bit using the --ff-only option, but that only tells you when your deployment environment has diverged and doesn't fix it.","title":"The antipattern"},{"location":"git/stop-using-git-pull-to-deploy/#the-right-pattern","text":"Quoting Sitaram Chamarty : Here's what we expect from a deployment tool. Note the rule numbers -- we'll be referring to some of them simply by number later. All files in the branch being deployed should be copied to the deployment directory. Files that were deleted in the git repo since the last deployment should get deleted from the deployment directory. Any changes to tracked files in the deployment directory after the last deployment should be ignored when following rules 1 and 2. However, sometimes you might want to detect such changes and abort if you found any. Untracked files in the deploy directory should be left alone. Again, some people might want to detect this and abort the deployment. Sitaram's own documentation talks about how to accomplish these when \u201cdeploying\u201d straight out of a bare repository. That's unwise (not to mention impractical) in most cases; deployment should use a dedicated clone of the canonical repository. I also disagree with point 3, preferring to keep deployment-related changes outside of tracked files. 
This makes it much easier to argue that the changes introduced to configure the project for deployment do not introduce new bugs or other surprise features. My deployment process, given a dedicated clone at $DEPLOY_TREE , is as follows: cd \"${DEPLOY_TREE}\" git fetch --all git checkout --force \"${TARGET}\" # Following two lines only required if you use submodules git submodule sync git submodule update --init --recursive # Follow with actual deployment steps (run fabric/capistrano/make/etc) $TARGET is either a tag name ( v1.2.1 ) or a remote branch name ( origin/master ), but could also be a commit hash or anything else Git recognizes as a revision. This will detach the head of the $DEPLOY_TREE repository, which is fine as no new changes should be authored in this repository (so the local branches are irrelevant). The warning Git emits when HEAD becomes detached is unimportant in this case. The tracked contents of $DEPLOY_TREE will end up identical to the desired commit, discarding local changes. The pattern above is very similar to what most continuous integration servers use when building from Git repositories, for much the same reason.","title":"The right pattern"},{"location":"git/survival/","text":"Git Survival Guide \u00b6 I think the git UI is pretty awful, and encourages using Git in ways that will screw you. Here are a few things I've picked up that have saved my bacon. You will inevitably need to understand Git's \u201cinternals\u201d to make use of it as an SCM tool. Accept this early. If you think your SCM tool should not expose you to so much plumbing, don't use Git . Git weenies will claim that this plumbing is what gives Git all of its extra power. This is true; it gives Git the power to get you out of situations you wouldn't be in without Git. git log --graph --decorate --oneline --color --all Run git fetch habitually. Stale remote-tracking branches lead to sadness. git push and git pull are not symmetric . git push 's opposite operation is git fetch . ( git pull is equivalent to git fetch followed by git merge , more or less). Git configuration values don't always have the best defaults . The upstream branch of foo is foo@{u} . The upstream branch of your checked-out branch is HEAD@{u} or @{u} . This is documented in git help revisions . You probably don't want to use a merge operation (such as git pull ) to integrate upstream changes into topic branches. The resulting history can be very confusing to follow, especially if you integrate upstream changes frequently. You can leave topic branches \u201creal\u201d relatively safely. You can do a test merge to see if they still work cleanly post-integration without actually integrating upstream into the branch permanently. You can use git rebase or git pull --rebase to transplant your branch to a new, more recent starting point that includes the changes you want to integrate. This makes the upstream changes a permanent part of your branch, just like git merge or git pull would, but generates an easier-to-follow history. Conflict resolution will happen as normal. 
Example test merge, using origin/master as the upstream branch and foo as the candidate for integration: git fetch origin git checkout origin/master -b test-merge-foo git merge foo # run tests, examine files git diff origin/master..HEAD To discard the test merge, delete the branch after checking out some other branch: git checkout foo git branch -D test-merge-foo You can combine this with git rerere to save time resolving conflicts in a later \u201creal,\u201d permanent merge. You can use git checkout -p to build new, tidy commits out of a branch laden with \u201cwip\u201d commits: git fetch git checkout $(git merge-base origin/master foo) -b foo-cleaner-history git checkout -p foo -- paths/to/files # pick out changes from the presented patch that form a coherent commit # repeat 'git checkout -p foo --' steps for related files to build up # the new commit git commit # repeat 'git checkout -p foo --' and 'git commit' steps until no diffs remain Gotcha: git checkout -p will do nothing for files that are being created. Use git checkout , instead, and edit the file if necessary. Thanks, Git. Gotcha: The new, clean branch must diverge from its upstream branch ( origin/master , in the example above) at exactly the same point, or the diffs presented by git checkout -p foo will include chunks that revert changes on the upstream branch since the \u201cdirty\u201d branch was created. The easiest way to find this point is with git merge-base . Useful Resources \u00b6 That is, resoures that can help you solve problems or understand things, not resources that reiterate the man pages for you. Sitaram Chamarty's git concepts simplified Tv's Git for Computer Scientists","title":"Git Survival Guide"},{"location":"git/survival/#git-survival-guide","text":"I think the git UI is pretty awful, and encourages using Git in ways that will screw you. Here are a few things I've picked up that have saved my bacon. You will inevitably need to understand Git's \u201cinternals\u201d to make use of it as an SCM tool. Accept this early. If you think your SCM tool should not expose you to so much plumbing, don't use Git . Git weenies will claim that this plumbing is what gives Git all of its extra power. This is true; it gives Git the power to get you out of situations you wouldn't be in without Git. git log --graph --decorate --oneline --color --all Run git fetch habitually. Stale remote-tracking branches lead to sadness. git push and git pull are not symmetric . git push 's opposite operation is git fetch . ( git pull is equivalent to git fetch followed by git merge , more or less). Git configuration values don't always have the best defaults . The upstream branch of foo is foo@{u} . The upstream branch of your checked-out branch is HEAD@{u} or @{u} . This is documented in git help revisions . You probably don't want to use a merge operation (such as git pull ) to integrate upstream changes into topic branches. The resulting history can be very confusing to follow, especially if you integrate upstream changes frequently. You can leave topic branches \u201creal\u201d relatively safely. You can do a test merge to see if they still work cleanly post-integration without actually integrating upstream into the branch permanently. You can use git rebase or git pull --rebase to transplant your branch to a new, more recent starting point that includes the changes you want to integrate. 
This makes the upstream changes a permanent part of your branch, just like git merge or git pull would, but generates an easier-to-follow history. Conflict resolution will happen as normal. Example test merge, using origin/master as the upstream branch and foo as the candidate for integration: git fetch origin git checkout origin/master -b test-merge-foo git merge foo # run tests, examine files git diff origin/master..HEAD To discard the test merge, delete the branch after checking out some other branch: git checkout foo git branch -D test-merge-foo You can combine this with git rerere to save time resolving conflicts in a later \u201creal,\u201d permanent merge. You can use git checkout -p to build new, tidy commits out of a branch laden with \u201cwip\u201d commits: git fetch git checkout $(git merge-base origin/master foo) -b foo-cleaner-history git checkout -p foo -- paths/to/files # pick out changes from the presented patch that form a coherent commit # repeat 'git checkout -p foo --' steps for related files to build up # the new commit git commit # repeat 'git checkout -p foo --' and 'git commit' steps until no diffs remain Gotcha: git checkout -p will do nothing for files that are being created. Use git checkout , instead, and edit the file if necessary. Thanks, Git. Gotcha: The new, clean branch must diverge from its upstream branch ( origin/master , in the example above) at exactly the same point, or the diffs presented by git checkout -p foo will include chunks that revert changes on the upstream branch since the \u201cdirty\u201d branch was created. The easiest way to find this point is with git merge-base .","title":"Git Survival Guide"},{"location":"git/survival/#useful-resources","text":"That is, resoures that can help you solve problems or understand things, not resources that reiterate the man pages for you. Sitaram Chamarty's git concepts simplified Tv's Git for Computer Scientists","title":"Useful Resources"},{"location":"gossamer/","text":"Gossamer: A Decentralized Status-Sharing Network \u00b6 Twitter's pretty great. The short format encourages brief, pithy remarks, and the default assumption of visibility makes it super easy to pitch in on a conversation, or to find new people to listen to. Unfortunately, Twitter is a centralized system: one Bay-area company in the United States controls and mediates all Twitter interactions. From all appearances, Twitter, Inc. is relatively benign, as social media corporations go. There are few reports of censorship, and while their response to abuse of the Twitter network has not been consistently awesome, they can be made to listen. However, there exists the capacity for Twitter, Inc. to subvert the entire Twitter system, either voluntarily or at the behest of governments around the world. (Just ask Turkish people. Or the participants in the Arab Spring.) Gossamer is a Twitter-alike system, designed from the ground up to have no central authority. It resists censorship, enables individual participants to control their own data, and allows anyone at all to integrate new software into the Gossamer network. Gossamer does not exist, but if it did, the following notes describe what it might look like, and the factors to consider when implementing Gossamer as software. I have made fatal mistakes while writing it; I have not rushed to build it specifically because Twitter, Gossamer's model, is so deeply woven into so many peoples' lives. A successor must make fewer mistakes, not merely different mistakes, and certainly not more mistakes. 
The following is loosely inspired by Rumor Monger , at \u201cwhole world\u201d scale. Design Goals \u00b6 Users must be in control of their own privacy and identity at all times. (This is a major failing with Diaspora, which limits access to personal ownership of data by being hard to run.) Users must be able to communicate without the consent or support of an intermediate authority. Short of being completely offline, Gossamer should be resilient to infrastructural damage. Any functional communication system will be used for illicit purposes. This is an unavoidable consequence of being usable for legitimate purposes without a central authority. Rather than revealing illicit conversations, Gossamer should do what it can to preserve the anonymity and privacy of legitimate ones. All nodes are as equal as possible. The node I use is not more authoritative for messages from me than any other node. You can hear my words from anyone who has heard my words, and I can hear yours from anyone who has heard your words, so long as some variety of authenticity and privacy are maintained. If an identity's secrets are removed, a node should contain no data that correlates the owner with his or her Gossamer identities. Relaying and authoring must be as indistinguishable as possible, to limit the utility of traffic analysis. Public and Private Information \u00b6 Every piece of data Gossamer uses, either internally or to communicate with other ndoes, is classified as either public or private . Public information can be communicated to other nodes, and is assumed to be safe if recovered out of band. Private information includes anything which may be used to associate a Gossamer identity with the person who controls it, except as noted below. Gossamer must ensure users understand what information that they provide will be made public, and what will be kept private, so that they can better decide what, if anything, to share and so that they can better make decisions about their own safety and comfort against abusive parties. Internally, Gossamer always stores private information encrypted, and never transmits it to another node. Gossamer must provide a tool to safely obliterate private data. Public Information \u00b6 Details on the role of each piece of information are covered below. Public status updates, obviously. Gossamer exists to permit users to easily share short messages with one another. The opaque form of a user's incoming and outgoing private messages. The users' identities' public keys. (But not their relationship to one another.) Any information the user places in their profile. (This implies that profiles must not be auto-populated from, for example, the user's address book.) The set of identities verified by the user's identity. Any other information Gossamer retains must be private. Republishing \u00b6 Gossamer is built on the assumption that every participant is willing to act as a relay for every other participant. This is a complicated assumption at the human layer. Inevitably, someone will use the Gossamer network to communicate something morally repugnant or deeply illegal: the Silk Road guy, for example, got done for trying to contract someone to commit murder. Every Gossamer node is complicit in delivering those messages to the rest of the network, whether they're in the clear (status updates) or not (private messages). 
It's unclear how this interacts with the various legal frameworks, moral codes, and other social constructs throughout the world, and it's ethically troubling to put users in that position by default. The strong alternative, that each node only relay content with the controlling user's explicit and ongoing consent, is also troubling: it limits the Gossamer network's ability to deliver messages at all , and exposes information about which identities each node's owner considers interesting and publishable. I don't have an obvious resolution to this. Gossamer's underlying protocol relies on randomly-selected nodes being more likely to propagate a message than to ignore it, because this helps make Gossamer resilient to hostile users, nosy intelligence agencies, and others who believe communication must be restrictable. On the other hand, I'd like not to put a user in Taiwan at risk of legal or social reprisals because a total stranger in Canada decided to post something vile. (This is one of the reasons I haven't built the damn thing yet. Besides being A Lot Of Code, there's no way to shut off Gossamer once more than one node exists, and I want to be sure I've thought through what I'm doing before creating a prototype.) Identity in the Gossamer Network \u00b6 Every Gossamer message carries with it an identity . Gossamer identities are backed by public-key cryptography. However, unlike traditional public key systems such as GPG, Gossamer identities provide continuity , rather than authenticity : two Gossamer messages signed by the same key are from the same identity, but there is no inherent guarantee that that identity is legitimate. Gossamer maintains relationships between identities to allow users to verify the identities of one another, and to publish attestations of that to other Gossamer nodes. From this, Gossamer can recover much of GPG's \u201cweb of trust.\u201d TODO : revocation of identities, revocation of verifications. Both are important; novice users are likely to verify people poorly, and there should be a recovery path less drastic than GPG's \u201cyou swore it, you're stuck with it\u201d model. Gossamer encourages users to create additional identities as needed to, for example, support the separation of work and home conversations, or to provide anonymity when discussing reputationally-hazardous topics. Identities are not correlated by the Gossamer codebase. Each identity can optionally include a profile : a block of data describing the person behind the identity. The contents of a profile are chosen by the person holding the private key for an identity, and the profile is attached to every new message created with the corresponding identity. A user can update their profile at will; potentially, every message can be sent with a distinct profile. Gossamer software treats the profile it's seen with the highest timestamp as authoritative, retroactively applying it to old messages. Multiple Devices and Key Security \u00b6 A Gossamer identity is entirely contained in its private key. An identity's key must be stored safely, either using the host operating system's key management facilities or using a carefully-designed key store. Keys must not hit long-term storage unprotected; this may involve careful integration with the underlying OS's memory management facilities to avoid, eg., placing identities in swap. This is necessary to protect users from having their identities recovered against their will via, for example, hard drive forensics. 
Gossamer allows keys to be exported into password-encrypted archive files, which can be loaded into other Gossamer applications to allow them to share the same identity. GOSSAMER MUST TREAT THESE FILES WITH EXTREME CARE, BECAUSE USERS PROBABLY WON'T . Identity keys protect the user's Gossamer identity, but they also protect the user's private messages (see below) and other potentially identifying data. The export format must be designed to be as resilient as possible, and Gossamer's software must take care to ensure that \u201cused\u201d identity files are automatically destroyed safely wherever possible and to discourage users from following practices that weaken their own safety unknowingly. Exported identity files are intrinsically vulnerable to offline brute-force attacks; once obtained, an attacker can try any of the worryingly common passwords at will, and can easily validate a password by using the recovered keys to regenerate some known fact about the original, such as a verification or a message signature. This implies that exported identities must use a key derivation system which has a high computational cost and which is believed to be resilient to, for example, GPU-accelerated cracking. Secure deletion is a Hard Problem; where possible, Gossamer must use operating system-provided facilities for securely destroying files. Status Messages \u00b6 Status messages are messages visible to any interested Gossamer users. These are the primary purpose of Gossamer. Each contains up to 140 Unicode characters, a markup section allowing Gossamer to attach URLs and metadata (including Gossamer locators) to the text, and an attachments section carrying arbitrary MIME blobs of limited total size. All three sections are canonicalized ( TODO : how?) and signed by the publishing identity's private key. The public key, the identity's most recent profile, and the signed status message are combined into a single Gossamer message and injected into the user's Gossamer node exactly as if it had arrived from another node. Each Gossamer node maintains a follow list of identities whose messages the user is interested in seeing. When Gossamer receives a novel status message during a gossip exchange, it displays it to the user if and only if its identity is on the node's follow list. Otherwise, the message is not displayed, but will be shared onwards with other nodes. In this way, every Gossamer node acts as a relay for every other Gossamer node. If Gossamer receives a message signed by an identity it has seen attestations for, it attaches those attestations to the message before delivering them onwards. In this way, users' verifications of one another's identity spread through the network organically. Private Messages \u00b6 Gossamer can optionally encrypt messages, allowing users to send one another private messages. These messages are carried over the Gossamer network as normal, but only nodes holding the appropriate identity key can decrypt them and display them to the user. (At any given time, most Gossamer nodes hold many private messages they cannot decrypt.) Private messages do not carry the author's identity or full profile in the clear. The author's bare identity is included in the encrypted part of the message, to allow the intended recipient to identify the sender. TODO : sign-then-encrypt, or encrypt-then-sign? If sign-then-encrypt, are private messages exempted from the \u201cdrop broken messages\u201d rule above? 
Following Users \u00b6 Each Gossamer node maintains a database of followed identities. (This may or may not include the owner's own identity.) Any message stored in the node published by an identity in this database will be shown to the user in a timeline-esque view. Gossamer's follow list is purely local , and is not shared between nodes even if they have identities in common. The follow list is additionally stored encrypted using the node's identities (any one identity is sufficient to recover the list), to ensure that the follow list is not easily available to others without the node owner's permission. Exercises such as Finding Paul Revere have shown that the collection of graph edges showing who communicates with whom can often be sufficient to map identities into people. Gossamer attempts to restrict access to this data, believing it is not the network's place to know who follows whom. Verified Identities \u00b6 Gossamer allows identities to sign one another's public keys. These signatures form verifications . Gossamer considers an identity verified if any of the following hold: Gossamer has access to the identity key for the identity itself. Gossamer has access to the identity key for at least one of the identity's verifications. The identity is signed by at least three (todo: or however many, I didn't do the arithmetic yet) verified identities. Verified identities are marked in the user interface to make it obvious to the user whether a message is from a known friend or from an unknown identity. Gossamer allows users to sign new verifications for any identity they have seen. These verifications are initially stored locally, but will be published as messages transit the node as described below. Verification is a public fact: everyone can see which identities have verified which other identities. This is a potentially very powerful tool for reassociating identities with real-world people; Gossamer must make this clear to users. (I'm pretty sure you could find me, personally, just by watching whose identities I verify.) Each Gossamer node maintains a database of every verification it has ever seen or generated. If the node receives a message from an identity that appears in the verification database, and if the message is under some total size, Gossamer appends verifications from its database to the message before reinjecting it into the network. This allows verifications to propagate through the network. Blocking Users \u00b6 Any social network will attract hostile users who wish to disrupt the network or abuse its participants. Users must be able to filter out these users, and must not provide too much feedback to blocked users that could otherwise be used to circumvent blocks. Each Gossamer node maintains a database of blocked identities. Any message from an identity in this database, or from an identity that is verified by three or more identities in this database, will automatically be filtered out from display. (Additionally, transitively-blocked users will automatically be added to the block database. Blocking is contagious.) ( TODO : should Gossamer drop blocked messages? How does that interact with the inevitable \u201cshared blocklist\u201d systems that arise in any social network?) As with the follow list, the block database is encrypted using the node's identities. Gossamer encourages users to create new identities as often as they see fit and attempts to separate identities from one another as much as possible. This is fundamentally incompatible with strong blocking. 
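Both the verification rule and the blocking rule above reduce to small threshold checks over local databases. A minimal sketch follows, with the threshold of three taken from the placeholder in the text and all data shapes assumed for illustration.

```python
# Sketch of the threshold rules described above. The set-based storage and the
# threshold of 3 mirror the prose; everything else is illustrative.
THRESHOLD = 3  # "at least three ... verified identities", per the text

def is_verified(identity: str, own_keys: set[str],
                verifiers_of: dict[str, set[str]], known_verified: set[str]) -> bool:
    verifiers = verifiers_of.get(identity, set())
    return (
        identity in own_keys                         # we hold this identity's key
        or bool(verifiers & own_keys)                # we hold a verifier's key
        or len(verifiers & known_verified) >= THRESHOLD
    )

def is_blocked(identity: str, blocked: set[str],
               verifiers_of: dict[str, set[str]]) -> bool:
    if identity in blocked:
        return True
    if len(verifiers_of.get(identity, set()) & blocked) >= THRESHOLD:
        blocked.add(identity)  # "blocking is contagious": record the transitive block
        return True
    return False
```

A real implementation would compute the set of verified identities iteratively, to a fixed point; the sketch takes it as an input to keep the threshold logic visible.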
It will always be possible for a newly-created identity to deliver at least one message before being blocked. This is a major design problem ; advice encouraged. Gossamer Network Primitives \u00b6 The Gossamer network is built around a gossip protocol, wherein nodes connect to one another periodically to exchange messages with one another. Connections occur over the existing IP internet infrastructure, traversing NAT networks where possible to ensure that users on residential and corporate networks can still participate. Gossamer bootstraps its network using a number of paths: Gossamer nodes in the same broadcast domain discover one another using UDP broadcasts as well as Bonjour/mDNS. Gossamer can generate locator strings, which can be shared \u201cout of band\u201d via email, SMS messages, Twitter, graffiti, etc. Gossamer nodes share knowledge of nodes whenever they exchange messages, to allow the Gossamer network to recover from lost nodes and to permit nodes to remain on the network as \u201cknown\u201d nodes are lost to outages and entropy. Locators \u00b6 A Gossamer locator is a URL in the g scheme, carrying an encoding of one or more network addresses as well as an encoding of one or more identities (see below). Gossamer's software attempts to determine an appropriate identifier for any identities it holds based on the host computer's network configuration, taking into account issues like NAT traversal wherever possible. TODO : Gossamer and uPNP, what do locators look like? When presented with an identifier, Gossamer offers to follow the identities it contains, and uses the nodes whose addresses it contains to connect to the Gossamer network. This allows new clients to bootstrap into Gossamer, and provides an easy way for users to exchange Gossamer identities to connect to one another later. (Clever readers will note that the address list is actually independent of the identity list.) Gossip \u00b6 Each Gossamer node maintains a pair of \u201cfreshness\u201d databases, associating some information with a freshness score (expressed as an integer). One freshness database holds the addresses of known Gossamer nodes, and another holds Gossamer messages. Whenever two Gossamer nodes interact, each sends the other a Gossamer node from its current node database, and a message from its message database. When selecting an item to send for either category, Gossamer uses a random selection that weights towards items with a higher \u201cfreshness\u201d score. ( TODO : how?) When sending a fact, if the receiving node already knows the fact, both nodes decrement that fact's freshness by one. If the receiving node does not already know the fact, the sending node leaves its freshness unaltered, and the receiving node sets its freshness to the freshest possible value. This system encourages nodes to exchange \u201cfresh\u201d facts, then cease exchanging them as the network becomes aware of them. During each exchange, Gossamer nodes send each other one Gossamer node address, and one Gossamer message. Both nodes adjust their freshness databases, as above. If fact exchange fails while communicating with a Gossamer node, both nodes decrement their peer's freshness. Unreliable nodes can continue to initiate connections to other nodes, but will rarely be contacted by other Gossamer nodes. TODO : How do we avoid DDOSing brand-new gossamer nodes with the full might of Gossamer's network? TODO : Can we reuse Bittorrent's DHT system (BEP-5) to avoid having every node know the full network topology? 
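As one possible answer to the ( TODO : how?) above, and to pin down the freshness bookkeeping, here is a small sketch. The maximum freshness value, the use of freshness-plus-one as a selection weight, and the clamp at zero are all assumptions rather than design decisions.

```python
# Sketch of freshness-weighted gossip selection and the update rule described
# above. MAX_FRESHNESS and the weighting scheme are assumptions.
import random

MAX_FRESHNESS = 16  # assumed "freshest possible value"

def pick_fact(freshness: dict[str, int]) -> str:
    # Weighted random choice, biased toward fresher facts. Adding 1 keeps
    # fully stale facts selectable with low probability.
    facts = list(freshness)
    weights = [freshness[f] + 1 for f in facts]
    return random.choices(facts, weights=weights, k=1)[0]

def record_exchange(fact: str, sender: dict[str, int], receiver: dict[str, int]) -> None:
    if fact in receiver:
        # Receiver already knew it: both sides lose interest in repeating it.
        sender[fact] = max(0, sender[fact] - 1)
        receiver[fact] = max(0, receiver[fact] - 1)
    else:
        # New to the receiver: sender's score is untouched, receiver treats the
        # fact as maximally fresh and will now gossip it onward.
        receiver[fact] = MAX_FRESHNESS
```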
TODO : Are node-to-node exchanges encrypted? If so, why and how? Authenticity \u00b6 Gossamer node addresses are not authenticated. Gossamer relies on freshness to avoid delivering excess traffic to systems not participating in the Gossamer network. ( TODO : this is a shit system for avoiding DDOS, though.) Gossamer messages are partially authenticated: each carries with it a public key, and a signature. If the signature cannot be verified with the included public key, it must be discarded immediately and it must not be propagated to other nodes. The node delivering the message may also be penalized by having its freshness reduced in the receiving node's database. Gossip Triggers \u00b6 Gossamer triggers a new Gossip exchange under the following circumstances: 15 seconds, plus a random jitter between zero and 15 more seconds, elapse since the last exchange attempt. Gossamer completes an exchange wherein it learned a new fact from another node. A user injects a fact into Gossamer directly. Gossamer exchanges that fail, or that deliver only already-known facts, do not trigger further exchanges immediately. TODO : how do we prevent Gossamer from attempting to start an unbounded number of exchanges at the same time? Size \u00b6 Gossamer must not exhaust the user's disk. Gossamer discards extremely un-fresh messages, attempting to keep the on-disk size of the message database to under 10% of the total local storage, or under a user-configurable threshold. Gossamer rejects over-large messages. Public messages carry with them the author's profile and a potentially large collection of verifications. Messages over some size ( TODO what size?) are discarded on receipt without being stored, and the message exchange is considered to have failed.","title":"Gossamer: A Decentralized Status-Sharing Network"},{"location":"gossamer/#gossamer-a-decentralized-status-sharing-network","text":"Twitter's pretty great. The short format encourages brief, pithy remarks, and the default assumption of visibility makes it super easy to pitch in on a conversation, or to find new people to listen to. Unfortunately, Twitter is a centralized system: one Bay-area company in the United States controls and mediates all Twitter interactions. From all appearances, Twitter, Inc. is relatively benign, as social media corporations go. There are few reports of censorship, and while their response to abuse of the Twitter network has not been consistently awesome, they can be made to listen. However, there exists the capacity for Twitter, Inc. to subvert the entire Twitter system, either voluntarily or at the behest of governments around the world. (Just ask Turkish people. Or the participants in the Arab Spring.) Gossamer is a Twitter-alike system, designed from the ground up to have no central authority. It resists censorship, enables individual participants to control their own data, and allows anyone at all to integrate new software into the Gossamer network. Gossamer does not exist, but if it did, the following notes describe what it might look like, and the factors to consider when implementing Gossamer as software. I have made fatal mistakes while writing it; I have not rushed to build it specifically because Twitter, Gossamer's model, is so deeply woven into so many peoples' lives. A successor must make fewer mistakes, not merely different mistakes, and certainly not more mistakes. 
The following is loosely inspired by Rumor Monger , at \u201cwhole world\u201d scale.","title":"Gossamer: A Decentralized Status-Sharing Network"},{"location":"gossamer/#design-goals","text":"Users must be in control of their own privacy and identity at all times. (This is a major failing with Diaspora, which limits access to personal ownership of data by being hard to run.) Users must be able to communicate without the consent or support of an intermediate authority. Short of being completely offline, Gossamer should be resilient to infrastructural damage. Any functional communication system will be used for illicit purposes. This is an unavoidable consequence of being usable for legitimate purposes without a central authority. Rather than revealing illicit conversations, Gossamer should do what it can to preserve the anonymity and privacy of legitimate ones. All nodes are as equal as possible. The node I use is not more authoritative for messages from me than any other node. You can hear my words from anyone who has heard my words, and I can hear yours from anyone who has heard your words, so long as some variety of authenticity and privacy are maintained. If an identity's secrets are removed, a node should contain no data that correlates the owner with his or her Gossamer identities. Relaying and authoring must be as indistinguishable as possible, to limit the utility of traffic analysis.","title":"Design Goals"},{"location":"gossamer/#public-and-private-information","text":"Every piece of data Gossamer uses, either internally or to communicate with other nodes, is classified as either public or private . Public information can be communicated to other nodes, and is assumed to be safe if recovered out of band. Private information includes anything which may be used to associate a Gossamer identity with the person who controls it, except as noted below. Gossamer must ensure users understand what information they provide will be made public, and what will be kept private, so that they can better decide what, if anything, to share and so that they can better make decisions about their own safety and comfort against abusive parties. Internally, Gossamer always stores private information encrypted, and never transmits it to another node. Gossamer must provide a tool to safely obliterate private data.","title":"Public and Private Information"},{"location":"gossamer/#public-information","text":"Details on the role of each piece of information are covered below. Public status updates, obviously. Gossamer exists to permit users to easily share short messages with one another. The opaque form of a user's incoming and outgoing private messages. The users' identities' public keys. (But not their relationship to one another.) Any information the user places in their profile. (This implies that profiles must not be auto-populated from, for example, the user's address book.) The set of identities verified by the user's identity. Any other information Gossamer retains must be private.","title":"Public Information"},{"location":"gossamer/#republishing","text":"Gossamer is built on the assumption that every participant is willing to act as a relay for every other participant. This is a complicated assumption at the human layer. Inevitably, someone will use the Gossamer network to communicate something morally repugnant or deeply illegal: the Silk Road guy, for example, got done for trying to contract someone to commit murder. 
Every Gossamer node is complicit in delivering those messages to the rest of the network, whether they're in the clear (status updates) or not (private messages). It's unclear how this interacts with the various legal frameworks, moral codes, and other social constructs throughout the world, and it's ethically troubling to put users in that position by default. The strong alternative, that each node only relay content with the controlling user's explicit and ongoing consent, is also troubling: it limits the Gossamer network's ability to deliver messages at all , and exposes information about which identities each node's owner considers interesting and publishable. I don't have an obvious resolution to this. Gossamer's underlying protocol relies on randomly-selected nodes being more likely to propagate a message than to ignore it, because this helps make Gossamer resilient to hostile users, nosy intelligence agencies, and others who believe communication must be restrictable. On the other hand, I'd like not to put a user in Taiwan at risk of legal or social reprisals because a total stranger in Canada decided to post something vile. (This is one of the reasons I haven't built the damn thing yet. Besides being A Lot Of Code, there's no way to shut off Gossamer once more than one node exists, and I want to be sure I've thought through what I'm doing before creating a prototype.)","title":"Republishing"},{"location":"gossamer/#identity-in-the-gossamer-network","text":"Every Gossamer message carries with it an identity . Gossamer identities are backed by public-key cryptography. However, unlike traditional public key systems such as GPG, Gossamer identities provide continuity , rather than authenticity : two Gossamer messages signed by the same key are from the same identity, but there is no inherent guarantee that that identity is legitimate. Gossamer maintains relationships between identities to allow users to verify the identities of one another, and to publish attestations of that to other Gossamer nodes. From this, Gossamer can recover much of GPG's \u201cweb of trust.\u201d TODO : revocation of identities, revocation of verifications. Both are important; novice users are likely to verify people poorly, and there should be a recovery path less drastic than GPG's \u201cyou swore it, you're stuck with it\u201d model. Gossamer encourages users to create additional identities as needed to, for example, support the separation of work and home conversations, or to provide anonymity when discussing reputationally-hazardous topics. Identities are not correlated by the Gossamer codebase. Each identity can optionally include a profile : a block of data describing the person behind the identity. The contents of a profile are chosen by the person holding the private key for an identity, and the profile is attached to every new message created with the corresponding identity. A user can update their profile at will; potentially, every message can be sent with a distinct profile. Gossamer software treats the profile it's seen with the highest timestamp as authoritative, retroactively applying it to old messages.","title":"Identity in the Gossamer Network"},{"location":"gossamer/#multiple-devices-and-key-security","text":"A Gossamer identity is entirely contained in its private key. An identity's key must be stored safely, either using the host operating system's key management facilities or using a carefully-designed key store. 
Keys must not hit long-term storage unprotected; this may involve careful integration with the underlying OS's memory management facilities to avoid, eg., placing identities in swap. This is necessary to protect users from having their identities recovered against their will via, for example, hard drive forensics. Gossamer allows keys to be exported into password-encrypted archive files, which can be loaded into other Gossamer applications to allow them to share the same identity. GOSSAMER MUST TREAT THESE FILES WITH EXTREME CARE, BECAUSE USERS PROBABLY WON'T . Identity keys protect the user's Gossamer identity, but they also protect the user's private messages (see below) and other potentially identifying data. The export format must be designed to be as resilient as possible, and Gossamer's software must take care to ensure that \u201cused\u201d identity files are automatically destroyed safely wherever possible and to discourage users from following practices that weaken their own safety unknowingly. Exported identity files are intrinsically vulnerable to offline brute-force attacks; once obtained, an attacker can try any of the worryingly common passwords at will, and can easily validate a password by using the recovered keys to regenerate some known fact about the original, such as a verification or a message signature. This implies that exported identities must use a key derivation system which has a high computational cost and which is believed to be resilient to, for example, GPU-accelerated cracking. Secure deletion is a Hard Problem; where possible, Gossamer must use operating system-provided facilities for securely destroying files.","title":"Multiple Devices and Key Security"},{"location":"gossamer/#status-messages","text":"Status messages are messages visible to any interested Gossamer users. These are the primary purpose of Gossamer. Each contains up to 140 Unicode characters, a markup section allowing Gossamer to attach URLs and metadata (including Gossamer locators) to the text, and an attachments section carrying arbitrary MIME blobs of limited total size. All three sections are canonicalized ( TODO : how?) and signed by the publishing identity's private key. The public key, the identity's most recent profile, and the signed status message are combined into a single Gossamer message and injected into the user's Gossamer node exactly as if it had arrived from another node. Each Gossamer node maintains a follow list of identities whose messages the user is interested in seeing. When Gossamer receives a novel status message during a gossip exchange, it displays it to the user if and only if its identity is on the node's follow list. Otherwise, the message is not displayed, but will be shared onwards with other nodes. In this way, every Gossamer node acts as a relay for every other Gossamer node. If Gossamer receives a message signed by an identity it has seen attestations for, it attaches those attestations to the message before delivering them onwards. In this way, users' verifications of one another's identity spread through the network organically.","title":"Status Messages"},{"location":"gossamer/#private-messages","text":"Gossamer can optionally encrypt messages, allowing users to send one another private messages. These messages are carried over the Gossamer network as normal, but only nodes holding the appropriate identity key can decrypt them and display them to the user. (At any given time, most Gossamer nodes hold many private messages they cannot decrypt.) 
Private messages do not carry the author's identity or full profile in the clear. The author's bare identity is included in the encrypted part of the message, to allow the intended recipient to identify the sender. TODO : sign-then-encrypt, or encrypt-then-sign? If sign-then-encrypt, are private messages exempted from the \u201cdrop broken messages\u201d rule above?","title":"Private Messages"},{"location":"gossamer/#following-users","text":"Each Gossamer node maintains a database of followed identities. (This may or may not include the owner's own identity.) Any message stored in the node published by an identity in this database will be shown to the user in a timeline-esque view. Gossamer's follow list is purely local , and is not shared between nodes even if they have identities in common. The follow list is additionally stored encrypted using the node's identities (any one identity is sufficient to recover the list), to ensure that the follow list is not easily available to others without the node owner's permission. Exercises such as Finding Paul Revere have shown that the collection of graph edges showing who communicates with whom can often be sufficient to map identities into people. Gossamer attempts to restrict access to this data, believing it is not the network's place to know who follows who.","title":"Following Users"},{"location":"gossamer/#verified-identities","text":"Gossamer allows identities to sign one anothers' public keys. These signatures form verifications . Gossamer considers an identity verified if any of the following hold: Gossamer has access to the identity key for the identity itself. Gossamer has access to the identity key for at least one of the identity's verifications. The identity is signed by at least three (todo: or however many, I didn't do the arithmetic yet) verified identities. Verified identities are marked in the user interface to make it obvious to the user whether a message is from a known friend or from an unknown identity. Gossamer allows users to sign new verifications for any identity they have seen. These verifications are initially stored locally, but will be published as messages transit the node as described below. Verification is a public fact: everyone can see which identities have verified which other identities. This is a potentially very powerful tool for reassociating identities with real-world people; Gossamer must make this clear to users. (I'm pretty sure you could find me, personally, just by watching whose identities I verify.) Each Gossamer node maintains a database of every verification it has ever seen or generated. If the node receives a message from an identity that appears in the verification database, and if the message is under some total size, Gossamer appends verifications from its database to the message before reinjecting it into the network. This allows verifications to propagate through","title":"Verified Identities"},{"location":"gossamer/#blocking-users","text":"Any social network will attract hostile users who wish to disrupt the network or abuse its participants. Users must be able to filter out these users, and must not provide too much feedback to blocked users that could otherwise be used to circumvent blocks. Each Gossamer node maintains a database of blocked identities. Any message from an identity in this database, or from an identity that is verified by three or more identities in this database, will automatically be filtered out from display. 
(Additionally, transitively-blocked users will automatically be added to the block database. Blocking is contagious.) ( TODO : should Gossamer drop blocked messages? How does that interact with the inevitable \u201cshared blocklist\u201d systems that arise in any social network?) As with the follow list, the block database is encrypted using the node's identities. Gossamer encourages users to create new identities as often as they see fit and attempts to separate identities from one another as much as possible. This is fundamentally incompatible with strong blocking. It will always be possible for a newly-created identity to deliver at least one message before being blocked. This is a major design problem ; advice encouraged.","title":"Blocking Users"},{"location":"gossamer/#gossamer-network-primitives","text":"The Gossamer network is built around a gossip protocol, wherein nodes connect to one another periodically to exchange messages with one another. Connections occur over the existing IP internet infrastructure, traversing NAT networks where possible to ensure that users on residential and corporate networks can still participate. Gossamer bootstraps its network using a number of paths: Gossamer nodes in the same broadcast domain discover one another using UDP broadcasts as well as Bonjour/mDNS. Gossamer can generate locator strings, which can be shared \u201cout of band\u201d via email, SMS messages, Twitter, graffiti, etc. Gossamer nodes share knowledge of nodes whenever they exchange messages, to allow the Gossamer network to recover from lost nodes and to permit nodes to remain on the network as \u201cknown\u201d nodes are lost to outages and entropy.","title":"Gossamer Network Primitives"},{"location":"gossamer/#locators","text":"A Gossamer locator is a URL in the g scheme, carrying an encoding of one or more network addresses as well as an encoding of one or more identities (see below). Gossamer's software attempts to determine an appropriate identifier for any identities it holds based on the host computer's network configuration, taking into account issues like NAT traversal wherever possible. TODO : Gossamer and uPNP, what do locators look like? When presented with an identifier, Gossamer offers to follow the identities it contains, and uses the nodes whose addresses it contains to connect to the Gossamer network. This allows new clients to bootstrap into Gossamer, and provides an easy way for users to exchange Gossamer identities to connect to one another later. (Clever readers will note that the address list is actually independent of the identity list.)","title":"Locators"},{"location":"gossamer/#gossip","text":"Each Gossamer node maintains a pair of \u201cfreshness\u201d databases, associating some information with a freshness score (expressed as an integer). One freshness database holds the addresses of known Gossamer nodes, and another holds Gossamer messages. Whenever two Gossamer nodes interact, each sends the other a Gossamer node from its current node database, and a message from its message database. When selecting an item to send for either category, Gossamer uses a random selection that weights towards items with a higher \u201cfreshness\u201d score. ( TODO : how?) When sending a fact, if the receiving node already knows the fact, both nodes decrement that fact's freshness by one. If the receiving node does not already know the fact, the sending node leaves its freshness unaltered, and the receiving node sets its freshness to the freshest possible value. 
This system encourages nodes to exchange \u201cfresh\u201d facts, then cease exchanging them as the network becomes aware of them. During each exchange, Gossamer nodes send each other one Gossamer node address, and one Gossamer message. Both nodes adjust their freshness databases, as above. If fact exchange fails while communicating with a Gossamer node, both nodes decrement their peer's freshness. Unreliable nodes can continue to initiate connections to other nodes, but will rarely be contacted by other Gossamer nodes. TODO : How do we avoid DDOSing brand-new gossamer nodes with the full might of Gossamer's network? TODO : Can we reuse Bittorrent's DHT system (BEP-5) to avoid having every node know the full network topology? TODO : Are node-to-node exchanges encrypted? If so, why and how?","title":"Gossip"},{"location":"gossamer/#authenticity","text":"Gossamer node addresses are not authenticated. Gossamer relies on freshness to avoid delivering excess traffic to systems not participating in the Gossamer network. ( TODO : this is a shit system for avoiding DDOS, though.) Gossamer messages are partially authenticated: each carries with it a public key, and a signature. If the signature cannot be verified with the included public key, it must be discarded immediately and it must not be propagated to other nodes. The node delivering the message may also be penalized by having its freshness reduced in the receiving node's database.","title":"Authenticity"},{"location":"gossamer/#gossip-triggers","text":"Gossamer triggers a new Gossip exchange under the following circumstances: 15 seconds, plus a random jitter between zero and 15 more seconds, elapse since the last exchange attempt. Gossamer completes an exchange wherein it learned a new fact from another node. A user injects a fact into Gossamer directly. Gossamer exchanges that fail, or that deliver only already-known facts, do not trigger further exchanges immediately. TODO : how do we prevent Gossamer from attempting to start an unbounded number of exchanges at the same time?","title":"Gossip Triggers"},{"location":"gossamer/#size","text":"Gossamer must not exhaust the user's disk. Gossamer discards extremely un-fresh messages, attempting to keep the on-disk size of the message database to under 10% of the total local storage, or under a user-configurable threshold. Gossamer rejects over-large messages. Public messages carry with them the author's profile and a potentially large collection of verifications. Messages over some size ( TODO what size?) are discarded on receipt without being stored, and the message exchange is considered to have failed.","title":"Size"},{"location":"gossamer/coda/","text":"A Coda \u00b6 Kit : How would you make a site where the server operator can't get at a user's data, and given handling complaints and the fact that people can still screen cap receipts etc, would you? Is it a valuable goal? 
Owen : That's what torpedoed my interest in developing gossamer further, honestly meg laid out an abuse case so dismal that I consider the whole concept compromised centralizing the service a little - mastodon-ishly, say - improves the situation a bit, but if they can't get at their users' data their options are limited I think secrecy and republication resilience are kind of non-goals, and the lesson I took is that accountability (and thus locality and continuity of identity) are way more important specifically accountability between community members, not accountability to the operator or to the state","title":"A Coda"},{"location":"gossamer/coda/#a-coda","text":"Kit : How would you make a site where the server operator can't get at a user's data, and given handling complaints and the fact that people can still screen cap receipts etc, would you? Is it a valuable goal? Owen : That's what torpedoed my interest in developing gossamer further, honestly meg laid out an abuse case so dismal that I consider the whole concept compromised centralizing the service a little - mastodon-ishly, say - improves the situation a bit, but if they can't get at their users' data their options are limited I think secrecy and republication resilience are kind of non-goals, and the lesson I took is that accountability (and thus locality and continuity of identity) are way more important specifically accountability between community members, not accountability to the operator or to the state","title":"A Coda"},{"location":"gossamer/mistakes/","text":"Design Mistakes \u00b6 Is Gossamer Up? \u00b6 @megtastique points out that two factors doom the whole design: There's no way to remove content from Gossamer once it's published, and Gossamer can anonymously share images. Combined, these make Gossamer the perfect vehicle for revenge porn and other gendered, sexually-loaded network abuse. This alone is enough to doom the design, as written: even restricting the size of messages to the single kilobyte range still makes it trivial to irrevocably disseminate links to similar content. Protected Feeds? Who Needs Those? \u00b6 Gossamer's design does not carry forward an important Twitter feature: the protected feed. In brief, protected feeds allow people to be choosy about who reads their status updates, without necessarily having to pick and choose who gets to read them on a message by message basis. This is an important privacy control for people who wish to engage with people they know without necessarily disclosing their whereabouts and activities to the world at large. In particular, it's important to vulnerable people because it allows them to create their own safe spaces. Protected feeds are not mere technology, either. Protected feeds carry with them social expectations: Twitter clients often either refuse to copy text from a protected feed, or present a warning when the user tries to copy text, which acts as a very cheap and, apparently, quite effective brake on the casual re-sharing that Twitter encourages for public feeds. DDOS As A Service \u00b6 Gossamer's network protocol converges towards a total graph, where every node knows how to connect to every other node, and new information (new posts) rapidly push out to every single node. If you've ever been privy to the Twitter \u201cfirehose\u201d feed, you'll understand why this is a drastic mistake. Even a moderately successful social network sees on the order of millions of messages a day. 
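A rough back-of-the-envelope figure makes the scale of the mistake concrete; the message volume and per-message size below are assumed numbers for illustration, not measurements.

```python
# Assumed figures: 5 million public messages per day network-wide, ~1 KiB per
# message once profiles and verifications are attached.
messages_per_day = 5_000_000
bytes_per_message = 1024
per_node_per_day = messages_per_day * bytes_per_message
print(per_node_per_day / 2**30)  # ≈ 4.8 GiB per day delivered to *every* node
```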
Delivering all of this directly to every node all of the time would rapidly drown users in bandwidth charges and render their internet connections completely unusable. Gossamer's design also has no concept of \u201cquiet\u201d periods: every fifteen to thirty seconds, rain or shine, every node is supposed to wake up and exchange data with some other node, regardless of how long it's been since either node in the exchange has seen new data. This very effectively ensures that Gossamer will continue to flood nodes with traffic at all times; the only way to halt the flood is to shut off the Gossamer client. Passive Nodes Matter \u00b6 It's impractical to run an inbound data service on a mobile device. Mobile devices are, by and large, not addressable or reachable by the internet at large. Mobile devices also provide a huge proportion of Twitter's content: the ability to rapidly post photos, location tags, and short text while away from desks, laptops, and formal internet connections is a huge boon for ad-hoc social organization. You can invite someone to the pub from your phone, from in front of the pub. (This interacts ... poorly with the DDOS point, above.) Traffic Analysis \u00b6 When a user enters a new status update or sends a new private message, their Gossamer node immediately forwards it to at least one other node to inject it into the network. This makes unencrypted Gossamer relatively vulnerable to traffic analysis for correlating Gossamer identities with human beings. Someone at a network \u201cpinch point\u201d -- an ISP, or a coffee shop wifi router -- can monitor Gossamer traffic entering and exiting nodes on their network and easily identify which nodes originated which messages, and thus which nodes have access to which identities. This seriously compromises the effectiveness of Gossamer's decentralized, self-certifying identities.","title":"Design Mistakes"},{"location":"gossamer/mistakes/#design-mistakes","text":"","title":"Design Mistakes"},{"location":"gossamer/mistakes/#is-gossamer-up","text":"@megtastique points out that two factors doom the whole design: There's no way to remove content from Gossamer once it's published, and Gossamer can anonymously share images. Combined, these make Gossamer the perfect vehicle for revenge porn and other gendered, sexually-loaded network abuse. This alone is enough to doom the design, as written: even restricting the size of messages to the single kilobyte range still makes it trivial to irrevocably disseminate links to similar content.","title":"Is Gossamer Up?"},{"location":"gossamer/mistakes/#protected-feeds-who-needs-those","text":"Gossamer's design does not carry forward an important Twitter feature: the protected feed. In brief, protected feeds allow people to be choosy about who reads their status updates, without necessarily having to pick and choose who gets to read them on a message by message basis. This is an important privacy control for people who wish to engage with people they know without necessarily disclosing their whereabouts and activities to the world at large. In particular, it's important to vulnerable people because it allows them to create their own safe spaces. Protected feeds are not mere technology, either. 
Protected feeds carry with them social expectations: Twitter clients often either refuse to copy text from a protected feed, or present a warning when the user tries to copy text, which acts as a very cheap and, apparently, quite effective brake on the casual re-sharing that Twitter encourages for public feeds.","title":"Protected Feeds? Who Needs Those?"},{"location":"gossamer/mistakes/#ddos-as-a-service","text":"Gossamer's network protocol converges towards a total graph, where every node knows how to connect to every other node, and new information (new posts) rapidly push out to every single node. If you've ever been privy to the Twitter \u201cfirehose\u201d feed, you'll understand why this is a drastic mistake. Even a moderately successful social network sees on the order of millions of messages a day. Delivering all of this directly to every node all of the time would rapidly drown users in bandwidth charges and render their internet connections completely unusable. Gossamer's design also has no concept of \u201cquiet\u201d periods: every fifteen to thirty seconds, rain or shine, every node is supposed to wake up and exchange data with some other node, regardless of how long it's been since either node in the exchange has seen new data. This very effectively ensures that Gossamer will continue to flood nodes with traffic at all times; the only way to halt the flood is to shut off the Gossamer client.","title":"DDOS As A Service"},{"location":"gossamer/mistakes/#passive-nodes-matter","text":"It's impractical to run an inbound data service on a mobile device. Mobile devices are, by and large, not addressable or reachable by the internet at large. Mobile devices also provide a huge proportion of Twitter's content: the ability to rapidly post photos, location tags, and short text while away from desks, laptops, and formal internet connections is a huge boon for ad-hoc social organization. You can invite someone to the pub from your phone, from in front of the pub. (This interacts ... poorly with the DDOS point, above.)","title":"Passive Nodes Matter"},{"location":"gossamer/mistakes/#traffic-analysis","text":"When a user enters a new status update or sends a new private message, their Gossamer node immediately forwards it to at least one other node to inject it into the network. This makes unencrypted Gossamer relatively vulnerable to traffic analysis for correlating Gossamer identities with human beings. Someone at a network \u201cpinch point\u201d -- an ISP, or a coffee shop wifi router -- can monitor Gossamer traffic entering and exiting nodes on their network and easily identify which nodes originated which messages, and thus which nodes have access to which identities. This seriously compromises the effectiveness of Gossamer's decentralized, self-certifying identities.","title":"Traffic Analysis"},{"location":"nomic/","text":"Nomic \u00b6 Nomic is a game invented in 1982 by Peter Suber, as an appendix to his PhD thesis The Paradox of Self-Amendment . In Nomic, the primary move available to the players is to change the rules of the game in a structured way. Nomic itself was intended as a minimalist study of procedural law, but it has been played very successfully by many groups over the years. I first played Nomic through Agora , a long-running Nomic of a heavily procedural bent (as opposed to variants like BlogNomic, that have developed in much more whimsical directions). 
I've found the game, and the communities that have sprung up around the game, deeply fascinating as a way to examine how groups reach consensus and exercise decisions. I briefly experimented with the notion of running a procedural Nomic - a mini-Agora - via Github, and produced two documents: Notes Towards Initial Rules for a Github Nomic Github Nomic Rules","title":"Nomic"},{"location":"nomic/#nomic","text":"Nomic is a game invented in 1982 by Peter Suber, as an appendix to his PhD thesis The Paradox of Self-Amendment . In Nomic, the primary move available to the players is to change the rules of the game in a structured way. Nomic itself was intended as a minimalist study of procedural law, but it has been played very successfully by many groups over the years. I first played Nomic through Agora , a long-running Nomic of a heavily procedural bent (as opposed to variants like BlogNomic, that have developed in much more whimsical directions). I've found the game, and the communities that have sprung up around the game, deeply fascinating as a way to examine how groups reach consensus and exercise decisions. I briefly experimented with the notion of running a procedural Nomic - a mini-Agora - via Github, and produced two documents: Notes Towards Initial Rules for a Github Nomic Github Nomic Rules","title":"Nomic"},{"location":"nomic/notes/","text":"Notes Towards Initial Rules for a Github Nomic \u00b6 This document is not part of the rules of a Nomic, and is present solely as a guide to the design of this initial ruleset , for play on Github. It should be removed before the game starts, and at no time should it be consulted to guide gameplay directly. Peter Suber's Nomic is a game of rule-making for one or more players. For details on the rationale behind the game and the reasons the game might be interesting, see Suber's own description. Changes from Suber's Rules \u00b6 Format \u00b6 I've marked up Suber's rules into Markdown, one of Github's \u201cnative\u201d text markup formats. This highly-structured format produces quite readable results when viewed through the Github website, and allows useful things like HTML links that point to specific rules. I've also made some diff-friendliness choices around the structure of those Markdown documents. For want of a better idea, the source documents are line-broken with one sentence per line, so that diffs naturally span whole sentences rather than arbitrarily-wrapped text (or unwrapped text). Since Github automatically recombines sequences of non-blank lines into a single HTML paragraph, the rendering on the web site is still quite readable. I have not codified this format in the rules themselves. Asynchrony \u00b6 In its original form, Nomic is appropriate for face-to-face play. The rules assume that it is practical for the players to identify one another using out-of-game context, and that it is practical for the players to take turns. Each player is expected to wait indefinitely (or, more likely, to apply non-game social pressure) if the preceding player takes inordinately long to complete their turn. Similarly, Judgement interrupts the flow of game play and brings turns to a stop. This Nomic is to be played on Github, and the players are not likely to be present simultaneously, or to be willing to wait indefinitely. It's possible for Suber's original Nomic rules to be amended, following themselves, into a form suitable for asynchronous play. 
This has happened several times: for examples, see Agora and BlogNomic , though there are a multitude of others. However, this process of amendment takes time , and, starting from Suber's initial rules, would require a period of one-turn-at-a-time rule-changes before the game could be played more naturally in the Github format. This period is not very interesting, and is incredibly demanding of the initial players' attention spans. In the interests of preserving the players' time, I have modified Suber's initial ruleset to replace sequential play with a simple asynchronous model of play. In summary: Every player can begin a turn at any time, even during another player's (or players') turn, so long as they aren't already taking a turn. Actions can be resolved in any order, depending on which proposals players choose to vote on, and in what order. The initial rules allow for players to end their turns without gathering every vote, once gameplay has proceeded far enough for non-unanimous votes to be possible. I have attempted to leave the rules as close to Suber's original rules as possible otherwise while implementing this change to the initial ruleset. I have faith that the process of playing Nomic will correct any deficiencies, or, failing that, will clearly identify where these changes break the game entirely. I have, as far as I am able, emulated Suber's preference for succinctness over thoroughness, and resisted the urge to fix or clarify rules even where defects seem obvious to me. In spite of my temptation to remove it, I have even left the notion of \u201cwinning\u201d intact. Rule-numbering \u00b6 The intent of this Nomic is to explore the suitability of Github's suite of tools for proposing, reviewing, and accepting changes to a corpus of text are suitable for self-governed rulemaking processes, as modelled by Nomic. Note that this is a test of Github, not of Git: it is appropriate and intended that the players rely on non-Git elements of Github's workflow (issues, wiki pages, Github Pages, and so on), and similarly it is appropriate and intended that the authentic copy of the game in play is the Github project hosting it, not the Git repo the project contains, and certainly not forks of the project or other clones of the repository. To support this intention, I have re-labelled the initial rules with negative numbers, rather than digits, so that proposals can be numbered starting from 1 without colliding with existing rules, and so that they can be numbered by their Pull Requests and Github issue numbers. (A previous version of these rules used Roman numerals for the initial rules. However, correctly accounting for the priority of new rules over initial rules, following Suber, required more changes than I was comfortable making to Suber's ruleset.) I have made it explicit in these initial rules that Github, not the players, assigns numbers to proposals. This is the only rule which mentions Github by name. I have not explicitly specified that the proposals should be implemented through pull requests; this is an intentional opportunity for player creativity. Projects & Ideas \u00b6 A small personal collection of other ideas to explore: Repeal or replace the victory criteria entirely \u00b6 \u201cWinning\u201d is not an objective I'm personally interested in, and Suber's race to 200 points by popularity of proposal is structurally quite dull. 
If the game is to have a victory condition, it should be built from the ground up to meet the players' motivations, rather than being retrofitted onto the points-based system. Codify the use of Git commits, rather than prose, for rules-changes \u00b6 This is unstated in this ruleset, despite being part of my intention for playing. So is the relationship between proposals and the Git repository underpinning the Github project hosting the game. Clarify the immigration and exit procedures \u00b6 The question of who the players are , or how one becomes a player, is left intentionally vague. In Suber's original rules, it appears that the players are those who are engaged in playing the game: tautological on paper, but inherently obvious by simple observation of the playing-space. On Github, the answer to this question may not be so simple. A public repository is visible to anyone with an internet connection, and will accept proposed pull requests (and issue reports) equally freely. This suggests that either everyone is, inherently, a player, or that player-ness is somehow a function of engaging with the game. I leave it to the players to resolve this situation to their own satisfaction, but my suggestion is to track player-ness using repository collaborators or organization member accounts. Figure out how to regulate the use of Github features \u00b6 Nomic, as written, largely revolves around sequential proposals. That's fine as far as it goes, but Github has a very wide array of project management features - and that set of features changes over time, outside the control of the players, as Github roll out improvements (and, sometimes, break things). Features of probable interest: The gh-pages branch and associated web site. Issue and pull request tagging and approval settings. Third-party integrations. Whether to store non-rule state, as such arises, in the repository, or in the wiki, or elsewhere. Pull request reactions and approvals. The mutability of most Github features. Expand the rules-change process to permit a single proposal to amend many rules \u00b6 This is a standard rules patch, as Suber's initial rule-set is (I believe intentionally) very restrictive. This may turn out to be less relevant on Github, if players are allowed to submit turns in rapid succession with themselves. Transition from immediate amendment to a system of sessions \u00b6 Why not? Parliamentary procedure is fun, right? In an asynchronous environment, the discrete phases of a session system (where proposals are gathered, then debated, then voted upon, then enacted as a unit) might be a better fit for the Github mode of play. Evaluate other models of proposal vetting besides majority vote \u00b6 Github open source projects regularly have a small core team of maintainers supporting a larger group of users. Is it possible to mirror this structure in Nomic? Is it wise to do so? I suspect this is only possible with an inordinately large number of players, but Github could, at least in principle, support that number of players. Note that this is a fairly standard Nomic passtime.","title":"Notes Towards Initial Rules for a Github Nomic"},{"location":"nomic/notes/#notes-towards-initial-rules-for-a-github-nomic","text":"This document is not part of the rules of a Nomic, and is present solely as a guide to the design of this initial ruleset , for play on Github. It should be removed before the game starts, and at no time should it be consulted to guide gameplay directly. 
Peter Suber's Nomic is a game of rule-making for one or more players. For details on the rationale behind the game and the reasons the game might be interesting, see Suber's own description.","title":"Notes Towards Initial Rules for a Github Nomic"},{"location":"nomic/notes/#changes-from-subers-rules","text":"","title":"Changes from Suber's Rules"},{"location":"nomic/notes/#format","text":"I've marked up Suber's rules into Markdown, one of Github's \u201cnative\u201d text markup formats. This highly-structured format produces quite readable results when viewed through the Github website, and allows useful things like HTML links that point to specific rules. I've also made some diff-friendliness choices around the structure of those Markdown documents. For want of a better idea, the source documents are line-broken with one sentence per line, so that diffs naturally span whole sentences rather than arbitrarily-wrapped text (or unwrapped text). Since Github automatically recombines sequences of non-blank lines into a single HTML paragraph, the rendering on the web site is still quite readable. I have not codified this format in the rules themselves.","title":"Format"},{"location":"nomic/notes/#asynchrony","text":"In its original form, Nomic is appropriate for face-to-face play. The rules assume that it is practical for the players to identify one another using out-of-game context, and that it is practical for the players to take turns. Each player is expected to wait indefinitely (or, more likely, to apply non-game social pressure) if the preceding player takes inordinately long to complete their turn. Similarly, Judgement interrupts the flow of game play and brings turns to a stop. This Nomic is to be played on Github, and the players are not likely to be present simultaneously, or to be willing to wait indefinitely. It's possible for Suber's original Nomic rules to be amended, following themselves, into a form suitable for asynchronous play. This has happened several times: for examples, see Agora and BlogNomic , though there are a multitude of others. However, this process of amendment takes time , and, starting from Suber's initial rules, would require a period of one-turn-at-a-time rule-changes before the game could be played more naturally in the Github format. This period is not very interesting, and is incredibly demanding of the initial players' attention spans. In the interests of preserving the players' time, I have modified Suber's initial ruleset to replace sequential play with a simple asynchronous model of play. In summary: Every player can begin a turn at any time, even during another player's (or players') turn, so long as they aren't already taking a turn. Actions can be resolved in any order, depending on which proposals players choose to vote on, and in what order. The initial rules allow for players to end their turns without gathering every vote, once gameplay has proceeded far enough for non-unanimous votes to be possible. I have attempted to leave the rules as close to Suber's original rules as possible otherwise while implementing this change to the initial ruleset. I have faith that the process of playing Nomic will correct any deficiencies, or, failing that, will clearly identify where these changes break the game entirely. I have, as far as I am able, emulated Suber's preference for succinctness over thoroughness, and resisted the urge to fix or clarify rules even where defects seem obvious to me. 
In spite of my temptation to remove it, I have even left the notion of \u201cwinning\u201d intact.","title":"Asynchrony"},{"location":"nomic/notes/#rule-numbering","text":"The intent of this Nomic is to explore the suitability of Github's suite of tools for proposing, reviewing, and accepting changes to a corpus of text are suitable for self-governed rulemaking processes, as modelled by Nomic. Note that this is a test of Github, not of Git: it is appropriate and intended that the players rely on non-Git elements of Github's workflow (issues, wiki pages, Github Pages, and so on), and similarly it is appropriate and intended that the authentic copy of the game in play is the Github project hosting it, not the Git repo the project contains, and certainly not forks of the project or other clones of the repository. To support this intention, I have re-labelled the initial rules with negative numbers, rather than digits, so that proposals can be numbered starting from 1 without colliding with existing rules, and so that they can be numbered by their Pull Requests and Github issue numbers. (A previous version of these rules used Roman numerals for the initial rules. However, correctly accounting for the priority of new rules over initial rules, following Suber, required more changes than I was comfortable making to Suber's ruleset.) I have made it explicit in these initial rules that Github, not the players, assigns numbers to proposals. This is the only rule which mentions Github by name. I have not explicitly specified that the proposals should be implemented through pull requests; this is an intentional opportunity for player creativity.","title":"Rule-numbering"},{"location":"nomic/notes/#projects-ideas","text":"A small personal collection of other ideas to explore:","title":"Projects &amp; Ideas"},{"location":"nomic/notes/#repeal-or-replace-the-victory-criteria-entirely","text":"\u201cWinning\u201d is not an objective I'm personally interested in, and Suber's race to 200 points by popularity of proposal is structurally quite dull. If the game is to have a victory condition, it should be built from the ground up to meet the players' motivations, rather than being retrofitted onto the points-based system.","title":"Repeal or replace the victory criteria entirely"},{"location":"nomic/notes/#codify-the-use-of-git-commits-rather-than-prose-for-rules-changes","text":"This is unstated in this ruleset, despite being part of my intention for playing. So is the relationship between proposals and the Git repository underpinning the Github project hosting the game.","title":"Codify the use of Git commits, rather than prose, for rules-changes"},{"location":"nomic/notes/#clarify-the-immigration-and-exit-procedures","text":"The question of who the players are , or how one becomes a player, is left intentionally vague. In Suber's original rules, it appears that the players are those who are engaged in playing the game: tautological on paper, but inherently obvious by simple observation of the playing-space. On Github, the answer to this question may not be so simple. A public repository is visible to anyone with an internet connection, and will accept proposed pull requests (and issue reports) equally freely. This suggests that either everyone is, inherently, a player, or that player-ness is somehow a function of engaging with the game. 
I leave it to the players to resolve this situation to their own satisfaction, but my suggestion is to track player-ness using repository collaborators or organization member accounts.","title":"Clarify the immigration and exit procedures"},{"location":"nomic/notes/#figure-out-how-to-regulate-the-use-of-github-features","text":"Nomic, as written, largely revolves around sequential proposals. That's fine as far as it goes, but Github has a very wide array of project management features - and that set of features changes over time, outside the control of the players, as Github roll out improvements (and, sometimes, break things). Features of probable interest: The gh-pages branch and associated web site. Issue and pull request tagging and approval settings. Third-party integrations. Whether to store non-rule state, as such arises, in the repository, or in the wiki, or elsewhere. Pull request reactions and approvals. The mutability of most Github features.","title":"Figure out how to regulate the use of Github features"},{"location":"nomic/notes/#expand-the-rules-change-process-to-permit-a-single-proposal-to-amend-many-rules","text":"This is a standard rules patch, as Suber's initial rule-set is (I believe intentionally) very restrictive. This may turn out to be less relevant on Github, if players are allowed to submit turns in rapid succession with themselves.","title":"Expand the rules-change process to permit a single proposal to amend many rules"},{"location":"nomic/notes/#transition-from-immediate-amendment-to-a-system-of-sessions","text":"Why not? Parliamentary procedure is fun, right? In an asynchronous environment, the discrete phases of a session system (where proposals are gathered, then debated, then voted upon, then enacted as a unit) might be a better fit for the Github mode of play.","title":"Transition from immediate amendment to a system of sessions"},{"location":"nomic/notes/#evaluate-other-models-of-proposal-vetting-besides-majority-vote","text":"Github open source projects regularly have a small core team of maintainers supporting a larger group of users. Is it possible to mirror this structure in Nomic? Is it wise to do so? I suspect this is only possible with an inordinately large number of players, but Github could, at least in principle, support that number of players. Note that this is a fairly standard Nomic passtime.","title":"Evaluate other models of proposal vetting besides majority vote"},{"location":"nomic/rules/","text":"Github Nomic Rules \u00b6 Immutable Rules \u00b6 Rule -216. \u00b6 All players must always abide by all the rules then in effect, in the form in which they are then in effect. The rules in the Initial Set are in effect whenever a game begins. The Initial Set consists of rules -216 through -201 (immutable) and rules -112 through -101 (mutable). Rule -215. \u00b6 Initially, rules -216 through -201 are immutable, and rules -112 through -101 are mutable. Rules subsequently enacted or transmuted (that is, changed from immutable to mutable or vice versa) may be immutable or mutable regardless of their numbers, and rules in the Initial Set may be transmuted regardless of their numbers. Rule -214. \u00b6 A rule-change is any of the following: the enactment, repeal, or amendment of a mutable rule; the enactment, repeal, or amendment of an amendment of a mutable rule; or the transmutation of an immutable rule into a mutable rule or vice versa. 
(Note: This definition implies that, at least initially, all new rules are mutable; immutable rules, as long as they are immutable, may not be amended or repealed; mutable rules, as long as they are mutable, may be amended or repealed; any rule of any status may be transmuted; no rule is absolutely immune to change.) Rule -213. \u00b6 All rule-changes proposed in the proper way shall be voted on. They will be adopted if and only if they receive the required number of votes. Rule -212. \u00b6 Every player is an eligible voter. Rule -211. \u00b6 All proposed rule-changes shall be written down before they are voted on. If they are adopted, they shall guide play in the form in which they were voted on. Rule -210. \u00b6 No rule-change may take effect earlier than the moment of the completion of the vote that adopted it, even if its wording explicitly states otherwise. No rule-change may have retroactive application. Rule -209. \u00b6 Each proposed rule-change shall be given a number for reference. The numbers shall be assigned by Github, so that each rule-change proposed in the proper way shall receive the a distinct integer from all prior proposals, whether or not the proposal is adopted. If a rule is repealed and reenacted, it receives the number of the proposal to reenact it. If a rule is amended or transmuted, it receives the number of the proposal to amend or transmute it. If an amendment is amended or repealed, the entire rule of which it is a part receives the number of the proposal to amend or repeal the amendment. Rule -208. \u00b6 Rule-changes that transmute immutable rules into mutable rules may be adopted if and only if the vote is unanimous among the eligible voters. Transmutation shall not be implied, but must be stated explicitly in a proposal to take effect. Rule -207. \u00b6 In a conflict between a mutable and an immutable rule, the immutable rule takes precedence and the mutable rule shall be entirely void. For the purposes of this rule a proposal to transmute an immutable rule does not \"conflict\" with that immutable rule. Rule -206. \u00b6 If a rule-change as proposed is unclear, ambiguous, paradoxical, or destructive of play, or if it arguably consists of two or more rule-changes compounded or is an amendment that makes no difference, or if it is otherwise of questionable value, then the other players may suggest amendments or argue against the proposal before the vote. A reasonable time must be allowed for this debate. The proponent decides the final form in which the proposal is to be voted on and, unless the Judge has been asked to do so, also decides the time to end debate and vote. Rule -205. \u00b6 The state of affairs that constitutes winning may not be altered from achieving n points to any other state of affairs. The magnitude of n and the means of earning points may be changed, and rules that establish a winner when play cannot continue may be enacted and (while they are mutable) be amended or repealed. Rule -204. \u00b6 A player always has the option to forfeit the game rather than continue to play or incur a game penalty. No penalty worse than losing, in the judgment of the player to incur it, may be imposed. Rule -203. \u00b6 There must always be at least one mutable rule. The adoption of rule-changes must never become completely impermissible. Rule -202. \u00b6 Rule-changes that affect rules needed to allow or apply rule-changes are as permissible as other rule-changes. Even rule-changes that amend or repeal their own authority are permissible. 
No rule-change or type of move is impermissible solely on account of the self-reference or self-application of a rule. Rule -201. \u00b6 Whatever is not prohibited or regulated by a rule is permitted and unregulated, with the sole exception of changing the rules, which is permitted only when a rule or set of rules explicitly or implicitly permits it. Mutable Rules \u00b6 Rule -112. \u00b6 A player may begin a turn at any time that suits them. Turns may overlap: one player may begin a turn while another player's is in progress. No player may begin a turn unless all of their previous turns have ended. All players begin with zero points. Rule -111. \u00b6 One turn consists of two parts in this order: proposing one rule-change and having it voted on, and scoring the proposal and adding that score to the proposing player's score. A proposal is scored by taking the proposal number, adding nine to it, multiplying the result by the fraction of favourable votes the proposal received, and rounding that result to the nearest integer. (This scoring system yields a number between 0 and 10 for the first proposal, with the upper limit increasing by one for each new proposal; more points are awarded for more popular proposals.) Rule -110. \u00b6 A rule-change is adopted if and only if the vote in favour is unanimous among the eligible voters. If this rule is not amended before each player has had two turns, it automatically changes to require only a simple majority. If and when rule-changes can only be adopted unanimously, the voting may be ended as soon as an opposing vote is counted. If and when rule-changes can be adopted by simple majority, the voting may be ended as soon as a simple majority in favour or a simple majority against is counted. Rule -109. \u00b6 If and when rule-changes can be adopted without unanimity, the players who vote against winning proposals shall receive 10 points each. Rule -108. \u00b6 An adopted rule-change takes full effect at the moment of the completion of the vote that adopted it. Rule -107. \u00b6 When a proposed rule-change is defeated, the player who proposed it loses 10 points. Rule -106. \u00b6 Each player always has exactly one vote. Rule -105. \u00b6 The winner is the first player to achieve 200 (positive) points. Rule -104. \u00b6 At no time may there be more than 25 mutable rules. Rule -103. \u00b6 If two or more mutable rules conflict with one another, or if two or more immutable rules conflict with one another, then the rule with the lowest ordinal number takes precedence. If at least one of the rules in conflict explicitly says of itself that it defers to another rule (or type of rule) or takes precedence over another rule (or type of rule), then such provisions shall supersede the numerical method for determining precedence. If two or more rules claim to take precedence over one another or to defer to one another, then the numerical method again governs. Rule -102. \u00b6 If players disagree about the legality of a move or the interpretation or application of a rule, then the player moving may ask any other player to be the Judge and decide the question. Disagreement for the purposes of this rule may be created by the insistence of any player. This process is called invoking Judgment. When Judgment has been invoked, no player may begin his or her turn without the consent of a majority of the other players. The Judge's Judgment may be overruled only by a unanimous vote of the other players taken before the next turn is begun. 
If a Judge's Judgment is overruled, then the Judge may ask any player other than the moving player, and other than any player who has already been the Judge for the question, to become the new Judge for the question, and so on, except that no player is to be Judge during his or her own turn or during the turn of a team-mate. Unless a Judge is overruled, one Judge settles all questions arising from the game until the next turn is begun, including questions as to his or her own legitimacy and jurisdiction as Judge. New Judges are not bound by the decisions of old Judges. New Judges may, however, settle only those questions on which the players currently disagree and that affect the completion of the turn in which Judgment was invoked. All decisions by Judges shall be in accordance with all the rules then in effect; but when the rules are silent, inconsistent, or unclear on the point at issue, then the Judge shall consider game-custom and the spirit of the game before applying other standards. Rule -101. \u00b6 If the rules are changed so that further play is impossible, or if the legality of a move cannot be determined with finality, or if by the Judge's best reasoning, not overruled, a move appears equally legal and illegal, then the first player unable to complete a turn is the winner. This rule takes precedence over every other rule determining the winner.","title":"Github Nomic Rules"},{"location":"nomic/rules/#github-nomic-rules","text":"","title":"Github Nomic Rules"},{"location":"nomic/rules/#immutable-rules","text":"","title":"Immutable Rules"},{"location":"nomic/rules/#rule-216","text":"All players must always abide by all the rules then in effect, in the form in which they are then in effect. The rules in the Initial Set are in effect whenever a game begins. The Initial Set consists of rules -216 through -201 (immutable) and rules -112 through -101 (mutable).","title":"Rule -216."},{"location":"nomic/rules/#rule-215","text":"Initially, rules -216 through -201 are immutable, and rules -112 through -101 are mutable. Rules subsequently enacted or transmuted (that is, changed from immutable to mutable or vice versa) may be immutable or mutable regardless of their numbers, and rules in the Initial Set may be transmuted regardless of their numbers.","title":"Rule -215."},{"location":"nomic/rules/#rule-214","text":"A rule-change is any of the following: the enactment, repeal, or amendment of a mutable rule; the enactment, repeal, or amendment of an amendment of a mutable rule; or the transmutation of an immutable rule into a mutable rule or vice versa. (Note: This definition implies that, at least initially, all new rules are mutable; immutable rules, as long as they are immutable, may not be amended or repealed; mutable rules, as long as they are mutable, may be amended or repealed; any rule of any status may be transmuted; no rule is absolutely immune to change.)","title":"Rule -214."},{"location":"nomic/rules/#rule-213","text":"All rule-changes proposed in the proper way shall be voted on. They will be adopted if and only if they receive the required number of votes.","title":"Rule -213."},{"location":"nomic/rules/#rule-212","text":"Every player is an eligible voter.","title":"Rule -212."},{"location":"nomic/rules/#rule-211","text":"All proposed rule-changes shall be written down before they are voted on. 
If they are adopted, they shall guide play in the form in which they were voted on.","title":"Rule -211."},{"location":"nomic/rules/#rule-210","text":"No rule-change may take effect earlier than the moment of the completion of the vote that adopted it, even if its wording explicitly states otherwise. No rule-change may have retroactive application.","title":"Rule -210."},{"location":"nomic/rules/#rule-209","text":"Each proposed rule-change shall be given a number for reference. The numbers shall be assigned by Github, so that each rule-change proposed in the proper way shall receive the a distinct integer from all prior proposals, whether or not the proposal is adopted. If a rule is repealed and reenacted, it receives the number of the proposal to reenact it. If a rule is amended or transmuted, it receives the number of the proposal to amend or transmute it. If an amendment is amended or repealed, the entire rule of which it is a part receives the number of the proposal to amend or repeal the amendment.","title":"Rule -209."},{"location":"nomic/rules/#rule-208","text":"Rule-changes that transmute immutable rules into mutable rules may be adopted if and only if the vote is unanimous among the eligible voters. Transmutation shall not be implied, but must be stated explicitly in a proposal to take effect.","title":"Rule -208."},{"location":"nomic/rules/#rule-207","text":"In a conflict between a mutable and an immutable rule, the immutable rule takes precedence and the mutable rule shall be entirely void. For the purposes of this rule a proposal to transmute an immutable rule does not \"conflict\" with that immutable rule.","title":"Rule -207."},{"location":"nomic/rules/#rule-206","text":"If a rule-change as proposed is unclear, ambiguous, paradoxical, or destructive of play, or if it arguably consists of two or more rule-changes compounded or is an amendment that makes no difference, or if it is otherwise of questionable value, then the other players may suggest amendments or argue against the proposal before the vote. A reasonable time must be allowed for this debate. The proponent decides the final form in which the proposal is to be voted on and, unless the Judge has been asked to do so, also decides the time to end debate and vote.","title":"Rule -206."},{"location":"nomic/rules/#rule-205","text":"The state of affairs that constitutes winning may not be altered from achieving n points to any other state of affairs. The magnitude of n and the means of earning points may be changed, and rules that establish a winner when play cannot continue may be enacted and (while they are mutable) be amended or repealed.","title":"Rule -205."},{"location":"nomic/rules/#rule-204","text":"A player always has the option to forfeit the game rather than continue to play or incur a game penalty. No penalty worse than losing, in the judgment of the player to incur it, may be imposed.","title":"Rule -204."},{"location":"nomic/rules/#rule-203","text":"There must always be at least one mutable rule. The adoption of rule-changes must never become completely impermissible.","title":"Rule -203."},{"location":"nomic/rules/#rule-202","text":"Rule-changes that affect rules needed to allow or apply rule-changes are as permissible as other rule-changes. Even rule-changes that amend or repeal their own authority are permissible. 
No rule-change or type of move is impermissible solely on account of the self-reference or self-application of a rule.","title":"Rule -202."},{"location":"nomic/rules/#rule-201","text":"Whatever is not prohibited or regulated by a rule is permitted and unregulated, with the sole exception of changing the rules, which is permitted only when a rule or set of rules explicitly or implicitly permits it.","title":"Rule -201."},{"location":"nomic/rules/#mutable-rules","text":"","title":"Mutable Rules"},{"location":"nomic/rules/#rule-112","text":"A player may begin a turn at any time that suits them. Turns may overlap: one player may begin a turn while another player's is in progress. No player may begin a turn unless all of their previous turns have ended. All players begin with zero points.","title":"Rule -112."},{"location":"nomic/rules/#rule-111","text":"One turn consists of two parts in this order: proposing one rule-change and having it voted on, and scoring the proposal and adding that score to the proposing player's score. A proposal is scored by taking the proposal number, adding nine to it, multiplying the result by the fraction of favourable votes the proposal received, and rounding that result to the nearest integer. (This scoring system yields a number between 0 and 10 for the first proposal, with the upper limit increasing by one for each new proposal; more points are awarded for more popular proposals.)","title":"Rule -111."},{"location":"nomic/rules/#rule-110","text":"A rule-change is adopted if and only if the vote in favour is unanimous among the eligible voters. If this rule is not amended before each player has had two turns, it automatically changes to require only a simple majority. If and when rule-changes can only be adopted unanimously, the voting may be ended as soon as an opposing vote is counted. If and when rule-changes can be adopted by simple majority, the voting may be ended as soon as a simple majority in favour or a simple majority against is counted.","title":"Rule -110."},{"location":"nomic/rules/#rule-109","text":"If and when rule-changes can be adopted without unanimity, the players who vote against winning proposals shall receive 10 points each.","title":"Rule -109."},{"location":"nomic/rules/#rule-108","text":"An adopted rule-change takes full effect at the moment of the completion of the vote that adopted it.","title":"Rule -108."},{"location":"nomic/rules/#rule-107","text":"When a proposed rule-change is defeated, the player who proposed it loses 10 points.","title":"Rule -107."},{"location":"nomic/rules/#rule-106","text":"Each player always has exactly one vote.","title":"Rule -106."},{"location":"nomic/rules/#rule-105","text":"The winner is the first player to achieve 200 (positive) points.","title":"Rule -105."},{"location":"nomic/rules/#rule-104","text":"At no time may there be more than 25 mutable rules.","title":"Rule -104."},{"location":"nomic/rules/#rule-103","text":"If two or more mutable rules conflict with one another, or if two or more immutable rules conflict with one another, then the rule with the lowest ordinal number takes precedence. If at least one of the rules in conflict explicitly says of itself that it defers to another rule (or type of rule) or takes precedence over another rule (or type of rule), then such provisions shall supersede the numerical method for determining precedence. 
If two or more rules claim to take precedence over one another or to defer to one another, then the numerical method again governs.","title":"Rule -103."},{"location":"nomic/rules/#rule-102","text":"If players disagree about the legality of a move or the interpretation or application of a rule, then the player moving may ask any other player to be the Judge and decide the question. Disagreement for the purposes of this rule may be created by the insistence of any player. This process is called invoking Judgment. When Judgment has been invoked, no player may begin his or her turn without the consent of a majority of the other players. The Judge's Judgment may be overruled only by a unanimous vote of the other players taken before the next turn is begun. If a Judge's Judgment is overruled, then the Judge may ask any player other than the moving player, and other than any player who has already been the Judge for the question, to become the new Judge for the question, and so on, except that no player is to be Judge during his or her own turn or during the turn of a team-mate. Unless a Judge is overruled, one Judge settles all questions arising from the game until the next turn is begun, including questions as to his or her own legitimacy and jurisdiction as Judge. New Judges are not bound by the decisions of old Judges. New Judges may, however, settle only those questions on which the players currently disagree and that affect the completion of the turn in which Judgment was invoked. All decisions by Judges shall be in accordance with all the rules then in effect; but when the rules are silent, inconsistent, or unclear on the point at issue, then the Judge shall consider game-custom and the spirit of the game before applying other standards.","title":"Rule -102."},{"location":"nomic/rules/#rule-101","text":"If the rules are changed so that further play is impossible, or if the legality of a move cannot be determined with finality, or if by the Judge's best reasoning, not overruled, a move appears equally legal and illegal, then the first player unable to complete a turn is the winner. This rule takes precedence over every other rule determining the winner.","title":"Rule -101."}]} \ No newline at end of file
+{"config":{"lang":["en"],"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Owen Jacobson \u00b6 Hire Me . I've been a professional software developer since the early 2000s and an enthusiastic amateur even longer, and a manager of developers since 2019. I'm also deeply interested in organizational dynamics and group consensus: software, like ourselves, lives in a society, and both serves the needs of and serves to help shape that society. Code . I program computers. I have done so all of my adult life, and expect to do so as long as I can string concepts together. Like many lifelong programmers, I periodically write up interesting things I've developed, collaborated on, or run across. My larger projects are on Github . Papers of Note . Computer science and development-adjacent papers and academic works I encourage people to read. Gossamer . In 2014, long before Mastodon was in any kind of widespread use, I sketched out an idea for a fully-distributed status sharing network based on Twitter, but without the weakness of the Twitter, Inc. corporation. I've preserved the writeup here, as it's an excellent case study in how blindness to social violence can lead to dangerous software design. Gossamer should never be implemented , because it would put vulnerable users at extreme risk . In 2020, with Mastodon well established and the shape of distributed status networks much more widely understood, a friend pushed me to revisit the idea . The best way to contact me is by email , but I'm present in many places . If you prefer that your mail not be read by others, my GPG key fingerprint is 77BDC4F16EFD607E85AAB63950232991F10DFFD0.","title":"Owen Jacobson"},{"location":"#owen-jacobson","text":"Hire Me . I've been a professional software developer since the early 2000s and an enthusiastic amateur even longer, and a manager of developers since 2019. I'm also deeply interested in organizational dynamics and group consensus: software, like ourselves, lives in a society, and both serves the needs of and serves to help shape that society. Code . I program computers. I have done so all of my adult life, and expect to do so as long as I can string concepts together. Like many lifelong programmers, I periodically write up interesting things I've developed, collaborated on, or run across. My larger projects are on Github . Papers of Note . Computer science and development-adjacent papers and academic works I encourage people to read. Gossamer . In 2014, long before Mastodon was in any kind of widespread use, I sketched out an idea for a fully-distributed status sharing network based on Twitter, but without the weakness of the Twitter, Inc. corporation. I've preserved the writeup here, as it's an excellent case study in how blindness to social violence can lead to dangerous software design. Gossamer should never be implemented , because it would put vulnerable users at extreme risk . In 2020, with Mastodon well established and the shape of distributed status networks much more widely understood, a friend pushed me to revisit the idea . The best way to contact me is by email , but I'm present in many places . 
If you prefer that your mail not be read by others, my GPG key fingerprint is 77BDC4F16EFD607E85AAB63950232991F10DFFD0.","title":"Owen Jacobson"},{"location":"hire-me/","text":"Hire Me \u00b6 I'm always interested in hearing from people and organizations that I can help, whether that means coming in for a few days to talk about end-to-end testing or joining your organization full-time to help turn an idea into reality. I live in and around Toronto. I am more than happy to work remotely, and I can probably help your organization learn to integrate remote work if it doesn't already know how. For Fun \u00b6 I regularly mentor people new to programming, teaching them how to craft working systems. This is less about teaching people to write code and more about teaching them why we care about source control, how to think about configuration, how to and why to automate testing, and how to think about software systems and data flow at a higher level. I strongly believe that software development needs a formal apprenticeship program, and mentoring has done a lot to validate that belief. Heroku/Salesforce (2015-Present) \u00b6 In my time with Heroku (and with Salesforce, Heroku's parent organization), I've contributed to the operation of services that let developers bring their ideas to life on the internet, both as a developer and as a manager. I've been involved in maintaining and expanding existing features, exploring and developing new products, and in cultivating my peers and my team as people and as developers. As an engineering manager, I've been responsible for building and supporting an effective, unified team. Moving into management was motivated by a desire to act as a force multiplier, which I've brought to life through coaching, process management, facilitating ongoing discussions about the direction and health of the team, and through actively being involved in my reports' progress as developers. As a lead developer, I worked on the Heroku build system , which ingests code from end users and deploys that code to applications running on the Heroku platform. As part of that work, we implemented a number of features to control abuse, support language-specific features and needs, and to develop new ways to deploy code to Heroku. FreshBooks (2009-2014) \u00b6 During the five years I was with the company, it grew from a 20-person one-room organization to a healthy, growing two-hundred-person technology company. As an early employee, I had my hand in many, many projects and helped the development team absorb the massive cultural changes that come with growth, while also building a SaaS product that let others realize their dreams. Some highlights: As the lead database administrator-slash-developer, I worked with the entire development team to balance concerns about reliability and availability with ensuring new ideas and incremental improvements could be executed without massive bureaucracy and at low risk. This extended into diverse parts of the company: alongside the operations team, I handled capacity planning, reliability, outage planning, and performance monitoring, while with the development team, I was responsible for designing processes and deploying tools to ease testing of database changes and ensuring smooth, predictable, and low-effort deployment to production and for training developers to make the best use of MySQL for their projects. 
As a tools developer, I built the Sparkplug framework to standardize the tools and processes for building message-driven applications, allowing the team to move away from monolithic web applications towards a more event-driven suite of internal systems. Providing a standard framework paid off well; building and deploying completely novel event handlers for FreshBooks\u2019 core systems could be completed in as little as a week, including testing and production provisioning. As an ops-ish toolsmith, I worked extensively on configuration management for both applications and the underlying servers. I led a number of projects to reduce the risk around deployments: creating a standard development VM to ensure developers had an environment consistent with reality, automating packaging and rollout to testing servers, automating the creation of testing servers, and more. As part of this work, I built training materials and ran sessions to teach other developers how to think like a sysadmin, covering Linux, Puppet, virtualization, and other topics. Riptown Media (2006-2009) \u00b6 Riptown Media was a software development company tasked with building and maintaining a suite of gambling systems for a single client. I was brought on board as a Java developer, and rapidly expanded my role to encompass other fields. As the primary developer for poker-room back office and anti-fraud tools, I worked with the customer support and business intelligence teams to better understand their daily needs and frustrations, so that I could turn those into meaningful improvements to their tools and processes. These improvements, in turn, led to measurable changes in the frequency and length of customer support calls, in fraud rates, and in the perceived value of internal customer intelligence. As a lead developer, my team put together the server half of an in-house casino gaming platform. We worked in tight collaboration with the client team, in-house and third-party testers, and interaction designers, and delivered our first game in under six months. Our platform was meant to reduce our reliance on third-party \u201cwhite label\u201d games vendors; internally, it was a success. Our game received zero customer-reported defects during its initial run. OSI Geospatial (2004-2006) \u00b6 At OSI Geospatial, I led the development of a target-tracking and battlespace awareness overlay as part of a suite of operational theatre tools. In 2004, the state of the art for web-based geomatics software was not up to the task; this ended up being a custom server written in C++ and making heavy use of PostgreSQL and PostGIS for its inner workings. Contact Me \u00b6 You can get in touch by email at owen@grimoire.ca. I'd love to hear from you.","title":"Hire Me"},{"location":"hire-me/#hire-me","text":"I'm always interested in hearing from people and organizations that I can help, whether that means coming in for a few days to talk about end-to-end testing or joining your organization full-time to help turn an idea into reality. I live in and around Toronto. I am more than happy to work remotely, and I can probably help your organization learn to integrate remote work if it doesn't already know how.","title":"Hire Me"},{"location":"hire-me/#for-fun","text":"I regularly mentor people new to programming, teaching them how to craft working systems. 
This is less about teaching people to write code and more about teaching them why we care about source control, how to think about configuration, how to and why to automate testing, and how to think about software systems and data flow at a higher level. I strongly believe that software development needs a formal apprenticeship program, and mentoring has done a lot to validate that belief.","title":"For Fun"},{"location":"hire-me/#herokusalesforce-2015-present","text":"In my time with Heroku (and with Salesforce, Heroku's parent organization), I've contributed to the operation of services that let developers bring their ideas to life on the internet, both as a developer and as a manager. I've been involved in maintaining and expanding existing features, exploring and developing new products, and in cultivating my peers and my team as people and as developers. As an engineering manager, I've been responsible for building and supporting an effective, unified team. Moving into management was motivated by a desire to act as a force multiplier, which I've brought to life through coaching, process management, facilitating ongoing discussions about the direction and health of the team, and through actively being involved in my reports' progress as developers. As a lead developer, I worked on the Heroku build system , which ingests code from end users and deploys that code to applications running on the Heroku platform. As part of that work, we implemented a number of features to control abuse, support language-specific features and needs, and to develop new ways to deploy code to Heroku.","title":"Heroku/Salesforce (2015-Present)"},{"location":"hire-me/#freshbooks-2009-2014","text":"During the five years I was with the company, it grew from a 20-person one-room organization to a healthy, growing two-hundred-person technology company. As an early employee, I had my hand in many, many projects and helped the development team absorb the massive cultural changes that come with growth, while also building a SaaS product that let others realize their dreams. Some highlights: As the lead database administrator-slash-developer, I worked with the entire development team to balance concerns about reliability and availability with ensuring new ideas and incremental improvements could be executed without massive bureaucracy and at low risk. This extended into diverse parts of the company: alongside the operations team, I handled capacity planning, reliability, outage planning, and performance monitoring, while with the development team, I was responsible for designing processes and deploying tools to ease testing of database changes and ensuring smooth, predictable, and low-effort deployment to production and for training developers to make the best use of MySQL for their projects. As a tools developer, I built the Sparkplug framework to standardize the tools and processes for building message-driven applications, allowing the team to move away from monolithic web applications towards a more event-driven suite of interal systems. Providing a standard framework paid off well; building and deploying completely novel event handlers for FreshBooks\u2019 core systems could be completed in as little as a week, including testing and production provisioning. As an ops-ish toolsmith, I worked extensively on configuration management for both applications and the underlying servers. 
I led a number of projects to reduce the risk around deployments: creating a standard development VM to ensure developers had an environment consistent with reality, automating packaging and rollout to testing servers, automating the creation of testing servers, and more. As part of this work, I built training materials and ran sessions to teach other developers how to think like a sysadmin, covering Linux, Puppet, virtualization, and other topics.","title":"FreshBooks (2009-2014)"},{"location":"hire-me/#riptown-media-2006-2009","text":"Riptown Media was a software development company tasked with building and maintaining a suite of gambling systems for a single client. I was brought on board as a Java developer, and rapidly expanded my role to encompass other fields. As the primary developer for poker-room back office and anti-fraud tools, I worked with the customer support and business intelligence teams to better understand their daily needs and frustrations, so that I could turn those into meaningful improvements to their tools and processes. These improvements, in turn, led to measurable changes in the frequency and length of customer support calls, in fraud rates, and in the perceived value of internal customer intelligence. As a lead developer, my team put together the server half of an in-house casino gaming platform. We worked in tight collaboration with the client team, in-house and third-party testers, and interaction designers, and delivered our first game in under six months. Our platform was meant to reduce our reliance on third-party \u201cwhite label\u201d games vendors; internally, it was a success. Our game received zero customer-reported defects during its initial run.","title":"Riptown Media (2006-2009)"},{"location":"hire-me/#osi-geospatial-2004-2006","text":"At OSI Geospatial, I led the development of a target-tracking and battlespace awareness overlay as part of a suite of operational theatre tools. In 2004, the state of the art for web-based geomatics software was not up to the task; this ended up being a custom server written in C++ and making heavy use of PostgreSQL and PostGIS for its inner workings.","title":"OSI Geospatial (2004-2006)"},{"location":"hire-me/#contact-me","text":"You can get in touch by email at owen@grimoire.ca. I'd love to hear from you.","title":"Contact Me"},{"location":"papers/","text":"Papers of Note \u00b6 Perlman, Radia (1985). \u201c An Algorithm for Distributed Computation of a Spanning Tree in an Extended LAN \u201d. ACM SIGCOMM Computer Communication Review. 15 (4): 44\u201353. doi:10.1145/318951.319004. The related Algorhyme , also by Perlman. Guy Lewis Steele, Jr.. \u201c Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO \u201d. MIT AI Lab. AI Lab Memo AIM-443. October 1977. What Every Computer Scientist Should Know About Floating-Point Arithmetic , by David Goldberg, published in the March, 1991 issue of Computing Surveys. Copyright 1991, Association for Computing Machinery, Inc. RFC 1925 . Regular Expression Matching Can Be Simple And Fast , Russ Cox's empirical research into degenerate cases in common regular expression implementations and a proposed implementation based on Thompson's NFA construction. The above-cited Thompson NFA paper on regular expressions. The Eight Fallacies of Distributed Computing . HAKMEM is another good one. It's dense but rewarding. 
Kahan, William (January 1965), \u201c Further remarks on reducing truncation errors \u201d, Communications of the ACM, 8 (1): 40, doi:10.1145/363707.363723","title":"Papers of Note"},{"location":"papers/#papers-of-note","text":"Perlman, Radia (1985). \u201c An Algorithm for Distributed Computation of a Spanning Tree in an Extended LAN \u201d. ACM SIGCOMM Computer Communication Review. 15 (4): 44\u201353. doi:10.1145/318951.319004. The related Algorhyme , also by Perlman. Guy Lewis Steele, Jr.. \u201c Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO \u201d. MIT AI Lab. AI Lab Memo AIM-443. October 1977. What Every Computer Scientist Should Know About Floating-Point Arithmetic , by David Goldberg, published in the March, 1991 issue of Computing Surveys. Copyright 1991, Association for Computing Machinery, Inc. RFC 1925 . Regular Expression Matching Can Be Simple And Fast , Russ Cox's empirical research into degenerate cases in common regular expression implementations and a proposed implementation based on Thomson's NFA construction. The above-cited Thomson NFA paper on regular expressions. The Eight Fallacies of Distributed Computing . HAKMEM is another good one. It's dense but rewarding. Kahan, William (January 1965), \u201c Further remarks on reducing truncation errors \u201d, Communications of the ACM, 8 (1): 40, doi:10.1145/363707.363723","title":"Papers of Note"},{"location":"code/","text":"Code \u00b6 Pieces of code and code-adjacent work, with or without exposition, that don't quite fit into the library ecosystem, but which I enjoyed writing. A Users, Roles & Privileges Scheme Using Graphs \u2014 An SQL schema and associated queries for handling permissions when roles can nest arbitrarily. Configuring Browser Apps \u2014 Notes on the available techniques for delivering runtime configuration to code running in a user's browser, and the tradeoffs involved. Writing Good Commit Messages \u2014 A style guide. Some collected advice about Git \u2014 Not the source control tool we want, but definitely the source control tool we've got, and I think we should make the best of it. I also maintain a Github account for more substantial projects.","title":"Code"},{"location":"code/#code","text":"Pieces of code and code-adjacent work, with or without exposition, that don't quite fit into the library ecosystem, but which I enjoyed writing. A Users, Roles & Privileges Scheme Using Graphs \u2014 An SQL schema and associated queries for handling permissions when roles can nest arbitrarily. Configuring Browser Apps \u2014 Notes on the available techniques for delivering runtime configuration to code running in a user's browser, and the tradeoffs involved. Writing Good Commit Messages \u2014 A style guide. Some collected advice about Git \u2014 Not the source control tool we want, but definitely the source control tool we've got, and I think we should make the best of it. I also maintain a Github account for more substantial projects.","title":"Code"},{"location":"code/commit-messages/","text":"Writing Good Commit Messages \u00b6 Rule zero: \u201cgood\u201d is defined by the standards of the project you're on. Have a look at what the existing messages look like, and try to emulate that first before doing anything else. Having said that, here are some principles I've found helpful and broadly applicable. Treat the first line of the message as a one-sentence summary. 
Most SCM systems have an \u201coverview\u201d command that shows shortened commit messages in bulk, so making the very beginning of the message meaningful helps make those modes more useful for finding specific commits. It's okay for this to be a \u201cwhat\u201d description if the rest of the message is a \u201cwhy\u201d description. Fill out the rest of the message with prose outlining why you made the change. Don't reiterate the contents of the change in great detail if you can avoid it: anyone who needs that can read the diff themselves, or reach out to ask for help understanding the change. A good rationale sets context for the problem being solved and addresses the ways the proposed change alters that context. If you use an issue tracker (and you should), include whatever issue-linking notes it supports right at the start of the message, where it'll be visible even in summarized commit logs. If your tracker has absurdly long issue-linking syntax, or doesn't support issue links in commits at all, include a short issue identifier at the front of the message and put the long part somewhere out of the way, such as on a line of its own at the end of the message. If you need rich commit messages (links, lists, and so on), pick one markup language and stick with it. It'll be easier to write useful commit formatters if you only have to deal with one syntax, rather than four. Personally, I use Markdown when I can, or a reduced subset of Markdown, as it's something most developers I interact with will be at least passing familiar with.","title":"Writing Good Commit Messages"},{"location":"code/commit-messages/#writing-good-commit-messages","text":"Rule zero: \u201cgood\u201d is defined by the standards of the project you're on. Have a look at what the existing messages look like, and try to emulate that first before doing anything else. Having said that, here are some principles I've found helpful and broadly applicable. Treat the first line of the message as a one-sentence summary. Most SCM systems have an \u201coverview\u201d command that shows shortened commit messages in bulk, so making the very beginning of the message meaningful helps make those modes more useful for finding specific commits. It's okay for this to be a \u201cwhat\u201d description if the rest of the message is a \u201cwhy\u201d description. Fill out the rest of the message with prose outlining why you made the change. Don't reiterate the contents of the change in great detail if you can avoid it: anyone who needs that can read the diff themselves, or reach out to ask for help understanding the change. A good rationale sets context for the problem being solved and addresses the ways the proposed change alters that context. If you use an issue tracker (and you should), include whatever issue-linking notes it supports right at the start of the message, where it'll be visible even in summarized commit logs. If your tracker has absurdly long issue-linking syntax, or doesn't support issue links in commits at all, include a short issue identifier at the front of the message and put the long part somewhere out of the way, such as on a line of its own at the end of the message. If you need rich commit messages (links, lists, and so on), pick one markup language and stick with it. It'll be easier to write useful commit formatters if you only have to deal with one syntax, rather than four. 
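Putting these principles together, a message might look like the sketch below. The issue key, the tracker URL, and the change being described are invented for illustration; substitute whatever your project and tracker actually use.

PROJ-123 Retry transient failures when fetching runtime config

The config endpoint occasionally times out during deploys, which left the app
running on stale settings until someone forced a reload. Retrying a couple of
times with a short backoff covers the blip without hiding real outages, which
still surface as errors.

Issue: https://tracker.example.com/browse/PROJ-123
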
Personally, I use Markdown when I can, or a reduced subset of Markdown, as it's something most developers I interact with will be at least passing familiar with.","title":"Writing Good Commit Messages"},{"location":"code/configuring-browser-apps/","text":"Configuring Browser Apps \u00b6 I've found myself in he unexpected situation of having to write a lot of browser apps/single page apps this year. I have some thoughts on configuration. Why Bother \u00b6 Centralize environment-dependent facts to simplify management & testing Make it easy to manage app secrets. @wlonk adds: \u201cSecrets\u201d? What this means in a browser app is a bit different. Which is unpleasantly true. In a freestanding browser app, a \u201csecret\u201d is only as secret as your users and their network connections choose to make it, i.e., not very secret at all. Maybe that should read \u201cmake it easy to manage app tokens and identities ,\u201d instead. Keep config data & API tokens out of app's source control Integration point for external config sources (Aerobatic, Heroku, etc) The forces described in 12 Factor App: Dependencies and, to a lesser extent, 12 Factor App: Configuration apply just as well to web client apps as they do to freestanding services. What Gets Configured \u00b6 Yes: Base URLs of backend services Tokens and client IDs for various APIs No: \u201cEnvironments\u201d (sorry, Ember folks - I know Ember thought this through carefully, but whole-env configs make it easy to miss settings in prod or test, and encourage patterns like \u201call devs use the same backends\u201d) Delivering Configuration \u00b6 There are a few ways to get configuration into the app. Globals \u00b6 <head> <script>window.appConfig = { \"FOO_URL\": \"https://foo.example.com/\", \"FOO_TOKEN\": \"my-super-secret-token\" };</script> <script src=\"/your/app.js\"></script> </head> Easy to consume: it's just globals, so window.appConfig.foo will read them. This requires some discipline to use well. Have to generate a script to set them. This can be a <script>window.appConfig = {some json}</script> tag or a standalone config script loaded with <script src=\"/config.js\"> Generating config scripts sets a minimum level of complexity for the deployment process: you either need a server to generate the script at request time, or a preprocessing step at deployment time. It's code generation, which is easy to do badly. I had originally proposed using JSON.stringify to generate a Javascript object literal, but this fails for any config values with </script> in them. That may be an unlikely edge case, but that only makes it a nastier trap for administrators. There are more edge cases . I strongly suspect that a hazard-free implementation requires a full-blown JS source generator. I had a look at building something out of escodegen and estemplate , but escodegen 's node version doesn't generate browser-safe code , so string literals with </script> or </head> in them still break the page, and converting javascript values into parse trees to feed to estemplate is some seriously tedious code. Data Attributes and Link Elements \u00b6 <head> <link rel=\"foo-url\" href=\"https://foo.example.com/\"> <script src=\"/your/app.js\" data-foo-token=\"my-super-secret-token\"></script> </head> Flat values only. 
This is probably a good thing in the grand, since flat configurations are easier to reason about and much easier to document, but it makes namespacing trickier than it needs to be for groups of related config values (URL + token for a single service, for example). Have to generate the DOM to set them. This is only practical given server-side templates or DOM rendering. You can't do this with bare nginx, unless you pre-generate pages at deployment time. Config API Endpoint \u00b6 fetch('/config') /* {\"FOO_URL\": \u2026, \"FOO_TOKEN\": \u2026} */ .then(response => response.json()) .then(json => someConfigurableService); Works even with \u201cdumb\u201d servers (nginx, CloudFront) as the endpoint can be a generated JSON file on disk. If you can generate files, you can generate a JSON endpoint. Requires an additional request to fetch the configuration, and logic for injecting config data into all the relevant configurable places in the code. This request can't happen until all the app code has loaded. It's very tempting to write the config to a global. This produces some hilarious race conditions. Cookies \u00b6 See for example clientconfig : var config = require('clientconfig'); Easy to consume given the right tools; tricky to do right from scratch. Requires server-side support to send the correct cookie. Some servers will allow you to generate the right cookie once and store it in a config file; others will need custom logic, which means (effectively) you need an app server. Cookies persist and get re-sent on subsequent requests, even if the server stops delivering config cookies. Client code has to manage the cookie lifecycle carefully (clientconfig does this automatically) Size limits constrain how much configuration you can do.","title":"Configuring Browser Apps"},{"location":"code/configuring-browser-apps/#configuring-browser-apps","text":"I've found myself in he unexpected situation of having to write a lot of browser apps/single page apps this year. I have some thoughts on configuration.","title":"Configuring Browser Apps"},{"location":"code/configuring-browser-apps/#why-bother","text":"Centralize environment-dependent facts to simplify management & testing Make it easy to manage app secrets. @wlonk adds: \u201cSecrets\u201d? What this means in a browser app is a bit different. Which is unpleasantly true. In a freestanding browser app, a \u201csecret\u201d is only as secret as your users and their network connections choose to make it, i.e., not very secret at all. Maybe that should read \u201cmake it easy to manage app tokens and identities ,\u201d instead. 
Keep config data & API tokens out of app's source control Integration point for external config sources (Aerobatic, Heroku, etc) The forces described in 12 Factor App: Dependencies and, to a lesser extent, 12 Factor App: Configuration apply just as well to web client apps as they do to freestanding services.","title":"Why Bother"},{"location":"code/configuring-browser-apps/#what-gets-configured","text":"Yes: Base URLs of backend services Tokens and client IDs for various APIs No: \u201cEnvironments\u201d (sorry, Ember folks - I know Ember thought this through carefully, but whole-env configs make it easy to miss settings in prod or test, and encourage patterns like \u201call devs use the same backends\u201d)","title":"What Gets Configured"},{"location":"code/configuring-browser-apps/#delivering-configuration","text":"There are a few ways to get configuration into the app.","title":"Delivering Configuration"},{"location":"code/configuring-browser-apps/#globals","text":"<head> <script>window.appConfig = { \"FOO_URL\": \"https://foo.example.com/\", \"FOO_TOKEN\": \"my-super-secret-token\" };</script> <script src=\"/your/app.js\"></script> </head> Easy to consume: it's just globals, so window.appConfig.foo will read them. This requires some discipline to use well. Have to generate a script to set them. This can be a <script>window.appConfig = {some json}</script> tag or a standalone config script loaded with <script src=\"/config.js\"> Generating config scripts sets a minimum level of complexity for the deployment process: you either need a server to generate the script at request time, or a preprocessing step at deployment time. It's code generation, which is easy to do badly. I had originally proposed using JSON.stringify to generate a Javascript object literal, but this fails for any config values with </script> in them. That may be an unlikely edge case, but that only makes it a nastier trap for administrators. There are more edge cases . I strongly suspect that a hazard-free implementation requires a full-blown JS source generator. I had a look at building something out of escodegen and estemplate , but escodegen 's node version doesn't generate browser-safe code , so string literals with </script> or </head> in them still break the page, and converting javascript values into parse trees to feed to estemplate is some seriously tedious code.","title":"Globals"},{"location":"code/configuring-browser-apps/#data-attributes-and-link-elements","text":"<head> <link rel=\"foo-url\" href=\"https://foo.example.com/\"> <script src=\"/your/app.js\" data-foo-token=\"my-super-secret-token\"></script> </head> Flat values only. This is probably a good thing in the grand, since flat configurations are easier to reason about and much easier to document, but it makes namespacing trickier than it needs to be for groups of related config values (URL + token for a single service, for example). Have to generate the DOM to set them. This is only practical given server-side templates or DOM rendering. You can't do this with bare nginx, unless you pre-generate pages at deployment time.","title":"Data Attributes and Link Elements"},{"location":"code/configuring-browser-apps/#config-api-endpoint","text":"fetch('/config') /* {\"FOO_URL\": \u2026, \"FOO_TOKEN\": \u2026} */ .then(response => response.json()) .then(json => someConfigurableService); Works even with \u201cdumb\u201d servers (nginx, CloudFront) as the endpoint can be a generated JSON file on disk. 
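As a sketch of the \u201cgenerated JSON file on disk\u201d approach: a small Node script, run as a deployment-time step, can copy the environment-dependent values into a static file that nginx or CloudFront then serves like any other asset. This is an illustration rather than part of the original notes; the FOO_URL and FOO_TOKEN names follow the examples above, and the script and output paths are assumptions.

// generate-config.js - run at deploy time, before publishing static assets.
const fs = require('fs');

// Collect the environment-dependent values from the deploy environment.
const config = {
  FOO_URL: process.env.FOO_URL,
  FOO_TOKEN: process.env.FOO_TOKEN,
};

// Fail the deploy loudly if anything is missing, rather than shipping an
// app that breaks at runtime.
for (const [key, value] of Object.entries(config)) {
  if (!value) {
    throw new Error('Missing required config value: ' + key);
  }
}

// Write the file the app will fetch('/config') at runtime.
fs.writeFileSync('dist/config', JSON.stringify(config));

Depending on the server, you may also need to tell it to serve that path as application/json so that response.json() keeps working.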
If you can generate files, you can generate a JSON endpoint. Requires an additional request to fetch the configuration, and logic for injecting config data into all the relevant configurable places in the code. This request can't happen until all the app code has loaded. It's very tempting to write the config to a global. This produces some hilarious race conditions.","title":"Config API Endpoint"},{"location":"code/configuring-browser-apps/#cookies","text":"See for example clientconfig : var config = require('clientconfig'); Easy to consume given the right tools; tricky to do right from scratch. Requires server-side support to send the correct cookie. Some servers will allow you to generate the right cookie once and store it in a config file; others will need custom logic, which means (effectively) you need an app server. Cookies persist and get re-sent on subsequent requests, even if the server stops delivering config cookies. Client code has to manage the cookie lifecycle carefully (clientconfig does this automatically) Size limits constrain how much configuration you can do.","title":"Cookies"},{"location":"code/users-rolegraph-privs/","text":"A Users, Roles & Privileges Scheme Using Graphs \u00b6 The basic elements: Every agent that can interact with a system is represented by a user . Every capability the system has is authorized by a distinct privilege . Each user has a list of zero or more roles . Roles can imply further roles. This relationship is transitive: if role A implies role B, then a member of role A is a member of role B; if role B also implies role C, then a member of role A is also a member of role C. It helps if the resulting role graph is acyclic, but it's not necessary. Roles can grant privileges. A user's privileges are the union of the privileges granted by the transitive closure of their roles. create table \"user\" ( username varchar primary key -- credentials &c ); create table role ( name varchar primary key ); create table role_member ( role varchar not null references role, member varchar not null references \"user\", primary key (role, member) ); create table role_implies ( role varchar not null references role, implied_role varchar not null ); create table privilege ( privilege varchar primary key ); create table role_grants ( role varchar not null references role, privilege varchar not null references privilege, primary key (role, privilege) ); If your database supports recursive CTEs, this schema can be queried in one shot, since we can have the database do all the graph-walking along roles: with recursive user_roles (role) AS ( select role from role_member where member = 'SOME USERNAME' union select implied_role as role from user_roles join role_implies on user_roles.role = role_implies.role ) select distinct role_grants.privilege as privilege from user_roles join role_grants on user_roles.role = role_grants.role order by privilege; If not, you'll need to pull the entire graph into memory and manipulate it there: this schema doesn't give you any easy handles to identify only the roles transitively included in the role of interest, and repeatedly querying for each step of the graph requires an IO roundtrip at each step, burning whole milliseconds along the way. Realistic use cases should have fairly simple graphs: elemental privileges are grouped into concrete roles, which are in turn grouped into abstracted roles (by department, for example), which are in turn granted to users. 
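To make the \u201cpull the entire graph into memory\u201d fallback above concrete, here is a sketch in JavaScript (any application language would do). It assumes the three link tables have already been loaded into arrays of plain row objects with one ordinary SELECT each; the function and variable names are mine, not part of the schema.

// Rows loaded with e.g. SELECT role, member FROM role_member, and so on.
function privilegesFor(username, roleMembers, roleImplies, roleGrants) {
  // Seed the walk with the roles the user belongs to directly.
  const pending = roleMembers
    .filter(row => row.member === username)
    .map(row => row.role);
  const reachable = new Set(pending);

  // Breadth-first walk over role_implies; the reachable set makes cycles harmless.
  while (pending.length > 0) {
    const role = pending.shift();
    for (const edge of roleImplies) {
      if (edge.role === role && !reachable.has(edge.implied_role)) {
        reachable.add(edge.implied_role);
        pending.push(edge.implied_role);
      }
    }
  }

  // The user's privileges are the union of the grants of every reachable role.
  const privileges = new Set(
    roleGrants
      .filter(row => reachable.has(row.role))
      .map(row => row.privilege)
  );
  return [...privileges].sort();
}

This does in application code what the recursive CTE does inside the database; the trade-off is that the three SELECTs ship the whole graph over the wire once, instead of paying a round trip for every step of the walk.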
If the average user is in tens of roles and has hundreds of privileges, the entire dataset fits in memory, and PostgreSQL performs well. In PostgreSQL, the above schema handles ~10k privileges and ~10k roles with randomly-generated graph relationships in around 100ms on my laptop, which is pretty slow but not intolerable. Perverse cases (interconnected total subgraphs, deeply-nested linear graphs) can take absurd time but do not reflect any likely permissions scheme.","title":"A Users, Roles & Privileges Scheme Using Graphs"},{"location":"code/users-rolegraph-privs/#a-users-roles-privileges-scheme-using-graphs","text":"The basic elements: Every agent that can interact with a system is represented by a user . Every capability the system has is authorized by a distinct privilege . Each user has a list of zero or more roles . Roles can imply further roles. This relationship is transitive: if role A implies role B, then a member of role A is a member of role B; if role B also implies role C, then a member of role A is also a member of role C. It helps if the resulting role graph is acyclic, but it's not necessary. Roles can grant privileges. A user's privileges are the union of the privileges granted by the transitive closure of their roles. create table \"user\" ( username varchar primary key -- credentials &c ); create table role ( name varchar primary key ); create table role_member ( role varchar not null references role, member varchar not null references \"user\", primary key (role, member) ); create table role_implies ( role varchar not null references role, implied_role varchar not null ); create table privilege ( privilege varchar primary key ); create table role_grants ( role varchar not null references role, privilege varchar not null references privilege, primary key (role, privilege) ); If your database supports recursive CTEs, this schema can be queried in one shot, since we can have the database do all the graph-walking along roles: with recursive user_roles (role) AS ( select role from role_member where member = 'SOME USERNAME' union select implied_role as role from user_roles join role_implies on user_roles.role = role_implies.role ) select distinct role_grants.privilege as privilege from user_roles join role_grants on user_roles.role = role_grants.role order by privilege; If not, you'll need to pull the entire graph into memory and manipulate it there: this schema doesn't give you any easy handles to identify only the roles transitively included in the role of interest, and repeatedly querying for each step of the graph requires an IO roundtrip at each step, burning whole milliseconds along the way. Realistic use cases should have fairly simple graphs: elemental privileges are grouped into concrete roles, which are in turn grouped into abstracted roles (by department, for example), which are in turn granted to users. If the average user is in tens of roles and has hundreds of privileges, the entire dataset fits in memory, and PostgreSQL performs well. In PostgreSQL, the above schema handles ~10k privileges and ~10k roles with randomly-generated graph relationships in around 100ms on my laptop, which is pretty slow but not intolerable. 
Perverse cases (interconnected total subgraphs, deeply-nested linear graphs) can take absurd time but do not reflect any likely permissions scheme.","title":"A Users, Roles &amp; Privileges Scheme Using Graphs"},{"location":"git/","text":"Collected Advice about Git \u00b6 git-config Settings You Want \u2014 Git is highly configurable, and the defaults have gotten drastically better over the years, but there are still some non-default behaviours that I've found make life better. Notes Towards Detached Signatures in Git \u2014 An idea I had, but never fully developed, for implementing after-the-fact object signing on top of Git. This was based on a similar feature in Monotone, which I'd found very effective for annotating commits on the fly. Life With Pull Requests \u2014 Some notes I made while getting up to speed with pull requests to help my team come to grips with the workflows. Git Is Not Magic \u2014 An exploration of Git's on-disk data structures and the design choices taken very early in Git's existence. Stop using git pull for deployment! \u2014 Describing the least-painful way to use Git as a deployment tool I had worked out, circa 2014. Written in an aversarial style as a response to repeated \u201dwhy don't we just\u201ds that, while well-intentioned, came from an incomplete understanding of what git pull does. Git Survival Guide \u2014 Some words of caution about Git, git 's preferred workflows, and various recoverable mistakes.","title":"Collected Advice about Git"},{"location":"git/#collected-advice-about-git","text":"git-config Settings You Want \u2014 Git is highly configurable, and the defaults have gotten drastically better over the years, but there are still some non-default behaviours that I've found make life better. Notes Towards Detached Signatures in Git \u2014 An idea I had, but never fully developed, for implementing after-the-fact object signing on top of Git. This was based on a similar feature in Monotone, which I'd found very effective for annotating commits on the fly. Life With Pull Requests \u2014 Some notes I made while getting up to speed with pull requests to help my team come to grips with the workflows. Git Is Not Magic \u2014 An exploration of Git's on-disk data structures and the design choices taken very early in Git's existence. Stop using git pull for deployment! \u2014 Describing the least-painful way to use Git as a deployment tool I had worked out, circa 2014. Written in an aversarial style as a response to repeated \u201dwhy don't we just\u201ds that, while well-intentioned, came from an incomplete understanding of what git pull does. Git Survival Guide \u2014 Some words of caution about Git, git 's preferred workflows, and various recoverable mistakes.","title":"Collected Advice about Git"},{"location":"git/config/","text":"git-config Settings You Want \u00b6 Git comes with some fairly lkml -specific configuration defaults. You should fix this. All of the items below can be set either for your entire login account ( git config --global ) or for a specific repository ( git config ). Full documentation is under git help config , unless otherwise stated. git config user.name 'Your Full Name' and git config user.email 'your-email@example.com' , obviously. Git will remind you about this if you forget. git config merge.defaultToUpstream true - causes an unqualified git merge to merge the current branch's configured upstream branch, rather than being an error. 
This makes git merge much more consistent with git rebase , and as the two tools fill very similar workflow niches, it's nice to have them behave similarly. git config rebase.autosquash true - causes git rebase -i to parse magic comments created by git commit --squash=some-hash and git commit --fixup=some-hash and reorder the commit list before presenting it for further editing. See the descriptions of \u201csquash\u201d and \u201cfixup\u201d in git help rebase for details; autosquash makes amending commits other than the most recent easier and less error-prone. git config branch.autosetupmerge always - newly-created branches whose start point is a branch ( git checkout master -b some-feature , git branch some-feature origin/develop , and so on) will be configured to have the start point branch as their upstream. By default (with true rather than always ) this only happens when the start point is a remote-tracking branch. git config rerere.enabled true - enable \u201creuse recorded resolution.\u201d The git help rerere docs explain it pretty well, but the short version is that git can record how you resolve conflicts during a \u201ctest\u201d merge and reuse the same approach when resolving the same conflict later, in a \u201creal\u201d merge. For advanced users \u00b6 A few things are nice when you're getting started, but become annoying when you no longer need them. git config advice.detachedHead - if you already understand the difference between having a branch checked out and having a commit checked out, and already understand what \u201cdetatched head\u201d means, the warning on every git checkout ...some detatched thing... isn't helping anyone. This is also useful repositories used for deployment, where specific commits (from tags, for example) are regularly checked out.","title":"git-config Settings You Want"},{"location":"git/config/#git-config-settings-you-want","text":"Git comes with some fairly lkml -specific configuration defaults. You should fix this. All of the items below can be set either for your entire login account ( git config --global ) or for a specific repository ( git config ). Full documentation is under git help config , unless otherwise stated. git config user.name 'Your Full Name' and git config user.email 'your-email@example.com' , obviously. Git will remind you about this if you forget. git config merge.defaultToUpstream true - causes an unqualified git merge to merge the current branch's configured upstream branch, rather than being an error. This makes git merge much more consistent with git rebase , and as the two tools fill very similar workflow niches, it's nice to have them behave similarly. git config rebase.autosquash true - causes git rebase -i to parse magic comments created by git commit --squash=some-hash and git commit --fixup=some-hash and reorder the commit list before presenting it for further editing. See the descriptions of \u201csquash\u201d and \u201cfixup\u201d in git help rebase for details; autosquash makes amending commits other than the most recent easier and less error-prone. git config branch.autosetupmerge always - newly-created branches whose start point is a branch ( git checkout master -b some-feature , git branch some-feature origin/develop , and so on) will be configured to have the start point branch as their upstream. By default (with true rather than always ) this only happens when the start point is a remote-tracking branch. 
git config rerere.enabled true - enable \u201creuse recorded resolution.\u201d The git help rerere docs explain it pretty well, but the short version is that git can record how you resolve conflicts during a \u201ctest\u201d merge and reuse the same approach when resolving the same conflict later, in a \u201creal\u201d merge.","title":"git-config Settings You Want"},{"location":"git/config/#for-advanced-users","text":"A few things are nice when you're getting started, but become annoying when you no longer need them. git config advice.detachedHead - if you already understand the difference between having a branch checked out and having a commit checked out, and already understand what \u201cdetatched head\u201d means, the warning on every git checkout ...some detatched thing... isn't helping anyone. This is also useful repositories used for deployment, where specific commits (from tags, for example) are regularly checked out.","title":"For advanced users"},{"location":"git/detached-sigs/","text":"Notes Towards Detached Signatures in Git \u00b6 Git supports a limited form of object authentication: specific object categories in Git's internal model can have GPG signatures embedded in them, allowing the authorship of the objects to be verified using GPG's underlying trust model. Tag signatures can be used to verify the authenticity and integrity of the snapshot associated with a tag , and the authenticity of the tag itself, filling a niche broadly similar to code signing in binary distribution systems. Commit signatures can be used to verify the authenticity of the snapshot associated with the commit , and the authorship of the commit itself. (Conventionally, commit signatures are assumed to also authenticate either the entire line of history leading to a commit, or the diff between the commit and its first parent, or both.) Git's existing system has some tradeoffs. Signatures are embedded within the objects they sign. The signature is part of the object's identity; since Git is content-addressed, this means that an object can neither be retroactively signed nor retroactively stripped of its signature without modifying the object's identity. Git's distributed model means that these sorts of identity changes are both complicated and easily detected. Commit signatures are second-class citizens. They're a relatively recent addition to the Git suite, and both the implementation and the social conventions around them continue to evolve. Only some objects can be signed. While Git has relatively weak rules about workflow, the signature system assumes you're using one of Git's more widespread workflows by limiting your options to at most one signature, and by restricting signatures to tags and commits (leaving out blobs, trees, and refs). I believe it would be useful from an authentication standpoint to add \"detached\" signatures to Git, to allow users to make these tradeoffs differently if desired. These signatures would be stored as separate (blob) objects in a dedicated refs namespace, supporting retroactive signatures, multiple signatures for a given object, \"policy\" signatures, and authentication of arbitrary objects. The following notes are partially guided by Git's one existing \"detached metadata\" facility, git notes . Similarities are intentional; divergences will be noted where appropriate. Detached signatures are meant to interoperate with existing Git workflow as much as possible: in particular, they can be fetched and pushed like any other bit of Git metadata. 
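For instance, retrieving and publishing signatures would be an ordinary fetch and push with an explicit refspec (the remote name origin is only an example; the refspec itself is discussed further below): # Fetch any detached signatures the remote has published.
git fetch origin 'refs/signatures/*:refs/signatures/*'
# Publish locally-created signatures; no + prefix, so existing ones are never clobbered.
git push origin 'refs/signatures/*:refs/signatures/*'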
A detached signature cryptographically binds three facts together into an assertion whose authenticity can be checked by anyone with access to the signatory's keys: An object (in the Git sense; a commit, tag, tree, or blob), A policy label, and A signatory (a person or agent making the assertion). These assertions can be published separately from or in tandem with the objects they apply to. Policies \u00b6 Taking a hint from Monotone, every signature includes a \"policy\" identifying how the signature is meant to be interpreted. Policies are arbitrary strings; their meaning is entirely defined by tooling and convention, not by this draft. This draft uses a single policy, author , for its examples. A signature under the author policy implies that the signatory had a hand in the authorship of the designated object. (This is compatible with existing interpretations of signed tags and commits.) (Authorship under this model is strictly self-attested: you can claim authorship of anything, and you cannot assert anyone else's authorship.) The Monotone documentation suggests a number of other useful policies related to testing and release status, automated build results, and numerous other factors. Use your imagination. What's In A Signature \u00b6 Detached signatures cover the disk representation of an object, as given by git cat-file <TYPE> <SHA1> For most of Git's object types, this means that the signed content is plain text. For tree objects, the signed content is the awful binary representation of the tree, not the pretty representation given by git ls-tree or git show . Detached signatures include the \"policy\" identifier in the signed content, to prevent others from tampering with policy choices via refs hackery. (This will make more sense momentarily.) The policy identifier is prepended to the signed content, terminated by a zero byte (as with Git's own type identifiers, but without a length field as length checks are performed by signing and again when the signature is stored in Git). To generate the complete signable version of an object, use something equivalent to the following shell snippet: # generate-signable POLICY TYPE SHA1 function generate-signable() { printf '%s\\0' \"$1\" git cat-file \"$2\" \"$3\" } (In the process of writing this, I discovered how hard it is to get Unix's C-derived shell tools to emit a zero byte.) Signature Storage and Naming \u00b6 We assume that a userid will sign an object at most once. Each signature is stored in an independent blob object in the repository it applies to. The signature object (described above) is stored in Git, and its hash recorded in refs/signatures/<POLICY>/<SUBJECT SHA1>/<SIGNER KEY FINGERPRINT> . # sign POLICY TYPE SHA1 FINGERPRINT function sign() { local SIG_HASH=$( generate-signable \"$@\" | gpg --batch --no-tty --sign -u \"$4\" | git hash-object --stdin -w -t blob ) git update-ref \"refs/signatures/$1/$3/$4\" } Stored signatures always use the complete fingerprint to identify keys, to minimize the risk of colliding key IDs while avoiding the need to store full keys in the refs naming hierarchy. The policy name can be reliably extracted from the ref, as the trailing part has a fixed length (in both path segments and bytes) and each ref begins with a fixed, constant prefix refs/signatures/ . 
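One caution about the sign sketch above: git update-ref takes both the ref name and its new value, and as written the hash captured in SIG_HASH is never used. A corrected version of the sketch (still untested, same assumptions as the draft) would be: # sign POLICY TYPE SHA1 FINGERPRINT
function sign() {
    local SIG_HASH
    SIG_HASH=$(
        generate-signable \"$1\" \"$2\" \"$3\" |
        gpg --batch --no-tty --sign -u \"$4\" |
        git hash-object --stdin -w -t blob
    )
    # Pass the blob hash as the ref's new value; the draft above omits it.
    git update-ref \"refs/signatures/$1/$3/$4\" \"$SIG_HASH\"
}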
Signature Verification \u00b6 Given a signature ref as described above, we can verify and authenticate the signature and bind it to the associated object and policy by performing the following check: Pick apart the ref into policy, SHA1, and key fingerprint parts. Reconstruct the signed body as above, using the policy name extracted from the ref. Retrieve the signature from the ref and combine it with the object itself. Verify that the policy in the stored signature matches the policy in the ref. Verify the signature with GPG: ```bash verify-gpg POLICY TYPE SHA1 FINGERPRINT \u00b6 verify-gpg() { { git cat-file \"$2\" \"$3\" git cat-file \"refs/signatures/$1/$3/$4\" } | gpg --batch --no-tty --verify } ``` Verify the key fingerprint of the signing key matches the key fingerprint in the ref itself. The specific rules for verifying the signature in GPG are left up to the user to define; for example, some sites may want to auto-retrieve keys and use a web of trust from some known roots to determine which keys are trusted, while others may wish to maintain a specific, known keyring containing all signing keys for each policy, and skip the web of trust entirely. This can be accomplished via git-config , given some work, and via gpg.conf . Distributing Signatures \u00b6 Since each signature is stored in a separate ref, and since signatures are not expected to be amended once published, the following refspec can be used with git fetch and git push to distribute signatures: refs/signatures/*:refs/signatures/* Note the lack of a + decoration; we explicitly do not want to auto-replace modified signatures, normally; explicit user action should be required. Workflow Notes \u00b6 There are two verification workflows for signatures: \"static\" verification, where the repository itself already contains all the refs and objects needed for signature verification, and \"pre-receive\" verification, where an object and its associated signature may be being uploaded at the same time. It is impractical to verify signatures on the fly from an update hook . Only pre-receive hooks can usefully accept or reject ref changes depending on whether the push contains a signature for the pushed objects. (Git does not provide a good mechanism for ensuring that signature objects are pushed before their subjects.) Correctly verifying object signatures during pre-receive regardless of ref order is far too complicated to summarize here. Attacks \u00b6 Lies of Omission \u00b6 It's trivial to hide signatures by deleting the signature refs. Similarly, anyone with access to a repository can delete any or all detached signatures from it without otherwise invalidating the signed objects. Since signatures are mostly static, sites following the recommended no-force policy for signature publication should only be affected if relatively recent signatures are deleted. Older signatures should be available in one or more of the repository users' loca repositories; once created, a signature can be legitimately obtained from anywhere, not only from the original signatory. The signature naming protocol is designed to resist most other forms of assertion tampering, but straight-up omission is hard to prevent. Unwarranted Certification \u00b6 The policy system allows any signatory to assert any policy. 
While centralized signature distribution points such as \"release\" repositories can make meaningful decisions about which signatures they choose to accept, publish, and propagate, there's no way to determine after the fact whether a policy assertion was obtained from a legitimate source or a malicious one with no grounds for asserting the policy. For example, I could, right now, sign an all-tests-pass policy assertion for the Linux kernel. While there's no chance on Earth that the LKML team would propagate that assertion, if I can convince you to fetch signatures from my repository, you will fetch my bogus assertion. If all-tests-pass is a meaningful policy assertion for the Linux kernel, then you will have very few options besides believing that I assert that all tests have passed. Ambigiuous Policy \u00b6 This is an ongoing problem with crypto policy systems and user interfaces generally, but this design does nothing to ensure that policies are interpreted uniformly by all participants in a repository. In particular, there's no mechanism described for distributing either prose or programmatic policy definitions and checks. All policy information is out of band. Git already has ambiguity problems around commit signing: there are multiple ways to interpret a signature on a commit: I assert that this snapshot and commit message were authored as described in this commit's metadata. (In this interpretation, the signature's authenticity guarantees do not transitively apply to parents.) I assert that this snapshot and commit message were authored as described in this commit's metadata, based on exactly the parent commits described. (In this interpretation, the signature's authenticity guarantees do transitively apply to parents. This is the interpretation favoured by XXX LINK HERE XXX.) I assert that this diff and commit message was authored as described in this commit's metadata. (No assertions about the snapshot are made whatsoever, and assertions about parentage are barely sensical at all. This meshes with widespread, diff-oriented policies.) Grafts and Replacements \u00b6 Git permits post-hoc replacement of arbitrary objects via both the grafts system (via an untracked, non-distributed file in .git , though some repositories distribute graft lists for end-users to manually apply) and the replacements system (via refs/replace/<SHA1> , which can optionally be fetched or pushed). The interaction between these two systems and signature verification needs to be very closely considered; I've not yet done so. Cases of note: Neither signature nor subject replaced - the \"normal\" case Signature not replaced, subject replaced (by graft, by replacement, by both) Signature replaced, subject not replaced Both signature and subject replaced It's tempting to outright disable git replace during signing and verification, but this will have surprising effects when signing a ref-ish instead of a bare hash. Since this is the normal case, I think this merits more thought. (I'm also not aware of a way to disable grafts without modifying .git , and having the two replacement mechanisms treated differently may be dangerous.) No Signed Refs \u00b6 I mentioned early in this draft that Git's existing signing system doesn't support signing refs themselves; since refs are an important piece of Git's workflow ecosystem, this may be a major omission. Unfortunately, this proposal doesn't address that. Possible Refinements \u00b6 Monotone's certificate system is key+value based, rather than label-based. 
This might be useful; while small pools of related values can be asserted using mutually exclusive policy labels (whose mutual exclusion is a matter of local interpretation), larger pools of related values rapidly become impractical under the proposed system. For example, this proposal would be inappropriate for directly asserting third-party authorship; the asserted author would have to appear in the policy name itself, exposing the user to a potentially very large number of similar policy labels. Ref signing via a manifest (a tree constellation whose paths are ref names and whose blobs sign the refs' values). Consider cribbing DNSSEC here for things like lightweight absence assertions, too. Describe how this should interact with commit-duplicating and commit-rewriting workflows.","title":"Notes Towards Detached Signatures in Git"},{"location":"git/detached-sigs/#notes-towards-detached-signatures-in-git","text":"Git supports a limited form of object authentication: specific object categories in Git's internal model can have GPG signatures embedded in them, allowing the authorship of the objects to be verified using GPG's underlying trust model. Tag signatures can be used to verify the authenticity and integrity of the snapshot associated with a tag , and the authenticity of the tag itself, filling a niche broadly similar to code signing in binary distribution systems. Commit signatures can be used to verify the authenticity of the snapshot associated with the commit , and the authorship of the commit itself. (Conventionally, commit signatures are assumed to also authenticate either the entire line of history leading to a commit, or the diff between the commit and its first parent, or both.) Git's existing system has some tradeoffs. Signatures are embedded within the objects they sign. The signature is part of the object's identity; since Git is content-addressed, this means that an object can neither be retroactively signed nor retroactively stripped of its signature without modifying the object's identity. Git's distributed model means that these sorts of identity changes are both complicated and easily detected. Commit signatures are second-class citizens. They're a relatively recent addition to the Git suite, and both the implementation and the social conventions around them continue to evolve. Only some objects can be signed. While Git has relatively weak rules about workflow, the signature system assumes you're using one of Git's more widespread workflows by limiting your options to at most one signature, and by restricting signatures to tags and commits (leaving out blobs, trees, and refs). I believe it would be useful from an authentication standpoint to add \"detached\" signatures to Git, to allow users to make these tradeoffs differently if desired. These signatures would be stored as separate (blob) objects in a dedicated refs namespace, supporting retroactive signatures, multiple signatures for a given object, \"policy\" signatures, and authentication of arbitrary objects. The following notes are partially guided by Git's one existing \"detached metadata\" facility, git notes . Similarities are intentional; divergences will be noted where appropriate. Detached signatures are meant to interoperate with existing Git workflow as much as possible: in particular, they can be fetched and pushed like any other bit of Git metadata. 
A detached signature cryptographically binds three facts together into an assertion whose authenticity can be checked by anyone with access to the signatory's keys: An object (in the Git sense; a commit, tag, tree, or blob), A policy label, and A signatory (a person or agent making the assertion). These assertions can be published separately from or in tandem with the objects they apply to.","title":"Notes Towards Detached Signatures in Git"},{"location":"git/detached-sigs/#policies","text":"Taking a hint from Monotone, every signature includes a \"policy\" identifying how the signature is meant to be interpreted. Policies are arbitrary strings; their meaning is entirely defined by tooling and convention, not by this draft. This draft uses a single policy, author , for its examples. A signature under the author policy implies that the signatory had a hand in the authorship of the designated object. (This is compatible with existing interpretations of signed tags and commits.) (Authorship under this model is strictly self-attested: you can claim authorship of anything, and you cannot assert anyone else's authorship.) The Monotone documentation suggests a number of other useful policies related to testing and release status, automated build results, and numerous other factors. Use your imagination.","title":"Policies"},{"location":"git/detached-sigs/#whats-in-a-signature","text":"Detached signatures cover the disk representation of an object, as given by git cat-file <TYPE> <SHA1> For most of Git's object types, this means that the signed content is plain text. For tree objects, the signed content is the awful binary representation of the tree, not the pretty representation given by git ls-tree or git show . Detached signatures include the \"policy\" identifier in the signed content, to prevent others from tampering with policy choices via refs hackery. (This will make more sense momentarily.) The policy identifier is prepended to the signed content, terminated by a zero byte (as with Git's own type identifiers, but without a length field as length checks are performed by signing and again when the signature is stored in Git). To generate the complete signable version of an object, use something equivalent to the following shell snippet: # generate-signable POLICY TYPE SHA1 function generate-signable() { printf '%s\\0' \"$1\" git cat-file \"$2\" \"$3\" } (In the process of writing this, I discovered how hard it is to get Unix's C-derived shell tools to emit a zero byte.)","title":"What's In A Signature"},{"location":"git/detached-sigs/#signature-storage-and-naming","text":"We assume that a userid will sign an object at most once. Each signature is stored in an independent blob object in the repository it applies to. The signature object (described above) is stored in Git, and its hash recorded in refs/signatures/<POLICY>/<SUBJECT SHA1>/<SIGNER KEY FINGERPRINT> . # sign POLICY TYPE SHA1 FINGERPRINT function sign() { local SIG_HASH=$( generate-signable \"$@\" | gpg --batch --no-tty --sign -u \"$4\" | git hash-object --stdin -w -t blob ) git update-ref \"refs/signatures/$1/$3/$4\" } Stored signatures always use the complete fingerprint to identify keys, to minimize the risk of colliding key IDs while avoiding the need to store full keys in the refs naming hierarchy. 
The policy name can be reliably extracted from the ref, as the trailing part has a fixed length (in both path segments and bytes) and each ref begins with a fixed, constant prefix refs/signatures/ .","title":"Signature Storage and Naming"},{"location":"git/detached-sigs/#signature-verification","text":"Given a signature ref as described above, we can verify and authenticate the signature and bind it to the associated object and policy by performing the following check: Pick apart the ref into policy, SHA1, and key fingerprint parts. Reconstruct the signed body as above, using the policy name extracted from the ref. Retrieve the signature from the ref and combine it with the object itself. Verify that the policy in the stored signature matches the policy in the ref. Verify the signature with GPG: ```bash","title":"Signature Verification"},{"location":"git/detached-sigs/#verify-gpg-policy-type-sha1-fingerprint","text":"verify-gpg() { { git cat-file \"$2\" \"$3\" git cat-file \"refs/signatures/$1/$3/$4\" } | gpg --batch --no-tty --verify } ``` Verify the key fingerprint of the signing key matches the key fingerprint in the ref itself. The specific rules for verifying the signature in GPG are left up to the user to define; for example, some sites may want to auto-retrieve keys and use a web of trust from some known roots to determine which keys are trusted, while others may wish to maintain a specific, known keyring containing all signing keys for each policy, and skip the web of trust entirely. This can be accomplished via git-config , given some work, and via gpg.conf .","title":"verify-gpg POLICY TYPE SHA1 FINGERPRINT"},{"location":"git/detached-sigs/#distributing-signatures","text":"Since each signature is stored in a separate ref, and since signatures are not expected to be amended once published, the following refspec can be used with git fetch and git push to distribute signatures: refs/signatures/*:refs/signatures/* Note the lack of a + decoration; we explicitly do not want to auto-replace modified signatures, normally; explicit user action should be required.","title":"Distributing Signatures"},{"location":"git/detached-sigs/#workflow-notes","text":"There are two verification workflows for signatures: \"static\" verification, where the repository itself already contains all the refs and objects needed for signature verification, and \"pre-receive\" verification, where an object and its associated signature may be being uploaded at the same time. It is impractical to verify signatures on the fly from an update hook . Only pre-receive hooks can usefully accept or reject ref changes depending on whether the push contains a signature for the pushed objects. (Git does not provide a good mechanism for ensuring that signature objects are pushed before their subjects.) Correctly verifying object signatures during pre-receive regardless of ref order is far too complicated to summarize here.","title":"Workflow Notes"},{"location":"git/detached-sigs/#attacks","text":"","title":"Attacks"},{"location":"git/detached-sigs/#lies-of-omission","text":"It's trivial to hide signatures by deleting the signature refs. Similarly, anyone with access to a repository can delete any or all detached signatures from it without otherwise invalidating the signed objects. Since signatures are mostly static, sites following the recommended no-force policy for signature publication should only be affected if relatively recent signatures are deleted. 
Older signatures should be available in one or more of the repository users' loca repositories; once created, a signature can be legitimately obtained from anywhere, not only from the original signatory. The signature naming protocol is designed to resist most other forms of assertion tampering, but straight-up omission is hard to prevent.","title":"Lies of Omission"},{"location":"git/detached-sigs/#unwarranted-certification","text":"The policy system allows any signatory to assert any policy. While centralized signature distribution points such as \"release\" repositories can make meaningful decisions about which signatures they choose to accept, publish, and propagate, there's no way to determine after the fact whether a policy assertion was obtained from a legitimate source or a malicious one with no grounds for asserting the policy. For example, I could, right now, sign an all-tests-pass policy assertion for the Linux kernel. While there's no chance on Earth that the LKML team would propagate that assertion, if I can convince you to fetch signatures from my repository, you will fetch my bogus assertion. If all-tests-pass is a meaningful policy assertion for the Linux kernel, then you will have very few options besides believing that I assert that all tests have passed.","title":"Unwarranted Certification"},{"location":"git/detached-sigs/#ambigiuous-policy","text":"This is an ongoing problem with crypto policy systems and user interfaces generally, but this design does nothing to ensure that policies are interpreted uniformly by all participants in a repository. In particular, there's no mechanism described for distributing either prose or programmatic policy definitions and checks. All policy information is out of band. Git already has ambiguity problems around commit signing: there are multiple ways to interpret a signature on a commit: I assert that this snapshot and commit message were authored as described in this commit's metadata. (In this interpretation, the signature's authenticity guarantees do not transitively apply to parents.) I assert that this snapshot and commit message were authored as described in this commit's metadata, based on exactly the parent commits described. (In this interpretation, the signature's authenticity guarantees do transitively apply to parents. This is the interpretation favoured by XXX LINK HERE XXX.) I assert that this diff and commit message was authored as described in this commit's metadata. (No assertions about the snapshot are made whatsoever, and assertions about parentage are barely sensical at all. This meshes with widespread, diff-oriented policies.)","title":"Ambigiuous Policy"},{"location":"git/detached-sigs/#grafts-and-replacements","text":"Git permits post-hoc replacement of arbitrary objects via both the grafts system (via an untracked, non-distributed file in .git , though some repositories distribute graft lists for end-users to manually apply) and the replacements system (via refs/replace/<SHA1> , which can optionally be fetched or pushed). The interaction between these two systems and signature verification needs to be very closely considered; I've not yet done so. 
Cases of note: Neither signature nor subject replaced - the \"normal\" case Signature not replaced, subject replaced (by graft, by replacement, by both) Signature replaced, subject not replaced Both signature and subject replaced It's tempting to outright disable git replace during signing and verification, but this will have surprising effects when signing a ref-ish instead of a bare hash. Since this is the normal case, I think this merits more thought. (I'm also not aware of a way to disable grafts without modifying .git , and having the two replacement mechanisms treated differently may be dangerous.)","title":"Grafts and Replacements"},{"location":"git/detached-sigs/#no-signed-refs","text":"I mentioned early in this draft that Git's existing signing system doesn't support signing refs themselves; since refs are an important piece of Git's workflow ecosystem, this may be a major omission. Unfortunately, this proposal doesn't address that.","title":"No Signed Refs"},{"location":"git/detached-sigs/#possible-refinements","text":"Monotone's certificate system is key+value based, rather than label-based. This might be useful; while small pools of related values can be asserted using mutually exclusive policy labels (whose mutual exclusion is a matter of local interpretation), larger pools of related values rapidly become impractical under the proposed system. For example, this proposal would be inappropriate for directly asserting third-party authorship; the asserted author would have to appear in the policy name itself, exposing the user to a potentially very large number of similar policy labels. Ref signing via a manifest (a tree constellation whose paths are ref names and whose blobs sign the refs' values). Consider cribbing DNSSEC here for things like lightweight absence assertions, too. Describe how this should interact with commit-duplicating and commit-rewriting workflows.","title":"Possible Refinements"},{"location":"git/pull-request-workflow/","text":"Life With Pull Requests \u00b6 I've been party to a number of discussions with folks contributing to pull-request-based projects on Github (and other hosts, but mostly Github). Because of Git's innate flexibility, there are lots of ways to work with pull requests. Here's mine. I use a couple of naming conventions here that are not stock git : origin is the repository to which you publish proposed changes, and upstream is the repository from which you receive ongoing development, and which will receive your changes if they are accepted. One-time setup \u00b6 Do these things once, when starting out on a project. Keep the results around for later. I'll be referring to the original project repository as upstream and pretending its push URL is UPSTREAM-URL below. In real life, the URL will often be something like git@github.com:someguy/project.git . Fork the project \u00b6 Use the repo manager's forking tool to create a copy of the project in your own namespace. This generally creates your copy with a bunch of useless tat; feel free to ignore all of this, as the only purpose of this copy is to provide somewhere for you to publish your changes. We'll be calling this repository origin later. Assume it has a URL, which I'll abbreviate ORIGIN-URL , for git push to use. (You can leave this step for later, but if you know you're going to do it, why not get it out of the way?) Clone the project and configure it \u00b6 You'll need a clone locally to do work in. 
Create one from origin : git clone ORIGIN-URL some-local-name While you're here, cd into it and add the original project as a remote: cd some-local-name git remote add upstream UPSTREAM-URL Feature process \u00b6 Do these things for each feature you work on. To switch features, just use git checkout my-feature . Create a new feature branch locally \u00b6 We use upstream 's master branch here, so that your feature includes all of upstream 's state initially. We also need to make sure our local cache of upstream 's state is correct: git fetch upstream git checkout upstream/master -b my-feature Do work \u00b6 If you need my help here, stop now. Integrate upstream changes \u00b6 If you find yourself needing something that's been added upstream, use rebase to integrate it to avoid littering your feature branch with \u201cmeaningless\u201d merge commits. git checkout my-feature git fetch upstream git rebase upstream/master Publish your branch \u00b6 When you're \u201cdone,\u201d publish your branch to your personal repository: git push origin my-feature Then visit your copy in your repo manager's web UI and create a pull request for my-feature . Integrating feedback \u00b6 Very likely, your proposed changes will need work. If you use history-editing to integrate feedback, you will need to use --force when updating the branch: git push --force origin my-feature This is safe provided two things are true: The branch has not yet been merged to the upstream repo. You are only force-pushing to your fork, not to the upstream repo. Generally, no other users will have work based on your pull request, so force-pushing history won't cause problems.","title":"Life With Pull Requests"},{"location":"git/pull-request-workflow/#life-with-pull-requests","text":"I've been party to a number of discussions with folks contributing to pull-request-based projects on Github (and other hosts, but mostly Github). Because of Git's innate flexibility, there are lots of ways to work with pull requests. Here's mine. I use a couple of naming conventions here that are not stock git : origin is the repository to which you publish proposed changes, and upstream is the repository from which you receive ongoing development, and which will receive your changes if they are accepted.","title":"Life With Pull Requests"},{"location":"git/pull-request-workflow/#one-time-setup","text":"Do these things once, when starting out on a project. Keep the results around for later. I'll be referring to the original project repository as upstream and pretending its push URL is UPSTREAM-URL below. In real life, the URL will often be something like git@github.com:someguy/project.git .","title":"One-time setup"},{"location":"git/pull-request-workflow/#fork-the-project","text":"Use the repo manager's forking tool to create a copy of the project in your own namespace. This generally creates your copy with a bunch of useless tat; feel free to ignore all of this, as the only purpose of this copy is to provide somewhere for you to publish your changes. We'll be calling this repository origin later. Assume it has a URL, which I'll abbreviate ORIGIN-URL , for git push to use. (You can leave this step for later, but if you know you're going to do it, why not get it out of the way?)","title":"Fork the project"},{"location":"git/pull-request-workflow/#clone-the-project-and-configure-it","text":"You'll need a clone locally to do work in. 
Create one from origin : git clone ORIGIN-URL some-local-name While you're here, cd into it and add the original project as a remote: cd some-local-name git remote add upstream UPSTREAM-URL","title":"Clone the project and configure it"},{"location":"git/pull-request-workflow/#feature-process","text":"Do these things for each feature you work on. To switch features, just use git checkout my-feature .","title":"Feature process"},{"location":"git/pull-request-workflow/#create-a-new-feature-branch-locally","text":"We use upstream 's master branch here, so that your feature includes all of upstream 's state initially. We also need to make sure our local cache of upstream 's state is correct: git fetch upstream git checkout upstream/master -b my-feature","title":"Create a new feature branch locally"},{"location":"git/pull-request-workflow/#do-work","text":"If you need my help here, stop now.","title":"Do work"},{"location":"git/pull-request-workflow/#integrate-upstream-changes","text":"If you find yourself needing something that's been added upstream, use rebase to integrate it to avoid littering your feature branch with \u201cmeaningless\u201d merge commits. git checkout my-feature git fetch upstream git rebase upstream/master","title":"Integrate upstream changes"},{"location":"git/pull-request-workflow/#publish-your-branch","text":"When you're \u201cdone,\u201d publish your branch to your personal repository: git push origin my-feature Then visit your copy in your repo manager's web UI and create a pull request for my-feature .","title":"Publish your branch"},{"location":"git/pull-request-workflow/#integrating-feedback","text":"Very likely, your proposed changes will need work. If you use history-editing to integrate feedback, you will need to use --force when updating the branch: git push --force origin my-feature This is safe provided two things are true: The branch has not yet been merged to the upstream repo. You are only force-pushing to your fork, not to the upstream repo. Generally, no other users will have work based on your pull request, so force-pushing history won't cause problems.","title":"Integrating feedback"},{"location":"git/scratch/","text":"Git Is Not Magic \u00b6 I'm bored. Let's make a git repository out of whole cloth. Git repos are stored in .git: fakegit$ mkdir .git They have a \u201csymbolic ref\u201d (which are text files, see man git-symbolic-ref ) named HEAD , pointing to the currently checked-out branch. Let's use master . Branches are refs under refs/heads (see man git-branch ): fakegit ((unknown))$ echo 'ref: refs/heads/master' > .git/HEAD The have an object database and a refs database, both of which are simple directories (see man gitrepository-layout and man gitrevisions ). Let's also enable the reflog, because it's a great safety net if you use history-editing tools in git: fakegit ((ref: re...))$ mkdir .git/refs .git/objects .git/logs fakegit (master #)$ Now __git_ps1 , at least, is convinced that we have a working git repository. Does it work? fakegit (master #)$ echo 'Hello, world!' > hello.txt fakegit (master #)$ git add hello.txt fakegit (master #)$ git commit -m 'Initial commit' [master (root-commit) 975307b] Initial commit 1 file changed, 1 insertion(+) create mode 100644 hello.txt fakegit (master)$ git log commit 975307ba0485bff92e295e3379a952aff013c688 Author: Owen Jacobson <owen.jacobson@grimoire.ca> Date: Wed Feb 6 10:07:07 2013 -0500 Initial commit Eeyup . Should you do this? Of course not. 
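If you do try it, you can check that the hand-made repository hangs together, and backfill the pieces a normal setup would have, with something like this quick sketch (re-running git init in an existing repository is safe; it only fills in what's missing): git fsck --full   # walks the object database and reports anything broken
git init          # adds the missing config, description, hooks/, info/, etc.
ls .git           # the layout now matches what git init would have produced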
Anywhere you could run these commands, you could instead run git init or git clone , which set up a number of other structures, including .git/config and any unusual permissions options. The key part here is that a directory's identity as \u201ca git repository\u201d is entirely a function of its contents, not of having been blessed into being by git itself. You can infer a lot from this: for example, you can infer that it's \u201csafe\u201d to move git repositories around using FS tools, or to back them up with the same tools, for example. This is not as obvious to everyone as you might hope; people","title":"Git Is Not Magic"},{"location":"git/scratch/#git-is-not-magic","text":"I'm bored. Let's make a git repository out of whole cloth. Git repos are stored in .git: fakegit$ mkdir .git They have a \u201csymbolic ref\u201d (which are text files, see man git-symbolic-ref ) named HEAD , pointing to the currently checked-out branch. Let's use master . Branches are refs under refs/heads (see man git-branch ): fakegit ((unknown))$ echo 'ref: refs/heads/master' > .git/HEAD The have an object database and a refs database, both of which are simple directories (see man gitrepository-layout and man gitrevisions ). Let's also enable the reflog, because it's a great safety net if you use history-editing tools in git: fakegit ((ref: re...))$ mkdir .git/refs .git/objects .git/logs fakegit (master #)$ Now __git_ps1 , at least, is convinced that we have a working git repository. Does it work? fakegit (master #)$ echo 'Hello, world!' > hello.txt fakegit (master #)$ git add hello.txt fakegit (master #)$ git commit -m 'Initial commit' [master (root-commit) 975307b] Initial commit 1 file changed, 1 insertion(+) create mode 100644 hello.txt fakegit (master)$ git log commit 975307ba0485bff92e295e3379a952aff013c688 Author: Owen Jacobson <owen.jacobson@grimoire.ca> Date: Wed Feb 6 10:07:07 2013 -0500 Initial commit Eeyup . Should you do this? Of course not. Anywhere you could run these commands, you could instead run git init or git clone , which set up a number of other structures, including .git/config and any unusual permissions options. The key part here is that a directory's identity as \u201ca git repository\u201d is entirely a function of its contents, not of having been blessed into being by git itself. You can infer a lot from this: for example, you can infer that it's \u201csafe\u201d to move git repositories around using FS tools, or to back them up with the same tools, for example. This is not as obvious to everyone as you might hope; people","title":"Git Is Not Magic"},{"location":"git/stop-using-git-pull-to-deploy/","text":"Stop using git pull for deployment! \u00b6 The problem \u00b6 You have a Git repository containing your project. You want to \u201cdeploy\u201d that code when it changes. You'd rather not download the entire project from scratch for each deployment. The antipattern \u00b6 \u201cI know, I'll use git pull in my deployment script!\u201d Stop doing this. Stop teaching other people to do this. It's wrong, and it will eventually lead to deploying something you didn't want. Deployment should be based on predictable, known versions of your code. Ideally, every deployable version has a tag (and you deploy exactly that tag), but even less formal processes, where you deploy a branch tip, should still be deploying exactly the code designated for release. git pull , however, can introduce new commits. 
git pull is a two-step process: Fetch the current branch's designated upstream remote, to obtain all of the remote's new commits. Merge the current branch's designated upstream branch into the current branch. The merge commit means the actual deployed tree might not be identical to the intended deployment tree. Local changes (intentional or otherwise) will be preserved (and merged) into the deployment, for example; once this happens, the actual deployed commit will never match the intended commit. git pull will approximate the right thing \u201cby accident\u201d: if the current local branch (generally master ) for people using git pull is always clean, and always tracks the desired deployment branch, then git pull will update to the intended commit exactly. This is pretty fragile, though; many git commands can cause the local branch to diverge from its upstream branch, and once that happens, git pull will always create new commits. You can patch around the fragility a bit using the --ff-only option, but that only tells you when your deployment environment has diverged and doesn't fix it. The right pattern \u00b6 Quoting Sitaram Chamarty : Here's what we expect from a deployment tool. Note the rule numbers -- we'll be referring to some of them simply by number later. All files in the branch being deployed should be copied to the deployment directory. Files that were deleted in the git repo since the last deployment should get deleted from the deployment directory. Any changes to tracked files in the deployment directory after the last deployment should be ignored when following rules 1 and 2. However, sometimes you might want to detect such changes and abort if you found any. Untracked files in the deploy directory should be left alone. Again, some people might want to detect this and abort the deployment. Sitaram's own documentation talks about how to accomplish these when \u201cdeploying\u201d straight out of a bare repository. That's unwise (not to mention impractical) in most cases; deployment should use a dedicated clone of the canonical repository. I also disagree with point 3, preferring to keep deployment-related changes outside of tracked files. This makes it much easier to argue that the changes introduced to configure the project for deployment do not introduce new bugs or other surprise features. My deployment process, given a dedicated clone at $DEPLOY_TREE , is as follows: cd \"${DEPLOY_TREE}\" git fetch --all git checkout --force \"${TARGET}\" # Following two lines only required if you use submodules git submodule sync git submodule update --init --recursive # Follow with actual deployment steps (run fabric/capistrano/make/etc) $TARGET is either a tag name ( v1.2.1 ) or a remote branch name ( origin/master ), but could also be a commit hash or anything else Git recognizes as a revision. This will detach the head of the $DEPLOY_TREE repository, which is fine as no new changes should be authored in this repository (so the local branches are irrelevant). The warning Git emits when HEAD becomes detached is unimportant in this case. The tracked contents of $DEPLOY_TREE will end up identical to the desired commit, discarding local changes. 
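Wrapped up as a small script, that process looks something like this (the script name and the parameter check are my own additions; the git steps are exactly the ones above): #!/bin/sh
# deploy.sh TARGET -- check out an exact revision in the dedicated deployment clone.
set -eu
TARGET=\"${1:?usage: deploy.sh <tag, remote branch, or commit>}\"
cd \"${DEPLOY_TREE:?set DEPLOY_TREE to the dedicated clone}\"
git fetch --all
git checkout --force \"$TARGET\"
# Only needed if the project uses submodules.
git submodule sync
git submodule update --init --recursive
# Hand off to the real deployment steps here (make, fabric, capistrano, ...).
Invoked as, say, DEPLOY_TREE=/srv/myapp deploy.sh v1.2.1 , it leaves the clone's tracked files exactly matching the named revision.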
The pattern above is very similar to what most continuous integration servers use when building from Git repositories, for much the same reason.","title":"Stop using `git pull` for deployment!"},{"location":"git/stop-using-git-pull-to-deploy/#stop-using-git-pull-for-deployment","text":"","title":"Stop using git pull for deployment!"},{"location":"git/stop-using-git-pull-to-deploy/#the-problem","text":"You have a Git repository containing your project. You want to \u201cdeploy\u201d that code when it changes. You'd rather not download the entire project from scratch for each deployment.","title":"The problem"},{"location":"git/stop-using-git-pull-to-deploy/#the-antipattern","text":"\u201cI know, I'll use git pull in my deployment script!\u201d Stop doing this. Stop teaching other people to do this. It's wrong, and it will eventually lead to deploying something you didn't want. Deployment should be based on predictable, known versions of your code. Ideally, every deployable version has a tag (and you deploy exactly that tag), but even less formal processes, where you deploy a branch tip, should still be deploying exactly the code designated for release. git pull , however, can introduce new commits. git pull is a two-step process: Fetch the current branch's designated upstream remote, to obtain all of the remote's new commits. Merge the current branch's designated upstream branch into the current branch. The merge commit means the actual deployed tree might not be identical to the intended deployment tree. Local changes (intentional or otherwise) will be preserved (and merged) into the deployment, for example; once this happens, the actual deployed commit will never match the intended commit. git pull will approximate the right thing \u201cby accident\u201d: if the current local branch (generally master ) for people using git pull is always clean, and always tracks the desired deployment branch, then git pull will update to the intended commit exactly. This is pretty fragile, though; many git commands can cause the local branch to diverge from its upstream branch, and once that happens, git pull will always create new commits. You can patch around the fragility a bit using the --ff-only option, but that only tells you when your deployment environment has diverged and doesn't fix it.","title":"The antipattern"},{"location":"git/stop-using-git-pull-to-deploy/#the-right-pattern","text":"Quoting Sitaram Chamarty : Here's what we expect from a deployment tool. Note the rule numbers -- we'll be referring to some of them simply by number later. All files in the branch being deployed should be copied to the deployment directory. Files that were deleted in the git repo since the last deployment should get deleted from the deployment directory. Any changes to tracked files in the deployment directory after the last deployment should be ignored when following rules 1 and 2. However, sometimes you might want to detect such changes and abort if you found any. Untracked files in the deploy directory should be left alone. Again, some people might want to detect this and abort the deployment. Sitaram's own documentation talks about how to accomplish these when \u201cdeploying\u201d straight out of a bare repository. That's unwise (not to mention impractical) in most cases; deployment should use a dedicated clone of the canonical repository. I also disagree with point 3, preferring to keep deployment-related changes outside of tracked files. 
This makes it much easier to argue that the changes introduced to configure the project for deployment do not introduce new bugs or other surprise features. My deployment process, given a dedicated clone at $DEPLOY_TREE , is as follows: cd \"${DEPLOY_TREE}\" git fetch --all git checkout --force \"${TARGET}\" # Following two lines only required if you use submodules git submodule sync git submodule update --init --recursive # Follow with actual deployment steps (run fabric/capistrano/make/etc) $TARGET is either a tag name ( v1.2.1 ) or a remote branch name ( origin/master ), but could also be a commit hash or anything else Git recognizes as a revision. This will detach the head of the $DEPLOY_TREE repository, which is fine as no new changes should be authored in this repository (so the local branches are irrelevant). The warning Git emits when HEAD becomes detached is unimportant in this case. The tracked contents of $DEPLOY_TREE will end up identical to the desired commit, discarding local changes. The pattern above is very similar to what most continuous integration servers use when building from Git repositories, for much the same reason.","title":"The right pattern"},{"location":"git/survival/","text":"Git Survival Guide \u00b6 I think the git UI is pretty awful, and encourages using Git in ways that will screw you. Here are a few things I've picked up that have saved my bacon. You will inevitably need to understand Git's \u201cinternals\u201d to make use of it as an SCM tool. Accept this early. If you think your SCM tool should not expose you to so much plumbing, don't use Git . Git weenies will claim that this plumbing is what gives Git all of its extra power. This is true; it gives Git the power to get you out of situations you wouldn't be in without Git. git log --graph --decorate --oneline --color --all Run git fetch habitually. Stale remote-tracking branches lead to sadness. git push and git pull are not symmetric . git push 's opposite operation is git fetch . ( git pull is equivalent to git fetch followed by git merge , more or less). Git configuration values don't always have the best defaults . The upstream branch of foo is foo@{u} . The upstream branch of your checked-out branch is HEAD@{u} or @{u} . This is documented in git help revisions . You probably don't want to use a merge operation (such as git pull ) to integrate upstream changes into topic branches. The resulting history can be very confusing to follow, especially if you integrate upstream changes frequently. You can leave topic branches \u201creal\u201d relatively safely. You can do a test merge to see if they still work cleanly post-integration without actually integrating upstream into the branch permanently. You can use git rebase or git pull --rebase to transplant your branch to a new, more recent starting point that includes the changes you want to integrate. This makes the upstream changes a permanent part of your branch, just like git merge or git pull would, but generates an easier-to-follow history. Conflict resolution will happen as normal. 
Example test merge, using origin/master as the upstream branch and foo as the candidate for integration: git fetch origin git checkout origin/master -b test-merge-foo git merge foo # run tests, examine files git diff origin/master..HEAD To discard the test merge, delete the branch after checking out some other branch: git checkout foo git branch -D test-merge-foo You can combine this with git rerere to save time resolving conflicts in a later \u201creal,\u201d permanent merge. You can use git checkout -p to build new, tidy commits out of a branch laden with \u201cwip\u201d commits: git fetch git checkout $(git merge-base origin/master foo) -b foo-cleaner-history git checkout -p foo -- paths/to/files # pick out changes from the presented patch that form a coherent commit # repeat 'git checkout -p foo --' steps for related files to build up # the new commit git commit # repeat 'git checkout -p foo --' and 'git commit' steps until no diffs remain Gotcha: git checkout -p will do nothing for files that are being created. Use git checkout , instead, and edit the file if necessary. Thanks, Git. Gotcha: The new, clean branch must diverge from its upstream branch ( origin/master , in the example above) at exactly the same point, or the diffs presented by git checkout -p foo will include chunks that revert changes on the upstream branch since the \u201cdirty\u201d branch was created. The easiest way to find this point is with git merge-base . Useful Resources \u00b6 That is, resoures that can help you solve problems or understand things, not resources that reiterate the man pages for you. Sitaram Chamarty's git concepts simplified Tv's Git for Computer Scientists","title":"Git Survival Guide"},{"location":"git/survival/#git-survival-guide","text":"I think the git UI is pretty awful, and encourages using Git in ways that will screw you. Here are a few things I've picked up that have saved my bacon. You will inevitably need to understand Git's \u201cinternals\u201d to make use of it as an SCM tool. Accept this early. If you think your SCM tool should not expose you to so much plumbing, don't use Git . Git weenies will claim that this plumbing is what gives Git all of its extra power. This is true; it gives Git the power to get you out of situations you wouldn't be in without Git. git log --graph --decorate --oneline --color --all Run git fetch habitually. Stale remote-tracking branches lead to sadness. git push and git pull are not symmetric . git push 's opposite operation is git fetch . ( git pull is equivalent to git fetch followed by git merge , more or less). Git configuration values don't always have the best defaults . The upstream branch of foo is foo@{u} . The upstream branch of your checked-out branch is HEAD@{u} or @{u} . This is documented in git help revisions . You probably don't want to use a merge operation (such as git pull ) to integrate upstream changes into topic branches. The resulting history can be very confusing to follow, especially if you integrate upstream changes frequently. You can leave topic branches \u201creal\u201d relatively safely. You can do a test merge to see if they still work cleanly post-integration without actually integrating upstream into the branch permanently. You can use git rebase or git pull --rebase to transplant your branch to a new, more recent starting point that includes the changes you want to integrate. 
This makes the upstream changes a permanent part of your branch, just like git merge or git pull would, but generates an easier-to-follow history. Conflict resolution will happen as normal. Example test merge, using origin/master as the upstream branch and foo as the candidate for integration: git fetch origin git checkout origin/master -b test-merge-foo git merge foo # run tests, examine files git diff origin/master..HEAD To discard the test merge, delete the branch after checking out some other branch: git checkout foo git branch -D test-merge-foo You can combine this with git rerere to save time resolving conflicts in a later \u201creal,\u201d permanent merge. You can use git checkout -p to build new, tidy commits out of a branch laden with \u201cwip\u201d commits: git fetch git checkout $(git merge-base origin/master foo) -b foo-cleaner-history git checkout -p foo -- paths/to/files # pick out changes from the presented patch that form a coherent commit # repeat 'git checkout -p foo --' steps for related files to build up # the new commit git commit # repeat 'git checkout -p foo --' and 'git commit' steps until no diffs remain Gotcha: git checkout -p will do nothing for files that are being created. Use git checkout , instead, and edit the file if necessary. Thanks, Git. Gotcha: The new, clean branch must diverge from its upstream branch ( origin/master , in the example above) at exactly the same point, or the diffs presented by git checkout -p foo will include chunks that revert changes on the upstream branch since the \u201cdirty\u201d branch was created. The easiest way to find this point is with git merge-base .","title":"Git Survival Guide"},{"location":"git/survival/#useful-resources","text":"That is, resoures that can help you solve problems or understand things, not resources that reiterate the man pages for you. Sitaram Chamarty's git concepts simplified Tv's Git for Computer Scientists","title":"Useful Resources"},{"location":"gossamer/","text":"Gossamer: A Decentralized Status-Sharing Network \u00b6 Twitter's pretty great. The short format encourages brief, pithy remarks, and the default assumption of visibility makes it super easy to pitch in on a conversation, or to find new people to listen to. Unfortunately, Twitter is a centralized system: one Bay-area company in the United States controls and mediates all Twitter interactions. From all appearances, Twitter, Inc. is relatively benign, as social media corporations go. There are few reports of censorship, and while their response to abuse of the Twitter network has not been consistently awesome, they can be made to listen. However, there exists the capacity for Twitter, Inc. to subvert the entire Twitter system, either voluntarily or at the behest of governments around the world. (Just ask Turkish people. Or the participants in the Arab Spring.) Gossamer is a Twitter-alike system, designed from the ground up to have no central authority. It resists censorship, enables individual participants to control their own data, and allows anyone at all to integrate new software into the Gossamer network. Gossamer does not exist, but if it did, the following notes describe what it might look like, and the factors to consider when implementing Gossamer as software. I have made fatal mistakes while writing it; I have not rushed to build it specifically because Twitter, Gossamer's model, is so deeply woven into so many peoples' lives. A successor must make fewer mistakes, not merely different mistakes, and certainly not more mistakes. 
The following is loosely inspired by Rumor Monger , at \u201cwhole world\u201d scale. Design Goals \u00b6 Users must be in control of their own privacy and identity at all times. (This is a major failing with Diaspora, which limits access to personal ownership of data by being hard to run.) Users must be able to communicate without the consent or support of an intermediate authority. Short of being completely offline, Gossamer should be resilient to infrastructural damage. Any functional communication system will be used for illicit purposes. This is an unavoidable consequence of being usable for legitimate purposes without a central authority. Rather than revealing illicit conversations, Gossamer should do what it can to preserve the anonymity and privacy of legitimate ones. All nodes are as equal as possible. The node I use is not more authoritative for messages from me than any other node. You can hear my words from anyone who has heard my words, and I can hear yours from anyone who has heard your words, so long as some variety of authenticity and privacy are maintained. If an identity's secrets are removed, a node should contain no data that correlates the owner with his or her Gossamer identities. Relaying and authoring must be as indistinguishable as possible, to limit the utility of traffic analysis. Public and Private Information \u00b6 Every piece of data Gossamer uses, either internally or to communicate with other ndoes, is classified as either public or private . Public information can be communicated to other nodes, and is assumed to be safe if recovered out of band. Private information includes anything which may be used to associate a Gossamer identity with the person who controls it, except as noted below. Gossamer must ensure users understand what information that they provide will be made public, and what will be kept private, so that they can better decide what, if anything, to share and so that they can better make decisions about their own safety and comfort against abusive parties. Internally, Gossamer always stores private information encrypted, and never transmits it to another node. Gossamer must provide a tool to safely obliterate private data. Public Information \u00b6 Details on the role of each piece of information are covered below. Public status updates, obviously. Gossamer exists to permit users to easily share short messages with one another. The opaque form of a user's incoming and outgoing private messages. The users' identities' public keys. (But not their relationship to one another.) Any information the user places in their profile. (This implies that profiles must not be auto-populated from, for example, the user's address book.) The set of identities verified by the user's identity. Any other information Gossamer retains must be private. Republishing \u00b6 Gossamer is built on the assumption that every participant is willing to act as a relay for every other participant. This is a complicated assumption at the human layer. Inevitably, someone will use the Gossamer network to communicate something morally repugnant or deeply illegal: the Silk Road guy, for example, got done for trying to contract someone to commit murder. Every Gossamer node is complicit in delivering those messages to the rest of the network, whether they're in the clear (status updates) or not (private messages). 
It's unclear how this interacts with the various legal frameworks, moral codes, and other social constructs throughout the world, and it's ethically troubling to put users in that position by default. The strong alternative, that each node only relay content with the controlling user's explicit and ongoing consent, is also troubling: it limits the Gossamer network's ability to deliver messages at all , and exposes information about which identities each node's owner considers interesting and publishable. I don't have an obvious resolution to this. Gossamer's underlying protocol relies on randomly-selected nodes being more likely to propagate a message than to ignore it, because this helps make Gossamer resilient to hostile users, nosy intelligence agencies, and others who believe communication must be restrictable. On the other hand, I'd like not to put a user in Taiwan at risk of legal or social reprisals because a total stranger in Canada decided to post something vile. (This is one of the reasons I haven't built the damn thing yet. Besides being A Lot Of Code, there's no way to shut off Gossamer once more than one node exists, and I want to be sure I've thought through what I'm doing before creating a prototype.) Identity in the Gossamer Network \u00b6 Every Gossamer message carries with it an identity . Gossamer identities are backed by public-key cryptography. However, unlike traditional public key systems such as GPG, Gossamer identities provide continuity , rather than authenticity : two Gossamer messages signed by the same key are from the same identity, but there is no inherent guarantee that that identity is legitimate. Gossamer maintains relationships between identities to allow users to verify the identities of one another, and to publish attestations of that to other Gossamer nodes. From this, Gossamer can recover much of GPG's \u201cweb of trust.\u201d TODO : revocation of identities, revocation of verifications. Both are important; novice users are likely to verify people poorly, and there should be a recovery path less drastic than GPG's \u201cyou swore it, you're stuck with it\u201d model. Gossamer encourages users to create additional identities as needed to, for example, support the separation of work and home conversations, or to provide anonymity when discussing reputationally-hazardous topics. Identities are not correlated by the Gossamer codebase. Each identity can optionally include a profile : a block of data describing the person behind the identity. The contents of a profile are chosen by the person holding the private key for an identity, and the profile is attached to every new message created with the corresponding identity. A user can update their profile at will; potentially, every message can be sent with a distinct profile. Gossamer software treats the profile it's seen with the highest timestamp as authoritative, retroactively applying it to old messages. Multiple Devices and Key Security \u00b6 A Gossamer identity is entirely contained in its private key. An identity's key must be stored safely, either using the host operating system's key management facilities or using a carefully-designed key store. Keys must not hit long-term storage unprotected; this may involve careful integration with the underlying OS's memory management facilities to avoid, eg., placing identities in swap. This is necessary to protect users from having their identities recovered against their will via, for example, hard drive forensics. 
Gossamer allows keys to be exported into password-encrypted archive files, which can be loaded into other Gossamer applications to allow them to share the same identity. GOSSAMER MUST TREAT THESE FILES WITH EXTREME CARE, BECAUSE USERS PROBABLY WON'T . Identity keys protect the user's Gossamer identity, but they also protect the user's private messages (see below) and other potentially identifying data. The export format must be designed to be as resilient as possible, and Gossamer's software must take care to ensure that \u201cused\u201d identity files are automatically destroyed safely wherever possible and to discourage users from following practices that weaken their own safety unknowingly. Exported identity files are intrinsically vulnerable to offline brute-force attacks; once obtained, an attacker can try any of the worryingly common passwords at will, and can easily validate a password by using the recovered keys to regenerate some known fact about the original, such as a verification or a message signature. This implies that exported identities must use a key derivation system which has a high computational cost and which is believed to be resilient to, for example, GPU-accelerated cracking. Secure deletion is a Hard Problem; where possible, Gossamer must use operating system-provided facilities for securely destroying files. Status Messages \u00b6 Status messages are messages visible to any interested Gossamer users. These are the primary purpose of Gossamer. Each contains up to 140 Unicode characters, a markup section allowing Gossamer to attach URLs and metadata (including Gossamer locators) to the text, and an attachments section carrying arbitrary MIME blobs of limited total size. All three sections are canonicalized ( TODO : how?) and signed by the publishing identity's private key. The public key, the identity's most recent profile, and the signed status message are combined into a single Gossamer message and injected into the user's Gossamer node exactly as if it had arrived from another node. Each Gossamer node maintains a follow list of identities whose messages the user is interested in seeing. When Gossamer receives a novel status message during a gossip exchange, it displays it to the user if and only if its identity is on the node's follow list. Otherwise, the message is not displayed, but will be shared onwards with other nodes. In this way, every Gossamer node acts as a relay for every other Gossamer node. If Gossamer receives a message signed by an identity it has seen attestations for, it attaches those attestations to the message before delivering them onwards. In this way, users' verifications of one another's identity spread through the network organically. Private Messages \u00b6 Gossamer can optionally encrypt messages, allowing users to send one another private messages. These messages are carried over the Gossamer network as normal, but only nodes holding the appropriate identity key can decrypt them and display them to the user. (At any given time, most Gossamer nodes hold many private messages they cannot decrypt.) Private messages do not carry the author's identity or full profile in the clear. The author's bare identity is included in the encrypted part of the message, to allow the intended recipient to identify the sender. TODO : sign-then-encrypt, or encrypt-then-sign? If sign-then-encrypt, are private messages exempted from the \u201cdrop broken messages\u201d rule above? 
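To make that TODO concrete: if the signature sits inside the ciphertext (sign-then-encrypt), only the recipient can check it, so relaying nodes cannot apply the "drop broken messages" rule to private messages; if the signature wraps the ciphertext (encrypt-then-sign), any node holding the sender's public key can verify and discard without being able to read the message. A rough illustration, using GPG purely as a stand-in -- Gossamer specifies no concrete tooling, and the alice/bob keys here are hypothetical:

```
# Illustrative only: GPG stands in for Gossamer's (unspecified) crypto tooling.
# "alice" is the sender identity, "bob" the recipient; both are hypothetical keys.

# Sign-then-encrypt: the signature lives inside the ciphertext, so only Bob
# can verify it -- relays cannot check authenticity before forwarding.
gpg --local-user alice --sign --encrypt --recipient bob \
    --output msg.ste message.txt

# Encrypt-then-sign: the outer signature covers the ciphertext, so any node
# with Alice's public key can verify (and drop) it without reading it.
gpg --encrypt --recipient bob --output msg.enc message.txt
gpg --local-user alice --detach-sign --output msg.enc.sig msg.enc
```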
Following Users \u00b6 Each Gossamer node maintains a database of followed identities. (This may or may not include the owner's own identity.) Any message stored in the node published by an identity in this database will be shown to the user in a timeline-esque view. Gossamer's follow list is purely local , and is not shared between nodes even if they have identities in common. The follow list is additionally stored encrypted using the node's identities (any one identity is sufficient to recover the list), to ensure that the follow list is not easily available to others without the node owner's permission. Exercises such as Finding Paul Revere have shown that the collection of graph edges showing who communicates with whom can often be sufficient to map identities into people. Gossamer attempts to restrict access to this data, believing it is not the network's place to know who follows who. Verified Identities \u00b6 Gossamer allows identities to sign one anothers' public keys. These signatures form verifications . Gossamer considers an identity verified if any of the following hold: Gossamer has access to the identity key for the identity itself. Gossamer has access to the identity key for at least one of the identity's verifications. The identity is signed by at least three (todo: or however many, I didn't do the arithmetic yet) verified identities. Verified identities are marked in the user interface to make it obvious to the user whether a message is from a known friend or from an unknown identity. Gossamer allows users to sign new verifications for any identity they have seen. These verifications are initially stored locally, but will be published as messages transit the node as described below. Verification is a public fact: everyone can see which identities have verified which other identities. This is a potentially very powerful tool for reassociating identities with real-world people; Gossamer must make this clear to users. (I'm pretty sure you could find me, personally, just by watching whose identities I verify.) Each Gossamer node maintains a database of every verification it has ever seen or generated. If the node receives a message from an identity that appears in the verification database, and if the message is under some total size, Gossamer appends verifications from its database to the message before reinjecting it into the network. This allows verifications to propagate through Blocking Users \u00b6 Any social network will attract hostile users who wish to disrupt the network or abuse its participants. Users must be able to filter out these users, and must not provide too much feedback to blocked users that could otherwise be used to circumvent blocks. Each Gossamer node maintains a database of blocked identities. Any message from an identity in this database, or from an identity that is verified by three or more identities in this database, will automatically be filtered out from display. (Additionally, transitively-blocked users will automatically be added to the block database. Blocking is contagious.) ( TODO : should Gossamer drop blocked messages? How does that interact with the inevitable \u201cshared blocklist\u201d systems that arise in any social network?) As with the follow list, the block database is encrypted using the node's identities. Gossamer encourages users to create new identities as often as they see fit and attempts to separate identities from one another as much as possible. This is fundamentally incompatible with strong blocking. 
It will always be possible for a newly-created identity to deliver at least one message before being blocked. This is a major design problem ; advice encouraged. Gossamer Network Primitives \u00b6 The Gossamer network is built around a gossip protocol, wherein nodes connect to one another periodically to exchange messages with one another. Connections occur over the existing IP internet infrastructure, traversing NAT networks where possible to ensure that users on residential and corporate networks can still participate. Gossamer bootstraps its network using a number of paths: Gossamer nodes in the same broadcast domain discover one another using UDP broadcasts as well as Bonjour/mDNS. Gossamer can generate locator strings, which can be shared \u201cout of band\u201d via email, SMS messages, Twitter, graffiti, etc. Gossamer nodes share knowledge of nodes whenever they exchange messages, to allow the Gossamer network to recover from lost nodes and to permit nodes to remain on the network as \u201cknown\u201d nodes are lost to outages and entropy. Locators \u00b6 A Gossamer locator is a URL in the g scheme, carrying an encoding of one or more network addresses as well as an encoding of one or more identities (see below). Gossamer's software attempts to determine an appropriate identifier for any identities it holds based on the host computer's network configuration, taking into account issues like NAT traversal wherever possible. TODO : Gossamer and uPNP, what do locators look like? When presented with an identifier, Gossamer offers to follow the identities it contains, and uses the nodes whose addresses it contains to connect to the Gossamer network. This allows new clients to bootstrap into Gossamer, and provides an easy way for users to exchange Gossamer identities to connect to one another later. (Clever readers will note that the address list is actually independent of the identity list.) Gossip \u00b6 Each Gossamer node maintains a pair of \u201cfreshness\u201d databases, associating some information with a freshness score (expressed as an integer). One freshness database holds the addresses of known Gossamer nodes, and another holds Gossamer messages. Whenever two Gossamer nodes interact, each sends the other a Gossamer node from its current node database, and a message from its message database. When selecting an item to send for either category, Gossamer uses a random selection that weights towards items with a higher \u201cfreshness\u201d score. ( TODO : how?) When sending a fact, if the receiving node already knows the fact, both nodes decrement that fact's freshness by one. If the receiving node does not already know the fact, the sending node leaves its freshness unaltered, and the receiving node sets its freshness to the freshest possible value. This system encourages nodes to exchange \u201cfresh\u201d facts, then cease exchanging them as the network becomes aware of them. During each exchange, Gossamer nodes send each other one Gossamer node address, and one Gossamer message. Both nodes adjust their freshness databases, as above. If fact exchange fails while communicating with a Gossamer node, both nodes decrement their peer's freshness. Unreliable nodes can continue to initiate connections to other nodes, but will rarely be contacted by other Gossamer nodes. TODO : How do we avoid DDOSing brand-new gossamer nodes with the full might of Gossamer's network? TODO : Can we reuse Bittorrent's DHT system (BEP-5) to avoid having every node know the full network topology? 
TODO : Are node-to-node exchanges encrypted? If so, why and how? Authenticity \u00b6 Gossamer node addresses are not authenticated. Gossamer relies on freshness to avoid delivering excess traffic to systems not participating in the Gossamer network. ( TODO : this is a shit system for avoiding DDOS, though.) Gossamer messages are partially authenticated: each carries with it a public key, and a signature. If the signature cannot be verified with the included public key, it must be discarded immediately and it must not be propagated to other nodes. The node delivering the message may also be penalized by having its freshness reduced in the receiving node's database. Gossip Triggers \u00b6 Gossamer triggers a new Gossip exchange under the following circumstances: 15 seconds, plus a random jitter between zero and 15 more seconds, elapse since the last exchange attempt. Gossamer completes an exchange wherein it learned a new fact from another node. A user injects a fact into Gossamer directly. Gossamer exchanges that fail, or that deliver only already-known facts, do not trigger further exchanges immediately. TODO : how do we prevent Gossamer from attempting to start an unbounded number of exchanges at the same time? Size \u00b6 Gossamer must not exhaust the user's disk. Gossamer discards extremely un-fresh messages, attempting to keep the on-disk size of the message database to under 10% of the total local storage, or under a user-configurable threshold. Gossamer rejects over-large messages. Public messages carry with them the author's profile and a potentially large collection of verifications. Messages over some size ( TODO what size?) are discarded on receipt without being stored, and the message exchange is considered to have failed.","title":"Gossamer: A Decentralized Status-Sharing Network"},{"location":"gossamer/#gossamer-a-decentralized-status-sharing-network","text":"Twitter's pretty great. The short format encourages brief, pithy remarks, and the default assumption of visibility makes it super easy to pitch in on a conversation, or to find new people to listen to. Unfortunately, Twitter is a centralized system: one Bay-area company in the United States controls and mediates all Twitter interactions. From all appearances, Twitter, Inc. is relatively benign, as social media corporations go. There are few reports of censorship, and while their response to abuse of the Twitter network has not been consistently awesome, they can be made to listen. However, there exists the capacity for Twitter, Inc. to subvert the entire Twitter system, either voluntarily or at the behest of governments around the world. (Just ask Turkish people. Or the participants in the Arab Spring.) Gossamer is a Twitter-alike system, designed from the ground up to have no central authority. It resists censorship, enables individual participants to control their own data, and allows anyone at all to integrate new software into the Gossamer network. Gossamer does not exist, but if it did, the following notes describe what it might look like, and the factors to consider when implementing Gossamer as software. I have made fatal mistakes while writing it; I have not rushed to build it specifically because Twitter, Gossamer's model, is so deeply woven into so many peoples' lives. A successor must make fewer mistakes, not merely different mistakes, and certainly not more mistakes. 
The following is loosely inspired by Rumor Monger , at \u201cwhole world\u201d scale.","title":"Gossamer: A Decentralized Status-Sharing Network"},{"location":"gossamer/#design-goals","text":"Users must be in control of their own privacy and identity at all times. (This is a major failing with Diaspora, which limits access to personal ownership of data by being hard to run.) Users must be able to communicate without the consent or support of an intermediate authority. Short of being completely offline, Gossamer should be resilient to infrastructural damage. Any functional communication system will be used for illicit purposes. This is an unavoidable consequence of being usable for legitimate purposes without a central authority. Rather than revealing illicit conversations, Gossamer should do what it can to preserve the anonymity and privacy of legitimate ones. All nodes are as equal as possible. The node I use is not more authoritative for messages from me than any other node. You can hear my words from anyone who has heard my words, and I can hear yours from anyone who has heard your words, so long as some variety of authenticity and privacy are maintained. If an identity's secrets are removed, a node should contain no data that correlates the owner with his or her Gossamer identities. Relaying and authoring must be as indistinguishable as possible, to limit the utility of traffic analysis.","title":"Design Goals"},{"location":"gossamer/#public-and-private-information","text":"Every piece of data Gossamer uses, either internally or to communicate with other ndoes, is classified as either public or private . Public information can be communicated to other nodes, and is assumed to be safe if recovered out of band. Private information includes anything which may be used to associate a Gossamer identity with the person who controls it, except as noted below. Gossamer must ensure users understand what information that they provide will be made public, and what will be kept private, so that they can better decide what, if anything, to share and so that they can better make decisions about their own safety and comfort against abusive parties. Internally, Gossamer always stores private information encrypted, and never transmits it to another node. Gossamer must provide a tool to safely obliterate private data.","title":"Public and Private Information"},{"location":"gossamer/#public-information","text":"Details on the role of each piece of information are covered below. Public status updates, obviously. Gossamer exists to permit users to easily share short messages with one another. The opaque form of a user's incoming and outgoing private messages. The users' identities' public keys. (But not their relationship to one another.) Any information the user places in their profile. (This implies that profiles must not be auto-populated from, for example, the user's address book.) The set of identities verified by the user's identity. Any other information Gossamer retains must be private.","title":"Public Information"},{"location":"gossamer/#republishing","text":"Gossamer is built on the assumption that every participant is willing to act as a relay for every other participant. This is a complicated assumption at the human layer. Inevitably, someone will use the Gossamer network to communicate something morally repugnant or deeply illegal: the Silk Road guy, for example, got done for trying to contract someone to commit murder. 
Every Gossamer node is complicit in delivering those messages to the rest of the network, whether they're in the clear (status updates) or not (private messages). It's unclear how this interacts with the various legal frameworks, moral codes, and other social constructs throughout the world, and it's ethically troubling to put users in that position by default. The strong alternative, that each node only relay content with the controlling user's explicit and ongoing consent, is also troubling: it limits the Gossamer network's ability to deliver messages at all , and exposes information about which identities each node's owner considers interesting and publishable. I don't have an obvious resolution to this. Gossamer's underlying protocol relies on randomly-selected nodes being more likely to propagate a message than to ignore it, because this helps make Gossamer resilient to hostile users, nosy intelligence agencies, and others who believe communication must be restrictable. On the other hand, I'd like not to put a user in Taiwan at risk of legal or social reprisals because a total stranger in Canada decided to post something vile. (This is one of the reasons I haven't built the damn thing yet. Besides being A Lot Of Code, there's no way to shut off Gossamer once more than one node exists, and I want to be sure I've thought through what I'm doing before creating a prototype.)","title":"Republishing"},{"location":"gossamer/#identity-in-the-gossamer-network","text":"Every Gossamer message carries with it an identity . Gossamer identities are backed by public-key cryptography. However, unlike traditional public key systems such as GPG, Gossamer identities provide continuity , rather than authenticity : two Gossamer messages signed by the same key are from the same identity, but there is no inherent guarantee that that identity is legitimate. Gossamer maintains relationships between identities to allow users to verify the identities of one another, and to publish attestations of that to other Gossamer nodes. From this, Gossamer can recover much of GPG's \u201cweb of trust.\u201d TODO : revocation of identities, revocation of verifications. Both are important; novice users are likely to verify people poorly, and there should be a recovery path less drastic than GPG's \u201cyou swore it, you're stuck with it\u201d model. Gossamer encourages users to create additional identities as needed to, for example, support the separation of work and home conversations, or to provide anonymity when discussing reputationally-hazardous topics. Identities are not correlated by the Gossamer codebase. Each identity can optionally include a profile : a block of data describing the person behind the identity. The contents of a profile are chosen by the person holding the private key for an identity, and the profile is attached to every new message created with the corresponding identity. A user can update their profile at will; potentially, every message can be sent with a distinct profile. Gossamer software treats the profile it's seen with the highest timestamp as authoritative, retroactively applying it to old messages.","title":"Identity in the Gossamer Network"},{"location":"gossamer/#multiple-devices-and-key-security","text":"A Gossamer identity is entirely contained in its private key. An identity's key must be stored safely, either using the host operating system's key management facilities or using a carefully-designed key store. 
Keys must not hit long-term storage unprotected; this may involve careful integration with the underlying OS's memory management facilities to avoid, eg., placing identities in swap. This is necessary to protect users from having their identities recovered against their will via, for example, hard drive forensics. Gossamer allows keys to be exported into password-encrypted archive files, which can be loaded into other Gossamer applications to allow them to share the same identity. GOSSAMER MUST TREAT THESE FILES WITH EXTREME CARE, BECAUSE USERS PROBABLY WON'T . Identity keys protect the user's Gossamer identity, but they also protect the user's private messages (see below) and other potentially identifying data. The export format must be designed to be as resilient as possible, and Gossamer's software must take care to ensure that \u201cused\u201d identity files are automatically destroyed safely wherever possible and to discourage users from following practices that weaken their own safety unknowingly. Exported identity files are intrinsically vulnerable to offline brute-force attacks; once obtained, an attacker can try any of the worryingly common passwords at will, and can easily validate a password by using the recovered keys to regenerate some known fact about the original, such as a verification or a message signature. This implies that exported identities must use a key derivation system which has a high computational cost and which is believed to be resilient to, for example, GPU-accelerated cracking. Secure deletion is a Hard Problem; where possible, Gossamer must use operating system-provided facilities for securely destroying files.","title":"Multiple Devices and Key Security"},{"location":"gossamer/#status-messages","text":"Status messages are messages visible to any interested Gossamer users. These are the primary purpose of Gossamer. Each contains up to 140 Unicode characters, a markup section allowing Gossamer to attach URLs and metadata (including Gossamer locators) to the text, and an attachments section carrying arbitrary MIME blobs of limited total size. All three sections are canonicalized ( TODO : how?) and signed by the publishing identity's private key. The public key, the identity's most recent profile, and the signed status message are combined into a single Gossamer message and injected into the user's Gossamer node exactly as if it had arrived from another node. Each Gossamer node maintains a follow list of identities whose messages the user is interested in seeing. When Gossamer receives a novel status message during a gossip exchange, it displays it to the user if and only if its identity is on the node's follow list. Otherwise, the message is not displayed, but will be shared onwards with other nodes. In this way, every Gossamer node acts as a relay for every other Gossamer node. If Gossamer receives a message signed by an identity it has seen attestations for, it attaches those attestations to the message before delivering them onwards. In this way, users' verifications of one another's identity spread through the network organically.","title":"Status Messages"},{"location":"gossamer/#private-messages","text":"Gossamer can optionally encrypt messages, allowing users to send one another private messages. These messages are carried over the Gossamer network as normal, but only nodes holding the appropriate identity key can decrypt them and display them to the user. (At any given time, most Gossamer nodes hold many private messages they cannot decrypt.) 
Private messages do not carry the author's identity or full profile in the clear. The author's bare identity is included in the encrypted part of the message, to allow the intended recipient to identify the sender. TODO : sign-then-encrypt, or encrypt-then-sign? If sign-then-encrypt, are private messages exempted from the \u201cdrop broken messages\u201d rule above?","title":"Private Messages"},{"location":"gossamer/#following-users","text":"Each Gossamer node maintains a database of followed identities. (This may or may not include the owner's own identity.) Any message stored in the node published by an identity in this database will be shown to the user in a timeline-esque view. Gossamer's follow list is purely local , and is not shared between nodes even if they have identities in common. The follow list is additionally stored encrypted using the node's identities (any one identity is sufficient to recover the list), to ensure that the follow list is not easily available to others without the node owner's permission. Exercises such as Finding Paul Revere have shown that the collection of graph edges showing who communicates with whom can often be sufficient to map identities into people. Gossamer attempts to restrict access to this data, believing it is not the network's place to know who follows who.","title":"Following Users"},{"location":"gossamer/#verified-identities","text":"Gossamer allows identities to sign one anothers' public keys. These signatures form verifications . Gossamer considers an identity verified if any of the following hold: Gossamer has access to the identity key for the identity itself. Gossamer has access to the identity key for at least one of the identity's verifications. The identity is signed by at least three (todo: or however many, I didn't do the arithmetic yet) verified identities. Verified identities are marked in the user interface to make it obvious to the user whether a message is from a known friend or from an unknown identity. Gossamer allows users to sign new verifications for any identity they have seen. These verifications are initially stored locally, but will be published as messages transit the node as described below. Verification is a public fact: everyone can see which identities have verified which other identities. This is a potentially very powerful tool for reassociating identities with real-world people; Gossamer must make this clear to users. (I'm pretty sure you could find me, personally, just by watching whose identities I verify.) Each Gossamer node maintains a database of every verification it has ever seen or generated. If the node receives a message from an identity that appears in the verification database, and if the message is under some total size, Gossamer appends verifications from its database to the message before reinjecting it into the network. This allows verifications to propagate through","title":"Verified Identities"},{"location":"gossamer/#blocking-users","text":"Any social network will attract hostile users who wish to disrupt the network or abuse its participants. Users must be able to filter out these users, and must not provide too much feedback to blocked users that could otherwise be used to circumvent blocks. Each Gossamer node maintains a database of blocked identities. Any message from an identity in this database, or from an identity that is verified by three or more identities in this database, will automatically be filtered out from display. 
(Additionally, transitively-blocked users will automatically be added to the block database. Blocking is contagious.) ( TODO : should Gossamer drop blocked messages? How does that interact with the inevitable \u201cshared blocklist\u201d systems that arise in any social network?) As with the follow list, the block database is encrypted using the node's identities. Gossamer encourages users to create new identities as often as they see fit and attempts to separate identities from one another as much as possible. This is fundamentally incompatible with strong blocking. It will always be possible for a newly-created identity to deliver at least one message before being blocked. This is a major design problem ; advice encouraged.","title":"Blocking Users"},{"location":"gossamer/#gossamer-network-primitives","text":"The Gossamer network is built around a gossip protocol, wherein nodes connect to one another periodically to exchange messages with one another. Connections occur over the existing IP internet infrastructure, traversing NAT networks where possible to ensure that users on residential and corporate networks can still participate. Gossamer bootstraps its network using a number of paths: Gossamer nodes in the same broadcast domain discover one another using UDP broadcasts as well as Bonjour/mDNS. Gossamer can generate locator strings, which can be shared \u201cout of band\u201d via email, SMS messages, Twitter, graffiti, etc. Gossamer nodes share knowledge of nodes whenever they exchange messages, to allow the Gossamer network to recover from lost nodes and to permit nodes to remain on the network as \u201cknown\u201d nodes are lost to outages and entropy.","title":"Gossamer Network Primitives"},{"location":"gossamer/#locators","text":"A Gossamer locator is a URL in the g scheme, carrying an encoding of one or more network addresses as well as an encoding of one or more identities (see below). Gossamer's software attempts to determine an appropriate identifier for any identities it holds based on the host computer's network configuration, taking into account issues like NAT traversal wherever possible. TODO : Gossamer and uPNP, what do locators look like? When presented with an identifier, Gossamer offers to follow the identities it contains, and uses the nodes whose addresses it contains to connect to the Gossamer network. This allows new clients to bootstrap into Gossamer, and provides an easy way for users to exchange Gossamer identities to connect to one another later. (Clever readers will note that the address list is actually independent of the identity list.)","title":"Locators"},{"location":"gossamer/#gossip","text":"Each Gossamer node maintains a pair of \u201cfreshness\u201d databases, associating some information with a freshness score (expressed as an integer). One freshness database holds the addresses of known Gossamer nodes, and another holds Gossamer messages. Whenever two Gossamer nodes interact, each sends the other a Gossamer node from its current node database, and a message from its message database. When selecting an item to send for either category, Gossamer uses a random selection that weights towards items with a higher \u201cfreshness\u201d score. ( TODO : how?) When sending a fact, if the receiving node already knows the fact, both nodes decrement that fact's freshness by one. If the receiving node does not already know the fact, the sending node leaves its freshness unaltered, and the receiving node sets its freshness to the freshest possible value. 
This system encourages nodes to exchange \u201cfresh\u201d facts, then cease exchanging them as the network becomes aware of them. During each exchange, Gossamer nodes send each other one Gossamer node address, and one Gossamer message. Both nodes adjust their freshness databases, as above. If fact exchange fails while communicating with a Gossamer node, both nodes decrement their peer's freshness. Unreliable nodes can continue to initiate connections to other nodes, but will rarely be contacted by other Gossamer nodes. TODO : How do we avoid DDOSing brand-new gossamer nodes with the full might of Gossamer's network? TODO : Can we reuse Bittorrent's DHT system (BEP-5) to avoid having every node know the full network topology? TODO : Are node-to-node exchanges encrypted? If so, why and how?","title":"Gossip"},{"location":"gossamer/#authenticity","text":"Gossamer node addresses are not authenticated. Gossamer relies on freshness to avoid delivering excess traffic to systems not participating in the Gossamer network. ( TODO : this is a shit system for avoiding DDOS, though.) Gossamer messages are partially authenticated: each carries with it a public key, and a signature. If the signature cannot be verified with the included public key, it must be discarded immediately and it must not be propagated to other nodes. The node delivering the message may also be penalized by having its freshness reduced in the receiving node's database.","title":"Authenticity"},{"location":"gossamer/#gossip-triggers","text":"Gossamer triggers a new Gossip exchange under the following circumstances: 15 seconds, plus a random jitter between zero and 15 more seconds, elapse since the last exchange attempt. Gossamer completes an exchange wherein it learned a new fact from another node. A user injects a fact into Gossamer directly. Gossamer exchanges that fail, or that deliver only already-known facts, do not trigger further exchanges immediately. TODO : how do we prevent Gossamer from attempting to start an unbounded number of exchanges at the same time?","title":"Gossip Triggers"},{"location":"gossamer/#size","text":"Gossamer must not exhaust the user's disk. Gossamer discards extremely un-fresh messages, attempting to keep the on-disk size of the message database to under 10% of the total local storage, or under a user-configurable threshold. Gossamer rejects over-large messages. Public messages carry with them the author's profile and a potentially large collection of verifications. Messages over some size ( TODO what size?) are discarded on receipt without being stored, and the message exchange is considered to have failed.","title":"Size"},{"location":"gossamer/coda/","text":"A Coda \u00b6 Kit : How would you make a site where the server operator can't get at a user's data, and given handling complaints and the fact that people can still screen cap receipts etc, would you? Is it a valuable goal? 
Owen : That's what torpedoed my interest in developing gossamer further, honestly meg laid out an abuse case so dismal that I consider the whole concept compromised centralizing the service a little - mastodon-ishly, say - improves the situation a bit, but if they can't get at their users' data their options are limited I think secrecy and republication resilience are kind of non-goals, and the lesson I took is that accountability (and thus locality and continuity of identity) are way more important specifically accountability between community members, not accountability to the operator or to the state","title":"A Coda"},{"location":"gossamer/coda/#a-coda","text":"Kit : How would you make a site where the server operator can't get at a user's data, and given handling complaints and the fact that people can still screen cap receipts etc, would you? Is it a valuable goal? Owen : That's what torpedoed my interest in developing gossamer further, honestly meg laid out an abuse case so dismal that I consider the whole concept compromised centralizing the service a little - mastodon-ishly, say - improves the situation a bit, but if they can't get at their users' data their options are limited I think secrecy and republication resilience are kind of non-goals, and the lesson I took is that accountability (and thus locality and continuity of identity) are way more important specifically accountability between community members, not accountability to the operator or to the state","title":"A Coda"},{"location":"gossamer/mistakes/","text":"Design Mistakes \u00b6 Is Gossamer Up? \u00b6 @megtastique points out that two factors doom the whole design: There's no way to remove content from Gossamer once it's published, and Gossamer can anonymously share images. Combined, these make Gossamer the perfect vehicle for revenge porn and other gendered, sexually-loaded network abuse. This alone is enough to doom the design, as written: even restricting the size of messages to the single kilobyte range still makes it trivial to irrevocably disseminate links to similar content. Protected Feeds? Who Needs Those? \u00b6 Gossamer's design does not carry forward an important Twitter feature: the protected feed. In brief, protected feeds allow people to be choosy about who reads their status updates, without necessarily having to pick and choose who gets to read them on a message by message basis. This is an important privacy control for people who wish to engage with people they know without necessarily disclosing their whereabouts and activities to the world at large. In particular, it's important to vulnerable people because it allows them to create their own safe spaces. Protected feeds are not mere technology, either. Protected feeds carry with them social expectations: Twitter clients often either refuse to copy text from a protected feed, or present a warning when the user tries to copy text, which acts as a very cheap and, apparently, quite effective brake on the casual re-sharing that Twitter encourages for public feeds. DDOS As A Service \u00b6 Gossamer's network protocol converges towards a total graph, where every node knows how to connect to every other node, and new information (new posts) rapidly push out to every single node. If you've ever been privy to the Twitter \u201cfirehose\u201d feed, you'll understand why this is a drastic mistake. Even a moderately successful social network sees on the order of millions of messages a day. 
Delivering all of this directly to every node all of the time would rapidly drown users in bandwidth charges and render their internet connections completely unusable. Gossamer's design also has no concept of \u201cquiet\u201d periods: every fifteen to thirty seconds, rain or shine, every node is supposed to wake up and exchange data with some other node, regardless of how long it's been since either node in the exchange has seen new data. This very effectively ensures that Gossamer will continue to flood nodes with traffic at all times; the only way to halt the flood is to shut off the Gossamer client. Passive Nodes Matter \u00b6 It's impractical to run an inbound data service on a mobile device. Mobile devices are, by and large, not addressable or reachable by the internet at large. Mobile devices also provide a huge proportion of Twitter's content: the ability to rapidly post photos, location tags, and short text while away from desks, laptops, and formal internet connections is a huge boon for ad-hoc social organization. You can invite someone to the pub from your phone, from in front of the pub. (This interacts ... poorly with the DDOS point, above.) Traffic Analysis \u00b6 When a user enters a new status update or sends a new private message, their Gossamer node immediately forwards it to at least one other node to inject it into the network. This makes unencrypted Gossamer relatively vulnerable to traffic analysis for correlating Gossamer identities with human beings. Someone at a network \u201cpinch point\u201d -- an ISP, or a coffee shop wifi router -- can monitor Gossamer traffic entering and exiting nodes on their network and easily identify which nodes originated which messages, and thus which nodes have access to which identities. This seriously compromises the effectiveness of Gossamer's decentralized, self-certifying identities.","title":"Design Mistakes"},{"location":"gossamer/mistakes/#design-mistakes","text":"","title":"Design Mistakes"},{"location":"gossamer/mistakes/#is-gossamer-up","text":"@megtastique points out that two factors doom the whole design: There's no way to remove content from Gossamer once it's published, and Gossamer can anonymously share images. Combined, these make Gossamer the perfect vehicle for revenge porn and other gendered, sexually-loaded network abuse. This alone is enough to doom the design, as written: even restricting the size of messages to the single kilobyte range still makes it trivial to irrevocably disseminate links to similar content.","title":"Is Gossamer Up?"},{"location":"gossamer/mistakes/#protected-feeds-who-needs-those","text":"Gossamer's design does not carry forward an important Twitter feature: the protected feed. In brief, protected feeds allow people to be choosy about who reads their status updates, without necessarily having to pick and choose who gets to read them on a message by message basis. This is an important privacy control for people who wish to engage with people they know without necessarily disclosing their whereabouts and activities to the world at large. In particular, it's important to vulnerable people because it allows them to create their own safe spaces. Protected feeds are not mere technology, either. 
Protected feeds carry with them social expectations: Twitter clients often either refuse to copy text from a protected feed, or present a warning when the user tries to copy text, which acts as a very cheap and, apparently, quite effective brake on the casual re-sharing that Twitter encourages for public feeds.","title":"Protected Feeds? Who Needs Those?"},{"location":"gossamer/mistakes/#ddos-as-a-service","text":"Gossamer's network protocol converges towards a total graph, where every node knows how to connect to every other node, and new information (new posts) rapidly push out to every single node. If you've ever been privy to the Twitter \u201cfirehose\u201d feed, you'll understand why this is a drastic mistake. Even a moderately successful social network sees on the order of millions of messages a day. Delivering all of this directly to every node all of the time would rapidly drown users in bandwidth charges and render their internet connections completely unusable. Gossamer's design also has no concept of \u201cquiet\u201d periods: every fifteen to thirty seconds, rain or shine, every node is supposed to wake up and exchange data with some other node, regardless of how long it's been since either node in the exchange has seen new data. This very effectively ensures that Gossamer will continue to flood nodes with traffic at all times; the only way to halt the flood is to shut off the Gossamer client.","title":"DDOS As A Service"},{"location":"gossamer/mistakes/#passive-nodes-matter","text":"It's impractical to run an inbound data service on a mobile device. Mobile devices are, by and large, not addressable or reachable by the internet at large. Mobile devices also provide a huge proportion of Twitter's content: the ability to rapidly post photos, location tags, and short text while away from desks, laptops, and formal internet connections is a huge boon for ad-hoc social organization. You can invite someone to the pub from your phone, from in front of the pub. (This interacts ... poorly with the DDOS point, above.)","title":"Passive Nodes Matter"},{"location":"gossamer/mistakes/#traffic-analysis","text":"When a user enters a new status update or sends a new private message, their Gossamer node immediately forwards it to at least one other node to inject it into the network. This makes unencrypted Gossamer relatively vulnerable to traffic analysis for correlating Gossamer identities with human beings. Someone at a network \u201cpinch point\u201d -- an ISP, or a coffee shop wifi router -- can monitor Gossamer traffic entering and exiting nodes on their network and easily identify which nodes originated which messages, and thus which nodes have access to which identities. This seriously compromises the effectiveness of Gossamer's decentralized, self-certifying identities.","title":"Traffic Analysis"},{"location":"mysql/choose-something-else/","text":"Do Not Pass This Way Again \u00b6 Warning I wrote this article in 2013, in what amounts to a fit of pique, and never revisited it. Much of this information is outdated, and you rely on it at your own risk. I restored it at the request of a reader . The tone and structure of this article also reflects an angrier and much less understanding person than the one I try to be today. Don't let my anger be your cudgel. Considering MySQL? Use something else. Already on MySQL? Migrate. 
For every successful project built on MySQL, you could uncover a history of time wasted mitigating MySQL's inadequacies, masked by a hard-won, but meaningless, sense of accomplishment over the effort spent making MySQL behave. Thesis: databases fill roles ranging from pure storage to complex and interesting data processing; MySQL is differently bad at both tasks. Real apps all fall somewhere between these poles, and suffer variably from both sets of MySQL flaws. MySQL is bad at storage . MySQL is bad at data processing . MySQL is bad by design . Bad arguments for using MySQL. Much of this is inspired by the principles behind PHP: A Fractal of Bad Design . I suggest reading that article too -- it's got a lot of good thought in it even if you already know to stay well away from PHP. (If that article offends you, well, this page probably will too.) Storage \u00b6 Storage systems have four properties: Take and store data they receive from applications. Keep that data safe against loss or accidental change. Provide stored data to applications on demand. Give administrators effective management tools. In a truly \u201cpure\u201d storage application, data-comprehension features (constraints and relationships, nontrivial functions and aggregates) would go totally unused. There is a time and a place for this: the return of \u201cNoSQL\u201d storage systems attests to that. Pure storage systems tend to be closely coupled to their \u201cmain\u201d application: consider most web/server app databases. \u201cSecondary\u201d clients tend to be read-only (reporting applications, monitoring) or to be utilities in service of the main application (migration tools, documentation tools). If you believe constraints, validity checks, and other comprehension features can be implemented in \u201cthe application,\u201d you are probably thinking of databases close to this pole. Storing Data \u00b6 MySQL has many edge cases which reduce the predictability of its behaviour when storing information. Most of these edge cases are documented, but violate the principle of least surprise (not to mention the expectations of users familiar with other SQL implementations). Implicit conversions (particularly to and from string types) can modify MySQL's behaviour. Many implicit conversions are also silent (no warning, no diagnostic), by design, making it more likely developers are entirely unaware of them until one does something surprising. Conversions that violate basic constraints (range, length) of the output type often coerce data rather than failing. Sometimes this raises a warning; does your app check for those? This behaviour is unlike many typed systems (but closely like PHP and remotely like Perl). Conversion behaviour depends on a per-connection configuration value ( sql_mode ) that has a large constellation of possible states , making it harder to carry expectations from manual testing over to code or from tool to tool. MySQL recommends UTF-8 as a character-set, but still defaults to Latin-1. The implementation of utf8 up until MySQL 5.5 covered only the 3-byte BMP . MySQL 5.5 and beyond supports a 4-byte UTF-8 encoding, but, confusingly, it must be selected with the character-set utf8mb4 . Implementation details of these encodings within MySQL, such as the utf8 3-byte limit, tend to leak out into client applications. Data that does not fit MySQL's understanding of the storage encoding will be transformed until it does, by truncation or replacement, by default.
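To make the encoding mess above concrete, here's a minimal sketch of opting into the real 4-byte encoding explicitly instead of trusting the defaults; the table and column names are hypothetical, and it assumes a MySQL 5.5-or-later server as described above:

```sql
-- Hypothetical table: spell out utf8mb4 at both the table and column
-- level rather than inheriting the server or database defaults.
CREATE TABLE post (
    postid INTEGER AUTO_INCREMENT PRIMARY KEY,
    body   TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
) DEFAULT CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

-- The connection encoding matters too, or 4-byte characters can still
-- be mangled in transit.
SET NAMES utf8mb4;
```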
Collation support is per-encoding, with one of the stranger default configurations: by default, the collation orders characters according to Swedish alphabetization rules, case-insensitively. Since it's the default, lots of folks who don't know the manual inside-out and backwards observe MySQL's case-insensitive collation behaviour ( 'a' = 'A' ) and conclude that \u201cMySQL is case-insensitive,\u201d complicating any effort to use a case-sensitive locale. Both the encoding and the collation can vary, independently, by column . Do you keep your schema definition open when you write queries to watch out for this sort of shit? The TIMESTAMP type tries to do something smart by storing values in a canonical timezone (UTC), but it's done with so few affordances that it's very hard to even tell that MySQL's done a right thing with your data. And even after that, the result of foo < '2012-04-01 09:00:00' still depends on what time of year it is when you evaluate the query, unless you're very careful with your connection timezone. TIMESTAMP is also special-cased in MySQL's schema definition handling, making it easy to accidentally create (or to accidentally fail to create) an auto-updating field when you didn't (did) want one. DATETIME does not get the same timezone handling TIMESTAMP does. What? And you can't provide your own without resorting to hacks like extra columns. Oh, did you want to use MySQL's timezone support? Too bad, none of that data's loaded by default. You have to process the OS's tzinfo files into SQL with a separate tool and import that. If you ever want to update MySQL's timezone settings later, you need to take the server down just to make sure the changes apply. Preserving Data \u00b6 ... against unexpected changes: like most disk-backed storage systems, MySQL is as reliable as the disks and filesystems its data lives on. MySQL provides no additional functionality in terms of mirroring or hardware failure tolerance (such as Oracle ASM ). However, this is a limitation shared with many, many other systems. When using the InnoDB storage engine (default since MySQL 5.5), MySQL maintains page checksums in order to detect corruption caused by underlying storage. However, many third-party software applications, as well as users upgrading from earlier versions of MySQL, may be using MyISAM, which will frequently corrupt data files on improper shutdown. The implicit conversion rules that bite when storing data also bite when asking MySQL to modify data - my favourite example being a fat-fingered UPDATE query where a mistyped = (as - , off by a single key) caused 90% of the rows in the table to be affected, instead of one row, because of implicit string-to-integer conversions. ... against loss: hoo boy. MySQL, out of the box, gives you three approaches to backups : Take \u201cblind\u201d filesystem backups with tar or rsync . Unless you meticulously lock tables or make the database read-only for the duration, this produces a backup that requires crash recovery before it will be usable, and can produce an inconsistent database. This can bite quite hard if you use InnoDB, as InnoDB crash recovery takes time proportional to both the number of InnoDB tables and the total size of InnoDB tables, with a large constant. Dump to SQL with mysqldump : slow, relatively large backups, and non-incremental. Archive binary logs: fragile, complex, over-configurable, and configured badly by default. (Binary logging is also the basis of MySQL's replication system.)
If neither of these are sufficient, you're left with purchasing a backup tool from Oracle or from one of the third-party MySQL vendors. Like many of MySQL's features, the binary logging feature is too configurable , while still, somehow, defaulting to modes that are hazardous or surprising: the default behaviour is to log SQL statements, rather than logging their side effects. This has lead to numerous bugs over the years; MySQL (now) makes an effort to make common \u201cnon-deterministic\u201d cases such as NOW() and RANDOM() act deterministically but these have been addressed using ad-hoc solutions. Restoring binary-log-based backups can easily lead to data that differs from the original system, and by the time you've noticed the problem, it's too late to do anything about it. (Seriously. The binary log entries for each statement contain the \u201ccurrent\u201d time on the master and the random seed at the start of the statement, just in case. If your non-deterministic query uses any other function, you're still fucked by default .) Additionally, a number of apparently-harmless features can lead to backups or replicas wandering out of sync with the original database, in the default configuration: AUTO_INCREMENT and UPDATE statements. AUTO_INCREMENT and INSERT statements (sometimes). SURPRISE. Triggers. User-defined (native) functions. Stored (procedural SQL) functions. DELETE ... LIMIT and UPDATE ... LIMIT statements, though if you use these, you've misunderstood how SQL is supposed to work. INSERT ... ON DUPLICATE KEY UPDATE statements. Bulk-loading data with LOAD DATA statements. Operations on floating-point values . Retrieving Data \u00b6 This mostly works as expected. Most of the ways MySQL will screw you happen when you store data, not when you retrieve it. However, there are a few things that implicitly transform stored data before returning it: MySQL's surreal type conversion system works the same way during SELECT that it works during other operations, which can lead to queries matching unexpected rows: owen@scratch> CREATE TABLE account ( -> accountid INTEGER -> AUTO_INCREMENT -> PRIMARY KEY, -> discountid INTEGER -> ); Query OK, 0 rows affected (0.54 sec) owen@scratch> INSERT INTO account -> (discountid) -> VALUES -> (0), -> (1), -> (2); Query OK, 3 rows affected (0.03 sec) Records: 3 Duplicates: 0 Warnings: 0 owen@scratch> SELECT * -> FROM account -> WHERE discountid = 'banana'; +-----------+------------+ | accountid | discountid | +-----------+------------+ | 1 | 0 | +-----------+------------+ 1 row in set, 1 warning (0.05 sec) Ok, unexpected, but there's at least a warning (do your apps check for those?) - let's see what it says: owen@scratch> SHOW WARNINGS; +---------+------+--------------------------------------------+ | Level | Code | Message | +---------+------+--------------------------------------------+ | Warning | 1292 | Truncated incorrect DOUBLE value: 'banana' | +---------+------+--------------------------------------------+ 1 row in set (0.03 sec) I can count on one hand the number of DOUBLE columns in this example and still have five fingers left over. You might think this is an unreasonable example: maybe you should always make sure your argument types exactly match the field types, and the query should use 57 instead of 'banana' . (This does actually \u201cfix\u201d the problem.) It's unrealistic to expect every single user to run SHOW CREATE TABLE before every single query, or to memorize the types of every column in your schema, though. 
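If you're stuck with this behaviour, the least you can do is keep the comparison's type explicit. A minimal sketch, reusing the account table from the example above (57 is just a stand-in value):

```sql
-- Compare against a value of the column's declared type instead of
-- leaning on MySQL's implicit string-to-number coercion.
SELECT *
FROM account
WHERE discountid = 57;

-- If the value arrives as a string, convert it deliberately so the
-- intent is at least visible in the query.
SELECT *
FROM account
WHERE discountid = CAST('57' AS SIGNED);
```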
This example derived from a technically-skilled but MySQL-ignorant tester examining MySQL data to verify some behavioural changes in an app. Actually, you don't even need a table for this: SELECT 0 = 'banana' returns 1 . Did the PHP folks design MySQL's = operator? This isn't affected by sql_mode , even though so many other things are. TIMESTAMP columns (and only TIMESTAMP columns) can return apparently-differing values for the same stored value depending on per-connection configuration even during read-only operation. This is done silently and the default behaviour can change as a side effect of non-MySQL configuration changes in the underlying OS. String-typed columns are transformed for encoding on output if the connection is not using the same encoding as the underlying storage, using the same rules as the transformation on input. Values that stricter sql_mode settings would reject during storage can still be returned during retrieval; it is impossible to predict in advance whether such data exists, since clients are free to set sql_mode to any value at any time. Efficiency \u00b6 For purely store-and-retrieve applications, MySQL's query planner (which transforms the miniature program contained in each SQL statement into a tree of disk access and data manipulation steps) is sufficient, but only barely. Queries that retrieve data from one table, or from one table and a small number of one-to-maybe-one related tables, produce relatively efficient plans. MySQL, however, offers a number of tuning options that can have dramatic and counterintuitive effects, and the documentation provides very little advice for choosing settings. Tuning relies on the administrator's personal experience, blog articles of varying quality, and consultants. The MySQL query cache defaults to a non-zero size in some commonly-installed configurations. However, the larger the cache, the slower writes proceed: invalidating cache entries that include the tables modified by a query means considering every entry in the cache. This cache also uses MySQL's LRU implementation, which has its own performance problems during eviction that get worse with larger cache sizes. Memory-management settings, including key_buffer_size and innodb_buffer_pool_size , have non-linear relationships with performance. The standard advice advises making whichever value you care about more to a large value, but this can be counterproductive if the related data is larger than the pool can hold: MySQL is once again bad at discarding old buffer pages when the buffer is exhausted, leading to dramatic slowdowns when query load reaches a certain point. This also affects filesystem tuning settings such as table_open_cache . InnoDB, out of the box, comes configured to use one large (and automatically growing) tablespace file for all tables, complicating backups and storage management. This is fine for trivial databases, but MySQL provides no tools (aside from DROP TABLE and reloading the data from an SQL dump) for transplanting a table to another tablespace, and provides no tools (aside from a filesystem-level rm , and reloading all InnoDB data from an SQL dump) for reclaiming empty space in a tablespace file. MySQL itself provides very few tools to manage storage; tasks like storing large or infrequently-accessed tables and databases on dedicated filesystems must be done on the filesystem, with MySQL shut down. 
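To make the knobs above concrete, here's a sketch of inspecting and adjusting them; the variable names are standard MySQL/InnoDB settings for servers of this era, the right values depend entirely on your workload, and innodb_file_per_table only affects tables created or rebuilt after it's enabled:

```sql
-- Inspect the tuning settings discussed above.
SHOW VARIABLES LIKE 'query_cache_size';
SHOW VARIABLES LIKE 'key_buffer_size';
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- One .ibd file per table avoids the single ever-growing shared
-- tablespace, but existing tables stay where they are until rebuilt,
-- for example with ALTER TABLE ... ENGINE=InnoDB.
SET GLOBAL innodb_file_per_table = ON;
```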
Data Processing \u00b6 Data processing encompasses tasks that require making decisions about data and tasks that derive new data from existing data. This is a huge range of topics: Deciding (and enforcing) application-specific validity rules. Summarizing and deriving data. Providing and maintaining alternate representations and structures. Hosting complex domain logic near the data it operates on. The further towards data processing tasks applications move, the more their SQL resembles tiny programs sent to the data. MySQL is totally unprepared for programs, and expects SQL to retrieve or modify simple rows. Validity \u00b6 Good constraints are like assert s: in an ideal world, you can't tell if they work, because your code never violates them. Here in the real world, constraint violations happen for all sorts of reasons, ranging from buggy code to buggy human cognition. A good database gives you more places to describe your expectations and more tools for detecting and preventing surprises. MySQL, on the other hand, can't validate your data for you, beyond simple (and fixed) type constraints: As with the data you store in it, MySQL feels free to change your table definitions implicitly and silently . Many of these silent schema changes have important performance and feature-availability implications. Foreign keys are ignored if you spell them certain, common, ways: CREATE TABLE foo ( -- ..., parent INTEGER NOT NULL REFERENCES foo_parent (id) -- , ... ) silently ignores the foreign key specification, while CREATE TABLE foo ( -- ..., parent INTEGER NOT NULL, FOREIGN KEY (parent) REFERENCES foo_parent (id) -- , ... ) preserves it. Foreign keys, one of the most widely-used database validity checks, are an engine-specific feature, restricting their availability in combination with other engine-specific features. (For example, a table cannot have both foreign key constraints and full-text indexes, as of MySQL 5.5.) Configurations that violate assumptions about foreign keys, such as a foreign key pointing into a MyISAM or NDB table, do not cause warnings or any other diagnostics. The foreign key is simply discarded. SURPRISE. (MySQL is riddled with these sorts of surprises, and apologists lean very heavily on the \u201cthat's documented\u201d excuse for its bad behaviour.) The MySQL parser recognizes CHECK clauses, which allow schema developers to make complex declarative assertions about tuples in the database, but discards them without warning . If you want CHECK -like constraints, you must implement them as triggers - but see below... MySQL's comprehension of the DEFAULT clause is, uh, limited: only constants are permitted, except for the special case of at most one TIMESTAMP column per table and at most one sequence-derived column. Who designed this mess? Furthermore, there's no way to say \u201cno default\u201d and raise an error when an INSERT forgets to provide a value. The default DEFAULT is either NULL or a zero-like constant ( 0 , '' , and so on). Even for types with no meaningful zero-like values ( DATETIME ). MySQL has no mechanism for introducing new types, which might otherwise provide a route to enforcing validity. Counting the number of special cases in MySQL's existing type system illustrates why that's probably unfixable. I hope every client with write access to your data is absolutely perfect, because MySQL cannot help you if you make a mistake. 
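A short sketch of the CHECK problem described above, using a hypothetical table; on MySQL of this era the constraint parses cleanly, gets thrown away, and the violating row is accepted without complaint:

```sql
-- Hypothetical table: the CHECK clause is accepted by the parser but
-- silently discarded, so the "impossible" INSERT below succeeds.
CREATE TABLE payment (
    paymentid INTEGER AUTO_INCREMENT PRIMARY KEY,
    amount    DECIMAL(10,2) NOT NULL,
    CHECK (amount > 0)
);

INSERT INTO payment (amount) VALUES (-5.00);  -- no error, no warning

SELECT * FROM payment;  -- the negative amount is happily stored
```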
Summarizing and Deriving Data \u00b6 SQL databases generally provide features for doing \u201cinteresting\u201d things with sets of tuples, and MySQL is no exception. However, MySQL's limitations mean that actually processing data in the database is fraught with wasted money, brains, and time: Aggregate ( GROUP BY ) queries run up against limits in MySQL's query planner: a query with both WHERE and GROUP BY clauses can only satisfy one constraint or the other with indexes, unless there's an index that covers all the relevant fields in both clauses, in the right order. (What this order is depends on the complexity of the query and on the distribution of the underlying data, but that's hardly MySQL-specific.) If you have all three of WHERE , GROUP BY , and ORDER BY in the same query, you're more or less fucked. Good luck designing a single index that satisfies all three. Even though MySQL allows database administrators to define normal functions in a procedural SQL dialect , custom aggregate functions can only be defined by native plugins. Good thing, too, because procedural SQL in MySQL is its own kind of awful - more on that below. Subqueries are often convenient and occasionally necessary for expressing multi-step transformations on some underlying data. MySQL's query planner has only one strategy for optimizing them: evaluate the innermost query as written, into an in-memory table, then use a nested loop to satisfy joins or IN clauses. For large subquery results or interestingly nested subqueries, this is absurdly slow. MySQL's query planner can't fold constraints from outer queries into subqueries. The generated in-memory table never has any indexes, ever, even when appropriate indexes are \u201cobvious\u201d from the surrounding query; you cannot even specify them. These limitations also affect views, which are evaluated as if they were subqueries. In combination with the lack of constraint folding in the planner, this makes filtering or aggregating over large views completely impractical. MySQL lacks common table expressions . Even if subquery efficiency problems get fixed, the inability to give meaningful names to subqueries makes them hard to read and comprehend. I hope you like CREATE TEMPORARY TABLE AS SELECT , because that's your only real alternative. Window functions do not exist at all in MySQL. This complicates many kinds of analysis, including time series analyses and ranking analyses. Specific cases (for example, assigning rank numbers to rows) can be implemented using server-side variables and side effects during SELECT . What? Good luck understanding that code in six months. Even interesting joins run into trouble. MySQL's query planner has trouble with a number of cases that can easily arise in well-normalized data: Joining and ordering by rows from multiple tables often forces MySQL to dump the whole join to a temporary table, then sort it -- awful, especially if you then use LIMIT BY to paginate the results. JOIN clauses with non-trivial conditions, such as joins by range or joins by similarity, generally cause the planner to revert to table scans even if the same condition would be indexable outside of a join. Joins with WHERE clauses that span both tables, where the rows selected by the WHERE clause are outliers relative to the table statistics, often cause MySQL to access tables in suboptimal order. Ok, forget about interesting joins. Even interesting WHERE clauses can run into trouble: MySQL can't index deterministic functions of a row, either. 
While some deterministic functions can be eliminated from the WHERE clause using simple algebra, many useful cases (whitespace-insensitive comparison, hash-based comparisons, and so on) can't. You can fake these by storing the computed value in the row alongside the \u201creal\u201d value. This leaves your schema with some ugly data repetition and a chance for the two to fall out of sync, and clients must use the \u201ccomputed\u201d column explicitly. Oh, and they must maintain the \u201ccomputed\u201d version explicitly. Or you can use triggers. Ha. See above. And now you know why MySQL advocates are such big fans of doing data processing in \u201cthe client\u201d or \u201cthe app.\u201d Alternate Representations and Derived Tables \u00b6 Many databases let schema designers and administrators abstract the underlying \u201cphysical\u201d table structure from the presentation given to clients, or to some specific clients, for any of a number of reasons. MySQL tries to let you do this, too! And fumbles it quite badly. As mentioned above, non-trivial views are basically useless. Queries like SELECT some columns FROM a_view WHERE id = 53 are evaluated in the stupidest -- and slowest -- possible way. Good luck hiding unusual partitioning arrangements or a permissions check in a view if you want any kind of performance. The poor interactions between triggers and binary logging's default configuration make it impractical to use triggers to maintain \u201cmaterialized\u201d views to avoid the problems with \u201creal\u201d views. It also effectively means triggers can't be used to emulate CHECK constraints and other consistency features. Code to maintain materialized views is also finicky and hard to get \u201cright,\u201d especially if the view includes aggregates or interesting joins over its source data. I hope you enjoy debugging MySQL's procedural SQL\u2026 For the relatively common case of wanting to abstract partitioned storage away for clients, MySQL actually has a tool for it! But it comes with enough caveats to strangle a horse : It's a separate table engine wrapping a \u201creal\u201d storage engine, which means it has its own, separate support for engine-specific features: transactions, foreign keys, index types, AUTO_INCREMENT , and others. The syntax for configuring partitions makes selecting the wrong underlying engine entirely too easy, too. Partitioned tables may not be the referent of foreign keys: you can't have both enforced relationships and this kind of storage management. MySQL doesn't actually know how to store partitions on separate disks or filesystems. You still need to reach underneath MySQL to do actual storage management. Partitioning an InnoDB table under the default InnoDB configuration stores all of the partitions in the global tablespace file anyways. Helpful! For per-table configurations, they still all end up together in the same file. Partitioning InnoDB tables is a waste of time for managing storage. TL;DR: MySQL's partition support is so finicky and limited that MySQL-based apps tend to opt for multiple MySQL servers (\u201csharding\u201d) instead. Hosting Logic In The Database \u00b6 Yeah, yeah, the usual reaction to stored procedures and in-DB code is \u201ceww, yuck!\u201d for some not-terrible reasons, but hear me out on two points: Under the freestanding-database-server paradigm, there will usually be network latency between database clients and the database itself.
There are two ways to minimize the impact of that: move the data to the code in bulk to minimize round-trips, or move the code to the data. Some database administration tasks are better implemented using in-database code than as freestanding clients: complex data migrations that can't be expressed as freestanding SQL queries, for example. MySQL, as of version 5.0 (released in 2003 -- remember that date, I'll come back to it), has support for in-database code via a procedural SQL-like dialect, like many other SQL databases. This includes server-side procedures (blocks of stored code that are invoked outside of any other statements and return statement-like results), functions (blocks of stored code that compute a result, used in any expression context such as a SELECT list or WHERE clause), and triggers (blocks of stored code that run whenever a row is created, modified, or deleted). Given the examples of other contemporaneous procedural languages , MySQL's procedural dialect -- an implementation of the SQL/PSM language -- is quite limited: There is no language construct for looping over a query result. This seems like a pretty fundamental feature for a database-hosted language, but no. There is no language construct for looping while a condition holds. This seems like a pretty fundamental feature for an imperative language designed any time after about 1975, but no. There is no language construct for looping over a range. There is, in fact, one language construct for looping: the unconditional loop. All other iteration control is done via conditional LEAVE statements, as BEGIN DECLARE c CURSOR FOR SELECT foo, bar, baz FROM some_table WHERE some_condition; DECLARE done INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1; DECLARE c_foo INTEGER; DECLARE c_bar INTEGER; DECLARE c_baz INTEGER; OPEN c; process_some_table: LOOP FETCH c INTO c_foo, c_bar, c_baz; IF done THEN LEAVE process_some_table; END IF; -- do something with c_foo, c_bar, c_baz END LOOP; END; The original \u201cstructured programming\u201d revolution in the 1960s seems to have passed the MySQL team by. Okay, I lied. There are two looping constructs: there's also the REPEAT ... UNTIL condition END REPEAT construct, analogous to C's do {} while (!condition); loop. But you still can't loop over query results, and you can't run zero iterations of the loop's main body this way. There is nothing resembling a modern exception system with automatic scoping of handlers or declarative exception management. Error handling is entirely via Visual Basic-style \u201con condition X, do Y\u201d instructions, which remain in effect for the rest of the program's execution. In the language shipped with MySQL 5.0, there wasn't a way to signal errors, either: programmers had to resort to stunts like intentionally issuing failing queries , instead. Later versions of the language addressed this with the SIGNAL statement : see, they can learn from better languages, eventually. You can't escape to some other language, since MySQL doesn't have an extension mechanism for server-side languages or a good way to call out-of-process services during queries. The net result is that developing MySQL stored programs is unpleasant, uncomfortable, and far more error-prone than it could have been. Why Is MySQL The Way It Is? { #by-design } \u00b6 MySQL's technology and history contain the seeds of all of these flaws. 
Pluggable Storage Engines \u00b6 Very early in MySQL's life, the MySQL dev team realized that MyISAM was not the only way to store data, and opted to support other storage backends within MySQL. This is basically an alright idea; while I personally prefer storage systems that focus their effort on making one backend work very well, supporting multiple backends and letting third-party developers write their own is a pretty good approach too. Unfortunately, MySQL's storage backend interface puts a very low ceiling on the ways storage backends can make MySQL behave better. MySQL's data access paths through table engines are very simple: MySQL asks the engine to open a table, asks the engine to iterate through the table returning rows, filters the rows itself (outside of the storage engine), then asks the engine to close the table. Alternately, MySQL asks the engine to open a table, asks the engine to retrieve rows in range or for a single value over a specific index, filters the rows itself, and asks the engine to close the table. This simplistic interface frees table engines from having to worry about query optimization - in theory. Unfortunately, engine-specific features have a large impact on the performance of various query plans, but the channels back to the query planner provide very little granularity for estimating cost and prevent the planner from making good use of the engine in unusual cases. Conversely, the table engine system is totally isolated from the actual query, and can't make query-dependent performance choices \u201con its own.\u201d There's no third path; the query planner itself is not pluggable. Similar consequences apply to type checking, support for new types, or even something as \u201cobvious\u201d as multiple automatic TIMESTAMP columns in the same table. Table manipulation -- creation, structural modification, and so on -- runs into similar problems. MySQL itself parses each CREATE TABLE statement, then hands off a parsed representation to the table engine so that it can manage storage. The parsed representation is lossy: there are plenty of forms MySQL's parser recognizes that aren't representable in a TABLE structure, preventing engines from implementing, say, column or tuple CHECK constraints without MySQL's help. The sheer number of table engines makes that help very slow in coming. Any change to the table engine interface means perturbing the code to each engine, making progress on new MySQL-level features that interact with storage such as better query planning or new SQL constructs necessarily slow to implement and slow to test. Held Back By History \u00b6 The original MySQL team focused on pure read performance and on \u201cease of use\u201d (for new users with simple needs, as far as I can tell) over correctness and completeness, violating Knuth's laws of optimization. Many of these decisions locked MySQL into behaviours very early in its life that it still displays now. Features like implicit type conversions legitimately do help streamline development in very simple cases; experience with other languages unfortunately shows that the same behaviours sandbag development and help hide bugs in more sophisticated scenarios. MySQL has since changed hands, and the teams working on MySQL (and MariaDB, and Percona) are much more mature now than the team that made those early decisions. MySQL's massive and frequently non-savvy userbase makes it very hard to introduce breaking changes. 
At the same time, adding optional breaking changes via server and client mode flags (such as sql_mode ) increases the cognitive overhead of understanding MySQL's behaviours -- especially when that behaviour can vary from client to client, or when the server's configuration is out of the user's control (for example, on a shared host, or on EC2). A solution similar to Python's from __future__ import pragmas for making breaking changes opt-in some releases in advance of making them mandatory might help, but MySQL doesn't have the kind of highly-invested, highly-skilled user base that would make that effective -- and it still has all of the problems of modal behaviour. Bad Arguments \u00b6 Inevitably, someone's going to come along and tell me how wrong I am and how MySQL is just fine as a database system. These people are everywhere, and they mean well too, and they are almost all wrong. There are two good reasons to use MySQL: Some earlier group wrote for it, and we haven't finished porting our code off of MySQL. We've considered all of these points, and many more, and decided that ___feature_x___ that MySQL offers is worth the hassle. Unfortunately, these aren't the reasons people do give, generally. The following are much more common: It's good enough. No it ain't. There are plenty of other equally-capable data storage systems that don't come with MySQL's huge raft of edge cases and quirks. We haven't run into these problems. Actually, a lot of these problems happen silently . Odds are, unless you write your queries and schema statements with the manual open and refer back to it constantly, or have been using MySQL since the 3.x era daily , at least some of these issues have bitten you. The ones that prevent you from using your database intelligently are very hard to notice in action. We already know how to use it. MySQL development and administration causes brain damage, folks, the same way PHP does. Where PHP teaches programmers that \u201carray\u201d is the only structure you need, MySQL teaches people that databases are awkward, slow, hard-to-tune monsters that require constant attention. That doesn't have to be true; there are comfortable, fast, and easily-tuned systems out there that don't require daily care and feeding or the love of a specialist. It's the only thing our host supports. Get a better host . It's not like they're expensive or hard to find. We used it because it was there. Please hire some fucking software developers and go back to writing elevator pitches and flirting with Y Combinator. Everybody knows MySQL. It's easy to hire MySQL folks. It's easy to hire MCSEs, too, but you should be hiring for attitude and ability to learn, not for specific skillsets, if you want to run a successful software project. It's popular. Sure, and nobody ever got fired for buying IBM/Microsoft/Adobe. Popularity isn't any indication of quality, and if we let popularity dictate what technology we use and improve we'll never get anywhere. Marketing software to geeks is easy - it's just that lots of high-quality projects don't bother. It's lightweight. So's SQLite 3 or H2 . If you care about deployment footprint more than any other factor, MySQL is actually pretty clunky (and embedded MySQL has even bigger problems than freestanding MySQL). It's getting better, so we might as well stay on it. It's true , if you go by feature checklists and the manual, MySQL is improving \u201crapidly.\u201d 5.6 is due out soon and superficially looks to contain a number of good changes. 
I have two problems with this line of reasoning: Why wait? Other databases are good now , not eventually . MySQL has a history of providing the bare minimum to satisfy a feature checkbox without actually making the feature work well, work consistently, or work in combination with other features.","title":"Do Not Pass This Way Again"},{"location":"mysql/choose-something-else/#do-not-pass-this-way-again","text":"Warning I wrote this article in 2013, in what amounts to a fit of pique, and never revisited it. Much of this information is outdated, and you rely on it at your own risk. I restored it at the request of a reader . The tone and structure of this article also reflects an angrier and much less understanding person than the one I try to be today. Don't let my anger be your cudgel. Considering MySQL? Use something else. Already on MySQL? Migrate. For every successful project built on MySQL, you could uncover a history of time wasted mitigating MySQL's inadequacies, masked by a hard-won, but meaningless, sense of accomplishment over the effort spent making MySQL behave. Thesis: databases fill roles ranging from pure storage to complex and interesting data processing; MySQL is differently bad at both tasks. Real apps all fall somewhere between these poles, and suffer variably from both sets of MySQL flaws. MySQL is bad at storage . MySQL is bad at data processing . MySQL is bad by design . Bad arguments for using MySQL. Much of this is inspired by the principles behind PHP: A Fractal of Bad Design . I suggest reading that article too -- it's got a lot of good thought in it even if you already know to stay well away from PHP. (If that article offends you, well, this page probably will too.)","title":"Do Not Pass This Way Again"},{"location":"mysql/choose-something-else/#storage","text":"Storage systems have four properties: Take and store data they receive from applications. Keep that data safe against loss or accidental change. Provide stored data to applications on demand. Give administrators effective management tools. In a truly \u201cpure\u201d storage application, data-comprehension features (constraints and relationships, nontrivial functions and aggregates) would go totally unused. There is a time and a place for this: the return of \u201cNoSQL\u201d storage systems attests to that. Pure storage systems tend to be closely coupled to their \u201cmain\u201d application: consider most web/server app databases. \u201cSecondary\u201d clients tend to be read-only (reporting applications, monitoring) or to be utilities in service of the main application (migration tools, documentation tools). If you believe constraints, validity checks, and other comprehension features can be implemented in \u201cthe application,\u201d you are probably thinking of databases close to this pole.","title":"Storage"},{"location":"mysql/choose-something-else/#storing-data","text":"MySQL has many edge cases which reduce the predictability of its behaviour when storing information. Most of these edge cases are documented, but violate the principle of least surprise (not to mention the expectations of users familiar with other SQL implementations). Implicit conversions (particularly to and from string types) can modify MySQL's behaviour. Many implicit conversions are also silent (no warning, no diagnostic), by design, making it more likely developers are entirely unaware of them until one does something surprising. 
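A minimal sketch of that silent coercion, using a hypothetical table and assuming a non-strict sql_mode (historically the default):

```sql
-- With no STRICT mode in sql_mode, an over-length value is quietly
-- truncated to fit the column instead of being rejected.
CREATE TABLE nickname (name VARCHAR(5));

SET SESSION sql_mode = '';
INSERT INTO nickname (name) VALUES ('far-too-long');
SHOW WARNINGS;               -- "Data truncated for column 'name' ..."
SELECT name FROM nickname;   -- returns 'far-t'
```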
Conversions that violate basic constraints (range, length) of the output type often coerce data rather than failing. Sometimes this raises a warning; does your app check for those? This behaviour is unlike many typed systems (but closely like PHP and remotely like Perl). Conversion behaviour depends on a per-connection configuration value ( sql_mode ) that has a large constellation of possible states , making it harder to carry expectations from manual testing over to code or from tool to tool. MySQL recommends UTF-8 as a character-set, but still defaults to Latin-1. The implementation of utf8 up until MySQL 5.5 covered only the 3-byte BMP . MySQL 5.5 and beyond supports a 4-byte UTF-8 encoding, but, confusingly, it must be selected with the character-set utf8mb4 . Implementation details of these encodings within MySQL, such as the utf8 3-byte limit, tend to leak out into client applications. Data that does not fit MySQL's understanding of the storage encoding will be transformed until it does, by truncation or replacement, by default. Collation support is per-encoding, with one of the stranger default configurations: by default, the collation orders characters according to Swedish alphabetization rules, case-insensitively. Since it's the default, lots of folks who don't know the manual inside-out and backwards observe MySQL's case-insensitive collation behaviour ( 'a' = 'A' ) and conclude that \u201cMySQL is case-insensitive,\u201d complicating any effort to use a case-sensitive locale. Both the encoding and the collation can vary, independently, by column . Do you keep your schema definition open when you write queries to watch out for this sort of shit? The TIMESTAMP type tries to do something smart by storing values in a canonical timezone (UTC), but it's done with so few affordances that it's very hard to even tell that MySQL's done a right thing with your data. And even after that, the result of foo < '2012-04-01 09:00:00' still depends on what time of year it is when you evaluate the query, unless you're very careful with your connection timezone. TIMESTAMP is also special-cased in MySQL's schema definition handling, making it easy to accidentally create (or to accidentally fail to create) an auto-updating field when you didn't (did) want one. DATETIME does not get the same timezone handling TIMESTAMP does. What? And you can't provide your own without resorting to hacks like extra columns. Oh, did you want to use MySQL's timezone support? Too bad, none of that data's loaded by default. You have to process the OS's tzinfo files into SQL with a separate tool and import that. If you ever want to update MySQL's timezone settings later, you need to take the server down just to make sure the changes apply.","title":"Storing Data"},{"location":"mysql/choose-something-else/#preserving-data","text":"... against unexpected changes: like most disk-backed storage systems, MySQL is as reliable as the disks and filesystems its data lives on. MySQL provides no additional functionality in terms of mirroring or hardware failure tolerance (such as Oracle ASM ). However, this is a limitation shared with many, many other systems. When using the InnoDB storage engine (default since MySQL 5.5), MySQL maintains page checksums in order to detect corruption caused by underlying storage. However, many third-party software applications, as well as users upgrading from earlier versions of MySQL, may be using MyISAM, which will frequently corrupt data files on improper shutdown.
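For what it's worth, the usual triage for the MyISAM corruption mentioned above looks like the sketch below (the table name is hypothetical); REPAIR TABLE can silently discard rows it can't salvage, so it's damage control, not a substitute for real backups:

```sql
-- Check a MyISAM table after an unclean shutdown, then attempt repair.
CHECK TABLE legacy_orders;
REPAIR TABLE legacy_orders;

-- Converting the table to InnoDB sidesteps this failure mode entirely,
-- at the cost of any MyISAM-only features (such as full-text indexes,
-- in MySQL of this era).
ALTER TABLE legacy_orders ENGINE = InnoDB;
```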
The implicit conversion rules that bite when storing data also bite when asking MySQL to modify data - my favourite example being a fat-fingered UPDATE query where a mistyped = (as - , off by a single key) caused 90% of the rows in the table to be affected, instead of one row, because of implicit string-to-integer conversions. ... against loss: hoo boy. MySQL, out of the box, gives you three approaches to backups : Take \u201cblind\u201d filesystem backups with tar or rsync . Unless you meticulously lock tables or make the database read-only for the duration, this produces a backup that requires crash recovery before it will be usable, and can produce an inconsistent database. This can bite quite hard if you use InnoDB, as InnoDB crash recovery takes time proportional to both the number of InnoDB tables and the total size of InnoDB tables, with a large constant. Dump to SQL with mysqldump : slow, relatively large backups, and non-incremental. Archive binary logs: fragile, complex, over-configurable, and configured badly by default. (Binary logging is also the basis of MySQL's replication system.) If neither of these are sufficient, you're left with purchasing a backup tool from Oracle or from one of the third-party MySQL vendors. Like many of MySQL's features, the binary logging feature is too configurable , while still, somehow, defaulting to modes that are hazardous or surprising: the default behaviour is to log SQL statements, rather than logging their side effects. This has lead to numerous bugs over the years; MySQL (now) makes an effort to make common \u201cnon-deterministic\u201d cases such as NOW() and RANDOM() act deterministically but these have been addressed using ad-hoc solutions. Restoring binary-log-based backups can easily lead to data that differs from the original system, and by the time you've noticed the problem, it's too late to do anything about it. (Seriously. The binary log entries for each statement contain the \u201ccurrent\u201d time on the master and the random seed at the start of the statement, just in case. If your non-deterministic query uses any other function, you're still fucked by default .) Additionally, a number of apparently-harmless features can lead to backups or replicas wandering out of sync with the original database, in the default configuration: AUTO_INCREMENT and UPDATE statements. AUTO_INCREMENT and INSERT statements (sometimes). SURPRISE. Triggers. User-defined (native) functions. Stored (procedural SQL) functions. DELETE ... LIMIT and UPDATE ... LIMIT statements, though if you use these, you've misunderstood how SQL is supposed to work. INSERT ... ON DUPLICATE KEY UPDATE statements. Bulk-loading data with LOAD DATA statements. Operations on floating-point values .","title":"Preserving Data"},{"location":"mysql/choose-something-else/#retrieving-data","text":"This mostly works as expected. Most of the ways MySQL will screw you happen when you store data, not when you retrieve it. 
However, there are a few things that implicitly transform stored data before returning it: MySQL's surreal type conversion system works the same way during SELECT that it works during other operations, which can lead to queries matching unexpected rows: owen@scratch> CREATE TABLE account ( -> accountid INTEGER -> AUTO_INCREMENT -> PRIMARY KEY, -> discountid INTEGER -> ); Query OK, 0 rows affected (0.54 sec) owen@scratch> INSERT INTO account -> (discountid) -> VALUES -> (0), -> (1), -> (2); Query OK, 3 rows affected (0.03 sec) Records: 3 Duplicates: 0 Warnings: 0 owen@scratch> SELECT * -> FROM account -> WHERE discountid = 'banana'; +-----------+------------+ | accountid | discountid | +-----------+------------+ | 1 | 0 | +-----------+------------+ 1 row in set, 1 warning (0.05 sec) Ok, unexpected, but there's at least a warning (do your apps check for those?) - let's see what it says: owen@scratch> SHOW WARNINGS; +---------+------+--------------------------------------------+ | Level | Code | Message | +---------+------+--------------------------------------------+ | Warning | 1292 | Truncated incorrect DOUBLE value: 'banana' | +---------+------+--------------------------------------------+ 1 row in set (0.03 sec) I can count on one hand the number of DOUBLE columns in this example and still have five fingers left over. You might think this is an unreasonable example: maybe you should always make sure your argument types exactly match the field types, and the query should use 57 instead of 'banana' . (This does actually \u201cfix\u201d the problem.) It's unrealistic to expect every single user to run SHOW CREATE TABLE before every single query, or to memorize the types of every column in your schema, though. This example derived from a technically-skilled but MySQL-ignorant tester examining MySQL data to verify some behavioural changes in an app. Actually, you don't even need a table for this: SELECT 0 = 'banana' returns 1 . Did the PHP folks design MySQL's = operator? This isn't affected by sql_mode , even though so many other things are. TIMESTAMP columns (and only TIMESTAMP columns) can return apparently-differing values for the same stored value depending on per-connection configuration even during read-only operation. This is done silently and the default behaviour can change as a side effect of non-MySQL configuration changes in the underlying OS. String-typed columns are transformed for encoding on output if the connection is not using the same encoding as the underlying storage, using the same rules as the transformation on input. Values that stricter sql_mode settings would reject during storage can still be returned during retrieval; it is impossible to predict in advance whether such data exists, since clients are free to set sql_mode to any value at any time.","title":"Retrieving Data"},{"location":"mysql/choose-something-else/#efficiency","text":"For purely store-and-retrieve applications, MySQL's query planner (which transforms the miniature program contained in each SQL statement into a tree of disk access and data manipulation steps) is sufficient, but only barely. Queries that retrieve data from one table, or from one table and a small number of one-to-maybe-one related tables, produce relatively efficient plans. MySQL, however, offers a number of tuning options that can have dramatic and counterintuitive effects, and the documentation provides very little advice for choosing settings. 
Tuning relies on the administrator's personal experience, blog articles of varying quality, and consultants. The MySQL query cache defaults to a non-zero size in some commonly-installed configurations. However, the larger the cache, the slower writes proceed: invalidating cache entries that include the tables modified by a query means considering every entry in the cache. This cache also uses MySQL's LRU implementation, which has its own performance problems during eviction that get worse with larger cache sizes. Memory-management settings, including key_buffer_size and innodb_buffer_pool_size , have non-linear relationships with performance. The standard advice advises making whichever value you care about more to a large value, but this can be counterproductive if the related data is larger than the pool can hold: MySQL is once again bad at discarding old buffer pages when the buffer is exhausted, leading to dramatic slowdowns when query load reaches a certain point. This also affects filesystem tuning settings such as table_open_cache . InnoDB, out of the box, comes configured to use one large (and automatically growing) tablespace file for all tables, complicating backups and storage management. This is fine for trivial databases, but MySQL provides no tools (aside from DROP TABLE and reloading the data from an SQL dump) for transplanting a table to another tablespace, and provides no tools (aside from a filesystem-level rm , and reloading all InnoDB data from an SQL dump) for reclaiming empty space in a tablespace file. MySQL itself provides very few tools to manage storage; tasks like storing large or infrequently-accessed tables and databases on dedicated filesystems must be done on the filesystem, with MySQL shut down.","title":"Efficiency"},{"location":"mysql/choose-something-else/#data-processing","text":"Data processing encompasses tasks that require making decisions about data and tasks that derive new data from existing data. This is a huge range of topics: Deciding (and enforcing) application-specific validity rules. Summarizing and deriving data. Providing and maintaining alternate representations and structures. Hosting complex domain logic near the data it operates on. The further towards data processing tasks applications move, the more their SQL resembles tiny programs sent to the data. MySQL is totally unprepared for programs, and expects SQL to retrieve or modify simple rows.","title":"Data Processing"},{"location":"mysql/choose-something-else/#validity","text":"Good constraints are like assert s: in an ideal world, you can't tell if they work, because your code never violates them. Here in the real world, constraint violations happen for all sorts of reasons, ranging from buggy code to buggy human cognition. A good database gives you more places to describe your expectations and more tools for detecting and preventing surprises. MySQL, on the other hand, can't validate your data for you, beyond simple (and fixed) type constraints: As with the data you store in it, MySQL feels free to change your table definitions implicitly and silently . Many of these silent schema changes have important performance and feature-availability implications. Foreign keys are ignored if you spell them certain, common, ways: CREATE TABLE foo ( -- ..., parent INTEGER NOT NULL REFERENCES foo_parent (id) -- , ... ) silently ignores the foreign key specification, while CREATE TABLE foo ( -- ..., parent INTEGER NOT NULL, FOREIGN KEY (parent) REFERENCES foo_parent (id) -- , ... ) preserves it. 
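Given how easily the first spelling above loses the constraint, it's worth checking what actually got created. A sketch, assuming the foo and foo_parent tables from the example:

```sql
-- The definitive check: the FOREIGN KEY clause either shows up in the
-- table definition or it doesn't exist.
SHOW CREATE TABLE foo;

-- Or ask information_schema which referential constraints survived.
SELECT constraint_name, referenced_table_name
FROM information_schema.key_column_usage
WHERE table_schema = DATABASE()
  AND table_name = 'foo'
  AND referenced_table_name IS NOT NULL;
```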
Foreign keys, one of the most widely-used database validity checks, are an engine-specific feature, restricting their availability in combination with other engine-specific features. (For example, a table cannot have both foreign key constraints and full-text indexes, as of MySQL 5.5.) Configurations that violate assumptions about foreign keys, such as a foreign key pointing into a MyISAM or NDB table, do not cause warnings or any other diagnostics. The foreign key is simply discarded. SURPRISE. (MySQL is riddled with these sorts of surprises, and apologists lean very heavily on the \u201cthat's documented\u201d excuse for its bad behaviour.) The MySQL parser recognizes CHECK clauses, which allow schema developers to make complex declarative assertions about tuples in the database, but discards them without warning . If you want CHECK -like constraints, you must implement them as triggers - but see below... MySQL's comprehension of the DEFAULT clause is, uh, limited: only constants are permitted, except for the special case of at most one TIMESTAMP column per table and at most one sequence-derived column. Who designed this mess? Furthermore, there's no way to say \u201cno default\u201d and raise an error when an INSERT forgets to provide a value. The default DEFAULT is either NULL or a zero-like constant ( 0 , '' , and so on). Even for types with no meaningful zero-like values ( DATETIME ). MySQL has no mechanism for introducing new types, which might otherwise provide a route to enforcing validity. Counting the number of special cases in MySQL's existing type system illustrates why that's probably unfixable. I hope every client with write access to your data is absolutely perfect, because MySQL cannot help you if you make a mistake.","title":"Validity"},{"location":"mysql/choose-something-else/#summarizing-and-deriving-data","text":"SQL databases generally provide features for doing \u201cinteresting\u201d things with sets of tuples, and MySQL is no exception. However, MySQL's limitations mean that actually processing data in the database is fraught with wasted money, brains, and time: Aggregate ( GROUP BY ) queries run up against limits in MySQL's query planner: a query with both WHERE and GROUP BY clauses can only satisfy one constraint or the other with indexes, unless there's an index that covers all the relevant fields in both clauses, in the right order. (What this order is depends on the complexity of the query and on the distribution of the underlying data, but that's hardly MySQL-specific.) If you have all three of WHERE , GROUP BY , and ORDER BY in the same query, you're more or less fucked. Good luck designing a single index that satisfies all three. Even though MySQL allows database administrators to define normal functions in a procedural SQL dialect , custom aggregate functions can only be defined by native plugins. Good thing, too, because procedural SQL in MySQL is its own kind of awful - more on that below. Subqueries are often convenient and occasionally necessary for expressing multi-step transformations on some underlying data. MySQL's query planner has only one strategy for optimizing them: evaluate the innermost query as written, into an in-memory table, then use a nested loop to satisfy joins or IN clauses. For large subquery results or interestingly nested subqueries, this is absurdly slow. MySQL's query planner can't fold constraints from outer queries into subqueries. 
The generated in-memory table never has any indexes, ever, even when appropriate indexes are \u201cobvious\u201d from the surrounding query; you cannot even specify them. These limitations also affect views, which are evaluated as if they were subqueries. In combination with the lack of constraint folding in the planner, this makes filtering or aggregating over large views completely impractical. MySQL lacks common table expressions . Even if subquery efficiency problems get fixed, the inability to give meaningful names to subqueries makes them hard to read and comprehend. I hope you like CREATE TEMPORARY TABLE AS SELECT , because that's your only real alternative. Window functions do not exist at all in MySQL. This complicates many kinds of analysis, including time series analyses and ranking analyses. Specific cases (for example, assigning rank numbers to rows) can be implemented using server-side variables and side effects during SELECT . What? Good luck understanding that code in six months. Even interesting joins run into trouble. MySQL's query planner has trouble with a number of cases that can easily arise in well-normalized data: Joining and ordering by rows from multiple tables often forces MySQL to dump the whole join to a temporary table, then sort it -- awful, especially if you then use LIMIT BY to paginate the results. JOIN clauses with non-trivial conditions, such as joins by range or joins by similarity, generally cause the planner to revert to table scans even if the same condition would be indexable outside of a join. Joins with WHERE clauses that span both tables, where the rows selected by the WHERE clause are outliers relative to the table statistics, often cause MySQL to access tables in suboptimal order. Ok, forget about interesting joins. Even interesting WHERE clauses can run into trouble: MySQL can't index deterministic functions of a row, either. While some deterministic functions can be eliminated from the WHERE clause using simple algebra, many useful cases (whitespace-insensitive comparison, hash-based comparisons, and so on) can't. You can fake these by storing the computed value in the row alongside the \u201creal\u201d value. This leaves your schema with some ugly data repetition and a chance for the two to fall out of sync, and clients must use the \u201ccomputed\u201d column explicitly. Oh, and they must maintain the \u201ccomputed\u201d version explicitly. Or you can use triggers. Ha. See above. And now you know why MySQL advocates are such big fans of doing data processing in \u201cthe client\u201d or \u201cthe app.\u201d","title":"Summarizing and Deriving Data"},{"location":"mysql/choose-something-else/#alternate-representations-and-derived-tables","text":"Many databases let schema designers and administrators abstract the underlying \u201cphysical\u201d table structure from the presentation given to clients, or to some specific clients, for any of a number of reasons. MySQL tries to let you do this, too! And fumbles it quite badly. As mentioned above, non-trivial views are basically useless. Queries like SELECT some columns FROM a_view WHERE id = 53 are evaluated in the stupidest -- and slowest -- possible way. Good luck hiding unusual partitioning arrangements or a permissions check in a view if you want any kind of performance. The poor interactions between triggers and binary logging's default configuration make it impractical to use triggers to maintain \u201cmaterialized\u201d views to avoid the problems with \u201creal\u201d views. 
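For a sense of what that workaround involves, here is a rough sketch of a trigger maintaining a hand-rolled summary table, subject to the binary-logging caveats just described. All table and column names are invented for illustration, and customer_totals is assumed to have customer_id as its primary key:

-- Keeps the invented customer_totals table in step with inserts into
-- orders; UPDATE and DELETE on orders would each need a matching
-- trigger of their own, and aggregates that aren't simple sums are
-- harder still.
CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
FOR EACH ROW
  INSERT INTO customer_totals (customer_id, order_total)
       VALUES (NEW.customer_id, NEW.amount)
  ON DUPLICATE KEY UPDATE order_total = order_total + NEW.amount;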
It also effectively means triggers can't be used to emulate CHECK constraints and other consistency features. Code to maintain materialized views is also finicky and hard to get \u201cright,\u201d especially if the view includes aggregates or interesting joins over its source data. I hope you enjoy debugging MySQL's procedural SQL\u2026 For the relatively common case of wanting to abstract partitioned storage away for clients, MySQL actually has a tool for it! But it comes with enough caveats to strangle a horse : It's a separate table engine wrapping a \u201creal\u201d storage engine, which means it has its own, separate support for engine-specific features: transactions, foreign keys, index types, AUTO_INCREMENT , and others. The syntax for configuring partitions makes selecting the wrong underlying engine entirely too easy, too. Partitioned tables may not be the referent of foreign keys: you can't have both enforced relationships and this kind of storage management. MySQL doesn't actually know how to store partitions on separate disks or filesystems. You still need to reach underneath MySQL to do actual storage management. Partitioning an InnoDB table under the default InnoDB configuration stores all of the partitions in the global tablespace file anyways. Helpful! For per-table configurations, they still all end up together in the same file. Partitioning InnoDB tables is a waste of time for managing storage. TL;DR: MySQL's partition support is so finicky and limited that MySQL-based apps tend to opt for multiple MySQL servers (\u201csharding\u201d) instead.","title":"Alternate Representations and Derived Tables"},{"location":"mysql/choose-something-else/#hosting-logic-in-the-database","text":"Yeah, yeah, the usual reaction to stored procedures and in-DB code is \u201ceww, yuck!\u201d for some not-terrible reasons, but hear me out on two points: Under the freestanding-database-server paradigm, there will usually be network latency between database clients and the database itself. There are two ways to minimize the impact of that: move the data to the code in bulk to minimize round-trips, or move the code to the data. Some database administration tasks are better implemented using in-database code than as freestanding clients: complex data migrations that can't be expressed as freestanding SQL queries, for example. MySQL, as of version 5.0 (released in 2003 -- remember that date, I'll come back to it), has support for in-database code via a procedural SQL-like dialect, like many other SQL databases. This includes server-side procedures (blocks of stored code that are invoked outside of any other statements and return statement-like results), functions (blocks of stored code that compute a result, used in any expression context such as a SELECT list or WHERE clause), and triggers (blocks of stored code that run whenever a row is created, modified, or deleted). Given the examples of other contemporaneous procedural languages , MySQL's procedural dialect -- an implementation of the SQL/PSM language -- is quite limited: There is no language construct for looping over a query result. This seems like a pretty fundamental feature for a database-hosted language, but no. There is no language construct for looping while a condition holds. This seems like a pretty fundamental feature for an imperative language designed any time after about 1975, but no. There is no language construct for looping over a range. There is, in fact, one language construct for looping: the unconditional loop.
All other iteration control is done via conditional LEAVE statements, as BEGIN DECLARE done INT DEFAULT 0; DECLARE c_foo INTEGER; DECLARE c_bar INTEGER; DECLARE c_baz INTEGER; DECLARE c CURSOR FOR SELECT foo, bar, baz FROM some_table WHERE some_condition; DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1; OPEN c; process_some_table: LOOP FETCH c INTO c_foo, c_bar, c_baz; IF done THEN LEAVE process_some_table; END IF; -- do something with c_foo, c_bar, c_baz END LOOP; CLOSE c; END; (And the declarations must appear in exactly that order -- variables, then cursors, then handlers -- or MySQL rejects the block outright.) The original \u201cstructured programming\u201d revolution in the 1960s seems to have passed the MySQL team by. Okay, I lied. There are two looping constructs: there's also the REPEAT ... UNTIL condition END REPEAT construct, analogous to C's do {} while (!condition); loop. But you still can't loop over query results, and you can't run zero iterations of the loop's main body this way. There is nothing resembling a modern exception system with automatic scoping of handlers or declarative exception management. Error handling is entirely via Visual Basic-style \u201con condition X, do Y\u201d instructions, which remain in effect for the rest of the program's execution. In the language shipped with MySQL 5.0, there wasn't a way to signal errors, either: programmers had to resort to stunts like intentionally issuing failing queries , instead. Later versions of the language addressed this with the SIGNAL statement : see, they can learn from better languages, eventually. You can't escape to some other language, since MySQL doesn't have an extension mechanism for server-side languages or a good way to call out-of-process services during queries. The net result is that developing MySQL stored programs is unpleasant, uncomfortable, and far more error-prone than it could have been.","title":"Hosting Logic In The Database"},{"location":"mysql/choose-something-else/#why-is-mysql-the-way-it-is-by-design","text":"MySQL's technology and history contain the seeds of all of these flaws.","title":"Why Is MySQL The Way It Is?"},{"location":"mysql/choose-something-else/#pluggable-storage-engines","text":"Very early in MySQL's life, the MySQL dev team realized that MyISAM was not the only way to store data, and opted to support other storage backends within MySQL. This is basically an alright idea; while I personally prefer storage systems that focus their effort on making one backend work very well, supporting multiple backends and letting third-party developers write their own is a pretty good approach too. Unfortunately, MySQL's storage backend interface puts a very low ceiling on the ways storage backends can make MySQL behave better. MySQL's data access paths through table engines are very simple: MySQL asks the engine to open a table, asks the engine to iterate through the table returning rows, filters the rows itself (outside of the storage engine), then asks the engine to close the table. Alternately, MySQL asks the engine to open a table, asks the engine to retrieve rows in range or for a single value over a specific index, filters the rows itself, and asks the engine to close the table. This simplistic interface frees table engines from having to worry about query optimization - in theory. Unfortunately, engine-specific features have a large impact on the performance of various query plans, but the channels back to the query planner provide very little granularity for estimating cost and prevent the planner from making good use of the engine in unusual cases.
Conversely, the table engine system is totally isolated from the actual query, and can't make query-dependent performance choices \u201con its own.\u201d There's no third path; the query planner itself is not pluggable. Similar consequences apply to type checking, support for new types, or even something as \u201cobvious\u201d as multiple automatic TIMESTAMP columns in the same table. Table manipulation -- creation, structural modification, and so on -- runs into similar problems. MySQL itself parses each CREATE TABLE statement, then hands off a parsed representation to the table engine so that it can manage storage. The parsed representation is lossy: there are plenty of forms MySQL's parser recognizes that aren't representable in a TABLE structure, preventing engines from implementing, say, column or tuple CHECK constraints without MySQL's help. The sheer number of table engines makes that help very slow in coming. Any change to the table engine interface means perturbing the code to each engine, making progress on new MySQL-level features that interact with storage such as better query planning or new SQL constructs necessarily slow to implement and slow to test.","title":"Pluggable Storage Engines"},{"location":"mysql/choose-something-else/#held-back-by-history","text":"The original MySQL team focused on pure read performance and on \u201cease of use\u201d (for new users with simple needs, as far as I can tell) over correctness and completeness, violating Knuth's laws of optimization. Many of these decisions locked MySQL into behaviours very early in its life that it still displays now. Features like implicit type conversions legitimately do help streamline development in very simple cases; experience with other languages unfortunately shows that the same behaviours sandbag development and help hide bugs in more sophisticated scenarios. MySQL has since changed hands, and the teams working on MySQL (and MariaDB, and Percona) are much more mature now than the team that made those early decisions. MySQL's massive and frequently non-savvy userbase makes it very hard to introduce breaking changes. At the same time, adding optional breaking changes via server and client mode flags (such as sql_mode ) increases the cognitive overhead of understanding MySQL's behaviours -- especially when that behaviour can vary from client to client, or when the server's configuration is out of the user's control (for example, on a shared host, or on EC2). A solution similar to Python's from __future__ import pragmas for making breaking changes opt-in some releases in advance of making them mandatory might help, but MySQL doesn't have the kind of highly-invested, highly-skilled user base that would make that effective -- and it still has all of the problems of modal behaviour.","title":"Held Back By History"},{"location":"mysql/choose-something-else/#bad-arguments","text":"Inevitably, someone's going to come along and tell me how wrong I am and how MySQL is just fine as a database system. These people are everywhere, and they mean well too, and they are almost all wrong. There are two good reasons to use MySQL: Some earlier group wrote for it, and we haven't finished porting our code off of MySQL. We've considered all of these points, and many more, and decided that ___feature_x___ that MySQL offers is worth the hassle. Unfortunately, these aren't the reasons people do give, generally. The following are much more common: It's good enough. No it ain't. 
There are plenty of other equally-capable data storage systems that don't come with MySQL's huge raft of edge cases and quirks. We haven't run into these problems. Actually, a lot of these problems happen silently . Odds are, unless you write your queries and schema statements with the manual open and refer back to it constantly, or have been using MySQL since the 3.x era daily , at least some of these issues have bitten you. The ones that prevent you from using your database intelligently are very hard to notice in action. We already know how to use it. MySQL development and administration causes brain damage, folks, the same way PHP does. Where PHP teaches programmers that \u201carray\u201d is the only structure you need, MySQL teaches people that databases are awkward, slow, hard-to-tune monsters that require constant attention. That doesn't have to be true; there are comfortable, fast, and easily-tuned systems out there that don't require daily care and feeding or the love of a specialist. It's the only thing our host supports. Get a better host . It's not like they're expensive or hard to find. We used it because it was there. Please hire some fucking software developers and go back to writing elevator pitches and flirting with Y Combinator. Everybody knows MySQL. It's easy to hire MySQL folks. It's easy to hire MCSEs, too, but you should be hiring for attitude and ability to learn, not for specific skillsets, if you want to run a successful software project. It's popular. Sure, and nobody ever got fired for buying IBM/Microsoft/Adobe. Popularity isn't any indication of quality, and if we let popularity dictate what technology we use and improve we'll never get anywhere. Marketing software to geeks is easy - it's just that lots of high-quality projects don't bother. It's lightweight. So's SQLite 3 or H2 . If you care about deployment footprint more than any other factor, MySQL is actually pretty clunky (and embedded MySQL has even bigger problems than freestanding MySQL). It's getting better, so we might as well stay on it. It's true , if you go by feature checklists and the manual, MySQL is improving \u201crapidly.\u201d 5.6 is due out soon and superficially looks to contain a number of good changes. I have two problems with this line of reasoning: Why wait? Other databases are good now , not eventually . MySQL has a history of providing the bare minimum to satisfy a feature checkbox without actually making the feature work well, work consistently, or work in combination with other features.","title":"Bad Arguments"},{"location":"nomic/","text":"Nomic \u00b6 Nomic is a game invented in 1982 by Peter Suber, as an appendix to his PhD thesis The Paradox of Self-Amendment . In Nomic, the primary move available to the players is to change the rules of the game in a structured way. Nomic itself was intended as a minimalist study of procedural law, but it has been played very successfully by many groups over the years. I first played Nomic through Agora , a long-running Nomic of a heavily procedural bent (as opposed to variants like BlogNomic, that have developed in much more whimsical directions). I've found the game, and the communities that have sprung up around the game, deeply fascinating as a way to examine how groups reach consensus and exercise decisions. 
I briefly experimented with the notion of running a procedural Nomic - a mini-Agora - via Github, and produced two documents: Notes Towards Initial Rules for a Github Nomic Github Nomic Rules","title":"Nomic"},{"location":"nomic/#nomic","text":"Nomic is a game invented in 1982 by Peter Suber, as an appendix to his PhD thesis The Paradox of Self-Amendment . In Nomic, the primary move available to the players is to change the rules of the game in a structured way. Nomic itself was intended as a minimalist study of procedural law, but it has been played very successfully by many groups over the years. I first played Nomic through Agora , a long-running Nomic of a heavily procedural bent (as opposed to variants like BlogNomic, that have developed in much more whimsical directions). I've found the game, and the communities that have sprung up around the game, deeply fascinating as a way to examine how groups reach consensus and exercise decisions. I briefly experimented with the notion of running a procedural Nomic - a mini-Agora - via Github, and produced two documents: Notes Towards Initial Rules for a Github Nomic Github Nomic Rules","title":"Nomic"},{"location":"nomic/notes/","text":"Notes Towards Initial Rules for a Github Nomic \u00b6 This document is not part of the rules of a Nomic, and is present solely as a guide to the design of this initial ruleset , for play on Github. It should be removed before the game starts, and at no time should it be consulted to guide gameplay directly. Peter Suber's Nomic is a game of rule-making for one or more players. For details on the rationale behind the game and the reasons the game might be interesting, see Suber's own description. Changes from Suber's Rules \u00b6 Format \u00b6 I've marked up Suber's rules into Markdown, one of Github's \u201cnative\u201d text markup formats. This highly-structured format produces quite readable results when viewed through the Github website, and allows useful things like HTML links that point to specific rules. I've also made some diff-friendliness choices around the structure of those Markdown documents. For want of a better idea, the source documents are line-broken with one sentence per line, so that diffs naturally span whole sentences rather than arbitrarily-wrapped text (or unwrapped text). Since Github automatically recombines sequences of non-blank lines into a single HTML paragraph, the rendering on the web site is still quite readable. I have not codified this format in the rules themselves. Asynchrony \u00b6 In its original form, Nomic is appropriate for face-to-face play. The rules assume that it is practical for the players to identify one another using out-of-game context, and that it is practical for the players to take turns. Each player is expected to wait indefinitely (or, more likely, to apply non-game social pressure) if the preceding player takes inordinately long to complete their turn. Similarly, Judgement interrupts the flow of game play and brings turns to a stop. This Nomic is to be played on Github, and the players are not likely to be present simultaneously, or to be willing to wait indefinitely. It's possible for Suber's original Nomic rules to be amended, following themselves, into a form suitable for asynchronous play. This has happened several times: for examples, see Agora and BlogNomic , though there are a multitude of others. 
However, this process of amendment takes time , and, starting from Suber's initial rules, would require a period of one-turn-at-a-time rule-changes before the game could be played more naturally in the Github format. This period is not very interesting, and is incredibly demanding of the initial players' attention spans. In the interests of preserving the players' time, I have modified Suber's initial ruleset to replace sequential play with a simple asynchronous model of play. In summary: Every player can begin a turn at any time, even during another player's (or players') turn, so long as they aren't already taking a turn. Actions can be resolved in any order, depending on which proposals players choose to vote on, and in what order. The initial rules allow for players to end their turns without gathering every vote, once gameplay has proceeded far enough for non-unanimous votes to be possible. I have attempted to leave the rules as close to Suber's original rules as possible otherwise while implementing this change to the initial ruleset. I have faith that the process of playing Nomic will correct any deficiencies, or, failing that, will clearly identify where these changes break the game entirely. I have, as far as I am able, emulated Suber's preference for succinctness over thoroughness, and resisted the urge to fix or clarify rules even where defects seem obvious to me. In spite of my temptation to remove it, I have even left the notion of \u201cwinning\u201d intact. Rule-numbering \u00b6 The intent of this Nomic is to explore the suitability of Github's suite of tools for proposing, reviewing, and accepting changes to a corpus of text are suitable for self-governed rulemaking processes, as modelled by Nomic. Note that this is a test of Github, not of Git: it is appropriate and intended that the players rely on non-Git elements of Github's workflow (issues, wiki pages, Github Pages, and so on), and similarly it is appropriate and intended that the authentic copy of the game in play is the Github project hosting it, not the Git repo the project contains, and certainly not forks of the project or other clones of the repository. To support this intention, I have re-labelled the initial rules with negative numbers, rather than digits, so that proposals can be numbered starting from 1 without colliding with existing rules, and so that they can be numbered by their Pull Requests and Github issue numbers. (A previous version of these rules used Roman numerals for the initial rules. However, correctly accounting for the priority of new rules over initial rules, following Suber, required more changes than I was comfortable making to Suber's ruleset.) I have made it explicit in these initial rules that Github, not the players, assigns numbers to proposals. This is the only rule which mentions Github by name. I have not explicitly specified that the proposals should be implemented through pull requests; this is an intentional opportunity for player creativity. Projects & Ideas \u00b6 A small personal collection of other ideas to explore: Repeal or replace the victory criteria entirely \u00b6 \u201cWinning\u201d is not an objective I'm personally interested in, and Suber's race to 200 points by popularity of proposal is structurally quite dull. If the game is to have a victory condition, it should be built from the ground up to meet the players' motivations, rather than being retrofitted onto the points-based system. 
Codify the use of Git commits, rather than prose, for rules-changes \u00b6 This is unstated in this ruleset, despite being part of my intention for playing. So is the relationship between proposals and the Git repository underpinning the Github project hosting the game. Clarify the immigration and exit procedures \u00b6 The question of who the players are , or how one becomes a player, is left intentionally vague. In Suber's original rules, it appears that the players are those who are engaged in playing the game: tautological on paper, but inherently obvious by simple observation of the playing-space. On Github, the answer to this question may not be so simple. A public repository is visible to anyone with an internet connection, and will accept proposed pull requests (and issue reports) equally freely. This suggests that either everyone is, inherently, a player, or that player-ness is somehow a function of engaging with the game. I leave it to the players to resolve this situation to their own satisfaction, but my suggestion is to track player-ness using repository collaborators or organization member accounts. Figure out how to regulate the use of Github features \u00b6 Nomic, as written, largely revolves around sequential proposals. That's fine as far as it goes, but Github has a very wide array of project management features - and that set of features changes over time, outside the control of the players, as Github roll out improvements (and, sometimes, break things). Features of probable interest: The gh-pages branch and associated web site. Issue and pull request tagging and approval settings. Third-party integrations. Whether to store non-rule state, as such arises, in the repository, or in the wiki, or elsewhere. Pull request reactions and approvals. The mutability of most Github features. Expand the rules-change process to permit a single proposal to amend many rules \u00b6 This is a standard rules patch, as Suber's initial rule-set is (I believe intentionally) very restrictive. This may turn out to be less relevant on Github, if players are allowed to submit turns in rapid succession with themselves. Transition from immediate amendment to a system of sessions \u00b6 Why not? Parliamentary procedure is fun, right? In an asynchronous environment, the discrete phases of a session system (where proposals are gathered, then debated, then voted upon, then enacted as a unit) might be a better fit for the Github mode of play. Evaluate other models of proposal vetting besides majority vote \u00b6 Github open source projects regularly have a small core team of maintainers supporting a larger group of users. Is it possible to mirror this structure in Nomic? Is it wise to do so? I suspect this is only possible with an inordinately large number of players, but Github could, at least in principle, support that number of players. Note that this is a fairly standard Nomic passtime.","title":"Notes Towards Initial Rules for a Github Nomic"},{"location":"nomic/notes/#notes-towards-initial-rules-for-a-github-nomic","text":"This document is not part of the rules of a Nomic, and is present solely as a guide to the design of this initial ruleset , for play on Github. It should be removed before the game starts, and at no time should it be consulted to guide gameplay directly. Peter Suber's Nomic is a game of rule-making for one or more players. 
For details on the rationale behind the game and the reasons the game might be interesting, see Suber's own description.","title":"Notes Towards Initial Rules for a Github Nomic"},{"location":"nomic/notes/#changes-from-subers-rules","text":"","title":"Changes from Suber's Rules"},{"location":"nomic/notes/#format","text":"I've marked up Suber's rules into Markdown, one of Github's \u201cnative\u201d text markup formats. This highly-structured format produces quite readable results when viewed through the Github website, and allows useful things like HTML links that point to specific rules. I've also made some diff-friendliness choices around the structure of those Markdown documents. For want of a better idea, the source documents are line-broken with one sentence per line, so that diffs naturally span whole sentences rather than arbitrarily-wrapped text (or unwrapped text). Since Github automatically recombines sequences of non-blank lines into a single HTML paragraph, the rendering on the web site is still quite readable. I have not codified this format in the rules themselves.","title":"Format"},{"location":"nomic/notes/#asynchrony","text":"In its original form, Nomic is appropriate for face-to-face play. The rules assume that it is practical for the players to identify one another using out-of-game context, and that it is practical for the players to take turns. Each player is expected to wait indefinitely (or, more likely, to apply non-game social pressure) if the preceding player takes inordinately long to complete their turn. Similarly, Judgement interrupts the flow of game play and brings turns to a stop. This Nomic is to be played on Github, and the players are not likely to be present simultaneously, or to be willing to wait indefinitely. It's possible for Suber's original Nomic rules to be amended, following themselves, into a form suitable for asynchronous play. This has happened several times: for examples, see Agora and BlogNomic , though there are a multitude of others. However, this process of amendment takes time , and, starting from Suber's initial rules, would require a period of one-turn-at-a-time rule-changes before the game could be played more naturally in the Github format. This period is not very interesting, and is incredibly demanding of the initial players' attention spans. In the interests of preserving the players' time, I have modified Suber's initial ruleset to replace sequential play with a simple asynchronous model of play. In summary: Every player can begin a turn at any time, even during another player's (or players') turn, so long as they aren't already taking a turn. Actions can be resolved in any order, depending on which proposals players choose to vote on, and in what order. The initial rules allow for players to end their turns without gathering every vote, once gameplay has proceeded far enough for non-unanimous votes to be possible. I have attempted to leave the rules as close to Suber's original rules as possible otherwise while implementing this change to the initial ruleset. I have faith that the process of playing Nomic will correct any deficiencies, or, failing that, will clearly identify where these changes break the game entirely. I have, as far as I am able, emulated Suber's preference for succinctness over thoroughness, and resisted the urge to fix or clarify rules even where defects seem obvious to me. 
In spite of my temptation to remove it, I have even left the notion of \u201cwinning\u201d intact.","title":"Asynchrony"},{"location":"nomic/notes/#rule-numbering","text":"The intent of this Nomic is to explore the suitability of Github's suite of tools for proposing, reviewing, and accepting changes to a corpus of text are suitable for self-governed rulemaking processes, as modelled by Nomic. Note that this is a test of Github, not of Git: it is appropriate and intended that the players rely on non-Git elements of Github's workflow (issues, wiki pages, Github Pages, and so on), and similarly it is appropriate and intended that the authentic copy of the game in play is the Github project hosting it, not the Git repo the project contains, and certainly not forks of the project or other clones of the repository. To support this intention, I have re-labelled the initial rules with negative numbers, rather than digits, so that proposals can be numbered starting from 1 without colliding with existing rules, and so that they can be numbered by their Pull Requests and Github issue numbers. (A previous version of these rules used Roman numerals for the initial rules. However, correctly accounting for the priority of new rules over initial rules, following Suber, required more changes than I was comfortable making to Suber's ruleset.) I have made it explicit in these initial rules that Github, not the players, assigns numbers to proposals. This is the only rule which mentions Github by name. I have not explicitly specified that the proposals should be implemented through pull requests; this is an intentional opportunity for player creativity.","title":"Rule-numbering"},{"location":"nomic/notes/#projects-ideas","text":"A small personal collection of other ideas to explore:","title":"Projects &amp; Ideas"},{"location":"nomic/notes/#repeal-or-replace-the-victory-criteria-entirely","text":"\u201cWinning\u201d is not an objective I'm personally interested in, and Suber's race to 200 points by popularity of proposal is structurally quite dull. If the game is to have a victory condition, it should be built from the ground up to meet the players' motivations, rather than being retrofitted onto the points-based system.","title":"Repeal or replace the victory criteria entirely"},{"location":"nomic/notes/#codify-the-use-of-git-commits-rather-than-prose-for-rules-changes","text":"This is unstated in this ruleset, despite being part of my intention for playing. So is the relationship between proposals and the Git repository underpinning the Github project hosting the game.","title":"Codify the use of Git commits, rather than prose, for rules-changes"},{"location":"nomic/notes/#clarify-the-immigration-and-exit-procedures","text":"The question of who the players are , or how one becomes a player, is left intentionally vague. In Suber's original rules, it appears that the players are those who are engaged in playing the game: tautological on paper, but inherently obvious by simple observation of the playing-space. On Github, the answer to this question may not be so simple. A public repository is visible to anyone with an internet connection, and will accept proposed pull requests (and issue reports) equally freely. This suggests that either everyone is, inherently, a player, or that player-ness is somehow a function of engaging with the game. 
I leave it to the players to resolve this situation to their own satisfaction, but my suggestion is to track player-ness using repository collaborators or organization member accounts.","title":"Clarify the immigration and exit procedures"},{"location":"nomic/notes/#figure-out-how-to-regulate-the-use-of-github-features","text":"Nomic, as written, largely revolves around sequential proposals. That's fine as far as it goes, but Github has a very wide array of project management features - and that set of features changes over time, outside the control of the players, as Github roll out improvements (and, sometimes, break things). Features of probable interest: The gh-pages branch and associated web site. Issue and pull request tagging and approval settings. Third-party integrations. Whether to store non-rule state, as such arises, in the repository, or in the wiki, or elsewhere. Pull request reactions and approvals. The mutability of most Github features.","title":"Figure out how to regulate the use of Github features"},{"location":"nomic/notes/#expand-the-rules-change-process-to-permit-a-single-proposal-to-amend-many-rules","text":"This is a standard rules patch, as Suber's initial rule-set is (I believe intentionally) very restrictive. This may turn out to be less relevant on Github, if players are allowed to submit turns in rapid succession with themselves.","title":"Expand the rules-change process to permit a single proposal to amend many rules"},{"location":"nomic/notes/#transition-from-immediate-amendment-to-a-system-of-sessions","text":"Why not? Parliamentary procedure is fun, right? In an asynchronous environment, the discrete phases of a session system (where proposals are gathered, then debated, then voted upon, then enacted as a unit) might be a better fit for the Github mode of play.","title":"Transition from immediate amendment to a system of sessions"},{"location":"nomic/notes/#evaluate-other-models-of-proposal-vetting-besides-majority-vote","text":"Github open source projects regularly have a small core team of maintainers supporting a larger group of users. Is it possible to mirror this structure in Nomic? Is it wise to do so? I suspect this is only possible with an inordinately large number of players, but Github could, at least in principle, support that number of players. Note that this is a fairly standard Nomic passtime.","title":"Evaluate other models of proposal vetting besides majority vote"},{"location":"nomic/rules/","text":"Github Nomic Rules \u00b6 Immutable Rules \u00b6 Rule -216. \u00b6 All players must always abide by all the rules then in effect, in the form in which they are then in effect. The rules in the Initial Set are in effect whenever a game begins. The Initial Set consists of rules -216 through -201 (immutable) and rules -112 through -101 (mutable). Rule -215. \u00b6 Initially, rules -216 through -201 are immutable, and rules -112 through -101 are mutable. Rules subsequently enacted or transmuted (that is, changed from immutable to mutable or vice versa) may be immutable or mutable regardless of their numbers, and rules in the Initial Set may be transmuted regardless of their numbers. Rule -214. \u00b6 A rule-change is any of the following: the enactment, repeal, or amendment of a mutable rule; the enactment, repeal, or amendment of an amendment of a mutable rule; or the transmutation of an immutable rule into a mutable rule or vice versa. 
(Note: This definition implies that, at least initially, all new rules are mutable; immutable rules, as long as they are immutable, may not be amended or repealed; mutable rules, as long as they are mutable, may be amended or repealed; any rule of any status may be transmuted; no rule is absolutely immune to change.) Rule -213. \u00b6 All rule-changes proposed in the proper way shall be voted on. They will be adopted if and only if they receive the required number of votes. Rule -212. \u00b6 Every player is an eligible voter. Rule -211. \u00b6 All proposed rule-changes shall be written down before they are voted on. If they are adopted, they shall guide play in the form in which they were voted on. Rule -210. \u00b6 No rule-change may take effect earlier than the moment of the completion of the vote that adopted it, even if its wording explicitly states otherwise. No rule-change may have retroactive application. Rule -209. \u00b6 Each proposed rule-change shall be given a number for reference. The numbers shall be assigned by Github, so that each rule-change proposed in the proper way shall receive the a distinct integer from all prior proposals, whether or not the proposal is adopted. If a rule is repealed and reenacted, it receives the number of the proposal to reenact it. If a rule is amended or transmuted, it receives the number of the proposal to amend or transmute it. If an amendment is amended or repealed, the entire rule of which it is a part receives the number of the proposal to amend or repeal the amendment. Rule -208. \u00b6 Rule-changes that transmute immutable rules into mutable rules may be adopted if and only if the vote is unanimous among the eligible voters. Transmutation shall not be implied, but must be stated explicitly in a proposal to take effect. Rule -207. \u00b6 In a conflict between a mutable and an immutable rule, the immutable rule takes precedence and the mutable rule shall be entirely void. For the purposes of this rule a proposal to transmute an immutable rule does not \"conflict\" with that immutable rule. Rule -206. \u00b6 If a rule-change as proposed is unclear, ambiguous, paradoxical, or destructive of play, or if it arguably consists of two or more rule-changes compounded or is an amendment that makes no difference, or if it is otherwise of questionable value, then the other players may suggest amendments or argue against the proposal before the vote. A reasonable time must be allowed for this debate. The proponent decides the final form in which the proposal is to be voted on and, unless the Judge has been asked to do so, also decides the time to end debate and vote. Rule -205. \u00b6 The state of affairs that constitutes winning may not be altered from achieving n points to any other state of affairs. The magnitude of n and the means of earning points may be changed, and rules that establish a winner when play cannot continue may be enacted and (while they are mutable) be amended or repealed. Rule -204. \u00b6 A player always has the option to forfeit the game rather than continue to play or incur a game penalty. No penalty worse than losing, in the judgment of the player to incur it, may be imposed. Rule -203. \u00b6 There must always be at least one mutable rule. The adoption of rule-changes must never become completely impermissible. Rule -202. \u00b6 Rule-changes that affect rules needed to allow or apply rule-changes are as permissible as other rule-changes. Even rule-changes that amend or repeal their own authority are permissible. 
No rule-change or type of move is impermissible solely on account of the self-reference or self-application of a rule. Rule -201. \u00b6 Whatever is not prohibited or regulated by a rule is permitted and unregulated, with the sole exception of changing the rules, which is permitted only when a rule or set of rules explicitly or implicitly permits it. Mutable Rules \u00b6 Rule -112. \u00b6 A player may begin a turn at any time that suits them. Turns may overlap: one player may begin a turn while another player's is in progress. No player may begin a turn unless all of their previous turns have ended. All players begin with zero points. Rule -111. \u00b6 One turn consists of two parts in this order: proposing one rule-change and having it voted on, and scoring the proposal and adding that score to the proposing player's score. A proposal is scored by taking the proposal number, adding nine to it, multiplying the result by the fraction of favourable votes the proposal received, and rounding that result to the nearest integer. (This scoring system yields a number between 0 and 10 for the first proposal, with the upper limit increasing by one for each new proposal; more points are awarded for more popular proposals.) Rule -110. \u00b6 A rule-change is adopted if and only if the vote in favour is unanimous among the eligible voters. If this rule is not amended before each player has had two turns, it automatically changes to require only a simple majority. If and when rule-changes can only be adopted unanimously, the voting may be ended as soon as an opposing vote is counted. If and when rule-changes can be adopted by simple majority, the voting may be ended as soon as a simple majority in favour or a simple majority against is counted. Rule -109. \u00b6 If and when rule-changes can be adopted without unanimity, the players who vote against winning proposals shall receive 10 points each. Rule -108. \u00b6 An adopted rule-change takes full effect at the moment of the completion of the vote that adopted it. Rule -107. \u00b6 When a proposed rule-change is defeated, the player who proposed it loses 10 points. Rule -106. \u00b6 Each player always has exactly one vote. Rule -105. \u00b6 The winner is the first player to achieve 200 (positive) points. Rule -104. \u00b6 At no time may there be more than 25 mutable rules. Rule -103. \u00b6 If two or more mutable rules conflict with one another, or if two or more immutable rules conflict with one another, then the rule with the lowest ordinal number takes precedence. If at least one of the rules in conflict explicitly says of itself that it defers to another rule (or type of rule) or takes precedence over another rule (or type of rule), then such provisions shall supersede the numerical method for determining precedence. If two or more rules claim to take precedence over one another or to defer to one another, then the numerical method again governs. Rule -102. \u00b6 If players disagree about the legality of a move or the interpretation or application of a rule, then the player moving may ask any other player to be the Judge and decide the question. Disagreement for the purposes of this rule may be created by the insistence of any player. This process is called invoking Judgment. When Judgment has been invoked, no player may begin his or her turn without the consent of a majority of the other players. The Judge's Judgment may be overruled only by a unanimous vote of the other players taken before the next turn is begun. 
If a Judge's Judgment is overruled, then the Judge may ask any player other than the moving player, and other than any player who has already been the Judge for the question, to become the new Judge for the question, and so on, except that no player is to be Judge during his or her own turn or during the turn of a team-mate. Unless a Judge is overruled, one Judge settles all questions arising from the game until the next turn is begun, including questions as to his or her own legitimacy and jurisdiction as Judge. New Judges are not bound by the decisions of old Judges. New Judges may, however, settle only those questions on which the players currently disagree and that affect the completion of the turn in which Judgment was invoked. All decisions by Judges shall be in accordance with all the rules then in effect; but when the rules are silent, inconsistent, or unclear on the point at issue, then the Judge shall consider game-custom and the spirit of the game before applying other standards. Rule -101. \u00b6 If the rules are changed so that further play is impossible, or if the legality of a move cannot be determined with finality, or if by the Judge's best reasoning, not overruled, a move appears equally legal and illegal, then the first player unable to complete a turn is the winner. This rule takes precedence over every other rule determining the winner.","title":"Github Nomic Rules"},{"location":"nomic/rules/#github-nomic-rules","text":"","title":"Github Nomic Rules"},{"location":"nomic/rules/#immutable-rules","text":"","title":"Immutable Rules"},{"location":"nomic/rules/#rule-216","text":"All players must always abide by all the rules then in effect, in the form in which they are then in effect. The rules in the Initial Set are in effect whenever a game begins. The Initial Set consists of rules -216 through -201 (immutable) and rules -112 through -101 (mutable).","title":"Rule -216."},{"location":"nomic/rules/#rule-215","text":"Initially, rules -216 through -201 are immutable, and rules -112 through -101 are mutable. Rules subsequently enacted or transmuted (that is, changed from immutable to mutable or vice versa) may be immutable or mutable regardless of their numbers, and rules in the Initial Set may be transmuted regardless of their numbers.","title":"Rule -215."},{"location":"nomic/rules/#rule-214","text":"A rule-change is any of the following: the enactment, repeal, or amendment of a mutable rule; the enactment, repeal, or amendment of an amendment of a mutable rule; or the transmutation of an immutable rule into a mutable rule or vice versa. (Note: This definition implies that, at least initially, all new rules are mutable; immutable rules, as long as they are immutable, may not be amended or repealed; mutable rules, as long as they are mutable, may be amended or repealed; any rule of any status may be transmuted; no rule is absolutely immune to change.)","title":"Rule -214."},{"location":"nomic/rules/#rule-213","text":"All rule-changes proposed in the proper way shall be voted on. They will be adopted if and only if they receive the required number of votes.","title":"Rule -213."},{"location":"nomic/rules/#rule-212","text":"Every player is an eligible voter.","title":"Rule -212."},{"location":"nomic/rules/#rule-211","text":"All proposed rule-changes shall be written down before they are voted on. 
If they are adopted, they shall guide play in the form in which they were voted on.","title":"Rule -211."},{"location":"nomic/rules/#rule-210","text":"No rule-change may take effect earlier than the moment of the completion of the vote that adopted it, even if its wording explicitly states otherwise. No rule-change may have retroactive application.","title":"Rule -210."},{"location":"nomic/rules/#rule-209","text":"Each proposed rule-change shall be given a number for reference. The numbers shall be assigned by Github, so that each rule-change proposed in the proper way shall receive the a distinct integer from all prior proposals, whether or not the proposal is adopted. If a rule is repealed and reenacted, it receives the number of the proposal to reenact it. If a rule is amended or transmuted, it receives the number of the proposal to amend or transmute it. If an amendment is amended or repealed, the entire rule of which it is a part receives the number of the proposal to amend or repeal the amendment.","title":"Rule -209."},{"location":"nomic/rules/#rule-208","text":"Rule-changes that transmute immutable rules into mutable rules may be adopted if and only if the vote is unanimous among the eligible voters. Transmutation shall not be implied, but must be stated explicitly in a proposal to take effect.","title":"Rule -208."},{"location":"nomic/rules/#rule-207","text":"In a conflict between a mutable and an immutable rule, the immutable rule takes precedence and the mutable rule shall be entirely void. For the purposes of this rule a proposal to transmute an immutable rule does not \"conflict\" with that immutable rule.","title":"Rule -207."},{"location":"nomic/rules/#rule-206","text":"If a rule-change as proposed is unclear, ambiguous, paradoxical, or destructive of play, or if it arguably consists of two or more rule-changes compounded or is an amendment that makes no difference, or if it is otherwise of questionable value, then the other players may suggest amendments or argue against the proposal before the vote. A reasonable time must be allowed for this debate. The proponent decides the final form in which the proposal is to be voted on and, unless the Judge has been asked to do so, also decides the time to end debate and vote.","title":"Rule -206."},{"location":"nomic/rules/#rule-205","text":"The state of affairs that constitutes winning may not be altered from achieving n points to any other state of affairs. The magnitude of n and the means of earning points may be changed, and rules that establish a winner when play cannot continue may be enacted and (while they are mutable) be amended or repealed.","title":"Rule -205."},{"location":"nomic/rules/#rule-204","text":"A player always has the option to forfeit the game rather than continue to play or incur a game penalty. No penalty worse than losing, in the judgment of the player to incur it, may be imposed.","title":"Rule -204."},{"location":"nomic/rules/#rule-203","text":"There must always be at least one mutable rule. The adoption of rule-changes must never become completely impermissible.","title":"Rule -203."},{"location":"nomic/rules/#rule-202","text":"Rule-changes that affect rules needed to allow or apply rule-changes are as permissible as other rule-changes. Even rule-changes that amend or repeal their own authority are permissible. 
No rule-change or type of move is impermissible solely on account of the self-reference or self-application of a rule.","title":"Rule -202."},{"location":"nomic/rules/#rule-201","text":"Whatever is not prohibited or regulated by a rule is permitted and unregulated, with the sole exception of changing the rules, which is permitted only when a rule or set of rules explicitly or implicitly permits it.","title":"Rule -201."},{"location":"nomic/rules/#mutable-rules","text":"","title":"Mutable Rules"},{"location":"nomic/rules/#rule-112","text":"A player may begin a turn at any time that suits them. Turns may overlap: one player may begin a turn while another player's is in progress. No player may begin a turn unless all of their previous turns have ended. All players begin with zero points.","title":"Rule -112."},{"location":"nomic/rules/#rule-111","text":"One turn consists of two parts in this order: proposing one rule-change and having it voted on, and scoring the proposal and adding that score to the proposing player's score. A proposal is scored by taking the proposal number, adding nine to it, multiplying the result by the fraction of favourable votes the proposal received, and rounding that result to the nearest integer. (This scoring system yields a number between 0 and 10 for the first proposal, with the upper limit increasing by one for each new proposal; more points are awarded for more popular proposals.)","title":"Rule -111."},{"location":"nomic/rules/#rule-110","text":"A rule-change is adopted if and only if the vote in favour is unanimous among the eligible voters. If this rule is not amended before each player has had two turns, it automatically changes to require only a simple majority. If and when rule-changes can only be adopted unanimously, the voting may be ended as soon as an opposing vote is counted. If and when rule-changes can be adopted by simple majority, the voting may be ended as soon as a simple majority in favour or a simple majority against is counted.","title":"Rule -110."},{"location":"nomic/rules/#rule-109","text":"If and when rule-changes can be adopted without unanimity, the players who vote against winning proposals shall receive 10 points each.","title":"Rule -109."},{"location":"nomic/rules/#rule-108","text":"An adopted rule-change takes full effect at the moment of the completion of the vote that adopted it.","title":"Rule -108."},{"location":"nomic/rules/#rule-107","text":"When a proposed rule-change is defeated, the player who proposed it loses 10 points.","title":"Rule -107."},{"location":"nomic/rules/#rule-106","text":"Each player always has exactly one vote.","title":"Rule -106."},{"location":"nomic/rules/#rule-105","text":"The winner is the first player to achieve 200 (positive) points.","title":"Rule -105."},{"location":"nomic/rules/#rule-104","text":"At no time may there be more than 25 mutable rules.","title":"Rule -104."},{"location":"nomic/rules/#rule-103","text":"If two or more mutable rules conflict with one another, or if two or more immutable rules conflict with one another, then the rule with the lowest ordinal number takes precedence. If at least one of the rules in conflict explicitly says of itself that it defers to another rule (or type of rule) or takes precedence over another rule (or type of rule), then such provisions shall supersede the numerical method for determining precedence. 
If two or more rules claim to take precedence over one another or to defer to one another, then the numerical method again governs.","title":"Rule -103."},{"location":"nomic/rules/#rule-102","text":"If players disagree about the legality of a move or the interpretation or application of a rule, then the player moving may ask any other player to be the Judge and decide the question. Disagreement for the purposes of this rule may be created by the insistence of any player. This process is called invoking Judgment. When Judgment has been invoked, no player may begin his or her turn without the consent of a majority of the other players. The Judge's Judgment may be overruled only by a unanimous vote of the other players taken before the next turn is begun. If a Judge's Judgment is overruled, then the Judge may ask any player other than the moving player, and other than any player who has already been the Judge for the question, to become the new Judge for the question, and so on, except that no player is to be Judge during his or her own turn or during the turn of a team-mate. Unless a Judge is overruled, one Judge settles all questions arising from the game until the next turn is begun, including questions as to his or her own legitimacy and jurisdiction as Judge. New Judges are not bound by the decisions of old Judges. New Judges may, however, settle only those questions on which the players currently disagree and that affect the completion of the turn in which Judgment was invoked. All decisions by Judges shall be in accordance with all the rules then in effect; but when the rules are silent, inconsistent, or unclear on the point at issue, then the Judge shall consider game-custom and the spirit of the game before applying other standards.","title":"Rule -102."},{"location":"nomic/rules/#rule-101","text":"If the rules are changed so that further play is impossible, or if the legality of a move cannot be determined with finality, or if by the Judge's best reasoning, not overruled, a move appears equally legal and illegal, then the first player unable to complete a turn is the winner. This rule takes precedence over every other rule determining the winner.","title":"Rule -101."}]} \ No newline at end of file