php.lv forumi

Aleksejs

Moderators
  • Posts: 4,584
  • Days Won: 1

Posts posted by Aleksejs

  1. Returning to the NoSQL topic...

    Here is a quite critical article about MongoDB:

    http://pastebin.com/raw.php?i=FD3xe6Jt

    Reposting it here, since it may disappear from pastebin:

     

    Don't use MongoDB

    =================

     

    I've kept quiet for a while for various political reasons, but I now
    feel a kind of social responsibility to deter people from banking
    their business on MongoDB.

     

    Our team put serious load on MongoDB on a large (10s of millions
    of users, high profile company) userbase, expecting, from early good
    experiences, that the long-term scalability benefits touted by 10gen
    would pan out. We were wrong, and this rant serves to deter you
    from believing those benefits and making the same mistake
    we did. If one person avoids the trap, it will have been
    worth writing. Hopefully, many more do.

     

    Note that, in our experiences with 10gen, they were nearly always
    helpful and cordial, and often extremely so. But at the same
    time, that cannot be reason alone to suppress information about
    the failings of their product.

     

    Why this matters

    ----------------

     

    Databases must be right, or as-right-as-possible, b/c database
    mistakes are so much more severe than almost every other variation
    of mistake. Not only do they have the largest impact on uptime,
    performance, expense, and value (the inherent value of the data),
    but data has *inertia*. Migrating TBs of data on-the-fly is
    a massive undertaking compared to changing drcses or fixing the
    average logic error in your code. Recovering TBs of data while
    down, limited by what spindles can do for you, is a helpless
    feeling.

     

    Databases are also complex systems that are effectively black
    boxes to the end developer. By adopting a database system,
    you place absolute trust in its ability to do the right thing
    with your data to keep it consistent and available.

     

    Why is MongoDB popular?

    -----------------------

     

    To be fair, it must be acknowledged that MongoDB is popular,
    and that there are valid reasons for its popularity.

     

    * It is remarkably easy to get running
    * Schema-free models that map to JSON-like structures
      have great appeal to developers (they fit our brains),
      and a developer is almost always the individual who
      makes the platform decisions when a project is in
      its infancy
    * Maturity and robustness, track record, tested real-world
      use cases, etc, are typically more important to sysadmin
      types or operations specialists, who often inherit the
      platform long after the initial decisions are made
    * Its single-system, low concurrency read performance benchmarks
      are impressive, and for the inexperienced evaluator, this
      is often The Most Important Thing

     

    Now, if you're writing a toy site, or a prototype, something
    where developer productivity trumps all other considerations,
    it basically doesn't matter *what* you use. Use whatever
    gets the job done.

     

    But if you're intending to really run a large scale system
    on Mongo, one that a business might depend on, simply put:

    Don't.

     

    Why not?

    --------

     

    **1. MongoDB issues writes in unsafe ways *by default* in order to

    win benchmarks**

     

    If you don't issue getLastError(), MongoDB doesn't wait for any
    confirmation from the database that the command was processed.
    This introduces at least two classes of problems:

    * In a concurrent environment (connection pools, etc), you may
      have a subsequent read fail after a write has "finished";
      there is no barrier condition to know at what point the
      database will recognize a write commitment
    * Any unknown number of save operations can be dropped on the floor
      due to queueing in various places, things outstanding in the TCP
      buffer, etc, when your connection drops or the db gets KILL'd,
      segfaults, the hardware crashes, you name it
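The missing barrier condition can be sketched in plain Python (a toy client and server, not the real MongoDB driver or wire protocol; the FakeServer/Client names are illustrative): a fire-and-forget write returns before the server has seen anything, while a getLastError-style write blocks until the server acknowledges it.

```python
import queue
import threading

class FakeServer(threading.Thread):
    """Consumes queued writes; stands in for a database process."""
    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self.stored = []

    def run(self):
        while True:
            doc, ack = self.inbox.get()
            self.stored.append(doc)
            if ack is not None:
                ack.set()            # confirm only when the client asked

class Client:
    def __init__(self, server):
        self.server = server

    def write_unsafe(self, doc):
        # Fire and forget: returns before the server has seen anything.
        # If the process dies now, the write is silently lost.
        self.server.inbox.put((doc, None))

    def write_safe(self, doc, timeout=1.0):
        # The getLastError pattern: block until the server acknowledges.
        ack = threading.Event()
        self.server.inbox.put((doc, ack))
        if not ack.wait(timeout):
            raise IOError("write was never acknowledged")

server = FakeServer()
server.start()
client = Client(server)

client.write_safe({"user": 1})   # guaranteed visible once this returns
assert {"user": 1} in server.stored
```

The point of the sketch is the return-time contract, not the storage: only the acknowledged path gives the caller a moment after which a read is guaranteed to see the write.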

     

    **2. MongoDB can lose data in many startling ways**

     

    Here is a list of ways we personally experienced records go missing:

    1. They just disappeared sometimes. Cause unknown.
    2. Recovery on a corrupt database was not successful,
       pre transaction log.
    3. Replication between master and slave had *gaps* in the oplogs,
       causing slaves to be missing records the master had. Yes,
       there is no checksum, and yes, the replication status showed the
       slaves as current.
    4. Replication just stops sometimes, without error. Monitor
       your replication status!
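Since the oplog carries no checksum, gaps like these have to be caught out-of-band. A minimal sketch of such a consistency check, assuming records can be exported from both nodes as (id, payload) pairs (the export format is hypothetical):

```python
import hashlib

def dataset_digest(records):
    """Order-independent digest of a collection of (id, payload) records."""
    h = hashlib.sha256()
    for rec_id, payload in sorted(records):
        h.update(f"{rec_id}:{payload}".encode())
    return h.hexdigest()

master = [(1, "a"), (2, "b"), (3, "c")]
slave  = [(1, "a"), (3, "c")]        # record 2 fell into an oplog gap

# The replication status may still report the slave as current;
# only an out-of-band comparison like this reveals the missing record.
assert dataset_digest(master) != dataset_digest(slave)
```

Running a periodic digest comparison like this is the kind of monitoring the post is urging: it catches both the silent gaps (item 3) and the silently stopped replication (item 4).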

     

    **3. MongoDB requires a global write lock to issue any write**

     

    Under a write-heavy load, this will kill you. If you run a blog,
    you maybe don't care b/c your R:W ratio is so high.
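The cost model is easy to picture with a simulation (not MongoDB internals): every write, to any collection, must pass through one process-wide mutex, so writers to unrelated collections still queue behind each other.

```python
import threading

global_write_lock = threading.Lock()   # one lock for the whole "server"
log = []

def write(collection, doc):
    # Every write, regardless of target collection, contends here.
    with global_write_lock:
        log.append((collection, doc))

threads = [threading.Thread(target=write, args=("users", i)) for i in range(50)]
threads += [threading.Thread(target=write, args=("events", i)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(log) == 100   # every write lands, but strictly one at a time
```

With a high R:W ratio the queue behind this single lock is rarely visible; under a write-heavy load, every writer waits on every other one.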

     

    **4. MongoDB's sharding doesn't work that well under load**

     

    Adding a shard under heavy load is a nightmare.
    Mongo either moves chunks between shards so quickly it DOSes
    the production traffic, or refuses to move chunks altogether.

    This pretty much makes it a non-starter for high-traffic
    sites with heavy write volume.

     

    **5. mongos is unreliable**

     

    The mongod/config server/mongos architecture is actually pretty
    reasonable and clever. Unfortunately, mongos is complete
    garbage. Under load, it crashed anywhere from every few hours
    to every few days. Restart supervision didn't always help b/c
    sometimes it would throw some assertion that would bail out a
    critical thread, but the process would stay running. Double
    fail.

    It got so bad the only usable way we found to run mongos was
    to run haproxy in front of dozens of mongos instances, and
    to have a job that slowly rotated through them and killed them
    to keep fresh/live ones in the pool. No joke.

     

    **6. MongoDB actually once deleted the entire dataset**

     

    MongoDB 1.6, in replica set configuration, would sometimes
    determine the wrong node (often an empty node) was the freshest
    copy of the data available. It would then DELETE ALL THE DATA
    ON THE REPLICA (which may have been the 700GB of good data)
    AND REPLICATE THE EMPTY SET. The database should never never
    never do this. Faced with a situation like that, the database
    should throw an error and make the admin disambiguate by
    wiping/resetting data, or forcing the correct configuration.
    NEVER DELETE ALL THE DATA. (This was a bad day.)

    They fixed this in 1.8, thank god.

     

    **7. Things were shipped that should have never been shipped**

     

    Things with known, embarrassing bugs that could cause data
    problems were in "stable" releases--and often we weren't told
    about these issues until after they bit us, and then only b/c
    we had a super duper crazy platinum support contract with 10gen.

    The response was to send us a hot patch that they were
    calling an RC internally, and have us run that on our data.

     

    **8. Replication was lackluster on busy servers**

     

    Replication would often, again, either DOS the master, or
    replicate so slowly that it would take far too long and
    the oplog would be exhausted (even with a 50G oplog).

    We had a busy, large dataset that we simply could
    not replicate b/c of this dynamic. It was a harrowing month
    or two of finger crossing before we got it onto a different
    database system.

     

    **But, the real problem:**

     

    You might object that my information is out of date; that they've
    fixed these problems or intend to fix them in the next version;
    that problem X can be mitigated by optional practice Y.

    Unfortunately, it doesn't matter.

    The real problem is that so many of these problems existed
    in the first place.

     

    Database developers must be held to a higher standard than
    your average developer. Namely, your priority list should
    typically be something like:

    1. Don't lose data, be very deterministic with data
    2. Employ practices to stay available
    3. Multi-node scalability
    4. Minimize latency at 99% and 95%
    5. Raw req/s per resource

     

    10gen's order seems to be #5, then everything else in some
    order. #1 ain't in the top 3.

    These failings, and the implied priorities of the company,
    indicate a basic cultural problem, irrespective of whatever
    problems exist in any single release: a lack of the requisite
    discipline to design database systems businesses should bet on.

     

    Please take this warning seriously.

  2. Local Session Hijacking in PHP

    PHP's default session handler stores session data in files, and by default these files are placed in /tmp. In a shared environment, session files should never be placed in a directory that can be read by a malicious local user, like the world-readable /tmp directory.
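Whether a host is exposed is easy to audit: look for sess_* files (the naming used by PHP's default files handler) whose permission bits grant read access to other users. A sketch of such a check in Python, demonstrated against a scratch directory standing in for /tmp:

```python
import os
import stat
import tempfile

def exposed_session_files(session_dir):
    """Return PHP session files that other local users can read."""
    exposed = []
    for name in os.listdir(session_dir):
        if not name.startswith("sess_"):
            continue
        mode = os.stat(os.path.join(session_dir, name)).st_mode
        # Group- or world-readable: a malicious local user can snoop it.
        if mode & (stat.S_IRGRP | stat.S_IROTH):
            exposed.append(name)
    return exposed

# Demonstration in a scratch directory standing in for /tmp:
d = tempfile.mkdtemp()
path = os.path.join(d, "sess_abc123")
open(path, "w").close()
os.chmod(path, 0o644)   # world-readable, like a carelessly configured host
assert exposed_session_files(d) == ["sess_abc123"]
```

The usual fix is the mirror image of this check: give each vhost its own session.save_path with 0700 permissions, so no sess_* file is ever visible to another account.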

     

    Local Session Snooping in PHP
    Local session snooping is not so much a security issue as a way of gathering information from an already compromised web application, unless it is a badly configured shared host, where an attacker might gather otherwise unobtainable information. It is basically about extracting all the information a web application stored in the super global $_SESSION variable.

     

    Local Session Poisoning in PHP Part 1: The Basics of Exploitation and How to Secure a Server
    Session poisoning is the act of manipulating session-specific data in PHP: adding, changing or removing variables stored in the super global $_SESSION array.

     

    Local session poisoning is enabled by the fact that one web application can manipulate a variable in the $_SESSION array while another web application has no way of knowing how that variable's value came to be, and will interpret the variable according to its own logic. The $_SESSION array can then be manipulated to contain the values needed to spoof a logged in user or exploit a vulnerable function. PHP programmers put far more trust in $_SESSION variables than for example $_GET variables. The $_SESSION array is considered an internal variable, and an internal variable would never contain malicious input, would it?

     

    Local Session Poisoning in PHP Part 2: Promiscuous Session Files

    FastCGI, suPHP and suExec can all ensure that a PHP script which is called from the web will execute under the user that owns it, as opposed to the user the web server is running as. This seemingly protects against session poisoning by ensuring that a malicious user no longer can open and manipulate session files owned by other users in a shared host.

     

    The hidden pitfall is that while these protection mechanisms protect session files from unauthorized access, they cannot prevent a user from granting others access to their own session files. If all the session files are stored in a common folder, it is trivial to trick a web application into loading session variables from a promiscuous session file.
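The attack shape can be sketched with a simplified model of a file-per-session store (the sess_ prefix mirrors PHP's handler; the serialized payload string is only illustrative): whoever can write a file into the shared directory controls what any application will later load for that session id.

```python
import os
import tempfile

SESSION_DIR = tempfile.mkdtemp()   # stands in for the shared /tmp

def save_session(session_id, data):
    # Any local user able to write into SESSION_DIR can plant a file here.
    with open(os.path.join(SESSION_DIR, "sess_" + session_id), "w") as f:
        f.write(data)

def load_session(session_id):
    # The victim application trusts whatever file matches the id --
    # it has no way to tell who actually wrote it.
    with open(os.path.join(SESSION_DIR, "sess_" + session_id)) as f:
        return f.read()

# Attacker plants a session asserting an admin login, then presents its id:
save_session("deadbeef", 'logged_in|b:1;user|s:5:"admin";')
assert "admin" in load_session("deadbeef")
```

This is exactly why $_SESSION deserves the same distrust as $_GET on a shared host: the contents of the session file prove nothing about who created them.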

     

    Local Session Poisoning in PHP Part 3: Bypassing Suhosin's Session Encryption

    By default Suhosin transparently encrypts session files stored by PHP. This seems to be adequate protection against local session poisoning in a shared hosting environment. But let's take a closer look.
  3. Starting in October, the Stanford School of Engineering will be offering three free distance-learning courses:

     

    http://www.ml-class.org/ Machine Learning

    Machine learning is the science of getting computers to act without being explicitly programmed.

     

    http://www.ai-class.com/ Introduction to Artificial Intelligence

    Artificial Intelligence is the science of making computer software that reasons about the world around it.

     

    http://www.db-class.org/ Introduction to Databases

  4. I just came across this, in my opinion, interesting article out on the wide expanses of the internet:

    How much money should my company raise

    The article is about the US, and about the moment when a company moves from the startup phase into the next one. One paragraph in particular caught my eye:

    As one of our angel investors put it to us, "given favorable terms, raise as much money as possible." His rationale was two-fold: 1) the bubble won't last forever, and 2) startups cost more money and require more time than the entrepreneur ever predicts.
  5. Various Americans and other overseas folk apparently have the address through which their "AOL" connection is translated (NAT) change in the middle of a session. The same goes for all the anonymity fans: on the Tor (and other onion-routing) networks, addresses change often and unexpectedly.

    That, of course, does not mean the idea of using the IP address should be abandoned; you just have to be aware that such "patients" exist and be prepared to deal with complaints from their side.
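One pragmatic compromise, sketched below with an illustrative /16 threshold, is to compare only a network prefix instead of the exact address, so a NAT rotation inside one provider's range does not invalidate the session (Tor exits will still jump networks entirely, so some complaints remain possible):

```python
import ipaddress

def same_network(ip_a, ip_b, prefix=16):
    """Treat two client addresses as 'the same user' if they share a prefix."""
    net = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in net

# NAT rotated the client within the provider's range: still accepted.
assert same_network("172.16.4.10", "172.16.90.2", prefix=16)
# A jump to an unrelated network: treat as suspicious.
assert not same_network("172.16.4.10", "198.51.100.7", prefix=16)
```

The prefix length is a tuning knob, not a rule: a wider prefix admits more NAT churn but also more attackers on the same provider, so IP checks like this only make sense as one signal among several.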
